How to crawl and download the Kugou TOP500 songs with Java

2021-06-28 10:36, 后山悟道人, Java tutorials

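This article describes how to crawl and download the Kugou TOP500 songs with Java. It is quite practical, and readers who need it may find it a useful reference.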

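Here's the story: the cheap dashcam that came free with my car stopped working, so a couple of days ago I bit the bullet and bought a better one with navigation, music, Bluetooth, 4G and so on. I figured that since it has these features I might as well use them. Streaming songs over 4G felt a bit extravagant, so I planned to download some from Kugou, only to find that downloading requires a membership, and even VIP members are limited to 300 downloads a month. I'm too broke and too stingy to pay for a membership, so I searched Baidu for free ways to download. Everything I found used Python crawlers. I know a little Python, but it isn't installed on my machine, and installing it and working things out again seemed like a hassle, so I spent half an hour writing this crawler in Java instead. My skills are average and this post only records the analysis and implementation, so experts may want to skip it. It uses a few libraries: jsoup, HttpClient and net.sf.json. You can download the jar files yourself.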

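1. Checking whether the TOP500 song list can be obtained

First, open the Kugou home page and look at the Kugou TOP500. It is supposed to be 500 songs, so why are only 22 shown?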

Is that really all it shows, or can the rest be found? I took a look at the TOP500 link:

https://www.kugou.com/yy/rank/home/1-8888.html?from=rank

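Notice the 1 after home. Could that mean page one? I changed the 1 to a 2, loaded the URL, and sure enough landed on page two. So the full list of 500 songs can be obtained from the web pages.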
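The page-number observation above can be sketched in a few lines. This is only an illustration: the names `RankPages`, `pageUrl` and `pageCount` are made up here, and the figure of 22 songs per page is taken from what the first page showed, so the real page count may differ.

```java
// Sketch: build the URL of each TOP500 rank page from its page number.
// RankPages, pageUrl and pageCount are hypothetical names used only for illustration.
final class RankPages {
    // URL pattern observed in the article; %d stands for the page number.
    static final String TEMPLATE = "https://www.kugou.com/yy/rank/home/%d-8888.html?from=rank";

    static String pageUrl(int page) {
        return String.format(TEMPLATE, page);
    }

    // Number of pages needed to cover totalSongs at songsPerPage, rounded up.
    static int pageCount(int totalSongs, int songsPerPage) {
        return (totalSongs + songsPerPage - 1) / songsPerPage;
    }

    public static void main(String[] args) {
        int pages = pageCount(500, 22); // assumes 22 songs per page, as seen on page 1
        for (int page = 1; page <= pages; page++) {
            System.out.println(pageUrl(page));
        }
    }
}
```

Under the 22-per-page assumption, covering all 500 songs takes 23 pages, since 22 pages hold only 484.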

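2. Finding the real MP3 download address (this one is a bit roundabout)

Click a song to open its play page, open the Elements panel in Chrome's developer tools and search for mp3. The location of the MP3 is easy to find.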

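However, when the page is fetched from Java, the returned HTML does not contain the MP3 file address, so the page must be loading it with JavaScript. Refresh the page and look at what it loads. There is quite a lot, so focus on the JS and PHP requests and check whether any of them contains the MP3 address. I'll skip the details of the analysis.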

Eventually, in the request list, I found that this request

https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jquery191027067069941080546_1546235744250&hash=667939c6e784265d541deee65ae4f2f8&album_id=0&_=1546235744251

contains the full MP3 address:

"play_url": "http://fs.w.kugou.com/201812311325/dcf5b6449160903c6ee48035e11434bb/g128/m08/02/09/iicbafrzqf2anoadadn94ubomau995.mp3",

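As a rough sketch of how `play_url` could be pulled out of such a response, the snippet below strips the JSONP callback wrapper and reads the field with a regular expression. It uses only the standard library so that it stands alone; the class name `JsonpPlayUrl`, its method names and the sample response are assumptions for illustration, and the article's own code parses the JSON with net.sf.json instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull play_url out of a JSONP response using only the standard library.
// JsonpPlayUrl and its methods are hypothetical names, not part of the article's code.
final class JsonpPlayUrl {
    private static final Pattern PLAY_URL = Pattern.compile("\"play_url\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the text between the first '(' and the last ')', i.e. the JSON payload.
    static String stripCallback(String jsonp) {
        int start = jsonp.indexOf('(');
        int end = jsonp.lastIndexOf(')');
        if (start < 0 || end <= start) {
            throw new IllegalArgumentException("not a JSONP response");
        }
        return jsonp.substring(start + 1, end);
    }

    // Returns the play_url value, or null if the field is absent.
    static String playUrl(String jsonp) {
        Matcher m = PLAY_URL.matcher(stripCallback(jsonp));
        if (!m.find()) {
            return null;
        }
        // JSON may escape '/' with a preceding backslash; undo that to get a usable URL.
        return m.group(1).replace("\\/", "/");
    }

    public static void main(String[] args) {
        // Made-up sample response, shaped like the one described in the article.
        String sample = "cb123({\"status\":1,\"data\":{\"play_url\":\"http:\\/\\/example.com\\/a.mp3\"}});";
        System.out.println(playUrl(sample));
    }
}
```

A regular expression is enough for a single flat field like this one; a real JSON parser is the safer choice once more fields are needed.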
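So how does this request know which song is meant? It can only be the hash parameter that identifies the song. Looking through the play page for where this hash comes from, I found it in the following JavaScript: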

var datafromsmarty = [{"hash":"667939c6e784265d541deee65ae4f2f8","timelength":"237051","audio_name":"\u767d\u5c0f\u767d - \u6700\u7f8e\u5a5a\u793c","author_name":"\u767d\u5c0f\u767d","song_name":"\u6700\u7f8e\u5a5a\u793c","album_id":0}],// song info for the current page
      playtype = "search_single";// current playback type
  </script>
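The audio_name, author_name and song_name values in that snippet appear to be JSON-style Unicode escapes (a backslash, the letter u and four hex digits). A JSON parser decodes them automatically, but if the values are read with a regular expression they need decoding by hand. A minimal sketch, where `UnicodeEscapes.decode` is a hypothetical helper that is not part of the original code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: turn backslash-u escapes, as found in audio_name, into real characters.
// UnicodeEscapes.decode is a hypothetical helper used only for illustration.
final class UnicodeEscapes {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());                    // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // append the decoded character
            last = m.end();
        }
        out.append(text, last, text.length());                    // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        // The audio_name value from the snippet above.
        System.out.println(decode("\\u767d\\u5c0f\\u767d - \\u6700\\u7f8e\\u5a5a\\u793c"));
    }
}
```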

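Next I fetched that page from Java to see whether the hash could be scraped, and sure enough the fetched HTML contains this JavaScript. With both the MP3 address and the song list located, the next step is simply to implement it in a program.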
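The two findings so far, the hash embedded in the play page and the getdata request that takes it, can be tied together roughly as follows. `SongHash`, `extractHash` and `dataUrl` are hypothetical names, and the regular expression accepts hex digits in either case because the case the site actually uses is not assumed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find the song hash in a play page and build the getdata request URL from it.
// SongHash, extractHash and dataUrl are hypothetical names used only for illustration.
final class SongHash {
    private static final Pattern HASH = Pattern.compile("\"hash\"\\s*:\\s*\"([0-9A-Fa-f]+)\"");

    // Returns the first hash found in the HTML, or null if there is none.
    static String extractHash(String html) {
        Matcher m = HASH.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Builds the request URL from the parameters seen in the article's example request.
    static String dataUrl(String callback, String hash, long timestampMillis) {
        return "https://wwwapi.kugou.com/yy/index.php?r=play/getdata"
                + "&callback=" + callback
                + "&hash=" + hash
                + "&album_id=0"
                + "&_=" + timestampMillis;
    }

    public static void main(String[] args) {
        String page = "var datafromsmarty = [{\"hash\":\"667939c6e784265d541deee65ae4f2f8\",\"album_id\":0}]";
        String hash = extractHash(page);
        System.out.println(dataUrl("cb", hash, System.currentTimeMillis()));
    }
}
```

Building the URL by concatenation, rather than by replacing placeholder words inside a template, avoids accidentally replacing the parameter name `hash` itself.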

3. Crawling Kugou MP3s in Java

First, a look at the crawl results.

[Figure: crawl results]

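With the resources located, the implementation is straightforward. It uses a few utility classes I wrote myself. Keeping a collection of your own utility classes pays off: when a problem comes up again there is no need to rewrite anything, you just reuse them. Nothing more to add, so here is the source code.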

SpiderKugou.java

package com.bing.spider;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import com.bing.download.FileDownload;
import com.bing.html.HtmlManage;
import com.bing.http.HttpGetConnect;

import net.sf.json.JSONObject;

public class SpiderKugou {

    // Directory the MP3 files are saved to
    public static String filePath = "f:/music/";
    // getdata request template; HASH and TIME are placeholders filled in per song.
    // They are uppercase so that replacing them cannot touch the "hash=" parameter name.
    public static String mp3 = "https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jquery191027067069941080546_1546235744250&"
            + "hash=HASH&album_id=0&_=TIME";

    // Rank page template; PAGE is replaced with the page number
    public static String link = "https://www.kugou.com/yy/rank/home/PAGE-8888.html?from=rank";
    //"https://www.kugou.com/yy/rank/home/PAGE-23784.html?from=rank";

    public static void main(String[] args) throws IOException {
        // Pages 1 to 22, as in the original loop. At 22 songs per page this covers
        // 484 songs, so a 23rd page may be needed to reach all 500.
        for (int i = 1; i < 23; i++) {
            String url = link.replace("PAGE", i + "");
            getTitle(url);
            //download("https://www.kugou.com/song/mfy6je5.html");
        }
    }

    // Reads one rank page and downloads every song listed on it
    public static String getTitle(String url) throws IOException {
        String content = HttpGetConnect.connect(url, "utf-8");
        HtmlManage html = new HtmlManage();
        Document doc = html.manage(content);
        Element ele = doc.getElementsByClass("pc_temp_songlist").get(0);
        Elements eles = ele.getElementsByTag("li");
        for (int i = 0; i < eles.size(); i++) {
            Element item = eles.get(i);
            String title = item.attr("title").trim();
            String songLink = item.getElementsByTag("a").first().attr("href");

            download(songLink, title);
        }
        return null;
    }

    // Downloads one song given its play page URL and title; returns the MP3 address
    public static String download(String url, String name) throws IOException {
        String hash = "";
        String content = HttpGetConnect.connect(url, "utf-8");

        // The song hash sits in a JavaScript variable on the play page
        Pattern pattern = Pattern.compile("\"hash\":\"([0-9A-Za-z]+)\"");
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            hash = matcher.group(1);
        }

        String item = mp3.replace("HASH", hash);
        item = item.replace("TIME", System.currentTimeMillis() + "");

        System.out.println(item);
        String mp = HttpGetConnect.connect(item, "utf-8");

        // Strip the JSONP wrapper: everything up to "(" at the front, and the
        // trailing ");" plus the space that connect() appends after each line
        mp = mp.substring(mp.indexOf("(") + 1, mp.length() - 3);

        JSONObject json = JSONObject.fromObject(mp);
        String playUrl = json.getJSONObject("data").getString("play_url");

        System.out.print(playUrl + " == ");
        FileDownload.download(playUrl, filePath + name + ".mp3");

        System.out.println(name + " download complete");
        return playUrl;
    }
}
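One detail the listing leaves open is that the song title is used directly as the file name, and a title containing a character such as `/` or `?` cannot be saved under that name on Windows. A possible guard is sketched below; `FileNames.sanitize` is a hypothetical helper that is not part of the original code, and replacing the offending characters with an underscore is just one reasonable choice.

```java
// Sketch: make a song title safe to use as a file name.
// FileNames.sanitize is a hypothetical helper, not part of the original code.
final class FileNames {
    static String sanitize(String title) {
        // Replace the characters Windows rejects in file names, plus control characters.
        String cleaned = title.replaceAll("[\\\\/:*?\"<>|\\p{Cntrl}]", "_").trim();
        // Fall back to a fixed name if nothing usable is left.
        return cleaned.isEmpty() ? "untitled" : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("AC/DC - Who Made Who?"));
    }
}
```

It would be applied where the path is built, for example `filePath + FileNames.sanitize(name) + ".mp3"`.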

HttpGetConnect.java

 

package com.bing.http;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.BasicHttpClientConnectionManager;

/**
 * @description: HTTP GET helper
 * @author: gaoll
 * @createtime:2014-11-13
 * @modifytime:2014-11-13
 */
public class HttpGetConnect {

    /**
     * Fetches the HTML content of a URL.
     * @param url address to request
     * @param charsetName response charset, e.g. utf-8 or gb2312
     * @return the response body, or an empty string on a non-2xx status or an error
     * @throws IOException if closing the client fails
     */
    public static String connect(String url, String charsetName) throws IOException {
        BasicHttpClientConnectionManager connManager = new BasicHttpClientConnectionManager();

        CloseableHttpClient httpclient = HttpClients.custom()
                .setConnectionManager(connManager)
                .build();
        String content = "";

        try {
            HttpGet httpget = new HttpGet(url);

            RequestConfig requestConfig = RequestConfig.custom()
                    .setSocketTimeout(5000)
                    .setConnectTimeout(50000)
                    .setConnectionRequestTimeout(50000)
                    .build();
            httpget.setConfig(requestConfig);
            // Browser-like headers so the request looks like a normal page visit
            httpget.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
            httpget.setHeader("Accept-Encoding", "gzip,deflate,sdch");
            httpget.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
            httpget.setHeader("Connection", "keep-alive");
            httpget.setHeader("Upgrade-Insecure-Requests", "1");
            httpget.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
            //httpget.setHeader("Hosts", "www.oschina.net");
            httpget.setHeader("Cache-Control", "max-age=0");

            CloseableHttpResponse response = httpclient.execute(httpget);

            int status = response.getStatusLine().getStatusCode();
            if (status >= 200 && status < 300) {
                HttpEntity entity = response.getEntity();
                InputStream instream = entity.getContent();
                BufferedReader br = new BufferedReader(new InputStreamReader(instream, charsetName));
                StringBuilder sbf = new StringBuilder();
                String line = null;
                // Lines are joined with a single space instead of a newline
                while ((line = br.readLine()) != null) {
                    sbf.append(line + " ");
                }

                br.close();
                content = sbf.toString();
            } else {
                content = "";
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            httpclient.close();
        }
        //log.info("content is " + content);
        return content;
    }

    private static Log log = LogFactory.getLog(HttpGetConnect.class);
}
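The read loop in `connect` appends a space after every line, which is presumably why the caller in SpiderKugou trims three characters rather than two from the end of the JSONP response. A variant that keeps the body unchanged and closes the stream automatically might look like the sketch below. `StreamText.readAll` is a hypothetical name, and the sketch works on any `InputStream` using only the standard library, so it is not a drop-in replacement for the HttpClient code above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Sketch: read a whole stream into a String without altering line breaks.
// StreamText.readAll is a hypothetical helper, not part of the original code.
final class StreamText {
    static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader, and the stream it wraps, even on error.
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }
}
```

If a helper like this replaced the loop, the JSONP trimming in the caller would have to change as well, since the trailing space would no longer be there.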

HtmlManage.java

 

package com.bing.html;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

/**
 * @description: thin wrapper around jsoup for parsing HTML
 * @author: gaoll
 * @createtime:2014-11-13
 * @modifytime:2014-11-13
 */
public class HtmlManage {

    // Parses an HTML string into a jsoup Document
    public Document manage(String html) {
        Document doc = Jsoup.parse(html);
        return doc;
    }

    // Fetches a URL with jsoup and parses the result
    public Document manageDirect(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        return doc;
    }

    // Returns the inner HTML of every element with the given tag name
    public List<String> manageHtmlTag(Document doc, String tag) {
        List<String> list = new ArrayList<String>();

        Elements elements = doc.getElementsByTag(tag);
        for (int i = 0; i < elements.size(); i++) {
            String str = elements.get(i).html();
            list.add(str);
        }
        return list;
    }

    // Returns the inner HTML of every element with the given class name
    public List<String> manageHtmlClass(Document doc, String clas) {
        List<String> list = new ArrayList<String>();

        Elements elements = doc.getElementsByClass(clas);
        for (int i = 0; i < elements.size(); i++) {
            String str = elements.get(i).html();
            list.add(str);
        }
        return list;
    }

    // Returns the inner HTML of every element whose attribute key equals value
    public List<String> manageHtmlKey(Document doc, String key, String value) {
        List<String> list = new ArrayList<String>();

        Elements elements = doc.getElementsByAttributeValue(key, value);
        for (int i = 0; i < elements.size(); i++) {
            String str = elements.get(i).html();
            list.add(str);
        }
        return list;
    }

    private static Log log = LogFactory.getLog(HtmlManage.class);
}

FileDownload.java

 

package com.bing.download;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

/**
 * @description: file download helper
 * @author: gaoll
 * @createtime:2014-11-20
 * @modifytime:2014-11-20
 */
public class FileDownload {

    /**
     * Downloads a file.
     * @param url address of the file
     * @param path destination path including the file name
     * @return true if the file was saved, false otherwise
     */
    public static boolean download(String url, String path) {

        boolean flag = false;

        CloseableHttpClient httpclient = HttpClients.createDefault();
        RequestConfig requestConfig = RequestConfig.custom().setSocketTimeout(2000)
                .setConnectTimeout(2000).build();

        HttpGet get = new HttpGet(url);
        get.setConfig(requestConfig);

        BufferedInputStream in = null;
        BufferedOutputStream out = null;
        try {
            // Try up to three times
            for (int i = 0; i < 3; i++) {
                CloseableHttpResponse result = httpclient.execute(get);
                System.out.println(result.getStatusLine());
                if (result.getStatusLine().getStatusCode() == 200) {
                    in = new BufferedInputStream(result.getEntity().getContent());
                    File file = new File(path);
                    out = new BufferedOutputStream(new FileOutputStream(file));
                    byte[] buffer = new byte[1024];
                    int len = -1;
                    while ((len = in.read(buffer, 0, 1024)) > -1) {
                        out.write(buffer, 0, len);
                    }
                    flag = true;
                    break;
                } else if (result.getStatusLine().getStatusCode() == 500) {
                    continue;
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
            flag = false;
        } finally {
            get.releaseConnection();
            try {
                if (in != null) {
                    in.close();
                }
                if (out != null) {
                    out.close();
                }
                // Release the client as well; the original never closed it
                httpclient.close();
            } catch (Exception e) {
                e.printStackTrace();
                flag = false;
            }
        }
        return flag;
    }

    private static Log log = LogFactory.getLog(FileDownload.class);
}
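The copy loop in `download` can also be written with `java.nio.file.Files`, which handles the buffering and leaves the stream cleanup to try-with-resources. This is a sketch under the assumption that the response body is available as a plain `InputStream`; `StreamSaver.save` is a hypothetical name and the retry logic of the original is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: save a stream to a file, creating the target directory if needed.
// StreamSaver.save is a hypothetical helper, not part of the original code.
final class StreamSaver {
    // Writes everything from in to target and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directory already exists
        }
        // The stream is closed when the block exits, whether or not the copy succeeds.
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With HttpClient it might be called as `StreamSaver.save(result.getEntity().getContent(), Paths.get(path))`, which also needs an import of `java.nio.file.Paths`.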

That's it. Some code may not have been pasted in full, but the main parts are here and it should run. Comments and corrections are welcome.

Original link: https://my.oschina.net/gllfeixiang/blog/2995570
