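Today let's walk through implementing a 12306 remaining-ticket query in Python (PyCharm + Python 3.7), as a simple hands-on exercise in writing a Python crawler.
First, open the browser's developer tools (F12), run a remaining-ticket query once, and use the developer tools to inspect the request that the page sends.
Figure: the remaining-ticket query page
The URL in the red box is the request we send to the 12306 server. What exactly does it contain? Let's take a look: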
https://kyfw.12306.cn/otn/leftTicket/queryZ?leftTicketDTO.train_date=2019-01-21&leftTicketDTO.from_station=CDW&leftTicketDTO.to_station=SZQ&purpose_codes=ADULT
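The request carries the following fields:
leftTicketDTO.train_date: the travel date to query
leftTicketDTO.from_station: the departure station
leftTicketDTO.to_station: the destination station
purpose_codes: I am not sure what this field is for, so we leave it at its default value (ADULT)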
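The query string described above can be assembled with the standard library instead of by hand. This is only a minimal sketch: it assumes the endpoint and field names from the captured request are still valid, the mixed-case spelling of the path and field names is an assumption, and build_query_url is a hypothetical helper name that does not come from 12306 or from the original script.

```python
from urllib.parse import urlencode

# Endpoint as seen in the captured request (assumed spelling); 12306 may change it.
BASE_URL = "https://kyfw.12306.cn/otn/leftTicket/queryZ"


def build_query_url(date, from_code, to_code, purpose="ADULT"):
    """Build the remaining-ticket query URL (hypothetical helper)."""
    # A list of pairs keeps the parameters in the same order as the captured request.
    params = [
        ("leftTicketDTO.train_date", date),
        ("leftTicketDTO.from_station", from_code),
        ("leftTicketDTO.to_station", to_code),
        ("purpose_codes", purpose),
    ]
    return BASE_URL + "?" + urlencode(params)


print(build_query_url("2019-01-21", "CDW", "SZQ"))
```

Using urlencode also takes care of escaping if a value ever contains characters that are not safe in a URL.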
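As the submitted URL shows, the city names we typed in, Chengdu and Shenzhen, were replaced by their codes: Chengdu (CDW) and Shenzhen (SZQ). Our program therefore has to perform the same conversion on its input. 12306 keeps a file that maps station names to codes: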
https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.8971
Figure: the station name to code mapping
Now let's write a small program that extracts these station names and codes:
import re
import requests

url = "https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.8971"
response = requests.get(url, verify=False)
# Extract each station's name and its code
chezhan = re.findall(r'([\u4e00-\u9fa5]+)\|([A-Z]+)', response.text)
chezhan_code = dict(chezhan)
# Swap keys and values to get a code -> name dictionary
chezhan_names = dict(zip(chezhan_code.values(), chezhan_code.keys()))
# Print the resulting station dictionary
print(chezhan_names)
The printed result looks like this (only part of it is shown):
{'VAP': '北京北', 'BOP': '北京東', 'BJP': '北京', 'VNP': '北京南', 'BXP': '北京西', 'IZQ': '廣州南', 'CUW': '重慶北', 'CQW': '重慶', 'CRW': '重慶南', 'CXW': '重慶西', 'GGQ': '廣州東', 'SHH': '上海', 'SNH': '上海南', 'AOH': '上海虹橋', 'SXH': '上海西', 'TBP': '天津北', 'TJP': '天津', 'TIP': '天津南', 'TXP': '天津西', 'XJA': '香港西九龍', 'CCT': '長春', 'CET': '長春南', 'CRT': '長春西', 'ICW': '成都東', 'CNW': '成都南', 'CDW': '成都', 'CSQ': '長沙', 'CWQ': '長沙南', ...}
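To see what the regular expression does without touching the network, here is a sketch that runs it on a short hand-written sample. The "@abbr|name|CODE|pinyin|..." layout of the sample is an assumption about how station_name.js is structured, and the codes are assumed to be uppercase; only the two dictionaries mirror the script above.

```python
import re

# Hand-written sample imitating station_name.js; the exact layout is an assumption.
sample = "var station_names ='@bjb|北京北|VAP|beijingbei|bjb|0@cdu|成都|CDW|chengdu|cd|1';"

# Each match is a (Chinese name, code) pair: Chinese characters, a "|", then capital letters.
pairs = re.findall(r'([\u4e00-\u9fa5]+)\|([A-Z]+)', sample)

name_to_code = dict(pairs)                           # used to translate user input
code_to_name = {code: name for name, code in pairs}  # used to display results

print(name_to_code)
print(code_to_name)
```

Building the reverse dictionary with a comprehension over the same pairs gives the same result as swapping keys and values with zip.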
Next, we write the main part of the program:
def main():
    date = input("Enter the date (e.g. 2019-01-22):\n")
    # Station names are entered in Chinese and translated to their codes
    from_station = chezhan_code[input("Enter the departure station:\n")]
    to_station = chezhan_code[input("Enter the destination station:\n")]
    url = "https://kyfw.12306.cn/otn/leftTicket/queryZ?"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.5702.400 QQBrowser/10.2.1893.400"
    }
    url = url + "leftTicketDTO.train_date=" + date + "&leftTicketDTO.from_station=" + from_station + "&leftTicketDTO.to_station=" + to_station + "&purpose_codes=ADULT"
    # print(url)  # the generated URL has been checked and is correct
    # Send the GET request
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # raises an exception if the server returned an error status
    r.encoding = r.apparent_encoding
    showticket(r.text)
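The user enters the date, the departure station and the destination station; the program sends a GET request, and we then parse what comes back. First we print r.text from the code above to see what the request returns, and only then decide how to parse it.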
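As written, main() passes the date through unchecked, and an unknown station name makes the dictionary lookup fail with a KeyError. The sketch below shows one way to validate both inputs before sending the request. check_date and resolve_station are hypothetical helper names, and the two-entry dictionary merely stands in for the full chezhan_code table.

```python
from datetime import datetime

# Stand-in for the full chezhan_code table built from station_name.js.
chezhan_code = {"成都": "CDW", "北京": "BJP"}


def check_date(text):
    """Return the date normalised to YYYY-MM-DD; raise ValueError if it is not a valid date."""
    return datetime.strptime(text.strip(), "%Y-%m-%d").strftime("%Y-%m-%d")


def resolve_station(name, table):
    """Map a station name to its code; raise ValueError with a readable message if unknown."""
    try:
        return table[name.strip()]
    except KeyError:
        raise ValueError("unknown station: " + name) from None


print(check_date("2019-01-22"), resolve_station("成都", chezhan_code))
```

In main() these helpers could wrap the input() calls, so that a typo produces a clear message instead of a traceback.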
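Figure: program output
That is hard to read in the console, so we paste it into Notepad and go through it in detail:
Figure: the data returned by the request
Comparing this with what the 12306 page displays: K829 is the train number, CDW and BJQ are the departure and destination station codes, 10:10 is the departure time, 06:13 is the arrival time, 44:21 is the travel time, 20190123 is the queried date, and the remaining fields carry the ticket information for the various seat types.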
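Instead of counting fields by eye in Notepad, the positions can be listed programmatically. The sketch below uses a synthetic response: only the "data" -> "result" nesting and the "|" separator come from the article, the placeholder fields f0, f1, ... are made up, and a real record has many more fields. indexed_fields is a hypothetical helper name.

```python
import json

# Synthetic response body for illustration only; values such as K829 and 44:21
# are the ones discussed in the text, the f* fields are placeholders.
body = '{"data": {"result": ["f0|f1|f2|K829|f4|f5|CDW|BJQ|10:10|06:13|44:21"]}}'


def indexed_fields(record):
    """Split one record on "|" and pair every field with its position."""
    return list(enumerate(record.split("|")))


records = json.loads(body)["data"]["result"]
for index, field in indexed_fields(records[0]):
    print(index, field)
```

Running this against a real response and comparing it with the web page is a quick way to confirm which index holds which value.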
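The next step is to parse the returned data, which is really the key part of any Python crawler: the parsing!
We first load the response as JSON. Within each record the fields are separated by "|", so we split them apart with split(). Here is the main function: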
# Uses json, PrettyTable, the colored helper class and chezhan_names (see main.py)
def showticket(html):
    html = json.loads(html)
    table = PrettyTable(["Train", "From", "To", "Departs", "Arrives", "Duration",
                         "Business", "First class", "Second class",
                         "Deluxe soft sleeper", "Soft sleeper", "EMU sleeper",
                         "Hard sleeper", "Soft seat", "Hard seat", "No seat",
                         "Other", "Note"])
    # Column keys, in the same order as the table headers above
    name = ["station_train_code", "from_station_name", "to_station_name",
            "start_time", "arrive_time", "lishi",
            "swz_num", "zy_num", "ze_num", "gr_num", "rw_num", "dw_num",
            "yw_num", "rz_num", "yz_num", "wz_num", "qt_num", "note_num"]
    color = colored()
    for i in html['data']['result']:
        item = i.split('|')  # fields are separated by "|"
        data = {
            "station_train_code": item[3],   # train number, position 3
            "from_station_name": item[6],    # departure station code, position 6
            "to_station_name": item[7],      # arrival station code, position 7
            "start_time": item[8],           # departure time, position 8
            "arrive_time": item[9],          # arrival time, position 9
            "lishi": item[10],               # travel time, position 10
            "swz_num": item[32] or item[25], # note: business class is at 32 or 25
            "zy_num": item[31],              # first class, position 31
            "ze_num": item[30],              # second class, position 30
            "gr_num": item[21],              # deluxe soft sleeper, position 21
            "rw_num": item[23],              # soft sleeper, position 23
            "dw_num": item[27],              # EMU sleeper, position 27
            "yw_num": item[28],              # hard sleeper, position 28
            "rz_num": item[24],              # soft seat, position 24
            "yz_num": item[29],              # hard seat, position 29
            "wz_num": item[26],              # no seat, position 26
            "qt_num": item[22],              # other, position 22
            "note_num": color.white(item[1]) # note, position 1
        }
        # Replace missing values with "-"
        for pos in name:
            if data[pos] == "":
                data[pos] = "-"
        # Build one table row, colouring stations, times and the train number
        tmp = []
        for y in name:
            if y == "from_station_name":
                tmp.append(color.green(chezhan_names[data[y]]))
            elif y == "to_station_name":
                tmp.append(color.yeah(chezhan_names[data[y]]))
            elif y == "start_time":
                tmp.append(color.green(data[y]))
            elif y == "arrive_time":
                tmp.append(color.yeah(data[y]))
            elif y == "station_train_code":
                tmp.append(color.yellow(data[y]))
            else:
                tmp.append(data[y])
        table.add_row(tmp)
    print(table)
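And with that, our program works!
Figure: program output
In the IDE's console the PrettyTable cells do not line up. Don't worry: run the script in a terminal instead and the output looks much nicer:
Figure: output when run in a terminal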
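The long run of index lookups in showticket() can also be written in a table-driven way, which keeps every position in one place if 12306 reorders the fields. This is a sketch rather than the article's code: FIELD_INDEX and parse_record are hypothetical names, the positions are copied from showticket(), colouring is left out, and the demo record is synthetic.

```python
# Position of each column in the split record, copied from showticket().
FIELD_INDEX = {
    "station_train_code": 3, "from_station_name": 6, "to_station_name": 7,
    "start_time": 8, "arrive_time": 9, "lishi": 10,
    "zy_num": 31, "ze_num": 30, "gr_num": 21, "rw_num": 23, "dw_num": 27,
    "yw_num": 28, "rz_num": 24, "yz_num": 29, "wz_num": 26, "qt_num": 22,
    "note_num": 1,
}


def parse_record(record):
    """Turn one "|"-separated record into a dict, using "-" for empty fields."""
    item = record.split("|")
    row = {key: item[index] for key, index in FIELD_INDEX.items()}
    # Business class is reported at position 32 or, failing that, 25.
    row["swz_num"] = item[32] or item[25]
    return {key: (value or "-") for key, value in row.items()}


# Synthetic record: 36 empty fields with a few values filled in for the demo.
fields = [""] * 36
fields[3], fields[6], fields[7] = "K829", "CDW", "BJQ"
fields[8], fields[9], fields[10] = "10:10", "06:13", "44:21"
fields[25] = "5"    # made-up count, to exercise the 32-or-25 fallback
fields[28] = "12"   # made-up hard-sleeper count

row = parse_record("|".join(fields))
print(row["station_train_code"], row["swz_num"], row["yw_num"], row["zy_num"])
```

The dictionary returned by parse_record can then be passed to whatever code colours the cells and adds the row to the table.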
Done! The complete code follows.
main.py
# -*- coding: utf-8 -*-
import json

import requests
from prettytable import PrettyTable
from colorama import init, Fore

from stationinfo import chezhan_code, chezhan_names

init(autoreset=False)


class colored(object):
    """Small helper that wraps a string in a colorama foreground colour."""

    def yeah(self, s):
        return Fore.LIGHTCYAN_EX + s + Fore.RESET

    def green(self, s):
        return Fore.LIGHTGREEN_EX + s + Fore.RESET

    def yellow(self, s):
        return Fore.LIGHTYELLOW_EX + s + Fore.RESET

    def white(self, s):
        return Fore.LIGHTWHITE_EX + s + Fore.RESET

    def blue(self, s):
        return Fore.LIGHTBLUE_EX + s + Fore.RESET


def showticket(html):
    html = json.loads(html)
    table = PrettyTable(["Train", "From", "To", "Departs", "Arrives", "Duration",
                         "Business", "First class", "Second class",
                         "Deluxe soft sleeper", "Soft sleeper", "EMU sleeper",
                         "Hard sleeper", "Soft seat", "Hard seat", "No seat",
                         "Other", "Note"])
    # Column keys, in the same order as the table headers above
    name = ["station_train_code", "from_station_name", "to_station_name",
            "start_time", "arrive_time", "lishi",
            "swz_num", "zy_num", "ze_num", "gr_num", "rw_num", "dw_num",
            "yw_num", "rz_num", "yz_num", "wz_num", "qt_num", "note_num"]
    color = colored()
    for i in html['data']['result']:
        item = i.split('|')  # fields are separated by "|"
        data = {
            "station_train_code": item[3],   # train number, position 3
            "from_station_name": item[6],    # departure station code, position 6
            "to_station_name": item[7],      # arrival station code, position 7
            "start_time": item[8],           # departure time, position 8
            "arrive_time": item[9],          # arrival time, position 9
            "lishi": item[10],               # travel time, position 10
            "swz_num": item[32] or item[25], # note: business class is at 32 or 25
            "zy_num": item[31],              # first class, position 31
            "ze_num": item[30],              # second class, position 30
            "gr_num": item[21],              # deluxe soft sleeper, position 21
            "rw_num": item[23],              # soft sleeper, position 23
            "dw_num": item[27],              # EMU sleeper, position 27
            "yw_num": item[28],              # hard sleeper, position 28
            "rz_num": item[24],              # soft seat, position 24
            "yz_num": item[29],              # hard seat, position 29
            "wz_num": item[26],              # no seat, position 26
            "qt_num": item[22],              # other, position 22
            "note_num": color.white(item[1]) # note, position 1
        }
        # Replace missing values with "-"
        for pos in name:
            if data[pos] == "":
                data[pos] = "-"
        # Build one table row, colouring stations, times and the train number
        tmp = []
        for y in name:
            if y == "from_station_name":
                tmp.append(color.green(chezhan_names[data[y]]))
            elif y == "to_station_name":
                tmp.append(color.yeah(chezhan_names[data[y]]))
            elif y == "start_time":
                tmp.append(color.green(data[y]))
            elif y == "arrive_time":
                tmp.append(color.yeah(data[y]))
            elif y == "station_train_code":
                tmp.append(color.yellow(data[y]))
            else:
                tmp.append(data[y])
        table.add_row(tmp)
    print(table)


def main():
    date = input("Enter the date:\n")
    # Station names are entered in Chinese and translated to their codes
    from_station = chezhan_code[input("Enter the departure station:\n")]
    to_station = chezhan_code[input("Enter the destination station:\n")]
    url = "https://kyfw.12306.cn/otn/leftTicket/queryZ?"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.5702.400 QQBrowser/10.2.1893.400"
    }
    url = url + "leftTicketDTO.train_date=" + date + "&leftTicketDTO.from_station=" + from_station + "&leftTicketDTO.to_station=" + to_station + "&purpose_codes=ADULT"
    # print(url)  # the generated URL has been checked and is correct
    # Send the GET request
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # raises an exception if the server returned an error status
    r.encoding = r.apparent_encoding
    showticket(r.text)
    # print(r.text)


if __name__ == "__main__":
    main()
stationinfo.py
import re
import requests

url = "https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.8971"
response = requests.get(url, verify=False)
# Extract each station's name and its code
chezhan = re.findall(r'([\u4e00-\u9fa5]+)\|([A-Z]+)', response.text)
chezhan_code = dict(chezhan)
# Swap keys and values to get a code -> name dictionary
chezhan_names = dict(zip(chezhan_code.values(), chezhan_code.keys()))
# print(chezhan_names)
Original article: https://blog.csdn.net/qq_41841569/article/details/86570150