This article describes a way to print the crawl tree of a Scrapy spider in Python. It is shared here for your reference; the details are as follows:
With the short script below you can see at a glance the structure of the pages Scrapy has crawled, and invoking it is very simple.
#!/usr/bin/env python
import fileinput
import re
from collections import defaultdict


def print_urls(allurls, referer, indent=0):
    # Print every URL fetched from `referer`, indented, then recurse into
    # any URL that itself served as a referer.
    for url in allurls[referer]:
        print(' ' * indent + url)
        if url in allurls:
            print_urls(allurls, url, indent + 2)


def main():
    # Match the "<GET ...> (referer: ...)" fragment of Scrapy's crawl log lines.
    log_re = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')
    allurls = defaultdict(list)
    for line in fileinput.input():
        m = log_re.search(line)
        if m:
            url, ref = m.groups()
            allurls[ref] += [url]
    # Start from the requests that had no referer (the spider's start URLs).
    print_urls(allurls, 'None')


main()
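To see what the script produces, here is a minimal, self-contained sketch that feeds a few made-up log lines through the same parsing logic and prints the resulting tree. The URLs and timestamps below are invented purely for illustration; only the "<GET ...> (referer: ...)" fragment matters to the regular expression.

# Illustrative sketch: the log lines are fabricated examples.
import re
from collections import defaultdict

sample_log = [
    "2016-01-01 10:00:00 [scrapy] DEBUG: Crawled (200) <GET http://example.com/> (referer: None)",
    "2016-01-01 10:00:01 [scrapy] DEBUG: Crawled (200) <GET http://example.com/a> (referer: http://example.com/)",
    "2016-01-01 10:00:02 [scrapy] DEBUG: Crawled (200) <GET http://example.com/b> (referer: http://example.com/)",
    "2016-01-01 10:00:03 [scrapy] DEBUG: Crawled (200) <GET http://example.com/a/1> (referer: http://example.com/a)",
]

log_re = re.compile(r'<GET (.*?)> \(referer: (.*?)\)')
allurls = defaultdict(list)
for line in sample_log:
    m = log_re.search(line)
    if m:
        url, ref = m.groups()
        allurls[ref].append(url)   # map each referer to the URLs fetched from it

def print_urls(allurls, referer, indent=0):
    for url in allurls[referer]:
        print(' ' * indent + url)
        if url in allurls:
            print_urls(allurls, url, indent + 2)

print_urls(allurls, 'None')
# Expected output:
# http://example.com/
#   http://example.com/a
#     http://example.com/a/1
#   http://example.com/b

To run the original script against a real crawl, save it under any name you like (print_crawl_tree.py is used here only as an example) and pass it the saved Scrapy log, e.g. python print_crawl_tree.py scrapy.log, or pipe the log in; fileinput.input() reads from the files named on the command line, or from standard input when none are given. This assumes the crawl log has been written to a file, for example via Scrapy's LOG_FILE setting.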
I hope this article is helpful to readers working on Python programs.