Python Web Scraping Approach

January 10, 2020
Notes

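Python 2 web scraping: collecting data from web pages.

Scraping modules: urllib, urllib2, re, bs4, requests, scrapy, lxml
1. urllib
2. requests
3. bs4
4. Regular expressions (re)

5 data types
(1) Number
(2) String
(3) List []  (Chinese text inside an iterable is a unicode object, so printing the container shows escaped code points rather than the characters)
(4) Tuple ()
(5) Dictionary {}

Scraping approach:
1. Static pages: open the page with urlopen, then get the source with read()
2. requests (module): send a get/post request, then get the source from the text attribute or the content attribute (recommended)
3. bs4: can parse HTML and XML

    # -*- coding: utf-8 -*-
    from bs4 import BeautifulSoup

    # 1. Parse from a string
    # html = "<div>2018.1.8 14:03</div>"
    # soup = BeautifulSoup(html, 'html.parser')  # parse the page
    # print soup.div

    # 2. Read from a file
    html = ''
    soup = BeautifulSoup(open('index.html'), 'html.parser')
    print soup.prettify()

4. Extract the required information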
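Steps 1 and 4 of the approach have no code in the notes, so here is a minimal sketch of them using only the standard library. The names fetch_html, extract_div_texts and sample_html are hypothetical and chosen for illustration. The import fallback is an assumption added so that the same file runs on both Python 2 (urllib2) and Python 3 (urllib.request). The sample markup is the div string from the notes.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import re

try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3


def fetch_html(url, encoding="utf-8"):
    """Step 1: open the page with urlopen and read its source.

    Needs network access; the utf-8 default is an assumption about the page.
    """
    response = urlopen(url)
    try:
        return response.read().decode(encoding)
    finally:
        response.close()


def extract_div_texts(html):
    """Step 4: collect the text of each <div>...</div> with a regular expression."""
    # re.S lets "." match newlines; the non-greedy group stops at the first </div>.
    pattern = re.compile(r"<div[^>]*>(.*?)</div>", re.S)
    return [text.strip() for text in pattern.findall(html)]


# Offline demonstration on the sample markup from the notes.
sample_html = "<div>2018.1.8 14:03</div>"
div_texts = extract_div_texts(sample_html)
print(div_texts)
```

A regular expression like this is fine for flat, predictable markup but does not handle nested div elements; for anything more complex, the bs4 parser from step 3 is likely the safer choice. To run it against a live page, pass the result of fetch_html(url) to extract_div_texts.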
