[Python] requests: AttributeError: 'NoneType' object has no attribute 'xpath' when scraping cnblogs content
- August 10, 2020
- Notes
- Python learning path
This post walks through how to resolve the 'NoneType' object has no attribute 'xpath' exception that can appear when fetching a web page with requests.
Here is the code that fails:
    import requests
    from lxml import etree

    # No headers are passed here, which is what leads to the error.
    response = requests.get('https://blog.csdn.net/it_xf?viewmode=contents')
    etree_html = etree.HTML(response.text)
    # AttributeError is raised on the next line because etree_html is None.
    content = etree_html.xpath('//*[@id="mainBox"]/main/div[2]/div[1]/h4/a/text()')
    for each in content:
        replace = each.replace('\n', '').replace(' ', '')
        if replace == '\n' or replace == '':
            continue
        else:
            print(replace)
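1. Error analysis
response.text comes back as an empty string, so etree.HTML() returns None rather than an element tree, and calling .xpath() on that None raises the AttributeError.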
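The failure mode described above can be made visible with a small guard. This is only a sketch: `extract_titles` and `TITLE_XPATH` are hypothetical names, and the shortened XPath is an assumption used for illustration rather than the expression from the original code. It parses a string offline, so it can be tried without sending any request.

```python
from lxml import etree

# Hypothetical, shortened XPath used only for this illustration.
TITLE_XPATH = '//*[@id="mainBox"]//h4/a/text()'


def extract_titles(html_text, xpath=TITLE_XPATH):
    """Return cleaned title strings, or an empty list if nothing can be parsed."""
    # An empty or whitespace-only body is the symptom described above,
    # so stop here instead of handing it to the parser.
    if not html_text or not html_text.strip():
        return []

    tree = etree.HTML(html_text)
    # Guard against a None tree so .xpath() is never called on None.
    if tree is None:
        return []

    stripped = (text.strip() for text in tree.xpath(xpath))
    return [title for title in stripped if title]


# Made-up sample markup standing in for a real response body.
sample = """
<div id="mainBox">
  <h4><a>
      First post
  </a></h4>
  <h4><a>Second post</a></h4>
</div>
"""

print(extract_titles(sample))
print(extract_titles(""))
```

With a guard like this, a blocked request shows up as an empty result that can be logged and investigated, instead of an AttributeError further down the script.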
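The cause is the site's anti-scraping check: the GET request has to carry headers that make it look like it came from a browser before the server returns the real data.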
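To see why a bare request is easy for a server to single out, it can help to look at what requests sends by default. The sketch below only builds a request locally and never sends it; the assumption that the site keys on the User-Agent header specifically is mine, since the post only says that browser-like headers are needed.

```python
import requests

# Headers that requests attaches when no `headers` argument is given.
default_headers = requests.utils.default_headers()
print(default_headers["User-Agent"])  # starts with "python-requests/"

# Browser-like User-Agent taken from the fixed code in this post.
browser_headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0"
}

# Prepare the request without sending it, to inspect the final headers.
session = requests.Session()
request = requests.Request(
    "GET",
    "https://blog.csdn.net/it_xf?viewmode=contents",
    headers=browser_headers,
)
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
```

Header names are matched case-insensitively, so the lowercase 'user-agent' key replaces the default value instead of being sent alongside it.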
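2. Solution
First, find the headers you need. Where do they come from? Open the page in a browser, open the developer tools (F12), select the page request in the Network tab, and look at its request headers, as marked in the figure below:
Then copy those headers into the code and adapt it.
The modified code looks like this: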
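Copying headers by hand is error-prone when there are many of them, so one option is to paste the raw text from the developer tools and convert it. The helper below is a minimal sketch: `headers_from_devtools` is a hypothetical name, and the pasted sample, apart from the User-Agent used in this post, is made up for illustration.

```python
def headers_from_devtools(raw_text):
    """Turn 'name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw_text.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(":"):
            continue
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not a 'name: value' line
        headers[name.strip().lower()] = value.strip()
    return headers


# Illustrative paste; only the user-agent value comes from this post.
copied = """
:authority: blog.csdn.net
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0
accept-language: zh-CN,zh;q=0.9
"""

headers = headers_from_devtools(copied)
print(headers["user-agent"])
```

The resulting dict can be passed straight to the `headers` argument of `requests.get`. Splitting on the first colon only keeps values that themselves contain colons intact.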
    import requests
    from lxml import etree

    # Browser-like User-Agent copied from the request headers in the browser.
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    response = requests.get('https://blog.csdn.net/it_xf?viewmode=contents', headers=headers)
    etree_html = etree.HTML(response.text)
    content = etree_html.xpath('//*[@id="mainBox"]/main/div[2]/div[1]/h4/a/text()')

    for each in content:
        # Drop newlines and spaces, then skip entries that end up empty.
        title = each.replace('\n', '').replace(' ', '')
        if title:
            print(title)