[Python] Scraping 博客園 (cnblogs) content with requests: AttributeError: 'NoneType' object has no attribute 'xpath'

August 10, 2020
Notes
Python Learning Path

This post explains how to resolve the 'NoneType' object has no attribute 'xpath' exception that can appear when fetching a web page with requests.

Here is the code that fails:

import requests
from lxml import etree

# No headers are sent with this request
response = requests.get('https://blog.csdn.net/it_xf?viewmode=contents')
etree_html = etree.HTML(response.text)
# The AttributeError is raised on the next line when etree_html is None
content = etree_html.xpath('//*[@id="mainBox"]/main/div[2]/div[1]/h4/a/text()')

for each in content:
    # Remove newlines and spaces, then skip entries that end up empty
    replace = each.replace('\n', '').replace(' ', '')
    if replace == '':
        continue
    print(replace)

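1. Error analysis

The fetched response.text is an empty string, so etree.HTML() returns None instead of an element tree, and the following .xpath() call raises the AttributeError on NoneType.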

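The cause is the site's anti-scraping protection: the GET request has to carry headers that imitate a browser request before the server returns the page data.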
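The failure described above shows up far from its cause, so it can help to check the response before handing it to the parser. The following is a minimal sketch, not part of the original post: the helper name ensure_html_body is hypothetical, and treating any status other than 200 as an error is an assumption that may be too strict for some sites.

```python
def ensure_html_body(status_code, text):
    """Return text if it looks like a usable HTML body, otherwise raise ValueError.

    Hypothetical helper: the name and the "status must be 200" rule are
    assumptions made for this sketch.
    """
    if status_code != 200:
        raise ValueError(f"unexpected HTTP status {status_code}")
    if not text or not text.strip():
        raise ValueError("response body is empty; the site may have rejected the request")
    return text


# Intended use with requests and lxml (left as comments so the block runs offline):
#   response = requests.get(url, headers=headers)
#   etree_html = etree.HTML(ensure_html_body(response.status_code, response.text))
```

With this check in place, an empty body is reported as a ValueError at the request step instead of as an AttributeError on NoneType at the xpath step.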

2. Solution

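First, find the headers you need. How do you find them? The original post marks them in a screenshot of the request headers that the browser sends for the page.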

Then copy those headers into the code and adapt it.
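Headers copied from a browser usually arrive as plain "name: value" lines, while requests expects a dict. The sketch below shows one possible way to convert them. The helper name parse_raw_headers is hypothetical, and the accept-language line in the sample is an illustrative assumption; only the user-agent appears in this post.

```python
def parse_raw_headers(raw):
    """Convert copied "name: value" lines into a dict (hypothetical helper)."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
        if not line or line.startswith(":"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        headers[name.strip()] = value.strip()
    return headers


# Sample input; the accept-language value is made up for illustration
raw = """
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
accept-language: zh-CN,zh;q=0.9
"""
headers = parse_raw_headers(raw)
print(sorted(headers))
```

The resulting dict can be passed to requests.get through its headers argument.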

The modified code is as follows:

import requests
from lxml import etree

# Send a browser-like User-Agent so the site returns the real page
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
}
response = requests.get('https://blog.csdn.net/it_xf?viewmode=contents', headers=headers)
etree_html = etree.HTML(response.text)
content = etree_html.xpath('//*[@id="mainBox"]/main/div[2]/div[1]/h4/a/text()')

for each in content:
    # Remove newlines and spaces, then skip entries that end up empty
    replace = each.replace('\n', '').replace(' ', '')
    if replace == '':
        continue
    print(replace)
