Python's urllib library

October 7, 2019
Notes

Article link: https://blog.csdn.net/weixin_36670529/article/details/101290763

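urllib is Python's built-in HTTP request library, and its feature set is fairly complete. It contains the following four modules:

urllib.request: the request module

urllib.error: the exception-handling module

urllib.parse: the URL-parsing module

urllib.robotparser: the robots.txt-parsing module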
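The post lists urllib.robotparser but never uses it. The following is a minimal sketch of the module. The robots.txt rules and the agent name 'MyCrawler' are made up for illustration. RobotFileParser.parse() accepts the rules as a list of lines, so no network access is needed. The parser can then answer whether a given agent may fetch a URL:

```python
import urllib.robotparser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

print(rp.can_fetch('MyCrawler', 'http://example.com/index.html'))   # True
print(rp.can_fetch('MyCrawler', 'http://example.com/private/data'))  # False
print(rp.crawl_delay('MyCrawler'))                                   # 2
```

For a live site you would call rp.set_url('http://example.com/robots.txt') and then rp.read() instead of parse().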

Below are some examples of how to use urllib.

Using urllib.request

```python
import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))
```

read() returns the page's HTML as a byte stream (bytes), so it has to be decoded before it can be printed as text.
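Decoding with a hard-coded 'utf-8', as above, assumes the page is UTF-8. A slightly more robust approach reads the charset declared in the Content-Type response header, and falls back to UTF-8 only when none is given. In the sketch below, fetch_text is a hypothetical helper name, not part of urllib:

```python
import urllib.request

def fetch_text(url, default_charset='utf-8'):
    """Hypothetical helper: fetch url and decode the body as text.

    Uses the charset from the Content-Type response header when present,
    otherwise default_charset. A <meta charset> tag inside the HTML is
    NOT inspected here.
    """
    with urllib.request.urlopen(url) as response:
        body = response.read()  # raw bytes
        charset = response.headers.get_content_charset() or default_charset
    return body.decode(charset)
```

Usage: `print(fetch_text('http://www.baidu.com'))`.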

```python
import urllib.request

response = urllib.request.urlopen('http://www.baidu.com')
print(response.status)               # HTTP status code; same as response.getcode()
print(response.getheaders())         # all response headers as (name, value) pairs: server type, date, content type, connection status, etc.
print(response.getheader('Server'))  # a single header; pass the name of the header you want
print(response.geturl())             # the URL of the response (after any redirects)
print(response.read())               # the body as bytes; decode it with the page's charset to get readable text
```
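response.status is only available for successful responses. For 4xx and 5xx codes, urlopen raises urllib.error.HTTPError (from the exception-handling module) instead of returning a response. The sketch below reads the status code in both cases. get_status is a hypothetical helper name:

```python
import urllib.error
import urllib.request

def get_status(url, timeout=10):
    """Hypothetical helper: return the HTTP status code for url.

    urlopen raises HTTPError for error status codes, so the code is read
    from the exception. Network-level failures (DNS errors, refused
    connections) raise urllib.error.URLError and are NOT caught here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Catching HTTPError before any broader URLError handler matters, because HTTPError is a subclass of URLError.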

Adding request headers to disguise a request as a browser

1. Pass the headers when creating the request

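```python
import urllib.request

# httpbin.org/post only accepts POST; a request without data is sent as GET,
# so the GET endpoint is used here. It echoes back the request headers.
url = 'http://httpbin.org/get'
header = {}
header['User-Agent'] = ('Mozilla/5.0 '
                        '(Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 '
                        '(KHTML, like Gecko) Version/5.1 Safari/534.50')

req = urllib.request.Request(url=url, headers=header)  # headers given at construction time
res = urllib.request.urlopen(req)
print(res.read().decode('utf-8'))
```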

Request is the request class; the headers are passed to it as a constructor argument.
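Request is an ordinary object, so you can inspect what will be sent before calling urlopen. Request normalizes header names with str.capitalize(), so 'User-Agent' is stored and looked up as 'User-agent'. The following is a small sketch that needs no network access:

```python
import urllib.request

UA = ('Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 '
      '(KHTML, like Gecko) Version/5.1 Safari/534.50')

req = urllib.request.Request('http://httpbin.org/get', headers={'User-Agent': UA})

print(req.get_method())              # GET (no data was given)
print(req.get_header('User-agent'))  # the UA string; note the capitalized key
print(req.get_header('User-Agent'))  # None: the original spelling is not the stored key
print(req.header_items())            # [('User-agent', UA)]
```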

2. Add headers dynamically

To add headers dynamically, you must first create an instance of the Request class and then call its add_header() method.

```python
import urllib.request

url = 'http://httpbin.org/get'  # /post would reject this GET request

req = urllib.request.Request(url=url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0')
res = urllib.request.urlopen(req)
print(res.read().decode('utf-8'))
```
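Both approaches set headers on one Request at a time. If every request should carry the same User-Agent, you can instead set it once on an opener through its addheaders list. The original post does not cover this option. urllib.request.install_opener() then makes plain urlopen() use that opener as well. A sketch:

```python
import urllib.request

UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0'

opener = urllib.request.build_opener()
# Replaces the default ('User-agent', 'Python-urllib/3.x') pair. These headers
# are added to every request sent through this opener, unless the Request
# itself already sets a header with the same name.
opener.addheaders = [('User-Agent', UA)]

# Optional: make urllib.request.urlopen() use this opener globally.
urllib.request.install_opener(opener)
```

After this, `urllib.request.urlopen('http://httpbin.org/get')` sends the browser User-Agent without building a Request by hand. Note that install_opener changes global state for the whole program.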

Using a proxy:

ProxyHandler is a class in urllib.request that lets you build requests that go through a proxy.

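Its argument is a dict: each key is a protocol type (such as http or https), and each value is the proxy's IP address and port.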

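```python
import urllib.request

# Keys are the scheme of the target URL; values are the proxy's 'IP:port'.
# These are public addresses from the original 2019 post and may no longer work.
proxy_handler = urllib.request.ProxyHandler({
    'http': '112.35.29.53:8088',
    'https': '165.227.169.12:80',
})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open('http://www.baidu.com')  # sent through the 'http' proxy
print(response.read())
```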
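You can check what ProxyHandler does without a real proxy. When a plain-HTTP request goes through a proxy, urllib connects to the proxy address and sends the full target URL as the request path. The sketch below wraps the pattern above in a helper. build_proxy_opener is a hypothetical name, and the commented address is a placeholder for a proxy you control:

```python
import urllib.request

def build_proxy_opener(proxies):
    """Hypothetical helper: return an opener that routes requests via proxies.

    proxies maps the scheme of the *target* URL ('http', 'https') to a
    proxy address such as '127.0.0.1:8080' or 'http://127.0.0.1:8080'.
    An empty dict disables proxies entirely, including any picked up from
    environment variables such as http_proxy.
    """
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)

# Usage (placeholder address):
# opener = build_proxy_opener({'http': '127.0.0.1:8080'})
# print(opener.open('http://www.baidu.com').read())
```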

Using urllib.parse: encoding form data for a POST request

```python
import urllib.request
import urllib.parse

url = 'http://httpbin.org/post'
header = {}
header['User-Agent'] = ('Mozilla/5.0 '
                        '(Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 '
                        '(KHTML, like Gecko) Version/5.1 Safari/534.50')

data = {}
data['name'] = 'us'
# urlencode() turns the dict into 'name=us'; the request body must be bytes
data = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request(url=url, data=data, headers=header, method='POST')
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))  # httpbin echoes back the submitted form
print(type(data))                       # <class 'bytes'>
```
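urlencode also builds query strings for GET requests, and other functions in urllib.parse take URLs apart again. The following is a small offline sketch in which the parameter names are illustrative only. Two details are worth noting. With doseq=True, a list value becomes repeated keys. A Request that carries data is sent as POST by default, so method='POST' in the code above is optional.

```python
import urllib.parse
import urllib.request

params = {'name': 'us', 'city': 'New York', 'tag': ['a', 'b']}
query = urllib.parse.urlencode(params, doseq=True)
print(query)  # name=us&city=New+York&tag=a&tag=b

url = 'http://httpbin.org/get?' + query
parts = urllib.parse.urlparse(url)
print(parts.netloc, parts.path)  # httpbin.org /get
print(urllib.parse.parse_qs(parts.query))
# {'name': ['us'], 'city': ['New York'], 'tag': ['a', 'b']}

# A Request with a body defaults to POST even without method='POST'.
req = urllib.request.Request('http://httpbin.org/post', data=query.encode('utf-8'))
print(req.get_method())  # POST
```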
