(Python Basics Tutorial, Part 13) Using httplib2 in Python: HTTP GET and POST Examples

May 17, 2020

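Learn how to use the Python httplib2 module. The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.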

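The Python httplib2 module provides methods for accessing web resources over HTTP. It supports many features, such as HTTP and HTTPS, authentication, caching, redirects, and compression.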

$ service nginx status

* nginx is running

We run an nginx web server on localhost. Some of our examples connect to PHP scripts on this locally running nginx server.

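Contents
Checking the httplib2 library version
Reading a web page with httplib2
Sending an HTTP HEAD request
Sending an HTTP GET request
Sending an HTTP POST request
Sending user agent information
Adding a username/password to a request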

Checking the httplib2 library version

The first program prints the version of the library, its copyright, and its documentation string.

#!/usr/bin/python3

import httplib2

print(httplib2.__version__)

print(httplib2.__copyright__)

print(httplib2.__doc__)

httplib2.__version__ gives the version of the httplib2 library, httplib2.__copyright__ gives its copyright, and httplib2.__doc__ gives its documentation string.

$ ./version.py

0.8

Copyright 2006, Joe Gregorio

httplib2

A caching http interface that supports ETags and gzip

to conserve bandwidth.

Requires Python 3.0 or later

Changelog:

2009-05-28, Pilgrim: ported to Python 3

2007-08-18, Rick: Modified so it's able to use a socks proxy if needed.

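This is a sample output of the example.

Reading a web page with httplib2

In the following example, we show how to fetch HTML content from a website called www.something.com.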

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

content = http.request("http://www.something.com")[1]

print(content.decode())

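An HTTP client is created with httplib2.Http(). A new HTTP request is created with the request() method; by default, it is a GET request. The return value is a tuple of the response and the content.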

$ ./get_content.py

<html><head><title>Something.</title></head>

<body>Something.</body>

</html>

This is the output of the example.
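The call content.decode() in the script assumes that the body is UTF-8, which is the default for bytes.decode(). As a minimal sketch, the charset can instead be read from the Content-Type response header. The helper name charset_of and the sample header values are assumptions for illustration; in a real script the header value would come from the response object, for example resp['content-type'].

```python
from email.message import Message


def charset_of(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or a default."""
    msg = Message()
    msg["content-type"] = content_type
    return msg.get_content_charset() or default


# Hypothetical header values standing in for resp['content-type'].
print(charset_of("text/html; charset=ISO-8859-1"))
print(charset_of("text/html"))

# Decode a body with the detected charset instead of assuming UTF-8.
body = b"caf\xe9"
text = body.decode(charset_of("text/html; charset=ISO-8859-1"))
```

When the header names no charset, the sketch falls back to UTF-8, which matches what the original script does implicitly.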

Stripping HTML tags

The following program fetches a small web page and strips its HTML tags.

#!/usr/bin/python3

import httplib2

import re

http = httplib2.Http()

content = http.request("http://www.something.com")[1]

stripped = re.sub('<[^<]+?>', '', content.decode())

print(stripped)

A simple regular expression is used to strip the HTML tags. Note that we are stripping data, not sanitizing it. (These are two different things.)

$ ./strip_tags.py

Something.

Something.

The script prints the title and the content of the web page.
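The regular expression is enough for this tiny page, but it can be brittle on more complex markup. As an alternative sketch, the html.parser module from the standard library can collect only the text nodes. The class name TextExtractor, the helper extract_text, and the sample markup are illustrative assumptions, not part of the original script. Like the regex version, this strips markup and does not sanitize it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for each run of text outside of tags.
        self.parts.append(data)


def extract_text(markup):
    """Return the text content of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample markup; a decoded response body could be passed instead.
print(extract_text("<p>Hello <b>world</b>!</p>"))
```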

Checking the response status

The response object has a status attribute, which provides the status code of the response.

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

resp = http.request("http://www.something.com")[0]

print(resp.status)

resp = http.request("http://www.something.com/news/")[0]

print(resp.status)

We perform two HTTP requests with the request() method and check the returned status.

$ ./get_status.py

200

404

200 is the standard response for a successful HTTP request, and 404 indicates that the requested resource could not be found.
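To make status codes easier to read in script output, they can be mapped to their standard reason phrases with the http.HTTPStatus enum from the standard library. The helper name describe is hypothetical; in practice it would be given resp.status.

```python
from http import HTTPStatus


def describe(code):
    """Return a status code together with its standard reason phrase."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Raised for codes that HTTPStatus does not define.
        return f"{code} (unknown status)"


for code in (200, 404):
    print(describe(code))
```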

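Sending an HTTP HEAD request

The HTTP HEAD method retrieves the headers of a document without its body. The headers consist of fields, including the date, server, content type, and last modification time.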

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

resp = http.request("http://www.something.com", "HEAD")[0]

print("Server: " + resp['server'])

print("Last modified: " + resp['last-modified'])

print("Content type: " + resp['content-type'])

print("Content length: " + resp['content-length'])

The example prints the server, last modification time, content type, and content length of the www.something.com web page.

$ ./do_head.py

Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141

Last modified: Mon, 25 Oct 1999 15:36:02 GMT

Content type: text/html

Content length: 72

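This is the output of the program. From the output, we can see that the web page is delivered by an Apache web server hosted on FreeBSD. The document was last modified in 1999. The web page is an HTML document whose length is 72 bytes.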
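Header values arrive as strings, so it can help to convert them before use. The sketch below uses a plain dictionary, filled with values from the output above (the server string is shortened), as a stand-in for a live response; httplib2 exposes response headers through a dictionary-like object with lowercase keys, which is why the script indexes resp['server']. Using .get() avoids a KeyError when a server omits a header such as last-modified.

```python
from email.utils import parsedate_to_datetime

# Sample headers taken from the output above; a stand-in for the response object.
headers = {
    "server": "Apache/2.4.12 (FreeBSD)",
    "last-modified": "Mon, 25 Oct 1999 15:36:02 GMT",
    "content-type": "text/html",
    "content-length": "72",
}

# Convert the HTTP date string into a timezone-aware datetime, if present.
last_modified = headers.get("last-modified")
modified_at = parsedate_to_datetime(last_modified) if last_modified else None

# Content-Length is a decimal string; fall back to 0 if the header is absent.
length = int(headers.get("content-length", 0))

print(modified_at.isoformat())
print(length)
```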

Sending an HTTP GET request

The HTTP GET method requests a representation of the specified resource. For this example, we also use the greet.php script:

<?php

echo "Hello " . htmlspecialchars($_GET['name']);

?>

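Inside the /usr/share/nginx/html/ directory, we have this greet.php file. The script returns the value of the name variable, which is retrieved from the client.

The htmlspecialchars() function converts special characters to HTML entities; for example, & becomes &amp;.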
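For readers more at home in Python than PHP, html.escape() from the standard library performs a comparable conversion. The sketch below is only an analogy for what greet.php does with the name parameter; the function greet and the sample inputs are assumptions.

```python
import html


def greet(name):
    """Mimic greet.php: escape HTML special characters in the name."""
    return "Hello " + html.escape(name)


print(greet("Peter"))
print(greet("<b>Tom & Jerry</b>"))
```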

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

content = http.request("http://localhost/greet.php?name=Peter",

    method="GET")[1]

print(content.decode())

The script sends a variable with a value to the PHP script on the server. The variable is specified directly in the URL.

$ ./mget.py

Hello Peter

This is the output of the example.
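Writing the query string by hand works for a simple value such as Peter, but values containing spaces or characters like & need to be percent-encoded. A minimal sketch that builds the URL with urllib.parse.urlencode(); the helper build_url and the second sample value are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL."""
    return base + "?" + urlencode(params)


print(build_url("http://localhost/greet.php", {"name": "Peter"}))
print(build_url("http://localhost/greet.php", {"name": "Tom & Jerry"}))
```

The resulting string can be passed as the first argument to the request() method.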

$ tail -1 /var/log/nginx/access.log

127.0.0.1 - - [21/Aug/2016:17:32:31 +0200] "GET /greet.php?name=Peter HTTP/1.1" 200 42 "-"

"Python-httplib2/0.8 (gzip)"

We check the nginx access log.
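The fields of such a log entry can be pulled apart with a regular expression. The sketch below assumes the entry follows the layout of the line shown above (client address, timestamp, request line, status, size, referer, user agent); the pattern name LOG_RE and the group names are assumptions, and a server configured with a different log format would need a different pattern.

```python
import re

# Pattern for an access log entry laid out like the one shown above.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = (
    '127.0.0.1 - - [21/Aug/2016:17:32:31 +0200] '
    '"GET /greet.php?name=Peter HTTP/1.1" 200 42 "-" '
    '"Python-httplib2/0.8 (gzip)"'
)

match = LOG_RE.match(line)
if match:
    print(match.group("method"), match.group("path"))
    print(match.group("status"))
    print(match.group("agent"))
```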

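Sending an HTTP POST request

The POST request method asks the web server to accept and store the data enclosed in the body of the request message. It is often used when uploading a file or submitting a completed web form.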

<?php

echo "Hello " . htmlspecialchars($_POST['name']);

?>

On our local web server, we have this target.php file. It simply prints the posted value back to the client.

#!/usr/bin/python3

import httplib2

import urllib.parse

http = httplib2.Http()

body = {'name': 'Peter'}

content = http.request("http://localhost/target.php",

    method="POST",

    headers={'Content-type': 'application/x-www-form-urlencoded'},

    body=urllib.parse.urlencode(body))[1]

print(content.decode())

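The script sends a request with a name key that has the value Peter. The data is encoded with the urllib.parse.urlencode() function and sent in the body of the request.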

$ ./mpost.py

Hello Peter

This is the output of the mpost.py script.

$ tail -1 /var/log/nginx/access.log

127.0.0.1 - - [23/Aug/2016:12:21:07 +0200] "POST /target.php HTTP/1.1"

200 37 "-" "Python-httplib2/0.8 (gzip)"

With the POST method, the value is not sent in the request URL.
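The encoding step can be examined without a server. The sketch below shows what urlencode() produces for a request body and how a form-encoded body is decoded again with urllib.parse.parse_qs(), which is roughly what happens on the server side before PHP fills $_POST. The second field, city, is a made-up example.

```python
from urllib.parse import parse_qs, urlencode

form = {"name": "Peter", "city": "New York"}

# Encode the form fields as application/x-www-form-urlencoded data.
body = urlencode(form)
print(body)

# Decode the body again; each key maps to a list of values.
decoded = parse_qs(body)
print(decoded)
```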

Sending user agent information

In this section, we specify the name of the user agent.

<?php

echo $_SERVER['HTTP_USER_AGENT'];

?>

In the nginx document root, we have the agent.php file. It returns the name of the user agent.

#!/usr/bin/python3

import httplib2

http = httplib2.Http()

content = http.request("http://localhost/agent.php", method="GET",

    headers={'user-agent': 'Python script'})[1]

print(content.decode())

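The script creates a simple GET request to the agent.php script. In the headers dictionary, we specify the user agent. The PHP script reads it and returns it to the client.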

$ ./user_agent.py

Python script

The server responded with the name of the agent that we sent with the request.

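Adding a username/password to a request

The client's add_credentials() method sets the name and password to be used for a realm. A security realm is a mechanism used to protect web application resources.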

$ sudo apt-get install apache2-utils

$ sudo htpasswd -c /etc/nginx/.htpasswd user7

New password:

Re-type new password:

Adding password for user user7

We use the htpasswd tool to create a username and password for basic HTTP authentication.

location /secure {

auth_basic "Restricted Area";

auth_basic_user_file /etc/nginx/.htpasswd;

}

In the nginx /etc/nginx/sites-available/default configuration file, we create a secure page. The name of the realm is "Restricted Area".

<!DOCTYPE html>

<html lang="en">

<head>

<title>Secure page</title>

</head>

<body>

<p>

This is a secure page.

</p>

</body>

</html>

In the /usr/share/nginx/html/secure directory, we have the above HTML file.

#!/usr/bin/python3

import httplib2

user = 'user7'

passwd = '7user'

http = httplib2.Http()

http.add_credentials(user, passwd)

content = http.request("http://localhost/secure/")[1]

print(content.decode())

The script connects to the secure web page; it provides the username and password needed to access the page.

$ ./credentials.py

<!DOCTYPE html>

<html lang="en">

<head>

<title>Secure page</title>

</head>

<body>

<p>

This is a secure page.

</p>

</body>

</html>

With the right credentials, the script returns the secured page.
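Under basic HTTP authentication, the credentials travel in an Authorization header as the Base64 encoding of username:password. The sketch below builds that header value by hand to show what is sent on the wire; the helper basic_auth_header is hypothetical, and in the script above add_credentials() takes care of this. Base64 is an encoding rather than encryption, so basic authentication is best combined with HTTPS.

```python
import base64


def basic_auth_header(user, passwd):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{passwd}".encode("utf-8")).decode("ascii")
    return "Basic " + token


header = basic_auth_header("user7", "7user")
print(header)

# Decoding the token recovers the credentials, showing it is not encrypted.
recovered = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(recovered)
```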

In this tutorial, we have explored the Python httplib2 module.
