Reading the Requests Source Code: v0.8.0

November 30, 2019
Notes

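I have been working for two years, mostly writing APIs and the like in Python, with some test-automation frameworks on the side, yet I keep feeling that my own skills are improving slowly. I decided to start this series after coming across @wangshunping's read requests, which is lively and fun but unfortunately stopped being updated after 0.8.0. Once I had worked up a bit of motivation to read source code myself, I wanted to pick up where it left off. It is a long road: 4409 commits, more than 1000 PRs, and a fine example to live up to. I am not at all sure I can chew through this tough bone, so I will just write as much as I can. I am still a beginner in Python, so mistakes are inevitable; please point them out so we can discuss them. Let's get started!

Goal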

0.8.0 (2011-11-13)
++++++++++++++++++

* Keep-alive support!
* Complete removal of Urllib2
* Complete removal of Poster
* Complete removal of CookieJars
* New ConnectionError raising
* Safe_mode for error catching
* prefetch parameter for request methods
* OPTION method
* Async pool size throttling
* File uploads send real names

Source Code Reading

v0.7.1

0.7.1 (2011-10-23)
++++++++++++++++++

* Move away from urllib2 authentication handling.
* Fully Remove AuthManager, AuthObject, &c.
* New tuple-based auth system with handler callbacks.

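Move away from urllib2's authentication handling
Fully remove AuthManager, AuthObject, and... &c? (&c is an old-fashioned abbreviation for "etc.")
A new tuple-based auth mechanism with handler callbacks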

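1. Moving away from `urllib2`'s authentication handling

A new auth.py file adds the library's own auth handlers, http_basic and http_digest, which cover the cases where the Authorization header starts with Basic and Digest respectively.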
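To make the Basic case concrete, here is a minimal sketch of what such a handler boils down to. It is written in modern Python 3 rather than the Python 2 of the original code, and `basic_auth_header` is a hypothetical helper for illustration, not the function in auth.py.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header.

    Hypothetical helper for illustration; not the code in auth.py.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# The header a Basic handler would attach to the outgoing request.
headers = {"Authorization": basic_auth_header("user", "pass")}
```

A Digest handler is more involved, since it has to answer a challenge sent by the server first, so it is not sketched here.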

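2. Fully removing `AuthManager`, `AuthObject`, and... &c?

Since the interface now goes through session, there is no longer any need for AuthManager to store authentication information. With the library's own handlers in place, the related code in models.py is removed entirely.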

3. New tuple-based `auth` mechanism with handler callbacks

Now:

Python

self.auth = auth_dispatch(auth)

if self.auth:
    auth_func, auth_args = self.auth
    r = auth_func(self, *auth_args)
    self.__dict__.update(r.__dict__)

Python

def dispatch(t):
    """Given an auth tuple, return an expanded version."""

    if not t:
        return t
    else:
        t = list(t)

    # Make sure they're passing in something.
    assert len(t) >= 2

    # If only two items are passed in, assume HTTPBasic.
    if (len(t) == 2):
        t.insert(0, 'basic')

    # Allow built-in string referenced auths.
    if isinstance(t[0], basestring):
        if t[0] in ('basic', 'forced_basic'):
            t[0] = http_basic
        elif t[0] in ('digest',):
            t[0] = http_digest

    # Return a custom callable.
    return (t[0], tuple(t[1:]))

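With the dispatch function, a two-item tuple gets 'basic' prepended by default and is handled by http_basic; otherwise the handler type must be given explicitly. Custom handlers are supported as well: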

Python

def pizza_auth(r, username):
    """Attaches HTTP Pizza Authentication to the given Request object.
    """
    r.headers['X-Pizza'] = username

    return r

Then, we can make a request using our Pizza Auth::

>>> requests.get('http://pizzabin.org/admin', auth=(pizza_auth, 'kenneth'))
<Response [200]>
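To see which handler each tuple shape resolves to, here is a small Python 3 port of the dispatch logic quoted above. The two built-in handlers are stubs that only record the scheme, the request is a plain dict, and `topping_auth` is a hypothetical custom handler. Note that, following the quoted logic literally, any two-item tuple is treated as Basic, so the custom handler in this sketch takes two arguments.

```python
def http_basic(request, username, password):
    """Stub for the real handler; only records which scheme was chosen."""
    request["auth_scheme"] = "basic"
    return request


def http_digest(request, username, password):
    """Stub for the real handler; only records which scheme was chosen."""
    request["auth_scheme"] = "digest"
    return request


def dispatch(auth):
    """Expand an auth tuple into (handler, args), mirroring the quoted logic."""
    if not auth:
        return auth
    parts = list(auth)
    if len(parts) < 2:
        raise ValueError("auth needs at least two items")
    # Exactly two items: assume (username, password) for Basic.
    if len(parts) == 2:
        parts.insert(0, "basic")
    # Built-in handlers can be referenced by name.
    if isinstance(parts[0], str):
        if parts[0] in ("basic", "forced_basic"):
            parts[0] = http_basic
        elif parts[0] in ("digest",):
            parts[0] = http_digest
    # Anything else is assumed to be a custom callable.
    return parts[0], tuple(parts[1:])


def topping_auth(request, username, topping):
    """Hypothetical custom handler taking two arguments."""
    request["X-Pizza"] = f"{username}:{topping}"
    return request


handler, args = dispatch((topping_auth, "kenneth", "pepperoni"))
request = handler({}, *args)
```

Calling the returned handler with the request and the unpacked arguments is the same pattern as the `auth_func(self, *auth_args)` line shown earlier.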

v0.7.2

0.7.2 (2011-10-23)
++++++++++++++++++

* PATCH Fix.

Bug fix (skipped).

v0.7.3

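0.7.3 (2011-10-23)
++++++++++++++++++

* Digest Auth fix.

A fix for a Digest Auth bug. It mainly removes some debug print statements; the author was probably not thinking too clearly by then. I also noticed that he changed the length of the "~" line in a file header, which is about as trivial as it gets! 0.7.1 through 0.7.3 were all finished within a little over an hour, so he certainly had plenty of drive!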

v0.7.4

0.7.4 (2011-10-26)
++++++++++++++++++

* Sesion Hooks fix.

Mostly code cleanup and small bug fixes. The session gained a keep_alive parameter that is not used yet; presumably it is groundwork for later.

v0.7.5

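0.7.5 (2001-11-04)
++++++++++++++++++

* Response.content = None if there was an invalid repsonse.
* Redirection auth handling.

Huh? The date jumped back 10 years? Haha, I wonder when that will get fixed.

content = None for an invalid response
Redirection auth handling

1. `content = None` for invalid responses

An error-handling branch was added: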

Python

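try:
    self._content = self.raw.read()
except AttributeError:
    return None

2. Redirection auth handling

This was a bug: building the new Request from the already-dispatched auth caused an error. Now self._auth keeps the original auth, and that is what gets passed to the new Request object.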

v0.7.6

0.7.6 (2011-11-07)
++++++++++++++++++

* Digest authentication bugfix (attach query data to path)

Digest authentication bug fix (append the query to the path)

Before:

Python

path = urlparse(r.request.url).path

Now:

Python

p_parsed = urlparse(r.request.url)
path = p_parsed.path + p_parsed.query
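To see what the two versions actually produce, here is a small Python 3 sketch (where `urlparse` lives in `urllib.parse`). The URL is a made-up example, and `with_separator` is my own addition for comparison, not something taken from the Requests source.

```python
from urllib.parse import urlparse

# Hypothetical URL, chosen only because it has both a path and a query string.
url = "http://httpbin.org/digest-auth/auth/user/pass?foo=bar"

parsed = urlparse(url)

# Before 0.7.6: the query string is dropped entirely.
old_path = parsed.path

# The 0.7.6 snippet: path and query concatenated as quoted.
new_path = parsed.path + parsed.query

# For comparison only (my addition): urlparse strips the "?" from
# `query`, so keeping the separator has to be done by hand.
with_separator = parsed.path + "?" + parsed.query
```

Since `query` does not include the leading "?", the concatenation as quoted joins the two parts without any separator between them.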

I noticed that the date problem has been fixed:

Updated your 2001, to 2011… unless you went back in time 😉

Nice sense of humor.

v0.8.0

0.8.0 (2011-11-13)
++++++++++++++++++

* Keep-alive support!
* Complete removal of Urllib2
* Complete removal of Poster
* Complete removal of CookieJars
* New ConnectionError raising
* Safe_mode for error catching
* prefetch parameter for request methods
* OPTION method
* Async pool size throttling
* File uploads send real names

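Support for the keep_alive parameter (time to make use of the groundwork laid in 0.7.4)
Complete removal of urllib2
Complete removal of Poster
Complete removal of CookieJars
New ConnectionError raising
A safe mode for catching errors
A prefetch parameter for the request methods
A new OPTIONS method
Throttling of the async pool size
File uploads send real filenames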

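1. Support for the `keep_alive` parameter

In v0.8.0 the author switched over to urllib3 completely. It is a third-party package, and its biggest improvement over urllib2 is that it can reuse HTTP connections instead of opening a new one for every request. This greatly speeds things up when sending a large number of requests.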

Python

self.poolmanager = PoolManager(
    num_pools=self.config.get('pool_connections'),
    maxsize=self.config.get('pool_maxsize')
)

Python

proxy = self.proxies.get(_p.scheme)

if proxy:
    conn = poolmanager.proxy_from_url(url)
else:
    # Check to see if keep_alive is allowed.
    if self.config.get('keep_alive'):
        conn = self._poolmanager.connection_from_url(url)
    else:
        conn = connectionpool.connection_from_url(url)

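keep_alive is on by default. urllib3 maintains a connection pool; when a request is made to a URL, the connection for that URL's host is taken from the pool, and the request is sent by calling a method on that connection directly.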
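The reuse idea can be illustrated without urllib3 at all. The following is a toy sketch using only the standard library: `ToyPoolManager` and `ToyConnection` are hypothetical names, no sockets are opened, and real pooling in urllib3 is considerably more involved.

```python
from urllib.parse import urlparse


class ToyConnection:
    """Stands in for a real keep-alive connection to one host."""

    def __init__(self, scheme, host, port):
        self.scheme, self.host, self.port = scheme, host, port
        self.requests_sent = 0

    def urlopen(self, method, path):
        # A real connection would write to an already-open socket here.
        self.requests_sent += 1
        return f"{method} {path}"


class ToyPoolManager:
    """Hands out one reusable connection per (scheme, host, port)."""

    def __init__(self):
        self._pools = {}

    def connection_from_url(self, url):
        parts = urlparse(url)
        port = parts.port or (443 if parts.scheme == "https" else 80)
        key = (parts.scheme, parts.hostname, port)
        # Create the connection on first use, then keep handing it out.
        if key not in self._pools:
            self._pools[key] = ToyConnection(*key)
        return self._pools[key]


manager = ToyPoolManager()
first = manager.connection_from_url("http://httpbin.org/get")
second = manager.connection_from_url("http://httpbin.org/headers")
other = manager.connection_from_url("https://httpbin.org/get")

first.urlopen("GET", "/get")
second.urlopen("GET", "/headers")
```

The two http URLs share a single connection object, so both requests are counted on it, while the https URL gets a connection of its own.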

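2. Complete removal of `urllib2`

The build_opener function that models.py used for sending requests was deleted, and urllib3's conn.urlopen method is used instead.

3. Complete removal of `Poster`

Same as above: one third-party library swapped for another.

4. Complete removal of `CookieJars`

Here is the test: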

Python

def test_session_persistent_cookies(self):

    s = requests.session()

    # Internally dispatched cookies are sent.
    _c = {'kenneth': 'reitz', 'bessie': 'monke'}
    r = s.get(httpbin('cookies'), cookies=_c)
    r = s.get(httpbin('cookies'))

    # Those cookies persist transparently.
    c = json.loads(r.content).get('cookies')
    assert c == _c

    # Double check.
    r = s.get(httpbin('cookies'), cookies={})
    c = json.loads(r.content).get('cookies')
    assert c == _c

    # Remove a cookie by setting it's value to None.
    r = s.get(httpbin('cookies'), cookies={'bessie': None})
    c = json.loads(r.content).get('cookies')
    del _c['bessie']
    assert c == _c

    # Test session-level cookies.
    s = requests.session(cookies=_c)
    r = s.get(httpbin('cookies'))
    c = json.loads(r.content).get('cookies')
    assert c == _c

    # Have the server set a cookie.
    r = s.get(httpbin('cookies', 'set', 'k', 'v'), allow_redirects=True)
    c = json.loads(r.content).get('cookies')

    assert 'k' in c

    # And server-set cookie persistience.
    r = s.get(httpbin('cookies'))
    c = json.loads(r.content).get('cookies')

    assert 'k' in c

Handling cookies from the response:

Python

if 'set-cookie' in response.headers:
    cookie_header = response.headers['set-cookie']

    c = SimpleCookie()
    c.load(cookie_header)

    for k,v in c.items():
        cookies.update({k: v.value})

# Save cookies in Response.
response.cookies = cookies
cookies = self.cookies
self.cookies.update(r.cookies)

When sending a request:

Python

if self.cookies:

    # Skip if 'cookie' header is explicitly set.
    if 'cookie' not in self.headers:

        # Simple cookie with our dict.
        c = SimpleCookie()
        for (k, v) in self.cookies.items():
            c[k] = v

        # Turn it into a header.
        cookie_header = c.output(header='').strip()

        # Attach Cookie header to request.
        self.headers['Cookie'] = cookie_header

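SimpleCookie from the standard library is used to parse and generate cookies, while cookies are always read as plain dicts. All of this really serves the new urllib3 interface: once the assorted Handlers were replaced by conn.urlopen, everything built around them had to change accordingly.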
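The round trip between headers and dicts can be tried in isolation. This is a minimal Python 3 sketch, where SimpleCookie lives in `http.cookies`; the two helper functions are hypothetical names of my own, and joining the pairs with "; " is my choice for the sketch rather than what the quoted code does.

```python
from http.cookies import SimpleCookie


def cookies_from_header(set_cookie_value):
    """Parse a Set-Cookie header value into a plain {name: value} dict."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return {name: morsel.value for name, morsel in jar.items()}


def cookie_header_from_dict(cookies):
    """Serialize a {name: value} dict into a Cookie request header value."""
    jar = SimpleCookie()
    for name, value in cookies.items():
        jar[name] = value
    # Each morsel renders as "name=value"; join them with "; ".
    return "; ".join(morsel.OutputString() for morsel in jar.values())


received = cookies_from_header("k=v; Path=/")
outgoing = cookie_header_from_dict({"kenneth": "reitz", "bessie": "monke"})
```

Attributes such as Path belong to the cookie rather than being cookies themselves, which is why only `k` ends up in the parsed dict.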

5. New `ConnectionError`

6. Safe mode

Let's go straight to the code:

Python

except MaxRetryError, e:
    if not self.config.get('safe_mode', False):
        raise ConnectionError(e)
    else:
        r = None

except (_SSLError, _HTTPError), e:
    if not self.config.get('safe_mode', False):
        raise Timeout('Request timed out.')

Safe mode simply means that exceptions are not raised.
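The pattern is easy to reproduce on its own. Below is a minimal Python 3 sketch of the same idea: `send`, `RequestFailed`, and `unreachable` are hypothetical names, and OSError stands in for the urllib3 exceptions caught in the real code.

```python
class RequestFailed(Exception):
    """Hypothetical stand-in for the library's ConnectionError."""


def send(fetch, safe_mode=False):
    """Call `fetch`; in safe mode, swallow failures and return None."""
    try:
        return fetch()
    except OSError as exc:
        if not safe_mode:
            # Normal mode: translate the low-level error and re-raise.
            raise RequestFailed(str(exc)) from exc
        # Safe mode: the caller gets None instead of an exception.
        return None


def unreachable():
    """Simulates a request that cannot connect."""
    raise OSError("connection refused")


quiet_result = send(unreachable, safe_mode=True)

try:
    send(unreachable)
except RequestFailed as exc:
    loud_result = exc
```

The trade-off is that a caller in safe mode has to check for None, since a failed request no longer announces itself.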

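7. New `prefetch` parameter

This is also a parameter backed by urllib3. When it is True, the response body is read as soon as the request is sent; otherwise it is read when content is accessed, as before. I do not fully understand what it is for yet, because I found that with prefetch=True, reading content fails and the response body cannot be retrieved. This looks like a bug, so I will leave it here for now.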
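The difference between eager and lazy reading can be shown with a toy class. This sketch is my own illustration in Python 3, not the Requests implementation: `ToyResponse` is a hypothetical name, and an in-memory `io.BytesIO` plays the role of the raw network stream.

```python
import io


class ToyResponse:
    """Minimal response whose body is read eagerly or on first access."""

    def __init__(self, raw, prefetch=False):
        self.raw = raw
        self._content = None
        if prefetch:
            # Eager: consume the body right away.
            self._content = self.raw.read()

    @property
    def content(self):
        if self._content is None:
            # Lazy: consume the body on first access, then cache it.
            self._content = self.raw.read()
        return self._content


lazy = ToyResponse(io.BytesIO(b"hello"))
eager = ToyResponse(io.BytesIO(b"hello"), prefetch=True)

# How far each stream has been read before anyone touches `content`.
lazy_position = lazy.raw.tell()
eager_position = eager.raw.tell()
```

Both objects return the same bytes from `content`; what differs is when the underlying stream gets consumed.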

8. `OPTIONS` request method

OPTIONS is an HTTP request method that returns all the methods the current URL supports.

9. Throttling the async pool size

Before:

Python

jobs = [gevent.spawn(send, r) for r in requests]
gevent.joinall(jobs)

Now:

Python

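if size:
    pool = Pool(size)
    pool.map(send, requests)
    pool.join()
else:
    jobs = [gevent.spawn(send, r) for r in requests]
    gevent.joinall(jobs)

Roughly speaking, a size parameter is passed in and all asynchronous requests are handled inside a pool of that limited size. Yes, another pool; pools really are a handy thing.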
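The effect of the size limit can be observed with the standard library alone. This sketch swaps gevent for a thread pool (`concurrent.futures.ThreadPoolExecutor`), so it is an analogy rather than the same mechanism; `send` here is a hypothetical stand-in that sleeps instead of making a request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0  # requests currently in flight
peak = 0    # highest number seen in flight at once
lock = threading.Lock()


def send(request_id):
    """Pretend to send a request while tracking how many run at once."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # stand-in for network latency
    with lock:
        active -= 1
    return request_id


def map_throttled(requests, size):
    """Run `send` over all requests with at most `size` in flight."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        return list(pool.map(send, requests))


results = map_throttled(range(10), size=3)
```

All ten jobs complete and come back in order, but `peak` never exceeds the pool size, which is the whole point of throttling.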

10. Sending real filenames with file uploads

Look at the code:

Python

def guess_filename(obj):
    """Tries to guess the filename of the given object."""
    name = getattr(obj, 'name', None)
    if name and name[0] != '<' and name[-1] != '>':
        return name

So how do you get the real filename? By guessing, and if there isn't one, never mind.
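To see the guessing in action, here is the same function in Python 3 with three kinds of input. `NamedStream` is a hypothetical helper that gives an in-memory stream a `name` attribute, the way a real file object has one; the explicit `return None` only spells out what the original does implicitly.

```python
import io


def guess_filename(obj):
    """Return obj.name unless it is missing or a pseudo-name like '<stdin>'."""
    name = getattr(obj, "name", None)
    if name and name[0] != "<" and name[-1] != ">":
        return name
    return None


class NamedStream(io.BytesIO):
    """Hypothetical in-memory stream carrying a `name`, like a real file."""

    def __init__(self, data, name):
        super().__init__(data)
        self.name = name


from_disk = guess_filename(NamedStream(b"data", "report.pdf"))
from_pipe = guess_filename(NamedStream(b"data", "<stdin>"))
unnamed = guess_filename(io.BytesIO(b"data"))
```

Only the first object yields a filename; the angle-bracket check filters out pseudo-names such as `<stdin>`, and an object with no `name` attribute simply gives None.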

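Afterword

Phew, finally done. v0.8.0 includes a major refactoring, and it wore me out. This is my first time writing something like this and I am not entirely satisfied: there is too much code and I did not experiment with it enough, so overall I understand maybe 80% of it. Either way, thank you for reading, and feel free to get in touch.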