Notes on a pip trusted_host problem

  • November 30, 2019
  • Notes

Locating the problem

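One day I received an issue on Pipenv: a user reported that the pip command Pipenv runs was missing the port part of `--trusted-host`. I dug into the source and found two copies of the same function [1][2] with inconsistent logic. It suddenly looked less simple than it seemed. So I set up a local PyPI server, enabled HTTPS on it with a self-signed certificate, and tested with pip (see footnote 1):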

```bash
$ pip install -i https://localtest.me:5001 urllib3 --trusted-host localtest.me:5001
Successful

$ pip install -i https://localtest.me:5001 urllib3 --trusted-host localtest.me
Looking in indexes: https://localtest.me:5001
Collecting urllib3
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)'))': /urllib3/
  ...
Failed

$ pip install -i http://localtest.me:5000 urllib3 --trusted-host localtest.me:5000
Looking in indexes: http://localtest.me:5000
Collecting urllib3
  The repository located at localtest.me is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host localtest.me'.
  Could not find a version that satisfies the requirement urllib3 (from versions: )
No matching distribution found for urllib3

$ pip install -i http://localtest.me:5000 urllib3 --trusted-host localtest.me
Successful
```

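I was stunned. HTTPS and HTTP handle a `--trusted-host` value with or without a port differently: HTTPS wants the port included, while HTTP only works when it is left out. This is clearly unreasonable, so I went to read how pip's source handles it. Everything below is based on the pip 19.2.3 source.

src/pip/_internal/download.py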

```python
...
        insecure_adapter = InsecureHTTPAdapter(max_retries=retries)
        # Save this for later use in add_insecure_host().
        self._insecure_adapter = insecure_adapter

        self.mount("https://", secure_adapter)
        self.mount("http://", insecure_adapter)

        # Enable file:// urls
        self.mount("file://", LocalFSAdapter())

        # We want to use a non-validating adapter for any requests which are
        # deemed insecure.
        for host in insecure_hosts:
            self.add_insecure_host(host)

    def add_insecure_host(self, host):
        # type: (str) -> None
        self.mount('https://{}/'.format(host), self._insecure_adapter)
```

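`insecure_adapter` skips certificate verification, while `secure_adapter` performs it. In `add_insecure_host()`, the incoming host is turned into a URL prefix of the form `https://{host}/`, trailing `/` included, and that prefix is mounted with the insecure adapter. In requests, when several adapters are mounted, the one used for a given URL is chosen by prefix matching (`startswith`). So if the trusted host is `example.org`, only `https://example.org/` is recognized as trusted, while `https://example.org:8080/` is not and falls back to the certificate-verifying adapter mounted on `https://`.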
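To make the prefix matching concrete, here is a minimal pure-Python sketch. `PrefixRouter` is a hypothetical class, not part of requests; its `mount`/`get_adapter` logic is modelled on how `requests.Session` keeps mounted prefixes ordered longest-first and returns the adapter of the first prefix the URL starts with. Adapters are plain strings here for illustration.

```python
from collections import OrderedDict


class PrefixRouter:
    """Hypothetical, simplified model of requests.Session adapter lookup."""

    def __init__(self):
        self.adapters = OrderedDict()

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        # Move every shorter prefix behind the new one, so longer
        # (more specific) prefixes are always tried first.
        for key in [k for k in self.adapters if len(k) < len(prefix)]:
            self.adapters[key] = self.adapters.pop(key)

    def get_adapter(self, url):
        # The first mounted prefix that the URL starts with wins.
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError("no adapter for {!r}".format(url))


router = PrefixRouter()
router.mount("https://", "secure")    # verifies certificates
router.mount("http://", "insecure")
# What pip 19.2.3 mounts for --trusted-host example.org:
router.mount("https://{}/".format("example.org"), "insecure")

print(router.get_adapter("https://example.org/simple/"))       # insecure
print(router.get_adapter("https://example.org:8080/simple/"))  # secure
```

`https://example.org:8080/` fails to match `https://example.org/` only because of the `:` where the prefix expects `/`, so the lookup falls through to the generic `https://` mount.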

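All of the above concerns HTTPS only. HTTP involves no certificate verification at all; the logic that decides whether to trust it is in the following file: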

src/pip/_internal/index.py

```python
def _validate_secure_origin(self, logger, location):
    # type: (Logger, Link) -> bool
    # Determine if this url used a secure transport mechanism
    parsed = urllib_parse.urlparse(str(location))
    origin = (parsed.scheme, parsed.hostname, parsed.port)

    # The protocol to use to see if the protocol matches.
    # Don't count the repository type as part of the protocol: in
    # cases such as "git+ssh", only use "ssh". (I.e., Only verify against
    # the last scheme.)
    protocol = origin[0].rsplit('+', 1)[-1]

    # Determine if our origin is a secure origin by looking through our
    # hardcoded list of secure origins, as well as any additional ones
    # configured on this PackageFinder instance.
    for secure_origin in self.iter_secure_origins():
        if protocol != secure_origin[0] and secure_origin[0] != "*":
            continue

        try:
            # We need to do this decode dance to ensure that we have a
            # unicode object, even on Python 2.x.
            addr = ipaddress.ip_address(
                origin[1]
                if (
                    isinstance(origin[1], six.text_type) or
                    origin[1] is None
                )
                else origin[1].decode("utf8")
            )
            network = ipaddress.ip_network(
                secure_origin[1]
                if isinstance(secure_origin[1], six.text_type)
                # setting secure_origin[1] to proper Union[bytes, str]
                # creates problems in other places
                else secure_origin[1].decode("utf8")  # type: ignore
            )
        except ValueError:
            # We don't have both a valid address or a valid network, so
            # we'll check this origin against hostnames.
            if (origin[1] and
                    origin[1].lower() != secure_origin[1].lower() and
                    secure_origin[1] != "*"):
                continue
        else:
            # We have a valid address and network, so see if the address
            # is contained within the network.
            if addr not in network:
                continue

        # Check to see if the port patches
        if (origin[2] != secure_origin[2] and
                secure_origin[2] != "*" and
                secure_origin[2] is not None):
            continue

        # If we've gotten here, then this origin matches the current
        # secure origin and we should return True
        return True

    # If we've gotten to this point, then the origin isn't secure and we
    # will not accept it as a valid location to search. We will however
    # log a warning that we are ignoring it.
    logger.warning(
        "The repository located at %s is not a trusted or secure host and "
        "is being ignored. If this repository is available via HTTPS we "
        "recommend you use HTTPS instead, otherwise you may silence "
        "this warning and allow it anyway with '--trusted-host %s'.",
        parsed.hostname,
        parsed.hostname,
    )

    return False
```

This appears to match scheme, hostname and port separately, which looks fine. The problem lies in the values produced by `self.iter_secure_origins()`, defined in the same file:

```python
def iter_secure_origins(self):
    # type: () -> Iterator[SecureOrigin]
    for secure_origin in SECURE_ORIGINS:
        yield secure_origin
    for host in self.trusted_hosts:
        yield ('*', host, '*')
```

The trusted-host value is yielded as the hostname without any processing. Evidently a trusted-host that includes a port was never considered here.
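The HTTP-side check can be reproduced in isolation with a minimal sketch, assuming a simplified model: `is_secure_origin` and the standalone `iter_secure_origins` below are hypothetical stand-ins for pip 19.2.3's methods, with the IP-network branch and the built-in `SECURE_ORIGINS` list left out. It shows why `localtest.me:5000` never matches: `urlparse` strips the port from `hostname`, while the trusted-host value keeps it.

```python
from urllib.parse import urlparse


def iter_secure_origins(trusted_hosts):
    # Hypothetical stand-in: each trusted-host value is yielded unchanged
    # as the hostname part, with a wildcard scheme and port.
    for host in trusted_hosts:
        yield ("*", host, "*")


def is_secure_origin(url, trusted_hosts):
    """Simplified hostname/port check modelled on _validate_secure_origin."""
    parsed = urlparse(url)
    protocol = parsed.scheme.rsplit("+", 1)[-1]
    for scheme, host, port in iter_secure_origins(trusted_hosts):
        if protocol != scheme and scheme != "*":
            continue
        # parsed.hostname never contains the port, so "host:port" can't match.
        if parsed.hostname and parsed.hostname.lower() != host.lower() and host != "*":
            continue
        if parsed.port != port and port != "*" and port is not None:
            continue
        return True
    return False


print(is_secure_origin("http://localtest.me:5000/urllib3/", ["localtest.me"]))       # True
print(is_secure_origin("http://localtest.me:5000/urllib3/", ["localtest.me:5000"]))  # False
```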

The fix

With the cause found, to summarize:

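  • HTTPS needs the port because `requests.Session` picks the adapter for a URL by prefix matching against the mounted prefixes, and pip appends a trailing `/` to the prefix it mounts for each trusted host.
  • HTTP needs the port left out because, when checking whether a URL is secure, pip compares the target URL's hostname (which has no port) against the trusted-host value as given.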

The corresponding fixes are:

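  • When adding trusted endpoints, if the trusted-host value has no port, also mount `https://hostname:` as an endpoint that skips certificate verification (relying on prefix matching).
  • When generating secure origins, parse the trusted-host value into hostname and port parts and match each part separately.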
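Both fixes can be sketched together under stated assumptions: this is not the code from the PR, and every helper name below (`parse_trusted_host`, `insecure_https_prefixes`, `secure_origin_for`, `https_is_trusted`, `http_is_trusted`) is hypothetical. `https_is_trusted` stands in for requests' prefix lookup and `http_is_trusted` for pip's origin check, both reduced to the hostname/port logic discussed above. Bracketed IPv6 literals are not handled.

```python
from urllib.parse import urlparse


def parse_trusted_host(value):
    """Split 'host' or 'host:port' into (host, port or None)."""
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return value, None


def insecure_https_prefixes(value):
    """Prefixes to mount with the non-verifying adapter for one trusted host."""
    host, port = parse_trusted_host(value)
    if port is not None:
        return ["https://{}:{}/".format(host, port)]
    # No port given: "https://host:" also covers every explicit port.
    return ["https://{}/".format(host), "https://{}:".format(host)]


def secure_origin_for(value):
    """Origin tuple for the HTTP check, with host and port kept apart."""
    host, port = parse_trusted_host(value)
    return ("*", host, port if port is not None else "*")


def https_is_trusted(url, trusted_hosts):
    prefixes = [p for h in trusted_hosts for p in insecure_https_prefixes(h)]
    return any(url.lower().startswith(p.lower()) for p in prefixes)


def http_is_trusted(url, trusted_hosts):
    parsed = urlparse(url)
    for _, host, port in map(secure_origin_for, trusted_hosts):
        # urlparse lower-cases hostname and strips the port from it.
        if parsed.hostname != host.lower():
            continue
        # "*" means the trusted host was given without a port: any port matches.
        if port != "*" and parsed.port != port:
            continue
        return True
    return False


# The four commands from the test at the top, with and without a port:
print(https_is_trusted("https://localtest.me:5001/urllib3/", ["localtest.me:5001"]))  # True
print(https_is_trusted("https://localtest.me:5001/urllib3/", ["localtest.me"]))       # True
print(http_is_trusted("http://localtest.me:5000/urllib3/", ["localtest.me:5000"]))    # True
print(http_is_trusted("http://localtest.me:5000/urllib3/", ["localtest.me"]))         # True
```

Mounting `https://hostname:` rather than `https://hostname` matters: the shorter prefix would also match an unrelated host such as `https://hostname.evil.com/`.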

For the actual code, see the PR I submitted. It has been merged and should ship in the next release.

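  1. I used a trick here: the domain localtest.me, which resolves to 127.0.0.1, stands in for localhost. localhost itself is always trusted, so `--trusted-host` would have no effect on it.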