Python Web Scraping Tutorial: Scraping and Downloading Bilibili Videos [Source Code Included]

  • March 31, 2020
  • Notes

Scraping and downloading Bilibili videos [source code included]. Without further ado, let's get started.

Download the repository

git@github.com:inspurer/PythonSpider.git

Or download it directly: https://github.com/inspurer/PythonSpider/tree/master/bilibili

Replace the URL

Open any Bilibili video page in your browser.

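Copy that page's URL into the code (the `url` variable in the `if __name__ == '__main__':` block), run the script, and after a short wait the video shown in the image above will be downloaded.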
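Before pasting a URL into the script, it can help to check that it really is a video page. Below is a minimal sketch with a hypothetical helper `extract_video_id`, assuming the `https://www.bilibili.com/video/av<digits>` form used by the tutorial's own examples; other URL forms are deliberately not handled.

```python
import re

# Hypothetical helper: matches only the av-number form used in this tutorial's examples.
VIDEO_ID_PATTERN = re.compile(r'bilibili\.com/video/(av\d+)')


def extract_video_id(url):
    """Return the 'av...' ID from a Bilibili video-page URL, or None if it does not match."""
    match = VIDEO_ID_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_video_id('https://www.bilibili.com/video/av18100312'))
```

A `None` result is a hint that the URL will not lead `getHtml` to a video page.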

Here is the complete code:

import re
import json
from contextlib import closing

import requests
from pyquery import PyQuery as pq
from requests import RequestException


class bilibili():
    def __init__(self):
        self.getHtmlHeaders = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            # 'br' is omitted: requests only decodes Brotli when a brotli package is installed
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9'
        }

        self.downloadVideoHeaders = {
            'Origin': 'https://www.bilibili.com',
            # Default only; run() replaces it with the page actually being downloaded
            'Referer': 'https://www.bilibili.com/video/av26522634',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        }

    # The source fetched here usually differs from what the F12 developer tools show,
    # because the browser has already interpreted the page there
    def getHtml(self, url):
        try:
            response = requests.get(url=url, headers=self.getHtmlHeaders)
            print(response.status_code)
            if response.status_code == 200:
                return response.text
        except RequestException as e:
            print('HTML request failed:', e)
        return None

    def parseHtml(self, html):
        # Get the video title with pyquery
        doc = pq(html)
        video_title = doc('#viewbox_report > h1 > span').text()

        # Get the video URL with a regex plus json; a workaround after pyquery failed
        pattern = r'<script>window\.__playinfo__=(.*?)</script>'
        match = re.search(pattern, html)
        if match is None:
            raise ValueError('window.__playinfo__ not found in the page')
        temp = json.loads(match.group(1))
        # temp['durl'] is a list of dicts; take the first one that has a 'url' key
        video_url = None
        for item in temp['durl']:
            if 'url' in item:
                video_url = item['url']
                break
        if video_url is None:
            raise ValueError('no downloadable URL found in durl')
        return {
            'title': video_title,
            'url': video_url
        }

    def download_video(self, video):
        # Replace characters that are illegal in file names
        title = re.sub(r'[\\/:*?"<>|]', '-', video['title'])
        filename = title + '.flv'
        # closing() suits objects that implement close(), such as network or database connections.
        # stream=True plus iter_content writes the file in chunks instead of holding it all in memory.
        with closing(requests.get(video['url'], headers=self.downloadVideoHeaders, stream=True)) as res:
            if res.status_code == 200:
                with open(filename, 'wb') as f:
                    for chunk in res.iter_content(chunk_size=1024):
                        if chunk:
                            f.write(chunk)
            else:
                print('Video download failed:', res.status_code)

    def run(self, url):
        html = self.getHtml(url)
        if html is None:
            return
        # Send the page being downloaded as the Referer
        self.downloadVideoHeaders['Referer'] = url
        self.download_video(self.parseHtml(html))


if __name__ == '__main__':
    url = 'https://www.bilibili.com/video/av18100312'
    bilibili().run(url)
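The least obvious step in `parseHtml` is pulling the player data out of an inline `<script>` tag. Here is a minimal sketch of that step on its own, runnable without network access. The function name `extract_first_durl` is hypothetical, and the fallback to a `data` key is an assumption about other versions of the page, not something the tutorial relies on.

```python
import json
import re

# Non-greedy match of the JSON assigned to window.__playinfo__, as in parseHtml
PLAYINFO_PATTERN = re.compile(r'<script>window\.__playinfo__=(.*?)</script>')


def extract_first_durl(html):
    """Return the first 'url' found in the durl list of window.__playinfo__."""
    match = PLAYINFO_PATTERN.search(html)
    if match is None:
        raise ValueError('window.__playinfo__ not found in page')
    playinfo = json.loads(match.group(1))
    # Assumption: durl is either top-level (as the tutorial expects) or nested under "data"
    container = playinfo.get('data') or playinfo
    for item in container.get('durl', []):
        if 'url' in item:
            return item['url']
    raise ValueError('no durl entry with a "url" key')


sample = '<script>window.__playinfo__={"durl":[{"order":1,"url":"https://example.com/a.flv"}]}</script>'
print(extract_first_durl(sample))
```

Raising `ValueError` instead of indexing `[0]` on an empty `findall` result gives a clearer message when the page layout changes.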
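Writing the file with `stream=True` and `iter_content(chunk_size=1024)`, as in the `closing` variant inside `download_video`, keeps memory use flat. Reading `.content` loads the whole video into one `bytes` object. The chunk loop can be isolated and tested against any iterable of bytes. `write_chunks` is a hypothetical name used only for this sketch.

```python
import io
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write each non-empty chunk to out and return the total number of bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # mirrors the original `if chunk:` check that skips empty chunks
            out.write(chunk)
            total += len(chunk)
    return total


buf = io.BytesIO()
print(write_chunks([b'ab', b'', b'cde'], buf))
```

With a real response, the same function would be called as `write_chunks(res.iter_content(chunk_size=1024), f)` inside the `with closing(...)` and `with open(filename, 'wb') as f:` blocks.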

Here is the result of a run: