(Python基礎教程之二十二)爬蟲下載網頁影片(video blob)

  1. Python基礎教程
  2. 在SublimeEditor中配置Python環境
  3. Python程式碼中添加註釋
  4. Python中的變數的使用
  5. Python中的數據類型
  6. Python中的關鍵字
  7. Python字元串操作
  8. Python中的list操作
  9. Python中的Tuple操作
  10. Pythonmax()和min()–在列表或數組中查找最大值和最小值
  11. Python找到最大的N個(前N個)或最小的N個項目
  12. Python讀寫CSV文件
  13. Python中使用httplib2–HTTPGET和POST示例
  14. Python將tuple開箱為變數或參數
  15. Python開箱Tuple–太多值無法解壓
  16. Pythonmultidict示例–將單個鍵映射到字典中的多個值
  17. PythonOrderedDict–有序字典
  18. Python字典交集–比較兩個字典
  19. Python優先順序隊列示例
  20. python中如何格式化日期
  21. 30 分鐘 Python 爬蟲教程
  22. 爬蟲下載網頁影片(video blob)

現在影片鏈接一般為m3u8,找到m3u8地址就可以下載了

  1. 打開Chrome Developer工具,然後點擊「網路」標籤。
  2. 導航到包含影片的頁面,然後開始播放。
  3. 將文件列表過濾為「 m3u8」。
  4. 找到master.m3u8或index.m3u8並單擊它。
  5. 將文件保存到磁碟並在其中查看。
  6. 如果文件包含一個m3u8主URL,則複製該URL。
  7. 使用ffmpeg 工具下載m3u8影片
ffmpeg -i "//secure.brightcove.com/services/mobile/streaming/index/rendition.m3u8?assetId=6138283938001&secure=true&videoId=6138277786001" -bsf:a aac_adtstoasc -vcodec copy -c copy -crf 50 6138277786001.mp4

Python下載程式碼

#!/usr/bin/env python3
import requests,urllib
from bs4 import BeautifulSoup
import os
import subprocess

pwd = os.path.split(os.path.realpath(__file__))[0]

url = "//www.topgear.com/videos"

headers = {
    'upgrade-insecure-requests': "1",
    'user-agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
    'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'accept-encoding': "gzip, deflate, br",
    'accept-language': "zh-CN,zh;q=0.9,en;q=0.8",
    'cookie': "has_js=1; minVersion={\"experiment\":1570672462,\"minFlavor\":\"new_vermi-1.13.7.11.js100\"}; minUniq=%7B%22minUID%22%3A%22bb80328a30-e8cdeb4d55-9a314411d2-aff4bb11a6-4aa23e3779%22%7D; minDaily=%7B%22testMode%22%3Atrue%2C%22dailyUser%22%3Atrue%7D; __gads=ID=b6eee23a8df86f72:T=1588041695:S=ALNI_MYCQR1Bf2fq53bqISIZBy8kIgI9oA; minBuffer=%7B%22minAnalytics%22%3A%22%7B%5C%22clicks%5C%22%3A%5B%5D%2C%5C%22clicksDelay%5C%22%3A%5B%5D%7D%22%2C%22_minEE1%22%3A%22%5B%5D%22%7D; minSession=%7B%22minSID%22%3A%227f32fd50ab-88cc4cf6f3-68d284cdee-1faeb65c08-c5966d76ac%22%2C%22minSessionSent%22%3Atrue%2C%22hadImp%22%3Atrue%2C%22sessionUniqs%22%3A%22%7Btime%3A1588053248571%2Clist%3A%5B11206251nt0%5D%7D%22%7D; OptanonConsent=landingPath=NotLandingPage&datestamp=Tue+Apr+28+2020+13%3A55%3A33+GMT%2B0800+(%E4%B8%AD%E5%9B%BD%E6%A0%87%E5%87%86%E6%97%B6%E9%97%B4)&version=3.6.24&AwaitingReconsent=false&groups=1%3A1%2C101%3A0%2C2%3A0%2C0_132429%3A0%2C3%3A0%2C4%3A0%2C0_132431%3A0%2C104%3A0%2C106%3A0%2C111%3A0%2C114%3A0%2C120%3A0%2C124%3A0%2C126%3A0%2C130%3A0%2C133%3A0%2C134%3A0%2C144%3A0%2C145%3A0%2C146%3A0%2C147%3A0%2C150%3A0%2C151%3A0%2C157%3A0%2C162%3A0%2C173%3A0%2C0_126679%3A0%2C0_137695%3A0%2C0_132361%3A0%2C0_132391%3A0; GED_PLAYLIST_ACTIVITY=W3sidSI6Ijh5clQiLCJ0c2wiOjE1ODgwNTMzNDksIm52IjowLCJ1cHQiOjE1ODgwNTMzMzMsImx0IjoxNTg4MDUzMzM3fV0.",
    'cache-control': "no-cache"}

if __name__ == '__main__':
    response = requests.request("GET", url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    videoId = soup.find_all('video', class_="video-js")[0]['data-video-id'] ##獲取影片Id
    title = soup.find_all('h1', class_="video-player__title")[0].contents[0] ##獲取影片標題
    url = "//secure.brightcove.com/services/mobile/streaming/index/master.m3u8?videoId={}&secure=true".format(videoId)  ##生成影片下載Url
    filename = '{}.mp4'.format(title).replace(" ","-")
    cmd_str = 'ffmpeg -i \"' + url + '\" ' + '-acodec copy -vcodec copy -absf aac_adtstoasc ' + pwd + "/" +filename  ##下載影片
    print(cmd_str)
    subprocess.call(cmd_str,shell=True)