Using a crawler to change the read permission of CSDN articles in one click (in bulk): visible to everyone, followers only, or VIP only

October 6, 2019
Notes

"Create something wonderful with your own hands" - bigsai

Preface

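CSDN is one of the leading exchange platforms in IT, and it has made some visible progress recently, for example Blink and article revenue. These raise the value of articles and encourage authors to write and share, which is undoubtedly a good thing.
But CSDN still has plenty that is incomplete or needs optimizing, such as the recommendation algorithm, and the new revenue feature offers no way to change the read permission of articles in one click. Bloggers with a large back catalogue (hundreds or thousands of posts) can hardly find the energy to edit every post by hand to protect their own interests.
As IT practitioners, we understand that the CSDN staff are busy serving us every day and cannot look after every group of users. In the spirit of helping out, I wrote a script myself to spare everyone the headache.
This approach targets users of the Markdown editor. Rich-text users can adapt it by analogy.
There are two modes: change all articles, or change one category. Changing a category requires one extra input, the URL of that category's first page. Everything else is the same; just follow the prompts.
If you run into problems, feel free to contact the author!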

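Analysis

Now that the requirement is clear, how do we implement it technically?
We previously analyzed CSDN's login mechanism in depth and found that logging in from a script is easy. So we can use the cookies from a login to perform the operations we want.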

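Getting article links and IDs

We need to find the URL and ID of every article on our own blog, because the later operations are keyed on the article ID. The approach:

Find your user ID in the login cookies and open your CSDN homepage.
Parse the homepage to get the page count. A note here: the page count is loaded dynamically by JavaScript rather than rendered directly into the HTML. You could dig it out with a debugger if you insist, but it is simpler to derive it from the total number of articles, because one page holds at most 20 articles. Once you have parsed this, you can get every article's link and ID.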
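The two steps above can be sketched in a few lines. This is only an illustration under assumptions taken from the script later in this post: the user ID is read from a cookie named UN, and list pages live under /article/list/<n>. The user name example_user is made up.

```python
def blog_home(cookies):
    """Build the blog homepage URL from the login cookies.

    Assumes the user ID is stored in a cookie named 'UN',
    as the script in this post does.
    """
    return "https://blog.csdn.net/" + str(cookies["UN"])


def page_count(total_articles, per_page=20):
    """Ceiling division: how many list pages hold total_articles."""
    return (total_articles + per_page - 1) // per_page


def list_page_urls(home_url, total_articles):
    """URLs of every article list page (the path pattern is an assumption)."""
    return [
        f"{home_url}/article/list/{n}"
        for n in range(1, page_count(total_articles) + 1)
    ]


if __name__ == "__main__":
    home = blog_home({"UN": "example_user"})  # hypothetical user
    for url in list_page_urls(home, 41):      # 41 articles need 3 pages
        print(url)
```

Deriving the page count from the article total avoids having to run the JavaScript that fills in the pager.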

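Analyzing the Markdown text

Pick any article and click edit, then inspect the element to get the edit link. You will find that the link follows a pattern tied to the article ID.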
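As a rough sketch of that pattern: the article ID is the last path segment of the article link, and the editor link is a fixed prefix plus that ID. The prefix below is the one used by the script later in this post; the sample article link and its ID are made up for illustration.

```python
from urllib.parse import urlsplit


def article_id_from_href(href):
    """Return the last path segment of an article link, ignoring any query string."""
    path = urlsplit(href).path
    return path.rstrip("/").split("/")[-1]


def editor_url(article_id):
    """Editor link for an article (prefix taken from the script in this post)."""
    return "https://mp.csdn.net/mdeditor/" + str(article_id)


if __name__ == "__main__":
    # Hypothetical article link, for illustration only
    href = "https://blog.csdn.net/example_user/article/details/100000001"
    print(editor_url(article_id_from_href(href)))
```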

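Once inside the editor, all you see is the Markdown source, which does not seem to yield much information. Click submit and look at the AJAX request.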

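The parameters are not encrypted; they are all sent as plain text. I wondered how CSDN gets the Markdown source back into the editor. Does it reverse it from the rendered HTML? CSDN is surely not that clever, so there had to be another way. On closer inspection, an XHR request made while the editor loads returns all of the article's information. We only need to modify the relevant part and submit it back.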
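The "modify only the relevant part" step can be sketched as follows. The field name readType and its three values come from the script later in this post; the other fields in the sample payload are invented placeholders, not the real response format.

```python
# Values the script in this post sends for the three visibility levels
READ_TYPES = ("public", "read_need_fans", "read_need_vip")


def with_read_type(article, read_type):
    """Return a copy of the fetched article data with only readType changed."""
    if read_type not in READ_TYPES:
        raise ValueError("unknown read type: " + repr(read_type))
    updated = dict(article)  # shallow copy, the fetched data stays untouched
    updated["readType"] = read_type
    return updated


if __name__ == "__main__":
    # Hypothetical payload; the real one has more fields
    fetched = {"id": "100000001", "title": "demo", "readType": "public"}
    print(with_read_type(fetched, "read_need_fans"))
```

Every other field is posted back exactly as it was fetched, so the article body and title are left alone.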

Writing the code

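With the idea clear, the flow is roughly: log in and keep the cookies, collect every article ID into a queue, then for each ID fetch the article data, change its read type, and save it back.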

External dependencies: bs4 and requests (the script also passes 'lxml' to BeautifulSoup as the parser, so lxml must be installed too). The Python code is below. It does not use multiprocessing, just a few simple threads; each thread exits on its own once the queue of article IDs is empty.

    import json
    import threading
    from queue import Empty, Queue

    import requests
    from bs4 import BeautifulSoup

    queue = Queue()
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        'referer': 'https://passport.csdn.net/login',
        'origin': 'https://passport.csdn.net',
        'content-Type': 'application/json;charset=UTF-8',
        'x-requested-with': 'XMLHttpRequest',
        'accept': 'application/json, text/plain, */*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9',
        'connection': 'keep-alive',
        'Host': 'passport.csdn.net',
    }
    data = {
        "loginType": "1",
        "pwdOrVerifyCode": "",
        "userIdentification": "",
        "uaToken": "",
        "webUmidToken": "",
    }
    cookies = ""


    # Log in and return the URL of the user's blog homepage
    def login(usename, password):
        global cookies
        loginurl = 'https://passport.csdn.net/v1/register/pc/login/doLogin'
        payload = dict(data, userIdentification=usename, pwdOrVerifyCode=password)
        # The content-type header says JSON, so send real JSON rather than str(dict)
        req = requests.post(loginurl, data=json.dumps(payload), headers=header)
        cookies = requests.utils.dict_from_cookiejar(req.cookies)
        print(req.status_code)
        print(cookies)
        return "https://blog.csdn.net/" + str(cookies['UN'])


    # Walk every list page and put each article ID on the queue
    def collect_ids(first_url, page_url):
        req = requests.get(first_url, cookies=cookies)
        soup = BeautifulSoup(req.text, 'lxml')
        # Get the page count: the title attribute is treated as the total
        # number of articles, and one page holds at most 20 of them
        total = soup.select(".text-center")[0].get("title")
        pagetotal = (int(total) + 19) // 20
        print(pagetotal)
        for index in range(pagetotal):
            url2 = page_url(index + 1)
            print(url2)
            req = requests.get(url2, cookies=cookies)
            soup = BeautifulSoup(req.text, 'lxml')
            pages = soup.find(id='mainBox').find_all(attrs={'class': 'article-item-box'})
            for page in pages:
                try:
                    href = page.find("a").get("href")
                    article_id = href.split("/")[-1]
                    print(href, article_id)
                    queue.put(article_id)
                except Exception as e:
                    print(e)


    # Add every article on the blog to the queue
    def addurl(url):
        collect_ids(url, lambda n: url + "/article/list/" + str(n))


    # Add the articles of one category to the queue
    def addurl_by_type(url):
        collect_ids(url, lambda n: url + "/" + str(n) + "?")


    # Fetch the article data, change readType, and save it back
    def change(id):
        url = "https://mp.csdn.net/mdeditor/getArticle?id=" + str(id)
        req = requests.get(url, cookies=cookies)
        res = req.json()
        data = res['data']
        print(res)
        data['readType'] = read_needType

        url = "https://mp.csdn.net/mdeditor/saveArticle"
        requests.post(url, cookies=cookies, data=data)


    class downspider(threading.Thread):
        def __init__(self, threadname, que):
            threading.Thread.__init__(self)
            self.threadname = threadname
            self.que = que

        def run(self):
            print('start thread' + self.threadname)
            while True:
                try:
                    # The queue is filled before the threads start,
                    # so an empty queue means the work is done
                    id = self.que.get_nowait()
                except Empty:
                    break
                try:
                    print(self.name, end='')
                    change(id)
                except Exception as e:
                    print(e)
                    break


    if __name__ == '__main__':
        threads = []
        read_types = ['public', 'read_need_fans', 'read_need_vip']
        name = input("name:")
        password = input("password:")
        print("type:\n1: visible to everyone\n2: followers only\n3: VIP members only")
        value = int(input("Enter a number: ")) - 1
        read_needType = read_types[value]
        print("scope:\n1: change all articles\n2: change one category")
        all_or_type = int(input("Enter the scope (number): "))

        if all_or_type == 1:
            url = login(name, password)
            addurl(url)
        else:
            print("Enter the URL of the category's first page:")
            url = input("url:")
            login(name, password)
            addurl_by_type(url)
        print(url)

        threadList = ['thread-1', 'thread-2', 'thread-3', 'thread-4', 'thread-5']
        for j in threadList:
            thread = downspider(j, queue)
            thread.start()
            threads.append(thread)
        for t in threads:
            t.join()

Test run

Running

Followers only

Restored
