Scraping the Most-Starred Python Projects on GitHub in 20 Lines of Code
- October 11, 2019
- Notes
"
A programmer who doesn't know GitHub is not a good programmer.
- Lu Xun
"
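Legend has it that GitHub holds a treasure called the star. Before his execution, the Pirate King Roger said he had left all the stars in one place, at the end of the great Python route: machine learning. Countless people rushed into the deep pit of machine learning to find it, and so the age of artificial intelligence began. While everyone was busy fighting over stars, plenty of talented people had already collected them by the bucketload.
We'll use a very simple script, only 20 lines long, to scrape a leaderboard of highly starred projects and see which well-known ones make the list.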
Code:
=======================================
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# Search GitHub for Python repositories, sorted by star count (highest first).
# The ranking below lists Python projects, so the query filters on language:python.
URL = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(URL)
print("Status code:", r.status_code)

# Decode the JSON response body into a dictionary.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Collect the name and star count of each returned repository.
repo_dicts = response_dict['items']
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Draw a bar chart of the star counts and save it as an SVG file.
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects'
chart.x_labels = names
chart.add('', stars)
chart.render_to_file('python_repos.svg')
Code walkthrough:
=======================================
requests is a simple, easy-to-use HTTP library written in Python. requests.get() sends a GET request to the target URL.
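As a slightly more defensive variant of that request, the sketch below passes the query as `params`, sets a timeout, and raises an exception on HTTP errors. The helper names `search_params` and `fetch_top_repos` are hypothetical, and the `language:python` qualifier, `per_page` value, and `Accept` header are assumptions made for illustration, not part of the original script.

```python
import requests

SEARCH_URL = "https://api.github.com/search/repositories"


def search_params(language="python", per_page=30):
    """Build the query parameters for a star-sorted repository search."""
    return {
        "q": f"language:{language}",  # assumed qualifier: filter by language
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
    }


def fetch_top_repos(language="python", per_page=30, timeout=10):
    """Return the list of repository dicts from the search response."""
    response = requests.get(
        SEARCH_URL,
        params=search_params(language, per_page),
        headers={"Accept": "application/vnd.github+json"},
        timeout=timeout,  # avoid hanging forever on a slow connection
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of continuing
    return response.json()["items"]
```

Calling `fetch_top_repos(per_page=5)` would perform the network request and return up to five repository dictionaries. Unauthenticated requests to the GitHub API are rate limited, so repeated calls in a short time may fail with an HTTP error.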
r.json() is the JSON decoder built into Requests. It parses the response body into Python objects (here, a dictionary).
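To see what the decoded result looks like without touching the network, here is a minimal sketch that uses the standard `json` module on a hand-written sample. It does roughly what `r.json()` does with the response text. The sample body is an assumption: it only mimics the shape of a GitHub search response, and its `total_count` is made up. The function name `decode_search_response` is hypothetical.

```python
import json

# Hand-written sample shaped like a GitHub search response. Only the fields
# used in this article are included, and total_count is a made-up value.
SAMPLE_BODY = """
{
  "total_count": 2,
  "items": [
    {"name": "awesome-python", "stargazers_count": 72286},
    {"name": "system-design-primer", "stargazers_count": 72004}
  ]
}
"""


def decode_search_response(body):
    """Decode a JSON body and pull out the total, the names, and the stars."""
    response_dict = json.loads(body)
    items = response_dict["items"]
    names = [repo["name"] for repo in items]
    stars = [repo["stargazers_count"] for repo in items]
    return response_dict["total_count"], names, stars


total, names, stars = decode_search_response(SAMPLE_BODY)
print(total, names, stars)
```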
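Pygal is a simple, easy-to-use charting library. It builds charts in an object-oriented style and can output them in several formats, including SVG and PNG (PNG output needs extra dependencies such as CairoSVG).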
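As far as I understand the Pygal documentation, a bar can also be given as a dictionary with `value`, `label`, and `xlink` keys, which adds a tooltip and a clickable link to each bar. The sketch below only prepares those dictionaries in plain Python. The function name `build_plot_dicts` is hypothetical, and the two sample repositories are copied from the ranking in this article.

```python
def build_plot_dicts(repo_dicts):
    """Turn repository dicts into per-bar dicts with value, label, and link."""
    plot_dicts = []
    for repo in repo_dicts:
        plot_dicts.append({
            "value": repo["stargazers_count"],
            # Some repositories have no description (None), so fall back
            # to an empty string to keep the label a plain str.
            "label": str(repo["description"] or ""),
            "xlink": repo["html_url"],
        })
    return plot_dicts


sample = [
    {
        "name": "flask",
        "stargazers_count": 46128,
        "description": "The Python micro framework for building web applications.",
        "html_url": "https://github.com/pallets/flask",
    },
    {
        "name": "shadowsocks",
        "stargazers_count": 31137,
        "description": None,
        "html_url": "https://github.com/shadowsocks/shadowsocks",
    },
]

plot_dicts = build_plot_dicts(sample)
print(plot_dicts[1])
```

In the script, `chart.add('', plot_dicts)` would then take the place of `chart.add('', stars)`.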
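GitHub leaderboard analysis:
=======================================
First, the chart of the most-starred Python projects:
[Figure: bar chart of the most-starred Python projects]
The results are listed below by project name, owner, star count, repository URL, and description. Key projects are marked in bold.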
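A listing in this shape can be produced from the repository dictionaries with a few lines of code. This is a minimal sketch, not the code used to generate the list below: the function name `format_repo` is hypothetical, and the field names (`owner.login`, `html_url`, `description`) are the ones I believe the GitHub search API returns for each item.

```python
def format_repo(repo):
    """Format one repository dict as the five-line entry used in the listing."""
    lines = [
        f"Name: {repo['name']}",
        f"Owner: {repo['owner']['login']}",
        f"Stars: {repo['stargazers_count']}",
        f"Repository: {repo['html_url']}",
        # A missing description is printed as "None".
        f"Description: {repo['description']}",
    ]
    return "\n".join(lines)


# Sample values copied from the cpython entry in this article.
sample_repo = {
    "name": "cpython",
    "owner": {"login": "python"},
    "stargazers_count": 26326,
    "html_url": "https://github.com/python/cpython",
    "description": "The Python programming language",
}

entry = format_repo(sample_repo)
print(entry)
```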
Name: awesome-python
Owner: vinta
Stars: 72286
Repository: https://github.com/vinta/awesome-python
Description: The large Python collection from the well-known awesome series.
Name: system-design-primer
Owner: donnemartin
Stars: 72004
Repository: https://github.com/donnemartin/system-design-primer
Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Name: models
Owner: tensorflow
Stars: 57007
Repository: https://github.com/tensorflow/models
Description: Important models built with TensorFlow.
Name: youtube-dl
Owner: ytdl-org
Stars: 54830
Repository: https://github.com/ytdl-org/youtube-dl
Description: Command-line program to download videos from YouTube.com and other video sites
Name: thefuck
Owner: nvbn
Stars: 46347
Repository: https://github.com/nvbn/thefuck
Description: Magnificent app which corrects your previous console command.
Name: flask
Owner: pallets
Stars: 46128
Repository: https://github.com/pallets/flask
Description: The Python micro framework for building web applications.
Name: keras
Owner: keras-team
Stars: 43736
Repository: https://github.com/keras-team/keras
Description: An important deep learning framework.
Name: django
Owner: django
Stars: 43716
Repository: https://github.com/django/django
Description: The Web framework for perfectionists with deadlines.
Name: httpie
Owner: jakubroztocil
Stars: 42871
Repository: https://github.com/jakubroztocil/httpie
Description: As easy as httpie /aitch-tee-tee-pie/ Modern command line HTTP client – user-friendly curl alternative with intuitive UI, JSON support, syntax highlighting, wget-like downloads, extensions, etc. https://twitter.com/clihttp
Name: requests
Owner: psf
Stars: 39952
Repository: https://github.com/psf/requests
Description: Python HTTP Requests for Humans. Commonly used in scraper code.
Name: ansible
Owner: ansible
Stars: 39060
Repository: https://github.com/ansible/ansible
Description: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy. Avoid writing scripts or custom code to deploy and update your applications - automate in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com/ansible/
Name: scikit-learn
Owner: scikit-learn
Stars: 36784
Repository: https://github.com/scikit-learn/scikit-learn
Description: An important machine learning framework.
Name: big-list-of-naughty-strings
Owner: minimaxir
Stars: 33016
Repository: https://github.com/minimaxir/big-list-of-naughty-strings
Description: The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.
Name: shadowsocks
Owner: shadowsocks
Stars: 31137
Repository: https://github.com/shadowsocks/shadowsocks
Description: None
Name: XX-Net
Owner: XX-net
Stars: 28545
Repository: https://github.com/XX-net/XX-Net
Description: a web proxy tool
Name: face_recognition
Owner: ageitgey
Stars: 27552
Repository: https://github.com/ageitgey/face_recognition
Description: The world's simplest facial recognition API for Python and the command line.
Name: you-get
Owner: soimort
Stars: 26672
Repository: https://github.com/soimort/you-get
Description: :arrow_double_down: Dumb downloader that scrapes the web
Name: cpython
Owner: python
Stars: 26326
Repository: https://github.com/python/cpython
Description: The Python programming language
Name: Algorithm_Interview_Notes-Chinese
Owner: imhuay
Stars: 26204
Repository: https://github.com/imhuay/Algorithm_Interview_Notes-Chinese
Description: 2018/2019 / campus recruiting / spring recruiting / autumn recruiting / algorithms / machine learning / deep learning / natural language processing (NLP) / C/C++ / Python / interview notes
Name: home-assistant
Owner: home-assistant
Stars: 26059
Repository: https://github.com/home-assistant/home-assistant
Description: :house_with_garden: Open source home automation that puts local control and privacy first
Name: certbot
Owner: certbot
Stars: 25551
Repository: https://github.com/certbot/certbot
Description: Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.
Name: 100-Days-Of-ML-Code
Owner: Avik-Jain
Stars: 25417
Repository: https://github.com/Avik-Jain/100-Days-Of-ML-Code
Description: Master machine learning in 100 days.
Name: CppCoreGuidelines
Owner: isocpp
Stars: 24036
Repository: https://github.com/isocpp/CppCoreGuidelines
Description: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
Name: Deep-Learning-Papers-Reading-Roadmap
Owner: floodsung
Stars: 23848
Repository: https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap
Description: Deep Learning papers reading roadmap for anyone who are eager to learn this amazing tech!
Name: tldr
Owner: tldr-pages
Stars: 23308
Repository: https://github.com/tldr-pages/tldr
Description: :books: Simplified and community-driven man pages
Name: faceswap
Owner: deepfakes
Stars: 22064
Repository: https://github.com/deepfakes/faceswap
Description: Deepfakes Software For All
Name: sentry
Owner: getsentry
Stars: 21932
Repository: https://github.com/getsentry/sentry
Description: Sentry is cross-platform application monitoring, with a focus on error reporting.
Name: python-patterns
Owner: faif
Stars: 21797
Repository: https://github.com/faif/python-patterns
Description: A collection of design patterns/idioms in Python
Name: Detectron
Owner: facebookresearch
Stars: 21607
Repository: https://github.com/facebookresearch/Detectron
Description: FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Name: pandas
Owner: pandas-dev
Stars: 21019
Repository: https://github.com/pandas-dev/pandas
Description: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Code source:
https://blog.csdn.net/qq_33583069/article/details/89078973