Scraping the Most-Starred Python Projects on GitHub in 20 Lines of Code
- October 11, 2019
- Notes
“
A programmer who isn't familiar with GitHub is not a good programmer.
- Lu Xun
”
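Legend has it that GitHub holds a treasure called stars. Just before his execution, the Pirate King Roger declared that he had left all his stars there, at the end of the great Python: machine learning. Countless people set off for the deep pit of machine learning, and so the curtain rose on the age of artificial intelligence. While everyone was busy scrambling for stars, plenty of talented people had already raked them in by the bucketful.
Using a very, very simple script of only about 20 lines, we'll scrape a leaderboard of high-star repositories and see which well-known projects make the list.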
Code:
=======================================
```python
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# Query the GitHub search API for Python repositories, sorted by star count
URL = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(URL, timeout=10)
print("Status code:", r.status_code)

# Decode the JSON body into a dict
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# 'items' holds one dict per repository on the first page of results
repo_dicts = response_dict['items']
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Bar chart styled with shades of #333366 on top of LightColorizedStyle
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects'
chart.x_labels = names
chart.add('', stars)
chart.render_to_file('python_repos.svg')
```
Code walkthrough:
=======================================
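requests is a simple, easy-to-use HTTP library written in Python; requests.get() sends an HTTP GET request to the target URL.
In the requests module, r.json() is Requests' built-in JSON decoder: it parses the response body into Python dicts and lists.
Pygal is a simple, easy-to-use charting library that builds charts in an object-oriented way and can conveniently output them in several formats, including PNG and SVG.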
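As a sketch of how the query string in URL is put together, the snippet below builds it from a dict with the standard library's urllib.parse.urlencode. Passing the same dict to requests.get(..., params=...) should produce an equivalent URL, since requests URL-encodes dict params the same way. The helper name build_search_url is hypothetical, and per_page/page are GitHub search API paging options that the original script does not use.

```python
from urllib.parse import urlencode

SEARCH_API = 'https://api.github.com/search/repositories'

def build_search_url(query, sort='stars', per_page=30, page=1):
    """Hypothetical helper: build a GitHub repository-search URL.

    query    -- search text or qualifiers, e.g. 'language:python'
    sort     -- field to sort by ('stars' ranks by star count)
    per_page -- results per page (30 assumed as the default)
    page     -- 1-based page number
    """
    params = {'q': query, 'sort': sort, 'per_page': per_page, 'page': page}
    # urlencode escapes ':' as %3A and turns spaces into '+'
    return SEARCH_API + '?' + urlencode(params)

print(build_search_url('language:python'))
```

Note that a space in the query becomes `+`, which is why the search text "machine learning" would appear as `machine+learning` in a hand-written URL.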
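For a JSON body, r.json() is roughly equivalent to json.loads(r.text). The sketch below applies json.loads to a hand-written, heavily trimmed payload shaped like a GitHub search response, using only the fields the script reads (total_count, items, name, stargazers_count). The payload is illustrative (star counts copied from the leaderboard in this post), and extract_names_and_stars is a hypothetical helper name.

```python
import json

# Illustrative, trimmed payload shaped like a GitHub search response;
# real responses carry many more fields per repository.
SAMPLE_BODY = '''
{
  "total_count": 2,
  "items": [
    {"name": "awesome-python", "stargazers_count": 72286},
    {"name": "system-design-primer", "stargazers_count": 72004}
  ]
}
'''

def extract_names_and_stars(body):
    """Hypothetical helper: decode a JSON body and collect names and star counts."""
    response_dict = json.loads(body)  # roughly what r.json() does for a JSON response
    names, stars = [], []
    for repo_dict in response_dict['items']:
        names.append(repo_dict['name'])
        stars.append(repo_dict['stargazers_count'])
    return response_dict['total_count'], names, stars

total, names, stars = extract_names_and_stars(SAMPLE_BODY)
print(total, names, stars)
```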
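Pygal values do not have to be bare numbers: a value can also be a dict with keys such as 'value' (the bar height), 'label' (tooltip text) and 'xlink' (a link attached to the bar). One possible extension, not part of the original script, is to turn each repository into such a dict so the SVG shows the description on hover and links to the repository. build_plot_dicts is a hypothetical helper, and the description and html_url fields are assumed from the GitHub search response.

```python
def build_plot_dicts(repo_dicts):
    """Hypothetical helper: one pygal value dict per repository."""
    plot_dicts = []
    for repo_dict in repo_dicts:
        plot_dicts.append({
            'value': repo_dict['stargazers_count'],  # bar height
            # description may be null (None) in the API response
            'label': repo_dict['description'] or '',
            'xlink': repo_dict['html_url'],  # link attached to the bar
        })
    return plot_dicts

repos = [
    {'stargazers_count': 46128,
     'description': 'The Python micro framework for building web applications.',
     'html_url': 'https://github.com/pallets/flask'},
    {'stargazers_count': 31137,
     'description': None,
     'html_url': 'https://github.com/shadowsocks/shadowsocks'},
]
plot_dicts = build_plot_dicts(repos)
print(plot_dicts)
```

In the main script, this would mean calling chart.add('', build_plot_dicts(repo_dicts)) instead of chart.add('', stars).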
GitHub leaderboard analysis:
=======================================
First, here is the high-star leaderboard for Python:

Each result is listed below with its project name, owner, star count, repository URL, and description; key projects are marked in bold.
Name: awesome-python
Owner: vinta
Stars: 72286
Repository: https://github.com/vinta/awesome-python
Description: The big Python collection from the famous "awesome" series; we have a detailed introduction:
Name: system-design-primer
Owner: donnemartin
Stars: 72004
Repository: https://github.com/donnemartin/system-design-primer
Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Name: models
Owner: tensorflow
Stars: 57007
Repository: https://github.com/tensorflow/models
Description: Important models for TensorFlow
Name: youtube-dl
Owner: ytdl-org
Stars: 54830
Repository: https://github.com/ytdl-org/youtube-dl
Description: Command-line program to download videos from YouTube.com and other video sites
Name: thefuck
Owner: nvbn
Stars: 46347
Repository: https://github.com/nvbn/thefuck
Description: Magnificent app which corrects your previous console command.
Name: flask
Owner: pallets
Stars: 46128
Repository: https://github.com/pallets/flask
Description: The Python micro framework for building web applications.
Name: keras
Owner: keras-team
Stars: 43736
Repository: https://github.com/keras-team/keras
Description: An important deep learning framework
Name: django
Owner: django
Stars: 43716
Repository: https://github.com/django/django
Description: The Web framework for perfectionists with deadlines.
Name: httpie
Owner: jakubroztocil
Stars: 42871
Repository: https://github.com/jakubroztocil/httpie
Description: As easy as httpie /aitch-tee-tee-pie/ Modern command line HTTP client – user-friendly curl alternative with intuitive UI, JSON support, syntax highlighting, wget-like downloads, extensions, etc. https://twitter.com/clihttp
Name: requests
Owner: psf
Stars: 39952
Repository: https://github.com/psf/requests
Description: Python HTTP Requests for Humans; commonly used in web-scraping code
Name: ansible
Owner: ansible
Stars: 39060
Repository: https://github.com/ansible/ansible
Description: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy. Avoid writing scripts or custom code to deploy and update your applications — automate in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com/ansible/
Name: scikit-learn
Owner: scikit-learn
Stars: 36784
Repository: https://github.com/scikit-learn/scikit-learn
Description: An important machine learning framework.
Name: big-list-of-naughty-strings
Owner: minimaxir
Stars: 33016
Repository: https://github.com/minimaxir/big-list-of-naughty-strings
Description: The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.
Name: shadowsocks
Owner: shadowsocks
Stars: 31137
Repository: https://github.com/shadowsocks/shadowsocks
Description: None
Name: XX-Net
Owner: XX-net
Stars: 28545
Repository: https://github.com/XX-net/XX-Net
Description: a web proxy tool
Name: face_recognition
Owner: ageitgey
Stars: 27552
Repository: https://github.com/ageitgey/face_recognition
Description: The world's simplest face recognition API and command-line tool for Python
Name: you-get
Owner: soimort
Stars: 26672
Repository: https://github.com/soimort/you-get
Description: :arrow_double_down: Dumb downloader that scrapes the web
Name: cpython
Owner: python
Stars: 26326
Repository: https://github.com/python/cpython
Description: The Python programming language
Name: Algorithm_Interview_Notes-Chinese
Owner: imhuay
Stars: 26204
Repository: https://github.com/imhuay/Algorithm_Interview_Notes-Chinese
Description: 2018/2019/campus recruiting/spring recruiting/fall recruiting/algorithms/Machine Learning/Deep Learning/NLP/C/C++/Python/interview notes
Name: home-assistant
Owner: home-assistant
Stars: 26059
Repository: https://github.com/home-assistant/home-assistant
Description: :house_with_garden: Open source home automation that puts local control and privacy first
Name: certbot
Owner: certbot
Stars: 25551
Repository: https://github.com/certbot/certbot
Description: Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.
Name: 100-Days-Of-ML-Code
Owner: Avik-Jain
Stars: 25417
Repository: https://github.com/Avik-Jain/100-Days-Of-ML-Code
Description: Master machine learning in 100 days; we have covered it before:
Name: CppCoreGuidelines
Owner: isocpp
Stars: 24036
Repository: https://github.com/isocpp/CppCoreGuidelines
Description: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
Name: Deep-Learning-Papers-Reading-Roadmap
Owner: floodsung
Stars: 23848
Repository: https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap
Description: Deep Learning papers reading roadmap for anyone who are eager to learn this amazing tech!
Name: tldr
Owner: tldr-pages
Stars: 23308
Repository: https://github.com/tldr-pages/tldr
Description: :books: Simplified and community-driven man pages
Name: faceswap
Owner: deepfakes
Stars: 22064
Repository: https://github.com/deepfakes/faceswap
Description: Deepfakes Software For All
Name: sentry
Owner: getsentry
Stars: 21932
Repository: https://github.com/getsentry/sentry
Description: Sentry is cross-platform application monitoring, with a focus on error reporting.
Name: python-patterns
Owner: faif
Stars: 21797
Repository: https://github.com/faif/python-patterns
Description: A collection of design patterns/idioms in Python
Name: Detectron
Owner: facebookresearch
Stars: 21607
Repository: https://github.com/facebookresearch/Detectron
Description: FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Name: pandas
Owner: pandas-dev
Stars: 21019
Repository: https://github.com/pandas-dev/pandas
Description: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
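Every entry above follows the same Name / Owner / Stars / Repository / Description layout. A minimal sketch of producing one entry from a repository dict in the search response might look like this; the owner.login, html_url and description fields are assumed from GitHub's API response, and format_repo is a hypothetical helper. A null description would print as "None", as in the shadowsocks entry.

```python
def format_repo(repo_dict):
    """Hypothetical helper: render one repository in the list format used above."""
    lines = [
        f"Name: {repo_dict['name']}",
        f"Owner: {repo_dict['owner']['login']}",
        f"Stars: {repo_dict['stargazers_count']}",
        f"Repository: {repo_dict['html_url']}",
        # f-strings call str(), so a None description prints as "None"
        f"Description: {repo_dict['description']}",
    ]
    return '\n'.join(lines)

sample = {
    'name': 'flask',
    'owner': {'login': 'pallets'},
    'stargazers_count': 46128,
    'html_url': 'https://github.com/pallets/flask',
    'description': 'The Python micro framework for building web applications.',
}
print(format_repo(sample))
```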
Code source:
https://blog.csdn.net/qq_33583069/article/details/89078973