Scraping the Most-Starred Python Projects on GitHub in 20 Lines of Code

  • October 11, 2019
  • Notes

A programmer who isn't familiar with GitHub is not a good programmer.

-- Lu Xun

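Legend has it that GitHub holds a treasure called the star. Before his execution, Gol D. Roger, the Pirate King, declared that he had left all the stars there, at the end of the Grand Line of Python: machine learning. Countless people have since charged headlong into the machine-learning pit, and so the era of artificial intelligence began. And while everyone was busy scrambling for stars, plenty of talented people had already raked them in by the bucketful.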

Using some very, very simple code, just 20 lines, we'll scrape a hall of fame of high-star projects and see which famous projects make the list.

Code:

=======================================

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# Search GitHub for Python repositories, sorted by star count (descending by default)
URL = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(URL)
print("Status code:", r.status_code)

# Decode the JSON body into a dict
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Each item describes one repository; collect its name and star count
repo_dicts = response_dict['items']
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Bar chart styled with a dark-blue variant of LightColorizedStyle
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects'
chart.x_labels = names
chart.add('', stars)

# Write the chart to an SVG file in the current directory
chart.render_to_file('python_repos.svg')

Code walkthrough:

=======================================

requests is a simple, easy-to-use HTTP library for Python; requests.get() sends a GET request to the target URL and returns a Response object.
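As a hedged sketch (not the original author's code), the same search can be sent by passing the query through requests' `params` argument instead of a hand-written query string, so requests takes care of URL encoding. The helper names `search_params` and `search_repositories`, the 10-second `timeout`, and the `per_page` default are our own assumptions; the `Accept` value is the media type GitHub documents for its v3 REST API.

```python
import requests

SEARCH_URL = "https://api.github.com/search/repositories"
# Media type GitHub documents for its v3 REST API
HEADERS = {"Accept": "application/vnd.github.v3+json"}


def search_params(query, sort="stars", per_page=30):
    """Build the query-string parameters for a repository search (hypothetical helper)."""
    return {"q": query, "sort": sort, "per_page": per_page}


def search_repositories(query, sort="stars", per_page=30):
    """Run the search and return the decoded JSON; this one needs network access."""
    r = requests.get(SEARCH_URL, params=search_params(query, sort, per_page),
                     headers=HEADERS, timeout=10)
    r.raise_for_status()  # fail fast on 4xx/5xx (e.g. rate limiting) instead of a later KeyError
    return r.json()


if __name__ == "__main__":
    # Show how requests encodes the params, without sending anything
    prepared = requests.Request("GET", SEARCH_URL, params=search_params("language:python")).prepare()
    print(prepared.url)
    # -> https://api.github.com/search/repositories?q=language%3Apython&sort=stars&per_page=30
```

Calling `search_repositories("language:python")` should return the same kind of dict as `response_dict` in the main script; GitHub allows `per_page` values up to 100.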

r.json() is the JSON decoder built into Requests: it parses the response body into ordinary Python dicts and lists.
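`r.json()` essentially parses the response text as JSON (roughly `json.loads(r.text)`). The sketch below decodes a tiny hand-written sample shaped like a search response (only a few real field names, made-up values) so it runs offline; the helper `names_and_stars` is hypothetical. Note that a JSON `null` description arrives in Python as `None`.

```python
import json

# A trimmed, made-up sample with the same shape as a GitHub search response.
SAMPLE_BODY = """
{
  "total_count": 2,
  "items": [
    {"name": "demo-one", "owner": {"login": "example"}, "stargazers_count": 120,
     "html_url": "https://github.com/example/demo-one", "description": "First demo"},
    {"name": "demo-two", "owner": {"login": "example"}, "stargazers_count": 45,
     "html_url": "https://github.com/example/demo-two", "description": null}
  ]
}
"""


def names_and_stars(response_dict):
    """Return parallel lists of repository names and star counts."""
    items = response_dict["items"]
    names = [item["name"] for item in items]
    stars = [item["stargazers_count"] for item in items]
    return names, stars


# r.json() would hand us a dict like this; here we decode the sample text ourselves.
response_dict = json.loads(SAMPLE_BODY)
names, stars = names_and_stars(response_dict)
print(response_dict["total_count"], names, stars)  # 2 ['demo-one', 'demo-two'] [120, 45]
```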

Pygal is a simple, easy-to-use charting library. It builds charts in an object-oriented way and can easily output them in several formats, including PNG and SVG.

GitHub hall-of-fame analysis:

=======================================

First, here is the ranking of the most-starred Python projects:

For each project we list its name, owner, star count, repository URL, and description; key projects are marked in bold.
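The entries follow a fixed Name / Owner / Stars / Repository / Description layout. Here is a sketch of how such a listing could be printed from the decoded response (our own hypothetical helper `format_repo`, not from the original post); the fields map to the API keys `name`, `owner.login`, `stargazers_count`, `html_url` and `description`. A repository without a description yields `None`, which prints as "Description: None".

```python
def format_repo(repo):
    """Format one item from the search response's 'items' list."""
    lines = [
        f"Name: {repo['name']}",
        f"Owner: {repo['owner']['login']}",
        f"Stars: {repo['stargazers_count']}",
        f"Repository: {repo['html_url']}",
        # description is None when the repository has no description set
        f"Description: {repo['description']}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample item; with real data, loop over response_dict['items'].
    sample = {
        "name": "demo-one",
        "owner": {"login": "example"},
        "stargazers_count": 120,
        "html_url": "https://github.com/example/demo-one",
        "description": None,
    }
    print(format_repo(sample))
```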

Name: awesome-python

Owner: vinta

Stars: 72286

Repository: https://github.com/vinta/awesome-python

Description: The big Python collection from the famous awesome series; we have a detailed introduction:

Introduction to awesome-python

Name: system-design-primer

Owner: donnemartin

Stars: 72004

Repository: https://github.com/donnemartin/system-design-primer

Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Name: models

Owner: tensorflow

Stars: 57007

Repository: https://github.com/tensorflow/models

Description: Important models implemented in TensorFlow

Name: youtube-dl

Owner: ytdl-org

Stars: 54830

Repository: https://github.com/ytdl-org/youtube-dl

Description: Command-line program to download videos from YouTube.com and other video sites

Name: thefuck

Owner: nvbn

Stars: 46347

Repository: https://github.com/nvbn/thefuck

Description: Magnificent app which corrects your previous console command.

Name: flask

Owner: pallets

Stars: 46128

Repository: https://github.com/pallets/flask

Description: The Python micro framework for building web applications.

Name: keras

Owner: keras-team

Stars: 43736

Repository: https://github.com/keras-team/keras

Description: An important deep learning framework

Name: django

Owner: django

Stars: 43716

Repository: https://github.com/django/django

Description: The Web framework for perfectionists with deadlines.

Name: httpie

Owner: jakubroztocil

Stars: 42871

Repository: https://github.com/jakubroztocil/httpie

Description: As easy as httpie /aitch-tee-tee-pie/ Modern command line HTTP client – user-friendly curl alternative with intuitive UI, JSON support, syntax highlighting, wget-like downloads, extensions, etc. https://twitter.com/clihttp

Name: requests

Owner: psf

Stars: 39952

Repository: https://github.com/psf/requests

Description: Python HTTP Requests for Humans; commonly used in scraping code

Name: ansible

Owner: ansible

Stars: 39060

Repository: https://github.com/ansible/ansible

Description: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy. Avoid writing scripts or custom code to deploy and update your applications — automate in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com/ansible/

Name: scikit-learn

Owner: scikit-learn

Stars: 36784

Repository: https://github.com/scikit-learn/scikit-learn

Description: An important machine learning framework.

Name: big-list-of-naughty-strings

Owner: minimaxir

Stars: 33016

Repository: https://github.com/minimaxir/big-list-of-naughty-strings

Description: The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

Name: shadowsocks

Owner: shadowsocks

Stars: 31137

Repository: https://github.com/shadowsocks/shadowsocks

Description: None

Name: XX-Net

Owner: XX-net

Stars: 28545

Repository: https://github.com/XX-net/XX-Net

Description: a web proxy tool

Name: face_recognition

Owner: ageitgey

Stars: 27552

Repository: https://github.com/ageitgey/face_recognition

Description: The world's simplest face recognition API for Python and the command line

Name: you-get

Owner: soimort

Stars: 26672

Repository: https://github.com/soimort/you-get

Description: :arrow_double_down: Dumb downloader that scrapes the web

Name: cpython

Owner: python

Stars: 26326

Repository: https://github.com/python/cpython

Description: The Python programming language

Name: Algorithm_Interview_Notes-Chinese

Owner: imhuay

Stars: 26204

Repository: https://github.com/imhuay/Algorithm_Interview_Notes-Chinese

Description: 2018/2019/campus recruiting/spring recruiting/fall recruiting/algorithms/Machine Learning/Deep Learning/Natural Language Processing (NLP)/C/C++/Python/interview notes

Name: home-assistant

Owner: home-assistant

Stars: 26059

Repository: https://github.com/home-assistant/home-assistant

Description: :house_with_garden: Open source home automation that puts local control and privacy first

Name: certbot

Owner: certbot

Stars: 25551

Repository: https://github.com/certbot/certbot

Description: Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.

Name: 100-Days-Of-ML-Code

Owner: Avik-Jain

Stars: 25417

Repository: https://github.com/Avik-Jain/100-Days-Of-ML-Code

Description: Master machine learning in 100 days; we have covered it before:

Master Machine Learning in 100 Days

Name: CppCoreGuidelines

Owner: isocpp

Stars: 24036

Repository: https://github.com/isocpp/CppCoreGuidelines

Description: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

Name: Deep-Learning-Papers-Reading-Roadmap

Owner: floodsung

Stars: 23848

Repository: https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap

Description: Deep Learning papers reading roadmap for anyone who are eager to learn this amazing tech!

Name: tldr

Owner: tldr-pages

Stars: 23308

Repository: https://github.com/tldr-pages/tldr

Description: :books: Simplified and community-driven man pages

Name: faceswap

Owner: deepfakes

Stars: 22064

Repository: https://github.com/deepfakes/faceswap

Description: Deepfakes Software For All

Name: sentry

Owner: getsentry

Stars: 21932

Repository: https://github.com/getsentry/sentry

Description: Sentry is cross-platform application monitoring, with a focus on error reporting.

Name: python-patterns

Owner: faif

Stars: 21797

Repository: https://github.com/faif/python-patterns

Description: A collection of design patterns/idioms in Python

Name: Detectron

Owner: facebookresearch

Stars: 21607

Repository: https://github.com/facebookresearch/Detectron

Description: FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Name: pandas

Owner: pandas-dev

Stars: 21019

Repository: https://github.com/pandas-dev/pandas

Description: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Code source:

https://blog.csdn.net/qq_33583069/article/details/89078973