Crawling the Most-Starred Python Projects on GitHub in 20 Lines of Code

  • October 11, 2019
  • Notes

A programmer who doesn't know GitHub is not a good programmer.

-- Lu Xun

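Legend has it that GitHub holds a treasure called the star. Before his execution, the Pirate King Roger said he had left all the stars in one place, the final destination of the great Python: machine learning. Countless people jumped into the deep pit of machine learning because of it, and so the age of artificial intelligence began. While everyone else was busy scrambling for stars, plenty of talented people had already collected them by the bucketload.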

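We will use a very simple script, only 20 lines long, to crawl a leaderboard of highly starred repositories and see which well-known projects make the list.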

Code:

=======================================

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# Search GitHub for Python repositories, sorted by star count
URL = 'https://api.github.com/search/repositories?q=language:python&sort=stars'

r = requests.get(URL)
print("Status code:", r.status_code)

# Decode the JSON response body into a dictionary
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Collect the name and star count of each returned repository
repo_dicts = response_dict['items']
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Draw a bar chart of the star counts and save it as an SVG file
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Most-Starred Python Projects'
chart.x_labels = names
chart.add('', stars)
chart.render_to_file('python_repos.svg')

Code walkthrough:

=======================================

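requests is a simple, easy-to-use HTTP library written in Python. requests.get() sends an HTTP GET request to the target URL and returns a response object.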
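As a minimal sketch of how such a request might be made a little more defensively, the example below passes the query through `params` instead of hand-building the URL, sets a timeout, and raises on an HTTP error status. The helper names `build_search_params` and `fetch_top_repos` and the default values are assumptions for illustration; they are not part of the original script.

```python
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_params(language="python", sort="stars", per_page=30):
    """Return query parameters for a repository search (hypothetical helper)."""
    return {
        "q": f"language:{language}",
        "sort": sort,
        "order": "desc",
        "per_page": per_page,
    }


def fetch_top_repos(language="python"):
    """Send the GET request and return the decoded JSON body (hypothetical helper)."""
    import requests  # imported here so build_search_params works without requests installed

    r = requests.get(SEARCH_URL, params=build_search_params(language), timeout=10)
    r.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return r.json()
```

Calling `fetch_top_repos()` would then return the same kind of dictionary that the script stores in `response_dict`.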

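In the requests module, r.json() is the built-in JSON decoder: it parses the response body and returns the corresponding Python object (here, a dictionary).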
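To illustrate what the decoded dictionary looks like, the sketch below runs the standard library's `json.loads` over a hand-written sample, which is roughly what `r.json()` does with the response body. The sample is trimmed to the fields the script reads; the two repositories and their star counts are taken from the ranking in this article, while the `total_count` value and the helper name `extract_names_and_stars` are placeholders.

```python
import json

# Hand-written sample in the shape of a search response (only the fields used here).
SAMPLE_BODY = """
{
  "total_count": 2,
  "items": [
    {"name": "awesome-python", "stargazers_count": 72286},
    {"name": "system-design-primer", "stargazers_count": 72004}
  ]
}
"""


def extract_names_and_stars(response_dict):
    """Return parallel lists of repository names and star counts."""
    names, stars = [], []
    for repo_dict in response_dict["items"]:
        names.append(repo_dict["name"])
        stars.append(repo_dict["stargazers_count"])
    return names, stars


response_dict = json.loads(SAMPLE_BODY)
names, stars = extract_names_and_stars(response_dict)
print(names)
print(stars)
```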

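Pygal is a simple, easy-to-use charting library. It builds charts in an object-oriented way and makes it easy to produce them in several formats, including SVG (its native output) and PNG (which needs additional dependencies installed).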
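One possible refinement, sketched below, is to pass pygal a list of dictionaries instead of bare numbers, so that each bar carries a tooltip (`label`) and a link (`xlink`) in addition to its `value`. The helper names `to_plot_dicts` and `render_chart` and the output file name are assumptions; the `str()` call guards against repositories whose description is `None`.

```python
def to_plot_dicts(repo_dicts):
    """Build one pygal value dictionary per repository (hypothetical helper)."""
    return [
        {
            "value": repo["stargazers_count"],
            # str() turns a missing description (None) into the text "None"
            "label": str(repo["description"]),
            "xlink": repo["html_url"],
        }
        for repo in repo_dicts
    ]


def render_chart(repo_dicts, path="python_repos.svg"):
    """Render a bar chart of star counts to an SVG file (hypothetical helper)."""
    import pygal  # imported here so to_plot_dicts works without pygal installed

    chart = pygal.Bar(x_label_rotation=45, show_legend=False)
    chart.title = "Most-Starred Python Projects"
    chart.x_labels = [repo["name"] for repo in repo_dicts]
    chart.add("", to_plot_dicts(repo_dicts))
    chart.render_to_file(path)


# Sample entry taken from the ranking in this article.
sample = [
    {
        "name": "shadowsocks",
        "stargazers_count": 31137,
        "description": None,
        "html_url": "https://github.com/shadowsocks/shadowsocks",
    }
]
plot_dicts = to_plot_dicts(sample)
```

Passing the script's `repo_dicts` to `render_chart` would write the chart to disk; hovering over a bar in the SVG should then show the project description.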

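GitHub leaderboard analysis:

=======================================

First, here is the ranking of the most-starred Python projects:

The results are listed by project name, owner, star count, repository URL, and description. Key projects are marked in bold.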

Name: awesome-python

Owner: vinta

Stars: 72286

Repository: https://github.com/vinta/awesome-python

Description: The Python collection in the well-known awesome series. We have a detailed introduction:

Introduction to awesome-python

Name: system-design-primer

Owner: donnemartin

Stars: 72004

Repository: https://github.com/donnemartin/system-design-primer

Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Name: models

Owner: tensorflow

Stars: 57007

Repository: https://github.com/tensorflow/models

Description: Important models built with TensorFlow

Name: youtube-dl

Owner: ytdl-org

Stars: 54830

Repository: https://github.com/ytdl-org/youtube-dl

Description: Command-line program to download videos from YouTube.com and other video sites

Name: thefuck

Owner: nvbn

Stars: 46347

Repository: https://github.com/nvbn/thefuck

Description: Magnificent app which corrects your previous console command.

Name: flask

Owner: pallets

Stars: 46128

Repository: https://github.com/pallets/flask

Description: The Python micro framework for building web applications.

Name: keras

Owner: keras-team

Stars: 43736

Repository: https://github.com/keras-team/keras

Description: An important deep learning framework

Name: django

Owner: django

Stars: 43716

Repository: https://github.com/django/django

Description: The Web framework for perfectionists with deadlines.

Name: httpie

Owner: jakubroztocil

Stars: 42871

Repository: https://github.com/jakubroztocil/httpie

Description: As easy as httpie /aitch-tee-tee-pie/ Modern command line HTTP client – user-friendly curl alternative with intuitive UI, JSON support, syntax highlighting, wget-like downloads, extensions, etc. https://twitter.com/clihttp

Name: requests

Owner: psf

Stars: 39952

Repository: https://github.com/psf/requests

Description: Python HTTP Requests for Humans; commonly used in crawler code

Name: ansible

Owner: ansible

Stars: 39060

Repository: https://github.com/ansible/ansible

Description: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy. Avoid writing scripts or custom code to deploy and update your applications — automate in a language that approaches plain English, using SSH, with no agents to install on remote systems. https://docs.ansible.com/ansible/

Name: scikit-learn

Owner: scikit-learn

Stars: 36784

Repository: https://github.com/scikit-learn/scikit-learn

Description: An important machine learning framework.

Name: big-list-of-naughty-strings

Owner: minimaxir

Stars: 33016

Repository: https://github.com/minimaxir/big-list-of-naughty-strings

Description: The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

Name: shadowsocks

Owner: shadowsocks

Stars: 31137

Repository: https://github.com/shadowsocks/shadowsocks

Description: None

Name: XX-Net

Owner: XX-net

Stars: 28545

Repository: https://github.com/XX-net/XX-Net

Description: a web proxy tool

Name: face_recognition

Owner: ageitgey

Stars: 27552

Repository: https://github.com/ageitgey/face_recognition

Description: The world's simplest face recognition API for Python and the command line

Name: you-get

Owner: soimort

Stars: 26672

Repository: https://github.com/soimort/you-get

Description: :arrow_double_down: Dumb downloader that scrapes the web

Name: cpython

Owner: python

Stars: 26326

Repository: https://github.com/python/cpython

Description: The Python programming language

Name: Algorithm_Interview_Notes-Chinese

Owner: imhuay

Stars: 26204

Repository: https://github.com/imhuay/Algorithm_Interview_Notes-Chinese

Description: 2018/2019/campus recruiting/spring recruiting/fall recruiting/algorithms/Machine Learning/Deep Learning/Natural Language Processing (NLP)/C/C++/Python/interview notes

Name: home-assistant

Owner: home-assistant

Stars: 26059

Repository: https://github.com/home-assistant/home-assistant

Description: :house_with_garden: Open source home automation that puts local control and privacy first

Name: certbot

Owner: certbot

Stars: 25551

Repository: https://github.com/certbot/certbot

Description: Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.

Name: 100-Days-Of-ML-Code

Owner: Avik-Jain

Stars: 25417

Repository: https://github.com/Avik-Jain/100-Days-Of-ML-Code

Description: Master machine learning in one hundred days. We have covered it before:

Master Machine Learning in One Hundred Days

Name: CppCoreGuidelines

Owner: isocpp

Stars: 24036

Repository: https://github.com/isocpp/CppCoreGuidelines

Description: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

Name: Deep-Learning-Papers-Reading-Roadmap

Owner: floodsung

Stars: 23848

Repository: https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap

Description: Deep Learning papers reading roadmap for anyone who are eager to learn this amazing tech!

Name: tldr

Owner: tldr-pages

Stars: 23308

Repository: https://github.com/tldr-pages/tldr

Description: :books: Simplified and community-driven man pages

Name: faceswap

Owner: deepfakes

Stars: 22064

Repository: https://github.com/deepfakes/faceswap

Description: Deepfakes Software For All

Name: sentry

Owner: getsentry

Stars: 21932

Repository: https://github.com/getsentry/sentry

Description: Sentry is cross-platform application monitoring, with a focus on error reporting.

Name: python-patterns

Owner: faif

Stars: 21797

Repository: https://github.com/faif/python-patterns

Description: A collection of design patterns/idioms in Python

Name: Detectron

Owner: facebookresearch

Stars: 21607

Repository: https://github.com/facebookresearch/Detectron

Description: FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Name: pandas

Owner: pandas-dev

Stars: 21019

Repository: https://github.com/pandas-dev/pandas

Description: Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
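The entries above follow a fixed layout, and a listing like this could be printed with a small formatter such as the sketch below. It assumes each item carries the `name`, `owner.login`, `stargazers_count`, `html_url`, and `description` fields of a GitHub search result; the helper name `format_repo` is hypothetical, and the sample entry is copied from the list.

```python
def format_repo(repo_dict):
    """Format one search-result item in the layout used by the list above."""
    lines = [
        f"Name: {repo_dict['name']}",
        f"Owner: {repo_dict['owner']['login']}",
        f"Stars: {repo_dict['stargazers_count']}",
        f"Repository: {repo_dict['html_url']}",
        f"Description: {repo_dict['description']}",
    ]
    return "\n".join(lines)


# Sample entry copied from the ranking in this article.
sample = {
    "name": "cpython",
    "owner": {"login": "python"},
    "stargazers_count": 26326,
    "html_url": "https://github.com/python/cpython",
    "description": "The Python programming language",
}
text = format_repo(sample)
print(text)
```

Looping over `response_dict['items']` and printing `format_repo(item)` for each one would reproduce the whole ranking.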

Code source:

https://blog.csdn.net/qq_33583069/article/details/89078973