28 Jupyter Notebook Tips, Tricks, and Shortcuts (Part 1)

October 29, 2020

Original article: //www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/

Today's post rounds up 28 practical Jupyter tips. This is part one and covers the first 14; hopefully they help you make Jupyter a real asset in your work.

Jupyter Notebook

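Jupyter Notebook, formerly known as the IPython Notebook, is a flexible tool that helps you create readable analyses, because you can keep code, images, comments, formulas, and plots together. In this post, we've collected some of the most popular Jupyter Notebook tips to help you quickly become a Jupyter power user.

"This post is based on a post from Alex's blog^[1]. We've expanded on it and will keep updating it over time; if you have any suggestions, please let us know^[2]. Thanks to Alex for letting us adapt his article here."

Jupyter is quite extensible, supports many programming languages, and is easily hosted on your computer or on almost any server: you only need ssh or http access. Best of all, it's completely free. Now let's dive into these 28 (and counting!) Jupyter Notebook tips.

The Jupyter project grew out of the IPython project as it evolved into a notebook that supports multiple languages, hence its historical name, the IPython Notebook. The name Jupyter is an indirect acronym of its three core languages, "JU"lia, "PYT"hon, and "R", and is also inspired by the planet Jupiter.

When working with Python in Jupyter, the IPython kernel is used, which gives us easy access to IPython features from within our notebooks (more on that later!).

Below are 28 tips and tricks to make working with Jupyter easier.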

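1. Keyboard shortcuts

As any power user knows, keyboard shortcuts save you lots of time. Jupyter keeps a list of keyboard shortcuts under the menu at the top: Help > Keyboard Shortcuts, or press H in command mode (more on that later). It's worth checking this each time you update Jupyter, as new shortcuts are added all the time.

Another way to access keyboard shortcuts, and a handy way to learn them, is the command palette: Cmd + Shift + P (or Ctrl + Shift + P on Linux and Windows). This dialog box lets you run any command by name, which is useful if you don't know the keyboard shortcut for an action or if the action has no shortcut at all. It works much like Spotlight search on a Mac, and once you start using it you'll wonder how you lived without it.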

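Some of my favorites:

Esc takes you into command mode, where you can navigate around your notebook with the arrow keys.
While in command mode:

A inserts a new cell above the current cell, and B inserts a new cell below.
M changes the current cell to Markdown, and Y changes it back to code.
D + D (press the key twice) deletes the current cell.

Enter takes you from command mode back into edit mode for the current cell.
Shift + Tab shows the documentation (docstring) for the object you have just typed in a code cell; keep pressing the shortcut to cycle through a few documentation modes.
Ctrl + Shift + - splits the current cell in two at the cursor position.
Esc + F finds and replaces within your code, but not in the outputs.
Esc + O toggles the current cell's output.
Select multiple cells:

Shift + J or Shift + Down selects the next cell downward. You can also select cells upward with Shift + K or Shift + Up.
Once several cells are selected, you can delete / copy / cut / paste / run them as a batch. This is helpful when you need to move parts of a notebook.
You can also use Shift + M to merge multiple cells.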

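2. Pretty display of variables

The first part of this is widely known. When a Jupyter cell ends with the name of a variable or with a statement whose output is not assigned, Jupyter displays that value without the need for a print statement. This is especially useful with Pandas DataFrames, since the output is neatly formatted as a table.

Less well known is that you can change the ast_node_interactivity kernel option so that Jupyter does this for every variable or statement on its own line, letting you see the values of several statements at once.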

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from pydataset import data
quakes = data('quakes')
quakes.head()
quakes.tail()

lat	long	depth	mag	stations
1	-20.42	181.62	562	4.8	41
2	-20.62	181.03	650	4.2	15
3	-26.00	184.10	42	5.4	43
4	-17.97	181.66	626	4.1	19
5	-20.42	181.96	649	4.0	11

lat	long	depth	mag	stations
996	-25.93	179.54	470	4.4	22
997	-12.28	167.06	248	4.7	35
998	-20.13	184.20	244	4.5	34
999	-17.40	187.80	40	4.5	14
1000	-21.59	170.56	165	6.0	119
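
To see why this option changes what gets displayed, here is a rough model of the decision, written with Python's standard ast module. It is a simplified sketch rather than IPython's actual implementation: the helper name displayed_expressions is hypothetical, the load() call in the sample cell is a placeholder that is only parsed and never executed, and only the "last_expr" (default) and "all" modes are modeled.

```python
import ast


def displayed_expressions(cell_source, mode="last_expr"):
    """Return the source of the top-level expressions a cell would echo.

    Hypothetical helper: a simplified model of two ast_node_interactivity
    modes, not IPython's real code.
    """
    tree = ast.parse(cell_source)
    if mode == "all":
        # Every top-level expression statement is displayed
        chosen = [node for node in tree.body if isinstance(node, ast.Expr)]
    elif mode == "last_expr":
        # Only the final statement is displayed, and only if it is an expression
        last = tree.body[-1] if tree.body else None
        chosen = [last] if isinstance(last, ast.Expr) else []
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return [ast.get_source_segment(cell_source, node) for node in chosen]


# The cell is only parsed, never executed, so load() need not exist
cell = "quakes = load()\nquakes.head()\nquakes.tail()"
print(displayed_expressions(cell))              # default: last expression only
print(displayed_expressions(cell, mode="all"))  # every expression statement
```

Note that the assignment on the first line is never echoed in either mode, because it is a statement and not an expression.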

If you want this behavior in all Jupyter instances (Notebook and Console), create the file ~/.ipython/profile_default/ipython_config.py with the lines below.

c = get_config()

# Run all nodes interactively
c.InteractiveShell.ast_node_interactivity = "all"

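3. Easy links to documentation

Inside the built-in Help menu you'll find handy links to the online documentation for common libraries, including NumPy, Pandas, SciPy, and Matplotlib.

Also remember that prepending ? to a library, method, or variable gives you access to its documentation for a quick syntax reference.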

?str.replace

Docstring:
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new.  If the optional argument count is
given, only the first count occurrences are replaced.
Type:      method_descriptor
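
Outside of IPython, the same docstring can be read with the standard library. The snippet below is a small sketch using inspect.getdoc; it returns only the docstring text, without the extra fields (such as Type) that the ? help view adds.

```python
import inspect

# Fetch the cleaned-up docstring of a method, as a plain string
doc = inspect.getdoc(str.replace)

# The first line is usually a one-sentence summary
summary = doc.splitlines()[0] if doc else ""

print(doc)
```

The exact wording of the docstring depends on your Python version, so it may differ from the output shown above.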

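4. Plotting in notebooks

There are several options for generating plots in your notebook.

matplotlib^[3] (the de facto standard), activated with %matplotlib inline. Here is a Dataquest Matplotlib tutorial^[4].
%matplotlib notebook provides interactivity but can be a little slow, since rendering is done server-side.
Seaborn^[5] is built on top of Matplotlib and makes it easier to build more attractive plots. In older Seaborn releases, simply importing it made your matplotlib plots "prettier" without any code changes; since version 0.8 you apply the Seaborn style explicitly, for example with seaborn.set().
mpld3^[6] provides an alternative renderer (using d3) for matplotlib code. Quite nice, though incomplete.
bokeh^[7] is a better option for building interactive plots.
plot.ly^[8] can generate nice plots. It used to be a paid-only service but has since been open sourced.
Altair^[9] is a relatively new visualization library for Python. It's easy to use and makes great-looking plots, but its customization abilities are not as powerful as Matplotlib's.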

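5. IPython magic commands

The %matplotlib inline you saw above is an example of an IPython magic command. Because Jupyter's Python kernel is the IPython kernel, all IPython magic commands are available in Jupyter, and they can make your life a lot easier.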

# This lists all available magic commands
%lsmagic

Available line magics:
%alias %alias_magic %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %profile %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode 
Available cell magics:
%%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%js %%latex %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile
Automagic is ON, % prefix IS NOT needed for line magics.

I recommend browsing the documentation for all IPython magic commands^[10]; you'll undoubtedly find some that work for you. A few of my favorites are below:

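6. IPython Magic – %env: Set environment variables

You can manage environment variables for your notebook without restarting the Jupyter server process. Some libraries (like theano) use environment variables to control performance, and %env is the most convenient way to set them.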

# Running %env without any arguments lists all environment variables
# The line below sets an environment variable
%env OMP_NUM_THREADS=4

env: OMP_NUM_THREADS=4
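
For comparison, the same thing can be done in plain Python through os.environ. This is a minimal sketch; the variable name UNSET_EXAMPLE_VAR is made up purely to show a lookup with a fallback.

```python
import os

# Equivalent of `%env OMP_NUM_THREADS=4`; environment values are always strings
os.environ["OMP_NUM_THREADS"] = "4"

# Equivalent of a bare `%env`: a snapshot of every variable as a dict
all_variables = dict(os.environ)
print(all_variables["OMP_NUM_THREADS"])

# Read one variable with a fallback; UNSET_EXAMPLE_VAR is a made-up name
missing = os.environ.get("UNSET_EXAMPLE_VAR", "not set")
print(missing)
```

Libraries typically read such variables when they are first imported or initialized, so it is safest to set them before the import.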

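7. IPython Magic – %run: Execute Python code

%run can execute Python code from .py files, which is well-documented behavior. Less well known is that it can also execute other Jupyter notebooks, which can be quite useful.

Note that using %run is not the same as importing a Python module.

# This executes and shows the output from all code cells of the specified notebook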
%run ./two-histograms.ipynb
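
The difference between running and importing can be illustrated in plain Python with the standard runpy module. This is only an analogy for .py files under stated assumptions, not how %run is implemented, and the file name example_script.py is hypothetical. Running a file as a script executes its `if __name__ == "__main__":` block, which an import would skip.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "example_script.py"  # hypothetical file name
    script.write_text(
        "answer = 6 * 7\n"
        "if __name__ == '__main__':\n"
        "    ran_as_main = True\n"
    )
    # Run the file as a script; the returned dict holds the names it defined
    namespace = runpy.run_path(str(script), run_name="__main__")

print(namespace["answer"], namespace["ran_as_main"])
```

The names defined by the script come back in a dictionary here; %run makes the script's variables available in your interactive namespace instead.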

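8. IPython Magic – %load: Insert code from an external script

This replaces the contents of the current cell with an external script. You can use either a file on your computer or a URL as the source.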

# Before Running
%load ./hello_world.py

# After Running
# %load ./hello_world.py
if __name__ == "__main__":
    print("Hello World!")

Hello World!

9. IPython Magic – %store: Pass variables between notebooks

The %store command lets you pass variables between two different notebooks.

data = 'this is the string I want to pass to different notebook'
%store data
del data # This has deleted the variable

Stored 'data' (str)

Now, in another notebook...

%store -r data
print(data)

this is the string I want to pass to different notebook
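
The general idea, persisting a variable to disk in one session and reading it back in another, can be sketched with the standard pickle module. This does not reproduce how %store itself is implemented; the helpers store_value and restore_value are hypothetical names, and a temporary directory stands in for a location both notebooks can reach.

```python
import pickle
import tempfile
from pathlib import Path


def store_value(name, value, directory):
    """Persist one variable under its name (hypothetical helper)."""
    path = Path(directory) / f"{name}.pkl"
    with path.open("wb") as handle:
        pickle.dump(value, handle)
    return path


def restore_value(name, directory):
    """Load a variable previously saved by store_value."""
    with (Path(directory) / f"{name}.pkl").open("rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as shared_dir:
    data = "this is the string I want to pass to different notebook"
    store_value("data", data, shared_dir)
    del data  # the variable is gone from this namespace
    restored = restore_value("data", shared_dir)  # as if in another notebook

print(restored)
```

Only picklable objects can be passed this way, and pickle files should only be loaded from sources you trust.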

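10. IPython Magic – %who: List all variables

Run without arguments, %who lists all variables that exist in the global scope. Passing a parameter such as str lists only variables of that type.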

one = "for the money"
two = "for the show"
three = "to get ready now go cat go"
%who str

one three two
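
A plain-Python approximation of this filtering is sketched below. The function who is a hypothetical stand-in that works on any dictionary of names to values; matching on the exact type and skipping names that start with an underscore are assumptions of this sketch.

```python
def who(namespace, kind=None):
    """List variable names, optionally only those whose type is exactly `kind`."""
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_") and (kind is None or type(value) is kind)
    )


variables = {
    "one": "for the money",
    "two": "for the show",
    "three": "to get ready now go cat go",
    "count": 3,
}
print(*who(variables, str), sep="\t")
```

In a script you could pass globals() as the namespace to inspect your own module-level variables.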

11. IPython Magic – Timing

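There are two IPython magic commands that are useful for timing: %%time and %timeit. They are especially handy when you have some slow code and you're trying to identify where the problem is.

%%time gives you timing information about a single run of the code in your cell.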

%%time
import time
for _ in range(1000):
    time.sleep(0.01) # sleep for 0.01 seconds

CPU times: user 21.5 ms, sys: 14.8 ms, total: 36.3 ms
Wall time: 11.6 s
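
The gap between CPU time and wall time in the output above comes from the fact that sleeping waits without using the processor. A small standard-library sketch of the same measurement, with a shorter loop so it finishes quickly:

```python
import time

wall_start = time.perf_counter()  # wall-clock timer
cpu_start = time.process_time()   # CPU time used by this process

for _ in range(10):
    time.sleep(0.01)  # sleeping waits without using the CPU

wall_seconds = time.perf_counter() - wall_start
cpu_seconds = time.process_time() - cpu_start
print(f"CPU time: {cpu_seconds:.4f} s, wall time: {wall_seconds:.4f} s")
```

A large wall time with a small CPU time usually points to waiting (sleep, I/O, network) rather than to heavy computation.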

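%timeit (and its cell form, %%timeit) uses Python's timeit module^[11]. It runs a statement many times (100,000 loops in the run below; when you don't pass -n, the loop count is chosen automatically rather than fixed) and, in the IPython version shown here, reports the best of three repeats.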

import numpy
%timeit numpy.random.normal(size=100)

The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.5 µs per loop
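
The "loops" and "best of 3" wording maps directly onto the timeit module's repeat function. The sketch below is an approximation under stated assumptions: it fixes the loop count at 1000 instead of choosing it automatically, and it uses random.gauss from the standard library as a stand-in workload for numpy.random.normal.

```python
import random
import timeit


def draw_samples():
    """Stand-in workload: 100 normally distributed samples."""
    return [random.gauss(0.0, 1.0) for _ in range(100)]


# Three repeats of 1000 calls each; every entry is the total time of one repeat
run_totals = timeit.repeat(draw_samples, number=1000, repeat=3)

# Report the fastest repeat, divided by the loop count
best_per_call = min(run_totals) / 1000
print(f"1000 loops, best of 3: {best_per_call * 1e6:.1f} µs per loop")
```

Taking the minimum rather than the mean reduces the influence of other processes that happened to be running during a repeat.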

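12. IPython Magic – %%writefile and %pycat: Export the contents of a cell / show the contents of an external script

The %%writefile magic saves the contents of a cell to an external file. %pycat does the opposite and shows you (in a popup) the syntax-highlighted contents of an external file.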

%%writefile pythoncode.py
import numpy
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)

Writing pythoncode.py

%pycat pythoncode.py

import numpy
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)
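
Outside a notebook, the same round trip can be sketched with pathlib. This is only a rough equivalent: it writes to a temporary directory instead of the working directory, and it reads the text back without the syntax highlighting that %pycat adds.

```python
import tempfile
from pathlib import Path

source = (
    "def append_if_not_exists(arr, x):\n"
    "    if x not in arr:\n"
    "        arr.append(x)\n"
)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "pythoncode.py"
    target.write_text(source)   # roughly what %%writefile does
    shown = target.read_text()  # plain text; %pycat also highlights it

print(shown)
```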

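13. IPython Magic – %prun: Show how much time your program spent in each function

Using %prun statement_name gives you an ordered table showing the number of times each internal function was called within the statement, the time each call took, and the cumulative time of all runs of the function.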

%prun some_useless_slow_function()

26324 function calls in 0.556 seconds 
Ordered by: internal time 
ncalls tottime percall cumtime percall filename:lineno(function)
10000 0.527 0.000 0.528 0.000 :2(append_if_not_exists)
10000 0.022 0.000 0.022 0.000 {method 'randint' of 'mtrand.RandomState' objects}
1 0.006 0.006 0.556 0.556 :6(some_useless_slow_function)
6320 0.001 0.000 0.001 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.556 0.556 :1()
1 0.000 0.000 0.556 0.556 {built-in method exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
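
A similar table can be produced outside IPython with the standard cProfile and pstats modules. The sketch below makes two substitutions so that it is self-contained and quick: random.randint stands in for numpy.random.randint, and the loop runs 2000 times instead of 10000.

```python
import cProfile
import io
import pstats
import random


def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)


def some_useless_slow_function():
    arr = []
    for _ in range(2000):
        # random.randint stands in for numpy.random.randint here
        append_if_not_exists(arr, random.randint(0, 10000))


profiler = cProfile.Profile()
profiler.enable()
some_useless_slow_function()
profiler.disable()

# Collect the report in a string, sorted by time spent inside each function
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(10)  # show at most 10 rows
report = buffer.getvalue()
print(report)
```

Sorting by "tottime" corresponds to the "Ordered by: internal time" line in the output above; sorting by "cumulative" instead highlights functions that are slow because of what they call.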

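14. IPython Magic – Debugging with %pdb

Jupyter has its own interface for The Python Debugger (`pdb`)^[12]. This makes it possible to step inside a function and investigate what is happening there.

You can view the list of available pdb commands here^[13].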

%pdb

def pick_and_take():
    picked = numpy.random.randint(0, 1000)
    raise NotImplementedError()

pick_and_take()

Automatic pdb calling has been turned ON

--------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
in ()
5 raise NotImplementedError()
6
----> 7 pick_and_take()
in pick_and_take()
3 def pick_and_take():
4 picked = numpy.random.randint(0, 1000)
----> 5 raise NotImplementedError()
6
7 pick_and_take()
NotImplementedError:

> (5)pick_and_take()
3 def pick_and_take():
4 picked = numpy.random.randint(0, 1000)
----> 5 raise NotImplementedError()
6
7 pick_and_take()

ipdb>
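
At the ipdb> prompt you can inspect the local variables of the frame that raised, for example picked. Because an interactive prompt cannot be shown in a static example, the sketch below does the equivalent inspection non-interactively by walking the exception's traceback. The helper failing_frame_locals is a hypothetical name, and random.randint stands in for numpy.random.randint.

```python
import random


def pick_and_take():
    picked = random.randint(0, 1000)
    raise NotImplementedError()


def failing_frame_locals(func):
    """Call func and, if it raises, return (function name, locals) of the
    innermost frame, which is where a post-mortem debugger would start."""
    try:
        func()
    except Exception as exc:
        tb = exc.__traceback__
        while tb.tb_next is not None:  # walk down to the frame that raised
            tb = tb.tb_next
        frame = tb.tb_frame
        return frame.f_code.co_name, dict(frame.f_locals)
    return None


name, local_vars = failing_frame_locals(pick_and_take)
print(name, local_vars)
```

In an interactive session, calling pdb.post_mortem(exc.__traceback__) inside the except block would open a debugger prompt at that same frame.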

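Translator's summary

That covers the first 14 tips. The next post continues with the remaining 14, including interacting with external shells, using LaTeX, and running on different kernels. See you in part two!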

References

[1] Alex's blog: //arogozhnikov.github.io/2016/09/10/jupyter-features.html
[2] Let us know: //twitter.com/dataquestio
[3] matplotlib: //matplotlib.org/
[4] Matplotlib tutorial: //www.dataquest.io/blog/matplotlib-tutorial/
[5] Seaborn: //seaborn.pydata.org/
[6] mpld3: //github.com/mpld3/mpld3
[7] bokeh: //bokeh.pydata.org/en/latest/
[8] plot.ly: //plot.ly/
[9] Altair: //github.com/altair-viz/altair
[10] Documentation for all IPython magic commands: //ipython.readthedocs.io/en/stable/interactive/magics.html
[11] timeit module: //docs.python.org/3.5/library/timeit.html
[12] The Python Debugger (pdb): //docs.python.org/3.5/library/pdb.html
[13] pdb debugger commands: //docs.python.org/3.5/library/pdb.html#debugger-commands
