[Pandas] Basic Operations with the Data Analysis Tool Pandas and the Visualization Tool Matplotlib

February 21, 2020
Notes

1. Introduction to Pandas

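pandas is a data analysis package for Python. Development started at AQR Capital Management in April 2008, and the project was open-sourced at the end of 2009. It is now developed and maintained by the PyData development team, which focuses on Python data packages, and it is part of the PyData project. pandas was originally built as a tool for financial data analysis, so it has strong support for time series analysis.
The name pandas comes from panel data and Python data analysis. Panel data is a term from economics for multidimensional data sets. Older pandas releases also provided a Panel data type, which was removed in pandas 0.25.
Website: http://pandas.pydata.org/
Documentation: http://pandas.pydata.org/pandas-docs/stable/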

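2. Installing Pandas

The Anaconda distribution of Python ships with pandas preinstalled, so no separate installation is needed.
To install through the Anaconda interface: open Anaconda Navigator, select the development environment, find the pandas-related packages under "Not installed", and check them to install.
Anaconda install command: conda install pandas
PyPI install command: pip install pandas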

3. Pandas Data Structures

(1) Pandas import convention

from pandas import Series, DataFrame
import pandas as pd

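(2) Series
A Series is an object similar to a one-dimensional array. It consists of a set of data (of any NumPy data type) and an associated set of data labels (the index). A simple Series can be created from a set of data alone.
1) Creating a Series from a one-dimensional array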

>>> import numpy as np
>>> import pandas as pd
>>> from pandas import Series, DataFrame

>>> arr = np.array([1, 2, 3, 4])
>>> series01 = Series(arr)
>>> series01
0    1
1    2
2    3
3    4
dtype: int32
>>> series01.index
RangeIndex(start=0, stop=4, step=1)
>>> series01.values
array([1, 2, 3, 4])
>>> series01.dtype
dtype('int32')

>>> # When no index is given, an integer index from 0 to N-1 (N is the length of the data) is created automatically
>>> series02 = Series([34.5, 56.78, 45.67])
>>> series02
0    34.50
1    56.78
2    45.67
dtype: float64
>>> # The default index can be replaced by assignment
>>> series02.index = ['product1', 'product2', 'product3']
>>> series02
product1    34.50
product2    56.78
product3    45.67
dtype: float64

>>> # An explicit label index can be passed through the index parameter
>>> series03 = Series([98, 56, 88, 45], index=['Chinese', 'Math', 'English', 'PE'])
>>> series03
Chinese    98
Math       56
English    88
PE         45
dtype: int64
>>> series03.index
Index(['Chinese', 'Math', 'English', 'PE'], dtype='object')
>>> series03.values
array([98, 56, 88, 45], dtype=int64)

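2) Creating a Series from a dictionary
A Series can be viewed as a fixed-length ordered dictionary, that is, a mapping from index values to data values, so it can be created directly from a dictionary.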

>>> a_dict = {'20071001': 6798.98, '20071002': 34556.89, '20071003': 3748758.88}
>>> # The dictionary keys become the Series index and the dictionary values become its values
>>> series04 = Series(a_dict)
>>> series04.index
Index(['20071001', '20071002', '20071003'], dtype='object')
>>> series04
20071001       6798.98
20071002      34556.89
20071003    3748758.88
dtype: float64
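Because a Series behaves like a fixed-length ordered dictionary, several dictionary-style operations carry over to it. The following is a minimal sketch that reuses the data above; the variable names and the key '20071009' are made up for illustration, and that key is deliberately absent from the data.

```python
import pandas as pd

daily_totals = pd.Series({"20071001": 6798.98,
                          "20071002": 34556.89,
                          "20071003": 3748758.88})

# Membership is tested against the index labels, not the values.
has_day = "20071002" in daily_totals

# get() returns a default instead of raising KeyError for an absent label
# ("20071009" is a hypothetical key that is not in the data).
missing = daily_totals.get("20071009", 0.0)

# Convert back to a plain Python dict (labels become keys).
as_dict = daily_totals.to_dict()
```

The get call is a convenient way to look up a label that may not exist without wrapping the access in try/except.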

3) Applying NumPy array operations to a Series
Getting a value by index:

>>> series04['20071001']    # by label
6798.98
>>> series04.iloc[0]        # by integer position
6798.98

All NumPy array operations remain available on a Series, and the mapping between index and value is preserved when a Series goes through an array operation.

>>> series04
20071001       6798.98
20071002      34556.89
20071003    3748758.88
dtype: float64
>>> series04[series04 > 10000]
20071002      34556.89
20071003    3748758.88
dtype: float64
>>> series04 / 100
20071001       67.9898
20071002      345.5689
20071003    37487.5888
dtype: float64
>>> series01 = Series([1, 2, 3, 4])
>>> np.exp(series01)
0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64

4) Detecting missing values in a Series

>>> scores = Series({"Tom": 89, "John": 88, "Merry": 96, "Max": 65})
>>> scores
Tom      89
John     88
Merry    96
Max      65
dtype: int64
>>> new_index = ['Tom', 'Max', 'Joe', 'John', 'Merry']
>>> scores = Series(scores, index=new_index)
>>> # 'Joe' has no score, so his entry is NaN (not a number), which pandas uses to mark a missing or NA value
>>> scores
Tom      89.0
Max      65.0
Joe       NaN
John     88.0
Merry    96.0
dtype: float64

The pandas functions isnull and notnull detect missing values in a Series. Both return a Boolean Series.

>>> pd.isnull(scores)
Tom      False
Max      False
Joe       True
John     False
Merry    False
dtype: bool
>>> pd.notnull(scores)
Tom       True
Max       True
Joe      False
John      True
Merry     True
dtype: bool
>>> scores[pd.isnull(scores)]     # keep only the missing entries
Joe   NaN
dtype: float64
>>> scores[pd.notnull(scores)]    # keep only the non-missing entries
Tom      89.0
Max      65.0
John     88.0
Merry    96.0
dtype: float64

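5) Automatic alignment of Series
Arithmetic between different Series automatically aligns the data by index label. A label that appears in only one of the Series produces NaN in the result.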

>>> product_num = Series([23, 45, 67, 89], index=['p3', 'p1', 'p2', 'p5'])
>>> product_price_table = Series([9.98, 2.34, 4.56, 5.67, 8.78], index=['p1', 'p2', 'p3', 'p4', 'p5'])
>>> product_sum = product_num * product_price_table
>>> # 'p4' has a price but no quantity, so its product is NaN
>>> product_sum
p1    449.10
p2    156.78
p3    104.88
p4       NaN
p5    781.42
dtype: float64
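When a NaN for unmatched labels is not wanted, the method form of the operator accepts a fill_value. The following is a minimal sketch, assuming a missing quantity should count as 0; the variable names mirror the example above, and whether 0 is the right default depends on the data.

```python
import pandas as pd

product_num = pd.Series([23, 45, 67, 89], index=["p3", "p1", "p2", "p5"])
product_price = pd.Series([9.98, 2.34, 4.56, 5.67, 8.78],
                          index=["p1", "p2", "p3", "p4", "p5"])

# mul() aligns on the index like `*`, but substitutes fill_value for a label
# that is missing on one side, so 'p4' becomes 5.67 * 0 instead of NaN.
product_sum = product_price.mul(product_num, fill_value=0)
```

The same fill_value parameter is available on add, sub, and div.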

6) The name attribute of a Series and of its index
Both the Series object itself and its index have a name attribute, which can be set by assignment.

>>> product_num.name = 'ProductNums'
>>> product_num.index.name = 'ProductType'
>>> product_num
ProductType
p3    23
p1    45
p2    67
p5    89
Name: ProductNums, dtype: int64

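(3) DataFrame
A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can hold a different value type (numeric, string, Boolean, and so on). A DataFrame has both a row index and a column index, and it can be viewed as a dictionary of Series.
1) Creating a DataFrame from a two-dimensional array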

>>> df01 = DataFrame([['Tom', 'Merry', 'John'], [76, 98, 100]])
>>> df01
     0      1     2
0  Tom  Merry  John
1   76     98   100
>>> df02 = DataFrame([['Tom', 76], ['Merry', 98], ['John', 100]])
>>> df02
       0    1
0    Tom   76
1  Merry   98
2   John  100
>>> # np.array stores mixed strings and integers as strings, so 'score' holds text here
>>> arr = np.array([['Tom', 76], ['Merry', 98], ['John', 100]])
>>> df03 = DataFrame(arr, columns=['name', 'score'])
>>> df03
    name score
0    Tom    76
1  Merry    98
2   John   100
>>> # index sets custom row labels, columns sets custom column labels
>>> df04 = DataFrame(arr, index=['one', 'two', 'three'], columns=['name', 'score'])
>>> df04
        name score
one      Tom    76
two    Merry    98
three   John   100

2) Creating a DataFrame from a dictionary

>>> data = {"apart": ['1001', '1002', '1003', '1001'], "profits": [567.87, 987.87, 873, 498.87], "year": [2001, 2001, 2001, 2000]}
>>> df = DataFrame(data)
>>> df
  apart  profits  year
0  1001   567.87  2001
1  1002   987.87  2001
2  1003   873.00  2001
3  1001   498.87  2000
>>> df.index
RangeIndex(start=0, stop=4, step=1)
>>> df.columns
Index(['apart', 'profits', 'year'], dtype='object')
>>> df.values
array([['1001', 567.87, 2001],
       ['1002', 987.87, 2001],
       ['1003', 873.0, 2001],
       ['1001', 498.87, 2000]], dtype=object)
>>> df = DataFrame(data, index=['one', 'two', 'three', 'four'])
>>> df
      apart  profits  year
one    1001   567.87  2001
two    1002   987.87  2001
three  1003   873.00  2001
four   1001   498.87  2000
>>> df.index
Index(['one', 'two', 'three', 'four'], dtype='object')

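(4) Index objects
Both Series and DataFrame objects have index objects. An index object manages the axis labels and other metadata (such as the axis name). Through the index you can read values from a Series or DataFrame, or assign a new value to a given position. The automatic alignment of Series and DataFrame is also carried out through the index.
1) Getting values from a Series by index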

>>> series02 = Series([34.56, 23.34, 45.66, 98.08], index=['2001', '2002', '2003', '2004'])
>>> series02
2001    34.56
2002    23.34
2003    45.66
2004    98.08
dtype: float64
>>> series02['2003']
45.66
>>> series02['2002':'2004']    # a label slice includes the right endpoint, unlike a Python list slice
2002    23.34
2003    45.66
2004    98.08
dtype: float64
>>> series02['2001':]
2001    34.56
2002    23.34
2003    45.66
2004    98.08
dtype: float64
>>> series02[:'2003']
2001    34.56
2002    23.34
2003    45.66
dtype: float64
>>> series02['2001'] = 35.65
>>> series02
2001    35.65
2002    23.34
2003    45.66
2004    98.08
dtype: float64
>>> series02[:'2002'] = [23.45, 56.78]
>>> series02
2001    23.45
2002    56.78
2003    45.66
2004    98.08
dtype: float64

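2) Getting values from a DataFrame by index
A column can be retrieved directly by its column label. To retrieve a row by its row label, use loc (or iloc for an integer position). The older ix indexer was deprecated in pandas 0.20 and removed in pandas 1.0.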

>>> df = DataFrame(data)    # back to the default integer row index
>>> df
  apart  profits  year
0  1001   567.87  2001
1  1002   987.87  2001
2  1003   873.00  2001
3  1001   498.87  2000
>>> df['year']
0    2001
1    2001
2    2001
3    2000
Name: year, dtype: int64
>>> df.loc[0]
apart        1001
profits    567.87
year         2001
Name: 0, dtype: object
>>> df['pdn'] = np.nan      # assigning to a new column label adds a column
>>> df
  apart  profits  year  pdn
0  1001   567.87  2001  NaN
1  1002   987.87  2001  NaN
2  1003   873.00  2001  NaN
3  1001   498.87  2000  NaN
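The difference between label-based and position-based row access is easier to see with a non-integer row index. The following is a small sketch, assuming the same data dictionary as above with the row labels 'one' to 'four'; the result variable names are illustrative only.

```python
import pandas as pd

data = {"apart": ["1001", "1002", "1003", "1001"],
        "profits": [567.87, 987.87, 873, 498.87],
        "year": [2001, 2001, 2001, 2000]}
df = pd.DataFrame(data, index=["one", "two", "three", "four"])

row_by_label = df.loc["two"]          # select a row by its index label
row_by_position = df.iloc[1]          # select the same row by integer position
cell = df.loc["three", "profits"]     # single value: row label plus column label

# Label slices include the end label, so this returns rows 'one' and 'two'.
subset = df.loc["one":"two", ["apart", "year"]]
```

Using loc for labels and iloc for positions avoids the ambiguity that ix had when the index itself contained integers.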

4. Basic Pandas Functionality

(1) Summarizing and computing descriptive statistics
1) Common mathematical and statistical methods

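Method	Description
count	Number of non-NA values
describe	Computes a set of summary statistics for a Series or for each DataFrame column
min/max	Minimum and maximum values
argmin, argmax	Integer index positions at which the minimum and maximum values occur
idxmin, idxmax	Index labels at which the minimum and maximum values occur
quantile	Sample quantile (from 0 to 1)
sum	Sum of the values
mean	Mean of the values
median	Median of the values (50% quantile)
mad	Mean absolute deviation from the mean (removed in pandas 2.0)
var	Sample variance of the values
std	Sample standard deviation of the values
cumsum	Cumulative sum of the values
cummin, cummax	Cumulative minimum and maximum of the values
cumprod	Cumulative product of the values
pct_change	Percent change between consecutive values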

>>> data = {'a': [0, 2, 4, 6, 8, 10, 12, 14], 'b': [1, 3, 5, 7, 9, 11, 13, 15]}
>>> df = DataFrame(data)
>>> df.describe()
              a         b
count   8.00000   8.00000
mean    7.00000   8.00000
std     4.89898   4.89898
min     0.00000   1.00000
25%     3.50000   4.50000
50%     7.00000   8.00000
75%    10.50000  11.50000
max    14.00000  15.00000
>>> frame = DataFrame(np.arange(8).reshape(2, 4), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
>>> frame
       d  a  b  c
three  0  1  2  3
one    4  5  6  7
>>> frame.count()          # on a DataFrame, these statistics are computed per column by default
d    2
a    2
b    2
c    2
dtype: int64
>>> frame.count(axis=1)    # pass axis=1 to compute them per row instead
three    4
one      4
dtype: int64
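Several methods from the table are not exercised in the session above. The following is a brief sketch on a made-up quarterly series (the values and labels are hypothetical), assuming pandas 1.0 or later, where Series.argmin returns an integer position.

```python
import pandas as pd

# Hypothetical quarterly figures, used only to illustrate the methods.
sales = pd.Series([10.0, 12.0, 9.0, 18.0], index=["Q1", "Q2", "Q3", "Q4"])

lowest_label = sales.idxmin()     # index label of the minimum: "Q3"
lowest_pos = sales.argmin()       # integer position of the minimum: 2
running_total = sales.cumsum()    # 10, 22, 31, 49
change = sales.pct_change()       # first entry is NaN because it has no predecessor
median = sales.quantile(0.5)      # 50% quantile, same as sales.median()
```

idxmin and argmin answer the same question in two coordinate systems: the first is what you pass to loc, the second is what you pass to iloc.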

2) Correlation and covariance

>>> df = DataFrame({"GDP": [12, 23, 34, 45, 56], "air_temperature": [23, 25, 26, 27, 30]}, index=['2001', '2002', '2003', '2004', '2005'])
>>> df
      GDP  air_temperature
2001   12               23
2002   23               25
2003   34               26
2004   45               27
2005   56               30
>>> df.corr()              # pairwise correlation between columns
                      GDP  air_temperature
GDP              1.000000         0.977356
air_temperature  0.977356         1.000000
>>> df.cov()               # pairwise covariance between columns
                   GDP  air_temperature
GDP              302.5             44.0
air_temperature   44.0              6.7
>>> df['GDP'].corr(df['air_temperature'])
0.97735555485044179
>>> df['GDP'].cov(df['air_temperature'])
44.0
>>> series = Series([13, 13.3, 13.5, 13.6, 13.7], index=['2001', '2002', '2003', '2004', '2005'])
>>> series
2001    13.0
2002    13.3
2003    13.5
2004    13.6
2005    13.7
dtype: float64
>>> df.corrwith(series)    # correlation of each column with another Series
GDP                0.968665
air_temperature    0.932808
dtype: float64

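3) Unique values, value counts, and membership
The unique method returns an array of the unique values in a Series. The value_counts method counts how often each value occurs in a Series. The isin method performs a vectorized set-membership test and can be used to select a subset of a Series or of a DataFrame column.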

>>> ser = Series(['a', 'b', 'c', 'a', 'a', 'b', 'c'])
>>> ser
0    a
1    b
2    c
3    a
4    a
5    b
6    c
dtype: object
>>> ser.unique()
array(['a', 'b', 'c'], dtype=object)
>>> df = DataFrame({'orderId': ['1001', '1002', '1003', '1004'], 'orderAmt': [345.67, 34.23, 456.77, 334.55], 'memberId': ['a1001', 'b1002', 'a1001', 'a1001']})
>>> df
  orderId  orderAmt memberId
0    1001    345.67    a1001
1    1002     34.23    b1002
2    1003    456.77    a1001
3    1004    334.55    a1001
>>> df['memberId'].unique()
array(['a1001', 'b1002'], dtype=object)
>>> ser.value_counts()                   # sorted by frequency in descending order by default
a    3
b    2
c    2
dtype: int64
>>> ser.value_counts(ascending=False)    # ascending=False is the default, so the result is the same
a    3
b    2
c    2
dtype: int64
>>> mask = ser.isin(['b', 'c'])
>>> mask
0    False
1     True
2     True
3    False
4    False
5     True
6     True
dtype: bool
>>> ser[mask]                            # select the entries whose value is 'b' or 'c'
1    b
2    c
5    b
6    c
dtype: object
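The text mentions that isin can also select a subset of DataFrame rows through one of its columns. The following is a minimal sketch using the order data above; the list vip_members is a hypothetical filter introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"orderId": ["1001", "1002", "1003", "1004"],
                   "orderAmt": [345.67, 34.23, 456.77, 334.55],
                   "memberId": ["a1001", "b1002", "a1001", "a1001"]})

vip_members = ["a1001"]  # hypothetical list of member IDs to keep

# isin() builds a Boolean mask from the column; indexing with it keeps matching rows.
vip_orders = df[df["memberId"].isin(vip_members)]

# value_counts() on the same column gives the number of orders per member.
orders_per_member = df["memberId"].value_counts()
```

Prefixing the mask with ~ (as in df[~mask]) selects the complementary rows.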

(2) Handling missing data
1) Methods for handling missing values (NaN)

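Method	Description
dropna	Filters (drops) axis labels depending on whether their values contain missing data; a threshold adjusts how many missing values are tolerated
fillna	Fills missing data with a given value or with an interpolation method (such as ffill or bfill)
isnull	Returns an object of Boolean values that indicate which values are missing (NA)
notnull	The negation of isnull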

2) Detecting missing values

>>> df = DataFrame([['Tom', np.nan, 456.67, 'M'], ['Merry', 34, 4567.34, np.nan], ['John', 23, np.nan, 'M'], ['Joe', 18, 342.45, 'F']], columns=['name', 'age', 'salary', 'gender'])
>>> df
    name   age   salary gender
0    Tom   NaN   456.67      M
1  Merry  34.0  4567.34    NaN
2   John  23.0      NaN      M
3    Joe  18.0   342.45      F
>>> df.isnull()
    name    age  salary  gender
0  False   True   False   False
1  False  False   False    True
2  False  False    True   False
3  False  False   False   False
>>> df.notnull()
   name    age  salary  gender
0  True  False    True    True
1  True   True    True   False
2  True   True   False    True
3  True   True    True    True

3) Filtering out missing data

>>> series = Series([1, 2, 3, 4, np.nan, 5])
>>> series.dropna()
0    1.0
1    2.0
2    3.0
3    4.0
5    5.0
dtype: float64
>>> data = DataFrame([[1., 3.4, 4.], [np.nan, np.nan, np.nan], [np.nan, 4.5, 6.7]])
>>> data
     0    1    2
0  1.0  3.4  4.0
1  NaN  NaN  NaN
2  NaN  4.5  6.7
>>> data.dropna()                      # by default, drop every row that contains any missing value
     0    1    2
0  1.0  3.4  4.0
>>> data.dropna(how='all')             # drop only the rows in which every value is missing
     0    1    2
0  1.0  3.4  4.0
2  NaN  4.5  6.7
>>> data[4] = np.nan
>>> data
     0    1    2   4
0  1.0  3.4  4.0 NaN
1  NaN  NaN  NaN NaN
2  NaN  4.5  6.7 NaN
>>> data.dropna(axis=1, how='all')     # drop only the columns in which every value is missing
     0    1    2
0  1.0  3.4  4.0
1  NaN  NaN  NaN
2  NaN  4.5  6.7
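The table above notes that dropna can tolerate some missing values through a threshold. The following is a small sketch of that option on the same data, assuming that a row with at least two observed values is considered usable; the cutoff of 2 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1.0, 3.4, 4.0],
                     [np.nan, np.nan, np.nan],
                     [np.nan, 4.5, 6.7]])

# thresh is the minimum number of non-missing values a row needs to be kept.
# Row 0 has 3, row 1 has 0, row 2 has 2, so rows 0 and 2 survive.
kept = data.dropna(thresh=2)
```

This sits between the default (drop a row on any missing value) and how='all' (drop only fully empty rows).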

4) Filling in missing data

>>> df = DataFrame(np.random.randn(7, 3))
>>> df.loc[:4, 1] = np.nan    # rows 0 to 4 of column 1 (label slices include the end)
>>> df.loc[:2, 2] = np.nan    # rows 0 to 2 of column 2
>>> df
          0         1         2
0  1.101286       NaN       NaN
1  1.071460       NaN       NaN
2  0.058237       NaN       NaN
3 -1.629676       NaN -0.556655
4 -1.036194       NaN -0.063239
5  0.686838  0.666562  1.252273
6  0.852754 -1.035739  0.102285
>>> df.fillna(0)                            # fill every missing value with 0
          0         1         2
0  1.101286  0.000000  0.000000
1  1.071460  0.000000  0.000000
2  0.058237  0.000000  0.000000
3 -1.629676  0.000000 -0.556655
4 -1.036194  0.000000 -0.063239
5  0.686838  0.666562  1.252273
6  0.852754 -1.035739  0.102285
>>> df.fillna({1: 0.5, 2: -1, 3: -2})       # per-column fill values; keys without a matching column (3) are ignored
          0         1         2
0  1.101286  0.500000 -1.000000
1  1.071460  0.500000 -1.000000
2  0.058237  0.500000 -1.000000
3 -1.629676  0.500000 -0.556655
4 -1.036194  0.500000 -0.063239
5  0.686838  0.666562  1.252273
6  0.852754 -1.035739  0.102285
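The table above also lists ffill and bfill as filling strategies. The following is a minimal sketch on a made-up Series (the values are hypothetical), using the ffill and bfill methods directly; filling with the mean is shown as one more common choice, not as a recommendation.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()               # carry the last observed value forward: 1, 1, 1, 4, 4
backward = s.bfill()              # pull the next observed value backward: 1, 4, 4, 4, NaN
mean_filled = s.fillna(s.mean())  # mean() skips NaN, so the fill value is (1 + 4) / 2 = 2.5
```

Note that bfill leaves the last entry missing because no later value exists to copy from.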

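(3) Hierarchical indexing
Hierarchical indexing means having multiple (two or more) index levels on one axis. It lets pandas handle higher-dimensional data in a lower-dimensional form, and it allows statistics to be computed level by level.
1) Hierarchical index on a Series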

>>> data = Series([988.44, 95859, 3949.44, 32445.44, 234.45], index=[['2001', '2001', '2001', '2002', '2002'], ['apple', 'banana', 'watermelon', 'apple', 'watermelon']])
>>> data
2001  apple           988.44
      banana        95859.00
      watermelon     3949.44
2002  apple         32445.44
      watermelon      234.45
dtype: float64
>>> data.index.names = ['year', 'fruit']
>>> data
year  fruit
2001  apple           988.44
      banana        95859.00
      watermelon     3949.44
2002  apple         32445.44
      watermelon      234.45
dtype: float64
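A hierarchical index also makes it easy to pull out one slice of the data or to pivot the inner level into columns. The following is a short sketch, assuming the same figures with English labels and the level names 'year' and 'fruit'; the result variable names are illustrative.

```python
import pandas as pd

data = pd.Series(
    [988.44, 95859, 3949.44, 32445.44, 234.45],
    index=[["2001", "2001", "2001", "2002", "2002"],
           ["apple", "banana", "watermelon", "apple", "watermelon"]],
)
data.index.names = ["year", "fruit"]

year_2001 = data.loc["2001"]               # select on the outer level; result is indexed by fruit
apples = data.xs("apple", level="fruit")   # select on the inner level; result is indexed by year
table = data.unstack()                     # inner level becomes columns; absent pairs become NaN
```

Here table has one row per year and one column per fruit, and the 2002 banana cell is NaN because that combination never appeared in the Series.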

2) Hierarchical index on a DataFrame

>>> df = DataFrame({'year': [2001, 2001, 2002, 2002, 2003], 'fruit': ['apple', 'banana', 'apple', 'banana', 'apple'], 'production': [2345, 3423, 4556, 4455, 534], 'profits': [2334.44, 44556.55, 6677.88, 77856.778, 3345.55]})
>>> df
   year   fruit  production    profits
0  2001   apple        2345   2334.440
1  2001  banana        3423  44556.550
2  2002   apple        4556   6677.880
3  2002  banana        4455  77856.778
4  2003   apple         534   3345.550
>>> df.set_index(['year', 'fruit'])    # turn the two columns into a two-level row index
             production    profits
year fruit
2001 apple         2345   2334.440
     banana        3423  44556.550
2002 apple         4556   6677.880
     banana        4455  77856.778
2003 apple          534   3345.550
>>> new_df = df.set_index(['year', 'fruit'])
>>> new_df.index
MultiIndex([(2001,  'apple'),
            (2001, 'banana'),
            (2002,  'apple'),
            (2002, 'banana'),
            (2003,  'apple')],
           names=['year', 'fruit'])

3) Computing statistics by level

>>> new_df.index
MultiIndex([(2001,  'apple'),
            (2001, 'banana'),
            (2002,  'apple'),
            (2002, 'banana'),
            (2003,  'apple')],
           names=['year', 'fruit'])
>>> # sum(level=...) was removed in pandas 2.0; group by the index level instead
>>> new_df.groupby(level='year').sum()
      production    profits
year
2001        5768  46890.990
2002        9011  84534.658
2003         534   3345.550
>>> new_df.groupby(level='fruit').sum()
        production     profits
fruit
apple         7435   12357.870
banana        7878  122413.328

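5. Matplotlib

(1) Introduction to Matplotlib
Matplotlib is the best-known plotting library for Python. It provides a complete set of command-style APIs similar to MATLAB's, which makes it well suited to interactive plotting. It can also be embedded in GUI applications as a plotting widget.
Website: http://matplotlib.org/
A good way to learn is to start from the examples on the website: http://matplotlib.org/examples/index.html
http://matplotlib.org/gallery.html shows examples of many chart types.

(2) Figure and Subplot
Every Matplotlib image lives inside a Figure object. One or more subplot objects (that is, axes) are created under the Figure to draw charts.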

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# Use a Chinese font and render the '-' minus sign correctly
mpl.rcParams['font.sans-serif'] = ['FangSong']
mpl.rcParams['axes.unicode_minus'] = False

# Create the Figure object
fig = plt.figure(figsize=(8, 6))
# Create axes objects on the Figure (2 x 2 grid, positions 1 to 3)
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
# Draw a line plot on the current axes (ax3, the most recently created)
plt.plot(np.random.randn(50).cumsum(), 'k--')
# Draw a histogram on ax1
ax1.hist(np.random.randn(300), bins=20, color='k', alpha=0.3)
# Draw a scatter plot on ax2
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))
plt.show()

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# Use a Chinese font and render the '-' minus sign correctly
mpl.rcParams['font.sans-serif'] = ['FangSong']
mpl.rcParams['axes.unicode_minus'] = False

# Create a Figure and a 2 x 2 array of axes that share both the x and y axes
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
print(axes)

for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=10, color='k', alpha=0.5)

# Remove the spacing between the subplots
plt.subplots_adjust(wspace=0, hspace=0)
plt.show()

(3) Drawing line plots with Matplotlib

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.sin(x)
z = np.cos(x ** 2)
plt.figure(figsize=(8, 4))              # create a figure 8 inches wide and 4 inches high
# label: the name of the curve, shown in the legend
# Wrapping a string in "$" makes Matplotlib render it as a math formula with its built-in LaTeX-style engine
# color sets the line color, linewidth sets the line width, "b--" sets color and line style together
plt.plot(x, y, label="$sin(x)$", color="red", linewidth=2)
plt.plot(x, z, "b--", label="$cos(x^2)$")
plt.xlabel("Time(s)")                   # x-axis title
plt.ylabel("Volt")                      # y-axis title
plt.title("PyPlot First Example")       # chart title
plt.ylim(-1.2, 1.2)                     # y-axis range
plt.legend()                            # show the legend
plt.grid(True)                          # show grid lines
plt.show()                              # display the chart

(4) Drawing scatter plots with Matplotlib

import matplotlib.pyplot as plt

plt.axis([0, 5, 0, 20])                 # [xmin, xmax, ymin, ymax]
plt.title('My First Chart', fontsize=20, fontname='Times New Roman')
plt.xlabel('Counting', color='gray')
plt.ylabel('Square values', color='gray')
# Annotate individual points with text at the given data coordinates
plt.text(1, 1.5, 'First')
plt.text(2, 4.5, 'Second')
plt.text(3, 9.5, 'Third')
plt.text(4, 16.5, 'Fourth')
plt.text(1, 11.5, r'$y=x^2$', fontsize=20, bbox={'facecolor': 'yellow', 'alpha': 0.2})
plt.grid(True)
# A marker-only format string ('ro', 'g^', 'b*') draws points without connecting lines
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.plot([1, 2, 3, 4], [0.8, 3.5, 8, 15], 'g^')
plt.plot([1, 2, 3, 4], [0.5, 2.5, 5.4, 12], 'b*')
plt.legend(['First series', 'Second series', 'Third series'], loc=2)
plt.savefig('my_chart.png')
plt.show()

(5) Colors, markers, and line styles
See help(plt.plot) for the documentation.

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

mpl.rcParams['font.sans-serif'] = ['FangSong']
mpl.rcParams['axes.unicode_minus'] = False
x = np.arange(-5, 5)
y = np.sin(np.arange(-5, 5))
plt.axis([-5, 5, -5, 5])
# Green dashed line with a circle marker at each data point
plt.plot(x, y, color='g', linestyle='dashed', marker='o')
plt.text(-3, -3, '$y=sin(x)$', fontsize=20, bbox={'facecolor': 'yellow', 'alpha': 0.2})
plt.show()

(6) Ticks, labels, and legends

xlim and ylim control the range of the chart
xticks and yticks control the tick positions
xticklabels and yticklabels control the tick labels
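The three settings above can be combined on a single axes object. The following is a minimal sketch through the Axes methods set_xlim, set_xticks, and set_xticklabels; the month labels and the random data are made up for illustration, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(np.arange(12), np.random.randn(12).cumsum(), label="random walk")

ax.set_xlim(0, 11)                                              # range of the x axis
ax.set_xticks([0, 3, 6, 9])                                     # tick positions
ax.set_xticklabels(["Jan", "Apr", "Jul", "Oct"], rotation=30)   # text shown at those ticks
ax.set_ylabel("value")
ax.legend(loc="best")                                           # legend built from the label= arguments

fig.savefig("ticks_example.png")  # hypothetical output file name
```

Setting the tick positions before the tick labels keeps the two lists in step, since each label is attached to the tick at the same position in the list.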

(7) Saving a chart to a file

plt.savefig(filename)

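(8) Displaying Chinese text in Matplotlib
Edit the matplotlibrc file in the mpl-data subdirectory of the matplotlib installation directory (Lib/site-packages/matplotlib): uncomment font.family and font.sans-serif, and add the FangSong Chinese font to font.sans-serif.

Alternatively, add the following function to your code and call it:

import matplotlib as mpl

def set_ch():
    # Use the FangSong font for Chinese text and render the '-' minus sign correctly
    mpl.rcParams['font.sans-serif'] = ['FangSong']
    mpl.rcParams['axes.unicode_minus'] = False

set_ch()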

import matplotlib.pyplot as plt
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
mean_sales = [343.56, 566.99, 309.81, 456.78, 989, 345.98, 235.67, 934, 119.09, 245.6, 213.98, 156.77]
np_months = np.arange(1, len(months) + 1)    # month numbers 1 to 12
np_mean_sales = np.array(mean_sales)
plt.figure(figsize=(15, 8))
plt.bar(np_months, np_mean_sales, width=1, facecolor='yellowgreen', edgecolor='white')
plt.xlim(0.5, 13)
plt.xlabel(u"月份")          # "Month", kept in Chinese to exercise the font setting
plt.ylabel(u"月均銷售額")    # "Average monthly sales"
# Write each bar's value just above the bar
for x, y in zip(np_months, np_mean_sales):
    plt.text(x, y, str(y), ha="center", va="bottom")
plt.show()

(9) Writing mathematical expressions with LaTeX
Reference: http://matplotlib.org/users/mathtext.html