Python built-in modules

November 25, 2021
Notes
Tags: Python built-in modules, Python basics


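Two ways to cancel escaping:

\\n
r'\n'

*When a regular expression is written outside Python (for example in a standalone regex tool), the r prefix is generally not recognized, so double the backslash there. Inside Python, the r prefix (a raw string) is the recommended form.*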
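
The two spellings can be compared directly. The following is a minimal sketch; the sample path and variable names are made up for illustration.

```python
import re

# Both spellings produce the same three-character pattern: \ d +
escaped = "\\d+"
raw = r"\d+"
same_pattern = escaped == raw

# Matching one literal backslash needs the regex \\ ,
# which is easiest to write as a raw string.
windows_path = r"C:\temp\notes.txt"   # hypothetical sample text
backslashes = re.findall(r"\\", windows_path)

print(same_pattern)      # True
print(len(backslashes))  # 2
```

The raw string only changes how Python reads the literal; the regex engine receives the same characters either way.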

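1. The re module (key topic)

1. Basic functions of the re module

findall(): finds every non-overlapping match of the pattern and returns them as a list; returns an empty list when nothing matches.
search(): scans the string and stops at the first match. It returns a match object whose text is read with group(); when nothing matches it returns None, and calling group() on None raises an error.
match(): matches only at the beginning of the string, like a pattern that starts with ^. If the text does not match at the start it returns None, and calling group() on None raises an error.
split(): splits the string at every match of the pattern.
sub(): replaces what the pattern matches and returns the new string; without a count, every match is replaced.
subn(): works like sub() but returns a tuple of the new string and the number of replacements made.
compile(): compiles a pattern into a reusable object, which shortens the code when the same regular expression is used several times; the returned object can be called repeatedly.
finditer(): finds the same matches as findall() but returns an iterator of match objects, which has to be looped over to read the results.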

2. re module examples:

1. findall()

Definition: findall() finds every match of the pattern and returns a list; it returns an empty list when nothing matches.
Signature: findall(pattern, string, flags=0)

import re
# findall examples:
# with matches
res = re.findall('a.', 'abcaaa,ccc,abcd123')
print(res)
# without matches
res1 = re.findall('z', 'abc,123,df,eg,edg,456qqq')
print(res1)

# Output
['ab', 'aa', 'a,', 'ab']
[]

2. search()

Definition: search() stops at the first match. The result is read with group(). When nothing matches it returns None, and calling group() on None raises an error.
Signature: search(pattern, string, flags=0)

import re
# search examples:
# with a match
res = re.search('a', 'Hammer,Alien,Tony')
print(res)  # a match object: <re.Match object; span=(1, 2), match='a'>
print(res.group())  # stops at the first match, the 'a' in Hammer

# without a match
res1 = re.search('b', 'Hammer,Alien,Tony')
print(res1)  # None
print(res1.group())  # raises AttributeError

# Output (Python 3.6 and earlier print the match object as <_sre.SRE_Match object; ...>)
<re.Match object; span=(1, 2), match='a'>
a
None
AttributeError: 'NoneType' object has no attribute 'group'
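
Because search() returns None on failure, the usual way to avoid the AttributeError is to test the result before calling group(). A minimal sketch follows; the helper name first_match is hypothetical, not part of the re module.

```python
import re

def first_match(pattern, text, default=None):
    """Return the first matched substring, or `default` when nothing matches."""
    found = re.search(pattern, text)
    if found is None:        # search() returns None on failure
        return default
    return found.group()

hit = first_match('a', 'Hammer,Alien,Tony')
miss = first_match('b', 'Hammer,Alien,Tony')
fallback = first_match('b', 'Hammer,Alien,Tony', 'n/a')

print(hit)       # a
print(miss)      # None
print(fallback)  # n/a
```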

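3. match()

Definition: match() matches only at the beginning of the string, like a pattern that starts with ^. If the text does not match at the start it returns None, and calling group() on None raises an error.
Signature: match(pattern, string, flags=0)

import re
# match examples:
# with a match
res = re.match('a', 'abc,bcd,efg')
print(res)
print(res.group())

# without a match
res1 = re.match('b', 'zxc,vbn,nmk')
print(res1)
print(res1.group())  # no match, so group() raises AttributeError

# Output (Python 3.6 and earlier print the match object as <_sre.SRE_Match object; ...>)
<re.Match object; span=(0, 1), match='a'>
a
None
AttributeError: 'NoneType' object has no attribute 'group'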
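
The relationship between match(), search() with ^, and the stricter fullmatch() can be checked side by side. This is a small sketch on the sample string used above; the variable names are illustrative.

```python
import re

text = 'abc,bcd,efg'

at_start = re.match('bc', text)      # None: 'bc' is not at position 0
anchored = re.search('^bc', text)    # None: the same check written with ^
anywhere = re.search('bc', text)     # found later in the string
whole = re.fullmatch('abc', text)    # None: the pattern must cover the whole string

print(at_start, anchored, whole)     # None None None
print(anywhere.span())               # (1, 3)
```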

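4. split()

Definition: split() splits the string at every match of the pattern.
Signature: split(pattern, string, maxsplit=0, flags=0)

import re
# split examples:
# with matches
res = re.split('ab', 'ab,abc,abcd')  # cuts at every 'ab'; the match at the very start leaves an empty string
print(res)
# without matches: the string comes back unchanged, wrapped in a list
res1 = re.split('zq', 'ab,abc,abcd')  # 'zq' never occurs, so nothing is cut
print(res1)

# Output
['', ',', 'c,', 'cd']
['ab,abc,abcd']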
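
split() becomes more useful than str.split() once the separator is a pattern. The sketch below, on a made-up sample string, shows a character class as separator, the maxsplit parameter, and a capturing group that keeps the separators.

```python
import re

text = 'ab,abc;abcd  xyz'   # hypothetical sample with mixed separators

# Split on a comma, a semicolon, or a run of whitespace.
parts = re.split(r'[,;]|\s+', text)

# maxsplit limits the number of cuts; the remainder stays in one piece.
first_cut = re.split(r'[,;]|\s+', text, maxsplit=1)

# A capturing group keeps the separators in the result.
with_separators = re.split(r'([,;])', 'ab,abc;abcd')

print(parts)            # ['ab', 'abc', 'abcd', 'xyz']
print(first_cut)        # ['ab', 'abc;abcd  xyz']
print(with_separators)  # ['ab', ',', 'abc', ';', 'abcd']
```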

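5. sub()

Definition: sub() replaces what the pattern matches and returns the new string; without a count, every match is replaced.
Signature: sub(pattern, repl, string, count=0, flags=0)

import re
# sub examples:
# with matches
res = re.sub(r'\d', 'Ze', 'HammerZe9854')  # replace every digit with Ze
print(res)
# limit the number of replacements
two_change = re.sub(r'\d', 'Ze', 'HammerZe9854', count=2)  # replace only the first two digits
print(two_change)

# without matches
res1 = re.sub(r'\d', 'Ze', 'HammerZe')  # replace every digit with Ze
print(res1)   # no digits to replace, so the string is returned unchanged

# Output
HammerZeZeZeZeZe
HammerZeZeZe54
HammerZe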
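
The repl argument does not have to be a fixed string: it may contain backreferences to groups, or be a function that receives each match object. A short sketch, with sample strings and the function name double invented for illustration:

```python
import re

# Backreference: turn "key=value" into "value=key".
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', 'name=Hammer')

# Function replacement: double every number found in the text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, 'a1b22c333')

print(swapped)  # Hammer=name
print(doubled)  # a2b44c666
```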

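6. subn()

Definition: subn() replaces what the pattern matches exactly as sub() does, but returns a tuple of the new string and the number of replacements made.
Signature: subn(pattern, repl, string, count=0, flags=0)

import re
# subn examples:
# with matches
res = re.subn(r'\d', 'Ze', 'HammerZe9854')  # replace every digit with Ze
print(res)
# limit the number of replacements
two_change = re.subn(r'\d', 'Ze', 'HammerZe9854', count=2)  # replace only the first two digits
print(two_change)

# without matches
res1 = re.subn(r'\d', 'Ze', 'HammerZe')  # replace every digit with Ze
print(res1)   # no digits to replace, so the string is unchanged and the count is 0

# Output
('HammerZeZeZeZeZe', 4)
('HammerZeZeZe54', 2)
('HammerZe', 0)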

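7. compile()

Definition: compile() builds a pattern object once so that the same regular expression can be reused, which shortens the code; the returned object can be used as many times as needed.
Signature: compile(pattern, flags=0)

# compile example
import re

re_exp = re.compile(r'\d*')  # the shared pattern
res = re_exp.match('1aa,2bb,3cc')  # the digits at the start
print(res.group())
res1 = re_exp.findall('1aa,2bb,3cc')  # \d* also matches the empty string, hence the '' entries
print(res1)
res2 = re_exp.search('1aa,2bb,3cc')  # stops at the first match
print(res2.group())
# Output
1
['1', '', '', '', '2', '', '', '', '3', '', '', '']
1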
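
A compiled pattern carries the same operations as the module-level functions, as methods. The following sketch uses \d+ rather than \d* so that empty matches do not clutter the output; the sample text is made up.

```python
import re

digits = re.compile(r'\d+')          # compile once, reuse below

text = '1aa,22bb,333cc'
first = digits.match(text).group()   # match at the start of the string
every = digits.findall(text)         # all matches as a list
masked = digits.sub('#', text)       # replace each run of digits

print(first)           # 1
print(every)           # ['1', '22', '333']
print(masked)          # #aa,#bb,#cc
print(digits.pattern)  # \d+
```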

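8. finditer()

Definition: finditer() finds the same matches as findall() but returns an iterator of match objects, which has to be looped over to read the results.
Signature: finditer(pattern, string, flags=0)

import re

res = re.finditer(r'\d+', 'HammerZe123,HammerZe456,HammerZE789')
print([i.group() for i in res])

res1 = re.findall(r'\d+', 'HammerZe123,HammerZe456,HammerZE789')
print(res1)

# Output: identical. finditer() saves memory because matches are produced one at a time while looping.
['123', '456', '789']
['123', '456', '789']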
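
Two practical differences from findall() are worth seeing: each item is a match object, so positions are available, and the iterator can be consumed only once. A minimal sketch on the same sample string:

```python
import re

text = 'HammerZe123,HammerZe456,HammerZE789'

# Match objects expose the position of every hit through span().
positions = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
print(positions)
# [('123', (8, 11)), ('456', (20, 23)), ('789', (32, 35))]

# An iterator is exhausted after one pass.
it = re.finditer(r'\d+', text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]
print(first_pass)   # ['123', '456', '789']
print(second_pass)  # []
```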

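3. Unnamed and named groups

Unnamed groups:

Without a group, findall() returns the whole match.
With a group, findall() returns the content of the group instead.
On a match object, an unnamed group is read with group(n), where n is the group number.
To stop findall() from preferring the group, start the parentheses with ?: (a non-capturing group).

import re
# example: matching an ID card number
# findall returns the group first        unnamed group
res = re.findall(r"^[1-9]\d{14}(\d{2}[0-9x])?$", '110105199812067023')
print(res)  # ['023']
# switch off the group preference        non-capturing groups
res1 = re.findall(r"^[1-9](?:\d{14})(?:\d{2}[0-9x])?$", '110105199812067023')
print(res1)

# Output
['023']
['110105199812067023']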

Named groups:

Syntax: (?P<name>pattern)
The value can be read by its name with group('name').

import re
# example: matching an ID card number
# named groups
res = re.search(r'^[1-9](?P<name1>\d{14})(?P<name2>\d{2}[0-9x])?$', '110105199812067023')
print(res)
print(res.group())  # 110105199812067023
print(res.group(1))  # 10105199812067  a named group can still be read by index
print(res.group('name1'))
print(res.group('name2'))

# Output (Python 3.6 and earlier print the match object as <_sre.SRE_Match object; ...>)
<re.Match object; span=(0, 18), match='110105199812067023'>
110105199812067023
10105199812067
10105199812067
023
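
Named groups can also be collected in one step with groupdict(). The sketch below reuses the ID pattern; the group names body and tail are chosen here for illustration only.

```python
import re

id_pattern = re.compile(r'^[1-9](?P<body>\d{14})(?P<tail>\d{2}[0-9x])?$')

found = id_pattern.search('110105199812067023')
print(found.groupdict())   # {'body': '10105199812067', 'tail': '023'}
print(found.groups())      # ('10105199812067', '023')

# A 15-digit number leaves the optional group unmatched, so its value is None.
short = id_pattern.search('110105199812067')
print(short.group('tail'))  # None
```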

re module in practice (web scraping)

Example: scraping the Red Bull branch office listing

Open the URL and inspect the page source to find the pattern in the tags.
Implementation:

import re

import pandas as pd
import requests

# The scheme is missing from the URL as published; requests.get() needs a
# full URL, so prepend http: or https: before running.
url = '//www.redbull.com.cn/about/branch'
response = requests.get(url)
# print(response)  <Response [200]>
# branch name, page source: <h2>紅牛杭州分公司</h2>
get_company_name = re.findall('<h2>(.*?)</h2>', response.text)
# address: <p class='mapIco'>杭州市上城區慶春路29號遠洋大廈11樓A座</p>
get_company_addre = re.findall("<p class='mapIco'>(.*?)</p>", response.text)
# postcode: <p class='mailIco'>310009</p>
get_company_post = re.findall("<p class='mailIco'>(.*?)</p>", response.text)
# phone: <p class='telIco'>020-38927681</p>
get_company_telephone = re.findall("<p class='telIco'>(.*?)</p>", response.text)
# arrange the scraped fields into a table
company_info = pd.DataFrame({'Company': get_company_name, 'Address': get_company_addre,
                             'Postcode': get_company_post, 'Phone': get_company_telephone})
# save to an Excel file (the db directory must already exist)
company_info.to_excel(excel_writer=r"db\redbull_info.xlsx", index=False)
# show the first ten branches
line = company_info.head(10)
print(line)

# Output: see the Excel file
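
The four separate findall() calls only line up if every branch has all four tags. One alternative, sketched here offline on a hypothetical HTML sample that imitates the markup quoted in the comments, is a single pattern with named groups that captures one whole record at a time. It assumes the four tags of a branch follow each other directly, which may not hold on the real page; the second record is placeholder data.

```python
import re

# Hypothetical sample imitating the page markup; not fetched from the site.
sample_html = """
<h2>紅牛杭州分公司</h2>
<p class='mapIco'>杭州市上城區慶春路29號遠洋大廈11樓A座</p>
<p class='mailIco'>310009</p>
<p class='telIco'>020-38927681</p>
<h2>Example Branch B</h2>
<p class='mapIco'>1 Example Road</p>
<p class='mailIco'>000000</p>
<p class='telIco'>000-00000000</p>
"""

# One record = name, address, postcode, phone, separated only by whitespace.
record = re.compile(
    r"<h2>(?P<name>.*?)</h2>\s*"
    r"<p class='mapIco'>(?P<address>.*?)</p>\s*"
    r"<p class='mailIco'>(?P<postcode>.*?)</p>\s*"
    r"<p class='telIco'>(?P<phone>.*?)</p>"
)

branches = [m.groupdict() for m in record.finditer(sample_html)]
for branch in branches:
    print(branch['name'], branch['postcode'], branch['phone'])
```

A list of dictionaries like branches can be passed straight to pd.DataFrame() if a table is wanted.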

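2. The time module

1. Background to know before using the module:

Timestamp: the offset in seconds from 00:00:00 on January 1, 1970 (UTC).
Coordinated Universal Time (UTC): the world is divided into 24 time zones; China is in the UTC+8 zone. Some regions also observe daylight saving time (DST).
- 👉 24 time zones: world time zone and time difference lookup (beijing-time.org)
- 👉 Daylight saving time (DST): Baidu Baike entry (baidu.com)
Tuple form: a struct_time tuple has 9 elements; the functions that return a struct_time are mainly gmtime(), localtime(), and strptime().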

2. Three representations of time

Timestamp: timestamp
Structured time: struct_time
Formatted time: format time

Year-month-day: %Y-%m-%d
Hour:minute:second: %H:%M:%S or %X (%X is the locale's time representation, which is HH:MM:SS in the default locale)

import time

# current timestamp
print(time.time())  # 1637838298.6971347
# structured time -> timestamp
print(time.mktime(time.localtime()))  # 1637838298.0
# Greenwich (UTC) time as a struct_time
time.gmtime()
print(time.gmtime(time.time()))
# block for 1 second
time.sleep(1)


# formatted time
# year-month-day
print(time.strftime('%Y-%m-%d'))  # 2021-11-25
# year-month-day hour:minute:second
print(time.strftime('%Y-%m-%d %H:%M:%S'))  # 2021-11-25 19:00:37
# hour:minute:second can be shortened to %X
print(time.strftime('%Y-%m-%d %X'))  # 2021-11-25 19:00:37
print(time.strftime('%Y-%m-%d %X', time.localtime()))  # same as the line above

# formatted time -> structured time
print(time.strptime('2021-11-25 19:00:37', '%Y-%m-%d %X'))
# time.struct_time(tm_year=2021, tm_mon=11, tm_mday=25, tm_hour=19, tm_min=0, tm_sec=37, tm_wday=3, tm_yday=329, tm_isdst=-1)

# fixed-format time strings:
print(time.asctime(time.localtime()))  # Thu Nov 25 19:22:39 2021
print(time.ctime(time.time()))  # Thu Nov 25 19:22:39 2021
# adding and subtracting seconds
res = time.time()
print(time.ctime(res+1))  # Thu Nov 25 19:22:40 2021
print(time.ctime(res-1))  # Thu Nov 25 19:22:38 2021
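
The three representations convert into each other in a fixed chain: string to struct_time with strptime(), struct_time to timestamp with mktime(), timestamp back to struct_time with localtime(), and struct_time to string with strftime(). A minimal round-trip sketch, assuming the sample time does not fall inside a daylight-saving transition of the local zone:

```python
import time

text = '2021-11-25 19:00:37'
fmt = '%Y-%m-%d %H:%M:%S'

parsed = time.strptime(text, fmt)       # string -> struct_time
stamp = time.mktime(parsed)             # struct_time (local time) -> timestamp
back = time.localtime(stamp)            # timestamp -> struct_time
round_trip = time.strftime(fmt, back)   # struct_time -> string

print(type(stamp).__name__)  # float
print(round_trip)            # 2021-11-25 19:00:37
```

The numeric value of stamp depends on the machine's time zone, which is why it is not shown here.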

The table below follows the CSDN blog post "python time模組和datetime模組詳解" by weixin_34162629.

Structure of the struct_time tuple

Attribute                         Value
tm_year (year)                    for example 2011
tm_mon (month)                    1 - 12
tm_mday (day of month)            1 - 31
tm_hour (hour)                    0 - 23
tm_min (minute)                   0 - 59
tm_sec (second)                   0 - 61
tm_wday (weekday)                 0 - 6 (0 is Monday)
tm_yday (day of year)             1 - 366
tm_isdst (daylight saving flag)   0, 1, or -1 (-1 when unknown)
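
The fields can be read by attribute name or by index. The sketch below parses a fixed date so the values are reproducible; note that tm_wday counts from Monday as 0.

```python
import time

parsed = time.strptime('2021-11-25', '%Y-%m-%d')

print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)  # 2021 11 25
print(parsed.tm_wday)    # 3  (Monday is 0, so 3 is Thursday)
print(parsed.tm_yday)    # 329
print(parsed.tm_isdst)   # -1 (the string carries no DST information)
print(parsed[0], len(parsed))  # 2021 9  (index access also works)
```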

3. The datetime module

The datetime module builds on time and is more powerful.

# datetime module
import datetime

# today's date
print(datetime.date.today())  # 2021-11-25
# current date and time
print(datetime.datetime.today())  # 2021-11-25 19:30:08.967812

# year, month, day, and weekday separately
res = datetime.datetime.today()
print(res.year)  # 2021
print(res.month)  # 11
print(res.day)  # 25

# weekday() counts 0-6, with 0 for Monday
print(res.weekday())  # 3, which is Thursday
# isoweekday() counts 1-7, with 1 for Monday
print(res.isoweekday())  # 4

# time differences: timedelta
ctime = datetime.datetime.today()
time_tel = datetime.timedelta(days=3)
print(ctime)  # 2021-11-25 19:33:24.800420
print(ctime - time_tel)  # 2021-11-22 19:33:24.800420
print(ctime + time_tel)  # 2021-11-28 19:33:24.800420

'''date object = date object +/- timedelta object'''
'''timedelta object = date object - date object'''

ret = ctime + time_tel
print(ret - ctime)  # 3 days, 0:00:00
print(ctime - ret)  # -3 days, 0:00:00

# Exercise 1:
'''print the local time (UTC+8 on a machine set to that zone)'''
print(datetime.datetime.now())  # 2021-11-25 19:38:51.478786
'''print the UTC time (utcnow() is deprecated since Python 3.12)'''
print(datetime.datetime.utcnow())  # 2021-11-25 11:38:51.478786

# Exercise 2 (just for fun):
'''count the days lived so far'''
bir_days = datetime.date(1998, 5, 4)
now_data = datetime.date.today()
live_days = now_data - bir_days
print(f'Alive for {live_days}')  # Alive for 8606 days, 0:00:00
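
datetime objects can also be parsed, formatted, and tied to an explicit time zone, which avoids depending on the machine's local settings. A minimal sketch with a fixed sample moment; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Parse and format with the same directives the time module uses.
moment = datetime.strptime('2021-11-25 19:38:51', '%Y-%m-%d %H:%M:%S')
print(moment.strftime('%Y/%m/%d %H:%M'))   # 2021/11/25 19:38

# Attach UTC+8 explicitly instead of relying on the machine's local zone.
utc8 = timezone(timedelta(hours=8))
aware = moment.replace(tzinfo=utc8)
as_utc = aware.astimezone(timezone.utc)
print(aware.isoformat())    # 2021-11-25T19:38:51+08:00
print(as_utc.isoformat())   # 2021-11-25T11:38:51+00:00

# Timezone-aware "now", an alternative to utcnow().
now_utc = datetime.now(timezone.utc)
print(now_utc.tzinfo)       # UTC
```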

4. The collections module

1. namedtuple (named tuple)

Syntax:
- namedtuple('TypeName', ['field1', 'field2', ...])
- namedtuple('TypeName', 'field1 field2 ...')

from collections import namedtuple
point = namedtuple('坐標', ['x', 'y'])  # the type name 坐標 means "coordinate"
res = point(10, 20)
print(res, res.x, res.y)
# Output: 坐標(x=10, y=20) 10 20
point1 = namedtuple('坐標', 'x y z')
res1 = point1(10, 20, 30)
print(res1)  # 坐標(x=10, y=20, z=30)
print(res1.x)  # 10
print(res1.y)  # 20
print(res1.z)  # 30
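
A named tuple is still a tuple, so it unpacks and stays immutable, and it adds a few helper methods. A short sketch; the type name Point is chosen here for illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(10, 20)

x, y = p                      # unpacks like a plain tuple
moved = p._replace(x=15)      # returns a new tuple; p itself is unchanged
as_dict = p._asdict()         # field names mapped to values

print(x, y)            # 10 20
print(moved)           # Point(x=15, y=20)
print(dict(as_dict))   # {'x': 10, 'y': 20}
print(Point._fields)   # ('x', 'y')
```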

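2. Queue module: queue

# queue module
import queue  # built-in queue module (separate from collections): FIFO
# create the queue
q = queue.Queue()
# add items to the queue
q.put('first')
q.put('second')
q.put('third')
# take items from the queue
print(q.get())
print(q.get())
print(q.get())
# only three items were put in, so a fourth get() blocks and waits here
print(q.get())

# Output
# first
# second
# third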
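
To avoid waiting forever on an empty queue, get() can be told not to block or to give up after a timeout; both cases raise queue.Empty. A minimal sketch, with the size limit and result strings chosen for illustration:

```python
import queue

q = queue.Queue(maxsize=2)      # hypothetical size limit of two items
q.put('first')
q.put('second')
was_full = q.full()             # True: a further put() would block

taken = [q.get(), q.get()]      # FIFO order

try:
    q.get_nowait()              # same as q.get(block=False)
    outcome = 'got an item'
except queue.Empty:
    outcome = 'empty'           # raised at once instead of waiting

try:
    q.get(timeout=0.1)          # wait at most 0.1 seconds
    timed_outcome = 'got an item'
except queue.Empty:
    timed_outcome = 'timed out'

print(was_full, taken, outcome, timed_outcome)
# True ['first', 'second'] empty timed out
```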

3. Double-ended queue: deque

from collections import deque

q = deque([11, 22, 33])
q.append(44)  # add on the right
print(q)  # deque([11, 22, 33, 44])
q.appendleft(55)  # add on the left
print(q)  # deque([55, 11, 22, 33, 44])
print(q.pop())  # take from the right: 44
print(q.popleft())  # take from the left: 55
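
Two further deque features are often handy: a maxlen that silently discards the oldest items, and rotate() for shifting the contents. A small sketch with made-up values:

```python
from collections import deque

recent = deque(maxlen=3)          # keeps only the three newest items
for n in [11, 22, 33, 44, 55]:
    recent.append(n)
print(recent)                     # deque([33, 44, 55], maxlen=3)

ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)                    # shift two steps to the right
rotated = list(ring)
print(rotated)                    # [4, 5, 1, 2, 3]
ring.rotate(-2)                   # and back again
print(list(ring))                 # [1, 2, 3, 4, 5]
```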

4. Ordered dictionary

Before Python 3.7 the language did not guarantee the order of a plain dict; since 3.7 a dict keeps insertion order. OrderedDict gives an explicitly ordered dictionary on any version.

# a plain dictionary
normal_dict = dict([('name', 'Hammer'), ('pwd', 123), ('hobby', 'study')])
print(normal_dict)
# {'name': 'Hammer', 'pwd': 123, 'hobby': 'study'}

# an ordered dictionary
from collections import OrderedDict
order_dict = OrderedDict([('name', 'Hammer'), ('pwd', 123), ('hobby', 'study')])
print(order_dict)
# Output
# OrderedDict([('name', 'Hammer'), ('pwd', 123), ('hobby', 'study')])
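
Since plain dicts keep insertion order from Python 3.7 on, what still sets OrderedDict apart is its order-aware behaviour: reordering methods and order-sensitive equality. A minimal sketch:

```python
from collections import OrderedDict

order_dict = OrderedDict([('name', 'Hammer'), ('pwd', 123), ('hobby', 'study')])

order_dict.move_to_end('name')                # push 'name' to the back
after_back = list(order_dict)
print(after_back)                             # ['pwd', 'hobby', 'name']

order_dict.move_to_end('hobby', last=False)   # pull 'hobby' to the front
after_front = list(order_dict)
print(after_front)                            # ['hobby', 'pwd', 'name']

oldest = order_dict.popitem(last=False)       # remove from the front
print(oldest)                                 # ('hobby', 'study')

# Order matters when comparing OrderedDicts, but not plain dicts.
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)
print(ordered_equal, plain_equal)             # False True
```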

5. Default-value dictionary: defaultdict

# default-value dictionary
from collections import defaultdict
# values of 66 or more go under k2, smaller values under k1
values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
my_dict = defaultdict(list)
for value in values:
    if value >= 66:
        my_dict['k2'].append(value)
    else:
        my_dict['k1'].append(value)
print(my_dict)
# defaultdict(<class 'list'>, {'k1': [11, 22, 33, 44, 55], 'k2': [66, 77, 88, 99, 90]})

# this particular example is simpler with list comprehensions
res = {'k1': [i for i in values if i < 66], 'k2': [i for i in values if i >= 66]}
print(res)
# {'k1': [11, 22, 33, 44, 55], 'k2': [66, 77, 88, 99, 90]}
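
defaultdict pays off when the keys are not known in advance, because a missing key is created on first access by calling the factory. A short sketch that groups and sums by first letter; the word list is made up.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = defaultdict(list)     # missing keys start as []
lengths = defaultdict(int)        # missing keys start as 0
for word in words:
    by_letter[word[0]].append(word)
    lengths[word[0]] += len(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(dict(lengths))
# {'a': 12, 'b': 15, 'c': 6}
```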

6. Counter

# counter
res = 'HammerZeHammerZeHammerZe'
# count how often each character occurs in the string
# without the module
new_dict = {}
for i in res:
    if i not in new_dict:
        new_dict[i] = 1
    else:
        new_dict[i] += 1
print(new_dict)
# {'H': 3, 'a': 3, 'm': 6, 'e': 6, 'r': 3, 'Z': 3}
# with the module
from collections import Counter  # counter
ret = Counter(res)
print(ret)
# Counter({'m': 6, 'e': 6, 'H': 3, 'a': 3, 'r': 3, 'Z': 3})
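
Beyond counting, a Counter can rank its entries, tolerate missing keys, and absorb more data later. A minimal sketch on the same string; the extra word 'Zebra' is made up.

```python
from collections import Counter

ret = Counter('HammerZeHammerZeHammerZe')

top_two = ret.most_common(2)
print(top_two)                # [('m', 6), ('e', 6)]
print(ret['x'])               # 0, missing keys count as zero

ret.update('Zebra')           # add more observations
print(ret['Z'], ret['a'])     # 4 4
print(sum(ret.values()))      # 29
```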

[To be continued] Corrections are welcome if you spot a mistake. Thanks 🤞🤞🤞
