[Python Basics] 05. Python 文

January 10, 2020
Notes

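I. File systems and files

1. File systems and files

A file system is the set of methods and data structures an OS uses to keep track of the files on a disk or partition, that is, the way files are organized on the disk.

A computer file (or simply a file) is a stream of data stored on some long-term or temporary storage device and managed by the computer's file system.

In short:

A file is a named storage area on a computer that is managed by the OS.

On Linux, a file is treated as a sequence of bytes.

2. Architecture of the Linux file system components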

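3. Opening files in Python

The Python built-in function open() opens a file and creates a file object.

Syntax:

open(name[,mode[,bufsize]])

open() accepts three arguments: the file name, the mode, and the buffering argument.

open() returns a file object.

mode: specifies the mode in which the file is opened

bufsize: defines the buffering policy

0 means unbuffered

1 means line buffered

A negative value means the system default is used

Any other positive value means a buffer of approximately that size is used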

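4. File open modes

Simple modes:

r: read-only; the file is opened for reading, positioned at the start of the file

open('/var/log/message.log','r')

w: write-only; the file is opened for writing and cannot be read, positioned at the start of the file, and any existing data in the file is discarded

a: append; the file is opened for writing, positioned at the end of the file

A "+" after the mode means the file supports both input and output

For example r+, w+ and a+

A "b" appended to the mode means the file is opened in binary mode

For example rb, wb+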
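The modes listed above can be tried on a scratch file. This is a minimal sketch, assuming Python 3 on a POSIX system (the sessions in these notes use Python 2); the file name demo.txt and the variable names are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; demo.txt is only an illustrative name.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")

# "w" creates the file (or truncates an existing one) and starts at offset 0.
with open(path, "w") as f:
    f.write("first\n")

# "a" keeps the existing data and positions writes at the end of the file.
with open(path, "a") as f:
    f.write("second\n")

# "r" is read-only and starts at the beginning.
with open(path, "r") as f:
    text_content = f.read()

# "rb" returns raw bytes instead of decoded text.
with open(path, "rb") as f:
    raw_content = f.read()

# "r+" allows reading and writing without truncating the file.
with open(path, "r+") as f:
    f.write("FIRST")      # overwrites the first five characters in place
    f.seek(0)
    updated_content = f.read()

print(repr(text_content))
print(repr(raw_content))
print(repr(updated_content))
```

Opening the same path with "w" again would empty the file, which is the main practical difference between "w" and "r+".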

In [4]: file.
file.close       file.isatty      file.read        file.tell
file.closed      file.mode        file.readinto    file.truncate
file.encoding    file.mro         file.readline    file.write
file.errors      file.name        file.readlines   file.writelines
file.fileno      file.newlines    file.seek        file.xreadlines
file.flush       file.next        file.softspace

In [6]: f1=open('/etc/passwd','r')

In [7]: f1
Out[7]: <open file '/etc/passwd', mode 'r' at 0x21824b0>

In [8]: print f1
<open file '/etc/passwd', mode 'r' at 0x21824b0>

In [9]: type(f1)
Out[9]: file

In [10]: f1.next
Out[10]: <method-wrapper 'next' of file object at 0x21824b0>

In [11]: f1.next()   # a file is also an iterable object
Out[11]: 'root:x:0:0:root:/root:/bin/bash\n'

In [12]: f1.next()
Out[12]: 'bin:x:1:1:bin:/bin:/sbin/nologin\n'

In [22]: f1.close()      # close the file

In [23]: f1.next()      # once the file is closed, no more data can be read
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-4a9d57471e88> in <module>()
----> 1 f1.next()

ValueError: I/O operation on closed file

In [50]: f1=open('/etc/passwd','r')

In [51]: f1.
f1.close       f1.isatty      f1.readinto    f1.truncate
f1.closed      f1.mode        f1.readline    f1.write
f1.encoding    f1.name        f1.readlines   f1.writelines
f1.errors      f1.newlines    f1.seek        f1.xreadlines
f1.fileno      f1.next        f1.softspace
f1.flush       f1.read        f1.tell

In [51]: f1.readl
f1.readline   f1.readlines

In [51]: f1.readline
Out[51]: <function readline>

In [52]: f1.readline()
Out[52]: 'root:x:0:0:root:/root:/bin/bash\n'

In [53]: f1.readlines()
Out[53]:
['bin:x:1:1:bin:/bin:/sbin/nologin\n',
 'daemon:x:2:2:daemon:/sbin:/sbin/nologin\n',
 'adm:x:3:4:adm:/var/adm:/sbin/nologin\n',
 'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\n',
 'sync:x:5:0:sync:/sbin:/bin/sync\n',
 'shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\n',
 'halt:x:7:0:halt:/sbin:/sbin/halt\n',
 'mail:x:8:12:mail:/var/spool/mail:/sbin/nologin\n',
 'uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin\n',
 'operator:x:11:0:operator:/root:/sbin/nologin\n',
 'games:x:12:100:games:/usr/games:/sbin/nologin\n',
 'gopher:x:13:30:gopher:/var/gopher:/sbin/nologin\n',
 'ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin\n',
 'nobody:x:99:99:Nobody:/:/sbin/nologin\n',
 'dbus:x:81:81:System message bus:/:/sbin/nologin\n',
 'vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin\n',
 'saslauth:x:499:76:"Saslauthd user":/var/empty/saslauth:/sbin/nologin\n',
 'postfix:x:89:89::/var/spool/postfix:/sbin/nologin\n',
 'haldaemon:x:68:68:HAL daemon:/:/sbin/nologin\n',
 'sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin\n']

The file read pointer:

In [57]: f1.tell()    # show the current position of the pointer in the file, i.e. the number of bytes read so far
Out[57]: 949

In [69]: help(f1.seek)

Help on built-in function seek:

seek(...)
    seek(offset[, whence]) -> None.  Move to new file position.

    Argument offset is a byte count.  Optional argument whence defaults to
    0 (offset from start of file, offset should be >= 0); other values are 1
    (move relative to current position, positive or negative), and 2 (move
    relative to end of file, usually negative, although many platforms allow
    seeking beyond the end of a file).  If the file is opened in text mode,
    only offsets returned by tell() are legal.  Use of other offsets causes
    undefined behavior.
    Note that not all file objects are seekable.

# offset is the distance to move, in bytes
# whence is the position the offset is measured from: 0 = start of the file, 1 = current position, 2 = end of the file; the default is 0

In [72]: f1.seek(0)          # whence is not given, so it defaults to 0: move 0 bytes from the start of the file

In [73]: f1.tell()
Out[73]: 0

In [29]: help(file.read)

Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

In [82]: f1.read(10)        # returns a string of at most 10 bytes
Out[82]: 'root:x:0:0'

In [83]: f1.tell()
Out[83]: 10

In [88]: f1.name
Out[88]: '/etc/passwd'

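5. File methods

A file object keeps track of the state of the file it has open; its tell() method returns the current position in the open file

read(): reads the file into a single string; the number of bytes to read can also be specified

readline(): reads the next line into a string, including the line terminator

readlines(): reads all remaining lines of the file into a list of strings, one string per line

write(aString): writes a byte string to the file

writelines(aList): writes all the strings in a list to the file

f.isatty(): whether the file is a terminal device

f.truncate([size]): truncates the file to at most the given number of bytes

Note:

File methods such as read() return the line terminators along with the data

The write() method does not add a line terminator automatically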
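The behaviour of these methods can be checked on a small file. The following is a sketch only, assuming Python 3 on a POSIX system; the file name lines.txt is hypothetical, and the file is reopened in binary mode so that tell() and seek() deal in plain byte counts.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lines.txt")   # hypothetical file name

with open(path, "w") as f:
    f.write("alpha")                  # write() adds no line terminator...
    f.write("\n")                     # ...so it is written explicitly
    f.writelines(["beta\n", "gamma\n"])

with open(path, "rb") as f:           # binary mode: offsets are byte counts
    first = f.readline()              # next line, terminator included
    pos_after_first = f.tell()        # bytes consumed so far
    rest = f.readlines()              # remaining lines as a list
    f.seek(0)                         # whence defaults to 0: start of file
    head = f.read(3)                  # at most 3 bytes
    f.seek(-6, os.SEEK_END)           # whence=2: offset relative to the end
    tail = f.read()

print(first, pos_after_first, rest, head, tail)
```

Note that writelines() does not add terminators either; each string in the list has to carry its own "\n".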

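6. File object attributes

The with statement

Supported since Python 2.5 (in 2.5 it has to be enabled with from __future__ import with_statement; from 2.6 on it is always available)

Used for operations that pair an open with a close

The opened object is closed automatically

Syntax:

with open_expr as obj:
    expression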

In [90]: f = open("/tmp/passwd","r+")

In [91]: f.closed
Out[91]: False

In [92]: with open("/tmp/passwd","r+") as f:
   ....:     pass
   ....:

In [93]: f.closed
Out[93]: True
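The automatic close also happens when the body of the with block raises. A minimal sketch, assuming Python 3; the file name and the simulated error are made up for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "with_demo.txt")   # hypothetical file name

with open(path, "w") as f:
    f.write("data\n")
closed_after_block = f.closed     # the file is closed on leaving the block

try:
    with open(path) as g:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass
closed_after_error = g.closed     # still closed, although the block raised

print(closed_after_block, closed_after_error)
```

Without with, the same guarantee needs an explicit try/finally around the code that uses the file.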

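II. Text processing in Python

1. Basic string processing

1) Splitting and joining strings

str.split(): splits a string

str.rsplit(): splits a string starting from the right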

In [11]: s1="xie xiao jun"

In [13]: help(s1.split)

Help on built-in function split:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings   # sep is the separator (whitespace by default); maxsplit is the maximum number of splits

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

In [12]: s1.spli
s1.split       s1.splitlines

In [12]: s1.split()
Out[12]: ['xie', 'xiao', 'jun']

In [16]: s1.split("",2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-f3d385e69a09> in <module>()
----> 1 s1.split("",2)

ValueError: empty separator

In [17]: s1.split(" ",2)
Out[17]: ['xie', 'xiao', 'jun']

In [18]: s1.split(" ",1)
Out[18]: ['xie', 'xiao jun']

In [26]: s1.split("x")
Out[26]: ['', 'ie ', 'iao jun']

In [27]: s1
Out[27]: 'xie xiao jun'

In [28]: s1.split("i")
Out[28]: ['x', 'e x', 'ao jun']

In [29]: s1.split("n")
Out[29]: ['xie xiao ju', '']

In [35]: s1.rsplit
s1.rsplit

In [37]: s1.rsplit()                # when enough splits are allowed, rsplit and split give the same result
Out[37]: ['xie', 'xiao', 'jun']

In [38]: s1.rsplit(" ",1)          # when the number of splits is limited, rsplit starts splitting from the right
Out[38]: ['xie xiao', 'jun']

In [39]: s1.split(" ",1)
Out[39]: ['xie', 'xiao jun']

join() and +: concatenation

In [57]: s1
Out[57]: 'xie xiao jun'

In [58]: s2
Out[58]: 'aa bb cc dd ee'

In [59]: ",".join(s1)
Out[59]: 'x,i,e, ,x,i,a,o, ,j,u,n'

In [60]: "-".join(s1)
Out[60]: 'x-i-e- -x-i-a-o- -j-u-n'

In [67]: 'xie' + "jun"
Out[67]: 'xiejun'
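Splitting and joining are usually applied together to delimited records. A small sketch, assuming Python 3; the record mimics one line of /etc/passwd and the variable names are illustrative.

```python
record = "root:x:0:0:root:/root:/bin/bash\n"
line = record.rstrip("\n")            # drop the line terminator first

fields = line.split(":")              # split on every colon
left = line.split(":", 1)             # one split, counted from the left
right = line.rsplit(":", 1)           # one split, counted from the right

rebuilt = ":".join(fields)            # join() is the inverse of split()
csv_line = ",".join([fields[0], fields[2], fields[-1]])

print(fields)
print(left)
print(right)
print(csv_line)
```

Passing a list to join() combines whole fields; passing a string, as in the session above, treats every character as a separate item.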

2) String formatting

Placeholder substitution: %s, %d or %i, %f

In [75]: "adfas%s" % "hello"
Out[75]: 'adfashello'

In [76]: "adfas %s" % "hello"
Out[76]: 'adfas hello'

In [77]: "adfas %s " % "hello"
Out[77]: 'adfas hello '

In [78]: "adfas %s %s%s" % "hello"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-78-97faba8d8356> in <module>()
----> 1 "adfas %s %s%s" % "hello"

TypeError: not enough arguments for format string

In [80]: "adfas %s %s%s" % ("hello","A","B")  # the number of placeholders must equal the number of elements in the tuple
Out[80]: 'adfas hello AB'
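The numeric placeholders take optional width and precision, and a dictionary can supply values by name. A brief sketch, assuming Python 3 (the % operator works the same way in Python 2); all values are made up.

```python
name, uid, load = "root", 0, 0.7512   # illustrative values

# %s, %d and %f mixed in one format string; .2 limits the decimals
line = "user=%s uid=%d load=%.2f" % (name, uid, load)

# A width pads the field; a leading - left-aligns it
padded = "[%5d] [%-5s]" % (42, "ab")

# %(key)s looks the value up in a dictionary instead of a tuple
by_name = "%(user)s logs in to %(host)s" % {"user": "root", "host": "node3"}

print(line)
print(padded)
print(by_name)
```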

3) Searching strings

str.find(): search

In [90]: help(s1.find)

Help on built-in function find:

find(...)
    S.find(sub [,start [,end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

In [102]: s1.find("i")       # index of the first occurrence
Out[102]: 1

In [101]: s1.find("i",4,8)
Out[101]: 5
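Because find() accepts a start index, repeated calls can collect every occurrence. A sketch under that assumption, written for Python 3; find_all is a hypothetical helper, not part of the standard library.

```python
def find_all(text, sub):
    """Return the index of every occurrence of sub in text (hypothetical helper)."""
    positions = []
    start = text.find(sub)
    while start != -1:                      # find() returns -1 when nothing is left
        positions.append(start)
        start = text.find(sub, start + 1)   # resume just past the last hit
    return positions

s1 = "xie xiao jun"
print(find_all(s1, "i"))
print(find_all(s1, "x"))
print(find_all(s1, "z"))
```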

4) Replacing in strings

str.replace()

In [104]: s1.replace("x","X")
Out[104]: 'Xie Xiao jun'

In [105]: s1.replace("x","X",1)
Out[105]: 'Xie xiao jun'

5) str.strip(): removes whitespace from both ends of a string

str.rstrip(): removes whitespace from the right end only

str.lstrip(): removes whitespace from the left end only

In [21]: s2 = ' xie xiao jun '

In [22]: s2
Out[22]: ' xie xiao jun '

In [23]: s2.st
s2.startswith  s2.strip

In [23]: s2.strip()
Out[23]: 'xie xiao jun'

In [24]: s2.r
s2.replace     s2.rindex      s2.rpartition  s2.rstrip
s2.rfind       s2.rjust       s2.rsplit

In [24]: s2.rstrip()
Out[24]: ' xie xiao jun'

In [25]: s2.l
s2.ljust   s2.lower   s2.lstrip

In [25]: s2.lstrip()
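The three methods treat tabs and newlines as whitespace too, and they accept an optional set of characters to strip instead. A short sketch, assuming Python 3; the sample strings are made up.

```python
raw = "  \txie xiao jun \n"

both = raw.strip()        # whitespace removed from both ends
left = raw.lstrip()       # left end only; the trailing " \n" stays
right = raw.rstrip()      # right end only; the leading "  \t" stays

# With an argument, the given characters are stripped instead of whitespace
trimmed = "--xie--".strip("-")

print(repr(both), repr(left), repr(right), repr(trimmed))
```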

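III. The os module

Directories are not file objects; they belong to the file system. To work with the file system, use the os module

Commonly used functions in the os module:

1. Directories

getcwd(): returns the current working directory

chdir(): changes the working directory

chroot(): sets the root directory of the current process

listdir(): lists the names of all entries in the given directory

mkdir(): creates the given directory

makedirs(): creates a directory together with any missing parent directories

rmdir(): removes an empty directory

removedirs(): removes a directory and then every parent directory that has become empty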

In [1]: import os

In [4]: help(os.mkdir)

Help on built-in function mkdir in module posix:

mkdir(...)
    mkdir(path [, mode=0777])

    Create a directory.

In [2]: os.mkdir('/tmp/test')

In [3]: ls /tmp
passwd  vgauthsvclog.txt.0  yum_save_tx-2016-09-02-17-11cyWWR1.yumtx
test/   vmware-root/        yum_save_tx-2016-09-21-23-45jB1DoO.yumtx

In [6]: os.getcwd()
Out[6]: '/root'

In [7]: os.c
os.chdir          os.chroot         os.confstr        os.curdir
os.chmod          os.close          os.confstr_names
os.chown          os.closerange     os.ctermid

In [8]: os.chdir('/tmp')

In [9]: os.getcwd()
Out[9]: '/tmp'

In [10]: os.stat('test')
Out[10]: posix.stat_result(st_mode=16877, st_ino=522528, st_dev=2050L, st_nlink=2, st_uid=0, st_gid=0, st_size=4096, st_atime=1474959686, st_mtime=1474959686, st_ctime=1474959686)
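The directory functions can be exercised safely inside a temporary directory. This is a sketch assuming Python 3 on a POSIX system; the directory names and the keep.txt marker file are made up, and the marker is there only to stop removedirs() from pruning the scratch directory itself.

```python
import os
import tempfile

base = tempfile.mkdtemp()                       # scratch directory
open(os.path.join(base, "keep.txt"), "w").close()

os.mkdir(os.path.join(base, "test"))            # one level only
os.makedirs(os.path.join(base, "a", "b", "c"))  # creates a, a/b and a/b/c
entries = sorted(os.listdir(base))

previous = os.getcwd()
os.chdir(base)
moved = os.path.realpath(os.getcwd()) == os.path.realpath(base)
os.chdir(previous)                              # restore the old working directory

os.rmdir(os.path.join(base, "test"))            # works because test is empty
# Removes c, then b and a as they become empty; stops at base (not empty)
os.removedirs(os.path.join(base, "a", "b", "c"))
remaining = os.listdir(base)

print(entries, moved, remaining)
```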

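2. Files

mkfifo(): creates a named pipe (FIFO)

mknod(): creates a device file

remove(): removes a file

unlink(): removes a file (the same operation as remove())

rename(): renames a file

stat(): returns status information; works on both files and directories

symlink(): creates a symbolic link

utime(): updates the timestamps

tmpfile(): creates and opens (w+b) a new temporary file

walk(): directory tree generator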

In [49]: g1=os.walk('/tmp')

In [50]: g1.
g1.close       g1.gi_frame    g1.next        g1.throw
g1.gi_code     g1.gi_running  g1.send

In [50]: g1.next
Out[50]: <method-wrapper 'next' of generator object at 0x24f0050>

In [51]: g1.next()
Out[51]:
('/tmp',
 ['x', 'test1', 'vmware-root', 'test', '.ICE-unix'],
 ['test2',
  'yum_save_tx-2016-09-02-17-11cyWWR1.yumtx',
  'vgauthsvclog.txt.0',
  'passwd',
  'yum_save_tx-2016-09-21-23-45jB1DoO.yumtx'])
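Each item produced by walk() is a (dirpath, dirnames, filenames) tuple, so the generator is normally consumed in a for loop. A sketch assuming Python 3; the small directory tree is created only for this example.

```python
import os
import tempfile

# Build a tiny, made-up tree: base/passwd, base/test1/a.txt, base/test1/sub/b.txt
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "test1", "sub"))
for rel in ("passwd",
            os.path.join("test1", "a.txt"),
            os.path.join("test1", "sub", "b.txt")):
    open(os.path.join(base, rel), "w").close()

found = []
for dirpath, dirnames, filenames in os.walk(base):
    dirnames.sort()                 # editing in place also fixes the visiting order
    for name in sorted(filenames):
        full = os.path.join(dirpath, name)
        found.append(os.path.relpath(full, base))

print(found)
```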

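3. Access permissions

access(): tests whether the current process (using its real uid/gid) has access to a file

chmod(): changes the permissions

chown(): changes the owner and group

umask(): sets the default permission mask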

In [66]: os.a
os.abort   os.access  os.altsep

In [66]: os.access('/root',0)
Out[66]: True

In [67]: os.access('/root',100)
Out[67]: False
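The second argument of access() is normally one of the constants os.F_OK (existence), os.R_OK, os.W_OK or os.X_OK rather than a bare number. A minimal sketch, assuming Python 3 on a POSIX system; the temporary file is created only for the demonstration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()      # new file, owner read/write only
os.close(fd)

exists = os.access(path, os.F_OK)      # F_OK is 0: does the path exist?
readable = os.access(path, os.R_OK)    # may this process read it?

os.chmod(path, 0o400)                  # owner read-only
mode_bits = os.stat(path).st_mode & 0o777

print(exists, readable, oct(mode_bits))
```

The result of an os.W_OK check after the chmod() depends on who runs the code, since root bypasses permission bits.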

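4. File descriptors

open(): low-level system function; opens a file and returns a file descriptor

read(): reads up to a given number of bytes from a file descriptor

write(): writes a string to a file descriptor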
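Unlike the built-in open(), these functions work on integer file descriptors. A sketch assuming Python 3 on a POSIX system; the file name fd_demo.bin is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fd_demo.bin")  # hypothetical name

# os.open() takes flag constants instead of a mode string and returns an int
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
written = os.write(fd, b"hello fd\n")   # returns the number of bytes written
os.lseek(fd, 0, os.SEEK_SET)            # rewind before reading back
data = os.read(fd, 5)                   # at most 5 bytes
os.close(fd)                            # descriptors must be closed explicitly

print(written, data)
```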

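5. Device files

makedev(): composes a device number from a major and a minor device number

major(): extracts the major device number from a device number

minor(): extracts the minor device number from a device number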

IV. The os.path module

os.path is a submodule of the os module

It handles path management, that is, manipulation of the file path strings themselves

In [5]: os.path
Out[5]: <module 'posixpath' from '/usr/local/python27/lib/python2.7/posixpath.pyc'>

In [3]: os.path.
os.path.abspath                     os.path.join
os.path.altsep                      os.path.lexists
os.path.basename                    os.path.normcase
os.path.commonprefix                os.path.normpath
os.path.curdir                      os.path.os
os.path.defpath                     os.path.pardir
os.path.devnull                     os.path.pathsep
os.path.dirname                     os.path.realpath
os.path.exists                      os.path.relpath
os.path.expanduser                  os.path.samefile
os.path.expandvars                  os.path.sameopenfile
os.path.extsep                      os.path.samestat
os.path.genericpath                 os.path.sep
os.path.getatime                    os.path.split
os.path.getctime                    os.path.splitdrive
os.path.getmtime                    os.path.splitext
os.path.getsize                     os.path.stat
os.path.isabs                       os.path.supports_unicode_filenames
os.path.isdir                       os.path.sys
os.path.isfile                      os.path.walk
os.path.islink                      os.path.warnings
os.path.ismount

1. Functions related to file paths

basename(): returns the base name of a path

dirname(): returns the directory name of a path

join(): joins path components into one path

split(): returns a (dirname(), basename()) tuple

splitext(): returns a (filename, extension) tuple

In [6]: os.path.basename('/tmp/passwd')
Out[6]: 'passwd'

In [7]: os.path.dirname('/tmp/passwd')
Out[7]: '/tmp'

In [8]: os.listdir('/tmp')
Out[8]:
['x',
 'test2',
 'yum_save_tx-2016-09-02-17-11cyWWR1.yumtx',
 'test1',
 'vmware-root',
 'vgauthsvclog.txt.0',
 'passwd',
 'test',
 '.ICE-unix',
 'yum_save_tx-2016-09-21-23-45jB1DoO.yumtx']

In [9]: for filename in os.listdir('/tmp'):print os.path.join("/tmp",filename)
/tmp/x
/tmp/test2
/tmp/yum_save_tx-2016-09-02-17-11cyWWR1.yumtx
/tmp/test1
/tmp/vmware-root
/tmp/vgauthsvclog.txt.0
/tmp/passwd
/tmp/test
/tmp/.ICE-unix
/tmp/yum_save_tx-2016-09-21-23-45jB1DoO.yumtx

In [24]: os.path.split('/etc/sysconfig/network')
Out[24]: ('/etc/sysconfig', 'network')
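These functions only manipulate strings, so they also work on paths that do not exist. A short sketch, assuming Python 3 on a POSIX system (where os.path is posixpath, as in the session above); the sample paths are made up.

```python
import os.path

path = "/etc/sysconfig/network-scripts/ifcfg-eth0.bak"   # hypothetical path

directory, filename = os.path.split(path)   # same as (dirname, basename)
root, ext = os.path.splitext(filename)      # the extension keeps its dot
rebuilt = os.path.join(directory, root + ext)

# splitext() only splits off the last extension
tar_root, tar_ext = os.path.splitext("archive.tar.gz")

# An absolute component makes join() discard everything before it
joined = os.path.join("/tmp", "/etc/passwd")

print(directory, filename, root, ext)
print(tar_root, tar_ext, joined)
```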

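2. File information

getatime(): returns the time the file was last accessed

getctime(): returns the time of the last metadata change (on Unix)

getmtime(): returns the time the file was last modified

getsize(): returns the size of the file

3. Queries

exists(): whether the given file exists

isabs(): whether the given path is an absolute path

isdir(): whether the path exists and is a directory

isfile(): whether the path exists and is a regular file

islink(): whether the path exists and is a symbolic link

ismount(): whether the path is a mount point

samefile(): whether two paths point to the same file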
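The query functions return plain booleans, which makes them easy to combine. A sketch assuming Python 3 on a POSIX system; the file data.txt and the symbolic link data.lnk are created only for this example.

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "data.txt")   # hypothetical file
link = os.path.join(base, "data.lnk")     # hypothetical symbolic link

with open(target, "w") as f:
    f.write("12345")
os.symlink(target, link)

checks = {
    "exists": os.path.exists(target),
    "isfile": os.path.isfile(target),
    "isdir": os.path.isdir(base),
    "islink": os.path.islink(link),
    "isabs": os.path.isabs("data.txt"),           # relative path
    "samefile": os.path.samefile(target, link),   # the link resolves to target
    "missing": os.path.exists(os.path.join(base, "nope")),
}
size = os.path.getsize(target)      # size in bytes
mtime = os.path.getmtime(target)    # seconds since the epoch, as a float

print(checks, size, mtime)
```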

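V. The pickle module

When a Python program reads from or writes to a file, a conversion tool is needed to turn objects into strings

This is what makes persistent storage of objects possible

Storing objects in a file:

pickle module:

marshal:

Storing objects in a DB:

DBM interface (requires installing a third-party interface):

shelve module: both serializes objects and stores them in a DB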

In [31]: l1=[1,2,3,"4",'abc']

In [34]: f1=open('/tmp/test2','a+')

In [36]: s1="xj"

In [37]: f1.write(s1)

In [40]: cat /tmp/test2

In [42]: f1.close()

In [43]: cat /tmp/test2
xj

In [47]: print l1
[1, 2, 3, '4', 'abc']

In [57]: f1.write(l1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-83ae8c8c88d4> in <module>()
----> 1 f1.write(l1)

TypeError: expected a character buffer object      # write() expects a string, not a list

pickle module:

In [58]: import pickle

In [61]: help(pickle.dump)

Help on function dump in module pickle:

dump(obj, file, protocol=None)

[root@Node3 tmp]# cat test2
hello

In [77]: pickle.dump(l1,f1)     # l1 and f1 were defined earlier; f1 must be an open file

In [78]: f1.flush()

[root@Node3 tmp]# cat test2
hello                            (lp0
I1
aI2
aI3
aS'4'
p1
aS'abc'
p2

In [105]: l2=pickle.load(f2)

In [106]: l2
Out[106]: [1, 2, 3, '4', 'abc']
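A complete round trip looks like the following. This is a sketch assuming Python 3, where pickle files have to be opened in binary mode ("wb"/"rb"); the file name test2.pkl is made up.

```python
import os
import pickle
import tempfile

l1 = [1, 2, 3, "4", "abc"]
path = os.path.join(tempfile.mkdtemp(), "test2.pkl")   # hypothetical file name

# dump() serializes the object into an open binary file
with open(path, "wb") as f:
    pickle.dump(l1, f)

# load() rebuilds an equal object from the file
with open(path, "rb") as f:
    l2 = pickle.load(f)

# dumps()/loads() do the same with an in-memory bytes object
as_bytes = pickle.dumps(l1)
l3 = pickle.loads(as_bytes)

print(l2, l2 == l1, l2 is l1)
```

The loaded list is a new object that compares equal to the original. pickle.load() should only be used on data from a trusted source, because unpickling can execute arbitrary code.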

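VI. Regular expressions in Python

A file is an iterable object; iterating over it yields one line at a time

A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern.

The re module was added in Python 1.5; it provides Perl-style regular expression patterns.

The re module gives Python full regular expression support.

The compile function builds a regular expression object from a pattern string and optional flag arguments. That object has a set of methods for regular expression matching and substitution.

The re module also provides functions that do exactly the same as these methods; they take a pattern string as their first argument.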

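1. Regular expression metacharacters in Python

Largely the same as extended regular expressions in bash:

., [], [^],

Square brackets denote a set of characters, for example [a-z] or [abc]

Metacharacters may appear inside square brackets

Inside square brackets the metacharacter . stands only for a literal dot

[0-9], \d (any digit), \D (any non-digit)

[0-9a-zA-Z_], \w, \W

\s: any whitespace character: [ \n\t\f\v\r], \S

*, ?, +, {m}, {m,n}, {0,n}, {m,}

^, $, \b,

|, (), \nn

(*|+|?|{})?: matching is greedy by default; adding a ? after a repetition metacharacter makes it non-greedy

Capturing | grouping

Positional capture: (…)

Named capture: (?P<name>…) # specific to Python

Assertions

A test performed before or after the current match position in the target string, without consuming any characters

Lookahead assertions: (?=…) positive, (?!…) negative

Lookbehind assertions: (?<=…) positive, (?<!…) negative

? alone means lookahead, ?< means lookbehind

= means positive, ! means negative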
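The greedy, capturing and assertion syntax above can be illustrated briefly. A sketch assuming Python 3; the sample strings and group names are made up.

```python
import re

# Greedy vs. non-greedy repetition
greedy = re.findall(r"<.+>", "<a><b>")     # .+ takes as much as it can
lazy = re.findall(r"<.+?>", "<a><b>")      # .+? takes as little as it can

# Named capture groups
m = re.search(r"(?P<user>\w+)@(?P<host>[\w.]+)", "mail root@node3.example.com now")
user, host = m.group("user"), m.group("host")

# Positive lookahead: words that are followed by a colon (the colon is not consumed)
before_colon = re.findall(r"\w+(?=:)", "root:x:0")

# Negative lookahead: file names whose extension is not txt
not_txt = re.findall(r"\b\w+\.(?!txt\b)\w+", "a.txt b.log c.py")

# Positive lookbehind: digits that are preceded by a dollar sign
prices = re.findall(r"(?<=\$)\d+", "cost: $30, 40 units, $5")

print(greedy, lazy, user, host, before_colon, not_txt, prices)
```

All patterns are written as raw strings (r"...") so that backslashes reach the re module unchanged.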

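2. Commonly used functions in the re module

re.match(): matches at the beginning of the string; returns a match object

Attributes:

string

pos

endpos

Methods:

group(): returns the matched string, or the string matched by a given group

groups(): returns a tuple made up of the contents of the parenthesized groups

start()

end()

re.search(): finds the first match; returns a match object

re.findall(): finds all matches; returns a list

re.finditer(): finds all matches; returns an iterator whose items are match objects

re.split("m",str): splits str using m as the separator; returns a list

re.sub(): substitution; returns a string

re.subn(): returns a tuple

flags:

I or IGNORECASE: ignore case

M or MULTILINE: multi-line matching (^ and $ match at every line) # not used much

A or ASCII: ASCII-only matching (this flag exists in Python 3 only)

U or UNICODE: interprets \w, \W, \b, \B according to the Unicode character set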

In [251]: import re

In [252]: re.
re.DEBUG        re.S            re.compile      re.search
re.DOTALL       re.Scanner      re.copy_reg     re.split
re.I            re.T            re.error        re.sre_compile
re.IGNORECASE   re.TEMPLATE     re.escape       re.sre_parse
re.L            re.U            re.findall      re.sub
re.LOCALE       re.UNICODE      re.finditer     re.subn
re.M            re.VERBOSE      re.match        re.sys
re.MULTILINE    re.X            re.purge        re.template

In [262]: re.match('a','abc')
Out[262]: <_sre.SRE_Match at 0x319b3d8>       # returns a match object

In [263]: match=re.match('a',"abc")

In [264]: match.         # attributes and methods of the match object
match.end        match.groupdict  match.pos        match.start
match.endpos     match.groups     match.re         match.string
match.expand     match.lastgroup  match.regs
match.group      match.lastindex  match.span

In [264]: match.string     # the string that was matched against
Out[264]: 'abc'

In [266]: match.re
Out[266]: re.compile(r'a')    # the pattern used for the match; it is compiled automatically when matching

In [268]: match.group()     # group is a method; it returns the matched string
Out[268]: 'a'

In [269]: match=re.match('a.',"abc")

In [270]: match.group()
Out[270]: 'ab'

In [271]: match.groups()   # returns all captured groups as a tuple
Out[271]: ()

In [58]: str1="heelo world"

In [59]: re.search("(l(.))",str1)
Out[59]: <_sre.SRE_Match at 0x1603580>

In [60]: mat1=re.search("(l(.))",str1)

In [61]: mat1.
mat1.end        mat1.groupdict  mat1.pos        mat1.start
mat1.endpos     mat1.groups     mat1.re         mat1.string
mat1.expand     mat1.lastgroup  mat1.regs
mat1.group      mat1.lastindex  mat1.span

In [62]: help(mat1.group)

Help on built-in function group:

group(...)
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.

In [63]: mat1.group()  # the whole matched string
Out[63]: 'lo'

In [66]: mat1.group(0)  # the whole matched string
Out[66]: 'lo'

In [67]: mat1.group(1)  # the first group; content matched outside the parentheses is not included
Out[67]: 'lo'

In [68]: mat1.group(2)  # the second group (the content of the second pair of parentheses)
Out[68]: 'o'

In [69]: mat1.group(3)  # there is no third group
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-69-fe309512a255> in <module>()
----> 1 mat1.group(3)

IndexError: no such group

In [77]: mat1.groups()    # returns a tuple of the matched groups; content matched outside the parentheses is not included
Out[77]: ('lo', 'o')

In [78]: mat1.groups(0)
Out[78]: ('lo', 'o')

In [79]: mat1.groups(1)
Out[79]: ('lo', 'o')

In [80]: mat1.groups(2)
Out[80]: ('lo', 'o')

In [81]: mat1.groups(3)
Out[81]: ('lo', 'o')

In [89]: re.findall("(l(.))",str1)
Out[89]: [('lo', 'o'), ('ld', 'd')]

In [146]: for mat in re.findall("(o(.))",str1):print mat
('o ', ' ')
('or', 'r')

In [148]: for mat in re.finditer("(o(.))",str1):print mat
<_sre.SRE_Match object at 0x1603938>
<_sre.SRE_Match object at 0x16039c0>

In [150]: for mat in re.finditer("(o(.))",str1):print mat.group()
o 
or

In [151]: for mat in re.finditer("(o(.))",str1):print mat.groups()
('o ', ' ')
('or', 'r')

In [114]: str2
Out[114]: 'hellO wOrld'

In [120]: mat2=re.findall("(l(o))",str2,re.I)    # ignore case

In [121]: mat2
Out[121]: [('lO', 'O')]

In [122]: mat2=re.findall("(l(o))",str2,)

In [123]: mat2
Out[123]: []

In [282]: match.start()   # the position where the match starts
Out[282]: 0

In [283]: match.end()     # the position where the match ends
Out[283]: 2

In [299]: match.pos     # the position where the search started
Out[299]: 0

In [300]: match.endpos   # the position where the search ended
Out[300]: 3

In [2]: url="www.magedu.com"

In [3]: re.search("m",url)
Out[3]: <_sre.SRE_Match at 0x14f7098>

In [5]: mat=re.search("m",url)

In [6]: mat
Out[6]: <_sre.SRE_Match at 0x14f7100>

In [7]: mat.
mat.end        mat.group      mat.lastgroup  mat.re         mat.start
mat.endpos     mat.groupdict  mat.lastindex  mat.regs       mat.string
mat.expand     mat.groups     mat.pos        mat.span

In [8]: mat.group()
Out[8]: 'm'

In [10]: re.findall("m",url)
Out[10]: ['m', 'm']

In [11]: re.finditer("m",url)
Out[11]: <callable-iterator at 0x162f510>

In [12]: mat1=re.fi
re.findall   re.finditer

In [12]: mat1=re.finditer("m",url)

In [13]: mat1.next()
Out[13]: <_sre.SRE_Match at 0x1626e68>

In [14]: mat1.next()
Out[14]: <_sre.SRE_Match at 0x1626ed0>

In [15]: mat1.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-15-e0f232c7f87c> in <module>()
----> 1 mat1.next()

StopIteration:

In [19]: re.split(".",url)    # the dot must be escaped; unescaped, it matches every character
Out[19]: ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

In [20]: re.split("\.",url)
Out[20]: ['www', 'magedu', 'com']

In [30]: f1=open("/tmp/passwd","r+")

In [31]: re.split(":",f1.readline())
Out[31]: ['root', 'x', '0', '0', 'root', '/root', '/bin/bash\n']

re.sub():

In [34]: help(re.sub)

Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.

In [35]: url
Out[36]: 'www.magedu.com'

In [37]: re.sub("ma","MA",url)
Out[38]: 'www.MAgedu.com'

In [35]: re.sub("m","M",url)
Out[35]: 'www.Magedu.coM'

In [36]: re.sub("m","M",url,1)
Out[36]: 'www.Magedu.com'

In [37]: re.sub("m","M",url,2)
Out[37]: 'www.Magedu.coM'

In [39]: re.subn("m","M",url,3)    # also reports how many substitutions were made
Out[39]: ('www.Magedu.coM', 2)

In [169]: re.sub("M","S",url,count=2,flags=re.I)
Out[169]: 'www.Sagedu.coS'

In [170]: re.sub("M","S",url,count=2)
Out[170]: 'www.magedu.com'
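As the help text says, repl may contain backreferences or be a callable. A small sketch of both, assuming Python 3; the lambda and variable names are illustrative.

```python
import re

url = "www.magedu.com"

# \1, \2, \3 in repl refer to the captured groups of the pattern
swapped = re.sub(r"(\w+)\.(\w+)\.(\w+)", r"\3.\2.\1", url)

# A callable receives each match object and returns the replacement text
capitalized = re.sub(r"\b\w", lambda m: m.group().upper(), url)

# subn() returns the new string together with the number of substitutions
dashed, count = re.subn(r"\.", "-", url)

print(swapped)
print(capitalized)
print(dashed, count)
```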

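The difference between re.match and re.search

re.match only matches at the beginning of the string; if the beginning of the string does not fit the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.

Example:

#!/usr/bin/python
import re

line = "Cats are smarter than dogs"

# The r prefix marks a raw string, in which backslashes are not processed as
# escapes: in a raw string \n is two characters, \ and n, not a newline.
# Regular expressions and string literals both use \ for escaping, so the two
# can conflict; it is best to put r in front of any string used as a pattern.
matchObj = re.match( r'dogs', line, re.M|re.I)
if matchObj:
     print "match --> matchObj.group() : ", matchObj.group()
else:
     print "No match!!"

matchObj = re.search( r'dogs', line, re.M|re.I)
if matchObj:
     print "search --> matchObj.group() : ", matchObj.group()
else:
     print "No match!!"

The example above produces the following output:

No match!!
search --> matchObj.group() :  dogs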

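Search and replace

Python's re module provides re.sub for replacing matches in a string.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

The returned string is obtained by replacing the leftmost non-overlapping matches of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.

The optional argument count is the maximum number of matches to replace; count must be a non-negative integer. The default value 0 means that all matches are replaced.

Example:

#!/usr/bin/python
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print "Phone Num : ", num

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print "Phone Num : ", num

The example above produces the following output:

Phone Num :  2004-959-559
Phone Num :  2004959559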

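Regular expression modifiers: optional flags

A regular expression can take optional flag modifiers that control how matching is done. Modifiers are passed as an optional flags argument. Several flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and the M flag:

Modifier	Description
re.I	Makes matching case-insensitive
re.L	Performs locale-aware matching
re.M	Multi-line matching; affects ^ and $
re.S	Makes . match every character, including newline
re.U	Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B.
re.X	Allows a more flexible layout so that the regular expression can be written in a more readable way.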
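The effect of the most common flags is easiest to see side by side. A sketch assuming Python 3; the three-line sample text is made up.

```python
import re

text = "Root:0\nbin:1\nDaemon:2"   # three made-up lines

# Without re.M, ^ only matches at the very start of the string
first_only = re.findall(r"^\w+", text)
every_line = re.findall(r"^\w+", text, re.M)

# Without re.S, . does not match a newline
dot_stops = re.search(r"Root.*bin", text)
dot_spans = re.search(r"Root.*bin", text, re.S)

# Flags are combined with bitwise OR
lower_only = re.findall(r"^[a-z]+", text, re.M)
any_case = re.findall(r"^[a-z]+", text, re.I | re.M)

# re.X ignores whitespace in the pattern and allows comments
verbose = re.compile(r"""
    ^(\w+)    # name
    :         # separator
    (\d+)$    # numeric id
""", re.X | re.M)
pairs = verbose.findall(text)

print(first_only, every_line, lower_only, any_case, pairs)
```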