Basic Usage of Python's re Module

January 3, 2020
Notes

Regular expressions in Python are handled by the re module:

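1. match() method: matches from the start of the string

    import re

    content = 'The 123456 is my one phone number.'
    print(len(content))  # length of the string
    # match(): the first argument is the pattern, the second is the string to match
    result = re.match(r'^The\s\d+\s\w*', content)
    print(result)
    print(result.group())  # the matched text
    print(result.span())   # (start, end) indices of the match

Output:

    34
    <_sre.SRE_Match object; span=(0, 13), match='The 123456 is'>
    The 123456 is
    (0, 13)

On Python 3.7 and later, the match object's repr reads <re.Match object; ...> instead.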

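2. Capturing a target

    import re

    content = 'The 123456 is my one phone number.'
    print(len(content))  # length of the string
    # match(): the first argument is the pattern, the second is the string to match
    result = re.match(r'^The\s(\d+)\sis', content)
    print(result)
    print(result.group())   # the whole matched text
    print(result.group(1))  # content of the first () group
    print(result.span())    # (start, end) indices of the match

Output:

    34
    <_sre.SRE_Match object; span=(0, 13), match='The 123456 is'>
    The 123456 is
    123456
    (0, 13)

Wrapping part of a pattern in () creates a capture group whose content can be read back with group(). If the pattern contains n groups, group(n) returns whatever the n-th pair of parentheses matched.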
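To make the group(n) numbering concrete, here is a minimal sketch with two capture groups. The second group, (\w+), is added purely for illustration and is not part of the original example.

```python
import re

content = 'The 123456 is my one phone number.'
# Two capture groups: the digits, then the word that follows them.
result = re.match(r'^The\s(\d+)\s(\w+)', content)

first = result.group(1)   # content of the first pair of parentheses
second = result.group(2)  # content of the second pair of parentheses
both = result.groups()    # every group at once, as a tuple
print(first, second, both)
```

group(0), or group() with no argument, still returns the whole match.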

3. General matching

    import re

    content = 'The 123456 is my one phone number.'
    # match(): the first argument is the pattern, the second is the string to match
    result = re.match(r'^The.*number.$', content)
    print(result)
    print(result.group())  # the matched text
    print(result.span())   # (start, end) indices of the match

Output:

    <_sre.SRE_Match object; span=(0, 34), match='The 123456 is my one phone number.'>
    The 123456 is my one phone number.
    (0, 34)

Here . matches any single character (except a newline, by default), and * repeats the preceding element zero or more times.
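The two symbols can also be tried in isolation. The short sketch below uses made-up strings ('abc', 'abbb') that do not appear in the original post.

```python
import re

# '.' matches any single character except a newline (when re.S is not set).
dot_ok = re.match(r'a.c', 'abc')
dot_newline = re.match(r'a.c', 'a\nc')

# '*' allows zero repetitions, so 'ab*' still matches a lone 'a'.
star_zero = re.match(r'ab*', 'a')
star_many = re.match(r'ab*', 'abbb')
print(dot_ok.group(), dot_newline, star_zero.group(), star_many.group())
```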

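4. Greedy vs. non-greedy matching

    import re

    content = 'The 123456 is my one phone number.'
    print('Greedy match:')
    # match(): the first argument is the pattern, the second is the string to match
    result = re.match(r'^The.*(\d+).*', content)
    print(result.group())                     # the matched text
    print('result = %s' % result.group(1))    # content of the first () group
    print('-' * 20)
    print('Non-greedy match:')
    result = re.match(r'^The.*?(\d+).*', content)
    print(result.group())
    print('result = %s' % result.group(1))

Output:

    Greedy match:
    The 123456 is my one phone number.
    result = 6
    --------------------
    Non-greedy match:
    The 123456 is my one phone number.
    result = 123456

The greedy .* consumes as much as it can, leaving only the last digit, 6, for \d+. The non-greedy .*? consumes as little as it can, so \d+ captures the full 123456.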
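One way to see where the digits go is to capture the wildcard part as well. The sketch below wraps .* and .*? in their own groups; that extra group is an illustrative addition, not part of the original code.

```python
import re

content = 'The 123456 is my one phone number.'

# Greedy: (.*) grabs as much as possible, then gives back one digit for (\d+).
greedy = re.match(r'^The(.*)(\d+)', content)
# Non-greedy: (.*?) grabs as little as possible, so (\d+) gets every digit.
lazy = re.match(r'^The(.*?)(\d+)', content)

greedy_groups = greedy.groups()
lazy_groups = lazy.groups()
print(greedy_groups)
print(lazy_groups)
```

In the greedy case the first group swallows ' 12345', which is why only '6' remains for the digit group.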

5. The re.S flag

    import re

    content = '''The 123456 is
    one of my phone.
    '''
    result = re.match(r'^The.*?(\d+).*?phone.', content, re.S)
    if result:
        print(result.group(1))
    else:
        print('result = None')
    result2 = re.match(r'^The.*?(\d+).*?phone.', content)
    if result2:
        print(result2.group(1))
    else:
        print('result2 = None')

Output:

    123456
    result2 = None

With the re.S flag, the wildcard . also matches newline characters, so result is a match while result2 is None. Besides re.S there are other flags, for example re.I, which makes matching case-insensitive.
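Flags can be combined with the | operator. The following sketch changes the sample text to use an uppercase 'PHONE' (an assumption made only for this illustration) to show re.I on its own and together with re.S.

```python
import re

content = 'The 123456 is\none of my PHONE.'
pattern = r'^the.*?(\d+).*?phone'

# re.I alone: case is ignored, but '.' still cannot cross the newline.
only_i = re.match(pattern, content, re.I)
# re.S | re.I: '.' matches newlines and case is ignored.
both = re.match(pattern, content, re.S | re.I)

print(only_i)
print(both.group(1))
```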

6. Escaping special characters

    import re

    content = '(百度)www.baidu.com'
    result = re.match('(百度)www.baidu.com', content)          # unescaped
    result2 = re.match(r'\(百度\)www\.baidu\.com', content)    # escaped
    if result:
        print(result.group())
    else:
        print('result = None')
    if result2:
        print(result2.group())
    else:
        print('result2 = None')

Output:

    result = None
    (百度)www.baidu.com

Because ( and ) are special characters in regular expressions, they must be escaped with a backslash (\) when you want to match them literally.
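When the text to match is known in advance, the standard library's re.escape() can add the backslashes for you. This is an alternative to escaping by hand, sketched here as a supplement; the original post does not use it.

```python
import re

content = '(百度)www.baidu.com'

# re.escape() returns a pattern that matches the given text literally.
pattern = re.escape(content)
result = re.match(pattern, content)

# An unescaped '.' is a wildcard, so it also accepts other characters.
loose = re.match(r'www.baidu', 'wwwXbaidu')
strict = re.match(r'www\.baidu', 'wwwXbaidu')
print(result.group(), loose is not None, strict)
```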

7. search() method: unlike match(), it does not have to match from the start of the string

    import re

    content = 'Other The 123456 is my one phone number.'
    result = re.search(r'The.*?(\d+).*?number.', content)
    print(result.group())

Output:

The 123456 is my one phone number.
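The difference is easiest to see by running both functions on the same input. The sketch below reuses the string from this section; the only change is that the final dot is escaped.

```python
import re

content = 'Other The 123456 is my one phone number.'
pattern = r'The.*?(\d+).*?number\.'

# match() only tries position 0, where the text is 'Other', so it fails.
matched = re.match(pattern, content)
# search() scans forward and finds 'The' at index 6.
found = re.search(pattern, content)

print(matched)
print(found.start(), found.group(1))
```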

8. findall() method: match() and search() both stop at the first match, whereas findall() returns everything that matches the pattern

    import re

    html = '''
    <div id="songs-list">
    <h2 class="title">歌单</h2>
    <p class="introduction">歌单列表</p>
    <ul id="list" class="list-group">
    <li data-view="2">一路上有你</li>
    <li data-view="7">
    <a href="/2.mp3" singer="任贤齐">沧海一声笑</a>
    </li>
    <li data-view="4" class="active">
    <a href="/3.mp3" singer="齐秦">往事随风</a>
    </li>
    <li data-view="6"><a href="/4.mp3" singer="beyond">光辉岁月</a></li>
    <li data-view="5"><a href="/5.mp3" singer="程慧玲">记事本</a></li>
    <li data-view="5">
    <a href="/6.mp3" singer="邓丽君">但愿人长久</a>
    </li>
    </ul>
    </div>
    '''

    # Three groups: link, singer, song title; re.S lets '.' span line breaks
    result = re.findall('<li.*?href="(.*?)".*?singer="(.*?)">(.*?)</a>', html, re.S)
    if result:
        print(result)
        for res in result:
            print(res[0], res[1], res[2])

Output:

    [('/2.mp3', '任贤齐', '沧海一声笑'), ('/3.mp3', '齐秦', '往事随风'), ('/4.mp3', 'beyond', '光辉岁月'), ('/5.mp3', '程慧玲', '记事本'), ('/6.mp3', '邓丽君', '但愿人长久')]
    /2.mp3 任贤齐 沧海一声笑
    /3.mp3 齐秦 往事随风
    /4.mp3 beyond 光辉岁月
    /5.mp3 程慧玲 记事本
    /6.mp3 邓丽君 但愿人长久
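The shape of the list returned by findall() depends on how many capture groups the pattern has, which is why the example above yields tuples. A small sketch on a made-up string ('a1 b22 c333', not from the original post):

```python
import re

text = 'a1 b22 c333'

# No groups: a list of whole matches.
no_group = re.findall(r'\d+', text)
# One group: a list of that group's content.
one_group = re.findall(r'([a-z])\d+', text)
# Several groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'([a-z])(\d+)', text)

print(no_group)
print(one_group)
print(two_groups)
```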

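9. sub() method: removing (or replacing) matched text

The second argument is an empty string (''), which means that whatever \d+ matches is replaced with nothing. Writing sub(r'\d+', '-', content) instead replaces each match with -.

    import re

    content = '54abc59de335f7778888g'
    content = re.sub(r'\d+', '', content)
    print(content)

Output: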

abcdefg
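The '-' variant mentioned in this section can be sketched as follows. The count argument shown in the last call is a standard parameter of re.sub() that the original post does not cover.

```python
import re

content = '54abc59de335f7778888g'

removed = re.sub(r'\d+', '', content)            # delete every run of digits
dashed = re.sub(r'\d+', '-', content)            # replace every run with '-'
first_only = re.sub(r'\d+', '-', content, count=1)  # replace only the first run

print(removed)
print(dashed)
print(first_only)
```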

10. compile()

    import re

    content1 = '2016-1-1 12:01'
    content2 = '2017-1-1 12:02'
    content3 = '2018-1-1 12:03'

    # Compile the time pattern once and reuse it
    pattern = re.compile(r'\d{2}:\d{2}')
    result1 = re.sub(pattern, '', content1)
    result2 = re.sub(pattern, '', content2)
    result3 = re.sub(pattern, '', content3)
    print(result1, result2, result3)

Output:

2016-1-1  2017-1-1  2018-1-1

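When the same regular expression is used repeatedly, compiling it once up front with compile() cuts down on repeated code. compile() also accepts flags such as re.S.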
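A compiled pattern also exposes sub(), search() and the other functions as methods, so it does not need to be passed back into re.sub(). The sketch below reuses the dates from this section; the names time_pattern and dot_all are chosen here for illustration.

```python
import re

contents = ['2016-1-1 12:01', '2017-1-1 12:02', '2018-1-1 12:03']
time_pattern = re.compile(r'\d{2}:\d{2}')

# Call sub() and search() directly on the compiled pattern.
dates = [time_pattern.sub('', c).strip() for c in contents]
times = [time_pattern.search(c).group() for c in contents]

# Flags are supplied once, at compile time.
dot_all = re.compile(r'is.one', re.S)
crosses_newline = dot_all.search('The 123456 is\none of my phone.') is not None

print(dates, times, crosses_newline)
```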
