Web Crawler Beginner Series (6): The Complete Guide to Regular Expressions (Part 2)

志军 · WeChat Official Account · Python · 2017-05-27 09:10

>>> import re
>>> re.search(r"f.o", "foobarfeobar").group()
'foo'

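Because of this difference, match and search suit different scenarios. To check whether a piece of text conforms to a pattern as a whole, for example whether a string is a valid email address, use match: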

>>> rex = r"[\w]+@[\w]+\.[\w]+$"
>>> re.match(rex, "[email protected]")  # match
<_sre.SRE_Match object at 0x...>
>>> re.match(rex, "the email is [email protected]")  # no match
>>>

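Although the second string contains an email address, the string as a whole cannot be used as one. Entered into an email field on a web form, it is clearly invalid.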
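The check above can be wrapped in a small helper for form validation. This is only a minimal sketch: the function name `is_valid_email` and the address `alice@example.com` are made up for illustration, and the pattern is the simplified one from this article, not a complete email grammar.

```python
import re

# Same simplified pattern as in the example above; the trailing $
# anchors the end of the string (hypothetical constant name).
EMAIL_REX = r"[\w]+@[\w]+\.[\w]+$"


def is_valid_email(text):
    """Return True if the whole string looks like an email address."""
    # re.match anchors at the start of the string; $ anchors the end.
    return re.match(EMAIL_REX, text) is not None


print(is_valid_email("alice@example.com"))               # True
print(is_valid_email("the email is alice@example.com"))  # False
```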

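In general, search is used to check whether a string contains a substring that matches the regular expression, and it can also extract that substring. For example: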

>>> rex = r"[\w]+@[\w]+\.[\w]+"
>>> m = re.search(rex, "the email is [email protected] .")
>>> m is None
False
>>> m.group()
'[email protected]'
>>>
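The same idea as a reusable function: a sketch that returns the first address found in a piece of text, or None if there is none. The name `extract_email` and the sample address are assumptions made for illustration.

```python
import re

# Unanchored pattern, so it can match anywhere inside the text.
EMAIL_REX = r"[\w]+@[\w]+\.[\w]+"


def extract_email(text):
    """Return the first substring that looks like an email address, or None."""
    m = re.search(EMAIL_REX, text)
    # search returns None when nothing matches, so guard before calling group().
    return m.group() if m else None


print(extract_email("the email is alice@example.com ."))  # alice@example.com
print(extract_email("no address here"))                   # None
```

Checking the result for None before calling group() avoids the AttributeError that `re.search(...).group()` raises when nothing matches.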

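If you look closely, you may have noticed that the regular expression in this example differs slightly from the one in the earlier match example: the earlier one ends with the metacharacter $, which forces the pattern to match through to the end of the string. Without $, match also succeeds in the following case, which is clearly unacceptable for form validation.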

>>> rex = r"[\w]+@[\w]+\.[\w]+"
>>> re.match(rex, "[email protected] is my email")
<_sre.SRE_Match object at 0x...>
>>>

Is there a way to test for a full match without adding $? In Python 3 (3.4 and later), re.fullmatch does exactly this.

>>> rex = r"[\w]+@[\w]+\.[\w]+"
>>> re.fullmatch(rex, "[email protected] is my email")  # no match
>>> re.fullmatch(rex, "[email protected]")  # match
<_sre.SRE_Match object; span=(0, 10), match='[email protected]'>
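To see the three functions side by side, here is a small comparison sketch. The helper `compare` and the sample strings are hypothetical; the pattern is the unanchored one used above.

```python
import re

EMAIL_REX = r"[\w]+@[\w]+\.[\w]+"


def compare(text):
    """Report whether match, search and fullmatch each succeed on text."""
    return {
        "match": re.match(EMAIL_REX, text) is not None,          # anchored at start only
        "search": re.search(EMAIL_REX, text) is not None,        # anywhere in the string
        "fullmatch": re.fullmatch(EMAIL_REX, text) is not None,  # whole string
    }


for sample in [
    "alice@example.com",
    "alice@example.com is my email",
    "the email is alice@example.com",
]:
    print(repr(sample), compare(sample))
```

Only the first sample passes all three checks. The second fails fullmatch alone, and the third passes search alone. One subtle point worth knowing: $ also matches just before a trailing newline, so a pattern ending in $ used with match accepts "alice@example.com\n", while fullmatch rejects it.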