Python RegEx


A RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains the specified search pattern.

RegEx Module

Python has a built-in module called re, which can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python

Once you have imported the re module, you can start using regular expressions:

Example

Search the string to see if it starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
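
The search above only stores the result. Because re.search() returns either a Match object or None, the result can be tested directly in a condition. A minimal sketch follows; the variable name found and the printed messages are illustrative choices, not part of the example above.

```python
import re

txt = "The rain in Spain"

# ^The anchors the match at the start, .* spans anything in between,
# and Spain$ anchors the match at the end of the string.
x = re.search(r"^The.*Spain$", txt)

# A Match object is truthy; None (no match) is falsy.
found = x is not None
if found:
    print("YES! We have a match!")
else:
    print("No match")
```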

RegEx Functions

The re module offers a set of functions that let us search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or many matches with a string

Metacharacters

Metacharacters are characters with a special meaning:

Character	Description	Example
[]	A set of characters	"[a-m]"
\	Signals a special sequence (can also be used to escape special characters)	"\d"
.	Any character (except newline character)	"he..o"
^	Starts with	"^hello"
$	Ends with	"planet$"
*	Zero or more occurrences	"he.*o"
+	One or more occurrences	"he.+o"
?	Zero or one occurrences	"he.?o"
{}	Exactly the specified number of occurrences	"he{2}o"
|	Either or	"falls|stays"
()	Capture and group
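
To see a few of the metacharacters from the table in action, the sketch below runs them against a single sample string. The string "hello planet" and the variable names are assumptions made for illustration.

```python
import re

txt = "hello planet"

# [a-m] is a set: any single lower-case letter from a to m.
in_set = re.findall(r"[a-m]", txt)

# . matches any one character except a newline.
two_any = re.findall(r"he..o", txt)

# ^ and $ anchor the pattern to the start and end of the string.
at_start = re.findall(r"^hello", txt)
at_end = re.findall(r"planet$", txt)

# ? allows zero or one character between "he" and "o";
# "hello" has two characters there, so nothing matches.
zero_or_one = re.findall(r"he.?o", txt)

# | matches either alternative.
either = re.findall(r"hello|planet", txt)

print(in_set, two_any, at_start, at_end, zero_or_one, either)
```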

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character	Description	Example
\A	Returns a match if the specified characters are at the beginning of the string	"\AThe"
\b	Returns a match where the specified characters are at the beginning or at the end of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")	r"\bain" r"ain\b"
\B	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")	r"\Bain" r"ain\B"
\d	Returns a match where the string contains digits (numbers from 0-9)	"\d"
\D	Returns a match where the string DOES NOT contain digits	"\D"
\s	Returns a match where the string contains a white space character	"\s"
\S	Returns a match where the string DOES NOT contain a white space character	"\S"
\w	Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)	"\w"
\W	Returns a match where the string DOES NOT contain any word characters	"\W"
\Z	Returns a match if the specified characters are at the end of the string	"Spain\Z"
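
The special sequences can be tried out on the sample sentence used throughout this page. This is a small sketch with illustrative variable names; raw strings (the r prefix) are used so that Python does not interpret the backslashes itself.

```python
import re

txt = "The rain in Spain"

# \A matches only at the very beginning of the string.
at_beginning = re.findall(r"\AThe", txt)

# \b marks a word boundary: "ain" ends two words but starts none.
word_start = re.findall(r"\bain", txt)
word_end = re.findall(r"ain\b", txt)

# \d matches digits; the sentence contains none.
digits = re.findall(r"\d", txt)

# \s matches each white-space character.
spaces = re.findall(r"\s", txt)

# \w+ matches runs of word characters, i.e. the individual words.
words = re.findall(r"\w+", txt)

print(at_beginning, word_start, word_end, digits, spaces, words)
```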

Sets

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Set	Description
[arn]	Returns a match where one of the specified characters (`a`, `r`, or `n`) are present
[a-n]	Returns a match for any lower case character, alphabetically between `a` and `n`
[^arn]	Returns a match for any character EXCEPT `a`, `r`, and `n`
[0123]	Returns a match where any of the specified digits (`0`, `1`, `2`, or `3`) are present
[0-9]	Returns a match for any digit between `0` and `9`
[0-5][0-9]	Returns a match for any two-digit number from `00` to `59`
[a-zA-Z]	Returns a match for any character alphabetically between `a` and `z`, lower case OR upper case
[+]	In sets, `+`, `*`, `.`, `|`, `()`, `$`, `{}` have no special meaning, so `[+]` means: return a match for any `+` character in the string
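
A few of the sets from the table can be checked with findall(). The sample string "8 times before 11:45 AM" and the variable names below are assumptions chosen for illustration.

```python
import re

txt = "8 times before 11:45 AM"

# [0-5][0-9] matches two-digit numbers from 00 to 59.
two_digit = re.findall(r"[0-5][0-9]", txt)

# [0123] matches any single one of the listed digits.
listed_digits = re.findall(r"[0123]", txt)

# Inside a set, + is a literal plus sign; the text contains none.
plus_signs = re.findall(r"[+]", txt)

# [^arn] matches any character except a, r, and n.
not_arn = re.findall(r"[^arn]", "rain")

print(two_digit, listed_digits, plus_signs, not_arn)
```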

The findall() Function

The findall() function returns a list containing all matches.

Example

Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

Example

Return an empty list if no match was found:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)
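
Since findall() always returns a list, the usual list idioms apply to its result. The sketch below, with illustrative variable names, counts the matches with len() and relies on an empty list being falsy.

```python
import re

txt = "The rain in Spain"

matches = re.findall(r"ai", txt)
match_count = len(matches)  # number of non-overlapping matches

missing = re.findall(r"Portugal", txt)

# An empty list is falsy, so no explicit length check is needed.
if not missing:
    print("No match for Portugal")

print(matches, match_count)
```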

The search() Function

The search() function searches the string for a match and returns a Match object if there is one.

If there is more than one match, only the first occurrence is returned:

Example

Search for the first white-space character in the string:

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

If no matches are found, the value None is returned:

Example

Make a search that returns no match:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
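
Because search() can return None, calling a method such as start() on the result without checking it first raises an AttributeError. A minimal sketch of a guard follows; the fallback value -1 and the name position are arbitrary illustrative choices.

```python
import re

txt = "The rain in Spain"
x = re.search(r"Portugal", txt)

# x is None here, so x.start() would raise AttributeError.
# Check the result before using it.
position = x.start() if x else -1
print(position)
```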

The split() Function

The split() function returns a list where the string has been split at each match:

Example

Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

You can control the number of splits by specifying the maxsplit parameter:

Example

Split the string only at the first occurrence:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
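
The effect of maxsplit is easiest to see side by side with an unlimited split. The sketch below also shows what happens with two consecutive spaces; the string "The  rain" and the variable names are made up for illustration.

```python
import re

txt = "The rain in Spain"

# Without maxsplit, the string is split at every match.
all_parts = re.split(r"\s", txt)

# With maxsplit=2, only the first two matches split the string.
two_splits = re.split(r"\s", txt, maxsplit=2)

# Two consecutive spaces are two separate matches of \s,
# which leaves an empty string between them.
spaced = "The  rain"
with_empty = re.split(r"\s", spaced)

# \s+ treats a run of white space as a single separator.
collapsed = re.split(r"\s+", spaced)

print(all_parts, two_splits, with_empty, collapsed)
```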

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example

Replace every white-space character with the number 9:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

You can control the number of replacements by specifying the count parameter:

Example

Replace the first 2 occurrences:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)
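
For comparison, the sketch below runs sub() with different count values. The default, count=0, replaces every match; the variable names are illustrative.

```python
import re

txt = "The rain in Spain"

# count=1 replaces only the first match.
first_only = re.sub(r"\s", "9", txt, count=1)

# count=0 is the default and replaces every match.
replace_all = re.sub(r"\s", "9", txt, count=0)

print(first_only)
print(replace_all)
```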

Match Object

A Match object is an object containing information about the search and the result.

Note: If there is no match, the value None is returned instead of the Match object.

Example

Do a search that returns a Match object:

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.

Example

Print the position (start and end position) of the first match occurrence.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

Example

Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

Example

Print the part of the string where there was a match.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())
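
The three examples above can be combined into one guarded lookup. This is a sketch with illustrative variable names; it checks for None first and then reads span(), string, and group() from the same Match object.

```python
import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)

if x is not None:
    start, end = x.span()    # start and end positions of the match
    matched = x.group()      # the text that matched
    original = x.string      # the string that was searched
    # Slicing the original string with the span gives the matched text.
    sliced = original[start:end]
    print(start, end, matched, sliced)
```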

