How to make regex and unicode work together?
kornieff
Please help me get this regular expression working with Russian text.
 
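import re

word = "Ветер++,;"  # a byte string, not unicode
non_word_regex = re.compile('\W+')  # without re.U, \W treats the Cyrillic bytes as non-word characters too
word = non_word_regex.sub('', word)
print word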
tabajara
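# -*- coding: Windows-1251 -*-
import re
word = u"Ветер++,;"  # a unicode literal instead of a byte string
non_word_regex = re.compile(r'\W+', re.U)  # re.U: \w/\W follow Unicode character properties
word = non_word_regex.sub('', word)  # strips only "++,;"
print word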
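A minimal sketch of the same fix in Python 3 (an assumption: the thread itself uses Python 2). In Python 3 `str` is already Unicode and `\W` in a `str` pattern is Unicode-aware by default, so `re.U` becomes redundant. The `re.ASCII` flag and bytes patterns bring back the behaviour from the original question. The variable names are illustrative only.

```python
import re

word = "Ветер++,;"

# str patterns are Unicode-aware by default in Python 3, so \W
# leaves the Cyrillic letters alone and removes only "++,;".
cleaned = re.compile(r"\W+").sub("", word)
print(cleaned)  # Ветер

# re.ASCII restricts word characters to [a-zA-Z0-9_], so every
# Cyrillic letter is treated as "non-word" and removed as well.
cleaned_ascii = re.compile(r"\W+", re.ASCII).sub("", word)
print(repr(cleaned_ascii))  # ''

# Bytes patterns are always ASCII-only; this mirrors the original
# problem with a Windows-1251 byte string.
encoded = word.encode("cp1251")
cleaned_bytes = re.sub(rb"\W+", b"", encoded)
print(repr(cleaned_bytes))  # b''
```

In short: keep the text as Unicode (`str` in Python 3, `u"..."` in Python 2) and let `\W` work on characters, not on encoded bytes.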
kornieff
Thank you very much. One more question: what does the r in r'\W+' mean, and is there a detailed description of these string prefix characters?
tabajara
http://docs.python.org/ref/strings.html
String literals may optionally be prefixed with a letter “r” or “R”; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote.
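The rules quoted above can be checked directly. A small sketch, assuming Python 3 syntax (the thread uses Python 2, where these literals behave the same way), written as assertions:

```python
# "\n" is one newline character; r"\n" keeps the backslash: two characters.
assert len("\n") == 1
assert len(r"\n") == 2 and r"\n" == "\\" + "n"

# A backslash may escape the closing quote, but it stays in the string.
quoted = r"\""
assert len(quoted) == 2 and quoted == '\\"'

# So r'\W+' is exactly backslash, W, plus: the regex engine receives
# the backslash itself, just as with the doubled form '\\W+'.
assert r'\W+' == '\\W+'
print(len(r'\W+'))  # 3
```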
>>> print "\naaa\n"
aaa
>>> print r"\naaa\n"
\naaa\n
>>>
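Where the r prefix really matters for regular expressions is with escapes that Python itself understands. A sketch assuming Python 3; the sample text and pattern are made up for illustration. In an ordinary literal `\b` becomes the backspace character, while in a raw literal it reaches `re` as the word-boundary assertion.

```python
import re

text = "print word"

# In a normal string literal "\b" is the backspace character (\x08),
# so the pattern looks for literal backspaces around "word".
cooked = re.compile("\bword\b")
# In a raw literal the backslash survives and \b means "word boundary".
raw = re.compile(r"\bword\b")

cooked_match = cooked.search(text)  # None: the text has no \x08
raw_match = raw.search(text)        # finds "word"

print(cooked_match)       # None
print(raw_match.group())  # word
```

This is why regular expressions are usually written as raw strings: whatever sits after a backslash is passed to `re` untouched.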