How do I replace keywords in the dictionary values of a list of dictionaries (case-insensitively)?
I have a list of keywords:
keywords = ["test", "Ok", "great stuff", "PaaS", "mydata"]
and a list of dictionaries of this form:
statements = [
{"id":"1","text":"Test, this is OK, great stuff, PaaS."},
{"id":"2","text":"I would like to test this, Great stuff."}
]
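Desired behaviour
When a keyword occurs in statement['text'] (regardless of case), I want to replace it with a "marked-up" version of the keyword, i.e. the matched keyword Test would become: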
<span class="my_class" data-mydata="<a href="#">test</a>">Test</span>
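To make the target concrete, here is a minimal sketch of a helper that builds that markup for one match. The name build_markup and its two parameters are hypothetical; the sketch assumes the lowercase keyword goes into data-mydata while the span body keeps the text exactly as it appeared in the statement, and it reproduces the nested double quotes verbatim from the example above.

```python
def build_markup(keyword, matched_text):
    """Return the marked-up version of one matched keyword.

    keyword      -- the keyword from the list (stored lowercased in data-mydata)
    matched_text -- the text as it appeared in the statement, e.g. "Test"
    """
    # Template copied from the desired output in the question (assumption:
    # only the two keyword positions vary between matches).
    template = '<span class="my_class" data-mydata="<a href="#">{0}</a>">{1}</span>'
    return template.format(keyword.lower(), matched_text)


example = build_markup("test", "Test")
print(example)
```

Keeping the matched text separate from the keyword is what lets "Test", "test" and "TEST" all keep their original casing in the output.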
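What I have tried
Below is what I have tried. Observations/notes:
01) It does not replace the keywords.
02) Once it does, and the markup has been applied, I don't want matches on text that sits inside the markup, i.e. the mydata inside the markup should not be matched.
03) I may have started off in the wrong direction and may need to redesign the logic from scratch.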
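One possible redesign that addresses points 01) and 02) is to build a single case-insensitive pattern from all keywords and substitute in one pass. re.sub does not rescan the text it inserts, so the mydata inside the generated markup is never matched. This is only a sketch: the names mark_up_keywords and replace are hypothetical, the \b word boundaries are an assumption (without them "ok" would also match inside "book"), and a non-empty keyword list is assumed.

```python
import re


def mark_up_keywords(text, keywords):
    # Longest keywords first, so a longer phrase wins over a shorter
    # keyword it happens to contain (assumes keywords is non-empty).
    ordered = sorted(set(k.lower() for k in keywords), key=len, reverse=True)
    # One alternation of escaped keywords, matched case-insensitively.
    # The \b word boundaries are an assumption, not part of the question.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        found = match.group(0)  # text exactly as it appears in the statement
        return ('<span class="my_class" data-mydata="<a href="#">'
                + found.lower() + '</a>">' + found + '</span>')

    # Single pass over the original text; inserted markup is not rescanned.
    return pattern.sub(replace, text)


keywords = ["test", "Ok", "great stuff", "PaaS", "mydata"]
statements = [
    {"id": "1", "text": "Test, this is OK, great stuff, PaaS."},
    {"id": "2", "text": "I would like to test this, Great stuff."},
]
for statement in statements:
    statement["text"] = mark_up_keywords(statement["text"], keywords)
    print(statement["text"])
```

The replacement is a function rather than a fixed string so that each match keeps its own casing in the span body.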
Python 2.7 code
import re
keywords = ["test", "ok", "great stuff", "paas"]
statements = [
{"id":"1","text":"Test, this is OK, great stuff, PaaS."},
{"id":"2","text":"I would like to test this, Great stuff."}
]
keyword_markup = {}
print "\nKEYWORDS (all lowercase):\n"
for i in keywords:
    print "\"" + i + "\" "
print "\nORIGINAL STATEMENTS:\n"
for statement in statements:
    print statement['text'] + "\n"
statement_counter = 1
# for each statement
for statement in statements:
    print "\nIN STATEMENT " + str(statement_counter) + ": \n"
    # get the original statement
    original_statement = statement['text']
    # for each keyword in the keyword list
    for keyword in keywords:
        # if the keyword is not in the keyword_markup dict
        # add it (with a lowercase key)
        if keyword.lower() not in keyword_markup:
            keyword_markup[keyword.lower()] = "<span class=\"my_class\" data-mydata=\"<a href=\"#\">" + keyword + "</a>\">" + keyword + "</span>"
            print "The key added to the keyword_markup dict is: " + keyword.lower()
        # if the keyword is in a lowercase version of the statement
        if keyword in original_statement.lower():
            # sanity check - print the matched keyword
            print "The keyword matched in the statement is: " + keyword
            # change the text value of the statement "in place"
            # by replacing the keyword, with its marked up equivalent.
            # using the original_statement as the source string
            statement['text'] = re.sub(keyword,keyword_markup[keyword.lower()],original_statement)
    statement_counter += 1
print "\nMARKED UP KEYWORDS AVAILABLE:\n"
for i in keyword_markup:
    print keyword_markup[i]
print "\nNEW STATEMENTS:\n"
for statement in statements:
    print statement['text'] + "\n"
Result
KEYWORDS (all lowercase):
"test"
"ok"
"great stuff"
"paas"
ORIGINAL STATEMENTS:
Test, this is OK, great stuff, PaaS.
I would like to test this, Great stuff.
IN STATEMENT 1:
The key added to the keyword_markup dict is: test
The keyword matched in the statement is: test
The key added to the keyword_markup dict is: ok
The keyword matched in the statement is: ok
The key added to the keyword_markup dict is: great stuff
The keyword matched in the statement is: great stuff
The key added to the keyword_markup dict is: paas
The keyword matched in the statement is: paas
IN STATEMENT 2:
The keyword matched in the statement is: test
The keyword matched in the statement is: great stuff
MARKED UP KEYWORDS AVAILABLE:
<span class="my_class" data-mydata="<a href="#">test</a>">test</span>
<span class="my_class" data-mydata="<a href="#">paas</a>">paas</span>
<span class="my_class" data-mydata="<a href="#">ok</a>">ok</span>
<span class="my_class" data-mydata="<a href="#">great stuff</a>">great stuff</span>
NEW STATEMENTS:
Test, this is OK, great stuff, PaaS.
I would like to test this, Great stuff.
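Two details of the code seem sufficient to explain why the new statements are identical to the originals: re.sub is case-sensitive unless re.IGNORECASE is passed, and every substitution starts again from original_statement, so the result for a later keyword overwrites the result for an earlier one. The small reproduction below illustrates both effects; the bracketed replacement strings are placeholders chosen for readability, not the real markup.

```python
import re

original = "Test, this is OK, great stuff, PaaS."

# Case-sensitive by default: the lowercase keyword does not match "Test".
case_sensitive = re.sub("test", "[test]", original)
print(case_sensitive)

# With the flag, the same keyword matches regardless of case.
case_insensitive = re.sub("test", "[test]", original, flags=re.IGNORECASE)
print(case_insensitive)

# Substituting from `original` on every pass discards earlier replacements:
# "great stuff" is replaced first, then the "paas" pass (which finds no
# case-sensitive match) returns the untouched original and overwrites it.
text = original
for keyword in ["great stuff", "paas"]:
    text = re.sub(keyword, "[" + keyword + "]", original)
print(text)
```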
Have you tried tokenizing the input, marking up the special tokens, and then reassembling the tokens into the output? https://docs.python.org/3/library/re.html#writing-a-tokenizer – IceArdor
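The tokenizing idea in this comment could look roughly like the sketch below. It swaps in re.split with a capturing group rather than following the linked recipe exactly, so that the text between keywords comes back as tokens too. The names tokenize and render, the token labels "KEYWORD" and "TEXT", and the \b word boundaries are all assumptions made for illustration; a non-empty keyword list is assumed.

```python
import re


def tokenize(text, keywords):
    """Split text into ("KEYWORD", value) and ("TEXT", value) tokens."""
    # Longest keywords first so longer phrases take priority
    # (assumes keywords is non-empty).
    ordered = sorted((k.lower() for k in keywords), key=len, reverse=True)
    # The capturing group makes re.split return the matched keywords as well.
    pattern = re.compile(
        r"(\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b)",
        re.IGNORECASE,
    )
    tokens = []
    # With one capturing group, odd indices hold the captured matches.
    for index, piece in enumerate(pattern.split(text)):
        if piece:  # skip empty strings at the edges
            tokens.append(("KEYWORD" if index % 2 else "TEXT", piece))
    return tokens


def render(tokens):
    """Reassemble tokens, marking up only the KEYWORD tokens."""
    parts = []
    for kind, value in tokens:
        if kind == "KEYWORD":
            parts.append('<span class="my_class" data-mydata="<a href="#">'
                         + value.lower() + '</a>">' + value + '</span>')
        else:
            parts.append(value)
    return "".join(parts)


tokens = tokenize("I would like to test this, Great stuff.", ["test", "great stuff"])
print(tokens)
print(render(tokens))
```

Because markup is only generated while rendering, keywords are never searched for inside text that has already been marked up.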