Separating strings that are contained in other strings

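I'm setting up a script to merge PDFs based on the text contained in their file names. My problem is that "Violin I" is also contained in "Violin II", and "Alto Saxophone I" is also contained in "Alto Saxophone II". How can I set this up so that tempList contains only the entries for "Violin I" and excludes "Violin II", and vice versa?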

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]


# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # plain substring test: "Violin I" also matches "01 Violin II.pdf"
            if instrument in fileName:
                tempList.append(fileName)
        print(tempList)


print(pdfList)
organizer()
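A quick check confirms the overlap: Python's `in` operator on strings is a plain substring test, so the shorter name is found inside the longer one. A minimal sketch using two of the file names above:

```python
file_names = ["01 Violin I.pdf", "01 Violin II.pdf"]

# Plain substring test, the same check organizer() uses
violin_i_hits = [f for f in file_names if "Violin I" in f]
violin_ii_hits = [f for f in file_names if "Violin II" in f]

print(violin_i_hits)   # ['01 Violin I.pdf', '01 Violin II.pdf']  (unwanted extra match)
print(violin_ii_hits)  # ['01 Violin II.pdf']
```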


2013-03-23 jumbopap

Will the PDFs always be named like this, i.e. 'number + instrument + .pdf'? Or should we assume a PDF can have any name that contains the instrument? – woemler 2013-03-23 16:22:39

Yes, the PDFs will always be in the format "(initial number) + (some text) + (instrument) + .pdf". – jumbopap 2013-03-23 16:23:37

Try making this change:

... 
if instrument+'.pdf' in fileName: 
...

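That should cover all the cases without matching substrings: because the instrument name must now be followed immediately by ".pdf", "Violin I.pdf" is not found inside "01 Violin II.pdf", so the shorter name no longer matches the longer one.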
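A slightly stricter variant of the same idea uses `str.endswith`, which requires the instrument name plus ".pdf" to sit at the very end of the file name rather than anywhere in it. The sketch below also collects the files into a dictionary keyed by instrument so each group can be merged separately; the dictionary and the helper name `group_by_instrument` are assumptions for illustration, and the code relies on the naming format confirmed in the comments.

```python
pdf_list = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello",
               "Contrabass", "Alto Saxophone I", "Alto Saxophone II",
               "Tenor Saxophone", "Baritone Saxophone"]


def group_by_instrument(file_names, instrument_names):
    """Map each instrument to the files whose names end in '<instrument>.pdf'.

    Hypothetical helper: assumes the instrument name sits immediately
    before the '.pdf' extension, as stated in the comments.
    """
    groups = {}
    for instrument in instrument_names:
        suffix = instrument + ".pdf"
        # endswith() anchors the match at the end of the name, so
        # "Violin I.pdf" cannot match "01 Violin II.pdf".
        matches = [name for name in file_names if name.endswith(suffix)]
        if matches:
            groups[instrument] = matches
    return groups


groups = group_by_instrument(pdf_list, instruments)
for instrument in sorted(groups):
    print("{}: {}".format(instrument, groups[instrument]))
```

One remaining edge case: if one instrument name were a trailing part of another (for example a hypothetical "Saxophone" alongside "Tenor Saxophone"), both would still match. None of the names in the question overlap that way.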


2013-03-23 16:25:50 woemler

Simple and effective, thanks. – jumbopap 2013-03-23 17:57:40

One way is to use a regular expression, for example:

import re

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello",
               "Contrabass", "Alto Saxophone I", "Alto Saxophone II",
               "Tenor Saxophone", "Baritone Saxophone"]

# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # \b on both sides: "Violin I" no longer matches inside "Violin II"
            if re.search(r'\b{}\b'.format(instrument), fileName):
                tempList.append(fileName)
        print(tempList)

print(pdfList)
organizer()

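This wraps the search term in \b so that it only matches when there is a word boundary at both the beginning and the end. Also, perhaps obvious but worth pointing out: this makes your instrument names part of the regular expression, so if any of them contain characters that are also regex metacharacters, those characters will be interpreted as regex syntax (currently none of your names do). A more general solution would need some code to find and properly escape those characters.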
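For the escaping problem, Python's `re.escape` can be applied to each instrument name before it goes into the pattern. A minimal sketch, assuming a trimmed instrument list; the helper name `find_instruments` and the file name "01 Tenor Saxophone.pdf" are hypothetical. It also shows a limitation of word boundaries with this particular list: the position between a letter and a space is a word boundary, so \bTenor\b also matches inside "Tenor Saxophone".

```python
import re

pdf_list = ["01 Violin I.pdf", "01 Violin II.pdf", "01 Tenor Saxophone.pdf"]
instruments = ["Tenor", "Violin I", "Violin II", "Tenor Saxophone"]


def find_instruments(file_name, instrument_names):
    """Return every instrument whose name appears in file_name as whole words."""
    found = []
    for instrument in instrument_names:
        # re.escape() makes characters such as '.', '(' or '+' match literally
        pattern = r"\b" + re.escape(instrument) + r"\b"
        if re.search(pattern, file_name):
            found.append(instrument)
    return found


for name in pdf_list:
    print("{}: {}".format(name, find_instruments(name, instruments)))
```

Here "01 Tenor Saxophone.pdf" is reported for both "Tenor" and "Tenor Saxophone"; anchoring the match on the ".pdf" suffix, as in the other answer, avoids that overlap.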


2013-03-23 16:28:22 FatalError
