Search for a string with a regular expression and replace it with another in Python

I have a db.sql file that contains a large number of URLs, as shown below.

....<td class=\"column-1\"><a href=\"http://geni.us/4Lk5\" rel=nofollow\"><img src=\"http://www.toprateten.com/wp-content/uploads/2016/08/25460A-Panini-Press-Gourmet-Sandwich-Maker.jpg \" alt=\"25460A Panini Press Gourmet Sandwich Maker\" height=\"100\" width=\"100\"></a></td><td class=\"column-2\"><a href=\"http://geni.us/4Lk5\" rel=\"nofollow\">25460A Panini Press Gourmet Sandwich Maker</a></td><td class....

As you can see, the file contains http://geni.us/4Lk5.

I have another file, product.csv, which contains the IDs (such as 4Lk5 above) and the Amazon product URLs, as shown below.

4Lk5 8738 8/16/2016 0:20 https://www.amazon.com/gp/product/B00IWOJRSM/ref=as_li_qf_sp_asin_il_tl?ie=UTF8 
Jx9Aj2 8738 8/22/2016 20:16 https://www.amazon.com/gp/product/B007EUSL5U/ref=as_li_qf_sp_asin_il_tl?ie=UTF8 
9sl2 8738 8/22/2016 20:18 https://www.amazon.com/gp/product/B00C3GQGVG/ref=as_li_qf_sp_asin_il_tl?ie=UTF8

As you can see, the ID 4Lk5 is paired with an Amazon product URL.

I have already read the CSV file in Python and picked out only the ID and the Amazon product URL.

import csv

def openFile(filename, mode):
    result = []
    with open(filename, mode) as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',', quotechar='\n')
        for row in spamreader:
            # keep only the ID (column 0) and the Amazon URL (column 3)
            result.append({
                "genu_id": row[0],
                "amazon_url": row[3]
            })
    return result

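I need to add some code that searches db.sql for the URL matching each genu_id and replaces it with the corresponding amazon_url from the code above.

Please help me.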

Source

2017-06-06 Yuiry Kozlenko

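Why do you want to use a regular expression for this, instead of parsing the cell contents with 'lxml.html' or something similar?

I am new to Python, so I don't know much about it. I think I have to use a regular expression to select 'http://' + 'geni.us/4Lk5' in ...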

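There is no need for regular expressions if you have such a predefined structure. If all the links are of the form http://geni.us/<geni_id>, you can do this by reading each line of your CSV and replacing the matches in your SQL file with a simple str.replace(). Something like: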

import csv

# Python 3: the csv module expects a text-mode file opened with newline=""
with open("product.csv", newline="") as source, open("db.sql", "r+") as target:  # open the files
    sql_contents = target.read()  # read the SQL file contents
    reader = csv.reader(source, delimiter="\t")  # build a CSV reader, tab as a delimiter
    for row in reader:  # read the CSV line by line
        # replace any match of http://geni.us/<first_column> with the fourth column's value
        sql_contents = sql_contents.replace("http://geni.us/{}".format(row[0]), row[3])
    target.seek(0)  # seek back to the start of your SQL file
    target.truncate()  # truncate the rest
    target.write(sql_contents)  # write back the changed content
    # ...
    # Profit? :D

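Of course, if your original CSV file is comma-separated, change the delimiter in the csv.reader() call; the sample presented here appears to be tab-separated.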
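One caveat with plain str.replace() is that http://geni.us/4Lk5 is also a prefix of any longer ID such as http://geni.us/4Lk5x, so a longer link could be partially rewritten. If that matters, a regular expression with a dictionary lookup is one way around it. The following is only a sketch under assumptions: the helper names load_mapping and replace_geni_links are hypothetical, the IDs are assumed to be purely alphanumeric, and the sample data is built in memory rather than read from the real files.

```python
import csv
import io
import re

def load_mapping(csv_file, delimiter="\t"):
    """Build {geni_id: amazon_url} from an open CSV file (columns 0 and 3)."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    return {row[0]: row[3].strip() for row in reader if len(row) > 3}

def replace_geni_links(text, mapping):
    """Swap every http://geni.us/<id> link for its mapped URL in one pass."""
    # Assumption: IDs consist of ASCII letters and digits only.
    pattern = re.compile(r"http://geni\.us/([A-Za-z0-9]+)")
    # Unknown IDs are left untouched: m.group(0) is the original link.
    return pattern.sub(lambda m: mapping.get(m.group(1), m.group(0)), text)

# In-memory stand-ins for product.csv and db.sql (tab-separated sample rows).
sample_csv = io.StringIO(
    "4Lk5\t8738\t8/16/2016 0:20\t"
    "https://www.amazon.com/gp/product/B00IWOJRSM/ref=as_li_qf_sp_asin_il_tl?ie=UTF8\n"
    "9sl2\t8738\t8/22/2016 20:18\t"
    "https://www.amazon.com/gp/product/B00C3GQGVG/ref=as_li_qf_sp_asin_il_tl?ie=UTF8\n"
)
sample_sql = (
    r'<a href=\"http://geni.us/4Lk5\" rel=\"nofollow\">'
    r'<a href=\"http://geni.us/4Lk5x\" rel=\"nofollow\">'
)

mapping = load_mapping(sample_csv)
result = replace_geni_links(sample_sql, mapping)
print(result)
```

Because the pattern consumes the whole ID before the lookup, the 4Lk5x link in the sample stays as it is while the 4Lk5 link is replaced. To apply this to the real files, the same two functions could be called on the contents of product.csv and db.sql, with the result written back as in the code above.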

Source

2017-06-06 18:07:51 zwer
