2016-09-24 153 views

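Extracting various pieces of information from a CSV file

I want to write the results to another CSV file.

I want to extract various pieces of information, such as the name, the date and the address, from a two-column CSV file, based on the following conditions: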

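  1. Extract the name from the first row, because it will always be the first row.
  2. Extract the date with a regular expression (does Python have regular expressions?), since it is always in ##/##/#### format.
  3. Extract the address (from the Excel file) by the constant keyword 'road'.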
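Python does have regular expressions, in the standard `re` module. The three conditions above can be sketched as a small classifier. This is a minimal sketch, assuming the values for one ID are already collected in file order and that dates look like `2/06/2016` (one or two digits, slash, one or two digits, slash, four digits); `classify` and `parse_group` are hypothetical helper names.

```python
import re

# Assumed date format: d/dd/dddd or dd/dd/dddd, e.g. 2/06/2016 or 12/02/2016.
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")


def classify(value):
    """Return 'date', 'address' or 'other' for one DATA cell (hypothetical helper)."""
    value = value.strip()
    if DATE_RE.fullmatch(value):  # the whole cell must look like a date
        return "date"
    if "road" in value.lower():  # condition 3: constant keyword 'road'
        return "address"
    return "other"


def parse_group(values):
    """values: the DATA cells for one ID, in file order.

    Condition 1: the first cell is always the name; the rest are classified.
    """
    record = {"Name": values[0], "Address": "", "DATE": ""}
    for value in values[1:]:
        kind = classify(value)
        if kind == "date":
            record["DATE"] = value
        elif kind == "address":
            record["Address"] = value
    return record


print(parse_group(["DADDY", "2/06/2016", "new issac road"]))
```

Because the date and the address are recognised by their content, their order within a group does not matter; only the name has to come first.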


    Example pseudo source data, showing the CSV file format:

    ID,DATA
    88888,DADDY
    88888,2/06/2016
    88888,new issac road
    99999,MUMMY
    99999,samsung road
    99999,12/02/2016

    Expected CSV result:

    ID,Name,Address,DATE
    88888,DADDY,new issac road,2/06/2016
    99999,MUMMY,samsung road,12/02/2016
    

    Here is what I have so far:

    import csv
    from collections import defaultdict

    columns = defaultdict(list)  # each value in each column is appended to a list

    with open('dummy_data.csv', newline='') as f:
        reader = csv.DictReader(f)  # read rows into a dictionary format
        for row in reader:  # read a row as {column1: value1, column2: value2, ...}
            for k, v in row.items():  # go over each column name and value
                columns[k].append(v)  # append the value to the list for column name k

    # 'receipt_id' must match a header in the real file; the sample above uses 'ID'
    uniqueidstatement = columns['receipt_id']

    print(uniqueidstatement)

    with open("wtf.csv", 'w', newline='') as resultFile:
        wr = csv.writer(resultFile, dialect='excel')
        wr.writerow(uniqueidstatement)
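The code above collects values per column, but for this problem it is more convenient to collect the DATA values per ID, so each person's name, date and address end up together. A minimal sketch, assuming the headers are `ID` and `DATA` as in the sample; an in-memory string stands in for `dummy_data.csv`, and `values_by_id` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """ID,DATA
88888,DADDY
88888,2/06/2016
88888,new issac road
99999,MUMMY
99999,samsung road
99999,12/02/2016
"""


def values_by_id(text):
    """Collect the DATA cells for each ID, keeping the order they appear in."""
    groups = defaultdict(list)  # ID -> list of DATA values
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["ID"]].append(row["DATA"])
    return groups


groups = values_by_id(SAMPLE)
print(dict(groups))
```

Each list then starts with the name, so the remaining values only need to be told apart as date or address.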
    

    What is the actual problem you are running into? –


    Would a while loop be the right idea? – Perlinn


    I don't know where to start, given the conditions I have stated. – Perlinn

    Answer


    You can group the rows by ID, and within each group use some simple logic to tell which value is the date and which is the address.

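    import csv
    from itertools import groupby
    from operator import itemgetter

    # newline="" on both files stops blank lines appearing between output rows
    with open("test.csv", newline="") as f, open("out.csv", "w", newline="") as out:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        writer = csv.writer(out)
        writer.writerow(["ID", "NAME", "ADDRESS", "DATE"])
        # group consecutive rows that share the same ID (column 0)
        groups = groupby(reader, key=itemgetter(0))
        for k, v in groups:
            id_, name = next(v)  # the first row of each group holds the name
            add_date_1, add_date_2 = next(v)[1], next(v)[1]
            # whichever value contains "road" is the address; the other is the date
            date, add = (add_date_1, add_date_2) if "road" in add_date_2 else (add_date_2, add_date_1)
            writer.writerow([id_, name, add, date])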
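A variant of the answer's approach that is a little more forgiving, sketched under some assumptions: `groupby` only merges consecutive rows with the same key, so this version sorts by ID first (Python's sort is stable, so the name row stays first within each ID), skips empty lines, and does not assume exactly three rows per ID. The "road" test is the same heuristic as in the answer; `reshape` and the in-memory sample are hypothetical.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

SAMPLE = """ID,DATA
88888,DADDY
88888,2/06/2016
88888,new issac road
99999,MUMMY
99999,samsung road
99999,12/02/2016
"""


def reshape(text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    # csv.reader yields [] for empty lines; drop them. Then sort by ID so that
    # groupby sees all rows of one ID together (stable sort keeps name first).
    rows = sorted((row for row in reader if row), key=itemgetter(0))
    out = [["ID", "NAME", "ADDRESS", "DATE"]]
    for id_, group in groupby(rows, key=itemgetter(0)):
        values = [row[1] for row in group]
        name, rest = values[0], values[1:]
        # first value containing "road" is the address, first other value the date
        address = next((v for v in rest if "road" in v), "")
        date = next((v for v in rest if "road" not in v), "")
        out.append([id_, name, address, date])
    return out


buf = io.StringIO()
csv.writer(buf).writerows(reshape(SAMPLE))
print(buf.getvalue())
```

To write a real file, pass a file opened with `open("out.csv", "w", newline="")` to `csv.writer` instead of the `StringIO` buffer.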
    

    I get ID,NAME,ADDRESS,DATE, then an empty line, then 88888 DADDY new issac road 2/06/2016 (no commas, which is fine), and that's it. I'm a bit lost as to where the rest went. – Perlinn
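The empty line after the header is most likely the well-known `csv` line-ending issue: `csv.writer` already ends every row with `"\r\n"`, and if the output file is opened without `newline=""`, Windows translates the `"\n"` again, giving `"\r\r\n"`, which spreadsheet programs and editors show as an empty row. A minimal sketch that writes to a temporary file and reads the raw bytes back; `write_rows` is a hypothetical helper.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # newline="" stops Python from translating the "\r\n" that csv.writer
    # emits by default; without it, Windows writes "\r\r\n" per row.
    with open(path, "w", newline="") as out:
        csv.writer(out).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    write_rows(path, [["ID", "NAME"], ["88888", "DADDY"]])
    with open(path, "rb") as f:
        data = f.read()

print(data)  # b'ID,NAME\r\n88888,DADDY\r\n'
```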


    In 'id_, name, _ = next(v)', does id_ refer to the ID column of the source CSV file, and name to the name column of the source file? I removed the ', _'. – Perlinn
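On the unpacking question: `next(v)` returns one row as a list, and tuple unpacking binds its items left to right, so with a two-column file `id_` gets the ID column and `name` gets the DATA column of the group's first row. A target list with three names only works if the row has three columns, which is why removing `, _` is needed here. A small illustration, using a hard-coded row in place of `next(v)`:

```python
row = ["88888", "DADDY"]  # stands in for next(v) on the two-column file

id_, name = row  # id_ <- column 0 (ID), name <- column 1 (DATA)
print(id_, name)  # 88888 DADDY

error = None
try:
    id_, name, _ = row  # three targets, but the row has only two items
except ValueError as exc:
    error = str(exc)
print(error)  # not enough values to unpack (expected 3, got 2)
```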


    Is your data separated by empty lines and commas? Are those comments actually in the file? –