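One way to go: read the CSV with ',' as the delimiter, extract the last two dates from each row and store them in a list, replace the dates with '', then join the columns you want and drop the rest.
For example, if you have a CSV file like this: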
239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee, 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).,2011-07-13 00:00:00,2011-07-13 00:00:00
239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE.).,2011-07-13 00:00:00,2011-07-13 00:00:00
239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,Owinv Vnwo Badvw 5 VIN sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).,2011-07-13 00:00:00,2011-07-13 00:00:00
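Counting the comma-separated fields on each line shows the problem. The free-text column contains a varying number of unescaped commas, so the rows split into different numbers of fields even though they all have the same logical layout (four leading fields, the text, two dates). The shorter rows end up padded with missing values when loaded, which is what the code below blanks out and trims. A quick stdlib check on the three sample lines (the variable names are only for illustration):

```python
# The three sample lines from above; the text column contains unescaped commas
lines = [
    "239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee, 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).,2011-07-13 00:00:00,2011-07-13 00:00:00",
    "239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE.).,2011-07-13 00:00:00,2011-07-13 00:00:00",
    "239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,Owinv Vnwo Badvw 5 VIN sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).,2011-07-13 00:00:00,2011-07-13 00:00:00",
]

# Number of fields a naive split on ',' produces for each line
field_counts = [len(line.split(",")) for line in lines]
print(field_counts)  # [10, 9, 7]
```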
Then:
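import numpy as np
import pandas as pd

df = pd.read_csv('good.txt', delimiter=',', header=None)
# Get the dates from the whole DataFrame (relies on every date starting with '2011-')
dates = [[item] for row in df.values for item in row if '2011-' in str(item)]
# Pair consecutive dates, assuming each row holds exactly two
dates = pd.DataFrame([x + y for x, y in zip(dates[0::2], dates[1::2])])
# Blank out the date cells (a regex match replaces the whole cell with NaN), then turn all NaN into ''
df = df.replace({'2011-': np.nan}, regex=True).replace(np.nan, '')
# Get the columns you want to merge (column 4 onward holds the split-up text)
cols = df.columns[4:]
# Merge the columns back together with ','
df[4] = df[cols].astype(str).apply(lambda x: ','.join(x), axis=1)
# Strip the trailing commas left by the empty cells (raw string avoids an invalid escape)
df[4] = df[4].replace(r',+$', '', regex=True)
# Drop the columns that were merged into column 4
df = df.drop(df.columns[5:], axis=1)
# Concat the dates
df = pd.concat([df, dates], axis=1)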
Output of print(df):
           0      1        2                      3  \
0  239845723  28374  2384234  AEVNE EFU 5 GN OR WNV
1  239845723  28374  2384234  AEVNE EFU 5 GN OR WNV
2  239845723  28374  2384234  AEVNE EFU 5 GN OR WNV

                                                   4                    0  \
0  Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera ...  2011-07-13 00:00:00
1  Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera ...  2011-07-13 00:00:00
2  Owinv Vnwo Badvw 5 VIN sebsbe sve(sevsev esvse...  2011-07-13 00:00:00

                     1
0  2011-07-13 00:00:00
1  2011-07-13 00:00:00
2  2011-07-13 00:00:00
Column 4 of the output, as a list:
['Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee, 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).',
'Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE.).',
'Owinv Vnwo Badvw 5 VIN sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).']
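None of these values ends in a comma because of the trailing-comma replacement on df[4]. The shorter rows were padded with empty cells, so joining columns 4 onward with ',' leaves a run of commas at the end. A small stdlib sketch of the same effect, using the third row's text as an illustrative value (after its two dates are blanked, columns 4 to 9 of that row are the text plus five empty strings):

```python
import re

# Third sample row: one text cell followed by five blanked/padded cells
text = "Owinv Vnwo Badvw 5 VIN sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.)."
cells = [text, "", "", "", "", ""]

joined = ",".join(cells)              # text followed by five commas
cleaned = re.sub(r",+$", "", joined)  # drop the run of trailing commas
print(cleaned == text)  # True
```

Note that this cleanup would also remove a comma that genuinely ended the text, which is harmless here since every value ends with a period.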
If you want to renumber the columns (after the concat, the labels 0 and 1 appear twice):
df.columns = range(df.shape[1])
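If pandas is not needed, the same split can be sketched per line with str.split and str.rsplit. This assumes, as in the sample, that every row has exactly four comma-free leading fields and ends with two comma-free date fields, so everything in between is the free-text column. The function name split_row is hypothetical, chosen for illustration:

```python
def split_row(line):
    """Split a line into (leading fields, free text, date1, date2).

    Assumes the first four and the last two fields never contain commas.
    """
    parts = line.split(",", 4)              # 4 leading fields + the remainder
    leading, rest = parts[:4], parts[4]
    text, date1, date2 = rest.rsplit(",", 2)  # peel the two dates off the right
    return leading, text, date1, date2


# First sample row, split across string literals for readability
line = ("239845723,28374,2384234,AEVNE EFU 5 GN OR WNV,"
        "Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6),"
        " sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee,"
        " 2 for WVEee VEWE. paper tuff as sWEFEWoon as VEWeew.).,"
        "2011-07-13 00:00:00,2011-07-13 00:00:00")

leading, text, d1, d2 = split_row(line)
print(leading)  # ['239845723', '28374', '2384234', 'AEVNE EFU 5 GN OR WNV']
print(d1, d2)   # 2011-07-13 00:00:00 2011-07-13 00:00:00
```

Splitting from both ends means the number of commas inside the text no longer matters, but it breaks if any of the fixed fields can itself contain a comma.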
Hope it helps.
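Just looking at the problem column, is there any consistent feature in the data you're trying to capture? For example, this example ends with a '.'; will they all end like that? – JBuete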
Hey JBuete! They do all end with a period, but in this example there are also periods elsewhere in the column's strings. –
If your data contains unescaped commas, you don't really have a CSV file. You have a bunch of lines with a bunch of commas in them. –
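To illustrate the point about escaping: Python's csv module, with its default minimal quoting, wraps any field that contains the delimiter in quotes, so once a row has been repaired it survives a write/read round trip. A minimal stdlib sketch using the second sample row's values:

```python
import csv
import io

# A repaired row: 4 leading fields, the free-text column, and the two dates
row = ["239845723", "28374", "2384234", "AEVNE EFU 5 GN OR WNV",
       "Owinv Vnwo Badvw 5 VIN,Ginq 2 jnwve wef evera wve 6 vwe as fgsb bfd bdfwd dsf (sdv seves 4-6), sebsbe sve(sevsev esvse 7-10) fsesef fesevsesv PaVvin (1 evesve vEV VEWee 2 for WVEee VEWE.).",
       "2011-07-13 00:00:00", "2011-07-13 00:00:00"]

buf = io.StringIO()
csv.writer(buf).writerow(row)  # default QUOTE_MINIMAL quotes the field with commas
written = buf.getvalue()
print(written)

# Reading it back yields exactly the same 7 fields
parsed = next(csv.reader(io.StringIO(written)))
print(parsed == row)  # True
```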