Split one specific column and add the parts as new columns (Python 3, CSV)

I have a CSV file with several columns, which I first split on semicolons (;). However, ONE column is pipe-delimited, and I want to split that column and create new columns from it.

Input:

Column 1 Column 2  Column 3 
    1   2   3|4|5 
    6   7   6|7|8 
    10   11   12|13|14

Desired output:

Column 1 Column 2  ID Age Height 
    1   2   3  4 5 
    6   7   6  7 8 
    10   11   12  13 14

My code so far splits on ; in a first pass and then converts the result to a DataFrame (which is the final format I want):

delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';')) 
df = pd.DataFrame(delimit)

Source

2015-11-06 user3682157

You could parse the last column and [split it (http://stackoverflow.com/questions/14745022/pandas-dataframe-how-do- –

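You haven't shown exactly what your data looks like (you say it is semicolon-delimited, but your example doesn't contain any semicolons), but if it looks like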

Column 1;Column 2;Column 3 
1;2;3|4|5 
6;7;6|7|8 
10;11;12|13|14

you could do something like

>>> df = pd.read_csv("test.csv", sep="[;|]", engine='python', skiprows=1, 
        names=["Column 1", "Column 2", "ID", "Age", "Height"]) 
>>> df 
    Column 1 Column 2 ID Age Height 
0   1   2 3 4  5 
1   6   7 6 7  8 
2  10  11 12 13  14

This works by splitting on a regular expression that means "either ; or |", and by forcing the column names manually.
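To see what the pattern does on its own, outside pandas, here is a small sketch using the standard `re` module. The sample line comes from the data above; `split_fields` is a hypothetical helper name used only for illustration.

```python
import re

# Character class: matches a single ';' or a single '|'.
# Inside [...] the pipe is a literal character, not alternation.
DELIMS = re.compile(r"[;|]")

def split_fields(line):
    """Split one raw line on either delimiter (hypothetical helper)."""
    return DELIMS.split(line.strip())

print(split_fields("1;2;3|4|5"))  # ['1', '2', '3', '4', '5']
```

Because the separator is a regular expression rather than a single character, pandas has to use its Python parsing engine, which is why `engine='python'` is passed explicitly.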

Alternatively, you could do it in a few steps:

>>> df = pd.read_csv("test.csv", sep=";") 
>>> df 
    Column 1 Column 2 Column 3 
0   1   2  3|4|5 
1   6   7  6|7|8 
2  10  11 12|13|14 
>>> c3 = df.pop("Column 3").str.split("|", expand=True) 
>>> c3.columns = ["ID", "Age", "Height"] 
>>> df.join(c3) 
    Column 1 Column 2 ID Age Height 
0   1   2 3 4  5 
1   6   7 6 7  8 
2  10  11 12 13  14
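Note that `str.split` leaves the new columns as strings. Below is a self-contained sketch of the same steps with a final conversion to integers. It assumes a pandas version recent enough to support `expand=True`, and it reads an in-memory copy of the sample data instead of `test.csv`.

```python
import io
import pandas as pd

# In-memory stand-in for test.csv (layout assumed from the example above).
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

df = pd.read_csv(io.StringIO(raw), sep=";")

# Remove the piped column and split it into three string columns.
parts = df.pop("Column 3").str.split("|", expand=True)
parts.columns = ["ID", "Age", "Height"]

# str.split yields strings, so convert before doing arithmetic.
result = df.join(parts.astype(int))
print(result.dtypes)
```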

Source

2015-11-11 00:52:42 DSM

I get the following error when trying to run the second half of the code: TypeError: split() got an unexpected keyword argument 'expand' – user3682157

@user3682157: You are most likely using an old version of pandas. – DSM

delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';')) 

for row in delimit: 
    piped = row.pop() 
    row.extend(piped.split('|')) 

df = pd.DataFrame(delimit)

delimit ends up looking like:

[ 
    ['1', '2', '3', '4', '5'], 
    ['6', '7', '6', '7', '8'], 
    ['10', '11', '12', '13', '14'], 
]
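One thing to watch with this loop: if `delimit` still contains the header row, its last cell ("Column 3") has no pipe, so the header keeps three fields while the data rows have five. A minimal sketch that handles the header separately, using only the standard library, an in-memory sample and the new column names from the question:

```python
import csv
import io

# In-memory stand-in for test.csv (layout assumed from the example above).
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

reader = csv.reader(io.StringIO(raw), delimiter=";")
header = next(reader)                  # ['Column 1', 'Column 2', 'Column 3']
header[-1:] = ["ID", "Age", "Height"]  # replace the piped column's name

rows = []
for row in reader:
    piped = row.pop()                  # last cell, e.g. '3|4|5'
    rows.append(row + piped.split("|"))

print(header)
print(rows)
```

Passing both pieces to `pd.DataFrame(rows, columns=header)` would then give a frame with proper column names.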

Source

2015-11-11 00:39:14

It is actually faster to use the csv lib and str.replace:

import csv
import pandas as pd

with open("test.txt") as f:
    next(f)  # skip the header line
    # On Python 2, use itertools.imap instead of map.
    df = pd.DataFrame.from_records(csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
            columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)

Some timings:

In [35]: %%timeit 
pd.read_csv("test.txt", sep="[;|]", engine='python', skiprows=1, 
        names=["Column 1", "Column 2", "ID", "Age", "Height"]) 
    ....: 
100 loops, best of 3: 14.7 ms per loop 

In [36]: %%timeit                
with open("test.txt") as f: 
    next(f) 
    df = pd.DataFrame.from_records(csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f),delimiter=";"), 
           columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int) 
    ....: 
100 loops, best of 3: 6.05 ms per loop
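These numbers depend on the machine, the file size and the pandas version, so it is worth re-measuring on your own data. Here is a rough sketch of how one might compare the two splitting strategies with the standard `timeit` module. It uses synthetic in-memory lines rather than `test.txt`, the row count and function names are made up for illustration, and it deliberately leaves pandas out, so it only measures the splitting step.

```python
import re
import timeit

# Synthetic data: 10,000 lines shaped like the sample rows (assumed size).
lines = ["%d;%d;%d|%d|%d" % (i, i + 1, i + 2, i + 3, i + 4) for i in range(10000)]

pattern = re.compile(r"[;|]")

def with_regex():
    # Split every line on either delimiter using a regular expression.
    return [pattern.split(line) for line in lines]

def with_replace():
    # Normalise '|' to ';' first, then use plain str.split.
    return [line.replace("|", ";").split(";") for line in lines]

# Both strategies must produce identical fields.
assert with_regex() == with_replace()

print("regex  :", timeit.timeit(with_regex, number=20))
print("replace:", timeit.timeit(with_replace, number=20))
```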

You can also just use str.split:

with open("test.txt") as f: 
    next(f) 
    df = pd.DataFrame.from_records(map(lambda x: x.rstrip().replace("|", ";").split(";"), f), 
            columns=["Column 1", "Column 2", "ID", "Age", "Height"])
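The plain `str.split` variant drops the `csv` module entirely, which is fine for purely numeric data like the sample. As a hedged illustration of the trade-off, the made-up row below (not from the original data) has a quoted field containing a semicolon; only `csv.reader` keeps it intact.

```python
import csv

line = '1;"x;y";3|4|5'  # made-up row whose second field contains a quoted ';'

# Plain string splitting ignores quoting and cuts inside the quoted field.
naive = line.replace("|", ";").split(";")

# csv.reader honours the quotes and keeps 'x;y' together.
parsed = next(csv.reader([line.replace("|", ";")], delimiter=";"))

print(naive)   # ['1', '"x', 'y"', '3', '4', '5']
print(parsed)  # ['1', 'x;y', '3', '4', '5']
```

If the file can contain quoted text fields, the `csv.reader` version is the safer of the two; note that both variants would still rewrite a pipe that appears inside a quoted field.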

Source

2015-11-16 12:38:47

Came up with a solution myself:

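# Assumes the first row of delimit is the header; without it, df has no 'Column 3'.
df = pd.DataFrame(delimit[1:], columns=delimit[0])
s = df['Column 3'].apply(lambda x: pd.Series(x.split('|')))
frame = pd.DataFrame(s)
# Order follows the desired output: ID, Age, Height.
frame.rename(columns={0: 'ID', 1: 'Age', 2: 'Height'}, inplace=True)
result = pd.concat([df, frame], axis=1)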

Source

2015-11-17 19:08:06 user3682157
