2016-09-25 79 views
3

Convert the string 2.90K to 2900 or 5.2M to 5200000 in a pandas DataFrame

I need some help processing data inside a pandas DataFrame. Any help is most welcome.

I have OHLCV data in CSV format. I have loaded the file into a pandas DataFrame.

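How do I convert the volume column so that 2.90K becomes 2900 and 5.2M becomes 5200000? The column can contain thousands written with a K suffix and millions written with an M suffix.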

import pandas as pd 

file_path = '/home/fatjoe/UCHM.csv' 
df = pd.read_csv(file_path, parse_dates=[0], index_col=0) 
df.columns = [ 
"closing_price", 
"opening_price", 
"high_price", 
"low_price", 
"volume", 
"change"] 

df['opening_price'] = df['closing_price'] 
df['opening_price'] = df['opening_price'].shift(-1) 
df = df.replace('-', 0) 
df = df[:-1] 
print(df.head()) 

Console: 
Date 
2016-09-23   0 
2016-09-22  9.60K 
2016-09-21  54.20K 
2016-09-20 115.30K 
2016-09-19  18.90K 
2016-09-16 176.10K 
2016-09-15  31.60K 
2016-09-14  10.00K 
2016-09-13  3.20K 

Answers

2

Assume you have the following DF:

In [30]: df 
Out[30]: 
     Date  Val 
0 2016-09-23  100 
1 2016-09-22 9.60M 
2 2016-09-21 54.20K 
3 2016-09-20 115.30K 
4 2016-09-19 18.90K 
5 2016-09-16 176.10K 
6 2016-09-15 31.60K 
7 2016-09-14 10.00K 
8 2016-09-13 3.20M 

You can do it this way:

In [31]: df.Val = (df.Val.replace(r'[KM]+$', '', regex=True).astype(float) * \ 
    ....:   df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False) 
    ....:    .fillna(1) 
    ....:    .replace(['K','M'], [10**3, 10**6]).astype(int)) 

In [32]: df 
Out[32]: 
     Date  Val 
0 2016-09-23  100.0 
1 2016-09-22 9600000.0 
2 2016-09-21 54200.0 
3 2016-09-20 115300.0 
4 2016-09-19 18900.0 
5 2016-09-16 176100.0 
6 2016-09-15 31600.0 
7 2016-09-14 10000.0 
8 2016-09-13 3200000.0 

Explanation:

In [36]: df.Val.replace(r'[KM]+$', '', regex=True).astype(float) 
Out[36]: 
0 100.0 
1  9.6 
2  54.2 
3 115.3 
4  18.9 
5 176.1 
6  31.6 
7  10.0 
8  3.2 
Name: Val, dtype: float64 

In [37]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False) 
Out[37]: 
0 NaN 
1  M 
2  K 
3  K 
4  K 
5  K 
6  K 
7  K 
8  M 
Name: Val, dtype: object 

In [38]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1) 
Out[38]: 
0 1 
1 M 
2 K 
3 K 
4 K 
5 K 
6 K 
7 K 
8 M 
Name: Val, dtype: object 

In [39]: df.Val.str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1).replace(['K','M'], [10**3, 10**6]).astype(int) 
Out[39]: 
0   1 
1 1000000 
2  1000 
3  1000 
4  1000 
5  1000 
6  1000 
7  1000 
8 1000000 
Name: Val, dtype: int32 
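
The steps explained above can also be bundled into one reusable function. The following is only a minimal sketch, not part of the original answer: the helper name `suffixed_to_number` and the `MULTIPLIERS` table are hypothetical, and it assumes every value is either a plain number or a number followed by a single K or M.

```python
import pandas as pd

# Hypothetical lookup table (an assumption for this sketch): suffix -> multiplier.
MULTIPLIERS = {"K": 10**3, "M": 10**6}


def suffixed_to_number(values: pd.Series) -> pd.Series:
    """Convert values such as '2.90K' or '5.2M' to floats."""
    # Cast to str so that numeric entries (e.g. the 0 that replaced '-') also parse.
    text = values.astype(str)
    # Column 0 holds the numeric part, column 1 the optional K/M suffix.
    parts = text.str.extract(r"^([\d.]+)([KM]?)$")
    number = parts[0].astype(float)
    # Values without a suffix are absent from the table, so they get a factor of 1.
    factor = parts[1].map(MULTIPLIERS).fillna(1).astype(float)
    return number * factor


sample = pd.Series(["100", "9.60M", "54.20K", "3.20M"])
result = suffixed_to_number(sample)
print(result.tolist())
```

Values that do not match the pattern come back as NaN rather than raising, which may or may not be what you want for dirty data.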
+0

Thanks... it works... now let me try to understand what the code is doing –

+0

@JosephMNjuguna, you're welcome! I've added a step-by-step explanation to my answer - please check... – MaxU

+0

@MaxU ... now I know how to use regular expressions with pandas ... I had been at it for a few days –

0
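def value_to_float(x):
    # Numbers (e.g. the 0 that replaced '-') pass through unchanged.
    if isinstance(x, (int, float)):
        return x
    # A bare suffix such as 'K' counts as one unit of that magnitude.
    if 'K' in x:
        if len(x) > 1:
            return float(x.replace('K', '')) * 1000
        return 1000.0
    if 'M' in x:
        if len(x) > 1:
            return float(x.replace('M', '')) * 1000000
        return 1000000.0
    if 'B' in x:
        return float(x.replace('B', '')) * 1000000000
    # Strings without a K, M or B suffix fall through to 0.0.
    return 0.0

# Replace "Insert data series column" with the name of your column.
df["Insert data series column"] = df["Insert data series column"].apply(value_to_float)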
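
Note that the function above returns 0.0 for a string with no suffix, such as '100'. If such values should keep their numeric value, a table-driven variant might look like the sketch below. It is not the answerer's code: the names `SUFFIXES` and `suffixed_value_to_float` are hypothetical, and it assumes the suffix is always the last character.

```python
# Hypothetical table of suffix multipliers; extend it if other suffixes appear.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def suffixed_value_to_float(x):
    """Convert '2.90K', '5.2M', '1B', '100' or a plain number to a float."""
    if isinstance(x, (int, float)):
        return float(x)
    text = x.strip()
    if text and text[-1] in SUFFIXES:
        number = text[:-1]
        # A bare suffix such as 'K' counts as one unit, as in the function above.
        return (float(number) if number else 1.0) * SUFFIXES[text[-1]]
    # No suffix: parse the string as an ordinary number (raises ValueError if it is not one).
    return float(text)


converted = [suffixed_value_to_float(v) for v in ["2.90K", "5.2M", "100", 0, "K"]]
print(converted)
```

It can be passed to `.apply` on a column in the same way as `value_to_float`.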
