2017-02-23 65 views
1

This is a follow-up question to: pandas replace only part of a column (this time with a datetime index).

Here is my current input:

import pandas as pd 
from pandas_datareader import data, wb 
import numpy as np 
from datetime import date 

pd.set_option('expand_frame_repr', False) 

df = data.DataReader('GE', 'yahoo', date (2000, 1, 1), date (2000, 2, 1)) 
df['x'] = np.where (df['Open'] > df['High'].shift(-2), 1, np.nan) 
print (df.round(2)) 

# this section of code works perfectly for an integer based index....... 
ii = df[pd.notnull(df['x'])].index 
dd = np.diff(ii) 
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2] 
jj = [ii[0]] + jj 

for ci in jj: 
    df.loc[ci:ci+2,'x'] = 1.0 
# end of section that works perfectly for an integer based index...... 

print (df.round(2)) 

Here is my current output:

   Open High  Low Close Volume Adj Close x 
Date                 
2000-01-03 153.00 153.69 149.19 150.00 22069800  29.68 1.0 
2000-01-04 147.25 148.00 144.00 144.00 22121400  28.49 1.0 
2000-01-05 143.75 147.00 142.56 143.75 27292800  28.44 NaN 
2000-01-06 143.12 146.94 142.63 145.67 19873200  28.82 NaN 
2000-01-07 148.00 151.88 147.00 151.31 20141400  29.94 NaN 
2000-01-10 152.69 154.06 151.12 151.25 15226500  29.93 NaN 
2000-01-11 151.00 152.69 150.62 151.50 15123000  29.98 NaN 
2000-01-12 151.06 153.25 150.56 152.00 18342300  30.08 NaN 
2000-01-13 153.13 154.94 153.00 153.75 14953500  30.42 1.0 
2000-01-14 153.38 154.63 149.56 151.00 18480300  29.88 1.0 
2000-01-18 149.62 149.62 146.75 148.00 18296700  29.29 NaN 
2000-01-19 146.50 150.94 146.25 148.72 14849700  29.43 NaN 
2000-01-20 149.06 149.75 142.63 145.94 30759000  28.88 1.0 
2000-01-21 147.94 148.25 143.94 144.13 24005400  28.52 1.0 
2000-01-24 145.31 145.94 136.44 138.13 27116100  27.33 1.0 
2000-01-25 138.06 140.38 137.00 138.50 25387500  27.41 NaN 
2000-01-26 140.50 142.19 138.88 141.44 15856800  27.99 NaN 
2000-01-27 141.56 141.75 137.06 141.75 19243500  28.05 1.0 
2000-01-28 140.31 140.50 133.63 134.00 29846700  26.52 1.0 
2000-01-31 134.00 135.94 133.06 134.00 21782700  26.52 NaN 
2000-02-01 134.25 137.00 134.00 136.00 27339000  26.91 NaN 
Traceback (most recent call last): 
    File "C:\stocks\question4 for stack overflow.py", line 15, in <module> 
    jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2] 
    File "C:\stocks\question4 for stack overflow.py", line 15, in <listcomp> 
    jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2] 
TypeError: Cannot cast ufunc greater input from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind' 

What I want to do is change column 'x' into sets of three consecutive 1s, with no overlap. The desired output is:

   Open High  Low Close Volume Adj Close x 
Date                 
2000-01-03 153.00 153.69 149.19 150.00 22069800  29.68 1.0 
2000-01-04 147.25 148.00 144.00 144.00 22121400  28.49 1.0 
2000-01-05 143.75 147.00 142.56 143.75 27292800  28.44 1.0 
2000-01-06 143.12 146.94 142.63 145.67 19873200  28.82 NaN 
2000-01-07 148.00 151.88 147.00 151.31 20141400  29.94 NaN 
2000-01-10 152.69 154.06 151.12 151.25 15226500  29.93 NaN 
2000-01-11 151.00 152.69 150.62 151.50 15123000  29.98 NaN 
2000-01-12 151.06 153.25 150.56 152.00 18342300  30.08 NaN 
2000-01-13 153.13 154.94 153.00 153.75 14953500  30.42 1.0 
2000-01-14 153.38 154.63 149.56 151.00 18480300  29.88 1.0 
2000-01-18 149.62 149.62 146.75 148.00 18296700  29.29 1.0 
2000-01-19 146.50 150.94 146.25 148.72 14849700  29.43 NaN 
2000-01-20 149.06 149.75 142.63 145.94 30759000  28.88 1.0 
2000-01-21 147.94 148.25 143.94 144.13 24005400  28.52 1.0 
2000-01-24 145.31 145.94 136.44 138.13 27116100  27.33 1.0 
2000-01-25 138.06 140.38 137.00 138.50 25387500  27.41 NaN 
2000-01-26 140.50 142.19 138.88 141.44 15856800  27.99 NaN 
2000-01-27 141.56 141.75 137.06 141.75 19243500  28.05 1.0 
2000-01-28 140.31 140.50 133.63 134.00 29846700  26.52 1.0 
2000-01-31 134.00 135.94 133.06 134.00 21782700  26.52 1.0 
2000-02-01 134.25 137.00 134.00 136.00 27339000  26.91 NaN 

So January 5, 18, and 31 change from NaN to 1.0.

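As the comments in the code above state, the second part of the code works for an integer-based index. However, it does not work with a datetime index of dtype datetime64[ns]. I think I only need a small tweak to the second part of the code to get this working (hopefully).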
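A likely reading of the TypeError, sketched on a small made-up datetime index rather than the GE data: `np.diff` on datetime labels returns `timedelta64` values, so comparing them with the plain integer 2 is not meaningful, and `ci + 2` no longer means "two rows later". Working with integer row positions avoids both problems. The names `gaps`, `positions`, and `row_gaps` are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up sample index (not the GE data from the question)
idx = pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-13"])
x = pd.Series([1.0, 1.0, 1.0], index=idx)

# Differences between datetime labels are timedeltas, not integers
gaps = np.diff(idx.values)
print(gaps.dtype.kind)  # 'm' is NumPy's kind code for timedelta dtypes

# Comparing against a timedelta works, but it measures calendar time,
# not the number of rows between two entries
long_gap = gaps > np.timedelta64(2, "D")

# Integer positions of the non-NaN rows do not depend on the index type
positions = np.flatnonzero(x.notna().to_numpy())
row_gaps = np.diff(positions)
```

Note that a calendar gap and a row gap can disagree: a weekend adds days but no rows.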

Thanks in advance, David

-------------------------- Follow-up section ------------------------------------

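Thanks for hanging in there with me, b2002. I'm really trying to stick with your top solution because of its simplicity. When I run the code out of the box, here is the output: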

Original output ... jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2] ...

... a[ci:ci+2] = 1.0 ...

   Open High  Low Close Volume Adj Close x ii dd jj jj desired 
Date                 
2000-01-03 153.00 153.69 149.19 150.00 22069800  29.68 1.0 1 
2000-01-04 147.25 148.00 144.00 144.00 22121400  28.49 1.0 1 
2000-01-05 143.75 147.00 142.56 143.75 27292800  28.44 1.0 2   x x 
2000-01-06 143.12 146.94 142.63 145.67 19873200  28.82 1.0 3 1 
2000-01-07 148.00 151.88 147.00 151.31 20141400  29.94 NaN 4 1 
2000-01-10 152.69 154.06 151.12 151.25 15226500  29.93 NaN 5 1 
2000-01-11 151.00 152.69 150.62 151.50 15123000  29.98 NaN 6 1 
2000-01-12 151.06 153.25 150.56 152.00 18342300  30.08 NaN 7 1 
2000-01-13 153.13 154.94 153.00 153.75 14953500  30.42 1.0 1 
2000-01-14 153.38 154.63 149.56 151.00 18480300  29.88 1.0 1 
2000-01-18 149.62 149.62 146.75 148.00 18296700  29.29 1.0 10 3 x x x 
2000-01-19 146.50 150.94 146.25 148.72 14849700  29.43 1.0 11 1 
2000-01-20 149.06 149.75 142.63 145.94 30759000  28.88 1.0 1 
2000-01-21 147.94 148.25 143.94 144.13 24005400  28.52 1.0 1 
2000-01-24 145.31 145.94 136.44 138.13 27116100  27.33 1.0 1 
2000-01-25 138.06 140.38 137.00 138.50 25387500  27.41 1.0 15 4 z z 
2000-01-26 140.50 142.19 138.88 141.44 15856800  27.99 1.0 16 1 
2000-01-27 141.56 141.75 137.06 141.75 19243500  28.05 1.0 1 
2000-01-28 140.31 140.50 133.63 134.00 29846700  26.52 1.0 1 
2000-01-31 134.00 135.94 133.06 134.00 21782700  26.52 1.0 19 3 x x x 
2000-02-01 134.25 137.00 134.00 136.00 27339000  26.91 1.0 20 1    

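I really want to understand what is going on, so I added the columns ii, dd, jj, and jj desired to the output above. When I adjust the input to: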

... jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2] ...

... a[ci:ci+1] = 1.0 ...

Here is the output:

   Open High  Low Close Volume Adj Close x 
Date                 
2000-01-03 153.00 153.69 149.19 150.00 22069800  29.45 1.0 
2000-01-04 147.25 148.00 144.00 144.00 22121400  28.27 1.0 
2000-01-05 143.75 147.00 142.56 143.75 27292800  28.22 1.0 
2000-01-06 143.12 146.94 142.63 145.67 19873200  28.60 NaN 
2000-01-07 148.00 151.88 147.00 151.31 20141400  29.70 NaN 
2000-01-10 152.69 154.06 151.12 151.25 15226500  29.69 NaN 
2000-01-11 151.00 152.69 150.62 151.50 15123000  29.74 NaN 
2000-01-12 151.06 153.25 150.56 152.00 18342300  29.84 NaN 
2000-01-13 153.13 154.94 153.00 153.75 14953500  30.18 1.0 
2000-01-14 153.38 154.63 149.56 151.00 18480300  29.64 1.0 
2000-01-18 149.62 149.62 146.75 148.00 18296700  29.05 1.0 
2000-01-19 146.50 150.94 146.25 148.72 14849700  29.19 NaN 
2000-01-20 149.06 149.75 142.63 145.94 30759000  28.65 1.0 
2000-01-21 147.94 148.25 143.94 144.13 24005400  28.29 1.0 
2000-01-24 145.31 145.94 136.44 138.13 27116100  27.12 1.0 
2000-01-25 138.06 140.38 137.00 138.50 25387500  27.19 1.0 
2000-01-26 140.50 142.19 138.88 141.44 15856800  27.77 NaN 
2000-01-27 141.56 141.75 137.06 141.75 19243500  27.83 1.0 
2000-01-28 140.31 140.50 133.63 134.00 29846700  26.31 1.0 
2000-01-31 134.00 135.94 133.06 134.00 21782700  26.31 1.0 
2000-02-01 134.25 137.00 134.00 136.00 27339000  26.70 NaN 

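The only problem is with January 25, where np.diff gives a value of 4. I just need to skip the value of 4 so that the existing three 1s are preserved. I tried modifying dd before it goes into jj with these two attempts, neither of which worked: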

dd[dd == 4] = 1 

dd = [3 if x==4 else x for x in dd] 

I also tried modifying the jj line like this:

jj = [ii[i] for i in range(1,len(ii)) if ((dd == 4) or (dd[i-1] > 2))]

which gives this error message:

Traceback (most recent call last): 
    File "C:\stocks\question4 for stack overflow.py", line 109, in <module> 
    jj = [ii[i] for i in range(1,len(ii)) if ((dd == 4) or (dd[i-1] > 2))] 
    File "C:\stocks\question4 for stack overflow.py", line 109, in <listcomp> 
    jj = [ii[i] for i in range(1,len(ii)) if ((dd == 4) or (dd[i-1] > 2))] 
ValueError: The truth value of an array with more than one element is  ambiguous. Use a.any() or a.all() 
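The ValueError most likely comes from `dd == 4`, which compares the whole array and yields a boolean array that `or` cannot reduce to a single truth value. A minimal sketch of an element-wise test follows, assuming the intent is to skip gaps of exactly 4; the `ii` values are the NaN positions shown in the ii column of the table above.

```python
import numpy as np

# NaN positions taken from the "ii" column of the table above
ii = np.array([2, 3, 4, 5, 6, 7, 10, 11, 15, 16, 19, 20])
dd = np.diff(ii)

# Test one element of dd at a time, and use `and` on the two scalar results
jj = [ii[i] for i in range(1, len(ii)) if dd[i - 1] > 2 and dd[i - 1] != 4]
jj = [ii[0]] + jj

# Equivalent vectorized form: combine boolean arrays with & rather than and/or
mask = (dd > 2) & (dd != 4)
jj_vectorized = np.concatenate(([ii[0]], ii[1:][mask]))
```

Hard-coding the value 4 only fits this particular sample, so this is an illustration of the syntax rather than a general fix.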

Does anyone have any ideas?

+0

You could try using `ix` for mixed label/integer-based access instead of `loc`, or reset_index, do the transformation, and then set_index back to the date – Anzel
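The reset_index route from this comment can be sketched as follows. Note that `.ix` was deprecated and later removed in pandas 1.0, so only the second suggestion applies to current versions. The dates below are made up for illustration; the `x` values are the ones from the question's current output, and `tmp` is an illustrative name.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Made-up daily dates; x values copied from the question's current output
idx = pd.date_range("2000-01-03", periods=21, freq="D", name="Date")
df = pd.DataFrame(
    {"x": [1, 1, nan, nan, nan, nan, nan, nan, 1, 1, nan,
           nan, 1, 1, 1, nan, nan, 1, 1, nan, nan]},
    index=idx,
)

# Move the dates into a column so the index becomes 0, 1, 2, ...
tmp = df.reset_index()

# The integer-index logic from the question, unchanged
ii = tmp[pd.notnull(tmp["x"])].index
dd = np.diff(ii)
jj = [ii[0]] + [ii[i] for i in range(1, len(ii)) if dd[i - 1] > 2]
for ci in jj:
    tmp.loc[ci:ci + 2, "x"] = 1.0  # .loc slices include both endpoints

# Restore the datetime index
df = tmp.set_index("Date")
```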

+0

Can you explain the logic of the code? What are you trying to do? Why do you need 3 consecutive 1s in these rows? – Parfait

+0

Parfait - it is just an example. There is no specific reason behind it. –

Answers

-1

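--------------------- Final answer / final solution ----------- OK, it took a couple of weeks of part-time work and dozens of hours, but I finally figured it out! I know this code is a blunt instrument, but it works. If anyone has suggestions for shortening the code or speeding it up, please let me know!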

Here is the final input:

import pandas as pd 
from pandas_datareader import data, wb 
import numpy as np 
from datetime import date 

df = data.DataReader('GE', 'yahoo', date (2000, 1, 1), date (2000, 6, 1)) 
df['x'] = np.where (df['Open'] < df['High'].shift(-2), 1, np.nan) 
df['x2'] = df['x'] 

test = 0 

for i in np.nditer(df['x2'], op_flags=['readwrite']): 

    if test == 4: 
     test = 0 

    if test == 3: 
     i[...] = 3 
     test = 4 

    if test == 2: 
     i[...] = 2 
     test = 3 

    if (test == 1) & (i[...] == 1): 
     i[...] = 1 
     test = 2 

    if (test == 0) & (i[...] == 1): 
     i[...] = 1 
     test = 2 

    # NaN never compares equal to anything, so test with np.isnan rather than ==
    if (test == 0) & np.isnan(i[...]): 
     i[...] = np.nan 
     test = 1 

print (df.round(2)) 

Here is part of the final output:

   Open High  Low Close Volume Adj Close x x2 
Date                  
2000-01-03 153.00 153.69 149.19 150.00 22069800  29.45 NaN NaN 
2000-01-04 147.25 148.00 144.00 144.00 22121400  28.27 NaN NaN 
2000-01-05 143.75 147.00 142.56 143.75 27292800  28.22 1.0 1.0 
2000-01-06 143.12 146.94 142.63 145.67 19873200  28.60 1.0 2.0 
2000-01-07 148.00 151.88 147.00 151.31 20141400  29.70 1.0 3.0 
2000-01-10 152.69 154.06 151.12 151.25 15226500  29.69 1.0 1.0 
2000-01-11 151.00 152.69 150.62 151.50 15123000  29.74 1.0 2.0 
2000-01-12 151.06 153.25 150.56 152.00 18342300  29.84 1.0 3.0 
2000-01-13 153.13 154.94 153.00 153.75 14953500  30.18 NaN NaN 
2000-01-14 153.38 154.63 149.56 151.00 18480300  29.64 NaN NaN 
2000-01-18 149.62 149.62 146.75 148.00 18296700  29.05 1.0 1.0 
2000-01-19 146.50 150.94 146.25 148.72 14849700  29.19 1.0 2.0 
2000-01-20 149.06 149.75 142.63 145.94 30759000  28.65 NaN 3.0 
2000-01-21 147.94 148.25 143.94 144.13 24005400  28.29 NaN NaN 
2000-01-24 145.31 145.94 136.44 138.13 27116100  27.12 NaN NaN 
2000-01-25 138.06 140.38 137.00 138.50 25387500  27.19 1.0 1.0 
2000-01-26 140.50 142.19 138.88 141.44 15856800  27.77 NaN 2.0 
2000-01-27 141.56 141.75 137.06 141.75 19243500  27.83 NaN 3.0 
2000-01-28 140.31 140.50 133.63 134.00 29846700  26.31 NaN NaN 
2000-01-31 134.00 135.94 133.06 134.00 21782700  26.31 1.0 1.0 
2000-02-01 134.25 137.00 134.00 136.00 27339000  26.70 1.0 2.0 
2000-02-02 137.12 137.62 134.06 134.06 21820200  26.32 1.0 3.0 
2000-02-03 135.94 139.81 135.25 139.25 20232000  27.34 1.0 1.0 
2000-02-04 141.00 143.12 140.50 141.56 18167100  27.79 NaN 2.0 
2000-02-07 141.69 141.75 135.88 136.50 18285000  26.80 NaN 3.0 

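I changed the values in column x2 to show 1 - 3 instead of just 1, so that it is easy to see when a new series starts right at the end of an old one.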
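One possible way to shorten the loop, offered as a sketch rather than a drop-in replacement: walk the column by position and, whenever a 1 is found, mark that row and the following rows as one run, then jump past the run. The function name `mark_runs` and its `length` parameter are hypothetical, and the output uses plain 1s instead of the 1 - 3 counter.

```python
import numpy as np
import pandas as pd

def mark_runs(x, length=3):
    """Return a copy of x where each 1 starts a non-overlapping run of 1s."""
    values = x.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    i = 0
    while i < len(values):
        if values[i] == 1:
            out[i:i + length] = 1.0  # slice is clipped at the end of the array
            i += length              # skip the run so runs never overlap
        else:
            i += 1
    return pd.Series(out, index=x.index, name=x.name)

nan = np.nan
# Made-up daily dates; x values copied from the first output in the question
idx = pd.date_range("2000-01-03", periods=21, freq="D")
x = pd.Series([1, 1, nan, nan, nan, nan, nan, nan, 1, 1, nan,
               nan, 1, 1, 1, nan, nan, 1, 1, nan, nan], index=idx)
result = mark_runs(x)
```

Because it only uses positions, the same function works for integer and datetime indexes alike.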

+0

I don't understand the -1 ... can you explain? –

1

The code will work if it does not depend on the index:

#mod version 
a = np.array(df.x) 
ii = np.where(np.isnan(a))[0] 

dd = np.diff(ii) 
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2] 
jj = [ii[0]] + jj 

for ci in jj: 
    a[ci:ci+2] = 1.0 
df.x = a 

I'm not sure the result is what you are looking for, though...
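For comparison, here is a hedged variant that keeps the question's original starting point (the positions of the non-NaN values rather than the NaN values) and writes back through `iloc`, so the datetime index is never used for arithmetic. The dates are made up; `starts` and `col` are illustrative names, and the sketch assumes the column contains at least one non-NaN value.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Made-up daily dates; x values copied from the question's current output
idx = pd.date_range("2000-01-03", periods=21, freq="D", name="Date")
df = pd.DataFrame(
    {"x": [1, 1, nan, nan, nan, nan, nan, nan, 1, 1, nan,
           nan, 1, 1, 1, nan, nan, 1, 1, nan, nan]},
    index=idx,
)

# Integer positions of the existing 1s
ii = np.flatnonzero(df["x"].notna().to_numpy())
dd = np.diff(ii)

# A run starts at the first 1 and after every gap of more than 2 rows
starts = [ii[0]] + [ii[i] for i in range(1, len(ii)) if dd[i - 1] > 2]

col = df.columns.get_loc("x")
for ci in starts:
    df.iloc[ci:ci + 3, col] = 1.0  # .iloc slices exclude the end, hence + 3
```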

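The code below lets you search for specific patterns and then replace them with other defined patterns. The downside is that the whole array is looped through multiple times, once per search pattern, which may or may not matter depending on the size of your data.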

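'Found' patterns are marked and excluded from subsequent search loops to avoid overlapping results. The search is therefore done in priority order. Adjust the elements in the pattern and fill lists to change the rules.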

I think the pattern rules below produce the desired output as described in your previous question, but they have only been lightly tested...

# search patterns in original data (zeros represent nans) 
p1 = [1., 1., 1.] 
p2 = [1., 0., 1.] 
p3 = [1., 1., 0.] 
p4 = [1., 0., 0.] 

# markers to 'set aside' found patterns (can be any list of floats > 1.0 
# for each, the same float for each fill makes it easy to see which 
# replacements were done where for testing...) 
f1 = [5., 5., 5.] 
f2 = [4., 4., 4.] 
f3 = [3., 3., 3.] 
f4 = [2., 2., 2.] 

patterns = [p1, p2, p3, p4] 
fills = [f1, f2, f3, f4] 

def fill_segments(a, test_patterns, fill_patterns):
    # replace nans with zeros so fast numpy array_equal will work
    nan_idx = np.where(np.isnan(a))[0]
    np.put(a, nan_idx, 0.)
    col_index = list(np.arange(a.size))
    # loop forward through sequence comparing segment patterns
    for j in np.arange(len(test_patterns)):
        this_pattern = test_patterns[j]
        snip = len(this_pattern)
        rng = col_index[:-snip + 1]
        for i in rng:
            seg = a[col_index[i: i + snip]]
            if np.array_equal(seg, this_pattern):
                # when a match is found, replace values in array segment
                # with fill pattern
                pattern_indexes = col_index[i: i + snip]
                np.put(a, pattern_indexes, fill_patterns[j])
    # convert all fillers to ones
    np.put(a, np.where(a > 1.)[0], 1.)
    # convert zeros back to nans
    np.put(a, np.where(a == 0.)[0], np.nan)

    return a

Run the function and assign the result to the df.x column:

df.x = fill_segments(np.array(df.x), patterns, fills) 

Input:

   Open High  Low Close Volume Adj Close x 
Date                 
2000-01-03 153.00 153.69 149.19 150.00 22069800 29.68  1.0 
2000-01-04 147.25 148.00 144.00 144.00 22121400 28.49  1.0 
2000-01-05 143.75 147.00 142.56 143.75 27292800 28.44  NaN 
2000-01-06 143.12 146.94 142.63 145.67 19873200 28.82  NaN 
2000-01-07 148.00 151.88 147.00 151.31 20141400 29.94  NaN 
2000-01-10 152.69 154.06 151.12 151.25 15226500 29.93  NaN 
2000-01-11 151.00 152.69 150.62 151.50 15123000 29.98  NaN 
2000-01-12 151.06 153.25 150.56 152.00 18342300 30.08  NaN 
2000-01-13 153.13 154.94 153.00 153.75 14953500 30.42  1.0 
2000-01-14 153.38 154.63 149.56 151.00 18480300 29.88  1.0 
2000-01-18 149.62 149.62 146.75 148.00 18296700 29.29  NaN 
2000-01-19 146.50 150.94 146.25 148.72 14849700 29.43  NaN 
2000-01-20 149.06 149.75 142.63 145.94 30759000 28.88  1.0 
2000-01-21 147.94 148.25 143.94 144.13 24005400 28.52  1.0 
2000-01-24 145.31 145.94 136.44 138.13 27116100 27.33  1.0 
2000-01-25 138.06 140.38 137.00 138.50 25387500 27.41  NaN 
2000-01-26 140.50 142.19 138.88 141.44 15856800 27.99  NaN 
2000-01-27 141.56 141.75 137.06 141.75 19243500 28.05  1.0 
2000-01-28 140.31 140.50 133.63 134.00 29846700 26.52  1.0 
2000-01-31 134.00 135.94 133.06 134.00 21782700 26.52  NaN 
2000-02-01 134.25 137.00 134.00 136.00 27339000 26.91  NaN 

Output:

   Open High  Low Close Volume Adj Close x 
Date                 
2000-01-03 153.00 153.69 149.19 150.00 22069800 29.68  1.0 
2000-01-04 147.25 148.00 144.00 144.00 22121400 28.49  1.0 
2000-01-05 143.75 147.00 142.56 143.75 27292800 28.44  1.0 
2000-01-06 143.12 146.94 142.63 145.67 19873200 28.82  NaN 
2000-01-07 148.00 151.88 147.00 151.31 20141400 29.94  NaN 
2000-01-10 152.69 154.06 151.12 151.25 15226500 29.93  NaN 
2000-01-11 151.00 152.69 150.62 151.50 15123000 29.98  NaN 
2000-01-12 151.06 153.25 150.56 152.00 18342300 30.08  NaN 
2000-01-13 153.13 154.94 153.00 153.75 14953500 30.42  1.0 
2000-01-14 153.38 154.63 149.56 151.00 18480300 29.88  1.0 
2000-01-18 149.62 149.62 146.75 148.00 18296700 29.29  1.0 
2000-01-19 146.50 150.94 146.25 148.72 14849700 29.43  NaN 
2000-01-20 149.06 149.75 142.63 145.94 30759000 28.88  1.0 
2000-01-21 147.94 148.25 143.94 144.13 24005400 28.52  1.0 
2000-01-24 145.31 145.94 136.44 138.13 27116100 27.33  1.0 
2000-01-25 138.06 140.38 137.00 138.50 25387500 27.41  NaN 
2000-01-26 140.50 142.19 138.88 141.44 15856800 27.99  NaN 
2000-01-27 141.56 141.75 137.06 141.75 19243500 28.05  1.0 
2000-01-28 140.31 140.50 133.63 134.00 29846700 26.52  1.0 
2000-01-31 134.00 135.94 133.06 134.00 21782700 26.52  1.0 
2000-02-01 134.25 137.00 134.00 136.00 27339000 26.91  NaN