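I know vectorized functions are the preferred way to write fast code, but I can't figure out how to do what this function does without loops. The way I wrote it results in very slow run times. (Passing two DataFrames with 100 columns and 2000 rows as arguments, the function takes 100 seconds to complete. I was hoping for something more like 1 second.) Is there a way to speed up this pandas function?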
def gen_fuzz_logic_signal(longp, shortp):
    # Input dataframes should have 0, -1, or 1 value
    flogic_signal = pd.DataFrame(index = longp.index, columns = longp.columns)
    for sym in longp.columns:
        print(sym)
        prev_enter = 0
        for inum in range(0, len(longp.index)):
            cur_val = np.nan
            if longp.ix[inum, sym] == 0 and prev_enter == +1:
                cur_val = 0.5
            if shortp.ix[inum, sym] == 0 and prev_enter == -1:
                cur_val = -0.5
            if longp.ix[inum, sym] == 1 and shortp.ix[inum, sym] == -1:
                if longp.ix[inum - 1, sym] != 1:
                    cur_val = 1
                    prev_enter = 1
                elif shortp.ix[inum - 1, sym] != -1:
                    cur_val = -1
                    prev_enter = -1
                else:
                    cur_val = prev_enter
            else:
                if longp.ix[inum, sym] == 1:
                    cur_val = 1
                    prev_enter = 1
                if shortp.ix[inum, sym] == -1:
                    cur_val = -1
                    prev_enter = -1
            flogic_signal.ix[inum, sym] = cur_val
    return flogic_signal
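The inputs to the function are basically two DataFrames whose values are 1, -1, or 0. I'd really appreciate it if anyone has ideas on how to vectorize this or otherwise speed it up. I tried replacing ".ix[inum, sym]" with "[sym][inum]", but that was even slower.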
GOOG longp GOOG shortp GOOG func result
2011-07-28 0 -1 -1
2011-07-29 0 -1 -1
2011-08-01 0 -1 -1
2011-08-02 0 -1 -1
2011-08-03 0 -1 -1
2011-08-04 0 -1 -1
2011-08-05 0 -1 -1
2011-08-08 0 0 -0.5
2011-08-09 0 0 -0.5
2011-08-10 0 0 -0.5
2011-08-11 0 0 -0.5
2011-08-12 1 0 1
2011-08-15 1 0 1
2011-08-16 1 0 1
2011-08-17 1 0 1
2011-08-18 1 0 1
2011-08-19 1 0 1
2011-08-22 1 0 1
2011-08-23 1 0 1
2011-08-24 1 0 1
2011-08-25 1 0 1
2011-08-26 1 0 1
2011-08-29 1 0 1
2011-08-30 1 0 1
2011-08-31 1 0 1
2011-09-01 1 0 1
2011-09-02 1 0 1
2011-09-06 1 0 1
2011-09-07 1 0 1
2011-09-08 1 0 1
2011-09-09 1 0 1
2011-09-12 1 0 1
2011-09-13 1 0 1
2011-09-14 1 0 1
2011-09-15 1 0 1
2011-09-16 1 0 1
2011-09-19 1 0 1
2011-09-20 1 0 1
2011-09-21 1 0 1
2011-09-22 1 0 1
2011-09-23 1 0 1
2011-09-26 1 0 1
2011-09-27 1 0 1
2011-09-28 1 0 1
2011-09-29 0 0 0.5
2011-09-30 0 -1 -1
2011-10-03 0 -1 -1
2011-10-04 0 -1 -1
2011-10-05 0 -1 -1
2011-10-06 0 -1 -1
2011-10-07 0 -1 -1
2011-10-10 0 -1 -1
2011-10-11 0 -1 -1
2011-10-12 0 -1 -1
2011-10-13 0 -1 -1
2011-10-14 0 -1 -1
2011-10-17 0 -1 -1
2011-10-18 0 -1 -1
2011-10-19 0 -1 -1
2011-10-20 0 -1 -1
IBM longp IBM shortp IBM func result
2012-05-01 1 -1 1
2012-05-02 1 -1 1
2012-05-03 1 -1 1
2012-05-04 1 -1 1
2012-05-07 1 -1 1
2012-05-08 1 0 1
2012-05-09 1 0 1
2012-05-10 1 0 1
2012-05-11 1 0 1
2012-05-14 1 0 1
2012-05-15 1 0 1
2012-05-16 0 -1 -1
2012-05-17 0 -1 -1
2012-05-18 0 -1 -1
2012-05-21 0 -1 -1
2012-05-22 0 -1 -1
2012-05-23 0 -1 -1
2012-05-24 0 -1 -1
2012-05-25 0 -1 -1
2012-05-29 0 -1 -1
2012-05-30 0 -1 -1
2012-05-31 0 -1 -1
2012-06-01 0 -1 -1
2012-06-04 0 -1 -1
2012-06-05 0 -1 -1
2012-06-06 0 -1 -1
2012-06-07 0 -1 -1
2012-06-08 1 -1 1
2012-06-11 1 -1 1
2012-06-12 1 -1 1
2012-06-13 1 -1 1
2012-06-14 1 -1 1
2012-06-15 1 -1 1
2012-06-18 1 -1 1
2012-06-19 1 -1 1
2012-06-20 1 -1 1
2012-06-21 1 0 1
2012-06-22 1 0 1
2012-06-25 1 0 1
2012-06-26 1 0 1
2012-06-27 1 0 1
2012-06-28 1 0 1
2012-06-29 1 0 1
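One loop-free formulation, offered as a sketch rather than a definitive answer: mark the rows where a new long or short entry happens, forward-fill that entry direction to stand in for `prev_enter`, and derive the 0.5 / -0.5 rows from the filled state. The function name `fuzz_signal_vectorized` is made up for illustration. The sketch assumes `longp` holds only 0/1 and `shortp` only 0/-1, and it treats the first row as having no previous row (the `.ix` loop above instead looks at index `inum - 1`, which for the first row is a different lookup).

```python
import numpy as np
import pandas as pd

def fuzz_signal_vectorized(longp, shortp):
    """Loop-free sketch; assumes longp is 0/1 and shortp is 0/-1."""
    is_long = longp.to_numpy() == 1
    is_short = shortp.to_numpy() == -1
    both = is_long & is_short

    # Flags of the previous row; the first row is assumed to have none set.
    was_long = np.zeros_like(is_long)
    was_long[1:] = is_long[:-1]
    was_short = np.zeros_like(is_short)
    was_short[1:] = is_short[:-1]

    # Rows that set prev_enter in the loop version.
    enter_long = (is_long & ~is_short) | (both & ~was_long)
    enter_short = (is_short & ~is_long) | (both & was_long & ~was_short)
    # Both signals on now and on the previous row: keep the current direction.
    carry = both & was_long & was_short

    entry = np.full(is_long.shape, np.nan)
    entry[enter_long] = 1.0
    entry[enter_short] = -1.0

    # Forward-filling the entries reproduces prev_enter for every row.
    state = pd.DataFrame(entry).ffill().to_numpy()

    # Entry rows keep +/-1, all other rows get half the current state (+/-0.5),
    # and carry rows get the state itself.
    values = np.where(np.isnan(entry), 0.5 * state, entry)
    values = np.where(carry, state, values)
    return pd.DataFrame(values, index=longp.index, columns=longp.columns)

# A shortened version of the GOOG pattern from the sample data above.
goog_long = pd.DataFrame({"GOOG": [0, 0, 0, 1, 1, 0, 0]})
goog_short = pd.DataFrame({"GOOG": [-1, -1, 0, 0, 0, 0, -1]})
goog_result = fuzz_signal_vectorized(goog_long, goog_short)

# A shortened version of the IBM pattern, where both signals can be on at once.
ibm_long = pd.DataFrame({"IBM": [1, 1, 1, 0, 0, 1, 1, 1]})
ibm_short = pd.DataFrame({"IBM": [-1, -1, 0, -1, -1, -1, -1, 0]})
ibm_result = fuzz_signal_vectorized(ibm_long, ibm_short)
print(goog_result)
print(ibm_result)
```

On these shortened columns the sketch gives the same sequence of 1, -1, 0.5 and -0.5 values as the "func result" columns in the sample data. Rows before any entry has happened come out as NaN, as in the loop version.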
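EDIT:
I just re-ran some old code that uses a similar loop to set values in a pandas DataFrame. It used to take maybe 5 seconds, and now it seems to take perhaps 100 times longer. I wonder whether the problem is due to something that changed in a newer version of pandas. That's the only change I can think of. See the code below. It takes 73 seconds to run on my computer using pandas 0.11. That seems very slow for such a basic function, even though it operates element by element. If anyone gets a chance, I'd be curious how long the code below takes on your computer and with which pandas version.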
import time
import numpy as np
import pandas as pd

def timef(func, *args):
    start = time.clock()
    for i in range(2):
        func(*args)
    end = time.clock()
    time_complete = (end - start) / float(2)
    print(time_complete)

def tfunc(num_row, num_col):
    df = pd.DataFrame(index = np.arange(1, num_row), columns = np.arange(1, num_col))
    for col in df.columns:
        for inum in range(1, len(df.index)):
            df.ix[inum, col] = 0 #np.nan
    return df

timef(tfunc, 1000, 1000)  # This takes 73 seconds on a Core i5 M460 2.53 GHz Windows 7 laptop.
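For anyone comparing on a current setup, here is a rough sketch of the same kind of benchmark. It is not the original test: `.ix` was removed in pandas 1.0 and `time.clock` in Python 3.8, so this version uses the scalar accessor `.iat` and `time.perf_counter`, and it fills every cell rather than skipping the first row. The names `fill_elementwise`, `fill_via_array` and `time_once` are made up for illustration.

```python
import time
import numpy as np
import pandas as pd

def fill_elementwise(num_row, num_col):
    """Set every cell one at a time through the DataFrame's scalar accessor."""
    df = pd.DataFrame(np.nan, index=np.arange(1, num_row), columns=np.arange(1, num_col))
    for j in range(df.shape[1]):
        for i in range(df.shape[0]):
            df.iat[i, j] = 0
    return df

def fill_via_array(num_row, num_col):
    """Set every cell in a plain NumPy array, then wrap it in a DataFrame once."""
    values = np.full((num_row - 1, num_col - 1), np.nan)
    for j in range(values.shape[1]):
        for i in range(values.shape[0]):
            values[i, j] = 0
    return pd.DataFrame(values, index=np.arange(1, num_row), columns=np.arange(1, num_col))

def time_once(func, *args):
    """Return (elapsed seconds, result) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return time.perf_counter() - start, result

if __name__ == "__main__":
    for func in (fill_elementwise, fill_via_array):
        elapsed, _ = time_once(func, 200, 200)
        print(func.__name__, elapsed)
```

Both functions build the same DataFrame, so any difference in the printed times comes from where the element-by-element writes happen. Timings will vary by machine and pandas version.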
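EDIT 2 (7/9/13, 1:23 PM):
I found a temporary solution! I changed the code to the version below. Basically, it converts each column to an ndarray, assembles the new column in a Python list, and then inserts that list as a column of a new pandas DataFrame. The old version above took 101 seconds for 50 columns of about 2000 rows. The version below takes only 0.19 seconds! That's fast enough for me for now. I don't know why .ix is so slow. As I said above, I think element-wise operations were much faster in earlier versions of pandas.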
def gen_fuzz_logic_signal3(longp, shortp):
    # Input dataframes should have 0 or 1 value
    flogic_signal = pd.DataFrame(index = longp.index, columns = longp.columns)
    for sym in longp.columns:
        coll = longp[sym].values
        cols = shortp[sym].values
        prev_enter = 0
        newcol = [None] * len(coll)
        for inum in range(1, len(coll)):
            cur_val = np.nan
            if coll[inum] == 0 and prev_enter == +1:
                cur_val = 0.5
            if cols[inum] == 0 and prev_enter == -1:
                cur_val = -0.5
            if coll[inum] == 1 and cols[inum] == -1:
                if coll[inum - 1] != 1:
                    cur_val = 1
                    prev_enter = 1
                elif cols[inum - 1] != -1:
                    cur_val = -1
                    prev_enter = -1
                else:
                    cur_val = prev_enter
            else:
                if coll[inum] == 1:
                    cur_val = 1
                    prev_enter = 1
                if cols[inum] == -1:
                    cur_val = -1
                    prev_enter = -1
            newcol[inum] = cur_val
        flogic_signal[sym] = newcol
    return flogic_signal
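Can you explain the goal of the function, rather than making readers work it out by reading the code? – BrenBarn
This is a financial data problem. longp is a DataFrame made up of 1s and 0s. 1 means buy or hold the security. 0 means sell or stay in cash. shortp is made up of -1s and 0s. -1 means sell short or stay short. 0 means go to cash or stay in cash. The function combines the long and short positions into one signal, where 1 represents buy or hold long, 0.5 represents exit a long position or stay in cash, -1 represents go short or stay short, and -0.5 represents exit a short position or stay in cash. – geronimo
I added some sample data and the desired result. Please let me know if additional clarification is needed. – geronimo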