You can use the DataFrame constructor:
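import numpy as np
import pandas as pd

N = 10
# newsampledata: the question's one-row DataFrame. Older pandas broadcasts its length-1 columns
# to len(index); recent versions may raise a length error, in which case pass newsampledata.values.tolist() * N
df = pd.DataFrame(newsampledata.values.tolist(), index=np.arange(N), columns=sampledata.columns)
print(df)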
   float_col  int_col  str_col  r    v  new_coltest       eddd
0        0.1        1        a  5  1.0          0.1  -0.539783
1        0.1        1        a  5  1.0          0.1  -0.539783
2        0.1        1        a  5  1.0          0.1  -0.539783
3        0.1        1        a  5  1.0          0.1  -0.539783
4        0.1        1        a  5  1.0          0.1  -0.539783
5        0.1        1        a  5  1.0          0.1  -0.539783
6        0.1        1        a  5  1.0          0.1  -0.539783
7        0.1        1        a  5  1.0          0.1  -0.539783
8        0.1        1        a  5  1.0          0.1  -0.539783
9        0.1        1        a  5  1.0          0.1  -0.539783
print(df.dtypes)
float_col      float64
int_col          int64
str_col         object
r                int64
v              float64
new_coltest    float64
eddd           float64
dtype: object
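A self-contained sketch of the constructor approach, assuming a hypothetical one-row newsampledata rebuilt from the values printed above (the question's real data is not shown) and using its own columns in place of sampledata.columns. It repeats the row list N times instead of relying on length-1 broadcasting, so it should behave the same across pandas versions. The thing to notice is that the constructor infers a dtype per column instead of leaving everything as object.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's one-row `newsampledata`
# (values copied from the printed output; the real frame is not shown).
newsampledata = pd.DataFrame(
    {"float_col": [0.1], "int_col": [1], "str_col": ["a"], "r": [5],
     "v": [1.0], "new_coltest": [0.1], "eddd": [-0.539783]}
)

N = 10
# .values.tolist() gives [[0.1, 1, 'a', 5, 1.0, 0.1, -0.539783]];
# multiplying the outer list by N yields N identical rows.
df = pd.DataFrame(newsampledata.values.tolist() * N,
                  index=np.arange(N),
                  columns=newsampledata.columns)

# The constructor re-infers numeric dtypes column by column.
print(df.dtypes)
```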
Timings:
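For smaller frames (N = 1000 and N = 10000) the sample and reindex approaches are faster; for a large frame (N = 100000) the DataFrame constructor is slightly faster.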
N = 1000
In [88]: %timeit (pd.DataFrame(newsampledata.values.tolist(), index=np.arange(N), columns=sampledata.columns))
1000 loops, best of 3: 745 µs per loop
In [89]: %timeit (newsampledata.sample(N, replace=True).reset_index(drop=True))
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 470 µs per loop
In [90]: %timeit (newsampledata.reindex(newsampledata.index.repeat(N)).reset_index(drop=True))
1000 loops, best of 3: 476 µs per loop
N = 10000
In [92]: %timeit (pd.DataFrame(newsampledata.values.tolist(), index=np.arange(N), columns=sampledata.columns))
1000 loops, best of 3: 946 µs per loop
In [93]: %timeit (newsampledata.sample(N, replace=True).reset_index(drop=True))
1000 loops, best of 3: 775 µs per loop
In [94]: %timeit (newsampledata.reindex(newsampledata.index.repeat(N)).reset_index(drop=True))
1000 loops, best of 3: 827 µs per loop
N = 100000
In [97]: %timeit (pd.DataFrame(newsampledata.values.tolist(), index=np.arange(N), columns=sampledata.columns))
The slowest run took 12.98 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 6.93 ms per loop
In [98]: %timeit (newsampledata.sample(N, replace=True).reset_index(drop=True))
100 loops, best of 3: 7.07 ms per loop
In [99]: %timeit (newsampledata.reindex(newsampledata.index.repeat(N)).reset_index(drop=True))
100 loops, best of 3: 7.87 ms per loop
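To rerun the comparison on your own pandas version outside IPython, here is a sketch using the standard timeit module instead of %timeit. It assumes a hypothetical one-row newsampledata with the values shown above; by_constructor, by_sample and by_reindex are hypothetical helper names, and the constructor variant repeats the row list (* n) rather than relying on length-1 broadcasting. Your numbers will differ from the ones above, which depend on hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's one-row `newsampledata`.
newsampledata = pd.DataFrame(
    {"float_col": [0.1], "int_col": [1], "str_col": ["a"], "r": [5],
     "v": [1.0], "new_coltest": [0.1], "eddd": [-0.539783]}
)


def by_constructor(n):
    # Repeat the single row as a Python list; pandas re-infers each column's dtype.
    return pd.DataFrame(newsampledata.values.tolist() * n,
                        index=np.arange(n), columns=newsampledata.columns)


def by_sample(n):
    # Sampling a one-row frame with replacement can only ever pick that row.
    return newsampledata.sample(n, replace=True).reset_index(drop=True)


def by_reindex(n):
    # index.repeat(n) lists the single label n times; reindex copies the row for each.
    return newsampledata.reindex(newsampledata.index.repeat(n)).reset_index(drop=True)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        for fn in (by_constructor, by_sample, by_reindex):
            # Best of 3 repeats, each averaging 5 calls.
            best = min(timeit.repeat(lambda: fn(n), number=5, repeat=3)) / 5
            print(f"N={n:>7}  {fn.__name__:<15} {best * 1e3:.3f} ms per call")
```

sample and reindex keep the original column dtypes as they are, while the constructor path has to infer them again from Python objects, which is one plausible reason the ranking shifts as N grows.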
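Good solution, it seems to work without problems, and I agree it's faster. I didn't know you could set the index like that, I'll have to remember this one! – rajan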
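In a previous version you had a numpy solution, whose drawback was that the dtypes were converted to object. How does this solution compare in performance once you convert back to the original dtypes? Maybe numpy is still faster ;) – Quickbeam2k1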
@Quickbeam2k1 - I'll try it. – jezrael
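The earlier numpy version discussed in the comments is not shown in this answer. A minimal sketch of what such a variant could look like, assuming np.repeat on .values and a hypothetical one-row newsampledata (a reconstruction for illustration, not jezrael's code): a mixed-dtype frame's .values is a single object array, so the repeated frame starts out with object columns, and astype with the original dtypes restores them. No timings for this variant are given here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's one-row `newsampledata`.
newsampledata = pd.DataFrame(
    {"float_col": [0.1], "int_col": [1], "str_col": ["a"], "r": [5],
     "v": [1.0], "new_coltest": [0.1], "eddd": [-0.539783]}
)

N = 10
# Mixed dtypes -> .values is an object array; repeat its single row N times.
arr = np.repeat(newsampledata.values, N, axis=0)
df_obj = pd.DataFrame(arr, columns=newsampledata.columns)  # numeric columns stay object here

# Map every column back to its original dtype in one call.
df_np = df_obj.astype(newsampledata.dtypes.to_dict())
print(df_np.dtypes)
```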