2016-12-01 167 views

Answers


Version 0.13.0 (released January 2017) includes the DataFrame.values attribute and the DataFrame.to_records method, either of which converts a dask DataFrame into a dask array:

In [1]: import dask.dataframe as dd 

In [2]: import pandas as pd 

In [3]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) 

In [4]: ddf = dd.from_pandas(df, npartitions=2) 

In [5]: ddf 
Out[5]: dd.DataFrame<from_pa..., npartitions=1, divisions=(0, 2)> 

In [6]: ddf.values 
Out[6]: dask.array<values-..., shape=(nan, 2), dtype=int64, chunksize=(nan, 2)> 

In [7]: ddf.values.compute() 
Out[7]: 
array([[1, 4],
       [2, 5],
       [3, 6]])

In [8]: ddf.to_records() 
Out[8]: dask.array<to-reco..., shape=(nan,), dtype=(numpy.record, [('index', '<i8'), ('x', '<i8'), ('y', '<i8')]), chunksize=(nan,)> 

In [9]: ddf.to_records().compute() 
Out[9]: 
rec.array([(0, 1, 4), (1, 2, 5), (2, 3, 6)],
          dtype=[('index', '<i8'), ('x', '<i8'), ('y', '<i8')])