2016-08-01 285 views

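python xarray: select a lat/lon range and extract point data to a DataFrame

I want to select all grid cells within a latitude/longitude range and, for each grid cell, export it as a DataFrame and then as a CSV file (i.e. df.to_csv). My dataset is shown below. I can use xr.where(...) to mask out the grid cells outside my range, but I don't know how to loop over the remaining unmasked cells. Alternatively, I tried the xr.sel functions, but they don't seem to accept operators such as ds.sel(gridlat_0>45). xr.sel_points(...) might also work, but I can't figure out the correct indexer syntax for my case. Thank you in advance for your help.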

<xarray.Dataset> 
Dimensions: (time: 48, xgrid_0: 685, ygrid_0: 485) 
Coordinates: 
    gridlat_0 (ygrid_0, xgrid_0) float32 44.6896 44.6956 44.7015 44.7075 ... 
    * ygrid_0 (ygrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ... 
    * xgrid_0 (xgrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ... 
    * time  (time) datetime64[ns] 2016-07-28T01:00:00 2016-07-28T02:00:00 ... 
    gridlon_0 (ygrid_0, xgrid_0) float32 -129.906 -129.879 -129.851 ... 
Data variables: 
    u   (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    gridrot_0 (time, ygrid_0, xgrid_0) float32 nan nan nan nan nan nan nan ... 
    Qli  (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    Qsi  (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    p   (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    rh   (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    press  (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    t   (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 
    vw_dir  (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ... 

Answer


The simplest way to do this is probably to loop over every grid point, with something like the following:

# (optionally) create a grid dataset so we don't need to pull out all
# the data from the main dataset before looking at each point
grid = ds[['gridlat_0', 'gridlon_0']]

# is_valid is a user-supplied predicate that returns True for points to keep
for i in range(ds.coords['xgrid_0'].size):
    for j in range(ds.coords['ygrid_0'].size):
        sub_grid = grid.isel(xgrid_0=i, ygrid_0=j)
        if is_valid(sub_grid.gridlat_0, sub_grid.gridlon_0):
            sub_ds = ds.isel(xgrid_0=i, ygrid_0=j)
            sub_ds.to_dataframe().to_csv(...)
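
The loop above calls is_valid without defining it. A minimal sketch of such a predicate, assuming the selection is a plain rectangular lat/lon box (the default bounds here are hypothetical placeholders, not values from the question):

```python
def is_valid(lat, lon, lat_min=45.0, lat_max=50.0, lon_min=-125.0, lon_max=-115.0):
    """Return True if (lat, lon) lies inside a hypothetical bounding box."""
    # float() accepts plain numbers as well as the 0-d values returned by isel(...)
    lat, lon = float(lat), float(lon)
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

Any other criterion (a polygon test, a distance threshold) could be substituted, as long as it returns a single boolean per point.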

Even with a 685x485 grid, this should only take a few seconds to loop over every point.

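Pre-filtering beforehand with ds = ds.where(..., drop=True) (available in the next xarray release, which should be out later this week) could make this significantly faster, but you would still have the problem that the selected cells may not be representable on orthogonal axes.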
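
As a rough illustration of the pre-filtering idea, the sketch below builds a small synthetic dataset (the sizes, values, and the in_box criterion are all made up for the example), trims it with where(..., drop=True), and then visits only the remaining cells. The DataFrames are collected in a dict rather than written to disk:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real dataset (sizes and values are made up).
ny, nx, nt = 4, 5, 3
lat = np.linspace(44.0, 47.0, ny)[:, None] + np.zeros((ny, nx))
lon = np.linspace(-130.0, -126.0, nx)[None, :] + np.zeros((ny, nx))
ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"), np.random.rand(nt, ny, nx))},
    coords={
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon),
        "time": np.arange(nt),
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
    },
)


def in_box(d):
    """Hypothetical selection criterion: a simple lat/lon box."""
    return (d.gridlat_0 > 45) & (d.gridlon_0 < -128)


# Trim the dataset to the bounding box of the selected cells.
ds_small = ds.where(in_box(ds), drop=True)

# Re-evaluate the criterion on the trimmed grid so that cells outside the
# selection (possible when it is not rectangular) are skipped.
keep = in_box(ds_small).transpose("ygrid_0", "xgrid_0").values
frames = {}
for j, i in zip(*np.nonzero(keep)):
    cell = ds_small.isel(ygrid_0=int(j), xgrid_0=int(i))
    frames[(int(j), int(i))] = cell.to_dataframe()  # follow with .to_csv(...) per cell
```

Because drop=True can only trim whole rows and columns, a non-rectangular selection still leaves unwanted cells inside the trimmed grid, which is why the criterion is checked again before each cell is exported.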

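A final option, and possibly the cleanest, is to use stack to convert the dataset to 2D. You can then use standard selection and groupby operations along the new 'space' dimension: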

ds_stacked = ds.stack(space=['xgrid_0', 'ygrid_0']) 
ds_filtered = ds_stacked.sel(space=(ds_stacked.gridlat_0 > 45)) 
for _, ds_one_place in ds_filtered.groupby('space'): 
    ds_one_place.to_dataframe().to_csv(...) 
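
For a self-contained picture of the stacking idea, here is a sketch on synthetic data (sizes, values, and the selection criterion are invented for the example). It differs from the snippet above in two ways that are my own substitutions: it filters with integer positions via isel instead of passing a boolean array to sel, and it builds each DataFrame directly with pandas instead of using groupby and to_dataframe:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real dataset (sizes and values are made up).
ny, nx, nt = 3, 4, 2
lat = np.linspace(44.0, 46.0, ny)[:, None] + np.zeros((ny, nx))
lon = np.linspace(-130.0, -127.0, nx)[None, :] + np.zeros((ny, nx))
ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"),
           np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx))},
    coords={
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon),
        "time": np.arange(nt),
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
    },
)

# Collapse the two grid dimensions into a single 'space' dimension.
ds_stacked = ds.stack(space=["ygrid_0", "xgrid_0"])

# A deliberately non-rectangular (hypothetical) selection criterion.
mask = (ds_stacked.gridlat_0 > 44.5) & (
    ds_stacked.gridlon_0 + ds_stacked.gridlat_0 > -84.5
)
ds_filtered = ds_stacked.isel(space=np.flatnonzero(mask.values))

# One DataFrame per selected point, indexed by time.
frames = []
for k in range(ds_filtered.sizes["space"]):
    one = ds_filtered.isel(space=k)
    df = pd.DataFrame(
        {name: one[name].values for name in one.data_vars},
        index=pd.Index(one["time"].values, name="time"),
    )
    frames.append(df)  # follow with df.to_csv(...) per point
```

Since the points live on a single dimension after stacking, an irregular selection keeps exactly the chosen cells, with no padding of unselected neighbours.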

Thanks for the suggestions, Stephan. I will try them out as soon as I resolve this segfault issue: (http://stackoverflow.com/questions-38711915/segmentation-fault-writing-xarray-datset-to-netcdf-or-dataframe) – nicway


The first option ran in 1.6 minutes using the pre-filtering suggestion with xarray 0.7.3. – nicway