Xarray使用大熊貓索引/列標籤作爲默認元數據。當所有變量共享相同的維度時,您可以在一次函數調用中進行轉換,但是如果不同的變量具有不同的維度,則需要將它們分別從熊貓轉換爲單獨的,然後將它們放在xarray一側。例如:
import pandas as pd
import io
import xarray
# read your data
cell_metadata = pd.read_csv(io.StringIO(u"""\
cell,Uniquely mapped reads number,Number of input reads,EXP_ID,TAXON,WELL_MAPPING,Lysis Plate Batch,dNTP.batch,oligodT.order.no,plate.type,preparation.site,date.prepared,date.sorted,tissue,subtissue,mouse.id,FACS.selection,nozzle.size,FACS.instument,Experiment ID ,Columns sorted,Double check,Plate,Location ,Comments,mouse.age,mouse.number,mouse.sex
A1-MAA100140-3_57_F-1-1,428699,502312,170928_A00111_0068_AH3YKKDMXX,mus,MAA100140,,,,Biorad 96well,Stanford,,170720,Liver,Hepatocytes,3_57_F,,,,,,,,,,3,57,F
A10-MAA100140-3_57_F-1-1,324428,360285,170928_A00111_0068_AH3YKKDMXX,mus,MAA100140,,,,Biorad 96well,Stanford,,170720,Liver,Hepatocytes,3_57_F,,,,,,,,,,3,57,F
A11-MAA100140-3_57_F-1-1,381310,431800,170928_A00111_0068_AH3YKKDMXX,mus,MAA100140,,,,Biorad 96well,Stanford,,170720,Liver,Hepatocytes,3_57_F,,,,,,,,,,3,57,F
A12-MAA100140-3_57_F-1-1,393498,446705,170928_A00111_0068_AH3YKKDMXX,mus,MAA100140,,,,Biorad 96well,Stanford,,170720,Liver,Hepatocytes,3_57_F,,,,,,,,,,3,57,F
A2-MAA100140-3_57_F-1-1,717,918,170928_A00111_0068_AH3YKKDMXX,mus,MAA100140,,,,Biorad 96well,Stanford,,170720,Liver,Hepatocytes,3_57_F,,,,,,,,,,3,57,F"""))
counts = pd.read_csv(io.StringIO(u"""\
cell,0610005C13Rik,0610007C21Rik,0610007L01Rik,0610007N19Rik,0610007P08Rik,0610007P14Rik,0610007P22Rik,0610008F07Rik,0610009B14Rik,0610009B22Rik,0610009D07Rik,0610009L18Rik,0610009O20Rik,0610010B08Rik,0610010F05Rik,0610010K14Rik,0610010O12Rik,0610011F06Rik,0610011L14Rik,0610012G03Rik
A1-MAA100140-3_57_F-1-1,308,289,81,0,4,88,52,0,0,104,65,0,1,0,9,8,12,283,12,37
A10-MAA100140-3_57_F-1-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
A11-MAA100140-3_57_F-1-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
A12-MAA100140-3_57_F-1-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
A2-MAA100140-3_57_F-1-1,375,325,70,0,2,72,36,13,0,60,105,0,13,0,0,29,15,264,0,65"""))
# build the output
xarray_counts = xarray.DataArray(counts.set_index('cell'), dims=['cell', 'gene'])
xarray_counts.coords.update(cell_metadata.set_index('cell').to_xarray())
print(xarray_counts)
這導致在一個不錯的,整潔xarray.DataArray
用於計數:
<xarray.DataArray (cell: 5, gene: 20)>
array([[308, 289, 81, 0, 4, 88, 52, 0, 0, 104, 65, 0, 1, 0,
9, 8, 12, 283, 12, 37],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0],
[375, 325, 70, 0, 2, 72, 36, 13, 0, 60, 105, 0, 13, 0,
0, 29, 15, 264, 0, 65]])
Coordinates:
* cell (cell) object 'A1-MAA100140-3_57_F-1-1' ...
* gene (gene) object '0610005C13Rik' ...
Uniquely mapped reads number (cell) int64 428699 324428 381310 393498 717
Number of input reads (cell) int64 502312 360285 431800 446705 918
EXP_ID (cell) object '170928_A00111_0068_AH3YKKDMXX' ...
TAXON (cell) object 'mus' 'mus' 'mus' 'mus' 'mus'
WELL_MAPPING (cell) object 'MAA100140' 'MAA100140' ...
Lysis Plate Batch (cell) float64 nan nan nan nan nan
dNTP.batch (cell) float64 nan nan nan nan nan
oligodT.order.no (cell) float64 nan nan nan nan nan
plate.type (cell) object 'Biorad 96well' ...
preparation.site (cell) object 'Stanford' 'Stanford' ...
date.prepared (cell) float64 nan nan nan nan nan
date.sorted (cell) int64 170720 170720 170720 170720 ...
tissue (cell) object 'Liver' 'Liver' 'Liver' ...
subtissue (cell) object 'Hepatocytes' 'Hepatocytes' ...
mouse.id (cell) object '3_57_F' '3_57_F' '3_57_F' ...
FACS.selection (cell) float64 nan nan nan nan nan
nozzle.size (cell) float64 nan nan nan nan nan
FACS.instument (cell) float64 nan nan nan nan nan
Experiment ID (cell) float64 nan nan nan nan nan
Columns sorted (cell) float64 nan nan nan nan nan
Double check (cell) float64 nan nan nan nan nan
Plate (cell) float64 nan nan nan nan nan
Location (cell) float64 nan nan nan nan nan
Comments (cell) float64 nan nan nan nan nan
mouse.age (cell) int64 3 3 3 3 3
mouse.number (cell) int64 57 57 57 57 57
mouse.sex (cell) object 'F' 'F' 'F' 'F' 'F'
如果你想有一個數據集代替,把DataArray中對象到數據集的構造函數,例如,
# shouldn't really need to use .data_vars here, that might be an xarray bug
>>> xarray.Dataset({'counts': xarray.DataArray(counts.set_index('cell'),
... dims=['cell', 'gene'])},
... coords=cell_metadata.set_index('cell').to_xarray().data_vars) <xarray.Dataset>
Dimensions: (cell: 5, gene: 20)
Coordinates:
* cell (cell) object 'A1-MAA100140-3_57_F-1-1' ...
* gene (gene) object '0610005C13Rik' ...
Uniquely mapped reads number (cell) int64 428699 324428 381310 393498 717
Number of input reads (cell) int64 502312 360285 431800 446705 918
EXP_ID (cell) object '170928_A00111_0068_AH3YKKDMXX' ...
TAXON (cell) object 'mus' 'mus' 'mus' 'mus' 'mus'
WELL_MAPPING (cell) object 'MAA100140' 'MAA100140' ...
Lysis Plate Batch (cell) float64 nan nan nan nan nan
dNTP.batch (cell) float64 nan nan nan nan nan
oligodT.order.no (cell) float64 nan nan nan nan nan
plate.type (cell) object 'Biorad 96well' ...
preparation.site (cell) object 'Stanford' 'Stanford' ...
date.prepared (cell) float64 nan nan nan nan nan
date.sorted (cell) int64 170720 170720 170720 170720 ...
tissue (cell) object 'Liver' 'Liver' 'Liver' ...
subtissue (cell) object 'Hepatocytes' 'Hepatocytes' ...
mouse.id (cell) object '3_57_F' '3_57_F' '3_57_F' ...
FACS.selection (cell) float64 nan nan nan nan nan
nozzle.size (cell) float64 nan nan nan nan nan
FACS.instument (cell) float64 nan nan nan nan nan
Experiment ID (cell) float64 nan nan nan nan nan
Columns sorted (cell) float64 nan nan nan nan nan
Double check (cell) float64 nan nan nan nan nan
Plate (cell) float64 nan nan nan nan nan
Location (cell) float64 nan nan nan nan nan
Comments (cell) float64 nan nan nan nan nan
mouse.age (cell) int64 3 3 3 3 3
mouse.number (cell) int64 57 57 57 57 57
mouse.sex (cell) object 'F' 'F' 'F' 'F' 'F'
Data variables:
counts (cell, gene) int64 308 289 81 0 4 88 52 0 ...
你能提供你的DataFrame('df.head()')的樣本和你的目標數據集或DataArray的詳細描述。你有沒有嘗試過使用熊貓的to_xarray()方法? – jhamman
要加上Joe的評論,絕對看看xarray文檔的[working with pandas](http://xarray.pydata.org/en/stable/pandas.html)部分,看看是否有幫助。如果您可以爲您的數據設置適當的'pandas.MultiIndex',轉換爲xarray *通常很容易。 – shoyer