2017-05-04

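Subtract paired columns

I have a data frame with 192 columns. I want to build a 96-column data frame by subtracting columns from each other according to a pairing. The pairing information is in the Match column of the data frame Pairing, and the Pos column matches a substring of the column names of the data frame whose columns I want to subtract.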

How can I use the pairing information in Pairing to decide which columns to subtract?

> Pairing 
Match    Pos 
Control_70   001_A01 
Control_56   001_A02 
    Case_70   001_A03 
    Case_56   001_A04 
Control_21   001_A05 
    Case_21   001_A06 


> head(matures.cpm.spike.batch[,1:6]) 
       001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6 
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599 
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286 
hsa-let-7c-5p 5.976763 4.372978 5.984685 9.821348 6.341252 7.480211 
hsa-let-7d-3p 16.508818 10.697730 11.001534 18.375286 7.583910 24.974774 
hsa-let-7d-5p 13.273824 5.134547 9.456675 11.567230 7.096485 13.294108 
hsa-let-7f-5p 13.900711 9.804384 11.481614 20.002110 7.878241 17.295909 

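It's not clear how the columns are paired. Maybe you should remove all the unnecessary information (i.e. the unnecessary columns in Pairing) and give an actual example. –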


Updated. Maybe this works as an actual example. – user2300940


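Your `Pairing$Pos` doesn't match the column headers you show in the data. Also, it would help a lot if you provided both inputs in a reproducible form instead of pasting printed data. See [How to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) –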
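One common way to share an input reproducibly is base R's `dput()`, which prints R code that rebuilds the object. A minimal sketch using the Pairing table from the question (typed in by hand here, with Pos kept as character); the same call works for the data, e.g. `dput(head(matures.cpm.spike.batch[, 1:6]))`:

```r
# Pairing typed in from the question (Pos kept as character, not factor)
Pairing <- data.frame(
  Match = c("Control_70", "Control_56", "Case_70", "Case_56", "Control_21", "Case_21"),
  Pos   = c("001_A01", "001_A02", "001_A03", "001_A04", "001_A05", "001_A06"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that output into the question
dput(Pairing)

# Round trip: evaluating the printed code gives back an equal data frame
txt <- paste(capture.output(dput(Pairing)), collapse = "\n")
restored <- eval(parse(text = txt))
print(all.equal(restored, Pairing))  # TRUE
```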

Answers


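I assume every case has exactly one control and vice versa. It seemed simplest to convert the Pairing data frame into one that lines up cases and controls. Once that is done, you can build the data frame you want.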

## First, recreate your data (stringsAsFactors=FALSE keeps Pos as text, not factor codes)
Pairing = read.table(text="Match    Pos
Control_70   001_A01
Control_56   001_A02
    Case_70   001_A03
    Case_56   001_A04
Control_21   001_A05
    Case_21   001_A06",
header=TRUE, stringsAsFactors=FALSE)

## check.names=FALSE keeps names such as "001_A01_S1" (by default R would prepend "X")
matures.cpm.spike.batch = read.table(text=" 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286
hsa-let-7c-5p 5.976763 4.372978 5.984685 9.821348 6.341252 7.480211
hsa-let-7d-3p 16.508818 10.697730 11.001534 18.375286 7.583910 24.974774
hsa-let-7d-5p 13.273824 5.134547 9.456675 11.567230 7.096485 13.294108
hsa-let-7f-5p 13.900711 9.804384 11.481614 20.002110 7.878241 17.295909",
header=TRUE, check.names=FALSE)

## Build Matches to replace your Pairing
## (sorting by Match lines up Control_21/Case_21, Control_56/Case_56, ...)
Control = Pairing[grep("Control", Pairing$Match),]
Control = Control[order(Control$Match),]
Case = Pairing[grep("Case", Pairing$Match),]
Case = Case[order(Case$Match),]
Matches = cbind(Control, Case)
## Matches columns: 1 = control Match, 2 = control Pos, 3 = case Match, 4 = case Pos

## Strip the "_S<n>" suffix so each column can be looked up by its Pos
Pos.cols = sub("_S[0-9]+$", "", colnames(matures.cpm.spike.batch))

# Uses Matches to build desired data.frame (case minus control)
Diffs = data.frame(matures.cpm.spike.batch[, match(Matches[1,4], Pos.cols)] -
     matures.cpm.spike.batch[, match(Matches[1,2], Pos.cols)])
colnames(Diffs)[1] = sub("Control", "Diff", Matches[1,1])
for(i in 2:nrow(Matches)) {
    Diffs[,i] = matures.cpm.spike.batch[, match(Matches[i,4], Pos.cols)] -
     matures.cpm.spike.batch[, match(Matches[i,2], Pos.cols)]
    colnames(Diffs)[i] = sub("Control", "Diff", Matches[i,1])
}

## Result 
    Diff_21 Diff_56 Diff_70 
1 11.577717 11.285661 -4.166561 
2 17.260808 6.847414 -3.377725 
3 1.138959 5.448370 0.007922 
4 17.390864 7.677556 -5.507284 
5 6.197623 6.432683 -3.817149 
6 9.417668 10.197726 -2.419097 
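Because the columns can be looked up by Pos, the loop can also be replaced by one vectorised subtraction. A minimal sketch on the first two rows of the data, assuming exactly one Control and one Case per pair id and column names of the form Pos followed by `_S<n>`; here the cases are lined up with the controls by their id instead of by sorting, and the miRNA row names are kept:

```r
# Two rows of the question's data; check.names = FALSE keeps "001_A01_S1" etc.
mat <- read.table(text = " 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286",
  header = TRUE, check.names = FALSE)

Pairing <- data.frame(
  Match = c("Control_70", "Control_56", "Case_70", "Case_56", "Control_21", "Case_21"),
  Pos   = c("001_A01", "001_A02", "001_A03", "001_A04", "001_A05", "001_A06"),
  stringsAsFactors = FALSE
)

# Split into controls and cases, then reorder cases to follow the control ids
Control <- Pairing[grepl("^Control_", Pairing$Match), ]
Case    <- Pairing[grepl("^Case_", Pairing$Match), ]
Case    <- Case[match(sub("^Control_", "", Control$Match),
                      sub("^Case_", "", Case$Match)), ]

# Look up columns by Pos after stripping the trailing "_S<n>" suffix
pos   <- sub("_S[0-9]+$", "", colnames(mat))
Diffs <- mat[, match(Case$Pos, pos)] - mat[, match(Control$Pos, pos)]
colnames(Diffs) <- sub("^Control_", "Diff_", Control$Match)
print(Diffs)
```

The Diff_ columns come out in the order the controls appear in Pairing (70, 56, 21) rather than sorted.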

How can I keep the column names from the original data frame? – user2300940


The original data frame has 192 columns and the new one has 96. Which column names do you want, the control ones or the case ones? – G5W
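Either choice can be handled by labelling each difference column with the full original name of one member of the pair. A sketch with a hypothetical helper `paired_diff()` (not part of the answer above), assuming one Case and one Control per pair id and the `_S<n>` column-name suffix seen in the question:

```r
# Hypothetical helper: Case minus Control for each pair, labelled with the
# original full column name of either the case or the control sample
paired_diff <- function(mat, Pairing, keep = c("case", "control")) {
  keep <- match.arg(keep)
  ctrl <- Pairing[grepl("^Control_", Pairing$Match), ]
  case <- Pairing[grepl("^Case_", Pairing$Match), ]
  # Reorder cases so row k of `case` belongs to the same pair as row k of `ctrl`
  case <- case[match(sub("^Control_", "", ctrl$Match),
                     sub("^Case_", "", case$Match)), ]
  pos      <- sub("_S[0-9]+$", "", colnames(mat))
  case_idx <- match(case$Pos, pos)
  ctrl_idx <- match(ctrl$Pos, pos)
  out <- mat[, case_idx, drop = FALSE] - mat[, ctrl_idx, drop = FALSE]
  colnames(out) <- colnames(mat)[if (keep == "case") case_idx else ctrl_idx]
  out
}

# Two rows of the question's data
mat <- read.table(text = " 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286",
  header = TRUE, check.names = FALSE)

Pairing <- data.frame(
  Match = c("Control_70", "Control_56", "Case_70", "Case_56", "Control_21", "Case_21"),
  Pos   = c("001_A01", "001_A02", "001_A03", "001_A04", "001_A05", "001_A06"),
  stringsAsFactors = FALSE
)

print(paired_diff(mat, Pairing, keep = "case"))
print(paired_diff(mat, Pairing, keep = "control"))
```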


Just in case, a different approach:

We need to reshape the Pairing data frame so that cases and controls are in separate columns:

library(tidyr) 
library(reshape2) 

P <- Pairing %>% 
    separate(Match, into = c("cc", "ind"), sep = "_") %>% 
    dcast(ind ~ cc, value.var = "Pos") 

P:

ind Case Control 
1 21 001_A06 001_A05 
2 56 001_A04 001_A02 
3 70 001_A03 001_A01 
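If you would rather not load tidyr and reshape2, the same P can be built in base R. This sketch swaps `dcast()` for `match()` and assumes every pair id has exactly one Case and one Control:

```r
Pairing <- data.frame(
  Match = c("Control_70", "Control_56", "Case_70", "Case_56", "Control_21", "Case_21"),
  Pos   = c("001_A01", "001_A02", "001_A03", "001_A04", "001_A05", "001_A06"),
  stringsAsFactors = FALSE
)

# Split "Control_70" into the group ("Control") and the pair id ("70")
cc  <- sub("_.*$", "", Pairing$Match)
ind <- sub("^[^_]*_", "", Pairing$Match)

# One row per pair id, with the Case and Control positions side by side
ids <- sort(unique(ind))
P <- data.frame(
  ind     = ids,
  Case    = Pairing$Pos[cc == "Case"][match(ids, ind[cc == "Case"])],
  Control = Pairing$Pos[cc == "Control"][match(ids, ind[cc == "Control"])],
  stringsAsFactors = FALSE
)
print(P)
```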

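We also want the column names of matures.cpm.spike.batch to match the names in P. Below, df is matures.cpm.spike.batch with its columns renamed that way: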

colnames(df):

[1] "001_A01" "001_A02" "001_A03" "001_A04" "001_A05" "001_A06" 
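The answer does not show how df was created. A minimal sketch, assuming df is simply a renamed copy of matures.cpm.spike.batch (two rows used here) and that every column name ends in `_S` followed by a sample number:

```r
# Two rows of the question's data, original column names kept (check.names = FALSE)
df <- read.table(text = " 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p 16.566813 11.415796 12.400252 22.701457 8.864882 20.442599
hsa-let-7b-5p 15.574190 11.107133 12.196465 17.954547 8.527478 25.788286",
  header = TRUE, check.names = FALSE)

# Drop the trailing "_S<n>" so the names match P$Case and P$Control
colnames(df) <- sub("_S[0-9]+$", "", colnames(df))
print(colnames(df))
```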

Now we can finish it simply:

case <- df[, P$Case] 
control <- df[, P$Control] 
res <- case - control 

res:

   001_A06 001_A04 001_A03 
hsa-let-7a-5p 11.577717 11.285661 -4.166561 
hsa-let-7b-5p 17.260808 6.847414 -3.377725 
hsa-let-7c-5p 1.138959 5.448370 0.007922 
hsa-let-7d-3p 17.390864 7.677556 -5.507284 
hsa-let-7d-5p 6.197623 6.432683 -3.817149 
hsa-let-7f-5p 9.417668 10.197726 -2.419097