Looping through consecutive columns of a data frame

I want to loop through the columns of a data frame and store the results of a calculation in a matrix.

The scenario can be reproduced with the following sample data:

df = data.frame(replicate(10, sample(0:20, 10, replace = TRUE))) # the columns to be calculated on

M1 = as.data.frame(matrix(0, nrow = 10, ncol = 10)) # a 10 x 10 frame of zeros to hold the results
rownames(M1) = colnames(df)
colnames(M1) = colnames(df)

which look like this:

> df # Frame with columns of data, X1 to X10 

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 
1 1 19 2 6 6 5 0 2 5 10 
2 16 7 14 16 16 18 11 2 18 11 
3 7 6 11 4 4 1 18 11 10 16 
4 20 2 4 20 4 6 10 5 16 7 
5 9 8 16 19 11 2 14 7 13 7 
6 5 16 6 8 20 15 5 11 4 0 
7 11 16 12 8 18 20 20 20 10 14 
8 17 14 10 4 3 10 13 11 5 1 
9 9 20 10 5 1 7 12 10 5 6 
10 8 14 3 14 20 10 17 20 9 14 

> M1 # Output frame to hold results 

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 
X1 0 0 0 0 0 0 0 0 0 0 
X2 0 0 0 0 0 0 0 0 0 0 
X3 0 0 0 0 0 0 0 0 0 0 
X4 0 0 0 0 0 0 0 0 0 0 
X5 0 0 0 0 0 0 0 0 0 0 
X6 0 0 0 0 0 0 0 0 0 0 
X7 0 0 0 0 0 0 0 0 0 0 
X8 0 0 0 0 0 0 0 0 0 0 
X9 0 0 0 0 0 0 0 0 0 0 
X10 0 0 0 0 0 0 0 0 0 0

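Columns X1 and X2 of df are fed into the calculation, then X1 and X3, then X1 and X4, and so on; the loop should then move on to X2 and X3, then X2 and X4, and so on.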
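The pair order described above is the same order in which base R's `combn()` enumerates 2-column combinations. A minimal sketch, assuming 10 columns as in the example:

```r
# Enumerate every unordered pair of column indices (n, m) with n < m.
# combn() returns a 2-row matrix; each column is one pair, in the order
# (1,2), (1,3), ..., (1,10), (2,3), (2,4), ...
pairs <- combn(10, 2)

# Show the first few pairs: X1-X2, X1-X3, X1-X4, X1-X5
print(pairs[, 1:4])

# Total number of pairs is choose(10, 2)
print(ncol(pairs))
```

Because X1 is paired with all nine later columns first, the tenth pair is (2, 3), i.e. X2 and X3.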

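Columns n and m are fed into the calculation/loop, and the result should be placed at the position in the matrix that corresponds to those two columns. The calculation itself simply determines the area between Xn and Xm when they are plotted as lines. I don't know how to structure the loop correctly to do this: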

# The first iteration of the calculation, columns X1 and X2 (X1 vs X1 would be 0)

    y = seq(1,10,1) 
    f1 = approxfun(y, df[,1] - df[,2]) # takes two columns as inputs 
    f2 = function(x) abs(f1(x)) 

    area1 = integrate(f2, 1, 10, subdivisions = 500) 
    M1[2,1] = area1$value
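To reuse this calculation inside a loop, it can be wrapped in a function of the two columns. A minimal sketch; the name `area_between()` is hypothetical, and it assumes the x positions are 1..n (the same as `seq(1,10,1)` for 10 rows):

```r
# Hypothetical helper wrapping the question's calculation: the area between
# two columns when each is drawn as a piecewise-linear line over y = 1..n.
area_between <- function(a, b, subdivisions = 500L) {
  y  <- seq_along(a)                 # x positions 1, 2, ..., n
  f1 <- approxfun(y, a - b)          # linear interpolation of the difference
  f2 <- function(x) abs(f1(x))       # absolute gap between the two lines
  integrate(f2, 1, length(a), subdivisions = subdivisions)$value
}

# Sanity check on a case that can be done by hand: the lines (0, 2) and (0, 0)
# enclose a triangle with base 1 and height 2, so the area is 1.
print(area_between(c(0, 2), c(0, 0)))
```

With the question's data, `M1[2, 1] = area_between(df[, 1], df[, 2])` should then reproduce the first iteration.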

The resulting frame would be a "half matrix" (only one half is needed, since the mirrored half would be identical):

> M1 
    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 
X1 0 0 0 0 0 0 0 0 0 0 
X2 A 0 0 0 0 0 0 0 0 0 
X3 A A 0 0 0 0 0 0 0 0 
X4 A A A 0 0 0 0 0 0 0 
X5 A A A A 0 0 0 0 0 0 
X6 A A A A A 0 0 0 0 0 
X7 A A A A A A 0 0 0 0 
X8 A A A A A A A 0 0 0 
X9 A A A A A A A A 0 0 
X10 A A A A A A A A A 0
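Base R's `lower.tri()` selects exactly the cells marked A above (row index greater than column index), and the mirrored half can be recovered by adding the transpose. A small sketch on a 4 x 4 matrix, using placeholder values 1 to 6 instead of real areas:

```r
# A 4 x 4 zero matrix standing in for M1 (placeholder values, not real areas)
M <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(paste0("X", 1:4), paste0("X", 1:4)))

# lower.tri() is TRUE strictly below the diagonal, i.e. the "A" cells.
# There are choose(4, 2) = 6 of them, filled in column-major order:
# [2,1], [3,1], [4,1], [3,2], [4,2], [4,3]
M[lower.tri(M)] <- 1:6

# If the full symmetric matrix is ever needed, add the transpose
# (the diagonal stays 0 because 0 + 0 = 0).
M_full <- M + t(M)
print(M_full)
```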

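I started building a for loop, but I can't work out how to use i and j so that it stays on X1 until it has looped through X2 to X10, then moves on to X2, and so on.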
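A minimal sketch of such a nested loop: the outer index `n` stays on one column while the inner index `m` runs over every later column, so each pair is visited once and the result lands below the diagonal at `M1[m, n]`, matching `M1[2,1]` in the first iteration. It assumes `set.seed(1)` for reproducibility, uses a plain matrix rather than a data frame for `M1`, and uses a hypothetical helper `area_between()` for the question's approxfun/integrate calculation:

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))

# Hypothetical helper: area between two columns drawn as lines over y = 1..n
area_between <- function(a, b) {
  y  <- seq_along(a)
  f1 <- approxfun(y, a - b)
  f2 <- function(x) abs(f1(x))
  integrate(f2, 1, length(a), subdivisions = 500)$value
}

k  <- ncol(df)
M1 <- matrix(0, nrow = k, ncol = k, dimnames = list(colnames(df), colnames(df)))

# Outer loop fixes column n; inner loop visits only the later columns m > n,
# so X1 is paired with X2..X10 before the loop moves on to X2.
for (n in seq_len(k - 1)) {
  for (m in (n + 1):k) {
    M1[m, n] <- area_between(df[, n], df[, m])
  }
}

round(M1, 1)
```

Restricting the inner loop to `m > n` is what produces the "half matrix": the diagonal and upper triangle are never written and stay 0.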

Thanks for stopping by!

Source

2016-10-19 Qaribbean

When I try to run `f1 = approxfun(y, df[,1] - df[,2])`, I get: `Error in xy.coords(x, y): object 'y' not found`. Are the functions f1 and f2 the actual functions you are trying to run on your data? – biomiha

Could you provide the actual output of the calculation, maybe just for a 4x4 matrix? – CCurtis

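@biomiha Apologies, I left the definition of `y` out of my explanation; I have amended the calculation. For the purposes of this question it is a scaled-down version of the original. – Qaribbean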

I couldn't get your function to work, so I used a random stand-in calculation instead. This loop works for me:

area=list() # placeholder, because the actual function doesn't work
for(i in 1:ncol(df)){
    for(j in 1:ncol(df)){
        if(i==j){M1[i,i]=0;next}
        selection=df[,c(i,j)]
        #area=integrate(f2, 1, 10, subdivisions = 500)
        area$value=mean(colSums(selection)) # something random to check
        # each pair is visited twice, as (i, j) and (j, i); both cells are filled
        M1[i,j]=area$value
        M1[j,i]=area$value
    }
}

But an explicit loop is often not the most efficient way to do this in R, so you may prefer this option:

df = data.frame(replicate(10, sample(0:20, 10, replace = TRUE))) # the columns to be calculated on
my.f = function(x) abs(x[,1] - x[,2])

# This doesn't work (integrate() needs a function of x, not the vector my.f() returns), but should be close to what you want to do:
#y = combn(ncol(df), 2L, function(y) integrate(my.f(df[y]), 1, 200, subdivisions = 500)$value)

y = combn(ncol(df), 2L, function(idx) mean(my.f(df[idx]))) # this works, but is just an example calculation

nams = colnames(df)
M = matrix(0, ncol = length(nams), nrow = length(nams)) # zeros on the diagonal
M[lower.tri(M)] = y  # combn() order matches the column-major order of lower.tri()
M = M + t(M)         # mirror the lower triangle into the upper one
rownames(M) = colnames(M) = nams
M

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 
X1 0.0 8.6 6.4 8.8 7.1 6.6 7.0 4.0 7.0 3.7 
X2 8.6 0.0 5.0 4.4 5.5 5.4 4.4 9.2 8.0 7.7 
X3 6.4 5.0 0.0 7.2 5.9 5.8 7.6 7.0 10.4 6.5 
X4 8.8 4.4 7.2 0.0 5.9 4.4 5.4 9.6 8.4 7.3 
X5 7.1 5.5 5.9 5.9 0.0 7.3 5.3 9.1 8.5 8.0 
X6 6.6 5.4 5.8 4.4 7.3 0.0 6.0 8.4 5.6 3.7 
X7 7.0 4.4 7.6 5.4 5.3 6.0 0.0 8.8 4.4 5.7 
X8 4.0 9.2 7.0 9.6 9.1 8.4 8.8 0.0 9.6 6.9 
X9 7.0 8.0 10.4 8.4 8.5 5.6 4.4 9.6 0.0 5.5 
X10 3.7 7.7 6.5 7.3 8.0 3.7 5.7 6.9 5.5 0.0
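The commented-out `integrate()` line fails because `integrate()` expects a function of x, not a vector. A sketch of the `combn()` approach with the question's area calculation plugged in, assuming `set.seed(1)` for reproducibility and a hypothetical helper `area_between()` that builds the function for each pair:

```r
set.seed(1)  # assumption: fixed seed for a reproducible example
df <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))

# Hypothetical helper: area between two columns drawn as lines over y = 1..n.
# The integrand is built as a function, which is what integrate() requires.
area_between <- function(a, b) {
  y  <- seq_along(a)
  f1 <- approxfun(y, a - b)
  integrate(function(x) abs(f1(x)), 1, length(a), subdivisions = 500)$value
}

# One value per pair, in combn() order: (1,2), (1,3), ..., (2,3), ...
vals <- combn(ncol(df), 2L, function(idx) area_between(df[[idx[1]]], df[[idx[2]]]))

# combn() order matches the column-major order of lower.tri(), so the values
# drop straight into the lower triangle; adding the transpose mirrors them.
nams <- colnames(df)
M <- matrix(0, nrow = length(nams), ncol = length(nams), dimnames = list(nams, nams))
M[lower.tri(M)] <- vals
M <- M + t(M)

round(M, 1)
```

`combn()` still calls the function once per pair internally, so the main gain over the double loop is that each pair is computed once rather than twice.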


2016-10-19 18:25:06 Wave

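Thanks for the multiple solutions @Wave. I've only just had the chance to try implementing this, so I'll come back with the results. As I mentioned in my comment above, I had left something out of the calculation, which I have since edited into the original question; my apologies. – Qaribbean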

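I was able to implement your first suggestion with my original data and function on a 3x3 sample, and it works well, thanks! I'm now trying the second suggestion, since it looks more efficient (my intended application will be 100x100 or larger). How can I adapt the `y = t...` line of code for the decimal values in my original data? I assume `2L` relates to the integer sample. Thanks. – Qaribbean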

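You only need to change the function. `2L` refers to the number of elements to choose (2 columns) and should not be changed for your example. – Wave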
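To illustrate the comment: `2L` is the `m` argument of `combn()` (how many columns make up each combination), so it stays 2 whether the data are integers or decimals; only the per-pair function depends on the data. A sketch with 100 decimal columns drawn from `runif()` (an assumption standing in for the real data), with mean absolute difference as a placeholder calculation:

```r
set.seed(42)  # assumption: reproducible decimal test data
k  <- 100
dd <- as.data.frame(matrix(runif(10 * k), nrow = 10))  # 10 rows, 100 decimal columns

# The 2L is combn()'s m argument: 2 columns per combination.
# Mean absolute difference is a placeholder for the real per-pair calculation.
vals <- combn(k, 2L, function(idx) mean(abs(dd[[idx[1]]] - dd[[idx[2]]])))

length(vals)  # one value per pair: choose(100, 2)
```

For 100 columns this is choose(100, 2) = 4950 calls to the per-pair function, which is worth keeping in mind if the real calculation uses `integrate()`.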
