2012-08-02 70 views
3

Select n consecutive grouped variables and apply a function in R

Here is the example data:

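# 100 x 100 data frame of random genotype-like labels
myd <- data.frame(matrix(sample(c("AB", "BB", "AA"), 100 * 100,
                                replace = TRUE), ncol = 100))
# names of the form MR.<group>.<column>, e.g. MR.1.1 ... MR.10.100
variablenames <- paste(rep(paste("MR.", 1:10, sep = ""),
                           each = 10), 1:100, sep = ".")
names(myd) <- variablenames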

Each variable belongs to a group, and there are ten groups here. The group index for each variable in this data frame is therefore:

group <- rep(1:10, each = 10) 

So the variable names and their groups are:

data.frame (group, variablenames) 
    group variablenames 
1  1  MR.1.1 
2  1  MR.1.2 
3  1  MR.1.3 
4  1  MR.1.4 
5  1  MR.1.5 
6  1  MR.1.6 
7  1  MR.1.7 
8  1  MR.1.8 
9  1  MR.1.9 
10  1  MR.1.10 
11  2  MR.2.11 
<<<<<<<<<<<<<<<<<<<<<<<< 
100 10  MR.10.100 

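Grouping means that the following steps must be applied separately to each group of variables.

I have a longer function at work; the following is a simplified example.

A function that considers two variables at a time: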

myfun <- function(x1, x2) {
  out <- paste(x1, x2, sep = ":")
  # other steps would be performed here
  return(out)
}
# group 1
myfun(myd[, 1], myd[, 2]); myfun(myd[, 3], myd[, 4]); myfun(myd[, 5], myd[, 6])
myfun(myd[, 7], myd[, 8]); myfun(myd[, 9], myd[, 10])
# group 2
myfun(myd[, 11], myd[, 12]); myfun(myd[, 13], myd[, 14])
# ... and so on through group 10

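So I need to walk through variables 1:10 (that is, perform the above within the first group), then 11:20 (the second group), and so on. In this case the number of variables in each group causes no trouble, because the number of variables per group (10) is divisible by the number of variables taken at a time (2).

However, in the following example three variables are taken at a time. The number of variables per group (10) is not divisible by 3, so one variable is left over at the end of each group. A function that considers three variables at a time: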

myfun <- function(x1, x2, x3, ...) {
  # the ... lets the last set of a group carry the leftover variable(s)
  out <- paste(x1, x2, x3, ..., sep = ":")
  # other steps would be performed here
  return(out)
}
# for group 1
myfun(myd[, 1], myd[, 2], myd[, 3])
myfun(myd[, 4], myd[, 5], myd[, 6])
# One variable is left over before proceeding to the second group,
# so the final set takes 1 extra variable (columns 7:10 rather than 7:9)
myfun(myd[, 7], myd[, 8], myd[, 9], myd[, 10])
# for group 2
myfun(myd[, 11], myd[, 12], myd[, 13])
# and so on through all groups, to the end of the data

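I want to loop through this process with a user-defined number n of consecutive variables taken at a time, where n can range from 1 to the maximum number of variables in a group.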

Edit: an illustration of the process (only groups 1 and 2 are shown, as an example):

[image not included: illustration of the process for groups 1 and 2]

+0

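Not a definite answer, just adding an idea: you could create a matrix of variable names, namesmat <- matrix(names(myd), nrow = length(myd)/n, byrow = TRUE), and then apply the function to myd and this matrix, but I am not sure about the unbalanced data... – shNIL 2012-08-02 18:19:41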

Answers

4

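Create a function that splits the data into an appropriate list, then apply whatever function you need to that list.

This function creates your second grouping variable. (The first grouping variable, group, is given in your question; if you change the number of variables per group there, you should also change DIM in the function below.)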

myfun = function(LENGTH, DIM = 10) { 
    PATTERN = rep(1:(DIM %/% LENGTH), each=LENGTH) 
    c(PATTERN, rep(max(PATTERN), DIM %% LENGTH)) 
} 
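
A quick check may help to see what this helper returns. The sketch below repeats the same logic under the hypothetical name make_pattern (chosen only so that it does not clash with the myfun from the question) and prints the pattern for two chunk sizes.

```r
# Same logic as the helper above, under a hypothetical name for illustration
make_pattern <- function(LENGTH, DIM = 10) {
  PATTERN <- rep(1:(DIM %/% LENGTH), each = LENGTH)
  # the DIM %% LENGTH leftover columns join the last chunk
  c(PATTERN, rep(max(PATTERN), DIM %% LENGTH))
}

make_pattern(3)
# [1] 1 1 1 2 2 2 3 3 3 3
make_pattern(2)
# [1] 1 1 2 2 3 3 4 4 5 5
```

With LENGTH = 3 the tenth column is labelled 3 as well, which is what makes the last chunk four columns wide.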

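Here is how we will split myd. In this example, we first split myd into groups of 10 columns, and then split each of those into chunks of 3 columns, except for the last chunk, which will have 4 columns (3 + 3 + 4 = 10).

Note: to change the number of columns per chunk, for example to take two variables at a time, change group2 = rep(myfun(3), length.out=100) to group2 = rep(myfun(2), length.out=100).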

group <- rep(1:10, each = 10) 
# CHANGE THE FOLLOWING LINE ACCORDING 
# TO THE NUMBER OF GROUPS THAT YOU WANT 
group2 = rep(myfun(3), length.out=100) 
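
One way to sanity-check the two grouping vectors before splitting is to count how many columns fall into each (group, group2) combination. This is only a sketch: it writes the chunk pattern out by hand (the name pattern is introduced here for illustration) and assumes the 10-groups-of-10 layout from the question.

```r
group <- rep(1:10, each = 10)
# chunks of 3 within a group of 10 columns: 3 + 3 + 4
pattern <- c(rep(1:3, each = 3), 3)
group2 <- rep(pattern, length.out = 100)

# rows are the ten groups, columns are the chunks within each group
sizes <- table(group, group2)
sizes[1:2, ]
```

Every row of sizes should read 3, 3, 4; anything else means the pattern length and the group size are out of step.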

This is the splitting step. We first split the column names, then match those names against myd to create a list of data.frames.

# Extract group names for matching purposes 
temp = split(names(myd), list(group, group2)) 

# Match the names to myd 
# drop = FALSE keeps single-column chunks as data.frames
temp = lapply(seq_along(temp),
       function(x) myd[, names(myd) %in% temp[[x]], drop = FALSE])

# Extract the names from the list for future reference
NAMES = sapply(temp, function(x) paste(names(x), collapse="_"))

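Now that we have a list, we can do lots of fun things with it. You want to paste your columns together, separated by colons. Here is how you would do that.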

# Do what you want with the list 
# For example, to paste the columns together: 
FINAL = lapply(temp, function(x) apply(x, 1, paste, collapse=":")) 
names(FINAL) = NAMES 
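
The steps above could also be wrapped into one function that takes the chunk size n as an argument. The following is a minimal sketch, not part of the original answer: apply_by_chunks and its arguments are hypothetical names, and it assumes that leftover columns should be merged into the last chunk of each group, as in the question.

```r
# Hypothetical helper: apply FUN to chunks of n consecutive columns per group.
# Leftover columns of a group are merged into that group's last chunk.
apply_by_chunks <- function(df, group, n, FUN) {
  stopifnot(length(group) == ncol(df), n >= 1)
  # column positions of each group, in order of first appearance
  idx_by_group <- split(seq_along(group), factor(group, levels = unique(group)))
  chunks <- do.call(c, unname(lapply(idx_by_group, function(idx) {
    k <- max(length(idx) %/% n, 1)                      # chunks in this group
    chunk_id <- pmin((seq_along(idx) - 1) %/% n + 1, k) # leftovers get id k
    unname(split(idx, chunk_id))
  })))
  out <- lapply(chunks, function(cols) FUN(df[, cols, drop = FALSE]))
  names(out) <- vapply(chunks,
                       function(cols) paste(names(df)[cols], collapse = "_"),
                       character(1))
  out
}

# Usage with the example data from the question
myd <- data.frame(matrix(sample(c("AB", "BB", "AA"), 100 * 100,
                                replace = TRUE), ncol = 100))
names(myd) <- paste(rep(paste("MR.", 1:10, sep = ""), each = 10),
                    1:100, sep = ".")
group <- rep(1:10, each = 10)
res <- apply_by_chunks(myd, group, n = 3,
                       FUN = function(x) apply(x, 1, paste, collapse = ":"))
length(res)  # 30 chunks: 3 per group
```

Unlike FINAL above, this sketch returns the chunks group by group (all chunks of group 1 first, then group 2, and so on).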

Here is a sample of the output:

lapply(FINAL, function(x) head(x, 5)) 
# $MR.1.1_MR.1.2_MR.1.3 
# [1] "AA:AB:AB" "AB:BB:AA" "BB:AB:AA" "BB:AA:AB" "AA:AA:AA" 
# 
# $MR.2.11_MR.2.12_MR.2.13 
# [1] "BB:AA:AB" "BB:AB:BB" "BB:AA:AA" "AB:BB:AA" "BB:BB:AA" 
# 
# $MR.3.21_MR.3.22_MR.3.23 
# [1] "AA:AB:BB" "BB:AA:AA" "AA:AB:BB" "AB:AA:AA" "AB:BB:BB" 
# 
# <<<<<<<------SNIP------>>>>>>>> 
# 
# $MR.1.4_MR.1.5_MR.1.6 
# [1] "AB:BB:AA" "BB:BB:BB" "AA:AA:AA" "BB:BB:AB" "AB:AA:AA" 
# 
# $MR.2.14_MR.2.15_MR.2.16 
# [1] "AA:BB:AB" "BB:BB:BB" "BB:BB:AB" "AA:BB:AB" "BB:BB:BB" 
# 
# $MR.3.24_MR.3.25_MR.3.26 
# [1] "AA:AB:BB" "BB:AA:BB" "BB:AB:BB" "AA:AB:AA" "AB:AA:AA" 
# 
# <<<<<<<------SNIP------>>>>>>>> 
# 
# $MR.1.7_MR.1.8_MR.1.9_MR.1.10 
# [1] "AB:AB:AA:AB" "AB:AA:BB:AA" "BB:BB:AA:AA" "AB:BB:AB:AA" "AB:BB:AB:BB" 
# 
# $MR.2.17_MR.2.18_MR.2.19_MR.2.20 
# [1] "AB:AB:BB:BB" "AB:AB:BB:BB" "AB:AA:BB:BB" "AA:AA:AB:AA" "AB:AB:AB:AB" 
# 
# $MR.3.27_MR.3.28_MR.3.29_MR.3.30 
# [1] "BB:BB:AB:BB" "BB:BB:AA:AA" "AA:BB:AB:AA" "AA:BB:AB:AA" "AA:AB:AA:BB" 
# 
# $MR.4.37_MR.4.38_MR.4.39_MR.4.40 
# [1] "BB:BB:AB:AA" "AA:BB:AA:BB" "AA:AA:AA:AB" "AB:AA:BB:AB" "BB:BB:BB:BB" 
# 
# $MR.5.47_MR.5.48_MR.5.49_MR.5.50 
# [1] "AB:AA:AA:AB" "AB:AA:BB:AA" "AB:BB:AA:AA" "AB:BB:BB:BB" "BB:AA:AB:AA" 
# 
# $MR.6.57_MR.6.58_MR.6.59_MR.6.60 
# [1] "BB:BB:AB:AA" "BB:AB:BB:AA" "AA:AB:AB:BB" "BB:AB:AA:AB" "AB:AA:AB:BB" 
# 
# $MR.7.67_MR.7.68_MR.7.69_MR.7.70 
# [1] "BB:AB:BB:AA" "BB:AB:BB:AA" "BB:AB:BB:AB" "AB:AA:AA:AA" "AA:AA:AA:AB" 
# 
# $MR.8.77_MR.8.78_MR.8.79_MR.8.80 
# [1] "AA:AB:AA:AB" "AB:AA:AB:BB" "BB:BB:AA:AB" "AB:BB:BB:BB" "AB:AA:BB:AB" 
# 
# $MR.9.87_MR.9.88_MR.9.89_MR.9.90 
# [1] "AA:BB:AB:AA" "AA:AB:BB:BB" "AA:BB:AA:BB" "AB:AB:AA:BB" "AB:AA:AB:BB" 
# 
# $MR.10.97_MR.10.98_MR.10.99_MR.10.100 
# [1] "AB:AA:BB:AB" "AB:AA:AB:BB" "BB:AB:AA:AA" "BB:BB:AA:AA" "AB:AB:BB:AB" 

0

I suggest recoding myfun to take a matrix and to use pasteCols from the plotrix package.

library(plotrix) 

myfun = function(x){ 
    out = pasteCols(t(x), sep = ":") 
    # some code 
    return(out) 
} 

Then it is easy: for each group, compute the index of the first and last column you want for each call to myfun, using modulus and integer division:

rubiques_solution = function(group, myd, num_to_group){
    # loop over groups
    for(g in unique(group)){
        var_index = which(group == g)
        num_var = length(var_index)

        # make sure num_to_group does not exceed the number of variables
        if(num_var < num_to_group){
            stop("num_to_group > number of variables in at least one group")
        }

        # number of calls to myfun
        num_calls = num_var %/% num_to_group

        # the idea here is that we create the first and last column
        # in which we are interested for each call
        first = seq(from = var_index[1], by = num_to_group, length.out = num_calls)
        last = first + num_to_group - 1
        # the last call may contain more variables, we adjust here:
        last[length(last)] = last[length(last)] + (num_var %% num_to_group)

        # seq_len() so that every call runs, not only the last one
        for(i in seq_len(num_calls)){
            # maybe do something with the return value of myfun?
            myfun(myd[, first[i]:last[i], drop = FALSE])
        }
    }
}

group = rep(1:10, each = 10) # same as yours
myd = data.frame(matrix(sample(c("AB", "BB", "AA"), 100*100, replace = TRUE), ncol = 100)) # same as yours
num_to_group = 2 # this is your first example 
rubiques_solution(group, myd, num_to_group) 
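
The index arithmetic inside the loop above can be traced by hand for a single group. The sketch below uses illustrative values that are assumptions, not part of the answer: group 2 of the example data (columns 11 to 20) with three variables per call.

```r
num_var <- 10        # variables in the group
num_to_group <- 3    # variables per call
start <- 11          # first column of group 2

num_calls <- num_var %/% num_to_group            # 3 calls
first <- seq(from = start, by = num_to_group, length.out = num_calls)
last <- first + num_to_group - 1
# the remainder (10 %% 3 = 1) is added to the last call
last[num_calls] <- last[num_calls] + num_var %% num_to_group

rbind(first, last)
```

Here first is 11, 14, 17 and last is 13, 16, 20, so the three calls cover columns 11:13, 14:16 and 17:20.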

I hope I understood the question correctly.

+0

Rubique, can you show an example of how to apply your function? – A5C1D2H2I1M1N2O1R2T1 2012-08-06 14:46:47

+0

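You will not see any output, because in your example you only _call_ myfun, so that is what I did. If you need the results, just put them into a list as the other answer does (or into a data.frame ... same thing). – Rubique 2012-08-06 19:19:33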