隨機化均衡實驗設計

我正在編寫一些代碼來爲市場研究生成均衡的實驗設計，專門用於聯合分析和最大差異縮放。隨機化均衡實驗設計

第一步是生成部分平衡不完整塊（PBIB）設計。這是直接使用R包AlgDesign。

對於大多數類型的研究，這樣的設計就足夠了。然而，在市場研究中，人們想要控制每個區塊的秩序效應。這是我希望得到一些幫助的地方。

創建測試數據

# The following code is not essential in understanding the problem, 
# but I provide it in case you are curious about the origin of the data itself. 
#library(AlgDesign) 
#set.seed(12345) 
#choices <- 4 
#nAttributes <- 7 
#blocksize <- 7 
#bsize <- rep(choices, blocksize) 
#PBIB <- optBlock(~., withinData=factor(1:nAttributes), blocksizes=bsize) 
#df <- data.frame(t(array(PBIB$rows, dim=c(choices, blocksize)))) 
#colnames(df) <- paste("Item", 1:choices, sep="") 
#rownames(df) <- paste("Set", 1:nAttributes, sep="") 

df <- structure(list(
    Item1 = c(1, 2, 1, 3, 1, 1, 2), 
    Item2 = c(4, 4, 2, 5, 3, 2, 3), 
    Item3 = c(5, 6, 5, 6, 4, 3, 4), 
    Item4 = c(7, 7, 6, 7, 6, 7, 5)), 
    .Names = c("Item1", "Item2", "Item3", "Item4"), 
    row.names = c("Set1", "Set2", "Set3", "Set4", "Set5", "Set6", "Set7"), 
    class = "data.frame")

**定義兩個輔助功能

balanceMatrix計算矩陣的平衡：

balanceMatrix <- function(x){ 
    t(sapply(unique(unlist(x)), function(i)colSums(x==i))) 
}

balanceScore計算 '適應' 的度量 - 更低的分數更好，零完美：

balanceScore <- function(x){ 
    sum((1-x)^2) 
}

定義一個隨機

findBalance <- function(x, nrepeat=100){ 
    df <- x 
    minw <- Inf 
    for (n in 1:nrepeat){ 
     for (i in 1:nrow(x)){df[i,] <- sample(df[i, ])} 
     w <- balanceMatrix(df) 
     sumw <- balanceScore(w) 
     if(sumw < minw){ 
      dfbest <- df 
      minw <- sumw 
     } 
    } 
    dfbest 
}

主代碼

數據幀df是7套一個平衡的設計重新採樣的行的函數。每組將向被訪者顯示4項。 df中的數值指的是7個不同的屬性。例如，在Set1中，被訪者將被要求從屬性1,3,4和7中選擇他/她的首選選項。

每組中的項目排序在概念上並不重要。因此（1,4,5,7）的排序與（7,5,4,1）相同。

但是，要獲得完全平衡的設計，每個屬性在每列中將顯示相同的次數。這種設計存在不平衡，因爲屬性1出現4次在第1列：

df 

    Item1 Item2 Item3 Item4 
Set1  1  4  5  7 
Set2  2  4  6  7 
Set3  1  2  5  6 
Set4  3  5  6  7 
Set5  1  3  4  6 
Set6  1  2  3  7 
Set7  2  3  4  5

，試圖找到一個更平衡的設計，我寫的函數findBalance。隨機搜索更好的解決方案，通過隨機抽樣橫跨df的行。隨着100個重複發現以下最佳解決方案：

set.seed(12345) 
dfbest <- findBalance(df, nrepeat=100) 
dfbest 

    Item1 Item2 Item3 Item4 
Set1  7  5  1  4 
Set2  6  7  4  2 
Set3  2  1  5  6 
Set4  5  6  7  3 
Set5  3  1  6  4 
Set6  7  2  3  1 
Set7  4  3  2  5

這似乎更加平衡，並計算出平衡矩陣包含了大量的人的。餘額矩陣計算每個屬性在每列中出現的次數。例如，下表顯示（在頂部左手小區）屬性1出現兩次不是在所有在第1列，並在第2列兩次：

balanceMatrix(dfbest) 

    Item1 Item2 Item3 Item4 
[1,]  0  2  1  1 
[2,]  1  1  1  1 
[3,]  1  1  1  1 
[4,]  1  0  1  2 
[5,]  1  1  1  1 
[6,]  1  1  1  1 
[7,]  2  1  1  0

的平衡得分此解決方案是6 ，這表明至少有六個單元不等，以1：

balanceScore(balanceMatrix(dfbest)) 
[1] 6

我的問題

感謝您對以下這個詳細的例子。我的問題是我怎樣才能將這個搜索功能重寫得更系統化？我想與R告訴：

最小化balanceScore(df)
通過改變df
主題行以：已經完全約束

來源

2011-04-12 Andrie

OK，我有點誤解了你的問題。所以拜拜費奧多羅夫，你好應用了費奧多羅夫。

下面的算法是基於費奧多羅夫算法的第二次迭代：

計算所有可能的排列爲每個組，並且將它們存儲在C0列表
畫出從C0第一種可能的解決方案空間（每組一個排列）。這可能是原來的，但因爲我需要這些指數，我寧願隨便開始。
計算每個新解決方案的分數，其中第一個集合被所有排列替換。
取代所述第一組與所述置換給出最低得分
重複3-4爲每一個其它組
重複3-5直到分數達到0或者n次迭代。

（可選）您可以在10次迭代後重新啓動該過程，並從另一個起點開始。在您的測試情況下，事實證明，一些出發點收斂非常緩慢到0以下功能發現了一個得分爲0的平衡實驗設計在我的電腦上平均值不到1.5秒時：

> X <- findOptimalDesign(df) 
> balanceScore(balanceMatrix(X)) 
[1] 0 
> mean(replicate(20, system.time(X <- findOptimalDesign(df))[3])) 
[1] 1.733

所以這是現在函數（給你的原始balanceMatrix和balanceScore功能）：

findOptimalDesign <- function(x,iter=4,restart=T){ 
    stopifnot(require(combinat)) 
    # transform rows to list 
    sets <- unlist(apply(x,1,list),recursive=F) 
    nsets <- NROW(x) 
    # C0 contains all possible design points 
    C0 <- lapply(sets,permn) 
    n <- gamma(NCOL(x)+1) 

    # starting point 
    id <- sample(1:n,nsets) 
    Sol <- sapply(1:nsets,function(i)C0[[i]][id[i]]) 

    IT <- iter 
    # other iterations 
    while(IT > 0){ 
     for(i in 1:nsets){ 
      nn <- 1:n 
      scores <- sapply(nn,function(p){ 
      tmp <- Sol 
      tmp[[i]] <- C0[[i]][[p]] 
      w <- balanceMatrix(do.call(rbind,tmp)) 
      balanceScore(w) 
      }) 
      idnew <- nn[which.min(scores)] 
      Sol[[i]] <- C0[[i]][[idnew]] 

     } 
     #Check if score is 0 
     out <- as.data.frame(do.call(rbind,Sol)) 
     score <- balanceScore(balanceMatrix(out)) 
     if (score==0) {break} 
     IT <- IT - 1 

     # If asked, restart 
     if(IT==0 & restart){ 
      id <- sample(1:n,nsets) 
      Sol <- sapply(1:nsets,function(i)C0[[i]][id[i]]) 
      IT <- iter 
     } 
    } 
    out 
}

HTH

編輯：固定的小bug（它立即重新啓動每一輪之後，我忘了在IT條件）。這樣做，它仍然運行得更快。

來源

2011-04-12 15:51:51

+1這太棒了。謝謝。 – Andrie 2011-04-15 11:37:39

隨機化均衡實驗設計

回答

相關問題