2017-03-06 101 views
2

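Create a new variable based on factor level and random selection

I'm a bit stuck trying to use the sample function for my task, which is to sample n random rows from each level of a factor and create a new variable that depends on this selection and on the value of another variable.

Simplified example: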

Subject = c("100","100","100","100", "100", "200", "200", "200", "200", "200") 
Condition = c("Blue","Blue","Blue","Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue") 
Response = rnorm(10) 
df = data.frame(Subject,Condition, Response) 

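Here, the goal would be to sample 3 random rows for each level of Subject and create a new variable, say Condition.Rand, in which the randomly selected rows are labelled "Red" and the remaining rows keep whatever value is in Condition, in this case "Blue". So for each Subject, 60% of Condition.Rand would be labelled "Red" and 40% would be labelled "Blue".

To be clear, I want exactly 3 random rows (or 60% of the 5 observations) labelled "Red" for Subject 100, and exactly 3 random rows labelled "Red" for Subject 200.

Thanks!

Answers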

2

Use split to divide df into subgroups, then sample "Red" or "Blue" within each subgroup with the desired probabilities.

set.seed(42) 
do.call(rbind, lapply(split(df, df$Subject), function(a) 
cbind(a, 
    cond.rand = sample(c("Red","Blue"), size = nrow(a), replace = TRUE, prob = c(0.6,0.4))))) 
#  Subject Condition Response cond.rand 
#100.1  100  Blue -1.7813084  Blue 
#100.2  100  Blue -0.1719174  Blue 
#100.3  100  Blue 1.2146747  Red 
#100.4  100  Blue 1.8951935  Blue 
#100.5  100  Blue -0.4304691  Blue 
#200.6  200  Blue -0.2572694  Red 
#200.7  200  Blue -1.7631631  Blue 
#200.8  200  Blue 0.4600974  Red 
#200.9  200  Blue -0.6399949  Blue 
#200.10  200  Blue 0.4554501  Blue 

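This doesn't quite do it, because sometimes it returns cases where _all_ of 'cond.rand' is labelled "Red" for a particular subject. I want _exactly_ 3 (or 60%) random rows labelled "Red" for Subject 100, and _exactly_ 3 random rows labelled "Red" for Subject 200. – amurphy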
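
The behaviour described in this comment follows from how `prob` works in `sample`: with `replace = TRUE`, every row is drawn independently, so the number of "Red" labels in a group of 5 can be anything from 0 to 5 and is 3 only on average. A minimal sketch to illustrate this (the name `red_counts` is introduced here purely for illustration):

```r
set.seed(1)
# Count the "Red" labels in 1000 independent draws of 5 labels each
red_counts <- replicate(1000, sum(sample(c("Red", "Blue"), size = 5,
                                         replace = TRUE, prob = c(0.6, 0.4)) == "Red"))
# Tabulate how often each count occurs; counts other than 3 show up regularly
table(red_counts)
```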

2

We can also use ave from base R

set.seed(42) 
df$cond.rand <- with(df, ave(seq_along(Subject), Subject, FUN = function(x) 
    sample(c("Red", "Blue"), size = length(x), replace = TRUE, prob = c(0.6, 0.4)))) 
df$cond.rand 
#[1] "Blue" "Blue" "Red" "Blue" "Blue" "Red" "Blue" "Red" "Blue" "Blue" 

This has the same problem as D.B.'s suggestion. I'll edit to explain my task more clearly. – amurphy

@amurphy, try `with(df, ave(seq_along(Subject), Subject, FUN = function(x) sample(c(rep('Red', ceiling(length(x) * 0.6)), rep('Blue', length(x) - ceiling(length(x) * 0.6))))))` –

@db Thank you very much! – amurphy
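
For reference, the suggestion from the comments can be combined with the example data into a self-contained sketch along the following lines. It assumes the 60% share is rounded up with `ceiling` (as in the comment) and that non-selected rows keep their original `Condition` value; `is_red` is a name introduced here for illustration, and `Condition.Rand` is the column name proposed in the question.

```r
set.seed(42)
df <- data.frame(
  Subject   = rep(c("100", "200"), each = 5),
  Condition = "Blue",
  Response  = rnorm(10),
  stringsAsFactors = FALSE
)

# For each Subject, build a vector holding exactly ceiling(0.6 * n) ones and
# the rest zeros, then shuffle it so the flagged rows are chosen at random
is_red <- with(df, ave(seq_along(Subject), Subject, FUN = function(idx) {
  n_red <- ceiling(length(idx) * 0.6)
  sample(rep(c(1L, 0L), times = c(n_red, length(idx) - n_red)))
}))

# Flagged rows become "Red"; all other rows keep their Condition value
df$Condition.Rand <- ifelse(is_red == 1L, "Red", df$Condition)

# Check the number of "Red" and "Blue" labels per Subject
table(df$Subject, df$Condition.Rand)
```

Shuffling a vector with a fixed number of "Red" entries, rather than drawing each label with a probability, is what makes the count per subject exact instead of merely expected.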