2016-12-15

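Sampling with content balancing

I am trying to do content-balanced sampling using only base R functions. How can I make sure that at least one row from group 'a' or group 'b' is selected?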

a <- cbind(matrix(1:36, ncol = 3), rbind(as.matrix(rep('a', each = 10)), as.matrix(rep('b', each = 2))))

b <- 1:5
for (i in b) {
    draw <- sample(nrow(a), 1)  # pick one row index at random
    a <- a[-draw, ]             # minus that row
}
a

With this approach I may or may not get a 'b'. How can I make sure that a row from group b is always selected at least once?
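To see how often plain random sampling misses group 'b', here is a small calculation. It assumes the sample is 5 rows drawn without replacement from the 12 rows in the example (10 'a' rows, 2 'b' rows); the variable names are only for illustration.

```r
# Probability that a simple random sample of k rows contains no 'b' row,
# assuming 10 rows in group 'a' and 2 rows in group 'b' as in the example.
n_a <- 10  # rows in group 'a'
n_b <- 2   # rows in group 'b'
k   <- 5   # rows drawn without replacement

# Counting argument: all k rows must come from group 'a'
p_exact <- choose(n_a, k) / choose(n_a + n_b, k)

# Same quantity via the hypergeometric distribution: 0 of the n_b 'b' rows drawn
p_hyper <- dhyper(0, m = n_b, n = n_a, k = k)

print(p_exact)  # 252 / 792, roughly 0.318
```

So under these assumptions roughly one draw in three contains no 'b' row at all, which is why the loop above cannot be relied on.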


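Stratified sampling: sample from each group separately, choosing the size of each subsample according to some rule (for example, 90% from group a and 10% from group b). – lmo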
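A minimal base-R sketch of the stratified approach from the comment above. It assumes the group label sits in a column named `X4` and that you supply the number of rows to take from each group; `stratified_sample` is a hypothetical helper, not a base R function.

```r
# Data mirroring the question: three numeric columns plus a group column X4
df <- data.frame(matrix(1:36, ncol = 3),
                 X4 = c(rep("a", 10), rep("b", 2)))

# Hypothetical helper: `sizes` is a named vector giving rows to draw per group
stratified_sample <- function(data, group_col, sizes) {
  # Row indices split by group label, e.g. list(a = 1:10, b = 11:12)
  idx_by_group <- split(seq_len(nrow(data)), data[[group_col]])
  picked <- unlist(lapply(names(sizes), function(g) {
    idx <- idx_by_group[[g]]
    # Index via sample.int so a group with a single row is handled correctly
    idx[sample.int(length(idx), sizes[[g]])]
  }))
  data[picked, ]
}

set.seed(1)
s <- stratified_sample(df, "X4", c(a = 4, b = 1))
table(s$X4)
```

Because each group is sampled on its own, the 'b' count is fixed by `sizes` rather than left to chance.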


You can get stratified sampling from the strata function in the sampling package. – G5W

Answer


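This is a very basic solution and not very pretty, but I tried to stick to base functions. It returns a sample of size b from a that contains at least one row where a[,4] == "b".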

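Edit: updated to use only base functions, as required, and to work in both cases: when at least one "a" needs to be drawn and when at least one "b" needs to be drawn.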
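A somewhat more general sketch of the same idea, assuming the group label is in column `X4`: reserve one random row from every group first, then fill the remaining slots at random from the rows not yet chosen. `sample_covering_groups` is a hypothetical helper, not the answer's code, and it works for any number of groups as long as the sample size is at least the number of groups.

```r
df <- data.frame(matrix(1:36, ncol = 3),
                 X4 = c(rep("a", 10), rep("b", 2)))

# Hypothetical helper: draw `size` rows without replacement such that
# every group in `group_col` appears at least once.
sample_covering_groups <- function(data, group_col, size) {
  idx_by_group <- split(seq_len(nrow(data)), data[[group_col]])
  stopifnot(size >= length(idx_by_group), size <= nrow(data))
  # Step 1: reserve one random row index from each group
  reserved <- vapply(idx_by_group,
                     function(idx) idx[sample.int(length(idx), 1)],
                     integer(1))
  # Step 2: fill the remaining slots uniformly from rows not yet reserved
  rest  <- setdiff(seq_len(nrow(data)), reserved)
  extra <- rest[sample.int(length(rest), size - length(reserved))]
  data[c(reserved, extra), ]
}

set.seed(42)
s <- sample_covering_groups(df, "X4", 5)
table(s$X4)
```

Note that this does not give every valid subset exactly the same probability; if that matters, redrawing a plain sample until both groups appear (rejection sampling) does.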

a <- data.frame(matrix(1:36, ncol = 3), rbind(as.matrix(rep('a', each = 10)), as.matrix(rep('b', each = 2))))
names(a) <- c("X1", "X2", "X3", "X4")

b <- 5  # desired sample size

draw <- sample(1:nrow(a), b - 1, replace = FALSE)  # draw a sample of size b-1
a2 <- a[draw, ]   # store the drawn rows in a2
a3 <- a[-draw, ]  # store the remaining rows in a3
if (sum(a2[, 4] == "b") == 0) {
    # a2 has no "b" in column 4: append one random row name whose fourth column is "b"
    draw2 <- c(draw, sample(rownames(a[which(a$X4 == "b"), ]), 1, replace = FALSE))
} else if (sum(a2[, 4] == "a") == 0) {
    # a2 has no "a" in column 4: append one random row name whose fourth column is "a"
    draw2 <- c(draw, sample(rownames(a[which(a$X4 == "a"), ]), 1, replace = FALSE))
} else {
    # both groups already present: append one random row from those not yet drawn (a3)
    draw2 <- c(draw, sample(rownames(a3), 1, replace = FALSE))
}
a2 <- a[draw2, ]  # pick these rows
a2