Select half of a data frame's rows for each value in one column

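I want to select half of the rows of a data frame for each value in one of its columns. In other words, from the data frame below I need to extract half of the rows for each value in column Y: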

DF: 
id1 column Y value 
9830  A   6 
7609  A   0 
9925  B   0 
9922  B   5 
9916  B   6 
9917  B   8 
9914  C   2 
9914  C   7 
9914  C   7 
9914  C   2 
9914  C   9

The new data frame should look like this:

NEW DF: 
    id1 column Y value 
    9830  A   6 
    9925  B   0 
    9922  B   5 
    9914  C   2 
    9914  C   7

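It would also be helpful to know a solution that selects a random half of the rows for each value in column Y of data frame DF (that is, not simply the first 50%).

Any help is appreciated. Thanks!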

Source

2016-10-01 Makaroni

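Assuming you want the first half of the rows in each group that shares the same value of column Y, rounding down for groups with an odd number of rows, we can use filter from dplyr: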

library(dplyr) 
df %>% group_by(`column Y`) %>% filter(row_number() <= floor(n()/2)) 
##Source: local data frame [5 x 3] 
##Groups: column Y [3] 
## 
## id1 column Y laclen 
## <int> <fctr> <int> 
##1 9830  A  6 
##2 9925  B  0 
##3 9922  B  5 
##4 9914  C  2 
##5 9914  C  7

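We first group_by `column Y` (note the backticks, needed because the column name contains a space), then use filter to keep only the rows whose row_number() is less than or equal to the group's total row count, given by n(), divided by 2 and rounded down with floor. For the example data this keeps 1 of 2 rows for A, 2 of 4 for B, and 2 of 5 for C.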
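For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The data frame is rebuilt from the table in the question; the column names (id1, column Y, value) and the object name first_half are assumptions for illustration, since the question does not show how DF was created.

```r
library(dplyr)

# Rebuild the example data. Column names are assumed from the question's table;
# check.names = FALSE keeps the space in "column Y".
df <- data.frame(
  id1 = c(9830, 7609, 9925, 9922, 9916, 9917, 9914, 9914, 9914, 9914, 9914),
  `column Y` = c("A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
  value = c(6, 0, 0, 5, 6, 8, 2, 7, 7, 2, 9),
  check.names = FALSE
)

# Keep the first floor(n / 2) rows of each group.
first_half <- df %>%
  group_by(`column Y`) %>%
  filter(row_number() <= floor(n() / 2)) %>%
  ungroup()

# Number of rows kept per group.
count(first_half, `column Y`)
```

Calling ungroup() at the end is optional; it just returns a plain (ungrouped) tibble so that later operations are not applied per group by accident.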

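To select a random 50% of the rows in each group instead, use sample to generate the row numbers to keep and %in% to match the rows against them: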

set.seed(123) 
result <- df %>% group_by(`column Y`) %>% filter(row_number() %in% sample(seq_len(n()),floor(n()/2))) 
##Source: local data frame [5 x 3] 
##Groups: column Y [3] 
## 
## id1 column Y laclen 
## <int> <fctr> <int> 
##1 9830  A  6 
##2 9922  B  5 
##3 9917  B  8 
##4 9914  C  2 
##5 9914  C  9
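As a possible alternative, and assuming dplyr 1.0.0 or later is installed, slice_sample can draw the random half per group directly. This is a sketch rather than part of the original answer; the column names and the object name random_half are assumptions, and the rows drawn depend on the seed and the dplyr version.

```r
library(dplyr)

# Example data rebuilt from the question; column names are assumed.
df <- data.frame(
  id1 = c(9830, 7609, 9925, 9922, 9916, 9917, 9914, 9914, 9914, 9914, 9914),
  `column Y` = c("A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
  value = c(6, 0, 0, 5, 6, 8, 2, 7, 7, 2, 9),
  check.names = FALSE
)

set.seed(123)  # make the random draw reproducible

# Sample half of each group without replacement; the per-group size is
# prop * n rounded down, so a group of 5 rows yields 2.
random_half <- df %>%
  group_by(`column Y`) %>%
  slice_sample(prop = 0.5) %>%
  ungroup()

# Number of rows kept per group.
count(random_half, `column Y`)
```

The filter approach with sample works on older dplyr versions as well, so it is the safer choice if the installed version is unknown.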

Source

2016-10-01 11:22:27 aichao

Great, thanks! Do you know how to select a random 50% of the rows rather than just the first 50%? – Makaroni

@Makaroni: Please see my edit. – aichao
