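How to divide all columns by their column sums

I have a dataset to which I need to apply a simple normalization. What I want to do is calculate colSums(DF) and then divide all values within each column by that column's sum. Below is what I did. It seems to work, but I cannot tell whether the correct column sum was used for each column. My data frame looks like this: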
structure(list(`2E` = c(28L, 9736L, 20L, 221L, 349L, 21L), `2I` = c(42L,
8254L, 0L, 292L, 106L, 0L), `6E` = c(49L, 4303L, 0L, 1L, 258L,
0L), `6I` = c(0L, 3409L, 0L, 70L, 92L, 0L), `15E` = c(0L, 4178L,
0L, 121L, 106L, 12L), `15I` = c(0L, 3L, 0L, 0L, 0L, 0L), `16E` = c(25L,
9715L, 4L, 167L, 533L, 30L), `16I` = c(0L, 5082L, 12L, 112L,
35L, 0L), `18E` = c(0L, 7425L, 0L, 134L, 324L, 0L), `18I` = c(0L,
15822L, 0L, 565L, 78L, 0L), `20E` = c(0L, 69881L, 0L, 2240L,
3764L, 189L), `20I` = c(0L, 27718L, 0L, 837L, 312L, 239L), `21E` = c(0L,
8841L, 5L, 241L, 458L, 12L), `21I` = c(0L, 308L, 0L, 9L, 14L,
0L), `22E` = c(52L, 34347L, 0L, 523L, 1861L, 44L), `22I` = c(0L,
4202L, 0L, 152L, 58L, 0L), `23E` = c(0L, 3742L, 0L, 30L, 185L,
0L), `23I` = c(31L, 3766L, 0L, 108L, 38L, 12L), `25E` = c(0L,
3647L, 0L, 26L, 189L, 0L), `25I` = c(0L, 11243L, 0L, 903L, 85L,
168L), `26E` = c(0L, 8162L, 0L, 56L, 753L, 0L), `26I` = c(0L,
6325L, 3L, 229L, 85L, 0L), `27E` = c(22L, 7548L, 0L, 119L, 213L,
0L), `27I` = c(4L, 8949L, 0L, 1009L, 114L, 0L), `28E` = c(0L,
6103L, 0L, 100L, 319L, 68L), `28I` = c(0L, 13306L, 0L, 582L,
57L, 0L), `29E` = c(0L, 3608L, 9L, 54L, 142L, 27L), `29I` = c(0L,
5035L, 0L, 138L, 84L, 0L), `30E` = c(0L, 27795L, 0L, 593L, 1680L,
35L), `30I` = c(0L, 5506L, 0L, 146L, 75L, 0L), `32E` = c(13L,
12516L, 22L, 230L, 745L, 17L), `32I` = c(0L, 1271L, 0L, 29L,
13L, 0L), `33E` = c(0L, 3551L, 0L, 0L, 148L, 0L), `33I` = c(0L,
15957L, 0L, 550L, 1L, 0L), `34E` = c(0L, 1852L, 0L, 18L, 138L,
0L), `34I` = c(0L, 10469L, 0L, 243L, 119L, 0L), `35E` = c(0L,
9570L, 0L, 362L, 671L, 0L), `35I` = c(19L, 4953L, 0L, 25L, 32L,
23L), `36E` = c(0L, 2497L, 15L, 55L, 125L, 4L), `36I` = c(0L,
1839L, 11L, 39L, 0L, 0L), `38E` = c(0L, 940L, 0L, 38L, 50L, 0L
), `38I` = c(0L, 2301L, 0L, 60L, 14L, 8L), `39E` = c(0L, 5324L,
0L, 107L, 92L, 41L), `39I` = c(0L, 8360L, 0L, 262L, 13L, 0L),
`40E` = c(15L, 6107L, 10L, 183L, 173L, 13L), `40I` = c(8L,
1517L, 0L, 16L, 10L, 0L), `42E` = c(0L, 14681L, 35L, 312L,
282L, 54L), `42I` = c(0L, 7385L, 1L, 138L, 48L, 0L)), .Names = c("2E",
"2I", "6E", "6I", "15E", "15I", "16E", "16I", "18E", "18I", "20E",
"20I", "21E", "21I", "22E", "22I", "23E", "23I", "25E", "25I",
"26E", "26I", "27E", "27I", "28E", "28I", "29E", "29I", "30E",
"30I", "32E", "32I", "33E", "33I", "34E", "34I", "35E", "35I",
"36E", "36I", "38E", "38I", "39E", "39I", "40E", "40I", "42E",
"42I"), row.names = c("DQ459412", "DQ459413", "DQ459415", "DQ459418",
"DQ459419", "DQ459420"), class = "data.frame")
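So I have my data frame and I calculate its colSums. Then I simply do counts / colSums. Will all the values in colSums be used, or only the first one?
One more thing to know: each column sum should be matched to the column with the same colname in the counts data frame, so that a column is divided by its own sum.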
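One way to trace which divisor each cell receives is to try the division on a tiny data frame where the result is easy to check by hand. The sketch below uses a hypothetical `toy` data frame (not the data above) and compares plain division against `sweep()`, which takes the margin explicitly. It is an illustration under those assumptions, not a statement about what was actually run on the real data.

```r
# Hypothetical toy data, small enough to verify by hand
toy  <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
sums <- colSums(toy)   # one sum per column

# Plain division: the vector is recycled down the rows of each column,
# so a cell is not necessarily divided by its own column's sum
naive <- toy / sums

# sweep() with MARGIN = 2 divides every value in column j by sums[j]
by_col <- sweep(toy, 2, sums, "/")

# After a correct column-wise normalization every column sums to 1
print(colSums(by_col))

# Compare the two approaches; anything other than TRUE means they differ
print(all.equal(naive, by_col))
```

Checking `colSums()` of the normalized result is a cheap test that also works on the full data frame, as long as no column sums to zero.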
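Is this a duplicate of: http://stackoverflow.com/questions/20596433/how 20596490#20596490 – 2014-09-24 14:34:00

@G.Grothendieck I know this question has been asked before, but my point is that I want to understand how dividing countsDF by colSums alone works. It seems to work, but there is no way to trace what happens other than dividing manually and comparing the results. I just want to know whether there is a way to trace this. Maybe I need to rephrase my question. – 2014-09-24 14:38:34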