2017-08-26

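I tried to look for a similar question but couldn't find one. If you know of one, please let me know! How do I replace NA with the equivalent value from elsewhere in the dataset?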

I've been working on a project looking at staple grains.

Here is a subset of my dataset:

                nutrient.component.      grain nutrients
1                Beta-carotene (μg) White Rice      0.00
2                Beta-carotene (μg) Brown Rice        NA
3                      Calcium (mg) White Rice     28.00
4                      Calcium (mg) Brown Rice     23.00
5                 Carbohydrates (g) White Rice     80.00
6                 Carbohydrates (g) Brown Rice     77.00
7                       Copper (mg) White Rice      0.22
8                       Copper (mg) Brown Rice        NA
9                       Energy (kJ) White Rice   1528.00
10                      Energy (kJ) Brown Rice   1549.00
11                          Fat (g) White Rice      0.66
12                          Fat (g) Brown Rice      2.92
13                        Fiber (g) White Rice      1.30
14                        Fiber (g) Brown Rice      3.50
15           Folate Total (B9) (μg) White Rice      8.00
16           Folate Total (B9) (μg) Brown Rice     20.00
17                        Iron (mg) White Rice      0.80
18                        Iron (mg) Brown Rice      1.47
19           Lutein+zeaxanthin (μg) White Rice      0.00
20           Lutein+zeaxanthin (μg) Brown Rice        NA
21                   Magnesium (mg) White Rice     25.00
22                   Magnesium (mg) Brown Rice    143.00
23                   Manganese (mg) White Rice      1.09
24                   Manganese (mg) Brown Rice      3.74
25  Monounsaturated fatty acids (g) White Rice      0.21
26  Monounsaturated fatty acids (g) Brown Rice      1.05
27                 Niacin (B3) (mg) White Rice      1.60
28                 Niacin (B3) (mg) Brown Rice      5.09
29       Pantothenic acid (B5) (mg) White Rice      1.01
30       Pantothenic acid (B5) (mg) Brown Rice      1.49
31                  Phosphorus (mg) White Rice    115.00
32                  Phosphorus (mg) Brown Rice    333.00
33  Polyunsaturated fatty acids (g) White Rice      0.18
34  Polyunsaturated fatty acids (g) Brown Rice      1.04
35                   Potassium (mg) White Rice    115.00
36                   Potassium (mg) Brown Rice    223.00
37                      Protein (g) White Rice      7.10
38                      Protein (g) Brown Rice      7.90
39              Riboflavin (B2)(mg) White Rice      0.05
40              Riboflavin (B2)(mg) Brown Rice      0.09
41        Saturated fatty acids (g) White Rice      0.18
42        Saturated fatty acids (g) Brown Rice      0.58
43                    Selenium (μg) White Rice     15.10
44                    Selenium (μg) Brown Rice        NA
45                      Sodium (mg) White Rice      5.00
46                      Sodium (mg) Brown Rice      7.00
47                        Sugar (g) White Rice      0.12
48                        Sugar (g) Brown Rice      0.85
49                 Thiamin (B1)(mg) White Rice      0.07
50                 Thiamin (B1)(mg) Brown Rice      0.40
51                   Vitamin A (IU) White Rice      0.00
52                   Vitamin A (IU) Brown Rice      0.00
53                  Vitamin B6 (mg) White Rice      0.16
54                  Vitamin B6 (mg) Brown Rice      0.51
55                   Vitamin C (mg) White Rice      0.00
56                   Vitamin C (mg) Brown Rice      0.00
57 Vitamin E, alpha-tocopherol (mg) White Rice      0.11
58 Vitamin E, alpha-tocopherol (mg) Brown Rice      0.59
59                  Vitamin K1 (μg) White Rice      0.10
60                  Vitamin K1 (μg) Brown Rice      1.90
61                        Water (g) White Rice     12.00
62                        Water (g) Brown Rice     10.00
63                        Zinc (mg) White Rice      1.09
64                        Zinc (mg) Brown Rice      2.02

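Brown rice has four NA values.
Based on this chart, I think it's fair to assume that the brown rice NA values are very close to their white rice equivalents. Using the white rice value would be more accurate than converting them to zero.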

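My question is: other than manually looking up the white rice values and typing them in for brown rice, how can I write code that replaces each brown rice NA with the white rice equivalent? For example, I'd like the copper NA to become the same value as white rice's copper (0.22). Would it be better to replace the NAs with zero first? But if I did that, brown rice would have six nutrients with a value of zero instead of four NAs. I'm trying to work out the right way to think about solving this in code. Any insight would be appreciated.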
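To make the goal concrete, here is a minimal base-R sketch (not part of the original post) that looks up each brown rice NA's white rice value by nutrient name with match(), so it does not depend on row order. The four-row data frame is a toy copy of the copper and selenium rows above, and the names white_value and to_fill are illustrative only.

```r
# Toy copy of the copper and selenium rows from the question.
dat <- data.frame(
  nutrient.component. = c("Copper (mg)", "Copper (mg)",
                          "Selenium (\u03bcg)", "Selenium (\u03bcg)"),
  grain = c("White Rice", "Brown Rice", "White Rice", "Brown Rice"),
  nutrients = c(0.22, NA, 15.10, NA),
  stringsAsFactors = FALSE
)

# White rice value for every row, looked up by nutrient name, not by position.
white <- dat[dat$grain == "White Rice", ]
white_value <- white$nutrients[match(dat$nutrient.component.,
                                     white$nutrient.component.)]

# Overwrite only brown rice NAs; genuine zeros elsewhere are left untouched.
to_fill <- dat$grain == "Brown Rice" & is.na(dat$nutrients)
dat$nutrients[to_fill] <- white_value[to_fill]
dat
```

Because only the NA cells are touched, existing zeros such as brown rice's Vitamin C stay as they are, which avoids the "six zeros instead of four NAs" problem.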

Thanks

Answers


Assuming the data frame holding your input data is called dt, we can use the fill function from the tidyr package for this task. dt2 is the final output.

library(tidyr)  # provides fill() and re-exports the %>% pipe

dt2 <- dt %>% fill(nutrients)  # each NA takes the last non-NA value above it

dt2 
                nutrient.component.      grain nutrients
1                Beta-carotene (μg) White Rice      0.00
2                Beta-carotene (μg) Brown Rice      0.00
3                      Calcium (mg) White Rice     28.00
4                      Calcium (mg) Brown Rice     23.00
5                 Carbohydrates (g) White Rice     80.00
6                 Carbohydrates (g) Brown Rice     77.00
7                       Copper (mg) White Rice      0.22
8                       Copper (mg) Brown Rice      0.22
...

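By default, fill() replaces each NA with the most recent non-NA value above it. So it is important to make sure each Brown Rice record sits on the row directly below its matching White Rice record.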
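The row-order caveat is easiest to see with a plain base-R sketch of "last observation carried forward", the behaviour described above. fill_down() is a hypothetical helper written for illustration, not tidyr's implementation; the numbers are the copper and selenium values from the question.

```r
# Hypothetical helper: carry the last non-NA value downwards
# (the same idea as fill()'s default direction, not tidyr's code).
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Expected layout: each Brown Rice row directly below its White Rice row.
ordered  <- c(0.22, NA, 15.10, NA)  # Copper W, Copper B, Selenium W, Selenium B
fill_down(ordered)                  # 0.22 0.22 15.10 15.10

# Shuffled rows: the Copper (Brown Rice) NA picks up selenium's 15.10.
shuffled <- c(15.10, NA, 0.22, NA)  # Selenium W, Copper B, Copper W, Selenium B
fill_down(shuffled)
```

With tidyr itself, grouping first (dplyr::group_by(nutrient.component.) before fill()) stops values from crossing into another nutrient, but the White Rice row still has to come first within each group.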


Thanks for the tip. This seems like an oddly specific function. I'm trying to imagine other situations where it would be useful. – RunAmuck


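@RunAmuck It's common to see people write a value in one row and leave the following cells blank when they share that value. I think fill() was designed to fill in those NA (blank) cells in that situation. – www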
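For the spreadsheet case described in this comment, a small base-R sketch: blanks are first turned into NA, then each position takes the most recent non-missing entry via a cumsum() index. The grain vector is made up for illustration, and the trick assumes the first entry is filled in.

```r
# Spreadsheet-style column: a value is written once, blanks mean "same as above".
grain <- c("White Rice", "", "", "Brown Rice", "", "")
grain[grain == ""] <- NA

# Position of the most recent non-missing entry for each row
# (assumes the first entry is not missing, otherwise the index would be 0).
last_seen <- cumsum(!is.na(grain))
grain_filled <- grain[!is.na(grain)][last_seen]
grain_filled
```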


I assume your dataset is a data.frame and that it is named dat.

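I believe the following code will do it. It splits the data frame into a list of one- or two-row data frames, one per nutrient (a nutrient with no Brown Rice row gives a one-row data frame). It then checks whether each piece has two rows and whether the Brown Rice nutrient value is NA; if so, it assigns the White Rice value. Finally, it binds the list back into a data.frame.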

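sp <- split(dat, dat$nutrient.component.)  # one small data frame per nutrient
res <- lapply(sp, function(x) {
  is_brown <- x$grain == "Brown Rice"
  # && skips the NA check when a nutrient has only one row
  if (nrow(x) == 2 && is.na(x$nutrients[is_brown]))
    x$nutrients[is_brown] <- x$nutrients[x$grain == "White Rice"]
  x
})

rm(sp) # tidy up

res <- do.call(rbind, res)
res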

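The zoo package has some useful functions for handling NAs. zoo::na.aggregate() replaces each NA with the mean of the non-missing values in the same group (its default FUN is mean); since each affected nutrient has only one other value, that is the White Rice value. This assumes the data is in DT and the grouping column is named nutrient.component (in the question it is nutrient.component., with a trailing dot).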

library(data.table) 
setDT(DT)[, nutrients := zoo::na.aggregate(nutrients), by = nutrient.component][] 
                  nutrient.component      grain nutrients
 1:               Beta-carotene (μg) White Rice      0.00
 2:               Beta-carotene (μg) Brown Rice      0.00
 3:                     Calcium (mg) White Rice     28.00
 4:                     Calcium (mg) Brown Rice     23.00
 5:                Carbohydrates (g) White Rice     80.00
 6:                Carbohydrates (g) Brown Rice     77.00
 7:                      Copper (mg) White Rice      0.22
 8:                      Copper (mg) Brown Rice      0.22
 9:                      Energy (kJ) White Rice   1528.00
10:                      Energy (kJ) Brown Rice   1549.00
11:                          Fat (g) White Rice      0.66
12:                          Fat (g) Brown Rice      2.92
13:                        Fiber (g) White Rice      1.30
14:                        Fiber (g) Brown Rice      3.50
15:           Folate Total (B9) (μg) White Rice      8.00
16:           Folate Total (B9) (μg) Brown Rice     20.00
17:                        Iron (mg) White Rice      0.80
18:                        Iron (mg) Brown Rice      1.47
19:           Lutein+zeaxanthin (μg) White Rice      0.00
20:           Lutein+zeaxanthin (μg) Brown Rice      0.00
...

Note rows 2, 8, and 20.

data.table is used here because it updates DT by reference, avoiding a copy of the whole table and so saving memory and time.
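For comparison, the same group-mean imputation can be sketched in base R with ave(), without zoo or data.table. The toy data and the helper impute_mean() are illustrative assumptions; unlike the data.table version, this reassigns the column rather than updating it by reference, and a nutrient whose values are all NA would end up with NaN.

```r
# Toy copy of a few rows from the question.
dat <- data.frame(
  nutrient.component. = c("Beta-carotene (\u03bcg)", "Beta-carotene (\u03bcg)",
                          "Copper (mg)", "Copper (mg)",
                          "Iron (mg)", "Iron (mg)"),
  grain = rep(c("White Rice", "Brown Rice"), times = 3),
  nutrients = c(0.00, NA, 0.22, NA, 0.80, 1.47),
  stringsAsFactors = FALSE
)

# Replace NAs within a group by the mean of that group's non-missing values,
# mirroring zoo::na.aggregate()'s default FUN = mean.
impute_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

dat$nutrients <- ave(dat$nutrients, dat$nutrient.component., FUN = impute_mean)
dat$nutrients  # 0.00 0.00 0.22 0.22 0.80 1.47
```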
