2017-08-12

Subsetting a data.frame gives some weird output in R

Here is the file I am using:

https://d37djvu3ytnwxt.cloudfront.net/assets/courseware/v1/ccdc87b80d92a9c24de2f04daec5bb58/asset-v1:[email protected]+block/WHO.csv

After reading it into R, the data has 194 observations of 13 variables.

> str(WHO) 
'data.frame': 194 obs. of 13 variables: 
$ Country      : Factor w/ 194 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ... 
$ Region      : Factor w/ 6 levels "Africa","Americas",..: 3 4 1 4 1 2 2 4 6 4 ... 
$ Population     : int 29825 3162 38482 78 20821 89 41087 2969 23050 8464 ... 
$ Under15      : num 47.4 21.3 27.4 15.2 47.6 ... 
$ Over60      : num 3.82 14.93 7.17 22.86 3.84 ... 
$ FertilityRate    : num 5.4 1.75 2.83 NA 6.1 2.12 2.2 1.74 1.89 1.44 ... 
$ LifeExpectancy    : int 60 74 73 82 51 75 76 71 82 81 ... 
$ ChildMortality    : num 98.5 16.7 20 3.2 163.5 ... 
$ CellularSubscribers   : num 54.3 96.4 99 75.5 48.4 ... 
$ LiteracyRate     : num NA NA NA NA 70.1 99 97.8 99.6 NA NA ... 
$ GNI       : num 1140 8820 8310 NA 5230 ... 
$ PrimarySchoolEnrollmentMale : num NA NA 98.2 78.4 93.1 91.1 NA NA 96.9 NA ... 
$ PrimarySchoolEnrollmentFemale: num NA NA 96.4 79.4 78.2 84.5 NA NA 97.5 NA ... 

But subsetting with df[, ] gives a different result from the subset() function, as in the example below.

> Outliers <- WHO[WHO$GNI > 10000 & WHO$FertilityRate > 2.5,] 
> nrow(Outliers) 
[1] 27 
Country    Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers 
NA     <NA>     <NA>   NA  NA  NA   NA    NA    NA     NA 
23    Botswana    Africa  2004 33.75 5.63   2.71    66   53.3    142.82 
NA.1    <NA>     <NA>   NA  NA  NA   NA    NA    NA     NA 
NA.2    <NA>     <NA>   NA  NA  NA   NA    NA    NA     NA 
(trimmed ...) 

There are many rows that consist entirely of NA.
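The all-NA rows come from the logical index itself: wherever the condition evaluates to NA (because GNI or FertilityRate is missing), `[` returns a row filled with NA instead of dropping it. Here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not taken from WHO.csv):

```r
# Hypothetical toy data, NOT the WHO file: a few rows with missing values.
toy <- data.frame(
  Country       = c("A", "B", "C", "D", "E"),
  GNI           = c(12000, NA, 15000, 8000, 5000),
  FertilityRate = c(3.0, 3.1, NA, 2.0, NA)
)

# NA & TRUE is NA, but FALSE & NA is FALSE, so only rows B and C give NA here.
cond <- toy$GNI > 10000 & toy$FertilityRate > 2.5
print(cond)

# `[` keeps one all-NA row for every NA in the logical index.
bracket_result <- toy[cond, ]
print(bracket_result)
print(nrow(bracket_result))
```

If the same thing is happening with the WHO data, the 27 rows above would be the rows where the condition is TRUE plus one NA row for each row where it is NA.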

Using the subset() function, however, yields the correct result.

> Outliers <- subset(WHO, GNI > 10000 & FertilityRate > 2.5) 
> nrow(Outliers) 
[1] 7 
> Outliers 
      Country    Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers 
23   Botswana    Africa  2004 33.75 5.63   2.71    66   53.3    142.82 
56 Equatorial Guinea    Africa  736 38.95 4.53   5.04    54   100.3    59.15 
63    Gabon    Africa  1633 38.49 7.38   4.18    62   62.0    117.32 
83    Israel    Europe  7644 27.53 15.15   2.92    82   4.2    121.66 
88   Kazakhstan    Europe  16271 25.46 10.04   2.52    67   18.7    155.74 
131   Panama    Americas  3802 28.65 10.13   2.52    77   18.5    188.60 
150  Saudi Arabia Eastern Mediterranean  28288 29.69 4.59   2.76    76   8.6    191.24 
(trimmed ...) 
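The difference is that subset() treats an NA in the condition as FALSE, so rows with missing values are dropped rather than turned into NA rows. A rough sketch of the same behaviour done by hand, again on a hypothetical `toy` data frame rather than the WHO file:

```r
# Hypothetical toy data, NOT the WHO file.
toy <- data.frame(
  GNI           = c(12000, NA, 15000, 8000),
  FertilityRate = c(3.0, 3.1, NA, 2.0)
)

cond <- toy$GNI > 10000 & toy$FertilityRate > 2.5

# Turn NA into FALSE before indexing: NA & FALSE evaluates to FALSE.
keep <- cond & !is.na(cond)

by_hand   <- toy[keep, ]
by_subset <- subset(toy, GNI > 10000 & FertilityRate > 2.5)

print(by_hand)
print(by_subset)
```

Both should keep only the row where the condition is actually TRUE.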
Hope this link helps: https://stackoverflow.com/questions/40446165/how-to-subset-data-in-r-without-losing-na-rows – Wen

Thanks, that is a clear answer.

Answer

How about making sure you get rid of the NAs first?

Outliers <- WHO[!is.na(WHO$GNI) & WHO$GNI > 10000 & 
!is.na(WHO$FertilityRate) & WHO$FertilityRate > 2.5,] 
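Another option, sketched here on a hypothetical `toy` data frame (not the WHO file), is to wrap the condition in which(), which returns only the positions where the condition is TRUE and so skips both FALSE and NA. Tabulating the condition with useNA = "ifany" is a quick way to see how many NAs are involved:

```r
# Hypothetical toy data, NOT the WHO file.
toy <- data.frame(
  Country       = c("A", "B", "C", "D"),
  GNI           = c(12000, NA, 15000, 8000),
  FertilityRate = c(3.0, 3.1, NA, 2.0)
)

cond <- toy$GNI > 10000 & toy$FertilityRate > 2.5

# Count FALSE / TRUE / NA in the condition before subsetting.
print(table(cond, useNA = "ifany"))

# which() returns the integer positions of TRUE only, so no NA rows appear.
via_which <- toy[which(cond), ]
print(via_which)
```

On the WHO data the same pattern would be WHO[which(WHO$GNI > 10000 & WHO$FertilityRate > 2.5), ], which avoids writing out each !is.na() check.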
Thanks, so when subsetting with [ you have to watch out for NA.