2017-03-16 42 views
1

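I am working with the Titanic dataset. In most rows the Cabin attribute is empty, so I want to replace every empty value in the Cabin column with NA. However, I cannot get NA into the empty entries of that column.

To do this, I wrote: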

train[train$Cabin==" "] <- "NA"

write.csv(train,file="editedtrain.csv") 

But the file editedtrain.csv does not have NA in the rows where Cabin is empty.

Below is the result of head(train) after running the code above.

            Ticket    Fare Cabin Embarked
1        A/5 21171  7.2500              S
2         PC 17599 71.2833   C85        C
3 STON/O2. 3101282  7.9250              S
4           113803 53.1000  C123        S
5           373450  8.0500              S
6           330877  8.4583              Q

And the dput output:

structure(
    list(
    PassengerId = 1:6, 
    Survived = c(0L, 1L, 1L, 1L,0L, 0L), 
    Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), 
    Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)", "Allen, Mr. William Henry", "Moran, Mr. James"), 
    Sex = c("male", "female", "female", "female", "male", "male"), 
    Age = c(22, 38, 26, 35, 35, NA), 
    SibSp = c(1L, 1L, 0L, 1L, 0L, 0L), 
    Parch = c(0L, 0L, 0L, 0L, 0L, 0L), 
    Ticket = c("A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450", "330877"), 
    Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583), 
    Cabin = c("", "C85", "", "C123", "", ""), 
    Embarked = c("S", "C", "S", "S", "S", "Q")), 
    .Names = c("PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"), 
    row.names = c(NA, 6L), class = "data.frame") 

How can I achieve what I want?

+2

You need 'train$Cabin[train$Cabin == ""] <- NA' – akrun

+0

No, it doesn't work. – a874

+1

You need to use 'dput' – akrun

Answer

1

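As you can see in the dput output, the missing values in train$Cabin are "" (empty strings), not " " (a single space).

So, to change them to NA, you must not put a space inside the quotes.

You just need to do this: train$Cabin[train$Cabin==""] <- NA

You need to specify that it is the Cabin column you want to change, and R recognizes NA only without quotes; "NA" in quotes is an ordinary character string.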
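A minimal sketch of the difference, using a standalone vector modelled on the Cabin values from the dput output (the name `cabin` is only for illustration, it is not part of the original data frame):

```r
# Cabin values as they appear in the dput output: empty strings, not spaces
cabin <- c("", "C85", "", "C123", "", "")

# Comparing against a single space matches none of the entries
sum(cabin == " ")

# Comparing against the empty string matches the blank entries
sum(cabin == "")

# Assign the real missing value NA (no quotes); "NA" in quotes would be
# an ordinary two-character string that is.na() does not detect
cabin[cabin == ""] <- NA
is.na(cabin)
```

The same indexing pattern applies to the column of a data frame, as in train$Cabin[train$Cabin == ""] <- NA.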


As Frank commented, if you simply read the .csv file with na.strings = "", this is done for you automatically. It would look like this:

train <- read.csv("YOUR_PATH\\train.csv", stringsAsFactors = FALSE, na.strings = "")
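To see the effect of na.strings without the real file, here is a small sketch that writes a few rows modelled on the dput output to a temporary file and reads them back. The file and the name `demo` are stand-ins for illustration, not the actual train.csv:

```r
# Write a tiny CSV to a temporary file (a stand-in for train.csv)
tmp <- tempfile(fileext = ".csv")
writeLines(c("PassengerId,Cabin,Embarked",
             "1,,S",
             "2,C85,C",
             "3,,S"), tmp)

# With na.strings = "", empty fields are read as NA instead of ""
demo <- read.csv(tmp, stringsAsFactors = FALSE, na.strings = "")
is.na(demo$Cabin)
```

Without na.strings = "", the blank Cabin fields would come in as empty strings, which is the situation described in the question.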

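Some tips:

  • When you call read.csv(), set stringsAsFactors = F if you want your character columns to stay as character rather than being converted to factors.

  • When you call write.csv(), set row.names = F if you don't want it to create a column of row IDs.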
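A short round-trip sketch of both tips, assuming a small made-up data frame (`train_demo`, `out` and `back` are hypothetical names used only here):

```r
# Small stand-in data frame with NA already in the Cabin column
train_demo <- data.frame(PassengerId = 1:3,
                         Cabin = c(NA, "C85", NA),
                         stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row labels out of the file
write.csv(train_demo, file = out, row.names = FALSE)

# The header line now holds only the real column names
readLines(out)[1]

# Reading the file back keeps Cabin as character, with NA restored
back <- read.csv(out, stringsAsFactors = FALSE)
str(back)
```

write.csv() writes missing values as the text NA by default, and read.csv() treats that text as missing by default, so the NA entries survive the round trip.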

+1

Thanks Frank, I updated the answer. My tips were just meant to provide some information I thought would be useful. – TheBiro