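This ended up being a lot of steps. You can probably do it in fewer, but this is how I did it. I'm also assuming your data starts out in a data frame with one address per row.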
dat = data.frame(Addresses = c("1626 Aviation Way, Albuquerque, NM 30906, USA",
"1626 Aviation Way, Augusta, GA 30906, USA",
"325 Main St, Stratford, CT 06615, USA",
"4205 Bessie Coleman Blvd, Tampa, FL 33607, USA"), stringsAsFactors = FALSE)
> dat
Addresses
1 1626 Aviation Way, Albuquerque, NM 30906, USA
2 1626 Aviation Way, Augusta, GA 30906, USA
3 325 Main St, Stratford, CT 06615, USA
4 4205 Bessie Coleman Blvd, Tampa, FL 33607, USA
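Now we need to split on the commas to start, and later separate the state and zip. I'll also trim the extra whitespace left over from splitting on the commas.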
dat2 = sapply(dat$Addresses, strsplit, ",")
dat2 = lapply(dat2, trimws)
> dat2
$`1626 Aviation Way, Albuquerque, NM 30906, USA`
[1] "1626 Aviation Way" "Albuquerque" "NM 30906" "USA"
$`1626 Aviation Way, Augusta, GA 30906, USA`
[1] "1626 Aviation Way" "Augusta" "GA 30906" "USA"
$`325 Main St, Stratford, CT 06615, USA`
[1] "325 Main St" "Stratford" "CT 06615" "USA"
$`4205 Bessie Coleman Blvd, Tampa, FL 33607, USA`
[1] "4205 Bessie Coleman Blvd" "Tampa" "FL 33607" "USA"
Now we need to put it back into a data frame.
dat2 = data.frame(matrix(unlist(dat2), ncol = 4, byrow = TRUE), stringsAsFactors = FALSE)
> dat2
X1 X2 X3 X4
1 1626 Aviation Way Albuquerque NM 30906 USA
2 1626 Aviation Way Augusta GA 30906 USA
3 325 Main St Stratford CT 06615 USA
4 4205 Bessie Coleman Blvd Tampa FL 33607 USA
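Note that `ncol = 4` assumes every address splits into exactly four comma-separated pieces. Here is a minimal sketch of how one might check that before building the matrix, using base R's `lengths()`. The names `addresses`, `parts`, `n_fields` and `bad`, and the second address with a suite number, are made up for illustration and are not part of the data above.

```r
# Hypothetical input: the second address has an extra comma (illustration only)
addresses <- c("325 Main St, Stratford, CT 06615, USA",
               "4205 Bessie Coleman Blvd, Suite 2, Tampa, FL 33607, USA")

# Split on commas and trim whitespace, same idea as the steps above
parts <- lapply(strsplit(addresses, ",", fixed = TRUE), trimws)

# Count the pieces per address; anything other than 4 would misalign the matrix
n_fields <- lengths(parts)
bad <- which(n_fields != 4)

# Report the rows that need attention before calling matrix(..., ncol = 4)
if (length(bad) > 0) {
  message("Rows with an unexpected number of fields: ", paste(bad, collapse = ", "))
}
```

If `bad` is non-empty, those rows probably need to be cleaned up or handled separately before the reshaping step.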
Next, we can split X3 into state and zip, and then drop that column.
dat2$State = sapply(dat2$X3, function(x) strsplit(x, " ")[[1]][1])
dat2$Zip = sapply(dat2$X3, function(x) strsplit(x, " ")[[1]][2])
dat2 = dat2[, -3]
> dat2
X1 X2 X4 State Zip
1 1626 Aviation Way Albuquerque USA NM 30906
2 1626 Aviation Way Augusta USA GA 30906
3 325 Main St Stratford USA CT 06615
4 4205 Bessie Coleman Blvd Tampa USA FL 33607
Finally, we can set the column names and we're done.
colnames(dat2) = c("Street", "City", "Country", "State", "Zip")
> dat2
Street City Country State Zip
1 1626 Aviation Way Albuquerque USA NM 30906
2 1626 Aviation Way Augusta USA GA 30906
3 325 Main St Stratford USA CT 06615
4 4205 Bessie Coleman Blvd Tampa USA FL 33607
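As mentioned at the top, this can likely be done in fewer steps. A rough sketch of one shorter base R version follows, under the same assumption that every address has exactly four comma-separated parts and that the third part is a state and zip separated by a single space. The names `parts`, `state_zip` and `out` are just for illustration.

```r
dat <- data.frame(Addresses = c("325 Main St, Stratford, CT 06615, USA",
                                "4205 Bessie Coleman Blvd, Tampa, FL 33607, USA"),
                  stringsAsFactors = FALSE)

# Split on a comma plus any following whitespace, then stack the pieces row-wise
parts <- do.call(rbind, strsplit(dat$Addresses, ",\\s*"))

# The third piece looks like "CT 06615": split it on the single space
state_zip <- do.call(rbind, strsplit(parts[, 3], " ", fixed = TRUE))

# Assemble the final data frame with the same column names as above
out <- data.frame(Street = parts[, 1], City = parts[, 2], Country = parts[, 4],
                  State = state_zip[, 1], Zip = state_zip[, 2],
                  stringsAsFactors = FALSE)
```

Zip stays a character column here, so leading zeros such as in "06615" are kept.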
Take a look at 'strsplit' or 'regexpr'. – ekstroem
Or, if you're working with a data frame, you can use the 'separate()' function from 'tidyr'. –
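I tried doing this<-strsplit($ Adress, ","). I did not get the right answer. Below is the error that occurs when I try to write it into a data frame: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 4, 5 – Kaushik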