2012-01-03 57 views
1

Subset a vector by letter case: I would like to split a vector of names

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane") 

into two vectors:

last <- c("DOE", "VAN DYKE", "SMITH") 

first <- c("John", "Dick", "Mary Jane") 

Any help would be greatly appreciated. Thanks.

+0

Could you share what you have tried so far, and why it did not work the way you wanted? – joran 2012-01-03 19:31:51

+0

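I tried strsplit(names, " ") to split on spaces. The problem is that the number of words in the last name and in the first name is not constant. The one constant is that the last name is always entirely in capitals. – srmulcahy 2012-01-03 19:36:02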

Answers

2

This should work:

# Define a pattern that only matches words composed entirely of capital letters 
pat <- paste("^[", paste(LETTERS, collapse=""), "]*$", sep="") 
# [1] "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" 

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane") 
splitNames <- strsplit(names, " ") 

# LAST NAMES: (Extract and paste together words matching 'pat') 
sapply(splitNames, 
     function(X) paste(grep(pat, X, value=TRUE), collapse=" ")) 
# [1] "DOE"  "VAN DYKE" "SMITH" 

# First Names: (Extract and paste together words NOT matching 'pat') 
sapply(splitNames, 
     function(X) paste(grep(pat, X, value=TRUE, invert=TRUE), collapse=" ")) 
# [1] "John"  "Dick"  "Mary Jane" 

To match all uppercase letters, you could alternatively use the character class [:upper:], as in:

pat <- "^[[:upper:]]*$" 

although the ?regexp documentation seems to mildly warn against doing so, on the grounds of reduced portability.
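Since the question asks for two vectors, the two sapply calls above can be wrapped into one helper. The following is only a sketch: the name `split_names` is hypothetical, and it assumes single-space separators and ASCII names. It uses `+` instead of `*` in the pattern so that empty strings are not counted as all-caps words.

```r
# Hypothetical helper, not part of the original answer.
# Assumes words are separated by single spaces and names are plain ASCII.
split_names <- function(x, pat = "^[[:upper:]]+$") {
  parts <- strsplit(x, " ", fixed = TRUE)

  # Paste together the words that match (or, with invert = TRUE, do not match) 'pat'
  pick <- function(words, invert) {
    paste(grep(pat, words, value = TRUE, invert = invert), collapse = " ")
  }

  list(
    last  = vapply(parts, pick, character(1), invert = FALSE),
    first = vapply(parts, pick, character(1), invert = TRUE)
  )
}

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")
res <- split_names(names)
res$last
# [1] "DOE"      "VAN DYKE" "SMITH"
res$first
# [1] "John"      "Dick"      "Mary Jane"
```

Using vapply rather than sapply guarantees a character vector even when the input is empty.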

+1

To get the first names, you can add invert = TRUE to the grep call. – Dason 2012-01-03 19:55:06

+0

Thanks @Dason. Good reminder. That is much easier than repeatedly using '!grepl()'! – 2012-01-03 20:03:32

+0

Very nice, thank you! – srmulcahy 2012-01-03 20:09:56

1

Here is one way:

l <- strsplit(names," ") 
splitCaps <- function(x){ 
    ind <- x == toupper(x) 
    list(upper = paste(x[ind], collapse = " "),
         lower = paste(x[!ind], collapse = " "))
} 

> lapply(l,splitCaps) 
[[1]] 
[[1]]$upper 
[1] "DOE" 

[[1]]$lower 
[1] "John" 


[[2]] 
[[2]]$upper 
[1] "VAN DYKE" 

[[2]]$lower 
[1] "Dick" 


[[3]] 
[[3]]$upper 
[1] "SMITH" 

[[3]]$lower 
[1] "Mary Jane" 

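Note, however, that this comes with a big caveat: using toupper to pick out all-caps words will be very unreliable once you start mixing in unusual character sets, locales, symbols and so on. For very simple ASCII-type cases, though, it should work fine.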
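One concrete instance of that caveat: toupper leaves digits and punctuation unchanged, so a token with no letters at all compares equal to its uppercase form. A minimal sketch of the problem and one possible stricter check follows; the helper name `is_all_caps` is hypothetical, and the example sticks to ASCII input.

```r
words <- c("O'NEIL", "42", "-", "John")

# The toupper comparison also flags tokens that contain no letters
words == toupper(words)
# [1]  TRUE  TRUE  TRUE FALSE

# Hypothetical stricter check: at least one uppercase letter, no lowercase letters
is_all_caps <- function(w) grepl("[[:upper:]]", w) & !grepl("[[:lower:]]", w)
is_all_caps(words)
# [1]  TRUE FALSE FALSE FALSE
```

Whether a token such as "42" belongs with the last name or the first name depends on the data, so this is a design choice rather than a fix.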