I want to split a vector of names (subsetting the vector by letter case):
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")
into two vectors
last <- c("DOE", "VAN DYKE", "SMITH")
and
first <- c("John", "Dick", "Mary Jane")
Any help would be greatly appreciated. Thanks.
This should work:
# Define a pattern that only matches words composed entirely of capital letters
pat <- paste("^[", paste(LETTERS, collapse=""), "]*$", sep="")
# [1] "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")
splitNames <- strsplit(names, " ")
# LAST NAMES: (Extract and paste together words matching 'pat')
sapply(splitNames,
       function(X) paste(grep(pat, X, value=TRUE), collapse=" "))
# [1] "DOE" "VAN DYKE" "SMITH"
# FIRST NAMES: (Extract and paste together words NOT matching 'pat')
sapply(splitNames,
       function(X) paste(grep(pat, X, value=TRUE, invert=TRUE), collapse=" "))
# [1] "John" "Dick" "Mary Jane"
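To match all uppercase letters, you could instead use the character class [:upper:], as in:
pat <- "^[[:upper:]]*$"
although the ?regexp documentation seems to mildly warn against doing so, on the grounds of reduced portability.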
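As a rough sketch, the two sapply() calls can be wrapped into one helper that returns both vectors at once. The function name split_by_case and its pat argument are made up for illustration, and the pattern uses + instead of * so that an empty string is never counted as an all-caps word; how [[:upper:]] treats non-ASCII letters depends on the locale.

```r
# Hypothetical helper (not from the answer above): split "LAST First"
# strings into last and first names by the case of each word.
split_by_case <- function(x, pat = "^[[:upper:]]+$") {
  words <- strsplit(x, " ", fixed = TRUE)
  # Paste together the words that match (or, with invert = TRUE, do not match) 'pat'
  pick <- function(w, invert) {
    paste(grep(pat, w, value = TRUE, invert = invert), collapse = " ")
  }
  list(
    last  = vapply(words, pick, character(1), invert = FALSE),
    first = vapply(words, pick, character(1), invert = TRUE)
  )
}

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")
res <- split_by_case(names)
res$last
# [1] "DOE"      "VAN DYKE" "SMITH"
res$first
# [1] "John"      "Dick"      "Mary Jane"
```

vapply() is used here rather than sapply() only so that the result is guaranteed to be a character vector with one element per name.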
Here's one way:
l <- strsplit(names, " ")
splitCaps <- function(x){
    ind <- x == toupper(x)
    list(upper = paste(x[ind], collapse = " "),
         lower = paste(x[!ind], collapse = " "))
}
> lapply(l,splitCaps)
[[1]]
[[1]]$upper
[1] "DOE"
[[1]]$lower
[1] "John"
[[2]]
[[2]]$upper
[1] "VAN DYKE"
[[2]]$lower
[1] "Dick"
[[3]]
[[3]]$upper
[1] "SMITH"
[[3]]$lower
[1] "Mary Jane"
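If you want the two plain vectors from the question rather than a nested list, one possible follow-up step is to pull each component out with vapply(). This is only a sketch; splitCaps is repeated so the snippet runs on its own, and the variable res is introduced here for illustration.

```r
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")

# Same helper as above, repeated so this snippet is self-contained
splitCaps <- function(x) {
  ind <- x == toupper(x)
  list(upper = paste(x[ind], collapse = " "),
       lower = paste(x[!ind], collapse = " "))
}

res <- lapply(strsplit(names, " "), splitCaps)

# Extract each component of the nested list into a character vector
last  <- vapply(res, function(r) r$upper, character(1))
first <- vapply(res, function(r) r$lower, character(1))

last
# [1] "DOE"      "VAN DYKE" "SMITH"
first
# [1] "John"      "Dick"      "Mary Jane"
```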
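Note, though, that the toupper approach comes with a large caveat: using toupper to pick out all-caps words will be quite unreliable once you start mixing in unusual character sets, locales, symbols, etc. For very simple ASCII-type cases, though, it should work fine.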
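To make the caveat concrete, here is a small illustration with made-up tokens. toupper() leaves digits and punctuation unchanged, so such tokens compare equal to their uppercased form and would be counted as "all caps". The stricter check below is an assumption of mine, not part of the answer: it additionally requires at least one letter.

```r
# Made-up tokens for illustration
tokens <- c("SMITH", "Mary", "42", "-", "O'NEIL")

# Naive check: digits and punctuation are unchanged by toupper(),
# so "42" and "-" are treated as uppercase words
naive <- tokens == toupper(tokens)
naive
# [1]  TRUE FALSE  TRUE  TRUE  TRUE

# Stricter check (assumption): also require at least one alphabetic character
strict <- naive & grepl("[[:alpha:]]", tokens)
strict
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

Note that "O'NEIL" passes both checks here, whereas a pattern such as "^[A-Z]*$" would reject it because of the apostrophe; which behaviour is preferable depends on the data.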
Can you share what you have tried so far, and why it didn't work the way you wanted? – joran 2012-01-03 19:31:51
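I tried strsplit(names, " ") to split along the spaces. The problem is that the number of words in the last and first names isn't constant. The one constant is that the last name is always all uppercase. – srmulcahy 2012-01-03 19:36:02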