Subsetting a vector of strings (based on case)

I would like to split a vector of names:

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")

into two vectors

last <- c("DOE", "VAN DYKE", "SMITH")

and

first <- c("John", "Dick", "Mary Jane")

Any help would be greatly appreciated. Thanks.

Source

2012-01-03 srmulcahy

Can you share what you have tried so far, and why it didn't work the way you wanted? – joran 2012-01-03 19:31:51

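I tried strsplit(names, " ") to split on spaces. The problem is that the number of words in the last name and in the first name is not constant. The one constant is that the last name is always in all caps. – srmulcahy 2012-01-03 19:36:02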

This should work:

# Define a pattern that only matches words composed entirely of capital letters 
pat <- paste("^[", paste(LETTERS, collapse=""), "]*$", sep="") 
# [1] "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" 

names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane") 
splitNames <- strsplit(names, " ") 

# LAST NAMES: (Extract and paste together words matching 'pat') 
sapply(splitNames, 
     function(X) paste(grep(pat, X, value=TRUE), collapse=" ")) 
# [1] "DOE"  "VAN DYKE" "SMITH" 

# First Names: (Extract and paste together words NOT matching 'pat') 
sapply(splitNames, 
     function(X) paste(grep(pat, X, value=TRUE, invert=TRUE), collapse=" ")) 
# [1] "John"  "Dick"  "Mary Jane"

To match all uppercase letters, you could instead use the character class [:upper:], as in:

pat <- "^[[:upper:]]*$"

although the ?regexp documentation seems to mildly warn against doing so, on the grounds of reduced portability.
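As a minimal sketch of how the [:upper:] variant could be used to build the two vectors the question asks for (assuming the example `names` from above; the use of `vapply` and the result names `last` and `first` are choices made here for illustration, not part of the original answer):

```r
names <- c("DOE John", "VAN DYKE Dick", "SMITH Mary Jane")

# Pattern using the POSIX character class instead of a literal A-Z list
pat <- "^[[:upper:]]*$"
splitNames <- strsplit(names, " ")

# Last names: words made up entirely of uppercase letters
last <- vapply(splitNames,
               function(x) paste(grep(pat, x, value = TRUE), collapse = " "),
               character(1))

# First names: every remaining word
first <- vapply(splitNames,
                function(x) paste(grep(pat, x, value = TRUE, invert = TRUE),
                                  collapse = " "),
                character(1))
```

`vapply` with `character(1)` is used here only so that each result is guaranteed to be a plain character vector; `sapply` as in the answer gives the same values for this input.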

Source

2012-01-03 19:50:28

To get the first names, you can add invert = TRUE to the grep call. – Dason 2012-01-03 19:55:06

Thanks @Dason. Good reminder. That's much easier than repeatedly using '!grepl()'! – 2012-01-03 20:03:32

Very nice, thanks! – srmulcahy 2012-01-03 20:09:56

Here's one way:

l <- strsplit(names," ") 
splitCaps <- function(x){ 
    ind <- x == toupper(x) 
    list(upper = paste(x[ind],collapse = " "), 
     lower = paste(x[!ind],collapse = " ")) 
} 

> lapply(l,splitCaps) 
[[1]] 
[[1]]$upper 
[1] "DOE" 

[[1]]$lower 
[1] "John" 


[[2]] 
[[2]]$upper 
[1] "VAN DYKE" 

[[2]]$lower 
[1] "Dick" 


[[3]] 
[[3]]$upper 
[1] "SMITH" 

[[3]]$lower 
[1] "Mary Jane"

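Note, though, that this comes with a big caveat: using toupper to pick out all-caps words will be very unreliable once you start mixing in unusual character sets, locales, symbols, and so on. For very simple ASCII-type cases, however, it should work fine.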
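To make the caveat concrete, here is a small sketch comparing the toupper test with a letters-only regular expression on a few ASCII tokens. The token values are made up for illustration; behaviour with non-ASCII characters depends on the locale and is not shown.

```r
# Hypothetical tokens: a surname, a first name, and three edge cases
tokens <- c("DOE", "John", "O'BRIEN", "3RD", "-")

# toupper test: anything that toupper leaves unchanged counts as "upper",
# which includes digits and punctuation
by_toupper <- tokens == toupper(tokens)

# Regex test: only tokens consisting solely of uppercase letters match
by_regex <- grepl("^[[:upper:]]*$", tokens)

data.frame(tokens, by_toupper, by_regex)
```

The two tests agree on "DOE" and "John" but disagree on the other three tokens: the toupper comparison accepts them, while the regex rejects them. Which behaviour is wanted depends on the data, for example whether a surname such as "O'BRIEN" should count as all caps.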

Source

2012-01-03 19:55:07 joran
