Extract unique partial elements from a vector

I need to list the unique subject IDs (the part after the / and before the _) from the folder contents below.

[1] "."      "./4101_0"    "./4101_0/4101 Baseline" 
[4] "./4101_1"    "./4101_2"    "./4101_2_2"    
[7] "./4101_3"    "./4101_4"    "./4101_5"    
[10] "./4101_6"

Right now I'm doing it like this (using the packages stringr and foreach).

library(stringr)
library(foreach)

# Create list of contents
Folder.list <- list.dirs()
# Split entries by the "/"
SubIDs <- str_split(Folder.list, "/")
# For each entry in the list, retrieve the second element
SubIDs <- unlist(foreach(i=1:length(SubIDs)) %do% SubIDs[[i]][2])
# Split entries by the "_"
SubIDs <- str_split(SubIDs, "_")
# Take the first element after splitting, unlist it, find the unique entries, remove the NA and coerce to numeric
SubIDs <- as.numeric(na.omit(unique(unlist(foreach(i=1:length(SubIDs)) %do% SubIDs[[i]][1]))))

This works, but it seems unnecessarily horrible. What's a cleaner way to get from point A to point B?

2014-09-10 Krysta

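stringr also has the str_extract function, which can be used to extract substrings that match a regular expression. With a positive lookbehind for / and a positive lookahead for _, you can achieve your goal.

Starting with @Andrie's x: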

str_extract(x, perl('(?<=/)\\d+(?=_)')) 

# [1] NA  "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"

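The pattern above matches one or more digits (i.e. \\d+) that are preceded by a slash and followed by an underscore. Lookarounds require wrapping the pattern in perl().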
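For comparison, the same lookaround pattern can be used without stringr. This is only a sketch in base R, not part of the original answer: regexpr() with perl = TRUE locates the first match in each element, and regmatches() returns just the matched pieces, so elements with no match (such as ".") are dropped instead of becoming NA. The names m, ids and sub_ids are made up for illustration. As far as I know, newer stringr releases (1.0 and later) accept the pattern directly, without the perl() wrapper.

```r
x <- c(".", "./4101_0", "./4101_0/4101 Baseline", "./4101_1", "./4101_2",
       "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")

# Locate the first run of digits preceded by "/" and followed by "_";
# perl = TRUE is needed for lookbehind/lookahead support
m <- regexpr("(?<=/)\\d+(?=_)", x, perl = TRUE)

# Keep only the matched substrings (non-matching elements are dropped)
ids <- regmatches(x, m)

# De-duplicate and coerce to numeric
sub_ids <- as.numeric(unique(ids))
sub_ids
```

Because non-matching elements are dropped, no na.omit() step is needed before unique().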

2014-09-10 13:58:59 jbaums

Use a regular expression. Given

x <- c(".", "./4101_0", "./4101_0/4101 Baseline", "./4101_1", "./4101_2", "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")

one approach is to use gsub() to extract the subject code:

gsub(".*/(\\d+)_.*", "\\1", x) 
[1] "." "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"

2014-09-10 13:35:15 Andrie

@Krysta: Note, though, that if the pattern isn't found in a particular element, that element's original string is returned unmodified (as with '.'). – jbaums 2014-09-10 14:03:32
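One way to deal with that caveat, sketched here under the assumption that unmatched entries should simply be discarded, is to filter with grepl() before substituting. The variable names pattern, matched, ids and sub_ids are illustrative and do not come from the answers above.

```r
x <- c(".", "./4101_0", "./4101_0/4101 Baseline", "./4101_1", "./4101_2",
       "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")

pattern <- ".*/(\\d+)_.*"

# TRUE only for elements that actually contain "/<digits>_"
matched <- grepl(pattern, x)

# Substitute only within the matching elements, so "." never leaks through
ids <- sub(pattern, "\\1", x[matched])

# De-duplicate and coerce to numeric
sub_ids <- as.numeric(unique(ids))
sub_ids
```

Filtering first avoids the warning and the NA that as.numeric() would otherwise produce for ".".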
