2016-04-27 62 views
0

R string parsing challenge

I am working with a column that contains strings like the following:

 Col1 
     ------------------------------------------------------------------ 
     Department of Mechanical Engineering, Department of Computer Science 
     Division of Advanced Machining, Center for Mining and Metallurgy 
     Department of Aerospace, Center for Science and Delivery 

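What I am trying to do is split out each substring that starts with the word Department, Division, or Center and runs up to the comma (,), so that each one becomes its own column. The final output should look like this: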

 Dept_Mechanical_Eng Dept_Computer_Science Div_Adv_Machining Cntr_Mining_Metallurgy Dept_Aerospace Cntr_Science_Delivery
                   1                     1                 0                      0              0                     0
                   0                     0                 1                      1              0                     0
                   0                     0                 0                      0              1                     1

I abbreviated the actual names in the expected output for readability. Any help parsing this string is much appreciated.

+4

'library(splitstackshape); cSplit_e(mydf, "Col1", ",", type = "character", drop = TRUE, fill = 0)'. You could also look at 'strsplit' + 'mtabulate' from "qdapTools". – A5C1D2H2I1M1N2O1R2T1
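For readers without those packages, the same split-then-tabulate idea can be sketched in base R. This is only an illustration, not the output of cSplit_e or mtabulate: the data frame mydf with column Col1 is assumed to match the sample in the question, and the names parts, long and indicator are made up for the example.

```r
# Hypothetical data frame mirroring the question's Col1 (assumed shape)
mydf <- data.frame(
  Col1 = c(
    "Department of Mechanical Engineering, Department of Computer Science",
    "Division of Advanced Machining, Center for Mining and Metallurgy",
    "Department of Aerospace, Center for Science and Delivery"
  ),
  stringsAsFactors = FALSE
)

# Split each row on commas and trim the surrounding whitespace
parts <- lapply(strsplit(mydf$Col1, ",", fixed = TRUE), trimws)

# Long form: one (row, unit) pair per split piece
long <- data.frame(
  row  = rep(seq_along(parts), lengths(parts)),
  unit = unlist(parts)
)

# Cross-tabulate rows against unit names; cap counts at 1 so a repeated
# name within a row still reads as simple presence
indicator <- unclass(table(long$row, long$unit))
indicator[indicator > 1L] <- 1L
indicator
```

The result is a 3 x 6 matrix of 0/1 flags with the unit names as column names, which can be passed to as.data.frame() if a data frame is preferred.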

Answer

0

This is very similar to a question I just answered with another text tabulation example. Are you in the same class as that asker? See: Count the number of times (frequency) a string occurs

inp <- "Department of Mechanical Engineering, Department of Computer Science 
     Division of Advanced Machining, Center for Mining and Metallurgy 
     Department of Aerospace, Center for Science and Delivery" 
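# Split on commas and line breaks to get the six names; the factor levels are the unique names
inp2 <- factor(scan(text = inp, what = "", sep = ","))
#Read 6 items
# One element per input row
inp3 <- readLines(textConnection(inp))

# For each unique name, flag the rows that contain it (1/0); name the columns after the trimmed names
as.data.frame(setNames(lapply(levels(inp2), function(ll) as.numeric(grepl(ll, inp3, fixed = TRUE))), trimws(levels(inp2))))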
  Department.of.Aerospace Division.of.Advanced.Machining
1                       0                              0
2                       0                              1
3                       1                              0
  Center.for.Mining.and.Metallurgy Center.for.Science.and.Delivery
1                                0                               0
2                                1                               0
3                                0                               1
  Department.of.Computer.Science Department.of.Mechanical.Engineering
1                              1                                    1
2                              0                                    0
3                              0                                    0
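One caveat with matching via grepl(): it tests for substrings, so a name that is contained in a longer name would be flagged for both. The sketch below contrasts that with exact matching on the split pieces. The two rows are hypothetical (the second name is not in the question's data), and rows, parts, units, by_substring and exact are illustrative names only.

```r
# Hypothetical rows; the name in row 2 contains the first name of row 1
rows <- c(
  "Department of Aerospace, Center for Science and Delivery",
  "Department of Aerospace Engineering"
)

# Split each row on commas and trim whitespace
parts <- lapply(strsplit(rows, ",", fixed = TRUE), trimws)
units <- sort(unique(unlist(parts)))

# Substring matching, as grepl() does: row 2 is also flagged
by_substring <- as.numeric(grepl("Department of Aerospace", rows, fixed = TRUE))

# Exact matching: a row is flagged only if one of its pieces equals the name
exact <- as.data.frame(setNames(
  lapply(units, function(u) {
    as.numeric(vapply(parts, function(p) u %in% p, logical(1)))
  }),
  units
))
exact
```

For the data in the question the two approaches agree, since no name there is a prefix of another; the exact variant only matters if the real column has such overlaps.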
+0

Ah :) Thanks 42, that worked. –