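Insert special characters based on existing words in R
I have been looking for an intuitive solution to my problem. I have a huge list of words into which I have to insert a special character based on some conditions. Specifically, if a two- or three-letter word appears in a cell, I want to add "+" to its left and right.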
Example:
global b2b banking
would be transformed into global +b2b+ banking
how to finance commercial ale estate
would be transformed into how +to+ finance commercial +ale+ estate
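For reference, the transformation described above could be sketched in base R roughly as follows. This is only a sketch under assumptions: the helper name `wrap_short_words` is hypothetical, and a "word" is taken to be a run of letters or digits delimited by word boundaries.

```r
# Hypothetical helper: wrap every 2- or 3-character alphanumeric word in "+".
wrap_short_words <- function(x) {
  # \\b marks a word boundary; \\1 re-inserts the captured word.
  gsub("\\b([[:alnum:]]{2,3})\\b", "+\\1+", x, perl = TRUE)
}

examples <- c("global b2b banking",
              "how to finance commercial ale estate")
wrap_short_words(examples)
```

Note that with this rule "how" is also wrapped, since it has three letters, so the second string comes out as "+how+ +to+ finance commercial +ale+ estate" rather than exactly the hand-written example.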
Here is a sample dataset:
sample <- c("commercial funding",
"global b2b banking",
"how to finance commercial ale estate",
"opening a commercial account",
"international currency account",
"miami imports banking",
"hsbc supply chain financing",
"international business expansion",
"grow business in Us banking",
"commercial trade Asia Pacific",
"business line of credits hsbc",
"Britain commercial banking",
"fx settlement hsbc",
"W Hotels")
data <- data.frame(sample)
Also, is it possible to remove rows that contain a word of length 1? Example:
W Hotels
For the single-letter words, I tried removing them with gsub:
gsub(" *\\b[[:alpha:]]{1,1}\\b *", " ", sample)
Rows like this should be removed from the dataset entirely.
Any help is highly appreciated.
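One point worth spelling out: `gsub` only deletes the single letter inside each string and leaves the row in place, whereas dropping the whole row calls for logical subsetting. A minimal sketch, assuming a "single-letter word" means one alphabetic character standing between word boundaries (the vector `phrases` and the other variable names are made up for illustration):

```r
# Illustrative input; not the full dataset from the question.
phrases <- c("opening a commercial account",
             "W Hotels",
             "fx settlement hsbc")

# TRUE where a phrase contains a stand-alone single letter.
has_single_letter <- grepl("\\b[[:alpha:]]\\b", phrases, perl = TRUE)

# Keep only the phrases without such a word.
kept <- phrases[!has_single_letter]
kept
```

Here both "opening a commercial account" (because of "a") and "W Hotels" are dropped, and only "fx settlement hsbc" remains.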
Edit 1
Thanks for your help. I have added a few more lines:
sample <- c("commercial funding", "global b2b banking", "how to finance commercial ale estate", "opening a commercial account","international currency account","miami imports banking","hsbc supply chain financing","international business expansion","grow business in Us banking", "commercial trade Asia Pacific","business line of credits hsbc","Britain commercial banking","fx settlement hsbc", "W Hotels")
sample <- sample[!grepl("\\b[[:alpha:]]\\b",sample)]
sample <- gsub("\\b([[:alpha:][:digit:]]{2,3})\\b", "+\\1+", sample)
sample <- gsub(" ",",",sample)
sample <- gsub("+,","+",sample)
sample <- gsub(",+","+",sample)
sample <- tolower(sample)
sample <- ifelse(substr(sample, 1, 1) == "+", sub("^.", "", sample), sample)
data <- data.frame(sample)
data
sample
1 commercial++funding
2 global+++b2b+++banking
3 how++++to+++finance++commercial+++ale+++estate
4 international++currency++account
5 miami++imports++banking
6 hsbc++supply++chain++financing
7 international++business++expansion
8 grow++business+++in++++us+++banking
9 commercial++trade++asia++pacific
10 business++line+++of+++credits++hsbc
11 britain++commercial++banking
12 fx+++settlement++hsbc
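Somehow I am not able to replace "+," with "+" using gsub. What am I doing wrong? So "fx+,settlement,hsbc"
should become "fx+settlement,hsbc"
, but instead the commas get replaced as well and I end up with ++.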
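A likely explanation is that `gsub` treats its pattern as a regular expression by default, and in a regular expression "+" is a quantifier, so ",+" means "one or more commas" rather than a comma followed by a plus sign. A small sketch of two ways to match the characters literally, using the string from the question (the variable names are made up for illustration):

```r
x <- "fx+,settlement,hsbc"

# fixed = TRUE treats the pattern as a literal string, so "+" is not a quantifier.
literal <- gsub("+,", "+", x, fixed = TRUE)

# Alternatively, escape the "+" so the regex engine matches it literally.
escaped <- gsub("\\+,", "+", x)

literal
escaped
```

Both calls should give "fx+settlement,hsbc". The mirrored case, `gsub(",+", "+", x, fixed = TRUE)`, works the same way for a comma followed by a plus sign.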
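So, you mean you want to remove any item that contains a whole word consisting of only one letter? –
Yes. If any row, even one with multiple words, contains a word of length one, I want to remove that row. Then, in the remaining rows, I want to add the special character "+" before and after every two-letter and three-letter word. – PSraj
OK, so what have you tried? –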