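R: How to replace dots between two characters in a string

I am working with a large amount of old text material. The OCR process often inserts a "." between characters, for example "t.h.i.s i.s a test". I want to replace these dots with nothing (""), but I don't want to get rid of the dots that mark the end of a sentence. So I am looking for a regular expression that finds letter/dot/letter and replaces the dot with nothing.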

test <- "t.h.i.s i.s a test."
gsub(test, pattern="\\w[[:punct:]]\\w", replacement="")

However, this is the result:

". a test."

Any suggestions are appreciated.
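
A note on why the attempt above misbehaves: the pattern \w[[:punct:]]\w matches the letters on both sides of the dot, so gsub deletes them together with the dot. Below is a minimal sketch of one possible fix, assuming PCRE lookarounds (perl = TRUE) are acceptable. The variable names fixed and decimal and the second input string are made up for illustration, and restricting the match to letters is only one way to keep decimals intact.

```r
test <- "t.h.i.s i.s a test."

# Lookbehind/lookahead assert that a letter sits on each side of the dot
# without consuming it, so only the dot itself is removed.
fixed <- gsub("(?<=[[:alpha:]])\\.(?=[[:alpha:]])", "", test, perl = TRUE)
print(fixed)
# [1] "this is a test."

# Digits are not letters, so a decimal such as 5.6 is left alone
# (illustrative input, not from the original question).
decimal <- gsub("(?<=[[:alpha:]])\\.(?=[[:alpha:]])", "", "a.b.o.u.t 5.6 km.", perl = TRUE)
print(decimal)
# [1] "about 5.6 km."
```

Abbreviations such as "U.S." would still lose their inner dot with this pattern, so it is a starting point rather than a complete solution.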

Source

2016-04-15 user3604060

This approach is not good: what happens if there is a '5.6'?

From [here](http://stackoverflow.com/questions/8747671/regex-remove-all-matches-leaving-the-last): 'gsub("[\\.](?!\\d*$)", "", test, perl = TRUE)' works. Could someone familiar with regular expressions explain why? (I can't)
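
One way to read the pattern from the comment above, as a hedged sketch: "[\\.]" matches a literal dot, and "(?!\\d*$)" is a negative lookahead that rejects a dot when only digits (possibly none) remain before the end of the string. So every dot is removed except the very last one. The variable names and the second input string are illustrative assumptions.

```r
test <- "t.h.i.s i.s a test."

# Remove each dot unless it is followed by zero or more digits and then
# the end of the string; in practice that spares only the final dot.
single <- gsub("[\\.](?!\\d*$)", "", test, perl = TRUE)
print(single)
# [1] "this is a test."

# With two sentences, the first sentence-ending dot is removed as well,
# because more than digits follows it (illustrative input).
multi <- gsub("[\\.](?!\\d*$)", "", "a.b. c.d.", perl = TRUE)
print(multi)
# [1] "ab cd."
```

This suggests the pattern suits single-sentence strings but not longer texts.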

paste0(gsub('\\.', '', test), '.') 
#[1] "this is a test."

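To make this (admittedly ugly) approach work with more than one sentence,

# assumed two-sentence input, inferred from the output shown below
test <- "t.h.i.s i.s a test. With a.n.other sen.t.ence."
paste(paste0(gsub('\\.', '', unlist(strsplit(test, '\\. '))), '.'), collapse = ' ')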
#[1] "this is a test. With another sentence."

Source

2016-04-15 12:58:41 Sotos

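What if the input string contains several sentences?

Try '"T.h.i.s is a U.S. state."'. I don't think a 100% safe solution exists for this problem.

I think you are right. Especially if there are also decimals... as you mentioned in your comment – Sotos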

You can do the opposite, i.e. extract everything that is not a dot in the middle of the string:

require(stringr) 
test <- "t.h.i.s i.s a test." 
paste0(str_extract_all(test, "[^\\.]|(\\.$)")[[1]], collapse = "") 

[1] "this is a test."

If you want to allow for multiple sentences, and we can assume that a dot followed by a space is allowed, you can use:

test <- "t.h.i.s i.s a test. With a.n.other sen.t.ence." 
paste0(str_extract_all(test, "[^\\.]|(\\.$)|(\\. )")[[1]], collapse = "")

[1] "this is a test. With another sentence."

Source

2016-04-15 13:04:04 radiumhead

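Try '"T.h.i.s is a U.S. state."'. I don't think a 100% safe solution exists for this problem.

Thank you. Could you tell me how to do this with the gsub function? I am using lapply to run gsub over a large number of texts, and paste0 to put everything into a single text. But I am not sure how to turn your suggestion into a proper lapply call; right now I have lapply(text, gsub, pattern = "what goes here", replacement = "") – user3604060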
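
One possible gsub translation of the extraction idea, as a sketch rather than a definitive answer: instead of keeping "a dot at the end or a dot followed by a space", delete every dot that is not followed by the end of the string or a space. The texts list below is made-up sample data, and perl = TRUE is needed for the lookahead.

```r
# Hypothetical input: a list of OCR'd texts (made up for illustration)
texts <- list("t.h.i.s i.s a test.",
              "t.h.i.s i.s a test. With a.n.other sen.t.ence.")

# Remove every dot that is NOT followed by end-of-string or a space.
# Extra named arguments to lapply are passed on to gsub; each list
# element fills gsub's remaining argument x.
cleaned <- lapply(texts, gsub, pattern = "\\.(?!$| )", replacement = "", perl = TRUE)

# Put everything into a single text
combined <- paste0(unlist(cleaned), collapse = " ")
print(combined)
# [1] "this is a test. this is a test. With another sentence."
```

Since gsub is vectorised, calling it directly on a character vector should work as well. Decimals such as 5.5 would still lose their dot with this pattern.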

Here is my best guess, along with suggestions on how to strengthen the pattern further:

> test = "T.h.i.s is a U.S. state. I drove 5.5 miles. Mr. Smith know English, French, etc. and can drive a car."
> gsub("\\b((?:U[.]S|etc|M(?:r?s|r))[.]|\\d+[.]\\d+)|[.](?!$|\\s+\\p{Lu})", "\\1", test, perl=TRUE)
[1] "This is a U.S. state. I drove 5.5 miles. Mr. Smith know English, French, etc. and can drive a car."

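See the regex demo.

Explanation:

\b((?:U[.]S|etc|M(?:r?s|r))[.]|\d+[.]\d+) - matches the exceptions and captures them, so that they are restored via the \1 backreference in the replacement. This part matches U.S., etc., Mr., Ms., Mrs. and digits+.digits, and it can be extended
| - or
[.](?!$|\s+\p{Lu}) - matches a dot that is not followed by the end of the string ($) or by 1+ whitespace characters followed by an uppercase letter (\s+\p{Lu})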
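
To illustrate how the exception list might be extended, here is a hedged sketch that builds the same kind of pattern from a vector of abbreviations. The helper name remove_ocr_dots and its default abbreviation list are hypothetical and not part of the original answer.

```r
# Hypothetical helper: builds the exception part of the pattern from a
# character vector so that new abbreviations are easy to add.
remove_ocr_dots <- function(x, abbreviations = c("U.S", "etc", "Mr", "Mrs", "Ms")) {
  # Turn literal dots inside abbreviations into "[.]" (e.g. "U.S" -> "U[.]S")
  escaped <- gsub(".", "[.]", abbreviations, fixed = TRUE)
  # Each abbreviation must be followed by its closing dot
  exceptions <- paste0("(?:", paste(escaped, collapse = "|"), ")[.]")
  # Group 1 captures exceptions and decimals; they are put back via \\1.
  # Any other dot is removed unless it ends the string or a sentence.
  pattern <- paste0("\\b(", exceptions, "|\\d+[.]\\d+)|[.](?!$|\\s+\\p{Lu})")
  gsub(pattern, "\\1", x, perl = TRUE)
}

result <- remove_ocr_dots("T.h.i.s is a U.S. state. I drove 5.5 miles.")
print(result)
# [1] "This is a U.S. state. I drove 5.5 miles."
```

A sentence that starts with a lowercase letter would still have its preceding dot removed, so the caveat from the comments about no 100% safe solution still applies.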

Source

2016-04-15 13:42:20
