2015-06-21 96 views
1

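R: remove text after a matching string through the end

I would like to delete any text that occurs after a certain string match, THE END or FINIS. I know this is very similar to other topics, but I am not yet skilled enough with regular expressions to get it working for me.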

My texts are Shakespeare's works taken from Project Gutenberg. They usually look like

txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, 
    by your leave, she will be tam'd so. Exeunt THE END <<THIS ELECTRONIC VERSION OF THE 
    COMPLETE WORKS OF WILLIAM ..." 

txt <- "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, 
    by your leave, she will be tam'd so. Exeunt FINIS <<THIS ELECTRONIC VERSION OF THE 
    COMPLETE WORKS OF WILLIAM ..." 

Ideally, something like gsub("^[THE END]*|^[FINIS]*", "", txt) would return "... thou hast tam'd a curst shrow. LUCENTIO. 'Tis a wonder, by your leave, she will be tam'd so. Exeunt"

Answer

3

You were quite close. You have to use:

gsub("(THE END|FINIS).*", "", txt) 

Working demo
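To see why the pattern in the question does not work while the alternation does, here is a minimal sketch. The two strings are shortened, single-line stand-ins for the samples in the question, and the variable names are only illustrative. In the attempted pattern, [THE END] is a character class (any single one of those characters) and ^ anchors the match to the start of the string, so it never reaches the marker in the middle of the text.

```r
# Shortened, single-line stand-ins for the two sample strings in the question.
txt_end   <- "she will be tam'd so. Exeunt THE END <<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS"
txt_finis <- "she will be tam'd so. Exeunt FINIS <<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS"

# Attempted pattern: a character class anchored at the start of the string.
# The text starts with "s", which is in neither class, so nothing is removed.
attempt <- gsub("^[THE END]*|^[FINIS]*", "", txt_end)

# Alternation: match the literal marker "THE END" or "FINIS",
# then .* swallows everything after it.
clean_end   <- gsub("(THE END|FINIS).*", "", txt_end)
clean_finis <- gsub("(THE END|FINIS).*", "", txt_finis)

print(attempt)      # same text as txt_end
print(clean_end)    # text up to, but not including, the marker
print(clean_finis)
```

Note that the space before the marker is kept, so both cleaned strings end in "Exeunt " with a trailing space.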

By the way, as thelatemail points out in his comment, sub is enough, since there is only a single substitution to make.
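A small sketch of that point, using a shortened stand-in string rather than the full text: because .* runs to the end of the string, there is at most one match, so sub and gsub give the same result. The trimws call and the \\s* variant are optional additions of mine, not part of the answer above, for dropping the space left in front of the marker.

```r
# Shortened stand-in for one of the sample strings in the question.
txt <- "so. Exeunt FINIS <<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS"

# One match at most, so sub() and gsub() agree.
with_sub  <- sub("(THE END|FINIS).*", "", txt)
with_gsub <- gsub("(THE END|FINIS).*", "", txt)
print(identical(with_sub, with_gsub))

# Optional: remove the whitespace left before the marker.
trimmed     <- trimws(with_sub)                       # strip leading/trailing whitespace
trimmed_alt <- sub("\\s*(THE END|FINIS).*", "", txt)  # or absorb it in the pattern
print(trimmed)
```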

+0

'sub' should be enough, since there is only one replacement. – thelatemail