Parsing a column in R

Here is an example of my df:

data 
276 '83 Rally '83 (1983) (V)\t\t\t\t1983 
277 '87: A Love Story (2007)\t\t\t\t2007                         
278 '88 Dodge Aries (2002)\t\t\t\t\t2002 
279 '9': Acting Out (2009) (V)\t\t\t\t2009

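I want to create a data frame that shows only the title and the year. Does anyone have suggestions on how to parse this? I think I might need to split on \t\t\t\t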

 Title    Year 
276 '83 Rally '83  (1983) 
277 '87: A Love Story (2007)                        
278 '88 Dodge Aries (2002) 
279 '9': Acting Out (2009)

Here is the dput:

c("# (2014)\t\t\t\t\t\t2014", "#1 (2005)\t\t\t\t\t\t2005", "#1 (2009)\t\t\t\t\t\t2009", 
"#1 (2010)\t\t\t\t\t\t2010", "#1 (2010/I) (V)\t\t\t\t\t\t2010", 
"#1 (2010/II) (V)\t\t\t\t\t2010")


2017-02-13 Jmira2312

How many columns do you currently have? 1? [A `dput` would help.](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) – alistaire

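Actually, your example doesn't make the structure of your data obvious. Could you provide your data in a way that shows its structure? Please use `dput(df)` and paste the result into your question. If your data is long, you can use `dput(head(df))`. – G5W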

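@alistaire @G5W I currently have only one column, named `data`. It contains strings of movie information (title, release date). I'm not familiar with dput, but I ran `dput(head(df))` and I'll put the output in the question. – Jmira2312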

Use gsub():

df$Title <- gsub("(.*?) \\(.*", "\\1", df$data) 
df$Year <- gsub(".*\\((\\d{4})\\).*", "\\1", df$data) 

> df[c("Title", "Year")]
                  Title Year
1     276 '83 Rally '83 1983
2 277 '87: A Love Story 2007
3   278 '88 Dodge Aries 2002
4   279 '9': Acting Out 2009
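
To try this on the dput sample from the question, here is a minimal sketch. One caveat: entries such as `#1 (2010/I) (V)` contain no plain four-digit group in parentheses, so the Year pattern above finds no match there and gsub() returns the string unchanged. The variant below is my own assumption, not part of the answer as posted: it takes the year from whatever follows the last tab, which is closer to the tab-splitting idea in the question.

```r
# Sample taken from the dput output in the question
data <- c("# (2014)\t\t\t\t\t\t2014", "#1 (2005)\t\t\t\t\t\t2005",
          "#1 (2010/I) (V)\t\t\t\t\t\t2010", "#1 (2010/II) (V)\t\t\t\t\t2010")

# Title: everything before the first " (" (same pattern as above)
Title <- gsub("(.*?) \\(.*", "\\1", data)

# Year: drop everything up to and including the last tab,
# then trim any stray whitespace
Year <- trimws(sub(".*\t", "", data))

result <- data.frame(Title = Title, Year = Year, stringsAsFactors = FALSE)
result
```

This relies on the year always being the last tab-separated field, which holds for the sample shown but is worth checking on the full data.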

Note: if `data` is actually a standalone vector, then just use it directly, e.g.

Title <- gsub("(.*?) \\(.*", "\\1", data)
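
If you would rather keep the `df$data` form, one option is to wrap the vector in a one-column data frame first. This is only a sketch; the two-element `data` vector is a shortened sample from the question's dput output, and the column name `data` is chosen to match the code above.

```r
# 'data' is a plain character vector here (shortened sample)
data <- c("#1 (2005)\t\t\t\t\t\t2005", "#1 (2009)\t\t\t\t\t\t2009")
is.data.frame(data)   # FALSE: the $ operator cannot be used on it

# Wrap it in a one-column data frame so that df$data works
df <- data.frame(data = data, stringsAsFactors = FALSE)
df$Title <- gsub("(.*?) \\(.*", "\\1", df$data)
df$Year  <- gsub(".*\\((\\d{4})\\).*", "\\1", df$data)
```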

Here is an explanation of the regex used to extract the year:

.*        match any leading text
\\(       then a literal opening parenthesis
(\\d{4})  capture a four-digit year
\\)       followed by a literal closing parenthesis
.*        consume the remainder of the string

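In gsub() the replacement is \\1, which refers to the four-digit year captured during the match. Because the pattern matches the whole string, the whole string is replaced by just that year.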
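
To see the capture group on its own, you can inspect the match with regexec() and regmatches() before doing the replacement. A small sketch, using one of the titles from the question as the test string (the variable names are just for illustration):

```r
x <- "'87: A Love Story (2007)\t\t\t\t2007"
pattern <- ".*\\((\\d{4})\\).*"

# regexec() reports the full match followed by each capture group
m <- regmatches(x, regexec(pattern, x))[[1]]
captured <- m[2]   # the text matched by (\\d{4})

# gsub() replaces the whole match with that captured group
year <- gsub(pattern, "\\1", x)
```

Here `m[1]` is the full match (the entire string) and `m[2]` is the year, which is the same value gsub() leaves behind.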


2017-02-13 02:15:19

Thanks. When I try your code, I get this error: `Error in df$data : $ operator is invalid for atomic vectors` – Jmira2312

It sounds like `data` is not a data frame but just a character vector. In that case, simply replace `df$data` with `data` in the code snippet given above.

Thanks, I fixed my data and it is now in a df. Would you mind explaining your regex, especially the one used for `df$year`? – Jmira2312
