How to efficiently extract numbers from text in a data.table column in R

I'm just learning data science, and I'm extracting numbers from the data with a few lines (using data.table):

library(stringr) 
library(data.table) 
prods[, weights := str_extract(NombreProducto, "([0-9]+)[kgKG]+")] 
prods[, weights := str_extract(weights, "[0-9]+")] 
prods[, weights := as.numeric(weights)]

Here is an example of the "NombreProducto" field that I want to extract the number from:

"Tostado 210g CU BIM 1182"

Is there an easy way to do this in one concise line? I tried

prods[, weights := str_match(NombreProducto, "([0-9]+)[kgKG]+")[2]]

but it sets everything in the "weights" column to the first result in the data.table. By the way, this is from the Grupo Bimbo Kaggle competition.

2016-08-30 wordsforthewise

It's not clear what you actually want. – akrun

I want what the first block of code does, in one line. – wordsforthewise

Try `prods[, weights := as.numeric(str_extract(NombreProducto, "([0-9]+)(?=(kg|KG))"))]` – akrun

Without stringr, you can just use sub with ".*?(\\d+)[kgKG].*" and a backreference:

s = "Tostado 210g CU BIM 1182" 

sub(".*?(\\d+)[kgKG].*", "\\1", s) 
# [1] "210"

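Use (\\d+)[kgKG] to match one or more digits followed by one of the letters k, K, g, G;
Specify .* before and after the pattern so that everything in the string other than the pattern is removed;
Use ? on the first .* to make it non-greedy, so that all three digits are kept;
Use \\1 to refer to the capture group (\\d+);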
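
The steps above can also be applied to a whole vector, since sub is vectorised. The following is only a sketch: the second and third sample strings are made up for illustration, perl = TRUE is an assumption added here so the lazy quantifier behaves the same way as in Perl-style engines, and the grepl mask is an extra step to handle the fact that sub returns its input unchanged when nothing matches.

```r
# Hypothetical sample values; only the first string comes from the question.
s <- c("Tostado 210g CU BIM 1182",
       "Pan Blanco 2kg BIM 1000",
       "Sin peso BIM 1")

pattern <- ".*?(\\d+)[kgKG].*"

# sub() is vectorised, so one call handles every element.
extracted <- sub(pattern, "\\1", s, perl = TRUE)

# sub() leaves non-matching strings untouched, so mark those as NA explicitly.
matched <- grepl(pattern, s, perl = TRUE)
weights <- as.numeric(ifelse(matched, extracted, NA_character_))

print(weights)
```

In a data.table the same expression could sit on the right-hand side of :=, with NombreProducto in place of s.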

2016-08-30 15:35:31 Psidom

We can do this in one line with stringr by using regex lookarounds.

prods[, weights := as.numeric(str_extract(NombreProducto, "([0-9]+)(?=[kgKG])"))]
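
As a rough illustration of the lookahead approach, here is a small self-contained sketch. The prods table below is made up (only the first product name comes from the question), and the weights_match column is a hypothetical name added to show how the str_match attempt from the question might be repaired by selecting the capture-group column with [, 2].

```r
library(stringr)
library(data.table)

# Hypothetical sample data; only the first product name comes from the question.
prods <- data.table(NombreProducto = c("Tostado 210g CU BIM 1182",
                                       "Pan Blanco 2kg BIM 1000"))

# Lookahead: match digits only when k, g, K or G follows, without consuming the letter.
prods[, weights := as.numeric(str_extract(NombreProducto, "[0-9]+(?=[kgKG])"))]

# str_match() returns a matrix: column 1 is the full match, column 2 the first group.
# [, 2] keeps one value per row, whereas [2] picks a single cell of the matrix,
# which data.table then recycles down the whole column.
prods[, weights_match := as.numeric(str_match(NombreProducto, "([0-9]+)[kgKG]+")[, 2])]

print(prods)
```

Both columns should hold the same numbers here; the lookahead version simply avoids the second extraction step.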

2016-08-30 15:57:12 akrun
