Sum columns with similar names

I have a numeric vector var with names (taken from predict.cv.glmnet output):

var <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2", "as.factor(season)3", "as.factor(season)4", "as.factor(weathersit)2", "as.factor(weathersit)3", "windspeed", "temp", "hum")

           (Intercept)    as.factor(holiday)1     as.factor(season)2
                  5.74                   0.00                   0.15
    as.factor(season)3     as.factor(season)4 as.factor(weathersit)2
                  0.00                   0.00                   0.00
as.factor(weathersit)3              windspeed                   temp
                  0.00                   0.00                   1.81
                   hum
                  0.00

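I want to extract the names of the variables that have nonzero values, aggregating over factor levels: if at least one level of a factor is nonzero, the whole factor should be included, and the output should omit the factor level. I am looking for a piece of code that gives me this result: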

"(Intercept)"  "as.factor(season)"   "temp"

I also have a variable fac holding the factor names, which could be used:

fac<-c("as.factor(holiday)","as.factor(season)","as.factor(weathersit)") 


"as.factor(holiday)" "as.factor(season)"  "as.factor(weathersit)"

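My idea was to aggregate the factor terms by similar names, ignoring their levels, and then check whether the sum for each aggregated factor is > 0, but I could not work out how to code it.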

Source

2016-02-12 mknut

Please consider making a reproducible example – Sotos

I played around with which and regular expressions:

var <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2", "as.factor(season)3", "as.factor(season)4", "as.factor(weathersit)2", "as.factor(weathersit)3", "windspeed", "temp", "hum")

X <- names(var)[which(var != 0)]        # names of the nonzero components
n <- grep("as[.]factor.*", X)           # indices of the factor terms
X[n] <- gsub("\\)[0-9]+$", ")", X[n])   # strip the trailing numeric level

X <- unique(X)                          # one entry per factor
X

#[1] "(Intercept)"       "as.factor(season)" "temp"

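which selects the nonzero components. grep finds the indices of the factor terms. gsub then removes the factor levels, and unique collapses the repeated factor names into one entry.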
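For comparison, here is a minimal sketch of the approach described in the question, which uses the vector fac of known factor names instead of a regular expression. It is an illustration, not part of the original answer. The names coefs, group and selected are made up for this example (coefs stands in for var, so that stats::var is not masked), and startsWith assumes R 3.3.0 or later. It tests for any nonzero coefficient rather than a sum > 0, because coefficients can be negative and could cancel in a sum.

```r
# Data from the question; `coefs` is a hypothetical name standing in for `var`
coefs <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(coefs) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2",
                  "as.factor(season)3", "as.factor(season)4",
                  "as.factor(weathersit)2", "as.factor(weathersit)3",
                  "windspeed", "temp", "hum")
fac <- c("as.factor(holiday)", "as.factor(season)", "as.factor(weathersit)")

# Map every coefficient name to its factor name when it starts with one;
# all other names (intercept, numeric predictors) stay unchanged.
group <- names(coefs)
for (f in fac) {
  group[startsWith(names(coefs), f)] <- f
}

# Keep each group that has at least one nonzero coefficient,
# preserving the original order.
selected <- unique(group[coefs != 0])
selected
```

Because the grouping relies on the known factor names, it does not depend on what the levels look like (numeric or text).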

Source

2016-02-12 13:32:22 mra68

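Thanks for your answer. It worked for the example provided. Do you know how I could use gsub to generalize this to the case where the factor levels are not numeric? Suppose my variable names are: 'names(var) <- cbind("(Intercept)", "as.factor(holiday)1", "as.factor(season)winter", "as.factor(season)summer", "as.factor(weathersit)2", "as.factor(weathersit)3", "windspeed", "temp", "hum")' – mknut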

OK, I tried 'X[n] <- gsub(").+$", ")", X[n])', and it seems to work fine. – mknut
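The generalization from the comments could be sketched as follows. This is only an illustration: the vector x and the names is_factor and stripped are hypothetical, and the closing parenthesis is escaped so that the pattern does not depend on how the regex engine treats an unmatched ")".

```r
# Hypothetical names with non-numeric levels, modeled on the comment above
x <- c("(Intercept)", "as.factor(season)winter", "as.factor(season)summer",
       "as.factor(weathersit)2", "temp")

# Only touch the factor terms, so "(Intercept)" is left alone
is_factor <- grepl("^as[.]factor\\(", x)

# Drop everything after the first ")", whatever the level looks like
x[is_factor] <- sub("\\).*$", ")", x[is_factor])

stripped <- unique(x)
stripped
```

One caveat: the pattern cuts at the first ")", so a factor term whose name contains nested parentheses would be truncated too early. In that case, matching against a known list of factor names (such as fac from the question) is likely the safer route.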
