2016-02-12

I have a named numeric vector var (from predict.cv.glmnet output), and some of its elements have similar names:

var <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2", "as.factor(season)3", "as.factor(season)4", "as.factor(weathersit)2", "as.factor(weathersit)3", "windspeed", "temp", "hum")

(Intercept)  as.factor(holiday)1  as.factor(season)2  as.factor(season)3  as.factor(season)4  as.factor(weathersit)2
       5.74                 0.00                0.15                0.00                0.04                    0.00
as.factor(weathersit)3  windspeed  temp   hum
                  0.00       0.00  1.81  0.00

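I want to extract the names of the variables that have non-zero values, while also aggregating factor levels: if at least one level of a factor is non-zero, the whole factor should be included, and the output should omit the factor levels. I am looking for a piece of code that gives me this result: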

"(Intercept)"  "as.factor(season)"   "temp" 

I also have a variable fac holding the factor names, which can be used:

fac <- c("as.factor(holiday)", "as.factor(season)", "as.factor(weathersit)")


"as.factor(holiday)" "as.factor(season)"  "as.factor(weathersit)" 

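My idea was to aggregate the coefficients that share a factor name, ignoring their levels, and then check whether the sum for each aggregated factor is > 0, but I could not work out how to code it.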
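The aggregation idea described above could be sketched roughly as follows. This is only one possible reading of it: the helper names group, groups, keep and selected are made up for illustration, and the check uses "any coefficient != 0" rather than "sum > 0", on the assumption that coefficients may be negative and could cancel out in a sum.

```r
var <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2",
                "as.factor(season)3", "as.factor(season)4",
                "as.factor(weathersit)2", "as.factor(weathersit)3",
                "windspeed", "temp", "hum")
fac <- c("as.factor(holiday)", "as.factor(season)", "as.factor(weathersit)")

# Label each coefficient with its factor name if its name starts with one;
# all other coefficients keep their own name as the label.
group <- names(var)
for (f in fac) {
  group[startsWith(names(var), f)] <- f
}

# Keep a label when at least one coefficient carrying it is non-zero.
groups <- unique(group)
keep <- vapply(groups, function(g) any(var[group == g] != 0), logical(1))
selected <- groups[keep]
selected
```

Because the matching relies on fac rather than on a pattern for the level text, this sketch does not care whether the levels are numeric or not. startsWith requires R 3.3.0 or later.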

Please consider making a reproducible example – Sotos

Answer

I played around with which and regular expressions:

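var <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2", "as.factor(season)3", "as.factor(season)4", "as.factor(weathersit)2", "as.factor(weathersit)3", "windspeed", "temp", "hum")

# names of the non-zero components
X <- names(var)[which(var != 0)]
# indices of the factor terms among them
n <- grep("as[.]factor.*", X)
# strip the trailing numeric factor level, keeping the closing parenthesis
X[n] <- gsub("\\)[0-9]+$", ")", X[n])

# collapse the levels of the same factor into one entry
X <- unique(X)
X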

#[1] "(Intercept)"  "as.factor(season)" "temp" 

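which selects the non-zero components, grep finds the indices of the factor terms among them, and gsub then strips the factor levels. unique finally collapses the levels of the same factor into a single entry.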

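Thank you for the answer. It worked for the example provided. Do you know how I could use gsub to generalize this to the case where the factor levels are not numeric? Suppose my variable names are: 'names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)winter", "as.factor(season)summer", "as.factor(weathersit)2", "as.factor(weathersit)3", "windspeed", "temp", "hum")' – mknut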

OK, I tried 'X[n] <- gsub(").+$", ")", X[n])' and it seems to work fine. – mknut
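For reference, the generalized pattern from the comment above can be tried on the non-numeric names as in the sketch below. The names come from the comment; the coefficient values are made up purely for illustration, and the helper name is_factor is hypothetical. The closing parenthesis is escaped here so the pattern does not depend on how the regex engine treats an unmatched ")".

```r
# Names as given in the comment above; the values are invented for illustration.
var <- c(5.74, 0.00, 0.15, 0.00, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)winter",
                "as.factor(season)summer", "as.factor(weathersit)2",
                "as.factor(weathersit)3", "windspeed", "temp", "hum")

# names of the non-zero components
X <- names(var)[var != 0]

# factor terms are the names that begin with "as.factor("
is_factor <- grepl("^as[.]factor\\(", X)

# drop everything after the first ")" in each factor term, whatever the level text is
X[is_factor] <- sub("\\).+$", ")", X[is_factor])

X <- unique(X)
X
```

This assumes that the first ")" in a factor term is the one closing as.factor(...). A term such as as.factor(f(x))a would be cut too early, in which case matching against a known list of factor names like fac is likely the safer route.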