2017-07-14 89 views
1

How do I subset a data frame based on column names?

I have this data frame:

dput(df) 
structure(list(Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"), 
    Date = structure(1:6, .Label = c("7/13/2017 15:01", "7/13/2017 15:02", 
    "7/13/2017 15:03", "7/13/2017 15:04", "7/13/2017 15:05", 
    "7/13/2017 15:06"), class = "factor"), Host_CPU = c(1.812950134, 
    2.288070679, 1.563278198, 1.925239563, 5.350669861, 2.612503052 
    ), UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 
    38.22), jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99 
    ), jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57), jvm3 = c(75.65, 
    76.88, 56.93, 58.99, 65.29, 67.97), jvm4 = c(39.43, 40.86, 
    42.27, 43.71, 45.09, 45.33), jvm5 = c(27.42, 29.63, 31.02, 
    32.37, 33.72, 37.71)), .Names = c("Server", "Date", "Host_CPU", 
"UsedMemPercent", "jvm1", "jvm2", "jvm3", "jvm4", "jvm5"), class = "data.frame", row.names = c(NA, 
-6L)) 

I want to subset this data frame based on a vector of column names:

select<-c("jvm3", "jvm4", "jvm5") 

So my final df should look like this:

structure(list(Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"), 
    Date = structure(1:6, .Label = c("7/13/2017 15:01", "7/13/2017 15:02", 
    "7/13/2017 15:03", "7/13/2017 15:04", "7/13/2017 15:05", 
    "7/13/2017 15:06"), class = "factor"), Host_CPU = c(1.812950134, 
    2.288070679, 1.563278198, 1.925239563, 5.350669861, 2.612503052 
    ), UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 
    38.22), jvm3 = c(75.65, 76.88, 56.93, 58.99, 65.29, 67.97 
    ), jvm4 = c(39.43, 40.86, 42.27, 43.71, 45.09, 45.33), jvm5 = c(27.42, 
    29.63, 31.02, 32.37, 33.72, 37.71)), .Names = c("Server", 
"Date", "Host_CPU", "UsedMemPercent", "jvm3", "jvm4", "jvm5"), class = "data.frame", row.names = c(NA, 
-6L)) 

Any ideas?

+1

The solution is: 'df[select]' –

+2

'df[c("Server", "Date", "Host_CPU", "UsedMemPercent", select)]'. Or you can use 'df[, c("Server", "Date", "Host_CPU", "UsedMemPercent", select)]'. Or 'subset(df, select = c("Server", "Date", "Host_CPU", "UsedMemPercent", select))'. See '?subset' or '?"["' for details. – Gregor

+0

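Note that it is much appreciated when you take the extra step of turning the dput output into something that can be pasted directly into R, i.e. by writing it as 'your_data <- {insert dput output here}'. – Dason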

Answers

1

Save your data frame to a variable df:

df <- structure(
  list(
    Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
    Date = structure(
      1:6,
      .Label = c(
        "7/13/2017 15:01",
        "7/13/2017 15:02",
        "7/13/2017 15:03",
        "7/13/2017 15:04",
        "7/13/2017 15:05",
        "7/13/2017 15:06"
      ),
      class = "factor"
    ),
    Host_CPU = c(
      1.812950134, 2.288070679, 1.563278198,
      1.925239563, 5.350669861, 2.612503052
    ),
    UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 38.22),
    jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99),
    jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57),
    jvm3 = c(75.65, 76.88, 56.93, 58.99, 65.29, 67.97),
    jvm4 = c(39.43, 40.86, 42.27, 43.71, 45.09, 45.33),
    jvm5 = c(27.42, 29.63, 31.02, 32.37, 33.72, 37.71)
  ),
  .Names = c(
    "Server", "Date", "Host_CPU", "UsedMemPercent",
    "jvm1", "jvm2", "jvm3", "jvm4", "jvm5"
  ),
  class = "data.frame",
  row.names = c(NA, -6L)
)

df[, select] should be what you're looking for.

+0

This answer doesn't work. – user1471980

+0

@user1471980 This answer works fine for the 'select' you explicitly created; you never said that you also wanted to keep several other columns. –

+1

@user1471980 Yes, I misunderstood your question. It looks like you need: 'cbind(df[, 1:4], df[, select])' –

1

Here is one way:

df[,c(1:4,7:9)]

You can also use dplyr to select the columns:

dplyr::select(df, Server, Date, Host_CPU, UsedMemPercent, jvm3, jvm4, jvm5)

4

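Please revisit indexing. When you use the indexing mechanism [ in R, there are three main types of index you can use:

  • logical vectors: same length as the number of columns, TRUE means select the column
  • numeric vectors: select columns based on position
  • character vectors: select columns based on name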
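
As a minimal sketch of the three index types listed above, the following picks the same two columns of the built-in iris data frame in each of the three ways. The variable names by_name, by_position and by_logical are only illustrative.

```r
# Character vector: select columns by name.
by_name <- iris[c("Sepal.Width", "Species")]

# Numeric vector: select columns by position (columns 2 and 5 of iris).
by_position <- iris[c(2, 5)]

# Logical vector: one TRUE/FALSE per column, TRUE means "keep".
keep_flags <- names(iris) %in% c("Sepal.Width", "Species")
by_logical <- iris[keep_flags]

# Compare the three results.
identical(by_name, by_position)
identical(by_name, by_logical)
```

Name-based indexing is usually the most robust of the three, since it keeps working if the column order changes.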

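When you use the indexing mechanism on data frames, you can treat these objects in two ways:

  • as a list, because internally they are lists
  • as a matrix, because in many cases they mimic the behaviour of a matrix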

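Take the iris data frame as an example and compare the several ways you can select columns from a data frame. If you treat it as a list, you have the following two options: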

Use [[ if you want a single column in the form of a vector:

iris[["Species"]] 
# [1] setosa  setosa  setosa ... : is a vector 

Use [ if you want one or more columns but need a data frame back:

iris["Species"] 
iris[c("Sepal.Width", "Species")] 

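If you treat it as a matrix, you do exactly what you would do with a matrix. If you don't specify any row index, these commands are equivalent to the ones used above: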

iris[ , "Species"] # is the same as iris[["Species"]] 
iris[ , "Species", drop = FALSE] # is the same as iris["Species"] 
iris[ , c("Sepal.Width", "Species")] # is the same as iris[c("Sepal.Width", "Species")] 

So in your case, you just need:

select <- c("Server","Date","Host_CPU","UsedMemPercent", 
      "jvm3","jvm4","jvm5") 
df[select] 
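
If you would rather not type the leading columns by hand, one possible variation is to derive them from the column names. This is only a sketch under an assumption not stated in the question: that every column whose name does not start with "jvm" should always be kept. The names fixed and keep are made up for illustration, and df here is a cut-down copy of the question's data.

```r
# Cut-down copy of the question's data frame (first three rows, fewer columns).
df <- data.frame(
  Server   = "servera",
  Host_CPU = c(1.812950134, 2.288070679, 1.563278198),
  jvm1     = c(10.91, 11.13, 11.34),
  jvm2     = c(11.47, 11.7, 11.91),
  jvm3     = c(75.65, 76.88, 56.93),
  jvm4     = c(39.43, 40.86, 42.27),
  jvm5     = c(27.42, 29.63, 31.02)
)

# The vector of wanted jvm columns, as defined in the question.
select <- c("jvm3", "jvm4", "jvm5")

# Assumption: columns not starting with "jvm" are always kept.
fixed <- names(df)[!grepl("^jvm", names(df))]

# Combine the fixed columns with the requested ones and index by name.
keep <- c(fixed, select)
result <- df[keep]
names(result)
```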

Note: subset works, but it should only be used interactively. There is a warning on its help page stating:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
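
A hypothetical illustration of the kind of surprise that warning refers to: the data frame scores and the variable limit below are invented for this sketch. Because subset evaluates its condition inside the data frame first, a column can silently shadow a variable of the same name in your workspace.

```r
# Invented example data: a column happens to be called `limit`.
scores <- data.frame(value = c(1, 6, 10), limit = c(0, 7, 20))

# A variable with the same name in the calling environment.
limit <- 5

# subset() finds `limit` in the data frame first, so the column is used
# and the workspace variable is ignored.
via_subset <- subset(scores, value > limit)

# Standard indexing uses exactly the objects you name:
# scores$value is the column, limit is the workspace variable.
via_bracket <- scores[scores$value > limit, ]

nrow(via_subset)
nrow(via_bracket)
```

The two calls look equivalent but keep different rows, which is why [ is the safer choice inside functions and scripts.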
