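Although RStudio shows that df_intrate has the expected number of rows for ASSET CLASS A, is structural information from df still retained in df_intrate?
Yes. This is how categorical variables, called factors, are stored in R: both the levels (a vector of all possible values) and the actual values taken are stored: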
x = factor(c('a', 'b', 'c', 'a', 'b', 'b'))
x
# [1] a b c a b b
# Levels: a b c
y = x[1]
y
# [1] a
# Levels: a b c
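To make the storage model above concrete, here is a minimal sketch that inspects the two pieces of a factor directly. It reuses the same x as above and relies only on base R functions (levels, as.integer, table); nothing here is specific to the asker's data.

```r
# Inspect what a factor actually stores
x <- factor(c("a", "b", "c", "a", "b", "b"))

levels(x)      # the full set of possible values
as.integer(x)  # the integer codes that index into levels(x)

# Subsetting keeps the whole levels vector,
# even for values that are no longer present
y <- x[1]
levels(y)
table(y)       # unused levels appear with a count of zero
```

The zero counts from table(y) are usually the first visible symptom of leftover levels after filtering.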
You can get rid of unused levels with droplevels(), or by re-applying the factor function, which creates a new factor from only the values that are present:
droplevels(y)
# [1] a
# Levels: a
factor(y)
# [1] a
# Levels: a
You can also use droplevels on a data frame to drop all unused levels from all of its factor columns:
dat = data.frame(x = x)
str(dat)
# 'data.frame': 6 obs. of 1 variable:
# $ x: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 2
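str(dat[1, , drop = FALSE])  # drop = FALSE keeps the one-column result a data frame
# 'data.frame': 1 obs. of 1 variable:
# $ x: Factor w/ 3 levels "a","b","c": 1
str(droplevels(dat[1, , drop = FALSE]))
# 'data.frame': 1 obs. of 1 variable:
# $ x: Factor w/ 1 level "a": 1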
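Applied to the situation in the question, the same behaviour might look like the sketch below. The data frame contents and the column names asset_class and int_rate are assumptions made up for illustration; only the names df and df_intrate come from the question.

```r
# Hypothetical data; asset_class and int_rate are assumed column names
df <- data.frame(
  asset_class = factor(c("A", "B", "A", "C")),
  int_rate    = c(0.05, 0.07, 0.04, 0.09)
)

# Filtering rows returns the expected number of rows ...
df_intrate <- df[df$asset_class == "A", ]
nrow(df_intrate)

# ... but the factor column still remembers every original level
levels(df_intrate$asset_class)

# droplevels() on the data frame discards levels that no longer occur
df_clean <- droplevels(df_intrate)
levels(df_clean$asset_class)
```

So the row count can be exactly right while the factor column still carries the levels of the full data set.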
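Although it is unrelated to your current problem, it is also worth mentioning that factor has an optional argument levels, which can be used to specify the levels of a factor and the order they should go in. This can be useful if you want a particular order (perhaps for plotting or modeling), or if there are more possible levels than are actually present and you want to include them. If you don't specify levels, the default is alphabetical order.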
x = c("agree", "disagree", "agree", "neutral", "strongly agree")
factor(x)
# [1] agree disagree agree neutral strongly agree
# Levels: agree disagree neutral strongly agree
## not a good order
factor(x, levels = c("disagree", "neutral", "agree", "strongly agree"))
# [1] agree disagree agree neutral strongly agree
# Levels: disagree neutral agree strongly agree
## better order
factor(x, levels = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree"))
# [1] agree disagree agree neutral strongly agree
# Levels: strongly disagree disagree neutral agree strongly agree
## good order, more levels than are actually present
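Two consequences of passing levels explicitly are worth seeing side by side. This is a small sketch using the same survey-style vector as above; the variable names lv, f and g are introduced here only for illustration.

```r
x <- c("agree", "disagree", "agree", "neutral", "strongly agree")
lv <- c("strongly disagree", "disagree", "neutral", "agree", "strongly agree")

# Extra levels are kept and tabulated with a count of zero
f <- factor(x, levels = lv)
table(f)

# Values that are missing from `levels` silently become NA
g <- factor(x, levels = c("disagree", "neutral", "agree"))
is.na(g)
```

The second case is easy to miss because factor() gives no warning, so it can help to check anyNA() after supplying a hand-written levels vector.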
See ?reorder and ?relevel (or just use factor again) to change the order of the levels of a factor that has already been created.
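As a rough illustration of those three options, the sketch below changes the level order of an existing factor. The score vector is a made-up numeric variable, included only so that reorder has something to sort by.

```r
f <- factor(c("agree", "disagree", "agree", "neutral"))
levels(f)  # alphabetical default

# relevel() moves one level to the front (the reference level in models)
f2 <- relevel(f, ref = "neutral")
levels(f2)

# factor() with explicit levels rearranges all of them at once
f3 <- factor(f, levels = c("disagree", "neutral", "agree"))
levels(f3)

# reorder() sorts levels by a summary (the mean by default) of a second variable
score <- c(3, 1, 5, 2)  # hypothetical numeric values, one per observation
f4 <- reorder(f, score)
levels(f4)
```

relevel is the lightest choice when only the reference level matters; reorder is mostly useful for plots where the categories should follow a numeric summary.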
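Note that if you use 'read.csv(path, as.is = TRUE)', you will get character columns instead of factor columns. Also note that 'header = TRUE' and 'sep = ","' are the defaults for 'read.csv', so you don't have to specify them.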
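The difference described in this comment can be checked without a file on disk by passing the CSV through the text argument. The CSV contents below are invented for illustration. Note also that since R 4.0.0 the default of stringsAsFactors is FALSE, so on recent versions character columns are what you get even without as.is = TRUE; the sketch sets both options explicitly so it behaves the same on any version.

```r
# Hypothetical CSV contents, supplied inline instead of via a file path
csv_text <- "asset_class,int_rate\nA,0.05\nB,0.07\nA,0.04"

# as.is = TRUE leaves strings as character vectors
chr <- read.csv(text = csv_text, as.is = TRUE)
class(chr$asset_class)

# stringsAsFactors = TRUE converts strings to factors
fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(fct$asset_class)
```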