編輯:對不起,低質量的職位。我應該花更多的時間向你介紹這件事。該文章已被編輯,我已經爲整個事情添加了一個工作語法示例。感謝迄今爲止提供建議的所有人。使我的R功能更快
EDIT2:發現腳本在其他計算機上只有緩慢。可能是由一些本地問題或REPL引起的。
我做了這個功能。它生成帶標籤(標記爲&天堂包)數據框中的值的頻率表。它可以工作,但我打算在具有許多列的數據框架上使用它,並且我認爲它運行得有點慢;用戶可能會認爲R運行超過100列時發生崩潰,所以我想加快速度。
該腳本的目的是產生輸出,幫助我查找調查數據集中的處理錯誤。這有點費勁,因爲我想知道答案的頻率,並同時評估價值標籤的形狀。因此,此腳本爲每個變量生成一個頻率表,顯示頻率,未使用的標籤以及沒有值標籤的值。在查看腳本的輸出時,希望這會更清晰。
我將不勝感激,如果你能指出一些方法,使這個更高效:
# demonstration dataset
library(knitr)
library(data.table)
library(labelled)
df <- data.frame(q1 = rep(1:6, 3), q2 = rep(6:1, 3))
val_labels(df[, c("q1", "q2")]) <- c(YES = 1, MAYBE = 2, NO = 3, DK = 4, MISSING=5)
val_label(df$q2, 1) <- NULL
# Produce a frequency table over values and labels in a labelled-class dataframe object
# --------------------------------------------------------------------------------------------------
# Example: freqlab(ds[[1]]) or freqlab(ds[1:10]) or freqlab(ds)
# Wrong: freqlab(ds[1])
freqlab <- function(x){
# If the function is called on double brackets, eg. freqlab(ds[[11]])
if (!is.list(x)){
# Make a frequency distribution, put it in a data.table
xFreq <- data.table(table(x))
names(xFreq) <- c("Value", "Frequency")
class(xFreq[[1]]) <- "numeric"
setkey(xFreq, Value)
# Put the value labels in another data.table
if (!is.null(val_labels(x))){
xLab <- data.table(val_labels(x), names(val_labels(x)))
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
} else {
# If the variable does not have labels, create one to avoid errors
xLab <- data.table(xFreq[[1,1]], "** UNLABELLED **")
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
}
# Perform a FULL OUTER JOIN
outTable <- merge(xFreq, xLab, all = TRUE)
# Arrange values in ascending order of absolute value
outTable <- arrange(outTable, abs(outTable[[1]]))
# Edit the Label column for value cases with no label
outTable[[2]][is.na(outTable[[2]])] <- 0
outTable[[3]][is.na(outTable[[3]])] <- "** UNLABELLED **"
# If the output has more than 25 rows, cut it short
if (dim(outTable)[1] > 25){
outTable <- outTable[1:25]
}
# Output the table
print(kable(outTable, format = "rst", align = "l"))
# If the function is called on a list of variables, eg. freqlab(ds[10:11]),
# do the same steps as above, looping through all the input variables
} else {
for (y in 1:length(x)){
xFreq <- data.table(table(x[[y]]))
names(xFreq) <- c("Value", "Frequency")
class(xFreq[[1]]) <- "numeric"
setkey(xFreq, Value)
if (!is.null(val_labels(x[[y]]))){
xLab <- data.table(val_labels(x[[y]]), names(val_labels(x[[y]])))
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
} else {
xLab <- data.table(xFreq[[1,1]], "** UNLABELLED **")
names(xLab) <- c("Value", "Label")
setkey(xLab, Value)
}
outTable <- merge(xFreq, xLab, all = TRUE)
outTable <- arrange(outTable, abs(outTable[[1]]))
outTable[[2]][is.na(outTable[[2]])] <- 0
outTable[[3]][is.na(outTable[[3]])] <- "** UNLABELLED **"
if (dim(outTable)[1] > 25){
outTable <- outTable[1:25]
}
# Extra information printed when function is called on a list of variables
cat("Name:\t", names(x[y]),"\n")
print(kable(outTable, format = "rst", align = "l"))
cat(rep("-", 80), sep='', "\n\n")
}
}
}
輸出的例子:
> freqlab(df)
Name: q1
===== ========= ================
Value Frequency Label
===== ========= ================
1 3 YES
2 3 MAYBE
3 3 NO
4 3 DK
5 3 MISSING
6 3 ** UNLABELLED **
===== ========= ================
--------------------------------------------------------------------------------
Name: q2
===== ========= ================
Value Frequency Label
===== ========= ================
1 3 ** UNLABELLED **
2 3 MAYBE
3 3 NO
4 3 DK
5 3 MISSING
6 3 ** UNLABELLED **
===== ========= ================
--------------------------------------------------------------------------------
您可以嘗試對其進行分析(使用RStudio)以查看您必須改進的部分。 –
關於大'for'循環,是否可以使用'apply'函數呢? –
你的功能是做什麼的?如果你想把一個頻率表寫入一個數據幀,爲什麼不只是'as.data.frame(table(x))'? –