How do I make variables quantitative?

I have a data matrix (900 columns and 5000 rows) on which I want to run a PCA. How do I make the variables quantitative?

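The matrix looks perfectly fine in Excel (meaning all the values are quantitative), but after I read my file into R and try to run the PCA code, I get an error saying "The following variables are not quantitative", followed by a list of non-quantitative variables.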

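So, in general, some variables are quantitative and some are not; see the example below. When I check variable1, it is correct and quantitative (seemingly at random, some variables in the file are quantitative like this one). When I check variable2, it is incorrect and non-quantitative (seemingly at random, other variables in the file are non-quantitative like this one).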

> data$variable1[1:5] 
[1] -0.7617504 -0.9740939 -0.5089303 -0.1032487 -0.1245882 

> data$variable2[1:5] 
[1] -0.183546332959017 -0.179283451229594 -0.191165669598284 -0.187060515423038 
[5] -0.184409474669824 
731 Levels: -0.001841783473108 -0.001855956210119 ... -1,97E+05
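The `-1,97E+05` level hints at what probably happened: one cell written with a comma as decimal separator is enough to stop R from reading the whole column as numbers. A minimal sketch that reproduces this, assuming a tiny hypothetical tab-delimited excerpt and the pre-R 4.0.0 default of `stringsAsFactors = TRUE`:

```r
# Hypothetical tab-delimited excerpt: one value in variable2 uses a comma
# as decimal separator, like the "-1,97E+05" level in the question.
txt <- paste(
  "variable1\tvariable2",
  "-0.7617504\t-0.183546332959017",
  "-0.9740939\t-0.179283451229594",
  "-0.5089303\t-1,97E+05",
  sep = "\n"
)

# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0
data <- read.delim(text = txt, stringsAsFactors = TRUE)

class(data$variable1)  # "numeric": every entry parsed as a number
class(data$variable2)  # "factor": one unparseable entry turns the column into a factor
```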

So my question is: how can I convert all the non-quantitative variables into quantitative ones?

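Shortening the file did not help, because the values themselves are quantitative. I don't know what is going on, so here is a link to my original file: https://docs.google.com/file/d/0BzP-YLnUNCdwakc4dnhYdEpudjQ/edit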

I also tried the answers given below, but they still did not help.

So let me show exactly what I did:

> data <- read.delim("file.txt", header=T) 
> res.pca = PCA(data, quali.sup=1, graph=T) 
Error in PCA(data, quali.sup = 1, graph = T) : 
The following variables are not quantitative: batch 
The following variables are not quantitative: target79 
The following variables are not quantitative: target148 
The following variables are not quantitative: target151 
The following variables are not quantitative: target217 
The following variables are not quantitative: target266 
The following variables are not quantitative: target515 
The following variables are not quantitative: target530 
The following variables are not quantitative: target587 
The following variables are not quantitative: target620 
The following variables are not quantitative: target730 
The following variables are not quantitative: target739 
The following variables are not quantitative: target801 
The following variables are not quantitative: target803 
The following variables are not quantitative: target809 
The following variables are not quantitative: target819 
The following variables are not quantitative: target868 
The following variables a 
In addition: There were 50 or more warnings (use warnings() to see the first 50)
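Before converting anything, it can help to list exactly which columns R did not read as numbers; these are the ones `PCA()` reports. A small sketch on a hypothetical stand-in data frame (with the real data, `data` would be the result of the `read.delim` call above):

```r
# Hypothetical stand-in: one numeric column, one factor column
data <- data.frame(
  target1  = c(-0.76, -0.97, -0.51),
  target79 = factor(c("-0.18", "-0.17", "-1,97E+05"))
)

# TRUE for every column R read as numbers
is_num <- vapply(data, is.numeric, logical(1))

# Names of the columns a PCA function will reject as non-quantitative
names(data)[!is_num]   # "target79"
```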


2013-02-28 Letin

I may be wrong, but I suspect the 97E+05 is the culprit. Check for entries that contain non-numeric content. Did you export it as CSV? – 2013-02-28 09:58:26
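One way to follow this suggestion is to try parsing every distinct value of a factor column and keep those that come back `NA`. A minimal sketch for a single hypothetical column (empty cells or literal "NA" strings would also be flagged this way):

```r
# Hypothetical factor column as read.delim() would produce it
f <- factor(c("-0.183546332959017", "-0.179283451229594", "-1,97E+05", "-0.18"))

# Parse each level's text; suppressWarnings() hides "NAs introduced by coercion"
parsed <- suppressWarnings(as.numeric(levels(f)))

# Levels that are not valid numbers in R's syntax
bad <- levels(f)[is.na(parsed)]
bad   # "-1,97E+05"
```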

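@sebastian-c I have now removed all values containing "E" (such as -1,97E+05) from the file, but I still get the same error. I exported it as "Text (Tab delimited)". One more thing: look at the difference between variable1 and variable2. The values of the quantitative variable are short, while those of the non-quantitative variable are long. – Letin 2013-02-28 10:08:06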
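Scientific notation on its own is not a problem for R; what it cannot parse is a comma used as decimal separator. A quick check, plus one possible fix that swaps commas for dots before converting. This assumes commas never appear as thousands separators in the file; the values in `x` are hypothetical:

```r
as.numeric("-1.97E+05")                    # -197000: E notation parses fine
suppressWarnings(as.numeric("-1,97E+05"))  # NA: the comma breaks parsing

# Possible fix, assuming commas are only ever decimal separators
x <- c("-0.183546332959017", "-1,97E+05")
x_num <- as.numeric(gsub(",", ".", x, fixed = TRUE))
x_num
```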

How did your data get from Excel into R? What you have in variable2 is a factor. – themel 2013-02-28 10:09:08

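As Arun mentioned, R reads these variables as factors, and the result is a data.frame (which is really a list). There are many ways to deal with this; one is to convert it to a numeric matrix like this: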

matrix <- as.numeric(as.matrix(data)) 
dim(matrix) <- dim(data)

Now you can run the PCA on the matrix.
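One caveat for the call in the question: the first column (`batch`) is passed as `quali.sup = 1`, so it is genuinely qualitative, and running it through `as.numeric()` would only produce `NA`s. A sketch that converts every column except the first and keeps the column names, assuming the layout implied by the question (column 1 = batch, the rest numeric targets); the toy data are hypothetical:

```r
# Hypothetical stand-in: a qualitative batch column plus factor-coded numbers
data <- data.frame(
  batch    = factor(c("b1", "b1", "b2")),
  target1  = factor(c("-0.76", "-0.97", "-0.51")),
  target79 = factor(c("-0.18", "-0.17", "-0.19"))
)

# Quantitative columns only (drop column 1); factors become their text labels
num <- as.matrix(data[, -1])

# Parse the text (not the factor codes) and rebuild the matrix shape
m <- matrix(as.numeric(num), nrow = nrow(num), dimnames = dimnames(num))

anyNA(m)   # FALSE means every value parsed
```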

Edit:

To expand the example a bit: the second part of Charlie's suggestion will not work. Copy the session below to see how it works:

d <- data.frame(
  a = factor(runif(2000)),
  b = factor(runif(2000)),
  c = factor(runif(2000)))

as.numeric(d)    # error: does not work on a list (a data frame is a list)

as.numeric(d$a)  # runs, because d$a is a vector, but this is not what you are
                 # after: R returns the integer codes of the factor levels,
                 # not the actual values.

(m <- as.numeric(as.matrix(d)))  # this does the right thing
dim(m)           # but m has lost its dimensions and is now a plain vector

dim(m) <- dim(d) # assign the dimensions of d to m

svd(m)           # you can apply the PCA function of your choice to m
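As the last line says, any PCA function can then be applied to `m`. As one example, base R's `prcomp()` (package `stats`) runs a PCA directly on a numeric matrix, and its component standard deviations match the singular values of the scaled matrix. The random matrix below only stands in for the real data:

```r
set.seed(1)
# Stand-in numeric matrix: 20 observations, 5 variables
m <- matrix(rnorm(100), nrow = 20)

# Centre and scale each column, then compute the principal components
pca <- prcomp(m, center = TRUE, scale. = TRUE)

summary(pca)         # proportion of variance explained per component
head(pca$x[, 1:2])   # scores of the first observations on PC1 and PC2

# Same component standard deviations via svd() of the scaled matrix
s <- svd(scale(m))
all.equal(pca$sdev, s$d / sqrt(nrow(m) - 1))
```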


2013-02-28 11:07:21 Edwin

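Thanks, Edwin. Let me try this and get back to you. I just took the time to rerun my analysis on the file to get back to the specific error, and I will also link to my file. I'll let you know whether it works. – Letin 2013-02-28 11:13:06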

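By default, R coerces strings to factors (this was the default for read.table() and read.csv() until R 4.0.0). This can lead to unexpected behavior. Turn this default off with: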

 read.csv(x, stringsAsFactors=F)
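With `stringsAsFactors = FALSE` the problem columns arrive as character vectors instead of factors, so `as.numeric()` parses the actual text. A short sketch with hypothetical values; for the tab-delimited file in the question, the same argument can be passed to `read.delim`:

```r
# Hypothetical two-column tab-delimited text
txt <- "a\tb\n1.5\t-0.18\n2.5\t-1,97E+05"

# Character columns instead of factors
d <- read.delim(text = txt, stringsAsFactors = FALSE)
class(d$b)   # "character"

# as.numeric() now works on the text itself; unparseable entries become NA
b_num <- suppressWarnings(as.numeric(d$b))
b_num
```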

Or, alternatively, coerce the factors to numeric:

 newVar<-as.numeric(oldVar)
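A note of caution on this line: if `oldVar` is a factor, `as.numeric()` returns the internal integer codes of the levels, not the numbers printed in them. The idioms recommended in the R FAQ (7.10) go through the level labels instead. A minimal sketch with a hypothetical factor:

```r
oldVar <- factor(c("-0.18", "-0.19", "-0.18"))

as.numeric(oldVar)                  # 1 2 1: level codes, not the values
as.numeric(as.character(oldVar))    # -0.18 -0.19 -0.18: the actual values
as.numeric(levels(oldVar))[oldVar]  # same values, parsing each level only once
```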


2013-02-28 11:18:56 charlie

Hey Charlie, thanks for your reply. But here it says: file_new <- as.numeric(file) Error: (list) object cannot be coerced to type 'double' – Letin 2013-02-28 12:22:16

That error occurs because the object 'file_new' was created as a data.frame, in which some variables are numeric and some are character (check with 'class(file_new)'). – 2013-02-28 12:55:46

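You're right, I should have been clearer. You can't coerce an entire data frame, and, as Edwin correctly pointed out, you probably don't want to. In my experience, the default conversion to factors in read.table() causes headaches, so I have set up my editor to type "stringsAsFactors = FALSE" by default. – charlie 2013-02-28 21:34:15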
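Since `as.numeric()` cannot take a whole data frame, one way to follow this advice is to convert column by column and write the results back into the data frame. A sketch, assuming every column except a qualitative `batch` column in position 1 should be numeric; the data and the helper `to_num` are hypothetical:

```r
data <- data.frame(
  batch   = factor(c("b1", "b2")),
  target1 = factor(c("-0.76", "-0.97")),
  target2 = c(0.5, 0.25)
)

# Hypothetical helper: parse factor/character columns via their text labels,
# leave numeric columns untouched
to_num <- function(x) {
  if (is.factor(x) || is.character(x)) as.numeric(as.character(x)) else x
}

# Skip the qualitative first column; assigning into data[-1] keeps a data.frame
data[-1] <- lapply(data[-1], to_num)

vapply(data, class, character(1))
```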
