2017-09-25 74 views
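Difference between as.data.frame and read.csv in R

I want to do propensity score matching with the R function matchit. If I read the data from a CSV file, everything looks fine and the result is what I want: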

> csv <- read.csv("C:/Users/Lenovo/Desktop/ddd.csv", header=TRUE) 
> df <- as.data.frame(csv) 
> df 
   PERSON_ID OUTCOME tnb gxy AGE1
1     166920       1   2   0   61
2     167350       1   2   0   65
3     167757       1   1   0   58
4     167812       1   1   0   63
5     168271       1   2   0   55
6     168426       0   2   0   47
7     168652       0   2   1   57
8     168983       0   1   0   51
9     169083       0   2   0   50
10    169172       0   2   1   53
> fm <- matchit(OUTCOME ~ tnb + AGE1, data = df, method = "nearest") 
> result <- summary(fm) 
> result 

Call: 
matchit(formula = OUTCOME ~ tnb + AGE1, data = df, method = "nearest") 

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance        0.8334        0.1666     0.2575    0.6667   0.867   0.6667  0.8964
tnb             1.6000        1.8000     0.4472   -0.2000   0.000   0.2000  1.0000
AGE1           60.4000       51.6000     3.7148    8.8000   8.000   8.8000 10.0000


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance        0.8334        0.1666     0.2575    0.6667   0.867   0.6667  0.8964
tnb             1.6000        1.8000     0.4472   -0.2000   0.000   0.2000  1.0000
AGE1           60.4000       51.6000     3.7148    8.8000   8.000   8.8000 10.0000

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance          0       0        0       0
tnb               0       0        0       0
AGE1              0       0        0       0

Sample sizes:
          Control Treated
All             5       5
Matched         5       5
Unmatched       0       0
Discarded       0       0

But if I hold the input data in vectors and then convert them to a data.frame, the summary has many rows whose names I did not define:

> OUTCOME<-c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0"); 
> PERSON_ID<-c("166920", "167350", "167757", "167812", "168271", "168426", "168652", "168983", "169083", "169172"); 
> tnb<-c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0"); 
> gxy<-c("0", "0", "1", "0", "0", "1", "0", "0", "1", "0"); 
> AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53"); 
> matrix <- cbind(PERSON_ID,OUTCOME,tnb,gxy,AGE1) 
> data <- as.data.frame(matrix, stringsAsFactors= TRUE) 
> data 
   PERSON_ID OUTCOME tnb gxy AGE1
1     166920       1   0   0   61
2     167350       1   0   0   65
3     167757       1   1   1   58
4     167812       1   0   0   63
5     168271       1   1   0   55
6     168426       0   0   1   47
7     168652       0   0   0   57
8     168983       0   1   0   51
9     169083       0   1   1   50
10    169172       0   0   0   53
> fm <- matchit(OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest", replace = TRUE, ratio = 1) 
> summary(fm) 

Call: 
matchit(formula = OUTCOME ~ tnb + gxy + AGE1, data = data, method = "nearest", 
    replace = TRUE, ratio = 1) 

Summary of balance for all data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           1.0           0.0     0.0000       1.0       1      1.0       1
tnb0               0.6           0.6     0.5477       0.0       0      0.0       0
tnb1               0.4           0.4     0.5477       0.0       0      0.0       0
gxy1               0.2           0.4     0.5477      -0.2       0      0.2       1
AGE150             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE151             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE153             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE155             0.2           0.0     0.0000       0.2       0      0.2       1
AGE157             0.0           0.2     0.4472      -0.2       0      0.2       1
AGE158             0.2           0.0     0.0000       0.2       0      0.2       1
AGE161             0.2           0.0     0.0000       0.2       0      0.2       1
AGE163             0.2           0.0     0.0000       0.2       0      0.2       1
AGE165             0.2           0.0     0.0000       0.2       0      0.2       1


Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean eQQ Max
distance           1.0           0.0     0.0000       1.0     1.0      1.0       1
tnb0               0.6           0.8     0.5657      -0.2     0.0      0.0       0
tnb1               0.4           0.2     0.5657       0.2     0.0      0.0       0
gxy1               0.2           0.8     0.5657      -0.6     0.0      0.0       0
AGE150             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE151             0.0           0.2     0.5657      -0.2     0.5      0.5       1
AGE153             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE155             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE157             0.0           0.0     0.0000       0.0     0.0      0.0       0
AGE158             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE161             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE163             0.2           0.0     0.0000       0.2     0.5      0.5       1
AGE165             0.2           0.0     0.0000       0.2     0.5      0.5       1

Percent Balance Improvement:
         Mean Diff. eQQ Med eQQ Mean eQQ Max
distance          0       0        0       0
tnb0           -Inf       0        0       0
tnb1           -Inf       0        0       0
gxy1           -200       0      100     100
AGE150          100       0      100     100
AGE151            0    -Inf     -150       0
AGE153          100       0      100     100
AGE155            0    -Inf     -150       0
AGE157          100       0      100     100
AGE158            0    -Inf     -150       0
AGE161            0    -Inf     -150       0
AGE163            0    -Inf     -150       0
AGE165            0    -Inf     -150       0

Sample sizes:
          Control Treated
All             5       5
Matched         2       5
Unmatched       3       0
Discarded       0       0

My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame, so why does matchit in R produce different results?

Please format your csv so that it displays as a table in your question – user93

Answer

"My question is: read.csv returns a data frame, and as.data.frame(x) also returns a data frame, so why does matchit in R produce different results?"

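When you use read.csv, your numeric data is probably read in as numeric, so matchit treats it as numeric. However, when you declare your variables as character:

AGE1<-c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53") 

instead of as numeric:

AGE1<-c(61, 65, 58, 63, 55, 47, 57, 51, 50, 53) 

matchit treats them as categorical. (cbind on character vectors builds a character matrix, and as.data.frame with stringsAsFactors = TRUE then turns every column into a factor, which is why the summary shows one row per level, such as AGE150 and AGE151.)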
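To see where row names like AGE150 can come from, here is a minimal sketch using base R's model.matrix. It assumes matchit expands factor covariates into indicator columns in a similar way; that is consistent with the row names in the question's output, but it is not checked against the MatchIt source here. The names as_factor and as_number are only illustrative.

```r
# AGE1 as declared in the question (character) and as numbers
AGE1_chr <- c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")
AGE1_num <- c(61, 65, 58, 63, 55, 47, 57, 51, 50, 53)

# Hypothetical one-column data frames, for illustration only
as_factor <- data.frame(AGE1 = factor(AGE1_chr))
as_number <- data.frame(AGE1 = AGE1_num)

# A factor is expanded into one indicator column per level
# (all levels except the baseline level when an intercept is present)
cols_factor <- colnames(model.matrix(~ AGE1, data = as_factor))
print(cols_factor)

# A numeric variable stays a single column
cols_number <- colnames(model.matrix(~ AGE1, data = as_number))
print(cols_number)
```

With ten distinct ages stored as a factor, each age becomes its own level, so the model no longer sees age as a single continuous covariate.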

Running str(data) and str(df) should show you the difference.
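The comparison can be reproduced without the original file. The sketch below is only an approximation of the question's setup: csv_text is an inline stand-in for ddd.csv (assuming the file holds the table printed in the question), and data_num is an illustrative name for a converted copy. It shows the two structures side by side and one possible way to turn the factor columns back into numbers.

```r
# Inline stand-in for ddd.csv (assumption: same contents as printed in the question)
csv_text <- "PERSON_ID,OUTCOME,tnb,gxy,AGE1
166920,1,2,0,61
167350,1,2,0,65
167757,1,1,0,58
167812,1,1,0,63
168271,1,2,0,55
168426,0,2,0,47
168652,0,2,1,57
168983,0,1,0,51
169083,0,2,0,50
169172,0,2,1,53"
df <- read.csv(text = csv_text, header = TRUE)

# Character vectors bound into a matrix, as in the question
OUTCOME   <- c("1", "1", "1", "1", "1", "0", "0", "0", "0", "0")
PERSON_ID <- c("166920", "167350", "167757", "167812", "168271",
               "168426", "168652", "168983", "169083", "169172")
tnb  <- c("0", "0", "1", "0", "1", "0", "0", "1", "1", "0")
gxy  <- c("0", "0", "1", "0", "0", "1", "0", "0", "1", "0")
AGE1 <- c("61", "65", "58", "63", "55", "47", "57", "51", "50", "53")
data <- as.data.frame(cbind(PERSON_ID, OUTCOME, tnb, gxy, AGE1),
                      stringsAsFactors = TRUE)

str(df)    # columns read from CSV text are integer
str(data)  # columns built from the character matrix are factors

# Convert each factor to numeric via its labels, not its internal level codes
data_num <- data
data_num[] <- lapply(data, function(col) as.numeric(as.character(col)))
str(data_num)

# With the MatchIt package loaded, the call from the question could then be rerun:
# fm <- matchit(OUTCOME ~ tnb + gxy + AGE1, data = data_num, method = "nearest")
```

Going through as.character first matters: calling as.numeric directly on a factor returns the level codes (1, 2, 3, ...) rather than the original values.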