2015-09-04 62 views
1

我試圖做的時候給予未定義列中選擇錯誤:合併()試圖安排列

我有一個包含狀態和郵政編碼列的主數據幀。我有基於郵政編碼和根據州的人口普查地區包含大都會統計區的參考數據框。我使用merge()將MSA和人口普查區域附加到主表單。我寫了一個可以自動完成的功能。由於其他事情我必須做的功能,我需要有merge()不重新排列列。我在這裏找到了一個解決方案:How to prevent merge from reordering columns

但是,我收到一個錯誤。

sheet <- merge(sheet, msa, by = "Zip", all.x = TRUE)[, union(names(sheet), names(msa))] 

Error in [.data.frame (merge(sheet, msa, by = "Zip", all.x = TRUE), , : undefined columns selected

當我刪除重新奧爾丁命令([,工會(名稱(片材),名稱(MSA))]),我不再得到的錯誤,但列重新排列。

但是,下面的命令幾乎是相同的,不給我的錯誤:

sheet <- merge(sheet, census, by = "State", all.x = TRUE)[, union(names(sheet), names(census))] 

我仔細檢查過這兩個表和兩者都包含名爲「ZIP」欄目。我還能做什麼?

事實上,我有以下命令將所有的郵政編碼轉換爲五位數字造成這種情況?

sheet$Zip <- substring(sheet$Zip, 1, 5) 

任何幫助將不勝感激。

以下是我正在使用的數據框。

萬事達片

structure(list(Source.File = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "DrBillAllergy & Immunology", class = "factor"), 
    Source.Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = "8/31/2015", class = "factor"), First.Name = structure(c(5L, 
    3L, 4L, 1L, 2L, 9L, 6L, 8L, 7L), .Label = c("Bill", "Jack", 
    "Jacob", "Joan", "Jon", "Karen", "Marc", "Rich", "William" 
    ), class = "factor"), Last.Name = structure(c(9L, 2L, 7L, 
    6L, 5L, 4L, 3L, 1L, 8L), .Label = c("Biden", "Brown", "Bush", 
    "Clinton", "Edwards", "Jackson", "Johnson", "Sanders", "Smith" 
    ), class = "factor"), Suffix = structure(c(1L, 1L, 1L, 2L, 
    1L, 1L, 1L, 1L, 1L), .Label = c("", "III"), class = "factor"), 
    Title = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "MD", class = "factor"), 
    Gender = structure(c(1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("F", 
    "M"), class = "factor"), State = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = "AL", class = "factor"), 
    Zip = structure(c(3L, 3L, 6L, 5L, 4L, 3L, 2L, 1L, 7L), .Label = c("35205-4731", 
    "35209-4159", "35209-6802", "35222-4330", "35235-3411", "35243-5129", 
    "36303-9461"), class = "factor"), County = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("Houston", "Jefferson" 
    ), class = "factor"), Phone.Dr.Bill.Allergy = c(126874384L, 
    192749635L, 984357952L, 579752485L, NA, NA, 584127544L, 587458745L, 
    532145968L), Work.Phone.Dr.Bill.Allergy = c(NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Fax.Dr.Bill.Allergy = c(9874556324, 
    NA, NA, 987456321, NA, NA, 123654789, 123654788, 321456987 
    ), Email.Dr.Bill.Allergy = structure(c(1L, 2L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("", "[email protected]"), class = "factor"), 
    Phone.internal.subspec = c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), Work.Phone.Internal.Subspec = c(NA, NA, NA, NA, NA, 
    NA, NA, NA, NA), Fax.Internal.Subspec = c(NA, NA, NA, NA, 
    587412658L, NA, NA, NA, NA), Email.internal.subspec = structure(c(1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("", "[email protected]" 
    ), class = "factor"), Phone.Combined.Allergy = c(NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), Work.Phone.Combined.Allergy = c(NA, 
    NA, NA, NA, NA, NA, NA, NA, NA), Fax.Combined.Allergy = c(NA, 
    NA, NA, NA, NA, 598264572L, 987138582L, NA, NA), Email.Combined.Allergy = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "[email protected]" 
    ), class = "factor"), Specialty1 = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = "Allergy & Immunology", class = "factor"), 
    Specialty2 = structure(c(2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 
    1L), .Label = c("", "Allergy"), class = "factor"), NPI = c(1467660217L, 
    1851347918L, 1851634489L, 1316960271L, 1437421328L, 1982886826L, 
    1285606202L, 1487708012L, 1972504397L), Specialty.Code = structure(c(2L, 
    1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("207K00000X", 
    "207KA0200X"), class = "factor"), MSA = c(1000L, 1000L, 1000L, 
    1000L, 1000L, 1000L, 1000L, 1000L, 2180L), License.Number = structure(c(3L, 
    5L, 4L, 1L, 7L, 9L, 2L, 6L, 8L), .Label = c("17948", "17992", 
    "29885", "4193", "6033", "8026", "8771", "AL16176", "MD200817" 
    ), class = "factor"), License.State = structure(c(1L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("AL", "LA"), class = "factor"), 
    Medical.School = structure(c(1L, 6L, 3L, 4L, 5L, 1L, 1L, 
    1L, 2L), .Label = c("", "Hahnemann Univ Sch Of Med, Philadelphia Pa 19102", 
    "Temple Univ Sch Of Med, Philadelphia Pa 19140", "Univ Of Al Sch Of Med, Birmingham Al 35294", 
    "Univ Of Chicago, Pritzker Sch Of Med, Chicago Il 60637", 
    "Univ Of Mi Med Sch, Ann Arbor Mi 48109"), class = "factor"), 
    Residency.Training = structure(c(1L, 5L, 3L, 4L, 2L, 1L, 
    1L, 1L, 6L), .Label = c("", "Duke Univ Med Ctr, Internal Medicine; Kaiser Permanente Med Ctr-Sf, Flexible Or Transitional Year", 
    "Duke Univ Med Ctr, Pediatrics; Univ Hosp & Clinics, Pediatrics; Columbia Presbyterian Hosp, Flexible Or Transitional Year", 
    "Johns Hopkins Bayview M C, Internal Medicine", "Univ Of Al Hosp, Internal Medicine", 
    "Univ Of Pa Hlth System, Allergy And Immunology; Hahnemann Univ Hosp, Internal Medicine" 
    ), class = "factor"), Graduation.Year = c(NA, 1971L, 1968L, 
    1984L, 1971L, NA, NA, NA, 1985L), Certifications = structure(c(1L, 
    3L, 4L, 3L, 1L, 1L, 1L, 1L, 2L), .Label = c("", "Allergy & Immunology", 
    "Allergy & Immunology; Internal Medicine", "Allergy & Immunology; Pediatrics" 
    ), class = "factor")), .Names = c("Source.File", "Source.Date", 
"First.Name", "Last.Name", "Suffix", "Title", "Gender", "State", 
"Zip", "County", "Phone.Dr.Bill.Allergy", "Work.Phone.Dr.Bill.Allergy", 
"Fax.Dr.Bill.Allergy", "Email.Dr.Bill.Allergy", "Phone.internal.subspec", 
"Work.Phone.Internal.Subspec", "Fax.Internal.Subspec", "Email.internal.subspec", 
"Phone.Combined.Allergy", "Work.Phone.Combined.Allergy", "Fax.Combined.Allergy", 
"Email.Combined.Allergy", "Specialty1", "Specialty2", "NPI", 
"Specialty.Code", "MSA", "License.Number", "License.State", "Medical.School", 
"Residency.Training", "Graduation.Year", "Certifications"), class = "data.frame", row.names = c(NA, 
-9L)) 

MSA

structure(list(Zip = c("35209", "35243", "35235", "35222"), MSA = structure(c(2L, 
2L, 2L, 1L), .Label = c("Houston", "New York"), class = "factor")), .Names = c("Zip", 
"MSA"), row.names = c(NA, -4L), class = "data.frame") 

人口普查

structure(list(State = structure(c(22L, 31L, 47L, 35L, 39L, 32L, 
20L, 7L, 40L, 8L, 21L, 9L, 46L, 50L, 28L, 41L, 11L, 10L, 2L, 
26L, 43L, 18L, 3L, 37L, 44L, 19L, 29L, 42L, 30L, 17L, 24L, 13L, 
25L, 49L, 15L, 16L, 36L, 23L, 27L, 51L, 6L, 33L, 4L, 45L, 14L, 
34L, 5L, 38L, 48L, 1L, 12L), .Label = c("AK", "AL", "AR", "AZ", 
"CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "IA", "ID", "IL", 
"IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI", "MN", "MO", "MS", 
"MT", "NC", "ND", "NE", "NH", "NJ", "NM", "NV", "NY", "OH", "OK", 
"OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", "WA", 
"WI", "WV", "WY"), class = "factor"), Census.Region = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
3L, 3L), .Label = c("Midwest", "Northeast", "Pacific", "South", 
"West"), class = "factor")), .Names = c("State", "Census.Region" 
), class = "data.frame", row.names = c(NA, -51L)) 
+0

沒有[可重現的示例](http://stackoverflow.com/a/5963610/4132844),我們可以做的事情不多。請至少提供表格,msa和普查的負責人 – scoa

+0

OP已編輯爲包含三個數據幀的測試示例。 –

+1

看看'str(merge(sheet,msa,by =「Zip」,all.x = TRUE))'; sheet和msa都有一個名爲'MSA'的列,因此在合併之後,它將成爲MSA.x和MSA.y.您應該重命名msa $ MSA,或者編輯從'union(names(sheet),names(msa))'到'c(names(sheet)[names(sheet)!=「MSA」],「MSA.x 「,」MSA.y「) – scoa

回答

0

這已經使用SCOA的建議解決。我將msa $ MSA重命名,並且該功能完美無缺。