Hello, I would like to create a new variable. I have a dataset that looks like this:
bankname bankid year totass invloc1 invamt1 invloc2 invamt2 invloc3 invamt3
Bank A 1 1881 244789 Philadelphia 7250.32 New York 20218.20 Philadelphia 29513.4
Bank B 2 1881 195755 Pittsburgh 10243.60 NA 1851.51 NA NA
Bank C 3 1881 107736 New York 13357.80 Wilkes-Barre 17761.20 NA NA
Bank D 4 1881 170600 Philadelphia 3.35 Philadelphia 2.00 NA NA
Bank E 5 1881 32000000 New York 351266.00 New York 314012.00 NA
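However, I want to create a new variable called NY_tot for each bank, using the invloc and invamt variables.
For each bank, NY_tot should be the sum of the invamt values whose matching invloc is New York. invloc1 and invamt1 go together, and likewise for the other pairs. So I want the dataset to look like this: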
bankname bankid year totass invloc1 invamt1 invloc2 invamt2 invloc3 invamt3 NY_tot
Bank A 1 1881 244789 Philadelphia 7250.32 New York 20218.20 Philadelphia 29513.4 20218.20
Bank B 2 1881 195755 Pittsburgh 10243.60 NA 1851.51 NA NA 0
Bank C 3 1881 107736 New York 13357.80 Wilkes-Barre 17761.20 NA NA 13357.80
Bank D 4 1881 170600 Philadelphia 3.35 Philadelphia 2.00 NA NA 0
Bank E 5 1881 32000000 New York 351266.00 New York 314012.00 NA 665278
Here is the data I am using:
bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E")
bankid <- c(1, 2, 3, 4, 5)
year<- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
invloc1 <-c("Philadelphia","Pittsburgh","New York","Philadelphia","New York")
invamt1<-c(7250.32,10243.6,13357.8,3.35,351266)
invloc2<-c("New York","NA","Wilkes-Barre","Philadelphia","New York")
invamt2<-c(20218.2,1851.51,17761.2,2,314012)
invloc3<-c("Philadelphia","NA","NA","NA","")
invamt3<-c(29513.4,NA,NA,NA,NA)
bankdata<-data.frame(bankname, bankid,year,totass, invloc1, invamt1, invloc2, invamt2, invloc3, invamt3)
I tried the following code on the dataset.
First, convert the factor variables (invloc) to character:
i <- sapply(bankdata, is.factor)
bankdata[i] <- lapply(bankdata[i], as.character)
Then create the new variable:
for(i in 1:nrow(bankdata)){
  bankdata$NY_tot<-0
  for(j in 1:3){
    if((!is.na(bankdata[i,paste("invloc",j,sep="")])) && (bankdata[i,paste("invloc",j,sep="")]=="New York")){
      if (!is.na(bankdata[i,paste("invamt",j,sep="")])){
        bankdata$NY_tot[i]<-bankdata$NY_tot[i]+bankdata[i,paste("invamt",j,sep="")]
      }
    }
  }
}
I get 0s in my NY_tot variable. Can you tell me why?
Thank you in advance!
Because you redefine `bankdata$NY_tot <- 0` for every row. You probably want to do that outside the loop. – shadow 2014-10-01 13:14:49
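To illustrate the point in the comment above, here is a minimal sketch of the same loop with the initialisation moved before it. It rebuilds the example data from the question so that it runs on its own; `stringsAsFactors = FALSE` and the helper names `loc` and `amt` are my own choices, not part of the original code.

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE (an assumption) keeps the locations as character.
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  invloc1  = c("Philadelphia", "Pittsburgh", "New York", "Philadelphia", "New York"),
  invamt1  = c(7250.32, 10243.6, 13357.8, 3.35, 351266),
  invloc2  = c("New York", "NA", "Wilkes-Barre", "Philadelphia", "New York"),
  invamt2  = c(20218.2, 1851.51, 17761.2, 2, 314012),
  invloc3  = c("Philadelphia", "NA", "NA", "NA", ""),
  invamt3  = c(29513.4, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Initialise the column once, before the loop, so earlier rows are not reset.
bankdata$NY_tot <- 0

for (i in seq_len(nrow(bankdata))) {
  for (j in 1:3) {
    loc <- bankdata[i, paste0("invloc", j)]  # location of the j-th investment
    amt <- bankdata[i, paste0("invamt", j)]  # amount of the j-th investment
    # Add the amount only when the location is New York and the amount is known.
    if (!is.na(loc) && loc == "New York" && !is.na(amt)) {
      bankdata$NY_tot[i] <- bankdata$NY_tot[i] + amt
    }
  }
}

bankdata$NY_tot
```

With the reset inside the outer loop, every pass overwrites the whole column with 0, so the totals already computed for earlier rows are lost.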
You are using a `for` loop where you should use vectorization. This makes the code slower. – Roland 2014-10-01 13:22:11
How can I do this more efficiently? Could you give me some sample code? Thanks, everyone. – 2014-10-01 13:29:45
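One possible vectorized approach, offered as a sketch rather than as the answer the commenters had in mind: pull the location and amount columns into two matrices, zero out every amount that is missing or not in New York, and sum across rows. The data are rebuilt here so the block runs on its own; `stringsAsFactors = FALSE` and the names `loc`, `amt`, and `is_ny` are assumptions introduced for illustration.

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE (an assumption) keeps the locations as character.
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  invloc1  = c("Philadelphia", "Pittsburgh", "New York", "Philadelphia", "New York"),
  invamt1  = c(7250.32, 10243.6, 13357.8, 3.35, 351266),
  invloc2  = c("New York", "NA", "Wilkes-Barre", "Philadelphia", "New York"),
  invamt2  = c(20218.2, 1851.51, 17761.2, 2, 314012),
  invloc3  = c("Philadelphia", "NA", "NA", "NA", ""),
  invamt3  = c(29513.4, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Character matrix of locations and numeric matrix of amounts (5 rows x 3 columns each).
loc <- as.matrix(bankdata[paste0("invloc", 1:3)])
amt <- as.matrix(bankdata[paste0("invamt", 1:3)])

# TRUE where the investment location is New York.
is_ny <- !is.na(loc) & loc == "New York"

# Zero out amounts that are not in New York or are missing, then sum per bank.
amt[!is_ny | is.na(amt)] <- 0
bankdata$NY_tot <- unname(rowSums(amt))

bankdata$NY_tot
```

This relies on invloc1/invamt1, invloc2/invamt2, and invloc3/invamt3 being in matching column order, as they are in the question's data frame.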