2014-10-06
6

Mutate multiple columns in a data frame

I have a dataset that looks like this:

bankname bankid year totass cash bond loans 
Bank A  1  1881 244789 7250 20218 29513 
Bank B  2  1881 195755 10243 185151 2800 
Bank C  3  1881 107736 13357 177612 NA 
Bank D  4  1881 170600 35000 20000 5000 
Bank E  5  1881 3200000 351266 314012 NA 

I want to calculate some ratios from the banks' balance sheets. I would like the dataset to look like this:

bankname bankid year totass cash bond loans CashtoAsset BondtoAsset LoanstoAsset 
Bank A  1  1881 2447890 7250 202100 951300 0.002 0.082 0.388 
Bank B  2  1881 195755 10243 185151 2800 0.052 0.945 0.014 
Bank C  3  1881 107736 13357 177612 NA 0.123 1.648585431 NA 
Bank D  4  1881 170600 35000 20000 5000 0.205 0.117 0.029 
Bank E  5  1881 32000000 351266 314012 NA 0.0109 0.009 NA 

Here is the data to reproduce it:

bankname <- c("Bank A","Bank B","Bank C","Bank D","Bank E") 
bankid <- c(1, 2, 3, 4, 5) 
year<- c(1881, 1881, 1881, 1881, 1881) 
totass <- c(244789, 195755, 107736, 170600, 32000000) 
cash<-c(7250,10243,13357,35000,351266) 
bond<-c(20218,185151,177612,20000,314012) 
loans<-c(29513,2800,NA,5000,NA) 
bankdata<-data.frame(bankname, bankid,year,totass, cash, bond, loans) 

First, I get rid of the NAs in the balance-sheet columns:

cols <- c("totass", "cash", "bond", "loans") 
bankdata[cols][is.na(bankdata[cols])] <- 0 

Then I calculate the ratios:

library(dplyr) 
bankdata<-mutate(bankdata,CashtoAsset = cash/totass) 
bankdata<-mutate(bankdata,BondtoAsset = bond/totass) 
bankdata<-mutate(bankdata,loanstoAsset =loans/totass) 

But instead of calculating all these ratios line by line, I would like to write a loop that does them all at once. In Stata, I would do:

foreach x of varlist cash bond loans { 
by bankid: gen `x'toAsset = `x'/ totass 
} 

How can I do this?

+1

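Meta-comment: when translating from one language to another, you don't have to be overly literal. Loops in Stata often work better in R as array-based computations. (Even the converse may be true: newcomers to Stata from other languages often try to loop over observations, which is rarely needed.) – 2014-10-06 17:42:06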
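A minimal base-R sketch of the array-based approach this comment describes, assuming the `bankdata` from the question with the NAs left in place. Dividing a data frame by a vector recycles the vector down each column, so all three ratios come from one assignment with no explicit loop:

```r
# Data from the question; the NAs in loans are deliberately kept
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

cols <- c("cash", "bond", "loans")
# data frame / vector: totass is recycled down each selected column,
# so every column is divided row by row by that bank's total assets
bankdata[paste0(cols, "toAsset")] <- bankdata[cols] / bankdata$totass

bankdata
```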

+0

I simplified my variables, but in my dataset I have more than 20 asset classes, so having a loop would be helpful. – 2014-10-06 18:16:55

+0

I have no objection to loops; equally, a typical R user would surely be comfortable handling 20 columns without one.... – 2014-10-06 18:17:47
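For the 20-plus asset classes mentioned above, a sketch that avoids typing every column name. It assumes, hypothetically, that every column other than `bankname`, `bankid`, `year` and `totass` is an asset class; `id_cols` would need adjusting for the real data:

```r
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Hypothetical convention: everything except these columns is an asset class
id_cols    <- c("bankname", "bankid", "year", "totass")
asset_cols <- setdiff(names(bankdata), id_cols)  # "cash" "bond" "loans" here

# One ratio vector per asset column, named e.g. "cashtoAsset"
ratios <- lapply(bankdata[asset_cols], function(col) col / bankdata$totass)
names(ratios) <- paste0(asset_cols, "toAsset")

# Add all ratio columns in a single assignment
bankdata[names(ratios)] <- ratios
```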

Answers

0

You may be making this slightly harder than it needs to be. Try this and see whether it produces what you need.

bankdata$CashtoAsset <- bankdata$cash/bankdata$totass 
bankdata$BondtoAsset <- bankdata$bond/bankdata$totass 
bankdata$loantoAsset <- bankdata$loans/bankdata$totass 
bankdata 

This yields:

bankname bankid year totass cash bond loans CashtoAsset BondtoAsset loantoAsset 
1 Bank A  1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506 
2 Bank B  2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359 
3 Bank C  3 1881 107736 13357 177612  0 0.12397899 1.648585431 0.00000 
4 Bank D  4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832 
5 Bank E  5 1881 32000000 351266 314012  0 0.01097706 0.009812875 0.00000000 

This should get you started in the right direction.

0

This is one big drawback of dplyr: as far as I know, there is no direct way to use it other than interactively, short of "hacks" like the dreaded eval(parse(text=foo)) idiom.

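The simplest approach is the same as in Stata, but string manipulation in R is slightly more verbose than in Stata (or in any other scripting language, for that matter).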

for (x in c("cash", "bond", "loans")) { 
    bankdata[sprintf("%stoAsset", x)] <- bankdata[x]/bankdata$totass # or, equivalently, bankdata["totass"] for a consistent "look" 
    ## can also replace `sprintf("%stoAsset", x)` with `paste0(x, "toAsset")` or `paste(x, "toAsset", sep = "")`, depending on what makes more sense to you.
} 
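A quick check of the name-building options: `sprintf()`, `paste0()` and `paste(..., sep = "")` all give the same names (and are vectorized, so all names can be built in one call), while `collapse = ""` or wrapping the arguments in `c()` does not do the same job:

```r
x <- c("cash", "bond", "loans")

# Three equivalent ways to build the new column names
n1 <- sprintf("%stoAsset", x)
n2 <- paste0(x, "toAsset")
n3 <- paste(x, "toAsset", sep = "")
n1  # c("cashtoAsset", "bondtoAsset", "loanstoAsset")

# Pitfalls: collapse= is not sep=, and c() inside paste0 does not join strings
paste("cash", "toAsset", collapse = "")  # "cash toAsset" (default sep is " ")
paste0(c("cash", "toAsset"))             # c("cash", "toAsset"), still two strings
```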

To make the whole thing more Stata-like, you can wrap it all in within, like this:

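bankdata <- within(bankdata, {
    # creates cashtoAsset, bondtoAsset, loanstoAsset instead of overwriting cash, bond, loans
    for (x in c("cash", "bond", "loans")) assign(paste0(x, "toAsset"), get(x)/totass)
    rm(x)  # drop the loop variable so within() does not add it as a column
})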

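However, this requires the get and assign functions, which are somewhat hacky and not as safe to use, although in your case it is probably not a big deal. For example, I would not recommend trying similar tricks with dplyr: because dplyr abuses R's non-standard evaluation features, it is probably more trouble than it's worth. For a faster and possibly better solution, look at the data.table package, which (I think) lets you use Stata-like loop syntax with dplyr-like speed. See the package vignettes on CRAN.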

Also, are you really, really sure you want to recode the NA entries as 0?
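The choice matters for anything computed from the ratios afterwards. A small sketch using the `loans` and `totass` values from the question: keeping NA leaves the ratio unknown, while recoding to 0 asserts that the bank held no loans and pulls averages toward zero:

```r
# loans and totass from the question
loans  <- c(29513, 2800, NA, 5000, NA)
totass <- c(244789, 195755, 107736, 170600, 32000000)

ratio_keep_na <- loans / totass                            # NA where loans are unknown
ratio_zero    <- replace(loans, is.na(loans), 0) / totass  # 0 where loans are unknown

# Average over banks that report loans vs. counting missing loans as zero
avg_keep_na <- mean(ratio_keep_na, na.rm = TRUE)
avg_zero    <- mean(ratio_zero)
c(avg_keep_na = avg_keep_na, avg_zero = avg_zero)
```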

14

Update (as of December 2, 2017)

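Since I answered this question, I have realized that some SO users have been checking this answer. The dplyr package has changed since then, so I am leaving the following update. I hope it helps some R users learn how to use mutate_at().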

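mutate_each() is now deprecated; use mutate_at() instead. You specify the columns to which your function is applied in .vars. One way is to use vars(). Another is to use a character vector of column names. Yet another is to specify the columns by number (e.g. 5:7 in this case). The custom function you want to apply goes in .funs. Note that if you use group_by(), you need to change the column positions accordingly. Have a look at this question.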

bankdata %>% 
mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans)) 

bankdata %>% 
mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans")) 

bankdata %>% 
mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7) 

# bankname bankid year totass cash bond loans cash_toAsset bond_toAsset loans_toAsset 
#1 Bank A  1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506 
#2 Bank B  2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359 
#3 Bank C  3 1881 107736 13357 177612 NA 0.12397899 1.648585431   NA 
#4 Bank D  4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832 
#5 Bank E  5 1881 32000000 351266 314012 NA 0.01097706 0.009812875   NA 
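In dplyr 1.0.0 and later, `mutate_at()` has been superseded by `across()` (and `funs()` is deprecated). The `.names` argument of `across()` builds the new column names directly, so no renaming step is needed afterwards. A sketch, assuming a dplyr version that provides `across()`:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides across()

bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

out <- bankdata %>%
  mutate(across(c(cash, bond, loans),          # or cash:loans, or 5:7
                ~ .x / totass,                 # .x is the current column
                .names = "{.col}toAsset"))     # e.g. "cashtoAsset"
out
```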

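I deliberately named the custom function in .funs toAsset, because that helps me arrange the new column names. Previously I used rename(), but I think cleaning up the column names with gsub() is much easier with this approach. If you save the result above as out, run the following code to remove the _ from the column names: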

names(out) <- gsub(names(out), pattern = "_", replacement = "") 
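One caveat: `gsub()` removes every underscore in every column name, including any that belong to the original columns. Anchoring the pattern to the suffix touches only the generated names. The name `bank_id` below is hypothetical, chosen only to show the difference:

```r
# "bank_id" is a hypothetical original column that happens to contain "_"
nm <- c("bank_id", "cash_toAsset", "bond_toAsset", "loans_toAsset")

all_removed <- gsub("_", "", nm)                # also changes "bank_id"
suffix_only <- sub("_toAsset$", "toAsset", nm)  # only the generated names

all_removed  # c("bankid", "cashtoAsset", "bondtoAsset", "loanstoAsset")
suffix_only  # c("bank_id", "cashtoAsset", "bondtoAsset", "loanstoAsset")
```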

Original answer

I think you can save some typing with dplyr this way. The downside is that you overwrite cash, bond, and loans.

bankdata %>% 
    group_by(bankname) %>% 
    mutate_each(funs(whatever = ./totass), cash:loans) 

# bankname bankid year totass  cash  bond  loans 
#1 Bank A  1 1881 244789 0.02961734 0.082593581 0.12056506 
#2 Bank B  2 1881 195755 0.05232561 0.945830247 0.01430359 
#3 Bank C  3 1881 107736 0.12397899 1.648585431   NA 
#4 Bank D  4 1881 170600 0.20515826 0.117233294 0.02930832 
#5 Bank E  5 1881 32000000 0.01097706 0.009812875   NA 

If you want your expected result, I think some typing is necessary. The renaming part seems to be something you have to do.

bankdata %>% 
    group_by(bankname) %>% 
    summarise_each(funs(whatever = ./totass), cash:loans) %>% 
    rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans) -> ana; 
    ana %>% 
    merge(bankdata,., by = "bankname") 

# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset 
#1 Bank A  1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506 
#2 Bank B  2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359 
#3 Bank C  3 1881 107736 13357 177612 NA 0.12397899 1.648585431   NA 
#4 Bank D  4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832 
#5 Bank E  5 1881 32000000 351266 314012 NA 0.01097706 0.009812875   NA 
+0

Hi, I am trying all the different options posted here. When I try your code, I get 'Error: object 'ana' not found'. Could you explain what is happening? Thanks. – 2014-10-06 19:36:37

+0

@HPark I assigned the output to an object, ana, in the middle of the pipeline. If this approach does not work for you, you can do this: 'ana <- bankdata %>% group_by(bankname) %>% summarise_each(funs(whatever = ./totass), cash:loans) %>% rename(cashtoAsset = cash, bondtoAsset = bond, loanstoAsset = loans); ana %>% merge(bankdata, ., by = "bankname")' – jazzurro 2014-10-07 00:06:04

0

Try:

for(i in 5:7){ 
    bankdata[,(i+3)] = bankdata[,i]/bankdata[,4] 
} 
names(bankdata)[(5:7)+3] = paste0(names(bankdata)[5:7], 'toAsset') 

Output:

bankdata 
    bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset 
1 Bank A  1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506 
2 Bank B  2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359 
3 Bank C  3 1881 107736 13357 177612  0 0.12397899 1.648585431 0.00000000 
4 Bank D  4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832 
5 Bank E  5 1881 32000000 351266 314012  0 0.01097706 0.009812875 0.00000000 
2

Here is a data.table solution.

library(data.table) 
setDT(bankdata) 
bankdata[, paste0(names(bankdata)[5:7], "toAsset") := 
      lapply(.SD, function(x) x/totass), .SDcols=5:7] 
bankdata 
# bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset 
# 1: Bank A  1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506 
# 2: Bank B  2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359 
# 3: Bank C  3 1881 107736 13357 177612  0 0.12397899 1.648585431 0.00000000 
# 4: Bank D  4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832 
# 5: Bank E  5 1881 32000000 351266 314012  0 0.01097706 0.009812875 0.00000000 
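The same data.table call can select columns by name rather than by position, which keeps working if columns are added or reordered. A sketch, assuming the column names from the question:

```r
library(data.table)

bankdata <- data.table(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

cols <- c("cash", "bond", "loans")
# := adds the new columns by reference; .SDcols selects by name, not position
bankdata[, paste0(cols, "toAsset") := lapply(.SD, function(x) x / totass), .SDcols = cols]
bankdata
```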
2

Using apply and cbind:

bankdata <- cbind(bankdata, apply(bankdata[,5:7], 2, function(x) x/bankdata$totass)) 
names(bankdata)[8:10] <- paste0(names(bankdata)[5:7], 'toAsset') 

> bankdata 
    bankname bankid year totass cash bond loans cashtoAsset bondtoAsset loanstoAsset 
1 Bank A  1 1881 244789 7250 20218 29513 0.02961734 0.082593581 0.12056506 
2 Bank B  2 1881 195755 10243 185151 2800 0.05232561 0.945830247 0.01430359 
3 Bank C  3 1881 107736 13357 177612 NA 0.12397899 1.648585431   NA 
4 Bank D  4 1881 170600 35000 20000 5000 0.20515826 0.117233294 0.02930832 
5 Bank E  5 1881 32000000 351266 314012 NA 0.01097706 0.009812875   NA
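Base R's `sweep()` is another way to do the row-wise division without a loop (a different function from the `apply()` call above). A sketch with `MARGIN = 1`, so row i of the asset matrix is divided by `totass[i]`:

```r
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = c(1881, 1881, 1881, 1881, 1881),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

asset_cols <- c("cash", "bond", "loans")
# MARGIN = 1: each row of the matrix is divided ("/") by that row's totass
ratios <- sweep(as.matrix(bankdata[asset_cols]), 1, bankdata$totass, "/")
colnames(ratios) <- paste0(asset_cols, "toAsset")

# cbind() keeps the matrix column names for the new columns
bankdata <- cbind(bankdata, ratios)
```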