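Forward-period correlation in R

I'm not sure whether this is the right kind of question for this site, but I need help making my code more efficient. I think it can be done more efficiently; I'm just bad at writing functions, and seeing an answer might help me improve.
For example: I have time series data and want to calculate the correlation between an indicator Y and the forward-period changes of my X values (multiple X series). (The dput is at the end.)
My solution: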
str(data.dt)
Classes ‘data.table’ and 'data.frame': 210 obs. of 3 variables:
$ id : chr "X1" "X1" "X1" "X1" ...
$ date : Date, format: "2016-11-18" "2016-11-25" "2016-12-02" "2016-12-09" ...
$ PX_LAST: num 2.72 2.76 2.86 2.81 2.83 ...
- attr(*, ".internal.selfref")=<externalptr>
#separate indicator value
y.dt <- data.dt[id=="Y"]
#add indicator as own column for each X
step1.dt <- y.dt[data.dt, on="date"]
#rename
correl.dt <- step1.dt[, .(date=date, x_id=i.id, x_value=i.PX_LAST, y_id = id, y_value=PX_LAST)]
#discard NAs and Y from x_id
correl.dt <- na.omit(correl.dt[x_id != "Y"])
#calculate change for each X
correl.dt[, x.chg := c(rep(NA, 1), diff(x_value, 1)), by=list(x_id)]
#create forward change by leading changes
correl.dt[, fwd.xchg := shift(x.chg, type='lead', 1), by = list(x_id)]
#create multiple Y changes to test correlations
correl.dt[, y.chg1 := c(rep(NA, 1), diff(y_value, 1)), by=list(x_id)]
correl.dt[, y.chg2 := c(rep(NA, 2), diff(y_value, 2)), by=list(x_id)]
correl.dt[, y.chg3 := c(rep(NA, 3), diff(y_value, 3)), by=list(x_id)]
correl.dt[, y.chg4 := c(rep(NA, 4), diff(y_value, 4)), by=list(x_id)]
correl.dt[, y.chg5 := c(rep(NA, 5), diff(y_value, 5)), by=list(x_id)]
correl.dt[, y.chg6 := c(rep(NA, 6), diff(y_value, 6)), by=list(x_id)]
#cbind results together
cbind(correl.dt[, cor(fwd.xchg, y.chg1, method='spearman', use='pairwise'), by=.(x_id)],
      correl.dt[, cor(fwd.xchg, y.chg2, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg3, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg4, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg5, method='spearman', use='pairwise'), by=.(x_id)][,2],
      correl.dt[, cor(fwd.xchg, y.chg6, method='spearman', use='pairwise'), by=.(x_id)][,2])
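The results are not meaningful, because I only have a very small subset here. I also chose short correlation periods to fit that subset. Any help is appreciated: what is the best way to test forward correlations? I have fallen in love with data.table; I'm not very good at it yet, but I'm improving. I have about 100-200 indicators to test.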
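One way the repeated y.chg lines and the chain of cbind calls might be collapsed is sketched below. This is only a minimal sketch under stated assumptions: `fwd_cor`, its arguments `lags` and `y_name`, and the randomly generated `toy.dt` are hypothetical names made up for illustration, and the data is assumed to be in the same long layout (id, date, PX_LAST) as data.dt. The definitions are kept the same as in the code above: next-period change in X, trailing k-period change in Y, Spearman correlation per x_id.

```r
library(data.table)

# Hypothetical toy data in the same long layout as data.dt (id, date, PX_LAST)
set.seed(1)
dates <- as.Date("2016-11-18") + 7 * (0:29)
toy.dt <- data.table(
  id      = rep(c("X1", "X2", "Y"), each = 30),
  date    = rep(dates, times = 3),
  PX_LAST = c(cumsum(rnorm(30)), cumsum(rnorm(30)), cumsum(rnorm(30)))
)

# Hypothetical helper: correlation of the next-period change in each X
# with the trailing k-period change in the indicator, for every k in `lags`
fwd_cor <- function(dt, lags = 1:6, y_name = "Y") {
  # split indicator and X series, then attach the indicator to every X row by date
  y.dt <- dt[id == y_name, .(date, y_value = PX_LAST)]
  x.dt <- dt[id != y_name, .(x_id = id, date, x_value = PX_LAST)]
  m.dt <- merge(x.dt, y.dt, by = "date")
  setkey(m.dt, x_id, date)  # sort so shift() runs in date order within each X

  # forward change: x[t+1] - x[t]
  m.dt[, fwd.xchg := shift(x_value, 1L, type = "lead") - x_value, by = x_id]

  # trailing changes y[t] - y[t-k], one new column per lag
  chg_cols <- paste0("y.chg", lags)
  m.dt[, (chg_cols) := lapply(lags, function(k) y_value - shift(y_value, k)),
       by = x_id]

  # one row per X, one correlation column per lag
  m.dt[, lapply(.SD, function(y)
         cor(fwd.xchg, y, method = "spearman", use = "pairwise.complete.obs")),
       by = x_id, .SDcols = chg_cols]
}

res <- fwd_cor(toy.dt)
print(res)
```

With the real data the call would presumably be fwd_cor(data.dt). Adding more lags is then just a matter of changing `lags`, although with only 30 weekly observations per series the longer lags leave few usable pairs.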
Here is the dput:
structure(list(id = c("X1", "X1", "X1", "X1", "X1", "X1", "X1",
"X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1",
"X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1", "X1",
"X1", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2",
"X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2",
"X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X2", "X3", "X3",
"X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3",
"X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3", "X3",
"X3", "X3", "X3", "X3", "X3", "X3", "X4", "X4", "X4", "X4", "X4",
"X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4",
"X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4", "X4",
"X4", "X4", "X4", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5",
"X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5",
"X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5", "X5",
"X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6",
"X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6",
"X6", "X6", "X6", "X6", "X6", "X6", "X6", "X6", "Y", "Y", "Y",
"Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",
"Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y",
"Y"), date = structure(c(17123L, 17130L, 17137L, 17144L, 17151L,
17158L, 17165L, 17172L, 17179L, 17186L, 17193L, 17200L, 17207L,
17214L, 17221L, 17228L, 17235L, 17242L, 17249L, 17256L, 17263L,
17270L, 17277L, 17284L, 17291L, 17298L, 17305L, 17312L, 17319L,
17326L, 17123L, 17130L, 17137L, 17144L, 17151L, 17158L, 17165L,
17172L, 17179L, 17186L, 17193L, 17200L, 17207L, 17214L, 17221L,
17228L, 17235L, 17242L, 17249L, 17256L, 17263L, 17270L, 17277L,
17284L, 17291L, 17298L, 17305L, 17312L, 17319L, 17326L, 17123L,
17130L, 17137L, 17144L, 17151L, 17158L, 17165L, 17172L, 17179L,
17186L, 17193L, 17200L, 17207L, 17214L, 17221L, 17228L, 17235L,
17242L, 17249L, 17256L, 17263L, 17270L, 17277L, 17284L, 17291L,
17298L, 17305L, 17312L, 17319L, 17326L, 17123L, 17130L, 17137L,
17144L, 17151L, 17158L, 17165L, 17172L, 17179L, 17186L, 17193L,
17200L, 17207L, 17214L, 17221L, 17228L, 17235L, 17242L, 17249L,
17256L, 17263L, 17270L, 17277L, 17284L, 17291L, 17298L, 17305L,
17312L, 17319L, 17326L, 17123L, 17130L, 17137L, 17144L, 17151L,
17158L, 17165L, 17172L, 17179L, 17186L, 17193L, 17200L, 17207L,
17214L, 17221L, 17228L, 17235L, 17242L, 17249L, 17256L, 17263L,
17270L, 17277L, 17284L, 17291L, 17298L, 17305L, 17312L, 17319L,
17326L, 17123L, 17130L, 17137L, 17144L, 17151L, 17158L, 17165L,
17172L, 17179L, 17186L, 17193L, 17200L, 17207L, 17214L, 17221L,
17228L, 17235L, 17242L, 17249L, 17256L, 17263L, 17270L, 17277L,
17284L, 17291L, 17298L, 17305L, 17312L, 17319L, 17326L, 17123L,
17130L, 17137L, 17144L, 17151L, 17158L, 17165L, 17172L, 17179L,
17186L, 17193L, 17200L, 17207L, 17214L, 17221L, 17228L, 17235L,
17242L, 17249L, 17256L, 17263L, 17270L, 17277L, 17284L, 17291L,
17298L, 17305L, 17312L, 17319L, 17326L), class = "Date"), PX_LAST = c(2.719,
2.761, 2.863, 2.815, 2.831, 2.872, 2.765, 2.681, 2.692, 2.783,
2.779, 2.795, 2.696, 2.803, 2.73, 2.807, 2.977, 2.861, 2.75,
2.701, 2.551, 2.474, 2.538, 2.575, 2.648, 2.635, 2.475, 2.41,
2.412, 2.373, 1.579, 1.56, 1.619, 1.73, 1.833, 1.796, 1.721,
1.731, 1.715, 1.751, 1.782, 1.766, 1.697, 1.711, 1.607, 1.702,
1.811, 1.761, 1.642, 1.625, 1.596, 1.494, 1.47, 1.547, 1.542,
1.571, 1.475, 1.445, 1.4, 1.413, 1.455, 1.417, 1.38, 1.453, 1.438,
1.345, 1.239, 1.383, 1.364, 1.431, 1.471, 1.352, 1.256, 1.211,
1.078, 1.185, 1.231, 1.244, 1.196, 1.139, 1.075, 1.043, 1.034,
1.085, 1.117, 1.086, 1.093, 1.012, 1.038, 1.02, 0.272, 0.24,
0.281, 0.365, 0.314, 0.221, 0.208, 0.298, 0.338, 0.421, 0.462,
0.412, 0.32, 0.302, 0.186, 0.356, 0.485, 0.435, 0.403, 0.328,
0.228, 0.187, 0.253, 0.317, 0.418, 0.391, 0.368, 0.331, 0.274,
0.268, 2.3548, 2.3572, 2.3831, 2.4675, 2.5916, 2.5373, 2.4443,
2.4193, 2.3964, 2.4668, 2.4843, 2.4648, 2.4073, 2.4147, 2.3117,
2.478, 2.5745, 2.5005, 2.4123, 2.3874, 2.3822, 2.2374, 2.248,
2.2802, 2.3487, 2.3257, 2.2346, 2.2465, 2.1591, 2.1538, 0.517,
0.534, 0.559, 0.611, 0.64, 0.615, 0.556, 0.628, 0.628, 0.699,
0.749, 0.71, 0.665, 0.678, 0.549, 0.694, 0.774, 0.75, 0.673,
0.605, 0.548, 0.516, 0.564, 0.587, 0.653, 0.572, 0.518, 0.514,
0.425, 0.43, 0.8906, 0.895, 0.8999, 0.9062, 0.89, 0.8864, 0.8802,
0.8839, 0.8964, 0.899, 0.9145, 0.9039, 0.9054, 0.9044, 0.8934,
0.8978, 0.9041, 0.9048, 0.8979, 0.9023, 0.892, 0.8842, 0.8942,
0.9107, 0.9121, 0.9163, 0.8944, 0.8965, 0.8995, 0.8965)), row.names = c(NA,
-210L), class = c("data.table", "data.frame"), .Names = c("id",
"date", "PX_LAST"), .internal.selfref = <pointer: 0x003c24a0>)
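Before I try anything silly, I have a side question for the data.table connoisseurs: is there any risk in using this 'dput' with an explicit pointer in it? Something like overwriting memory?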
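On the pointer question, a hedged note: `<pointer: 0x003c24a0>` is not valid R syntax, so pasting the dput output unchanged should stop with a parse error before anything is evaluated. A common workaround is to delete the `.internal.selfref = ...` argument by hand and rebuild the table. The sketch below uses a shortened, hypothetical stand-in for the dput (four rows only, named `raw`), not the full data.

```r
library(data.table)

# Shortened, hypothetical stand-in for the dput above, with the
# `.internal.selfref = <pointer: ...>` argument removed by hand
raw <- structure(list(
  id      = c("X1", "X1", "Y", "Y"),
  date    = structure(c(17123L, 17130L, 17123L, 17130L), class = "Date"),
  PX_LAST = c(2.719, 2.761, 0.8906, 0.895)),
  row.names = c(NA, -4L), class = "data.frame")

# Convert to a fresh data.table; it gets its own self-reference,
# so nothing from the original session's pointer is reused
data.dt <- as.data.table(raw)
str(data.dt)
```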