對於兩種情況,我有以下數據集w
和關鍵變量x
。二進制搜索像概念在R中創建子集數據
Case 1:
x = 4
w = c(1,2,4,4,4,4,6,7,8,9,10,11,12,14,15)
Case2:
x = 12
w = c(1,2,4,4,4,4,6,7,8,9,10,11,12,14,15)
我想創建這將爲x
通過搜索數據集w
,將在w
子集原始數據集大小的數據集下按x
的位置的功能。輸出將是具有與搜索關鍵字相同的上限值的較小大小的數據集。下面是我想中的R寫入功能:
create_chunk <- function(val, tab, L=1L, H=length(tab))
{
if(H >= L)
{
mid = L + ((H-L)/2)
## If the element is present within middle length
if(tab[mid] > val)
{
## subset the original data in reduced size and again do mid position value checking
## then subset the data
} else
{
mid = mid + (mid/2)
## Increase the mid position to go for right side checking
}
}
}
在輸出我要尋找如下:
Output for Case 1:
Dataset containing: 1,2,4,4,4,4
Output for Case 2:
Dataset containing: 1,2,4,4,4,4,6,7,8,9,10,11,12
Please note:
1. Dataset may contain duplicate values for search key and
all the duplicate values are expected in the output dataset.
2. I have huge size datasets (say around 2M rows) from
where I am trying to subset smaller dataset as per my requirement of search key.
新更新:案例3
輸入數據:
date value size stockName
1 2016-08-12 12:44:43 10093.40 4 HWA IS Equity
2 2016-08-12 12:44:38 10093.35 2 HWA IS Equity
3 2016-08-12 12:44:47 10088.00 2 HWA IS Equity
4 2016-08-12 12:44:52 10089.95 1 HWA IS Equity
5 2016-08-12 12:44:53 10089.95 1 HWA IS Equity
6 2016-08-12 12:44:54 10088.95 1 HWA IS Equity
搜索關鍵字是:10089.95
in value colu MN。
預期成果是:
date value size stockName
1 2016-08-12 12:44:47 10088.00 2 HWA IS Equity
2 2016-08-12 12:44:54 10088.95 1 HWA IS Equity
3 2016-08-12 12:44:52 10089.95 1 HWA IS Equity
4 2016-08-12 12:44:53 10089.95 1 HWA IS Equity
你自己的功能有什麼問題? – 989
我沒有獲得第二個數據集的成功。如果匹配變量存在,我也希望提供關於選擇重複值的建議。 – Zico
看起來你正在尋找'?findInterval' - 'w [seq_len(findInterval(4,w))]' –