SQL: get all pairs and triples from a single column and count their frequency over another column

The input is a simple table with two columns, user_id and item_id (both text).

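The question is: what is a way to extract all pair and triple combinations of item_id values and count how frequently each occurs across user_id (for example, 1% of all users have the item_id pair (1, 2))?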

I have tried something brute-force:

select FirstID, SecondID, count(user_id) 
from 
(
SELECT 
    t1.item_id as FirstID, 
    t2.item_id as SecondID 

FROM 
(
    SELECT item_id, ROW_NUMBER()OVER(ORDER BY item_id) as Inc 
    FROM t1 
) t1 
LEFT JOIN 
(
    SELECT item_id, ROW_NUMBER()OVER(ORDER BY item_id)-1 as Inc 
    FROM t1 
) t2 ON t2.Inc = t1.Inc 
) t3 join upg_log on t3.FirstID = upg_log.item_id and t3.SecondID = upg_log.item_id 
group by FirstID, SecondID

But it got me nowhere.

2016-08-21 Sasha Korekov

Please provide sample data (as INSERT statements) and the expected results.

This particular task is one of those that are easier to write than to execute:

declare @t table (
    UserId int not null, 
    ItemId int not null 
); 

insert into @t 
values 
    (1, 1), 
    (1, 2), 
    (1, 3), 
    (2, 1), 
    (2, 2), 
    (3, 2), 
    (3, 3), 
    (4, 1), 
    (4, 4), 
    (5, 4); 

-- Pairs 
select t1.ItemId as [Item1], t2.ItemId as [Item2], count(*) as [UserCount] 
from @t t1 
    inner join @t t2 on t1.UserId = t2.UserId and t1.ItemId < t2.ItemId 
group by t1.ItemId, t2.ItemId 
order by UserCount desc, t1.ItemId, t2.ItemId;

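As you can see, this is a semi-Cartesian (triangular) self-join: every item a user has is matched against every larger item of the same user, so performance degrades quickly as the number of rows grows. And, of course, proper indexing is critical for this kind of query.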
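To illustrate the indexing point, here is a minimal sketch that assumes the data lives in a permanent table rather than a table variable. The table name dbo.UserItems and the constraint name PK_UserItems are hypothetical and do not come from the question. A composite key on (UserId, ItemId) should let the self-join seek on UserId and range-scan ItemId, and it also guarantees that each user/item combination is stored once, so count(*) per pair equals the number of users.

```sql
-- Hypothetical permanent table; all names here are assumptions for illustration.
create table dbo.UserItems (
    UserId int not null,
    ItemId int not null,
    -- Clustered composite key: supports the join on UserId plus the
    -- ItemId < ItemId comparison, and rejects duplicate (UserId, ItemId) rows.
    constraint PK_UserItems primary key clustered (UserId, ItemId)
);
```

Whether this is the best index depends on the real data volume and distribution, so it is worth checking the execution plan.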

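In theory, you can easily extend this approach to identify triples, but it may prove infeasible on your real data, because each additional join multiplies the number of intermediate rows. Ideally, such things should be computed with a row-by-row approach and the results cached.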
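A sketch of the triple extension, using the same sample data as the pairs query. It adds one more self-join with the same ordering condition, so each unordered triple is produced once per user. The UserPercent column and the @users variable are additions for illustration, meant to express the count as a share of all users as the question asks. The sketch assumes each (UserId, ItemId) combination appears at most once; otherwise count(*) would overcount.

```sql
declare @t table (
    UserId int not null,
    ItemId int not null
);

insert into @t
values
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2),
    (3, 2), (3, 3),
    (4, 1), (4, 4),
    (5, 4);

-- Total number of distinct users, used as the denominator for the percentage.
declare @users int = (select count(distinct UserId) from @t);

-- Triples: Item1 < Item2 < Item3 keeps one ordering per combination.
select t1.ItemId as [Item1],
       t2.ItemId as [Item2],
       t3.ItemId as [Item3],
       count(*) as [UserCount],
       100.0 * count(*) / @users as [UserPercent]
from @t t1
    inner join @t t2 on t2.UserId = t1.UserId and t1.ItemId < t2.ItemId
    inner join @t t3 on t3.UserId = t2.UserId and t2.ItemId < t3.ItemId
group by t1.ItemId, t2.ItemId, t3.ItemId
order by UserCount desc, t1.ItemId, t2.ItemId, t3.ItemId;
```

With this sample data only user 1 has three items, so the query returns a single triple (1, 2, 3) with a user count of 1, which is 20 percent of the five users. The same UserPercent expression can be added to the pairs query.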

2016-08-22 01:03:50
