2016-09-27

SQL/Vertica - grouping by a combination of multiple attributes

I have a dataset of the following form:

user_id  country1  city1      country2  city2
1        usa       new york   france    paris
2        usa       dallas     japan     tokyo
3        india     mumbai     italy     rome
4        france    paris      usa       new york
5        brazil    sao paulo  russia    moscow

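I want to group by the combination of country1, city1, country2 and city2, where the order (which location is stored in country1/city1 and which in country2/city2) should not matter. Normally I would try: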

SELECT country1 
     , city1 
     , country2 
     , city2 
     , COUNT(*) 
FROM dataset 
GROUP BY country1 
     , city1 
     , country2 
     , city2 

However, this snippet treats the rows with user_id=1 and user_id=4 as two separate cases, whereas I want them to be treated as the same combination.
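The problem is easy to reproduce on the sample data. The sketch below is a hypothetical harness, not part of the original post: it loads the five rows into an in-memory SQLite database through Python's standard sqlite3 module (standing in for Vertica) and runs the question's GROUP BY, with an ORDER BY added only to make the output deterministic. Every group comes back with a count of 1, including the two mirror-image rows for user_id=1 and user_id=4.

```python
import sqlite3

# Hypothetical harness: an in-memory SQLite table standing in for Vertica's `dataset`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (user_id INTEGER, country1 TEXT, city1 TEXT, "
    "country2 TEXT, city2 TEXT)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [
        (1, "usa", "new york", "france", "paris"),
        (2, "usa", "dallas", "japan", "tokyo"),
        (3, "india", "mumbai", "italy", "rome"),
        (4, "france", "paris", "usa", "new york"),
        (5, "brazil", "sao paulo", "russia", "moscow"),
    ],
)

# The question's order-sensitive query (ORDER BY added for stable output).
naive = conn.execute(
    """
    SELECT country1, city1, country2, city2, COUNT(*)
    FROM dataset
    GROUP BY country1, city1, country2, city2
    ORDER BY country1, city1
    """
).fetchall()

for row in naive:
    print(row)
```

Rows 1 and 4 appear as ('france', 'paris', 'usa', 'new york', 1) and ('usa', 'new york', 'france', 'paris', 1): the same pair of locations, counted twice.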

Does anyone know how to solve this?

Thanks in advance!

Answer


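Normally you would use least() and greatest() for this kind of problem, but here each side of the pair consists of two columns (country and city) rather than one. So instead, we decide the order by comparing the cities. I'm guessing that city is more distinctive than country: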

select (case when city1 < city2 then country1 else country2 end) as country1, 
     (case when city1 < city2 then city1 else city2 end) as city1, 
     (case when city1 < city2 then country2 else country1 end) as country2, 
     (case when city1 < city2 then city2 else city1 end) as city2, 
     count(*) 
from dataset 
group by (case when city1 < city2 then country1 else country2 end), 
     (case when city1 < city2 then city1 else city2 end), 
     (case when city1 < city2 then country2 else country1 end), 
     (case when city1 < city2 then city2 else city1 end)
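To check the approach, the sketch below runs the same canonical-ordering idea on the sample rows. It is again a hypothetical harness that uses Python's sqlite3 as a stand-in for Vertica, and it goes beyond the answer in two assumed ways. First, the CASE expressions are computed once in a derived table (`canon`), so the GROUP BY can refer to their aliases instead of repeating them. Second, when both city names are equal it breaks the tie on country. With the answer's condition `city1 < city2` alone, a row such as (usa, paris, france, paris) and its mirror image would both fall into the ELSE branch and still land in separate groups. Rows 6 and 7 are invented purely to exercise that case.

```python
import sqlite3

# Hypothetical harness: an in-memory SQLite table standing in for Vertica's `dataset`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (user_id INTEGER, country1 TEXT, city1 TEXT, "
    "country2 TEXT, city2 TEXT)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [
        (1, "usa", "new york", "france", "paris"),
        (2, "usa", "dallas", "japan", "tokyo"),
        (3, "india", "mumbai", "italy", "rome"),
        (4, "france", "paris", "usa", "new york"),
        (5, "brazil", "sao paulo", "russia", "moscow"),
        # Invented rows: the same city name in two countries, in both orders.
        (6, "usa", "paris", "france", "paris"),
        (7, "france", "paris", "usa", "paris"),
    ],
)

# True when (country1, city1) should come first: compare cities as the answer
# does, then fall back to countries when the city names are equal.
IN_ORDER = "(city1 < city2 OR (city1 = city2 AND country1 <= country2))"

# Put each pair into canonical order in a derived table, then group on the aliases.
query = f"""
SELECT c_country1, c_city1, c_country2, c_city2, COUNT(*) AS n
FROM (
    SELECT CASE WHEN {IN_ORDER} THEN country1 ELSE country2 END AS c_country1,
           CASE WHEN {IN_ORDER} THEN city1    ELSE city2    END AS c_city1,
           CASE WHEN {IN_ORDER} THEN country2 ELSE country1 END AS c_country2,
           CASE WHEN {IN_ORDER} THEN city2    ELSE city1    END AS c_city2
    FROM dataset
) AS canon
GROUP BY c_country1, c_city1, c_country2, c_city2
ORDER BY c_city1, c_country1
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Users 1 and 4 now share the group ('usa', 'new york', 'france', 'paris', 2), and the two invented rows collapse into ('france', 'paris', 'usa', 'paris', 2). Which side ends up "first" depends on the database's string collation, but any consistent ordering is enough to make the grouping independent of column order.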