2017-10-06 114 views

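Converting IP addresses to locations, optimization needed

I have a list of accounts and IP addresses, and I want to get a summary by location. However, the computation is too heavy for our server, and I would like to know whether I can change my code so that I get all of my results. The accounts dataset is about 150k rows and 2 columns.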

select city, state, count(*) from(
    select account_id, 256*256*256*one+256*256*two+256*three+four as Converted, city, state from 
     (select *, convert(bigint, split_part(ip_address, '.', 1)) as one, convert(int, split_part(ip_address, '.', 2)) as two, 
     convert(int, split_part(ip_address, '.', 3)) as three, convert(int, split_part(ip_address, '.', 4)) as four from AccountsIP) 
    inner join 
    (select city, state, ip_from, ip_to from ip_ranges a left join ip_locations b on a.ip_location_id = b.ip_location_id 
     where country = 'US') b 
     on (256*256*256*one+256*256*two+256*three+four) between ip_from and ip_to 
) 
group by city, state 

Answer


You can create a Python UDF that converts the IP address to a bigint, and use it in the BETWEEN condition:

create or replace function ip_to_ipnum (ip varchar) 
    returns bigint 
    stable as $$ 
    ip_array = ip.split('.') 
    return int(ip_array[0])*16777216+int(ip_array[1])*65536+int(ip_array[2])*256+int(ip_array[3]) 
$$ language plpythonu; 
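
As a quick sanity check of the arithmetic inside the UDF, the same conversion can be run as plain Python outside the database. This is only a sketch: the function mirrors the UDF body, and the sample address is arbitrary.

```python
def ip_to_ipnum(ip):
    """Convert a dotted-quad IPv4 string to an integer (same arithmetic as the UDF)."""
    one, two, three, four = (int(part) for part in ip.split('.'))
    # 16777216 = 256**3 and 65536 = 256**2, matching the multipliers in the question.
    return one * 16777216 + two * 65536 + three * 256 + four


print(ip_to_ipnum('192.168.1.1'))  # 3232235777
```

The result fits comfortably in a bigint, since the largest possible value (255.255.255.255) is 4294967295.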

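Also, the bottleneck may be in your ip_ranges and ip_locations tables, which need to be sorted appropriately. If your data is US-only, you can delete all other rows instead of filtering them out in the query, and sort the table by (ip_from, ip_to) so that the lookup is more efficient.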
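
To illustrate why sorted ranges help, here is a minimal Python sketch of a lookup over ranges ordered by ip_from. It shows the general idea only and is not how the database executes the join. The sample ranges and the `locate` helper are hypothetical, and the ranges are assumed not to overlap.

```python
from bisect import bisect_right

# Hypothetical US-only ranges, sorted by ip_from and assumed non-overlapping:
# (ip_from, ip_to, city, state)
ranges = [
    (16777216, 16777471, 'Los Angeles', 'CA'),
    (16777472, 16778239, 'Seattle', 'WA'),
    (16779264, 16781311, 'Austin', 'TX'),
]
starts = [r[0] for r in ranges]


def locate(ipnum):
    """Return (city, state) for ipnum, or None if no range contains it."""
    i = bisect_right(starts, ipnum) - 1  # last range whose ip_from <= ipnum
    if i >= 0 and ranges[i][1] >= ipnum:
        return ranges[i][2], ranges[i][3]
    return None


print(locate(16777300))  # inside the first range
print(locate(16778500))  # falls in a gap between ranges
```

With sorted data, each lookup is a binary search rather than a scan of every range, which is the effect the sort on (ip_from, ip_to) is meant to approximate.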

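Also, since the data in ip_ranges and ip_locations does not change much, you can create a physical table that joins them, so that you do not have to join them every time in the query above.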
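
The pre-joined table can be sketched as follows, using Python's built-in sqlite3 module purely so the example is runnable on its own. The table name `us_ip_lookup`, the sample rows, and the assumption that `country` lives in ip_locations are all hypothetical. On the actual server the equivalent would be a CREATE TABLE ... AS statement in its own SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    -- Assumed schemas; the question only shows the column names used in its query.
    CREATE TABLE ip_ranges (ip_location_id INTEGER, ip_from INTEGER, ip_to INTEGER);
    CREATE TABLE ip_locations (ip_location_id INTEGER, city TEXT, state TEXT, country TEXT);
    INSERT INTO ip_ranges VALUES (1, 16777216, 16777471), (2, 16777472, 16778239);
    INSERT INTO ip_locations VALUES (1, 'Los Angeles', 'CA', 'US'), (2, 'Sydney', 'NSW', 'AU');

    -- Join once, keep US rows only, and store the result ordered by range start.
    CREATE TABLE us_ip_lookup AS
        SELECT b.city, b.state, a.ip_from, a.ip_to
        FROM ip_ranges a
        JOIN ip_locations b ON a.ip_location_id = b.ip_location_id
        WHERE b.country = 'US'
        ORDER BY a.ip_from, a.ip_to;
""")

# Later queries read the flat table directly, with no join and no country filter.
rows = conn.execute(
    "SELECT city, state FROM us_ip_lookup WHERE ? BETWEEN ip_from AND ip_to",
    (16777300,),
).fetchall()
print(rows)
```

The table only needs to be rebuilt when the underlying IP data is refreshed.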