Extracting strings between different characters using Hive SQL

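I have a field called geo_data_display that contains country, region and DMA. The three values sit between "=" and "&" characters: country is between the first "=" and the first "&", region is between the second "=" and the second "&", and DMA is between the third "=" and the third "&". Below is a reproducible sample of the table. Country is always alphabetic, but region and DMA can be either numeric or alphabetic, and DMA does not exist for all countries.

A few sample values are: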

country=us&region=tx&dma=625&domain=abc.net&zipcodes=76549 
country=us&region=ca&dma=803&domain=abc.com&zipcodes=90404 
country=tw&region=hsz&domain=hinet.net&zipcodes=300 
country=jp&region=1&dma=a&domain=hinet.net&zipcodes=300

I have some sample SQL, but the geo_dma line does not work at all and the geo_region line only works for character values:

SELECT 

UPPER(REGEXP_REPLACE(split(geo_data_display, '\\&')[0], 'country=', '')) AS geo_country 
,UPPER(split(split(geo_data_display, '\\&')[1],'\\=')[1]) AS geo_region 
,split(split(cast(geo_data_display as int), '\\&')[2],'\\=')[2] AS geo_dma 
FROM mytable

2017-10-12 Joel B

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 1) returns 'the'.

select
     regexp_extract(geo_data_display, 'country=(.*?)(&region)', 1) as geo_country,
     regexp_extract(geo_data_display, 'region=(.*?)(&dma)', 1) as geo_region,
     regexp_extract(geo_data_display, 'dma=(.*?)(&domain)', 1) as geo_dma
from mytable;

2017-10-12 16:20:51

Perfect, thank you!

Overly complicated, and it returns wrong results when the DMA is missing.
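
One way to address the missing-DMA case raised in the comment is to anchor each pattern on its own key instead of on the key that follows it. The query below is only a sketch and not the answerer's original: it assumes the question's table mytable and that no key repeats within a value. As far as I know, Hive's regexp_extract returns an empty string rather than NULL when the pattern does not match, so rows without a DMA would yield '' for geo_dma.

```sql
-- Sketch: each pattern matches "<key>=" at the start of the string or right after '&',
-- then captures everything up to the next '&' (or the end of the string) in group 2.
select
     regexp_extract(geo_data_display, '(^|&)country=([^&]*)', 2) as geo_country,
     regexp_extract(geo_data_display, '(^|&)region=([^&]*)', 2)  as geo_region,
     regexp_extract(geo_data_display, '(^|&)dma=([^&]*)', 2)     as geo_dma
from mytable;
```

Because no pattern depends on a neighbouring key, the region for a row such as country=tw&region=hsz&domain=hinet.net&zipcodes=300 should still be extracted even though dma is absent.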

str_to_map

select geo_map['country'] as geo_country 
     ,geo_map['region'] as geo_region 
     ,geo_map['dma']  as geo_dma 

from (select str_to_map(geo_data_display,'&','=') as geo_map 
     from mytable 
     ) t 
;

+-------------+------------+---------+
| geo_country | geo_region | geo_dma |
+-------------+------------+---------+
| us          | tx         | 625     |
| us          | ca         | 803     |
| tw          | hsz        | NULL    |
| jp          | 1          | a       |
+-------------+------------+---------+
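
For readers unfamiliar with str_to_map: the second argument is the delimiter between key/value pairs and the third is the delimiter between a key and its value. The sketch below checks this on a single literal taken from the question's samples, without touching any table. It assumes a Hive version that allows SELECT without a FROM clause.

```sql
-- str_to_map(text, pair_delimiter, key_value_delimiter)
-- A key that is absent from the string ('dma' here) looks up as NULL in the resulting map.
select str_to_map('country=tw&region=hsz&domain=hinet.net&zipcodes=300', '&', '=')['region'] as geo_region,
       str_to_map('country=tw&region=hsz&domain=hinet.net&zipcodes=300', '&', '=')['dma']    as geo_dma;
```

This matches the third row of the result above, where geo_region is hsz and geo_dma is NULL.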

2017-10-12 17:14:47

Please try the following:

create table ch8 (details map<string,string>)
row format delimited
collection items terminated by '&'
map keys terminated by '=';

Load the data into the table, then create another table using CTAS:

create table ch9 as
select details["country"] as country,
       details["region"] as region,
       details["dma"] as dma,
       details["domain"] as domain,
       details["zipcodes"] as zipcode
from ch8;

select * from ch9;
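
The answer does not show the load step itself. A minimal sketch, assuming the raw geo_data_display values are stored one per line in a plain text file on the machine running the Hive client (the path below is hypothetical):

```sql
-- Hypothetical path; replace it with the actual location of the raw file.
-- Each line is expected to look like: country=us&region=tx&dma=625&domain=abc.net&zipcodes=76549
load data local inpath '/tmp/geo_data.txt' into table ch8;
```

If the file already lives in HDFS, dropping the local keyword should make Hive read it from there instead.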

2017-11-27 18:09:34 MuraliSunil
