2017-02-23 372 views
1

Joining three tables, test_3, test_2 and test_1: GROUP BY with CLOB data

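test_1 and test_3 are the master tables and have no common column. There is a joining table, test_2.
test_1: sr_id, last_updated_date
test_2: sr_id, sm_id
test_3: sm_id, sql_statement
test_3 has the CLOB data that is causing all the trouble.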

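I have to find the latest sr_id associated with each sm_id. My idea was to use the aggregate function max(last_updated_date) and group by it. That is not working, for several reasons:

  1. The query includes a CLOB column, sql_statement.

  2. I have used a join that I am not familiar with.

Any ideas would be helpful.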

WITH xx as (
    (select ANSWER ,sr_id AS ID from test 
    WHERE Q_ID in (SELECT Q_ID FROM test_2 WHERE field_id='LM_LRE_Q6') 
    ) 
) 
-- end of source data 


SELECT t.ID, t1.n, t1.SM_ID,seg_dtls.SEGMENTation_NAME ,to_char(mst.LAST_UPDATED_DATE,'dd-mon-yyyy hh24:mi:ss'),seg_dtls.sql_statement 
FROM xx t 
CROSS JOIN LATERAL (
     select LEVEL AS n, regexp_substr(t.answer, '\d+', 1, level) as SM_ID 
     from dual 
     connect by regexp_substr(t.answer, '\d+', 1, level) IS NOT NULL 
) t1 
left join test_1 mst 
on mst.sr_id=t.id 
right join test_3 seg_dtls 
on seg_dtls.sm_id=t1.sm_id; 

The sample data looks like this:

sr_id sm_id SEGMENTATION_NAME LAST_UPDATED_DATE 
1108197 958 test_not_in   05-feb-2017 23:56:59  
1108217 958 test_not_in   14-feb-2017 00:37:39 
1108218 958 test_not_in   14-feb-2017 01:39:50 
1108220 958 test_not_in   14-feb-2017 03:39:07 

and the expected output is:

1108220 958 test_not_in   14-feb-2017 03:39:07 

I am not posting the CLOB data because it is huge. Every row contains CLOB data.

table test_3 contains 
q_id  sr_id answer 
1009330 1108246 976~feb_24^941~Test_regionwithcountry 
1009330 1108247 941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24 
1009330 1108239 972~test_emea 
1009330 1108240 972~test_emea^827~test_with_region_country 
1009330 1108251 981~MSE100579729 testing. 

The sample data for test_3 looks like the rows above.
The answer column contains the SM_IDs, and I have to pull them out of it.
For example:

941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24 
the sm_id is 941,787,976 

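That is how I came up with the query shown above.
Coming back to the left and right joins: all the sm_id values from test_3 are required, so I used a right join here.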

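edit1: The accepted answer gives the SR_ID of each segment with max(last_updated_date).
I need all the SR_IDs. So I used the MINUS operator to get the ones that do not have the maximum last_updated_date.
I need to append that result set to the accepted answer's result.

This is what I did to get the other SR_IDs: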

select sr_id,segmentation_name,request_status from (with test_31 (q_id, sr_id, answer) as (
(SELECT Q_ID,SR_ID,ANSWER FROM test_3 WHERE Q_ID=(SELECT Q_ID FROM test_4 WHERE FIELD_ID='LM_LRE_Q6')) 
), 
answer_extraction as (
    select q_id, sr_id, 
    regexp_substr(regexp_substr(answer, '[^^]+', 1, level),'\d+') as sm_id 
    from test_31 
    connect by q_id = prior q_id 
    and sr_id = prior sr_id 
    and prior dbms_random.value is not null 
    and regexp_substr(answer, '[^^]+', 1, level) is not null 
) 
select sr_id, 
    sm_id, 
    segmentation_name, 
    LAST_UPDATED_DATE, 
    sql_statement,request_status 
from (
    select t1.sr_id, 
    t2.sm_id, 
    t2.segmentation_name, 
    t1.last_updated_date, 
    t2.sql_statement, 
    t1.request_status 

    from test_4 t4 
    join answer_extraction t3 on t3.q_id = t4.q_id 
    join test_2 t2 on t2.sm_id = t3.sm_id 
    join test_1 t1 on t1.sr_id = t3.sr_id 
) 
) 
minus 

(select sr_id,segmentation_name , request_status from (with test_31 (q_id, sr_id, answer) as (
(SELECT Q_ID,SR_ID,ANSWER FROM test_3 WHERE Q_ID=(SELECT Q_ID FROM test_4 WHERE FIELD_ID='LM_LRE_Q6')) 
), 
answer_extraction as (
    select q_id, sr_id, 
    regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+') as sm_id 
    from test_31 
    connect by q_id = prior q_id 
    and sr_id = prior sr_id 
    and prior dbms_random.value is not null 
    and regexp_substr(answer, '[^^]+', 1, level) is not null 
) 
select sr_id, 
    segmentation_name, 
    sql_statement, 
    request_status 
from (
    select t1.sr_id, 
    t2.sm_id, 
    t2.segmentation_name, 
    t1.last_updated_date, 
    t2.sql_statement, 
    t1.request_status, 
    max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date 
    from test_4 t4 
    join answer_extraction t3 on t3.q_id = t4.q_id 
    join test_2 t2 on t2.sm_id = t3.sm_id 
    join test_1 t1 on t1.sr_id = t3.sr_id 
) 
where last_updated_date = max_updated_date)); 


Sample data:
The accepted answer gives the following output, the row with the maximum timestamp (LAST_UPDATED_DATE, LAST_UPDATED_TIME) for the segment.

1097661 Submitted o2k lad 30-NOV-15 01-DEC-16 62 CLOB DATA 

The query posted above gives the following output, which is the sr_ids for the same segment with other update dates.

1097621 o2k lad Submitted 
1097625 o2k lad Submitted 
1097627 o2k lad Submitted 
1097632 o2k lad Submitted 
1097633 o2k lad Submitted 
1097658 o2k lad Pending 
1097640 o2k lad Submitted 
1097644 o2k lad Submitted 
1097646 o2k lad Submitted 

Expected output:

sr_id status  segment_name updated_date sql_statement other_sr_id 
1097661 Submitted o2k lad  30-NOV-15  CLOB DATA 1097618,1097621,1097625,1097627,1097632,1097633,1097658,1097640,1097644,1097646 

I need a query that combines the two, so that the last column contains all of the older sr_ids.

+0

Please post sample input data and the expected output. That helps all users. – Tajinder

+1

Your initial plan of using 'max(last_updated_date)' seems more promising than the code in your question. Perhaps you should start over. –

+0

I know, but I need all the columns, even the one containing a CLOB – user3165555

Answers

0

A fairly simple option is to modify your current query to add an analytic function that finds the maximum date for each ID, like this:

..., max(mst.last_updated_date) over (partition by id) as max_updated_date 

A quick demo of the general idea:

with cte (id, last_updated_date, sql_statement) as (
    select 1, date '2017-01-01', to_clob('stmt 1') from dual 
    union all select 1, date '2017-01-02', to_clob('stmt 2') from dual 
    union all select 1, date '2017-01-03', to_clob('stmt 3') from dual 
    union all select 2, date '2017-01-02', to_clob('stmt 4') from dual 
) 
select id, last_updated_date, sql_statement 
from (
    select id, last_updated_date, sql_statement, 
    max(last_updated_date) over (partition by id) as max_updated_date 
    from cte 
) 
where last_updated_date = max_updated_date; 

        ID LAST_UPDAT SQL_STATEMENT
---------- ---------- --------------------------------------------------------------------------------
         1 2017-01-03 stmt 3
         2 2017-01-02 stmt 4

You could instead use row_number() or rank() or dense_rank() to identify the row with the most recent date and filter on that, but the general idea is the same.
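As a minimal sketch of the row_number() variant, reusing the same made-up cte data as the demo above (the alias rn is just an illustrative name):

```sql
with cte (id, last_updated_date, sql_statement) as (
    select 1, date '2017-01-01', to_clob('stmt 1') from dual
    union all select 1, date '2017-01-02', to_clob('stmt 2') from dual
    union all select 1, date '2017-01-03', to_clob('stmt 3') from dual
    union all select 2, date '2017-01-02', to_clob('stmt 4') from dual
)
select id, last_updated_date, sql_statement
from (
    select id, last_updated_date, sql_statement,
        -- number the rows within each id, newest first
        row_number() over (partition by id order by last_updated_date desc) as rn
    from cte
)
-- keep only the newest row for each id
where rn = 1;
```

For this data it picks the same two rows as the max() version. The difference shows up with ties: row_number() keeps exactly one row per id (an arbitrary one if two rows share the latest date), whereas rank(), dense_rank() and the max() comparison keep all of the tied rows. The CLOB column is only selected, never partitioned or ordered by, so it does not get in the way.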

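However, your current query is not very clear (or valid before 12c) to begin with. Rather than trying to guess how to include such a function and filter in it, it may be simpler to start again from the base tables. That does mean making a lot of assumptions about what you are doing, and it may overlook some things, such as the left and right joins, which may or may not be needed.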

With some made-up data provided through CTEs:

with test_1 (sr_id, last_updated_date) as (
    select 1108197, timestamp '2017-02-05 23:56:59' from dual 
    union all select 1108217, timestamp '2017-02-14 00:37:39' from dual 
    union all select 1108218, timestamp '2017-02-14 01:39:50' from dual 
    union all select 1108220, timestamp '2017-02-14 03:39:07' from dual 
), 
test_2 (sm_id, segmentation_name, sql_statement) as (
    select 958, 'test_not_in', to_clob('select * from dual') from dual 
), 
test_3 (q_id, sr_id, answer) as (
    select 41, 1108197, 958 from dual 
    union all select 42, 1108217, 958 from dual 
    union all select 43, 1108218, 958 from dual 
    union all select 44, 1108220, 958 from dual 
), 
test_4 (q_id, field_id) as (
    select 41, 'LM_LRE_Q6' from dual 
    union all select 42, 'LM_LRE_Q6' from dual 
    union all select 43, 'LM_LRE_Q6' from dual 
    union all select 44, 'LM_LRE_Q6' from dual 
) 

this then gets you the same output you showed in the question:

select t1.sr_id, 
    t2.sm_id, 
    t2.segmentation_name, 
    to_char(t1.last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date, 
    t2.sql_statement 
from test_4 t4 
join test_3 t3 on t3.q_id = t4.q_id 
join test_2 t2 on t2.sm_id = t3.answer 
join test_1 t1 on t1.sr_id = t3.sr_id; 

     SR_ID SM_ID SEGMENTATIO LAST_UPDATED_DATE             SQL_STATEMENT
---------- ----- ----------- ----------------------------- --------------------------------------------------------------------------------
   1108197   958 test_not_in 05-feb-2017 23:56:59          select * from dual
   1108217   958 test_not_in 14-feb-2017 00:37:39          select * from dual
   1108218   958 test_not_in 14-feb-2017 01:39:50          select * from dual
   1108220   958 test_not_in 14-feb-2017 03:39:07          select * from dual

If those wild assumptions are close to right, you can find the row with the most recent date for each sm_id like this:
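The query that belonged at this point is not shown here. The following is a sketch of what it presumably looked like, assuming it applies the max() ... over (partition by ...) filter from the first demo to the four-table join above, and repeating the made-up CTE data so that it can be run on its own:

```sql
with test_1 (sr_id, last_updated_date) as (
    select 1108197, timestamp '2017-02-05 23:56:59' from dual
    union all select 1108217, timestamp '2017-02-14 00:37:39' from dual
    union all select 1108218, timestamp '2017-02-14 01:39:50' from dual
    union all select 1108220, timestamp '2017-02-14 03:39:07' from dual
),
test_2 (sm_id, segmentation_name, sql_statement) as (
    select 958, 'test_not_in', to_clob('select * from dual') from dual
),
test_3 (q_id, sr_id, answer) as (
    select 41, 1108197, 958 from dual
    union all select 42, 1108217, 958 from dual
    union all select 43, 1108218, 958 from dual
    union all select 44, 1108220, 958 from dual
),
test_4 (q_id, field_id) as (
    select 41, 'LM_LRE_Q6' from dual
    union all select 42, 'LM_LRE_Q6' from dual
    union all select 43, 'LM_LRE_Q6' from dual
    union all select 44, 'LM_LRE_Q6' from dual
)
select sr_id,
    sm_id,
    segmentation_name,
    to_char(last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date,
    sql_statement
from (
    select t1.sr_id,
        t2.sm_id,
        t2.segmentation_name,
        t1.last_updated_date,
        t2.sql_statement,
        -- latest update date across all requests for the same segment
        max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date
    from test_4 t4
    join test_3 t3 on t3.q_id = t4.q_id
    join test_2 t2 on t2.sm_id = t3.answer
    join test_1 t1 on t1.sr_id = t3.sr_id
)
-- keep only the row(s) carrying that latest date
where last_updated_date = max_updated_date;
```

With this data only the 1108220 row should survive the filter, which matches the expected output in the question.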

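You will need to adapt it to handle any other unstated restrictions or requirements (for example, adding back your left/right outer joins).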

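I have deliberately ignored the subquery that splits 'answer' into multiple values. It is possible you have something horrible in there, like a delimited list of IDs, which is a data model problem. If that is the case then you will still need to extract the individual values, with something like: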

with answer_extraction as (
    select q_id, sr_id, regexp_substr(answer, '\d+', 1, level) as sm_id 
    from test_3 
    connect by q_id = prior q_id 
    and sr_id = prior sr_id 
    and prior dbms_random.value is not null 
    and regexp_substr(answer, '\d+', 1, level) is not null 
) 
select sr_id, 
    sm_id, 
    segmentation_name, 
    to_char(last_updated_date, 'dd-mon-yyyy hh24:mi:ss') as last_updated_date, 
    sql_statement 
from (
    select t1.sr_id, 
    t2.sm_id, 
    t2.segmentation_name, 
    t1.last_updated_date, 
    t2.sql_statement, 
    max(t1.last_updated_date) over (partition by t2.sm_id) as max_updated_date 
    from test_4 t4 
    join answer_extraction t3 on t3.q_id = t4.q_id 
    join test_2 t2 on t2.sm_id = t3.sm_id 
    join test_1 t1 on t1.sr_id = t3.sr_id 
) 
where last_updated_date = max_updated_date; 

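Based on the actual contents of test_3 that you have now added, the regular expression is not doing quite what you need. The pattern you are using finds 14 numeric values, because it matches any run of digits: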

with test_3 (q_id, sr_id, answer) as (
    select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual 
    union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual 
    union all select 1009330, 1108239, '972~test_emea' from dual 
    union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual 
    union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual 
), 
answer_extraction as (
    select q_id, sr_id, regexp_substr(answer, '\d+', 1, level) as sm_id 
    from test_3 
    connect by q_id = prior q_id 
    and sr_id = prior sr_id 
    and prior dbms_random.value is not null 
    and regexp_substr(answer, '\d+', 1, level) is not null 
) 
select * from answer_extraction; 

     Q_ID  SR_ID SM_ID  
---------- ---------- ---------- 
    1009330 1108239 972  
    1009330 1108240 972  
    1009330 1108240 827  
    1009330 1108246 976  
    1009330 1108246 24   
    1009330 1108246 941  
    1009330 1108247 941  
    1009330 1108247 2016  
    1009330 1108247 787  
    1009330 1108247 28   
    1009330 1108247 976  
    1009330 1108247 24   
    1009330 1108251 981  
    1009330 1108251 100579729 

It seems you only want the bits between a ^ delimiter and the ~ marker. A common way to split a delimited string is:

with test_3 (q_id, sr_id, answer) as (
    select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual 
    union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual 
    union all select 1009330, 1108239, '972~test_emea' from dual 
    union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual 
    union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual 
), 
answer_extraction as (
    select q_id, sr_id, regexp_substr(answer, '[^^]+', 1, level) as sm_id 
    from test_3 
    connect by q_id = prior q_id 
    and sr_id = prior sr_id 
    and prior dbms_random.value is not null 
    and regexp_substr(answer, '[^^]+', 1, level) is not null 
) 
select * from answer_extraction; 

     Q_ID  SR_ID SM_ID         
---------- ---------- ---------------------------------------- 
    1009330 1108239 972~test_emea       
    1009330 1108240 972~test_emea       
    1009330 1108240 827~test_with_region_country    
    1009330 1108246 976~feb_24        
    1009330 1108246 941~Test_regionwithcountry    
    1009330 1108247 941~Test_regionwithcountry_2016   
    1009330 1108247 787~Test_Request_28      
    1009330 1108247 976~feb_24        
    1009330 1108251 981~MSE100579729 testing.    

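But you then need to get just the first part of each element, for example by borrowing your original pattern (others are available!):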

column sm_id format a10 
with test_3 (q_id, sr_id, answer) as (
    select 1009330, 1108246, '976~feb_24^941~Test_regionwithcountry' from dual 
    union all select 1009330, 1108247, '941~Test_regionwithcountry_2016^787~Test_Request_28^976~feb_24' from dual 
    union all select 1009330, 1108239, '972~test_emea' from dual 
    union all select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual 
    union all select 1009330, 1108251, '981~MSE100579729 testing.' from dual 
), 
answer_extraction as (
    select q_id, sr_id, 
    regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+') as sm_id 
    from test_3 
    connect by q_id = prior q_id 
    and sr_id = prior sr_id 
    and prior dbms_random.value is not null 
    and regexp_substr(answer, '[^^]+', 1, level) is not null 
) 
select * from answer_extraction; 

     Q_ID  SR_ID SM_ID  
---------- ---------- ---------- 
    1009330 1108239 972  
    1009330 1108240 972  
    1009330 1108240 827  
    1009330 1108246 976  
    1009330 1108246 941  
    1009330 1108247 941  
    1009330 1108247 787  
    1009330 1108247 976  
    1009330 1108251 981  

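Note that the extra regexp_substr() is only in the select list, not in the connect by clause, and that the extracted sm_id is still a string. If test_2.sm_id is a number, add a to_number() call around that pair of regexp_substr() calls in the select list.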
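A minimal sketch of that to_number() change, assuming test_2.sm_id really is a NUMBER column and using a single made-up test_3 row to keep it short:

```sql
with test_3 (q_id, sr_id, answer) as (
    select 1009330, 1108240, '972~test_emea^827~test_with_region_country' from dual
),
answer_extraction as (
    select q_id, sr_id,
        -- inner call: the level-th ^-delimited element; outer call: its leading digits
        to_number(
            regexp_substr(regexp_substr(answer, '[^^]+', 1, level), '\d+')
        ) as sm_id
    from test_3
    connect by q_id = prior q_id
    and sr_id = prior sr_id
    and prior dbms_random.value is not null
    and regexp_substr(answer, '[^^]+', 1, level) is not null
)
select * from answer_extraction;
```

This should return sm_id as the numbers 972 and 827, so the later join on t2.sm_id = t3.sm_id compares number to number rather than relying on implicit conversion.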

+0

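Thanks Alex. All of your assumptions are realistic. Most of the problem lies with table test_3. I am editing the question to make it easier to understand. – user3165555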

+0

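@user3165555 - your answer values are worse than I had imagined. I have added something on how to extract the numbers you are actually interested in, that is, the ones between ^ and ~. You can use the modified 'answer_extraction' CTE with the rest of my original code. –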

+0

Thanks Alex, I learned a lot in the process. – user3165555