2014-10-27

I'm trying to extract the parts of a text paragraph that match a regular expression, using stringr in R. Here is one of the texts:

if returnValue is not null then 
1. if instrument type is "Bond" then 
     Status is equals to 138 if the instrument is sensible coupon, 
     coupon type is not null and not equals to "ZERO COUPON" and previous value 
     is not equals to current value, and iinstrument creation date is not D 
- Status is equals to 137 if the instrument is sensible bbg, previous value 
     is not equals to current value, and iinstrument creation date is not D or D-1 
- Status is equals to the previous status if the value is not manual 
     and previous status is 138, or 137 

2. if attribute SEC_PAYT_DTE is not null then 
    if attribute SEC_PAYT_DTE (typed as date) is fresher than 
     returnValue (typed as date) then 
    set status to 136 that is "Functional Error" 
3. if acrual date (DEBT_STRT_ACRL_DTE) is not null and instrument 
     category is "Structured Product", and acrual date is different 
     frorm return value then 
    set status to 150 that is "Non blocking functional error". 

What I want to extract is "Status 138", "Status 137", "status 136" and "status 150". What I tried was str_extract_all(x, '(S|s)tatus[a-z\s]{1,10}[0-9]{1,3}[^\.]'), but it doesn't work.
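
One likely reason the attempt fails: the [a-z\s]{1,10} part allows at most 10 characters between "Status" and the number, but " is equals to " is 14 characters long, so the two "Status is equals to ..." sentences can never match. The sketch below is an illustration under stated assumptions, not a definitive solution: it uses base R's regmatches()/gregexpr() with perl = TRUE (so it runs without stringr) on an abbreviated copy of the text above, widens the gap to a hypothetical limit of 20 characters, and restricts it to lowercase letters and spaces so a match cannot cross a newline or reach the 1 in "D-1". The codes_no_prev variant also skips "previous status ..." mentions with a fixed-length lookbehind, which yields exactly the four values listed above.

```r
# Abbreviated copy of the question's text; `x` follows the question's own
# str_extract_all(x, ...) call.
x <- paste(
  "Status is equals to 138 if the instrument is sensible coupon,",
  "- Status is equals to 137 if the instrument is sensible bbg, previous value",
  "is not equals to current value, and iinstrument creation date is not D or D-1",
  "- Status is equals to the previous status if the value is not manual",
  "and previous status is 138, or 137",
  "set status to 136 that is \"Functional Error\"",
  "set status to 150 that is \"Non blocking functional error\".",
  sep = "\n"
)

# "Status"/"status", then 1-20 lowercase letters or spaces (no digits and no
# newline), then a 1-3 digit number. The limit of 20 is an assumption.
pattern <- "[Ss]tatus[a-z ]{1,20}[0-9]{1,3}"
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
# Keep only the trailing number of each match
codes <- regmatches(hits, regexpr("[0-9]+$", hits))

# Variant: a fixed-length negative lookbehind drops "previous status ..." matches
pattern_no_prev <- "(?<!previous )[Ss]tatus[a-z ]{1,20}[0-9]{1,3}"
hits_no_prev <- regmatches(x, gregexpr(pattern_no_prev, x, perl = TRUE))[[1]]
codes_no_prev <- regmatches(hits_no_prev, regexpr("[0-9]+$", hits_no_prev))

print(codes)
print(codes_no_prev)
```

With stringr, str_match_all() with a capture group around [0-9]{1,3} would return the numbers directly as the second column of the resulting matrix.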

What are the rules here? Please define clearly what you want the regex to do. – 2014-10-27 20:37:11

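I want the regex to find "Status" (or "status") followed by a number of up to three digits. For example, for "Status is equals to 138" the regex should find 138. However, the 1 in "is not D or D-1" should not be returned. – 2014-10-27 20:44:25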

Then should the line "and previous status is 138, or 137" be returned as well? – 2014-10-27 20:56:06

Answer

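The regex matching in str_extract_all follows the POSIX standard, which does not keep searching past a newline, so you have to split the text into lines yourself.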

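library(stringr)

# Split the text into lines and extract the matches from each line separately
matches <- sapply(strsplit(val, "\n")[[1]], 
    str_extract_all, "[Ss]tatus is(?: equals to)? [0-9]+") 
# Drop lines without a match, then reduce "Status is (equals to) N" to "Status N"
matches <- gsub(fixed = TRUE, "is ", "", gsub(fixed = TRUE, " equals to", "", 
    Filter(length, matches))) 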
# [1] "Status 138" "Status 137" "status 138"
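
As the output shows, this pattern only covers "is" and "is equals to", so "set status to 136" and "set status to 150", two of the values the question asks for, are not returned. Below is a minimal base-R sketch (regmatches()/gregexpr() instead of stringr, so it runs without extra packages) that mirrors the answer's line-by-line approach and adds "to" as a third alternative. The sample text is an abbreviated copy of the question's text, and val is simply the variable name the answer uses.

```r
# Abbreviated copy of the question's text
val <- paste(
  "Status is equals to 138 if the instrument is sensible coupon,",
  "- Status is equals to 137 if the instrument is sensible bbg, previous value",
  "is not equals to current value, and iinstrument creation date is not D or D-1",
  "- Status is equals to the previous status if the value is not manual",
  "and previous status is 138, or 137",
  "set status to 136 that is \"Functional Error\"",
  "set status to 150 that is \"Non blocking functional error\".",
  sep = "\n"
)

# Split per line as the answer does (not strictly required here, since the
# pattern uses literal spaces and therefore never matches across a newline)
text_lines <- strsplit(val, "\n", fixed = TRUE)[[1]]

# Same idea as the answer, with "to" added as an alternative
pattern <- "[Ss]tatus (?:is equals to|is|to) [0-9]+"
hits <- unlist(regmatches(text_lines, gregexpr(pattern, text_lines, perl = TRUE)))

# Collapse "<Status> <verb> <number>" to "<Status> <number>"
result <- sub("^([Ss]tatus) (?:is equals to|is|to) ", "\\1 ", hits, perl = TRUE)
print(result)
```

Like the answer's version, this still returns the "previous status is 138" mention; whether that line should count is the open question raised in the comments.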