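I want to assign levels to headings.
The first heading is assigned level 1. I extract its font family and size and look for headings that match. Once the level has been assigned, I unmark the Headinglevel annotation on those headings, keeping the heading and its level in another annotation (HeadingHierarchy). After that, the same block is called again and again for as long as any heading is still annotated as Headinglevel.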
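Problem:
The script works fine and finds all level 1 headings. However, when the block is executed through the CALL statement, it finds only the first match for each level (level 2 and up). As a result, the total number of levels for the input below comes out as 10, whereas it must be 4.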
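To make the two numbers concrete, the procedure can be simulated outside Ruta. The Python sketch below is not the Ruta script: the names `HEADINGS`, `assign_levels` and `first_match_only` are made up for illustration, and the "first match only" mode is just an assumption about what happens inside the CALL. Under that assumption the counter ends at 10, which matches what I observe, while matching every heading of the same font gives the 4 levels I expect.

```python
# Font (family, size) of each heading, in document order, copied from the input.
HEADINGS = [
    ("Arial", 18), ("Arial", 16), ("Arial", 16), ("Arial", 16),
    ("Times New Roman", 14), ("Arial", 10), ("Times New Roman", 14),
    ("Arial", 10), ("Arial", 18), ("Arial", 16), ("Arial", 16),
]


def assign_levels(headings, first_match_only=False):
    """Return (level per heading, number of levels used).

    Each pass takes the font of the first heading that has no level yet,
    gives the current level to the headings with that font, and removes
    them from the pending list.
    With first_match_only=True (an assumed model of the observed bug),
    every pass after level 1 handles only the first matching heading.
    """
    remaining = list(range(len(headings)))
    levels = [0] * len(headings)
    level = 0
    while remaining:
        level += 1
        font = headings[remaining[0]]
        matches = [i for i in remaining if headings[i] == font]
        if first_match_only and level > 1:
            matches = matches[:1]
        for i in matches:
            levels[i] = level
        remaining = [i for i in remaining if i not in matches]
    return levels, level


if __name__ == "__main__":
    print(assign_levels(HEADINGS))
    # ([1, 2, 2, 2, 3, 4, 3, 4, 1, 2, 2], 4)
    print(assign_levels(HEADINGS, first_match_only=True)[1])
    # 10
```

In the assumed buggy mode, level 1 removes the two Arial,18 headings and each of the nine remaining headings then consumes a level of its own, so the counter reaches 1 + 9 = 10.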
Input (.txt):
Apache UIMA Ruta Overview =>Arial,18
What is Apache UIMA Ruta? =>Arial,16
Getting started =>Arial,16
UIMA Analysis Engines =>Arial,16
Ruta Engine =>Times New Roman,14
Configuration Parameters =>Arial,10
Annotation Writer =>Times New Roman,14
Configuration Parameters =>Arial,10
Apache UIMA Ruta Language =>Arial,18
Syntax =>Arial,16
Rule elements and their matching order =>Arial,16
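Every input line has the shape `title =>family,size`. As a rough illustration of what the script has to pull out of each line (this is plain Python rather than Ruta, and `parse_heading` is a hypothetical helper name), the split could look like this:

```python
def parse_heading(line):
    """Split 'title =>family,size' into (title, family, size).

    Splits on the last '=>' and the last ',' so that a family name
    containing spaces (e.g. 'Times New Roman') stays in one piece.
    """
    title, _, font = line.rpartition("=>")
    family, _, size = font.rpartition(",")
    return title.strip(), family.strip(), int(size)


print(parse_heading("Ruta Engine =>Times New Roman,14"))
# ('Ruta Engine', 'Times New Roman', 14)
```

The sketch assumes every line is well formed; a line without a trailing number raises a `ValueError` at `int(size)`.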
Script:
PACKAGE uima.ruta.example;
DECLARE Headinglevel(STRING family, INT size, INT level);
DECLARE HeadingHierarchy(STRING family, INT size, INT level);
DECLARE FontFamily, FontSize;
STRING family;
INT size;
RETAINTYPE(BREAK);
BREAK? #{-PARTOF(Headinglevel)} @SPECIAL+ W+ COMMA NUM{->MARK(Headinglevel,2,6), MARK(HeadingHierarchy,2,6), MARK(FontFamily,4), MARK(FontSize,6)};
RETAINTYPE;
h:Headinglevel{->h.family = family, HeadingHierarchy.family = family}
<-{FontFamily{PARSE(family)};};
h:Headinglevel{->h.size = size, HeadingHierarchy.size = size}
<-{FontSize{PARSE(size)};};
INT i=1;
BLOCK(ForEachHeadLevel) Document{}
{
    # h:Headinglevel{-> family = h.family, size = h.size};
    h:Headinglevel{AND(h.family == family, h.size == size)-> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
Headinglevel{->i=i+1, CALL(Test2.ForEachHeadLevel)};
Document{->LOG(" LEVELS : " + (i))};
Expected output:
HeadingHierarchy Feature
Apache UIMA... =>Arial,18 level: 1
What is Apa... =>Arial,16 level: 2
Getting sta... =>Arial,16 level: 2
UIMA Analys... =>Arial,16 level: 2
Ruta Engine... =>Times New Roman,14 level: 3
Configurati... =>Arial,10 level: 4
Annotation ... =>Times New Roman,14 level: 3
Configurati... =>Arial,10 level: 4
Apache UIMA... =>Arial,18 level: 1
Syntax =>Ar... =>Arial,16 level: 2
Rule elemen... =>Arial,16 level: 2
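I added org.apache.uima.ruta.block.DocumentBlockExtension to additionalExtensions, but I get the error that the input "DOCUMENTBLOCK" is not defined in this script/block! – prasanth
It looks like the newly added parameter is removed after the script has run (with the error, in this case). The same thing happens with dictRemoveWS, so it has to be added again every time the script is run. – prasanth
Yes, it looks like the extension is not available in the Workbench. I will fix it. –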