2017-07-06 59 views

Univocity - irregular csv parsing

I have an irregular (though consistent) "csv" file that I need to parse. The contents look like this:

Field1: Field1Text 
Field2: Field2Text 

Field3 (need to ignore) 
Field4 (need to ignore) 

Field5 
Field5Text 

// Cars - for example 
#,Col1,Col2,Col3,Col4,Col5,Col6 
#1,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text 
#2,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text 
#3,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text 

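Ideally I would like to use an approach similar to the one described here.

What I ultimately want to end up with is an object like: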

String field1; 
String field2; 
String field5; 
List<Car> cars; 

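I currently have the following questions:

  • After adding some exploratory tests, I found that lines starting with a hash (#) are ignored. I don't want this; is there any way to escape it?
  • My intention is to use a BeanListProcessor for the cars section and a separate row processor for the other fields, then merge the results into the object above. Am I missing any tricks here?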

Answer


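Your first problem is that # is treated as the comment character by default. To prevent lines starting with # from being treated as comments, do this: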

parserSettings.getFormat().setComment('\0'); 

As for parsing your structure, there is no way to do it out of the box, but it is easy to leverage the API for it. The following works:

import java.io.File;
import java.util.ArrayList;
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setComment('\0'); // prevent lines starting with # from being parsed as comments

// Create a parser
CsvParser parser = new CsvParser(settings);

// Open the input
parser.beginParsing(new File("/path/to/input.csv"), "UTF-8");

// Create a BeanListProcessor for instances of Car, and initialize it.
BeanListProcessor<Car> carProcessor = new BeanListProcessor<Car>(Car.class);
carProcessor.processStarted(parser.getContext());

String[] row;
Parent parent = null;
while ((row = parser.parseNext()) != null) { // read rows one by one
    if (row[0] == null) { // skip rows whose first column is empty
        continue;
    }
    if (row[0].startsWith("Field1:")) { // when Field1 is found, create your parent instance
        if (parent != null) { // a parent already exists, so its cars have been read; attach them to it
            parent.cars = new ArrayList<Car>(carProcessor.getBeans()); // copy the list of cars from the processor
            carProcessor.getBeans().clear(); // clear the processor's list
            // you probably want to do something with your parent bean here
        }
        parent = new Parent(); // create a fresh parent instance
        parent.field1 = row[0]; // assign the fields as appropriate
    } else if (row[0].startsWith("Field2:")) {
        parent.field2 = row[0]; // and so on
    } else if (row[0].startsWith("Field5:")) {
        parent.field5 = row[0];
    } else if (row[0].equals("Field5")) { // the sample shows a bare label with the value on the next line
        String[] next = parser.parseNext();
        if (next != null) {
            parent.field5 = next[0];
        }
    } else if (row[0].startsWith("#") && row[0].length() > 1) { // a "Car" row ("#1", "#2", ...); the "#" header row is skipped
        carProcessor.rowProcessed(row, parser.getContext()); // hand the row to the carProcessor
    }
}

// At the end, if there is a parent, collect the cars that were parsed
if (parent != null) {
    parent.cars = carProcessor.getBeans();
}
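To make the routing in the loop above easier to follow, here is a minimal sketch of the same idea in plain Java, without univocity-parsers. It is only an illustration under stated assumptions: the names `IrregularCsvSketch`, `Parent` and `Car` are hypothetical, cells are split with `String.split` (so quoted commas are not handled), the file holds a single record, and the value of Field5 is assumed to sit on the line after the bare `Field5` label, as in the sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the routing logic; it does not use univocity-parsers.
final class IrregularCsvSketch {

    // Hypothetical row type for the "#1,...", "#2,..." lines.
    static final class Car {
        final String id;
        final List<String> cols;

        Car(String id, List<String> cols) {
            this.id = id;
            this.cols = cols;
        }
    }

    // Hypothetical holder matching the fields listed in the question.
    static final class Parent {
        String field1;
        String field2;
        String field5;
        final List<Car> cars = new ArrayList<>();
    }

    static Parent parse(List<String> lines) {
        Parent parent = new Parent();
        boolean field5ValueExpected = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank separator lines carry no data
            }
            if (field5ValueExpected) {
                parent.field5 = line; // the value follows the bare "Field5" label
                field5ValueExpected = false;
            } else if (line.startsWith("Field1:")) {
                parent.field1 = line.substring("Field1:".length()).trim();
            } else if (line.startsWith("Field2:")) {
                parent.field2 = line.substring("Field2:".length()).trim();
            } else if (line.equals("Field5")) {
                field5ValueExpected = true;
            } else if (line.startsWith("#") && !line.startsWith("#,")) {
                // Data row such as "#1,a,b"; the "#,Col1,..." header row is skipped.
                String[] cells = line.split(",", -1);
                parent.cars.add(new Car(cells[0], Arrays.asList(cells).subList(1, cells.length)));
            }
            // Anything else (Field3, Field4, "// Cars ...") is ignored.
        }
        return parent;
    }

    public static void main(String[] args) {
        Parent p = parse(Arrays.asList(
                "Field1: Field1Text", "Field2: Field2Text", "",
                "Field3 (need to ignore)", "",
                "Field5", "Field5Text", "",
                "#,Col1,Col2", "#1,a,b", "#2,c,d"));
        System.out.println(p.field1 + " " + p.field2 + " " + p.field5 + " " + p.cars.size());
    }
}
```

Unlike the loop above, this sketch strips the `Field1:` and `Field2:` labels before storing the values. With univocity the same stripping could be applied to `row[0]` before assigning it.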

For the BeanListProcessor to work, you need to have declared your class like this:

public static final class Car { 
    @Parsed(index = 0) 
    String id; 
    @Parsed(index = 1) 
    String col1; 
    @Parsed(index = 2) 
    String col2; 
    @Parsed(index = 3) 
    String col3; 
    @Parsed(index = 4) 
    String col4; 
    @Parsed(index = 5) 
    String col5; 
    @Parsed(index = 6) 
    String col6; 
} 

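You can use headers instead of indexes, but that will make you write more code. If the headers are always the same, you can assume the positions are fixed.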
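As a rough idea of the extra code that name-based access involves, the sketch below builds a name-to-position map from the `#,Col1,...` header row and reads cells through it. It is plain Java written for illustration, not univocity's own header support, and the names `HeaderLookupSketch`, `indexByName` and `cell` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of looking up cells by header name instead of by fixed index.
final class HeaderLookupSketch {

    // Maps each header name to its column position, e.g. "Col2" -> 2.
    static Map<String, Integer> indexByName(String[] headerRow) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            index.put(headerRow[i].trim(), i);
        }
        return index;
    }

    // Reads one cell by column name; returns null if the name is unknown or the row is too short.
    static String cell(String[] row, Map<String, Integer> index, String name) {
        Integer i = index.get(name);
        return (i == null || i >= row.length) ? null : row[i];
    }

    public static void main(String[] args) {
        String[] header = "#,Col1,Col2,Col3".split(",", -1);
        String[] row = "#1,a,b,c".split(",", -1);
        Map<String, Integer> index = indexByName(header);
        System.out.println(cell(row, index, "Col2")); // prints "b"
    }
}
```

The map only pays off if the column order can change between files. With a fixed layout, the index-based annotations are shorter.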

Hope this helps.


Thanks for taking the time to reply, Jeronimo. Really enjoying using the parser too! – Hurricane