2017-04-06

Weka API: creating a TDM using StringToWordVector

I have the following code:

    ArrayList<Attribute> attributes = new ArrayList<>()
    attributes.add(new Attribute("tweet", true))   // true creates a string attribute

    ArrayList<String> theLines = new ArrayList<>()
    File cleanestTweets = new File("cleanestTweets.txt")
    File savedResults = new File("savedResults.arff")
    Instances instances

    try {
        // Read one tweet per line
        Scanner console = new Scanner(cleanestTweets)
        while (console.hasNextLine()) {
            String line = console.nextLine()
            theLines.add(line)
        }

        // Add one string-valued instance per tweet
        Instance ins = new DenseInstance(1)
        instances = new Instances("TwitterData", attributes, theLines.size())
        theLines.each { it ->
            ins.setValue(attributes[0], it)
            instances.add(ins)
        }

        StringToWordVector filter = new StringToWordVector()
        filter.setInputFormat(instances)
        filter.setOutputWordCounts(true)
        filter.setTFTransform(true)
        filter.setDictionaryFileToSaveTo(savedResults)
        filter.getDictionaryFileToSaveTo()
    } catch (IOException e) {

    }

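The code that creates the instances works fine. I then try to create a TDM (term-document matrix) and write it to the savedResults file. When I run the code, nothing is written to that file, and I'm not sure why. I have read the documentation, but it doesn't mention anything about this.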

Answer

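    StringToWordVector filter = new StringToWordVector()
    // Weka's documentation advises setting filter options before calling setInputFormat
    filter.setDictionaryFileToSaveTo(savedResults)
    filter.setOutputWordCounts(true)
    filter.setTFTransform(true)
    filter.setInputFormat(instances)
    // Actually run the filter over the data; configuring it alone produces no output
    Instances dataFiltered = weka.filters.Filter.useFilter(instances, filter)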

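This does write the words and their occurrence counts to the file. It looks like you have to create a new Instances object and explicitly apply the filter to the data. I used this question to reach this conclusion.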
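To make the difference concrete, here is a minimal end-to-end sketch in Groovy. It rests on several assumptions that are not in the original post: the three in-memory tweets stand in for cleanestTweets.txt, the temporary dictionary file is a hypothetical output location, and the `@Grab` line assumes the `weka-stable` 3.8.6 artifact from Maven Central. My understanding is that the filter only builds (and saves) its dictionary once a batch of data has been pushed through it, which is what `Filter.useFilter` does, so treat this as an illustration rather than a definitive description of Weka's internals.

```groovy
@Grab('nz.ac.waikato.cms.weka:weka-stable:3.8.6')   // assumed dependency version
import weka.core.Attribute
import weka.core.DenseInstance
import weka.core.Instances
import weka.filters.Filter
import weka.filters.unsupervised.attribute.StringToWordVector

// Hypothetical sample data standing in for the lines of cleanestTweets.txt
def tweets = ['good morning world', 'good night world', 'hello world']

// A single string attribute holding the raw tweet text
def attributes = new ArrayList<Attribute>()
attributes.add(new Attribute('tweet', true))
def instances = new Instances('TwitterData', attributes, tweets.size())

tweets.each { text ->
    def ins = new DenseInstance(1)
    ins.setDataset(instances)   // gives the instance access to the attribute definitions
    ins.setValue(0, text)       // store the tweet in attribute 0
    instances.add(ins)          // Instances.add stores a copy of the instance
}

// Hypothetical output location for the dictionary
def dictionaryFile = File.createTempFile('dictionary', '.txt')

def filter = new StringToWordVector()
filter.setOutputWordCounts(true)
filter.setTFTransform(true)
filter.setDictionaryFileToSaveTo(dictionaryFile)
filter.setInputFormat(instances)   // called last, after all options are set

// Running the data through the filter yields the term-document matrix
Instances filtered = Filter.useFilter(instances, filter)

println filtered              // the TDM in ARFF form
println dictionaryFile.text   // the saved dictionary
```

Note that `filtered` is the term-document matrix itself, while the dictionary file only lists the terms and their counts. If the goal is an ARFF file of the matrix, writing `filtered.toString()` to a file (or using Weka's `ArffSaver`) is likely closer to what the question intended.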