2015-03-19 49 views

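Writing tokens to a file in Java

I have tokenization code, and I want to write its result to a .txt file so that I can later use that file to remove all punctuation.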

public static void tokenization() throws FileNotFoundException, IOException{ 

    String msg; 

    int numberOfTotalTokens =0; 
    int numberOfMessages = 0; 

    String data = "E:\\Data\\SMSSpamCollection.txt"; 

    FileInputStream fisData = new FileInputStream(data); 
    BufferedReader readBufferData = new BufferedReader(new InputStreamReader(fisData)); 

    try{ 
     while ((msg =readBufferData.readLine()) != null) { 
      int numberOfTokens = 0; 
      StringTokenizer tokens = new StringTokenizer(msg); 
      System.out.println("Before: "+msg);

      System.out.print("After : ");
      while (tokens.hasMoreTokens()) {
       String msgLower = tokens.nextToken().toLowerCase();
       System.out.print(msgLower+" "); // show each lower-cased token
       numberOfTokens++;
       numberOfTotalTokens++;
      }

      System.out.println("");
      System.out.println("Total Tokens: "+numberOfTokens);
      System.out.println("\n");
      numberOfMessages++;
     } 
     System.out.println("Total Tokens: "+numberOfTotalTokens); 
     System.out.println("Total Messages: "+numberOfMessages); 
    } 
    catch (Exception e){ 
     System.out.println("Error Exception: "+e.getMessage()); 
    } 
} 

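The result of this code is a set of tokens, for example: my name is calculator

This set of tokens needs to be written to a .txt file. How can I write it to a .txt file?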

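Please [edit] your question to explain exactly what is wrong with the code you have. Thank you for improving the question's reference value and making it more answerable! – 2015-03-19 03:53:23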

Do you need to tokenize, remove punctuation (or stop words), and then write the result to a .txt file? – VedX 2015-03-19 05:47:19

Answer

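I have marked my changes to your code with comments. With this code you can use a BufferedWriter to write one token per line to the file output.txt.

Some remarks:

  • Your code does some basic tokenization (documentation); it does not take care of stop words or of punctuation, for example.
  • StringTokenizer is a legacy class retained for backward compatibility; using String.split with a regular expression is recommended instead.
  • For more advanced tokenization, you can use the Apache Lucene library.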
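As a rough illustration of the String.split remark above, here is a minimal sketch. The class name SplitTokenizerSketch and the method tokenize are made up for this example, and the regular expression is only one possible choice: it treats every run of characters that are neither letters nor digits as a separator, so punctuation disappears from the tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code: tokenizes one line
// with String.split instead of StringTokenizer.
class SplitTokenizerSketch {

    // Lower-cases the line and splits it on runs of characters that are
    // neither letters nor decimal digits (whitespace and punctuation).
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            // split can return an empty first element when the line
            // starts with a separator, so skip empty strings
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! Free entry 2 win."));
    }
}
```

Note that this pattern also splits inside words such as "don't"; if that is not wanted, the pattern would need to be adjusted.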

Hope it helps!

// imports required at the top of the source file
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

public static void tokenization() throws FileNotFoundException, IOException{

String msg;

int numberOfTotalTokens =0;
int numberOfMessages = 0;

String data = "E:\\Data\\SMSSpamCollection.txt";
// output file output.txt; FileWriter creates it if it does not exist
String outfilename = "E:\\output.txt";

// try-with-resources closes the reader and the writer even if an exception occurs
try (BufferedReader readBufferData = new BufferedReader(new InputStreamReader(new FileInputStream(data)));
     // create a buffered writer tokDataB (true = append to an existing file)
     BufferedWriter tokDataB = new BufferedWriter(new FileWriter(outfilename,true))) {
    while ((msg =readBufferData.readLine()) != null) {
     int numberOfTokens = 0;
     StringTokenizer tokens = new StringTokenizer(msg);
     System.out.println("Before: "+msg);

     System.out.print("After : ");
     while (tokens.hasMoreTokens()) {
      String msgLower = tokens.nextToken().toLowerCase();
      System.out.print(msgLower+" ");

      // write one token per line to output file
      tokDataB.write(msgLower);
      tokDataB.newLine();

      numberOfTokens++;
      numberOfTotalTokens++;
     }

     System.out.println("");
     System.out.println("Total Tokens: "+numberOfTokens);
     System.out.println("\n");
     numberOfMessages++;
    }

    System.out.println("Total Tokens: "+numberOfTotalTokens);
    System.out.println("Total Messages: "+numberOfMessages);
}
catch (Exception e){
    System.out.println("Error Exception: "+e.getMessage());
}

}
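
Since the question also mentions removing punctuation, the following is a small sketch of how that step could be folded into the writing loop. It is an illustration under assumptions, not the code above: the class TokenFileSketch and the method writeTokens are hypothetical names, the files are assumed to be UTF-8, and only ASCII punctuation (the \p{Punct} class) is stripped. Unlike the FileWriter opened in append mode above, Files.newBufferedWriter overwrites an existing output file by default.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original answer.
class TokenFileSketch {

    // Reads the input file line by line, lower-cases each whitespace-separated
    // token, strips ASCII punctuation, and writes one token per line.
    // Returns the number of tokens written.
    static int writeTokens(Path input, Path output) throws IOException {
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String raw : line.trim().split("\\s+")) {
                    // \p{Punct} matches ASCII punctuation such as , . ! ?
                    String token = raw.toLowerCase().replaceAll("\\p{Punct}", "");
                    if (token.isEmpty()) {
                        continue; // the token was blank or only punctuation
                    }
                    writer.write(token);
                    writer.newLine();
                    count++;
                }
            }
        }
        return count;
    }
}
```

It could be called, for instance, as TokenFileSketch.writeTokens(Paths.get("E:\\Data\\SMSSpamCollection.txt"), Paths.get("E:\\output.txt")), with java.nio.file.Paths imported.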

Thank you very much, it helped me a lot – 2015-03-19 06:27:35

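Hi! I'm glad you found the answer useful! To mark my answer as accepted, click the check mark next to the answer to toggle it from greyed out to filled in (see stackoverflow.com/help/someone-answers) – user2314737 2015-03-19 07:19:48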