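I have tokenization code in Java, and I want to write the resulting tokens to a .txt file so that I can later use that file to remove all the punctuation.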
// Requires: import java.io.*; import java.util.StringTokenizer;
public static void tokenization() throws FileNotFoundException, IOException {
    String msg;
    int numberOfTotalTokens = 0;
    int numberOfMessages = 0;
    String data = "E:\\Data\\SMSSpamCollection.txt";
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader readBufferData = new BufferedReader(
            new InputStreamReader(new FileInputStream(data)))) {
        while ((msg = readBufferData.readLine()) != null) {
            int numberOfTokens = 0;
            StringTokenizer tokens = new StringTokenizer(msg);
            StringBuilder sb = new StringBuilder();
            System.out.println("Before: " + msg);
            while (tokens.hasMoreTokens()) {
                // Collect each lowercased token so it can be printed after the loop
                sb.append(tokens.nextToken().toLowerCase()).append(' ');
                numberOfTokens++;
                numberOfTotalTokens++;
            }
            System.out.println("After : " + sb.toString().trim());
            System.out.println("Total Tokens: " + numberOfTokens);
            System.out.println("\n");
            numberOfMessages++;
        }
        System.out.println("Total Tokens: " + numberOfTotalTokens);
        System.out.println("Total Messages: " + numberOfMessages);
    } catch (Exception e) {
        System.out.println("Error Exception: " + e.getMessage());
    }
}
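The result of this code is a set of tokens, for example: my name is calculator
This set of tokens needs to be written to a .txt file. How can I write it to a .txt file?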
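One possible way to get the tokens into a .txt file is to wrap the output file in a BufferedWriter and write one line of tokens per message. The following is only a minimal sketch under a few assumptions that are not in the question: the class name TokenFileWriter, the helper names tokenizeLine and writeTokens, and the output path E:\Data\tokens.txt are all made up for illustration, and both files are assumed to be UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringTokenizer;

class TokenFileWriter {

    // Splits one message on whitespace, lowercases each token,
    // and joins the tokens back together with single spaces.
    static String tokenizeLine(String msg) {
        StringTokenizer tokens = new StringTokenizer(msg);
        StringBuilder sb = new StringBuilder();
        while (tokens.hasMoreTokens()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(tokens.nextToken().toLowerCase());
        }
        return sb.toString();
    }

    // Reads every line of the input file, tokenizes it, and writes one
    // line of tokens per message to the output file.
    // Returns the number of messages processed.
    static int writeTokens(Path input, Path output) throws IOException {
        int messages = 0;
        // Assumption: both files are UTF-8; adjust the charset if they are not.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String msg;
            while ((msg = reader.readLine()) != null) {
                writer.write(tokenizeLine(msg));
                writer.newLine();
                messages++;
            }
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get("E:\\Data\\SMSSpamCollection.txt");
        // Hypothetical output location, chosen only for this example.
        Path output = Paths.get("E:\\Data\\tokens.txt");
        System.out.println("Messages written: " + writeTokens(input, output));
    }
}
```

Both streams are opened in a single try-with-resources statement, so the output is flushed and closed even if an exception is thrown part way through the input.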
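Please [edit] your question to explain exactly what is wrong with the code you have. Thanks for improving the question's reference value and making it more answerable! – 2015-03-19 03:53:23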
Do you need to tokenize, remove punctuation (or stop words), and then write to a .txt file? – VedX 2015-03-19 05:47:19
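If the goal is the pipeline described in that last comment, the punctuation and stop-word step could be done on each token before anything is written out. This is just a rough sketch: the class name TokenCleaner, the method cleanTokens, and the tiny stop-word list are invented for illustration, and "punctuation" is assumed to mean the ASCII punctuation characters matched by the regex class \p{Punct}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class TokenCleaner {

    // Splits the message on whitespace, lowercases each token, strips ASCII
    // punctuation, and skips tokens that end up empty or are in stopWords.
    static List<String> cleanTokens(String msg, Set<String> stopWords) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(msg);
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken()
                    .toLowerCase()
                    .replaceAll("\\p{Punct}", "");
            if (!token.isEmpty() && !stopWords.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative stop-word list only; a real one would be much longer.
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "the", "a"));
        List<String> cleaned = cleanTokens("Hello, World! It's 5pm... is it?", stopWords);
        System.out.println(String.join(" ", cleaned));
    }
}
```

The cleaned list can then be joined with spaces and written one message per line, so the punctuation never reaches the .txt file in the first place. Note that stripping punctuation this way also merges contractions ("it's" becomes "its"), which may or may not be what you want.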