2015-10-15 74 views

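I am trying to build a simple chatbot using a Markov chain. I have managed to build a dictionary from the patterns in the input text, but I can't figure out how to use it to generate sentences. How do I generate sentences with a Markov chain?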

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MarkovChain {

    private static final BreakIterator sentenceIterator = BreakIterator.getSentenceInstance();
    private static final BreakIterator wordIterator = BreakIterator.getWordInstance();

    private static final Map<String, List<String>> dictionary = new TreeMap<>();

    public static void addDictionary(String string) {
        string = string.toLowerCase().trim();
        for (final String sentence : splitSentences(string)) {
            String lastWord = null, lastLastWord = null;
            for (final String word : splitWords(sentence)) {
                if (lastLastWord != null) {
                    final String key = lastLastWord + ' ' + lastWord;
                    List<String> value = dictionary.get(key);
                    if (value == null)
                        value = new ArrayList<>();
                    value.add(word);
                    dictionary.put(key, value);
                }
                lastLastWord = lastWord;
                lastWord = word;
            }
        }
    }

    private static List<String> splitSentences(final String string) {
        sentenceIterator.setText(string);
        final List<String> sentences = new ArrayList<>();
        for (int start = sentenceIterator.first(), end = sentenceIterator.next(); end != BreakIterator.DONE; start = end, end = sentenceIterator.next()) {
            sentences.add(string.substring(start, end).trim());
        }
        return sentences;
    }

    private static List<String> splitWords(final String string) {
        wordIterator.setText(string);
        final List<String> words = new ArrayList<>();
        for (int start = wordIterator.first(), end = wordIterator.next(); end != BreakIterator.DONE; start = end, end = wordIterator.next()) {
            String word = string.substring(start, end).trim();
            if (word.length() > 0 && Character.isLetterOrDigit(word.charAt(0)))
                words.add(word);
        }
        return words;
    }
}

How would I generate sentences from the dictionary?

Answer


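Here is how I would change your code so that it can generate sentences. I added a Map<String, List<String>> singleWords that maps the previous word to the list of possible next words, along with code that fills this map while looping over the words in a sentence. I also put a dot on each side of the word list to record the special states "before the first word" and "after the last word" (see addDots(...)).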

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class MarkovChain {

    private static final BreakIterator sentenceIterator = BreakIterator.getSentenceInstance();
    private static final BreakIterator wordIterator = BreakIterator.getWordInstance();

    // Previous word -> words seen directly after it.
    private static final Map<String, List<String>> singleWords = new TreeMap<>();
    // "word1 word2" -> words seen directly after that pair.
    private static final Map<String, List<String>> dictionary = new TreeMap<>();

    public static void main(String[] args) throws Exception {
        String text = new String(Files.readAllBytes(Paths.get("text.txt")), Charset.defaultCharset());
        addDictionary(text);
        StringBuilder output = new StringBuilder();
        generateSentence(singleWords, dictionary, output, 5);
        System.out.println(output.toString());
    }

    public static void addDictionary(String string) {
        string = string.toLowerCase().trim();
        for (final String sentence : splitSentences(string)) {
            String lastWord = null, lastLastWord = null;
            for (final String word : addDots(splitWords(sentence))) {
                if (lastLastWord != null) {
                    final String key = lastLastWord + ' ' + lastWord;
                    List<String> value = dictionary.get(key);
                    if (value == null)
                        value = new ArrayList<>();
                    value.add(word);
                    dictionary.put(key, value);
                }
                if (lastWord != null) {
                    final String key = lastWord;
                    List<String> value = singleWords.get(key);
                    if (value == null)
                        value = new ArrayList<>();
                    value.add(word);
                    singleWords.put(key, value);
                }
                lastLastWord = lastWord;
                lastWord = word;
            }
        }
    }

    private static List<String> splitSentences(final String string) {
        sentenceIterator.setText(string);
        final List<String> sentences = new ArrayList<>();
        for (int start = sentenceIterator.first(), end = sentenceIterator.next(); end != BreakIterator.DONE; start = end, end = sentenceIterator.next()) {
            sentences.add(string.substring(start, end).trim());
        }
        return sentences;
    }

    private static List<String> splitWords(final String string) {
        wordIterator.setText(string);
        final List<String> words = new ArrayList<>();
        for (int start = wordIterator.first(), end = wordIterator.next(); end != BreakIterator.DONE; start = end, end = wordIterator.next()) {
            String word = string.substring(start, end).trim();
            if (word.length() > 0 && Character.isLetterOrDigit(word.charAt(0)))
                words.add(word);
        }
        return words;
    }

    // Surrounds the words with "." markers for the start and end states.
    private static List<String> addDots(List<String> words) {
        words.add(0, ".");
        words.add(".");
        return words;
    }

    public static void generateSentence(Map<String, List<String>> singleWords,
            Map<String, List<String>> dictionary, StringBuilder target, int count) {
        Random r = new Random();
        // Generate as many sentences as requested by count.
        for (int i = 0; i < count; i++) {
            String w1 = ".";
            String w2 = pickRandom(singleWords.get(w1), r);
            while (w2 != null) {
                target.append(w2).append(" ");
                if (w2.equals("."))
                    break;
                String w3 = pickRandom(dictionary.get(w1 + " " + w2), r);
                w1 = w2;
                w2 = w3;
            }
            target.append("\n");
        }
    }

    private static String pickRandom(List<String> alternatives, Random r) {
        // An unknown key yields null, which ends the sentence instead of throwing.
        if (alternatives == null || alternatives.isEmpty())
            return null;
        return alternatives.get(r.nextInt(alternatives.size()));
    }
}
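
To make the two lookup tables concrete, here is a minimal sketch that fills them for one pre-tokenized sentence. The class name TableDemo and the method addSentence are hypothetical and only for illustration; tokenizing with BreakIterator is skipped, and computeIfAbsent stands in for the explicit null check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the answer's MarkovChain.
final class TableDemo {

    // Previous word -> words seen directly after it.
    final Map<String, List<String>> singleWords = new TreeMap<>();
    // "word1 word2" -> words seen directly after that pair.
    final Map<String, List<String>> dictionary = new TreeMap<>();

    void addSentence(List<String> tokens) {
        // Copy the tokens and add the "." start and end markers.
        List<String> words = new ArrayList<>(tokens);
        words.add(0, ".");
        words.add(".");
        String lastWord = null, lastLastWord = null;
        for (String word : words) {
            if (lastLastWord != null)
                dictionary.computeIfAbsent(lastLastWord + ' ' + lastWord, k -> new ArrayList<>()).add(word);
            if (lastWord != null)
                singleWords.computeIfAbsent(lastWord, k -> new ArrayList<>()).add(word);
            lastLastWord = lastWord;
            lastWord = word;
        }
    }

    public static void main(String[] args) {
        TableDemo demo = new TableDemo();
        demo.addSentence(Arrays.asList("the", "cat", "sat"));
        System.out.println(demo.singleWords); // {.=[the], cat=[sat], sat=[.], the=[cat]}
        System.out.println(demo.dictionary);  // {. the=[cat], cat sat=[.], the cat=[sat]}
    }
}
```

Generation starts at the "." entry of singleWords to get a first word, then keeps looking up the last two words in dictionary until it draws "." again.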

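I should mention that this approach is not optimized. If I needed more efficiency, I would count the words in the dictionary map and normalize the counts at the end to get frequencies, for example Map<String, Map<String, Double>> dictionary, where the inner map maps each word to its frequency. That would, however, require picking words differently from how I did it in my example.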
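
A minimal sketch of that frequency-based variant might look like the following. The class WeightedMarkov and its methods record, frequencies and pick are hypothetical names, and as a simplifying assumption the sketch keeps integer counts and normalizes them on demand rather than storing the Double frequencies directly.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch of frequency-based picking, not the answer's own code.
final class WeightedMarkov {

    // key -> (next word -> number of times it followed the key)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Records one observation of `next` following `key`.
    void record(String key, String next) {
        counts.computeIfAbsent(key, k -> new TreeMap<>()).merge(next, 1, Integer::sum);
    }

    // Normalizes the counts for one key into relative frequencies that sum to 1.
    Map<String, Double> frequencies(String key) {
        Map<String, Integer> followers = counts.getOrDefault(key, new TreeMap<>());
        int total = 0;
        for (int c : followers.values())
            total += c;
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : followers.entrySet())
            result.put(e.getKey(), e.getValue() / (double) total);
        return result;
    }

    // Picks a follower with probability equal to its frequency; null if the key is unknown.
    String pick(String key, Random r) {
        Map<String, Double> freq = frequencies(key);
        if (freq.isEmpty())
            return null;
        double roll = r.nextDouble();
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : freq.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (roll < cumulative)
                return last;
        }
        return last; // fallback in case rounding leaves the sum slightly below 1
    }

    public static void main(String[] args) {
        WeightedMarkov m = new WeightedMarkov();
        m.record(". the", "cat");
        m.record(". the", "cat");
        m.record(". the", "cat");
        m.record(". the", "dog");
        System.out.println(m.frequencies(". the")); // {cat=0.75, dog=0.25}
        System.out.println(m.pick(". the", new Random()));
    }
}
```

Compared with the list-based maps, each distinct follower is stored once per key, at the cost of a cumulative scan when picking.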


This is a great example, thank you! – 64test1234