Counting patterns in a text file

I have a huge text file. I want to count the words that appear right after the phrase "i feel" in that file.

Here is a small example of what the file looks like:

i feel awesome 
i feel nothing but i also feel awesome 
i feel good.

I read the text file and matched the lines containing "i feel". My output now has this form:

res3: Array[String] = Array("awesome", "nothing", "good", ....)

I need to count the occurrences of these words in the text file. The code I have used so far is:

val c1 = scala.io.Source.fromFile("text.txt", "UTF-8"). 
    getLines.flatMap(regexpr.findAllIn(_).toList). 
    foldLeft(Map.empty[String, Int]) { 
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1)) 
    }

However, this gives me counts for only a few of the words in that array. For example, it returns:

c1: scala.collection.immutable.Map[String,Int] = Map(awesome -> 1, nothing -> 4)

It does not return counts for all the words in the list. Also, how can I write a Map[String,Int] to a text file?


2017-04-15 AzkaGilani

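Possible duplicate of [Scala beginners - simplest way to count words in file](http://stackoverflow.com/questions/15487413/scala-beginners-simplest-way-to-count-words-in-file) – starlight

The solution you refer to does not return all the matches. I have updated my original post. – AzkaGilani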

Here are the lines of the text file:

val lines = scala.io.Source.fromFile("text.txt","UTF-8").getLines

Here is a Java PrintWriter:

val f = new java.io.PrintWriter(new java.io.File("counts.txt"))

Here the words that follow "i feel" are matched, grouped, counted, and written to the text file:

lines.flatMap { 
    "i feel (\\w+)".r.findAllMatchIn(_).map(_.group(1)) // Return only paren matches 
}.toTraversable.groupBy(identity).mapValues(_.size).foreach { 
    case (word, count) => f.write(s"$count\t$word\n") // Separate by tab 
}

Then close the file:

f.close()

See the Scala documentation on regular expressions.
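The steps above can be put together as one small program. This is only a sketch under a few assumptions: the object name `FeelCounts` and the helpers `countAfterFeel` and `writeCounts` are made up for illustration, the file names are the ones used above, and `try`/`finally` is used so that both the source and the writer get closed even if something fails.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical wrapper object; the names are illustrative only.
object FeelCounts {
  // Captures the single word that directly follows "i feel".
  private val Feel = "i feel (\\w+)".r

  // Count every captured word across all lines.
  def countAfterFeel(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(line => Feel.findAllMatchIn(line).map(_.group(1)))
      .foldLeft(Map.empty[String, Int]) { (counts, word) =>
        counts.updated(word, counts.getOrElse(word, 0) + 1)
      }

  // Write one "count<TAB>word" line per map entry.
  def writeCounts(counts: Map[String, Int], path: String): Unit = {
    val out = new PrintWriter(path, "UTF-8")
    try counts.foreach { case (word, count) => out.println(s"$count\t$word") }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("text.txt", "UTF-8")
    try writeCounts(countAfterFeel(source.getLines()), "counts.txt")
    finally source.close()
  }
}
```

Note that the pattern only matches the literal text "i feel", so in the sample line "i feel nothing but i also feel awesome" only "nothing" is counted; "i also feel awesome" does not match.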


2017-04-15 18:28:47 ashawley

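Please check my updated question; you missed the point. I need to find, in the large text file, the strings that are present in a specific array. – AzkaGilani

Thank you so much :) – AzkaGilani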
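If the comment above means that only words from a given array should be counted, one possible reading is to filter the matches against that array before counting. The sketch below assumes exactly that; `WantedCounts`, `countWanted` and the `wanted` parameter are hypothetical names standing in for the asker's array, and every wanted word starts at zero so that words which never occur still appear in the result.

```scala
// Hypothetical helper; `wanted` stands in for the asker's array of words.
object WantedCounts {
  // Captures the single word that directly follows "i feel".
  private val Feel = "i feel (\\w+)".r

  def countWanted(lines: Iterator[String], wanted: Set[String]): Map[String, Int] = {
    // Start every wanted word at 0 so absent words are still reported.
    val zeros = wanted.map(_ -> 0).toMap
    lines
      .flatMap(line => Feel.findAllMatchIn(line).map(_.group(1)))
      .filter(wanted.contains) // keep only words from the given set
      .foldLeft(zeros) { (counts, word) => counts.updated(word, counts(word) + 1) }
  }
}
```

An array such as `res3` can be passed as `res3.toSet`, and the resulting map can be written out the same way as in the answer.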
