How to split a file into blocks of contiguous non-empty lines in Scala?

For example, we have a file with the following content:

aaa 
    bbb 
ccc 

dd dd 
eee 
fff 

gg 
hhhhh

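The task is to parse this file into an ordered/numbered collection (Map, Array, or other) that contains the three contiguous blocks as collections of strings.

The algorithmic, Java-style way of doing this seems fairly obvious, but it would be nice if someone could suggest a functional, Scala-idiomatic solution.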

Source

2016-09-20 Ivan

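Join the lines with an arbitrary delimiter (|), then split on runs of that delimiter to group the lines into blocks (this assumes no line itself contains the delimiter):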

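import scala.io.Source

// Join lines with "|", split on runs of two or more "|" (blank lines), then split each block.
val blocks: List[List[String]] = Source
    .fromFile("<path-to-file>").getLines()
    .mkString("|")
    .split("\\|{2,}").toList
    .map(_.split("\\|").toList)

This gives you: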

List(List(aaa, bbb, ccc), List(dd dd, eee, fff), List(gg, hhhhh))

Source

2016-09-20 15:11:01

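Pretty neat, but that 'mkString' adds a lot of unnecessary work on reasonably large files. If you want to read the whole file into memory, you don't need 'mkString' at all; just split the list using an 'isEmpty' check on each element. Then again, the result ends up in memory anyway, so it makes little difference. But if we want to stream those contiguous blocks, we should avoid reading everything into memory.

Using Stream.span: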
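The comment above suggests splitting the in-memory list directly with an `isEmpty` check. A minimal sketch of that idea follows; the names `SplitOnBlank` and `blocks` are made up for illustration, and it assumes a blank line is exactly the empty string (whitespace-only lines count as content).

```scala
object SplitOnBlank {
  // Groups consecutive non-empty lines; a run of blank lines acts as one separator.
  def blocks(lines: List[String]): List[List[String]] =
    lines.foldRight(List(List.empty[String])) { (line, acc) =>
      if (line.isEmpty) List.empty[String] :: acc // blank line: open a new block
      else (line :: acc.head) :: acc.tail         // otherwise prepend to the current block
    }.filter(_.nonEmpty)                          // drop empty blocks left by extra blank lines
}
```

Folding from the right lets each line be prepended in constant time, so no intermediate joined string is built.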

scala> def chunks(s: Stream[String]): Stream[Seq[String]] = { 
    | val (h, t) = s.span(_.nonEmpty) 
    | h.toSeq #:: chunks(t.tail) } 
chunks: (s: Stream[String])Stream[Seq[String]]

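With a few refinements (stop when no lines remain, and use 'drop 1' instead of 'tail', which throws on an empty stream):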

scala> def chunks(s: Stream[String]): Stream[Stream[String]] = { 
    | val (h, t) = s.span(_.nonEmpty) 
    | if (h.isEmpty) Stream.empty else h #:: chunks(t drop 1) } 
chunks: (s: Stream[String])Stream[Stream[String]] 

scala> val cs = chunks(lines.lines.toStream).iterator 
cs: Iterator[Stream[String]] = non-empty iterator 

scala> cs.next.toList 
res0: List[String] = List(aaa, " bbb", ccc) 

scala> cs.next.toList 
res1: List[String] = List(dd dd, eee, fff) 

scala> cs.next.toList 
res2: List[String] = List(gg, hhhhh) 

scala> cs.hasNext 
res3: Boolean = false
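Note that `chunks` as written ends at the first empty block, so a leading blank line or two consecutive blank lines stops the output early. The following is a sketch of one possible variant that skips runs of blank lines first; the object name `StreamChunks` is hypothetical, and on Scala 2.13+ `LazyList` would be the non-deprecated replacement for `Stream`.

```scala
object StreamChunks {
  // Variant of chunks that tolerates leading, trailing and repeated blank lines.
  def chunks(s: Stream[String]): Stream[Stream[String]] = {
    val rest = s.dropWhile(_.isEmpty)      // skip any run of blank lines
    if (rest.isEmpty) Stream.empty         // nothing left: end of output
    else {
      val (h, t) = rest.span(_.nonEmpty)   // h: one block, t: everything after it
      h #:: chunks(t)                      // the recursive call is only evaluated on demand
    }
  }
}
```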

Source

2016-09-20 15:18:14

You can use split to transform it:

def contigSplit(s : String) : Array[Array[String]] = s.split("\n\n").map(_.split("\n"))

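This works because each contiguous block is terminated by two newlines, i.e. blocks are separated by exactly one empty line.

REPL usage: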

scala> val s = """ 
    | aaa 
    | bbb 
    | ccc 
    | 
    | dd dd 
    | eee 
    | fff 
    | 
    | gg 
    | hhhhh 
    | """ 

scala> s.split("\n\n").map(_.split("\n")) 
res7: Array[Array[String]] = Array(Array("", aaa, " bbb", ccc), Array(dd dd, eee, fff), Array(gg, hhhhh))

Alternative:

If the blank lines may contain other whitespace, you can split with a regular expression:

def contigSplitRegEx(s : String) : Array[Array[String]] = "\n\\s*\n".r.split(s).map(_.split("\n"))
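As the REPL output shows, a leading newline leaves an empty string in the first block. The sketch below is one way to tidy that up; `RegexSplit` and `contigSplitClean` are illustrative names, and the choice to accept both `\n` and `\r\n` line endings is an assumption that goes beyond the original answer.

```scala
object RegexSplit {
  // Splits on one or more blank (or space/tab-only) lines, then drops empty
  // entries left over from leading or trailing newlines.
  def contigSplitClean(s: String): List[List[String]] =
    s.split("\r?\n(?:[ \t]*\r?\n)+").toList                  // block separators
      .map(_.split("\r?\n").toList.filter(_.trim.nonEmpty))  // lines within a block
      .filter(_.nonEmpty)                                    // discard empty blocks
}
```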

Source

2016-09-20 15:29:50

// Folds over the lines, carrying (finished blocks, block currently being built).
def getListOfContiguousLists(iterator: Iterator[String]): List[List[String]] = {
  val (listOfContiguousList, lastList) = iterator
    .foldLeft((List.empty[List[String]], List.empty[String])) {
      case ((listOfLists, list), line) => (line.isEmpty, list.isEmpty) match {
        case (true, true)  => (listOfLists, list)                        // blank line, nothing pending
        case (true, false) => (listOfLists :+ list, List.empty[String])  // blank line closes the block
        case (false, _)    => (listOfLists, list :+ line)                // extend the current block
      }
    }
  // Flush the final block if the input did not end with a blank line.
  if (lastList.isEmpty) listOfContiguousList else listOfContiguousList :+ lastList
}

val list = getListOfContiguousLists(scala.io.Source.fromFile("").getLines())

Source

2016-09-20 15:31:40

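Maintain a map of numbered buckets, and use the line index % 3 to decide which bucket each line goes into. Note that this distributes the non-empty lines round-robin by index, so it does not keep contiguous lines together.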

Source.fromFile("some_file").getLines().toList.filterNot(_.isEmpty) 
     .zipWithIndex 
     .foldLeft(Map(0 -> List.empty[String], 1 -> List.empty[String], 2 -> List.empty[String])) { (result, current) => 
     result.updated(current._2 % 3, result(current._2 % 3) ++ List(current._1)) 
     }

Source

2016-09-20 16:35:14 pamu

This is not purely functional, but it works as an iterator:

def groupBlanksIterator(xs:Iterator[String]) = 
new Iterator[List[String]] 
    { def hasNext = xs.hasNext; def next = xs.takeWhile(_.nonEmpty).toList} 

groupBlanksIterator(scala.io.Source.fromFile("whatever").getLines)
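One caveat: the Scala API documentation advises against reusing an iterator after calling `takeWhile` on it, and the version above also yields empty lists for consecutive or trailing blank lines. A hedged sketch of an alternative that peeks ahead with a `BufferedIterator` instead is shown below; `BlockIterator` and `groupBlanks` are hypothetical names.

```scala
object BlockIterator {
  // Wraps a line iterator so that each call to next() yields one block of non-empty lines.
  def groupBlanks(lines: Iterator[String]): Iterator[List[String]] =
    new Iterator[List[String]] {
      private val in = lines.buffered // BufferedIterator lets us inspect `head` without consuming it

      // Consume blank lines so the next element, if any, starts a block.
      private def skipBlanks(): Unit =
        while (in.hasNext && in.head.isEmpty) in.next()

      def hasNext: Boolean = { skipBlanks(); in.hasNext }

      def next(): List[String] = {
        skipBlanks()
        if (!in.hasNext) throw new NoSuchElementException("no more blocks")
        val block = List.newBuilder[String]
        while (in.hasNext && in.head.nonEmpty) block += in.next()
        block.result()
      }
    }
}
```

Only one block is held in memory at a time, so this can be applied to `getLines()` on a large file.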

Source

2016-09-20 20:59:56
