2014-09-20 103 views
2

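Reading a large file line by line

I'm trying to write an iteration-based reader for large files in Clojure. But how can I return the file one line at a time? I'd like to be able to do something like this: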

(println (do_something (readFile (:file opts)))) ; process and print the first line
(println (do_something (readFile (:file opts)))) ; process and print the second line

Code:

(ns testapp.core 
    (:gen-class) 
    (:require [clojure.tools.cli :refer [cli]]) 
    (:require [clojure.java.io])) 


(defn readFile [file] 
    ; Iterate over opened file (read line by line) 
    (with-open [rdr (clojure.java.io/reader file)] 
    (let [seq (line-seq rdr)] 
     ; how return only one line there? and after, when needed, take next line? 
    ))) 

(defn -main [& args] 
    ; Main function for project 
    (let [[opts args banner] 
     (cli args 
      ["-h" "--help" "Print this help" :default false :flag true] 
      ["-f" "--file" "REQUIRED: File with data"] 
      ["-c" "--clusters" "Count of clusters" :default 3] 
      ["-g" "--hamming" "Use Hamming algorithm"] 
      ["-e" "--evklid" "Use Evklid algorithm"] 
     )] 
    ; Print help, when no typed args 
    (when (:help opts) 
     (println banner) 
     (System/exit 0)) 
    ; Or process args and start work 
    (if (and (:file opts) (or (:hamming opts) (:evklid opts))) 
     (do 
     ; Use Hamming algorithm 
     (if (:hamming opts) 
      (do 
      (println (readFile (:file opts))) 
      (println (readFile (:file opts))) 
     ) 
      ;(count (readFile (:file opts))) 
     ; Use Evklid algorithm 
     (println "Evklid"))) 
     (println "Please, type path for file and algorithm!")))) 
+0

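What do you mean by "return line by line"? You could write your lines into some atom, but then reading line by line is pointless, because the atom keeps everything in memory. Make your readFile accept a processing function and print the result. – coredump 2014-09-20 15:23:01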

Answers

3

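Maybe I don't fully understand what "return line by line" means, but I'd suggest writing a function that accepts a file and a processing function, then prints the result of the processing function for each line of your big file. Or, more generally, let's accept both a processing function and an output function (println by default), so that instead of just printing we can also send the result over the network, save it somewhere, pass it to another thread, etc.: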

(defn process-file-by-lines 
    "Process file reading it line-by-line" 
    ([file] 
    (process-file-by-lines file identity)) 
    ([file process-fn] 
    (process-file-by-lines file process-fn println)) 
    ([file process-fn output-fn] 
    (with-open [rdr (clojure.java.io/reader file)] 
    (doseq [line (line-seq rdr)] 
     (output-fn 
     (process-fn line)))))) 

So:

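(process-file-by-lines "/tmp/tmp.txt") ;; Will just print the file line by line 
;; clojure.string/reverse reverses a string (assumes clojure.string is loaded); 
;; plain reverse would print a seq of characters instead 
(process-file-by-lines "/tmp/tmp.txt" 
         clojure.string/reverse) ;; Will print each line reversed 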
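
As a rough sketch of the "save it somewhere" case mentioned above, the output function can write to another file instead of calling println. The three-argument arity is repeated here only so the snippet runs on its own; in-file and out-file are hypothetical temp files made up for the demo, not part of the original answer.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Copy of the three-argument arity from the answer above.
(defn process-file-by-lines [file process-fn output-fn]
  (with-open [rdr (io/reader file)]
    (doseq [line (line-seq rdr)]
      (output-fn (process-fn line)))))

;; Hypothetical demo files; any readable/writable paths would do.
(def in-file (java.io.File/createTempFile "in" ".txt"))
(def out-file (java.io.File/createTempFile "out" ".txt"))
(spit in-file "alpha\nbeta\n")

;; Upper-case each line and append it to out-file as it is read,
;; so only one line is held in memory at a time.
(with-open [w (io/writer out-file)]
  (process-file-by-lines in-file
                         str/upper-case
                         (fn [s] (.write w (str s "\n")))))
```

The writer is opened by the caller, so its lifetime is independent of the reader that process-file-by-lines opens and closes internally.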
4

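You could also try reading lazily from the reader yourself, which is not the same as the lazy list of strings returned by line-seq. The details are discussed in this answer to a very similar question, but the gist is here: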

(defn lazy-file-lines [file] 
     (letfn [(helper [rdr] 
       (lazy-seq 
        (if-let [line (.readLine rdr)] 
        (cons line (helper rdr)) 
        (do (.close rdr) nil))))] 
     (helper (clojure.java.io/reader file)))) 

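You can then map over the result, and lines will be read only as they are needed. As discussed in more detail in the linked answer, the drawback is that if you never read to the end of the file, (.close rdr) will never run, which can cause resource problems.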

+1

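Even if you wanted to stop early, you couldn't close the file, because the descriptor is in a local scope. If you really need a lazy seq, it's probably better to open and close the reader explicitly. – coredump 2014-09-20 16:52:47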
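
One way to read the suggestion in the comment above, as a minimal sketch: keep the reader inside with-open and realize only the lines you need before the scope ends. take-lines and demo-file are hypothetical names introduced for illustration.

```clojure
(require '[clojure.java.io :as io])

(defn take-lines
  "Hypothetical helper: returns the first n lines of file as a vector.
  The reader is opened and closed here, so it is released even though
  the file is not read to the end."
  [file n]
  (with-open [rdr (io/reader file)]
    ;; vec realizes the taken lines while the reader is still open;
    ;; returning the lazy seq itself would fail once the reader is closed.
    (vec (take n (line-seq rdr)))))

;; Hypothetical demo file.
(def demo-file (java.io.File/createTempFile "demo" ".txt"))
(spit demo-file "one\ntwo\nthree\n")

(take-lines demo-file 2)
```

The trade-off is that the lines must be consumed (or copied out) inside the scope, so this suits "give me the next few lines" better than handing a lazy sequence to distant code.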

2

Try doseq:

(defn readFile [file] 
    (with-open [rdr (clojure.java.io/reader file)] 
    (doseq [line (line-seq rdr)] 
     (println line))))