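Using awk to fix delimited text that has the delimiter inside the fields
I'm trying to clean up a messy 4GB txt file in a mixed CSV format. The data has about 38 columns, each enclosed in double quotes (example below). It was exported with commas as the field separator, but there are also commas inline within the data, which makes it difficult to import into most platforms. I believe I can repair the data with awk/sed/cat, since every column is delimited by quotes; I just can't figure out how to do it. Between each pair of quotes, I want all commas replaced with a period or something similar. The fields containing commas are in the middle of my columns, not in the last field of the dataset. So far I have tried to rip out the part with commas using awk, replace the commas with sed, and then paste it back into the file with cat.
The actual data is sensitive and can't be shared, but the example below is similar.
Data sample: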
"identifier","Status","Name","City","Application","Job","Details","column 39"
"red","paid","Dave","Philadelphia","55823","Cashier","No commas in this comment","spare1"
"rojo","past due","Steve","San Francisco","78434","trainer","Does not like sushi, beer, or ham","spare2"
"verde","pending","Duncan","Columbus","65478","CEO","Late for work, on the fifth","spare3"
Desired result, focusing on changing the commas, with the "column 39" data added back either inline or at the end:
"identifier","Status","Name","City","Application","Job","Details","column 39"
"red","paid","Dave","Philadelphia","55823","Cashier","No commas in this comment","spare1"
"rojo","past due","Steve","San Francisco","78434","trainer","Does not like sushi. beer. or ham","spare2"
"verde","pending","Duncan","Columbus","65478","CEO","Late for work. on the fifth","spare3"
Any suggestions are greatly appreciated!
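Please slow down and post some troublesome data along with the expected output. We don't like making up our own test cases.
@JamesBrown Apologies for the lack of content; I have added a few example lines.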
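One possible approach, as a minimal sketch rather than a definitive fix: because every field in the sample is wrapped in double quotes, awk can split each line on the quote character instead of the comma. The text inside the quotes then lands in the even-numbered fields, and only those get their commas replaced. The function name `fix_commas` and the file names in the usage comment are placeholders, and the sketch assumes there are no escaped quotes and no line breaks inside a field.

```shell
# fix_commas: hypothetical helper name; reads CSV on stdin, writes to stdout.
# Assumes every field is wrapped in double quotes, with no escaped quotes
# ("" or \") and no line breaks inside a field.
fix_commas() {
    # Splitting on the double quote puts the quoted contents in the
    # even-numbered fields; the odd-numbered ones hold the separating commas.
    awk -F'"' -v OFS='"' '{
        for (i = 2; i <= NF; i += 2)
            gsub(/,/, ".", $i)   # replace commas inside the quoted field only
        print
    }'
}

# Example usage (input.txt and output.txt are placeholder names):
# fix_commas < input.txt > output.txt
```

awk reads and writes one line at a time, so this should not need to hold the 4GB file in memory. If the real data contains quotes inside fields, the even/odd trick breaks and a real CSV parser would be the safer choice.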