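RegExp works with String.match, but not with String.split

I have a type of CSV file that I need to parse. The sample below has exactly the conditions I need to account for (missing column headers, line feeds inside quotes, missing data, etc.):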
ID,NAME,TITLE,DESCRIPTION,,
PRO1234,"JOHN SMITH",ENGINEER,"JOHN HAS BEEN WORKING
HARD ON BEING A GOOD
SERVENT."
PRO1235,"KEITH SMITH",ENGINEER,"keith has been working
hard on being a good
servent."
PRO1235,"KENNY SMITH",,"keith has been working
hard on being a good
servent."
PRO1235,"RICK SMITH",,,
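You'll notice that there are line feeds inside the quoted descriptions as well as the line feeds that mark new rows of data.
I wrote this regular expression to find the line feeds outside of quotes, and it works great when tested on its own.
Here is the code, using Node.js: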
var fs = require('fs');
function parseCSV(filename){
    var rx = new RegExp(/\n(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g);
    var strFile = fs.readFileSync(filename).toString();
    console.log("line feed count via match: " + strFile.match(rx).length);
    var csv = strFile.split(rx);
    console.log("csv length: " + csv.length);
    console.log("csv items ###############################");
    csv.forEach(function(e,i,a){
        console.log("item e: " + e);
    });
}
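When I run this, you'll see that the line feed count (the line feeds found via match) is correct, i.e. 4. However, when I use the same regex with String.split(), the array it returns is erratic: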
line feed count via match: 4
csv length: 17
csv items ###############################
item e: ID,NAME,TITLE,DESCRIPTION,,
item e:
PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1234,"JOHN SMITH",ENGINEER,"JOHN HAS BEEN WORKING
HARD ON BEING A GOOD
SERVENT."
item e:
PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1235,"KEITH SMITH",ENGINEER,"keith has been working
hard on being a good
servent."
item e:
PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1235,"KENNY SMITH",,"keith has been working
hard on being a good
servent."
item e: PRO1235,"RICK SMITH"
item e: "RICK SMITH"
item e: undefined
item e: PRO1235,"RICK SMITH",,,
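What am I doing wrong with split? My thinking is that if match() identifies the 4 line feeds perfectly, then the same regex should give the positions at which to "split" the string.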
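One likely explanation for the difference: String.prototype.split inserts the contents of every capturing group into the result array after each separator it finds, while match() with the g flag returns only the full matches. The pattern has three capturing groups, so 4 separators would give 5 rows plus 4 × 3 captured values, which is the 17 items seen above. The sketch below reproduces the effect on a smaller, made-up sample and compares it with the same pattern rewritten with non-capturing groups `(?: ... )`. The names `sample`, `capturing` and `nonCapturing` are illustrative only and are not from the original code.

```javascript
// Made-up sample with the same shape as the CSV above: the second row has a
// quoted field that contains a line feed.
const sample = 'ID,NAME\nP1,"A\nB"\nP2,"C"';

// Pattern from the question: every pair of parentheses is a capturing group.
const capturing = /\n(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;

// Same pattern with each group made non-capturing via (?: ... ).
const nonCapturing = /\n(?=(?:[^"\\]*(?:\\.|"(?:[^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;

// match() with the g flag returns only the matched separators.
const matchCount = sample.match(capturing).length;

// split() with capturing groups also splices the captured values
// (including undefined for groups that did not participate) into the result.
const withCaptures = sample.split(capturing);

// split() with non-capturing groups returns only the rows.
const rows = sample.split(nonCapturing);

console.log('separators found by match: ' + matchCount);
console.log('items from capturing split: ' + withCaptures.length);
console.log('items from non-capturing split: ' + rows.length);
```

If this is indeed the cause, swapping the pattern in parseCSV for the non-capturing version should make csv.length equal to the match count plus one.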
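A classic case of reinventing the wheel. [Why not use a dedicated CSV parser?](https://code.google.com/p/jquery-csv/) – anubhava 2014-09-23 16:46:19
First of all, you can't start parsing a string from the middle. – sln 2014-09-23 17:01:17
sln - can you explain your comment? If I call string.split(regExp), how is that parsing from the middle of the string? – neoRiley 2014-09-23 17:10:59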
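The comments above point toward reading the text from the start rather than deciding about each line feed in isolation. A minimal sketch of that idea, without a regex, is a single left-to-right scan that tracks whether the current position is inside quotes. The function name `splitCsvRows` is hypothetical, and the sketch assumes that rows end in `\n` only and that quotes are escaped by doubling (`""`) rather than with a backslash. It is not a replacement for a full CSV parser.

```javascript
// Hypothetical helper, not from the original post: splits CSV text into rows
// by scanning from the start and tracking whether we are inside quotes.
function splitCsvRows(text) {
  const rows = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      // A doubled quote ("") toggles twice, so the state ends up unchanged.
      inQuotes = !inQuotes;
      current += ch;
    } else if (ch === '\n' && !inQuotes) {
      // A line feed outside quotes ends the current row.
      rows.push(current);
      current = '';
    } else {
      // Everything else, including line feeds inside quotes, stays in the row.
      current += ch;
    }
  }

  // Keep the last row if the text does not end with a line feed.
  if (current.length > 0) {
    rows.push(current);
  }
  return rows;
}

const exampleRows = splitCsvRows('ID,NAME\nP1,"A\nB"\nP2,"C"');
console.log('rows: ' + exampleRows.length);
```

Because the scan visits each character once, it also avoids the repeated lookahead to the end of the string that the regex performs at every line feed.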