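I don't know of any packaged solution, but a not-very-flexible one is fairly simple to write, assuming you can make two passes over the file (the example below is a Perl-ish sketch):
- Assumption: the data may contain spaces and is NOT quoted à la CSV when it contains a space. If that is not the case, just use Text::CSV(_XS).
- Assumption: no tabs are used for formatting.
- The logic defines a "column separator" as any run of consecutive character positions that contain a space in 100% of the lines.
- If, by accident, every line has a space at offset M that is actually part of the data, the logic will treat offset M as a column separator, because it has no way of knowing better. The ONLY way it can know better is if you require columns to be separated by at least X spaces, where X > 1; see the second code fragment for that.
Sample code: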
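To make the last two points concrete, here is a minimal sketch that prints the "blank in every line" mask for two hypothetical sample lines. The data and the `$mask` variable are made up for illustration; only the marking loop mirrors the approach described above.

```perl
use strict;
use warnings;

# Hypothetical sample data: three aligned columns. "New York" and "Sao Paulo"
# both happen to have a space at character offset 3.
my @lines = (
    "New York   10  yes",
    "Sao Paulo  7   no",
);

# Mark every character offset that holds a non-space in at least one line.
my @non_spaces;
for my $line (@lines) {
    my @chars = split //, $line;
    for my $i (0 .. $#chars) {
        $non_spaces[$i] = 1 if $chars[$i] ne " ";
    }
}

# Render the mask: "x" = some line has data here, "." = blank in every line.
my $mask = join "", map { $_ ? "x" : "." } @non_spaces;
print "$_\n" for @lines, $mask;
```

The mask comes out as `xxx.xxxxx..xx..xxx`. Offset 3 is blank in both lines, so with single-space separators the logic sees four columns ("New"/"Sao" and "York"/"Paulo" are split apart) even though a human reads three.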
use strict;
use warnings;

my $file = shift or die "usage: $0 FILE\n"; # Assumption: input path comes from the command line
my $INFER_FROM_N_LINES = 10; # Infer columns from this # of lines
                             # 0 means from entire file
my $lines_scanned = 0;
my @non_spaces = ();

# First pass - find which character columns in the file have all spaces and which don't
open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (<$fh>) {
    last if $INFER_FROM_N_LINES && $lines_scanned++ == $INFER_FROM_N_LINES;
    chomp;
    my @chars = split(//, $_);
    for (my $i = 0; $i < @chars; $i++) { # Probably can be done prettier via map?
        $non_spaces[$i] = 1 if $chars[$i] ne " ";
    }
}
close $fh or die;

# Find columns, defined as consecutive "non-spaces" slices.
my (@starts, @ends); # Index at which columns start and end
my $state = " ";     # Not inside a column
for (my $i = 0; $i < @non_spaces; $i++) {
    next if $state eq " " && !$non_spaces[$i];
    next if $state eq "c" && $non_spaces[$i];
    if ($state eq " ") { # && $non_spaces[$i] of course => start column
        $state = "c";
        push @starts, $i;
    } else { # meaning $state eq "c" && !$non_spaces[$i] => end column
        $state = " ";
        push @ends, $i-1;
    }
}
if ($state eq "c") { # Last char is NOT a space - produce the last column end
    push @ends, $#non_spaces;
}

# Now split lines (second pass over the same file)
open($fh, '<', $file) or die "Cannot open $file: $!";
my @rows = ();
while (my $line = <$fh>) {
    chomp $line;
    my @columns = ();
    for (my $col_num = 0; $col_num < @starts; $col_num++) {
        my $width = $ends[$col_num] - $starts[$col_num] + 1;
        # A short line may end before this column starts; store "" rather than undef
        $columns[$col_num] = $starts[$col_num] < length($line)
            ? substr($line, $starts[$col_num], $width) : "";
    }
    push @rows, \@columns;
}
close $fh or die;
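Once the start and end offsets are known, the splitting step can also be written with Perl's built-in `unpack` instead of a `substr` loop. The following is only a sketch of that alternative, not part of the code above: the offsets and sample lines are hypothetical, and unlike `substr`, the `A` format strips trailing spaces from each field.

```perl
use strict;
use warnings;

# Hypothetical offsets, as the column-finding pass might produce them
# for the sample lines below.
my @starts = (0, 7, 12);
my @ends   = (4, 8, 14);

# Build a template such as '@0 A5 @7 A2 @12 A3':
# "@N" jumps to absolute offset N, "A<w>" reads w chars and strips trailing spaces.
my $template = join " ", map {
    sprintf '@%d A%d', $starts[$_], $ends[$_] - $starts[$_] + 1
} 0 .. $#starts;

# "@N" fails on a string shorter than N, so pad every line to the full width first.
my $min_len = $ends[-1] + 1;

my @lines = (
    "alice  30   NYC",
    "bob    4    LA",
    "eve",
);
my @rows = map { [ unpack $template, sprintf("%-*s", $min_len, $_) ] } @lines;

print join("|", @$_), "\n" for @rows;
```

Padding with `sprintf("%-*s", ...)` keeps short lines from tripping the `@N` jump, and the missing fields come back as empty strings.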
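Now, if you require columns to be separated by at least X spaces, where X > 1, that is also doable, but the code that locates the columns needs to be a bit more complex. This fragment replaces the "Find columns" loop above: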
# Find columns, defined as consecutive "non-spaces" slices separated by at least 3 spaces.
my $min_col_separator_is_X_spaces = 3;
my (@starts, @ends); # Index at which columns start and end
my $state = "S";     # Inside a separator
NEXT_CHAR: for (my $i = 0; $i < @non_spaces; $i++) {
    if ($state eq "S") { # Done with last column, inside a separator
        if ($non_spaces[$i]) { # Start a new column
            $state = "c";
            push @starts, $i;
        }
        next NEXT_CHAR;
    }
    # $state eq "c": processing a column
    next NEXT_CHAR if $non_spaces[$i];
    # First space after a non-space. Could be the beginning of a separator? Check the next X-1 chars!
    for (my $j = $i + 1; $j < @non_spaces
                      && $j < $i + $min_col_separator_is_X_spaces; $j++) {
        if ($non_spaces[$j]) {
            $i = $j;        # Gap too short, still inside the column; no need to re-scan
            next NEXT_CHAR; # OUTER loop
        }
    }
    # If we reach here, X chars starting at $i are all spaces! Column ended!
    push @ends, $i - 1;
    $state = "S";
    $i += $min_col_separator_is_X_spaces - 1; # The loop's own $i++ steps past the separator
}
if ($state eq "c") { # Last char is NOT a space - produce the last column end
    push @ends, $#non_spaces;
}
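The same "at least X spaces" rule can also be phrased as a pattern match over a mask string, which is sometimes easier to test than a hand-written state machine. This is an alternative formulation, not the loop above: `find_columns` is a hypothetical helper, and the mask corresponds to the two made-up lines "New York   10  yes" and "Sao Paulo  7   no".

```perl
use strict;
use warnings;

# find_columns(\@non_spaces, $min_gap) is a hypothetical helper for illustration.
# It returns [start, end] pairs (inclusive offsets), treating a blank run as a
# separator only if it is at least $min_gap characters long.
sub find_columns {
    my ($non_spaces, $min_gap) = @_;
    my $mask  = join "", map { $_ ? "x" : "." } @$non_spaces;
    my $short = $min_gap - 1;   # Longest blank run still treated as data
    my @cols;
    # A column is an "x" followed by any number of (short blank run, "x") pairs.
    while ($mask =~ /x(?:\.{0,$short}x)*/g) {
        push @cols, [ $-[0], $+[0] - 1 ];
    }
    return @cols;
}

# Mask for the two sample lines: offset 3 is blank in both of them.
my @non_spaces = map { $_ eq "x" ? 1 : 0 } split //, "xxx.xxxxx..xx..xxx";

my @loose = find_columns(\@non_spaces, 1);
my @tight = find_columns(\@non_spaces, 2);
print scalar(@loose), " columns with X=1, ", scalar(@tight), " with X=2\n";
```

With X = 1 the accidental blank at offset 3 splits the first column, giving four columns; with X = 2 it is absorbed into the data and three columns remain.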
– DVK 2010-10-14 04:37:07
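Please provide an example. – DVK 2010-10-14 04:39:37
I have provided a solution, but it will produce six columns. Are you assuming that a column separator must be more than one space? – DVK 2010-10-14 04:49:14
No, but we can assume that I know the column header strings and that the column data is correctly aligned under the headers. – Thilo 2010-10-14 04:51:38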