2012-03-11 104 views
3

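Using a regular expression to modify specific columns in a CSV

I'm looking to convert some strings in a CSV from 0000-2400 hour format to 00-24 hour format. For example: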

2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00 
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00 
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00 

Columns 7 and 9 are the departure and arrival times, respectively. Ideally, each line should look like this when I'm done:

2011-01-01,"AA",12478,31703,12892,32575,"09",-4.00,"12",-26.00,2475.00 

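The whole CSV will eventually be imported into R, and I'd like to do some of the processing beforehand because the file will be fairly large. I initially tried to do this with Perl, but I ran into trouble picking out multiple digits. With a lookbehind expression I can match a single digit before a given comma, but not more than one.

I'm also open to being told that doing this in Perl is needlessly silly and that I should stick to R. :)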

+1

I would consider using a module designed to handle CSV, such as Text::CSV (http://search.cpan.org/perldoc?Text::CSV). – TLP 2012-03-11 19:37:01

Answers

2

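As I mentioned in the comments, using a CSV module such as Text::CSV is the safe option. Here is a quick sample script showing how to use it. You'll notice that it does not preserve the quotes, even though I expected it to because I set keep_meta_info. If that matters to you, I'm sure there is a way to fix it.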

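use strict;
use warnings;

use Text::CSV;
my $csv = Text::CSV->new({
    binary         => 1,
    eol            => $/,
    keep_meta_info => 1,
});
while (my $row = $csv->getline(*DATA)) {
    # Columns 7 and 9 (indexes 6 and 8) hold the departure and arrival times.
    for ($row->[6], $row->[8]) {
        # Keep the first two digits (the hour) and drop the two minute digits.
        s/\d\d\K\d\d//;
    }
    $csv->print(*STDOUT, $row);
}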

__DATA__ 
2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00 
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00 
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00 

Output:

2011-01-01,AA,12478,31703,12892,32575,09,-4.00,12,-26.00,2475.00 
2011-01-02,AA,12478,31703,12892,32575,09,-2.00,12,1.00,2475.00 
2011-01-03,AA,12478,31703,12892,32575,09,-3.00,12,4.00,2475.00 
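
If the quotes need to survive, one possible workaround is to skip the CSV parser and edit the two columns directly. The following is only a sketch, not part of the answer above: `trim_times` is a hypothetical helper name, and it assumes that no field contains an embedded comma or newline, which holds for the sample rows but is exactly the case a real CSV module protects against.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): trims HHMM to HH in
# columns 7 and 9 and leaves any surrounding quotes as they were.
# Assumption: no field contains an embedded comma, so a plain split is safe.
sub trim_times {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    for my $i (6, 8) {
        next unless defined $fields[$i];
        # Capture the optional quotes and the hour; drop the two minute digits.
        $fields[$i] =~ s/^("?)(\d\d)\d\d("?)$/$1$2$3/;
    }
    return join ',', @fields;
}

print trim_times('2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00'), "\n";
```

For the sample row this prints the line with `"09"` and `"12"` in columns 7 and 9 and every other field unchanged, quotes included.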
+0

Thanks for the update. I originally just wanted something dumb and unsafe, but this is probably smarter. :) – 2012-03-15 17:58:20

+0

@AdamHyland You're welcome. – TLP 2012-03-15 19:16:25

3

I might as well offer my own solution to this, which is:

s/"(\d\d)\d\d"/"$1"/g
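
To show what this substitution does in context, here is a minimal sketch that wraps it in a hypothetical `strip_minutes` function (the name is an assumption for illustration). Note that the pattern is not tied to columns 7 and 9: it rewrites every quoted field made of exactly four digits. That is fine for the sample data, where only the two time columns look like that, but it would also shorten any other quoted four-digit value.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown above.
sub strip_minutes {
    my ($line) = @_;
    # Every quoted run of exactly four digits loses its last two digits.
    (my $copy = $line) =~ s/"(\d\d)\d\d"/"$1"/g;
    return $copy;
}

print strip_minutes('2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00'), "\n";
```

The same substitution can be run over a whole file from the shell with `perl -pe 's/"(\d\d)\d\d"/"$1"/g' input.csv > output.csv`, where `input.csv` and `output.csv` are placeholder file names.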