2016-12-21 15 views
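I have a directory containing hundreds of Word documents, each of which contains a standard set of tables. I need to parse these tables and extract the data in them. I have developed a script that dumps the tables in their entirety. How can I navigate Word tables using the Win32::OLE Perl package?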

#!/usr/bin/perl
use strict; 
use warnings; 

use Carp qw(croak); 
use Cwd qw(abs_path); 
use Path::Class; 
use Win32::OLE qw(in); 
use Win32::OLE::Const 'Microsoft Word'; 
$Win32::OLE::Warn = 3; 
=d 
my $datasheet_dir = "./path/to/worddocs"; 
my @files = glob "$datasheet_dir/*.doc"; 
print "scalar: ".scalar(@files)."\n"; 
foreach my $f (@files){ 
    print $f."\n"; 
} 
=cut 
#my $file = $files[0]; 
my $file = "word.doc"; 
print "file: $file\n"; 

run([$file]);   # @files exists only inside the POD-disabled block above

sub run { 
    my $argv = shift; 
    my $word = get_word(); 

    $word->{DisplayAlerts} = wdAlertsNone; 
    $word->{Visible}  = 1; 

    for my $word_file (@$argv) { 
     print_tables($word, $word_file); 
    } 

    return; 
} 

sub print_tables { 
    my $word = shift; 
    my $word_file = file(abs_path(shift)); 

    my $doc = $word->{Documents}->Open("$word_file"); 
    my $tables = $word->ActiveDocument->{Tables}; 

    for my $table (in $tables) { 
     my $text = $table->ConvertToText(wdSeparateByTabs)->Text; 
     $text =~ s/\r/\n/g; 
     print $text, "\n"; 
    } 

    $doc->Close(0); 
    return; 
} 

sub get_word { 
    my $word; 
    eval { $word = Win32::OLE->GetActiveObject('Word.Application'); 1 } 
     or die "$@\n"; 
    $word and return $word; 
    $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit }) 
     or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n"; 
    return $word; 
} 

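Is there a way to navigate the individual cells? I only want to return the rows that have a specific value in the first column.

For example, for the table below, I only want to grep the rows that have a fruit in the first column.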

apple  pl 
banana  xml 
California csv 
pickle  txt 
Illinois gov 
pear  doc 

Answer

You can use OLE to access the individual cells of a table, after first using the Columns and Rows collections to get its dimensions.
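The cell-by-cell approach could look roughly like the sketch below. It assumes a regular table with no merged cells (Word can refuse row or cell access on tables containing merged cells), and the helper names `cell_text` and `matching_rows` are hypothetical, not part of Win32::OLE. `$table` is expected to be a Word Table object such as the ones yielded by `in $tables` in the question's script.

```perl
use strict;
use warnings;

# Word terminates each cell's text with a paragraph mark and an
# end-of-cell marker ("\r" followed by chr(7)); strip both.
sub cell_text {
    my ($table, $row, $col) = @_;
    my $text = $table->Cell($row, $col)->Range->Text;
    $text =~ s/[\r\x07]+\z//;
    return $text;
}

# Return the rows (as array refs of cell strings) whose first-column
# value is a key of %$wanted. Assumes a table without merged cells.
sub matching_rows {
    my ($table, $wanted) = @_;
    my $n_rows = $table->Rows->Count;
    my $n_cols = $table->Columns->Count;

    my @rows;
    for my $r (1 .. $n_rows) {    # Word collections are 1-based
        next unless exists $wanted->{ cell_text($table, $r, 1) };
        push @rows, [ map { cell_text($table, $r, $_) } 1 .. $n_cols ];
    }
    return @rows;
}
```

Inside the table loop this might be called as `print join("\t", @$_), "\n" for matching_rows($table, \%fruit);`, where `%fruit` is a look-up hash you populate yourself.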

Alternatively, you can post-process the text into a Perl array and iterate over it. Instead of

my $text = $table->ConvertToText(wdSeparateByTabs)->Text; 
$text =~ s/\r/\n/g; 
print $text, "\n"; 

use something like

my %fruit; # population of look-up table of fruit omitted 

my $text = $table->ConvertToText(wdSeparateByTabs)->Text; 
my @lines = split /\r/, $text; 
for my $line (@lines) { 
    my @fields = split /\t/, $line; 

    next unless exists $fruit{$fields[0]}; 

    print "$line\n"; 
} 

Refinements such as case-insensitive matching can be added as needed.
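As one possible refinement, the sketch below matches the first column case-insensitively and ignores surrounding whitespace. The `filter_rows` helper and the contents of `%fruit` are assumptions made for illustration; the sample string stands in for the text that `ConvertToText(wdSeparateByTabs)->Text` would return, with rows separated by `\r`.

```perl
use strict;
use warnings;

# Hypothetical look-up table; keys are stored in lower case.
my %fruit = map { lc($_) => 1 } qw(apple banana pear);

# Keep only the lines whose first tab-separated field, once trimmed
# and lower-cased, is a key of the look-up hash.
sub filter_rows {
    my ($text, $lookup) = @_;
    my @kept;
    for my $line (split /\r/, $text) {
        my @fields = split /\t/, $line;
        next unless @fields;                    # skip empty lines
        (my $key = lc $fields[0]) =~ s/^\s+|\s+$//g;
        next unless exists $lookup->{$key};
        push @kept, $line;
    }
    return @kept;
}

# Stand-in for the converted table text.
my $sample = join "\r", "Apple\tpl", "banana \txml", "California\tcsv", "pear\tdoc";
print "$_\n" for filter_rows($sample, \%fruit);
```

With this sample, the Apple, banana, and pear rows are printed and the California row is skipped.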
