2015-12-02 52 views
-1

How to find out the column types in shell and get the differences

I have a file that looks like this:

emp_id(int),name(string),age(int) 
1,hasa,34 
2,dafa,45 
3,fasa,12 
8f,123Rag,12 
8,fafl,12 

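Requirements for the sample file: each column's data type is declared as string or integer. Emp_id should be an integer, not a string. The same kind of condition applies to the name and age columns.

My output should look like this: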

Actual column Emp_id type is INT but string was found at the position 4, value is 8f 
Actual column name type is STRING but numbers were found at the position 4, value is 123Rag 

and so on...

Here is my code, a shell script:

read input 
if [ $input -eq $input 2>/dev/null ] 
then 
    echo "$input is an integer" 
else 
    echo "$input is not an integer" 
fi 

In Python I tried isinstance(obj, type), but it did not serve the purpose. Could someone guide me here? Any help with a shell/Python/Perl script would be much appreciated!
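A likely reason the isinstance attempt fell short: every field read from a text file arrives as a str, so isinstance(field, int) is False even for 34. The sketch below assumes the goal is to test what the text looks like rather than its Python type. The helper name looks_like_int is made up for illustration, and int() is more lenient than a strict digit check (it also accepts a sign and surrounding whitespace).

```python
def looks_like_int(text):
    """Return True if text parses as a base-10 integer (hypothetical helper)."""
    try:
        int(text.strip())
        return True
    except ValueError:
        return False

field = "34"                    # what splitting a CSV line actually yields
print(isinstance(field, int))   # False: the field is still a str
print(looks_like_int(field))    # True
print(looks_like_int("8f"))     # False
```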

+0

Your code has nothing to do with your requirements. At least show an honest attempt. – karakfa

+0

Possible duplicate of [BASH: Test whether string is valid as an integer?](http://stackoverflow.com/questions/2210349/bash-test-whether-string-is-valid-as-an-integer) – tripleee

+0

What is the problem with having digits in a string field? –

Answers

1

Here is an awk solution (it uses arrays of arrays, so it needs GNU awk 4.0 or later):

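awk -F"," '
# Header row: split "emp_id(int)" into a column name and a declared type.
NR==1 { for (i = 1; i <= NF; i++) {
            split($i, a, "(");
            name[i] = a[1];
            type[i] = ($i ~ "int" ? "INT" : "String") }
        next }
# Data rows: record every value that contradicts the declared column type.
{ for (i = 1; i <= NF; i++) {
      if ($i != int($i) && type[i] == "INT")    { error[i][NR] = $i }
      if ($i ~ /[0-9]+/ && type[i] == "String") { error[i][NR] = $i }
  } }
# Report: key is the input line number, so key-1 is the data row position.
END { for (i in error) {
          for (key in error[i]) {
              print "Actual column " name[i] " type is " type[i] \
                    " but string was found at the position " key-1 \
                    ", value is " error[i][key] } } }' inputFile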

The output, as required, is:

Actual column emp_id type is INT but string was found at the position 4, value is 8f 
Actual column name type is String but string was found at the position 4, value is 123Rag 

However, in my opinion 123Rag is a string and should not be flagged as an incorrect entry in the second column.

+0

Your INT test is wrong IMO: the value '1.1' would pass. This is better: '$i != int($i)'. Otherwise, my thoughts exactly. –

+0

@glenn jackman: Yes, you are right, of course! '$i == $i+0' tests whether the value is a number (int or double does not matter). I somehow forgot about the 'int' restriction. –
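The distinction raised in these comments, a numeric value versus an integer value, can be sketched in Python. This only illustrates the idea and is not a translation of the awk code. The function names are invented here, and it is an assumption that only a plain run of digits should count as an integer.

```python
import re

def is_number(text):
    """True for anything float() accepts, e.g. '8' and '1.1' (also 'nan', 'inf')."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def is_integer(text):
    """True only for a run of ASCII digits, so '1.1' and '8f' are rejected."""
    return re.fullmatch(r"[0-9]+", text) is not None

for value in ("8", "1.1", "8f"):
    print(value, is_number(value), is_integer(value))
# 8 True True
# 1.1 True False
# 8f False False
```

A value such as 1.1 passes the looser numeric test but fails the integer test, which is exactly the case the first comment points out.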

0

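With Perl I would solve it like this:

  • Define some regular expression patterns that the content of each type must match.
  • Pick out the header row and split it into names and types. (Optionally report if a type is not recognised.)
  • Iterate over your fields, matching them up by column, look up the type, and apply the regex to validate.

Something like: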

#!/usr/bin/env perl 

use strict; 
use warnings; 
use Data::Dumper; 

#define regex to apply for a given data type 
my %pattern_for = (
    int => qr/^\d+$/, 
    string => qr/^[A-Z]+$/i, 
); 

print Dumper \%pattern_for; 

#read the first line. 
# <> is a magic filehandle, that reads files specified as arguments 
# or piped input - like grep/sed do. 
my $header_row = <>; 
#extract just the names, in order. 
my @headers = $header_row =~ m/(\w+)\(/g; 
#create a type lookup for the named headers. 
my %type_for = $header_row =~ m|(\w+)\((\w+)\)|g; 

print Dumper \@headers; 
print Dumper \%type_for; 

#iterate input again 
while (<>) { 
    #remove trailing linefeed 
    chomp; 

    #parse incoming data into named fields based on ordering. 
    my %fields; 
    @fields{@headers} = split /,/; 
    #print for diag 
    print Dumper \%fields; 

    #iterate the headers, applying the looked up 'type' regex 
    foreach my $field_name (@headers) {
        if ($fields{$field_name} =~ m/$pattern_for{$type_for{$field_name}}/) {
            print
                "$field_name => $fields{$field_name} is valid, $type_for{$field_name} matching $pattern_for{$type_for{$field_name}}\n";
        }
        else {
            print "$field_name $fields{$field_name} not valid $type_for{$field_name} matching $pattern_for{$type_for{$field_name}}\n";
        }
    }
} 

For your input this gives (showing only the invalid entries, for brevity):

name 123Rag not valid string matching (?^i:^[A-Z]+$) 
emp_id 8f not valid int matching (?^:^\d+$) 

Note: it only supports "simple" CSV (no embedded commas or quotes), but it could easily be adapted to use the Text::CSV module.
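For comparison, the same header-driven check can be sketched in Python, whose standard csv module copes with quoted fields and embedded commas. This is a sketch under assumptions, not a port of the Perl script: the patterns, the validate function name, and the message wording (modelled on the output requested in the question) are choices made here. A string column is assumed to hold letters only, and a header type other than int or string would raise a KeyError.

```python
import csv
import io
import re

# Assumed rules: int = digits only, string = letters only.
PATTERNS = {"int": re.compile(r"[0-9]+"), "string": re.compile(r"[A-Za-z]+")}
FOUND = {"int": "string was", "string": "numbers were"}

def validate(lines):
    """Return one message per field that contradicts its declared type."""
    reader = csv.reader(lines, skipinitialspace=True)
    # Header cells look like "emp_id(int)"; split them into (name, type).
    header = [re.fullmatch(r"(\w+)\((\w+)\)", cell.strip()).groups()
              for cell in next(reader)]
    messages = []
    for position, row in enumerate(reader, start=1):
        for (name, kind), value in zip(header, row):
            value = value.strip()
            if not PATTERNS[kind].fullmatch(value):
                messages.append(
                    f"Actual column {name} type is {kind.upper()} but "
                    f"{FOUND[kind]} found at the position {position}, "
                    f"value is {value}")
    return messages

sample = io.StringIO(
    "emp_id(int),name(string),age(int)\n"
    "1,hasa,34\n2,dafa,45\n3,fasa,12\n8f,123Rag,12\n8,fafl,12\n")
for message in validate(sample):
    print(message)
```

With the sample data from the question, this reports 8f in emp_id and 123Rag in name, both at position 4. Whether 123Rag should really be rejected depends on how strictly "string" is defined, as the awk answer already notes.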