2015-03-08 93 views

Reading empty values with boost::spirit

I want to read a CSV into a struct:

struct data 
{ 
    std::string a; 
    std::string b; 
    std::string c; 
}; 

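However, I want to read even empty strings, so that every value ends up in its proper place. I adapted the struct as a Boost.Fusion sequence, so the following works: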

// Our parser (using a custom skipper to skip comments and empty lines) 
template <typename Iterator, typename skipper = comment_skipper<Iterator> > 
    struct google_parser : qi::grammar<Iterator, addressbook(), skipper> 
{ 
    google_parser() : google_parser::base_type(contacts, "contacts") 
    { 
    using qi::eol; 
    using qi::eps; 
    using qi::_1; 
    using qi::_val; 
    using qi::repeat; 
    using standard_wide::char_; 
    using phoenix::at_c; 
    using phoenix::val; 

    value = *(char_ - ',' - eol) [_val += _1]; 

    // This works but only for small structs 
    entry %= value >> ',' >> value >> ',' >> value >> eol; 
    } 

    qi::rule<Iterator, std::string()> value; 
    qi::rule<Iterator, data()> entry; 
}; 

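Unfortunately, repeat stores only the non-empty values in a vector, so the values of the attributes can get mixed up (i.e. if field b is empty, it may end up containing the content of c):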

entry %= repeat(2)[ value >> ','] >> value >> eol; 

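I would like to use a short rule along the lines of repeat, because in practice my struct has 60 attributes! Not only is writing out 60 rules tedious, but Boost does not seem to like long rules either...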


I noticed, after writing my answer, that you said the input is CSV. See [How to parse CSV using Spirit](http://stackoverflow.com/questions/18365463/h/18366335#18366335) and [this other answer](http://stackoverflow.com/questions/7436481/h/7462539#7462539) (as well as [zero-copy parsing for mapped files](http://stackoverflow.com/questions/23699731/s/23703810#23703810)). There is also one that maps columns: [Boost Spirit parsing CSV with columns in variable order](http://stackoverflow.com/questions/27967195/b/27967473#27967473). For your inspiration – sehe 2015-03-09 00:55:03


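sehe, thank you for the in-depth answer. Not only are you very clear, you also take the trouble to write complete examples. People like you make stackoverflow worthwhile. – 2015-03-09 20:35:04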

Answer


You just want to make sure that you parse a value for "empty" strings too.

value = +(char_ - ',' - eol) | attr("(unspecified)"); 
entry = value >> ',' >> value >> ',' >> value >> eol; 

See the demo:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG 
#include <boost/fusion/adapted/struct.hpp> 
#include <boost/spirit/include/qi.hpp> 
#include <iostream> 
#include <string> 

namespace qi = boost::spirit::qi; 

struct data { 
    std::string a; 
    std::string b; 
    std::string c; 
}; 

BOOST_FUSION_ADAPT_STRUCT(data, (std::string, a)(std::string, b)(std::string, c)) 

template <typename Iterator, typename skipper = qi::blank_type> 
struct google_parser : qi::grammar<Iterator, data(), skipper> { 
    google_parser() : google_parser::base_type(entry, "contacts") { 
     using namespace qi; 

     value = +(char_ - ',' - eol) | attr("(unspecified)"); 
     entry = value >> ',' >> value >> ',' >> value >> eol; 

     BOOST_SPIRIT_DEBUG_NODES((value)(entry)) 
    } 
    private: 
    qi::rule<Iterator, std::string()> value; 
    qi::rule<Iterator, data(), skipper> entry; 
}; 

int main() { 
    using It = std::string::const_iterator; 
    google_parser<It> p; 

    for (std::string input : { 
      "something, awful, is\n", 
      "fine,,just\n", 
      "like something missing: ,,\n", 
     }) 
    { 
     It f = input.begin(), l = input.end(); 

     data parsed; 
     bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed); 

     if (ok) 
      std::cout << "Parsed: '" << parsed.a << "', '" << parsed.b << "', '" << parsed.c << "'\n"; 
     else 
      std::cout << "Parse failed\n"; 

     if (f!=l) 
      std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n"; 
    } 
} 

Prints:

Parsed: 'something', 'awful', 'is' 
Parsed: 'fine', '(unspecified)', 'just' 
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)' 
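For comparison, here is a minimal Spirit-free sketch of what the two rules amount to: split on every ',' and substitute a placeholder whenever a field turns out empty. The helper name split_fields is hypothetical, and the sketch assumes the line has already been stripped of its trailing newline (for instance by std::getline). It mimics the blank skipper by dropping leading blanks only; it is an illustration of the behavior, not of how Qi works internally.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): split one CSV line on every ','
// and keep a slot for each field, even when the field is empty.
// Assumes `line` does not contain the trailing newline.
std::vector<std::string> split_fields(const std::string& line,
                                      const std::string& placeholder = "(unspecified)") {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        const std::string::size_type stop =
            (comma == std::string::npos) ? line.size() : comma;

        // Drop leading blanks, roughly what the blank skipper does before a value.
        std::string::size_type first = line.find_first_not_of(" \t", start);
        if (first == std::string::npos) first = stop;

        // An empty field still produces an element, filled with the placeholder.
        const std::string field = line.substr(first, stop - first);
        fields.push_back(field.empty() ? placeholder : field);

        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return fields;
}
```

Calling split_fields("fine,,just") yields three elements, with the placeholder in the middle, so the third value stays in third position.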

However, you have a bigger problem. The assumption that qi::repeat(2) [ value ] will parse into 2 strings does not hold.

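repeat, like operator*, operator+ and operator%, parses into a container attribute. In this case the container attribute (a string) will receive the input from the second value as well: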

Live On Coliru

Parsed: 'somethingawful', 'is', '' 
Parsed: 'fine(unspecified)', 'just', '' 
Parsed: 'like something missing: (unspecified)', '(unspecified)', '' 

Since this is not what you want, reconsider your data types:

The auto_ approach:

If you teach Qi how to extract a single value, you can use a simple rule like

entry = skip(skipper() | ',') [auto_] >> eol; 

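This way, Spirit itself will generate the correct number of value extractions for the given Fusion sequence!

Here's a quick and dirty approach:

CAVEAT: Specializing for std::string directly like this might not be the best idea (it might not always be appropriate, and it might interact badly with other parsers). However, by default create_parser<std::string> is not defined (because, what would it do?), so I seized the opportunity for the purpose of this demonstration: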

namespace boost { namespace spirit { namespace traits { 
    template <> struct create_parser<std::string> { 
     typedef proto::result_of::deep_copy< 
      BOOST_TYPEOF(
       qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)") 
      ) 
     >::type type; 

     static type call() { 
      return proto::deep_copy(
       qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)") 
      ); 
     } 
    }; 
}}} 

Again, see the demo output:

Live On Coliru

Parsed: 'something', 'awful', 'is' 
Parsed: 'fine', 'just', '(unspecified)' 
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)' 
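Note the second line: 'just' has moved into the second field. Once ',' is part of the skipper, a run of adjacent commas is skipped as a whole, so an empty field between them leaves no trace. The sketch below models the difference with the standard library only. Both function names are hypothetical, the input is assumed to have no trailing newline, and the code illustrates the two tokenization strategies rather than Qi's internals.

```cpp
#include <string>
#include <vector>

// Hypothetical model of "',' belongs to the skipper": blanks and commas are
// skipped before each token, so a run of commas acts as a single separator.
std::vector<std::string> tokens_skipping_commas(const std::string& line) {
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;
    while ((pos = line.find_first_not_of(" \t,", pos)) != std::string::npos) {
        std::string::size_type stop = line.find(',', pos);
        if (stop == std::string::npos) stop = line.size();
        tokens.push_back(line.substr(pos, stop - pos));
        pos = stop;
    }
    return tokens;
}

// Hypothetical model of "',' is an explicit delimiter": every comma closes a
// field, so an empty field keeps its position in the result.
std::vector<std::string> fields_between_commas(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

For "fine,,just" the first function returns two tokens and the second returns three fields with an empty one in the middle. With three struct members to fill, the first model leaves the last member to the fallback, which is consistent with the output above.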

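Note: there is some advanced sorcery involved in getting the skipper to work "just right" (see skip()[] and lexeme[]). Some general explanation can be found here: Boost spirit skipper issues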

UPDATE

The container approach

There's a subtlety to this. Two, actually. So here's a demo:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG 
#include <boost/fusion/adapted/struct.hpp> 
#include <boost/spirit/include/qi.hpp> 
#include <iostream> 
#include <string> 
#include <vector> 

namespace qi = boost::spirit::qi; 

struct data { 
    std::vector<std::string> parts; 
}; 

BOOST_FUSION_ADAPT_STRUCT(data, (std::vector<std::string>, parts)) 

template <typename Iterator, typename skipper = qi::blank_type> 
struct google_parser : qi::grammar<Iterator, data(), skipper> { 
    google_parser() : google_parser::base_type(entry, "contacts") { 
     using namespace qi; 
     qi::as<std::vector<std::string> > strings; 

     value = +(char_ - ',' - eol) | attr("(unspecified)"); 
     entry = strings [ repeat(2) [ value >> ',' ] >> value ] >> eol; 

     BOOST_SPIRIT_DEBUG_NODES((value)(entry)) 
    } 
    private: 
    qi::rule<Iterator, std::string()> value; 
    qi::rule<Iterator, data(), skipper> entry; 
}; 

int main() { 
    using It = std::string::const_iterator; 
    google_parser<It> p; 

    for (std::string input : { 
      "something, awful, is\n", 
      "fine,,just\n", 
      "like something missing: ,,\n", 
     }) 
    { 
     It f = input.begin(), l = input.end(); 

     data parsed; 
     bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed); 

     if (ok) { 
      std::cout << "Parsed: "; 
      for (auto& part : parsed.parts) 
       std::cout << "'" << part << "' "; 
      std::cout << "\n"; 
     } 
     else 
      std::cout << "Parse failed\n"; 

     if (f!=l) 
      std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n"; 
    } 
} 

The subtleties are:


Added an approach that lets Qi generate the appropriate parser from the adapted Fusion struct: **[`entry = skip(skipper() | ',') [auto_] >> eol;`](http://coliru.stacked-crooked.com/a/30541461bab7e1a1)** – sehe 2015-03-09 00:48:51


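I want to keep my struct for later use, so I'm torn between the container and `auto_` approaches. Unfortunately, the documentation on containers is a bit intimidating and short on examples. So I prefer `auto_`, but all of this code looks like black magic to me :) – 2015-03-09 20:36:14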


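Personally, I'd suggest simplifying by matching the AST type to the grammar you wish to use. The auto_ approach shown here already has some limitations (what about adjacent delimiters?), which will put an end to the dream of automatic parser generation. – sehe 2015-03-09 20:49:22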