2012-08-16 27 views
2

How can I get the DOM tree for a specific DOM object using libxml2, like the one shown in Safari's Web Inspector?

Safari Screenshot

+0

http://stackoverflow.com/q/800104/694576 – alk 2012-08-16 11:50:11

+0

@Ph99Ph Possible dupe. Do you want to parse an HTML page using libxml2, or something else? – TOC 2012-08-16 18:41:54

+0

@TOC Basically, I want to know what the DOM tree looks like at a certain location. – Ph99Ph 2012-08-16 21:15:53

Answers

3

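With this sample code you can ask for a TAG, and if it is present in your HTML the program dumps it (here I use the head tag from Stack Overflow; in your own code you would probably use libcurl to fetch your HTML buffer):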

/* Compile like this : 
* gcc -Wall html_dom_dump.c -o html_dom_dump `xml2-config --cflags` `xml2-config --libs` 
*/ 
#include <stdio.h> 
#include <libxml/HTMLparser.h> 
#include <libxml/tree.h> 
#include <stdlib.h> 

char stackoverflow_html_head[] = "<head>\ 
    <title>Stack Overflow</title>\ 
    <link rel=\"shortcut icon\" href=\"http://cdn.sstatic.net/stackoverflow/img/favicon.ico\">\ 
    <link rel=\"apple-touch-icon\" href=\"http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png\">\ 
    <link rel=\"search\" type=\"application/opensearchdescription+xml\" title=\"Stack Overflow\" href=\"/opensearch.xml\">\ 
\ 
\ 
     StackExchange.init({\"stackAuthUrl\":\"https://stackauth.com\",\"serverTime\":1345183802,\"styleCode\":true,\"enableUserHovercards\":true,\"site\":{\"name\":\"Stack Overflow\",\"description\":\"Q\\u0026A for professional and enthusiast programmers\",\"isNoticesTabEnabled\":true,\"newTitleSearchBoxEnabled\":false,\"enableSocialMediaInSharePopup\":true},\"user\":{\"isAnonymous\":true,\"fkey\":\"52eb3bfedea6eccd9936d40e8ca0c8de\",\"notificationsUnviewedCount\":0,\"inboxUnviewedCount\":-1}});  StackExchange.using.setCacheBreakers({\"js/prettify-full.js\":\"d1cd9a23171c\",\"js/moderator.js\":\"8c49fc268737\",\"js/full-anon.js\":\"945170d238e3\",\"js/full.js\":\"c60de8021771\",\"js/wmd.js\":\"93b92575f8bc\",\"js/third-party/jquery.autocomplete.min.js\":\"e5f01e97f7c3\",\"js/mobile.js\":\"6eb68240242f\",\"js/help.js\":\"fc9fb0517db2\",\"js/tageditor.js\":\"c1ba807b32aa\",\"js/tageditornew.js\":\"bd66fabe1c71\",\"js/inline-tag-editing.js\":\"be882e188985\",\"js/revisions.js\":\"8c6bcd93b7fe\",\"js/suggested-edits.js\":\"46c4696efca5\",\"js/probes.js\":\"beb933322ff0\",\"js/review.js\":\"fca067ef962b\"});\ 
    </script>\ 
\ 
</head>"; 

int found = 0; 

int walk_tree(xmlNode *node, xmlDocPtr doc, char *pattern) 
{ 
     xmlNode *cur_node = NULL; 

     for (cur_node = node; cur_node; cur_node = cur_node->next) 
     { 
       if ((!xmlStrcmp(cur_node->name, (const xmlChar *)pattern))) 
       { 
         found++; 
         fprintf(stdout, "\n----> WE GOT IT\n\n"); 
         xmlElemDump(stdout, doc, cur_node); 
         fprintf(stdout, "\n<----\n"); 
       } 
       walk_tree(cur_node->children, doc, pattern); 
     } 

     return found; 
} 

int main(int argc, char **argv) 
{ 
     int ret; 
     xmlDocPtr doc; 
     htmlParserCtxtPtr html_parser; 

     /* Check the arguments before allocating anything */ 
     if (argc != 2) 
     { 
       fprintf(stderr, "Usage : ./html_dom_dump TAG\n"); 

       exit(EXIT_FAILURE); 
     } 

     /* Create a push parser context */ 
     html_parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, XML_CHAR_ENCODING_NONE); 
     if (html_parser == NULL) 
     { 
       fprintf(stderr, "Could not create the HTML parser context\n"); 

       exit(EXIT_FAILURE); 
     } 

     /* remove blank nodes 
     * suppress error reports 
     * suppress warning reports 
     * Forbid network access 
     * more on these options: http://xmlsoft.org/html/libxml-HTMLparser.html#htmlParserOption 
     */ 
     htmlCtxtUseOptions(html_parser, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET); 
     /* Parse our stackoverflow html header: 
     * sizeof - 1 skips the trailing NUL, and the final 1 marks this as the last chunk 
     */ 
     htmlParseChunk(html_parser, stackoverflow_html_head, (int)sizeof(stackoverflow_html_head) - 1, 1); 
     doc = html_parser->myDoc; 
     /* Traverse the whole tree to find the given TAG (pattern) */ 
     ret = walk_tree(xmlDocGetRootElement(doc), doc, argv[1]); 
     if (!ret) 
       fprintf(stdout, "No luck, this tag does not exit!\n"); 

     /* Release the parser context and the document it built */ 
     htmlFreeParserCtxt(html_parser); 
     xmlFreeDoc(doc); 
     xmlCleanupParser(); 

     return 0; 
} 

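Compile and link against libxml2:

gcc -Wall html_dom_dump.c -o html_dom_dump `xml2-config --cflags` `xml2-config --libs`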

Then you can run it like this:

$ ./html_dom_dump head 

----> WE GOT IT 

<head> 
<title>Stack Overflow</title> 
<link rel="shortcut icon" href="http://cdn.sstatic.net/stackoverflow/img/favicon.ico"> 
<link rel="apple-touch-icon" href="http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png"> 
<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml"> 
</head> 

<---- 
$ ./html_dom_dump link 

----> WE GOT IT 

<link rel="shortcut icon" href="http://cdn.sstatic.net/stackoverflow/img/favicon.ico"> 

<---- 

----> WE GOT IT 

<link rel="apple-touch-icon" href="http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png"> 

<---- 

----> WE GOT IT 

<link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml"> 
<---- 
$ ./html_dom_dump TAG 
No luck, this tag does not exit! 
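
The traversal in walk_tree follows `next` for siblings and `children` for descendants. The same pattern can print an indented outline of the tree, which is closer to what the Web Inspector displays. This is only a minimal sketch: `struct dom_node` is a hypothetical stand-in that copies just three fields of libxml2's xmlNode so the example compiles without libxml2, and `outline_tree` with its output buffer is an illustrative helper, not a libxml2 function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlNode: only the three fields
 * the traversal needs (tag name, first child, next sibling). */
struct dom_node {
    const char *name;
    struct dom_node *children;
    struct dom_node *next;
};

/* Append an indented outline of `node`, its siblings and all their
 * descendants to the NUL-terminated string `out` (capacity `cap`),
 * using two spaces per depth level. Returns the number of nodes visited. */
int outline_tree(const struct dom_node *node, int depth, char *out, size_t cap)
{
    int visited = 0;
    const struct dom_node *cur;

    for (cur = node; cur != NULL; cur = cur->next) {
        size_t used = strlen(out);

        /* snprintf truncates silently if the buffer is too small */
        if (used < cap)
            snprintf(out + used, cap - used, "%*s<%s>\n", depth * 2, "", cur->name);
        visited++;
        /* Descend into the children one level deeper */
        visited += outline_tree(cur->children, depth + 1, out, cap);
    }
    return visited;
}
```

With real libxml2 nodes the loop would presumably look the same, reading `cur_node->name`, `cur_node->children` and `cur_node->next` as walk_tree does, and checking `cur_node->type == XML_ELEMENT_NODE` to skip text nodes.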

In case you did not know, you can also use libcurl + LibTidy to fetch and parse your HTML: http://curl.haxx.se/libcurl/c/htmltidy.html
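
For the "what does the tree look like at a certain location" part of the question, one option is to describe a node by the chain of its ancestors. The sketch below builds a slash-separated path by following parent pointers. It assumes a hypothetical `struct path_node` with only `name` and `parent` (xmlNode has fields with those names too), and `node_path` is an illustrative helper. libxml2 itself provides xmlGetNodePath() in libxml/tree.h, which returns an XPath-style path for a real node and is probably the better choice in actual code.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_DEPTH 64  /* arbitrary limit for this sketch */

/* Hypothetical stand-in for the two xmlNode fields needed here. */
struct path_node {
    const char *name;
    struct path_node *parent;
};

/* Write the path from the root down to `node` into `out`, e.g.
 * "/html/head/link". Returns 0 on success, or -1 if the tree is deeper
 * than MAX_DEPTH or the buffer is too small. */
int node_path(const struct path_node *node, char *out, size_t cap)
{
    const struct path_node *chain[MAX_DEPTH];
    int depth = 0;
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    /* Collect the ancestors, leaf first */
    for (; node != NULL; node = node->parent) {
        if (depth == MAX_DEPTH)
            return -1;
        chain[depth++] = node;
    }

    /* Emit them in reverse order, root first */
    while (depth-- > 0) {
        int n = snprintf(out + used, cap - used, "/%s", chain[depth]->name);

        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

This version does not disambiguate siblings with the same tag name (the three link elements above would all map to the same path), which xmlGetNodePath() handles by adding positional indices.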