2012-07-14

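Hacker News: how to extract the comment hierarchy

I'm trying to parse the comment threads on the news.ycombinator.com forum. However, after looking at the HTML, there seems to be no hierarchy for nesting the comments, which makes parsing really hard. For example, here is a parent comment and its child: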

<!-- This part below draws the upvote/downvote images --> 
<table border=0><tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td><td valign=top><center><a id=up_4241971 href="vote?for=4241971&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4241971></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; "> 


<!-- This part below is user/time and permalink info for a parent comment --> 
<span class="comhead"><a href="user?id=JshWright">JshWright</a> 7 hours ago | <a href="item?id=4241971">link</a></span></div><br> 


<!-- This part below is actual Comment --> 
<span class="comment"><font color=#000000>I just got my Verizon Galaxy S3, and ordered the 20-pack of NFC tags offered by <a href="http://tagsfordroid.com" rel="nofollow">http://tagsfordroid.com</a><p>I think I know what my Dad felt like when he got his first label printer... Within days it seemed like every object in his office was labeled...<p>I've got a tag in my car to automatically send my wife a "Headed home" SMS, a tag on my night stand to toggle between 'night' (silent) and 'day' (loud) volume settings, a tag by my back door to launch CardioTrainer when I go out for a run (this one may have crossed the "I've run out of ideas" line...). I'm using the keychain tag to dial a response number for the fire department I'm a member of.</font></span><p><font size=1><u><a href="reply?id=4241971&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr> 


<!-- This part below is upvote/downvote arrow for child of parent --> 
<tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td><td valign=top><center><a id=up_4242025 href="vote?for=4242025&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4242025></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; "> 

<!-- This part has user/time/permalink for child comment --> 
<span class="comhead"><a href="user?id=msbmsb">msbmsb</a> 7 hours ago | <a href="item?id=4242025">link</a></span></div><br> 

<!-- This part is the content of the child comment --> 
<span class="comment"><font color=#000000>I did the same thing. Tag next to the entry-way light switch for changing to an "at-home" profile, tag next to the bed for switching between night mode and morning mode, tag at work, keychain tag for switching between car mode and quiet mode.<p>And profile switching is just the basics. You can have a tag that connects guests' NFC-enabled phones to your wifi without having to hand out the password, for instance.<p>NFC task launcher + tasker is an amazing combination that opens up all kinds of possibilities.</font></span><p><font size=1><u><a href="reply?id=4242025&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr><tr><td> 

So how does Hacker News store the hierarchical structure of its comments, and how can I reproduce it when I scrape their data?

Answer


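Within the table, the indentation is done with a spacer image tag whose width attribute grows with the nesting depth: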

...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td>... 
...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>... 
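Pulling those widths out of the page can be sketched with Python's standard-library html.parser. This is a minimal sketch that assumes the markup is exactly as in the question: each comment row has a spacer s.gif image followed by an up_<id> vote link. The names CommentRowParser, extract_rows and SAMPLE are hypothetical, HN's markup may well have changed since 2012, and any row that lacks an up_ link would simply be skipped.

```python
from html.parser import HTMLParser


class CommentRowParser(HTMLParser):
    """Collect (indent_width, comment_id) pairs in page order.

    Assumes the 2012 markup shown in the question: each comment row has a
    spacer <img src=".../s.gif" width=N> followed by a vote link <a id=up_ID>.
    """

    def __init__(self):
        super().__init__()
        self.rows = []              # (width, comment_id) tuples
        self._pending_width = None  # width of the most recent spacer image

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "img" and (attrs.get("src") or "").endswith("/s.gif"):
            # Remember the indentation of the row we are entering.
            self._pending_width = int(attrs.get("width") or 0)
        elif tag == "a" and (attrs.get("id") or "").startswith("up_"):
            # The vote link carries the comment id; pair it with the spacer.
            if self._pending_width is not None:
                comment_id = int(attrs["id"][len("up_"):])
                self.rows.append((self._pending_width, comment_id))
                self._pending_width = None


def extract_rows(html):
    """Return [(width, comment_id), ...] for every comment row found."""
    parser = CommentRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Trimmed-down version of the two rows from the question.
SAMPLE = """
<td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td>
<td valign=top><a id=up_4241971 href="vote?for=4241971&dir=up"></a></td>
<td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>
<td valign=top><a id=up_4242025 href="vote?for=4242025&dir=up"></a></td>
"""

if __name__ == "__main__":
    print(extract_rows(SAMPLE))  # [(0, 4241971), (40, 4242025)]
```

Pairing each spacer with the next vote link keeps the parser stateless apart from one pending width, so stray s.gif spacers elsewhere on the page are simply overwritten by the next comment row's own spacer.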

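Presumably you would read and parse these widths. By keeping an internal stack of width values, you can then rebuild the actual thread structure: a comment whose width is larger than the one on top of the stack is a reply to it; otherwise you pop back up the stack until you reach its parent.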
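The stack idea can be sketched as follows, assuming the rows have already been reduced to (width, comment_id) pairs in page order. It only compares widths with each other, so it does not rely on every level being exactly 40 pixels. Comment, build_thread, outline and the small sample ids in ROWS are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    comment_id: int
    width: int                                  # spacer width; larger means deeper
    children: list = field(default_factory=list)


def build_thread(rows):
    """Rebuild the reply tree from (width, comment_id) pairs in page order.

    The stack holds the chain of ancestors of the current comment. Before a
    comment is pushed, every entry whose width is >= its width is popped;
    whatever remains on top is its parent (none means top-level).
    """
    roots = []
    stack = []
    for width, comment_id in rows:
        node = Comment(comment_id, width)
        while stack and stack[-1].width >= width:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


def outline(nodes, depth=0):
    """Flatten the tree into indented lines, two spaces per level."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + str(node.comment_id))
        lines.extend(outline(node.children, depth + 1))
    return lines


# Hypothetical thread: 1 has replies 2 and 4, 2 has reply 3, 5 is top-level.
ROWS = [(0, 1), (40, 2), (80, 3), (40, 4), (0, 5)]

if __name__ == "__main__":
    print("\n".join(outline(build_thread(ROWS))))
```

Because a page lists comments depth-first, each comment's parent is always still on the stack when the comment is reached, so a single pass in page order is enough.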


Wow! I missed that. Thanks a lot. – yayu 2012-07-14 05:24:32