2013-02-26 110 views
6

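I want to retrieve all the comments from CNN's comment system, for example from http://edition.cnn.com/2013/02/25/tech/innovation/google-glass-privacy-andrew-keen/index.html?hpt=hp_c1 How can I get all the comments from Disqus?

The comment system makes you click "load more" to see more comments. I tried parsing the HTML with PHP, but that can't load all the comments because they are loaded with JavaScript. So I'm wondering whether anyone has a more convenient way to retrieve all the comments for a specific CNN URL.

Has anyone done this successfully? Thanks in advance.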

Answers

6

The Disqus API includes a pagination method that uses cursors returned in the JSON response. See here for information about cursors: http://disqus.com/api/docs/cursors/
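To show the cursor mechanics on their own, here is a minimal JavaScript sketch. It assumes each decoded response has a `response` array and a `cursor` object with `hasNext` and `next` fields. `nextCursor`, `fetchAllPosts` and `fetchPage` are hypothetical names, and `fetchPage` stands in for whatever HTTP call you use.

```javascript
// Return the cursor string for the next page, or null when there is none.
// `page` is the decoded JSON body: { response: [...], cursor: { hasNext, next, ... } }.
function nextCursor(page) {
  const cursor = page && page.cursor;
  return cursor && cursor.hasNext ? cursor.next : null;
}

// Collect every post by following the cursor until it runs out.
// `fetchPage` is a hypothetical caller-supplied function: it takes a cursor
// string ('' for the first page) and resolves to the decoded JSON body.
async function fetchAllPosts(fetchPage) {
  const posts = [];
  let cursor = '';
  while (cursor !== null) {
    const page = await fetchPage(cursor);
    posts.push(...page.response);
    cursor = nextCursor(page);
  }
  return posts;
}
```

Keeping the "is there another page?" decision in its own small function makes it easy to check without touching the network.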

Since you mentioned PHP, something like this should get you started:

<?php 
$apikey = '<your key here>'; // get keys at http://disqus.com/api/ — can be public or secret for this endpoint 
$shortname = '<the disqus forum shortname>'; // defined in the var disqus_shortname = '...'; 
$thread = 'link:<URL of thread>'; // IMPORTANT the URL that you're viewing isn't necessarily the one stored with the thread of comments 
//$thread = 'ident:<identifier of thread>'; Use this if 'link:' has no results. Defined in 'var disqus_identifier = '...'; 
$limit = '100'; // max is 100 for this endpoint. 25 is default 

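$cursor = ''; // empty for the first request; later values come from the API response
$endpoint = 'https://disqus.com/api/3.0/threads/listPosts.json?api_key='.$apikey.'&forum='.$shortname.'&thread='.urlencode($thread).'&limit='.$limit.'&cursor=';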

$j=0; 
listcomments($endpoint,$cursor,$j); 

function listcomments($endpoint,$cursor,$j) { 

    // Standard CURL 
    $session = curl_init($endpoint.$cursor); 
    curl_setopt($session, CURLOPT_RETURNTRANSFER, 1); // instead of just returning true on success, return the result on success 
    $data = curl_exec($session); 
    curl_close($session); 

    // Decode JSON data 
    $results = json_decode($data); 
    if ($results === NULL) die('Error parsing json'); 

    // Comment response 
    $comments = $results->response; 

    // Cursor for pagination 
    $cursor = $results->cursor; 

    $i=0; 
    foreach ($comments as $comment) { 
     $name = $comment->author->name;
     $message = $comment->message; // separate variable so $comment is not overwritten
     $created = $comment->createdAt;
     // Get more data...

     echo "<p>".$name." wrote:<br/>";
     echo $message."<br/>";
     echo $created."</p>";
     $i++; 
    } 

    // keep paging while the API reports that another page exists
    if ($cursor->hasNext) {
     $cursor = $cursor->next;
     $i = 0;
     listcomments($endpoint,$cursor,$j);
     /* uncomment to only run $j number of iterations 
     $j++; 
     if ($j < 10) { 
      listcomments($endpoint,$cursor,$j); 
     }*/ 
    } 
} 

?> 
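One detail worth spelling out: the thread is passed as a query parameter, and a 'link:' value contains characters such as ':' and '/' that need URL-encoding. A minimal JavaScript sketch of building the request URL follows. `buildListPostsUrl` is a hypothetical helper, and the key, shortname and thread values are placeholders you supply.

```javascript
// Build a threads/listPosts request URL. All parameter values are placeholders
// supplied by the caller; URLSearchParams takes care of the URL-encoding.
function buildListPostsUrl({ apiKey, forum, thread, limit = 100, cursor = '' }) {
  const params = new URLSearchParams({
    api_key: apiKey,
    forum: forum,
    thread: thread, // e.g. 'link:<URL of thread>' or 'ident:<identifier of thread>'
    limit: String(limit),
  });
  // Only send a cursor once the API has handed one back.
  if (cursor) params.set('cursor', cursor);
  return 'https://disqus.com/api/3.0/threads/listPosts.json?' + params.toString();
}
```

If 'link:' returns nothing, the same helper works with an 'ident:' value, as noted in the PHP comments above.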
+0

Thanks a lot! But what exactly do we need to set for $thread (the URL of the thread) and $cursor? By the way, can we only get 100 comments at most? – 2013-02-26 09:15:23

+0

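The thread URL is just the URL of the page that has the comments. In this case it's http://www.cnn.com/2013/02/25/tech/innovation/google-glass-privacy-andrew-keen/index.html - the cursor value is taken from the API response and represents the next set of 100 comments. The script keeps going until there are no more comments. – 2013-02-27 05:39:14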

+0

I set $shortname to 'cnn' (var disqus_shortname = 'cnn';) and $thread to 'link:' and left $cursor empty, but the result was "Error parsing json". What am I missing? – 2013-02-27 07:11:14

3

Just an addition: to get the URL of the Disqus comments on any web page that has them, run this JavaScript code in the web browser console:

var visit = (function () {
    // The Disqus embed renders the comments in an iframe inside #disqus_thread
    var url = document.querySelector('div#disqus_thread iframe').src;

    // Upgrade an http:// URL to https:// by inserting "s" after "http"
    if (url.indexOf('https://') !== 0) return url.slice(0, 4) + "s" + url.slice(4);

    return url;
})();

The URL is now in the variable 'visit':

console.log(visit); 

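I converted all the data to UTF-8 JSON for you and saved it as a .txt file; you can find it here: link. The JSON contains several variable names, but the one you need is the 'data' variable, which is a JavaScript array.

Iterate over each entry and split it on 'x == x'. The 'x == x' separator is there so that the username of the person who posted each comment is captured along with it. Where an entry has no numeric user ID and only a name, the account is no longer active.

To use a user ID, the profile URL has the form https://disqus.com/users/106222183, where 106222183 is the user ID.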
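As a small illustration of the user ID rule above, here is a JavaScript sketch. `profileUrl` is a hypothetical helper name, and it assumes the value you pass in is whatever user token you extracted from an entry.

```javascript
// Return the profile URL for a numeric Disqus user ID, or null when the value
// is not purely numeric (per the note above, such entries only carry a name).
function profileUrl(userId) {
  const id = String(userId).trim();
  return /^\d+$/.test(id) ? 'https://disqus.com/users/' + id : null;
}
```

A null result marks the entries that have only a name and no numeric ID.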

-1

Without the API:

#disqus_thread { 
    position: relative; 
    height: 300px; 
    background-color: #fff; 
    overflow: hidden; 
} 
#disqus_thread:after { 
    content: ""; 
    display: block; 
    height: 10px; 
    width: 100%; 
    position: absolute; 
    bottom: 0; 
    background: white; 
} 
#disqus_thread.loaded { 
    height: auto; 
} 
#disqus_thread.loaded:after{ 
    height:55px; 
} 
#disqus-load { 
    text-align: center; 
    color: #fff; 
    padding: 11px 14px; 
    font-size: 13px; 
    font-weight: 500; 
    display: block; 
    text-align: center; 
    border: none; 
    background: rgba(29,47,58,.6); 
    line-height: 1.1; 
    border-radius: 3px; 
    font-weight: 500; 
    transition: background .2s; 
    text-shadow: none; 
    cursor:pointer; 
} 

<div class="disqus-comments"> 
    <div id='disqus_thread'></div> 
    <div id='disqus-load'>Load comments</div> 
</div> 

<script type="text/javascript"> 


$(document).ready(function() { 
    var disqus_shortname = 'testare-123'; 

    (function() { 
     var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; 
     dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; 
     (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); 
    })(); 
     $('#disqus-load').on('click', function(){ 

     $.ajax({ 
      type: "GET", 
      url: "https://" + disqus_shortname + ".disqus.com/embed.js",
      dataType: "script", 
      cache: true 
     }); 

     $(this).fadeOut(); 
     $('#disqus_thread').addClass('loaded'); 
    }); 
}); 
    /* * * CONFIGURATION VARIABLES * * */ 
    // var disqus_shortname = 'testare-123'; 

    // /* * * DON'T EDIT BELOW THIS LINE * * */ 
    // (function() { 
    // var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; 
    // dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; 
    // (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); 
    // })(); 
</script> 
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript" rel="nofollow">comments powered by Disqus.</a></noscript>