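Parsing an HTML table with Nokogiri and Mechanize
Using the following code, I am trying to scrape call logs from our phone provider's web application so that I can import the information into my Ruby on Rails application.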
desc "Import incoming calls"
task :fetch_incomingcalls => :environment do
  # Logs into manage.phoneprovider.co.uk and retrieves the list of incoming calls.
  require 'rubygems'
  require 'mechanize'
  require 'logger'
  # Create a new Mechanize object
  agent = Mechanize.new { |a| a.log = Logger.new(STDERR) }
  # Load the Phone Provider website
  page = agent.get("https://manage.phoneprovider.co.uk/login")
  # Select the first form
  form = agent.page.forms.first
  form.username = 'username'
  form.password = 'password'
  # Submit the form
  page = form.submit form.buttons.first
  # Click on link called Call Logs
  page = agent.page.link_with(:text => "Call Logs").click
  # Click on link called Incoming Calls
  page = agent.page.link_with(:text => "Incoming Calls").click
  # Prints out table rows
  # puts doc.css('table > tr')
  # Print out the body as a test
  # puts page.body
end
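As you can see from the last five lines, I have tested that 'puts page.body' works, so the code above does what it should: it logs in successfully, navigates to Call Logs, and then to Incoming Calls. The incoming calls table looks like this: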
| Timestamp | Source | Destination | Duration |
| 03 Jan 13:40 | 12345678 | 12345679 | 00:01:01 |
| 03 Jan 13:40 | 12345678 | 12345679 | 00:01:01 |
| 03 Jan 13:40 | 12345678 | 12345679 | 00:01:01 |
| 03 Jan 13:40 | 12345678 | 12345679 | 00:01:01 |
It is generated from the following markup:
<thead>
<tr>
<td>Timestamp</td>
<td>Source</td>
<td>Destination</td>
<td>Duration</td>
<td>Cost</td>
<td class='centre'>Recording</td>
</tr>
</thead>
<tbody>
<tr class='o'>
<tr>
<td>03 Jan 13:40</td>
<td>12345678</td>
<td>12345679</td>
<td>00:01:14</td>
<td></td>
<td class='opt recording'>
</td>
</tr>
</tr>
<tr class='e'>
<tr>
<td>30 Dec 20:31</td>
<td>12345678</td>
<td>12345679</td>
<td>00:02:52</td>
<td></td>
<td class='opt recording'>
</td>
</tr>
</tr>
<tr class='o'>
<tr>
<td>24 Dec 00:03</td>
<td>12345678</td>
<td>12345679</td>
<td>00:00:09</td>
<td></td>
<td class='opt recording'>
</td>
</tr>
</tr>
<tr class='e'>
<tr>
<td>23 Dec 14:56</td>
<td>12345678</td>
<td>12345679</td>
<td>00:00:07</td>
<td></td>
<td class='opt recording'>
</td>
</tr>
</tr>
<tr class='o'>
<tr>
<td>21 Dec 13:26</td>
<td>07793770851</td>
<td>12345679</td>
<td>00:00:26</td>
<td></td>
<td class='opt recording'>
</td>
</tr>
</tr>
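I want to work out how to select only the cells I need (Timestamp, Source, Destination and Duration) and output them. After that I can worry about writing them to the database instead of the terminal.
I have tried using Selector Gadget, but it only shows 'td', or 'tr:nth-child(6) td, tr:nth-child(2) td' if I select more than one cell.
Any help or pointers would be appreciated!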
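I'm not sure how to apply this code to what I already have. If you look at the following, is this what you had in mind? https://gist.github.com/1574942 – dannymcc 2012-01-07 14:53:10
I didn't notice your reply until now. I have [forked your gist and added some code](https://gist.github.com/1592493). I have also answered your other question on this topic. – 2012-01-11 02:03:04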