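Scraping with cheerio.js, getting: Error: Can only perform operation while paused

I'm trying to use cheerio.js to scrape the whiskey name, image_url, and description from this site: https://www.thewhiskyexchange.com/c/33/american-whiskey?filter=true#productlist-filter. I want to turn this information into an array of JSON objects to store in my MongoDB. I can't show the site's complete HTML, but here is part of the relevant basic structure of the unordered list: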
    <body>
      <div class="siteWrapper">
        <div class="wrapper">
          <div class="products-wrapper">
            <ul class="products-list">
              <li>
                <a>
                  <div class="product-content">
                    <div class="information">
                      <p class="name">
                        " Jack Daniel's Old No. 7"
                        <span>Small Bottle</span>
                      </p>
                    </div>
                  </div>
                </a>
              </li>
              <li></li>
              <li></li> etc. </all closing tags>
To start, I'm just trying to get the whiskey name inside <p class="name">, without any of the text in the <span> tag. I ran this jQuery code in the browser console, and it gives me exactly what I need:

    $('ul.products-list > li').each(function(index) {
        const nameOnly = $(this).find('a div div.information p.name').first().contents().filter(function() {
            // Keep only text nodes (nodeType 3), which excludes the <span>.
            return this.nodeType == 3;
        }).text();
        const whiskeyObject = {name: nameOnly};
        const whiskeys = JSON.stringify(whiskeyObject);
        console.log(whiskeys);
    })

I'm trying the same code with cheerio in my app file (whiskey-scraper.js):

    const express = require('express');
    const request = require('request');
    const cheerio = require('cheerio');
    const fs = require('fs');
    const app = express();
    const port = 8000;

    request('https://www.thewhiskyexchange.com/c/33/american-whiskey?filter=true#productlist-filter', function(error, response, body) {
        if(error) {
            console.log("Error: " + error);
            // Stop here: response is undefined when the request fails.
            return;
        }
        console.log("Status code: " + response.statusCode);
        const $ = cheerio.load(body);
        // console.log(body);
        $('ul.products-list > li').each(function(index) {
            const nameOnly = $(this).find('a div div.information p.name').first().contents().filter(function() {
                // Keep only text nodes (nodeType 3), which excludes the <span>.
                return this.nodeType == 3;
            }).text().trim();
            const whiskeyObject = {name: nameOnly};
            const whiskeys = JSON.stringify(whiskeyObject);
            console.log(whiskeys);
        })
    });

    app.listen(port);
    console.log(`Stuff is working on Port ${port}!`);
When I run node inspect whiskey-scraper.js in my terminal, the console logs a 200 status code and also logs this error:
"Error: Can only perform operation while paused. - undefined
    at _pending.(anonymous function) (node-inspect/lib/internal/inspect_client.js:243:27)
    at Client._handleChunk (node-inspect/lib/internal/inspect_client.js:213:11)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:191:7)
    at readableAddChunk (_stream_readable.js:178:18)
    at Socket.Readable.push (_stream_readable.js:136:10)
    at TCP.onread (net.js:561:20)"
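I can't figure out what this means or how to resolve it. Any ideas on how to eliminate this error and at least get my console.log(whiskeys); line working? If I can get that far, I can take it from there.

When I uncomment console.log(body); the whole site's HTML is logged to the console, so I think cheerio is getting the information I need from the site. Once I get rid of this error, I can figure out how to get the image_url and description and load everything into my MongoDB.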
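For the later step of turning the scraped values into an array of objects, a minimal sketch might look like the following. It assumes the names have already been extracted as raw strings. The helper name `toWhiskeyDocs` and the sample input are hypothetical and not part of the original script. The idea is to collect one object per product into a single array and serialize it once, rather than calling JSON.stringify on each item separately.

```javascript
// Hypothetical helper: turns raw scraped name strings into plain objects.
// Assumes the names were already extracted (e.g. by the .each() loop above).
function toWhiskeyDocs(rawNames) {
  return rawNames
    .map(function (raw) {
      // Trim whitespace and drop any literal double quotes wrapping the text.
      return raw.trim().replace(/^"+|"+$/g, '').trim();
    })
    .filter(function (name) {
      // Skip empty <li> entries that produced no text.
      return name.length > 0;
    })
    .map(function (name) {
      return { name: name };
    });
}

// Sample input is made up for illustration.
const whiskeys = toWhiskeyDocs(['  " Jack Daniel\'s Old No. 7"  ', '', '   ']);
console.log(JSON.stringify(whiskeys, null, 2));
```

An array in this shape is the kind of input that the MongoDB Node.js driver's insertMany accepts, so image_url and description fields could be added to each object in the same way before inserting.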
Thanks!