2017-08-04 121 views

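How do I get multiple DOM elements using chrome-remote-interface in Node.js?

I'm trying to build a scraper with chrome-remote-interface, but I don't know how to get multiple DOM elements, for example elements selected by a specific id or class.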

Example:

price = document.getElementById('price')

name = document.getElementById('name')

Code

const CDP = require('chrome-remote-interface');

CDP((client) => {
    // Extract used DevTools domains.
    const {Page, Runtime} = client;

    // Enable events on domains we are interested in.
    Promise.all([
        Page.enable()
    ]).then(() => {
        return Page.navigate({url: 'http://example.com'});
    });

    // Evaluate outerHTML after page has loaded.
    Page.loadEventFired(() => {
        Runtime.evaluate({expression: 'document.body.outerHTML'}).then((result) => {
            // How to get multiple DOM elements?
            console.log(result.result.value);
            client.close();
        });
    });
}).on('error', (err) => {
    console.error('Cannot connect to browser:', err);
});

Update

const CDP = require('chrome-remote-interface'); 

CDP((client) => { 
    // Extract used DevTools domains. 
    const {DOM,Page, Runtime} = client; 

    // Enable events on domains we are interested in. 
    Promise.all([
        Page.enable()
    ]).then(() => {
        return Page.navigate({url: 'https://someDomain.com'});
    });
    Page.loadEventFired(() => {
        const expression = `({
            test: document.getElementsByClassName('rows')),
        })`;
        Runtime.evaluate({expression, returnByValue: true}).then((result) => {
            console.log(result.result); // Error
            client.close();
        });
    });
}).on('error', (err) => { 
    console.error('Cannot connect to browser:', err); 
}); 

Error

{ type: 'object', 
    subtype: 'error', 
    className: 'SyntaxError', 
    description: 'SyntaxError: Unexpected token)', 
    objectId: '{"injectedScriptId":14,"id":1}' } 

What I actually want is to iterate over the list of elements, but I don't know where it goes wrong.

Answer


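You cannot move DOM objects from the browser context to the Node.js context; you can only pass properties, or anything else that can be represented as a JSON object. Here I'm assuming you are interested in the computed HTML.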

One possible solution is:

const CDP = require('chrome-remote-interface'); 

CDP((client) => { 
    // Extract used DevTools domains. 
    const {Page, Runtime} = client; 

    // Enable events on domains we are interested in. 
    Promise.all([
        Page.enable()
    ]).then(() => {
        return Page.navigate({url: 'http://example.com'});
    });

    // Evaluate outerHTML after page has loaded.
    Page.loadEventFired(() => {
        const expression = `({
            name: document.getElementById('name').outerHTML,
            price: document.getElementById('price').outerHTML
        })`;
        Runtime.evaluate({
            expression,
            returnByValue: true
        }).then(({result}) => {
            const {name, price} = result.value;
            console.log(`name: ${name}`);
            console.log(`price: ${price}`);
            client.close();
        });
    });
}).on('error', (err) => { 
    console.error('Cannot connect to browser:', err); 
}); 

The key point is to return a JSON object by using returnByValue: true.
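The expression in the answer fails inside the page if either element is missing, because getElementById returns null in that case. Here is a minimal sketch of one way to generalize it to any list of ids. It assumes a hypothetical helper named buildOuterHtmlExpression, which is not part of chrome-remote-interface, and it maps missing elements to null:

```javascript
// Hypothetical helper (an assumption, not a chrome-remote-interface API):
// builds an expression string that evaluates, in the page, to an object
// of the form {id: outerHTML or null} for each requested id.
function buildOuterHtmlExpression(ids) {
    const entries = ids.map((id) => {
        // JSON.stringify quotes and escapes the id so it is safe to embed.
        const key = JSON.stringify(id);
        return `${key}: (document.getElementById(${key}) || {outerHTML: null}).outerHTML`;
    });
    // Wrap in parentheses so the braces parse as an object literal.
    return `({${entries.join(', ')}})`;
}

// Usage inside Page.loadEventFired, with Runtime and client as in the answer:
// Runtime.evaluate({
//     expression: buildOuterHtmlExpression(['name', 'price']),
//     returnByValue: true
// }).then(({result}) => {
//     console.log(result.value);
//     client.close();
// });
```

The result is still a plain object of strings and nulls, so it can be returned with returnByValue: true.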


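Update: you have an error in your expression, a trailing ) in ...('rows')),. But even if you fix it, you will still end up in a wrong situation, because you are trying to pass an array of DOM objects (see the first paragraph of this answer). Again, if you just want the outer HTML you can do something like this: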

// Evaluate outerHTML after page has loaded.
Page.loadEventFired(() => {
    const expression = `
        // fetch an array-like of DOM elements
        var elements = document.getElementsByTagName('p');
        // create and return an array containing
        // just a property (in this case 'outerHTML')
        Array.prototype.map.call(elements, x => x.outerHTML);
    `;
    Runtime.evaluate({
        expression,
        returnByValue: true
    }).then(({result}) => {
        // this is the returned array
        const elements = result.value;
        elements.forEach((html) => {
            console.log(`- ${html}`);
        });
        client.close();
    });
});
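The same idea extends to the class-based case from the question. As a rough sketch, an ordinary function can be written, serialized with toString(), and called inside the page so that each matched element is reduced to plain data. The name extractRows and the chosen fields (id, className, text) are assumptions for illustration; only the 'rows' class comes from the question:

```javascript
// Runs inside the page once serialized, so it may only use browser APIs
// and must return plain, JSON-serializable data (never DOM nodes).
function extractRows(selector) {
    return Array.from(document.querySelectorAll(selector), (el) => ({
        id: el.id,
        className: el.className,
        text: el.textContent.trim()
    }));
}

// Serialize the function and call it with the selector in the page.
const expression = `(${extractRows.toString()})(${JSON.stringify('.rows')})`;

// Usage inside Page.loadEventFired, with Runtime and client as in the answer:
// Runtime.evaluate({expression, returnByValue: true}).then(({result}) => {
//     result.value.forEach((row) => console.log(row.id, row.text));
//     client.close();
// });
```

Writing the page-side logic as a real function rather than a hand-built string keeps syntax errors, such as the stray parenthesis in the question, visible to the editor and to Node.js before the code ever reaches the browser.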

Thanks for the answer, but it still gives me some weird error. I'll update the question. – Nane