What are the best options for web scraping a page that is not currently open in a tab, from within a Google Chrome extension, using JavaScript and whatever other technologies are available? Other JavaScript libraries are also acceptable.
The important thing is to mask the scraping so that it behaves like a normal web request, with no indication of AJAX or XMLHttpRequest, such as an X-Requested-With: XMLHttpRequest or Origin header.
The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.
Are there any hooks in WebKit/Chrome-specific APIs that can be used to make a normal web request and get the result for manipulation?
var pageContent = getPageContent(url); // TODO: Implement
var items = $(pageContent).find('.item');
// Display items with further selections
Bonus points for making this work from a local file on disk, for initial debugging. But if that is the only thing stopping a solution, disregard the bonus points.
Attempt to use XHR2 responseType = "document" and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch.
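A minimal sketch of this approach, assuming it runs in an extension page (for example a Manifest V2 background page) that has host permissions for the target URL. The names getPageDocument, mimeTypeFromContentType and the useDocumentResponse flag are hypothetical, not part of any browser API. One adjustment to the raw header: a Content-Type value usually carries parameters such as "; charset=UTF-8", and current browsers' DOMParser rejects anything but a bare supported MIME type, so the helper trims it first.

```javascript
// MIME types that DOMParser.parseFromString accepts.
var DOMPARSER_TYPES = ["text/html", "text/xml", "application/xml",
                       "application/xhtml+xml", "image/svg+xml"];

// Hypothetical helper: reduce a header such as "text/html; charset=UTF-8"
// to a bare type DOMParser accepts. Assumption: fall back to "text/html"
// when the header is missing or names an unsupported type.
function mimeTypeFromContentType(contentType) {
  var type = (contentType || "").split(";")[0].trim().toLowerCase();
  return DOMPARSER_TYPES.indexOf(type) !== -1 ? type : "text/html";
}

// Hypothetical helper: fetch url and pass a parsed Document (or null on
// failure) to callback. useDocumentResponse says whether XHR2
// responseType = "document" is supported; detecting that is out of scope here.
function getPageDocument(url, useDocumentResponse, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  if (useDocumentResponse) {
    // responseText cannot be read once responseType is "document",
    // so the DOMParser path leaves responseType at its default.
    xhr.responseType = "document";
  }
  xhr.onload = function () {
    // Status 0 (e.g. file:// URLs) and 2xx/3xx are treated as success.
    if (xhr.status >= 400) {
      callback(null);
      return;
    }
    if (useDocumentResponse) {
      callback(xhr.response); // a Document, or null for non-HTML/XML bodies
      return;
    }
    var type = mimeTypeFromContentType(xhr.getResponseHeader("Content-Type"));
    callback(new DOMParser().parseFromString(xhr.responseText, type));
  };
  xhr.onerror = function () {
    callback(null);
  };
  xhr.send();
}
```

Because the request is asynchronous, the synchronous getPageContent(url) from the question becomes a callback: inside getPageDocument(url, true, function (doc) { ... }), the question's jQuery selection can run as $(doc).find('.item'), and doc.documentElement.outerHTML gives the page as a string if one is needed. Letting status 0 through may also help with the local-file case, provided the extension has been allowed access to file URLs.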
See https://gist.github.com/1138724 for an example of how I detect responseType = "document" support (synchronously checking response === null on an object URL created from a text/html blob).
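As a simpler alternative to the gist's check (a swapped-in technique, not the one linked), one can rely on the fact that assigning an unrecognized value to responseType is silently ignored, then read the value back. The function name supportsDocumentResponseType is hypothetical. This only confirms that "document" is an accepted value, not that a Document actually comes back, so the gist's end-to-end test is the more thorough of the two. Modern browsers also refuse a non-default responseType on synchronous requests from a window, which is another reason to avoid a synchronous probe there.

```javascript
// Hypothetical feature test: returns true if the XHR implementation
// recognizes responseType = "document". XHR may be passed in (useful for
// testing); otherwise the global XMLHttpRequest is used if it exists.
function supportsDocumentResponseType(XHR) {
  XHR = XHR || (typeof XMLHttpRequest !== "undefined" ? XMLHttpRequest : null);
  if (!XHR) {
    return false; // no XHR available at all (e.g. outside a browser)
  }
  try {
    var xhr = new XHR();
    // Unrecognized enum values are ignored, leaving responseType unchanged.
    xhr.responseType = "document";
    return xhr.responseType === "document";
  } catch (e) {
    return false; // treat any exception as "unsupported"
  }
}
```

The result can be computed once at startup and reused for every scrape.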
Use the Chrome WebRequest API to hide X-Requested-With and similar headers.
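A sketch of that idea for a background script, assuming a Manifest V2 extension with the "webRequest" and "webRequestBlocking" permissions plus host permissions for the scraped sites (in Manifest V3, blocking webRequest is restricted to policy-installed extensions and declarativeNetRequest takes its place). HEADERS_TO_STRIP and stripHeaders are hypothetical names. Plain XMLHttpRequest does not add X-Requested-With on its own; libraries such as jQuery add it, so the list is an assumption about what needs removing. Recent Chrome versions require "extraHeaders" to modify Origin, and older versions may reject that value, so adjust the last argument to the targeted Chrome version.

```javascript
// Assumption: the headers that give the scraping request away.
var HEADERS_TO_STRIP = ["x-requested-with", "origin"];

// Return requestHeaders without the named headers (case-insensitive).
function stripHeaders(requestHeaders, namesToStrip) {
  return requestHeaders.filter(function (header) {
    return namesToStrip.indexOf(header.name.toLowerCase()) === -1;
  });
}

// Register only when running inside an extension with webRequest access.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    function (details) {
      // tabId is -1 for requests not tied to a tab, such as those made by
      // the background page; ordinary page traffic is left untouched.
      if (details.tabId !== -1) {
        return {};
      }
      return { requestHeaders: stripHeaders(details.requestHeaders, HEADERS_TO_STRIP) };
    },
    { urls: ["<all_urls>"], types: ["xmlhttprequest"] },
    ["blocking", "requestHeaders", "extraHeaders"]
  );
}
```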