I am creating an iOS app that needs to get some data from a web page. My first thought was to use NSXMLParser initWithContentsOfURL: and parse the HTML with the NSXMLParser delegate. However, this approach seems like it could quickly become painful (if, for example, the HTML changed, I would have to rewrite the parsing code, which could be awkward).
Seeing as I'm loading a web page, I took a look at UIWebView too. It looks like UIWebView may be the way to go. stringByEvaluatingJavaScriptFromString: seems like a very handy way to extract the data, and it would allow the JavaScript to be stored in a separate file that would be easy to edit if the HTML changed. However, using UIWebView seems a bit hacky (seeing as UIWebView is a UIView subclass, it may block the main thread, and the docs say that the JavaScript has a limit of 10MB).
Does anyone have any advice regarding parsing XML/HTML before I get stuck in?
UPDATE:
I wrote a blog post about my solution: HTML parsing/screen scraping in iOS
I've done this a few times. The best approach I've found is to use libxml2 which has a mode for HTML. Then you can use XPath to query the document.
Working with the libxml2 API is not the most enjoyable. So, I usually bring over the XPathQuery.h/.m files documented on this page:
http://cocoawithlove.com/2008/10/using-libxml2-for-parsing-and-xpath.html
Then I fetch the data using an NSURLConnection and query the data with something like this:
NSArray *tdNodes = PerformHTMLXPathQuery(self.receivedData, @"//td[@class='col-name']/a/span");
Summary:
Add libxml2 to your project. Here are some quick instructions for Xcode 4: http://cmar.me/2011/04/20/adding-libxml2-to-an-xcode-4-project/
Get the XPathQuery.h/.m files.
Use an XPath statement to query the HTML document.