I am trying to extract information from an HTML page using VBScript. This is the HTML from which I am trying to extract the information.
<div id="profile-education">
<div class="position first education vevent vcard" id="xxxxxx">
University 1
<span class="degree">Ph.D.</span>
<span class="major">Computer Science</span>
<p class="period">
<abbr class="dtstart" title="2005-01-01">2005</abbr> – <abbr class="dtend" title="2012-12-31">2012</abbr>
</p>
</div>
<div class="position education vevent vcard" id="xxxxxx">
University 2
<span class="degree">M.Eng.</span>
<span class="major">Computer Science</span>
<p class="period">
<abbr class="dtstart" title="2000-01-01">2000</abbr> – <abbr class="dtend" title="2004-12-31">2004</abbr>
</p>
</div>
</div>
I want to extract the information in the following format.
Period: 2005 - 2012
University Name: University 1
In my VBScript, I have the following code, which returns all of the text as a single string.
Dim openedpage as String
openedpage = iedoc1.getElementById("profile-education").innerText
However, if I use the following statement in my VBScript, I can get the text of one particular span.
openedpage = iedoc1.getElementById("profile-education").getElementsByTagName("span")(0).innerText
The above code gives me Ph.D. as the output. However, I will not know the number of spans beforehand, so I cannot simply hard-code span(0) and span(1). I would also like to extract the information from every div, and I will not know how many of those there are either. Basically, I want a loop that iterates over the div tags inside the element with the id profile-education and extracts the span information from each one.
Dim divs, div, span
Set divs = iedoc1.getElementById("profile-education").getElementsByTagName("div")
For Each div In divs
    Debug.Print "*************************************"
    ' The first child node is the text node holding the university name
    Debug.Print div.childNodes(0).nodeValue
    ' Loop over every span so the count need not be known in advance
    For Each span In div.getElementsByTagName("span")
        Debug.Print span.innerText
    Next span
    ' etc...
Next div
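To get output in the "Period: ... / University Name: ..." format asked for, the same loop can read the two abbr elements of each entry. This is only a rough sketch: the Sub name PrintEducation is made up for illustration, iedoc1 is assumed to be the already-loaded document from the question, and it assumes every entry div starts with the university name as plain text followed by a dtstart and a dtend abbr, as in the sample HTML.

```vba
' Hypothetical helper: prints one block per education entry.
' Assumes iedoc1 is the loaded HTML document from the question.
Sub PrintEducation(ByVal iedoc1 As Object)
    Dim entry As Object, abbrs As Object, span As Object

    For Each entry In iedoc1.getElementById("profile-education").getElementsByTagName("div")
        ' The two abbr elements hold the start and end year (assumption based on the sample HTML)
        Set abbrs = entry.getElementsByTagName("abbr")
        If abbrs.Length >= 2 Then
            Debug.Print "Period: " & abbrs.Item(0).innerText & " - " & abbrs.Item(1).innerText
        End If

        ' The first child node is assumed to be the text node with the university name;
        ' Trim only removes spaces, so surrounding line breaks may remain
        Debug.Print "University Name: " & Trim(entry.childNodes(0).nodeValue)

        ' Print every span, labelled by its class (degree, major, ...)
        For Each span In entry.getElementsByTagName("span")
            Debug.Print span.className & ": " & span.innerText
        Next span
    Next entry
End Sub
```

It would be called as PrintEducation iedoc1 once the page has finished loading. Checking abbrs.Length first avoids an error on an entry that has no period.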