When controlling an IE instance via MSHTML, how to suppress Open/Save dialogs for non-HTML content?
I need to get data from another system and import it into ours. Due to budget constraints, no development (e.g. a web service) can be done on the other side for some time, so my only option for now is web scraping.
The remote site is ASP.NET-based, so simple HTTP requests won't work -- too much JS.
I wrote a simple C# application that uses MSHTML and SHDocVw to control an IE instance. So far so good: I can log in, navigate to the desired page, populate the required fields and submit the form.
Then I face a couple of problems:
First, the report opens in another window. I suspect I can attach to that window too by enumerating the IE windows in the system.
Second, and more troublesome, the report itself is a CSV file and triggers the Open/Save dialog. I'd like to avoid that and make IE save the file to a given location, or I'm fine with programmatically clicking the dialog buttons (how?).
I'm really not a Windows guy (Unix/J2EE), and hope someone with better knowledge can give me a hint on how to do these tasks.
Thanks!
UPDATE
I've found a promising document on MSDN: http://msdn.microsoft.com/en-ca/library/aa770041.aspx
Control the kinds of content that are downloaded and what the WebBrowser Control does with them once they are downloaded. For example, you can prevent videos from playing, script from running, or new windows from opening when users click on links, or prevent Microsoft ActiveX controls from downloading or executing.
Slowly reading through...
UPDATE 2: MADE IT WORK, SORT OF...
Finally I made it work, but in an ugly way. Essentially, I register a BeforeNavigate2 handler that remembers the URL when it matches my target file, and a FileDownload handler that cancels the download so the Save dialog never appears. I then use the WebClient class to download that temporary URL directly.
I cannot copy the whole code here, as it contains a lot of garbage, but here are the essential parts:
Installing the handlers:
_IE2.FileDownload += new DWebBrowserEvents2_FileDownloadEventHandler(IE2_FileDownload);
_IE.BeforeNavigate2 += new DWebBrowserEvents2_BeforeNavigate2EventHandler(IE_OnBeforeNavigate2);
Recording the URL and then cancelling the download (thus preventing the Save dialog from appearing):
public string downloadUrl;

// Remember the URL of the CSV report; the navigation itself is allowed to proceed.
void IE_OnBeforeNavigate2(Object ob1, ref Object URL, ref Object Flags, ref Object Name, ref Object da, ref Object Head, ref bool Cancel)
{
    Console.WriteLine("Before Navigate2 " + URL);
    if (URL.ToString().EndsWith(".csv"))
    {
        Console.WriteLine("CSV file");
        downloadUrl = URL.ToString();
    }
    Cancel = false;
}

// Cancel the download so that IE never shows the Open/Save dialog.
void IE2_FileDownload(bool activeDocument, ref bool cancel)
{
    Console.WriteLine("FileDownload, downloading " + downloadUrl + " instead");
    cancel = true;
}

// The report opens in a new window: hand IE our own instance with the same handlers attached.
void IE_OnNewWindow2(ref Object o, ref bool cancel)
{
    Console.WriteLine("OnNewWindow2");
    _IE2 = new SHDocVw.InternetExplorer();
    _IE2.BeforeNavigate2 += new DWebBrowserEvents2_BeforeNavigate2EventHandler(IE_OnBeforeNavigate2);
    _IE2.Visible = true;
    o = _IE2;
    _IE2.FileDownload += new DWebBrowserEvents2_FileDownloadEventHandler(IE2_FileDownload);
    _IE2.Silent = true;
    cancel = false;
}
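A side note on the EndsWith(".csv") test in the handler: it would miss a URL with an uppercase extension or a trailing query string. Below is a minimal sketch of a more tolerant check, assuming the report URL is absolute. DownloadUrls and IsCsvUrl are hypothetical names, not part of my original code:

```csharp
using System;

static class DownloadUrls
{
    // Hypothetical helper: returns true when the path part of an absolute URL
    // (ignoring any query string or fragment) ends in ".csv", case-insensitively.
    public static bool IsCsvUrl(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return false; // null, empty, or not an absolute URL

        return uri.AbsolutePath.EndsWith(".csv", StringComparison.OrdinalIgnoreCase);
    }
}
```

In IE_OnBeforeNavigate2 the condition could then read if (DownloadUrls.IsCsvUrl(URL.ToString())).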
And in the calling code, using the recorded URL for a direct download:
...
driver.ClickButton(".*_btnRunReport");
driver.WaitForComplete();
Thread.Sleep(10000);
WebClient Client = new WebClient();
Client.DownloadFile(driver.downloadUrl, "C:\\affinity.dump");
(driver is a simple wrapper over the IE instance, _IE)
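The fixed Thread.Sleep(10000) either wastes time or is too short when the report is slow. A possible alternative, sketched under the assumption that the IE event handlers run on a different thread from the calling code, is to poll until downloadUrl is set or a timeout expires. Waiter and WaitUntil are hypothetical names:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Waiter
{
    // Hypothetical helper: polls 'condition' every 'pollInterval' until it
    // returns true or 'timeout' has elapsed. Returns the final result of the condition.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (clock.Elapsed < timeout)
        {
            if (condition())
                return true;
            Thread.Sleep(pollInterval);
        }
        return condition(); // one last check after the timeout
    }
}

// Possible use in the calling code (driver and downloadUrl as above):
//   bool found = Waiter.WaitUntil(() => driver.downloadUrl != null,
//                                 TimeSpan.FromSeconds(30),
//                                 TimeSpan.FromMilliseconds(250));
//   if (found) Client.DownloadFile(driver.downloadUrl, "C:\\affinity.dump");
```

The 30-second timeout and 250 ms poll interval are arbitrary values chosen for illustration.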
Hope that helps someone.
The easiest way to do this would be to adjust the MIME type for CSV files on the system that does the downloading. IE is trying to download the file because of the action associated with .CSV files.
I think you can change this in Windows Explorer by going to Tools > Folder Options > File Types. If you associate CSV files with Internet Explorer, the CSV file will open in IE. At that point you should be able to use IE automation to save the currently open document to a file.