I have a few hundred thousand URLs that I need to call. These are calls to an application server, which processes them and writes a status code to a table. I do not need to wait for a response (success/fail), only to know that the server got the request. I also want to be able to specify how many concurrent jobs can run at once, as I haven't worked out how many concurrent requests Tomcat can handle.
Here's what I've got so far, basically taken from someone else's attempt to do something similar, just not with URL calls. The text file contains each URL on its own line. Each URL looks like this:
http://webserver:8080/app/mwo/services/create?server=ServerName&e1user=admin&newMWONum=123456&sourceMWONum=0&tagNum=33-A-1B
And the code:
$maxConcurrentJobs = 10
$content = Get-Content -Path "C:\Temp\urls.txt"
foreach ($url in $content) {
    $running = @(Get-Job | Where-Object { $_.State -eq 'Running' })
    if ($running.Count -le $maxConcurrentJobs) {
        Start-Job {
            Invoke-WebRequest -UseBasicParsing -Uri $using:url
        }
    } else {
        $running | Wait-Job -Any
    }
    Get-Job | Receive-Job
}
The problem I'm having is that it gives two errors per "job", and I'm not sure why. When I dump the URL array $content it looks fine, and when I run my Invoke-WebRequest calls one by one they work without error.
126 Job126 BackgroundJob Running True localhost ...
Invalid URI: The hostname could not be parsed.
+ CategoryInfo : NotSpecified: (:) [Invoke-RestMethod], UriFormatException
+ FullyQualifiedErrorId : System.UriFormatException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
+ PSComputerName : localhost
Invalid URI: The hostname could not be parsed.
+ CategoryInfo : NotSpecified: (:) [Invoke-RestMethod], UriFormatException
+ FullyQualifiedErrorId : System.UriFormatException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
+ PSComputerName : localhost
Any help or alternative implementations would be appreciated. I'm open to not using PowerShell, but I'm limited to Windows 7 desktops or Windows Server 2008 R2 servers, and I'd probably run the final script on the server itself, using localhost in the URL to cut down on network delays.
With jobs you incur a lot of overhead, because each new job spawns a new PowerShell process.
Use runspaces instead!
$maxConcurrentJobs = 10
$content = Get-Content -Path "C:\Temp\urls.txt"
# Create a runspace pool where $maxConcurrentJobs is the
# maximum number of runspaces allowed to run concurrently
$Runspace = [runspacefactory]::CreateRunspacePool(1,$maxConcurrentJobs)
# Open the runspace pool (very important)
$Runspace.Open()
foreach ($url in $content) {
    # Create a new PowerShell instance and tell it to execute in our runspace pool
    $ps = [powershell]::Create()
    $ps.RunspacePool = $Runspace
    # Attach some code to it
    [void]$ps.AddCommand("Invoke-WebRequest").AddParameter("UseBasicParsing",$true).AddParameter("Uri",$url)
    # Begin execution asynchronously (returns immediately)
    [void]$ps.BeginInvoke()
    # Give feedback on how far we are
    Write-Host ("Initiated request for {0}" -f $url)
}
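The runspace loop above returns as soon as every request has been queued: it never waits for the requests to finish, never reads their output, and never closes the pool, so if the script runs in a process that exits straight afterwards, requests still waiting in the pool may never be sent. Below is a minimal sketch of the same technique that also waits and cleans up. `Invoke-Throttled` is a hypothetical helper name, not a built-in cmdlet, and the sketch assumes PowerShell 3.0 or later (which `Invoke-WebRequest` requires anyway).

```powershell
# Hypothetical helper (not a built-in cmdlet): runs $ScriptBlock once per input
# item on a runspace pool that allows at most $MaxConcurrent concurrent runs,
# then waits for every run to finish and emits the output in input order.
function Invoke-Throttled {
    param(
        [Parameter(Mandatory = $true)] [object[]] $InputObject,
        [Parameter(Mandatory = $true)] [scriptblock] $ScriptBlock,
        [int] $MaxConcurrent = 10
    )

    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxConcurrent)
    $pool.Open()
    try {
        # Queue one PowerShell instance per item. BeginInvoke returns at once;
        # the pool itself holds back anything beyond $MaxConcurrent.
        $tasks = foreach ($item in $InputObject) {
            $ps = [powershell]::Create()
            $ps.RunspacePool = $pool
            [void]$ps.AddScript($ScriptBlock).AddArgument($item)
            [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
        }

        foreach ($task in $tasks) {
            try {
                # EndInvoke blocks until this run finishes and returns its output.
                $task.PowerShell.EndInvoke($task.Handle)
            }
            catch {
                Write-Warning ("A queued run failed: {0}" -f $_.Exception.Message)
            }
            finally {
                $task.PowerShell.Dispose()
            }
        }
    }
    finally {
        # Always release the pool, even if something above threw.
        $pool.Close()
        $pool.Dispose()
    }
}
```

Applied to the URL file from the question, it might be used like this:

```powershell
$content = Get-Content -Path "C:\Temp\urls.txt"
Invoke-Throttled -InputObject $content -MaxConcurrent 10 -ScriptBlock {
    param($url)
    # Keep only the status code; the application server does the real work.
    (Invoke-WebRequest -UseBasicParsing -Uri $url).StatusCode
}
```

Like the loop above, this creates every PowerShell instance up front, so with a few hundred thousand URLs you may prefer to feed it the file in batches.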
As noted in the linked ServerFault post, you can also use a more generic solution like Invoke-Parallel, which basically does the above.