PowerShell moving Files to Amazon S3

Robbo · Mar 10, 2014 · Viewed 12.1k times

I have the PowerShell script below that moves files to my Amazon S3 bucket. It works fine for a few small files, but when copying larger files the foreach loop keeps iterating and starts the next copy before the previous ones have finished, and it doesn't take long before I have hundreds of files all transferring at once.

What I want is to be able to limit the number of simultaneous file transfers to, say, 5 or 10.

foreach ($line in $csv) {

    #-------------------- Transfer files (put in a foreach loop here) --------------------
    $SourceFolder = $line.destination
    $sourceFile   = $line.name

    if (Test-Path -Path $SourceFolder) {
        Write-S3Object -BucketName $BucketName -Key $sourceFile -File $SourceFolder

        # Check for missing files
        $S3GetRequest = Get-S3Object -BucketName $BucketName -Key $sourceFile

        if ($S3GetRequest -eq $null) {
            Write-Error "ERROR: Amazon S3 get request failed. Script halted."
            $sourceFile + ",Transfer Error" | Out-File $log_loc -Append
        }
    }
    else {
        $SourceFolder + ",Missing File Error" | Out-File $log_loc -Append
    }
}

Answer

Anthony Neace · Mar 17, 2014

From the description, it sounds like your larger files are triggering multipart upload. From the Write-S3Object documentation:

If you are uploading large files, Write-S3Object cmdlet will use multipart upload to fulfill the request. If a multipart upload is interrupted, Write-S3Object cmdlet will attempt to abort the multipart upload.

Unfortunately, Write-S3Object doesn't really have a native way to handle your use case. However, the Multipart Upload Overview describes a behavior we may be able to leverage:

Multipart uploading is a three-step process: You initiate the upload, you upload the object parts, and after you have uploaded all the parts, you complete the multipart upload. Upon receiving the complete multipart upload request, Amazon S3 constructs the object from the uploaded parts, and you can then access the object just as you would any other object in your bucket.

This leads me to suspect that we can ping our objects with Get-S3Object to see if they exist yet. If not, we should wait on uploading more files until they do.
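
As a minimal sketch of that check (assuming the AWS Tools for PowerShell module is loaded and $BucketName is defined; the key name here is hypothetical), testing whether a single uploaded object is visible yet might look like this:

# Sketch: Get-S3Object returns nothing when no object matches the key,
# so an empty result means the upload hasn't completed yet.
$key = "example.txt"   # hypothetical key name
$found = @(Get-S3Object -BucketName $BucketName -Key $key)
if ($found.Count -eq 0) {
    Write-Host "$key is not visible in S3 yet; still uploading"
}
else {
    Write-Host "$key is available in the bucket"
}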

I've created a script below that will do this: it iterates through a collection of files and collects their names as they are uploaded. Once more than 5 uploads are in flight, the script checks whether they exist in the bucket yet and continues on if they do; otherwise, it keeps checking until they appear.

$BucketName = "myS3Bucket"
$s3Directory = "C:\users\$env:username\documents\s3test"
$concurrentLimit = 5
$inProgressFiles = @()

foreach ($i in Get-ChildItem $s3Directory) 
{ 
  # Write the file to S3 and add the filename to a collection.
  Write-S3Object -BucketName $BucketName -Key $i.Name -File $i.FullName 
  $inProgressFiles += $i.Name

  # Wait to continue iterating through files if there are too many concurrent uploads
  while($inProgressFiles.Count -gt $concurrentLimit) 
  {
    Write-Host "Before: "$($inProgressFiles.Count)

    # Reassign the array by excluding files that have completed the upload to S3.
    $inProgressFiles = @($inProgressFiles | Where-Object { @(Get-S3Object -BucketName $BucketName -Key $_).Count -eq 0 })

    Write-Host "After: "$($inProgressFiles.Count)

    Start-Sleep -s 1
  }

  Start-Sleep -s 1
}

You can modify this for your needs by changing the foreach loop to use your CSV content. I added the sleep statements so you can watch it run and see how it works; feel free to change or remove them.
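
For example, a rough adaptation of the same throttling pattern to the CSV-driven loop from the question might look like the following sketch (assuming $csv, $BucketName, and $log_loc are already defined as in the original script):

$concurrentLimit = 5
$inProgressFiles = @()

foreach ($line in $csv)
{
    $sourcePath = $line.destination   # full path to the local file, as in the question
    $sourceFile = $line.name          # key to use in S3

    if (Test-Path -Path $sourcePath)
    {
        Write-S3Object -BucketName $BucketName -Key $sourceFile -File $sourcePath
        $inProgressFiles += $sourceFile

        # Throttle: wait while too many uploads have not yet appeared in the bucket.
        while ($inProgressFiles.Count -gt $concurrentLimit)
        {
            $inProgressFiles = @($inProgressFiles |
                Where-Object { @(Get-S3Object -BucketName $BucketName -Key $_).Count -eq 0 })
            Start-Sleep -Seconds 1
        }
    }
    else
    {
        $sourcePath + ",Missing File Error" | Out-File $log_loc -Append
    }
}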