I am using the Ruby Net::SFTP gem. I need to download a large number of small files, and before downloading I need to get a list of the files in the given directory.

To do that I am using

sftp.dir.entries('folder path').size

to get the file count, but running this operation on a directory with more than 10,000 files takes far too long (even hours). Is there a better way to do this?

I also tried ssh.exec!("ls -l"), but that is slow as well.

I am connecting to a Windows box running Windows Server 2008 R2.
To download a series of files with validations, I would do something like the following:
require 'net/sftp'

Net::SFTP.start(ftp_host, user, :password => password) do |sftp|
  sftp.dir.entries('/path/to/folder').each do |remote_file|
    if passes_validation?(remote_file)
      file_data = sftp.download!('/path/to/folder' + '/' + remote_file.name)
      # write each download to its own local path rather than
      # reusing a single path and overwriting it on every iteration
      local_file = File.open(File.join('/path/to/local', remote_file.name), 'wb')
      local_file.print file_data
      local_file.close
    end
  end
end
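As an aside on the file-writing step above: passing a block to File.open closes the handle automatically, even if an exception is raised mid-download, which saves the explicit close call. A quick local sketch with no SFTP connection needed (the sample path and data here are made up):

```ruby
require 'tmpdir'

dir  = Dir.mktmpdir                       # throwaway directory for the demo
path = File.join(dir, "sample.bin")
data = "\x00\x01binary data"              # stand-in for downloaded file_data

# block form closes the handle automatically, even on error
File.open(path, 'wb') { |f| f.print data }

round_trip = File.binread(path) == data   # verify the bytes came back intact
puts round_trip
# → true
```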
One thing to remember when using this approach is that there are differences between SFTP server protocol versions that affect how many attributes are accessible on remote_file; you can check which protocol version you are working with by calling sftp.protocol after opening the connection.
Alternatively, if your validation is based on the file extension, you could try pushing it into the query itself by calling .glob("/path/to/folder", "*.ext") instead of .entries, though I can't speak for how it performs speed-wise (documentation here). In theory it could speed up the query (less data to return), but since it involves more work up-front, I'm not certain it will help.
I'm running my script from a VirtualBox VM running Ubuntu 12 with 2 GB of RAM dedicated (the host is Windows 7), connecting to a server running Windows Server 2008 R2 SP1 with SolarWinds handling the SFTP portion; Ruby 1.9.3p392, Net-SFTP 2.1.2, and Net-SSH 2.6.8. With those specs, I average roughly 78 files a minute (though that is without validations).