I have a Python web-crawling script that takes hours to complete and is infeasible to run in its entirety on my local machine. Is there a convenient way to deploy it to a simple web server? The script basically downloads webpages into text files. How would this best be accomplished? Thanks!
Since you said that performance is a problem and you are doing web scraping, the first thing to try is the Scrapy framework - it is a very fast and easy-to-use web-scraping framework. The scrapyd tool would let you distribute the crawling: you can run multiple scrapyd services on different servers and split the load between them.
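For reference, here is a minimal sketch of what your "download webpages into text files" logic might look like as a Scrapy spider. The spider name, start URLs, and filename scheme are placeholders, and it assumes you just want to save the raw page body:

```python
import scrapy


class PageDumpSpider(scrapy.Spider):
    # hypothetical spider/project names - adjust to your own setup
    name = "pagedump"
    start_urls = ["https://example.com/"]  # replace with the URLs your script crawls

    def parse(self, response):
        # derive a simple filename from the URL; use whatever naming scheme your script already has
        filename = response.url.replace("://", "_").replace("/", "_") + ".txt"
        with open(filename, "wb") as f:
            f.write(response.body)
        self.log(f"Saved {filename}")
```

Once this lives inside a Scrapy project, you can push it to a scrapyd server with scrapyd-deploy (from the scrapyd-client package) and trigger runs over HTTP, e.g. `curl http://your-server:6800/schedule.json -d project=yourproject -d spider=pagedump` (server and project names here are placeholders).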
There is also a Scrapy Cloud service out there:
Scrapy Cloud bridges the highly efficient Scrapy development environment with a robust, fully-featured production environment to deploy and run your crawls. It's like a Heroku for Scrapy, although other technologies will be supported in the near future. It runs on top of the Scrapinghub platform, which means your project can scale on demand, as needed.