Wget Mirror HTML only

B.Mr.W. · Aug 29, 2013 · Viewed 11.2k times

I have a small website that I am trying to mirror to my local machine, keeping only the HTML files: no images, attachments, PDFs, etc.

I have never mirrored a website before, so I think it is a good idea to ask before doing anything catastrophic.

This is the command that I want to run, and I am wondering whether anything else should be added.

wget --mirror <url> 

Thanks!

Answer

user2062950 · Aug 29, 2013

The -R (--reject) and -A (--accept) options each take a comma-separated list of file name suffixes or patterns to reject or accept.

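Also consider the bandwidth and server load involved in downloading a whole website. You may want to add the --random-wait option as well, which varies the delay between requests; it works relative to the base delay set with --wait.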
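
As a rough sketch of what throttling could look like, the wrapper below combines --wait, --random-wait and --limit-rate. The function name polite_mirror, the one-second delay and the 200k rate cap are all illustrative assumptions, not values from the question; tune them to the site.

```shell
# Hypothetical wrapper; the name and the numeric values are assumptions.
polite_mirror() {
    # --wait=1        base delay of one second between requests
    # --random-wait   vary that delay between 0.5x and 1.5x of the base
    # --limit-rate    cap the download speed (k = kilobytes per second)
    wget --mirror --wait=1 --random-wait --limit-rate=200k "$1"
}

# Usage: polite_mirror <url>
```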

If you want to skip images and PDFs, your command will look something like this (extend the list with other extensions, such as png, as needed):

wget --mirror --random-wait -R gif,jpg,pdf <url>
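
Since the goal is HTML only, an accept list may be simpler than listing everything to reject. The following is a minimal sketch of that alternative, not a tested recipe for any particular site; the function name html_only_mirror and the one-second delay are assumptions. Pages whose URLs end in something other than the listed suffixes (for example .php) would not match this list, so check how the site names its pages first.

```shell
# Hypothetical helper; the name html_only_mirror is illustrative.
html_only_mirror() {
    # --accept keeps only files whose names end in the listed suffixes.
    # wget still fetches HTML pages while recursing so it can find links.
    wget --mirror --wait=1 --random-wait --accept html,htm "$1"
}

# Usage: html_only_mirror <url>
```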

Note: mirroring a website may go against the site's policy, so I suggest you check first.
