How can I find sitemap.xml file of websites?
e.g. Going to stackoverflow/sitemap.xml gets me a 404.
In stackoverflow/robots.txt is written the following:
"this technically isn't valid, since for some godforsaken reason sitemap paths must be ABSOLUTE and not relative. Sitemap: /sitemap.xml"
There is no standard, so there is no guarantee. With that said, its common for the sitemap to be self labeled and on the root, like this:
example.com/sitemap.xml
Case is sensitive on some servers, so keep that in mind. If its not there, look in the robots file on the root:
example.com/robots.txt
If you don't see it listed in the robots file head to Google and search this:
site:example.com filetype:xml
This will limit the results to XML files on your target domain. At this point its trial-and-error and based on the specifics of the website you are working with. If you get several pages of results from the Google search phrase above then try to limit the results further:
filetype:xml site:example.com inurl:sitemap
or
filetype:xml site:example.com inurl:products
If you still can't find it you can right-click > "View Source"
and do a search (aka: "control find" or Ctrl + F
) for .xml
to see if there is a reference to it in the code.