We're using a curl HEAD request in a PHP application to verify the validity of generic links. We check the status code just to make sure that the link the user has entered is valid. Links to all websites have succeeded, except LinkedIn.
While it seems to work locally (Mac), when we attempt the request from any of our Ubuntu servers, LinkedIn returns a 999 status code. Not an API request, just a simple curl like we do for every other link. We've tried on a few different machines and tried altering the user agent, but no dice. How do I modify our curl so that working links return a 200?
A sample HEAD request:
curl -I --url https://www.linkedin.com/company/linkedin
Sample Response on Ubuntu machine:
HTTP/1.1 999 Request denied
Date: Tue, 18 Nov 2014 23:20:48 GMT
Server: ATS
X-Li-Pop: prod-lva1
Content-Length: 956
Content-Type: text/html
To respond to @alexandru-guzinschi a little better. We've tried masking the User Agents. To sum up our trials:
So now I'm thinking they block any curl requests that dont provide an alternate UA and also block hosting providers?
Is there any other way I can check if a link to linkedin is valid or if it will lead to their 404 page, from an Ubuntu machine using PHP?
It looks like they filter requests based on the user-agent:
$ curl -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 999 Request denied
$ curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" -I --url https://www.linkedin.com/company/linkedin | grep HTTP
HTTP/1.1 200 OK