I'm trying to automate the login to the UK's data archive service. That website is obviously trustworthy. Unfortunately, both RCurl and httr break at SSL verification, while my web browser doesn't give any sort of warning. I can work around the issue by using ssl.verifypeer = FALSE in RCurl, but I'd like to understand what's going on.
# breaks
library(httr)
GET( "https://www.esds.ac.uk/secure/UKDSRegister_start.asp" )
# breaks
library(RCurl)
cert <- system.file("CurlSSL/cacert.pem", package = "RCurl")
getURL("https://www.esds.ac.uk/secure/UKDSRegister_start.asp",cainfo = cert)
# works
library(RCurl)
getURL(
"https://www.esds.ac.uk/secure/UKDSRegister_start.asp" ,
.opts = list(ssl.verifypeer = FALSE)
) # note: use list(ssl.verifypeer = FALSE,followlocation=TRUE) to see content
Get the TERENA SSL CA PEM file from TERENA's repository of trusted certificates and pass that file as the cainfo parameter.
EDIT: You might need to add two lines to the beginning of that file (the name and the ====== underline that open the listing below). The code works for me using the following TERENA.pem file:
TERENA
======
-----BEGIN CERTIFICATE-----
MIIEmDCCA4CgAwIBAgIQS8gUAy8H+mqk8Nop32F5ujANBgkqhkiG9w0BAQUFADCB
lzELMAkGA1UEBhMCVVMxCzAJBgNVBAgTAlVUMRcwFQYDVQQHEw5TYWx0IExha2Ug
Q2l0eTEeMBwGA1UEChMVVGhlIFVTRVJUUlVTVCBOZXR3b3JrMSEwHwYDVQQLExho
dHRwOi8vd3d3LnVzZXJ0cnVzdC5jb20xHzAdBgNVBAMTFlVUTi1VU0VSRmlyc3Qt
SGFyZHdhcmUwHhcNMDkwNTE4MDAwMDAwWhcNMjAwNTMwMTA0ODM4WjA2MQswCQYD
VQQGEwJOTDEPMA0GA1UEChMGVEVSRU5BMRYwFAYDVQQDEw1URVJFTkEgU1NMIENB
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAw+NIxC9cwcupmf0booNd
ij2tOtDipEMfTQ7+NSUwpWkbxOjlwY9UfuFqoppcXN49/ALOlrhfj4NbzGBAkPjk
tjolnF8UUeyx56+eUKExVccCvaxSin81joL6hK0V/qJ/gxA6VVOULAEWdJRUYyij
8lspPZSIgCDiFFkhGbSkmOFg5vLrooCDQ+CtaPN5GYtoQ1E/iptBhQw1jF218bbl
p8ODtWsjb9Sl61DllPFKX+4nSxQSFSRMDc9ijbcAIa06Mg9YC18em9HfnY6pGTVQ
L0GprTvG4EWyUzl/Ib8iGodcNK5Sbwd9ogtOnyt5pn0T3fV/g3wvWl13eHiRoBS/
fQIDAQABo4IBPjCCATowHwYDVR0jBBgwFoAUoXJfJhsomEOVXQc31YWWnUvSw0Uw
HQYDVR0OBBYEFAy9k2gM896ro0lrKzdXR+qQ47ntMA4GA1UdDwEB/wQEAwIBBjAS
BgNVHRMBAf8ECDAGAQH/AgEAMBgGA1UdIAQRMA8wDQYLKwYBBAGyMQECAh0wRAYD
VR0fBD0wOzA5oDegNYYzaHR0cDovL2NybC51c2VydHJ1c3QuY29tL1VUTi1VU0VS
Rmlyc3QtSGFyZHdhcmUuY3JsMHQGCCsGAQUFBwEBBGgwZjA9BggrBgEFBQcwAoYx
aHR0cDovL2NydC51c2VydHJ1c3QuY29tL1VUTkFkZFRydXN0U2VydmVyX0NBLmNy
dDAlBggrBgEFBQcwAYYZaHR0cDovL29jc3AudXNlcnRydXN0LmNvbTANBgkqhkiG
9w0BAQUFAAOCAQEATiPuSJz2hYtxxApuc5NywDqOgIrZs8qy1AGcKM/yXA4hRJML
thoh45gBlA5nSYEevj0NTmDa76AxTpXv8916WoIgQ7ahY0OzUGlDYktWYrA0irkT
Q1mT7BR5iPNIk+idyfqHcgxrVqDDFY1opYcfcS3mWm08aXFABFXcoEOUIEU4eNe9
itg5xt8Jt1qaqQO4KBB4zb8BG1oRPjj02Bs0ec8z0gH9rJjNbUcRkEy7uVvYcOfV
r7bMxIbmdcCeKbYrDyqlaQIN4+mitF3A884saoU4dmHGSYKrUbOCprlBmCiY+2v+
ihb/MX5UR6g83EMmqZsFt57ANEORMNQywxFa4Q==
-----END CERTIFICATE-----
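With that file in place, the cainfo suggestion could look roughly like the sketch below. It assumes the certificate was saved as TERENA.pem in the working directory; fetch_with_ca() is a hypothetical helper of mine, not part of RCurl or httr. The httr branch assumes the same cainfo option can be passed through config().

```r
# Sketch only: "TERENA.pem" is an assumed file name for the certificate above,
# saved in the working directory. fetch_with_ca() is a hypothetical helper.
fetch_with_ca <- function(url, ca_file = "TERENA.pem", use_httr = FALSE) {
  # Fail early with a readable message instead of a libcurl error.
  if (!file.exists(ca_file)) {
    stop("CA file not found: ", ca_file)
  }
  if (use_httr) {
    # httr forwards curl options that are given through config().
    httr::GET(url, httr::config(cainfo = ca_file))
  } else {
    # cainfo names the CA file; peer verification stays switched on.
    RCurl::getURL(url, cainfo = ca_file, ssl.verifypeer = TRUE,
                  followlocation = TRUE)
  }
}

url <- "https://www.esds.ac.uk/secure/UKDSRegister_start.asp"
# Only attempt the request when the assumed certificate file is present.
if (file.exists("TERENA.pem")) {
  page <- fetch_with_ca(url)
}
```

The point of this form is that ssl.verifypeer stays TRUE, so the connection is still verified, just against a file that contains the missing issuer.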
The GET method of httr uses RCurl::curlPerform internally, as does RCurl::getURL, so the observed behavior is not surprising. The curl command-line tool with the "verbose" switch -v gives the following additional hints:
$ curl -v "https://www.esds.ac.uk/secure/UKDSRegister_start.asp"
* About to connect() to www.esds.ac.uk port 443 (#0)
* Trying 155.245.69.4...
* Connected to www.esds.ac.uk (155.245.69.4) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS alert, Server hello (2):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
Item 3 of the page linked in the error message above gives instructions for obtaining the server's certificate:
$ openssl s_client -connect "www.esds.ac.uk:443"
CONNECTED(00000003)
depth=0 C = GB, ST = Essex, L = Colchester, O = University of Essex, OU = UK Data Archive, CN = www.esds.ac.uk
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 C = GB, ST = Essex, L = Colchester, O = University of Essex, OU = UK Data Archive, CN = www.esds.ac.uk
verify error:num=27:certificate not trusted
verify return:1
depth=0 C = GB, ST = Essex, L = Colchester, O = University of Essex, OU = UK Data Archive, CN = www.esds.ac.uk
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
0 s:/C=GB/ST=Essex/L=Colchester/O=University of Essex/OU=UK Data Archive/CN=www.esds.ac.uk
i:/C=NL/O=TERENA/CN=TERENA SSL CA
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIEIzCCAwugAwIBAgIQO9FPWbAYKDAuFHq61U3gDDANBgkqhkiG9w0BAQUFADA2
MQswCQYDVQQGEwJOTDEPMA0GA1UEChMGVEVSRU5BMRYwFAYDVQQDEw1URVJFTkEg
U1NMIENBMB4XDTEwMTIwNjAwMDAwMFoXDTEzMTIwNTIzNTk1OVowgYMxCzAJBgNV
......
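To me, this reads as if the certificate is not trusted: the chain contains only the server's own certificate, and its issuer, TERENA SSL CA, cannot be found locally. A quick search for "terena ssl root certificate" turned up a page at the University of Helsinki, which reads: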
Unfortunately root certificates of these authorities are not always present in devices in use, instead, we need to install those root certificates ourselves.
This site also contains a link to the certificate repository.
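If other HTTPS sites should keep verifying in the same session, one option might be to append the extra certificate to RCurl's default bundle instead of replacing it. This is a rough sketch under my own assumptions: combine_ca_bundles() is a hypothetical helper, and TERENA.pem is an assumed file name for the certificate downloaded from that repository.

```r
# Sketch: concatenate several PEM files into one bundle usable as cainfo.
# combine_ca_bundles() is a hypothetical helper, not an RCurl function.
combine_ca_bundles <- function(files, out = tempfile(fileext = ".pem")) {
  absent <- files[!file.exists(files)]
  if (length(absent) > 0) {
    stop("Missing PEM file(s): ", paste(absent, collapse = ", "))
  }
  # PEM bundles are plain text, so the files are simply written one after another.
  pem <- unlist(lapply(files, readLines, warn = FALSE))
  writeLines(pem, out)
  out
}

# Bundle shipped with RCurl (same path as in the question); "" if RCurl is absent.
default_bundle <- system.file("CurlSSL/cacert.pem", package = "RCurl")
if (nzchar(default_bundle) && file.exists("TERENA.pem")) {
  bundle <- combine_ca_bundles(c(default_bundle, "TERENA.pem"))
  page <- RCurl::getURL(
    "https://www.esds.ac.uk/secure/UKDSRegister_start.asp",
    cainfo = bundle
  )
}
```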