How does one go about implementing an HTTP proxy compared to implementing an HTTP web server? What are the differences? Is there a definitive guide, an RFC, or a helpful book on this subject?
The request a client sends to a proxy differs from the one it sends directly to a web server.
For example, here is what is sent by Google Chrome to www.baidu.com via a proxy server:
GET http://www.baidu.com/ HTTP/1.1
Host: www.baidu.com
Proxy-Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8
Note that the request line is
GET http://www.baidu.com/ HTTP/1.1
instead of
GET / HTTP/1.1
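The request also carries the non-standard header
Proxy-Connection: keep-alive
as well as
Host: www.baidu.com
The Host header is required in every HTTP/1.1 request, including requests sent to an HTTP proxy.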
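To make the difference concrete, here is a minimal Python sketch of what a forwarding proxy might do with such a request line: pull the host and port out of the absolute URL and rewrite the line into the form a web server expects. The function name to_origin_form is hypothetical, and the sketch assumes plain http:// targets only.

```python
from typing import Tuple
from urllib.parse import urlsplit


def to_origin_form(request_line: str) -> Tuple[str, int, str]:
    """Turn 'GET http://host/path HTTP/1.1' into (host, port, 'GET /path HTTP/1.1').

    Hypothetical helper for illustration; handles only http:// targets.
    """
    method, target, version = request_line.split(" ", 2)
    url = urlsplit(target)
    if url.scheme != "http" or not url.hostname:
        raise ValueError("expected an absolute http:// URL in the request line")
    # An empty path in the absolute URL maps to "/" in origin-form.
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    # Port 80 is the default when the URL does not name one.
    return url.hostname, url.port or 80, f"{method} {path} {version}"
```

The proxy would then open a connection to the returned host and port and send the rewritten request line followed by the headers. A real proxy would normally also drop Proxy-Connection before forwarding.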
For an HTTPS tunnel through the proxy:
CONNECT comet.zhihu.com:443 HTTP/1.1
Host: comet.zhihu.com:443
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Here the request line is
CONNECT comet.zhihu.com:443 HTTP/1.1
The target is written as domain:443 instead of https://domain.
The CONNECT method turns the proxy server into something like a TCP tunnel, so the https scheme no longer appears in the request; it is implied by the port, :443.
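A rough Python sketch of the proxy side of CONNECT, under the assumption that the proxy simply parses host:port, replies with a 2xx status, and then copies bytes in both directions without looking at them. The names parse_connect, TUNNEL_OK and pipe are hypothetical, and IPv6 literals in brackets are not handled.

```python
import socket
from typing import Tuple

# Any 2xx response tells the client that the tunnel is open.
TUNNEL_OK = b"HTTP/1.1 200 Connection Established\r\n\r\n"


def parse_connect(request_line: str) -> Tuple[str, int]:
    """Extract (host, port) from 'CONNECT host:port HTTP/1.1'."""
    method, target, _version = request_line.split(" ", 2)
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    return host, int(port)


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF (one direction of the tunnel)."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    # Signal EOF to the other side once the source has finished sending.
    dst.shutdown(socket.SHUT_WR)
```

After sending TUNNEL_OK to the client, the proxy would run pipe once per direction (for example in two threads); the TLS handshake then passes through it as opaque bytes.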
With a SOCKS5 proxy things are simpler: SOCKS5 knows nothing about the higher-level protocol, so you just tell it the host and port.
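As an illustration of "just tell it the host and port", here is a small Python sketch that builds the SOCKS5 messages a client would send, following the byte layout in RFC 1928. The names SOCKS5_GREETING and socks5_connect_request are hypothetical, and the sketch assumes no authentication and a domain-name target.

```python
import struct

# Version 5, one auth method offered, method 0x00 = "no authentication".
SOCKS5_GREETING = b"\x05\x01\x00"


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request for a domain-name target."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1 to 255 bytes long")
    if not 0 < port <= 65535:
        raise ValueError("port out of range")
    return (
        b"\x05"              # protocol version
        b"\x01"              # command: CONNECT
        b"\x00"              # reserved
        b"\x03"              # address type: domain name
        + bytes([len(name)]) # one-byte length prefix
        + name
        + struct.pack("!H", port)  # port in network byte order
    )
```

The client sends SOCKS5_GREETING, reads the server's method choice, sends the request above, and once the server replies with success it uses the same socket as if it were connected directly to the target.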