I want to understand how my local IP address (localhost) can be exposed to Internet. For that I've read [here] a method of port forwarding using SSH. Which basically does routing from publicly available server to our localhost using SSH.
But I wonder how the service like 'LocalTunnel' works? In above article it's written as following:
There services (localtunnel for example) creates a tunnel from their server back to port 3000 on your dev box. It functions pretty much exactly like an SSH tunnel, but doesn’t require that you have a server.
I've tried reading code from it's github repository and what I understood is:
These services have a server which is publicly available, which generates different URLs, and when we hit that URL, It forward the request to localhost corresponding to that URL.
So basically it's works like a proxy server.
Is these understanding correct? If yes then what I don't understand is that how these server has access to some localhost running on my computer? How it perform request to it? What I'm missing here? Here is the code which I referred.
The Remote (i.e. the localtunnel software on your computer) initializes the connection to the Relay (i.e. localtunnel.me) acts as a multiplexing proxy and when Clients (i.e. web browsers) connect, the relay multiplexes the connections between remotes and clients by appending special headers with network information.
Browser <--\ /--> Device
Browser <---- M-PROXY Service ----> Device
Browser <--/ \--> Device
If you prefer a video preso, I just gave a talk on this at UtahJS Conf 2018, in which I talk a little about all of the other potential solutions as well: SSH Socksv5 proxies (which you mentioned), VPN, UPnP, DHT, Relays, etc:
Access Ability: Access your Devices, Share your Stuff
Slides: http://telebit.cloud/utahjs2018
I'm the author of Telebit, which provides service with similar features to what ngrok, localtunnel, and libp2p provide (as well as open source code for both the remote/client and relay/server to run it yourself).
Although I don't know the exact internals of how localtunnel is implemented, I can give you an explanation of how it's generally done (and how we do it), and it's most likely nearly identical to how they do it.
The magic that you're curious about happens in two places: the remote socket and the multiplexer.
How does a remote client access the server on my localhost?
This is pretty simple. When you run a "remote" (telebit, ngrok, and localtunnel all work nearly the same in this regard), it's actually your computer that initiates the request.
So imagine that the relay (the localtunnel proxy in your case) uses port 7777 to receive traffic from "remotes" (like your computer) and your socket number (randomly chosen source address on your computer) is 1234:
Devices: [Your Computer] (tcp 1234:7777) [Proxy Server]
Software: [Remote] -----------------------> [Relay]
(auth & request 5678)
However, the clients (such as browsers, netcat, or other user agents) that connect to you actually also initiate requests with the relay.
Devices: [Proxy Service] (tcp 5678) [Client Computer]
Software: [Relay] <------------------------ [netcat]
If you're using tcp ports, then the relay service keeps an internal mapping, much like NAT
Internal Relay "Routing Table"
Rule:
Socket remote[src:1234] --- Captures ------> ALL_INCOMING[dst:5678]
Condition:
Incoming client[dst:5678] --- MATCHES -------> ALL_INCOMING[dst:5678]
Therefore:
Incoming client[dst:5678] --- Forwards To ---> remote[src:1234]
Both connections are "incoming" connections, but the remote connection on the "south end" is authorized to receive traffic coming from another incoming source (without some form of authorized session anyone could claim use of that port or address and hijack your traffic).
[Client Computer] (tcp 5678) [Proxy Service] (tcp 1234) [Your Computer]
[netcat] --------------> <--[Relay]--> <------------ [remote]
You may have noticed that there's a critical flaw in the description above. If you just route the traffic as-is, your computer (the remote) could only handle one connection at a time. If another client (browser, netcat, etc) hopped on the connection, your computer wouldn't be able to tell which packets came from where.
Browser <--\ /--> Device
Browser <---- M-PROXY Service ----> Device
Browser <--/ \--> Device
Essentially what the relay (i.e. localtunnel proxy) and the remote (i.e. your computer) do is place a header in front of all data that is to be received by the client. It needs to be something very similar to HAProxy's The PROXY Protocol, but works for non-local traffic as well. It could look like this:
<src-address>,<src-port>,<sni>,<dst-port>,<protocol-guess>,<datalen>
For example
172.2.3.4,1234,example.com,443,https,1024
That info could be sent exactly before or append to each data packet that's routed.
Likewise, when the remote responds to the relay, it has to include that information so that the relay knows which client the data packet is responding to.
See https://www.npmjs.com/package/proxy-packer for long details
The initial explanation I gave using tcp ports, because it's easy to understand. However, localtunnel, ngrok, and telebit all have the ability to use tls servername indicator (SNI) instead of relying on port numbers.
[Client Computer] (https 443) [Proxy Service] (wss 443) [Your Computer]
[netcat+openssl] --------------------> <--[Relay]--> <------------ [remote]
(or web browser) (sni:xyz.somerelay.com) (sni:somerelay.com)
There are still a few different ways you can go about this (and this is where I want to give a shameless plug for telebit because if you're into decentralization and peer-to-peer, this is where we shine!)
If you only use the tls sni for routing (which is how localtunnel and ngrok both work by default last time I checked) all of the traffic is decrypted at the relay.
Anther way requires ACME/Let's Encrypt integration (i.e. Greenlock.js) so that the traffic remains encrypted, end-to-end, routing the tls traffic to the client without decrypting it. This method functions as peer-to-peer channel for all practical purposes (the relay acts as just another opaque "switch" on the network of the Internet, unaware of the contents of the traffic).
Furthermore, if https is used both for remotes (for example, via Secure WebSockets) and the clients, then the clients look just like any other type of https request and won't be hindered by poorly configured firewalls or other harsh / unfavorable network conditions.
Now, solutions built on libp2p also give you a virtualized peer connection, but it's far more indirect and requires routing through untrusted parties. I'm not a fan of that because it's typically slower and I see it as more risky. I'm a big believer than network federation will win out over anonymization (like libp2p) in the long. (for our use case we needed something that could be federated - run by independently trusted parties- which is why we built our solution the way that we did)