Many of the sites we visit daily are protected by HTTPS. Gmail, Twitter, Facebook and Google are just a few examples. They tell us it’s for our security, but how does that protocol protect us? What are the advantages over normal HTTP? And, most importantly: how does it work?
We will try to answer all these questions in this article, starting with the most important: Why do we need HTTPS?
I imagine you all know what HTTP is: the protocol that browsers use to communicate with web servers and that lets you view web pages like this one. It works fine, but it has a problem: security.
With HTTP, all data is transmitted in plain text, unencrypted. That is, anyone connected to your WiFi network, or with access to the traffic between your computer and the server (your ISP, for example), can see everything you send and receive. For example, they could see the passwords you use to log in to your email, or carry out additional operations while you are using your bank’s website.
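To make the problem concrete, here is a minimal sketch of the bytes a browser might send when submitting a login form over plain HTTP. The host, the `/login` path, the field names and the credentials are all made up for illustration; the point is only that the password travels as readable text.

```python
from urllib.parse import urlencode

def build_login_request(host: str, user: str, password: str) -> bytes:
    """Return the raw bytes of a simple (hypothetical) login form submission."""
    # Form fields are URL-encoded and placed in the request body.
    body = urlencode({"user": user, "password": password}).encode("ascii")
    headers = (
        "POST /login HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body

# Hypothetical values: anyone who can read the traffic sees exactly this text.
raw_request = build_login_request("example.com", "alice", "hunter2")
print(raw_request.decode("ascii"))
```

Nothing in these bytes is hidden: whoever captures the packet can read the user name and the password directly.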
What is HTTPS?
You have to encrypt that data to prevent anyone else from seeing it. That’s what the HTTPS (Hypertext Transfer Protocol Secure) protocol is for. HTTPS itself is nothing more than normal HTTP over SSL / TLS.
SSL / TLS (Secure Sockets Layer / Transport Layer Security) are two protocols for sending encrypted packets over the Internet, the latter being the more modern of the two. They work the same way for HTTP as for any other communication protocol, although in this article we will only look at their application in HTTPS.
How is a secure connection established?
As you well know, to encrypt data you need a key. This key has to be known by both the browser and the server so that they can communicate. The first problem appears quickly: how do we share a key securely (without anyone else learning it)? It is clear that the key has to be unique for each connection, so it cannot be preconfigured on computers.
This is where one of the wonders of modern cryptography comes into play: a public key / private key system, the asymmetric encryption that has already been explained in Genbeta Dev.
The pre-key is generated in the browser, and shared with the server using asymmetric cryptography
The public and private keys are a pair of numbers related in a special way, such that a message encrypted with one key can only be decrypted with the other key of the pair. For example, if I want to send a message to a server, I encrypt it with the server’s public key so that it can only be decrypted with the server’s private key.
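The relationship between the two keys can be illustrated with a toy version of RSA, one well-known asymmetric scheme. This is only a sketch: the primes are absurdly small, there is no padding, and the function names are invented for the example, so it must never be used to protect real data.

```python
# Toy RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)      # used only to derive the private exponent
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)          # can be handed to anyone
private_key = (d, n)         # never leaves the server

def encrypt(message: int, key: tuple) -> int:
    """Encrypt a number smaller than the modulus with the given key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, key: tuple) -> int:
    """Decrypt with the other key of the pair."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(message, ciphertext, recovered)
```

Anyone can run `encrypt` because the public key is public, but only the holder of `private_key` can get the original number back.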
And this is precisely the first essential step of an HTTPS connection. After the browser and the server have agreed on the technical details (version of the protocol, asymmetric and symmetric encryption algorithms to be used …), the browser generates a pre-key on the spot and encrypts it with the public key of the server to which we want to connect. The result is sent to the server, which decrypts the pre-key with its private key.
Both the server and the browser then apply the same algorithm to the pre-key, so they both obtain the same encryption key. In this way we have overcome the first (and biggest) problem we had: exchanging the key. Thereafter, the data is simply encrypted and decrypted with that key.
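The derivation step can be sketched as follows. This is not the real TLS algorithm: HMAC-SHA256 stands in for the protocol’s own derivation function, and `derive_session_key`, the label and the two random values are assumptions made for the example (real handshakes do mix in random values exchanged by both sides, but in a more elaborate way).

```python
import hashlib
import hmac
import os

def derive_session_key(pre_key: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Derive a 32-byte symmetric key from the shared pre-key (simplified stand-in)."""
    material = b"session key" + client_random + server_random
    return hmac.new(pre_key, material, hashlib.sha256).digest()

# Values exchanged in the clear while agreeing on the technical details.
client_random = os.urandom(32)
server_random = os.urandom(32)

# Generated by the browser and sent to the server encrypted with its public key.
pre_key = os.urandom(48)

# Each side runs the same function on the same inputs, independently.
browser_key = derive_session_key(pre_key, client_random, server_random)
server_key = derive_session_key(pre_key, client_random, server_random)
print(browser_key == server_key)
```

An eavesdropper sees both random values but not the pre-key, so it cannot repeat the computation.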
Since no one else knows that key, our communications will be secure and no one will be able to see them (as long as the symmetric encryption algorithm is secure, of course).
Some will wonder why we exchange a key instead of encrypting the data directly with asymmetric encryption. The main reason is that symmetric encryption is much faster than asymmetric encryption, and once the key has been exchanged securely, sharing it is no longer a problem.
What data does HTTPS protect?
The obvious question is what data is protected when we use an HTTPS connection. Since HTTPS is based on SSL / TLS, which works in a lower layer of communication, all HTTP data is encrypted. That is, not only the web page but also the path and query of the URL, the parameters sent, the cookies … The only thing that remains exposed is what the TCP packets need in order to be delivered: the address of the server and the port to which we connect (in practice the server’s hostname is usually visible too, because it is sent unencrypted while the connection is being set up).
HTTPS only reveals the server and port to which we connect.
Therefore, HTTPS not only prevents someone from seeing the web pages that we are visiting. It also prevents them from knowing the URLs we visit, the parameters that we send to the server (for example, usernames and passwords are normally sent as POST parameters) or the cookies that we send and receive (someone with access to these cookies could steal our session, as demonstrated by the Firesheep tool).
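As a rough illustration of that split, the sketch below separates a URL into the parts an observer on the network can still see and the parts that travel inside the encrypted connection. The function name and the example URL are hypothetical, and the hostname stands in here for "the server".

```python
from urllib.parse import urlsplit

def split_visibility(url: str):
    """Return (visible, encrypted) parts of an HTTPS URL, in simplified form."""
    parts = urlsplit(url)
    # Use the default port when the URL does not name one explicitly.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    visible = {"host": parts.hostname, "port": port}
    encrypted = {"path": parts.path, "query": parts.query}
    return visible, encrypted

visible, encrypted = split_visibility("https://example.com/login?user=alice")
print("visible:", visible)
print("encrypted:", encrypted)
```

Cookies and POST parameters do not appear in the URL at all; they are sent in the HTTP headers and body, which are encrypted as well.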
How do I know that I am not communicating with an imposter?
There is an additional problem in Internet communication. Let’s say I want to navigate to Genbeta.com. How do I know that I am communicating with the Genbeta server and not with a fake server that is masquerading as Genbeta (which is known as a man-in-the-middle attack)?
Therefore, servers must be authenticated. There is no use having the data encrypted if we do not make sure that we are connecting to the correct server. That’s what SSL certificates are for: they contain the server’s public key and the domain names for which they can be used.
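The domain-name check can be sketched like this. It is a simplified version of what browsers do, assuming the usual convention that a `*.` wildcard covers exactly one leftmost label; `certificate_covers` and the example names are made up for illustration.

```python
def certificate_covers(hostname: str, certificate_names: list) -> bool:
    """Check whether any name listed in a certificate matches the hostname."""
    hostname = hostname.lower()
    for name in certificate_names:
        name = name.lower()
        if name == hostname:
            return True
        if name.startswith("*."):
            # A wildcard stands for exactly one label: the leftmost one.
            first_label, _, rest = hostname.partition(".")
            if first_label and rest == name[2:]:
                return True
    return False

names = ["example.com", "*.example.com"]   # hypothetical certificate contents
print(certificate_covers("www.example.com", names))
print(certificate_covers("www.example.org", names))
```

If the name we asked for is not covered by the certificate the server presents, the connection should not be trusted, no matter how well it is encrypted.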
CAs (trusted third parties) assure us that the certificate is valid and that the server is who it claims to be.
The SSL certificate alone is useless. Any attacker could fake one and you wouldn’t realize it. This is where the CAs, or Certificate Authorities, come into play: they are the authorities that issue signed SSL certificates, and they only give a certificate for a domain to its owner. I could not, for example, go to one of them and ask for a certificate for google.com. In addition, the signature ensures that the content of the certificate has not been changed. I am not going to expand on how this process works; we already saw it in a previous article.
In short, the signature of a CA assures us that the server is who it claims to be. Of course, you have to verify that signature to make sure it is real. To do this, the browser looks for the CA’s certificate in its store (actually it follows a chain of trust, which is somewhat more complex) and verifies the signature. If the certificate is not valid, or the CA’s certificate cannot be found, the browser will show you a warning that it cannot authenticate the connection to the server.
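Python’s standard `ssl` module performs the same kind of verification, so it can serve as a small illustration. The sketch below assumes the system’s store of trusted CA certificates is available; `fetch_verified_certificate` is a helper written for this example, not part of the library.

```python
import socket
import ssl

# Loads the system's trusted CA certificates and turns on both checks:
# the certificate chain must verify and the hostname must match.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)

def fetch_verified_certificate(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the server certificate if it verifies."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # The handshake happens here; a bad certificate raises an exception.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

Calling `fetch_verified_certificate("example.com")` needs network access. If the certificate is not signed by a CA in the store, or does not cover the requested name, the handshake fails with `ssl.SSLCertVerificationError`, which plays the role of the browser warning.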
Precisely the latter happened a while ago with Firefox (and other browsers, I think), when you were trying to connect to secure servers of the Spanish administration. The certificates of these servers were signed by the FNMT, but Firefox did not have the FNMT certificates in its store. For this reason, it could not ensure that the certificate was valid and warned the user.
What weaknesses does HTTPS have?
The weakest point of HTTPS is the certificates. If, for example, someone steals the certificate (with the private key) from Google, they could create a fake server with no way to distinguish it from a legitimate one.
A bigger problem can occur if a CA’s private key is leaked. An attacker could create and sign valid certificates for any domain without being prevented from doing so, thereby tricking users into connecting to rogue servers.
Fortunately, all browsers have a mechanism to revoke certificates. When a certificate is compromised, its identifier is published on a revocation list. The browser downloads that list and no longer considers any certificate that appears on it to be valid. The only problem is that browsers do not always make good use of these lists, but the mechanism is there.
With this, we have finished our installment of the “How it works …” series on HTTPS. As you can imagine, there are things that I have left out and others that I have simplified to make reading easier, especially for those less familiar with the field. As always, we will try to address your questions and criticisms in the comments.
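In its simplest form, the revocation check is just a lookup. The sketch below assumes certificates are identified by serial number, which is how standard revocation lists identify them, and ignores details such as each CA publishing its own list; the `Certificate` class and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Certificate:
    serial_number: int
    subject: str

def is_revoked(certificate: Certificate, revoked_serials: set) -> bool:
    """Return True if the certificate appears on the downloaded revocation list."""
    return certificate.serial_number in revoked_serials

# Hypothetical list downloaded by the browser.
revoked_serials = {1001, 4242}

stolen = Certificate(serial_number=4242, subject="example.com")
healthy = Certificate(serial_number=7, subject="example.org")
print(is_revoked(stolen, revoked_serials), is_revoked(healthy, revoked_serials))
```

The lookup is the easy part; the weak point mentioned above is keeping the list up to date and deciding what to do when it cannot be downloaded.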