GFW Technical Review 06 – HTTPS and Domain Fronting

GFW originated in the late 1990s and early 2000s, in the early days of the Internet in China. Its original DPI strategies, such as keyword and URL matching against plaintext HTTP, have largely lost their effectiveness on the modern Internet. Today, the vast majority of web traffic runs over HTTPS. This is not good news for GFW, but it has found ways to adapt.


TLS

To understand how GFW handles HTTPS traffic, we first need to understand TLS – the cryptographic protocol underlying HTTPS.

When a client connects to an HTTPS server, the two parties perform a TLS handshake immediately after the TCP three-way handshake to establish a secure channel. The handshake accomplishes several things: the server proves its identity with a certificate, the parties agree on cryptographic parameters by negotiating a cipher suite, and a key exchange lets them derive shared session keys for encrypting subsequent communication. Once the handshake completes, all application data is encrypted. An eavesdropper sees only opaque ciphertext and cannot read the content being transferred – which pages are visited, what data is submitted, or what responses are returned.

TLS 1.2 Handshake

However, TLS was not designed with censorship resistance in mind. The handshake itself, particularly in TLS 1.2 and earlier, exposes metadata that proves invaluable to censors. The most significant piece of exposed information is the Server Name Indication, or SNI.

Server Name Indication (SNI)

SNI is a TLS extension that allows a client to indicate which hostname it is attempting to connect to during the handshake. It was introduced to solve a practical problem: a single IP address often hosts multiple HTTPS websites, and the server needs to know which certificate to present before the encrypted channel is established. Although technically optional in the protocol specification, SNI is practically mandatory due to the prevalence of shared hosting, and is used by virtually all clients.

The SNI field is sent in the ClientHello message – the very first message from client to server – and it is sent in plaintext. This creates a fundamental information leak: even though the content of HTTPS communication is encrypted, the destination hostname is visible to any network observer.
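To make the leak concrete, here is a minimal Python sketch that encodes a ClientHello carrying the server_name extension, following the field layout of TLS 1.2 (RFC 5246) and of the SNI extension (RFC 6066). It is deliberately incomplete: a real client also sends extensions such as supported groups and signature algorithms, so this record would not complete a handshake. The point is only that the host name ends up as ordinary bytes in the very first packet.

```python
import os
import struct


def sni_extension(hostname: str) -> bytes:
    """Encode the server_name extension (RFC 6066) for a single host name."""
    name = hostname.encode("ascii")
    # ServerName: name_type 0 (host_name), then a 2-byte length and the name itself
    server_name = struct.pack("!BH", 0, len(name)) + name
    # ServerNameList: 2-byte length prefix around the list of ServerName entries
    server_name_list = struct.pack("!H", len(server_name)) + server_name
    # Extension: type 0x0000 (server_name), 2-byte data length, then the data
    return struct.pack("!HH", 0x0000, len(server_name_list)) + server_name_list


def build_client_hello(hostname: str) -> bytes:
    """Build a minimal, illustrative TLS ClientHello record containing only SNI."""
    extensions = sni_extension(hostname)
    body = (
        struct.pack("!H", 0x0303)            # client_version: TLS 1.2
        + os.urandom(32)                     # random
        + b"\x00"                            # empty session_id
        + struct.pack("!HH", 2, 0xC02F)      # one suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
        + b"\x01\x00"                        # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    # Handshake header: msg_type 1 (client_hello) and a 3-byte length
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    # Record header: content type 22 (handshake), version 0x0301, 2-byte length
    return struct.pack("!BHH", 22, 0x0301, len(handshake)) + handshake


record = build_client_hello("www.example.com")
print(b"www.example.com" in record)  # -> True: the name is visible on the wire
```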

Example Client Hello packet. Server domain name is in plaintext

For GFW, SNI is a gift. It provides a reliable, standardized field that reveals exactly which website a user is attempting to visit. GFW can simply inspect the SNI field of every TLS ClientHello and match it against a blocklist of forbidden domains. When a match is found, GFW injects TCP RST packets to terminate the connection, just as it does for other blocked content.
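The censor's side of this is just as simple. The sketch below, assuming a complete and well-formed ClientHello in a single TLS record, walks the message to the server_name extension and checks the result against a blocklist. The blocklist contents and the matching rule (the exact name or any parent domain) are illustrative assumptions, not GFW's actual rules, and the RST injection step is left out.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host name from a single-record TLS ClientHello, or None."""
    if len(record) < 9 or record[0] != 22 or record[5] != 1:
        return None  # not a handshake record carrying a ClientHello
    pos = 9                      # skip 5-byte record header and 4-byte handshake header
    pos += 2 + 32                # client_version and random
    pos += 1 + record[pos]       # session_id
    (n,) = struct.unpack_from("!H", record, pos)
    pos += 2 + n                 # cipher_suites
    pos += 1 + record[pos]       # compression_methods
    if pos + 2 > len(record):
        return None              # no extensions block at all
    (total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0x0000:   # server_name
            # Skip the 2-byte ServerNameList length, then read the first entry
            name_type, name_len = struct.unpack_from("!BH", record, pos + 2)
            if name_type == 0:   # host_name
                start = pos + 5
                return record[start:start + name_len].decode("ascii", "replace")
        pos += ext_len
    return None                  # SNI extension omitted


def is_blocked(hostname: Optional[str], blocklist: set[str]) -> bool:
    """Match a host name and each of its parent domains against the blocklist."""
    if hostname is None:
        return False
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

When the ClientHello carries no server_name extension, `extract_sni` returns `None` and the blocklist check has nothing to match.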

Encrypted Client Hello (ECH)

The plaintext SNI has long troubled the Internet security community: it is the most substantial privacy leak remaining in the TLS/HTTPS infrastructure. Encrypted Server Name Indication (ESNI) was proposed to address this issue, and its successor, Encrypted Client Hello (ECH), has since been standardized as a TLS 1.3 extension.

ECH works by sending two ClientHellos: an outer ClientHello in the standard plaintext format, and an encrypted inner ClientHello, carried inside an extension of the outer one, that contains the real SNI and other sensitive fields such as ALPN. For compatibility reasons, the outer ClientHello still contains a plaintext SNI, but this is a decoy (typically a generic public name, such as a CDN hostname). Only the encrypted inner SNI reveals the actual destination.

Encrypting the inner ClientHello presents a bootstrapping problem: at this point no TLS connection has been established, so there is no secure channel available. ECH solves this by working in conjunction with DNS – typically DNS-over-HTTPS to prevent DNS-level leakage. Alongside the usual IP addresses, the DNS response (an HTTPS resource record) carries the server's ECH configuration, which contains a public key. The client uses this public key to encrypt the inner ClientHello. The server decrypts it and proceeds with the real handshake, while GFW sees nothing useful.
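As a sketch of the client's first step, the code below parses an ECHConfigList, such as the one carried in the `ech` parameter of a DNS HTTPS record, and extracts the public key and the public name that ends up as the outer (decoy) SNI. The byte layout follows our reading of the ECHConfig structure (version 0xfe0d) in the IETF ECH specification; the HPKE encryption of the inner ClientHello is not shown, and the `EchConfig` class is a hypothetical container.

```python
import struct
from dataclasses import dataclass


@dataclass
class EchConfig:
    config_id: int      # lets the server pick the matching private key
    kem_id: int         # HPKE KEM, e.g. 0x0020 = DHKEM(X25519, HKDF-SHA256)
    public_key: bytes   # used by the client to encrypt the inner ClientHello
    public_name: str    # plaintext name placed in the outer ClientHello's SNI


def parse_ech_config_list(data: bytes) -> list[EchConfig]:
    """Parse an ECHConfigList, keeping only entries with version 0xfe0d."""
    (total,) = struct.unpack_from("!H", data, 0)
    pos, end = 2, 2 + total
    configs = []
    while pos + 4 <= end:
        version, length = struct.unpack_from("!HH", data, pos)
        p = pos + 4
        pos = p + length                      # start of the next ECHConfig
        if version != 0xFE0D:
            continue                          # skip versions this sketch does not know
        config_id = data[p]
        kem_id, key_len = struct.unpack_from("!HH", data, p + 1)
        p += 5
        public_key = data[p:p + key_len]
        p += key_len
        (suites_len,) = struct.unpack_from("!H", data, p)
        p += 2 + suites_len                   # skip the (KDF, AEAD) cipher suite list
        p += 1                                # skip maximum_name_length
        name_len = data[p]
        public_name = data[p + 1:p + 1 + name_len].decode("ascii")
        configs.append(EchConfig(config_id, kem_id, public_key, public_name))
    return configs
```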

TLS 1.3 Handshake with ECH

ECH poses a major challenge for GFW. It eliminates the only field that GFW can reliably use to identify the destination. Fortunately for GFW, ECH deployment remains limited. While most major browsers already support ECH, server-side adoption will take years to become widespread. This allows GFW to block all ECH connections without causing significant collateral damage for the time being.

Researchers have observed that GFW inspects the TLS extension identifiers in ClientHello messages and, when it detects ESNI or ECH, drops packets rather than injecting TCP RST. This is followed by residual censorship, in which further connections between the same client and the same server port are also blocked for two to three minutes. However, as ECH adoption grows over the coming years, the collateral damage from blanket blocking will become substantial, and GFW may be forced to abandon this approach.
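A sketch of such a classifier, assuming a complete ClientHello in a single TLS record: it lists the extension type codes and returns a hypothetical "drop" verdict when it sees 0xffce (the codepoint used by the ESNI drafts) or 0xfe0d (the codepoint used by ECH). Exactly which codepoints GFW matches is an assumption here, the function names are hypothetical, and the residual censorship window is not modeled.

```python
import struct

ESNI_EXT = 0xFFCE  # encrypted_server_name, used by the ESNI drafts
ECH_EXT = 0xFE0D   # encrypted_client_hello


def extension_types(record: bytes) -> list[int]:
    """List the extension type codes in a single-record TLS ClientHello."""
    pos = 9 + 2 + 32                 # record/handshake headers, version, random
    pos += 1 + record[pos]           # session_id
    (n,) = struct.unpack_from("!H", record, pos)
    pos += 2 + n                     # cipher_suites
    pos += 1 + record[pos]           # compression_methods
    (total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + total
    types = []
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        types.append(ext_type)
        pos += 4 + ext_len           # jump over this extension's data
    return types


def censor_action(record: bytes) -> str:
    """Hypothetical policy: silently drop ESNI/ECH handshakes, pass the rest."""
    types = extension_types(record)
    if ESNI_EXT in types or ECH_EXT in types:
        return "drop"
    return "pass"
```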


Domain Fronting

Domain fronting is a circumvention technique designed to evade GFW’s SNI-based censorship. It was widely used between 2014 and 2018 in popular circumvention tools like Lantern, exploiting a non-standard behavior that major cloud providers unintentionally supported.

In an HTTPS connection, the destination domain name appears in three places: the DNS request, the TLS ClientHello (SNI), and the HTTP Host header inside the encrypted request. The first two are visible to GFW; the third is encrypted.

Ordinarily, the same domain name is used in all three places. The key insight of domain fronting is to use different domain names. In the DNS query and TLS SNI – both visible to the censor – the client specifies a benign domain that happens to share infrastructure with the blocked domain. Inside the encrypted HTTP request, the Host header points to the actual blocked destination.
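The split can be sketched in Python. `fronted_request` (a hypothetical helper) lays out which name appears where, while `send_fronted` shows how the same split falls out of the standard `http.client` module: it uses the connection host for DNS resolution and the TLS SNI, and an explicit `Host` header replaces the one it would otherwise add. The domain names are placeholders, and since major CDNs have since disabled domain fronting, a request like this would typically be refused today.

```python
import http.client


def fronted_request(front: str, hidden: str, path: str = "/") -> dict:
    """Show which domain appears in each of the three places."""
    http_request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden}\r\n"            # travels inside the TLS tunnel
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    return {
        "dns_query": front,              # visible to GFW: plaintext DNS lookup
        "tls_sni": front,                # visible to GFW: ClientHello server_name
        "encrypted_http": http_request,  # hidden: only the CDN can decrypt it
    }


def send_fronted(front: str, hidden: str, path: str = "/", timeout: float = 10.0):
    """Open TLS to the front domain, but ask for the hidden domain via Host."""
    conn = http.client.HTTPSConnection(front, timeout=timeout)  # DNS + SNI use `front`
    try:
        conn.request("GET", path, headers={"Host": hidden})     # Host uses `hidden`
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```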

Domain Fronting

CDN Infrastructure

Domain fronting works exceptionally well on the modern Internet, where CDNs are often the first point of entry for web requests. CDN providers such as Google Cloud and Cloudflare act on behalf of origin servers, processing user requests directly. If a request hits a CDN cache, the cached content is returned immediately. Otherwise, the CDN forwards the request to the origin server on the user’s behalf.

User requests are often handled by CDN frontend servers first

CDN frontend servers typically host a wide range of websites, particularly popular ones. Crucially, they use the HTTP Host header to determine which website a request is intended for. This means that if the TLS SNI and HTTP Host header specify different domains, the CDN will happily accept the TLS connection as long as it hosts the domain in the SNI, but then route the traffic according to the Host header.
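This routing behavior can be modeled in a few lines. It is a simplified sketch, not any provider's actual logic: the hypothetical `cdn_route` accepts the TLS handshake only if it serves the SNI domain, then picks the origin purely from the Host header. Accepting connections that carry no SNI (with a default certificate) is an additional assumption, included because it mirrors the domain hiding variant discussed below.

```python
from typing import Optional


def parse_host_header(http_request: bytes) -> Optional[str]:
    """Return the Host header value from a raw HTTP/1.1 request, if any."""
    head = http_request.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    for line in head.split("\r\n")[1:]:      # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower()
    return None


def cdn_route(sni: Optional[str], http_request: bytes,
              origins: dict[str, str]) -> Optional[str]:
    """Check the SNI at TLS time, but route by the Host header afterwards."""
    # TLS layer: refuse the handshake only if we hold no certificate for the SNI
    # name (assumption: a missing SNI gets a default certificate and is accepted).
    if sni is not None and sni.lower() not in origins:
        return None
    # HTTP layer: the SNI is no longer consulted; only the Host header matters.
    host = parse_host_header(http_request)
    return origins.get(host) if host else None


origins = {"front.example": "origin-front", "blocked.example": "origin-blocked"}
request = b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n"
print(cdn_route("front.example", request, origins))  # -> origin-blocked
```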

With domain fronting, GFW loses the ability to block by SNI alone. Its only options are to block the CDN server by IP entirely – causing massive collateral damage – or to let the traffic through. In 2018, Russian authorities attempted to block Telegram, which used domain fronting, by banning 1.8 million IP addresses belonging to major cloud providers. This knocked countless legitimate websites offline. Eventually, the Russian authorities were forced to reverse course and lift the bans.

A variation of domain fronting is domain hiding, where the TLS ClientHello simply omits the SNI extension entirely. In this case, GFW has no domain name to match against. SNI-less connections constitute a significant portion of Internet traffic, so blocking them indiscriminately would also incur substantial collateral damage.

The Decline of Domain Fronting

Domain fronting was a short-lived technique. Though powerful, it relied on unintended behavior that was never part of any Internet standard. Starting around 2018, major cloud and CDN providers began disabling domain fronting.

Several factors drove this decision. There were terms-of-service concerns, as domain fronting effectively allows traffic to impersonate one service while actually being destined for another. There were also security concerns: attackers could use domain fronting to disguise malicious traffic as legitimate connections to trusted services. And of course, there was speculation that government pressure played a role as well.


Closing Thoughts

As the Internet has shifted toward encryption and HTTPS, GFW faces significant challenges. Classic DPI methodologies have become increasingly ineffective. For now, the presence of plaintext SNI and the decline of domain fronting mean that GFW can still effectively censor HTTPS traffic. But the long-term trend toward ECH may fundamentally alter this dynamic. The rise of HTTPS has also created new opportunities for circumvention technologies. Rather than relying on “looks like nothing” protocols like Shadowsocks, newer protocols have emerged that disguise themselves as legitimate HTTPS traffic, making fingerprinting even more difficult for GFW. We will explore how this cat-and-mouse game continues to evolve in future posts.

