GFW Technical Review 06 – HTTPS Censorship
The GFW originated in the late 1990s and early 2000s, in the early years of the public web. Its original DPI strategies, such as keyword and URL matching against plaintext HTTP, have largely lost their effectiveness on the modern Internet. Today, the vast majority of web traffic runs over HTTPS. This is bad news for the GFW, but it has found ways to adapt.
TLS
To understand how the GFW handles HTTPS traffic, we first need to understand TLS, the cryptographic protocol underlying HTTPS.
When a client connects to an HTTPS server, the two parties perform a TLS handshake immediately after the TCP three-way handshake to establish a secure channel. This handshake accomplishes several things: the server proves its identity through a certificate, the parties agree on cryptographic parameters by exchanging supported cipher suites, and they derive shared session keys for encrypting subsequent communication through a key exchange. Once the handshake completes, all application data is encrypted. An eavesdropper sees only opaque ciphertext and learns nothing about the content being transferred.
However, TLS was not designed with censorship resistance in mind. The handshake itself, particularly in TLS 1.2 and earlier, exposes metadata that proves invaluable to censors. The most significant piece of exposed information is the Server Name Indication, or SNI.
Server Name Indication (SNI)
SNI is a TLS extension that allows a client to indicate which hostname it is attempting to connect to during the handshake. It was introduced to solve a practical problem: a single IP address often hosts multiple HTTPS websites, and the server needs to know which certificate to present before the encrypted channel is established. Although technically optional in the protocol specification, SNI is practically mandatory due to the prevalence of shared hosting, and is used by virtually all clients.
The SNI field is sent in the ClientHello message, the very first message from client to server, and it is sent in plaintext. This creates a fundamental information leak: even though the content of HTTPS communication is encrypted, the destination hostname is visible to any network observer.
For the GFW, SNI is a gift. It provides a reliable, standardized field that reveals exactly which website a user is attempting to visit. The GFW inspects the SNI field of every TLS ClientHello and matches it against a blocklist of forbidden domains. When a match is found, it injects TCP RST packets to terminate the connection, just as it does for other blocked content.
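To make the leak concrete, here is a minimal Python sketch of the kind of check an on-path observer could run. It generates a real ClientHello in memory with the standard `ssl` module (nothing is sent on the network), pulls the hostname out of the plaintext `server_name` extension, and matches it against a blocklist. The domain `blocked.example`, the helper names, and the suffix-matching rule are all assumptions made for illustration; the GFW's actual matching logic is not public.

```python
import ssl
import struct
from typing import Optional

BLOCKLIST = {"blocked.example"}  # hypothetical blocklist entry


def make_client_hello(hostname: str) -> bytes:
    """Generate a real ClientHello in memory; nothing is sent on the network."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for a ServerHello
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the plaintext server name from a ClientHello record, if any."""
    try:
        if record[0] != 0x16 or record[5] != 0x01:  # handshake record, ClientHello
            return None
        pos = 5 + 4 + 2 + 32  # record header, handshake header, version, random
        pos += 1 + record[pos]  # session id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher suites
        pos += 1 + record[pos]  # compression methods
        end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:  # server_name extension
                # Skip list length (2) and name type (1), then read the name.
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5 : pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass  # truncated or malformed input
    return None


def is_blocked(hostname: str) -> bool:
    """Assumed rule: a name is blocked if it or any parent domain is listed."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in BLOCKLIST for i in range(len(labels)))


hello = make_client_hello("www.blocked.example")
sni = extract_sni(hello)
print(sni, "->", "inject RST" if sni and is_blocked(sni) else "pass")
```

Note that the parser never touches a key or a certificate: everything it needs is in the first packet the client sends.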
Encrypted Client Hello (ECH)
The plaintext SNI is the largest remaining privacy leak in TLS/HTTPS. Encrypted Server Name Indication (ESNI) was introduced to address it, and its successor, Encrypted Client Hello (ECH), has since been standardized as a TLS 1.3 extension.
ECH works by splitting the ClientHello into two parts: an outer ClientHello in the standard plaintext format, and an inner ClientHello that is encrypted and carried as an extension of the outer one. The inner ClientHello holds the real SNI and other sensitive fields like ALPN. For compatibility reasons, the outer ClientHello still carries a plaintext SNI, but this is a decoy (often pointing to a generic CDN hostname). Only the encrypted inner SNI reveals the actual destination.
Encrypting the inner ClientHello presents a bootstrapping problem: at this point, no TLS connection has been established, so there is no secure channel available. ECH solves this with help from DNS, typically DNS-over-HTTPS to prevent DNS-level leakage. Alongside the usual address records, the client fetches an HTTPS resource record that carries the server’s ECH configuration, which contains a public key. The client uses this public key to encrypt the inner ClientHello. The server decrypts it and proceeds with the real handshake, while the GFW sees nothing useful.
ECH poses a major challenge for the GFW. It eliminates the only field the GFW can reliably use to identify the destination. Fortunately for the GFW, ECH deployment remains limited. Most major browsers already support it, but server-side adoption will take years to become widespread. For the time being, the GFW can block all ECH connections without causing significant collateral damage.
Researchers have observed that the GFW inspects the TLS extension identifiers in ClientHello messages and drops packets (rather than injecting TCP RST) when it detects ESNI or ECH. Residual censorship then kicks in: subsequent traffic between the same client and server keeps being dropped for two to three minutes. As ECH adoption grows over the coming years, the collateral damage from blanket blocking will become substantial, and the GFW may be forced to abandon this approach.
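The behavior described above can be sketched as a small filter. This is an illustrative model, not the GFW's implementation: it lists the extension type codes in a ClientHello, drops the packet if it sees the draft ESNI code point (`0xffce`) or the ECH code point (`0xfe0d`), and keeps dropping the same flow for a fixed window. The class name, the 180-second window (the upper end of the range mentioned above), and the way flows are keyed are assumptions. Because Python's `ssl` module cannot emit ECH, the sketch hand-assembles a minimal ClientHello that is only good enough for parsing.

```python
import struct
from typing import Dict, Hashable, List, Tuple

ESNI_EXT = 0xFFCE  # draft "encrypted_server_name" extension
ECH_EXT = 0xFE0D   # "encrypted_client_hello" extension


def build_client_hello(extensions: List[Tuple[int, bytes]]) -> bytes:
    """Hand-assemble a minimal ClientHello record (parsing demo only)."""
    ext_block = b"".join(struct.pack("!HH", t, len(d)) + d for t, d in extensions)
    body = (
        b"\x03\x03"            # legacy_version
        + bytes(32)            # random (all zeros is fine for a demo)
        + b"\x00"              # empty session id
        + b"\x00\x02\x13\x01"  # one cipher suite: TLS_AES_128_GCM_SHA256
        + b"\x01\x00"          # one compression method: null
        + struct.pack("!H", len(ext_block)) + ext_block
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extension_types(record: bytes) -> List[int]:
    """List the extension type codes of a ClientHello; [] if it is not one."""
    types: List[int] = []
    try:
        if record[0] != 0x16 or record[5] != 0x01:
            return types
        pos = 5 + 4 + 2 + 32  # record header, handshake header, version, random
        pos += 1 + record[pos]  # session id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher suites
        pos += 1 + record[pos]  # compression methods
        end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            types.append(ext_type)
            pos += 4 + ext_len
    except (IndexError, struct.error):
        pass  # truncated input: return what was parsed so far
    return types


class EchFilter:
    """Toy model: drop ESNI/ECH ClientHellos, then keep dropping the flow."""

    RESIDUAL_SECONDS = 180  # assumption: upper end of "two to three minutes"

    def __init__(self) -> None:
        self._blocked_until: Dict[Hashable, float] = {}

    def handle(self, flow: Hashable, payload: bytes, now: float) -> str:
        if self._blocked_until.get(flow, 0.0) > now:
            return "drop"  # residual censorship still active
        if {ESNI_EXT, ECH_EXT} & set(extension_types(payload)):
            self._blocked_until[flow] = now + self.RESIDUAL_SECONDS
            return "drop"
        return "pass"
```

The filter never needs the hostname. The mere presence of the extension is the trigger, which is why this approach only stays cheap for the censor while ECH traffic is rare.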
Domain Fronting
Domain fronting is a circumvention technique designed to evade the GFW’s SNI-based censorship. It was widely used between 2014 and 2018 in popular circumvention tools like Lantern, exploiting a non-standard behavior that major cloud providers unintentionally supported.
In an HTTPS connection, the destination domain name appears in three places: the DNS request, the TLS ClientHello (SNI), and the HTTP Host header inside the encrypted request. The first two are visible to the GFW; the third is encrypted.
Ordinarily, the same domain name is used in all three places. The key insight of domain fronting is to use different ones. In the DNS query and TLS SNI (both visible to the censor), the client specifies a benign domain that happens to share infrastructure with the blocked domain. Inside the encrypted HTTP request, the Host header points to the actual blocked destination.
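The mismatch can be shown without touching the network. The sketch below, using hypothetical domains `allowed.example` and `blocked.example` and hypothetical helper names, builds the two pieces a fronted client would produce: the cleartext ClientHello (generated in memory by Python's `ssl` module) and the HTTP request that would travel inside the encrypted channel once the handshake completes.

```python
import ssl
from typing import Tuple


def client_hello_for(sni: str) -> bytes:
    """Generate a real ClientHello in memory; nothing is sent on the network."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=sni)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for a ServerHello
    return outgoing.read()


def fronted_exchange(front_domain: str, hidden_domain: str,
                     path: str = "/") -> Tuple[bytes, bytes]:
    """Return (cleartext ClientHello, HTTP request that would be encrypted)."""
    hello = client_hello_for(front_domain)  # visible to the censor
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_domain}\r\n"        # visible only after decryption
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    return hello, request


hello, request = fronted_exchange("allowed.example", "blocked.example")
print("front domain in ClientHello: ", b"allowed.example" in hello)
print("hidden domain in ClientHello:", b"blocked.example" in hello)
print("hidden domain in Host header:", b"Host: blocked.example" in request)
```

In a real client the request bytes would be written to the TLS session after the handshake, so an observer on the path would only ever see the front domain.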
CDN Infrastructure
Domain fronting was a natural fit for the modern Internet, where CDNs are often the first point of entry for web requests. CDN providers such as Google Cloud and Cloudflare act on behalf of origin servers, processing user requests directly. If a request hits a CDN cache, the cached content is returned immediately. Otherwise, the CDN forwards the request to the origin server on the user’s behalf.
CDN frontend servers typically host a wide range of websites, particularly popular ones. Crucially, they use the HTTP Host header to determine which website a request is intended for. This means that if the TLS SNI and HTTP Host header specify different domains, the CDN will happily accept the TLS connection as long as it hosts the domain in the SNI, but then route the traffic according to the Host header.
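A toy model of that two-step decision, assuming a frontend that checks the SNI only at the TLS layer and routes only on the Host header. The class, the domains, and the origin names are hypothetical, and real CDN frontends are far more involved.

```python
from typing import Dict


class ToyCdnEdge:
    """Toy model of a CDN frontend that serves many sites from one address."""

    def __init__(self, origins: Dict[str, str]) -> None:
        self.origins = origins  # hosted domain -> origin server

    def handle(self, sni: str, host_header: str) -> str:
        # Step 1: the TLS layer only checks that the SNI names a hosted site.
        if sni not in self.origins:
            return "handshake refused: no certificate for " + sni
        # Step 2: the HTTP layer routes purely on the Host header.
        origin = self.origins.get(host_header)
        if origin is None:
            return "404: unknown host " + host_header
        return "forwarded to " + origin


edge = ToyCdnEdge({
    "allowed.example": "origin-a.internal",
    "blocked.example": "origin-b.internal",
})

# Ordinary request: SNI and Host agree.
print(edge.handle("allowed.example", "allowed.example"))
# Fronted request: benign SNI outside, blocked site in the Host header.
print(edge.handle("allowed.example", "blocked.example"))
```

Nothing in step 2 looks back at the SNI, which is exactly the gap domain fronting relied on.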
With domain fronting, the GFW loses the ability to block by SNI alone. Its only options are to block the CDN server by IP entirely (causing massive collateral damage) or to let the traffic through. In 2018, Russian authorities tried to block Telegram, which used domain fronting, by banning 1.8 million IP addresses belonging to Amazon and Google cloud services. Countless legitimate websites were knocked offline. Eventually, the Russian authorities were forced to reverse course and lift the bans.
A variation of domain fronting is domain hiding, where the TLS ClientHello simply omits the SNI extension entirely. In that case, the GFW has no domain name to match against. SNI-less connections make up a significant portion of Internet traffic, so blocking them indiscriminately would also incur substantial collateral damage.
The Decline of Domain Fronting
Domain fronting was a short-lived technique. Though powerful, it relied on unintended behavior that was never part of any Internet standard. Starting around 2018, major cloud and CDN providers began disabling domain fronting.
Several factors drove this decision. There were terms-of-service concerns, as domain fronting effectively allows traffic to impersonate one service while actually destined for another. There were also security concerns: attackers could use domain fronting to disguise malicious traffic as legitimate connections to trusted services. And of course, there was speculation that government pressure played a role as well.
Closing Thoughts
As the Internet has shifted toward encryption, classic DPI has lost much of its edge. For now, plaintext SNI and the decline of domain fronting let the GFW continue to censor HTTPS traffic effectively. But the long-term trend toward ECH may fundamentally change that. The rise of HTTPS has also opened new avenues for circumvention. Rather than relying on “looks like nothing” protocols like Shadowsocks, newer protocols disguise themselves as legitimate HTTPS traffic, making fingerprinting even harder for the GFW. Later posts trace how this cat-and-mouse game continues to evolve.
References
- Cloudflare. What is a TLS handshake. https://www.cloudflare.com/learning/ssl/what-happens-in-a-tls-handshake/
- Cloudflare. Encrypted Client Hello - the last puzzle piece to privacy. https://blog.cloudflare.com/announcing-encrypted-client-hello/
- GFW Report. Exposing and Circumventing China’s Censorship of ESNI. https://gfw.report/blog/gfw_esni_blocking/en/
- David Fifield, Chang Lan, Rod Hynes, Percy Wegmann, and Vern Paxson. Blocking-resistant communication through domain fronting. In Proceedings on Privacy Enhancing Technologies 2015. https://petsymposium.org/2015/papers/03_Fifield.pdf
- BleepingComputer. Russia Bans 1.8 Million Amazon and Google IPs in Attempt to Block Telegram. https://www.bleepingcomputer.com/news/government/russia-bans-18-million-amazon-and-google-ips-in-attempt-to-block-telegram/