GFW Technical Review 08 – Tor
In the early days of GFW, before Shadowsocks and its variants emerged, there were two off-the-shelf solutions for circumvention: VPNs and Tor. Like VPNs, Tor was invented for Internet privacy and anonymity – censorship circumvention was not its original purpose. Tor is an entire distributed network that users can leverage to access the Internet while hiding their identity. Since the Tor protocol encrypts traffic and conceals its true destination, people quickly discovered its utility for bypassing censorship.
Tor is often associated with the Dark Web. Indeed, onion services (the “.onion” sites that make up the best-known part of the Dark Web) are only reachable through the Tor network, but this is not Tor’s sole purpose. In fact, the majority of Tor traffic is used to access the regular Internet. Tor simply provides a layer of privacy and anonymity, similar to a VPN.
Onion Routing
Unlike a VPN, which establishes a point-to-point secure channel, Tor is a distributed network whose software is maintained by the nonprofit Tor Project. The network consists of volunteer-operated hosts that run the Tor protocol and act as relays for user traffic. When a user wants to connect through the Tor network, the Tor client selects three relays: an entry relay, a middle relay, and an exit relay. All traffic is encrypted and routed through these three relays in sequence.
This architecture ensures that no single party – not even the relays themselves – can observe both the user and the destination. The entry relay sees only the user’s IP address; the exit relay sees only the destination. Any censor tapping the network can never observe a direct connection between the user and the true destination – only connections to, from, and between relays.
Tor employs layered encryption to maintain anonymity throughout the network. When the user sends a request, it is encrypted in three layers – each layer encrypted with a key established between the user and one of the three relays. When the entry relay receives the request, it decrypts the outer layer with its key and forwards the payload (still wrapped in two more encryption layers) to the next hop. It cannot read the inner layers because it lacks those keys. Each relay decrypts and forwards the request until it reaches the exit relay, where the original payload is restored and forwarded to the destination. This layered approach, where the payload is decrypted layer by layer like peeling an onion, is where “onion routing” gets its name.
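The layering described above can be sketched in a few lines of Python. This is a minimal illustration, not Tor's actual cell encryption: the SHA-256 based keystream is a toy stand-in chosen only so the example needs no third-party libraries, and the function names and keys are hypothetical.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    Illustration only; this is not the cipher Tor uses.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap_onion(payload: bytes, hop_keys: list[bytes]) -> bytes:
    """Client side: add one layer per relay.

    hop_keys is ordered entry, middle, exit, so the exit layer is applied
    first (innermost) and the entry layer last (outermost).
    """
    for key in reversed(hop_keys):
        payload = keystream_xor(key, payload)
    return payload


def peel_layer(key: bytes, cell: bytes) -> bytes:
    """Relay side: remove exactly one layer with this relay's key."""
    return keystream_xor(key, cell)
```

Each relay calls `peel_layer` with its own key only. After the entry and middle relays have peeled their layers the payload is still unreadable; it becomes plaintext only once the exit relay removes the last layer.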
Connecting to the Network
To connect to the Tor network, the user must first obtain an up-to-date list of relays, known as the Tor directory. Clients typically obtain this directory from directory authorities, whose primary function is to maintain this list. Relays themselves also cache copies of the directory. The Tor client ships with a set of known cache locations as bootstrap points. To protect against impersonation, the directory is cryptographically signed by the directory authorities.
Once the client obtains the directory, it selects three relays to establish a circuit. The circuit is built incrementally, starting from the entry relay and extending to the exit relay.
First, the client establishes a standard TLS session with the entry relay. Through this TLS session, the client performs an ntor handshake – essentially a key exchange that establishes a Tor-specific secure channel between the client and the entry relay.
Next, the client extends the circuit by performing another ntor handshake with the middle relay, relayed through the entry relay over the already-established secure channel. The cryptographic design ensures that the entry relay cannot read the keys exchanged between the client and middle relay. This extension process continues until the client has completed key exchanges with all three relays.
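The incremental key agreement can be illustrated with plain Diffie-Hellman. This sketch makes several simplifying assumptions: it uses a small finite-field group rather than the Curve25519 group that ntor is built on, it omits the relay authentication that ntor provides, and the `Relay` class and `build_circuit` function are hypothetical names. It shows only the core property that each hop ends up with a key known to the client and that hop alone.

```python
import hashlib
import secrets

# Toy parameters for illustration; far too small for real-world use.
P = 2**127 - 1  # a Mersenne prime
G = 3


def dh_keypair() -> tuple[int, int]:
    """Return a (private, public) Diffie-Hellman pair."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def derive_key(private: int, peer_public: int) -> bytes:
    """Hash the shared DH secret into a 32-byte session key."""
    shared = pow(peer_public, private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


class Relay:
    def __init__(self, name: str) -> None:
        self.name = name
        self.key = None

    def handshake(self, client_public: int) -> int:
        """Answer a handshake: store the session key, return our public value."""
        private, public = dh_keypair()
        self.key = derive_key(private, client_public)
        return public


def build_circuit(relays: list[Relay]) -> list[bytes]:
    """Extend the circuit one hop at a time, with a fresh key pair per hop.

    In Tor, the handshake for hop N travels through hops 1..N-1. Those
    relays see only the public values, never the derived key.
    """
    hop_keys = []
    for relay in relays:
        private, public = dh_keypair()
        relay_public = relay.handshake(public)
        hop_keys.append(derive_key(private, relay_public))
    return hop_keys
```

Calling `build_circuit([Relay("entry"), Relay("middle"), Relay("exit")])` yields three independent keys, one shared with each relay.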
Tor Bridges
Like VPNs, Tor was built with anonymity and encryption in mind; censorship circumvention was not part of the initial design. In particular, Tor’s relay discovery mechanism – the directory document and directory authorities – represents a significant weakness.
Although Tor includes a certificate mechanism to reject forged directories, the directory authorities themselves can become targets. More relevant in the context of GFW: the Tor directory and directory authorities are public by necessity, since clients must know them to connect to the network. This allows GFW to trivially block the entire Tor network by downloading a copy of the directory and blocking the IP addresses of all listed relays.
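To show how little effort this takes, here is a rough sketch of extracting relay endpoints from a directory document. It assumes the router status lines of the full consensus format (lines beginning with `r`, with the IP address and OR port as the seventh and eighth fields); the sample text is fabricated for illustration and uses documentation-only IP ranges.

```python
# Fabricated sample in the assumed consensus "r" line layout:
# r <nickname> <identity> <digest> <date> <time> <IP> <ORPort> <DirPort>
SAMPLE_CONSENSUS = """\
r relayA AAAAAAAAAAAAAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBBBBBBBBBBBB 2024-01-01 00:00:00 192.0.2.10 9001 0
s Fast Guard Running Stable Valid
r relayB CCCCCCCCCCCCCCCCCCCCCCCCCCC DDDDDDDDDDDDDDDDDDDDDDDDDDD 2024-01-01 00:00:00 198.51.100.7 443 80
s Exit Fast Running Valid
"""


def relay_endpoints(consensus_text: str) -> list[tuple[str, int]]:
    """Collect (ip, or_port) pairs from router status lines."""
    endpoints = []
    for line in consensus_text.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[0] == "r":
            endpoints.append((fields[6], int(fields[7])))
    return endpoints
```

A censor only needs to feed the resulting list into its IP blocklist and refresh it periodically.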
To counter censorship, the Tor network maintains a collection of hidden relays that do not appear in the public directory. These are called Tor bridges. Users must discover bridges manually – for example, by requesting them via email or Telegram, sharing them peer-to-peer, or obtaining them from the Tor website after completing a CAPTCHA. Users typically receive only one or two bridge addresses rather than the entire list. The key principle is that censors cannot programmatically obtain a complete, up-to-date list of bridges, while users can still gain access with minimal effort.
Tor DPI
Without a definitive list of bridges, GFW identifies them dynamically through deep packet inspection. Tor relays typically run outside censored networks, so GFW observes the traffic between the client and the entry relay – a standard TLS connection. Although GFW cannot decrypt the TLS session, certain characteristics of Tor’s TLS usage leak information.
TLS Fingerprinting
Every application that uses TLS exhibits a characteristic signature known as a TLS fingerprint, which can be extracted from handshake packets. Tools like JA3 and JA3S compute fingerprints from the ClientHello and ServerHello messages, respectively, using the TLS version, cipher suites, extensions, and other fields present. This fingerprint is consistent across sessions and depends only on the software implementation – the cryptographic library and version, the supported cipher suites and extensions, and so on.
Since different TLS applications almost always differ slightly in how they implement TLS, knowing the fingerprint often reveals the application.
The standard Tor client has distinctive JA3 and JA3S fingerprints that appear to be used exclusively by the Tor protocol. This allows GFW to identify Tor TLS sessions easily.
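As a concrete illustration, a JA3 fingerprint is the MD5 hash of a comma-separated string built from five ClientHello fields, each written as dash-separated decimal values. The sketch below follows that published recipe; the function names are my own, parsing the ClientHello itself is out of scope, and details such as filtering GREASE values are omitted.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 input string.

    Field order: TLS version, cipher suites, extensions, elliptic curves,
    elliptic curve point formats. Lists keep their on-the-wire order.
    """
    lists = (ciphers, extensions, curves, point_formats)
    parts = [str(version)] + ["-".join(str(v) for v in values) for values in lists]
    return ",".join(parts)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Return the 32-character hex MD5 digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Because the lists keep their on-the-wire order, two clients that support the same cipher suites but advertise them in a different order still produce different fingerprints.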
Packet Size
Tor has highly distinctive packet sizes. Every payload is arranged into fixed-size units called “Tor cells” – 512 or 514 bytes, depending on the link protocol version – with larger payloads fragmented and smaller payloads padded with null bytes. The actual packets on the wire are slightly larger due to TLS and TCP headers, but they still exhibit fixed, predictable sizes. This design was intended to hinder traffic analysis by making all cells uniform. Ironically, it makes Tor traffic extremely identifiable to GFW.
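The fragment-and-pad behavior can be sketched as follows. This is a simplification: real cells also carry header fields, so the usable payload per cell is smaller than the cell size, and the `to_cells` helper is a hypothetical name.

```python
CELL_SIZE = 512  # the article's smaller cell size; 514 is the other variant


def to_cells(payload: bytes, cell_size: int = CELL_SIZE) -> list[bytes]:
    """Split a payload into fixed-size cells, padding the last with null bytes.

    Cell headers are ignored here for simplicity.
    """
    cells = []
    for offset in range(0, len(payload), cell_size):
        chunk = payload[offset:offset + cell_size]
        cells.append(chunk.ljust(cell_size, b"\x00"))
    return cells
```

Whether the application sends 10 bytes or 1,000, every unit handed to the TLS layer has the same length, which is exactly the regularity a traffic classifier can look for.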
Handshake Pattern
Tor exhibits a distinctive handshake pattern. Once a TLS session is established, the client must perform three ntor handshakes to exchange keys with each relay. This bursty behavior is distinguishable from normal TLS traffic.
Moreover, these handshakes have specific timing characteristics dictated by Tor’s topology: the first handshake (directly with the entry relay) completes fastest, while subsequent handshakes take progressively longer due to additional network hops.
Combining these three DPI methodologies, GFW can identify Tor bridges dynamically with high accuracy.
Active Probing
As with Shadowsocks, GFW actively probes suspected Tor bridges to confirm its passive DPI analysis. Given Tor’s public-facing nature, active probing is straightforward and highly accurate.
GFW establishes a TLS session with the suspected bridge and attempts to build a Tor circuit following the standard handshake process. If the circuit is successfully built, GFW can be 100% certain that the server is running Tor and blocks its IP address and port. Unlike Shadowsocks probing, which requires multiple attempts and is not always conclusive, Tor probing needs only a single successful connection to make a definitive determination.
Due to this accuracy and high confidence, GFW can afford to be lenient with blocking duration and scope. Studies have found that Tor bridge blocks typically last only 12 hours if no further Tor activity is detected, and blocks are always specific to the IP and port tuple. This demonstrates GFW’s confidence in its Tor detection – it can minimize collateral damage because false positives are rare.
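The observed blocking behavior can be modeled as a small data structure. This is a hypothetical model of what the studies report (blocks keyed on the IP and port tuple that expire after 12 hours without renewed detection), not a description of how GFW is actually implemented; the class and method names are made up.

```python
BLOCK_SECONDS = 12 * 60 * 60  # 12-hour block duration reported in the text


class BlockList:
    """Per-(ip, port) blocks that expire unless the bridge is re-confirmed."""

    def __init__(self, ttl: int = BLOCK_SECONDS) -> None:
        self.ttl = ttl
        self._expiry = {}

    def confirm_bridge(self, ip: str, port: int, now: float) -> None:
        """Called after a successful active probe; starts or renews the block."""
        self._expiry[(ip, port)] = now + self.ttl

    def is_blocked(self, ip: str, port: int, now: float) -> bool:
        """True while the block on this exact (ip, port) has not expired."""
        expiry = self._expiry.get((ip, port))
        return expiry is not None and now < expiry
```

Keying on the full tuple means other services on the same IP address stay reachable, which keeps collateral damage low.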
Pluggable Transports
Tor includes a built-in mechanism for incorporating censorship circumvention technologies called Pluggable Transports (PTs). PTs are not specific circumvention methods, but rather a framework built around the idea of modular subprocesses that transform Tor traffic to evade censorship. Any newly invented circumvention protocol can be “plugged into” the core Tor ecosystem.
PTs work similarly to Shadowsocks: they run as SOCKS proxies on the client side to obfuscate outgoing traffic. A symmetric component runs on the server side in front of the server application. In the context of Tor, the server-side component typically runs on bridges acting as entry relays for users in censored networks.
The obfs Family
The obfs family of pluggable transports is analogous to Shadowsocks in Tor’s world – it aims to make Tor traffic look like random bytes. Chronologically, obfs predates Shadowsocks.
obfs2 addressed TLS fingerprinting and other pattern-matchable characteristics by wrapping Tor traffic, including the TLS handshake phase, in an additional layer of encryption, thereby hiding the fingerprint. Unlike Shadowsocks, which relies on a pre-shared secret, obfs2 negotiates its keys at connection start. However, this negotiation is not cryptographically secure by design: the key material is derived from values sent in the clear, so a censor that observes the start of a connection can decrypt the obfs2 layer, though doing so requires additional complexity and computation. The key exchange phase itself is also an identifiable pattern that censors can exploit.
obfs3 replaced the naive encryption with a proper Diffie-Hellman key exchange, making the encryption cryptographically secure. With obfs3, Tor traffic runs through five layers of encryption: the obfs3 layer, the TLS layer, and the three onion layers from Tor itself. A passive censor can no longer decipher the traffic even with additional compute, which completely hides TLS fingerprints. However, packet size and handshake timing patterns remained detectable, and active probing vulnerabilities persisted.
ScrambleSuit represented a major step forward. Its primary improvement was probe resistance: when a ScrambleSuit-capable bridge shares its identity, it includes a secret (effectively a password) that clients must possess for authentication. Any connection attempt that cannot prove possession of the secret cannot establish a Tor circuit through the bridge, thereby defeating active probers.
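The idea of proving possession of a shared secret can be sketched with an HMAC. This is a simplified illustration of the principle, not the actual wire format of the protocol: the message layout and function names are assumptions, and replay protection is omitted.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32  # SHA-256 output size


def client_hello(secret: bytes) -> bytes:
    """Client: send a random nonce plus an HMAC over it keyed by the secret."""
    nonce = os.urandom(NONCE_LEN)
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()
    return nonce + tag


def bridge_accepts(secret: bytes, hello: bytes) -> bool:
    """Bridge: proceed only if the tag proves knowledge of the secret.

    A real bridge would stay silent on failure, so a prober learns nothing.
    """
    if len(hello) != NONCE_LEN + TAG_LEN:
        return False
    nonce, tag = hello[:NONCE_LEN], hello[NONCE_LEN:]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

A prober that has discovered the bridge's IP address and port through DPI, but never received the secret out of band, cannot produce a valid tag and therefore cannot confirm that the server speaks Tor.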
Beyond probe resistance, ScrambleSuit introduced “protocol polymorphism” – additional obfuscation targeting packet timing and size characteristics. It adds random padding bytes to packets and timing jitter to hide the distinctive patterns of the underlying Tor protocol.
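The size-obfuscation half of this idea can be sketched with a simple framing scheme. The frame layout, the `MAX_PAD` limit, and the function names here are hypothetical. In a real transport the whole frame, including the length fields, would be encrypted so that the padding cannot be told apart from the payload. Timing jitter is not shown.

```python
import os
import secrets
import struct

MAX_PAD = 255  # hypothetical upper bound on padding per frame
HEADER = struct.Struct("!HH")  # payload length, padding length


def pad_frame(payload: bytes, max_pad: int = MAX_PAD) -> bytes:
    """Append a random amount of random padding so frame sizes vary."""
    pad_len = secrets.randbelow(max_pad + 1)
    return HEADER.pack(len(payload), pad_len) + payload + os.urandom(pad_len)


def unpad_frame(frame: bytes) -> bytes:
    """Recover the payload and discard the padding."""
    payload_len, _pad_len = HEADER.unpack(frame[:HEADER.size])
    return frame[HEADER.size:HEADER.size + payload_len]
```

With this in place, a stream of identical 512-byte cells no longer produces identical packet sizes on the wire.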
obfs4 is Tor’s current state-of-the-art obfuscation solution. It evolved from ScrambleSuit, transforming it from an academic research protocol into a production-ready system. obfs4 inherits ScrambleSuit’s core principles of probe resistance and protocol polymorphism while making various improvements to the Diffie-Hellman process and authentication method for better performance and cryptographic properties. obfs4 is the standard obfuscation pluggable transport for Tor today.
meek
meek is a domain fronting solution for Tor. Instead of connecting to a bridge directly, the meek client makes a seemingly legitimate HTTPS request to a frontend server such as a CDN provider, with a benign domain name in the TLS SNI and the actual bridge destination hidden in the HTTP Host header inside the encrypted session. The CDN provider forwards the request to the actual Tor bridge.
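The split between what the censor sees and what the CDN sees can be illustrated without any network access. This sketch only models the two views; the domain names are placeholders, the function names are hypothetical, and a real client would carry the request inside an actual TLS session.

```python
def fronted_request(front_domain: str, hidden_host: str, path: str = "/") -> dict:
    """Model a domain-fronted HTTPS request.

    'sni' travels in the clear in the TLS ClientHello; 'encrypted_http'
    stands for the bytes sent inside the TLS session.
    """
    request = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")
    return {"sni": front_domain, "encrypted_http": request}


def censor_view(connection: dict) -> dict:
    """What an on-path observer learns: the SNI and the ciphertext size."""
    return {
        "sni": connection["sni"],
        "ciphertext_len": len(connection["encrypted_http"]),
    }
```

For `fronted_request("cdn.example", "bridge.example")`, the censor's view contains only `cdn.example`. The CDN, which terminates TLS, reads the Host header and routes the request onward.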
meek was gradually phased out starting in 2018 after major cloud providers began disabling domain fronting.
Closing Thoughts
Tor was not invented for censorship circumvention, but its philosophy of Internet privacy, anonymity, and freedom has driven significant innovation in this space. Many circumvention techniques later adopted outside the Tor ecosystem, such as obfuscation and domain fronting, closely resemble approaches pioneered by the Tor community.
Similarly, many of GFW’s censorship methods – active probing, TLS fingerprinting, packet size and timing analysis – were first deployed against Tor before being applied more broadly. Tor served as an experimental proving ground for both sides of the arms race.
Beyond the circumvention techniques themselves, Tor’s pluggable transport architecture has seen wide adoption outside the Tor ecosystem. Although protocols like the obfs family and meek were designed for Tor, they are modular components decoupled from the Tor protocol itself – nothing prevents them from being used independently. This architectural insight – the separation of obfuscation transport from the underlying application protocol – is the foundational idea behind V2Ray, one of the most widely used censorship circumvention platforms today.
References
- Networks at ITP, NYU. Demystifying the Dark Web: An Introduction to Tor and Onion Routing. https://itp.nyu.edu/networks/explanations/demystifying-the-dark-web-an-introduction-to-tor-and-onion-routing/
- Philipp Winter and Stefan Lindskog. How the Great Firewall of China is Blocking Tor. In 2nd USENIX Workshop on Free and Open Communications on the Internet (FOCI 12). https://www.usenix.org/conference/foci12/workshop-program/presentation/winter
- The Tor Project. Tor Specifications. https://spec.torproject.org/
- The Tor Project. Where to find Tor bridges? https://support.torproject.org/tor-browser/circumvention/getting-bridges/
- Tor source code. https://gitlab.torproject.org/tpo/core/tor
- Tim Wilde. Great Firewall Tor Probing Circa 09 DEC 2011. https://gist.github.com/twilde/da3c7a9af01d74cd7de7
- Salesforce. TLS Fingerprinting with JA3 and JA3S. https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967/
- Yawning Angel. obfs4. https://gitlab.com/yawning/obfs4/-/tree/master
- Philipp Winter, Tobias Pulls, and Juergen Fuss. 2013. ScrambleSuit: a polymorphic network protocol to circumvent censorship. In Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society (WPES ’13). https://doi.org/10.1145/2517840.2517856
- The Tor Project. meek. https://gitlab.torproject.org/legacy/trac/-/wikis/doc/meek