GFW Technical Review 02 – VPN

When the Great Firewall first deployed its earliest filtering mechanisms (IP blocking, DNS poisoning, keyword-based DPI), the Internet was still dominated by plaintext protocols. The GFW had broad visibility into traffic contents, but its dependence on that visibility left it vulnerable to anything that encrypted or encapsulated those contents. VPNs were never designed for circumvention, but they offered exactly that: a mature, standardized way to wrap traffic inside an encrypted tunnel.

Once the traffic was wrapped, the GFW’s early toolkit ran out of leverage. Destinations were hidden, so IP and DNS blocks missed. Payloads were encrypted, so keyword DPI had nothing to match. The assumption at the time, largely correct, was that encrypted tunnels would look “opaque enough” to slip past early DPI.


How VPNs Work

A VPN takes arbitrary IP packets, wraps them inside another protocol, and usually encrypts the inner payload. The client exposes a virtual tunnel interface that behaves like a normal network adapter. Applications send packets to this interface as if they were going straight to the Internet, but each one gets wrapped and encrypted before leaving the machine. Conceptually, a VPN packet looks like this:

Outer IP header → Transport header (TCP/UDP) → Encrypted VPN payload (inner IP packet)

The outer IP header points to the VPN server, usually located outside the censored network. The server decrypts each payload, recovers the original IP packet, and forwards it to its intended destination. As a result, the only traffic visible to the GFW is the encrypted, encapsulated VPN tunnel. The destination IPs, DNS queries (when they are routed through the tunnel), and application-layer content are hidden.

Not everything is concealed, however: the outer IP and transport headers remain plaintext. If GFW operators know the IP address of a VPN endpoint, they can still block it outright.
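
To make the layering concrete, here is a minimal Python sketch of the encapsulation described above. It is an illustration under loud assumptions, not how any real VPN is implemented: the cipher is a toy SHA-256 keystream standing in for real encryption, checksums are left at zero, and the function names, source port, and addresses (taken from documentation ranges) are all hypothetical.

```python
import hashlib
import socket
import struct


def ipv4_header(src: str, dst: str, payload_len: int, proto: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum left at 0 for brevity)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len, 0, 0, 64, proto, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher: XOR with a SHA-256 counter keystream. NOT secure.

    XOR is symmetric, so the same call also decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encapsulate(inner_packet: bytes, key: bytes, client_ip: str,
                server_ip: str, port: int = 1194) -> bytes:
    """Outer IP header + UDP header + encrypted inner IP packet."""
    ciphertext = toy_encrypt(key, inner_packet)
    # UDP header: source port (arbitrary), destination port, length, checksum.
    udp = struct.pack("!HHHH", 40000, port, 8 + len(ciphertext), 0)
    outer = ipv4_header(client_ip, server_ip, len(udp) + len(ciphertext), proto=17)
    return outer + udp + ciphertext


def observer_view(wire: bytes) -> dict:
    """What an on-path observer can read without the key: outer headers only."""
    outer_dst = socket.inet_ntoa(wire[16:20])
    _src_port, dst_port = struct.unpack("!HH", wire[20:24])
    return {"outer_dst": outer_dst, "dst_port": dst_port,
            "opaque_bytes": len(wire) - 28}


if __name__ == "__main__":
    body = b"GET /blocked-keyword HTTP/1.1\r\n\r\n"
    inner = ipv4_header("10.8.0.2", "198.51.100.7", len(body), proto=6) + body
    wire = encapsulate(inner, b"shared-key", "192.0.2.10", "203.0.113.5")
    print(observer_view(wire))
    print("keyword visible on the wire:", b"blocked-keyword" in wire)
```

The observer recovers the VPN server's address and port from the outer headers, which is exactly the plaintext metadata that remains available for blocking, while the inner destination and the keyword sit inside bytes it cannot interpret.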


Common VPN Protocols

Different VPN protocols encapsulate traffic in different ways, which also influences how detectable they are.

IPsec

One of the earliest standardized VPN protocols. It protects packets with ESP (Encapsulating Security Payload) or AH (Authentication Header) and relies on IKE (Internet Key Exchange) for key negotiation. IPsec has recognizable packet formats: ESP and AH travel as their own IP protocols (50 and 51), IKE commonly uses UDP port 500, and NAT traversal moves traffic to UDP port 4500. All of this makes it relatively easy to fingerprint.
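
As a rough illustration of how little work this kind of fingerprinting takes, the sketch below labels a packet from its outer headers alone. The function name and labels are hypothetical, and a port or protocol-number match is only a hint, not proof that a flow is IPsec; this is not a description of what the GFW actually runs.

```python
from typing import Optional

# IANA-assigned IP protocol numbers.
IPPROTO_UDP = 17
IPPROTO_ESP = 50
IPPROTO_AH = 51


def classify_ipsec(ip_proto: int, dst_port: Optional[int] = None) -> Optional[str]:
    """Return a coarse IPsec label based only on plaintext outer headers."""
    if ip_proto == IPPROTO_ESP:
        return "ESP"
    if ip_proto == IPPROTO_AH:
        return "AH"
    if ip_proto == IPPROTO_UDP and dst_port == 500:
        return "IKE"
    if ip_proto == IPPROTO_UDP and dst_port == 4500:
        return "IKE/ESP over NAT traversal"
    return None  # nothing IPsec-like in the outer headers


if __name__ == "__main__":
    for proto, port in [(50, None), (17, 500), (17, 4500), (17, 53)]:
        print(proto, port, classify_ipsec(proto, port))
```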

OpenVPN

A later protocol that reuses TLS for its handshake and cryptographic layer. OpenVPN can run over UDP or TCP and defaults to UDP port 1194. The handshake borrows from TLS, but OpenVPN wraps it in its own record format, in which every packet starts with a plaintext opcode at a fixed offset, so even the cryptographic exchange does not look like HTTPS on the wire.

L2TP/PPTP

Older tunneling technologies widely supported in early consumer devices. PPTP relied on GRE and MS-CHAP, while L2TP typically ran on top of IPsec. Both fell out of favor over time: PPTP’s authentication was broken by the mid-2000s, and the L2TP/IPsec stack paid a double-encapsulation penalty that newer protocols avoided.

SSH Tunneling

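Not technically a VPN protocol but functionally similar. Users run an SSH client on one end and an SSH server on the other, encapsulating traffic inside an encrypted SSH stream. SSH is simpler to set up, but it is just as identifiable: each side opens the connection with a plaintext version string (for example, one beginning with “SSH-2.0-”), and the handshake that follows has a recognizable pattern.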
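
The most obvious SSH signal is the plaintext identification string each side sends before any encryption starts. The sketch below, with a hypothetical function name, checks the first bytes of a TCP stream for that string. It deliberately ignores the corner case where a server sends other lines before its identification string.

```python
from typing import Optional, Tuple


def parse_ssh_banner(first_bytes: bytes) -> Optional[Tuple[str, str]]:
    """Return (protocol_version, software_version) if the stream opens with an
    SSH identification string of the form 'SSH-<proto>-<software> [comments]',
    otherwise None."""
    line = first_bytes.split(b"\n", 1)[0].rstrip(b"\r")
    if not line.startswith(b"SSH-"):
        return None
    parts = line.decode("ascii", errors="replace").split("-", 2)
    if len(parts) < 3:
        return None
    proto_version = parts[1]
    software = parts[2].split(" ", 1)[0]  # drop optional trailing comments
    return proto_version, software


if __name__ == "__main__":
    print(parse_ssh_banner(b"SSH-2.0-OpenSSH_9.6\r\n"))
    print(parse_ssh_banner(b"\x16\x03\x01\x02\x00"))  # start of a TLS record
```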

Figure: SSH tunneling works much like a VPN.

Case Study: OpenVPN

OpenVPN is a representative classic VPN protocol and a clean example of how early VPN traffic looked “on the wire.”

OpenVPN connections are stateful. The client opens with a “Client Reset” packet; the server replies with a “Server Reset.” This exchange establishes a session ID and sets up a control channel. The two sides then run a TLS handshake over that control channel, typically using OpenSSL for cryptography. Once the TLS session is up and keys are exchanged, the data channel activates and encrypted payloads begin to flow.
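
The ordering of these phases can be modeled as a small state machine. This is a simplified sketch of the sequence described above, not OpenVPN's actual implementation: the phase and event names are hypothetical labels, and retransmissions, acknowledgements, and renegotiation are left out.

```python
from enum import Enum, auto
from typing import Iterable


class Phase(Enum):
    IDLE = auto()
    CLIENT_RESET_SENT = auto()
    CONTROL_READY = auto()   # resets exchanged, session IDs known
    TLS_HANDSHAKE = auto()   # TLS running over the control channel
    DATA = auto()            # keys exchanged, data channel active


# (current phase, observed event) -> next phase
TRANSITIONS = {
    (Phase.IDLE, "client_reset"): Phase.CLIENT_RESET_SENT,
    (Phase.CLIENT_RESET_SENT, "server_reset"): Phase.CONTROL_READY,
    (Phase.CONTROL_READY, "tls_record"): Phase.TLS_HANDSHAKE,
    (Phase.TLS_HANDSHAKE, "tls_record"): Phase.TLS_HANDSHAKE,
    (Phase.TLS_HANDSHAKE, "keys_ready"): Phase.DATA,
    (Phase.DATA, "data"): Phase.DATA,
}


def advance(phase: Phase, event: str) -> Phase:
    """Apply one event, rejecting anything that arrives out of order."""
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"unexpected {event!r} in phase {phase.name}") from None


def run(events: Iterable[str]) -> Phase:
    """Replay a sequence of events starting from IDLE."""
    phase = Phase.IDLE
    for event in events:
        phase = advance(phase, event)
    return phase


if __name__ == "__main__":
    session = ["client_reset", "server_reset", "tls_record", "tls_record",
               "keys_ready", "data", "data"]
    print(run(session).name)
```

Because each phase only accepts specific events, the same table that drives an endpoint also tells an observer which packet sequences are plausible for a session.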

Figure: OpenVPN session establishment.

Although OpenVPN encrypts the data channel, it was never designed to hide the fact that it is a VPN. Several aspects remain visible:

  • The control channel is not further obfuscated beyond standard TLS.
  • The data channel exposes certain fields, such as opcode and Key ID, in plaintext.
  • Opcodes appear at fixed offsets in packets and follow predictable sequences.

Figure: OpenVPN wire packet format, with the opcode exposed in plaintext.
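
Reading those plaintext fields takes only a few lines. The sketch below assumes the commonly documented layout, in which the first byte of each OpenVPN packet carries the opcode in its upper five bits and the Key ID in its lower three, and in which TCP mode prefixes each packet with a 16-bit length. The opcode table reflects my understanding of OpenVPN's protocol definitions and should be checked against the version you care about; the function name is hypothetical.

```python
import struct
from typing import Tuple

# Opcode values as understood from OpenVPN's protocol definitions (assumption).
OPCODES = {
    1: "P_CONTROL_HARD_RESET_CLIENT_V1",
    2: "P_CONTROL_HARD_RESET_SERVER_V1",
    3: "P_CONTROL_SOFT_RESET_V1",
    4: "P_CONTROL_V1",
    5: "P_ACK_V1",
    6: "P_DATA_V1",
    7: "P_CONTROL_HARD_RESET_CLIENT_V2",
    8: "P_CONTROL_HARD_RESET_SERVER_V2",
    9: "P_DATA_V2",
    10: "P_CONTROL_HARD_RESET_CLIENT_V3",
    11: "P_CONTROL_WKC_V1",
}


def parse_openvpn_header(packet: bytes, over_tcp: bool = False) -> Tuple[int, str, int]:
    """Return (opcode, opcode_name, key_id) from the plaintext first byte."""
    if over_tcp:
        if len(packet) < 2:
            raise ValueError("truncated TCP length prefix")
        (length,) = struct.unpack("!H", packet[:2])
        packet = packet[2:2 + length]
    if not packet:
        raise ValueError("empty packet")
    first = packet[0]
    opcode = first >> 3      # upper 5 bits
    key_id = first & 0x07    # lower 3 bits
    return opcode, OPCODES.get(opcode, "UNKNOWN"), key_id


if __name__ == "__main__":
    print(parse_openvpn_header(bytes([0x38]) + b"\x00" * 13))
    print(parse_openvpn_header(b"\x00\x01\x40", over_tcp=True))
```

No key material is needed at any point, which is why these fields are so convenient for a passive observer.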

For an adversary like the GFW, these characteristics are reliable identification vectors. Xue et al. (USENIX Security 2022) demonstrated this concretely, identifying the vast majority of OpenVPN flows in real ISP traffic by combining opcode-based filtering with lightweight active probing. As DPI matured, OpenVPN’s distinct “fingerprint” became trivial to detect and block.
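
To give a feel for what opcode-based filtering might look like, here is a deliberately simplified heuristic. It is not the algorithm from Xue et al., it omits active probing entirely, and it would produce false positives on real traffic. The function name, the flow representation, and the opcode sets are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed opcode values (see OpenVPN's protocol definitions to verify).
VALID_OPCODES = set(range(1, 12))
CLIENT_RESETS = {1, 7, 10}   # hard reset, client side (V1, V2, V3)
SERVER_RESETS = {2, 8}       # hard reset, server side (V1, V2)


def looks_like_openvpn(flow: List[Tuple[str, bytes]]) -> bool:
    """Check the first packets of a UDP flow for an OpenVPN-like opcode pattern.

    `flow` is a list of (direction, payload) pairs, where direction is
    "c2s" (client to server) or "s2c" (server to client).
    """
    opcodes = [(direction, payload[0] >> 3) for direction, payload in flow if payload]
    first_client = next((op for d, op in opcodes if d == "c2s"), None)
    first_server = next((op for d, op in opcodes if d == "s2c"), None)
    # The flow must open with a client reset answered by a server reset.
    if first_client not in CLIENT_RESETS or first_server not in SERVER_RESETS:
        return False
    # Every later packet must also carry a known opcode in its first byte.
    return all(op in VALID_OPCODES for _, op in opcodes)


if __name__ == "__main__":
    vpn_flow = [("c2s", bytes([7 << 3])), ("s2c", bytes([8 << 3])),
                ("c2s", bytes([4 << 3])), ("s2c", bytes([5 << 3]))]
    tls_flow = [("c2s", b"\x16\x03\x01"), ("s2c", b"\x16\x03\x03")]
    print(looks_like_openvpn(vpn_flow), looks_like_openvpn(tls_flow))
```

A passive filter like this only nominates candidates. In the approach the paper describes, confirmation comes from probing the suspected server and observing how it responds.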


Closing Thoughts

VPNs stood out as the first mature, widely available circumvention option, and their impact still shows in the language: many users call every circumvention tool a “VPN,” regardless of what protocol is underneath.

But VPNs were never purpose-built for censorship resistance. They encrypted the payload without hiding metadata or protocol identity, and once the GFW’s DPI matured, those exposed signals were enough to detect and block them. By late 2012, VPN disruptions in China had become widespread enough to draw international press coverage. The next post turns to those DPI mechanisms, which set the agenda for every circumvention protocol that followed.

