GFW Technical Review 03 – Deep Packet Inspection
The GFW is far more than a traditional firewall. Architecturally, it resembles a large-scale Intrusion Detection and Prevention System (IDPS) of the kind deployed in enterprise networks, government agencies, and financial institutions: it observes traffic, analyzes behavior, and identifies policy violations.
At the heart of any IDPS lies the ability to understand traffic behavior and intent. That ability is Deep Packet Inspection (DPI): examining every connection and packet to decide whether the communication should be allowed or blocked.
What distinguishes the GFW is not the concept but the scale. The throughput and geographic coverage it must support dwarf any enterprise IDPS deployment, and off-the-shelf commercial architectures cannot keep up. The GFW’s response is a design built around distribution, parallelism, and raw performance.
Load Distribution
To manage the enormous traffic volume on national backbone links, the GFW deploys a distributed, on-path architecture backed by datacenter-scale compute.
As described in the first post, the GFW primarily uses passive network taps to obtain a copy of traffic on backbone links. Multiple tapped streams are aggregated and load-balanced into parallel data pipelines, likely keyed on the flow 5-tuple: source IP, source port, destination IP, destination port, and transport protocol. Provided the hash treats the two endpoints symmetrically, both directions of a connection land in the same pipeline, so per-flow state never has to cross pipelines.
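The 5-tuple keying described above can be sketched as follows. This is a minimal illustration, not the GFW's actual balancer: the function name `pipeline_for`, the choice of BLAKE2 as the hash, and the pipeline count are all assumptions. The one point it tries to show is that the endpoints are put in a canonical order before hashing, since a plain hash of (source, destination) would send the two directions of a flow to different pipelines.

```python
import hashlib
import ipaddress


def pipeline_for(src_ip, src_port, dst_ip, dst_port, proto, num_pipelines):
    """Return a pipeline index that is identical for both directions of a flow.

    Hypothetical helper: the name, hash function, and parameters are
    illustrative assumptions, not a description of the real system.
    """
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    # Sort the two endpoints so (A -> B) and (B -> A) build the same key.
    lo, hi = sorted((a, b))
    key = (
        lo[0] + lo[1].to_bytes(2, "big")
        + hi[0] + hi[1].to_bytes(2, "big")
        + bytes([proto])  # IP protocol number, e.g. 6 for TCP
    )
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_pipelines


forward = pipeline_for("10.0.0.1", 51000, "93.184.216.34", 443, 6, 16)
reverse = pipeline_for("93.184.216.34", 443, "10.0.0.1", 51000, 6, 16)
print(forward == reverse)
```

Hardware balancers usually achieve the same effect with a symmetric hash in the NIC or switch rather than in software; the sorting trick here is only the easiest way to show the property.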
Each pipeline feeds a cluster of DPI sensors (or IDS sensors), each running a different analysis algorithm. Sensors on the same stream can share intermediate state to refine a detection result. Their outputs (scores, classification labels, rule violations) flow downstream to logging, alert generation, and active response measures such as packet injection.
Packet Processing
On a conventional host, packets traverse the kernel’s networking stack and are copied multiple times along the way: NIC buffers to kernel memory, then kernel memory to user space. That overhead is prohibitive at backbone throughput. The GFW sidesteps it with zero-copy ingestion, using modified NIC drivers to DMA packets directly into user-space buffers shared with DPI processes, bypassing the kernel entirely.
Most protocols cannot be identified from a single packet, so DPI engines must reassemble TCP streams. To do this, the GFW runs a lightweight TCP/IP stack in user space, with extensive optimization to track millions of concurrent flows at line rate.
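To make the reassembly step concrete, here is a minimal sketch of a one-direction stream reassembler. The class name `StreamReassembler` and its policy choices are assumptions for illustration: it keeps the first copy of any segment that starts at a given sequence number, ignores sequence-number wraparound, and does no window or checksum validation. A production stack would need to handle all of these.

```python
class StreamReassembler:
    """Reassemble one direction of a TCP flow from out-of-order segments.

    Hypothetical sketch: no wraparound handling, no window checks, and a
    first-copy-wins policy for segments starting at the same sequence number.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # next in-order sequence number expected
        self.pending = {}            # seq -> payload, buffered until usable
        self.stream = bytearray()    # contiguous bytes reassembled so far

    def add_segment(self, seq, payload):
        if seq + len(payload) <= self.next_seq:
            return  # empty segment, or data that was already delivered
        self.pending.setdefault(seq, payload)
        self._drain()

    def _drain(self):
        for seq in sorted(self.pending):
            if seq > self.next_seq:
                break  # gap: wait for the missing segment
            chunk = self.pending.pop(seq)
            # Drop any prefix that overlaps bytes already in the stream.
            fresh = chunk[self.next_seq - seq:]
            self.stream += fresh
            self.next_seq += len(fresh)


r = StreamReassembler(initial_seq=1000)
r.add_segment(1005, b"index")   # arrives early, buffered
r.add_segment(1003, b" /in")    # overlaps both neighbours, buffered
r.add_segment(1000, b"GET /")   # fills the gap, everything drains
print(bytes(r.stream))          # b'GET /index'
```

How overlapping or conflicting segments are resolved is a policy decision, and real endpoints do not all make the same one. Any such choice in a monitor is a place where its view of the stream can diverge from the server's.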
This design introduces new attack surfaces. As the next post explores, a hand-rolled user-space TCP stack can fall out of sync with real endpoints; carefully crafted packet sequences can trigger parsing errors or desynchronization between what the DPI engine sees and what the actual server processes.
Unlike a conventional TCP/IP stack, which must handle full bidirectional communication, the GFW only needs to parse traffic flowing in one direction. That simplification lets the system fan out parallel TCP-stack instances across cores, another expression of the GFW’s distribution-and-parallelism philosophy.
DPI Methodologies
Early DPI techniques fell into three categories:
Pattern Matching
String matching on keywords, URL substrings, hostnames, or protocol signatures. Conceptually straightforward, but reaching backbone-grade throughput requires highly optimized algorithms and fast, memory-efficient state machines.
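One well-known way to match many keywords in a single pass is the Aho-Corasick automaton. The sketch below is a plain-Python version meant only to show the idea; nothing in the sources confirms which algorithm the GFW uses, and the class name and patterns are illustrative. High-throughput engines typically compile such automata into dense tables or offload them to hardware.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: one pass over the data, all patterns at once."""

    def __init__(self, patterns):
        self.goto = [{}]   # per-state transitions: byte value -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns recognised on entering each state
        for p in patterns:
            self._insert(p)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for b in pattern:
            nxt = self.goto[state].get(b)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][b] = nxt
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for b, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(b, 0)
                # Inherit matches that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, data):
        """Return (start_offset, pattern) for every match in data."""
        state = 0
        hits = []
        for i, b in enumerate(data):
            while state and b not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(b, 0)
            for p in self.out[state]:
                hits.append((i - len(p) + 1, p))
        return hits


matcher = AhoCorasick([b"he", b"she", b"hers"])
print(matcher.search(b"ushers"))
# [(1, b'she'), (2, b'he'), (2, b'hers')]
```

The cost of a search is proportional to the length of the input plus the number of matches, independent of how many patterns are loaded, which is why this family of automata suits large keyword lists.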
Protocol Identification
Even in its early iterations, the GFW could detect well-known application-layer protocols (HTTP, SMTP, FTP) along with early circumvention protocols like traditional VPNs and tools such as Freegate. As discussed in the previous post, those early circumvention methods offered little or no traffic obfuscation, making them trivial to identify with protocol heuristics or signature matching.
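Signature matching against unobfuscated protocols can be as simple as checking the first bytes a client sends. The following is a rough sketch under that assumption; the function name `identify_protocol` and the specific signature list are illustrative, and a real classifier would look at far more than the opening command.

```python
# First client-side tokens of a few cleartext protocols (illustrative subset).
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ", b"OPTIONS ", b"CONNECT ")
SMTP_GREETINGS = (b"EHLO ", b"HELO ")
FTP_LOGIN = (b"USER ",)


def identify_protocol(first_bytes):
    """Guess the application protocol from the start of the client's stream.

    Hypothetical heuristic: upper-cases the prefix for a lenient comparison
    and returns "unknown" when nothing matches.
    """
    head = first_bytes[:16].upper()
    if head.startswith(HTTP_METHODS):
        return "http"
    if head.startswith(SMTP_GREETINGS):
        return "smtp"
    if head.startswith(FTP_LOGIN):
        return "ftp"
    return "unknown"


print(identify_protocol(b"GET /index.html HTTP/1.1\r\n"))  # http
print(identify_protocol(b"EHLO mail.example.org\r\n"))     # smtp
print(identify_protocol(b"\x8f\x02\x11\xa4\x90"))          # unknown
```

A protocol with a fixed, readable handshake gives this kind of check everything it needs, which is why tools without obfuscation were easy targets.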
Port Matching
The GFW also looks at port numbers. OpenVPN, for example, defaults to UDP/1194. This is easily sidestepped: port numbers are conventions, and any circumvention tool can pick a different one.
Residual Censorship
Residual censorship is a relatively recent technique adopted by the GFW. It acts as a punitive mechanism: once the GFW detects and blocks a connection it considers suspicious, it continues to block subsequent attempts between the same endpoints for a short period of time, even if those later attempts are completely benign. In effect, the endpoints become temporarily blacklisted.
Empirical observations suggest that the GFW keys this temporary blacklist using a 3-tuple: (client IP, server IP, server port). This choice reflects a practical compromise. By applying a coarse-grained but short-lived blacklist, the GFW can significantly increase blocking effectiveness while limiting the collateral damage caused by potential misclassification. The result is an enforcement mechanism that is aggressive in the moment yet self-corrects quickly enough to avoid long-term disruption to normal traffic.
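The 3-tuple blacklist can be pictured as a table of expiry times. The sketch below assumes a fixed time-to-live and lazy expiry on lookup; the class name `ResidualBlocklist` and the 90-second value are placeholders, since the text above only says the block lasts "a short period of time".

```python
class ResidualBlocklist:
    """Temporary blacklist keyed on (client IP, server IP, server port).

    Hypothetical sketch: the TTL is an illustrative placeholder and entries
    are removed lazily when they are next looked up.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.expiry = {}  # (client_ip, server_ip, server_port) -> expiry time

    def punish(self, client_ip, server_ip, server_port, now):
        # Called when a connection is classified as a violation.
        self.expiry[(client_ip, server_ip, server_port)] = now + self.ttl

    def is_blocked(self, client_ip, server_ip, server_port, now):
        key = (client_ip, server_ip, server_port)
        deadline = self.expiry.get(key)
        if deadline is None:
            return False
        if now >= deadline:
            del self.expiry[key]  # expired: the endpoints are usable again
            return False
        return True


bl = ResidualBlocklist(ttl_seconds=90)
bl.punish("10.0.0.1", "203.0.113.5", 443, now=0)
print(bl.is_blocked("10.0.0.1", "203.0.113.5", 443, now=30))    # True
print(bl.is_blocked("10.0.0.1", "203.0.113.5", 8443, now=30))   # False
print(bl.is_blocked("10.0.0.1", "203.0.113.5", 443, now=120))   # False
```

Because the client port is not part of the key, a new connection from a fresh ephemeral port is still blocked, while a different server port or a different client is unaffected. That is the coarse-but-bounded trade-off the 3-tuple buys.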
Closing Thoughts
Deep Packet Inspection forms the core analytical capability of the GFW. As both the GFW and circumvention tools evolve, detection methodologies have grown more sophisticated. But the architectural choices that make inspection at backbone scale possible, particularly the parallel DPI pipelines and user-space TCP reassembly, also introduce significant structural weaknesses.
The next post examines how those weaknesses arise and how circumvention protocols exploit inconsistencies in the GFW’s TCP reassembly logic.
References
- Sheharbano Khattak, Mobin Javed, Philip D. Anderson, Vern Paxson. Towards Illuminating a Censorship Monitor’s Model to Facilitate Evasion. 3rd USENIX Workshop on Free and Open Communications on the Internet (FOCI 13). https://www.usenix.org/conference/foci13/workshop-program/presentation/khattak
- B. Mukherjee, L. T. Heberlein and K. N. Levitt. 1994. Network intrusion detection. IEEE Network. https://ieeexplore.ieee.org/abstract/document/283931
- 深入理解GFW: 内部结构. http://gfwrev.blogspot.com/2010/02/gfw.html
- 陈训逊, 方滨兴, 李蕾. 高速网络环境下入侵检测系统结构研究. 计算机研究与发展. https://www.icir.org/christian/outback/fang.pdf
- 张兆心, 方滨兴, 胡铭曾. 支持IDS的高速网络信息获取体系结构. 北京邮电大学学报. https://journal.bupt.edu.cn/EN/article/downloadArticleFile.do?attachType=PDF&id=1712