GFW Technical Review 03 – Deep Packet Inspection

The GFW is far more than a traditional firewall. Architecturally, it resembles a large-scale Intrusion Detection and Prevention System (IDPS), similar to those deployed in enterprise networks, government agencies, or financial institutions. Conceptually, it operates much like the security systems banks use to protect internal assets: it observes traffic, analyzes behavior, and identifies policy violations.

The challenges, methodologies, and design trade-offs of IDPS deployments apply directly to the GFW. At the heart of any IDPS lies the ability to understand traffic behavior and intent. This requires Deep Packet Inspection (DPI) – the examination of every connection and packet to determine whether the communication should be allowed or blocked.

What distinguishes the GFW is not its underlying concept, but its scale. The throughput and geographical coverage it must support dwarf any enterprise IDPS deployment, rendering most commercial architectures insufficient. To meet this challenge, the GFW relies on a design that emphasizes distribution, parallelism, and extremely high performance.

Load Distribution

To manage the enormous traffic volume on national backbone links, the GFW employs a distributed, on-path architecture supported by data-center–scale compute resources. Its design is highly scalable and elastic.

As described in the first blog post, the GFW primarily uses passive network taps to obtain a copy of traffic on backbone links. Multiple tapped streams are aggregated and load-balanced into a set of parallel data pipelines. This load balancing is likely performed based on the flow 5-tuple: source IP, source port, destination IP, destination port, and transport protocol.
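A minimal sketch of 5-tuple load balancing, to make the idea concrete. The function names, the choice of BLAKE2 as the hash, and the pipeline count are assumptions for illustration only; the hashing scheme the GFW actually uses is not publicly known. The sketch hashes the 5-tuple as seen on the wire, so every packet of one direction of a flow lands on the same pipeline.

```python
import hashlib
import ipaddress
import struct


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Serialize the 5-tuple into a canonical byte string (hypothetical layout)."""
    return (
        ipaddress.ip_address(src_ip).packed
        + struct.pack("!H", src_port)
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!H", dst_port)
        + bytes([proto])  # IP protocol number, e.g. 6 = TCP, 17 = UDP
    )


def pick_pipeline(src_ip, src_port, dst_ip, dst_port, proto, n_pipelines):
    """Map a flow to one of n_pipelines parallel DPI pipelines."""
    key = flow_key(src_ip, src_port, dst_ip, dst_port, proto)
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_pipelines


# Packets with the same 5-tuple always map to the same pipeline index.
pipeline = pick_pipeline("10.0.0.1", 40000, "203.0.113.5", 443, 6, n_pipelines=8)
```

Because the mapping depends only on the packet header, the load balancer itself needs no per-flow state, which is what makes this approach attractive at backbone scale.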

Each data stream is then processed by a cluster of DPI sensors (or IDS sensors). Each sensor implements a different traffic analysis algorithm. Sensors analyzing the same stream may also share intermediate information or cooperate to refine a detection result. The output of these sensors – whether scores, classification labels, or rule violations – is then forwarded to downstream systems responsible for logging, alert generation, or active response measures such as packet injection.

Logical Topology of GFW’s DPI system, consisting of a load balancer, a cluster of DPI sensors, and downstream services
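The cooperation between sensors described above can be sketched as a set of functions sharing a per-flow context. Everything here (the FlowContext type, the two toy sensors, the confidence labels) is hypothetical and only illustrates the pattern of sensors leaving hints for one another; it is not a description of the GFW's actual sensors.

```python
from dataclasses import dataclass, field


@dataclass
class FlowContext:
    payload: bytes
    dst_port: int
    notes: dict = field(default_factory=dict)     # scratch space shared by sensors
    findings: list = field(default_factory=list)  # (sensor, label, confidence)


def port_sensor(ctx):
    """Leaves a hint for later sensors based on the destination port alone."""
    if ctx.dst_port == 80:
        ctx.notes["port_hint"] = "http"


def http_sensor(ctx):
    """Labels HTTP by payload prefix, refined by the port sensor's hint."""
    if ctx.payload.startswith((b"GET ", b"POST ")):
        agrees = ctx.notes.get("port_hint") == "http"
        ctx.findings.append(("http_sensor", "http", "high" if agrees else "medium"))


def run_sensors(ctx, sensors):
    """Run each sensor in order and return the accumulated findings."""
    for sensor in sensors:
        sensor(ctx)
    return ctx.findings


findings = run_sensors(FlowContext(b"GET / HTTP/1.1\r\n", 80), [port_sensor, http_sensor])
```

In a real deployment the findings would be handed to the downstream logging and response systems rather than returned to the caller.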

Packet Processing

In a conventional system, packets arriving at a host must traverse the kernel’s networking stack and are copied multiple times: from NIC buffers to kernel memory and then into user space. This overhead becomes prohibitive at backbone throughput. To avoid this cost, the GFW implements zero-copy packet ingestion by modifying NIC drivers. Packets are DMA’d directly into user-space memory buffers shared with DPI processes, bypassing the kernel entirely.

The zero-copy stack used in early versions of GFW

Because many protocols cannot be accurately identified from a single packet, DPI engines must perform TCP stream reassembly. To support this, the GFW implements a lightweight TCP/IP stack in user space, capable of reconstructing flows and tracking per-connection state. Numerous optimizations are necessary to enable this stack to handle millions of concurrent flows and sustain high throughput.

However, this design introduces new attack surfaces. As we will explore in the next post, a self-implemented user-space TCP stack can behave subtly differently from the stacks running on real endpoints. Carefully crafted packet sequences can cause parsing errors or desynchronization between the DPI engine and the real network endpoints.

Unlike a conventional TCP/IP stack, which must manage full bidirectional communication, the GFW’s reassembly logic only needs to parse traffic flowing in one direction. This simplification allows the system to run multiple parallel instances of its TCP stack, taking advantage of multi-core architectures. It is yet another example of the GFW’s philosophy of distribution and parallelism.
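As a rough illustration of one-directional stream reassembly, the sketch below buffers out-of-order segments and releases bytes once the gap before them is filled. The class name and its overlap policy (bytes already delivered win over retransmitted ones) are assumptions made for this example, not the GFW's known behavior. The overlap policy in particular is exactly the kind of arbitrary choice on which a monitor and a real endpoint can disagree.

```python
SEQ_MOD = 2**32  # TCP sequence numbers wrap at 32 bits


class OneWayReassembler:
    """Reassembles the byte stream of ONE direction of a single TCP flow."""

    def __init__(self, initial_seq):
        # The SYN consumes one sequence number, so data starts at ISN + 1.
        self.next_seq = (initial_seq + 1) % SEQ_MOD
        self.pending = {}          # seq -> payload of buffered segments
        self.stream = bytearray()  # in-order bytes delivered so far

    def on_segment(self, seq, payload):
        if payload:
            self.pending.setdefault(seq, payload)  # first copy wins
            self._drain()

    def _drain(self):
        progressed = True
        while progressed:
            progressed = False
            for seq in list(self.pending):
                data = self.pending[seq]
                # How far next_seq lies past the start of this segment.
                offset = (self.next_seq - seq) % SEQ_MOD
                if offset < len(data):
                    # Segment covers next_seq: deliver only the new tail.
                    self.stream += data[offset:]
                    self.next_seq = (seq + len(data)) % SEQ_MOD
                    del self.pending[seq]
                    progressed = True
                elif offset < SEQ_MOD // 2:
                    # Entirely old data (a retransmission): discard it.
                    del self.pending[seq]
                # Otherwise the segment is ahead of a gap: keep it buffered.


r = OneWayReassembler(initial_seq=999)
r.on_segment(1006, b"world")   # arrives early, buffered
r.on_segment(1000, b"hello ")  # fills the gap, both segments are delivered
```

A production stack would also bound the buffer size, expire idle flows, and handle RST and FIN, all of which are omitted here.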

DPI Methodologies

Early DPI techniques used by the GFW were relatively simple and focused mainly on three dimensions:

1. Pattern Matching

Pattern matching is often implemented as string matching on keywords, URL substrings, hostnames, or protocol signatures. While conceptually straightforward, achieving high performance across massive traffic volumes requires highly optimized algorithms and fast, memory-efficient state machines.

2. Protocol Identification

Even in its early iterations, the GFW supported detection of well-known application-layer protocols such as HTTP, SMTP, and FTP, as well as early circumvention protocols like traditional VPNs and tools such as Freegate. As discussed in the previous blog, these early circumvention methods offered little or no traffic obfuscation, making them trivial to identify using protocol heuristics or signature matching.

3. Port Matching

The GFW also looks at the port number for protocol identification. For example, OpenVPN uses UDP port 1194 by default. However, circumvention tools can easily bypass this check, because port numbers are only conventions.
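To illustrate the kind of state machine that makes multi-keyword pattern matching fast, here is a compact Aho-Corasick automaton, a classic algorithm that scans the input once regardless of how many patterns are loaded. Using it here is an assumption for illustration; the source material does not say which algorithm the GFW uses, and the patterns below are placeholders.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state outputs."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for b in pat:
            nxt = goto[state].get(b)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = nxt
            state = nxt
        out[state].append(pat)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(automaton, data):
    """Return (start_offset, pattern) for every match in a single pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


# Placeholder patterns; real rule sets are not public.
automaton = build_automaton([b"blocked.example", b"forbidden-keyword"])
hits = find_all(automaton, b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n")
```

Note that the search state is a single integer, so a matcher like this can be resumed across the segments of a reassembled stream without rescanning earlier bytes.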

Residual Censorship

Residual censorship is a relatively recent technique adopted by the GFW. It acts as a punitive mechanism: once the GFW detects and blocks a connection it considers suspicious, it continues to block subsequent attempts between the same endpoints for a short period of time – even if those later attempts are completely benign. In effect, the endpoints become temporarily blacklisted.

Empirical observations suggest that the GFW keys this temporary blacklist using a 3-tuple: (client IP, server IP, server port). This choice reflects a practical compromise. By applying a coarse-grained but short-lived blacklist, the GFW can significantly increase blocking effectiveness while limiting the collateral damage caused by potential misclassification. The result is an enforcement mechanism that is aggressive in the moment yet self-corrects quickly enough to avoid long-term disruption to normal traffic.
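The 3-tuple blacklist can be sketched as a dictionary of expiry times. The class name and the 90-second default are placeholders chosen for this example; the text above only says the period is short, and actual durations are not specified here.

```python
import time


class ResidualBlocklist:
    """Short-lived blacklist keyed by (client IP, server IP, server port)."""

    def __init__(self, ttl_seconds=90.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # placeholder duration, not a measured value
        self.clock = clock      # injectable so the logic can be tested
        self.expiry = {}        # 3-tuple -> time at which blocking ends

    def punish(self, client_ip, server_ip, server_port):
        """Called when a connection between these endpoints is blocked."""
        key = (client_ip, server_ip, server_port)
        self.expiry[key] = self.clock() + self.ttl

    def is_blocked(self, client_ip, server_ip, server_port):
        """True while the 3-tuple is still inside its penalty window."""
        key = (client_ip, server_ip, server_port)
        deadline = self.expiry.get(key)
        if deadline is None:
            return False
        if self.clock() >= deadline:
            del self.expiry[key]  # entry has expired: self-correct
            return False
        return True


blocklist = ResidualBlocklist()
blocklist.punish("10.0.0.1", "203.0.113.5", 443)
blocked = blocklist.is_blocked("10.0.0.1", "203.0.113.5", 443)
```

Because the client port is not part of the key, a new connection from the same client to the same server port is blocked regardless of its source port, while traffic to other ports on that server is unaffected.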

Closing Thoughts

Deep Packet Inspection forms the core analytical capability of the GFW. As both the GFW and circumvention technologies evolve, detection methodologies have grown increasingly sophisticated. That said, the system’s architectural emphasis on performance – particularly its reliance on parallel DPI pipelines and user-space TCP stream reassembly – introduces significant structural weaknesses.

In the next post, we will examine how these weaknesses arise and how certain circumvention protocols exploit inconsistencies in the GFW’s TCP reassembly logic.

References

Sheharbano Khattak, Mobin Javed, Philip D. Anderson, Vern Paxson. Towards Illuminating a Censorship Monitor’s Model to Facilitate Evasion. 3rd USENIX Workshop on Free and Open Communications on the Internet (FOCI 13). https://www.usenix.org/conference/foci13/workshop-program/presentation/khattak
B. Mukherjee, L. T. Heberlein and K. N. Levitt. 1994. Network intrusion detection. IEEE Network. https://ieeexplore.ieee.org/abstract/document/283931
深入理解GFW: 内部结构. http://gfwrev.blogspot.com/2010/02/gfw.html
陈训逊, 方滨兴, 李蕾. 高速网络环境下入侵检测系统结构研究. 计算机研究与发展. https://www.icir.org/christian/outback/fang.pdf
张兆心, 方滨兴, 胡铭曾. 支持IDS的高速网络信息获取体系结构. 北京邮电大学学报. https://journal.bupt.edu.cn/EN/article/downloadArticleFile.do?attachType=PDF&id=1712