CCPA: A Concurrent Content Processing Architecture for Hardware Firewalls

Published in ISPASS 2026


Hardware firewalls are critical components of today’s data centers and large enterprises. However, these firewalls exhibit poor utilization when network traffic is dominated by a small number of large, high-bandwidth sessions (elephant flows). For these large sessions (and, in fact, for all sessions), we observe that the bottleneck is that all packets of a session are processed serially on a single data processing card (DPC) of the firewall to preserve per-connection consistency, which lowers overall utilization. We make the novel observation that only the stateful inspection phase of packet processing truly needs to be serialized; the content inspection phase, which dominates the overall processing time, can be parallelized across multiple DPCs without affecting correctness. Based on this observation, we propose CCPA, a novel hardware firewall architecture in which stateful inspection for all packets in a session is first performed sequentially on a dedicated processor before the packets are sent to DPCs for concurrent content inspection. By addressing the utilization bottleneck, CCPA improves average firewall throughput by 4.29x to 14.3x when using an optical backplane.
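
The two-stage split described above can be illustrated with a minimal software sketch. It is not the paper's implementation: the `Packet` type, the in-order sequence check standing in for stateful inspection, the byte-signature scan standing in for content inspection, and the thread pool standing in for a set of DPCs are all hypothetical choices made only to show the control flow, in which a serialized per-session stage is followed by a stateless stage that can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    session: str    # hypothetical session identifier
    seq: int        # hypothetical per-session sequence number
    payload: bytes


def stateful_inspect(state, pkt):
    """Serialized stage (assumed check): accept only in-order packets.

    `state` maps each session to the next expected sequence number, so
    this function must see a session's packets one at a time, in order.
    """
    expected = state.get(pkt.session, 0)
    if pkt.seq != expected:
        return False
    state[pkt.session] = expected + 1
    return True


def content_inspect(pkt, signatures=(b"malware",)):
    """Parallelizable stage (assumed check): stateless signature scan."""
    return not any(sig in pkt.payload for sig in signatures)


def process(packets, num_dpcs=4):
    """Run stateful inspection sequentially, then content inspection concurrently."""
    state = {}
    # Stage 1: a single sequential pass, standing in for the dedicated processor.
    admitted = [p for p in packets if stateful_inspect(state, p)]
    # Stage 2: worker threads stand in for DPCs; map() preserves input order.
    with ThreadPoolExecutor(max_workers=num_dpcs) as dpcs:
        verdicts = list(dpcs.map(content_inspect, admitted))
    return [p for p, ok in zip(admitted, verdicts) if ok]
```

Because the second stage keeps no per-session state in this sketch, packets from the same session can be handed to different workers, which mirrors the observation that only stateful inspection needs to be serialized.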

Paper

Recommended citation: S. Chen, S. Pal and R. Kumar. "CCPA: A Concurrent Content Processing Architecture for Hardware Firewalls." ISPASS 2026. https://davidchen.page/files/CCPA.pdf