Networking for AI: Building High-Performance Fabrics for GPU Clusters
An overview of the networking challenges and design considerations when building fabrics for large-scale AI/ML training clusters, from RDMA to rail-optimized topologies.
Data Center Network Engineer
Bridging AI and networking with 18+ years of experience across enterprise, telco, and data center environments. Simplifying complex workflows so engineers can work more efficiently.
With over 18 years of experience spanning enterprise architecture, NOCs, telco, and data center environments, I specialize in designing and operating large-scale network infrastructures. Currently serving as HPE Juniper Apstra TAC Manager, I lead technical support for intent-based data center networking.
My passion lies at the intersection of AI for networking and networking for AI — leveraging machine learning to make networks smarter while building the high-performance fabrics that power AI/ML workloads. I've worked internationally across Taiwan, the Philippines, and Europe, bringing a global perspective to data center challenges.
Design and operation of GPU-dense AI/ML clusters, high-performance interconnects, and scale-out architectures for training and inference workloads.
Multi-tenant data center fabric design with EVPN-VXLAN overlays, spine-leaf architectures, and workload mobility across data centers.
Intent-based networking with Apstra, infrastructure as code, CI/CD for network configs, and Python-driven operational tooling.
SmartNIC and DPU offload strategies, RDMA/RoCE performance tuning, and hardware acceleration for high-throughput data center networks.
Insights on AI clusters, EVPN-VXLAN, automation workflows, and real-world data center operations.