these days, the more i read about modern ai inference, the more connections i draw to HFT (high-frequency trading).
replace market data with tokens, NICs with GPUs, and order execution with inference, and suddenly… you’re reading the same playbook.
not to mention, the underlying skill set is a near exact match.
in HFT, engineers obsess over shaving microseconds off the critical path. we bypass the kernel to eliminate unnecessary context switches, build lock-free structures so threads never wait on each other (on a side note: in practice a plain mutex often wins over lock-free lol), organize memory to maximize cache locality, vectorize hot paths with SIMD instructions, and profile everything like maniacs because every cache miss has a measurable cost.
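to make "lock-free" and "cache locality" a bit more concrete, here's a minimal sketch of a single-producer / single-consumer ring buffer. everything in it is my own illustration, not code from any real trading system: the name `SpscRing`, the 64-byte cache-line size, and the power-of-two capacity are all assumptions.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical single-producer / single-consumer queue (illustrative only).
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false instead of blocking when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;      // full
        slots_[head & (Capacity - 1)] = value;
        // Release: the slot write becomes visible before the new head does.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns an empty optional when there is nothing to read.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;          // empty
        T value = slots_[tail & (Capacity - 1)];
        // Release: the slot read completes before the slot is handed back.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    // Assumed 64-byte cache lines: keeping the two indices on separate lines
    // stops the producer and consumer from invalidating each other's cache.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) std::array<T, Capacity> slots_{};
};
```

neither side ever takes a lock or sleeps. a full or empty ring just returns immediately and the caller decides whether to spin, drop, or do other work.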
modern ai infra looks surprisingly similar.
instead of market data arriving over the network, it's tokens flowing through transformer layers. instead of optimizing packet processing, engineers are writing custom CUDA kernels, squeezing every ounce of throughput from tensor cores, managing KV caches to avoid costly recomputation, and carefully scheduling GPU work so expensive hardware never sits idle.
it's a different workload with the same obsession.
what’s interesting to me is that both industries use C++ and care about performance. what’s even more interesting to me is that they optimize for the exact same constraints. waiting, even for a few hundred nanoseconds, becomes unacceptable, memory bandwidth tends to matter more than raw compute (e.g., the memory wall in ai inference), and data movement often costs more than arithmetic.
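a rough back-of-the-envelope sketch of that memory wall, using a roofline-style bound: attainable throughput is the smaller of peak compute and (FLOPs per byte × memory bandwidth). the function names and the matrix-vector cost model here are my own simplifications (weights dominate the traffic, activations are ignored), so treat it as an estimate and not a benchmark.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical device description; fill in numbers for real hardware yourself.
struct Device {
    double peak_flops;     // FLOP/s
    double mem_bandwidth;  // bytes/s
};

// Roofline-style upper bound on achievable FLOP/s at a given arithmetic
// intensity (FLOPs performed per byte moved from memory).
double attainable_flops(const Device& d, double flops_per_byte) {
    return std::min(d.peak_flops, flops_per_byte * d.mem_bandwidth);
}

// Arithmetic intensity of y = W * x for a rows x cols weight matrix.
// Assumption: one multiply and one add per weight, and weight reads dominate
// memory traffic (x and y are small by comparison and ignored here).
double gemv_intensity(double rows, double cols, double bytes_per_weight) {
    const double flops = 2.0 * rows * cols;
    const double bytes = rows * cols * bytes_per_weight;
    return flops / bytes;
}
```

with 2-byte weights, `gemv_intensity` comes out to 1 FLOP per byte. on a made-up device with 100 TFLOP/s of compute and 1 TB/s of bandwidth, `attainable_flops` caps that at 1 TFLOP/s, i.e. 1% of peak. the arithmetic units spend most of their time waiting on memory.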
eventually, you realize… neither field is really about finance or ai – they’re about fighting hardware lol!
every optimization is an attempt to convince a CPU or GPU to spend less time waiting and more time doing useful work. whether that means reducing cache misses, fusing CUDA kernels, or bypassing the OS entirely… it doesn’t really matter. they’re all different expressions of the same idea.
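a tiny CPU-side analogue of kernel fusion, written in plain C++ instead of CUDA to keep it self-contained (the function names are just illustrative). the unfused version writes an intermediate array to memory and reads it back, while the fused version does the same arithmetic in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: two passes over memory. The intermediate `tmp` is written out in
// the first loop and read back in the second.
std::vector<float> scale_then_add_unfused(const std::vector<float>& x,
                                          float scale, float bias) {
    std::vector<float> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = x[i] * scale;

    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = tmp[i] + bias;
    return out;
}

// Fused: one pass. Each element is loaded once and stored once, and the
// intermediate value never leaves a register.
std::vector<float> scale_then_add_fused(const std::vector<float>& x,
                                        float scale, float bias) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = x[i] * scale + bias;
    return out;
}
```

the arithmetic is identical. the only thing that changes is how many times the data crosses the memory bus, which is the whole point.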
that’s why i think the transition between quant engineering and ai infra feels so natural. sure… the tech evolves, but the engineering principles stay the same.
industries come and go, hardware constraints don’t.
~ a.k