An online technical session on batching, KV caching and squeezing latency out of model serving.
A focused technical talk + AMA on the systems behind fast, cheap inference. Recording included.
Lena Vasquez
Distinguished Engineer, Helix Compute