Howard Shan
Building scarce technical expertise in AI infrastructure.
I focus on LLM inference systems, CUDA kernels, source code reading, and high-performance computing.
Current Focus
- LLM inference systems
- SGLang / vLLM source code reading
- CUDA / Triton kernel optimization
- FlashAttention and attention optimization
- Distributed inference and serving systems