DeepSeek Launches Open Source Week with FlashMLA Release

DeepSeek initiated its “Open Source Week” on February 24, 2025, by open-sourcing FlashMLA, an efficient Multi-head Latent Attention (MLA) decoding kernel optimized for NVIDIA’s Hopper GPUs. This release marks the beginning of the company’s plan to open source five code repositories, following its announcement on February 21.

FlashMLA demonstrates significant performance on the NVIDIA H800 SXM5 platform, achieving up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in computation-bound configurations when using CUDA 12.6. The project currently supports BF16 and implements a paged kvcache with a block size of 64, specifically optimized for serving variable-length sequences.
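The paged kvcache mentioned above can be illustrated with a small sketch. The names and block-table layout below are illustrative assumptions, not FlashMLA’s actual internals: the idea, as in paged-attention-style cache management, is that a logical token position splits into a block index (looked up in a per-sequence block table) and an offset within a fixed-size 64-entry block.

```python
BLOCK_SIZE = 64  # FlashMLA's paged kvcache block size

def locate_token(block_table: list[int], pos: int) -> tuple[int, int]:
    """Map a logical token position to (physical block id, offset in block).

    `block_table` maps logical block indices to physical block ids;
    this is an illustrative sketch, not FlashMLA's real data structure.
    """
    logical_block, offset = divmod(pos, BLOCK_SIZE)
    return block_table[logical_block], offset

# Example: a sequence whose logical blocks live in physical blocks 7, 2, 9.
table = [7, 2, 9]
print(locate_token(table, 0))    # first token -> (7, 0)
print(locate_token(table, 130))  # token 130 -> (9, 2)
```

Because blocks need not be contiguous in memory, sequences of very different lengths can share one physical cache pool, which is why paging suits variable-length serving workloads.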

The project, hosted on GitHub under the MIT license, has garnered 5.1k stars and 213 forks. FlashMLA is written primarily in C++ (88.8%), with Python (11.0%) and CUDA (0.2%), and has four contributors: Jiashi Li, Shao Tang, sazc, and homorunner.

Technical requirements for using FlashMLA include:

  • Hopper GPUs
  • CUDA 12.3 or later
  • PyTorch 2.0 or later
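A setup script could guard against unsupported environments with a check along these lines. The helper below is a hypothetical sketch, not part of FlashMLA; it simply compares dotted version strings against the requirements listed above.

```python
def parse_version(v: str) -> tuple[int, int]:
    """Turn a dotted version string like '12.6' into a comparable
    (major, minor) tuple; extra components (e.g. '+cu121') are ignored."""
    major, minor = v.split(".")[:2]
    return int(major), int(minor)

def meets_requirements(cuda_version: str, torch_version: str) -> bool:
    """Hypothetical check for CUDA >= 12.3 and PyTorch >= 2.0."""
    return (parse_version(cuda_version) >= (12, 3)
            and parse_version(torch_version) >= (2, 0))

print(meets_requirements("12.6", "2.1"))   # True
print(meets_requirements("12.1", "2.1"))   # False: CUDA too old
```

In practice the version strings would come from the installed toolchain (e.g. `torch.version.cuda` and `torch.__version__` in PyTorch); the GPU itself would also need to be a Hopper-class part, which this string check alone cannot verify.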

The implementation draws inspiration from the FlashAttention 2 and 3 and CUTLASS projects, focusing on performance optimization for NVIDIA’s Hopper architecture, which was introduced in 2022 with GPUs such as the H100 and H800 for AI computing applications.

DeepSeek’s Open Source Week initiative will continue with daily releases of new content, as the company shares its research progress with the global developer community.