My Knowledge Base


Multi-Query Attention

Jun 11, 2026

machine_learning/deep_learning
machine_learning/large_language_model

In multi-query attention, all attention heads share a single key matrix and a single value matrix; the heads differ only in their query matrices. This significantly reduces memory usage, since only one set of keys and values has to be stored instead of one per head, but it comes at the cost of each head’s specialization.
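The sharing described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `multi_query_attention`, the helper `softmax`, and the weight shapes are hypothetical, and the sketch leaves out masking, batching, and the output projection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, w_q, w_k, w_v):
    """Hypothetical sketch of multi-query attention (no mask, no output projection).

    x:   (seq, d_model)              input tokens
    w_q: (n_heads, d_model, d_head)  one query projection per head
    w_k: (d_model, d_head)           single key projection shared by all heads
    w_v: (d_model, d_head)           single value projection shared by all heads
    """
    d_head = w_k.shape[1]
    q = np.einsum("sd,hde->hse", x, w_q)   # (n_heads, seq, d_head), differs per head
    k = x @ w_k                            # (seq, d_head), computed once
    v = x @ w_v                            # (seq, d_head), computed once
    scores = q @ k.T / np.sqrt(d_head)     # (n_heads, seq, seq), k broadcast over heads
    weights = softmax(scores, axis=-1)
    out = weights @ v                      # (n_heads, seq, d_head), v broadcast over heads
    # Concatenate the heads along the feature axis: (seq, n_heads * d_head).
    return out.transpose(1, 0, 2).reshape(x.shape[0], -1)

def kv_elements(seq_len, d_head, n_kv_heads):
    # Number of stored key and value entries for one sequence in one layer.
    return 2 * seq_len * d_head * n_kv_heads
```

With these shapes, keys and values are stored for one head rather than `n_heads`, so `kv_elements(seq_len, d_head, 1)` is `n_heads` times smaller than `kv_elements(seq_len, d_head, n_heads)`. The per-head query projections are the only place where the heads can differ.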


Backlinks

Grouped-Query Attention
KV Caching
