Grouped-Query Attention

Jul 26, 2026

machine_learning/deep_learning
machine_learning/large_language_model

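In grouped-query attention (GQA), the query heads are divided into groups, and all query heads in a group share a single key head and a single value head. This is less drastic than Multi-Query Attention, where every query head shares the same key/value head, but it still costs some model quality relative to full multi-head attention, where each query head has its own keys and values.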
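The sharing scheme can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `grouped_query_attention`, the tensor layout `(heads, seq_len, head_dim)`, and the head counts are assumptions chosen for the example, and causal masking and the input/output projections are left out for brevity.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Hypothetical helper. q: (n_q_heads, seq_len, d); k, v: (n_kv_heads, seq_len, d)."""
    n_q_heads, _, d = q.shape
    n_kv_heads = k.shape[0]
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_q_heads // n_kv_heads

    # Repeat each key/value head so that every query head in a group
    # attends over the same keys and values.
    k_shared = np.repeat(k, group_size, axis=0)  # (n_q_heads, seq_len, d)
    v_shared = np.repeat(v, group_size, axis=0)

    # Standard scaled dot-product attention per head (no causal mask here).
    scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_shared  # (n_q_heads, seq_len, d)

# Example sizes (assumed): 8 query heads sharing 2 key/value heads.
rng = np.random.default_rng(0)
n_q_heads, n_kv_heads, seq_len, d = 8, 2, 5, 4
q = rng.standard_normal((n_q_heads, seq_len, d))
k = rng.standard_normal((n_kv_heads, seq_len, d))
v = rng.standard_normal((n_kv_heads, seq_len, d))
out = grouped_query_attention(q, k, v)
```

With these sizes, only 2 key/value heads need to be stored instead of 8. Setting `n_kv_heads = 1` in this sketch gives Multi-Query Attention, and `n_kv_heads = n_q_heads` gives ordinary multi-head attention.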
