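In grouped-query attention (GQA), the query heads are divided into groups, and each group shares a single key head and a single value head. It degrades model quality less than multi-query attention, where all query heads share one key/value head, but it typically still gives up some quality relative to full multi-head attention.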
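The sharing scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `grouped_query_attention`, the tensor layout `(heads, seq_len, head_dim)`, and the toy sizes are assumptions made for the example, and it leaves out masking, batching, and the learned projections.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Hypothetical GQA sketch.

    q:    (num_q_heads, seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim), num_kv_heads divides num_q_heads
    """
    num_q_heads, _, head_dim = q.shape
    num_kv_heads = k.shape[0]
    assert v.shape[0] == num_kv_heads
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads

    # Repeat each key/value head so that every query head in a group
    # attends over the same shared keys and values.
    k_shared = np.repeat(k, group_size, axis=0)
    v_shared = np.repeat(v, group_size, axis=0)

    # Scaled dot-product attention, computed per query head.
    scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_shared

# Toy sizes (assumed): 8 query heads sharing 2 key/value heads.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v)
```

In this sketch, passing a single key/value head corresponds to multi-query attention, and passing as many key/value heads as query heads corresponds to full multi-head attention, so GQA sits between the two.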