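In grouped-query attention (GQA), the query heads are divided into groups, and each group shares a single key head and a single value head. It is less destructive than multi-query attention, where all query heads share one key-value head, but it still gives up some model quality relative to full multi-head attention, where every query head has its own keys and values.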
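
To make the sharing concrete, here is a minimal pure-Python sketch of the idea, not a reference implementation. The function names (`attend`, `kv_head_index`, `grouped_query_attention`) and the nested-list layout are assumptions made for illustration, and the input and output projections, masking, and batching are all omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, k, v):
    """Scaled dot-product attention for a single head.

    q, k, v are lists of vectors (one vector per position).
    Returns one output vector per query position.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out


def kv_head_index(query_head, n_query_heads, n_kv_heads):
    """Map a query head to the key-value head its group shares."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be divisible by n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def grouped_query_attention(q_heads, k_heads, v_heads):
    """Run attention where each group of query heads reuses one K/V head.

    q_heads has one entry per query head; k_heads and v_heads have one
    entry per group, so they are shorter than q_heads.
    """
    n_q, n_kv = len(q_heads), len(k_heads)
    return [
        attend(q, k_heads[kv_head_index(h, n_q, n_kv)],
               v_heads[kv_head_index(h, n_q, n_kv)])
        for h, q in enumerate(q_heads)
    ]


# Example: 4 query heads, 2 key-value heads, so each group holds 2 query heads.
q_heads = [[[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]]]
k_heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
v_heads = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
outputs = grouped_query_attention(q_heads, k_heads, v_heads)
```

In this sketch the two extremes fall out of the same function: passing as many key-value heads as query heads gives ordinary multi-head attention, and passing a single key-value head gives multi-query attention. The saving comes from storing fewer key and value heads, which shrinks the key-value cache by the group size.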