Definition

MuLan applies the architecture of CLIP to pairs of audio clips and captions, with each audio clip represented as a spectrogram.
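As a rough illustration of what a CLIP-style objective looks like when the image side is replaced by audio, the sketch below computes a symmetric contrastive loss over a batch of audio and caption embeddings. It is a minimal sketch under stated assumptions, not MuLan's actual implementation: the function names (`l2_normalize`, `log_softmax`, `clip_style_loss`), the temperature value, and the assumption that some audio encoder has already turned each spectrogram into a fixed-size vector are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    audio_emb: (batch, dim) array, assumed to come from an encoder applied
               to spectrograms (the encoder itself is not shown here).
    text_emb:  (batch, dim) array, assumed to come from a caption encoder.
    Row i of audio_emb and row i of text_emb are assumed to be a true pair.
    temperature is an illustrative value, not one taken from the source.
    """
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)

    # Pairwise similarities: entry (i, j) compares audio i with caption j.
    logits = a @ t.T / temperature

    idx = np.arange(logits.shape[0])
    # Audio-to-text: each audio clip should pick its own caption in its row.
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-audio: each caption should pick its own clip in its column.
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2t + loss_t2a)
```

Calling `clip_style_loss(audio_emb, text_emb)` on a batch where matching rows point in similar directions gives a small loss, while a batch whose captions have been shuffled relative to the audio gives a large one. That gap is the signal that pulls paired clips and captions together in the shared embedding space.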