Meta has released a new AI model, ImageBind, that links multiple forms of sensory data in a single multidimensional index, or embedding space. The model is open source, though it has no consumer or practical applications yet.
The multi-modal approach, as Meta explains, “brings machines one step closer to humans’ ability to learn simultaneously, holistically, and directly from many different forms of information.” The types of data ImageBind incorporates are audio, text, images and video, thermal (infrared) images, depth, and motion readings. Meta says the model is the first to combine six types of data in this way.
This kind of cross-referencing between data types underpins many generative AI tools. For instance, a generative AI tool that produces images from text prompts learns patterns in visual data and matches them against textual descriptions of those images.
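Because the model is open source, the shared embedding space can be tried out directly. The sketch below loosely follows the usage example in the ImageBind GitHub repository and assumes the `imagebind` package is installed and pretrained weights can be downloaded; the import paths and the sample file names are illustrative and may differ between repository versions.

```python
# A minimal sketch of cross-modal matching with ImageBind, assuming the
# open-source package from github.com/facebookresearch/ImageBind is installed.
# Import paths and sample file names are assumptions based on the repo README.
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the pretrained model that maps every modality into one embedding space.
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Hypothetical sample inputs: three captions, three images, three audio clips.
text_list = ["A dog.", "A car.", "A bird."]
image_paths = ["dog.jpg", "car.jpg", "bird.jpg"]
audio_paths = ["dog.wav", "car.wav", "bird.wav"]

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    # One embedding matrix per modality, all living in the same vector space.
    embeddings = model(inputs)

# Since all modalities share one space, similarity can be computed directly
# across them: which caption best matches each image, and each audio clip.
vision_x_text = torch.softmax(
    embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1
)
audio_x_text = torch.softmax(
    embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, dim=-1
)
print("Vision x Text:\n", vision_x_text)
print("Audio x Text:\n", audio_x_text)
```

High scores along the diagonal of each matrix would indicate that an image or audio embedding sits closest to its matching caption, which is the cross-referencing behavior described above.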
Meta designed the model to open a path toward future AI tools that generate hyper-realistic results by incorporating a wide range of sensory information, potentially even touch, speech, smell, and brain fMRI signals, which is both strange and amusing to imagine.