ChatGPT has been updated with a useful new capability:
the ability to take questions in the form of audio and images. Most AI
models, including ChatGPT itself, were originally designed to receive and
respond to text prompts. As its technology has advanced, OpenAI
has expanded the kinds of input the chatbot accepts, hence the new prompting methods.
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm
— OpenAI (@OpenAI) September 25, 2023
Entering commands or questions by voice appears to be simple: the user taps a button to record their query, which is automatically converted to text and processed by the chatbot. The bot’s answer is then converted back to speech and read aloud to the user.
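The voice flow described above can be sketched as three stages chained together. This is only an illustrative sketch, not OpenAI's implementation: the `VoiceTurn` class and the `fake_*` functions are hypothetical stand-ins that pass plain strings and bytes around so the example runs without any speech model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One voice exchange: recorded audio in, spoken reply out (hypothetical)."""
    transcribe: Callable[[bytes], str]   # speech-to-text stage
    respond: Callable[[str], str]        # chatbot stage, text in and text out
    synthesize: Callable[[str], bytes]   # text-to-speech stage

    def run(self, recorded_audio: bytes) -> bytes:
        question = self.transcribe(recorded_audio)  # audio -> text
        answer = self.respond(question)             # text -> text
        return self.synthesize(answer)              # text -> audio


# Toy stand-ins (assumptions): "audio" here is just UTF-8 encoded text.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(question: str) -> str:
    return f"You asked: {question}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = VoiceTurn(fake_transcribe, fake_respond, fake_synthesize)
spoken_reply = turn.run(b"What is the weather?")
```

Because each stage is passed in as a plain function, a real speech-to-text or text-to-speech model could be swapped in without changing the order of the steps.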
The technology behind the speech-to-text conversion is OpenAI’s Whisper model. The company has also developed a separate model for text-to-speech conversion, which it claims can produce “human-like audio” from just text and a few seconds of sample speech. OpenAI has also collaborated with Spotify to translate podcasts into other languages while preserving the sound of the podcaster’s own voice.
However, the ability to generate synthetic voices also
brings potential ethical problems, such as impersonation
by malicious actors for the purpose of defamation or fraud. For this
reason, OpenAI is limiting access to the model to
specific use cases and partnerships rather than releasing it broadly,
in order to minimize potentially harmful behaviour.
The image prompt method works somewhat like Google Lens:
the user shows the chatbot an image, the bot infers
what is being asked, and then generates a response accordingly. The user
can also type in a question to give the image more context.
These new ChatGPT features will be initially available to
users who pay for the service and will later be expanded to all users.