ChatGPT Introduces Image and Audio Formats of Asking Questions to the Bot

ChatGPT has been updated with a new capability: users can now ask the bot questions using audio and images. Most AI chatbots, including ChatGPT itself, were originally designed to receive and respond to text prompts. OpenAI has now extended its technology to support these additional prompting methods.

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023

Entering commands or questions via audio is simple: the user taps a button to record their query, which is automatically converted to text and processed by the chatbot. The bot's answer is then converted back to speech and read aloud to the user.
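The flow described above (recorded speech to text, text to the chatbot, reply back to speech) can be sketched as three stages chained together. This is only an illustrative sketch, not OpenAI's implementation: the function `voice_turn`, the `VoiceTurn` record, and the three `fake_*` stand-ins are hypothetical names invented here, and a real app would replace the stand-ins with calls to actual speech-to-text, chat, and text-to-speech services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round trip of a voice conversation (hypothetical record type)."""
    transcript: str     # what the user said, as text
    reply_text: str     # the chatbot's answer, as text
    reply_audio: bytes  # the answer converted back to speech


def voice_turn(
    recorded_audio: bytes,
    transcribe: Callable[[bytes], str],
    chat: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Run one voice turn: speech -> text -> chatbot -> text -> speech."""
    transcript = transcribe(recorded_audio)  # speech-to-text stage
    reply_text = chat(transcript)            # chatbot processes the text query
    reply_audio = synthesize(reply_text)     # text-to-speech stage
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-ins for illustration only; they do no real audio processing.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_chat(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = voice_turn(b"hello", fake_transcribe, fake_chat, fake_synthesize)
    print(turn.transcript)
    print(turn.reply_text)
```

Passing the three stages in as plain functions keeps the sketch independent of any particular speech or chat service, so each stage can be swapped without touching the others.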

The technology behind the speech-to-text conversion is OpenAI’s Whisper model. The company has also developed a separate model for text-to-speech conversion, which it claims can produce “human-like audio” from just text and a few seconds of sample speech. OpenAI has also collaborated with Spotify to translate podcasts into other languages while preserving the sound of the podcaster’s own voice.

However, the ability to generate synthetic voices also raises ethical problems, such as impersonation by malicious actors for the purpose of defamation or fraud. For this reason, OpenAI is limiting access to the model to specific use cases and partnerships, rather than allowing broad use of the technology, in order to minimize potentially harmful behaviours.

The image prompt method works somewhat like Google Lens: the user shows the chatbot an image, the bot infers what is being asked about it, and then generates a response accordingly. The user can also type a question alongside the image to give the query more context.

These new ChatGPT features will initially be available to paying users and will later be expanded to all users.
