ChatGPT Introduces Image and Audio Formats of Asking Questions to the Bot - Visualistan




ChatGPT has been updated with a useful new capability: users can now ask the bot questions using audio and images. Most AI chatbots, including ChatGPT itself, were originally designed to receive and respond to text prompts only. OpenAI has now extended its technology to support these new prompting methods.

 

 

Asking a question by voice is reportedly simple: the user taps a button to record the query, which is automatically converted to text and processed by the chatbot. The bot's answer is then converted back to speech and read aloud to the user.
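The voice flow described above can be pictured as three stages chained together. The sketch below is only an illustration of that chain, not OpenAI's implementation: every name in it (`run_voice_turn`, `VoiceTurn`, and the three `fake_*` stand-ins) is hypothetical, and the stand-ins simply pass text through so the example runs without any speech model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One spoken exchange: what was heard and what was answered."""
    transcript: str
    reply_text: str
    reply_audio: bytes


def run_voice_turn(
    recorded_audio: bytes,
    speech_to_text: Callable[[bytes], str],
    chatbot: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain the three stages: audio -> text -> reply text -> reply audio."""
    transcript = speech_to_text(recorded_audio)  # e.g. a Whisper-style model
    reply_text = chatbot(transcript)             # the chatbot answers in text
    reply_audio = text_to_speech(reply_text)     # the answer is spoken back
    return VoiceTurn(transcript, reply_text, reply_audio)


# Hypothetical stand-ins so the sketch runs without any real model.
def fake_speech_to_text(audio: bytes) -> str:
    # Pretend the "audio" bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_chatbot(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_text_to_speech(text: str) -> bytes:
    # Pretend UTF-8 bytes are the synthesized audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = run_voice_turn(
        b"What is the capital of France?",
        fake_speech_to_text,
        fake_chatbot,
        fake_text_to_speech,
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Passing the stages in as plain functions keeps the chain itself independent of any particular speech or chat service, so real components could be swapped in for the stand-ins.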


The technology behind the speech-to-text conversion is OpenAI’s Whisper model. The company has also developed a separate text-to-speech model that it claims can produce “human-like audio” from text after being trained on just a few seconds of sample speech. OpenAI has also collaborated with Spotify to translate podcasts into other languages while preserving the sound of the podcaster’s own voice.

 

However, the ability to generate synthetic voices also raises ethical concerns, such as malicious actors impersonating people in order to defame or defraud them. For this reason, OpenAI is limiting access to the model to specific use cases and partnerships rather than releasing it broadly, in order to minimize potential misuse.

 

The image prompt method works somewhat like Google Lens: the user shows the chatbot an image, the bot infers what is being asked, and then generates a response accordingly. The user can also type a question alongside the image to give the query more context.

 

These new ChatGPT features will initially be available to paying users and will later be rolled out to all users.

