OpenAI brings voice and image input to ChatGPT

OpenAI has regularly shipped major improvements to its chatbot, ChatGPT. This time it is a voice feature that lets users prompt the chatbot by speaking. The upcoming update will let users choose a voice and talk to ChatGPT directly.

OpenAI keeps adding features to hold ChatGPT's leading position in the artificial intelligence field. Until now, users could prompt the chatbot only with text; with this update, they can also prompt it with voice and images.

As OpenAI puts it, “ChatGPT can now see, hear, and speak.” The chatbot will be able to respond to prompts sent as images or as spoken input. You can send an image of a circled math problem, or start a voice conversation about a beautiful sight you saw on your way home.

ChatGPT responds to your prompts by talking back, which makes the exchange feel like a phone call with a friend. It is a more natural way to engage with the chatbot.

The update has started rolling out to users globally. OpenAI’s official blog post says: “We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.”

To access the new voice feature, open Settings in the mobile app, select New Features, and opt in to voice. Then tap the headphone button in the top-right corner. You will be asked to choose your preferred voice from five options.

OpenAI explained that the voice feature is powered by a text-to-speech model. The blog post says: “We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.”
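
For readers curious how such a voice turn fits together, the flow described above (speech is transcribed to text, the chatbot produces a text reply, and the reply is spoken back) can be sketched roughly as follows. This is only an illustrative sketch, not OpenAI's implementation: the names `run_voice_turn`, `transcribe`, `generate_reply`, and `synthesize`, the `VoiceTurn` type, and the voice label "voice-1" are all hypothetical, and the stand-in functions below merely imitate a speech recognizer, a chat model, and a text-to-speech model so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round of a voice conversation (hypothetical type for illustration)."""
    transcript: str     # what the speech recognizer heard
    reply_text: str     # the chatbot's text answer
    reply_audio: bytes  # the answer rendered as audio


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    voice: str,
) -> VoiceTurn:
    """Chain the three stages: speech -> text -> reply text -> speech."""
    transcript = transcribe(audio).strip()
    if not transcript:
        # Nothing recognizable was said, so there is no prompt to answer.
        raise ValueError("no speech recognized")
    reply_text = generate_reply(transcript)
    reply_audio = synthesize(reply_text, voice)
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-ins for the real components (assumptions, not real APIs).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_reply(prompt: str) -> str:
    # Pretend chat model: echoes the prompt back.
    return f"You said: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    # Pretend text-to-speech: tags the text with the chosen voice.
    return f"[{voice}] {text}".encode("utf-8")


if __name__ == "__main__":
    turn = run_voice_turn(
        b"what is the weather like", fake_transcribe, fake_reply,
        fake_synthesize, voice="voice-1",
    )
    print(turn.transcript)
    print(turn.reply_text)
```

Passing the three stages in as functions keeps the sketch independent of any particular speech or chat service; in a real app each stand-in would be replaced by a call to an actual model.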

Meanwhile, the image feature is powered by multimodal GPT-3.5 and GPT-4 models. These models apply their language reasoning skills to images, which lets the chatbot interpret a wide range of them, including text documents, screenshots, and camera photographs.

With the voice feature, ChatGPT moves a bit closer to AI voice assistants such as Amazon Alexa and Apple’s Siri. These capabilities should improve the user experience and let people use ChatGPT more often, in more situations, and more effectively.