OpenAI’s Latest Revolutionary Update: ChatGPT Can Now See, Hear, and Speak

In a groundbreaking leap forward for artificial intelligence, OpenAI has announced a major update to ChatGPT, arguably its most significant enhancement since the introduction of GPT-4. Aptly titled “ChatGPT can now see, hear, and speak,” the announcement showcases a fusion of capabilities that promises to transform human-AI interactions.

The evolution of ChatGPT introduces three pivotal features: voice conversations, audible responses in five distinct synthetic voices, and image recognition, all designed to redefine the boundaries of AI engagement. Voice functionality, available in the iOS and Android mobile apps, enables users to hold seamless, back-and-forth conversations with ChatGPT. The synthetic voices, each created with professional voice actors, deliver a striking sense of realism.

Alongside voice, ChatGPT’s image recognition capabilities allow users to share images, adding a new dimension of interaction. ChatGPT can analyze, discuss, and draw insights from visual content, opening up applications that range from troubleshooting technical problems to meal planning.

However, amid these remarkable advances, it’s crucial to recognize the limitations. ChatGPT lacks human-like common sense and can give overly literal answers. Its knowledge, extensive though it may be, is not exhaustive, and it may not keep up with recent developments. Furthermore, the computational demands of these features can cause performance issues.

As the AI landscape continues to evolve, these new features in ChatGPT represent a significant stride forward, offering both tremendous potential and essential considerations for the AI-driven future.

Read on to discover what these new features of ChatGPT will offer you and how you can make use of them in your everyday work.

Complete Overview of Voice Conversations Feature of ChatGPT

Imagine having a casual chat with your AI assistant, just as you would with a friend. Thanks to ChatGPT’s new voice capability, this is now a reality. You can have smooth, interactive conversations with ChatGPT using your voice.

Whether you’re on the move, hunting for a bedtime story for your family, or trying to settle a lively dinner table discussion, ChatGPT is all set to engage.

Getting Started with Voice Chats:

  • Update Your App: Ensure your ChatGPT mobile app is up-to-date.
  • Opt in to Voice: Go to the ‘Settings’ section within the app and tap ‘New Features.’ Opt in to voice chats.
  • Pick Your Voice: Once you’ve activated voice, tap the headphone icon in the upper-right corner of your home screen. You can choose your favorite voice from a selection of five.

The remarkably realistic quality of ChatGPT’s voices comes from a text-to-speech model, with each voice created in collaboration with professional voice actors. Meanwhile, Whisper, OpenAI’s open-source speech recognition system, transforms your spoken words into text.
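OpenAI has not published how these pieces are wired together inside the app, but the description above implies a three-stage loop: speech recognition, a text reply from the language model, then text-to-speech. The Python sketch below models one voice turn under that assumption. The names `run_voice_turn` and `VoiceTurn` are hypothetical, and each stage is passed in as a plain function (stubbed here so it runs offline) rather than a call to any real OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for one spoken exchange.
Transcriber = Callable[[bytes], str]  # speech-to-text (the role Whisper plays)
Responder = Callable[[str], str]      # language model that writes the reply
Synthesizer = Callable[[str], bytes]  # text-to-speech (the role of the TTS model)


@dataclass
class VoiceTurn:
    """What was heard, what ChatGPT replied, and the audio of that reply."""
    user_text: str
    reply_text: str
    reply_audio: bytes


def run_voice_turn(
    audio_in: bytes,
    transcribe: Transcriber,
    respond: Responder,
    synthesize: Synthesizer,
) -> VoiceTurn:
    # 1. Speech recognition: turn the user's recorded audio into text.
    user_text = transcribe(audio_in)
    # 2. The language model answers the transcribed question.
    reply_text = respond(user_text)
    # 3. Text-to-speech: voice the reply (not the user's own words).
    reply_audio = synthesize(reply_text)
    return VoiceTurn(user_text, reply_text, reply_audio)


if __name__ == "__main__":
    # Stub stages so the sketch runs without any network access.
    turn = run_voice_turn(
        b"<recorded audio>",
        transcribe=lambda audio: "Tell me a bedtime story",
        respond=lambda text: "Once upon a time, a curious robot learned to listen.",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print("Heard:", turn.user_text)
    print("Reply:", turn.reply_text)
    print("Audio bytes:", len(turn.reply_audio))
```

Keeping each stage as a swappable function makes the ordering explicit: the text-to-speech step voices ChatGPT’s reply, not the user’s own words.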

What are the five synthetic voices available for ChatGPT voice conversations?

You can choose from five synthetic voices when engaging in voice conversations with ChatGPT:

1. Juniper: A female voice featuring an American accent.

2. Sky: Another female voice with an American accent.

3. Cove: A male voice carrying an American accent.

4. Ember: A female voice sporting an American accent.

5. Breeze: A blend of male and female voices, both with an American accent.

How to Activate ChatGPT Voice Conversation Feature?

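If you want voice conversations on the ChatGPT website in a desktop browser, you can use the Voice Control for ChatGPT extension, which is separate from the voice feature built into the mobile apps. Follow these steps: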

  1. First, add the Voice Control for ChatGPT extension to your Google Chrome browser. You can easily find this extension in the Chrome Web Store.
  2. Next, navigate to the ChatGPT website.
  3. Now, either log in if you already have an account or create a new one for free.
  4. Once you’ve installed the extension and logged in, you’ll notice a microphone icon below the input field. Click this icon to start voice control. Alternatively, you can activate voice control by pressing and holding the spacebar on your keyboard.
  5. When requested, grant permission for the extension to access your microphone, which is essential for its proper functioning.

Once you’ve activated the voice conversation feature, you can begin your conversation with ChatGPT. The extension will automatically read ChatGPT’s responses aloud.

You can control whether the responses are read aloud by toggling the green mute button. For a more streamlined interface, you can access the extension settings (located next to the mute button) and select “Utilize compact interface.”

Benefits of ChatGPT New Voice Feature

The novel voice feature in ChatGPT offers numerous advantages to its users, elevating their interactions and enabling more versatile engagement. Some of the primary benefits of this voice feature are:

1. Natural Conversations: Users can now hold casual spoken conversations with ChatGPT, making communication more intuitive and human-like. This two-way conversation capability makes ChatGPT feel more like a personal assistant, similar to voice assistants such as Amazon’s Alexa.

2. Audible Responses: When the voice feature is active, ChatGPT can respond audibly with one of five synthesized voices, delivering a more immersive experience. OpenAI’s text-to-speech model, trained using voice actor samples, produces audio that closely resembles human speech.

3. Multimodal Abilities: Voice input and output complement ChatGPT’s new image recognition feature, enabling users to hold real-time discussions about images and perform various tasks, such as discussing meal ideas or requesting step-by-step recipes based on photos of the items in their refrigerator.

4. Mobile Accessibility: The voice capability is accessible on mobile apps, allowing users to converse with ChatGPT on the go. This flexibility and accessibility enhance ChatGPT’s utility in daily activities, such as during travel or while completing household chores.

5. Whisper and Text-to-Speech Technology: OpenAI uses its Whisper speech recognition system to transcribe spoken words into text. The text-to-speech model then converts ChatGPT’s written reply into human-like audio. Together, these technologies are designed to deliver a smooth, high-quality voice experience for ChatGPT users.

Limitations of ChatGPT New Voice Feature

The voice feature makes ChatGPT more convenient to use, but the answers still come from the same language model, so its general limitations still apply:

1. Lack of common sense: Although ChatGPT can produce responses that resemble human conversation and has access to a wealth of information, it lacks human-like common sense. Consequently, it can occasionally give responses that seem illogical or overly literal.

2. Limited knowledge: While ChatGPT possesses access to an extensive repository of information, it may encounter difficulties in addressing highly specific or specialized topics. Furthermore, it might not stay informed about recent developments or changes within certain fields.

3. Computational resource demands: ChatGPT is a highly complex AI language model that requires substantial computational resources to perform well. This can lead to slower response times or limited availability during peak usage periods.

4. Input length limits: ChatGPT can process only a limited amount of text at a time, so it struggles with very long inputs, such as summarizing an entire novel in one go. Additionally, it cannot undertake multiple tasks simultaneously.

5. Contextual comprehension: ChatGPT does not consistently grasp the complete context of a subject, which can result in responses that are incomplete or inaccurate. Users may find it necessary to provide more specific information or rephrase their inquiries to achieve the desired outcomes.
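For the point above about long texts such as novels, a common workaround is to split the text into pieces, summarize each piece, and then summarize the summaries. The sketch below is a hypothetical illustration, not an OpenAI feature: `summarize` stands in for whatever single request you send to ChatGPT, and the code counts words as a rough stand-in for the tokens that real models actually limit.

```python
from typing import Callable, List

# Hypothetical: one request that asks ChatGPT to summarize a piece of text.
Summarizer = Callable[[str], str]


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Summarizer, max_words: int = 2000) -> str:
    """Map-then-reduce summary: summarize each chunk, then the joined partial summaries.

    Words are only a rough proxy for tokens. This assumes the partial summaries
    together fit in one request; a very long book may need another round.
    """
    chunks = chunk_words(text, max_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))


def first_word(text: str) -> str:
    """Stub summarizer that keeps only the first word, so the example runs offline."""
    return text.split()[0]


if __name__ == "__main__":
    print(chunk_words("one two three four five", 2))
    print(summarize_long_text("one two three four five", first_word, max_words=2))
```

The chunk size is the main design knob: smaller chunks fit comfortably in one request but give the final pass more fragments to reconcile.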

Complete Overview of the Image Input or Recognition Feature

Introducing ChatGPT’s cutting-edge image recognition functionality, which elevates AI interaction to a whole new level. Now, you can present one or multiple images to ChatGPT for analysis, discussion, and the extraction of valuable insights. 

This innovative feature boasts a wide array of practical applications, ranging from resolving technical problems to assisting in meal planning.

To activate ChatGPT’s image recognition feature, follow these straightforward steps:

1. Launch the ChatGPT mobile app or access the web interface.

2. Navigate to the Settings menu.

3. Choose “New Features” from the available options.

4. Opt in to enable the image recognition feature.

Once you’ve successfully activated this feature, you can seamlessly communicate with ChatGPT using images. Here’s how to get started with discussions related to images:

How to Initiate Image-Based Conversations:

  1. Capture or Select an Image: Simply tap the photo button to either capture a fresh image or select one from your existing collection. For iOS and Android users, start by tapping the plus button to add images.
  2. Engage in Discussion and Add Annotations: You have the flexibility to discuss multiple images or employ the drawing tool within the mobile app to assist ChatGPT in comprehending the image better.

Users also have the option to employ their device’s touch screen to highlight specific areas within the image that they want ChatGPT to focus on. 

OpenAI may use an approach like CLIP, its model that aligns image and text representations in the same latent space, to bridge the gap between visual and textual data and enable context-aware deductions.
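To make the idea of a shared latent space concrete, here is a toy CLIP-style matching step in Python: an image vector is compared with several caption vectors by cosine similarity, and a temperature-scaled softmax turns the scores into probabilities. The three-dimensional vectors are made-up placeholders (real CLIP embeddings come from trained encoders and have hundreds of dimensions), the 0.07 temperature is an illustrative choice, and nothing here describes how ChatGPT itself processes images.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def softmax(scores: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_captions(
    image_vec: Sequence[float],
    caption_vecs: Dict[str, Sequence[float]],
    temperature: float = 0.07,
) -> List[Tuple[str, float]]:
    """Score every caption against the image; return (caption, probability), best first."""
    captions = list(caption_vecs)
    scores = [cosine_similarity(image_vec, caption_vecs[c]) for c in captions]
    probs = softmax(scores, temperature)
    return sorted(zip(captions, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up 3-D "embeddings"; trained image and text encoders would produce real ones.
    fridge_photo = [0.9, 0.1, 0.0]
    captions = {
        "a photo of an open refrigerator": [1.0, 0.0, 0.0],
        "a photo of a dog": [0.0, 1.0, 0.0],
        "a handwritten math worksheet": [0.0, 0.0, 1.0],
    }
    for caption, prob in rank_captions(fridge_photo, captions):
        print(f"{prob:.3f}  {caption}")
```

Because both images and text live in the same space, the same similarity function works whichever way you compare them, which is what lets a model relate a photo to a written question.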

The image recognition feature integrated into ChatGPT proves invaluable for various tasks, including diagnosing malfunctioning appliances, suggesting recipes based on available pantry items, and furnishing step-by-step instructions for a wide range of projects.

What types of images can I upload to ChatGPT for conversation?

You can upload various types of images to ChatGPT for conversation, including:

  • PNG (.png) files.
  • JPEG (.jpeg and .jpg) files.
  • Non-animated GIF (.gif) files.
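If you are building a tool that prepares files for upload, a quick local check against the formats listed above can save a failed attempt. This Python sketch is our own assumption rather than anything OpenAI provides: it matches the file extension against the file’s leading signature bytes (the standard PNG, JPEG and GIF magic numbers). It cannot tell an animated GIF from a still one, so that restriction still needs to be checked separately.

```python
from pathlib import Path

# Formats listed in the article, keyed by extension, with each format's
# leading "magic number" bytes. Our own sketch, not an OpenAI validator.
SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
}


def is_supported_image(filename: str, header: bytes) -> bool:
    """True if the extension is listed and the first bytes match that format.

    Does not detect animated GIFs; that restriction must be checked separately.
    """
    prefixes = SIGNATURES.get(Path(filename).suffix.lower())
    if prefixes is None:
        return False
    return header.startswith(prefixes)


if __name__ == "__main__":
    print(is_supported_image("fridge.JPG", b"\xff\xd8\xff\xe0"))  # real JPEG header
    print(is_supported_image("fake.png", b"GIF89a"))               # content does not match extension
    print(is_supported_image("clip.mp4", b"\x00\x00\x00\x18"))     # format not listed
```

In practice you would read the header from disk, for example with `open(path, "rb")` and `f.read(8)`, since eight bytes cover the longest signature above.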

Whether you can add images to a conversation depends on several factors, such as the model you are using and the plan you are on.

Currently, the image recognition feature is only available to Plus and ChatGPT Enterprise users, and it is not available in the United Kingdom or the European Union.

Benefits of ChatGPT Image Input New Feature

The recently introduced image input capability in ChatGPT offers numerous advantages, which include:

1. Enhanced communication with visual context: Images can convey some kinds of information more effectively than text alone. By incorporating image inputs, ChatGPT can better understand user queries, resulting in more comprehensive and precise responses.

2. Practical applications across diverse domains: The inclusion of images in interactions has practical utility in various fields. For example, it can assist in troubleshooting malfunctioning appliances, suggesting meal ideas based on the contents of a refrigerator, or aiding in solving math problems presented on a worksheet.

3. Enhanced user experience: Integrating image inputs into ChatGPT elevates the overall user experience, fostering more natural and efficient conversations. Users can effortlessly share images by clicking the photo button within the chat interface.

4. Potential for future developments in image analysis: While ChatGPT’s current image input capability primarily centers on understanding and reasoning about visual data, there is considerable potential for future advancements in image analysis and generation as AI models continue to progress.

ChatGPT Image Input New Feature Limitations

The new image input functionality integrated into ChatGPT does come with certain limitations:

1. Inability to directly analyze people in photos: OpenAI has deliberately restricted ChatGPT’s ability to analyze individuals in photographs and offer advice based on them. This measure is intended to mitigate potential safety concerns.

2. No video input: ChatGPT now accepts still images, but video input is not supported as part of this update.

3. Limited availability: Image inputs are currently available to Plus and ChatGPT Enterprise users, but they have not yet been made available in the United Kingdom and the European Union.

4. No image generation: ChatGPT cannot directly create images from image inputs. However, it can generate textual descriptions of images, which can then be used as prompts for image-generation tools such as DeepAI, DALL·E, and Midjourney.

5. Not universal across all models: Image inputs are supported by GPT-4, but not every model can accept them; GPT-3.5, for example, cannot.

6. Gradual rollout with a focus on safety: OpenAI is introducing the image input feature gradually, prioritizing safety throughout the deployment, and says it conducted safety testing before the rollout.


OpenAI has unveiled a significant update to ChatGPT, introducing three groundbreaking features: voice conversations, auditory responses with five synthetic voices, and image recognition. 

Users can now hold natural voice conversations with ChatGPT in the iOS and Android mobile apps, choosing from five synthetic voices created with professional voice actors. The image recognition feature allows users to share and discuss images, enabling applications such as troubleshooting and meal planning.

However, ChatGPT has limitations, including occasional lack of common sense, limited knowledge, and potential performance issues due to computational demands. 

These updates signify a major stride in AI-human interactions, enhancing communication and expanding possibilities, but users should be aware of both the benefits and limitations of these new capabilities.
