Picture Smart Challenges


Article written by Ryan Jones and last updated March 11, 2024.

Understanding AI Hallucinations in Image Descriptions

In the rapidly evolving landscape of artificial intelligence (AI), one of the most promising advancements for those of us who are blind or have low vision is the ability to merge the visual world with the realm of language. Technologies like JAWS’ Picture Smart AI, which currently integrates OpenAI’s GPT and Google’s Gemini vision models, are at the forefront of this revolution, offering unprecedented capabilities in image description for users worldwide. While these developments introduce a new era of accessibility and digital interaction, they also bring to light an intriguing phenomenon known as “AI hallucinations.” This post covers the basics of AI hallucinations in image descriptions, examining their causes, their implications, and the ongoing efforts to mitigate them.

The Convergence of Vision and Language

At the heart of Picture Smart AI lies a sophisticated integration of Google’s Gemini vision model and OpenAI’s GPT. This combination allows JAWS not only to perceive the contents of an image but also to articulate those observations in detailed, natural language. The vision models analyze visual data, identifying objects, scenes, and activities, while the language models generate coherent, contextually relevant descriptions. This seamless fusion promises to revolutionize how we interact with digital content, particularly for those of us who rely on technology to interpret the visual world.
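
To make the idea concrete, here is a minimal sketch of what a single request to a vision-language model can look like. This is not Picture Smart AI’s actual code; it simply uses OpenAI’s public Python SDK as an illustration, and the model name, prompt, and file name are assumptions chosen for the example.

import base64
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

def describe_image(path: str) -> str:
    """Send an image to a vision-capable model and return its description."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # an illustrative vision-capable model, not necessarily what JAWS uses
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_image("photo.jpg"))  # hypothetical file name

The model receives the image and a text prompt together, and the description it returns is generated from learned patterns rather than from a guaranteed inventory of what the image contains. That gap is exactly where hallucinations can creep in.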

The Mirage of AI Hallucinations

However, as with any new technology, there are challenges. AI hallucinations occur when these systems generate descriptions that include elements not present in the image, or that misinterpret the visual content. These inaccuracies can range from minor embellishments to significant errors, and they often result from biases in the training data or from the AI’s attempts to fill gaps when the visual information is ambiguous.

For example, the AI might describe a beach scene with seagulls flying overhead, even if no birds are present in the image. Such hallucinations can stem from the AI’s training data, in which beach scenes frequently included birds, leading the model to infer their presence even when they are not visually detectable.

The Impact of Hallucinations

The impact of an AI hallucination is directly related to the importance of the image being described. For those of us using AI for image descriptions, these inaccuracies can lead to confusion or misinterpretation. Beyond individual inconvenience, hallucinations raise questions about the reliability and trustworthiness of AI-generated content, highlighting the need for cautious optimism and critical engagement with these technologies.

Identifying Hallucinations

Without being able to see the described image ourselves, it may be difficult or impossible to tell whether hallucinations exist. We face a similar challenge when depending on other people to describe graphical information to us: if I ask someone to describe an image or scene, it is possible the description will contain inaccurate information or a bias shaped by the describer’s own life experience. The following are some ways to help spot hallucinations in descriptions.

  • Context: Use context clues, comparing what you already know about an image with the description. For example, I recently viewed a description of an online image of a concert. I knew that the song the band was playing when the photo was taken was an instrumental, so when part of the description indicated that the crowd was singing along with the music, I knew that could not be the case. Perhaps the crowd was cheering and the AI assumed they were singing since the scene was clearly a concert. Even so, the description did tell me that the image showed a portion of the crowd.
  • Common sense and logic: Ask yourself whether the description really makes sense. For example, if you are shopping online for a coffee maker and the description of its product image names a different brand than the one on the product page, it is reasonable to assume the AI got the brand wrong. This is less likely to happen when the branding is clearly visible in the image.
  • Comparison of descriptions: Picture Smart AI currently leverages image descriptions from both Google and OpenAI. If you notice significant differences between the two descriptions, that is an indicator that one, or possibly both, contain incorrect information. A simple sketch of this comparison idea follows this list.
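
As a rough illustration of that last point, the sketch below flags two descriptions of the same image as suspect when they share few content words. This is not how JAWS compares descriptions internally; the tokenization rule and the 0.2 threshold are assumptions chosen purely for the example.

import re

def content_words(text: str) -> set[str]:
    """Lowercase words of four or more letters, a rough proxy for content words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}

def descriptions_agree(desc_a: str, desc_b: str, threshold: float = 0.2) -> bool:
    """Return True when the two descriptions share enough content words (Jaccard overlap)."""
    a, b = content_words(desc_a), content_words(desc_b)
    if not a or not b:
        return False
    overlap = len(a & b) / len(a | b)
    return overlap >= threshold

# Two hypothetical descriptions of the same photo:
google_desc = "A crowded beach at sunset with several seagulls flying overhead."
openai_desc = "An empty shoreline at dusk as waves roll in under an orange sky."
print(descriptions_agree(google_desc, openai_desc))  # False, so review with extra care

A low overlap does not prove a hallucination, and a high overlap does not rule one out, but disagreement between independent models is a useful prompt to double-check.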

If you believe a description may not be entirely correct, you can always run Picture Smart AI again on the same image and compare the results. If it is critical to have a perfectly accurate description, you may wish to rely on someone who can visually review both the image and the description in order to validate it. Never rely solely on Picture Smart AI for critical topics such as medical information.

Charting a Course Through the Mirage

Addressing AI hallucinations requires a multifaceted approach. Developers and researchers are continuously refining AI models to improve their accuracy and reduce the incidence of hallucinations. This includes diversifying training data to cover a broader range of scenarios and implementing more sophisticated error-checking mechanisms. Additionally, user feedback plays a crucial role in identifying and correcting inaccuracies, underscoring the importance of community involvement in the development process.

Moreover, ongoing research into explainable AI seeks to make the decision-making processes of AI systems more transparent, allowing for better understanding and troubleshooting of hallucinations when they occur.

Conclusion: A Vision for the Future

Despite the challenges posed by AI hallucinations, the potential of AI-powered image descriptions is enormous! As we continue to refine these technologies, we move closer to a world where digital content is accessible and comprehensible to all, transcending the barriers between the visual and the verbal. By approaching these advancements with a critical eye and a commitment to continuous improvement, we can ensure that AI serves as a bridge to understanding rather than a source of confusion.

As we cross the horizon into this new world, Vispero is embracing the promise of AI with awareness and responsibility, participating actively in shaping a future where technology enhances our perception of the world in all its richness and complexity.