Cohere Introduces Aya Vision AI Models for Multilingual Tasks

Cohere For AI has launched Aya Vision, a new family of vision models that it describes as state-of-the-art, designed to improve AI performance across multiple languages. Released on Tuesday, the models come in two parameter sizes and aim to address the inconsistent performance of existing large language models (LLMs) on multimodal tasks across languages. Aya Vision can generate outputs in 23 languages and handle tasks involving both text and images, although it cannot generate images itself. The models are available as open weights and through WhatsApp.

Cohere Releases Aya Vision AI Models

In a blog post, Cohere detailed the features of the new Aya Vision models, which are offered in 8B and 32B parameter sizes. The models can generate and translate text, analyze images, and answer questions about them, all in 23 languages. Users can access the models through Cohere’s Hugging Face page and Kaggle. A dedicated WhatsApp account also lets general users interact with the models, for example to learn more about images or artworks they come across.

According to Cohere's internal testing, the Aya Vision 8B model outperformed several competitors, including Qwen2.5-VL 7B, Gemini 1.5 Flash 8B, and Llama 3.2 11B Vision, on the AyaVisionBench and m-WildVision benchmarks. Note that AyaVisionBench was developed by Cohere itself, although it is publicly available. Cohere also claims that the Aya Vision 32B model outperformed Llama 3.2 90B Vision and Qwen2-VL 72B on the same benchmarks.

Innovative Algorithms Drive Performance

Cohere attributes the Aya Vision models' performance to several algorithmic techniques. The team used synthetic annotations and scaled up multilingual training data by translating and rephrasing existing data. It also merged multimodal models in separate steps, which Cohere says improved performance at each stage of development. The company credits these techniques for placing Aya Vision ahead of the competing models it benchmarked against, including some that are considerably larger.

Developers who want to use the Aya Vision models can download the open weights from Kaggle and Hugging Face. However, the models are released under a Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which permits academic and research use but prohibits commercial applications.

Observer Voice is a one-stop site for national and international news, Editor's Choice, art and culture content, quotes, and much more. It also covers history, including world history, Indian history, and what happened on this day, as well as entertainment across India and the world.
