Cohere’s RAG search now includes vision
Cohere’s search model now supports multimodal embeddings, letting users include images in RAG-style enterprise searches.
Embed 3, launched last year, converts data into numerical representations known as embeddings. Embedding documents is crucial for retrieval-augmented generation (RAG): the model compares the embedded query against the embedded documents and retrieves the most relevant information.
We’re thrilled to announce fully multimodal embeddings that folks can start building with.
— Aidan Gomez (@aidangomez) October 22, 2024
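To make the retrieval step concrete, here is a minimal sketch of embedding-based retrieval with Cohere’s Python SDK. The toy documents and query are illustrative, the API key is a placeholder, and a production system would use a vector database rather than in-memory NumPy arrays.

```python
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

# Embed the documents once and keep the vectors (a toy "index").
docs = [
    "Q3 revenue grew 12% quarter over quarter.",
    "The design file for the new product logo was approved.",
]
doc_emb = np.array(
    co.embed(
        texts=docs,
        model="embed-english-v3.0",
        input_type="search_document",
    ).embeddings
)

# Embed the query with the matching query input type.
query_emb = np.array(
    co.embed(
        texts=["How did revenue change last quarter?"],
        model="embed-english-v3.0",
        input_type="search_query",
    ).embeddings[0]
)

# Cosine similarity between the query vector and each document vector.
scores = doc_emb @ query_emb / (
    np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(query_emb)
)
best = docs[int(np.argmax(scores))]
print(best)  # the retrieved passage is what gets passed to the generator in RAG
```

The retrieved passage, not the raw document store, is what the generation model sees, which is why the quality of the embeddings largely determines the quality of a RAG answer.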
The new multimodal version can embed both images and text. Cohere says Embed 3 “is now the most general multimodal embedding model on the market.” Gomez posted a chart to X showing Embed 3’s performance improvements in image search.
The model’s image-search performance across a variety of categories is impressive. Significant lifts in nearly all categories.
— Aidan Gomez (@aidangomez) October 22, 2024
“This advance enables enterprises to unlock real value from the vast amount of images stored,” Cohere stated in a blog post. “Businesses can now build systems that accurately and quickly search important multimodal assets such as complex reports, product catalogs and design files to boost workforce productivity.”
Cohere said a stronger multimodal focus expands the volume of data enterprises can access through a RAG search. RAG searches are often limited to text files, even though many organizations store data in multiple formats. Customers can now also upload charts, graphs and product images.
Performance improvements
Cohere said the encoders in Embed 3 “share a unified latent space,” allowing users to store both images and text in a single database. Some image-embedding methods instead require separate databases for images and text; Cohere’s unified approach, the company says, produces better mixed-modality searches. “Other models cluster text and images into separate areas, which leads to poor search results that are biased towards text-only data,” according to the company. Embed 3, on the other hand, “prioritizes the meaning behind the data without biasing towards a specific modality.”
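As a rough illustration of what a unified latent space enables, the sketch below embeds an image and a text query separately and compares them directly with one cosine similarity. The `images` parameter taking base64 data URLs and `input_type="image"` reflect Cohere’s announced multimodal interface as we understand it, and the chart file name is hypothetical; verify the details against the current API docs.

```python
import base64
import numpy as np
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

def to_data_url(path: str, media_type: str = "image/png") -> str:
    # Images are sent as base64-encoded data URLs.
    with open(path, "rb") as f:
        return f"data:{media_type};base64," + base64.b64encode(f.read()).decode()

# Embed an image (e.g., a revenue chart) into the shared vector space.
image_emb = np.array(
    co.embed(
        model="embed-english-v3.0",
        images=[to_data_url("q3_revenue_chart.png")],  # hypothetical file
        input_type="image",
    ).embeddings[0]
)

# Embed a text query into the same space.
text_emb = np.array(
    co.embed(
        model="embed-english-v3.0",
        texts=["chart showing quarterly revenue growth"],
        input_type="search_query",
    ).embeddings[0]
)

# Because both vectors live in one latent space, a single cosine
# similarity can rank text and image results against the same query.
cos = image_emb @ text_emb / (
    np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
)
print(f"text-image similarity: {cos:.3f}")
```

With separate text and image models, the two vectors would come from incompatible spaces and this comparison would be meaningless; the unified space is what makes one ranked result list across modalities possible.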
Embed 3 is available in more than 100 languages. Cohere announced that multimodal Embed 3 is now available on Amazon SageMaker and on its own platform.
Playing catch-up
Many users are becoming familiar with multimodal search thanks to platforms such as Google and chat interfaces like ChatGPT. As they grow accustomed to searching for information with images, it makes sense that they want the same experience at work.
Businesses see this benefit too, and other companies that offer embedding models are adding multimodal options. Google and OpenAI both offer multimodal embeddings, and open-source models can also embed images and other modalities. The multimodal embedding models that prove fast, accurate and secure will win this battle.
Cohere was founded by researchers who helped develop the Transformer architecture (Gomez, one of the authors of “Attention Is All You Need,” is a co-founder). Even so, it has struggled to capture the attention of many enterprise users. In September, it updated its APIs to make it easier for customers to switch between Cohere and competitor models. Cohere said at the time that the move was meant to align with industry standards, under which customers frequently switch between models.