Mistral AI, a relatively young company based in Paris, has made headlines with the recent release of its first large language model, Mistral 7B.
The new model has 7.3 billion parameters and is touted as one of the most powerful models of its size, with Mistral claiming it outperforms larger models such as Meta's Llama 2 13B. Mistral 7B offers natural language processing capabilities and can handle a range of English-language tasks, making it a valuable choice for various enterprise applications.
Mistral 7B is distributed under the permissive Apache 2.0 license, which imposes no usage restrictions beyond attribution, so any user with the necessary hardware or cloud resources is free to adopt the model.
Mistral AI aims to make AI practical for businesses by relying solely on publicly available data and contributions from customers.
In its announcement blog post, the Mistral AI team stated the project's main goal: “Our ambition is to become the leading supporter of the open generative AI community and bring open models to state-of-the-art performance. Mistral 7B’s performance demonstrates what small models can do with enough conviction.”
With the introduction of Mistral 7B, the company aims to provide teams with access to a compact model capable of low-latency text summarization, categorization, text completion, and code completion. Despite its recent release, Mistral AI claims its new model has surpassed competing open-source models in various benchmarks.
Despite being in its early stages, Mistral's demonstration of a compact model delivering strong performance across a range of tasks could have significant benefits for enterprises. On the Massive Multitask Language Understanding (MMLU) benchmark, Mistral 7B performs on par with a hypothetical Llama 2 model more than three times its size (roughly 23 billion parameters), potentially reducing memory usage and offering cost advantages without compromising output quality.
In that benchmark, which covers 57 subjects including mathematics, US history, computer science, and law, Mistral 7B achieved 60.1% accuracy, surpassing Llama 2 7B and 13B, which scored less than 44% and 55%, respectively. The model also outperformed the Llama models on key tests of common-sense reasoning and reading comprehension.
Mistral AI also highlights two architectural choices: grouped-query attention (GQA) for faster inference and sliding window attention (SWA) for handling longer sequences at a smaller cost.
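To illustrate the idea behind sliding window attention, here is a minimal sketch (an illustration, not Mistral's actual implementation): each token attends only to the previous `window` positions rather than the full causal prefix, so per-token attention cost stays constant as the sequence grows. Mistral 7B's reported window is 4096 tokens; the tiny numbers below are purely for demonstration.

```python
def sliding_window_mask(seq_len, window):
    """Boolean attention mask: position i may attend to position j
    only when 0 <= i - j < window.

    A plain causal mask allows all j <= i; the sliding window
    additionally drops tokens more than `window` steps back, cutting
    per-token attention cost from O(seq_len) to O(window).
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(5, 3)
# With window=3, the last token (position 4) sees positions 2-4,
# but no longer positions 0 and 1.
```

Information from outside the window still propagates indirectly: stacking layers lets each layer extend the effective receptive field by another window's width.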
Mistral 7B is a free model that can be deployed anywhere using the company's reference implementation, the vLLM inference server, and SkyPilot, from local setups to AWS, GCP, or Azure cloud environments. The model is available for download through several channels, including a 13.4-gigabyte torrent and a GitHub repository, where collaboration and problem-solving can take place.
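As a rough sketch of what querying a self-hosted deployment could look like: vLLM exposes an OpenAI-compatible HTTP server, so a client only needs to build a standard completions payload. The endpoint URL and model identifier below are illustrative assumptions, not values taken from Mistral's documentation.

```python
import json

# Assumed setup: a vLLM OpenAI-compatible server running locally and
# serving the model under the Hugging Face id "mistralai/Mistral-7B-v0.1".
def build_completion_request(prompt, max_tokens=128, temperature=0.7):
    """Build a JSON body for a POST to a completions endpoint such as
    http://localhost:8000/v1/completions (URL is an assumption)."""
    return json.dumps({
        "model": "mistralai/Mistral-7B-v0.1",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })

body = build_completion_request("Summarize the following meeting notes: ...")
```

Because the server speaks the OpenAI wire format, existing client libraries and tooling built for that API can typically be pointed at the local endpoint without code changes.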
The company plans to release a larger model in 2024 with more advanced reasoning and multilingual capabilities. Stay tuned with Atlasiko for more news on the development of Mistral AI's LLM and other technologies so you don't miss important innovations.