Inflection, a well-funded AI startup focused on democratizing personal AI, has unveiled its latest conversational agent, Pi. Powering Pi is Inflection-1, a large language model comparable in size and capabilities to GPT-3.5, the model behind the original ChatGPT. Assessing the quality of such models objectively and systematically is difficult, but more competition in this tier is welcome.
According to Inflection's published results, Inflection-1 has been benchmarked against other models in its tier, including GPT-3.5, LLaMA, Chinchilla, and PaLM-540B. It performs well across most metrics, particularly on middle- and high-school-level exams (e.g., biology) and "common sense" benchmarks (e.g., spatial reasoning). It falls short on coding tasks, where GPT-3.5 comes out ahead and GPT-4 surpasses every model in the comparison. That is no surprise: GPT-4 has raised the bar for coding considerably, in line with its widely acknowledged leap in quality.
Inflection plans to release results for a larger model comparable to GPT-4 and PaLM-2(L), but appears to be holding them back until it has something substantial to show. A successor, whether it ends up being called Inflection-2 or Inflection-1-XL, seems to be in the works but is not yet finished.
Although the AI community has not formally sorted models into weight classes like those in boxing, the analogy is fitting. A small model is less capable than a large one but can run efficiently on a mobile device, whereas a large model requires extensive data center resources. Comparing the two is like comparing apples to oranges.
The field is still young, and there is no consensus on which model sizes and configurations count as comparable, so any formal classification system would be premature at this stage.
Ultimately, the true test is hands-on use. Until Inflection opens its model to widespread use and independent evaluation, its published benchmarks should be treated with caution. If you want to try Pi, you can add it to your preferred messaging apps or talk to it online.