Open Source AI: A Fast Forward
Open-source AI is evolving rapidly, driving significant innovation in generative AI. With accessible research and platforms like GitHub and Hugging Face, the community has launched groundbreaking projects with impressive results.
We published our research on Open Source AI in Jan’24. Here’s a summary of that piece:
- GitHub saw a 148% increase in contributors and a 248% rise in GenAI projects in '23.
- Hugging Face hosted 400K+ models, helping make OS AI highly competitive with closed models.
- OS models like Mistral, Vicuna, Yi, and Llama caught up to, and in some cases surpassed, proprietary models, with Mixtral 8x7B outperforming GPT-3.5 on both Elo ratings and MMLU scores.
- OS AI startups attracted significant funding, with Mistral AI reaching unicorn status after a $487M deal. Deci AI, Supabase, and AutoGPT were well positioned for future funding activity.
In just a few months, the space has changed so much that it looks completely different from what we described in our Jan’24 piece. Open-source AI models have closed the gap with their closed-source counterparts on several benchmarks. Investor interest has surged, with over 30 deals in 2024 alone. Meanwhile, startup activity has shifted from building fundamental tools to driving downstream innovation in model training and monitoring.
Here’s a fast forward to mid-2024 with the latest developments in the open-source AI space.
OS AI Ecosystem: Steady growth, moving past the trough of disillusionment
The trend in GitHub stars for AI repositories follows the Hype Cycle more closely than a freehand sketch of the curve would. After growth cooled in Q1’23, developer interest recovered and stabilized, entering the “slope of enlightenment”, the phase in which value-driven innovation grows. Serious developer engagement (i.e., the number of GitHub contributors) in open-source AI has continued to increase in 2024.
Market Map: Dev tools are still hot, but momentum's increasing in training & monitoring as well
Significant surge in startups building Open Source AI products
The number of participants in open-source AI has surged since our last coverage of this space. New players such as Neum AI and Patronus AI have entered the field, and established players such as Vian AI have made open-source toolkits available to their users.
Dev tools remain hot; training and monitoring tools are seeing increased competition
A majority of startups are still focusing on developer tools for generative AI, which are essential for building, deploying, and managing applications. However, there has been an increase in startup activity around model training and monitoring use cases, suggesting a potential shift toward models fine-tuned on niche data and enhanced AI governance. In open-source models, winners are starting to emerge, with fewer new models being developed and greater emphasis on improved, more efficient versions from companies like Mistral and Meta.
OS development has bridged the gap with closed-source solutions
41% of enterprise users prefer open source for their Gen AI needs. Open source enables affordable and accessible research, fosters innovation from a diverse set of creators, and operates with fewer legal constraints. As a result, it is increasingly recognized as a major area of development, and the gap between closed-source and open-source offerings has narrowed significantly.
Funding Landscape: Funding moving up a gear with larger, later-stage deals
High funding momentum in open-source AI, startups graduating to growth-stage deals
The sector has seen 60+ deals in the past two years, totalling over $13 billion in funding. More than 45% of these deals are Series A+, indicating a strong focus on growth-stage investments.
- Deci AI, which our earlier OS AI research flagged as a contender for future funding rounds, was acquired by Nvidia for $300M
- Scale AI raised a $1B Series F round
- Mistral AI raised a $640M Series B round
- Together AI raised a $106M Series A round
Excluding Mistral and Databricks, model training and developer tools are the most heavily funded segments in open-source AI, accounting for 60% of the sector’s total funding.
NVIDIA remains an active strategic investor in this space, having participated in eight deals, including Scale AI, Mistral AI, and Together AI.
Foundational Models: Open-source models have closed the performance gap
Open-source AI models were already catching up with their closed-source counterparts by the end of 2023. The benchmark gap between open and closed models is now narrower than ever, with Meta’s Llama and Mistral scoring almost the same as GPT-4o on MMLU. Llama 3.1, launched just weeks ago, is on par with OpenAI’s latest GPT model on the MMLU benchmark.
The Open LLM Leaderboard, which Hugging Face relaunched as version 2 in June 2024, ranks open-source LLMs on new benchmarks focused on complex tasks. The update addresses the limitations of earlier evaluations (such as Elo and MMLU), on which model scores had begun to plateau.
Qwen is an open-source LLM developed by Alibaba Cloud and pretrained on a large volume of data, including web text, books, and code.
New leaders are emerging amidst robust competition in Open Source LLMs
Falcon and BLOOM led in Hugging Face traction in December 2023. Over the past six months, the landscape has shifted significantly as new competitors have emerged. Qwen had the most downloads in June’24, and Llama and Phi have also gained substantial traction.
GitHub Traction: Hugging Face, MindsDB and Roboflow have the hottest new repos in H1 2024
GitHub stars (similar to a “follow” on social media) are a direct indicator of a project’s popularity on GitHub. AutoGPT and ModularML’s Mojo led GitHub traction in 2023, and several other repositories have gained significant momentum since then.
Hugging Face’s LeRobot provides models, datasets, and tools for real-world robotics in PyTorch, with the aim of making robotics more accessible. It features state-of-the-art approaches in imitation and reinforcement learning, and offers pretrained models, human-collected datasets, and simulation environments.
MindsDB, backed by NVIDIA, is a platform for building AI models on enterprise data. It lets users deploy, serve, and fine-tune models in real time by integrating with a wide range of data sources and AI/ML frameworks. By simplifying the connection between data sources and AI/ML tools, MindsDB automates the workflows needed to build customized AI systems.
Looking Forward
- Open-source AI models have closed the gap with their closed-source counterparts.
Open-source models like Mistral and Llama have established a strong foothold in the AI community. They are evolving rapidly, and their newer, more efficient versions match or outperform top proprietary models like GPT and Claude on several benchmarks. Other open-source models, such as Qwen and Yi, are also quickly catching up in performance.
- OS AI has moved past the trough of the hype cycle; developer engagement continues to grow steadily
Developer interest in OS AI has entered the “slope of enlightenment”, where value-driven innovation grows. While Hugging Face and OpenAI lead in traction, repositories from MindsDB and Roboflow are gaining momentum, joining previous top performers like AutoGPT and Nomic AI.
- OS AI innovation has expanded downstream, from models and dev tools to training and monitoring
The open-source AI space has seen a surge in startup activity in 2024, with 150+ players innovating across use cases. The landscape has expanded beyond foundational models and dev tools to include model training, fine-tuning, and monitoring. This signals the space’s rapid evolution from fundamental tools toward mature, efficiency-driven offerings.
- Fundraising in OS AI is impressive in volume as well as size, even by Gen AI standards
Open-source AI startups stand out for their large deal sizes, which highlight their growing influence and potential in the industry. Nvidia’s $300M acquisition of Deci AI, Scale AI’s $1B Series F, and Mistral AI’s $640M Series B are cases in point. Supabase is a likely contender for a sizeable deal in the near future.
Interested in learning more about startups building in open-source AI? We have curated a list of 75+ Open Source AI startups that are seeing strong traction on alternative datasets such as GitHub activity, employee counts, website visits, job openings, and more. Download the list here.