Open Source AI Ecosystem: Substantial growth in AI projects and contributors

Specifically for Gen AI, the term “open source” typically implies that the source code of these components, along with any applicable weights and parameters (in the case of models), is publicly accessible, usable, modifiable, and redistributable.

Under this definition, the open source AI stack includes a comprehensive set of tools for building Gen AI applications: foundational models (such as Llama, Mistral), developer tools & frameworks (such as LangChain, Fixie), model training platforms (such as Weights & Biases, Anyscale), and monitoring tools (such as Datadog, Seldon).

Open source AI innovation is thriving with new projects and developers

Open source Gen AI is seeing significant growth in both projects and contributors. In 2023, GitHub witnessed 148% year-over-year growth in contributors and 248% year-over-year growth in the total number of Gen AI projects. As of 2023, there are 60K Gen AI projects on GitHub and over 400K models on Hugging Face.

Contributor base is becoming increasingly global, not restricted to the US and Europe

A majority of open source projects originate in the US and Europe, but beyond those regions the highest number of individual contributors to open source Gen AI in 2023 came from India and Japan. Developers from Hong Kong, the UK, Brazil, Germany and Singapore are also making numerous contributions to open source Gen AI. By 2027, India is projected to overtake the US as the largest developer community on GitHub.

Steady increase in serious contributors, while “tourist” interest has tempered since Q1 hype

Gen AI overall has shifted from initial widespread hype (peaking in Q1) to more focused and value-driven engagement. This resembles the "trough of disillusionment" phase, where initial excitement gives way to sustained, serious development.

A similar trend can be seen in the number of stars across GitHub repos, where growth has tempered since Q1. In contrast, the number of contributors to these projects, a proxy for serious developers, has grown steadily, up 148% cumulatively in 2023.

Python is the preferred language for open source AI

While JavaScript was the top programming language on GitHub in 2023, Python is the top choice for AI repositories. The preference for Python in ML projects has carried over to Gen AI because of its comprehensive ML libraries, such as TensorFlow and PyTorch. Python's flexibility in data handling and its platform-independent nature make it highly adaptable for diverse AI projects.

Mojo, a variation of Python that aims to combine the usability of Python with the performance of C++, is gaining traction as an AI-specific programming language. In Q4’23, Mojo saw a 73% month-over-month increase in GitHub stars, indicative of the repo’s popularity amongst developers.

AI repositories favouring more protective licensing

A disproportionate share of AI repos use the Apache License, which includes an explicit patent grant from contributors to users and terminates that grant for anyone who brings patent litigation over the licensed work. The Apache license is known for its extensive legal terminology and therefore offers better patent protection than other licenses. Though the permissive MIT license is the most popular across GitHub, Gen AI developers are predictably keen on securing their work with more protective licensing.