Distilled Giants: Why We Must Rethink Small AI Development
In recent years, the race to develop ever-larger AI models has captivated the technology industry. These models, with their billions of parameters, promise groundbreaking advances in fields ranging from natural language processing to image recognition. However, this relentless pursuit of size comes with high costs and a substantial environmental impact. While small AI offers a promising alternative, delivering efficiency and lower energy consumption, the current approach to building it still requires significant resources. As we pursue smaller and more sustainable AI, exploring new strategies that effectively address these limitations is critical.
Small AI: a sustainable answer to high costs and energy demands
Developing and maintaining large AI models is a costly endeavor. Estimates suggest that training GPT-3 cost more than $4 million, with more advanced models costing considerably more. These costs, covering hardware, storage, computing power and human resources, are prohibitive for many organizations, especially smaller enterprises and research institutions. This financial barrier creates an uneven playing field, limits access to advanced AI technology and hinders innovation.
Furthermore, the energy demands associated with training large AI models are enormous. Training a large language model such as GPT-3, for example, is estimated to have consumed nearly 1,300 megawatt-hours (MWh) of electricity, equivalent to the annual energy consumption of 130 U.S. households. And the costs do not stop at training: each ChatGPT request carries an inference cost of roughly 2.9 watt-hours. The IEA estimates that the collective energy demand of AI, data centers and cryptocurrency accounted for almost 2 percent of global energy demand, a figure expected to double by 2026 and approach Japan’s total electricity consumption. This high energy consumption not only increases operational costs but also adds to the carbon footprint, exacerbating the environmental crisis. To put it in perspective, researchers estimate that training a single large AI model can emit more than 626,000 pounds of CO2, equivalent to the emissions of five cars over their lifetimes.
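As a rough sanity check on these figures, the back-of-envelope arithmetic below relates the training and inference numbers to each other. It is only an illustrative sketch: the assumed household consumption of about 10,000 kWh per year and the one-million-request volume are assumptions for the sake of the calculation, not reported figures.

```python
# Back-of-envelope check of the energy figures cited above.
# Assumption: an average U.S. household uses roughly 10,000 kWh per year.

TRAINING_ENERGY_MWH = 1_300          # estimated energy to train GPT-3
HOUSEHOLD_KWH_PER_YEAR = 10_000      # assumed average U.S. household consumption

training_kwh = TRAINING_ENERGY_MWH * 1_000
households_equivalent = training_kwh / HOUSEHOLD_KWH_PER_YEAR
print(f"Training energy ~= annual use of {households_equivalent:.0f} households")

# Inference: ~2.9 Wh per ChatGPT request. One million requests is then about
# 2.9 MWh -- tiny per query, but it accumulates quickly at scale.
WH_PER_REQUEST = 2.9
requests = 1_000_000                  # illustrative volume, not a reported figure
inference_mwh = WH_PER_REQUEST * requests / 1_000_000
print(f"{requests:,} requests ~= {inference_mwh:.1f} MWh of inference energy")
```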
Amid these challenges, small AI offers a practical alternative. It is designed to be more efficient and scalable, requiring far less data and computing power. This reduces overall costs and makes advanced AI technology more accessible to smaller organizations and research teams. Small AI models also have lower energy requirements, which keeps operational costs down and limits their environmental impact. By using optimized algorithms and methods such as transfer learning, small AI can achieve high performance with fewer resources. This approach not only makes AI more affordable but also supports sustainability by minimizing both energy consumption and CO2 emissions.
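To make the transfer-learning point concrete, the sketch below shows one common pattern: reusing a compact pretrained backbone and training only a small task-specific head. It is a minimal illustration assuming PyTorch and torchvision are available; the MobileNetV3-Small backbone, the 10-class task and the dummy data are illustrative choices, not details from any particular system.

```python
import torch
import torch.nn as nn
from torchvision import models

# Transfer learning sketch: reuse a compact pretrained backbone and train
# only a small task-specific head. MobileNetV3-Small is an illustrative
# choice; any small pretrained model would do.
backbone = models.mobilenet_v3_small(weights="IMAGENET1K_V1")

# Freeze the pretrained weights so they are not updated during training.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classifier with a lightweight head for a hypothetical
# 10-class downstream task.
num_features = backbone.classifier[0].in_features
backbone.classifier = nn.Sequential(
    nn.Linear(num_features, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Only the new head's parameters are passed to the optimizer, which keeps
# the training cost a small fraction of training from scratch.
optimizer = torch.optim.Adam(backbone.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```

Because only the small head is updated, the number of trainable parameters, and with it the compute and energy budget, is a fraction of what full training from scratch would require.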
How small AI models are built today
Major technology companies such as Google, OpenAI and Meta recognize the benefits of small AI and are increasingly focusing on developing compact models. This shift has produced models such as Gemini Flash, GPT-4o Mini and Llama 7B. These smaller models are mainly built using a technique called knowledge distillation.
At its core, distillation involves transferring knowledge from a large, complex model to a smaller, more efficient version. In this process, a “teacher” model – a large AI model – is trained on extensive data sets to learn complex patterns and nuances. This model then generates predictions or ‘soft labels’ that summarize its deep understanding.
The ‘student’ model, a small AI model, is trained to replicate these soft labels. By mimicking the teacher’s behavior, the student model captures much of its knowledge and performance while working with significantly fewer parameters.
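The sketch below shows roughly what this teacher–student training step looks like in code, assuming PyTorch; the tiny placeholder networks, the temperature and the loss weighting are illustrative choices, not the recipe behind any specific model. The teacher’s temperature-softened logits serve as the soft labels, and the student is trained on a blend of that distillation signal and the ordinary hard-label loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(teacher_model, student_model, optimizer,
                      inputs, hard_labels, temperature=4.0, alpha=0.5):
    """One knowledge-distillation update: the student mimics the teacher's
    softened output distribution while also fitting the ground-truth labels."""
    # The teacher produces "soft labels"; no gradients are needed for it.
    with torch.no_grad():
        teacher_logits = teacher_model(inputs)

    student_logits = student_model(inputs)

    # Distillation loss: KL divergence between temperature-softened distributions.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill_loss = F.kl_div(log_student, soft_targets,
                            reduction="batchmean") * (temperature ** 2)

    # Standard supervised loss on the hard labels.
    hard_loss = F.cross_entropy(student_logits, hard_labels)

    # Weighted combination of the two signals.
    loss = alpha * distill_loss + (1 - alpha) * hard_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage with tiny placeholder networks standing in for the
# large teacher and the small student described above.
teacher_model = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student_model.parameters(), lr=1e-3)

inputs = torch.randn(16, 32)
hard_labels = torch.randint(0, 10, (16,))
distillation_step(teacher_model, student_model, optimizer, inputs, hard_labels)
```

In real systems the teacher is orders of magnitude larger than the student, which is precisely why the upfront training cost discussed below does not go away.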
Why we need to go beyond distilling big AI
While distilling big AI into smaller, more manageable versions has become a popular way to build small AI, there are several compelling reasons why this approach may not solve all the challenges of big AI development.
- Continued dependence on large models: While distillation creates smaller, more efficient AI models and improves computational and energy efficiency at inference time, it still relies on training large AI models first. Building small AI this way therefore still demands significant computing resources and energy, incurring high costs and environmental impacts before distillation even takes place. The need to repeatedly train large “teacher” models shifts the resource burden rather than eliminating it: the substantial upfront training costs remain a challenge, especially for smaller organizations and research groups, and the carbon footprint of that initial training phase can offset much of the benefit of using smaller, more efficient models.
- Limited innovation scope: Relying on distillation can narrow innovation by focusing on replicating existing large models rather than exploring new approaches. This can delay the development of new AI architectures or methods that might provide better solutions to specific problems. It also concentrates small AI development in the hands of the few resource-rich companies able to train big AI in the first place. As a result, the benefits of small AI are unevenly distributed, which can hinder broader technological progress and limit opportunities for innovation.
- Generalization and adaptation challenges: Small AI models created by distillation often struggle with new, unseen data. This happens because the distillation process may not fully reflect the ability of the larger model to generalize. As a result, although these smaller models perform well on familiar tasks, they often experience problems when faced with new situations. Furthermore, adapting distilled models to new modalities or datasets often involves retraining or refining the larger model first. This iterative process can be complex and labor-intensive, making it challenging to quickly adapt small AI models to rapidly evolving technological needs or new applications.
The bottom line
While distilling large AI models into smaller models may seem like a practical solution, it remains subject to the high cost of training large models. To truly make progress in small AI, we need to explore more innovative and sustainable practices. This means creating models designed for specific applications, improving training methods to be more cost and energy efficient, and focusing on environmental sustainability. By pursuing these strategies, we can advance the development of AI in a way that is both responsible and beneficial to the industry and the planet.