Google is Making AI Training 28% Faster by Using SLMs as Teachers
Training large language models (LLMs) has become out of reach for most organizations. With costs running into millions and computing requirements that make supercomputers sweat, AI development has remained behind the doors of tech giants. But Google just turned this story on its head with an approach so simple it makes you wonder why no one thought of it before: using smaller AI models as teachers.
How SALT works: a new approach to training AI models
In a recent research paper entitled “A little help goes a long way: efficient LLM training using small LMs,” Google Research and DeepMind introduced SALT (Small model Aided Large model Training), a new method that challenges our traditional approach to training LLMs.
Why is this research important? Currently, training large AI models is like trying to teach someone everything they need to know about a subject at once: it is inefficient, expensive, and often limited to organizations with vast computing resources. SALT takes a different route and introduces a two-phase training process that is both innovative and practical.
Breaking down how SALT actually works:
Phase 1: Knowledge distillation
- A smaller language model (SLM) acts as a teacher and shares its knowledge with the larger model
- The smaller model focuses on transferring the “learned knowledge” through what researchers call “soft labels.”
- Think of it as a teaching assistant covering fundamental concepts before a student moves on to advanced topics
- This phase is especially effective in ‘easy’ learning areas – areas where the smaller model has strong predictive confidence
Phase 2: Self-directed learning
- The large model switches to independent learning
- It focuses on mastering complex patterns and challenging tasks
- This is where the model develops capabilities beyond what its smaller ‘teacher’ could offer
- The transition between phases uses carefully designed strategies, including linear decay and linear ratio decay of the distillation loss weight (a minimal code sketch follows this list)
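To make the two phases concrete, here is a minimal sketch of what SALT-style training losses could look like. This is an illustrative reconstruction rather than Google’s implementation: the model interfaces, batch format, and exact loss weighting are assumptions, and the paper’s objective may differ in detail.

```python
# Minimal, illustrative sketch of SALT-style training losses (not Google's code).
# Assumptions: `teacher` and `student` are callables mapping token ids of shape
# [batch, seq] to next-token logits of shape [batch, seq, vocab] over a shared
# vocabulary, and `batch` already contains shifted `labels` for next-token
# prediction. Function and variable names are illustrative, not from the paper.
import torch
import torch.nn.functional as F


def phase1_distillation_loss(student, teacher, batch, lam):
    """Phase 1: the usual data loss plus a KL term toward the teacher's soft labels."""
    input_ids, labels = batch["input_ids"], batch["labels"]
    student_logits = student(input_ids)          # [B, T, V]
    with torch.no_grad():                        # the small teacher is frozen
        teacher_logits = teacher(input_ids)      # its probabilities are the "soft labels"

    vocab = student_logits.size(-1)
    # Standard next-token cross-entropy on the ground-truth tokens ("hard" labels).
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), labels.reshape(-1))
    # KL divergence between the teacher's token distribution and the student's.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1).reshape(-1, vocab),
        F.softmax(teacher_logits, dim=-1).reshape(-1, vocab),
        reduction="batchmean",
    )
    # `lam` is the distillation-loss weight; SALT decays it toward zero so that
    # training gradually hands control back to the data (schedules sketched below).
    return ce + lam * kl


def phase2_pretraining_loss(student, batch):
    """Phase 2: the teacher is gone; the student learns from the data alone."""
    input_ids, labels = batch["input_ids"], batch["labels"]
    logits = student(input_ids)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
```

Once the distillation weight reaches zero, the Phase 1 loss reduces to the Phase 2 loss, which is what makes the handover between phases smooth.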
In non-technical terms, imagine that the smaller AI model is like a helpful teacher guiding the larger model in the initial stages of training. This teacher provides additional information along with the answers, indicating how certain it is of each answer. This extra information, the ‘soft labels’, helps the larger model learn faster and more effectively.
As the larger AI model becomes more capable, it must move from relying on the teacher to learning on its own. This is where “linear decay” and “linear ratio decay” come into play.
Think of these techniques as gradually reducing the teacher’s influence over time:
- Linear decay: It’s like slowly turning down the volume of the teacher’s voice. The teacher’s guidance becomes less prominent with each step, allowing the larger model to focus more on learning from the raw data itself.
- Linear ratio decay: This is like adjusting the balance between the teacher’s advice and the actual task. As the training progresses, the emphasis shifts more to the original task, while the teacher’s input becomes less dominant.
The goal of both techniques is to ensure a smooth transition for the larger AI model, preventing sudden changes in its learning behavior.
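Both schedules can be expressed as small helper functions. The versions below only illustrate the shape of the transition; the schedule lengths and the exact parameterization used in the paper are not reproduced here.

```python
# Illustrative schedules for phasing out the teacher's influence. The step
# counts and endpoints are placeholders, not values from the paper.

def linear_decay(step: int, distill_steps: int) -> float:
    """'Turn down the teacher's volume': the weight on the teacher's KL term
    falls linearly from 1 to 0 over `distill_steps`, while the data loss keeps
    its full weight throughout."""
    return max(0.0, 1.0 - step / distill_steps)


def linear_ratio_decay(step: int, distill_steps: int) -> tuple[float, float]:
    """'Shift the balance': returns (teacher_weight, data_weight) that always sum
    to 1, moving linearly from teacher-dominated toward purely data-driven."""
    teacher_w = max(0.0, 1.0 - step / distill_steps)
    return teacher_w, 1.0 - teacher_w


# Example usage with the Phase 1 loss sketched earlier (hypothetical step count):
#   w = linear_decay(step, distill_steps=20_000)
#   loss = cross_entropy + w * kl_to_teacher            # linear decay
#   tw, dw = linear_ratio_decay(step, distill_steps=20_000)
#   loss = dw * cross_entropy + tw * kl_to_teacher      # linear ratio decay
```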
The results are convincing. When Google researchers tested SALT by using a 1.5 billion parameter SLM to help train a 2.8 billion parameter LLM on the Stack dataset, they saw:
- A 28% reduction in training time compared to traditional methods
- Significant performance improvements after fine-tuning:
  - Accuracy on arithmetic problems increased to 34.87% (from 31.84% at baseline)
  - Reading comprehension reached 67% accuracy (up from 63.7%)
But what makes SALT truly innovative is its theoretical framework. The researchers found that even a “weaker” teacher model can improve student performance by achieving what they call a “favorable bias-variance tradeoff.” In simpler terms, the smaller model helps the larger model learn fundamental patterns more efficiently, creating a stronger foundation for advanced learning.
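For readers who want the intuition behind that phrase, here is a schematic version of the bias-variance decomposition that distillation analyses typically appeal to. This is a simplification in my own notation, not the paper’s actual theorem.

```latex
% Schematic decomposition (simplified, not the paper's exact statement):
% \hat{R}_{\mathrm{distill}}(f) is the training objective built from the
% teacher's soft labels, and R(f) is the true risk of the student f.
\[
\mathbb{E}\!\left[\big(\hat{R}_{\mathrm{distill}}(f) - R(f)\big)^{2}\right]
  = \underbrace{\big(\mathbb{E}[\hat{R}_{\mathrm{distill}}(f)] - R(f)\big)^{2}}_{\text{bias: the teacher's soft labels are imperfect}}
  + \underbrace{\operatorname{Var}\!\big[\hat{R}_{\mathrm{distill}}(f)\big]}_{\text{variance: soft labels are less noisy than one-hot labels}}
\]
```

A weaker teacher contributes some bias, but if the variance reduction from its soft labels outweighs that bias, especially on the ‘easy’ regions early in training, the student still comes out ahead; that is the tradeoff the researchers formalize.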
Why SALT could reshape the playing field for AI development
Remember when cloud computing transformed who could start a tech company? SALT could do the same for AI development.
I’ve been following AI training innovations for years, and most breakthroughs have mainly benefited the tech giants. But SALT is different.
Here’s what it could mean for the future:
For organizations with limited resources:
- You may no longer need a massive computing infrastructure to develop capable AI models
- Smaller research labs and companies could experiment with developing custom models
- The 28% reduction in training time directly translates into lower computing costs
- More importantly, you can start with modest computing resources and still achieve professional results
For the AI development landscape:
- More players could enter the field, leading to more diverse and specialized AI solutions
- Universities and research institutions could conduct more experiments with their existing resources
- The barrier to entry for AI research is dropping significantly
- We may see new applications in areas that previously could not afford the development of AI
What this means for the future
By using small models as teachers, we not only make AI training more efficient; we also fundamentally change who gets to participate in AI development. The implications go far beyond technical improvements.
Important points to keep in mind:
- A 28% reduction in training time can be the difference between starting an AI project and keeping it out of reach
- The performance improvements (34.87% for math, 67% for reading tasks) show that accessibility does not always mean compromising on quality
- SALT’s approach proves that sometimes the best solutions come from rethinking the fundamentals, rather than just adding more computing power
What to pay attention to:
- Keep an eye on smaller organizations that are starting to develop custom AI models
- Look forward to new applications in areas that previously could not afford AI development
- Look for innovations in the way smaller models are used for specialized tasks
To recap: SALT’s real value lies in how it can reshape who gets to innovate in AI. Whether you run a research lab, lead a technology team, or are just interested in AI development, this is the kind of breakthrough that could make your next big idea possible.
Maybe you should think about that AI project you thought was out of reach. It may be more possible than you thought.