Pruna AI open sources its AI model optimization framework

Pruna AI, a European startup that has been working on compression algorithms for AI models, is making its optimization framework open source on Thursday.
Pruna AI has built a framework that applies several efficiency methods, such as caching, pruning, quantization, and distillation, to a given AI model.
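To make one of those methods concrete, here is a minimal sketch of post-training 8-bit quantization in plain Python. This is an illustration of the general technique only, not Pruna AI's actual implementation; the function names and the symmetric scaling scheme are assumptions for the example.

```python
# Minimal sketch of symmetric 8-bit post-training quantization.
# Illustrative only; this is not Pruna AI's actual implementation.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0   # width of one int8 step
    q = [round(w / scale) for w in weights]     # integers in [-127, 127]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.98]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight is within half a quantization step of the original,
# while the stored values shrink from 32-bit floats to 8-bit integers.
```

The same idea scales up to model weights: storage drops roughly 4x versus float32, at the cost of a small, measurable accuracy loss.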
“We also standardize saving and loading the compressed models, applying combinations of these compression methods, and also evaluating your compressed model after you compress it,” Pruna AI co-founder and CTO John Rachwan said.
In particular, Pruna AI’s framework can evaluate whether there is significant quality loss after compressing a model, along with the performance gains you get in return.
“If I were to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers: how to call them, how to save them, load them, etc. We are doing the same, but for efficiency methods,” he added.
Big AI labs have already been using various compression methods. For example, OpenAI has been relying on distillation to create faster versions of its flagship models.
That is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. Similarly, the Flux.1-schnell image generation model is a distilled version of the Flux.1 model from Black Forest Labs.
Distillation is a technique used to extract knowledge from a large AI model with a “teacher-student” setup. Developers send requests to a teacher model and record the outputs. Answers are sometimes compared with a dataset to see how accurate they are. These outputs are then used to train the student model, which learns to approximate the teacher’s behavior.
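The training step described above can be sketched as a loss function: the student is penalized for disagreeing with the teacher's (temperature-softened) output distribution. The logits below are invented for illustration; in real distillation they come from running the same prompt through both networks.

```python
import math

# Minimal sketch of a soft-target distillation loss.
# The logits are made up; real training feeds identical inputs to the
# teacher and student models and backpropagates this loss into the student.

def softmax(logits, temperature=1.0):
    """Temperature-softened probabilities: higher T spreads the distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]   # recorded teacher outputs for one request
student = [2.5, 1.2, 0.1]   # current student outputs for the same request
loss = distillation_loss(teacher, student)  # shrinks as the student matches
```

The loss is minimized when the student reproduces the teacher's distribution exactly, which is what lets a much smaller network inherit the larger one's behavior.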
“For big companies, what they usually do is build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, let’s say one quantization method for LLMs, or one caching method for diffusion models,” Rachwan said. “But you cannot find a tool that aggregates all of them, makes them all easy to use, and combines them together. And this is the big value that Pruna is bringing right now.”

While Pruna AI supports any kind of model, from large language models to diffusion models, speech-to-text models, and computer vision models, the company is currently focusing more specifically on image and video generation models.
Some of Pruna AI’s existing users include Scenario and PhotoRoom. In addition to the open source edition, Pruna AI offers an enterprise version with advanced optimization features, including an optimization agent.
“The most exciting feature that we are releasing soon will be a compression agent,” Rachwan said. “Basically, you give it your model, you say: ‘I want more speed, but don’t drop my accuracy by more than 2%.’ And then the agent will just do its magic.”
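One plausible reading of such an agent is a search over candidate compression configurations, keeping the fastest one whose accuracy stays within the user's tolerance. The sketch below is a hypothetical illustration of that idea; the configuration names, numbers, and selection logic are invented and are not Pruna AI's actual search strategy.

```python
# Hypothetical sketch: pick the fastest compression configuration whose
# accuracy drop stays within tolerance. All names and numbers are invented;
# this is not Pruna AI's actual agent.

def pick_config(candidates, baseline_accuracy, max_drop=0.02):
    """Return the fastest candidate losing at most `max_drop` accuracy."""
    acceptable = [
        c for c in candidates
        if baseline_accuracy - c["accuracy"] <= max_drop
    ]
    if not acceptable:
        return None  # no configuration meets the accuracy constraint
    return max(acceptable, key=lambda c: c["speedup"])

candidates = [
    {"name": "int8-quant",    "accuracy": 0.915, "speedup": 1.8},
    {"name": "int4-quant",    "accuracy": 0.880, "speedup": 3.1},
    {"name": "prune+distill", "accuracy": 0.905, "speedup": 2.4},
]
best = pick_config(candidates, baseline_accuracy=0.92)
# With a 2% tolerance, "prune+distill" wins: int4-quant drops 4% accuracy.
```

In practice each candidate's accuracy and speedup would be measured by actually compressing and benchmarking the model, which is where the framework's built-in evaluation comes in.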
Pruna AI charges by the hour for its pro version. “It’s similar to how you would think of a GPU when you rent a GPU on AWS or any cloud service,” Rachwan said.
And if your model is a critical part of your AI infrastructure, you will end up saving a lot of money on inference with the optimized model. For example, Pruna AI has made a Llama model eight times smaller without too much quality loss using its compression framework. Pruna AI hopes its customers will think of its compression framework as an investment that pays for itself.
Pruna AI raised a $6.5 million seed funding round a few months ago. Investors in the startup include EQT Ventures, Daphni, Motier Ventures, and Kima Ventures.