Two undergrads built an AI speech model to rival NotebookLM

A few students, nor with extensive AI expertise, say that they have made an openly available AI model that Podcast-Style can generate clips that are comparable to Google’s Notebooklm.
The market for synthetic speech tools is enormous and is growing. Elevenlabs is one of the biggest players, but there is no shortage of challengers (see Playai, Sesame, etc.). Investors believe that these tools have enormous potential. According to pitchbookStartups that Voice AI Tech developed last year raised more than $ 398 million in VC financing.
Toby Kim, one of Korea-based co-founders of Nari LabsThe group behind the newly released model said that he and his fellow founder started learning about Speech AI three months ago. Inspired by Notebooklm, they wanted to make a model that offered more control over generated voices and ‘freedom in the script’.
Kim says they used the TPU Research Cloud Program of Google, that researchers offers free access to the company’s TPU AI chips to train the Nari model, DIA. With a weight of 1.6 billion parameters, Dia can generate dialogue from a script, allowing users to adjust the notes of the speakers and insert disfluencies, cough, laugh and other non -verbal signals.
Parameters are the internal variables that models use to make predictions. In general, models with more parameters perform better.
Available on the AI DEV platform Hug And GitubDia can work on most modern PCs with at least 10 GB Vram. It generates a random voice unless a description of an intended style is requested, but it can also clone the voice of a person.
In WAN’s short tests from Dia via Nari’s Web demoDia worked pretty well, not complaining the generation of two -way schats on each topic. The quality of the voices seems competitive with other tools that are available, and the speech clockwise function is one of the easiest that this reporter has tried.
Here is an example:
Like many speech generatorsHowever, DIA offers few guarantees. It would be trivial to make disinformation or a scam recording. On the Dia project pages, Nari discourages abuse of the model to pretend, mislead or otherwise enter into illegal campaigns, but the group says that it is “not responsible” for abuse.
Nari also did not disclose what data it has scraped to train slide. It is possible that DIA has been developed using copyrighted content – A commentator On Hacker News, one example sounds like the hosts of the Podcast “Planet Money” of NPR. Training models on copyright protected content is a widespread but legal dubious practice. Some AI companies claim that reasonable use protects them against liability, while holders of rights claim that reasonable use does not apply to training.
In any case, Kim says that Nari’s plan is to create a synthetic speech platform with a ‘social aspect’ on top of DIA and larger, future models. Nari is also planning to issue a technical report for DIA and to extend the support of the model to languages outside English.