
Irony alert: Hallucinated citations found in papers from NeurIPS, the prestigious AI conference

AI detection startup GPTZero scanned all 4,841 papers accepted by the prestigious Conference on Neural Information Processing Systems (NeurIPS), which took place last month in San Diego. The company found 100 hallucinated citations across 51 papers that it confirmed were fake, GPTZero tells TechCrunch.

Having a paper accepted by NeurIPS is a resume-worthy achievement in the world of AI. And given that these are the leading minds of AI research, it is perhaps unsurprising that they would use LLMs for the catastrophically boring task of writing citations.

There are caveats to this finding: 100 confirmed hallucinated citations across 51 papers is not statistically significant. Each paper contains dozens of citations, so out of tens of thousands of citations, this is, statistically speaking, close to zero.

It is also important to note that an inaccurate citation does not invalidate a paper's research. As NeurIPS told Fortune, which first reported on GPTZero’s research: “Even if 1.1% of papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves [is] not necessarily invalid.”

But having said all that, a fabricated citation is no small matter either. NeurIPS prides itself on its “rigorous scientific publications in the field of machine learning and artificial intelligence.” And each paper is reviewed by multiple people, who are tasked with catching errors like hallucinations.

Citations are also a kind of currency for researchers. They are used as a career benchmark to show how influential a researcher’s work is among their peers. When AI makes them up, it dilutes their value.


No one can blame the peer reviewers for not noticing a few AI-fabricated citations, given the sheer volume involved. GPTZero is also quick to point this out. The aim of the exercise was to provide specific data on how AI is creeping in through “a tsunami of submissions” that has “strained the review pipelines of these conferences to the breaking point,” the startup says in its report. GPTZero even references a May 2025 paper called “The AI Conference Peer Review Crisis,” which discussed the problem at premier conferences including NeurIPS.


But why couldn’t the researchers themselves fact-check the LLM’s output for accuracy? They certainly know the actual list of papers they drew on for their work.

What it all really points to is one big, ironic conclusion: if the world’s leading AI experts, with their reputations on the line, can’t ensure that their LLM usage is accurate in every detail, what does that mean for the rest of us?

