A new study finds leading AI research agents often fail to credit original sources, even when data is accurate. Editorial teams must maintain strict human oversight to ensure source transparency.
Newsrooms adopting AI research tools to streamline reporting are facing a critical challenge: even the most advanced systems frequently fail to attribute information to the original source, according to a new study by Velora. This gap in source transparency can undermine editorial verification and force publishers to maintain rigorous human checks before publication.
The report, titled “JournoBench. Which AI research agents can a newsroom trust to source a story?”, evaluated nine AI-powered research products across thirty recent news events. Researchers Danny Bellion and Peter Stuart found that all tested tools, including GPT-5.5, Velora, Gemini 3.5 Flash, Claude Sonnet 4.6, and Perplexity sonar-pro, struggled to consistently identify and cite the primary source of information. Instead, these systems often linked accurate data to secondary outlets, such as other news sites or blogs, a phenomenon the authors call “fact laundering.”
This distinction matters for publishers: accurate data alone is not enough, and verifiable provenance is essential for editorial standards. The study found that AI-generated reports could present correct figures or statements, but when those are not tied directly to the original document, editorial teams are left with weakened fact-checking capabilities. The risk extends beyond fabricated data to poor attribution, which makes it difficult to trace the origin of each claim.
JournoBench focused on four criteria: whether the AI reached the primary source, captured essential data and quotes, linked each fact to the correct source, and avoided factual errors. Unlike other benchmarks that only assess answer accuracy, this test prioritized documentation standards expected in professional newsrooms. Each tool was tested twice on thirty real-world cases, with human-generated answer keys based on primary sources.
GPT-5.5 led the rankings with an 81% score, followed by Velora at 77%, and GPT-5.4 at 71%. Gemini 3.5 Flash, Gemini 3.1 Pro, Claude Sonnet 4.6, and Claude Opus 4.8 scored between 60% and 70%. Perplexity sonar-pro and Linkup trailed with 35% and 29%, respectively. The study found that while most tools could surface key facts (at rates from 63% to 92%), they diverged sharply on reaching the original document and attributing facts correctly. GPT-5.5, GPT-5.4, and Velora reached the primary source in 87% of cases, with GPT-5.5 also leading in accurate attribution at 83%.
Four main failure types emerged: loss of secondary details (40% of reports), failure to reach the primary source (27%), fact laundering (23%), and factual contradictions (11%). The “fact laundering” issue is particularly problematic, as it can go unnoticed in quick reviews. Even when the original source is found and listed, if main data points are attributed to a secondary article, editors cannot easily verify which facts come from the primary document.
One test case involved Lululemon’s fiscal 2026 forecast revision. The primary source was the company’s official earnings release, but the study found that AI agents sometimes cited secondary news coverage instead. The authors stressed that newsrooms should always credit the company directly, regardless of who reported first.
For publishers, the study highlights the need for human oversight when integrating AI research agents into editorial workflows. Reviewers should not only check for plausible summaries or numbers, but also confirm that the AI reached the original document, cited key data alongside that source, and clearly distinguished between primary and third-party information.
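The three reviewer checks above can be expressed as a short script. What follows is a minimal sketch, not the study's methodology: the `Claim` structure, the `review` function, and the `example.com` / `example.org` addresses are all hypothetical, and it assumes the AI report exposes one cited link per claim plus a list of sources.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Claim:
    """Hypothetical record: one fact or quote and the link attached to it."""
    text: str
    cited_url: str


def host(url: str) -> str:
    """Return the lowercase hostname, without a leading 'www.'."""
    name = (urlparse(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def review(claims, sources_listed, primary_host):
    """Run the three reviewer checks on one AI-generated report.

    Assumption: the reviewer already knows which host publishes the
    primary document (for example, a company's investor relations site).
    """
    primary = primary_host.lower()
    # Check 1: did the report reach the original document at all?
    reached = any(host(u) == primary for u in sources_listed)
    # Checks 2 and 3: is each claim tied to the primary source, or to a
    # third-party page? The latter is the "fact laundering" pattern.
    laundered = [c.text for c in claims if host(c.cited_url) != primary]
    return {
        "reached_primary": reached,
        "laundered_claims": laundered,
        "needs_human_check": (not reached) or bool(laundered),
    }


# Placeholder data only; these claims and URLs are not from the study.
claims = [
    Claim("Claim A", "https://investor.example.com/release"),
    Claim("Claim B", "https://news.example.org/story"),
]
sources = ["https://investor.example.com/release", "https://news.example.org/story"]
result = review(claims, sources, "investor.example.com")
print(result)
```

In this placeholder case the report lists the primary source, yet "Claim B" is attributed to a secondary page, so the report is still flagged for human review. Exact hostname matching is a deliberate simplification: a real workflow would need to handle mirrors, PDFs hosted elsewhere, and regulatory filings.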
Cost analysis showed that GPT-5.5, while top-performing, was also the most expensive at about fifty cents per case. Velora, four points behind in overall score, cost just two cents per case. The authors noted that the cost figures are not directly comparable, as Velora’s pricing reflects wholesale API and token costs, while the others are priced at public provider rates.
As publishers weigh the risks and benefits of AI in editorial workflows, some are also exploring new ways to capture and retain audience engagement through AI-powered formats. For example, several major outlets have begun deploying Q&A search and ad tools to keep users on-site.