On Thursday, Google launched a “reimagined” version of its research agent, Gemini Deep Research, built on its much-ballyhooed state-of-the-art foundation model, Gemini 3 Pro.
The new agent isn’t just designed to produce research reports, though it can still do that. It now lets developers embed Google’s state-of-the-art research capabilities into their own apps. That capability comes via Google’s new Interactions API, which is designed to give devs more control in the coming agentic AI era.
The latest Gemini Deep Research tool is an agent that can synthesize mountains of information and handle a massive context dump in the prompt. Google says customers use it for tasks ranging from due diligence to drug toxicity research.
Google also says it will soon incorporate this new deep research agent into services such as Google Search, Google Finance, its Gemini app, and its popular NotebookLM. This is another step toward a world in which humans don’t Google anything anymore; their AI agents do.
The tech giant says Deep Research benefits from Gemini 3 Pro’s status as its “most factual” model, one trained to reduce hallucinations during complex tasks.
AI hallucinations, in which the LLM simply makes stuff up, are an especially important issue for long-running, deep-reasoning agentic tasks, where many autonomous decisions are made over minutes, hours, or longer. The more choices an LLM has to make, the greater the chance that even one hallucinated decision will invalidate the entire output.
To back up its progress claims, Google has also created yet another benchmark (as if the AI world needed another one). The new benchmark, unimaginatively named DeepSearchQA, is meant to test agents on complex, multi-step information-seeking tasks. Google has open sourced the benchmark.
It also tested Deep Research on Humanity’s Last Exam, a far more curiously named independent benchmark of general knowledge filled with impossibly niche questions, and on BrowseComp, a benchmark for browser-based agentic tasks.
As you might expect, Google’s new agent bested the competition on its own benchmark, as well as on Humanity’s Last Exam. Even so, OpenAI’s GPT-5 Pro was a surprisingly close second across the board and slightly bested Google on BrowseComp.
But those benchmark comparisons were obsolete almost the moment Google published them, because on the same day, OpenAI released its highly anticipated GPT-5.2, codenamed Garlic. OpenAI says its newest model bests its competitors, particularly Google, on a suite of the usual benchmarks, including OpenAI’s homegrown one.
Perhaps one of the most interesting parts of this announcement was the timing. Knowing that the world was awaiting the release of Garlic, Google dropped some AI news of its own.