I asked the same four generative AI services (ChatGPT, Claude, Gemini, Grok) the question I asked yesterday, “What is today’s biggest geopolitical news story?” Today I received three different answers. Each service was queried within seconds of the others, using default settings and no follow-up prompts.
ChatGPT
Flagged U.S.-EU trade and tariff turmoil fomented by President Trump’s off-again, on-again tariffs.
Claude
Still reports the imminent three-year anniversary of Russia’s invasion of Ukraine as the biggest geopolitical story. Interestingly, Claude repeated today the same error it made yesterday.
Gemini
Declared escalating tension between the U.S. and Iran the biggest story.
Grok
Also reported tensions between the U.S. and Iran as the biggest story.
In fairness, each service reported all of the stories as important; only the relative priorities differed.
The divergence doesn’t reflect factual disagreement so much as differing implicit models of geopolitical significance. I suspect that stems from differences in how the models were trained rather than from differences in algorithmic structure, which is itself interesting. I’ll continue to follow up on it.

I don’t know if there’s much LLM-provided value in that question, in the sense that I feel the LLMs’ responses are summarizations of the top headlines from the AP, Reuters, NYT, WashPost, WSJ, and FT, and so reflect the tastes of those sites’ editors.
A more interesting question for LLMs: what are the most underreported geopolitical stories today (when evaluated on 1-year, 5-year, 10-year, and 50-year time frames)? That would use more of their analytical capabilities.
I’m not trying to maximize utility. I do that every day in other uses.
I’m trying to identify differences among the various services. For that purpose, the simpler and more standardized the prompt, the better.