I asked three different commercially available LLMs the same question:
Which TLDs have the same name as valid HTML5 elements?
This is a pretty simple question to answer. Take two lists and compare them. I know this question is possible to answer because I went through the lists two years ago. Answering the question was a little tedious and relied on my tired human eyes making no mistakes. So surely this is the sort of thing which an LLM excels at, right?
Wrong!
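For reference, the whole comparison fits in a few lines of Python. This is only a sketch, not how I originally did it: it assumes you've downloaded IANA's TLD list from https://data.iana.org/TLD/tlds-alpha-by-domain.txt and saved it locally as tlds-alpha-by-domain.txt, and it hard-codes the element names from the HTML Living Standard.

```python
# Rough sketch: intersect HTML element names with the IANA TLD list.
# Assumes https://data.iana.org/TLD/tlds-alpha-by-domain.txt has been
# saved locally as "tlds-alpha-by-domain.txt".

# Element names taken from the WHATWG HTML Living Standard
# (hard-coded here; may not be exhaustive).
HTML_ELEMENTS = {
    "a", "abbr", "address", "area", "article", "aside", "audio", "b",
    "base", "bdi", "bdo", "blockquote", "body", "br", "button", "canvas",
    "caption", "cite", "code", "col", "colgroup", "data", "datalist",
    "dd", "del", "details", "dfn", "dialog", "div", "dl", "dt", "em",
    "embed", "fieldset", "figcaption", "figure", "footer", "form",
    "h1", "h2", "h3", "h4", "h5", "h6", "head", "header", "hgroup",
    "hr", "html", "i", "iframe", "img", "input", "ins", "kbd", "label",
    "legend", "li", "link", "main", "map", "mark", "menu", "meta",
    "meter", "nav", "noscript", "object", "ol", "optgroup", "option",
    "output", "p", "picture", "pre", "progress", "q", "rp", "rt",
    "ruby", "s", "samp", "script", "search", "section", "select",
    "slot", "small", "source", "span", "strong", "style", "sub",
    "summary", "sup", "table", "tbody", "td", "template", "textarea",
    "tfoot", "th", "thead", "time", "title", "tr", "track", "u", "ul",
    "var", "video", "wbr",
}

with open("tlds-alpha-by-domain.txt") as f:
    # The IANA file starts with a "#" comment line; TLDs are upper case.
    tlds = {line.strip().lower() for line in f if not line.startswith("#")}

print(sorted(HTML_ELEMENTS & tlds))
```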
Here's how the three big beasts fared.
ChatGPT
Good question! Let’s carefully check this.
[…Skip preamble…]
✅ Confirmed Matches
The following TLDs exactly match valid HTML5 element names:
- .audio → <audio>
- .code → <code>
- .link → <link>
- .menu → <menu>
- .nav → <nav>
- .video → <video>
This is wrong on two counts. First, it misses off several matches. Secondly, there is no .code TLD! There is a .codes, but that's not a match.
Google Gemini
Gemini shits the bed completely:
Here are the top-level domains (TLDs) that have the same name as valid HTML5 elements:
- .address: The HTML <address> element is used to define the contact information for the author/owner of a document or an article.
- .article: The HTML <article> element specifies independent, self-contained content.
- .aside: The HTML <aside> element is used to define some content aside from the content it is placed in.
- .audio: The HTML <audio> element is used to embed sound content in a document.
- .body: The HTML <body> element defines the document's main content.
- […Skip a dozen more items…]
It just lists a bunch of HTML elements with no crossover to TLDs. Thanks Google. Thoogle.
Claude
Anthropic's AI does a half-arsed job.
Here are the TLDs that match HTML5 element names:
Some other potential matches depend on how broadly you interpret "HTML5 elements":
- .audio - matches the <audio> element
- .video - matches the <video> element
- .data - matches the <data> element
- .link - matches the <link> element
- .menu - matches the <menu> element
- .style - matches the <style> element
- .select - matches the <select> element
- .app - while there's no <app> element in HTML5, there was discussion of it during HTML5 development
- .art - not a standard HTML element, though some frameworks use custom elements like this
The first seven are correct - although it ignores a dozen more correct answers. But then it does the LLM thing of trying to over-fit the data in a simulacrum of usefulness.
Don't believe the hype
To be clear, I would expect a moderately intelligent teenager to be able to find two lists and compare them. If an intern gave me the same attention to detail as above, we'd be having a cosy little chat about their attitude to work.
I've discussed before why people have such dramatically different experiences using AI. For some it is an all-knowing and ineffable source of wisdom. For others, like me, it is mostly garbage.
I think it comes down to how familiar you are with the domain and its constraints. When I watch a medical drama, I have no idea if they're using realistic language. It sounds good, but real doctors probably cringe at the inaccuracies.
AI seems plausible because it relies on the Barnum Effect - it tells people what they want to hear. It produces words which sound correct.
We need a new term for Gell-Mann Amnesia. Something that describes how an AI is convincing if you don't understand its reasoning, and close to useless if you understand its limitations.
20 thoughts on “LLMs are still surprisingly bad at some simple tasks”
To stave off some obvious comments:
yoUr'E PRoMPTiNg IT WRoNg!
Am I though? I don't think I've asked for anything particularly ambiguous. This task is a matter of fact, not opinion. If it can't cope with a fairly limited task with an easily testable end-state, why should I trust it with anything more complicated?
How much prompting would I have to do in order to get the correct answer? Would it be more or less effort than manually doing the work?
WHY Do YOu HaTE PROGREsS?
I don't. I hate the way this is being sold as a universal and magical tool. The reality doesn't live up to the hype.
Every time someone says they've made progress in AI, I try a few simple tasks that I'd expect an intern to be able to manage. I've yet to be impressed by anything these chatbots have regurgitated.
nExt yEAR lLMS WILl be GrEat
I've been hearing that for a while.
I can't wait until I can jack into the Metaverse and buy an NFT with cryptocurrency just by using an LLM! Perhaps I can view it on my 3D TV by streaming it over WiMAX? I'd better stock up on quantum computers to make sure it all works.
you WeRE PAId BY biG OIl TO sAY tHIs
(This is genuinely a comment I received on one of my solar posts!)
Over the years I've been accused of being in the pocket of Apple, Google, BlackBerry, Nokia, Tesla, Amazon, the fossil fuel lobby, the nuclear lobby, Chinese solar panel manufacturers, and a whole bunch more.
None of them have paid me yet. Any day now, I'm sure.
Carl Breen
You were prompting it wrong, though. I did it in 3 steps. I noticed the model resisted returning all the TLDs, either linking me to the list or telling me it was too long. After I told it I didn't care, it returned them. I named each list Table 1, 2, and 3 and told it how to generate the third one.
It worked. The problem with LLMs is that they want to be smarter than the user, or they refuse the work because they think they know better.
Table 3 (Element): audio, br, hr, li, link, style, td, th, tr, video
Your "Table 3" is missing .data, .map, .menu, and others.
@blog I agree with you that we need the point made by Gell-Mann Amnesia applied to LLMs. Why do we need a new term for it?
I think this illustrates one of the most unexpected limitations of LLMs - they're really bad at comprehensive, methodical problem solving across more than a few dozen items. Anything that involves methodically processing every item in a known long list frequently produces useless results.
Lau Gainpaulsingh
I have been a bit meta and asked ChatGPT about your opinion of ChatGPT and here is the result:
https://chatgpt.com/share/68cffaf6-9a70-8004-a452-e400bb11a548
I personally wouldn't go as far as saying its limitations make it useless. On the contrary, I have got ChatGPT to refactor code for me and it has been phenomenal. I know it has made mistakes, but as an experienced software engineer, I know how to tweak things to get what I want, and I end up with very fast and efficient code. Of course, it has to go through a testing process (especially because of the limitations), but you can certainly get good results.
Your argument is "it's good at doing stuff but it makes a lot of mistakes I have to correct"?
That doesn't sound like it is phenomenal. How do you know which mistakes you haven't caught?
Lau Gainpaulsingh
In the same way that a thorough testing process will capture the mistakes that a human hasn't caught. That is the point of test teams. A testing process should capture the "trivial" mistakes that an LLM will make, just as it will capture the more complex ones that humans (and AI) make. However, the refactoring gains have outweighed what I could possibly do manually. I am aware of the edge cases and test scenarios I need to cover. Naturally I will only do this on code that isn't a huge project requiring hundreds of developers, but in large-scale projects, you will also have large test teams.
Maz Atighi
Your observation that they’re terrible at some things is exactly right, and you’re not prompting it wrong, but I do think you’re using it wrong.
The question that correlates with usefulness is “Am I asking about something likely to be well documented in some textual medium?” If yes, you’ll likely get good results. If no, garbage.
Here are my last few ChatGPT queries, all of which gave me excellent answers:
- How can I keep my charger from falling out of an airplane power jack?
- What are the various ways of representing sports odds and what do they mean?
- How are Indonesia, Malaysia, and Singapore related?
These days, these answers come with links to sources and they’re all quite accurate and much faster than sorting through search results.
There are more “think-y” queries that can be hit or miss, but asking about something novel, like your question in the post, is likely to be a bust.
The answer is well-documented though. The answer has been on my blog for a couple of years, and has been slurped up by various aggregators.
Even if it hadn't ingested my work - both those lists are well-referenced.
Mike
I don’t think you’ve understood, Terence. You are at fault for asking the spicy autocomplete to do something it is unable to, which is incorrect usage. If you take the time to properly understand the myriad limitations and how to avoid them you will see how brilliant the technology is. Your next job could be in Prompt Engineering, you just don’t know it yet.
@Edent At the risk of claiming "You're holding it wrong", the model makes a massive difference. I gave the same question to the Google Gemini 2.5 Pro "Thinking" (lulz) model. Here's the result.
gTLD: audio, date, form, link, map, menu, nav, select, style, video
ccTLD: hr, td, th, tr
I exported the result as a doc, which doesn't show the "Thinking" (lulz), or the Sources it quotes, so I pasted that in, in a different font. 🤷
https://docs.google.com/document/d/1WKhyjsD8sXx13JTRl1DveVqGxUE2fjN5Iig2lEbBU_s/edit?usp=sharing
OK. But it missed out "br" - there's a <br> element and a .br for Brazil. It's also missing "search".
It has made up the .nav domain, which doesn't exist.
So perhaps both of us are holding it wrong 🙃
Maz Atighi
An argument in blog comments is giving me serious 2005 nostalgia. This is offered in the spirit of discussion, rather than argument, but something on one blog (or a dozen blogs) is not well-documented, nor would we want it to be considered so, because then LLM output would really be nothing but garbage. The scale of the documentation has to be far greater, or from very authoritative sources.
I've been hosting arguments here for many years - and hope to do so far into the future.
Considering that most of the code generation appears to have been trained on half-arsed StackOverflow answers, I think it is fair to say that the output is closer to garbage than many would admit.
As I say, both lists are very well documented and from highly authoritative sources.
MichalZ
Tried https://www.perplexity.ai/search/which-tlds-have-the-same-name-42yGbTY2SoSChRcBmqfdCg#0 with Research mode
So it ignores 4 generic TLDs and all the ccTLDs!
📖
More comments on Mastodon.
Trackbacks and Pingbacks
[…] sometimes things that are within our own domain slip past us, so how can we know whether an answer is right or wrong in a business area we don't know properly? Some answers can be […]