So, @mitsuhiko is working on using LLMs to process XML. Except the models can’t write legal XML. So he’s using the model to generate a sloppy-XML parser: lucumr.pocoo.org/2025/6/21/my-…

OK, my mind is now made up about vibe coding. I’m saying take the ship up and nuke the site from orbit. It’s the only way to be sure.

#genAI

in reply to Tim Bray

Interesting contrast to read this shortly after another somewhat different post about parsers blog.trailofbits.com/2025/06/1…
in reply to Tim Bray

.. do it before the ship controls are updated with sloppy vibe xml
in reply to Tim Bray

I looked at the code. I'm not a Python person, but I'm sure the code style itself is perfectly standard and accessible. That's the least I'd expect.

What I'm really concerned about is the fact that the entire thing is based on regexes, which is a bad idea because regexes aren't powerful enough to parse XML.

This is the programming version of the eating glue suggestions. Perfectly formed sentences, arguments that make sense taken in isolation, etc. But the output as a whole isn't what you need.

in reply to Armin Ronacher

as part of a parser, of course it's fine. Many parsers do that. But for matching entire XML constructs, yes. I.e. you can match a tag, but not an entire CDATA section for example.
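
A rough sketch of the "regex as part of a parser" point above, in Python. The regex only recognizes individual tags; the nesting is tracked by an ordinary stack, which is the part a regex alone cannot do. The names `TAG` and `is_balanced` are made up for illustration, and the pattern is deliberately naive (it ignores comments, CDATA, and `>` inside attribute values).

```python
import re

# Matches one tag: optional leading "/", a name, attributes, optional trailing "/".
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def is_balanced(text):
    """Check that open and close tags nest properly, using a stack."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <b/> opens and closes in one token
        if closing:
            # A close tag must match the most recently opened element.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # anything left on the stack was never closed

print(is_balanced("<a><b></b></a>"))
print(is_balanced("<a><b></a></b>"))
```

The regex does the tokenizing, the stack does the structure. Trying to fold the stack into the pattern is where it goes wrong.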
in reply to Armin Ronacher

@loke I was partly joking, you stepped in a pool of history. In the early days of XML there was massive controversy over whether the parsers should be “Draconian” like JSON or “Tolerant” like HTML. There are probably people out there who are still mad. And in fact it’s hard to think of an application where a “sloppy” parser would be acceptable.
in reply to Tim Bray

Right. XML is very strict for a reason.

I have created a "quasi-sloppy" XML parser once, some 20 years ago. I needed to parse the content of an XML file which had been truncated due to the application writing it crashing in the middle. This was a very clear error condition though, and the solution was as simple as overriding some methods in the SAX parser to preserve the current state if an EOF was detected.

I don't really want to know what kind of broken infrastructure would be behind the need for parsing invalid XML.

The "be lenient in what you accept, be precise in what you send" principle has been deemed false for a very long time now.
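
For the truncated-file case described above, a minimal sketch in Python. This is not the SAX override from the post; it swaps in the standard library's `xml.etree.ElementTree.XMLPullParser`, which keeps whatever was completely parsed before the input ran out. The function name `complete_records`, the `record` tag, and the sample data are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def complete_records(truncated_xml, tag="record"):
    """Return the text of every <tag> element that was fully closed before the cut."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(truncated_xml)
    # Deliberately skip parser.close(): on truncated input it would raise ParseError.
    return [elem.text for _event, elem in parser.read_events() if elem.tag == tag]

# Hypothetical file contents, cut off in the middle of the third record.
sample = "<log><record>one</record><record>two</record><record>thr"
print(complete_records(sample))
```

The parser stays strict about everything it did see. The only tolerated condition is running out of input, which matches the "very clear error condition" in the post.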

in reply to Tim Bray

Forget nuke the site, we're well into "clear the system and dump a few nova bombs into the primary" territory.
in reply to Tim Bray

It’s really disappointing to see really smart people throwing in the towel on quality to do work that isn’t all that interesting, while giving oxygen to the idea that a jumble of stolen IP can effectively replace human workers.
in reply to Tim Bray

my impression of Claude's output is that it's "answer-shaped code". it has the form of ok code, but details are wrong, choices are inconsistent, organization is weird. basically, pretty much like anything else produced by LLMs/diffusion.

I'm now even less inclined to trust LLM coding. eg, there's a unit test that's subtly wrong: it parses "&amp;" and checks the result _contains_ "&", which will pass even if the entity isn't expanded.

uncanny, hard-to-spot errors.
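
To make the test problem above concrete, a small sketch in Python. It is not the code from the linked post; `parse_without_expanding` is a made-up stand-in for a parser that forgets to expand entities, and `parse_with_expanding` uses the standard library's `xml.sax.saxutils.unescape` for comparison.

```python
from xml.sax.saxutils import unescape

def parse_without_expanding(raw):
    """Hypothetical buggy parser: hands back the text with entities untouched."""
    return raw

def parse_with_expanding(raw):
    """Reference behaviour: expand &amp;, &lt; and &gt;."""
    return unescape(raw)

def weak_check(result):
    # The kind of assertion described above: only looks for "&" somewhere.
    return "&" in result

def strict_check(result):
    # Compares against the exact expected text instead.
    return result == "a & b"

raw = "a &amp; b"
# The weak check cannot tell the two parsers apart: "&amp;" itself contains "&".
print(weak_check(parse_without_expanding(raw)), weak_check(parse_with_expanding(raw)))
# The strict check does.
print(strict_check(parse_without_expanding(raw)), strict_check(parse_with_expanding(raw)))
```

An equality assertion on the whole expected string is the smallest change that would catch the unexpanded entity.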