So, @mitsuhiko is working on using LLMs to process XML. Except the models can't write legal XML. So he's using the model to generate a sloppy-XML parser: lucumr.pocoo.org/2025/6/21/my-…
OK, my mind is now made up about vibe coding. I’m saying take the ship up and nuke the site from orbit. It’s the only way to be sure.
My First Open Source AI Generated Library
In a first for me, I published some agentic programmed AI slop to PyPI. (Armin Ronacher's Thoughts and Writings)
Adam Kent
in reply to Tim Bray
Unexpected security footguns in Go's parsers (The Trail of Bits Blog)

Elias Mårtenson
in reply to Tim Bray
I looked at the code. I'm not a Python person, but I'm sure the code style itself is perfectly standard and accessible. That's the least I'd expect.
What I'm really concerned about is that the entire thing is based on regexes, which is a bad idea: regular expressions simply aren't powerful enough to parse XML's nested structure.
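A quick illustration of that limitation (a hypothetical snippet, not code from the library in question): a regex for an element's content pairs the wrong closing tag as soon as same-named elements nest, because regular expressions can't count nesting depth.

    import re

    # Nested elements with the same tag name defeat the obvious pattern.
    doc = "<item><item>inner</item>tail</item>"

    # The non-greedy match stops at the FIRST </item>, splitting the outer
    # element in the wrong place; a greedy match fails on sibling elements
    # instead. Neither can track nesting depth.
    m = re.search(r"<item>(.*?)</item>", doc)
    print(m.group(1))  # -> "<item>inner", not the real content "<item>inner</item>tail"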
This is the programming version of the eating-glue suggestions: perfectly formed sentences, arguments that make sense taken in isolation, etc. But the output as a whole isn't what you need.

Elias Mårtenson
in reply to Tim Bray
Right. XML is very strict for a reason.
I created a "quasi-sloppy" XML parser once, some 20 years ago. I needed to parse the content of an XML file that had been truncated because the application writing it crashed in the middle. That was a very clear error condition, though, and the solution was as simple as overriding a few methods in the SAX parser to preserve the current state when an EOF was detected.
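A minimal sketch of that approach using Python's stdlib xml.sax (illustrative names, not the original code, which I'm only assuming worked along these lines): accumulate state in the handler, and treat a parse error at EOF as the expected stopping point rather than a failure.

    import io
    import xml.sax

    class StatePreservingHandler(xml.sax.ContentHandler):
        # Keeps whatever was successfully parsed before the truncation.
        def __init__(self):
            super().__init__()
            self.texts = []

        def characters(self, content):
            self.texts.append(content)

    # A file cut off mid-write, as when the writing application crashes.
    truncated = "<log><entry>first</entry><entry>sec"

    handler = StatePreservingHandler()
    try:
        xml.sax.parse(io.StringIO(truncated), handler)
    except xml.sax.SAXParseException:
        pass  # EOF mid-document: the one error condition we expect

    print("".join(handler.texts))  # whatever was gathered before the cut-off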
I don't really want to know what kind of broken infrastructure would be behind the need for parsing invalid XML.
The "be lenient what you accept, be precise in what you send" principle has been deemed false for a very long time now.

felix (grayscale) 🐺
in reply to Tim Bray
my impression of Claude's output is that it's "answer-shaped code". it has the form of ok code, but details are wrong, choices are inconsistent, organization is weird. basically, pretty much like anything else produced by LLMs/diffusion.
I'm now even less inclined to trust LLM coding. eg, there's a unit test that's subtly wrong: it parses "&amp;" and checks the result _contains_ "&", which will pass even if the entity isn't expanded.
uncanny, hard-to-spot errors.
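A hypothetical reconstruction of that failure mode (parse_text is a stand-in, not the library's actual API): a containment check can't distinguish an expanded entity from the raw entity text, because "&amp;" itself contains "&".

    import html

    def parse_text(s: str) -> str:
        return html.unescape(s)   # expands entities, as intended

    def buggy_parse_text(s: str) -> str:
        return s                  # forgets to expand entities

    # The weak, contains-style assertion passes for BOTH versions:
    assert "&" in parse_text("&amp;")
    assert "&" in buggy_parse_text("&amp;")    # the bug slips through

    # An exact comparison is what actually catches the missing expansion:
    assert parse_text("&amp;") == "&"          # passes
    # assert buggy_parse_text("&amp;") == "&"  # would fail, as it should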