I'm going to shoot myself in the foot here: as a tool for writing good, concise, descriptive and accurate alt text for images, AI has greatly surpassed my own abilities. See my previous toot.
At this point, I think it actually hurts the quality of alt text if **I** write it by hand.
Prediction: Alt text will be generated by AI directly on the consumer's side, so that *they* can choose the level of detail, the information density, and the parts of the picture that matter to *them*. And pre-written alt text will be frowned upon.
Glenn
in reply to Michal Bryxí 🌱
Michal Bryxí 🌱
in reply to Glenn
Jupiter Rowland
in reply to Michal Bryxí 🌱
@Michal Bryxí 🌱
Won't happen.
AI may sometimes happen to be as good as humans at describing generic, everyday images that are easy to describe. By the way, I keep seeing AI fail miserably at describing cat photos.
But when it comes to extremely obscure niche content, AI can only produce useless train wrecks. And this will never change. To describe extremely obscure niche content, AI not only requires full, super-detailed, up-to-the-minute knowledge of all aspects of the topic, down to niches within niches within the niche; it must also be able to explain it, and it must know whether, and to what extent, it needs to explain it.
I've pitted LLaVA against my own hand-written image descriptions. Twice. Not simply against the short image descriptions in my alt-texts, but against the full, long, detailed, explanatory image descriptions in the posts.
And LLaVA failed so, so miserably. What little it described, it often got wrong. More importantly, LLaVA's descriptions were nowhere near explanatory enough for a casual audience with no prior knowledge of the topic to really understand the image.
500+ characters generated by LLaVA in five seconds are no match for my own 25,000+ characters that took me eight hours to research and write.
1,100+ characters generated by LLaVA in 30 seconds are no match for my own 60,000+ characters that took me two full days to research and write.
When I describe my images, I put abilities to use that AI will never have, including, but not limited to, the ability to join and navigate 3-D virtual worlds. Not to mention that an AI would have to be able to deduce from a picture where exactly a virtual world image was created, and how to get there.
So no, ChatGPT won't write circles around me by next year. Or ever. Neither will any other AI out there.
#Long #LongPost #CWLong #CWLongPost #VirtualWorlds #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImagDescriptionMeta #LLaVA #AI #AIVsHuman #HumanVsAI
LLaVA
llava-vl.github.io
Jupiter Rowland
2024-03-05 19:27:53
Michal Bryxí 🌱
in reply to Jupiter Rowland
@jupiter_rowland So tell me *exactly* where I'm getting this wrong: I designed a single test, for a single image, for a single LLM. And then I, myself, executed said test, comparing said LLM+image+execution against myself as the benchmark baseline. I intentionally ignored all the infinite variations of the possible results *I* as an author can request from *the machine* to produce, and let it produce _some_ output. More than that, I intentionally ran the test under _quantitatively_ different conditions, where, either through ignorance or lack of knowledge, I did not allow the machine to expand its output to a size comparable to mine (which is trivial to achieve, btw). I am also willingly admitting that I used *more* information than said image for my benchmark, and I intentionally _did not_ provide said information to the machine I was benchmarking against.
And from there I concluded that my results are better.
I can see what you tried to achieve there. But I'm sorry to say that my own benchmark, designed by me, executed by me and also summed up by me, came up with a result: Nah.
Michal Bryxí 🌱
in reply to Michal Bryxí 🌱
Let's try a simple test. Without any context: if you're a user who *relies* on alt text to understand the content of pictures here on the Fediverse, would you rather have alt text that is:
#AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta
- 558 characters long (0%, 0 votes)
- 25,271 characters long (0%, 0 votes)
- 🍿 (100%, 1 vote)
1 voter. Poll end: 1 year ago
Jupiter Rowland
in reply to Michal Bryxí 🌱
@Michal Bryxí 🌱
The context matters. A whole lot.
A simple real-life cat photograph can be described in a few hundred characters, and everyone knows what it's all about. It doesn't need much visual description because it's mainly only the cat that matters. Just about everyone knows what real-life cats generally look like, apart from the ways individual cats differ from one another. Even people born 100% blind should have a rough enough idea of what a cat is and what it looks like from a) being told if they ask and b) touching and petting a few cats.
Thus, most elements of a real-life cat photograph can safely be assumed to be common knowledge. They don't require description, and they don't require explanation because everyone should know what a cat is.
Now, let's take the image which LLaVA has described in 558 characters, and which I've previously described in 25,271 characters.
For one, it doesn't focus on anything. It shows an entire scene. If the visual description has to include what's important, it has to include everything in the image because everything in the image is important just the same.
Besides, it's a picture from a 3-D virtual world, not from the real world. People don't know anything about this kind of 3-D virtual world in general, and they don't know anything about this place in particular. In this picture, nothing can safely be assumed to be common knowledge, and for blind or visually-impaired users even less so.
People may want to know where this image was made. AI won't be able to figure that out. AI can't examine that picture and immediately and with absolute certainty recognise that it was created on a sim called Black-White Castle on an OpenSim grid named Pangea Grid, especially seeing as that place was only a few days old when I was there. LLaVA wasn't even sure if it's a video game or a virtual world. So AI won't be able to tell people.
AI doesn't know either whether any of the location information can be considered common knowledge, or whether it needs to be explained so that humans will understand it.
I, the human describer, on the other hand, can tell people where exactly this image was made. And I can explain it to them in such a way that they'll understand it with zero prior knowledge about the matter.
Next point: text transcripts. LLaVA didn't even notice that there is text in the image, much less transcribe it. Not transcribing every bit of text in an image is sloppy; not transcribing any text in an image is ableist.
No other AI will even be able to transcribe the text in this image, however. That's because no AI can read any of it. It's all too small and, on top of that, too low-contrast for reliable OCR. All that AI has is the image I've posted at a resolution of 800x533 pixels.
I myself can see the scenery at nigh-infinite resolution by going there. No AI can do that, and no LLM AI will ever be able to do that. And so I can read and transcribe all text in the image 100% verbatim with 100% accuracy.
However, text transcripts take up room in the description, not least because they also require descriptions of where each piece of text is.
I win again. And so does the long, detailed description.
I'm not sure whether this is typical Mastodon behaviour, because Mastodon users find it impossible to imagine that images can be described anywhere other than in the alt-text (they can, and I have done so), or whether it's intentional trolling.
The 25,271 characters did not go into the alt-text! They went into the post.
I can put that many characters into a post. I'm not on Mastodon. I'm on Hubzilla, which has never had and still doesn't have any character limits.
In the alt-text, there's a separate, shorter, still self-researched and hand-written image description to satisfy those who absolutely demand there be an image description in the alt-text.
Mastodon caps alt-text at 1,500 characters, so 25,271 characters in alt-text would cause Mastodon to cut 23,771 characters off and throw them away.
#Long #LongPost #CWLong #CWLongPost #VirtualWorlds #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImagDescriptionMeta #LLaVA #AI #AIVsHuman #HumanVsAI
Jupiter Rowland
in reply to Jupiter Rowland
@Michal Bryxí 🌱 And since you obviously haven't actually read anything I've linked to, here's a quote-post of my comment in which I dissect the first AI description.
#Long #LongPost #CWLong #CWLongPost #VirtualWorlds #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImagDescriptionMeta #LLaVA #AI #AIVsHuman #HumanVsAI
LLaVA vs my own image description -
hub.netzgemeinde.eu
Jupiter Rowland
2024-03-05 19:28:12
Jupiter Rowland
in reply to Jupiter Rowland
@Michal Bryxí 🌱 And while I'm at it, here's a quote-post of my comment in which I review the second AI description.
#Long #LongPost #CWLong #CWLongPost #VirtualWorlds #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImagDescriptionMeta #LLaVA #AI #AIVsHuman #HumanVsAI
Jupiter Rowland
2024-05-17 22:24:46
Michal Bryxí 🌱
in reply to Jupiter Rowland
Michal Bryxí 🌱
in reply to Jupiter Rowland
Michal Bryxí 🌱
in reply to Jupiter Rowland