New page on seirdy.one: Scrapers I block (and allow), with explanations.
I’ve replaced all the comments in my robots.txt file with a more readable and detailed web page on scrapers I block. It includes info on the multiple blocking approaches and criteria I use, commonly blocked scrapers I allow, and more fact-checking than most of the more comprehensive alternatives.
Seirdy
in reply to Seirdy • • •
Seirdy
in reply to Seirdy • • •
solo
in reply to Seirdy • • •
Seirdy
in reply to solo • • •
solo
in reply to Seirdy • • •
Seirdy
in reply to solo • • •
solo
in reply to Seirdy • • •
might want to edit it
you wrote
I assume here you intended to write "with no incentive for compliance"
Seirdy
in reply to solo • • •
Seirdy
in reply to Seirdy • • •
NoCache to my X-Robots and documented why.
solo
in reply to Seirdy • • •
yo, I'm looking into additional things to block in the robots.txt for my website (basing a decent bit of it off of yours, plus any additional stuff I find), and I felt like I'd want to just throw this your way
I'm personally planning to block everything from the first url, as well as the following from the second url
I would also like to note, BLEXBot is listed on the second site as an "SEO Crawler" and it indicates that it does not believe it is AI-related
nvm, I mis-remembered and thought you had blocked it due to it being AI-related
I'll mention any other resources as I find them.
Seirdy
in reply to solo • • •
solo
in reply to Seirdy • • •
ah, I see
did not look at the things you cite lol
do you have some examples of things that are incorrect?
there are several on that list that would probably be good to block, which you don't block (unsure if they're actually used in practice anymore or if they're just historical), such as
Claude-Web
cohere-ai
anthropic-ai
there is also the aiHitBot one that I mentioned
Seirdy
in reply to solo • • •
Seirdy
in reply to Seirdy • • •
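For anyone adding the user-agent names raised in this thread (Claude-Web, cohere-ai, anthropic-ai, aiHitBot) to their own robots.txt, it can help to check the rules before deploying them. Here is a minimal sketch, assuming Python's standard urllib.robotparser module. The robots.txt body, the blocked_agents helper, and the example.com URL are all hypothetical and are not taken from the seirdy.one file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for illustration; not the real seirdy.one file.
ROBOTS_TXT = """\
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: anthropic-ai
User-agent: aiHitBot
Disallow: /

User-agent: *
Disallow:
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that the given rules disallow for `url`."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# BLEXBot is included only to show an agent that falls through to the "*" group.
agents = ["Claude-Web", "cohere-ai", "anthropic-ai", "aiHitBot", "BLEXBot"]
print(blocked_agents(ROBOTS_TXT, agents))
```

This only confirms that the file says what you meant it to say. Compliance with robots.txt is voluntary, so a scraper that ignores it needs a different blocking approach.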