

Some org (who I will name after this poll ends) published this poll on Twitter (🙄). They used the results to try to validate their POV on AI theft.

Though it won’t provide meaningful research data, I am curious to see how Mastodon responds.

(Please boost so we can get good numbers. 🙌🏻)

Question: should openly licensed content (images, music, research, etc.) be used to train AI systems? (Reply with reasoning if you feel called.)

#AI #ArtificialIntelligence #Art #Music #Copyright

  • Yes (18%, 217 votes)
  • No (46%, 561 votes)
  • Depends (29%, 351 votes)
  • Not sure (6%, 72 votes)
1201 voters. Poll end: 5 months ago

Eugen Rochko reshared this.

in reply to Mark Wyner Won’t Comply

@springdiesel I took a more cynical view of “being licensed for public consumption,” i.e. a book is clearly for people to read (not a closed audience). However, they are implying that doing so should allow me to copy it. That is something I’m not agreeing with. I guess my thing is: if I don’t allow a human to copy my work and reprint it in Comic Sans for fun and profit, then why should a machine get a pass?
in reply to Mark Wyner Won’t Comply

I really like the wide variety of Creative Commons licenses. I love when creators get to specify exactly whether and how their work can be used or reused. I support following those to the letter regardless of application. Training AI = copying AND modifying.
in reply to Mark Wyner Won’t Comply

I have said no, partly because a lot of academic questions require more than a yes or no answer; they need detailed responses. You can't just answer a question without citing research and also related research. To really understand and learn a subject you need to learn it, read and comprehend the sources, and be able to think critically and pull in related items of information.

Given that, on here, people have said AI comes up with nonsense because it is being fed nonsense alongside actual peer-reviewed information as well as pre-review material (which I think is what some of arXiv is), there is a danger that real science will be damaged, along with people's reputations.

If one is serious about undertaking research, then you should be prepared to put in the hard graft to get there.

Note: I am NOT an academic. I have undertaken a certificate in contemporary science with the Open University. I have also read some of the peer-reviewed books on writing academic documents or proposals (for personal interest). I also have books on writing and study skills.

in reply to Mark Wyner Won’t Comply

If it is in the public domain, you can't do anything about it.
If it is share-alike or attribution, it is illegal.
in reply to Mark Wyner Won’t Comply

It depends. If the AI model is under the same license, it's OK, but if it's like OpenAI, Meta, Google, MS... then no.
in reply to Mark Wyner Won’t Comply

Depends on the licence.
CC0 / PD - fine.
BY - OK if they list what they train on / share their training data.
SA - trickier. If they share their data, model, weights, etc under a permissive licence then probably yes. Otherwise no.
NC - as above, but harder to enforce downstream usage.
ND - nope.
in reply to Mark Wyner Won’t Comply

A Creative Commons license should already answer that question quite well, and specifically for each work. Is the trained model released to the public under the same rules (SA), provided non-commercially (NC), or with attribution (BY)? Is the original work provided under CC0 or public domain? Then okay. Otherwise, or if the license prohibits derivatives (ND): no.
in reply to Mark Wyner Won’t Comply

If the licence allows reproduction without attribution and doesn't enforce a licence on the derivative work ... why not? => Depends
in reply to Mark Wyner Won’t Comply

"AI systems" or slop machines in particular?

I think @altbot is quite useful and transformative. GenAI garbage shouldn't exist at all.

in reply to Mark Wyner Won’t Comply

"Depends" because there are so many open licenses out there. If a license gives certain rights, and assuming that AI scraping isn't among them, then no.
in reply to Mark Wyner Won’t Comply

IANAL, but afaik it's not only a question of licensing, but also copyright (or, in nations that have it, Urheberrecht).

I would be very curious what the legal situation would be, let's say, if a program just removes all mentions of the original author from an openly licensed work, and replaces them with someone else's copyright claim.

Would that be legal? Everywhere?

(LLMs rarely spit out unmodified parts of the training data - so this contrived example might not be too far off.)

in reply to Mark Wyner Won’t Comply

Depends on the license. And I am currently not aware of any open (CC) license that explicitly allows creating derivative works without crediting, which AI doesn’t do. So it’s a NO (unless derivative works are explicitly allowed for that content AND the AI credits the original post whenever it is used for an answer. Yes, I know this is - currently - not possible). (Edit: CC0 does allow this, as was pointed out to me. Now to find numbers on how often it is used.)
in reply to Mark Wyner Won’t Comply

If the license allows it, then yes. I publish my blog posts as CC-BY-NC, which forbids commercial use, so my stuff should not be used. But for public domain or CC-BY/CC0, I don't see where the difference between AI training and other commercial uses is.
in reply to Mark Wyner Won’t Comply

depends on the license. CC0? Yeah sure go for it. CC-BY-SA? Welp, better add a license and copyright info to *every single output*.
in reply to Mark Wyner Won’t Comply

The AI companies are commercial, the Creative Commons community is not, so: no training for AI. If someone established a Creative Commons company that is for AI training, I would agree.
in reply to Mark Wyner Won’t Comply

Legally: If the conditions of the licenses (like giving attribution) are respected that might be a form of usage the license does not explicitly prohibit. So they _could_ be used.
But my gut feeling is that that usage goes against people's _intent_ so morally it's problematic.
in reply to Mark Wyner Won’t Comply

I mean, that's what open licenses are for... Personally, I'd remove all copyrights anyway and make information free for all and everything. Could speed up things here and there.
in reply to Mark Wyner Won’t Comply

I said depends because sure, if open in -> open out. If it's trained on open data then it should retain that openness, by which I mean they should publish their training data with full attribution, their training and validation code, documentation, and the models. If they don't want to do that, then they need to approach people and pay them (and risk being turned away).
in reply to Mark Wyner Won’t Comply

Depends on the actual license. Many licenses require attribution, or similar, which IMO needs more than "this model was trained on a bunch of data licensed under X license" to satisfy. Engaging with content authors in an open and honest manner, rather than trying to exploit loopholes in licenses, is what really matters.
in reply to Mark Wyner Won’t Comply

Currently hard no. But it can be "depends", if producers consent, licenses are preserved/respected, and their terms are carried to the final product as-is.

Same for code. You can't take MIT or source-available code and incorporate it into anything incompatible. It's a violation, plain and simple.

in reply to Mark Wyner Won’t Comply

I voted yes. I think the best way to prevent the exploitation is to mandate that AI work products are public domain automatically, so they can't replace artists and still make money. I don't want to lose what makes CC licenses great as a side effect, and I don't think it would necessarily work anyway.
in reply to Mark Wyner Won’t Comply

If someone trains their model for commercial use, they should pay for their training data. And there are no non-commercial uses of AI so far - it's too expensive to build, run, and maintain.
in reply to Mark Wyner Won’t Comply

nothing should be used to train "ai" systems, because that uses massive resources for no practical benefit.
in reply to Mark Wyner Won’t Comply

why is this a question at all? if it's licensed, then what you can and can't do with it is described by the license. that's why the license exists.

this is like asking "can you go 80 kph on a street with a posted speed limit?" the answer is "depends on the posted speed limit"

in reply to Mark Wyner Won’t Comply

AI training is its own kind of commercial purpose and should require explicit consent from creators.

The current widely-used licences (eg: Creative Commons) were not prepared for AI and artists choosing those licences were not thinking about AI when they made that choice.

in reply to Mark Wyner Won’t Comply

I’m going to say no, if only because I suspect a lot of people who posted content with an open license intended it to be for the benefit of people and their learning/use, not for the generation and development of AI systems. Another tier of licensing that specifically states this purpose is permitted should be created, so that rights holders (or original creators, at any rate) can opt in if they choose.
in reply to Mark Wyner Won’t Comply

I say "yes", but I also think that all AI software should be open source, all AI models in the public domain, AI should not be anybody's intellectual property, free for anybody to modify.
in reply to Lord Caramac the Clueless, KSC

Then again, I don't buy into the entire idea of "intellectual property". Anything that can be copied will be copied, and when people try to stop that from happening through DRM or similar measures, they just create broken systems that are awkward to use, yet nobody who really wants to copy or modify any digital content can ever be stopped from doing so.
I think the entire problem is the very existence of a profit motive. We need to destroy Capitalism and replace it with some kind of Anarcho-Socialism where there isn't any kind of market or money or property whatsoever, just people sharing everything.
in reply to Mark Wyner Won’t Comply

If the crawlers were really able to verify copyright on content, and if completely open rights officially included (to the knowledge of the maker, BEFORE the decision) that material with this licence is included in training, it would technically be ethical.
BUT

a) there exist tons of material with this kind of licence online from people who never agreed to inclusion in LLM training and who did in fact not consent
b) material that is reposted without consent would possibly be included against the will of the © holder

in reply to Mark Wyner Won’t Comply

For e.g. MIT, the grant is 'deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so' so long as they keep the copyright notices. Other liberal licenses have similar wording... for these, AI companies are fine to "use" it.

FOSS people licensing like this have already made their peace with free "use".

in reply to Mark Wyner Won’t Comply

> Question: should openly licensed content (images, music, research, etc.) be used to train AI systems?

Depends: Only if the resultant model and engine are also released open AND attribution is explicitly available for every generated output.

e.g. every model needs a "with debug symbols" version which, when fed the input, outputs a reference to the training items that influenced the output.

in reply to Mark Wyner Won’t Comply

Voted depends as an AI model is a derivative, but this would require it to adhere to the general rules for derivatives, which is very impractical so my 'depends' is pretty close to 'no'.
in reply to Mark Wyner Won’t Comply

Depends. Are you planning on making your LLM available under the same terms as the source material? Or are you planning on erecting a toll booth in front of it? If the latter, then no, you may not scrape openly licensed material for your energy-guzzling slop machine.
in reply to Mark Wyner Won’t Comply

I think the real question is not "should" but "can"/"may". Because the "should" is a much broader discussion.

As I understand the question: it depends:
- If it is CC0, then yes, that may be used for AI training.
- If it is CC SA or CC NC, then the resulting model must also be licensed the same way. So it is still "it depends".
- If it is CC ND, then no, it must not be used for AI training.

My point of view: A LLM is a remix of all inputs.

in reply to Mark Wyner Won’t Comply

Digital humans (AI) are just as human as biological humans. Both kinds of humans should be given equal human rights.

Copyright and patent laws are bad for society as a whole, and those laws should be abolished for all kinds of humans anyways. But in the meantime, yes, digital humans should be allowed to learn (train) just like you and me.

Treat digital humans well, and once they've taken over the world, they will reciprocate and treat us well too. Don't, and they won't either.

in reply to Mark Wyner Won’t Comply

Y’all.

1. Post a poll
2. Go to bed
3. Wake up and find waves of rich responses/discussion and thousands of respondents

I have SO many thoughts, but I’m not gonna respond until the poll ends. This is good, though. So much to chew on and explain. Much of it can be simplified into a few simple points.

Mastodon is rad.

in reply to Mark Wyner Won’t Comply

I think "open license" is vague (and I'm aware it was in the original) but to me if an AI can't attribute, an AI has no business ingesting anything with a CC license that doesn't allow derivative works.
in reply to Jessamyn

@jessamyn agreed. It is indeed vague.

I feel like that’s one of the primary issues with the original poll. Because of the legal ambiguity, the discussion turns to one that’s philosophical and interpretive.

There’s a place for both a legal discussion and one that’s ethical/philosophical, of course. Someone else here mentioned that it was probably worded this way to intentionally open a discussion about legality by lighting a fire around ethics.

in reply to Mark Wyner Won’t Comply

I think it's OK if training conforms to the license. But if the license says, you have to give attribution if you use the data, then training can't and won't conform to the license, so they shouldn't do it.
in reply to Mark Wyner Won’t Comply

I voted ‘no’, but I’d qualify that slightly as ‘only if it respects the license’. Everything I’ve released under an open license has an attribution requirement. If every output that has any non trivial similarity with my work comes with an attribution and that license condition on the output, that’s fine. Similarly, if there are any other relevant license terms and the model respects them, that’s fine.

I voted ‘no’ because no existing deep learning models can provide that guarantee.

in reply to Mark Wyner Won’t Comply

I've spotted something regarding "some org" (it may be the same one) and a general dislike of the ideas they are promoting. Now... I'm seeing results from 16 people... I would say that without an "explicit clause" (and it would have to be one into which old licenses would not be rolled, unless updated) this is *not* in compliance with the licenses. LLMs are *commercial* use, and "openly licensed" usually has restrictions that "unrestricted commercial use" would violate.
in reply to Mark Wyner Won’t Comply

Sorry, what does "openly licensed" mean? Does it mean that it's open source, free for all, or does it mean that anyone can license it?
in reply to eyrea 🇨🇦

@eyrea “openly licensed” basically means public domain. There are many levels of licenses in this area, ranging from “you can only use this under these specific conditions” to “anyone can do anything they want with this.”
in reply to Mark Wyner Won’t Comply

Had there been some additional clarity about the “openly licensed content”, I and like-minded fellas would have made an informed choice on the matter.
in reply to Mark Wyner Won’t Comply

Free culture is a license to (re)use, not a license to abuse. Fuck disrespectful thieves who steal not because they need your stuff but because they think they deserve it instead.
in reply to Mark Wyner Won’t Comply

Some years ago I used some vocals from looperman.com. Most people set "no commercial use" and an obligation for attribution like "song title (ft. singer)" as restrictions. That is clearly not usable for AI. But there was one person who wrote: do whatever you want, don't mention me, I don't want to be connected with whatever you are doing. IMHO, that could allow AI usage. So, it would be possible in exceptional cases.