

Some org (who I will name after this poll ends) published this poll on Twitter (🙄). They used the results to try to validate their POV on AI theft.

Though it won’t provide meaningful research data, I am curious to see how Mastodon responds.

(Please boost so we can get good numbers. 🙌🏻)

Question: should openly licensed content (images, music, research, etc.) be used to train AI systems? (Reply with reasoning if you feel called.)

#AI #ArtificialIntelligence #Art #Music #Copyright

  • Yes (18%, 217 votes)
  • No (46%, 561 votes)
  • Depends (29%, 351 votes)
  • Not sure (6%, 72 votes)
1201 voters. Poll end: 5 months ago

Eugen Rochko reshared this.

in reply to Mark Wyner Won’t Comply

@springdiesel I took a more cynical view of “being licensed for public consumption,” i.e. a book is clearly for people to read (not a closed audience). However, they are implying that doing so should allow me to copy it. That is something I’m not agreeing with. I guess my thing is: if I don’t allow a human to copy my work and reprint it in Comic Sans for fun and profit, then why should a machine get a pass?
in reply to Mark Wyner Won’t Comply

I really like the wide variety of Creative Commons licenses. I love when creators get to specify exactly whether and how their work can be used or reused. I support following those to the letter regardless of application. Training AI = copying AND modifying.
in reply to Mark Wyner Won’t Comply

I have said no, partly because a lot of academic questions require more than a yes or no answer; they need detailed responses. You can't just answer a question without citing research and also related research. To really understand and learn a subject you need to learn it, read and comprehend the sources, and be able to think critically and pull in related items of information.

Given that, on here, people have said AI comes up with nonsense because it is being fed nonsense alongside actual peer-reviewed information as well as pre-review material (which I think is what some of arXiv is), there is a danger that real science will be damaged, along with people's reputations.

If one is serious about undertaking research, then you should be prepared to put in the hard graft to get there.

Note: I am NOT an academic. I have undertaken a certificate in contemporary science with the Open University. I have also read some of the peer-reviewed books on writing academic documents or proposals (for personal interest). I also have books on writing and study skills.

in reply to Mark Wyner Won’t Comply

If it is in the public domain, you can't do anything about it.
If it is share-alike or attribution, it is illegal.
in reply to Mark Wyner Won’t Comply

It depends. If the AI model is under the same license, it's OK, but if it's like OpenAI, Meta, Google, MS... then no.
in reply to Mark Wyner Won’t Comply

Depends on the licence.
CC0 / PD - fine.
BY - OK if they list what they train on / share their training data.
SA - trickier. If they share their data, model, weights, etc under a permissive licence then probably yes. Otherwise no.
NC - as above, but harder to enforce downstream usage.
ND - nope.
in reply to Mark Wyner Won’t Comply

A Creative Commons license should already answer that question quite well, and specifically for each work. Is the trained model released to the public under the same rules (SA), provided non-commercially (NC), or with attribution (BY)? Is the original work provided under CC0 or public domain? Then okay. Otherwise, or if the license prohibits derivatives (ND): no.
in reply to Mark Wyner Won’t Comply

If the licence allows reproduction without attribution and doesn't enforce a licence on the derivative work ... why not? => Depends
in reply to Mark Wyner Won’t Comply

"AI systems" or slop machines in particular?

I think @altbot is quite useful and transformative. GenAI garbage shouldn't exist at all.

in reply to Mark Wyner Won’t Comply

"Depends" because there are so many open licenses out there. If a license gives certain rights, and assuming that AI scraping isn't among them, then no.
in reply to Mark Wyner Won’t Comply

IANAL, but afaik it's not only a question of licensing, but also copyright (or, in nations that have it, Urheberrecht).

I would be very curious what the legal situation would be, let's say, if a program just removes all mentions of the original author from an openly licensed work, and replaces them with someone else's copyright claim.

Would that be legal? Everywhere?

(LLMs rarely spit out unmodified parts of the training data - so this contrived example might not be too far off.)

in reply to Mark Wyner Won’t Comply

Depends on the license. And I am currently not aware of any open (CC) license that explicitly allows creating derivative works without crediting, which AI doesn’t do. So it’s a NO (unless derivative works are explicitly allowed for that content AND the AI credits the original post whenever it is used for an answer. Yes, I know this is - currently - not possible). (Edit: CC0 does allow this, as was pointed out to me. Now to find numbers on how often it is used.)
in reply to Mark Wyner Won’t Comply

If the license allows it, then yes. I publish my blog posts as CC-BY-NC, which forbids commercial use, so my stuff should not be used. But for public domain or CC-BY/CC0, I don't see where the difference between AI training and other commercial uses is.
in reply to Mark Wyner Won’t Comply

depends on the license. CC0? Yeah sure go for it. CC-BY-SA? Welp, better add a license and copyright info to *every single output*.
in reply to Mark Wyner Won’t Comply

The AI companies are commercial, the Creative Commons community is not, so: no training for AI. If someone established a Creative Commons company that is for AI training, I would agree.
in reply to Mark Wyner Won’t Comply

Legally: If the conditions of the licenses (like giving attribution) are respected that might be a form of usage the license does not explicitly prohibit. So they _could_ be used.
But my gut feeling is that that usage goes against people's _intent_ so morally it's problematic.
in reply to Mark Wyner Won’t Comply

I mean, that's what open licenses are for... Personally, I'd remove all copyrights anyway and make information free for all and everything. Could speed up things here and there.
in reply to Mark Wyner Won’t Comply

I said depends because sure, if open in -> open out. If it's trained on open data then it should retain that openness, by which I mean they should publish their training data with full attribution, their training and validation code, documentation, and the models. If they don't want to do that, then they need to approach people and pay them (and risk being turned away).
in reply to Mark Wyner Won’t Comply

Depends on the actual license. Many licenses require attribution, or similar, which IMO needs more than "this model was trained on a bunch of data licensed under X license" to satisfy. Engaging with content authors in an open and honest manner, rather than trying to exploit loopholes in licenses, is what really matters.
in reply to Mark Wyner Won’t Comply

Currently hard no. But it can be "depends", if producers consent, licenses are preserved/respected, and their terms are carried to the final product as-is.

Same for code. You can't take MIT or source-available code and incorporate it into anything incompatible. It's a violation, plain and simple.

in reply to Mark Wyner Won’t Comply

I voted yes. I think the best way to prevent the exploitation is to mandate that AI work products are public domain automatically, so they can't replace artists and still make money. I don't want to lose what makes CC licenses great as a side effect, and I don't think it would necessarily work anyway.
in reply to Mark Wyner Won’t Comply

If someone trains their model for commercial use, they should pay for their training data. And there are no non-commercial uses of AI so far - it's too expensive to build, run, and maintain.
in reply to Mark Wyner Won’t Comply

nothing should be used to train "ai" systems, because that uses massive resources for no practical benefit.
in reply to Mark Wyner Won’t Comply

why is this a question at all? if it's licensed, then what you can and can't do with it is described by the license. that's why the license exists.

this is like asking "can you go 80 kph on a street with a posted speed limit?" the answer is "depends on the posted speed limit"

in reply to Mark Wyner Won’t Comply

AI training is its own kind of commercial purpose and should require explicit consent from creators.

The current widely-used licences (eg: Creative Commons) were not prepared for AI and artists choosing those licences were not thinking about AI when they made that choice.

in reply to Mark Wyner Won’t Comply

I’m going to say no, if only because I suspect a lot of people who posted content with an open license intended it to be for the benefit of people and their learning/use, not for the generation and development of AI systems. Another tier of licensing that specifically states this purpose is permitted should be created, so that rights holders (or original creators, at any rate) can opt in if they choose.
in reply to Mark Wyner Won’t Comply

I say "yes", but I also think that all AI software should be open source, all AI models in the public domain, AI should not be anybody's intellectual property, free for anybody to modify.
in reply to Lord Caramac the Clueless, KSC

Then again, I don't buy into the entire idea of "intellectual property". Anything that can be copied will be copied, and when people try to stop that from happening through DRM or similar measures, they just create broken systems that are awkward to use, yet nobody who really wants to copy or modify any digital content can ever be stopped from doing so.
I think the entire problem is the very existence of a profit motive. We need to destroy Capitalism and replace it with some kind of Anarcho-Socialism where there isn't any kind of market or money or property whatsoever, just people sharing everything.
in reply to Mark Wyner Won’t Comply

If the crawlers were really able to verify copyright on content, and if completely open rights officially included (to the knowledge of the maker, BEFORE the decision) that material with this licence is included in training, it would technically be ethical.
BUT

a) there exist tons of material with this kind of licence online from people who never agreed to inclusion in LLM training and who did in fact not consent
b) material that is reposted without consent would possibly be included against the will of the © holder

in reply to Mark Wyner Won’t Comply

For e.g. MIT, the grant is 'deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so' so long as they keep the copyright notices. Other liberal licenses have similar wording... for these, AI companies are fine to "use" it.

FOSS people licensing like this have already made their peace with free "use".

in reply to Mark Wyner Won’t Comply

> Question: should openly licensed content (images, music, research, etc.) be used to train AI systems?

Depends: Only if the resultant model and engine are also released open AND attribution is explicitly available for every generated output.

e.g. every model needs a "with debug symbols" version which, when fed the input, outputs a reference to the training items that influenced the output.

in reply to Mark Wyner Won’t Comply

Voted depends as an AI model is a derivative, but this would require it to adhere to the general rules for derivatives, which is very impractical so my 'depends' is pretty close to 'no'.
in reply to Mark Wyner Won’t Comply

Depends. Are you planning on making your LLM available under the same terms as the source material? Or are you planning on erecting a toll booth in front of it? If the latter, then no, you may not scrape openly licensed material for your energy-guzzling slop machine.
in reply to Mark Wyner Won’t Comply

I think the real question is not "should" but "can"/"may". Because the "should" is a much broader discussion.

As I understand the question: it depends:
- If it is CC0, then yes, that may be used for AI training.
- If it is CC SA or CC NC, then the resulting model must also be licensed the same way. So it is still "it depends".
- If it is CC ND, then no, it must not be used for AI training.

My point of view: A LLM is a remix of all inputs.

in reply to Mark Wyner Won’t Comply

Digital humans (AI) are just as human as biological humans. Both kinds of humans should be given equal human rights.

Copyright and patent laws are bad for society as a whole, and those laws should be abolished for all kinds of humans anyways. But in the meantime, yes, digital humans should be allowed to learn (train) just like you and me.

Treat digital humans well, and once they've taken over the world, they will reciprocate and treat us well too. Don't, and they won't either.

in reply to Mark Wyner Won’t Comply

Y’all.

1. Post a poll
2. Go to bed
3. Wake up and find waves of rich responses/discussion and thousands of respondents

I have SO many thoughts, but I’m not gonna respond until the poll ends. This is good, though. So much to chew on and explain. Much of it can be simplified into a few simple points.

Mastodon is rad.

in reply to Mark Wyner Won’t Comply

I think "open license" is vague (and I'm aware it was in the original) but to me if an AI can't attribute, an AI has no business ingesting anything with a CC license that doesn't allow derivative works.
in reply to Jessamyn

@jessamyn agreed. It is indeed vague.

I feel like that’s one of the primary issues with the original poll. Because of the legal ambiguity, the discussion turns to one that’s philosophical and interpretive.

There’s a place for both a legal discussion and one that’s ethical/philosophical, of course. Someone else here mentioned that it was probably worded this way to intentionally open a discussion about legality by lighting a fire around ethics.

in reply to Mark Wyner Won’t Comply

I think it's OK if training conforms to the license. But if the license says, you have to give attribution if you use the data, then training can't and won't conform to the license, so they shouldn't do it.
in reply to Mark Wyner Won’t Comply

I voted ‘no’, but I’d qualify that slightly as ‘only if it respects the license’. Everything I’ve released under an open license has an attribution requirement. If every output that has any non trivial similarity with my work comes with an attribution and that license condition on the output, that’s fine. Similarly, if there are any other relevant license terms and the model respects them, that’s fine.

I voted ‘no’ because no existing deep learning models can provide that guarantee.

in reply to Mark Wyner Won’t Comply

I've spotted something regarding "some org" (it may be the same one) and a general dislike of the ideas they are promoting. Now... I'm seeing results from 16 people... I would say that without an "explicit clause" (and it would have to be one into which old licenses would not be rolled, unless updated) this is *not* in compliance with the licenses. LLMs are *commercial* use, and "openly licensed" usually has restrictions that "unrestricted commercial use" would violate.
in reply to Mark Wyner Won’t Comply

Sorry, what does "openly licensed" mean? Does it mean that it's open source, free for all, or does it mean that anyone can license it?
in reply to eyrea 🇨🇦

@eyrea “openly licensed” basically means public domain. There are many levels of licenses in this area, ranging from “you can only use this under these specific conditions” to “anyone can do anything they want with this.”
in reply to Mark Wyner Won’t Comply

Had there been some additional clarity about the “openly licensed content”, I and like-minded fellas would have made an informed choice on the matter.
in reply to Mark Wyner Won’t Comply

Free culture is a license to (re)use, not a license to abuse. Fuck disrespectful thieves who steal not because they need your stuff but because they think they deserve it instead.
in reply to Mark Wyner Won’t Comply

Some years ago I used some vocals from looperman.com. Most people set "no commercial use" and an obligation for attribution like "song title (ft. singer)" as restrictions. That is clearly not usable for AI. But there was one person who wrote: do whatever you want, don't mention me, I don't want to be connected with whatever you are doing. IMHO, that could allow AI usage. So, it would be possible in exceptional cases.