
Australian authors say they are “livid” and feel violated that their work was included in an allegedly pirated dataset of books Meta used to train its AI.
The parent company of Facebook and Instagram is being sued by authors in the United States, including Ta-Nehisi Coates and the comedian Sarah Silverman, for copyright infringement.
In court filings in January it was alleged that chief executive Mark Zuckerberg approved the use of the LibGen dataset – an online archive of books – to train the company’s artificial intelligence models despite warnings from his AI executive team that it is a dataset “we know to be pirated”.
The Atlantic has published a searchable database where authors can type in their name to see which of their works are included in the LibGen dataset.
It includes books by many Australian authors, among them former prime ministers Malcolm Turnbull, Kevin Rudd, Julia Gillard and John Howard.
Holden Sheppard, the author of Invisible Boys, a hit young adult novel that has been adapted into a series on Stan, said two of his books and two short stories were included.
He said he was “fucking livid” to learn they could have been used to train Meta’s AI.
“I am furious to learn my books have been again pirated and used without my consent to train a generative AI system which is not only unethical and illegal in its current form, but something I am vehemently opposed to,” he said.
“No consent has been obtained from any of the thousands of authors who have had our work taken, and not a single cent has been paid to any of us,” he said.
“Given Meta is worth literally billions, they are absolutely in a financial position to compensate authors fairly. More importantly, they are not above the law and are required to obtain consent.”
He said the government needed to act now.
“We need AI-specific legislation introduced in Australia that requires generative AI developers or deployers to put in a range of measures to comply with existing copyright legislation.”
Journalist and author Tracey Spicer said two of her books – The Good Girl Stripped Bare and Man-Made – are included. The latter deals with the rise of artificial intelligence.
She said she felt violated when she realised her works were in the dataset.
“It was a gut-punch. Authors don’t make a lot of money, especially in a small market like Australia,” she said.
“This is peak technocapitalism.”
She said there should be a class action in Australia, and urged authors to contact their local federal MPs.
“It’s a bit rich for big tech to cry poor. These companies can afford to pay for content, or they can create synthetic datasets.”
Alexandra Heller-Nicholas, an award-winning film critic and author of 10 books on cult movies including 1000 Women in Horror and Cinema Coven, found that eight of her books, as well as books she co-edited, were included.
“It is no understatement that this is my lifetime’s work. I’m upset, angry, but mostly exhausted,” she said.
Heller-Nicholas also called for the federal government to act.
The Australian Society of Authors has put out a call on Facebook for affected authors to get in touch so it can advocate on their behalf against the use of their works.
The society’s chair, Sophie Cunningham, said she had been in contact with dozens of authors whose works were included, and that there was general ill feeling about how it had been done.
“Massive corporations are profiting and reducing writers to serfs,” she said. “Most writers are lucky to get $18,000 per year … and they’re not even having the right to be involved in which work [is used].”
Cunningham said Meta was treating writers with contempt.
Meta declined to comment, citing the ongoing litigation. The company has reportedly lobbied the Trump administration to declare, via executive order, that training AI on copyrighted data is fair use.
Earlier this month, Melbourne publisher Black Inc Books caused concern among writers, literary agents and the industry’s peak body when it asked its authors to consent to their work being used to train artificial intelligence.
Some AI companies have begun entering into agreements with publishers for the use of their work, including OpenAI, which signed a deal with the Guardian in February for use of Guardian content in ChatGPT.
