Apple Sued Over Allegedly Using Copyrighted Books for AI

Apple Sued Over Allegedly Using Copyrighted Books for AI

Apple is facing a new lawsuit over claims that it trained its artificial intelligence (AI) models on copyrighted books without permission. Authors Grady Hendrix and Jennifer Roberson allege that the training dataset for Apple's open-source AI model, OpenELM, included pirated copies of their books. The lawsuit was filed in a US federal court in Northern California, and the plaintiffs are seeking to have it proceed as a class action.

OpenELM, launched last year, is a family of large language models (LLMs) with up to 3 billion parameters. It previously drew scrutiny for allegedly being trained on YouTube subtitle data, which raised concerns about data licensing and copyright compliance. According to the complaint, the OpenELM model card that Apple uploaded to Hugging Face lists the RedPajama dataset among its training materials. RedPajama reportedly contains a collection known as Books3, which the plaintiffs say includes pirated copies of copyrighted books, among them their own works.

The lawsuit requests a jury trial and seeks several forms of relief, including statutory and compensatory damages, restitution, disgorgement, and an order requiring Apple to destroy any AI models trained on the copyrighted materials. The plaintiffs argue that Apple used copyrighted content without permission by drawing it from supposedly public datasets, and then released the resulting model to the AI research community.

Apple has responded by noting that OpenELM does not power any Apple Intelligence features or other AI tools on its devices. The company maintains that the model was released solely as a research contribution to the wider AI community, not for commercial use.

The lawsuit comes amid a broader wave of legal scrutiny of AI developers over copyright. In a related case, AI startup Anthropic agreed to pay $1.5 billion to settle a class action brought by authors who claimed their copyrighted works were used to train Anthropic's Claude models, though the company did not admit liability.

The Apple lawsuit highlights the growing legal and ethical challenges in AI development. Because AI models rely on massive training datasets, verifying copyright compliance and obtaining proper permissions remain critical issues for tech companies. The outcome of the case could influence how companies source datasets for future AI models.