AI and fair use: In a first-of-its-kind case in India, ANI Media has taken legal action against ChatGPT maker OpenAI, alleging that the company unlawfully used its content to train large language models and other artificial intelligence systems. This legal battle is expected to set a landmark precedent by determining the accountability of AI developers for the content generated by their platforms, significantly impacting AI development, companies, and copyright law in the country.
Alongside ANI Media, several other domestic news publishers, their representative associations, media houses, and music labels have sought to intervene in the case as petitioners. The outcome of this lawsuit could establish foundational precedents for addressing similar disputes in the future.
Industry-wide implications
The advent of AI and LLMs has thrust intellectual property issues, copyright laws, and questions about the boundaries of “fair use” into the spotlight. Media companies, content creators, music publishers, and record labels are increasingly scrutinising the legality and ethics of using their content. A contentious debate has emerged: some argue that IP protection exists to incentivise creation and that unauthorised use of protected works violates creators’ rights, while others contend that overly strict IP protection could stifle AI-driven progress.
In the evolving world of digital news dissemination, a symbiotic yet contentious relationship exists between news publishers and technology companies. Publishers rely on platforms to host their content and drive traffic, while tech companies—such as Meta—act as intermediaries, directing users to news websites and claiming a share of the advertising revenue.
Generative AI and the revenue debate
The advent of generative AI (GenAI) has further intensified the struggle over revenue distribution. GenAI platforms require vast datasets from the open web to train their models, yet the companies behind these platforms do not compensate the original content creators. In response to ANI Media’s claims, OpenAI blocked ANI in October by invoking its opt-out policy, which allows websites to exclude their text from automated scraping by AI crawlers. Although this policy is based on fair use and exceptions for text and data mining (TDM) for scientific research, ANI contends that the measure is ineffective—its content is widely republished by other sites, enabling OpenAI’s crawlers to access it indirectly.
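In practice, this opt-out works through the long-standing Robots Exclusion Protocol: a publisher adds directives to its site’s robots.txt file naming OpenAI’s documented crawler, GPTBot. A minimal sketch of such a file might look like this (the paths shown are illustrative, not ANI’s actual configuration):

```
# Block OpenAI's training crawler from the entire site
User-agent: GPTBot
Disallow: /

# Other well-behaved crawlers remain unaffected
User-agent: *
Allow: /
```

As ANI’s argument highlights, this is only a request that compliant crawlers honour; it cannot stop syndicated copies of the same articles on third-party sites from being scraped.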
AI and fair use
In its defence, OpenAI argued that copyright protects only the expression of ideas, not the ideas or facts themselves. The company maintained that its models do not reproduce source content verbatim and that the language is sufficiently transformed to qualify for copyright exceptions. Ultimately, the case will turn on whether OpenAI’s use of ANI’s data qualifies as “fair use.” This legal doctrine permits limited use of copyrighted material without permission, based on factors such as the purpose and character of the use, the nature of the work, the amount used, and the effect on the work’s market value.
At the heart of these debates lie two principles: permissionless innovation and free inquiry. The concept of permissionless innovation holds that new technologies and business models should be embraced unless clear harm is demonstrated, while free inquiry emphasises the public nature of data and facts, promoting widespread sharing for scientific progress.
In India, the current fair use framework lists numerous exceptions to copyright protection but does not explicitly address AI training models. This legal ambiguity leaves content creators vulnerable and complicates efforts to foster a balanced AI ecosystem.
Data sovereignty and regulatory hurdles
Another significant challenge is regulating data in the era of cloud computing and distributed AI models. OpenAI’s defence highlights issues of territoriality in data storage; data generated by Indian users is often dispersed across multiple servers and cloud environments, making the enforcement of national laws difficult.
Moreover, while major US cases—such as Thomson Reuters’ recent victory over Ross Intelligence—offer some perspective, fair use is likely to be interpreted differently in India. Already, major news publishers like The Atlantic are entering into contractual agreements to license their content to AI companies—a model that Indian firms might also consider to prevent future disputes.
Given India’s current lack of AI-inclusive provisions, policymakers should consider a permissionless innovation approach that stimulates AI development while safeguarding the rights of content creators. The outcome of the ANI versus OpenAI case will be pivotal, as it is expected to set a precedent for how copyrighted material may be used in AI training.
Balancing innovation with intellectual property protection will likely require incorporating AI-specific provisions into copyright law. By drawing inspiration from international models and tailoring them to its unique socio-legal context, India can clarify the boundaries of permissible data usage for AI—ensuring a sustainable future for both content creators and the burgeoning AI industry.