
A.I. Tar Pits – How the web is fighting back

In the information age, the term “tar pits” has taken on a new meaning. No longer referring solely to prehistoric deposits of sticky asphalt, digital A.I. tar pits are treacherous pools of misleading or corrupted data on the internet in which artificial intelligence systems can become trapped. These pits started with tools like ‘Nepenthes’, which are engineered to confuse A.I. crawlers and get them stuck in a never-ending maze of fake links and random text.
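To make the "maze of fake links" idea concrete, here is a minimal Python sketch. It is not the code of Nepenthes or any other real tool; the function name `fake_page`, the word list, and the page layout are all hypothetical. It assumes one simple design: every page is derived from a hash of the requested path, so the maze is effectively endless yet nothing has to be stored on disk.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real deployment would use something larger.
WORDS = ["amber", "basalt", "cinder", "delta", "ember", "fjord", "garnet", "harbor"]


def fake_page(path: str, links_per_page: int = 5) -> str:
    """Return an HTML page whose content depends only on `path`."""
    # Seed a private RNG from the path so the same URL always yields the same page.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Random filler text for the page body.
    filler = " ".join(rng.choice(WORDS) for _ in range(40))

    # Every link points one level deeper, so following links never leaves the maze.
    base = path.rstrip("/")
    links = []
    for _ in range(links_per_page):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        links.append('<a href="{0}/{1}">{1}</a>'.format(base, slug))

    return "<html><body><p>{}</p>{}</body></html>".format(filler, " ".join(links))
```

A web server would call `fake_page(request_path)` for any URL under the trap prefix. Because the output is a pure function of the path, a crawler that revisits a URL sees a stable page, which makes the maze look like ordinary static content.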

Sowing Confusion Among AI Models

In response to unchecked scraping, an anonymous software developer, referred to as Aaron, developed Nepenthes, a piece of aggressive “malware” named after a carnivorous plant. The tool is designed to trap AI crawlers in an infinite maze of static files, where they thrash without escape. The crawlers are then fed Markov babble: meaningless text intended to poison AI models. Aaron’s creation is a radical step that reflects frustration with the costs AI scraping imposes on website owners.
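"Markov babble" generally means text produced by a Markov chain: each next word is picked at random from the words that followed the current word in some source text. The result looks locally plausible but carries no meaning. The sketch below illustrates that general technique only. The function names `build_chain` and `babble` are hypothetical, and nothing here is taken from Nepenthes itself.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def babble(chain, length=30, seed=None):
    """Generate `length` words by walking the chain at random.

    Assumes `chain` is non-empty, i.e. the source text had more than `order` words.
    """
    rng = random.Random(seed)
    keys = sorted(chain)          # sorted so a given seed is reproducible
    state = rng.choice(keys)
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (the state only occurred at the end of the text): restart.
            state = rng.choice(keys)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-order:])
    return " ".join(out[:length])
```

For example, `babble(build_chain(open("corpus.txt").read()), length=200)` would produce 200 words of nonsense in the style of a hypothetical `corpus.txt`. A higher `order` makes the output read more like the source, at the cost of repeating longer stretches of it.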

OpenAI’s Response and the Challenges of Data Poisoning

OpenAI is the only AI company that has publicly commented on Nepenthes, confirming that it is aware of such disruptions and committed to resilience while “respecting robots.txt”, a claim that is a proven lie. Aaron’s initiative challenges AI companies to reconsider their scraping practices, potentially increasing their operational costs and slowing down their aggressive training timelines.

Iocaine and the Proliferation of Tarpits

Inspired by the concept of Nepenthes, software developer Gergely Nagy, known online as “algernon,” created Iocaine, his own version of a tarpit. Named after the fictional poison from “The Princess Bride,” Iocaine reportedly reduced bot traffic to Nagy’s site by 94 percent. The release of these tools has sparked widespread interest, with individuals and organizations seeking to deploy similar measures to protect their content and assert control over their data.

The Broader Impact and Ethical Considerations

While deploying tools like Nepenthes and Iocaine can place an additional burden on the servers that run them, their creators argue that the primary cost is borne by AI companies, which must expend resources on useless data. Critics worry that such tools could inadvertently increase the overall energy consumption associated with AI, but proponents like Aaron suggest that they could slow investment in AI technologies, forcing a reevaluation of scraping practices and possibly leading to compensation for content creators.

AI security researchers, such as Nathan VanHoudnos from Carnegie Mellon University’s CERT Division, view tarpits as one of many tools in a larger toolkit aimed at addressing the resilience of AI systems. VanHoudnos suggests that while AI companies are aware of the risks posed by poor-quality data, the social and political dimensions of the issue require broader solutions.

The Sociopolitical Statement of Tarpits

Computer scientists like Jürgen Geuter see Nepenthes and similar tools as more than just technological responses; they are sociopolitical statements against the perceived overreach of AI technologies. Geuter argues that the tools represent a form of resistance to the notion that AI should be integrated into every aspect of our digital lives without consent or compensation.

The Future of AI Poisoning

As creators like Nagy continue to refine their tarpits, and with the emergence of tools like Quixotic, the landscape of AI training data collection is changing. These developments suggest a growing movement towards making AI poisoning a norm, with the hope that AI companies will face increased difficulty in sourcing clean data, potentially leading to a renegotiation of the social contract between the tech sector and society.

This website proudly uses tar pits.

Be indigestible. Grow spikes.

Source: Ars Technica
