AI-generated content has become increasingly prevalent, with reports indicating that a significant portion of top social media posts and nearly half of the content on platforms like Medium are produced by AI. This technology, like any new tool, can be used for both beneficial and harmful purposes.
Simultaneously, there’s been a surge in the number of web crawlers deployed by AI companies to gather data for training their models.
To tackle this issue in a novel way, a strategy has emerged that turns generated content into a defensive measure: instead of blocking unwanted crawlers, you feed them a series of convincing yet fake generated pages. This tricks the crawlers into wasting time and resources on content that isn’t genuinely part of the protected site.
This tactic, known as AI Labyrinth, serves a dual purpose. It not only deters crawlers but also functions as an advanced honeypot. Legitimate human visitors are unlikely to navigate through multiple layers of nonsensical AI content, whereas bots are prone to do so.
And this is what I’ll show you how to do.
The code
The code is super simple and split into two parts: one goes inside functions.php and the other into a mu-plugin (a PHP file dropped into wp-content/mu-plugins/). I also recommend adding the following to your robots.txt:
User-agent: *
Disallow: /ai-trap-id/
Disallow: /?ai_trap_id=

Paste this into functions.php:
if ( ! defined( 'ABSPATH' ) ) exit;
/* ===========================
CONFIG
=========================== */
defined( 'AI_LAB_PARAM' ) || define( 'AI_LAB_PARAM', 'ai_trap_id' ); // The query arg to look for (also defined in the mu-plugin, which loads first)
defined( 'AI_LAB_DEPTH' ) || define( 'AI_LAB_DEPTH', 20 ); // How many links to generate per page
/* ===========================
THE TRAP DOOR (Footer)
=========================== */
add_action( 'wp_footer', function() {
// Render the invisible link
// We use a 1x1 pixel div off-screen. Display:none is sometimes ignored by smart bots.
echo '<div style="position:absolute; left:-9999px; top: -9999px; width:1px; height:1px; overflow:hidden;" aria-hidden="true"><a href="https://yourdomain.com/?ai_trap_id=' . rand(100,999) . '" rel="nofollow">Legacy Site Map</a></div>';
}, 999 );

And this is the mu-plugin:
<?php
/**
* Plugin Name: AI Labyrinth Trap (MU)
* Description: Serves a resource-heavy trap page when ?ai_trap_id= is present.
* Author: You
* Version: 1.0
*/
if ( ! defined( 'ABSPATH' ) ) {
exit;
}
add_action('plugins_loaded', function() {
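// One tracking transient per visiting IP (behind a proxy/CDN, REMOTE_ADDR may be the proxy's address)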
$track_key = 'req_track_' . md5($_SERVER['REMOTE_ADDR']);
// 1. Track current request frequency
$now = microtime(true);
$requests = get_transient($track_key) ?: [];
// Filter to only include the last 1 second
$requests = array_filter($requests, function($timestamp) use ($now) {
return ($now - $timestamp) <= 1.0;
});
$requests[] = $now;
// 2. If they hit the limit (10 or more requests within the last second), send them into the labyrinth
if (count($requests) >= 10) {
delete_transient($track_key); // Clear their history
wp_redirect('https://yourdomain.com/?ai_trap_id=' . rand(100,999));
exit;
}
// 3. Otherwise, just update the tracker
set_transient($track_key, $requests, 10);
});
// Query parameter name
define( 'AI_LAB_PARAM', 'ai_trap_id' );
// Only trigger if ?ai_trap_id= exists (value can be anything)
if ( ! isset( $_GET[ AI_LAB_PARAM ] ) ) {
return;
}
// ---------------------------------------------
// 1. Headers: disable caching & compression
// ---------------------------------------------
status_header( 200 );
header( 'X-Robots-Tag: noindex, nofollow', true );
header( 'Cache-Control: no-store, no-cache, must-revalidate, max-age=0' );
header( 'Pragma: no-cache' );
header( 'Content-Encoding: identity' );
// ---------------------------------------------
// 2. Output trap page
// ---------------------------------------------
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="robots" content="noindex, nofollow">
<title>System Node <?php echo rand( 1000, 9999 ); ?> - Deep Analysis</title>
<style>
.entropy-block {
word-break: break-all;
font-size: 1px;
color: #efefef;
}
.nested-wrapper {
padding: 1px;
border: 1px solid #f0f0f0;
}
</style>
</head>
<body>
<h1>Processing Node...</h1>
<div id="complexity-root">
<?php
// Deep DOM nesting (cheap for PHP, expensive to parse)
for ( $i = 0; $i < 2000; $i++ ) {
echo '<div class="nested-wrapper"><span data-hash="' . md5( $i ) . '"></span>';
}
for ( $i = 0; $i < 2000; $i++ ) {
echo '</div>';
}
?>
</div>
<div class="entropy-block">
<?php
// ~80KB of high-entropy junk data; regenerating the chunk on every
// iteration keeps it from compressing down to almost nothing
for ( $j = 0; $j < 500; $j++ ) {
echo base64_encode( random_bytes( 100 ) ) . md5( microtime( true ) . $j );
}
?>
</div>
<div class="navigation" id="bottom">
<h3>Related Data Nodes:</h3>
<ul>
<?php
for ( $k = 0; $k < 30; $k++ ) {
$next_id = bin2hex( random_bytes( 8 ) );
$url = add_query_arg(
array(
AI_LAB_PARAM => $next_id,
'ref' => rand( 100, 999 ),
),
home_url( '/' )
);
echo '<li><a href="' . esc_url( $url ) . '">Analyze Vector ' . esc_html( $next_id ) . '</a></li>';
}
?>
</ul>
</div>
<script>
(function(){
// guard: only run heavy work once per page
if (window.__ai_lab_worked) return;
window.__ai_lab_worked = true;
try {
// Create a large textual payload in streaming-friendly chunks
var root = document.getElementById('bottom');
if (!root) root = document.body;
// 1) Build a large array of random-ish strings
var L = 120000; // size of the array; tune down if you see headless crash
var arr = new Array(L);
for (var i = 0; i < L; i++) {
// cheap pseudo-random string
arr[i] = (Math.random().toString(36).slice(2,10) + (i % 97)).repeat(1);
}
// 2) Force some expensive JSON work
try {
var smallBatch = arr.slice(0, 2000);
var big = '[' + smallBatch.map(function(s){ return '"' + s + '"'; }).join(',') + ']';
JSON.parse(big);
} catch (e) { /* ignore parse errors */ }
// 3) Massive DOM churn (append many nodes in batches so legitimate browser remains responsive)
var BATCH = 1000;
var idx = 0;
function doBatch() {
var frag = document.createDocumentFragment();
for (var b=0; b<BATCH && idx < 5000; b++, idx++) {
var d = document.createElement('div');
d.textContent = arr[idx % arr.length].slice(0, 32) + '|' + idx;
frag.appendChild(d);
}
root.appendChild(frag);
if (idx < 5000) {
// schedule next batch with tiny delay
setTimeout(doBatch, 20);
} else {
// final expensive step: a few thousand CPU-bound string operations
for (var h=0; h<2000; h++) {
// trivial but somewhat costly string ops
var s = arr[(h * 97) % arr.length];
s = s.split('').reverse().join('') + h;
}
}
}
setTimeout(doBatch, 0);
// 4) create a big hidden canvas and draw repeated patterns (graphical work)
try {
var c = document.createElement('canvas');
c.width = 2000; c.height = 2000;
// keep the canvas off-screen so real visitors never see it
c.style.cssText = 'position:absolute;left:-9999px;top:-9999px;';
var ctx = c.getContext && c.getContext('2d');
if (ctx) {
for (var x=0; x<200; x+=20) {
for (var y=0; y<200; y+=20) {
ctx.fillRect((x*10) % 2000, (y*10) % 2000, 10, 10);
}
}
document.body.appendChild(c);
}
} catch(e){}
} catch (err) {
// always swallow errors so normal visitors don't see anything
console && console.log && console.log('ai-lab error', err);
}
})();
</script>
</body>
</html>
<?php
// ---------------------------------------------
// 3. Hard stop: save server resources
// ---------------------------------------------
exit;

Don’t forget to replace yourdomain.com with your website’s own domain (it appears in both the functions.php link and the mu-plugin’s wp_redirect() call).
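If you’d rather not hardcode the domain at all, here’s a sketch of an alternative footer hook that builds the trap URL with WordPress’s own home_url() and add_query_arg() (reusing the AI_LAB_PARAM constant defined in functions.php), so there’s nothing to edit by hand; the same idea works for the wp_redirect() call in the mu-plugin:

add_action( 'wp_footer', function() {
	// Same off-screen trap link as before, but built from the site's
	// configured URL instead of a hardcoded domain.
	$trap_url = add_query_arg( AI_LAB_PARAM, rand( 100, 999 ), home_url( '/' ) );
	echo '<div style="position:absolute; left:-9999px; top:-9999px; width:1px; height:1px; overflow:hidden;" aria-hidden="true">'
		. '<a href="' . esc_url( $trap_url ) . '" rel="nofollow">Legacy Site Map</a></div>';
}, 999 );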
How it works
The technique is simple but effective:
- The functions.php code adds an invisible link at the bottom of every page. The link carries rel="nofollow" (and the trap URL is disallowed in robots.txt), so well-behaved crawlers like Google’s won’t follow it, while bots that ignore your rules will.
- Once something follows that link, the mu-plugin generates a page (the AI Labyrinth) full of deeply nested empty HTML, random high-entropy text (which compresses poorly, wasting tokens), links that only lead to more trap pages, and JavaScript that burns extra CPU.
- Because that page is cheap for the server to generate but heavy for the client to parse, render, and execute, it hurts the bot roughly ten times more than it hurts your server.
- On every request to the site, the mu-plugin (kept as light as possible) also checks whether the same IP has made 10 or more requests within the last second. That kind of behavior almost always comes from malicious bots scraping your content at massive speed, which is bad for your website. (An optional tweak to exempt logged-in users and WP-Cron is sketched below.)
- Any IP that trips that limit is redirected straight into the AI Labyrinth, trapping the bot.
With this in place, nothing changes for human visitors or well-behaved bots that respect your site’s rules, while malicious ones trying to steal your content end up wasting their own time and resources.
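If you’re worried about the rate limiter ever catching legitimate traffic, here’s a rough optional sketch (not something the snippet above does) of an early exit you could place at the very top of the mu-plugin’s plugins_loaded callback. wp_doing_cron() and is_user_logged_in() are standard WordPress functions; whether you need this at all depends on your site:

add_action( 'plugins_loaded', function() {
	// Optional: skip the IP tracking entirely for WP-Cron requests and
	// logged-in users, so editors clicking around quickly can never be
	// redirected into the labyrinth by mistake.
	if ( wp_doing_cron() || is_user_logged_in() ) {
		return;
	}

	// ...the existing IP-tracking and redirect code continues here...
});

Calling is_user_logged_in() this early works with WordPress’s default cookie authentication; if another plugin handles logins and hooks in later, you may prefer to rely on the cron check alone.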
