
A new category of bots is quietly becoming one of the biggest forces reshaping the web: gray bots.
Spiders, crawlers and scrapers have long plagued digital media. But as generative AI mutates the threat for advertisers and publishers, one security firm has coined the term “gray bots” to describe the blurred line between real and fake traffic — and between legitimate activity and harmful exploitation.
They’re not a household name like “generative AI” or “AI agents,” but gray bots are already far beyond niche. Experts see the growing wave as an urgent challenge for the digital economy. Recent cybersecurity reports show some websites are seeing millions of scraper bot requests each month, largely tied to generative AI activity. Others report that AI crawlers are driving up general invalid traffic (GIVT) and the ad request volumes associated with it.
Between December and February, cybersecurity firm Barracuda — which coined the term “gray bots” in an April report — tracked millions of generative AI bot requests, including one web app that saw nearly 10 million in a month and another that logged over 500,000 in a single day. The most active were Anthropic’s ClaudeBot and TikTok’s ByteSpider, according to Barracuda.
“These bots are actually kind of throwing off a lot of the analytics and a lot of the metrics companies are driving towards,” Adam Khan, VP of global security operations at Barracuda, told Digiday. “It’s coming in mass amounts and there’s sometimes obfuscation where they’re hiding, where they’re coming from, or who it’s purposed for.”
Gray bots present a paradox. They fuel innovation for AI search engines like ChatGPT and Perplexity, collect data for large language models and enable emerging use cases like automated browsing and shopping. They also strain digital infrastructures, distort website and ad analytics, and extract value without consent. Their rise will also reshape how publishers, advertisers, creators and e-commerce companies adapt to an internet increasingly built for non-human traffic.
The topic’s real-world impact on publishers also came up last week in a new AI-related lawsuit filed by Ziff Davis, the parent company of digital media websites including Mashable, CNET, PCMag and Lifehacker. The complaint, filed April 24, alleges OpenAI’s GPTBot significantly increased scraping activity on Ziff Davis websites even after the publisher followed OpenAI’s own instructions for its robots.txt file — a tool that tells bots which parts of a site they can or can’t access.
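For reference, a robots.txt file is a plain-text file at a site’s root that tells crawlers which paths they may access. A minimal sketch that blocks the AI crawlers named in this article site-wide might look like the following — the user-agent tokens match what the vendors publicly document, but the file as a whole is an illustration, not a recommended configuration:

```
# Block known AI crawlers from the entire site
# (compliance with robots.txt is voluntary — bots can ignore it)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

# All other crawlers remain unrestricted
User-agent: *
Allow: /
```

As the Ziff Davis complaint illustrates, directives like these are honored only at the crawler operator’s discretion, which is why publishers pair them with other safeguards.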
What are gray bots?
Gray bots are automated programs — like AI agents, scrapers, and crawlers — that don’t fall neatly into the categories of “good” or “bad” bots, because there’s plenty of nuance to their benefits and drawbacks. They drive innovation, but they can also cause serious problems by bypassing ads, scraping content, inflating traffic metrics, and consuming resources without providing value in return.
Security experts say publishers should be concerned about gray bots in all their shapes and sizes. Beyond bots skewing analytics and stealing content, AI-driven ad impressions harm core metrics like clickthrough and conversion rates that marketers care about, said Zach Edwards, an independent security researcher.
“Enterprise publishers must find a way to increase the costs on AI companies for illicitly scraping content — to level the playing field — and ensure they are getting adequately compensated for original content they produce,” Edwards said. “…If you own a website hosting any substantial original content, you’re under a gray bot attack even if you aren’t doing anything about it yet.”
Why are they a risk for publishers particularly?
Gray bots pose growing risks for publishers and e-commerce businesses. They can scrape intellectual property and harvest pricing data for competitors. They can also access gated content without permission, undermining paywalls and affiliate models. For ad-supported sites, bots can mimic human behavior, inflating engagement metrics and triggering ad calls without delivering value. This can distort campaign performance, increase invalid traffic (IVT) and drive up infrastructure costs through excessive server and bandwidth use.
Publishers are already feeling the weight of this new automated traffic. One example is Wikipedia: the Wikimedia Foundation disclosed this month that bots and AI scrapers have driven a 50% rise in infrastructure costs since January 2024. The surge risks degrading the user experience, prompting Wikimedia to explore sustainable access models and advocate for more responsible use of its content.
How big is the problem?
Last year, AI scraper bots from companies like Meta, Apple and others accounted for 16% of general invalid traffic (GIVT), according to DoubleVerify. The company also noted GIVT in 2024 nearly doubled, with four-quarter volume surpassing 2 billion ad requests for the first time ever.
Another firm, HUMAN Security, recently reported identifying and blocking more than 215 billion scraping attempts in 2024, with the vast majority targeting retail, e-commerce, and media platforms. The largest growth category was in technology, SaaS, and services, which saw nearly 500% growth year over year. Another sector targeted with bot-scraping was the travel and hospitality industry, which experienced more than 125% growth year over year. In some extreme cases, bots made up more than 90% of monthly traffic to certain product pages.
Where are the bots coming from?
Major AI players like OpenAI, Perplexity, Google, and TikTok are among the companies driving more gray bot traffic. For example, popular chatbots like ChatGPT and Claude deploy types of gray bots to find and retrieve content, which could create issues for websites that monetize their content via ads.
Many publishers and site administrators have been blocking AI crawlers via robots.txt while others — such as Reddit — have negotiated paid access agreements. Site owners are also looking for new ways to protect and preserve resources through rate limiting, traffic monitoring, and technical safeguards to maintain control while navigating the rise of AI-driven traffic. However, there’s also a risk that blocking AI scrapers can impact how websites show up in generative AI search platforms like ChatGPT and Perplexity.
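The safeguards mentioned above — user-agent filtering plus rate limiting — can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a production bot-management system: the `BotGate` class and its parameters are hypothetical, the crawler signature list is partial, and determined bots can spoof their user agent entirely.

```python
import time
from collections import defaultdict, deque

# User-agent substrings of known AI crawlers, per each vendor's public docs.
# Illustrative, not exhaustive.
AI_CRAWLER_SIGNATURES = ("GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot")


class BotGate:
    """Minimal sketch of two safeguards: user-agent matching and
    per-client sliding-window rate limiting."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # client_ip -> recent request times

    def allow(self, client_ip, user_agent, now=None):
        now = time.monotonic() if now is None else now
        # 1. Refuse self-identifying AI crawlers outright.
        ua = user_agent.lower()
        if any(sig.lower() in ua for sig in AI_CRAWLER_SIGNATURES):
            return False
        # 2. Rate-limit everyone else: drop timestamps outside the window,
        #    then reject if the client is over its quota.
        q = self.history[client_ip]
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

For example, with `max_requests=2`, a browser-like client gets two requests per window before being throttled, while a request whose user agent contains “GPTBot” is refused immediately. Real deployments layer on IP reputation, TLS fingerprinting and behavioral signals, since user-agent strings alone are trivially forged.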
Are they hard to detect?
Gray bots are increasingly difficult to detect due to their rapid growth and expanding range of functions — especially as more companies build custom scrapers to train AI models. Their dual nature also complicates decisions about whether to block them, especially when some serve legitimate purposes.
While bot-enabled AI tools like shopping and browsing agents are still emerging, researchers have found they often don’t identify themselves. This ambiguity has fueled demand for systems that can not only identify bots but also evaluate their intent and manage access accordingly.
And just like traditional bots, many among the new wave can still disguise themselves as regular browsers, ignore robots.txt rules for scraping, and use tactics like device spoofing to evade detection. Some even mimic human behavior well enough to bypass CAPTCHA.
What are some ways to handle the bot influx?
The EU’s legal framework — including the AI Act, GDPR, Digital Services Act, and Copyright Directive — places strict limits on data scraping by treating AI providers as data controllers, requiring disclosure of training data, and enforcing copyright protections.
Meanwhile, security startups like Skyfire and Cequence are looking to monetize the traffic instead. This week, the companies announced a new system to help websites identify AI bots, verify their identity and purpose, let site owners charge legitimate agents, and give agent owners a way to automatically pay for access. The startups hope their “Know Your Agent” framework will treat bots not just as threats, but as new types of users.
Skyfire’s goal is to help companies “claw back a lot of lost business that today is being eaten up by folks who are unauthorized essentially to have access,” said Amir Sarhangi, the founder and CEO of Skyfire Systems.
“It’s one of those things where you know a lot of the concern is about if websites are cannibalizing their business,” Sarhangi said. “Well, guess what? The site’s already being accessed today. It’s already being scraped. It’s not like you’re all of a sudden opening the doors to a bunch of new cannibalization because of this model.”