This month, Fortune.com reported that TikTok’s web scraper, called Bytespider, is aggressively scraping content to feed generative AI models. We saw the same thing when looking at the bot management analytics produced by HAProxy Edge, the global network that we ourselves use to serve traffic for haproxy.com. Some of the numbers we are seeing are fairly striking, so let’s review the traffic sources and where they originate.
Our own measurements, collected by HAProxy Edge and filtered to traffic for haproxy.com, reveal a couple of interesting figures:
- Almost 1% of our total traffic comes from AI crawlers
- Close to 90% of that traffic is from Bytespider, by ByteDance (the parent company of TikTok)
While Bytespider is currently the most prevalent AI crawler, making ByteDance the leading source, we have previously observed others (such as ClaudeBot) taking the top spot. AI crawler activity, like all traffic, changes over time.
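For readers who want to take similar measurements on their own traffic, here is a minimal sketch of how the two figures above could be computed from a list of User-Agent header values. It is not how HAProxy Edge produces its analytics. The marker list, the function names, and the sample data are all assumptions made for illustration, and matching on a substring of the User-Agent only identifies crawlers that announce themselves.

```python
from collections import Counter

# Assumed marker substrings; extend this tuple with the crawlers you care about.
AI_CRAWLER_MARKERS = ("Bytespider", "ClaudeBot")


def classify(user_agent):
    """Return the matching crawler marker, or None for all other traffic."""
    ua = user_agent.lower()
    for marker in AI_CRAWLER_MARKERS:
        if marker.lower() in ua:
            return marker
    return None


def crawler_shares(user_agents):
    """Return (AI share of all requests, each crawler's share of AI requests)."""
    total = 0
    counts = Counter()
    for ua in user_agents:
        total += 1
        name = classify(ua)
        if name is not None:
            counts[name] += 1
    ai_total = sum(counts.values())
    ai_share = ai_total / total if total else 0.0
    per_crawler = {}
    if ai_total:
        per_crawler = {name: n / ai_total for name, n in counts.items()}
    return ai_share, per_crawler


if __name__ == "__main__":
    # Synthetic sample, not real measurements.
    sample = (
        ["ExampleBrowser/1.0"] * 990
        + ["Mozilla/5.0 (compatible; Bytespider)"] * 9
        + ["Mozilla/5.0 (compatible; ClaudeBot/1.0)"] * 1
    )
    ai_share, per_crawler = crawler_shares(sample)
    print(f"AI crawler share of all requests: {ai_share:.1%}")
    for name, share in per_crawler.items():
        print(f"{name}: {share:.0%} of AI crawler requests")
```

Reporting each crawler's share relative to AI crawler traffic, rather than to all traffic, mirrors how the figures above are stated: a crawler can dominate the AI category while still being a small fraction of total requests.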
What does AI traffic mean for us, and for you?
While we are primarily a technology company, we also consider ourselves a content company; we invest in original, human-authored content, such as documentation and blog posts, that provides useful information to our users and wider audience.
Content-scraping bots existed long before LLMs began crawling the web for generative AI applications, and they have generally been considered unwelcome visitors on content-heavy websites. Many businesses would not consent to the scraping and possible reuse of their content, in whole or in part, by a third party.
AI crawlers used by LLMs come with unique risks and opportunities.
- On one hand, an LLM might reuse the original content in full, with some modification, or remixed with other content at the level of an LLM token (roughly the level of a single word). It is unlikely that a user will know where the original content came from. In cases where an LLM “hallucinates”, a user might receive incorrect information, for example when asking for code or configuration instructions.
- On the other hand, with many users turning to AI chatbots as an alternative to traditional search engines, this is becoming an important channel for discovery and awareness. Businesses might want their brand or product information to be presented by chatbots in response to user queries. If a user asks for a list of relevant products, a business might want its product to be included in the list, along with its features and benefits.
While we do not restrict AI crawlers on our website today, we will have to decide whether to continue allowing them. Other businesses running content-heavy public websites will likely find themselves having to make the same decision: to protect the value of their content, or to allow the dissemination of information about their brand and products through these new channels.