Is On-site Search The Shortest Path For Scraper Bots?
On-site search makes it easy for us to find what we're looking for online, and the same is true for bad bots. That's why Search Results pages at online retailers, media sites, and classifieds portals (including real estate) are among the pages most heavily targeted by bots. By scraping Search Results pages, bots can easily harvest product listings, SKUs, and reference IDs.
Retailers, media sites, and classifieds portals are among the industries most heavily targeted by bots carrying out scraping attacks, account takeovers, form and comment spam, card fraud, and other malicious activities. Analysis of traffic on Search Results pages indicates that bots account for nearly 32% of that traffic on E-Commerce sites, nearly 28% on Media & Publishing sites, and approximately 24% on Classifieds portals.
Scraper bots typically request the target Search Results page URL repeatedly with different item IDs to keep their scraped databases up to date. A spike of thousands of requests within a short interval, for example, can indicate bots hammering the same section with different item IDs to extract the prices of thousands of items. These bots are mostly deployed from data centers, rotate through multiple User Agents to evade detection, and typically hit Search Results pages periodically, with a fixed or steadily increasing number of requests.
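The spike pattern described above can be turned into a simple detection heuristic: flag any client that requests the search endpoint with an unusually large number of distinct item IDs inside a short time window. The sketch below assumes a hypothetical access-log format and an `item_id` query parameter; both names, and the thresholds, are illustrative assumptions rather than ShieldSquare's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def flag_scraper_candidates(entries, window=timedelta(minutes=5), threshold=500):
    """Flag client IPs that hit the search endpoint with many distinct
    item IDs inside one time window -- the spike pattern described above.

    `entries` is an iterable of (timestamp, client_ip, user_agent, url)
    tuples; this log shape is an assumption for illustration.
    """
    buckets = defaultdict(set)  # (client_ip, window_start) -> distinct item IDs
    for ts, ip, ua, url in entries:
        if "/search?" not in url:
            continue
        # Crude query-string parse for the sketch; real code would use urllib.parse.
        query = url.split("?", 1)[1]
        params = dict(p.split("=", 1) for p in query.split("&") if "=" in p)
        item_id = params.get("item_id")
        if item_id is None:
            continue
        # Align the timestamp to the start of its fixed-size window.
        window_start = ts - timedelta(seconds=ts.timestamp() % window.total_seconds())
        buckets[(ip, window_start)].add(item_id)
    return {ip for (ip, _), ids in buckets.items() if len(ids) >= threshold}
```

In production the threshold would be tuned per site, and distinct-ID counting would be combined with the other signals mentioned above (data-center origin, User-Agent rotation, periodicity) rather than used alone.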
Evidence of price scraping includes regular bot traffic on Product Listing URLs and matching price changes on other retail sites soon after prices are updated on the targeted site. Sophisticated bot mitigation is essential to protect Search Results pages, since they can reveal marketing and pricing strategies to unscrupulous competitors and upstarts trying to make a quick buck from scraped content.
This is why ShieldSquare provides the ‘Feed Fake Data’ option when a bot is detected. For example, assuming that the intention of bots visiting a Search Results page is to scrape pricing or product information, an online retailer could turn the tables on the attacker and provide fake pricing data to mislead them. Our customers can apply their own logic within their ShieldSquare instance to feed fake data to scraper bots, such as implementing a script to hike prices by 20%.
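The 20% price hike mentioned above could be wired into a response handler along these lines. This is a minimal sketch of customer-side logic, assuming a hypothetical list-of-dicts result format and a `bot_detected` flag supplied by the bot-detection layer; it is not ShieldSquare's API.

```python
def search_response(results, bot_detected):
    """Return search results as-is for humans, but serve inflated prices
    to detected scraper bots (the 'Feed Fake Data' idea from the text).

    `results` is a list of {"sku": ..., "price": ...} dicts; this shape
    is an assumption for illustration.
    """
    if not bot_detected:
        return results
    faked = []
    for item in results:
        fake = dict(item)  # copy so the genuine catalog data is untouched
        fake["price"] = round(item["price"] * 1.2, 2)  # hike prices by 20%
        faked.append(fake)
    return faked
```

A real deployment might add jitter to the fake prices or vary the multiplier per session, so the pattern is harder for the scraper's operators to spot and reverse.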
Before implementing custom responses such as feeding fake data, it's important to know whether a scraper bot belongs to a price comparison website, a partner, or a competitor. Being able to send multiple types of custom responses based on bot signatures helps you tackle each appropriately. You could also ask your partner bots to identify themselves in their User Agents, or implement a handshake mechanism to validate them.
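One simple form such a handshake could take: the partner declares itself in its User-Agent and signs each request with a shared secret, which the site verifies with an HMAC. The partner name, secret, and request-payload shape below are all hypothetical assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical registry of partner bots and secrets exchanged out of band.
PARTNER_SECRETS = {
    "PartnerPriceBot": b"shared-secret-issued-offline",
}

def validate_partner(user_agent, signature, payload):
    """Validate a self-declared partner bot: the User-Agent names the
    partner, and an HMAC-SHA256 over the request payload proves the
    caller actually holds that partner's shared secret."""
    for partner, secret in PARTNER_SECRETS.items():
        if partner in user_agent:
            expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
            # Constant-time comparison avoids leaking the digest via timing.
            return hmac.compare_digest(expected, signature)
    return False  # undeclared bots fall through to normal bot mitigation
```

A User-Agent string alone is trivially spoofed, which is why the signature check (or an equivalent challenge, such as reverse DNS verification) is the part that actually establishes trust.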
Without question, sophisticated bot mitigation is crucial for industries such as the ones discussed here, since they have the most to lose to scraper bots. Conventional rulesets, IP blacklists, and other outdated methods fall short against modern attacks; technologies such as ShieldSquare's Intent-based Deep Behavior Analysis (IDBA) instead leverage AI and Machine Learning to ascertain the intent of each visitor. Capturing intent enables IDBA to detect human-like sophisticated bots with significantly higher accuracy.
Originally published at https://www.shieldsquare.com on August 03, 2018.