Nearly every site has an internal site search engine that searches only that site, and it's not usually on your list of security risks. Internal site searches are also typically non-entities for search engine optimization. After all, search engine crawlers generally don't use search bars. There is, however, a way for spammers to hijack your internal site search to create spam.
A Real-Life Example of Internal Site Search Spam
Here's how internal site search spam works: A spammer uses your internal site search URL string to "create" low-value pages that contain keywords and URLs. Then, they create a third-party page that links to those low-value internal site search URLs on your site and use Google Search Console to request indexation of the page. That prompts Google to crawl from the third-party page to your low-value internal site URLs, getting them discovered and potentially indexed.
Too confusing? Here's an example.
One of our clients, we'll call them Client-A, had an internal site search URL string that looked like this: https://www.client-a.com/search?q=search-term. The spam was generated using this scenario:
- Spammer-B identifies the internal site search results page URL on https://www.client-a.com.
- Spammer-B creates a page on its own spam site, for example, https://www.spammer-b.com/some-crummy-page.
- That page (https://www.spammer-b.com/some-crummy-page) links to a bunch of Client-A's internal site search URLs. For example, perhaps they do a search for the query "free viagra www.spam-site-c.com" in the internal site search, and it generates a zero-results internal search results page at this URL: https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com. The spammers grab that URL and create a link to it on https://www.spammer-b.com/some-crummy-page.
- Spammer-B requests that Google index https://www.spammer-b.com/some-crummy-page in Google Search Console.
- Google crawls https://www.spammer-b.com/some-crummy-page and discovers the links to Client-A's internal site search pages, like https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com.
Why would spammers do this? The theory is that they are creating mentions of a URL and a keyword on a domain that has authority, some of which would then transfer to Spammer-B's domain.
It's incredibly unlikely that this would actually result in a transfer of value to the spammer's site, but that doesn't stop them from trying. What it does do is create a whole mess of low-value, zero-result internal site search URLs for Google to crawl, wasting your crawl budget.
To see if you have this problem today, look for your internal site search URLs in your Google Search Console Pages "Discovered – currently not indexed" report.
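If you export that report as a list of URLs, a short script can help you triage it. The following is a minimal Python sketch rather than a definitive detector. It assumes the /search path and q parameter from the Client-A example, and the function name and the "looks like a domain" pattern are illustrative assumptions. The pattern will also flag legitimate queries that happen to contain a dot, so treat the output as candidates to review by hand.

```python
import re
from urllib.parse import urlparse, parse_qs

# Heuristic (an assumption, not an official rule): a search term that
# contains something shaped like "name.tld" is worth a closer look.
DOMAIN_LIKE = re.compile(r"\b[a-z0-9-]+\.[a-z]{2,}\b", re.IGNORECASE)


def find_suspicious_search_urls(urls, search_path="/search", query_param="q"):
    """Return internal site search URLs whose query embeds a domain-like string."""
    suspicious = []
    for url in urls:
        parsed = urlparse(url)
        # Skip anything that is not an internal site search results page.
        if parsed.path != search_path:
            continue
        terms = parse_qs(parsed.query).get(query_param, [])
        if any(DOMAIN_LIKE.search(term) for term in terms):
            suspicious.append(url)
    return suspicious


if __name__ == "__main__":
    exported_urls = [
        "https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com",
        "https://www.client-a.com/search?q=running-shoes",
        "https://www.client-a.com/products/shoes",
    ]
    for url in find_suspicious_search_urls(exported_urls):
        print(url)
```

Adjust search_path and query_param to match your own internal site search URL string.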
How to Prevent Internal Site Search Spam
Better yet, take preventive action before internal site search spam becomes an issue.
There are two ways to prevent internal site search spam from taking hold on your site. You can choose to block internal site search results from being indexed or from being crawled.
Block Indexing
Of the two choices, blocking indexation is the better option because it ensures that these internal site search results pages won't be indexed by Google and won't appear in search results. Simply add a meta robots tag with a content value of noindex to the head of the page to prevent indexation of the page. The line of code looks like this:
<meta name="robots" content="noindex">
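Once the tag is deployed, it is worth confirming that it actually appears in the HTML of your search results pages. Here is a minimal Python sketch of such a check using only the standard library. The class and function names are hypothetical, and it only inspects the generic robots meta tag in HTML you pass in; it does not fetch pages or look at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a meta robots tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```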
Block Crawling
Unlike using a meta robots noindex tag, choosing to block crawling using a disallow command in the robots.txt file doesn't prevent Google from indexing internal site search results pages. It only requests that bots not crawl the pages indicated.
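As an illustration, a disallow rule for the Client-A example could look like the two-line robots.txt embedded in the Python sketch below, which then sanity-checks the rule with the standard library's robots.txt parser. The /search path comes from the example URL above; the function name is hypothetical, and Python's parser is not Google's, so treat this as a rough check rather than proof of how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for the Client-A example: ask all bots
# not to crawl anything whose path starts with /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text permits crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.client-a.com/search?q=free-viagra-www.spam-site-c.com",
        "https://www.client-a.com/products/shoes",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

Because the rule is a path prefix, it also covers any other URL that begins with /search, so check that no pages you want crawled share that prefix.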
It's important to note, however, that if Google has already associated internal site search spam with your site, just blocking crawling won't stop it from being indexed. You need to first noindex the affected pages, and Google has to be able to crawl them to see the noindex tag. After they have been deindexed, then you can disallow them with the robots.txt file to save crawl budget.
Preventing internal site search spam is a simple yet crucial step in SEO and site security. It prevents bad actors from manipulating your internal site search results for nefarious purposes and protects your crawl budget from being wasted in the process. Take the time to check your Google Search Console reporting today for evidence of this type of spam and, if you find any, take steps to remove it.
