How to avoid duplicate results?

When searching, you may see the same content appear more than once in the results. A single crawler does not index the same URL twice, so duplicates usually mean that the same content is available on multiple URLs. The other possible cause is that two crawlers added to the same engine are indexing the same pages.

Multiple pages with the same content

When running, the crawler detects and crawls all available links on the site. Many of these links differ only in their URL parameters, so the same content ends up on several unique URLs. To account for this, as with SEO in general, it is recommended to use canonical tags.

“A canonical tag (aka “rel canonical”) is a way of telling search engines that
a specific URL represents the master copy of a page. Using the canonical tag prevents
problems caused by identical or “duplicate” content appearing on multiple URLs.
Practically speaking, the canonical tag tells search engines which version of a URL
you want to appear in search results.”
moz.com: Canonicalization
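
To illustrate how a canonical tag collapses several URLs into one result, here is a minimal Python sketch. It is not the crawler's actual implementation; the `CanonicalFinder` class, the `canonical_url` helper and the example.com URLs are all hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html, page_url):
    """Return the canonical URL declared in html, or page_url if none is declared."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or page_url


# Hypothetical pages: the same content reachable with and without a URL parameter.
PAGE = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'
pages = {
    "https://example.com/shoes": PAGE,
    "https://example.com/shoes?utm_source=mail": PAGE,
}

# Grouping by canonical URL leaves a single entry for both pages.
unique = {canonical_url(html, url) for url, html in pages.items()}
print(unique)
```

Both URLs declare the same canonical address, so only one entry remains after grouping. A page without a canonical tag simply keeps its own URL.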

In addition to canonical tags, the crawler respects noindex/nofollow directives as well as the rules set in the robots.txt file. This behavior can be disabled in the crawler settings, but it is recommended to keep it enabled.
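
As a rough sketch of what respecting these rules means, the Python example below checks a URL against robots.txt rules and a page against a robots meta tag. It uses Python's standard `urllib.robotparser` module; the `RobotsMeta` class, the `should_index` helper, the `example-crawler` user agent and the sample rules are assumptions made for illustration and do not describe the crawler's internals.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps internal search result pages out of the index.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


class RobotsMeta(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def should_index(url, html, robots, agent="example-crawler"):
    """Return False if robots.txt disallows the URL or the page is marked noindex."""
    if not robots.can_fetch(agent, url):
        return False
    meta = RobotsMeta()
    meta.feed(html)
    return "noindex" not in meta.directives


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

print(should_index("https://example.com/search/?q=shoes", "<html></html>", robots))
print(should_index("https://example.com/about", '<meta name="robots" content="noindex, follow">', robots))
print(should_index("https://example.com/about", "<p>About us</p>", robots))
```

Excluding parameter-heavy sections such as internal search pages in robots.txt, or marking them noindex, is one way to keep duplicate variants of a page out of the results.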

Multiple crawlers on a single engine

While each crawler attempts to keep a unique list of indexed pages, an engine with multiple crawlers can still show the same page twice in the results if several of those crawlers index the same pages. It is recommended to keep each crawler's configuration clear and easy to understand, so that the same page is not accidentally indexed by more than one crawler added to the same engine.
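
If you can export the list of URLs each crawler has indexed, a quick overlap check can reveal this situation. The sketch below assumes such lists are available as plain sets of URLs; the `find_overlaps` function, the crawler names and the URLs are hypothetical.

```python
from itertools import combinations


def find_overlaps(crawlers):
    """Map each pair of crawler names to the URLs that both crawlers index."""
    overlaps = {}
    for (name_a, urls_a), (name_b, urls_b) in combinations(sorted(crawlers.items()), 2):
        shared = set(urls_a) & set(urls_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical crawlers added to the same engine.
crawlers = {
    "blog": {"https://example.com/blog/post-1", "https://example.com/docs/faq"},
    "docs": {"https://example.com/docs/faq", "https://example.com/docs/setup"},
    "shop": {"https://example.com/shop/shoes"},
}

for pair, urls in find_overlaps(crawlers).items():
    print(pair, sorted(urls))
```

Any pair reported here indexes at least one page twice, which is a hint that the start URLs or include/exclude rules of those crawlers overlap.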
