How is relevance determined?
Cludo Search is not a black box – the search relevance is based on a unique blend of machine learning and human customization on top of an algorithm.
Before customization or machine learning is applied, the Cludo search engine calculates relevance using the Okapi BM25 algorithm. A rule of thumb for good search relevance, and for SEO in general, is that a good content structure is key. While this also applies to Cludo, relevance in a Cludo search engine is highly dependent on how the crawler is configured, i.e. which fields and content are indexed for the pages.
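To give a feel for what a BM25 baseline rewards, here is a minimal sketch of the textbook Okapi BM25 formula. It is not Cludo's implementation: the whitespace tokenizer, the `k1` and `b` values (commonly used defaults), the smoothed IDF variant, and the sample pages are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            tf = term_freq[term]
            # Length normalization: long documents need more matches.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical indexed page texts.
pages = [
    "opening hours for the library",
    "library card renewal and library fees",
    "contact the support team",
]
scores = bm25_scores("library fees", pages)
```

In this sketch the second page scores highest because it matches both query terms, and the third page scores zero because it contains neither. The score depends only on the text that was indexed, which is why the crawler configuration matters so much.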
How crawler fields and boosting affect the search relevance
The crawler only indexes the content that it is configured to read. When searching for results, only the content indexed by the crawler is searchable.
The title and description of a page are always required for a crawler to pick it up. Additional fields such as meta description, subtitles, or intro text can also be set up. These fields are then indexed separately, which makes them searchable and also allows their relevance to be adjusted later using boosting.
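The effect of indexing fields separately and boosting them can be sketched as follows. This is an illustration only: how Cludo combines field scores is not described here, so the multiplicative weighting, the toy `field_score` function, the boost value, and the sample pages are all assumptions.

```python
def field_score(query, text):
    """Toy per-field score: how often the query terms occur in the field."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def boosted_score(query, page, boosts):
    """Sum the per-field scores, each multiplied by its boost (default 1.0)."""
    return sum(
        boosts.get(field, 1.0) * field_score(query, text)
        for field, text in page.items()
    )


# Hypothetical pages, each indexed as separate fields.
page_a = {
    "title": "parking permits",
    "description": "how to apply for a permit in the city",
}
page_b = {
    "title": "city news",
    "description": "parking permits were discussed at the parking meeting",
}

no_boost = {}
title_boost = {"title": 3.0}  # assumed value; start low and adjust

a_plain = boosted_score("parking", page_a, no_boost)
b_plain = boosted_score("parking", page_b, no_boost)
a_boosted = boosted_score("parking", page_a, title_boost)
b_boosted = boosted_score("parking", page_b, title_boost)
```

Without boosting, `page_b` ranks higher because the term appears twice in its description. With the title boosted, `page_a` moves ahead because the match is in its title. This is only possible because the title was indexed as its own field.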
Why you should pay more attention to your Description field
It is important that the Description field is set to only include the actual page content and not static page elements such as navigation items or the footer. If these elements are indexed, they will be searchable, risking that irrelevant results appear for a given search because the search term exists outside of the main content.
What not to do
Setting the Description field to only index the meta description: A lot of the page content will not be searchable even though it’s relevant, and you risk ending up with an increase in searches without results.
Setting the Description field to include too many elements, such as indexing the full body element: Each page will be indexed with static elements such as navigation items that do not relate to the specific page content. These elements will then be searchable and deemed as relevant as the main content of the page, decreasing relevance and increasing the number of ineffective searches.
What to do
Define your Description field more specifically using an XPath to make sure that the entire page content is indexed without including irrelevant elements such as navigation items or footer content.
Set up separate fields to index items such as the meta description or other content/data that is relevant for your search.
Apply boosting to fields if needed. For example, if you’re crawling the meta descriptions or meta keywords, it is recommended to add boosting for these fields. Start low and increase the boosting value as needed.
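The difference between indexing the full body and indexing a targeted element can be sketched with a small script. The page markup, the element names, and the `extract_text` helper are hypothetical, and Python's `xml.etree.ElementTree` only supports a subset of XPath, so the expressions accepted by the Cludo crawler may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page markup.
page = """
<html>
  <body>
    <nav><a href="/">Home</a> <a href="/pricing">Pricing</a></nav>
    <main id="content">
      <h1>Opening hours</h1>
      <p>The library is open on weekdays.</p>
    </main>
    <footer>Contact us about pricing</footer>
  </body>
</html>
""".strip()


def extract_text(root, xpath):
    """Return the whitespace-normalized text of all elements matching xpath."""
    parts = []
    for element in root.findall(xpath):
        parts.extend(element.itertext())
    return " ".join(" ".join(parts).split())


root = ET.fromstring(page)

# Too broad: navigation and footer text end up in the indexed content.
too_broad = extract_text(root, ".//body")

# Targeted: only the main content of the page is indexed.
targeted = extract_text(root, ".//main[@id='content']")

print(targeted)  # Opening hours The library is open on weekdays.
```

With the broad selection, a search for "pricing" would match this page even though its content is about opening hours. With the targeted selection it would not.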
How to measure relevance
Relevance for search can be measured using the Mean Reciprocal Rank (MRR). Mean Reciprocal Rank is a statistical measure that scores each search by the position of the clicked result in the list of ranked results, and then averages those scores across searches. For example, if someone searches for a term and clicks on the first result, that search gets a perfect reciprocal rank of 1. The reciprocal rank is 1 divided by the rank of the clicked result, as in the following examples:
|Search Query||Page Rankings||Clicked on Ranking||Rank in Rankings||Reciprocal Rank|
| || ||dogs||3||1/3 = 0.33|
| ||2. monkey bars, 3. monkey pets||monkey bars||2||1/2 = 0.5|
| || ||cat||1||1/1 = 1|
A score of 0.5 is considered a good standard MRR score. This means that, on average, visitors are clicking on the 2nd result or higher. It is possible to get the average MRR score of a search engine by reaching out to support.
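As a minimal sketch, the calculation can be reproduced in a few lines. The function names are illustrative, and scoring a search without any click as 0 is an assumption (a common convention), not something stated about how Cludo computes its score.

```python
def reciprocal_rank(clicked_rank):
    """Reciprocal rank of one search: 1 / position of the clicked result.

    A search without a click (None) is scored 0.0, which is an assumption.
    """
    if clicked_rank is None:
        return 0.0
    return 1.0 / clicked_rank


def mean_reciprocal_rank(clicked_ranks):
    """Average the reciprocal ranks over all searches."""
    if not clicked_ranks:
        return 0.0
    total = sum(reciprocal_rank(rank) for rank in clicked_ranks)
    return total / len(clicked_ranks)


# Ranks of the clicked results in the three example searches: 3, 2 and 1.
ranks = [3, 2, 1]
mrr = mean_reciprocal_rank(ranks)
print(round(mrr, 2))  # 0.61
```

For the three example searches, the MRR is (0.33 + 0.5 + 1) / 3, or roughly 0.61, which is above the 0.5 benchmark.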
How to impact the search algorithm
With Cludo, there are multiple ways to impact or customize the search algorithm, from determining which page results should show up for specific queries, to prioritizing or de-prioritizing certain areas of a website, or even using machine learning to dynamically adjust the order of results based on user activity.
The following Cludo tools can be used to customize the search algorithm:
- Page rankings
- Dynamic re-ranking
These tools all act on top of the Okapi BM25 algorithm in the search application.