Category: FAQ

How does the crawler index and delete pages?

Cludo’s strategy for crawling sites is based on finding as many pages as possible within the user-defined domains, indexing them, and storing their content. The step-by-step process is shown in detail in the diagram at the end of the article and is explained further below. Crawling: step-by-step process. 1: Sites . . .
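The strategy of discovering as many pages as possible, but only within the user-defined domains, can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not Cludo’s implementation: crawl, in_allowed_domains, and the fetch_links callback are hypothetical names, domains are matched on the exact hostname (how Cludo treats subdomains is not covered in this excerpt), and link fetching is passed in as a function so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def in_allowed_domains(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's hostname exactly matches one of the allowed domains."""
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    return host in allowed_domains


def crawl(start_urls: Iterable[str],
          allowed_domains: set[str],
          fetch_links: Callable[[str], list[str]]) -> list[str]:
    """Breadth-first discovery of every reachable page inside allowed_domains.

    fetch_links(url) is a hypothetical stand-in for "download the page and
    return the hrefs found on it"; it is injected so this runs offline.
    """
    seen: set[str] = set()
    queue: deque[str] = deque()

    def enqueue(url: str) -> None:
        url, _fragment = urldefrag(url)  # "#section" links point at the same page
        if url not in seen and in_allowed_domains(url, allowed_domains):
            seen.add(url)
            queue.append(url)

    for url in start_urls:
        enqueue(url)

    discovered: list[str] = []
    while queue:
        page = queue.popleft()
        discovered.append(page)  # a real crawler would extract and index content here
        for href in fetch_links(page):
            enqueue(urljoin(page, href))  # resolve relative links against the page
    return discovered


site = {
    "https://example.com/": ["/about", "https://other.org/x", "/blog#top"],
    "https://example.com/about": ["/"],
    "https://example.com/blog": [],
}
print(crawl(["https://example.com/"], {"example.com"}, lambda u: site.get(u, [])))
# ['https://example.com/', 'https://example.com/about', 'https://example.com/blog']
```

The queue gives breadth-first order, so pages close to the start URL are discovered first, and the seen set stops the crawler from revisiting pages that link to each other. The link to other.org is ignored because it falls outside the allowed domain.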

How are compound words treated in searches?

What is a compound word? A compound word consists of two or more nouns that together form a word with its own meaning. Compound words are very common in some languages, such as the Scandinavian languages. How is a compound word treated in a search? When searching, a . . .
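As a rough illustration of why compound words need special handling, the sketch below splits a compound into known parts using a small dictionary, so that a search for the Danish "fodboldbane" (football pitch) could also match "fodbold" and "bane". This is a generic dictionary-based decompounding technique and not necessarily how Cludo treats compound words; the decompound function and the tiny vocabulary are hypothetical, and real decompounders also handle linking letters (such as the linking "s" in the Scandinavian languages) and use far larger dictionaries.

```python
from typing import Optional


def decompound(word: str, vocabulary: set[str], min_part: int = 3) -> Optional[list[str]]:
    """Split word into parts that all appear in vocabulary (lower-case entries).

    Tries the longest possible first part first and backtracks if the
    remainder cannot be split. Returns None if no full split exists.
    """
    word = word.lower()
    if word in vocabulary:
        return [word]
    # Each part, head and tail alike, must be at least min_part characters long.
    for cut in range(len(word) - min_part, min_part - 1, -1):
        head, tail = word[:cut], word[cut:]
        if head in vocabulary:
            rest = decompound(tail, vocabulary, min_part)
            if rest is not None:
                return [head] + rest
    return None


vocab = {"fod", "bold", "fodbold", "bane"}
print(decompound("Fodboldbane", vocab))  # ['fodbold', 'bane']
print(decompound("xyz", vocab))          # None
```

A search engine using this idea could index a compound under both the full word and its parts. Without memoization the backtracking is exponential in the worst case, which is fine for a short illustration but not for production use.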

Where can I see previous webinars?

Cludo frequently hosts webinars. When a webinar is planned, all MyCludo users automatically receive an invitation in their inbox. Below is a list of selected previous webinars; click a link to watch the recording.

How does Cludo index files?

As long as a file is machine-readable (not an image), Cludo can crawl its content along with the information sent in the HTTP headers. File titles: you can choose how the file title is extracted by selecting one of the following options. Automatic: the default option is Automatic, . . .

Why is this file not indexed?

After a crawler has crawled the defined domain(s), you may find that a specific file has not been added to the search index. This is typically due to one of the following reasons:

What are the crawlers’ IP addresses?

In some cases, the crawler may be blocked from indexing your website. To fix this, you may need to whitelist our IP addresses so the crawler can access the site. Our crawler’s IP addresses are: These are the IPs of the proxy that all of Cludo’s internal services use . . .
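If you whitelist the crawler in application code rather than in a firewall, the check amounts to testing whether a request’s source address falls inside a set of allowed networks. The sketch below uses Python’s standard ipaddress module. The addresses are placeholders from the RFC 5737 documentation ranges, not Cludo’s IPs, which must be taken from Cludo’s own list, and is_allowlisted is a hypothetical helper name.

```python
import ipaddress

# PLACEHOLDERS from the RFC 5737 documentation ranges; these are NOT
# Cludo's crawler IPs. Replace them with the addresses Cludo publishes.
CRAWLER_ALLOWLIST = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single address
    ipaddress.ip_network("198.51.100.0/28"),  # 198.51.100.0 to 198.51.100.15
]


def is_allowlisted(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any allowlisted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed address: treat as not allowed
    # An IPv6 address is never "in" an IPv4 network, so mixed versions yield False.
    return any(addr in net for net in CRAWLER_ALLOWLIST)


print(is_allowlisted("192.0.2.10"))    # True
print(is_allowlisted("198.51.100.7"))  # True
print(is_allowlisted("203.0.113.5"))   # False
```

If your site sits behind a load balancer, reverse proxy, or CDN, the address your application sees may belong to that intermediary rather than to the crawler, so the whitelist is often better placed at that edge layer.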

What is the maximum file size Cludo can index?

Cludo’s crawlers can index files up to 15 MB. Anything larger can be pushed directly via Cludo’s API. Before the file size is checked, extraction strips out images and other irrelevant content, so only the extracted content counts toward the limit. For reference, the raw text of the entire Bible is around 5 MB . . .
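The limit can be expressed as a simple routing decision: measure the extracted content and send anything over the limit through the API instead of relying on the crawler. This sketch assumes that "15 MB" means 15 * 1024 * 1024 bytes of UTF-8-encoded extracted text and that the limit is inclusive; neither detail is stated in this excerpt, and choose_ingestion_route is a hypothetical helper, not part of Cludo’s API.

```python
# Assumptions (not stated in the excerpt): "15 MB" is read as 15 * 1024 * 1024
# bytes, the limit is inclusive, and size is measured on UTF-8 encoded text.
CRAWLER_LIMIT_BYTES = 15 * 1024 * 1024


def extracted_size_bytes(extracted_text: str) -> int:
    """Size in bytes of the text left after extraction (images etc. already removed)."""
    return len(extracted_text.encode("utf-8"))


def choose_ingestion_route(extracted_text: str) -> str:
    """Hypothetical helper: 'crawler' if the content fits the limit, else 'api'."""
    if extracted_size_bytes(extracted_text) <= CRAWLER_LIMIT_BYTES:
        return "crawler"
    return "api"


bible_sized = "a" * (5 * 1024 * 1024)  # roughly the 5 MB of raw Bible text
oversized = "a" * (16 * 1024 * 1024)   # over the 15 MB limit
print(choose_ingestion_route(bible_sized))  # crawler
print(choose_ingestion_route(oversized))    # api
```

Measuring encoded bytes rather than characters matters for non-English text: "é" is one character but two bytes in UTF-8.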

How do I delete a crawler?

For security reasons, crawlers can only be deleted by Cludo staff. If you need to delete a crawler, please contact support and let us know the ID(s) of the crawler(s) you would like to delete.