Month: January 2023
The crawler can be scheduled to run at a specific time of day. This schedule cannot currently be set via MyCludo. To configure time-scheduled crawling, submit a support ticket stating your time zone and at which time of . . . Read more
Async crawling is meant for websites where content is loaded asynchronously (AJAX-generated content). AJAX-generated content allows the web page and web browser to process data without having to reload the page. For example, if you hit a “Submit” button on the page, AJAX processes the information and updates the content . . . Read more
In some cases, the crawler may be blocked from indexing your website. To fix this, you may need to whitelist our IP address so that the crawler can access the site. Our crawler’s user agent is: Our crawler’s user agent can be referred to simply as cludo. User-agent: cludo Allow: * Our . . . Read more
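As a rough illustration of the robots.txt side of this, the sketch below uses Python's standard `urllib.robotparser` to check whether a given user agent may fetch a URL. The robots.txt content, the `is_allowed` helper, and the example URL are assumptions made for illustration. The sketch uses `Allow: /` because the standard-library parser matches on path prefixes. It does not cover blocking at the IP or firewall level.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let the "cludo" user agent in, keep all other bots out.
ROBOTS_TXT = """\
User-agent: cludo
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("cludo", "https://example.com/page"))
    print(is_allowed("otherbot", "https://example.com/page"))
```

Running a check like this against your own robots.txt can help rule out robots rules as the cause before looking at IP-level blocking.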
Cludo’s crawlers can index files up to 15 MB. Anything larger can be pushed directly via Cludo’s API. Images and other irrelevant information are stripped during extraction before the file size is measured. For reference, the raw text of the entire Bible is around 5 MB.
For security reasons, crawlers can only be deleted by Cludo staff. If you need to delete a crawler, please contact support and let us know the ID(s) of the crawler(s) you would like to delete.
The indexability of a file is not defined by its extension (e.g. “.pdf”), but rather by the content type, as returned in the HTTP headers. In the list below, we have added extensions as examples. Supported file types
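A minimal sketch of the idea, assuming a crawler that decides from the `Content-Type` response header and not from the URL. The set of supported types here is a hypothetical placeholder, not Cludo's actual list, and the helper names are made up for this example.

```python
from typing import Mapping, Optional

# Hypothetical placeholder set; not the actual list of supported types.
HYPOTHETICAL_SUPPORTED_TYPES = {"text/html", "application/pdf"}


def media_type(headers: Mapping[str, str]) -> Optional[str]:
    """Extract the bare media type from a Content-Type header, if present."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower()
    return None


def is_indexable(headers: Mapping[str, str],
                 supported=HYPOTHETICAL_SUPPORTED_TYPES) -> bool:
    """Decide indexability from the response headers only, ignoring the URL."""
    found = media_type(headers)
    return found is not None and found in supported
```

Under these assumptions, a URL ending in ".pdf" that is served as `application/octet-stream` would not be treated as a PDF, while an extensionless URL served as `application/pdf` would.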
Our crawler will always attempt to make as many requests as possible, often requesting multiple pages per second, but the actual frequency of requests depends on the server response from the website. Some websites might also have a crawl delay set in their robots.txt, which can impact how many requests . . . Read more
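To see what crawl delay a robots.txt declares for a user agent, a sketch like the following can help. It relies on Python's standard `urllib.robotparser`. The robots.txt content and the helper names are assumptions for illustration. `Crawl-delay` is a non-standard directive that is commonly read as seconds between requests, and the sketch says nothing about how the Cludo crawler applies it internally.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks the "cludo" user agent to wait between requests.
ROBOTS_WITH_DELAY = """\
User-agent: cludo
Crawl-delay: 5
"""


def crawl_delay_for(user_agent: str, robots_txt: str) -> Optional[int]:
    """Return the Crawl-delay declared for user_agent, or None if there is none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def max_requests_per_minute(delay_seconds: Optional[int]) -> Optional[int]:
    """Upper bound on requests per minute, reading the delay as seconds."""
    if delay_seconds is None or delay_seconds <= 0:
        return None  # no limit declared
    return 60 // delay_seconds
```

With a delay of 5 read as seconds, the upper bound works out to 12 requests per minute, far below "multiple pages per second".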
Stop words are filler words that don’t provide any context. They are ignored in the search query, which increases search relevance. There is a list of stop words for every language that is supported by Cludo. Below you can find the stop words for a few languages: . . . Read more
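The effect of stop words on a query can be sketched as a simple filter. The word list below is a small hypothetical sample for English, not Cludo's actual list, and `strip_stop_words` is a name made up for this example.

```python
# Small hypothetical sample of English stop words; not the actual list.
HYPOTHETICAL_STOP_WORDS = {"a", "an", "the", "of", "in", "to", "for", "and"}


def strip_stop_words(query: str, stop_words=HYPOTHETICAL_STOP_WORDS) -> list:
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [term for term in query.lower().split() if term not in stop_words]


if __name__ == "__main__":
    print(strip_stop_words("The history of the university"))
```

Only the terms that carry meaning remain, so matching is driven by "history" and "university" and not by "the" or "of".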
The MyCludo interface is optimized for desktop use but is also responsive, meaning it will automatically adjust to fit various screen sizes, including mobile devices and tablets. Compatible MyCludo browsers Compatible Template browsers Cludo-provided templates are compatible across all device types—desktop, mobile, and tablet—on browsers with over 1% global usage. . . . Read more
Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word’s lemma, or dictionary form. Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word in a sentence, . . . Read more
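The dependence on part of speech can be illustrated with a toy lookup table. This is only a sketch: real lemmatizers use full dictionaries and morphological analysis, and the table, the part-of-speech tags, and the `lemmatize` helper below are assumptions made for this example.

```python
# Toy lemma table keyed by (word, part of speech); purely illustrative.
TOY_LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("better", "ADJ"): "good",
    ("walked", "VERB"): "walk",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, pos), falling back to the lowercased word."""
    key = (word.lower(), pos)
    return TOY_LEMMAS.get(key, word.lower())


if __name__ == "__main__":
    print(lemmatize("meeting", "NOUN"))  # as in "a meeting at noon"
    print(lemmatize("meeting", "VERB"))  # as in "we are meeting at noon"
```

The same surface form "meeting" maps to different lemmas depending on its part of speech, which a stemmer that only trims suffixes cannot distinguish.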