Tag: Crawling

Crawlers

15 Jul, 2024 Configuration 0

What is a crawler? A crawler is usually the first step to creating a functional site search. The crawler is in charge of checking all available pages on the given domain(s) and indexing any pages and files according to the configuration. Once indexed, the pages and files can be searched . . . Read more

How to set up date crawling

09 Apr, 2024 Configuration 0

You can configure your crawler to index dates, which can be utilized for various purposes such as date filtering or Date Freshness Boosting. To set this up: When selecting the Date data type, the indexed value will appear in the Date format. You can verify the correct date format by . . . Read more

Ways to index content

14 Jan, 2024 FAQ 0

Indexing content is an important part of efficient search functionality. There are three distinct ways to index content. Crawling: Crawling is the most common method of indexing content and is performed automatically when you set up and activate a crawler. Crawling is the activity of . . . Read more

How to use the Update Content tool

07 Dec, 2023 Configuration 0

When you have recently published or updated a page and want it indexed for your search immediately, the Update Content tool can come in handy.

Cludooff/Cludoon

23 Mar, 2023 FAQ 0

If there are certain parts of the content on a page that you would like the crawler to ignore, this can be achieved using cludooff/cludoon.
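To illustrate the idea, here is a minimal sketch of how content between an "off" marker and an "on" marker could be dropped before indexing. It assumes the markers are written as HTML comments containing the words cludooff and cludoon; the exact marker syntax is not given in this excerpt, and the `strip_ignored` helper is hypothetical, not part of Cludo's crawler.

```python
import re

# Assumption: markers are HTML comments containing "cludooff" / "cludoon".
# The exact marker syntax is not stated in this excerpt.
IGNORED_SPAN = re.compile(
    r"<!--\s*cludooff\b.*?-->.*?<!--\s*cludoon\b.*?-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_ignored(html: str) -> str:
    """Return html with every cludooff...cludoon span removed."""
    return IGNORED_SPAN.sub("", html)


page = (
    "<p>Keep this.</p>"
    "<!-- cludooff -->"
    "<nav>Skip this menu.</nav>"
    "<!-- cludoon -->"
    "<p>Keep this too.</p>"
)
cleaned = strip_ignored(page)
print(cleaned)
```

Everything between the two markers, including the markers themselves, is removed, while the surrounding paragraphs are left untouched.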

How does the crawler index and delete pages?

15 Feb, 2023 FAQ 0

Cludo’s strategy for crawling sites is based on finding as many pages as possible within the user-defined domains, indexing, and storing their content. The step-by-step process can be seen in detail in the diagram at the end of the article and will be explained further below: Crawling: Step-by-step process 1: Sites . . . Read more

Filtering searches

09 Feb, 2023 Configuration 0

If you would like an existing engine to only show results for a specific area, this can be done by adding a filter in the script. Scoped search allows you to limit search results to a specific section or type of content within the website instead of searching across the whole . . . Read more

How does Cludo index files?

09 Feb, 2023 FAQ 0

As long as a file is machine-readable (not an image), Cludo is able to crawl its content along with the information sent with the HTTP headers. How to enable or disable file indexing By default, the crawler is configured to index files for the specified domain. You can enable or disable . . . Read more

Page Inventory

06 Feb, 2023 Configuration 0

If you’re ever wondering about the number of pages in your search results or find the need to check up on any indexed content, Page Inventory is here to help. Page Inventory will provide you with an overview of indexed content for all your crawlers to provide you with a . . . Read more

How to avoid duplicate results?

02 Feb, 2023 FAQ 0

When searching, you may experience the same content appearing more than once in the results. Since a crawler is unable to index the same URL twice, this will always be due to the same content existing on multiple URLs. Note: Having two crawlers that index the same pages added to . . . Read more
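As a rough way to spot this situation on your own site, the sketch below groups URLs whose text content is identical after whitespace and case normalization. The `find_duplicates` helper and the example URLs are hypothetical; this is not how Cludo detects duplicates, only an illustration of the same content living on multiple URLs.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized text content is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        # Collapse whitespace and lowercase so trivial differences are ignored.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://example.com/news/launch": "Product launch  announced.",
    "https://example.com/news/launch?ref=home": "Product launch announced.",
    "https://example.com/about": "About us.",
}
duplicates = find_duplicates(pages)
print(duplicates)
```

Each inner list holds URLs that serve the same content, which are the candidates for appearing more than once in the results.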
