Tag: Crawling
What is a crawler? A crawler is usually the first step in creating a functional site search. The crawler is in charge of checking all available pages on the given domain(s) and indexing any pages and files according to the configuration. Once indexed, the pages and files can be searched …
You can configure your crawler to index dates, which can then be used for purposes such as date filtering or boosting results by date freshness. To set this up: when you select the Date data type, the indexed value will appear in the Date format. You can verify the correct date format by …
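Once dates are indexed, date filtering amounts to keeping only pages whose date value falls within a chosen range. The sketch below illustrates that idea under assumptions that are not from the article: the `indexed_dates` mapping and the `filter_by_date` helper are hypothetical, and dates are assumed to be stored as ISO 8601 strings (YYYY-MM-DD), which may not match Cludo's actual Date format.

```python
from datetime import date

# Hypothetical indexed pages: URL -> date value as stored in the index.
# Assumption: dates are ISO 8601 strings (YYYY-MM-DD); Cludo's Date format may differ.
indexed_dates = {
    "https://example.com/news/a": "2023-01-15",
    "https://example.com/news/b": "2023-06-30",
    "https://example.com/news/c": "2024-02-01",
}


def filter_by_date(pages: dict[str, str], start: date, end: date) -> list[str]:
    """Return URLs whose indexed date falls within [start, end], inclusive."""
    matches = []
    for url, raw in pages.items():
        try:
            value = date.fromisoformat(raw)
        except ValueError:
            # The value was not indexed in the expected date format, so skip it.
            continue
        if start <= value <= end:
            matches.append(url)
    return matches


print(filter_by_date(indexed_dates, date(2023, 1, 1), date(2023, 12, 31)))
```

A value that fails to parse is skipped rather than raising, which mirrors why verifying the indexed date format matters: a wrongly formatted date simply drops out of date-based filtering.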
Indexing content is an important part of an efficient search function. There are three distinct ways to index content. Crawling: crawling is the most common method of indexing content and is performed automatically when you set up and activate a crawler. Crawling is the activity of …
If you have recently published or updated a page and want it indexed for your search immediately, the Update Content tool can come in handy.
If there are parts of a page's content that you would like the crawler to ignore, you can exclude them using cludooff/cludoon.
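Conceptually, everything between a cludooff marker and the following cludoon marker is left out of the indexed text. The sketch below shows that behavior with a regular expression. It assumes the markers are written as HTML comments (`<!--cludooff-->` and `<!--cludoon-->`); that syntax and the `strip_excluded` helper are assumptions for illustration, so check Cludo's documentation for the exact markup the crawler expects.

```python
import re

# Assumption: markers are HTML comments <!--cludooff--> / <!--cludoon-->,
# optionally with whitespace inside the comment.
EXCLUDED_SPAN = re.compile(
    r"<!--\s*cludooff\s*-->.*?<!--\s*cludoon\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_excluded(html: str) -> str:
    """Remove every region from a cludooff marker to the next cludoon marker."""
    # Non-greedy matching keeps separate excluded regions separate.
    return EXCLUDED_SPAN.sub("", html)


page = "<p>Indexed</p><!--cludooff--><nav>Menu</nav><!--cludoon--><p>Also indexed</p>"
print(strip_excluded(page))
```

The non-greedy `.*?` matters: with a greedy pattern, two excluded regions on the same page would swallow the indexable content between them.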
Cludo’s strategy for crawling sites is based on finding as many pages as possible within the user-defined domains, then indexing them and storing their content. The step-by-step process is shown in detail in the diagram at the end of the article and is explained further below. Crawling: Step-by-step process. 1: Sites …
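The general idea of "find as many pages as possible within the user-defined domains" can be illustrated with a breadth-first crawl that follows links but never leaves the allowed domains. This is a generic sketch, not Cludo's actual crawler: the in-memory `LINKS` graph stands in for fetched pages, and `crawl` is a hypothetical helper.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical link graph standing in for fetched pages: URL -> hrefs found on it.
LINKS = {
    "https://example.com/": ["/about", "/news", "https://other.org/page"],
    "https://example.com/about": ["/", "/team"],
    "https://example.com/news": ["/news/1"],
    "https://example.com/team": [],
    "https://example.com/news/1": ["/about"],
}


def crawl(start_urls: list[str], allowed_domains: set[str]) -> list[str]:
    """Breadth-first crawl that only follows links within allowed_domains."""
    seen = set(start_urls)
    queue = deque(start_urls)
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # "index" the page; a real crawler would store its content
        for href in LINKS.get(url, []):
            target = urljoin(url, href)  # resolve relative links against the page URL
            if urlparse(target).hostname in allowed_domains and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(crawl(["https://example.com/"], {"example.com"}))
```

The `seen` set ensures each URL is visited and indexed only once, and the domain check keeps the external link to `other.org` out of the crawl.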
If you would like an existing engine to show results only for a specific area of your site, you can do this by adding a filter to the script. Scoped search allows you to limit search results to a specific section or type of content within the website instead of searching across the whole …
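The filter syntax used in Cludo's script is not given here, so the sketch below only illustrates the concept of scoping: from the full result set, keep the hits that belong to one section of the site, identified here by a URL path prefix. The `results` list and the `scope_to_section` helper are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical search results as (title, url) pairs.
results = [
    ("Opening hours", "https://example.com/library/hours"),
    ("Tuition fees", "https://example.com/admissions/fees"),
    ("Borrowing rules", "https://example.com/library/borrowing"),
]


def scope_to_section(hits: list[tuple[str, str]], section_prefix: str) -> list[tuple[str, str]]:
    """Keep only hits whose URL path starts with section_prefix."""
    return [hit for hit in hits if urlparse(hit[1]).path.startswith(section_prefix)]


print(scope_to_section(results, "/library/"))
```

Matching on the parsed path rather than the raw URL string avoids false matches in the domain or query string; scoping by content type would work the same way with a different predicate.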
As long as a file is machine-readable (not an image), Cludo can crawl its content along with the information sent in the HTTP headers. File titles: you can choose how the file title is extracted by selecting one of the following options. Automatic: the default option is Automatic, …
If you’re ever wondering about the number of pages in your search results or need to check on any indexed content, Page Inventory is here to help. Page Inventory gives you an overview of the indexed content for all your crawlers, providing a …
When searching, you may see the same content appear more than once in the results. Since a crawler cannot index the same URL twice, this is always because the same content exists on multiple URLs. That is, of course, unless you have two crawlers that index …
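Because each URL is indexed only once, duplicate results point to identical content reachable under different URLs, for example the same page with and without a query parameter. The sketch below finds such cases by hashing each page's whitespace-normalized text and grouping URLs that share a hash. The `pages` data and the `find_duplicate_content` helper are hypothetical and do not describe how Cludo itself detects duplicates.

```python
import hashlib
from collections import defaultdict

# Hypothetical indexed pages: URL -> extracted body text.
pages = {
    "https://example.com/contact": "Call us on 555-0100.",
    "https://example.com/contact?ref=footer": "Call us  on 555-0100.",
    "https://example.com/about": "We are a small team.",
}


def find_duplicate_content(indexed: dict[str, str]) -> list[list[str]]:
    """Group URLs whose whitespace-normalized text is identical."""
    groups = defaultdict(list)
    for url, text in indexed.items():
        normalized = " ".join(text.split())  # collapse runs of whitespace
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups containing more than one URL represent duplicate content.
    return [urls for urls in groups.values() if len(urls) > 1]


print(find_duplicate_content(pages))
```

Each group returned is a set of URLs serving the same content, which is where a fix such as excluding one of the URL variants from the crawl would apply.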