What is XPath?

XPath (XML Path Language) is a query language that uses path expressions to select specific parts of XML or HTML markup. In MyCludo, XPath can be used in the crawler configuration to define which part of a page's HTML should be used for a field. An XPath follows the structure of the markup, using element IDs and class names as landmarks, to navigate down to specific values such as image paths, text, and numbers. These values can then be used in the search engine for filters or for display.
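The following minimal sketch shows navigation by ID and class name. It uses a small hypothetical product block: the markup, the id `product-42`, and the class names are made up for illustration. It runs under Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath. It also shows that an extracted number arrives as text and has to be converted before it is compared numerically. This article does not describe how MyCludo handles that conversion internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, made up for this illustration (not from a real site).
SNIPPET = """<body>
  <div id="product-42">
    <img class="thumbnail" src="/images/phone.png" />
    <span class="price">129.00</span>
  </div>
</body>"""

root = ET.fromstring(SNIPPET)

# Locate the block by its id, then the image inside it by class name,
# and read its src attribute.
image_path = root.find(".//div[@id='product-42']/img[@class='thumbnail']").get("src")

# Same block, then the price element by class name; read its text content.
price_text = root.find(".//div[@id='product-42']/span[@class='price']").text

# The extracted value is a string; convert it before using it as a number.
price = float(price_text)

print(image_path)  # /images/phone.png
print(price)       # 129.0
```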

Consider the following markup for a page being crawled:


<body>
    <div class="product overview">
        <div class="product media">
            <div class="product gallery">
                <img class="product current-image" src="/images/xiamo_mix_fold_2-front.png" />
                <img class="product next-image" src="/images/xiamo_mix_fold_2-back.png" style="display:none;" />
            </div>
        </div>
        <div class="product info detailed">
            <h1>XIAOMI MIX FOLD 2</h1>
            <div id="description-list">
                <ul>
                    <li>A foldable 8-inch WQHD+ AMOLED Inner display with LTPO technology,
                        a 6.5-inch Samsung AMOLED E5 external screen.
                    </li>
                    <li>Qualcomm 8+ Gen 1 5G processor, Adreno 730 GPU.</li>
                    <li>Rear Camera: 50MP Sony IMX766 primary with OIS support,
                        13MP OmniVision OV13B ultra-wide snapper, and a telephoto camera
                        with 2x optical zoom.
                    </li>
                    <li>12GB LPDDR5 RAM, 256GB/512GB/1TB UFS 3.1 Storage.</li>
                    <li>4500mAh battery, and 67W wired charging.</li>
                    <li>MIUI 13 is based on Android 12 OS, with full Google Mobile Service and
                        Google Play Store pre-installed.</li>
                </ul>
            </div>
            <price>39.95</price>
        </div>
    </div>
</body>

Examples

Using the HTML above, the following XPaths return these values:

//price/text() will return the price of the phone: 39.95.

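(//div[@class='product gallery']/img/@src)[1] will return the path of the first image inside the ‘product gallery’ element: /images/xiamo_mix_fold_2-front.png. The parentheses make [1] apply to the combined result, so only the first matching src is returned. Note that @class='product gallery' matches only when the class attribute is exactly that string.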

//div[@id="description-list"]/*/li[2]/text() will return the processor of the phone: Qualcomm 8+ Gen 1 5G processor, Adreno 730 GPU. The * matches any child element (here, the ul), and li[2] selects its second list item.
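To try these expressions outside the crawler, the sketch below evaluates them against the sample page with Python's standard-library `xml.etree.ElementTree`. This is an illustrative stand-in and does not reflect how MyCludo evaluates XPath internally. ElementTree supports only a subset of XPath, so each path starts with `.//` instead of `//`. The final `text()` and `@src` steps are done in Python with `.text` and `.get()`. The approach only works because this sample markup happens to be well-formed XML. A full XPath 1.0 library such as lxml could run the original expressions unchanged.

```python
import xml.etree.ElementTree as ET

# The sample page from this article. It is well-formed XML, so the standard-library
# XML parser accepts it; most real-world HTML is not and needs an HTML parser.
PAGE = """<body>
    <div class="product overview">
        <div class="product media">
            <div class="product gallery">
                <img class="product current-image" src="/images/xiamo_mix_fold_2-front.png" />
                <img class="product next-image" src="/images/xiamo_mix_fold_2-back.png" style="display:none;" />
            </div>
        </div>
        <div class="product info detailed">
            <h1>XIAOMI MIX FOLD 2</h1>
            <div id="description-list">
                <ul>
                    <li>A foldable 8-inch WQHD+ AMOLED Inner display with LTPO technology,
                        a 6.5-inch Samsung AMOLED E5 external screen.
                    </li>
                    <li>Qualcomm 8+ Gen 1 5G processor, Adreno 730 GPU.</li>
                    <li>Rear Camera: 50MP Sony IMX766 primary with OIS support,
                        13MP OmniVision OV13B ultra-wide snapper, and a telephoto camera
                        with 2x optical zoom.
                    </li>
                    <li>12GB LPDDR5 RAM, 256GB/512GB/1TB UFS 3.1 Storage.</li>
                    <li>4500mAh battery, and 67W wired charging.</li>
                    <li>MIUI 13 is based on Android 12 OS, with full Google Mobile Service and
                        Google Play Store pre-installed.</li>
                </ul>
            </div>
            <price>39.95</price>
        </div>
    </div>
</body>"""

root = ET.fromstring(PAGE)

# //price/text()
price = root.find(".//price").text

# (//div[@class='product gallery']/img/@src)[1]
# findall() returns matches in document order, so [0] is the first one.
first_image = root.findall(".//div[@class='product gallery']/img")[0].get("src")

# //div[@id="description-list"]/*/li[2]/text()
# Positions in ElementTree predicates are 1-based, as in XPath.
processor = root.find(".//div[@id='description-list']/*/li[2]").text

print(price)        # 39.95
print(first_image)  # /images/xiamo_mix_fold_2-front.png
print(processor)    # Qualcomm 8+ Gen 1 5G processor, Adreno 730 GPU.
```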
