Crawl Analytics is an add-on module for Keylime Toolbox. It processes server logs to determine which URLs search engines are crawling and which technical issues search engine bots encounter that prevent complete indexing of the site.

If you would like information about adding the Crawl Analytics module, just let us know at support@keylimetoolbox.com.

See our Crawl Analytics Set Up Guide for information on how to begin uploading logs to Keylime Toolbox.

The Value of Server Logs

Server web access logs provide crucial details about how search engines crawl a site. From the logs, Keylime Toolbox can provide details about which URLs are being crawled and how often, pinpoint technical issues, and measure improvements. Keylime Toolbox reports show how search engines see the infrastructure of the site, what pages they find valuable, and what obstacles are keeping the site from being fully crawled and indexed. Much of this data is only available from the web access logs.
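To make the raw material concrete, here is a minimal sketch of reading one access log line. It assumes the common Apache/Nginx "combined" log format; your server may log different fields, and this is not a description of how Keylime Toolbox parses logs internally. The `parse_log_line` helper and the sample line are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format. Adjust the pattern if your
# server writes a different layout.
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Made-up sample line, for illustration only.
sample = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products/widget?color=red HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_log_line(sample)
print(entry["url"], entry["status"], entry["user_agent"])
```

Each parsed line gives the requested URL, the status code returned, and the user agent that made the request, which are the fields the reports in this guide are built around.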

Accessing Crawl Analytics

To access Crawl Analytics, choose a reporting group from the dashboard, then click Crawl Analytics.

For each report, you can choose the date range and the search engine bot. Keylime Toolbox reports the crawl behavior of the following search engine bots:

  • Google
    • Googlebot
    • Googlebot-Image
    • Googlebot-Mobile
  • Bing
    • Bingbot
    • Bingbot Media
    • Bingbot News and Blogs
    • Bingbot Product
  • Yahoo
  • Naver
  • Yandex
  • Baidu
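
If you want to spot-check your own logs against this list, a rough sketch of grouping requests by search engine might look like the following. The user-agent substrings are assumptions based on the tokens these crawlers commonly send; they are not Keylime Toolbox's matching rules, and the sketch does not separate the individual Google and Bing bots listed above.

```python
# Assumed user-agent substrings (lowercase) for each search engine.
BOT_TOKENS = {
    "Google": "googlebot",
    "Bing": "bingbot",
    "Yahoo": "slurp",
    "Naver": "yeti",
    "Yandex": "yandexbot",
    "Baidu": "baiduspider",
}


def classify_bot(user_agent):
    """Return the search engine name for a user-agent string, or None."""
    ua = user_agent.lower()
    for engine, token in BOT_TOKENS.items():
        if token in ua:
            return engine
    return None


print(classify_bot("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
```

A user-agent string can be spoofed, so a match here is only a first pass. The User agents tab described under Log Details includes a reverse lookup on IP addresses for that reason.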

Crawl Trends

The Crawl Trends tab shows the overall distribution of status codes so you can monitor the percentage of URLs crawled successfully versus those that return an error.
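
As a rough illustration of what a status code distribution means, the sketch below counts crawled requests per status code and converts the counts to percentages. The `status_distribution` function and the sample data are hypothetical and only approximate the idea behind the report.

```python
from collections import Counter


def status_distribution(entries):
    """Return {status_code: (count, percent_of_total)} for (url, status) pairs."""
    counts = Counter(status for _url, status in entries)
    total = sum(counts.values())
    return {
        code: (count, round(100 * count / total, 1))
        for code, count in sorted(counts.items())
    }


# Made-up crawl data, for illustration only.
crawled = [("/", 200), ("/about", 200), ("/old-page", 301), ("/missing", 404)]
dist = status_distribution(crawled)
print(dist)
```

A rising share of 4xx or 5xx codes over time is the kind of change this tab is meant to surface.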

The first section shows the distribution of status codes:

[Screenshot: crawl trends]

The second section shows the number of URLs crawled with each status code. You can select which status codes to graph:

[Screenshot: Googlebot crawl logs]

Crawl Snapshot

The Crawl Snapshot provides more detailed data by day.

The summary section provides information on how many URLs were crawled by each search engine:

[Screenshot: crawl snapshot]

The second section shows the distribution of status codes for the chosen day and chosen bot. Hover over a bar to see the total.

[Screenshot: crawl distribution]

The Top URLs section shows the 10 URLs crawled most often by each search engine bot. (The list of all crawled URLs is contained in the downloadable Excel file.) This report can show you at a glance if critical issues exist with the crawl (for instance, if the wrong URLs are being crawled or if significant canonicalization issues exist).
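
To show how a list like this can be derived from log data, here is a minimal sketch that counts fetches per URL for each bot and keeps the most frequent ones. The `top_urls_by_bot` function and the sample data are assumptions for illustration, not the product's implementation.

```python
from collections import Counter, defaultdict


def top_urls_by_bot(fetches, n=10):
    """fetches: iterable of (bot_name, url). Returns {bot_name: [(url, count), ...]}."""
    per_bot = defaultdict(Counter)
    for bot, url in fetches:
        per_bot[bot][url] += 1
    return {bot: counter.most_common(n) for bot, counter in per_bot.items()}


# Made-up fetches, for illustration only.
fetches = [
    ("Googlebot", "/"),
    ("Googlebot", "/products"),
    ("Googlebot", "/"),
    ("Googlebot", "/products"),
    ("Googlebot", "/?sessionid=1"),
    ("Googlebot", "/"),
    ("Bingbot", "/robots.txt"),
]
top = top_urls_by_bot(fetches, n=2)
print(top["Googlebot"])
```

If URLs with session or tracking parameters show up near the top of such a list, that is a hint that crawl activity is being spent on non-canonical URLs.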

[Screenshot: top crawled URLs]

Log Details

Expand the Log Details section at the bottom of the Crawl Analytics tab to view the list of Excel files available to download.

[Screenshot: log details]

These Excel files contain detailed information about how search engines crawl your site. Keylime Toolbox generates a file for each log file with the following tabs:

  • Summary – similar to the information provided in the Keylime Toolbox interface reports.
  • Search engine-specific tabs – lists of all URLs crawled by each search engine.
  • Site hierarchy – a breakdown of the number of URLs crawled in each subfolder and the count of each status code returned for each folder. This can help you pinpoint the specific sections of the site infrastructure with issues.
  • Canonicalization – all of the parameters crawled, how many URLs were crawled with each parameter, the number of values detected for each parameter, and sample URLs.
  • User agents – a list of all user agents found, and a reverse lookup on IP addresses to ensure the hostname matches.
  • Status code-specific tabs – lists of all URLs that returned each status code.
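
The Site hierarchy and Canonicalization tabs can be approximated with a short script if you want to cross-check the numbers against your own logs. The sketch below is hypothetical: it groups by top-level folder only, and the function names and sample data are made up for illustration rather than taken from Keylime Toolbox.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlsplit


def folder_status_counts(entries):
    """Count status codes per top-level folder for (url, status) pairs."""
    folders = defaultdict(Counter)
    for url, status in entries:
        segments = urlsplit(url).path.split("/")
        # "/products/widget" -> "/products/"; files directly under the root -> "/"
        folder = "/" + segments[1] + "/" if len(segments) > 2 else "/"
        folders[folder][status] += 1
    return folders


def parameter_summary(urls):
    """Per query parameter: number of URLs using it, distinct values, a sample URL."""
    summary = {}
    for url in dict.fromkeys(urls):  # de-duplicate, keep first-seen order
        seen_in_url = set()
        for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            info = summary.setdefault(name, {"urls": 0, "values": set(), "sample": url})
            info["values"].add(value)
            if name not in seen_in_url:
                info["urls"] += 1
                seen_in_url.add(name)
    return summary


# Made-up crawl data, for illustration only.
entries = [
    ("/products/widget?color=red", 200),
    ("/products/widget?color=blue", 200),
    ("/products/old?sessionid=abc", 404),
    ("/about", 200),
]
folders = folder_status_counts(entries)
params = parameter_summary(url for url, _status in entries)
print(dict(folders["/products/"]))
print(params["color"]["urls"], sorted(params["color"]["values"]))
```

A parameter with many distinct values spread across many URLs, such as a session ID, is a typical sign of a canonicalization problem worth investigating.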
