Crawl Analytics is an add-on module for Keylime Toolbox that processes server logs to determine which URLs search engines are crawling and which technical issues the search engine bots encounter that prevent complete indexing of the site.
If you would like information about adding the Crawl Analytics module, just let us know at support@keylimetoolbox.com.
See our Crawl Analytics Set Up Guide for information on how to begin uploading logs to Keylime Toolbox.
The Value of Server Logs
Server web access logs provide crucial details about how search engines crawl a site. From the logs, Keylime Toolbox can provide details about which URLs are being crawled and how often, pinpoint technical issues, and measure improvements. Keylime Toolbox reports show how search engines see the infrastructure of the site, what pages they find valuable, and what obstacles are keeping the site from being fully crawled and indexed. Much of this data is only available from the web access logs.
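To make the idea concrete, here is a minimal sketch of how a single line from a web access log can be split into the fields that matter for crawl analysis. It assumes the widely used "combined" log layout; your server may be configured differently, and this is not Keylime Toolbox's own parser. The function name, the sample line, and the IP address (from a documentation-only range) are made up for illustration.

```python
import re

# Pattern for the common "combined" access log layout. This layout is an
# assumption; adjust the pattern if your server writes a different format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields

# Made-up sample line for illustration only.
sample = (
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products/widget?color=red HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
entry = parse_log_line(sample)
print(entry["url"], entry["status"], entry["user_agent"])
```

Each parsed entry carries the URL requested, the status code returned, and the user agent that made the request, which are the raw inputs for the reports described below.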
Accessing Crawl Analytics
To access Crawl Analytics, choose a reporting group from the dashboard, then click Crawl Analytics.
For each report, you can choose the date range and the search engine bot. Keylime Toolbox reports the crawl behavior of the following search engine bots:
- Google
  - Googlebot
  - Googlebot-Image
  - Googlebot-Mobile
- Bing
  - Bingbot
  - Bingbot Media
  - Bingbot News and Blogs
  - Bingbot Product
- Yahoo
- Naver
- Yandex
- Baidu
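If you want to do a rough version of this grouping on your own logs, one approach is to look for a crawler's token in the user-agent string. The sketch below is an assumption about how such matching could work, not a description of how Keylime Toolbox identifies bots. The token list is illustrative and incomplete: the specialized Bing crawlers are left out because their user-agent tokens are not given here.

```python
# Lowercase tokens that commonly appear in each crawler's user-agent string.
# This mapping is an illustrative assumption, not Keylime Toolbox's own list.
# More specific tokens come first so "Googlebot-Image" is not reported as "Googlebot".
BOT_TOKENS = [
    ("googlebot-image", "Googlebot-Image"),
    ("googlebot-mobile", "Googlebot-Mobile"),
    ("googlebot", "Googlebot"),
    ("bingbot", "Bingbot"),
    ("slurp", "Yahoo"),
    ("yeti", "Naver"),
    ("yandex", "Yandex"),
    ("baiduspider", "Baidu"),
]

def classify_bot(user_agent):
    """Return the bot name for a user-agent string, or None if no token matches."""
    ua = user_agent.lower()
    for token, name in BOT_TOKENS:
        if token in ua:
            return name
    return None

print(classify_bot("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
```

A user-agent string is easy to fake, so a match here only tells you what the visitor claims to be.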
Crawl Trends
The Crawl Trends tab shows the overall distribution of status codes so you can monitor the percentage of URLs crawled successfully versus those that return an error.
The first section shows the distribution of status codes.
The second section shows the number of URLs crawled with each status code. You can select which status codes to graph.
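As a rough illustration of the arithmetic behind a status code distribution, the sketch below turns a list of status codes into percentages. The function name and sample values are made up, and the calculation is an assumption about how such a chart could be derived, not Keylime Toolbox's implementation.

```python
from collections import Counter

def status_distribution(status_codes):
    """Return {status_code: percentage of all crawled requests}."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in sorted(counts.items())}

# Made-up status codes, one per crawled request.
crawled = [200, 200, 200, 301, 404, 200, 500, 200]
for code, pct in status_distribution(crawled).items():
    print(f"{code}: {pct:.1f}%")
```

Tracking these percentages over time is what makes a sudden rise in 404 or 500 responses stand out.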
Crawl Snapshot
The Crawl Snapshot provides more detailed data by day.
The summary section provides information on how many URLs were crawled by each search engine.
The second section shows the distribution of status codes for the chosen day and chosen bot. Hover over a bar to see the total.
The Top URLs section shows the 10 URLs crawled most often by each search engine bot. (The list of all crawled URLs is contained in the downloadable Excel file.) This report can show you at a glance if critical issues exist with the crawl (for instance, if the wrong URLs are being crawled or if significant canonicalization issues exist).
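A top-URLs list of this kind comes down to counting requests per URL for one bot and keeping the most frequent. The following is a minimal sketch under that assumption; the function name and the sample requests are hypothetical.

```python
from collections import Counter

def top_urls(requests, bot, limit=10):
    """Return the `limit` most-crawled URLs for one bot as (url, count) pairs.

    `requests` is an iterable of (bot_name, url) pairs, one per logged request.
    """
    counts = Counter(url for name, url in requests if name == bot)
    return counts.most_common(limit)

# Made-up sample data for illustration only.
sample_requests = [
    ("Googlebot", "/"),
    ("Googlebot", "/products/widget"),
    ("Googlebot", "/"),
    ("Bingbot", "/about"),
    ("Googlebot", "/?sessionid=123"),
    ("Googlebot", "/"),
    ("Googlebot", "/products/widget"),
]
for url, count in top_urls(sample_requests, "Googlebot"):
    print(count, url)
```

In this sample, a URL such as /?sessionid=123 appearing among the most-crawled pages would be a hint that parameterized duplicates of the home page are being crawled.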
Log Details
Expand the Log Details section at the bottom of the Crawl Analytics tab to view the list of Excel files available to download.
These Excel files contain detailed information about how search engines crawl your site. Keylime Toolbox generates a file for each log file with the following tabs:
- Summary – similar to the information provided in the Keylime Toolbox interface reports.
- Site hierarchy by search engine – a breakdown of the number of URLs crawled in each subfolder and the count of each status code returned for each folder. This can help you pinpoint the specific sections of the site infrastructure with issues.
- Search engine-specific tabs, grouped by status code – lists of all URLs crawled by each search engine for each status code.
- Canonicalization – all of the parameters crawled, how many URLs were crawled with each parameter, the number of values detected for each parameter, and sample URLs.
- User agents – a list of all user agents found, and a reverse DNS lookup on IP addresses to confirm that each hostname matches the search engine the user agent claims to be.
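To show the kind of calculation behind three of these tabs, here is a minimal sketch that groups crawled URLs by top-level folder and status code, summarizes query parameters, and checks a crawler's IP address with a forward-confirmed reverse DNS lookup. All function names and sample data are hypothetical, and the logic is an assumption about how such reports could be built rather than Keylime Toolbox's implementation. The hostname suffixes to accept are passed in by the caller, and the default forward lookup handles IPv4 addresses only.

```python
import socket
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlsplit

def folder_breakdown(requests):
    """requests: iterable of (url, status). Returns {folder: Counter({status: n})}."""
    breakdown = defaultdict(Counter)
    for url, status in requests:
        parts = urlsplit(url).path.split("/")
        # "/products/widget" is filed under "/products/"; "/about" and "/" under "/".
        folder = f"/{parts[1]}/" if len(parts) > 2 else "/"
        breakdown[folder][status] += 1
    return breakdown

def parameter_summary(urls):
    """Return {param: {"urls": n, "values": distinct value count, "sample": url}}."""
    url_counts = Counter()
    values = defaultdict(set)
    samples = {}
    for url in dict.fromkeys(urls):  # count each distinct URL once, keeping order
        pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
        for name, value in pairs:
            values[name].add(value)
        for name in {n for n, _ in pairs}:
            url_counts[name] += 1
            samples.setdefault(name, url)
    return {
        name: {"urls": url_counts[name], "values": len(values[name]), "sample": samples[name]}
        for name in url_counts
    }

def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the IP."""
    try:
        hostname = reverse(ip)[0]
        if not hostname.endswith(tuple(allowed_suffixes)):
            return False
        return ip in forward(hostname)[2]
    except OSError:  # covers failed reverse and forward lookups
        return False

# Made-up sample data for illustration only.
sample = [
    ("/products/widget?color=red", 200),
    ("/products/widget?color=blue", 200),
    ("/products/old", 404),
    ("/blog/post-1?utm_source=feed", 200),
    ("/", 200),
]
folders = folder_breakdown(sample)
params = parameter_summary(url for url, _ in sample)
print(dict(folders["/products/"]), params["color"])
```

The `reverse` and `forward` arguments exist so the DNS lookups can be swapped out, for example with cached results, when you check many IP addresses.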