Google Employees’ Thoughts on Indexing and Crawl Budget from the “Search Off the Record” Podcast

In a recent episode of the “Search Off the Record” podcast from Google Developers, Gary Illyes, Google’s webmaster trends analyst; Martin Splitt, a developer advocate at Google; and Lizzi Sassman, who looks after the Google Search Central site, shared their thoughts on web indexing and crawl budget.

Before getting into the podcast itself, let’s first understand what web indexing and crawl budget actually are.

Web Indexing

Web indexing works much like the index at the back of a book: a search engine organizes the content it discovers across the web so it can be retrieved quickly. When a searcher types a query on any subject, the relevant results that appear on the SERPs are pulled from this index, which search engine algorithms build by crawling and indexing websites’ content.

This is where SEO elements such as keywords, meta descriptions, and XML sitemaps come into play: they help search engines discover, understand, and index your pages.
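For readers who haven’t seen one, an XML sitemap is simply a machine-readable list of a site’s URLs. Here is a minimal sketch that generates one with Python; the URLs and output path are hypothetical, and in practice a CMS or SEO plugin usually produces this file for you.

```python
# Minimal sketch: generate a bare-bones sitemap.xml for a hypothetical site.
# The URLs and the output path are placeholders; in practice your CMS or an
# SEO plugin usually produces this file for you.
from datetime import date

PAGES = [
    "https://www.example.com/",
    "https://www.example.com/blog/crawl-budget-explained/",
]

# Each <url> entry lists a page's location and when it last changed.
entries = "\n".join(
    f"  <url>\n    <loc>{url}</loc>\n    <lastmod>{date.today()}</lastmod>\n  </url>"
    for url in PAGES
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap.xml", "w") as f:
    f.write(sitemap)
```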

Crawl Budget

Crawl budget is the number of pages on a website that Google’s crawlers can and will crawl within a given timeframe. If your site has more pages than the crawlers get through, the leftover pages won’t be indexed, which means they can’t appear in search results and will see little to no organic traffic.

Now, let’s understand what the podcast says about web indexing and the crawl budget.

Should You Worry About the Crawl Budget?

According to Gary Illyes, more than 90% of websites don’t need to worry about crawl budget. That said, crawling and indexing a new page can still take a few days: Google and other search engines have an enormous number of pages to process, and the index has to live on physical storage, such as memory modules, hard disks, or SSDs, which costs money.

So, in short, according to Martin Splitt, if you have a website with 20 pages, or even somewhat more, you don’t have to worry about crawl budget. It may, however, be a concern for blog-style sites with a great many articles.

Martin Splitt: Yeah, “going to have.” It doesn’t even “have.” They assume it’s a crawl budget problem when they are very far from it, especially if it’s small websites with 20 pages.

I’m like, “No, very unlikely to be an issue.” But people are worried about it, and I’m not exactly sure where it comes from. I think it comes from the fact that a few large-scale websites do have articles and blog posts where they talk about crawl budget being a thing.

It is being discussed in SEO training courses. As far as I’ve seen, it’s being discussed at conferences. But it’s a problem that is rare to be had. Like it’s not a thing that every website suffers, and yet, people are very nervous about it.

How to Know Whether the Webpage You Just Launched Will Be Crawled by Search Engine Bots

Gary, Lizzi, and Martin went on to discuss how Google decides whether a newly launched page is worth crawling.

Gary Illyes: Yeah, because, like we said, we don’t have infinite space, so we want to index stuff that we think– well, not we– but our algorithms determine that it might be searched for at some point, and if we don’t have signals, for example, yet, about a certain site or a certain URL or whatever, then how would we know that we need to crawl that for indexing?

Gary went on to explain how quality signals can increase a site’s crawl frequency: if Google’s systems see that a certain pattern or subject is popular on the internet and a site covers related subjects, that site will be crawled more frequently by search engine bots.

In short, if your website isn’t popular with search engines right now, that doesn’t mean it will stay unindexed; it may simply be that the subject of your webpage isn’t in much demand at the moment.
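The podcast doesn’t prescribe a way to check this, but a common practical technique is to scan your server’s access logs for Googlebot’s user agent to see which URLs are actually being crawled. Here is a minimal sketch, assuming a combined-format log at a hypothetical path (for a rigorous check you would also verify the bot via reverse DNS, since user-agent strings can be spoofed):

```python
# Minimal sketch: count Googlebot hits per URL in a web server access log.
# The log path and format are assumptions; adjust them for your own server.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # hypothetical path

# Pull the requested URL out of a combined-format log line:
# ... "GET /some/path HTTP/1.1" ...
request_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[^"]*"')

hits = Counter()
with open(LOG_PATH) as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        match = request_re.search(line)
        if match:
            hits[match.group(1)] += 1

# Show the ten most-crawled URLs.
for url, count in hits.most_common(10):
    print(f"{count:6d}  {url}")
```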

Can You Tell Google to Stop Crawling Your Website?

The simple answer is yes. You can get Googlebot to reduce how much it crawls your website by slowing down your server’s responses or by returning status codes such as 503 (Service Unavailable) or 429 (Too Many Requests).

If those signals persist, Googlebot will back off from crawling your website for an extended period, for example a few days or weeks.
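As a minimal sketch of what that can look like, here is a hypothetical Flask catch-all route that returns a 503 with a Retry-After header while a site is in maintenance mode; Flask and the flag are assumptions, and any server stack can send the same response:

```python
# Minimal sketch: ask crawlers to back off by answering every request with
# 503 Service Unavailable plus a Retry-After hint while in maintenance mode.
# Flask is an arbitrary choice; any web server or framework can do this.
from flask import Flask

app = Flask(__name__)
MAINTENANCE_MODE = True  # hypothetical flag

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def serve(path):
    if MAINTENANCE_MODE:
        # Retry-After (in seconds) hints when the crawler should come back.
        return "Service temporarily unavailable", 503, {"Retry-After": "3600"}
    return f"Content for /{path}"

if __name__ == "__main__":
    app.run()
```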

Furthermore, Martin Splitt talks about why e-commerce site owners in particular worry about their crawl budget. If a site has, say, 1 million product pages, crawlers will start visiting them, but after the first hundred or so the crawling slows down, which means many pages may not get a chance to be crawled and won’t come up in search results.

Save Your Crawl Budget with 404 Errors

At a technical level, if JavaScript files account for more than 20-30% of your site’s crawl requests, it may be time to stop crawlers from fetching the ones you no longer need by serving 404 or 410 status codes for them. This won’t change the content of your pages; it simply stops crawl budget from being wasted on unneeded JavaScript.
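To illustrate, here is a hypothetical sketch of serving 410 Gone for retired JavaScript bundles so crawlers learn to stop requesting them; the file names, route, and Flask setup are all assumptions, and this should only ever be applied to scripts your live pages no longer reference:

```python
# Minimal sketch: answer 410 Gone for JavaScript files that have been retired,
# so crawlers learn to stop requesting them and crawl budget isn't wasted.
# File names and the route are hypothetical; only do this for scripts that
# your live pages no longer reference.
from flask import Flask, abort

app = Flask(__name__)

RETIRED_BUNDLES = {"legacy-app.js", "old-tracker.js"}  # hypothetical files

@app.route("/assets/js/<filename>")
def serve_js(filename):
    if filename in RETIRED_BUNDLES:
        abort(410)  # Gone: the file has been permanently removed
    return app.send_static_file(f"js/{filename}")

if __name__ == "__main__":
    app.run()
```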

Summary

So, crawl budget and web indexing are matters of concern mainly when your website is very large or brand new. Sites with fewer pages don’t need to worry about getting indexed by search engines. You can also preserve your crawl budget by keeping it from being wasted on unneeded JavaScript, and spend it where it matters: on the pages that are genuinely useful and worth indexing.
