Google recently announced that Googlebot fetches only the first 15 MB of an HTML file when it crawls a page. If a page's HTML is larger than 15 MB, the rest is not considered for indexing and so cannot affect the page's ranking.
This comes from a statement in the recently updated Google documentation page:
Any resources referenced in the HTML such as images, videos, CSS and JavaScript are fetched separately.
After the first 15 MB of the file, Googlebot stops crawling and only considers the first 15 MB of the file for indexing.
The file size limit is applied on the uncompressed data. (Google documentation)
John Mueller clarified on Twitter that the limit applies only to the HTML file itself. Images embedded in the page are not included in the calculation. In other words, an embedded image or other resource has no effect on the HTML size, even if it is larger than 15 MB.
“It’s specific to the HTML file itself, like it’s written.” (John Mueller)
“Embedded resources/content pulled in with IMG tags is not a part of the HTML file.” (John Mueller)
So what impact does this have on SEO?
To make sure Googlebot can reach our content, the important parts should be placed toward the top of the HTML, well within the first 15 MB.
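As a rough way to check this, the byte position of an important passage in the uncompressed HTML can be compared with the limit. The snippet below is only a sketch: the function name `within_limit`, the sample page, and the reading of 15 MB as 15 × 1024 × 1024 bytes are assumptions made for illustration, not something taken from the Google documentation.

```python
# Assumption: "15 MB" is interpreted as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def within_limit(html: bytes, marker: bytes, limit: int = LIMIT_BYTES) -> bool:
    """Return True if the first occurrence of `marker` ends within
    the first `limit` bytes of the uncompressed HTML."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found in HTML")
    return start + len(marker) <= limit


# Hypothetical page: 100 bytes of filler before the important heading.
page = b"<html><body>" + b"x" * 100 + b"<h1>Key content</h1></body></html>"

print(within_limit(page, b"<h1>Key content</h1>"))            # True
print(within_limit(page, b"<h1>Key content</h1>", limit=50))  # False
```

The second call uses an artificially small limit only to show what a failing check looks like; on a real page the default limit would be used.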
Keep in mind, though, that 15 MB is very large for a single web page. In general, the HTML of a web page or blog post is less than 1 MB. Anything larger than that will usually be very slow to load.
Many tools can be used to determine the size of a web page. One of them is Google PageSpeed Insights.
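The uncompressed HTML size can also be measured with a short script. The following is a minimal sketch in Python using only the standard library. The function names and the reading of 15 MB as 15 × 1024 × 1024 bytes are assumptions for illustration, and only gzip compression is handled here.

```python
import gzip
import urllib.request

# Assumption: "15 MB" is interpreted as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def uncompressed_size(body: bytes, content_encoding: str = "") -> int:
    """Return the size in bytes of an HTML body after undoing gzip, if any."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return len(body)


def fetch_html_size(url: str) -> int:
    """Download one HTML document and return its uncompressed size in bytes.
    Images, CSS and JavaScript referenced by the page are not fetched."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get("Content-Encoding", "")
        return uncompressed_size(response.read(), encoding)


def report(size: int) -> str:
    """Describe a size relative to the assumed limit."""
    percent = 100 * size / LIMIT_BYTES
    status = "over" if size > LIMIT_BYTES else "within"
    return f"{size} bytes, {percent:.2f}% of the limit ({status})"


# Offline demonstration: the gzip-compressed transfer is much smaller,
# but the check is made against the uncompressed size.
sample = b"<html><body>" + b"<p>hello</p>" * 1000 + b"</body></html>"
compressed = gzip.compress(sample)
print(len(compressed), "bytes compressed")
print(report(uncompressed_size(compressed, "gzip")))
```

To check a live page, call `report(fetch_html_size(url))` with the page's address. Because only the HTML document is downloaded, the result matches what the limit is said to cover.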
This 15 MB limit is not something Google has just introduced. It has actually been in place for a long time and was only recently added to the documentation, as John Mueller clarified on Twitter.
“This is not a change, it’s just not previously been officially documented…” (John Mueller)