A discussion came up on Twitter about different content types and how Google determines what type of file it is dealing with. The discussion then moved to PDFs in the Google search results and how Google handles them.
John Mueller commented that Google automatically converts PDFs and similar document types into HTML format for indexing and ranking purposes.
FWIW we convert PDFs & other similar document types into HTML for indexing too, so theoretically there wouldn't be too much difference.
— John ☆.o(≧▽≦)o.☆ (@JohnMu) August 30, 2018
For those who are active in PDF SEO, this won’t be a surprise. Google has converted PDFs into HTML for quite some time, and has included a link to the HTML version directly in the search results. So while you may have what you think is an awesome PDF, your users might actually prefer the HTML version and click that link instead.
Do note that for larger files, Google will not convert the entire PDF document into HTML. So some important content within the PDF may simply not be indexed because of the PDF’s size.
There’s also a lot of evidence that while PDF files can rank very well, they tend to do so for queries where the searcher is looking for something like a PDF, such as a search for a manual.
If you have a large number of important PDFs indexed that you want to rank well, it is worth considering whether keeping that content within a PDF is the best solution for your users. For example, PDFs are hard to open and read on many mobile devices. PDFs are also often much larger than the corresponding HTML version of the page would be, which is a limitation on slower connections.
PDFs aren’t the only file type that Google converts to HTML for indexing. Google also does it for .doc files (Word documents), .xls files (spreadsheets) and other similar non-HTML content types.