Is compliance with the W3C HTML standard a ranking factor?
HTML compliance
In fact, nobody ever said that a page must be coded in HTML to be indexed by search engines, at least not by the most notable of them.
So why should a page that does conform to this standard be ranked better for that reason alone?
Plain text pages are indexed just as HTML pages are. This can be verified by placing unique content in a text file and including that file in the sitemap.
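The check described above can be sketched in Python. This is only an illustration: the function name build_sitemap and the example.com URLs are hypothetical, and the only fixed element is the namespace of the sitemaps.org protocol. The text file itself would contain a phrase that appears nowhere else, so that a later search for it shows whether the file was indexed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site: one HTML page and one plain text file.
sitemap_xml = build_sitemap([
    "https://www.example.com/index.html",
    "https://www.example.com/unique-test.txt",
])
print(sitemap_xml)
```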
On the other hand, if a page mixes text and tags so badly that one cannot be told from the other, this will certainly not help the useful content get indexed.
Google implements a parser that works like a browser displaying only the basic textual content. Checking pages with such a tool shows how crawlers see them.
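To get a rough idea of such a text-only view, here is a minimal sketch built on Python's standard html.parser module. It is not Google's parser, only an approximation for illustration. The names TextOnlyParser and text_view are invented for this example, and skipping the content of script and style elements is an assumption about what counts as useful text.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects text content and ignores markup, scripts, and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_view(html):
    """Return the textual content of an HTML string, tags removed."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><title>T</title><style>p{}</style></head>"
    "<body><h1>Hello</h1><p>World <b>bold</b></p>"
    "<script>var x=1;</script></body></html>"
)
print(text_view(page))
```

If the printed text is missing content that is visible in a normal browser, or contains stray markup, that is a hint the page may be hard for a crawler to interpret.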
HTML Optimization
Using HTML (or XHTML) is preferable for better indexing because it provides some tags that are crucial for ranking:
- <a> link tags, used to find linked pages. Although Google can recognize links elsewhere in the code, including in JavaScript, this tag carries anchor text, which is important.
- The heading tags <h1>, <h2>, etc., which convey the page structure and the topics of its sections.
- The meta tags that give instructions to robots.
- The alt attribute for images.
If these tags are missing, or if they are malformed and therefore not recognized, ranking suffers.
In conclusion, strict compliance with the standard is not essential, as long as the content can be correctly interpreted.
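A quick audit of the tags listed above can be sketched with Python's standard html.parser module. The class RankingTagAudit and the function audit are hypothetical names. Which checks matter is an assumption drawn from the list: links with their anchor text, headings, the robots meta tag, and images lacking an alt attribute. The sketch does not handle nested links or headings.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def _clean(parts):
    """Join text fragments and collapse whitespace."""
    return " ".join("".join(parts).split())


class RankingTagAudit(HTMLParser):
    """Records links, headings, robots directives, and images without alt."""

    def __init__(self):
        super().__init__()
        self.links = []              # (href, anchor text)
        self.headings = []           # (tag, heading text)
        self.robots = None           # content of <meta name="robots">
        self.images_without_alt = 0
        self._link = None            # [href, text parts] while inside <a>
        self._heading = None         # [tag, text parts] while inside <hN>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._link = [attrs["href"], []]
        elif tag in HEADINGS:
            self._heading = [tag, []]
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = attrs.get("content")
        elif tag == "img" and "alt" not in attrs:
            self.images_without_alt += 1

    def handle_data(self, data):
        if self._link is not None:
            self._link[1].append(data)
        if self._heading is not None:
            self._heading[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link is not None:
            self.links.append((self._link[0], _clean(self._link[1])))
            self._link = None
        elif self._heading is not None and tag == self._heading[0]:
            self.headings.append((tag, _clean(self._heading[1])))
            self._heading = None


def audit(html):
    """Parse an HTML string and return the filled-in audit object."""
    parser = RankingTagAudit()
    parser.feed(html)
    parser.close()
    return parser


sample = (
    '<head><meta name="robots" content="noindex, follow"></head>'
    '<body><h1>Main <em>topic</em></h1>'
    '<a href="/page.html">Linked page</a>'
    '<img src="a.png"><img src="b.png" alt="A chart"></body>'
)
report = audit(sample)
print(report.links, report.headings, report.robots, report.images_without_alt)
```

An empty list of links or headings, or a non-zero count of images without alt, points to the kind of missing markup described above.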