Who said the world of SEO was boring? Learn the details of the latest data leak from the Google algorithm and its effects on organic web positioning.
What happened
Despite its high levels of security, data leaks at Google are nothing new. However, few so far this century have been as large as the one announced on Monday, May 27.
Towards the end of March of this year, documents from the Google Search API content warehouse, intended for internal use, were published on GitHub. Between the end of March and the end of May, this documentation also made its way to the Hexdocs repository and circulated through other sources. And then the pot was uncovered.
On May 5, Rand Fishkin received an email from an anonymous source who later identified himself as Erfan Azimi. In his email, Azimi claimed to have access to internal documents from Google’s search service.
Although the leak from the Google repository was reportedly resolved on May 7, the documents remained on publicly accessible pages. Therefore, on May 24, Fishkin and Azimi held a video call in which Azimi showed him the documentation and explained his motivations.
How did it all unfold? Fishkin is a self-proclaimed retired SEO who, in the early 2010s, was the king of SEO at Moz, with “Whiteboard Friday” videos that enchanted all of us wannabe SEOs who admired his style and knowledge. Today he is the founder and CEO of the audience research platform SparkToro, and he confirmed with former Google employees that these were legitimate documents.
His next step was to turn to Mike King (technical SEO expert, founder and CEO of iPullRank) to decipher the documentation. Once King had completed a first analysis of the documents and shared the results with Fishkin, both published articles on the subject… And the rest is history.
What Google documentation was leaked?
While the documents are dated August 2023, it is reasonable to believe that they were still valid as of March of this year. They form a kind of internal “guide” for Googlers (Google employees) working on Search.
Through code references and brief descriptions, the documentation explains Google Search attributes and API modules, that is, the interface that mediates between two systems so that they can share information and functionality. The information covers 2,596 Google Search API modules, specified in types, functions, and 14,014 attributes (i.e., characteristics used for classification) that the interface considers when collecting data.
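To make that structure easier to picture, here is a minimal sketch, in Python, of how “modules described through named attributes” can be modelled. This is not Google’s actual format: the module name is invented, and only the attribute names are taken from the leak coverage discussed later in this article.

```python
# Purely illustrative model of documentation organized as modules and attributes.
# It only mirrors the idea of 2,596 modules described through ~14,014 attributes;
# the module name below is hypothetical, the attribute names appear in the leak coverage.
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str          # e.g. "titlematchScore"
    description: str   # short human-readable description

@dataclass
class Module:
    name: str
    attributes: list[Attribute] = field(default_factory=list)

example_module = Module(
    name="ExampleSearchModule",  # hypothetical name, for illustration only
    attributes=[
        Attribute("titlematchScore", "How well the page title matches the query."),
        Attribute("smallPersonalSite", "Flags pages that look like small personal sites."),
    ],
)

for attr in example_module.attributes:
    print(f"{example_module.name}.{attr.name}: {attr.description}")
```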
Thus, with these documents it is possible to know what data Google collects about its users’ searches and about their behavior before, during, and after them. Apparently, information is recorded about the links and content of a page and about user interactions with them. It is from this data and Google’s algorithms that a page ends up better or worse positioned in the SERPs.
It seems, therefore, that this leak is a gift from heaven for SEO specialists. However, many of the entries refer to internal Google pages, which can only be accessed with a company credential. This implies that King’s analysis and the conclusions he and Fishkin drew should not be taken as definitive or complete.
Both experts, in fact, recognized a series of limitations in their work with the documentation. Firstly, although the leaked documents indicate what data is recorded, they do not single out which elements actually influence the positioning of a page; it is known, in fact, that some attributes are obsolete and no longer used to order the SERPs. The documentation also does not allow us to identify beyond doubt which information is used or in what ways. Nor is it possible to determine which characteristics carry more weight than others when positioning a web page.
Even so, it should be noted that, to analyze the documents, they drew on their extensive experience as SEOs and on the practices of other professionals in the field. They also considered ranking systems similar to those Google has explained publicly. In short: without being able to make precise statements, they reached more than probable conclusions, useful for getting a clearer idea of how to apply SEO to achieve the best results.
The crux of the matter
Now, why is this matter serious for those of us who do SEO? Or rather, why is the leaked data so controversial? On the one hand, because Google executives have more than once denied that certain metrics influence web positioning. Likewise, they harshly criticized SEO professionals (Rand Fishkin among them) who claimed that Google ordered the SERPs using data from Chrome, the time a user spends on a page, sandboxes, and other signals.
On the other hand, the documents reveal that not all SEO practices really help when it comes to positioning a page. The documentation presents a positioning system where writing for search engines matters less than addressing users… Unless, for example, there is a recognized brand behind it, which will attract clicks like flies to honey.
In other words: when it comes to page ranking, Google did not share even half of what happens inside its search engine. And it was not only the lack of information that was problematic, but also the way Google handled the situation. Beyond withholding information, it sent us in circles, leading new SEOs down wrong paths and losing them in the great labyrinth of the cybernetic minotaur. This complexity and lack of transparency made the task of optimizing pages for the search engine even more difficult, leaving many professionals frustrated and disoriented.
Leaked documents: Key points for SEO
Now let’s get to what should be your biggest concern: what, of everything that was leaked, an SEO expert actually needs to know. Although we cannot know it all, in their articles Fishkin and King unpacked (in very technical terms) several aspects of the Google API that affect SEO. Some of the most important are:
- Navboost. One of the search engine’s internal algorithms. Together with Glue (another algorithm), it orders SERP entries according to the number and types of clicks they receive over a window of up to 13 months. The more and better clicks a page receives, the higher it will rank, since this signals to the search engine that the link is trustworthy.
- Chrome data. Google Search collects information about user behavior in Chrome. It uses this data, among other things, to decide which sitelinks to present in the SERPs. Which URLs will appear? The ones users interact with the most, that is, where they click more often and stay longer.
- Quality raters. The Google Search documentation confirms that the search engine uses information provided by its quality raters through the EWOK platform to determine search results.
- Sandbox. Although Google denied it, it had long been suspected that the search engine used “sandboxes” to assess the reliability of new domains. Thus, no matter how good a page’s SEO is, while it passes through this filter it will be as if it did not exist in searches. It’s a matter of patience, or at least that’s what we assume.
- Twiddlers. These are re-ranking “functions” applied after Google Search’s main algorithms, allowing the entries that will be displayed in the SERPs to be fine-tuned (a conceptual sketch follows this list).
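To make the idea of “functions applied after the main algorithms” more concrete, here is a minimal, purely hypothetical sketch of a re-ranking pass in Python. The leak does not reveal how twiddlers are implemented; the function names, signals, and boost values below are assumptions for illustration only.

```python
# Hypothetical illustration of a post-ranking "twiddler" pass.
# Nothing here comes from Google's code; it only shows the concept of
# adjusting an already ranked result list with small, composable rules.
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    base_score: float      # score from the "main" ranking stage
    good_clicks: int       # illustrative engagement signal
    is_fresh: bool         # illustrative freshness signal

def click_twiddler(r: Result) -> float:
    # Assumption: reward results with a history of satisfying clicks.
    return 0.1 if r.good_clicks > 100 else 0.0

def freshness_twiddler(r: Result) -> float:
    # Assumption: small boost for recently updated content.
    return 0.05 if r.is_fresh else 0.0

TWIDDLERS = [click_twiddler, freshness_twiddler]

def rerank(results: list[Result]) -> list[Result]:
    # Apply each twiddler on top of the main score, then re-sort the SERP.
    return sorted(
        results,
        key=lambda r: r.base_score + sum(t(r) for t in TWIDDLERS),
        reverse=True,
    )

if __name__ == "__main__":
    serp = [
        Result("https://example.com/a", 0.80, good_clicks=250, is_fresh=False),
        Result("https://example.com/b", 0.82, good_clicks=10, is_fresh=False),
    ]
    for r in rerank(serp):
        print(r.url)  # page "a" overtakes "b" thanks to its click history
```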
The previous points cover some of the tools Google Search uses to filter the pages it will position in the SERPs. But King and Fishkin do not stop there: they also mention certain “internal” aspects that affect the ranking of pages. These are:
Brand Popularity
As the leaked documentation suggests, even the best SEO can do little against an established brand that users will inevitably click on. It was also discovered that Google Search flags pages that are small personal sites. Given Google’s tendency to prioritize large companies, this is not encouraging for small-scale businesses.
Titles
Analyzing the documents, King found a mention of a titlematchScore attribute. As far as SEO is concerned, this means that, when generating the SERPs, Google looks for the title of a page to match the search performed.
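As a rough illustration only: the leak names the attribute but not how it is calculated, so the token-overlap scoring below is purely an assumption meant to show the kind of comparison such a signal could make.

```python
# Hypothetical sketch of a title/query match score.
# "titlematchScore" is an attribute name from the leak; this formula is NOT
# Google's, it is only a simple token-overlap illustration.
def title_match_score(query: str, title: str) -> float:
    query_terms = set(query.lower().split())
    title_terms = set(title.lower().split())
    if not query_terms:
        return 0.0
    # Share of query terms that also appear in the title.
    return len(query_terms & title_terms) / len(query_terms)

print(title_match_score("running shoes for women", "Best Running Shoes for Women 2024"))  # 1.0
print(title_match_score("running shoes for women", "Our new sneaker collection"))         # 0.0
```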
Bold and font size
As King deciphered, through the avgTermWeight and fontsize attributes, Google Search records terms set in bold or in larger fonts. Consequently, it is reasonable to assume that these typographic emphases can help achieve a good level of web positioning.
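Again purely as an illustration: the attribute names come from the leak, but the averaging below is an assumption about what a “term weight based on typography” could mean in practice.

```python
# Hypothetical sketch of an average term weight derived from typography.
# The per-style weights are invented; only the idea (bold or large text
# counts for more) reflects what the leaked attribute names suggest.
TERMS = [
    {"text": "SEO", "bold": True, "font_size": 24},    # heading term
    {"text": "leak", "bold": False, "font_size": 12},
    {"text": "Google", "bold": True, "font_size": 12},
]

def term_weight(term: dict) -> float:
    weight = term["font_size"] / 12   # larger fonts weigh more
    if term["bold"]:
        weight *= 1.5                 # assumed bonus for bold text
    return weight

avg_term_weight = sum(term_weight(t) for t in TERMS) / len(TERMS)
print(round(avg_term_weight, 2))  # 1.83 with the sample values above
```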
Dates
The leaked documents include the bylineDate, syntacticDate, and semanticDate attributes. Respectively, these record the date displayed on the page, the date extracted from the title or URL, and the date inferred from the page’s content. From this, King infers that, for a page’s ranking in the SERPs, it is important that these different dates coincide.
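A minimal sketch of that takeaway, assuming (and it is only an assumption) that consistency means the three dates sit close together: the attribute names are from the leak, the check itself is illustrative.

```python
# Hypothetical date-consistency check. The three attribute names come from
# the leaked documentation; treating "all dates close together" as consistent
# is an assumption used here only to illustrate King's advice.
from datetime import date

byline_date = date(2024, 5, 27)     # date displayed on the page
syntactic_date = date(2024, 5, 27)  # date extracted from the URL or title
semantic_date = date(2024, 5, 20)   # date inferred from the content

dates = [byline_date, syntactic_date, semantic_date]
spread_days = (max(dates) - min(dates)).days

# Illustrative threshold: flag the page if the dates disagree by more than a week.
print("dates consistent" if spread_days <= 7 else "dates inconsistent")
```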
Finally, some factors that negatively influence the positioning of a page in search results are:
- Potential user dissatisfaction, established by the page receiving a low number of clicks.
- That the page is “global” or that it is not associated with a specific location.
- UX or navigation issues.
- The inclusion of pornographic elements.
- The use of exact-match domains (e.g. “www.shoes-for-women.com”).
What can we say? The user is the key
Objectively, and broadly speaking, the most serious aspect of this matter is that the leaked documentation openly contradicts many public statements made by Google executives. To make matters worse, some of those statements were used to discredit the professionals who had put forward such claims about Google’s internal mechanisms. Of course, since he who laughs last laughs best, this data leak made it clear who was telling (or deducing) the truth.
On the other hand, the seriousness of the matter lies in the deception suffered by those of us who are dedicated to SEO. Certainly, the secrecy around many of the measures the search engine applies serves to disorient spammers and safeguard the quality of pages. However, the leaked documents show that, at times, the weight of a brand can push quality into the background. They also show that positioning is not as “organic” as one might suppose, and that it depends less on the skill of an SEO professional than on what Google counts as many or few clicks.
In any case, the leaked information confirms a trend that Google and other search engines have been following in recent years: producing for the user, not for a robot. Just look at the elements that contribute to positioning in the SERPs: user experience and navigability, typography that facilitates reading, quality content that matches the search carried out…
The focus is on what the user needs. And satisfactorily resolving that need is the most organic thing one can do to position oneself well today. After all, quality clicks are not given by Google (or maybe they are, but we cannot say for sure), but by those who browse the web.
None of this is new, nor does it imply a very big change for those who have been applying user-oriented SEO practices all along. In short: the Google Search API content warehouse data leak confirmed (among other things) that the user is the key. But we already knew that.
How to improve your SEO practices for Google?
Now, adapting to this paradigm from one day to the next can be a bit difficult, especially if you don’t work with a marketing agency that has a solid team of SEO experts who can advise you on the best ways to reach the user. At a general level, for now, there are some measures that (beyond what has already been mentioned) help achieve optimal positioning in the SERPs:
- First of all, it is essential to reinforce organic positioning and establish your brand using resources that go beyond SEO. Google Ads, newsletter campaigns, presence on networks, word of mouth… It doesn’t matter what medium you choose, the important thing is that it produces specific searches for your page, which lead to quality clicks.
- Second, according to Mike King, Google’s algorithms tend to consider “fresh” content to be quality content. Therefore, one way to improve your chances of achieving good organic positioning is to keep your pages updated. In addition, if you include external links, ideally these should be recent as well.
- A third way to improve your SEO practices is to design your pages with the user in mind. Identify the routes users will take to reach your pages and to move through them once inside; the information they will look for; the actions they will take… Clearly, this is easier said than done. Therefore, the fourth point to take into account can only be one:
- Are you lost? Not sure how to proceed? Then don’t be afraid to consult an agency specializing in positioning and web design, such as Enjoy Minder.
We have a large team of experts in SEO, SEM, web design, social networks and more, and we offer unique solutions tailored to what you need. More than ten years of experience in the field back our ability to help you achieve the search engine presence your projects need. Don’t let Google’s algorithms swallow your projects: surface with a couple of clicks and contact us.