The work of a search engine optimization (SEO) consultant revolves around one central theme: data.

Especially keyword data.

We collect it from a variety of third- and second-party sources, perhaps even via self-made tracking tools, to then start crunching the numbers and eventually delivering valuable insights to our bosses, clients, or prospects.

However, only running a few tools and employing some analytical magic is not going to cut it.

We also need to be thoughtful about how we interpret data from keyword tools and deal with any inaccuracies or inconsistencies.

Just like any software program, each keyword tool has a characteristic mechanism in place for collecting, aggregating, and manipulating data.

Similarly, tools’ workings affect how they handle queries and present the output keyword data.

An essential part of a marketer’s job function is to validate whether the data values stored for these keywords are represented in a consistent and unambiguous form.

Meaning, is the keyword data I am working with accurate?

The simple answer:


Simply comparing the data values that different tool providers report for the same set of keywords already reveals large inconsistencies, not only in the values themselves but also in whether and how the output data is presented.
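One way to make such a comparison concrete is to put the values for each keyword side by side and measure how far they diverge. The sketch below is only an illustration: the tool names (`tool_a`, `tool_b`, `tool_c`), the keywords, and all volumes are invented, and the spread metric (range divided by mean) is an assumption rather than a method used in this study.

```python
from statistics import mean

# Hypothetical monthly search volumes for the same keywords as reported by
# three unnamed tools. All numbers are invented for illustration only.
# None means the tool returned no value for that keyword.
volumes = {
    "running shoes": {"tool_a": 10000, "tool_b": 8000, "tool_c": 12000},
    "trail running socks": {"tool_a": 500, "tool_b": None, "tool_c": 300},
}


def relative_spread(values):
    """Return (max - min) / mean over the reported values.

    Returns None when fewer than two tools report a value, because a
    spread cannot be computed from a single data point.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return None
    return (max(present) - min(present)) / mean(present)


def compare(volumes_by_keyword):
    """Build a per-keyword report with the spread and the tools lacking data."""
    report = {}
    for keyword, by_tool in volumes_by_keyword.items():
        missing = sorted(tool for tool, v in by_tool.items() if v is None)
        report[keyword] = {
            "spread": relative_spread(by_tool.values()),
            "missing": missing,
        }
    return report


report = compare(volumes)
for keyword, row in report.items():
    print(keyword, row)
```

A large spread flags keywords where the tools disagree on the value, and the `missing` list captures the second kind of inconsistency: a tool that presents no data at all for a keyword.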

This study, by my company, OAK, attempts to find clarity by exploring data accuracy and reliability with respect to second- and third-party keyword tool data.

Specifically, this study examines the following topics:

  • Data collection: How do keyword tools collect their data?
  • Data handling: How do keyword tools manipulate data?
  • Data validation: Validating keyword data values.
  • Role as an SEO Consultant.

The primary purpose of this study is to grow awareness about the complexity surrounding keyword data values and tool providers’ data collecting and processing mechanisms.

Google Search Console

Let us begin at the beginning: Google Search Console.

It is a second-party tool from Google that collects behavioral data for a single domain or entity and, after manipulation, injects the data into the front-end interface.

The mere fact that Google collects and processes the data might make you wonder: how close to reality are the reported data values?

This question poses an immediate challenge: Search Console data is not 100% validatable.

Luckily, Google is, to a point, transparent and provides various explanations for why your data values do not reflect reality or add up as you could expect.

A few of them are:

  • To protect the privacy of the user, the click is sometimes not credited to the search term. Search Console, however, does register the click, causing discrepancies between the table and diagram data.
  • The same can apply to branded queries.
  • Clicks could come from bots.
  • In some cases, selecting certain filter combinations can also lead to differences between the diagram and table data.
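The gap between diagram and table data can at least be quantified, even if it cannot be explained click by click. The following is a minimal sketch that assumes you have noted the total clicks shown in the diagram and exported the per-query table rows. The variable names and all figures are hypothetical, and no Search Console API call is involved.

```python
# Hypothetical figures: the total from the performance diagram and the
# per-query rows from the table below it. Values are invented.
chart_total_clicks = 1200
table_rows = [
    {"query": "keyword tool", "clicks": 450},
    {"query": "seo data", "clicks": 300},
    {"query": "search volume", "clicks": 150},
]


def unattributed_clicks(chart_total, rows):
    """Return (table_total, gap, share) for clicks not credited to any query.

    gap   = clicks counted in the diagram but absent from the table rows
    share = gap as a fraction of the diagram total (0.0 if the total is 0)
    """
    table_total = sum(row["clicks"] for row in rows)
    gap = chart_total - table_total
    share = gap / chart_total if chart_total else 0.0
    return table_total, gap, share


table_total, gap, share = unattributed_clicks(chart_total_clicks, table_rows)
print(f"table: {table_total}, gap: {gap}, share: {share:.1%}")
```

Tracking this share over time shows how much of the click data is registered but not attributed to a search term. It does not reveal which of the reasons listed above is responsible.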

Unfortunately, only the G-giant has access to the exact data values, which means verifying the accuracy of Search Console data is a troublesome process.

The reliability of keyword data increases, however, with third-party tools.

These are tools like SEMrush, Ahrefs, and many others.

To find answers, this study explores the mechanics these keyword tools apply.

Unfortunately, the companies running these tools disclose little to no information about how they collect, aggregate, or manipulate their data.

It seems fair.

A chef doesn’t just give away her or his world-famous recipe. Hence we attempt to generate insights with the help of the following approaches:

  • Using and comparing the tools.
  • Inquiring at the customer service departments.
  • Reading the FAQ sections and utility pages.

1. Data Collection: How Do Keyword Tools Collect Their Data?

In general, there are five kinds of resources through which keyword tools accumulate their data:

Google Ads API / Keyword Planner

Keyword data is gathered directly from Google’s keyword database through the Google Ads API.

As is the case with Search Console, Google Ads first manipulates the data before injecting it into the database.

Clickstream Data by Aggregators & Data Brokers

Clickstream is nothing more than data derived from consumers’ online surfing behavior.

Aggregators gather this data in a variety of ways.

Large aggregators that were active until recently include, for example, Jumpshot and Hitwise.

Where do they get their data from?

  • Browser extensions and plugins
  • A homemade plugin or extension of the aggregator itself.
  • They pay external third-party browser plugins to share consumer data.
  • They pay internet service providers for access to the data in an “anonymized” data feed.

The aggregators then sell the data to keyword tools such as Ahrefs, SEMrush, and Moz, among others.
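To illustrate how surfing behavior can turn into keyword data, the sketch below extracts search queries from a handful of visited URLs and counts them. It is an assumption-laden illustration, not a description of how any aggregator or keyword tool actually works: the event format, the user IDs, and the URLs are invented, and only the `q` parameter of Google search URLs is considered.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical clickstream records: (anonymized user id, visited URL).
events = [
    ("u1", "https://www.google.com/search?q=running+shoes"),
    ("u2", "https://www.google.com/search?q=Running%20Shoes&hl=en"),
    ("u1", "https://example.com/products/shoes"),
    ("u3", "https://www.google.com/search?q=keyword+tools"),
]

# Simplification: only these two hostnames count as a Google search.
SEARCH_HOSTS = {"google.com", "www.google.com"}


def search_query(url):
    """Return the normalized search query in a URL, or None if there is none."""
    parsed = urlparse(url)
    if parsed.hostname in SEARCH_HOSTS and parsed.path == "/search":
        terms = parse_qs(parsed.query).get("q")
        if terms:
            # parse_qs already decodes "+" and "%20" into spaces.
            return terms[0].strip().lower()
    return None


def count_queries(records):
    """Count how often each search query appears in the clickstream sample."""
    counts = Counter()
    for _user, url in records:
        query = search_query(url)
        if query:
            counts[query] += 1
    return counts


query_counts = count_queries(events)
print(query_counts.most_common())
```

Counts like these describe only the users in the sample. Turning them into search volumes for a whole market requires some form of extrapolation, which is one more step where a tool's undisclosed processing can shape the final values.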
