How to use Hash Joins in MySQL 8?

How to use Hash Joins in MySQL 8?

In MySQL 8.0.18 there is a new feature called Hash Joins, and I wanted to see how it works. In this post, you'll understanding Hash Joins in MySQL 8. See when and how to use Hash Joins in MySQL 8.

In MySQL 8.0.18 there is a new feature called Hash Joins, and I wanted to see how it works, and in which situations it can help us. Here, you can find a nice detailed explanation about how it works under the hood.

The high-level basics are the following: if there is a join, it will create an in-memory hash table based on one of the tables and will read the other table row by row, calculate a hash, and do a lookup on the in-memory hash table.

Great, but Does This Give Us Any Performance Benefits?

First of all, this only works on fields that are not indexed, so that is an immediate table scan, and we usually do not recommend doing joins without indexes because it is slow. Here is where Hash Joins in MySQL can help because it will use an in-memory hash table instead of Nested Loop.

Let’s do some tests and see. First, I created the following tables:

CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

CREATE TABLE `t2` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;

I have inserted 131072 random rows into both tables.

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 131072   |
+----------+

First test – Hash Joins

Run a join based on c2, which is not indexed.

mysql> explain format=tree select count(*) from t1 join t2 on t1.c2 = t2.c2\G
*************************** 1\. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (t2.c2 = t1.c2) (cost=1728502115.04 rows=1728488704)
-> Table scan on t2 (cost=0.01 rows=131472)
-> Hash
-> Table scan on t1 (cost=13219.45 rows=131472)

1 row in set (0.00 sec)

We have to use explain format=tree to see if the Hash Join will be used or not. It is going to be a nested loop, which I think it is very misleading. I have already filed a bug report because of this, and in the ticket, you can see some comments from developers saying:

The solution is to stop using traditional EXPLAIN (it will eventually go away).

So, this is not going to be fixed in traditional explain and we should start using the new way.

Back to the query; we can see it is going to use Hash Join for this query, but how fast is it?

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (0.73 sec)

0.73s for a more than 17m rows join table. Looks promising.

Second Test – Non-Hash Joins

We can disable it with an optimizer switch or optimizer hint.

mysql> select /*+ NO_HASH_JOIN (t1,t2) */ count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (13 min 36.36 sec)

Now, the same query takes more than 13 minutes. That is a huge difference, and we can see a Hash Join helps a lot here.

Third Test – Joins Based on Indexes

Let’s create indexes and see how fast a join based on indexes is.

create index idx_c2 on t1(c2);
create index idx_c2 on t2(c2);

mysql> select count(*) from t1 join t2 on t1.c2 = t2.c2;
+----------+
| count(*) |
+----------+
| 17172231 |
+----------+
1 row in set (2.63 sec)
2.6s

Hash Join is even faster than the Index-based join in this case.

However, I was able to force the optimizer to use Hash Joins even if an index is available by using ignore index:

mysql> explain format=tree select count(*) from t1 ignore index (idx_c2) join t2 ignore index (idx_c2) on t1.c2 = t2.c2 where t1.c2=t2.c2\G
*************************** 1\. row ***************************
EXPLAIN: -> Aggregate: count(0)
-> Inner hash join (t2.c2 = t1.c2) (cost=1728502115.04 rows=17336898)
-> Table scan on t2 (cost=0.00 rows=131472)
-> Hash
-> Table scan on t1 (cost=13219.45 rows=131472)

1 row in set (0.00 sec)

I still think it would be nice if I can tell the optimizer with a hint to use Hash Joins even if an index is available. This way we do not have to ignore indexes on all the tables. I have created a feature request for this.

However, if you read my first bug report carefully, you can see a comment from a MySQL developer, which indicates this might not necessary:

BNL (Block Nested-Loop) will also go away entirely at some point, at which point this hint will be ignored.

That could mean they are planning to remove BNL joins in the future and maybe replace it with Hash join.

Limitations

We can see a Hash Join can be powerful, but there are limitations:

  • As I mentioned, it only works on columns that do not have indexes (or you have to ignore them).
  • It only works with equi-join conditions.
  • It does not work with LEFT or RIGHT JOIN.

I would like to see a status metric as well to monitor how many times Hash Join was used, and for this, I filled another feature request.

Conclusion

Hash Join seems like a very powerful new join option, and we should keep an eye on this because I would not be surprised if we get some other features in the future as well. In theory, it would be able to do Left and Right joins as well, and, as we can see, in the comments on the bug report, that Oracle has plans for it in the future.

MySQL databases web-development

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Best MySQL DigitalOcean Performance – ScaleGrid vs. DigitalOcean Managed Databases

Compare ScaleGrid MySQL vs. DigitalOcean Managed Databases - See which offers the best MySQL throughput, latency, and pricing on DigitalOcean across workloads.

Hire Web Developer

Looking for an attractive & user-friendly web developer? HourlyDeveloper.io, a leading web, and mobile app development company, offers web developers for hire through flexible engagement models. You can **[Hire Web...

Why Web Development is Important for your Business

With the rapid development in technology, the old ways to do business have changed completely. A lot more advanced and developed ways are ...

Important Reasons to Hire a Professional Web Development Company

    You name the business and I will tell you how web development can help you promote your business. If it is a startup or you seeking some...

Hire Dedicated eCommerce Web Developers | Top eCommerce Web Designers

Build your eCommerce project by hiring our expert eCommerce Website developers. Our Dedicated Web Designers develop powerful & robust website in a short span of time.