Earlier this week, I took a look at using [ORDER BY](https://www.bennadel.com/blog/3846-you-can-use-order-by-and-limit-within-update-and-delete-statements-in-mysql-5-6-37.htm) and [LIMIT](https://www.bennadel.com/blog/3846-you-can-use-order-by-and-limit-within-update-and-delete-statements-in-mysql-5-6-37.htm) clauses within an [UPDATE](https://www.bennadel.com/blog/3846-you-can-use-order-by-and-limit-within-update-and-delete-statements-in-mysql-5-6-37.htm) statement in MySQL 5.6.37 in order to limit the scope of the data-mutation. In that post, I said that such a technique can be used to drive a multi-worker / multi-threaded migration in which each worker “claims” and then “processes” a subset of records within a “task table”. Since I find this technique to be relatively simple but also wickedly useful, I thought it would be fun to put together a migration demo in Lucee CFML 5.3.6.61.

In this demo, we’re going to be using a database table as a message queue of sorts. Typically, you don’t want to use a database table as a message queue for interprocess communication - you’d want to use a more specialized technology like Amazon SQS or RabbitMQ or the like. However, in this context, the database table in question is being pre-allocated to represent a finite set of work. As such, a database table is perfectly acceptable and a very pragmatic choice at that.

To set the stage, this “migration” requires us to download and reprocess a set of image files. I don’t actually care about the reprocessing itself - that’s incidental. As such, I’m just going to download the image and consider that to be a successful task execution. The real meat of the post is how we manage the downloading and the incremental traversal of the shared database table across multiple threads / workers.

To do this, I’m going to create and populate a MySQL table named, migration_task. Each record in this table represents a single, cohesive unit-of-work to be executed (in this case, downloading a given imageUrl). Each task / record can exist in one of three states:

  • open - Ready to be claimed.
  • processing - Claimed by one of the workers and is in mid-process.
  • complete - Successfully processed.

In reality, you might want additional statuses, such as “failure”, based on your particular migration needs. However, for this demo, these three are sufficient to illustrate my technique.

This table is going to be created and populated ahead of time. As such, it represents a finite set of work that needs to be consumed. And, the algorithm for consuming this table is thus:

  1. A worker claims a set of records, changing the state from open to processing. This is where our UPDATE + ORDER BY + LIMIT SQL statement comes into play.

  2. Said worker iterates over the claimed records in-turn and processes them each individually.

  3. Each record is marked as complete if it was processed successfully; or, is “reset” (ie, put back into an open state) upon failure.

In the following MySQL table, the claimedBy value is the unique ID of the worker / thread that is processing a given record:

CREATE TABLE `migration_task` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`status` varchar(20) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
	`iteration` tinyint(3) unsigned NOT NULL,
	`claimedBy` char(35) DEFAULT NULL,
	`claimedAt` datetime DEFAULT NULL,
	`completedAt` datetime DEFAULT NULL,
	`imageUrl` varchar(500) NOT NULL,
	PRIMARY KEY (`id`),
	KEY `byClaim` (`claimedBy`,`status`),
	KEY `byStatus` (`status`,`completedAt`,`claimedAt`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

To populate this database table, I literally ran a for-loop from 1  to 50,000  and inserted a quasi-random remote imageUrl . Again, the goal of this post isn’t to showcase image processing - the goal here is to look at the traversal and processing of this table.

#coldfusion #sql

Using UPDATE + ORDER BY + LIMIT To Drive Multi-Worker Migrations In Lucee CFML 5.3.6.61
1.15 GEEK