In the past, I’ve looked at implementing a “Natural Sort order” in both JavaScipt and ColdFusion. However, my ColdFusion-based approach has always created a new array as part of its algorithm. Most of the time this doesn’t matter; however, yesterday at InVision, I ran into a situation in which I needed to conditionally perform either a numeric sort or an in-place natural sort on an Array. As such, I wanted to quickly revisit the idea of a natural sort, this time using an in-place sorting algorithm in Lucee CFML 5.3.6.61.

To quickly recap what a “Natural sort order” is, it’s a technique in which the numbers embedded within a string are treated “atomically”. Meaning, each string of numeric characters is treated as a single numeric value. That is, each number is compared “as a number” and not as a string of “characters”. This approach to sorting is more in alignment with how humans think about sorting.

In my previous ColdFusion algorithm, I implemented a natural sort by mapping one Array onto another intermediary Array; sorting the intermediary Array; and then, mapping the intermediary Array back onto a final, sorted Array. Today’s algorithm uses the same basic constructs; however, rather than creating an intermediary array, I’m creating a look-up struct / index of “normalized value”. Then, I’m performing an in-place sort on said normalized values.

To demonstrate, I’ve created an Array of values that will sort differently when using a native sort vs. a “natural sort”. I then output the results of both sorting approaches:

<cfscript>

		// To demonstrate the two sorting approaches, we're going to embed numbers within the
		// following values that will sort differently using an Alphabetical sort (where the
		// numbers are compared as Strings) and a Natural sort (where the numbers are
		// compared as numbers).
		reports = [
			"Client 5 Data",
			"Client 100 Data",
			"Client 20 Data",
			"Client 70 Data (22)",
			"Client 70 Data (104)",
			"Client 412 Data"
		];

		dump( label = "Native Sort", var = sortReports( reports ) );
		echo( "<br />" );
		dump( label = "Natural Sort", var = naturalSortReports( reports ) );

		// ------------------------------------------------------------------------------- //
		// ------------------------------------------------------------------------------- //

		/**
		* I perform an IN PLACE sort of the given collection using a text-sort.
		* 
		* @collection I am the collection being sorted.
		*/
		public array function sortReports( required array collection ) {

			return( collection.sort( "textNoCase" ) );

		}

		/**
		* I perform an IN PLACE NATURAL sort of the given collection in which the embedded
		* numbers are compared as numbers, not just strings of numeric characters.
		* 
		* @collection I am the collection being sorted.
		*/
		public array function naturalSortReports( required array collection ) {

			// The trick to performing a "natural sort" in which numbers are treated "as
			// numbers", is that we're actually going to perform a "text sort"; but, we're
			// going to do so with numbers that have been NORMALIZED IN LENGTH (which is what
			// makes it safe to sort based on a the character data). To do this, we need to
			// create an index (look-up hash) that maps our collection values onto the
			// normalized values.
			var natraulValueIndex = {};

			for ( var item in collection ) {

				naturalValueIndex[ item ] = normalizeEmbeddedNumbers( item );

			}

			// Now that we have our value -> normalized-value mapping, we can perform the
			// IN-PLACE SORT with an operator that compares the mapped values.
			collection.sort(
				( a, b ) => {

					var aValue = naturalValueIndex[ a ];
					var bValue = naturalValueIndex[ b ];

					return( aValue.compareNoCase( bValue ) );

				}
			);

			return( collection );

		}

		/**
		* I attempt to normalize the numbers embedded within the given value, making them all
		* 12-digits long. Any embedded number that is less than 12-digits long will be front-
		* padded with zeros:
		* 
		* foo 3 bar -> becomes -> foo 000000000003 bar
		* foo 3.3 bar -> becomes -> foo 000000000003.000000000003 bar
		* 
		* @value I am the value being normalized.
		*/
		public string function normalizeEmbeddedNumbers( required string value ) {

			var normalizedValue = value
				// STEP 1: Split the value up into Numeric and Non-Numeric tokens.
				.reMatch( "\d+|\D+" )

				// STEP 2: Map the tokens onto normalized tokens. This means that any token
				// that is numeric (a string of numeric characters) will be formatted to be
				// 12-digits long (front-padded with zeros).
				.map(
					( token ) => {

						if ( isNumeric( token ) ) {

							return( numberFormat( token, "000000000000" ) );

						}

						return( token );

					}
				)

				// STEP 3: Join the normalized tokens back together into a single value.
				.toList( "" )
			;

			return( normalizedValue );

		}

	</cfscript>

As you can see, the first step of my naturalSortReports() function is create a mapping - natraulValueIndex - of the inherent values onto “normalized values”. The normalized values are Strings in which the embedded numbers have all been front-padded with zeros to be 12-digits long. Then, I’m simply using the natraulValueIndex to find the normalized values within my .sort() in-place sort operation.

Now, if I run the following ColdFusion code in Lucee, I get the following output:

Two arrays, one sorted natively, one using an in-place natural sort in Lucee CFML.

#coldfusion #arrays #in-place

Performing An In-Place Natural Sort On An Alpha-Numeric Array In Lucee CFML 5.3.6.61
1.20 GEEK