In the past, I’ve looked at implementing a “Natural Sort order” in both JavaScript and ColdFusion. However, my ColdFusion-based approach has always created a new array as part of its algorithm. Most of the time this doesn’t matter; however, yesterday at InVision, I ran into a situation in which I needed to conditionally perform either a numeric sort or an in-place natural sort on an Array. As such, I wanted to quickly revisit the idea of a natural sort, this time using an in-place sorting algorithm in Lucee CFML 5.3.6.61.
To quickly recap what a “Natural sort order” is, it’s a technique in which the numbers embedded within a string are treated “atomically”. Meaning, each string of numeric characters is treated as a single numeric value. That is, each number is compared “as a number” and not as a string of “characters”. This approach to sorting is more in alignment with how humans think about sorting.
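To make the difference concrete, here is a minimal sketch that compares the same two values as plain text and then as zero-padded text. The hand-padded strings and the variable names are assumptions made for illustration; they are not part of the demo below.

```cfml
<cfscript>
	// Plain text comparison: the character "5" is compared against the character
	// "1", so "Client 5 Data" is ordered AFTER "Client 100 Data".
	textResult = compareNoCase( "Client 5 Data", "Client 100 Data" );

	// Natural-style comparison, hand-padded for illustration: once both numbers
	// are the same width, the text comparison agrees with the numeric order
	// (5 comes before 100).
	paddedResult = compareNoCase( "Client 005 Data", "Client 100 Data" );

	echo( "Text comparison: #textResult#<br />" ); // positive value
	echo( "Padded comparison: #paddedResult#" ); // negative value
</cfscript>
```

A positive result means the first value sorts after the second; a negative result means it sorts before.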
In my previous ColdFusion algorithm, I implemented a natural sort by mapping one Array onto another intermediary Array; sorting the intermediary Array; and then, mapping the intermediary Array back onto a final, sorted Array. Today’s algorithm uses the same basic constructs; however, rather than creating an intermediary array, I’m creating a look-up struct / index of “normalized values”. Then, I’m performing an in-place sort that compares said normalized values.
To demonstrate, I’ve created an Array of values that will sort differently when using a native sort vs. a “natural sort”. I then output the results of both sorting approaches:
<cfscript>
// To demonstrate the two sorting approaches, we're going to embed numbers within the
// following values that will sort differently using an Alphabetical sort (where the
// numbers are compared as Strings) and a Natural sort (where the numbers are
// compared as numbers).
reports = [
"Client 5 Data",
"Client 100 Data",
"Client 20 Data",
"Client 70 Data (22)",
"Client 70 Data (104)",
"Client 412 Data"
];
dump( label = "Native Sort", var = sortReports( reports ) );
echo( "<br />" );
dump( label = "Natural Sort", var = naturalSortReports( reports ) );
// ------------------------------------------------------------------------------- //
// ------------------------------------------------------------------------------- //
/**
* I perform an IN PLACE sort of the given collection using a text-sort.
*
* @collection I am the collection being sorted.
*/
public array function sortReports( required array collection ) {
return( collection.sort( "textNoCase" ) );
}
/**
* I perform an IN PLACE NATURAL sort of the given collection in which the embedded
* numbers are compared as numbers, not just strings of numeric characters.
*
* @collection I am the collection being sorted.
*/
public array function naturalSortReports( required array collection ) {
// The trick to performing a "natural sort" in which numbers are treated "as
// numbers", is that we're actually going to perform a "text sort"; but, we're
// going to do so with numbers that have been NORMALIZED IN LENGTH (which is what
// makes it safe to sort based on the character data). To do this, we need to
// create an index (look-up hash) that maps our collection values onto the
// normalized values.
var naturalValueIndex = {};
for ( var item in collection ) {
naturalValueIndex[ item ] = normalizeEmbeddedNumbers( item );
}
// Now that we have our value -> normalized-value mapping, we can perform the
// IN-PLACE SORT with an operator that compares the mapped values.
collection.sort(
( a, b ) => {
var aValue = naturalValueIndex[ a ];
var bValue = naturalValueIndex[ b ];
return( aValue.compareNoCase( bValue ) );
}
);
return( collection );
}
/**
* I attempt to normalize the numbers embedded within the given value, making them all
* 12-digits long. Any embedded number that is less than 12-digits long will be front-
* padded with zeros:
*
* foo 3 bar -> becomes -> foo 000000000003 bar
* foo 3.3 bar -> becomes -> foo 000000000003.000000000003 bar
*
* @value I am the value being normalized.
*/
public string function normalizeEmbeddedNumbers( required string value ) {
var normalizedValue = value
// STEP 1: Split the value up into Numeric and Non-Numeric tokens.
.reMatch( "\d+|\D+" )
// STEP 2: Map the tokens onto normalized tokens. This means that any token
// that is numeric (a string of numeric characters) will be formatted to be
// 12-digits long (front-padded with zeros).
.map(
( token ) => {
if ( isNumeric( token ) ) {
return( numberFormat( token, "000000000000" ) );
}
return( token );
}
)
// STEP 3: Join the normalized tokens back together into a single value.
.toList( "" )
;
return( normalizedValue );
}
</cfscript>
As you can see, the first step of my naturalSortReports() function is to create a mapping, naturalValueIndex, of the original values onto “normalized values”. The normalized values are Strings in which the embedded numbers have all been front-padded with zeros to be 12-digits long. Then, I’m simply using the naturalValueIndex to look up the normalized values within the comparison operator of my in-place .sort() operation.
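As a rough sketch of what that look-up struct holds, the snippet below builds the index for two of the sample values. The padNumbers() function is a hypothetical, condensed stand-in for normalizeEmbeddedNumbers(), and the lookup variable name is likewise an assumption made for this illustration.

```cfml
<cfscript>
	// Hypothetical, condensed stand-in for normalizeEmbeddedNumbers(): it splits
	// the value into numeric and non-numeric tokens, front-pads each numeric
	// token to 12 digits, and joins the tokens back together.
	function padNumbers( required string value ) {
		return(
			value
				.reMatch( "\d+|\D+" )
				.map(
					( token ) => {
						if ( isNumeric( token ) ) {
							return( numberFormat( token, "000000000000" ) );
						}
						return( token );
					}
				)
				.toList( "" )
		);
	}

	// Build the value -> normalized-value index for two of the sample reports.
	lookup = {};
	for ( item in [ "Client 70 Data (22)", "Client 70 Data (104)" ] ) {
		lookup[ item ] = padNumbers( item );
	}

	// "Client 70 Data (22)"  maps to "Client 000000000070 Data (000000000022)"
	// "Client 70 Data (104)" maps to "Client 000000000070 Data (000000000104)"
	dump( lookup );
</cfscript>
```

Because every embedded number now has the same width, a plain text comparison of the mapped values places "(22)" before "(104)", which is the order a reader would expect.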
Now, if I run the full ColdFusion demo above in Lucee, I get the following output:
#coldfusion #arrays #in-place