Emoji Challenge

Building Telegram bots is fun. Choosing Java to build a bot that uses emojis, however, put me in a tricky situation.

Java strings are UTF-16, so any emoji code point above U+FFFF must be converted to a surrogate sequence for Java code to process it correctly; otherwise the character will not render properly.
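To make the surrogate idea concrete, here is a minimal sketch that uses only the standard `Character` API. The class name `SurrogateDemo` and the helper `utf16Units` are made up for illustration and are not from the original project.

```java
class SurrogateDemo {

    // Hypothetical helper: lists the UTF-16 code units of a code point as hex,
    // separated by spaces. Supplementary code points yield two units
    // (high surrogate, low surrogate); BMP code points yield one.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int grinningFace = 0x1F600; // U+1F600, above U+FFFF

        System.out.println(Character.isSupplementaryCodePoint(grinningFace)); // true
        System.out.println(utf16Units(grinningFace)); // D83D DE00

        // The resulting String holds two chars but only one code point.
        String emoji = new String(Character.toChars(grinningFace));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```

A code point inside the Basic Multilingual Plane, such as U+2764, stays a single `char`, which is why only part of the emoji list needs surrogate pairs.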

Java needs a surrogate pair for each such code point, which is a bit daunting to start with, and it is even harder to keep the pairs in sync with the list of emojis as new ones get created. This is best automated, so that the code can be adapted easily when things change.

As of this writing (Sep 27, 2020), there are 1816 code points. New ones are created and tracked here.

This article demonstrates a solution that applies the ETL (extract-transform-load) design pattern to generate partial Java code from an HTML page!
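As a rough idea of what "generating partial Java code" could look like, the sketch below turns one name and code point listing into a constant declaration. The class `EmojiConstantGenerator`, its method names, and the shape of the generated line are all assumptions made for illustration; the original generator may produce something different.

```java
import java.util.Locale;

class EmojiConstantGenerator {

    // Assumed naming rule: upper-case the name and collapse every run of
    // other characters into a single underscore.
    static String toIdentifier(String name) {
        return name.trim().toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]+", "_");
    }

    // Converts a listing such as "U+1F600" (or several space-separated
    // code points) into the escaped UTF-16 text of a Java string literal.
    static String toLiteral(String codes) {
        StringBuilder sb = new StringBuilder();
        for (String token : codes.trim().split("\\s+")) {
            int codePoint = Integer.parseInt(token.substring(2), 16); // drop "U+"
            for (char unit : Character.toChars(codePoint)) {
                sb.append(String.format("\\u%04X", (int) unit));
            }
        }
        return sb.toString();
    }

    // Emits one line of Java source for the given entry.
    static String toConstant(String name, String codes) {
        return "public static final String " + toIdentifier(name)
                + " = \"" + toLiteral(codes) + "\";";
    }
}
```

Calling `EmojiConstantGenerator.toConstant("grinning face", "U+1F600")` returns the source line `public static final String GRINNING_FACE = "\uD83D\uDE00";`. This simple naming rule does not handle names that start with a digit, so a real generator would need an extra step for those.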

1. Analysis

Understand document structure

The unicode.org full emoji list page is rendered as shown below. I’m illustrating the elements required to create a predictable connection between the human-readable name and the emoji’s Unicode code point.

The page consists of an HTML table. The table is split into multiple sections (with table header elements). The table rows carry the relevant code point information, and the columns contain the specific values of interest.

Image for post

Model a class to store row entries

Let’s create a Java POJO, UnicodePointEntry, to extract the web page content into a structured format. This class provides a method, toEmoji(), that converts the stored Unicode surrogate pairs into a visually representable emoji.
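A minimal sketch of such a class might look like the following. Only the class name `UnicodePointEntry` and the method `toEmoji()` come from the text above; the fields, the constructor, and the choice to store the code points in the `U+XXXX` form are assumptions made for illustration.

```java
class UnicodePointEntry {

    private final String name;  // human-readable name, e.g. "grinning face"
    private final String codes; // assumed format: "U+1F600" or "U+1F468 U+200D U+1F4BB"

    UnicodePointEntry(String name, String codes) {
        this.name = name;
        this.codes = codes;
    }

    String getName() {
        return name;
    }

    String getCodes() {
        return codes;
    }

    // Builds the displayable emoji. appendCodePoint writes a surrogate pair
    // for each code point above U+FFFF and a single char otherwise.
    String toEmoji() {
        StringBuilder sb = new StringBuilder();
        for (String token : codes.trim().split("\\s+")) {
            int codePoint = Integer.parseInt(token.substring(2), 16); // drop "U+"
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }
}
```

For example, `new UnicodePointEntry("grinning face", "U+1F600").toEmoji()` yields a two-char string containing a single code point, which a Unicode-aware client displays as one emoji.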
