OpenRefine is a powerful tool that helps you clean messy data without writing a single line of code. Previously known as Google Refine, it works much like a spreadsheet application such as Excel, but is built for complex data transformations. Its main advantages are that it is completely free and open source, and that it runs locally, so your data never leaves your computer.

The purpose of this article is to explore and experiment with OpenRefine using a sample dataset. So, let’s begin.

When is this tool most useful?

OpenRefine is most effective in data migration projects. It works best on datasets whose columns (addresses, names, and so on) need to be converted to a standard format before the data is migrated and modelled. It is also very useful for columns that contain spelling errors, dates written in different formats, stray white space, or special characters. These columns can be cleaned up easily without writing functions to do so.
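For comparison, here is a rough sketch of the kind of hand-written cleanup code that a tool like OpenRefine lets you avoid. It is plain Python and not part of OpenRefine; the function names, the set of characters that are kept, and the list of date formats are all assumptions chosen for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical list of date layouts we expect to find in a messy column.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y"]


def clean_text(value: str) -> str:
    """Drop special characters, collapse repeated white space, and trim."""
    # Keep word characters, white space, and a few punctuation marks (an assumption).
    value = re.sub(r"[^\w\s,.-]", "", value)
    value = re.sub(r"\s+", " ", value)
    return value.strip()


def normalize_date(value: str) -> Optional[str]:
    """Return the date in ISO format, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None
```

Even this small sketch has to decide which characters to keep and which date layouts to accept, and an ambiguous value such as 03/02/2021 is silently read as day first. Decisions like these are what OpenRefine lets you make interactively instead of in code.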


OpenRefine Tutorial: A Tool for Data Preprocessing Without Code
