Open-Sourcing Datanymizer: In-flight Template-Driven Data Anonymization


Datanymizer is an open-source, GDPR-compliant, privacy-preserving data anonymization tool that is flexible about how the anonymization takes place.

Production systems often need to store sensitive data, including personally identifiable information (PII). Developers, in turn, often need their test systems to hold data that is as close to production data as is reasonably possible. This has always been best practice, but legal data protection regimes such as HIPAA, HITECH, CPRA, and GDPR make it even more important to ensure that personal data remains only where it is strictly needed and is properly masked or anonymized whenever it is transferred elsewhere.

There are a number of ways to bridge this gap. One is to design a strict separation between database tables that hold PII and those that don't, so that the PII tables can be skipped on export and replaced with synthetic data on the development systems. This approach can work, but it relies on the system adhering to that design pattern and on the synthetic data staying close enough to its production equivalents not to cause problems.

An alternative is to generate a special kind of "cleansed" dump on the production system, with PII already masked or replaced with synthetic data, ready for developers to import. This keeps the risk of sensitive data ever leaving the production environment low.

This is the approach Datanymizer takes.
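
To make the idea of anonymizing "in flight" concrete, here is a minimal sketch in Rust. It assumes rows arrive as tab-separated text lines and that each column may have a rule mapping the original value to a replacement. The `Rule` type, `anonymize_stream`, and `mask_all` are hypothetical names for illustration, not Datanymizer's API. The point it shows is that each row is rewritten before it is written out, so the output never contains the original values of masked columns.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, Write};

/// Hypothetical per-column rule: takes the original value, returns a replacement.
type Rule = fn(&str) -> String;

/// Streams tab-separated rows from `input` to `output`, rewriting every column
/// that has a rule. Original values live only in memory for the current row;
/// columns with a rule are never copied to `output` unmodified.
fn anonymize_stream<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    columns: &[&str],
    rules: &HashMap<&str, Rule>,
) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        let fields: Vec<String> = line
            .split('\t')
            .enumerate()
            // Look up the column name by position, then its rule (if any).
            .map(|(i, value)| match columns.get(i).and_then(|c| rules.get(c)) {
                Some(rule) => rule(value),
                None => value.to_string(),
            })
            .collect();
        writeln!(output, "{}", fields.join("\t"))?;
    }
    Ok(())
}

/// Example rule: replaces every character with `*`, preserving the length.
fn mask_all(value: &str) -> String {
    "*".repeat(value.chars().count())
}
```

Registering `mask_all` for an `email` column and feeding the function an in-memory buffer (any `BufRead`, such as `&[u8]`) yields rows whose `id` is unchanged and whose `email` is fully masked. A real tool would parse the dump format properly rather than splitting on tabs.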

Fakers, anonymizers, and obfuscators: there are various free and open-source data anonymization tools that have been around for a long time and work pretty well, so why did we create a new one? Ours supports globals, uniqueness constraints, inline rules, and other useful features.
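
Uniqueness constraints matter because a fake value that collides with another one can break a unique index when the dump is imported. One common way to honour such a constraint is to remember every value already issued and retry the generator on a collision. The sketch below shows that technique under the assumption of a simple retry limit; `UniqueGenerator` and `next_unique` are hypothetical names, not Datanymizer's implementation.

```rust
use std::collections::HashSet;

/// Hypothetical wrapper that makes any value generator return distinct values,
/// retrying up to `max_attempts` times when a candidate was already issued.
struct UniqueGenerator<F: FnMut() -> String> {
    generate: F,
    seen: HashSet<String>,
    max_attempts: usize,
}

impl<F: FnMut() -> String> UniqueGenerator<F> {
    fn new(generate: F, max_attempts: usize) -> Self {
        Self { generate, seen: HashSet::new(), max_attempts }
    }

    /// Returns a value not returned before, or `None` if every attempt collided.
    fn next_unique(&mut self) -> Option<String> {
        for _ in 0..self.max_attempts {
            let candidate = (self.generate)();
            // `insert` returns true only if the value was not already present.
            if self.seen.insert(candidate.clone()) {
                return Some(candidate);
            }
        }
        None
    }
}
```

Returning `None` instead of looping forever is a deliberate choice: when the generator's value space is nearly exhausted, failing loudly is preferable to hanging the dump.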

We had some particular requirements we wanted our tool to meet. We didn't want the anonymizer to take a "raw" dump and mutate it. Instead, we needed to produce a dump that is already anonymized, so that whoever receives it never has access to the real data. The configuration that determines how the real system data is anonymized should be kept separate from that data.

We also wanted a tool that was flexible about how the anonymization itself takes place, ideally allowing the use of templates to populate field contents.
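
As a rough illustration of template-driven field contents, the sketch below fills `{{ key }}` placeholders from a map of values (for example, other columns of the same row or generated fakes). It is an assumption-laden simplification: Datanymizer's own templates are considerably richer, and `render` is a hypothetical function, not part of its API. Unknown keys and unclosed placeholders are reported as errors so that a typo cannot silently leak into the output as literal text.

```rust
use std::collections::HashMap;

/// Hypothetical renderer: replaces each `{{ key }}` placeholder (whitespace
/// inside the braces is optional) with the matching value from `context`.
fn render(template: &str, context: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let end = after_open
            .find("}}")
            .ok_or_else(|| format!("unclosed placeholder in {template:?}"))?;
        let key = after_open[..end].trim();
        let value = context
            .get(key)
            .ok_or_else(|| format!("unknown placeholder {key:?}"))?;
        out.push_str(value);
        // Continue after the closing `}}`.
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

For example, with `id` set to `"42"` and `domain` set to `"example.com"`, the template `user_{{ id }}@{{domain}}` renders to `user_42@example.com`.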
