August Larson

1625115660

Remove Personal Information From Text With Python

Implementation of a Python privacy text filter that protects the privacy of your users by removing Personally Identifiable Information (PII).

The GDPR is the _General Data Protection Regulation_ of the European Union. Its purpose is to protect the personal data of all European residents. Protecting data is also an intrinsic value of a developer. Protecting data in a row/column data structure is relatively easy: you control access to columns and rows. But what about free text?

To fulfil our privacy requirements we can adapt the content of a free text field and replace privacy-related information with tags. The meaning of the text is not altered, but because of the anonymization it can no longer be related to an individual. The goal is to take the following text (the date formats are Dutch):

The possibilities have increased since 2014, especially compared to2012, hè Kees? The system has different functions to manipulate data. The date is 12–01–2021 (or 12 jan 2021 or 12 januari 2021).

You can reach me at [email protected] and I live in Rotterdam. My address is Maasstraat 13, 1234AB. My name is Thomas de Vries and I have Acne. Oh , I use ranitidine for this.

and replace it with

The possibilities have increased since , especially compared to, hè ? The system has different functions to manipulate data. The date is (or or ).

You can reach me at and I live in . My address is , . My name is and I have . Oh , I use for this.
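As a rough illustration of the transformation above, the sketch below replaces Dutch-style dates and email addresses using regular expressions. The tag names `<DATE>` and `<EMAIL>`, the month list, the function name `replace_dates_and_emails` and the patterns themselves are assumptions made for this example, not necessarily what the article's filter uses.

```python
import re

# Hypothetical tag names, chosen for this sketch only.
DATE_TAG = "<DATE>"
EMAIL_TAG = "<EMAIL>"

# Dutch month names, full and abbreviated (an assumed, non-exhaustive list).
MONTHS = (
    "januari|februari|maart|april|mei|juni|juli|augustus|"
    "september|oktober|november|december|"
    "jan|feb|mrt|apr|jun|jul|aug|sept|sep|okt|nov|dec"
)

# 12-01-2021, 12/01/2021, also with an en dash as separator.
NUMERIC_DATE = r"\b\d{1,2}[-/–]\d{1,2}[-/–]\d{4}\b"
# 12 jan 2021 or 12 januari 2021.
WRITTEN_DATE = r"\b\d{1,2}\s+(?:" + MONTHS + r")\s+\d{4}\b"
DATE_RE = re.compile(NUMERIC_DATE + "|" + WRITTEN_DATE, re.IGNORECASE)

# A deliberately simple email pattern; it does not cover every valid address.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def replace_dates_and_emails(text):
    """Replace dates and email addresses with their tags."""
    text = DATE_RE.sub(DATE_TAG, text)
    return EMAIL_RE.sub(EMAIL_TAG, text)


print(replace_dates_and_emails(
    "The date is 12-01-2021 (or 12 jan 2021 or 12 januari 2021)."
))
```

Dates are handled before anything else here because a generic number rule applied earlier would break them into separate fragments.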

This article describes a simple privacy filter that will perform the following actions:

  • Replace dates with a date tag
  • Replace URLs with a URL tag
  • Replace email addresses with an email tag
  • Replace postal codes with a postal code tag
  • Replace numbers with a number tag
  • Replace cities and regions with a place tag
  • Replace street names with a street tag
  • Replace first and last names with a name tag
  • Replace diseases with a disease tag
  • Replace medicine names with a medicine tag
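The order of these actions matters: a specific rule such as the postal code rule has to run before the generic number rule, otherwise the digits are consumed first. A minimal sketch of such an ordered rule list follows. The tag names, the patterns (a Dutch-style postal code of four digits plus two capitals is assumed) and the helper `apply_rules` are illustrative assumptions.

```python
import re

# (pattern, tag) pairs, applied from top to bottom. Tag names are hypothetical.
RULES = [
    # URLs first, because they may contain digits; trailing punctuation is left out.
    (re.compile(r"\b(?:https?://|www\.)\S*[\w/]", re.IGNORECASE), "<URL>"),
    # Assumed Dutch postal code format: 4 digits, optional space, 2 capitals.
    (re.compile(r"\b[1-9]\d{3}\s?[A-Z]{2}\b"), "<POSTCODE>"),
    # Any remaining run of digits, even when glued to a word ("to2012").
    (re.compile(r"\d+"), "<NUMBER>"),
]


def apply_rules(text, rules=RULES):
    """Apply every rule in order and return the filtered text."""
    for pattern, tag in rules:
        text = pattern.sub(tag, text)
    return text


print(apply_rules("Since 2014 I live at 1234AB, see https://example.com/page."))
# Reversing the order shows the problem: the number rule eats the postal code digits.
print(apply_rules("1234AB", list(reversed(RULES))))
```

Keeping the rules in a plain list makes the ordering explicit and easy to extend with further patterns.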

The last two are added because medical information requires extra care. Such terms will occur rarely, but the impact is large when this information leaks.
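Diseases and medicine names have no fixed shape, so they are more naturally matched against word lists than against a pattern. The sketch below shows one way to do that with the standard library only. The two tiny word lists, the tag names and the functions `build_pattern` and `replace_terms` are hypothetical; a real filter would load far larger lists from reference data.

```python
import re

# Tiny hypothetical word lists, for illustration only.
WORD_LISTS = {
    "<DISEASE>": ["acne", "diabetes"],
    "<MEDICINE>": ["ranitidine", "paracetamol"],
}


def build_pattern(words):
    """Compile one case-insensitive pattern that matches any listed term as a whole word."""
    # Longest terms first, so a longer term wins over a shorter one it starts with.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {tag: build_pattern(words) for tag, words in WORD_LISTS.items()}


def replace_terms(text):
    """Replace every listed term with the tag of its word list."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(tag, text)
    return text


print(replace_terms("I have Acne. Oh , I use ranitidine for this."))
```

Matching on whole words only avoids tagging a term that merely appears inside a longer, unrelated word.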

#pii #gdpr-compliance #privacy-protection #regular-expressions #python #remove personal information from text with python
