Information security is very important for any organization. Lost money is a minor problem, the serious one is that the enterprise system. However, fraud email and phishing email occupy a small set of data when comparing to normal email. Augmenting fraud and phishing email is a way to tackle this problem.
Example of CEO fraud email (Regina et al., 2020)
Therefore, Regina et al. proposed three different approaches to generate synthetic data for model training. As synthetic data is a kind of “fake” data, some low-quality data may hurt model performance. Validations are needed to keep a high-quality synthetic data. Also, there are some assumptions which are:
Above: Original Text. Below: Augmented Text (Regina et al., 2020)
#data-augmentation #nlp #data-science #artificial-intelligence #machine-learning