Maintain confidentiality before, during, and after training

Confidential Machine Learning (ConfML) is a protocol that data owners follow when sharing training data with an ML service. The protocol maintains the confidentiality of the training data during the training process.

The confidentiality of data at rest and data in transit can be ensured through encryption. However, the data must be decrypted just before training starts and remains exposed until the training process ends. ConfML addresses that vulnerability: it ensures the confidentiality of training data during the training process itself.

The ConfML protocol consists of two steps that bookend the training process:

  1. The data owner scrambles the training data files using a secret key before sending them to the ML service. The secret key is never shared with the ML service.
  2. After receiving the _network-trained-on-scrambled-data_ from the ML service, the data owner uses the secret key of step 1 to transform that network into one that behaves the same as if it had been trained on the original, unscrambled data.

These two steps ensure that the ML service never sees the original data, while the data owner still obtains the desired network.
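To see why step 2 is possible, consider the first layer of a fully connected network: permuting the input columns only permutes the columns of the first weight matrix, so the data owner who knows the permutation can undo it. The following is a minimal NumPy sketch of this idea (not code from the post; the permutation stands in for the key-derived scramble):

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(5)        # stands in for the key-derived column scramble
x = rng.normal(size=5)           # an original (unscrambled) feature vector
W_s = rng.normal(size=(3, 5))    # first-layer weights learned on scrambled data

x_s = x[perm]                    # scrambled input, as the ML service sees it
W = np.empty_like(W_s)
W[:, perm] = W_s                 # step 2: un-permute the weight columns

print(np.allclose(W_s @ x_s, W @ x))  # True: both networks compute the same output
```

The un-permuted weights `W` applied to the original data produce exactly what the service's network produces on the scrambled data, which is the behavioral equivalence that step 2 promises.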

![Confidential machine learning (image by author)](https://miro.medium.com/max/7500/1*m6isD2HN8ZrH_NYO9JgWMw.png)

scramble_files.py

Data owners can use a program similar to the following to scramble the features and labels files that will be used to train fully connected, feedforward deep neural networks.

The program uses the secret key to scramble the order of columns in the features and labels CSV files. Scrambling the column order makes the data difficult for intruders to interpret but has almost no impact on the quality of training.

```python
# scramble_files.py
import random

import pandas


def bld_scram_idx(lst_len, key_secret):
    """Build a shuffled index list derived deterministically from the secret key."""
    my_seed = int(''.join(str(ord(ch)) for ch in key_secret))
    random.seed(my_seed * lst_len)
    scram_idx = list(range(lst_len))
    random.shuffle(scram_idx)
    return scram_idx


def scram_list(lst, scram_idx):
    """Map each integer in lst to its position in scram_idx."""
    return [scram_idx.index(item) for item in lst]


def scram_df(df, scram_idx):
    """Reorder the dataframe's columns according to the scrambled indices."""
    cols_idx = list(range(len(df.columns)))
    cols_idx_scram = scram_list(cols_idx, scram_idx)
    return df.reindex(labels=cols_idx_scram, axis='columns')


def read_csv_file_write_scram_version(csv_fname, key_secret):
    """Read a CSV file, scramble its column order, and write the scrambled copy."""
    df_csv = pandas.read_csv(csv_fname, header=None)
    scram_idx = bld_scram_idx(len(df_csv.columns), key_secret)
    df_csv = scram_df(df_csv, scram_idx)
    csv_scram_fname = csv_fname.split('.csv')[0] + '_scrambled.csv'
    df_csv.to_csv(csv_scram_fname, header=False, index=False)
    print(csv_scram_fname + ' file written to disk')


KEY_SECRET, FT_CSV_FNAME, LB_CSV_FNAME = "", "", ""  # insert values
read_csv_file_write_scram_version(FT_CSV_FNAME, KEY_SECRET)
read_csv_file_write_scram_version(LB_CSV_FNAME, KEY_SECRET)
```
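The post does not show the data owner's unscrambling side. Since `scram_df` moves original column `j` to position `scram_idx[j]`, selecting the scrambled columns back in `scram_idx` order restores the original layout. A hypothetical counterpart could look like this (function and file names here are assumptions, not from the original program):

```python
# unscramble_files.py -- hypothetical counterpart sketch, not part of the post
import random

import pandas


def bld_scram_idx(lst_len, key_secret):
    """Rebuild the key-derived index list; must match scramble_files.py exactly."""
    my_seed = int(''.join(str(ord(ch)) for ch in key_secret))
    random.seed(my_seed * lst_len)
    scram_idx = list(range(lst_len))
    random.shuffle(scram_idx)
    return scram_idx


def unscram_df(df, scram_idx):
    """Invert the column scramble: original column j sits at position scram_idx[j],
    so reading the columns in scram_idx order restores the original ordering."""
    return df.reindex(labels=scram_idx, axis='columns')


def read_scram_csv_write_orig_version(csv_scram_fname, key_secret):
    """Read a scrambled CSV, restore the original column order, write it out."""
    df_scram = pandas.read_csv(csv_scram_fname, header=None)
    scram_idx = bld_scram_idx(len(df_scram.columns), key_secret)
    df_orig = unscram_df(df_scram, scram_idx)
    orig_fname = csv_scram_fname.split('_scrambled.csv')[0] + '_restored.csv'
    df_orig.to_csv(orig_fname, header=False, index=False)
```

Only the holder of the secret key can rebuild `scram_idx`, so only the data owner can perform this inversion.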
