Confidential Machine Learning (ConfML) is a protocol that data owners follow when sharing training data with an ML service. It keeps the training data confidential while the model is being trained.
Encryption can protect data at rest and data in transit, but the data is decrypted right before training starts and stays exposed until training ends. ConfML addresses that gap: it keeps the training data confidential during the training process itself.
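The ConfML protocol consists of two steps that bookend the training process: a scrambling step that the data owner applies to the training data with a secret key before handing it to the ML service, and a matching step that the data owner applies after training to obtain the desired network.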
These two steps ensure that the ML service never sees the original data, while the data owners obtain the networks that they desired.
![Confidential machine learning](https://miro.medium.com/max/7500/1*m6isD2HN8ZrH_NYO9JgWMw.png)
Confidential machine learning (image by author)
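For the fully-connected case, a small numerical check helps show why a network trained on reordered columns can still correspond to the one the data owner wanted. The sketch below is only an illustration under stated assumptions: it uses a single dense layer, a random permutation standing in for the key-derived column order, and hypothetical names (`dense`, `perm`, `weights_scrambled`). It is not necessarily how ConfML converts a trained network.

```python
import random


def dense(x, weights, bias):
    """One fully-connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


rng = random.Random(0)
n_in, n_out = 5, 3
x = [rng.uniform(-1, 1) for _ in range(n_in)]
weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
bias = [rng.uniform(-1, 1) for _ in range(n_out)]

# Assumption: a random permutation stands in for the key-derived column order.
perm = list(range(n_in))
rng.shuffle(perm)

# Reorder the input features and the weight rows in the same way.
x_scrambled = [x[p] for p in perm]
weights_scrambled = [weights[p] for p in perm]

y_original = dense(x, weights, bias)
y_scrambled = dense(x_scrambled, weights_scrambled, bias)

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(y_original, y_scrambled))
print(max_diff)
```

Because the layer only sums products of features and weights, reordering both consistently leaves the output unchanged, so whoever knows the permutation can map between the two sets of weights.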
Data owners can use something similar to the following program to scramble the features and labels files that will be used for training fully-connected, feedforward deep neural networks.
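This program uses the secret key to scramble the order of the columns in the features and labels CSV files. Scrambling the column order makes the data hard for intruders to interpret but has almost no impact on the quality of training, because a fully-connected layer has no built-in notion of feature order: a different column order corresponds to the same network with its first-layer (or, for labels, last-layer) weights reordered.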
```python
# scramble_files.py
import random

import pandas


def bld_scram_idx(lst_len, key_secret):
    """Build a shuffled list of column indices that depends only on the key."""
    # Turn the key into an integer seed by concatenating its character codes.
    my_seed = int(''.join(map(str, map(ord, key_secret))))
    random.seed(my_seed * lst_len)
    scram_idx = list(range(lst_len))
    random.shuffle(scram_idx)
    return scram_idx


def scram_list(lst, scram_idx):
    """Scramble a list of integers: each item maps to its position in scram_idx."""
    return [scram_idx.index(item) for item in lst]


def scram_df(df, scram_idx):
    """Reorder the columns of a dataframe."""
    cols_idx = list(range(len(df.columns)))
    cols_idx_scram = scram_list(cols_idx, scram_idx)
    return df.reindex(labels=cols_idx_scram, axis='columns')


def read_csv_file_write_scram_version(csv_fname, key_secret):
    """Read a headerless CSV and write a column-scrambled copy next to it."""
    df_csv = pandas.read_csv(csv_fname, header=None)
    scram_idx = bld_scram_idx(len(df_csv.columns), key_secret)
    df_csv = scram_df(df_csv, scram_idx)
    csv_scram_fname = csv_fname.split('.csv')[0] + '_scrambled.csv'
    df_csv.to_csv(csv_scram_fname, header=False, index=False)
    print(csv_scram_fname + ' file written to disk')


# Insert values before running; an empty key cannot be turned into a seed.
KEY_SECRET, FT_CSV_FNAME, LB_CSV_FNAME = "", "", ""
read_csv_file_write_scram_version(FT_CSV_FNAME, KEY_SECRET)
read_csv_file_write_scram_version(LB_CSV_FNAME, KEY_SECRET)
```
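The scrambling is only useful if the key holder can reproduce and undo it. The following is a minimal sketch of that round trip on a single row, using only the standard library. The helper names (`build_permutation`, `scramble_row`, `unscramble_row`) and the example key are hypothetical. The seed is derived from the key in the same way as in `bld_scram_idx`, but these helpers apply the permutation directly, so the resulting column order may not match what `scram_df` produces.

```python
import random


def build_permutation(n_cols, key_secret):
    """Key-derived column order; the seed is built as in bld_scram_idx."""
    seed = int(''.join(str(ord(ch)) for ch in key_secret))
    rng = random.Random(seed * n_cols)
    perm = list(range(n_cols))
    rng.shuffle(perm)
    return perm


def scramble_row(row, perm):
    """Position i of the result holds the value from original column perm[i]."""
    return [row[p] for p in perm]


def unscramble_row(scrambled, perm):
    """Invert scramble_row: put each value back into its original column."""
    original = [None] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        original[old_pos] = scrambled[new_pos]
    return original


row = [10, 20, 30, 40, 50]
key = "my-key"  # hypothetical example key

perm = build_permutation(len(row), key)
scrambled = scramble_row(row, perm)

# Rebuilding the permutation from the same key gives the same order,
# so the key holder can restore the original columns at any later time.
restored = unscramble_row(scrambled, build_permutation(len(row), key))
print(scrambled, restored)
```

Nothing besides the key and the column count needs to be stored: the permutation is regenerated on demand, which is also why anyone who learns the key can undo the scrambling.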