A phrase I often hear is “security is everyone’s responsibility,” but I notice that data scientists are frequently so focused on the vast number of skills they need to know that security gets ignored. Besides competing with all those other responsibilities, security can seem daunting and appear to require a lot of software engineering skill. In reality, it’s fairly easy to add a basic level of security to your software. I’d recommend following the principle “do less better” from Charles Nwatu (a security leader at Netflix, formerly at Stitch Fix). To me, this means that successfully implementing low-level security is better than failing to implement high-level security. This article is aimed at data scientists (or Python users) and assumes only rudimentary knowledge of the command line.

Types of Security Tools

There are two broad types of security tools: static and dynamic. Dynamic tools test software while it is running to uncover threats, while static tools analyze source code files without running them. In this article we will add two static security tools. The first checks that you are not adding keys, secrets, or passwords to your repository, so that private information isn’t exposed. The second helps you check your software dependencies for known security vulnerabilities.

Git Secrets

Git Secrets is the tool we will use to monitor our repositories for information we don’t want to be public. It is easy to install on most operating systems; see the instructions in its GitHub repository (awslabs/git-secrets). Once it is installed, run git secrets --install inside your repository to set up the git hooks that check your commits, then tell it what to look out for: git secrets --add 'your-regular-expression' blocks a pattern, and git secrets --add --literal 'your-literal-string' blocks an exact string. As an example, I will run git secrets --add 'password ?=+ ?[A-Za-z0-9]+', which will block lines like the following:

  • password = anyLengthPassword
  • password=passwordWithNoSpacesNextToEqualSign
  • password==doubleEqualsBlockedToo

But won’t block these:

  • password=""
  • password=
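If you want to sanity-check a pattern before registering it, Python’s re module offers a quick way to try it against sample lines. This is only a sketch: git secrets does its own grep-style matching, so Python’s regex engine is a stand-in that should agree for a pattern this simple but may not for more exotic syntax. The names PATTERN, should_block, and should_allow are illustrative.

```python
import re

# The pattern registered with `git secrets --add` above. Python's `re` is a
# stand-in for the grep-style matching git secrets performs; the two agree on
# a pattern this simple, but may differ on more exotic syntax.
PATTERN = re.compile(r"password ?=+ ?[A-Za-z0-9]+")

# Sample lines taken from the two lists above.
should_block = [
    "password = anyLengthPassword",
    "password=passwordWithNoSpacesNextToEqualSign",
    "password==doubleEqualsBlockedToo",
]
should_allow = [
    'password=""',
    "password=",
]

for line in should_block + should_allow:
    # re.search finds the pattern anywhere in the line, as grep would.
    verdict = "blocked" if PATTERN.search(line) else "allowed"
    print(f"{verdict}: {line}")
```

Every line from the first list prints as blocked and both lines from the second print as allowed. The deciding piece is [A-Za-z0-9]+, which requires at least one letter or digit after the equals sign(s).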

This is the behavior we want: it won’t flag code from which the passwords have been properly removed. Next, I run git secrets --scan -r to scan the repository’s files for the patterns I added. It found one password, which I had put on the first line of the README. Here’s what Git Secrets reported:

README.md:1:password = password12345

[ERROR] Matched one or more prohibited patterns

Possible mitigations:
- Mark false positives as allowed using: git config --add secrets.allowed ...
- Mark false positives as allowed by adding regular expressions to .gitallowed at repository's root directory
- List your configured patterns: git config --get-all secrets.patterns
- List your configured allowed patterns: git config --get-all secrets.allowed
- List your configured allowed patterns in .gitallowed at repository's root directory
- Use --no-verify if this is a one-time false positive
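To make a report like the one above less of a black box, here is a toy Python approximation of a recursive scan. It is not how git secrets is implemented and does not consult git at all: it walks a directory with os.walk, flags lines matching any prohibited pattern, skips lines that also match an allowed pattern (loosely mirroring the secrets.allowed and .gitallowed mitigations in the report), and returns hits in the same path:line:text format. The function scan and its parameters are hypothetical.

```python
import os
import re
import tempfile


def scan(root, prohibited, allowed=()):
    """Toy recursive secrets scan; NOT how git secrets is implemented.

    Flags lines matching any prohibited regex, skips lines that also match an
    allowed regex (treated as known false positives), and returns hits as
    "path:line_number:line" strings like the report above.
    """
    prohibited_res = [re.compile(p) for p in prohibited]
    allowed_res = [re.compile(p) for p in allowed]
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into git's own metadata directory.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # errors="ignore" lets the toy read binary files without crashing.
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, start=1):
                    line = line.rstrip("\n")
                    if not any(r.search(line) for r in prohibited_res):
                        continue
                    if any(r.search(line) for r in allowed_res):
                        continue
                    findings.append(f"{rel}:{lineno}:{line}")
    return findings


if __name__ == "__main__":
    # Recreate the README from the example in a throwaway directory.
    with tempfile.TemporaryDirectory() as repo:
        with open(os.path.join(repo, "README.md"), "w", encoding="utf-8") as f:
            f.write("password = password12345\n")
            f.write('password=""\n')
        for hit in scan(repo, [r"password ?=+ ?[A-Za-z0-9]+"]):
            print(hit)  # prints README.md:1:password = password12345
```

Passing an allowed list shows how a false-positive rule works in this toy: a line must match a prohibited pattern and no allowed pattern to be reported.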

I set these Git Secrets patterns for this specific repository only, but you can also add global patterns so that the same checks apply in every repository you work in. Do this with git secrets --add --global 'pattern or string here'.

Adding Security to Your Code