Managing software dependencies for Data Science projects

Managing software dependencies for Data Science projects

This post discusses another important subject: dependency management, which relates to properly documenting virtual environments to ease reproducibility and make them reliable when deployed to production.

Virtual environments are a must when developing software projects. They allow you to create self-contained, isolated Python installations that prevent your projects from clashing with each other and let other people reproduce your setup.

However, using virtual environments is just the first step towards developing reproducible data projects. This post discusses another important subject: dependency management, which relates to properly documenting virtual environments to ease reproducibility and make them reliable when deployed to production.

For a more in-depth description of Python environments, see this other post.

How dependency managers work

tl; dr; Package managers use heuristics to install package versions that are compatible with each other. A large set of dependencies and version constraints might lead to failures when resolving the environment.

When you install a package, your package manager (pip, conda) has to install dependencies for the requested package and dependencies of those dependencies, until all requirements are satisfied.

Packages usually have version constraints. For example, at this time of writing scikit-learn requires numpy 1.13.3 or higher. When packages share dependencies, the package manager has to find versions that satisfy all constraints (this process is known as dependency resolution), doing so is computationally expensive, current package managers use heuristics to find a solution in a reasonable amount of time.

With large sets of dependencies, it can happen that the solver is unable to find a solution. Some package managers will throw an error, but others might just print a warning message. To avoid these issues, it is important to be conscious about project dependencies.

machine-learning data-science python

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.

Data Science Projects | Data Science | Machine Learning | Python

Practice your skills in Data Science with Python, by learning and then trying all these hands-on, interactive projects, that I have posted for you.