Do We Need Object-Oriented Programming in Data Science?
Let's discuss some pros and cons of switching to object-oriented programming as a data scientist.

When I first got into data science, it was clear that object-oriented programming was not the main focus. Many data scientists wrote code in notebooks or as single Python scripts to clean data and to develop and run models, without organizing that code into functions or classes. Most coursework centered on understanding machine learning (ML) models, feature engineering, training/validation/test splits, and similar topics. The work was commonly done in a notebook environment and shared among students and professors. Data scientists often do not think about the end user the way a software developer does.

When I began working as a data engineer, there was more focus on object-oriented languages and on cloud technologies to host and clean data and provide it to other teams. Even then, most of the code lived in AWS Lambda functions or relied on open-source libraries to do the necessary work. By the time I switched into a data science role, I was back to long or chained notebooks, with a few functions here and there depending on the developer. There was minimal object-oriented code, and it existed only because other teams who ingested our work required it to run their code.

Now I work on a team whose main codebase is almost entirely object-oriented, alongside other libraries and tools that follow similar coding standards. There is a mix of notebooks, libraries, and automation with testing and release pipelines. Shifting toward an object-oriented mindset has been very beneficial: it has made our code production-ready, easier to read, and easier to extend to new use cases.
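To make the contrast concrete, here is a minimal sketch of how a notebook-style cleaning step might be wrapped in a class so that a new use case can extend it instead of copying it. This is an illustrative example, not our team's actual code; the class names, the `age`/`income` fields, and the income cap are all hypothetical, and it uses plain Python lists of dicts rather than a DataFrame library to stay self-contained.

```python
# Notebook style: one-off steps run top to bottom on one specific dataset.
rows = [{"age": "34", "income": "52000"}, {"age": "", "income": "61000"}]
cleaned = [r for r in rows if r["age"]]
cleaned = [{k: float(v) for k, v in r.items()} for r in cleaned]


# Object-oriented style: the same steps behind a reusable, testable interface.
class Cleaner:
    """Hypothetical cleaner: drops incomplete rows and casts values to float."""

    def __init__(self, required: list[str]):
        self.required = required

    def drop_missing(self, rows: list[dict]) -> list[dict]:
        # Keep only rows where every required field is present and non-empty.
        return [r for r in rows if all(r.get(c) for c in self.required)]

    def cast(self, rows: list[dict]) -> list[dict]:
        # Convert every value in each row to a float.
        return [{k: float(v) for k, v in r.items()} for r in rows]

    def run(self, rows: list[dict]) -> list[dict]:
        return self.cast(self.drop_missing(rows))


class CappedIncomeCleaner(Cleaner):
    """Hypothetical new use case: reuse every step, then cap income."""

    def __init__(self, required: list[str], cap: float):
        super().__init__(required)
        self.cap = cap

    def run(self, rows: list[dict]) -> list[dict]:
        rows = super().run(rows)
        return [{**r, "income": min(r["income"], self.cap)} for r in rows]


print(Cleaner(["age"]).run(rows))
# [{'age': 34.0, 'income': 52000.0}]
print(CappedIncomeCleaner(["age"], cap=50000.0).run(rows))
# [{'age': 34.0, 'income': 50000.0}]
```

The notebook version produces the same result as `Cleaner`, but only the class version can be imported, unit-tested step by step, and specialized by subclassing, which is the kind of extensibility described above.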

