If we were to boil Data Science (DS) down to three components, we would get algorithms, _programming_ and domain knowledge. These three components are not necessarily equally important. Take domain knowledge: while it is fundamental to building something useful, it is also tightly tied to the data being used, and therefore to the company, so it is very difficult to have that knowledge before starting to work with the specific data. Having previous experience in the same field certainly helps a lot, but one would still need to learn about the company’s data culture. The same cannot be said about programming or algorithms, but let’s focus on the programming component for this article.

Programming is neither new nor company-specific; there are 245 notable programming languages in existence to date. For an individual, choosing a programming language to learn can be daunting because of the many possibilities, but choosing among the most used ones in the field can help. For a company, instead, choosing _the_ programming language to use can be far more complicated, especially for companies that have been around for years and already have many systems in place: the choice depends on company culture, security, the complexity of the application, scalability, integration with tools already used within the company, and so on. In the specific case of DS, further aspects come on top of all that: what the available libraries/packages cover, how up to date these packages are, where the data is stored, compatibility with any deployment tools already in use, and so on. However large the number of possibilities, there is a list of the most used programming languages in DS, and three of them that I have experience with are Python, Java and R.


The programming in Data Science