The data-related career landscape can be confusing, not only to newcomers, but also to those who have spent time working within the field.

Get in where you fit in. Focusing on newcomers, however, I find from requests that I receive from those interested in join the data field in some capacity that there is often (and rightly) a general lack of understanding of what it is one needs to know in order to decide where it is that they fit in. In this article, we will have a look at five distinct data career archetypes, and hopefully provide some advice on how to get one’s feet wet in this vast, convoluted field.

We will focus solely on industry roles, as opposed to those in research, as not to add an additional layer of complication. We will also omit executive level positions such as Chief Data Officer and the like, mostly because if you are at the point in your career that this role is an option for you, you probably don’t need the information in this article.

So here are 5 data career archetypes, replete with descriptions and information on what makes them distinct from one another.

Data Architect

The data architect focuses on engineering and managing data stores and the data that reside within them.

The data architect is concerned with managing data and engineering the infrastructure which stores and supports this data. There is generally little to no data analysis needing to take place in such a role (beyond data store analysis for performance tuning), and the use of languages such as Python and R is likely not necessary. An expert level knowledge of relational and non-relational databases, however, will undoubtedly be necessary for such a role. Selecting data stores for the appropriate types of data being stored, as well as transforming and loading the data, will be necessary. Databases, data warehouses, and data lakes; these are among the storage landscapes that will be in the data architect’s wheelhouse. This role is likely the one which will have the greatest understanding of and closest relationship with hardware, primarily that related to storage, and will probably have the best understanding of cloud computing architectures of anyone else in this article as well.

SQL and other data query languages — such as Jaql, Hive, Pig, etc. — will be invaluable, and will likely be some of the main tools of an ongoing data architect’s daily work after a data infrastructure has been designed and implemented. Verifying the consistency of this data as well as optimizing access to it are also important tasks for this role. A data architect will have the know-how to maintain appropriate data access rights, ensure the infrastructure’s stability, and guarantee the availability of the housed data.

This is differentiated from the data engineer role by focus: while a data engineer is concerned with building and maintaining data pipelines (see below), the data architect is focused on the data itself. There may be overlap between the 2 roles, however: ETL; any task which could transform or move data, especially from one store to another; starting data on a journey down a pipeline.

Like other roles in this article, you might not necessarily see a “data architect” role advertised as such, and might instead see related job titles, such as:

  • Database Administrator
  • Spark Administrator
  • Big Data Administrator
  • Database Engineer
  • Data Manager

Data Engineer

The data engineer focuses on engineering and managing the infrastructure which supports the data and data pipelines.

What is the data infrastructure? It’s the collection of software and storage solutions that allow for the retrieval of data from a data store, the processing of data in some specified manner (or series of manners), the movement of data between tasks (as well as the tasks themselves), as data is on its way to analysis or modeling, as well as the tasks which come after this analysis or modeling. It’s the pathway that the data takes as it moves along its journey from its home to its ultimate location of usefulness, and beyond. The data engineer is certainly familiar with DataOps and its integration into the data lifecycle.

From where does the data infrastructure come? Well, it needs to be designed and implemented, and the data engineer does this. If the data architect is the automobile mechanic, keeping the car running optimally, then data engineering can be thought of as designing the roadway and service centers that the automobile requires to both get around and to make the changes needed to continue on the next section of its journey. The pair of these roles are crucial to both the functioning and movement of your automobile, and are of equal importance when you are driving from point A to point B.

Truth be told, some the technologies and skills required for data engineering and data management are similar; however, the practitioners of these disciplines use and understand these concepts at different levels. The data engineer may have a foundational knowledge of securing data access in a relational database, while the data architect has expert level knowledge; the data architect may have some understanding of the transformation process that an organization requires its stored data to undergo prior to a data scientist performing modeling with that data, while a data engineer knows this transformation process intimately. These roles speak their own languages, but these languages are more or less mutually intelligible.

#data analyst #data engineer #data engineering #data management #data science

Data Scientist, Data Engineer & Other Data Careers, Explained
1.20 GEEK