GitHub for Data Scientists: Part1

This is the first part of a follow-along series on GitHub collaboration. In this article, I aim to explain how two or more people can collaborate on, version control, and proofread their code on GitHub for Data Science projects.

We will be covering specific topics like:

**Getting started on GitHub:** Collaborate on GitHub like Pro: Part1
**Branching:** Collaborate on GitHub like Pro: Part2
**Commit:** Collaborate on GitHub like Pro: Commit

In these articles, we will have two people collaborating on GitHub. Let’s give them two pseudonyms, Sofi and Alec.

Tasks Planned

Since this is not an article on data cleaning, we will focus on writing just a few functions to see how collaboration and version control work. We will be working with a used cars dataset. You can download the dataset from here.

Sofi and Alec are working on a data cleaning project named “autos”. Sofi takes the initiative of gathering the data and creating the required .py and .ipynb files for the project.

Exploratory data analysis (EDA) was carried out on the dataset. Please refer to this article for the EDA. Based on the EDA report, tasks are planned for the project. We will look at only 3 tasks in this article.

Sofi creates a project folder (…/autos) with two files, auto.csv and autos_analysis.ipynb.
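As a minimal sketch of that starting layout, the snippet below creates an autos folder containing the two files named above. The parent directory here is a throwaway temporary directory chosen purely for illustration (the real location is wherever Sofi keeps her projects), the files are created empty as placeholders rather than as a real dataset or a valid notebook, and `create_project` is a hypothetical helper, not part of any library.

```python
import tempfile
from pathlib import Path


def create_project(parent: Path) -> Path:
    """Create the autos project folder with two empty placeholder files."""
    project = parent / "autos"
    project.mkdir(parents=True, exist_ok=True)
    for name in ("auto.csv", "autos_analysis.ipynb"):
        # Empty placeholders only; the real files hold the data and the notebook.
        (project / name).touch()
    return project


# Assumption: a temporary directory stands in for the real parent folder.
project_dir = create_project(Path(tempfile.mkdtemp()))
created_files = sorted(p.name for p in project_dir.iterdir())
print(created_files)
```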

Version Control Systems (VCS)

In software engineering, version control (also known as revision control, source control, or source code management) is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections of information. Version control is a component of software configuration management.

There are many version control systems out there. Often they are divided into two groups: “centralized” and “distributed”.

Centralized version control systems (CVCS)

CVCS are based on the idea that there is a single “central” copy of your project somewhere (probably on a server), and programmers “commit” their changes to this central copy. Subversion (SVN) is one of the most widely used CVCS.
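To make the “single central copy” idea concrete, here is a toy model in Python. It is only an illustration of the concept, not how Subversion or any real CVCS is implemented; the `CentralRepository` class, its methods, and the file contents are all hypothetical.

```python
class CentralRepository:
    """Toy stand-in for the single central copy held on a server."""

    def __init__(self):
        # Each entry is (author, message, snapshot of the files at that revision).
        self.revisions = []

    def commit(self, author, message, files):
        """Store a snapshot of the files and return the new revision number."""
        self.revisions.append((author, message, dict(files)))
        return len(self.revisions)

    def checkout(self, revision=None):
        """Return a copy of the files at a revision (latest by default)."""
        if not self.revisions:
            return {}
        if revision is None:
            revision = len(self.revisions)
        return dict(self.revisions[revision - 1][2])


server = CentralRepository()

# Sofi commits the first version to the central copy.
rev1 = server.commit("Sofi", "Add analysis notebook", {"autos_analysis.ipynb": "v1"})

# Alec fetches the latest state, edits it, and commits back to the same server.
alec_copy = server.checkout()
alec_copy["autos_analysis.ipynb"] = "v2"
rev2 = server.commit("Alec", "Update analysis", alec_copy)

print(rev1, rev2)
print(server.checkout(1))
print(server.checkout())
```

Every commit goes to the one shared `server` object, so the full history lives in a single place. That is the defining trait of the centralized model: collaborators only ever hold a working copy, never the history itself.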

towardsdatascience.com
