This is the first part of a follow-along series on GitHub collaboration. In this article, I aim to explain how two or more people can collaborate on, version-control, and review their code on GitHub for data science projects.
We will be covering specific topics like:
In these articles, we will have two people collaborating on GitHub. Let’s give them two pseudonyms, Sofi and Alec.
Since this is not an article on data cleaning, we will just focus on writing a few functions to see how collaboration and version control work. We will be working with a used cars dataset. You can download the dataset from here.
Sofi and Alec are working on a data cleaning project named “autos”. Sofi takes the initiative of gathering the data and creating the required .py and .ipynb files for the project.
Exploratory data analysis (EDA) was carried out on the dataset. Please refer to this article for the EDA. Based on the EDA report, tasks are planned for the project. We will look at only 3 of those tasks in this article.
Sofi creates a project folder (…/autos) with two files, auto.csv and autos_analysis.ipynb.
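If you are following along and would rather script this step than create the folder by hand, here is a minimal sketch. The function name `create_project` and the `base_dir` parameter are my own illustration, not part of the original project, and the two files are created as empty placeholders: in practice auto.csv would be the downloaded dataset and the notebook would be created from Jupyter.

```python
from pathlib import Path
import tempfile


def create_project(base_dir: Path) -> Path:
    """Create an 'autos' folder under base_dir with two placeholder files.

    base_dir is a hypothetical parent directory; pick whatever location
    you keep your projects in.
    """
    project = base_dir / "autos"
    project.mkdir(parents=True, exist_ok=True)
    # Empty placeholders only: replace auto.csv with the real dataset and
    # let Jupyter create a valid notebook file.
    for name in ("auto.csv", "autos_analysis.ipynb"):
        (project / name).touch(exist_ok=True)
    return project


if __name__ == "__main__":
    # Use a throwaway directory so the sketch does not touch real files.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(Path(tmp))
        print(sorted(p.name for p in project.iterdir()))
```

Running the function a second time is harmless, because both `mkdir` and `touch` are called with `exist_ok=True`.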
In software engineering, version control (also known as revision control, source control, or source code management) is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections of information. Version control is a component of software configuration management.
There are many version control systems out there. Often they are divided into two groups: “centralized” and “distributed”.
Centralized version control systems (CVCS) are based on the idea that there is a single “central” copy of your project somewhere (probably on a server), and programmers “commit” their changes to this central copy. The most popular CVCS is Subversion.
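To make the “single central copy” idea concrete, here is a toy Python model. It is only a sketch under my own assumptions: the class `CentralRepository` and its methods are hypothetical, and this is not how Subversion or any real CVCS is implemented. It keeps one shared history and accepts a commit only when the committer started from the latest revision, which is a stricter rule than real tools apply.

```python
class CentralRepository:
    """Toy model of one central project copy (illustrative only)."""

    def __init__(self):
        # Revision 0 is the empty project; each commit appends a snapshot.
        self.revisions = [{}]

    @property
    def head(self) -> int:
        """Number of the latest revision."""
        return len(self.revisions) - 1

    def checkout(self):
        """Return the latest revision number and a copy of its files."""
        return self.head, dict(self.revisions[-1])

    def commit(self, base_revision: int, files: dict) -> int:
        """Store a new snapshot if the committer was up to date."""
        if base_revision != self.head:
            raise RuntimeError("working copy is out of date; update first")
        self.revisions.append(dict(files))
        return self.head


if __name__ == "__main__":
    server = CentralRepository()

    # Sofi and Alec both start from the same central revision.
    sofi_rev, sofi_files = server.checkout()
    alec_rev, alec_files = server.checkout()

    sofi_files["autos_analysis.ipynb"] = "Sofi's changes"
    server.commit(sofi_rev, sofi_files)

    alec_files["auto.csv"] = "Alec's changes"
    try:
        server.commit(alec_rev, alec_files)
    except RuntimeError as err:
        # Alec must fetch Sofi's revision before he can commit.
        print("Commit rejected:", err)
```

The point of the sketch is that every change flows through one shared history on the server, so Alec has to pick up Sofi's revision before his own commit can be accepted.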