Gordon Matlala

Git Clone: A Data-driven Study on Cloning Behaviors

@derrickstolee recently discussed several different git clone options, but how do those options actually affect your Git performance? Which option is fastest for your client experience? Which option is fastest for your build machines? How can these options impact server performance? If you are a GitHub Enterprise Server administrator, it’s important that you understand how the server responds to these options under the load of multiple simultaneous requests.

Here at GitHub, we use a data-driven approach to answer these questions. We ran an experiment to compare these different clone options and measured the client and server behavior. It is not enough to just compare git clone times, because that is only the start of your interaction with a Git repository. In particular, we wanted to determine how these clone options change the behavior of future Git operations such as git fetch.

In this experiment, we aimed to answer the following questions:

  1. How fast are the various git clone commands?
  2. Once we have cloned a repository, what kind of impact do future git fetch commands have on the server and client?
  3. What impact do full, shallow and partial clones have on a Git server? This is mostly important for our GitHub Enterprise Server Admins.
  4. Will the repository shape and size make any difference in the overall performance?
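The first two questions can be explored on a small scale with a timing harness. The following is a minimal sketch, not the tooling used in this experiment: the flag chosen for each clone type (`--depth=1` for shallow, `--filter=blob:none` and `--filter=tree:0` for partial clones) and the helper names `clone_command`, `timed` and `measure` are assumptions made for illustration.

```python
import subprocess
import time

# Clone variants to compare. The flags are assumptions: the text above names
# full, shallow and partial clones but does not state the exact options used.
CLONE_VARIANTS = {
    "full": [],
    "shallow": ["--depth=1"],
    "blobless": ["--filter=blob:none"],
    "treeless": ["--filter=tree:0"],
}


def clone_command(variant, url, dest):
    """Return the argv list that clones `url` into `dest` for one variant."""
    return ["git", "clone", "--quiet", *CLONE_VARIANTS[variant], url, dest]


def timed(argv):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def measure(variant, url, dest):
    """Time a clone, then a follow-up fetch inside the new working copy."""
    clone_s = timed(clone_command(variant, url, dest))
    fetch_s = timed(["git", "-C", dest, "fetch", "--quiet", "origin"])
    return {"variant": variant, "clone_s": clone_s, "fetch_s": fetch_s}
```

Calling `measure("shallow", url, dest)` with a repository URL and an empty destination directory returns both timings for that variant. A fetch issued immediately after a clone has almost nothing to transfer, so a more realistic run would wait until new commits have been pushed upstream before timing the fetch.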

It is worth emphasizing that these results come from simulations that we performed in our controlled environments and do not simulate complex workflows that might be used by many Git users. Depending on your workflows and repository characteristics, these results may change. Perhaps this experiment provides a framework that you could follow to measure how your workflows are affected by these options. If you would like help analyzing your workflows, feel free to engage with GitHub’s Professional Services team.

For a summary of our findings, feel free to jump to our conclusions and recommendations.

Experiment design

To maximize the repeatability of our experiment, we use open source repositories for our sample data. This way, you can compare your repository shape to the tested repositories to see which is most applicable to your scenario.

We chose to use the jquery/jquery, apple/swift, and torvalds/linux repositories. These three repositories vary in size and number of commits, blobs, and trees.

These repositories were mirrored to a GitHub Enterprise Server running version 2.22 on an 8-core cloud machine. We use an internal load testing tool based on Gatling to generate git requests against the test instance. We ran each test with a specific number of users across 5 different load generators for 30 minutes. All of our load generators use git version 2.28.0, which by default uses protocol version 1. Note that protocol version 2 only improves ref advertisement, so we don’t expect it to make a difference in our tests.

Once a test is complete, we use a combination of Gatling results, ghe-governor and server health metrics to analyze the test.

#uncategorized #git
