Desmond Gerber

1621260720

Visualizing GitHub’s Global Community

This is the second post in a series about how we built our new homepage.

  1. How our globe is built
  2. How we collect and use the data behind the globe
  3. How we made the page fast and performant
  4. How we illustrate at GitHub
  5. How we designed the homepage and wrote the narrative

In the first post, my teammate Tobias shared how we made the 3D globe come to life, with lots of nitty-gritty details about Three.js, performance optimization, and delightful touches.

But there’s another side to the story—the data! We hope you enjoy the read.

Data goals

When we kicked off the project, we knew that we didn’t want to make just another animated globe. We wanted the data to be interesting and engaging. We wanted it to be real, and most importantly, we wanted it to be live.

Luckily, the data was there.

The challenge then became designing a data service that answered the following questions:

  1. How do we query our massive volume of data?
  2. How do we show you the most interesting bits?
  3. How do we geocode user locations in a way that respects privacy?
  4. How do we expose the computed data back to the monolith?
  5. How do we not break GitHub?

Let’s begin, shall we?

Querying GitHub

So, how hard could it be to show you some recent pull requests? It turns out it’s actually very simple:

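class GlobeController < ApplicationController
  def data
    # Naive approach: query production directly for the latest open pull requests.
    # Assumes PullRequest belongs_to :repository (table name "repositories").
    pull_requests = PullRequest
      .where(open: true)
      .joins(:repository)
      .where(repositories: { is_open_source: true })
      .last(10_000)

    render json: pull_requests
  end
end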

Just kidding

Because of the volume of data generated on GitHub every day, the size of our databases, as well as the importance of keeping GitHub fast and reliable, we knew we couldn’t query our production databases directly.

Luckily, we have a data warehouse and a fantastic team that maintains it. Data from production is fetched, sanitized, and packaged nicely into the data warehouse on a regular schedule. The data can then be queried using Presto, a distributed SQL query engine built for large data sets.

We also wanted the data to be as fresh as possible. So instead of querying snapshots of our MySQL tables, which are only copied over once a day, we were able to query data coming from our Apache Kafka event stream, which makes it into the data warehouse much more frequently.
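
To make the idea concrete, here is a minimal sketch of what querying the warehouse, rather than production, might look like. The table `kafka_events.pull_request_opened`, its columns, and the helper `recent_pull_requests_sql` are hypothetical and not GitHub's actual schema; the only point is that the query is Presto SQL over an event table, bounded by a time window and a row limit.

```ruby
# Hypothetical sketch: build a Presto SQL query over an event table fed by Kafka.
# Table and column names are assumptions for illustration only.
def recent_pull_requests_sql(hours: 1, limit: 10_000)
  # Integer() raises ArgumentError on non-numeric input, so nothing
  # unexpected can be interpolated into the query text.
  hours = Integer(hours)
  limit = Integer(limit)

  <<~SQL
    SELECT pull_request_id, repository_id, actor_id, created_at
    FROM kafka_events.pull_request_opened
    WHERE created_at > now() - interval '#{hours}' hour
    ORDER BY created_at DESC
    LIMIT #{limit}
  SQL
end

puts recent_pull_requests_sql(hours: 6, limit: 500)
```

Bounding the query by time keeps each run cheap and lets it be repeated on a schedule without touching the production databases.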

#engineering #github

Edison Stark

1603861600

How to Compare Multiple GitHub Projects with Our GitHub Stats tool

If you have project code hosted on GitHub, chances are you might be interested in checking some numbers and stats such as stars, commits and pull requests.

You might also want to compare similar projects on those same stats.

We have the right tool for you: a simple, easy-to-use tool called GitHub Stats.

Let’s dive right into what we can get out of it.

Getting started

This interactive tool is really easy to use. Follow the three steps below and you’ll get what you want in real time:

1. Head to the GitHub repo of the tool

2. Enter as many projects as you need to check on

3. Hit the Update button beside each metric

In this article, we are going to compare three of the most popular machine learning projects.

#github #tools #github-statistics-react #github-stats-tool #compare-github-projects #github-projects #software-development #programming

GitHub India: Celebrating A Community Connected By Code

A year ago, we were celebrating the launch of GitHub India to serve the third largest developer community on GitHub. Today, I am thrilled to welcome GitHub Satellite to India and introduce new programs to empower India’s rich community of developers, startups and students. We will also celebrate the achievements of India’s thriving developer community over the past year.

Open source powers India’s innovation

India has become an innovation powerhouse, with open source software development at the core. The growth in activity and engagement shows how powerful and invested India’s software development has become. Indian developers are truly building the future of software! Over the past year, we have seen over 1.8 million developers join GitHub in India, with the community now totaling 5.8 million developers, and growing.

Over the past year, the number of contributions on the platform, such as issues and team discussions, increased by 75 percent in India, compared to about 40 percent for the rest of the world. In public repos, the number of contributors increased by 80 percent over the previous year. In fact, over 1 million developers in India created their first repository on GitHub, making it the fastest growing country in the world in terms of developers contributing to open source. So it is no shock that we predict more than 10 million developers from India will call GitHub home by 2023. It is so exciting to think about what all of these new developers will create!

#community #github india #github

Jolie Reichert

1595668020

Stay Safe on GitHub: Security Practices to Follow

GitHub is undoubtedly the largest and most popular social development platform in the world. According to its 2019 State of the Octoverse Report, GitHub is home to over 40 million developers, and the community keeps expanding every day.

As developers in this deeply interconnected community use open source code to build software, GitHub security should be a top priority. This is because extensive code reuse increases the risk of distributing vulnerabilities from one dependency or repository to another. As such, every contributor should focus on creating a secure development environment.

Here are eight security practices that GitHub users can follow to stay safe and protect their code:

Strengthen Access Controls

Implementing proper access control is one of the best practices for enhancing security, not only on GitHub but in every other environment where code security is imperative.

GitHub offers several options that users can employ to reduce the risk of improper exposure. But to start with, it is important to employ the least privilege model where users are only granted necessary permissions.

Here are some basic access control guidelines that you should follow:

  • Restrict the creation of repositories to prevent users from exposing organization information in public repositories.
  • Enable branch protection and status checks to ensure users can merge commits or manipulate branches safely.
  • Allow or disallow forking private repositories to ensure users do not expose or share organizational code with unauthorized parties.
  • Revoke access for all inactive users who are no longer contributors.
  • Review access rights to your GitHub projects periodically.
  • Ensure users do not share GitHub accounts or passwords.
  • Ensure every contributor uses two-factor authentication on their account.
  • Rotate personal access tokens and SSH keys regularly.
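
As one illustration of the rotation guideline above, here is a minimal sketch that flags credentials overdue for rotation. The `Credential` record, the sample data, and the 90-day threshold are all assumptions made for the example; in practice the input would come from whatever audit export your organization has available.

```ruby
require "date"

# Hypothetical record shape; real data would come from an audit export.
Credential = Struct.new(:owner, :kind, :last_rotated, keyword_init: true)

# Return the credentials whose last rotation is older than max_age_days.
# The 90-day default is an illustrative policy, not a GitHub requirement.
def stale_credentials(credentials, today: Date.today, max_age_days: 90)
  credentials.select { |c| (today - c.last_rotated).to_i > max_age_days }
end

credentials = [
  Credential.new(owner: "alice", kind: :ssh_key, last_rotated: Date.new(2020, 1, 10)),
  Credential.new(owner: "bob", kind: :personal_access_token, last_rotated: Date.new(2020, 6, 1))
]

stale_credentials(credentials, today: Date.new(2020, 7, 1)).each do |c|
  puts "#{c.owner}: rotate #{c.kind} (last rotated #{c.last_rotated})"
end
```

The same pattern could be applied to the inactive-user guideline by comparing a last-activity date instead of a rotation date.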

Never Store Credentials in Your GitHub Files

Leaking secrets to your GitHub repositories, either through code, configuration files, or commit messages, provides a gateway for attacks.
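
A simple way to catch such leaks before they are committed is to scan text for strings that look like credentials. The sketch below is illustrative only: the three patterns and the `find_secrets` helper are assumptions chosen for the example, and a dedicated secret scanner covers far more formats.

```ruby
# Illustrative patterns only; a real scanner covers many more formats.
SECRET_PATTERNS = {
  aws_access_key_id: /\bAKIA[0-9A-Z]{16}\b/,
  private_key_header: /-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----/,
  hardcoded_password: /password\s*[:=]\s*["'][^"']+["']/i
}.freeze

# Return [line_number, pattern_name] pairs for every suspicious line in text.
def find_secrets(text)
  text.each_line.with_index(1).flat_map do |line, number|
    SECRET_PATTERNS.select { |_, pattern| line.match?(pattern) }
                   .keys
                   .map { |name| [number, name] }
  end
end

# Sample input with made-up values, not real credentials.
sample = <<~CONFIG
  db_host = "localhost"
  password = "hunter2"
  aws_key = "AKIAABCDEFGHIJKLMNOP"
CONFIG

find_secrets(sample).each { |number, name| puts "line #{number}: #{name}" }
```

Once a secret is flagged, the usual remedy is to load it from the environment or a secrets manager at runtime (for example with `ENV.fetch`) rather than keeping it in the repository.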

#tutorial #github #access control #software security #repository management #github issues #source code analysis #github apps #github enterprise #git best practices