Oral Brekke

Popular Tips and Tricks for Your First Tech Job

Starting a new job is daunting for anyone. Here's how to navigate the early days at your first tech job.

First days at work are scary. I still recall many instances where I lay awake at night before my first day at work, having an internal meltdown over what would happen the next day. Starting a new job is uncharted territory for most people. Even if you're a veteran in the industry, there's no denying that there can be a part of you that's a bit terrified of what is to come.

Understandably, a lot is happening. There are new people to meet, new projects and technologies to understand, documentation to read, tutorials to sit through, and endless HR presentations and paperwork to fill out. This can be overwhelming and, coupled with the considerable degree of uncertainty and unknowns you're dealing with, can be quite anxiety-inducing.

Two reasons motivated me to write about this subject. The first is that back when I was a student, most of the discussion revolved around getting a job in tech; no one talked about what happened next. How do you excel in your new role? Looking back, I think I assumed that the hard part was getting the job and that I could figure out whatever came after on my own.

Similarly, once I started working in the industry, most of the career-related content I came across was about how to go from one senior level to another. No one really talked about what to do in the middle. What about the interns and the junior engineers? How do they navigate their early careers?

After completing three years of full-time professional experience as a software engineer (and a couple of internships before), I reflected on my time. I put together a list of tips and tricks I've employed while settling into a new tech role. I wanted to look beyond just the first couple of months and prioritize helping achieve long-term success.

Reflect on existing processes and documentation

Most new employees start by either having a ton of documentation thrown their way or none at all. Instead of being overwhelmed by either of these possibilities, you could view this as an opportunity.

Identify gaps in existing documentation and think about how you could improve it for the next engineer that gets onboarded. This not only shows initiative on your part but also demonstrates that you're committed to improving existing processes within your team.

I've seen both ends of the spectrum. I've been on teams with no documentation whatsoever. I've also been on teams that were very diligent with keeping their documentation up to date. Your path is pretty straightforward with the former, and you can work on creating that missing documentation. With the latter, you can always think of ways to improve what already exists. Sometimes, too much documentation in written form can also feel intimidating, especially for new employees. Some things might be better explained through other mediums, like video tutorials or screencasts.

Ask questions

I encourage you to look into whether a buddy will be assigned to you when you start. This is a fairly common practice at companies. The purpose of a buddy is to help you as you're onboarded. I've found this incredibly helpful because it gives you someone to direct all your questions to, so you don't have to run around trying to find the right person or team.

While asking questions should always be encouraged, it is also necessary to do your homework before you ask those questions, including:

  • Do your research. This encompasses doing a web search, checking forums, and reading existing documentation. Use all the available tools at your disposal. However, it is essential to timebox yourself. You must balance doing your due diligence and keeping project deadlines and deliverables in mind.
  • Talk it out. As someone whose first language isn't English, I recommend talking things out loud before asking questions. In my experience, I've often found that, especially when I'm struggling with something difficult, I think in one language (probably my native language) and must explain it in another. This can be a bit challenging sometimes because doing that translation might not be straightforward.
  • Organize your thoughts. When struggling with something, it's very common to have many scrambled ideas that make sense to us but might not necessarily make sense to another person. I suggest sitting down, gathering your thoughts, writing them down, and talking through them out loud. This practice ensures that when you're explaining your thought process, it flows as intended, and the listener can follow your train of thought.

This approach is called rubber duck debugging, a common practice developers use. The idea is that explaining your problem step by step, even to an inanimate object like a rubber duck, can be very helpful in getting to the solution. It's also a great way to sharpen your communication skills.

Respect people's time. Even if you're reaching out to someone like your buddy, be cognizant of the fact that they also have their day-to-day tasks to complete. Some things that I've tried out include the following:

  • Write down your questions, and then set aside some time with your mentor to talk through them.
  • Compile questions instead of repeatedly asking for help so your mentor can get to them when they have time.
  • Schedule a quick 15-20 min video chat, especially if you want to share your screen, which is a great way to showcase your findings.

I think these approaches are better because you get someone's undivided attention instead of bothering them every couple of minutes when their attention might be elsewhere.

Deep dive into your projects

Even on teams with excellent documentation, starting your technical projects can be very daunting since multiple components are involved. Over time though, you will understand how your team does things. However, it can save you time and potential headaches to figure this out early on by keeping a handy list to refer to, including basic project setup, testing requirements, review and deployment processes, task tracking, and documentation.

If there's no documentation for the project you're starting on (a situation I have been in), see if you can identify the current or previous project owner and understand the basic project structure. This includes setting it up, deploying it, etc.

  • Identify your team's preferred IDE (integrated development environment). You're free to use the IDE of your choice, but using the same one as your team can help, especially when debugging, since different IDEs offer varying degrees of debugging support.
  • Learn how to debug, and I don't just mean using print statements (not that there's anything wrong with that approach). Leverage your team's experience here!
  • Understand testing requirements. This might depend on the scope of your project and general team practices, but the earlier you figure this out, the more confident you'll be in the changes you push to production.
  • Visualize the deployment process. This process can vary by team, company, etc. Regardless of how informal or formal it may be, make sure you understand how your changes get deployed to production, what the deployment pipeline looks like, how to deploy changes safely, what to do in case of failed builds, how to rollback faulty changes, and how to test your changes in production.
  • Understand the ticketing process. Understand how to document tickets and the level of detail expected. You will see a lot of variation here: some companies expect tickets to be updated daily to show progress, while others might not require that level of documentation.

Given everything I just mentioned, a beneficial, all-in-one exercise you can do in the first couple of weeks is to shadow another engineer and do peer coding sessions. This allows you to observe the entire process, end to end, from the moment a ticket is assigned to an engineer to when it gets deployed to production.

The first couple of weeks can also feel frustrating if you're not yet given an opportunity to get your hands dirty. To counter this, ask your manager to assign some starter tickets to you. These are usually minor tasks like code cleanup or adding unit tests, but they let you tinker with the codebase, which improves your understanding and gives you a sense of accomplishment: a very encouraging feeling in the early days of a new job.

Speak up, especially when you're stuck

I want to stress the importance of communication when you're stuck. This happens, especially in the early months of a new job, and as frustrating as it can be, this is where your communication skills will shine.

  • Be transparent about blockers and your progress. Even if it's something as trivial as permission issues (a fairly common blocker for new employees), ensure that your manager is aware.
  • Don't wait until the last day to report that something will be delayed. Delays in your project can push back many other things. Share necessary project delays well in advance so your manager can share them with stakeholders.
  • Don't forget things like thoroughly testing your changes or documenting your code just because you're in a rush.

Gain technical context

Gaining technical context is something I've personally struggled with, and I've actively worked on changing my approach in this area.

When I started as an intern, I would go in with a very focused mindset regarding what I wanted to learn. I'd have a laser-sharp focus on my project, but I'd completely turn a blind eye to everything else. Over the years, I realized that turning a blind eye to other or adjacent projects might not be the wisest decision.

First and foremost, it impacts your understanding of your work. I was naive to think I could be a good engineer if I focused exclusively on my project. That's just not true. You should take the time to understand other services with which your project might interact. You don't need to get into the nitty gritty, but developing a basic understanding goes a long way.

A common experience that new employees undergo is disconnecting from the rest of the company, which is a very natural feeling, especially at larger companies. I'm someone who develops a sense of exclusion very quickly, so when I moved to Yelp, a significantly larger company than my previous one, with projects of a much larger scale, I prioritized understanding the big picture. Not only did I work on developing an understanding of my project but also of other adjacent projects.

In my first few weeks at Yelp, I sat down with various engineers on my team and asked them to give me a bird's eye view of what I would be doing and the project's overarching goal. This approach was incredibly helpful because not only did I get varying degrees of explanations based on how senior the engineer was and how long they had been working on the project, but it also deepened my understanding of what I would be working on. I went into these meetings with the goal that my knowledge of the project should allow me to explain what I do to a stranger on the street. To this end, I asked my tech lead to clarify at what point my work came into the picture when a user opened the Yelp app and searched for something.

Architecture diagrams can also help in this scenario, especially when understanding how different services interact.

Establish expectations

For the longest time, I thought that all I needed to do was my best and be a good employee. If I was doing work, meeting goals, and no one complained, that should be good enough, right? Wrong.

You must be strategic with your career. You can't just outsource it to people's goodwill and hope you'll get the desired results just because you're meeting expectations.

  • Establish clear criteria the moment you start your new job. This varies by company: some organizations have very well-defined measures, while others barely have any. If it's the latter, I suggest you sit down with your manager within the first couple of weeks and mutually agree on the criteria.
  • Make sure you thoroughly understand how you will be evaluated and what measures are used.

I remember walking out of my first evaluation very confused in my first full-time role. The whole conversation had been very vague and hand-wavy, and I had no clarity about my strengths, weaknesses, or even steps to improve.

At first, it was easy to attribute everything to my manager because the new employee in me thought this was their job, not mine. But over time, I realized that I couldn't just take a backseat as far as my performance evaluations were concerned. You can't just do good work and expect it to be enough. You have to actively take part in these conversations. You have to make sure that your effort and contributions are being noticed. From regularly contributing to technical design conversations to setting up socials for your team, ensure that your work is acknowledged.

Tying into establishing expectations is also the importance of actively seeking feedback. Don't wait until your formal performance evaluations every three or four months to find out how you're doing. Actively set up a feedback loop with your manager. Try to have regular conversations where you're seeking feedback, as scary as that may be.

Navigate working in distributed teams

The workplace has evolved over the past two years, and working in remote and distributed teams is now the norm instead of a rarity. I've listed some tips to help you navigate working in distributed teams:

  • Establish core hours and set these on your calendar. These are a set of hours that your team will unanimously agree upon, and the understanding is that everyone should be online and responsive during these hours. This is also convenient because meetings only get scheduled within these hours, making it much easier to plan your day.
  • Be mindful of people's time zones and lunch hours.
  • In the virtual world, you need to make a greater effort to maintain social interactions, and little gestures can go a long way in helping make the work environment much friendlier. These include the following:
    • When starting meetings, exchange pleasantries and ask people how their weekend/day has been. This helps break the ice and enables you to build a more personal connection with your team members, which goes beyond work.
    • Suggest an informal virtual gathering periodically for some casual chit-chat with the team.

Maintain a work-life balance

At the beginning of your career, it's easy to think that it's all about putting in those hours, especially given the 'hustle culture' narrative that we're fed 24/7 and the idea that a work-life balance is established in the later stages of our careers. This idea couldn't be further from the truth because a work-life balance isn't just magically going to occur for you. You need to actively and very diligently work on it.

The scary thing about not having a work-life balance is that it slowly creeps up on you. It starts with checking emails after hours and gradually progresses to working weekends and feeling perpetually exhausted.


I've listed some tips to help you avoid this situation:

  • Turn off/pause notifications and emails and set yourself to offline.
  • Do not work weekends. It starts with you working one weekend, and the next thing you know, you're working most weekends. Whatever it is, it can wait until Monday.
  • If you're an on-call engineer, understand your company's policies around on-call work. Some companies offer monetary compensation, while others give time off in lieu. Use this time. Skipping benefits like PTO (paid time off) and wellness days only shortens how long you can sustainably stay at a job.

Wrap up

There's no doubt that starting a new job is stressful and difficult. I hope that these tips and tricks will make your first few months easier and set you up for great success with your new position. Remember to communicate, establish your career goals, take initiative, and use the company's tools effectively. I know you'll do great!

Original article source at: https://opensource.com/


Best 5 Open Source tools To Take Control Of Your Own Data

Take your data out of the hands of proprietary corporations and into your own hands with open source solutions.

Back in the old days, there was no cloud. Everything was on your phone. Maybe you had a microSD card that you backed up everything on. Eventually, the SD card would stop working, and you lost everything unless you'd saved it on a writable CD or DVD or stored it on your PC. Self-hosting was tough in those days, and it was expensive. Software wasn't as accessible as it is now.

Today, it's common for phones not to have an SD card slot. The good news is that software is good enough that you can back up everything you own on a single Raspberry Pi, spare laptop, or mini-PC.

You can own your own data and data stack by self-hosting. Containers and personal cloud software make it possible. In this article, I share several of my favorite ways to make that happen.

Containers

A container is software consisting of everything required for an application to work. Each container acts as its own computer and doesn't affect other containers or software on your host server. With this technology, you can keep your software up to date without breaking your system. It also allows you to control where data is stored, making backing up your data easy.

Learning to use containers can be intimidating. I started with Docker, although other container engines exist, including Podman. It didn't take long for me to get the hang of it. I found that containers make self-hosting services easier than ever. If you're familiar with installing applications from the Linux terminal, you'll get the hang of it quickly.

Syncthing

One of the easiest ways to back up your data is through Syncthing. This open source software synchronizes data across different devices. Select the folder you want to exist on two (or more) devices, and then that data and any changes to it are reliably kept updated on each device.

This isn't just a convenient way to share data; it's also a backup scheme. Should one hard drive go down, you have a copy of your important data on another device. Once you restore the broken PC, you can reconnect with Syncthing, and it synchronizes everything you lost. Syncthing is useful for storing data on multiple devices in different locations, including on machines outside your house (at a friend or family member's home, for instance). It's also a great off-site backup tool.
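If you run Syncthing in a container, the setup described above can be captured in a short Docker Compose file. This is a minimal sketch, not an official configuration: the host paths are placeholders you should change, while the image name and ports (8384 for the web GUI, 22000 for device-to-device sync traffic) follow the official syncthing/syncthing image's defaults.

```yaml
# docker-compose.yml -- minimal Syncthing sketch (host paths are placeholders)
services:
  syncthing:
    image: syncthing/syncthing
    restart: unless-stopped
    ports:
      - "8384:8384"    # web GUI
      - "22000:22000"  # sync protocol between devices
    volumes:
      # Keeping the data on a host directory makes it easy to back up or migrate
      - /home/you/sync:/var/syncthing
```

Start it with `docker compose up -d`, then open the web GUI on port 8384 to pair devices and choose which folders to keep in sync.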

Nextcloud

Nextcloud is an open source alternative to Google Drive or Dropbox. It's also multi-user, so once you install Nextcloud, you can set up distinct logins for each user. There are a variety of Nextcloud apps for phones and PCs. You can auto-synchronize your photos and then view photos from the app or a web browser. You can mark files public to share them with the rest of the internet.

Similar to Syncthing, a client can also synchronize files between your server and your desktop or laptop. Nextcloud also has components to let you manage contacts and calendars, and of course, you can synchronize them between other devices.

In fact, you can install many kinds of apps on Nextcloud, including programs to store notes, manage email, chat with others, and more. The Nextcloud environment includes an "app store" of open source applications.
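As a rough illustration of running Nextcloud in a container, here is a hedged Docker Compose sketch using the official nextcloud image. The host port and data path are arbitrary choices for this example, and a production setup would typically add a separate database container rather than relying on the image's built-in SQLite default.

```yaml
# docker-compose.yml -- minimal Nextcloud sketch (host path is a placeholder)
services:
  nextcloud:
    image: nextcloud
    restart: unless-stopped
    ports:
      - "8080:80"      # web interface exposed on the host's port 8080
    volumes:
      # User files, apps, and configuration all live in this host directory
      - /home/you/nextcloud:/var/www/html
```

Because everything lands in one host directory, backing up your Nextcloud instance is as simple as copying that folder.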

Jellyfin

If you're interested in managing your own media server, then you're in luck. Jellyfin takes your media, like movies, TV shows, and music, and makes it available to any device you allow to access your server. You can use Jellyfin to scrape the web for metadata, automatically retrieving cover art and media information.

Jellyfin also works without the internet. When your internet goes out and you can't connect to your favorite streaming service, you can use your local network to connect to your Jellyfin server and watch or listen to your media. I have had this happen, and I can attest that it's a great way to keep yourself and your family entertained.
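A containerized Jellyfin server can be sketched the same way. Assuming the official jellyfin/jellyfin image, with placeholder host paths and your media mounted read-only:

```yaml
# docker-compose.yml -- minimal Jellyfin sketch (host paths are placeholders)
services:
  jellyfin:
    image: jellyfin/jellyfin
    restart: unless-stopped
    ports:
      - "8096:8096"    # web client and API
    volumes:
      - /home/you/jellyfin/config:/config   # server settings and metadata
      - /home/you/media:/media:ro           # your movies, shows, and music
```

The `:ro` suffix mounts the media directory read-only, so the server can stream your files but never modify them.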

Home server

These are just a few services you can install on any Linux PC or laptop. You need a server that's always on to ensure your services are constantly available. That doesn't necessitate a major investment, though. You can use many kinds of computers as Linux servers. The easiest and most inexpensive is a Raspberry Pi, which has excellent support with a helpful and enthusiastic community.

Getting a Raspberry Pi set up is "as easy as pie," thanks to the Raspberry Pi imager. It only needs about 5W of power, so it doesn't take much energy to keep it running. There are many similar low-powered devices, including the Odroid, Orange Pi, and Rockpi.

You can also install Linux on any PC or laptop and run it as a server. It's a great way to repurpose old computers.

Finally, you could use a Virtual Private Server (VPS). A VPS is a "slice" of space on a server located in a big data center. You pay rent on the server space and maintain it as you wish.

Your own data

When you put data on the cloud, it can be used without your control or consent. It may even be used without your knowledge. I don't foresee that issue improving.

We don't need private companies handling our data anymore. You can often replace corporate services to reduce the amount of data you're giving away.

In my opinion, we should all own our data, and we need to do it correctly, with open source. We can host services for personal use and for family and friends. I synchronize my calendar and contacts with my personal server (a Raspberry Pi in my home). I think it's worth fighting for, and there's no better time than right now.

Original article source at: https://opensource.com/


Best 4 Questions Open Source Engineers Should Ask to Mitigate Risk

What do you do with a finite amount of time to deal with an infinite number of things that can go wrong? 

At Shopify, we use and maintain a lot of open source projects, and every year we prepare for Black Friday Cyber Monday (BFCM) and other high-traffic events to make sure our merchants can sell to their buyers. To do this, we built a large-scale infrastructure platform that is highly complex, interconnected, and globally distributed, requiring thoughtful technology investments from a network of teams. We're changing how the internet works, at a scale where no single person can oversee the full design and detail.

Over BFCM 2022, we served 75.98M requests per minute to our commerce platform at peak. That’s 1.27M requests per second. Working at this massive scale in a complex and interdependent system, it would be impossible to identify and mitigate every possible risk. This article breaks down a high-level risk mitigation process into four questions that can be applied to nearly any scenario to help you make the best use of your time and resources available.

1. What are the risks?

To inform mitigation decisions, you must first understand the current state of affairs. We expand our breadth of knowledge by learning from people from all corners of the platform. We run “what could go wrong” (WCGW) exercises where anyone building or interested in infrastructure can highlight a risk. These can be technology risks, operational risks, or something else. Having this unfiltered list is a great way to get a broad understanding of what could happen.

The goal here is visibility.

2. What is worth mitigating?

Great brainstorming leaves us with a large and daunting list of risks. With limited time to fix things, the key is to prioritize what is most important to our business. To do this, we vote on risks and then gather technical experts to discuss the highest-ranked risks in more detail, including their likelihood and severity. We make decisions about what to mitigate and how, and which team will own each action item.

The goal here is to optimize how we spend our time.

3. Who makes what decisions?

In any organization, there are times when waiting for a perfect consensus is not possible or effective. Shopify moves tremendously fast because we identify decision makers and then empower them to gather input, weigh risks and rewards, and come to a decision. Often the decision is best made by the subject matter expert, or by whoever bears the most benefit or repercussions of whichever direction we choose.

The goal here is to align incentives and accountability.

4. How do you communicate?

We move fast but still need to keep stakeholders and close collaborators informed. We summarize key findings and risks from our WCGW exercises so that we all land on the same page about our risk profile. This may include key risks or single points of failure. We over-communicate so that we’re aligned and aware and stakeholders have opportunities to interject.

The goal here is alignment and awareness.

Solving the right things when there is uncertainty

Underlying all these questions is the uncertainty in our working environment. You never have all the facts or know exactly which components will fail when and how. The best way to deal with uncertainty is by using probability.

Expert poker players know that great bets don’t always yield great outcomes, and bad bets don’t always yield bad outcomes. What’s important is to bet on the probability of outcomes, where over enough rounds, your results will converge to expectation. The same applies in engineering, where we constantly make bets and learn from them. Great bets require clearly distinguishing the quality of your decisions versus outcomes. It means not over-indexing on bad decisions that led to lucky outcomes or great decisions that happen to run into very unlucky scenarios.

Knowing that we can’t control everything also helps us stay calm, which is vital for us to practice good judgment in high-pressure situations.

When it comes to BFCM (and life in general), no one can predict the future or fully protect against all risks. The question is, what would you change looking back? In hindsight, would you feel confident that you prioritized the most important things and made thoughtful bets using the information available? Did you facilitate meaningful discussions with the right people? Could you justify your actions to your customers and their customers?

Original article source at: https://opensource.com/

Lawrence Lesch

Tonal: A Functional Music Theory Library for Javascript


tonal is a music theory library. It contains functions to manipulate the tonal elements of music: notes, intervals, chords, scales, modes, and keys. It deals with abstractions, not actual music or sound.

tonal is implemented in TypeScript and published as a collection of JavaScript npm packages.

It uses a functional programming style: all functions are pure, there is no data mutation, and entities are represented by data structures instead of objects.

Example

import { Interval, Note, Scale } from "tonal";

Note.midi("A4"); // => 69
Note.freq("a4"); // => 440
Note.accidentals("c#2"); // => '#'
Note.transpose("C4", "5P"); // => "G4"
Interval.semitones("5P"); // => 7
Interval.distance("C4", "G4"); // => "5P"
Scale.get("C major").notes; // => ["C", "D", "E", "F", "G", "A", "B"]


Install all packages at once:

npm install --save tonal


Tonal is compatible with both ES5 and ES6 modules, as well as the browser.

ES6 import:

import { Note, Scale } from "tonal";

ES5 require:

const { Note, Scale } = require("tonal");


You can use the browser version from jsdelivr CDN directly in your html:

<script src="https://cdn.jsdelivr.net/npm/tonal/browser/tonal.min.js"></script>

Or, if you prefer, grab the minified, browser-ready version from the repository.

Bundle size

The tonal package includes all published modules.

Although the final bundle is small, you can reduce bundle size even more by installing the modules individually and importing only the functions you need.

Note that individual modules are prefixed with @tonaljs/. For example:

npm i @tonaljs/note

import { transpose } from "@tonaljs/note";
transpose("A4", "P5"); // => "E5"


Generally, you just need to install the tonal package (it was previously called @tonaljs/tonal).

The API documentation is inside README.md of each module 👇

Notes and intervals

Scales and chords

Keys, chord progressions

Time, rhythm


Contributing

Read the contributing document. To contribute, open a PR and ensure that:

  • If it is a music theory change (like the name of a scale), you link to reliable references.
  • If it is a new feature, you add documentation: changes to the README of the affected module(s) are expected.
  • You add tests: changes to the test.ts file of the affected module(s) are expected.
  • All tests are green.


This library takes inspiration from other music theory libraries:

Projects using tonal

Showcase of projects that are using Tonal:

Thank you all!

Add your project here by editing this file

Download Details:

Author: Tonaljs
Source Code: https://github.com/tonaljs/tonal 

Gordon Matlala


How to Build and Scale a Remote Engineering Team

Building and scaling engineering teams is more complicated than just hiring additional people. As teams grow, everything has to change. In the latest on-demand Toptal Webinar, our Vice President of R&D Bozhidar Batsov and Chief People Officer Michelle Labbe discuss how to build and scale best-in-class engineering teams.

Companies that have been successful as small entities often think that scaling up means doing what they’ve been doing but on a larger scale. This couldn’t be further from the truth. For instance, building and scaling engineering teams is more complicated than just hiring additional people. As teams grow, everything—from an organization’s structure to its culture to its skills training—has to change.

In this new on-demand webinar, Toptal’s Vice President of R&D Bozhidar Batsov and Chief People Officer Michelle Labbe discuss building and scaling best-in-class engineering teams, taking into account the unique circumstances that many companies are operating under currently.

Though the pandemic will end at some point, the changes it has brought about in how we work are likely here to stay. More and more companies are realizing there are benefits to having remote teams, and many workers are interested in maintaining a lifestyle in which they’re more productive yet also have more free time.

The best companies know this and have created cultures intended to attract top talent and keep them happy. As Bozhidar and Michelle point out, these efforts include recognizing that trust is critical, that different people are needed at different stages of a company’s growth, and that offering the most challenging and interesting projects to your best people really will aid in retention.

Other topics they discuss include:

  • The need to make tough decisions as a company grows
  • Why people are more important than technology in any company
  • The importance of video calls in building team connectivity
  • The role of engagement in a company’s culture and why it’s key to success

What may work for a smaller organization won’t necessarily scale up, and companies need to make tough decisions as they grow and mature. Bozhidar and Michelle offer solutions on how to handle the transitions, which include looking beyond technology challenges and focusing on effective communication on a large scale.

Click here to access this webinar and learn more about how to define success in a remote engineering world.

Original article source at: https://www.toptal.com/


How to Build and Scale a Remote Engineering Team

A Tale of AWS Cost Optimization: Efficiency at Scale

Understanding total spend is a common challenge for cloud users, especially on projects with complex pricing models. This article explores the top AWS cost optimizations that will help you scale your platform effectively.

I recently launched a cryptocurrency analysis platform, expecting a small number of daily users. However, when some popular YouTubers found the site helpful and published a review, traffic grew so quickly that it overloaded the server, and the platform (Scalper.AI) became inaccessible. My original AWS EC2 environment needed extra support. After considering multiple solutions, I decided to use AWS Elastic Beanstalk to scale my application. Things were looking good and running smoothly, but I was taken aback by the costs in the billing dashboard.

This is not an uncommon issue. A survey from 2021 found that 82% of IT and cloud decision-makers have encountered unnecessary cloud costs, and 86% don’t feel they can get a comprehensive view of all their cloud spending. Though Amazon offers a detailed overview of additional expenses in its documentation, the pricing model is complex for a growing project. To make things easier to understand, I’ll break down a few relevant optimizations to reduce your cloud costs.

Why I Chose AWS

The goal of Scalper.AI is to collect information about cryptocurrency pairs (the assets swapped when trading on an exchange), run statistical analyses, and provide crypto traders with insights about the state of the market. The technical structure of the platform consists of three parts:

  • Data ingestion scripts
  • A web server
  • A database

The ingestion scripts gather data from different sources and load it to the database. I had experience working with AWS services, so I decided to deploy these scripts by setting up EC2 instances. EC2 offers many instance types and lets you choose an instance’s processor, storage, network, and operating system.

I chose Elastic Beanstalk for the remaining functionality because it promised smooth application management. The load balancer properly distributed the burden among my server’s instances, while the autoscaling feature handled adding new instances for an increased load. Deploying updates became very easy, taking just a few minutes.

Scalper.AI worked stably, and my users no longer faced downtime. Of course, I expected an increase in spending since I added extra services, but the numbers were much larger than I had predicted.

How I Could Have Reduced Cloud Costs

Looking back, there were many areas of complexity in my project’s use of AWS services. Below, I’ll examine the budget optimizations I discovered while working with common AWS EC2 features: burstable performance instances, outbound data transfers, Elastic IP addresses, and terminate and stop states.

Burstable Performance Instances

My first challenge was supporting CPU power consumption for my growing project. Scalper.AI’s data ingestion scripts provide users with real-time information analysis; the scripts run every few seconds and feed the platform with the most recent updates from crypto exchanges. Each iteration of this process generates hundreds of asynchronous jobs, so the site’s increased traffic necessitated more CPU power to decrease processing time.

The cheapest instance offered by AWS with four vCPUs, a1.xlarge, would have cost me ~$75 per month at the time. Instead, I decided to spread the load between two t3.micro instances with two vCPUs and 1GB of RAM each. The t3.micro instances offered enough speed and memory for the job I needed at one-fifth of the a1.xlarge’s cost. Nevertheless, my bill was still larger than I expected at the end of the month.
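It helps to do the math on why two t3.micro instances looked so attractive. The sketch below is a minimal cost comparison using on-demand hourly rates close to the figures quoted above (assumed: $0.102/hour for a1.xlarge, $0.0104/hour for t3.micro); exact prices vary by region and change over time, so treat them as illustrative.

```python
# Rough monthly cost comparison of the two instance options.
# Hourly rates are assumptions approximating the "$75/month" figure above;
# check the current EC2 pricing page for real numbers.
HOURS_PER_MONTH = 730

def monthly_cost(hourly_rate: float, count: int = 1) -> float:
    """On-demand cost for `count` instances running all month."""
    return round(hourly_rate * HOURS_PER_MONTH * count, 2)

a1_xlarge = monthly_cost(0.102)         # one a1.xlarge (4 vCPUs)
two_t3_micro = monthly_cost(0.0104, 2)  # two t3.micro (2 vCPUs each)

print(a1_xlarge)      # ~$74/month
print(two_t3_micro)   # ~$15/month
```

At roughly $15 versus $75 per month, the pair of t3.micro instances comes in at about one-fifth of the cost, which is what made the trade-off tempting.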

In an effort to understand why, I searched Amazon’s documentation and found the answer: When an instance’s CPU usage falls below a defined baseline, it collects credits, but when the instance bursts above baseline usage, it consumes the previously earned credits. If there are no credits available, the instance spends Amazon-provided “surplus credits.” This ability to earn and spend credits causes Amazon EC2 to average an instance’s CPU usage over 24 hours. If the average usage goes above the baseline, the instance is billed extra at a flat rate per vCPU-hour.

I monitored the data ingestion instances for multiple days and found that my CPU setup, which was intended to cut costs, did the opposite. Most of the time, my average CPU usage was higher than the baseline.

[Chart: daily cost in dollars (top) and CPU credit usage in vCPU-hours (bottom) for USE2-CPUCredits:t3, Feb 10–19, 2022, both trending upward.] The chart displays cost surges and increasing CPU credit usage during a period when CPU usage was above the baseline. The dollar cost is proportional to the surplus credits spent, since the instance is billed per vCPU-hour.


I had initially analyzed CPU usage for a few crypto pairs; the load was small, so I thought I had plenty of space for growth. (I used just one micro-instance for data ingestion since fewer crypto pairs did not require as much CPU power.) ​However, I realized the limitations of my original analysis once I decided to make my insights more comprehensive and support the ingestion of data for hundreds of crypto pairs—cloud service analysis means nothing unless performed at the correct scale.

Outbound Data Transfers

Another result of my site’s expansion was increased data transfers from my app due to a small bug. With traffic growing steadily and no more downtime, I needed to add features to capture and hold users’ attention as soon as possible. My newest update was an audio alert triggered when a crypto pair’s market conditions matched the user’s predefined parameters. Unfortunately, I made a mistake in the code, and audio files loaded into the user’s browser hundreds of times every few seconds.

The impact was huge. My bug generated audio downloads from my web servers, causing additional outbound data transfers. A tiny error in my code resulted in a bill almost five times larger than the previous ones. (This wasn’t the only consequence: The bug could cause a memory leak in the user’s computer, so many users stopped coming back.)


[Chart: daily cost in dollars (top) and outbound data usage in GB (bottom) for USE2-DataTransfer-Out-Bytes, Jan 6–15, 2022, both trending upward.] The chart displays cost surges and increasing outbound data transfers. Because outbound data transfers are billed per GB, the dollar cost is proportional to the outbound data usage.

Data transfer costs can account for upward of 30% of AWS price surges. EC2 inbound transfer is free, but outbound transfer charges are billed per GB ($0.09 per GB when I built Scalper.AI). As I learned the hard way, it is important to be cautious with code affecting outbound data; reducing downloads or file loading where possible (or carefully monitoring these areas) will protect you from higher fees. These pennies add up quickly since charges for transferring data from EC2 to the internet depend on the workload and AWS Region-specific rates. A final caveat unknown to many new AWS customers: Data transfer becomes more expensive between different locations. However, using private IP addresses can prevent extra data transfer costs between different availability zones of the same region.
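A flat-rate estimate is enough to sanity-check a bill against your transfer metrics. The sketch below uses the $0.09/GB figure quoted above; real AWS pricing is tiered by volume and region-specific, so this is a rough anomaly detector rather than an exact bill:

```python
# Flat-rate sketch of EC2 -> internet transfer cost at the $0.09/GB rate
# mentioned in the article. Real pricing is tiered and varies by region.
RATE_PER_GB = 0.09  # USD per GB (rate when Scalper.AI was built)

def outbound_cost(gb_out: float) -> float:
    """Estimated charge for `gb_out` gigabytes leaving EC2 to the internet."""
    return round(gb_out * RATE_PER_GB, 2)

# Roughly the worst single day of the audio-download bug (~320 GB out):
print(outbound_cost(320))  # ~$28.80 in one day
```

A few hundred gigabytes of accidental downloads per day is all it takes to multiply a small project’s bill.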

Elastic IP Addresses

Even when using public addresses such as Elastic IP addresses (EIPs), it is possible to lower your EC2 costs. EIPs are static IPv4 addresses used for dynamic cloud computing. The “elastic” part means that you can assign an EIP to any EC2 instance and use it until you choose to stop. These addresses let you seamlessly swap unhealthy instances with healthy ones by remapping the address to a different instance in your account. You can also use EIPs to specify a DNS record for a domain so that it points to an EC2 instance.

AWS provides only five EIPs per account per region, making them a limited resource and costly with inefficient use. AWS charges a low hourly rate for each additional EIP and bills extra if you remap an EIP more than 100 times in a month; staying under these limits will lower costs.
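The EIP charges are easy to model. The rates below ($0.005/hour for each EIP beyond the allowance, $0.10 per remap after the first 100 in a month) are assumptions from my recollection of AWS pricing at the time; confirm them on the current pricing page before budgeting:

```python
# Sketch of monthly Elastic IP charges under two assumed rates:
# an hourly fee per additional EIP and a per-remap fee past the free quota.
HOURLY_RATE = 0.005   # USD/hour per EIP beyond the allowance (assumption)
REMAP_RATE = 0.10     # USD per remap beyond the free quota (assumption)
FREE_REMAPS = 100

def eip_monthly_cost(extra_eips: int, remaps: int, hours: int = 730) -> float:
    """Estimated monthly EIP cost for `extra_eips` addresses and `remaps` remappings."""
    hourly_fees = extra_eips * HOURLY_RATE * hours
    remap_fees = max(0, remaps - FREE_REMAPS) * REMAP_RATE
    return round(hourly_fees + remap_fees, 2)

print(eip_monthly_cost(extra_eips=2, remaps=150))  # 2*0.005*730 + 50*0.10
```

Staying within the allowance and under the remap quota keeps this line of the bill at zero.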

Terminate and Stop States

AWS provides two options for managing the state of running EC2 instances: terminate or stop. Terminating will shut down the instance, and the virtual machine provisioned for it will no longer be available. Any attached Elastic Block Store (EBS) volumes will be detached and deleted, and all data stored locally in the instance will be lost. You will no longer be charged for the instance.

Stopping an instance is similar, with one small difference. The attached EBS volumes are not deleted, so their data is preserved, and you can restart the instance at any time. In both cases, Amazon no longer charges for using the instance, but if you opt for stopping instead of terminating, the EBS volumes will generate a cost as long as they exist. AWS recommends stopping an instance only if you expect to reactivate it soon.
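The cost of a stopped instance is just the cost of the EBS storage left behind. A minimal sketch, assuming a gp2-style rate of $0.10 per GB-month (the actual rate depends on volume type and region):

```python
# While an instance is stopped, you pay for its attached EBS volumes, not
# for compute. The storage rate below is an assumption; check EBS pricing.
GP2_RATE = 0.10  # USD per GB-month (assumed gp2-style rate)

def stopped_instance_storage_cost(volume_gbs: list, months: float = 1) -> float:
    """Estimated EBS cost of keeping a stopped instance's volumes around."""
    return round(sum(volume_gbs) * GP2_RATE * months, 2)

# A stopped instance with a 30 GB root volume and a 100 GB data volume:
print(stopped_instance_storage_cost([30, 100]))  # per month, indefinitely
```

Small per-volume charges like this accrue silently for as long as the volumes exist, which is why AWS recommends stopping only when you expect to restart soon.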

But there’s a feature that can enlarge your AWS bill at the end of the month even if you terminated an instance instead of stopping it: EBS snapshots. These are incremental backups of your EBS volumes stored in Amazon’s Simple Storage Service (S3). Each snapshot holds the information you need to create a new EBS volume with your previous data. If you terminate an instance, its associated EBS volumes will be deleted automatically, but its snapshots will remain. As S3 charges by the volume of data stored, I recommend that you delete these snapshots if you won’t use them shortly. AWS features the ability to monitor per-volume storage activity using the CloudWatch service:

  1. While logged into the AWS Console, from the top-left Services menu, find and open the CloudWatch service.
  2. On the left side of the page, under the Metrics collapsible menu, click on All Metrics.
  3. The page shows a list of services with metrics available, including EBS, EC2, S3, and more. Click on EBS and then on Per-volume Metrics. (Note: The EBS option will be visible only if you have EBS volumes configured on your account.)
  4. Click on the Query tab. In the Editor view, copy and paste the command SELECT AVG(VolumeReadBytes) FROM "AWS/EBS" GROUP BY VolumeId and then click Run. (Note: CloudWatch uses a dialect of SQL with a unique syntax.)


[Screenshot: the CloudWatch console with the Metrics Insights query editor open, an empty graph, and the Query tab selected.] An overview of the CloudWatch monitoring setup described above (shown with empty data and no metrics selected). If you have existing EBS, EC2, or S3 resources on your account, these will show up as metric options and will populate your CloudWatch graph.

CloudWatch offers a variety of visualization formats for analyzing storage activity, such as pie charts, lines, bars, stacked area charts, and numbers. Using CloudWatch to identify inactive EBS volumes and snapshots is an easy step toward optimizing cloud costs.

Extra Money in Your Pocket

Though AWS tools such as CloudWatch offer decent solutions for cloud cost monitoring, various external platforms integrate with AWS for more comprehensive analysis. For example, cloud management platforms like VMware’s CloudHealth show a detailed breakdown of top spending areas that can be used for trend analysis, anomaly detection, and cost and performance monitoring. I also recommend that you set up a CloudWatch billing alarm to detect any surges in charges before they become excessive.

Amazon provides many great cloud services that can help you delegate the maintenance work of servers, databases, and hardware to the AWS team. Though cloud platform costs can easily grow due to bugs or user errors, AWS monitoring tools equip developers with the knowledge to defend themselves from additional expenses.

With these cost optimizations in mind, you’re ready to get your project off the ground—and save hundreds of dollars in the process.



As an Advanced Consulting Partner in the Amazon Partner Network (APN), Toptal offers companies access to AWS-certified experts, on demand, anywhere in the world.

Original article source at: https://www.toptal.com/

Royce Reinger


Cortex: Production infrastructure for Machine Learning At Scale


Production infrastructure for machine learning at scale

Deploy, manage, and scale machine learning models in production.

Serverless workloads

Realtime - respond to requests in real-time and autoscale based on in-flight request volumes.

Async - process requests asynchronously and autoscale based on request queue length.

Batch - run distributed and fault-tolerant batch processing jobs on-demand.

Automated cluster management

Autoscaling - elastically scale clusters with CPU and GPU instances.

Spot instances - run workloads on spot instances with automated on-demand backups.

Environments - create multiple clusters with different configurations.

CI/CD and observability integrations

Provisioning - provision clusters with declarative configuration or a Terraform provider.

Metrics - send metrics to any monitoring tool or use pre-built Grafana dashboards.

Logs - stream logs to any log management tool or use the pre-built CloudWatch integration.

Built for AWS

EKS - Cortex runs on top of EKS to scale workloads reliably and cost-effectively.

VPC - deploy clusters into a VPC on your AWS account to keep your data private.

IAM - integrate with IAM for authentication and authorization workflows.

Download Details:

Author: Cortexlabs
Source Code: https://github.com/cortexlabs/cortex 
License: Apache-2.0 license

Hermann Frami


TiDB: An Open-source, Cloud-native, Distributed

What is TiDB?

TiDB ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

For more details and latest updates, see TiDB docs and release notes.

For future plans, see TiDB Roadmap.

Quick start

Start with TiDB Cloud

TiDB Cloud is the fully-managed service of TiDB, currently available on AWS and GCP.

Quickly check out TiDB Cloud with a free trial.

See TiDB Cloud Quick Start Guide.

Start with TiDB

See TiDB Quick Start Guide.

Start developing TiDB

See Get Started chapter of TiDB Dev Guide.


You can join these groups and chats to discuss and ask TiDB related questions:

In addition, you may enjoy following:

For support, please contact PingCAP.


The community repository hosts all information about the TiDB community, including how to contribute to TiDB, how the TiDB community is governed, how special interest groups are organized, and more.


Contributions are welcome and greatly appreciated. All contributors are welcome to claim a reward by filing this form. See Contribution to TiDB for details on typical contribution workflows. For more contributing information, click on the contributor icon above.

Case studies




Download Details:

Author: Pingcap
Source Code: https://github.com/pingcap/tidb 
License: Apache-2.0 license

Rupert Beatty


CollectionViewPagingLayout - PagingView for SwiftUI


Layout Designer


Custom implementations, UIKit: TransformableView, SwiftUI: TransformPageView



UIKit: SnapshotTransformView, SwiftUI: SnapshotPageView

UIKit: ScaleTransformView, SwiftUI: ScalePageView

UIKit: StackTransformView, SwiftUI: StackPageView



A simple but powerful framework that lets you make complex layouts for your UICollectionView.
The implementation is quite simple. Just a custom UICollectionViewLayout that gives you the ability to apply transforms to the cells.
No UICollectionView inheritance or anything like that.

A simple View that lets you make page-view effects.
Powered by UICollectionView

For more details, see How to use


This framework doesn't contain any external dependencies.


# Podfile

target 'YOUR_TARGET_NAME' do
    pod 'CollectionViewPagingLayout'
end

Replace YOUR_TARGET_NAME and then, in the Podfile directory, type:

$ pod install


Add this to Cartfile

github "amirdew/CollectionViewPagingLayout"

and then, in the Cartfile directory, type:

$ carthage update

Swift Package Manager

using Xcode:

File > Swift Packages > Add Package Dependency


Just add all the files under Lib directory to your project

How to use

Using Layout Designer

There is a macOS app to make it even easier for you to build your custom layout.
It allows you to tweak many options and see the result in real time.
It also generates the code for you, so you can copy it into your project.

You can purchase the app from the App Store to support this repository, or you can build it yourself from the source.
Yes, the macOS app is open-source too!

Continue for SwiftUI or UIKit


Specify the number of visible items:
Since this layout gives you the flexibility to show the next and previous cells, by default it loads all of the cells in the collection view's frame, which means iOS keeps all of them in memory.
Based on your design, you can specify the number of items that you actually need to show.

It doesn't support RTL layouts.
However, you can achieve a similar result by tweaking options; for instance, try StackTransformViewOptions.Layout.reverse.


Download Details:

Author: Amirdew
Source Code: https://github.com/amirdew/CollectionViewPagingLayout 
License: MIT license

Lawrence Lesch


Image-scale: Scale Images to Fit Or Fill any Target Container

Image Scale

Scale images to fit or fill any target container via two simple properties: scale and align.

This plugin is greatly inspired by Sproutcore's SC.ImageView.


image-scale depends on jQuery. To use it, include this in your page:

<script src="jquery.js" type="text/javascript"></script>
<script src="image-scale.js" type="text/javascript"></script>


If you want to identify the images that you want to scale, you can add a class to them. In this example, we are adding a class called scale.

You can also set the data-scale and data-align attributes directly to the images if you want to override the default setting.

<div class="image-container">
  <img class="scale" data-scale="best-fit-down" data-align="center" src="img/example.jpg">
</div>

Now add this JavaScript code to your page:

$(function() {
  $("img.scale").imageScale();
});
You're done.



Determines how the image will scale to fit within its containing space. Possible values:

  • fill - stretches or compresses the source image to fill the target frame
  • best-fill - fits the shortest side of the source image within the target frame while maintaining the original aspect ratio
  • best-fit - fits the longest edge of the source image within the target frame while maintaining the original aspect ratio
  • best-fit-down - same as best-fit but will not stretch the source if it is smaller than the target
  • none - the source image is left unscaled
Type: String
Default: best-fill


Align the image within its frame. Possible values:

  • left
  • right
  • center
  • top
  • bottom
  • top-left
  • top-right
  • bottom-left
  • bottom-right
Type: String
Default: center


A jQuery Object against which the image size will be calculated. If null, the parent of the image will be used.

Type: jQuery Object
Default: null


A boolean determining if the parent should hide its overflow.

Type: Boolean
Default: true


A duration in milliseconds determining how long the fadeIn animation will run the first time your image is scaled.

Set it to 0 if you don't want any animation.

Type: Number or String
Default: 0


A boolean indicating if the image size should be rescaled when the window is resized.

The window size is checked using requestAnimationFrame for good performance.


  rescaleOnResize: true
Type: Boolean
Default: false


A function that will be called each time the receiver is scaled.


  didScale: function(firstTime, options) {
    console.log('did scale img: ', this.element);
  }
Type: Function
  - firstTime {Boolean} true if the image was scaled for the first time.
  - options {Object} the options passed to the scale method.


A number indicating the debug level:

  0. silent
  1. error
  2. error & warning
  3. error & warning & notice
Type: Number
Default: 0



Main method. Used to scale the images.

When rescaleOnResize is set to true, this method is executed each time the window size changes.

If rescaleOnResize is set to false, you may want to call it manually. Here is an example on how you should do it:



Removes the data for the element.

Here is an example on how you can call the destroy method:



See it in action on our home page.

You can also check out the Sproutcore Automatic Image Scaling demo to understand the difference between all the different options.


Original Size: 4.3KB gzipped (15.04KB uncompressed)

Compiled Size: 1.9KB gzipped (4.65KB uncompressed)

Download Details:

Author: Gestixi
Source Code: https://github.com/gestixi/image-scale 
License: MIT license

Nat Grady


GGalt: Extra Coordinate Systems, Geoms, Statistical Transformations

ggalt : Extra Coordinate Systems, Geoms, Statistical Transformations, Scales & Fonts for ‘ggplot2’

A compendium of ‘geoms’, ‘coords’, ‘stats’, scales and fonts for ‘ggplot2’, including splines, 1d and 2d densities, univariate average shifted histograms, a new map coordinate system based on the ‘PROJ.4’ library, and the ‘StateFace’ open-source font from ProPublica.

The following functions are implemented:

geom_ubar : Uniform width bar charts

geom_horizon : Horizon charts (modified from https://github.com/AtherEnergy/ggTimeSeries)

coord_proj : Like coord_map, only better (prbly shld use this with geom_cartogram as geom_map’s new defaults are ugh)

geom_xspline : Connect control points/observations with an X-spline

stat_xspline : Connect control points/observations with an X-spline

geom_bkde : Display a smooth density estimate (uses KernSmooth::bkde)

geom_stateface: Use ProPublica’s StateFace font in ggplot2 plots

geom_bkde2d : Contours from a 2d density estimate. (uses KernSmooth::bkde2D)

stat_bkde : Display a smooth density estimate (uses KernSmooth::bkde)

stat_bkde2d : Contours from a 2d density estimate. (uses KernSmooth::bkde2D)

stat_ash : Compute and display a univariate averaged shifted histogram (polynomial kernel) (uses ash::ash1/ash::bin1)

geom_encircle: Automatically enclose points in a polygon

byte_format: + helpers. e.g. turn 10000 into 10 Kb

geom_lollipop(): Dead easy lollipops (horizontal or vertical)

geom_dumbbell() : Dead easy dumbbell plots

stat_stepribbon() : Step ribbons

annotation_ticks() : Add minor ticks to identity, exp(1) and exp(10) axis scales independently of each other.

geom_spikelines() : Instead of geom_vline and geom_hline, draws a pair of segments originating from the same c(x,y) point to the respective axes.

plotly integration for a few of the ^^ geoms


# you'll want to see the vignettes, trust me
install.packages("ggalt")
# OR: devtools::install_github("hrbrmstr/ggalt")



# current version
packageVersion("ggalt")
## [1] '0.6.1'

dat <- data.frame(x=c(1:10, 1:10, 1:10),
                  y=c(sample(15:30, 10), 2*sample(15:30, 10), 3*sample(15:30, 10)),
                  group=factor(c(rep(1, 10), rep(2, 10), rep(3, 10))))

Horizon Chart

Example carved from: https://github.com/halhen/viz-pub/blob/master/sports-time-of-day/2_gen_chart.R


sports <- read_tsv("https://github.com/halhen/viz-pub/raw/master/sports-time-of-day/activity.tsv")

sports %>%
  group_by(activity) %>% 
  filter(max(p) > 3e-04, 
         !grepl('n\\.e\\.c', activity)) %>% 
  arrange(time) %>%
  mutate(p_peak = p / max(p), 
         p_smooth = (lag(p_peak) + p_peak + lead(p_peak)) / 3,
         p_smooth = coalesce(p_smooth, p_peak)) %>% 
  ungroup() %>%
  do({
    rbind(.,
          filter(., time == 0) %>%
            mutate(time = 24*60))
  }) %>%
  mutate(time = ifelse(time < 3 * 60, time + 24 * 60, time)) %>%
  mutate(activity = reorder(activity, p_peak, FUN=which.max)) %>% 
  arrange(activity) %>%
  mutate(activity.f = reorder(as.character(activity), desc(activity))) -> sports

sports <- mutate(sports, time2 = time/60)

ggplot(sports, aes(time2, p_smooth)) +
  geom_horizon(bandwidth=0.1) +
  facet_grid(activity.f~.) +
  scale_x_continuous(expand=c(0,0), breaks=seq(from = 3, to = 27, by = 3), labels = function(x) {sprintf("%02d:00", as.integer(x %% 24))}) +
  viridis::scale_fill_viridis(name = "Activity relative to peak", discrete=TRUE,
                              labels=scales::percent(seq(0, 1, 0.1)+0.1)) +
  labs(x=NULL, y=NULL, title="Peak time of day for sports and leisure",
       subtitle="Number of participants throughout the day compared to peak popularity.\nNote the morning-and-evening everyday workouts, the midday hobbies,\nand the evenings/late nights out.") +
  theme_ipsum_rc(grid="") +
  theme(panel.spacing.y=unit(-0.05, "lines")) +
  theme(strip.text.y = element_text(hjust=0, angle=360)) +
  theme(axis.text.y=element_blank())


ggplot(dat, aes(x, y, group=group, color=group)) +
  geom_point() +
  geom_line()

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point() +
  geom_line() +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(spline_shape=-0.4, size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(spline_shape=0.4, size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(spline_shape=1, size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(spline_shape=0, size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(spline_shape=-1, size=0.5)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Alternate (better) density plots

# bkde

data(geyser, package="MASS")

ggplot(geyser, aes(x=duration)) +
  stat_bkde(alpha=1/2)
## Bandwidth not specified. Using '0.14', via KernSmooth::dpik.

ggplot(geyser, aes(x=duration)) +
  geom_bkde(alpha=1/2)
## Bandwidth not specified. Using '0.14', via KernSmooth::dpik.

ggplot(geyser, aes(x=duration)) +
  stat_bkde(bandwidth=0.25)

ggplot(geyser, aes(x=duration)) +
  geom_bkde(bandwidth=0.25)

dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), 
                   rating = c(rnorm(200),rnorm(200, mean=.8)))

ggplot(dat, aes(x=rating, color=cond)) + geom_bkde(fill="#00000000")
## Bandwidth not specified. Using '0.36', via KernSmooth::dpik.
## Bandwidth not specified. Using '0.31', via KernSmooth::dpik.

ggplot(dat, aes(x=rating, fill=cond)) + geom_bkde(alpha=0.3)
## Bandwidth not specified. Using '0.36', via KernSmooth::dpik.
## Bandwidth not specified. Using '0.31', via KernSmooth::dpik.

# ash

dat <- data.frame(x=rnorm(100))
grid.arrange(ggplot(dat, aes(x)) + stat_ash(),
             ggplot(dat, aes(x)) + stat_bkde(),
             ggplot(dat, aes(x)) + stat_density(),
             nrow=3)
## Estimate nonzero outside interval ab.
## Bandwidth not specified. Using '0.43', via KernSmooth::dpik.

cols <- RColorBrewer::brewer.pal(3, "Dark2")
ggplot(dat, aes(x)) + 
  stat_ash(alpha=1/3, fill=cols[3]) + 
  stat_bkde(alpha=1/3, fill=cols[2]) + 
  stat_density(alpha=1/3, fill=cols[1]) + 
  geom_rug() +
  labs(x=NULL, y="density/estimate") +
  scale_x_continuous(expand=c(0,0)) +
  theme_bw() +
  theme(panel.grid=element_blank())
## Estimate nonzero outside interval ab.
## Bandwidth not specified. Using '0.43', via KernSmooth::dpik.

Alternate 2D density plots

m <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
       geom_point() +
       xlim(0.5, 6) +
       ylim(40, 110)

m + geom_bkde2d(bandwidth=c(0.5, 4))

m + stat_bkde2d(bandwidth=c(0.5, 4), aes(fill = ..level..), geom = "polygon")

coord_proj LIVES! (still needs a teensy bit of work)

world <- map_data("world")
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##     map
world <- world[world$region != "Antarctica",]

gg <- ggplot()
gg <- gg + geom_cartogram(data=world, map=world,
                    aes(x=long, y=lat, map_id=region))
gg <- gg + coord_proj("+proj=wintri")

ProPublica StateFace

# Run show_stateface() to see the location of the TTF StateFace font
# You need to install it for it to work

dat <- data.frame(state=state.abb,
                  x=sample(100, 50),
                  y=sample(100, 50),
                  col=sample(c("#b2182b", "#2166ac"), 50, replace=TRUE),
                  sz=sample(6:15, 50, replace=TRUE))

gg <- ggplot(dat, aes(x=x, y=y))
gg <- gg + geom_stateface(aes(label=state, color=col, size=sz))
gg <- gg + scale_color_identity()
gg <- gg + scale_size_identity()

Encircling points automagically

d <- data.frame(x=c(1,1,2),y=c(1,2,2)*100)

gg <- ggplot(d,aes(x,y))
gg <- gg + scale_x_continuous(expand=c(0.5,1))
gg <- gg + scale_y_continuous(expand=c(0.5,1))

gg + geom_encircle(s_shape=1, expand=0) + geom_point()

gg + geom_encircle(s_shape=1, expand=0.1, colour="red") + geom_point()

gg + geom_encircle(s_shape=0.5, expand=0.1, colour="purple") + geom_point()

gg + geom_encircle(data=subset(d, x==1), colour="blue", spread=0.02) + geom_point()

gg + geom_encircle(data=subset(d, x==2), colour="cyan", spread=0.04) + geom_point()

gg <- ggplot(mpg, aes(displ, hwy))
gg + geom_encircle(data=subset(mpg, hwy>40)) + geom_point()

ss <- subset(mpg,hwy>31 & displ<2)

gg + geom_encircle(data=ss, colour="blue", s_shape=0.9, expand=0.07) +
  geom_point() + geom_point(data=ss, colour="blue")

Step ribbons

x <- 1:10
df <- data.frame(x=x, y=x+10, ymin=x+7, ymax=x+12)

gg <- ggplot(df, aes(x, y))
gg <- gg + geom_ribbon(aes(ymin=ymin, ymax=ymax),
                      stat="stepribbon", fill="#b2b2b2")
gg <- gg + geom_step(color="#2b2b2b")

gg <- ggplot(df, aes(x, y))
gg <- gg + geom_ribbon(aes(ymin=ymin, ymax=ymax),
                       stat="stepribbon", fill="#b2b2b2",
                       direction="vh")
gg <- gg + geom_step(color="#2b2b2b")

Lollipop charts

df <- read.csv(text="category,pct
South Asian/South Asian Americans,0.12
S Asian/Asian Americans,0.25
Muslim Observance,0.29
Africa/Pan Africa/African Americans,0.34
Gender Equity,0.34
Disability Advocacy,0.49
European/European Americans,0.52
Pacific Islander/Pacific Islander Americans,0.59
Non-Traditional Students,0.61
Religious Equity,0.64
Caribbean/Caribbean Americans,0.67
Middle Eastern Heritages and Traditions,0.73
Trans-racial Adoptee/Parent,0.76
Mixed Race,0.80
Jewish Heritage/Observance,0.85
International Students,0.87", stringsAsFactors=FALSE, sep=",", header=TRUE)
library(scales)
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##     discard
## The following object is masked from 'package:readr':
##     col_factor
gg <- ggplot(df, aes(y=reorder(category, pct), x=pct))
gg <- gg + geom_lollipop(point.colour="steelblue", point.size=2, horizontal=TRUE)
gg <- gg + scale_x_continuous(expand=c(0,0), labels=percent,
                              breaks=seq(0, 1, by=0.2), limits=c(0, 1))
gg <- gg + labs(x=NULL, y=NULL, 
                title="SUNY Cortland Multicultural Alumni survey results",
                subtitle="Ranked by race, ethnicity, home land and orientation\namong the top areas of concern",
                caption="Data from http://stephanieevergreen.com/lollipop/")
gg <- gg + theme_minimal(base_family="Arial Narrow")
gg <- gg + theme(panel.grid.major.y=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(axis.line.y=element_line(color="#2b2b2b", size=0.15))
gg <- gg + theme(axis.text.y=element_text(margin=margin(r=0, l=0)))
gg <- gg + theme(plot.margin=unit(rep(30, 4), "pt"))
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(margin=margin(b=10)))
gg <- gg + theme(plot.caption=element_text(size=8, margin=margin(t=10)))

library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

health <- read.csv("https://rud.is/dl/zhealth.csv", stringsAsFactors=FALSE, 
                   header=FALSE, col.names=c("pct", "area_id"))

areas <- read.csv("https://rud.is/dl/zarea_trans.csv", stringsAsFactors=FALSE, header=TRUE)

health %>% 
  mutate(area_id=trunc(area_id)) %>% 
  arrange(area_id, pct) %>% 
  mutate(year=rep(c("2014", "2013"), 26),
         pct=pct/100) %>% 
  left_join(areas, "area_id") %>% 
  mutate(area_name=factor(area_name, levels=unique(area_name))) -> health

setNames(bind_cols(filter(health, year==2014), filter(health, year==2013))[,c(4,1,5)],
         c("area_name", "pct_2014", "pct_2013")) -> health

gg <- ggplot(health, aes(x=pct_2014, xend=pct_2013, y=area_name, group=area_name))
gg <- gg + geom_dumbbell(colour="#a3c4dc", size=1.5, colour_xend="#0e668b", 
                         dot_guide=TRUE, dot_guide_size=0.15)
gg <- gg + scale_x_continuous(labels=percent)
gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(plot.background=element_rect(fill="#f7f7f7"))
gg <- gg + theme(panel.background=element_rect(fill="#f7f7f7"))
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.grid.major.y=element_blank())
gg <- gg + theme(panel.grid.major.x=element_line())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(legend.position="top")
gg <- gg + theme(panel.border=element_blank())


df <- data.frame(trt=LETTERS[1:5], l=c(20, 40, 10, 30, 50), r=c(70, 50, 30, 60, 80))

ggplot(df, aes(y=trt, x=l, xend=r)) + 
  geom_dumbbell(size=3, color="#e3e2e1", 
                colour_x = "#5b8124", colour_xend = "#bad744",
                dot_guide=TRUE, dot_guide_size=0.25) +
  labs(x=NULL, y=NULL, title="ggplot2 geom_dumbbell with dot guide") +
  theme_ipsum_rc(grid="X")

p <- ggplot(msleep, aes(bodywt, brainwt)) + geom_point()

# add identity scale minor ticks on y axis
p + annotation_ticks(sides = 'l')
## Warning: Removed 27 rows containing missing values (geom_point).

# add identity scale minor ticks on x,y axis
p + annotation_ticks(sides = 'lb')
## Warning: Removed 27 rows containing missing values (geom_point).

# log10 scale
p1 <- p + scale_x_log10()

# add minor ticks on both scales
p1 + annotation_ticks(sides = 'lb', scale = c('identity','log10'))
## Warning: Removed 27 rows containing missing values (geom_point).

mtcars$name <- rownames(mtcars)

p <- ggplot(data = mtcars, aes(x=mpg,y=disp)) + geom_point()

p + 
  geom_spikelines(data = mtcars[mtcars$carb==4,],aes(colour = factor(gear)), linetype = 2) + 
  ggrepel::geom_label_repel(data = mtcars[mtcars$carb==4,],aes(label = name))

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Author: hrbrmstr
Source Code: https://github.com/hrbrmstr/ggalt 
License: View license

#r #scale #system 

GGalt: Extra Coordinate Systems, Geoms, Statistical Transformations

Rowan Benny


Detecting Parkinson’s Disease

Parkinson's disease is a condition that affects the central nervous system: a movement disorder in which dopamine-producing neurons in the brain degenerate, and whose symptoms are caused by the resulting lack of dopamine. Tremor, rigidity, slowed movement, and balance problems are the four most common symptoms. Around 1 million people worldwide are affected by this condition today, and because there is currently no cure for Parkinson's disease, treatment focuses on reducing the symptoms.

About the Parkinson's Disease Detection Project

Parkinson's disease is a neurological condition that causes tremors in the hands and body, as well as stiffness. There is currently no proper cure or treatment available, and the disease can only be treated effectively when it is caught early. In this Python machine learning project we will build a model to detect Parkinson's disease; because our output comprises only 1s and 0s, we will use a classification approach, the RandomForestClassifier. We'll import the dataset, extract the features and targets, divide them into training and testing sets, and then send them to the RandomForestClassifier for classification. This makes it one of the best data science project ideas for beginners and experts alike.

Parkinson's disease (PD) is a neurodegenerative illness that affects the nervous system and belongs to the category of disorders known as motor system disorders. Most available methods can only detect Parkinson's disease in its advanced stages, by which point roughly 60% of the dopamine in the basal ganglia, the region responsible for coordinating body movement, has already been lost. Parkinson's patients' basic physical processes, such as breathing, balance, movement, and heart function, deteriorate over time.

Alzheimer's disease, Huntington's disease, and amyotrophic lateral sclerosis (also known as Lou Gehrig's disease) are other examples of neurodegenerative diseases. In the United Kingdom alone, over 145,000 people have been diagnosed, while in India nearly one million people are affected, and the disease is spreading rapidly throughout the world. Parkinson's disease affects an estimated seven to ten million people worldwide. Enroll at Learnbay, the best data science course in Chennai, for more details.

Data mining in medicine is a field of study that blends advanced representational and computing approaches with professional physician perspectives to create tools for improving healthcare. Data mining is a computational process for uncovering hidden patterns in data by constructing prediction or classification models that are learned from previous cases and applied to new ones. Gene mutations frequently increase the risk of Parkinson's disease, but each genetic marker has only a small influence; the condition can also be triggered by toxins or hazardous chemicals in the environment, though these too have a minor effect.

With a large amount of medical data available to hospitals, medical centers, and medical research organizations, the field of medicine can improve healthcare quality and assist clinicians in making decisions about their patients' treatment using data mining techniques. To see which algorithm is the best for detecting the onset of disease, we'll employ XGBoost, KNN, SVMs, and the Random Forest Algorithm.

Support vector machines (SVM), neural networks, decision trees, and Naïve Bayes are examples of classification techniques. The study's goal is to examine and compare the performance of four of the above-mentioned classification algorithms on a Parkinson's diagnosis task. We first assess the performance of the classifiers on the actual and discretized PD datasets, and then compare their performance using an attribute-selection approach.


I installed the following libraries through pip:


pip install numpy pandas scikit-learn xgboost

Importing the Required Libraries

Importing all of the relevant modules into our project is the initial step in every project. We'll start by importing all of the essential libraries, which were discussed in the prerequisites section: to prepare and load the data we'll need fundamental modules such as numpy and pandas, plus the scikit-learn and xgboost machine learning libraries.


import numpy as np

import pandas as pd

import os, sys

from sklearn.preprocessing import MinMaxScaler

from xgboost import XGBClassifier

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

Loading the Dataset

The next step is to load the data file we downloaded earlier (placed in the same folder as the code file) and obtain the data frame's features and targets. The pandas module is used for this, and the code for it may be found below.



# load the dataset (file name assumed: the UCI "parkinsons.data" CSV)
dataframe = pd.read_csv("parkinsons.data")

print("The shape of data is: ", dataframe.shape, "\n")
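The feature/target extraction can be sketched as below. To keep the snippet self-contained, a tiny inline data frame stands in for the real CSV; the column names (`name`, `status`, and the voice-measurement columns) are assumptions based on the UCI Parkinson's dataset, where `status` is the 0/1 label:

```python
import pandas as pd

# Stand-in for the real data: in the project, `dataframe` is loaded with
# pd.read_csv() from the downloaded file. Column names are assumptions
# taken from the UCI Parkinson's dataset.
dataframe = pd.DataFrame({
    "name": ["s1", "s2", "s3", "s4"],
    "MDVP:Fo(Hz)": [119.992, 122.400, 116.682, 116.676],
    "MDVP:Jitter(%)": [0.00784, 0.00968, 0.01050, 0.00997],
    "status": [1, 1, 0, 1],  # target: 1 = Parkinson's, 0 = healthy
})

# Every column except the subject name and the label is a feature.
features = dataframe.drop(columns=["name", "status"]).values
targets = dataframe["status"].values
```

With the real dataset, `features` ends up as a 2-D array of the voice measurements and `targets` as the 0/1 `status` labels.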




Our feature data will be scaled to lie between -1 (minimum value) and 1 (maximum value). We'll use MinMaxScaler to transform the features, passing the desired range as a parameter. Scaling is critical because variables on different scales may not contribute equally to model fitting, which can lead to bias. The fit_transform function fits the scaler to the data and then transforms (normalizes) it.


#scale all the features data in the range between -1,1

scaler = MinMaxScaler((-1,1))

x = scaler.fit_transform(features)

y = targets


Train-Test Split of data

The next stage is to divide the data into training and testing groups using the 80-20 rule, which allocates 80% of the rows to training and 20% to testing.

To accomplish this, we'll use the sklearn module's train_test_split method, passing the scaled features and target data to ‘train_test_split()’.
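A sketch of that split, using synthetic stand-ins for the scaled features x and targets y (test_size=0.2 gives the 20% test share; random_state is an assumption added here for reproducibility):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the scaled feature matrix and the 0/1 targets.
rng = np.random.default_rng(7)
x = rng.random((100, 5))
y = rng.integers(0, 2, size=100)

# 80% of the rows go to training, 20% to testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=7)
```

With 100 rows, this leaves 80 rows in `x_train` and 20 rows in `x_test`.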



Building the classifier model

For the classification of our data points, we'll utilise a Random Forest Classifier: an ensemble of decision trees in which each tree votes and the majority class wins. Our data is now scaled and split, and is ready to be fed into the classifier. We'll create a classifier object and then fit the training data into it.
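A minimal sketch of the fitting step, again with synthetic data standing in for the real training split; the n_estimators value shown is an assumption (it is also scikit-learn's default):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for x_train / y_train from the split above.
rng = np.random.default_rng(7)
x_train = rng.random((80, 5))
y_train = rng.integers(0, 2, size=80)

# n_estimators controls how many trees are grown; each tree votes on the
# class and the forest returns the majority prediction.
model = RandomForestClassifier(n_estimators=100, random_state=7)
model.fit(x_train, y_train)
```

Once fitted, calling `model.predict()` on held-out data returns the 0/1 predictions used in the next section.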




Prediction and Accuracy

Now we'll use the model we've trained to predict the output (y_pred) for the testing data (x_test), which makes up 20% of the dataset. The final stage is to obtain predictions for the testing dataset and measure our model's accuracy. We'll also calculate the model's mean absolute error and root mean squared error.


#predict the output for x_test

y_pred = model.predict(x_test)


#calculate accuracy,root mean squared error

print('Accuracy :',accuracy_score(y_test, y_pred))

print('Mean Absolute Error:', mean_absolute_error(y_test, y_pred))

print('Root Mean Squared Error:', np.sqrt(mean_squared_error(y_test, y_pred)))

So there you have it! We created our own Parkinson's disease classification system.

Final Thoughts

Hence, detecting Parkinson's disease is one of the top data science project ideas for beginners and experts alike. The goal of this study was to see how different classifiers perform when applied to the PD dataset, to evaluate their performance, and to see how attribute selection, discretization, and test mode affect the performance of the selected classifier. If you're interested in learning more about data science courses or data science projects, visit the Learnbay website. We offer the most comprehensive data science course in Bangalore, and the best data science course in Chennai, with detailed explanations.

In this data science project, we created a model to predict whether or not a person has Parkinson's disease using the RandomForestClassifier from Python's sklearn module. Even though our dataset comprises relatively few records, we were able to achieve 97.33 percent accuracy with the machine learning model.


Automated Testing at Scale

Test code and test tools are as critical to a software application as the application code itself. In practice, we often neglect tests because they add overhead to our development and build times. These slowdowns affect our release cadence and disrupt our short-term schedules (for long-term gains that are hard to see during crunch).

#testing #automation #automated testing #scale

Automated Testing at Scale

Testing Beyond Infinity

Wait… What?

Do you live in a world where:

  • You need to turn off the regression tests in the nightly build because they never complete
  • You check in a piece of code, and it takes days and sometimes weeks to complete testing before it gets deployed to production

This article will talk about an approach that can help customers run integration and regression tests more quickly and cheaply, improving the CI/CD process.

A typical release cycle

Let us consider a scenario where a developer has added a new feature or fixed a bug in an existing product and checks in the changes. As part of the CI/CD pipeline, the new code gets built, unit tests are run, and the build is deployed to a QA or staging environment. Once the deployment succeeds, the QA team runs the integration and regression test cases to certify the build. In a typical case the number of test cases may vary from hundreds to a few thousand, so executing them takes time, eventually slowing down the pace at which features reach production. The traditional ways of accelerating test execution are one of the following:

  • Add more people to the QA team so they can run these test cases in parallel from multiple machines, monitor them, and collect results
  • Add more physical machines or VMs, so there are enough browser agents to run the tests rather than relying on local machines

Neither method is scalable, and both cost money and time. The approach explained here focuses on this specific problem and walks through how we can run these test cases faster without spending a lot of money or time.

Note: The approach highlighted here is only applicable to test cases executed using the Selenium framework.

#testing #scale #selenium

Testing Beyond Infinity
Delbert  Ferry

Delbert Ferry


The Evolution of GraphQL at Scale

A Fork in the Road
With a working GraphQL API in place, it won’t be long before other teams want to use it and get all the benefits of using GraphQL. Things tend to go in one of two directions: teams either start to adopt and add to the existing implementation (monolith) or duplicate what the initial team has done (BFF).

The Glorious Monolith
Rather than reinvent the wheel, other teams want to build on the efforts of that first team. Initially, things start out great, with new teams adding their unique data concerns to the graph and reusing the rest of the pre-existing schema. Product development times decrease, there's less duplication of effort, frontend engineers have a consistent API tailored to them, and so on. Nirvana, right? Not so fast.

#graphql #development #scale

The Evolution of GraphQL at Scale