Myah Conn

Learning From the Giants: Top Engineering Blogs You Must Follow

Learning is a lifelong process.
More fundamentally, expanding your knowledge both vertically and horizontally is the best way to keep growing as an individual or as a professional.

This is even more true in the developer community, where things move at an insane speed. For the JavaScript community in particular, things might be moving too fast; some even call it JavaScript fatigue.

In this situation, I believe we should focus on fundamental knowledge and software design rather than the latest tools and frameworks.

One thing that’s helping me broaden my perspective is reading about the stories and architecture choices of the biggest tech companies in the world, and understanding how they tackle the most complex challenges at scale.

I compiled a list of engineering blogs. For each one, you will learn what topics it covers and get a favorite post you can learn from, with one **bonus** at the end. Let’s go!

LinkedIn

The “Professional Facebook”, or the largest employment-oriented online service, with more than 645M users across 200 countries.

As a social platform, LinkedIn has a variety of search and content algorithms designed for a nine-digit number of users.

Their team posts articles several times a month on topics such as AI & Recommendation, handling **social interactions at scale**, and open-sourcing their systems.

One post I recommend: Making the LinkedIn experimentation engine 20x faster

Medium

Well, you should know Medium by now, but isn’t it cool to read how this beautiful platform works under the hood?

There is so much value in learning how the biggest online publishing platform works. Did you know that Medium uses **React & GraphQL**?

Their posts cover a variety of topics, from server design to **code review** best practices, and even how they tackle complex bugs.

#learning #coding #software-development #data-science #programming

Jerad Bailey

Google Reveals “What is being Transferred” in Transfer Learning

Recently, researchers from Google proposed a solution to a fundamental question in the machine learning community: what is being transferred in transfer learning? They presented various tools and analyses to address this question.

The ability to transfer domain knowledge from a task a machine is trained on to another where data is scarce is one of the most desired capabilities for machines. Researchers around the globe have been using transfer learning in various deep learning applications, including object detection, image classification, and medical imaging tasks, among others.

#developers corner #learn transfer learning #machine learning #transfer learning #transfer learning methods #transfer learning resources

Five Reasons Why You Should Learn Data Engineering — Dataquest

Looks cool, right? But it raises the question: why learn data engineering in the first place?

Typically, data science teams are made up of data analysts, data scientists, and data engineers. In a previous post, we’ve talked about the differences between these roles, but here let’s dive deeper into some of the advantages of being a data engineer.

Data engineers are the people who connect all the pieces of the data ecosystem within a company or institution. They accomplish this by doing things like:

  • Accessing, collecting, auditing, and cleaning data from applications and systems into a usable state
  • Creating and maintaining efficient databases
  • Building data pipelines
  • Monitoring and managing all the data systems (scalability, security, etc.)
  • Implementing data scientists’ output in a scalable manner

Doing everything listed above primarily requires one particular skill: programming. Data engineers are software engineers who specialize in data and data technologies.

That makes them quite different from data scientists, who certainly have programming skills, but who typically aren’t engineers. It’s not uncommon for data scientists to hand over their work (e.g., a recommendation system) to data engineers for actual implementation.

And while it’s data analysts and data scientists who are doing the analysis, it’s typically data engineers who are building the data pipelines and other systems necessary to make sure that everyone has easy access to the data they need (and that no one has access to the data who shouldn’t).

A strong foundation in software engineering and programming equips data engineers to build the tools data teams and their companies need to succeed. Or, as Jeff Magnusson put it: “I like to think of it in terms of Lego blocks. Engineers design new Lego blocks that data scientists assemble in creative ways to create new data science.”

This brings us to the first reason why you might want to become a data engineer:

1. Why Learn Data Engineering? It’s the Backbone of Data Science

Data engineers are on the front lines of data strategy so that others don’t need to be. They are the first people to tackle the influx of structured and unstructured data that enters a company’s systems. They are the foundation of any data strategy. Without Lego blocks, after all, you can’t build a Lego castle.

[Figure: The Data Science Hierarchy of Needs, proposed by Monica Rogati]

In the above Data Science Hierarchy of Needs (proposed by Monica Rogati), data engineers are completely responsible for the two bottom rows, and share responsibility with data analysts and data scientists for the third row from the bottom.

To gain a better understanding of how critical data engineering is, imagine the pyramid pictured above is used as a funnel and flipped upside down. Data is poured into the top of that funnel, and the first people to touch it are data engineers. The more efficient they are at filtering, cleaning, and directing that data, the more efficient everything else can be as the data flows further down the funnel and towards other team members.

Conversely, if the data engineers are not efficient, they can serve as a block in the funnel that harms the work of everyone downstream. If, for example, a poorly-built data pipeline ends up feeding the data science team incomplete data, any analysis they perform on that data may be useless.

In this way, data engineers act as multipliers of the outcomes of a data strategy. They are the giants on whose shoulders data analysts and data scientists stand.

This is evidenced in the way companies with good data strategies structure their teams. According to Jesse Anderson, a data engineer and managing director of the Big Data Institute:

“A common starting point is 2-3 data engineers for every data scientist. For some organizations with more complex data engineering requirements, this can be 4-5 data engineers per data scientist.”

2. It’s Technically Challenging

One of the Python functions data analysts and scientists use the most is read_csv — from the pandas library. This function reads tabular data stored in a text file into Python, so that it can be explored and manipulated.

If you’ve worked with data in Python before, you’re probably very used to typing something like this:

```python
import pandas as pd

df = pd.read_csv("a_text_file.csv")
```

Easy and convenient, right? The read_csv function is a great example of the essence of software engineering: creating abstract, broad, efficient, and scalable solutions.

What does that mean and how does it relate to learning data engineering? Let’s take a deeper look.

  • Abstract. When reading a file on a computer, a very complex process occurs under the hood. However, our use of the function is very simple; what goes on in the background is abstracted away. You don’t need to understand what read_csv is doing “under the hood” to use it effectively.
  • Broad. This function also allows us to explicitly choose which delimiter is used in the text file’s tabular data (e.g. commas, semicolons, tabs, and so on). This makes it easy to use with a variety of CSV styles, and that’s music to data scientists’ ears. There are many other options that let data practitioners focus on their goals instead of worrying about programming details.
  • Efficient. read_csv works quickly and efficiently, and it’s also efficient to read as code.
  • Scalable. Another option included with this function allows us to read files in chunks, so that if a file is too large for the computer’s RAM, it can be read chunk by chunk, letting users process files as large as they come (see the sketch below).
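As a rough illustration of the “broad” and “scalable” points, here is a minimal sketch; `sep` and `chunksize` are real pandas options, while the file contents and chunk size are illustrative:

```python
import pandas as pd

# Broad: explicitly choose the delimiter for a semicolon-separated file.
df = pd.read_csv("a_text_file.csv", sep=";")

# Scalable: read a large file in chunks of 100,000 rows instead of
# loading the whole thing into RAM at once.
total_rows = 0
for chunk in pd.read_csv("a_text_file.csv", sep=";", chunksize=100_000):
    total_rows += len(chunk)  # each chunk is a regular DataFrame
print(total_rows)
```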

#learning and motivation #data engineer #data engineering #study #why #why learn

Jackson Crist

Intro to Reinforcement Learning: Temporal Difference Learning, SARSA vs. Q-learning

Reinforcement learning (RL) is surely a rising field, boosted by the performance of AlphaZero (one of the strongest chess engines as of this writing). RL is a subfield of machine learning that teaches agents to perform actions in an environment to maximize rewards over time.

Among RL’s model-free methods is temporal difference (TD) learning, with SARSA and Q-learning (QL) being two of the most used algorithms. I chose to explore SARSA and QL to highlight a subtle difference between on-policy and off-policy learning, which we will discuss later in the post.

This post assumes you have basic knowledge of the agent, environment, action, and rewards within RL’s scope. A brief introduction can be found here.

The outline of this post includes:

  • Temporal difference learning (TD learning)
  • Parameters
  • QL & SARSA
  • Comparison
  • Implementation
  • Conclusion

We will compare these two algorithms via a CartPole game implementation. This post’s code can be found here: QL code, SARSA code, and the fully functioning code (which has both algorithms implemented and trained on the CartPole game).

The TD learning section will be a bit mathematical, but feel free to skim through it and jump directly to QL and SARSA.
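Before that, here is a minimal preview of the subtle difference mentioned earlier, sketched as tabular update rules. It assumes a NumPy Q-table indexed by state and action; the function names, `alpha`, and `gamma` are illustrative, not from the linked code:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy (max-valued) action in s_next,
    # regardless of which action the behavior policy actually takes next.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from a_next, the action the current
    # (e.g., epsilon-greedy) policy actually chose in s_next.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```

The single term that changes in the TD target is exactly the on-policy vs. off-policy distinction we will unpack below.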

#reinforcement-learning #artificial-intelligence #machine-learning #deep-learning #learning

Siphiwe Nair

10 Must-have Skills for Data Engineering Jobs

Big data skills are crucial for landing data engineering job roles. From designing, creating, building, and maintaining data pipelines to collating raw data from various sources and ensuring performance optimization, data engineering professionals carry out a plethora of tasks. They are expected to know about big data frameworks, databases, building data infrastructure, containers, and more. It is also important that they have hands-on exposure to tools such as Scala, Hadoop, HPCC, Storm, Cloudera, RapidMiner, SPSS, SAS, Excel, R, Python, Docker, Kubernetes, MapReduce, and Pig, to name a few.

Here, we list some of the important skills that one should possess to build a successful career in big data.

1. Database Tools
2. Data Transformation Tools
3. Data Ingestion Tools
4. Data Mining Tools

#big data #latest news #data engineering jobs #skills for data engineering jobs #10 must-have skills for data engineering jobs #data engineering

Few Shot Learning — A Case Study (2)

In the previous blog, we looked into why Few Shot Learning is essential and what its applications are. In this article, I will explain the Relation Network for Few-Shot Classification (especially for image classification) in the simplest way possible. Moreover, I will analyze the Relation Network in terms of:

  1. Effectiveness of different architectures such as Residual and Inception Networks
  2. Effects of transfer learning via a pre-trained classifier on the ImageNet dataset

Moreover, effectiveness will be evaluated on accuracy, the time required for training, and the number of training parameters required.

Please watch the GitHub repository to check out the implementations and keep updated with further experiments.

Introduction to Few-Shot Classification

In few shot classification, our objective is to design a method which can identify any object image by analyzing a few sample images of the same class. Let’s take an example to understand this. Suppose Bob has a client project to design a 5-class classifier, where the 5 classes can be anything and can even change over time. As discussed in the previous blog, collecting a huge amount of data is a very tedious task. Hence, in such cases, Bob will rely on few shot classification methods, where his client can give a few example images for each class, and his system can then perform classification using these examples with or without the need for additional training.

In general, in few shot classification, four terms (N way, K shot, support set, and query set) are used; they are defined in the list below, with a small sampling sketch after it.

  1. N way: There will be a total of N classes used for training/testing, like the 5 classes in the above example.
  2. K shot: We have only K example images available for each class during training/testing.
  3. Support set: The collection of all the available K example images from each class. The support set therefore contains N*K images in total.
  4. Query set: All the images for which we want to predict the respective classes.
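To make these four terms concrete, here is a toy sketch of sampling one N-way K-shot episode; `images_by_class` (a dict mapping each class to a list of its image paths) and all the names are illustrative, not from the original post:

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=5, n_query=3):
    # Pick N classes for this episode.
    classes = random.sample(list(images_by_class), n_way)
    support, query = {}, {}
    for cls in classes:
        imgs = random.sample(images_by_class[cls], k_shot + n_query)
        support[cls] = imgs[:k_shot]  # K examples per class -> N*K in total
        query[cls] = imgs[k_shot:]    # images whose class we must predict
    return support, query
```

With the defaults, this yields a 5-way 5-shot support set of 25 images and a query set of 15 images.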

At this point, someone new to this concept may have doubts about the need for support and query sets. So, let’s understand it intuitively. Whenever humans see an object for the first time, we get a rough idea about that object. If we later see the same object a second time, we compare it with the image stored in memory from when we saw it the first time. This applies to everything in our surroundings, whether we see, read, or hear it. Similarly, to recognize new images from the query set, we provide our model a set of examples, i.e., the support set, to compare against.

And this is the basic concept behind the Relation Network as well. In the next sections, I will give a rough idea of how the Relation Network works and perform different experiments on the 102-flower dataset.

About Relation Network

The core idea behind the Relation Network is to learn a generalized image representation for each class using the support set, such that we can compare lower-dimensional representations of query images with each of the class representations and, based on this comparison, decide the class of each query image. The Relation Network has two modules which allow us to perform these two tasks:

  1. Embedding module: This module extracts the required underlying representation from each input image, irrespective of its class.
  2. Relation module: This module scores the relation between the embedding of a query image and each class embedding.

Training/Testing procedure:

We can define the whole procedure in just 5 steps (a minimal sketch follows the list).

  1. Use the support set to get the underlying representation of each image via the embedding module.
  2. Take the average over each class’s images to get a single underlying representation per class.
  3. Then get the embedding of each query image and concatenate it with each class’s embedding.
  4. Use the relation module to get the scores. The class with the highest score becomes the label of the respective query image.
  5. [Only during training] Use the MSE loss function to train both (embedding + relation) modules.
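For intuition only, here is a minimal PyTorch sketch of these 5 steps, with toy linear modules standing in for the paper’s convolutional blocks; all dimensions and names are illustrative:

```python
import torch
import torch.nn as nn

N_WAY, K_SHOT, N_QUERY, FEAT_DIM = 5, 5, 10, 64

# Toy stand-ins for the embedding and relation modules.
embedding_module = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, FEAT_DIM))
relation_module = nn.Sequential(
    nn.Linear(2 * FEAT_DIM, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid()
)

support = torch.randn(N_WAY, K_SHOT, 3, 84, 84)  # K images per class
queries = torch.randn(N_QUERY, 3, 84, 84)

# Steps 1-2: embed support images, then average within each class.
class_emb = embedding_module(support.view(-1, 3, 84, 84))        # (N*K, F)
class_emb = class_emb.view(N_WAY, K_SHOT, FEAT_DIM).mean(dim=1)  # (N, F)

# Step 3: embed each query and pair it with every class embedding.
q_emb = embedding_module(queries)                                # (Q, F)
pairs = torch.cat(
    [class_emb.unsqueeze(0).expand(N_QUERY, -1, -1),
     q_emb.unsqueeze(1).expand(-1, N_WAY, -1)], dim=-1)          # (Q, N, 2F)

# Step 4: relation scores; the highest-scoring class is the prediction.
scores = relation_module(pairs).squeeze(-1)                      # (Q, N)
predictions = scores.argmax(dim=1)

# Step 5 (training only): regress scores to 1 for the true class, 0 otherwise.
targets = torch.zeros_like(scores)  # set targets[i, true_class[i]] = 1 first
loss = nn.functional.mse_loss(scores, targets)
```

Note that the sigmoid relation scores are regressed toward 1 for the true class and 0 otherwise, which is why step 5 uses MSE rather than cross-entropy.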

A few things to know: during training we will use only images from a set of selected classes, and during testing we will use images from unseen classes. For example, from the 102-flower dataset, we will use 50% of the classes for training, and the rest will be used for validation and testing. Moreover, in each episode, we will randomly select 5 classes to create the support and query sets and follow the above 5 steps.

That is all you need to know from the implementation point of view. Although the whole process is simple and easy to understand, I recommend reading the published research paper, Learning to Compare: Relation Network for Few-Shot Learning, for a better understanding.

#deep-learning #few-shot-learning #computer-vision #machine-learning