Learning is a lifelong process.
More fundamentally, expanding your knowledge both vertically and horizontally is the best way to keep growing as an individual, or as a professional.
In this situation, I believe we should focus on fundamental knowledge and software design rather than the latest tools and frameworks.
One thing that’s helping me broaden my perspective is reading about the architecture stories and choices of the biggest tech companies in the world, and understanding how they tackle the most complex challenges at scale. I compiled a list of engineering blogs; for each one, you will learn which topics it covers and a favorite post you can learn from, with one **bonus** at the end. Let’s go!
The “Professional Facebook”, or the largest employment-oriented online service, with more than 645M users in 200 countries.
As a social platform, LinkedIn has a variety of search and content algorithms that are designed for a 9-digit number of users. Their team posts articles several times a month on topics such as AI & Recommendation, handling **social interactions at scale**, and open-sourcing their systems. One post I recommend: Making the LinkedIn experimentation engine 20x faster
Well, you should know Medium by now, but isn’t it cool to read how this beautiful platform works under the hood?
There is so much value in learning how the biggest online publishing platform works. Did you know that Medium uses **React & GraphQL**? Their posts cover a variety of topics, from server design to **code review** best practices and even how they tackle complex bugs.
#learning #coding #software-development #data-science #programming
Recently, researchers from Google proposed an answer to a fundamental question in the machine learning community: what is being transferred in transfer learning? They presented various tools and analyses to address this question.
The ability to transfer knowledge from the domain a model was trained on to another domain, where data is usually scarce, is one of the most desired capabilities for machines. Researchers around the globe have been using transfer learning in various deep learning applications, including object detection, image classification, and medical imaging tasks, among others.
#developers corner #learn transfer learning #machine learning #transfer learning #transfer learning methods #transfer learning resources
Looks cool, right? But it raises the question: why learn data engineering in the first place?
Typically, data science teams are comprised of data analysts, data scientists, and data engineers. In a previous post, we’ve talked about the differences between these roles, but here let’s dive deeper into some of the advantages of being a data engineer.
Data engineers are the people who connect all the pieces of the data ecosystem within a company or institution. They accomplish this by doing things like:
Doing everything listed above primarily requires one particular skill: programming. Data engineers are software engineers who specialize in data and data technologies.
That makes them quite different from data scientists, who certainly have programming skills, but who typically aren’t engineers. It’s not uncommon for data scientists to hand over their work (e.g., a recommendation system) to data engineers for actual implementation.
And while it’s data analysts and data scientists who are doing the analysis, it’s typically data engineers who are building the data pipelines and other systems necessary to make sure that everyone has easy access to the data they need (and that no one has access to the data who shouldn’t).
A strong foundation in software engineering and programming equips data engineers to build the tools data teams and their companies need to succeed. Or, as Jeff Magnusson put it: “I like to think of it in terms of Lego blocks. Engineers design new Lego blocks that data scientists assemble in creative ways to create new data science.”
This brings us to the first reason why you might want to become a data engineer:
Data engineers are on the front lines of data strategy so that others don’t need to be. They are the first people to tackle the influx of structured and unstructured data that enters a company’s systems. They are the foundation of any data strategy. Without Lego blocks, after all, you can’t build a Lego castle.
In the above Data Science Hierarchy of Needs (proposed by Monica Rogati), data engineers are completely responsible for the two bottom rows, and share responsibility with data analysts and data scientists for the third row from the bottom.
To gain a better understanding of how critical data engineering is, imagine the pyramid pictured above is used as a funnel and flipped upside down. Data is poured into the top of that funnel, and the first people to touch it are data engineers. The more efficient they are at filtering, cleaning, and directing that data, the more efficient everything else can be as the data flows further down the funnel and towards other team members.
Conversely, if the data engineers are not efficient, they can serve as a block in the funnel that harms the work of everyone downstream. If, for example, a poorly-built data pipeline ends up feeding the data science team incomplete data, any analysis they perform on that data may be useless.
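To make the funnel idea concrete, here is a minimal sketch of a cleaning step that also counts what it drops, so incomplete data gets noticed before it reaches the analysts downstream. The record fields (`user_id`, `amount`) and the function name `clean_records` are hypothetical, chosen only for illustration.

```python
from typing import Iterable

REQUIRED_FIELDS = ("user_id", "amount")  # hypothetical schema for illustration


def clean_records(records: Iterable[dict]) -> tuple[list[dict], int]:
    """Return (clean_rows, dropped_count) for a batch of raw records."""
    clean: list[dict] = []
    dropped = 0
    for record in records:
        # Drop rows that are missing a required field or have an empty value.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            dropped += 1
            continue
        # Normalize types so downstream users get consistent data.
        clean.append({"user_id": str(record["user_id"]), "amount": float(record["amount"])})
    return clean, dropped


raw = [
    {"user_id": 1, "amount": "9.50"},
    {"user_id": 2, "amount": ""},      # incomplete row
    {"user_id": None, "amount": "3"},  # incomplete row
]
rows, dropped = clean_records(raw)
print(rows)     # [{'user_id': '1', 'amount': 9.5}]
print(dropped)  # 2
```

Returning the dropped count alongside the clean rows is one simple way to let a pipeline raise an alarm when too much data goes missing, rather than silently passing a partial dataset along.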
In this way, data engineers act as multipliers of the outcomes of a data strategy. They are the giants on whose shoulders data analysts and data scientists stand.
“A common starting point is 2-3 data engineers for every data scientist. For some organizations with more complex data engineering requirements, this can be 4-5 data engineers per data scientist.”
One of the Python functions data analysts and scientists use the most is read_csv, from the pandas library. This function reads tabular data stored in a text file into Python, so that it can be explored and manipulated.
If you’ve worked with data in Python before, you’re probably very used to typing something like this:
    import pandas as pd
    df = pd.read_csv("a_text_file.csv")
Easy and convenient, right? The read_csv function is a great example of the essence of software engineering: creating abstract, broad, efficient, and scalable solutions.
What does that mean and how does it relate to learning data engineering? Let’s take a deeper look.
You don’t need to know what read_csv is doing “under the hood” to use it effectively.
read_csv works quickly and efficiently, and it’s also efficient to read in code.
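To see what that abstraction saves you, here is a rough sketch of a small part of the same job done by hand with Python’s standard `csv` module. It is an illustration only, not how pandas implements `read_csv`; the sample data and the helper name `read_table` are made up for this example.

```python
import csv
import io


def read_table(text_file) -> dict[str, list]:
    """Read delimited text into a dict of columns, converting numbers by hand."""
    reader = csv.DictReader(text_file)
    columns: dict[str, list] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            # read_csv infers column types for you; here we have to try it ourselves.
            try:
                columns[name].append(float(value))
            except (TypeError, ValueError):
                columns[name].append(value)
    return columns


# Hypothetical sample standing in for "a_text_file.csv".
sample = io.StringIO("city,temp\nOslo,3.5\nLima,19\n")
table = read_table(sample)
print(table)  # {'city': ['Oslo', 'Lima'], 'temp': [3.5, 19.0]}
```

Even this toy version has to deal with headers and type conversion, and it still ignores missing values, encodings, dates, and large files, which is the kind of detail a well-designed function hides from its users.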
#learning and motivation #data engineer #data engineering #study #why #why learn
Reinforcement learning (RL) is a rising field, driven in large part by the performance of AlphaZero (one of the strongest chess engines to date). RL is a subfield of machine learning that teaches agents to act in an environment so as to maximize rewards over time.
Among RL’s model-free methods is temporal difference (TD) learning, with SARSA and Q-learning (QL) being two of the most used algorithms. I chose to explore SARSA and QL to highlight a subtle difference between on-policy learning and off-policy learning, which we will discuss later in the post.
This post assumes you have basic knowledge of the agent, environment, action, and rewards within RL’s scope. A brief introduction can be found here.
The outline of this post includes:
We will compare these two algorithms via the CartPole game implementation. This post’s code can be found here: QL code, SARSA code, and the fully functioning code. (The fully functioning code has both algorithms implemented and trained on the CartPole game.)
The TD learning section will be a bit mathematical, but feel free to skim through it and jump directly to QL and SARSA.
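As a quick preview of the on-policy versus off-policy difference, here is a minimal tabular sketch of the two standard update rules. It is not the CartPole code linked above; the toy Q-table, its values, and the function names are assumptions made for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Off-policy: bootstrap from the best action available in next_state."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action, alpha=0.1, gamma=0.99):
    """On-policy: bootstrap from the action the agent will actually take next."""
    target = reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])


def fresh_table():
    # Toy table: 2 states x 2 actions, values chosen by hand for illustration.
    return [[0.0, 0.0], [1.0, 5.0]]


q_ql, q_sarsa = fresh_table(), fresh_table()
# Same transition for both: in state 0 take action 0, get reward 1, land in state 1.
q_learning_update(q_ql, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9)
# SARSA also needs the next action; suppose the exploring policy picked action 0.
sarsa_update(q_sarsa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.9)
print(q_ql[0][0])     # 2.75
print(q_sarsa[0][0])  # 0.95
```

Both rules see the same transition, but Q-learning assumes the best next action will be taken (value 5.0), while SARSA uses the action the policy really chose (value 1.0), so the two estimates diverge whenever the agent explores.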
#reinforcement-learning #artificial-intelligence #machine-learning #deep-learning #learning
Big data skills are crucial for landing data engineering roles. From designing, creating, building, and maintaining data pipelines to collating raw data from various sources and ensuring performance optimization, data engineering professionals carry out a plethora of tasks. They are expected to know about big data frameworks, databases, building data infrastructure, containers, and more. It is also important that they have hands-on exposure to tools such as Scala, Hadoop, HPCC, Storm, Cloudera, Rapidminer, SPSS, SAS, Excel, R, Python, Docker, Kubernetes, MapReduce, and Pig, to name a few.
Here, we list some of the important skills that one should possess to build a successful career in big data.
#big data #latest news #data engineering jobs #skills for data engineering jobs #10 must-have skills for data engineering jobs #data engineering
In the previous blog, we looked at why few-shot learning is essential and what its applications are. In this article, I will explain the Relation Network for few-shot classification (especially image classification) in the simplest way possible. Moreover, I will analyze the Relation Network in terms of:
Moreover, effectiveness will be evaluated on the accuracy, time required for training, and the number of required training parameters.
Please watch the GitHub repository to check out the implementations and keep updated with further experiments.
In few-shot classification, our objective is to design a method that can identify object images by analyzing only a few sample images of the same class. Let’s take an example to understand this. Suppose Bob has a client project to design a 5-class classifier, where the 5 classes can be anything and can even change over time. As discussed in the previous blog, collecting a huge amount of data is a very tedious task. Hence, in such cases, Bob will rely on few-shot classification methods: his client provides a small set of example images for each class, and his system can then perform classification using these examples, with or without additional training.
In general, four terms are used in few-shot classification: N-way, K-shot, support set, and query set.
At this point, someone new to this concept may have doubts about the need for the support and query sets. So, let’s understand it intuitively. Whenever humans see an object for the first time, we get a rough idea of that object. If we see the same object a second time, we compare it with the image stored in memory from when we saw it the first time. This applies to everything in our surroundings, whether we see, read, or hear it. Similarly, to recognise new images from the query set, we provide our model with a set of examples, i.e., the support set, to compare against.
And this is the basic concept behind Relation Network as well. In the next sections, I will give a rough idea of how Relation Network works and perform different experiments on the 102-flower dataset.
The core idea behind Relation Network is to learn generalized image representations for each class using the support set, so that we can compare the lower-dimensional representation of each query image with each of the class representations and, based on this comparison, decide the class of each query image. Relation Network has two modules that allow us to perform these two tasks:
We can define the whole procedure in just 5 steps.
A few things to know: during training, we use only images from a selected set of classes, and during testing, we use images from unseen classes. For example, from the 102-flower dataset, we will use 50% of the classes for training, and the rest will be used for validation and testing. Moreover, in each episode, we will randomly select 5 classes to create the support and query sets and follow the above 5 steps.
That is all you need to know from the implementation point of view. Although the whole process is simple and easy to understand, I recommend reading the published research paper, Learning to Compare: Relation Network for Few-Shot Learning, for a better understanding.
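The episode construction described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the GitHub repository: the toy dataset, the file names, the function name `sample_episode`, and the choice of 1 support and 2 query images per class are all hypothetical.

```python
import random


def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=2, rng=random):
    """Pick n_way classes, then split sampled images into support and query sets."""
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    # Labels are local to the episode: 0 .. n_way - 1.
    for label, cls in enumerate(classes):
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query


# Hypothetical toy data: 10 classes with 6 "images" (just file names) each.
dataset = {f"class_{c}": [f"class_{c}_img_{i}.jpg" for i in range(6)] for c in range(10)}

# Keep half of the classes for training; the rest stay unseen for validation and testing.
all_classes = sorted(dataset)
train_classes = all_classes[: len(all_classes) // 2]
train_data = {c: dataset[c] for c in train_classes}

support_set, query_set = sample_episode(train_data, rng=random.Random(0))
print(len(support_set), len(query_set))  # 5 10
```

Because support and query images are drawn from one sample without replacement, no image appears in both sets within an episode, which keeps the comparison honest.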
#deep-learning #few-shot-learning #computer-vision #machine-learning #deep learning #deep learning