Tia Gottlieb

1594017600

How a Feature Dictionary Can Uplift the Modern ML Architecture

Feature engineering is arguably the most critical component of a machine learning pipeline. The quality of the data-features determines the quality of the ML output: garbage in, garbage out. The strategy used to maintain the data-features also affects how pluggable and flexible the ML platform is. Small-scale ML architectures usually have an independent pipeline for each ML model. Modern enterprise ML architectures, on the other hand, abstract the ETL and feature engineering jobs out of the ML jobs.

In enterprise ML architectures, it’s wise to maintain the outputs of the feature jobs in a shareable format, without encoding. These features can later be cherry-picked, encoded, and fed into any ML model that needs them. This approach has several advantages.

  • In big enterprises, multiple ML models will be solving problems within the same business realm, which means they will share a lot of common features. If these common features don’t have to be regenerated for each model, the enterprise saves time, cost, and hardware.
  • Plugging in new models and experimenting becomes easier because the data-features are readily available.
  • The data scientist can focus more on modelling and less on the engineering aspects.
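
To make the idea of a shared, unencoded feature pool concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the pool contents, the feature names, and the helper functions `cherry_pick` and `one_hot` are hypothetical, and a real platform would keep the pool in a shared store rather than an in-memory dict.

```python
# Hypothetical shared feature pool: unencoded feature values keyed by feature name.
FEATURE_POOL = {
    "customer_segment": ["retail", "corporate", "retail"],
    "avg_basket_value": [23.5, 410.0, 18.2],
    "days_since_last_order": [3, 41, 7],
}


def cherry_pick(pool, names):
    """Return only the requested features from the shared pool."""
    return {name: pool[name] for name in names}


def one_hot(values):
    """Encode a nominal feature as one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# A hypothetical model picks two features and encodes the nominal one itself.
selected = cherry_pick(FEATURE_POOL, ["customer_segment", "avg_basket_value"])
model_input = {"avg_basket_value": selected["avg_basket_value"]}
for category, column in one_hot(selected["customer_segment"]).items():
    model_input[f"customer_segment={category}"] = column
```

Because the pool stores the features unencoded, a second model could pick the same `customer_segment` feature and apply a different encoding without regenerating it.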

The Process

When features are readily available, the journey from a ‘business question’ to a ‘scientific answer’ becomes much simpler.

With a feature pool available, a data scientist who wants to run a new experiment does not have to start from the raw data. Instead, they can start with the available features, which avoids a lot of unoptimised runs. When they need more data-features, the need can go as a request to the engineering team, which builds the new features in an optimised way. And when they are confident enough to take the model to the production environment, the model promotion involves only minimal components.

But to facilitate all this, we need one more component: a feature dictionary.

Feature Dictionary

Simply put, it’s a metadata store with some statistical attributes. Each record in the dictionary represents one feature. A sample feature dictionary would look something like the following snapshot.

[Figure: a snapshot of a sample feature dictionary, created for easier explanation]

In this example, the ‘VarType’ column gives the type of the feature variable, so its values are labels such as ‘continuous’, ‘discrete’, ‘nominal’, and ‘ordinal’. The ‘Desc’ column says what the feature represents in the real world. The ‘Ancestry’ column covers the data lineage of the feature: the one or more raw data sources that contributed to its computation, the method of ingestion (stream/batch), details of each point of mutability, and so on. The ‘common outliers’ column lists values that differ significantly from the expected values, while the ‘usage’ column lists the ML models currently in production that use the feature.

This is just a basic structure. Depending on the business needs and the complexity of the platform, it can be expanded.

The crux of the idea is that there is one go-to place to search for what’s available on the platform.
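
As a rough sketch of such a go-to place, the columns described above can be modelled as a small Python record type with a search helper. The field names mirror the columns in the text; the two entries, their values, and the `search` function are hypothetical examples, not taken from the original snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureEntry:
    """One record of the feature dictionary, i.e. one feature."""
    name: str
    var_type: str   # 'continuous', 'discrete', 'nominal' or 'ordinal'
    desc: str       # what the feature represents in the real world
    ancestry: list  # raw sources and ingestion method (lineage)
    common_outliers: list = field(default_factory=list)
    usage: list = field(default_factory=list)  # production models using it


# Hypothetical entries, invented for illustration only.
FEATURE_DICTIONARY = [
    FeatureEntry(
        name="avg_basket_value",
        var_type="continuous",
        desc="Mean order value per customer over the last 90 days",
        ancestry=["orders table (batch)"],
        common_outliers=[0.0],
        usage=["churn_model"],
    ),
    FeatureEntry(
        name="customer_segment",
        var_type="nominal",
        desc="Marketing segment assigned to the customer",
        ancestry=["crm export (batch)"],
    ),
]


def search(entries, keyword=None, var_type=None):
    """Return the entries matching an optional description keyword and variable type."""
    results = []
    for entry in entries:
        if var_type is not None and entry.var_type != var_type:
            continue
        if keyword is not None and keyword.lower() not in entry.desc.lower():
            continue
        results.append(entry)
    return results


continuous_names = [e.name for e in search(FEATURE_DICTIONARY, var_type="continuous")]
segment_names = [e.name for e in search(FEATURE_DICTIONARY, keyword="segment")]
```

A data scientist starting a new experiment could run such a search first and request new features from the engineering team only when nothing suitable turns up.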

Data dictionaries are not a new concept. However, they focus heavily on the raw data, storage-related attributes, and business meaning. Data-features require a different set of attributes from raw data because features are the result of mathematical or time-lapse-based computations.

What feature dictionaries add, unlike data dictionaries, is the statistical and scientific characteristics of the data-features. The availability of this scientific metadata, combined with the feature pool, helps avoid repeated exploration of data that has already been explored and wrangled. This leads to a highly flexible ML platform where new models can be deployed easily without the burden of building new end-to-end pipelines.

#feature-engineering #big-data #data analysis

Ray Patel

1623077700

Working with Python dictionaries: a cheat sheet

Accessing, editing and looping through dictionary items

Dictionaries in Python are a collection of key-value pairs — meaning every item in the dictionary has a key and an associated value.

If we want to write down prices of some items in a grocery store, normally we will note them on a piece of paper like this:

eggs - 4.99
banana - 1.49
cheese - 4.5
eggplant - 2.5
bread - 3.99

In Python dictionary lingo, the name of each item is the “key” and the associated price is the “value”, and they appear in pairs. We can represent the same list in a Python dictionary data structure as follows:

{"eggs": 4.99,
"banana": 1.49,
"cheese": 4.5,
"eggplant": 2.5,
"bread": 3.99}

Notice the differences. In the dictionary:

  • each key is within quotation marks because it is a string
  • the associated values are not quoted because they are numeric
  • keys and values are separated by a colon (:)
  • the items are comma-separated
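
The three operations this cheat sheet names (accessing, editing, and looping through items) can be sketched with the same grocery prices. The variable names and the extra "milk" item are assumptions added for illustration.

```python
prices = {"eggs": 4.99,
          "banana": 1.49,
          "cheese": 4.5,
          "eggplant": 2.5,
          "bread": 3.99}

# Accessing: square brackets raise KeyError for a missing key,
# while .get() returns None (or a default you supply) instead.
egg_price = prices["eggs"]
milk_price = prices.get("milk")

# Editing: assignment updates an existing key or adds a new one;
# del removes a key-value pair.
prices["banana"] = 1.59
prices["milk"] = 2.79
del prices["eggplant"]

# Looping: .items() yields (key, value) pairs in insertion order.
lines = [f"{item} - {price}" for item, price in prices.items()]
```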

#dictionary #python #artificial-intelligence #dictionaries #python dictionary #working with python dictionaries

Serverless Vs Microservices Architecture - A Deep Dive

Companies need to think long-term before they even start a software development project. Long-term needs are addressed at the level of architecture: business owners want to ensure agility, scalability, and performance.

The top contenders for scalable solutions are serverless and microservices. Both architectures prioritize security but approach it in their own ways. Let’s take a look at how businesses can benefit from the adoption of serverless architecture vs microservices, examine their differences, advantages, and use cases.

#serverless #microservices #architecture #software-architecture #serverless-architecture #microservice-architecture #serverless-vs-microservices #hackernoon-top-story

Roberta Ward

1595344320

Wondering how to upgrade your skills in the pandemic? Here's a simple way you can do it.

The coronavirus pandemic has brought the world to a standstill.

Countries are on a major lockdown. Schools, colleges, theatres, gyms, clubs, and all other public places are shut down, the country’s economy is suffering, human health is at stake, people are losing their jobs, and nobody knows how much worse it can get.

Since most places are on lockdown, and you are either working from home or have enough time to nourish your skills, you should use this time wisely! We always complain that we want some ‘time’ to learn and upgrade our knowledge but don’t get it because of our ‘busy schedules’. So now is the time to make a ‘list of skills’ and learn and upgrade them at home!

And for technology-loving people like us, Knoldus Techhub has already helped a lot in doing that in a short span of time!

If you are still not aware of it, don’t worry. As Georgia Byng has well said,

“No time is better than the present”

– Georgia Byng, a British children’s writer, illustrator, actress and film producer.

No matter whether you are a developer (be it front-end or back-end), a data scientist, a tester, a DevOps person, or a learner with a keen interest in technology, Knoldus Techhub has brought it all for you under one common roof.

From technologies like Scala, Spark, and Elasticsearch to Angular, Go, and machine learning, it covers a total of 20 technologies, including some recently added ones such as DAML, test automation, Snowflake, and Ionic.

How to upgrade your skills?

Every technology in Techhub has a number of templates. Once you click on a specific technology, you’ll see all the templates for it. Since these templates are downloadable, you need to provide your email address to receive the download link by mail.

These templates help you learn the practical implementation of a topic with ease. Using them, you can learn and kick-start your development in no time.

Apart from learning, there are some out-of-the-box templates that can help solve your business problem, with all the basic dependencies and implementations already plugged in. Techhub calls these templates xlr8rs (pronounced “accelerators”).

xlr8rs make your development really fast: you just add your core business logic to the template.

If you are looking for a template that’s not available, you can also request one, whether for learning or as a solution to your business problem, and Techhub will get in touch to provide it. Isn’t this helpful 🙂

Confused about which technology to start with?

To keep you updated, Knoldus Techhub shows you the most trending technology and the most downloaded templates at present. This way you’ll stay informed and can learn whatever is trending most.

Since we believe:

“There’s always scope for improvement”

If you still feel that it isn’t helping you learn and develop, you can leave your feedback in the feedback section in the bottom right corner of the website.

#ai #akka #akka-http #akka-streams #amazon ec2 #angular 6 #angular 9 #angular material #apache flink #apache kafka #apache spark #api testing #artificial intelligence #aws #aws services #big data and fast data #blockchain #css #daml #devops #elasticsearch #flink #functional programming #future #grpc #html #hybrid application development #ionic framework #java #java11 #kubernetes #lagom #microservices #ml # ai and data engineering #mlflow #mlops #mobile development #mongodb #non-blocking #nosql #play #play 2.4.x #play framework #python #react #reactive application #reactive architecture #reactive programming #rust #scala #scalatest #slick #software #spark #spring boot #sql #streaming #tech blogs #testing #user interface (ui) #web #web application #web designing #angular #coronavirus #daml #development #devops #elasticsearch #golang #ionic #java #kafka #knoldus #lagom #learn #machine learning #ml #pandemic #play framework #scala #skills #snowflake #spark streaming #techhub #technology #test automation #time management #upgrade

Christa Stehr

1598944269

Angular Architecture Components and Features

Angular is one of the most popular frameworks for developing desktop and mobile applications for clients. An Angular application uses HTML and TypeScript, and you can use Angular for cross-platform mobile development via Ionic. Angular implements both core and optional functionality as TypeScript libraries that you can import into your application. You should have a working knowledge of HTML, CSS, and JavaScript before working with Angular. In this Angular tutorial by DataFlair, we will learn about the Angular architecture and its components.

There are three basic building blocks in Angular: components, modules, and routing. An Angular app is a combination of different NgModules, as modules are the building blocks of Angular. Components, on the other hand, define the views, which are the elements that make up the screen. You can change the views using data and program logic. Routing is the functionality that links multiple components together.

Architecture of Angular

The building blocks of the Angular architecture, as depicted in the image, are:

[Figure: architecture of Angular]

  • Module
  • Template
  • Component
  • Metadata
  • Data Binding
  • Services
  • Directives
  • Dependency Injection

Let us learn each of these Angular Architecture Components in detail now:

1. Module

Angular is a modular platform, and an application may contain one or more Angular modules (NgModules) as needed. The one module that is always present is the root module, named “AppModule” in the application.

[Figure: flow of an Angular application]

NgModule is a decorator function that handles the compilation side of the application. It takes a single object as its metadata. An NgModule works together with other modules: it communicates with them to bootstrap them, and the modules are arranged in a parent-child relationship so that the application executes properly.

Here are the properties of NgModule:

[Figure: Angular NgModule elaborated]

#angular tutorials #angular architecture #angular architecture components #angular architecture working