From Predictions to Decisions (with Dan Becker)

In this episode of DataFramed, Adel speaks with Dan Becker, CEO of decision.ai and founder of Kaggle Learn, about the intersection of decision sciences and AI and best practices for aligning machine learning with business value.

Throughout the episode, Dan discusses his background, how he reached the top of a Kaggle competition, the difference between machine learning in a Kaggle competition and in the real world, the role of empathy in aligning machine learning with business value, the importance of decision sciences in maximizing the value of machine learning in production, and more.

Please subscribe to the podcast on iTunes and give us a rating and review!

iTunes link: https://podcasts.apple.com/us/podcast/dataframed/id1336150688

This is the DataCamp podcast link. Check it out for the show notes and other goodies: https://www.datacamp.com/community/podcast/from-predictions-to-decisions

#data-science #developer

Ian Robinson

1623223443

Predictive Modeling in Data Science

Predictive modeling is an integral tool used in the data science world — learn the five primary predictive models and how to use them properly.

Predictive modeling in data science is used to answer the question “What is going to happen in the future, based on known past behaviors?” Modeling is an essential part of data science, and it is mainly divided into predictive and preventive modeling. Predictive modeling, also known as predictive analytics, is the process of using data and statistical algorithms to predict outcomes with data models. Anything from sports outcomes and television ratings to technological advances and corporate economies can be predicted using these models.

Top 5 Predictive Models

  1. Classification Model: The simplest of all predictive analytics models. It assigns data to categories based on historical data. Classification models are best suited to answering “yes or no” types of questions.
  2. Clustering Model: This model sorts data points into separate groups based on similar behavior.
  3. Forecast Model: One of the most widely used predictive analytics models. It deals with metric value prediction and can be applied wherever historical numerical data is available.
  4. Outliers Model: This model, as the name suggests, is oriented around exceptional data entries within a dataset. It can identify exceptional figures either by themselves or in conjunction with other numbers and categories.
  5. Time Series Model: This predictive model consists of a series of data points captured using time as the input parameter. It uses the data from previous years to develop a numerical metric and predicts the next three to six weeks of data using that metric.
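As a rough illustration of the time series idea in item 5, the sketch below forecasts the next few periods from the most recent observations with a simple moving average. The moving average is a stand-in chosen for brevity, not a method named in the article, and the function name, the `window` and `horizon` parameters, and the sales figures are all hypothetical.

```python
def moving_average_forecast(history, window=3, horizon=2):
    """Forecast `horizon` future values, each as the mean of the last `window` values.

    Hypothetical helper for illustration only: each forecast is appended to the
    series so that later forecasts can build on earlier ones.
    """
    if window <= 0 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = sum(values[-window:]) / window
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts


# Made-up weekly figures, used only to exercise the function.
weekly_sales = [100, 110, 120, 130, 140, 150]
forecast = moving_average_forecast(weekly_sales, window=3, horizon=2)
print(forecast)
```

A production forecast would normally also account for trend and seasonality, which a plain moving average ignores.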

#big data #data science #predictive analytics #predictive analysis #predictive modeling #predictive models

Otho Hagenes

1617419868

Top Five Artificial Intelligence Predictions For 2021

As AI becomes more ubiquitous, it’s also become more autonomous — able to act on its own without human supervision. This demonstrates progress, but it also introduces concerns around control over AI. The AI Arms Race has driven organizations everywhere to deliver the most sophisticated algorithms around, but this can come at a price, ignoring cultural and ethical values that are critical to responsible AI. Here are five predictions on what we should expect to see in AI in 2021:

  1. Something’s going to give around AI governance
  2. Most consumers will continue to be sceptical of AI
  3. Digital transformation (DX) finds its moment
  4. Organizations will increasingly push AI to the edge
  5. ModelOps will become the “go-to” approach for AI deployment.

#opinions #2021 ai predictions #ai predictions for 2021 #artificial intelligence predictions #five artificial intelligence predictions for 2021

Ray Patel

1623251700

Beating the Averages With Predictive Data Analytics

Predictive analytics can help narrow the chasm between data analytics professionals and the business people who benefit from their activities.

“Nanoeconomics” may sound like a college course that one may expunge from their minds as soon as they wrap up their final exam the last day of the semester, but it’s a force that may supercharge the insights data and technology professionals are delivering to their business decision-makers. At its heart, data analytics is an economic activity, expected to add some incremental value to corporate revenues.

But there’s been a yawning chasm between the activities of data analytics professionals and the businesspeople who are supposed to see the benefits of those activities. Namely, analytics insights are typically based on statistical averages, versus directly focusing on the problems at hand.

#analytics #big data #big data analysis tools #business strategies #decision management #real-time decisions #trending now #predictive analytics #prescriptive analytics

Decision Tree - Classification

The decision tree is one of the most widely used machine learning algorithms. It is a supervised learning algorithm that can perform both classification and regression tasks.

As the name suggests, it uses a tree-like structure to make decisions on the given dataset. Each internal node of the tree represents a “decision” taken by the model based on one of our attributes. From this decision, we can separate classes or predict values.

Let’s look at both classification and regression operations one by one.

Classification

In classification, each leaf node of our decision tree represents a class, reached through the decisions we make on attributes at the internal nodes.

To understand this better, let us look at an example. I have used the Iris flower dataset from the sklearn library. You can find the complete code on GitHub.

[Figure: the decision tree trained on the Iris dataset]

A node’s samples attribute counts how many training instances it applies to. For example, 100 training instances have a petal width ≤ 2.45 cm.

A node’s value attribute tells you how many training instances of each class this node applies to. For example, the bottom-right node applies to 0 Iris-Setosa, 0 Iris-Versicolor, and 43 Iris-Virginica.

And a node’s gini attribute measures its impurity: a node is “pure” (gini=0) if all training instances it applies to belong to the same class. For example, since the depth-1 left node applies only to Iris-Setosa training instances, it is pure and its gini score is 0.

Gini impurity formula:

G = 1 − Σⱼ pⱼ²

where pⱼ is the ratio of instances of class j among all training instances at that node.
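To make the impurity measure concrete, here is a minimal sketch that computes Gini impurity as one minus the sum of squared class ratios, given a node's per-class counts. The function name is hypothetical. The first example uses the bottom-right node's counts from the text. The second assumes that the 5 non-Versicolor instances in the 54-instance node mentioned later all belong to one other class, which the text does not state.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of training instances per class."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node has no training instances")
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Bottom-right node from the text: 0 Setosa, 0 Versicolor, 43 Virginica.
pure_node = [0, 0, 43]
# Assumed split of a 54-instance node: 49 Versicolor plus 5 of one other class.
mixed_node = [0, 49, 5]

print(gini_impurity(pure_node))             # 0.0
print(round(gini_impurity(mixed_node), 3))  # 0.168
```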

Based on the decisions made at each internal node, we can sketch decision boundaries to visualize the model.

[Figure: decision boundaries of the decision tree]

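But how do we find these boundaries?

We use the Classification And Regression Tree (CART) algorithm to find these boundaries.

CART is a simple algorithm that searches for the attribute k and threshold tₖ that produce the purest subsets, meaning that each subset contains the largest possible proportion of one particular class. For example, the left node at depth 2 is dominated by the Iris-Versicolor class, which accounts for 49 of its 54 instances. The CART cost function is the weighted Gini impurity of the two subsets, and we split the training set where this cost is lowest. The CART cost function is given as:

J(k, tₖ) = (m_left / m) · G_left + (m_right / m) · G_right

where G_left and G_right are the Gini impurities of the left and right subsets, m_left and m_right are the numbers of instances in those subsets, and m is the total number of instances at the node being split.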

After successfully splitting the dataset into two, we repeat the process on each side of the tree.

We can implement a decision tree directly with the help of the scikit-learn library. It has a class called DecisionTreeClassifier that trains the model for us, and we can adjust the hyperparameters as per our requirements.
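The split search described above can be sketched in plain Python. This is an illustrative, unoptimized version rather than scikit-learn's implementation: it tries the midpoint between consecutive distinct values of every attribute and keeps the attribute and threshold with the lowest weighted Gini impurity. The function names and the tiny dataset of petal length and petal width pairs are made up for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (attribute_index, threshold, cost) with the lowest weighted Gini impurity."""
    m = len(rows)
    best = None
    for k in range(len(rows[0])):
        values = sorted({row[k] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[k] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[k] > t]
            cost = len(left) / m * gini(left) + len(right) / m * gini(right)
            # Strict comparison: on ties, the first split found is kept.
            if best is None or cost < best[2]:
                best = (k, t, cost)
    return best


# Made-up (petal length, petal width) pairs, for illustration only.
rows = [(1.4, 0.2), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5), (5.1, 1.9), (5.9, 2.1)]
labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

k, t, cost = best_split(rows, labels)
print(k, t, cost)
```

Building a full tree would mean calling `best_split` again on the left and right subsets until a stopping condition such as a maximum depth is reached.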
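A minimal sketch of that usage follows. The original code is not reproduced here, so the details are assumptions: the tree is trained on the full Iris dataset using only the petal length and petal width columns, and the values for `max_depth` and `random_state` are chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width (assumed feature choice)
y = iris.target

# criterion="gini" is the default; max_depth=2 keeps the tree small and readable.
tree_clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Predict the class and the class probabilities for one hypothetical flower.
sample = [[5.0, 1.5]]
print(iris.target_names[tree_clf.predict(sample)[0]])
print(tree_clf.predict_proba(sample))
```

Hyperparameters such as `max_depth`, `min_samples_split`, and `min_samples_leaf` limit how far the tree grows, which helps against overfitting.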


#machine-learning #decision-tree #decision-tree-classifier #decision-tree-regressor #deep learning
