Ssekidde  Nat

The Sensitivity Analysis: A Powerful Yet Underused Tool for Data Scientists

A sensitivity analysis varies a system’s inputs to measure how each variable individually affects the output. Sensitivity analyses are used in a variety of disciplines, such as financial modeling in business or efficiency optimization in engineering. Used correctly, a sensitivity analysis can be a powerful tool for revealing insights that would otherwise have been missed.
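To make the idea concrete, here is a minimal sketch of one common variant, a one-at-a-time analysis, in which each input is raised by a fixed percentage while the others stay at their baseline values. The function name `one_at_a_time_sensitivity`, the 10% step, and the toy profit model are all assumptions made for illustration; they are not taken from the analysis in this article.

```python
from typing import Callable, Dict


def one_at_a_time_sensitivity(
    model: Callable[..., float],
    baseline: Dict[str, float],
    relative_change: float = 0.10,
) -> Dict[str, float]:
    """Return the relative change in the output when each input is raised by
    `relative_change` while every other input stays at its baseline value."""
    base_output = model(**baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)  # copy so the baseline is left untouched
        perturbed[name] = value * (1 + relative_change)
        effects[name] = (model(**perturbed) - base_output) / base_output
    return effects


# Hypothetical toy model, for illustration only: profit = units * (price - cost).
def toy_profit(units: float, price: float, cost: float) -> float:
    return units * (price - cost)


baseline_inputs = {"units": 100.0, "price": 10.0, "cost": 6.0}
effects = one_at_a_time_sensitivity(toy_profit, baseline_inputs)

# List the inputs from most to least influential.
for name, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {effect:+.1%}")
```

Ranking the inputs by the size of their effect is what lets a stakeholder see at a glance which variable deserves the most attention.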

While data scientists are great at modeling and at turning their understanding of datasets or workflows into actionable information, the sensitivities of basic inputs are often ignored. Conducting a simple sensitivity analysis could add value to a data science project by giving stakeholders more information on which to base their decisions. Sensitivity analyses are not feasible or desirable for every task, but they can serve as another exploratory tool for deriving insights from multivariate datasets.

In this tutorial, we will walk through a simple sensitivity analysis using real gemstone data. First, we will conduct an exploratory data analysis of the diamonds dataset so that we can better understand the results of the sensitivity analysis, which is discussed in the following section.

Exploring the diamonds dataset

We’ll be using a well-known gemstone dataset that is available within R and can also be found on Kaggle.

The dataset contains 53,940 round-cut diamonds and records various attributes for each one. We will focus on the following five:

  • **Price** is the price of the diamond in US dollars (USD) and ranges from $326 to $18,823
  • **Weight** is the mass of the diamond, measured in carats (one carat is equal to 0.2 grams), and ranges from 0.2 to 5.01 carats
  • **Clarity** quantifies how clear a diamond is based on the quantity, location, and type of inclusions it contains
  • **Color** measures how close the diamond is to colorless, as opposed to having a slight tint
  • **Cut** refers to the quality of the diamond’s cut, which has a significant impact on the diamond’s optical properties
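Clarity, color, and cut are recorded as grade labels rather than numbers, so they need a numeric encoding before they can be compared with price. The sketch below maps each grade to its position on a worst-to-best scale. The grade orderings follow the ggplot2 documentation for the diamonds dataset; the choice of integer ranks, the helper name `encode_diamond`, and the example row are assumptions for illustration, since the encoding used for this article's figures is not stated.

```python
# Grade scales listed from worst to best. The orderings follow the ggplot2
# documentation for the diamonds dataset; encoding them as integer ranks
# is an assumption made for this sketch.
CUT_GRADES = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_GRADES = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_GRADES = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def encode_diamond(row: dict) -> dict:
    """Return a copy of `row` with the three graded attributes replaced by
    0-based ranks, where a higher rank means a better grade."""
    encoded = dict(row)
    encoded["cut"] = CUT_GRADES.index(row["cut"])
    encoded["color"] = COLOR_GRADES.index(row["color"])
    encoded["clarity"] = CLARITY_GRADES.index(row["clarity"])
    return encoded


# Hypothetical row, for illustration only.
example_row = {"weight": 0.9, "cut": "Ideal", "color": "E",
               "clarity": "SI2", "price": 4000}
encoded_row = encode_diamond(example_row)
print(encoded_row)
```

With every grade oriented so that a larger number means a better stone, a positive correlation with price always reads as "better costs more".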

For more information on diamond attributes or if you are curious to know about how they are quantified, refer to this link.

Attribute correlations and apparent relationships

Correlation coefficient matrices are often the first tool used when determining relationships between variables. Below is the correlation coefficient matrix for the five diamond attributes we are considering.

Correlation matrix for the five attributes of interest in the diamonds dataset. Image by Author

Some correlations are expected, such as the coefficient of 0.92 between diamond price and weight, which suggests that, of the attributes considered, weight has the largest impact on price. However, we also see some unusual trends, such as the small negative correlations of cut, color, and clarity with price. These negative correlations arise because lighter diamonds in the dataset tend to have better cut, color, and clarity than heavier diamonds. The result is the misleading impression that improving any one of these attributes decreases the price.
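A small sketch shows how this can happen. The six rows below are made-up numbers, not values from the diamonds dataset: the lighter stones are given the better clarity ranks, and price rises with both weight and clarity. The `pearson` helper is a plain implementation of the Pearson correlation coefficient written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (weight in carats, clarity rank, price in USD) rows.
# Light stones have high clarity ranks; heavy stones have low ones.
rows = [
    (0.5, 3, 1000), (0.5, 4, 1200), (0.5, 5, 1400),
    (2.0, 0, 8000), (2.0, 1, 9000), (2.0, 2, 10000),
]

# Correlation across all rows, ignoring weight.
overall = pearson([c for _, c, _ in rows], [p for _, _, p in rows])

# Correlation within each weight group, i.e. holding weight constant.
by_weight = {}
for weight, clarity, price in rows:
    by_weight.setdefault(weight, []).append((clarity, price))
within = {
    weight: pearson([c for c, _ in pairs], [p for _, p in pairs])
    for weight, pairs in by_weight.items()
}

print("all stones:", round(overall, 2))
for weight, r in within.items():
    print(f"weight {weight} ct:", round(r, 2))
```

Across all rows the clarity and price correlation is negative, yet within each weight group it is positive. Weight acts as a confounding variable, which is why the attributes need to be examined while holding weight constant.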

A closer look at the data to reveal additional insights

We can use a scatter plot and implement some multidimensional visualization techniques to better understand how the diamond attributes are related. Below is a log-log scatter plot showing the relationship between diamond weight, color, and clarity with price.

Scatter plot of various diamond attributes showing the increasing price when increasing either the weight, color, or clarity of the diamond. Although the diamond cut attribute is not shown here, the effect is similar to diamond color and clarity. Image by Author

The figure above shows that, holding the other attributes constant, price increases when the value of any single attribute increases. Specifically, we observe:

  • The positive slope in the data indicates that price increases with weight
  • For any single diamond weight, increasing only the color or clarity also increases the price. This is most clearly seen as the color and size of the markers increase along the x-axis (price) at any single, constant weight value on the y-axis

Note that to improve the visualization above, the data was sorted by clarity, which controls the size of the markers, so that smaller dots are plotted on top of bigger dots. Unfortunately, given the amount of data, plotting smaller dots atop larger ones conceals how the lighter diamonds typically have greater clarity and color, because the larger markers that represent them are hidden behind many smaller dots.
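The sorting step can be sketched as follows, assuming a plotting library that paints points in the order they are supplied. The `Marker` type, the `draw_order` helper, and the sample values are hypothetical and stand in for whatever structure holds the plotted data.

```python
from typing import List, NamedTuple


class Marker(NamedTuple):
    price: float   # x-axis value
    weight: float  # y-axis value
    size: float    # marker size, driven by clarity in the figure


def draw_order(markers: List[Marker]) -> List[Marker]:
    """Order markers largest first so that smaller ones, painted later,
    end up on top instead of being covered."""
    return sorted(markers, key=lambda m: m.size, reverse=True)


# Hypothetical markers, for illustration only.
markers = [
    Marker(price=900, weight=0.4, size=20),
    Marker(price=5200, weight=1.0, size=80),
    Marker(price=2100, weight=0.7, size=50),
]
ordered = draw_order(markers)
print([m.size for m in ordered])
```

The trade-off is the one described above: in dense regions, the large markers drawn first can end up completely covered by the small ones drawn after them.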

Now that we better understand how the diamond attributes relate to one another, we can conduct our sensitivity analysis.
