Visualizing regularization and the L1 and L2 norms

If you’ve taken an introductory Machine Learning class, you’ve certainly come across the issue of overfitting and been introduced to the concept of regularization and norm. I often see this being discussed purely by looking at the formulas, so I figured I’d try to give a better insight into why exactly minimising the norm induces regularization — and how L1 and L2 differ from each other — using some visual examples.

Prerequisite knowledge

  • Linear regression
  • Gradient descent
  • Some understanding of overfitting and regularization

Topics covered

  • Why does minimizing the norm induce regularization?
  • What’s the difference between the L1 norm and the L2 norm?

Recap of regularization

Using the example of linear regression, our loss is given by the Mean Squared Error (MSE):

Image for post

and our goal is to minimize this loss:

Image for post

To prevent overfitting, we want to add abias towards less complex functions.That is, given two functions that can fit our data reasonably well, we prefer the simpler one. We do this by adding a regularization term, typically either the L1 norm or the squared L2 norm:

Image for post

#overfitting #data-science #regularization #machine-learning #norms

What is GEEK

Buddha Community

Visualizing regularization and the L1 and L2 norms

Visualizing regularization and the L1 and L2 norms

If you’ve taken an introductory Machine Learning class, you’ve certainly come across the issue of overfitting and been introduced to the concept of regularization and norm. I often see this being discussed purely by looking at the formulas, so I figured I’d try to give a better insight into why exactly minimising the norm induces regularization — and how L1 and L2 differ from each other — using some visual examples.

Prerequisite knowledge

  • Linear regression
  • Gradient descent
  • Some understanding of overfitting and regularization

Topics covered

  • Why does minimizing the norm induce regularization?
  • What’s the difference between the L1 norm and the L2 norm?

Recap of regularization

Using the example of linear regression, our loss is given by the Mean Squared Error (MSE):

Image for post

and our goal is to minimize this loss:

Image for post

To prevent overfitting, we want to add abias towards less complex functions.That is, given two functions that can fit our data reasonably well, we prefer the simpler one. We do this by adding a regularization term, typically either the L1 norm or the squared L2 norm:

Image for post

#overfitting #data-science #regularization #machine-learning #norms

Arvel  Parker

Arvel Parker

1591177440

Visual Analytics and Advanced Data Visualization

Visual Analytics is the scientific visualization to emerge an idea to present data in such a way so that it could be easily determined by anyone.

It gives an idea to the human mind to directly interact with interactive visuals which could help in making decisions easy and fast.

Visual Analytics basically breaks the complex data in a simple way.

The human brain is fast and is built to process things faster. So Data visualization provides its way to make things easy for students, researchers, mathematicians, scientists e

#blogs #data visualization #business analytics #data visualization techniques #visual analytics #visualizing ml models

Arvel  Parker

Arvel Parker

1591184760

Visual Analytics Services for Data-Driven Decision Making

Visual analytics is the process of collecting, examining complex and large data sets (structured or unstructured) to get useful information to draw conclusions about the datasets and visualize the data or information in the form of interactive visual interfaces and graphical manner.

Data analytics is usually accomplished by extracting or collecting data from different data sources in the form of numbers, statistics and overall activity of any organization, with different deep learning and analytics tools, which is then processed using data visualization software and presented in the form of graphical charts, figures, and bars.

In today technology world, data are reproduced in incredible rate and amount. Visual Analytics helps the world to make the vast and complex amount of data useful and readable. Visual Analytics is the process to collect and store the data at a faster rate than analyze the data and make it helpful.

As human brain process visual content better than it processes plain text. So using advanced visual interfaces, humans may directly interact with the data analysis capabilities of today’s computers and allow them to make well-informed decisions in complex situations.

It allows you to create beautiful, interactive dashboards or reports that are immediately available on the web or a mobile device. The tool has a Data Explorer that makes it easy for the novice analyst to create forecasts, decision trees, or other fancy statistical methods.

#blogs #data visualization #data visualization tools #visual analytics #visualizing ml models

Openshift Sandbox/Kata Containers

In this article, I will walk you through Openshift Sandbox containers based on Kata containers and how this is different from the traditional Openshift containers. 

Containers graphic.

Sandbox/kata containers are useful for users for the following scenarios: 

  1. Run 3rd party/untrusted application.
  2. Ensure kernel level isolation.
  3. Proper isolation through VM boundaries.

Prerequisites

You will need to install the following technologies before beginning this exercise:

Create the KataConfig

Create the KataConfig  CR and label the node on which Sandbox containers will be running. I have used sandbox=true label. 

apiVersion: kataconfiguration.openshift.io/v1

kind: KataConfig

metadata:

 name: cluster-kataconfig

spec:

   kataConfigPoolSelector:

     matchLabels:

        sandbox: 'true'

Verify the deployment:

oc describe kataconfig cluster-kataconfig

Name:         cluster-kataconfig

…..

Status:

  Installation Status:

    Is In Progress:  false

    Completed:

      Completed Nodes Count:  3

      Completed Nodes List:

        master0

        master1

        master2

    Failed:

    Inprogress:

  Prev Mcp Generation:  2

  Runtime Class:        kata

  Total Nodes Count:    3

  Un Installation Status:

    Completed:

    Failed:

    In Progress:

      Status:

  Upgrade Status:

Events:  <none>

Verify a new machine config(mc) and machine config pool(MCP) would have been created with the name Sandbox:

oc get mc |grep sandbox

50-enable-sandboxed-containers-extension

Verify the node configuration. Login to the Node label sandbox=true:

sh-4.4# cat /etc/crio/crio.conf.d/50-kata

[crio.runtime.runtimes.kata]

  runtime_path = "/usr/bin/containerd-shim-kata-v2"

  runtime_type = "vm"

  runtime_root = "/run/vc"

  privileged_without_host_devices = true

 Verify the Runtimeclass:

→ oc get runtimeclass

NAME   HANDLER   AGE

kata   kata      5d14h

This completes the deployment of the Sandbox container using Operator. 

Deploying the Application on Sandbox vs Regular Containers.

Let's try to deploy Sandbox and Regular containers from the same image and will verify the difference.

I have used a sample application image(quay.io/shailendra14k/getotp) based on spring boot for testing. 

#Regular POD definition:

apiVersion: apps/v1

kind: Deployment

metadata:

 name: webapp-deployment-6.0

 labels:

   app: webapp

   version: v6.0

spec:

 replicas: 2

 selector:

   matchLabels:

     app: webapp

 template:

   metadata:

     labels:

       app: webapp

       version: v6.0

   spec:

     containers:

     - name: webapp

       image: quay.io/shailendra14k/getotp:6.0

       imagePullPolicy: Always

       ports:

       - containerPort: 8180

Version 6.0 is Normal and 6.1 has the runtimeclass=kata. 

apiVersion: apps/v1

kind: Deployment

metadata:

 name: webapp-deployment-6.1

 labels:

   app: webapp

   version: v6.1

spec:

 replicas: 1

 selector:

   matchLabels:

     app: webapp

 template:

   metadata:

     labels:

       app: webapp

       version: v6.1

   Spec:

     runtimeClassName: kata

       containers:

     - name: webapp

       image: quay.io/shailendra14k/getotp:6.1

       imagePullPolicy: Always

       ports:

       - containerPort: 8180

 Deploy the application and verify the status:

➜  ~ oc get pods

NAME                                     READY   STATUS    RESTARTS   AGE

webapp-deployment-6.0-5d78fcd8db-ck7g7   1/1     Running   0          11m

webapp-deployment-6.1-6587f8997b-7f5p5   1/1     Running   0          11m

Compare the Uptime of Both Containers 

#Regular containers:

➜  ~ oc exec -it webapp-deployment-6.0-5d78fcd8db-ck7g7 -- cat /proc/uptime

416625.14 4640515.30

#Sandbox containers:

➜  ~ oc exec -it webapp-deployment-6.1-6587f8997b-7f5p5 -- cat /proc/uptime

670.63 658.26

You can observe the difference, which is huge, the uptime of the regular container Kernel is the same as the Node kernel(416625.14s= 4.8 Days). However, the Sandbox container kernel uptime is the time of the creation of the Pod(670.63s=11min)

Compare the Process on the Nodes

Log in to the node, where both containers are running. Use the oc debug node/<node-Name> 

#Regular containers:


sh-4.4# ps -eaf |grep 10008

1000800+  852898  852878  0 07:23 ?        00:00:08 java -jar /home/jboss/test.jar

1000800+ is the UID for the container.

#Sandbox containers:

 First, fetch the sandbox id using the crictl inspect command:

➜  ~oc get pods webapp-deployment-6.1-6587f8997b-7f5p5  -o jsonpath='{.status.containerStatuses[0]}'

{"containerID":"cri-o://b0768d7fbfd2d656b9900ba0b16b6078eb625b412784809ce516f9111a211e10" …..



#From the node

sh-4.4# crictl inspect b0768d7fbfd2d656b9900ba0b16b6078eb625b412784809ce516f9111a211e10 | jq -r '.info.sandboxID'

7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270

Fetch the process id using the SandboxId:

sh-4.4# ps aux | grep 7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270

root      852850  0.0  0.1 1337556 34816 ?       Sl   07:23   0:00 /usr/bin/containerd-shim-kata-v2 -namespace default -address  -publish-binary /usr/bin/crio -id 

7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270

 

root      852859  0.0  0.0 122804  4776 ?        Sl   07:23   0:00 /usr/libexec/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270/shared -o cache=auto --syslog -o no_posix_lock -d --thread-pool-size=1

root      852865  0.9  1.8 2465200 603492 ?      Sl   07:23   0:15 /usr/libexec/qemu-kiwi -name sandbox-7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270 -uuid ae09b8a0-1f89-4196-8402-cdcb471675bd -machine q35,accel=kvm,kernel_irqchip -cpu 

…… /run/vc/vm/7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270/qemu.log -smp 1,cores=1,threads=1,sockets=12,maxcpus=12

 

root      852873  0.0  0.2 2514884 75800 ?       Sl   07:23   0:00 /usr/libexec/virtiofsd --fd=3 -o source=/run/kata-containers/shared/sandboxes/7740c8967dd6ad50ecd8c31558c3c844bbe7ac4e7ca1115e7f91eec974737270/shared -o cache=auto --syslog -o no_posix_lock -d --thread-pool-size=1

For the regular container, the process runs directly on the Node host kernel; However, for the Sandbox, the containers run inside the VMs.  

Conclusion

Thank you for reading! We saw how the Sandbox containers are deployed on Openshift and its comparison with the regular containers.  

Source: https://dzone.com/articles/openshift-sandboxkata-containers

#openshift #sandbox 

Visual Perception

Why do we visualize data?

It helps us to comprehend _huge _amounts of data by compressing it into a simple, easy to understand visualization. It helps us to find hidden patterns or see underlying problems in the data itself which might not have been obvious without a good chart.

Our brain is specialized to perceive the physical world around us as efficiently as possible. Evidence also suggests that we all develop the same visual systems, regardless of our environment or culture. This suggests that the development of the visual system isn’t solely based on our environment but is the result of millions of years of evolution. Which would contradict the tabula rasa theory (Ware 2021 ). Sorry John Locke. Our visual system splits tasks and thus has specialized regions that are responsible for segmentation (early rapid-processing), edge orientation detection, or color and light perception. We are able to extract features and find patterns with ease.

It is interesting that on a higher level of visual perception (visual cognition), our brains are able to highlight colors and shapes to focus on certain aspects. If we search for red-colored highways on a road map, we can use our visual cognition to highlight the red roads and put the other colors in the background. (Ware 2021)

#data-visualization #gestalt-principles #visualization #data-science #visual-variables