Noemi  Sanford

Noemi Sanford


The Edge Computing Opportunity: It’s Not What You Think

Cloudflare Workers® is one of the largest, most widely used edge computing platforms. We announced Cloudflare Workers nearly three years ago and it’s been generally available for the last two years. Over that time, we’ve seen hundreds of thousands of developers write tens of millions of lines of code that now run across Cloudflare’s network.

Just last quarter, 20,000 developers deployed for the first time a new application using Cloudflare Workers. More than 10% of all requests flowing through our network today use Cloudflare Workers. And, among our largest customers, approximately 20% are adopting Cloudflare Workers as part of their deployments. It’s been incredible to watch the platform grow.

Over the course of the coming week, which we’re calling Serverless Week, we’re going to be announcing a series of enhancements to the Cloudflare Workers platform to allow you to build much more complicated applications, lower your serverless computing bills, make your applications even faster, and prove that the Workers platform is secure to its core.

Matthew’s Hierarchy of Developers’ Needs

Before the week begins, I wanted to step back and talk a bit about what we’ve learned about edge computing over the course of the last three years. When we launched Cloudflare Workers we thought the killer feature was speed. Workers run across the Cloudflare network, closer to end users, so they inherently have faster response times than legacy, centralized serverless platforms.

However, we’ve learned by watching developers use Cloudflare Workers that there are a number of attributes to a development platform that are far more important than just speed. Speed is the icing on the cake, but it’s not, for most applications, an initial requirement. Focusing only on it is a mistake that will doom edge computing platforms to obscurity.

#serverless week #serverless #cloudflare workers #edge #product news

What is GEEK

Buddha Community

The Edge Computing Opportunity: It’s Not What You Think
Mike  Kozey

Mike Kozey


Test_cov_console: Flutter Console Coverage Test

Flutter Console Coverage Test

This small dart tools is used to generate Flutter Coverage Test report to console

How to install

Add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

  test_cov_console: ^0.2.2

How to run

run the following command to make sure all flutter library is up-to-date

flutter pub get
Running "flutter pub get" in coverage...                            0.5s

run the following command to generate on coverage directory

flutter test --coverage
00:02 +1: All tests passed!

run the tool to generate report from

flutter pub run test_cov_console
File                                         |% Branch | % Funcs | % Lines | Uncovered Line #s |
lib/src/                                     |         |         |         |                   |
 print_cov.dart                              |  100.00 |  100.00 |   88.37 |...,149,205,206,207|
 print_cov_constants.dart                    |    0.00 |    0.00 |    0.00 |    no unit testing|
lib/                                         |         |         |         |                   |
 test_cov_console.dart                       |    0.00 |    0.00 |    0.00 |    no unit testing|
 All files with unit testing                 |  100.00 |  100.00 |   88.37 |                   |

Optional parameter

If not given a FILE, "coverage/" will be used.
-f, --file=<FILE>                      The target file to be reported
-e, --exclude=<STRING1,STRING2,...>    A list of contains string for files without unit testing
                                       to be excluded from report
-l, --line                             It will print Lines & Uncovered Lines only
                                       Branch & Functions coverage percentage will not be printed
-i, --ignore                           It will not print any file without unit testing
-m, --multi                            Report from multiple files
-c, --csv                              Output to CSV file
-o, --output=<CSV-FILE>                Full path of output CSV file
                                       If not given, "coverage/test_cov_console.csv" will be used
-t, --total                            Print only the total coverage
                                       Note: it will ignore all other option (if any), except -m
-p, --pass=<MINIMUM>                   Print only the whether total coverage is passed MINIMUM value or not
                                       If the value >= MINIMUM, it will print PASSED, otherwise FAILED
                                       Note: it will ignore all other option (if any), except -m
-h, --help                             Show this help

example run the tool with parameters

flutter pub run test_cov_console --file=coverage/ --exclude=_constants,_mock
File                                         |% Branch | % Funcs | % Lines | Uncovered Line #s |
lib/src/                                     |         |         |         |                   |
 print_cov.dart                              |  100.00 |  100.00 |   88.37 |...,149,205,206,207|
lib/                                         |         |         |         |                   |
 test_cov_console.dart                       |    0.00 |    0.00 |    0.00 |    no unit testing|
 All files with unit testing                 |  100.00 |  100.00 |   88.37 |                   |

report for multiple files (-m, --multi)

It support to run for multiple files with the followings directory structures:
1. No root module
2. With root module
You must run test_cov_console on <root> dir, and the report would be grouped by module, here is
the sample output for directory structure 'with root module':
flutter pub run test_cov_console --file=coverage/ --exclude=_constants,_mock --multi
File                                         |% Branch | % Funcs | % Lines | Uncovered Line #s |
lib/src/                                     |         |         |         |                   |
 print_cov.dart                              |  100.00 |  100.00 |   88.37 |...,149,205,206,207|
lib/                                         |         |         |         |                   |
 test_cov_console.dart                       |    0.00 |    0.00 |    0.00 |    no unit testing|
 All files with unit testing                 |  100.00 |  100.00 |   88.37 |                   |
File - module_a -                            |% Branch | % Funcs | % Lines | Uncovered Line #s |
lib/src/                                     |         |         |         |                   |
 print_cov.dart                              |  100.00 |  100.00 |   88.37 |...,149,205,206,207|
lib/                                         |         |         |         |                   |
 test_cov_console.dart                       |    0.00 |    0.00 |    0.00 |    no unit testing|
 All files with unit testing                 |  100.00 |  100.00 |   88.37 |                   |
File - module_b -                            |% Branch | % Funcs | % Lines | Uncovered Line #s |
lib/src/                                     |         |         |         |                   |
 print_cov.dart                              |  100.00 |  100.00 |   88.37 |...,149,205,206,207|
lib/                                         |         |         |         |                   |
 test_cov_console.dart                       |    0.00 |    0.00 |    0.00 |    no unit testing|
 All files with unit testing                 |  100.00 |  100.00 |   88.37 |                   |

Output to CSV file (-c, --csv, -o, --output)

flutter pub run test_cov_console -c --output=coverage/test_coverage.csv

#### sample CSV output file:
File,% Branch,% Funcs,% Lines,Uncovered Line #s
test_cov_console.dart,0.00,0.00,0.00,no unit testing
print_cov_constants.dart,0.00,0.00,0.00,no unit testing
All files with unit testing,100.00,100.00,86.07,""


Use this package as an executable

Install it

You can install the package from the command line:

dart pub global activate test_cov_console

Use it

The package has the following executables:

$ test_cov_console

Use this package as a library

Depend on it

Run this command:

With Dart:

 $ dart pub add test_cov_console

With Flutter:

 $ flutter pub add test_cov_console

This will add a line like this to your package's pubspec.yaml (and run an implicit dart pub get):

  test_cov_console: ^0.2.2

Alternatively, your editor might support dart pub get or flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:test_cov_console/test_cov_console.dart';


import 'package:flutter/material.dart';

void main() {

class MyApp extends StatelessWidget {
  // This widget is the root of your application.
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo',
      theme: ThemeData(
        // This is the theme of your application.
        // Try running your application with "flutter run". You'll see the
        // application has a blue toolbar. Then, without quitting the app, try
        // changing the primarySwatch below to and then invoke
        // "hot reload" (press "r" in the console where you ran "flutter run",
        // or simply save your changes to "hot reload" in a Flutter IDE).
        // Notice that the counter didn't reset back to zero; the application
        // is not restarted.
        // This makes the visual density adapt to the platform that you run
        // the app on. For desktop platforms, the controls will be smaller and
        // closer together (more dense) than on mobile platforms.
        visualDensity: VisualDensity.adaptivePlatformDensity,
      home: MyHomePage(title: 'Flutter Demo Home Page'),

class MyHomePage extends StatefulWidget {
  MyHomePage({Key? key, required this.title}) : super(key: key);

  // This widget is the home page of your application. It is stateful, meaning
  // that it has a State object (defined below) that contains fields that affect
  // how it looks.

  // This class is the configuration for the state. It holds the values (in this
  // case the title) provided by the parent (in this case the App widget) and
  // used by the build method of the State. Fields in a Widget subclass are
  // always marked "final".

  final String title;

  _MyHomePageState createState() => _MyHomePageState();

class _MyHomePageState extends State<MyHomePage> {
  int _counter = 0;

  void _incrementCounter() {
    setState(() {
      // This call to setState tells the Flutter framework that something has
      // changed in this State, which causes it to rerun the build method below
      // so that the display can reflect the updated values. If we changed
      // _counter without calling setState(), then the build method would not be
      // called again, and so nothing would appear to happen.

  Widget build(BuildContext context) {
    // This method is rerun every time setState is called, for instance as done
    // by the _incrementCounter method above.
    // The Flutter framework has been optimized to make rerunning build methods
    // fast, so that you can just rebuild anything that needs updating rather
    // than having to individually change instances of widgets.
    return Scaffold(
      appBar: AppBar(
        // Here we take the value from the MyHomePage object that was created by
        // the method, and use it to set our appbar title.
        title: Text(widget.title),
      body: Center(
        // Center is a layout widget. It takes a single child and positions it
        // in the middle of the parent.
        child: Column(
          // Column is also a layout widget. It takes a list of children and
          // arranges them vertically. By default, it sizes itself to fit its
          // children horizontally, and tries to be as tall as its parent.
          // Invoke "debug painting" (press "p" in the console, choose the
          // "Toggle Debug Paint" action from the Flutter Inspector in Android
          // Studio, or the "Toggle Debug Paint" command in Visual Studio Code)
          // to see the wireframe for each widget.
          // Column has various properties to control how it sizes itself and
          // how it positions its children. Here we use mainAxisAlignment to
          // center the children vertically; the main axis here is the vertical
          // axis because Columns are vertical (the cross axis would be
          // horizontal).
          children: <Widget>[
              'You have pushed the button this many times:',
              style: Theme.of(context).textTheme.headline4,
      floatingActionButton: FloatingActionButton(
        onPressed: _incrementCounter,
        tooltip: 'Increment',
        child: Icon(Icons.add),
      ), // This trailing comma makes auto-formatting nicer for build methods.

Author: DigitalKatalis
Source Code: 
License: BSD-3-Clause license

#flutter #dart #test 

Zelma  Gerlach

Zelma Gerlach


Edge Computing: Device Edge vs. Cloud Edge

It sometimes makes sense to treat edge computing not as a generic category but as two distinct types of architectures: cloud edge and device edge.

Most people talk about edge computing as a singular type of architecture. But in some respects, it makes sense to think of edge computing as two fundamentally distinct types of architectures: Device edge and cloud edge.

Although a device edge and a cloud edge operate in similar ways from an architectural perspective, they cater to different types of use cases, and they pose different challenges.

Here’s a breakdown of how device edge and cloud edge compare.

Edge computing, defined

First, let’s briefly define edge computing itself.

Edge computing is any type of architecture in which workloads are hosted closer to the “edge” of the network — which typically means closer to end-users — than they would be in conventional architectures that centralize processing and data storage inside large data centers.

#cloud #edge computing #cloud computing #device edge #cloud edge

Juanita  Apio

Juanita Apio


Computing on the EDGE

Most of the companies in today’s era are moving towards cloud for their computation and storage needs. Cloud provides a one shot solution for all the needs for services across various aspects, be it large scale processing, ML model training and deployments or big data storage and analysis. This again requires moving data, video or audio to the cloud for processing and storage which also has certain shortcomings compared to do it at the client like

  • Network latency
  • Network cost and bandwidth
  • Privacy
  • Single point failure

If you look at other side, cloud have their own advantages and I will not talk about them right now. With all these in mind, how about a hybrid approach where few requirements can be moved to the client and some remain on the cloud. This is where EDGE computing comes into picture. According to Wiki here is the definition of the same

Edge computing_ is a distributed computing paradigm that brings computation and data storage closer to the location where it is needed, to improve response times and save bandwidth”_

Edge has a lot of use cases like

  • Trained ML models (specially video and audio) siting closer on the edge for inferencing or prediction.
  • IoT data analysis for large scale machines right at the edge

Look at Gartner hype cycle for emerging technologies. Edge is gaining momentum.

There are many platforms in the market specialised in edge deployments right from cloud solutions like azure iot hub, aws greengrass …, open source like _kubeedge, edgeX-Foundary _and third party like Intellisite etc.

I will focus this article on using one of the platforms for building an “Attendance platform” on the edge using facial recognition. I will add as many links as possible for your references.

Let us start with taking the first step and defining the requirements

  • Capture video from the camera
  • Recognise faces based on trained ML model
  • Display the video feed with recognised faces on the monitor
  • Log attendance in a database
  • Collect logs and metrics
  • Save unrecognised images to a central repository for retraining and improving model
  • Multi site deployments

Choosing a platform

Choosing the right platform from so many options was a bit tricky. For the POC, we looked at few pieces in the platform

  • Pricing
  • Infrastructure maintenance
  • Learning curve
  • Ease of use

There were other metrics as well but these were on top of our mind. Azure IoT looked pretty good in terms of above evaluation. We also looked at Kubeedge which provided deployments on Kubernetes on the edge. It is open source and looked promising. Looking at many components (cloud and edge) involved with maintenance overhead, we decided not to move ahead with open source. We were already using Azure cloud for other cloud infra, this also made our work a little more easier in choosing this platform. This also helped

Leading platform players

Designing the solution

Azure IoT hub provided 2 main components. One is the cloud component responsible for managing the deployments on edge and collection of data from them. The other is the edge component consisting of

  • Edge Agent : manages deployment and monitoring of modules on the IoT Edge device
  • Edge Hub : handles communications between modules on the IoT Edge device, and between the device and IoT Hub.

I will not go into the details, you can find more details here about the Azure IoT edge. To give a brief, Azure edge requires modules as containers which can to be pushed to the edge. The edge device first needs to be registered with the IoT Hub. Once the Edge agent connects with the hub, you can push your modules using a deployment.json file. The container runtime that Azure Edge uses is moby.

We used Azure IoT free tier which was sufficient for our POC. Check the pricing here

As per the requirements of the POC, this is what we came up with

The solution consists of various containers which are deployment on the edge as well as few cloud deployments. I will talk about each components in details as we move ahead.

As part of the POC, we assumed 2 sites where attendance needs to be taken at multiple gates. To simulate, we created 4 ubuntu machine. This is the ubuntu desktop image we used. For attendance, we created a video containing still photos of few filmstars and sportsperson. These videos will be used for attendance in order to simulate the cameras, one for each gate.

Modules in action

Camera module

It captures IP camera feed and pushed the frames for consumption

  • It uses python opencv for capture. For the POC, we read video files pushed inside the container.
  • Frames published to zeromq (brokerless message queue).
  • Used python3-opencv docker container as base image and pyzmq module for mq. Check this blog on how to use zeromq with python.

The module was configured to use a lot of environment variables, one being sampling rate of the video frames. Processing all frames require high memory and CPU, so it is always advisable to drop frames to reduce cpu load. This can be done in either camera module or inferencing module.

Inference Module

  • Used a pre-existing face recognition deep learning model for our inferencing needs.
  • Trained the model with easily available filmstars and sportsperson images.
  • The model was not trained with couple of images which were present in the video to showcase undetected image use case. These undetected images were stored in ADLS gen2, explained in the storage module.
  • Python pyzmq module was used to consume frames published by the camera module.
  • Not every frame was processed and few frames were dropped based on the configuration set via environment variables.
  • Once an image was recognised, a message (json) for attendance was send to the cloud using IoT Edge hub. Use this to specify routes in your deployment file.

#deep-learning #edge-computing #azure #edge

Brain  Crist

Brain Crist


Testing young children’s computational thinking

Computational thinking (CT) comprises a set of skills that are fundamental to computing and being taught in more and more schools across the world. There has been much debate about the details of what CT is and how it should be approached in education, particularly for younger students.

A girl doing digital making on a tablet

In our research seminar this week, we were joined by María Zapata Cáceres from the Universidad Rey Juan Carlos in Madrid. María shared research she and her colleagues have done around CT. Specifically, she presented work on how we can understand what CT skills young children are developing. Building on existing work on assessing CT, she and her colleagues have developed a reliable test for CT skills that can be used with children as young as 5.

María Zapata Cáceres

Why do we need to test computational thinking?

Until we can assess something, María argues, we don’t know what children have or haven’t learned or what they are capable of. While testing is often associated with the final stages in learning, in order to teach something well, educators need to understand where their students’ skills are to know what they are aiming for them to learn. With CT being taught in increasing numbers of schools and in many different ways, María argues that it is imperative to be able to test learners on it.

Screenshot from an online research seminar about computational thinking with María Zapata Cáceres

How was the test developed?

One of the key challenges for assessing learning is knowing whether the activities or questions you present to learners are actually testing what you intend them to. To make sure this is the case, assessments go through a process of validation: they are tried out with large groups to ensure that the results they give are valid. María’s and her colleagues’ CT test for beginners is based on a CT test developed by researcher Marcos Román González. That test had been validated, but since it is aimed at 10- to 16-year-olds, María and her colleagues needed to adapt it for younger children and then validate the adapted rest.

Developing the first version

The new test for beginners consists of 25 questions, each of which has four possible responses, which are to be answered within 40 minutes. The questions are of two types: one that involves using instructions to draw on a canvas, and one that involves moving characters through mazes. Since the test is for younger children, María and her colleagues designed it so it involves as little text as possible to reduce the need for reading; instead the test includes self-explanatory symbols.

Screenshot from an online research seminar about computational thinking with María Zapata Cáceres

Developing a second version based on feedback

To refine the test, the researchers consulted with a group of 45 experts about the difficulty of the questions and the test’s length of the test. The general feedback was very positive.

Drawing on the experts’ feedback, María and her colleagues made some very specific improvements to the test to make it more appropriate for younger children:

  • The improve test mandates that an verbal explanation be given to children at the start, to make sure they clearly understand how to take the test and don’t have to rely on reading the instructions.
  • In some areas, the researchers added written explanations where experts had identified that questions contained ambiguity that could cause the children to misinterpret them.
  • A key improvement was to adapt the grids in the original test to include pathways between each box of the maze. It was found that children could misinterpret the maze, for example as allowing diagonal moves between squares; the added pathways are visual cues that it clear that this is not possible.

Screenshot from an online research seminar about computational thinking with María Zapata Cáceres

Validating the test

After these improvements, the test was validated with 299 primary school students aged 5-12. To assess the differences the improvements might make, the students were given different version of the test. María and her colleagues found that the younger students benefited from the improvements, and the improvements made the test more reliable for testing students’ computational thinking: students made fewer errors due to ambiguity and misinterpretation.

Statistical analysis of the test results showed that the improved version of the test is reliable and can be used with confidence to assess the skills of younger children.

What can you use this test for?

Firstly, the test is a tool for educators who want to assess the skills young people have and develop over time. Secondly, the test is also valuable for researchers. It can be used to perform projects that evaluate the outcomes of different approaches to teaching computational thinking, as well as projects investigating the effectiveness of specific learning resources, because the test can be given to children before and again after they engage with the resources.

Assessment is one of the many tools educators use to shape their teaching and promote the learning of their students, and tools like this CT test developed by María and her colleagues allow us to better understand what children are learning.

Find out more & join our next seminar

The video and slides of María’s presentation are available on our seminars page. To find out more about this test, and the process used to create and validate it, read the paper by María and her colleagues.

Our final seminar of this series takes place Tuesday 28 July before we take a break for the summer. In the session, we will explore gender balance in computing, led by Katharine Childs, who works on the Gender Balance in Computing research project at the Raspberry Pi Foundation. You can find out more and sign up to attend for free on our Computing Education Research Seminars page.

#education #research #computational thinking #computing education #research seminar #assessment

How to Predict Housing Prices with Linear Regression?


The final objective is to estimate the cost of a certain house in a Boston suburb. In 1970, the Boston Standard Metropolitan Statistical Area provided the information. To examine and modify the data, we will use several techniques such as data pre-processing and feature engineering. After that, we'll apply a statistical model like regression model to anticipate and monitor the real estate market.

Project Outline:

  • EDA
  • Feature Engineering
  • Pick and Train a Model
  • Interpret
  • Conclusion


Before using a statistical model, the EDA is a good step to go through in order to:

  • Recognize the data set
  • Check to see if any information is missing.
  • Find some outliers.
  • To get more out of the data, add, alter, or eliminate some features.

Importing the Libraries

  • Recognize the data set
  • Check to see if any information is missing.
  • Find some outliers.
  • To get more out of the data, add, alter, or eliminate some features.

# Import the libraries #Dataframe/Numerical libraries import pandas as pd import numpy as np #Data visualization import as px import matplotlib import matplotlib.pyplot as plt import seaborn as sns #Machine learning model from sklearn.linear_model import LinearRegression

Reading the Dataset with Pandas

#Reading the data path='./housing.csv' housing_df=pd.read_csv(path,header=None,delim_whitespace=True)


Have a Look at the Columns

Crime: It refers to a town's per capita crime rate.

ZN: It is the percentage of residential land allocated for 25,000 square feet.

Indus: The amount of non-retail business lands per town is referred to as the indus.

CHAS: CHAS denotes whether or not the land is surrounded by a river.

NOX: The NOX stands for nitric oxide content (part per 10m)

RM: The average number of rooms per home is referred to as RM.

AGE: The percentage of owner-occupied housing built before 1940 is referred to as AGE.

DIS: Weighted distance to five Boston employment centers are referred to as dis.

RAD: Accessibility to radial highways index

TAX: The TAX columns denote the rate of full-value property taxes per $10,000 dollars.

B: B=1000(Bk — 0.63)2 is the outcome of the equation, where Bk is the proportion of blacks in each town.

PTRATIO: It refers to the student-to-teacher ratio in each community.

LSTAT: It refers to the population's lower socioeconomic status.

MEDV: It refers to the 1000-dollar median value of owner-occupied residences.

Data Preprocessing

# Check if there is any missing values. housing_df.isna().sum() CRIM       0 ZN         0 INDUS      0 CHAS       0 NOX        0 RM         0 AGE        0 DIS        0 RAD        0 TAX        0 PTRATIO    0 B          0 LSTAT      0 MEDV       0 dtype: int64

No missing values are found

We examine our data's mean, standard deviation, and percentiles.


Graph Data


The crime, area, sector, nitric oxides, 'B' appear to have multiple outliers at first look because the minimum and maximum values are so far apart. In the Age columns, the mean and the Q2(50 percentile) do not match.

We might double-check it by examining the distribution of each column.


  1. The rate of crime is rather low. The majority of values are in the range of 0 to 25. With a huge value and a value of zero.
  2. The majority of residential land is zoned for less than 25,000 square feet. Land zones larger than 25,000 square feet represent a small portion of the dataset.
  3. The percentage of non-retial commercial acres is mostly split between two ranges: 0-13 and 13-23.
  4. The majority of the properties are bordered by the river, although a tiny portion of the data is not.
  5. The content of nitrite dioxide has been trending lower from.3 to.7, with a little bump towards.8. It is permissible to leave a value in the range of 0.1–1.
  6. The number of rooms tends to cluster around the average.
  7. With time, the proportion of owner-occupied units rises.
  8. As the number of weights grows, the weight distance between 5 employment centers reduces. It could indicate that individuals choose to live in new high-employment areas.
  9. People choose to live in places with limited access to roadways (0-10). We have a 30th percentile outlier.
  10. The majority of dwelling taxes are in the range of $200-450, with large outliers around $700,000.
  11. The percentage of people with lower status tends to cluster around the median. The majority of persons are of lower social standing.

Because the model is overly generic, removing all outliers will underfit it. Keeping all outliers causes the model to overfit and become excessively accurate. The data's noise will be learned.

The approach is to establish a happy medium that prevents the model from becoming overly precise. When faced with a new set of data, however, they generalise well.

We'll keep numbers below 600 because there's a huge anomaly in the TAX column around 600.


Looking at the Distribution


The overall distribution, particularly the TAX, PTRATIO, and RAD, has improved slightly.



Perfect correlation is denoted by the clear values. The medium correlation between the columns is represented by the reds, while the negative correlation is represented by the black.

With a value of 0.89, we can see that 'MEDV', which is the medium price we wish to anticipate, is substantially connected with the number of rooms 'RM'. The proportion of black people in area 'B' with a value of 0.19 is followed by the residential land 'ZN' with a value of 0.32 and the percentage of black people in area 'ZN' with a value of 0.32.

The metrics that are most connected with price will be plotted.


Feature Engineering

Feature Scaling

Gradient descent is aided by feature scaling, which ensures that all features are on the same scale. It makes locating the local optimum much easier.

Mean standardization is one strategy to employ. It substitutes (target-mean) for the target to ensure that the feature has a mean of nearly zero.

def standard(X):    '''Standard makes the feature 'X' have a zero mean'''    mu=np.mean(X) #mean    std=np.std(X) #standard deviation    sta=(X-mu)/std # mean normalization    return mu,std,sta     mu,std,sta=standard(X) X=sta X


Choose and Train the Model

For the sake of the project, we'll apply linear regression.

Typically, we run numerous models and select the best one based on a particular criterion.

Linear regression is a sort of supervised learning model in which the response is continuous, as it relates to machine learning.

Form of Linear Regression

y= θX+θ1 or y= θ1+X1θ2 +X2θ3 + X3θ4

y is the target you will be predicting

0 is the coefficient

x is the input

We will Sklearn to develop and train the model

#Import the libraries to train the model from sklearn.model_selection import train_test_split from sklearn.linear_model import LinearRegression

Allow us to utilise the train/test method to learn a part of the data on one set and predict using another set using the train/test approach.

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4) #Create and Train the model model=LinearRegression().fit(X_train,y_train) #Generate prediction predictions_test=model.predict(X_test) #Compute loss to evaluate the model coefficient= model.coef_ intercept=model.intercept_ print(coefficient,intercept) [7.22218258] 24.66379606613584

In this example, you will learn the model using below hypothesis:

Price= 24.85 + 7.18* Room

It is interpreted as:

For a decided price of a house:

A 7.18-unit increase in the price is connected with a growth in the number of rooms.

As a side note, this is an association, not a cause!


You will need a metric to determine whether our hypothesis was right. The RMSE approach will be used.

Root Means Square Error (RMSE) is defined as the square root of the mean of square error. The difference between the true and anticipated numbers called the error. It's popular because it can be expressed in y-units, which is the median price of a home in our scenario.

def rmse(predict,actual):    return np.sqrt(np.mean(np.square(predict - actual))) # Split the Data into train and test set X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4) #Create and Train the model model=LinearRegression().fit(X_train,y_train) #Generate prediction predictions_test=model.predict(X_test) #Compute loss to evaluate the model coefficient= model.coef_ intercept=model.intercept_ print(coefficient,intercept) loss=rmse(predictions_test,y_test) print('loss: ',loss) print(model.score(X_test,y_test)) #accuracy [7.43327725] 24.912055881970886 loss: 3.9673165450580714 0.7552661033654667 Loss will be 3.96

This means that y-units refer to the median value of occupied homes with 1000 dollars.

This will be less by 3960 dollars.

While learning the model you will have a high variance when you divide the data. Coefficient and intercept will vary. It's because when we utilized the train/test approach, we choose a set of data at random to place in either the train or test set. As a result, our theory will change each time the dataset is divided.

This problem can be solved using a technique called cross-validation.

Improvisation in the Model

With 'Forward Selection,' we'll iterate through each parameter to assist us choose the numbers characteristics to include in our model.

Forward Selection

  1. Choose the most appropriate variable (in our case based on high correlation)
  2. Add the next best variable to the model
  3. Some predetermined conditions must meet.

We'll use a random state of 1 so that each iteration yields the same outcome.

cols=[] los=[] los_train=[] scor=[] i=0 while i < len(high_corr_var):    cols.append(high_corr_var[i])        # Select inputs variables    X=new_df[cols]        #mean normalization    mu,std,sta=standard(X)    X=sta        # Split the data into training and testing    X_train,X_test,y_train,y_test= train_test_split(X,y,random_state=1)        #fit the model to the training    lnreg=LinearRegression().fit(X_train,y_train)        #make prediction on the training test    prediction_train=lnreg.predict(X_train)        #make prediction on the testing test    prediction=lnreg.predict(X_test)        #compute the loss on train test    loss=rmse(prediction,y_test)    loss_train=rmse(prediction_train,y_train)    los_train.append(loss_train)    los.append(loss)        #compute the score    score=lnreg.score(X_test,y_test)    scor.append(score)        i+=1

We have a big 'loss' with a smaller collection of variables, yet our system will overgeneralize in this scenario. Although we have a reduced 'loss,' we have a large number of variables. However, if the model grows too precise, it may not generalize well to new data.

In order for our model to generalize well with another set of data, we might use 6 or 7 features. The characteristic chosen is descending based on how strong the price correlation is.

high_corr_var ['RM', 'ZN', 'B', 'CHAS', 'RAD', 'DIS', 'CRIM', 'NOX', 'AGE', 'TAX', 'INDUS', 'PTRATIO', 'LSTAT']

With 'RM' having a high price correlation and LSTAT having a negative price correlation.

# Create a list of features names feature_cols=['RM','ZN','B','CHAS','RAD','CRIM','DIS','NOX'] #Select inputs variables X=new_df[feature_cols] # Split the data into training and testing sets X_train,X_test,y_train,y_test= train_test_split(X,y, random_state=1) # feature engineering mu,std,sta=standard(X) X=sta # fit the model to the trainning data lnreg=LinearRegression().fit(X_train,y_train) # make prediction on the testing test prediction=lnreg.predict(X_test) # compute the loss loss=rmse(prediction,y_test) print('loss: ',loss) lnreg.score(X_test,y_test) loss: 3.212659865936143 0.8582338376696363

The test set yielded a loss of 3.21 and an accuracy of 85%.

Other factors, such as alpha, the learning rate at which our model learns, could still be tweaked to improve our model. Alternatively, return to the preprocessing section and working to increase the parameter distribution.

For more details regarding scraping real estate data you can contact Scraping Intelligence today