Rust Wasm Mandelbrot: A Simple Mandelbrot-computation

Mandelbrot set

We consider the sequence

$z_{n+1} = z_n^2 + c$, with $z_0=0$,

where $c$ is a complex number.

The Mandelbrot set are all $c$ such that the sequence is bounded.

The Mandelbrot sequence diverges if $|z_n|\ge 2$ and $|z_n|\ge |c|$ for any $n$.

The reverse triangle inequality, $|x+y| \ge |x| - |y|$.

$|z_{n+1}|=|z_n^2+c|\ge |z_n|^2-|c| \ge 2|z_n|-|z_n| = |z_n|$.

We have shown that such sequences are unbounded and do not belong complement of the Mandelbrot set.

A corollary is that the Mandelbrot set is contained in the circle $|c| \le 2$.

Example Image Snapshot of an early zoom in the Mandelbrot image.

Implementation details

The escape iteration is the first iteration where $|z|\ge 2$. We compute the escape iteration for each point in the image. The final task is then to decide how to map the escape iteration to a color. A major difficulty with any Mandelbrot computation is to balance the colors properly. We compute the frequency (occurances) of each escape iteration, and compute the cumulative sum of the frequencies. After we bin the different escape frequencies to have an approximately linear spread.

On the typescript side, the image is stored as an ImageData, consisting of a UInt8ClampedArray. The RGBA values are stored in steps of 4, repeating like

R,G,B,A, R,G,B,A, ...

On the Rust-side, I naively used a vector with u32, and used bit manipulations to get the bits for each color etc. This turned out to be a nightmare, due to little endian-problem. I switched to using a vec of u8, and then it worked.

Things to improve

Webpack

In order to have a reactive development environment, I use webpack. One thing that drives me crazy is that the CSS-code is written directly in the HTML code. In order to use a separate CSS-file, we need to configure webpack to keep track of CSS-files. There are a few npm packages to install, and then it should work. Maybe not sufficiently important (here) to make the effort, but I still want to explore webpack a bit more.

Extend to other iterative procedures

One could consider looking att different iterative schemes, like Julia-sets etc. The purpose of this project is mainly to get things working with Rust-wasm-typescript, with something that is computationally heavy.

Better (less minimalistic) User Interface

The resulting webpage consist of a single canvas, that have the mandelbrot image, and where you can zoom in using the mouse. All parameters (colorscheme etc) are hardcoded in the typescript program. It should be easy to make some buttons, to make it a bit more user friendly.

Concurrency improvement in Rust

The code is only using a single thread. Since the escape iteration detection is point-wise, this is a perfect example of an application that can be trivially parallelized. I should investigate how to do this in Rust.

Installation Steps

  1. First step to initialize the project: cargo init --lib mandelbrot.
  2. Create a directory mandelbrot/www.

Move in to the mandelbrot/www directory.

Initialize a node-project: npm init -y.

Install webpack npm install --save webpack webpack-cli copy-webpack-plugin.

Install development server: npm install --save-dev webpack-dev-server.

Create a .gitignore in www to avoid storing node_modules.

Create a sub-directory public.

Create index.html, index.js and bootstrap.js in www.

Create and configure webpack.config.js.

Add "dev": "webpack-dev-server" to the node configuration (package.json).

Add "mandelbrot": "file:../pkg", to package.json. In order to use typescript (from ~/www):

npm install --save typescript ts-loader.

Configure tsconfig.json.


Download Details:

Author: andreasatle
Source Code: https://github.com/andreasatle/rust-wasm-mandelbrot

#rust 

What is GEEK

Buddha Community

Rust Wasm Mandelbrot: A Simple Mandelbrot-computation

Serde Rust: Serialization Framework for Rust

Serde

*Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.*

You may be looking for:

Serde in action

Click to show Cargo.toml. Run this code in the playground.

[dependencies]

# The core APIs, including the Serialize and Deserialize traits. Always
# required when using Serde. The "derive" feature is only required when
# using #[derive(Serialize, Deserialize)] to make Serde work with structs
# and enums defined in your crate.
serde = { version = "1.0", features = ["derive"] }

# Each data format lives in its own crate; the sample code below uses JSON
# but you may be using a different one.
serde_json = "1.0"

 

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let point = Point { x: 1, y: 2 };

    // Convert the Point to a JSON string.
    let serialized = serde_json::to_string(&point).unwrap();

    // Prints serialized = {"x":1,"y":2}
    println!("serialized = {}", serialized);

    // Convert the JSON string back to a Point.
    let deserialized: Point = serde_json::from_str(&serialized).unwrap();

    // Prints deserialized = Point { x: 1, y: 2 }
    println!("deserialized = {:?}", deserialized);
}

Getting help

Serde is one of the most widely used Rust libraries so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #rust-questions or #rust-beginners channels of the unofficial community Discord (invite: https://discord.gg/rust-lang-community), the #rust-usage or #beginners channels of the official Rust Project Discord (invite: https://discord.gg/rust-lang), or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

Download Details:
Author: serde-rs
Source Code: https://github.com/serde-rs/serde
License: View license

#rust  #rustlang 

Running Rust in the Browser with Web Assembly

I’ve recently been working on a Rust course for the Qvault app. In order to write a more engaging course, I want students to be able to write and execute code right in the browser. As I’ve learned from my previous posts on this topic, the easiest way to sandbox code execution on a server is to not execute code on a server. Enter Web Assembly, stage left.

For those of you who don’t care about how it works, and just want to give it a try, checkout the demo: Rust WASM Playground.

How It Works

The architecture is fairly simple:

  • User writes code in the browser
  • Browser sends code to server
  • Server adds some glue and compiles code to WASM
  • Server sends WASM bytes or compiler errors back to browser
  • Browser runs WASM and displays console output, or shows compiler errors

Writing code and shipping it to the server hopefully needs no explanation, it’s a simple text editor coupled with the fetch API. The first interesting thing we do is compile the code on the server.

Compiling the Code

Qvault’s server is written in Go. I have a simple HTTP handler with the following signature:

func (cfg config) compileRustHandler(w http.ResponseWriter, r *http.Request)

At the start of the function we unmarshal the code which was provided in a JSON body:

type parameters struct {
	Code string
}

decoder := json.NewDecoder(r.Body)
params := parameters{}
err := decoder.Decode(&params)
if err != nil {
	respondWithError(w, 500, "Couldn't decode parameters")
	return
}

Next, we create a temporary folder on disk that we’ll use as a “scratch pad” to create a Rust project.

usr, err := user.Current()
if err != nil {
	respondWithError(w, 500, "Couldn't get system user")
	return
}
workingDir := filepath.Join(usr.HomeDir, ".wasm", uuid.New().String())
err = os.MkdirAll(workingDir, os.ModePerm)
if err != nil {
	respondWithError(w, 500, "Couldn't create directory for compilation")
	return
}
defer func() {
	err = os.RemoveAll(workingDir)
	if err != nil {
		respondWithError(w, 500, "Couldn't clean up code from compilation")
		return
	}
}()

As you can see, we create the project under the .wasm/uuid path in the home directory. We also defer an os.RemoveAll function that will delete this folder when we are doing handling this request.

#golang #languages #rust #wasm #rust #rustlang #wasm #web assembly

Rust Wasm Mandelbrot: A Simple Mandelbrot-computation

Mandelbrot set

We consider the sequence

$z_{n+1} = z_n^2 + c$, with $z_0=0$,

where $c$ is a complex number.

The Mandelbrot set are all $c$ such that the sequence is bounded.

The Mandelbrot sequence diverges if $|z_n|\ge 2$ and $|z_n|\ge |c|$ for any $n$.

The reverse triangle inequality, $|x+y| \ge |x| - |y|$.

$|z_{n+1}|=|z_n^2+c|\ge |z_n|^2-|c| \ge 2|z_n|-|z_n| = |z_n|$.

We have shown that such sequences are unbounded and do not belong complement of the Mandelbrot set.

A corollary is that the Mandelbrot set is contained in the circle $|c| \le 2$.

Example Image Snapshot of an early zoom in the Mandelbrot image.

Implementation details

The escape iteration is the first iteration where $|z|\ge 2$. We compute the escape iteration for each point in the image. The final task is then to decide how to map the escape iteration to a color. A major difficulty with any Mandelbrot computation is to balance the colors properly. We compute the frequency (occurances) of each escape iteration, and compute the cumulative sum of the frequencies. After we bin the different escape frequencies to have an approximately linear spread.

On the typescript side, the image is stored as an ImageData, consisting of a UInt8ClampedArray. The RGBA values are stored in steps of 4, repeating like

R,G,B,A, R,G,B,A, ...

On the Rust-side, I naively used a vector with u32, and used bit manipulations to get the bits for each color etc. This turned out to be a nightmare, due to little endian-problem. I switched to using a vec of u8, and then it worked.

Things to improve

Webpack

In order to have a reactive development environment, I use webpack. One thing that drives me crazy is that the CSS-code is written directly in the HTML code. In order to use a separate CSS-file, we need to configure webpack to keep track of CSS-files. There are a few npm packages to install, and then it should work. Maybe not sufficiently important (here) to make the effort, but I still want to explore webpack a bit more.

Extend to other iterative procedures

One could consider looking att different iterative schemes, like Julia-sets etc. The purpose of this project is mainly to get things working with Rust-wasm-typescript, with something that is computationally heavy.

Better (less minimalistic) User Interface

The resulting webpage consist of a single canvas, that have the mandelbrot image, and where you can zoom in using the mouse. All parameters (colorscheme etc) are hardcoded in the typescript program. It should be easy to make some buttons, to make it a bit more user friendly.

Concurrency improvement in Rust

The code is only using a single thread. Since the escape iteration detection is point-wise, this is a perfect example of an application that can be trivially parallelized. I should investigate how to do this in Rust.

Installation Steps

  1. First step to initialize the project: cargo init --lib mandelbrot.
  2. Create a directory mandelbrot/www.

Move in to the mandelbrot/www directory.

Initialize a node-project: npm init -y.

Install webpack npm install --save webpack webpack-cli copy-webpack-plugin.

Install development server: npm install --save-dev webpack-dev-server.

Create a .gitignore in www to avoid storing node_modules.

Create a sub-directory public.

Create index.html, index.js and bootstrap.js in www.

Create and configure webpack.config.js.

Add "dev": "webpack-dev-server" to the node configuration (package.json).

Add "mandelbrot": "file:../pkg", to package.json. In order to use typescript (from ~/www):

npm install --save typescript ts-loader.

Configure tsconfig.json.


Download Details:

Author: andreasatle
Source Code: https://github.com/andreasatle/rust-wasm-mandelbrot

#rust 

Awesome  Rust

Awesome Rust

1654894080

Serde JSON: JSON Support for Serde Framework

Serde JSON

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.

[dependencies]
serde_json = "1.0"

You may be looking for:

JSON is a ubiquitous open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs.

{
    "name": "John Doe",
    "age": 43,
    "address": {
        "street": "10 Downing Street",
        "city": "London"
    },
    "phones": [
        "+44 1234567",
        "+44 2345678"
    ]
}

There are three common ways that you might find yourself needing to work with JSON data in Rust.

  • As text data. An unprocessed string of JSON data that you receive on an HTTP endpoint, read from a file, or prepare to send to a remote server.
  • As an untyped or loosely typed representation. Maybe you want to check that some JSON data is valid before passing it on, but without knowing the structure of what it contains. Or you want to do very basic manipulations like insert a key in a particular spot.
  • As a strongly typed Rust data structure. When you expect all or most of your data to conform to a particular structure and want to get real work done without JSON's loosey-goosey nature tripping you up.

Serde JSON provides efficient, flexible, safe ways of converting data between each of these representations.

Operating on untyped JSON values

Any valid JSON data can be manipulated in the following recursive enum representation. This data structure is serde_json::Value.

enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

A string of JSON data can be parsed into a serde_json::Value by the serde_json::from_str function. There is also from_slice for parsing from a byte slice &[u8] and from_reader for parsing from any io::Read like a File or a TCP stream.

use serde_json::{Result, Value};

fn untyped_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into serde_json::Value.
    let v: Value = serde_json::from_str(data)?;

    // Access parts of the data by indexing with square brackets.
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    Ok(())
}

The result of square bracket indexing like v["name"] is a borrow of the data at that index, so the type is &Value. A JSON map can be indexed with string keys, while a JSON array can be indexed with integer keys. If the type of the data is not right for the type with which it is being indexed, or if a map does not contain the key being indexed, or if the index into a vector is out of bounds, the returned element is Value::Null.

When a Value is printed, it is printed as a JSON string. So in the code above, the output looks like Please call "John Doe" at the number "+44 1234567". The quotation marks appear because v["name"] is a &Value containing a JSON string and its JSON representation is "John Doe". Printing as a plain string without quotation marks involves converting from a JSON string to a Rust string with as_str() or avoiding the use of Value as described in the following section.

The Value representation is sufficient for very basic tasks but can be tedious to work with for anything more significant. Error handling is verbose to implement correctly, for example imagine trying to detect the presence of unrecognized fields in the input data. The compiler is powerless to help you when you make a mistake, for example imagine typoing v["name"] as v["nmae"] in one of the dozens of places it is used in your code.

Parsing JSON as strongly typed data structures

Serde provides a powerful way of mapping JSON data into Rust data structures largely automatically.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

fn typed_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into a Person object. This is exactly the
    // same function as the one that produced serde_json::Value above, but
    // now we are asking it for a Person as output.
    let p: Person = serde_json::from_str(data)?;

    // Do things just like with any other Rust data structure.
    println!("Please call {} at the number {}", p.name, p.phones[0]);

    Ok(())
}

This is the same serde_json::from_str function as before, but this time we assign the return value to a variable of type Person so Serde will automatically interpret the input data as a Person and produce informative error messages if the layout does not conform to what a Person is expected to look like.

Any type that implements Serde's Deserialize trait can be deserialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Deserialize)].

Once we have p of type Person, our IDE and the Rust compiler can help us use it correctly like they do for any other Rust code. The IDE can autocomplete field names to prevent typos, which was impossible in the serde_json::Value representation. And the Rust compiler can check that when we write p.phones[0], then p.phones is guaranteed to be a Vec<String> so indexing into it makes sense and produces a String.

The necessary setup for using Serde's derive macros is explained on the Using derive page of the Serde site.

Constructing JSON values

Serde JSON provides a json! macro to build serde_json::Value objects with very natural JSON syntax.

use serde_json::json;

fn main() {
    // The type of `john` is `serde_json::Value`
    let john = json!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    });

    println!("first phone number: {}", john["phones"][0]);

    // Convert to a string of JSON and print it out
    println!("{}", john.to_string());
}

The Value::to_string() function converts a serde_json::Value into a String of JSON text.

One neat thing about the json! macro is that variables and expressions can be interpolated directly into the JSON value as you are building it. Serde will check at compile time that the value you are interpolating is able to be represented as JSON.

let full_name = "John Doe";
let age_last_year = 42;

// The type of `john` is `serde_json::Value`
let john = json!({
    "name": full_name,
    "age": age_last_year + 1,
    "phones": [
        format!("+44 {}", random_phone())
    ]
});

This is amazingly convenient, but we have the problem we had before with Value: the IDE and Rust compiler cannot help us if we get it wrong. Serde JSON provides a better way of serializing strongly-typed data structures into JSON text.

Creating JSON by serializing data structures

A data structure can be converted to a JSON string by serde_json::to_string. There is also serde_json::to_vec which serializes to a Vec<u8> and serde_json::to_writer which serializes to any io::Write such as a File or a TCP stream.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Address {
    street: String,
    city: String,
}

fn print_an_address() -> Result<()> {
    // Some data structure.
    let address = Address {
        street: "10 Downing Street".to_owned(),
        city: "London".to_owned(),
    };

    // Serialize it to a JSON string.
    let j = serde_json::to_string(&address)?;

    // Print, write to a file, or send to an HTTP server.
    println!("{}", j);

    Ok(())
}

Any type that implements Serde's Serialize trait can be serialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Serialize)].

Performance

It is fast. You should expect in the ballpark of 500 to 1000 megabytes per second deserialization and 600 to 900 megabytes per second serialization, depending on the characteristics of your data. This is competitive with the fastest C and C++ JSON libraries or even 30% faster for many use cases. Benchmarks live in the serde-rs/json-benchmark repo.

Getting help

Serde is one of the most widely used Rust libraries, so any place that Rustaceans congregate will be able to help you out. For chat, consider trying the #rust-questions or #rust-beginners channels of the unofficial community Discord (invite: https://discord.gg/rust-lang-community), the #rust-usage or #beginners channels of the official Rust Project Discord (invite: https://discord.gg/rust-lang), or the #general stream in Zulip. For asynchronous, consider the [rust] tag on StackOverflow, the /r/rust subreddit which has a pinned weekly easy questions post, or the Rust Discourse forum. It's acceptable to file a support issue in this repo, but they tend not to get as many eyes as any of the above and may get closed without a response after some time.

No-std support

As long as there is a memory allocator, it is possible to use serde_json without the rest of the Rust standard library. This is supported on Rust 1.36+. Disable the default "std" feature and enable the "alloc" feature:

[dependencies]
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }

For JSON support in Serde without a memory allocator, please see the serde-json-core crate.

Link: https://crates.io/crates/serde_json

#rust  #rustlang  #encode   #json 

How to Predict Housing Prices with Linear Regression?

How-to-Predict-Housing-Prices-with-Linear-Regression

The final objective is to estimate the cost of a certain house in a Boston suburb. In 1970, the Boston Standard Metropolitan Statistical Area provided the information. To examine and modify the data, we will use several techniques such as data pre-processing and feature engineering. After that, we'll apply a statistical model like regression model to anticipate and monitor the real estate market.

Project Outline:

  • EDA
  • Feature Engineering
  • Pick and Train a Model
  • Interpret
  • Conclusion

EDA

Before using a statistical model, the EDA is a good step to go through in order to:

  • Recognize the data set
  • Check to see if any information is missing.
  • Find some outliers.
  • To get more out of the data, add, alter, or eliminate some features.

Importing the Libraries

  • Recognize the data set
  • Check to see if any information is missing.
  • Find some outliers.
  • To get more out of the data, add, alter, or eliminate some features.

# Import the libraries #Dataframe/Numerical libraries import pandas as pd import numpy as np #Data visualization import plotly.express as px import matplotlib import matplotlib.pyplot as plt import seaborn as sns #Machine learning model from sklearn.linear_model import LinearRegression

Reading the Dataset with Pandas

#Reading the data path='./housing.csv' housing_df=pd.read_csv(path,header=None,delim_whitespace=True)

 CRIMZNINDUSCHASNOXRMAGEDISRADTAXPTRATIOBLSTATMEDV
00.0063218.02.3100.5386.57565.24.09001296.015.3396.904.9824.0
10.027310.07.0700.4696.42178.94.96712242.017.8396.909.1421.6
20.027290.07.0700.4697.18561.14.96712242.017.8392.834.0334.7
30.032370.02.1800.4586.99845.86.06223222.018.7394.632.9433.4
40.069050.02.1800.4587.14754.26.06223222.018.7396.905.3336.2
.............................................
5010.062630.011.9300.5736.59369.12.47861273.021.0391.999.6722.4
5020.045270.011.9300.5736.12076.72.28751273.021.0396.909.0820.6
5030.060760.011.9300.5736.97691.02.16751273.021.0396.905.6423.9
5040.109590.011.9300.5736.79489.32.38891273.021.0393.456.4822.0
5050.047410.011.9300.5736.03080.82.50501273.021.0396.907.8811.9

Have a Look at the Columns

Crime: It refers to a town's per capita crime rate.

ZN: It is the percentage of residential land allocated for 25,000 square feet.

Indus: The amount of non-retail business lands per town is referred to as the indus.

CHAS: CHAS denotes whether or not the land is surrounded by a river.

NOX: The NOX stands for nitric oxide content (part per 10m)

RM: The average number of rooms per home is referred to as RM.

AGE: The percentage of owner-occupied housing built before 1940 is referred to as AGE.

DIS: Weighted distance to five Boston employment centers are referred to as dis.

RAD: Accessibility to radial highways index

TAX: The TAX columns denote the rate of full-value property taxes per $10,000 dollars.

B: B=1000(Bk — 0.63)2 is the outcome of the equation, where Bk is the proportion of blacks in each town.

PTRATIO: It refers to the student-to-teacher ratio in each community.

LSTAT: It refers to the population's lower socioeconomic status.

MEDV: It refers to the 1000-dollar median value of owner-occupied residences.

Data Preprocessing

# Check if there is any missing values. housing_df.isna().sum() CRIM       0 ZN         0 INDUS      0 CHAS       0 NOX        0 RM         0 AGE        0 DIS        0 RAD        0 TAX        0 PTRATIO    0 B          0 LSTAT      0 MEDV       0 dtype: int64

No missing values are found

We examine our data's mean, standard deviation, and percentiles.

housing_df.describe()

Graph Data

 CRIMZNINDUSCHASNOXRMAGEDISRADTAXPTRATIOBLSTATMEDV
count506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000506.000000
mean3.61352411.36363611.1367790.0691700.5546956.28463468.5749013.7950439.549407408.23715418.455534356.67403212.65306322.532806
std8.60154523.3224536.8603530.2539940.1158780.70261728.1488612.1057108.707259168.5371162.16494691.2948647.1410629.197104
min0.0063200.0000000.4600000.0000000.3850003.5610002.9000001.1296001.000000187.00000012.6000000.3200001.7300005.000000
25%0.0820450.0000005.1900000.0000000.4490005.88550045.0250002.1001754.000000279.00000017.400000375.3775006.95000017.025000
50%0.2565100.0000009.6900000.0000000.5380006.20850077.5000003.2074505.000000330.00000019.050000391.44000011.36000021.200000
75%3.67708312.50000018.1000000.0000000.6240006.62350094.0750005.18842524.000000666.00000020.200000396.22500016.95500025.000000
max88.976200100.00000027.7400001.0000000.8710008.780000100.00000012.12650024.000000711.00000022.000000396.90000037.97000050.000000

The crime, area, sector, nitric oxides, 'B' appear to have multiple outliers at first look because the minimum and maximum values are so far apart. In the Age columns, the mean and the Q2(50 percentile) do not match.

We might double-check it by examining the distribution of each column.

Inferences

  1. The rate of crime is rather low. The majority of values are in the range of 0 to 25. With a huge value and a value of zero.
  2. The majority of residential land is zoned for less than 25,000 square feet. Land zones larger than 25,000 square feet represent a small portion of the dataset.
  3. The percentage of non-retial commercial acres is mostly split between two ranges: 0-13 and 13-23.
  4. The majority of the properties are bordered by the river, although a tiny portion of the data is not.
  5. The content of nitrite dioxide has been trending lower from.3 to.7, with a little bump towards.8. It is permissible to leave a value in the range of 0.1–1.
  6. The number of rooms tends to cluster around the average.
  7. With time, the proportion of owner-occupied units rises.
  8. As the number of weights grows, the weight distance between 5 employment centers reduces. It could indicate that individuals choose to live in new high-employment areas.
  9. People choose to live in places with limited access to roadways (0-10). We have a 30th percentile outlier.
  10. The majority of dwelling taxes are in the range of $200-450, with large outliers around $700,000.
  11. The percentage of people with lower status tends to cluster around the median. The majority of persons are of lower social standing.

Because the model is overly generic, removing all outliers will underfit it. Keeping all outliers causes the model to overfit and become excessively accurate. The data's noise will be learned.

The approach is to establish a happy medium that prevents the model from becoming overly precise. When faced with a new set of data, however, they generalise well.

We'll keep numbers below 600 because there's a huge anomaly in the TAX column around 600.

new_df=housing_df[housing_df['TAX']<600]

Looking at the Distribution

Looking-at-the-Distribution

The overall distribution, particularly the TAX, PTRATIO, and RAD, has improved slightly.

Correlation

Correlation

Perfect correlation is denoted by the clear values. The medium correlation between the columns is represented by the reds, while the negative correlation is represented by the black.

With a value of 0.89, we can see that 'MEDV', which is the medium price we wish to anticipate, is substantially connected with the number of rooms 'RM'. The proportion of black people in area 'B' with a value of 0.19 is followed by the residential land 'ZN' with a value of 0.32 and the percentage of black people in area 'ZN' with a value of 0.32.

The metrics that are most connected with price will be plotted.

The-metrics-that-are-most-connected

Feature Engineering

Feature Scaling

Gradient descent is aided by feature scaling, which ensures that all features are on the same scale. It makes locating the local optimum much easier.

Mean standardization is one strategy to employ. It substitutes (target-mean) for the target to ensure that the feature has a mean of nearly zero.

def standard(X):    '''Standard makes the feature 'X' have a zero mean'''    mu=np.mean(X) #mean    std=np.std(X) #standard deviation    sta=(X-mu)/std # mean normalization    return mu,std,sta     mu,std,sta=standard(X) X=sta X

 CRIMZNINDUSCHASNOXRMAGEDISRADTAXPTRATIOBLSTAT
0-0.6091290.092792-1.019125-0.2809760.2586700.2791350.162095-0.167660-2.105767-0.235130-1.1368630.401318-0.933659
1-0.575698-0.598153-0.225291-0.280976-0.4237950.0492520.6482660.250975-1.496334-1.032339-0.0041750.401318-0.219350
2-0.575730-0.598153-0.225291-0.280976-0.4237951.1897080.0165990.250975-1.496334-1.032339-0.0041750.298315-1.096782
3-0.567639-0.598153-1.040806-0.280976-0.5325940.910565-0.5263500.773661-0.886900-1.3276010.4035930.343869-1.283945
4-0.509220-0.598153-1.040806-0.280976-0.5325941.132984-0.2282610.773661-0.886900-1.3276010.4035930.401318-0.873561
..........................................
501-0.519445-0.5981530.585220-0.2809760.6048480.3060040.300494-0.936773-2.105767-0.5746821.4456660.277056-0.128344
502-0.547094-0.5981530.585220-0.2809760.604848-0.4000630.570195-1.027984-2.105767-0.5746821.4456660.401318-0.229652
503-0.522423-0.5981530.585220-0.2809760.6048480.8777251.077657-1.085260-2.105767-0.5746821.4456660.401318-0.820331
504-0.444652-0.5981530.585220-0.2809760.6048480.6060461.017329-0.979587-2.105767-0.5746821.4456660.314006-0.676095
505-0.543685-0.5981530.585220-0.2809760.604848-0.5344100.715691-0.924173-2.105767-0.5746821.4456660.401318-0.435703

Choose and Train the Model

For the sake of the project, we'll apply linear regression.

Typically, we run numerous models and select the best one based on a particular criterion.

Linear regression is a sort of supervised learning model in which the response is continuous, as it relates to machine learning.

Form of Linear Regression

y= θX+θ1 or y= θ1+X1θ2 +X2θ3 + X3θ4

y is the target you will be predicting

0 is the coefficient

x is the input

We will Sklearn to develop and train the model

#Import the libraries to train the model from sklearn.model_selection import train_test_split from sklearn.linear_model import LinearRegression

Allow us to utilise the train/test method to learn a part of the data on one set and predict using another set using the train/test approach.

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4) #Create and Train the model model=LinearRegression().fit(X_train,y_train) #Generate prediction predictions_test=model.predict(X_test) #Compute loss to evaluate the model coefficient= model.coef_ intercept=model.intercept_ print(coefficient,intercept) [7.22218258] 24.66379606613584

In this example, you will learn the model using below hypothesis:

Price= 24.85 + 7.18* Room

It is interpreted as:

For a decided price of a house:

A 7.18-unit increase in the price is connected with a growth in the number of rooms.

As a side note, this is an association, not a cause!

Interpretation

You will need a metric to determine whether our hypothesis was right. The RMSE approach will be used.

Root Means Square Error (RMSE) is defined as the square root of the mean of square error. The difference between the true and anticipated numbers called the error. It's popular because it can be expressed in y-units, which is the median price of a home in our scenario.

def rmse(predict,actual):    return np.sqrt(np.mean(np.square(predict - actual))) # Split the Data into train and test set X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4) #Create and Train the model model=LinearRegression().fit(X_train,y_train) #Generate prediction predictions_test=model.predict(X_test) #Compute loss to evaluate the model coefficient= model.coef_ intercept=model.intercept_ print(coefficient,intercept) loss=rmse(predictions_test,y_test) print('loss: ',loss) print(model.score(X_test,y_test)) #accuracy [7.43327725] 24.912055881970886 loss: 3.9673165450580714 0.7552661033654667 Loss will be 3.96

This means that y-units refer to the median value of occupied homes with 1000 dollars.

This will be less by 3960 dollars.

While learning the model you will have a high variance when you divide the data. Coefficient and intercept will vary. It's because when we utilized the train/test approach, we choose a set of data at random to place in either the train or test set. As a result, our theory will change each time the dataset is divided.

This problem can be solved using a technique called cross-validation.

Improvisation in the Model

With 'Forward Selection,' we'll iterate through each parameter to assist us choose the numbers characteristics to include in our model.

Forward Selection

  1. Choose the most appropriate variable (in our case based on high correlation)
  2. Add the next best variable to the model
  3. Some predetermined conditions must meet.

We'll use a random state of 1 so that each iteration yields the same outcome.

cols=[] los=[] los_train=[] scor=[] i=0 while i < len(high_corr_var):    cols.append(high_corr_var[i])        # Select inputs variables    X=new_df[cols]        #mean normalization    mu,std,sta=standard(X)    X=sta        # Split the data into training and testing    X_train,X_test,y_train,y_test= train_test_split(X,y,random_state=1)        #fit the model to the training    lnreg=LinearRegression().fit(X_train,y_train)        #make prediction on the training test    prediction_train=lnreg.predict(X_train)        #make prediction on the testing test    prediction=lnreg.predict(X_test)        #compute the loss on train test    loss=rmse(prediction,y_test)    loss_train=rmse(prediction_train,y_train)    los_train.append(loss_train)    los.append(loss)        #compute the score    score=lnreg.score(X_test,y_test)    scor.append(score)        i+=1

We have a big 'loss' with a smaller collection of variables, yet our system will overgeneralize in this scenario. Although we have a reduced 'loss,' we have a large number of variables. However, if the model grows too precise, it may not generalize well to new data.

In order for our model to generalize well with another set of data, we might use 6 or 7 features. The characteristic chosen is descending based on how strong the price correlation is.

high_corr_var ['RM', 'ZN', 'B', 'CHAS', 'RAD', 'DIS', 'CRIM', 'NOX', 'AGE', 'TAX', 'INDUS', 'PTRATIO', 'LSTAT']

With 'RM' having a high price correlation and LSTAT having a negative price correlation.

# Create a list of features names feature_cols=['RM','ZN','B','CHAS','RAD','CRIM','DIS','NOX'] #Select inputs variables X=new_df[feature_cols] # Split the data into training and testing sets X_train,X_test,y_train,y_test= train_test_split(X,y, random_state=1) # feature engineering mu,std,sta=standard(X) X=sta # fit the model to the trainning data lnreg=LinearRegression().fit(X_train,y_train) # make prediction on the testing test prediction=lnreg.predict(X_test) # compute the loss loss=rmse(prediction,y_test) print('loss: ',loss) lnreg.score(X_test,y_test) loss: 3.212659865936143 0.8582338376696363

The test set yielded a loss of 3.21 and an accuracy of 85%.

Other factors, such as alpha, the learning rate at which our model learns, could still be tweaked to improve our model. Alternatively, return to the preprocessing section and working to increase the parameter distribution.

For more details regarding scraping real estate data you can contact Scraping Intelligence today

https://www.websitescraper.com/how-to-predict-housing-prices-with-linear-regression.php