1593533160
Creating a machine learning model requires setting up the environment, making changes in model, compiling the model and training the model again and again. Hence, most of the machine learning projects are not implemented due to this huge amount of manual working.
So, I made an effort to find a solution to this issue.
PROBLEM STATEMENT: 1. Create container image that’s has Python3 and Keras or numpy installed using dockerfile
2. When we launch this image, it should automatically starts train the model in the container.
3. Create a job chain of job1, job2, job3, job4 and job5 using build pipeline plugin in Jenkins
4. Job1 : Pull the Github repo automatically when some developers push repo to Github.
5. Job2 : By looking at the code or program file, Jenkins should automatically start the respective machine learning software installed interpreter install image container to deploy code and start training( eg. If code uses CNN, then Jenkins should start the container that has already installed all the softwares required for the cnn processing).
6. Job3 : Train your model and predict accuracy or metrics.
7. Job4 : if metrics accuracy is less than 99% , then tweak the machine learning model architecture.
8. Job5: Retrain the model or notify that the best model is being created.
9. Create One extra job job6 for monitor : If container where app is running. fails due to any reason then this job should automatically start the container again from where the last trained model left.
#devops #machine-learning #accuracy #jenkins #cnn
1656151740
Flutter Console Coverage Test
This small dart tools is used to generate Flutter Coverage Test report to console
Add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):
dev_dependencies:
test_cov_console: ^0.2.2
flutter pub get
Running "flutter pub get" in coverage... 0.5s
flutter test --coverage
00:02 +1: All tests passed!
flutter pub run test_cov_console
---------------------------------------------|---------|---------|---------|-------------------|
File |% Branch | % Funcs | % Lines | Uncovered Line #s |
---------------------------------------------|---------|---------|---------|-------------------|
lib/src/ | | | | |
print_cov.dart | 100.00 | 100.00 | 88.37 |...,149,205,206,207|
print_cov_constants.dart | 0.00 | 0.00 | 0.00 | no unit testing|
lib/ | | | | |
test_cov_console.dart | 0.00 | 0.00 | 0.00 | no unit testing|
---------------------------------------------|---------|---------|---------|-------------------|
All files with unit testing | 100.00 | 100.00 | 88.37 | |
---------------------------------------------|---------|---------|---------|-------------------|
If not given a FILE, "coverage/lcov.info" will be used.
-f, --file=<FILE> The target lcov.info file to be reported
-e, --exclude=<STRING1,STRING2,...> A list of contains string for files without unit testing
to be excluded from report
-l, --line It will print Lines & Uncovered Lines only
Branch & Functions coverage percentage will not be printed
-i, --ignore It will not print any file without unit testing
-m, --multi Report from multiple lcov.info files
-c, --csv Output to CSV file
-o, --output=<CSV-FILE> Full path of output CSV file
If not given, "coverage/test_cov_console.csv" will be used
-t, --total Print only the total coverage
Note: it will ignore all other option (if any), except -m
-p, --pass=<MINIMUM> Print only the whether total coverage is passed MINIMUM value or not
If the value >= MINIMUM, it will print PASSED, otherwise FAILED
Note: it will ignore all other option (if any), except -m
-h, --help Show this help
flutter pub run test_cov_console --file=coverage/lcov.info --exclude=_constants,_mock
---------------------------------------------|---------|---------|---------|-------------------|
File |% Branch | % Funcs | % Lines | Uncovered Line #s |
---------------------------------------------|---------|---------|---------|-------------------|
lib/src/ | | | | |
print_cov.dart | 100.00 | 100.00 | 88.37 |...,149,205,206,207|
lib/ | | | | |
test_cov_console.dart | 0.00 | 0.00 | 0.00 | no unit testing|
---------------------------------------------|---------|---------|---------|-------------------|
All files with unit testing | 100.00 | 100.00 | 88.37 | |
---------------------------------------------|---------|---------|---------|-------------------|
It support to run for multiple lcov.info files with the followings directory structures:
1. No root module
<root>/<module_a>
<root>/<module_a>/coverage/lcov.info
<root>/<module_a>/lib/src
<root>/<module_b>
<root>/<module_b>/coverage/lcov.info
<root>/<module_b>/lib/src
...
2. With root module
<root>/coverage/lcov.info
<root>/lib/src
<root>/<module_a>
<root>/<module_a>/coverage/lcov.info
<root>/<module_a>/lib/src
<root>/<module_b>
<root>/<module_b>/coverage/lcov.info
<root>/<module_b>/lib/src
...
You must run test_cov_console on <root> dir, and the report would be grouped by module, here is
the sample output for directory structure 'with root module':
flutter pub run test_cov_console --file=coverage/lcov.info --exclude=_constants,_mock --multi
---------------------------------------------|---------|---------|---------|-------------------|
File |% Branch | % Funcs | % Lines | Uncovered Line #s |
---------------------------------------------|---------|---------|---------|-------------------|
lib/src/ | | | | |
print_cov.dart | 100.00 | 100.00 | 88.37 |...,149,205,206,207|
lib/ | | | | |
test_cov_console.dart | 0.00 | 0.00 | 0.00 | no unit testing|
---------------------------------------------|---------|---------|---------|-------------------|
All files with unit testing | 100.00 | 100.00 | 88.37 | |
---------------------------------------------|---------|---------|---------|-------------------|
---------------------------------------------|---------|---------|---------|-------------------|
File - module_a - |% Branch | % Funcs | % Lines | Uncovered Line #s |
---------------------------------------------|---------|---------|---------|-------------------|
lib/src/ | | | | |
print_cov.dart | 100.00 | 100.00 | 88.37 |...,149,205,206,207|
lib/ | | | | |
test_cov_console.dart | 0.00 | 0.00 | 0.00 | no unit testing|
---------------------------------------------|---------|---------|---------|-------------------|
All files with unit testing | 100.00 | 100.00 | 88.37 | |
---------------------------------------------|---------|---------|---------|-------------------|
---------------------------------------------|---------|---------|---------|-------------------|
File - module_b - |% Branch | % Funcs | % Lines | Uncovered Line #s |
---------------------------------------------|---------|---------|---------|-------------------|
lib/src/ | | | | |
print_cov.dart | 100.00 | 100.00 | 88.37 |...,149,205,206,207|
lib/ | | | | |
test_cov_console.dart | 0.00 | 0.00 | 0.00 | no unit testing|
---------------------------------------------|---------|---------|---------|-------------------|
All files with unit testing | 100.00 | 100.00 | 88.37 | |
---------------------------------------------|---------|---------|---------|-------------------|
flutter pub run test_cov_console -c --output=coverage/test_coverage.csv
#### sample CSV output file:
File,% Branch,% Funcs,% Lines,Uncovered Line #s
lib/,,,,
test_cov_console.dart,0.00,0.00,0.00,no unit testing
lib/src/,,,,
parser.dart,100.00,100.00,97.22,"97"
parser_constants.dart,100.00,100.00,100.00,""
print_cov.dart,100.00,100.00,82.91,"29,49,51,52,171,174,177,180,183,184,185,186,187,188,279,324,325,387,388,389,390,391,392,393,394,395,398"
print_cov_constants.dart,0.00,0.00,0.00,no unit testing
All files with unit testing,100.00,100.00,86.07,""
You can install the package from the command line:
dart pub global activate test_cov_console
The package has the following executables:
$ test_cov_console
Run this command:
With Dart:
$ dart pub add test_cov_console
With Flutter:
$ flutter pub add test_cov_console
This will add a line like this to your package's pubspec.yaml (and run an implicit dart pub get
):
dependencies:
test_cov_console: ^0.2.2
Alternatively, your editor might support dart pub get
or flutter pub get
. Check the docs for your editor to learn more.
Now in your Dart code, you can use:
import 'package:test_cov_console/test_cov_console.dart';
example/lib/main.dart
import 'package:flutter/material.dart';
void main() {
runApp(MyApp());
}
class MyApp extends StatelessWidget {
// This widget is the root of your application.
@override
Widget build(BuildContext context) {
return MaterialApp(
title: 'Flutter Demo',
theme: ThemeData(
// This is the theme of your application.
//
// Try running your application with "flutter run". You'll see the
// application has a blue toolbar. Then, without quitting the app, try
// changing the primarySwatch below to Colors.green and then invoke
// "hot reload" (press "r" in the console where you ran "flutter run",
// or simply save your changes to "hot reload" in a Flutter IDE).
// Notice that the counter didn't reset back to zero; the application
// is not restarted.
primarySwatch: Colors.blue,
// This makes the visual density adapt to the platform that you run
// the app on. For desktop platforms, the controls will be smaller and
// closer together (more dense) than on mobile platforms.
visualDensity: VisualDensity.adaptivePlatformDensity,
),
home: MyHomePage(title: 'Flutter Demo Home Page'),
);
}
}
class MyHomePage extends StatefulWidget {
MyHomePage({Key? key, required this.title}) : super(key: key);
// This widget is the home page of your application. It is stateful, meaning
// that it has a State object (defined below) that contains fields that affect
// how it looks.
// This class is the configuration for the state. It holds the values (in this
// case the title) provided by the parent (in this case the App widget) and
// used by the build method of the State. Fields in a Widget subclass are
// always marked "final".
final String title;
@override
_MyHomePageState createState() => _MyHomePageState();
}
class _MyHomePageState extends State<MyHomePage> {
int _counter = 0;
void _incrementCounter() {
setState(() {
// This call to setState tells the Flutter framework that something has
// changed in this State, which causes it to rerun the build method below
// so that the display can reflect the updated values. If we changed
// _counter without calling setState(), then the build method would not be
// called again, and so nothing would appear to happen.
_counter++;
});
}
@override
Widget build(BuildContext context) {
// This method is rerun every time setState is called, for instance as done
// by the _incrementCounter method above.
//
// The Flutter framework has been optimized to make rerunning build methods
// fast, so that you can just rebuild anything that needs updating rather
// than having to individually change instances of widgets.
return Scaffold(
appBar: AppBar(
// Here we take the value from the MyHomePage object that was created by
// the App.build method, and use it to set our appbar title.
title: Text(widget.title),
),
body: Center(
// Center is a layout widget. It takes a single child and positions it
// in the middle of the parent.
child: Column(
// Column is also a layout widget. It takes a list of children and
// arranges them vertically. By default, it sizes itself to fit its
// children horizontally, and tries to be as tall as its parent.
//
// Invoke "debug painting" (press "p" in the console, choose the
// "Toggle Debug Paint" action from the Flutter Inspector in Android
// Studio, or the "Toggle Debug Paint" command in Visual Studio Code)
// to see the wireframe for each widget.
//
// Column has various properties to control how it sizes itself and
// how it positions its children. Here we use mainAxisAlignment to
// center the children vertically; the main axis here is the vertical
// axis because Columns are vertical (the cross axis would be
// horizontal).
mainAxisAlignment: MainAxisAlignment.center,
children: <Widget>[
Text(
'You have pushed the button this many times:',
),
Text(
'$_counter',
style: Theme.of(context).textTheme.headline4,
),
],
),
),
floatingActionButton: FloatingActionButton(
onPressed: _incrementCounter,
tooltip: 'Increment',
child: Icon(Icons.add),
), // This trailing comma makes auto-formatting nicer for build methods.
);
}
}
Author: DigitalKatalis
Source Code: https://github.com/DigitalKatalis/test_cov_console
License: BSD-3-Clause license
1593533160
Creating a machine learning model requires setting up the environment, making changes in model, compiling the model and training the model again and again. Hence, most of the machine learning projects are not implemented due to this huge amount of manual working.
So, I made an effort to find a solution to this issue.
PROBLEM STATEMENT: 1. Create container image that’s has Python3 and Keras or numpy installed using dockerfile
2. When we launch this image, it should automatically starts train the model in the container.
3. Create a job chain of job1, job2, job3, job4 and job5 using build pipeline plugin in Jenkins
4. Job1 : Pull the Github repo automatically when some developers push repo to Github.
5. Job2 : By looking at the code or program file, Jenkins should automatically start the respective machine learning software installed interpreter install image container to deploy code and start training( eg. If code uses CNN, then Jenkins should start the container that has already installed all the softwares required for the cnn processing).
6. Job3 : Train your model and predict accuracy or metrics.
7. Job4 : if metrics accuracy is less than 99% , then tweak the machine learning model architecture.
8. Job5: Retrain the model or notify that the best model is being created.
9. Create One extra job job6 for monitor : If container where app is running. fails due to any reason then this job should automatically start the container again from where the last trained model left.
#devops #machine-learning #accuracy #jenkins #cnn
1646789416
Scikit-learn is a library in Python that provides many unsupervised and supervised learning algorithms. It’s built upon some of the technology you might already be familiar with, like NumPy, pandas, and Matplotlib.
In this Article I will explain all machine learning algorithms with scikit-learn which you need to learn as a Data Scientist.
Lets start by importing the libraries:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from scipy import stats
import pylab as pl
Given a scikit-learn estimator object named model, the following methods are available:
model.fit() : fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).
model.predict() : given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.
model.predict_proba() : For classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is returned by model.predict(). model.score() : for classification or regression problems, most (all?) estimators implement a score method. Scores are between 0 and 1, with a larger score indicating a better fit.
model.predict() : predict labels in clustering algorithms. model.transform() : given an unsupervised model, transform new data into the new basis. This also accepts one argument X_new, and returns the new representation of the data based on the unsupervised model. model.fit_transform() : some estimators implement this method, which more efficiently performs a fit and a transform on the same input data.
data = pd.read_csv('Iris.csv')
data.head()
print(data.shape)
#Output
(150, 6)
data.info()
#Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 150 non-null int64
1 SepalLengthCm 150 non-null float64
2 SepalWidthCm 150 non-null float64
3 PetalLengthCm 150 non-null float64
4 PetalWidthCm 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
Some graphical representation of information and data.
sns.FacetGrid(data,hue='Species',size=5)\
.map(plt.scatter,'SepalLengthCm','SepalWidthCm')\
.add_legend()
sns.pairplot(data,hue='Species')
scikit-learn provides a helpful function for partitioning data, train_test_split, which splits out your data into a training set and a test set.
Training and test usually is 70% for training and 30% for test
X = data.iloc[:, :-1].values # X -> Feature Variables
y = data.iloc[:, -1].values # y -> Target
# Splitting the data into Train and Test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
It is used to estimate real values (cost of houses, number of calls, total sales etc.) based on continuous variable(s). Here, we establish relationship between independent and dependent variables by fitting a best line. This best fit line is known as regression line and represented by a linear equation **Y= a *X + b.
#converting object data type into int data type using labelEncoder for Linear reagration in this case
XL = data.iloc[:, :-1].values # X -> Feature Variables
yL = data.iloc[:, -1].values # y -> Target
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
Y_train= le.fit_transform(yL)
print(Y_train) # this is Y_train categotical to numerical
#Output
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
# This is only for Linear Regretion
X_trainL, X_testL, y_trainL, y_testL = train_test_split(XL, Y_train, test_size = 0.3, random_state = 0)
from sklearn.linear_model import LinearRegression
modelLR = LinearRegression()
modelLR.fit(X_trainL, y_trainL)
Y_pred = modelLR.predict(X_testL)
from sklearn import metrics
#calculating the residuals
print('y-intercept :' , modelLR.intercept_)
print('beta coefficients :' , modelLR.coef_)
print('Mean Abs Error MAE :' ,metrics.mean_absolute_error(y_testL,Y_pred))
print('Mean Sqrt Error MSE :' ,metrics.mean_squared_error(y_testL,Y_pred))
print('Root Mean Sqrt Error RMSE:' ,np.sqrt(metrics.mean_squared_error(y_testL,Y_pred)))
print('r2 value :' ,metrics.r2_score(y_testL,Y_pred))
#Output
y-intercept : -0.024298523519848292
beta coefficients : [ 0.00680677 -0.10726764 -0.00624275 0.22428158 0.27196685]
Mean Abs Error MAE : 0.14966835490524963
Mean Sqrt Error MSE : 0.03255451737969812
Root Mean Sqrt Error RMSE: 0.18042870442282213
r2 value : 0.9446026069799255
This is one of my favorite algorithm and I use it quite frequently. It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables.
In this algorithm, we split the population into two or more homogeneous sets. This is done based on most significant attributes/ independent variables to make as distinct groups as possible.
# Decision Tree's
from sklearn.tree import DecisionTreeClassifier
Model = DecisionTreeClassifier()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 0.95 1.00 0.97 18
Iris-virginica 1.00 0.91 0.95 11
accuracy 0.98 45
macro avg 0.98 0.97 0.98 45
weighted avg 0.98 0.98 0.98 45
[[16 0 0]
[ 0 18 0]
[ 0 1 10]]
accuracy is 0.9777777777777777
Random Forest is a trademark term for an ensemble of decision trees. In Random Forest, we’ve collection of decision trees (so known as “Forest”). To classify a new object based on attributes, each tree gives a classification and we say the tree “votes” for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
from sklearn.ensemble import RandomForestClassifier
Model=RandomForestClassifier(max_depth=2)
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_pred,y_test))
#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
Don’t get confused by its name! It is a classification not a regression algorithm. It is used to estimate discrete values ( Binary values like 0/1, yes/no, true/false ) based on given set of independent variable(s).
In simple words, it predicts the probability of occurrence of an event by fitting data to a logic function. Hence, it is also known as logic regression. Since, it predicts the probability, its output values lies between 0 and 1 (as expected).
# LogisticRegression
from sklearn.linear_model import LogisticRegression
Model = LogisticRegression()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
It can be used for both classification and regression problems. However, it is more widely used in classification problems in the industry. K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors. The case being assigned to the class is most common amongst its K nearest neighbors measured by a distance function.
# K-Nearest Neighbours
from sklearn.neighbors import KNeighborsClassifier
Model = KNeighborsClassifier(n_neighbors=8)
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
It is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier would consider all of these properties to independently contribute to the probability that this fruit is an apple.
# Naive Bayes
from sklearn.naive_bayes import GaussianNB
Model = GaussianNB()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
It is a classification method. In this algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you have) with the value of each feature being the value of a particular coordinate.
For example, if we only had two features like Height and Hair length of an individual, we’d first plot these two variables in two dimensional space where each point has two co-ordinates (these co-ordinates are known as Support Vectors)
# Support Vector Machine
from sklearn.svm import SVC
Model = SVC()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
In scikit-learn RadiusNeighborsClassifier is very similar to KNeighborsClassifier with the exception of two parameters. First, in RadiusNeighborsClassifier we need to specify the radius of the fixed area used to determine if an observation is a neighbor using radius.
Unless there is some substantive reason for setting radius to some value, it is best to treat it like any other hyperparameter and tune it during model selection. The second useful parameter is outlier_label, which indicates what label to give an observation that has no observations within the radius – which itself can often be a useful tool for identifying outliers.
#Output
from sklearn.neighbors import RadiusNeighborsClassifier
Model=RadiusNeighborsClassifier(radius=8.0)
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)
#summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
#Accouracy score
print('accuracy is ', accuracy_score(y_test,y_pred))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
PA algorithm is a margin based online learning algorithm for binary classification. Unlike PA algorithm, which is a hard-margin based method, PA-I algorithm is a soft margin based method and robuster to noise.
from sklearn.linear_model import PassiveAggressiveClassifier
Model = PassiveAggressiveClassifier()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 0.89 1.00 0.94 16
Iris-versicolor 0.00 0.00 0.00 18
Iris-virginica 0.41 1.00 0.58 11
accuracy 0.60 45
macro avg 0.43 0.67 0.51 45
weighted avg 0.42 0.60 0.48 45
[[16 0 0]
[ 2 0 16]
[ 0 0 11]]
accuracy is 0.6
Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.
# BernoulliNB
from sklearn.naive_bayes import BernoulliNB
Model = BernoulliNB()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 0.00 0.00 0.00 16
Iris-versicolor 0.00 0.00 0.00 18
Iris-virginica 0.24 1.00 0.39 11
accuracy 0.24 45
macro avg 0.08 0.33 0.13 45
weighted avg 0.06 0.24 0.10 45
[[ 0 0 16]
[ 0 0 18]
[ 0 0 11]]
accuracy is 0.24444444444444444
ExtraTreesClassifier is an ensemble learning method fundamentally based on decision trees. ExtraTreesClassifier, like RandomForest, randomizes certain decisions and subsets of data to minimize over-learning from the data and overfitting. Let’s look at some ensemble methods ordered from high to low variance, ending in ExtraTreesClassifier.
# ExtraTreeClassifier
from sklearn.tree import ExtraTreeClassifier
Model = ExtraTreeClassifier()
Model.fit(X_train, y_train)
y_pred = Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
# Accuracy score
print('accuracy is',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 0.94 0.97 18
Iris-virginica 0.92 1.00 0.96 11
accuracy 0.98 45
macro avg 0.97 0.98 0.98 45
weighted avg 0.98 0.98 0.98 45
[[16 0 0]
[ 0 17 1]
[ 0 0 11]]
accuracy is 0.9777777777777777
Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 0.95 1.00 0.97 18
Iris-virginica 1.00 0.91 0.95 11
accuracy 0.98 45
macro avg 0.98 0.97 0.98 45
weighted avg 0.98 0.98 0.98 45
[[16 0 0]
[ 0 18 1]
[ 0 0 10]]
accuracy is 0.9777777777777777
An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
from sklearn.ensemble import AdaBoostClassifier
Model=AdaBoostClassifier()
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_pred,y_test))
#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 0.95 1.00 0.97 18
Iris-virginica 1.00 0.91 0.95 11
accuracy 0.98 45
macro avg 0.98 0.97 0.98 45
weighted avg 0.98 0.98 0.98 45
[[16 0 0]
[ 0 18 1]
[ 0 0 10]]
accuracy is 0.9777777777777777
GBM is a boosting algorithm used when we deal with plenty of data to make a prediction with high prediction power. Boosting is actually an ensemble of learning algorithms which combines the prediction of several base estimators in order to improve robustness over a single estimator. It combines multiple weak or average predictors to a build strong predictor.
from sklearn.ensemble import GradientBoostingClassifier
Model=GradientBoostingClassifier()
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_pred,y_test))
#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 0.95 1.00 0.97 18
Iris-virginica 1.00 0.91 0.95 11
accuracy 0.98 45
macro avg 0.98 0.97 0.98 45
weighted avg 0.98 0.98 0.98 45
[[16 0 0]
[ 0 18 1]
[ 0 0 10]]
accuracy is 0.9777777777777777
A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.
The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
Model=LinearDiscriminantAnalysis()
Model.fit(X_train,y_train)
y_pred=Model.predict(X_test)
# Summary of the predictions made by the classifier
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_pred,y_test))
#Accuracy Score
print('accuracy is ',accuracy_score(y_pred,y_test))
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class.
#Output
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 16
Iris-versicolor 1.00 1.00 1.00 18
Iris-virginica 1.00 1.00 1.00 11
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
[[16 0 0]
[ 0 18 0]
[ 0 0 11]]
accuracy is 1.0
It is a type of unsupervised algorithm which solves the clustering problem. Its procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters). Data points inside a cluster are homogeneous and heterogeneous to peer groups.
Remember figuring out shapes from ink blots? k means is somewhat similar this activity. You look at the shape and spread to decipher how many different clusters / population are present.
x = data.iloc[:, [1, 2, 3, 4]].values
#Finding the optimum number of clusters for k-means classification
from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
kmeans.fit(x)
wcss.append(kmeans.inertia_)
#Plotting the results onto a line graph, allowing us to observe 'The elbow'
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS') # within cluster sum of squares
plt.show()
#Applying kmeans to the dataset / Creating the kmeans classifier
kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300, n_init = 10, random_state = 0)
y_kmeans = kmeans.fit_predict(x)
#Visualising the clusters
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Iris-Setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Iris-Versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 100, c = 'yellow', label = 'Iris-Virginica')
#Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,1], s = 100, c = 'green', label = 'Centroids',marker='*')
plt.legend()
I hope you like this article on All Machine Learning Algorithms.
Original article source at https://thecleverprogrammer.com
#machinelearning #algorithms
1636360749
The std
library provides many custom types which expands drastically on the primitives
. Some of these include:
String
s like: "hello world"
[1, 2, 3]
Option<i32>
Result<i32, i32>
Box<i32>
All values in Rust are stack allocated by default. Values can be boxed (allocated on the heap) by creating a Box<T>
. A box is a smart pointer to a heap allocated value of type T
. When a box goes out of scope, its destructor is called, the inner object is destroyed, and the memory on the heap is freed.
Boxed values can be dereferenced using the *
operator; this removes one layer of indirection.
use std::mem;
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct Point {
x: f64,
y: f64,
}
// A Rectangle can be specified by where its top left and bottom right
// corners are in space
#[allow(dead_code)]
struct Rectangle {
top_left: Point,
bottom_right: Point,
}
fn origin() -> Point {
Point { x: 0.0, y: 0.0 }
}
fn boxed_origin() -> Box<Point> {
// Allocate this point on the heap, and return a pointer to it
Box::new(Point { x: 0.0, y: 0.0 })
}
fn main() {
// (all the type annotations are superfluous)
// Stack allocated variables
let point: Point = origin();
let rectangle: Rectangle = Rectangle {
top_left: origin(),
bottom_right: Point { x: 3.0, y: -4.0 }
};
// Heap allocated rectangle
let boxed_rectangle: Box<Rectangle> = Box::new(Rectangle {
top_left: origin(),
bottom_right: Point { x: 3.0, y: -4.0 },
});
// The output of functions can be boxed
let boxed_point: Box<Point> = Box::new(origin());
// Double indirection
let box_in_a_box: Box<Box<Point>> = Box::new(boxed_origin());
println!("Point occupies {} bytes on the stack",
mem::size_of_val(&point));
println!("Rectangle occupies {} bytes on the stack",
mem::size_of_val(&rectangle));
// box size == pointer size
println!("Boxed point occupies {} bytes on the stack",
mem::size_of_val(&boxed_point));
println!("Boxed rectangle occupies {} bytes on the stack",
mem::size_of_val(&boxed_rectangle));
println!("Boxed box occupies {} bytes on the stack",
mem::size_of_val(&box_in_a_box));
// Copy the data contained in `boxed_point` into `unboxed_point`
let unboxed_point: Point = *boxed_point;
println!("Unboxed point occupies {} bytes on the stack",
mem::size_of_val(&unboxed_point));
}
Vectors are re-sizable arrays. Like slices, their size is not known at compile time, but they can grow or shrink at any time. A vector is represented using 3 parameters:
The capacity indicates how much memory is reserved for the vector. The vector can grow as long as the length is smaller than the capacity. When this threshold needs to be surpassed, the vector is reallocated with a larger capacity.
fn main() {
// Iterators can be collected into vectors
let collected_iterator: Vec<i32> = (0..10).collect();
println!("Collected (0..10) into: {:?}", collected_iterator);
// The `vec!` macro can be used to initialize a vector
let mut xs = vec![1i32, 2, 3];
println!("Initial vector: {:?}", xs);
// Insert new element at the end of the vector
println!("Push 4 into the vector");
xs.push(4);
println!("Vector: {:?}", xs);
// Error! Immutable vectors can't grow
collected_iterator.push(0);
// FIXME ^ Comment out this line
// The `len` method yields the number of elements currently stored in a vector
println!("Vector length: {}", xs.len());
// Indexing is done using the square brackets (indexing starts at 0)
println!("Second element: {}", xs[1]);
// `pop` removes the last element from the vector and returns it
println!("Pop last element: {:?}", xs.pop());
// Out of bounds indexing yields a panic
println!("Fourth element: {}", xs[3]);
// FIXME ^ Comment out this line
// `Vector`s can be easily iterated over
println!("Contents of xs:");
for x in xs.iter() {
println!("> {}", x);
}
// A `Vector` can also be iterated over while the iteration
// count is enumerated in a separate variable (`i`)
for (i, x) in xs.iter().enumerate() {
println!("In position {} we have value {}", i, x);
}
// Thanks to `iter_mut`, mutable `Vector`s can also be iterated
// over in a way that allows modifying each value
for x in xs.iter_mut() {
*x *= 3;
}
println!("Updated vector: {:?}", xs);
}
More Vec
methods can be found under the std::vec module
There are two types of strings in Rust: String
and &str
.
A String
is stored as a vector of bytes (Vec<u8>
), but guaranteed to always be a valid UTF-8 sequence. String
is heap allocated, growable and not null terminated.
&str
is a slice (&[u8]
) that always points to a valid UTF-8 sequence, and can be used to view into a String
, just like &[T]
is a view into Vec<T>
.
fn main() {
// (all the type annotations are superfluous)
// A reference to a string allocated in read only memory
let pangram: &'static str = "the quick brown fox jumps over the lazy dog";
println!("Pangram: {}", pangram);
// Iterate over words in reverse, no new string is allocated
println!("Words in reverse");
for word in pangram.split_whitespace().rev() {
println!("> {}", word);
}
// Copy chars into a vector, sort and remove duplicates
let mut chars: Vec<char> = pangram.chars().collect();
chars.sort();
chars.dedup();
// Create an empty and growable `String`
let mut string = String::new();
for c in chars {
// Insert a char at the end of string
string.push(c);
// Insert a string at the end of string
string.push_str(", ");
}
// The trimmed string is a slice to the original string, hence no new
// allocation is performed
let chars_to_trim: &[char] = &[' ', ','];
let trimmed_str: &str = string.trim_matches(chars_to_trim);
println!("Used characters: {}", trimmed_str);
// Heap allocate a string
let alice = String::from("I like dogs");
// Allocate new memory and store the modified string there
let bob: String = alice.replace("dog", "cat");
println!("Alice says: {}", alice);
println!("Bob says: {}", bob);
}
More str
/String
methods can be found under the std::str and std::string modules
There are multiple ways to write string literals with special characters in them. All result in a similar &str
so it's best to use the form that is the most convenient to write. Similarly there are multiple ways to write byte string literals, which all result in &[u8; N]
.
Generally special characters are escaped with a backslash character: \
. This way you can add any character to your string, even unprintable ones and ones that you don't know how to type. If you want a literal backslash, escape it with another one: \\
String or character literal delimiters occuring within a literal must be escaped: "\""
, '\''
.
fn main() {
// You can use escapes to write bytes by their hexadecimal values...
let byte_escape = "I'm writing \x52\x75\x73\x74!";
println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);
// ...or Unicode code points.
let unicode_codepoint = "\u{211D}";
let character_name = "\"DOUBLE-STRUCK CAPITAL R\"";
println!("Unicode character {} (U+211D) is called {}",
unicode_codepoint, character_name );
let long_string = "String literals
can span multiple lines.
The linebreak and indentation here ->\
<- can be escaped too!";
println!("{}", long_string);
}
Sometimes there are just too many characters that need to be escaped or it's just much more convenient to write a string out as-is. This is where raw string literals come into play.
fn main() {
let raw_str = r"Escapes don't work here: \x3F \u{211D}";
println!("{}", raw_str);
// If you need quotes in a raw string, add a pair of #s
let quotes = r#"And then I said: "There is no escape!""#;
println!("{}", quotes);
// If you need "# in your string, just use more #s in the delimiter.
// There is no limit for the number of #s you can use.
let longer_delimiter = r###"A string with "# in it. And even "##!"###;
println!("{}", longer_delimiter);
}
Want a string that's not UTF-8? (Remember, str
and String
must be valid UTF-8). Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!
use std::str;
fn main() {
// Note that this is not actually a `&str`
let bytestring: &[u8; 21] = b"this is a byte string";
// Byte arrays don't have the `Display` trait, so printing them is a bit limited
println!("A byte string: {:?}", bytestring);
// Byte strings can have byte escapes...
let escaped = b"\x52\x75\x73\x74 as bytes";
// ...but no unicode escapes
// let escaped = b"\u{211D} is not allowed";
println!("Some escaped bytes: {:?}", escaped);
// Raw byte strings work just like raw strings
let raw_bytestring = br"\u{211D} is not escaped here";
println!("{:?}", raw_bytestring);
// Converting a byte array to `str` can fail
if let Ok(my_str) = str::from_utf8(raw_bytestring) {
println!("And the same as text: '{}'", my_str);
}
let _quotes = br#"You can also use "fancier" formatting, \
like with normal raw strings"#;
// Byte strings don't have to be UTF-8
let shift_jis = b"\x82\xe6\x82\xa8\x82\xb1\x82\xbb"; // "ようこそ" in SHIFT-JIS
// But then they can't always be converted to `str`
match str::from_utf8(shift_jis) {
Ok(my_str) => println!("Conversion successful: '{}'", my_str),
Err(e) => println!("Conversion failed: {:?}", e),
};
}
For conversions between character encodings check out the encoding crate.
A more detailed listing of the ways to write string literals and escape characters is given in the 'Tokens' chapter of the Rust Reference.
Option
Sometimes it's desirable to catch the failure of some parts of a program instead of calling panic!
; this can be accomplished using the Option
enum.
The Option<T>
enum has two variants:
None
, to indicate failure or lack of value, andSome(value)
, a tuple struct that wraps a value
with type T
.// An integer division that doesn't `panic!`
fn checked_division(dividend: i32, divisor: i32) -> Option<i32> {
if divisor == 0 {
// Failure is represented as the `None` variant
None
} else {
// Result is wrapped in a `Some` variant
Some(dividend / divisor)
}
}
// This function handles a division that may not succeed
fn try_division(dividend: i32, divisor: i32) {
// `Option` values can be pattern matched, just like other enums
match checked_division(dividend, divisor) {
None => println!("{} / {} failed!", dividend, divisor),
Some(quotient) => {
println!("{} / {} = {}", dividend, divisor, quotient)
},
}
}
fn main() {
try_division(4, 2);
try_division(1, 0);
// Binding `None` to a variable needs to be type annotated
let none: Option<i32> = None;
let _equivalent_none = None::<i32>;
let optional_float = Some(0f32);
// Unwrapping a `Some` variant will extract the value wrapped.
println!("{:?} unwraps to {:?}", optional_float, optional_float.unwrap());
// Unwrapping a `None` variant will `panic!`
println!("{:?} unwraps to {:?}", none, none.unwrap());
}
Result
We've seen that the Option
enum can be used as a return value from functions that may fail, where None
can be returned to indicate failure. However, sometimes it is important to express why an operation failed. To do this we have the Result
enum.
The Result<T, E>
enum has two variants:
Ok(value)
which indicates that the operation succeeded, and wraps the value
returned by the operation. (value
has type T
)Err(why)
, which indicates that the operation failed, and wraps why
, which (hopefully) explains the cause of the failure. (why
has type E
)mod checked {
// Mathematical "errors" we want to catch
#[derive(Debug)]
pub enum MathError {
DivisionByZero,
NonPositiveLogarithm,
NegativeSquareRoot,
}
pub type MathResult = Result<f64, MathError>;
pub fn div(x: f64, y: f64) -> MathResult {
if y == 0.0 {
// This operation would `fail`, instead let's return the reason of
// the failure wrapped in `Err`
Err(MathError::DivisionByZero)
} else {
// This operation is valid, return the result wrapped in `Ok`
Ok(x / y)
}
}
pub fn sqrt(x: f64) -> MathResult {
if x < 0.0 {
Err(MathError::NegativeSquareRoot)
} else {
Ok(x.sqrt())
}
}
pub fn ln(x: f64) -> MathResult {
if x <= 0.0 {
Err(MathError::NonPositiveLogarithm)
} else {
Ok(x.ln())
}
}
}
// `op(x, y)` === `sqrt(ln(x / y))`
fn op(x: f64, y: f64) -> f64 {
// This is a three level match pyramid!
match checked::div(x, y) {
Err(why) => panic!("{:?}", why),
Ok(ratio) => match checked::ln(ratio) {
Err(why) => panic!("{:?}", why),
Ok(ln) => match checked::sqrt(ln) {
Err(why) => panic!("{:?}", why),
Ok(sqrt) => sqrt,
},
},
}
}
fn main() {
// Will this fail?
println!("{}", op(1.0, 10.0));
}
?
Chaining results using match can get pretty untidy; luckily, the ?
operator can be used to make things pretty again. ?
is used at the end of an expression returning a Result
, and is equivalent to a match expression, where the Err(err)
branch expands to an early Err(From::from(err))
, and the Ok(ok)
branch expands to an ok
expression.
mod checked {
#[derive(Debug)]
enum MathError {
DivisionByZero,
NonPositiveLogarithm,
NegativeSquareRoot,
}
type MathResult = Result<f64, MathError>;
fn div(x: f64, y: f64) -> MathResult {
if y == 0.0 {
Err(MathError::DivisionByZero)
} else {
Ok(x / y)
}
}
fn sqrt(x: f64) -> MathResult {
if x < 0.0 {
Err(MathError::NegativeSquareRoot)
} else {
Ok(x.sqrt())
}
}
fn ln(x: f64) -> MathResult {
if x <= 0.0 {
Err(MathError::NonPositiveLogarithm)
} else {
Ok(x.ln())
}
}
// Intermediate function
fn op_(x: f64, y: f64) -> MathResult {
// if `div` "fails", then `DivisionByZero` will be `return`ed
let ratio = div(x, y)?;
// if `ln` "fails", then `NonPositiveLogarithm` will be `return`ed
let ln = ln(ratio)?;
sqrt(ln)
}
pub fn op(x: f64, y: f64) {
match op_(x, y) {
Err(why) => panic!("{}", match why {
MathError::NonPositiveLogarithm
=> "logarithm of non-positive number",
MathError::DivisionByZero
=> "division by zero",
MathError::NegativeSquareRoot
=> "square root of negative number",
}),
Ok(value) => println!("{}", value),
}
}
}
fn main() {
checked::op(1.0, 10.0);
}
Be sure to check the documentation, as there are many methods to map/compose Result
.
panic!
The panic!
macro can be used to generate a panic and start unwinding its stack. While unwinding, the runtime will take care of freeing all the resources owned by the thread by calling the destructor of all its objects.
Since we are dealing with programs with only one thread, panic!
will cause the program to report the panic message and exit.
// Re-implementation of integer division (/)
fn division(dividend: i32, divisor: i32) -> i32 {
if divisor == 0 {
// Division by zero triggers a panic
panic!("division by zero");
} else {
dividend / divisor
}
}
// The `main` task
fn main() {
// Heap allocated integer
let _x = Box::new(0i32);
// This operation will trigger a task failure
division(3, 0);
println!("This point won't be reached!");
// `_x` should get destroyed at this point
}
Let's check that panic!
doesn't leak memory.
$ rustc panic.rs && valgrind ./panic
==4401== Memcheck, a memory error detector
==4401== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4401== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==4401== Command: ./panic
==4401==
thread '<main>' panicked at 'division by zero', panic.rs:5
==4401==
==4401== HEAP SUMMARY:
==4401== in use at exit: 0 bytes in 0 blocks
==4401== total heap usage: 18 allocs, 18 frees, 1,648 bytes allocated
==4401==
==4401== All heap blocks were freed -- no leaks are possible
==4401==
==4401== For counts of detected and suppressed errors, rerun with: -v
==4401== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Where vectors store values by an integer index, HashMap
s store values by key. HashMap
keys can be booleans, integers, strings, or any other type that implements the Eq
and Hash
traits. More on this in the next section.
Like vectors, HashMap
s are growable, but HashMaps can also shrink themselves when they have excess space. You can create a HashMap with a certain starting capacity using HashMap::with_capacity(uint)
, or use HashMap::new()
to get a HashMap with a default initial capacity (recommended).
use std::collections::HashMap;
fn call(number: &str) -> &str {
match number {
"798-1364" => "We're sorry, the call cannot be completed as dialed.
Please hang up and try again.",
"645-7689" => "Hello, this is Mr. Awesome's Pizza. My name is Fred.
What can I get for you today?",
_ => "Hi! Who is this again?"
}
}
fn main() {
let mut contacts = HashMap::new();
contacts.insert("Daniel", "798-1364");
contacts.insert("Ashley", "645-7689");
contacts.insert("Katie", "435-8291");
contacts.insert("Robert", "956-1745");
// Takes a reference and returns Option<&V>
match contacts.get(&"Daniel") {
Some(&number) => println!("Calling Daniel: {}", call(number)),
_ => println!("Don't have Daniel's number."),
}
// `HashMap::insert()` returns `None`
// if the inserted value is new, `Some(value)` otherwise
contacts.insert("Daniel", "164-6743");
match contacts.get(&"Ashley") {
Some(&number) => println!("Calling Ashley: {}", call(number)),
_ => println!("Don't have Ashley's number."),
}
contacts.remove(&"Ashley");
// `HashMap::iter()` returns an iterator that yields
// (&'a key, &'a value) pairs in arbitrary order.
for (contact, &number) in contacts.iter() {
println!("Calling {}: {}", contact, call(number));
}
}
For more information on how hashing and hash maps (sometimes called hash tables) work, have a look at Hash Table Wikipedia
Any type that implements the Eq
and Hash
traits can be a key in HashMap
. This includes:
bool
(though not very useful since there is only two possible keys)int
, uint
, and all variations thereofString
and &str
(protip: you can have a HashMap
keyed by String
and call .get()
with an &str
)Note that f32
and f64
do not implement Hash
, likely because floating-point precision errors would make using them as hashmap keys horribly error-prone.
All collection classes implement Eq
and Hash
if their contained type also respectively implements Eq
and Hash
. For example, Vec<T>
will implement Hash
if T
implements Hash
.
You can easily implement Eq
and Hash
for a custom type with just one line: #[derive(PartialEq, Eq, Hash)]
The compiler will do the rest. If you want more control over the details, you can implement Eq
and/or Hash
yourself. This guide will not cover the specifics of implementing Hash
.
To play around with using a struct
in HashMap
, let's try making a very simple user logon system:
use std::collections::HashMap;
// Eq requires that you derive PartialEq on the type.
#[derive(PartialEq, Eq, Hash)]
struct Account<'a>{
username: &'a str,
password: &'a str,
}
struct AccountInfo<'a>{
name: &'a str,
email: &'a str,
}
type Accounts<'a> = HashMap<Account<'a>, AccountInfo<'a>>;
fn try_logon<'a>(accounts: &Accounts<'a>,
username: &'a str, password: &'a str){
println!("Username: {}", username);
println!("Password: {}", password);
println!("Attempting logon...");
let logon = Account {
username,
password,
};
match accounts.get(&logon) {
Some(account_info) => {
println!("Successful logon!");
println!("Name: {}", account_info.name);
println!("Email: {}", account_info.email);
},
_ => println!("Login failed!"),
}
}
fn main(){
let mut accounts: Accounts = HashMap::new();
let account = Account {
username: "j.everyman",
password: "password123",
};
let account_info = AccountInfo {
name: "John Everyman",
email: "j.everyman@email.com",
};
accounts.insert(account, account_info);
try_logon(&accounts, "j.everyman", "psasword123");
try_logon(&accounts, "j.everyman", "password123");
}
Consider a HashSet
as a HashMap
where we just care about the keys ( HashSet<T>
is, in actuality, just a wrapper around HashMap<T, ()>
).
"What's the point of that?" you ask. "I could just store the keys in a Vec
."
A HashSet
's unique feature is that it is guaranteed to not have duplicate elements. That's the contract that any set collection fulfills. HashSet
is just one implementation. (see also: BTreeSet
)
If you insert a value that is already present in the HashSet
, (i.e. the new value is equal to the existing and they both have the same hash), then the new value will replace the old.
This is great for when you never want more than one of something, or when you want to know if you've already got something.
But sets can do more than that.
Sets have 4 primary operations (all of the following calls return an iterator):
union
: get all the unique elements in both sets.
difference
: get all the elements that are in the first set but not the second.
intersection
: get all the elements that are only in both sets.
symmetric_difference
: get all the elements that are in one set or the other, but not both.
Try all of these in the following example:
use std::collections::HashSet;
fn main() {
let mut a: HashSet<i32> = vec![1i32, 2, 3].into_iter().collect();
let mut b: HashSet<i32> = vec![2i32, 3, 4].into_iter().collect();
assert!(a.insert(4));
assert!(a.contains(&4));
// `HashSet::insert()` returns false if
// there was a value already present.
assert!(b.insert(4), "Value 4 is already in set B!");
// FIXME ^ Comment out this line
b.insert(5);
// If a collection's element type implements `Debug`,
// then the collection implements `Debug`.
// It usually prints its elements in the format `[elem1, elem2, ...]`
println!("A: {:?}", a);
println!("B: {:?}", b);
// Print [1, 2, 3, 4, 5] in arbitrary order
println!("Union: {:?}", a.union(&b).collect::<Vec<&i32>>());
// This should print [1]
println!("Difference: {:?}", a.difference(&b).collect::<Vec<&i32>>());
// Print [2, 3, 4] in arbitrary order.
println!("Intersection: {:?}", a.intersection(&b).collect::<Vec<&i32>>());
// Print [1, 5]
println!("Symmetric Difference: {:?}",
a.symmetric_difference(&b).collect::<Vec<&i32>>());
}
(Examples are adapted from the documentation.)
Rc
When multiple ownership is needed, Rc
(Reference Counting) can be used. Rc
keeps track of the number of the references which means the number of owners of the value wrapped inside an Rc
.
Reference count of an Rc
increases by 1 whenever an Rc
is cloned, and decreases by 1 whenever one cloned Rc
is dropped out of the scope. When an Rc
's reference count becomes zero, which means there are no owners remained, both the Rc
and the value are all dropped.
Cloning an Rc
never performs a deep copy. Cloning creates just another pointer to the wrapped value, and increments the count.
use std::rc::Rc;
fn main() {
let rc_examples = "Rc examples".to_string();
{
println!("--- rc_a is created ---");
let rc_a: Rc<String> = Rc::new(rc_examples);
println!("Reference Count of rc_a: {}", Rc::strong_count(&rc_a));
{
println!("--- rc_a is cloned to rc_b ---");
let rc_b: Rc<String> = Rc::clone(&rc_a);
println!("Reference Count of rc_b: {}", Rc::strong_count(&rc_b));
println!("Reference Count of rc_a: {}", Rc::strong_count(&rc_a));
// Two `Rc`s are equal if their inner values are equal
println!("rc_a and rc_b are equal: {}", rc_a.eq(&rc_b));
// We can use methods of a value directly
println!("Length of the value inside rc_a: {}", rc_a.len());
println!("Value of rc_b: {}", rc_b);
println!("--- rc_b is dropped out of scope ---");
}
println!("Reference Count of rc_a: {}", Rc::strong_count(&rc_a));
println!("--- rc_a is dropped out of scope ---");
}
// Error! `rc_examples` already moved into `rc_a`
// And when `rc_a` is dropped, `rc_examples` is dropped together
// println!("rc_examples: {}", rc_examples);
// TODO ^ Try uncommenting this line
}
When shared ownership between threads is needed, Arc
(Atomic Reference Counted) can be used. This struct, via the Clone
implementation can create a reference pointer for the location of a value in the memory heap while increasing the reference counter. As it shares ownership between threads, when the last reference pointer to a value is out of scope, the variable is dropped.
fn main() {
use std::sync::Arc;
use std::thread;
// This variable declaration is where its value is specified.
let apple = Arc::new("the same apple");
for _ in 0..10 {
// Here there is no value specification as it is a pointer to a reference
// in the memory heap.
let apple = Arc::clone(&apple);
thread::spawn(move || {
// As Arc was used, threads can be spawned using the value allocated
// in the Arc variable pointer's location.
println!("{:?}", apple);
});
}
}
Original article source at https://doc.rust-lang.org
#rust #programming #developer
1657268760
Quer você fale de Twitter, Goodreads ou Amazon – dificilmente existe um espaço digital não saturado com as opiniões das pessoas. No mundo de hoje, é crucial que as organizações se aprofundem nessas opiniões e obtenham insights sobre seus produtos ou serviços. No entanto, esses dados existem em quantidades tão surpreendentes que medi-los manualmente é uma busca quase impossível. É aqui que mais um benefício da Data Science entra em jogo – Análise de Sentimentos . Neste artigo, exploraremos o que a análise de sentimentos abrange e as várias maneiras de implementá-la em Python.
A Análise de Sentimento é um caso de uso do Processamento de Linguagem Natural (NLP) e se enquadra na categoria de classificação de texto . Simplificando, a Análise de Sentimentos envolve a classificação de um texto em vários sentimentos, como positivo ou negativo, Feliz, Triste ou Neutro, etc. texto. Isso também é conhecido como Mineração de Opinião .
Vejamos como uma rápida pesquisa no Google define a Análise de Sentimento:
Bem, agora acho que estamos um pouco acostumados com o que é a análise de sentimentos. Mas qual é o seu significado e como as organizações se beneficiam dele? Vamos tentar explorar o mesmo com um exemplo. Suponha que você inicie uma empresa que vende perfumes em uma plataforma online. Você coloca uma grande variedade de fragrâncias por aí e logo os clientes começam a aparecer. Depois de algum tempo, você decide mudar a estratégia de preços dos perfumes - você planeja aumentar os preços das fragrâncias populares e, ao mesmo tempo, oferecer descontos nas impopulares . Agora, para determinar quais fragrâncias são populares, você começa a analisar as avaliações dos clientes de todas as fragrâncias. Mas você está preso! Eles são tantos que você não pode passar por todos eles em uma vida. É aqui que a análise de sentimentos pode tirá-lo do poço.
Você simplesmente reúne todas as avaliações em um só lugar e aplica a análise de sentimentos a elas. A seguir, uma representação esquemática da análise de sentimentos nas resenhas de três fragrâncias de perfumes – Lavanda, Rosa e Limão. (Observe que essas revisões podem ter ortografia, gramática e pontuação incorretas, como nos cenários do mundo real)
A partir desses resultados, podemos ver claramente que:
Fragrance-1 (Lavender) tem avaliações altamente positivas dos clientes, o que indica que sua empresa pode aumentar seus preços devido à sua popularidade.
Fragrance-2 (Rose) tem uma perspectiva neutra entre o cliente, o que significa que sua empresa não deve alterar seus preços .
O Fragrance-3 (Lemon) tem um sentimento geral negativo associado a ele – portanto, sua empresa deve considerar oferecer um desconto para equilibrar a balança.
Este foi apenas um exemplo simples de como a análise de sentimentos pode ajudá-lo a obter insights sobre seus produtos/serviços e ajudar sua organização a tomar decisões.
Acabamos de ver como a análise de sentimentos pode capacitar as organizações com insights que podem ajudá-las a tomar decisões baseadas em dados. Agora, vamos dar uma olhada em mais alguns casos de uso de análise de sentimentos.
O Python é uma das ferramentas mais poderosas quando se trata de realizar tarefas de ciência de dados — ele oferece várias maneiras de realizar análises de sentimentos . Os mais populares estão listados aqui:
Vamos mergulhar fundo neles um por um.
Nota: Para fins de demonstração dos métodos 3 e 4 (Usando Modelos Baseados em Vetorização Bag of Words e Usando Modelos Baseados em LSTM) foi utilizada a análise de sentimentos . Compreende mais de 5.000 excretos de texto rotulados como positivos, negativos ou neutros. O conjunto de dados está sob a licença Creative Commons.
Text Blob é uma biblioteca Python para processamento de linguagem natural. Usar o Text Blob para análise de sentimentos é bastante simples. Ele recebe texto como entrada e pode retornar polaridade e subjetividade como saída.
A polaridade determina o sentimento do texto. Seus valores estão em [-1,1] onde -1 denota um sentimento altamente negativo e 1 denota um sentimento altamente positivo.
A subjetividade determina se uma entrada de texto é uma informação factual ou uma opinião pessoal. O seu valor situa-se entre [0,1] onde um valor mais próximo de 0 denota uma informação factual e um valor mais próximo de 1 denota uma opinião pessoal.
Instalação :
pip install textblob
Importando Blob de Texto:
from textblob import TextBlob
Implementação de código para análise de sentimento usando blob de texto:
Escrever código para análise de sentimentos usando TextBlob é bastante simples. Basta importar o objeto TextBlob e passar o texto a ser analisado com os devidos atributos da seguinte forma:
from textblob import TextBlob
text_1 = "The movie was so awesome."
text_2 = "The food here tastes terrible."#Determining the Polarity
p_1 = TextBlob(text_1).sentiment.polarity
p_2 = TextBlob(text_2).sentiment.polarity#Determining the Subjectivity
s_1 = TextBlob(text_1).sentiment.subjectivity
s_2 = TextBlob(text_2).sentiment.subjectivityprint("Polarity of Text 1 is", p_1)
print("Polarity of Text 2 is", p_2)
print("Subjectivity of Text 1 is", s_1)
print("Subjectivity of Text 2 is", s_2)
Resultado:
Polarity of Text 1 is 1.0
Polarity of Text 2 is -1.0
Subjectivity of Text 1 is 1.0
Subjectivity of Text 2 is 1.0
O VADER (Valence Aware Dictionary and sEntiment Reasoner) é um analisador de sentimentos baseado em regras que foi treinado em texto de mídia social. Assim como o Text Blob, seu uso em Python é bastante simples. Veremos seu uso na implementação de código com um exemplo daqui a pouco.
Instalação:
pip install vaderSentiment
Importando a classe SentimentIntensityAnalyzer do Vader:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
Código para análise de sentimentos usando o Vader:
Primeiramente, precisamos criar um objeto da classe SentimentIntensityAnalyzer; então precisamos passar o texto para a função polarity_scores() do objeto da seguinte forma:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sentiment = SentimentIntensityAnalyzer()
text_1 = "The book was a perfect balance between wrtiting style and plot."
text_2 = "The pizza tastes terrible."
sent_1 = sentiment.polarity_scores(text_1)
sent_2 = sentiment.polarity_scores(text_2)
print("Sentiment of text 1:", sent_1)
print("Sentiment of text 2:", sent_2)
Saída :
Sentiment of text 1: {'neg': 0.0, 'neu': 0.73, 'pos': 0.27, 'compound': 0.5719}
Sentiment of text 2: {'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}
Como podemos ver, um objeto VaderSentiment retorna um dicionário de pontuações de sentimento para o texto a ser analisado.
Nas duas abordagens discutidas até agora, ou seja, Text Blob e Vader, simplesmente usamos bibliotecas Python para realizar a análise de sentimentos. Agora discutiremos uma abordagem na qual treinaremos nosso próprio modelo para a tarefa. As etapas envolvidas na análise de sentimentos usando o método Bag of Words Vectorization são as seguintes:
Código para análise de sentimentos usando a abordagem de vetorização Bag of Words:
Para construir um modelo de análise de sentimento usando a Abordagem de Vetorização BOW, precisamos de um conjunto de dados rotulado. Como afirmado anteriormente, o conjunto de dados usado para esta demonstração foi obtido do Kaggle. Nós simplesmente usamos o vetorizador de contagem do sklearn para criar o BOW. Após, treinamos um classificador Multinomial Naive Bayes, para o qual foi obtido um escore de precisão de 0,84.
O conjunto de dados pode ser obtido aqui .
#Loading the Dataset
import pandas as pd
data = pd.read_csv('Finance_data.csv')
#Pre-Prcoessing and Bag of Word Vectorization using Count Vectorizer
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(stop_words='english',ngram_range = (1,1),tokenizer = token.tokenize)
text_counts = cv.fit_transform(data['sentences'])
#Splitting the data into trainig and testing
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(text_counts, data['feedback'], test_size=0.25, random_state=5)
#Training the model
from sklearn.naive_bayes import MultinomialNB
MNB = MultinomialNB()
MNB.fit(X_train, Y_train)
#Caluclating the accuracy score of the model
from sklearn import metrics
predicted = MNB.predict(X_test)
accuracy_score = metrics.accuracy_score(predicted, Y_test)
print("Accuracuy Score: ",accuracy_score)
Saída :
Accuracuy Score: 0.9111675126903553
O classificador treinado pode ser usado para prever o sentimento de qualquer entrada de texto.
Embora tenhamos conseguido obter uma pontuação de precisão decente com o método Bag of Words Vectorization, ele pode não produzir os mesmos resultados ao lidar com conjuntos de dados maiores. Isso dá origem à necessidade de empregar modelos baseados em deep learning para o treinamento do modelo de análise de sentimentos.
Para tarefas de PNL, geralmente usamos modelos baseados em RNN, pois são projetados para lidar com dados sequenciais. Aqui, vamos treinar um modelo LSTM (Long Short Term Memory) usando o TensorFlow com Keras . As etapas para realizar a análise de sentimento usando modelos baseados em LSTM são as seguintes:
Código para análise de sentimentos usando abordagem de modelo baseada em LSTM:
Aqui, usamos o mesmo conjunto de dados que usamos no caso da abordagem BOW. Uma precisão de treinamento de 0,90 foi obtida.
#Importing necessary libraries
import nltk
import pandas as pd
from textblob import Word
from nltk.corpus import stopwords
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from sklearn.model_selection import train_test_split
#Loading the dataset
data = pd.read_csv('Finance_data.csv')
#Pre-Processing the text
def cleaning(df, stop_words):
df['sentences'] = df['sentences'].apply(lambda x: ' '.join(x.lower() for x in x.split()))
# Replacing the digits/numbers
df['sentences'] = df['sentences'].str.replace('d', '')
# Removing stop words
df['sentences'] = df['sentences'].apply(lambda x: ' '.join(x for x in x.split() if x not in stop_words))
# Lemmatization
df['sentences'] = df['sentences'].apply(lambda x: ' '.join([Word(x).lemmatize() for x in x.split()]))
return df
stop_words = stopwords.words('english')
data_cleaned = cleaning(data, stop_words)
#Generating Embeddings using tokenizer
tokenizer = Tokenizer(num_words=500, split=' ')
tokenizer.fit_on_texts(data_cleaned['verified_reviews'].values)
X = tokenizer.texts_to_sequences(data_cleaned['verified_reviews'].values)
X = pad_sequences(X)
#Model Building
model = Sequential()
model.add(Embedding(500, 120, input_length = X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(704, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(352, activation='LeakyReLU'))
model.add(Dense(3, activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics = ['accuracy'])
print(model.summary())
#Model Training
model.fit(X_train, y_train, epochs = 20, batch_size=32, verbose =1)
#Model Testing
model.evaluate(X_test,y_test)
Os modelos baseados em transformadores são uma das técnicas de processamento de linguagem natural mais avançadas. Eles seguem uma arquitetura baseada em Encoder-Decoder e empregam os conceitos de autoatenção para produzir resultados impressionantes. Embora sempre se possa construir um modelo de transformador do zero, é uma tarefa bastante tediosa. Assim, podemos usar modelos de transformadores pré-treinados disponíveis no Hugging Face . Hugging Face é uma comunidade de IA de código aberto que oferece uma infinidade de modelos pré-treinados para aplicativos de PNL. Esses modelos podem ser usados como tal ou podem ser ajustados para tarefas específicas.
Instalação:
pip install transformers
Importando a classe SentimentIntensityAnalyzer do Vader:
import transformers
Código para análise de sentimentos usando modelos baseados em Transformer:
Para executar qualquer tarefa usando transformadores, primeiro precisamos importar a função pipeline dos transformadores. Então, um objeto da função pipeline é criado e a tarefa a ser executada é passada como um argumento (ou seja, análise de sentimento no nosso caso). Também podemos especificar o modelo que precisamos usar para realizar a tarefa. Aqui, como não mencionamos o modelo a ser usado, o modo destilaria-base-uncased-finetuned-sst-2-English é usado por padrão para análise de sentimentos. Você pode conferir a lista de tarefas e modelos disponíveis aqui .
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["It was the best of times.", "t was the worst of times."]
sentiment_pipeline(data)Output:[{'label': 'POSITIVE', 'score': 0.999457061290741}, {'label': 'NEGATIVE', 'score': 0.9987301230430603}]
Nesta era em que os usuários podem expressar seus pontos de vista sem esforço e os dados são gerados em superfluidade em apenas frações de segundos - extrair insights desses dados é vital para as organizações tomarem decisões eficientes - e a Análise de Sentimentos prova ser a peça que faltava no quebra-cabeça!
Até agora, cobrimos em detalhes o que exatamente envolve a análise de sentimentos e os vários métodos que podemos usar para realizá-la em Python. Mas essas foram apenas algumas demonstrações rudimentares - você certamente deve ir em frente e mexer nos modelos e testá-los em seus próprios dados.
Fonte: https://www.analyticsvidhya.com/blog/2022/07/sentiment-analysis-using-python/