Automation Bro

1675687134

Data Driven testing using Postman JSON file

In this video, we'll be demonstrating how to use Postman for data-driven testing using a JSON file. We'll cover the process of importing a JSON file into Postman, creating a collection, and setting up test scripts to loop through the data and perform assertions. 

We will also learn how to work with dynamic JSON data. This is a great way to quickly and efficiently test multiple scenarios, making sure your API is working as expected. Whether you're a beginner or an experienced Postman user, this tutorial will provide valuable insights on how to streamline your testing process. So, tune in and learn how to take your Postman testing to the next level!

https://youtu.be/XM6kh_jnUSY

#postman #api #testing #webdev #automation #javascript 


Fabiola Auma

1675307640

A Complete Guide to Test Automation Framework

Gone are the days when enterprises relied solely on manual testing. Even though manual testing is an integral part of the testing process, there’s no denying its disadvantages. It’s tedious, time-consuming, and calls for hefty investment in human resources. 

The debate about manual vs. automated testing has been going on for a long time, and many people are still unclear about what automation means in testing.

This post provides a complete guide to test automation frameworks. Here, you will gain an insight into test automation frameworks and their components. You will also learn the importance of test automation frameworks and how to choose the best fit.

What Is Test Automation?

The term automation refers to the automatic handling of various industrial processes. It indicates that there is little to no human intervention. 

When we define test automation in the IT sector, it means performing tests on applications via different automation tools to check how applications behave or respond to different actions. These tools can be both open-source and licensed. When the application is deployed, a variety of users perform a variety of actions on the application.

Test automation is the process of automating well-known or predictive actions of users to make sure the application behaves as expected.

What Is Meant By Automation Framework?

Before we go on to discuss test automation frameworks, let’s first understand what a framework is. 

To get the benefits of any concept, there has to be a set of protocols and guidelines to abide by. In general, a framework is a combination of standards and rules which, when followed, helps an enterprise get the best bang for its buck.

Similarly, a test automation framework is a platform that is a combination of programs, compilers, features, tools, etc. It provides an environment where you can execute automated test scripts.

In short, a test automation framework is a set of components that facilitate executing tests and comprehensive reporting of test results. The major components that implement a test automation framework successfully are equipment, testing tools, scripts, procedures, and most importantly, test automation engineers. 

What Are The Main Components of a Test Automation Framework?

Test data management and testing libraries are some of the major components of test automation frameworks. Let’s take a look at each in detail.

  1. Test data management
  2. Testing libraries
    1. Unit testing
    2. A Unit test example
    3. Integration testing
    4. Behavior-driven development

1. Test Data Management

Harnessing data and extracting useful information is the biggest hassle during software testing automation. The availability of data to carry out tests is usually a major problem. To ensure the success of automation efforts, it’s necessary to have a strategic test data management approach.

 

Thus, a software company should equip their framework with resources like libraries or plugins for scavenging through test data and finding what can be used for positive testing. Your framework should also have a simulation tool to make the data more digestible and lucid. If the data is simplified, test data management becomes a lot easier.

2. Testing Libraries

The core of an application’s testing process comprises managing and running the test cases. It’s ideal to get your test cases well defined and organized so you can perform testing efficiently and effectively. A testing library is where you create and store the test cases. Testing libraries include unit testing, integration and end-to-end testing, and behavior-driven development. Let’s see what each of them means.

2.1 Unit Testing

Unit testing libraries are a vital part of any test automation framework. Unit tests are written not only by testers but by developers as well. Testers use them to define test methods through specified formal annotations.

Unit testing is also used for running simplified and straightforward tests. Unit testing libraries exist for most programming languages. For instance, if you’re a Java developer you probably use something like JUnit or TestNG. On the other hand, C# developers are likely to use NUnit or xUnit.NET.

When it comes to JavaScript unit test frameworks, you have many options at your disposal, including QUnit, Mocha, Jest, Ava, and Jasmine, to name just a few.

If you’re a developer, it’s a good practice to unit test your code as soon as you develop each module. This reduces the defect count during the later phases of testing.

2.2 A Unit Test Example

Here’s a simple unit test example, written in JavaScript and using the Jest framework. Suppose you have the following function:


function add(numbers) {
  return numbers
    .split(',')
    .map(x => parseInt(x))
    .reduce((a, b) => a + b);
}

The function above takes as an argument a string containing numbers separated by commas. It splits the string using the comma as a delimiter, parses each part into an integer, and finally adds the numbers together. So a string like “1,2” should result in 3. The following table shows a few examples:

Input      Expected Output
“5”        5
“4,5”      9
“1,2,3”    6

Turned into code, the examples above might look like this:

test('string with a single number should result in the number itself', () => {
  expect(add('5')).toBe(5);
});

test('two numbers separated by comma should result in their sum', () => {
  expect(add('4,5')).toBe(9);
});

test('three numbers separated by comma should result in their sum', () => {
  expect(add('1,2,3')).toBe(6);
});

2.3 Integration Testing

So unit testing is where you test each module or functionality of an application. In unit testing, you must ensure each unit of the application is completely isolated. That means that, during unit testing, units can’t talk to one another. Also, they can’t interact with any dependency that lives outside the code, such as the database or the filesystem. When it comes to JavaScript apps, external dependencies are typically HTTP services or APIs.

However, in the real world, units do interact with each other and with external dependencies. That’s why unit tests aren’t enough. Testing units in isolation is valuable and necessary, but you also need to test the integrations—both between units and between them and external dependencies—if you want to ensure your application works as intended. 

That’s where integration testing comes in handy. Bear in mind that, by and large, the testing frameworks used for integration testing are the same ones you’d use for unit testing—for example, JUnit for Java and NUnit for .NET. The difference lies in the way you use these frameworks. In other words: the tools are the same; the difference between unit tests and integration tests lies in the way the tests are carried out.
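
To make the contrast concrete, here’s a minimal sketch (not from the original article) of an integration-style Jest test. The saveReport and loadReport helpers are hypothetical; the point is that the test exercises a real external dependency (the filesystem) instead of isolating the code from it.

// integration.test.js -- a sketch of an integration test; the helpers below
// are hypothetical examples, not part of the article's code.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helpers: persist a small report to disk and read it back.
function saveReport(filePath, total) {
  fs.writeFileSync(filePath, JSON.stringify({ total }));
}

function loadReport(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

test('a report written to disk can be read back with the same total', () => {
  const filePath = path.join(os.tmpdir(), 'report.json');
  saveReport(filePath, 6);                      // talks to the real filesystem
  expect(loadReport(filePath).total).toBe(6);   // no mocks involved
});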

 

2.4 Behavior-Driven Development

As important as they are to your testing library, unit and end-to-end tests have a problem: they rely heavily on the implementation of the functionality being tested. So if you change the code, you’ll often need to change the test case as well.

How do we address this issue? Behavior-driven development (BDD) is key. Don’t get confused by the name: BDD isn’t a development technique per se. It’s a collection of best practices. When those practices are applied to automation testing, BDD enables you to write great test cases.

BDD scenarios are written in an English-like language that’s understandable for the whole team. You can convert scenarios and features of expected behavior into code. BDD enables the alignment of code with the intent and scope of automated tests.
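
For illustration, here’s a minimal BDD-flavored sketch using plain Jest. Dedicated BDD tools such as Cucumber use Gherkin feature files instead; this sketch simply mirrors the Given/When/Then structure in test names and comments, and reuses the add function from the unit test example above.

// bdd-style.test.js -- a sketch only; plain Jest standing in for a BDD tool.
function add(numbers) {
  return numbers.split(',').map(x => parseInt(x)).reduce((a, b) => a + b);
}

describe('Feature: adding numbers from a comma-separated string', () => {
  test('Given "1,2,3", When the numbers are added, Then the result is 6', () => {
    const input = '1,2,3';      // Given a comma-separated string
    const result = add(input);  // When the numbers are added
    expect(result).toBe(6);     // Then the result is their sum
  });
});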

How Many Types of Test Automation Are There? What Are The Different Types of Test Automation Frameworks?

Now that you understand what a test automation framework is and what its components are, let’s look at the different types of frameworks out there. Automated testing covers a range of test frameworks. Here, I’ll go over the most common types.

  1. Linear Automation Framework
  2. Modular Based Testing Framework
  3. Library Architecture Testing Framework
  4. Keyword-Driven Framework
  5. Data-Driven Framework
  6. Hybrid Testing Framework

1. Linear Automation Test Framework

A linear automation test framework involves introductory-level testing. Testers create test scripts sequentially and run them individually. There is no need to write custom code, so testers don’t have to be automation experts. It’s a one-at-a-time, to-the-point testing approach: pick one functionality, write a script, and test. A speedy workflow is the biggest perk of a linear automation framework.

Pros

  • Simple
  • Fast
  • Flexible

Cons

  • Single-use
  • High Maintenance
  • Redundant

2. Modular-Based Test Framework

Modular-based test frameworks break down test cases into small modules. They then follow both non-incremental and incremental approaches: the modules are tested independently first, and then the application is tested as a whole. This makes each test independent.

Moreover, once a tester is done writing a function library, scripts can be stored in it. Since you can easily make changes in a single script, adjusting the entire application is not necessary. Thus, testing requires less time and effort.
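
As a rough illustration (not from the article), here’s how a modular approach might look with Jest: two small, hypothetical modules are tested independently first, and a final test exercises them together.

// modular sketch -- loginModule and searchModule are hypothetical stand-ins.
const loginModule = {
  login(user) { return user === 'alice' ? 'session-1' : null; },
};

const searchModule = {
  search(session, term) { return session ? [`result for ${term}`] : []; },
};

test('login module works on its own', () => {
  expect(loginModule.login('alice')).toBe('session-1');
});

test('search module works on its own', () => {
  expect(searchModule.search('any-session', 'cats')).toHaveLength(1);
});

test('modules combined: search after login', () => {
  const session = loginModule.login('alice');
  expect(searchModule.search(session, 'cats')).toEqual(['result for cats']);
});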

Pros

  • Reusable
  • Modular approach
  • Efficient
  • Scalable

Cons

  • Less flexible
  • Requires technical knowledge
  • Complex

3. Library Architecture Test Framework

With a library architecture test framework, the framework identifies similar tasks within the test scripts. Testers then group similar tasks by function, and a library stores all the sorted functions. This facilitates reusability of code across different test scripts. This framework is useful when the application has similar functionality across its different parts.
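
As a rough sketch (assuming a Node.js/Jest setup, not the article’s own code), the shared library could be a plain module that several test scripts require, so the common steps live in one place:

// lib/session.js -- hypothetical shared function library
function login(client, user, password) {
  return client.post('/login', { user, password });
}

function logout(client) {
  return client.post('/logout', {});
}

module.exports = { login, logout };

// checkout.test.js, profile.test.js and other scripts would all
// require('./lib/session') and reuse login()/logout() instead of
// repeating those steps in every script.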

Pros

  • High reusability
  • Cost-effective
  • Scalable
  • High long-term ROI

Cons

  • More development time
  • High technical knowledge required
  • Complicated

4. Keyword-Driven Test Framework

A keyword-driven test framework separates script logic from test data and stores the data externally. The keywords are stored in a separate location. Keywords associated with GUI actions are part of the test script, and each keyword is associated with an object or an action. Since testers can use the same keyword across different test scripts, this promotes reusability.
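
Here’s a minimal sketch of the idea (hypothetical keywords and a fake driver, not a real keyword-driven tool): the test “script” is just data rows of keywords plus arguments, and a small engine maps each keyword to an action.

// keyword-driven sketch -- runnable with Jest; everything here is illustrative.
const driver = {
  log: [],
  openUrl(url) { this.log.push(`open ${url}`); },
  type(field, value) { this.log.push(`type ${value} into ${field}`); },
  click(element) { this.log.push(`click ${element}`); },
};

// The keyword script: each step is a keyword followed by its arguments.
const steps = [
  ['openUrl', 'https://example.com/login'],
  ['type', '#user', 'alice'],
  ['click', '#submit'],
];

test('keyword script drives the fake browser in order', () => {
  steps.forEach(([keyword, ...args]) => driver[keyword](...args));
  expect(driver.log).toHaveLength(3);
  expect(driver.log[2]).toBe('click #submit');
});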

Pros

  • Reusable
  • Scalable
  • Less Maintenance

Cons

  • High development time
  • Complexity increase over time
  • High automation knowledge required

5. Data-Driven Test Framework

A data-driven test framework separates test script logic from test data and stores the data externally. Here, the aim is to create reusable test scripts for testing different data sets. Testers can vary the data to change testing scenarios, which ensures reusability of code.
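
Jest supports this style directly through test.each, so a minimal data-driven sketch (reusing the add function from the earlier unit test example) looks like this: the data table drives the same script once per row.

// data-driven sketch -- Jest's built-in test.each runs one test per data row.
function add(numbers) {
  return numbers.split(',').map(x => parseInt(x)).reduce((a, b) => a + b);
}

test.each([
  ['5', 5],
  ['4,5', 9],
  ['1,2,3', 6],
])('add(%s) returns %i', (input, expected) => {
  expect(add(input)).toBe(expected);
});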

Pros

  • Scalable
  • Faster testing
  • Fewer scripts required
  • Flexible

Cons

  • High setup time
  • Excellent technical knowledge required
  • Troubleshooting is difficult

6. Hybrid Test Framework

A hybrid test framework mitigates the weaknesses of different test frameworks. It provides flexibility by combining parts of different frameworks to harness the advantages. Hence, the efficiency of testing also improves.

Popular test automation frameworks include:

  1. Selenium
  2. Appium
  3. UFT


Why Do We Need a Test Automation Framework?

In the modern era, the entire world is moving toward automation. With this, the need for test automation is rising. Proper planning and execution of test automation frameworks have a lot of perks to offer.

1. Optimization of Resources

A test framework helps in the optimization of resources. It does this by facilitating the use of different resources according to organizational needs. 

For instance, to achieve established goals, test automation frameworks provide a set of processes. These processes have to match resources to requirements. The higher the flexibility of adoption, the better your resource optimization will be. 

2. Increased Volume of Testing

Test automation frameworks increase the volume of testing. For instance, new mobile devices emerge every other day. It’s impossible to perform manual testing on all of them. Even if a firm managed to do so, it would take forever. But automated testing enables testers to run tests on thousands of mobile devices at the same time.

3. Simultaneous Testing

Test automation frameworks enable simultaneous testing of different types of devices. When the test scripts are automated, all testers need to do is run them on different devices. Since the parameters are the same, testers can quickly generate comparative test reports.

4. Enhanced Speed and Reliability

Writing and running tests can be time-consuming for any software company. Test automation frameworks reduce the time it takes to carry out these activities. How? Suppose you’re testing the logout functionality of a website. If there are multiple testing scenarios, you’d have to manually check whether the logout feature works properly for each scenario. But if you’re using a framework, you can run all the scenarios simultaneously and get the test results in very little time.

Moreover, automated testing is more reliable due to the use of automated tools. This reduces the chances of making mistakes.

5. More Output in Less Time

Test automation reduces challenges in synchronization, local configuration, error management, and report generation. An automation script minimizes the time taken to prepare and run tests. With increased efficiency and speed, a firm can gain more output in less time. 

6. Fixing Bugs at an Early Stage

A test automation framework helps in fixing bugs at an early stage. You don’t need much manpower to carry it out for you, which means the working hours and expenses involved are also reduced. A test automation engineer can write scripts and automate tests.

By using the right test automation frameworks, an organization can implement the concept of shift-left testing. That refers to the idea that you should move testing to as early in the software development lifecycle as possible.

The earliest you can get is creating automated tests before writing the production code. That’s exactly the modus operandi of techniques such as TDD (test-driven development) and BDD (behavior-driven development).

7. Remote Testing

With a test automation framework, it’s not necessary to stay at the office premises 24/7. For instance, you can start running a test before leaving. When you come back after a few hours, the test results will be ready. Moreover, you don’t need to buy a lot of devices since you can test remotely.

8. Reusable Automation Code

You can reuse test automation scripts in a different application. Suppose the testers of your organization wrote some scripts for testing the login functionality. You can use the same script for another application that has a login functionality.

9. Increased ROI

The initial investment involved in test automation frameworks is off-putting for many. But the long-term return on investment is high.

 

As discussed earlier, a test automation framework saves time and facilitates speedy delivery. It also reduces the need for more staff.

 

For instance, a company doesn’t have to hire multiple testers if the testing framework is automated. A test automation engineer can carry out most of the tasks like configuring the framework or running the scripts.

10. Continuous Testing

It’s 2022, and the importance of continuous integration and continuous delivery/deployment can’t be overstated. Having a fully automated software pipeline is the surest way to ensure your code reaches production as fast as possible.

However, it’s no use to ship broken code super fast. That’s why an essential piece of the CI/CD puzzle is continuous testing. What is continuous testing?

In a nutshell, it’s the practice of running your suite of automated tests continuously. Test automation frameworks are key in achieving continuous testing, since they enable not only the creation of the tests, but also their automatic execution.

Which Testing Framework Is Best?

Now that you know the benefits of test automation frameworks, it’s time to choose one.

With a plethora of different testing frameworks available, it can be overwhelming to know which one is right for you. To find the best solution, you need to understand your requirements first. Before looking for the testing framework that suits you best, learn the basics about your testing process:

  • Code or no code: Understand how your testers test the application. Do they write code for testing, or do they use something like record-and-playback testing? If your testers code, then you’ll want something flexible. Seek a framework that supports different languages and applications, something like Selenium. If testing is codeless, you can choose from a wide range of smart tools that don’t need coding, something like Testim.
  • Platform: What platform does the application run on? Is it a web application, an Android app, an iOS app? Different frameworks specialize in different platforms, so you have to choose one that offers the most for the platform of your choice. You also have to consider which platform your testers use – Windows, Mac, or Linux – and choose a framework that works on it.
  • Budget: There’s so much we’d have if money were out of the question. But we live in a practical world and money matters. When deciding which testing framework you’ll finally use, consider your budget. Budget can be flexible at times, but not without limits. So you can use the budget as one of the filters.

You can also start with a tool offering out-of-the-box solutions. Use our five-step process to learn how to identify the best automation platform for your organization. Above all, you need to do what it takes to improve the testing process.

Summing It Up

Boosting a testing team’s velocity is no child’s play. Enterprises keep struggling to find a way to maximize efficiency. Testing is one of the most important phases in the software development process. Automation test frameworks are the way to go to increase test precision and make the product better.

When you choose the best testing framework for you, the testing process becomes better, the quality of the application increases, optimum testing speed is achieved, and most importantly, your testers are happy. 

 

In addition to these perks, they offer high ROI. Test automation frameworks are something every software company should deploy for skyrocketing revenue. Hunting for the right testing framework would take some time and effort. But it’s worth it.

The choice of a software testing framework tends to stay with an organization for a long time. Shifting from one framework to another midway is difficult for testers to adapt to, and it also slows down application development. The smart move is to spend enough time at the beginning and choose the best possible framework for you. 

Original article sourced at: https://www.testim.io

#automation #entity-framework 


Automation Bro

1674577278

Postman Data Driven Testing with CSV Data File

Data-driven testing is a powerful technique for automating the testing process with Postman. By using a CSV data file, you can run the same test multiple times on different datasets, saving time and reducing the amount of code needed. In this tutorial, I will show you how to import a CSV file into Postman and use it for data-driven testing.

https://youtu.be/Hwmdq1fpbUA

#postman #api #testing #automation 


Royce Reinger

1673771280

Igel: A Delightful ML tool That Allows You To Train, Test

igel

A delightful machine learning tool that allows you to train/fit, test and use models without writing code

Introduction

The goal of the project is to provide machine learning for everyone, both technical and non-technical users.

I sometimes needed a tool that I could use to quickly create a machine learning prototype, whether to build a proof of concept, create a quick draft model to prove a point, or use auto ML. I often found myself stuck writing boilerplate code and thinking too much about where to start. Therefore, I decided to create this tool.

igel is built on top of other ML frameworks. It provides a simple way to use machine learning without writing a single line of code. Igel is highly customizable, but only if you want to. Igel does not force you to customize anything. Besides default values, igel can use auto-ml features to figure out a model that can work great with your data.

All you need is a yaml (or json) file, where you need to describe what you are trying to do. That's it!

Igel supports regression, classification and clustering. It also supports auto-ml features like ImageClassification and TextClassification.

Igel supports the most commonly used dataset types in the data science field. For instance, your input dataset can be a csv, txt, excel sheet, json or even an html file that you want to fetch. If you are using auto-ml features, then you can even feed raw data to igel and it will figure out how to deal with it. More on this later in the examples.

Features

  • Supports most dataset types (csv, txt, excel, json, html) even just raw data stored in folders
  • Supports all state of the art machine learning models (even preview models)
  • Supports different data preprocessing methods
  • Provides flexibility and data control while writing configurations
  • Supports cross validation
  • Supports hyperparameter search (grid search and random search, version >= 0.2.8)
  • Supports yaml and json format
  • Usage from GUI
  • Supports different sklearn metrics for regression, classification and clustering
  • Supports multi-output/multi-target regression and classification
  • Supports multi-processing for parallel model construction
  • Support for auto machine learning

Installation

  • The easiest way is to install igel using pip
$ pip install -U igel

Models

Igel's supported models:

+--------------------+----------------------------+-------------------------+
|      regression    |        classification      |        clustering       |
+--------------------+----------------------------+-------------------------+
|   LinearRegression |         LogisticRegression |                  KMeans |
|              Lasso |                      Ridge |     AffinityPropagation |
|          LassoLars |               DecisionTree |                   Birch |
| BayesianRegression |                  ExtraTree | AgglomerativeClustering |
|    HuberRegression |               RandomForest |    FeatureAgglomeration |
|              Ridge |                 ExtraTrees |                  DBSCAN |
|  PoissonRegression |                        SVM |         MiniBatchKMeans |
|      ARDRegression |                  LinearSVM |    SpectralBiclustering |
|  TweedieRegression |                      NuSVM |    SpectralCoclustering |
| TheilSenRegression |            NearestNeighbor |      SpectralClustering |
|    GammaRegression |              NeuralNetwork |               MeanShift |
|   RANSACRegression | PassiveAgressiveClassifier |                  OPTICS |
|       DecisionTree |                 Perceptron |                KMedoids |
|          ExtraTree |               BernoulliRBM |                    ---- |
|       RandomForest |           BoltzmannMachine |                    ---- |
|         ExtraTrees |       CalibratedClassifier |                    ---- |
|                SVM |                   Adaboost |                    ---- |
|          LinearSVM |                    Bagging |                    ---- |
|              NuSVM |           GradientBoosting |                    ---- |
|    NearestNeighbor |        BernoulliNaiveBayes |                    ---- |
|      NeuralNetwork |      CategoricalNaiveBayes |                    ---- |
|         ElasticNet |       ComplementNaiveBayes |                    ---- |
|       BernoulliRBM |         GaussianNaiveBayes |                    ---- |
|   BoltzmannMachine |      MultinomialNaiveBayes |                    ---- |
|           Adaboost |                       ---- |                    ---- |
|            Bagging |                       ---- |                    ---- |
|   GradientBoosting |                       ---- |                    ---- |
+--------------------+----------------------------+-------------------------+

For auto ML:

  • ImageClassifier
  • TextClassifier
  • ImageRegressor
  • TextRegressor
  • StructeredDataClassifier
  • StructeredDataRegressor
  • AutoModel

Quick Start

The help command is very useful to check supported commands and corresponding args/options

$ igel --help

You can also run help on sub-commands, for example:

$ igel fit --help

Igel is highly customizable. If you know what you want and want to configure your model manually, then check the next sections, which will guide you on how to write a yaml or a json config file. After that, you just have to tell igel what to do and where to find your data and config file. Here is an example:

$ igel fit --data_path 'path_to_your_csv_dataset.csv' --yaml_path 'path_to_your_yaml_file.yaml'

However, you can also use the auto-ml features and let igel do everything for you. A great example of this would be image classification. Let's imagine you already have a dataset of raw images stored in a folder called images.

All you have to do is run:

$ igel auto-train --data_path 'path_to_your_images_folder' --task ImageClassification

That's it! Igel will read the images from the directory, process the dataset (converting to matrices, rescaling, splitting, etc.) and start training/optimizing a model that works well on your data. As you can see, it's pretty easy: you just have to provide the path to your data and the task you want to perform.

Note

This feature is computationally expensive as igel would try many different models and compare their performance in order to find the 'best' one.

Usage

You can run the help command to get instructions. You can also run help on sub-commands!

$ igel --help

Configuration Step

First step is to provide a yaml file (you can also use json if you want)

You can do this manually by creating a .yaml file (called igel.yaml by convention, but you can name it whatever you want) and editing it yourself. However, if you are lazy (and you probably are, like me :D), you can use the igel init command to get started fast, which will create a basic config file for you on the fly.

"""
igel init --help


Example:
If I want to use neural networks to classify whether someone is sick or not using the indian-diabetes dataset,
then I would use this command to initialize a yaml file (n.b. you may need to rename the outcome column in the .csv to sick):

$ igel init -type "classification" -model "NeuralNetwork" -target "sick"
"""
$ igel init

After running the command, an igel.yaml file will be created for you in the current working directory. You can check it out and modify it if you want to, otherwise you can also create everything from scratch.

  • Demo:

../assets/igel-init.gif


# model definition
model:
    # in the type field, you can write the type of problem you want to solve. Whether regression, classification or clustering
    # Then, provide the algorithm you want to use on the data. Here I'm using the random forest algorithm
    type: classification
    algorithm: RandomForest     # make sure you write the name of the algorithm in pascal case
    arguments:
        n_estimators: 100   # here, I set the number of estimators (or trees) to 100
        max_depth: 30       # set the max_depth of the tree

# target you want to predict
# Here, as an example, I'm using the famous indians-diabetes dataset, where I want to predict whether someone has diabetes or not.
# Depending on your data, you need to provide the target(s) you want to predict here
target:
    - sick

In the example above, I'm using random forest to classify whether someone has diabetes or not, depending on some features in the dataset (I used the famous indian-diabetes dataset in this example).

Notice that I passed n_estimators and max_depth as additional arguments to the model. If you don't provide arguments, then the defaults will be used. You don't have to memorize the arguments for each model. You can always run igel models in your terminal, which will put you in interactive mode, where you will be prompted to enter the model you want to use and the type of problem you want to solve. Igel will then show you information about the model and a link that you can follow to see a list of available arguments and how to use them.

Training

  • The expected way to use igel is from terminal (igel CLI):

Run this command in terminal to fit/train a model, where you provide the path to your dataset and the path to the yaml file

$ igel fit --data_path 'path_to_your_csv_dataset.csv' --yaml_path 'path_to_your_yaml_file.yaml'

# or shorter

$ igel fit -dp 'path_to_your_csv_dataset.csv' -yml 'path_to_your_yaml_file.yaml'

"""
That's it. Your "trained" model can now be found in the model_results folder
(automatically created for you in your current working directory).
Furthermore, a description can be found in the description.json file inside the model_results folder.
"""
  • Demo:

../assets/igel-fit.gif


Evaluation

You can then evaluate the trained/pre-fitted model:

$ igel evaluate -dp 'path_to_your_evaluation_dataset.csv'
"""
This will automatically generate an evaluation.json file in the current directory, where all evaluation results are stored
"""
  • Demo:

../assets/igel-eval.gif


Prediction

Finally, you can use the trained/pre-fitted model to make predictions if you are happy with the evaluation results:

$ igel predict -dp 'path_to_your_test_dataset.csv'
"""
This will generate a predictions.csv file in your current directory, where all predictions are stored in a csv file
"""
  • Demo:

../assets/igel-pred.gif

../assets/igel-predict.gif


Experiment

You can combine the train, evaluate and predict phases using one single command called experiment:

$ igel experiment -DP "path_to_train_data path_to_eval_data path_to_test_data" -yml "path_to_yaml_file"

"""
This will run fit using train_data, evaluate using eval_data and further generate predictions using the test_data
"""
  • Demo:

../assets/igel-experiment.gif


Export

You can export the trained/pre-fitted sklearn model into ONNX:

$ igel export -dp "path_to_pre-fitted_sklearn_model"

"""
This will convert the sklearn model into ONNX
"""

Use igel from python (instead of terminal)

  • Alternatively, you can also write code if you want to:
from igel import Igel

Igel(cmd="fit", data_path="path_to_your_dataset", yaml_path="path_to_your_yaml_file")
"""
check the examples folder for more
"""

Serve the model

The next step is to use your model in production. Igel helps you with this task too by providing the serve command. Running the serve command will tell igel to serve your model. Precisely, igel will automatically build a REST server and serve your model on a specific host and port, which you can configure by passing these as cli options.

The easiest way is to run:

$ igel serve --model_results_dir "path_to_model_results_directory"

Notice that igel needs the --model_results_dir (or its short form, -res_dir) cli option in order to load the model and start the server. By default, igel will serve your model on localhost:8000; however, you can easily override this by providing the host and port cli options.

$ igel serve --model_results_dir "path_to_model_results_directory" --host "127.0.0.1" --port 8000

Igel uses FastAPI to create the REST server, which is a modern, high-performance framework, and uvicorn to run it under the hood.


Using the API with the served model

This example was done using a pre-trained model (created by running igel init --target sick -type classification) and the Indian Diabetes dataset under examples/data. The headers of the columns in the original CSV are ‘preg’, ‘plas’, ‘pres’, ‘skin’, ‘test’, ‘mass’, ‘pedi’ and ‘age’.

CURL:

  • Post with single entry for each predictor
$ curl -X POST localhost:8080/predict --header "Content-Type:application/json" -d '{"preg": 1, "plas": 180, "pres": 50, "skin": 12, "test": 1, "mass": 456, "pedi": 0.442, "age": 50}'

Outputs: {"prediction":[[0.0]]}
  • Post with multiple options for each predictor
$ curl -X POST localhost:8080/predict --header "Content-Type:application/json" -d '{"preg": [1, 6, 10], "plas":[192, 52, 180], "pres": [40, 30, 50], "skin": [25, 35, 12], "test": [0, 1, 1], "mass": [456, 123, 155], "pedi": [0.442, 0.22, 0.19], "age": [50, 40, 29]}'

Outputs: {"prediction":[[1.0],[0.0],[0.0]]}

Caveats/Limitations:

  • each predictor used to train the model must make an appearance in your data (i.e. don’t leave any columns out)
  • each list must have the same number of elements or you’ll get an Internal Server Error
  • as an extension of this, you cannot mix single elements and lists (i.e. {“plas”: 0, “pres”: [1, 2]} isn't allowed)
  • the predict function takes a data path arg and reads in the data for you, but when serving and calling your served model, you’ll have to parse the data into JSON yourself; however, the python client provided in examples/python_client.py will do that for you

Example usage of the Python Client:

from python_client import IgelClient

# the client allows additional args with defaults:
# scheme="http", endpoint="predict", missing_values="mean"
client = IgelClient(host='localhost', port=8080)

# you can post other types of files compatible with what Igel data reading allows
client.post("my_batch_file_for_predicting.csv")

Outputs: <Response 200>: {"prediction":[[1.0],[0.0],[0.0]]}

Overview

The main goal of igel is to provide you with a way to train/fit, evaluate and use models without writing code. Instead, all you need is to provide/describe what you want to do in a simple yaml file.

Basically, you provide descriptions, or rather configurations, in the yaml file as key-value pairs. Here is an overview of all supported configurations (for now):

# dataset operations
dataset:
    type: csv  # [str] -> type of your dataset
    read_data_options: # options you want to supply for reading your data (See the detailed overview about this in the next section)
        sep:  # [str] -> Delimiter to use.
        delimiter:  # [str] -> Alias for sep.
        header:     # [int, list of int] -> Row number(s) to use as the column names, and the start of the data.
        names:  # [list] -> List of column names to use
        index_col: # [int, str, list of int, list of str, False] -> Column(s) to use as the row labels of the DataFrame,
        usecols:    # [list, callable] -> Return a subset of the columns
        squeeze:    # [bool] -> If the parsed data only contains one column then return a Series.
        prefix:     # [str] -> Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
        mangle_dupe_cols:   # [bool] -> Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
        dtype:  # [Type name, dict mapping column name to type] -> Data type for data or columns
        engine:     # [str] -> Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
        converters: # [dict] -> Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
        true_values: # [list] -> Values to consider as True.
        false_values: # [list] -> Values to consider as False.
        skipinitialspace: # [bool] -> Skip spaces after delimiter.
        skiprows: # [list-like] -> Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
        skipfooter: # [int] -> Number of lines at bottom of file to skip
        nrows: # [int] -> Number of rows of file to read. Useful for reading pieces of large files.
        na_values: # [scalar, str, list, dict] ->  Additional strings to recognize as NA/NaN.
        keep_default_na: # [bool] ->  Whether or not to include the default NaN values when parsing the data.
        na_filter: # [bool] -> Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
        verbose: # [bool] -> Indicate number of NA values placed in non-numeric columns.
        skip_blank_lines: # [bool] -> If True, skip over blank lines rather than interpreting as NaN values.
        parse_dates: # [bool, list of int, list of str, list of lists, dict] ->  try parsing the dates
        infer_datetime_format: # [bool] -> If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them.
        keep_date_col: # [bool] -> If True and parse_dates specifies combining multiple columns then keep the original columns.
        dayfirst: # [bool] -> DD/MM format dates, international and European format.
        cache_dates: # [bool] -> If True, use a cache of unique, converted dates to apply the datetime conversion.
        thousands: # [str] -> the thousands operator
        decimal: # [str] -> Character to recognize as decimal point (e.g. use ‘,’ for European data).
        lineterminator: # [str] -> Character to break file into lines.
        escapechar: # [str] ->  One-character string used to escape other characters.
        comment: # [str] -> Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character.
        encoding: # [str] -> Encoding to use for UTF when reading/writing (ex. ‘utf-8’).
        dialect: # [str, csv.Dialect] -> If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting
        delim_whitespace: # [bool] -> Specifies whether or not whitespace (e.g. ' ' or '    ') will be used as the sep
        low_memory: # [bool] -> Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference.
        memory_map: # [bool] -> If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

    random_numbers: # random numbers options in case you wanted to generate the same random numbers on each run
        generate_reproducible:  # [bool] -> set this to true to generate reproducible results
        seed:   # [int] -> the seed number is optional. A seed will be set up for you if you didn't provide any

    split:  # split options
        test_size: 0.2  #[float] -> 0.2 means 20% for the test data, so 80% are automatically for training
        shuffle: true   # [bool] -> whether to shuffle the data before/while splitting
        stratify: None  # [list, None] -> If not None, data is split in a stratified fashion, using this as the class labels.

    preprocess: # preprocessing options
        missing_values: mean    # [str] -> other possible values: [drop, median, most_frequent, constant] check the docs for more
        encoding:
            type: oneHotEncoding  # [str] -> other possible values: [labelEncoding]
        scale:  # scaling options
            method: standard    # [str] -> standardization will scale values to have a 0 mean and 1 standard deviation  | you can also try minmax
            target: inputs  # [str] -> scale inputs. | other possible values: [outputs, all] # if you choose all then all values in the dataset will be scaled


# model definition
model:
    type: classification    # [str] -> type of the problem you want to solve. | possible values: [regression, classification, clustering]
    algorithm: NeuralNetwork    # [str (notice the pascal case)] -> which algorithm you want to use. | type igel algorithms in the Terminal to know more
    arguments:          # model arguments: you can check the available arguments for each model by running igel help in your terminal
    use_cv_estimator: false     # [bool] -> if this is true, the CV class of the specific model will be used if it is supported
    cross_validate:
        cv: # [int] -> number of kfold (default 5)
        n_jobs:   # [signed int] -> The number of CPUs to use to do the computation (default None)
        verbose: # [int] -> The verbosity level. (default 0)
    hyperparameter_search:
        method: grid_search   # method you want to use: grid_search and random_search are supported
        parameter_grid:     # put your parameters grid here that you want to use, an example is provided below
            param1: [val1, val2]
            param2: [val1, val2]
        arguments:  # additional arguments you want to provide for the hyperparameter search
            cv: 5   # number of folds
            refit: true   # whether to refit the model after the search
            return_train_score: false   # whether to return the train score
            verbose: 0      # verbosity level

# target you want to predict
target:  # list of strings: basically put here the column(s), you want to predict that exist in your csv dataset
    - put the target you want to predict here
    - you can assign many targets if you are making a multioutput prediction

Read Data Options

Note

igel uses pandas under the hood to read & parse the data. Hence, you can also find these optional data-reading parameters in the official pandas documentation.

A detailed overview of the configurations you can provide in the yaml (or json) file is given below. Notice that you will certainly not need all the configuration values for the dataset. They are optional. Generally, igel will figure out how to read your dataset.

However, you can help it by providing extra fields using this read_data_options section. For example, one of the helpful values in my opinion is the "sep", which defines how your columns in the csv dataset are separated. Generally, csv datasets are separated by commas, which is also the default value here. However, it may be separated by a semicolon in your case.

Hence, you can provide this in the read_data_options. Just add the sep: ";" under read_data_options.

Supported Read Data Options

Parameter (type): Explanation

sep (str, default ‘,’): Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
delimiter (default None): Alias for sep.
header (int, list of int, default ‘infer’): Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
names (array-like, optional): List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
index_col (int, str, sequence of int / str, or False, default None): Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used. Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
usecols (list-like or callable, optional): Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order. If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
squeeze (bool, default False): If the parsed data only contains one column then return a Series.
prefix (str, optional): Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
mangle_dupe_cols (bool, default True): Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
dtype (Type name or dict mapping column name to type, optional): Data type for data or columns.
engine ({‘c’, ‘python’}, optional): Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
converters (dict, optional): Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
true_values (list, optional): Values to consider as True.
false_values (list, optional): Values to consider as False.
skipinitialspace (bool, default False): Skip spaces after delimiter.
skiprows (list-like, int or callable, optional): Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
skipfooter (int, default 0): Number of lines at bottom of file to skip (Unsupported with engine=’c’).
nrows (int, optional): Number of rows of file to read. Useful for reading pieces of large files.
na_values (scalar, str, list-like, or dict, optional): Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
keep_default_na (bool, default True): Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows: If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing. If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing. If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing. If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN. Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
na_filter (bool, default True): Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
verbose (bool, default False): Indicate number of NA values placed in non-numeric columns.
skip_blank_lines (bool, default True): If True, skip over blank lines rather than interpreting as NaN values.
parse_dates (bool or list of int or names or list of lists or dict, default False): The behavior is as follows: boolean. If True -> try parsing the index. list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’. If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type.
infer_datetime_format (bool, default False): If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
keep_date_col (bool, default False): If True and parse_dates specifies combining multiple columns then keep the original columns.
date_parser (function, optional): Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
dayfirst (bool, default False): DD/MM format dates, international and European format.
cache_dates (bool, default True): If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.
thousands (str, optional): Thousands separator.
decimal (str, default ‘.’): Character to recognize as decimal point (e.g. use ‘,’ for European data).
lineterminator (str (length 1), optional): Character to break file into lines. Only valid with C parser.
escapechar (str (length 1), optional): One-character string used to escape other characters.
comment (str, optional): Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether.
encoding (str, optional): Encoding to use for UTF when reading/writing (ex. ‘utf-8’).
dialect (str or csv.Dialect, optional): If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting.
low_memory (bool, default True): Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types either set False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless.
memory_map (bool, default False): If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.

E2E Example

A complete end-to-end solution is provided in this section to demonstrate the capabilities of igel. As explained previously, you need to create a yaml configuration file. Here is an end-to-end example for predicting whether someone has diabetes or not using the decision tree algorithm. The dataset can be found in the examples folder.

  • Fit/Train a model:
model:
    type: classification
    algorithm: DecisionTree

target:
    - sick
$ igel fit -dp path_to_the_dataset -yml path_to_the_yaml_file

That's it, igel will now fit the model for you and save it in a model_results folder in your current directory.

  • Evaluate the model:

Evaluate the pre-fitted model. Igel will load the pre-fitted model from the model_results directory and evaluate it for you. You just need to run the evaluate command and provide the path to your evaluation data.

$ igel evaluate -dp path_to_the_evaluation_dataset

That's it! Igel will evaluate the model and store statistics/results in an evaluation.json file inside the model_results folder

  • Predict:

Use the pre-fitted model to predict on new data. This is done automatically by igel; you just need to provide the path to the data you want to run predictions on.

$ igel predict -dp path_to_the_new_dataset

That's it! Igel will use the pre-fitted model to make predictions and save them in a predictions.csv file inside the model_results folder.

Advanced Usage

You can also carry out some preprocessing methods or other operations by providing them in the yaml file. Here is an example, where the data is split into 80% for training and 20% for validation/testing. Also, the data are shuffled while splitting.

Furthermore, the data are preprocessed by replacing missing values with the mean (you can also use median, mode, etc.). Check the docs for more information.

# dataset operations
dataset:
    split:
        test_size: 0.2
        shuffle: True
        stratify: default

    preprocess: # preprocessing options
        missing_values: mean    # other possible values: [drop, median, most_frequent, constant] check the docs for more
        encoding:
            type: oneHotEncoding  # other possible values: [labelEncoding]
        scale:  # scaling options
            method: standard    # standardization will scale values to have a 0 mean and 1 standard deviation  | you can also try minmax
            target: inputs  # scale inputs. | other possible values: [outputs, all] # if you choose all then all values in the dataset will be scaled

# model definition
model:
    type: classification
    algorithm: RandomForest
    arguments:
        # notice that this is the available args for the random forest model. check different available args for all supported models by running igel help
        n_estimators: 100
        max_depth: 20

# target you want to predict
target:
    - sick

Then, you can fit the model by running the igel command as shown in the other examples

$ igel fit -dp path_to_the_dataset -yml path_to_the_yaml_file

For evaluation

$ igel evaluate -dp path_to_the_evaluation_dataset

For production

$ igel predict -dp path_to_the_new_dataset

Examples

In the examples folder in the repository, you will find a data folder, where the famous indian-diabetes, iris and linnerud (from sklearn) datasets are stored. Furthermore, there are end-to-end examples inside each folder, with scripts and yaml files that will help you get started.

The indian-diabetes-example folder contains two examples to help you get started:

  • The first example is using a neural network, where the configurations are stored in the neural-network.yaml file
  • The second example is using a random forest, where the configurations are stored in the random-forest.yaml file

The iris-example folder contains a logistic regression example, where some preprocessing (one hot encoding) is conducted on the target column to show you more of the capabilities of igel.

Furthermore, the multioutput-example folder contains a multioutput regression example. Finally, the cv-example folder contains an example of the Ridge classifier with cross validation.

You can also find cross validation and hyperparameter search examples in the folder.

I suggest you play around with the examples and the igel CLI. You can also directly execute fit.py, evaluate.py, and predict.py if you want to.

Auto ML Examples

ImageClassification

First, create or modify a dataset of images categorized into sub-folders based on the image label/class. For example, if you have dog and cat images, you will need two sub-folders:

  • folder 0, which contains cat images (here the label 0 indicates a cat)
  • folder 1, which contains dog images (here the label 1 indicates a dog)

Assuming these two sub-folders are contained in one parent folder called images, just feed the data to igel:

$ igel auto-train -dp ./images --task ImageClassification

Igel will handle everything from pre-processing the data to optimizing hyperparameters. At the end, the best model will be stored in the current working dir.
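
If your images currently sit in a single flat folder, a small script like the following can arrange them into the 0/1 sub-folder layout described above. This is only an illustration: the raw_images source folder and the cat/dog filename convention are hypothetical assumptions, not anything igel requires beyond the labeled sub-folders.

# Arrange a flat folder of labeled image files into the 0/1 sub-folder layout.
import shutil
from pathlib import Path

source = Path("raw_images")   # hypothetical flat folder of .jpg files
target = Path("images")
(target / "0").mkdir(parents=True, exist_ok=True)   # 0 -> cats
(target / "1").mkdir(parents=True, exist_ok=True)   # 1 -> dogs

for image in source.glob("*.jpg"):
    # hypothetical convention: files named cat*.jpg are cats, everything else is a dog
    label = "0" if image.name.lower().startswith("cat") else "1"
    shutil.copy(image, target / label / image.name)

Once the folders are in place, point the auto-train command above at the images parent folder.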

TextClassification

First, create or modify a text dataset categorized into sub-folders based on the text label/class. For example, if you have a dataset of positive and negative feedback, you will need two sub-folders:

  • folder 0, which contains negative feedback (here the label 0 indicates a negative example)
  • folder 1, which contains positive feedback (here the label 1 indicates a positive example)

Assuming these two sub-folders are contained in one parent folder called texts, just feed the data to igel:

$ igel auto-train -dp ./texts --task TextClassification

Igel will handle everything from pre-processing the data to optimizing hyperparameters. At the end, the best model will be stored in the current working dir.

GUI

You can also run the igel UI if you are not familiar with the terminal. Just install igel on your machine as mentioned above, then run this single command in your terminal:

$ igel gui

This will open up the GUI, which is very simple to use. Check examples of what the GUI looks like and how to use it here: https://github.com/nidhaloff/igel-ui

Running with Docker

  • Use the official image (recommended):

You can pull the image first from Docker Hub:

$ docker pull nidhaloff/igel

Then use it:

$ docker run -it --rm -v $(pwd):/data nidhaloff/igel fit -yml 'your_file.yaml' -dp 'your_dataset.csv'

  • Alternatively, you can create your own image locally if you want:

You can run igel inside of docker by first building the image:

$ docker build -t igel .

And then running it and attaching your current directory (does not need to be the igel directory) as /data (the workdir) inside of the container:

$ docker run -it --rm -v $(pwd):/data igel fit -yml 'your_file.yaml' -dp 'your_dataset.csv'

Help / Get Help

If you are facing any problems, please feel free to open an issue. Additionally, you can contact the author for further information or questions.

Do you like igel? You can always help the development of this project by:

  • Following on github and/or twitter
  • Star the github repo
  • Watch the github repo for new releases
  • Tweet about the package
  • Help others with issues on github
  • Create issues and pull requests
  • Sponsor the project

Contributions

Do you think this project is useful and want to bring new ideas, new features, bug fixes, or documentation improvements?

Contributions are always welcome. Make sure you read the guidelines first.

Note

I'm also working on a GUI desktop app for igel based on people's requests. You can find it under Igel-UI.

Download Details:

Author: Nidhaloff
Source Code: https://github.com/nidhaloff/igel 
License: MIT license

#machinelearning #datascience #automation #neuralnetwork 

Igel: A Delightful ML tool That Allows You To Train, Test
Lawrence  Lesch

Lawrence Lesch

1673456280

Stryker-js: Mutation Testing for JavaScript and Friends

StrykerJS

Professor X: For someone who hates mutants... you certainly keep some strange company. William Stryker: Oh, they serve their purpose... as long as they can be controlled.

Welcome to StrykerJS's monorepo. This is where all official stryker packages are maintained. If you're new to monorepos: don't be scared. You'll find the packages in the packages folder.

If you're interested in why we chose a monorepo, please read Babel's design document about monorepos. We use it for the same reasons as they do.

Introduction

For an introduction to mutation testing and Stryker's features, see stryker-mutator.io.

Getting started

Please follow the quickstart on the website.

For small js projects, you can try the following command:

npm install --save-dev @stryker-mutator/core
# Only for small projects:
npx stryker run

It will run stryker with default values:

  • Uses npm test as your test command
  • Searches for files to mutate in the lib and src directories

Usage

$ npx stryker <command> [options] [configFile]

See usage on stryker-mutator.io

Supported mutators

See our website for the list of currently supported mutators.

Configuration

See configuration on stryker-mutator.io.

Download Details:

Author: Stryker-mutator
Source Code: https://github.com/stryker-mutator/stryker-js 
License: Apache-2.0 license

#typescript #javascript #testing #automation 

Stryker-js: Mutation Testing for JavaScript and Friends
Automation Bro

Automation Bro

1673446274

Can an AI (ChatGPT) build test automation scripts from scratch?

In the video below, I will be utilizing ChatGPT to assist me in building test automation scripts using the Cypress framework. I will ask the chatbot five questions, each of which will become progressively more challenging in order to test its knowledge and capabilities.

https://youtu.be/84O4_JqBqVE

 

#AI #testing #automation #softwaretesting #cypress #javascript 

Can an AI (ChatGPT) build test automation scripts from scratch?
Automation Bro

Automation Bro

1672625488

Read CSV file using Cypress fixture

In this tutorial, we will learn how to read a CSV (Comma Separated Values) file in Cypress for data-driven testing with the help of fixture. Data-driven testing is a testing technique where the test inputs and expected results are read from a data source, such as a CSV file. This allows us to run the same test multiple times with different data sets, reducing the amount of code we need to write and maintain.

Prerequisites:

  • Basic knowledge of Cypress and JavaScript
  • Node.js and npm installed on your machine
  • A CSV file to use as a data source

https://youtu.be/8h1pUyVHqn0

#testing #cypress #javascript #automation #csv 

Read CSV file using Cypress fixture
Nigel  Uys

Nigel Uys

1672307280

AWX: Provides a web-based user interface, REST API

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

To install AWX, please view the Install guide.

To learn more about using AWX and Tower, view the Tower docs site.

The AWX Project Frequently Asked Questions can be found here.

The AWX logos and branding assets are covered by our trademark guidelines.

Contributing

  • Refer to the Contributing guide to get started developing, testing, and building AWX.
  • All code submissions are made through pull requests against the devel branch.
  • All contributors must use git commit --signoff for any commit to be merged and agree that usage of --signoff constitutes agreement with the terms of DCO 1.1.
  • Take care to make sure no merge commits are in the submission, and use git rebase vs. git merge for this reason.
  • If submitting a large code change, it's a good idea to join the #ansible-awx channel on web.libera.chat and talk about what you would like to do or add first. This not only helps everyone know what's going on, but it also helps save time and effort if the community decides some changes are needed.

Reporting Issues

If you're experiencing a problem that you feel is a bug in AWX or have ideas for improving AWX, we encourage you to open an issue and share your feedback. But before opening a new issue, we ask that you please take a look at our Issues guide.

Code of Conduct

We ask all of our community members and contributors to adhere to the Ansible code of conduct. If you have questions or need assistance, please reach out to our community team at codeofconduct@ansible.com

Get Involved

We welcome your feedback and ideas. Here's how to reach us with feedback and questions:

  • Join the #ansible-awx channel on irc.libera.chat
  • Join the mailing list

Download Details:

Author: Ansible
Source Code: https://github.com/ansible/awx 
License: View license

#ansible #python #automation #django 

AWX: Provides a web-based user interface, REST API

Jeevi Academy

1672143538

7 Best Chrome Extensions for UI/UX Designers | Jeevisoft |

#chromeextension #chrome #extension #ui #ux #userinterface #design #designer #webdesign #automation 

7 Best Chrome Extensions for UI/UX Designers | Jeevisoft |

Learn Zapier Automation Inspiration for Entrepreneurs

Zapier is a powerful automation tool that can be used to create complex workflows. But when you’re new to automation and integration tools, it can be hard to get started. Check out these simple starter workflows for Zapier automation inspiration that will benefit you from day one.

Zapier connects your favorite apps and services. These connections allow teams and individuals to automate more of their workflows. Every connection between two apps is called a “Zap” and they’re designed to make you work smarter, not harder. With over 4,000 apps to choose from, there are plenty of practical Zapier workflows for entrepreneurs to build.

1. Create Mailchimp Subscribers from Typeform or PayPal

PayPal to Mailchimp workflow

How many marketing automation flows have you got set up in your business? Email is a perfect fit for this. Mailchimp is an email marketing service that lets you design emails, maintain a list of subscribers, email those subscribers, and then monitor the results of that campaign. But what happens when a customer isn’t a subscriber?

With this PayPal → Mailchimp workflow, Zapier monitors all sales made via PayPal and enters the customer’s email address into your email campaign in Mailchimp, effectively helping you to retain each new customer.

Typeform to Mailchimp workflow

You can do the same thing with Typeform. Let’s say that you’re using Typeform to survey your website’s visitors about the user experience that they received. The Typeform → Mailchimp Zap will retain the user’s email address in Mailchimp so you can tell that user about the new version of your website, derived from the feedback that they originally offered.

2. Create Trello Cards from Gmail Emails and Update in Slack

Gmail to Trello workflow

With the influx of morning emails, it can be near-impossible to convert each one into an actionable task in a sensible amount of time. The Gmail → Trello Zap can fix that, helping you to reach “inbox zero” much faster. In short, Zapier scans your Gmail for emails with a certain label and creates a formatted Trello card from them.

Trello to Slack workflow

The best way to get some automation inspiration is to start with simple workflows and identify ways of taking things further as you become more familiar with your tool. You can take this Zap a step further by automatically updating teammates about this new Trello card in Slack, so that the team can be notified about the task and complete it. All that from a simple label!

3. Get Notified of New Dropbox Files in Slack

Dropbox to Slack workflow

I can think of a number of reasons why this Dropbox Slack setup is epic. Firstly, Dropbox really drains your battery because it constantly checks for file updates, so having notifications in Slack can quite literally save you hours of battery life. But don’t worry, not only can you be notified in Slack of new Dropbox files, but Slack can import the file, making it searchable and downloadable from within Slack. Dropbox doesn’t even need to be switched on!

4. Create and Publish WordPress Posts in Evernote

Evernote to WordPress workflow

Hate using a CMS? I feel you. Bloggers like to stay focused on the words, which is why a Zapier integration that allows you to control WordPress articles in Evernote, a note-taking app, is such a neat idea. Essentially you create a fresh notebook and Zapier uploads the “notes” to WordPress.

Notes can also be tagged and dated, which makes it super easy to categorize articles. You can make Evernote your new CMS today!

5. Record Both PayPal and Stripe Sales in Google Sheets

PayPal and Stripe to Google Sheets workflow

By using these two Zaps you’ll be able to automate the ordeal of having to export transaction data into Google Sheets, but rather than replacing a spreadsheet with an updated version, both the PayPal and Stripe Zaps log new sales on a new row. If you use PayPal and Stripe you could set up both integrations to copy new sales into the same spreadsheet.

6. Copy Google Drive Files to Dropbox (or Vice Versa)

Google Drive to Dropbox workflow

Half the team uses Google Drive and the other half uses Dropbox — a typical “sigh…” situation. Everybody has their own reason for choosing one over the other.

Luckily, we can keep the two cloud storage services in sync by copying Google Drive files to Dropbox and vice-versa!

7. Create Trello Cards from Todoist Tasks

Many teams use Trello for task management, but for personal task management, services like Todoist are more popular. Even though I work within various large Trello boards I still find it useful to keep a private to-do list that incorporates only my tasks — this way I don’t have to sift through my assigned tasks in multiple Trello boards.

But rather than having to manage Trello boards and Todoist lists simultaneously, this Todoist → Trello Zap (or again, the vice-versa) can convert your listed tasks from Todoist into Trello cards, a workflow that benefits the task doers as well as the task managers.

Zapier Automation Inspiration: Next Steps

Zapier is that one friend that brings the group together — the organizer, the one that makes sure everybody is getting along. I only mentioned a handful of Zaps in this article, so I’d encourage you to search Zapier’s database to see how you can bridge the gap between the services that you use and get even more Zapier automation inspiration.

Want to build more complex automations? Check out some of our guides that use Zapier as integration glue.

Original article source at: https://www.sitepoint.com/

#automation #entrepreneurs 

Learn Zapier Automation Inspiration for Entrepreneurs

6 Best RPA Use-Cases for industry Automation

Introduction to Robotic Process Automation (RPA)

Humans are entering a new era of automation. Robotic Process Automation is one transformation that is automating our daily repetitive tasks. Like chatbots and AI, RPA enables higher efficiency in human actions. With RPA, we get a virtual employee who can perform repetitive activities faster and more cost-effectively than humans.

Simply put, RPA has been shown to generate quick and high levels of ROI (Return On Investment) for customers. Source: Robotic Process Automation in the software market

How to implement RPA with Humans?

Companies that implement RPA do not think of replacing their employees with automated bots. Instead, they want to reallocate these workers to focus on creative and strategic work rather than repetitive work. RPA can prove to be highly beneficial when it comes to the market.

  • RPA bots process data 24/7 without any breaks, never make mistakes, and never hamper your team's productivity. They work tirelessly with the same consistency and accuracy, like robots made of software code.
  • With RPA, we can monitor customer activity for up-selling by targeting and preparing data for customer subscription or warranty renewals, and collect data through web scraping for both marketing and sales activities.
  • Bots can be programmed to monitor a client's policy status and identify discount and bundle opportunities. Through this, we can send highly segmented emails to maximize sales opportunities.
  • RPA works alongside the existing IT infrastructure; it just needs to be trained on how to use it. RPA uses the same graphical user interface that human workers would use to complete tasks, so the IT landscape does not need to change to accommodate RPA, keeping costs to a minimum.

RPA is helping financial institutions to provide 24/7 support for important activities and processes. Click to explore about, Robotic Process Automation for Financial Services

Top Use Cases of RPA Implementation
 

There are various RPA use cases across trending industries that help them stay competitive. Below are the industry types along with their challenges and the solutions to those problems.

How is RPA used in Human Resources?

RPA bots can compare resumes with the description for a particular job and shortlist the matching candidates. With the help of RPA, offer letters are customized for the selected candidate. It also helps to check and keep track of periodic company reviews. Bots allow HR to manage employee data effectively and to verify the history of a team member.

Employee Onboarding

Problem: Onboarding an employee can be a very tedious task, and RPA can help us with this use case. When an employee is onboarded, their details have to be entered into the system. The details have to be fetched from the offer letter, the filled-in duty form, and other documents.

Process:

  • Log in to the portal.
  • Search for the employee in the portal.
  • If the employee does not exist, then a mail will be generated.
  • If the employee exists, then data will be extracted from the passport, which is in the folder (the folder is named with the employee ID).
  • Fill up the extracted data and complete the sections.
  • Choose the particular option from the drop-down as per requirement.
  • In section 3: the performance evaluation will be selected based on the given table.
  • For the sponsor name: check whether the business unit is Citymax. For Citymax, pick the name from the duty form.
  • For other business units, open SharePoint, search for the employee, extract the "Labor Card number," and use it as the work permit number.
  • Search for the work permit number on the Mohre portal and extract the company code from there.
  • Read the offer letter from the folder, extract the salary details, and fill in Section 4.

Employee Onboarding RPA

Night Audit Process

Problem: The night audit process is an end-of-day process. It takes all of the hotel's financial activities that have taken place in one day and posts them to the appropriate accounts.

Process:

  • Log in to the opera portal.
  • Go to "Front Desk".
  • Search for room type "PI."
  • Open room "9509".
  • The value of "Blocks not Picked Up" should be 0.
  • If it is not 0, the number of nights will be updated from 0 to 1, and the "PI" room check is repeated.
  • Else, open "End of Day routine" and log in.
  • Check the country field so that no entry is blank. If any country block is empty, log in to vicas and get the country from there.
  • If any arrival has not been checked in, it should be canceled.
  • Next, check whether the balance of the PI rooms is zero. If not zero, the number of nights is increased from 0 to 1.
  • Cashier closing will be done, and the end-of-day notes will be added.
  • A message should be broadcast to all terminals to log out.
  • Run the "End of day" procedure and wait for it to complete.
  • Run the final report and print it if needed.

RPA in Night Audit Process

Resume Automation Process

Problem: HR receives a lot of resumes daily. Saving their data, downloading the resumes every day, and sending the same message to every applicant manually can be a difficult task. We can automate this process with RPA in human resources management: all the resumes get downloaded and stored in a folder, and all the mail IDs are extracted into an Excel sheet. Then we can easily send the message to every applicant at the same time without spending much time.

Process:

  • Gmail will be logged in by the bot.
  • It will check for unread emails.
  • Emails that contain the keyword "resume" anywhere will receive a reply with a Google form.
  • The applicant will enter their details.
  • The data will be extracted from the Google form.
  • There we can apply filters on qualifications and other constraints as per our defined rules.
  • A mail will be sent for the further selection procedure.
  • An Excel sheet will be created with all mail IDs and shared with HR.

RPA in HR for Resume Automation Process

Invoice Processing 

Problem:

  1. We receive many invoices that we have to update and send for approval. This is a time-consuming task, and the work is only complete once the payment is made and updated. We receive invoices as PDFs. The data has to be copied from the PDFs into the portal under the correct accounts, and the invoices sent for approval. This manual work is time-consuming and leaves plenty of room for human error.
  2. This invoice-processing issue can be solved with RPA. All the data entry is completed through automation without any human intervention, leaving little scope for mistakes.

Process:

  • Save the invoices received by mail from vendors.
  • Save all the attachments on the local folder.
  • Create a process to extract the data from the invoice and store it in variables.
  • Open the portal and search for the account.
  • Enter the extracted details so they are successfully updated on the portal.
  • The invoice will be saved on the system for records. And can also be shared by mail.
  • And the process continues with the next invoice.

RPA in Invoice Processing

Business Application Use Cases of RPA in Manufacturing

  • One of the primary benefits of RPA in manufacturing is that it can generate accurate reports of production.
  • RPA can automate emails, monitor inventory levels, and simplify paperwork digitization in inventory management.
  • RPA use cases in manufacturing show how bots can automate bills of materials by extracting data and providing data accuracy, leading to fewer transactional issues and errors.
  • PODs (Proof of Delivery) are essential documents for the customer service department of manufacturers. These documents carry a high risk of human error and are highly labor-intensive. Both problems can be solved with the help of RPA bots.

RPA use cases in the Retail Industry

  • Bots can extract data to help businesses categorize products and identify their market share in different regions. This also helps in saving countless hours of work.
  • Returning any product involves a lot of formalities and processing. Bots can check the records and speed up the entire return process.

Robotic Process Automation in Telecommunications

Telecommunication is one of the industries that uses RPA at another level. Some of the significant use cases are:

  • Bots can assist customers by offering guidance on their first call.
  • Automation can provide comparative price analysis to a telecom company.
  • RPA can offer assistance in addressing faults in real time with negligible human intervention.

How can RPA help in healthcare?

  • RPA can automate processes across a healthcare organization, from operational processes to patient interaction and bill payment.
  • Bots can manage and schedule patient appointments.
  • RPA can use the document digitization process to prepare documents.
  • Bots can track patient records, medical records, etc.

How can RPA help in insurance?

RPA can give operational efficiency to insurance companies. RPA use cases in insurance:

  • Robotic process automation refers to bots doing the repetitive work of human workers, such as collecting customer information, extracting data, and so on.
  • Insurers fetch data automatically from registration forms with the help of robotic process automation.
  • RPA increases data reliability by replacing the manual process and removing human errors.
  • Bots can deal with different data formats to extract essential and relevant data.

What is the use of RPA in the IT industry?

RPA has a vast number of uses in information technology. Here are some of the main use cases:

  • User login management can be automated, using OTP generation for secure logins, password resets, etc.
  • RPA provides temporary admin access according to companies' needs.
  • Server crashes and downtime are a nightmare for every IT department. RPA automatically reboots, shuts down, restarts, and reconfigures various types of servers. It helps organizations reduce IT operational costs and save time.
  • With a single click, complex systems can be installed quickly by using RPA.

RPA in Banking

  • RPA helps banks and accounting departments automate repetitive manual processes and allows employees to focus on more critical tasks.
  • With the help of RPA, it becomes a quick and straightforward process to open an account.
  • KYC (Know Your Customer) and AML (Anti-Money Laundering) processes can be easily handled with the help of RPA.
  • RPA makes it easy to track accounts and send automated notifications for the required document submissions.
  • Generating audit reports manually takes several hours, but it can be completed in minutes with the help of RPA bots.

Conclusion

Industries have now started implementing automation with Robotic Process Automation technologies to minimize human error. Reducing manual effort in this way also creates the ability to increase productivity. By using RPA technologies in their back-end operations, businesses can save 40% in various fields of business, in any of the RPA use cases mentioned above.

Original article source at: https://www.xenonstack.com/

#rpa #automation 

6 Best RPA Use-Cases for industry Automation
Automation Bro

Automation Bro

1670246422

Playwright with Typescript tutorial on FIFA World cup site ⚽️

In this video, we will automate the FIFA World Cup site using Playwright & Typescript. Let's do a quick overview of Playwright -

  • Playwright is one of the most popular e2e automation tools in the market today
  • It works with multiple languages such as Typescript, JS, Python, .Net & Java
  • It supports all the major browsers such as Chrome, FF, Safari and so on..
  • It's really easy to get started with Playwright; with just a few steps you can have your first test working

The video below will cover the following topics -

  • Setup & Installation
  • Project & Config Overview
  • Write First Playwright Test
  • Playwright Page Locator
  • Working with multiple elements
  • Disable HTML Report
  • Playwright Debugger / Inspector
  • and much more…

https://youtu.be/Ov9e_F8I5zc

#javascript #typescript #playwright #softwaretesting #automation #qa #web-development #testing 

Playwright with Typescript tutorial on FIFA World cup site ⚽️
Automation Bro

Automation Bro

1668453858

Automate One Time Password (OTP) using Cypress

Automating One Time Password (OTP) is always a bit challenging as you need to work with a third-party service to send/receive messages. Let’s take a look at how we can do that using Cypress …

⚙️ Dependencies

  • Cypress: browser automation framework (can be replaced with any other JS browser automation framework)
  • Receive SMS: to generate a temporary phone number and receive SMS. Note: this can easily be replaced with a paid service such as Twilio, SMSArc, etc..
  • GitHub: example site to test OTP on

In this video, we will cover step-by-step how to automate OTP using Cypress:

https://youtu.be/iiGy69gMeAw

👩🏻‍💻 Access the source code here.


To learn more about Cypress, check out my free Cypress tutorial series here -

https://www.youtube.com/watch?v=krpKuSqQ0XM&list=PL6AdzyjjD5HAr_Jq1hwpFUIO49uyBZ9ma


👩🏻‍💻 It’s time to advance your career by joining the SDET-U Academy today 👇🏻
Join Academy

📧 Subscribe to my mailing list to get access to more content like this as well as be part of amazing free giveaways.

👍 You can follow my content here as well -

Thanks for reading!

 

#testing #automation #testautomation #javascript #cypress #selenium 

Automate One Time Password (OTP) using Cypress
Rupert  Beatty

Rupert Beatty

1668077715

Actions: Supercharge Your Shortcuts

Actions

Additional actions for the Shortcuts app

The app provides lots of powerful extra actions for the Shortcuts app on macOS and iOS. These actions make it significantly easier to create shortcuts.

Submit action idea
(Submit an issue before submitting a pull request)


Want to run shortcuts directly from the iOS Lock Screen? Check out my new Quick Launch app.


Download

Requires at least macOS 13 or iOS 16

Older versions (macOS)

Included actions

  • Add to List
  • Apply Capture Date
  • Ask for Text with Timeout
  • Authenticate
  • Blur Images
  • Calculate with Soulver
  • Choose from List (Extended)
  • Clamp Number
  • Combine Lists
  • Convert Coordinates to Location
  • Convert Date to Unix Time
  • Convert Location to Geo URI
  • Convert Unix Time to Date
  • Create Color Image
  • Create URL
  • Edit URL
  • Filter List
  • Flash Screen (macOS-only)
  • Format Currency
  • Format Date Difference
  • Format Duration
  • Format Number — Compact
  • Format Person Name
  • Generate CSV
  • Generate Haptic Feedback (iOS-only)
  • Generate Random Data
  • Generate Random Text
  • Generate UUID
  • Get Audio Playback Destination (iOS-only)
  • Get Battery State
  • Get Device Orientation
  • Get Emojis
  • Get File Icon (macOS-only)
  • Get File Path
  • Get Index of List Item
  • Get Music Playlists (iOS-only)
  • Get Query Item Value from URL
  • Get Query Items from URL
  • Get Query Items from URL as Dictionary
  • Get Random Boolean
  • Get Random Color
  • Get Random Date and Time
  • Get Random Emoticon
  • Get Random Floating-Point Number
  • Get Related Words
  • Get Running Apps (macOS-only)
  • Get Symbol Image
  • Get Title of URL
  • Get Uniform Type Identifier
  • Get Unsplash Image
  • Get User Details
  • Hex Encode
  • Hide Shortcuts App
  • Is Audio Playing (iOS-only)
  • Is Bluetooth On
  • Is Cellular Data On
  • Is Connected to VPN (iOS-only)
  • Is Dark Mode On
  • Is Device Orientation
  • Is Low Power Mode On
  • Is Online
  • Is Reachable
  • Is Screen Locked (macOS-only)
  • Is Silent Mode On (iOS-only)
  • Is Wi-Fi On (macOS-only)
  • Merge Dictionaries
  • Overwrite File
  • Parse CSV
  • Parse JSON5
  • Play Alert Sound (macOS-only)
  • Pretty Print Dictionaries
  • Remove Duplicate Lines
  • Remove Duplicates from List
  • Remove Emojis
  • Remove Empty Lines
  • Remove from List
  • Remove Non-Printable Characters
  • Reverse Lines
  • Reverse List
  • Round Number to Multiple
  • Sample Color from Screen (macOS-only)
  • Scan Documents (iOS-only)
  • Scan QR Codes in Image
  • Set Creation and Modification Date of File
  • Shuffle List
  • Sort List
  • Spell Out Number
  • Transcribe Audio
  • Transform Lists
  • Transform Text
  • Transform Text with JavaScript
  • Trim Whitespace
  • Truncate List
  • Truncate Text
  • Write or Edit Text

Looking for more?

Screenshot

Non-App Store version for macOS

A special version for users that cannot access the App Store. It won't receive updates.

Download (1.13.1)

Requires macOS 12 or later

FAQ

Why is this free without ads?

I just enjoy making apps. I earn money on other apps. Consider leaving a nice review on the App Store.

Can I contribute localizations?

I don't have any immediate plans to localize the app.

Other apps

  • Gifski - Convert videos to high-quality GIFs
  • System Color Picker - The macOS color picker as an app with more features
  • Plash - Make any website your Mac desktop wallpaper
  • Dato - Better menu bar clock with calendar and time zones
  • More apps…

Download Details:

Author: Sindresorhus
Source Code: https://github.com/sindresorhus/Actions 
License: MIT license

#swift #macos #ios #automation 

Actions: Supercharge Your Shortcuts
Royce  Reinger

Royce Reinger

1668024660

TPOT: A Python Automated Machine Learning tool

TPOT

TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

TPOT Demo

TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data.

An example Machine Learning pipeline

Once TPOT is finished searching (or you get tired of waiting), it provides you with the Python code for the best pipeline it found so you can tinker with the pipeline from there.

An example TPOT pipeline

TPOT is built on top of scikit-learn, so all of the code it generates should look familiar... if you're familiar with scikit-learn, anyway.

TPOT is still under active development and we encourage you to check back on this repository regularly for updates.

For further information about TPOT, please see the project documentation.

Installation

We maintain the TPOT installation instructions in the documentation. TPOT requires a working installation of Python.

Usage

TPOT can be used on the command line or with Python code.

Click on the corresponding links to find more information on TPOT usage in the documentation.

Examples

Classification

Below is a minimal working example with the optical recognition of handwritten digits dataset.

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_digits_pipeline.py')

Running this code should discover a pipeline that achieves about 98% testing accuracy, and the corresponding Python code should be exported to the tpot_digits_pipeline.py file and look similar to the following:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: 0.9799428471757372
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    StackingEstimator(estimator=LogisticRegression(C=0.1, dual=False, penalty="l1")),
    RandomForestClassifier(bootstrap=True, criterion="entropy", max_features=0.35000000000000003, min_samples_leaf=20, min_samples_split=19, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

Regression

Similarly, TPOT can optimize pipelines for regression problems. Below is a minimal working example with the practice Boston housing prices data set.

from tpot import TPOTRegressor
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')

which should result in a pipeline that achieves about 12.77 mean squared error (MSE), and the Python code in tpot_boston_pipeline.py should look similar to:

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
            train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: -10.812040755234403
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    ExtraTreesRegressor(bootstrap=False, max_features=0.5, min_samples_leaf=2, min_samples_split=3, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)

Check the documentation for more examples and tutorials.

Contributing to TPOT

We welcome you to check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to TPOT, please file a new issue so we can discuss it.

Before submitting any contributions, please review our contribution guidelines.

Having problems or have questions about TPOT?

Please check the existing open and closed issues to see if your issue has already been attended to. If it hasn't, file a new issue on this repository so we can review your issue.

Citing TPOT

If you use TPOT in a scientific publication, please consider citing at least one of the following papers:

Trang T. Le, Weixuan Fu and Jason H. Moore (2020). Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics, 36(1): 250-256.

BibTeX entry:

@article{le2020scaling,
  title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector},
  author={Le, Trang T and Fu, Weixuan and Moore, Jason H},
  journal={Bioinformatics},
  volume={36},
  number={1},
  pages={250--256},
  year={2020},
  publisher={Oxford University Press}
}

Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). Automating biomedical data science through tree-based pipeline optimization. Applications of Evolutionary Computation, pages 123-137.

BibTeX entry:

@inbook{Olson2016EvoBio,
    author={Olson, Randal S. and Urbanowicz, Ryan J. and Andrews, Peter C. and Lavender, Nicole A. and Kidd, La Creis and Moore, Jason H.},
    editor={Squillero, Giovanni and Burelli, Paolo},
    chapter={Automating Biomedical Data Science Through Tree-Based Pipeline Optimization},
    title={Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 -- April 1, 2016, Proceedings, Part I},
    year={2016},
    publisher={Springer International Publishing},
    pages={123--137},
    isbn={978-3-319-31204-0},
    doi={10.1007/978-3-319-31204-0_9},
    url={http://dx.doi.org/10.1007/978-3-319-31204-0_9}
}

Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore (2016). Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Proceedings of GECCO 2016, pages 485-492.

BibTeX entry:

@inproceedings{OlsonGECCO2016,
    author = {Olson, Randal S. and Bartley, Nathan and Urbanowicz, Ryan J. and Moore, Jason H.},
    title = {Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science},
    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 2016},
    series = {GECCO '16},
    year = {2016},
    isbn = {978-1-4503-4206-3},
    location = {Denver, Colorado, USA},
    pages = {485--492},
    numpages = {8},
    url = {http://doi.acm.org/10.1145/2908812.2908918},
    doi = {10.1145/2908812.2908918},
    acmid = {2908918},
    publisher = {ACM},
    address = {New York, NY, USA},
}

Alternatively, you can cite the repository directly with the following DOI:

DOI

Support for TPOT

TPOT was developed in the Computational Genetics Lab at the University of Pennsylvania with funding from the NIH under grant R01 AI117694. We are incredibly grateful for the support of the NIH and the University of Pennsylvania during the development of this project.

The TPOT logo was designed by Todd Newmuis, who generously donated his time to the project.

Download Details:

Author: EpistasisLab
Source Code: https://github.com/EpistasisLab/tpot 
License: LGPL-3.0 license

#machinelearning #python #datascience #automation 

TPOT: A Python Automated Machine Learning tool