Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Science tasks.
This library is an open-source research project and is not part of any official SAP products.
hana_automl is a simple but accurate Automated Machine Learning library. Built on SAP HANA's in-memory algorithms, it targets high accuracy across multiple machine learning tasks. The library also applies a range of data preprocessing functions to automate routine data cleaning. In short, hana_automl goes through all AutoML steps and makes Data Science work easier.
From www.sap.com: SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions.
Web app: https://share.streamlit.io/dan0nchik/sap-hana-automl/main/web.py
Documentation: https://sap-hana-automl.readthedocs.io/en/latest/index.html
Comparison on OpenML datasets (notebook): https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb
👇 By the end of summer 2021, blue part will be fully automated by our library
To get the package up and running, follow these simple steps.
Make sure you have the following:
✅ Set up SAP HANA (skip this step if you have an instance with PAL enabled). There are 2 ways to do that:
In HANA Cloud:
In Virtual Machine:
✅ Installed software
Python > 3.6
Skip this step if python --version returns > 3.6
Cython
pip3 install Cython
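The two software prerequisites above can also be checked from Python before installing anything. The snippet below is a minimal sketch and not part of hana_automl: the helper names `python_is_supported` and `cython_is_installed` are made up for illustration, and it assumes that "Python > 3.6" means 3.6 or any newer release.

```python
import importlib.util
import sys

# Assumption: "Python > 3.6" is read as "3.6 or newer".
MINIMUM_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def cython_is_installed():
    """Return True if the import system can locate the Cython package."""
    return importlib.util.find_spec("Cython") is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Cython installed:", cython_is_installed())
```

If the requirement turns out to be strict, changing `MINIMUM_PYTHON` to `(3, 7)` is enough.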
There are 2 ways to install the library
Stable: from PyPI
pip3 install hana_automl
Latest: from the repository
pip3 install https://github.com/dan0nchik/SAP-HANA-AutoML/archive/dev.zip
Note: the latest version may contain bugs, so use it with care!
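After either install, it can help to confirm which version actually ended up in the environment. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). `installed_version` is a hypothetical helper, and the distribution name `hana_automl` is assumed from the pip command above.

```python
from importlib import metadata


def installed_version(distribution="hana_automl"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print("hana_automl version:", version or "not installed")
```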
Check that PAL (Predictive Analysis Library) is installed and roles are granted
See the documentation section on this.
If you would rather not read the docs, run this code:
from hana_automl.utils.scripts import setup_user
from hana_ml.dataframe import ConnectionContext
cc = ConnectionContext(address='address', user='user', password='password', port=39015)
# replace with credentials of user that will be created or granted a role to run PAL.
setup_user(connection_context=cc, username='user', password="password")
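To see whether PAL is visible at all before or after running the setup, one option is to query the database directly. The sketch below is an assumption-laden illustration, not something hana_automl ships: it assumes PAL functions are listed in the `SYS.AFL_FUNCTIONS` system view under the area name `AFLPAL`, that the connected user may read that view, and the helper names are hypothetical. It does not check which roles are granted.

```python
# Assumption: PAL functions are listed in the SYS.AFL_FUNCTIONS system view
# under the area name 'AFLPAL', and the connected user may read that view.
PAL_CHECK_SQL = (
    "SELECT COUNT(*) AS PAL_FUNCTIONS "
    "FROM SYS.AFL_FUNCTIONS WHERE AREA_NAME = 'AFLPAL'"
)


def pal_function_count(connection_context):
    """Return how many PAL functions the connected user can see."""
    # ConnectionContext.sql() builds a hana_ml DataFrame; collect() fetches
    # the result as a pandas DataFrame with one row and one column.
    result = connection_context.sql(PAL_CHECK_SQL).collect()
    return int(result.iloc[0, 0])


def pal_is_installed(connection_context):
    """Return True if at least one PAL function is registered."""
    return pal_function_count(connection_context) > 0
```

With the `cc` object created above, `pal_is_installed(cc)` returning `False` would suggest that PAL is missing or not visible to that user.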
Our library in a few lines of code
Connect to database.
from hana_ml.dataframe import ConnectionContext
cc = ConnectionContext(address='address',
                       user='username',
                       password='password',
                       port=1234)
Create AutoML model and fit it.
from hana_automl.automl import AutoML
model = AutoML(cc)
model.fit(
    file_path='path to training dataset',  # it may be HANA table/view, or pandas DataFrame
    steps=10,  # number of iterations
    target='target',  # column to predict
    time_limit=120  # time limit in seconds
)
Predict.
model.predict(
    file_path='path to test dataset',
    id_column='ID',
    verbose=1
)
For more examples, please refer to the Documentation
git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
cd SAP-HANA-AutoML
pip3 install -r requirements.txt
streamlit run ./web.py
See the open issues for a list of proposed features (and known issues). Feel free to report any bugs :)
Any contributions you make are greatly appreciated 👏!
Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Install dependencies
pip3 install Cython
pip3 install -r requirements.txt
Create a credentials.py file in the tests directory. Your files should look like this:
SAP-HANA-AutoML
│   README.md
│   all other files
│   .....
│
└───tests
    │   test files...
    │   credentials.py
Copy and paste this piece of code there and replace the placeholder values with your credentials:
host = "host"
user = "username"
password = "password"
port = 39015 # or any port you need
schema = "your schema"
Don’t worry, this file is in .gitignore, so your credentials won’t be seen by anyone.
Make some changes
Write tests that cover your code in the tests directory
Run tests (from the SAP-HANA-AutoML directory)
pytest
Commit your changes (git commit -m 'Add some amazing features')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request
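For the "write tests" step, a test module has to turn the values in tests/credentials.py into a database connection. The sketch below shows one possible way to do that. `Credentials` and `connection_kwargs` are hypothetical names that do not come from the repository, and the keyword names follow the `ConnectionContext(address=..., user=..., password=..., port=...)` call shown earlier in this README.

```python
from typing import Any, Dict, NamedTuple


class Credentials(NamedTuple):
    """Mirror of the variables defined in tests/credentials.py."""
    host: str
    user: str
    password: str
    port: int
    schema: str


def connection_kwargs(creds: Credentials) -> Dict[str, Any]:
    """Map credential fields onto ConnectionContext keyword arguments."""
    if not 0 < creds.port < 65536:
        raise ValueError(f"port out of range: {creds.port}")
    return {
        "address": creds.host,
        "user": creds.user,
        "password": creds.password,
        "port": creds.port,
    }


def test_connection_kwargs_maps_host_to_address():
    """A pytest-style test that needs no database connection."""
    creds = Credentials("host", "username", "password", 39015, "your schema")
    kwargs = connection_kwargs(creds)
    assert kwargs["address"] == "host"
    assert kwargs["port"] == 39015
    assert "schema" not in kwargs
```

In a real test the dictionary could be passed on as `ConnectionContext(**connection_kwargs(creds))`. The schema is left out of the dictionary because the connection examples above do not pass one.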
Author: dan0nchik
Official Website: https://github.com/dan0nchik/SAP-HANA-AutoML
License: Distributed under the MIT License. See LICENSE for more information.
Not sure what the license means? Check out the MIT license summary.
Authors: @While-true-codeanything, @DbusAI, @dan0nchik
Project Link: https://github.com/dan0nchik/SAP-HANA-AutoML