A simple Machine Learning system demo, for ML study

Machine Learning System on Spark

introduction

This project is a full stack Django/React/Redux app that uses token based authentication with Knox.

Then I add Machine Learning features for demostrate the full workflow of the data mining, including the four stage corresponding to four pages:

data management
data explore
model train
prediction

The data set is the classic iris data, which is only for demo, and this project is from my interest. so you can reference, but the quality is not assured.

features

authentication functions

data management

input iris items edit iris items delete iris items

data explore

inspect attribute distribution through histogram inspect sepal distribution through scatter graph inspect petal distribution through scatter graph

model train

input cluster number train a cluster model using spark.ml library inspect cluster result through sepal and petal scatter

prediction

input iris sepal and petal attributes predict iris cluster

Architecture

Generally, train process is time consumming, and predict process is quick. So set train flow as async mode, and predict flow as sync mode.

train flow

user start model train from browser
django recieve the “start train” message
django schedule spark.ml celery process to train model and return to user immediately.
browser query train status to django
django query train status from spark.ml celery process.
django feedback the train over status to browser

predict flow

user input prediction features on browser, then click submit
browser send prediction features to django
django call prediction api with prediction features
django feedback the prediction result to browser

+---------+            +-------------+            +------------+
|         | start train|             |            |            |
|         +------------>             |   start    |            |
|         |            |             +---train---->            |
|         |  query train             |            |            |
|         +--status---->             |            |            |
|         |            |             +----query -->            |
|         <---train ---+             |    train   |            |
|         |   over     |             |    status  |            |
|  browser|            |   django    |            |  spark.ml  |
|         |            |             |            |  on celery |
|         |  predict   |             |            |            |
|         +------------>             |    predict |            |
|         |            |             +----------->+            |
|         <--predict---+             |            |            |
|         |  result    |             |            |            |
|         |            |             |            |            |
|         |            |             |            |            |
+---------+            +-------------+            +------------+

technology stack

category	name	comment
frontend	reactjs	frontend framework
frontend	redux	state management
frontend	react-C3JS	D3 based graph tool
frontend	react-bootstrap	style component library
frontend	data-ui	react data visualization tool
backend	django	backend framework
backend	django-rest-knox	authentication library
backend	djangorestframework	restful framework
backend	spark.ml	machine learning tool

Install

python3.5 install

wget https://www.python.org/ftp/python/3.5.3/Python-3.5.3.tgz
tar -xvzf Python-3.5.3.tgz
cd Python-3.5.3
./configure
make && make install

Install python package

pip3 install -r requirements.txt

install redis

sudo apt-get install redis-server

Spark Install

reference

Run

start redis server

redis-server

frontend build

cd ./frontend
# Install dependencies
npm install
# Build for production
npm run build

start celery worker for django

cd ./backend
celery worker -A config -l debug

start Server on localhost:8000

cd ./backend
python3 manage.py runserver

snapshot

login page

data management page

data explore page

model train page

prediction page

TO DO LIST

introduction pyenv

manage dependencies include python version and pip dependency

Memo:

(1) use this command to make new requirements.txt, if new package is installed.
pip3 freeze > requirements.txt

(2) python version change

sudo pip3 uninstall pip && sudo apt install python3-pip --reinstall
or
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

from
https://zhuanlan.zhihu.com/p/37473690

(3) sqlite3 error solution
# sqlite install
wget https://www.sqlite.org/2017/sqlite-autoconf-3170000.tar.gz --no-check-certificate
tar xf  sqlite-autoconf-3170000.tar.gz
cd sqlite-autoconf-3170000/
./configure --prefix=/usr/local/sqlite3 --disable-static --enable-fts5 --enable-json1 CFLAGS="-g -O2 -DSQLITE_ENABLE_FTS3=1 -DSQLITE_ENABLE_FTS4=1 -DSQLITE_ENABLE_RTREE=1"
make && make install

python install
cd Python-3.5.3
LD_RUN_PATH=/usr/local/sqlite3/lib ./configure LDFLAGS="-L/usr/local/sqlite3/lib" CPPFLAGS="-I /usr/local/sqlite3/include"
LD_RUN_PATH=/usr/local/sqlite3/lib make
LD_RUN_PATH=/usr/local/sqlite3/lib sudo make install

from
https://www.cnblogs.com/i1991/p/9497259.html

(4) python specific version install & pipenv usage
apt-get update
apt-get install python3.7
pip3 install pipenv
pipenv install

(5) django and celery integration
https://www.pythonf.cn/read/7143
https://www.cnblogs.com/wdliu/p/9530219.html

Download Details:

Author: fanqingsong

Source Code: https://github.com/fanqingsong/machine_learning_system_on_spark

#reactjs #react #javascript #machine learning