1. Introduction
Machine2Learn's EzeeAI, the Easy Artificial Intelligence platform, is just that: an open-source platform that makes training, running, and deploying Artificial Intelligence models a whole lot easier.
With EzeeAI, you can create deep learning models graphically and intuitively. No need for coding: just drag 'n drop your data files and create and train your deep learning model with a few mouse clicks! After that, deploying your model becomes very easy.
EzeeAI was originally developed by Machine2Learn’s research team to make their own lives a lot easier. With EzeeAI, they could, for example, quickly assess the predictive possibilities of customer datasets. When word got out, other data scientists and academics became so enthusiastic that we decided to make EzeeAI available to others.
1.1. Why EzeeAI?
- Intuitive, easy to understand Graphic User Interface.
- No coding required: just drag & drop datasets.
- Visually create deep learning models.
- Start training models in a matter of minutes.
- Understand and explain model outcomes, e.g. for GDPR compliance.
- Make your models easily deployable.
- No technical background required.
- Open source software licence.
You do not need much experience with deep learning or coding to use EzeeAI. That’s why EzeeAI is now used by many users around the world, ranging from high school students and academics to data scientists and business managers.
1.2. What can you do with EzeeAI?
- Upload datasets easily by drag ‘n drop.
- Use datasets in e.g. CSV format.
- Automatically detect feature types, with the option to change them manually.
- Use out of the box deep learning models for regression and classification.
- Choose out of the box models or create new ones and compare their results.
- Easily manage your models.
- Intuitively design deep learning model architectures.
- Specify and choose your features.
- Tune hyper-parameters.
- Graphically log and follow the progress of model metrics such as accuracy, loss and R-squared.
- Use TensorFlow, Google’s open source machine learning framework.
- Run it on any cloud platform.
- Automatically create docker containers to deploy and run trained models in production environments.
- Use explainability, a unique functionality enabling you to explain what the model is actually doing. This is especially important for GDPR.
2. Get Started
Note: The steps described in this tutorial were performed in the Google Chrome browser.
2.1. Installation & Execution
There are two possible ways to install EzeeAI: direct installation of the Python code, or running the Docker image.
Once the installation is completed and the EzeeAI run command is executed, you can access EzeeAI by following the login instructions.
2.1.1. Installation via Python
First, clone the publicly available GitHub repository to your local machine:
git clone https://github.com/machine2learn/ezeeai.git
We assume that Python 3 is the default Python interpreter. From a terminal, go to the `ezeeai` root folder and run the following command to install all required packages:
pip install -r requirements.txt
2.1.1.1. Execution via Python
In a terminal, execute the following command (Python 3 is the assumed python interpreter).
python wsgi.py
2.1.2. Installation & Execution via Docker
First, you need to install Docker on your device; please follow the instructions for your platform contained in the official Docker guide.
After you launch Docker on your device, open a terminal window. Then you can start a Docker container with the published EzeeAI image by using the following command; the first time this command is executed, the EzeeAI Docker image will be downloaded automatically from Docker Hub:
docker run -p 5000:5000 -p 55500-55600:55500-55600 machine2learn/ezeeai
To save your model after Docker finishes, you can also mount a volume to a folder like `/tmp/data`:
docker run -p 5000:5000 -p 55500-55600:55500-55600 -v $(pwd)/data:/tmp/data machine2learn/ezeeai
3. Logging In
You can use the application by launching the Chrome browser and connecting to http://localhost:5000.
To log in, you can use the following credentials:
| Username | Password |
|---|---|
| test | test_machine2learn |
4. Tutorials
In these tutorials we will guide you through two example procedures for developing effective neural network models.
- For regression, we will use built-in layers to make a neural network similar to the one used in this example.
- For classification, we will create a custom neural network for image processing.
Boxes like this one contain brief explanations of relevant machine learning concepts and references to additional material.
4.1. Regression
The first tutorial will focus on developing a regression model for tabular data.
- Regression consists of developing a model that predicts the values of a continuous variable based on the values of other variables.
- Tabular data are any data that can be expressed in the form of a table.
- Features correspond to the columns of our tabular data.
- Target is the column of our tabular data that we want to predict.
The goal is to develop a model that, given a customer profile, can predict the amount of money this person is willing to pay for buying a car.
The customer profiles have the following features: `age`, `gender`, `miles` (average miles driven per day), `debt` (personal debt), and `income` (monthly income), while the target `sale` is the value of the purchased car.
4.1.1. Upload Tabular Data
The first step in the development of any AI model is data loading.
We will upload to EzeeAI a dataset called `cars`, collecting customer profiles and the price they paid.
A good dataset should satisfy the following characteristics:
- Relevant: the information contained in the data must be informative for the task. The features should be sensibly related to the target, and the data points (rows) should be representative of the data points to which the model will be applied.
- Rich: in general, the more data points (rows), the better the model can learn.
- Reliable: features should be accurate and have low biases.
- Replete: few missing values.
- Download the `cars.csv` file from here to your device.
- On the left menu, go to `Datasets > Tabular`.
- On the `Upload Data` box (top left), click on the `Browse` button below `Train dataset in CSV format` and select the `cars.csv` file.
- To complete the file upload, click on the `Save` button of the `Upload Data` box. A blue loading bar will appear, and the completion of the upload will be signaled by the message `File upload completed`.
The `Choose Data` box below should now contain an entry named `cars`.
4.1.1.1. Visualization
We can now have a first look at our `cars` data.
By selecting `cars` in the `Choose Data` box, the following boxes will start displaying information:
- `Raw Data` displays the data in a simple table showing the first 10 data points (customers, in this case) as rows and the features as columns. We can scroll through the remaining data points with the buttons at the bottom.
- `Scatter Plot` is a simple but effective tool to visually grasp the pairwise relations in the data. Each plotted point corresponds to a data point $(x, y)$, where $x$ and $y$ are its values for two features or the target of your choice.
  - By setting `x-axis` to `income` and `y-axis` to `sale`, notice that `sale` increases with `income`.
- `Feature Histogram` shows the histogram and the inferred probability density function for the selected feature; it allows you to spot biases and outliers.
  - The distribution of `miles` is concentrated on values below 30 and rarely goes beyond 50.
- `Heat Map` gives an overview of the linear relations between features and targets by plotting their pairwise Pearson Correlation Coefficients (PCC); the actual coefficients can be visualized by hovering over the respective rectangle.
  - `debt` and `miles` are somewhat positively linearly correlated (PCC = 0.54), while `gender` and `age` have no linear correlation (PCC = 0).
Additional visualizations are available by clicking on `Show report` at the bottom of the page.
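For readers who want to reproduce the Heat Map numbers outside EzeeAI, here is a minimal sketch using pandas (an illustration only, assuming a local copy of `cars.csv` with the columns described above):

```python
import pandas as pd

# Load the dataset (assumes cars.csv is in the working directory)
df = pd.read_csv("cars.csv")

# Pairwise Pearson Correlation Coefficients, as shown in the Heat Map box
pcc = df.select_dtypes("number").corr(method="pearson")
print(pcc["sale"].sort_values(ascending=False))  # linear relation of each column with the target
```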
4.1.2. Model Creation
Now that we have uploaded our data, we can start building the model. We will create the `InputLayer` node, the first node of the network, and attach the `cars` data to it; then we create the rest of the network and add the loss function node at the very end.
The loss function is the tool the network uses to understand how good its predictions are. Essentially, it is a mathematical function that takes high values for bad predictions and low values for good predictions.
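To make this concrete, here is a tiny sketch of the mean squared error, the loss used later in this tutorial (a generic illustration, not EzeeAI code):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences: large for bad predictions, small for good ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mean_squared_error([10.0, 20.0], [12.0, 18.0]))  # 4.0
```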
On the left menu, go to `Models`.
Here is the blank draw grid where we are going to design our model.
We simply select nodes from the blue menu and click on the canvas to place them; we then create connections by clicking on the center top of the source node and dragging the line to the destination node.
In the remainder of this section, we will simply call this blue menu on the left of the `Models` page the menu.
4.1.3. Input Layer Creation
We now create the `InputLayer_0` node with the `cars` data.
On the menu, select `Input Layers > InputLayer`.
- Create the `InputLayer_0` node by clicking in any place of the draw grid.
- Right click on the `InputLayer_0` node and select `add input data`.
- The `Input Layer` window will appear. Select `cars` among the datasets and click `Continue` to go to the `Split` tab.
- Now it is time to split your `cars` dataset into Train, Validation, and Test data by dragging the percentage bar. We recommend in this case setting Train to 70%, Validation to 20% and Test to 10%. Then click `Continue` to go to the `Features` tab (a code sketch of an equivalent split follows this list).
  - Reference material on the importance of data splitting:
    - https://machinelearningmastery.com/difference-test-validation-datasets/
    - https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7
    - https://cs230-stanford.github.io/train-dev-test-split.html
- In the `Features` tab we have to select the features to be used and their categories:
  - Select `gender` as `categorical`, because gender has no meaningful order.
  - Select all the remaining features as `Numerical`.
  - Tick `Normalize`, because training the neural network on non-normalized features would be harder (see the normalization step in the sketch after this list).
  - Click `Continue` to go to the `Targets` tab.
- In the `Targets` tab we have to select the feature to be used as the target: select `sale` and click on `Save`.
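For intuition, here is a minimal sketch of what the split and normalization steps amount to in plain Python (a hand-rolled illustration, not EzeeAI's internal code; it assumes a local copy of `cars.csv`):

```python
import pandas as pd

df = pd.read_csv("cars.csv").sample(frac=1, random_state=0)  # shuffle the rows

# 70% / 20% / 10% split into Train, Validation, and Test sets
n = len(df)
train = df.iloc[: int(0.7 * n)]
val = df.iloc[int(0.7 * n) : int(0.9 * n)]
test = df.iloc[int(0.9 * n) :]

# Z-score normalization of a numerical feature, using statistics from the
# train split only, so no information leaks in from validation or test data
mu, sigma = train["income"].mean(), train["income"].std()
train_income_normalized = (train["income"] - mu) / sigma
```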
4.1.4. Network Creation
We now create a collection of dense layers by creating just one `DNN_0` node. Select `Canned Model > DNN` from the menu and then click on the grid:

- Connect `InputLayer_0` to `DNN_0`: hold click on the center top of `InputLayer_0` and drag to `DNN_0` till the arrow appears.
- Set up `DNN_0` to have 2 hidden layers of 12 and 8 units, ReLU as activation function, and weights initialized to random values generated from a standard normal distribution; a final 1-node layer with linear activation function will be added automatically at the end of the DNN node (a Keras sketch of the resulting architecture follows this list):
  - Left-click on the node; the property menu will appear on the right.
  - Set `hidden_layers` to `12,8`.
  - Set `kernel_initializer` to `randomNormal`.
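For readers who know Keras, the configured network roughly corresponds to the sketch below (an approximation for illustration only; the five-column input, with `gender` encoded as a single numeric column, is an assumption):

```python
import tensorflow as tf

# Two hidden ReLU layers (12 and 8 units) plus the automatically
# appended 1-unit linear output layer, with normal weight initialization
init = tf.keras.initializers.RandomNormal()
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),  # age, gender, miles, debt, income
    tf.keras.layers.Dense(12, activation="relu", kernel_initializer=init),
    tf.keras.layers.Dense(8, activation="relu", kernel_initializer=init),
    tf.keras.layers.Dense(1, activation="linear", kernel_initializer=init),
])
model.summary()
```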
4.1.5. Loss Node Creation and Model Saving
- Now we have to create the loss function node `Loss_0`: select `Loss Functions > Loss` from the menu and then click on the grid.
  - Connect `DNN_0` to `Loss_0`.
  - By accessing the property menu, you can check that the mean squared error is used as the loss function.
- Now the model is ready: name it `cars_model` in the `Model name` field on the top left, then check its correctness and save it by clicking `Validate` and `Save`.
4.1.6. Model Training
Now that we have a model designed, it is time to train and see how it performs on train and validation data.
- On the left menu, go to `Train > Run`.
- In the `Run Config` menu, go to `Model` and select `cars_model`.
4.1.6.1. Experiment & Training Parameters
EzeeAI allows us to tune the training parameters very easily. In this tutorial we will focus on just two of them:
- In the `Run Config` menu, set `Number of epochs` to 150 and `Batch size` to 50.
  - This means that the model will go through all the train data for 150 cycles; at each iteration performed within an epoch, 50 data points are used to update the network. Since the train split here contains about 150 rows, the model takes 150 / 50 = 3 iterations to complete one epoch, as illustrated just below.
- Start the training by clicking on the play button:
4.1.6.2. Training Visualization
After a few seconds, the boxes below the play button will start to display information on the model's performance:
- The `Checkpoints` box lists the model states saved at different points in time during the training. Each checkpoint is a saved copy of the model, identified by the number of iterations performed when it was saved. The number of saved checkpoints is currently 5, and they are saved every 50 iterations, as specified by the parameters `Maximum # of checkpoints` and `Save checkpoints after` in the `Run Config` menu, respectively.
  - The R-squared, also called the coefficient of determination, measures the quality of the prediction; it is a value lower than or equal to 1, where 1 indicates perfect prediction (its definition is given after this list).
  - As hoped, as the iterations increase the R-squared increases toward its maximum potential value 1 while the Loss decreases toward its minimum potential value 0, indicating that the model performance on the train data is improving.
- The other two boxes show graphs of the Loss and the R-squared measured at different steps; both the Loss and the R-squared are computed on both the train data and the validation data.
  - Much better performance on the train data than on the validation data would have signaled that our model is overfitting, meaning the model is too tailored to the train data and could perform poorly on other data.
  - As the steps increase, the two measures have similar values for the train data and the validation data.
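For reference, the R-squared compares the model's squared errors to those of a constant predictor that always outputs the mean target:

$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$

where $y_i$ are the observed target values, $\hat{y}_i$ the model predictions, and $\bar{y}$ the mean of the observed targets.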
4.1.6.3. TensorBoard
Additional details on the model training can be visualized through TensorBoard, Google TensorFlow's visualization toolkit, which is accessible from the menu in `Run > Tensorboard`:
4.1.7. Inference
4.1.7.1. Predict
Sometimes you want to see how your model would behave on a specific datapoint and play a bit with its features. EzeeAI allows that.
- On the left menu, go to `Predict`.
- Select a checkpoint: in the `Model` section, select `cars_model`; the checkpoints table will appear. Pick the one with the best R-squared.
- Now we can set the datapoint values as we like and see the outcome. A box named `Your Features` will appear; set `debt` to 39999 and `income` to 2000. Then click `Predict`.
  - We get a prediction of 17444.355. This means that, according to our model trained on the train data, a 25-year-old woman who commutes 23 miles per day, has a debt of USD 40k and a monthly income of USD 2000 should be willing to spend about USD 17.5k to buy a second-hand car.
4.1.7.2. Explainability
In some situations we would really like to understand why we get a certain prediction from the model. EzeeAI can give you some hints.
- On the left menu, go to `Explain`.
- Select a checkpoint: in the `Model` section, select `cars_model`; the checkpoints table will appear. Pick the one with the best R-squared.
- Now we can set the datapoint values as we like and see the outcome. A box named `Your Features` will appear; set `debt` to 40000 and `income` to 2000. Then click `Predict`.
- Let us set the explainability parameters. In the bottom part of the `Your Features` box, we can select the number of most explaining features to be displayed in the box on the right; let's leave `#Features` at 3. The field `Top labels` is only used for explaining classification outputs, so we leave it set to 1 (for classification, we can set it up to the number of classes in the data).
By clicking `Explain`, the following boxes will visualize our results:

- `Prediction Output` simply visualizes the prediction as a histogram; the numbers at the top and the bottom of the y-axis are the maximum and the minimum potential values a prediction of our model could give.
  - Notice that the range of potential predictions is very wide, going from less than USD 2000 to USD 320k.
- `Your Explanation` shows what contributed to determining the prediction, based on the 3 top explaining features, as specified by the `#Features` value of the `Explain Params`. The label on the left indicates the logical condition to which the score refers, and the score indicates how much that condition contributed to making the final prediction higher or lower.
  - Most of the prediction is determined by the fact that the debt is higher than USD 18126.50; the fact that the monthly income is below USD 3349 reduced the estimated sale price. The gender has basically no role in determining the prediction.
The explainability analysis is performed via the `LIME` library, available on GitHub and described in this research paper.
This is a brief description of the method extracted from the GitHub page:
> Intuitively, an explanation is a local linear approximation of the model’s behavior. While the model may be very complex globally, it is easier to approximate it around the vicinity of a particular instance. While treating the model as a black box, we perturb the instance we want to explain and learn a sparse linear model around it, as an explanation.
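To give a flavour of what happens under the hood, here is a minimal, self-contained LIME sketch for tabular regression (an illustration with toy stand-in data, not EzeeAI's internal code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from lime.lime_tabular import LimeTabularExplainer

# Toy stand-in for the cars data: 5 features, one continuous target
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = 3 * X_train[:, 3] - 2 * X_train[:, 4] + rng.normal(size=200)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

feature_names = ["age", "gender", "miles", "debt", "income"]
explainer = LimeTabularExplainer(X_train, feature_names=feature_names, mode="regression")

# Which feature conditions pushed this prediction up or down?
explanation = explainer.explain_instance(X_train[0], model.predict, num_features=3)
print(explanation.as_list())  # e.g. [("debt > 0.61", 1.9), ("income <= -0.70", -1.2), ...]
```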
4.1.7.3. Test
We now look at the performance of our model on the test data; this data has not been seen by the model so far, so it can be very informative about the generalizability of the model. To this end, we need to choose a model (i.e. a checkpoint) to be tested and the data on which we want to assess its performance.
- On the left menu, go to `Test`.
- Select a checkpoint: in the `Model` section, select `cars_model`; the checkpoints table will appear. Pick the one with the best R-squared.
- Select the test data: after you select a checkpoint, select `cars_split_test.csv` beneath `Select a test file`.
By clicking `Test`, the following boxes will visualize our results:

- `Your Predictions`, similarly to the `Raw Data` table we saw earlier, displays the data and the predicted target value in a simple table showing the first 10 data points (customers, in this case) as rows and the features as columns. We can scroll through the remaining data points with the buttons at the bottom.
- `Predicted vs. Actual Response` is a scatterplot of the test data points; the x-axis corresponds to the actual target value while the y-axis shows the predicted target value.
  - The model tends to overestimate the sale budget up to 10k and underestimate it between 10k and 20k.
4.2. Classification
This second tutorial focuses on classification in the context of image data.
The goal is to develop a model that, given an input image, can predict which class the image belongs to. We will work with a dataset called `cifar10`, containing 10 evenly distributed image classes. The 10 classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.[4]
4.2.1. Upload Image Data
The first step in the development of any AI model is data loading.
We will upload to EzeeAI the `cifar10` dataset.
- Download `cifar10.npz` to your device.
- On the left menu, go to `Datasets > Image`.
- On the `Upload Data` box (top left), at `Select format option` choose `Numpy file`, then click on the `Browse` button below and select the `cifar10.npz` file.
- To complete the file upload, click on the `Save` button of the `Upload Data` box. A blue loading bar will appear, and the completion of the upload will be signaled by the message `File upload completed`.
The `Choose Data` box below should now contain an entry named `cifar10`.
4.2.1.1. Visualization
We can now have a first look at our `cifar10` data.
By selecting `cifar10` in the `Choose Data` box, the following boxes will start displaying information:

- `Sample Images` shows the first image in the data, according to the order in the input file.
  - We can see that it is an airplane. Clicking on the thumbnail images, we can check that the second is a car, the third is a bird and the fourth is a cat.
- `Label Distribution` is a histogram showing the distribution of the images according to their class labels.
  - Each class contains 5000 images.
4.2.2. Model Creation
Now that we have uploaded our data, we can start building the model. We will create the `InputLayer` node, the first node of the network, and attach to it the `cifar10` data together with some additional images automatically generated from it; then we create the rest of the network and add the loss function node at the very end.
The loss function is the tool the network uses to understand how good its predictions are. Essentially, it is a mathematical function that takes high values for bad predictions and low values for good predictions.
On the left menu, go to `Models`.
Here is the blank draw grid where we are going to design our model.
We simply select nodes from the blue menu and click on the canvas to place them; we then create connections by clicking on the center top of the source node and dragging the line to the destination node.
In the remainder of this section, we will simply call this blue menu on the left of the `Models` page the menu.
4.2.3. Input Layer Creation
We now create the `InputLayer_0` node with the `cifar10` data.
On the menu, select `Input Layers > InputLayer`.
- Create the `InputLayer_0` node by clicking in any place of the draw grid.
- Right click on the `InputLayer_0` node and select `add input data`.
- The `Input Layer` window will appear. Select `cifar10` among the datasets and click `Continue` to go to the `Split` tab.
- Now it is time to split your `cifar10` dataset into Train, Validation, and Test data by dragging the percentage bar. We recommend in this case setting Train to 70%, Validation to 20% and Test to 10%. Then click `Continue` to go to the `Features` tab.
- Deep learning models benefit from being trained on large amounts of data. To enlarge our dataset, we perform what is called data augmentation.
  - Data augmentation is a way to increase the number of data samples. It consists in generating new realistic data samples by applying transformations to the available ones. The transformations simulate the appearance of the original samples under different conditions.
- Now we are going to apply some transformations to our images, namely rotation and horizontal flip (a code sketch of comparable augmentations follows this list):
  - From the `Data Augmentation` list, select `Flip` and `Rotation`. Then press the `>` symbol on the top-right. The two rows will now appear in the `Selected` menu.
  - Select `Flip` and tick the `Horizontal Flip` box.
  - Select `Rotation` and set the values to `0` and `6`, so that all possible rotations can be done.
- Now we can click on `Continue` and then `Save`.
4.2.4. Node Creation
- Now we create the rest of the network. Let's create the first node, `Conv2D_0`, corresponding to a Convolutional Neural Network (CNN) layer. We create a layer with 32 2-dimensional filters with kernel window size 3x3; the padding will be such that the output has the same size as the original input. It will then be followed by a `ReLu` activation function.
  - For further information on CNNs: https://www.youtube.com/watch?v=YRhxdVk_sIs
  - On the left menu with the node models, click `Convolutional Layers > Conv2D` and then click anywhere in the grid.
  - To configure the node, left-click on it and the configuration menu will appear on the right.
  - Set the following parameters:
    - `kernel_size` to `[3, 3]`
    - `filters` to 32
    - `padding` to `same`
  - Connect `InputLayer_0` to `Conv2D_0`: hold click on the center top of `InputLayer_0` and drag to `Conv2D_0` till the arrow appears.
- Similarly, we create another convolutional layer by duplicating the previous one and switching to the default padding:
  - Right-click on the `Conv2D_0` node and select `duplicate`.
  - Left-click and check that the new node is configured as `Conv2D_1`. Then change `padding` to `valid`.
  - Connect `Conv2D_0` to `Conv2D_1`.
- We can now create the first Maximum Pooling layer, `MaxPooling2D_0`, with a 2x2 pooling window size:
  - On the left menu with the node models, click `Pooling Layers > MaxPooling2D` and then click anywhere in the grid.
  - Left-click on the `MaxPooling2D_0` node and in the configuration menu set both `pool_size` and `strides` to `[2, 2]`. Then connect it to `Conv2D_1`.
- To reduce overfitting, we add a Dropout node, `Dropout_0`, with a 25% dropout rate. Then we save the network:
  - On the left menu with the node models, click `Core Layers > Dropout` and then click anywhere in the grid.
  - Left-click on the `Dropout_0` node and in the configuration menu set `rate` to `0.25`. Then connect it to `MaxPooling2D_0`.
  - Let's save what we have done so far. On the top left of the grid, click on `new_model` and replace it with `cifar10_model`. At the bottom-right corner of the screen, click on `Save` to save the model under the `cifar10_model` name. Since our model cannot be validated yet, we'll have to store it without the input data.
4.2.5. Group Creation
The next four nodes are very similar to the last four, so the fastest way to create them is to turn the previous ones into a group, duplicate it, and edit what we want:

- While keeping the Shift key pressed, left-click on the last four nodes (`Conv2D_0`, `Conv2D_1`, `MaxPooling2D_0`, `Dropout_0`).
- Right-click on any of those nodes and select `group nodes`.
- Right-click on the newly created group `block_0` and select `duplicate`.
- A new group will appear; edit its `Conv2D_2` and `Conv2D_3` nodes so that their `filters` values are 64.
- Connect `Dropout_0` to `Conv2D_2`.
4.2.6. Complete Network, Loss Node Creation, and Model Saving
- The next series of nodes will remove the spatial structure of the network and output 10 values, one per class. The nodes have to be connected to the previous ones in order of appearance:
  - First we flatten the last 2-dimensional network layer by adding a Flatten node: select it from the left menu under `Core Layers > Flatten`.
  - Then we add a Dense node (also called fully-connected): select it from the left menu under `Core Layers > Dense` and set `units` to `512`.
  - We add a `Dropout` node again, but this time with a 50% rate (set `rate` to `0.5`).
  - Finally, we add a final `Dense` node with 10 units and no activation function (set `activation` to `linear`). We call it `Dense_1`.
- Now we have to create the loss function node `Loss_0`: select `Loss Functions > Loss` and then click on the grid.
  - Connect `Dense_1` to `Loss_0`.
  - By accessing the property menu, you can check that `softmax_cross_entropy` is used as the loss function.
- Now we check that everything is correct. On the bottom-right of the grid, click on the `Validate` button, and then `Save`. A Keras sketch of the complete architecture follows below.
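For readers who know Keras, the complete network roughly corresponds to the sketch below (an approximation for illustration; EzeeAI builds and trains the graph internally, the ReLU on the 512-unit layer is an assumption, and the compile line simply mirrors the loss above and the training parameters used in the next section):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    # block_0: two convolutions, max pooling, dropout
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, (3, 3), padding="valid", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    # block_1: same structure with 64 filters
    tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.Conv2D(64, (3, 3), padding="valid", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    # head: flatten, dense, dropout, 10 linear outputs (logits)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="linear"),
])

# Softmax cross-entropy on logits; RMSProp with learning rate 0.0001 (see next section)
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()
```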
4.2.7. Model Training
Now that we have a model designed, it is time to train and see how it performs on train and validation data.
- On the left menu, go to `Train > Run`.
- In the `Run Config` menu, go to `Model` and select `cifar10_model`.
4.2.7.1. Experiment & Training Parameters
EzeeAI allows us to tune the training parameters very easily. In this tutorial we will focus on just a few of them:
- In the `Run Config` menu, set:
  - `Number of epochs` to 10000
  - `Optimizer` to `RMSProp`
  - `Learning rate` to 0.0001
- Start the training by clicking on the play button.