1. Introduction

EzeeAI, Machine2Learn's Easy Artificial Intelligence platform, is just that: an open-source platform that makes training, running and deploying Artificial Intelligence models a whole lot easier.

With EzeeAI, you can now create deep learning models graphically and intuitively. No need for coding: just drag 'n' drop your data files, then create and train your deep learning model with a few mouse clicks! After that, deploying your model becomes very easy.

EzeeAI was originally developed by Machine2Learn’s research team to make their own lives a lot easier. With EzeeAI, they could, for example, quickly assess the predictive possibilities of customer datasets. When word got out, other data scientists and academics became so enthusiastic that we decided to make EzeeAI available to others.

1.1. Why EzeeAI?

You do not need much experience with deep learning or coding to use EzeeAI. That’s why EzeeAI is now used by many users around the world, ranging from high school students and academics to data scientists and business managers.

1.2. What can you do with EzeeAI?

2. Get Started

2.1. Installation & Execution

There are two ways to install EzeeAI: direct installation of the Python code, or running it in a Docker container.

Once the installation is complete and the EzeeAI run command has been executed, you can access EzeeAI by following the Logging In instructions.

2.1.1. Installation via Python

First, get the publicly available GitHub repository onto your local machine:

git clone https://github.com/machine2learn/ezeeai.git

We assume that Python 3 is the default Python interpreter. From the terminal, go to the ezeeai root folder and run the following command to install all required packages:

pip install -r requirements.txt

2.1.1.1. Execution via Python

In a terminal, execute the following command (Python 3 is the assumed Python interpreter):

python wsgi.py

2.2. Installation & Execution via Docker

First, you need to install Docker on your device; please follow the instructions for your platform in the official Docker guide.

After you launch Docker on your device, open a terminal window. You can then start a Docker container with the published EzeeAI image using the following command; the first time this command is executed, the EzeeAI Docker image will be downloaded automatically from Docker Hub:

docker run -p 5000:5000 -p 55500-55600:55500-55600 machine2learn/ezeeai

To keep your models after the Docker container stops, you can also mount a volume to a folder such as /tmp/data:

docker run -p 5000:5000 -p 55500-55600:55500-55600 -v $(pwd)/data:/tmp/data machine2learn/ezeeai

3. Logging In

You can use the application by launching a Chrome browser and connecting to http://localhost:5000.

To log in, you can use the following credentials:

Username: test
Password: test_machine2learn

4. Tutorials

In these tutorials we will guide you through two example procedures for developing effective neural network models.

Boxes like this one contain brief explanations of relevant machine learning concepts, with references to additional material.

4.1. Regression

This first tutorial focuses on developing a regression model for tabular data.

  • Regression consists in developing a model that predicts the values of a continuous variable based on the values of other variables
  • Tabular data are any data that can be expressed in the form of a table
  • Features correspond to the columns of our tabular data
  • The target is the column of our tabular data that we want to predict
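These concepts can be made concrete with a minimal sketch; the customer rows below are made up for illustration and are not taken from the cars dataset:

```python
# A tiny made-up tabular dataset: each row is one data point (one customer).
rows = [
    {"age": 25, "gender": "F", "income": 2000, "sales": 17500},
    {"age": 41, "gender": "M", "income": 3500, "sales": 24000},
]

target_column = "sales"  # the continuous variable we want to predict

# The features are all the remaining columns.
features = [{k: v for k, v in row.items() if k != target_column} for row in rows]
targets = [row[target_column] for row in rows]
```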

The goal is to develop a model that, given a customer profile, can predict the amount of money this person is willing to pay for buying a car.

The customer profiles have the following features:

while the target sales is the value of the purchased car.

4.1.1. Upload Tabular Data

The first step in the development of any AI model is data loading. We will upload into EzeeAI a dataset called cars, which collects customer profiles and the price they paid.

A good dataset should satisfy the following characteristics:

  • Relevant: the information contained in the data must be informative for the task; the features should be sensibly related to the target, and the data points (rows) should be representative of the data points to which the model will be applied
  • Rich: in general, the more data points (rows), the better the model can learn
  • Reliable: features should be accurate and have low biases
  • Replete: few missing values
  1. Download cars.csv file from here to your device
  2. On the left menu, go to Datasets > Tabular to upload CSV data
  3. On the Upload Data box (top left), click on the Browse button below Train dataset in CSV format and select the cars.csv file
  4. To complete the file upload, click on the Save button of the Upload Data box. A blue loading bar will appear, and the completion of the upload will be signaled by the message File upload completed.

The Choose Data box below should now contain an entry named cars.

4.1.1.1. Visualization

We can now have a first look at our cars data.

By selecting cars in the Choose Data box, the following boxes will start displaying information:

Data Visualization

Additional visualizations are available by clicking on Show report at the bottom of the page.

4.1.2. Model Creation

Now that we have uploaded our data, we can start building the model. We will create the InputLayer node, the first node of the network, and attach the cars data to it; then we create the rest of the network and add the loss function node at the very end.

The loss function is the tool used by the network to understand how good its predictions are. Essentially, it is a mathematical function that takes high values for bad predictions and low values for good predictions.
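For regression, a common loss of this kind is the mean squared error (the one we will later see on the Loss node); a minimal sketch:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences: large for bad predictions, small for good ones."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

good = mean_squared_error([10.0, 20.0], [11.0, 19.0])  # predictions close to the truth
bad = mean_squared_error([10.0, 20.0], [30.0, 0.0])    # predictions far from the truth
```

Here `good` works out to 1.0 and `bad` to 400.0, so lower is better.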

On the left menu, go to Models. Here is the blank draw grid where we are going to design our model.

Data Visualization

We simply need to select nodes from the blue menu and then click on the canvas to place them; we then create connections by clicking on the center top of a source node and dragging the line to the destination node. In the remainder of this section, we will refer to this blue menu on the left of the Models page simply as the menu.

4.1.3. Input Layer Creation

We now create the InputLayer_0 node with the cars data. On the menu, select Input Layers > InputLayer.

  1. Create the InputLayer_0 node by clicking in any place of the draw grid.

  2. Right-click on the InputLayer_0 node and select add input data

    Add input data

  3. The Input Layer window will appear. Select cars among the datasets and click Continue to go to the Split tab.

  4. Now it is time to split your cars dataset into Train, Validation, and Test data by dragging the percentage bar. In this case we recommend setting Train to 70%, Validation to 20%, and Test to 10%. Then click Continue to go to the Features tab.

    Reference material on the importance of data splitting Blog

    Video

    Split Data

  5. In the Features tab we have to select the features to be used and their types.

    Blog

    Video

    1. Select gender as Categorical, because gender has no meaningful order.
    2. Select all the remaining features as Numerical.
    3. Tick Normalize, because training the neural network is harder without normalized features.
    4. Click Continue to go to the Targets tab.

    Features Processing

  6. In the Targets tab we have to select the feature to be used as the target; select sales and click on Save.

    Select Target
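The split percentages and the Normalize option above can be sketched in plain Python. This is an illustration only; the min-max scheme shown is just one common normalization, not necessarily the exact one EzeeAI applies:

```python
import random

random.seed(0)
data = list(range(100))          # stand-in for the rows of the cars dataset
random.shuffle(data)             # shuffle before splitting

n = len(data)
n_train, n_val = int(0.7 * n), int(0.2 * n)
train = data[:n_train]                       # 70% Train
val = data[n_train:n_train + n_val]          # 20% Validation
test = data[n_train + n_val:]                # 10% Test

# Min-max normalization of a numeric feature, fit on the train split only.
lo, hi = min(train), max(train)
normalized_train = [(x - lo) / (hi - lo) for x in train]
```

Fitting the normalization on the train split only avoids leaking information from validation and test data into training.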

4.1.4. Network Creation

We now create a collection of dense layers with a single DNN_0 node, by selecting Canned Model > DNN and then clicking on the grid:

Create DNN Node

  1. Connect InputLayer_0 to DNN_0: hold-click on the center top of InputLayer_0 and drag to DNN_0 until the arrow appears
  2. Set up DNN_0 to have 2 hidden layers of 12 and 8 units, ReLU as the activation function, and weights initialized to random values drawn from a standard normal distribution; a final 1-unit layer with a linear activation function will be added automatically at the end of the DNN node:
    1. Left-click on the node; the property menu will appear on the right.

    2. Set hidden_layers to 12,8

    3. Set kernel_initializer to randomNormal

      Create Node
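To see what the configured DNN node computes, here is a pure-Python sketch of a forward pass through two ReLU hidden layers of 12 and 8 units followed by the automatic 1-unit linear output. The input values are made up, biases are omitted for brevity, and this is an illustration of the computation only, not EzeeAI's actual implementation:

```python
import random

random.seed(42)

def dense(inputs, n_units, activation):
    """One fully-connected layer with random-normal weights (kernel_initializer randomNormal)."""
    outputs = []
    for _ in range(n_units):
        weights = [random.gauss(0.0, 1.0) for _ in inputs]
        z = sum(w * x for w, x in zip(weights, inputs))  # bias omitted for brevity
        outputs.append(max(0.0, z) if activation == "relu" else z)
    return outputs

x = [0.5, -1.2, 0.3]          # one hypothetical normalized customer profile
h1 = dense(x, 12, "relu")     # first hidden layer: 12 units
h2 = dense(h1, 8, "relu")     # second hidden layer: 8 units
y = dense(h2, 1, "linear")    # final 1-unit linear layer (added automatically)
```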

4.1.5. Loss Node Creation and Model Saving

  1. Now we have to create the loss function node Loss_0, by selecting Loss Functions > Loss and then clicking on the grid.

    1. Connect DNN_0 to Loss_0

    2. By accessing the property menu, you can check that the mean squared error is used as the loss function

      Create Loss Node

  2. Now the model is ready: name it cars_model in the Model name field at the top left, then check its correctness and save it by clicking Validate and Save

4.1.6. Model Training

Now that we have designed a model, it is time to train it and see how it performs on the train and validation data.

  1. On the left menu, go to Train > Run
  2. In the Run Config menu, go to Model and select cars_model
4.1.6.1. Experiment & Training Parameters

EzeeAI allows us to tune our training parameters very easily. In this tutorial we will just focus on two training parameters:

  1. In the Run Config menu, set Number of epochs to 150 and Batch size to 50.

    • This means that the model will go through all the train data for 150 cycles; at each iteration performed within each epoch, 50 data points will be used to update the network. Hence the model will take 3 iterations to complete one epoch.
  2. Start the training by clicking on the play button:

    Run Training
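The arithmetic in the note above can be checked directly; the 3-iterations-per-epoch figure implies roughly 150 rows in the train split, which we assume here:

```python
import math

n_train = 150     # assumed number of rows in the train split (implied by the text)
batch_size = 50   # data points used per network update
epochs = 150      # full passes over the train data

iterations_per_epoch = math.ceil(n_train / batch_size)   # 3 iterations per epoch
total_iterations = iterations_per_epoch * epochs         # iterations over the whole run
```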

4.1.6.2. Training Visualization

After a few seconds, the boxes below the play button will start to visualize information on the model performance:

  1. The Checkpoints box lists the model performance saved at different points in time during the training. Each checkpoint is a saved copy of the model, identified by the number of iterations performed when the model was saved. The number of saved checkpoints is currently 5, and they are saved every 50 iterations, as specified by the parameters Maximum # of checkpoints and Save checkpoints after in the Run Config menu, respectively:

    The R-squared, also called the coefficient of determination, measures the quality of the predictions; it takes values no greater than 1, where 1 indicates perfect agreement

    • As hoped, as the iterations increase, the R-squared increases toward its maximum potential value of 1 while the Loss decreases toward its minimum potential value of 0, indicating that the model performance on the train data is improving.
  2. The other two boxes show the graphs of the Loss and the R-squared measured at different steps; both the Loss and the R-squared are computed on both the train data and the validation data

    A much better model performance on the train data than on the validation data would have signaled that our model is overfitting, meaning the model is too influenced by the train data and could have performance issues on other datasets.

    • As the steps increase, the two measures have similar values for the train data and the validation data.

Results Training
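The R-squared plotted in these graphs can be computed with the standard formula, 1 minus the ratio of the residual sum of squares to the total sum of squares; a minimal sketch:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # exact predictions give 1.0
decent = r_squared([1.0, 2.0, 3.0], [1.1, 2.0, 2.9])   # small errors give a value just below 1
```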

4.1.6.3. TensorBoard

Additional details on the model training can be visualized through TensorBoard, which is accessible from the menu at Run > Tensorboard: Tensorboard

4.1.7. Inference

4.1.7.1. Predict

Sometimes you want to see how your model would behave on a specific data point and play a bit with its features. EzeeAI allows that.

  1. On the left menu, go to Predict

  2. Select a checkpoint: in the Model section, select cars_model; the checkpoints table will appear. Pick the one with the best R-squared.

    Select checkpoint for predict

  3. Now we can set the data point values to whatever we want and see the outcome. A box named Your Features will appear; set debt to 39999 and income to 2000. Then click Predict.

    • We get a prediction of 17444.355. This means that, according to our model trained on the train data, a 25-year-old woman, who commutes 23 miles per day, has a debt of USD 40k and a monthly income of USD 2000 should be willing to spend about USD 17.5k to buy a second-hand car.

    Your prediction

4.1.7.2. Explainability

In some situations we would really like to understand why we get a certain prediction from the model. EzeeAI can give you some hints.

  1. On the left menu, go to Explain

  2. Select a checkpoint: in the Model section, select cars_model; the checkpoints table will appear. Pick the one with the best R-squared.

    Select checkpoint for explain

  3. Now we can set the data point values to whatever we want and see the outcome. A box named Your Features will appear; set debt to 40000 and income to 2000. Then click Predict.

  4. Let us set the explainability parameters. In the bottom part of the Your Features box, we can select the number of most explaining features to be displayed in the box on the right; let's leave #Features at 3. The field Top labels is only used for explaining classification outputs, so we leave it set to 1 (for classification, we can set it up to the number of classes in the data).

By clicking Explain, the following boxes will visualize our results:

The explainability analysis is performed via the LIME library, available on GitHub and described in this research paper. This is a brief description of the method, extracted from the GitHub page:

Intuitively, an explanation is a local linear approximation of the model's behavior. While the model may be very complex globally, it is easier to approximate it around the vicinity of a particular instance. While treating the model as a black box, we perturb the instance we want to explain and learn a sparse linear model around it, as an explanation.
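The intuition quoted above can be illustrated in miniature. The following toy sketch is not the LIME library itself: it simply perturbs an instance of a made-up black-box function and fits a local linear model around it, mirroring the idea:

```python
import random

random.seed(1)

def black_box(x):
    """A 'complex' model we treat as a black box (here just x**2 for illustration)."""
    return x ** 2

x0 = 3.0  # the instance we want to explain

# Perturb the instance and record the black-box outputs in its vicinity.
xs = [x0 + random.uniform(-0.5, 0.5) for _ in range(200)]
ys = [black_box(x) for x in xs]

# Fit a simple linear model (least squares) around the vicinity of x0.
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Locally, x**2 behaves like its tangent, so the fitted slope is close to 2 * x0 = 6.
```

The slope of the local linear fit is the "explanation": it tells us how much the black-box output changes around this particular instance.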

4.1.7.3. Test

We now look at the performance of our model on the test data; these data have not been seen by the model so far, so they can be very informative about the generalizability of the model. To this end, we need to choose a model (i.e. checkpoint) to be tested and the data on which we want to assess the performance.

  1. On the left menu, go to Test

  2. Select a checkpoint: in the Model section, select cars_model; the checkpoints table will appear. Pick the one with the best R-squared.

  3. Select the test data: after you select a checkpoint, select cars_split_test.csv beneath Select a test file.

    Test Checkpoint

By clicking Test, the following boxes will visualize our results:

Test Visualization

4.2. Classification

This second tutorial focuses on classification in the context of image data.

The goal is to develop a model that, given an input image, can predict which image class it belongs to. We will work with a dataset called cifar10 containing 10 evenly-distributed image classes. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.[4]

4.2.1. Upload Image Data

The first step in the development of any AI model is data loading. We will upload the cifar10 dataset into EzeeAI.

  1. Download cifar10-10-python.tar.gz data to your device and extract the cifar10.npy file from the archive

  2. On the left menu, go to Datasets > Image

    Upload numpy data

  3. On the Upload Data box (top left), at Select format option choose Numpy file, then click on the Browse button below and select the cifar10.npy file

  4. To complete the file upload, click on the Save button of the Upload Data box. A blue loading bar will appear, and the completion of the upload will be signaled by the message File upload completed.

The Choose Data box below should now contain an entry named cifar10.

4.2.1.1. Visualization

We can now have a first look at our cifar10 data.

By selecting cifar10 in the Choose Data box, the following boxes will start displaying information:

Data Visualization

4.2.2. Model Creation

Now that we have uploaded our data, we can start building the model. We will create the InputLayer node, the first node of the network, and attach the cifar10 data to it, along with some additional images automatically generated from it; then we create the rest of the network and add the loss function node at the very end.

The loss function is the tool used by the network to understand how good its predictions are. Essentially, it is a mathematical function that takes high values for bad predictions and low values for good predictions.

On the left menu, go to Models. Here is the blank draw grid where we are going to design our model.

Data Visualization

We simply need to select nodes from the blue menu and then click on the canvas to place them; we then create connections by clicking on the center top of a source node and dragging the line to the destination node. In the remainder of this section, we will refer to this blue menu on the left of the Models page simply as the menu.

4.2.3. Input Layer Creation

We now create the InputLayer_0 node with the cifar10 data. On the menu, select Input Layers > InputLayer.

  1. Create the InputLayer_0 node by clicking in any place of the draw grid.
  2. Right-click on the InputLayer_0 node and select add input data

Add input data

  1. The Input Layer window will appear. Select cifar10 among the datasets and click Continue to go to the Split tab.

    Select Dataset

  2. Now it is time to split your cifar10 dataset into Train, Validation, and Test data by dragging the percentage bar. In this case we recommend setting Train to 70%, Validation to 20%, and Test to 10%. Then click Continue to go to the Features tab.

    Split Data

  3. Deep Learning models benefit from being trained on large amounts of data. To enlarge our dataset, we perform what is called data augmentation

    Data augmentation is a way to increase the number of data samples. It consists in generating new realistic data samples by applying transformations to the available ones. The transformations simulate the appearance of the original samples under different conditions.

  4. Now we are going to apply two data transformations to our images: rotation and horizontal flip:

    1. From the Data Augmentation list select Flip and Rotation. Then press the > symbol at the top right. The two rows will now appear in the Selected menu

    2. Select Flip and tick the Horizontal Flip box

      Augmentation - Flip

    3. Select Rotation and set the values to 0 and 6.28, so that all possible rotations can be done

      Augmentation - Rotation

  5. Now we can click on Continue and then Save

    Input Layer completed
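The two augmentations selected above can be sketched on a toy image represented as nested lists. For simplicity the rotation shown is a fixed 90° turn, whereas EzeeAI samples angles from the 0 to 6.28 radian range we configured:

```python
# A tiny 2x3 "image" as nested lists (rows of pixel values).
image = [
    [1, 2, 3],
    [4, 5, 6],
]

def horizontal_flip(img):
    """Mirror each row left-to-right, as the Horizontal Flip augmentation does."""
    return [list(reversed(row)) for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise (one fixed angle, for illustration)."""
    return [list(row) for row in zip(*img[::-1])]

flipped = horizontal_flip(image)
rotated = rotate_90(image)
```

Each transformed copy is a new, realistic-looking training sample derived from the original image.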

4.2.4. Node Creation

  1. Now we create the rest of the network; let's create the first node Conv2D_0, corresponding to a Convolutional Neural Network (CNN) layer. We create a layer with 32 2-dimensional filters with a 3x3 kernel window; padding will be such that the output has the same size as the original input. It will be followed by a ReLU activation function.

    For further information on CNNs: https://www.youtube.com/watch?v=YRhxdVk_sIs

    1. On the left menu with the node models, click Convolutional Layers > Conv2D and then click anywhere in the grid.

    2. To configure the node, do a left-click on it and the configuration menu will appear on the right

    3. Set the following parameters:

      • kernel_size to [3, 3]
      • filter to 32
      • padding to same
    4. Connect InputLayer_0 to Conv2D_0: hold-click on the center top of InputLayer_0 and drag to Conv2D_0 until the arrow appears.

      Create first Conv2D

  2. Similarly, we create another convolutional layer by duplicating the previous one and switching to the default padding.

    1. Right-click on Conv2D_0 node and select duplicate

      Create second Conv2D

    2. Left-click and check that the node is configured as Conv2D_1. Then change padding to valid.

      Edit second Conv2D

    3. Connect Conv2D_1 with Conv2D_0

  3. We can now create the first Maximum Pooling layer MaxPooling2D_0 with a 2x2 pooling window size.

    1. On the left menu with the node models, click Pooling Layers > MaxPooling2D and then click anywhere in the grid.

      Create first MaxPooling2D

    2. Left-click on the MaxPooling2D_0 node and in the configuration menu set both pool_size and strides to [2, 2]. Then connect it to Conv2D_1

      Edit first MaxPooling2D

  4. To reduce overfitting, we add a Dropout node Dropout_0 with a 25% dropout rate. Then we save the network

    1. On the left menu with the node models, click Core Layers > Dropout and then click anywhere in the grid.

    2. Left-click on the Dropout_0 node and in the configuration menu set rate to 0.25. Then connect it to MaxPooling2D_0

    3. Let's save what we have done so far. At the top left of the grid, click on new_model and replace it with cifar10_model. At the bottom-right corner of the screen, click on Save to save the model under the name cifar10_model. Since our model cannot be validated yet, we'll have to store it without the input data.

      Edit first Dropout
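As an illustration of what the MaxPooling2D_0 node computes, here is a minimal sketch of 2x2 max pooling with stride [2, 2] on a toy feature map (the convolution and dropout steps are not shown):

```python
def max_pool_2x2(img):
    """2x2 max pooling with stride [2, 2]: keep the largest value in each 2x2 window."""
    pooled = []
    for r in range(0, len(img) - 1, 2):
        row = []
        for c in range(0, len(img[0]) - 1, 2):
            row.append(max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1]))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 0, 1, 4],
]
pooled = max_pool_2x2(feature_map)  # each dimension is halved
```

Pooling halves the spatial resolution while keeping the strongest activation in each window, which reduces computation in the later layers.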

4.2.5. Group Creation

The next four nodes are very similar to the last four, so the fastest way to create them is to turn the previous four into a group, duplicate it, and edit what we want

  1. While keeping the Shift key pressed, left-click on the last four nodes (Conv2D_0, Conv2D_1, MaxPooling2D_0, Dropout_0)

  2. Right-click on any of those nodes and select group nodes

    Create group

  3. Right-click on the newly created group block_0 and select duplicate

    Duplicate group

  4. A new group will appear; edit its Conv2D_2 and Conv2D_3 nodes so that their filters values are 64

  5. Connect Dropout_0 to Conv2D_2

    Connect the two groups

4.2.6. Complete Network, Loss Node Creation, and Model Saving

  1. The next series of nodes will remove the spatial structure of the network and output 10 values, one per class. The nodes have to be connected to the previous ones in the order of appearance.

    1. First we flatten the last 2-dimensional network layer by adding a Flatten node: select it from the left menu Core Layers > Flatten

    2. Then we add a Dense node (also called fully-connected): select it from the left menu Core Layers > Dense and set units to 512

    3. We put a Dropout node again, but this time with 50% rate (set rate to 0.5)

    4. Finally, we add a final Dense node with 10 units and no activation function (set activation to linear). We call it Dense_1

      Complete the network

  2. Now we have to create the loss function node Loss_0. We select Loss Functions > Loss and then clicking on the grid.

    1. Connect Dense_1 to Loss_0

    2. By accessing the property menu, you can check that softmax_cross_entropy is used as the loss function:

      Create Loss Node

  3. Now we check that everything is correct. On the bottom-right of the grid, click on the button Validate, and then Save:

    Validate
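The softmax_cross_entropy loss checked in step 2 can be sketched as follows; for brevity the example uses 3 classes instead of cifar10's 10:

```python
import math

def softmax(logits):
    """Turn the raw network outputs into class probabilities that sum to 1."""
    exps = [math.exp(z - max(logits)) for z in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Low when the probability assigned to the true class is high, large when it is low."""
    return -math.log(probs[true_class])

logits = [0.1, 2.0, -1.0]   # raw outputs of the final linear Dense node (3 classes for brevity)
probs = softmax(logits)

loss_when_right = cross_entropy(probs, 1)  # true class has the highest logit
loss_when_wrong = cross_entropy(probs, 2)  # true class has the lowest logit
```

The loss is small when the network assigns high probability to the correct class and grows as that probability shrinks, which is exactly the behavior a classification loss needs.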

4.2.7. Model Training

Now that we have designed a model, it is time to train it and see how it performs on the train and validation data.

  1. On the left menu, go to Train > Run
  2. In the Run Config menu, go to Model and select cifar10_model
4.2.7.1. Experiment & Training Parameters

EzeeAI allows us to tune our training parameters very easily. In this tutorial we will just focus on a few training parameters:

  1. In the Run Config menu, set:

    • Number of epochs to 10000
    • Optimizer to RMSProp
    • Learning rate to 0.0001
  2. Start the training by clicking on the play button