About

For our Humanoids final project, we created a web app where a user can train a pasta identification model using hand-drawn pictures and then test the model on new drawings. We used a convolutional neural network and built the project with Python, Keras (the high-level API of TensorFlow), and Flask.


Tools

  • Python 

  • TensorFlow

  • HTML/CSS/JavaScript

Members 

  • Ricky Lee

  • Spoorthi Cherivirala

image8.png
image3.png
image7.png

Demo Video

Walkthrough:
The user first enters the game screen, which offers two options: play or train the model. We start with training the model. This model has already been trained with 200 pasta drawings, so here we are just adding to that dataset. At any point, the user can clear their drawing and start over. When they are done, the user can return to the home screen and choose to play the game. Given a pasta prompt, they must draw the corresponding pasta shape. If their drawing is detected as correct, the game displays a positive message; if not, it displays a negative message and lets the user progress to the next question.

1-  Web App Structure

Routes

The home page links to a page for adding user-drawn data to the model, as well as a page for testing the model against user-drawn data. Each of these pages is its own endpoint with a corresponding route, reachable from the main home page, and has its own GET and POST methods: GET displays the prompt, and POST collects the image data.

On the page to add data, a short message prompts the user for a drawing of a certain pasta. The user creates this drawing in a small canvas by clicking and dragging. When they are done, they can click "complete" to submit the form, which passes the image data to the model. There is also the option to clear the canvas and start over.

 

The page to test the model has a similar layout. However, instead of submitting an image as training data for the model, the model classifies the image and provides feedback on whether it believes the image to be correct or not. 
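The route layout described in this section could be sketched in Flask roughly as follows. This is a minimal sketch rather than the project's actual code: the route path `/add`, the JSON field names `label` and `pixels`, the label list `PASTAS`, and the in-memory `DRAWINGS` list are all assumptions made for illustration. It also assumes the usual HTTP convention, where GET serves the prompt and POST receives the submitted drawing.

```python
import random

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical label set and in-memory store (assumptions for this sketch).
PASTAS = ["ravioli", "penne", "lasagne"]
DRAWINGS = []


@app.route("/")
def home():
    # The home page only links to the other pages.
    return jsonify(links=["/add", "/play"])


@app.route("/add", methods=["GET", "POST"])
def add():
    if request.method == "GET":
        # GET: pick a pasta and prompt the user to draw it.
        return jsonify(prompt=random.choice(PASTAS))
    # POST: receive the submitted drawing and its label as training data.
    data = request.get_json()
    DRAWINGS.append((data["label"], data["pixels"]))
    return jsonify(stored=len(DRAWINGS))
```

A page for testing the model would follow the same pattern, with the POST branch running the classifier instead of storing the drawing.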

1.5-  Styling (HTML/CSS)

Play & Add to Model Pages 

image2.png
image5.png

Index & Main CSS Stylesheet

image6.png
image1.png

2-  Getting User Input (JavaScript)

JavaScript for mouse input on canvas
We use event listeners that wait for mouse input on the canvas. We adjusted the weight, color, and responsiveness of the cursor. The image on the right depicts the resulting canvas on the draw page.

image3.png
image1.png
image5.png

3-  Representation of User Input

image2.png

Hand-drawn user input represented as a compressed array of 0s and 1s
We use the Canvas API function getImageData to read the pixels of the hand-drawn pasta, then downsample the image from 200x200 to 50x50.
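The project performs this step in JavaScript; the Python/NumPy sketch below only illustrates the idea on an array that already holds one 0/1 value per pixel. The block-max rule (a 4x4 block becomes 1 if any of its pixels was drawn on) and the function name `downsample` are assumptions, since the original code may average or subsample instead.

```python
import numpy as np


def downsample(image, factor=4):
    """Shrink a square 0/1 image by taking the max over factor x factor blocks."""
    size = image.shape[0] // factor
    # Split each axis into (block index, offset within block), then reduce the offsets.
    blocks = image.reshape(size, factor, size, factor)
    return blocks.max(axis=(1, 3))


canvas = np.zeros((200, 200), dtype=np.uint8)
canvas[100, 20:180] = 1  # a single horizontal stroke
small = downsample(canvas)
print(small.shape)  # (50, 50)
```

Taking the maximum rather than the mean keeps thin strokes visible after the image shrinks, which matters for line drawings.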

4-  Collecting data from user

When a user is done drawing, they submit the drawing through a form handled in JavaScript, which also carries the label for their drawing.

The bi-directional encoder array is a dictionary that pairs each of our pasta types with its corresponding string name, so that lookups work in both directions. This is important for randomly selecting a pasta for the user to draw (whether for adding data or for testing new drawings).
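One way such a bi-directional lookup could be built is sketched below. The structure is an assumption, as is the label list, which contains only pastas named in this write-up; the helper `random_prompt` is hypothetical.

```python
import random

# Hypothetical label set; only pastas named in this write-up are listed.
PASTAS = ["ravioli", "penne", "lasagne"]

# Bi-directional lookup: name -> index and index -> name in one dictionary.
ENCODER = {}
for index, name in enumerate(PASTAS):
    ENCODER[name] = index
    ENCODER[index] = name


def random_prompt():
    """Pick a random pasta and return its (index, name) pair."""
    index = random.randrange(len(PASTAS))
    return index, ENCODER[index]
```

Keeping both directions in one dictionary means the same object can turn a prompt name into a training label and a model prediction back into a name.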

We collect the data into an .npy file, which persistently stores a NumPy array on local disk. This ensures that a user can still run the web app without training a new model if they trained one in a previous session.
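A minimal sketch of this round trip with `np.save` and `np.load` is shown below. The file name `drawings.npy`, the temporary directory, and the array shape are assumptions for illustration; the real app's path is not given here.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "drawings.npy")

drawings = np.zeros((3, 50, 50), dtype=np.uint8)  # three blank 50x50 drawings
np.save(path, drawings)                           # write the array to disk

restored = np.load(path)                          # read it back in a later session
```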

We wanted our game to look appealing and pasta-themed! This is our custom CSS/HTML, which inserts a pasta background and applies a consistent theme of red buttons and styling for clear, submit, etc. The image on the left shows the canvas element dimensions and specifications, as well as the messages that display depending on correctness. The styling for the play and add to model pages is very similar and mostly uses inline CSS.

The styling for our main page comes from our CSS stylesheet, where we set basic font sizes, styles, margins, etc. for the different header levels as well as for the boxes on the page. The HTML shows the two buttons, to either play or add to the model, styled with inline CSS.

screen 1 w pastas.PNG
image3.png
image2.png
image1.png

4.5- Pairing Data with label

Each drawing is paired with a label, which is taken from the aforementioned encoder array.

image5.png
image6.png

5- Aggregating data by labels

User data is stored in the array “imgs” and paired with labels from the bi-directional array “ENCODER”
The data is then split into training and test sets

Each drawing, as an array of pixel values, is paired with its label and then stored in a larger array containing all the images for each label (i.e., all the lasagne drawings are grouped together this way).

Once the data has been collected and the model begins training, it is important to split the data into training and test sets. This ensures that different data is being used to train and test the model. We randomly sample the data when splitting.
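A random split of this kind can be sketched with NumPy as follows. The 20% test fraction, the fixed seed, and the function name `split_data` are assumptions; the write-up does not state the proportions used.

```python
import numpy as np


def split_data(images, labels, test_fraction=0.2, seed=0):
    """Shuffle the dataset and split it into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))  # random sample order
    n_test = int(len(images) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return images[train_idx], labels[train_idx], images[test_idx], labels[test_idx]


imgs = np.zeros((10, 50, 50), dtype=np.uint8)
labels = np.arange(10) % 3
x_train, y_train, x_test, y_test = split_data(imgs, labels)
print(len(x_train), len(x_test))  # 8 2
```

Indexing images and labels with the same shuffled indices keeps every drawing paired with its own label after the split.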

image4.png
image7.png

6- Model

We use a convolutional neural network (CNN), a type of model commonly used for analyzing images.

1_uAeANQIOQPqWZnnuH-VEyw.jpeg

Like other neural networks, CNNs are composed of layers and try to analyze some type of input data

  • CNNs are generally used for image processing and analysis

Convolving: a window slides across the pixels and applies some operation with a certain filter, which is just a matrix of values

  • E.g., element-wise multiplication used for convolution

  • CNNs are made of convolutional layers

    • Convolutional layers are made of filters

    • Filters are used to find patterns in an image - simple ones can be used for edge detection, whereas more complex ones can be used later on to identify perhaps corners, circles, or even real objects (in images)

    • Each layer tends to have more and more filters, as it picks up on patterns in the image
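The sliding-window operation described in these notes can be written out by hand. The sketch below is illustrative only: the 6x6 image and the vertical-edge filter are made up, and the function computes what deep learning libraries call convolution (strictly speaking, cross-correlation, since the filter is not flipped).

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Element-wise multiply the window by the filter, then sum.
            out[i, j] = np.sum(window * kernel)
    return out


# A 6x6 image whose right half is filled in.
image = np.zeros((6, 6))
image[:, 3:] = 1

# A simple vertical-edge filter: responds where values rise from left to right.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

response = convolve2d(image, kernel)
print(response[0])  # [0. 3. 3. 0.]
```

The response is nonzero only where the window straddles the boundary between the empty and filled halves, which is how a simple filter acts as an edge detector.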

Code Notes (building the model)

Code Notes (training the model)

image1.png
image4.png
  • We use a convolutional neural network, which is well suited to analyzing images. Our model uses three layers with the ReLU activation function. We utilize max pooling, which keeps the strongest (highest) value in each window of 2 by 2 pixels. We also use dropout, which randomly sets a fraction of activations to zero during training to reduce overfitting. Max pooling reduces the amount of data being processed while retaining the most significant values for the model. Each layer reduces the amount of data by more than a factor of ten.

  • To obtain a metric for optimization in the model, we use the sparse categorical cross-entropy loss function. This is due to the multi-class nature of our classification problem, as well as the fact that each image can only be one pasta (there are no hybrid pasta drawings allowed). We use the Adam optimizer, as is standard practice with this type of image classification problem. Finally, we use an early stopping function, which ends the training process if two epochs pass without progress in the performance metric, accuracy.
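A model along these lines could be sketched in Keras as shown below. This is not the project's actual architecture: the filter counts (16, 32, 64), the 3x3 kernels, the dropout rate, the three output classes, and the choice to monitor validation accuracy are all assumptions. Only the three ReLU layers, 2x2 max pooling, dropout, the loss, the optimizer, the two-epoch patience, and the 50x50 input come from this write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 3  # assumed number of pasta types


def build_model():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(50, 50, 1)),  # one 50x50 drawing, single channel
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(2),  # keep the max of each 2x2 window
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Dropout(0.25),  # randomly zero activations while training
        layers.Flatten(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels, one class per image
        metrics=["accuracy"],
    )
    return model


# Stop if validation accuracy has not improved for two consecutive epochs.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=2)

model = build_model()
# Training would then look like this (x_train and y_train are not defined here):
# model.fit(x_train, y_train, validation_split=0.2, epochs=20, callbacks=[early_stopping])
```

Sparse categorical cross-entropy takes integer class labels directly, so the labels do not need to be one-hot encoded first.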

6.5- Training Model

Added 197 data points; the model took 5m 17s to run (20 epochs)

image2.png
image3.png
pasta collage.PNG

6.5- Experimenting with model

Batch size, epochs, kernel size
To improve accuracy, we experimented with changing the specifications of the model, including batch size, epochs, and kernel size. Below are some of the results our models generated. As the bottom-left image shows, the validation and training accuracy are similar, which suggests that there isn't too much overfitting.

image3.png
image2.png
image9.png
image4.png

7- Showing Results

Confusion matrix to show incorrect vs. correct guesses
This confusion matrix shows incorrect vs. correct guesses made by our final model. There are no incorrect guesses, suggesting that the data might be too homogeneous or the problem too simple (we may need more pasta shapes or more complex ones). In previous iterations, the model often made mistakes distinguishing between ravioli and penne, which makes sense considering the similarities in their shapes. When the model was initially generated with few data entries, ravioli and lasagne were often mistaken for each other because both are essentially rectangular.
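For reference, a confusion matrix can be computed from true and predicted labels in a few lines. The labels below are made up for illustration and are not the project's results; the class numbering is also an assumption.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are the true class, columns the predicted class."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true, predicted] += 1
    return matrix


# Made-up labels for illustration: 0 = ravioli, 1 = penne, 2 = lasagne.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 2, 1, 1, 2, 2]  # one ravioli mistaken for lasagne
result = confusion_matrix(y_true, y_pred, 3)
print(result)
```

Correct guesses land on the diagonal, so a model with no mistakes produces a matrix that is zero everywhere else.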

image6.png
image8.png

8- "Play" Page

The play page allows the user to test their drawings once the model has been generated
It determines correctness and passes it to the HTML code

The images on the right show the corresponding HTML/CSS pages depending on the correctness of the user input. When the page is requested with GET, a randomly chosen label is output as a prompt on the screen. On submit, the POST method collects the user's drawing and runs the model on it. The model's label and the prompted label are then compared, which is how correctness is determined.
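The comparison between the model's label and the prompted label can be sketched as below. The function name `is_correct` and the example probabilities are hypothetical; the sketch assumes the model outputs one probability per pasta class.

```python
def is_correct(probabilities, prompted_index):
    """Return True when the most likely class matches the prompted pasta."""
    predicted_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return predicted_index == prompted_index


# Hypothetical model output for one drawing over three pasta classes.
probabilities = [0.1, 0.7, 0.2]
print(is_correct(probabilities, 1))  # True
print(is_correct(probabilities, 2))  # False
```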

image5.png
image1.png
image7.png

9- Next Steps

Based on feedback from our in-class presentation, we worked on the following next steps.

next steps.PNG

Improving Model: We experimented with and analyzed the results of different models, varying epochs, batch size, max pooling, dropout size, and kernel size. We also generated new heatmaps, which show increased confusion in certain models with more downsampling.

279676101_546741597128856_4718869099205586248_n.png
279891720_542831640749622_1534054456716826782_n.png
280575340_556593579401701_2778344963342992237_n.png
279637787_722555355606923_1246137775060119940_n.png
279473989_685085869536357_6182564562377004271_n.png
280150022_313164354331655_1070963906793236180_n.png
280323113_4823599184434109_2929001444445572568_n.png
279473989_685085869536357_6182564562377004271_n.png
279875418_532118251904683_4247503964602509877_n.png
279637787_722555355606923_1246137775060119940_n.png
280698182_4922697324450836_7473965644868342915_n.png
279718821_1014738819165047_1136296709130375117_n.png

Complex Pasta: We tried training the model on new drawing styles to increase its versatility (imitating different users) and to see whether it could still distinguish between pasta shapes effectively when playing the game. We simplified the pasta shapes into single-line figures (squiggle, rectangle, square, triangle) to emulate how others might draw them, and we also drew the pasta in different orientations. The image below shows some of these entries.

Frame 5 (11).png
heatmap.PNG
epochs.PNG

After training the model with around 100 of these abnormal pasta shapes, the accuracy decreased when we ran the same model. The heatmap also showed a lot more confusion between the pasta shapes, especially ravioli and lasagne, which is what we predicted because of the similarities in their appearance. Although most of the shapes are still distinguishable, this indicates that our game works best when the model is trained by the actual players.

Findings and Reflections

Most of our data was created by one person (Spoorthi) - this may have contributed to the ease with which the model classified data. Pastas drawn in different styles, using different stroke orders, by different people, may complicate the classification problem.

 

Each image contains a lot of data, but the model did not take very long to process these volumes of data. We believe that downsampling through max pooling greatly contributed to this (dropout, by contrast, helps limit overfitting rather than reduce the amount of data). It was impossible to run the model without these techniques.

 

If the stroke order or the way in which the pasta was drawn were encoded into the model, it might be possible for it to identify users given an image and the correct label it represents. This would be a very difficult task considering the limits of convolutional neural networks, but perhaps with enough training data and time the model would be able to solve it. This would be an interesting future addition to this project.

 

Finally, in the near future we will gamify the web app so that it shows users their performance relative to the model over time.