Introduction

Machine learning is the buzzword of the moment. I’m not going to explain what it is: you can find the answer everywhere on the web. What we are interested in here is doing ML on a practical case, from scratch … and ideally not on handwritten digits.

Yes, we are going to build everything from the beginning: that’s the topic of this post.

If you are eager to play with some code, have a hack on the project, freely available on GitHub.

An example of a traffic camera image

Data

So first we need data.

Secondly … we need more data.

Data can be found everywhere. There are a lot of sensors and other sources of data in our daily life, and traffic cameras are one of them.

Thanks to the title, you already know that they are going to be our source of data. We should be able to get the images online.

An example of a traffic camera image

On every image, as a human, you can easily tell whether it’s rainy, cloudy, sunny or snowy. The luminosity and the colours also give you other information, for instance about when the image was taken.

Basically, we want to train the machine to recognise the weather. Snow is the hardest weather to forecast because it depends on small differences in pressure, temperature and cloud height. With images, we won’t look at the sky only: to know whether it’s snowy, it’s easier to look at the ground. The amount of white could give you the answer and characterise snowy weather, as the toy example below illustrates.
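To make that concrete, here is a toy heuristic in Python (the language we’ll switch to later for the processing). The function name and the threshold are my own assumptions, not part of the project:

    import numpy as np

    # Toy snow indicator: fraction of near-white pixels in the lower half
    # of the image (the "ground"), given an RGB image as a NumPy array.
    def white_ratio(img, threshold=200):
        ground = img[img.shape[0] // 2:]             # bottom half of the image
        white = np.all(ground > threshold, axis=-1)  # white-ish on all three channels
        return white.mean()                          # a high ratio suggests snow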

Image acquisition: How? Where? When? Looking for a good API …

How do we automatically get the images? And from where? That was the first issue of this project.

After a quick walk-through of Traffic England, the traffic cam images all seem to follow the same format. That’s quite convenient for us: we don’t need to make the images uniform ourselves. Uniformity is essential for ML or any image analysis, because you cannot compare two items that are not homogeneous.

Exelis’ Helios Weather Platform is the open data API I used. For this back-end problem, I chose JavaScript and the Node.js environment.

Here is the map of the cams on the Helios API

Each camera has a unique URL and an identification number, which is readable in the URL. Scraping begins. With this script you can create a CSV file of all the camera URLs available in the UK on the API. Taking care of the authorisation and key issues on each platform, you can scrape the images and store them in an AWS S3 bucket, as sketched below. Code here.

Inside the red box: the camera ID
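Here is a minimal sketch of that scraping step, assuming the aws-sdk and request packages; the bucket name and function name are mine, not the project’s:

    const request = require('request')
    const AWS = require('aws-sdk')

    const s3 = new AWS.S3()
    const BUCKET = 'traffic-cam-images' // assumed bucket name

    // Download one camera image and store it in S3,
    // one object per camera and timestamp.
    function scrapeImage (cameraId, cameraUrl) {
      request({ url: cameraUrl, encoding: null }, (err, res, body) => {
        if (err) return console.error(err)
        s3.upload({
          Bucket: BUCKET,
          Key: cameraId + '/' + Date.now() + '.jpg',
          Body: body // raw image buffer
        }, (err, data) => {
          if (err) return console.error(err)
          console.log('Stored', data.Location)
        })
      })
    }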

Because we always want more data to analyse, I decided to scrape images every day. Fortunately, this GitHub project makes cron jobs easy to run in a Node.js environment. You just have to take the refresh rate of the cams into account.

Cron job using node-cron

    const CronJob = require('cron').CronJob
    // Fire go() once a day at 09:00 London time ('* * * * * *' would fire every second)
    new CronJob('0 0 9 * * *', function () {
        go()
    }, null, true, 'Europe/London')

The go function is my main entry point

Enough to teach the machine?

When we teach our computer the weather on an image, we need to have the answer, and we are not going to look at every image ourselves to get it. So no, it’s not enough. It’s time to acquire metadata for our images!

Thanks to datapoint-js, the Met Office’s observations are available. Let’s scrape the daily forecasts for the locations of our cameras at the same time as we scrape the images.

Using an AWS DynamoDB table to store our weather items, we must be sure they are unique. To do so, we need a partition key and a sort key. I chose the camera ID as the partition key; for the sort key, the scraping time makes each item unique.
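A minimal sketch of the write, using the aws-sdk DocumentClient; the table and attribute names are my assumptions:

    const AWS = require('aws-sdk')
    const db = new AWS.DynamoDB.DocumentClient()

    // Store one weather observation; cameraId + scrapedAt make the item unique.
    function saveObservation (cameraId, weather) {
      return db.put({
        TableName: 'weather-observations',     // assumed table name
        Item: {
          cameraId: cameraId,                  // partition key
          scrapedAt: new Date().toISOString(), // sort key: the scraping time
          weather: weather                     // e.g. 'rain', 'snow', 'sunny'
        }
      }).promise()
    }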

Our items stored in a DynamoDB table

Processing

In my opinion, Python is more appropriate than JS for statistics. The CADL project on GitHub has been really helpful for the ML beginner I am. That’s why I use TensorFlow in a Jupyter Notebook to begin the processing.

Using 100 traffic images, CADL’s session 1 already gave interesting results.

Here is the dataset of images I use

Basically, we consider each image as an array of RGB components, so our dataset takes the shape NxHxWxC:

- N is the number of images in the dataset,
- H is the height of an image in pixels,
- W is its width in pixels,
- C is the number of colour channels (3 for RGB).

For instance, (0, 0, 99, 2) refers to the blue component of the top-right pixel of the first image of our dataset (in Python, the first index of lists and arrays is 0). Furthermore, if the green and blue components of that pixel are at their maximum while its red component is 0, that means that the colour of our working pixel is cyan.
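In NumPy terms, with an assumed dataset of 100 images of 100x100 pixels:

    import numpy as np

    # Hypothetical dataset: 100 RGB images of 100x100 pixels, i.e. (N, H, W, C).
    ds = np.zeros((100, 100, 100, 3), dtype=np.uint8)

    # Make the top-right pixel of the first image cyan: no red, full green and blue.
    ds[0, 0, 99, 1] = 255  # green component
    ds[0, 0, 99, 2] = 255  # blue component

    print(ds[0, 0, 99])  # -> [  0 255 255], i.e. cyan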

Knowing that, we can compute some classic statistics, such as:

Mean image for the dataset

Standard deviation image for the dataset

The normalised dataset

All of these operations are defined in the graph of our neural network; they are only actually run when we open a session.
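Here is a minimal sketch of those operations in TensorFlow (1.x graph style, as in CADL); the variable names and the 100x100 image size are my assumptions:

    import tensorflow as tf

    # Placeholder for the dataset, shape (N, H, W, C).
    ds = tf.placeholder(tf.float32, shape=[None, 100, 100, 3])

    # These lines only build the graph; nothing is computed yet.
    mean_img = tf.reduce_mean(ds, axis=0)                 # mean image
    dev = ds - mean_img
    std_img = tf.sqrt(tf.reduce_mean(dev * dev, axis=0))  # standard deviation image
    norm = dev / std_img                                  # normalised dataset

    # Opening a session is what actually runs the operations.
    with tf.Session() as sess:
        norm_imgs = sess.run(norm, feed_dict={ds: imgs})  # imgs: your NumPy dataset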

I am going to let you train your program on your own … because I am working on it at the moment!

Next step

Configuring a Lambda function on AWS would save you from running the cron job on your own machine. It would handle bigger files and allow you to scrape at any time, from anywhere.
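For instance, the main function could be wrapped in a Lambda handler and triggered by a CloudWatch Events schedule; this is a sketch, not the project’s actual code:

    // Wrap the scraping entry point in a Lambda handler; a CloudWatch Events
    // rule can then trigger it on a schedule instead of a local cron job.
    exports.handler = (event, context, callback) => {
      go() // same main function as before
      callback(null, 'scrape done')
    }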

Have a look at kappa if you are as lazy as me. It’s a command-line tool that makes deploying, updating and testing functions easier to set up. Now let’s get to work!