Human brains make vision seem very easy. It doesn’t take much effort for a person to tell a bus from a car, read a sign, or recognise a picture. But these are actually very hard problems to solve with a computer.

They only seem easy because our brains are incredibly good at understanding images and can process that information in hundredths of a second. This isn’t the case with computers.

Technically, deep learning tries to mimic what human brains do. So how is it done?

With a large dataset of images, signs, patterns and sequences of events, powerful computers are trained to look for outcomes, much like human brains.

So how does deep learning perform like a human brain?

Okay, to put it in simpler terms: at its simplest, deep learning can be thought of as a way to automate predictive analytics.

Here are two good examples of predictive analytics used by air traffic controllers (ATCs) across the world:

a) How many flights will be delayed if the first 5, 10 or 25 flights are delayed by bad weather at daybreak?

b) How many airports will see cascading delays, and for how long, because of bad weather at two busy airports?

Taking it further: how does traditional machine learning differ from deep learning?

Traditional machine learning algorithms are broadly linear. They work:

a) Based on outcomes, results, failures and outliers.

b) Based on data pre-fed into the system [e.g. total jobs completed at the same time last year vs. last month vs. current data].

Deep learning algorithms, by contrast, are stacked in a hierarchy of increasing complexity and abstraction:

a) A model with presets [info on vehicles, then on passenger cars, then by type, make, classification, configuration and colour]. Note: not necessarily in the order listed here.

b) Classification based on outcomes [yes, it’s a car vs. no, it’s not a car].
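To make the "linear vs. stacked" contrast a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function names, the tiny weights and the use of ReLU between layers are all made up for the example and do not come from any particular library or trained model.

```python
def linear_model(x, weights, bias):
    """A single linear step: weighted sum of the inputs plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def relu(values):
    """Non-linearity applied between layers (an assumed, common choice)."""
    return [max(0.0, v) for v in values]


def dense(x, weight_rows, biases):
    """One layer: each output is a weighted sum of all the inputs."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight_rows, biases)
    ]


def stacked_model(x, layers):
    """Feed the output of each layer into the next one."""
    for index, (weight_rows, biases) in enumerate(layers):
        x = dense(x, weight_rows, biases)
        if index < len(layers) - 1:  # no ReLU after the final layer
            x = relu(x)
    return x


x = [1.0, 2.0]  # hypothetical input features

flat = linear_model(x, [0.5, -0.25], 0.1)

# Two hypothetical layers: 2 inputs -> 2 hidden values -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 2.0]], [0.5]),
]
deep = stacked_model(x, layers)

print(flat, deep)
```

The point is only structural: the stacked version passes its intermediate results through further layers, which is what lets each level build on the abstractions of the one before it.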

More complex abstraction comes into play through more context-specific algorithms.

For example, consider processing an image like the one below:

Now, a deep learning model such as YOLO (You Only Look Once), or a detector built on a ResNet backbone, can detect that there are several objects inside the frame, such as:

  • A person and a bicycle
  • A car approaching the frame
  • A person with a dog

So this is an example of deep learning understanding and processing an image much like a human brain would. There are several ways of doing it, with models such as ResNet, YOLO and YOLOv3. YOLO, for instance, divides the image into an S × S grid, looks for patterns in each grid cell, and finally tags each cell with what it found.
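As a rough illustration of the grid idea, the sketch below works out which cell of an S × S grid a given point of the image falls into. The function name `grid_cell`, the 640 × 480 image size and the default S = 7 are assumptions chosen for the example, not something taken from a specific YOLO implementation.

```python
def grid_cell(cx, cy, img_w, img_h, s=7):
    """Return the (row, col) of the S x S grid cell containing point (cx, cy).

    cx, cy are pixel coordinates; img_w, img_h are the image size in pixels.
    Points on the right or bottom edge are clamped into the last cell.
    """
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


# Hypothetical 640 x 480 image: the centre of an object sits mid-frame.
print(grid_cell(320, 240, 640, 480))  # → (3, 3)
```

In a grid-based detector, the cell that contains the centre of an object is the one held responsible for describing that object.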

A convolutional neural network (CNN) is a type of artificial neural network used in machine learning, typically:

  1. For supervised learning,
  2. To Analyze data.

A CNN processes a given image (like the one shown above) through multiple layers, drawing on what it learned from the datasets pre-fed into the system, and finally derives a number from the process. The higher the number, the more likely it is that the image matches something in the dataset. A lower value means lower confidence.
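One common way a classification network turns its raw outputs into such a confidence number is the softmax function. The sketch below assumes that approach; the labels and raw scores are invented for the example and do not come from a real model.

```python
import math


def softmax(scores):
    """Convert raw scores into confidences that are positive and sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["car", "bus", "bicycle"]  # hypothetical classes
raw_scores = [4.0, 1.0, 0.5]        # made-up outputs of the final layer

probs = softmax(raw_scores)
best = max(range(len(labels)), key=lambda i: probs[i])

print(labels[best], round(probs[best], 3))
```

Here the class with the largest raw score ends up with the highest confidence, and the remaining confidence is shared between the other classes.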

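Each detected object comes with a bounding box and a score like this, and overlapping boxes are then filtered with a procedure known as non-maximum suppression. It goes like this: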

  1. Start with the bounding box that has the highest score among those not yet processed.
  2. Remove any remaining bounding boxes that overlap it by more than a given threshold (e.g. more than 50%).
  3. Go back to step 1 until there are no more bounding boxes left to process.

This removes any bounding boxes that overlap too much with other boxes that have a higher score, so only the best ones are kept. This way the result is closer to how a human would pick out the objects.
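The three steps above can be sketched in a few lines of plain Python. This is a minimal version under a few assumptions of mine: boxes are given as (x1, y1, x2, y2) corners, "overlap" is measured as intersection over union (IoU), and the boxes and scores are made up for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, threshold=0.5):
    """Return the indices of the boxes to keep, best score first."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:
        best = remaining.pop(0)  # step 1: highest-scoring box left
        keep.append(best)
        # Step 2: drop boxes that overlap the chosen one too much.
        remaining = [
            i for i in remaining if iou(boxes[best], boxes[i]) <= threshold
        ]
        # Step 3: loop again with whatever is left.
    return keep


# Hypothetical detections: the first two boxes cover the same object.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]

kept = non_max_suppression(boxes, scores)
print(kept)  # → [0, 2]
```

The second box is dropped because it overlaps the higher-scoring first box by more than the 50% threshold, while the third box sits elsewhere in the image and survives.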

And that’s pretty much all there is to it: a regular convolutional network and a bit of post-processing of the results afterwards (to make them more human-friendly).

Thanks for reading. More to come: TensorFlow!