Demystifying ML, AI, Deep Learning, Cognitive Computing and more
In 2005, a driverless Volkswagen won a $2 million race across the rugged Nevada desert, beating four other robot-guided vehicles. The winner, Stanley, a VW Touareg built by a Stanford University team, zipped through the 132-mile Mojave Desert course in 6 hrs 53 min, using only its computer brain and sensors to navigate rough, twisting desert and mountain trails. The Stanford team celebrated by popping champagne and pouring it over the mud-covered Stanley. To read more about this, look up the DARPA Grand Challenge (2005).
This article explores what each of the technologies in the title means, how they differ from each other (if at all) and, more importantly, how and where we have already been using them. As always, the goal is to engage the community and open discussions around this subject.
First off, what do Machine Learning (ML), Artificial Intelligence (AI), Deep Learning (DL), and Cognitive Computing have in common and how do they differ?
Machine Learning is a set of algorithms, some going back many decades, that are used to predict outcomes. It uses data about past events, preferably vast amounts of it, to learn the patterns in that data using one of many algorithms, each of which works well in specific situations. There are more academic definitions of ML, as well as the definition on Wikipedia. ML has gone by many names, like Pattern Recognition, Statistical Modeling (yes, ML is based on statistics), Predictive Analytics, Data Mining, Knowledge Discovery and the more recent buzzword, Data Science.
The Concept of Learning
‘Learning’ is essentially code that iteratively reconciles the output of an algorithm against known values such as past results. Based on that reconciliation, the code keeps refining the model’s internal parameters (the weight it gives to each input) so that the next set of outputs is more accurate. In other words, learning aims to calculate mathematically how much the input data influences the output, so that the behavior of the output can be predicted.
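The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not any particular library's method: the model has a single parameter w (so that y = w * x), the error measure is the mean squared error, and the data, learning rate and step count are made up.

```python
# Toy "learning" loop: fit y = w * x to past results by repeated correction.
# The data, learning rate and step count are illustrative assumptions.

def train(xs, ys, learning_rate=0.01, steps=200):
    w = 0.0  # initial guess for the single model parameter
    for _ in range(steps):
        # Compare the current predictions against the known past results.
        errors = [w * x - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w.
        gradient = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        # Nudge the parameter in the direction that reduces the error.
        w -= learning_rate * gradient
    return w

past_inputs = [1.0, 2.0, 3.0, 4.0]
past_outputs = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x
learned_w = train(past_inputs, past_outputs)
print(round(learned_w, 3))
```

Each pass through the loop is one round of reconciliation: predict, measure the error, adjust. With this data the parameter settles very close to 2, the value that generated the outputs.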
How do we learn? How do we predict? How do we know the accuracy of our predictions and get better at it? There are many schools of thought within the study of Machine Learning, namely Symbolists, Connectionists, Evolutionaries, Bayesians and Analogizers. Each of these schools has its own algorithm(s). Connectionists use Neural Network theories and related algorithms, Evolutionaries use genetic algorithms, Bayesians use Bayesian Inference and its associates, Analogizers use Support Vector Machines and Symbolists use Decision Trees. While these do what they are designed to do, there is research going on in academia to figure out whether there is one master algorithm that can discover or learn knowledge from data and use it for predictions.
Where does AI fit in? In layman’s terms, AI is when a machine or a computer can replicate functions of the human brain reliably enough that we can trust it. It is a way of teaching computers to do what humans do, and ideally to do it better. How is it related to ML? ML is a subfield of AI; that is, ML algorithms are what drive AI to achieve its intelligence. But ML has grown so much that it is a major area of study, research and real-life application in its own right.
DL is another way of applying learning algorithms, this time based on Neural Networks with many layers. Neural Networks can be used to build algorithms for many of the functions that conventional machine learning performs: learning from data and predicting. Neural Networks go back many decades; they fell out of favor outside academia for some time but are back in wide use because of the computing power now available, which matters especially when you have to train on large amounts of data.
Cognitive computing, as a term, has become popular primarily because of IBM and Watson, but ultimately it is a combination of ML and DL with the goal of achieving what a human brain would do. That is the theory, at least. Many scientists would say that the human brain is too complex for a machine to learn and for humans to program. But what exactly do companies like IBM mean when they say Cognitive Computing? They call it ‘augmented intelligence’, which keeps humans in the loop, as opposed to pure AI.
There is another related field that follows a totally different, more traditional approach to AI, called Knowledge Engineering. It is built on the belief that all knowledge needs to be “taught” or fed into computers. It rose to prominence some decades back, and there are some who still believe in it. To know more about it, look up the Cyc project, an effort started by Douglas Lenat in 1984 at the Microelectronics and Computer Technology Corporation (MCC) in Austin, Texas, and later continued at Cycorp. A commercial version called Lucid was released, but it appears to have been taken down.
With the definitions out of the way, here are some present-day uses of these technologies. This article will not go into the benefits, pros and cons of the technologies, nor into the details of the math behind the algorithms.
Naïve Bayes - A simple algorithm that can be expressed in a short equation and has been used for
Spam filters for emails
Many tasks at Google, which uses machine learning extensively; at one point in time, Naïve Bayes was the most widely used algorithm there
Nearest Neighbor - Another simple algorithm that has been used for
Handwriting recognition – when we mail a letter (regular mail, USPS – remember that?), the ZIP codes that we write are identified and the mail is sorted automatically. This involves scanning the handwritten addresses and recognizing the digits. There are other algorithms that could be used for this purpose as well.
Control of robot hands
Recommendation of books, movies, etc.
It is often considered the fastest algorithm to train, since it simply stores the examples it has seen
Decision Trees - A third simple algorithm that has been used for
Credit decisions – when you apply for a credit card, the decision on whether to approve it is made within a few seconds
Identifying splice junctions in DNA!
Games – choosing the next move in a game of chess, for example
The three above are simple algorithms, yet it is easy to see how flexible, powerful and general-purpose they are. One that can play a game can also identify splice junctions in DNA, while an algorithm that can recognize digits can also control robots! That is the power of ML.
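To give a feel for how small these algorithms can be, here is a sketch of the first one, Naïve Bayes, applied to spam filtering. The "short equation" is Bayes' rule with the simplifying assumption that words occur independently given the class: P(class | words) is proportional to P(class) times the product of P(word | class) over the words. The training messages are made up, the function names are hypothetical, and add-one smoothing is an assumption made here for the sketch; real spam filters are considerably more elaborate.

```python
import math
from collections import Counter

# Tiny made-up training set: (message, label). Purely illustrative.
TRAINING = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

def train(examples):
    class_counts = Counter()  # number of messages per class
    word_counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class)
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # log P(word | class), with add-one smoothing for unseen words
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("free money", model))
print(classify("lunch meeting", model))
```

Logarithms are used so that multiplying many small probabilities becomes adding their logs, which avoids numerical underflow on longer messages.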
In linguistics, multiple algorithms are used for Natural Language Processing. When you use Google Translate, ML is used to detect which language you are typing, and then again to translate it into the language of your choice. Spelling correction is another example of ML. When you type on a cell phone’s touch screen, ML suggests the next word. ML is the underlying method for parsing text, word sense disambiguation, part-of-speech tagging and semantic analysis, all geared towards understanding what is written or about to be written in a document. On some of these tasks, ML methods can analyze language with accuracy close to that of humans.
Microsoft Kinect: Uses Decision Tree algorithms to locate the parts of your body and track their movement, direction, depth and speed when you play an Xbox game!
Driverless cars: Most of us hear about driverless cars now, with Google, Tesla and Uber leading the way, and those of us interested in them have heard about the DARPA challenge. But it started much earlier: Carnegie Mellon, one of the leaders in ML and AI research, had a driverless car in the 90s that drove across America using blurry cameras. And it used a Multilayer Perceptron (a type of neural network). In the nineties!! Every driverless car project involves an extensive amount of ML, in combination with technologies like LiDAR for 3D mapping of surrounding objects.
Search Engines: Based on one’s query, what are the relevant results to return? The classification of results into relevant and not relevant is often done by the Naïve Bayes algorithm.
Siri and other speech recognition systems: It is interesting to note that Siri, the assistant in one of the most advanced phones, the iPhone, uses a theory that traces its origins back at least 100 years!! Siri uses a machine learning model called the Hidden Markov Model (HMM). It is based on a theory published by the Russian mathematician Andrei Markov, who applied probability theory to poetry. It is about how the words or letters in a sentence follow each other. The model uses the observations (the spoken sounds) to infer the most likely words from a list of written words, relying on the probability of the next word given the previous word. So, Siri has a Russian connection!!
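The idea of "the probability of the next word given the previous word" can be sketched with a plain first-order Markov chain over words. Note the simplification: this is not a full Hidden Markov Model (there are no hidden states and no audio observations) and it is certainly not Siri's implementation. The corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up corpus; a real system would use vastly more text.
CORPUS = "the cat sat on the mat the cat ate the fish"

def build_transitions(text):
    words = text.split()
    transitions = defaultdict(Counter)
    # Count how often each word is immediately followed by each other word.
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions

def next_word_probabilities(transitions, word):
    counts = transitions[word]
    total = sum(counts.values())
    # Turn raw follow-up counts into probabilities.
    return {w: c / total for w, c in counts.items()}

transitions = build_transitions(CORPUS)
probs = next_word_probabilities(transitions, "the")
print(probs)
print(max(probs, key=probs.get))
```

In this tiny corpus "the" is followed by "cat" twice and by "mat" and "fish" once each, so "cat" is the most probable next word. A speech recognizer combines probabilities like these with how well each candidate word matches the sound it heard.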
Cell phone calls: The same HMM is used whenever one makes a call on a cell phone. When the stream of bits is sent over the network, the model uses prediction probabilities to fill in missing or corrupted bits, which makes calls sound better. This works only if not too many bits are missing. Looks like my T-Mobile doesn’t use the technology much.
Missiles: A relative of the HMM called the Kalman filter is used in the navigation of missiles to keep them on the expected trajectory! And the same Kalman filters are used in Economics when economists need to smooth out noise in data points collected over a period of time.
Even though self-driving cars have to use laser-based technologies for obstacle detection, the same Kalman filters are used to track objects, while other ML technologies actually detect and avoid them!
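To make the tracking idea concrete, here is a minimal one-dimensional Kalman filter that estimates a steady value from noisy readings. It is a sketch under strong assumptions (a state that does not change, scalar measurements, and noise variances picked by hand), not the filter used in any real missile or vehicle. The readings are made up.

```python
def kalman_1d(measurements, process_var=1e-5, measurement_var=0.1):
    estimate = 0.0   # initial state estimate
    error_var = 1.0  # initial uncertainty about that estimate
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1 - gain)
        estimates.append(estimate)
    return estimates

# Made-up noisy readings of a quantity whose true value is about 1.0.
readings = [0.9, 1.1, 1.05, 0.95, 1.0, 1.2, 0.8, 1.0]
smoothed = kalman_1d(readings)
print(round(smoothed[-1], 3))
```

The gain decides how much to trust each new reading: it starts high, when the filter knows little, and shrinks as the estimate becomes more certain, so later noisy readings move the estimate less and less.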
High frequency trading: One of the earliest uses of Neural Networks was stock price prediction and trading. Since investment banks were among the earliest users of high-powered computing, they were able to use Neural Network technologies first to suggest stocks to trade, and later to let computers decide what and how much to trade.
So how do these technologies all stack up? Based on our understanding of each of them, we have attempted to place them all together. The figure below connects each of these technologies to core Mathematics and Statistics and shows how they differ from each other in terms of the algorithms and use cases involved.
Some interesting real stories...
Walmart is said to have stocked beer and diapers together. The story goes that when dads were asked to go and get diapers, they also bought beer, just to remove the guilt of doing it. Not sure whether the story is true or not, but product placement in stores does follow patterns based on which products are bought together. A recommendation system for a brick and mortar store!!
One of the earliest cases of a machine learning algorithm also marks the beginning of the field of epidemiology. When Cholera hit parts of London in 1854, Dr. John Snow plotted the occurrences of Cholera on a map of the city and traced the cause to a single water pump! When that pump was sealed, Cholera went away. It is also the earliest case of the Nearest Neighbor algorithm being used in a real-life scenario. If you were wondering what this map is, it is the actual map used by Dr. Snow for his nearest neighbor analysis.
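In nearest-neighbor terms, this kind of analysis amounts to assigning each case to its closest pump and seeing which pump collects the most cases. The sketch below illustrates that idea only: the pump names and all coordinates are invented, not Dr. Snow's data, and straight-line distance is an assumption (walking distance along streets would be more realistic).

```python
import math
from collections import Counter

# Hypothetical pump and case locations on an (x, y) street grid.
PUMPS = {"pump_A": (0.0, 0.0), "pump_B": (5.0, 5.0), "pump_C": (10.0, 0.0)}
CASES = [(4.5, 5.2), (5.1, 4.0), (6.0, 5.5), (4.0, 6.0), (0.5, 0.2), (5.5, 4.8)]

def nearest_pump(case, pumps):
    # Return the name of the pump with the smallest straight-line distance.
    return min(pumps, key=lambda name: math.dist(case, pumps[name]))

def cases_per_pump(cases, pumps):
    # Tally how many cases fall closest to each pump.
    return Counter(nearest_pump(case, pumps) for case in cases)

tally = cases_per_pump(CASES, PUMPS)
print(tally.most_common(1)[0][0])
```

The pump with the largest tally is the prime suspect. Note that `math.dist` requires Python 3.8 or later.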
At a more personal level, when we went shopping, my sons, then 8 to 10 years old, used to wait outside the stores and try to predict – yes, actually predict – which car was driven by whom. They would watch a person walking out of the store and try to predict which car he/she was going to unlock and drive away. Over a period of time, their accuracy improved a lot – they learned with more data. Their predictions improved over time!! Of course, it helped that we lived in a very diverse community, and it is hard to discuss those results here without being accused of encouraging stereotypes and being politically incorrect – but really, pattern recognition is identifying stereotypes, at its best!!
A problem faced when training a machine learning algorithm is that it will make errors in its predictions in many cases, which is natural and OK. One should not try to “overfit” an algorithm to the available training data, so that it works perfectly with that data but not with future data. A human example of this (a real story) is when a little girl looked at a Latina baby in a shopping mall and told her mom, “hey, mom, look at the baby maid.” Her rule fit the few examples she had seen but did not generalize.
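Overfitting can be shown with a toy comparison. The sketch below pits a model that memorizes every training example, mistakes included, against a deliberately simple one-threshold rule. All the data is made up, and two training labels are intentionally wrong to stand in for the natural errors mentioned above.

```python
# Made-up 1-D data. True rule: the label is 1 when x is above 5.
# Two training labels (at x=2 and x=8) are deliberately wrong: noise.
TRAIN = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
TEST = [(1.8, 0), (2.3, 0), (3.5, 0), (4.4, 0),
        (6.5, 1), (7.7, 1), (8.2, 1), (9.5, 1)]

def memorizer(x):
    # Overfit model: copy the label of the closest training point.
    _, nearest_label = min(TRAIN, key=lambda p: abs(p[0] - x))
    return nearest_label

def accuracy(predict, data):
    # Fraction of examples the given prediction function gets right.
    return sum(predict(x) == label for x, label in data) / len(data)

def fit_threshold(data):
    # Simple model: one cut-off, chosen to be right as often as possible.
    candidates = [x for x, _ in data]
    return max(candidates,
               key=lambda t: accuracy(lambda x: int(x >= t), data))

threshold = fit_threshold(TRAIN)

def simple_rule(x):
    return int(x >= threshold)

print("memorizer:", accuracy(memorizer, TRAIN), accuracy(memorizer, TEST))
print("simple rule:", accuracy(simple_rule, TRAIN), accuracy(simple_rule, TEST))
```

The memorizer is perfect on the training data but gets only half of the new examples right, because it has faithfully learned the two noisy labels. The simple rule accepts a couple of errors on the training data and in exchange handles the new data correctly.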