In the previous post of this mini series, our data scientist Christian hinted at what distinguishes a Machine Learning (ML) algorithm from a more generic Artificial Intelligence (AI) solution. In this new episode, we will go into more detail on how Machine Learning works. Ready? Let’s start!
Hello Christian, here we are again, happy to discover this fascinating world! Shall we start with a brief summary of what we discussed last time?
Of course! As an example of a generic AI solution – a non-ML machine – we took the Deep Blue computer – capable of defeating the World Chess Champion in 1997! – stressing that the rules of chess could be hard-coded into it precisely because they are formal. As a consequence, Deep Blue played the game rather mechanically, without any sort of “creativity” or initiative of its own. Next, we talked about another task – namely, facial recognition – where a similar approach cannot be applied, essentially because the corresponding “rules” are not formal enough to be easily and unambiguously coded. For instance, it is hard to describe the shape of a nose with high precision without using too many words – and some degree of ambiguity would persist anyway, even if a long description were provided.
Since automatic facial recognition is effective nowadays, how does it work, then? How does a device – say, a computer – get taught to decide whether or not a picture portrays a given person? Let’s say Albert Einstein, for instance!
It goes as follows. The computer is provided with a) a (possibly large) set of pictures in the form of pixel arrays, b) a label associated with each image (“Yes, this picture does display Albert Einstein” or “No, the person portrayed in this image is not Albert Einstein”) and c) an architecture (or model) featuring parameters to be appropriately tuned. Such an architecture can be thought of as a dashboard, the parameters as a series of knobs on it that can be rotated, and the tuning – or training, in the ML jargon – as the procedure yielding the overall setup of knob positions that results in the best classification performance (the smallest number of mistakes in recognizing Einstein, for example). In this image-processing framework, it is the way the tuning is usually carried out that qualifies the whole approach as Machine Learning, in contrast to the brute-force strategy employed to teach Deep Blue to play chess. Rather than hard-coding formal rules into the computer and providing it with a fancy method to sift through them, the programmer now “simply” instructs the device to iteratively update all the model parameters by itself, via an appropriate algorithm applied to the input data (i.e., the pictures with their labels).
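The recipe just described – data, labels, a parametric model, and an iterative tuning procedure – can be sketched in a few lines of Python. This is a deliberately toy example (not an actual facial-recognition system!): the “pictures” are random 4-pixel arrays, the labels follow a made-up rule, and the model is a simple logistic classifier whose weights play the role of the knobs being rotated.

```python
import numpy as np

rng = np.random.default_rng(0)

# a) "pictures" as pixel arrays: here, 200 tiny random 4-pixel images
X = rng.normal(size=(200, 4))
# b) labels: 1 = "Einstein", 0 = "not Einstein" (a synthetic rule, for illustration)
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# c) the model: one weight per pixel plus a bias -- the "knobs" to tune
w = np.zeros(4)
b = 0.0

def predict(X, w, b):
    """Logistic model: probability that an image shows the target person."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# training: iteratively nudge each knob to reduce classification mistakes
lr = 0.5
for _ in range(500):
    p = predict(X, w, b)
    grad_w = X.T @ (p - y) / len(y)  # gradient of the log-loss w.r.t. the weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((predict(X, w, b) > 0.5) == y)
```

Note that the programmer never tells the model which pixels matter: the update rule discovers by itself that only the first two “pixels” determine the label.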
How long does the training last?
The training process stops as soon as the performance in classifying images stops improving. In other words, the computer is not given – from the very beginning – a path through the forest to be rigidly followed; rather, it is handed a machete (and some instructions on how to use it) with which it has to create the trail by itself. This is why the approach is an example of Machine Learning: the device learns the rules (i.e., the best parameter setup) by itself, starting from the data it is fed. Thus, you do not have to bother finding a concise yet unambiguous way to describe the nose of the acquaintance you would like the computer to recognize: just give the device pictures of her, and it will find its own rule(s) to describe that nose – together with all the other important features and, moreover, the ways each feature combines with the remaining ones.
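The stopping criterion just mentioned – halt once performance no longer improves – is commonly implemented as “early stopping”. Here is a minimal sketch; the `validation_error` function is a stand-in for evaluating a real model, and the `patience` threshold (how many non-improving rounds to tolerate) is a common but arbitrary choice.

```python
import random

random.seed(1)

def validation_error(step):
    # Stand-in for a real model evaluation: the error shrinks with training,
    # then plateaus around 0.05, with a little measurement noise on top.
    return max(0.05, 1.0 / (1 + step)) + random.uniform(0, 0.01)

best = float("inf")
patience, bad_rounds = 5, 0

for step in range(1000):
    err = validation_error(step)
    if err < best - 1e-4:           # meaningful improvement?
        best, bad_rounds = err, 0
    else:
        bad_rounds += 1
        if bad_rounds >= patience:  # no progress for a while: stop training
            break
```

Once the error curve flattens out, a few consecutive rounds without improvement trigger the break, long before the maximum number of iterations is reached.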
At first sight, the fact that the development of computer vision has taken longer than the programming of Deep Blue might seem somehow counterintuitive at this point, doesn’t it? If human intervention plays a smaller role in image recognition and the required human work is apparently lighter, why didn’t machines learn to process images before they learned to play chess?
Good point. Indeed, in this case programmers do not have to hard-code all the rules in detail; they “just” have to give the device a tool to find them and wait for it to do the rest of the job. The reasoning behind this question, however, overlooks a few issues that have been left in the background so far! But not to worry: we will cover them in the next post of this series.