According to R.E. Bellman (1978) artificial intelligence is the ’[automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning …. ‘. With this broad definition the scientific fields of artificial intelligence features Greek ancient roots. Nevertheless, the first working AI programs were developed in the 1950’s and from this time also the term artificial intelligence (AI) originates. An important aspect of AI is machine learning (ML) and the first ML methods were developed in the field of statistics around 1900.
But what is ‘Machine learning (ML)’? ML is the science, which researches, develops and uses algorithms, statistical/mathematical methods that allow computer systems to improve their performance on a specific task. Machine learning methods either explicitly or implicitly construct a statistical/mathematical model of the data based on a sample dataset called ‘training data’. Based on the underlying statistical/mathematical model ML methods can perform predictions or make decisions without being explicitly programmed to do so. This is the un-formal version of the famous and often quoted definition of ML by Tom M. Mitchell, 1997. ML techniques are utilized in a wide range of application ranging from spam detection in email accounts to computer vision and data analysis of spectroscopic data. Deep learning is a special type of machine learning methods, which were developed since 2006 and are often applied since 2010. These methods are composed of multiple layers of nonlinear processing units forming a high parameterized version of ML, which features a high degree of non-linearity. The difference between classical machine learning and deep learning is described in the respective sections.
Like stated above machine learning is defined by Tom M. Mitchell (1997) as ‘a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E’. In order to make this clearer, we like to link it to the most prominent machine learning task. This often tested machine learning task is the following: A computer program should classify hand written digits (MNIST database of handwritten digits). In that respect the input of the model are images of size 28x28 pixels and the output of the model is one of the digits 0, 1, …, 8, 9. This task (T) is visualized above and it gives an instructive example of the working of ML methods for image recognition. The experience E are the know ground truth for the trainings data, e.g. the known connection between the training images and the numbers. The performance measure is some error rate between the ML prediction and the ground truth. We utilize these ML and Deep Learning methods similarly to elucidate the difference between bio-medical conditions of samples, which are characterized by spectral or image measurements.
In classical machine learning (CML) the link between the images and higher information about this data, e.g class labels, is carried out in a two-step procedure. In the first step image features are extracted. The concrete types of image features are designed by a researcher using his/her knowledge and intuition about the samples and data at hand. Therefore human intervention is needed for this feature extraction. After the features are extracted the (multivariate) dataset is analyzed using easy classification models, like linear discriminant analysis (LDA).
This procedure is sketched in the figure above. A multimodal image is used as input image and the desired output is an image with three gray levels: one indicating background pixel and the other both gray levels represent different tissue classes. Therefore the machine learning task is to perform the translation between both images and is called semantic segmentation. After the image features ware extracted the so called training data set is used to train the classification model, where a number of images together with their true tissue segmentation is known.
In the figure above a classical machine learning framework is visualized for the study of Heuke et al. (2016). The multimodal images (left side) are utilized to calculate statistical moments of the histogram in a local mode, which characterize the texture. In a region around a central pixel the histogram of the image values is calculated. Based on that first order statistical moments or derivate of them, like mean, standard deviation and entropy, are calculated and this is performed on every pixel of the image as central pixel. In that way feature images (mid part of the figure) are generated which are subsequently utilized to predict, which tissue regions are present at the central pixel. The results are false color images like shown on the right of the above figure, where green represents healthy epithelium, while red indicates cancerous areas.
Deep Learning is a special version of machine learning, which is inspired by the working of the human brain in processing of visual data or other kinds of input. Deep learning for semantic segmentation is visualized in the figure above. The main property of deep learning is that the feature extraction is done by the method implicitly in combination with the construction of a classification model. This is the main advantage, because no human is needed to construct a suitable feature extraction. The drawback is that these models can’t be interpreted and the model has a lot of parameters (typically 1Mio parameters). Nevertheless, these models feature a unique potential to solve a large class of machine learning tasks, especially if a large amount of data is existing.
In the figure above a special kind of deep learning method, a convolutional neuronal network is visualized (CN24 architecture (https://www.github.com/cvjena/cn24) pre-trained using the ILSVRC2012 dataset). It adapts the processing of visual input by the human brain, by performing a large number of subsequent convolutional operations. Therefore, the main part of these networks consists of convolutional filters applied subsequently to generate feature images, which are subsequently converted into class prediction. Again a local prediction of tissue types based on multimodal images is generated using this network, which can be seen on right of the figure. The deep learning technique can be seen as non-linear translation tool from the multimodal images (left side) into the false color images characterizing the tissue types (right side).