The use of artificial intelligence in situations where data cannot be clearly classified is enjoying increasing popularity nowadays. Artificial neural networks (ANN) are used in smartphones, autonomous cars and translation tools [1, 2, 11, 17]. Thanks to their ability to learn, based on searching for similarities between objects and generalizing them, they are able to deal with problems where very accurate classification is required [6, 8, 18, 19].
The article presents an application for recognizing objects in images using machine learning algorithms, whose task is to recognize the objects visible in an image and assign them the correct label [13, 16]. Particularly important here is the high ability to identify objects that were not previously included in the training set, which distinguishes neural networks from other algorithms.
Before the ANN can be used as a model trained with machine learning algorithms, the image must be properly processed, which can be divided into several stages as shown in Figure 1. Segmentation is the first activity carried out in the machine learning process for object recognition [3, 14].
Here the image is divided into parts that are related to each other in some way. The aim is to pre-isolate the areas that belong to a given object and its boundaries, and to limit the amount of unnecessary information sent to the next stage and stored in computer memory.
The next stage is the analysis of image features, which makes it possible to reveal and describe object properties that often remain unnoticed by the naked eye.
Object features can be grouped into several basic categories, e.g. geometric, non-geometric and topological. Which image features are analyzed is an individual choice, depending on the desired final result [4, 5, 10]. For the purposes of this algorithm, however, the most important aspect is the mathematical side of processing the analyzed image features, which yields so-called signatures and skeletons, i.e. one-dimensional functions representing the contours of an object. After these preliminary actions, object recognition is performed using artificial neural networks, in which deep learning methods can be applied.
ARTIFICIAL NEURAL NETWORKS
ANN is the name given to mathematical structures, and to their software or hardware implementations, consisting of individual elements called neurons that are capable of performing basic operations on their inputs.
The principle of the neuron can be represented as follows:

e = Σ (wi · xi) for i = 1 … n,    y = f(e)

where: xi - value of the i-th input signal, wi - weight factor of the i-th signal, n - number of input signals, e - total value of neuron stimulation, y - value at the neuron output, f - activation function.
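The neuron principle above can be sketched in a few lines of Python (a minimal illustration; the sigmoid used here is only one example of an activation function f):

```python
import math

def neuron(x, w, f):
    """Single artificial neuron: the weighted sum of the inputs is
    passed through an activation function f. Symbols follow the text:
    x - inputs, w - weights, e - total stimulation, y - output."""
    e = sum(wi * xi for wi, xi in zip(w, x))  # e = sum_i wi * xi
    return f(e)                               # y = f(e)

# Example with a sigmoid activation:
sigmoid = lambda e: 1.0 / (1.0 + math.exp(-e))
y = neuron([1.0, 0.5], [0.2, -0.4], sigmoid)  # e = 0.2 - 0.2 = 0.0
print(y)  # 0.5, since sigmoid(0) = 0.5
```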
After calculating the sum e from the weight factors wi and input signals xi, the result is passed through the activation function f, which should meet the following conditions:
The ReLU activation function was used, whose parameters are depicted in Figure 2. This type of activation function is currently the most widely used for training neural networks designed to recognize objects in images. This is due to its unbounded response for a positive signal and zeroing of the neuron value for a negative signal.
This approach means that not all neurons in the network are active, which protects it from overfitting and speeds up network learning. In addition, the derivative of ReLU is very simple to calculate, and its piecewise linearity enables the use of the error backpropagation method to correct the weight coefficients in the network.
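The two properties described above, the unbounded positive response and the trivially cheap derivative, can be seen directly in code:

```python
def relu(e):
    # Unbounded for a positive signal, zero for a negative one.
    return e if e > 0 else 0.0

def relu_derivative(e):
    # Piecewise-constant derivative, trivially cheap to compute,
    # which is what makes backpropagation with ReLU fast.
    return 1.0 if e > 0 else 0.0

signals = [-2.0, -0.5, 0.3, 1.7]
print([relu(s) for s in signals])             # [0.0, 0.0, 0.3, 1.7]
print([relu_derivative(s) for s in signals])  # [0.0, 0.0, 1.0, 1.0]
```

Neurons receiving negative signals output exactly zero, so they take no part in the computation, which is the sparsity that protects against overfitting.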
Neural network model
Creating a program for recognizing objects in an image requires the development of a mathematical model and a comprehensive approach, due to the large amount of data that must be processed and classified. One possible solution for recognizing objects in an image, given its effectiveness, is the use of an ANN in which suitable machine learning algorithms are implemented. The diagram of the neural network used in this program is depicted in Figure 3.
This is a feedforward network: information in this type of architecture moves only forward, Fig. 4. The network consists of several layers, which is a very common solution. The result is a complex structure with a large number of connections, prone to so-called overfitting. The network consists of an input layer, hidden layers and an output layer, which by their nature can be further divided into two groups:
The use of this algorithm makes it possible to accelerate the operation of the neural network, because the programmer acts as a teacher who knows what values to expect at the output of the neural network and judges the correctness of the algorithm's result. When the discrepancy is too large, the teacher supplies a suggested change in value that will allow correct recognition of the object and instructs the algorithm to perform the next iteration. The network learns knowing what result it should obtain, so the initially random values on individual neurons quickly settle at a level enabling the assumed network operation. The idea of the backpropagation method is shown in Figure 5.
This method is used to calculate the neuron values using mathematical formulas. The basis of its operation is knowledge of the result to be obtained at the output of the network: the error between the suggested value and the value obtained by the network is calculated, and it is corrected by changing the values on the neurons. This simple task becomes more complicated for a network consisting of several layers, which is the basic type used in more advanced artificial intelligence algorithms. In this case, the backpropagation algorithm sums the neuron errors in the hidden layer preceding the last modified layer, and only then corrects their values and the weights between them.
The operations of the algorithm can be represented as follows:
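A minimal Python sketch of the backpropagation procedure for a tiny 2-input, 2-hidden, 1-output network with sigmoid activations is given below. This is an illustration of the general scheme only, not the paper's exact network (which is larger and uses ReLU); the weights, learning rate and target value are assumptions chosen for the demonstration:

```python
import math

def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))

def backprop_step(x, target, w_hidden, w_out, lr):
    # Forward pass: hidden activations h, then the network output y.
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))

    # Output-layer error term (squared-error derivative through sigmoid).
    delta_out = (y - target) * y * (1.0 - y)

    # Hidden-layer error terms: the output error is propagated backwards
    # through the connecting weight, then scaled by the local derivative.
    delta_h = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
               for j in range(len(h))]

    # Weight corrections proportional to the error and incoming signal.
    w_out = [w_out[j] - lr * delta_out * h[j] for j in range(len(h))]
    w_hidden = [[w_hidden[j][i] - lr * delta_h[j] * x[i]
                 for i in range(len(x))] for j in range(len(h))]
    return w_hidden, w_out, y

# Train on a single example until the output approaches the target.
w_h = [[0.1, 0.2], [0.3, 0.4]]
w_o = [0.5, 0.6]
y = 0.0
for _ in range(2000):
    w_h, w_o, y = backprop_step([1.0, 0.0], 0.8, w_h, w_o, lr=0.5)
print(round(y, 2))  # approaches the target value 0.8
```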
OPTIMIZATION OF NETWORK MODEL OPERATION
The neural network model presented in section 3.2 consists of several basic elements. In addition to the typical neural network layers consisting of neurons and activation functions, it also has elements responsible for optimizing the network's operation, which ensure correct calculations and prevent overtraining despite the very large amount of data.
The model's norm1 function is responsible for local response normalization, carried out through a procedure of so-called attenuation, whose task is to normalize unbounded neuron activity. Two types of normalization are possible, each of which tends to strengthen the excited neuron and suppress its neighbors within a range of limited values. The method searches for the strongest neuron responses and normalizes the responses of neighboring neurons, making the selected neuron even more sensitive to object features. If all the responses of neighboring neurons are comparably large, the function limits the values in all neurons in the given channel, because it does not consider any of them to be particularly sensitive. In addition to normalization, this function also limits the number of neurons used in learning.
There are two types of normalization: in the same channel (a group of neighboring neurons) and between the channels, which involves considering the neighborhoods in three, but not in two dimensions.
Below is the formula describing the process of two-dimensional normalization, i.e. for neighboring groups of neurons:

b(i, x, y) = a(i, x, y) / [ k + α · Σ a(j, x, y)² ]^β,  with the sum over j from max(0, i − n/2) to min(N − 1, i + n/2)

where: b(i, x, y) - normalized output of kernel i (weight summation place) at position (x, y), a(i, x, y) - kernel source output at position (x, y), N - total number of neurons, n - normalization channel size, k, α, β - function hyperparameters.
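The normalization of one group of neighboring channels at a single spatial position can be sketched as follows (an AlexNet-style formulation; the hyperparameter values k, alpha and beta are assumptions for illustration):

```python
def local_response_norm(a, n=2, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel local response normalization at one spatial
    position: each channel activation a[i] is divided by a term that
    grows with the squared activations of its n neighboring channels,
    so a strong response suppresses its neighbors."""
    N = len(a)
    out = []
    for i in range(N):
        lo = max(0, i - n // 2)
        hi = min(N - 1, i + n // 2)
        s = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * s) ** beta)
    return out

acts = [1.0, 5.0, 1.0, 0.5]
out = local_response_norm(acts)
print(out)  # every activation is attenuated, neighbors of the strong channel most
```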
After normalization, the data is sent to a statistical filter, so-called max pooling, which extracts the maximum value from the mask and reduces the number of calculations in subsequent layers. Applying this filter to non-overlapping subregions passes on only the largest values, which significantly reduces the amount of data necessary for further processing without reducing the effectiveness of the algorithm. The simplicity of this solution for a 2x2 masking filter is shown in Figure 6.
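The 2x2 max pooling filter over non-overlapping subregions can be sketched as:

```python
def max_pool_2x2(image):
    """2x2 max pooling: only the largest value in each non-overlapping
    2x2 mask is passed on, quartering the amount of data sent to the
    next layer (assumes even image dimensions)."""
    h, w = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

img = [[1, 3, 2, 0],
       [4, 2, 1, 1],
       [0, 1, 5, 6],
       [2, 2, 7, 8]]
print(max_pool_2x2(img))  # [[4, 2], [2, 8]]
```

A 4x4 input is reduced to a 2x2 output, yet the strongest responses in each subregion survive unchanged.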
Before the information reaches the network output, it still has to pass through the Softmax function. Its task, as a transfer function, is to transform the input vector into an output vector with values normalized between 0 and 1, so that the output layer receives information that can be interpreted as probabilities, which helps determine the network's percentage accuracy during analysis. An example of how this function works is shown in Figure 7.
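The Softmax transfer function can be written as:

```python
import math

def softmax(v):
    """Maps an input vector to values in (0, 1) that sum to 1 and can
    be interpreted as class probabilities. Subtracting the maximum
    first is a standard numerical-stability trick and does not change
    the result."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print([round(p, 3) for p in probs])  # sums to 1; the largest score gets the highest probability
```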
Tools such as Optimal Brain Damage and Optimal Brain Surgeon help in adjusting the network structure; their task is to detect and remove neurons not involved in the network's operation. The most advanced Google algorithms for object recognition use the patented Dropout method, which works similarly to the above-mentioned tools.
Pruning the network not only ensures a faster learning process, but also increases accuracy, once again making the network more resistant to overfitting. One should also remember to keep the number of hidden layers optimal, because too many of them will overwhelm the network, while too few may not be complex enough to solve the task.
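The general idea behind Dropout can be sketched as follows (a common "inverted dropout" formulation, shown for illustration only, not Google's patented implementation):

```python
import random

def dropout(activations, p=0.5, rng=random):
    """During training each neuron is zeroed with probability p;
    survivors are scaled by 1/(1-p) so the expected activation is
    unchanged. Temporarily removing random neurons prevents the
    network from relying on any single one, reducing overfitting."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

random.seed(0)
out = dropout([1.0] * 10, p=0.5)
print(out)  # roughly half of the neurons are zeroed, the rest scaled to 2.0
```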
THE PROCESS OF LEARNING A NEURAL NETWORK
It is extremely important to set the learning rate at the appropriate level in the program [15, 17]. This factor defines the maximum amount by which a neuron weight can change. When this value is too low, the network learning process is disproportionately long, because many more iterations are required to make the teacher-specified corrections. When it is too high, the weight corrections performed on the neurons are so large that they prevent the correct learning process.
Weight corrections with large values mean that the calculated neuron values, despite subsequent iterations, cannot fit into the network scheme, causing them to remain effectively random, as in the initial phase of the learning process. With the right learning rate chosen (Fig. 8), the network learns correctly and after a few epochs already achieves the required recognition accuracy. Further learning has no greater impact on improving the recognition accuracy.
If the value of this factor is set too high (Fig. 9), the network learning process is very fast, which is not always positive, because the network can stop at a local minimum and thus never achieve maximum recognition accuracy. Incorrect selection of weights in the learning process can also extend the learning time.
In the final phase, when the rate has decreased due to the use of the learning-rate drop function, the network is stuck at minimum accuracy values and does not work properly. Too small a weight correction, as shown in Figure 10, unnecessarily extends the duration of the learning process.
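The three learning-rate regimes described above can be demonstrated on a toy error function (the function E(w) = w² and the rate values here are assumptions chosen only for the demonstration, not the paper's settings):

```python
def descend(lr, steps=20, w=10.0):
    """Gradient descent on the toy error function E(w) = w^2,
    whose gradient is 2w; returns the remaining error |w|."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return abs(w)

print(descend(lr=0.4))   # well chosen: the error shrinks practically to zero
print(descend(lr=0.001)) # too low: barely moves after 20 iterations
print(descend(lr=1.1))   # too high: the weight oscillates and diverges
```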
As the above calculation results show, the worst case for the network is setting the weight correction too high, which can cause the mean square error to accidentally drop below the threshold value even though the weight distribution on the neurons does not yet guarantee correct network operation. To prevent such a situation, the momentum rule was used in the developed software application. It works by coupling the weight correction values from the previous and current iterations through the factor α. Using momentum in this way causes the weight correction (learning rate) to start from a certain high level and then decrease with each iteration as the output approaches the value specified by the teacher. This approach lengthens the learning process but refines it, and at the same time protects against accidental termination of the program due to too large a weight correction.
The use of the parameter α is represented by formula (4), which describes the change of weights used in the gradient method:

Δwi(t) = −η · ∇E + α · Δwi(t−1)    (4)

where: Δwi(t) - modification of weight i after iteration t, η - coefficient of change in the value of neuron weights (learning rate), ∇E - gradient of the error function (gradient descent term), α - factor coupling the weight corrections from the previous and current iteration.
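Formula (4) translates directly into code; the toy error function and the coefficient values below are illustrative assumptions:

```python
def momentum_step(w, grad, prev_delta, lr=0.1, alpha=0.9):
    """Weight update of formula (4): the current correction couples
    the gradient term -lr*grad with the previous iteration's
    correction scaled by the momentum factor alpha."""
    delta = -lr * grad + alpha * prev_delta
    return w + delta, delta

# Toy run on E(w) = w^2 (gradient 2w): the correction starts large
# and decays as the weight approaches the optimum.
w, delta = 5.0, 0.0
for _ in range(100):
    w, delta = momentum_step(w, grad=2 * w, prev_delta=delta)
print(w)  # close to the optimum w = 0
```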
The network learning process depends on the correct use of all the above algorithms and methods. For the network to work properly, it must be provided with a large amount of training data, while using as few neurons and connections between them as possible to avoid overtraining.
RESULTS OF NUMERICAL CALCULATIONS
Training a neural network requires operating on a large set of input data. Creating such a large image database is extremely time-consuming, so in the first stage of the work the CIFAR-10 image database available on the Internet was used (Fig. 11). Research and simulation using the designed program were carried out in the MATLAB environment. The database contains 60 000 images of size 32x32 pixels, assigned to 10 basic categories, on the basis of which the network is pre-trained. In the first stage, after loading the data into the network, they were scaled up in the input data layer, which consists of two basic parts:
After the training images are uploaded and the network is trained, it is able to recognize objects in images with high accuracy, assigning them the right label. The network compares its recognition result with the real image: when the result is correct, it displays the label above the image in green, and otherwise in red. An example of how the application works is shown in Figure 11.
Adding the ability to scale the data made it possible to use the same database for a neural network with a slightly different structure, adapted to recognize images after Gaussian noise was added in order to increase the number of images. Thanks to initial testing of the network on lower-resolution images, the network worked faster and it took much less time to determine its structure and select the appropriate learning rate. This time, the percentage recognition correctness determined by the network is displayed above the images, Fig. 12.
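Augmenting a training set with Gaussian noise can be sketched as follows (the noise level sigma is an assumption for illustration; the paper does not state its value):

```python
import random

def add_gaussian_noise(image, sigma=10.0, rng=random):
    """Produces a new training image by adding zero-mean Gaussian
    noise to each pixel, clamped to the valid 0-255 range. Each call
    yields a different variant, multiplying the training set size."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma)))
             for px in row] for row in image]

random.seed(1)
img = [[100, 150], [200, 250]]
noisy = add_gaussian_noise(img)
print(noisy)  # same shape, pixel values perturbed within [0, 255]
```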
After testing the network on simpler cases, a subsequent application was presented, this time involving object detection. The proven network structure remained unchanged (except for changing the number of neurons in the output layer to three, due to the three categories of objects), while the database was replaced with one consisting of 1000 images of pedestrians, cars and road signs. The MATLAB function rcnnObjectDetector is responsible for object detection in the images; the artificial neural network from the previous sections was loaded into it and trained on the new database.
Figures 13 and 14 show the calculation results obtained from the program described above for the recognition of pedestrians, cars and road signs in images. After the learning process ended, the network correctly detected and recognized cars, pedestrians and traffic signs, Fig. 14.
To sum up, the application presented above enables the recognition of different objects in images, applying machine learning algorithms for classification with artificial neural networks. A neural network is an excellent tool for recognizing objects in images, but one should remember the appropriate selection of its model. The proper selection of the number and types of layers, the number of neurons, the activation functions and the value of the learning rate is also extremely important. The interface of the computer application is based on the Deep Learning Toolbox, which enables easy uploading of previously designed neural networks and databases, and selecting the percentage of data to be used for network learning, object recognition and validation.
The presented experimental results show the advantages of the software used in ANN processing. Knowledge of the neural network architecture makes it possible to decrease the learning time and to recognize objects in images in real time in the final system. In order to use an ANN for recognition in real systems, a large number of known reference objects is necessary, which can be used in the image learning process. The recognition model presented here can be adapted to the needs of recognizing any objects.
Hryvachevskyi, A., Prudyus, I., Lazko, L. and Fabirovskyy, S., "Improvement of segmentation quality of multispectral images by increasing resolution," in 2nd International Conference on Information and Telecommunication Technologies and Radio Electronics, UkrMiCo 2017 - Proceedings 8095371 (2017). https://doi.org/10.1109/UkrMiCo.2017.8095371
Kaniewski, P., Leśnik, C., Serafin, P. and Łabowski, M., "Chosen Results of Flight Tests of WATSAR System," in 17th International Radar Symposium IRS 2016, 1-5 (2016).
Konatowski, S., "The development of nonlinear filtering algorithms," Przeglad Elektrotechniczny, 86 (9), 272-277 (2010).
Osowski, S., Sieci neuronowe do przetwarzania informacji [Neural Networks for Information Processing], Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa (2006).
"Parallel Neural Network Training with OpenCL," https://bib.irb.hr/datoteka/584308.MIPRO_2011_Nenad.pdf (accessed November 2018).
Prudyus, I. and Hryvachevskyi, A., "Image segmentation based on cluster analysis of multispectral monitoring data," in Modern Problems of Radio Engineering, Telecommunications and Computer Science, Proc. of the 13th International Conference TCSET 2016, 226-229 (2016). https://doi.org/10.1109/TCSET.2016.7452020
Rajkowski, A., Wykorzystanie algorytmów uczenia maszynowego do rozpoznawania obiektów na obrazach [The Use of Machine Learning Algorithms for Object Recognition in Images], diploma thesis, WAT, Warszawa (2019).
Tadeusiewicz, R. and Szaleniec, M., Leksykon Sieci Neuronowych [Lexicon of Neural Networks], Wydawnictwo Fundacji Projekt Nauka, Wrocław (2015).
Vitalii, B., Kirichenko, L. and Radivilova, T., "Classification of multifractal time series by decision tree methods," in 14th International Conference on ICT in Education, Research and Industrial Applications. Integration, Harmonization and Knowledge Transfer, CEUR Workshop Proceedings, 457-460 (2015).
Wajszczyk, B. and Biernacki, K., "Optimization of the efficiency of search operations in the relational database of radio electronic systems," in Proc. of SPIE - The International Society for Optical Engineering, 2017 Radioelectronic Systems Conference, 107150H. https://doi.org/10.1117/12.2317741