How to build adaptive signal processing applications
S. Sochacki, PhD in Computer Science
A2D, ZA Les Minimes, 1 rue de la Trinquette, 17000 La Rochelle, FRANCE
Machine vision is the application of computer-aided vision to industrial production and research. High-speed mass production, the constant concern for quality improvement and the quest for economic gains are increasingly driving manufacturers to automate their production resources. Machine vision is one answer to these concerns for production control operations: machine vision systems enable high-speed control and ensure good repeatability (unlike a human operator, a machine never tires and its decision-making criteria do not vary). Another application is the management of object flows, for example scanning a barcode or postal address on a parcel to route it through a sorting center, or sorting apples by color before packing. Finally, machine vision can guide an autonomous mobile system (such as a robot or drone) when its movements cannot be determined in advance, for example when gripping objects on a conveyor belt; in this case, a camera mounted on the robot's head allows it to be positioned at the desired point.
This discipline is essentially based on pattern or character recognition. However, an image is an object containing a vast amount of information (shapes, colors, etc.), depending on the level of analysis: the image itself, each of its objects or each of its pixels. Since the aim is to instrument part of human vision, we must ask how image processing can be automated. Today, a large number of computer operations can be applied to an image. The problem is that these operations depend on the goal being pursued: enhancing contrast does not involve the same actions as extracting the contours of objects. The same applies to the extraction of shapes from an image or, quite simply, to image denoising. The choice of operators, and their sequencing, therefore depends on the data available and the goal sought.
These questions do not arise only in the context of pattern and character recognition: any computer vision application raises them. There are a number of methods, approaches, algorithms, etc., that enable experts in this field to develop applications specific to a given problem or field of application. The drawback is that, given the complexity of the image object and the specificity of the applications, it is often necessary to redevelop an entire application for each new problem.
Our work is based on the assumption that any image processing application can be represented as a chain of unitary operators. These operators cover all pre-processing steps (correction of image illumination, orientation, scale, etc.), denoising (of color, grayscale or black-and-white images), concentration of the so-called useful data (thresholding, binarization, etc.), segmentation and attribute computation on the elements resulting from segmentation, usually followed by a classification and decision step. Denoising is not counted among the pre-processing operations because of its complexity: it is quite possible to carry out a series of denoising operations (denoising a color image, converting it to grayscale, denoising the grayscale image, binarizing, denoising the binary image) or to correct different sources of noise (noise linked to the document itself, noise linked to the digitization system, etc.).
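As a minimal sketch of this decomposition (the operator names and the use of Python are our own illustrative assumptions, not part of the original formalism), such a chain can be expressed as an ordered list of unitary operators applied one after the other:

```python
# Minimal sketch of a linear processing chain built from unitary operators.
# The operator names and image representation are illustrative assumptions;
# each operator maps one intermediate result to the next.
from typing import Any, Callable, List

Operator = Callable[[Any], Any]

def run_chain(image: Any, chain: List[Operator]) -> Any:
    """Apply each unitary operator in order and return the final result."""
    result = image
    for op in chain:
        result = op(result)
    return result

# Hypothetical chain: pre-processing -> denoising -> binarization ->
# segmentation -> attribute computation -> classification/decision.
# chain = [correct_illumination, denoise, binarize, segment,
#          compute_attributes, classify]
# decision = run_chain(input_image, chain)
```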
As stated above, in the majority of cases a new image processing application has to be developed in its entirety for each new problem, which means that the sequences have to be rebuilt with new operators. The processes are different, but they play the same functional role: an attribute computation based on Zernike moments serves exactly the same purpose as one based on Fourier moments. The same applies to the technical means of performing denoising (whatever the method, the aim is to remove noise), classification (whatever the method, the aim is to assign a class to the attributes) and so on. If the functional schema remains the same, can we imagine providing the system with sets of operators, a sort of toolbox for each stage in the processing chain? A system that analyzes operator outputs and loops between stages would then provide a dynamic, adaptive processing chain, resolving a number of constraints.
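One way to picture this toolbox idea, sketched below under our own assumptions (the stage names and candidate operators are hypothetical placeholders), is a mapping from each functional stage to a set of interchangeable operators from which a concrete chain is assembled:

```python
from itertools import product

# Sketch of a per-stage toolbox: each functional stage offers several
# interchangeable operators (e.g. Zernike vs. Fourier moments for the
# attribute stage). The names below are hypothetical placeholders.
toolbox = {
    "denoising":      ["median_filter", "gaussian_filter", "nl_means"],
    "binarization":   ["otsu_threshold", "adaptive_threshold"],
    "attributes":     ["zernike_moments", "fourier_moments"],
    "classification": ["knn", "svm", "mlp"],
}

def enumerate_chains(toolbox):
    """Yield every concrete chain obtained by picking one operator per stage."""
    stages = list(toolbox)
    for combo in product(*(toolbox[s] for s in stages)):
        yield dict(zip(stages, combo))

# An adaptive system explores this space of chains instead of freezing a
# single sequence of operators a priori.
```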
Our main hypothesis is that any image processing chain should operate in a loop. The main question is therefore the exit criterion of the processing loop. In such a formalism, the looping criterion must relate to the information resulting from processing, typically the labeling decision. It therefore seems clear to us that, to justify looping, the quality of the decision must be estimated and used as the criterion. Taken alone, this question admits numerous solutions for producing both a decision and an estimate of its quality, but our reasoning implies that the quality of the decision is linked to the quality of the data, the quality of the values characterizing this data and the quality of the intermediate decision or labeling systems. Consequently, nothing prevents the implementation of different levels of looping, to obtain a system as described by Fayyad [1][2] in the work that laid the foundations of "Data Mining" and "Knowledge Discovery in Databases" (KDD), illustrated in Figure 2-2.
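Read literally, this hypothesis leads to a loop whose exit test is an estimated decision quality. The following sketch is only our illustration of that principle, assuming hypothetical helpers `propose_chains`, `run_chain` and `estimate_quality`:

```python
# Sketch of the looping principle: propose operator combinations, run the
# chain, and stop as soon as the estimated decision quality is good enough
# (or when no candidate combination is left).
def adaptive_process(image, propose_chains, run_chain, estimate_quality,
                     quality_threshold=0.9):
    best_decision, best_quality = None, float("-inf")
    for chain in propose_chains():
        decision = run_chain(image, chain)
        quality = estimate_quality(decision)
        if quality > best_quality:
            best_decision, best_quality = decision, quality
        if quality >= quality_threshold:   # exit criterion of the loop
            break
    return best_decision, best_quality
```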
Fayyad’s model, adapted to our problem, can be represented by the following figure.
[Figure: Fayyad's model adapted to our problem.]
Fayyad's complete approach may be intellectually appealing, but in practice it appears very difficult to implement because of the highly non-linear nature of the various processes. Rather than relying on the internal management of a fully automatic processing system, Régis Clouard [3] relies on a strong dependence on the expert for the production of an image processing chain.
According to him, this discipline has reached a certain level of maturity in terms of the methods developed, but it currently lacks generic modeling and automatic application construction tools, hence the almost systematic recourse to image processing experts. In his model, the complexity of the analysis stems, first, from the fact that image processing applications are always dedicated and, second, from the fact that the choice of operations is often if not always left to the image processing expert. Under these two constraints, producing the image processing system relies on close collaboration between the application expert and the image processing expert, but induces two problems:
These problems are typical of linear decision chains. Our aim is naturally to remedy them by allowing intermediate looping (the classic "feedback" aspect of control theory), by reconsidering a choice of operators within the loop (the non-linear aspect), or even by trying a different combination of operators if the time remaining under the application's constraints allows. Given the need to loop back to any stage of the chain at any time, our approach is necessarily dynamic and non-linear.
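One way to make this intermediate looping concrete, as a sketch under our own assumptions (the stage description, thresholds and `estimate_quality` helper are hypothetical), is per-stage backtracking under a time budget: if the quality estimated after a stage is too low, the system retries that stage with another operator before moving on:

```python
import time

# Sketch of intermediate (per-stage) looping: each stage tries its candidate
# operators in turn and keeps the best output, stopping early when a
# stage-specific quality threshold is met or the time budget runs out.
def run_adaptive_chain(image, stages, estimate_quality, time_budget_s=1.0):
    """`stages` is an ordered list of (stage_name, [operators], threshold)."""
    deadline = time.monotonic() + time_budget_s
    data = image
    for name, operators, threshold in stages:
        best_out, best_quality = None, float("-inf")
        for op in operators:
            out = op(data)
            quality = estimate_quality(name, out)
            if quality > best_quality:
                best_out, best_quality = out, quality
            if quality >= threshold or time.monotonic() > deadline:
                break                      # accept this stage or give up early
        data = best_out
    return data
```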
Even if, from a conceptual point of view, this non-linear chain and its loops seem to answer our specifications, its realization seems very complex: how can we make the system "decide" to go back on one processing step and try another? The literature offers a basis in methods such as statistics, decision theory, fuzzy logic and evidence theory. The latter has the advantage of explicitly taking uncertainty into account, i.e. it is a formalism that integrates missing or incomplete information.
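Evidence theory represents each source of information by a mass function defined over subsets of a frame of discernment, and merges sources with Dempster's rule of combination, which renormalizes by the conflict between them. The sketch below uses toy masses of our own choosing, purely for illustration:

```python
from itertools import product

# Dempster's rule of combination for two mass functions defined over
# subsets (frozensets) of a frame of discernment.
def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb            # mass falling on the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Toy example: two sources expressing partial ignorance about classes A and B.
A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.6, A | B: 0.4}                  # 0.4 left on "A or B" (ignorance)
m2 = {B: 0.3, A | B: 0.7}
print(dempster_combine(m1, m2))            # the largest share of mass ends up on A
```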
This establishes the formal framework for our work, which involves the dynamic selection of computational operators. As previously indicated, the selection or looping rules must be established on the basis of quality criteria estimated at the time the attribute is calculated and at the time the decision is made (Figure 2-4).
The most conventional approach to designing an image processing application is to choose a single classification method once and for all, a priori. However, in terms of performance (average rate of correct classifications), it seems more interesting to us to combine classification methods, thereby compensating for the errors of each one. There are several ways of approaching the combination of classifiers, including a dynamic one based on computing the local accuracy of each classifier. The local accuracy of a classifier, as defined by Woods [4] and Giacinto [5], measures how reliable that classifier's decision is in the neighborhood of the sample being classified. It is possible to integrate this measure during processing; in this way, we obtain the processing chain schematized in Figure 2-5.
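In the spirit of [4] and [5], local accuracy can be estimated on the validation samples closest to the pattern being classified, and the locally most accurate classifier then makes the decision. The sketch below assumes classifiers exposing a scikit-learn-style `predict` method and a labeled validation set; these are our assumptions, not the original implementation:

```python
import numpy as np

# Sketch of dynamic classifier selection by local accuracy: estimate each
# classifier's accuracy on the k validation samples nearest to x, then let
# the locally most accurate classifier label x.
def select_and_classify(x, classifiers, X_val, y_val, k=10):
    distances = np.linalg.norm(X_val - x, axis=1)
    neighbors = np.argsort(distances)[:k]        # k nearest validation samples
    local_accuracies = [
        np.mean(clf.predict(X_val[neighbors]) == y_val[neighbors])
        for clf in classifiers
    ]
    best = int(np.argmax(local_accuracies))      # locally most accurate classifier
    label = classifiers[best].predict(x.reshape(1, -1))[0]
    return label, local_accuracies[best]
```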
Whether for attribute computation or for classification, the decision to keep or discard the result of a processing step is based on a measurement of the quality of the data produced by that step. The impact of this data on the final decision is essentially linked to its quality and accuracy.
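The corresponding keep-or-loop rule can be summarized as follows; the two thresholds are illustrative values of our own, one applied when the attributes are computed and one when the decision is made:

```python
# Sketch of the keep/loop rule: a stage's result is retained only if the
# estimated quality of the data it produced is high enough; otherwise the
# chain loops back and tries another operator. Thresholds are illustrative.
def keep_result(attribute_quality, decision_confidence,
                attribute_threshold=0.7, decision_threshold=0.8):
    """Return True to keep the result, False to loop back on the stage."""
    return (attribute_quality >= attribute_threshold
            and decision_confidence >= decision_threshold)
```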
References
[1] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, "Knowledge discovery and data mining: towards a unifying framework," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, 1996.
[2] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, "From Data Mining to Knowledge Discovery in Databases," AI Magazine, vol. 17, no. 3, pp. 37-54, 1996.
[3] R. Clouard, "Incremental and opportunistic reasoning applied to the dynamic construction of image processing plans," 1994.
[4] K. Woods, W. P. Kegelmeyer and K. Bowyer, "Combination of multiple classifiers using local accuracy estimates," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
[5] G. Giacinto and F. Roli, "Adaptive selection of image classifiers," in ICIAP '97, Lecture Notes in Computer Science, 1997.