
Deciding in the presence of imprecision

How to decide in the presence of imprecision?

S. Sochacki, PhD in Computer Science

A2D, ZA Les Minimes, 1 rue de la Trinquette, 17000 La Rochelle, FRANCE

1 Introduction

Keywords: imprecision, uncertainty, conflict, AI Act, explainability

How does one decide in computing? What does an automated system base its decisions on? Very often, such a system relies on decision rules (often probabilistic), or on a pignistic approach (a level entirely dedicated to decision-making, clearly separated from data modeling).

Although a very large proportion of image processing applications for the general public are geared towards the enhancement, restoration or processing of photographs, in an industrial context, the target applications are image-based decision-making. While the classes of information used for this purpose are well known (texture, shape, color), they are not all of the same interest in decision support. Various forms of learning have therefore been developed to identify the best possible combination of information. Dozens of solutions exist to answer this question, and some even combine several of these approaches to optimize decision-making as much as possible.

However, while not all information classes, and more specifically not all information extraction operators (Zernike moments or Haralick attributes, to name but two), are of equal interest for decision-making purposes, the information extracted is judged to be of equal quality. And yet, image processing comes in part from signal processing, and thus from the world of physics and, more specifically, metrology. So we’re a long way from a binary world, where information is true or false, but certain as to its assignment. In metrology, a measurement only exists in relation to a criterion of precision or tolerance. This criterion quantifies the range of values within which the measurement lies with a good degree of certainty. This point of view, typical of needle-based metrology, is often forgotten in the transition to digital technology. And yet, every scientist remains sceptical when announcing a result with more than three digits after the decimal point (when this precision has not been justified beforehand).

A2D is committed to developing applications using decision-making algorithms to meet these new needs, while respecting the constraints linked to trusted AI (AI Act ^1, ISO/IEC 42001 ^2) and energy sobriety. In practice, we are designing data processing and analysis applications to understand outdoor scenes, digitized with a view to detecting precise areas of interest, depending on business uses.

We also endeavor to provide, at all times during the processing of information, an explanation of the results obtained and the choices proposed by the machine.

2 Probabilistic approach

Since our aim is to propose a new extension to decision-making tools, we felt it was important to stay within the framework of mathematically-based tools, as opposed to rule-based tools. As probabilities are at the heart of these approaches, we have chosen to start this study from this formalism.

Classically, we call probability the evaluation of the likelihood of an event. It is represented by a real number between 0 and 1, indicating the degree of risk (or chance) that the event will occur. In the typical example of a coin toss, the event “tails side up” has a probability of 1/2, which means that after tossing the coin a very large number of times, the frequency of occurrence of “tails side up” will tend towards 1/2.

In probability, an event can be just about anything you can imagine, whether it happens or not. It could be that the weather will be fine tomorrow, or that a roll of a die will produce the number 6. The only constraint is to be able to verify the event; it’s perfectly possible to check whether the weather will be fine tomorrow (an event that we measure personally, the following day, according to our tastes in terms of sunshine, temperature, etc.), or whether we manage to obtain the number 6 with a die.

We’ll say that an event with a probability of 0 is impossible, while an event with a probability of 1 is certain.

Note that probability is a tool for predicting real-world events, not for explaining them. It’s about studying sets by measuring them.

2.1 Definitions

An experiment is said to be random when its outcome cannot be predicted. Let’s define \mathcal{E}, a random experiment whose result \omega is an element of \Omega, the set of all possible outcomes, called the universe of possibilities or frame of reference.

Let \mathcal{P}(\Omega) be the set of parts (power set) of \Omega. An event is then a logical proposition linked to an experiment, relative to its result.

Let \mathcal{A} be the set of events, a subset of \mathcal{P}(\Omega). For all A\in\mathcal{A} and B\in\mathcal{A}:

  • A \cup B designates the realization of A or B.
  • A \cap B denotes the realization of A and B.
  • \bar{A}=\Omega\backslash A denotes the opposite of A.
  • \Omega is the certain event
  • \emptyset is the impossible event

When the reference frame \Omega is finite, \mathcal{A} designates the set of all parts of \Omega, usually noted 2^\Omega. When the reference frame is \mathbb{R} (or an interval of \mathbb{R}), the notion of σ-algebra (tribe) makes it possible to define \mathcal{A}:

  • \mathcal{A}\subseteq\mathcal{P}(\Omega)
  • \emptyset\in\mathcal{A}
  • \forall\{A_1,\ldots,A_n\}\in\mathcal{A}, \bigcup_iA_i\in\mathcal{A} and \bigcap_iA_i\in\mathcal{A}
  • \bar{A}\in\mathcal{A}, \forall A\in\mathcal{A}

A probability measure on a measurable space (\Omega,\mathcal{A}) is a function from \mathcal{A} to [0,1] such that:

  • P(\Omega)=1
  • For all incompatible events A and B (i.e. A\cap B=\emptyset), P(A\cup B)=P(A)+P(B).

The number P(A) quantifies how likely the event A\subseteq\Omega is.

In the case of a die throw, we can state that \Omega=\{1,2,3,4,5,6\} represents the 6 faces of the die. Thus, the probability of obtaining a number between 1 and 6 is certain, and therefore equal to 1. We’ll write P(1\cup 2\cup 3\cup 4\cup 5\cup 6)=P(\Omega)=1.

Similarly, the probability of obtaining 1 (or 2, or 3, or 4 etc.) is 1/6. So, P(1)=P(2)=...=P(6)=\frac{1}{6}.

Related to P, in the case where \Omega is finite, a probability distribution is defined as a function p of \Omega in [0,1], such that:

(2.1.1)   \begin{equation*} P(A)=\sum_{\omega\in A}p(\omega) \;\;\; \forall A\subseteq\Omega \end{equation*}

with the normalization condition:

(2.1.2)   \begin{equation*} \sum_{\omega\in\Omega}p(\omega)=1 \end{equation*}

Note that P(A)+P(\bar{A})=1.

2.2 Bayes’ rule

The introduction of a statistical model leads to a decision theory directly applicable, in principle, to the problem of classification. This application is essentially based on Bayes’ rule, which makes it possible to evaluate probabilities a posteriori (i.e. after the actual observation of certain quantities) knowing the conditional probability distributions a priori (i.e. independent of any constraints on the observed variables). The probabilities required to make a decision according to Bayes’ rule are provided by one (or more) statistical classifiers, which indicate the probability of a given individual belonging to each class of the problem.

In probability theory, Bayes’ theorem relates conditional probabilities: given two events A and B, it allows us to determine the probability of A knowing B, if we know the probabilities of A, of B, and of B knowing A. To arrive at Bayes’ theorem, we start from one of the definitions of conditional probability: P(A|B)P(B)=P(A \cap B)=P(B|A)P(A), noting P(A \cap B) the probability that A and B both occur. Dividing both sides by P(B), we obtain P(A|B)=P(B|A)P(A)/P(B), i.e. Bayes’ theorem.

Each term in Bayes’ theorem has a common name. The term P(A) is the a priori probability of A. It is “prior” in the sense that it precedes any information about B; P(A) is also called the marginal probability of A. The term P(A|B) is called the a posteriori probability of A knowing B (or of A under condition B). It is “posterior” in the sense that it depends directly on B. The term P(B|A), for a known B, is called the likelihood function of A. Similarly, the term P(B) is called the marginal or a priori probability of B.
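
As a minimal numerical illustration of the rule above, the short Python sketch below computes a posterior distribution from an invented prior and invented likelihoods (the two classes “defect”/“ok” and the observation “dark_spot” are hypothetical values chosen for the example, not taken from this article).

```python
# Minimal numerical illustration of Bayes' rule: P(A|B) = P(B|A) P(A) / P(B).
# The two classes ("defect", "ok"), the observations and all probability values
# are hypothetical numbers chosen only for this example.

def posterior(prior, likelihood, observation):
    """Return P(class | observation) for every class.

    prior[c]           : a priori probability P(c)
    likelihood[c][obs] : conditional probability P(obs | c)
    """
    # Marginal P(observation) = sum_c P(observation | c) P(c): the normalization factor.
    p_obs = sum(likelihood[c][observation] * prior[c] for c in prior)
    return {c: likelihood[c][observation] * prior[c] / p_obs for c in prior}


if __name__ == "__main__":
    prior = {"defect": 0.1, "ok": 0.9}                          # P(A)
    likelihood = {"defect": {"dark_spot": 0.8, "clean": 0.2},   # P(B | A)
                  "ok":     {"dark_spot": 0.1, "clean": 0.9}}
    print(posterior(prior, likelihood, "dark_spot"))
    # {'defect': 0.47, 'ok': 0.53} (rounded): the observation raises P(defect) well above its prior.
```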

But, as previously stated, the probability values used by Bayes’ rule are provided by statistical classifiers, or simply by estimating the probability distributions of each class.

2.3 Statistical decision-making

From the hypothesis about the probability distributions of the problem’s classes (and its validation tests) to the measurement of the distance to the center of each class, via various forms of model estimation, there are a huge number of methods in the field of statistics. Our aim is not to make an exhaustive study of them, but to compare theories (statistics, decision theory…) on the principle.

Let’s just say that there are two main types of statistical approach: linear classifiers and quadratic classifiers. The former are so called because their approach consists of partitioning the attribute space in a simple way, based on straight lines, each straight line symbolizing the boundary between two classes, as illustrated by the boundary in green in figure 2.3.1. The second approach is more complex, as it attempts to estimate the shape of the boundaries between regions, thus obtaining the circular separation between classes.

Figure 2.3.1 – Boundaries using a linear classifier (Bayes-Normal-1) and a quadratic classifier (Bayes-Normal-2)

2.3.1 Bayesian classifier [4][14]

Bayes decision theory is a statistical approach to the problem. It is based on the following assumptions:

  • The decision problem is expressed in terms of probabilities
  • The value of the a priori probabilities of drawing a class is known

Let \omega be a random variable representing the individual’s state (\omega_j if the individual belongs to class j).

Let \Omega=\{\omega_1,\ldots,\omega_S\} be the set of S possible states or classes of a problem.

The set of a priori probabilities p(\omega_1), p(\omega_2), \ldots is calculated from observations made on the system under study. In the absence of other information, we have no choice but to write p(individual = \omega_i)=p(\omega_i). But if it is possible to obtain other information (measurements), then p(x|\omega_j), the probability density of x knowing that the individual is \omega_j, or the probability density conditional on his belonging to \omega_j, can be calculated.

p(\omega_j) is therefore the a priori probability of appearance of the class \omega_j, and p(\omega_j|x) is the probability that the shape characterized by the vector x belongs to \omega_j. In this case, x=\{x_1 \ldots x_d\} is the random vector (measurements or situation) on \mathbb{R}^d. The problem then becomes: given an individual and its measure x, how do you decide which category \omega_j to assign this individual to?

Bayes’ rule gives the answer:

(2.3.1.1)   \begin{equation*} p(\omega_j|x) = \frac{p(x|\omega_j)p(\omega_j)}{p(x)} \,\,\, , \,\,\, j\in\{1,S\} \end{equation*}

with

(2.3.1.2)   \begin{equation*} p(x) = \sum_{j=1}^{S}p(x|\omega_j)p(\omega_j) \end{equation*}

where S is the number of different classes and p(x) the probability distribution over all individuals (probability density of the vector x), in which the samples of each class \omega_j appear proportionally to the a priori probabilities p(\omega_j); this is the normalization factor.

The decision rule is then as follows: decide \omega_j if p(\omega_j|x)=\max_{i=1,\ldots,S}(p(\omega_i|x)).

Thus, it becomes possible to enumerate A=\{\alpha_1,\ldots,\alpha_k\}, the set of possible decisions.

Now we need to express the conditional probability densities. To do this, let’s recall the Bienaymé-Chebyshev inequality: let X_1, X_2, \ldots, X_n be independent random variables, all with expectation \mu and variance \sigma^2, and let \bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i; then for all d>0:

(2.3.1.3)   \begin{equation*} P(|\bar{X}-\mu|\geq d)\leq \frac{\sigma^2}{nd^2} \end{equation*}

The probability of a random variable deviating by more than d from its mean value is lower the smaller its variance and the larger d is. In particular (Law of Large Numbers) [7]:

(2.3.1.4)   \begin{equation*} P(|\bar{X}-\mu|\geq d)\longrightarrow 0 \,\,\, when \,\,\, n\rightarrow +\infty \end{equation*}

If n\geq 30, then, by applying the central limit theorem (which intuitively states that any sum of independent, identically distributed random variables tends towards a Gaussian random variable), it can be assumed that X_1+X_2+\ldots+X_n practically follows a Normal distribution. Recall that the density of a Normal distribution is of the form (in one dimension):

(2.3.1.5)   \begin{equation*} x \sim N(\mu, \sigma^2),\;\;\; f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2} \end{equation*}

Finally, we know that the mathematical expectation of a random variable can be estimated pointwise by:

(2.3.1.6)   \begin{equation*} \mu \simeq \bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i \end{equation*}

So, for our measure x, the samples are assumed to follow a Normal distribution N(\bar{x},\sigma^2), whose probability density will be expressed in matrix form:

(2.3.1.7)   \begin{equation*} P(x)= \frac{1}{(2\pi)^{\frac{n}{2}}|\Phi|^\frac{1}{2}}\exp\{-\frac{1}{2}(x-\bar{x})^t[\Phi]^{-1}(x-\bar{x})\} \end{equation*}

with \bar{x} mean of a class, [\Phi_k] covariance matrix of class k and |\Phi_k| determinant of covariance matrix of class k.

The conditional probability is therefore expressed by the following equation:

(2.3.1.8)   \begin{equation*} P(x|\omega_k)= \frac{1}{(2\pi)^{\frac{n}{2}}|\Phi_k|^\frac{1}{2}}\exp\{-\frac{1}{2}(x-\bar{x}_k)^t[\Phi_k]^{-1}(x-\bar{x}_k)\} \end{equation*}

The classification operation consists in calculating for each measure x its probability of belonging to all the classes, and assigning it to the class that verifies the maximum a posteriori probability.
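
The following Python sketch illustrates this maximum a posteriori rule with Gaussian class models (equation 2.3.1.8). The synthetic two-class data, the estimation of priors from class frequencies and all numerical values are assumptions made for the example, not results from this article.

```python
import numpy as np

# Hedged sketch of the quadratic Bayes classifier of section 2.3.1: each class is modelled
# by a mean vector and a covariance matrix, and a measure x is assigned to the class with
# maximum a posteriori probability.

def fit_gaussian_classes(X, y):
    """Estimate (prior, mean, covariance) for every class label present in y."""
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        models[label] = {
            "prior": len(Xc) / len(X),
            "mean": Xc.mean(axis=0),
            "cov": np.cov(Xc, rowvar=False),
        }
    return models

def class_likelihood(x, m):
    """Multivariate normal density p(x | class), as in equation 2.3.1.8."""
    d = len(x)
    diff = x - m["mean"]
    inv = np.linalg.inv(m["cov"])
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(m["cov"])))
    return norm * np.exp(-0.5 * diff @ inv @ diff)

def classify(x, models):
    """Maximum a posteriori rule: argmax_k p(x | w_k) p(w_k)."""
    scores = {k: class_likelihood(x, m) * m["prior"] for k, m in models.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
                   rng.normal([3, 3], 1.0, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    models = fit_gaussian_classes(X, y)
    print(classify(np.array([2.5, 2.8]), models)[0])   # with this synthetic data, class 1 is expected
```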

2.3.2 K-nearest neighbors (KPPV or KNN)

This approach consists in taking a given number (k) of known individuals from the attribute space, individuals chosen from the neighborhood of the individual whose class we’re trying to estimate. The labeling of the unknown individual is based on knowledge of the classes represented in this neighborhood.

The k-nearest neighbor rule was formalized by Fix and Hodges [5] in the 1950s. This rule consists in basing the choice of the neighborhood region \mathcal{R} on the number of observations it contains, fixed at k among the total number of observations m. In other words, we select a subset of k individuals from the set of observations. It is the volume V_k of \mathcal{R} that is variable. The probability density estimator is then expressed as follows:

(2.3.2.1)   \begin{equation*} \hat{P}(x)=\frac{k}{m\times V_k} \end{equation*}

This estimate is valid only under the following conditions:

    \begin{equation*} \left\{ \begin{array}{ccc} \lim_{m\rightarrow +\infty}V_k&=&0\\ \lim_{m\rightarrow +\infty}\frac{k}{V_k}&=&0 \end{array} \right. \end{equation*}

The first assumption corresponds to a space completely occupied by observations: when the number of observations is infinite, these observations occupy the space uniformly. The second hypothesis considers that, for an infinite number of observations, any subset of cardinal k is occupied by an infinite number of uniformly distributed elements.

In the context of supervised classification, the conditional probability distribution can be approximated by:

(2.3.2.2)   \begin{equation*} \hat{P}(x|\omega_q)=\frac{k_q}{m_q\times V_k} \end{equation*}

where k_q is the number of observations contained in the volume V_k belonging to the label class \omega_q and m_q the total number of observations belonging to the class \omega_q. The a posteriori law of observation of a label conditional on an observation is obtained with Bayes’ rule:

(2.3.2.3)   \begin{equation*} \hat{P}(\omega_q|y)=\frac{P(\omega_q)\hat{P}(y|\omega_q)}{\sum_k P(\omega_k)\hat{P}(y|\omega_k)}=\frac{P(\omega_q)\frac{k_q}{m_q\times V_k}}{\hat{P}(y)}=\frac{P(\omega_q)\times m}{m_q}\frac{k_q}{k} \end{equation*}

The choice of k is linked to the learning base. Choosing a large k allows local properties to be exploited to the full, but requires a large number of samples to keep the neighborhood volume small. Often, k is chosen as the square root of the average number of elements per class, i.e. \sqrt{\frac{m}{c}} [2].
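
As an illustration of this estimator, here is a small Python sketch that estimates the class posteriors from the labels of the k nearest neighbors of an unknown vector. The synthetic data and the use of the \sqrt{m/c} heuristic for k are assumptions for the example.

```python
import numpy as np

# Sketch of the k-nearest-neighbour posterior estimate of section 2.3.2: among the k
# training vectors closest to x, count how many carry each label; with equal class
# frequencies this reduces to P(w_q | x) ~ k_q / k.

def knn_posterior(x, X_train, y_train, k):
    """Return the estimated P(class | x) as the class frequencies among the k neighbours."""
    dist = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances to every training vector
    neighbours = y_train[np.argsort(dist)[:k]]        # labels of the k closest vectors
    labels, counts = np.unique(neighbours, return_counts=True)
    return dict(zip(labels.tolist(), (counts / k).tolist()))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(4, 1, (60, 2))])
    y = np.array([0] * 60 + [1] * 60)
    m, c = len(X), 2
    k = int(np.sqrt(m / c))                           # heuristic of [2]: k = sqrt(m / c)
    print(knn_posterior(np.array([3.5, 3.5]), X, y, k))
```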

2.4 Summary

But how do you choose anything other than what is statistically most likely?

The problem then arises of measuring the accuracy of a decision and of producing such a measure of accuracy, even if this measure may itself become a problem.

3 Theory of evidence

The transferable belief model (TBM) is a generic formal framework developed by Philippe Smets [20] for the representation and combination of knowledge. TBM is based on the definition of belief functions provided by information sources that may be complementary, redundant and possibly non-independent. It proposes a set of operators for combining these functions. It is therefore naturally used in the context of information fusion to improve the analysis and interpretation of data from multiple information sources. This framework naturally corresponds to the one we’re concerned with in its decision-making aspect, thanks to different information from different attributes.

One of the fundamental features of TBM is the differentiation between the knowledge representation and decision levels. This differentiation is much less preponderant in other approaches, particularly the probabilistic model, where decision is often the sole objective. TBM’s reasoning mechanisms are therefore grouped into two levels, as illustrated in figure 3.1 :

  • The credal level: home to knowledge representation (static part), combinations and reasoning about this knowledge (dynamic part).
  • The pignistic level indicates a decision, possibly taking into account the risk and/or reward associated with that decision.
Figure 3.1 – Mechanisms of the transferable belief model

In the context of representation in the form of a processing chain, we are faced with the problem of observing the same event from several sources, and combining the information provided by these sources (see Figure 3.2), which raises the following questions:

  • How can we represent the knowledge of an information source in the form of belief functions, and what advantages can we derive from such a representation?
  • How can we combine several sources of belief functions to summarize information and improve decision-making?
  • How to make a decision based on belief functions?
Figure 3.2 – Processing chain

3.1 Presentation

Let’s take the example of handwritten digit recognition. The possible digits that can be written are called hypotheses, and the set of hypotheses forms the discernment frame, usually denoted \Omega. For our example, \Omega=\{\omega_0,\omega_1,\omega_2,\omega_3,\omega_4,\omega_5,\omega_6,\omega_7,\omega_8,\omega_9\} where \omega_i is the symbol representing the digit i. Since two hypotheses can’t be true simultaneously (a symbol can’t represent several digits at the same time), they are assumed to be exclusive, resulting in a null intersection between them (i.e. \{\omega_1\}\cap\{\omega_3\}=\emptyset). Now let’s suppose that the document we’re studying is degraded, or that the digit is very badly written; a reader would then give a weighted opinion on this digit, making them a source of information. With the available knowledge, the reader would say that it’s either \omega_1, \omega_7 or \omega_9. The question is how to represent this information.

In the context of probability theory, the answer is given by the principle of equiprobability, i.e. each digit is assigned the same probability (\frac{1}{3}) so that the sum of all probabilities is equal to 1. Thus, the probability distribution for this digit is equal to:

    \begin{equation*} P(\omega_1)=P(\omega_7)=P(\omega_9)=\frac{1}{3} \end{equation*}

This probability is only assigned to hypotheses taken alone (also called singletons).

In evidence theory, the main notion is that of belief functions, within which knowledge is modeled by a distribution of belief masses. In our example, the entire belief mass (a unit total, as with probabilities) is assigned to the union of all hypotheses, i.e. the proposition \{\omega_1,\omega_7,\omega_9\}:

    \begin{equation*} m(\{\omega_1,\omega_7,\omega_9\})=1 \end{equation*}

As we can see, this mass gives us no information about the belief of each of the hypotheses (\omega_1, \omega_7 and \omega_9) that make it up. We can then say that the belief mass explicitly models doubt (or ignorance) about this set of hypotheses. To return to probability theory, the value of the probability on a union of hypotheses is implicitly derived from the probability on the singletons, a probability established from the sum of all the private probabilities of the common part, i.e. the probability of the intersection. For example:

    \begin{equation*} P(\{\omega_1,\omega_7\})=P(\omega_1)+P(\omega_7)-P(\{(\omega_1,\omega_7)\}) \end{equation*}

Here, the probability P(\{(\omega_1,\omega_7)\}) represents the common part of the events “The number is 1” and “The number is 7”, which we need to remove so as not to count the events twice. In the case where the hypotheses are exclusive (a case we’ll come back to throughout this article), we obtain:

    \begin{equation*} \{\omega_1\}\cap\{\omega_7\}=\emptyset \Rightarrow P(\{(\omega_1,\omega_7)\})=0 \Rightarrow P(\{\omega_1,\omega_7\})=P(\omega_1)+P(\omega_7) \end{equation*}

taking into account that P(\emptyset)=0 by definition.

Unlike probabilities, a mass function on a union of hypotheses is not equal to the sum of the masses of the hypotheses making up the union [10], and it is precisely thanks to this property that we represent doubt between hypotheses. Moreover, belief functions have the advantage of being generic, since they are the mathematical representation of a superset of probability functions. Indeed, a mass distribution, where only singleton hypotheses have non-zero mass, can be interpreted as a probability distribution. Klir and Wierman explain in [10] that belief functions mathematically generalize possibility functions, and that each of these functions (probabilities, possibilities and beliefs) has particular characteristics and enables knowledge to be modeled differently.

Let’s now look at the mathematical formalism of belief functions, based on the models of Dempster [6][7], Shafer [15], transferable beliefs [20] and the Hints model [9]. For this work, we have restricted ourselves to the axiomatized and formalized framework of the transferable belief model developed by P. Smets.

3.2 Formalism

3.2.1 Introduction

If we return to the problem of managing multiple sources of information (as illustrated in Figure 3.2), what we’re trying to do is determine the actual \omega_0 state of the system under study, using multiple observations from multiple observers. In our case, \omega_0 will be the decision to be produced and the various attributes provide the observations on the system.

We assume that this state \omega_0 takes discrete values \omega_k (the hypotheses) and that the set of N possible hypotheses is called the discernment frame (or \textbf{universe of discourse}), generally noted:

(3.2.1.1)   \begin{equation*} \Omega=\{\omega_1,...,\omega_k,...,\omega_N\}=\bigcup_{k=1}^{N}\{\omega_k\} \end{equation*}

The total number of hypotheses making up \Omega is called the cardinal and is written as |\Omega|=N. As we saw earlier, all hypotheses are considered exclusive.

The system’s set of observers (the sensors in figure 3.2) provides a weighted opinion, represented by a belief function, about its actual state \omega_0\in\Omega. We call these sensors belief sources.

3.2.2 Mass functions

Let 2^\Omega be the space formed by all the parts of \Omega, i.e. the space containing all the possible subsets formed by the hypotheses and unions of hypotheses, i.e. :

    \begin{equation*} 2^\Omega=\{\emptyset,\{\omega_1\},\{\omega_2\},\{\omega_1,\omega_2\},\{\omega_3\},\{\omega_1,\omega_3\},\{\omega_2,\omega_3\},\{\omega_1,\omega_2,\omega_3\},...,\{\omega_1,...,\omega_N\}\} \end{equation*}

Let A be a proposition such that A\subseteq\Omega, e.g. A=\{\omega_1,\omega_2\}. A explicitly represents the doubt between its hypotheses (\omega_1 and \omega_2). As we saw above, belief masses are non-additive, which is a fundamental difference from probability theory. Thus, the belief mass m(A) allocated to A gives no information about the hypotheses and subsets making up A [10].

We can define a distribution of belief masses (BBA: basic belief assignment) as a set of belief masses concerning any A\subseteq\Omega propositions verifying:

(3.2.2.1)   \begin{equation*} \sum_{A\subseteq\Omega}m(A)=1 \end{equation*}

Each of the sensors observing the system will contribute its own mass distribution.

The mass m(B) supplied by a sensor is that sensor’s share of belief in the proposition “\omega_0\in B“, where \omega_0 is the actual state of the observed system. Any element B with non-zero mass (i.e. m(B)>0) is called a focal element of the belief mass distribution. Ramasso [13] presents a notable set of belief mass distributions in his dissertation.

When the universe of discourse contains all possible hypotheses (i.e. when it is exhaustive), it necessarily contains the hypothesis that describes the observed system. Otherwise, the discernment framework is not adapted to the problem at hand.

From the moment \Omega is exhaustive, the mass of the empty set is zero (m(\emptyset)=0), the mass is said to be normal, and the problem then enters the framework of a closed world as in the case of Shafer’s model [15].

In the case of the transferable belief model, the universe of discourse can be non-exhaustive; the mass of the empty set can be non-zero (m(\emptyset)\neq 0), the mass is said to be subnormal, and in this case the problem falls within an open-world framework. This type of construction makes it possible to redefine the discernment framework, for example by adding a hypothesis based on the mass of the empty set, such as a rejection hypothesis. Rejection hypotheses correspond to states such as “I don’t know“. At the end of the chain, they allow us to reject an individual for whom there is too much doubt, rather than taking the risk of classifying it in spite of everything. To move from an open to a closed world, we need to redistribute the mass of the empty set to the other subsets of the 2^\Omega space [11].

To summarize what we’ve seen so far, the mass function is determined a priori, without the need for probabilistic considerations. It may correspond to an observer’s subjective attribution of degrees of credibility to the realization of various conceivable events on \Omega.

The assignment of m values is used to express the degree to which the expert or observer judges that each hypothesis is likely to come true. Thus, m(\Omega) describes its imprecision relative to these degrees.

3.2.3 Dempster’s combination rule

If we return to figure 3.2, we see that the problem at hand is information fusion, i.e. combining heterogeneous information from several sources to improve decision-making [3].

Let’s now look at a formal tool developed within the framework of the transferable belief model for information fusion. We assume that the sources are distinct [16][21][22][23], by analogy with the independence of random variables in statistics.

It may happen that for the same hypothesis, two sensors provide completely different information. From a formal point of view, when combining two belief mass functions m_1 and m_2, the intersection between the focal elements may be empty. The resulting belief mass function in this combination will be non-zero on the empty set. This value then quantifies the discordance between the belief sources, and is called conflict [17]. A normalization can be performed, bringing the conjunctive combination rule back to Dempster’s orthogonal rule [6][7] (denoted \oplus) used in Shafer’s model [15][18]:
\begin{eqnarray}
m_{1\oplus 2}(A)=(m_1 \oplus m_2)(A)=\left\{
\begin{array}{ll}
\frac{m_{1\cap 2}(A)}{1-m_{1\cap 2}(\emptyset)}, & \forall A\subseteq\Omega,\ A\neq\emptyset\\
0, & A=\emptyset
\end{array}
\right.
\label{incertitude:eq:dempsterrule}\tag{3.2.3.1}
\end{eqnarray}
where m_{1\cap 2} denotes the (unnormalized) conjunctive combination of m_1 and m_2.
Another way of writing this combination rule is:

(3.2.3.2)   \begin{equation*} m(A)=\frac{\sum_{B\cap C=A}m_1(B)m_2(C)}{\sum_{B\cap C\neq\emptyset}m_1(B)m_2(C)} \;\;\; \forall A\subseteq\Omega \end{equation*}

With, as a measure of the conflict between sources:

(3.2.3.3)   \begin{equation*} K = \sum_{B\cap C=\emptyset}m_1(B)m_2(C) \end{equation*}
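
To make the mechanics concrete, here is a hedged Python sketch of Dempster's rule on a small discernment frame. The two mass functions (their focal elements and mass values) are invented for the example; the returned value K corresponds to the conflict of equation 3.2.3.3.

```python
from itertools import product

# Hedged sketch of Dempster's orthogonal rule (equations 3.2.3.1 to 3.2.3.3).
# A mass function is represented as a dict mapping focal elements (frozensets) to masses
# summing to 1; the two sources below and their mass values are invented for the example.

def dempster(m1, m2):
    """Combine two mass functions; return the normalized mass function and the conflict K."""
    combined, conflict = {}, 0.0
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mB * mC
        else:
            conflict += mB * mC                 # mass sent to the empty set (equation 3.2.3.3)
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources are completely contradictory")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}, conflict


if __name__ == "__main__":
    omega = frozenset({"w1", "w7", "w9"})
    m1 = {frozenset({"w1", "w7"}): 0.6, omega: 0.4}          # source 1: doubt between w1 and w7
    m2 = {frozenset({"w9"}): 0.7, frozenset({"w1"}): 0.3}    # source 2: mostly believes w9
    m12, K = dempster(m1, m2)
    print(K)     # 0.42: the share of the product masses falling on the empty set
    print(m12)   # {'w1'}: ~0.52, {'w9'}: ~0.48 after normalization by 1 - K
```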

3.2.4 Simple attenuation

The simple weakening rule makes it possible to incorporate source reliability. Applying the disjunctive combination rule can be too conservative when the sources of the belief mass functions are unreliable: it can lead to a completely non-informative (vacuous) belief mass function. In some applications, however, it is possible to quantify reliability, enabling a less conservative combination rule to be applied in the case of unreliable sources.

Taking reliability into account in the context of belief functions is called weakening (or discounting), because the process consists in weighting the mass of each element of a belief mass function. The first work on weakening in the context of belief functions was developed by Shafer [15], axiomatized by Smets [16] and generalized by Mercier and Denœux [12]. Generalized weakening was called contextual weakening.

Shafer’s simple weakening of a belief mass function m is defined as the weighting of each mass m(B) of the distribution by a coefficient (1-\alpha) called reliability and where \alpha\in[0,1] is the rate of weakening.

Formally:

(3.2.4.1)   \begin{eqnarray*} m^\alpha(B) =& (1-\alpha).m(B),&\forall B\subset\Omega\\ m^\alpha(\Omega) =& (1-\alpha).m(\Omega)+\alpha \end{eqnarray*}

By the principle of minimum information (maximum entropy), the rest of the mass (after weighting) is transferred to the ignorance element \Omega. The more reliable the source (and the lower the weakening rate \alpha), the less the belief mass function is modified, whereas as reliability decreases, the belief mass function resulting from weakening tends towards the vacuous belief mass function (m(\Omega)=1).
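
A minimal Python sketch of this discounting operation (equation 3.2.4.1) follows; the mass function and the weakening rate \alpha are illustrative values, not taken from the article.

```python
# Minimal sketch of Shafer's simple weakening (equation 3.2.4.1): every mass is scaled
# by the reliability (1 - alpha) and the remainder is transferred to the ignorance Omega.
# The mass function and the weakening rate alpha below are illustrative values only.

def discount(m, alpha, omega):
    """Return the weakened mass function m^alpha."""
    weakened = {A: (1.0 - alpha) * v for A, v in m.items() if A != omega}
    weakened[omega] = (1.0 - alpha) * m.get(omega, 0.0) + alpha
    return weakened


if __name__ == "__main__":
    omega = frozenset({"w1", "w7", "w9"})
    m = {frozenset({"w1"}): 0.5, frozenset({"w1", "w7"}): 0.3, omega: 0.2}
    print(discount(m, alpha=0.2, omega=omega))
    # alpha = 0 leaves m unchanged; alpha = 1 yields the vacuous mass m(Omega) = 1.
```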

3.3 The pignistic decision

From the set of belief functions provided by all the sensors, and following their combination by Dempster’s rule, we obtain a unique belief function for each of the possible hypotheses modeling the system’s knowledge. This knowledge is then used to make a decision on one of the hypotheses.

Note that the system’s hypotheses don’t necessarily represent the only possible classes (i.e. the 26 letters of our alphabet). It is also possible to create a new hypothesis corresponding to the event “none of the known hypotheses”. This is referred to as rejection, i.e. the unknown individual is not re-injected into the chain, but simply classified as “impossible to recognize”. The measure of sensor conflict during the Dempster merge (see equation 3.2.3.3) is also a measure on which the system can base its rejection of the individual.

Information fusion systems, whether based on probability theory, possibility theory or belief functions, are designed for decision making and decision analysis. Decision-making consists in choosing a hypothesis on a betting framework, generally the \Omega discernment framework. Decision-making can be automatic or left to the responsibility of the end-user (as in the case of diagnostic aids in the field of medicine).

Within the framework of the transferable belief model, the decision phase is based on the pignistic probability distribution noted \textrm{BetP}\{m\} from the mass distribution m [19]. The pignistic transform consists of distributing the mass of a proposition B equiprobably over the hypotheses contained in B.

Formally:

(3.3.1)   \begin{eqnarray*} \textrm{BetP}\{m\}: & \Omega\rightarrow[0,1]\\ & \omega_k\mapsto\textrm{BetP}\{m\}(\omega_k) \end{eqnarray*}

where \textrm{BetP}\{m\}(\omega_k),\forall\omega_k\in\Omega is given by:

(3.3.2)   \begin{equation*} \textrm{BetP}\{m\}(\omega_k)=\frac{1}{(1-m(\emptyset))}\sum_{B\subseteq\Omega,\omega_k\in B}\frac{m(B)}{|B|} \end{equation*}

The decision is generally made by choosing the \omega_k element with the highest pignistic probability:

(3.3.3)   \begin{equation*} \omega_k=\textrm{arg}\max_{\omega_k\in\Omega}\textrm{BetP}\{m\}(\omega_k) \end{equation*}
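
The sketch below implements this transform and the decision rule of equation 3.3.3 in Python; the frame and the mass values are invented for the illustration.

```python
# Sketch of the pignistic transform (equation 3.3.2) and of the decision rule (3.3.3):
# the mass of each focal element B is spread equally over the singletons it contains,
# after removing the mass of the empty set. The frame and mass values are illustrative.

def pignistic(m, omega):
    """Return BetP{m}(w) for every singleton w of omega."""
    m_empty = m.get(frozenset(), 0.0)
    bet = {w: 0.0 for w in omega}
    for B, v in m.items():
        if not B:                                   # skip the empty set
            continue
        for w in B:
            bet[w] += v / (len(B) * (1.0 - m_empty))
    return bet


def decide(m, omega):
    """Choose the hypothesis with the highest pignistic probability (equation 3.3.3)."""
    bet = pignistic(m, omega)
    return max(bet, key=bet.get), bet


if __name__ == "__main__":
    omega = {"w1", "w7", "w9"}
    m = {frozenset({"w1"}): 0.3,
         frozenset({"w1", "w7"}): 0.4,
         frozenset({"w1", "w7", "w9"}): 0.3}
    print(decide(m, omega))
    # w1 receives 0.3 + 0.4/2 + 0.3/3 = 0.6, w7 receives 0.3, w9 receives 0.1.
```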

The agent who relies on the pignistic transform in the decision-making phase exhibits rational behavior by maximizing expected utility. Alternatively, plausibilities and credibilities can be used when the agent presents a rather optimistic or pessimistic attitude.

In the case of plausibilities, the agent is said to have an optimistic attitude in the sense that the plausibility of an element represents the maximum value of the belief mass on this element that could be reached if additional information were to reach the fusion system (the latter using the conjunctive combination to integrate this new information). The probability distribution over singletons from which a decision can be made is then given \forall\omega_k\in\Omega by:
\begin{equation}
\textrm{PIP}\{m\}(\omega_k)=\frac{pl(\omega_k)}{\sum_{\omega\in\Omega}pl(\omega)}
\label{incertitude:eq:pip}\tag{3.3.4}
\end{equation}

Note that the plausibility of a singleton pl(\omega_k) is equal to its commonality q(\omega_k). In the same way, a criterion adapted to a “pessimistic” agent can be obtained using credibilities. The credibility of a singleton bel(\omega_k) is simply equal to its mass after normalization of the belief mass function m. Note that for any element A\in 2^\Omega:
\begin{equation}
bel(A)\leq \textrm{BetP}(A)\leq pl(A)
\label{incertitude:eq:bel}\tag{3.3.5}
\end{equation}
and in particular when plausibilities and credibilities are calculated on singletons.

For the decision-making problem, we assume we have a belief function m on \Omega which summarizes all the information provided on the value of the variable y. The decision consists in choosing an action a from a finite set of actions \mathcal{A}. A loss function \lambda:\mathcal{A}\times\Omega\rightarrow\mathbb{R} is assumed to be defined such that \lambda(a,\omega) represents the loss incurred if action a is chosen when y=\omega. From the pignistic probability, each action a\in\mathcal{A} can be associated with a risk defined as the expected cost relative to p_m if a is chosen:
\begin{equation}
R(a)=\sum_{\omega\in\Omega}\lambda(a,\omega)p_m(\omega)
\label{incertitude:eq:risk}\tag{3.3.6}
\end{equation}

The idea is then to choose the action that minimizes this risk, generally referred to as pignistic risk. In pattern recognition, \Omega=\{\omega_1,...,\omega_Q\} is the set of classes and the elements of \mathcal{A} are generally the actions a_q which consist in assigning the unknown vector to the class \omega_q.

It has been shown that minimizing the pignistic risk R leads to choosing the class \omega_0 with the highest pignistic probability [8]. If an additional rejection action a_0 of constant cost \lambda_0 is possible, then the vector is rejected if p_m(\omega_0)<1-\lambda_0.
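
As a small illustration of this rejection mechanism, here is a Python sketch under a 0/1 loss assumption (an assumption made here for simplicity; the pignistic values and costs are also invented).

```python
# Illustration of the rejection rule above under a 0/1 loss assumption: choosing class q
# costs 1 - BetP(q) on average, rejecting costs the constant lambda_0, so the vector is
# rejected when max BetP < 1 - lambda_0. The BetP values and costs are invented.

def decide_with_rejection(bet_p, lambda_0):
    """Return the chosen class, or None when rejection is the less risky action."""
    best = max(bet_p, key=bet_p.get)
    if bet_p[best] < 1.0 - lambda_0:
        return None                                 # rejecting is cheaper than misclassifying
    return best


if __name__ == "__main__":
    bet_p = {"w1": 0.45, "w7": 0.35, "w9": 0.20}
    print(decide_with_rejection(bet_p, lambda_0=0.3))   # None: 0.45 < 0.7, the vector is rejected
    print(decide_with_rejection(bet_p, lambda_0=0.6))   # 'w1': 0.45 >= 0.4
```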

3.4 Decision-making based on belief functions

Let’s now take a look at how to construct belief functions for decision-making purposes, based on the accuracy of the attributes, the accuracy of the classifiers and the output of the classification stage.

At the credal level of the transferable belief model, in order to obtain belief functions from training data, two families of techniques are generally used: likelihood-based methods that use density estimation, and a distance-based method in which mass sets are constructed directly from distances to training vectors.

  1. Likelihood-based methods.
    The probability densities f(x|\omega_q) conditional on the classes \omega_q are assumed to be known. Having observed x, the likelihood function L(\omega_q|x) is a function from \Omega to [0,+\infty[ defined by L(\omega_q|x)=f(x|\omega_q), for all q\in[1,...,Q]. From L, Shafer [15] proposed to construct a belief function on \Omega defined by its plausibility function as:
    \begin{eqnarray}
    pl(A)=\frac{\max_{\omega_q\in A}[L(\omega_q|x)]}{\max_{q}[L(\omega_q|x)]} & \forall A\subseteq\Omega
    \label{incertitude:eq:pl}\tag{3.4.1}
    \end{eqnarray}

    Starting from axiomatic considerations, Appriou [1] proposed another method based on the construction of Q belief functions m_q(.). The idea is to consider each class separately and evaluate the degree of belief accorded to each of them. In this case, the focal elements of each of the m_q belief functions are the singletons \{\omega_q\}, the complementary subsets \overline{\omega_q} and \Omega. Appriou thus obtains two different models:

    • Model 1:

      (3.4.2)   \begin{eqnarray*} m_q(\{\omega_q\}) &=& \alpha_q\frac{R.L(\omega_q|x)}{1+R.L(\omega_q|x)}\\ m_q(\overline{\omega_q}) &=& \alpha_q\frac{1}{1+R.L(\omega_q|x)}\\ m_q(\Omega) &=& 1-\alpha_q \end{eqnarray*}

    • Model 2:

      (3.4.3)   \begin{eqnarray*} m_q(\{\omega_q\}) &=& 0\\ m_q(\overline{\omega_q}) &=& \alpha_q(1-R.L(\omega_q|x))\\ m_q(\Omega) &=& 1-\alpha_q(1-R.L(\omega_q|x)) \end{eqnarray*}

    Here, \alpha_q is a coefficient that can be used to model additional information (such as sensor reliability), and R is a normalization constant chosen in ]0,(\sup_x \max_q(L(\omega_q|x)))^{-1}]. In practice, the performance of these two models appears to be equivalent. However, Appriou recommends the use of model 2, which has the advantage of being consistent with the “generalized Bayes theorem” proposed by Smets in [16]. From these Q belief functions and using Dempster’s combination rule, a unique belief function m is obtained by m=\bigoplus_q m_q (a short sketch of this construction for model 2 is given after this list).

  2. Distance-based method.
    The second family of models is based on distance information. An example of this is the extension of the k nearest neighbor algorithm introduced by T. Denœux in [18]. In this method, a belief function m_i is directly constructed using the information provided by the vectors x^i located in the neighborhood of the unknown vector x by:
    \begin{eqnarray}
    m^i(\{\omega_k\})&=&\alpha^i\phi^i(d^i)\nonumber\\
    m^i(\Omega) &=& 1-\alpha^i\phi^i(d^i)\nonumber\\
    m^i(A) &=& 0,\;\forall A\in 2^\Omega \backslash \{\{\omega_k\},\Omega\}
    \label{incertitude:eq:denoeux_modele}\tag{3.4.4}
    \end{eqnarray}
    where d^i is the Euclidean distance from the unknown vector to the x^i vector, \alpha^i a parameter associated with the i-th neighbor and \phi^i(d^i)=e^{-\gamma^i(d^i)^2} with \gamma^i a positive parameter (recall that the \backslash operator means “deprived of”). The k nearest neighbor method yields k belief functions to be aggregated by the combination rule for decision making.
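
Below is the sketch announced above: a hedged Python illustration of Appriou’s model 2 (equation 3.4.3), building one mass function per class from its likelihood. The likelihood values, the reliability coefficients \alpha_q and the choice of R are assumptions for the example.

```python
# Hedged sketch of Appriou's model 2 (equation 3.4.3): one mass function per class q is
# built from the likelihood L(w_q | x), a reliability coefficient alpha_q and the
# normalization constant R. The likelihood values, alpha_q and R are illustrative only.

def appriou_model2(likelihoods, alphas, R):
    """Return, for every class q, the masses m_q({w_q}), m_q(not w_q) and m_q(Omega)."""
    masses = {}
    for q, L in likelihoods.items():
        m_not_q = alphas[q] * (1.0 - R * L)
        masses[q] = {"w_q": 0.0, "not_w_q": m_not_q, "Omega": 1.0 - m_not_q}
    return masses


if __name__ == "__main__":
    likelihoods = {"w1": 2.4, "w7": 0.8, "w9": 0.1}      # L(w_q | x), e.g. class-conditional densities
    R = 1.0 / max(likelihoods.values())                  # R chosen in ]0, (sup_x max_q L)^-1]
    alphas = {q: 0.9 for q in likelihoods}               # reliability of the source
    for q, m in appriou_model2(likelihoods, alphas, R).items():
        print(q, m)
    # The better a class explains x (large likelihood), the smaller m_q(not w_q),
    # i.e. the less belief is placed against that class.
```

These Q mass functions would then be fused with Dempster’s rule (m=\bigoplus_q m_q) before the pignistic decision, as described above.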

4 Proposal

4.1 Principle

The theory of evidence presents several elements of interest in our quest for a measure of the accuracy associated with the decision, as well as in the integration of such a measure into this decision-making. The first of these is certainly that of being able to propose a pessimistic and an optimistic boundary for the decision, which can in the first instance be brought closer to the tolerance desired in association with the decision. The second is the introduction of the “weakening” factor. This factor takes into account the reliability of the operator providing the information. This reliability, or trust, is deduced from a learning process carried out beforehand, i.e. off-line. However, as the approach is constructed, whatever the established value and whatever the object from which the measurement is taken, that measurement will be given the same level of importance as all the others coming from the same operator.

This should not be seen as a criticism of the construction of the attenuation factor, which was introduced to allow the combination of different information sources, such as sensors or frequency bands in MRI. Behind this proposal lies the assumption of linearity in the behavior of the information source in the face of noise and/or the information to be acquired. While this assumption is certainly valid in the context in which this theory was developed, it is not valid in our more generic context.

There are two possible solutions for developing our proposal: the first would be to integrate a second factor into the formulation, which would depend solely on the object to be measured, via the operator j. The second solution is to retain a single attenuation factor integrating the two aspects to be taken into account:

  • The contribution of the type of information considered to the decision-making process. This corresponds to a priori relevance, i.e. uniquely linked to the operator and considered following learning before the actual decision-making phase (the so-called on-line phase).
  • The quality of the measurement, which we call precision. This precision measurement is directly dependent on the object to be measured as seen through the operator’s eyes, and is carried out to take into account non-linearities in operator or sensor behavior (e.g. responses according to different tissues in medical imaging). This precision measurement will therefore be referred to as a posteriori precision measurement, as it is established at the end of the measurement.

In both cases, we propose to retain Appriou’s notation. The combination of the various attenuation elements is written in multiplicative form. For reasons of time, we have not combined these two parts of the total accuracy estimation (a priori and a posteriori) in this work. Another reason for this decision is the problem of estimating a priori accuracy, which is highly dependent on the construction of the learning base.

Let’s take a look at some of the avenues we’ve explored to answer this question in the context of classification. However, as the literature on the subject is already extensive, we felt that there were already many bases for answering this question.

4.2 Implementation

The belief functions per class/classifier/attribute are combined (Dempster) to construct a belief function per class. The decision is then made according to equation 3.3.3. The decision mechanism is illustrated in figure 4.2.1. This figure shows the sequence of processes and the exchanges between them. Each variable in the formalism is represented here, along with its origin and destination. The source of the flow illustrating the individual to be analyzed (x^*) is not shown, as we are not dealing with this part. x^* can be an object injected directly into the process, but it can also be the result of subsequent processing.

Figure 4.2.1 – Data flow diagram of the processing chain

The exploitation of Appriou’s expression, and our modified version of Appriou’s expression, exploits discrete labels based on the likelihood function L(\omega_q|x) linking an individual x to its membership of the class \omega_q.

In order to obtain these functions, and according to the diagram in figure 4.2.2, these labels are in our case derived from a classification stage linking the individual to the \omega_q class. Classically in this framework, the likelihood function L(\omega_q|x) is then linked to the distance C_i(\omega_q|x) of the individual x to the class \omega_q according to the classifier i (depending on the classifier, the distance can be replaced by a degree of membership or a probability). However, the ability to associate an individual with a label by a classifier is dependent on the attribute set used. Similarly, each classifier proposes a different partitioning of the attribute space, depending on the class model used or the metrics implemented. We will therefore relate the likelihood function L(\omega_q|x) to the distance C_i(\omega_q|A_j(x)), where A_j(x) represents the vector of attributes provided by the j operator (Zernike moments of order 15, for example, or a 32-bin color histogram obtained by fuzzy C-means, etc.) and C_i refers to the i classifier.

Figure 4.2.2 – Processing chain integrating attribute precision

The choice of this scheme allows us to combine several labels from different classifiers and/or different attributes. Depending on the chosen protocol, the different classifiers will either compete directly with each other, or a dynamic selection of the best classifier will take place before the merging step. In the latter case, the competition merges the labels from different attributes seen through the best classifier for each of them. The expression we arrive at is directly deduced from equation 3.4.3:

(4.2.1)   \begin{eqnarray*} \left\{ \begin{array}{ll} m_{ijq}(\{\omega_q\})&=0\\ m_{ijq}(\overline{\omega_q})&=\alpha_{jq}(1-R_{ijq}.C_i(\omega_q|A_j(x)))\\ m_{ijq}(\Omega)&=1-m_{ijq}(\overline{\omega_q}) \end{array} \right. \;\;\; \forall i\in[1,n_c],\ \forall j\in[1,n_a],\ \forall q\in[1,n_q] \end{eqnarray*}

Appriou incorporates R, a normalization parameter chosen so that the belief mass m_{ijq}(\overline{\omega_q}) lies between 0 and 1. R is therefore a value between 0 and the inverse of the maximum value that can be reached by C_i(\omega_q|A_j(x)). This value of R is unique for classifier i and is derived from the greatest distance identified or possible between an individual and the classes defined for this classifier, hence:
\begin{eqnarray}
\nonumber \mbox{for } 0\leq 1-\frac{x}{y}\leq 1 \mbox{ we need } y\geq x,\\
\nonumber \mbox{hence } y=\sup_x(\max_q(C_i(\omega_q|A_j(x)))),\\
0<R\leq\frac{1}{\sup_x(\max_q(C_i(\omega_q|A_j(x))))}
\tag{4.2.2}
\end{eqnarray}

The belief masses we’ve just established are the m_{ijq}(\cdot), i.e. the masses associated with assigning (or not) class q to individual x by classifier i for attribute vector j. What we’re interested in is the final decision to assign the class q to the individual x, and consequently the mass of belief in the class q, m_q(\{\omega_q\}), and the associated masses m_q(\overline{\omega_q}) and m_q(\Omega):

(4.2.3)   \begin{eqnarray*} \left\{ \begin{array}{l} m_q(\{\omega_q\})\\ m_q(\overline{\omega_q})\\ m_q(\Omega) \end{array} \right. \;\;\; \forall q\in[1,n] \end{eqnarray*}

Based on Dempster’s combination model, we formalize this decision making from q intermediate decisions on a set of n labels for a single set of attributes:

(4.2.4)   \begin{eqnarray*} \left\{ \begin{array}{ll} m(\{\omega_k\})&=\bigoplus_{q=1}^{n_q}m_{iq}(\{\omega_k\})\\ m(\overline{\omega_k})&=\bigoplus_{q=1}^{n_q}m_{iq}(\overline{\omega_k}) \end{array} \right. \;\;\; \forall q\in[1,n],\ \forall i\in[1,n_c] \end{eqnarray*}

and :

(4.2.5)   \begin{equation*} m(\Omega)=1-\sum_{i=1}^{n}(m(\{\omega_i\})+m(\overline{\omega_i})) \end{equation*}

More generally, in the presence of several sets of attributes from different operators (or different parameters), Dempster’s combination rule leads to the general formulation :

(4.2.6)   \begin{equation*} \left\{ \begin{array}{ll} m(\{\omega_k\})&=\bigoplus_{q=1}^{n_q}\bigoplus_{i=1}^{n_c}\bigoplus_{j=1}^{n_a}m_{ijq}(\{\omega_k\})\\ m(\overline{\omega_k})&=\bigoplus_{q=1}^{n_q}\bigoplus_{i=1}^{n_c}\bigoplus_{j=1}^{n_a}m_{ijq}(\overline{\omega_k}) \end{array} \right. \end{equation*}

The data flow diagram in figure 4.2.1 illustrates how this scheme works.
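
To make the whole chain tangible, here is a hedged end-to-end Python sketch: for every (classifier, attribute set, class) triplet a mass function is built as in equation 4.2.1, everything is fused with Dempster’s rule as in equation 4.2.6, and a pignistic decision is taken. The frame, the scores C_i(\omega_q|A_j(x)), \alpha_{jq} and R_{ijq} are all invented values, and C is treated here as a likelihood-like degree of membership (a larger value means a better match), as allowed in section 4.2.

```python
from functools import reduce
from itertools import product

# Hedged end-to-end sketch of the proposed chain: one mass function per
# (classifier i, attribute set j, class q) triplet, built as in equation 4.2.1, then
# fused with Dempster's rule as in equation 4.2.6, followed by a pignistic decision.
# The frame, the scores C_i(w_q | A_j(x)), alpha_{jq} and R_{ijq} are invented values;
# C is treated here as a degree of membership (a larger value means a better match).

OMEGA = frozenset({"w1", "w2", "w3"})

def mass_ijq(q, score, alpha, R):
    """Equation 4.2.1: belief against w_q decreases as the membership score increases."""
    m_not_q = alpha * (1.0 - R * score)
    return {OMEGA - {q}: m_not_q, OMEGA: 1.0 - m_not_q}

def dempster(m1, m2):
    """Dempster's orthogonal rule (equation 3.2.3.1)."""
    out, conflict = {}, 0.0
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            out[inter] = out.get(inter, 0.0) + mB * mC
        else:
            conflict += mB * mC
    return {A: v / (1.0 - conflict) for A, v in out.items()}

def pignistic(m):
    """Equation 3.3.2 (the fused mass has no empty focal element here)."""
    return {w: sum(v / len(B) for B, v in m.items() if w in B) for w in OMEGA}

if __name__ == "__main__":
    # scores[i][j][q]: membership score of the individual x for class q, as seen by
    # classifier i through attribute vector j (two classifiers, two attribute sets).
    scores = {"c1": {"a1": {"w1": 0.9, "w2": 0.2, "w3": 0.1},
                     "a2": {"w1": 0.8, "w2": 0.3, "w3": 0.2}},
              "c2": {"a1": {"w1": 0.7, "w2": 0.1, "w3": 0.3},
                     "a2": {"w1": 0.9, "w2": 0.2, "w3": 0.1}}}
    alpha, R = 0.9, 1.0        # alpha_{jq} and R_{ijq} kept constant for the example
    masses = [mass_ijq(q, s, alpha, R)
              for i in scores for j in scores[i]
              for q, s in scores[i][j].items()]
    fused = reduce(dempster, masses)
    bet = pignistic(fused)
    print(max(bet, key=bet.get), bet)   # the best supported class (here w1) wins
```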

Bibliography

[1] A. Appriou. Probabilities and uncertainties in multi-sensor data fusion. Revue Scientifique et Technique de la Défense, 11 :27-40, 1991.
[2] D. Arrivault. Apport des graphes dans la reconnaissance non-contrainte de caractères manuscrits anciens. PhD thesis, Université de Poitiers, 2002.
[3] I. Bloch. Fusion d’informations numériques: panorama méthodologique. In Journées Nationales de la Recherche en Robotique, 2005.
[4] R. Caruana and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In ICML ’06: Proceedings of the 23rd international conference on Machine learning, pages 161-168, New York, NY, USA, 2006. ACM.
[5] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Trans. on Information Theory 13(1):21-27, 1967.
[6] A. P. Dempster. Upper and lower probabilities induced by multiple valued mappings. Annals of Mathematical Statistics, 38 :325-339, 1967.
[7] A. P. Dempster. A generalization of bayesian inference. Journal of the Royal Statistical Society, 30 :205-247, 1968.
[8] T. Denœux. Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognition, 30(7) :1095-1107, 1997.
[9] J. Kholas and P. A. Monney. Lecture Notes in Economics and Mathematical Systems, volume 425, chapter A mathematical theory of hints: An approach to the Dempster-Shafer theory of evidence. Springer-Verlag, 1995.
[10] G. J. Klir and M. J. Wierman. Uncertainty-based information. Elements of generalized information theory, 2nd edition. Studies in fuzzyness and soft computing . Physica-Verlag, 1999.
[11] E. Lefevre, O. Colot, and P. Vannoorenberghe. Belief function combination and conflict management. Information Fusion, 3 :149-162, 2002.
[12] D. Mercier, B. Quost, and T. Denœux. Refined modeling of sensor reliability in the belief function framework using contextual discounting. Information Fusion, 2007.
[13] E. Ramasso. Recognition of state sequences by the Transferable Belief Model – Application to the analysis of athletics videos. PhD thesis, Université Joseph Fourier de Grenoble, 2007.
[14] I. Rish. An empirical study of the naive bayes classifier. In IJCAI-01 workshop on “Empirical Methods in AI”, 2001.
[15] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
[16] P. Smets. Belief functions: The disjunctive rule of combination and the generalized Bayesian theorem. Int. Jour. of Approximate Reasoning 9 :1-35, 1993.
[17] P. Smets. The nature of the unnormalized beliefs encountered in the transferable belief model. In Proceedings of the 8th Annual Conference on Uncertainty in Artificial Intelligence, pages 292-297, 1992.
[18] P. Smets. Advances in the Dempster-Shafer Theory of Evidence – What is Dempster-Shafer’s model? Wiley, 1994.
[19] P. Smets. Decision making in the tbm: The necessity of the pignistic transformation. Int. Jour. of Approximate Reasoning 38 :133-147, 2005.
[20] P. Smets and R. Kennes. The transferable belief model. Artificial Intelligence, 66(2) :191-234, 1994.
[21] B. Yaghlane, P. Smets, and K. Mellouli. Belief function independence: I. the marginal case. Int. Jour. of Approximate Reasoning 29 :47-70, 2001.
[22] B. Yaghlane, P. Smets, and K. Mellouli. Belief function independence: II. the conditional case. Int. Jour. of Approximate Reasoning 31(1) :31-75, 2002.
[23] B. Yaghlane, P. Smets, and K. Mellouli. Independence concept for belief functions. In Physica-Verlag, editor, Technologies for constructing intelligent systems: Tools, 2002.

^1 Proposal for a Regulation of the European Parliament and of the Council laying down harmonized rules on artificial intelligence (artificial intelligence legislation) and amending certain Union legislative acts

^2 ISO/IEC 42001