Recent years have seen many developments in uncertainty reasoning centered on Bayesian Networks
(BNs). BNs allow fast and efficient probabilistic reasoning. One of the key issues that researchers have
faced in using a BN is determining its parameters and structure for a given problem. Many techniques have
been developed for learning BN parameters from a given dataset pertaining to a particular problem. Most
of the methods developed for learning BN parameters from partially observed data have evolved around the
Expectation-Maximization (EM) algorithm. In its original form, the EM algorithm is a deterministic, iterative two-step
procedure that converges towards the maximum-likelihood (ML) estimates.
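The two-step iteration can be illustrated with a minimal sketch. It assumes a hypothetical two-node network A → B with binary variables in which some values of A are missing. The function names (`joint`, `em_step`, `log_likelihood`, `run_em`), the initial parameter values, and the tolerance are illustrative choices and are not taken from this work. The sketch covers only plain EM for missing values.

```python
import math


def joint(a, b, p_a, p_b):
    """P(A=a, B=b) for the assumed network A -> B.

    p_a is P(A=1); p_b[a] is P(B=1 | A=a).
    """
    prior = p_a if a == 1 else 1.0 - p_a
    cond = p_b[a] if b == 1 else 1.0 - p_b[a]
    return prior * cond


def em_step(data, p_a, p_b):
    """One EM iteration. data is a list of (a, b); a may be None (missing)."""
    # E-step: expected value of A for every record under current parameters.
    weights = []
    for a, b in data:
        if a is None:
            j1 = joint(1, b, p_a, p_b)
            j0 = joint(0, b, p_a, p_b)
            weights.append(j1 / (j0 + j1))
        else:
            weights.append(float(a))

    # M-step: re-estimate parameters from the expected counts.
    n1 = sum(weights)
    n0 = len(data) - n1
    b1 = sum(w * b for w, (_, b) in zip(weights, data))
    b0 = sum((1.0 - w) * b for w, (_, b) in zip(weights, data))
    new_p_a = n1 / len(data)
    new_p_b = [b0 / n0 if n0 > 0 else p_b[0],
               b1 / n1 if n1 > 0 else p_b[1]]
    return new_p_a, new_p_b


def log_likelihood(data, p_a, p_b):
    """Log-likelihood of the observed part of the data."""
    total = 0.0
    for a, b in data:
        if a is None:
            total += math.log(joint(0, b, p_a, p_b) + joint(1, b, p_a, p_b))
        else:
            total += math.log(joint(a, b, p_a, p_b))
    return total


def run_em(data, p_a=0.5, p_b=(0.3, 0.7), tol=1e-8, max_iter=200):
    """Iterate EM until the log-likelihood gain falls below tol."""
    p_b = list(p_b)
    prev = log_likelihood(data, p_a, p_b)
    for _ in range(max_iter):
        p_a, p_b = em_step(data, p_a, p_b)
        cur = log_likelihood(data, p_a, p_b)
        if cur - prev < tol:
            break
        prev = cur
    return p_a, p_b
```

With fully observed data the first iteration already returns the usual relative-frequency estimates. With missing values of A, each iteration leaves the observed-data log-likelihood unchanged or higher.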
The EM algorithm mainly focuses on learning BN parameters from imperfect data where some of the values are
missing. However, in many practical applications, partial observability results in a wider range of imperfections,
e.g., uncertainties arising from incomplete, ambiguous, probabilistic, and belief-theoretic data. Moreover, while
the EM algorithm converges to the ML estimates, it does not guarantee convergence to the underlying true
parameters.
In this paper, we propose an approach that enables one to learn BN parameters from a dataset containing
a wider variety of imperfections. In addition, by introducing an early stopping criterion and a new
initialization method into the EM algorithm, we show how the BN parameters can be learnt so that they are
closer to the underlying true parameters than the converged ML estimates.
Numerous applications of topical interest call for knowledge discovery and classification from information that may be inaccurate and/or incomplete. For example, in an airport threat classification scenario, data from heterogeneous sensors are used to extract features for classifying potential threats. This requires a training set that utilizes non-traditional information sources (e.g., domain experts) to assign a threat level to each training set instance. Sensor reliability, accuracy, noise, etc., all contribute to feature-level ambiguities; conflicting expert opinions generate class-label ambiguities that may nevertheless carry important clues. To accommodate these, a belief-theoretic approach is proposed. It utilizes a data structure that facilitates belief/plausibility queries regarding “ambiguous” itemsets. An efficient Apriori-like algorithm is then developed to extract such frequent itemsets and to generate the corresponding association rules. These rules are then used to classify an incoming “ambiguous” data instance into a class label (which may be “hard” or “soft”). To test its performance, the proposed algorithm is compared with C4.5 on several databases from the UCI repository and on a threat assessment application scenario.
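To make the belief/plausibility queries concrete, the following minimal sketch uses the standard Dempster-Shafer definitions over a mass function: belief sums the mass of the focal sets contained in the query set, and plausibility sums the mass of the focal sets that intersect it. The threat-level labels, the mass values, and the dictionary representation are assumptions for illustration only. The sketch does not reproduce the data structure or the Apriori-like algorithm proposed in this work.

```python
def belief(mass, query):
    """Bel(query): total mass of non-empty focal sets that are subsets of query."""
    query = frozenset(query)
    return sum(m for focal, m in mass.items() if focal and focal <= query)


def plausibility(mass, query):
    """Pl(query): total mass of focal sets that intersect query."""
    query = frozenset(query)
    return sum(m for focal, m in mass.items() if focal & query)


# Hypothetical "soft" class label: an expert is fairly sure the threat is high,
# partly undecided between high and medium, and partly completely ignorant.
mass = {
    frozenset({"high"}): 0.5,
    frozenset({"high", "medium"}): 0.3,
    frozenset({"low", "medium", "high"}): 0.2,
}

bel_high = belief(mass, {"high"})
pl_high = plausibility(mass, {"high"})
```

The interval between `bel_high` and `pl_high` bounds the support for the "high" label; a "hard" label corresponds to a mass function with all mass on a single singleton, where the two values coincide.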