Over the last three decades, information technology has grown exponentially to meet the information processing needs of data-driven businesses in government, science, and private industry: capturing, staging, integrating, conveying, analyzing, and transferring data that helps knowledge workers and decision makers make sound business decisions. Data integration across enterprise warehouses is one of the most
challenging steps in the big data analytics strategy. Several levels of data integration have been identified across
enterprise warehouses: data accessibility, common data platform, and consolidated data model. Each level of integration
has its own set of complexities that require a certain amount of time, budget, and resources to implement. Such levels of
integration are designed to address the technical challenges inherent in consolidating the disparate data sources. In this
paper, we present a methodology based on industry best practices to measure the readiness of an organization and its
data sets against the different levels of data integration. We introduce a new Integration Level Model (ILM) tool, which
is used to quantify an organization's and a data system's readiness to share data at a certain level of data integration. It
is based largely on the established and accepted framework provided by the Data Management Association in its Data Management Body of Knowledge (DAMA-DMBOK).
It comprises several key data management functions and supporting activities, together with several
environmental elements that describe and apply to each function. The proposed model scores the maturity of a system’s
data governance processes and provides a pragmatic methodology for evaluating integration risks. The higher the
computed scores, the better managed the source data system and the greater the likelihood that the data system can be
brought in at a higher level of integration.
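To make the scoring concrete, the following is a minimal sketch of how an ILM-style readiness score could be computed, assuming each DAMA-DMBOK function is rated on a 1-to-5 maturity scale; the function names, weights, and level thresholds below are illustrative placeholders, not values from the paper.

# Hypothetical sketch of an ILM-style readiness score: each data management
# function (per DAMA-DMBOK) is rated on a 1-5 maturity scale and combined
# into a weighted score. Function names and weights are illustrative only.

DMBOK_WEIGHTS = {
    "data_governance": 0.25,
    "data_architecture": 0.15,
    "data_quality": 0.25,
    "metadata_management": 0.15,
    "data_security": 0.20,
}

def ilm_readiness_score(maturity_ratings: dict[str, float]) -> float:
    """Weighted maturity score in [1, 5]; higher scores suggest the
    source system can be brought in at a higher integration level."""
    return sum(DMBOK_WEIGHTS[f] * maturity_ratings[f] for f in DMBOK_WEIGHTS)

def integration_level(score: float) -> str:
    """Map a score to one of the three integration levels from the text.
    The cutoffs are assumed thresholds for illustration."""
    if score >= 4.0:
        return "consolidated data model"
    if score >= 2.5:
        return "common data platform"
    return "data accessibility"

ratings = {"data_governance": 3, "data_architecture": 4, "data_quality": 2,
           "metadata_management": 3, "data_security": 4}
score = ilm_readiness_score(ratings)
print(f"score={score:.2f}, level={integration_level(score)}")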
Educational data analytics is an emerging discipline, concerned with developing methods for exploring the unique types
of data that come from the educational context. For example, predicting college student performance is crucial for both
the student and educational institutions. It can support timely intervention to prevent students from failing a course, increase the efficacy of advising functions, and improve course completion rates. In this paper, we present the efforts
carried out at Oak Ridge National Laboratory (ORNL) toward applying predictive analytics to academic data collected
from 2009 through 2013 and available in one of the most commonly used learning management systems, called Moodle.
First, we identified the data features useful for predicting student outcomes, such as students' scores on homework assignments, quizzes, and exams, as well as their activity in discussion forums and their GPA in the term they enrolled in the course. Then, Logistic Regression and Neural Network predictive models are used to identify, as early as possible, students who are in danger of failing the course in which they are currently enrolled. These models compute
the likelihood of any given student failing (or passing) the current course. Numerical results are presented to evaluate
and compare the performance of the developed models and their predictive accuracy.
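As an illustration of the modeling step, the sketch below trains a logistic-regression failure predictor on synthetic data with the feature categories named above (homework, quiz, and exam scores, forum activity, term GPA); the data and the scikit-learn pipeline are assumptions for demonstration, not the paper's actual Moodle dataset or implementation.

# Minimal sketch of a logistic-regression failure predictor using
# scikit-learn. The synthetic data below is illustrative only; the paper's
# actual 2009-2013 Moodle records are not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(0, 100, n),   # homework average
    rng.uniform(0, 100, n),   # quiz average
    rng.uniform(0, 100, n),   # exam average
    rng.poisson(5, n),        # discussion-forum activity count
    rng.uniform(0, 4, n),     # GPA in the enrollment term
])
# Synthetic label: fail when a weighted blend of scores falls below a cutoff.
y = ((0.3 * X[:, 0] + 0.3 * X[:, 1] + 0.4 * X[:, 2]) < 50).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Probability that each held-out student fails the current course.
p_fail = model.predict_proba(X_te)[:, 1]
print("held-out accuracy:", model.score(X_te, y_te))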
The majority of funding for research and development (R&D) in cyber-security is focused on the end of the software
lifecycle where systems have been deployed or are nearing deployment. Recruiting of cyber-security personnel is
similarly focused on end-of-life expertise. By emphasizing cyber-security at these late stages, security problems are
found and corrected when it is most expensive to do so, thus increasing the cost of owning and operating complex
software systems. Worse, expenditures on expensive security measures often mean less money for innovative
developments. These unwanted increases in cost and potential slowing of innovation are unavoidable consequences of an
approach to security that finds and remediates faults after software has been implemented. We argue that software
security can be improved and the total cost of a software system can be substantially reduced by an appropriate
allocation of resources to the early stages of a software project. By adopting a similar allocation of R&D funds to the
early stages of the software lifecycle, we propose that the costs of cyber-security can be better controlled and,
consequently, the positive effects of this R&D on industry will be much more pronounced.
The success of data-driven business in government, science, and private industry is driving the need for seamless
integration of intra- and inter-enterprise data sources to extract knowledge nuggets in the form of correlations, trends, patterns, and behaviors previously undiscovered due to the physical and logical separation of datasets. Today, as the volume, velocity, variety, and complexity of enterprise data keep increasing, next-generation analysts face several
challenges in the knowledge extraction process. Towards addressing these challenges, data-driven organizations that rely
on the success of their analysts have to make investment decisions for sustainable data/information systems and
knowledge discovery. Options that organizations are considering include newer storage/analysis architectures, better analysis machines, redesigned analysis algorithms, collaborative knowledge management tools, and query builders, among many others. In this paper, we present a concept of operations for enabling knowledge discovery that data-driven
organizations can leverage towards making their investment decisions. We base our recommendations on the experience
gained from integrating multi-agency enterprise data warehouses at the Oak Ridge National Laboratory to design the
foundation of future knowledge nurturing data-system architectures.
The US Congress has passed legislation dictating that all government agencies establish a plan and process for
improving energy efficiencies at their sites. In response to this legislation, Oak Ridge National Laboratory (ORNL) has
recently conducted a pilot study to explore the deployment of a wireless sensor system for a real-time measurement-based
energy efficiency optimization framework for the steam distribution system on the ORNL campus. We assess the real-time status of the distribution system by observing the state measurements of acoustic
sensors mounted on the steam pipes/traps/valves. In this paper, we describe a spectral-based energy signature scheme
that interprets acoustic vibration sensor data to estimate steam flow rates and assess steam trap health status.
Experimental results show that the energy signature scheme has the potential to identify different steam trap health states and has sufficient sensitivity to estimate steam flow rate. Moreover, results indicate a nearly quadratic relationship
over the test region between the overall energy signature factor and flow rate in the pipe. The analysis based on
estimated steam flow and steam trap status helps generate alerts that enable operators and maintenance personnel to take
remedial action. The goal is to achieve significant energy savings in steam lines by monitoring and acting on leaking steam valves and traps.
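A minimal sketch of the kind of spectral energy-signature computation described above follows, assuming the sensor yields a uniformly sampled vibration waveform; the frequency band, calibration pairs, and quadratic inversion are illustrative placeholders rather than the paper's measured values.

# Sketch of a spectral energy signature: sum the FFT power within a band of
# interest, then use a quadratic calibration fit to estimate flow rate.
# Band limits and calibration data below are illustrative assumptions.
import numpy as np

def energy_signature(waveform: np.ndarray, fs: float,
                     band: tuple[float, float] = (1e3, 5e3)) -> float:
    """Sum of FFT power within a frequency band of interest."""
    spectrum = np.abs(np.fft.rfft(waveform)) ** 2
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(spectrum[mask].sum())

# Fit the nearly quadratic signature-vs-flow relation reported above from
# hypothetical calibration pairs (flow rate, signature), then invert it.
flows = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # known calibration flow rates
sigs = np.array([2.1, 8.3, 18.9, 33.5, 52.0])  # corresponding signatures
coeffs = np.polyfit(flows, sigs, deg=2)        # quadratic model, a*x^2+b*x+c

def estimate_flow(signature: float) -> float:
    """Pick the positive real root of the fitted quadratic."""
    a, b, c = coeffs
    roots = np.roots([a, b, c - signature])
    return float(max(r.real for r in roots if abs(r.imag) < 1e-9))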
The Extreme Measurement Communications Center at Oak Ridge National Laboratory (ORNL) explores the deployment
of a wireless sensor system with a real-time measurement-based energy efficiency optimization framework in the ORNL
campus. With particular focus on the 12-mile-long steam distribution network on our campus, we propose an integrated system-level approach to optimize energy delivery within the steam distribution system. We address the goal of achieving significant energy savings in steam lines by monitoring and acting on leaking steam valves/traps. Our approach leverages integrated wireless sensing and real-time monitoring capabilities. We assess the real-time status of the distribution system by mounting acoustic sensors on the steam pipes/traps/valves and analyzing the state measurements of these sensors. We describe
Fourier-spectrum based algorithms that interpret acoustic vibration sensor data to characterize flows and classify the
steam system status. We are able to present the sensor readings, steam flow, steam trap status and the assessed alerts as
an interactive overlay within a web-based Google Earth geographic platform that enables decision makers to take
remedial action. We believe our demonstration serves as an instantiation of a platform whose implementation can be extended to include newer modalities for managing water flow, sewage, and energy consumption.
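As a sketch of the visualization step, the snippet below shows one plausible way to publish per-sensor trap status as a KML overlay that Google Earth can display; the coordinates, status values, and helper function name are hypothetical, not the actual ORNL implementation.

# Hypothetical sketch: emit one KML Placemark per sensor so assessed trap
# status can be viewed as a Google Earth overlay. All values are placeholders.
import xml.etree.ElementTree as ET

def sensors_to_kml(sensors: list[dict]) -> str:
    """sensors: [{'name':..., 'lon':..., 'lat':..., 'status':...}, ...]"""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    for s in sensors:
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = s["name"]
        ET.SubElement(pm, "description").text = f"Trap status: {s['status']}"
        point = ET.SubElement(pm, "Point")
        ET.SubElement(point, "coordinates").text = f"{s['lon']},{s['lat']},0"
    return ET.tostring(kml, encoding="unicode")

print(sensors_to_kml([
    {"name": "trap-07", "lon": -84.31, "lat": 35.93, "status": "leaking"},
]))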
The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet.
In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the
Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level.
Integrating HMMs represents a further step within the broader OCR approaches currently being researched in the literature. The
proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to
achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then
used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is
developed in this study, where the transition matrix is built from a large collected corpus and the emission matrix is built from the results obtained via the extracted character features. The recognition process is triggered using the
Viterbi algorithm, which finds the most probable sequence of sub-words. The model was implemented to recognize Arabic text at the sub-word level, so that the overall recognition rate is no longer bound by the worst recognition rate of any single character but instead reflects the structure of the Arabic language as a whole. Numerical results show a potentially large recognition improvement from using the proposed algorithms.
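For concreteness, a compact sketch of Viterbi decoding over an HMM follows, of the kind the recognition step relies on; the toy transition and emission matrices stand in for the corpus-derived and feature-derived matrices described above.

# Viterbi decoding in log space: recover the most probable hidden state
# (character) path for an observed feature sequence. Toy matrices only.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """obs: observation indices; returns the most likely state path."""
    n_states = trans_p.shape[0]
    T = len(obs)
    log_delta = np.full((T, n_states), -np.inf)   # best log-prob per state
    backptr = np.zeros((T, n_states), dtype=int)  # best predecessor state
    log_delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = log_delta[t - 1] + np.log(trans_p[:, j])
            backptr[t, j] = np.argmax(scores)
            log_delta[t, j] = scores[backptr[t, j]] + np.log(emit_p[j, obs[t]])
    # Trace the best path backward from the most probable final state.
    path = [int(np.argmax(log_delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy two-state example with made-up probabilities.
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 1], np.array([0.5, 0.5]), trans, emit))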
In this paper, we conduct a performance evaluation study of an aviation security cargo inspection queuing system for
material flow and accountability. The queuing model employed in our study is based on discrete-event simulation and
processes various types of cargo simultaneously. Onsite measurements are collected in an airport facility to validate the
queuing model. The overall performance of the aviation security cargo inspection system is computed, analyzed, and
optimized for the different system dynamics. Various performance measures are considered such as system capacity,
residual capacity, throughput, capacity utilization, subscribed capacity utilization, resource capacity utilization, subscribed resource capacity utilization, and the number of cargo pieces (or pallets) in the different queues. These metrics
are performance indicators of the system's ability to service current needs and of its response capacity to additional requests. We study and analyze different scenarios by changing various model parameters such as number of pieces per pallet,
number of TSA inspectors and ATS personnel, number of forklifts, number of explosives trace detection (ETD) and
explosives detection system (EDS) inspection machines, inspection modality distribution, alarm rate, and cargo closeout
time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted
performance measures should reduce the overall cost and shipping delays associated with new inspection requirements.
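The sketch below illustrates the discrete-event flavor of such a queuing model using SimPy, with a single ETD inspection stage; the arrival rate, service time, and machine count are illustrative parameters, not the airport measurements used to validate the paper's model.

# Minimal discrete-event sketch of a cargo inspection queue with SimPy.
# All rates and counts below are illustrative assumptions.
import random
import simpy

RNG = random.Random(42)
INTERARRIVAL = 5.0      # hypothetical mean minutes between pallet arrivals
ETD_TIME = 8.0          # hypothetical mean ETD inspection minutes per pallet
N_ETD_MACHINES = 2      # hypothetical number of ETD machines

times_in_system = []

def pallet(env, etd):
    arrival = env.now
    with etd.request() as req:                 # join the ETD queue
        yield req                              # wait for a free machine
        yield env.timeout(RNG.expovariate(1.0 / ETD_TIME))
    times_in_system.append(env.now - arrival)  # record time in system

def source(env, etd):
    while True:
        yield env.timeout(RNG.expovariate(1.0 / INTERARRIVAL))
        env.process(pallet(env, etd))

env = simpy.Environment()
etd = simpy.Resource(env, capacity=N_ETD_MACHINES)
env.process(source(env, etd))
env.run(until=8 * 60)                          # simulate one 8-hour shift

print(f"throughput: {len(times_in_system)} pallets")
print(f"mean time in system: {sum(times_in_system) / len(times_in_system):.1f} min")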
Recent events highlight the need for efficient tools for anticipating the threat posed by terrorists, whether individuals or
groups. Antiterrorism includes fostering awareness of potential threats, deterring aggressors, developing security
measures, planning for future events, halting an event in progress, and ultimately mitigating and managing the
consequences of an event. To analyze such components, one must understand various aspects of threat elements like
physical assets and their economic and social impacts. To this aim, we developed a three-layer Bayesian belief network
(BBN) model that takes into consideration the relative threat of an attack against a particular asset (physical layer) as
well as the individual psychology and motivations that would induce a person to either act alone or join a terrorist group
and commit terrorist acts (social and economic layers). After researching the many possible motivations for becoming a terrorist, we compiled and sorted the main factors into categories such as initial and personal indicators, exclusion
factors, and predictive behaviors. Assessing such threats requires combining information from disparate data sources, most of which involve uncertainties. A BBN combines these data in a coherent, analytically defensible, and understandable
manner. The developed BBN model takes into consideration the likelihood and consequence of a threat in order to draw
inferences about the risk of a terrorist attack so that mitigation efforts can be optimally deployed. The model is
constructed using a network engineering process that treats the probability distributions of all the BBN nodes within the
broader context of the system development process.
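As a toy illustration of BBN inference across layers, the snippet below conditions a risk node on a social-layer motivation node and a physical-layer vulnerability node using brute-force enumeration; all probabilities are made up, and the paper's actual network is far larger.

# Toy three-node BBN: Motivation (social layer), AssetVulnerable (physical
# layer), and Risk conditioned on both. Probabilities are placeholders.
from itertools import product

P_MOTIVATION = {True: 0.05, False: 0.95}          # P(M)
P_VULNERABLE = {True: 0.20, False: 0.80}          # P(V)
P_RISK = {                                        # P(Risk=True | M, V)
    (True, True): 0.60, (True, False): 0.15,
    (False, True): 0.05, (False, False): 0.01,
}

def p_risk_given(evidence: dict) -> float:
    """P(Risk=True | evidence) via enumeration over unobserved parents."""
    num = den = 0.0
    for m, v in product([True, False], repeat=2):
        # Skip parent assignments inconsistent with the observed evidence.
        if evidence.get("M", m) != m or evidence.get("V", v) != v:
            continue
        joint = P_MOTIVATION[m] * P_VULNERABLE[v]
        num += joint * P_RISK[(m, v)]
        den += joint
    return num / den

print("prior risk:", round(p_risk_given({}), 4))
print("risk given vulnerable asset:", round(p_risk_given({"V": True}), 4))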
Beginning in 2010, the U.S. will require that all cargo loaded on passenger aircraft be inspected. This will require more
efficient processing of cargo and will have a significant impact on the inspection protocols and business practices of
government agencies and the airlines. In this paper, we develop an aviation security cargo inspection queuing simulation
model for material flow and accountability that will allow cargo managers to conduct impact studies of current and
proposed business practices as they relate to inspection procedures, material flow, and accountability.