Riccardo Bellazzi, Blaz Zupan
Intelligent Data Analysis in Medicine and Pharmacology: A Position
Statement
Intelligent data analysis methods support information extraction from
data by exploiting the domain's background knowledge. We address several
issues regarding the definition, use, and impact of these methods, and
investigate their acceptance in the application domains of medicine
and pharmacology through a MEDLINE search. The authors of the paper
believe that the basic philosophy of IDA is to be application driven:
its goal is to develop, adapt, or re-use existing methods to solve a
specific problem. Sticking to an application-driven approach may help to
prove the case for cost-effectiveness and may increase the awareness
and acceptance of these methods in the medical community.
Nada Lavrac
Data Mining in Medicine: Selected Techniques and Applications
Widespread use of medical information systems and explosive growth of
medical databases require traditional manual data analysis to be
coupled with methods for efficient computer-assisted analysis. This
paper presents selected data mining techniques that can be applied in
medicine, and in particular some machine learning techniques including
the mechanisms that make them better suited for the analysis of
medical databases (derivation of symbolic rules, use of background
knowledge, sensitivity and specificity of induced descriptions). The
importance of the interpretability of results of data analysis is
discussed and illustrated on selected medical applications.
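The sensitivity and specificity criteria mentioned above can be made concrete by treating an induced rule as a binary classifier and scoring it on a labelled sample. A minimal sketch with invented toy data (the threshold rule and the numbers are purely illustrative, not from any medical dataset):

```python
def sensitivity_specificity(predictions, labels):
    """Sensitivity (recall on positives) and specificity (recall on
    negatives) of a rule's binary predictions against true labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    tn = sum(1 for p, y in zip(predictions, labels) if not p and not y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# A toy induced rule: predict "disease" when a symptom score exceeds 3.
scores = [5, 4, 1, 6, 2, 0, 3, 7]
labels = [1, 1, 0, 1, 0, 0, 1, 1]   # true diagnoses (invented)
preds = [s > 3 for s in scores]
sens, spec = sensitivity_specificity(preds, labels)
```

In medical rule induction, a trade-off between the two measures is typical: a rule can be made more sensitive at the price of more false positives, which is exactly why both numbers are reported together.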
Ulf S. Carlin, Jan Komorowski and Aleksander Ohrn
Rough Set Analysis of Medical Datasets and a Case of Patients with
Suspected Acute Appendicitis
A significant area in the field of medical informatics is concerned
with learning medical models from low-level data. The ultimate goals
of this activity include the development of classifiers or predictors
for unseen cases and the analysis of the developed models, so that new
insight into the nature of the given problem can be obtained.
This article introduces a methodology based on rough sets and
Boolean reasoning and illustrates its application on a dataset
describing 257 patients with suspected acute appendicitis. Exactly the
same dataset has previously been analyzed using logistic regression,
and the difference in performance between the two methods is
found to be very small. However, the rough set approach offers in
addition a set of decision rules that explicitly represent the
discovered knowledge. These automatically synthesized rules perform
better than a surgeon with 2 to 6 years of training. The main
attractions of rough sets for the medical informatics community should
be their classificatory power and, most importantly, the possibility
of mixing qualitative and quantitative parameters (both continuous and
discrete) and combining explicit (user-defined) and data-generated
models. Good toolkits that support the knowledge discovery process
with rough sets now exist for Windows NT/95. An example is the Rosetta
toolkit, which is also available in a public version.
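The rough-set machinery the abstract refers to can be sketched in a few lines: objects that are indiscernible on the chosen attributes form equivalence classes, and the lower/upper approximations collect the classes that certainly/possibly belong to the positive decision. The toy decision table below is invented for illustration; it is not the paper's 257-patient dataset:

```python
def approximations(objects, attrs, decision):
    """Lower and upper approximations of the positive decision class,
    given the equivalence classes induced by the chosen attributes."""
    # Group objects by their values on the selected attributes.
    classes = {}
    for obj in objects:
        key = tuple(obj[a] for a in attrs)
        classes.setdefault(key, []).append(obj)
    lower, upper = [], []
    for group in classes.values():
        decisions = {obj[decision] for obj in group}
        if decisions == {1}:   # every member positive: certainly in
            lower.extend(group)
        if 1 in decisions:     # some member positive: possibly in
            upper.extend(group)
    return lower, upper

# Toy decision table: two symptoms and an appendicitis decision.
patients = [
    {"fever": 1, "pain": 1, "appendicitis": 1},
    {"fever": 1, "pain": 1, "appendicitis": 1},
    {"fever": 1, "pain": 0, "appendicitis": 1},
    {"fever": 1, "pain": 0, "appendicitis": 0},  # conflicts with previous
    {"fever": 0, "pain": 0, "appendicitis": 0},
]
lower, upper = approximations(patients, ["fever", "pain"], "appendicitis")
```

The gap between the two approximations (the boundary region) is where the conflicting patients live; decision rules synthesized from the lower approximation are certain, those from the boundary only possible.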
Dragan Gamberger, Nada Lavrac
Experiments with noise detection algorithms in
the diagnosis of coronary artery disease
The paper presents a series of noise detection experiments in a
medical problem of coronary artery disease diagnosis. The following
algorithms for noise detection and elimination are tested: a
classification filter, a saturation filter, a combined
classification-saturation filter, and a consensus saturation filter.
The distinguishing feature of the consensus saturation filter is its
high reliability which is due to multiple detection of potentially
noisy examples. Reliable detection of noisy examples is important for
the analysis of patient records in medical databases, as well as for
the induction of rules from filtered data, representing genuine
characteristics of the diagnostic domain. Medical evaluation in the
problem of coronary artery disease diagnosis shows that the detected
noisy examples are indeed noisy or non-typical class representatives.
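The idea behind a classification filter can be shown in miniature: examples misclassified by a classifier trained on the remaining data are flagged as potentially noisy. The sketch below substitutes a leave-one-out 1-nearest-neighbour classifier for the paper's filter classifiers and uses invented 1-D data, so it illustrates the filtering principle only, not the paper's algorithms:

```python
def classification_filter(examples, labels):
    """Flag as potentially noisy every example misclassified by a
    leave-one-out 1-nearest-neighbour classifier (a minimal stand-in
    for cross-validated filter classifiers)."""
    noisy = []
    for i, (x, y) in enumerate(zip(examples, labels)):
        # Nearest neighbour among the remaining examples.
        j = min((k for k in range(len(examples)) if k != i),
                key=lambda k: abs(examples[k] - x))
        if labels[j] != y:
            noisy.append(i)
    return noisy

# Two clean clusters plus one point whose label looks mistaken.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [0,   0,   0,   1,   1,   0]   # last label disagrees with its cluster
flagged = classification_filter(xs, ys)
```

A consensus variant, as in the paper, would only keep examples flagged by several independent filters, which is what gives the consensus saturation filter its reliability.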
Marc Weeber and Rein Vos
Extracting Expert Knowledge from Medical Texts
In this paper we argue that researchers in Intelligent Data Analysis
(IDA) in Medicine and Pharmacology should consider textual databases
as an additional source of knowledge. We describe three areas where
medical knowledge extraction from textual databases can be fruitful:
finding new applications for existing drugs, the evolution of medical
knowledge over time, and drug risk assessment. We evaluate two textual
knowledge extraction methods, where an IDA approach proves to be robust
and efficient compared to a common computational linguistic one.
Steen Andreassen, Leonard Leibovici, Henrik C. Schonheyder, Brian
Kristensen, Christian Riekehr, Anders Geill Kjar and Kristian
G. Olesen
A Decision Theoretic Approach to Empirical Treatment of Bacteraemia
originating from the Urinary Tract
It is a dilemma that empirical antibiotic treatment with
broad-spectrum antibiotics provides a high probability of covering
treatment, but may be associated with unnecessary costs in the form of
direct expenses, side effects, and facilitated development of
antibiotic resistance. We present a decision support system (DSS) that
uses a causal probabilistic network (CPN) to represent medical
knowledge and to analyze the data from each patient. The CPN has
utilities assigned to therapeutic options and outcomes. Each category
of cost is expressed as a loss of life-years (LY), and the ecological
consequences of antibiotic use are accounted for in LYs under the
assumption that antibiotic resistance leads to a loss of therapeutic
options for future patients. For the purpose of building the CPN,
clinical data and outcomes were taken from a database covering
1992-94, containing data from 491 patients with urosepticaemia. By
necessity, a range of assumptions had to be made regarding average
life expectancy and the monetary value of one LY, which was set to
50,000 ECU. Simulations on data from 426 cases of urosepticaemia
collected during 1995-1996 showed the DSS capable of selecting
antibiotics of an overall lower price, higher coverage and less
ecological cost than the antibiotics actually chosen for empirical
treatment. Thus, a DSS incorporating the CPN could achieve a desirable
antibiotic policy, and it holds promise for improving empirical
antibiotic therapy.
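Once every cost category is on a common life-year scale, the selection step reduces to picking the option with the highest expected utility. The sketch below is a drastic simplification of a CPN-based system, with a flat expected-utility calculation and wholly invented numbers; only the structure (coverage benefit minus side-effect and ecological cost, all in LYs) follows the abstract:

```python
def choose_treatment(options):
    """Pick the antibiotic with the highest expected utility, all cost
    categories being expressed on a common life-year (LY) scale."""
    def expected_utility(opt):
        benefit = opt["coverage"] * opt["ly_gain_if_covered"]
        cost = opt["ly_cost_side_effects"] + opt["ly_cost_ecology"]
        return benefit - cost
    return max(options, key=expected_utility)

# Hypothetical therapeutic options; every number is invented.
options = [
    {"name": "broad", "coverage": 0.95, "ly_gain_if_covered": 2.0,
     "ly_cost_side_effects": 0.05, "ly_cost_ecology": 0.40},
    {"name": "narrow", "coverage": 0.85, "ly_gain_if_covered": 2.0,
     "ly_cost_side_effects": 0.02, "ly_cost_ecology": 0.05},
]
best = choose_treatment(options)
```

With these numbers the broad-spectrum drug's ecological cost outweighs its extra coverage, which is exactly the dilemma the abstract describes: a real CPN would additionally condition the coverage probability on the patient's data.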
Peter Hammond and Paul M. Speight
Screening for Risk of Oral Cancer and Pre-cancer
Oral cancer has a relatively low incidence but is potentially very
serious if not identified early. For economic reasons, widespread
screening for the disease is not appropriate, so opportunistic,
computer-aided screening in primary and secondary healthcare is being
investigated. The low incidence and the disproportionately low number
of positive diagnoses arising in oral screening programmes give rise
to a very sparse dataset. Even so, the machine learning techniques
considered here perform as well as general dental and hospital-based
screeners in selecting individuals at risk, if less well at avoiding
false positives. Once identified, patients can be recalled for a
detailed examination of their oral mucosa and lifestyle counselling.
Sylvie Jami, Xiaohui Liu and George Loizou
Learning from an Incomplete and Uncertain Data
Set: The identification of variant haemoglobins
The use of AI techniques for the identification of variant
haemoglobins has so far been restricted to the development of expert
systems of limited scope. The process of identifying haemoglobins is
difficult, particularly because the large number of missing values in
the data set hinders comparisons between data about an unknown
haemoglobin and data about known haemoglobins, in a context where
classification is not appropriate. Case-based reasoning requires an
assessment of the similarity between a new case and known cases, and
it is possible to use a distance measure making implicit assumptions
about the missing values. However, the characteristics of the data set
make this unsafe, as is the use of such assumptions when trying to
predict each column directly using all the others. Consequently, the
induction of rules from the available data, to be used to fill in the
missing values, seems to be a good option, in particular the use of
association rules, since they impose few requirements on the data.
However, they tend to produce too many rules of little interest.
The Flexible Consequent (FC) algorithm proposed herein represents an
attempt to produce fewer and more relevant rules in a flexible and
efficient way; the actual prediction of the missing values is not
considered here. A comparison is made between results obtained by
using standard association rule techniques and those obtained by
using the FC algorithm.
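The support/confidence bookkeeping behind standard association rules, which the FC algorithm refines, can be sketched directly over attribute-value rows. The data below are invented (including a missing value, to echo the haemoglobin setting); this shows only the standard measures, not the FC algorithm itself:

```python
def rule_support_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent
    over a table of attribute-value rows (dicts, possibly with
    missing values stored as None)."""
    matches_a = [r for r in rows
                 if all(r.get(k) == v for k, v in antecedent.items())]
    matches_both = [r for r in matches_a
                    if all(r.get(k) == v for k, v in consequent.items())]
    support = len(matches_both) / len(rows)
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence

# Invented haemoglobin-like records; one stability value is missing.
rows = [
    {"mobility": "fast", "stability": "low",  "chain": "beta"},
    {"mobility": "fast", "stability": "low",  "chain": "beta"},
    {"mobility": "fast", "stability": "high", "chain": "alpha"},
    {"mobility": "slow", "stability": None,   "chain": "beta"},
]
sup, conf = rule_support_confidence(rows, {"mobility": "fast"},
                                    {"chain": "beta"})
```

Mining all rules above support/confidence thresholds quickly yields the flood of uninteresting rules the abstract complains about; restricting and flexibilizing the consequent is the FC algorithm's answer to that.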
Stefania Montani, Riccardo Bellazzi, Luigi Portinale, Stefano Fiocchi,
and Mario Stefanelli
A Case-Based Retrieval System for Diabetic Patients Therapy
We propose a decision support tool based on the Case-Based Reasoning
technique, meant to help physicians retrieve past similar cases and
able to provide a suggestion about the revision of a diabetic
patient's therapy scheme. A case is defined as a set of features
collected during a visit. A taxonomy of prototypical situations, or
classes, has been formalized; a set of cases belonging to these
classes has been stored in a relational database. For each input
case, the system allows the physician to find similar situations that
already took place in the past, both for the same patient and for
different ones. The reasoning process consists of two steps: 1)
finding the classes to which the input case could belong; 2) finding
the most similar cases from these classes, through a nearest-neighbour
technique. The tool is integrated in the EU-funded T-IDDM (Telematic
Management of Insulin Dependent Diabetes Mellitus) project.
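The two-step retrieval process might be sketched as follows, with a single class prototype standing in for step 1 and plain Euclidean nearest-neighbour search for step 2. All feature values are invented, and the real system's class assignment and similarity measure are certainly richer than this:

```python
def retrieve(case, classes, k=2):
    """Two-step case retrieval: (1) pick the class whose prototype is
    closest to the input case; (2) return the k nearest stored cases
    from that class (Euclidean distance on numeric features)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Step 1: candidate class by distance from its prototype.
    best_class = min(classes, key=lambda c: dist(case, c["prototype"]))
    # Step 2: nearest-neighbour search inside that class.
    ranked = sorted(best_class["cases"], key=lambda past: dist(case, past))
    return ranked[:k]

# Hypothetical features, e.g. (HbA1c, blood glucose); values invented.
classes = [
    {"prototype": (6.0, 120.0),
     "cases": [(5.8, 118.0), (6.4, 125.0), (7.0, 140.0)]},
    {"prototype": (10.0, 200.0),
     "cases": [(9.5, 190.0), (11.0, 210.0)]},
]
nearest = retrieve((6.1, 122.0), classes)
```

Narrowing the search to one or a few candidate classes first keeps nearest-neighbour retrieval cheap as the relational case base grows.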
Werner Horn, Christian Popow, and Lukas Unterasinger
Metaphor graphics to visualize ICU data over time
The time-oriented analysis of electronic patient records at
a (neonatal) intensive care unit is a tedious and time-consuming task.
The vast amount of data available makes it hard for the physician
to recognize the essential changes over time. VIE-VISU is a data
visualization system which uses multiples to present the change in
the patient's status over time in graphic form. Metaphor graphics
are used to sketch the parameters that are most relevant in
characterizing the situation of a patient.
Yuval Shahar and Cleve Cheng
Knowledge-Based Visualization and Navigation of Time-Oriented Clinical
Data and their Abstractions
We describe a framework (KNAVE) that is independent of any clinical
domain, but that is specific to the task of interpretation,
summarization, visualization, explanation, and interactive navigation
in a context-sensitive manner through time-oriented raw clinical data
and the multiple levels of higher-level, interval-based concepts that
can be abstracted from these data. The KNAVE domain-independent
navigation operators access the domain-specific knowledge base, which
is modeled by the formal ontology of the knowledge-based
temporal-abstraction method; the method generates the temporal
abstractions from the time-oriented database. Thus, domain-specific
knowledge underlies the semantics of the domain-independent
visualization and navigation processes. By accessing the
domain-specific temporal-abstraction knowledge base and the
domain-specific time-oriented database, the KNAVE modules enable users
to query for domain-specific temporal abstractions and to change the
focus of the visualization. For a different task (visualization and
navigation), the KNAVE framework reuses the same domain model that was
acquired, for the purpose of temporal abstraction, from physicians who
are experts in the domain. Initial evaluation of the KNAVE
prototype has been encouraging. The KNAVE methodology has broad
implications for tasks such as therapy planning, patient monitoring,
explanation in medical decision-support systems, and semimanual data
mining in time-oriented clinical databases.
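The abstraction from time-oriented raw data to interval-based concepts can be illustrated in miniature: consecutive time-stamped readings that share a qualitative state are merged into (start, end, state) intervals. This is only the simplest state-abstraction step, with hypothetical glucose states; the knowledge-based temporal-abstraction method underlying KNAVE is far richer:

```python
def abstract_intervals(samples):
    """Merge consecutive (time, state) readings that share a
    qualitative state into (start, end, state) intervals -- a minimal
    version of state abstraction over time-oriented data."""
    intervals = []
    for t, state in samples:
        if intervals and intervals[-1][2] == state:
            # Same state as the open interval: extend its end time.
            intervals[-1] = (intervals[-1][0], t, state)
        else:
            intervals.append((t, t, state))
    return intervals

# Glucose readings already classified into qualitative states (invented).
readings = [(0, "NORMAL"), (1, "NORMAL"), (2, "HIGH"),
            (3, "HIGH"), (4, "HIGH"), (5, "NORMAL")]
abstracted = abstract_intervals(readings)
```

It is these interval-based abstractions, stacked in multiple levels above the raw data, that a user navigates and queries in a framework like KNAVE.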
Short Papers
Katarina Kralj, Matjaz Kukar
Using Machine Learning to Analyse Attributes in the Diagnosis of
Coronary Artery Disease
Coronary artery disease (CAD) is one of the most important diseases
and causes of early mortality worldwide. The diagnostic process of CAD
consists of four levels. Various machine learning methods were used
for learning classifiers: naive and semi-naive Bayesian classifier,
K-nearest neighbours, Assistant-R, Assistant-I, and multilayered
feedforward neural net. We tried to extract the best subset of
attributes in order to maximise the classification accuracy while
minimising the number of attributes. Six estimates were used for
attribute filtering: information gain, the chi-square test, ReliefF,
the correlation coefficient, and the square of ratios. A 6.9% increase
in classification accuracy over that achieved by physicians was
obtained with machine learning methods when using a domain that
includes the attributes of all three levels. If only the attributes of
the first two levels are used, the increase in classification accuracy
is 15.3%.
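Attribute estimates such as information gain score how much a split on an attribute reduces the entropy of the class label; attributes scoring near zero can be filtered out. A self-contained sketch with invented toy attributes (not the CAD data), showing one perfectly informative attribute and one useless one:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a class label list."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(values, labels):
    """Entropy reduction obtained by splitting on a discrete attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

labels = [1, 1, 0, 0]
gain_good = information_gain(["a", "a", "b", "b"], labels)  # separates classes
gain_bad = information_gain(["a", "b", "a", "b"], labels)   # uninformative
```

The other estimates mentioned (chi-square, ReliefF, and so on) rank attributes by different criteria, which is why using several of them in parallel gives a more robust attribute subset.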
G. Paoli, G. Sanguineti, R. Anselmi, R. Bellazzi, F. Foppiano,
L. Andreucci
Experimental design for intelligent interpretation of Dose Volume
Histograms for radiotherapy treatment planning in head and neck
cancer.
The goal of radiation therapy is to achieve in a selected treatment
volume a dose distribution of radiation that provides the patient with
maximum tumour control and the least possible effect on surrounding
normal tissues. The aim of our work is to establish whether there is a
dependence between irradiation conditions and toxicity, in particular
between dose distribution and toxicity. Consequently, the secondary
aim is to verify whether Dose Volume Histograms can be used to
forecast toxicity episodes of a radiotherapy plan. It was decided to
approach the problem step by step: the first step was to analyse the
data with a statistical tool; we are now going to use decision trees
and neural networks to try to better describe the non-linear
relationships underlying the parameter dependencies. This work is in
progress: we cannot expect to complete it in a short time, because a
large patient database is needed and the patients' acute toxicity must
be known.
Ingo J. Timm
Automatic Generation of Risk Classification
for Decision Support in Critical Care
Modern critical care medicine is marked by a steadily growing amount
of available patient data. This poses a new challenge in extracting
significant information concerning the patient's prognosis or
diagnosis. Classic non-computer-science approaches to decision support
are prognostic scoring systems like the APACHE Scoring System.
Surprisingly, these risk scores, although very common in the U.S.A.,
are not widely used within decision processes in intensive care units
(ICU) in Germany.
This paper addresses this phenomenon and suggests a methodology for
determining risks for a given patient, based on previous experience
with other patients. The automatic risk classification presented here
is based on knowledge discovery in databases (KDD).
Gunther Kohler, Dimitrij Surmeli and Horst-Michael Gross
The state of a traumatic coma patient can be
visualized by means of a SOM
This paper aims at efficient visualization and intelligent alarming
while monitoring the state of traumatic coma patients. We apply and
extend a visualization method that is well known in knowledge
discovery to monitoring the state of traumatic coma patients. We argue
for state observation using a set of geometric shapes such as cubes
or polygons in a single display rather than observing a set of time
series graphs. The necessary transformations can be carried out by
means of a SOM. Beyond mappings, it allows intelligent alarming and
state prognosis. Advantages of this approach are discussed along with
some limitations.
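The core SOM lookup behind such a display is the best-matching unit: the map node whose weight vector lies closest to the current readings, at whose grid position the patient's state is then drawn. A toy 2x2 map with invented, pre-trained weights (real use would train the map on historical monitoring data first):

```python
def best_matching_unit(som, vector):
    """Index of the map node whose weight vector is closest (squared
    Euclidean distance) to the input vector."""
    def dist2(w):
        return sum((a - b) ** 2 for a, b in zip(w, vector))
    return min(range(len(som)), key=lambda i: dist2(som[i]["weights"]))

# A tiny 2x2 map over two normalized vital-sign features (toy weights).
som = [
    {"pos": (0, 0), "weights": (0.1, 0.1)},
    {"pos": (0, 1), "weights": (0.1, 0.9)},
    {"pos": (1, 0), "weights": (0.9, 0.1)},
    {"pos": (1, 1), "weights": (0.9, 0.9)},
]
state = (0.8, 0.85)   # current (normalized) patient readings, invented
node = som[best_matching_unit(som, state)]
```

Because the SOM maps high-dimensional readings onto a fixed grid, the patient's trajectory across that grid can drive a single-display visualization, and crossing into regions associated with critical past states can trigger an alarm.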