Regular Papers


Riccardo Bellazzi, Blaz Zupan
Intelligent Data Analysis in Medicine and Pharmacology: A Position Statement

Intelligent data analysis (IDA) methods support the extraction of information from data by exploiting the background knowledge of the application domain. We address several issues regarding the definition, use, and impact of these methods, and investigate their acceptance in the application domains of medicine and pharmacology through a MEDLINE search. We believe that the basic philosophy of IDA is application-driven: its goal is to develop, adapt, or reuse existing methods to solve a specific problem. Adhering to an application-driven approach may help demonstrate cost-effectiveness and may increase the awareness and acceptance of these methods in the medical community.


Nada Lavrac
Data Mining in Medicine: Selected Techniques and Applications

Widespread use of medical information systems and the explosive growth of medical databases require traditional manual data analysis to be coupled with methods for efficient computer-assisted analysis. This paper presents selected data mining techniques that can be applied in medicine, and in particular some machine learning techniques, including the mechanisms that make them well suited to the analysis of medical databases (derivation of symbolic rules, use of background knowledge, sensitivity and specificity of induced descriptions). The importance of the interpretability of data analysis results is discussed and illustrated with selected medical applications.
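
As a reminder of the two quality measures mentioned above, here is a minimal sketch of how the sensitivity and specificity of an induced description are computed from its confusion matrix; the counts are hypothetical.

```python
# Sensitivity and specificity of an induced rule, from hypothetical counts.
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# A rule covering 40 of 50 diseased cases and firing on 5 of 100 healthy ones:
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=95, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # 0.80, 0.95
```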


Ulf S. Carlin, Jan Komorowski and Aleksander Øhrn
Rough Set Analysis of Medical Datasets and a Case of Patients with Suspected Acute Appendicitis

A significant area in the field of medical informatics is concerned with learning medical models from low-level data. The ultimate goals of this activity include the development of classifiers or predictors for unseen cases and the analysis of the developed models, so that new insight into the nature of the given problem can be obtained. This article introduces a methodology based on rough sets and Boolean reasoning and illustrates its application on a dataset describing 257 patients with suspected acute appendicitis. Exactly the same dataset has previously been analyzed using logistic regression, and the difference in performance between the two methods is found to be very small. However, the rough set approach additionally offers a set of decision rules that explicitly represent the discovered knowledge. These automatically synthesized rules perform better than a surgeon with 2 to 6 years of training. The main attractions of rough sets for the medical informatics community should be their classificatory power and, most importantly, the possibility of mixing qualitative and quantitative parameters (both continuous and discrete) and of combining explicit (user-defined) and data-generated models. Good toolkits that support the knowledge discovery process with rough sets now exist for Windows NT/95; an example is the Rosetta toolkit, which is also available in a public version.
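
To make the rough set vocabulary concrete, here is a minimal sketch of the lower and upper approximations of a concept, computed from indiscernibility classes; the toy patient table and attribute names are hypothetical, not taken from the appendicitis dataset.

```python
# Rough set lower/upper approximations over a toy, hypothetical patient table.
from collections import defaultdict

def indiscernibility(records, attrs):
    """Group record indices by their values on the chosen attributes."""
    classes = defaultdict(set)
    for i, rec in enumerate(records):
        classes[tuple(rec[a] for a in attrs)].add(i)
    return list(classes.values())

def approximations(records, attrs, target):
    """Lower/upper approximation of the set of indices where target holds."""
    positive = {i for i, rec in enumerate(records) if target(rec)}
    lower, upper = set(), set()
    for block in indiscernibility(records, attrs):
        if block <= positive:          # block lies entirely inside the concept
            lower |= block
        if block & positive:           # block overlaps the concept
            upper |= block
    return lower, upper

patients = [
    {"fever": "high", "pain": "rlq", "appendicitis": True},
    {"fever": "high", "pain": "rlq", "appendicitis": False},  # conflicting case
    {"fever": "none", "pain": "diffuse", "appendicitis": False},
]
low, up = approximations(patients, ["fever", "pain"],
                         lambda r: r["appendicitis"])
print(low, up)   # lower = set(), upper = {0, 1}
```

Cases in the upper but not the lower approximation form the boundary region, which is exactly where conflicting records such as the first two patients above end up.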


Dragan Gamberger, Nada Lavrac
Experiments with noise detection algorithms in the diagnosis of coronary artery disease

The paper presents a series of noise detection experiments in a medical problem of coronary artery disease diagnosis. The following algorithms for noise detection and elimination are tested: a classification filter, a saturation filter, a combined classification-saturation filter, and a consensus saturation filter. The distinguishing feature of the consensus saturation filter is its high reliability which is due to multiple detection of potentially noisy examples. Reliable detection of noisy examples is important for the analysis of patient records in medical databases, as well as for the induction of rules from filtered data, representing genuine characteristics of the diagnostic domain. Medical evaluation in the problem of coronary artery disease diagnosis shows that the detected noisy examples are indeed noisy or non-typical class representatives.
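
As an illustration of the simplest of these filters, here is a minimal sketch of a classification filter, assuming a scikit-learn-style base learner and cross-validation; it conveys the general idea only and is not the saturation or consensus filter of the paper.

```python
# A classification noise filter: flag examples that a cross-validated
# base learner misclassifies, then induce rules from the remaining data.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

def classification_filter(X, y, n_folds=10):
    """Return indices of potentially noisy examples."""
    preds = cross_val_predict(DecisionTreeClassifier(random_state=0),
                              X, y, cv=n_folds)
    return np.flatnonzero(preds != y)

# Usage: drop the flagged records before inducing the final rule set.
# noisy = classification_filter(X, y)
# X_clean, y_clean = np.delete(X, noisy, axis=0), np.delete(y, noisy)
```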


Marc Weeber and Rein Vos
Extracting Expert Knowledge from Medical Texts

In this paper we argue that researchers in Intelligent Data Analysis (IDA) in Medicine and Pharmacology should consider textual databases as an additional source of knowledge. We describe three areas where medical knowledge extraction from textual databases can be fruitful: finding new applications for existing drugs, the evolution of medical knowledge over time, and drug risk assessment. We evaluate two textual knowledge extraction methods, where an IDA approach proves to be robust and efficient compared to a common computational-linguistic one.
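
One simple IDA-style extraction technique is term co-occurrence counting over titles or abstracts; the sketch below is a hypothetical illustration in that spirit (the magnesium-migraine link is the classic literature-based discovery example), not the method evaluated in the paper.

```python
# Co-occurrence counting over a tiny, hypothetical corpus of titles.
from collections import Counter
from itertools import combinations

titles = [
    "magnesium deficiency and migraine attacks",
    "migraine and vascular tone",
    "magnesium modulates vascular tone",
]
terms = {"magnesium", "migraine", "vascular tone"}

def cooccurrences(docs, vocab):
    counts = Counter()
    for doc in docs:
        present = sorted(t for t in vocab if t in doc)
        counts.update(combinations(present, 2))
    return counts

# Frequently co-occurring pairs are candidate relations; terms linked only
# indirectly (A-B and B-C but never A-C) suggest hypotheses to explore.
print(cooccurrences(titles, terms).most_common())
```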


Steen Andreassen, Leonard Leibovici, Henrik C. Schonheyder, Brian Kristensen, Christian Riekehr, Anders Geill Kjar and Kristian G. Olesen
A Decision Theoretic Approach to Empirical Treatment of Bacteraemia Originating from the Urinary Tract

It is a dilemma that empirical antibiotic treatment with broad-spectrum antibiotics provides a high probability of covering treatment, but may be associated with unnecessary costs in the form of direct expenses, side effects, and the facilitated development of antibiotic resistance. We present a decision support system (DSS) that uses a causal probabilistic network (CPN) to represent medical knowledge and to analyze the data from each patient. The CPN has utilities assigned to therapeutic options and outcomes. Each category of cost is expressed as a loss of life-years (LY), and the ecological consequences of antibiotic use are accounted for in LYs under the assumption that antibiotic resistance leads to a loss of therapeutic options for future patients. For the purpose of building the CPN, clinical data and outcomes were taken from a database covering 1992-94, containing data from 491 patients with urosepticaemia. By necessity, a range of assumptions had to be made regarding average life expectancy and the monetary value of one LY, which was set to 50,000 ECU. Simulations on data from 426 cases of urosepticaemia collected during 1995-1996 showed the DSS capable of selecting antibiotics with an overall lower price, higher coverage, and lower ecological cost than the antibiotics actually chosen for empirical treatment. Thus, a DSS incorporating the CPN could achieve a desirable antibiotic policy, and it holds promise for improving empirical antibiotic therapy.
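
The decision-theoretic core of such a system reduces to comparing expected utilities across therapeutic options once the CPN has produced a coverage probability per drug; the following sketch uses hypothetical drug names, probabilities, and LY figures purely for illustration.

```python
# Expected-utility comparison of antibiotic options (all numbers hypothetical).
options = {
    # drug: (coverage probability, direct cost in LY-equivalents,
    #        ecological cost in LY-equivalents)
    "narrow_spectrum": (0.70, 0.001, 0.002),
    "broad_spectrum":  (0.95, 0.004, 0.020),
}
LY_IF_COVERED, LY_IF_NOT = 10.0, 8.0   # expected remaining life-years

def expected_utility(p_cover, cost, eco_cost):
    """Expected life-years gained, net of direct and ecological losses."""
    return p_cover * LY_IF_COVERED + (1 - p_cover) * LY_IF_NOT - cost - eco_cost

best = max(options, key=lambda d: expected_utility(*options[d]))
print(best)  # the drug maximizing expected utility across all cost categories
```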


Peter Hammond and Paul M. Speight
Screening for Risk of Oral Cancer and Pre-cancer

Oral cancer has relatively low incidence but is potentially very serious if not identified early. For economic reasons, widespread screening for the disease is not appropriate, so opportunistic, computer-aided screening in primary and secondary healthcare is being investigated. The low incidence and disproportionately low number of positive diagnoses arising in oral screening programmes give rise to a very sparse dataset. Even so, the machine learning techniques considered here perform as well as general dental and hospital-based screeners in selecting individuals at risk, if less well at avoiding false positives. Once identified, patients can be recalled for a detailed examination of their oral mucosa and lifestyle counselling.


Sylvie Jami, Xiaohui Liu and George Loizou
Learning from an Incomplete and Uncertain Data Set: The Identification of Variant Haemoglobins

The use of AI techniques for the identification of variant haemoglobins has so far been restricted to the development of expert systems of limited scope. The process of identifying haemoglobins is difficult, particularly because the large number of missing values in the data set hinders comparisons between data about an unknown haemoglobin and data about known haemoglobins, in a context where classification is not appropriate. Case-based reasoning requires an assessment of similarity between a new case and known cases, and it is possible to use a distance measure making implicit assumptions about the missing values. However, the characteristics of the data set make this unsafe, as is the use of such assumptions when trying to predict each column directly using all the others. Consequently, inducing rules from the available data to fill in the missing values seems to be a good option, in particular using association rules, since they impose few requirements on the data. However, they tend to produce too many rules of little interest. The Flexible Consequent (FC) algorithm proposed herein represents an attempt to produce fewer and more relevant rules in a flexible and efficient way; the actual prediction of the missing values is not considered here. A comparison is made between results obtained using standard association rule techniques and those obtained using the FC algorithm.
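
For readers unfamiliar with association rules, here is a minimal support/confidence sketch over hypothetical laboratory findings; it illustrates the standard technique against which FC is compared, not the FC algorithm itself.

```python
# Pairwise association rules with support and confidence thresholds.
from collections import Counter
from itertools import combinations

transactions = [
    {"hb_variant", "high_hplc_peak", "abnormal_electrophoresis"},
    {"hb_variant", "high_hplc_peak"},
    {"normal", "normal_electrophoresis"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 2, 0.8

item_counts = Counter(i for t in transactions for i in t)
pair_counts = Counter(p for t in transactions
                      for p in combinations(sorted(t), 2))

# Rule head -> tail holds when the pair is frequent enough and
# confidence = supp(head, tail) / supp(head) clears the threshold.
for (a, b), n in pair_counts.items():
    for head, tail in ((a, b), (b, a)):
        conf = n / item_counts[head]
        if n >= MIN_SUPPORT and conf >= MIN_CONFIDENCE:
            print(f"{head} -> {tail}  (support={n}, confidence={conf:.2f})")
```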


Stefania Montani, Riccardo Bellazzi, Luigi Portinale, Stefano Fiocchi, and Mario Stefanelli
A Case-Based Retrieval System for Diabetic Patients Therapy

We propose a decision support tool based on the Case-Based Reasoning technique, meant to help physicians retrieve similar past cases and to suggest revisions of a diabetic patient's therapy scheme. A case is defined as a set of features collected during a visit. A taxonomy of prototypical situations, or classes, has been formalized, and a set of cases belonging to these classes has been stored in a relational database. For each input case, the system allows the physician to find similar situations that already took place in the past, both for the same patient and for different ones. The reasoning process consists of two steps: 1) finding the classes to which the input case could belong; 2) finding the most similar cases from these classes, through a nearest neighbor technique. The tool is integrated in the EU-funded T-IDDM (Telematic management of Insulin Dependent Diabetes Mellitus) project.
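
The two-step retrieval can be sketched as a class-restricted nearest-neighbor search; the features, case base, and class names below are hypothetical stand-ins for the paper's taxonomy.

```python
# Class-restricted nearest-neighbor retrieval over a toy case base.
import math

cases = [
    {"class": "hypo_risk",  "hba1c": 6.1, "mean_bg": 4.5,  "insulin_u": 38},
    {"class": "hypo_risk",  "hba1c": 6.4, "mean_bg": 4.9,  "insulin_u": 42},
    {"class": "hyper_risk", "hba1c": 9.2, "mean_bg": 12.1, "insulin_u": 30},
]
FEATURES = ("hba1c", "mean_bg", "insulin_u")

def distance(a, b):
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))

def retrieve(query, candidate_classes, k=1):
    pool = [c for c in cases if c["class"] in candidate_classes]  # step 1
    return sorted(pool, key=lambda c: distance(query, c))[:k]     # step 2

query = {"hba1c": 6.2, "mean_bg": 4.7, "insulin_u": 40}
print(retrieve(query, {"hypo_risk"}))
```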


Werner Horn, Christian Popow, and Lukas Unterasinger
Metaphor graphics to visualize ICU data over time

The time-oriented analysis of electronic patient records at a (neonatal) intensive care unit is a tedious and time-consuming task. The vast amount of data available makes it hard for the physician to recognize the essential changes over time. VIE-VISU is a data visualization system which uses multiples to present the change in the patient's status over time in graphic form. Metaphor graphics are used to sketch the parameters that are most relevant for characterizing the situation of a patient.


Yuval Shahar and Cleve Cheng
Knowledge-Based Visualization and Navigation of Time-Oriented Clinical Data and their Abstractions

We describe a framework (KNAVE) that is independent of any clinical domain, but that is specific to the task of interpretation, summarization, visualization, explanation, and interactive navigation in a context-sensitive manner through time-oriented raw clinical data and the multiple levels of higher-level, interval-based concepts that can be abstracted from these data. The KNAVE domain-independent navigation operators access the domain-specific knowledge base, which is modeled by the formal ontology of the knowledge-based temporal-abstraction method; the method generates the temporal abstractions from the time-oriented database. Thus, domain-specific knowledge underlies the semantics of the domain-independent visualization and navigation processes. By accessing the domain-specific temporal-abstraction knowledge base and the domain-specific time-oriented database, the KNAVE modules enable users to query for domain-specific temporal abstractions and to change the focus of the visualization. The KNAVE framework reuses for a different task (visualization and navigation) the same domain model that has been acquired from physicians who are experts in the domain for the purpose of temporal abstraction. Initial evaluation of the KNAVE prototype has been encouraging. The KNAVE methodology has broad implications for tasks such as therapy planning, patient monitoring, explanation in medical decision-support systems, and semimanual data mining in time-oriented clinical databases.
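
The knowledge-based temporal-abstraction step that KNAVE builds on can be illustrated with a minimal state-abstraction sketch: time-stamped raw values are mapped to qualitative states and merged into intervals. The glucose thresholds below are hypothetical and not part of KNAVE's actual ontology.

```python
# State abstraction: raw samples -> qualitative states -> intervals.
from itertools import groupby

samples = [(0, 4.2), (1, 4.6), (2, 9.8), (3, 10.4), (4, 5.1)]  # (hour, mmol/L)

def state(value):
    if value < 3.9:  return "LOW"
    if value <= 7.8: return "NORMAL"
    return "HIGH"

def abstract_intervals(points):
    """Merge consecutive samples with the same state into one interval."""
    labelled = [(t, state(v)) for t, v in points]
    intervals = []
    for st, run in groupby(labelled, key=lambda p: p[1]):
        run = list(run)
        intervals.append((run[0][0], run[-1][0], st))
    return intervals

print(abstract_intervals(samples))
# [(0, 1, 'NORMAL'), (2, 3, 'HIGH'), (4, 4, 'NORMAL')]
```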


Short Papers


Katarina Kralj, Matjaz Kukar
Using Machine Learning to Analyse Attributes in the Diagnosis of Coronary Artery Disease

Coronary artery disease (CAD) is one of the world's most important diseases and causes of early mortality. The diagnostic process for CAD consists of four levels. Various machine learning methods were used for learning classifiers: the naive and semi-naive Bayesian classifier, K-nearest neighbours, Assistant-R, Assistant-I, and a multilayered feedforward neural network. We tried to extract the best subset of attributes in order to maximise classification accuracy while minimising the number of attributes. Several estimates were used for attribute filtering, including information gain, the chi-square test, ReliefF, the correlation coefficient, and the square of ratios. A 6.9% increase in classification accuracy over that achieved by physicians was obtained with machine learning methods when the domain included the attributes of the first three levels; if only the attributes of the first two levels were used, the increase in classification accuracy was 15.3%.
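
Of the filtering estimates listed above, information gain is the simplest to state; here is a minimal sketch over a hypothetical attribute, purely to show the quantity by which attributes are ranked.

```python
# Information gain of a single attribute over hypothetical CAD records.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """H(class) minus the expected entropy after splitting on attr."""
    gain = entropy(labels)
    n = len(rows)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

rows = [{"chest_pain": "typical"}, {"chest_pain": "typical"},
        {"chest_pain": "atypical"}, {"chest_pain": "atypical"}]
labels = ["cad", "cad", "healthy", "cad"]
print(information_gain(rows, labels, "chest_pain"))  # rank attributes by this
```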


G. Paoli, G. Sanguineti, R. Anselmi, R. Bellazzi, F. Foppiano, L. Andreucci
Experimental design for intelligent interpretation of Dose Volume Histograms for radiotherapy treatment planning in head and neck cancer

The goal of radiation therapy is to achieve, in a selected treatment volume, a dose distribution of radiation that provides the patient with maximum tumour control and the least possible effect on the surrounding normal tissues. The aim of our work is to establish whether there is a dependence between irradiation conditions and toxicity, in particular between dose distribution and toxicity. A secondary aim is to verify whether Dose Volume Histograms can be used to forecast the toxicity episodes of a radiotherapy plan. We decided to approach the problem step by step: the first step was to analyse the data with a statistical tool; we are now going to use decision trees and neural networks to better describe the nonlinear relationships underlying the parameter dependencies. This work is in progress: it cannot be completed in a short time, since a large patient database is required and the patients' acute toxicity must be known.
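
Since the analysis hinges on Dose Volume Histograms, here is a minimal sketch of how a cumulative DVH is computed from per-voxel doses; the dose values are hypothetical.

```python
# Cumulative DVH: fraction of organ volume receiving at least each dose level.
import numpy as np

def cumulative_dvh(voxel_doses, bin_width=1.0):
    edges = np.arange(0.0, voxel_doses.max() + bin_width, bin_width)
    volume_fraction = [(voxel_doses >= d).mean() for d in edges]
    return edges, np.array(volume_fraction)

doses = np.array([10.0, 12.0, 35.0, 40.0, 42.0, 55.0])  # Gy per voxel
levels, fractions = cumulative_dvh(doses, bin_width=10.0)
for d, v in zip(levels, fractions):
    print(f"V{d:>4.0f} Gy: {v:.2f}")  # e.g. V40 = fraction of volume >= 40 Gy
```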


Ingo J. Timm
Automatic Generation of Risk Classification for Decision Support in Critical Care

Modern critical care medicine is marked by a steadily growing amount of available patient data. This creates a new challenge: extracting significant information concerning the patient's prognosis or diagnosis. Classic non-computer-science approaches to decision support are prognostic scoring systems such as the APACHE scoring system. Surprisingly, these risk scores, although very common in the U.S.A., are not widely used in decision processes in intensive care units (ICUs) in Germany. This paper addresses this phenomenon and suggests a methodology for determining the risks for a given patient based on previous experience with other patients. The automatic risk classification presented here is based on knowledge discovery in databases (KDD).
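
One way to realize such a KDD-based risk classification is to induce a classifier from past outcomes and discretize its predicted probabilities into risk classes; the sketch below uses hypothetical features and thresholds and is not the paper's methodology.

```python
# Risk classes from a classifier induced on past ICU outcomes (toy data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# columns: heart_rate, mean_arterial_pressure, age; label: 1 = died
X = np.array([[130, 55, 71], [88, 80, 45], [120, 60, 80], [75, 85, 52]])
y = np.array([1, 0, 1, 0])

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def risk_class(patient):
    p = model.predict_proba([patient])[0, 1]   # estimated mortality risk
    return "high" if p >= 0.5 else "moderate" if p >= 0.2 else "low"

print(risk_class([125, 58, 76]))
```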


Gunther Kohler, Dimitrij Surmeli and Horst-Michael Gross
The state of a traumatic coma patient can be visualized by means of a SOM

This paper aims at efficient visualization and intelligent alarming while monitoring the state of traumatic coma patients. We apply and extend a visualization method that is well known in knowledge discovery to this monitoring task. We argue for state observation using a set of geometric shapes such as cubes or polygons in a single display, rather than observing a set of time series graphs. The necessary transformations can be carried out by means of a SOM. Beyond mappings, it allows intelligent alarming and state prognosis. Advantages of this approach are discussed along with some limitations.
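
The transformation in question is the standard SOM mapping from a multivariate state vector to a 2-D grid position; the following is a minimal training sketch with hypothetical patient data, not the authors' extended method.

```python
# Minimal self-organizing map: each state vector maps to a grid position
# that can then be drawn as a single shape instead of many time series.
import numpy as np

rng = np.random.default_rng(0)
GRID, DIM = (5, 5), 3                     # 5x5 map over 3 monitored signals
weights = rng.random((*GRID, DIM))
coords = np.dstack(np.meshgrid(np.arange(GRID[0]), np.arange(GRID[1]),
                               indexing="ij"))

def train(data, epochs=50, lr0=0.5, radius0=2.0):
    global weights
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        radius = max(radius0 * (1 - epoch / epochs), 0.5)
        for x in data:
            # best-matching unit: grid cell whose weight vector is closest
            bmu = np.unravel_index(
                np.argmin(((weights - x) ** 2).sum(axis=2)), GRID)
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            h = np.exp(-d2 / (2 * radius ** 2))[..., None]  # neighbourhood
            weights += lr * h * (x - weights)

patients = rng.random((100, DIM))          # stand-in monitoring samples
train(patients)
# A new sample is then displayed at its best-matching unit's grid position.
```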