Guillermo Díaz Herrero, José Miguel Franco Valiente, César Suárez Ortega, Manuel Rubio del Solar, Carmen Martín Moreno, Miguel Ángel Guevara López, Naimy González de Posada, Raúl Ramos Pollán, Isabel Ramos, Joanna Loureiro, Teresa Fernández, Bruno Araújo, Inés C. Moreira
Location: Mexico City
Date: 27-29 June 2012
Publication type: Oral
This paper describes the main results of the IMED project. IMED is a collaboration among CETA-CIEMAT in Spain and, in Porto, Portugal, the Faculty of Engineering (FEUP) and the Faculty of Medicine (FMUP) of the University of Porto and Hospital Sao Joao (HSJ). It is devoted to research on computer-based methods for the detection and diagnosis of breast cancer.
The project activities can be divided into two main task groups: 1) data gathering and dataset building, and 2) design and modelling of Machine Learning Classifiers (MLC).
During the first stage, a set of real patient cases (clinical data and mammography images) provided by the FMUP was digitised. These cases were anonymised to comply with privacy regulations. In addition, a digital repository called the Breast Cancer Digital Repository (BCDR) was designed and implemented to store all the data from the digitised FMUP cases. The metadata of the repository is a subset of the DICOM-SR standard, customised according to the requirements of the staff of the FMUP and Hospital Sao Joao. The BCDR is hosted in an instance of the Digital Repository Infrastructure (DRI), a software platform developed at CETA-CIEMAT that aims to simplify and reduce the cost of hosting digital repositories over Grid infrastructures. Currently, the BCDR contains the cases of 1610 patients, with mammography and ultrasound images, clinical history, lesion segmentations and 18 selected pre-computed image-based descriptors. Patient cases are BI-RADS classified and annotated by specialised radiologists. At the time of writing, the BCDR includes 820 segmented lesions (276 biopsy-proven).
Also in this first stage, a software suite called Mammography Image Workstation for Analysis and Diagnosis (MIWAD) was developed. MIWAD consists of two components: MIWAD-DB, a desktop application for storing, retrieving and managing the patient and case information of the BCDR, and MIWAD-CAD, a specialised graphical user interface for processing, analysing and diagnosing mammography images by combining digital image processing, pattern recognition and artificial intelligence techniques. MIWAD is currently used to feed the BCDR with new patient cases from the FMUP, and experts from the FMUP and Hospital Sao Joao also use it to evaluate and classify those cases.
The second stage of the project consists of the design and implementation of automatic classifiers for breast cancer detection and diagnosis. The content of the BCDR is used to build datasets for training machine learning classifiers, which will later be integrated into MIWAD so that specialists can use automatic classification as a second opinion to support their diagnosis and patient management decisions. Grid computing is being used to run massive explorations of artificial neural network (ANN) and support vector machine (SVM) configurations in order to find well-performing classifiers for our datasets. At the time of writing, ROC Az values (area under the ROC curve) between 0.661 and 0.996 have been obtained.
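For readers unfamiliar with the metric, ROC Az is the area under the ROC curve: the probability that a classifier scores a randomly chosen positive (malignant) case higher than a randomly chosen negative (benign) one. The following is a minimal sketch of how such a value can be computed from classifier scores using the equivalent Mann-Whitney pair count. It is not the project's code; the function name `roc_az` and the toy labels and scores are illustrative assumptions and are not taken from the BCDR.

```python
from itertools import product


def roc_az(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair count.

    labels: 1 for a positive (e.g. malignant) case, 0 for a negative one.
    scores: classifier outputs, where higher means more suspicious.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only (not BCDR data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
az = roc_az(labels, scores)
print(f"ROC Az = {az:.3f}")
```

In this toy example, 8 of the 9 positive/negative pairs are ranked correctly, giving an Az of about 0.889. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported range of 0.661 to 0.996 in context.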
As future work, the project plans to open the content of the BCDR to the public. It also plans to feed the repository with new cases provided by the FMUP, which will contain high-resolution mammography images from a digital mammography station. In addition, the BCDR data model will be evolved to become fully compliant with the DICOM-SR standard for breast imaging. Regarding the design and development of classifiers, the project will continue to evaluate which image descriptors are most suitable for breast imaging, and new MLC configurations will be evaluated.
In conclusion, the IMED project can be considered a good example of the exploitation of e-infrastructures for collaborative research, since both Grid storage and Grid computing capacities are being used, respectively, to store the BCDR content and to evaluate classifier configurations.