Big Data, Big Science, Smart Medicine?

April 20, 2015 | François Sigaux, Franck Lethimonnier

Throughout the 21st century, the science of biology will undergo profound methodological and conceptual change, perhaps of the same magnitude as that of physics in the early 20th century. The rise of molecular biology over the last forty years has revolutionized our capacity to analyze and understand living things. And for the past twenty years, increasingly comprehensive and rapid data generation (e.g. genome sequencing, genomics, proteomics, etc.) has produced vast volumes of data, much of which has yet to be analyzed. The EBI (European Bioinformatics Institute), one of the largest data banks, now contains over 20 petabytes of gene sequence, genome and protein data, and the amount of genomic data more than doubles every year.

The development of technologies for data storage, sharing, structuring and analysis is likely to profoundly change the fields of medicine, healthcare and well-being. Two new entities have appeared in diagnostic and therapeutic medicine, in addition to the traditional parties involved (i.e. doctor and patient): digital avatars that could be called the digital doctor and patient. They interact by extracting shared decision-making information from the database, both for diagnosis and for therapeutic care. In public healthcare and welfare, information provided by connected objects and traces left in online forums help to create a specific aspect of digital beings. While these avatars are invaluable sources of information for public and individual healthcare, they also raise legitimate ethical and regulatory concerns.

1) The growing diversification of data sources for electronic medical records

Currently, the electronic medical records set up by certain health care facilities generally contain only limited structured administrative data and sometimes diagnoses, biological and imaging data, as well as lists of diagnostic and therapeutic procedures. In France, the National Health Insurance Fund (Caisse nationale de l’assurance maladie, CNAM) has established several databases, including the National Health Insurance Cross-Schemes Information System (Système national d’information inter-régimes de l’Assurance maladie, SNIIRAM). This database includes only some elements at the individual (personal data) and group (anonymous data) levels, and notably excludes results of clinical examinations and diagnostic tests. Contemporary approaches to health care, and notably those related to personalized (or precision) medicine, have demonstrated the importance of compiling anonymous electronic information within collective databases. The contents of a given patient’s electronic medical record can be entered in a database in order to apply decision aid algorithms. With the patient’s agreement, this medical record can also be integrated into these databases, thereby enriching the knowledge base and improving the quality of the algorithms. On the premise that the greater the amount of data in such databases, the better the individual patient care, it is essential to harmonize, increase and diversify digital patient data. Genomics and the other omics fields derived from it represent major sources of data, which are becoming cheaper to produce thanks to recent advances in technology. In medical fields such as oncology, for example, omics data could amount to several hundred petabytes per year in the near future. Moreover, an increasing percentage of the general population is now using connected objects to monitor specific health parameters. These objects could also be a major source of medical data, especially if such data were recorded continuously. This growing diversity of data sources raises the question of how the data are collected and how to define the ontologies required for interoperability. The gradual shift in focus from illness to health therefore changes the scale and complexity of big data.

2) The growing importance of Big Science and maintaining the doctor-patient relationship, for medicine that is smart, personalized and ethical

The opinion held by some in the industry, namely that the simple “reading” of electronic data (possibly also by the patients themselves) could provide information useful in such complex situations as treating illnesses, appears unrealistic and difficult to defend from an ethical point of view. The true medical value of these data cannot be realized without an in-depth understanding of their meaning. This requires new breakthroughs in computational science in order to develop shared doctor/patient medical decision aid algorithms, forming a sort of Big Science in the field of health care. These algorithms, especially those from systems biology, will benefit from the globalization of databases, each patient being considered as a disruption in the system due to an unfortunate experiment of nature, namely illness. Validating these algorithms requires a two-way comparison: against biological models, and against the natural history of the illness in a large number of patients. It also takes advantage of the comparison between digital patient data and multi-scale modelling of living organisms, which offers the theoretical possibility of predicting in silico how therapeutics would disturb the model. This is, for example, one of the objectives of the EU’s Human Brain Project.

3) Strengthening interaction between imaging, pathology and therapeutics when scaling to take advantage of nanotechnology

The scale at which the data are viewed is key to “de-pixelizing” the image of certain diseases. When anomalies are topographically limited (as in the case of a tumour or atheromatous plaque, for example), the scale of analysis is crucial in data capture. Non-invasive analysis of the lesions is a fundamental objective, for both ethical and medical reasons. To date, in vivo imaging techniques (in the broad sense) have a resolution of the order of a millimeter. The heterogeneous nature of lesions, which has a vital influence on diagnosis and therapy, requires analysis at the micrometer scale. This can currently only be done by anatomical pathology performed on tissue samples, and therefore ex vivo. The development of nanotechnology should prepare the way for multimodal imaging at this scale. Matching with therapeutics, also multimodal and targeted, could be achieved via nano-objects. This new approach, which we could refer to as interventional nano-pathology, could form the basis of a new medical discipline at the intersection of imaging, anatomical pathology, clinical medicine and computational science. Beyond technical barriers, its implementation will require a multidisciplinary view, resulting in profound changes in the different branches of medicine.

The combination of big data and Big Science in the field of health care will very probably cause significant change in medical practice and public health. This will only be possible, however, if all technical, societal, economic and ethical issues are addressed in an integrated way and considered in their international context.

François Sigaux & Franck Lethimonnier


François Sigaux, MD, is the Director of Research and Innovation Programmes at the French National Cancer Institute (INCa) and Director of the Cancer Multi-organization thematic institutes of the French National Alliance for Life Sciences and Health (ITMO Cancer – Aviesan). He is Professor of hematology at the Paris Diderot University and Director of the Saint-Louis Institute – French Campus dedicated to Personalized Medicine in Hematology. His research, dedicated to leukemia genomics, is performed at INSERM and at the Paris Diderot University.

He has co-authored more than 200 scientific papers, which have received more than 10,000 citations. A former President of the French Society of Hematology, he has chaired or served on many steering committees.

Franck Lethimonnier is a member of the Healthcare Data Institute’s Board of Directors. He is Director of the Multi-Organization Thematic Institute for Health Technology of the French National Alliance for Life Sciences and Health (Aviesan), which groups together the main stakeholders of life and health sciences in France. As such, he is also Director of the Thematic Institute Health Technologies at INSERM (the French National Institute of Health and Medical Research) and the coordinator of the Covalliance Committee, which groups together the technology transfer offices (TTOs) of the Aviesan alliance.