Data Science at the heart of massive Health Data

Benjamin Guinhouya & Djamel Zitouni - Associate professors, University of Lille

The rise of digital technology appears to mark a historic shift. The proliferation and massification of data, sometimes described as “ubiquity and the Internet of objects”, touches many areas of people’s lives. It affects all sectors of activity, including Medicine and the broader field of Health.

This digital revolution signals the urgent need for a profound transformation and diversification of health practices inside and outside institutions. It heralds an epistemic rupture: the “passage from health that produced data to health determined by its data”. Evidence-based medicine and evidence-based public health will inevitably be questioned, or even reworked, to integrate this new development.

Traditionally, health data is produced in medical offices, in hospitals or through ad hoc surveys, and transmitted to Health Agencies. Health data now also comes from new sources, in particular the individual traces left on the “information highway” through the growing use of modern communication tools (e.g. forums, blogs, chatrooms) and the widespread use of embedded devices (e.g. smartphones, connected items, virtual “coaches”). In addition, health data held by institutions (e.g. Health Insurance) is being opened up to research and to the private sector.

The availability and accessibility of health data should make it possible to realise the full potential of 4P Medicine (i.e., Personalised, Predictive, Preventive and Participatory). From a Public Health standpoint, the epidemiological surveillance of infectious and chronic diseases, as well as the design, implementation and evaluation of prevention and health promotion programmes, will be strengthened. Finally, pharmacovigilance and the surveillance of unknown side effects will yield benefits that are still unsuspected.

The potential offered by large masses of health data is significant, especially since it is now feasible to build on the recent advances in massive data analytics and Artificial Intelligence. If the digital revolution is well supported and well exploited, it may contribute to longer life spans, or at least to a longer healthy life expectancy.

However, these benefits come with major challenges, particularly regarding the proper exploitation of massive health data. As a result, data scientists with an extended profile have emerged within the socio-professional environment. They are expected to have the following triple competence:

  1. Mastery of data mining techniques and strong inclination towards database technologies and tools;
  2. A high level of proficiency in mathematics, statistics, and massive data processing techniques;
  3. Perfect knowledge and know-how in the field of application.

Whereas most current data scientist training programmes are oriented towards the traditional branches of banking, telecommunications, mass distribution, or even aerospace, the specificities of the Biology-Health field require academic programmes that are rooted in that field. It is precisely this type of training that we are proposing at the University of Lille. We have opted for a modular pedagogical project, backed by a high-level academic research programme, which may itself be fuelled by the R&D issues faced by the health and pharmaceutical industries and other allied sectors: a win-win proposition for all partners.

According to a common definition that sums up the characteristics and particularity of this professional profile quite well, a “data scientist is someone who is better at statistics than any software engineer, and better at software engineering than any statistician”. We believe that health data scientists are not just engineers in the classical sense. They must combine engineering skills with a thorough knowledge of the stakes, organisations and other businesses in the field of Biology-Health.

The demanding nature of this training proposal also requires the academic world to admit new fields at the frontier of existing disciplines. Indeed, training health data scientists requires the effective implementation of both interdisciplinary and transdisciplinary approaches, beyond declarations of intent. Together with the consideration of issues identified and/or relayed by industry, the convergence of academic practices, if genuinely supported and appropriately coordinated, should open up new lines of research that can be put at the service of demanding studies. We are confident that students who become health data scientists will be closely involved in producing and supporting further innovation in these industries.


Benjamin GUINHOUYA, PhD, MPH, MA
Associate Professor, University of Lille
Epidemiology and Philosophy of Sciences
EA 2694, Public: Epidemiology and Healthcare Quality
Faculty of Health Science and Management

Djamel ZITOUNI, PhD
Associate Professor, University of Lille
Computer Science and Biomathematics
EA 2694, Public: Epidemiology and Healthcare Quality
Faculty of Biological and Pharmaceutical Sciences