Big Data and healthcare data: an impossible alliance?

February 18, 2015 | Pierre Desmarais

No, combining Big Data and healthcare data is not, in fact, impossible.

Far from it.

First, because the notion of "healthcare data" covers several health-related fields. This seemingly simple concept can include very complex information relating to the healthcare system in general, to healthcare professionals and facilities, to patients, and so on.

Second, because Big Data does not necessarily allow individuals to be identified. Big Data analysis produces probabilities, not certainties. Ensuring anonymity therefore simply means setting a maximum acceptable degree of probability.

Finally, because the very concept of Big Data is particularly fluid. Couldn’t data mining within your own database be considered Big Data once that database exceeds a certain size? This appears to be the view of the Terminology Commission. (1)

That said, let’s be honest. When we talk about Big Data in healthcare, we automatically think of "patient" data.

This is why applying Big Data concepts to healthcare data is often considered impossible. Yet the often-overlooked chapters IX and X of the French Data Protection Act show that lawmakers anticipated this kind of large-scale analysis several years ago.

We also tend to forget that other kinds of healthcare data may be just as interesting, if not more so.

In fact, the most important question about Big Data relates to the origin of the data itself.

Whatever kind of healthcare data is to be processed, this question will inevitably arise for anyone who wishes to combine Big Data and healthcare data.

The debate brings us back to the question of "data ownership."

With very few exceptions, a piece of data – personal or not – cannot be owned. I know, I know, this goes against everything we’ve been told. But imagine for a moment the opposite: a world in which every piece of data could be claimed as property. How could the press keep the public informed if media outlets claimed exclusive rights to every detail of their scoops? How could research in the public interest keep advancing if individuals claimed sole rights over the use of their data?

We must therefore look to another kind of property: intellectual property.

No one can claim ownership of an individual piece of data, yet any entity that owns a database may claim exclusive rights over it. This is precisely the ground on which the National Health Insurance Fund opposed the reuse of its Ameli-Direct directory. (2)

For players outside the healthcare field, working with Big Data in healthcare will always require obtaining the right to reuse the data from the entity that owns it: in the private sector, this means a license; in the public sector, a formal application to reuse public data. Fortunately, the movement to open up healthcare data for analysis has already secured the release of a certain amount of data. Let’s just hope that this process will not be hindered by an overly restrictive legal framework.


Pierre Desmarais
Member of the Healthcare Data Institute’s Board of Directors.

  • (1) Notice of August 22, 2014, NOR: CTNX1219323X.
  • (2) TA Paris, December 23, 2013, LBCS v. CNAM-TS, No. 1201686.