Data Protection

1. Introduction - the journey to the General Data Protection Regulation 2016/679

In the late 1970s, there was an international realisation that computers had, and would continue to develop, extraordinary power to store and process large amounts of data, and that this revolution had the potential to produce harm as well as benefit to the people to whom the data related. Therefore, under the auspices of the OECD, an international agreement about privacy in processing personal data was reached in 1980.[1] In Europe, the Council of Europe agreed a translation of that international expectation for its Member States in 1981.[2] These two developments introduced into many national jurisdictions a detailed (and to some extent harmonised) expression of privacy in relation to the electronic processing of personal data held about their citizens.

By the 1990s it was clear in the European Union (as it now is) that the processing of personal data was at the heart of a lot of modern commerce, and that if citizens were to have confidence to participate in a single European market they had to have confidence that their data would be processed in at least as good a way as it would be processed within their home jurisdictions. It was also realised that the protection envisaged in the early 1980s, relating only to electronic processing of personal data, was inadequate, and that the regulation of the processing of personal data had to start from a presumption that data protection covered all forms of processing of personal data. This could be relaxed in certain areas (for example, purely domestic processing of personal data for private use by citizens), but it was necessary to widen the scope of the concept of "processing" of personal data so that the protections were more widely available than simply relating to the rather arbitrary 'electronic processing' coverage of the first iteration of data protection. The response was Directive 95/46/EC on the processing of personal data. Today, nearly 20 years on from that Directive, the EU has completed a further reform of the data protection regime, again taking into consideration further developments in technology - particularly the processing of personal data via the internet and world-wide web. The opportunity was taken to seek a further and more effective harmonisation of data protection law, with a move from a Directive[3] to a Regulation.[4] The process of reform started publicly on 25th January 2012 with the publication by the European Commission of its draft Regulation on processing personal data. The legislative process to agree the Regulation was extremely difficult. Whereas the initial response of the Council was favourable, it was not expressed in a ‘first reading’ of the Bill in Council.

The Parliament, with the work of the LIBE select committee, tabled a record number of amendments to the Bill. A first reading in both institutions took years to achieve. Thereafter, faced with an almost intractable impasse, the Bill moved into Trilogue - a process whereby representatives from the Council, the Parliament, and the Commission directly negotiate to (seek to) achieve a workable compromise that is then presented for approval in the Council and in the Parliament. The General Data Protection Regulation 2016/679 (GDPR) was adopted into EU law on 27th April 2016. It applies in the Member States from 25th May 2018.

2. The shape of European Union data protection

The GDPR is very familiar to those who know both Directive 95/46/EC and the earlier Council of Europe Convention. The same basic structure is in place. A small table of the mapping between the Directive and the Regulation is given after the ‘Reading’ at the end of this part.

Data protection concerns the processing of (sensitive) personal data relating to data subjects by data controllers (perhaps through data processors), still under a high degree of control from national Supervisory Authorities. The addition to the dramatis personae in the GDPR is the inclusion of Data Protection Officers, who will be appointed at an institutional level and play an important role, particularly in relation to high impact processing.
Data controllers owe duties to data subjects, particularly to process the data fairly and lawfully (Articles 5, 6, and 9), and to inform the data subject about the processing (Articles 13 and 14).
Data subjects have rights, essentially to ensure their own protection, particularly to gain access to the data that is processed about them, to have that data corrected where it is incorrect, to block its processing, and to have the data erased (Articles 15–22). In relation to research, the much discussed “right to be forgotten” does not apply.
Member States each have the duty to create a Supervisory Authority that must operate the registration of data processing, engage where appropriate in responding to ‘high impact’ processing, investigate and prosecute complaints of breaches in data protection law, and ensure the operation of the Regulation in their jurisdiction, with some discretionary powers within the Regulation still falling to them (as in the Directive).
Member States must also ensure that there is a compensation and punishment regime in place in their jurisdiction in line with the requirements of the Regulation. The sanctions available under the Regulation are much higher than those under the Directive.
The EU supervisory authorities are strengthened under the Regulation. There is an EU Data Protection Supervisor, and the Article 29 Working Party becomes the European Data Protection Board.

“‘Personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person” (Article 4.1, GDPR)

Article 4 provides the definitions that operate in the interpretation of the GDPR. Whilst it might be too obvious to mention this, it must be done: definitions and understandings of terms in other disciplines or contexts have no bearing on the interpretation of words that are defined in the GDPR. In particular, the concepts of 'anonymisation' and 'pseudonymisation' that operate in many of our disciplines have to be put aside when thinking about the GDPR. In the GDPR the concept in operation is identifiability (although the GDPR does define pseudonymisation): can the (potential) data subject be identified either from the data in the possession of an individual, or from that data in combination with other data that are reasonably foreseeable to come into the possession of that individual (not just the data controller)? If the answer is yes, then the person to whom that data relates is a data subject and the GDPR applies to them (through the domestic law of the jurisdiction in which s/he is situated). If the answer is no, then the GDPR does not apply. If the answer is that the data subject is no longer identifiable, following a process of removing sufficient identifiers from the data to make re-identification impossible, then the GDPR does not apply to future processing, but it is perhaps arguable that there are some continuing duties towards the data subject that arose when the data subject was identifiable in the data (or in the data and reasonably foreseeable connections with other data). Thus, rendering data 'anonymous' or 'pseudonymising' data, and the rules relating to that in particular disciplines, are not relevant to the GDPR: the question is only about whether the data subject can be identified, within the definitions contained in the GDPR.
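
The identifiability question in this passage can be read as a short decision procedure. The following Python sketch is purely illustrative and is not a statement of the law: the names `Status`, `classify` and `gdpr_applies` are hypothetical, and the three yes/no inputs stand in for judgments that in practice require careful factual and legal analysis.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical labels for the three outcomes described in the text."""
    IDENTIFIABLE = "identifiable"
    NO_LONGER_IDENTIFIABLE = "no longer identifiable"
    NOT_IDENTIFIABLE = "not identifiable"


def classify(from_held_data: bool,
             with_foreseeable_data: bool,
             was_identifiable_before: bool) -> Status:
    """Apply the identifiability question to one person's data.

    from_held_data: can the person be identified from the data held?
    with_foreseeable_data: can they be identified once the data are combined
        with other data reasonably foreseeable to come into possession?
    was_identifiable_before: were they identifiable before identifiers
        were removed?
    """
    if from_held_data or with_foreseeable_data:
        return Status.IDENTIFIABLE
    if was_identifiable_before:
        # Future processing falls outside the GDPR, but the text suggests
        # some earlier duties may arguably continue.
        return Status.NO_LONGER_IDENTIFIABLE
    return Status.NOT_IDENTIFIABLE


def gdpr_applies(status: Status) -> bool:
    """The GDPR applies only while the person remains identifiable."""
    return status is Status.IDENTIFIABLE


# A pseudonymised record that can be re-linked through a foreseeable key
# still counts as identifiable in this sketch.
print(gdpr_applies(classify(False, True, True)))
```

Note that the sketch never asks whether the data are called 'anonymised' or 'pseudonymised'; only the identifiability inputs matter, which is the point the passage makes.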

We should now turn to the basic structure of the GDPR. There are essentially four key elements: the data protection principles, the routes to lawful processing, the information provisions and the rights of data subjects. Those familiar with the Directive will see the similarities immediately.

A. The Data Protection Principles

Whilst not formally titled as such, there are a number of principles that underpin the GDPR. These are contained in Article 5. Data should be processed fairly, lawfully and transparently (Art. 5.1.a). Transparency is a new addition and must refer to the processing techniques rather than the content - as confidentiality and privacy of information must be maintained.[5] (Routes to) lawful processing are found in Article 6 (for general personal data) and Article 9 (for sensitive personal data). The processing must be limited (Art. 5.1.b) to that which is necessary for and compatible with the declared purpose(s). Further processing must not be incompatible with those purposes. The data collected must be only that which is necessary for the purpose of the processing (Art. 5.1.c), and must be accurate and, “where necessary”, kept up to date (Art. 5.1.d). There is a presumption that data should be de-identified as soon as possible (relating to the purposes of the processing) (Art. 5.1.e), and data must be stored securely (Art. 5.1.f). In addition to these principles, there is now a presumption in Article 25 of “data protection by design” - that where data will be processed, the controller must build into the enterprise systems that ensure data protection. This is a new concept that will have an impact in research - a protocol must show that data protection has been designed into the research as a ‘bottom-up’ principle.

Most importantly, there are major changes to notification and prior checking. Under the Directive, a Data Controller was under an obligation to notify the Supervisory Authority of any processing of personal data, and the Supervisory Authority was under a duty to undertake, where necessary, prior checking to ensure compliance. Prior checking, given the amount of work involved compared to the general funding of Supervisory Authorities, was not particularly successful under the Directive. The GDPR requires all Controllers undertaking processing that is likely to be of high risk to the data subject’s (data protection) interests to make an ‘impact assessment’ (Article 35) before any processing is undertaken. The Supervisory Authority must make a list of processing that is to be considered as high risk. Article 35 outlines an extensive, systematic evaluation that must be undertaken where an impact assessment is required. Prior consultation with the Supervisory Authority must be undertaken where the controller is not able to provide mitigation for high risk processing. There is a potential weakness here, as Article 36.1 does not require an external evaluation of whether mitigation is achieved. Of course, the prudent Controller will ensure that there is either mitigation of risk, or consultation - and the evaluation of a Data Protection Officer may assist in this where such a person is appointed. However, the imprudent Controller may only be found out in the event of a breach, and whether a high sanction will be sufficient to compensate the loss is not always clear.

It will be noted that there are considerable opportunities for Supervisory Authorities to produce local interpretations of the requirements of the GDPR. There are some measures that suggest that the Board will have a role in attempting to achieve the harmonisation desired for the GDPR, but the fact that the Regulation still contains many of the Directive’s discretions rather indicates that there is not harmony between Member States in this area, and differences will persist. Likewise, the GDPR is a ‘general’ Regulation, attempting to cover all processing of personal data. This is, of course, a Herculean, almost fantastical task, because there is such variation between processing sectors as to what constitutes acceptable limits and interpretations of the Regulation. Therefore, it is to be hoped that the opportunity offered for European Commission and EU Data Protection Board approval (under Art. 40), will be taken to create sectoral Codes of Conduct - sectoral interpretations of how to interpret the GDPR in particular circumstances, for example, in life science and genomic research. RECs should be aware that there may well be sectoral Codes that apply to research presented to them.

B. Fair and Lawful Processing.

Under Article 5.1.a, data controllers are given the duty to process data fairly, lawfully and transparently. Lawful processing is to some extent dealt with under Articles 6 and 9 (to some extent, in that if there are other legal conditions acting in relation to the data, then they must also be followed to achieve lawful processing).

Article 6 sets out the conditions for lawful processing of personal data. The first conditions relate to informed consent, either directly given or given through a contract. The second condition is where the processing is in the vital interests of the data subject. The third conditions relate to duties imposed by Law. The fourth route to lawful processing is where the processing is in the interests of the data controller and would not conflict with the fundamental rights and freedoms of the data subject. The final route is through an appeal to the public interest.

Article 9 prohibits the processing of sensitive personal data (i.e. “racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation” - Article 9.1). Thus, medical research often concerns sensitive personal data under this definition. The prohibition can be lifted in certain conditions, found in Article 9.2. First, where the data subject has consented to the processing (“except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject” – Article 9.2 – which might be considered a highly paternalistic approach to the data subject when compared to other uses of consent in, for example, medical research). Second, the data controller is acting under a legal obligation or right under national employment Law. Third, the vital interests of the data subject require the processing. Fourth, and with “appropriate guarantees”, the processing is necessary for activities of bodies such as political parties or trades unions, etc. Fifth, that the data are already published by the data subject, or are necessary in legal proceedings. Sixth, that the processing is necessary for preventive medicine or occupational medicine. Seventh, the prohibition can be lifted for various medical purposes – diagnosis, treatment, prevention, and the management of health care. Most interesting is the inclusion of the eighth condition, Article 9.2.j:

“processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.”

Equally, Member States may create new legislation to allow processing of sensitive personal data in the “substantial public interest”.

Article 9.2.j is a substantial change from the position of research under the Directive. Article 89 requires that research is essentially undertaken on data that has been pseudonymised, unless that compromises the purpose of the processing. However, Article 9.2.j does not by itself make processing for research lawful, as satisfying Article 9 is not sufficient; there must also be a route to lawful processing available under Article 6, and Article 9.2.j is not mirrored in Article 6. In one of the earlier draft Bills, there was, under a then Proposed Article 6.2, a route to lawful processing for general processing simply for research. However, this was removed in the negotiations. Thus, whilst the prohibition on processing sensitive personal data may be lifted for scientific research with an appeal to Article 9.2.j, there must still be a route to lawful processing under Article 6. What a REC must bear in mind is that there are a number of routes to lawful processing and not only informed consent. For example, an appeal could be made to processing the data in the public interest, or for the legitimate interests of the controller without damaging the interests of the data subject. This would require a case to be made, but in principle it must be an available route. We will return to informed consent after the basic shape of the GDPR is outlined.
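
The two-step structure described here (an Article 6 route is always needed, and sensitive data additionally need one of the Article 9 conditions) can be sketched as a simple check. This is an illustration under stated assumptions, not a compliance tool: the function name `has_lawful_route` and the string labels are hypothetical paraphrases of the routes listed in the text, and whether a given route is really available is a matter of legal judgment that the sketch does not attempt.

```python
from typing import Optional

# Hypothetical labels paraphrasing the Article 6 routes listed in the text.
ARTICLE_6_ROUTES = {
    "consent", "contract", "vital_interests", "legal_duty",
    "legitimate_interests", "public_interest",
}

# Hypothetical labels paraphrasing the Article 9 conditions listed in the text.
ARTICLE_9_CONDITIONS = {
    "explicit_consent", "employment_law", "vital_interests",
    "political_or_union_body", "published_or_legal_proceedings",
    "preventive_or_occupational_medicine", "medical_purposes",
    "research", "substantial_public_interest",
}


def has_lawful_route(sensitive: bool,
                     article_6_route: Optional[str],
                     article_9_condition: Optional[str] = None) -> bool:
    """Return True only if every required layer is satisfied."""
    # Layer 1: every processing operation needs an Article 6 route.
    if article_6_route not in ARTICLE_6_ROUTES:
        return False
    # Layer 2: sensitive data also need the Article 9 prohibition lifted.
    if sensitive:
        return article_9_condition in ARTICLE_9_CONDITIONS
    return True


# The research condition alone is not enough for health data ...
print(has_lawful_route(True, None, "research"))               # False
# ... but it works when paired with an Article 6 route.
print(has_lawful_route(True, "public_interest", "research"))  # True
```

The first call reflects the point made above: lifting the Article 9 prohibition for research does not remove the need to find a separate route under Article 6.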

C. Information Provisions.

In order for the Data Subject to act on his or her rights under the GDPR, he or she must know about the processing. Whereas there is a limited amount of protection afforded to the data subject through the Supervisory Authority, and perhaps through other bodies such as RECs in relation to medical research, in the vast majority of cases, the regime is arguably a 'self-help' regime. Data controllers must observe their duties towards data subjects, but the rights of the data subject (perhaps particularly those relating to his or her specific sensitivities) are very largely left to be enforced by the data subject. Therefore, the data subject must be informed about processing that is to be undertaken on their data, and who is responsible for that processing. This is addressed in Articles 13 and 14 of the GDPR.

There are, essentially, two scenarios addressed in relation to informing the data subject about processing: either the data controller is collecting the data directly from the data subject for foreseeable processing (direct gathering), or the data controller receives the data from a third party (most probably another data controller)(indirect gathering).

There is, of course, a further scenario: a data controller, having gathered or received data for a particular purpose (or set of foreseeable purposes), then sees another unforeseen purpose for which the data could be processed. 'Processing for further purposes' is, unfortunately, not dealt with simply under the Directive, so we will leave it to one side for the time being and return to it for separate consideration (in Discussion Point 2).

The information that must be given to a data subject before his or her data are processed is the contact details of the data controller and a description of the purpose of the processing to be undertaken. When the data are gathered directly from the data subject, the information must be given to the data subject. Where the data are received from a third party, then again, the presumption is that the information must be given to the data subject unless s/he is already in possession of that information, or unless informing him or her is impossible or would require a disproportionate effort (Article 14.5.b). In cases of impossibility or disproportionate effort the Member State must provide alternative safeguards.

What is clear is that the data protection regime requires those who gather data directly from data subjects to provide information so that the data subject can protect their own rights. There is no Article 14.5.b equivalent in Article 13 – no ‘impossible or disproportionate effort’ – and this is understandable. If the data subject is there for a direct gathering of data, then the information can be given.
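
The difference between the two information provisions can be summarised in a few lines. The sketch below is a hedged illustration only: `must_inform` and its parameter names are hypothetical, and deciding whether effort is 'disproportionate' is a question of fact and balance that cannot be reduced to a boolean in practice.

```python
def must_inform(direct_gathering: bool,
                already_has_information: bool = False,
                impossible_or_disproportionate: bool = False) -> bool:
    """Does the controller owe the data subject the processing information?

    direct_gathering: True for Article 13 (data collected from the data
        subject), False for Article 14 (data received from a third party).
    """
    # Both Articles release the controller where the data subject
    # already has the information (Articles 13.4 and 14.5.a).
    if already_has_information:
        return False
    # Article 13 has no 'impossible or disproportionate effort' exemption:
    # the data subject is present, so the information can be given.
    if direct_gathering:
        return True
    # Article 14.5.b: the exemption exists only for indirect gathering.
    return not impossible_or_disproportionate


# Claiming disproportionate effort does not help in a direct gathering.
print(must_inform(True, impossible_or_disproportionate=True))   # True
print(must_inform(False, impossible_or_disproportionate=True))  # False
```

Where the second call returns False, the text above notes that the Member State must provide alternative safeguards; the sketch does not model those.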

D. Data Subject Rights

The rights of a data subject are largely the same, in respect of research, as those available under Directive 95/46/EC. As indicated above, the ‘right to be forgotten’, which is largely driven by concerns about the internet, is not available to data subjects where the processing is for research (Article 17.3.d). It is worth noting that research under the GDPR includes applied research (Recital 159). One question that remains is how the right to withdraw operates in relation to research. It has long been a standard of research that a participant takes part voluntarily and can withdraw from the research at will. However, there could be another argument, given the potential effect of withdrawal on the study's scientific impact, and given the difficulty of withdrawing from processing once results of a study have been published.

The GDPR addresses this to some extent. Article 21 - the “right to object” - under the general provision indicates that data subjects have a right to object to processing “unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject” (Article 21.1). However, under Article 21.6 the provision for research is slightly different:

“Where personal data are processed for scientific or historical research purposes or statistical purposes pursuant to Article 89(1), the data subject, on grounds relating to his or her particular situation, shall have the right to object to processing of personal data concerning him or her, unless the processing is necessary for the performance of a task carried out for reasons of public interest.”

It remains to be seen how “on grounds relating to his or her particular situation” will be interpreted and whether there will be a harmonised interpretation in the Member States to this.

A right that may produce difficulties for researchers is Article 20 - “the right to data portability”. Under this Article,

“The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided.”

Two conditions attach - that the route to lawful processing is informed consent, and that the processing of the data is automated. This does not have the administrative cost clause of the Article 15 “right of access”, and much will hang, for research, on the interpretation of “which he or she has provided to a controller”. There is an on-going question relating to data ownership about how far personal data simply relate to the data subject, being generated through the labour of the data controller. If one takes, for example, genetic information derived from a blood sample, how far does that constitute data ‘provided to a controller’, or is it only the blood sample that is provided? At the other end of the spectrum, when a data subject participates in the highly structured information gathering of, say, a biobank, how far must the biobank provide all that data “in a structured, commonly used and machine-readable format”? At that point, the interests of third parties (protected under Article 20.4) may restrict the amount of data that is available to the data subject through the participation in a biobank. The reasoning for this is considered in the first question below.
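
To make the portability conditions concrete, the following Python sketch checks the two conditions named above and exports only the fields treated as 'provided' by the data subject, using JSON as one example of a structured, commonly used and machine-readable format. Everything here is an assumption for illustration: the function names, the field names, and above all the choice of which fields count as 'provided', which is exactly the open question the text raises. The sketch follows the text in naming consent as the qualifying route.

```python
import json


def portability_applies(lawful_route: str, automated: bool) -> bool:
    """The two conditions the text attaches to Article 20."""
    return lawful_route == "consent" and automated


def export_provided_data(record: dict, provided_fields: set) -> str:
    """Serialise only the fields the data subject is taken to have provided.

    Which fields belong in provided_fields is a legal judgment made
    outside this function.
    """
    provided = {key: value for key, value in record.items()
                if key in provided_fields}
    return json.dumps(provided, indent=2, sort_keys=True)


# Hypothetical participant record: the history was given by the participant,
# the genotype was derived by the controller from a blood sample.
record = {
    "participant_id": "P-001",
    "medical_history": "as reported at enrolment",
    "derived_genotype": "generated by the laboratory",
}

if portability_applies("consent", automated=True):
    print(export_provided_data(record, {"participant_id", "medical_history"}))
```

Here the derived genotype is left out of the export, which corresponds to the narrower reading of “provided to a controller”; adding `"derived_genotype"` to the set would model the wider reading.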

3. Questions still unresolved by the GDPR

A. Who is the Data Subject? Dealing with genetic relatives.

There is often a problem in medical research, particularly research using genetic data or biobanks, about individuals who are the genetic relatives of the participant. There has, arguably, been something of a difficulty in knowing how to deal with this 'penumbra' of relatives. The temptation is to think that only the direct research participant is a data subject. And, indeed, it is convenient to think in that way.

At first thought, if we were to treat all the genetic relatives of the potential participant as data subjects, then research would immediately collapse under the weight of informed consent negotiations. However, the pragmatic solution of treating only the participant as a data subject does leave an uncomfortable feeling. Let us consider, for example, the situation of genetic relatives in a single purpose research project. Arthur presents himself for enrolment, having been identified as a potential participant for the study. He gives blood, urine and saliva samples, and a medical history as requested. Arthur has three brothers, his parents are still alive, as is one of his father's brothers who has two daughters. He indicates that his mother had a great-aunt who they know to have emigrated many years ago to Australia, who they know had a son through an affair, but because that branch of the family was quite religious, contact was lost with the great-aunt, and Arthur believes that given her age she must have died some years ago.

Making Arthur's extended family - the ones that he has named so far - all data subjects has the feel of a crazy, unreasonable suggestion. And yet, each of them has very similar things to lose - harms to suffer - from participation in the research as Arthur has. Arthur is not a special case because he has been invited to participate in the research; the rights to privacy and data protection that Arthur must be able to enjoy must, arguably, also be enjoyed by those who are identifiable in the data disclosed by Arthur. We know a great deal about the relatives that Arthur's samples (and history) disclose to the research data controller.

This is, however, not catastrophic when we allow the structure of the GDPR to dictate the answer. Arthur is the data subject in relation to whom Article 13 (direct gathering) operates. All Arthur's relatives are data subjects about whom the data are gathered indirectly. Therefore, those genetic relatives are within the conditions of Article 14, and must be informed of the data controller's contact details and the purpose of the processing where informing them is reasonable - where it is not impossible and does not require a disproportionate effort. The question becomes one of fact and balance - how do the potential risks to one's fundamental rights and freedoms arising through participation weigh against the effort it would take to notify the data subject? 'But s/he might not want to participate' cannot be a reason not to notify him or her (but remember that notification is not to gain informed consent under the GDPR).[6]

B. Informed Consent

There was considerable concern in the research community during the passage of the Bill. After an initial draft from the Commission that indicated that there would not be a need for researchers to rely on a narrow, highly specified consent for, for example, biobanking or data intensive research, the provision that allowed research as a route to lawful processing in Article 6 was lost. The approved GDPR text has a compromise, but it is not one that is without difficulties.

As in Directive 95/46/EC, informed consent is the first of the routes to lawful processing for general personal data (Article 6.1.a) and for lifting the restriction on processing sensitive personal data (Article 9.2.a). These are not, of themselves, problematic texts:

“the data subject has given consent to the processing of his or her personal data for one or more specific purposes.” (Art. 6.1.a)

“the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject.” (Art. 9.2.a)

Further, the GDPR includes two specific Articles on consent: Article 7 on general issues about consent, and Article 8 on gaining consent from minors. These are more concerned with the procedures for gaining and evidencing consent. The problem arises in the definition of consent contained in Article 4.11 - the definitions Article - where:

“‘consent’ of the data subject means any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her;”

This appears to create a requirement that informed consent is “specific”. It may well amount to an attempt at clever legal footwork to suggest that “freely given, specific, informed and unambiguous” are qualifiers to “indication”, and not the substance of the consent itself - that the data subject must be specific in the indication, not specific in the consent itself. These could well be taken to amount to the same thing: to indicate specifically is to specify the parameters of the consent envisaged.

Likewise, it may well be insufficient to argue that specifying “research” in a broad consent way would satisfy the requirements of Article 4.11 on its own. However, a last minute inclusion in the GDPR was Recital 33, which it is worth reproducing in full:

“It is often not possible to fully identify the purpose of personal data processing for scientific research purposes at the time of data collection. Therefore, data subjects should be allowed to give their consent to certain areas of scientific research when in keeping with recognised ethical standards for scientific research. Data subjects should have the opportunity to give their consent only to certain areas of research or parts of research projects to the extent allowed by the intended purpose.”

This, it is widely accepted, is designed to allow research the opportunity for ‘broad’ informed consent. However, as a Recital, it does not have the same immediate weight as an Article, and must therefore be accepted into the common interpretation of the GDPR, perhaps ideally through a Code of Conduct. What would be disappointing is if it were left to individual Member State Supervisory Authorities to take a view on the relationship between Article 4.11 and Recital 33. Further, RECs must be aware of the interplay between the two elements of the GDPR.

C. Processing for further purposes.

The Data Controller must inform the data subject of all the purposes for which s/he wishes to process the data if he or she collects the data directly from the data subject or, where he or she indirectly collects the data, where it is possible and not requiring a disproportionate effort. Imagine the situation of Anna, professor of oncology at a large university hospital. She gathered data from 150 data subjects about a particular cancer she was studying. The research was completed, and she published papers on her findings. Some time later, two developments happened. First, Anna herself made a new, and rather surprising, connection to a different cancer, and realised that a further processing of her original data set could lead to interesting results. Second, Anna’s funding body requires her, as a condition of the grant, to make her data available to other researchers (unidentified) through a ‘data hub’ - which requires a standardisation of the metadata and the linkage of data with other data sets, and therefore (pseudonymised) identifiability of data subjects (to prevent duplication of subjects in the dataset).

There are a number of routes to explore here. The first is, of course, are the two developments covered by the original route to lawful processing and information provisions? There is a chance that the informed consent has been broad enough to cover both developments, and the information about the processing was similarly broad to cover the possibility. However, this may well not be the case. Let us consider the two elements of routes to lawful processing and information provisions separately.

Under the Directive, the route to lawful processing element was very difficult for this sort of secondary processing. The Directive, under its Article 6.1.b was very ambiguous about secondary processing for a compatible purpose, because the drafting could be interpreted as either meaning that compatible processing for the same purpose was acceptable, or processing for compatible purposes was acceptable. The first draft of the GDPR from the Commission sought to clarify this immediately.

Proposed Article 5 provided that data should be gathered for a specific purpose and not further processed in an incompatible way - the first element; Proposed Article 6 then made it clear that processing for further purposes was acceptable where the purposes were compatible with the original purpose for the processing, and special provision was made for presuming that research was a compatible purpose. The political negotiations have slightly muddied that initial clarity. The ambiguous wording of the Directive is imported into the GDPR in Article 5.1.b:

“personal data shall be (b) collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);”

However, Article 6.4 is retained concerning processing for a compatible purpose:

“Where the processing for a purpose other than that for which the personal data have been collected is not based on the data subject's consent or on a Union or Member State law which constitutes a necessary and proportionate measure in a democratic society to safeguard the objectives referred to in Article 23(1), the controller shall, in order to ascertain whether processing for another purpose is compatible with the purpose for which the personal data are initially collected, take into account” a number of conditions.

This is perhaps not elegant, but it does spell out that processing for a purpose compatible with the original purpose is envisaged under the GDPR. This is further underlined in the first paragraph of Recital 50:

“The processing of personal data for purposes other than those for which the personal data were initially collected should be allowed only where the processing is compatible with the purposes for which the personal data were initially collected. In such a case, no legal basis separate from that which allowed the collection of the personal data is required. If the processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller, Union or Member State law may determine and specify the tasks and purposes for which the further processing should be regarded as compatible and lawful. Further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes should be considered to be compatible lawful processing operations. The legal basis provided by Union or Member State law for the processing of personal data may also provide a legal basis for further processing. In order to ascertain whether a purpose of further processing is compatible with the purpose for which the personal data are initially collected, the controller, after having met all the requirements for the lawfulness of the original processing, should take into account, inter alia: any link between those purposes and the purposes of the intended further processing; the context in which the personal data have been collected, in particular the reasonable expectations of data subjects based on their relationship with the controller as to their further use; the nature of the personal data; the consequences of the intended further processing for data subjects; and the existence of appropriate safeguards in both the original and intended further processing operations.”

D. Identification, De-identification, and Re-identification

There is a significant problem in the personal data sharing and data-intensive health, medicine and life science research community. Large data sets, in order to be useful, need to be updated regularly, so that the life-experience of the individual data subject can be followed; medical histories and genomic data are useful as a snapshot, but as an ongoing narrative they are so much richer. Therefore, it is necessary to keep the dataset (be it centrally located, or federated[7]) in an identifiable form. This will be in a pseudonymised (coded) form for security, but it will be possible to identify individuals within the set.

The first problem therefore arises when data is passed from the dataset to researchers. It is highly likely that this will be passed in a de-identified way; the identifiers in the dataset that is passed to the researchers will have been stripped from the data, and individuals will not be identifiable from the aggregated data or the data that is passed. However, because there is a technical possibility that the data could be re-identified by connecting the data back to the original, identifiable dataset, many jurisdictions take this to mean that the data remains personal data (identifiable) throughout its life, and that the researcher with the de-identified set is bound by the conditions of the GDPR. The GDPR is concerned with the reasonableness of the potential for identification. There is a first set of questions to be asked here: is this possibility of re-identification one that should be reasonably considered as a threat to the interests of the data subject such that the GDPR should bind the researcher in this scenario, and what conditions might be sufficient - for example, in the data sharing agreement - that might mitigate that threat (are there, for example, technical safeguards to prevent re-identification that would be sufficient; is an undertaking, with sanctions, against re-identification sufficient)?
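The relationship between the identifiable dataset, its key and the de-identified extract passed to a researcher can be illustrated with a minimal sketch. It is an illustration only, not a description of any particular data hub: the function names (`pseudonymise`, `reidentify`), the field names and the sample records are all hypothetical.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Split records into a coded extract and a separately held key table."""
    key_table = {}  # code -> original identifier; stays with the data custodian
    coded = []      # the extract that is passed to the researcher
    for record in records:
        # A random code, deliberately not derived from the identifier itself.
        code = secrets.token_hex(8)
        key_table[code] = record[id_field]
        shared = {k: v for k, v in record.items() if k != id_field}
        shared["code"] = code
        coded.append(shared)
    return coded, key_table


def reidentify(coded_record, key_table):
    """Re-identification is only possible for whoever holds the key table."""
    return key_table[coded_record["code"]]


# Hypothetical sample data.
records = [
    {"name": "Participant A", "diagnosis": "X", "year_of_birth": 1960},
    {"name": "Participant B", "diagnosis": "Y", "year_of_birth": 1975},
]
coded, key_table = pseudonymise(records)
```

In this sketch the researcher receives only `coded`. Whether that extract remains personal data then turns on how realistic it is that `coded` and `key_table` will be brought back together, which is exactly the reasonableness question posed above.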

The second problem moves from the internal difficulty of imagining the dataset and its key, and the likelihood of re-connecting the key and the de-identified data, to the external question of the likelihood of connecting the de-identified data in the hands of the researcher (either deliberately or accidentally) to an external dataset (perhaps already held by the researcher, or falling into her hands from a third party) that then re-identifies the data subject. Indeed, this possibility begins to question whether, in an internet culture with so many different datasets being connected internationally with increased computing power, it is still possible to speak of unbreakable de-identification: is it possible to be truly ‘anonymous’ in any circumstances anymore (remembering that identification is not a matter of names and addresses, but of any data that, when connected together, identify an individual)? So, the second question is, regardless of the possibility of connecting the de-identified dataset to the original identifiable source, what is the likelihood that the de-identified dataset will be connected to sufficient other data to re-identify the data subject? How remote does this possibility have to be to disengage the GDPR?
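The external route to re-identification can also be sketched in a few lines. The example below is a hedged illustration rather than a real attack or a prescribed test: the choice of quasi-identifiers (`year_of_birth`, `postcode_area`, `sex`), the function name `link` and all sample records are assumptions made for the purpose of the illustration.

```python
from collections import defaultdict

# Assumed quasi-identifiers: none names a person on its own.
QUASI_IDENTIFIERS = ("year_of_birth", "postcode_area", "sex")


def link(deidentified, external, keys=QUASI_IDENTIFIERS):
    """Return {record index: name} for records matching exactly one external entry."""
    index = defaultdict(list)
    for person in external:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for i, record in enumerate(deidentified):
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique combination singles the person out
            matches[i] = candidates[0]
    return matches


# Hypothetical de-identified research extract (no names, no codes).
deidentified = [
    {"year_of_birth": 1960, "postcode_area": "AB1", "sex": "F", "diagnosis": "X"},
    {"year_of_birth": 1975, "postcode_area": "CD2", "sex": "M", "diagnosis": "Y"},
]

# Hypothetical external dataset that happens to share the same attributes.
external = [
    {"name": "Person 1", "year_of_birth": 1960, "postcode_area": "AB1", "sex": "F"},
    {"name": "Person 2", "year_of_birth": 1975, "postcode_area": "CD2", "sex": "M"},
    {"name": "Person 3", "year_of_birth": 1975, "postcode_area": "CD2", "sex": "M"},
]

matches = link(deidentified, external)
```

Here the first record is singled out because its combination of attributes is unique in the external data, while the second is shared by two people and so is not resolved. The likelihood asked about above depends on how many such unique combinations exist and on which external datasets are realistically within the researcher's reach.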

Of course, at another level both of these questions presume that disengaging the GDPR is a good and desirable thing. That is not necessarily so in most cases, but there is an argument that maintaining the GDPR for data sharing and data-intensive research is difficult. If the GDPR is engaged in relation to anonymous data sets, only the information provisions have a ‘disproportionate effort’ or ‘impossibility’ limitation. There must still be a route to lawful processing, and the reluctance to ‘change horses’ between routes to lawful processing for secondary purposes has already been noted.

So, if the original route to lawful processing was informed consent, and the informed consent was not broad enough to capture the secondary processing, and the wording of that original consent precluded an appeal to compatible processing (which is not uncommon), then is re-consenting the data subject to be able to connect the privacy-protected data the only way forward? This would seem to be at odds with the spirit of Article 89, and Recitals 33 and 50, for example, which seek to enable data sharing and data-intensive research for health, medicine and life science research.

REC members may well take the view that this is not a matter for RECs and that they should depend on the EU Data Protection Supervisor and Board for guidance. To some extent, of course, this is correct; those bodies, and the Court of Justice of the European Union, have the authority to pronounce definitively on the interpretation of the GDPR. However, RECs see the practical setting of these dilemmas for research, and so they can voice an opinion to contribute to the empowered authorities’ deliberations. Further, and most importantly, personal data privacy and confidentiality are not only legal matters; they also pose ‘ethics’ questions, and there the REC has responsibilities. Does, for example, ethics demand specific informed consent where the GDPR might countenance the public interest? We would suggest that the answer is ‘no’; indeed, ethics may take a more solidarity-based view - that the desire for medical research and therapies in an increasingly individual-focused society requires that research be allowed to take place in the public interest. It could be that the ethics debate forces the legal, data protection debate to reconsider some of its more extreme autonomy-based (and solidarity-rejecting) interpretations of confidentiality and privacy. But that is for debate.

Questions for Discussions

Is this a reasonable approach to the problem of genetic relatives?

How far will informed consent present a problem to new data intensive research methodologies and data sharing?

How does your REC deal with requests for processing for further purposes that were not foreseen at the initial gathering of the data?

How far, and in what circumstances would your REC allow processing of data where informed consent was not gained, but where, for example, an appeal to the substantial public interest was made?

[1] OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data http://www.oecd.org/internet/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm (Last visited 1st September 2014). These guidelines were last updated in 2013.

[2] Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data http://conventions.coe.int/Treaty/en/Treaties/html/108.htm (Last visited 1st September 2014).

[3] with indirect effect, requiring implementation (transposition) into Member States’ law.

[4] with direct effect in Member States’ law.

[5] See, GDPR Article 12.

[6] And one might argue, if the route to lawful processing is a research in the public interest route, where the data subject's rights are highly restricted, and their interests are protected by alternative safeguards, then this might go to the proportionality of the effort.

[7] i.e. maintained at different locations but linked, for example, in a way that allows remote interrogation.

Learning objectives

To understand the principles of participant protection, anonymisation and informed consent
To learn and follow the principles of general data protection regulations in a European context

Introduction

The Gold Standard of Participant Protection - Anonymisation and Informed Consent

(Research participant) autonomy is seen as a central premise of bioethics and of human dignity. One's right to choose to participate in medical research in an informed way, and to be protected from identification within research as far as is possible, is almost unquestioned. It is at the heart of the Belmont Report[1] and the work of Beauchamp and Childress;[2] it is one of the immediate concerns for RECs in their assessments of research protocols.

Arguably, anonymisation and informed consent are seen as default safeguards of participant autonomy. However, neither anonymisation nor informed consent is without its conceptual and practical problems.

A. Anonymisation

There are a number of problems, at different conceptual levels.

1. Meaning

"Anonymisation" is used to mean different things in different jurisdictions and disciplines. In certain settings, it is taken to mean that the participant will no longer be identifiable in the research - i.e. in the raw data and in the processed data and products of the research, the participant will not be identifiable. In other settings, anonymous data might relate to a downstream use of the data - the individual in question (perhaps a second researcher using the data gathered by another) holds the data without identifiers, but the participant could be re-identified by linking the data to the key held by another. To some, this would describe a form of "pseudonymisation" of data: data held in a form that prevents immediate identification of participants without access to a key held separately. These terms are to a very large extent context specific, and the context will define the meaning of the terms. However, this uncertainty of language itself produces confusion.

2. Availability

When data was processed without electronic means, or at least before the linking power of the internet, the concept of removing parts of the data such that the remaining data no longer identified an individual (or perhaps a group to whom the individual belonged) might have been more possible. Of course, it was never completely possible. Data that relate to an individual are dynamic composites of snips that link together in different ways making individuals more or less identifiable at any given time, depending on who is looking at the data. And equally, it is extremely rare that a single snip alone identifies a particular individual (perhaps in any meaningful sense); personal data are composite and context specific (as Taylor has shown).

First, even one's name, alone, means relatively little. The name "David Townend" printed on an otherwise blank piece of paper means nothing on its own. It only resonates and finds identifying meaning when it is linked to other information. Thus, if someone 'Googles' his or her own name, in the vast majority of cases, he or she finds a number of entries for that name. First, that person will know that (almost invariably) not all the references relate to him or her; most often, the name relates to a number of individuals. However, in that realisation, there is a second element: each reference gives a context within which the name becomes identifying. So, an individual labelled "David Townend" might be a Professor in Maastricht (and because of the long memory of the internet, a Senior Lecturer in Sheffield), a person giving a number of conference papers in relation to Law, a singer in a jazz band, or a bass soloist in a number of choral concerts. Only some people will know that these different contexts relate to a particular holder of the name "David Townend", and that they should be distinguished from a host of other "David Townend"s for whom there are results.

The second major observation is that the same information has different value in different contexts. Add the name "David Townend" to a list of students and identify him as the tutor of the group, and the value is increased, but perhaps of little worth; add the tutorial times and the addresses of the people on the paper, and in the hands of a door-to-door sales person, it has a particular value (at those times, no-one is in) whereas in the hands of a house-breaker, the list has another value (at those times, no-one is in!). And arguably, there is no intrinsic value in any particular type of data (e.g. medical or genetic data); even 'sensitive personal data' has different values in different combinations and in different contexts.

This means that in some situations, we over-compensate for the presumed value of data; in other situations we might under-estimate the value of data.

Given the composite, dynamic, context-specific nature of data, and the vast amount of data available on-line and on demand, anonymisation is perhaps a promise that can no longer be made. It is based on an idea that data sets are fully independent - fully free of external connections. If that were possible, a data-bubble might have sufficient 'snips' removed to render the remaining data without identifiers. However, such bubbles burst. Data sets are not held in perfect isolation from other data, and snips can be linked to identify individuals by other means.

3. Desirability

Anonymisation has been (some would argue still is) a great safeguard for identity. However, is it a great safeguard for dignity?

Imagine that one finds that one's tissue and medical data, given for research on the strict understanding that it would be anonymised, has been used for chemical weapons research. To many, such a finding would offend their dignity.

Likewise, imagine that one finds that the tumour that has just been found by one's doctors, and is inoperable at its stage of growth, was seen (as an incidental finding) in a much smaller and operable state in a scan that one had as part of a research project one year earlier. But for the safeguard of anonymity, the researchers would have sent such data to one's personal physician. Again, a safeguard of one's dignity?

Of course, these two examples are themselves contestable, but they make the claim to the supervening value of anonymisation as a natural safeguard for (medical) research participants itself debatable.

Questions for Discussions

How far does a lack of clarity in the meaning of "anonymisation" (and related concepts) cause difficulty, especially in multi-centre or multi-disciplinary research?

How far do we treat different types of data as necessarily requiring and deserving of higher safeguards, without seeing the context within which it is processed?

On the other hand, how far is it possible to offer "anonymity" to research participants in the 'information age'?

What response can be offered if that is the case?

How far is "anonymity" desirable?

Does this answer differ at different stages of the research?

Is (the concept of) "confidentiality" a better safeguard for the participant than privacy? i.e. a binding duty on those who receive the data not to identify the participant (similar to a duty owed, for example, by a medical doctor to his or her patient).

B. Informed Consent

Informed consent is difficult. It is at the heart of the modern consumer (transactional) society. Individuals have freedoms of choice, and are accountable for their actions; they have the duty to inform themselves to their own satisfaction before entering a transaction as there will be no appeal to 'I didn't know' in the 'caveat emptor' market. However, there are exceptions to this hard world. Sellers have legal duties, to greater or lesser degrees depending on the jurisdiction, to tell the truth or not to conceal or cloak relevant information. More than that, although increasingly lost as the commercial model rolls out under the guise of individual freedom and self-determination, 'professionalism' demands a different relationship between people.

Caveat-emptor-contracting thrives where there is, or is presumed to be, 'equality of bargaining power'. Where there is inequality of bargaining power, some duty of protection is often required of the stronger party at Law - a 'fiduciary duty'. In situations where the bargain is forged with, or perhaps because of, an imbalance of power (for example, between doctor and patient, lawyer and client, banker and client, teacher and pupil, guardian and minor or incompetent adult), the stronger party is (most often) required to act in (or to protect) the interests of the weaker party.

Research with human participants arguably (strongly arguably) falls into this fiduciary duty. Researchers have in the vast majority of cases much greater knowledge of the area, its risks and potential benefits, than the participants in their research. That imbalance, that vulnerability, must be protected. And one major element of this safeguard is to require the researchers to inform the participants about what they are proposing to do and what they expect the outcomes to be - arguably, to give some background about the choices they have made in developing the methodology. They must inform the potential participants to redress the knowledge imbalance and to equip the potential participant to make a real choice about whether or not to participate.

Does this mean "full information"? This is difficult. One must redress the knowledge imbalance, but there are arguably some caveats. First, by definition, in research there can be no "full knowledge"; research is testing a hypothesis about what might be the case. Therefore, there is a gap in the knowledge that is available that is shared by the researcher and the potential participants. So the informed consent is already not about full information. Second, there are limits on how much information is relevant. As in clinical medicine, choices have to be made in informing potential participants about which information is relevant. Remoteness of risk and proportionality must be in play, with a strong measure of 'reasonableness' to stop the drive to information becoming a requirement to inter-connect all knowledge to the particular research. Again, the standard is not binary - information / not information - it is a spectrum.

This second problem can be seen in the area of biobanking. Biobanks operate on the basis of developing a repository of information for the purpose of 'research' (perhaps with some limits, for example, relating to disease type and the like). Access to the data set to create cohorts for particular research projects is then granted, depending on the model, on the basis of the initial, broad consent of the participants to participate in the Biobank for research purposes. This causes problems for some people: for them, informed consent requires detailed information about every research project, and broad consent, by definition, cannot be informed consent; to others, without broad consent biobanks become impossible to operate. Now, of course, the second argument - the practical argument - whilst important, is not of the same nature as the first. However, 'broad consent' and 'informed consent' are not opposite arguments, as the first argument implies.

When one takes the words, the opposite of 'informed' is not 'broad' but 'uninformed'; the opposite of 'broad' is 'narrow' or 'specific'. Whilst it is conceptually difficult to imagine how one could give 'uninformed consent', as the concept of 'consent' itself seems to require a degree of information (at least to know that consent is required in a particular situation), it is, arguably, possible to give 'informed broad consent' as well as 'informed narrow consent'. This is because, as we have already admitted, information in consent is not a binary informed / uninformed, but rather a question of being sufficiently informed to make a fair and binding decision. The question then is, 'who judges sufficiency?'

Presently, the sufficiency of information is governed in the most part by the REC. Researchers produce information sheets and these are scrutinised and accepted by RECs as part of their validation of research. Whereas participants have the opportunity to ask questions of the researcher, and, arguably, to shape the interaction about becoming informed to make the decision to participate, in practice one wonders how far this is a real or free dialogue. Is this problematic? Yes, if it does not fit the needs of the participants.

The information sheets reflect the perceptions of those who write them (and RECs are co-authors of the sheets given their role). However, when one looks at studies of expressed sensitivities of citizens, some people will share those perceptions and concerns, whereas others will not, judging less information to be 'sufficient', and others again will require more or different information. One size does not fit all. And some will make a judgement that informed broad consent is sufficient, others will require informed specific consent.

Why is this a problem? Because increasingly, the size that is adopted does not allow for individual participants to make their choice, and that super-sizing of informed narrow consent makes many new research methodologies impossible when they would be acceptable to some participants - which is, perhaps, ironic, when the purpose of informed consent is to protect participant self-determination.

How might this be solved?

Dynamic consent. Many have written on dynamic consent, and some projects are developing models of dynamic, participant-centred consent. The idea is to develop consent interactions between (potential) participants and researchers that allow the participant to determine the level (and, perhaps, nature) of his or her participation. Such mechanisms could be on-going, perhaps making use of secure internet portals such that individual participants could develop increasingly sophisticated consent profiles as their understanding and relationship with the research (or, for example, Biobank) develops. Likewise, the portal could be used by the researchers or Biobank as an educational or information tool to share findings and discuss methods, even difficulties, with the public.

Questions for Discussions

How far does this analysis of informed consent as problematic ring true?

Is 'informed narrow/specific consent' necessarily required to meet the safeguard of informed consent, or can broad consent be sufficient?

Is participant-determination of sufficiency of information acceptable?

How far is dynamic consent a desirable and practical development in informed consent?

Are data science - online - portals the only realistic mechanism for delivering truly dynamic consent?

[1] Belmont Report (1979). The Belmont Report: Ethical principles and guidelines for the protection of human subjects of research. hhs.gov/ohrp/humansubjects/guidance/belmont.html (Last visited 1st September 2014).

[2] Beauchamp, T. and Childress, J. Principles of Biomedical Ethics. (7th edition). Oxford University Press, New
