Privacy Governed Data Flow


On May 14, 2020

Privacy governed data flow is a technology that applies privacy controls to data as it moves between systems and organizations. It provides a declarative method for implementing constraints imposed by regulations, privacy policies, and information sharing agreements, streamlining data flow. It can also apply user consent information to mask or remove parts of the data based on user data sharing preferences. Privacy governed data flow works with standard or custom JSON and XML schemas, so it can be used in many data processing and exchange scenarios, including:

  • data analytics applications,
  • personal data management applications,
  • financial applications,
  • healthcare data exchange (using FHIR, HL7, Open mHealth schemas),
  • health and wellness applications,
  • medical research data sharing.

Privacy governed data flow implements the privacy layer for JSON and XML schemas using overlays. Standardized schemas play a crucial role in interoperability: when data conforms to a schema, it can be easily exchanged and correctly interpreted by multiple parties. A schema is a machine-readable specification that describes the structure and, to some extent, the meaning of data. JSON and XML schemas are widely used in industry to validate data during processing and exchange. However, schemas do not include a built-in mechanism for applying privacy concerns to data. You can use privacy governed data flow to implement the privacy layer for standard schemas such as FHIR, or for custom schemas used among partnering organizations.

Separating privacy concerns from schemas using overlays has many benefits. ConsentGrid uses metadata added by overlays to filter data fields based on user consent. This can be used to remove certain sensitive information when exchanging data with third parties. Multiple overlays can be used to compose views of data annotated and transformed for different use-cases. For example, a mobile application collecting personal data can apply one set of overlays to remove identifiable information when sending data to a public service but use a pseudonymization overlay when performing analytics. Shared overlays can be used to sanitize data when multiple parties feed data into a common data pool.

Privacy governed data flow extracts entity relationships and data structures from JSON and XML schemas. There are many domain-specific standard schemas for data processing and exchange (e.g. FHIR and HL7 schemas for healthcare, Open mHealth for mobile health data, etc.). Organizations also use custom schemas for exchanging data with their partners, or for internal use. These schemas usually define numerous interconnected entities. Privacy governed data flow uses entities and entity relationships extracted from schema as the basis for overlays. However, overlays are not tied to the underlying schema. An overlay defined for an entity can be used for others containing similar attributes.

Overlays define additional processing steps for an entity. An overlay contains pointers to fields in a JSON or XML document along with processing directives. The structure of these processing directives depends on the type of the overlay.

Labeling overlays add metadata that classifies data fields into privacy categories. This metadata is not stored directly in the document, but kept as a separate layer of information that other overlays can use. The following labeling overlay for a hypothetical Person entity adds the PII label to the firstName and lastName fields, and the PII and LOC labels to all elements of the address field:

    "overlayType": "label",
    "fields": {
        "firstName": [ "PII" ],
        "lastName": [ "PII" ],
        "address.*": [ "PII", "LOC" ]

These labels can be linked with granular consent scopes, and additional overlays can remove these fields if user consent is lacking for those labels.
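A minimal sketch (not ConsentGrid's actual API; `labels_for` and `filter_by_consent` are hypothetical helper names) of how a labeling overlay and a consent check could combine to filter a document:

```python
# Hypothetical sketch: a labeling overlay plus consent-based filtering.
import fnmatch

label_overlay = {
    "overlayType": "label",
    "fields": {
        "firstName": ["PII"],
        "lastName": ["PII"],
        "address.*": ["PII", "LOC"],
    },
}

def labels_for(path, overlay):
    """Collect labels whose field pattern matches a dotted field path."""
    labels = set()
    for pattern, tags in overlay["fields"].items():
        if fnmatch.fnmatch(path, pattern):
            labels.update(tags)
    return labels

def filter_by_consent(doc, overlay, consented, prefix=""):
    """Keep only fields whose labels are all covered by the user's consent."""
    out = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            out[key] = filter_by_consent(value, overlay, consented, path + ".")
        elif labels_for(path, overlay) <= consented:
            out[key] = value
    return out
```

With consent for PII but not LOC, the address fields of the Person entity above would be stripped out while firstName, lastName, and birthDate are kept.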

Hashing overlays can be used to pseudonymize data. Below is an example hashing overlay for the same Person entity. When processed, it calculates a SHA256 hash over the firstName and lastName fields, inserts a hash field into the document, and removes the firstName and lastName fields.

    "overlayType": "hash", 
    "hash": {
        "algorithm": "SHA256", 
        "sources": [ 
            "firstName": { "remove": true  },
             "lastName": { "remove": true  }

Masking overlays hide data by removing fields from the document, or by modifying field values. Specialized masking overlays are used to filter data fields based on the privacy labels added using labeling overlays and active user consent.
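A masking overlay might be processed along these lines; the `"fields"`/`"action"` directive structure here is an illustrative assumption, not ConsentGrid's documented format:

```python
# Hypothetical masking overlay and a minimal sketch of applying it.
mask_overlay = {
    "overlayType": "mask",
    "fields": {
        "birthDate": {"action": "remove"},
        "lastName": {"action": "replace", "value": "****"},
    },
}

def apply_mask_overlay(doc, overlay):
    """Remove or rewrite the top-level fields named in the overlay."""
    out = dict(doc)
    for field, directive in overlay["fields"].items():
        if field not in out:
            continue
        if directive["action"] == "remove":
            del out[field]
        elif directive["action"] == "replace":
            out[field] = directive["value"]
    return out
```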

Even though these overlays are defined as JSON documents, they can be used to process both JSON and XML documents.

Multiple overlays can be stacked to compose different views of an entity. For example, one view can have several labeling overlays each using different labeling criteria, and then consent-based filtering can be used to remove fields based on the user consent. Another view can be used for pseudonymization during data transfer to a research data store.
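Stacking can be pictured as function composition: each overlay step is a document-to-document transform applied in order. The steps below are purely illustrative stubs standing in for real overlay processors:

```python
# Minimal sketch of composing a view from stacked overlay steps.
from functools import reduce

def compose_view(*steps):
    """Chain overlay-processing steps into a single document transform."""
    return lambda doc: reduce(lambda d, step: step(d), steps, doc)

def drop_field(name):
    """Stub masking step that removes one top-level field."""
    return lambda doc: {k: v for k, v in doc.items() if k != name}

# A pseudonymization view for a research data store might stack a hashing
# step with masking steps; two masking stubs illustrate the chaining here.
research_view = compose_view(drop_field("firstName"), drop_field("lastName"))
```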

If we process a hypothetical Person entity with the labeling overlay, it adds the following labels to the document:

  "firstName": "Test", [PII]
  "lastName": "User",  [PII]
  "birthDate": "2001-09-01",
  "address": {                    
    "streetAddress": "12 Main St", [PII,LOC]
    "city": "Anycity",             [PII,LOC]
    "state": "CA"                  [PII,LOC]

If the view uses an overlay that checks consent for data labeled LOC, and the data subject has not given that consent, the resulting data becomes:

  "firstName": "Test",
  "lastName": "User", 
  "birthDate": "2001-09-01",
  "address": {}

The same document processed by the hashing overlay looks like this:

  "hash": "e3b0c...",
  "birthDate": "2001-09-01",
  "address": {
    "streetAddress": "123 Main St",
    "city": "Anycity",
    "state": "CA"

You can see a demonstration of this technology on the Privacy Governed Data Flow Demo page.
