Monday, August 24, 2009

In an earlier article I talked about data ownership - or the lack thereof - at a low, technical level. There are three principal technical actors: the physical custodian, the logical custodian, and the data originator. This article deals with the data originator's problem of limiting the powers of the physical custodian. As the owner of the physical equipment that hosts the data, the physical custodian can perform a number of undesired actions with that data, specifically: (i) copy and distribute it and (ii) disable physical access to it. In many cases, neither action is desired by the data originator or consumer.

As a first step towards limiting the physical custodian's powers, it is important to make sure that the physical custodian (PC) is not also a logical custodian (LC). By this I mean the following: the PC has access to the physical equipment that hosts the data, as well as the transport infrastructure used to reach it. If the PC is denied the role of logical custodian, he may still host the data, but will not be able to use or interpret it in a meaningful way. An obvious way to achieve this is to encrypt the data and make sure that the PC does not get access to the key. For most practical purposes, this addresses action (i).

But even if the PC cannot access the data he hosts, he still has the "power of the plug": if the PC cuts the connection to the network, or switches off the equipment, all access to the data is lost. To address this problem, one can use the following scheme:

  1. Data is stored in some atomic units like files, that can be represented as a data stream.

  2. The data stream is encrypted; keys are not stored with the data.

  3. The encrypted stream is chunked into at least two chunks of identical size. The number of chunks is arbitrary.

  4. At least one parity chunk is computed - think RAID 5 or 6.

  5. The chunks are stored on different data services. These could be traditional data services, but other services such as a mail service or a blog service could also be used to store the chunks. The table linking the different chunks is stored separately from the data.

The effect of creating such a "Redundant Array of Independent Services" (RAIS) is twofold. First, the physical custodians cannot access the data, since it is encrypted and each of them holds only a portion. Second, since there is at least one parity chunk, if one provider decides to "pull the plug", the lost chunk can be reconstructed from the remaining ones. As an additional protection, users might want to mirror individual chunks on different services as well, thus improving availability.
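The chunking and parity steps of the scheme above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the input is taken to be already encrypted (step 2 is not shown), a single XOR parity chunk is used (RAID 5 style, so only one lost chunk can be recovered), and the original length is assumed to be kept in the chunk table. The function names `split_with_parity` and `reconstruct` are made up for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(ciphertext: bytes, n_chunks: int) -> List[bytes]:
    """Split an (already encrypted) stream into n equal chunks plus one parity chunk."""
    if n_chunks < 2:
        raise ValueError("need at least two data chunks")
    chunk_size = -(-len(ciphertext) // n_chunks)  # ceiling division
    # Pad with zero bytes so that all chunks have identical size.
    padded = ciphertext.ljust(chunk_size * n_chunks, b"\0")
    chunks = [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]  # the parity chunk is stored last


def reconstruct(chunks: List[Optional[bytes]], original_length: int) -> bytes:
    """Rebuild the stream when at most one chunk (data or parity) is missing (None)."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost chunk")
    restored = list(chunks)
    if missing:
        present = [chunk for chunk in chunks if chunk is not None]
        # XOR of all surviving chunks yields the missing one.
        restored[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity chunk and the padding.
    return b"".join(restored[:-1])[:original_length]


# Example: one of four services "pulls the plug".
ciphertext = b"already-encrypted example bytes"
stored = split_with_parity(ciphertext, 3)
stored[1] = None
assert reconstruct(stored, len(ciphertext)) == ciphertext
```

Surviving two simultaneous provider failures would need a second, independent parity chunk (as in RAID 6), which this sketch does not attempt.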

The obvious open questions are crypto key and chunk table management, especially since these become high-value targets. Master key techniques and independent RAIS systems can address some of these issues through best practices.

Monday, August 24, 2009 1:29:44 PM (Eastern Standard Time, UTC-05:00)

User-centricity - often expressed in the "7 Laws of Identity" - has been a common theme in identity management for a while now. At the heart of these principles lies the desire to empower the end users of computer systems and enable them to negotiate with the service provider how much PII they have to disclose to get access. Beyond the initial authentication and authorization steps for resource access lies an ocean of other problems such as delegation, pre-authorization, and emergency overrides. These issues play into a vast number of use cases in very different areas such as finance, health care, and social networking.

At the same time, a rather important aspect of identity has been completely ignored: the systems we interact with and their component services and devices have identities as well, and these identities must be managed with the same level of detail as person identities. The need for non-person identity management goes well beyond security-sensitive environments such as various government services: we are becoming ever more dependent on a growing number of devices and services, including mundane things such as smart phones and ebook readers, but also critical items such as health monitors. In many cases, high-value or critical services rely on less valued services (such as health monitors that use the mobile phone system for notification). Overall, we are seeing polynomial growth in the interdependencies of such services and devices.

With these problems looming, it becomes more and more urgent to extend the practices learned in identity management for persons to non-person entities. The solutions for this new class of identities will have to be significantly different, since devices and services will interact with the IdM systems in very different ways and might also have significantly different needs. For example, while privacy protection is important for end-users, devices and services and their operators will likely be more concerned with secrecy, which might borrow from some privacy best practices, but be different in other respects. 

Interestingly enough, PKI has had a notion of non-person identities for some time already. We rely on the Internet PKI for authenticating servers to users and services. At the same time, PKI has been very cumbersome to roll out to end users and edge devices. As such, there are some lessons that PKI can provide, so that the efficiencies and abstractions of SAML and related technologies can go beyond simple user-centricity.

As a challenge, here are some questions that I have with regard to identity management of non-person entities:

  1. What identity can devices and services have? How are these identities different from human identities?
  2. What are the minimal requirements on machine identities?
  3. What new and different interaction patterns are required for enabling machine identities?
  4. How do concepts such as reputation translate into the machine world? 
  5. When machine and human identities interact, is there a need for disclosure that one party is non-human? Or human?

Monday, August 24, 2009 9:32:12 AM (Eastern Standard Time, UTC-05:00)

Tuesday, August 18, 2009

Data ownership is a rather nasty topic: at a legal level, we have many rights related to data we create or that is about us: privacy regulations, intellectual property rights, copyrights and trademarks, etc. are all aspects of how society attributes ownership to immaterial goods. This practice has been in place since at least the early 19th century, but even then there were critics, among them Thomas Jefferson and James Madison.

With the advent of digitized storage, reproduction of immaterial data has become cheap and lossless. This has a significant impact on entire industries: for example, the entertainment industry is currently facing the consequences of this highly disruptive technology advancement, and has yet to redesign its business model to accommodate this paradigm shift.

But this change goes far beyond the entertainment industry or any specific market: by now, most people have started to realize that data they release about themselves will be reproduced, indexed, and made available via third-party search engines. Once the cat is out of the bag, it is too late to restrict distribution.

This leads me to believe that we need to re-think the concept of data ownership, at least at a technology level: it does not make a lot of sense to claim ownership of data if one has no means of asserting this ownership in an effective manner. The judicial processes are too slow and too closely bound to physical objects. As a result, only a small portion of data ownership infractions is dealt with by courts, and effective enforcement on a global scale is practically impossible.

It would therefore seem appropriate to me to abandon the concept of data ownership on a technical level altogether - and replace it with concepts that are better suited to how information systems are designed in the 21st century:

  • A physical custodian of data has access to and control over the physical object where the data is stored. In many cases this will effectively be a system administrator who takes care of the computer and hard drives where the data is stored. It also makes sense to consider the organization that employs the system administrator(s) to be a physical custodian. The physical custodian has significant control over the data, since he can simply "pull the plug" and make the data unavailable.
  • A logical custodian can access and modify the data. A logical custodian can also grant the logical custodian role to other entities. While in many cases a physical custodian is also a logical custodian, there are important exceptions: in multi-level security systems or environments where data-at-rest is encrypted, the physical custodian might not have meaningful access to the data. The granting of this role cannot be reversed: once an entity has access to data, this data can be copied to other physical systems and be re-used.
  • The data originator is the entity that created the data. While origin may be an important factor to determine authority or validity of the data, it does not guarantee either.

Anything beyond these roles cannot - at least with current technology - be properly modeled without relying on concepts beyond the realm of technology. Nevertheless, even these limited roles can be used to model interesting scenarios. For example, a distributed storage system that stores encrypted and chunked data with parity (i.e. RAID 5 or 6 across different services, not disks) can practically eliminate the role of the physical custodian.

Higher-level technologies (such as DRM or multi-party encryption) may succeed to some extent in restricting the significant control that a logical custodian has, but only external mechanisms (such as system certification, trust models, or judicial redress procedures) can really limit the logical custodian.

Tuesday, August 18, 2009 3:07:34 PM (Eastern Standard Time, UTC-05:00)

For some time I have been working with a number of folks at MITRE on a simple representation for electronic health data. Digging into the depths of various standards organizations such as HL7, HITSP, or HIMSS was interesting, painful, and enlightening at the same time. Since last week, our project has been online at http://projecthdata.org/, and the hData project has announced that it will release specifications, schemas, and code there soon. At this time, you can get the hData white paper, which was also presented at the recent Balisage 2009 conference in Montreal. Overall, hData's approach is very much focused on implementability and ease of use for developers (since - quoting Mike Kay at Balisage - "As a developer I am also human.")

Interestingly enough, the combination of ODF/Jar style packaging and RESTful integration (taking a ZIP archive of hierarchically organized component documents and representing it as a collection of resources) has some folks interested. If there are more, I will suggest taking this out of hData and creating an independent specification.
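To make the packaging idea a little more concrete, here is a rough Python sketch of how the entries of a ZIP archive could be exposed as a flat collection of addressable resources. It is not taken from the hData specifications: the function name `zip_to_resources`, the base URL, and the file names in the example are all hypothetical and only illustrate the general mapping from archive paths to resource URLs.

```python
import io
import zipfile
from typing import Dict


def zip_to_resources(archive_bytes: bytes, base_url: str) -> Dict[str, bytes]:
    """Map every file in a ZIP archive to a resource URL under base_url."""
    resources = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories only provide the hierarchy
            url = f"{base_url.rstrip('/')}/{info.filename}"
            resources[url] = archive.read(info)
    return resources


# Build a small in-memory archive with hypothetical component documents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("root.xml", "<root/>")
    archive.writestr("medications/1.xml", "<medication/>")

for url in sorted(zip_to_resources(buffer.getvalue(), "https://example.org/records/42")):
    print(url)
```

A real RESTful binding would also have to decide how updates to individual resources are written back into the package, which this sketch leaves out.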

Tuesday, August 18, 2009 2:56:53 PM (Eastern Standard Time, UTC-05:00)

Copyright by Gerald Beuchelt.