Tuesday, November 03, 2009

Today, we released the hData technical specifications: hData Record Format and hData Packaging and Network Transport. This is the mail that went out to the mailing lists:

Today we are releasing the first public version of the hData specification for the record format and the packaging and network transport (REST API). They are available here:

http://www.projecthdata.org/documents.html

We will be making some changes to the documents in the next few days to add a simple meta data model and streamline certain elements. Once this is complete, we plan to move the specification to a wiki and open up the editing process. Until this is done, we would like to ask you to send your comments to hdata-general@googlegroups.com.

At this time we are also exploring how the hData specifications can be licensed in an open source friendly way. Possible options include an OASIS style non-assertion covenant – please contact us if you have suggestions.

So far, this covers the core data and exchange architecture, but we have started to work on a RESTful security architecture as well. The scenario we are trying to solve is outlined in a recent presentation at NIST's IT Security Automation Conference. In support of this I have come up with a meta data schema, which I will put into the v0.8 version of the hData Record Format specification. Hopefully, I can upload that new version some time next week.

We are very much looking for comments and suggestions. 

Tuesday, November 03, 2009 3:03:39 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Friday, October 23, 2009

Marc just made my day by sending me the link to the official submission of WADL to the W3C. Quick background: WADL (Web Application Description Language) is a simple interface definition language, specifically targeted at RESTful applications. It is significantly easier than WSDL 2.0 (or WSDL 1.x for that matter), and has some good tooling support through the Jersey implementation of JAX-RS.

Friday, October 23, 2009 12:00:08 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Thursday, October 08, 2009

IBAC, RBAC, ABAC ... a lot of folks in identity land are currently investigating authorization models with a little more scrutiny. Mark Dixon has a nice piece up on his blog, covering some of the current trends in the commercial sector.

I would like to make interested folks aware of an extension to the existing approaches to access control that takes it beyond a simple binary decision: in the Risk Adaptive Access Control (RAdAC) model, the authorization decision is not simply based on pre-defined mandatory and discretionary rules, but instead includes environmental policies such as Security Risk and Operational Need. As such, the authorization decision depends not only on traditional factors such as resource meta data, access control policy, or user attributes, but also on factors such as access decision history, IT computing platform trustworthiness, or general situational awareness.

RAdAC is not a technology, but instead a more unconventional model for making an authorization decision. It will be interesting to see how a model like this can actually be implemented.
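
To make the idea a little more tangible, here is a minimal sketch in Python of how risk and operational need could enter an authorization decision. Everything in it is an assumption made for illustration: the `AccessRequest` fields, the `radac_decision` function, the 0-to-1 scales, the thresholds, and the formula itself are hypothetical and not taken from any RAdAC specification.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical inputs to a risk-adaptive decision (illustrative only)."""
    policy_permits: bool     # outcome of the traditional rule-based check
    security_risk: float     # 0.0 (no risk) .. 1.0 (maximum risk)
    operational_need: float  # 0.0 (no need) .. 1.0 (critical need)


def radac_decision(request: AccessRequest, risk_tolerance: float = 0.3) -> bool:
    """Return True if access should be granted.

    Assumption: a higher operational need raises the amount of risk
    that is considered acceptable. The formula is made up for this sketch.
    """
    acceptable_risk = risk_tolerance + (1.0 - risk_tolerance) * request.operational_need
    if request.policy_permits:
        # The static policy allows access, but too much risk can still veto it.
        return request.security_risk <= acceptable_risk
    # The static policy denies access: only a near-critical need combined
    # with low risk overrides the denial.
    return request.operational_need >= 0.9 and request.security_risk <= risk_tolerance


# Same user, same resource, same static policy - different environment:
routine = AccessRequest(policy_permits=True, security_risk=0.8, operational_need=0.0)
emergency = AccessRequest(policy_permits=True, security_risk=0.8, operational_need=0.9)
print(radac_decision(routine), radac_decision(emergency))
```

The point of the sketch is only that the outcome is no longer a pure function of user, resource, and policy: the same request can be denied in a routine situation and granted in an emergency.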

Wednesday, October 07, 2009 11:28:36 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Tuesday, October 06, 2009

Our effort to improve electronic health data exchange is starting to pick up some steam: After a very successful round of discussions at the HL7 General Plenary in Atlanta in late September (kudos to Andy Gregorowicz for covering this one) and a pretty warm reception, I presented last week at the NIH in Bethesda during the Tao of Attributes workshop on hData and our plans for the identity management and access control piece. I got some really great feedback, and I am hopeful that the idea of using a set of technologies that is known to scale (RESTful architectural style) can address the needs of a complex health data exchange.

Going forward, we would really like to start building a community around hData and L32. To this end, we have created a couple of email aliases (see here for details) for starting a dialogue.

Tuesday, October 06, 2009 9:10:11 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 

I liked Bob Blakey's recent article on privacy, along with the paper he and Ian Glazer published. One direction that might need some additional coverage at some time is the “privacy of organizations”. Organizationally sensitive data (such as trade secrets or classified material) follows a pattern similar to what Bob and Ian are laying out for PII: it is disclosed to a trusted group (as such it would not fall under their definition of secrecy), and a legal instrument (such as an NDA) is used to ensure that this data is not released to non-authorized parties.

In my own world, I have seen privacy and secrecy as very closely related: to some extent, secrecy was to me privacy with a solid logging/auditing system, so that secrecy is really only preserved operationally, and full access to the audit trail would restore the identity (oh dear *that* loaded term again) of all actors. Bob and Ian obviously use a different definition of privacy, which has much stronger implications for the meta-data architecture, including sensitivity markings or IRM controls.

In order to draw a more precise distinction between different concepts of privacy, it might be relevant to examine the origin of the data about me (the data subject): 

  • The first bucket is data for which I am the originator (source).
  • The next bucket is data that someone I interact with directly collects about me, so they are the originator. This may include web server access logs, shopping profiles, etc.
  • The final bucket is data that a third party collects about me, without me interacting with them. In many cases they are not the originator of that data, but instead collect data from other parties (including myself). Note that data in this bucket gets particularly interesting when aggregated.

In an ideal world, I (as a person or organization) would have full control over all three buckets, and could determine how the data about me flows. Unfortunately, the world is not ideal. In most cases I can only control the release (!) of data in the first bucket, but once that data is out in the wild, it will inevitably land in the third bucket, over which I have the least control. Attempts at controlling that third bucket through regulatory measures are fairly ineffective, as can be seen by the many identity data releases and losses, even in relatively strict privacy regimes.

Tuesday, October 06, 2009 8:25:55 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Wednesday, September 30, 2009

Interesting news this week: Microsoft, SAP, and Siemens have been awarded the SAML interoperable certification for their SAML 2.0 products for the first time. From a customer perspective this is excellent news - cross-vendor certifications by independent third parties are a good decision tool for selecting products. While even a comprehensive test suite cannot guarantee perfect interoperability, it puts the responsibility for debugging the most blatant problems in the vendors' court.

Wednesday, September 30, 2009 6:56:46 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 

My town (Burlington, MA) has just revived the Information Systems Advisory Committee (ISAC) to assist in the alignment of the school system's and the administration's IT departments. With many high-technology companies in town, the administration has been at the forefront of IT development, with a respectable web presence that dates back to the 90s - a time when only a few towns and cities took the web seriously.

To support the new projects, I have been appointed to a position in the ISAC, and I am looking forward to helping the town staff to decide how to move forward.

Wednesday, September 30, 2009 2:56:20 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Monday, August 24, 2009

In an earlier article I talked about data ownership - or lack thereof - at a low, technical level. There are three principal technical actors: the physical custodian, the logical custodian, and the data originator. This article deals with the problem (for the data originator) of limiting the powers the physical custodian has. As the owner of the physical equipment that hosts the data, the physical custodian can perform a number of undesired actions with the data he hosts, specifically: (i) copy and distribute it and (ii) disable physical access to it. In many cases, neither action is desired by the data originator or consumer.

As a first step towards limiting the physical custodian's powers, it is important to make sure that the physical custodian (PC) is not also a logical custodian (LC). By this I mean the following: the PC has access to the physical equipment that hosts the data, as well as the transport infrastructure to get access to it. By denying the PC the role of the logical custodian, he may ultimately host data, but will not be able to use or interpret the data in a meaningful way. An obvious way to achieve this is to encrypt the data and make sure that the PC does not get access to the key. For most practical purposes, this addresses action (i).

But even if the PC cannot access the data he hosts, he still has the "power of the plug": if the PC cuts the connection to the network, or switches off the equipment hosting the data, all access to the data is lost. In order to be able to address this problem, one can use the following scheme:

  1. Data is stored in some atomic units like files, that can be represented as a data stream.

  2. The data stream is encrypted; keys are not stored with the data.

  3. The encrypted stream is chunked into at least two chunks of identical size. The number of chunks is arbitrary.

  4. At least one parity chunk is computed - think RAID 5 or 6.

  5. The chunks are stored on different data services. This could be a traditional data service, but other services such as a mail service or a blog service could also be used to store the chunks. The table linking the different chunks is stored separately from the data.

The effect of creating such a "Redundant Array of Independent Services" (RAIS) is obvious: the physical custodians cannot access the data, since it is encrypted and each of them only holds a portion of it. In addition, since there is at least one parity chunk, the lost data can be reconstructed from the remaining chunks if one provider decides to "pull the plug". As an additional protection, users might want to mirror individual chunks on different services as well, thus improving availability.
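
The chunking and parity steps of the scheme can be sketched in a few lines of Python. This is a minimal illustration under several assumptions: the input is already encrypted (encryption and key handling are left out entirely), a single XOR parity chunk is used in the spirit of RAID 5, and the function names `split_with_parity` and `reconstruct` are made up for this example. Uploading the chunks to actual services and managing the chunk table are not shown.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(ciphertext: bytes, n_chunks: int) -> Tuple[List[bytes], int]:
    """Split an (already encrypted) stream into n_chunks equal-size data
    chunks and append one XOR parity chunk.

    Returns the chunks and the original length, which is needed to strip
    the zero padding again after reassembly.
    """
    if n_chunks < 2:
        raise ValueError("need at least two data chunks")
    size = -(-len(ciphertext) // n_chunks)  # ceiling division
    padded = ciphertext.ljust(size * n_chunks, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_chunks)]
    chunks.append(reduce(xor_bytes, chunks))  # parity = XOR of all data chunks
    return chunks, len(ciphertext)


def reconstruct(chunks: List[Optional[bytes]], original_length: int) -> bytes:
    """Reassemble the stream; at most one chunk (data or parity) may be None."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can only recover one lost chunk")
    restored = list(chunks)
    if missing:
        present = [chunk for chunk in chunks if chunk is not None]
        # XOR of all surviving chunks yields the missing one.
        restored[missing[0]] = reduce(xor_bytes, present)
    return b"".join(restored[:-1])[:original_length]


# One provider "pulls the plug" on chunk 1:
chunks, length = split_with_parity(b"already-encrypted stream", 3)
chunks[1] = None
print(reconstruct(chunks, length))
```

With a single parity chunk only one lost service can be tolerated; surviving two simultaneous losses would need a second, differently computed parity chunk, as in RAID 6.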

The obvious open questions are crypto key and chunk table management, especially since these become high-value targets. Master key techniques and independent RAIS systems can address some of these issues through best practices.

Monday, August 24, 2009 1:29:44 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 

User-centricity - often expressed in the "7 Laws of Identity" - has been a common theme in identity management for a while now. At the heart of these principles lies the desire to empower the end-users of computer systems and enable them to negotiate with the service provider the amount of PII the users have to disclose for getting access. Beyond the initial authentication and authorization steps for resource access also lies an ocean of other problems such as delegation, pre-authorization, and emergency overrides. These issues play into a vast number of use cases in very different areas such as financials, health care, and social networking.

At the same time, a rather important aspect of identity has been completely ignored: the systems we interact with and their component services and devices do have identities as well, and these identities must be managed with the same level of detail as person identities. The need for non-person identity management goes well beyond the realm of security sensitive environments such as various government services: we are getting ever more dependent on a growing number of devices and services including mundane things such as smart phones and ebook readers, but also critical items such as health monitors. In many cases, high-value or critical services rely on less valued services (such as health monitors that use the mobile phone system for notification). Overall, we are seeing a polynomial growth of interdependencies of such services and devices.

With these problems looming, it becomes more and more urgent to extend the practices learned in identity management for persons to non-person entities. The solutions for this new class of identities will have to be significantly different, since devices and services will interact with the IdM systems in very different ways and might also have significantly different needs. For example, while privacy protection is important for end-users, devices and services and their operators will likely be more concerned with secrecy, which might borrow from some privacy best practices, but be different in other respects. 

Interestingly enough, PKI has had a notion of non-person identities for some time already. We are relying on the internet PKI for authenticating servers to users and services. At the same time, PKI has been very cumbersome to roll out to end-users and edge devices. As such, there are some lessons that PKI can provide, so that the efficiencies and abstractions of SAML and related technologies can go beyond simple user-centricity.

As a challenge, here are some questions that I have with regard to identity management of non-person entities:

  1. What identity can devices and services have? How are these identities different from human identities?
  2. What are the minimal requirements on machine identities?
  3. What new and different interaction patterns are required for enabling machine identities?
  4. How do concepts such as reputation translate into the machine world? 
  5. When machine and human identities interact, is there a need for disclosure that one party is non-human? Or human?

Monday, August 24, 2009 9:32:12 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 
Tuesday, August 18, 2009

Data ownership is a rather nasty topic: at a legal level, we have many rights related to data we create or that is about us: privacy regulations, intellectual property rights, copyrights and trademarks, etc. are all aspects of how society attributes ownership to immaterial goods. This practice has been in place since at least the early 19th century, but even then there were critics, among them Thomas Jefferson and James Madison.

With the advent of digitized storage, reproduction of immaterial data has become cheap and lossless. This has a significant impact on the industry: for example, the entertainment industry is currently facing the consequences of this highly disruptive technology advancement, and has yet to redesign their business model to accommodate this paradigm shift.

But this change goes far beyond the entertainment industry or any specific market: at this time, most people have started to realize that data they release about themselves will be reproduced, indexed, and made available via 3rd party search engines. Once the cat is out of the bag, it is too late for restricting distribution.

This leads me to believe that we need to re-think the concept of data ownership, at least at a technology level: it does not make a lot of sense to claim ownership of data if one has no means of asserting this ownership in an effective manner. The judicial processes are too slow and too much bound to physical objects. As a result, only a small portion of data ownership infractions is dealt with by courts, and effective enforcement on a global scale is practically impossible.

As a result, it would seem appropriate to me to abandon the concept of data ownership on a technical level altogether - and replace it with concepts that are better suited to how information systems are designed in the 21st century:

  • A physical custodian of data has access and control over the physical object where the data is stored. In many cases this will effectively be a system administrator who is taking care of the computer and hard drives where the data is stored. It also makes sense to consider the organization that employs the system administrator(s) to be a physical custodian. The physical custodian has significant control over the data, since he can simply "pull the plug" and make data unavailable.
  • A logical custodian can access and modify the data. A logical custodian can also grant the logical custodian role to other entities. While in many cases a physical custodian is also a logical custodian, there are important cases where this is not the case: in multi-level security systems or environments where data-at-rest is encrypted, the physical custodian might not have meaningful access to the data. The granting of this role cannot be reversed: once an entity has access to data, this data can be copied to other physical systems and be re-used.
  • The data originator is the entity that created the data. While origin may be an important factor to determine authority or validity of the data, it does not guarantee either.

Anything beyond these roles cannot - at least with current technology - be properly modeled without relying on concepts beyond the realm of technology. Nevertheless, even these limited roles can be used to model interesting scenarios. For example, a distributed storage system that stores encrypted and chunked data with parity (i.e. RAID 5 or 6 across different services, not disks), can practically eliminate the role of the physical custodian.

Higher level technologies (such as DRM or multi-party encryption) may be successful in restricting the significant control of a logical custodian to some extent; beyond that, only external mechanisms (such as system certification, trust models, or judicial redress procedures) can limit the logical custodian.

Tuesday, August 18, 2009 3:07:34 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  | 

For some time I have been working with a number of folks at MITRE on a simple representation for electronic health data. Digging into the depths of various standards organizations such as HL7, HITSP, or HIMSS was interesting, painful, and enlightening at the same time. Since last week, our project is online at http://projecthdata.org/, and the hData project has announced that specifications, schemas, and code will be released there soon. At this time, you can get the hData white paper, which was also presented at the recent Balisage 2009 conference in Montreal. Overall, hData's approach is very much focused on implementability and ease of use for developers (since - quoting Mike Kay at Balisage - "As a developer I am also human").

Interestingly enough, the combination of ODF/Jar style packaging and RESTful integration (taking a ZIP archive of hierarchically organized component documents and representing it as a collection of resources) has some folks interested. If there are more, I will suggest taking this out of hData and creating an independent specification.
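
As a rough illustration of that mapping, the following Python sketch takes a ZIP archive and exposes each contained document under a URL that mirrors its path inside the archive. It is not the hData packaging specification: the function name `archive_to_resources`, the base URL, and the file names in the usage example are all assumptions made up for this example.

```python
import io
import zipfile
from typing import Dict


def archive_to_resources(archive_bytes: bytes, base_url: str) -> Dict[str, bytes]:
    """Map every document in a ZIP archive to a resource URL under base_url.

    The folder hierarchy inside the archive becomes the path hierarchy
    of the resource collection (an assumed, illustrative layout).
    """
    resources = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories only shape the paths, they carry no content
            url = base_url.rstrip("/") + "/" + info.filename
            resources[url] = archive.read(info)
    return resources


# Build a small archive in memory (hypothetical document names) and list it:
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("root.xml", "<root/>")
    archive.writestr("medications/1.xml", "<medication/>")

for url in sorted(archive_to_resources(buffer.getvalue(), "https://example.org/records/1234")):
    print(url)
```

A server built on this idea could answer a GET on each of these URLs with the corresponding document, so that the same content is usable either as a single package or as individually addressable resources.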

Tuesday, August 18, 2009 2:56:53 PM (Eastern Standard Time, UTC-05:00)  #    Comments [1]  | 

Copyright by Gerald Beuchelt.