https://identityassurance.blog.gov.uk/2014/11/05/tech-arch-privacy/

How the GOV.UK Verify technical architecture protects users' privacy, and why it's appropriate

When we designed the identity assurance architecture we wanted to protect users from identity theft and fraud, to secure their data as it is used online, and to reduce the amount of information needed from the user to a minimum.

These concepts have been embodied in the Identity Assurance Principles as developed by our Privacy and Consumer Advisory Group and have been a guide throughout the development of our architecture.

To help us achieve these goals we created a hub service that sits between online government services and the identity providers. The hub helps users to authenticate with an appropriate identity provider, facilitates matching of users to services, and enforces policy, for example by only allowing trusted services to make requests.

Services need some data to identify you but we want to keep that data safe…

One of the consequences of transacting with government services is that we need to connect the user wanting to access a service with records or identifiers that mean something in the context of that service. For example, we want to make sure that a service retrieves your records, such as your driver licence details, rather than those for the other John Smith who lives just a few streets away.

To do this, services need data that we can trust, so integrity and provenance are important to the service providers. We also need to move that data from a trusted identity provider across the internet safely. To achieve this, we only send the data we really need to help you access a government service, and we use powerful cryptography to ensure that only the intended recipient can read the data and that no one can change it in flight.
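
As a rough illustration of the "no one can change that data in-flight" property, the sketch below attaches an integrity tag to a message and rejects any message whose content no longer matches its tag. It is not the scheme the service uses: a shared-secret HMAC from Python's standard library stands in for whatever signature mechanism the real architecture relies on, the key and function names are invented for the example, and the sketch does not attempt the encryption half (keeping the data readable only by the intended recipient).

```python
import hashlib
import hmac
import json


def sign(message: dict, key: bytes) -> str:
    """Return an integrity tag for the message (illustrative helper)."""
    # Serialise deterministically so the same content always yields the same tag.
    payload = json.dumps(message, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(message: dict, tag: str, key: bytes) -> bool:
    """Return True only if the message still matches the tag it was sent with."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign(message, key), tag)


if __name__ == "__main__":
    key = b"demo-key-for-illustration-only"  # assumption: a pre-shared key
    message = {"name": "John Smith"}
    tag = sign(message, key)
    print(verify(message, tag, key))                # unchanged message
    print(verify({"name": "Jon Smith"}, tag, key))  # altered in transit
```

The point of the design is that the recipient does not have to trust the network: any change to the message, however small, makes verification fail.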

The data we send about you when you log in is limited to something called a Matching Data Set, which is made up of your name, address, date of birth, and gender (if you provided it*). This data is verified by your choice of identity provider and is used to access government services by matching it to the data they already have, such as your driver licence in the example above.
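
To make the shape of the Matching Data Set concrete, here is a minimal sketch of how a service might compare it with its own records. The class names, field names, record identifiers and the exact-match rule are all assumptions made for illustration; a real service would almost certainly need more tolerant matching than this.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class MatchingDataSet:
    """The verified attributes sent at log-in (field names are illustrative)."""
    name: str
    address: str
    date_of_birth: date
    gender: Optional[str] = None  # optional: only present if the user provided it


@dataclass(frozen=True)
class ServiceRecord:
    """A record already held by a government service (hypothetical layout)."""
    record_id: str
    name: str
    address: str
    date_of_birth: date
    gender: Optional[str] = None


def _same(a: str, b: str) -> bool:
    # Ignore case and surrounding whitespace when comparing text fields.
    return a.strip().casefold() == b.strip().casefold()


def find_matches(mds: MatchingDataSet, records: List[ServiceRecord]) -> List[ServiceRecord]:
    """Return every record that agrees with the matching data set."""
    matches = []
    for record in records:
        if not _same(record.name, mds.name):
            continue
        if record.date_of_birth != mds.date_of_birth:
            continue
        if not _same(record.address, mds.address):
            continue
        # Gender is compared only when both sides actually hold a value.
        if mds.gender and record.gender and not _same(record.gender, mds.gender):
            continue
        matches.append(record)
    return matches
```

With two John Smiths born on the same day, the address is what separates them: only the record whose address agrees with the Matching Data Set is returned, which is the "other John Smith a few streets away" case described above.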

We want to minimise the data we need…

We don’t keep your identity data centrally; in fact we don’t keep it at all, or even get to see it ourselves: it is held by the identity providers on your behalf.

Some people think “it’s the government, they know me…” but largely it’s only the service or department you are interacting with that know your details and even then the data they hold could well be out of date if you haven’t used their services recently, or incomplete if they didn’t need to keep all of your information as part of the process.

To protect your privacy the hub service only has access to a small amount of your data when you want to access a service, which is only released to us when you consent to that happening, and the identity provider of your choice verifies it in advance. We call this small set of identity data the Matching Data Set and we make sure that services don’t keep this data or re-use it for any other purpose unless they first gain your consent.

We don’t let identity providers know what service you are accessing…

One advantage of our hub service is that we can prevent identity providers from knowing which government service you are about to use. We appreciate that our identity providers are commercial organisations and we don’t think it’s appropriate for them to know what service you are trying to access. Unless you choose to tell your identity provider, they won’t know which service you’re using, and they certainly won’t be able to see what you’re doing there. They will simply know that you are accessing a government service.
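
The separation described here can be sketched as a hub that accepts requests only from trusted services and passes the identity provider nothing more than an opaque reference. Everything in this sketch (class names, method names, the in-memory lookup table) is hypothetical and chosen for illustration; it is not a description of the actual hub implementation or its message format.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ServiceRequest:
    service_id: str   # which government service is asking
    request_id: str   # the service's own reference for this request


@dataclass(frozen=True)
class IdpRequest:
    hub_request_id: str  # opaque reference; carries no service identity


class Hub:
    def __init__(self, trusted_services: Iterable[str]) -> None:
        self._trusted = set(trusted_services)
        # Maps the opaque reference back to the originating request.
        self._pending: Dict[str, ServiceRequest] = {}

    def forward_to_idp(self, request: ServiceRequest) -> IdpRequest:
        """Enforce policy, then build a request that hides the service."""
        if request.service_id not in self._trusted:
            raise PermissionError(f"untrusted service: {request.service_id}")
        hub_request_id = secrets.token_hex(16)
        self._pending[hub_request_id] = request
        return IdpRequest(hub_request_id)

    def route_response(self, hub_request_id: str) -> ServiceRequest:
        """Find which service an identity provider's response belongs to."""
        return self._pending.pop(hub_request_id)
```

Because the identity provider only ever sees the `IdpRequest`, it learns that a government service is involved but not which one; only the hub can map the reference back to the service.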

We want to make sure you can protect your identity…

Identity theft and account takeover are real-world problems and whilst we might not be able to completely stop this happening we do want to provide users with the ability to repair identities or to recover accounts depending on the situation. 

Identity providers play a major part in this, but the hub service and our architecture allow us to identify potential problems as they happen, or to investigate when and where identities have been used should an incident be reported.

In short, we care about your identity and we want to protect it but we realise that you also want a great service. Minimising the data we need, protecting it at all times, and making sure that we get your consent should we need to use your data as part of a service helps us to ensure that protection and still provide great services.

Fig.1 - The data you enter, where we process it, what data is stored, and by whom

What data is entered, processed and stored

In simple terms there are three services that you will encounter when accessing a government service: the government service itself, the hub service, and an identity provider of your choice where you prove your identity or simply log in. In the diagram you can see the type of data these three services ask you to provide, where this data might be processed, and where it may be stored. You are only asked to enter what we need, we minimise what is stored, and we always gain your consent to proceed.

 

*Clarification note:

Users are asked to provide their gender but it's not mandatory. If a user does say what their gender is, it may be useful in some cases as part of the set of data that's used to match them to the correct record in a department. But it's optional, relies on data asserted by the user (ie it is definitely gender, not sex), and unlike some of the other elements of the matching set, IDPs certainly don't ask for any history of gender (unlike, say, recent address history which is needed to help identify some people).

13 comments

  1. MarkK

    1. The link is to the draft principles which were circulated for public consultation. Even if you rejected all of my comments, surely there was someone able to offer improvements? A presentation was made to OIX, but did the promised public workshops ever happen?
    2. The last part of the third principle is only that the Service provider cannot know the IdP, so why add the onerous extra feature that the IdP must not (rather than need not always) know the relying party? Not only does this make it harder to correct errors when they occur, it is incompatible with the legislation requiring ISPs to keep metadata, and they can easily be one and the same. If, say, Verizon are required to note that I contacted nhs.gov.uk, why does it matter that they know something was verified? I don't hide from my chosen telephone company the number I am calling. What's the big deal (that the privacy experts didn't include)?
    3. The principles presume an ombudsman; who is it, or when will it be recruiting and functioning, at least in test mode? (DEFRA has already mandated operational use of the service even though it is only in test - need to apply for CAP basic online, which needs IDAP they say - so the Ombudsman appears to be a significant missing component.)
    4. How can just name, address and date of birth be adequate for the international obligations for interoperability such as a UK user using German public online services as called for in the Manchester Ministerial Declaration (2005)?
    5. How will the Hub handle, say, a notified Estonian eID, as required by the eIDAS regulation?
    6. The published SAML spec only allows for printable ASCII and the Euro symbol. How will it work for those whose names or addresses contain accents or other non-American characters?
    7. How does the user know it's the hub and not some hub-in-the-middle attacker?

    • Janet Hughes

      Hi Mark, thanks for your comment. Here are answers to your questions - I hope this covers it but please comment again if you would like more information or explanation.

      1. Here's a link to a blogpost by the chair of the group explaining how the consultation process worked: http://ntouk.wordpress.com/2013/12/05/uk-government-id-assurance-principles-consultation-and-feedback/

      2. The identity provider won't know which service the user is accessing - all communications between services and identity providers happen through the hub. This is because the identity provider doesn't need to know that information, so it's consistent with the principle of data minimisation.

      3. We're working on proposals for how we should deal with complaints from users whose complaints have not been resolved by the identity provider, and we'll carry on developing them in the light of our learning from public beta.

      4. and 5. We've been working with EU Member State colleagues and the Commission on the new eID regulation and implementing acts and we'll be posting in more detail on this issue and work on this shortly.

      6. The published version of the SAML profile is over a year old - we've been reviewing how this profile works with real users, services and identity providers, and this has resulted in a new version of the profile which we expect to publish before the end of the year. This new version will take into account this and other changes that we have made to the service during beta or we plan to introduce soon.

      7. Users should start at GOV.UK and do everything they'd normally do to be safe online.

      We have an 'EV' TLS certificate which is used to secure the browser connection to the hub. This allows users to check in their browser that they are connected to the hub service operated by the Cabinet Office.

      This type of attack is not uncommon on the internet and we’ve designed the architecture and our monitoring to prevent this and other potential security issues. Each entity in the architecture is able to check the integrity of requests and responses received and is capable of acting should an unknown entity issue these messages or an untrusted message be received.

      • MarkK

        Thanks for the update.
        1. The group revised the principles last year and hoped they would be
        published early in 2014, so we may be working off different versions as
        the link is to the draft version.
        2. Privacy nuances are hard to discuss in blogs, but presumably will be
        laid out clearly in the privacy impact assessment for the system as a
        whole (distinct from what separate IdPs do) which is mandated by Cabinet
        Office policy for all new systems. When is the PIA scheduled for
        publication?
        3. Stakeholders listed in GPG43 include those who are not users but
        whose identity has been usurped. The service providers will be unable to
        identify the IdP at fault, and a memoryless hub doesn't sound helpful
        for after-the-event fraud investigation. To whom should such people turn?
        Meanwhile, there would seem to be a chance that genuine users will
        falsely claim to service providers that their ID has been used online.
        What is the process for handling such assertions?
        4. Foreign relying parties have not been mentioned in anything I've seen
        in IDAP2 framework, yet would seem to be possible in the timescales
        indicated. It has been a requirement now for ten years, so potential
        bidders will need to understand the implications if they are being
        'notified'.
        5. This issue around using UK ID arose when the register behind the ID
        cards was scrapped, not when eIDAS was finally agreed. Details are
        important, but the outline architecture is presumably known by now. My
        question was one on information theory: if you don't have the relevant
        information (e.g. on whether or not a person is a citizen), how can you
        pass it into a system which needs it?
        6. The spec may be a year old, but the issues on handling user choices
        such as language support were raised more than two years ago, and would
        normally be included in requirements by cerebration rather than user
        trials (where it is true that they might be adjusted).
        .gov.uk (unlike .gc.ca since last century) doesn't appear to have any
        systematic approach to Welsh or other language choice (www.gov.uk/cymraeg
        shows lang="en" and asks, only in English, "is there anything wrong with
        this page"). The Welsh Language Act calls for consideration of the use of
        Welsh for services, which must mean it's used by some and so must be
        supported in any central component. Universal Credit in Sutton says it
        will be available in Welsh, although it doesn't specify online. There's a
        German government standard on character
        sets, which would seem technically adequate - but it is in German.
        Amending the character set is a major issue for relying parties and
        IdPs. The IDAP2 is about to go out, so maybe it will be in the
        documentation for that. There also appears to be nothing requiring at
        least one IdP to offer services in Welsh, so no reason to assume there
        would be any.
        7. Some parts of .gov.uk require us to trust something issued by an American or Irish company, neither of which the man in the street has heard of. Using certificates with strict liability
        limitations is fine for e-commerce, but as a single point of failure for
        a national system seems odd - even if CESG have signed them off as
        trustworthy.

        • Janet Hughes

          Hi Mark. Just a few additional points in response to your questions - some of this is quite detailed so we'll need to cover it in separate posts rather than in comments, to be able to address the points fully.

          Both versions of the principles are included on that page - the initial consultation version and the newest version. They are still called ‘draft’, recognising that further development might be needed as we go through public beta.

          See Toby’s post about his planned work to assess the system as a whole (https://identityassurance.blog.gov.uk/2014/11/05/protecting-privacy-in-gov-uk-verify/).

          We’ll respond to your questions on how GOV.UK Verify protects people against fraud in a separate post - it’s a little too detailed a set of questions to deal with usefully in comments.

          We’re planning a separate post on the work we’re doing with the EU on the interoperability of different national systems which will cover your questions on that.

          We’ll be updating the SAML profile shortly to reflect the language and other issues you mention.

          Thanks again,

          Janet

  2. Graham Jenkins

    "We don’t let identity providers know what service you are accessing" is a good principle. But could they still make simple inferences?

    If I used to authenticate once a year at the same time, I'm probably doing my taxes, and my credit rating might go up. If I now authenticate multiple times in close succession, my circumstances have changed, and I might be deemed a worse credit risk?

    • Janet Hughes

      Hello Graham, thanks for your comment. The fact that an identity check has taken place has no impact on your credit rating, and the providers aren't allowed to use data gathered or inferred during the verification or sign-in process for any other purpose without your informed consent. Where credit reference agency data has been used, there will be what's known as a 'soft' marker on your credit reference agency file, showing it was used for the purposes of verifying your identity. 'Soft' means that only you will be able to see the marker (so you would know if someone else had used it for that purpose, for example).

  3. simonfj

    Thanks Adam (Janet),

    That's a pretty good overview of the process.

    We've still got the problem of expecting people to offer their "life histories" to a private entity. But maybe I'm wrong and they will. I'm more inclined to believe the .govspace will follow the .eduspace. i.e. we'll get down to local govs being identity providers, as individual unis do.

    This might be useful, as even though most people inside a gov have attended a uni, they won't appreciate the way federated networks are put together. https://wiki.edugain.org/Federation_Architecture So "Hub and spoke with distributed login" is the established language (even if the GOV.UK domain is the only one you're working on just now, you'll be collaborating with your peers inside a few ec silos, as "externals" https://ecas.ec.europa.eu/cas/wayf) . Pictures are easier than words.

    Two notes. On your graphic: take out the WE in "data We process(ed)" and "Data WE store(d)". It's a bit confusing when you're talking about three entities. You also say "We appreciate that OUR identity providers are commercial organisations and we don’t think it’s appropriate for them to know what service you are trying to access." Maybe THESE would be a safer description.

    One consideration. How would one provision (services) for groups (like the 14 GDS groups) who span different central (and local) departments, and between departments and their communities of interest? The question tends to get "collaboration" put into the network-design agenda.

  4. Janette

    I am a farmer's wife aged nearly 82 years. I am the sole user of a computer in the house and I have attempted to understand the process of registering and verifying my husband's identity etc. This information is not understood by the non-technical reader, or are we not supposed to want/need to know? Nowhere in these many pages of information is there any reference to the cost of these processes if an agent has to be used. Also, if I manage to master access to the CMS, will his identity be accepted in order to comply with such mandatory records even though I am the operator? Makes a mockery of the whole security system!

  5. Frustrated reader

    Is someone going to respond to this comment? Or are comments from a normal citizen rather than the techies and the general cohort of GDS luvvies the only ones you really try to answer rather than fob off or ignore?

    • Janet Hughes

      Hello - I'm sorry it took us a while to respond - you raised a number of points and I was keen to ensure we took the time to respond fully.
      Thanks again for commenting.
      Janet

  6. Kenneth MacArthur

    Why was the decision made not to include unique identifiers like NI number, UTR or driving license number in the Matching Data Set - at least optionally if I consent to it, like gender?

    We're having to burn huge amounts of human time in the UK to solve a problem that's already been solved long ago in countries like Sweden, where everyone has a unique ID number used in interactions with all public sector and many private sector organisations. Do we need to tie our hands behind our back when solving this problem by not even utilising for matching the identifiers that we do have?

    • Rebecca Hales

      Hi Kenneth

      Services can use identifiers like the ones you've mentioned for matching purposes. When the service the person is trying to access receives the matching data set (name, address, date of birth and gender if the user has provided it), the service has the option of asking the user an additional question to help find the correct record within the service. The question is usually based on a unique identifier that's relevant to the particular service the person is trying to access.

      • TimC

        I'm pretty confused about the semantics of the term 'identity' in the context of this piece. Surely what's being confirmed here is the authority of an individual to access/control a service, rather than who the individual is - or even what a person/robot knows, which is what actually seems to be specified.

        Presumably, that's why globally unique identifiers for people have been avoided as they would allow information to leak from the individuals into government, but the objectives are still not totally clear.

        fwiw, non-repudiation and revocation look a bit limited.
