Authorization Patterns in GraphQL

Patrick O'Doherty

One of the fun parts of working as an engineer at a developer-focused company like Oso is helping customers with their authorization problems via our Slack, GitHub, Engineer 1x1s and other forums. In these conversations I often get questions about which authorization best practices customers should follow with GraphQL. My advice on building authorization in GraphQL applications usually boils down to these rules of thumb:

  1. Build your authorization logic as close to the data as possible, ideally within your GraphQL API.
  2. Custom Directives and Middleware are both neat ways to do this while keeping your authorization logic decoupled from your schema – but only if you only have a single GraphQL API.
  3. API Gateways can work if you need a single solution for multiple GraphQL APIs at once, but they can limit the types of rules you will be able to write.

In this article I’ll explore what makes GraphQL tricky (and fun!) when building authorization, as well as some of the options you can use. I will lay out a number of factors to consider, and how they interact with your architecture and other requirements of your application.

Why is authorization with GraphQL hard?

The major reason that building authorization is hard in GraphQL is that it changes the relationship between client and server in web application APIs. In traditional REST APIs, servers statically define their endpoints and responses. By contrast, GraphQL lets clients submit arbitrary queries to the server. REST endpoints typically only return content for a single resource type, e.g., Posts. This means that REST clients need to send multiple requests to get the data they need to present their UI. GraphQL clients, in contrast, can send a single request with a query document that describes exactly the data that they need and get it all back in a single response.

Consider the following SQL schema for a GitHub-like app where we store Repositories organized by their parent Organization:

-- organizations is created first so that repositories can reference it
create table organizations (
  id INTEGER PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  owner VARCHAR(255) NOT NULL
);

create table repositories (
  id INTEGER PRIMARY KEY,
  organization_id INTEGER,
  name VARCHAR(255) NOT NULL,
  owner VARCHAR(255) NOT NULL,
  FOREIGN KEY(organization_id) REFERENCES organizations(id)
);

In the REST world you would expose these via a GET /organizations endpoint that returns an index of all the organizations that a user belongs to, and a GET /organizations/<organization_id>/repositories endpoint that returns the repositories for that organization. This means that a user who belongs to four organizations would have to make five requests (one index + four repositories) just to see a list of their repositories.
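To make the request count concrete, here is a minimal sketch of that REST flow. The `get` function is a hypothetical in-memory stand-in for an HTTP client, and the user and repository names are invented for illustration; only the two endpoint paths come from the description above.

```javascript
// Hypothetical in-memory stand-in for the REST API described above.
const organizationsByUser = { alice: [1, 2, 3, 4] };
const repositoriesByOrganization = {
  1: ["api"],
  2: ["web", "docs"],
  3: [],
  4: ["infra"],
};

let requestCount = 0;

// Simulates one HTTP GET; every call counts as one network round trip.
function get(path, user) {
  requestCount += 1;
  if (path === "/organizations") {
    return organizationsByUser[user];
  }
  const match = path.match(/^\/organizations\/(\d+)\/repositories$/);
  if (match) {
    return repositoriesByOrganization[match[1]];
  }
  throw new Error(`Unknown path: ${path}`);
}

// One index request, then one request per organization.
function listAllRepositories(user) {
  const organizationIds = get("/organizations", user);
  return organizationIds.flatMap((id) =>
    get(`/organizations/${id}/repositories`, user)
  );
}

const repositories = listAllRepositories("alice");
```

With four organizations, `requestCount` ends at five: one for the index and one per organization.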

In GraphQL, however, you would expose this schema as follows:

type Organization {
  id: ID!
  name: String!
  owner: String!
  repositories: [Repository]
}

type Repository {
  id: ID!
  name: String!
  owner: String!
}

type Query {
  organizations: [Organization]
}

The important feature about GraphQL is that it exposes this schema to clients for any arbitrary queries. Using a GraphQL client you could recreate the same response with the following Query to get all Organizations and their Repositories in a single request:

query {
  organizations {
    id
    name
    owner
    repositories {
      id
      name
      owner
    }
  }
}

Securing dynamic data access is hard

This flexibility, which makes GraphQL APIs so appealing, is also the property that makes it difficult to add authorization to them. In the REST world, you can (at a minimum) authorize individual endpoints. In the bright GraphQL future, you have to find a way to generically authorize each query and mutation.

On top of the client/server behavior changes, GraphQL also introduces novel architecture options for you to use when deploying your applications. Features like federation make it possible to deploy many different GraphQL services and expose them via a single unified API to clients. As I’ll discuss later, the big issue you face building authorization in a distributed architecture is not having local access to all of the data required to make authorization decisions.

Where to perform authorization in a GraphQL architecture?

I’ll start by discussing authorization in the context of a single GraphQL service as this is the most straightforward case. Often companies will spin up multiple GraphQL services when they launch new product features and use tools like GraphQL federation and API Gateways to combine them into a single client-facing API. I’ll also talk about which aspects of authorization can be handled easily at this API Gateway level, and which are better to build into your nested GraphQL services themselves.

Note: If you’d find it helpful to brush up on any authorization fundamentals, you might pop open a tab with this article on topics like enforcement, models and architecture.

Building authorization inside a GraphQL service

One reason I recommend that you build authorization directly into your GraphQL API is that it gives you the greatest access to data to use in authorization rules. Within the GraphQL API you can access request-specific metadata, user identity, as well as the local objects – e.g. Repositories – as authorization inputs.

Before I jump into specifics, it’s first helpful to consider the parts of a GraphQL service that work together to respond to requests. This diagram shows some of the different layers in your service that your GraphQL query will pass through in a typical request from beginning to end.

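[Diagram: the layers a request passes through in a GraphQL service, from the HTTP API through middleware and resolvers to the data access layer]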

The first is the HTTP API. This is usually a thin layer on top of GraphQL that takes care of authentication and some data validation. The HTTP API passes client requests to GraphQL where the real magic happens.

Within GraphQL your request is first processed by middleware. Middleware are functions that wrap your schema to add additional behavior e.g. logging or error handling. After starting the middleware, the GraphQL server parses the user’s query to identify the resolver functions that it needs to invoke to retrieve the necessary results. Each resolver function in turn queries the database or performs the work involved to fetch data for each field in your schema. Beneath all of this is the data access layer. This is the code that talks to our database or otherwise manages the state involved in our GraphQL API. Resolvers are used to wrap access to the data access layer and to add additional business logic for how data should be processed.
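The layering can be sketched with plain functions. This is a simplified model under my own assumptions, not how any particular GraphQL server is implemented: `findOrganization`, `organizationResolver`, and `withLogging` are hypothetical names, and the `trace` array exists only to record the order in which the layers run.

```javascript
const trace = [];

// Data access layer: the only code that touches storage (an in-memory Map here).
const organizationsTable = new Map([[1, { id: 1, name: "acme" }]]);

function findOrganization(id) {
  trace.push("data access");
  return organizationsTable.get(id) ?? null;
}

// Resolver: maps a schema field onto the data access layer.
function organizationResolver(parent, args, context, info) {
  trace.push("resolver");
  return findOrganization(args.id);
}

// Middleware: wraps a resolver to add behavior before and after it runs.
function withLogging(resolve) {
  return (parent, args, context, info) => {
    trace.push("middleware: before");
    const result = resolve(parent, args, context, info);
    trace.push("middleware: after");
    return result;
  };
}

const wrappedResolver = withLogging(organizationResolver);
const result = wrappedResolver(null, { id: 1 }, { user: { username: "alice" } }, null);
```

After the call, `trace` reads middleware, resolver, data access, middleware, which mirrors the path a request takes down through the layers and back up.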

Building authorization into resolvers

One place to consider building authorization is in the GraphQL resolver layer. GraphQL resolvers are the functions that fetch data for entries in your schema in response to client queries. The server executes each resolver as it traverses the query type hierarchy to generate data to send back to the client. Building authorization directly into resolvers can be tempting because they are the way that you transform a client’s query into data. Doing so, however, does not scale well: you have to repeat the logic in every resolver function definition, which can lead to significant headaches. In addition, coupling the authorization and data access logic like this will make it hard to change and test them in isolation.

To see an example of resolver-based authorization in action, I’ll revisit my earlier GraphQL schema. In this example I have an Organization type, and a resolver which fetches Organizations by their ID. I want my resolver to only allow users to see the data for Organizations of which they are a member. This membership check is based on the presence of the user’s username in the relevant Organization’s members field. Resolvers take a context object as one of their arguments that can contain identity and role information as well as request-specific metadata. I use this object to pass the current user’s identity which the resolver checks for each individual Organization it retrieves.

type Organization {
  id: ID!
  name: String!
  members: [String]!
}

type Query {
  # fetch an Organization by ID
  organization(id: ID!): Organization
}

const resolvers = {
  Query: {
    // async so that we can await the database lookup
    async organization(parent, args, context, info) {
      // retrieve our `user` identity from the context variable
      const { user } = context;

      // authorization logic: ensure user is member of organization
      const organization = await Organization.find(args.id);
      if (organization && organization.members.includes(user.username)) {
        return organization;
      } else {
        // the user is known but not allowed to see this Organization
        // (the HTTP equivalent is 403 Forbidden)
        throw new Error('Not authorized to view this organization');
      }
    }
  }
}

You can apply your own judgement here, but generally in small applications with only basic authorization requirements, building in your resolvers is fine. For anything beyond very simple role-based access control it's not a great idea. (For the uninitiated, here’s a separate guide on role-based access control, including simple to more complex patterns.) If you find your authorization logic growing to dozens of lines in each resolver, for instance, it's probably time to consider different approaches.

Building authorization with custom directives

Fortunately, the GraphQL designers had problems like this in mind when they were dreaming up the GraphQL specification. Their solution is custom directives.

GraphQL directives are annotations that you use to add extra configuration data to your schema. Directives let you signal to a client or resolver that it should process a specific attribute differently. For example, the GraphQL specification includes a @deprecated directive that you can use to signal that a field is deprecated and likely to be removed in a future API version. You can use directives to change the behavior of many different parts of your GraphQL schema, which makes them a great candidate mechanism for implementing authorization.

To see how custom directives can be used for authorization, I’ll use them in my example Git web application. Here I have a GraphQL schema defining an Organization type with a members list of usernames and a single owner. I make two directives, @isOrganizationMember and @isOrganizationOwner, that use these fields to enforce simple role-based access control (RBAC). I use @isOrganizationMember to allow every member of the organization to query for it, and @isOrganizationOwner to allow only the owner to use inviteUser to invite additional members.

directive @isOrganizationMember on FIELD_DEFINITION
directive @isOrganizationOwner on FIELD_DEFINITION

type Organization {
  id: ID!
  name: String!
  members: [String]!
  owner: String
}

type Query {
  # allow members to retrieve their Organizations
  organization(organizationId: ID!): Organization @isOrganizationMember
}

type Mutation {
  # only owners can invite new users to an Organization
  inviteUser(organizationId: ID!, username: String!): Organization @isOrganizationOwner
}

// Import the GraphQLField type definition, the default resolver, and SchemaDirectiveVisitor, which applies our directive to annotated fields in our schema.
// Note: SchemaDirectiveVisitor is exported by older releases of @graphql-tools/utils.
import { SchemaDirectiveVisitor } from '@graphql-tools/utils';
import { GraphQLField, defaultFieldResolver } from 'graphql';

export class IsOrganizationMember extends SchemaDirectiveVisitor {
  // directive code to invoke when we encounter an @isOrganizationMember field
  visitFieldDefinition(field: GraphQLField<any, any>) {
    // fall back to the default resolver when the field does not define its own
    const { resolve = defaultFieldResolver } = field;

    field.resolve = async function (...args) {
      // resolver arguments arrive as (parent, args, context, info)
      const [, fieldArgs, ctx] = args;
      const { user } = ctx; // user identity data e.g. from JWT
      const organization = await ctx.db.Organization.find(fieldArgs.organizationId);
      // check that the user is listed as a member of the organization
      if (!organization || !organization.members.includes(user.username)) {
        throw new Error('Not authorized: user is not a member of this organization');
      }
      return resolve.apply(this, args);
    };
  }
}

export class IsOrganizationOwner extends SchemaDirectiveVisitor {
  // directive code to invoke when we encounter an @isOrganizationOwner field
  visitFieldDefinition(field: GraphQLField<any, any>) {
    const { resolve = defaultFieldResolver } = field;

    field.resolve = async function (...args) {
      const [, fieldArgs, ctx] = args;
      const { user } = ctx; // user identity data e.g. from JWT
      const organization = await ctx.db.Organization.find(fieldArgs.organizationId);
      // check the user is the organization owner
      if (!organization || organization.owner !== user.username) {
        throw new Error('Not authorized: user is not the organization owner');
      }
      return resolve.apply(this, args);
    };
  }
}

The major advantage to using custom directives compared to resolvers is that they let you reuse the same authorization building blocks throughout your schema but maintain only a single definition. Building with Custom Directives also means that you can work on your authorization separately from any business logic in resolvers. The only downside is that they can become hard to manage if you want to implement the same authorization logic for all of your types. Doing this means you have to annotate every type definition, and also that any accidental omission will allow unauthorized data access.
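One way to guard against an accidental omission is to check for unannotated fields as part of your test suite. The sketch below is only an illustration of that idea: it works on a hypothetical, simplified list of root fields rather than a real parsed schema, and the `renameOrganization` mutation is invented to stand in for a forgotten annotation.

```javascript
// Hypothetical, simplified description of root fields and the directives on each.
const rootFields = [
  { type: "Query", name: "organization", directives: ["isOrganizationMember"] },
  { type: "Mutation", name: "inviteUser", directives: ["isOrganizationOwner"] },
  // Invented field with a forgotten annotation.
  { type: "Mutation", name: "renameOrganization", directives: [] },
];

const authorizationDirectives = new Set([
  "isOrganizationMember",
  "isOrganizationOwner",
]);

// Returns "Type.field" for every field that carries no authorization directive.
function findUnprotectedFields(fields) {
  return fields
    .filter(
      (field) => !field.directives.some((name) => authorizationDirectives.has(name))
    )
    .map((field) => `${field.type}.${field.name}`);
}

const unprotected = findUnprotectedFields(rootFields);
```

Here `unprotected` contains only `Mutation.renameOrganization`. A build step could fail whenever that list is not empty.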

Authorization using middleware

Another alternative to building into resolvers, but one that moves authorization out of your schema entirely, is to use GraphQL middleware. GraphQL middleware are similar to the HTTP request middleware pattern used in web development frameworks like Express, Django, Rails, and others. Middleware are a "vertical" layering abstraction that lets you hook into different parts of the request life cycle.

In GraphQL, middleware work much like directives because they let you wrap existing schema objects and change their behavior. This is what a re-implementation of the @isOrganizationOwner and @isOrganizationMember custom directives would look like in middleware.

import { applyMiddleware } from 'graphql-middleware';
import { makeExecutableSchema } from '@graphql-tools/schema';

const organizationOwner = async (resolve, parent, args, context, info) => {
  const { user } = context;
  const { organizationId } = args;
  const organization = await context.db.Organization.find(organizationId);
  // authorization logic: check the user is the organization owner
  if (!organization || user.username !== organization.owner) {
    throw new Error('Not authorized: user is not the organization owner');
  }
  return resolve(parent, args, context, info);
}

const organizationMember = async (resolve, parent, args, context, info) => {
  const { user } = context;
  const { organizationId } = args;
  const organization = await context.db.Organization.find(organizationId);
  // authorization logic: check the user is a member of the organization
  if (!organization || !organization.members.includes(user.username)) {
    throw new Error('Not authorized: user is not a member of this organization');
  }
  return resolve(parent, args, context, info);
}

// Middleware, like annotations, can be applied to specific graph fields
// Here I want to apply attribute-based access control to the `inviteUser` on its own
const organizationOwnerMiddleware = {
  Mutation: {
    inviteUser: organizationOwner,
    // ..
  }
}

// typeDefs and resolvers are the schema and resolver map defined elsewhere in the application
const schema = makeExecutableSchema({ typeDefs, resolvers })
const schemaWithMiddleware = applyMiddleware(
  schema,
  // schema-wide Member authorization check: a function middleware runs for
  // every field, so this assumes each field receives an organizationId argument
  organizationMember,
  // mutation-specific Owner authorization
  organizationOwnerMiddleware,
)

One big advantage to middleware over directives is that they let you write rules that apply to your whole schema at once i.e. every query and resolver. This ability to wrap every query makes them a great way to build common behavior that you want to enforce regardless of any underlying schema changes.
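To show what "wrap every query" means mechanically, here is a minimal sketch that applies one check to every entry in a resolver map. It is a simplified stand-in written for illustration, not the implementation of any middleware library. `wrapAllResolvers` and `requireUser` are hypothetical names.

```javascript
// Wraps every resolver in a { TypeName: { fieldName: resolver } } map with one check.
function wrapAllResolvers(resolvers, check) {
  const wrapped = {};
  for (const [typeName, fields] of Object.entries(resolvers)) {
    wrapped[typeName] = {};
    for (const [fieldName, resolve] of Object.entries(fields)) {
      wrapped[typeName][fieldName] = (parent, args, context, info) => {
        // The check runs first and throws if the request is not allowed.
        check(context, `${typeName}.${fieldName}`);
        return resolve(parent, args, context, info);
      };
    }
  }
  return wrapped;
}

// Hypothetical schema-wide rule: every field requires an authenticated user.
function requireUser(context, fieldPath) {
  if (!context.user) {
    throw new Error(`Not authorized to access ${fieldPath}`);
  }
}

const resolvers = {
  Query: { organizations: () => ["acme"] },
  Mutation: { inviteUser: (parent, args) => `invited ${args.username}` },
};

const protectedResolvers = wrapAllResolvers(resolvers, requireUser);
```

Because the wrapper iterates over the map, a resolver added later is covered without any further annotation, as long as it is registered before the wrapper runs.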

Building authorization at the data access layer

If you want to keep your authorization logic and GraphQL completely separate, then you might consider building authorization into your data access layer. This is the layer of your application beneath GraphQL which handles fetching resource data from your storage or database technology. Implementing your authorization here is completely independent from your schema and resolver code.

The big idea behind building authorization at the data access layer is that it will be automatically reused in all of your resolvers and mutations. This lets you keep your GraphQL API code short and simple while you perform the heavy lifting elsewhere. Here I use Oso Data Filtering to authorize access to repositories based on a user’s organization membership. To begin I tell Oso about all of our application types, and their relationships to each other. Oso uses this data combined with our authorization policy to apply dynamic authorization constraints to all my database queries. This means that when the GraphQL server fetches data in resolvers it will only ever receive valid authorized results specific to the requesting user.

After I tell Oso about all of the relevant data types, I need to pair that with the rules about how I want my application to work. Oso provides a declarative language, called Polar, which I’ll use to declare the rules in my application. Here’s an example snippet describing users, organizations, and how they are related to each other via roles.

# policy entrypoint. has_permission calls the `has_role` rules
# below with others to authorize a query
allow(actor, action, resource) if
  has_permission(actor, action, resource);

# User actor type
actor User {}

# Org type with `member` and `read`
resource Organization {
  roles = ["member"];
  permissions = [
    "read",
  ];
  # members see Organization data
  "read" if "member";
}

# define how Users and Organizations are related to each other via OrganizationRoles
has_role(user: User, name: String, org: Organization) if
    role in user.organizationRoles and
    role matches { role: name, organization: org };

With this policy in place I can use Oso’s Data Filtering API to fetch authorized resources with my resolvers. This allows them to be very simple and free of any authorization logic entirely.

const resolvers = {
  Query: {
    organizations(parent, args, context, info) {
      // retrieve our `user` identity from the context variable
      const { user } = context;
      // pass this user identity to Oso,
      // use it to retrieve a list of all the Organizations that they can read
      return oso.authorizedResources(user, 'read', Organization);
    }
  }
}

Building authorization at the data access layer is a good fit for the “read” path of your application when clients are fetching data. However, it’s less ideal in the opposite “write” path when you want to enforce rules about how clients make changes to data. The big issue is lack of access to all the rich request-specific data that you might want for authorization inputs. By leaving the GraphQL layer you lose access to many of the request and user-specific variables that let you build fine-grained authorization logic. For example, if you wanted to limit writes to an object to only special circumstances – i.e. in specific mutations only – you would not have access to that knowledge at the data access layer.
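The read/write asymmetry can be seen in a small sketch that does not use Oso at all. The in-memory tables and the `listRepositories` and `renameRepository` functions below are hypothetical, chosen only to show what information a data access layer typically has to work with.

```javascript
// Hypothetical in-memory tables standing in for a real database.
const memberships = [
  { username: "alice", organizationId: 1 },
  { username: "bob", organizationId: 2 },
];
const repositories = [
  { id: 10, organizationId: 1, name: "api" },
  { id: 11, organizationId: 2, name: "web" },
];

// Read path: every query is constrained to the organizations the user belongs to.
function listRepositories(user) {
  const organizationIds = new Set(
    memberships
      .filter((membership) => membership.username === user.username)
      .map((membership) => membership.organizationId)
  );
  return repositories.filter((repo) => organizationIds.has(repo.organizationId));
}

// Write path: this layer sees the user and the row, but not which mutation
// triggered the write, so it cannot express "only allowed via mutation X".
function renameRepository(user, repositoryId, name) {
  const repo = listRepositories(user).find((r) => r.id === repositoryId);
  if (!repo) {
    throw new Error("Repository not found or not accessible");
  }
  repo.name = name;
  return repo;
}

const visibleToAlice = listRepositories({ username: "alice" });
```

The read filter works with nothing but the user's identity. The write function can reuse that filter, but any rule that depends on the calling mutation would need extra parameters passed down from the GraphQL layer.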

Building authorization in distributed GraphQL APIs

I’ve talked so far about authorization in a single GraphQL API, but what happens if you need to run several of them? Maybe your company launched a new product with its own service and you want to combine it with your existing one as a federated graph? Building authorization in a distributed microservices architecture brings an entirely different set of challenges from building in a monolith. As I’ll show in the examples below, the biggest of these is the lack of easy access to all of the data you might need when running your authorization logic.

Using an API Gateway

If you are using an API gateway to deploy a federated graph server, then you can use middleware within the gateway to consolidate your authorization logic. This central vantage point, however, comes with some limitations. API Gateways cannot easily access resource data that is only available later within each service in the request flow. This will limit the types of authorization rules you can write.

If you are building advanced authorization logic that relies on resource data like attribute-based access control, your best bet is to do so in the source GraphQL service. To do this you'll need to configure your API Gateway to pass along the user role and identity data with each request. The inner service can then use these authorization inputs to enforce the more specific policies as they would with any other request.

To see an example of this problem I’ll imagine that my Git web service is now split into separate subgraphs each for Repositories and Organizations. I want to use a single API Gateway to authorize all client access to both graphs. Each request through the API Gateway carries a JSON Web Token (JWT) containing the user ID and their role in their current organization. Here’s an example:

{
  "user_id": "3e5e3819-058a-4047-ae95-b048a5983b60",
  "username": "patrickod",
  "organization_id": 123,
  "organization_role": "ADMIN"
}

The Repository service exposes an inviteCollaborator mutation which I want to protect with the following rules:

  • users who have the ADMIN role in the organization that owns the repository can invite collaborators
  • users who have the MAINTAINER role in the repository can invite collaborators

Enforcing the first of these rules at the gateway is straightforward as we have the necessary organization role data to enforce role-based access control over the inviteCollaborator mutation.

But what happens when you want to enforce authorization that varies by each resource? In this case my second rule depends on the user’s role in the specific repository that they want to invite someone to, and not its parent organization. From this position so early in the request the API Gateway has only half of the input variables for its authorization decision. It knows the identity of the user, but nothing about the Repository object that lives in the nested service or the user’s relationship to it.
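A rough sketch of the gateway's position, assuming the token has already been verified and that the repository belongs to the organization named in the token. The `authorizeInviteCollaborator` function and its "allow" and "defer" return values are my own illustrative names, not part of any gateway product.

```javascript
// Claims as decoded from the example token; signature verification is assumed
// to have happened already.
const claims = {
  user_id: "3e5e3819-058a-4047-ae95-b048a5983b60",
  username: "patrickod",
  organization_id: 123,
  organization_role: "ADMIN",
};

// Returns "allow" when the gateway can decide alone, otherwise "defer" so the
// Repository service can apply the repository-level MAINTAINER rule.
function authorizeInviteCollaborator(tokenClaims) {
  if (tokenClaims.organization_role === "ADMIN") {
    return "allow"; // rule 1: organization admins may invite collaborators
  }
  // Rule 2 needs the user's role in the target repository, which the token lacks.
  return "defer";
}

const adminDecision = authorizeInviteCollaborator(claims);
const memberDecision = authorizeInviteCollaborator({
  ...claims,
  organization_role: "MEMBER",
});
```

The gateway can answer for the admin, but for everyone else it can only pass the request along and let the Repository service make the final call.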

In situations like this you could add more detailed resource-specific role data to your JWT, but there are practical and performance limitations to this approach. For example, if your clients pass JWTs as an HTTP request header, you might run into HTTP proxies that reject requests whose headers exceed their size limits, which are commonly around 8 KB. Another consideration is that you must keep the list of resources in this JWT up to date. If someone updates the relation data in your application, e.g. promotes someone from MEMBER to MAINTAINER in a repository, that change will only become effective once the affected user's client refreshes its JWT. For these reasons, I recommend you use JWTs to describe who/what the user is and not their specific relationship to different resources.
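A quick back-of-the-envelope check shows how fast per-resource roles inflate a token. This sketch assumes a hypothetical `repository_roles` claim with one entry per repository and measures only the base64-encoded payload. A real JWT also carries a header and a signature, so actual tokens would be larger still.

```javascript
// Builds hypothetical claims that embed one role entry per repository.
function buildClaims(repositoryCount) {
  const repositoryRoles = {};
  for (let i = 0; i < repositoryCount; i += 1) {
    repositoryRoles[`repository-${i}`] = "MAINTAINER";
  }
  return {
    username: "patrickod",
    organization_id: 123,
    repository_roles: repositoryRoles,
  };
}

// Approximates the encoded payload size. JWT payloads are base64url-encoded
// JSON, which is within a few bytes of the plain base64 length used here.
function encodedPayloadBytes(claims) {
  return Buffer.from(JSON.stringify(claims), "utf8").toString("base64").length;
}

const HEADER_LIMIT_BYTES = 8 * 1024; // the proxy header limit cited above

const smallToken = encodedPayloadBytes(buildClaims(5));
const largeToken = encodedPayloadBytes(buildClaims(500));
```

Under these assumptions a user with five repositories stays far below the limit, while a user with five hundred exceeds it before the header and signature are even counted.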

Authorization in a distributed GraphQL architecture

The API Gateway is a tough place to build features such as resource-level role-based access control (RBAC). If you wanted to build authorization into each of your federated graphs and have them work in harmony, what other options are there?

In a distributed system, keeping all of the object and relation data – i.e. who has access to what resources – in sync between services is not simple. You might try to avoid this problem altogether and use a just-in-time approach within GraphQL middleware to fetch the most up-to-date relation data from other services. This would require nested API requests during GraphQL queries and would likely have latency and other performance consequences. While you can delay fetching relation data until a request arrives, your server will need to know its authorization logic ahead of time. If you need to apply the same authorization logic in all of your services, you'll also need a method of keeping their codebases in sync with any changes.

An alternative is to run a centralized service to house all of your relation and authorization policy data. This means that instead of performing their own authorization checks, each service would ask this central authorization service for results. An example of such a service is the one described in Google's Zanzibar paper. With Zanzibar, you store all the data describing users, objects, and the relations between them in a central service. Zanzibar then uses this data to perform authorization checks for all of your other applications. Some companies such as Airbnb and Carta have built Zanzibar-based systems to solve authorization at a large scale. (The Zanzibar model has pros and cons, some of which are described in my colleague’s blog post on Best Practices for Authorization in Microservices.) But building these systems can take 12-24 months for a full team of engineers, who then need to continue to build and maintain them indefinitely.
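The core idea of storing relations centrally can be illustrated with a toy in-memory version. This is a loose sketch of relation tuples, not Zanzibar's actual data model or API. The tuple shape, the `parent` relation, and the function names are assumptions made for the example.

```javascript
// Relation tuples: "user has relation on object". Names are illustrative only.
const tuples = [
  { user: "patrickod", relation: "admin", object: "organization:123" },
  { user: "alice", relation: "maintainer", object: "repository:42" },
  { user: "organization:123", relation: "parent", object: "repository:42" },
];

// True when an exact tuple exists for this user, relation, and object.
function hasRelation(user, relation, object) {
  return tuples.some(
    (t) => t.user === user && t.relation === relation && t.object === object
  );
}

// Central check: maintainers of the repository, or admins of its parent
// organization, may invite collaborators.
function canInviteCollaborator(user, repository) {
  if (hasRelation(user, "maintainer", repository)) {
    return true;
  }
  return tuples
    .filter((t) => t.relation === "parent" && t.object === repository)
    .some((t) => hasRelation(user, "admin", t.user));
}
```

Every service would call `canInviteCollaborator` (over the network, in a real deployment) instead of keeping its own copy of the role data, which is what makes both rules from the gateway example answerable in one place.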

The Oso team and I actually build and run a solution to distributed authorization, called Oso Cloud. It builds on the Oso library, and solves many of the problems that come up with distributed architectures like how to pull disparate data together, performance, and reliability. You can try Oso Cloud for yourself. Also, feel free to set up a 1x1 with an Oso engineer for advice on how to build authorization in your distributed application.

Further resources

As I said earlier, there’s no one-size-fits-all solution to authorization in the GraphQL world. For other resources on authorization, we've written the Authorization Academy to help you understand the fundamentals. These guides aren't specific to Oso, and cover industry-standard authorization concepts.

If you’re ready to start getting your hands dirty, I’d encourage you to try Oso Cloud. If you'd like to see an example implementation, check out our walkthrough of how you can use Oso Cloud and GraphQL directives to enforce consistent authorization at the schema layer.
