Chapter III: Modeling Roles
In the previous chapter, we introduced a typical web application in the form of GitClub and walked through adding authorization to that app. We showed that authorization can be thought of as two parts: the decision and the enforcement.
Decision is the question of "is this user allowed to perform this action on this resource?" In many cases, that's a "yes" or "no."
Enforcement is what action we take once we’ve decided. If the decision was “deny,” do we redirect them, or do we show them a “Permission Denied” page? If the user was allowed to perform the action they asked to do, what does that look like?
In this chapter, we focus on the decision. In particular, we’ll ask the question: "who can do what in the application?" Also, we’ll need to talk about data – when we ask “who can do what,” what data structures are we examining?
To prevent this from being an entirely open-ended problem, we use authorization models to guide our implementation.
In this chapter, we’ll talk through how to use role-based access control (RBAC). We’ll cover a few different variants, focused around different kinds of business-to-business (B2B) applications. For each model, we’ll talk through:
What the authorization model is. Choosing the right authorization model depends on the desired user experience. We’ll talk about what the model represents from a high level, show examples of it in the real world, show when it makes sense to use, and describe what makes it a good fit for that use case.
How to implement the model. As we discussed in the previous chapter, authorization decisions consist of two pieces of information: data and logic. Authorization logic is the abstract set of rules that decide who can do what. For example, members of an organization are allowed to access repositories that belong to that organization. These rules are expressed over the authorization data.
We’ll additionally describe how to structure the authorization data to back our logic by including schema diagrams for how this could be stored in a relational database (e.g. PostgreSQL).
In many cases, we’ll use the GitClub example application that we introduced in the previous chapter. As a reminder: GitClub is a website for source code hosting, collaboration, and version control, similar to real-life applications GitLab and GitHub. GitClub offers a pure example of what motivates authorization in the first place – securing access to resources. In GitClub, an example of a resource is a repository. Users may or may not be able to read or make changes to a repository.
For each example, we’ll assume that the authorization is enforced in the application as previously recommended. For example, the method that handles returning a repository will check the user is allowed to read that repository, and return a permission denied otherwise:
We’re using the is_allowed interface introduced in the previous chapter.
This takes in an actor, an action and a resource and returns a True/False.
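A minimal sketch of what such an enforcement point might look like. The `PermissionDenied` exception, the `User` and `Repository` classes, and the in-memory `REPOSITORIES` store are hypothetical stand-ins, and `is_allowed` is stubbed with a placeholder rule only so that the example runs. The rest of this chapter is about what belongs inside it.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when the current user may not perform the requested action."""


@dataclass
class User:
    name: str


@dataclass
class Repository:
    name: str
    owner: str


# Hypothetical in-memory store standing in for a database.
REPOSITORIES = {"anvil": Repository(name="anvil", owner="alice")}


def is_allowed(actor: User, action: str, resource: Repository) -> bool:
    # Placeholder decision logic (an assumption): only the owner may act.
    return actor.name == resource.owner


def get_repository(current_user: User, repo_name: str) -> Repository:
    repo = REPOSITORIES[repo_name]
    # Enforcement: ask for a decision, then act on the answer.
    if not is_allowed(current_user, "read", repo):
        raise PermissionDenied(f"{current_user.name} cannot read {repo_name}")
    return repo
```

In a web framework, the handler would typically translate `PermissionDenied` into an HTTP 403 response or a redirect.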
Authorization models help apply some structure to our otherwise open-ended question: who can do what in our application. This is great from an implementation standpoint – it's helpful to have a template to apply. Also, it dramatically improves our user experience.
An authorization model doesn’t only guide our implementation. It is the mental model that we’ll teach to our users to show them what they can do inside our application. By having a clear authorization model, we can tell our users, “here are the duties you are expected to perform in our application."
Most of us are familiar with what this looks like when done poorly – giant matrices of permissions that require you to have intimate knowledge of how the application functions in order to understand what permissions you need:
Or dialogs asking you to go pester some administrator somewhere:
So how do we avoid these situations?
By using an authorization model that is appropriate to our current needs, and flexible enough to grow over time to accommodate our new needs as they arise.
Accommodating new needs over time is the hard part, and that’s the one that we’ve seen teams struggle with. We'll address that in this chapter by making sure our models build on top of each other and balance flexibility and simplicity.
Roles are a widely used approach to authorization. Also known as “role-based access control,” roles are an effective way to simplify authorization logic for both implementers and users.
A role is simply a way to group permissions so that they can be assigned to users.
When a user is assigned to a role, the user will be granted all the permissions that the role has.
A permission specifies an action that a user can take on a resource. For example, we might say that a user in an organization has permission to read repositories.
A role’s permissions are not chosen arbitrarily. In general, a role should align with who the user is, what they want to do in the application, and perhaps even their role or title within their organization.
For example, in GitClub the primary users are developers – that would be one “role” in our system. There are other roles, too: we might have IT administrators who are responsible for configuring organizations or finance users who are responsible for the billing. Each of these users has a different set of permissions they need to use our application. Each of these will be a single role in our GitClub authorization model.
Somebody assigned to the “billing” role would naturally expect to be able to configure billing for an organization - pay for new memberships, update payment information, and so on.
By using roles, we have reduced the surface area of permissions information that we need to expose to our end-users. Instead of asking users to configure a vast number of permissions in order to use the application, they can instead pick from a small number of options.
Roles are incredibly flexible - we will see several different ways to use them to achieve different functionality in this section - and can be used in virtually any business-to-business application.
At GitClub, we’re building our application, much like many others, as a multi-tenant application. This means that we have a single version of our application with many instances serving requests for all of our users and organizations. Naturally, we need to make sure that users cannot view resources for organizations they don’t belong to. So how do we accomplish this?
A simple starting point for roles is to associate each user with a single organization and assign them to a role. All access is then controlled by whatever role the user has in the organization.
But what roles do we create?
To keep the user experience as simple as possible, you should start out with a small number of roles. Too many roles can quickly become confusing and difficult to maintain.
For GitClub, we'll start with just two roles: admin and member. This is a common starting point for many B2B applications. The member role will have access to all core functionality of the application (e.g. reading and writing to repositories). The admin can do everything a member can, plus being able to invite users to the organization. In a real system, you might give the admin permission to set up payments, configure settings, or delete the organization.
First of all, we need a way to associate users with a specific organization and assign them to a role. A simple way to achieve this in the data model is to have a one-to-many relationship from organizations to users and a separate role column on the user, which stores the role name itself as a string.
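One way this schema could look, sketched with Python's built-in `sqlite3` module so that it runs anywhere. The table and column names are assumptions for illustration; a production system would more likely use PostgreSQL or similar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Each user belongs to exactly one organization and holds one role in it.
    CREATE TABLE users (
        id              INTEGER PRIMARY KEY,
        email           TEXT NOT NULL,
        organization_id INTEGER NOT NULL REFERENCES organizations(id),
        role            TEXT NOT NULL  -- role name stored as a string
    );
""")

conn.execute("INSERT INTO organizations (id, name) VALUES (1, 'Acme')")
conn.execute(
    "INSERT INTO users (email, organization_id, role) VALUES (?, ?, ?)",
    ("alice@example.com", 1, "admin"),
)

# Everything needed for a decision about this user lives on the user row.
row = conn.execute(
    "SELECT organization_id, role FROM users WHERE email = ?",
    ("alice@example.com",),
).fetchone()
```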
With this data model, the logic for checking whether a user can take an action on a resource consists of the following checks:
The role and organization come from the user data.
We will define “what permissions the role has” by storing them in a simple dictionary from role name to a string list of permissions. To start with, let’s just focus on some simple permissions around reading and writing to repositories, and adding users to organizations. For example:
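As a sketch, such a dictionary might look like the following. The exact permission strings are illustrative assumptions rather than a fixed list.

```python
# Map each role name to the permissions it grants.
ROLE_PERMISSIONS = {
    "member": [
        "read:repository",
        "write:repository",
    ],
    # An admin can do everything a member can, plus add users.
    "admin": [
        "read:repository",
        "write:repository",
        "add_user:organization",
    ],
}
```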
For our permissions we’re using the convention action:resource, meaning that if a user has that permission, they can perform “action” on a resource of type “resource.” Notice that the user has a role for a specific organization, but the permissions apply to both organizations and repositories.
We know a permission applies to the target resource if the resource belongs to the same organization as the role. Putting this all together, we get:
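A minimal sketch of the combined check, assuming plain dataclasses in place of database models. The class names, the `organization_id_of` helper, and the permission strings are hypothetical.

```python
from dataclasses import dataclass

ROLE_PERMISSIONS = {
    "member": ["read:repository", "write:repository"],
    "admin": ["read:repository", "write:repository", "add_user:organization"],
}


@dataclass
class Organization:
    id: int


@dataclass
class Repository:
    id: int
    organization_id: int


@dataclass
class User:
    id: int
    organization_id: int
    role: str


def organization_id_of(resource) -> int:
    # An organization is its own organization; other resources point to one.
    if isinstance(resource, Organization):
        return resource.id
    return resource.organization_id


def is_allowed(user: User, action: str, resource) -> bool:
    # The role only applies to resources in the user's own organization.
    if organization_id_of(resource) != user.organization_id:
        return False
    # Build the "action:resource" permission string and look it up.
    permission = f"{action}:{type(resource).__name__.lower()}"
    return permission in ROLE_PERMISSIONS.get(user.role, [])
```

For example, `is_allowed(User(1, 1, "member"), "read", Repository(10, 1))` is allowed, while the same call against a repository in another organization is denied.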
This gives a simple, maintainable model that works well when the only thing that determines access within the application is the organization-level role.
There are a few limitations with this model: each user is associated with just one organization, and we treat all resources within an organization in the same way. This isn’t enough if we need to assign people different permissions for different repositories. The next section will show how to expand our model to cover that case.
In collaborative software like GitClub, it's common for a user to have their own account for their personal projects and also join many other organizations to work with others.
Instead of having each user be associated with just one organization, we’ll instead make it possible for a single user to belong to multiple organizations. The user will additionally need a role for each organization they belong to.
To support this, we need to change our previous one-to-many relationship between organizations and users into a many-to-many relationship.
We’ll structure the data by creating a join table – much like we would for any many-to-many relationship – and including a column for the role name on the table.
This makes it explicit that belonging to an organization is the same thing as being assigned a role in an organization. It is impossible for a user to belong to an organization without having a role, and vice-versa.
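A sketch of the join table, again using `sqlite3` for convenience. The name `organization_roles` and the composite primary key are assumptions; the key is one way to guarantee that a user holds at most one role per organization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    -- Join table: membership and role assignment are the same row.
    CREATE TABLE organization_roles (
        user_id         INTEGER NOT NULL REFERENCES users(id),
        organization_id INTEGER NOT NULL REFERENCES organizations(id),
        role            TEXT NOT NULL,
        PRIMARY KEY (user_id, organization_id)
    );
""")

conn.execute("INSERT INTO users VALUES (1, 'alice@example.com')")
conn.executemany(
    "INSERT INTO organizations VALUES (?, ?)", [(1, "Acme"), (2, "Personal")]
)
# The same user holds a different role in each organization.
conn.executemany(
    "INSERT INTO organization_roles VALUES (?, ?, ?)",
    [(1, 1, "member"), (1, 2, "admin")],
)

roles = conn.execute(
    "SELECT organization_id, role FROM organization_roles "
    "WHERE user_id = ? ORDER BY organization_id",
    (1,),
).fetchall()
```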
For the logic, we’ll use the same permissions from before. But we need to update our logic to look up all of the user’s roles, and check them against the target organization:
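A sketch of the updated check. The `ORG_ROLES` list is a hypothetical in-memory stand-in for the join table, and the class names and permission strings are assumptions.

```python
from dataclasses import dataclass

ROLE_PERMISSIONS = {
    "member": ["read:repository", "write:repository"],
    "admin": ["read:repository", "write:repository", "add_user:organization"],
}


@dataclass
class User:
    id: int


@dataclass
class Organization:
    id: int


@dataclass
class Repository:
    id: int
    organization_id: int


@dataclass
class OrgRole:
    user_id: int
    organization_id: int
    role: str


# Stand-in for rows of the join table.
ORG_ROLES = [
    OrgRole(user_id=1, organization_id=1, role="member"),
    OrgRole(user_id=1, organization_id=2, role="admin"),
]


def organization_id_of(resource) -> int:
    if isinstance(resource, Organization):
        return resource.id
    return resource.organization_id


def is_allowed(user: User, action: str, resource) -> bool:
    org_id = organization_id_of(resource)
    # Keep only the user's roles that apply to the target organization.
    roles = [
        r.role
        for r in ORG_ROLES
        if r.user_id == user.id and r.organization_id == org_id
    ]
    permission = f"{action}:{type(resource).__name__.lower()}"
    return any(permission in ROLE_PERMISSIONS.get(role, []) for role in roles)
```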
The logic is starting to get a little more complex. We’re now checking for an applicable role from potentially many by filtering the roles from the database. But we can see the same basic structure is still present: get the role, check it matches the organization, and check for the correct permission.
If you need a user to have an account across multiple organizations, this model is a great improvement over the simple model we discussed earlier!
The alternative to this is a situation like Slack (at the time of writing), where you need to create a separate account and login for every Slack workspace you join. As a result, users may not even know whether they have an account or may forget what their login is, and the barrier to joining new organizations is generally higher.
This can often be a worthwhile effort for future-proofing too. Even if the application seems to clearly fit the one-user-per-organization model, in the future we may need to add cross-organizational users. By using the more flexible model, adding this new functionality at a later stage doesn’t require any migrations. The cost of this flexibility is a slightly more complex data model.
Suppose that at GitClub we’ve started to attract larger organizations to use the product. We’re starting to hear from customers that they need more customization over who can see specific repositories. For example, one customer has all of their infrastructure stored as code in a repository. It shouldn’t be possible for all team members to have write access to this repository.
Many B2B companies will encounter some variant of this feature request. The organizational roles we’ve written work well, so long as we are happy to treat access to all resources in an organization uniformly. But as the size of the organizations using our product grows, a single set of roles doesn’t adequately describe what all users can do. Our customers might want to create folders and manage access to files by assigning roles to folders, group resources by project or department, or make it possible to control access explicitly to individual resources.
To address this, we can introduce resource-specific roles.
Resource-specific roles are a more general form of organization roles. Instead of associating roles with a specific organization, we associate roles with any kind of resource. Since an organization itself is a type of resource, organization roles are a specific form of resource role.
In GitClub, repositories are another kind of resource. We’ll be introducing roles that are scoped to repositories in addition to organization roles.
The nice thing about this model is that it is structured identically to organization roles.
In fact, organizations themselves are resources. Accessing (or being a member of) an organization is gated in the same way as access to any other resource! Here we’re making that explicit – organizations are no longer special-cased. All resources, in this case organizations and repositories, have equivalent data models, like this:
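One possible shape for these equivalent data models is a role table per resource type, each with the same structure. The sketch below uses `sqlite3`; the table names, column names, and sample role values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE repositories (
        id              INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        organization_id INTEGER NOT NULL REFERENCES organizations(id)
    );
    -- One role table per resource type, all with the same shape:
    -- (user, resource, role name).
    CREATE TABLE organization_roles (
        user_id         INTEGER NOT NULL REFERENCES users(id),
        organization_id INTEGER NOT NULL REFERENCES organizations(id),
        role            TEXT NOT NULL,
        PRIMARY KEY (user_id, organization_id)
    );
    CREATE TABLE repository_roles (
        user_id       INTEGER NOT NULL REFERENCES users(id),
        repository_id INTEGER NOT NULL REFERENCES repositories(id),
        role          TEXT NOT NULL,
        PRIMARY KEY (user_id, repository_id)
    );
""")

conn.execute("INSERT INTO users VALUES (1, 'alice@example.com')")
conn.execute("INSERT INTO organizations VALUES (1, 'Acme')")
conn.execute("INSERT INTO repositories VALUES (1, 'infrastructure', 1)")
conn.execute("INSERT INTO organization_roles VALUES (1, 1, 'member')")
conn.execute("INSERT INTO repository_roles VALUES (1, 1, 'maintainer')")

repo_role = conn.execute(
    "SELECT role FROM repository_roles WHERE user_id = ? AND repository_id = ?",
    (1, 1),
).fetchone()
```

A single polymorphic table with `resource_type` and `resource_id` columns would be another option, at the cost of losing foreign-key constraints on the resource.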
As a best practice, you should minimize the number of different ways a user can get access to the same resource. Therefore, we make it so that in order to access a repository you must have a repository role. We will no longer assign organization roles any permissions that permit interacting with a repository.
For GitClub, we’ll keep our two organization roles - “admin” and “member” - and define two new roles for repositories - “maintainer” and “contributor.” Maintainers can do anything to the repository: create branches, push code, merge pull requests, and so on. Contributors are only allowed to read the code, open issues, and open pull requests.
We can automatically assign the “maintainer” role to whoever creates the repository. But it would be annoying to have to manually add everybody from an organization to a repository in order to get access. What we want is some base-level access that applies to everybody in the organization.
We can do this by having a default role: for example, anybody in the same organization as the repository has the default role of “contributor,” unless they have been explicitly assigned a role.
Meanwhile, checking access on organization-level actions would continue to check only the organization roles. This includes actions like adding new users to an organization.
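The repository and organization checks described above might be sketched as follows. The dictionaries keyed by `(user_id, resource_id)` stand in for role tables, and the action names are assumptions based on the role descriptions.

```python
from dataclasses import dataclass

ORGANIZATION_ROLE_PERMISSIONS = {
    "member": {"read"},
    "admin": {"read", "add_user"},
}
REPOSITORY_ROLE_PERMISSIONS = {
    "contributor": {"read", "open_issue", "open_pull_request"},
    "maintainer": {
        "read", "open_issue", "open_pull_request",
        "create_branch", "push", "merge_pull_request",
    },
}
DEFAULT_REPOSITORY_ROLE = "contributor"


@dataclass
class Organization:
    id: int


@dataclass
class Repository:
    id: int
    organization_id: int


# Stand-ins for role tables, keyed by (user_id, resource_id).
ORGANIZATION_ROLES = {(1, 1): "admin", (2, 1): "member"}
REPOSITORY_ROLES = {(2, 10): "maintainer"}


def repository_role(user_id: int, repo: Repository):
    # An explicitly assigned repository role takes precedence.
    explicit = REPOSITORY_ROLES.get((user_id, repo.id))
    if explicit is not None:
        return explicit
    # Otherwise, members of the owning organization get the default role.
    if (user_id, repo.organization_id) in ORGANIZATION_ROLES:
        return DEFAULT_REPOSITORY_ROLE
    return None


def is_allowed(user_id: int, action: str, resource) -> bool:
    if isinstance(resource, Repository):
        role = repository_role(user_id, resource)
        return role is not None and action in REPOSITORY_ROLE_PERMISSIONS[role]
    # Organization-level actions consult only the organization roles.
    role = ORGANIZATION_ROLES.get((user_id, resource.id))
    return role is not None and action in ORGANIZATION_ROLE_PERMISSIONS[role]
```

Note that in this sketch an organization admin who has no explicit repository role is only a contributor on that repository, which follows from gating repository access solely on repository roles.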
Resource-specific roles are an extremely powerful model that can support granular permissions sufficient for many use cases. We started with a few simple resources here - just repositories and organizations - but we can continue to apply the same logic to any other types of resources in the app for which we want to have granular access control.
This model is a great fit for anything that has some kind of resource hierarchy, whether that hierarchy consists of folders (think Google Drive or Dropbox), projects, directories, or similar.
If you reach this level of maturity, you should feel confident that you can address most needs with your role-based approach!
Here’s one further complication. One of our GitClub users has a CI/CD team whose sole job is to sequence merging commits to main and ensure that the CI jobs run. They shouldn’t be able to read any of the code, but they need to be able to trigger workflows. This doesn’t fit into any of our existing roles.
Despite all of our best efforts to support different use cases and customization, there are a small number of enterprise users who need precise control over what their users can do in their account. As a last resort, we can make it possible for customers to create their own roles and customize them. This adds database complexity and needs a UI — we don’t recommend it for most systems.
Custom roles is a model in which end-users (i.e. users of the application) can create their own roles, and assign permissions to those roles. These custom roles can then be used as usual by assigning users to the roles. Custom roles can be used alongside existing roles.
The model for custom roles is effectively the same as we have used before, except that the map from a role to a list of permissions needs to be dynamic, and we need a way to associate users with a dynamically created role.
To achieve this, we create an additional table to manage the roles. Each role must be associated with a specific resource. If we want an organization to customize its roles, we would associate the role with an organization.
A role consists of a role name and a list of permissions.
From an abstract viewpoint, this looks almost identical to the logic we’ve written in previous schemes. We have the same step of “checking what permissions the role has.” The difference from an implementation standpoint is that those permissions are now stored in a database instead of in memory.
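A sketch of how database-backed custom roles could be stored and checked, using `sqlite3`. The three tables, the `has_permission` helper, and the `ci_operator` role with its `trigger_workflow:repository` permission are all hypothetical, chosen to mirror the CI/CD example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each custom role belongs to the organization that defined it.
    CREATE TABLE roles (
        id              INTEGER PRIMARY KEY,
        organization_id INTEGER NOT NULL,
        name            TEXT NOT NULL,
        UNIQUE (organization_id, name)
    );
    -- The role-to-permissions map now lives in the database.
    CREATE TABLE role_permissions (
        role_id    INTEGER NOT NULL REFERENCES roles(id),
        permission TEXT NOT NULL,
        PRIMARY KEY (role_id, permission)
    );
    CREATE TABLE user_roles (
        user_id INTEGER NOT NULL,
        role_id INTEGER NOT NULL REFERENCES roles(id),
        PRIMARY KEY (user_id, role_id)
    );
""")

# A customer-defined role that can trigger workflows but cannot read code.
conn.execute("INSERT INTO roles (id, organization_id, name) VALUES (1, 1, 'ci_operator')")
conn.execute("INSERT INTO role_permissions VALUES (1, 'trigger_workflow:repository')")
conn.execute("INSERT INTO user_roles VALUES (42, 1)")


def has_permission(user_id: int, permission: str, organization_id: int) -> bool:
    # Same structure as before: find the user's roles in the organization,
    # then check whether any of them carries the permission.
    row = conn.execute(
        """
        SELECT 1
        FROM user_roles ur
        JOIN roles r ON r.id = ur.role_id
        JOIN role_permissions rp ON rp.role_id = r.id
        WHERE ur.user_id = ? AND r.organization_id = ? AND rp.permission = ?
        LIMIT 1
        """,
        (user_id, organization_id, permission),
    ).fetchone()
    return row is not None
```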
Allowing users to create their own roles is a great way to support a wide range of use cases without needing to anticipate all of them and define roles for those use cases.
However, be very careful with choosing to support this. To make custom roles work, you need to be on board with making your definitions of permissions public. This means documenting them, and notifying users when they change. Adding new features with new permissions or moving things around in your application might have unexpected side-effects.
In fact, looking at our two main sources of inspiration - GitHub and GitLab - neither application supports custom roles.
GitHub opened an issue on their public roadmap in 2020, but have been moving it around ever since:
Meanwhile GitLab has had a ton of discussion about the feature, ultimately opting not to support it: https://gitlab.com/gitlab-org/gitlab-foss/-/issues/12736#note_50662947
But when does it make sense to build custom roles? One possibility is for applications that need extensive configurability, like platforms-as-a-service. In this case, you might not want to impose on your users how you expect them to organize their teams and use your product.
Getting into the world of end-user configurable roles leads towards fully configurable permissions systems like the AWS Identity and Access Management (IAM) system. We’ll talk more about this in a future chapter.
We showed four variants of role-based authorization, from least complex (and least flexible) to most.
The roles we’ve described here will cover many situations you’ll see in production. We haven’t covered everything, though — here are some situations that you won’t be able to handle with roles:
In the next chapter, we’ll introduce relationship-based authorization to handle these cases.
Of course, every organization will have different needs for their authorization design. As before, we encourage you to join the community of developers in the Oso Slack! We'd love to talk about what you're working on and answer any questions you may have. If you want to kickstart the process of building authorization into your application, you can use the Oso library and learn more about it in the Oso documentation.