Map Relational Data to Facts

Map Relational Data to Facts

💡

This guide assumes that you're familiar with:

Oso Cloud represents data as facts, which are optimized for authorization operations. Facts are flexible enough to represent any application data, but it's not always obvious how to translate from one to the other. Since most software developers are familiar with relational databases, this guide will discuss how to map common relational data structures to facts.

Facts define some trait of an entity: something it is (public, protected) or something it has (color, amount). Good fact names reflect this. They look like is_TRAIT or has_TRAIT. For instance, if the fact expresses that an entity is public, you should name the fact is_public. If it expresses that an entity has a role, you should name it has_role.

Fact arguments provide the details that uniquely define the fact. The first argument is the entity that the fact describes. We'll call this entity the subject. The fact is_public(Repository:cool_app) states that the cool_app repository (the subject) is public (the trait).

These conventions will guide how facts are constructed in this guide.

💡

We used the name of the repository here for illustration, but in practice you'll usually refer to objects by their ID.

Simple-Valued Fields

Simple-valued fields (i.e. fields that aren't foreign keys) typically describe an attribute of a subject. They can be represented by facts that take the form is_FIELD_NAME or has_FIELD_NAME.

Boolean Fields

Boolean fields express whether or not something about the subject is true (public, protected, banned, disabled). They are the simplest to convert to facts, because the only information you need is the name of the field and the identity of the subject that the field describes. They look like this:


trait_name(subject)

We saw this above in the is_public example. Suppose a repository table with the following structure.


---------- repository -----------
| | | |
| id | name | is_public |
---------------------------------
| 1 | cool_app | TRUE |
| 2 | okay_app | FALSE |
---------------------------------

The trait we're interested in is whether or not a repository is public, which is expressed by the is_public field. The row with ID 1 describes the cool_app repository, for which the value of is_public is TRUE.

If you wanted to determine whether the cool_app repository is public in SQL, you'd write:


SELECT is_public FROM repository
WHERE id=1

To express this same data as a fact using the convention above, you'd write


is_public(Repository:1)

The name of the fact is the name of the trait (is_public), and the first argument is the subject the fact describes (repository ID 1, or cool_app).

💡

Boolean fields are often named is_SOMETHING. If the field name already starts with is or has, don't prepend it again to the fact name.

Strings and Numbers

Strings and numbers express traits that can be quantified (amount, weight) or labeled (role, status). Unlike boolean facts, it is necessary to specify the value of a string or number to fully define it. For instance, your application may have users who have different roles (admin, writer, read-only). You could express this as a string-valued role field on a User table (you probably wouldn't, but let's keep things simple for now).


------------ user --------------
| |
| id | name | role |
--------------------------------
| 1 | alice | admin |
| 2 | bob | read-only |
| 3 | charlie | writer |
--------------------------------

The trait we're interested in now is the role a user has, which is expressed by the role field. You can see that alice has the admin role, bob has the read-only role, and charlie has the writer role.

If you want to fetch alice's role in SQL, you'd use this query:


SELECT role FROM user
WHERE id=1

If you want to express alice's role as a fact, you need more than the name of the trait and the identity of the subject. That approach works for boolean traits because they only have two possible values: TRUE and FALSE. If the fact exists, then the value is TRUE. If it doesn't, then the value is FALSE.

Since string and numeric traits have many possible values, you also need to record the value of the trait in the fact. The fact has_role(User:1) just says that alice has a role. It doesn't say which role alice has. You use the second argument of the fact for that information: has_role(User:alice, "admin").

💡

Roles have special meaning to Oso. If you define roles in a field called something other than role, you should still name the associated fact has_role.

You can use this general form for all traits that have a value:


trait_name(subject, trait_value)

Numbers are expressed almost identically to strings, except that the trait_value isn't quoted. You might have a login_count field to store the number of times a user has logged in.


------------------- user -------------------
| |
| id | name | role | login_count |
--------------------------------------------
| 1 | alice | admin | 100 |
| 2 | bob | read-only | 24 |
| 3 | charlie | writer | 15 |
--------------------------------------------

The user bob (ID 2) has a login count of 24, so you express the number of times bob has logged in as has_login_count(User:2, 24).

Foreign Keys

A foreign key is a reference to another table in a relational database. These references serve two major purposes:

Lookups: Lookup tables define values that are used repeatedly in other tables. For example, you may want to have a role table where each role name is defined a single time rather than repeating the names of the roles directly in your user table.

Relationships: The majority of foreign keys define relationships between two entities.

Lookup tables are closely related to string-valued fields, so we'll start with those and then discuss relationships.

Lookup tables

Rather than repeating the same values over and over for strings that have system meaning like roles and status, many teams will capture all the possible values of those strings in a separate table. That would look like this for our example of users with roles:

Moving roles into a lookup table

The role table simply defines the list of roles that are available.


------ role ------
| |
| id | role |
------------------
| 1 | admin |
| 2 | writer |
| 3 | read-only |
------------------

Now the user table looks a little different.


---------- user ------------
| |
| id | name | role_id |
----------------------------
| 1 | alice | 1 |
| 2 | bob | 3 |
----------------------------

The SQL to get alice's role is also a bit different:


SELECT role FROM role
JOIN user on user.role_id = role.id
WHERE user.id = 1

You could express alice's role in a fact by its ID like this: has_role(User:1, 1), but you probably don't want to. This is because your authorization logic will have rules that depend on a specific role value. You may have a rule that states that users with the admin role get the add_user permission. If you express roles using their names, then you can write that rule like this:


has_permission(user: User, "add_user") if
has_role(user, "admin");

If instead you use the role IDs to express the roles, then the rule would look like this:


has_permission(user: User, "add_user") if
has_role(user, 1);

The first version is much easier for a person to understand. Someone reading the second version has to know which role has which ID (and it's not uncommon for that mapping to drift between pre-prod and production environments). So, if you're using lookup tables to store strings that you use in your policy, store the strings as facts rather than their IDs. Your policy will be much easier to maintain.

Relationships

Most foreign keys express relationships between two entities. The rest of this section covers the most common:

  • One-to-Many
  • Recursive
  • Many-to-Many

One-to-many relationships

A one-to-many relationship is a relationship where one entity may have a relationship to multiple other entities of a given type. If you want to capture the owner of a repository, you can create a foreign key relationship between the repository table and the user table.

Adding an owner relation to the repository table

The owner of a repository is defined in the owner_id field of the repository table.


---------------- repository ----------------
| | | |
| id | name | is_public | owner_id |
--------------------------------------------
| 1 | cool_app | TRUE | 1 |
| 2 | okay_app | FALSE | 3 |
--------------------------------------------

This shows that alice is the owner of cool_app and charlie is the owner of okay_app.

This is an example of a one-to-many relationship. Each user will have exactly one record in the user table. But any given user may be the owner of multiple repositories, so there could be many rows in the repository table with the same owner_id.

Because the owner_id expresses a relation between the user and repository tables, we use the name has_relation for the trait that this field defines.

💡

Although we could have used the name has_owner, relations occur so often in authorization logic that they have special meaning to Oso. Whenever a fact defines a relation between two entities, name it has_relation.

This also makes it obvious from the policy that the fact represents a foreign key in the source database.

We use the same general form to define the owner fact: trait_name(subject, trait_value).


has_relation(Repository:1, "owner")

The trait_value for a relation is the name of the relation. But once again, you need more information to fully define the fact. Now you need to know which User is the owner of the repository. This SQL query returns the owner of the cool_app repository (user ID 1).


SELECT owner_id
FROM repository
WHERE id = 1

We'll call this user the object of the relation. You probably see where this is going. Enter the third argument:


has_relation(Repository:1, "owner", User:1)

💡

You may be wondering why the repository is the subject instead of the user. This is because the foreign key (owner_id) exists on the repository table, so the primary key of the row that contains the foreign key identifies a repository. For one-to-many relationships, it's natural to think of the primary key as the subject, and the foreign key as the object.

All facts that involve a subject and an object have this general form:


trait_name(subject, trait_value, object)

Because relations are so common in authorization, it's worth reinforcing that form:


has_relation(subject, relation_name, object)

Recursive (Self-Referential) Relationships

Recursive relationships are a special form of one-to-many relationship where the foreign key refers to a field in its own table (usually the primary key). Folder hierarchies are a common example. You may have folders that contain folders to arbitrary depths.


folder-1
|_folder-2
|_folder-3
|_folder-4
...

You can represent this with a folder table that has a parent_id field that refers back to the id of another row in the table.

Recursive folder table structure

The hierarchy above would be represented like this:


---------- folder -----------
| |
| id | name | parent_id |
-----------------------------
| 1 | folder-1 | NULL |
| 2 | folder-2 | 1 |
| 3 | folder-3 | 2 |
| 4 | folder-4 | 1 |
-----------------------------

You express the parent relation using has_relation facts the same way you did before. Since the parent_id field is the foreign key, the primary key of the row that contains the parent_id (that is, the child folder) is the subject of the fact. The row that the parent_id refers to is the object of the fact. So we say that folder-2 has the parent folder-1, or:


has_relation(Folder:2, "parent", Folder:1)

💡

In Polar terms, a recursive relationship is expressed by a fact where the subject and object have the same type.

Many-to-Many Relationships (JOIN Tables)

Applications frequently model many-to-many relationships. In a multitenant application, users belong to organizations. Since a given user will likely have different roles in different organizations, it doesn't make sense to associate a single role with a user. Instead, you need a way to associate a role with the combination of a user and an organization. You do this by creating a JOIN table that defines a many-to-many relationship between users and organizations (we'll switch back to the string-valued role here for simplicity).

Organizations and users

This structure allows you to assign users to multiple organizations and to give them different roles on each. Let's say you have two organizations: acme and banjo. alice is an admin in acme and a member in banjo. You represent this as follows (assume the same user data we've used throughout, where alice is user ID 1).


-- organization --
| |
| id | name |
------------------
| 1 | acme |
| 2 | banjo |
------------------
--------- organization_user ----------
| |
| organization_id | user_id | role |
--------------------------------------
| 1 | 1 | admin |
| 2 | 1 | member |
--------------------------------------

Now, if you want to get a user's role, you also need to know the organization you need it for. If you want to know which role alice has on the acme organization, you'd write this query:


SELECT role FROM organization_user
WHERE organization_id = 1
AND user_id = 1

How would you express this as a fact? This is just the general form of a fact with a subject and an object:


trait_name(subject, trait_value, object)

The trait we're interested in is the role. The subject (the entity that has the role) is alice. The value is admin. The object (the entity on which alice has the role) is the acme organization. Putting that all together, we get:


has_role(User:1, "admin", Organization:1)

Summary

You can visualize the mapping from relational data to facts like this:

Map relational data to facts

Much of the power of facts derives from their flexibility, but this flexibility can cause confusion when you map relational data to facts. In this guide, we've proposed some conventions that we've found helpful. Though by no means comprehensive, these guidelines cover the majority of relational data structures in most applications:

  • The general form of a fact should be trait_name(subject, [trait_value], [object])
    • The trait_name should take the form is_TRAIT or has_TRAIT
  • The subject is the entity that has the trait (e.g. the user that has the role)
    • You can also think of this as the row that has the field.
  • The trait_name should mirror the name of the field that it corresponds to in most cases
  • Two types of traits have special meaning
    • Roles should always be called has_role
    • Relations should always be named has_relation
  • If you use lookup tables to store the values of strings that have meaning to your policy, use the values in your fact definitions, not the IDs.
    • has_role(User:1, "admin") instead of has_role(User:1, 1)
  • The object should be the the record that is referenced by the subject
    • The parent folder
    • The organization on which the user has a role