Building Zanzibar from Scratch

Sam Scott

Although the paper was published back in 2019, there has been a sudden surge of interest in Google Zanzibar, the system Google uses to handle authorization. Hardly the most exciting topic in the world, right? But in the last few months, engineering teams at Airbnb and Carta have developed their own internal versions of Zanzibar.

Where is all this recent interest coming from? I can only speculate, but it seems more people are realizing that authorization is the next piece of software to be unbundled.

Lea Kissner, one of the original authors of Zanzibar, recently tweeted about the "reverse-index" property of Zanzibar:

Put simply, reverse-indexable means that instead of only being able to answer "can this user access resource X?", you can also ask "what can this user access?" or "who can access this file?".

That's pretty damn important if you ever need to act on a list of data! That gives you the ability to list all the objects a user can view on their homepage, or implement search over a set of protected resources.
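To make the distinction concrete, here is a minimal in-memory sketch. It is not how Zanzibar stores anything; the grants data and the function names are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical access grants: (user, resource) pairs.
grants = [
    ("alice", "doc:readme"),
    ("alice", "doc:roadmap"),
    ("bob", "doc:readme"),
]

# Forward index: user -> resources. Reverse index: resource -> users.
by_user = defaultdict(set)
by_resource = defaultdict(set)
for user, resource in grants:
    by_user[user].add(resource)
    by_resource[resource].add(user)


def can_access(user: str, resource: str) -> bool:
    """The classic point check: can this user access resource X?"""
    return resource in by_user.get(user, set())


def resources_for(user: str) -> set:
    """What can this user access?"""
    return set(by_user.get(user, set()))


def users_for(resource: str) -> set:
    """Who can access this resource?"""
    return set(by_resource.get(resource, set()))
```

The point check only needs one lookup, while the two listing functions are what you would reach for to render a homepage or filter search results.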

In this post, I'm going to reimplement Zanzibar from scratch.

(Kind of. I'm not doing a Sagan "If you want to make a pie from scratch, you must first create the universe" type of thing here. I'll keep it to the most relevant parts.)

I'll use regular tools that the average engineer would have access to. Yes, this includes our product, Oso. It's open-source, after all!

What are Relationships?

For this blog post, I'm going to use my favorite fake website: GitClub. It's an imaginary GitHub/GitLab clone.

Let's say we're adding the ability for users to close issues. Here's our hypothetical route handler for that:

@app.route("/issues/<int:id>/close", methods=["POST"])
def close_issue(id: int):
    issue = g.session.query(Issue).get_or_404(id)
    issue.closed = True
    g.session.commit()

This completely oversimplified route is great! Except right now we're letting anybody close issues. That's a little too trusting of people on the internet.

Instead let's make sure that whoever reported the issue can close it again. After all, we don't want to rob them of the joy of saying "never mind, I fixed it" and then vanishing into thin air.

What does our data model look like?

For issues:

from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Issue(Base):
    __tablename__ = "issues"

    id = Column(Integer, primary_key=True)
    title = Column(String)
    body = Column(Text)
    closed = Column(Boolean, default=False)

    # Each issue belongs to a single repository
    repository_id = Column(Integer, ForeignKey("repositories.id"))
    repository = relationship("Repository", backref="issues", lazy=True)

    # The user who reported the issue
    reporter_id = Column(Integer, ForeignKey("users.id"))
    reporter = relationship("User", backref="issues_created", lazy=True)

I'm using SQLAlchemy partly out of convenience, and partly because I want to use the Oso SQLAlchemy integration later to handle some of the query building.

I can check whether this current user was the issue reporter:

@app.route("/issues/<int:id>/close", methods=["POST"])
def close_issue(id: int):
    issue = g.session.query(Issue).get_or_404(id)
    if g.current_user != issue.reporter:
        raise Forbidden
    issue.closed = True
    g.session.commit()

And voila! I've implemented relationship-based access control, or "ReBAC" if you'd prefer to save a few syllables.

Sufficiently underwhelmed? That's reasonable.

That's really all there is to relationship-based access control. It's in the name: "relationships". What are we using? Databases with one-to-many relationships. It literally says "relationship" right there in the code: reporter = relationship("User", ...).

This implementation is still incomplete and would be a hassle to use in a production system. As it currently stands, only issue reporters can close issues. Think of the poor open source maintainers doomed to wait on users to close issues for them! Clearly, we would like repository maintainers to be able to close issues too.

In the GitClub app, you have repository maintainers, contributors, and guests. These are all examples of repository roles. Users can also have roles in an organization: either admin or member. We also have some additional logic, like:

  • If you're an organization admin, you're automatically a repository maintainer for any repository in that organization.
  • Everyone is a repository guest if the repository is public.

All of this is role-based access control. But we can think of roles as a kind of relationship: a user can have the "contributor" relationship with a repository.

Let's put this all together in terms of relationships. If:

  • A repository is the parent of the issue, and
  • An organization is the parent of that repository, and
  • The user is an admin of that organization

Then the user is allowed to close the issue.

@app.route("/issues/<int:id>/close", methods=["POST"])
def close_issue(id: int):
    issue = g.session.query(Issue).get_or_404(id)
    if g.current_user == issue.reporter:
        pass
    elif g.current_user in issue.repository.organization.users_by_role("admin"):
        pass
    else:
        raise Forbidden
    issue.closed = True
    g.session.commit()

We're getting a little closer, but we've still got a long list of cases to handle:

  • Users who have been directly assigned as a maintainer of the repository
  • Users who belong to a team, where team members are maintainers of the repository
  • And plenty of other cases.

(Admittedly, I'm intentionally structuring my code in a very naive way for the sake of exposition. In practice, we've probably already abstracted the roles logic away into some other method.)

Why Zanzibar? Why ReBAC?

We've already written extensively about roles and relationships (RBAC and ReBAC if you prefer). One of the benefits of thinking in these terms is that it provides a mental model for how to structure authorization. It also provides an abstraction which can be turned into tooling or frameworks for implementing authorization systems.

Zanzibar is an example of one of those systems.

In Google Zanzibar specifically, there were two feats of pure engineering. The first was the authorization model: approaching access control in terms of relationships, which covers the data model, API, and configuration interface.

The second feat was how Google was able to scale such a system to a mind-blowing size. We're talking hundreds of terabytes of data stored, and ten million client requests per second. Making this work required serious work around the data infrastructure, indexing, and the use of "zookies" for consistency.

I'm going to focus on the model part of Zanzibar. In my opinion, that's the most applicable part for most people. While the scaling issues that Google solved with Zanzibar are super impressive, the modeling part is relevant whether you're a two-person startup or a Fortune 500 company.

Zanzibar's Data Model

Let's start with the data model!

The general idea is to describe all relationships in a single data model. Once we've done this, it's easier to implement abstract logic on top of it, because we have it in a consistent format to manipulate it. The split between authorization logic and data is a recurrent one in authorization!

In Zanzibar, relationships are described as relation tuples, which take the form:

<tuple> ::= <object>'#'<relation>'@'<user>
<object> ::= <namespace>':'<object_id>
<user> ::= <user_id> | <userset>
<userset> ::= <object>'#'<relation>

This definition isn't the easiest to follow, so let's break it down a bit.

An example tuple would be issue:412#reporter@alice. In this, the object is issue:412. That is, issue number 412.

The relation is "reporter".

And the user is alice.

In sum, this tuple represents that Alice is the reporter of issue 412. The part that gets a little awkward with this syntax is that the user field can also be a "userset". This is a set of users, described by all users who have a certain relation to an object.

For example, team:eng#member would represent the set of all users who are members of the eng team. Using this, it's possible to write repo:acme#maintainer@team:eng#member to say "all members of the eng team are maintainers of the Acme repository".
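As a rough illustration of the grammar above, here is a small parser sketch. The ParsedTuple and Subject names are my own, not from the paper, and it assumes that IDs never contain the '#', '@', or ':' separator characters.

```python
from typing import NamedTuple, Optional


class Subject(NamedTuple):
    # Either a plain user ID (namespace and relation are None) or a userset.
    namespace: Optional[str]
    id: str
    relation: Optional[str]


class ParsedTuple(NamedTuple):
    namespace: str
    object_id: str
    relation: str
    user: Subject


def parse_tuple(text: str) -> ParsedTuple:
    """Parse '<namespace>:<object_id>#<relation>@<user>'."""
    left, user_text = text.split("@", 1)
    obj, relation = left.split("#", 1)
    namespace, object_id = obj.split(":", 1)
    if "#" in user_text:
        # <userset> ::= <object>'#'<relation>
        user_obj, user_relation = user_text.split("#", 1)
        user_namespace, user_id = user_obj.split(":", 1)
        user = Subject(user_namespace, user_id, user_relation)
    else:
        # <user_id>
        user = Subject(None, user_text, None)
    return ParsedTuple(namespace, object_id, relation, user)
```

Parsing repo:acme#maintainer@team:eng#member this way gives an object of repo:acme, the relation "maintainer", and a userset subject made up of team:eng and "member".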

Note that there's a bit of a gap here by trying to express everything in terms of users. There's no way to represent "the acme repo is the parent of issue 412." So the Zanzibar paper hacks this in by writing the tuple as issue:412#parent@repo:acme#... (with a literal "..." in the relation position).

The problem here is that user has to be either a user ID, or something that represents a set of users. But we have a relationship that is purely resource-to-resource.

To be honest, I don't know whether this is a flaw in the design of the system, a representational issue in the paper, or something else.

Comparison to other models

This data model is complex! I'm on about iteration #312 of getting my head completely around it. Here are a few comparisons that might help:

Graphs

Graph view of entity relationships

In a very similar model, we can think about our relationships as a graph. The nodes are our users and resources, and the edges are the property that relates them.

To quote the paper directly: "Group membership can be considered as a reachability problem in a graph, where nodes represent groups and users and edges represent direct membership".

There's a subtle distinction in that quote. Nodes represent groups and users. For example, we would need separate nodes to represent: organization admins of Acme, organization members of Acme, and repository maintainers of Anvil. All of these are distinct nodes. Then we would need to layer in a bunch of logic like: the tuple issue:412#parent@repo:acme#... represents an edge connecting all groups that mention the repository acme with issue 412.

Honestly, trying to wrap my head around that part was quite mind-bending. But from a high-level view, I think the visualization is helpful!

RDF

Here's another comparison: the relation tuples here actually look and behave pretty similarly to triples from the Resource Description Framework (RDF) data model:

<subject> <predicate> <object>

This would be

<Repository acme> <parent> <Issue 412>

I really like the terminology here: subject, predicate, object reads a lot easier (to me) than object#relation@user. So I'll be sticking with that notation throughout. (RDF doesn't define any semantics that make our relationship modeling easier, though, so that's where the similarities end.)

Relation tuples in SQLAlchemy

Zanzibar itself is implemented on top of Google Spanner, a SQL database. In the original paper, they state that they use a table per object namespace. For us, that would be one table each for users, teams, repos, orgs, and issues.

Here is where I'm going to take the biggest detour from the Zanzibar paper. I'm interested in going through the model here, rather than trying to talk through how Google made this system scale. I'm just going to shove all of my relation tuples into a single table in one PostgreSQL database.

Here it is. The Google Zanzibar data model in 10 lines:

class RelationTuple(Base):
    __tablename__ = "relations"

    id = Column(Integer, primary_key=True)
    subject_namespace = Column(String)
    subject_key = Column(Integer)
    subject_predicate = Column(String, nullable=True)    
    object_namespace = Column(String)
    object_key = Column(Integer)
    object_predicate = Column(String)

If we wanted to be a little more faithful to the Google spec, we would have a table per object-namespace. But whereas Zanzibar handles joins between namespaces in the Zanzibar servers, we're going to use the database to do that for us!

This is a polymorphic data store: both subject and object refer to data stored in other tables. To break it down, we have a subject (userset in the Zanzibar language), made up of a namespace, a key, and an optional predicate.

For example, ('issues', 412, None) represents issue number 412. Or ('teams', 12, "member") represents all members of team 12.

I'm writing this assuming that all other tables use an integer as their primary key. This is a pretty big assumption. In practice, we might need to convert indexes to a unique identifier like a UUID so that we can adapt to however the downstream applications index their data.

Much like subject, object is also broken into a namespace, key, and predicate. This time the predicate is required. The only data that we are storing here is the existence of a relationship, so without a predicate linking the subject and the object, the data here would be meaningless! So ('organizations', 1, 'admin') is saying that the subject(s) are an admin of organization 1.
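To see what rows in this table might look like, here is a dependency-free sketch that uses the standard library's sqlite3 in place of SQLAlchemy and PostgreSQL. The column names mirror the model above, while the keys (user 1, team 12, repository 7) and the NOT NULL constraint on object_predicate are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE relations (
        id INTEGER PRIMARY KEY,
        subject_namespace TEXT,
        subject_key INTEGER,
        subject_predicate TEXT,      -- NULL for a concrete subject
        object_namespace TEXT,
        object_key INTEGER,
        object_predicate TEXT NOT NULL
    )
    """
)

rows = [
    # user 1 is an admin of organization 1
    ("users", 1, None, "organizations", 1, "admin"),
    # members of team 12 are maintainers of repository 7
    ("teams", 12, "member", "repositories", 7, "maintainer"),
]
conn.executemany(
    "INSERT INTO relations (subject_namespace, subject_key, subject_predicate,"
    " object_namespace, object_key, object_predicate) VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Who is directly an admin of organization 1?
admins = conn.execute(
    "SELECT subject_namespace, subject_key FROM relations"
    " WHERE object_namespace = 'organizations' AND object_key = 1"
    " AND object_predicate = 'admin'"
).fetchall()
```

The first row has a concrete subject (no subject predicate), while the second has a userset as its subject.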

Okay, we have the Zanzibar data model in place!

Now we can start implementing the Zanzibar API and do something useful with it.

The Zanzibar API

The Zanzibar API has just five methods: read, write, watch, check, and expand.

Of these, read, write, and watch are methods for interacting with the data itself. Check and expand are both authorization specific.

First, let's look at read. It takes in one or many tuplesets, and returns all the relation tuples matching the tuplesets.

A tupleset just means the (predicate, object) pair.

The concrete tuples need to be handled recursively: a user is in a userset if there exists a path from the user to the (predicate, object) tuple, where an edge exists between (subjectA, predicateA, objectA) and (subjectB, predicateB, objectB) if subjectB = (predicateA, objectA).

Concretely:

# Sam is a member of the eng team
("sam", "member", "eng team")

# members of the eng team are maintainers of the acme repo
(("member", "eng team"), "maintainer", "acme repo")
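Before moving to SQL, the path definition can be sketched in plain Python as a breadth-first walk over an in-memory list of tuples. The users_in helper and the extra "dana" tuple are assumptions added for illustration.

```python
from collections import deque

# (subject, predicate, object); a subject is either a plain name or a
# (predicate, object) pair standing for a userset.
TUPLES = [
    ("sam", "member", "eng team"),
    (("member", "eng team"), "maintainer", "acme repo"),
    ("dana", "maintainer", "acme repo"),
]


def users_in(predicate, obj, tuples=TUPLES):
    """Return every concrete user reachable from the (predicate, obj) userset."""
    users = set()
    start = (predicate, obj)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for subject, tuple_predicate, tuple_object in tuples:
            if (tuple_predicate, tuple_object) != current:
                continue
            if isinstance(subject, tuple):
                # The subject is itself a userset: follow the edge.
                if subject not in seen:
                    seen.add(subject)
                    queue.append(subject)
            else:
                users.add(subject)
    return users
```

Here users_in("maintainer", "acme repo") finds dana directly and sam through the eng team userset.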

So suppose we want to find all subjects that have a relation on an object. First we need to get all the subjects that directly have the relation:

def read_one(self, object, relation=None, subject_predicate=None):
    filter = RelationTuple.object_key == object.id
    filter &= RelationTuple.object_namespace == object.__tablename__

    # filter by relation if specified
    if relation:
        filter &= RelationTuple.object_predicate == relation

    # filter by source relation (the subject's predicate) if specified
    if subject_predicate:
        filter &= RelationTuple.subject_predicate == subject_predicate
    direct_tuples = self.session.query(RelationTuple).filter(filter)

If you're not familiar with SQLAlchemy, or ORMs in general, this might look a little too magic. The SQLAlchemy ORM uses Python's dynamic class system to express SQL querying logic using Pythonic expressions.

The line filter &= RelationTuple.object_namespace == object.__tablename__ is AND-ing on a filter for relation tuples where the object_namespace column matches the table name of the concrete object.

For example, this could get us all the users who have directly been assigned as a maintainer of the acme repo.

But we also want to find any users who belong to a userset that has been assigned as maintainer of the repo:

# requires: from sqlalchemy import and_
cte = direct_tuples.cte(
    recursive=True, name=f"{object.__tablename__}_{relation}"
)
# union() returns a new CTE, so keep the result
cte = cte.union(
    self.session.query(RelationTuple).join(
        cte,
        and_(
            cte.c.subject_key == RelationTuple.object_key,
            cte.c.subject_namespace == RelationTuple.object_namespace,
            cte.c.subject_predicate == RelationTuple.object_predicate,
        ),
    )
)

Here is where we're handling the recursive nature of relation tuples. Whenever you have something that looks like a graph walk in SQL, you'll probably end up reaching for recursive common table expressions (CTEs).

In fact, the SQLite documentation has a fairly extensive list of examples for graph traversals.

We can get that in SQLAlchemy by using the built-in methods to create CTEs. The main thing to look at is where we are doing a join between the CTE and the relation tuples table. We're looking for all relation tuples where the object is the same as the subject from the existing tuples, and the object predicate matches the subject predicate.
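For readers who would rather see the raw SQL, here is a standalone sketch of the same recursive walk, run against an in-memory SQLite database using Python's sqlite3 module. It is an approximation of what the SQLAlchemy code builds, not its exact output; the sample rows and the "reachable" CTE name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relations (subject_namespace TEXT, subject_key INTEGER,"
    " subject_predicate TEXT, object_namespace TEXT, object_key INTEGER,"
    " object_predicate TEXT)"
)
conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?, ?, ?, ?)",
    [
        # user 1 is a member of team 12
        ("users", 1, None, "teams", 12, "member"),
        # members of team 12 are maintainers of repository 7
        ("teams", 12, "member", "repositories", 7, "maintainer"),
        # user 2 is directly a maintainer of repository 7
        ("users", 2, None, "repositories", 7, "maintainer"),
    ],
)

QUERY = """
WITH RECURSIVE reachable(subject_namespace, subject_key, subject_predicate) AS (
    -- base case: tuples that point directly at the target userset
    SELECT subject_namespace, subject_key, subject_predicate
    FROM relations
    WHERE object_namespace = ? AND object_key = ? AND object_predicate = ?
    UNION
    -- recursive step: tuples whose object is a userset we already reached
    SELECT r.subject_namespace, r.subject_key, r.subject_predicate
    FROM relations AS r
    JOIN reachable AS c
      ON c.subject_namespace = r.object_namespace
     AND c.subject_key = r.object_key
     AND c.subject_predicate = r.object_predicate
)
SELECT subject_key FROM reachable
WHERE subject_namespace = 'users' AND subject_predicate IS NULL
ORDER BY subject_key
"""

maintainers = [
    row[0] for row in conn.execute(QUERY, ("repositories", 7, "maintainer"))
]
```

Concrete subjects have a NULL subject_predicate, so they never match the join condition and the recursion stops at them.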

With this, we have successfully implemented the read API!

This is useful if you want to be able to list all the users who have some assigned permission on a resource.

Write and watch have interesting subtleties that I'm not going to cover here. I want to get to check, our main authorization API that justifies the entire approach.

The idea of check is relatively simple: you check whether a user belongs to a userset. Remember that a userset is defined as "users who have a particular relation on an object".

This is exactly what we need for authorization! I can check whether a user is the issue owner, and check whether the user is a member of the repository. Or I can even turn permissions themselves into relationships and check if a user belongs to the set of users who have permission to close the issue!

Implementing the Check API

We've already implemented the direct relationships. For example, we can check if a user is an issue owner with the read API.

But how about the roles logic from before? There we had to know that repository maintainers can close issues. And organization admins are repository maintainers. And so on.

The generic process we are undertaking is a rule-driven graph traversal. What I mean is that although we are mostly traversing a graph, there are a few hops in the graph that are expressed as logical rules as opposed to concrete edges on the graph. The recursive query we wrote for "read" was a regular graph traversal.

But a logical rule would be: all organization admins are organization members. So whenever the tuple (user, "admin", org) exists, there is implicitly (user, "member", org).

A more complex example has an intermediate hop: a user is an issue closer if they are a repository maintainer on the issue's parent. For this, if there exists a tuple (user, "maintainer", repository) and (repository, "parent", issue) then there exists an implied tuple (user, "can_close", issue).

As it happens, almost everything can be described in terms of concrete tuples or these logically implied tuples!
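Here is a small sketch of how those two example rules could derive implied tuples from concrete ones. The facts and the implied helper are hypothetical, and the rules are hard-coded rather than configurable.

```python
# Concrete tuples, in (subject, predicate, object) order.
FACTS = {
    ("alice", "admin", "org:acme"),
    ("bob", "maintainer", "repo:anvil"),
    ("repo:anvil", "parent", "issue:412"),
}


def implied(facts):
    """Apply the two example rules until no new tuples appear."""
    derived = set(facts)
    while True:
        new = set()
        for subject, predicate, obj in derived:
            # Rule 1: organization admins are organization members.
            if predicate == "admin" and obj.startswith("org:"):
                new.add((subject, "member", obj))
            # Rule 2: maintainers of an issue's parent repository can close it.
            if predicate == "maintainer":
                for parent, p, child in derived:
                    if p == "parent" and parent == obj:
                        new.add((subject, "can_close", child))
        if new <= derived:
            return derived
        derived |= new
```

Running implied(FACTS) adds (alice, "member", org:acme) and (bob, "can_close", issue:412) to the original three tuples.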

Let's come back to our original problem: check whether a user can close an issue.

We have a few rules:

  • Issue owners are issue closers
  • Repository maintainers on an issue's parent are issue closers
  • Organization admins on a repository's parent are repository maintainers

We can mostly express this purely in terms of our read queries:

# get issue owner
issue_owners = z._read_one(object=issue, relation="owner")

# repository maintainers on an issue's parent are issue closers
issue_parents = z._read_one(object=issue, relation="parent")
repository_maintainers = z._read_one(object=issue_parents, relation="maintainer")

# organization admins on a repository's parent are repository maintainers
repository_parents = z._read_one(object=issue_parents, relation="parent")
organization_admins = z._read_one(object=repository_parents, relation="admin")

# put it all together: 
users = (
    session.query(issue_owners)
    .union(
        session.query(repository_maintainers), session.query(organization_admins)
    )
    .all()
)

I had to make a quick update to the read method from earlier, so that it accepts either a concrete object or a CTE:

if isinstance(object, Base):
    filter = RelationTuple.object_key == object.id
    # AND the namespace condition on, rather than overwriting the key filter
    filter &= RelationTuple.object_namespace == object.__tablename__
    name = f"{object.__tablename__}__{relation}"
else:
    # object is a cte?
    assert isinstance(object, CTE)
    filter = RelationTuple.object_key == object.c.subject_key
    filter &= RelationTuple.object_namespace == object.c.subject_namespace
    name = f"{object.name}__{relation}"

In the Zanzibar API, read can actually accept multiple tuplesets. When I pass in a CTE as the "object", really what I'm doing is specifying multiple tuplesets. It's just that I haven't actually evaluated those concretely yet! So I'm evaluating a read query on the output of a read query, and I'm getting chained CTE calls.

I think that's pretty cool, actually. Check is just a bunch of read queries chained together.

However, in the above code we evaluated a very specific sequence of reads to make it work, and we knew what we were trying to do. How can we implement a generic check interface that knows what combination of reads to execute?

Configuring relationships

What we need to do is separate the query implementation from the logic that specifies what combinations of relationships are used to imply other relationships.

The Google Zanzibar paper achieves this through namespace configurations.

For example, the configuration for issues might look like:

namespace: "issue"

relation { name: "owner" }

relation {
  name: "can_close"
  userset_rewrite {
    union {
      child { computed_userset { relation: "owner" } }
      child { tuple_to_userset {
        tupleset { relation: "parent" }
        computed_userset {
          object: $TUPLE_USERSET_OBJECT  # parent repository
          relation: "maintainer"
        } } }
  } } }

First, we're defining a simple relation of "owner". This isn't doing much beyond data validation: we're simply saying this is a valid relation.

We fetch this with issue_owners = z._read_one(object=issue, relation="owner")

The "can_close" relation is where it gets interesting. For instance, I decided to capture "users who can close an issue" as a distinct kind of relationship. This is really a permission assignment, but you can think of that as a relationship too.

The "userset_rewrite" piece of the configuration is instructing Zanzibar how to take some existing usersets, and rewrite them to compute users who have the "can_close" relation.

This is what's happening in the two queries:

issue_parents = z._read_one(object=issue, relation="parent")
repository_maintainers = z._read_one(object=issue_parents, relation="maintainer")

First get all the issue parents (this is the tupleset), and then compute the userset of users who are maintainers of the repository.

This would recursively need to evaluate the configuration for repositories. That includes rules like "organization admins are repository maintainers", and so on.

The logic we need to rewrite that statement is a union, i.e. do it for any of the following relations. (Zanzibar also supports intersection and negation).

The first userset to rewrite is computed from those who have the "owner" relation on the same object. In other words, any user who is an "owner" of an issue, has the "can_close" relation with the issue too.

The second is for repository maintainers. These two get unioned together:

session.query(issue_owners)
  .union(
    session.query(repository_maintainers)
  )

Okay, so we can think about the configuration as giving us rules to follow on what transformations to apply to the dataset.
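As a sketch of that idea, here is a tiny in-memory evaluator that follows a Python rendering of the "can_close" configuration. The CONFIG dictionary layout, the sample tuples, and the expand helper are all assumptions for illustration; it handles only union, and it assumes the rewrite rules contain no cycles.

```python
TUPLES = {
    # (subject, predicate, object)
    ("alice", "owner", "issue:412"),
    ("repo:acme", "parent", "issue:412"),
    ("bob", "maintainer", "repo:acme"),
}

# Hypothetical Python rendering of the namespace configuration above.
CONFIG = {
    ("issue", "can_close"): {
        "union": [
            {"computed_userset": "owner"},
            {
                "tuple_to_userset": {
                    "tupleset": "parent",
                    "computed_userset": "maintainer",
                }
            },
        ]
    },
}


def direct(predicate, obj):
    """Subjects with a concrete (subject, predicate, obj) tuple."""
    return {s for s, p, o in TUPLES if p == predicate and o == obj}


def expand(predicate, obj):
    """Return the subjects holding `predicate` on `obj`, following rewrites."""
    namespace = obj.split(":", 1)[0]
    subjects = direct(predicate, obj)
    rewrite = CONFIG.get((namespace, predicate))
    if rewrite is None:
        return subjects
    for child in rewrite["union"]:
        if "computed_userset" in child:
            # Same object, different relation.
            subjects |= expand(child["computed_userset"], obj)
        else:
            # Hop through the tupleset (e.g. parent), then look up the relation.
            rule = child["tuple_to_userset"]
            for parent in direct(rule["tupleset"], obj):
                subjects |= expand(rule["computed_userset"], parent)
    return subjects
```

With this data, expand("can_close", "issue:412") unions the issue owner with the maintainers of the parent repository.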

Zanzibar with Oso

At Oso, we've been working on providing a batteries-included experience for authorization by adding patterns for common models like roles. Next up is adding a configuration interface for relationships, so this seemed like a great opportunity to explore it!

You express authorization logic in Oso using Polar, a declarative, logic-based language. You express concepts like relationships by writing them as rules.

For example, we might write a rule:

user_owns_issue(user: User, issue: Issue) if
   user = issue.owner;

By writing the user_owns_issue rule, I am declaring that user (of type User) owns issue (of type Issue), when the user matches the issue's owner field. The attribute checks are handled in the application.

If we want to make this more generic, we might write a relationship rule:

relationship(subject, predicate, object) if ...

That is to say: there exists a relationship between subject and object, with value predicate, under the following conditions.

The previous rule becomes:

relationship(user: User, "owner", issue: Issue) if 
    user = issue.owner;

Oso policies use application data for evaluating the conditions.

In my Zanzibar implementation, all this data resides inside the RelationTuple table.

I could set up my SQLAlchemy models here so that I can write Polar policies and have them compile into the full recursive SQL queries. That would let me do cool things like add additional conditions, or compute intersections, and so on. Instead, I'm going to be a little lazy here. I'm going to use the existing API method:

relationship(user: User, "owner", issue: Issue) if 
    user in Z.read("owner", issue);

In my Python code, I've registered an instance of my Zanzibar client as the constant Z. This makes it possible for me to access it from inside the policy.

The above expresses: "a user has the owner relationship with an issue, if the user is in the userset of issue owners". This may seem a little redundant. But by expressing this as a Polar rule, we can start implementing logic on top of our data:

relationship(user: User, "permission:close", issue: Issue) if 
   relationship(user, "owner", issue);

There we go! A user has "permission:close" on an issue, if they are an owner of the issue. Which is the same logic we had before.

Of course, we could do all of this already. The point of having a configuration interface is that it provides a structured way to express all this logic, without needing to know how to write the right Polar.

We can use Polar to write a simple configuration interface that mimics the Zanzibar configuration.

This allows you to define whether a relation exists:

# namespace(name)
namespace("organizations");

# relation(namespace, name)
relation("organizations", "admin");

Then, define whether a relation "implies" another relation:

# implies(namespace, predicate, implied)
implies("organizations", "admin", "member");

And even define the more complex multi-hop implications:

implies(
  "repositories",
  {
    subject_predicate: "admin",
    object_predicate: "parent",
    object_namespace: "organizations",
  },
  "maintainer"
);

This middle part is saying: if the subject is an "admin", for an organization that is a "parent" of the repository, that implies the subject is a maintainer. Alternatively, you're a repository maintainer if you're an organization admin on the repository's parent.

With this in place, we can write the generic logic that backs our relationship-based model:

# direct relation
relationship(subject, predicate, {object: object, namespace: namespace}) if
    relation(namespace, predicate) and
    subject = Z.read(predicate, object);

# computed_userset
relationship(subject, implied_predicate, object) if
    implies(object.namespace, predicate, implied_predicate) and
    relationship(subject, predicate, object);

# tuple_to_userset
relationship(subject, implied_predicate, object) if
    # compound implies definition
    implies(object.namespace, {
        object_predicate: object_predicate,
        object_namespace: object_namespace,
        subject_predicate: subject_predicate,
    }, implied_predicate) and
    # there is an intermediate tupleset for the object
    relationship(tupleset, object_predicate, object) and
    # the subject has relationship with a member of the tupleset
    tupleset_object = { object: tupleset, namespace: object_namespace} and
    relationship(subject, subject_predicate, tupleset_object);

That's quite a dense bit of logic! The key thing to notice is that this is a recursive definition, and ultimately everything recurses to that single subject = Z.read(...) call in the first rule.

What this shows is that really all we need is those three rules to implement most of Zanzibar! There's a little more work we would need to do to add some of the other operations like intersection and negation, but we have the general structure in place.
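If Polar is unfamiliar, the same three rules can be approximated in plain Python. This is a loose translation rather than what Oso does internally: the SIMPLE and COMPOUND sets stand in for the implies(...) facts, read plays the role of Z.read, and the sample tuples are invented. It reads the intermediate tupleset directly instead of recursing for it, and it assumes the implications contain no cycles.

```python
# Concrete tuples: (subject, predicate, (namespace, key)).
TUPLES = {
    ("alice", "admin", ("organizations", 1)),
    (("organizations", 1), "parent", ("repositories", 7)),
    ("bob", "maintainer", ("repositories", 7)),
}

# implies(namespace, predicate, implied), simple form.
SIMPLE = {("organizations", "admin", "member")}
# Compound form:
# (namespace, subject_predicate, object_predicate, object_namespace, implied)
COMPOUND = {
    ("repositories", "admin", "parent", "organizations", "maintainer"),
}


def read(predicate, obj):
    """Subjects with a concrete tuple for (predicate, obj)."""
    return {s for s, p, o in TUPLES if p == predicate and o == obj}


def relationship(subject, predicate, obj):
    namespace = obj[0]
    # direct relation
    if subject in read(predicate, obj):
        return True
    # computed_userset: another predicate on the same object implies this one
    for ns, source, implied in SIMPLE:
        if ns == namespace and implied == predicate:
            if relationship(subject, source, obj):
                return True
    # tuple_to_userset: hop to a related object, then check a predicate there
    for ns, subject_pred, object_pred, object_ns, implied in COMPOUND:
        if ns == namespace and implied == predicate:
            for parent in read(object_pred, obj):
                if (
                    isinstance(parent, tuple)
                    and parent[0] == object_ns
                    and relationship(subject, subject_pred, parent)
                ):
                    return True
    return False
```

For example, alice is never stored as a maintainer of repository 7, but relationship("alice", "maintainer", ("repositories", 7)) still holds because she is an admin of the parent organization.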

And with that we are finally done!

From here, all we need to do is implement the check API method itself. This uses Oso to query the policy, with the target object and relation as input. The query returns all combinations of queries that could result in a user having the relationships.

To implement the check API, we just need to check whether the user is one of those returned users.

How well does this perform?

I said at the beginning this was more about learning the model than it is about the work to scale this. But if you're like me, you're at least a little bit curious how well this performs. Are we in the realm of reasonable, or is this going to bring down production within minutes?

Let's find out!

I didn't go with a particularly scientific approach. I generated a bunch of data, threw it in a PostgreSQL database, and ran some test queries.

On average, I made each organization have 10 repositories, each repository have 200 issues, and each organization have 50 users. But these are not uniformly distributed. I tried to make it have some interesting statistics by having a power law for how many orgs each user belongs to, etc.

Cutting to the chase. For a database with about 2 million relation tuples, querying to check whether a user can close an issue takes 50 milliseconds.

That's not bad for regular Postgres!

As you scale that up, it starts to take a bit longer until I run ANALYZE, and then the time comes back down. So, without going further into optimizing PostgreSQL, I'd call that a success!

If we needed, we could probably follow the paper's lead and start implementing some caching. For example, it could be fun to see if we can turn each of those recursive CTE subqueries into materialized views, and see what impact that has.

I'll leave that as an exercise for the reader.

Zanzibar without the data model?

You made it to the end, dear reader. We have implemented Zanzibar in our application.

What did it cost? Well, refactoring our entire application to put relationships into the central data model.

What would it look like if we attempted to get the same authorization logic as Zanzibar but without the data model?

As it happens, Oso is designed to integrate with existing application data models. Moreover, there's a sqlalchemy-oso integration which means we can push all the authorization logic down into the database.

Using that, we can see what it would look like if our application leveraged the existing data model.

The main change we need to make is in how the relationship predicate is implemented from before. Here it is currently:

# direct relation
relationship(subject, predicate, {object: object, namespace: namespace}) if
    relation(namespace, predicate) and
    subject = Z.read(predicate, object);

This was where we used our Zanzibar data model to read relation tuples.

To read this straight from the application, we just need to implement relationship predicates for each kind of concrete application data. For example, there exists a "reporter" relationship between a user and an issue, if the user is equal to the issue's reporter field:

relationship(user: User, "reporter", issue: Issue) if
    user = issue.reporter;

Thanks to data filtering (soon to be available for all languages!), this will turn into a SQL join when it's evaluated.

We need to implement the other relationships similarly. Roles get a bit more complex because we're traversing a many-to-many relationship:

relationship(user: User, role, org: Organization) if
    user_role in org.user_roles and
    user_role.role = role and
    user_role.user = user;

But overall, I'm happy with how that turned out!

The logic we have in place is already handling the configuration + recursion, so there's nothing more to do. In the actual code, I made a few other changes to make my life easier, but it was a quick exercise.

What we end up with is the same model for authorization using relationships, but with data staying as it is in the application.
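To illustrate the shape of this approach outside of Polar, here is a plain-Python sketch in which relationship questions are answered from ordinary application objects rather than a tuple table. The dataclasses are simplified stand-ins for the SQLAlchemy models, and unlike the sqlalchemy-oso integration nothing here is pushed down into the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class User:
    name: str


@dataclass
class UserRole:
    user: User
    role: str


@dataclass
class Organization:
    name: str
    user_roles: List[UserRole] = field(default_factory=list)


@dataclass
class Issue:
    title: str
    reporter: Optional[User] = None


def relationship(subject, predicate, obj) -> bool:
    """Answer relationship questions straight from the application objects."""
    if isinstance(obj, Issue) and predicate == "reporter":
        # one-to-many: the issue points at its reporter
        return subject == obj.reporter
    if isinstance(obj, Organization):
        # many-to-many: walk the user_roles association
        return any(
            ur.user == subject and ur.role == predicate for ur in obj.user_roles
        )
    return False
```

Each branch corresponds to one of the relationship rules above: the issue reporter is a field lookup, and organization roles walk the association objects.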

Conclusion

We did what we set out to do! We built Zanzibar from scratch, and it clocked in at roughly 130 lines of Python and 20 lines of Oso policy code.

We built the relationship tuple data model, a model to query the data, and a configuration interface to compose queries along with the policy logic to evaluate it.

The tradeoff that Zanzibar asks of you is: conform to this data model and we'll give you an authorization model. As you saw at the end, you can achieve the same authorization model without conforming to a particular data model, and without refactoring your entire application.

So if you're wondering: do I need my own Google Zanzibar? Then it's more of an architectural question, as we've covered elsewhere. For us, the golden rule of authorization is: build authorization around your application, not the other way around. Authorization shouldn't be a reason to refactor your application and centralize all your data.

Once you reach Google scale (e.g., 10 million client queries per second), perhaps that's a time to reconsider.

And if you do find yourself considering going down that path: here's a reference implementation for you.


If you enjoyed learning about authorization or Zanzibar and want to learn more, you should check out our series Authorization Academy which walks through application authorization. Or you can join our community Slack and come talk with us and hundreds of other developers working on authorization.
