Zanzibar is Google's purpose-built authorization system. It's a centralized authorization database built to take authorization queries from high-traffic apps and return authorization decisions. An instance of Zanzibar hosts a list of permissions and responds to queries from many apps.
A Zanzibar instance is made up of a cluster of servers, a distributed SQL database, an indexing system, a snapshotting system, and a periodic jobs system.
Why did Google develop Zanzibar?
Google has many high-traffic apps, like Search, Docs, Sheets, and Gmail. Google accounts are shared between those systems, so authorization decisions (that is, what actions a Google account can take) need to be coordinated. These apps operate at huge scales, so constant inter-service communication wasn't practical. Their authorization system needed to handle billions of objects shared by billions of users and needed to return results with very low latency. Also, their system needed to handle filtering questions, like "what documents can this user see?"
In particular, their authorization system needed to be:
- Error-free. An incorrect authorization decision might let someone see a document that wasn't meant for their eyes.
- Fast. All other apps will be waiting on authorization decisions from Zanzibar. Google's target was <10ms per query.
- Highly available. Authorization must be at least as available as the apps that depend on it.
- High-throughput. Google handles billions of queries per day.
How does Zanzibar solve these problems?
Zanzibar limits both user errors and system errors. To quote one of Zanzibar’s designers, Lea Kissner, "The semantics in Zanzibar are very carefully designed to try and make it very difficult for you to shoot yourself in the foot." For a resource like a git repository, Zanzibar's API exposes who can see (or edit/delete/act upon) that repository, why they can see it, and how to stop it from being seen.
Zanzibar also limits system errors. Zanzibar is a distributed system, which means it takes time to propagate new permissions. To avoid data staleness, Zanzibar stores permissions in Google's Spanner database. Spanner provides strong consistency guarantees, so Zanzibar never applies old permissions to new content.
Speed and availability
Zanzibar uses several tricks to reduce latency. First, it uses several layers of caching. The outermost cache layer is Leopard, an indexing system built to respond quickly to authorization checks. Then, read requests are cached across the servers that store permissions. Also, calls between services inside Zanzibar are cached.
Secondly, Zanzibar replicates data to move it closer to its physical access point. This system works like a CDN—Google maintains many instances of Zanzibar throughout the world.
On top of that, Zanzibar relies on some hand-tuning. In any authorization policy, some common permissions are used far more often than others. Zanzibar's team hand-tunes these hot spots, for instance by enabling cache prefetching.
With Zanzibar's replication and caching, it can store trillions of access control rules and handle millions of requests per second.
What does Zanzibar do well?
Zanzibar is a centralized source of authorization decisions. That can be a useful approach for two reasons. First, Zanzibar is a single source of truth. Each of your services can call Zanzibar and get a "yes" or "no" answer in response, and those answers are consistent between services. Second, each of those services calls the same API, which makes it easier to use Zanzibar across many services.
Zanzibar also supports reverse indexing (also known as data filtering). This means that after assigning a user many individual permissions, you can also ask, "what resources does this user have access to?" This is a common authorization request (e.g., for list endpoints). It's also useful for maintaining and debugging access controls.
What doesn't Zanzibar do?
Zanzibar does not solve for enforcement, it only provides the "decision" part of authorization. You must determine where and when to call Zanzibar in your application. And when your application calls Zanzibar and gets an "access denied" decision, it's up to you to interpret that response and decide what should happen. There's no is_allowed?() call in a Zanzibar-like system.
Also, Zanzibar provides few abstractions to work with. Zanzibar's authorization logic is a flat list of access controls. You can define relationships between users and resources, but you can't use properties of resources (like default roles) to make authorization decisions. It’s up to you to work out how to represent whatever authorization model you may have as a set of relationships. Google's engineers recommend that you use a policy engine alongside Zanzibar to do the things that Zanzibar doesn't.
Finally, Zanzibar is a major technical investment. Building your own Zanzibar takes at least a year of effort from a dedicated team. Airbnb's Himeji (a Zanzibar-alike) took more than a year of engineering work from a dedicated team. Using Zanzibar also takes engineering effort. At Google, Zanzibar is supported by a full-time team of engineers, plus several engineers from each service that uses Zanzibar. Most apps that use Zanzibar-like systems require hand-tuning to avoid hot spots.
Looking for an authorization service?
Engineering teams are increasingly adopting services for core infrastructure components, and this applies to authorization too. There are a number of authorization-as-a-service options available to those who want something like what Google made available to its internal engineers via Zanzibar.
Oso Cloud is a managed authorization service that provides the benefits of Zanzibar while filling in a number of Zanzibar’s gaps. You use Oso Cloud to provide fine-grained access to resources in your app, to define deep permission hierarchies, and to share access control logic between multiple services in your backend.
Oso is built for application authorization. It comes with built-in primitives for patterns like RBAC and ReBAC, and it is extensible for other use cases like attribute-based access control (ABAC). It is built using a best practices data model that makes authorization requests fast and ensures that you don’t need to make schema changes to make authorization changes. It provides APIs for enforcement and data filtering. Oso Cloud is also deployed globally for high availability and low-latency.
Fun fact: Abhishek Parmar, one of the co-creators of Google Zanzibar and Airbnb Himeji, is a technical advisor to the Oso engineering team.
Oso Cloud is free to get started – try it out. If you’d like to learn more about Oso Cloud or ask questions about authorization more broadly, set up a 1x1 with an Oso engineer.