Benchmarking OpenFGA


Introduction

A client of Tarka Labs reached out for help in solving an ever-growing problem in their legacy codebase. Their system keeps track of groups, users, other resources, and the relationships between them. It also needs to quickly handle lookups to determine which groups or users have access to a particular resource. The existing implementation was struggling to meet service-level agreements (SLAs), and it was ill-suited for future requirements such as role-based access control (RBAC).

As consultants, we strive to provide solutions that scale to meet and exceed our client’s projected needs, and to deliver them in the shortest possible time. While researching solutions for this problem, we came across Google’s Zanzibar paper. It stood out as a strong candidate because it supports a variety of permission models and achieves high throughput with low latency. It has also been battle-tested at Google, where it handles authorization for “…hundreds of its services and products including; YouTube, Drive, Calendar, Cloud and Maps.”

Seeing the merit in Zanzibar, our client agreed to an evaluation of it. We needed to ensure that it could satisfy their SLAs, work on all three major operating systems (Linux, macOS, and Windows), and handle millions of relationships while responding to any lookup request in under one second.

In researching open-source implementations of Zanzibar, we found OpenFGA, Ory Keto, and Permify. Each of these appeared to meet our needs, but we decided to evaluate OpenFGA first because its documentation seemed well written and its team was responsive to questions.

Evaluation

Both OpenFGA and its CLI provide installation methods that support Linux, macOS, and Windows. Their respective repository release pages provide binaries for Linux and Windows, and both can be installed on all three operating systems via go install.

Testing the installation steps for each operating system was straightforward. What was more difficult was determining the throughput of each major operation in OpenFGA, as well as determining whether it could consistently respond to complex lookup requests in under one second. For this, we needed to develop simple, repeatable benchmarks for each major operation: Relationship Creation, Deletion, Direct Lookup, and Indirect (Transitive) Lookup.

Drawing on past experience, we decided to use Docker, Maven, the Java Microbenchmark Harness (JMH), and the OpenFGA Java SDK to develop our benchmarks. A major benefit of this approach is that we would gain valuable experience with the SDK, which would later speed up development if OpenFGA were chosen as a solution to our client’s problem.

Relationship Modeling & Validation

With our tools decided upon, the next step was to create a version of our client’s authorization model within OpenFGA. An authorization model can be defined either through Zanzibar-compliant JSON or OpenFGA’s own DSL. After some experimentation, we found the JSON syntax harder to write and reason about than the DSL. For this reason, we decided to write our models using the DSL.

As a whole, OpenFGA’s documentation is quite good in comparison to other projects that we have worked with. Their Modeling Guides are well-written and we wrote our initial test model, shown below, without much trouble. The most difficult part of writing the model was figuring out how indirect (transitive) relationships work; the documentation around them could use additional examples and explanation.


model
  schema 1.1

type user

type group
  relations
    # Direct subgroups of this group.
    define child: [group]

    # Direct subgroups of this group, and all subgroups of those subgroups.
    define descendant: child or child from child

    # Direct users of this group, and all users of its subgroups.
    define member: [user] or member from child

type report
  relations
    # Users and groups that can access this report.
    define accessor: [user, group, group#member, group#descendant]
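
To make the transitive member relation concrete, here is a minimal Python sketch of how "member: [user] or member from child" could be resolved over a set of relationship tuples. It is an in-memory illustration only and says nothing about how OpenFGA evaluates the model internally; the group names, user names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Relationship tuples as (object, relation, user). All names are hypothetical.
TUPLES = [
    ("group:engineering", "child", "group:backend"),
    ("group:backend", "child", "group:database"),
    ("group:engineering", "member", "user:anne"),
    ("group:database", "member", "user:bob"),
]


def build_index(tuples):
    """Index tuples by (object, relation) so lookups are a dictionary access."""
    index = defaultdict(set)
    for obj, relation, user in tuples:
        index[(obj, relation)].add(user)
    return index


def is_member(index, group, user, seen=None):
    """Evaluate `define member: [user] or member from child` for one group."""
    seen = set() if seen is None else seen
    if group in seen:  # guard against cycles in the child graph
        return False
    seen.add(group)
    if user in index[(group, "member")]:  # the direct [user] assignment
        return True
    # `member from child`: the user is a member if they are a member
    # of any direct child group, evaluated recursively.
    return any(
        is_member(index, child, user, seen)
        for child in index[(group, "child")]
    )


index = build_index(TUPLES)
print(is_member(index, "group:engineering", "user:bob"))  # True, via two child hops
print(is_member(index, "group:database", "user:anne"))    # False, membership does not flow upward
```

The recursion mirrors the way the relation refers to itself in the DSL: membership flows from subgroups up to their parents, never the other way around.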

Regardless of what you believe about your model, you must test and validate your assumptions. OpenFGA’s CLI is invaluable for this, and we highly recommend using it for model validation. For the test model above, we approached validation by creating a diagram of test scenarios and the commands required to set up and execute them. This approach worked well for our small model, and it would be interesting to try applying it to a larger, more complex model.

A visual representation of the OpenFGA model, including notes on how to manually test it.

Performance Benchmarking

We needed to benchmark each major operation (Relationship Creation, Deletion, Direct Lookup, and Indirect (Transitive) Lookup) to estimate its throughput and to ensure that the system could handle millions of relationships while responding to any lookup request in under one second.

A direct lookup is used to check if a group or user has a direct relationship with a resource, and an indirect (transitive) lookup is used to check if a group or user has a relationship with a resource through their relationship with a group that they belong to.
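
The distinction can be sketched in a few lines of Python. The example below is a simplified illustration with hypothetical data: group membership is assumed to be already flattened into a dictionary, and the two functions are split apart only to show the difference. In OpenFGA itself, both cases are answered by the same Check call.

```python
# Hypothetical data: subjects assigned to a report, and flattened group membership.
REPORT_ACCESSORS = {
    "report:q3": {"user:anne", "group:finance#member"},
}
GROUP_MEMBERS = {
    "group:finance": {"user:bob"},
}


def direct_lookup(report, user):
    """True when the user is assigned to the report directly."""
    return user in REPORT_ACCESSORS.get(report, set())


def indirect_lookup(report, user):
    """True when the user reaches the report through a group they belong to."""
    for subject in REPORT_ACCESSORS.get(report, set()):
        # A subject such as "group:finance#member" means "all members of group:finance".
        group, separator, relation = subject.partition("#")
        if separator and relation == "member":
            if user in GROUP_MEMBERS.get(group, set()):
                return True
    return False


print(direct_lookup("report:q3", "user:anne"))   # True: assigned directly
print(direct_lookup("report:q3", "user:bob"))    # False: no direct tuple
print(indirect_lookup("report:q3", "user:bob"))  # True: via group:finance
```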

The throughput benchmarks are straightforward and can be found here. In general, each of them does a small amount of setup and teardown with a call to the OpenFGA API in between. We used JMH’s default settings, and the results are as follows:
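
The overall shape of such a benchmark (untimed setup, one timed call, untimed teardown) can be sketched as follows. This is a rough Python analogue for illustration, not the actual benchmark code: the real suite uses JMH, where this split is usually expressed with @Setup and @TearDown methods around a @Benchmark method. The in-memory store and all function names below are hypothetical stand-ins for the OpenFGA API call.

```python
import itertools
import time


def measure_throughput(operation, setup, teardown, iterations=1000):
    """Return operations per second, timing only the operation itself."""
    elapsed = 0.0
    for _ in range(iterations):
        state = setup()                      # untimed preparation
        start = time.perf_counter()
        operation(state)                     # timed call under test
        elapsed += time.perf_counter() - start
        teardown(state)                      # untimed cleanup
    return iterations / elapsed if elapsed > 0 else float("inf")


# Hypothetical stand-in for the authorization store. A real benchmark
# would call the OpenFGA API in write_tuple and delete_tuple instead.
store = set()
counter = itertools.count()


def make_tuple():
    return ("user:u%d" % next(counter), "accessor", "report:r1")


def write_tuple(relationship):
    store.add(relationship)


def delete_tuple(relationship):
    store.discard(relationship)


ops_per_second = measure_throughput(write_tuple, make_tuple, delete_tuple)
print(f"Relationship Creation: {ops_per_second:.0f} ops/s (in-memory stand-in)")
```

Keeping setup and teardown outside the timed region means the score reflects only the operation being measured, which is the property the real benchmarks rely on.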

Benchmark                                    Mode   Count  Score     Error    Units
Relationship Creation                        thrpt  25     552.711   ±97.894  ops/s
Relationship Deletion                        thrpt  25     613.854   ±32.272  ops/s
Direct Lookup for Existing Relationships     thrpt  25     1652.408  ±24.617  ops/s
Direct Lookup for Nonexistent Relationships  thrpt  25     1685.459  ±19.937  ops/s
Indirect (Transitive) Lookup                 thrpt  25     1522.859  ±21.139  ops/s
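
To relate these throughput scores to the one-second lookup requirement, the short Python sketch below converts each score into an approximate mean time per operation. It assumes the calls were issued one after another from a single benchmark thread, so that mean latency is roughly the inverse of throughput; this is an assumption about the run, and the result is an average rather than a worst case.

```python
# Scores copied from the results above, in operations per second.
SCORES = {
    "Relationship Creation": 552.711,
    "Relationship Deletion": 613.854,
    "Direct Lookup for Existing Relationships": 1652.408,
    "Direct Lookup for Nonexistent Relationships": 1685.459,
    "Indirect (Transitive) Lookup": 1522.859,
}


def mean_latency_ms(ops_per_second):
    """Approximate mean time per operation, assuming sequential calls."""
    return 1000.0 / ops_per_second


for name, score in SCORES.items():
    print(f"{name}: {mean_latency_ms(score):.2f} ms per operation")
```

Under that assumption, even the slowest operation (Relationship Creation) averages about 1.8 ms per call, and every lookup averages under 1 ms.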

These results were more than satisfactory for our client’s needs, and they readily accepted our recommendation to use OpenFGA in developing a replacement for their legacy system. We encountered no further difficulties, save for the few mentioned in the Miscellaneous section below, in writing the new system. It also exceeded expectations when tested in a production-equivalent environment.

Miscellaneous

API Rate Limits

While updating the test suite from version 0.3.x to 0.8.x of the OpenFGA Java SDK, we ran into a seemingly undocumented change to the way that writes are handled in the OpenFgaClient. The client now enforces the documented limit, under which the “…Write API allows you to send up to 100 unique tuples in the request.” This limit was previously disregarded, and we had been sending up to a thousand tuples in a single request. The change caused issues in our helper functions, which were easily resolved by updating our calls to follow the documented limit.
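
The fix amounts to splitting a large list of tuples into batches that respect the limit. A minimal sketch of that idea in Python follows; the chunked helper and the sample tuples are hypothetical, and the actual SDK write call that each batch would be passed to is deliberately left out.

```python
WRITE_LIMIT = 100  # documented maximum number of tuples per Write request


def chunked(items, size=WRITE_LIMIT):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical input: one thousand tuples that used to go out in one request.
tuples = [("user:u%d" % i, "accessor", "report:r1") for i in range(1000)]

batches = list(chunked(tuples))
for batch in batches:
    # Each batch would be sent as its own write request here.
    assert len(batch) <= WRITE_LIMIT

print(len(batches))  # 10
```

Note that splitting one request into several gives up the all-or-nothing behavior of a single write, so a failure partway through leaves earlier batches applied.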

DSL Ingestion

The OpenFGA API does not support ingestion of its own DSL, so you cannot use the DSL when creating an authorization model via the API. Instead, you can use the OpenFGA CLI to convert your model from the DSL into a Zanzibar-compliant JSON file with fga model transform, and then send the generated JSON to the API. We found this to be an odd design choice, as OpenFGA should be able to detect the input format automatically and convert the DSL to JSON internally before ingestion.