Should You Respect 3rd-Party Permissions or Sync to Your Own System? The RAG Chatbot Dilemma

As more companies build RAG-based LLM applications—like chatbots that answer questions over a user's Google Drive, Notion workspace, or Jira instance—one deceptively simple question keeps surfacing.

“How do we keep this thing from leaking data?”

This question came up in a recent engineering discussion, and it's one we hear more and more.

Let’s walk through the architecture patterns to examine the trade-offs you’ll need to consider if you’re building secure AI over third-party data.

The Problem: Filtering Retrieved Data Based on Permissions

Imagine you're building a chatbot that can summarize and answer questions over a user’s Google Drive. You embed the content, store the embeddings in a vector database, and retrieve similar documents with a similarity search. Alice uses your chatbot to ask a question:

“How does the company determine raises?”

Now you need to make sure the documents the LLM sees are ones Alice is allowed to access. You want to show her the annual review process, how company and individual performance affect raises, and the evaluation criteria that apply to her role. You definitely don’t want to show her everyone’s raises for the last three years. But your chatbot can see all of that information.

Do you:

  1. Use Google’s APIs to filter results at query time, iteratively?
  2. Sync Google Drive permissions into the same place where the similarity search is happening (e.g., a vector db)?
  3. Try to mirror Google’s entire permissions logic in your own policy language (e.g., a Polar policy) and sync only the relevant metadata (e.g., File:1 belongs to Folder:2, which belongs to Drive:3)?

Let’s consider the options.

Option 1: Query-Time Filtering Using the Third-Party API

This approach is appealing because it avoids data syncing. First you retrieve similar files from your vector DB, then you call Google’s files.list endpoint to get all files the user is authorized to view, and finally you intersect the two.
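Here’s a minimal sketch of that intersection, assuming you have the asking user’s OAuth credential. The Drive files.list call is real; the vector_db client, its similarity_search method, and the file_id metadata field are illustrative names, not any specific vendor’s API:

```python
from googleapiclient.discovery import build

def authorized_results(query_embedding, user_creds, vector_db, top_k=20):
    # 1. Retrieve candidate documents by similarity, ignoring permissions.
    candidates = vector_db.similarity_search(query_embedding, top_k=top_k)

    # 2. Page through files.list to collect every file ID the user can view.
    #    Each page is a round-trip; a large Drive means many of them, which
    #    is exactly where the latency problem comes from.
    drive = build("drive", "v3", credentials=user_creds)
    visible_ids, page_token = set(), None
    while True:
        resp = drive.files().list(
            fields="nextPageToken, files(id)",
            pageToken=page_token,
        ).execute()
        visible_ids.update(f["id"] for f in resp.get("files", []))
        page_token = resp.get("nextPageToken")
        if page_token is None:
            break

    # 3. Intersect: keep only the candidates the user is allowed to read.
    return [doc for doc in candidates if doc.metadata["file_id"] in visible_ids]
```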

Pros:

  • No syncing or duplication
  • Simple to implement (if the API is robust)

Cons:

  • Catastrophic latency at scale (slow for thousands of files)
  • No “one-shot” queries (you're stuck iterating)
  • Only viable if the permissions API exists and is fast, centralized, and expressive (rare)

This works okay for prototypes or low-scale use cases where high latency is acceptable (e.g., the user waits a few seconds for a chatbot reply). But if you have to authorize hundreds of vectors this way, that latency adds up. When “a few seconds” starts to get closer to “30 seconds” or “60 seconds,” your users may be less forgiving.

“A reason we don't do that approach for general authorization queries is that authorization is typically on the critical request path, and any authorization latency directly leads to request latency.” —Gabe, Oso engineer

What about extending beyond Google to other third-party stores? Not every third-party store has well-defined permissions logic and APIs the way Google Drive does. You might not even be able to get the permissions you need for your filter.

Option 2: Sync ACLs into Your Vector DB

In this approach, you sync every ACL from Google into your vector DB as part of your ETL pipeline. Then you can filter the results locally using metadata during similarity search.

This means:

  • You're syncing the actual access control list (ACL) (e.g., which users can access which documents)
  • You're putting it directly into the vector database (or wherever similarity search runs)
  • When results are retrieved, you’re filtering them locally using those ACLs

This keeps both the data and permission metadata in the same place — fast, but might not scale well for large ACLs.
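Here’s a rough sketch of both halves, the ACL sync and the filtered query. The Drive permissions.list call is real; the allowed_users metadata field, the vector_db client, and its $contains-style filter operator are illustrative, since metadata filter syntax varies by vendor:

```python
def sync_file_acl(drive, vector_db, file_id):
    # Pull the ACL entries for one file. Note that "group" and "domain"
    # entries still have to be expanded into concrete users -- the
    # "incomplete ACL" problem described below.
    acl = drive.permissions().list(
        fileId=file_id, fields="permissions(type, emailAddress)"
    ).execute()
    users = [
        p["emailAddress"]
        for p in acl.get("permissions", [])
        if p["type"] == "user"
    ]
    vector_db.update_metadata(file_id, {"allowed_users": users})

def permission_filtered_search(vector_db, query_embedding, user_email, top_k=20):
    # Filtering happens inside the similarity search itself, so there are
    # no per-file round-trips at query time.
    return vector_db.query(
        embedding=query_embedding,
        top_k=top_k,
        filter={"allowed_users": {"$contains": user_email}},
    )
```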

Pros:

  • Fast queries with no round-trips
  • All permissions logic handled locally

Cons:

  • Massive sync burden, especially for large organizations
  • Many third-party APIs don’t expose ACLs
  • ACLs are often incomplete ("group X has access", now go fetch who’s in group X)
  • Doesn’t scale well, especially when ACLs are massive (i.e., when almost everyone, but not quite everyone, can view a particular file, folder, etc.)

This is impractical for many real-world third-party integrations. Even if you could theoretically limit the ACL size (e.g., by filtering for particular sets of users), it’s still difficult to do today because:

  • Many integrations simply don't have permissions APIs (e.g. Notion — surprising, right?)
  • Many integrations with permissions APIs don't expose one that offers a centralized view of both logic + data
    • e.g. the ACL they give you might say “this group has read access” or “you have been given read access to the entire folder,” but you still need to figure out who’s in that group and what’s in that folder

So you are stuck guessing who has access to what.

Option 3: Mirror Permissions Logic in Your System

In this model, you write policies (e.g., using Oso’s Polar language) that mimic the logic of the third-party system. Then you sync only the authorization-relevant metadata—folders, groups, ownership chains, etc.—into your own system and run permission checks there.
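To make that concrete, here’s a toy mirror of Drive-style folder inheritance, using the File:1 / Folder:2 / Drive:3 chain from above. In practice this rule would be written once in a policy language like Polar and evaluated by the policy engine; plain Python stands in here just to show the shape of the synced metadata and the check:

```python
# Synced, authorization-relevant metadata only: containment edges and direct
# role grants. No document content, and no exhaustive per-user ACLs.
parents = {
    "File:1": "Folder:2",
    "Folder:2": "Drive:3",
}
readers = {
    "Drive:3": {"alice"},  # alice was granted reader on the whole drive
}

def can_read(user, resource):
    # Mirror of the inheritance rule: a reader on any ancestor container can
    # read the resource. Walk up the chain until a grant is found or it ends.
    while resource is not None:
        if user in readers.get(resource, set()):
            return True
        resource = parents.get(resource)
    return False

assert can_read("alice", "File:1")      # inherited via Folder:2 -> Drive:3
assert not can_read("bob", "File:1")    # no grant anywhere on the chain
```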

Pros:

  • Small, selective sync (just the required subset of authorization-relevant data)
  • Fast, local enforcement
  • Logic is auditable, testable, observable

Cons:

  • Hard to reverse-engineer third-party logic
  • You now own maintaining a copy of their logic
  • Third-party systems evolve and you have to keep up
  • Still doesn’t solve for all edge cases

This is what our customers are experimenting with using Oso. It's the most robust and scalable option—but also hard to build.

The Elephant in the Room: Who’s Asking?

Authorization is already a complex problem, but it becomes even harder when you don’t know who the user is. This is a common challenge in AI systems that rely on multiple upstream data sources or services, each with its own identity model: different user IDs, group structures, and naming conventions.

Technically, identity resolution falls under authentication, but it’s inseparable from authorization. You can’t enforce permissions if you don’t know whom you're authorizing. And if you’re replicating the authorization logic of a third-party system, you need a way to consistently map its identities—user IDs, group names, roles—into your own framework. Without this, building a unified authorization layer is impossible.
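One common (if unglamorous) approach is a mapping table from each upstream system’s identifiers to a single canonical user. The identifiers below are made up; the point is that every permission check has to resolve through something like this first:

```python
# Maps (source system, external ID) -> canonical user. In a real system this
# table would be populated from an IdP, email matching, or manual linking.
IDENTITY_MAP = {
    ("google", "alice@acme.com"): "user:alice",
    ("jira", "712020:abc-123"): "user:alice",
    ("notion", "9f2e-uuid"): "user:alice",
}

def canonical_user(source, external_id):
    user = IDENTITY_MAP.get((source, external_id))
    if user is None:
        # Fail closed: an identity we can't resolve gets no access at all.
        raise PermissionError(f"unknown {source} identity {external_id!r}")
    return user
```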

So What Should You Do?

The truth is, there’s no one-size-fits-all answer. But here’s a framework to guide your decision:

  • Can your users tolerate high latency? If yes, consider Option 1 (query-time API filtering).
  • Do you have full access to clean, complete ACLs? If yes, consider Option 2 (sync ACLs into the vector DB).
  • Do you need full control and scalability? If yes, consider Option 3 (replicate logic in a policy engine).
  • Do you care about future write/edit actions? If yes, consider Option 3, and start now.

What Comes Next?

Authorization for AI apps isn’t just about access control. It’s about getting the right answer, grounded in the right context, for the right user. As interest in third-party RAG integrations grows, we’re seeing teams explore different paths to that goal. Some are experimenting with rebuilding and centralizing permission logic. Others are pushing for upstream services to expose clean authorization APIs. It’s still early days, and the right approach may vary by stack, team, or use case.

And platforms like Perplexity are already talking about the next step: Actions. 

When LLMs can not only read from your docs, but also act on what they read, getting the right answer is only part of the story. You also have to be sure your LLM does the right thing.

At Oso, we’re thinking hard about what authorization should look like in this next wave of intelligent apps. Have thoughts? Come join the conversation; we’d love to hear how you’re approaching it.

About the author

Hazal Mestci

Developer Experience Engineer
