The MOIST Principle for GraphQL Schema Design

Sorry for the title, I had to.

I've recently seen a slight push for DRY (Don't Repeat Yourself) and the importance of sharing types as much as possible when building GraphQL schemas. This is something I hear a lot, and I've even seen some linting rules being created to enforce this principle.

I have to admit my experience and mistakes have made me a WET (Write Everything Twice) advocate when it comes to schema design. In this post, I'll explain why that's the case, and why it might be important to gain the full benefits of GraphQL.

WET on its own lacks nuance, though; there are certainly parts of the schema that should share types when it makes sense. So instead, I present to you the MOIST principle: Mitigate the Overuse of Illusory Shared Types.

APIs are Hard to Change

At first glance, sharing types as much as possible seems like a great idea in GraphQL:

  • Consistency for free across different entry points.
  • Collaboration is almost "forced."
  • Easy to maintain, changing things only in one place.

These are all great points to consider. They're undeniable.

The WET vs. DRY battle is not new, and it extends well beyond API schemas. But there's one key difference between API schemas and your typical statically typed programming language that I think should make everyone err on the side of caution when it comes to sharing and reusing types: how hard they are to change after the fact.

Changing API schemas is usually much harder than refactoring a program. The network boundary, and how decoupled caller and callee are, mean that changing types requires long deprecation cycles, usage analysis, reaching out to consumers, running brownouts, keeping backward compatibility, etc. Anyone who's managed a large API surface knows how arduous it is to change an API that is actively being used.

All that to say: when it comes to network APIs, mistakes about what should and shouldn't be shared are much more costly, so I like to default to writing twice rather than sharing.

Identical Type Shape != Same Type

One particularly easy mistake to make early on in API development is to assume that because two response "shapes" are identical, they should share a single type. The most infamous example of this is probably User types. I can't tell you how many times I've seen teams having to roll back changes or deprecate their initial structure because a single User type was shared all over the place.

For example, it used to be common to have a viewer root field in GraphQL. This would represent the currently logged-in user. Then, you may have an Organization type that has a list of Team, and Team with a list of User. You may also have a User type used when looking at a profile.

At first, all these users may look the exact same:

type User {
  name: String!
  profilePictureUrl: URL!
}

After a while, most have the unfortunate realization that these were, in fact, three distinct concepts of the domain. The viewer field starts wanting contextual fields about the viewer, which make no sense elsewhere. The team users start accumulating team-specific information, and a profile has data that can only be seen by the logged-in user. Our preferred schema would have looked like this:

type Viewer {
  name: String!
  profilePictureUrl: URL!
  hasUnreadNotifications: Boolean!
}

type UserProfile {
  name: String!
  profilePictureUrl: URL!
  paymentPlan: PaymentPlan!
}

type TeamMember {
  name: String!
  profilePictureUrl: URL!
  memberSince: Date!
}

(I'm deliberately skipping the fact that team members should probably use an edge/node pattern, but the principle remains the same.)

Keen readers might be bothered by the repetition of name and profilePictureUrl. I'm not bothered by this at all when it comes to a few fields. Fields are relatively cheap in GraphQL, and the maintenance cost can always be reduced by typical code techniques rather than simplifying the schema. But if it starts bothering us too much, I suggest using composition instead:

type UserData {
  name: String!
  profilePictureUrl: URL!
}

type Viewer {
  userData: UserData!
  hasUnreadNotifications: Boolean!
}

type UserProfile {
  userData: UserData!
  paymentPlan: PaymentPlan!
}

Strange authorization problems are a common symptom of over-shared types. If the profile of the logged-in user shared a User type, and only the logged-in user could see the paymentPlan field, that authorization logic becomes a runtime concern. Clients using the API have no idea in which contexts they can select paymentPlan. With GraphQL being more static and declarative, this is incredibly annoying: it forces us to make paymentPlan nullable, with a description explaining that context. Using a specific type like UserProfile makes for a stronger schema, and we can guard authorization at places that make more sense, like on a viewer field, for example.
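To make that last point concrete, here's a minimal sketch of guarding at the viewer field itself, rather than scattering checks across every viewer-only field. The resolver name and context shape are hypothetical, not tied to any particular GraphQL server library:

```python
# Sketch: guard authorization once, at the viewer root field. If nobody is
# logged in, the whole Viewer subtree is unreachable, so fields like
# hasUnreadNotifications never need their own checks.

def resolve_viewer(context):
    """Return the Viewer object, or None when no user is logged in."""
    user = context.get("current_user")
    if user is None:
        return None  # every viewer-only field is now guarded in one place
    return {
        "name": user["name"],
        "hasUnreadNotifications": bool(user.get("unread", 0)),
    }
```

The point is structural: because Viewer is a distinct type, its authorization boundary lives in exactly one resolver.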

Can We Be BFFs?

A common use of GraphQL is to allow a multitude of clients to be served with more specificity than with a "One-Size-Fits-All" API. Many (including me) have compared it to the Backend For Frontends pattern.

Although they share many similarities and goals, a single GraphQL server can't ever quite replace the full flexibility of a BFF architecture. BFFs allow for completely different behaviors per client, even different serialization or API styles. That flexibility comes at a cost, consistency for example, which GraphQL can definitely help with.

My current thinking is that it's best to see GraphQL as an approach somewhere in between a BFF architecture and an OSFA approach to gain the most benefits. If you're OK with a low cardinality of server-driven use-cases, a well-designed REST/endpoint-based API with an OpenAPI specification is pretty great. If you require absolute flexibility and client independence, it's hard to argue against a BFF approach. GraphQL is for those of us attempting to stay in that sweet spot, powering a bit of flexibility per client while maintaining some sort of server-driven consistency and modeling of our domain.

I swear I'm going somewhere with this. It's very easy to turn GraphQL into something that resembles a One-Size-Fits-All API. Sharing types across the board and attempting to reduce all use cases to "one true type/fields" can quickly do that. To harness most of the power of GraphQL, I believe we need to find that sweet spot, which may mean allowing certain client-specific fields, different ways of representing use-cases, variability in types, etc. Never forget that GraphQL helps us do this without adding any overhead to other clients; we should be using that power.

An OSFA GraphQL API still provides the capability of selecting subsets of the API surface, which may be enough for simple client differences, like a mobile client displaying the same data as a desktop client, just less of it. In practice, differences often go further than a subset/superset relationship and require different fields altogether. We should not fear this but rather embrace it, and be thankful GraphQL helps us manage this complexity.

Stay MOIST

I hope you don't take away from this post that I never recommend looking for opportunities to share types/logic. Far from it. However, I do think we shouldn't make that an explicit goal. Chances are, new types and fields are cheaper than you think, and your clients will thank you for them.

Thanks for reading!

Why, after 8 years, I still like GraphQL sometimes in the right context

A recent post, Why, after 6 years, I’m over GraphQL, made the rounds in tech circles. The author argues that they would not recommend GraphQL anymore due to concerns like security, performance, and maintainability. In this post, I want to go over some of the interesting points made, and some that I think don't hold up to scrutiny.

Always be Persistin'

Ok, first of all, let's start with something maybe a little bold: Persisted Queries are basically essential for building a solid GraphQL API. If you are not using them, you're doing GraphQL on hard mode. It's not impossible, but it leads to difficult problems, some of them discussed in the post. After 8 years of GraphQL, this has only gotten more and more important to me. Persist all queries, as soon as possible in your GraphQL journey. You'll thank yourself later.
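What persisting queries means in practice can be sketched in a few lines. This is my own illustrative version of an operation allowlist, not any specific server library's API: at build time every query the clients ship is registered under a hash of its text, and at runtime clients send only the hash, so arbitrary queries are rejected outright.

```python
import hashlib

# Sketch of a persisted-query allowlist (names are hypothetical).
PERSISTED = {}

def register(query_text: str) -> str:
    """Store a known query at build/deploy time; return its identifier."""
    key = hashlib.sha256(query_text.encode()).hexdigest()
    PERSISTED[key] = query_text
    return key

def execute(query_id: str, executor, variables=None):
    """At runtime, only registered operations are allowed through."""
    query = PERSISTED.get(query_id)
    if query is None:
        raise LookupError("unknown operation: arbitrary queries are rejected")
    return executor(query, variables or {})  # hand off to your GraphQL engine
```

Because the server only ever runs a finite, known set of operations, whole classes of attacks (deep queries, aliased bombs, arbitrary traversals) disappear before execution even starts.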

It is a little sad that this is basically glossed over and mentioned only in a small note at the bottom:

Persisted queries are also a mitigation for this and many attacks, but if you actually want to expose a customer facing GraphQL API, persisted queries are not an option.

I assume the author is talking about public APIs here. While I don't think this is necessarily inherently true (one could ask customers to register queries first; figuring out the DX for this would be an interesting task), it's still a valid point. That's why We Don’t See Many Public GraphQL APIs out there, and why I would not pick GraphQL if I were to expose a public API today.

For a public API, a coarser-grained, resource-based API works great, and can be described through OpenAPI. SDKs can be generated through amazing tools like Kiota. It's hard to beat a well-made SDK for a public API, and in my experience, that's actually what customers expect and want to use. Moving on.

Haxors

The author's first point is about GraphQL's allegedly bigger attack surface. Again, this focuses more on completely public GraphQL APIs, which are relatively rare:

exposing a query language to untrusted clients increases the attack surface of the application

I think that's right; it's hard to argue with this. Hence: don't expose a query language to untrusted clients unless you're ready to handle the trade-offs. But let's see what they think is hard to get right here.

Authz is really hard...

Authorization is a challenge with GraphQL. The thing is, it's almost always challenging, no matter what API style you use. I'd go as far as saying it is a challenge with designing software in general. The example given in the post actually highlights this very well:

query {
  user(id: 321) {
    handle # ✅ I am allowed to view Users public info
    email # 🛑 I shouldn't be able to see their PII just because I can view the User
  }
  user(id: 123) {
    blockedUsers {
      # 🛑 And sometimes I shouldn't even be able to see their public info,
      # because context matters!
      handle
    }
  }
}

handle and email both have different authorization rules. This is actually quite tricky to handle with a GET /user/:id endpoint as well; there's really nothing that makes GraphQL harder here. Yes, when fine-grained authorization is needed, you'll need fine-grained authorization checks at that level.
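Here's what those fine-grained checks might look like, sketched as plain resolver functions (the names and context shape are mine, not from any specific framework). The same two rules would apply whether these fields were served over GraphQL or a REST endpoint:

```python
# Sketch: handle is public, email is PII, so each field resolver carries its
# own rule. The transport doesn't change the authorization logic.

def resolve_handle(user, context):
    return user["handle"]  # public info: anyone may read it

def resolve_email(user, context):
    # PII: only the user themselves (or an admin) may read it.
    viewer = context.get("current_user")
    if viewer is None or (viewer["id"] != user["id"] and not viewer.get("admin")):
        return None  # or raise, depending on your error policy
    return user["email"]
```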

  user(id: 123) {
    blockedUsers {
      # 🛑 And sometimes I shouldn't even be able to see their public info,
      # because context matters!
      handle
    }
  }

This part is interesting as well, and it's another challenge of authorizing code and models in general: the same "object" accessed through different contexts can have different authorization rules. Again, this is common across API styles. That's why I usually advise designing this through a different model entirely, since it's likely they'll evolve differently. Here, this is possibly even an API design mistake: if handle should never be visible for blockedUsers, then it shouldn't be part of the schema here at all. This is a super common mistake where folks try to reuse common models/types instead of being specific.

After 8 years of GraphQL, I realize more and more that authorization is much more than a GraphQL problem. The API layer part is easy, but most codebases are much more complex and must guard against unauthorized access in much better ways. Companies like oso and authzed are two great examples of how to do this well, but also of how complex this can be in general.

Demand Control

The section on rate limiting starts with this:

With GraphQL we cannot assume that all requests are equally hard on the server.

Let me fix this to something I think is more accurate:

We cannot assume that all requests are equally hard on the server.

There, much better. The truth is that no matter what API style you use, whether that's a binary protocol over UDP or GraphQL, it is extremely rare, especially as the API surface grows, that all use-cases and "requests" will be equally expensive for a server to process.

A very easy example to show this is simply a paginated endpoint:

GET /users?first=100
GET /users?first=1

Or super expensive mutations:

POST /expensive-creation

To be 100% fair, GraphQL does expose this problem a bit more, and earlier on, especially when not using persisted queries (which should not happen!). And while folks building a small RPC API may not need variable rate limiting or some sort of cost categories at first, they almost always end up having to implement them.
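What cost-aware demand control looks like is the same regardless of API style. Here's a deliberately tiny sketch (the costs, request names, and class are all illustrative): requests are priced individually, and a per-client budget is spent per cost, not per request count.

```python
# Sketch: cost-based rate limiting. The REST endpoints from above already
# have wildly different costs, so a plain request counter won't do.
COSTS = {
    "GET /users?first=1": 1,
    "GET /users?first=100": 100,
    "POST /expensive-creation": 500,
}

class CostBudget:
    """A per-client budget for one rate-limit window."""

    def __init__(self, budget_per_window: int):
        self.budget = budget_per_window

    def allow(self, request: str) -> bool:
        cost = COSTS.get(request, 1)  # unknown requests default to cheap
        if cost > self.budget:
            return False  # over budget: reject, regardless of request count
        self.budget -= cost
        return True
```

With persisted queries, each registered GraphQL operation can be priced once, ahead of time, which makes this model even easier to apply than pricing arbitrary incoming queries.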

And again: this focuses on public, unauthenticated APIs. I think we can agree this is not GraphQL's sweet spot.

The rest of the section shows how simple it is to rate limit a simple REST API. Sure? I have never had the chance to work on an API that was that easy to implement demand control for.

Performance

The performance section focuses mainly on dataloaders and N+1s. I think the author makes some good points here. It's true that a GraphQL API must be ready for many query shapes and use cases, and it is wise to implement efficient data fetching for most fields through dataloaders. In fact, that's why I don't recommend data-fetching techniques that are overly coupled to the query shape: AST analysis, lookaheads, and anything using context from sibling or parent fields.
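For readers who haven't used one, the core of the dataloader pattern fits in a few lines. This is a minimal, synchronous sketch of my own, not the API of the real dataloader libraries (which are typically async and per-request): field resolvers request keys independently, and the loader fetches them in one deduplicated batch.

```python
# Sketch: batching loader. N resolvers each ask for one user, but the
# backend sees a single batched fetch instead of N+1 queries.

class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns matching values
        self.queue = []
        self.cache = {}

    def load(self, key):
        """Enqueue a key; return a thunk that yields the value after dispatch."""
        if key not in self.cache:
            self.queue.append(key)
        return lambda: self.cache[key]

    def dispatch(self):
        """Fetch all pending keys in one deduplicated batch."""
        pending = list(dict.fromkeys(k for k in self.queue if k not in self.cache))
        if pending:
            for k, v in zip(pending, self.batch_fn(pending)):
                self.cache[k] = v
        self.queue.clear()
```

Note that the loader knows nothing about the query shape, which is exactly the decoupling argued for above.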

The author acknowledges that this is a problem with REST as well, but still makes this statement:

Meanwhile, in REST, we can generally hoist nested N+1 queries up to the controller, which I think is a pattern much easier to wrap your head around.

Again, this is an extremely simple example with a trivial solution. But it's true: a simple endpoint-based API can usually be kept simple, rather than being part of multiple other use-cases that could affect its performance long term.

Overall, I think dataloader is a requirement for GraphQL, and I agree that it's part of the slight added complexity of a GraphQL API, even simple ones. Authorization N+1s are also an issue.

Again, this problem simply does not exist in the REST world.

But that's simply not true. Authz N+1s exist everywhere, including in REST, given a sufficiently complex API. Performant authz is a problem of its own.

Coupling

In my experience, in a mature GraphQL codebase, your business logic is forced into the transport layer. This happens through a number of mechanisms, some of which we’ve already talked about:

That's of course an observation from the author's experience, but in general, this couldn't be further from the truth. Hell, this is even specifically stated on GraphQL's website:

business logic

If one ends up with coupling, it's because there's a tendency to couple business logic with the transport layer in general. But it's not like GraphQL encourages you to do so; it actually encourages the opposite.

Solving data authorisation leads to peppering authorisation rules throughout your GraphQL types

Again, even the website tells you not to do this: Delegate authorization logic to the business logic layer.

Solving resolver data fetching N+1s leads to moving this logic into GraphQL specific dataloaders

I actually agree with this one. Data-loading in a performant way can be in tension with keeping things in reusable "business logic" units. I think this is a challenge when it comes to implementing a GraphQL API.

Leveraging the (lovely) Relay Connection pattern leads to moving data fetching logic into GraphQL specific custom connection objects

True, similar to the point above.

Breaking Changes??

Probably the strangest sentence in the post is this one:

GraphQL discourages breaking changes and provides no tools to deal with them. This adds needless complexity for those who control all their clients, who will have to find workarounds.

What? What API style encourages breaking changes? That's probably not a good idea anywhere. I think there's some confusion here: GraphQL encourages continuous evolution of your API, which relates to versioning more than to breaking changes. Instead of making breaking changes, with GraphQL you deprecate schema members and only remove them once they are no longer used (or once you're OK with breaking).

Arguably, one of GraphQL's most powerful tools is continuous evolution. The fact that the client declaratively selects the API surface it is interested in means we can track with very high precision how our API is used. This allows us to actually avoid breaking changes, and to make removals safely on fine-grained parts of the schema.
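That precision is easy to picture with a toy example. Here the flat sets of "Type.field" coordinates stand in for real parsed query documents (real tooling would extract them from your persisted operations), and all names are illustrative:

```python
from collections import Counter

# Sketch: because each (persisted) operation declares exactly the fields it
# selects, per-field usage is a simple aggregation, and "is this deprecated
# field safe to remove?" becomes a precise, answerable question.

def field_usage(operations):
    """operations: mapping of operation name -> set of 'Type.field' coordinates."""
    usage = Counter()
    for fields in operations.values():
        usage.update(fields)
    return usage

def safe_to_remove(field, operations):
    return field_usage(operations)[field] == 0
```

Contrast this with a REST endpoint returning a whole resource: the server generally can't tell which response fields any client actually reads.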

Breaking changes and deprecations suck. We all try to avoid them, and yes it's annoying for clients. But if anything GraphQL makes this easier, not harder.

Conclusion

Overall, I can feel the pain of the author when it comes to building public GraphQL APIs. It's not easy. But the post never really addresses a very common use-case for GraphQL: an internal API serving multiple known clients. In this context, using persisted queries is easy and solves a lot of the problems the author encountered in their journey. A lot of the problems mentioned here are also hard problems in general, and I don't always buy that GraphQL makes them any harder.

After 8 years of GraphQL, I still enjoy the decoupling a GraphQL schema offers between server-side capabilities and client-side requirements, but am aware of the trade-offs when picking it as a technology.

Anyway: persist your queries, and probably don't build public GraphQL APIs unless you really know what you're doing.