On the Proper Care and Feeding of REST APIs, Part II

In the first part of this article we introduced the choice of tools used to build our API. In this second part I’ll go into the details of how we used these tools, what challenges we faced, and how we overcame them.

Apiary and API Blueprint

In addition to the features of Apiary mentioned in the previous post, Apiary’s API Blueprint (APIB) format was another big win for documenting our API. APIB is a Markdown extension that adds routes, query parameter definitions and request/response definitions to Markdown, allowing APIs to be defined using a relatively terse yet human-readable syntax.

APIB also allows for prototyping the data structure and relationships directly inside the documentation itself. By using MSON (Markdown Syntax for Object Notation), request and response objects can be reused across multiple endpoints, removing the grunt work of creating example data for the documentation. At the same time, this forced me to consider what data the consumer of the API should have access to, which improved the API design.

Here’s an example of the Agenda object, which represents the event the agenda is for:

### Agenda
+ name: SomethingConf Asia 2016 (string, required) - Name of the event
+ description: The greatest SomethingScript conference under the sun! (string) - Long description of the event 
+ location: Suntec City Hall 2A (string) - Where the event is happening
+ published: false (boolean, default) - Is the agenda publicly visible
+ icon (number) - ID of uploaded image from the upload API endpoint
+ duration: 3 (number) - Number of days the event will go on for 
+ start_at: `2016-08-22` (date) - The date on which event is starting, represented as an ISO 8601 date string 

This is then plugged into a route, like so:

## POST /agendas

Creates a new event agenda under the currently logged in user. 

+ Request (application/json)
    + Attributes (Agenda)

+ Response 200 (application/json)
    + Attributes (Agenda Item)

As you can see, APIB reads very similarly to plain Markdown, and maintains the original simplicity of the language while still being expressive enough to easily define most of our API routes.

With all that said, Apiary is not perfect. Here are some of the problems we faced, and how we resolved them.

Memory Leak

The documentation editor has a massive memory leak that forced me to turn off the live preview and refresh the page every half hour so that the browser would not eat up all of my computer’s RAM. When contacted, Apiary’s customer support staff suggested I use a third-party editor instead.

Request/Response Representation

MSON is not perfect either. When writing the API, I at first naively used the same data structure for both requests and responses. However, I quickly realized this would not work: the response format is similar but not identical to the request format. Response objects of course carry the object’s URL (this is a REST API, after all) as well as other metadata like the creation date, and computed fields like the event’s end date, which is derived from the event’s start date and duration.

My solution to this was to create a mixin data structure containing the response data common to all objects, then mix it with each of the request data structures using the Include keyword to create the response data structure. This is the one for the Agenda object seen above:

### Item
+ id: 1 (number) - the ID of the object
+ url (string) - the API URL representing the object 
+ created_at (date) - the date at which this object was created, represented as an ISO 8601 timestamp string 
+ updated_at (date) - the date at which this object was last updated, represented as an ISO 8601 timestamp string 

### Agenda Item (Agenda)
+ Include Item
+ slug (string) - a slug generated from the agenda name, guaranteed to be unique
+ icon (string) - URL to an image of the icon 
+ end_at (date) - When the event is ending, computed from duration

Notice how, by repeating the icon field in the child object, we can have the request and response data structures use different representations for the same field. In this case the icon field takes in an ID, since the API allows attachments to be uploaded asynchronously, but it is returned to the consumer as a URL string pointing to the image. This should of course be used sparingly, but it is very useful in cases like this.
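To make the difference between the two shapes concrete, here is a minimal Python sketch of how a response could be derived from a request payload. This is not our serializer code: the function name, the URL patterns, and the rule that end_at is the start date plus duration minus one day are all assumptions made for illustration, and the slug field is left out for brevity.

```python
from datetime import date, datetime, timedelta, timezone


def build_agenda_response(request_data, object_id, now=None):
    """Turn an Agenda request payload into an Agenda Item response (illustrative only)."""
    now = now or datetime.now(timezone.utc)
    response = dict(request_data)

    # Fields from the shared Item mixin. The URL pattern is hypothetical.
    response["id"] = object_id
    response["url"] = f"https://api.example.com/agendas/{object_id}"
    response["created_at"] = now.isoformat()
    response["updated_at"] = now.isoformat()

    # Same field name, different representation: an upload ID goes in,
    # an image URL comes out. The CDN URL is made up for this example.
    if "icon" in request_data:
        response["icon"] = f"https://cdn.example.com/uploads/{request_data['icon']}.png"

    # Computed field. Assumption: a 3-day event starting on the 22nd ends on the 24th.
    if "start_at" in request_data and "duration" in request_data:
        start = date.fromisoformat(request_data["start_at"])
        end = start + timedelta(days=request_data["duration"] - 1)
        response["end_at"] = end.isoformat()

    return response


example = build_agenda_response(
    {"name": "SomethingConf Asia 2016", "icon": 12, "duration": 3, "start_at": "2016-08-22"},
    object_id=1,
)
```

The request fields pass through unchanged, the mixin fields are added on top, and only icon changes type between the two directions.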

Authentication and Reused Headers

Describing authentication and other headers shared between sets of routes is also far harder than it needs to be. Ideally there should be some way to define a set of reusable headers, just like request/response objects. Because this does not exist, and I did not want to copy and paste the same set of authorization headers across the entire API, headers were left out of most routes. Unfortunately this also meant that I could not run Dredd, the automated API conformance checker, which would have helped me keep the API documentation up to date.

Django and Django REST Framework

Whereas a normal Django application would use Forms to process user input and Views with Django templates to render the response, Django REST Framework (DRF) uses Serializers to transform models into data structures fit for the API and to receive input from the consumer, then uses its APIView classes to render the response. The view class is designed specifically to be configurable by simply plugging in different Permission, Authentication, Throttle, Parser and Renderer classes, and DRF itself comes with a set of generic view classes for common use cases, like ListCreateAPIView and RetrieveUpdateDestroyAPIView.

In addition, DRF also has ViewSet and Router classes, similar to Rails controllers, that use resource verbs (list, create) instead of the HTTP verbs (GET, POST) conventional in Django views, and allow for easier route-view binding. All of these take care of a lot of the repetitive parts of building the API, making it easy to construct a large API like ours in a very short period of time.
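As a rough illustration of what "resource verbs instead of HTTP verbs" means, the toy lookup below mirrors the default mapping that DRF routers apply between an HTTP method and a ViewSet action. The action names are the ones DRF uses; the resolve_action function is a hypothetical helper written for this post, not part of DRF.

```python
# HTTP method -> ViewSet action for collection routes, e.g. /agendas
COLLECTION_ACTIONS = {"GET": "list", "POST": "create"}

# HTTP method -> ViewSet action for detail routes, e.g. /agendas/1
DETAIL_ACTIONS = {
    "GET": "retrieve",
    "PUT": "update",
    "PATCH": "partial_update",
    "DELETE": "destroy",
}


def resolve_action(method, has_pk):
    """Return the action a request would be routed to, or None if the method is not allowed."""
    table = DETAIL_ACTIONS if has_pk else COLLECTION_ACTIONS
    return table.get(method.upper())
```

A ViewSet then only needs to implement list, create and friends, and the router takes care of binding them to URLs and methods.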

Testing

One of the best decisions I made while building the backend was to adopt Test-Driven Development. No, I’m not joking here: setting up tests with DRF is relatively painless, and it ensured the API behaved according to the specifications set out in the documentation, as well as giving me confidence to refactor when the requirements for the API changed.

This also allowed the backend to be developed separately from the frontend, as the tests stand in for the consumer. Our backend was mostly complete before a single line of frontend code was written, but I can still be confident it has been coded to specification because each endpoint has already been exercised.

The tests use the test clients provided by DRF and Django, which I used to simulate requests made to the server. This allowed me to treat each API endpoint as a black box to test against.

Representing relationships

Relationships are always difficult for APIs to handle. Django REST Framework provides a number of ways to represent relations, including nested objects, primary keys (IDs) and hyperlinks (URLs), but which one to use is a matter of design.

  • Nested objects
    • Pro: No additional calls or lookups needed to retrieve information about related objects
    • Con: Duplicate information sent, especially for list endpoints
  • Primary key
    • Pro: Most compact out of all representations
    • Con: Not RESTful - clients need to construct URLs themselves, and need additional lookups or queries to get the related objects
  • Hyperlinks
    • Pro: RESTful - clients do not need to construct URLs, which insulates them from changes in the URL structure of the API
    • Con: Much less compact representation than primary key
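The three options above can be sketched in plain Python as follows. In DRF they roughly correspond to a nested serializer, PrimaryKeyRelatedField and HyperlinkedRelatedField. The sample data, the represent_session helper and the URL layout are made up for this example.

```python
speaker = {"id": 7, "name": "Jane Doe", "company": "Example Corp"}
session = {"id": 42, "title": "Opening keynote", "speaker": speaker}


def represent_session(session, style):
    """Serialize a session with its speaker relation in one of three styles."""
    related = session["speaker"]
    if style == "nested":
        # Full copy of the related object embedded in the parent
        speaker_repr = dict(related)
    elif style == "primary_key":
        # Just the ID; the client has to look the object up elsewhere
        speaker_repr = related["id"]
    elif style == "hyperlink":
        # Hypothetical URL layout; the client follows the link as-is
        speaker_repr = f"https://api.example.com/speakers/{related['id']}"
    else:
        raise ValueError(f"unknown style: {style}")
    return {"id": session["id"], "title": session["title"], "speaker": speaker_repr}
```

Calling represent_session(session, "primary_key") yields the smallest payload, while the "nested" style repeats the whole speaker object every time it is referenced.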

The original design of the API was for list endpoints (routes that return many objects) to return related objects as primary keys, and for retrieve endpoints (routes that return single objects) to return related objects as nested objects. This avoids the problem of sending duplicate information for list endpoints, while still ensuring that a single request returns all information about a single object.

Unfortunately this approach is not well supported by DRF, as there is no easy way for a serializer to return related fields in different representations depending on whether it is serializing a QuerySet of one or many objects. The major annoyance was with the documentation, which suggested using the depth option, but this does not work when using custom serializers to represent related fields. The frontend developers also did not like that the API used two different representations depending on the situation.

The second approach was to use primary keys for all relations. We send all information related to an event (sessions, tracks, speakers, categories, etc.) in a single request, and all related fields are represented using primary keys, such that no single object is nested more than two levels deep.

## GET /{agenda_id}

Returns a specific event's agenda, including sessions and all other information associated with the
agenda. This information can be used by both the agenda viewer and agenda builder. 

Unpublished agendas, and all subresources under the agenda like sessions and speakers 
require authentication to fetch. They will return 404 if no token is provided. 

+ Response 200 (application/json)
    + Attributes (Agenda Item)
        + sessions (array[Session Item])
        + speakers (array[Speaker Item])
        + tracks (array[Track Item])
        + session_venues (array[Venue Item])
        + categories (array[Category Item])

This keeps the object structures simple and uniform, but requires a larger download upfront for the client. Thankfully, because this is all text, which compresses well, the initial download can be completed within an acceptable amount of time. This can be further reduced in the future if the data can be injected into the HTML on the server side, or even better, with server-side rendering. Modern networks, especially mobile networks, have high bandwidth and high latency, which means a single large response is preferable to many small responses.
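With every relation expressed as a primary key, the client has to join the collections itself. The sketch below shows one way this could be done, written in Python for consistency with the rest of this post even though our client is a frontend application. The field names on the session objects (speakers, track) and the sample data are assumptions, not the actual schema.

```python
def index_by_id(items):
    """Build an {id: object} lookup table for one collection."""
    return {item["id"]: item for item in items}


def resolve_sessions(agenda):
    """Return sessions with speaker and track IDs replaced by the referenced objects."""
    speakers = index_by_id(agenda.get("speakers", []))
    tracks = index_by_id(agenda.get("tracks", []))
    resolved = []
    for session in agenda.get("sessions", []):
        resolved.append({
            **session,
            # Missing keys are treated as "no relation"
            "speakers": [speakers[pk] for pk in session.get("speakers", [])],
            "track": tracks.get(session.get("track")),
        })
    return resolved


agenda = {
    "id": 1,
    "name": "SomethingConf Asia 2016",
    "speakers": [{"id": 7, "name": "Jane Doe"}],
    "tracks": [{"id": 3, "name": "Main stage"}],
    "sessions": [{"id": 42, "name": "Opening keynote", "speakers": [7], "track": 3}],
}
sessions = resolve_sessions(agenda)
```

Each lookup table is built once per collection, so resolving a relation is a dictionary access rather than a scan or another request.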

Removing empty fields

Django REST Framework returns empty fields: many-to-many relationships that are empty produce empty arrays, empty char fields produce empty strings, and so on. This is problematic because most fields in our API are optional, so these empty fields add a significant amount of dead weight to the response from the server.

The solution was to override the to_representation method of the base ModelSerializer class in a shared base serializer. This method is called to transform the model into a dictionary, which can then be encoded as JSON (or any other desired representation).

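from collections import OrderedDict

from rest_framework.serializers import ModelSerializer


class BaseSerializer(ModelSerializer):
    def to_representation(self, instance):
        obj = super().to_representation(instance)

        # Drop None, empty strings and empty collections from the output
        filtered_obj = OrderedDict()
        for key, value in obj.items():
            try:
                if value is not None and len(value):
                    filtered_obj[key] = value
            except TypeError:
                # Values without a length (numbers, booleans) are always kept
                filtered_obj[key] = value

        return filtered_obj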
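The filtering rule can be tried out without Django by applying the same logic to a plain dictionary. The drop_empty function below is a standalone restatement written for illustration, and the sample data is made up.

```python
def drop_empty(data):
    """Remove None values and empty sized values (strings, lists, dicts) from a mapping."""
    filtered = {}
    for key, value in data.items():
        if value is None:
            continue
        try:
            if len(value) == 0:
                continue
        except TypeError:
            # Numbers and booleans have no length, so they are always kept
            pass
        filtered[key] = value
    return filtered


serialized = {
    "name": "SomethingConf Asia 2016",
    "description": "",
    "location": None,
    "speakers": [],
    "duration": 0,
    "published": False,
}
result = drop_empty(serialized)
```

Here result keeps only name, duration and published. Note that 0 and False survive because they have no length, so a consumer can still tell an explicit false apart from a missing field.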