March 27, 2016

Composible Architecture: A real-world example.

Oh god not this again

I am the primary maintainer of a significant REST API written using Flask-Restful. Flask-Restful is a pretty full-featured library for creating REST frameworks with Flask. It features everything you need to manage routing to class-based resources, a library for marshalling objects to JSON format, and reqparse, an argparse-inspired library for writing input parsers.

It’s reqparse that I’m going to talk about today.

An intro to Reqparse

Reqparse has a very simple UI:

from flask.ext.restful import reqparse

parser = reqparse.RequestParser()

parser.add_argument(
  "name",
  type=str,
  required=True
)


# Within a flask request context
args = parser.parse_args()
args.get('name')  # => The "name" parameter from the request

You declare a parser, then add arguments to it. Each argument is in fact an instance of an Argument class, which receives the arguments passed to add_argument. Broadly, here’s some pseudocode explaining how reqparse functions:

class Argument(object):
    def __init__(self, name, **kwargs):
        self.name = name
        self.type = kwargs.pop('type')

    def parse(self, data):
        val = data.get(self.name)
        try:
            return self.type(val)
        except Exception as e:
            return ValueError(e.message)


class RequestParser(object):
    def __init__(self):
        self.args = []

    def add_argument(self, name, **kwargs):
        self.args.append(Argument(name, **kwargs))

    def parse_args(self):
        results = {}
        for arg in self.args:
            result = arg.parse(flask.request.json)
            if isinstance(result, ValueError):
                abort(400, result.message)
            results[arg.name] = results
        return results

This is all well and good, but if you’ve ever written any sort of validation code before, you probably noticed all sorts of edge cases that would stymie the above. The one I’ll focus on is, what of that required keyword argument?

A tale of two philosophies

The above is pseudocode, for those reasons and more, but the core of the question here is, how best to implement the above?

To tell the story out of order, reqparse’s approach (as I learned when I attempted to update it) involves adding more kwargs to Argument and RequestParser, and updating their implementations with special cases for that argument. As a consequence, here’s what the parse method of Argument looks like now:

    def parse(self, request, bundle_errors=False):
        """Parses argument value(s) from the request, converting according to
        the argument's type.
        :param request: The flask request object to parse arguments from
        :param do not abort when first error occurs, return a
            dict with the name of the argument and the error message to be
            bundled
        """
        source = self.source(request)

        results = []

        # Sentinels
        _not_found = False
        _found = True

        for operator in self.operators:
            name = self.name + operator.replace("=", "", 1)
            if name in source:
                # Account for MultiDict and regular dict
                if hasattr(source, "getlist"):
                    values = source.getlist(name)
                else:
                    values = [source.get(name)]

                for value in values:
                    if hasattr(value, "strip") and self.trim:
                        value = value.strip()
                    if hasattr(value, "lower") and not self.case_sensitive:
                        value = value.lower()

                        if hasattr(self.choices, "__iter__"):
                            self.choices = [choice.lower()
                                            for choice in self.choices]

                    try:
                        value = self.convert(value, operator)
                    except Exception as error:
                        if self.ignore:
                            continue
                        return self.handle_validation_error(error, bundle_errors)

                    if self.choices and value not in self.choices:
                        if current_app.config.get("BUNDLE_ERRORS", False) or bundle_errors:
                            return self.handle_validation_error(
                                ValueError(u"{0} is not a valid choice".format(
                                    value)), bundle_errors)
                        self.handle_validation_error(
                                ValueError(u"{0} is not a valid choice".format(
                                    value)), bundle_errors)

                    if name in request.unparsed_arguments:
                        request.unparsed_arguments.pop(name)
                    results.append(value)

        if not results and self.required:
            if isinstance(self.location, six.string_types):
                error_msg = u"Missing required parameter in {0}".format(
                    _friendly_location.get(self.location, self.location)
                )
            else:
                friendly_locations = [_friendly_location.get(loc, loc)
                                      for loc in self.location]
                error_msg = u"Missing required parameter in {0}".format(
                    ' or '.join(friendly_locations)
                )
            if current_app.config.get("BUNDLE_ERRORS", False) or bundle_errors:
                return self.handle_validation_error(ValueError(error_msg), bundle_errors)
            self.handle_validation_error(ValueError(error_msg), bundle_errors)

        if not results:
            if callable(self.default):
                return self.default(), _not_found
            else:
                return self.default, _not_found

        if self.action == 'append':
            return results, _found

        if self.action == 'store' or len(results) == 1:
            return results[0], _found
        return results, _found

I’ll take a break here to point out that this code works perfectly well, and Flask-Restful has been a great library to use. But look at some of the branching logic necessitated by those special cases. For example, this branch lowercases the value, and then has to check through this choices argument (used to specify a set of valid values the input could take).

    if hasattr(value, "lower") and not self.case_sensitive:
        value = value.lower()

        if hasattr(self.choices, "__iter__"):
            self.choices = [choice.lower()
                            for choice in self.choices]

This has all the smell of a bug borne of an obscure edge case, and indeed, we can find the commit where this branch was added.

Moving back in time to before all this: As our application grew more and more requirements of its data, a reusable “validators” library was born. It simply provides functions that can be passed into the type parameter of reqparse’s add_argument. It started out simply enough, with validators like email:

import re
EMAIL_RE = re.compile(r".+@.+\..+")

def email(s):
    if not EMAIL_RE.match(s):
        raise ValidationError("Invalid email format")
    return s

Eventually, though, it gained some additional tracks. Here’s the choice validator that was added at some point, probably by me when I couldn’t be bothered to find reqparse’s inbuilt choices option.

def choice(choices, msg='Invalid Choice'):
    def choose(choice):
        if choice not in choices:
            raise ValidationError(msg=msg)
        if isinstance(choices, dict):
            return choices.get(choice)
        else:
            return choice
    return choose


# Usage
parser.add_argument(
  'category',
  type=validators.choice(['News', 'Entertainment', 'Other']),
)

Notice how the responsibility for this validation is now moved to the type function, ensuring that any special handling of choice code is kept away from the general-purpose parsing functions? Without really intending to, the validators library grew a whole bunch of functionality paralleling reqparse’s built-ins. In parallel to reqparse’s keywords, validators gained optional, required, nullable, and validated_list.

Recently, I had to update Flask-Restful for some reason or another, and was dismayed when some of the validators stopped working. This wasn’t really anyone’s fault, just a case of my validator functions expecting some edge-case behavior that had changed. But I realized that, by now, I didn’t really need all of the functionality that reqparse offered. I replaced the RequestParser and Argument classes with some custom ones that look a lot more like the pseudocode example above than the current implementations that exist within flask-restful. It’s not an all-at-once process, but over time, all keyword-based validators are moving to function-based ones, and the Argument and RequestParser implementations keep getting simpler and simpler.

Incidentally, this is the sort of system that WTForms, another great input-wrangling library, has always used. Each Field class accepts a list of validators, which can be totally custom-written about as simply as mine above (you can even just make them closures returning functions). What WTForms provides, in lieu of extra keyword arguments, is a suite of validators covering common cases, such as DataRequired, EqualTo, etc.:

class MyForm(Form):
  email = StringField("email", validators=[DataRequired(), Email()])

But, not to be outdone in the field of keyword austerity, validators has long provided a comp method for this sort of situation:

parser.add_argument(
  'email',
  type=validators.comp(validators.required, validators.email)
)

So what’s the lesson? I don’t know, I’m not a teacher, but I guess it’s something along the lines of: Wherever possible, favor function composition for those cases where, for once, it actually makes things simpler.