December 7, 2014

Functional vs Object-oriented approaches to validation

As demonstrated in Python

Functional programming is often described in terms of its contrast with object-oriented programming; that is, you write functions that act on data instead of objects that wrap data and use methods to act on themselves. Functional programming wonks (like me) will tell you that writing code this way is generally better than OO, but I don’t want to do that (right now).

In this post, I’m not here to argue either side. Today, I’m just going to demonstrate a few equivalent approaches to the same problem: validating data.

Say we want to write a form-validation-and-cleaning routine. We are given an incoming data structure, and must apply a set of rules to it, returning the (cleaned) data structure if everything went well, and an itemized list of errors if not. We don’t want to short-circuit our code; if 2 fields are incorrect, we want to know about both of them.

Since we’re working in a language that expects exceptions, we’ll allow our external interface to use them. So, for all examples, we’ll define a function validate_form that accepts our data and a list of validators, raises an exception containing all the errors if there are any, and returns the data otherwise.

OO Implementation

Let’s start with an example object-oriented approach, which I happen to think does the job ok:

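class ValidationError(Exception):
    def __init__(self, message, errors=None):
        # Call the base class so str(e) works, and keep the message
        # explicitly because ValidatorSuite reads e.message.
        super(ValidationError, self).__init__(message)
        self.message = message
        # Avoid a shared mutable default argument.
        self.errors = errors or {}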


class Validator(object):
    def __init__(self, field_name):
        self.field_name = field_name

    def validate(self, data):
        raise NotImplementedError('Please implement validate')


class ValidatorSuite(Validator):
    def __init__(self):
        self.validators = []

    def add_validator(self, validator):
        self.validators.append(validator)

    def validate(self, data):
        errors = {}
        for validator in self.validators:
            try:
                data = validator.validate(data)
            except ValidationError as e:
                errors[validator.field_name] = e.message

        if len(errors) > 0:
            raise ValidationError("Validation failed", errors=errors)

        return data


# Specific Validators

class NonBlankValidator(Validator):
    def validate(self, data):
        s = data.get(self.field_name)
        if not isinstance(s, str) or len(s) == 0:
            raise ValidationError(
                "Field '{}' must not be blank".format(self.field_name))

        return data

class DefaultValidator(Validator):
    def __init__(self, field_name, default):
        self.field_name = field_name
        self.default = default

    def validate(self, data):
        data[self.field_name] = data.get(self.field_name, self.default)
        return data


# Our public API

def validate_form(data, validators):
    suite = ValidatorSuite()
    for v in validators:
        suite.add_validator(v)
    return suite.validate(data)

This is pretty good; let’s see if rewriting this code in a functional style saves us any trouble, and then we’ll have some comparative discussion.

The reason functional programming folks don’t like exceptions is that they really wreak havoc on the flow of execution. We’d rather assemble our program out of defined functions that accept values and return values, and never jump around. Dijkstra had some strong words for goto statements because they make it more difficult than necessary to follow the flow of control through your program; the exact same thing is true of exceptions.
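As a small illustration of the difference, here is a sketch that is not part of the validation code in this post; the two parse_age functions are hypothetical names made up for the comparison. The first signals a problem by raising, the second by returning a value the caller can inspect:

```python
# Hypothetical helpers, for illustration only.

def parse_age_raising(text):
    """Exception style: a bad input jumps out of the normal flow."""
    if not text.isdigit():
        raise ValueError("age must be a whole number")
    return int(text)


def parse_age_returning(text):
    """Value style: the caller always gets an (ok, result) pair back."""
    if not text.isdigit():
        return (False, "age must be a whole number")
    return (True, int(text))


# With return values, collecting every problem is an ordinary loop;
# nothing is skipped because an earlier input failed.
results = [parse_age_returning(t) for t in ["42", "abc", "7", "x1"]]
problems = [msg for ok, msg in results if not ok]
ages = [value for ok, value in results if ok]
```

The functional version of the validator code takes the same approach, with a dedicated class standing in for the (ok, result) pair.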

class Failure(object):
    def __init__(self, errors):
        self.errors = errors


# Specific Validators

def non_blank_validator(field_name):
    def validate(data):
        # Look up the field first; data is the whole form, not the value.
        s = data.get(field_name)
        if not isinstance(s, str) or len(s) == 0:
            return Failure({
                field_name: "Field '{}' must not be blank".format(field_name)
            })
        return data
    return validate


def default_validator(field_name, default_val):
    def validate(data):
        data[field_name] = data.get(field_name, default_val)
        return data
    return validate


def validation_suite(validators):
    def validate(data):
        errors = {}
        for v in validators:
            val = v(data)
            if isinstance(val, Failure):
                errors = dict(val.errors, **errors)
            else:
                data = val
        if len(errors) > 0:
            return Failure(errors)
        return data
    return validate


# Our public API

def validate_form(data, validators, return_error=False):
    val = validation_suite(validators)(data)
    if isinstance(val, Failure):
        if return_error:
            return val
        else:
            raise ValidationError("Validation Failed", errors=val.errors)
    return val

In the OO example, validators were classes with a constructor that accepted the values required, and a method validate that actually performed the validation. In the functional version, validate is a closure around the required values, with a consistent signature. If you’re not used to writing code with closures you might not like the style I’ve chosen for the validators, but I don’t find it difficult to read or understand.

The main thing we’ve introduced here is Failure, a sort of flag value. Our only constraint on validator functions is that they must return an instance of Failure if they fail. This removes the need for us to raise exceptions. However, we can take this further.
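To make the flag-value idea concrete, here is a minimal, self-contained sketch of a caller that branches on Failure instead of using try/except. Failure and non_blank_validator are repeated from the listing above so the snippet runs on its own; describe is a hypothetical helper added only for this example.

```python
class Failure(object):
    def __init__(self, errors):
        self.errors = errors


def non_blank_validator(field_name):
    def validate(data):
        s = data.get(field_name)
        if not isinstance(s, str) or len(s) == 0:
            return Failure({
                field_name: "Field '{}' must not be blank".format(field_name)
            })
        return data
    return validate


def describe(result):
    # Hypothetical helper: inspect the return value rather than catching.
    if isinstance(result, Failure):
        return "rejected: " + ", ".join(sorted(result.errors))
    return "accepted"


check_name = non_blank_validator('name')
good = describe(check_name({'name': 'Ada'}))
bad = describe(check_name({'name': ''}))
```

Here good is "accepted" and bad is "rejected: name"; at no point does control leave the normal call-and-return path.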

Another Functional Style

This one has a twist, but I’ll save the reveal until after. Here’s the code:

class ValidatedData(dict):
    def __init__(self, data=None, errors=None):
        self['data'] = data or {}
        self['errors'] = errors or {}

    def run(self, *validator_fns):
        result = self
        for fn in validator_fns:
            result = result.merge(fn(result['data']))
        return result

    def merge(self, other):
        self['data'] = dict(self['data'], **other['data'])
        self['errors'] = dict(self['errors'], **other['errors'])
        return self


def success(data):
    return ValidatedData(data=data)


def fail(field_name, error):
    return ValidatedData(errors={field_name: error})


def non_blank_validator(field_name):
    def validate(data):
        s = data.get(field_name)
        if not isinstance(s, str) or len(s) == 0:
            return fail(field_name, "Field '{}' must not be blank".format(field_name))
        return success(data)
    return validate


def default_validator(field_name, default_val):
    def validate(data):
        data[field_name] = data.get(field_name, default_val)
        return success(data)
    return validate


# Public API

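class ValidationError(Exception):
    def __init__(self, message, errors=None):
        super(ValidationError, self).__init__(message)
        self.message = message
        # Avoid a shared mutable default argument.
        self.errors = errors or {}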


def validate_form(data, validators):
    result = ValidatedData(data).run(*validators)
    if len(result['errors']) > 0:
        raise ValidationError("Validation Failed", errors=result['errors'])
    return result['data']

In this example, our validators accept a raw dict as before, but return a wrapped object we’ve called ValidatedData. ValidatedData is (in effect) a monad, with functions that return monadic values and run filling in for bind (I didn’t feel the need to be strict about the semantics in Python). But don’t worry, the code still works if you don’t know that.

I prefer the way the monad works over both of the validation suite functions above. We’ve abstracted away all that business in favor of something more generic. I also felt clever for inheriting from dict, but that’s not really necessary.

Overall this came out a bit longer than the other functional version, but only just. Most of that is the explicit success and fail functions, which I think became necessary as our expected return value became more complex.
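For readers who want the wrap-and-merge idea in isolation, here is a stripped-down sketch. It is a non-mutating stand-in for ValidatedData.run, not the class above; the names unit, bind and require are assumptions borrowed loosely from monad terminology, and (as with the original) the semantics are not strict.

```python
def unit(data):
    """Wrap plain data in the {'data', 'errors'} shape with no errors."""
    return {'data': data, 'errors': {}}


def bind(wrapped, fn):
    """Apply fn to the wrapped data and merge its result into a new wrapper."""
    out = fn(wrapped['data'])
    return {
        'data': dict(wrapped['data'], **out['data']),
        'errors': dict(wrapped['errors'], **out['errors']),
    }


def require(field_name):
    # Hypothetical validator: fails when the field is missing or empty.
    def validate(data):
        if not data.get(field_name):
            return {'data': {}, 'errors': {field_name: 'required'}}
        return unit(data)
    return validate


result = unit({'email': 'a@b.c'})
for fn in (require('name'), require('email'), require('phone')):
    result = bind(result, fn)
```

After the loop, result['errors'] holds entries for both name and phone, and result['data'] is unchanged: every validator ran, and the merging lived in one generic place.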

Comparison

Talking is all well and good, but let’s compare some situations where we want to work with our code.

Writing a new validator

Let’s look at what it takes to add a validator. We’ll skip the actual implementation of the fiddly bits so we can just look at the patterns and differences side-by-side.

# Common functions. TODO: implement

def email_valid(email):
    return True


def email_domain_equals(email, domain):
    return True


# OO-Style

class EmailValidator(Validator):
    def __init__(self, field_name, domain):
        self.field_name = field_name
        self.domain = domain

    def validate(self, data):
        email = data.get(self.field_name)
        if not email_valid(email):
            raise ValidationError("Invalid email address.")
        elif not email_domain_equals(email, self.domain):
            raise ValidationError("Email must have domain {}".format(self.domain))
        return data


# Functional Style

def email_validator(field_name, domain):
    def validate(data):
        email = data.get(field_name)
        if not email_valid(email):
            return Failure({field_name: "Invalid email address."})
        elif not email_domain_equals(email, domain):
            return Failure({field_name: "Email must have domain {}".format(domain)})
        return data
    return validate


# Monadic Style

def email_validator(field_name, domain):
    def validate(data):
        email = data.get(field_name)
        if not email_valid(email):
            return fail(field_name, "Invalid email address.")
        elif not email_domain_equals(email, domain):
            return fail(field_name, "Email must have domain {}".format(domain))
        return success(data)
    return validate

Not much changed here. The monadic version benefits from the addition of the fail function, but it’s basically equivalent to the OO version. The class-based validator must remember to store the incoming values in the constructor, which is something the other two don’t need to worry about; in that way, I think the functional versions are a bit simpler (provided you’re comfortable with first-class functions, of course).
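If you want to actually run the validators above, the two stubbed helpers could be filled in along these lines. This is a deliberately naive sketch under my own assumptions (exactly one '@', case-insensitive domain comparison), not real-world email validation:

```python
def email_valid(email):
    # Deliberately loose: a string with exactly one '@' and text on both sides.
    if not isinstance(email, str) or email.count('@') != 1:
        return False
    local, domain = email.split('@')
    return len(local) > 0 and len(domain) > 0


def email_domain_equals(email, domain):
    # Compare the part after the '@' without regard to case.
    return email_valid(email) and email.split('@')[1].lower() == domain.lower()
```

All three styles of email validator call these helpers the same way, so swapping in a stricter check later would not change any of them.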

Running a Suite (without the external function)

Let’s take a look at that code side-by-side:

# OO Version
def validate_form(data, validators):
    suite = ValidatorSuite()
    for v in validators:
        suite.add_validator(v)
    return suite.validate(data)


# Functional Version
def validate_form(data, validators, return_error=False):
    val = validation_suite(validators)(data)
    if isinstance(val, Failure):
        if return_error:
            return val
        else:
            raise ValidationError("Validation Failed", errors=val.errors)
    return val


# Monadic Version
def validate_form(data, validators):
    result = ValidatedData(data).run(*validators)
    if len(result['errors']) > 0:
        raise ValidationError("Validation Failed", errors=result['errors'])
    return result['data']

I could complain about the way the suite uses the add_validator pattern, but that would be pretty disingenuous given that I wrote it. Honestly, since the OO version most directly matches the spec we set out at the start, I’d have to give it the edge. But wait!

Nested Validation

This should be fun. Let’s say that we want to validate that data['person']['name'] is not blank.


# Object-Oriented

class SuiteValidator(Validator):
    def __init__(self, field_name, suite):
        self.field_name = field_name
        self.suite = suite

    def validate(self, data):
        try:
            data[self.field_name] = self.suite.validate(data.get(self.field_name, {}))
        except ValidationError as e:
            # ValidationError requires a message as its first argument.
            raise ValidationError("Validation failed", errors=e.errors)
        return data

suite = ValidatorSuite()
suite.add_validator(NonBlankValidator('name'))
suite.add_validator(NonBlankValidator('email'))

outerSuite = ValidatorSuite()
outerSuite.add_validator(SuiteValidator('person', suite))

try:
    print(outerSuite.validate({'person': {'email': '[email protected]'}}))
except ValidationError as e:
    print(e.errors)


# Functional

def nested_validator(field_name):
    def validate(data):
        suite = validation_suite([
            non_blank_validator('email'),
            non_blank_validator('name'),
        ])
        result = suite(data.get(field_name, {}))
        if isinstance(result, Failure):
            return Failure({field_name: result.errors})
        # Put the cleaned nested dict back instead of replacing the whole form.
        data[field_name] = result
        return data
    return validate

validation_suite([nested_validator('person')])({'person': {'email': '[email protected]'}})
# => Failure(errors={'person': {'name': "Field 'name' must not be blank"}})


# Monadic

def nested_validator(field_name, validators):
    def validate(data):
        result = ValidatedData(data.get(field_name)).run(*validators)
        if len(result['errors']) > 0:
            return fail(field_name, result['errors'])
        return success({field_name: result['data']})
    return validate

ValidatedData({'person': {'email': '[email protected]'}}).run(
    nested_validator('person', [
        non_blank_validator('name'),
        non_blank_validator('email')
    ]))

# {'errors': {'person': {'name': "Field 'name' must not be blank"}}, 'data': {'person': {'email': '[email protected]'}}}

I like the monad best again: all the nested validator has to do is unpack the returned monad and construct a new one to return.

Note that the OO code can’t produce this nested error structure without changing ValidatorSuite to collect the errors properly: as written, it records only e.message, so each nested field ends up with a single error message rather than the full set. It wasn’t on purpose, honest! This is a bit of a self-serving point, so take it as you will, but I think it shows that the functional options are a bit more generic and flexible (even if the OO version could be refactored pretty easily).
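For what it’s worth, here is one way that refactoring might look. This is a sketch under my own assumptions, not a drop-in replacement for the classes above: the suite takes its validators in the constructor, stores a nested error dict when the caught exception carries one, and SuiteValidator simply lets the nested exception propagate.

```python
class ValidationError(Exception):
    def __init__(self, message, errors=None):
        super(ValidationError, self).__init__(message)
        self.message = message
        self.errors = errors or {}


class NonBlankValidator(object):
    def __init__(self, field_name):
        self.field_name = field_name

    def validate(self, data):
        s = data.get(self.field_name)
        if not isinstance(s, str) or len(s) == 0:
            raise ValidationError(
                "Field '{}' must not be blank".format(self.field_name))
        return data


class ValidatorSuite(object):
    def __init__(self, validators=()):
        self.validators = list(validators)

    def validate(self, data):
        errors = {}
        for validator in self.validators:
            try:
                data = validator.validate(data)
            except ValidationError as e:
                # Assumed change: prefer the nested error dict when present.
                errors[validator.field_name] = e.errors or e.message
        if errors:
            raise ValidationError("Validation failed", errors=errors)
        return data


class SuiteValidator(object):
    def __init__(self, field_name, suite):
        self.field_name = field_name
        self.suite = suite

    def validate(self, data):
        # A nested ValidationError propagates with its errors dict intact.
        data[self.field_name] = self.suite.validate(
            data.get(self.field_name, {}))
        return data


inner = ValidatorSuite([NonBlankValidator('name'), NonBlankValidator('email')])
outer = ValidatorSuite([SuiteValidator('person', inner)])

try:
    outer.validate({'person': {'email': 'someone@example.com'}})
    nested_errors = {}
except ValidationError as e:
    nested_errors = e.errors
```

With this change nested_errors comes out as {'person': {'name': "Field 'name' must not be blank"}}, the same shape the functional and monadic versions produce.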

Words of warning

All of these techniques will work in any language with the following features:

  • Classes (or typeclasses or objects)
  • Exceptions
  • First-class functions

So, Python, Ruby, JavaScript & friends, Java 8, Scala, Clojure (naturally), C#, F#, Caml, and many many more.

However, as is always the case when using functional techniques in not-necessarily-functional languages, you should exercise caution. Whether you’re writing an open-source project or working with a team, you need to be sure that your code fits the contextually appropriate definition of “idiomatic”. And if you’re writing a library, at the very least you should ensure that it can be used in the common way; this is why all of the above implementations expose an interface that throws an exception.

I don’t believe that any of the above implementations are too strange to qualify as idiomatic Python, but your mileage may vary. Do you have a tale of stylistic culture clash?