March 31, 2023

One month with GitHub Copilot

I’ve been trialling GitHub Copilot for some weeks now, and so far I’ve been very impressed. It’s clear to me that this and similar tools will become a permanent part of many programmers’ toolkits (at least until the AI gets so clever that it obviates our jobs completely).

So what does it do?

Copilot is a junior developer that lives in your IDE. It lacks expert knowledge, but it knows code patterns and it’s super fast at typing, so anything basic enough for it to understand, it can help you do faster. It goes like this:

  • You type something – a function name and/or docstring, an if statement, or just a few characters.
  • After a short time, a suggestion appears in your IDE to complete the line.
  • You can press Tab to accept the suggestion, or keep typing to continue the loop.

Clearly, effort has gone into usability here; this is no tech demo, this is a product, full stop. Just as clearly, this workflow shares a lot with existing IDE autocompletion tools, such as IntelliSense. The suggestions it gives are of a truly different category, though.

Not just autocomplete

For starters, the suggestions are much larger. IntelliSense will complete the token you’re typing; Copilot can suggest lines and lines of code from a short cue. A few things I’ve noticed it excels at:

Completing repeated patterns, such as switch statements or long logic blocks. This, to me, is Copilot’s bread and butter, and I’ve come to expect it to complete structured switch/if-else chains after I write the first case.

For example, in one instance I was translating an enumeration of potential operations into Elasticsearch query terms. After the first case (something like if operator == Operator.EQ: return {"term": {field_name: value}}), Copilot completed the rest of the operators with only minor touchups: GT/LT/GTE/LTE were translated to range queries, and it even got the quirky {"bool": {"must_not": [{"term": {field_name: value}}]}} workaround for the Operator.NEQ case.

from enum import Enum

class Operator(Enum):
  # The enumeration of operations being translated.
  EQ = "eq"
  GT = "gt"
  GTE = "gte"
  LT = "lt"
  LTE = "lte"
  NEQ = "neq"

def get_elasticsearch_term(operator, field_name, value):
  if operator == Operator.EQ:
    # Everything after this comment was generated by Copilot while I was
    # writing this blog post, though it had help from the above 
    # description that I'd already written.
    return {"term": {field_name: value}}
  elif operator == Operator.GT:
    return {"range": {field_name: {"gt": value}}}
  elif operator == Operator.GTE:
    return {"range": {field_name: {"gte": value}}}
  elif operator == Operator.LT:
    return {"range": {field_name: {"lt": value}}}
  elif operator == Operator.LTE:
    return {"range": {field_name: {"lte": value}}}
  elif operator == Operator.NEQ:
    return {"bool": {"must_not": [{"term": {field_name: value}}]}}
  else:
    raise ValueError(f"Unknown operator {operator}")

Adding comments and docstrings is one of those things that I always tell myself I’ll do better at, and never actually improve – not until a few weeks ago, anyhow. Copilot doesn’t have much nuance here, but it’s perfectly capable of summarizing a function and its arguments, and even adding a few lines of example usage with some prompting.

Here’s the docstring it produced for the above function. Can you guess which parts I typed and which parts it generated?

"""
Translate an operator and value to an Elasticsearch query term.

>>> get_elasticsearch_term(Operator.EQ, "foo", "bar")
{'term': {'foo': 'bar'}}
"""

If you guessed that I typed """ and >>>, you’re right. The rest was Copilot.

It’s also pretty good at completing prose, such as this very post. I’m not taking its suggestions often, but it’s helping me steer away from statements banal enough to be generated by a machine – or at least, banal enough to be the machine’s first suggestion.

Limitations

The main thing to remember about Copilot is that it doesn’t know, it guesses – and sometimes it guesses wrong. It will frequently suggest a function, instance field, argument name, or other token that doesn’t exist. This is a failure mode that the deterministic tooling of a conventional IDE doesn’t have, so it sometimes catches me off guard. It wouldn’t surprise me if future tools find a way to integrate the IDE’s perfect knowledge of the codebase with generated suggestions.
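A hypothetical sketch of what that failure mode looks like (the class and method names here are invented for illustration, not a captured Copilot suggestion):

```python
class User:
    """A small class with exactly one attribute."""

    def __init__(self, name):
        self.name = name


user = User("Ada")

# A plausible-looking completion might reach for a method that was never
# defined - nothing flags it until the code actually runs.
try:
    greeting = f"Hello, {user.full_name()}!"
except AttributeError:
    # The token a conventional IDE would have offered instead:
    greeting = f"Hello, {user.name}!"

print(greeting)
```

The suggestion looks right, type-checks in your head, and fails only at runtime – exactly the kind of error a deterministic autocompleter can’t make.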

For less-common cases, it’s possible to conjure up specific code from GitHub which, depending on that code’s licensing, may not be desirable behavior. Here’s an example from my not-especially-popular open-source Clojure library failjure: Copilot completes (failjure/if-let-ok? with an example usage that isn’t taken verbatim from that function’s docstring, but is more of a portmanteau of docstring examples from that and other functions in the same file.

(failjure/if-let-ok? [x (some-fn)]
  (do-something x)
  (do-something-else))

I think the courts are working on this issue right now.

Finally, I’ve noticed that Copilot’s suggestions tend to match the quality of the surrounding code. Consistent naming and structure matter in particular; Copilot is much more likely to produce good suggestions when I’m working in a section of code that’s already well-organized. I suspect this is for the same reason that I have an easier time making predictions about consistent, well-organized codebases. I’m hopeful that this tendency will nudge Copilot users toward better habits.
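As a toy illustration of the kind of consistency that helps (my own contrived example, not Copilot output): after two functions like the first pair below, the third practically writes itself, and is just as easy to verify at a glance.

```python
def meters_to_feet(m):
    """Convert meters to feet."""
    return m * 3.28084


def meters_to_inches(m):
    """Convert meters to inches."""
    return m * 39.3701


# With naming and structure this consistent, the obvious next suggestion
# is another one-line converter in exactly the same shape.
def meters_to_yards(m):
    """Convert meters to yards."""
    return m * 1.09361
```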

What’s next?

I have no answers here, but after using Copilot for a few weeks, I’m sure that more is on the horizon. I now spend a good percentage of my time curating Copilot’s suggestions. How long until a Copilot-like tool is submitting PRs for me to review? Judging by GitHub’s documented efforts to do exactly this, not long!