Category – Tech
60 posts tagged with "Tech" (See all categories)

Type coverage for Sydent: evaluation

2021-12-17 — Tech — David Robertson

This is the third in a series of three posts which discuss recent work to improve type annotations in Sydent, the reference Matrix Identity server. Last time we discussed the mechanics of how we added type coverage. Now I want to reflect on how well we did. What information and guarantees did we gain from mypy? How could we track our progress and measure the effect of our work? And lastly, what other tools are out there apart from mypy?

The best parts of --strict

While the primary goal was to improve Sydent's coverage and robustness, to some extent this was an experiment too. How much could we get out of typing and static analysis, if we really invested in thorough annotations? Sydent is a small project that would make for a good testbed! I decided my goal would be to get Sydent passing mypy under --strict mode. This is a command line option which turns on a number of extra checks (though not everything); it feels similar to passing -Wall -Wextra -Werror to gcc. It's a little extreme, but Sydent is a small project and this would be a good chance to see how hard it would be. In my view, the most useful options implied by strict mode were as follows.

--check-untyped-defs

By default, mypy will only analyze the implementation of functions that are annotated. On the one hand, without annotations for inputs and the return type, it's going to be hard for mypy to thoroughly check the soundness of your function. On the other, it can still do good work with the type information it has from other sources. Mypy can

  • infer the type of literals, e.g. deducing x: str from x = "hello";
  • lookup the return types of standard library calls, via typeshed; and similarly
  • lookup the return types of any annotated functions in your code or dependencies.

The information is already available for free: we may as well try to use it to spot problems.

--disallow-untyped-defs and friends

This flag forces you to fully annotate every function. There are less extreme versions available, e.g. --disallow-incomplete-defs; but I think this is a good option to ensure full coverage of your module. It means you can rely on mypy's error output as a to-do list.

One downside to this: sometimes I felt like I was writing obvious boilerplate annotations, e.g.

    def __str__(self) -> str:
        ...

There was one particular example of this that crops up a lot. Mypy has a special exception for a class's __init__ and __init_subclass__ methods. If a return type annotation is missing, it will assume these functions return None instead of Any. (See here for its implementation.) This is normally compatible with --disallow-untyped-defs and --disallow-incomplete-defs, with one exception. If your __init__ function takes no arguments other than self, mypy won't consider it annotated, and you'll need to write -> None explicitly.

It's also worth mentioning --disallow-untyped-calls, which will cause an error if an annotated function calls an unannotated function. Again, it helps to ensure that mypy has a complete picture of the types in your function's implementation. It also helps to highlight dependencies—if you see errors from this, it might be more practical to annotate the functions and modules it's calling first.

--warn-return-any

If I've written a function and annotated it to return an int, mypy will rightly complain if its implementation actually goes on to return a str.

def foo() -> int:
    # error: Incompatible return value type (got "str", expected "int")
    return "hello"

If mypy isn't sure what type I'm returning though, i.e. if I'm returning an expression of type Any, then by default mypy will trust that we've done the right thing.

def i_promise_this_is_an_int():
    return "hello"

def bar() -> int:
    reveal_type(i_promise_this_is_an_int()) # Any
    return i_promise_this_is_an_int()

Enabling --warn-return-any will disable this behaviour; to make this error pass we'll have to prove to mypy that i_promise_this_is_an_int() really is an int. Sometimes that will be the case, and an extra annotation will provide the necessary proof. At other times (like in this example), investigation will prove that there really is a bug!

--strict-equality

This is a bit like a limited form of gcc's -Wtautological-compare. Mypy will report and reject equality tests between incompatible types. If mypy can spot that an equality is always False, there's a good chance of there being a bug in your program, or else an incorrect annotation.

I'm not sure how general this check is, since users can define their own types with their own rules for equality by overriding eq. Perhaps it only applies to built-in types?

Quantifying coverage

It was important to have some way to numerically evaluate our efforts to improve type coverage. It's a fairly abstract piece of work: there's nothing user-visible about it, unless we happen to discover a bug and fix it.

The most obvious metric is the number of total errors reported by mypy. Before the recent sprint, we had roughly 600 errors total.

dmr on titan in sydent on  HEAD (3dde3ad) via 🐍 v3.9.7 (env)
2021-11-08 12:35:37 ✔  $ mypy --strict sydent
Found 635 errors in 59 files (checked 78 source files)

This is a decent way to measure your progress when working on a particular module or package, but it's not perfect, because the errors aren't independent. Fixing one could fix another ten or reveal another twenty—the numeric value can be erratic.

Reports

I found mypy's various reports to be a better approach here. There were three reports I found particularly useful.

--html-report

This produces a main index page showing the "imprecision" of each module. At the bottom of the table is a total imprecision value across the entire project.

HTML report, index page. A table showing each module's precision and number of lines of code.

The precision for each module is broken down line-by-line and colour-coded accordingly, which is useful for getting an intuition for what makes a line imprecise. More on that shortly.

HTML report, module page. Most lines of source code are highlighted green; a minority are highlighted red.

--txt-report

This reproduces the index page from the html report as a plain text file. It's slightly easier to parse—that's how I got the data for the precision line graphs in part one of this series. That was a quick and dirty hack, though; a proper analysis of precision probably ought to read from the json or xml output formats. Here's a truncated sample:

+-----------------------------------+-------------------+----------+
| Module                            | Imprecision       | Lines    |
+-----------------------------------+-------------------+----------+
| sydent                            |   0.00% imprecise |    1 LOC |
| sydent.config                     |   0.00% imprecise |  266 LOC |
| sydent.config._base               |   0.00% imprecise |   31 LOC |
| sydent.config.crypto              |  15.94% imprecise |   69 LOC |
| ...                               |               ... |      ... |
| sydent.validators                 |   0.00% imprecise |   61 LOC |
| sydent.validators.common          |   7.35% imprecise |   68 LOC |
| sydent.validators.emailvalidator  |   1.30% imprecise |  154 LOC |
| sydent.validators.msisdnvalidator |   1.34% imprecise |  149 LOC |
+-----------------------------------+-------------------+----------+
| Total                             |   5.95% imprecise | 9707 LOC |
+-----------------------------------+-------------------+----------+

--any-exprs-report

Selecting this option generate two reports: any-exprs.txt and types-of-anys.txt. The latter is interesting to understand where the Anys come from, but the former is more useful for quantifying the progress of typing. Another sample:

                  Name   Anys   Exprs   Coverage
-------------------------------------------------
                sydent      0       2    100.00%
         sydent.config      0     185    100.00%
   sydent.config._base      0       3    100.00%
  sydent.config.crypto     34      80     57.50%
sydent.config.database      0       8    100.00%
   sydent.config.email      0      86    100.00%
                   ...    ...     ...        ...
-------------------------------------------------
                 Total    544   11366     95.21%

The breakdown in types-of-anys.txt has more gory detail. I found the "Unimported" column particularly interesting: it lets us see how exposed we are to a lack of typing in our dependencies.

                             Name   Unannotated   Explicit   Unimported   Omitted Generics   Error   Special Form   Implementation Artifact
-------------------------------------------------------------------------------------------------------------------------------------------
                           sydent             0          0            0                  0       0              0                         0
                    sydent.config             0          3            0                  0       0              0                         0
              sydent.config._base             0          0            0                  0       0              0                         0
                              ...           ...        ...          ...                ...     ...            ...                       ...
        sydent.util.versionstring             0         80            0                  0       0              0                         0
                sydent.validators             0          4            0                  0       0              0                         0
         sydent.validators.common             0         20            0                  0       0              0                         0
 sydent.validators.emailvalidator             0          8            0                  0       0              0                         0
sydent.validators.msisdnvalidator             0          8            0                  0       0              0                         0
-------------------------------------------------------------------------------------------------------------------------------------------
                            Total             9       1276          273                  0      37              0                        17

The meaning of precision

There are two metrics I chose to focus on:

  • the proportion of "imprecise" lines across the project; I also used the complement, precision = 100% - imprecision, and
  • the proportion of expressions whose type is not Any.

These are plotted in the graph at the top of this writeup. I could see that precision and the proportion of typed expressions were correlated, but I didn't understand how they differed. I couldn't see an explanation in the mypy docs, so I went digging into the mypy source code. My understanding is as follows.

  1. There are five kinds of precision. Full details are visible in the --lineprecision-report.
  2. Two kinds of precision, EMPTY and UNANALYZED convey no information, because there's nothing to analyze.
  3. A line is marked as precise, imprecise or any based on the expressions it uses.
    • An expression that has type Any leads to precision ANY.
    • I think an expression that involves Any but is not Any counts as imprecise. For instance, Dict[str, Any].
    • Remaining expressions have precision PRECISE.
  4. A line's precision is the worst of all its expressions' precisions.
    • ANY is worse than IMPRECISE, which is worse than PRECISE.

The "imprecision" number reported by mypy counts the number of lines classified as IMPRECISE or ANY.

On balance, my preferred metric is the line-level (im)precision percentage. There wasn't much difference between the two in my experience, but the colour-coded visualisation in the HTML report is a neat feature to have. Maybe in the future there could be a version of the HTML report that colour-codes each expression?

The larger typing ecosystem

There are plenty of articles out there about the typing. As well as the mypy blog itself, see Daniele Varrazzo's post on psycopg3 (2020), Dropbox's blog post (2019), Zulip's blog post (2016), Glyph's blog post on Protocols (2020) and a follow-up comparing them to zope.interface (2021) and Nylas's blog (2019). I'm sure there's plenty more out there.

It's worth highlighting the typeshed project, which maintains stubs for the standard library, plus popular third-party libraries. I submitted a PR to add a single type hint—it was a very pleasant experience! Microsoft has an incubator of sorts for stubs too.

Other typecheckers

After the sprint to improve coverage, I spent a short amount of time trying the alternative type checkers out there. Mypy isn't the only typechecker out there—other companies have built and open-sourced their own tools, with different strengths, weaknesses and goals. This is by no means an authoritative, exhaustive survey—just my quick notes.

Pyre (Facebook)

  • I couldn't work out how to configure paths to resolve import errors; in the end, I wasn't able to process much of Sydent's source code.
  • Couldn't get it to process annotations like syd: "Sydent" where "Sydent" is an import guarded by TYPE_CHECKING.
  • No plugin system that I could see. That said, it has a separate mode/program for running "Taint Analysis" to spot security issues.
  • Seemed stricter by default compared to mypy: there was less inference of types.
  • Also has its own strict mode.

Pyright (Microsoft)

  • Didn't seem to recognise getLogger as being imported from logging. Not sure what happened there—maybe something wrong with its bundled version of typeshed?
  • In a few places, Sydent uses urllib.parse.quote but only imports urllib. We must be unintentionally relying on our dependencies to import urllib.parse somewhere! Mypy didn't complain about this; pyright did.
  • Seemed to give a better explanations of why complex types were incompatible. For example:
    /home/dmr/workspace/sydent/sydent/replication/pusher.py
       /home/dmr/workspace/sydent/sydent/replication/pusher.py:77:16 - error: Expression of type "DeferredList" cannot be assigned to return type "Deferred[List[Tuple[bool, None]]]"
         TypeVar "[email protected]" is contravariant
           TypeVar "[email protected]" is invariant
             Tuple entry 2 is incorrect type
               Type "None" cannot be assigned to type "[email protected]_DeferredListResultItemT" (reportGeneralTypeIssues)
     /home/dmr/workspace/sydent/sydent/sms/openmarket.py
       /home/dmr/workspace/sydent/sydent/sms/openmarket.py:93:13 - error: Argument of type "dict[[email protected], list[bytes]]" cannot be assigned to parameter "rawHeaders" of type "Mapping[[email protected]__init__, Sequence[[email protected]__init__]] | None" in function "__init__"
         Type "dict[[email protected], list[bytes]]" cannot be assigned to type "Mapping[[email protected]__init__, Sequence[[email protected]__init__]] | None"
           TypeVar "[email protected]" is invariant
             Type "[email protected]" is incompatible with constrained type variable "AnyStr"
           Type cannot be assigned to type "None" (reportGeneralTypeIssues)
    This would have been really helpful when interpreting mypy's error reports; I'd love to see something like it in mypy. Here's another example where I tried running against a Synapse file.
    /home/dmr/workspace/synapse/synapse/storage/databases/main/cache.py
    /home/dmr/workspace/synapse/synapse/storage/databases/main/cache.py:103:53 - error: Expression of type "list[tuple[Unknown, Tuple[Unknown, ...]]]" cannot be assigned to declared type "List[Tuple[int, _CacheData]]"
      TypeVar "[email protected]" is invariant
        Tuple entry 2 is incorrect type
          Tuple size mismatch; expected 3 but received indeterminate number (reportGeneralTypeIssues)
    This is really valuable information. It's worth considering Pyright as an option to get a second opinion!
  • It looks like Pyright's name for Any is Unknown. I think that does a better job of emphasising that Unknown won't be type checked. I'd certainly be more reluctant to type x: Unknown versus x: Any!
  • Pyright is the machinery behind Pylance, which drives VS Code's Python extension. That alone probably makes it worthy of more eyes.
  • Seemed like it was the best-placed alternative typechecker to challenge mypy (the de-facto standard).

Pytype (Google)

  • Google internal? Seems to be maintained by one person semiregularly by "syncing" from Google.
  • Apparently contains a script merge-pyi to annotate a source file given a stub file.
  • No support for TypedDict: as soon as it saw one in Sydent, it stopped all analysis.
  • No Python 3.10 support (according to the README anyway).
  • I think it might use a different kind of typing semantics; its typing FAQ speaks of "descriptive typing" and a more lenient approach.

PyCharm

PyCharm has its own means to typechecking code as you write it. It's definitely caught bugs before, and having the instant feedback as you type is really nice! I have seen it struggle with zope.interface and some uses of Generics though.

Runtime uses of annotations

When annotations were first introduced, they were a generic means to associate Python objects with parts of a program. (It was only later that the community agreed that we really want to use them to annotate types). These annotations are available at runtime in the __annotations__ attribute. There's also a helper function in typing which will help resolve forward references.

>>> from typing import get_type_hints
>>> def foo(x: int) -> str: pass
...
>>> get_type_hints(foo)
{'x': <class 'int'>, 'return': <class 'str'>}

Programs and libraries are free to use these annotations at runtime as they see fit. The most well-known examples are probably dataclasses, attrs with auto_attribs=True and Pydantic. I'd be interested to learn if anyone else is consuming annotations at runtime!

Summary

All in all, in a two-week sprint we were able to get Sydent's mypy coverage from a precision of 83% up to 94%. Our work would have spotted the bytes-versus-strings bug; we understand why the missing await wasn't detected. We fixed other small bugs too as part of the process. As well as fix bugs, I've hopefully made the source code clearer for future readers (but that one is hard to quantify).

There's room to spin out contributions upstream too. I submitted two PRs to twisted upstream; have started to work on annotations for pynacl in my spare time; and submitted a quick fix to typeshed.

Looking forward, I think we'd get a quick gain from ensuring that our smaller libraries (signedjson, canonicaljson) are annotated. We'll be sticking with mypy for now—the mypy-zope plugin is crucial given our reliance on twisted. We're also working to improve Sygnal and Synapse—though not to the extreme standard of --strict across everything.

I'd say the biggest outstanding hole is our processing of JSON objects. There's too much Dict[str, Any] flying around. The ideal for me would be to define dataclass or attr.s class C, and be able to deserialise a JSON object to C, including automatic (deep) type checking. Pydantic sounds really close to what we want, but I'm told it will by default gladly interpret the json string "42" as the Python integer 42, which isn't what we'd like. More investigation needed there. There are other avenues to explore too, like jsonschema-typed, typedload or attrs-strict.

To end, I'd like to add a few personal thoughts. Having types available in the source code is definitely A Good Thing. But there is a part of me that wonders if it might have been worth writing our projects in a language which incorporates types from day one. There are always trade-offs, of course: runtime performance, build times, iteration speed, ease of onboarding new contributors, ease of deployment, availability of libraries, ability to shoot yourself in the foot... the list goes on.

On a more upbeat note, adding typing is a great way to get familiar with new source code. It involves a mixture of reading, cross-referencing, deduction, analysis, all across a wide variety of files. It'd be a lot easier to type as you write from the get-go, but typing after the fact is still a worthy use of time and effort.


Many thanks for reading! If you've got any corrections, comments or queries, I'm available on Matrix at @dmrobertson:matrix.org.

Type coverage for Sydent: annotation

2021-12-10 — Tech — David Robertson

This is the second in a series of three posts which discuss recent work to improve type annotations in Sydent, the reference Matrix Identity server. Last time we discussed the motivation for doing this work in the first place: the why. Now I want to talk about the how. How did we add annotations to individual files, and across the project as whole? What common idioms did we learn on the way?

The process of improving coverage

From experience adding typing to Synapse, we decided to annotate one module at a time. We configured a list of files in pyproject.toml which we knew passed mypy's checks. We could run mypy in CI to check that new PRs didn't introduce type problems in modules already covered.

From there, the workflow was

  1. Choose a new module to annotate. Add it to the list of files in mypy's configuration.
  2. Run mypy. See how many errors you get.
  3. Choose an error. Fix it. Re-run mypy.
  4. Repeat until no errors remaining.
  5. Submit for review.

There are two parts in there which involve a choice. Being honest, I made those choices unscientifically: I tried to choose the easy tasks to do first. My first target was actually the entire sydent.util subpackage. Probably a bit too large for a first bite! My thinking was that util sounded like something with few dependencies that would have impact across the whole source tree.

Within a given module, I'd try to fix easier errors first: partly for confidence, partly to build up momentum, and partly to get myself familiar with that piece of source code. For example, I'd often start by telling mypy that mylist = [] was actually was a List[str] (rather than the generic List[Any] which it would use otherwise).

Picking and choosing easy targets works fairly well, but sometimes that means you end up fixing an error that's really a symptom of an earlier one. Other times fixing one error, e.g. by giving a return type annotation to a function—would solve a series of other errors throughout the file. Watching the total number of errors mypy reports bob up and down was intriguing!

In retrospect, I think it would be smoother to generate some kind of dependency graph for the package. I'm imagining a DAG where whose vertices are modules, and there's an edge A -> B if A imports from B. The sinks of this DAG (i.e. modules which don't depend on any others in the package) are the ideal place to start: you can get something strictly typechecked there without having to annotate a long chain of dependencies across other files. Another strategy would be to see which modules were the least precise according to mypy's reports—but more on those next time.

You're at the mercy of your dependencies

I think this is my single biggest takeaway from the process of adding annotations to Sydent. I'll admit the phrasing is melodramatic, but I think it rings true.

Improving coverage boils down to giving the typechecker more information about your program. The more information it has, the more it can check—and the more errors it can spot. (Hopefully this doesn't make typing come across like a pyramid scheme.) If your dependencies aren't typed, mypy can't validate you're correctly providing inputs and correctly consuming outputs. You might have a bigger impact on overall typing coverage by annotating a dependency (directly or via stubs). I have a hunch that bugs are more likely in code that uses an external dependency: we're much more familiar with the details of our own source code compared to that of a third party we trust.

It's worth looking to see if your dependencies have a newer version including type annotations. Failing that, they may have a stub package added to typeshed and published on PyPI. I saw example of both cases when choosing how to configure mypy for Sydent. If some of your dependencies are under your control, consider annotating them—you'll feel the benefits across multiple projects pretty quickly.

Annotations when working with twisted

Twisted is Sydent's biggest dependency, and I certainly felt at its mercy! In particular, it has a few quirks which make it trickier to annotate applications using it. Here's a summary of the work we had to do to get those annotations working.

Partially typed modules and stubs

Early into the process, mypy reported that calling the function twisted.python.log.err was an error. It did so because I was running mypy in --strict mode. We'll talk more about why I did so and what this means next time; for now, it's enough to know that calling a function that isn't fully annotated from within a function that is constitutes an error under strict mode. twisted is partially annotated: many key modules and functions have type annotations, but others don't. I was reluctant to give up on --strict. Instead, I decided to stub the err function myself.

A stub is a cut-down version of a python function, class or module which lives in a .pyi file. All implementation details are removed; only type annotations remain. Stubs are useful when you want to write annotations for code you don't control. The typeshed library is probably the best example: a collection of third party stubs for the standard library, plus some popular third party packages. Microsoft's python-type-stubs is another example. They'd also solve the problem I mentioned at the end of the first part: I could use a stub to teach mypy that IResponse.headers was a Headers object.

Writing a stub for log.err was straightforward, thanks mainly to Twisted's thorough documentation. I chose to write it by hand, rather than use stubgen. I'd heard of the latter, but was reluctant to use it for a few reasons.

  • Twisted is a big project with many big files. I didn't want to commit huge stub files for me and my colleagues to maintain.
  • The more our stubs cover, the larger the risk of a stub becoming out-of-date with twisted itself. Upstream twisted is the best place for these annotations.
  • We only need annotations for the bits of twisted that we're using.
  • And, being honest: I wanted the low-level experience of writing stubs, cross-referencing between the source and working how to best capture its type semantics.

I think this decision to write a targeted stub made sense at the time. After all, twisted.python.logging.err is just one simple function! But for the project as a whole, I regret not using stubgen to generate module-level stubs. The reason for this is that a .pyi stub file accounts for an entire module: no more and no less. This means that the stubs I was writing for parts of twisted only covered the functions and classes I'd stubbed. Any existing annotations in the twisted source code would be ignored, along with any types that mypy was able to infer for itself.

I think it would have been more efficient to use stubgen to generate stubs and patch them up, rather than writing them. That would have helped avoid a few cases I encountered where stubbing one function would cause additional typechecking failures (because mypy was no longer examining the twisted source for that file). It would also have meant that I could just faithfully stub Twisted as it is; in practice, I would sometimes hesitate to stub to avoid having to cover another module. sydent.http was the most painful part of the source tree for this: that's where we make the most use of Twisted.

defer.inlineCallbacks

This is a decorator which allows us to write code in the style of async/await without actually using that syntax. (Twisted predates asyncio and the async/await syntax, introduced in Python 3.4 and 3.5 respectively. It was originally released in 2002, back when Python had released version 2.2.) We only use it in one place in Sydent nowadays, but I've seen used across Synapse too. I mention it here because it was a bit fiddly to annotate. Here's its use in Sydent:

    @defer.inlineCallbacks
    def request(
        self,
        method: bytes,
        uri: bytes,
        headers: Optional["Headers"] = None,
        bodyProducer: Optional["IBodyProducer"] = None,
    ) -> Generator["defer.Deferred[Any]", Any, IResponse]:

Here, request is a generator function because its body uses the yield keyword. Yielding allows the function to relinquish control back to twisted's reactor, only for its execution to be resumed asynchronously in the future. The Generator type takes three parameters:

  • a YieldType, Deferred[Any];
  • a SendType, Any; and
  • a ReturnType, IResponse.

Why have I opted to use Any here, when we've seen (and will see) that this limits mypy's ability to run checks? The answer is that we yield two different types within the function. Firstly a routing result:

        routing: _RoutingResult
        routing = yield defer.ensureDeferred(self._route_matrix_uri(parsed_uri))

and later, the IResponse we go on to return:

        res: IResponse
        res = yield agent.request(method, uri, headers, bodyProducer)
  • In this example:

  • We yield a value y: Deferred[Any].

  • That will later be resolved by twisted to an x: Any value. Twisted will send that value to request, and the execution continues.

  • This repeats, until we eventually return an IResponse.

I could more correctly annotate the YieldType as Deferred[Union[_RoutingResult, IResponse]] so that the SendType was Union[_RoutingResult, IResponse]. But this would mean we end up having some kind of type check at each yield point: a cast, or a type: ignore, or a runtime isinstance check. It didn't feel like it was worth the boilerplate, especially since I could narrow e.g. res: Any to res: IResponse with minimal effort.

This problem doesn't arise with using the async/await syntax:

async def foo() -> int:
    return 1

async def bar() -> None:
    x = await foo()
    reveal_type(foo())
    reveal_type(x)
$ mypy example.py
example.py:6: note: Revealed type is "typing.Coroutine[Any, Any, builtins.int]"
example.py:7: note: Revealed type is "builtins.int*"

Behind the scenes, I think that x = await foo() is really using the same mechanism as the inlineCallbacks approach.

  • An async def function is really a generator function behind the scenes.
  • When we await foo(), we yield the expression foo()
  • Then the machinery running our coroutine c will call c.send(x) to resume execution, where x is the value produced by waiting for foo().

With the await form, mypy knows two things:

  • the value foo() which was yielded should be Awaitable[T], and
  • the value x send to the coroutine should come from awaiting foo(), and therefore be of type T.

Mypy can't assume or enforce these rules for the yield form, which can yield and send whatever it likes. There's no reason why the send type should be related to the yield type. Here's a toy example:

from typing import Any, Dict, Generator


def generator_function() -> Generator[int, str, Dict[str, Any]]:
  y: int = 10
  x: str = yield y
  print("Coroutine was sent", x)  # -> Coroutine was sent hello
  return {"got": x, "done": True}

coroutine = generator_function()
y = next(coroutine)

try:
  coroutine.send("hello")
except StopIteration as e:
  return_value = e.value
  print(return_value)  # -> {'got': 'hello', 'done': True}

All in all, the handling of inlineCallbacks is a situation specific to working with (older?) twisted code. It's still nice to understand what's going on behind the scenes though!

zope.interface.Interface

Twisted makes use of zope's Interface to define a number of abstract interface classes. Speaking personally, I've not seen it used outside twisted, and I think that means it's not supported by much of the typechecking tooling. For example, I've definitely seen PyCharm struggle to realise that it's okay to pass a Response to a function which expects an IResponse! Here's a more complicated example where PyCharm isn't happy with me widening the type LoggingHostnameEndpoint to IStreamClientEndpoint, even though the latter implements the former.

Screenshot from pycharm showing a false positive warning

Mypy out of the box doesn't play well with a zope Interface (nor does any other typechecker I tried). Fortunately, the excellent mypy-zope plugin helps here: it teaches mypy that any class like Response which @implements(IResponse) can be passed in place of an IResponse.

Tricks of the trade

At this point I'd like to share a few generic lessons about typing I'd picked up. Nothing ground-breaking here: I think these are all fairly well-known. Hopefully they're the start of a good cheat sheet for annotating—though it pales in comparison to the Mypy cheat sheet.

Annotating decorators

Annotating decorators is fiddly, but it's also vitally important. An unannotated decorator will mask or throw away your decoratee's annotations! The way to write the annotation is best explained by the mypy docs, but briefly: it involves a TypeVar and sometimes a cast too. Here's an example.

from typing import TypeVar, Callable, Any, cast

F = TypeVar("F", bound=Callable[..., Any])

def decorator(input: F) -> F:
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        print(f"Calling {f.__name__}")
        return input_func(*args, **kwargs)
    return cast(F, wrapped)

The idea here is

  1. Use Callable[..., Any] to describe a generic function with no particular signature.
  2. Use that as a bound on a type variable F. At each usage of @decorator, mypy will deduce a more specific version of F, e.g. Callable[[int], str].
  3. Within that usage of @decorator, F is fixed to that specific type. We use -> F to express that "we return a function with the same signature as the decoratee".
  4. Unfortunately, we don't have a good way to tell mypy that wrapped also has that signature F. We resort to a cast to force mypy to accept this without proof.

This might change in the future, e.g. when ParamSpec is fully understood by mypy.

Prefer object over Any

Both of these are general types for expressing "I don't know anything about this expression". But only the former will undergo static type checks. We want those type checks to guard against bugs accidentally introduced in the future. For instance, imagine a class with an Any attribute.

from typing import Any
import dataclasses

@dataclasses.dataclass
class C:
    label: Any

Imagine in the future we add a new method which assumes that label is a string:

    def greeting(self) -> str:
        return "My name is " + self.label

Mypy will consider this valid, because no type-checking is done on an Any value. It will complain that it can't prove that greeting returns a str, if --warn-return-any is enabled; but putting that aside, it can't identify the call site as a bug. The bug slips through to runtime.

C(123).greeting()  # TypeError: can only concatenate str (not "int") to str

Replacing the Any with object does allow mypy to spot the problem.

error: Unsupported operand types for + ("str" and "object")  [operator]

# type: ignore and cast sparingly

There's a good chance that mypy knows better than we do, so we should only overrule it if there's no better option. There are two techniques for this. One option is to tell mypy to just silence the error, by appending a # type: ignore comment to the erroneous line. The other is to force it to accept that a certain expression has a given type: that's what cast is for.

I never like using either of these, but sometimes they're the most practical choice. I'd recommend two best practices for their use, however:

  1. Only ignore a specific error code, e.g. # type: ignore[assignment]. Those codes can be displayed by passing --show-error-codes to mypy.
  2. Leave a comment before every #type: ignore[...] and every cast to justify their correctness. I don't think this is a widely-held practice. I picked it up from the Rust world, where it's encouraged as a way to justify unsafe source code.

There's good stuff in typing

It's worth a read through the module documentation—plenty of things in there I wish I'd known about sooner. There's lots in the toolkit to chose from. Some bits I use fairly often include:

  • Protocol: lets you formalise duck typing. To use it, define a class that inherits from Protocol. Its methods and attributes are all stubs which describe what you require of objects belonging to this type. It's like an abstract base class or interface, but purely at typecheck time.
  • Generic: define your own generic types. A bit of a slippery slope to type mania!
  • Optional`: I use this all the time. It's always good to make your Nones explicit!
  • NewType: I haven't had a chance to use it much. As I understand it, it's a way to define a "strong typedef". For example, we can use it to distinguish lengths from durations, even if they're both represented by a float at runtime.

I want to call out two parts of typing in particular:

overload

overload is a way to provide extra information about a function depending on how it's called. For instance, consider this function which takes a str or bytes as input and returns its uppercase version.

def upper(x: Union[bytes, str]) -> Union[bytes, str]:
    return x.upper()

The problem with this annotation is that we'll have to check at every call site to see if the return value was a bytes or a str object. But we know that uppercasing a str gives us a str, and uppercasing a bytes gives us a bytes. We can use overload to express this.

from typing import overload

@overload
def upper(x: str) -> str: ...

@overload
def upper(x: bytes) -> bytes: ...

def upper(x: Union[bytes, str]) -> Union[bytes, str]:
    return x.upper()

The first two @overload definitions are like stubs: they're purely used for their annotations. The actual runtime implementation is specified at the end, undecorated. (NB: this specific example is better expressed without overloads, by using AnyStr.)

I was really impressed to see how mypy could use this overloading information. Here's an example:

from typing import overload, Literal

@overload
def f(x: int) -> Literal[1]: ...

@overload
def f(x: str) -> Literal[2]: ...

def f(x: object) -> int:
    if isinstance(x, int):
        return 1
    elif isinstance(x, str):
        return 2
    return 3

if f(10) == 2:
    print("potato")

We can see that f(10) == 1 and so the equality is always False: we'll never print the word "potato". Mypy can reason through this too, if we ask it nicely.

$ mypy example.py
Success: no issues found in 1 source file

$ mypy --strict-equality --warn-unreachable example.py
example.py:16: error: Non-overlapping equality check (left operand type: "Literal[1]", right operand type: "Literal[2]")
example.py:17: error: Statement is unreachable
Found 2 errors in 1 file (checked 1 source file)

TypedDict

I mentioned this in last week's post) when talking about the missing await bug. Subclassing from TypedDict allows us to define a new type whose values are dictionaries with

  • a fixed set of keys (some mandatory, some not), and
  • a fixed type for each key's value.

PEP 589 can best describe the motivation, but in short: there's a lot of source code out there that passes dictionaries around. TypedDict is a way to gradually add typechecking to that code, without having to refactor it to use e.g. a dataclass or a NamedTuple. Let's have a quick example.

from typing import List, TypedDict

class Person(TypedDict):
    full_name: str
    nicknames: List[str]
    age: int

p: Person = {
    "full_name": "David Matthew Robertson",
    "nicknames": ["dmr"],
    "age": 29,
}

Mypy will detect if you delete or omit a required key, or if you insert a key that's not part of the type.

# error: Key "age" of TypedDict "Person" cannot be deleted
del p["age"]

# error: Missing keys ("full_name", "nicknames", "age") for TypedDict "Person"
p2: Person = {}

# error: TypedDict "Person" has no key "eye_colour"
p["eye_colour"] = "brown"

TypedDict is a useful tool, but there are a few important gotchas to be aware of. In my view, these are all minor and worth putting up with, to benefit from the extra type information that a TypedDict provides. Let's take a look.

TypedDict is incompatible with Dict

This one surprised me at first. Why can't I pass my TypedDict to a function that accepts a generic dictionary?

import json
from typing import Dict, TypedDict

class Foo(TypedDict):
    bar: str

def print_size(d: Dict[object, object]) -> None:
    print(len(d))

f: Foo = {"bar": "baz"}
# error: Argument 1 to "print_size" has incompatible type "Foo"; expected "Dict[object, object]"
print_size(f)

The answer is that it wouldn't be type-safe! For all we know, the function print_size might mutate its argument d. It might add a key, remove a key, or change the type of a value. Each of those would break the contract that TypedDict is supposed to enforce, so mypy is correct to flag this as an error. This GitHub issue has more discussion.

There are few workarounds for this kind of problem.

  • The function might be able to accept a TypedDict directly.
  • The function could accept a Mapping instead of a Dict. This is type-safe because the Mapping type does not offer mutating methods such as pop, __setitem__, __del__; but it only works if the function does not mutate the dictionary.
  • A more drastic approach would discard the TypedDict information with a cast to Dict[str, object] or similar. If so, I'd strongly recommend a comment explaining why the cast is correct and safe.
Mixing mandatory and required fields

Secondly, defining a TypedDict with a mixture of optional and required keys is a little fiddly. All keys can be made optional by passing total=False in the class definition. To have some keys optional and others mandatory, we have to make two TypedDicts. Say for instance we wanted to allow a potentially-missing favourite_colour field to Person. We can't add an annotation favourite_colour: Optional[str] to the first class body. That's optional in a different sense: it would mean that favourite_colour is a mandatory field which is allowed to be None. Instead, we apply total=False to a subclass:

from typing import List, TypedDict

class _PersonRequired(TypedDict):
    full_name: str
    nicknames: List[str]
    age: int

class Person(_PersonRequired, total=False):
    favourite_colour: str

This means that a valid Person dictionary may have no favourite_colour key. If it is present, it must be a str.

The syntax is unfortunately a little clunky. PEP 655 acknowledges this and proposes an alternative.

TypedDict is typecheck-time only

One final note: a TypedDict does no validation or conversion whatsoever. If mypy can't know the keys and values a dictionary will have a typecheck-time, you'll need to validate it by hand at runtime. We see this a lot because when deserialising json objects into a dictionary. Here's an example.

import json
from typing import TypedDict

class Foo(TypedDict):
    bar: str

# note: Revealed type is "Any"
reveal_type(json.loads("{}"))
# no error. mypy can't do analysis on Any.
f: Foo = json.loads("{}")

Next time

This covers a lot of the machinery and the day-to-day process of annotating Sydent. The last part of this series will give us a chance to quantify our efforts and reflect on the wider typing ecosystem.


Many thanks for reading! If you've got any corrections, comments or queries, I'm available on Matrix at @dmrobertson:matrix.org.

Type coverage for Sydent: motivation

2021-12-03 — Tech — David Robertson

This is the first of three posts on improving type coverage in Sydent. Join us next Friday for the second part!

Sydent is the reference Matrix Identity server. It provides a lookup service, so that you can find a Matrix user via their email address or phone number (if they've chosen to share it).

We recently worked on improving Sydent's type coverage: the proportion of its source code with explicit annotations denoting the kind of data that each variable, expression and return value is expected to hold. These annotations are optional, but if present, they allow tools like mypy to analyze your programs and spot entire classes of bugs before you ship them! In this instance, we aimed to make Sydent pass mypy --strict, which it now does. Here's what the process looked like:

Coverage as measured by mypy. Precision and the number of typed expressions increase over the latter half of 2021.

Two lines show two different measures of how well-typed the project is. The grey region covers our two-week sprint towards improving coverage; the earliest data point is from just before previous efforts to improve typing earlier in the year.

In a series of posts, I'd like to reflect on this sprint and share what we've learned. In particular, I aim to:

  • explain why we wanted to improve type coverage now;
  • work through examples to see how (if?) mypy could have spotted bugs;
  • describe the annotation process;
  • illustrate common patterns I learned along the way;
  • discuss the checks that mypy provides; and finally
  • reflect on the state of Python's typing ecosystem.

In this first part, we'll concentrate on the first two topics; the second covers the middle two; and the third the last two.

Why do this now?

It took us a long time (too long) to notice that the Sydent instance serving matrix.org was failing to send SMS messages for verification. We suspected that something was going wrong with our API call to OpenMarket. Our first step was to improve logging, so we could start to deduce what was going wrong and why. Whilst trawling through logs, we spotted one problem which meant we weren't actually sending off the API request in the first place. Further investigation revealed a strings-versus-bytes confusion which meant that we would always (incorrectly) interpret the API response as having failed.

All in all, phone number verification was unknowingly broken in the 2.4.0 release, to be fixed in 2.4.6 a month later. How could we do better? Better test coverage is (as ever) one answer. But it struck me that the two bugs we'd encountered might be ripe for automatic detection:

  • we created an Awaitable but didn't await it or use it in any way, and
  • we tried to look up a str key in a dictionary which mapped bytes to bytes.

Could a static analysis tool like mypy detect these? How much work would it take to do so? Are there other bugs and problems we could spot with it? I was curious to answer these questions and learn more about the tools that Python's typing ecosystem provides.

Could typing have spotted these problems?

Let's start with the first of question: what can mypy detect?

The missing await

Instead of writing x = await foo(), we simply had x = foo() and didn't then go on to await x. Mypy doesn't have means to detect this at present. There was interest in this issue on such a feature, with related threads here and here.

Are there other opportunities to spot the error? Here's the relevant bit of source code from before the fix.

            sid = self.sydent.validators.msisdn.requestToken(
                phone_number_object, clientSecret, sendAttempt, brand
            )
            resp = {
                "success": True,
                "sid": str(sid),
                "msisdn": msisdn,
                "intl_fmt": intl_fmt,
            }

The call to requestToken produces a value of type Awaitable[int]. If we tried to assign that to an expression of type int we'd get an error that mypy can spot.

$ cat example.py
async def foo() -> int:
    return 1

async def bar():
    x = foo()      # no error
    y: int = foo() # error: rhs is Awaitable[int], but lhs expects int

$ mypy --check-untyped-defs example.py
example.py:6: error: Incompatible types in assignment (expression has type "Coroutine[Any, Any, int]", variable has type "int")
Found 1 error in 1 file (checked 1 source file)

Note that we have to specifically ask mypy to typecheck the body of bar by passing --check-untyped-defs; by default, mypy will only typecheck annotated code.

We might also have been able to detect the error by looking at how we used sid. Unfortunately, the only use of was in a conversion str(sid), which is a perfectly type-safe call for both sid: int and sid: Awaitable[int]. But let's put that aside for a second—suppose we added "sid": sid directly into the resp dictionary. Could mypy have spotted there was a problem with that?

Unfortunately not. Because resp has no annotation, we have to rely on how it's used to spot any type inconsistencies. There's only one use of resp: as the return value from its enclosing function, render_POST. Mypy's only chance to spot a type problem would be to compare the mypy's inferred type for resp to the return type of render_POST. What are those types? We can use reveal_type to see the former is Dict[str, object]. For the latter:

    @jsonwrap
    def render_POST(self, request: Request) -> JsonDict:

The return type is JsonDict, which is an alias for Dict[str, Any]. This is bad news, because Dict[str, object] and Dict[str, Any] are compatible. Digging a level deeper, this is because sid: Any holds true for both sid: int and sid: Awaitable[int]—so there's no error to spot here. The Any type is compatible with any other type whatsoever! Mypy uses Any as a way to defer all type checking to runtime; mypy won't (and can't!) statically analyse the usage of an expression of type Any. Indeed, mypy's reports will tell you how many Anys you're working with, and offer a variety of options to warn or error on usages of Any.

If we were inserting sid directly into a dictionary, we could do better by annotating the dictionary (or the function's return type) as a TypedDict. This is a way to specify a dictionary with a fixed set of keys, each with a fixed type. It comes in really handy for Sydent, Sygnal and Synapse—all of the Matrix APIs exchange JSON dictionaries, so anything we can do to teach mypy about their shape and types is gold dust.

In short, there were options for detecting this with some code changes, but no magic wand that would have spotted the error in the code as written.

The strings/bytes confusion

Our error was here:

        headers = dict(resp.headers.getAllRawHeaders())
        request_id = None
        if "X-Request-Id" in headers:
            request_id = headers["X-Request-Id"][0]

In this sample, resp.headers is a twisted.web.http_headers.Headers instance. getAllRawHeaders is documented as returning an iterable of (bytes, Sequence[bytes]) pairs. Even better, mypy can see this because getAllRawHeaders is annotated (many thanks to the twisted authors for this). Mypy should be able to deduce that we build a dictionary headers: Dict[bytes, Sequence[bytes]. We can check this using reveal_type:

        headers = dict(resp.headers.getAllRawHeaders())
        reveal_type(headers)
$ mypy
sydent/sms/openmarket.py:110: note: Revealed type is "builtins.dict[builtins.bytes*, typing.Sequence*[builtins.bytes]]"

(The * in builtins.bytes* here means mypy has inferred that the dictionary's keys are bytes, rather than being told explicitly that they must be bytes.)

That's all fine and dandy. But why didn't we spot this before if the annotations were all in place in twisted? Let's put aside the fact that, erm, we weren't running mypy in Sydent's CI until the recent sprint, unlike our other projects. Checking out the problematic version, we can run mypy on the file we know to contain the bug.

$ git checkout v2.4.0
$ mypy --strict sydent/sms/openmarket.py
sydent/sms/openmarket.py:82: error: Dict entry 0 has incompatible type "str": "int"; expected "str": "str"  [dict-item]

Huh. Mypy spots something, but not the error we were hoping for. What's going on here? We can ask mypy to show its working with reveal_type again.

        resp = await self.http_cli.post_json_get_nothing(
            API_BASE_URL, send_body, {"headers": req_headers}
        )
        reveal_type(resp)
        headers = dict(resp.headers.getAllRawHeaders())
        reveal_type(resp.headers)
        reveal_type(resp.headers.getAllwRawHeaders())
        reveal_type(headers)

This yields:

$ mypy sydent/sms/openmarket.py
sydent/sms/openmarket.py:82: error: Dict entry 0 has incompatible type "str": "int"; expected "str": "str"  [dict-item]
sydent/sms/openmarket.py:102: note: Revealed type is "twisted.web.iweb.IResponse*"
sydent/sms/openmarket.py:104: note: Revealed type is "Any"
sydent/sms/openmarket.py:105: note: Revealed type is "Any"
sydent/sms/openmarket.py:106: note: Revealed type is "builtins.dict[Any, Any]"
Found 1 error in 1 file (checked 1 source file)

Ahh, the Any type. As mentioned above, this represents a value whose type can't be statically determined. We're left to runtime checks to detect the problem. But we won't detect it at runtime, because dictionaries don't enforce any kind of type requirements on their keys and values.

The problem here is that mypy can't see that resp.headers is a twisted Headers object. If we could inform it of this, mypy would spot our bug:

        import twisted.web.http_headers
        raw_headers: twisted.web.http_headers.Headers = resp.headers
        reveal_type(resp)
        headers = dict(raw_headers.getAllRawHeaders())
        reveal_type(raw_headers)
        reveal_type(raw_headers.getAllRawHeaders())
        reveal_type(headers)
$ mypy sydent/sms/openmarket.py
sydent/sms/openmarket.py:82: error: Dict entry 0 has incompatible type "str": "int"; expected "str": "str"  [dict-item]
sydent/sms/openmarket.py:104: note: Revealed type is "twisted.web.iweb.IResponse*"
sydent/sms/openmarket.py:106: note: Revealed type is "twisted.web.http_headers.Headers"
sydent/sms/openmarket.py:107: note: Revealed type is "typing.Iterator[Tuple[builtins.bytes, typing.Sequence[builtins.bytes]]]"
sydent/sms/openmarket.py:108: note: Revealed type is "builtins.dict[builtins.bytes*, typing.Sequence*[builtins.bytes]]"
sydent/sms/openmarket.py:114: error: Invalid index type "str" for "Dict[bytes, Sequence[bytes]]"; expected type "bytes"  [index]
sydent/sms/openmarket.py:114: error: Argument 1 to "split" of "bytes" has incompatible type "str"; expected "Optional[bytes]"  [arg-type]
Found 3 errors in 1 file (checked 1 source file)

There it is, on line 114: Invalid index type "str" for "Dict[bytes, Sequence[bytes]]"; expected type "bytes".

Unfortunately it'd be a pain to annotate our application code to mark every use of IResponse.headers as a Headers object. We'll see a better way to do things in this the next post, which discusses the nitty-gritty details of adding annotations file-by-file.


Many thanks for reading! If you've got any corrections, comments or queries, I'm available on Matrix at @dmrobertson:matrix.org.

How the UK's Online Safety Bill threatens Matrix

2021-05-19 — Tech, General — Denise Almeida

Last week the UK government published a draft of the proposed Online Safety Bill, after having initially introduced formal proposals for said bill in early 2020. With this post we aim to shed some light on its potential impacts and explain why we think that this bill - despite having great intentions - may actually be setting a dangerous precedent when it comes to our rights to privacy, freedom of expression and self determination.

The proposed bill aims to provide a legal framework to address illegal and harmful content online. This focus on “not illegal, but harmful” content is at the centre of our concerns - it puts responsibility on organisations themselves to arbitrarily decide what might be harmful, without any legal backing. The bill itself does not actually provide a definition of harmful, instead relying on service providers to assess and decide on this. This requirement to identify what is “likely to be harmful” applies to all users, children and adults. Our question here is - would you trust a service provider to decide what might be harmful to you and your children, with zero input from you as a user?

Additionally, the bill incentivises the use of privacy-invasive age verification processes which come with their own set of problems. This complete disregard of people’s right to privacy is a reflection of the privileged perspectives of those in charge of the drafting of this bill, which fails to acknowledge how actually harmful it would be for certain groups of the population to have their real life identity associated with their online identity.

Our view of the world, and of the internet, is largely different from the one presented by this bill. Now, this categorically does not mean we don’t care about online safety (it is quite literally our bread and butter) - we just fundamentally disagree with the approach taken.

Whilst we sympathise with the government’s desire to show action in this space and to do something about children’s safety (everyone’s safety really), we cannot possibly agree with the methods.

Back in October of 2020 we presented our proposed approach to online safety - ironically also in response to a government proposal, albeit about encryption backdoors. In it, we briefly discussed the dangers of absolute determinations of morality from a single cultural perspective:

As uncomfortable as it may be, one man’s terrorist is another man’s freedom fighter, and different jurisdictions have different laws - and it’s not up to the Matrix.org Foundation to play God and adjudicate.

We now find ourselves reading a piece of legislation that essentially demands these determinations from tech companies. The beauty of the human experience lies with its diversity and when we force technology companies to make calls about what is right or wrong - or what is “likely to have adverse psychological or physical impacts” on children - we end up in a dangerous place of centralising and regulating relative morals. Worst of all, when the consequence of getting it wrong is criminal liability for senior managers what do we think will happen?

Regardless of how omnipresent it is in our daily lives, technology is still not a solution for human problems. Forcing organisations to be judge and jury of human morals for the sake of “free speech” will, ironically, have severe consequences on free speech, as risk profiles will change for fear of liability.

Forcing a “duty of care” responsibility on organisations which operate online will not only drown small and medium sized companies in administrative tasks and costs, it will further accentuate the existing monopolies by Big Tech. Plainly, Big Tech can afford the regulatory burden - small start-ups can’t. Future creators will have their wings clipped from the offset and we might just miss out on new ideas and projects for fear of legal repercussions. This is a threat to the technology sector, particularly those building on emerging technologies like Matrix. In some ways, it is a threat to democracy and some of the freedoms this bill claims to protect.

These are, quite frankly, steps towards an authoritarian dystopia. If Trust & Safety managers start censoring something as natural as a nipple on the off chance it might cause “adverse psychological impacts” on children, whose freedom of expression are we actually protecting here?

More specifically on the issue of content moderation: the impact assessment provided by the government alongside this bill predicts that the additional costs for companies directly related to the bill will be in the billions, over the course of 10 years. The cost for the government? £400k, in every proposed policy option. Our question is - why are these responsibilities being placed on tech companies, when evidently this is a societal problem?

We are not saying it is up to the government to single-handedly end the existence of Child Sexual Abuse and Exploitation (CSAE) or extremist content online. What we are saying is that it takes more than content filtering, risk assessments and (faulty) age verification processes for it to end. More funding for tech literacy organisations and schools, to give children (and parents) the tools to stay safe is the first thing that comes to mind. Further investment in law enforcement cyber units and the judicial system, improving tech companies’ routes for abuse reporting and allowing the actual judges to do the judging seems pretty sensible too. What is absolutely egregious is the degradation of the digital rights of the majority, due to the wrongdoings of a few.

Our goal with this post is not to be dramatic or alarmist. However, we want to add our voices to the countless digital rights campaigners, individuals and organisations that have been raising the alarm since the early days of this bill. Just like with coercive control and abuse, the degradation of our rights does not happen all at once. It is a slippery slope that starts with something as (seemingly) innocuous as mandatory content scanning for CSAE content and ends with authoritarian surveillance infrastructure. It is our duty to put a stop to this before it even begins.

Twitter card image credit from Brazil, which feels all too familiar right now.

The Matrix Space Beta!

2021-05-17 — Tech, General — Matthew Hodgson

Hi all,

As many know, over the years we've experimented with how to let users locate and curate sets of users and rooms in Matrix. Back in Nov 2017 we added 'groups' (aka 'communities') as a custom mechanism for this - introducing identifiers beginning with a + symbol to represent sets of rooms and users, like +matrix:matrix.org.

However, it rapidly became obvious that Communities had some major shortcomings. They ended up being an extensive and entirely new API surface (designed around letting you dynamically bridge the membership of a group through to a single source of truth like LDAP) - while in practice groups have enormous overlap with rooms: managing membership, inviting by email, access control, power levels, names, topics, avatars, etc. Meanwhile the custom groups API re-invented the wheel for things like pushing updates to the client (causing a whole suite of problems). So clients and servers alike ended up reimplementing large chunks of similar functionality for both rooms and groups.

And so almost before Communities were born, we started thinking about whether it would make more sense to model them as a special type of room, rather than being their own custom primitive. MSC1215 had the first thoughts on this in 2017, and then a formal proposal emerged at MSC1772 in Jan 2019. We started working on this in earnest at the end of 2020, and christened the new way of handling groups of rooms and users as... Spaces!

Spaces work as follows:

  • You can designate specific rooms as 'spaces', which contain other rooms.
  • You can have a nested hierarchy of spaces.
  • You can rapidly navigate around that hierarchy using the new 'space summary' (aka space-nav) API - MSC2946.
  • Spaces can be shared with other people publicly, or invite-only, or private for your own curation purposes.
  • Rooms can appear in multiple places in the hierarchy.
  • You can have 'secret' spaces where you group your own personal rooms and spaces into an existing hierarchy.

Today, we're ridiculously excited to be launching Space support as a beta in matrix-react-sdk and matrix-android-sdk2 (and thus Element Web/Desktop and Element Android) and Synapse 1.34.0 - so head over to your nearest Element, make sure it's connected to the latest Synapse (and that Synapse has Spaces enabled in its config) and find some Space to explore! #community:matrix.org might be a good start :)

The beta today gives us the bare essentials: and we haven't yet finished space-based access controls such as setting powerlevels in rooms based on space membership (MSC2962) or limiting who can join a room based on their space membership (MSC3083) - but these will be coming asap. We also need to figure out how to implement Flair on top of Spaces rather than Communities.

This is also a bit of a turning point in Matrix's architecture: we are now using rooms more and more as a generic way of modelling new features in Matrix. For instance, rooms could be used as a structured way of storing files (MSC3089); Reputation data (MSC2313) is stored in rooms; Threads can be stored in rooms (MSC2836); Extensible Profiles are proposed as rooms too (MSC1769). As such, this pushes us towards ensuring rooms are as lightweight as possible in Matrix - and that things like sync and changing profile scale independently of the number of rooms you're in. Spaces effectively gives us a way of creating a global decentralised filesystem hierarchy on top of Matrix - grouping the existing rooms of all flavours into an epic multiplayer tree of realtime data. It's like USENET had a baby with the Web!

For lots more info from the Element perspective, head over to the Element blog. Finally, the point of the beta is to gather feedback and fix bugs - so please go wild in Element reporting your first impressions and help us make Spaces as awesome as they deserve to be!

Thanks for flying Matrix into Space;

Matthew & the whole Spaces (and Matrix) team.

Introducing the Pinecone overlay network

2021-05-06 — Tech — Neil Alexander

Since the end of 2019, we have spent quite a bit of time thinking about and exploring different technologies whilst building various demos for P2P Matrix. Our mission for P2P Matrix is to evolve Matrix into a hybrid between today's server-oriented network and a pure P2P network - empowering users to have total autonomy and privacy over their data if they want (by storing it in P2P Matrix, by embedding their server into their Matrix client), while also letting users store their data in serverside nodes if they so desire.

The goal is to protect metadata much better (as users no longer have to depend on a server run by someone else to communicate), as well as drive new features such as account portability, multi-homed accounts, low-bandwidth Matrix and smarter federation transports - and provide support for internet-less mesh communication via Matrix which can also interoperate with the wider network. You can read more about it in our Introducing P2P Matrix blog post from last summer, or watch our FOSDEM 2021 talk where we previewed Pinecone. It's important to note that this has been a small but important long-term project for Matrix, and has been progressing entirely outside our business-as-usual work of improving the core protocol and reference implementations.

As the project has progressed, we've built a variety of prototypes using existing libraries (go-libp2p, js-libp2p and Yggdrasil), demonstrating what an early P2P Matrix might feel like if it were running on a mobile device, in the web browser and so on using such an overlay network. Each of these demos has taught us something new, and so in October 2020 we decided to take this knowledge to build an experimental new overlay network of our own.

Pinecone is designed to provide end-to-end encrypted communications between devices, regardless of how they are connected to one another, in a lightweight and self-arranging fashion. The routing protocol is a hybrid, taking inspiration from Yggdrasil by building a global spanning tree, but rather than forwarding all traffic using the spanning tree topology, we use it as a bootstrap routing mechanism for a line/snake topology, ordered by their ed25519 public keys, which we have affectionately named SNEK (Sequentially Networked Edwards Key) routing.

Nodes seek out their closest keyspace neighbours on the network and paths are built between these pairs of nodes, similar to how a Chord DHT functions, populating the routing tables of intermediate nodes in the process. These paths are then used to forward traffic without having to perform up-front searches, allowing for very fast connection setups between overlay nodes. These paths are resilient to network topology changes and handle node mobility considerably better than any other name-independent routing scheme that we have seen — early results are very promising so far. We have also been experimenting with a combination of the μTP (Micro Transport Protocol) and TLS to provide stateful connection setup, congestion control and end-to-end encryption for all federation traffic carried over the Pinecone network.

Pinecone simulator showing line/snake logical network topology

If Pinecone works out, our intention is to collaborate with the libp2p and IPFS team to incorporate Pinecone routing into libp2p (if they'll have us!) while incorporating their gossipsub routing to improve Matrix federation... and get the best of both worlds :)

Today we're releasing the source code for our current early implementation of Pinecone — you can get it from GitHub right now! It's very experimental still and not very well optimised yet, but it is the foundation of our latest mobile P2P Matrix demos, which support P2P Matrix over both Bluetooth Low Energy mesh networks, multicast DNS discovery within a LAN, and/or by routing through static Pinecone peers on the Internet:

Building a routing overlay is only the first step in the journey towards P2P Matrix. We will also be looking closely in the coming months at improving the Matrix federation protocol to work well in mixed-connectivity scenarios (rather than the full mesh approach used today) as well as decentralised identities, hybrid deployments with existing homeservers and getting Dendrite (the Matrix homeserver which is embedded into the current P2P demos) more stable and feature-complete.

The long-term plan could look something like this:

Diagram showing possible P2P Matrix stack

Most discussion around P2P Matrix takes place in #p2p:matrix.org, so if you are interested in what's going on, please join us there!

NextPage 2