Tech

61 posts tagged with "Tech"

Matrix.org homeserver outage (25th Jan 2017)

25.01.2017 00:00 — Tech Matthew Hodgson

Hi folks,

As many will have noticed there was a major outage on the Matrix homeserver for matrix.org last night (UK-time). This impacted anyone with an account on the matrix.org server, as well as anyone using matrix.org-hosted bots & bridges. As Matrix rooms are shared over all participants, rooms with participants on other servers were unaffected (for users on those servers). Here's a quick explanation of what went wrong (times are UTC):

  • 2017-01-24 16:00 - We notice that we're badly running out of disk space on the matrix.org backup postgres replica. (It turns out the backup box, whilst identical hardware to the master, had been built out as RAID-10 rather than RAID-5 and so has less disk space.)
  • 2017-01-24 17:00 - We decide to drop a large DB index: event_push_actions(room_id, event_id, user_id, profile_tag), which was taking up a disproportionate amount of disk space, on the basis that it didn't appear to be being used according to the postgres stats. All seems good.
  • 2017-01-24 ~23:00 - The core matrix.org team go to bed.
  • 2017-01-24 23:33 - Someone redacts an event in a very active room (probably #matrix:matrix.org) which necessitates redacting the associated push notification from the event_push_actions table. This takes out a lock within persist_event, which is then blocked on deleting the push notification. It turns out that this deletion relies on the DB index we had dropped, causing the query to run for hours whilst holding the transaction lock. The symptoms are that anything reading events from the DB is blocked on the transaction, causing messages not to be relayed to other clients or servers despite appearing to send correctly. Meanwhile, the fact that events are being received by the server fine (including over federation) makes the monitoring graphs look largely healthy.
  • 2017-01-24 23:35 - End-to-end monitoring detects problems, and sends alerts into Pagerduty and various Matrix rooms. Unfortunately, we'd failed to upgrade the Pagerduty trial into a paid account a few months ago, so the alerts are lost.
  • 2017-01-25 08:00 - Matrix team starts to wake up and spot problems, but confusion over the right escalation process (especially with Matthew on holiday) means folks assume that other members of the team must already be investigating.
  • 2017-01-25 09:00 - Server gets restarted, service starts to resume, although box suffers from load problems as traffic tries to catch up.
  • 2017-01-25 09:45 - Normal service on the homeserver itself is largely resumed (other than bridges; see below)
  • 2017-01-25 10:41 - Root cause established and the redaction path is patched on matrix.org to stop a recurrence.
  • 2017-01-25 11:15 - Bridges are seen to be lagging and taking much longer to recover than expected. Decision made to let them continue to catch up normally rather than risk further disruption (e.g. IRC join/part spam) by restarting them.
  • 2017-01-25 13:00 - All hosted bridges returned to normal.
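
To illustrate why dropping that index hurt, here is a minimal sketch using Python's built-in sqlite3 rather than postgres (an assumption made purely so the example is self-contained; the two query planners differ in detail). The table and column names come from the timeline above; the index name epa_room_event and the sample IDs are made up. It asks the database how it would find the rows for a redaction-style delete, with and without the index:

```python
import sqlite3


def delete_plan(conn):
    """Return the query plan for a redaction-style delete as one string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "DELETE FROM event_push_actions WHERE room_id = ? AND event_id = ?",
        ("!room:example.org", "$event:example.org"),  # hypothetical IDs
    ).fetchall()
    # Each plan row is (id, parent, notused, detail); keep the detail text.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_push_actions "
    "(room_id TEXT, event_id TEXT, user_id TEXT, profile_tag TEXT)"
)

# No index: the only way to find matching rows is to walk the whole table.
plan_without_index = delete_plan(conn)

# Same column order as the index that was dropped (the name is an assumption).
conn.execute(
    "CREATE INDEX epa_room_event ON event_push_actions "
    "(room_id, event_id, user_id, profile_tag)"
)
plan_with_index = delete_plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the plan is a full table scan, which on a table as large as event_push_actions is the kind of thing that turns a quick delete into one that runs for hours while holding its lock.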

Obviously this is rather embarrassing, and a huge pain for everyone using the matrix.org homeserver - many apologies indeed for the outage. On the plus side, all the other Matrix homeservers out there carried on without noticing any problems (which actually complicated spotting that things had broken, given many of the core team primarily use their personal homeservers).

In some ways the root cause here is that the core team has been focusing all its energy recently on improving the overall Matrix codebase rather than operational issues on matrix.org itself, and as a result our ops practices have fallen behind (especially as the health of the Matrix ecosystem as a whole is arguably more important than the health of a single homeserver deployment). However, we clearly need to improve things here given the number of people (>750K at the last count) dependent on the Matrix.org homeserver and its bridges & bots.

Lessons learnt on our side are:

  • Having monitoring graphs & thresholds set up on all the right things is not enough: monitoring alerts actually have to be routed somewhere useful - i.e. phone calls to the team's phones. Pagerduty is now set up and running properly to this end.
  • Make sure that people know to wake up the right people anyway if the monitoring alerting system fails.
  • To be even more paranoid about hotfixes to production at 5pm, especially if they can wait 'til the next day (as this one could have).
  • To investigate ways to rapidly recover bridges without causing unnecessary disruption.

Apologies again to everyone who was bitten by this - we're doing everything we can to ensure it doesn't happen again.

Matthew & the team.

Synapse 0.18.7 is out - Please upgrade, especially if on 0.18.5 or 0.18.6.

06.01.2017 00:00 — Tech Matthew Hodgson

Hi all,

TL;DR (original, since superseded by the update below): Please upgrade to Synapse 0.18.6, especially if you are on 0.18.5 which is a bad release.

TL;DR: Please upgrade to Synapse 0.18.7 - especially if you are on 0.18.5 or 0.18.6 which both have serious federation bugs.

Synapse 0.18.5 contained a really nasty regression in the federation code which causes servers to echo transactions that they receive back out to the other servers participating in a room. This has effectively resulted in a gradual amplification of federation traffic as more people have installed 0.18.5, causing every transaction to be received N times over where N is the number of servers in the room.
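
As a rough illustration of that amplification, here is a toy model (not Synapse code). It assumes a room with N servers, one transaction sent by a single origin, and, in the buggy case, every receiving server echoing the transaction once to every other server in the room. The function name and the one-echo-per-server rule are simplifications of ours:

```python
from collections import Counter


def deliveries(num_servers, echo_bug):
    """Count how many times each server receives one transaction."""
    origin = 0
    received = Counter()

    # Normal federation: the origin sends the transaction to everyone else.
    for dest in range(num_servers):
        if dest != origin:
            received[dest] += 1

    if echo_bug:
        # Modelled regression: each receiver re-sends to all other servers.
        for echoer in range(num_servers):
            if echoer == origin:
                continue
            for dest in range(num_servers):
                if dest != echoer:
                    received[dest] += 1

    return received


healthy = deliveries(5, echo_bug=False)
buggy = deliveries(5, echo_bug=True)
print(sum(healthy.values()), sum(buggy.values()))
```

In this model a five-server room goes from 4 deliveries in total to 20, and each receiving server sees the same transaction 4 times (N-1) instead of once, so per-server load grows linearly with the size of the room, in line with the behaviour described above.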

We'll do a full write-up once we're happy we've tracked down all the root problems here, but the short story is that this hit critical mass around Dec 26, where typical Synapses started to fail to keep up with the traffic - especially when requests hit some of the more inefficient or buggy codepaths in Synapse.  As servers started to overload with inbound connections, this in turn started to slow down and consume resources on the connecting servers - especially due to an architectural mistake in Synapse which blocks inbound connections until the request has been fully processed (which could require the receiving server in turn to make outbound connections), rather than releasing the inbound connection asap.  This hit the point that servers were running out of file descriptors due to all the outbound and inbound connections, at which point they started to entirely tarpit inbound connections, resulting in a slow feedback loop making the whole situation even worse.
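
The difference between holding an inbound connection open until processing finishes and releasing it as soon as possible can be sketched with asyncio. This is only an illustration of the general pattern, not how Synapse or the federation protocol handles it; all the names below are hypothetical, and the sleep stands in for slow work such as making outbound requests:

```python
import asyncio


async def process(txn, processed):
    await asyncio.sleep(0.01)  # stand-in for slow processing
    processed.append(txn)


async def worker(queue, processed):
    # Drains the queue in the background, independent of inbound connections.
    while True:
        txn = await queue.get()
        await process(txn, processed)
        queue.task_done()


async def handle_inbound(txn, queue):
    # Accept the transaction and reply straight away; the sender's
    # connection is no longer tied up while we do the real work.
    queue.put_nowait(txn)
    return "accepted"


async def main():
    queue = asyncio.Queue()
    processed = []
    task = asyncio.create_task(worker(queue, processed))

    replies = []
    for txn in ("txn1", "txn2", "txn3"):
        replies.append(await handle_inbound(txn, queue))
    processed_at_reply = len(processed)  # nothing processed yet

    await queue.join()  # wait for the background work to finish
    task.cancel()
    return replies, processed_at_reply, processed


replies, processed_at_reply, processed = asyncio.run(main())
print(replies, processed_at_reply, processed)
```

All three replies go out before any transaction has been processed. A real server would also need to bound the queue, otherwise the overload simply moves from file descriptors to memory.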

We've spent the last two weeks hunting all the individual inefficient requests which were mysteriously starting to cause more problems than they ever had before; then trying to understand the feedback misbehaviour; before finally discovering the regression in 0.18.5 as the plausible root cause of the problem.  Troubleshooting has been complicated by most of the team having unplugged for the holidays, and because this is the first (and hopefully last!) failure mode to be distributed across the whole network, making debugging something of a nightmare - especially when the overloading was triggering a plethora of different exotic failure modes.  Huge thanks to everyone who has shared their server logs with the team to help debug this.

Some of these failure modes are still happening (and we're working on fixing them), but we believe that if everyone upgrades away from the bad 0.18.5 release most of the symptoms will go away, or at least go back to being as bad as they were before. Meanwhile, if you find your server suddenly grinding to a halt after upgrading to 0.18.7 please come tell us in #matrix-dev:matrix.org.

We're enormously sorry if you've been bitten by the federation instability this has caused - and many many thanks for your patience whilst we've hunted it down.  On the plus side, it's given us a lot of very useful insight into how to implement federation in future homeservers to not suffer from any of these failure modes.  It's also revealed the root cause of why Synapse's RAM usage is quite so bad - it turns out that it actually idles at around 200MB with default caching, but there's a particular codepath which causes it to spike temporarily by 1GB or so - and that RAM is then not released back to the OS.  We're working on a fix for this too, but it'll come after 0.18.7.

Unfortunately the original release of 0.18.6 still exhibits the root bug, but 0.18.7 (originally released as 0.18.7-rc2) should finally fix this.  Sorry for all the upgrades :(

So please upgrade as soon as possible to 0.18.7. Debian packages are available as normal.

thanks,

Matthew

🔗Changes in synapse v0.18.7 (2017-01-09)

  • No changes from v0.18.7-rc2

🔗Changes in synapse v0.18.7-rc2 (2017-01-07)

Bug fixes:

  • Fix error in rc1's discarding invalid inbound traffic logic that was incorrectly discarding missing events

🔗Changes in synapse v0.18.7-rc1 (2017-01-06)

Bug fixes:

  • Fix error in PR #1764 to actually fix the nightmare #1753 bug.
  • Improve deadlock logging further
  • Discard inbound federation traffic from invalid domains, to immunise against #1753

🔗Changes in synapse v0.18.6 (2017-01-06)

Bug fixes:

  • Fix bug when checking if a guest user is allowed to join a room - thanks to Patrik Oldsberg (PR #1772)

🔗Changes in synapse v0.18.6-rc3 (2017-01-05)

Bug fixes:

  • Fix bug where we failed to send ban events to the banned server (PR #1758)
  • Fix bug where we sent event that didn't originate on this server to other servers (PR #1764)
  • Fix bug where processing an event from a remote server took a long time because we were making long HTTP requests (PR #1765, PR #1744)
Changes:
  • Improve logging for debugging deadlocks (PR #1766, PR #1767)

🔗Changes in synapse v0.18.6-rc2 (2016-12-30)

Bug fixes:

  • Fix memory leak in twisted by initialising logging correctly (PR #1731)
  • Fix bug where fetching missing events took an unacceptable amount of time in large rooms (PR #1734)

🔗Changes in synapse v0.18.6-rc1 (2016-12-29)

Bug fixes:

  • Make sure that outbound connections are closed (PR #1725)

matrix-appservice-irc 0.7.0 is out!

19.12.2016 00:00 — Tech Matthew Hodgson

Also, we've just released a major update to the IRC bridge codebase after trialling it on the matrix.org-hosted bridges for the last few days.

The big news is:

  • The bridge uses Synapse 0.18.5's new APIs for managing the public room list (improving performance a bunch)
  • Much faster startup using the new /joined_rooms and /joined_members APIs in Synapse 0.18.5
  • The bridge will now remember your NickServ password (encrypted at rest) if you want it to via the !storepass command
  • You can now set arbitrary user modes for IRC clients on connection (to mitigate PM spam etc)
  • After a split, the bridge will drop Matrix->IRC messages older than N seconds, rather than try to catch the IRC room up on everything they missed on Matrix :S
  • Operational metrics are now implemented using Prometheus rather than statsd
  • New !quit command to nuke your user from the remote IRC network
  • Membership list syncing for IRC->Matrix is enormously improved, and enabled for all matrix.org-hosted bridges apart from Freenode. At last, membership lists should be in sync between IRC and Matrix; please let us know if they're not.
  • Better error logging

For full details, please see the changelog.
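
The "drop messages older than N seconds" behaviour boils down to an age filter. The bridge itself is a Node.js project, so the Python below is only a sketch of the idea; the function name and the sent_at field are hypothetical:

```python
def messages_to_relay(messages, now, max_age_seconds):
    """Keep only the messages young enough to be worth relaying to IRC."""
    return [m for m in messages if now - m["sent_at"] <= max_age_seconds]


# Timestamps are in seconds; one message is stale, one is recent.
backlog = [
    {"body": "sent during the split", "sent_at": 100},
    {"body": "sent just now", "sent_at": 995},
]
fresh = messages_to_relay(backlog, now=1000, max_age_seconds=30)
print([m["body"] for m in fresh])
```

Only the recent message survives, so the IRC side is not flooded with a backlog once the connection recovers.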

With things like NickServ-pass storing, !quit support and full bi-directional membership list syncing, it's never been a better time to run your own IRC bridge.  Please install or upgrade today from https://github.com/matrix-org/matrix-appservice-irc!

Synapse 0.18.5 released!

19.12.2016 00:00 — Tech Matthew Hodgson

Hi folks,

We released synapse 0.18.5 on Friday.  This is mainly about fixing performance problems with the unread room counts and the public room directory; polishing the E2E endpoints based on beta feedback; and general minor bits and bobs.

Get it whilst it's (almost) hot from https://github.com/matrix-org/synapse!  Changelog follows:

🔗Changes in synapse v0.18.5 (2016-12-16)

Bug fixes:

  • Fix federation /backfill returning events it shouldn't (PR #1700)
  • Fix crash in url preview (PR #1701)

🔗Changes in synapse v0.18.5-rc3 (2016-12-13)

Features:

  • Add support for E2E for guests (PR #1653)
  • Add new API appservice specific public room list (PR #1676)
  • Add new room membership APIs (PR #1680)
Changes:
  • Enable guest access for private rooms by default (PR #653)
  • Limit the number of events that can be created on a given room concurrently (PR #1620)
  • Log the args that we have on UI auth completion (PR #1649)
  • Stop generating refresh_tokens (PR #1654)
  • Stop putting a time caveat on access tokens (PR #1656)
  • Remove unspecced GET endpoints for e2e keys (PR #1694)
Bug fixes:
  • Fix handling of 500 and 429's over federation (PR #1650)
  • Fix Content-Type header parsing (PR #1660)
  • Fix error when previewing sites that include unicode, thanks to @kyrias (PR #1664)
  • Fix some cases where we drop read receipts (PR #1678)
  • Fix bug where calls to /sync didn't correctly timeout (PR #1683)
  • Fix bug where E2E key query would fail if a single remote host failed (PR #1686)

🔗Changes in synapse v0.18.5-rc2 (2016-11-24)

Bug fixes:

  • Don't send old events over federation, fixes bug in -rc1.

🔗Changes in synapse v0.18.5-rc1 (2016-11-24)

Features:

  • Implement "event_fields" in filters (PR #1638)
Changes:
  • Use external ldap auth package (PR #1628)
  • Split out federation transaction sending to a worker (PR #1635)
  • Fail with a coherent error message if /sync?filter= is invalid (PR #1636)
  • More efficient notif count queries (PR #1644)

Synapse 0.18.4

22.11.2016 00:00 — Tech Matthew Hodgson

Uncharacteristically, we're actually remembering to announce a new release of Synapse!

Major performance fixes on federation, as well as the changes required to support E2E encrypted attachments (yay!)

Please install or upgrade from https://github.com/matrix-org/synapse :)

🔗Changes in synapse v0.18.4 (2016-11-22)

Bug fixes:

  • Add workaround for buggy clients that fail to register (PR #1632)

🔗Changes in synapse v0.18.4-rc1 (2016-11-14)

Changes:

  • Various database efficiency improvements (PR #1188, #1192)
  • Update default config to blacklist more internal IPs, thanks to Euan Kemp @euank (PR #1198)
  • Allow specifying duration in minutes in config, thanks to Daniel Dent @DanielDent (PR #1625)
Bug fixes:
  • Fix media repo to set CORS headers on responses (PR #1190)
  • Fix registration to not error on non-ascii passwords (PR #1191)
  • Fix create event code to limit the number of prev_events (PR #1615)
  • Fix bug in transaction ID deduplication (PR #1624)

Matrix-IRC Bridge reaches v0.4.0

15.08.2016 00:00 — Tech Kegan Dougal

A new version of the IRC bridge has been released onto NPM and the matrix.org bridges!

The IRC bridge has undergone quite a number of modifications since its original inception over a year ago. Version 0.4 introduces a number of additional features and improvements, which can be found in the changelog. These include automatically linkifying large blocks of text and mirroring kicks/bans to and from Matrix.

With a plethora of protocol gotchas and non-standard features on well-known IRC networks, IRC is a challenging protocol to work with. It's inevitable that some corner cases are not handled well by the bridge. Over time, the bridge has been hardened by edge cases which we have encountered and patched. These releases signify the continual improvement in the robustness of the bridge, which we aim to continue with into the foreseeable future.

Performance-wise, the busiest bridge we host is the one to Freenode. We now have over 1300 active connections to it and a steady rate of about 240 messages per minute going through to Matrix. We expect to see this number increase significantly over the next few months. Let's see what the next year will bring!

Synapse 0.14 is released!

30.03.2016 00:00 — Tech Matthew Hodgson

We just released Synapse 0.14.0 - a major update which incorporates lots of work on making Synapse more RAM efficient. There's still a lot of room for further improvements, but the main headlines are reducing the resident memory footprint dramatically by interning strings and deduplicating events across the many different caches. It also adds a much-needed SYNAPSE_CACHE_FACTOR environment variable that can be used to globally decrease or increase the sizing of all of Synapse's various caches (with an associated slow-down or speed-up in performance). Quite how much the memory footprint improves seems to depend very much on your own use case, but it's certainly a step in the right direction.
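
As a sketch of what a global cache factor means in practice (this is not Synapse's actual implementation; the function name and the base sizes are made up), each cache's nominal size is multiplied by the factor read from the environment. The 0.1 fallback mirrors the default mentioned in the changelog below:

```python
import os


def scaled_cache_size(base_size, environ=None):
    """Scale a cache's nominal size by SYNAPSE_CACHE_FACTOR."""
    environ = os.environ if environ is None else environ
    factor = float(environ.get("SYNAPSE_CACHE_FACTOR", "0.1"))
    # Never scale a cache down to zero entries.
    return max(1, round(base_size * factor))


# A plain dict stands in for the process environment here.
print(scaled_cache_size(10000, {}))
print(scaled_cache_size(10000, {"SYNAPSE_CACHE_FACTOR": "0.5"}))
print(scaled_cache_size(10000, {"SYNAPSE_CACHE_FACTOR": "2"}))
```

A smaller factor trades speed for RAM, and a larger one does the opposite, which is the trade-off described above.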

For more details on recent Synapse performance work (and a general state of the union for the whole Matrix ecosystem), check out our Spring update.

Get all new synapse from https://github.com/matrix-org/synapse - we recommend upgrading (or installing!) asap :)

Full changelog follows:

🔗Changes in synapse v0.14.0 (2016-03-30)

No changes from v0.14.0-rc2

🔗Changes in synapse v0.14.0-rc2 (2016-03-23)

Features:

  • Add published room list API (PR #657)
Changes:
  • Change various caches to consume less memory (PR #656, #658, #660, #662, #663, #665)
  • Allow rooms to be published without requiring an alias (PR #664)
  • Intern common strings in caches to reduce memory footprint (#666)
Bug fixes:
  • Fix reject invites over federation (PR #646)
  • Fix bug where registration was not idempotent (PR #649)
  • Update aliases event after deleting aliases (PR #652)
  • Fix unread notification count, which was sometimes wrong (PR #661)
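
String interning, as mentioned in the changes above, makes equal strings share a single object so that the same room or user ID held in many caches is only stored once. A minimal sketch using Python 3's sys.intern follows; the helper and the IDs are hypothetical and this is not Synapse code:

```python
import sys


def build_room_id(local, server):
    # Built at runtime, as an ID parsed from a request or database row would be.
    return "!" + local + ":" + server


# Two separately constructed but equal strings...
a = sys.intern(build_room_id("abc", "example.org"))
b = sys.intern(build_room_id("abc", "example.org"))

# ...end up as one shared object once interned.
shared = a is b
print(shared)
```

The saving comes from every cache entry pointing at the same string object rather than holding its own copy.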

🔗Changes in synapse v0.14.0-rc1 (2016-03-14)

Features:

  • Add event_id to response to state event PUT (PR #581)
  • Allow guest users access to messages in rooms they have joined (PR #587)
  • Add config for what state is included in a room invite (PR #598)
  • Send the inviter's member event in room invite state (PR #607)
  • Add error codes for malformed/bad JSON in /login (PR #608)
  • Add support for changing the actions for default rules (PR #609)
  • Add environment variable SYNAPSE_CACHE_FACTOR, default it to 0.1 (PR #612)
  • Add ability for alias creators to delete aliases (PR #614)
  • Add profile information to invites (PR #624)
Changes:
  • Enforce user_id exclusivity for AS registrations (PR #572)
  • Make adding push rules idempotent (PR #587)
  • Improve presence performance (PR #582, #586)
  • Change presence semantics for last_active_ago (PR #582, #586)
  • Don't allow m.room.create to be changed (PR #596)
  • Add 800x600 to default list of valid thumbnail sizes (PR #616)
  • Always include kicks and bans in full /sync (PR #625)
  • Send history visibility on boundary changes (PR #626)
  • Register endpoint now returns a refresh_token (PR #637)
Bug fixes:
  • Fix bug where we returned incorrect state in /sync (PR #573)
  • Always return a JSON object from push rule API (PR #606)
  • Fix bug where registering without a user id sometimes failed (PR #610)
  • Report size of ExpiringCache in cache size metrics (PR #611)
  • Fix rejection of invites to empty rooms (PR #615)
  • Fix usage of bcrypt to not use checkpw (PR #619)
  • Pin pysaml2 dependency (PR #634)
  • Fix bug in /sync where timeline order was incorrect for backfilled events (PR #635)

Synapse 0.13 released!

10.02.2016 00:00 — Tech Matthew Hodgson

Hi all,

Synapse 0.13 was released this afternoon, bringing a new wave of features, bug fixes and performance fixes. The main headlines include: huge performance increases (big catchup /syncs that were taking 20s now take 0.3s!), support for server-side per-room unread message and notification badge counts, ability for guest accounts to upgrade into fully-fledged accounts, change default push rules back to notifying for group chats, and loads of bug fixes. This release incorporates what-was 0.12.1-rc1.

Please note that on first launch after upgrading a pre-0.13 server to 0.13 or later, synapse will add a large database index which may take several minutes to complete. Whilst the index is added the service will be unresponsive.

Please get the new release from https://github.com/matrix-org/synapse and have fun!

Matthew

Full release notes:

🔗Changes in synapse v0.13.1 (2016-02-10)

  • Bump matrix-angular-sdk (matrix web console) dependency to 0.6.8 to pull in the fix for SYWEB-361 so that the default client can display HTML messages again(!)

🔗Changes in synapse v0.13.0 (2016-02-10)

This version includes an upgrade of the schema, specifically adding an index to the events table. This may cause synapse to pause for several minutes the first time it is started after the upgrade.

Changes:

  • Improve general performance (PR #540, #543, #544, #54, #549, #567)
  • Change guest user ids to be incrementing integers (PR #550)
  • Improve performance of public room list API (PR #552)
  • Change profile API to omit keys rather than return null (PR #557)
  • Add /media/r0 endpoint prefix, which is equivalent to /media/v1/ (PR #595)

Bug fixes:

  • Fix bug with upgrading guest accounts where it would fail if you opened the registration email on a different device (PR #547)
  • Fix bug where unread count could be wrong (PR #568)

🔗Changes in synapse v0.12.1-rc1 (2016-01-29)

Features:

  • Add unread notification counts in /sync (PR #456)
  • Add support for inviting 3pids in /createRoom (PR #460)
  • Add ability for guest accounts to upgrade (PR #462)
  • Add /versions API (PR #468)
  • Add event to /context API (PR #492)
  • Add specific error code for invalid user names in /register (PR #499)
  • Add support for push badge counts (PR #507)
  • Add support for non-guest users to peek in rooms using /events (PR #510)

Changes:

  • Change /sync so that guest users only get rooms they've joined (PR #469)
  • Change to require unbanning before other membership changes (PR #501)
  • Change default push rules to notify for all messages (PR #486)
  • Change default push rules to not notify on membership changes (PR #514)
  • Change default push rules in one to one rooms to only notify for events that are messages (PR #529)
  • Change /sync to reject requests with a from query param (PR #512)
  • Change server manhole to use SSH rather than telnet (PR #473)
  • Change server to require AS users to be registered before use (PR #487)
  • Change server not to start when ASes are invalidly configured (PR #494)
  • Change server to require ID and as_token to be unique for AS's (PR #496)
  • Change maximum pagination limit to 1000 (PR #497)

Bug fixes:

  • Fix bug where /sync didn't return when something under the leave key changed (PR #461)
  • Fix bug where we returned smaller rather than larger than requested thumbnails when method=crop (PR #464)
  • Fix thumbnails API to only return cropped thumbnails when asking for a cropped thumbnail (PR #475)
  • Fix bug where we occasionally still logged access tokens (PR #477)
  • Fix bug where /events would always return immediately for guest users (PR #480)
  • Fix bug where /sync unexpectedly returned old left rooms (PR #481)
  • Fix enabling and disabling push rules (PR #498)
  • Fix bug where /register returned 500 when given unicode username (PR #513)

Synapse 0.12 released!

04.01.2016 00:00 — Tech Matthew Hodgson

Happy 2016 everyone!

To greet the new year, we bring you all new Synapse 0.12. The focus here has been a wide range of polishing, bugfixes, performance improvements and feature tweaks. The biggest news is that the 'v2' sync APIs are now production ready; the search APIs now work much better; 3rd party ID invites now work; and we now mount the whole client-server API under the /_matrix/client/r0 URI prefix, as per the r0.0.0 release of the Client Server API from a few weeks ago. The r0 release unifies what was previously a somewhat confusing mix of 'v1' and 'v2' APIs into a single set of endpoints which play nice together.

We highly recommend that all homeservers upgrade to v0.12.0 as soon as possible. Get it now from https://github.com/matrix-org/synapse/ or via our shiny new Debian packages at https://matrix.org/packages/debian/.

Full changelog follows:

🔗Changes in synapse v0.12.0 (2016-01-04)

  • Expose /login under r0 (PR #459)

🔗Changes in synapse v0.12.0-rc3 (2015-12-23)

  • Allow guest accounts access to /sync (PR #455)
  • Allow filters to include/exclude rooms at the room level rather than just from the components of the sync for each room. (PR #454)
  • Include urls for room avatars in the response to /publicRooms (PR #453)
  • Don't set an identicon as the avatar for a user when they register (PR #450)
  • Add a display_name to third-party invites (PR #449)
  • Send more information to the identity server for third-party invites so that it can send richer messages to the invitee (PR #446)
  • Cache the responses to /initialSync for 5 minutes. If a client retries a request to /initialSync before a response to the first request has been computed, then the same response is used for both requests (PR #457)
  • Fix a bug where synapse would always request the signing keys of remote servers even when the key was cached locally (PR #452)
  • Fix 500 when paginating search results (PR #447)
  • Fix a bug where synapse was leaking raw email address in third-party invites (PR #448)
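
The /initialSync caching described in the list above, where a retry that arrives before the first response is ready shares that same response, can be sketched as follows. This is an illustration of the pattern only, not the code from PR #457; the class, the cache key and the fake sync function are all hypothetical:

```python
import asyncio
import time


class ResponseCache:
    """Share one in-flight or recent response between identical requests."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (task, time the computation started)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # reuse the pending or finished computation
        # Note: a failed computation stays cached too in this simple sketch.
        task = asyncio.ensure_future(compute())
        self._entries[key] = (task, now)
        return task


async def demo():
    calls = 0

    async def slow_initial_sync():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for an expensive sync
        return {"rooms": []}

    cache = ResponseCache(ttl_seconds=300)  # 5 minutes, as in the changelog
    first = cache.get_or_compute("@alice:example.org", slow_initial_sync)
    retry = cache.get_or_compute("@alice:example.org", slow_initial_sync)
    return calls_after(await first, await retry, lambda: calls)


def calls_after(first_result, retry_result, get_calls):
    # Small helper so the demo returns both results and the call count.
    return get_calls(), first_result, retry_result


calls, first_result, retry_result = asyncio.run(demo())
print(calls, first_result, retry_result)
```

The expensive computation runs once even though two requests were made, which is the point of the optimisation: an impatient client retrying does not double the server's work.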

🔗Changes in synapse v0.12.0-rc2 (2015-12-14)

  • Add caches for whether rooms have been forgotten by a user (PR #434)
  • Remove instructions to use --process-dependency-link since all of the dependencies of synapse are on PyPI (PR #436)
  • Parallelise the processing of /sync requests (PR #437)
  • Fix race updating presence in /events (PR #444)
  • Fix bug back-populating search results (PR #441)
  • Fix bug calculating state in /sync requests (PR #442)

🔗Changes in synapse v0.12.0-rc1 (2015-12-10)

  • Host the client APIs released as r0 by https://matrix.org/docs/spec/r0.0.0/client_server.html on paths prefixed by /_matrix/client/r0. (PR #430, PR #415, PR #400)
  • Updates the client APIs to match r0 of the matrix specification.
    • All APIs return events in the new event format, old APIs also include the fields needed to parse the event using the old format for compatibility. (PR #402)
    • Search results are now given as a JSON array rather than a JSON object (PR #405)
    • Miscellaneous changes to search (PR #403, PR #406, PR #412)
    • Filter JSON objects may now be passed as query parameters to /sync (PR #431)
    • Fix implementation of /admin/whois (PR #418)
    • Only include the rooms that user has left in /sync if the client requests them in the filter (PR #423)
    • Don't push for m.room.message by default (PR #411)
    • Add API for setting per account user data (PR #392)
    • Allow users to forget rooms (PR #385)
  • Performance improvements and monitoring:
    • Add per-request counters for CPU time spent on the main python thread. (PR #421, PR #420)
    • Add per-request counters for time spent in the database (PR #429)
    • Make state updates in the C+S API idempotent (PR #416)
    • Only fire user_joined_room if the user has actually joined. (PR #410)
    • Reuse a single http client, rather than creating new ones (PR #413)
  • Fixed a bug upgrading from older versions of synapse on postgresql (PR #417)
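
For instance, passing a filter JSON object as a query parameter to /sync (one of the changes listed above) might look like the sketch below. The homeserver URL and the helper name are placeholders, the filter shown is just one small example, and a real request would also need an access token:

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def sync_url(homeserver, sync_filter):
    """Build a /sync URL with the filter JSON inlined as a query parameter."""
    encoded = json.dumps(sync_filter, separators=(",", ":"))
    query = urlencode({"filter": encoded})  # percent-encodes the JSON
    return f"{homeserver}/_matrix/client/r0/sync?{query}"


# Ask for at most 10 timeline events per room.
url = sync_url("https://example.org", {"room": {"timeline": {"limit": 10}}})
print(url)

# Decoding the query string recovers the original filter.
decoded = json.loads(parse_qs(urlsplit(url).query)["filter"][0])
```

Inlining the filter this way saves a round trip compared with uploading the filter first and referring to it by ID, at the cost of a longer URL.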

Matrix: One year in.

07.09.2015 00:00 — Tech Matthew Hodgson

Hi all,

Just realised that the release of Synapse 0.10.0 on Sept 3rd 2015 was precisely one year from the initial launch of Matrix. As such, it feels only right to have a quick update on where we've got to so far, and where we expect things to go from here!

Back at the original launch, all we had was a very rough and ready Synapse homeserver, an early draft of the spec, and precisely one client - the Angular webclient, much of which was mainly written at the last minute on the plane to TechCrunch Disrupt SF (and has never quite recovered :S). From this initial seed it's been incredibly exciting and slightly scary to see how much things have advanced and grown - the big headlines for the past year (in roughly chronological order) include:

  • Making Federation Work:
    • Switching all of federation over to SSL, using perspectives for key management
    • Crypto-signing all the events in a room's message graph to assert integrity
    • Sorting out 'power levels' and 'auth events' to allow totally decentralised kicks/bans/etc to work in an open federated environment
  • Decentralised content repository and thumbnailing
  • Reference mobile "Matrix Console" clients for iOS and Android
  • Official client SDKs for iOS and Android - both at the API wrapping layer and the reusable UI component layer
  • Push notifications for APNS and GCM (both on server & clients)
  • Official client SDKs for JavaScript, Python and Perl
  • Typing notifications
  • The sytest integration test harness
  • Proper WebRTC support for VoIP, including TURN.
  • Application Services and Bots - actually letting Matrix defragment communications :)
    • Bridging to all of Freenode, Moznet and other IRC networks
    • Matrix<->SMS bridge from OpenMarket
    • SIP bridges via FreeSWITCH and Verto
    • Parrot Bebop Drone <-> Matrix bridge via Janus
    • OBD2 telemetry <-> Matrix bridge via Android SDK
    • Reusable bridging framework in Node
  • Many iterations and refinements to the spec, including designing v2 of the client SDK
  • Switching from Angular to React for all of our web-client development
  • Customisable skins and embedding support for the matrix-react-sdk
  • End-to-end encryption (not quite formally released yet, but it's written, specced and it works!)
  • VoIP support on mobile (landed in Android; still experimenting with different WebRTC stacks on iOS)
  • History ACLs
  • Delivery reports
  • Switching from access_tokens to macaroons for authentication (not yet released)
  • Lots and lots of performance work on Synapse, as we've tried to get the most out of Twisted.

...and last but not least, the evolution of the #matrix:matrix.org community - including loads of 3rd party clients, SDKs and application services, synapse packaging and even experimental home servers :)

Overall the last year was an exercise in actually fleshing out the whole ecosystem of Matrix and getting it to a stable usable beta acceptable to early adopters. The plan for the next 12 months is then to make the transition from stable beta to a properly production grade environment that can be used to run large scale services used by normal end-users. In practice, this means:

  • A major rearchitecture of Synapse.
    • Synapse currently has no support for horizontal scaling or clustering within a single instance, and many will have seen the performance problems we've hit with a relatively monolithic Twisted app architecture. Profiling deferreds in Twisted has been a particular nightmare.
    • During September we are starting the process of splitting Synapse apart into separate services (e.g. separating reading eventstreams from writing messages) both to allow horizontal scalability and to experiment with implementing the services in more efficient languages than Python/Twisted.
    • We will continue the normal Synapse release process in parallel with this work.
  • Ensuring Matrix can support a genuinely excellent UX for normal end-users on glossy clients, and supporting glossy client development as required.  The days of Matrix being just for powerusers are numbered... :)
  • Switching to use 3rd party IDs as the primary means of referring to users in Matrix, hiding matrix IDs as a feature for powerusers and developers.
  • Finishing the spec. You may have noticed the spec has been quietly evolving over the last few months - finally gaining a versioning system, and with larger chunks of it being automatically generated from formal API spec descriptions. We will be finishing off and filling in the remaining holes.
  • Improving the documentation (and FAQ!) on matrix.org in general by switching to a git-backed jekyll system for all the staticish content
  • Replacing the Angular-based reference webapp bundled with Synapse entirely with a matrix-react-sdk based reference app, and providing better examples and documentation for using it to embed Matrix functionality into existing websites.
  • Moving to v2 of the client-server API. This fixes some significant limitations in the v1 API that everyone's been using all year, and should improve performance significantly for many use cases (especially when launching apps). The v1 API will hang around for a very very long time for backwards compatibility.
  • Writing *lots* more bridges and integrations to other protocols, now we have a nice framework for rapidly developing them.
  • General security audits and double-checking the security model.
  • New features, including:
    • Multiway VoIP and Video conferencing, most likely using FreeSWITCH's new conferencing capabilities via an Application Service bridge (should be ready very shortly!)
    • Getting E2E crypto reviewed/audited and putting it live across the board.
    • Adding VoIP to iOS
    • Implementing delivery reports in all clients
    • Improving how invites work (ability to reject them; metadata about where they came from)
    • Search API
    • Improved file management
  • ...and an awful lot of bug fixing as we work through the backlog we've accumulated on JIRA.

Hopefully this won't take up all year(!) and is just a beginning - there's a huge list of interesting ideas beyond this baseline which we'll be looking at assuming the core stuff above is on track. For instance, we need to work out how to decentralise the identity services that map 3rd party IDs to matrix IDs. We need to work out how to avoid spam. And it could be fascinating to start bridging more internet-of-things devices and ecosystems into Matrix, or decentralising user accounts between homeservers, or perhaps using Matrix for synchronising more sophisticated data structures than timelines and key-value state dictionaries...

Finally, we also want to save as much time as possible to help support the wider community in building out clients, services and servers on top of Matrix. Just like the web itself, Matrix is only as useful as the content and services built on top of it - and we will do everything we can to help the pioneers who are interested in colonising this brave new world :)

Huge thanks to everyone who has supported us over the last year - whether that's by creating an account and using the system, running a homeserver, hacking on top of the platform, contributing to the core project, enduring one of our presentations, or even paying for us to work on this. The coming year should prove incredibly interesting, and we hope you'll stay and bring along all your friends, family and colleagues for the ride as the adventure continues!

Matthew, Amandine & the whole Matrix.org team.