1 Appendices

Warning

You are viewing an outdated version of this specification.

Table of Contents

1 Appendices
2 Unpadded Base64
3 Signing JSON
- 3.1 Canonical JSON
  - 3.1.1 Grammar
  - 3.1.2 Examples
- 3.2 Signing Details
- 3.3 Checking for a Signature
4 Identifier Grammar
- 4.1 Server Name
- 4.2 Common Identifier Format
5 3PID Types
- 5.1 E-Mail
- 5.2 PSTN Phone numbers
6 Security Threat Model
7 Cryptographic Test Vectors
- 7.1 Signing Key
- 7.2 JSON Signing
- 7.3 Event Signing

2 Unpadded Base64

Unpadded Base64 refers to 'standard' Base64 encoding as defined in RFC 4648, without "=" padding. Specifically, where RFC 4648 requires that encoded data be padded to a multiple of four characters using = characters, unpadded Base64 omits this padding.

For reference, RFC 4648 uses the following alphabet for Base 64:

Value Encoding  Value Encoding  Value Encoding  Value Encoding
    0 A            17 R            34 i            51 z
    1 B            18 S            35 j            52 0
    2 C            19 T            36 k            53 1
    3 D            20 U            37 l            54 2
    4 E            21 V            38 m            55 3
    5 F            22 W            39 n            56 4
    6 G            23 X            40 o            57 5
    7 H            24 Y            41 p            58 6
    8 I            25 Z            42 q            59 7
    9 J            26 a            43 r            60 8
   10 K            27 b            44 s            61 9
   11 L            28 c            45 t            62 +
   12 M            29 d            46 u            63 /
   13 N            30 e            47 v
   14 O            31 f            48 w
   15 P            32 g            49 x
   16 Q            33 h            50 y

Examples of strings encoded using unpadded Base64:

UNPADDED_BASE64("") = ""
UNPADDED_BASE64("f") = "Zg"
UNPADDED_BASE64("fo") = "Zm8"
UNPADDED_BASE64("foo") = "Zm9v"
UNPADDED_BASE64("foob") = "Zm9vYg"
UNPADDED_BASE64("fooba") = "Zm9vYmE"
UNPADDED_BASE64("foobar") = "Zm9vYmFy"

When decoding Base64, implementations SHOULD accept input with or without padding characters wherever possible, to ensure maximum interoperability.
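As a minimal sketch of the encoding and lenient decoding described above, the following uses Python's standard `base64` module. The function names `encode_unpadded_base64` and `decode_base64_lenient` are illustrative assumptions, not names defined by this specification.

```python
import base64


def encode_unpadded_base64(data: bytes) -> str:
    # Standard RFC 4648 alphabet, with any trailing "=" padding removed.
    return base64.b64encode(data).decode("ascii").rstrip("=")


def decode_base64_lenient(text: str) -> bytes:
    # Restore the padding to a multiple of four characters, so that both
    # padded and unpadded input are accepted.
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


print(encode_unpadded_base64(b"fooba"))   # Zm9vYmE
print(decode_base64_lenient("Zm9vYmE"))   # b'fooba'
print(decode_base64_lenient("Zm9vYmE="))  # b'fooba'
```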

3 Signing JSON

Various points in the Matrix specification require JSON objects to be cryptographically signed. This requires us to encode the JSON as a binary string. Unfortunately, the same JSON can be encoded in different ways by changing how much white space is used or by changing the order of keys within objects.

Signing an object therefore requires encoding it as a sequence of bytes using Canonical JSON, computing the signature for that sequence, and then adding the signature to the original JSON object.
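The following sketch illustrates why a fixed encoding is needed before signing: two equal objects can serialise to different byte strings unless key order and white space are pinned down. The variable names are illustrative only, and no signature is computed here.

```python
import json

# Two dictionaries with the same members, inserted in a different order.
obj_a = {"one": 1, "two": "Two"}
obj_b = {"two": "Two", "one": 1}

# With default settings, key order follows insertion order, so the
# encoded bytes differ even though the objects compare equal.
default_a = json.dumps(obj_a).encode("utf-8")
default_b = json.dumps(obj_b).encode("utf-8")
print(obj_a == obj_b)          # True
print(default_a == default_b)  # False

# Sorting keys and removing optional white space gives one stable encoding.
stable_a = json.dumps(obj_a, sort_keys=True, separators=(",", ":")).encode("utf-8")
stable_b = json.dumps(obj_b, sort_keys=True, separators=(",", ":")).encode("utf-8")
print(stable_a)                # b'{"one":1,"two":"Two"}'
print(stable_a == stable_b)    # True
```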

3.1 Canonical JSON

We define the canonical JSON encoding for a value to be the shortest UTF-8 JSON encoding with dictionary keys lexicographically sorted by Unicode codepoint. Numbers in the JSON must be integers in the range [-(2**53)+1, (2**53)-1].

We pick UTF-8 as the encoding as it should be available to all platforms and JSON received from the network is likely to be already encoded using UTF-8. We sort the keys to give a consistent ordering. We force integers to be in the range where they can be accurately represented using IEEE double precision floating point numbers since a number of JSON libraries represent all numbers using this representation.
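As an illustration of the key ordering, the sketch below compares sorting by Unicode codepoint with sorting by UTF-16 code unit, which is how some environments compare strings. The sample keys are arbitrary and chosen only because they expose the difference; sorting by UTF-8 bytes gives the same result as sorting by codepoint.

```python
# Sample keys: two ASCII letters, U+FFFF, and U+1F600 (outside the BMP).
keys = ["b", "\U0001F600", "a", "\uFFFF"]

# Python compares str values codepoint by codepoint.
by_codepoint = sorted(keys)

# Comparing UTF-16 code units instead places the surrogate pair for
# U+1F600 (D83D DE00) before U+FFFF, which is not the required order.
by_utf16 = sorted(keys, key=lambda k: k.encode("utf-16-be"))

# Comparing UTF-8 bytes preserves codepoint order.
by_utf8 = sorted(keys, key=lambda k: k.encode("utf-8"))

print(by_codepoint == by_utf8)   # True
print(by_codepoint == by_utf16)  # False
```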

Warning

Events in room versions 1, 2, 3, 4, and 5 might not be fully compliant with these restrictions. Servers SHOULD be capable of handling JSON which is considered invalid by these restrictions where possible.

The most notable consideration is that integers might not be in the range specified above.

Note

Float values are not permitted by this encoding.

import json

def canonical_json(value):
    return json.dumps(
        value,
        # Encode code-points outside of ASCII as UTF-8 rather than \u escapes
        ensure_ascii=False,
        # Remove unnecessary white space.
        separators=(',',':'),
        # Sort the keys of dictionaries.
        sort_keys=True,
        # Encode the resulting Unicode as UTF-8 bytes.
    ).encode("UTF-8")
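Note that `json.dumps` on its own does not enforce the integer range or reject floats. A caller might therefore validate a value before encoding it, along the lines of the sketch below. The helper name `check_canonical_values` and the constants `MIN_INT` and `MAX_INT` are assumptions made for this illustration and are not part of the specification.

```python
MAX_INT = 2**53 - 1
MIN_INT = -(2**53) + 1


def check_canonical_values(value):
    """Raise ValueError if value breaks the Canonical JSON number rules."""
    # bool is a subclass of int in Python, so it is handled first.
    if value is None or isinstance(value, (bool, str)):
        return
    if isinstance(value, int):
        if not MIN_INT <= value <= MAX_INT:
            raise ValueError(f"integer out of range: {value}")
        return
    if isinstance(value, float):
        raise ValueError(f"float values are not permitted: {value}")
    if isinstance(value, dict):
        for key, member in value.items():
            if not isinstance(key, str):
                raise ValueError(f"object keys must be strings: {key!r}")
            check_canonical_values(member)
        return
    if isinstance(value, (list, tuple)):
        for item in value:
            check_canonical_values(item)
        return
    raise ValueError(f"unsupported type: {type(value).__name__}")


check_canonical_values({"a": [1, True, None, "x"]})  # passes silently
```

Given the warning above about older room versions, a server would probably apply such a check only where strict compliance is expected.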

3.1.1 Grammar

Adapted from the grammar in http://tools.ietf.org/html/rfc7159, removing insignificant whitespace, fractions, exponents and redundant character escapes.

value     = false / null / true / object / array / number / string
false     = %x66.61.6c.73.65
null      = %x6e.75.6c.6c
true      = %x74.72.75.65
object    = %x7B [ member *( %x2C member ) ] %x7D
member    = string %x3A value
array     = %x5B [ value *( %x2C value ) ] %x5D
number    = [ %x2D ] int
int       = %x30 / ( %x31-39 *digit )
digit     = %x30-39
string    = %x22 *char %x22
char      = unescaped / %x5C escaped
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
escaped   = %x22 ; "    quotation mark  U+0022
          / %x5C ; \    reverse solidus U+005C
          / %x62 ; b    backspace       U+0008
          / %x66 ; f    form feed       U+000C
          / %x6E ; n    line feed       U+000A
          / %x72 ; r    carriage return U+000D
          / %x74 ; t    tab             U+0009
          / %x75.30.30.30 (%x30-37 / %x62 / %x65-66) ; u000X
          / %x75.30.30.31 (%x30-39 / %x61-66)        ; u001X
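As a rough aid for testing, the `number` and `string` productions above can be transcribed into regular expressions. The sketch below is one possible transcription using Python's `re` module. The names `NUMBER` and `STRING` are illustrative, and the patterns cover only these two productions, not the full grammar.

```python
import re

# number = [ %x2D ] int ; int = %x30 / ( %x31-39 *digit )
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)")

# string = %x22 *char %x22 ; char = unescaped / %x5C escaped
STRING = re.compile(
    r'"(?:'
    # unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
    r"[\x20-\x21\x23-\x5B\x5D-\U0010FFFF]"
    # escaped: the short escapes, then \u000X and \u001X in lowercase hex
    r'|\\(?:["\\bfnrt]|u000[0-7be-f]|u001[0-9a-f])'
    r')*"'
)

print(bool(NUMBER.fullmatch("-12")))        # True
print(bool(NUMBER.fullmatch("01")))         # False: leading zero
print(bool(NUMBER.fullmatch("1.5")))        # False: fractions were removed
print(bool(STRING.fullmatch('"a\\nb"')))    # True: \n is the short escape
print(bool(STRING.fullmatch('"\\u000a"')))  # False: redundant escape for \n
```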

3.1.2 Examples

To assist in the development of compatible implementations, the following test values may be useful for verifying the canonical transformation code.

Given the following JSON object:

{}