Serialization
Serialization of consensus-relevant data types
Amounts may take any integral value between 0 and 2^64 - 1. Amount serialization takes into account the fact that assets are often manipulated in quantities that are multiples of powers of ten.
Amount serialization has the following properties:
Every value with 3 or fewer significant decimal figures has a two-byte representation.
Every value with 8 or fewer significant decimal figures has a four-byte representation.
Every value less than 2^56 has an eight-byte representation.
Every value up to 2^64 - 1 has a nine-byte representation.
By comparison, an uncompressed uint64 always occupies eight bytes.
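The size classes listed above can be sketched as a small function. This is only an illustration of the stated properties, not the client's encoder: the name `guaranteed_size` is hypothetical, and the result is an upper bound derived from the significant-figure rules, so some values may have a shorter representation than the one reported here.

```python
MAX_AMOUNT = 2**64 - 1


def guaranteed_size(amount: int) -> int:
    """Return a serialized size (in bytes) that the listed properties guarantee."""
    if not 0 <= amount <= MAX_AMOUNT:
        raise ValueError("amount must fit in an unsigned 64-bit integer")
    # Strip trailing decimal zeros to find the significand.
    significand = amount
    while significand and significand % 10 == 0:
        significand //= 10
    digits = len(str(significand))  # significant decimal figures
    if digits <= 3:
        return 2
    if digits <= 8:
        return 4
    if amount < 2**56:
        return 8
    return 9
```

For example, `guaranteed_size(1_000_000)` falls in the two-byte class because the value has a single significant figure, while `guaranteed_size(2**64 - 1)` needs the nine-byte form.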
Assets have a version, a contract hash and a subtype. The version and contract hash together form the contract id. The version may take any uint32 value, the contract hash any 32-byte array value, and the subtype any 32-byte array value.
As a special case, the Zen native token has version 0, contract hash with all bytes set to zero, and subtype with all bytes set to zero.
Asset serialization is optimized to represent the Zen native token compactly, and to represent assets with low version numbers, or with subtypes that have many trailing zero bytes, in fewer bytes. Because the subtype is under the control of the contract generating the asset, contract writers can make their contracts' assets cheaper to serialize by giving them short subtypes.
Serialized assets use between one and 69 bytes: up to five version bytes, up to 32 contract hash bytes and up to 32 subtype bytes. The first byte uses two bits for signalling, and six to encode the version. Between zero and four more bytes encode the rest of the version, followed by either zero or 32 bytes to represent the contract hash, followed by between zero and 32 bytes to represent the subtype.
If the version is between 0 and 31, inclusive, there are no additional version bytes. The third most significant bit of the first byte is set to 0, and the lower five bits represent the version.
Versions are serialized by an algorithm similar to that used for protocol buffers' varint type. Between one and five bytes are used to represent the 32-bit version, including the first byte described above.
The two most significant bits of the first byte, marked as ?? in the table of version encodings, signal the type of compression used for the contract hash and subtype. There are four cases.
00: The contract hash and subtype are both all zeros. There are no further bytes after the version bytes.
10: The subtype is all zeros. After the version bytes, the next 32 bytes encode the contract hash.
11: After the version bytes, the next 32 bytes encode the contract hash, and a further 32 bytes encode the subtype.
01: After the version bytes, the next 32 bytes encode the contract hash. One byte then encodes the size of the subtype, i.e. 32 minus the number of trailing zero bytes. A further size bytes encode the leading bytes of the subtype.
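The compressed subtype form (a size byte followed by the leading bytes) might look like the following. This is a minimal sketch under the description above; `compress_subtype` and `expand_subtype` are hypothetical helper names, not functions from the client.

```python
SUBTYPE_LENGTH = 32


def compress_subtype(subtype: bytes) -> bytes:
    """Return the size byte followed by the subtype without its trailing zero bytes."""
    if len(subtype) != SUBTYPE_LENGTH:
        raise ValueError("subtype must be exactly 32 bytes")
    leading = subtype.rstrip(b"\x00")
    # size = 32 minus the number of trailing zero bytes
    return bytes([len(leading)]) + leading


def expand_subtype(data: bytes) -> bytes:
    """Rebuild the 32-byte subtype from a size byte and the leading bytes."""
    if not data:
        raise ValueError("missing size byte")
    size = data[0]
    if size > SUBTYPE_LENGTH or len(data) < 1 + size:
        raise ValueError("invalid or truncated compressed subtype")
    return data[1:1 + size] + bytes(SUBTYPE_LENGTH - size)
```

A subtype whose only non-zero bytes are its first two therefore costs three bytes (one size byte plus two data bytes) instead of 32.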
Versions are written as integers (decimal). The contract hash and subtype are written as arrays of bytes, of length 32.
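Two-byte representation:

| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | Value | Notes |
|----|----|----|----|----|----|---|---|---|---|---|---|---|---|---|---|-------|-------|
| 0 | 1 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | x | x | x | x | NaN | "quiet" NaN |
| 0 | 1 | 1 | 1 | 1 | 0 | x | x | x | x | x | x | x | x | x | x | Infinity | Or overflow, underflow. Causes deserialization failure. |
| 0 | a | b | e | e | e | t | t | t | t | t | t | t | t | t | t | tttttttttt_2 * 10^(abeee_2) | Subject to ab != 11 (base 2) |
| 0 | 1 | 1 | a | b | e | e | e | t | t | t | t | t | t | t | t | 100tttttttt_2 * 10^(abeee_2) | Subject to ab != 11 (base 2). This is a non-canonical representation (the client will not make it). |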
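As an illustration of the two-byte layout, the sketch below decodes it following the four bit patterns listed for that form. Several points are assumptions rather than facts from this page: the function name `decode_two_byte` is hypothetical, the first byte is taken to hold bits 15 to 8, NaN is rejected in the same way as Infinity, and a decoded value above 2^64 - 1 is treated as an overflow failure.

```python
MAX_AMOUNT = 2**64 - 1


def decode_two_byte(data: bytes) -> int:
    """Decode a two-byte amount (assumes the first byte holds bits 15..8)."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    word = int.from_bytes(data, "big")
    if word >> 15:
        raise ValueError("bit 15 is set: not a two-byte representation")
    if (word >> 11) & 0b1111 == 0b1111:
        # 011110.. is Infinity and 0111110. is NaN (rejected here by assumption);
        # 0111111. is the 0x7E/0x7F marker of the eight-byte form.
        raise ValueError("Infinity, NaN, or eight-byte marker")
    if (word >> 13) & 0b11 == 0b11:
        # Non-canonical row: 0 1 1 a b e e e t t t t t t t t
        exponent = (word >> 8) & 0b11111
        significand = 0b100_0000_0000 | (word & 0xFF)
    else:
        # Canonical row: 0 a b e e e t t t t t t t t t t
        exponent = (word >> 10) & 0b11111
        significand = word & 0x3FF
    value = significand * 10**exponent
    if value > MAX_AMOUNT:
        raise ValueError("overflow")  # assumption: overflow fails deserialization
    return value
```

Under these assumptions, the bytes `18 01` (exponent 6, significand 1) decode to 1,000,000.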
Four-byte representation (highest byte):

| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | Value | Notes |
|----|----|----|----|----|----|----|----|-------|-------|
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | x | NaN | quiet NaN |
| 1 | 1 | 1 | 1 | 1 | 0 | x | x | Infinity | Or overflow/underflow. Deserialization error. |
| 1 | 0 | e | e | e | e | t | t | ((tt_2)*2^24 + L) * 10^(eeee_2) | 'L' is the unsigned big-endian value of the next 3 bytes. |
| 1 | 1 | 1 | a | b | e | e | t | ((10t_2)*2^24 + L) * 10^(abee_2) | Subject to ab != 11 (base 2). This is a non-canonical representation (the client will not make it). |
| 1 | 1 | 0 | e | e | e | e | t | ((10t_2)*2^24 + L) * 10^(eeee_2) | Canonical representation of values with large significands (>= 2^26 and less than 10^8). |
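A decoder for the four-byte form could be sketched along the same lines. The name `decode_four_byte` is hypothetical, and, as assumptions not stated on this page, NaN is rejected like Infinity and a result above 2^64 - 1 is treated as an overflow failure. Both rows that use the 10t significand prefix keep their exponent in bits 28 to 25, so they share one branch.

```python
MAX_AMOUNT = 2**64 - 1


def decode_four_byte(data: bytes) -> int:
    """Decode a four-byte amount: one header byte plus a 3-byte big-endian L."""
    if len(data) != 4:
        raise ValueError("expected exactly four bytes")
    first = data[0]
    low = int.from_bytes(data[1:], "big")  # L
    if not first & 0x80:
        raise ValueError("bit 31 is clear: not a four-byte representation")
    if first >> 3 == 0b11111:
        # 111110.. is Infinity and 1111110. is NaN (rejected here by assumption);
        # 1111111. is the 0xFE/0xFF marker of the nine-byte form.
        raise ValueError("Infinity, NaN, or nine-byte marker")
    if first >> 6 == 0b10:
        # 1 0 e e e e t t
        exponent = (first >> 2) & 0b1111
        high = first & 0b11
    else:
        # 1 1 0 e e e e t (canonical) or 1 1 1 a b e e t (non-canonical)
        exponent = (first >> 1) & 0b1111
        high = 0b100 | (first & 1)
    value = ((high << 24) | low) * 10**exponent
    if value > MAX_AMOUNT:
        raise ValueError("overflow")  # assumption: overflow fails deserialization
    return value
```

For instance, `80 BC 61 4E` decodes to 12,345,678, since the header byte carries a zero exponent and zero high bits and L is 0xBC614E.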
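Eight-byte representation:

| Highest byte | Seven lower bytes | Value | Notes |
|--------------|-------------------|-------|-------|
| 0x7E/0x7F | L | L | 'L' is a 7-byte unsigned big-endian value |

Nine-byte representation:

| Highest byte | Eight lower bytes | Value | Notes |
|--------------|-------------------|-------|-------|
| 0xFE/0xFF | L | L | 'L' is an 8-byte unsigned big-endian value |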
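Taken together, the bit patterns suggest that the first byte alone identifies which of the four amount representations follows. The sketch below relies on that inference, which this page does not state explicitly; `amount_length` and `decode_long_amount` are hypothetical names.

```python
def amount_length(first_byte: int) -> int:
    """Total serialized length implied by the first byte (an inference from the tables)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be a single byte value")
    if first_byte in (0x7E, 0x7F):
        return 8
    if first_byte in (0xFE, 0xFF):
        return 9
    # Otherwise the top bit separates the four-byte form (1) from the two-byte form (0).
    return 4 if first_byte & 0x80 else 2


def decode_long_amount(data: bytes) -> int:
    """Decode the eight- or nine-byte form: a marker byte followed by big-endian L."""
    if not data or amount_length(data[0]) not in (8, 9):
        raise ValueError("not an eight- or nine-byte representation")
    if len(data) != amount_length(data[0]):
        raise ValueError("wrong length for this marker byte")
    return int.from_bytes(data[1:], "big")
```

A reader could call `amount_length` on the next byte of the stream to decide how many bytes to consume before decoding.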
First byte of a serialized asset:

| 7 | 6 | Rest of byte |
|---|---|--------------|
| Uncompressed representation? | Subtype present? | Version or upper part of version |
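The fields of the first byte can be pulled apart with a few bit masks. A minimal sketch follows; the `FirstByte` type and its field names are hypothetical, and bit 5 is read as the "more version bytes follow" flag described for versions of 32 and above.

```python
from typing import NamedTuple


class FirstByte(NamedTuple):
    uncompressed: bool        # bit 7: uncompressed representation?
    subtype_present: bool     # bit 6: subtype present?
    more_version_bytes: bool  # bit 5: set when the version does not fit in five bits
    low_bits: int             # bits 4..0: version, or upper part of the version


def parse_first_byte(byte: int) -> FirstByte:
    """Split the first byte of a serialized asset into its flag and version fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    return FirstByte(
        uncompressed=bool(byte & 0x80),
        subtype_present=bool(byte & 0x40),
        more_version_bytes=bool(byte & 0x20),
        low_bits=byte & 0x1F,
    )
```

With this reading, `parse_first_byte(0x81)` reports an uncompressed representation with no subtype and version 1, which corresponds to a contract hash followed by nothing else.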
Asset encodings by first byte:

| First byte (base 2) | Version | Contract Hash | Subtype | More bytes? | Notes |
|---------------------|---------|---------------|---------|-------------|-------|
| 00000000 | Zero | Zero | Zero | No | Zen native token |
| 10000000 | Zero | Non-zero | Zero | Yes, 32 more | Contract's default asset |
| 11000000 | Zero | Any | Non-zero | Yes, 64 more | Uncompressed subtype (32 bytes). Canonical iff the uncompressed subtype has 0 or 1 trailing zero bytes. |
| 01000000 | Zero | Any | Non-zero | Yes, between 34 and 63 more | Compressed subtype (< 32 bytes). Canonical iff the uncompressed subtype has at least two trailing zero bytes. |
| xy0abcde | abcde (base 2) | As above | As above | Only if x <> 0 or y <> 0 | Represents versions between 0 and 31. |
| xy1abcde | >=32 | As above | As above | Yes | Represents versions greater than or equal to 32. See below. |
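The choice between the four signalling patterns can be sketched as an encoder. To stay within what the table states, this sketch handles only versions 0 to 31, which fit in the first byte; `serialize_asset` is a hypothetical name and the code is not taken from the client.

```python
ZERO32 = bytes(32)


def serialize_asset(version: int, contract_hash: bytes, subtype: bytes) -> bytes:
    """Serialize an asset whose version fits in the first byte (0..31 only)."""
    if not 0 <= version <= 31:
        raise ValueError("this sketch only covers single-byte versions")
    if len(contract_hash) != 32 or len(subtype) != 32:
        raise ValueError("contract hash and subtype must be 32 bytes each")
    if subtype == ZERO32:
        if contract_hash == ZERO32:
            return bytes([version])                          # 00: nothing follows
        return bytes([0b1000_0000 | version]) + contract_hash  # 10: hash only
    leading = subtype.rstrip(b"\x00")
    if 32 - len(leading) >= 2:
        # 01: compressed subtype, canonical with two or more trailing zero bytes
        return (bytes([0b0100_0000 | version]) + contract_hash
                + bytes([len(leading)]) + leading)
    # 11: uncompressed subtype, canonical with zero or one trailing zero byte
    return bytes([0b1100_0000 | version]) + contract_hash + subtype
```

The Zen native token, `serialize_asset(0, bytes(32), bytes(32))`, comes out as the single byte `00`, and a subtype with one non-zero leading byte yields 35 bytes in total.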
Version encoding:

| Version range | Big-endian uncompressed version | Bytes |
|---------------|---------------------------------|-------|
| 0 <= v < 32 | 000xxxxx | ??0xxxxx |
| 32 <= v < 2^12 | 0000xxxx xyyyyyyy | ??1xxxxx 0yyyyyyy |
| 2^12 <= v < 2^19 | 00000xxx xxyyyyyy yzzzzzzz | ??1xxxxx 1yyyyyyy 0zzzzzzz |
| 2^19 <= v < 2^26 | 000000xx xxxyyyyy yyzzzzzz zwwwwwww | ??1xxxxx 1yyyyyyy 1zzzzzzz 1wwwwwww |
| 2^26 <= v | xxxxyyyy yyyzzzzz zzwwwwww wvvvvvvv | ??10xxxx 1yyyyyyy 1zzzzzzz 1wwwwwww 1vvvvvvv |
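The version ranges determine how many bytes a version occupies and how its bits are split into groups. The sketch below covers only those two aspects: it computes the encoded length and the payload groups (x, y, z, w, v), and deliberately leaves out the ?? signalling bits and the per-byte flag bits. Both function names are hypothetical.

```python
def version_byte_length(version: int) -> int:
    """Number of bytes used to encode the version, including the first byte."""
    if not 0 <= version < 2**32:
        raise ValueError("version must be a uint32")
    for length, limit in ((1, 2**5), (2, 2**12), (3, 2**19), (4, 2**26)):
        if version < limit:
            return length
    return 5


def version_payload_groups(version: int) -> list:
    """Split the version into its payload groups, most significant group first."""
    length = version_byte_length(version)
    first_width = 4 if length == 5 else 5  # the five-byte form keeps 4 bits up front
    groups = []
    for _ in range(length - 1):
        groups.append(version & 0x7F)  # each later byte carries 7 payload bits
        version >>= 7
    assert version < (1 << first_width)
    groups.append(version)
    groups.reverse()
    return groups
```

For example, version 32 takes two bytes with groups `[0, 32]`, and the largest version, 2^32 - 1, takes five bytes with groups `[15, 127, 127, 127, 127]`.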