Data Representation

This portion of the architecture guide describes the octet-stream form of the data types understood by the LWMsg marshaller. All data encodings follow several common rules unless otherwise specified:

  • Type information is not encoded in the stream.
  • All multi-octet fields in the data stream are encoded in big-endian byte order.
  • Fields are not padded or aligned.
  • Bits which do not have a specified meaning or value should be set to zero.

Core Data Types

This section describes the core data types understood by the marshaller, and does not include extended types added by the association and connection abstractions, nor common type aliases which may be reduced to combinations of core types.

Integers

Integers are arbitrary-length integral values (although the length must be a multiple of 8 bits). They are encoded starting with the most significant byte and ending with the least. Integers may be signed or unsigned. Signed integers are encoded using two's complement representation. Although the data representation does not impose a limit on the size of integers, LWMsg only supports integers as wide as intmax_t on a given platform. The largest size which is guaranteed to be supported is 64 bits.

Integer
Width  Value
8      Most significant byte
...
8      Least significant byte
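
The integer rules above can be illustrated with a minimal Python sketch. The helper names encode_int and decode_int are hypothetical and are not part of the LWMsg API; the sketch only assumes the big-endian, two's complement layout described in this section.

```python
def encode_int(value: int, width_bits: int, signed: bool = False) -> bytes:
    """Encode value as a big-endian integer of width_bits bits (hypothetical helper)."""
    if width_bits <= 0 or width_bits % 8 != 0:
        raise ValueError("width must be a positive multiple of 8 bits")
    # to_bytes raises OverflowError if the value does not fit in the width.
    # signed=True selects two's complement representation.
    return value.to_bytes(width_bits // 8, byteorder="big", signed=signed)


def decode_int(data: bytes, signed: bool = False) -> int:
    """Decode a big-endian integer that occupies all of data."""
    return int.from_bytes(data, byteorder="big", signed=signed)


print(encode_int(0x1234, 16).hex())           # 1234
print(encode_int(-2, 32, signed=True).hex())  # fffffffe
print(decode_int(b"\xff\xfe", signed=True))   # -2
```

Note that the decoder must already know the width and signedness, since type information is not carried in the stream.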

Pointers

Pointers represent a potentially-null reference to zero or more contiguous, homogeneous elements of a particular type. If a pointer is not null, it must be unique – that is, no two pointers in an encoded LWMsg octet stream can share a referent.

There are three elements in the octet representation of a pointer:

  1. A flag indicating whether the pointer is null or not
  2. The length (number of elements) of the pointer referent
  3. The encoding of the elements of the referent

The first byte of a pointer representation is a flag which indicates whether the pointer is null:

  • 0x00: the pointer is null
  • 0xff: the pointer is non-null

Pointer types may be decorated with an attribute that requires them to be non-null, in which case the indicator byte is omitted entirely.

The number of elements may be determined in three ways:

  1. As a static length
  2. As the value of an earlier field in the stream (correlated length)
  3. Implicitly through termination with a zero element

In the first case, the length of the referent is fixed by the type and is not encoded in the octet stream. In the second case, the length has already appeared earlier in the stream and is not repeated. In the third case, the length is encoded explicitly as a 32-bit unsigned integer. In all three cases, the length specifies the number of elements, not the size in bytes.

Finally, each element of the referent is encoded in order according to the rules of that type. In the case of a zero-terminated referent, the zero element is not encoded in the stream and is not counted in the transmitted length. The decoder implicitly adds it back.

Pointer
Width  Value
8      Indicator flag (omitted for non-nullable pointers)
32     Length of referent (omitted for static or correlated length)
w      Representation of 1st element
w      Representation of 2nd element
...
w      Representation of nth element
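
As a rough illustration of the pointer layout, the following Python sketch encodes a referent of unsigned integer elements. The function names and parameters are hypothetical, not LWMsg API. The sketch also assumes that nothing follows the indicator flag of a null pointer, which the text implies but does not state outright.

```python
from typing import Optional, Sequence

NULL_FLAG = 0x00
NON_NULL_FLAG = 0xFF


def encode_pointer(elements: Optional[Sequence[int]], elem_bits: int,
                   nullable: bool = True, explicit_length: bool = False) -> bytes:
    """Encode a pointer to unsigned integer elements (hypothetical helper).

    explicit_length=True models the zero-terminated case, where the element
    count is sent as a 32-bit unsigned integer. Static and correlated
    lengths leave it False, so no length is written.
    """
    if elements is None:
        if not nullable:
            raise ValueError("a non-nullable pointer cannot be null")
        # Assumption: a null pointer is encoded as the flag alone.
        return bytes([NULL_FLAG])
    out = bytearray()
    if nullable:
        out.append(NON_NULL_FLAG)
    if explicit_length:
        out += len(elements).to_bytes(4, "big")
    for element in elements:
        out += element.to_bytes(elem_bits // 8, "big")
    return bytes(out)


def encode_c_string(text: Optional[bytes]) -> bytes:
    """Encode a nullable, zero-terminated pointer to 8-bit elements.

    The terminating zero is not part of text and is not transmitted.
    """
    elements = None if text is None else list(text)
    return encode_pointer(elements, 8, nullable=True, explicit_length=True)


print(encode_c_string(b"hi").hex())                         # ff000000026869
print(encode_c_string(None).hex())                          # 00
print(encode_pointer([1, 2, 3], 16, nullable=False).hex())  # 000100020003
```

The last call models a non-nullable pointer with a static length of 3: neither the flag nor the length appears in the output.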

Arrays

Arrays share many characteristics with pointers but can never be null, because they are laid out contiguously in memory within their containing type. Their octet encoding is therefore identical to that of a non-nullable pointer, and they support the same set of length determination methods as pointers.

Array
Width  Value
32     Length of array (omitted for static or correlated length)
w      Representation of 1st element
w      Representation of 2nd element
...
w      Representation of nth element

Some encodings which are possible in theory are not allowed in practice because they cannot be decoded to a usable in-memory structure. In particular, an array with a variable length cannot occur in the middle of a structure or another array; it must come at the end. Such a trailing variable-length array is known as a flexible array member.
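
The decoding side of the array layout might look like the sketch below. The decode_array name and its parameters are hypothetical. Passing count models a static or correlated length that the decoder already knows; omitting it models a length read from the stream as a 32-bit unsigned integer.

```python
from typing import List, Optional, Tuple


def decode_array(data: bytes, elem_bits: int,
                 count: Optional[int] = None) -> Tuple[List[int], int]:
    """Decode an array of unsigned integers (hypothetical helper).

    Returns the elements and the number of octets consumed.
    """
    offset = 0
    if count is None:
        # The length is carried in the stream as a 32-bit unsigned integer.
        if len(data) < 4:
            raise ValueError("stream too short for array length")
        count = int.from_bytes(data[:4], "big")
        offset = 4
    size = elem_bits // 8
    end = offset + count * size
    if len(data) < end:
        raise ValueError("stream too short for array elements")
    elements = [int.from_bytes(data[i:i + size], "big")
                for i in range(offset, end, size)]
    return elements, end


print(decode_array(bytes.fromhex("000100020003"), 16, count=3))  # ([1, 2, 3], 6)
print(decode_array(bytes.fromhex("000000020a0b"), 8))            # ([10, 11], 6)
```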

Structures

Structures are heterogeneous tuples of zero or more members, each of a specific type. Members in structures may be correlated:

  • The length of an array or pointer referent may be the value of an earlier field
  • The active arm of a union must be determined by the value of an earlier field

The last member of a structure may optionally be an array with a non-static (variable) length. This is known as a flexible array member. A flexible array may not appear in any other position in a structure. A structure with a flexible array member must always be reached through a pointer with a static length of 1 – that is, it may not be a direct member of another structure, of an array, or of a pointer referent with more than 1 element.

The encoding of a structure is merely the encoding of its members in order.

Structure
Width  Value
w1     Representation of the 1st member
w2     Representation of the 2nd member
...
wn     Representation of the nth member
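
To make the member-by-member layout concrete, here is a sketch for a hypothetical structure with a 16-bit count, an 8-bit flag, and a flexible array of 32-bit unsigned items whose length is correlated with count. The structure and the encode_record name are invented for illustration. Only the structure body is encoded; the enclosing pointer is not shown.

```python
import struct
from typing import List


def encode_record(flag: int, items: List[int]) -> bytes:
    """Encode {uint16 count; uint8 flag; uint32 items[count]} (hypothetical type)."""
    # The ">" prefix selects big-endian byte order with no alignment padding,
    # matching the rule that fields are not padded or aligned.
    header = struct.pack(">HB", len(items), flag)
    # The array length is correlated with the earlier count member,
    # so it is not repeated in front of the elements.
    body = struct.pack(">%dI" % len(items), *items)
    return header + body


encoded = encode_record(0x01, [10, 20])
print(encoded.hex())  # 0002010000000a00000014
print(len(encoded))   # 11
```

The 11-octet result shows that the 32-bit items start at offset 3, with no padding after the 8-bit flag.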

Unions

Unions are a combination of one or more heterogeneous arms, only one of which is present for any given instance. Each arm is associated with a unique integer tag which identifies it. Every instance of a union must be correlated with an integer member of its containing structure. This integer is known as a discriminator and distinguishes which arm of the union instance is active. Only the representation of this active arm is encoded in the octet stream.

Union
Width  Value
wa     Representation of the active arm
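
A small sketch of a discriminated union follows. The containing structure, the 8-bit discriminator width, and the two arms (tag 1: unsigned 16-bit, tag 2: signed 32-bit) are all hypothetical choices made for illustration.

```python
# Hypothetical arm table: tag -> (size in octets, signed?)
ARMS = {
    1: (2, False),  # unsigned 16-bit arm
    2: (4, True),   # signed 32-bit arm
}


def encode_tagged(tag: int, value: int) -> bytes:
    """Encode {uint8 tag; union body} where tag selects the active arm."""
    if tag not in ARMS:
        raise ValueError("unknown union tag: %d" % tag)
    size, signed = ARMS[tag]
    # The discriminator is an ordinary member of the containing structure.
    # The union itself contributes only the octets of its active arm.
    return bytes([tag]) + value.to_bytes(size, "big", signed=signed)


print(encode_tagged(1, 0x1234).hex())  # 011234
print(encode_tagged(2, -1).hex())      # 02ffffffff
```

The encoded size varies with the active arm, which is why the decoder must read the discriminator before it can interpret the union.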

Useful Data Types

The following data types are extensions built on the core set using the custom type mechanism (see LWMSG_CUSTOM) and included as part of the standard LWMsg software package.

Handles

Handles are opaque, persistent pointers which allow peers joined by an association to reference each other's objects without transmitting them. Handles are the recommended means of maintaining connection state.

A handle's representation consists of its locality and handle ID. The locality is an 8-bit value which specifies the side of an association – local or remote – where the physical object represented by the handle resides. Alternatively, it may indicate that the handle is null. The handle ID is a 32-bit integer distinguishing the handle from all other possible active handles in the session. Handle IDs are arbitrarily assigned by the peer which first creates the handle. Both peers may by chance pick the same handle ID for handles they create; this is allowed because the locality of a handle is also taken into account when resolving the handle ID to an object in memory.

Handle
Width  Value
8      Locality
32     Handle ID (omitted if locality is NULL)

The locality field has three legal values:

  • 0x00: The handle is null
  • 0x01: The handle is local from the perspective of the encoder
  • 0x02: The handle is remote from the perspective of the encoder
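
The handle layout can be sketched as follows. The constant and function names are hypothetical, and the handle ID is assumed to be unsigned, which the text does not specify.

```python
from typing import Optional, Tuple

HANDLE_NULL = 0x00
HANDLE_LOCAL = 0x01   # local from the perspective of the encoder
HANDLE_REMOTE = 0x02  # remote from the perspective of the encoder


def encode_handle(locality: int, handle_id: Optional[int] = None) -> bytes:
    """Encode a handle as a locality octet plus a 32-bit ID (hypothetical helper)."""
    if locality == HANDLE_NULL:
        # The handle ID is omitted for a null handle.
        return bytes([HANDLE_NULL])
    if locality not in (HANDLE_LOCAL, HANDLE_REMOTE):
        raise ValueError("illegal locality value")
    if handle_id is None:
        raise ValueError("a non-null handle requires an ID")
    # Assumption: the ID is treated as an unsigned 32-bit integer.
    return bytes([locality]) + handle_id.to_bytes(4, "big")


def decode_handle(data: bytes) -> Tuple[int, Optional[int], int]:
    """Return (locality, handle ID or None, octets consumed)."""
    if not data:
        raise ValueError("stream too short for handle")
    locality = data[0]
    if locality == HANDLE_NULL:
        return locality, None, 1
    if locality not in (HANDLE_LOCAL, HANDLE_REMOTE):
        raise ValueError("illegal locality value")
    if len(data) < 5:
        raise ValueError("stream too short for handle ID")
    return locality, int.from_bytes(data[1:5], "big"), 5


print(encode_handle(HANDLE_LOCAL, 7).hex())         # 0100000007
print(decode_handle(bytes.fromhex("020000002a")))   # (2, 42, 5)
```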

File Descriptors

The file descriptor type allows LWMsg applications communicating over UNIX domain sockets to exchange UNIX file descriptors between processes. Because the mechanism to achieve this involves passing special ancillary data to the kernel, the actual file descriptor is not encoded into the representation. Instead, an 8-bit flag is sent indicating whether the file descriptor was valid.

File descriptor
Width  Value
8      Validity flag

The flag field has two legal values:

  • 0x00: the file descriptor was invalid (-1)
  • 0xff: the file descriptor was valid and was transmitted as ancillary data
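
The sketch below shows how a validity flag and a descriptor might travel together over a UNIX domain socket. It is not LWMsg's implementation: it assumes the ancillary data is a standard SCM_RIGHTS control message and that the flag octet is sent in the same message as the descriptor. The send_fd and recv_fd names are hypothetical.

```python
import array
import os
import socket

FD_INVALID = 0x00
FD_VALID = 0xFF


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send the validity flag, attaching the descriptor as ancillary data."""
    if fd < 0:
        sock.sendmsg([bytes([FD_INVALID])])
        return
    # Assumption: the descriptor rides along as an SCM_RIGHTS control message.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([bytes([FD_VALID])], ancillary)


def recv_fd(sock: socket.socket) -> int:
    """Receive the flag octet; return the passed descriptor, or -1 if invalid."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    if data == bytes([FD_INVALID]):
        return -1
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes in the control data.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if data != bytes([FD_VALID]) or len(fds) == 0:
        raise ValueError("malformed file descriptor message")
    return fds[0]


if __name__ == "__main__":
    left, right = socket.socketpair()  # AF_UNIX on UNIX-like systems
    read_end, write_end = os.pipe()
    send_fd(left, write_end)
    received = recv_fd(right)          # a new descriptor for the same pipe
    os.write(received, b"x")
    print(os.read(read_end, 1))
    for fd in (read_end, write_end, received):
        os.close(fd)
    left.close()
    right.close()
```

The received descriptor number generally differs from the one sent, which is why only the validity flag, and not the number itself, is worth encoding in the octet stream.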