..
   SPDX-FileCopyrightText: 2020-2023 Jochem Rutgers

   SPDX-License-Identifier: CC-BY-4.0

Synchronizer
============

Distributed store synchronizer.

In a distributed system, every process has its own instance of a store.
Synchronization between these instances is implemented by the
:cpp:class:`stored::Synchronizer`. The Synchronizer can be seen as a service,
usually one per process, which knows all stores in that process and all
communication channels to other processes. At regular intervals, it sends
updates of locally modified data to the other Synchronizers.

The topology can be configured at will. In principle, a process can have any
number of stores, any number of synchronizers (each handling any subset of the
stores), and any number of connections to any other process in the system.

There are a few rules to keep in mind:

- Only :cpp:class:`stored::Synchronizable` stores can be handled by the
  Synchronizer. This has to be used correctly when the store is instantiated.
- To synchronize a store, one must define which store is the one that provides
  the initial value. Upon connection between Synchronizers, the store's content
  is synchronized at once from one party to the other. Afterwards, updates are
  sent in both directions.
- Writes to different objects in the same store by the same process are
  observed by every other process in the same order. All other write orders are
  undefined (like writes to objects of different stores by the same process, or
  writes to the same store by different processes), and can be observed to
  happen in a different order by different processes at the same time.
- Writes to one object should only be done by one process. So, every process
  owns a subset of a store. If multiple processes write to the same object,
  behavior is undefined. That would be a race condition anyway.
- The communication is done in the store's endianness. If a distributed system
  has processors with different endianness, they should all be configured to
  either little or big endian. Accessing a store in a non-native endianness
  might be a bit more expensive for that processor, but synchronization is
  cheaper.
- Stores are identified by their (SHA-1) hash. This hash is computed over the
  full source code of the store (the .st file). So, only stores with the exact
  same definition, and therefore layout, can be synchronized.

Protocol
--------

The protocol for synchronization consists of four messages. These are sent when
appropriate, not in a request-response paradigm. There is no acknowledgement.
Invalid messages are just ignored.

Hello
`````

"I would like to have the full state and future changes of the given store (by
hash). Send all updates to me using this reference."

(``h`` | ``H``) <hash> <id>

The hash is returned by the ``hash()`` function of the store, including the
null-terminator. The id is arbitrarily chosen by the Synchronizer, and is
16-bit in the store's endianness (``h`` indicates little endian, ``H`` is big).

Welcome
```````

This is a response to a Hello.

"You are welcome. Here is the full buffer state, upon your request, of the
store with given reference. Any updates to the store at your side, provide them
to me with my reference."

(``w`` | ``W``) <hello id> <welcome id> <buffer>

The hello id is the id as received in the hello message (by the other party).
The welcome id is chosen by this Synchronizer, in the same manner.

Update
``````

"Your store, with given reference, has changed. The changes are attached."

(``u`` | ``U``) <id> <updates>

The updates are a sequence of the triplet: <key> <length> <data>. The key and
length have the most significant bytes stripped, which would always be 0. All
values are in the store's endianness (``u`` is little, ``U`` is big endian).

Proposal
''''''''

The updates are a sequence defined as follows:

<5 MSb key offset, 3 LSb length> <additional key bytes> <additional length
bytes> <data>

The key offset + 1 is the offset from the previous entry in the updates
sequence. Updates are sent in strict ascending key offset. The initial key is
-1. For example, if the previous key was 10 and the 5 MSb indicate 3, then the
next key is 10 + 3 + 1, so 14.

If the 5 MSb are all 1 (so 31), an additional key byte is added after the first
byte (which may be 0). This value is added to the key offset. If that value is
255, another key byte is added, etc.

The 3 LSb of the first byte are decoded according to the following list:

- 0: data length is 1
- 1: data length is 2
- 2: data length is 3
- 3: data length is 4
- 4: data length is 5
- 5: data length is 6
- 6: data length is 7, and an additional length byte follows (like the key
  offset)
- 7: data length is 8

Using this scheme, when all variables change within the store, the overhead is
always one byte per variable (plus additional length bytes, but this is
uncommon and fixed for a given store). This is also the upper limit of the
update message. If fewer variables change, the key offset may be larger, but
the total size is always less.

The asymmetry of having 6 as indicator for additional length bytes is because
this is an unlikely value (7 bytes data), and at least far less common than
having 8 bytes of data.

The data is sent in the store's endianness (``u`` is little, ``U`` is big
endian).

Bye
```

"I do not need any more updates of the given store (by hash, by id or all)."

| (``b`` | ``B``) <hash>
| (``b`` | ``B``) <id>
| (``b`` | ``B``)

A bye using the id can be used to respond to another message that has an
unknown id. Remnants of previous communication sessions can be cleaned up in
this way.

``b`` indicates that the id is little endian, ``B`` indicates big endian. For
the other two variants, there is no difference in endianness, but both versions
are defined for symmetry.

stored::StoreJournal
--------------------

.. doxygenclass:: stored::StoreJournal

stored::SyncConnection
----------------------

.. doxygenclass:: stored::SyncConnection

stored::Synchronizable
----------------------

.. doxygenclass:: stored::Synchronizable

stored::Synchronizer
--------------------

.. doxygenclass:: stored::Synchronizer
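
As an illustration of the update entry encoding proposed in the Update section
of the protocol, the following is a minimal sketch of how an entry header could
be encoded and decoded. It is not taken from the library: ``EntryHeader``,
``putExtension``, ``getExtension``, ``encodeHeader`` and ``decodeHeader`` are
hypothetical names, and the sketch assumes that additional key bytes precede
additional length bytes and that additional length bytes are added to the base
length of 7.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not part of the library's API.
struct EntryHeader {
	long key;           // absolute key of this entry
	std::size_t length; // number of data bytes following the header
};

// Append a value as extension bytes: a byte of 255 means another byte follows.
static void putExtension(std::vector<std::uint8_t>& out, std::size_t value)
{
	while (value >= 255) {
		out.push_back(255);
		value -= 255;
	}
	out.push_back(static_cast<std::uint8_t>(value));
}

// Read extension bytes and return their sum; stops after a byte below 255.
static std::size_t getExtension(std::vector<std::uint8_t> const& in, std::size_t& pos)
{
	std::size_t value = 0;
	std::uint8_t b = 0;
	do {
		b = in.at(pos++);
		value += b;
	} while (b == 255);
	return value;
}

// Encode the header of one update entry (the data bytes are not appended here).
static void encodeHeader(std::vector<std::uint8_t>& out, long prevKey, long key,
	std::size_t length)
{
	assert(key > prevKey && length >= 1);
	// The stored offset is the distance to the previous key, minus one.
	std::size_t delta = static_cast<std::size_t>(key - prevKey - 1);
	std::uint8_t keyField = delta < 31 ? static_cast<std::uint8_t>(delta) : 31;

	std::uint8_t lenField = 6; // 7 bytes or more than 8: extension bytes follow
	if (length <= 6)
		lenField = static_cast<std::uint8_t>(length - 1);
	else if (length == 8)
		lenField = 7;

	out.push_back(static_cast<std::uint8_t>((keyField << 3) | lenField));
	if (keyField == 31)
		putExtension(out, delta - 31); // assumed order: key bytes first
	if (lenField == 6)
		putExtension(out, length - 7);
}

// Decode one entry header starting at pos; pos is advanced past the header.
static EntryHeader decodeHeader(std::vector<std::uint8_t> const& in, std::size_t& pos,
	long prevKey)
{
	std::uint8_t first = in.at(pos++);
	std::size_t delta = first >> 3;     // 5 MSb: key offset
	std::size_t lenField = first & 7u;  // 3 LSb: length code
	if (delta == 31)
		delta += getExtension(in, pos);

	std::size_t length = lenField + 1;  // codes 0..5 give 1..6, code 7 gives 8
	if (lenField == 6)
		length = 7 + getExtension(in, pos);

	return EntryHeader{prevKey + static_cast<long>(delta) + 1, length};
}
```

With the example from the text (previous key 10, next key 14), the 5 MSb of the
header byte hold 3, and decoding with a previous key of 10 yields key 14 again.
A decoder would start with a previous key of -1 and feed each decoded key back
in as the previous key for the next entry.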