hx.clone doc
Overview
Cloning is a cluster feature which enables fine-grained, record-based replication. Cloning differs significantly from database replication, which replicates the entire Folio database including all history data, the spark cache, and arc documents. Instead, cloning replicates specific records from one project to another.
Salient features of the cloning extension include:
- Synchronized replication of records between projects
- Only persistent tags are synchronized
- Records to replicate are identified using filters
- Leverages Arcbeam clustering
- Automatic synchronization of changes (typically within a few seconds)
- Supports synchronization from Haxall or SkySpark nodes
- Client support only in SkySpark
- Clones can be used as targets for the rule engine
- Synchronize history data via the cloneSyncHis() function
Terminology
The following terminology is used with the cloning framework:
- server project: the project with the source records to replicate from
- client project: the project we wish to replicate to
- clone set: record in client project used to configure the server project and filter of records to replicate
- progenitor: a single record in the server project replicated to one or more client projects
- clone: replicated copy record in the client project
Clone Sets
Clone sets are records configured in the client project to subscribe to records in remote projects. Clone sets are configured with the following tags:
- cloneSet: marker tag
- cloneProj: project name or node id for remote project
- cloneFilter: filter used to subscribe to matching records
If you are cloning from a Haxall node, then the cloneProj tag should be
set to the clustering node id without the n: prefix.
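As an illustration of the three tags, the following Python sketch models a clone set as a plain dictionary. The helper name make_clone_set, the use of True to stand for a marker tag, and the example node id and filter are assumptions made for this sketch; they are not part of the cloning API, and a real clone set is a record in the client project's database.

```python
def make_clone_set(clone_proj: str, clone_filter: str) -> dict:
    """Build a dict modelling a clone set record (hypothetical helper)."""
    # Haxall node ids are configured without the "n:" prefix
    if clone_proj.startswith("n:"):
        clone_proj = clone_proj[2:]
    return {
        "cloneSet": True,             # marker tag, modelled here as True
        "cloneProj": clone_proj,      # project name or node id
        "cloneFilter": clone_filter,  # filter selecting the progenitors
    }


# Example with a made-up Haxall node id and filter
rec = make_clone_set("n:1a2b3c", "site or equip")
print(rec["cloneProj"])
```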
Clones
Clones are records in the client project which are automatically kept in-sync with their respective progenitors. Most tags in a clone are mirrored from their progenitor with the following exceptions:
- clone: marker tag is added to all clones
- cloneSetRef: reference to parent clone set which defines source project
- cloneMod: used to track the progenitor's mod timestamp for sync status
- all references are relativized to the local project (discussed below)
Attempts to modify a clone record will raise an exception. This means clones cannot be edited in the UI, nor may they be changed via the commit() function. The only way to change a clone is to update its progenitor, which will in turn be reflected in the clone through the synchronization mechanism. Note that if the clone extension is not enabled, then clones may be modified as normal records.
Clone Refs
The ref tags of progenitors are relativized to the local project. This includes the id tag as well as relationship tags such as siteRef, equipRef, etc:
Server     Progenitor             Clone
--------   -------------------    -------------------
SkySpark   @p:server:r:abc-123    @p:client:r:abc-123
Haxall     @abc-123               @p:client:r:abc-123
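The mapping in the table can be sketched in Python as a string rewrite. This is only a model of the two ref shapes shown above; the function name relativize and the regular expression are assumptions for illustration and do not reflect how the extension is implemented.

```python
import re

# Matches "@p:<proj>:r:<id>" (project-qualified) or "@<id>" (bare id)
_REF = re.compile(r"^@(?:p:[^:]+:r:)?(?P<id>[^:]+)$")


def relativize(ref: str, client_proj: str) -> str:
    """Rewrite a progenitor ref so it points into the client project."""
    m = _REF.match(ref)
    if m is None:
        raise ValueError(f"unrecognized ref: {ref!r}")
    return f"@p:{client_proj}:r:{m.group('id')}"


print(relativize("@p:server:r:abc-123", "client"))  # SkySpark progenitor
print(relativize("@abc-123", "client"))             # Haxall progenitor
```

Both calls yield the same client-side ref, which is why progenitors from different servers that share an id collide in the client project.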
Clone Orphans
If you remove a cloneSet rec from the database, then its clones are left
in an orphaned state. Because clones cannot be modified through standard
tools such as commit(), you must use specialized functions to clean
up orphaned clones. Use cloneReadOrphans() to read all the orphaned
records. Use cloneRemoveOrphans() to remove orphans from the database:
// read all orphans and then remove them from database
cloneReadOrphans().cloneRemoveOrphans
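To make the orphan condition concrete, here is a small Python model of it: a clone is treated as orphaned when its cloneSetRef no longer resolves to a clone set record. Records are modelled as dictionaries and the function name find_orphans is hypothetical; the real check is performed by cloneReadOrphans().

```python
def find_orphans(recs):
    """Return the clones whose parent clone set is missing from recs."""
    clone_set_ids = {r["id"] for r in recs if r.get("cloneSet")}
    return [
        r for r in recs
        if r.get("clone") and r.get("cloneSetRef") not in clone_set_ids
    ]


# Made-up records: set "cs2" has been removed, leaving clone "c" orphaned
recs = [
    {"id": "cs1", "cloneSet": True},
    {"id": "a", "clone": True, "cloneSetRef": "cs1"},
    {"id": "c", "clone": True, "cloneSetRef": "cs2"},
    {"id": "x", "dis": "ordinary record"},
]
print([r["id"] for r in find_orphans(recs)])
```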
Protocol
The cloning extension utilizes a simple protocol over Arcbeam to efficiently maintain synchronization between projects. Understanding how the protocol works under the covers is necessary for tuning and diagnostics. There are three different exchanges between the client and server projects:
- Sub: subscription of a client's clone sets with the server project
- Sync: compare manifest of a client's clone set with the server project
- Updates: events sent by the server to the client with updated records
The sub or subscription request is sent periodically by the client to the server project to register the client's clone sets. The server uses the subscription to maintain a registry of the client clone sets to which it must push updates. During the subscription phase the client and server both compute a SHA-1 digest of their local records. If a client detects that a given clone set is out of date, then a sync request is initiated.
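The digest comparison can be sketched as follows. The text does not specify what is fed into the SHA-1 digest, so this Python model assumes it covers each record's id and mod timestamp in sorted order; the function name and input encoding are illustrative only.

```python
import hashlib


def manifest_digest(recs):
    """SHA-1 over (id, mod) pairs; sorting makes the result order-independent."""
    h = hashlib.sha1()
    for rec_id, mod in sorted(recs):
        h.update(f"{rec_id} {mod}\n".encode("utf-8"))
    return h.hexdigest()


server = [("abc-123", "2024-01-02T10:00:00Z"), ("abc-456", "2024-01-03T09:30:00Z")]
client = [("abc-456", "2024-01-03T09:30:00Z"), ("abc-123", "2024-01-01T08:00:00Z")]

# abc-123 is stale on the client, so the digests differ and a sync is needed
print(manifest_digest(server) == manifest_digest(client))
```

Comparing one short digest per clone set is what keeps the steady-state subscription cheap: no record data is exchanged unless the digests disagree.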
The sync request is sent by a client to the server for each clone set it detects is out of date from a subscription request. The request includes a manifest of every id and mod timestamp it has locally. The server uses this information to queue up any out of date records to push to the client.
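A minimal sketch of how a server might use the manifest, assuming records are keyed by id and that any mismatch in mod timestamp (or a missing id) marks the record as out of date. The function name and data shapes are hypothetical.

```python
def out_of_date(server_mods, client_manifest):
    """Return the ids the server should queue for pushing, sorted by id.

    Both arguments map record id -> mod timestamp string.
    """
    return sorted(
        rec_id
        for rec_id, mod in server_mods.items()
        if client_manifest.get(rec_id) != mod
    )


server_mods = {"a": "2024-01-02", "b": "2024-01-03", "c": "2024-01-04"}
client_manifest = {"a": "2024-01-02", "b": "2024-01-01"}  # b is stale, c is missing
print(out_of_date(server_mods, client_manifest))
```

Because the manifest lists every id and timestamp in the clone set, its size grows with the set, which is the main reason sync requests are costly for large clone sets.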
The server pushes modified records in the clone set to the client via update events. During initialization, updates are pushed from the sync request. But once the initial synchronization is complete, the server will automatically push updates whenever it detects local changes. Servers maintain an obsCommits subscription on the clone set filter to listen for changes to the set.
Security
If you do not wish a given project to be a clone server, then do not enable the cloning extension. Projects without the clone extension enabled cannot be used as a server and clients will receive a subscription error.
Arcbeam nodes which are set to isolation mode can be used as a clone server. However, if a node is isolated, then it cannot be used as a clone client.
Tuning
In a typical steady state case, the only exchange should be periodic subscription
requests. If no changes have been made then the digests should match and
this mechanism efficiently ensures client projects are up-to-date. Subscription
requests are throttled via the subMinTime and subMaxTime settings.
By default a subscription is issued whenever a node transitions from down to up as long
as the last subscription was older than subMinTime. This ensures that we immediately
check for changes which might have been made while the connection was down.
However, this might be an expensive check for unstable Arcbeam connections
which are going up and down; in that case consider turning off this feature
with the subOnUp flag in the settings.
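The interaction of these settings can be modelled with a small decision function. This is a sketch under assumptions: subMinTime is taken as the minimum age of the last subscription before another may be sent, and subMaxTime as the age at which a periodic subscription is always sent. The text does not define subMaxTime precisely, so treat that part as a guess. Times are plain seconds.

```python
def should_subscribe(now, last_sub, sub_min_time, sub_max_time,
                     came_up=False, sub_on_up=True):
    """Decide whether the client should send a subscription request."""
    age = now - last_sub
    if age < sub_min_time:
        return False              # throttled: last subscription too recent
    if age >= sub_max_time:
        return True               # assumed periodic refresh
    return came_up and sub_on_up  # down-to-up transition triggers a check


# Connection just came back up, 90s after the last subscription
print(should_subscribe(now=190, last_sub=100, sub_min_time=60,
                       sub_max_time=600, came_up=True))
# Same situation with subOnUp turned off
print(should_subscribe(now=190, last_sub=100, sub_min_time=60,
                       sub_max_time=600, came_up=True, sub_on_up=False))
```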
The updateFreq setting is used to throttle how fast server side changes
are pushed to the client via update events. For records which don't change
frequently, a setting of a few seconds ensures rapid synchronization. However,
if you have records changing often, then consider raising this setting so that
updates are pushed less often, which saves bandwidth and processing.
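The trade-off can be illustrated with a simplified model in which the server pushes a changed record at most once per updateFreq window. Whether the extension batches changes exactly this way is an assumption; the sketch only shows why a longer interval reduces the number of update events for a busy record.

```python
def count_pushes(change_times, update_freq):
    """Count update events if a record is pushed at most once per window.

    change_times: seconds at which the record changed
    update_freq:  window length in seconds
    """
    windows = {int(t // update_freq) for t in change_times}
    return len(windows)


# A record that changes once per second for a minute
changes = list(range(60))
print(count_pushes(changes, 5))   # short interval: many pushes
print(count_pushes(changes, 30))  # longer interval: few pushes
```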
In terms of performance, the goal is to avoid the sync request, which is very expensive for large clone sets. Ideal operation is to perform an initial synchronization, then only perform periodic subscriptions to ensure everything is up-to-date. In the steady state case with a stable Arcbeam connection, re-syncs should be rare.
Overlaps
It is a configuration error when two or more different clone sets to the same project match the same records. We call this condition overlaps. Because a given clone rec can only be managed by one clone set, this condition puts the client project into an indeterminate state. When this condition is flagged as a warning, it is imperative to fix your configuration to ensure each progenitor is matched by only one client clone set filter. This condition will cause your system to thrash with repeated synchronization requests that will never resolve. Note this same condition will occur if cloning recs from different projects which share the same id.
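One way to check a configuration for overlaps before enabling it is to evaluate each clone set filter and look for ids matched more than once. The Python sketch below assumes you already have the matched ids per clone set (for example from running each filter on the server); the function name and clone set names are hypothetical.

```python
from collections import defaultdict


def find_overlaps(matches):
    """Return {record id: [clone set names]} for ids matched by 2+ sets.

    matches maps a clone set name to the ids its filter matches.
    """
    owners = defaultdict(list)
    for set_name, ids in matches.items():
        for rec_id in set(ids):  # ignore duplicates within one set
            owners[rec_id].append(set_name)
    return {rec_id: sorted(names)
            for rec_id, names in owners.items() if len(names) > 1}


matches = {
    "sites": ["s1", "s2"],
    "equips": ["e1", "e2"],
    "ahus": ["e2", "e3"],  # e2 is also matched by "equips"
}
print(find_overlaps(matches))
```

An empty result means each progenitor is matched by exactly one clone set, which is the configuration the paragraph above calls for.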
Over Max
Clone set size has a large impact on performance. Therefore we limit the size
of a single clone set to 2500 records. If the system exceeds this limit
then you will see the overMax flag in the debug views. Once you go over the
limit, the client project's state becomes indeterminate. The first 2500 records
sorted by id will continue to be synchronized. But in most cases the client
project will end up with orphaned clones that will not be synchronized. The
only safe way to recover from this issue is to delete your cloneSet and its
orphaned records, then recreate the cloneSet from scratch using a narrower filter.
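The effect of the limit can be modelled as a sort-and-truncate. This Python sketch assumes ids compare as plain strings; the function name and sample ids are made up for illustration.

```python
MAX_CLONE_SET_SIZE = 2500  # limit stated above


def split_over_max(ids, limit=MAX_CLONE_SET_SIZE):
    """Split matched ids into (still synchronized, left unsynchronized)."""
    ordered = sorted(ids)
    return ordered[:limit], ordered[limit:]


# A filter that matches 2600 records is over the limit by 100
ids = [f"r{i:05d}" for i in range(2600)]
synced, dropped = split_over_max(ids)
print(len(synced), len(dropped))
```

Counting the records a filter matches on the server before creating the clone set is a cheap way to stay under the limit.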
Sync History
You can synchronize history data from cloned points into SkySpark using the cloneSyncHis() function. This function works just like connSyncHis(), except it leverages your cloning Arcbeam channel instead of a connector. For example, you can set up a task to periodically sync all your cloned points using an expression like this:
readAll(clone and point and his).cloneSyncHis(null)