Generic Synchronization
Cisco
170 West Tasman Drive
MS: SJC-21/2
San Jose
CA
95134
USA
+1 408 421-9990
fluffy@cisco.com
APP
This brief note discusses the growing need for a standardized sync
protocol for generic information and looks at one possibility for what
such a protocol might look like. It lists some of the things people are
syncing today, introduces a "ferry" concept of synchronization, and
briefly sketches a possible synchronization scheme.
Synchronization is about keeping a set of data consistent between a set
of devices. The classic example of this would be a user's Personal Address
Book (PAB) but there are many other things that are becoming common to
sync: media play lists, photo albums, missed call lists on phones or
dialed call lists, task lists, GPS location tracks, and many more. One way
to deal with sync issues is to have a central server that is the master
source of truth, with all the devices periodically connecting to that
server and updating. However, in many situations having every device
connect to a master source is impossible. This note is about
synchronization in environments in which a centralized server architecture
is impossible.
This note looks at systems in which all the devices are equivalent and
there is no master server that all devices can reach. In such systems a
device can act as a ferry to bridge the synced information. As an example,
I sync the PAB on my notebook computer to my phone using USB, and then I
sync my phone to the navigation system in my car using Bluetooth. In this
case my phone has acted as a ferry to transfer the data from my PC to my
car. The car
does not have internet connectivity and could not have contacted a central
server. With the price of flash memory falling below $10 per gigabyte,
phones, PDAs, and even just plain USB keys are rapidly becoming usable for
synchronizing multiple gigabytes of data, making it practical to use a
device like a phone as a ferry for music and video. The device that is
acting as the ferry could easily synchronize information that it did not
understand or use itself.
The most critical requirement of any sync system is that the user should
never see "you have 376 conflicts! how would you like to resolve them?"
It is also important that any sync protocol deal gracefully with syncs
that have been brutally interrupted by events such as the plug being
pulled out partway through the sync. Finally, when the communication
mechanism between the two devices is very slow, large chunks of data that
the other device already has should not be transferred again.
Data schemas must be designed with synchronization in mind if they are to
have reasonable conflict resolution properties. This note uses two terms:
record and field. A record is just a collection of fields. A field is an
atomic piece of data from a synchronization point of view. A field can be
more complex than a single data item. For example, in a PAB, a telephone
number may contain an indication that it is a mobile, home, or work
number. This indication would likely be in the same field as the phone
number, since the two need to be handled as a single atomic unit when
dealing with conflict resolution. However, a PAB would not want the email
address and phone number in the same field, because then you could not
successfully synchronize in a situation in which one device has added a
new phone number for a particular user and another device has added a new
email address. The point is that the data schema needs to clearly outline
what the atomic blocks are.
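
As an illustration of choosing atomic blocks, here is a minimal Python
sketch of a PAB entry. The class names (PhoneField, EmailField, Contact)
and the set-union merge are assumptions made for this example only; they
are not part of any defined schema. The phone number and its kind share
one field, while an email address is a separate field, so additions made
on two different devices combine without a conflict.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhoneField:
    # The number and its kind are one atomic unit for sync purposes.
    number: str
    kind: str  # "mobile", "home", or "work"


@dataclass(frozen=True)
class EmailField:
    # Kept apart from phone numbers so each can change independently.
    address: str


@dataclass
class Contact:
    name: str
    fields: set = field(default_factory=set)


def merge_additions(a: Contact, b: Contact) -> Contact:
    """Hypothetical merge that only handles additions: union of fields."""
    return Contact(a.name, a.fields | b.fields)


base = {PhoneField("+1 555 0100", "work")}
on_phone = Contact("Alice", base | {PhoneField("+1 555 0101", "mobile")})
on_laptop = Contact("Alice", base | {EmailField("alice@example.com")})
merged = merge_additions(on_phone, on_laptop)
```

Had the email address and phone number been stored in one combined field,
the two devices would each hold a different version of that single field
and one of the additions would be lost or flagged as a conflict.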
The approach proposed here gives every record and every field a UUID, and
each field also carries a time-stamp of when it was last modified. One can debate the
issues with time synchronization but from a pragmatic point of view, just
about every device today has a rough idea of time.
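
One possible in-memory shape for this is sketched below in Python. The
names Field, Record, field_id, record_id, and modified are hypothetical,
and the use of random (version 4) UUIDs and of seconds since the epoch as
the time-stamp are assumptions of this sketch rather than anything the
approach requires.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Field:
    value: str
    # Every field gets its own UUID and a last-modified time-stamp.
    field_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    modified: float = field(default_factory=time.time)

    def update(self, value: str) -> None:
        """Change the value and record when the change happened."""
        self.value = value
        self.modified = time.time()


@dataclass
class Record:
    # A record is a UUID plus a collection of fields keyed by field UUID.
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    fields: dict = field(default_factory=dict)

    def add(self, value: str) -> Field:
        new_field = Field(value)
        self.fields[new_field.field_id] = new_field
        return new_field


contact = Record()
phone = contact.add("mobile:+1 555 0100")
created_at = phone.modified
phone.update("mobile:+1 555 0199")
```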
The sync protocol starts with both sides exchanging the UUIDs they hold and
checking whether either device has any fields that are missing from the
other, and if so transferring them. For fields that have changed on both
sides, the devices would compare the time-stamps and select the most
recent one. It would be possible to query the users about resolving
conflicts at this point but for many cases it would not be
necessary. Deletion would be done by removing all the data from the field
except the UUID and time-stamp, and marking a deleted flag on the
field. Devices that were not capable of knowing the time could just
slightly increment the time-stamp when they changed a field. Each device
would also have a UUID; when a pair of devices had previously synced, they
could optimize the sync by only exchanging records and fields that had
changed since the previous sync.
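
The merge step described above can be sketched in Python as follows. This
is an illustration under stated assumptions, not a protocol definition:
the names Field, delete, bump, and merge are hypothetical, time-stamps are
taken to be integer seconds, and the tie-break used when two time-stamps
are equal (deleted flag, then value) is an addition of this sketch so that
both devices reach the same result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    field_id: str          # UUID of the field
    modified: int          # last-modified time-stamp
    value: str = ""
    deleted: bool = False


def delete(f: Field, now: int) -> Field:
    """Keep only the UUID and time-stamp, and set the deleted flag."""
    return Field(f.field_id, max(now, f.modified), "", True)


def bump(f: Field, value: str) -> Field:
    """Edit on a device with no clock: slightly increment the time-stamp."""
    return Field(f.field_id, f.modified + 1, value)


def _rank(f: Field):
    # Most recent wins; the last two items are an assumed tie-break.
    return (f.modified, f.deleted, f.value)


def merge(a: dict, b: dict) -> dict:
    """Merge two stores mapping field_id -> Field."""
    out = dict(a)
    for fid, fb in b.items():
        fa = out.get(fid)
        # Copy fields missing on one side; otherwise keep the newer one.
        if fa is None or _rank(fb) > _rank(fa):
            out[fid] = fb
    return out


laptop = {"f1": Field("f1", 100, "555-0100"),
          "f2": Field("f2", 100, "a@example.com")}
phone = dict(laptop)
phone["f1"] = bump(phone["f1"], "555-0199")       # edited on the phone
laptop["f2"] = delete(laptop["f2"], now=105)      # deleted on the laptop
laptop["f3"] = Field("f3", 103, "work:555-0123")  # added on the laptop
synced = merge(laptop, phone)
```

Because a deleted field survives as a UUID, time-stamp, and flag, the
deletion propagates like any other change instead of the field being
copied back from a device that still has it.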
This type of sync protocol could be realized in other ways. For example, one
could just have an XML file on a USB key that had all the records and
fields in the file; the USB key would be used as a ferry between the
devices to be synchronized.
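
A rough Python sketch of such a file follows, using the standard library
xml.etree.ElementTree module. The element and attribute names ("sync",
"record", "field", "id", "modified", "deleted") are invented for this
example; this note does not define an XML format. A real ferry would
write the resulting text to a file on the USB key, which is omitted here.

```python
import xml.etree.ElementTree as ET


def dump_store(records: dict) -> str:
    """Serialize record_id -> {field_id: (modified, deleted, value)}."""
    root = ET.Element("sync")
    for rid, fields in records.items():
        rec = ET.SubElement(root, "record", id=rid)
        for fid, (modified, deleted, value) in fields.items():
            el = ET.SubElement(rec, "field", id=fid,
                               modified=str(modified),
                               deleted="true" if deleted else "false")
            el.text = value
    return ET.tostring(root, encoding="unicode")


def load_store(xml_text: str) -> dict:
    """Parse the text produced by dump_store back into the same shape."""
    root = ET.fromstring(xml_text)
    return {
        rec.get("id"): {
            el.get("id"): (int(el.get("modified")),
                           el.get("deleted") == "true",
                           el.text or "")
            for el in rec.findall("field")
        }
        for rec in root.findall("record")
    }


store = {"r1": {"f1": (100, False, "mobile:+1 555 0100"),
                "f2": (105, True, "")}}
xml_text = dump_store(store)
restored = load_store(xml_text)
```

The ferry itself never needs to interpret the field values; it only has
to carry the file from one device to the next.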