Born from the ashes of Stadia, this repository contains tools for syncing and
streaming files from Windows to Linux. They are based on Content Defined
Chunking (CDC), in particular FastCDC, to split up files into chunks.
History
At Stadia, game developers had access to Linux cloud instances to run games.
Most developers wrote their games on Windows, though. Therefore, they needed a
way to make them available on the remote Linux instance.
As developers had SSH access to those instances, they could use scp to copy
the game content. However, this was impractical, especially with the shift to
working from home during the pandemic with sub-par internet connections. scp
always copies full files, there is no “delta mode” to copy only the things that
changed, it is slow for many small files, and there is no fast compression.
To help this situation, we developed two tools, cdc_rsync and cdc_stream,
which enable developers to quickly iterate on their games without repeatedly
incurring the cost of transmitting dozens of GBs.
CDC RSync
cdc_rsync is a tool to sync files from a Windows machine to a Linux device,
similar to the standard Linux rsync. It is basically a copy tool, but optimized
for the case where there is already an old version of the files available in
the target directory.
- It quickly skips files if timestamp and file size match.
- It uses fast compression for all data transfer.
- If a file changed, it determines which parts changed and only transfers the
differences.
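The first bullet can be illustrated with a minimal sketch. The function name and the idea that the remote side reports a size and modification time are assumptions made for this example; this is not the tool's actual code.

```python
import os
import tempfile


def needs_transfer(local_path, remote_size, remote_mtime):
    """Return True if the file cannot be skipped by the quick check.

    remote_size and remote_mtime are hypothetical values reported for the
    existing remote copy (None if there is no remote copy).
    """
    if remote_size is None or remote_mtime is None:
        return True  # no old version on the target
    st = os.stat(local_path)
    # Same size and same modification time (whole seconds): treat as unchanged.
    same = st.st_size == remote_size and int(st.st_mtime) == int(remote_mtime)
    return not same


# Small demonstration with a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    demo_path = f.name
demo_stat = os.stat(demo_path)
unchanged = needs_transfer(demo_path, demo_stat.st_size, demo_stat.st_mtime)
changed = needs_transfer(demo_path, demo_stat.st_size + 1, demo_stat.st_mtime)
os.remove(demo_path)
print(unchanged, changed)
```

Only files for which such a check reports a possible change would need the more expensive diffing step.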
The remote diffing algorithm is based on CDC. In our tests, it is up to 30x
faster than the one used in rsync (1500 MB/s vs 50 MB/s).
The following chart shows a comparison of cdc_rsync and Linux rsync running
under Cygwin on Windows. The test data consists of 58 development builds
of some game provided to us for evaluation purposes. The builds are 40-45 GB
large. For this experiment, we uploaded the first build, then synced the second
build with each of the two tools and measured the time. For example, syncing
from build 1 to build 2 took 210 seconds with the Cygwin rsync, but only 75
seconds with cdc_rsync. The three outliers are probably feature drops from
another development branch, where the delta was much higher. Overall,
cdc_rsync syncs files about 3 times faster than Cygwin rsync.
We also ran the experiment with the native Linux rsync, i.e. syncing Linux to
Linux, to rule out issues with Cygwin. Linux rsync performed on average 35%
worse than Cygwin rsync, which can be attributed to CPU differences. We did
not include it in the figure because of this, but you can find it
here.
How does it work and why is it faster?
The standard Linux rsync splits a file into fixed-size chunks of typically
several KB.
If the file is modified in the middle, e.g. by inserting xxxx after 567,
this usually means that the modified chunks as well as
all subsequent chunks change.
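The effect can be reproduced with a toy example. The sample string and the chunk size of 6 bytes are arbitrary choices for illustration; real chunks are several KB.

```python
def fixed_chunks(data, size):
    """Split data into consecutive chunks of `size` bytes (last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


old = b"0123456789abcdefghijklmnopqrstuv"
new = old[:8] + b"xxxx" + old[8:]  # insert "xxxx" after "567"

old_chunks = fixed_chunks(old, 6)
new_chunks = fixed_chunks(new, 6)

# Chunks of the new file that also exist somewhere in the old file.
known = set(old_chunks)
reused = [c for c in new_chunks if c in known]
print(new_chunks)
print(reused)
```

Only the chunk before the insertion is reused. Every later chunk is shifted by four bytes, so a naive chunk-by-chunk comparison finds no further matches even though most of the data is unchanged.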
The standard rsync algorithm hashes the chunks of the remote “old” file
and sends the hashes to the local device. The local device then figures out
which parts of the “new” file match known chunks.
This is a simplification. The actual algorithm is more complicated and uses
two hashes, a weak rolling hash and a strong hash, see
here for a great overview. What makes rsync relatively slow is the
“no match” situation where the rolling hash does not match any remote hash,
and the algorithm has to roll the hash forward and perform a hash map lookup
for each byte. rsync goes to great lengths optimizing lookups.
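The per-byte lookup can be sketched as follows. This is a simplified illustration in the spirit of the rsync weak checksum, not rsync's actual code: the block size of 6, the modulus, and the use of a direct byte comparison in place of the strong hash are all assumptions for the example.

```python
M = 1 << 16


def weak_checksum(block):
    """Two-part rolling checksum: plain sum and position-weighted sum."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(old, new, n):
    """Return (offset_in_new, offset_in_old) pairs and the number of lookups."""
    known = {weak_checksum(old[i:i + n]): i for i in range(0, len(old) - n + 1, n)}
    matches, lookups, pos = [], 0, 0
    if len(new) < n:
        return matches, lookups
    a, b = weak_checksum(new[:n])
    while True:
        lookups += 1  # one hash map lookup per window position
        off = known.get((a, b))
        # The byte comparison stands in for the strong hash check.
        if off is not None and old[off:off + n] == new[pos:pos + n]:
            matches.append((pos, off))
            pos += n
            if pos + n > len(new):
                break
            a, b = weak_checksum(new[pos:pos + n])
        else:
            if pos + n >= len(new):
                break
            a, b = roll(a, b, new[pos], new[pos + n], n)
            pos += 1
    return matches, lookups


old = b"0123456789abcdefghijklmnopqrstuv"
new = old[:8] + b"xxxx" + old[8:]
matches, lookups = find_matches(old, new, 6)
print(matches, lookups)
```

After the insertion, the window has to advance byte by byte, with one lookup per position, until it lines up with a known block again.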
cdc_rsync does not use fixed-size chunks, but instead variable-size,
content-defined chunks. That means, chunk boundaries are determined by the
local content of the file, in practice a 64 byte sliding window. For more
details, see
the FastCDC paper
or take a look at our implementation.
If the file is modified in the middle, only the modified chunks, but not
subsequent chunks change (unless they are less than 64 bytes away from the
modifications).
Computing the chunk boundaries is cheap and involves only a left-shift, a
memory lookup, an add and an and operation for each input byte. This is
cheaper than the hash map lookup for the standard rsync algorithm.
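A minimal Gear-style sketch of this per-byte computation is shown below. It is a simplification and not the repository's implementation: the lookup table is random, the boundary mask (top 6 bits, giving roughly 64 byte chunks) is an arbitrary choice, and FastCDC refinements such as minimum and maximum chunk sizes are left out.

```python
import random

_rng = random.Random(2023)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # hypothetical lookup table
MASK64 = (1 << 64) - 1
BOUNDARY_MASK = 0xFC00000000000000  # top 6 bits of the 64-bit hash


def cdc_chunks(data):
    """Return (start_offset, chunk_bytes) pairs with content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Left-shift, table lookup, add: bytes older than 64 positions shift out.
        h = ((h << 1) + GEAR[byte]) & MASK64
        # And: a boundary is declared when the selected hash bits are all zero.
        if (h & BOUNDARY_MASK) == 0:
            chunks.append((start, data[start:i + 1]))
            start = i + 1
    if start < len(data):
        chunks.append((start, data[start:]))
    return chunks


old = bytes(_rng.getrandbits(8) for _ in range(4096))
new = old[:2000] + b"xxxx" + old[2000:]  # insert 4 bytes in the middle

old_set = {c for _, c in cdc_chunks(old)}
changed = [(s, c) for s, c in cdc_chunks(new) if c not in old_set]
print(len(cdc_chunks(new)), len(changed))
```

Because the hash only depends on the last 64 bytes, boundaries before the insertion and boundaries more than 64 bytes after it are unaffected, so only the chunks around offset 2000 differ from the old file.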
Because of this, the cdc_rsync algorithm is faster than the standard
rsync. It is also simpler. Since chunk boundaries move along with insertions
or deletions, the task to match local and remote hashes is a trivial set
difference operation. It does not involve a per-byte hash map lookup.
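The set difference can be sketched in a few lines. The chunk lists are hand-picked stand-ins for content-defined chunks, and SHA-256 is an assumption for the example; the hash actually used by the tool is not stated here.

```python
import hashlib


def by_hash(chunks):
    """Map chunk hash -> chunk bytes."""
    return {hashlib.sha256(c).digest(): c for c in chunks}


# Hypothetical chunking of the old (remote) and new (local) file. The
# boundaries moved along with the inserted "xxxx".
remote_chunks = [b"0123", b"4567", b"89ab", b"cdef"]
local_chunks = [b"0123", b"4567xxxx", b"89ab", b"cdef"]

remote = by_hash(remote_chunks)
local = by_hash(local_chunks)

# Chunks the remote side does not have yet: a plain set difference.
missing = local.keys() - remote.keys()
to_send = [local[h] for h in missing]

# The remote side can rebuild the new file from its own chunks plus the
# transferred ones, given the ordered list of chunk hashes.
store = dict(remote)
store.update({h: local[h] for h in missing})
order = [hashlib.sha256(c).digest() for c in local_chunks]
rebuilt = b"".join(store[h] for h in order)
print(to_send, rebuilt)
```

Only one chunk needs to be transferred in this example, and no per-byte search is required to find it.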
CDC Stream
cdc_stream is a tool to stream files and directories from a Windows machine to a
Linux device. Conceptually, it is similar to sshfs,
but it is optimized for read speed.
- It caches streamed data on the Linux device.
- If a file is re-read on Linux after it changed on Windows, only the
differences are streamed again. The rest is read from the cache.
- Stat operations are very fast since the directory metadata (filenames,
permissions etc.) is provided in a streaming-friendly way.
To efficiently determine which parts of a file changed, the tool uses the same
CDC-based diffing algorithm as cdc_rsync. Changes to Windows files are almost
immediately reflected on Linux, with a delay of roughly (0.5s + 0.7s x total
size of changed files in GB).
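As a worked example, reading the estimate as a fixed 0.5 s plus 0.7 s per GB of changed files gives the following rough numbers. The function name is made up for this sketch, and the result is only the approximation stated above, not a guarantee.

```python
def estimated_delay_seconds(changed_gb):
    """Rough delay until Windows changes are visible on Linux."""
    return 0.5 + 0.7 * changed_gb


for gb in (0.1, 1, 10):
    print(f"{gb} GB changed: about {estimated_delay_seconds(gb):.2f} s")
```

So a small edit is visible after about half a second, while 10 GB of changed files take on the order of 7.5 seconds.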
The tool does not support writing files back from Linux to Windows; the Linux
directory is readonly.
The following chart compares times from starting a game to reaching the menu.
In one case, the game is streamed via sshfs, in the other case we use
cdc_stream. Overall, we see a 2x to 5x speedup.
Download the precompiled binaries from the
latest release.
We currently provide Linux binaries compiled on
Github’s latest Ubuntu version.
If the binaries work for you, you can skip the following two sections.
Alternatively, the project can be built from source. Some binaries have to be
built on Windows, some on Linux.
Prerequisites
To build the tools from source, the following steps have to be executed on
both Windows and Linux.
- Download and install Bazel from here. See
workflow logs for the
currently used version.
- Clone the repository.
git clone https://github.com/google/cdc-file-transfer
- Initialize submodules.
cd cdc-file-transfer
git submodule update --init --recursive
Finally, install an SSH client on the Windows device if not present.
The file transfer tools require ssh.exe and scp.exe.
Building
The two tools can be built and used independently.
CDC RSync
- Build Linux components
bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_rsync_server
- Build Windows components
bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_rsync
- Copy the Linux build output file cdc_rsync_server from
bazel-bin/cdc_rsync_server on the Linux machine to bazel-bin\cdc_rsync
on the Windows machine.
CDC Stream
- Build Linux components
bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_fuse_fs
- Build Windows components
bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_stream
- Copy the Linux build output files cdc_fuse_fs and libfuse.so from
bazel-bin/cdc_fuse_fs on the Linux machine to bazel-bin\cdc_stream
on the Windows machine.
Usage
The tools require a setup where you can use SSH and SCP from the Windows machine
to the Linux device without entering a password, e.g. by using key-based
authentication.
Configuring SSH and SCP
By default, the tools search ssh.exe and scp.exe from the path environment
variable. If you can run the following commands in a Windows cmd without
entering your password, you are all set:
ssh user@linux.device.com
scp somefile.txt user@linux.device.com:
Here, user is the Linux user and linux.device.com is the Linux host to
SSH into or copy the file to.
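The check can also be scripted. The sketch below is a hypothetical helper, not part of the tools; it relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```python
import subprocess


def ssh_check_command(user_host, ssh="ssh"):
    """Build an ssh invocation that never prompts for a password."""
    return [ssh, "-o", "BatchMode=yes", user_host, "exit"]


def passwordless_ssh_works(user_host, ssh="ssh", timeout=15):
    """Return True if a non-interactive SSH login succeeds."""
    try:
        result = subprocess.run(
            ssh_check_command(user_host, ssh),
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh binary not found, or the connection hung
    return result.returncode == 0


# Example (requires a reachable host, so it is not executed here):
# print(passwordless_ssh_works("user@linux.device.com"))
```

If this returns False, key-based authentication is likely not set up yet, or ssh is not on the path.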
If additional arguments are required, it is recommended to provide an SSH config
file. By default, both ssh.exe and scp.exe use the file at
%USERPROFILE%\.ssh\config on Windows, if it exists. A possible config file
that sets a username, a port, an identity file and a known host file could look
as follows:
Host linux_device
    HostName linux.device.com
    User user
    Port 12345
    IdentityFile C:\path\to\id_rsa
    UserKnownHostsFile C:\path\to\known_hosts
If ssh.exe or scp.exe cannot be found, you can specify the full paths via
the command line arguments --ssh-command and --scp-command for cdc_rsync
and cdc_stream start (see below), or set the environment variables
CDC_SSH_COMMAND and CDC_SCP_COMMAND, e.g.
set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe"
set CDC_SCP_COMMAND="C:\path with space\to\scp.exe"
Note that you can also specify SSH configuration via the environment variables
instead of using a config file:
set CDC_SSH_COMMAND=C:\path\to\ssh.exe -p 12345 -i C:\path\to\id_rsa -oUserKnownHostsFile=C:\path\to\known_hosts
set CDC_SCP_COMMAND=C:\path\to\scp.exe -P 12345 -i C:\path\to\id_rsa -oUserKnownHostsFile=C:\path\to\known_hosts
Note the small -p for ssh.exe and the capital -P for scp.exe.
Google Specific
For Google internal usage, set the following environment variables to enable SSH
authentication using a Google security key:
set CDC_SSH_COMMAND=C:\gnubby\bin\ssh.exe
set CDC_SCP_COMMAND=C:\gnubby\bin\scp.exe
Note that you will have to touch the security key multiple times during the
first run. Subsequent runs only require a single touch.
CDC RSync
cdc_rsync is used similar to scp or the Linux rsync command. To sync a
single Windows file C:\path\to\file.txt to the home directory ~ on the Linux
device linux.device.com, run
cdc_rsync C:\path\to\file.txt user@linux.device.com:~
cdc_rsync understands the usual Windows wildcards * and ?.
cdc_rsync C:\path\to\*.txt user@linux.device.com:~
To sync the contents of the Windows directory C:\path\to\sources recursively to
~/sources on the Linux device, run
cdc_rsync C:\path\to\sources\* user@linux.device.com:~/sources -r
To get per file progress, add -v:
cdc_rsync C:\path\to\sources\* user@linux.device.com:~/sources -vr
CDC Stream
To stream the Windows directory C:\path\to\sources to ~/sources on the Linux
device, run
cdc_stream start C:\path\to\sources user@linux.device.com:~/sources
This makes all files and directories in C:\path\to\sources available on
~/sources immediately, as if it were a local copy. However, data is streamed
from Windows to Linux as files are accessed.
To stop the streaming session, enter
cdc_stream stop user@linux.device.com:~/sources
The command also accepts wildcards, for instance to stop all existing
streaming sessions for the given user.
Troubleshooting
On first run, cdc_stream starts a background service, which does all the work.
The cdc_stream start and cdc_stream stop commands are just RPC clients that
talk to the service.
The service logs to %APPDATA%\cdc-file-transfer\logs by default. The logs are
useful to investigate issues with asset streaming. To pass custom arguments, or
to debug the service, create a JSON config file at
%APPDATA%\cdc-file-transfer\cdc_stream.json with command line flags.
For instance, the config file can instruct the service to log debug messages.
Try cdc_stream start-service -h for a list of available flags. Alternatively,
run the service manually with cdc_stream start-service and pass the flags as
command line arguments. When you run the service manually, the flag
--log-to-stdout is particularly useful as it logs to the console
instead of to the file.
cdc_rsync always logs to the console. To increase log verbosity, pass -vvv
for debug logs or -vvvv for verbose logs.
For both sync and stream, the debug logs contain all SSH and SCP commands that
are attempted to run, which is very useful for troubleshooting. If a command
fails, copy it and run it in isolation. Pass -vv or -vvv for
additional debug output.