i also use wasabi and aws s3 deep glacier.
wasabi handles lots of small files well.
there is no --magic flag that will figure out the optimal set of flags for a given use case.
need a good log file to see what is really going on, as to why it is stuck at 99% for 10 hours.
is this a one-time transfer or something to be run on a schedule?
if on a schedule, then you can use --max-age to reduce the amount of checks per sync.
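something like this, as a sketch with a made-up remote name and paths:

```shell
# sketch of a scheduled sync; remote name and paths are placeholders.
# --max-age 24h only considers source files modified in the last 24 hours,
# so rclone skips checking the millions of older, unchanged files.
SRC=/data/resources
DST=wasabi:my-bucket
FLAGS="--max-age 24h --log-file /var/log/rclone.log --log-level INFO"
echo rclone sync "$SRC" "$DST" $FLAGS   # drop the echo to actually run it
```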
how much ram does the ubuntu server have?
based on that, then you will know how to tweak the flags.
given the small file size, not sure --s3-chunk-size and --s3-upload-concurrency will help much
i would increase --checkers and --transfers to 32, perhaps higher.
and --fast-list
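for example, with placeholder paths and remote name:

```shell
# sketch only; remote name and paths are placeholders.
# --checkers/--transfers raise parallelism, which is what helps with
# millions of small files; --fast-list uses more memory in exchange
# for far fewer List calls against the S3 API.
FLAGS="--checkers 32 --transfers 32 --fast-list --progress"
echo rclone copy /data/resources wasabi:my-bucket $FLAGS
```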
perhaps ask the data center, as you have some kind of high speed transit.
not sure what that means exactly: that a single file will transfer at high speed, or very low latency to upload lots of small files quickly?
hello and welcome ... Thank you! wasabi handles lots of small files well. - our exact requirement.
~106.74 GiB in 2,907,626 directories & files for test data set. Real one is about 4.8 TiB.
The --experience option is why I came here.
need a good log file to see what is really going on, as to why it is stuck at 99% for 10 hours.
Are you suggesting aborting current copy operation? If so should next operation be copy or sync?
is this a one-time transfer or something to be run on a schedule?
Ideally one-time but fewer will do.
how much ram does the ubuntu server have?
4GiB RAM
free -m
               total        used        free      shared  buff/cache   available
Mem:            3936        2193         143           1        1599        1593
Swap:           2047         252        1795
given the small file size, not sure --s3-chunk-size and --s3-upload-concurrency will help much
i would increase --checkers and --transfers to 32, perhaps higher. and --fast-list
as you have some kind of high speed transit. not sure what that means exactly: that a single file will transfer at high speed, or very low latency to upload lots of small files quickly?
The Ubuntu server is a VMware VM with a 10G network link into the datacenter core network, both high speed and low latency within the datacenter, then a transit link to Wasabi.
The Ubuntu server has a 1G network link to the Red Hat Linux server, which has a 3G SAS link to the RAID5 array.
The transfer choke point will be the 1G link between our servers or the transit link to Wasabi. Our servers have SolarWinds Orion monitoring to measure CPU, memory, and network performance.
Large file transfers perform fairly well; do you want numbers?
Wasabi hardware solutions (File Acceleration and Ball) are beyond our budget, and require datacenter operator cooperation which is not possible soon enough. Thank you for pointing them out.
The rclone copy runs inside a GNU screen virtual terminal, doing 4 transfers at once if I read the bottom lines correctly; the top lines show progress too:
need a good log file to see what is really going on, as to why it is stuck at 99% for 10 hours.
Are you suggesting aborting current copy operation? If so should next operation be copy or sync?
If we let the current operation finish, I will continue setup for the 4.8 TiB data set, where I need the speed.
Whichever path we follow, I learn something; I do prefer the faster path if we can see it.
Recommendation? Suggestion?
i do not know what is really going on and really do not know your exact use case.
if this is a one-time transfer, and you can wait, then i would wait.
if this is a test run for that 4.8TB, and you plan to run a sync on that 4.8TB on a repeated schedule, then i would kill that current rclone sync, increase --checkers and --transfers to 32, and compare the two runs.
i would also run a test with fewer files and not have to wait 24+ hours per test.
find a set of folders and sub folders with perhaps 10,000 files and test rclone with different flags.
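a sketch of that kind of comparison, with placeholder paths:

```shell
# time the same ~10,000-file subtree with two flag sets and compare.
# paths and remote name are placeholders; drop the echo to run for real.
SRC=/data/resources/test-subtree
DST=wasabi:my-bucket/test
for FLAGS in "--checkers 8 --transfers 4" \
             "--checkers 32 --transfers 32 --fast-list"; do
  echo time rclone copy "$SRC" "$DST" $FLAGS
done
```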
if that 4.8TB is a one-time sync, then i might try https://rclone.org/docs/#no-check-dest
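a sketch with placeholder names; use it only for a first copy into an empty bucket:

```shell
# one-time bulk upload sketch; paths and remote are placeholders.
# --no-check-dest skips destination listing and checks entirely and
# uploads everything, so it only makes sense when the bucket starts empty.
FLAGS="--no-check-dest --transfers 32 --checkers 32"
echo rclone copy /data/resources wasabi:my-bucket $FLAGS
```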
Thank you; despite incomplete information, IMO, you're providing excellent advice.
Allow me to start with use case context:
vFlyer is a Java web application providing web content authoring services similar to SquareSpace, Wix, WordPress, etc. Our service includes domain resale, which makes custom-domain web sites possible: https://www.1240westmain.com/
Customers compose complex web pages; work products are HTML, style sheet, & image files i.e. "Resources".
Resources are served with Nginx which also reverse proxies Apache Tomcat hosting our Java web applications.
Resources are NFS-shared by a dedicated Red Hat Enterprise Linux 5 server, stored on triple-volume RAID5 SCSI arrays in ext3 format, and mounted as:
/resources/r1 - 1,833 GiB volume, 1,263 GiB used
/resources/r2 - 1,833 GiB volume, 1,664 GiB used
/resources/r3 - 1,833 GiB volume, 1,347 GiB used
4,274 GiB used in total in the Production environment, the last of three environments:
Windows Developer workstation - no change, Resources remain local
Linux Staging integration - Resources copy to Wasabi currently in progress
Linux Production operation - Resources copy to Wasabi planning in progress
Production Resources are proxied by BunnyCDN now and in the future.
We will rely on rclone mount, per forum topic 19903 (21-Oct-2020, "Linux NFS Server with Rclone and local disk for cache"), to provision a POSIX-compliant NFS share, taking great care to mount the remote bucket just once from a single NFS rclone host.
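A minimal sketch of that single-host mount, with assumed bucket and mountpoint names:

```shell
# single rclone mount to be re-exported over NFS; names are placeholders.
# --vfs-cache-mode full caches both reads and writes on local disk, the
# closest rclone gets to POSIX semantics; --allow-other lets the NFS
# server daemon read the FUSE mount.
FLAGS="--vfs-cache-mode full --cache-dir /var/cache/rclone --allow-other"
echo rclone mount wasabi:resources /mnt/resources $FLAGS
```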
The RAID5 array drives are old and slow, making traversal of the large directory and file population painful, so we plan to traverse the old volumes just once.
The Production Resources move is planned as a "hot" move; Production application writes continue while the move is in progress, using an OverlayFS twist.
To make single traversal of the old volumes possible, we added an Ubuntu 20.04 LTS virtual machine to NFS-share an OverlayFS with a 1 TiB XFS upper layer and an old volume lower layer; one OverlayFS mount per old volume.
OverlayFS writes to the upper layer only, making the lower layer effectively read-only and safe to traverse just once.
Later catch up copy operations will be from the upper layer only, a much smaller data set as writes are relatively infrequent events compared to reads.
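One per-volume OverlayFS mount can be sketched as follows; the upper, work, and merged directory names are assumptions, not our real paths:

```shell
# one OverlayFS mount per old volume; upper/work/merged paths are assumptions.
# all writes land in upperdir; lowerdir (the old RAID5 volume) is never written.
OPTS="lowerdir=/resources/r1,upperdir=/overlay/r1/upper,workdir=/overlay/r1/work"
echo mount -t overlay overlay -o "$OPTS" /nfs/r1   # drop the echo to mount for real
```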
BunnyCDN caching reduces the rate of read events at our server.
Some Production Resources date back to our 2006 service launch; we are considering options to order the copy by modification time for some directories, whereas other directories must be copied completely irrespective of age.
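rclone's --order-by flag can prioritise the transfer queue by modification time, which may suit the age-prioritised directories; a sketch with placeholder names:

```shell
# sketch: queue newest files first for directories where recency matters.
# paths and remote name are placeholders.
FLAGS="--order-by modtime,descending"
echo rclone copy /data/resources wasabi:my-bucket $FLAGS
```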
Our next steps are deploy Production Resources OverlayFS mounts then begin limited scope Production Resources copy tests from "read only" legacy volumes to Wasabi.
IMO all your suggestions are worthy considerations. We have these options:
continue the current Staging Resources rclone copy, or interrupt it as seems prudent
start Production Resources rclone copy tests immediately after OverlayFS deployment
I am working toward having Production Resources OverlayFS mounts completed tonight.
Success looks like fully utilizing the 1G network link between our legacy RHEL5 NFS-shared RAID5 array server and the Ubuntu NFS-shared OverlayFS server while performing rclone copy operations.
Undecided but likely is transplant Production from St. Louis datacenter to AWS us-east-2; Staging is already there.
First rclone copy completed was Staging Resources from St. Louis (no egress charge) to Wasabi.
Immediate potential actions:
rclone sync AWS Staging Resources with Wasabi
AWS Staging was cloned off ~10 days ago
small testing team doesn't change data fast
rclone mount Wasabi as AWS Staging Resources
rclone mount VFS Cache verify POSIX compliance
Today's plan is to write the OverlayFS mounts for deployment late tonight, then remount the Production Resources volumes as read-only to improve access by omitting inode st_atime, st_ctime, and st_mtime updates.
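The read-only remount could be sketched as follows, using the volume paths listed earlier (run as root; a read-only mount stops all inode time updates):

```shell
# remount the legacy volumes read-only so traversal no longer triggers
# st_atime/st_ctime/st_mtime writes to the slow RAID5 arrays.
for VOL in /resources/r1 /resources/r2 /resources/r3; do
  echo mount -o remount,ro "$VOL"   # drop the echo to actually remount
done
```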
Once the OverlayFS mounts are written I need to estimate the 1G network link transfer limits to support a full-or-partial copy decision to be made soon. Beyond that, Production Resources rclone operations begin in some form that the great people here are helping me discover. Thank you.