Wasabi Upload Many Small Files Many Directories

What is the problem you are having with rclone?

First-time rclone users here, looking for help accelerating rclone copy/sync to Wasabi.

The data source is a SCSI RAID5 array on an Ubuntu server in a St. Louis, MO datacenter with high-speed transit to the Wasabi us-east-1 region.

Our first copy attempt is still running after 10 hours; we don't know if continuing or restarting with new options is better. Any advice?

Which option provides the best information for posting here? --log-file? --progress? A combination? Something else?

Reviewing the following forum topics was helpful but left questions open.

  • 17621 3-Jul-2020 Migrating big S3 Bucket using Rclone
  • 15036 20-Mar-2020 Issue with rclone and wasabi
  • 7671 20-Nov-2018 Wasabi: multi part uploads
    • Memory = --transfers × --s3-upload-concurrency × --s3-chunk-size
  • 6712 6-Sep-2018 Best Configuration for S3/Wasabi on Fast Line?
    • Threads = --transfers × --s3-upload-concurrency
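As a worked example of the memory formula above (the concurrency and chunk values shown are rclone's documented defaults for --s3-upload-concurrency and --s3-chunk-size; the transfer count is illustrative):

```shell
# memory = --transfers * --s3-upload-concurrency * --s3-chunk-size
transfers=32
concurrency=4    # rclone default --s3-upload-concurrency
chunk_mib=5      # rclone default --s3-chunk-size, in MiB
echo "$(( transfers * concurrency * chunk_mib )) MiB"   # prints "640 MiB"
```

So even with 32 transfers, multipart buffers at default settings stay well under 1 GiB; raising --s3-chunk-size multiplies that directly.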

Is --concurrent-uploads deprecated or obsolete?

Where can we find guidance about combining:

  • --timeout
  • --checkers
  • --checksum
  • --fast-list
  • --transfers
  • --s3-chunk-size
  • --s3-upload-concurrency
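For context, here is how those flags might be combined on one command line (a sketch only, using our source path and remote; the values are illustrative guesses, not recommendations):

```shell
# illustrative combination of the flags above; tune values to RAM and link speed
rclone copy /vmnt/test/resources wasabi-us-east-1:test-cdn0 \
  --timeout 5m \
  --checkers 16 \
  --checksum \
  --fast-list \
  --transfers 16 \
  --s3-chunk-size 16M \
  --s3-upload-concurrency 4 \
  --progress --log-file rclone.log -vv
```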

Thank you in advance.

What is your rclone version (output from rclone version)

rclone v1.54.0
- os/arch: linux/amd64
- go version: go1.15.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

lsb_release --all
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

Which cloud storage system are you using? (eg Google Drive)

Wasabi us-east-1 region

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy --progress /vmnt/test/resources wasabi-us-east-1:test-cdn0

The rclone config contents with secrets removed.

[wasabi-us-east-1]
type = s3
provider = Wasabi
env_auth = false
endpoint = s3.us-east-1.wasabisys.com
location_constraint = us-east-1
access_key_id = ...
secret_access_key = ...

A log from the command with the -vv flag

Transferred:       36.759G / 37.316 GBytes, 99%, 1.033 MBytes/s, ETA 9m12s
Transferred:       920186 / 930198, 99%
Elapsed time:   10h7m31.5s
Transferring:

hello and welcome to the forum,

i also use wasabi and aws s3 deep glacier.
wasabi handles lots of small files well.

there is no --magic flag that will figure out the optimal set of flags for a given use case.
need a good log file to see what is really going on, as to why it is stuck at 99% for 10 hours.

is this a one-time transfer or something to be run on a schedule?
if on a schedule, then you can use --max-age to reduce the amount of checks per sync.
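for example (a sketch; the path and remote are from the original command, the age window is illustrative):

```shell
# on a schedule, only consider source files modified in the last 24 hours
rclone sync /vmnt/test/resources wasabi-us-east-1:test-cdn0 --max-age 24h
```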

how much ram does the ubuntu server have?
based on that, then you will know how to tweak the flags.
given the small file size, not sure --s3-chunk-size and --s3-upload-concurrency will help much

i would increase --checkers and --transfers to 32, perhaps higher.
and --fast-list
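something like this (a sketch; values to taste):

```shell
# more parallelism for many small files; --fast-list trades memory for fewer listing calls
rclone copy /vmnt/test/resources wasabi-us-east-1:test-cdn0 \
  --checkers 32 --transfers 32 --fast-list --progress --log-file rclone.log -vv
```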

perhaps ask the data center, as you have some kind of high speed transit.
not sure what that means exactly - will a single file transfer at high speed, or is there very low latency so lots of small files upload quickly?

1 Like

hello and welcome ... Thank you!
wasabi handles lots of small files well. - our exact requirement.
~106.74 GiB in 2,907,626 directories & files for the test data set. The real one is about 4.8 TiB.

The --experience option is why I came here. :slight_smile:

need a good log file to see what is really going on, as to why it is stuck at 99% for 10 hours.

Are you suggesting aborting current copy operation? If so should next operation be copy or sync?

if this a one-time transfer or something to be run on a schedule?

Ideally one-time, but a few will do.

how much ram does the ubuntu server have?

4GiB RAM

free -m
              total        used        free      shared  buff/cache   available
Mem:           3936        2193         143           1        1599        1593
Swap:          2047         252        1795

given the small file size, not sure --s3-chunk-size and --s3-upload-concurrency will help much

i would increase --checkers and --transfers to 32, perhaps higher. and --fast-list

as you have some kind of high speed transit. not sure what that means exactly - will a single file transfer at high speed, or is there very low latency so lots of small files upload quickly?

The Ubuntu server is a VMware VM with a 10G network link into the datacenter core network (both high speed and low latency within the datacenter), then a transit link to Wasabi.

The Ubuntu server has a 1G network link to the Red Hat Linux server, which has a 3G SAS link to the RAID5 array.

The transfer choke point will be either the 1G link between our servers or the transit link to Wasabi. Our servers have SolarWinds Orion monitoring to measure CPU, memory, and network performance.

Large file transfers perform fairly well; do you want numbers?

that is a very large number of dirs and files.
not sure what the best set of flags is, or if any set of flags will make a major difference.

so the progress stats have not changed in 10 hours?
no errors are displayed.

i would contact wasabi, as i found their tech support very helpful.
https://wasabi.com/wasabi-file-acceleration/

if this is a one-time transfer, perhaps https://wasabi.com/wasabi-ball/

Wasabi hardware solutions (File Acceleration and Ball) are beyond our budget and require datacenter operator cooperation, which is not possible soon enough. Thank you for pointing them out.

The rclone copy runs inside a GNU screen virtual terminal doing 4 transfers at once, if I read the bottom lines correctly; the top lines show progress too:

tomcat@marble:~/rclone$ rclone-res1.sh
Transferred:       45.801G / 46.186 GBytes, 99%, 1.032 MBytes/s, ETA 6m21s
Transferred:      1124462 / 1134474, 99%
Elapsed time:   12h37m6.0s
Transferring:

If we do nothing it should finish in about 24 hours.

sometimes doing nothing is very hard to do.....

need a good log file to see what is really going on, as to why it is stuck at 99% for 10 hours.

Are you suggesting aborting current copy operation? If so should next operation be copy or sync?
If we let the current operation finish, I will continue setup for the 4.8 TiB data set, where I need the speed.
Whichever path we follow, I learn something; I do prefer the faster path if we can see it.
Recommendation? Suggestion?

i do not know what is really going on and really do not know your exact use case.
if this is a one-time transfer, and you can wait, then i would wait.

if this is a test run for that 4.8TB, and you plan to run a sync on that 4.8TB on a repeated schedule, then i would kill the current rclone sync, increase --checkers and --transfers to 32, and compare the two runs.
i would also run a test with less files and not have to wait 24+ hours per test.
find a set of folders and sub folders with perhaps 10,000 files and test rclone with different flags.

if that 4.8TB is a one-time sync, then i might try https://rclone.org/docs/#no-check-dest
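sketches of both ideas (the subtree name is a placeholder, pick any folder with ~10,000 files):

```shell
# timed test run on one subtree with candidate flags, to compare settings quickly
time rclone copy /vmnt/test/resources/some-folder wasabi-us-east-1:test-cdn0/some-folder \
  --checkers 32 --transfers 32 --fast-list -vv --log-file test.log

# one-time transfer to an empty bucket: skip destination checks entirely
rclone copy /vmnt/test/resources wasabi-us-east-1:test-cdn0 --no-check-dest
```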

1 Like

Thank you; despite incomplete information, IMO, you're providing excellent advice. :grinning:
Allow me to start with use case context:

  • vFlyer is a Java web application providing web content authoring services similar to SquareSpace, Wix, WordPress, etc. Our service includes domain resale, which makes custom-domain websites possible: https://www.1240westmain.com/
  • Customers compose complex web pages; work products are HTML, style sheet, & image files i.e. "Resources".
  • Resources are served with Nginx which also reverse proxies Apache Tomcat hosting our Java web applications.
  • Resources are NFS-shared by a dedicated Red Hat Enterprise Linux 5 server, stored on triple-volume RAID5 SCSI arrays, ext3 format, and mounted as:
    • /resources/r1 - 1,833 GiB volume, 1,263 GiB used
    • /resources/r2 - 1,833 GiB volume, 1,664 GiB used
    • /resources/r3 - 1,833 GiB volume, 1,347 GiB used
    • 4,274 GiB used total in Production environment, last of three:
      1. Windows Developer workstation - no change, Resources remain local
      2. Linux Staging integration - Resources copy to Wasabi currently in progress
      3. Linux Production operation - Resources copy to Wasabi planning in progress
  • Production Resources are proxied by BunnyCDN now and in the future.
  • We will rely on rclone mount (per forum topic 19903, 21-Oct-2020, Linux NFS Server with Rclone and local disk for cache) to provision a POSIX-compliant NFS share, taking great care to mount the remote bucket just once from a single NFS rclone host.
  • The RAID5 array drives are old and slow, making traversal of the large directory and file population painful, so we plan to traverse the old volumes just once.
  • The Production Resources move is planned as a "hot" move; Production application writes continue while the move is in progress, using an OverlayFS twist.
  • To make a single traversal of the old volumes possible, we added an Ubuntu 20.04 LTS virtual machine that NFS-shares an OverlayFS with a 1 TiB XFS upper layer and an old volume as the lower layer; one OverlayFS mount per old volume.
  • OverlayFS writes to the upper layer only, making the lower layer effectively read-only and safe to traverse just once.
  • Later catch-up copy operations will be from the upper layer only, a much smaller data set, as writes are relatively infrequent events compared to reads.
  • BunnyCDN caching reduces the rate of read events at our server.
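The OverlayFS arrangement above can be sketched as follows (paths are placeholders; upperdir and workdir must live on the same filesystem, here the 1 TiB XFS volume):

```shell
# one OverlayFS mount per old volume: old volume read-only below, XFS writes above
mount -t overlay overlay \
  -o lowerdir=/resources/r1,upperdir=/xfs/upper/r1,workdir=/xfs/work/r1 \
  /overlay/r1
```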

Some Production Resources date back to our 2006 service launch; we are considering options to order the copy by modification time for some directories, whereas other directories must be copied completely irrespective of age.
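One way to approximate that split with rclone's age filters (not a strict ordering, just a two-pass approach; the cutoff, source path, and bucket name are illustrative placeholders):

```shell
# pass 1: recently modified files first
rclone copy /overlay/r1 wasabi-us-east-1:prod-resources --max-age 365d
# pass 2: everything older
rclone copy /overlay/r1 wasabi-us-east-1:prod-resources --min-age 365d
```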

Our next steps are to deploy the Production Resources OverlayFS mounts, then begin limited-scope Production Resources copy tests from the "read only" legacy volumes to Wasabi.

IMO all your suggestions are worthy considerations. We have these options:

  • continue current Staging Resources rclone copy or interrupt as seems prudent
  • start Production Resources rclone copy tests immediately after OverlayFS deployment

I am working toward having Production Resources OverlayFS mounts completed tonight.

Success looks like fully utilizing the 1G network link between our legacy RHEL5 NFS-shared RAID5 array server and the Ubuntu NFS-shared OverlayFS server performing rclone copy operations.

thanks,

that is a complex setup and rclone is being used in a commercial use case.

perhaps consider using the main author as a consultant, @ncw
perhaps consider https://rclone.org/donate/

if those options are not possible, let me know and i will try to help.

If you haven't already, it is worth reading this bit of the s3 docs

That gives some tips for reducing number of transactions which speeds things up.

1 Like

Thank you! I am very grateful to you for pointing out the golden nuggets I overlooked by skipping directly to the Wasabi section.

Staging Resources rclone copy finished:

tomcat@marble:~/rclone$ rclone-res1.sh
Transferred:      101.895G / 101.895 GBytes, 100%, 1020.408 kBytes/s, ETA 0s
Transferred:      2483601 / 2483601, 100%
Elapsed time:    29h5m7.6s
tomcat@marble:~/rclone$ cat test-cdn0.log
Thu 18 Feb 2021 02:10:24 AM PST
Fri 19 Feb 2021 07:15:31 AM PST
tomcat@marble:~/rclone$

rclone Elapsed time and date command before and after values agree exactly. :grinning:
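For reference, a wrapper like rclone-res1.sh might look something like this (a hypothetical reconstruction; the actual script contents were not posted):

```shell
#!/bin/sh
# record wall-clock start and end times around the copy
date >> test-cdn0.log
rclone copy --progress /vmnt/test/resources wasabi-us-east-1:test-cdn0
date >> test-cdn0.log
```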

  • I don't have fiscal authority but agree this is a worthy consideration which I shall forward.
  • We are under time pressure, no days off; any help is appreciated.

ok

now that is done, what is the next concern?

Undecided, but the likely one is transplanting Production from the St. Louis datacenter to AWS us-east-2; Staging is already there.

First rclone copy completed was Staging Resources from St. Louis (no egress charge) to Wasabi.

Immediate potential actions:

  • rclone sync AWS Staging Resources with Wasabi
    • AWS Staging was cloned off ~10 days ago
    • small testing team doesn't change data fast
  • rclone mount Wasabi as AWS Staging Resources
  • rclone mount VFS Cache verify POSIX compliance

Today's plan is to write the OverlayFS mounts for deployment late tonight, then remount the Production Resources volumes read-only to improve access by omitting inode st_atime, st_ctime, st_mtime updates.
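A remount along those lines might look like this (a sketch; the volume paths are from the list above):

```shell
# remount legacy volumes read-only; blocks all further inode time updates
mount -o remount,ro /resources/r1
mount -o remount,ro /resources/r2
mount -o remount,ro /resources/r3
```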

Once OverlayFS mounts are written I need to estimate 1G network link transfer limits to support a full or partial copy decision to be made soon. Beyond that, Production Resources rclone operations begin in some form the great people here are helping me discover. Thank you.

is there a technical problem with rclone itself?

  • None that I know of so far but that means almost nothing since we haven't taken rclone very far.
  • rclone mount has received VFS Cache POSIX compliance fixes
  • You see caution in abundance here, as the project started last March with s3fs as the S3 solution:
    • because s3fs was an Ubuntu standard package but
    • NFS-shared s3fs mount directory views are inconsistent across NFS client mounts and
    • s3fs lacks copy tools altogether; standard single-threaded tools were abysmally slow
      • Wasabi has read-after-write consistency unlike prior AWS S3 eventual consistency
      • Wasabi has a long pause between the last upload transfer and the upload response
      • Wasabi requires strong parallelism to keep operation times effective
    • we estimated 5.5 years to copy the Production Resources volumes with an s3fs mount
  • Rclone project appears to have aggressive feature development and highly proactive support. :grinning:

that is correct.

at this point, i am trying to understand, do you need any help with rclone itself?

1 Like

No, not right now.

  • Expect posts later today or tomorrow with Production Resources rclone copy options for inspection and comment.
  • Expect posts tomorrow or Sunday with rclone mount options and VFS Cache tuning concerns.