Unmount after writing with VFS cache and multiple transfer workers causes data loss

What is the problem you are having with rclone?

I am trying to write a large number of small files (on the order of 10k files, ~100 KB each) to an S3 bucket through an rclone mount. I used the following command to create the mount point. Everything runs inside a CI pipeline.

rclone mount s3-bucket:some-s3-bucket/namespace namespace --daemon --no-modtime --vfs-cache-mode writes --transfers 32 --s3-no-head --config .rclone.conf

After mounting, a separate program reads files from the local filesystem, transforms them, and writes them to the namespace/ directory. As soon as this program finishes, I unmount the S3 mount point with the following command.

fusermount -u ./namespace

When the pipeline finished, I noticed that a few files were missing from the bucket. These files should have been written towards the end of the transformation program. How do I resolve this? More specifically, I have two questions: how do I wait for rclone to write everything from the VFS cache to the S3 bucket, and how do I ensure that my CI pipeline fails if rclone fails to write even one file to the bucket?

What is your rclone version (output from rclone version)

v1.56.2-linux-amd64

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount s3-bucket:some-s3-bucket/namespace namespace --daemon --no-modtime --vfs-cache-mode writes --transfers 32 --s3-no-head --config .rclone.conf

The rclone config contents with secrets removed.

[s3-bucket]
type = s3
provider = AWS
env_auth = true
region = us-west-1
location_constraint = us-west-1
storage_class = STANDARD

A log from the command with the -vv flag

An rclone log file will show you what rclone is doing.

--log-level=INFO --log-file=rclone.log should be enough.
Otherwise, use --log-level=DEBUG --log-file=rclone.log.

Sorry, I had to re-run the pipeline after enabling debug logs and that took time. As I suspected, it provided no information beyond what I've already written in my question. The following are the failure logs, which I believe start occurring as soon as the unmount command is issued. I've changed file names and redacted request URLs.

2021/10/28 15:27:11 DEBUG : vfs cache: cleaner exiting
2021/10/28 15:27:11 ERROR : file-08937-1-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO  : file-08937-1-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-3-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO  : file-08937-3-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-2-0006.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO  : file-08937-2-0006.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-0-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO  : file-08937-0-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-2-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO  : file-08937-2-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 DEBUG : rclone: Version "v1.56.2" finishing with parameters ["rclone" "mount" "s3-bucket:some-s3-bucket/namespace" "namespace" "--daemon" "--no-modtime" "--vfs-cache-mode" "writes" "--transfers" "32" "--s3-no-head" "--log-file=rclone-logs.txt" "--log-level" "DEBUG" "--config" ".rclone.conf"]

The question remains: how do I get rclone to write everything from the VFS cache to the S3 bucket before it exits after receiving the unmount command?

Nevermind! Deal with flushing the vfs cache on exit · Issue #1909 · rclone/rclone · GitHub
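
For anyone else hitting this: since the cache is not flushed on exit, one crude sweep-up I considered (untested, and it assumes the default VFS cache location under ~/.cache/rclone/vfs/; adjust if you set --cache-dir) is to copy whatever is still sitting in the cache directory after unmounting:

# after fusermount -u, push anything still left in the VFS write cache
# the path below assumes rclone's default cache dir layout for this mount
rclone copy ~/.cache/rclone/vfs/s3-bucket/some-s3-bucket/namespace s3-bucket:some-s3-bucket/namespace --config .rclone.conf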

Why use a mount at all? Just write to a local directory and rclone move that directory; it'll exit properly when done.

You are already caching writes, so everything is written locally anyway; there are no space savings, and the mount just adds complexity.

@Animosity022 While a mount does add complexity, my overall workflow is much simpler when I use one. The CI pipeline also needs to perform routine clean-up operations after writing to the S3 bucket. That didn't happen this time because it was literally the first run on an empty bucket. To perform these tasks, I can either clone the relevant contents from the bucket to the CI runner's local filesystem or use the S3 API directly to manipulate objects. The former is infeasible since I am using a third-party CI service with limited storage on worker nodes, and the latter is more complicated than simply using a mount point and manipulating objects as regular files. Anyway, since there are no other choices left, I'll refactor the scripts to use the S3 client directly for the clean-up tasks.
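
For reference, an alternative I'm considering is to keep the clean-up on rclone itself rather than a separate S3 client, since rclone can manipulate objects in the bucket directly with its delete/deletefile commands. A rough sketch, where the object names and the age filter are placeholders for whatever my clean-up rules actually match:

# remove a single stale object
rclone deletefile s3-bucket:some-s3-bucket/namespace/stale-file.ts --config .rclone.conf

# remove everything under a prefix that is older than 7 days
rclone delete s3-bucket:some-s3-bucket/namespace/tmp --min-age 7d --config .rclone.conf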

You are:

  • writing a file
  • file is written to rclone mount
  • rclone writes file to local cache
  • rclone mount uploads files from cache

Instead of:

  • write file
  • rclone move when done to cloud

You can wait for rclone move to finish before moving forward, since at that point everything has been uploaded, and you don't have to deal with the extra layer of the mount and any inconsistencies from using it, which is why you posted in the first place, as your process was not working, no?

You don't save any storage either: with cache mode writes, every write goes to local disk first anyway in your setup.
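
Roughly, your CI step could look something like this; the transform command and the staging directory are placeholders for whatever your pipeline actually runs, but the key point is that rclone move only returns once all transfers are done and exits non-zero if any file failed, so the job fails automatically:

#!/usr/bin/env bash
set -euo pipefail

# write the transformed files to a plain local directory instead of the mount
./transform --output ./staging   # placeholder for your transformation program

# upload everything; rclone move exits non-zero if any transfer fails
rclone move ./staging s3-bucket:some-s3-bucket/namespace --transfers 32 --s3-no-head --config .rclone.conf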

Makes sense! Thanks a ton! I messed up while writing the scripts: I mixed the clean-up tasks and the writing task. A single script goes partial clean-up (needs mount) -> write -> more clean-up. Instead, I should have done it just like you said: write -> move -> clean-up. This way, I don't need a mount for the write operation. And I am assuming deletes are synchronous for a mount?

I believe in the majority of cases that would be a correct statement. I just hate to say 'all' if I'm not certain.
