I am trying to write a large number of small files (on the order of 10k files, each around 100 KB) to an S3 bucket through an rclone mount. I used the following command to create the mount point. Everything runs inside a CI pipeline.
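(For reference, this is the invocation as it appears in the debug log further down; remote and file names are as redacted there:)

```shell
rclone mount s3-bucket:some-s3-bucket/namespace namespace \
  --daemon --no-modtime \
  --vfs-cache-mode writes \
  --transfers 32 \
  --s3-no-head \
  --log-file=rclone-logs.txt \
  --log-level DEBUG \
  --config .rclone.conf
```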
After mounting, a separate program reads files from the local filesystem, transforms them, and writes them to the namespace/ directory. As soon as this program finishes, I unmount the S3 mount point using the following command.
fusermount -u ./namespace
When the pipeline finished, I noticed that a few files were missing from the bucket. These files should have been written towards the end of the transformation program's run. How do I resolve this? More specifically, I have two questions: how do I wait for rclone to write everything from the VFS cache to the S3 bucket, and how do I ensure that my CI pipeline fails if rclone failed to write even one file to the bucket?
What is your rclone version (output from rclone version)
v1.56.2-linux-amd64
Which cloud storage system are you using? (eg Google Drive)
AWS S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Sorry, I had to re-run the pipeline after enabling debug logs, and that took time. As I suspected, it provided no information beyond what I've already written in my question. The following are the failure logs, which I believe start appearing as soon as the unmount command is issued. I've changed file names and redacted request URLs.
2021/10/28 15:27:11 DEBUG : vfs cache: cleaner exiting
2021/10/28 15:27:11 ERROR : file-08937-1-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO : file-08937-1-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-3-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO : file-08937-3-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-2-0006.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO : file-08937-2-0006.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-0-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO : file-08937-0-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 ERROR : file-08937-2-0007.ts: Failed to copy: Put "<AWS PUT REQUEST URL>": context canceled
2021/10/28 15:27:11 INFO : file-08937-2-0007.ts: vfs cache: upload canceled
2021/10/28 15:27:11 DEBUG : rclone: Version "v1.56.2" finishing with parameters ["rclone" "mount" "s3-bucket:some-s3-bucket/namespace" "namespace" "--daemon" "--no-modtime" "--vfs-cache-mode" "writes" "--transfers" "32" "--s3-no-head" "--log-file=rclone-logs.txt" "--log-level" "DEBUG" "--config" ".rclone.conf"]
The question remains: how to wait for rclone to write everything from VFS cache to S3 bucket before it exits after receiving the unmount command?
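Ideally I'd want the unmount step to look something like the sketch below: poll the VFS write cache until it drains, then unmount. Note this is only a sketch; the cache path assumes rclone's default cache location, which I haven't verified for my setup.

```shell
# Sketch only: block until a directory contains no regular files.
# Intended use: point it at rclone's VFS cache directory for this remote
# (the default location under ~/.cache/rclone/vfs/ is an assumption)
# before running `fusermount -u ./namespace`.
wait_for_cache_drain() {
  dir="$1"
  while [ -n "$(find "$dir" -type f 2>/dev/null | head -n 1)" ]; do
    sleep 5   # files still pending upload
  done
}
```

For example: `wait_for_cache_drain "$HOME/.cache/rclone/vfs/s3-bucket/some-s3-bucket/namespace" && fusermount -u ./namespace`.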
@Animosity022 While mount does add complexity, my overall workflow was quite simplified if I used it. The CI pipeline also needs to perform routine clean-up operations after writing to the S3 bucket. It didn't happen this time because it was literally the first run on an empty bucket. To perform these tasks, I can either clone the relevant contents from the bucket to CI's local FS or directly use the S3 API to manipulate objects. The former is infeasible since I am using a third-party CI service with limited storage on worker nodes and the latter is more complicated than simply using a mount point and manipulating objects as regular files. Anyways, since there are no more choices left, I'll refactor the scripts to directly use the S3 client for performing clean-up tasks.
You can wait for rclone move to finish before moving forward; once it's done, everything is uploaded. You also don't have to deal with the extra layer of the mount and any inconsistencies from using it, which is why you posted here in the first place, as your process was not working, no?
You don't save any storage either: with --vfs-cache-mode writes, every write goes to local disk first anyway in your setup.
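As a rough sketch (the staging directory and remote path here are placeholders, not taken from your pipeline):

```shell
# Sketch: upload via rclone move instead of a mount. rclone move only
# returns after all transfers have finished, and it exits non-zero if
# any upload failed, so the CI step can fail on its exit status alone.
upload_and_verify() {
  rclone move ./staging "s3-bucket:some-s3-bucket/namespace" \
    --transfers 32 --config .rclone.conf
}
```

With `set -e` in the CI script, a failing `upload_and_verify` stops the pipeline, which covers both of your questions at once.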
Makes sense! Thanks a ton! I messed up while writing the scripts: I mixed the clean-up tasks with the writing task. A single script goes like -> partial clean-up (needs mount) -> write -> more clean-up. Instead, I should've done it just like you said: write -> move -> clean-up. That way, I don't need a mount for the write operation. And I am assuming deletes are synchronous for a mount?
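Roughly, I expect the refactored step to look like this sketch (`run_transform` stands in for the transformation program, and the clean-up path is a placeholder):

```shell
# Sketch of the refactored CI step: write locally, move, then clean up,
# with no mount involved. All names here are placeholders.
ci_step() {
  run_transform ./staging || return 1   # transformation program writes locally
  rclone move ./staging "s3-bucket:some-s3-bucket/namespace" \
    --transfers 32 --config .rclone.conf || return 1   # waits for uploads; non-zero on any failure
  rclone delete "s3-bucket:some-s3-bucket/namespace/tmp" \
    --config .rclone.conf || return 1   # clean-up via the backend API, not the mount
}
```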