Recommended setup for cold-storage archive

Hi, we’re currently investigating using Oracle Cloud archive storage for our disaster-recovery backups. We set up the remote using the Amazon S3 backend, which Oracle can emulate. (I’m not sure whether there is any benefit to using the Swift backend, but I’d be interested in hearing more.) We are running rclone directly on our filer to avoid network bottlenecks.

What would be the recommended setup for cold-storage archives? To be clear, nothing on the remote side would change compared to any other location. We have looked into using sync, but every time we sync we incur a massive number of transaction charges (we have lots of directories). I have seen the mount and cache commands, as well as the cache options for the sync command.

The following is the command that we were running originally, which resulted in the charges.

rclone --fast-list -v --config /etc/rclone.conf sync --delete-during --transfers 64 --checkers 48 --contimeout 60s --timeout 300s --retries 3 --low-level-retries 10 --stats 10s --filter-from /etc/rclone_filter.conf /tank/data/root config:bucket/

I should mention that lowering the number of transactions is the primary goal here.

Thanks.

To minimise transactions with S3 (and Swift for that matter), use --fast-list and either --size-only or --checksum. If you don’t use --size-only or --checksum then rclone reads the metadata for each object to get the modification time, which is probably what you are noticing.

--size-only is the quickest but least accurate; --checksum will read the local objects and calculate their MD5 on each sync, which is great for data integrity but not so great for disk I/O. Perhaps you could do a --size-only sync daily and a --checksum sync once a week.
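A minimal sketch of that schedule, reusing the source path and remote name from the original command (paths and schedule are illustrative, not prescriptive):

```shell
# Daily: compare by size only -- cheapest in transactions, least accurate.
rclone sync --fast-list --size-only --config /etc/rclone.conf \
    /tank/data/root config:bucket/

# Weekly: compare by MD5 checksum -- heavier local disk I/O, but verifies content.
rclone sync --fast-list --checksum --config /etc/rclone.conf \
    /tank/data/root config:bucket/
```

Both runs keep --fast-list so directory listings happen in as few requests as possible.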

I’d use the S3 interface - the large-object handling in Swift is not as good as that in S3, IMHO.


Hi Nick, thanks for your reply.

I am using --fast-list currently, and local disk I/O isn’t much of a problem, as we’ll be running this overnight.

To obtain the checksum or size of a file, doesn’t that still incur a transaction?

I was thinking that something like this would work, which is what I understand the cache does: for every backup, write out the hashes of all files on disk and compare them to the hashes from the last backup. Then transfer only the files that are different or new, and delete the files that no longer exist.

So essentially, it would keep a cache of the metadata of the remote side on the local side, and compare against that. Then, the only transactions that would be charged would be the uploads.

No, because both of those come back in the listing of the bucket. The modification time doesn’t - hence the extra transactions.

Yes, that is what the cache does, and it should work like that. You’ll want to tune the cache’s retention times carefully, as the cache will re-list directories when they fall out of the cache.

I’m not 100% sure you need it with --checksum --fast-list but it might be worth a try.

If we are not modifying the bucket from anywhere else, is it advisable to set the retention times to be very large?

I’m also thinking that since we’re using a ZFS file system, we could upload our daily snapshots instead. Would the best way to do this be piping zfs send into rclone rcat?

Yes that would seem sensible.

rclone rcat has no way of retrying if the transfer goes wrong. I don’t know enough about ZFS to say whether that is a problem - does it matter if you miss one transfer? If it does, then use zfs to create the file in a local directory with a unique name, and then use rclone move to move it to the cloud.
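A sketch of both variants; the snapshot name, spool directory, and remote path here are hypothetical and would need adapting to the actual pool layout:

```shell
# Hypothetical snapshot name and paths -- adjust for your pool/dataset.
SNAP="tank/data@daily-$(date +%F)"

# Variant 1: stream directly. Simple, but there is no retry if the
# upload fails mid-stream.
zfs send "$SNAP" | rclone rcat "Backup:bucket/snapshots/daily-$(date +%F).zfs"

# Variant 2 (safer): spool to a local file first, then let rclone move
# handle retries; the local copy is deleted only after a successful upload.
zfs send "$SNAP" > "/tank/spool/daily-$(date +%F).zfs"
rclone move /tank/spool/ "Backup:bucket/snapshots/"
```

Variant 2 trades temporary local disk space for the ability to retry a failed transfer.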

Ok. I took a stab at using cache.

[Backup]
type = s3
...

[Cache]
type = cache
remote = Backup:bucket/
chunk_size = 10M
info_age = 8760h
chunk_total_size = 1000G

Then, I sync with the following:

$ rclone --fast-list --checksum -v --config /etc/rclone.conf sync --delete-during  --transfers 64 --checkers 48 --contimeout 60s --timeout 300s --retries 3 --low-level-retries 10 --stats 10s --filter-from /etc/rclone_filter.conf --cache-workers=32 /tank/data/ Cache:

After sitting there for a while, it starts syncing. One thing that I have noticed is that for directories, I continuously get “put: cache expired”. Is this to be expected?

I’ve also noticed that when the sync starts, it seems to just sit there for a long time, printing that it hasn’t done anything (this takes almost two hours with our dataset of 190k files and 91k directories), and the CPU is only running on one core. No debug information is written even with -vvvvv specified. I assume it’s checking the hashes, but shouldn’t this be a parallel operation?

2018/06/20 17:04:23 INFO  : Cache: Cache DB path: /export/home/user/.cache/rclone/cache-backend/cache.db
2018/06/20 17:04:23 INFO  : Cache: Cache chunk path: /export/home/user/.cache/rclone/cache-backend/Cache
2018/06/20 17:04:23 INFO  : Cache: Chunk Memory: true
2018/06/20 17:04:23 INFO  : Cache: Chunk Size: 10M
2018/06/20 17:04:23 INFO  : Cache: Chunk Total Size: 1000G
2018/06/20 17:04:23 INFO  : Cache: Chunk Clean Interval: 1m0s
2018/06/20 17:04:23 INFO  : Cache: Workers: 32
2018/06/20 17:04:23 INFO  : Cache: File Age: 8760h0m0s
2018/06/20 17:04:23 INFO  : Cache remote Cache:: Modify window is 1ns
2018/06/20 17:04:23 INFO  : Waiting for deletions to finish
2018/06/20 17:04:33 INFO  :
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 0
Transferred:            0
Elapsed time:         10s

2018/06/20 17:04:43 INFO  :
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 0
Transferred:            0
Elapsed time:         20s

2018/06/20 17:04:53 INFO  :
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 0
Transferred:            0
Elapsed time:         30s
...

I’m not sure what that message means exactly… @remus.bunduc is it expected?

This should be a parallel operation.

I wonder if this is caused by --fast-list? Can you try without that? You shouldn’t need it with the cache. Cache does support it, but I think it isn’t a well used code path.

--fast-list was indeed the issue. It now starts instantly.

Before putting this into production - I know that the cache system is still in beta. Is there any danger in using it in its current state?

Another thing I have noticed: when I added a few exclusions to my filter-from file, anything that matches throws the following error when the --delete-excluded flag is specified (as I want to remove those files from the remote):

2018/06/23 00:02:32 ERROR : my/file/to/ignore.ext: error refreshing object in : in cache fs S3 bucket mybucket: object not found

My command is:

rclone --size-only -v --config /etc/rclone.conf sync --transfers 64 --checkers 48 --contimeout 60s --timeout 300s --retries 3 --low-level-retries 10 --stats 60s --filter-from /etc/rclone_filter.conf --cache-workers=32 --cache-info-age=8760h --delete-excluded --stats 10s /tank/data/ Cache:

Thanks for your help - it’s greatly appreciated.

I think the worst that might happen is some data doesn’t get synced that should have been.

You could run an independent rclone check --fast-list --checksum /path/to/source dest: (not via the cache) every now and again to verify everything.
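Applied to the setup in this thread, that would look something like the following, with the remote names taken from the config shown earlier (point it at the underlying S3 remote, not the cache wrapper):

```shell
# Verify the archive directly against S3, bypassing the cache backend,
# comparing by MD5 checksum rather than modification time.
rclone check --fast-list --checksum --config /etc/rclone.conf \
    /tank/data/ Backup:bucket/
```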

Can you show a log with -vv of the problem? Not sure exactly what is going on there!

2018/06/22 23:55:28 DEBUG : path/to/a/folder: list: cached entries: [path/to/a/folder/data path/to/a/folder/file.dat]
2018/06/22 23:55:28 DEBUG : Cache remote Cache:: list 'path/to/a/folder/data'
2018/06/22 23:55:28 DEBUG : path/to/a/folder/data/file.dat.exclude: Excluded from sync (and deletion)
2018/06/22 23:55:28 DEBUG : path/to/a/folder/data: list: warm 1 from cache for: path/to/a/folder/data, expiring on: 2019-06-22 20:21:20.677563142 -0400 EDT
2018/06/22 23:55:28 DEBUG : path/to/a/folder/data: list: cached entries: [path/to/a/folder/data/file.dat.exclude]
2018/06/22 23:55:48 ERROR : path/to/a/folder/data/file.dat.exclude: error refreshing object in : in cache fs S3 bucket mybucket: object not found
2018/06/22 23:55:48 ERROR : path/to/a/folder/data/file.dat.exclude: Couldn't delete: in cache fs S3 bucket mybucket: object not found
2018/06/22 23:57:16 DEBUG : path/to/a/folder: list: cached entries: [path/to/a/folder/data path/to/a/folder/file.dat]

I tried to replicate this but failed. Can you work out what I need to do differently?

I set up a cache pointing to a local directory in /tmp TestCache:

(I also tried this with s3 as the destination with the same results)

$ rclone config show TestCache
--------------------
[TestCache]
type = cache
remote = /tmp/rclone_cache_test
chunk_size = 
info_age = 
chunk_age = 
warmup_age = 
--------------------

I then set up a source directory

$ mkdir /tmp/rclone_cache_test
$ ls /tmp/rclone_cache_test
$ rclone --cache-db-purge ls TestCache:
$ cd /tmp/
$ mkdir source
$ echo one > source/normal
$ echo two > source/excluded
$ ls -l /tmp/source/
total 8
-rw-rw-r-- 1 ncw ncw 4 Jun 24 09:49 excluded
-rw-rw-r-- 1 ncw ncw 4 Jun 24 09:49 normal

Initial sync

$ rclone -vv sync /tmp/source/ TestCache:
2018/06/24 09:50:08 DEBUG : rclone: Version "v1.42-010-g935533e5" starting with parameters ["rclone" "-vv" "sync" "/tmp/source/" "TestCache:"]
2018/06/24 09:50:08 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/06/24 09:50:08 DEBUG : TestCache: wrapped local:/tmp/rclone_cache_test at root 
2018/06/24 09:50:08 INFO  : TestCache: Cache DB path: /home/ncw/.cache/rclone/cache-backend/TestCache.db
2018/06/24 09:50:08 INFO  : TestCache: Cache chunk path: /home/ncw/.cache/rclone/cache-backend/TestCache
2018/06/24 09:50:08 INFO  : TestCache: Chunk Memory: true
2018/06/24 09:50:08 INFO  : TestCache: Chunk Size: 5M
2018/06/24 09:50:08 INFO  : TestCache: Chunk Total Size: 10G
2018/06/24 09:50:08 INFO  : TestCache: Chunk Clean Interval: 1m0s
2018/06/24 09:50:08 INFO  : TestCache: Workers: 4
2018/06/24 09:50:08 INFO  : TestCache: File Age: 6h0m0s
2018/06/24 09:50:08 DEBUG : Adding path "cache/expire" to remote control registry
2018/06/24 09:50:08 DEBUG : Adding path "cache/stats" to remote control registry
2018/06/24 09:50:08 DEBUG : Cache remote TestCache:: list ''
2018/06/24 09:50:08 DEBUG : : list: empty listing
2018/06/24 09:50:08 DEBUG : : list: read 0 from source
2018/06/24 09:50:08 DEBUG : : list: source entries: []
2018/06/24 09:50:08 DEBUG : : list: cached directories: 0
2018/06/24 09:50:08 DEBUG : : list: cached dir: '', cache ts: 2018-06-24 09:50:08.871707105 +0100 BST m=+0.017722264
2018/06/24 09:50:08 INFO  : Cache remote TestCache:: Waiting for checks to finish
2018/06/24 09:50:08 DEBUG : Cache remote TestCache:: put data at 'normal'
2018/06/24 09:50:08 INFO  : Cache remote TestCache:: Waiting for transfers to finish
2018/06/24 09:50:08 DEBUG : Cache remote TestCache:: put data at 'excluded'
2018/06/24 09:50:08 DEBUG : normal: put: uploaded to remote fs
2018/06/24 09:50:08 DEBUG : excluded: put: uploaded to remote fs
2018/06/24 09:50:08 DEBUG : normal: put: added to cache
2018/06/24 09:50:08 DEBUG : excluded: put: added to cache
2018/06/24 09:50:08 DEBUG : : cache: expired 
2018/06/24 09:50:08 INFO  : : put: cache expired
2018/06/24 09:50:08 DEBUG : normal: object hash cached: 5bbf5a52328e7439ae6e719dfe712200
2018/06/24 09:50:08 INFO  : normal: Copied (new)
2018/06/24 09:50:08 DEBUG : : cache: expired 
2018/06/24 09:50:08 INFO  : : put: cache expired
2018/06/24 09:50:08 DEBUG : excluded: object hash cached: c193497a1a06b2c72230e6146ff47080
2018/06/24 09:50:08 INFO  : excluded: Copied (new)
2018/06/24 09:50:08 INFO  : Waiting for deletions to finish
2018/06/24 09:50:08 INFO  : 
Transferred:      8 Bytes (138 Bytes/s)
Errors:                 0
Checks:                 0
Transferred:            2
Elapsed time:          0s

2018/06/24 09:50:08 DEBUG : 7 go routines active
2018/06/24 09:50:08 DEBUG : rclone: Version "v1.42-010-g935533e5" finishing with parameters ["rclone" "-vv" "sync" "/tmp/source/" "TestCache:"]
2018/06/24 09:50:08 DEBUG : Cache remote TestCache:: Services stopped

Second sync

$ rclone -vv sync /tmp/source/ TestCache:
2018/06/24 09:50:23 DEBUG : rclone: Version "v1.42-010-g935533e5" starting with parameters ["rclone" "-vv" "sync" "/tmp/source/" "TestCache:"]
2018/06/24 09:50:23 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/06/24 09:50:23 DEBUG : TestCache: wrapped local:/tmp/rclone_cache_test at root 
2018/06/24 09:50:23 INFO  : TestCache: Cache DB path: /home/ncw/.cache/rclone/cache-backend/TestCache.db
2018/06/24 09:50:23 INFO  : TestCache: Cache chunk path: /home/ncw/.cache/rclone/cache-backend/TestCache
2018/06/24 09:50:23 INFO  : TestCache: Chunk Memory: true
2018/06/24 09:50:23 INFO  : TestCache: Chunk Size: 5M
2018/06/24 09:50:23 INFO  : TestCache: Chunk Total Size: 10G
2018/06/24 09:50:23 INFO  : TestCache: Chunk Clean Interval: 1m0s
2018/06/24 09:50:23 INFO  : TestCache: Workers: 4
2018/06/24 09:50:23 INFO  : TestCache: File Age: 6h0m0s
2018/06/24 09:50:23 DEBUG : Adding path "cache/expire" to remote control registry
2018/06/24 09:50:23 DEBUG : Adding path "cache/stats" to remote control registry
2018/06/24 09:50:23 DEBUG : Cache remote TestCache:: list ''
2018/06/24 09:50:23 DEBUG : : list: cold listing: 2018-06-24 03:50:08.901053886 +0100 BST
2018/06/24 09:50:23 DEBUG : : list: read 2 from source
2018/06/24 09:50:23 DEBUG : : list: source entries: [excluded normal]
2018/06/24 09:50:23 DEBUG : : list: cached object: excluded
2018/06/24 09:50:23 DEBUG : : list: cached object: normal
2018/06/24 09:50:23 DEBUG : : list: cached directories: 0
2018/06/24 09:50:23 DEBUG : : list: cached dir: '', cache ts: 2018-06-24 09:50:23.953566254 +0100 BST m=+0.032286330
2018/06/24 09:50:23 DEBUG : excluded: Size and modification time the same (differ by 0s, within tolerance 1ns)
2018/06/24 09:50:23 DEBUG : excluded: Unchanged skipping
2018/06/24 09:50:23 DEBUG : normal: Size and modification time the same (differ by 0s, within tolerance 1ns)
2018/06/24 09:50:23 DEBUG : normal: Unchanged skipping
2018/06/24 09:50:23 INFO  : Cache remote TestCache:: Waiting for checks to finish
2018/06/24 09:50:23 INFO  : Cache remote TestCache:: Waiting for transfers to finish
2018/06/24 09:50:23 INFO  : Waiting for deletions to finish
2018/06/24 09:50:23 INFO  : 
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 2
Transferred:            0
Elapsed time:          0s

2018/06/24 09:50:23 DEBUG : 8 go routines active
2018/06/24 09:50:23 DEBUG : rclone: Version "v1.42-010-g935533e5" finishing with parameters ["rclone" "-vv" "sync" "/tmp/source/" "TestCache:"]
2018/06/24 09:50:23 DEBUG : Cache remote TestCache:: Services stopped

Sync with --delete-excluded

$ rclone -vv sync --exclude excluded --delete-excluded /tmp/source/ TestCache:
2018/06/24 09:50:47 DEBUG : rclone: Version "v1.42-010-g935533e5" starting with parameters ["rclone" "-vv" "sync" "--exclude" "excluded" "--delete-excluded" "/tmp/source/" "TestCache:"]
2018/06/24 09:50:47 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/06/24 09:50:47 DEBUG : TestCache: wrapped local:/tmp/rclone_cache_test at root 
2018/06/24 09:50:47 INFO  : TestCache: Cache DB path: /home/ncw/.cache/rclone/cache-backend/TestCache.db
2018/06/24 09:50:47 INFO  : TestCache: Cache chunk path: /home/ncw/.cache/rclone/cache-backend/TestCache
2018/06/24 09:50:47 INFO  : TestCache: Chunk Memory: true
2018/06/24 09:50:47 INFO  : TestCache: Chunk Size: 5M
2018/06/24 09:50:47 INFO  : TestCache: Chunk Total Size: 10G
2018/06/24 09:50:47 INFO  : TestCache: Chunk Clean Interval: 1m0s
2018/06/24 09:50:47 INFO  : TestCache: Workers: 4
2018/06/24 09:50:47 INFO  : TestCache: File Age: 6h0m0s
2018/06/24 09:50:47 DEBUG : Adding path "cache/expire" to remote control registry
2018/06/24 09:50:47 DEBUG : Adding path "cache/stats" to remote control registry
2018/06/24 09:50:47 DEBUG : Cache remote TestCache:: list ''
2018/06/24 09:50:47 DEBUG : excluded: Excluded from sync (and deletion)
2018/06/24 09:50:47 DEBUG : : list: warm 2 from cache for: , expiring on: 2018-06-24 15:50:23.953566254 +0100 BST
2018/06/24 09:50:47 DEBUG : : list: cached entries: [excluded normal]
2018/06/24 09:50:47 INFO  : Cache remote TestCache:: Waiting for checks to finish
2018/06/24 09:50:47 DEBUG : normal: Size and modification time the same (differ by 0s, within tolerance 1ns)
2018/06/24 09:50:47 DEBUG : normal: Unchanged skipping
2018/06/24 09:50:47 INFO  : Cache remote TestCache:: Waiting for transfers to finish
2018/06/24 09:50:47 INFO  : Waiting for deletions to finish
2018/06/24 09:50:47 DEBUG : excluded: removing object
2018/06/24 09:50:47 DEBUG : : cache: expired 
2018/06/24 09:50:47 INFO  : excluded: Deleted
2018/06/24 09:50:47 INFO  : 
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 2
Transferred:            0
Elapsed time:          0s

2018/06/24 09:50:47 DEBUG : 8 go routines active
2018/06/24 09:50:47 DEBUG : rclone: Version "v1.42-010-g935533e5" finishing with parameters ["rclone" "-vv" "sync" "--exclude" "excluded" "--delete-excluded" "/tmp/source/" "TestCache:"]
2018/06/24 09:50:47 DEBUG : Cache remote TestCache:: Services stopped

Resync with --delete-excluded

$ rclone -vv sync --exclude excluded --delete-excluded /tmp/source/ TestCache:
2018/06/24 09:51:44 DEBUG : rclone: Version "v1.42-010-g935533e5" starting with parameters ["rclone" "-vv" "sync" "--exclude" "excluded" "--delete-excluded" "/tmp/source/" "TestCache:"]
2018/06/24 09:51:44 DEBUG : Using config file from "/home/ncw/.rclone.conf"
2018/06/24 09:51:44 DEBUG : TestCache: wrapped local:/tmp/rclone_cache_test at root 
2018/06/24 09:51:44 INFO  : TestCache: Cache DB path: /home/ncw/.cache/rclone/cache-backend/TestCache.db
2018/06/24 09:51:44 INFO  : TestCache: Cache chunk path: /home/ncw/.cache/rclone/cache-backend/TestCache
2018/06/24 09:51:44 INFO  : TestCache: Chunk Memory: true
2018/06/24 09:51:44 INFO  : TestCache: Chunk Size: 5M
2018/06/24 09:51:44 INFO  : TestCache: Chunk Total Size: 10G
2018/06/24 09:51:44 INFO  : TestCache: Chunk Clean Interval: 1m0s
2018/06/24 09:51:44 INFO  : TestCache: Workers: 4
2018/06/24 09:51:44 INFO  : TestCache: File Age: 6h0m0s
2018/06/24 09:51:44 DEBUG : Adding path "cache/expire" to remote control registry
2018/06/24 09:51:44 DEBUG : Adding path "cache/stats" to remote control registry
2018/06/24 09:51:44 DEBUG : Cache remote TestCache:: list ''
2018/06/24 09:51:44 DEBUG : excluded: Excluded from sync (and deletion)
2018/06/24 09:51:44 DEBUG : : list: cold listing: 2018-06-24 03:50:47.760431914 +0100 BST
2018/06/24 09:51:44 DEBUG : : list: read 1 from source
2018/06/24 09:51:44 DEBUG : : list: source entries: [normal]
2018/06/24 09:51:44 DEBUG : : list: cached object: normal
2018/06/24 09:51:44 DEBUG : : list: cached directories: 0
2018/06/24 09:51:44 DEBUG : : list: cached dir: '', cache ts: 2018-06-24 09:51:44.603869892 +0100 BST m=+0.026212905
2018/06/24 09:51:44 INFO  : Cache remote TestCache:: Waiting for checks to finish
2018/06/24 09:51:44 DEBUG : normal: Size and modification time the same (differ by 0s, within tolerance 1ns)
2018/06/24 09:51:44 DEBUG : normal: Unchanged skipping
2018/06/24 09:51:44 INFO  : Cache remote TestCache:: Waiting for transfers to finish
2018/06/24 09:51:44 INFO  : Waiting for deletions to finish
2018/06/24 09:51:44 INFO  : 
Transferred:      0 Bytes (0 Bytes/s)
Errors:                 0
Checks:                 1
Transferred:            0
Elapsed time:          0s

2018/06/24 09:51:44 DEBUG : 6 go routines active
2018/06/24 09:51:44 DEBUG : rclone: Version "v1.42-010-g935533e5" finishing with parameters ["rclone" "-vv" "sync" "--exclude" "excluded" "--delete-excluded" "/tmp/source/" "TestCache:"]
2018/06/24 09:51:44 DEBUG : Cache remote TestCache:: Services stopped

Hi Nick, I’ll give that a go tomorrow.

Thinking a bit more about this: why would this be an error? If the object doesn’t exist on the remote end, and it’s excluded on the local side, shouldn’t it just be silently ignored and removed from the local cache?

I don’t know why it is giving the error. In my test above I uploaded it via the cache then deleted it via the cache (courtesy of --delete-excluded) and it all worked OK.

Interesting. I just tried your example and it worked.

I also checked in the bucket console, and I can confirm that the excluded files on the problem dataset do not exist on the remote side. I’m running rclone v1.41.

Would regenerating the cache from scratch help?

I guess the cache might have got out of sync with the actual bucket, so it is worth a try.
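Regenerating the cache from scratch can be done with the --cache-db-purge flag (the same one used in the test session above), which clears the cache database on startup; a sketch:

```shell
# Purge the local cache database so the next sync re-lists the bucket fresh.
rclone --cache-db-purge --config /etc/rclone.conf ls Cache:
```

After the purge, the first sync will pay the listing transactions again, but subsequent runs will use the rebuilt cache.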