Sync with Delete Only?

Is it possible to do a sync that only performs the deletes? (No copying.)

My use case is syncing a very large folder to a cloud service over a very slow connection. I have access to a faster connection but not with the machine housing the large folder. As a result I would like to split the sync into two parts.
Part 1 (Need help) - Sync all deletes from the large folder to the cloud service. Even with the backup feature enabled, this is a relatively low-bandwidth operation, so it can be performed over the slow connection.
Part 2 (Partially solved) - Sync the new and changed files over the faster connection. I have been able to mostly accomplish this via a union file system on the slow-internet machine with a portable device. Then I can transport the portable device to the faster-internet location and actually perform the upload of the new and changed files. The current problem is that if backup is turned off, deletes are ignored; if backup is turned on, all deletes and edits result in a download of the cloud version over the slow internet connection.

For further background, the folder in its entirety is far larger than the portable device can handle. However, the changes can easily be handled by the portable device.

Ideal would be a single command to accomplish this. However, I'd be happy with even the ability to run this as two separate commands: deletes first, then new and edited files. (Oh, and if the backup folder is turned on, I would expect deletes to include the remote portion of the edits.)

You may be able to do the delete-only sync if you use --disable copy:
https://rclone.org/docs/#disable-feature-feature
Not something I have had a need to test, but I imagine it should work.

As for part 2 of this - why does having backup on (I am assuming you are talking about rclone's --backup-dir here) result in a download? Assuming that the backup-dir is on the same remote, any deletes should just server-side move the file. It should only need to download if your --backup-dir is either local (not sure why you'd want something like that) or if your backup dir is on some other cloud drive where a server-side operation is not possible.
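
For example (illustrative paths only), a setup where a delete only costs a server-side move would look roughly like this, with both the destination and the backup dir living on the same remote:

rclone sync /path/to/data remote:Current --backup-dir=remote:History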

It seems to me like that issue would be best solved by reevaluating your backup setup rather than trying to work around it, but you haven't given me a lot of specifics to go on here...

They are? ...
Are you talking about some other backup system here than --backup-dir ?

welcome to the forum,

you can use the flag --dry-run

-n, --dry-run                              Do a trial run with no permanent changes

Yes, in combination with --disable copy, on a sync command.
Then you can check that the result is as expected and what you want to happen.
It's a very good option to know about in general if you aren't sure about the commands you are testing.
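
For example (untested - source: and dest: are just placeholder remote names here), something like:

rclone sync source: dest: --disable copy --dry-run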

so --disable copy prevents a sync from copying files from source to dest?

It should disable the whole copy function for that command, yes. I believe rclone simply skips any copy steps if copy is disabled. Other functions (delete, for example) - or even multiple functions at once - can be disabled. I have had little reason to use this myself, but I guess it's for specific edge cases like this where you want a sync, except for something in particular - a sync with no deletes, for example. That would be subtly different from a copy command, as for example --track-renames can't be used with a copy. Flags cover most of the more common "tweaks" to the basic commands, but not all of the possible variants.

--disable copy specifically disables the server-side copy - doing that makes rclone fall back to download and upload, so I don't think it will solve this problem.

The obvious way to solve this problem is to do a sync with --dry-run and parse the logs for the deletes.

However, you could do it something like this (untested), which lists all the files, then works out which files are only in the destination and deletes them.

rclone lsf -R --files-only source: | sort > source-listing
rclone lsf -R --files-only dest: | sort > dest-listing
comm -13 source-listing dest-listing > files-to-delete
rclone delete --files-from files-to-delete --dry-run dest:

Yeah, I know. I did that first, only to discover that some of my files have unusual characters in them, and while the log file was fine, the scripting language mangled them. Before rewriting the script in another, more capable scripting language I thought I would see if a more "native" solution was possible. I guess not.

Thanks again for such an amazing program though!

Surprised me at first too. It turns out that union file systems trick the --backup-dir option. If a server-side move would work on the first file system in the union, then --backup-dir thinks all is good. However, when it comes time to actually do the operation, only the last file system matters. Since a server-side move is not possible on the last file system in the union, the operation falls back to a download/upload. Deletes causing this was pretty intuitive to me, but the edits were extra unexpected.

Nope, I'm talking about --backup-dir. If I turn that off, then my portable device only has a record of new files, and the union file system cannot delete cloud files, so deletes simply don't happen.

I'm open to any suggestions you might have. It is a pretty straightforward setup. I have access to a low-bandwidth place (A) and a high-bandwidth place (B). I also have a portable device that is big enough to handle the changes but not the entire data set. How do I back up all of the data at site A to a cloud service? Since sneakernet is far better than the internet at A, I am trying to figure out a combination of sneakernet and rclone that takes best advantage of the internet connections at A and B. Oh, and I suppose it is also worth noting that I do not have the ability to have permanent storage at B.

If you would prefer a less abstract example, this fictional one covers all the critical pieces as well. A wedding photographer has a desktop and hard drive setup that can handle all photos, but only has access to dial-up internet. However, the local library has fast internet and is fine with the photographer using it heavily. The photographer also has a laptop that can handle a bit more than one photoshoot's worth of photos. How does the photographer keep the entire collection backed up to a cloud service? Note that the photographer occasionally edits old photos as well, and those changes need to end up in the cloud service too.

Ah, I understand. This makes more sense now that I know you were doing this over union.
I have several suggestions on how to do this more smoothly - but before I start, please tell me what OS you are using as this can impact what options we have to work with.

Also - please give me an estimate of just how bad the "slow" bandwidth is. Hopefully it's not literally dialup speed? :slight_smile:
Also - is it an always-on connection or some kind of pay-per-use connection?

Once I know this I think I can probably help you out on a much more efficient solution than what you were trying to accomplish originally.

EDIT: One more useful thing would be if you could share the full command (or start-script) you use so I can see a few of the details more clearly.

Windows on the slow side, Debian on the fast side. Although these days with Linux on Windows it could easily be Linux on both sides if that is important to you.

Always on, roughly 1Mbit up. Not pay-per-use.

Basically this
rclone sync L:\ remote:Current --fast-list --copy-links --log-file="log.txt" --backup-dir=remote:History/%date:~-4,4%-%date:~-10,2%-%date:~-7,2% --progress --dry-run
or, if you aren't familiar with DOS batch syntax, running this today would result in
rclone sync L:\ remote:Current --fast-list --copy-links --log-file="log.txt" --backup-dir=remote:History/2019-11-22 --progress --dry-run
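
(For reference, on the Debian side the bash equivalent of that date expansion would presumably be $(date +%F) - i.e. something like the following, with /path/to/L standing in for wherever the folder is reachable.)

rclone sync /path/to/L remote:Current --fast-list --copy-links --log-file="log.txt" --backup-dir=remote:History/$(date +%F) --progress --dry-run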

Ok, then here is my suggestion to make this as efficient as possible...

Your basic setup of using union here to dump your uploads to a local folder is good. We should keep this. It's really the only workable way for you to transport them later.

However, to avoid any of the weirdness with how --backup-dir behaves, we should run the upload and/or maintenance commands/scripts outside of the union - i.e. directly against the cloud drive. This will allow deletes and moves to work well together with the backup, because the local union won't be absorbing those commands. (This is currently a limitation of union we have to work around - but there are plans to improve union sometime soon, hopefully to more closely offer the same kind of features that mergerFS has.)

To do the deletions part of the sync we should use:
--delete-before
https://rclone.org/docs/#delete-before-during-after
I would combo this with --max-transfer 1M. This will first do all the deletes in a separate pass, and then quickly exit on the second (uploading) pass because it will hit the transfer limit. Depending on the setup you might also want to use --compare-dest on your "temp-upload" directory if you want to prevent anything being synced that is already in that folder. This should effectively solve your initial question (but sync to the cloud remote, not the union, as I said).
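
Roughly speaking - reusing your own command as the base and keeping --dry-run on until you have verified it (untested on my end) - the delete pass could look something like:

rclone sync L:\ remote:Current --fast-list --copy-links --log-file="log.txt" --backup-dir=remote:History/2019-11-22 --delete-before --max-transfer 1M --dry-run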

I think it will also be worthwhile to run a second sync command after that with --max-size 1M or something like that - just so your local connection can get some of the smallest files done, making your larger upload later faster and more efficient. You may want to use --bwlimit xxxk to let this work in the background without using more than 70-80% of your upload, so you don't choke your normal daily usage. How worthwhile this is really depends on how many small files you typically work on. If we can get a few hundred small files out of the way in this manner, that would be a great benefit, for example.
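
Again just a sketch based on your command (with xxxk replaced by whatever limit you settle on):

rclone sync L:\ remote:Current --fast-list --copy-links --log-file="log.txt" --backup-dir=remote:History/2019-11-22 --max-size 1M --bwlimit xxxk --dry-run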

Now that's the theory of it and the flags that I think are right for the job. Sorry for being long-winded, but I think it is just as important to explain the why as it is to just give you an answer that works :slight_smile:

I am also a Windows-primary guy and I've written a lot of batch for my own use with rclone. If you can provide a few more details I can try to coalesce all this information into an actual batch script for you - or at least a rough draft you can use as a basis. For starters these things would be relevant to know:

  • Name of your union remote
  • Name of your cloud drive remote
  • The type of cloud provider (important for any upload optimization flags)
  • The local path you sync from (assuming it is not sensitive in nature)
  • The local path for the "temp upload" folder

None of these are essential for me to give you an example, but it helps understanding and lessens the potential for confusion if I can put in the real names and paths rather than abstracted names :slight_smile:

Does that sound like a plan?


Hey I think this is exactly what I was looking for!
Thanks!
I had not considered using --max-transfer as a way to prevent copying. I will have to play with it a bit to see if deletes, etc. count against that limit total. --dry-run passes typically take over 24 hours, so I expect it will take a while to confirm success or not.

So, the overall plan will be something roughly like this

  1. rclone sync local_slow remote:Current --backup-dir=remote:History/date --max-transfer=1M --delete-before ...
  2. rclone sync local_slow remote:Current --backup-dir=remote:History/date --max-size=1M ...
  3. rclone sync local_slow union:current ...
  4. Sneakernet transfer data to fast site
  5. rclone move local_fast remote:Current --backup-dir=remote:History/date ...
  6. Repeat based on backup schedule

Am I missing anything?

Thanks again!

Edit: I just realized that a --max-size exclusion filter would allow me to do all the deletes and also transfer files small enough to not kill my bandwidth without having to worry about maxing out the total transfer window bucket.

One thing I think you are misunderstanding (and which is very understandable, because it's not obvious) is that --max-size is a filter that applies to both sides, not just the source. Effectively this means we would not only skip transferring larger files, but also ignore them (i.e. not perform deletes on them) on the remote side. This is why I did not want to use it on the first command, but wanted to use --delete-before + --max-transfer instead to achieve a similar result that would still make the large deletes.

Is "local_slow" here your name for the TempUpload folder in your union?
Want me to whip this into a quick script for you? I think you pretty much have the right idea here, but you are still a bit off on some of the details, I think. I would also like to make optimization suggestions for upload speed, but I need to know your cloud provider type for that (see my previous question).

One thing that I highly recommend is to enable --dry-run on all commands first for testing. Only remove it once you are reasonably sure that things are behaving like you expect. Remember that any time you use sync, files could get deleted if you accidentally got the sync command wrong, so it's not something to test on-the-fly until you have more experience with the flags you use :slight_smile:

You’re right I had missed that. Thanks!
Back to the crossed out original I go.

No, union is the name for my union. :slight_smile: local_slow is the name for the monster folder on the slow internet machine.

I would be interested in knowing what flags you recommend. However, I'm not quite willing to put my complete setup details on the open Internet. Thank you for the offer, though. I will note that my setup does include hard links, so I use the --copy-links flag. Also, I have determined that --fast-list makes a huge difference in the speed of a sync dry run on the slow-connection machine.

Google

So this is the folder you collect your uploads into while using the union, correct? I am just trying to get a grasp on what we are calling things so I do not mislead you when I try to make examples :slight_smile:

And I understand and totally respect your security focus. No problem. I will then just revise your own list as I would recommend it, and you can implement your own script from there. I can maybe also share a couple of useful tricks in general and give some snippets in a PM you can re-use parts from.

--fast-list is very useful, yes. It is almost always recommended any time you sync large collections of files. It can be up to 15x faster than regular listing methods. In addition to that, there is one more big optimization trick you should be aware of for Gdrive:
--drive-chunk-size 64M
(or 128M if you have plenty of RAM). This affects the efficiency of bandwidth utilization on uploads. The default is only 8M, and that leads to a lot of "saw-toothing" bandwidth where performance rarely actually hits 100% utilization because it stops and starts every 8MB. 64-128M can nearly double the effective speed through more efficient utilization of existing bandwidth. Note that it only helps on files that are larger than 8M to begin with - but this is no doubt something you will want to use at the fast-bandwidth location.
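
As a rough sketch (purely illustrative names and paths - assuming the collected uploads sit in a local temp folder and your cloud remote is still called remote: as in your earlier command), the upload at the fast location might look something like:

rclone move /path/to/temp remote:Current --drive-chunk-size 64M --progress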

I will complete that revised list for you as soon as you clarify that first question :slight_smile:

Thanks.

Here is a legend of how I have been naming things in my posts
Slow internet machine = local = local_slow = “L:\”
Transfer = (unnamed so far, let’s call it “temp” or Z:)
Cloud = remote (aka Google)
Union = remote+temp (in that order)
Fast internet machine = local_fast

Final note: my fast internet machine is actually a Raspberry Pi, so sadly RAM-based speed-ups are not really an option. In fact I have had to turn on --no-traverse just to get successful uploads at all.

Hmm, I would still advise you to try to increase that chunk size as much as possible though. Even just - say - 16M would help greatly. See what you can get away with. You can set --buffer-size 0 to save 16MB per transfer, as it is far less consequential for performance here, and spend that RAM on chunk size instead. Even if your system has just 1GB, that should be sufficient. If you have 2GB you should have plenty of room to tweak performance beyond the basics. rclone really doesn't use much RAM. Do be advised that --fast-list does use a bit of RAM on large collections, though. I wouldn't use it if you are really starved for RAM - especially on a move, where it won't be of much help anyway. It's mostly useful for the sync part.
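
On the Pi that might look roughly like this (illustrative only - using the names from your legend and the --no-traverse you already needed):

rclone move local_fast Google:Current --drive-chunk-size 16M --buffer-size 0 --no-traverse --backup-dir=remote:History/date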

Here is my revision of the rough plan I'd suggest:

At home on slow connection:

1. rclone sync local_slow: Google:Current --fast-list --backup-dir=remote:History/date --delete-before --max-transfer 1k --compare-dest temp:

This should take care of all the deletes. --compare-dest is probably not needed here since we are only doing deletes, but I left it in there mostly for illustration. If you use it, you basically evaluate local_slow: as compared to Google: + temp: . Usually this is quite important for deciding what to transfer, but as I said, I don't think it will make any practical difference when we are just deleting.

2. rclone sync local_slow: union:current ... --fast-list
As in your original, but with --fast-list - this seems reasonable. Now your temp folder should have all the data we actually need to transfer.

3. rclone move temp: Google:Current --max-size 1M --bwlimit 90k

This will just upload all the really small files in advance to save time later. You could omit this step, though; it is just for the sake of efficiency later on. 90k (kBytes) should be about 70% of your upload if my math isn't completely off? It shouldn't choke your normal use.

4. Sneakernet transfer data to fast site
lol, I haven't heard this term before :stuck_out_tongue:

5. rclone move local_fast: Google:Current --backup-dir=remote:History/date...
agreed.

So TL;DR: not a lot of change when I really looked at it while understanding your terms better - mostly the specific implementation of step 1 and the addition of the optional step 3.

I feel your pain. I used to have 1.5Mbit myself at one point back in the day. Now I'm on 160Mbit fiber, and if I ever had to go back I think I would commit sudoku... :stuck_out_tongue:
Good luck, and let me know if you run into any issues. Can't say this is a problem I have tried to help with before, but I think this should work.


Thank you!

Hadn't noticed --compare-dest before. It looks useful; however, if I'm reading the docs right, sadly it won't help me, as temp and Google are different remotes.

A variant that uses --compare-dest and doesn't require a union remote:

  1. [home-slow] rclone copy local:Current usb:Current --fast-list --compare-dest=remote:Current
  2. [other-fast] rclone copy(or move) usb:Current remote:Current --fast-list
  3. [home-slow] rclone sync local:Current remote:Current --backup-dir=remote:History/Date --delete-before --fast-list

I have a similar use case and this works nicely. If you have files that appear in the monster folder between steps 1 and 3, then, in step 3, either use --max-size or front-end the command with a timeout (timeout 5m rclone ...).
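
For example (untested), front-ending step 3 with a timeout would look roughly like this:

timeout 5m rclone sync local:Current remote:Current --backup-dir=remote:History/Date --delete-before --fast-list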