Problems syncing very very large dataset

I'm trying to work around listing the source, an S3-compatible bucket that has trillions of directories in its root directory.

I thought that passing the list of files to copy with --files-from list_of_files_to_sync.lst would prevent rclone from listing the files at the source, but that does not seem to be the case. With or without the --files-from argument, rclone still attempts to list the entire source bucket, which never finishes in my case because the data set is very large and sits flat in the root directory.

What am I missing?

hello and welcome to the forum,

  • the latest rclone is v1.62.2
  • what is the exact command?

I'm using 1.60.1 and the commandline is
rclone --log-level=DEBUG --dump headers --stats-log-level DEBUG --stats 10s --fast-list --files-from=files.lst --bwlimit 1M:off copy s3:srcbucket backblaze:dstbucket

ok, filters are complex and often not as efficient as one might want, so i am not sure if that is the issue.
others can comment on that.

tho, i would try --no-traverse without --fast-list

and easy to update - rclone selfupdate
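
The --no-traverse suggestion above can be sketched as a small wrapper around the command quoted earlier, with --fast-list dropped and --no-traverse added. As far as I understand the rclone filtering docs, combining --files-from with --no-traverse makes rclone look up each listed file individually instead of listing the bucket. The function name copy_listed_files and the RCLONE override variable are made up for illustration, and the debug and stats flags are left out for brevity; treat this as a sketch, not a tested recipe for this bucket.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the thread). RCLONE can be overridden,
# e.g. RCLONE=echo, to print the arguments instead of starting a transfer.
RCLONE="${RCLONE:-rclone}"

copy_listed_files() {
  # $1 = list file (one path per line, relative to the source root)
  # $2 = source remote, $3 = destination remote
  "$RCLONE" copy "$2" "$3" \
    --files-from="$1" \
    --no-traverse \
    --bwlimit 1M:off
}
```

With the names from the thread, the call would be: copy_listed_files files.lst s3:srcbucket backblaze:dstbucket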

really? so you need billions of MB RAM for rclone to work. There are computers like that - but no idea what you use.

there are always limits.

I think you have to provide more details.

Okay, this is progress :smiley: It seems that --no-traverse produced the results I was expecting, even though the destination is a completely empty bucket, so I clearly didn't fully understand what the flag is for :sweat_smile:

Isn't there any way to instruct rclone not to list everything at the source before it starts transferring?
At this point I've realized that I would benefit from any approach that helps.

I've tried the --exclude and --include parameters in an attempt to avoid the source directory listing, but rclone seems to list the full source first and only then filter that list with the exclude/include combination, which doesn't fix anything for me.

Also, since the source directories have hash-based names, I tried a regex-like pattern to expand the source directory names, but it doesn't seem to work that way either.

Is there any alternative that avoids listing the source directory before the actual transfers start, other than having to supply a pre-known list of files?
