Best approach for massive GDrive repo?


#1

Apologies in advance for the newb question.

I have a large repo of files on GDrive (~650 GB, 33k objects) and would like to build a one-way sync from GDrive to a local directory. The repo will be updated about once a week and I would like the sync to be automated. The rclone host machine will be running Windows.

I’ve read through the documentation and done some testing, but I am curious what this forum thinks would be the most efficient approach. Obviously a scheduled task makes the most sense, but I’m not well versed enough in rclone to know the best order of commands/subcommands/options.

One thing that’s not super clear to me:
After running an initial sync or check, is the metadata of both source and destination cached and stored locally (so the next sync/check is quicker)? If so, is it the cache-info-age option that controls retention? And if so, can a wildcard be used for the duration so the cache is retained indefinitely (the option seems to only accept hour increments)?

Are there any experts out there that would be kind enough to advise on the fastest, most efficient approach?

Thank you!


#2

No, not unless you use the cache backend.

I would see whether the sync is acceptable first before trying the cache backend.

I’d recommend using --fast-list if you have enough memory, as that will speed up the directory traversal considerably. In my testing, a traversal of 33k objects completes in 40s or so.

So

rclone sync --fast-list drive:dir \path\to\local\dir

I’d also recommend making your own credentials.
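By “your own credentials” I mean your own Google API client ID and secret, created in the Google Cloud console, rather than rclone’s shared defaults. Once created, they go into the remote’s section of rclone.conf, roughly like this (the remote name and the values are placeholders):

[drive]
type = drive
client_id = YOUR_CLIENT_ID.apps.googleusercontent.com
client_secret = YOUR_CLIENT_SECRET

That way you aren’t sharing the default rate-limit quota with every other rclone user.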


#3

My initial sync took about 20 hours, and nothing was actually copied, only checked. I’d copied everything from my local directory to GDrive through the GDrive UI, then ran rclone using GDrive as the source.

I will try the cache backend and --fast-list.

Thank you.


#4

Try --fast-list first - I’m not sure you’ll need the cache backend for only 33k files.


#5

Yep. --fast-list brought the sync time down to 2m5s. I never saw memory usage go above 15MB. Running it a second time brought the sync time down to 15s.


#6

Thank you for your help! I think I can run with this. Quick question: does --quiet run rclone in the background, or does it still open the shell window? I would like my scheduled task to be invisible.


#7

--quiet can still output errors to stderr.

If you don’t want rclone to produce any console output, my suggestion would be to use --log-file. Then you’ll also have something to look at when things go wrong!
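For a scheduled task, a sketch might look like this (the paths, task name, and schedule below are just examples to adapt):

rclone sync --fast-list --log-file C:\rclone\sync.log drive:dir C:\path\to\local\dir

registered to run weekly with something like:

schtasks /Create /TN "rclone-sync" /SC WEEKLY /TR "rclone sync --fast-list --log-file C:\rclone\sync.log drive:dir C:\path\to\local\dir"

With --log-file set, everything goes to the log, so there is nothing rclone needs to print to a console window.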


#8

Thanks again. Regarding the scope configuration options in rclone (https://rclone.org/drive/):

How does this selection relate, if at all, to the pre-existing permissions set on the Google Drive folder? Say for example, the GDrive folder owner gives me read only access, but I configure rclone to use full access. Is there potential for conflict here?


#9

In that case you’ll get permission denied errors if you attempt to write, but if you only read all will be fine.
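If you only ever need to read, you can also make rclone match the granted permission by configuring the remote with the read-only scope, e.g. in rclone.conf (remote name is a placeholder):

[drive]
type = drive
scope = drive.readonly

Then rclone itself will refuse to write, regardless of the folder’s permissions.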


#10

Thanks again. On the --log-file, where is it writing logs to? Or do I need to specify the path of an existing .log or .txt file?


#11

--log-file /some/directory/something.log

You need to give it a file to write to.