Dropbox use case

Hi - I'm wondering if rclone makes sense for my use case. I have over 20M files taking ~200 GB to back up from a server, ideally to Dropbox, since I'm already paying for several TB there anyway. The Dropbox client can't seem to handle this number of files; I know rsync would probably be fine, so can rclone bridge the gap?

root@master:~/Dropbox# python3 dropbox.py status
Syncing 22,881,443 files • 2+ days
Indexing 22,035 files (4 mins)
Uploading 12,613,186 files (45.3 KB/sec, 2+ days)
Downloading 10,246,222 files...

Which cloud storage system are you using?

dropbox

hello and welcome to the forum,

does that python script support dropbox batch mode?
https://rclone.org/dropbox/#dropbox-batch-mode-sync

that is super slow, is that a DSL connection?
what is the result of a speedtest?

rclone is well known for uploading as fast as a provider and internet connection can handle.
i have no issue doing that with my 1Gbps fiber optic internet connection.

Thanks for the reply. I imagine the script does support batch, but IIUC rclone will do the driving on its own, instead of the dropbox daemon? Anyway, I should rather have asked whether rclone will be limited by dropbox's extremely slow indexing (or whatever it is that's slow). I should have decent network speed when needed - it's a DigitalOcean server, so I believe I just pay for whatever bandwidth I use, up to some large hardware limit.

The dropbox client has a soft limit at around 500k files before it starts to crumble. Rclone, and Dropbox as a service, do not have any such limit.

Your scenario should be fine with rclone.

Edit: given the huge number of small files, and the fact that you are saying it is a backup, some local tarring before uploading with rclone could drastically reduce your upload time here.
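A rough sketch of that idea - the archive name and paths here are purely illustrative, and the upload step assumes a remote named dropbox_remote is already configured:

```shell
# Bundle the millions of small files into one (or a few) large archives.
# Each file uploaded to Dropbox costs API round trips, so a handful of
# big archives moves far faster than 20M tiny files.
tar -czf data-backup.tar.gz -C /path/to/projects data

# Upload the archive instead of the raw tree.
rclone copy data-backup.tar.gz dropbox_remote:backups/
```

The trade-off is that you lose per-file incremental sync and per-file restore, which may or may not matter for a backup.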

When uploading lots of small files, check out the batch mode options; async is the one you want, with the batch size set to 1000.

--dropbox-batch-mode sync

In this mode rclone will batch up uploads to the size specified by --dropbox-batch-size and commit them together.

Using this mode means you can use a much higher --transfers parameter (32 or 64 works fine) without receiving too_many_requests errors.

This mode ensures full data integrity.

Note that there may be a pause when quitting rclone while rclone finishes up the last batch using this mode.

--dropbox-batch-mode async

In this mode rclone will batch up uploads to the size specified by --dropbox-batch-size and commit them together.

However it will not wait for the status of the batch to be returned to the caller. This means rclone can use a much bigger batch size (much bigger than --transfers), at the cost of not being able to check the status of the upload.

This provides the maximum possible upload speed especially with lots of small files, however rclone can't check the file got uploaded properly using this mode.

If you are using this mode then using "rclone check" after the transfer completes is recommended. Or you could do an initial transfer with --dropbox-batch-mode async then do a final transfer with --dropbox-batch-mode sync (the default).

Note that there may be a pause when quitting rclone while rclone finishes up the last batch using this mode.
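Putting the two modes together, one possible workflow (source and destination paths are illustrative; the remote name is the one from this thread) is a fast async pass, a verification, and a final sync pass:

```shell
# Fast first pass: async batching - rclone does not wait on batch status
rclone sync --dropbox-batch-mode async --dropbox-batch-size 1000 \
    /path/to/data dropbox_remote:backup/data

# Verify that the uploads actually landed
rclone check /path/to/data dropbox_remote:backup/data

# Final pass with the default (sync) batch mode for full integrity
rclone sync --dropbox-batch-mode sync /path/to/data dropbox_remote:backup/data
```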

Ok, I'll try the batch mode - in the meantime I've hit some trouble doing the config, which in my case seems to differ from the docs.

> root@master:~/Dropbox# rclone config
> 2022/02/10 02:08:01 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
> No remotes found - make a new one
> n) New remote
> s) Set configuration password
> q) Quit config
> n/s/q> n
> name> dropbox_remote
> Option Storage.
> Type of storage to configure.
> Enter a string value. Press Enter for the default ("").
> Choose a number from below, or type in your own value.
**[already ambiguous - am I supposed to choose a number or type a string, or is either ok?]**
...
> 11 / Dropbox
>    \ "dropbox"
> Storage> dropbox
> Option client_id.
> OAuth Client Id.
> Leave blank normally.
> Enter a string value. Press Enter for the default ("").
> client_id>
> Option client_secret.
> OAuth Client Secret.
> Leave blank normally.
> Enter a string value. Press Enter for the default ("").
> client_secret>
> Edit advanced config?
> y) Yes
> n) No (default)

and it seems I never hit the 'please visit
https://www.dropbox.com/1/oauth2....' business that would allow me to connect to my dropbox account as shown in the dropbox docs

Hit return to accept the default and go to the next prompt - the OAuth step comes after the remaining questions.

righto -
everything seems to run great, thanks to the devs!!!!
I didn't quite grok how to use --dry-run. For instance,

rclone --dry_run sync projects/PycharmProjects/trademark_scrape/data/ dropbox_remote:projects/PycharmProjects/trademark_scrape/data/

hit err

2022/02/10 06:15:36 Fatal error: unknown command "projects/PycharmProjects/trademark_scrape/data/" for "rclone"

while the docs say

Syntax: [options] subcommand <parameters> <parameters...>

and --dry-run is listed as an option. In any case, -i was good enough for me.

That's right.

Your command has --dry_run (with an underscore) rather than --dry-run (with a hyphen).
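With the hyphenated spelling, the same command should parse:

```shell
rclone --dry-run sync projects/PycharmProjects/trademark_scrape/data/ \
    dropbox_remote:projects/PycharmProjects/trademark_scrape/data/
```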

righto again, cheers. This is a great tool and just what the doctor ordered in my case.
Just being able to see what's happening is already a vast improvement over the dropbox daemon.
I still seem to have slower uploads than what dropbox and my connection would allow, even after setting async batch mode with a batch size of 1000, but possibly skipping over a lot of files (I requested not changing modification times in interactive mode) is costing me speed.

root@master:~/Dropbox# rclone --version
rclone v1.57.0
- os/version: ubuntu 20.04 (64 bit)
- os/kernel: 5.4.0-97-generic (x86_64)
- os/type: linux

root@master:~# rclone sync -i --dropbox-batch-mode async --dropbox-batch-size 1000 projects/PycharmProjects/trademark_scrape/data/ dropbox_remote:projects/PycharmProjects/trademark_scrape/data/
...
2022/02/10 12:51:40 NOTICE:
Transferred:      410.672 MiB / 523.976 MiB, 78%, 12.438 KiB/s, ETA 2h35m28s

root@master:~# speedtest-cli
Retrieving speedtest.net configuration...
Testing from Digital Ocean (a.b.c.d)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Webe Digital Sdn Bhd (Petaling Jaya) [299.90 km]: 7.269 ms
Testing download speed................................................................................
Download: 811.00 Mbit/s
Testing upload speed......................................................................................................
Upload: 701.21 Mbit/s

root@master:~# wget www.dropbox.com
URL transformed to HTTPS due to an HSTS policy
--2022-02-10 12:48:21--  https://www.dropbox.com/
Resolving www.dropbox.com (www.dropbox.com)... 2620:100:6031:18::a27d:5112, 162.125.81.18
Connecting to www.dropbox.com (www.dropbox.com)|2620:100:6031:18::a27d:5112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html                                    [ <=>                                                                               ] 525.90K  --.-KB/s    in 0.05s

2022-02-10 12:48:22 (10.1 MB/s) - ‘index.html’ saved [538517]

If you post only a snippet, I have no idea what you ran, so it's really impossible to figure out how to answer your question.

For any question, the template really helps, as it collects all the right information and keeps the post from getting confusing.

added some detail to msg -

How many files per second is rclone transferring? I think you'll be lucky to get much above 10 with the Dropbox API limits.

I am getting a bit above 1 file per second. Any suggestions welcome for improving speed, but in any case this is already better than what I had.

Run rclone with debug on and check the logs. There may be some hints in there.

I hit the dreaded 'too_many_requests', which AFAICT is supposed to be avoided or mitigated by batch-mode async with a batch size of 1000.

root@master:~/Dropbox# rclone sync --log-file mylogfile.txt --log-level DEBUG --size-only --dropbox-batch-mode async --dropbox-batch-size 1000 --exclude *.txt  projects/PycharmProjects/trademark_scrape/data/ dropbox_remote:projects/PycharmProjects/trademark_scrape/data
....
> 2022/02/16 01:01:26 DEBUG : 286740160_errlog.txt: Uploading chunk 2/1
> 2022/02/16 01:01:26 DEBUG : pacer: low level retry 1/10 (error too_many_requests/.)
> 2022/02/16 01:01:26 DEBUG : pacer: Rate limited, increasing sleep to 20ms
> 2022/02/16 01:01:26 DEBUG : pacer: Reducing sleep to 15ms
> 2022/02/16 01:01:26 DEBUG : 286740455_errlog.txt: Uploading chunk 1/1
> 2022/02/16 01:01:26 DEBUG : pacer: low level retry 1/10 (error too_many_requests/...)

Dropbox has a TPS limit of 12 transactions per second, per registered application, so you need to add that limit to rclone.

I use:

 --tpslimit 12 --tpslimit-burst 12

It's not about the batch mode: since you are syncing, you are also listing, checking, etc., and you only get ~12 transactions per second total.

Note that you can't be writing to the same account from elsewhere at the same time, either....

But the --tpslimit advice from Animosity is good.
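Combined with the batching flags already discussed, the invocation might look like this (paths and remote name are the ones from earlier in the thread; --transfers 32 per the batch-mode docs quoted above):

```shell
rclone sync \
    --tpslimit 12 --tpslimit-burst 12 \
    --dropbox-batch-mode async --dropbox-batch-size 1000 \
    --transfers 32 \
    projects/PycharmProjects/trademark_scrape/data/ \
    dropbox_remote:projects/PycharmProjects/trademark_scrape/data
```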
