Dealing with lots of small files and folders using COPY

What is the problem you are having with rclone?

Sanity check for my current script. It feels like I could speed it up, since it appears to spend a lot of time just comparing the local data to the remote data. Once a difference is detected, the upload starts promptly and is fast relative to my available bandwidth. The script gets slower every year as a new large set of files and folders populates the local directories. Individual files range widely, from a couple of MB to 4-6 GB each, and sometimes as large as 32 GB. The major issue, in my opinion, is that each City subfolder has on average 150,000 files and folders. Some cities get as high as 400,000, and for those rare occasions I use union remotes.

Each city's yearly folder starts at zero files and folders, but accumulates about 100k objects within a year and can reach 200k within 2-3 years. After year 4 it tapers off and very little is added; for example, now in 2023 very few files are added to the 2018 folders. It can still happen, so we still have to scan them, but it's pretty rare.

So my question is: is this the best way to accomplish what I'm doing? Are there any flags I can use to speed up the checking process that aren't related to bandwidth?

Run the command 'rclone version' and share the full output of the command.

rclone v1.51.0

  • os/arch: windows/amd64
  • go version: go1.13.7

Which cloud storage system are you using? (eg Google Drive)

Google Drive, Shared Drives

The command you were trying to run (eg rclone copy /tmp remote:tmp)

"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City1\2023" City1-Documents-2023:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City2\2023" City2-Documents-2023:\ --bwlimit "07:00,6.25M 19:00,12.5M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City3\2023" City3-Documents-2023:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City4\2023" City4-Documents-2023:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City5\2023" City5-Documents-2023:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City1\2022" City1-Documents-2022:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City2\2022" City2-Documents-2022:\ --bwlimit "07:00,6.25M 19:00,12.5M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City3\2022" City3-Documents-2022:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City4\2022" City4-Documents-2022:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City5\2022" City5-Documents-2022:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City2\2021" City2-Documents-2021-Union:\ --bwlimit "07:00,6.25M 19:00,12.5M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City1\2021" City1-Documents-2021:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City5\2021" City5-Documents-2021:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City4\2021" City4-Documents-2021:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City3\2021" City3-Documents-2021:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City1\2020" City1-Documents-2020:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City5\2020" City5-Documents-2020:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City4\2020" City4-Documents-2020:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City3\2020" City3-Documents-2020:\ --bwlimit "07:00,6.25M 19:00,9.375M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt
"c:\rclone\rclone.exe" copy "\\contaso.com\data\Documents Data\City2\2020" City2-Documents-2020:\ --bwlimit "07:00,6.25M 19:00,12.5M" --fast-list --exclude-if-present rclone.ignore -v --log-file=c:\logs\rclone_log_%date:~-10,2%"-"%date:~7,2%"-"%date:~-4,4%.txt

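The 20 commands above all follow one pattern (city/year folder, a per-city --bwlimit, and a union remote for the one oversized folder), so they could be generated instead of maintained by hand. A minimal Python sketch, assuming the remote names, paths and limits exactly as posted, and a US-locale %date% (which the %date% slicing turns into MM-DD-YYYY) for the log file name. The order within each year may differ from the original batch file:

```python
from datetime import date

# Inventory mirroring the posted script.
CITIES = ["City1", "City2", "City3", "City4", "City5"]
YEARS = [2023, 2022, 2021, 2020]

# Per the posted commands: City2 has a higher evening limit, and City2/2021
# goes to a union remote because it outgrew a single Shared Drive.
DEFAULT_BWLIMIT = "07:00,6.25M 19:00,9.375M"
BWLIMIT = {"City2": "07:00,6.25M 19:00,12.5M"}
UNION_FOLDERS = {("City2", 2021)}

RCLONE = r"c:\rclone\rclone.exe"
SOURCE_ROOT = r"\\contaso.com\data\Documents Data"


def build_commands(today: date) -> list[list[str]]:
    """Return one rclone argv per city/year folder, newest year first."""
    # Same MM-DD-YYYY name the %date% slicing yields on a US-locale system.
    log_file = f"c:\\logs\\rclone_log_{today:%m-%d-%Y}.txt"
    commands = []
    for year in YEARS:
        for city in CITIES:
            remote = f"{city}-Documents-{year}"
            if (city, year) in UNION_FOLDERS:
                remote += "-Union"
            commands.append([
                RCLONE, "copy",
                f"{SOURCE_ROOT}\\{city}\\{year}",
                f"{remote}:\\",
                "--bwlimit", BWLIMIT.get(city, DEFAULT_BWLIMIT),
                "--fast-list",
                "--exclude-if-present", "rclone.ignore",
                "-v",
                f"--log-file={log_file}",
            ])
    return commands
```

Each argv list can be passed straight to subprocess.run(cmd), which sidesteps cmd.exe quoting around the date-based log file name. Adding a new year then means editing YEARS rather than copying five more lines.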
The rclone config contents with secrets removed.

[City1-Documents-2020]
type = drive
client_id = private
client_secret = private
scope = drive
service_account_file = c:\rclone\rclone-private.json
team_drive = privateid

[City2-Documents-2020]
type = drive
client_id = private
client_secret = private
scope = drive
service_account_file = C:\rclone\rclone-private.json
team_drive = private

etc...

A log from the command with the -vv flag

Paste  log here

hello and welcome to the forum,

  • latest rclone is v1.61.1

might try using a filter, which runs against the source.

for example, rclone copy /path/to/source dest: --max-age=24h
as rclone scans /path/to/source
if a file is older than 24 hours,
then ignore it, do not check the dest.
else check the dest and if needed, copy the file
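The filter logic described above can be illustrated with a short Python sketch. This is not rclone's code, only a model of the behaviour: every source file is still walked and stat'd, but only files modified within the window go on to be compared with the dest. It assumes a file's modification time reflects when it landed on the share; a file copied in with an older, preserved timestamp would be skipped. `files_to_check` is a hypothetical helper name:

```python
import os
import time
from typing import Optional


def files_to_check(root: str, max_age_seconds: float,
                   now: Optional[float] = None) -> list[str]:
    """Model of --max-age on the source: keep files modified within the window."""
    now = time.time() if now is None else now
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Every file is still walked and stat'd: listing cost does not shrink.
            if now - os.path.getmtime(path) <= max_age_seconds:
                # Only these would be compared against the dest (and copied if needed).
                recent.append(path)
    return recent
```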

Thanks for your response! I thought about that, but right now going through all the directories can take 3-4 days at times. This happens because of the large check that is occurring, or when some massive piece of data gets dropped into multiple directories, for example a 200GB archive in City1 and City2. Wouldn't that delay result in gaps in our copy?

rclone ls on the source takes that long?

not sure about your exact case, but that seems like a lot of rclone copy commands and a lot of remotes.
if the dest file structure matched the source, then you could use this:
`rclone copy "\\contaso.com\data\Documents Data" remote:`

No, not at all. I just ran an ls on the largest city for 2022, and it took 7 minutes.

so what is the problem with using --max-age against the source?
as most files do not change, rclone has a smaller subset of files to check against the dest.

and i was still writing my last reply, so not sure if you saw my comment about the large number of rclone copy commands and remotes.

I've seen files on the source take 3-4 days to appear on the remote side. Usually this is due to the time it takes to upload a lot of data and/or, maybe, the time it takes to scan. If I used --max-age=24h while a run takes 3-4 days to finish, wouldn't I miss data from days 2-4 if City1 was uploading 200GB and that took more than 24 hours?

As for having so many destinations and copy commands, that's because Google Drive Shared Drives have a limit of 400,000 objects. So each city gets its own Shared Drive for each year. I also use a separate service account .json for each city's Shared Drive, to avoid potential API issues causing bottlenecks.
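Since each yearly Shared Drive is capped at 400,000 objects, one way to see which city/year folders are getting close (and may need a union remote) is to count files plus folders on the source side. A minimal Python sketch; the helper names and the 90% headroom are arbitrary assumptions, folders excluded via rclone.ignore are still counted, and walking a folder over the MAN will itself take a while:

```python
import os

SHARED_DRIVE_ITEM_LIMIT = 400_000  # limit quoted in this thread


def count_objects(root: str) -> int:
    """Count files plus folders under root (root itself excluded)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def needs_split(root: str, headroom: float = 0.9) -> bool:
    """True once root holds more than headroom * the Shared Drive limit."""
    return count_objects(root) > SHARED_DRIVE_ITEM_LIMIT * headroom
```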

yes, that is correct.
might change --max-age to something like 7d
and if wanted, every so often, run the command without --max-age
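The gap concern can be checked with a bit of arithmetic: a file modified just after one scan of a folder is only picked up if the next scan of that folder starts within the --max-age window. A small Python sketch of that check, assuming you record when each run reached a given folder (the function name is hypothetical):

```python
from datetime import datetime, timedelta


def uncovered_gaps(scan_starts: list[datetime],
                   max_age: timedelta) -> list[tuple[datetime, datetime]]:
    """Return consecutive scan pairs whose gap exceeds max_age.

    A file modified just after scan `a` is already older than `b - a` when
    scan `b` reaches the folder, so if that exceeds max_age it is filtered out.
    """
    ordered = sorted(scan_starts)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_age]
```

With runs that take 3-4 days, a 24h window leaves holes while 7d covers gaps of up to a week, which matches the advice above.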

ok, that makes sense.

7 days may work, and then I'd drop in a Friday job without --max-age like you said. But if scanning 100,000 files is the reason it starts to crawl, wouldn't that still be a problem with --max-age, since rclone still needs to scan each file or folder to determine its age?
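The Friday-job idea can be folded into one scheduled script that picks the filter flags by weekday. A Python sketch, assuming Friday as the full-scan day and 7d as the window (both from this thread; the function name is hypothetical). It only changes which files get compared with the dest; the source listing costs the same either way:

```python
from datetime import date


def filter_flags(today: date, full_scan_weekday: int = 4,
                 max_age: str = "7d") -> list[str]:
    """Extra rclone flags: none on the full-scan day, --max-age otherwise.

    date.weekday() counts Monday as 0, so 4 is Friday.
    """
    if today.weekday() == full_scan_weekday:
        return []
    return [f"--max-age={max_age}"]
```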

you wrote above "I just did an LS for the largest city in 2022. It took 7 minutes."
was that a bash ls or an rclone ls?

using --max-age, rclone ls will still take the same 7 minutes.
if no source files changed, then rclone does not have to check the dest.

It was rclone ls. It may have been a bad example, because I'm local (on the LAN) to the city I ran it against. The other DFS-namespace city locations don't respond as quickly, since the traffic has to traverse a MAN. This may be another source of the slowness.

Is there a way to run these rclone copy commands on local servers in each city for performance, but coordinate them so they don't all try to saturate the one shared WAN connection?

yes, you would have to script it. a zillion ways to do that.

basically have a central server execute commands on each client.
after each command completes or times out, execute the next command on the next client.
or have each client poll that central server.
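The central-server approach can be sketched in Python as a loop that runs one job at a time and moves on when a job finishes or times out, so only one city uses the shared WAN link at once. How each argv actually reaches a city server (ssh, PsExec, a scheduled task, etc.) is left open, and the client names are hypothetical. If the argv is a remote-execution wrapper, killing it on timeout may not stop rclone on the far side:

```python
import subprocess


def run_in_turn(jobs: list[tuple[str, list[str]]],
                timeout_s: float) -> list[tuple[str, str]]:
    """Run each (client, argv) job one after another, never two at once.

    Returns (client, outcome) pairs, where outcome is "ok", "failed" or "timeout".
    """
    results = []
    for client, argv in jobs:
        try:
            proc = subprocess.run(argv, timeout=timeout_s)
            results.append((client, "ok" if proc.returncode == 0 else "failed"))
        except subprocess.TimeoutExpired:
            # subprocess.run kills the local child process before re-raising.
            results.append((client, "timeout"))
    return results
```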

Before going down this route, I figured I would just try --max-age=72h. As expected, the 2023 folders went quickly using the command as is. When it finally hit the first 2022 folder, it seemed to slow down again. The log file has a ton of these INFO entries. Any idea what is occurring during this portion that may help identify where the bigger delay comes from?

The 10:53 entries are from the 2023 command. The 15:42 entries are from the 2022 commands.

2023/02/10 10:52:42 INFO  : Starting bandwidth limiter at 6.250MBytes/s
2023/02/10 10:53:42 NOTICE: Scheduled bandwidth change. Limit set to 6.250MBytes/s
2023/02/10 10:53:43 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Elapsed time:         0.0s

2023/02/10 10:54:30 INFO  : Google drive root '': Waiting for checks to finish
2023/02/10 10:54:30 INFO  : Google drive root '': Waiting for transfers to finish
2023/02/10 10:54:30 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Checks:               377 / 377, 100%
Elapsed time:         0.0s

2023/02/10 15:40:20 INFO  : Starting bandwidth limiter at 6.250MBytes/s
2023/02/10 15:41:20 NOTICE: Scheduled bandwidth change. Limit set to 6.250MBytes/s
2023/02/10 15:41:20 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Elapsed time:         0.0s

2023/02/10 15:42:20 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Elapsed time:         0.0s

[... the same zero-progress stats block repeats once a minute from 15:43:20 through 16:37:21 ...]

2023/02/10 16:38:21 INFO  : 
Transferred:   	         0 / 0 Bytes, -, 0 Bytes/s, ETA -
Elapsed time:         0.0s
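To put a number on the stall, a small Python sketch can scan the periodic stats blocks and report the longest stretch in which no "Checks:" line appeared. In the excerpt above, the 2023 run reports Checks: 377 / 377 while the 2022 blocks report none. No Checks line means no files were compared in that interval. With --max-age filtering out older files, that is consistent with rclone spending the time walking directory listings, but that reading is an assumption and a -vv log would show it directly. The regexes assume the log format shown above:

```python
import re
from datetime import datetime
from typing import Optional

# A periodic stats block starts with a timestamped INFO line that has no message.
STATS_HEADER = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO\s+:\s*$")
# Any other timestamped line is an ordinary log entry.
LOG_ENTRY = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def stats_blocks(lines: list[str]) -> list[tuple[datetime, bool]]:
    """Return (timestamp, has_checks_line) for each stats block in the log."""
    blocks: list[list] = []
    current: Optional[list] = None
    for line in lines:
        header = STATS_HEADER.match(line)
        if header:
            current = [datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S"), False]
            blocks.append(current)
        elif LOG_ENTRY.match(line):
            current = None  # an ordinary entry ends the current stats block
        elif current is not None and line.lstrip().startswith("Checks:"):
            current[1] = True
    return [(ts, has_checks) for ts, has_checks in blocks]


def longest_stall(blocks: list[tuple[datetime, bool]]) -> Optional[tuple[datetime, datetime]]:
    """Longest run of consecutive stats blocks with no Checks line."""
    best = None
    start = None
    for ts, has_checks in blocks:
        if has_checks:
            start = None
            continue
        if start is None:
            start = ts
        if best is None or ts - start > best[1] - best[0]:
            best = (start, ts)
    return best
```

Fed the lines of the full log file, this would point at the stretch from 15:41:20 to 16:38:21, roughly an hour without a single check.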

  • well, not sure, but the command is using a bandwidth limiter and gdrive is finicky.

  • from that last post, i cannot tell: did you update from v1.51.0 to v1.61.1?

  • might try a debug log, change -v to -vv
