Rclone copy from local to remote is downloading data from the remote

What is the problem you are having with rclone?

An rclone copy from local to remote is being billed for downloads from the remote.
As the attachment below shows, I'm being charged for remote-to-local download traffic during a local-to-remote copy.

I have about 4 TB locally and a full copy on Google Cloud Storage. The copy command runs once a day to upload only the previous day's production, but it is incurring remote download charges.
The command runs for about 3 hours, and almost all of that time is spent excluding files...

Why is it downloading data from the remote backend during the whole local-to-remote rclone copy, when most of the time it is only excluding files?

What is your rclone version (output from rclone version)

rclone v1.56.0

Which cloud storage system are you using? (eg Google Drive)

google cloud storage

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy /local remote:REMOTE --include-from "/include-file.txt" --max-age 2021-09-05 -vv

The rclone config contents with secrets removed.

[remote]
type = google cloud storage
service_account_file = key.json

A log from the command with the -vv flag

2021/09/05 21:57:17 DEBUG : --max-age 1.039787137250301d to 2021-09-04 21:00:00.000046545 -0300 -03 m=-89837.540811688
2021/09/05 21:57:17 DEBUG : rclone: Version "v1.56.0" starting with parameters ["rclone" "copy" "/local" "remote:REMOTE" "--include-from" "/include-file.txt" "--max-age" "2021-09-05" "-vv"]
2021/09/05 21:57:17 DEBUG : Creating backend with remote "/local"
2021/09/05 21:57:17 DEBUG : Using config file from "/rclone.conf"
2021/09/05 21:57:17 DEBUG : Creating backend with remote "remote:hsasaude"
...
lots of Excluded files (older than the last day)
...
then the last day's files are copied...
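The first DEBUG line in that log can be checked with a little date arithmetic. This sketch assumes that the bare date passed to --max-age (2021-09-05) was read as midnight UTC and that the machine's local zone is UTC-3; both are inferences from the log line, not something stated in the thread. If they hold, the cutoff rclone prints (2021-09-04 21:00 -03) and the age it reports (about 1.0398 days) both follow:

```python
from datetime import datetime, timedelta, timezone

# Assumptions (inferred from the log, not documented here): the bare date
# "2021-09-05" means midnight UTC, and the local zone is UTC-3.
LOCAL = timezone(timedelta(hours=-3))
cutoff = datetime(2021, 9, 5, tzinfo=timezone.utc)

# The same instant expressed in local time, as the log prints it.
local_cutoff = cutoff.astimezone(LOCAL)

# Start time taken from the log timestamp (whole seconds only).
started = datetime(2021, 9, 5, 21, 57, 17, tzinfo=LOCAL)
age_days = (started - cutoff) / timedelta(days=1)

print(local_cutoff.isoformat())  # 2021-09-04T21:00:00-03:00
print(round(age_days, 4))        # 1.0398
```

The small remaining gap to the logged 1.039787137250301 presumably comes from the sub-second part of the start time, which the log timestamp truncates.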

What is in your include-from file? Maybe that could be tightened up?

The file contains this:
/data/**

With the command "rclone copy /local remote:REMOTE", this picks up all the /local/data/** files, right?

Or will it pick up /others/data/** files too?

Looking at the debug output, the file selection seems right:
it picks up the files inside /local/data/** and excludes the ones older than the date.

But it seems to download data while excluding...

I have a union mount (/union) combining /local and remote:REMOTE, but that shouldn't be a problem because the copy only involves /local.

No, just /data/**
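To illustrate the anchoring rule behind that answer: a pattern that starts with / only matches from the root of the transfer (here /local), while a pattern without it can match at any directory level. The helper below, glob_to_regex, is hypothetical and simplified: it translates only **, * and ? into a regular expression and is not rclone's own filter code.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Hypothetical, simplified model of rclone-style filter patterns.

    Handles only '**', '*' and '?'; character classes and {a,b}
    alternation are not modelled.
    """
    anchored = pattern.startswith("/")
    body = pattern[1:] if anchored else pattern
    parts = []
    i = 0
    while i < len(body):
        if body.startswith("**", i):
            parts.append(".*")       # '**' may cross '/' separators
            i += 2
        elif body[i] == "*":
            parts.append("[^/]*")    # '*' stays inside one path element
            i += 1
        elif body[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(body[i]))
            i += 1
    # A leading '/' pins the match to the root of the transfer; otherwise the
    # pattern may start at any path-element boundary.
    prefix = "^" if anchored else "(^|/)"
    return re.compile(prefix + "".join(parts) + "$")


anchored = glob_to_regex("/data/**")
unanchored = glob_to_regex("data/**")

# Paths are relative to the source root (/local in this thread).
for path in ["data/archive/2021/9/9/file", "others/data/file"]:
    print(path, bool(anchored.search(path)), bool(unanchored.search(path)))
```

With /data/** in the include file, a path like others/data/file does not match, while the unanchored form data/** would also pick it up.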

Is the output in dated folders? Maybe you could specifically include those?

One thing you could try is a top-up sync, where you use rclone copy with --max-age to copy only the newer files. This works well for eliminating directory scanning.
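A top-up copy relies on the age filter to pick only recent files from the source. As a rough model (newer_than is a hypothetical helper, not rclone's implementation, and it ignores include rules), the selection amounts to comparing each file's modification time with a cutoff:

```python
import os
import tempfile
import time
from typing import Iterator, Optional


def newer_than(root: str, max_age_seconds: float,
               now: Optional[float] = None) -> Iterator[str]:
    """Yield paths relative to root whose mtime falls within max_age_seconds."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.stat(full).st_mtime >= cutoff:
                yield os.path.relpath(full, root)


# Demo on a throwaway tree: one file backdated three days, one just created.
with tempfile.TemporaryDirectory() as root:
    old = os.path.join(root, "data", "old", "a.bin")
    new = os.path.join(root, "data", "new", "b.bin")
    for path in (old, new):
        os.makedirs(os.path.dirname(path))
        open(path, "wb").close()
    now = time.time()
    os.utime(old, (now - 3 * 86400, now - 3 * 86400))
    selected = sorted(newer_than(root, max_age_seconds=86400, now=now))

print(selected)
```

This only narrows the set of candidate files on the source side; by itself it does not stop rclone from listing the destination.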

Yes, I'm already using --max-age, as shown in the command in my first post.

The problem still remains. I don't understand what could be downloaded from the backend to local during an rclone copy from local to the backend.

The -vv log also looks fine.

Rclone has to download directory listings. That is likely the traffic.

If you want to see the traffic then use -vv --dump headers which might give you a clue.


The -vv --dump headers log is below: lots and lots of HTTP requests and responses.

There are millions of files being excluded, and the requests and responses happen while the files are being excluded. Isn't it supposed to make HTTP requests only when transferring a file?

It spends about 1 hour excluding files and 10 minutes transferring new files. Is there a way to avoid downloading data from the backend while excluding files?

2021/09/20 22:52:27 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/20 22:52:27 DEBUG : HTTP REQUEST (req 0xc0006fe600)
2021/09/20 22:52:27 DEBUG : GET /storage/v1/b/BUCKET/o?alt=json&delimiter=%2F&maxResults=1000&prefix=data%2Farchive%2F2021%2F9%2F9%2F&prettyPrint=false HTTP/1.1
Host: storage.googleapis.com
User-Agent: rclone/v1.56.0
Authorization: XXXX
X-Goog-Api-Client: gl-go/1.16.5 gdcl/20210406
Accept-Encoding: gzip
2021/09/20 22:52:27 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

2021/09/20 22:52:27 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/20 22:52:27 DEBUG : HTTP RESPONSE (req 0xc000189a00)
2021/09/20 22:52:27 DEBUG : HTTP/2.0 200 OK
Content-Length: 868
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Cache-Control: private, max-age=0, must-revalidate, no-transform
Content-Type: application/json; charset=UTF-8
Date: Tue, 21 Sep 2021 02:52:27 GMT
Expires: Tue, 21 Sep 2021 02:52:27 GMT
Server: UploadServer
Vary: Origin
Vary: X-Origin
X-Guploader-Uploadid: ADPycduka-dDsSwHESpNi8yi7tcpYNXDcOjCxNPauIGx3-EKTUaEmThxT5cPn82joaIikmtIwY93m-PG8cCWDk7dbLw
2021/09/20 22:52:27 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
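To put numbers on a dump like this, a small script can count the requests and add up the response sizes. summarize_dump is a hypothetical helper that assumes the line format seen in these excerpts (a DEBUG line naming HTTP REQUEST or HTTP RESPONSE, followed by raw header lines). It also separates per-directory listings (which carry delimiter=) from recursive ones. Content-Length is just what the server reported for each response, so the byte total is only an estimate of the listing traffic.

```python
def summarize_dump(log_text: str) -> dict:
    """Tally listing requests and response bytes in a --dump headers log."""
    stats = {"requests": 0, "per_dir_listings": 0,
             "recursive_listings": 0, "response_bytes": 0}
    in_response = False
    for line in log_text.splitlines():
        if "HTTP REQUEST" in line:
            stats["requests"] += 1
            in_response = False
        elif "HTTP RESPONSE" in line:
            in_response = True
        elif " : GET " in line and "/o?" in line:
            # GCS object listing; delimiter= means a single directory level.
            key = "per_dir_listings" if "delimiter=" in line else "recursive_listings"
            stats[key] += 1
        elif in_response and line.startswith("Content-Length:"):
            stats["response_bytes"] += int(line.split(":", 1)[1])
    return stats


# Trimmed sample in the same format as the excerpts in this thread.
SAMPLE = """\
2021/09/20 22:52:27 DEBUG : HTTP REQUEST (req 0xc0006fe600)
2021/09/20 22:52:27 DEBUG : GET /storage/v1/b/BUCKET/o?alt=json&delimiter=%2F&maxResults=1000&prefix=data%2F&prettyPrint=false HTTP/1.1
Host: storage.googleapis.com
2021/09/20 22:52:27 DEBUG : HTTP RESPONSE (req 0xc0006fe600)
2021/09/20 22:52:27 DEBUG : HTTP/2.0 200 OK
Content-Length: 868
2021/09/21 12:11:23 DEBUG : HTTP REQUEST (req 0xc000294600)
2021/09/21 12:11:23 DEBUG : GET /storage/v1/b/BUCKET/o?alt=json&maxResults=1000&pageToken=XYZ&prefix=&prettyPrint=false HTTP/1.1
2021/09/21 12:11:24 DEBUG : HTTP RESPONSE (req 0xc000294600)
Content-Length: 966098
"""

print(summarize_dump(SAMPLE))
```

To use it on a real run, save the -vv --dump headers output to a file (for example with --log-file) and pass its contents in.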

What it is doing there is listing the directories. If you have a lot of directories then this can cause a lot of traffic.

If you have enough memory then using --fast-list will speed it up enormously I think.
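The listing above asks for one directory level at a time (delimiter=%2F with a per-directory prefix), whereas --fast-list lists the bucket recursively in pages of 1000 objects (no delimiter, a pageToken). A back-of-the-envelope model, assuming a hypothetical tree of 200,000 leaf directories with 10 objects each and ignoring intermediate directories, shows why --fast-list cuts the number of calls but not the amount of metadata downloaded:

```python
import math

PAGE_SIZE = 1000  # maxResults=1000, as in the logged listing requests


def per_directory_requests(entries_per_dir):
    """One listing call per directory, plus one per extra page of entries."""
    return sum(max(1, math.ceil(n / PAGE_SIZE)) for n in entries_per_dir)


def fast_list_requests(total_objects):
    """A single recursive listing of the bucket, paged PAGE_SIZE at a time."""
    return max(1, math.ceil(total_objects / PAGE_SIZE))


# Hypothetical tree: 200,000 leaf directories holding 10 objects each.
dirs = [10] * 200_000
per_dir = per_directory_requests(dirs)
fast = fast_list_requests(sum(dirs))

# Rough metadata volume, assuming every page is about as large as the
# 966,098-byte recursive listing page in this thread's log.
approx_bytes = fast * 966_098

print(per_dir, fast, approx_bytes)
```

In this made-up case the number of calls drops by a factor of 100, but the metadata for every object is still transferred, so the traffic does not go away.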


Thanks a lot! There are far fewer calls with --fast-list, but still many calls like the log below.

Why are there requests and responses while excluding?

2021/09/21 12:11:23 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/21 12:11:23 DEBUG : HTTP REQUEST (req 0xc000294600)
2021/09/21 12:11:23 DEBUG : GET /storage/v1/b/BUCKET/o?alt=json&maxResults=1000&pageToken=CjhwYWNzLWRhdGEvYXJjaGl2ZS8yMDE2LzEwLzEvOS8zREFGRjdDRC9DNDgxQUVDQy9GNTU1Njc4QQ%3D%3D&prefix=&prettyPrint=false HTTP/1.1
Host: storage.googleapis.com
User-Agent: rclone/v1.56.0
Authorization: XXXX
X-Goog-Api-Client: gl-go/1.16.5 gdcl/20210406
Accept-Encoding: gzip

2021/09/21 12:11:23 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021/09/21 12:11:24 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/21 12:11:24 DEBUG : HTTP RESPONSE (req 0xc000294600)
2021/09/21 12:11:24 DEBUG : HTTP/2.0 200 OK
Content-Length: 966098
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Cache-Control: private, max-age=0, must-revalidate, no-transform
Content-Type: application/json; charset=UTF-8
Date: Tue, 21 Sep 2021 16:11:24 GMT
Expires: Tue, 21 Sep 2021 16:11:24 GMT
Server: UploadServer
Vary: Origin
Vary: X-Origin
X-Guploader-Uploadid: ADPycdskIHzpDgtbElrjaQTXqc3E0kyOB-reIK_N2jD4fPOdg5ndbMid_HSurbO4Pz4XJaeGj6h1higqw9SQ8gzvw98

2021/09/21 12:11:24 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021/09/21 12:28:35 DEBUG : data/archive/2018/2/1/14/3EB3097E/3EB30982/33523B09: Excluded from sync (and deletion)
2021/09/21 12:28:35 DEBUG : data/archive/2018/2/1/14/3EB3097E/3EB30982/33523B0A: Excluded from sync (and deletion)
2021/09/21 12:28:35 DEBUG : data/archive/2018/2/1/14/3EB3097E/3EB30982/33523B0B: Excluded from sync (and deletion)

I think I found the problem...

I'm using rclone copy. Why am I getting the message "Excluded from sync (and deletion)"?

It's reading all the files on the remote backend. Isn't there a way to copy just the new files from local to remote?

2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/626AC4D1: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD487: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD488: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD489: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD48A: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD48B: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD48C: Excluded from sync (and deletion)
2021/09/21 12:46:42 DEBUG : data/archive/2016/10/18/9/625CCCAA/F5FAC885/EAEDD48D: Excluded from sync (and deletion)

I've been testing, and "Excluded from sync (and deletion)" is shown when using --include-from or --max-age, and I'm using both.

It seems to be reading all the files on the backend (millions) that are no longer local, just to report that they are excluded.

Is there a way to run rclone copy with --include-from and --max-age without checking the files that exist only on the backend, since this is a copy and not a sync?
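A toy model may help explain those log lines. It assumes, as the logs suggest, that the same filters are applied to entries listed on both the source and the destination, so every object that exists only in the bucket is still listed, checked against the include rules and the age limit, and reported as excluded. plan_copy is a hypothetical function, not rclone's code: it compares presence only (real rclone also checks size and modification time), and its no_traverse switch stands in for --no-traverse by looking up each included source file individually instead of listing the destination.

```python
from typing import Callable, Dict, List, Tuple

Listing = Dict[str, float]  # relative path -> modification time (seconds)


def plan_copy(src: Listing, dst: Listing,
              included: Callable[[str, float], bool],
              no_traverse: bool = False) -> Tuple[List[str], List[str], int]:
    """Return (files to copy, log lines, individual destination lookups)."""
    log: List[str] = []
    candidates: List[str] = []
    for path, mtime in sorted(src.items()):
        if included(path, mtime):
            candidates.append(path)
        else:
            log.append(f"{path}: Excluded from sync (and deletion)")

    if no_traverse:
        # One lookup per candidate; the destination is never listed.
        return [p for p in candidates if p not in dst], log, len(candidates)

    # Full destination listing: the filter runs over every remote entry too,
    # so remote-only files are listed and then reported as excluded.
    remote_ok = set()
    for path, mtime in sorted(dst.items()):
        if included(path, mtime):
            remote_ok.add(path)
        else:
            log.append(f"{path}: Excluded from sync (and deletion)")
    return [p for p in candidates if p not in remote_ok], log, 0


DAY = 86_400.0
NOW = 1_000 * DAY  # hypothetical clock


def keep(path: str, mtime: float) -> bool:
    # Stand-in for an include rule of /data/** plus a one-day age limit.
    return path.startswith("data/") and mtime >= NOW - DAY


src = {"data/new/a": NOW, "data/old/b": NOW - 30 * DAY}
dst = {"data/old/b": NOW - 30 * DAY, "data/archive/2016/c": NOW - 900 * DAY}

copied, log, lookups = plan_copy(src, dst, keep)
copied_nt, log_nt, lookups_nt = plan_copy(src, dst, keep, no_traverse=True)
print(copied, len(log), lookups)
print(copied_nt, len(log_nt), lookups_nt)
```

Both runs copy the same file, but only the traversing run lists the remote archive and logs its old objects as excluded.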

--no-traverse did the job. Thanks a lot.

Yes --no-traverse means that rclone will try to find each file individually.

This is usually less efficient than not using it with Google Cloud Storage, but whether it is or not depends on the pattern of updated files. If you have lots of files being updated in one directory then --no-traverse is a loss, whereas if you have files being updated all over the place then it can be a win.
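That trade-off can be put into rough numbers. The sketch assumes, hypothetically, one listing request per directory that has to be listed (plus one per extra 1000 entries) when traversing, and roughly one metadata lookup per candidate file with --no-traverse. Which directories actually need listing depends on the layout, so it is an input here rather than something the model works out.

```python
import math

PAGE_SIZE = 1000  # entries per listing page, as in the logged requests


def traverse_requests(listed_dir_sizes):
    """Listing calls for every directory that has to be listed."""
    return sum(max(1, math.ceil(n / PAGE_SIZE)) for n in listed_dir_sizes)


def no_traverse_requests(candidate_files):
    """Roughly one lookup per file that passes the filters."""
    return candidate_files


def cheaper(listed_dir_sizes, candidate_files):
    """Return the cheaper strategy and both request estimates."""
    t = traverse_requests(listed_dir_sizes)
    n = no_traverse_requests(candidate_files)
    return ("--no-traverse" if n < t else "traverse"), t, n


# Many new files in one directory: listing that directory is cheap.
print(cheaper([6_000], 5_000))
# A few new files, but a huge archive has to be listed to find them.
print(cheaper([10] * 200_000, 500))
```

Both scenarios are made up; running the real job with -vv --dump headers, as done earlier in this thread, is the reliable way to compare.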
