Chunker leaving stray files when target folder did not exist before transfer (Dropbox Business)

What is the problem you are having with rclone?

Short version:
Chunker on Dropbox Business is leaving stray .rclone_chunk.* files when the target directory did not exist before the transfer. This does not happen on Google Workspace.
This curiously also happens with small files way below the chunk_size threshold.

Longer version:
The problem occurs when I copy a file through a Chunker backend to Dropbox and the target directory has to be created as part of the transfer. It happens even with files well below the chunk_size = 300G threshold.

What happens in the background is that the first transfer fails with an error when renaming the file on the server, and rclone then transfers the file again (see full log at the end).

2023/04/06 12:06:27 ERROR : 10Mtest.file.rclone_chunk.001_p051y4: Couldn't move: move failed: from_lookup/not_found/..
2023/04/06 12:06:28 ERROR : 10Mtest.file.rclone_chunk.001_p051y4: Failed to remove temporary chunk: path_lookup/not_found/...
2023/04/06 12:06:28 ERROR : 10Mtest.file: Failed to copy: move failed: from_lookup/not_found/..
2023/04/06 12:06:28 ERROR : Attempt 1/3 failed with 1 errors and: move failed: from_lookup/not_found/..
2023/04/06 12:06:28 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination

This will result in two problems.

  • Every time a new directory has to be created as a result of a copy, the first file to finish copying to that directory will have to be transferred twice.
  • After the transfer I am left with two files, a stale 10Mtest.file.rclone_chunk.001_p051y4 and the proper 10Mtest.file

Since there is no easy way to clean this up in a multi-TB archive, I will have to remove the stray files manually at some point.
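For future cleanup, something like this should at least find the leftovers - a rough sketch that assumes the default chunk naming, where temporary chunks carry an extra _xxxxxx transaction suffix that finished chunks don't have, and that it's acceptable to bypass the chunker and talk to the raw __DBraw: remote directly (drop --dry-run once the listing looks right):

$ rclone lsf -R --include "*.rclone_chunk.*_*" __DBraw:
$ rclone delete --dry-run --include "*.rclone_chunk.*_*" __DBraw: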

Sidebar:

I found this issue while trying to figure out something else. I have no idea why Chunker is using temporary .rclone_chunk.* files at all for transfers below the chunk_size = 300G. From my understanding of the docs (emphasis mine):

When rclone starts a file upload, chunker checks the file size. If it doesn't exceed the configured chunk size, chunker will just pass the file to the wrapped remote. If a file is large, chunker will transparently cut data in pieces with temporary names and stream them one by one, on the fly.
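
In Go terms, I read that as describing roughly this decision (my paraphrase of the documented behaviour, not the actual chunker source):

    // shouldChunk paraphrases the documented behaviour: only files larger than
    // the configured chunk size should be split and uploaded under temporary
    // names; smaller files should pass straight through to the wrapped remote.
    func shouldChunk(fileSize, chunkSize int64) bool {
        return fileSize > chunkSize
    }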

However I'm seeing temporary .rclone_chunk.* files created for all files regardless of size. This was my primary issue because that's where I thought my stray .rclone_chunk.* files were coming from. That's when I found the bug above.

What I find curious is the Uploading chunk 2/1 part of the log, aside from the fact that chunker shouldn't even engage at all with a 10M sized file.

Run the command 'rclone version' and share the full output of the command.

rclone v1.62.2-DEV

  • os/version: gentoo 2.13 (64 bit)
  • os/kernel: 6.1.19-gentoo (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.20.2
  • go/linking: dynamic
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Dropbox Business. Error does not occur on Google Workspace.

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone copy 10Mtest.file DBc:/Test -vv

The rclone config contents with secrets removed.

[__DBraw]
type = dropbox
chunk_size = 128Mi
client_id = <ID>
client_secret = <SECRET>
token = <TOKEN>

[DBc]
type = chunker
remote = __DBraw:
chunk_size = 300G

A log from the command with the -vv flag

2023/04/06 12:06:20 DEBUG : rclone: Version "v1.62.2-DEV" starting with parameters ["rclone" "copy" "10Mtest.file" "DBc:/Test" "-vv"]
2023/04/06 12:06:20 DEBUG : Creating backend with remote "10Mtest.file"
2023/04/06 12:06:20 DEBUG : Using config file from "<PATH_RO_RCLONE_CONF>"
2023/04/06 12:06:20 DEBUG : fs cache: adding new entry for parent of "10Mtest.file", "/tmp"
2023/04/06 12:06:20 DEBUG : Creating backend with remote "DBc:/Test"
2023/04/06 12:06:20 DEBUG : Creating backend with remote "__DBraw:/Test"
2023/04/06 12:06:21 DEBUG : Dropbox root '': Using root namespace "<NAMESPACE>"
2023/04/06 12:06:21 DEBUG : fs cache: renaming cache item "__DBraw:/Test" to be canonical "__DBraw:Test"
2023/04/06 12:06:21 DEBUG : Reset feature "ListR"
2023/04/06 12:06:21 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination
2023/04/06 12:06:21 DEBUG : 10Mtest.file: skip slow MD5 on source file, hashing in-transit
2023/04/06 12:06:22 DEBUG : 10Mtest.file.rclone_chunk.001_p051y4: Uploading chunk 1/1
2023/04/06 12:06:25 DEBUG : 10Mtest.file.rclone_chunk.001_p051y4: Uploading chunk 2/1
2023/04/06 12:06:26 DEBUG : Dropbox root 'Test': Adding "/Test/10Mtest.file.rclone_chunk.001_p051y4" to batch
2023/04/06 12:06:26 DEBUG : Dropbox root 'Test': Batch idle for 500ms so committing
2023/04/06 12:06:26 DEBUG : Dropbox root 'Test': Committing sync batch length 1 starting with: /Test/10Mtest.file.rclone_chunk.001_p051y4
2023/04/06 12:06:27 DEBUG : Dropbox root 'Test': Committed sync batch length 1 starting with: /Test/10Mtest.file.rclone_chunk.001_p051y4
2023/04/06 12:06:27 ERROR : 10Mtest.file.rclone_chunk.001_p051y4: Couldn't move: move failed: from_lookup/not_found/..
2023/04/06 12:06:28 ERROR : 10Mtest.file.rclone_chunk.001_p051y4: Failed to remove temporary chunk: path_lookup/not_found/...
2023/04/06 12:06:28 ERROR : 10Mtest.file: Failed to copy: move failed: from_lookup/not_found/..
2023/04/06 12:06:28 ERROR : Attempt 1/3 failed with 1 errors and: move failed: from_lookup/not_found/..
2023/04/06 12:06:28 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination
2023/04/06 12:06:28 DEBUG : 10Mtest.file: skip slow MD5 on source file, hashing in-transit
2023/04/06 12:06:30 DEBUG : 10Mtest.file.rclone_chunk.001_p058mv: Uploading chunk 1/1
2023/04/06 12:06:33 DEBUG : 10Mtest.file.rclone_chunk.001_p058mv: Uploading chunk 2/1
2023/04/06 12:06:34 DEBUG : Dropbox root 'Test': Adding "/Test/10Mtest.file.rclone_chunk.001_p058mv" to batch
2023/04/06 12:06:34 DEBUG : Dropbox root 'Test': Batch idle for 500ms so committing
2023/04/06 12:06:34 DEBUG : Dropbox root 'Test': Committing sync batch length 1 starting with: /Test/10Mtest.file.rclone_chunk.001_p058mv
2023/04/06 12:06:35 DEBUG : Dropbox root 'Test': Committed sync batch length 1 starting with: /Test/10Mtest.file.rclone_chunk.001_p058mv
2023/04/06 12:06:37 INFO  : 10Mtest.file.rclone_chunk.001_p058mv: Moved (server-side) to: 10Mtest.file
2023/04/06 12:06:37 INFO  : 10Mtest.file: Copied (new)
2023/04/06 12:06:37 ERROR : Attempt 2/3 succeeded
2023/04/06 12:06:37 INFO  : 
Transferred:           30 MiB / 30 MiB, 100%, 1.250 MiB/s, ETA 0s
Checks:                 2 / 2, 100%
Renamed:                1
Transferred:            1 / 1, 100%
Elapsed time:        17.5s

2023/04/06 12:06:37 DEBUG : 9 go routines active
2023/04/06 12:06:37 INFO  : Dropbox root 'Test': Committing uploads - please wait...

It sounds like the root cause of this is in the Dropbox Move routine not creating directories when it should, or the chunker assuming that Move will create directories when it doesn't.

I think the chunker is uploading to temporary file names and renaming them even for small files - that is what it looks like.

Can you replicate this on the local file system? Or anywhere else? Or just with dropbox?

Yeah that's what it seemed like to me. It only happens with Chunker due to the temporary files and rename/move. Copying straight to Dropbox looks like this:

$ rclone copy 10Mtest.file __DBraw:/TestDir/ -vv
2023/04/07 18:45:34 DEBUG : rclone: Version "v1.62.2-DEV" starting with parameters ["rclone" "copy" "10Mtest.file" "__DBraw:/TestDir/" "-vv"]
2023/04/07 18:45:34 DEBUG : Creating backend with remote "10Mtest.file"
2023/04/07 18:45:34 DEBUG : Using config file from "<CONFIG>"
2023/04/07 18:45:34 DEBUG : fs cache: adding new entry for parent of "10Mtest.file", "/media/Videos/Temp"
2023/04/07 18:45:34 DEBUG : Creating backend with remote "__DBraw:/TestDir/"
2023/04/07 18:45:34 DEBUG : __DBraw: Loaded invalid token from config file - ignoring
2023/04/07 18:45:34 DEBUG : Saving config "token" in section "__DBraw" of the config file
2023/04/07 18:45:34 DEBUG : __DBraw: Saved new token in config file
2023/04/07 18:45:35 DEBUG : Dropbox root '': Using root namespace "<NAMESPACE>"
2023/04/07 18:45:35 DEBUG : fs cache: renaming cache item "__DBraw:/TestDir/" to be canonical "__DBraw:TestDir"
2023/04/07 18:45:35 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination
2023/04/07 18:45:36 DEBUG : 10Mtest.file: Uploading chunk 1/1
2023/04/07 18:45:40 DEBUG : 10Mtest.file: Uploading chunk 2/1
2023/04/07 18:45:41 DEBUG : Dropbox root 'TestDir': Adding "/TestDir/10Mtest.file" to batch
2023/04/07 18:45:41 DEBUG : Dropbox root 'TestDir': Batch idle for 500ms so committing
2023/04/07 18:45:41 DEBUG : Dropbox root 'TestDir': Committing sync batch length 1 starting with: /TestDir/10Mtest.file
2023/04/07 18:45:42 DEBUG : Dropbox root 'TestDir': Committed sync batch length 1 starting with: /TestDir/10Mtest.file
2023/04/07 18:45:42 DEBUG : 10Mtest.file: dropbox = cfbbdae2c77066b55ee0bca512cfca2299904017a44f1dc362f4cc4e3924925d OK
2023/04/07 18:45:42 INFO  : 10Mtest.file: Copied (new)
2023/04/07 18:45:42 INFO  : 
Transferred:           10 MiB / 10 MiB, 100%, 1.666 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:         7.8s

2023/04/07 18:45:42 DEBUG : 9 go routines active
2023/04/07 18:45:42 INFO  : Dropbox root 'TestDir': Committing uploads - please wait...

So yeah, it's the (failing) rename that messes things up when the folder does not exist. Which is curious, because where is the temporary file being saved if not in that very same folder? Maybe Dropbox needs a small delay between the finished upload and the rename to register that the folder does exist?

A similar issue can be seen when Chunker cannot rename the file because Dropbox fails the rename with a too_many_write_operations error.

2023-04-07 16:52:07 ERROR : <PATH_TO_FILE_AND_FILENAME>.rclone_chunk.001_rdjyko: Couldn't move: move failed: too_many_write_operations/...

which then also causes the upload to fail hard and be repeated from the beginning. Maybe we need a little more graceful handling here, with a rename retry instead of an abort (see the sketch below).. :slight_smile:
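
A minimal sketch of what I mean by a rename retry - a hypothetical helper, not the actual chunker code, and a real fix would presumably hook into rclone's existing retry/pacer machinery instead:

    // Package chunkersketch illustrates retrying a failed server-side move a few
    // times with backoff before giving up, instead of failing the whole upload
    // and re-transferring the data.
    package chunkersketch

    import (
        "context"
        "fmt"
        "time"
    )

    // moveWithRetry is a hypothetical helper and not part of rclone.
    func moveWithRetry(ctx context.Context, move func(context.Context) error) error {
        delay := 500 * time.Millisecond
        var err error
        for attempt := 1; attempt <= 5; attempt++ {
            if err = move(ctx); err == nil {
                return nil
            }
            // wait before retrying, but give up early if the context is cancelled
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(delay):
            }
            delay *= 2 // simple exponential backoff
        }
        return fmt.Errorf("move still failing after retries: %w", err)
    }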

 

  • Is that the expected behavior for small files?
  • If yes, is it the sensible behavior?

I think the rename can cause a lot of problems if some unexpected error occurs or the transfer is interrupted externally, potentially leaving stray chunk files. Plus I think I read that on some backends both files will count against the quota until the temporary chunk file ages out of the trash recovery window.

 

Cannot replicate on Google Workspace with Chunker:

$ rclone copy 10Mtest.file GDc:/TestDir/ -vv
2023/04/07 19:00:47 DEBUG : rclone: Version "v1.62.2-DEV" starting with parameters ["rclone" "copy" "10Mtest.file" "GDc:/TestDir/" "-vv"]
2023/04/07 19:00:47 DEBUG : Creating backend with remote "10Mtest.file"
2023/04/07 19:00:47 DEBUG : Using config file from "<CONFIG>"
2023/04/07 19:00:47 DEBUG : fs cache: adding new entry for parent of "10Mtest.file", "/media/Videos/Temp"
2023/04/07 19:00:47 DEBUG : Creating backend with remote "GDc:/TestDir/"
2023/04/07 19:00:47 DEBUG : Creating backend with remote "GSuite:/TestDir"
2023/04/07 19:00:47 DEBUG : fs cache: renaming cache item "GSuite:/TestDir" to be canonical "GSuite:TestDir"
2023/04/07 19:00:47 DEBUG : Reset feature "ListR"
2023/04/07 19:00:47 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination
2023/04/07 19:00:48 DEBUG : 10Mtest.file: skip slow MD5 on source file, hashing in-transit
2023/04/07 19:00:50 DEBUG : 10Mtest.file.rclone_chunk.001_rdzsax: Sending chunk 0 length 8388608
2023/04/07 19:00:51 DEBUG : 10Mtest.file.rclone_chunk.001_rdzsax: Sending chunk 8388608 length 2097152
2023/04/07 19:00:54 INFO  : 10Mtest.file.rclone_chunk.001_rdzsax: Moved (server-side) to: 10Mtest.file
2023/04/07 19:00:54 DEBUG : 10Mtest.file: md5 = f1c9645dbc14efddc7d8a322685f26eb OK
2023/04/07 19:00:54 INFO  : 10Mtest.file: Copied (new)
2023/04/07 19:00:54 INFO  : 
Transferred:           20 MiB / 20 MiB, 100%, 1.666 MiB/s, ETA 0s
Checks:                 1 / 1, 100%
Renamed:                1
Transferred:            1 / 1, 100%
Elapsed time:         7.4s

2023/04/07 19:00:54 DEBUG : 6 go routines active

Cannot replicate on Local with Chunker:

$ rclone copy 10Mtest.file localc:/tmp/TestDir/ -vv
2023/04/07 19:03:17 DEBUG : rclone: Version "v1.62.2-DEV" starting with parameters ["rclone" "copy" "10Mtest.file" "localc:/tmp/TestDir/" "-vv"]
2023/04/07 19:03:17 DEBUG : Creating backend with remote "10Mtest.file"
2023/04/07 19:03:17 DEBUG : Using config file from "<CONFIG>"
2023/04/07 19:03:17 DEBUG : fs cache: adding new entry for parent of "10Mtest.file", "/media/Videos/Temp"
2023/04/07 19:03:17 DEBUG : Creating backend with remote "localc:/tmp/TestDir/"
2023/04/07 19:03:17 DEBUG : Creating backend with remote "local:/tmp/TestDir"
2023/04/07 19:03:17 DEBUG : fs cache: renaming cache item "local:/tmp/TestDir" to be canonical "/tmp/TestDir"
2023/04/07 19:03:17 DEBUG : Reset feature "ListR"
2023/04/07 19:03:17 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination
2023/04/07 19:03:17 DEBUG : 10Mtest.file: skip slow MD5 on source file, hashing in-transit
2023/04/07 19:03:17 INFO  : 10Mtest.file.rclone_chunk.001_re3xko: Moved (server-side) to: 10Mtest.file
2023/04/07 19:03:17 DEBUG : 10Mtest.file: md5 = f1c9645dbc14efddc7d8a322685f26eb OK
2023/04/07 19:03:17 INFO  : 10Mtest.file: Copied (new)
2023/04/07 19:03:17 INFO  : 
Transferred:           20 MiB / 20 MiB, 100%, 0 B/s, ETA -
Checks:                 1 / 1, 100%
Renamed:                1
Transferred:            1 / 1, 100%
Elapsed time:         0.0s

2023/04/07 19:03:17 DEBUG : 3 go routines active

What I find curious is that both times it says it transferred 20 MiB for a single 10 MiB file, probably counting the transfer and the subsequent rename individually? For the error in my first post it even counts 30 MiB of transferred data.

Adding to my own post: I checked the source code in chunker.go, and without deeper understanding of the inner workings, the comment around line 1177 suggests that it should not use the "temp file and rename" approach for files that fit in a single chunk:

                // If a single chunk is expected, avoid the extra rename operation
                chunkRemote := tempRemote
                if c.expectSingle && c.chunkNo == 0 && optimizeFirstChunk {
                        chunkRemote = baseRemote
                }
                info := f.wrapInfo(src, chunkRemote, size)

Which brings me to the question of why it still tries to do so. Maybe it has something to do with the fact that my logs always show an upload of chunk 2/1, and that prevents the "only one chunk" trigger from firing? Example here:

2023/04/07 18:45:40 DEBUG : 10Mtest.file: Uploading chunk 2/1

Not sure why and how that could happen though, and without diving into the code it's really just a very uneducated guess.

Edit:
Reading the code some more, the whole block above is hard-disabled by the declaration around line 115, with the following line and its preceding comment:

// optimizeFirstChunk enables the following optimization in the Put:
// If a single chunk is expected, put the first chunk using the
// base target name instead of a temporary name, thus avoiding
// extra rename operation.
// Warning: this optimization is not transaction safe.
const optimizeFirstChunk = false

I wonder why it isn't transaction safe though, since with a regular upload we're not using temp-name-and-rename either. Can anyone explain the dangers of turning this on?

Not that it'd fix the problem on Dropbox for files that are actually chunked, but I think it'd be a step forward for me (less than 1% of my files are larger than 300G).

Edit 2:
Regarding the transaction safety, I'm not sure how an optimizeFirstChunk upload would differ from an upload done with the existing config option transactions = norename. That one is marked as experimental, but its description says nothing about not being transaction safe...

Edit 3:
Setting optimizeFirstChunk = true in chunker.go unsurprisingly avoids the issue for files smaller than chunk_size. Here's the log:

$ ./rclone copy 10Mtest.file DBc:/TestDir/ -vv
2023/04/07 21:00:05 DEBUG : rclone: Version "v1.63.0-beta.6948.fcdffab48" starting with parameters ["./rclone" "copy" "10Mtest.file" "DBc:/TestDir/" "-vv"]
2023/04/07 21:00:05 DEBUG : Creating backend with remote "10Mtest.file"
2023/04/07 21:00:05 DEBUG : Using config file from "<CONFIG>"
2023/04/07 21:00:05 DEBUG : fs cache: adding new entry for parent of "10Mtest.file", "/media/Videos/Temp"
2023/04/07 21:00:05 DEBUG : Creating backend with remote "DBc:/TestDir/"
2023/04/07 21:00:05 DEBUG : Creating backend with remote "__DBraw:/TestDir"
2023/04/07 21:00:05 DEBUG : Dropbox root '': Using root namespace "<NAMESPACE>"
2023/04/07 21:00:06 DEBUG : fs cache: renaming cache item "__DBraw:/TestDir" to be canonical "__DBraw:TestDir"
2023/04/07 21:00:06 DEBUG : Reset feature "ListR"
2023/04/07 21:00:06 DEBUG : 10Mtest.file: Need to transfer - File not found at Destination
2023/04/07 21:00:10 DEBUG : 10Mtest.file: skip slow MD5 on source file, hashing in-transit
2023/04/07 21:00:11 DEBUG : 10Mtest.file: Uploading chunk 1/1
2023/04/07 21:00:14 DEBUG : 10Mtest.file: Uploading chunk 2/1
2023/04/07 21:00:15 DEBUG : Dropbox root 'TestDir': Adding "/TestDir/10Mtest.file" to batch
2023/04/07 21:00:15 DEBUG : Dropbox root 'TestDir': Batch idle for 500ms so committing
2023/04/07 21:00:15 DEBUG : Dropbox root 'TestDir': Committing sync batch length 1 starting with: /TestDir/10Mtest.file
2023/04/07 21:00:16 DEBUG : Dropbox root 'TestDir': Committed sync batch length 1 starting with: /TestDir/10Mtest.file
2023/04/07 21:00:16 INFO  : 10Mtest.file: Copied (new)
2023/04/07 21:00:16 INFO  : 
Transferred:           10 MiB / 10 MiB, 100%, 1.000 MiB/s, ETA 0s
Transferred:            1 / 1, 100%
Elapsed time:        11.6s

2023/04/07 21:00:16 DEBUG : 10 go routines active
2023/04/07 21:00:16 INFO  : Dropbox root 'TestDir': Committing uploads - please wait...

I would still like to know more about the implications for transaction safety. Also, it doesn't fix the issue when a file is actually being chunked.

If anyone wants to enable this UNSAFE AND POTENTIALLY DANGEROUS option, here is a patch:

diff --git a/backend/chunker/chunker.go b/backend/chunker/chunker.go
index bb4b49812..4bc9b448f 100644
--- a/backend/chunker/chunker.go
+++ b/backend/chunker/chunker.go
@@ -112,7 +112,7 @@ const metadataVersion = 2
 // base target name instead of a temporary name, thus avoiding
 // extra rename operation.
 // Warning: this optimization is not transaction safe.
-const optimizeFirstChunk = false
+// const optimizeFirstChunk = false
 
 // revealHidden is a stub until chunker lands the `reveal hidden` option.
 const revealHidden = false
@@ -262,6 +262,11 @@ This method is EXPERIMENTAL, don't use on production systems.`,
 					Help: `Rename or norename will be used depending on capabilities of the backend.
 If meta format is set to "none", rename transactions will always be used.
 This method is EXPERIMENTAL, don't use on production systems.`,
+				}, {
+					Value: "optimize_single",
+					Help: `optimize_single aka 'optimizeFirstChunk' enables the following optimization in the Put:
+If a single chunk is expected, put the first chunk using the base target name instead of a temporary name, thus avoiding extra rename operation.
+This method is EXPERIMENTAL and NOT TRANSACTION SAFE, don't use on production systems.`,
 				},
 			},
 		}},
@@ -376,6 +381,7 @@ type Fs struct {
 	features     *fs.Features   // optional features
 	dirSort      bool           // reserved for future, ignored
 	useNoRename  bool           // can be set with the transactions option
+	optimizeFirstChunk bool     // if single chunk only, do not use temporary file
 }
 
 // configure sets up chunker for given name format, meta format and hash type.
@@ -468,6 +474,9 @@ func (f *Fs) setTransactionMode(transactionMode string) error {
 			f.useNoRename = false
 			return errors.New("using norename transactions requires metadata")
 		}
+	case "optimize_single":
+		f.useNoRename = false
+		f.optimizeFirstChunk = true
 	default:
 		return fmt.Errorf("unsupported transaction mode '%s'", transactionMode)
 	}
@@ -1176,7 +1185,7 @@ func (f *Fs) put(
 
 		// If a single chunk is expected, avoid the extra rename operation
 		chunkRemote := tempRemote
-		if c.expectSingle && c.chunkNo == 0 && optimizeFirstChunk {
+		if c.expectSingle && c.chunkNo == 0 && f.optimizeFirstChunk {
 			chunkRemote = baseRemote
 		}
 		info := f.wrapInfo(src, chunkRemote, size)
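
With the patch applied, the remote is switched to the new (hypothetical, not upstream) transaction mode like this:

[DBc]
type = chunker
remote = __DBraw:
chunk_size = 300G
transactions = optimize_single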

While I found this temporary workaround for my initial sync to Dropbox, does anyone know the Dropbox implementation well enough to say which lever to pull in the code so that a move creates the missing directory?

Also, who would be the right person to talk to regarding the Chunker transaction safety comment? :slight_smile:

Do you want me to file the Dropbox issue in the bug tracker to proceed?

None of the rclone backends create directories on Move - they expect them to be already created.

Cloud storage providers generally only provide "eventual consistency" as a guarantee, which means a delay could be needed.

However the uploads in the dropbox backend are somewhat asynchronous - the completion of the upload shouldn't be signalled until rclone gets the ack back from the commit. But the rename will then follow after the minimum possible delay.

You could put a sleep in the code after this line to see if that helps - if so, we'll know it is eventual consistency that is the problem.
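
For anyone else testing this, the temporary instrumentation meant here is just a hard delay between the commit and the server-side move, something along the lines of the fragment below (debugging hack only, dropped in at the spot referenced above):

    time.Sleep(2 * time.Second) // debug only: give Dropbox time to register the committed upload before the Move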

Not all backends (eg local) have a single part upload that either works or fails. I expect that is the reason.

I suspect Ivan was thinking about multiple users of the same remote.

The rename will be retried here

However the error returned needs to be a retriable error so it would be worth debugging what is happening there.

I wrote the dropbox integration so I can certainly help with that :slight_smile:

Alas Ivan who wrote the chunker backend can't contribute to rclone at the moment because of the Ukraine war, so we are on our own right now!

In a perfect world you'd find the problem and send a PR :slight_smile: Failing that a bug report would be great!
