"invalid seek position" with cached files after files updated on s3

What is the problem you are having with rclone?

After updating files on s3, we sometimes get "invalid seek position" in rclone when the files are accessed locally in our pipeline. Stopping rclone and starting it back up eliminates the problem.

2022/12/16 00:41:46 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: restart download failed: failed to start downloader: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: too many errors 11/10: last error: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: download write: failed to kick waiters: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: failed to _ensure cache failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : IO error: failed to open downloader: vfs reader: failed to open source file: invalid seek position
2022/12/16 00:41:50 ERROR : SomeFolder/SomeFile.ext: vfs cache: failed to download: vfs reader: failed to write to cache file: failed to open downloader: vfs reader: failed to open source file: invalid seek position

Run the command 'rclone version' and share the full output of the command.

rclone v1.60.1

  • os/version: Microsoft Windows Server 2019 Datacenter 1809 (64 bit)
  • os/kernel: 10.0.17763.1339 (x86_64)
  • os/type: windows
  • os/arch: amd64
  • go/version: go1.19.3
  • go/linking: static
  • go/tags: cmount

Which cloud storage system are you using? (eg Google Drive)

s3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount my-bucket:my-bucket/ T: --vfs-cache-mode full --cache-dir D:\rclone-cache --vfs-fast-fingerprint --vfs-cache-max-age 720h --vfs-cache-max-size 50G

The rclone config contents with secrets removed.

[my-bucket]
type = s3
provider = AWS
env_auth = true
region = us-east-1
location_constraint = us-east-2
acl = private
server_side_encryption = AES256
storage_class = STANDARD

A log from the command with the -vv flag

n/a but I will try to reproduce this in a dev environment and get that log

This is definitely a bug.

I'm pretty sure it's the same bug I've seen before with beta.rclone.org (which is served via rclone with --vfs-cache-mode full from object storage), where the source updates but the VFS cache gets confused internally and serves inconsistent data.

Is that how the bug gets triggered - stuff updated on the source?

If you could help me out and find a way of reproducing this reliably, that would be very useful!

If, when you have a corruption, you could take a look at the vfs cache and compare the JSON file in vfsMeta and the cached file in vfs with what the file is before and after the change, that would be useful too. If I remember correctly, the problem is that the file has the length of the old file but the contents of the new file.
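For anyone following along, here is a minimal sketch of how one might inspect the cache state being asked about. The paths and remote names are hypothetical examples based on the mount commands in this thread; adjust them to your own setup.

```shell
# Hypothetical cache dir and remote; substitute your own values.
CACHE=/tmp/rclone-cache
REMOTE=my-bucket:my-bucket

# The VFS cache keeps two trees under the cache dir:
#   vfs/     - the cached file data
#   vfsMeta/ - one JSON file per cached file (ModTime, Size, Rs ranges, Fingerprint)
cat ${CACHE}/vfsMeta/my-bucket/my-bucket/SomeFolder/SomeFile.ext

# Compare against what the remote currently reports (size, modtime, hashes):
rclone lsjson --hash ${REMOTE}/SomeFolder/SomeFile.ext

# And check the size of the cached data file itself:
ls -l ${CACHE}/vfs/my-bucket/my-bucket/SomeFolder/SomeFile.ext
```

Comparing the `Size` and `Fingerprint` fields in the vfsMeta JSON against the lsjson output before and after the change should show whether the cache is holding stale metadata.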

Thanks for the reply.

Yes, I have seen this a few times when I have updated data at the source (s3). I will try and spend some time to see if I can find steps to reproduce it consistently.

BTW I read as much as I could about the caching options. In this use case the clients using rclone only need to read files, not write them, but I do want data cached locally to speed things up for multiple reads -- should I still use cache mode "full"?

hi,
yes, full will cache all data locally.

I do know that full will cache locally, but since I do not need to do any writes or cache writes, is there another cache mode that might be more efficient for my use case?

@ncw I have spent some more time on this this week and discovered another problem, or perhaps a related or root-cause issue.

In the process I created a script that creates an s3 bucket, mounts it in rclone, uploads an image to s3, views the image through the mapped rclone drive, uploads a different image to the same object in s3 (essentially replacing it), and then retrieves that through the mapped rclone drive.

Initially, I was simply seeing that rclone retrieves the cached image instead of the new image, even after it was updated, and even without fast fingerprinting on.

I set up my script to create the bucket at the beginning and, at the end, stop rclone, delete the rclone cache, and delete the bucket, so I could repeat the script multiple times. However, in doing this I noticed that the mapped drive would often show as empty even after uploading the image to s3 and verifying that it is in fact there on s3. At one point I left for lunch, and when I came back the mapped drive was magically showing the contents again. I am not sure what could lead to this; I even experienced it immediately after rebooting Windows.

Here is some more information on the machine where I am reproducing the issue:
winfsp-1.12.22301
rclone v1.61.1

  • os/version: Microsoft Windows 11 Home 22H2 (64 bit)
  • os/kernel: 10.0.22623.1095 (x86_64)
  • os/type: windows
  • os/arch: amd64
  • go/version: go1.19.4
  • go/linking: static
  • go/tags: cmount

rclone.conf (note, I have aws credentials cached so I can run aws cli from the command line):

[rclone-test-123456]
type = s3
provider = AWS
env_auth = true
region = us-east-1
location_constraint = us-east-2
acl = private
server_side_encryption = AES256
storage_class = STANDARD

When rclone does see that I have uploaded an object "pic1.jpg", the cache json looks like this:
(myenv) PS D:\rclone-test> aws s3 cp smaller.jpg s3://rclone-test-123456/pic1.jpg
upload: .\smaller.jpg to s3://rclone-test-123456/pic1.jpg
(myenv) PS D:\rclone-test> explorer R:\pic1.jpg
(myenv) PS D:\rclone-test> cat D:\rclone-cache\vfsMeta\rclone-test-123456\rclone-test-123456\pic1.jpg

{
        "ModTime": "2023-01-12T14:26:39.2612787-08:00",
        "ATime": "2023-01-12T14:26:39.6148363-08:00",
        "Size": 48873,
        "Rs": [
                {
                        "Pos": 0,
                        "Size": 48873
                }
        ],
        "Fingerprint": "48873,2023-01-12 21:18:49 +0000 UTC,abde42c6fb17f932f97734b120d84578",
        "Dirty": false
}

The cached json again after I upload a larger image to the same destination object pic1.jpg and after accessing from the rclone mapped drive:
(myenv) PS D:\rclone-test> aws s3 cp bigger.jpg s3://rclone-test-123456/pic1.jpg
upload: .\bigger.jpg to s3://rclone-test-123456/pic1.jpg
(myenv) PS D:\rclone-test> explorer R:\pic1.jpg
(myenv) PS D:\rclone-test> cat D:\rclone-cache\vfsMeta\rclone-test-123456\rclone-test-123456\pic1.jpg

{
        "ModTime": "2023-01-12T14:26:39.2612787-08:00",
        "ATime": "2023-01-12T14:27:45.3085938-08:00",
        "Size": 48873,
        "Rs": [
                {
                        "Pos": 0,
                        "Size": 48873
                }
        ],
        "Fingerprint": "48873,2023-01-12 21:18:49 +0000 UTC,abde42c6fb17f932f97734b120d84578",
        "Dirty": false
}

You can see that only access time changes.

Without touching rclone or s3, I waited "a while" (I had to make some phone calls), came back, and accessed the file again; this time I got the correct (newer) version (explorer opens the larger image, so I can also see visually that it updated):
(myenv) PS D:\rclone-test> explorer R:\pic1.jpg
(myenv) PS D:\rclone-test> cat D:\rclone-cache\vfsMeta\rclone-test-123456\rclone-test-123456\pic1.jpg

{
        "ModTime": "2023-01-12T15:06:03.7225319-08:00",
        "ATime": "2023-01-12T15:06:04.3259038-08:00",
        "Size": 394249,
        "Rs": [
                {
                        "Pos": 0,
                        "Size": 394249
                }
        ],
        "Fingerprint": "394249,2023-01-12 22:27:39 +0000 UTC,3cc6e1936b9f7b38cffab06aed0b51e3",
        "Dirty": false
}

Here is a powershell script you can mess around with. Just have two local image files that are different, named bigger.jpg and smaller.jpg, and rename the bucket as you see fit across the board (rclone.conf and powershell script). D: is a local SSD where I put the rclone binary.

#create the bucket - same as the one in rclone.config
aws s3 mb s3://rclone-test-123456

# start rclone and give it ten seconds to get going
#start cmd "/k D:\rclone\rclone mount rclone-test-123456:rclone-test-123456/ R: --vfs-cache-mode full --cache-dir D:\rclone-cache --rc --vfs-fast-fingerprint --vfs-cache-max-age 720h --vfs-cache-max-size 50G"
start cmd "/k D:\rclone\rclone mount rclone-test-123456:rclone-test-123456/ R: --rc --vfs-cache-mode full --cache-dir D:\rclone-cache --vfs-cache-max-age 720h --vfs-cache-max-size 50G"
timeout /t 10

# upload the bigger of the two images to s3
aws s3 cp bigger.jpg s3://rclone-test-123456/pic1.jpg
timeout /t 2

#access the image through the rclone mapped drive and show the image in windows:
explorer R:\pic1.jpg
timeout /t 2
#show the json in the cache:
cat D:\rclone-cache\vfsMeta\rclone-test-123456\rclone-test-123456\pic1.jpg

#replace it with a smaller image; access & show; look at cache
aws s3 cp smaller.jpg s3://rclone-test-123456/pic1.jpg
timeout /t 2
explorer R:\pic1.jpg
timeout /t 2
cat D:\rclone-cache\vfsMeta\rclone-test-123456\rclone-test-123456\pic1.jpg

#cleanup
D:\rclone\rclone.exe rc core/quit
### uncomment the next line with care!! it will remove the entire bucket
### aws s3 rb s3://rclone-test-123456 --force
rmdir -recurse -force D:\rclone-cache

Later in my testing, after replacing a larger image with a smaller one, refreshing explorer on the R: drive led to this message in the output window:

2023/01/12 15:58:13 ERROR : pic1.jpg: vfs cache: failed to download: vfs reader: failed to write to cache file: InvalidRange: The requested range is not satisfiable
        status code: 416, request id: <removed for privacy>, host id: <removed for privacy>

(I removed the request id and host id)

At that point the cache json file looks like this:

(myenv) PS D:\rclone-test> cat D:\rclone-cache\vfsMeta\rclone-test-123456\rclone-test-123456\pic1.jpg
{
        "ModTime": "2023-01-12T15:57:55.5439762-08:00",
        "ATime": "2023-01-12T15:58:13.2537901-08:00",
        "Size": 394249,
        "Rs": [
                {
                        "Pos": 0,
                        "Size": 65536
                }
        ],
        "Fingerprint": "394249,2023-01-12 23:51:47 +0000 UTC,3cc6e1936b9f7b38cffab06aed0b51e3",
        "Dirty": false
}

The actual size of the larger image is 394249 bytes and the smaller file is 48873 bytes, so 65536 is corruption of some kind.

If I wait about 4-5 minutes after replacing the file, I can refresh without error and see the correct file.

It seems that when you access the file and then upload a new version, you will not "see" the new version for about 5 minutes after you last accessed the old version. If you replace the file with a SMALLER file during that time period, attempting to access it results in the error above, corrupts the json, and leads to I/O errors as well, such as:

An unexpected error is keeping you from copying the file. If you continue to receive this error, you can use the error code to search for help with this problem.

Error 0x8007045D: The request could not be performed because of an I/O device error.

  pid.jpg
  Item type: JPG File
  Dimensions: 1278x894
  Size: 385 KB

      [Try Again]   [Skip]   [Cancel]

v More details
2023/01/12 16:13:10 ERROR : IO error: vfs reader: failed to write to cache file: InvalidRange: The requested range is not satisfiable
        status code: 416, request id: ---, host id: ---
2023/01/12 16:13:10 ERROR : pic1.jpg: vfs cache: failed to download: vfs reader: failed to write to cache file: InvalidRange: The requested range is not satisfiable
        status code: 416, request id: ---, host id: ---

PS - sorry for the two prior edited/deleted posts -- I was posting a follow-up and accidentally edited out the first of the two posts and had to re-create all this info and did it in this reply post...

I scoured through the documentation some more and realized the 5 minutes correlated to the poll-interval (or was it dir-cache-time?) and found that adding these mitigates the issue in my circumstances:
--dir-cache-time 2s --poll-interval 1s
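For reference, this is a sketch of what the mount command from earlier in the thread looks like with those two flags added (bucket name, drive letter, and cache dir are from my setup; adjust as needed):

```shell
# Same mount as before, plus aggressive directory cache expiry.
# Note: --poll-interval has no effect on s3 (no change notifications),
# so --dir-cache-time is doing the real work here.
rclone mount my-bucket:my-bucket/ T: \
  --vfs-cache-mode full \
  --cache-dir D:\rclone-cache \
  --vfs-fast-fingerprint \
  --vfs-cache-max-age 720h \
  --vfs-cache-max-size 50G \
  --dir-cache-time 2s \
  --poll-interval 1s
```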

My use case is that we need some legacy code on multiple environments to access data on s3 as if it were on a Windows file system. Files are added and updated outside of rclone (e.g. directly to the bucket).

The errors reported here sound similar to what is described in Attribute Caching except that we do have attr-timeout at the default (1s) and this can happen for up to five minutes (or probably up to --dir-cache-time).

Directory caching is the usual cause of this sort of problem.

S3 doesn't have a way of notifying rclone of changes (--poll-interval does not work on s3), so we rely on the directory cache expiring in order to see changes.

So what will fix your problem is --dir-cache-time 2s. This means that rclone will only keep directories in memory for 2s. Now, OSes absolutely love to read directories whenever they can, so you might find this causes a lot more directory reads. So I'd suggest you push that 2s to as high a number as you can put up with.

Understood. So does rclone cache directory data somewhere other than the --cache-dir location? I would have expected removing the cache dir and restarting rclone would force the cache to refresh - it didn't seem to in my testing.

Are the errors that this led to considered a bug that should be fixed?

2023/01/12 16:13:10 ERROR : IO error: vfs reader: failed to write to cache file: InvalidRange: The requested range is not satisfiable
        status code: 416, request id: ---, host id: ---
2023/01/12 16:13:10 ERROR : pic1.jpg: vfs cache: failed to download: vfs reader: failed to write to cache file: InvalidRange: The requested range is not satisfiable
        status code: 416, request id: ---, host id: ---

For our use case, I am thinking rc mode might be the way to go with a longer --dir-cache-time but then before certain operations we could send the remote command to invalidate particular directories as needed.
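The remote-control approach I have in mind could be sketched like this, using the rc commands against a mount started with --rc as in my scripts (the directory name is a hypothetical example):

```shell
# Drop the directory cache entry for a specific directory, forcing
# rclone to re-read it from s3 on the next access:
rclone rc vfs/forget dir=SomeFolder

# Alternatively, refresh the directory cache in place
# (recursive=true also refreshes subdirectories):
rclone rc vfs/refresh dir=SomeFolder recursive=true
```

With this, --dir-cache-time could stay high for normal reads, and the pipeline would invalidate only the directories it knows it just changed.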

Is it correct that even if the directory has to refresh, the unchanged files will remain cached?

Rclone only caches directory info in memory at the moment. However info about individually cached files is stored under --cache-dir.

I would have expected stopping rclone, removing the cache dir, then restarting rclone to fix the problem. If you removed the cache dir when rclone is running then it might re-write it on shutdown.

Yes, I think that is a bug.

The fingerprinting should be enough to convince rclone that the cached copy of the file has changed.

Ultimately this function should be called just before rclone opens a file to see if the object is still relevant. I guess that the Object didn't get updated as long as the directory cache was still alive.

That is correct, however rclone should notice if the objects are out of date using the fingerprint.

Thanks Nick for taking the time to reply and explain.

Yes, I think that is a bug.
Have you filed (or will you file) that somewhere, or should I go somewhere to report this issue about rclone seemingly not re-checking the file when it is accessed?

Separate potential issue:

I would have expected stopping rclone, removing the cache dir, then restarting rclone to fix the problem. If you removed the cache dir when rclone is running then it might re-write it on shutdown.
I believe I observed unexpected behavior. I started rclone with an empty s3 bucket, verified the empty directory on the virtual Windows drive, and then uploaded a file to s3. The directory remained empty in Windows (as expected, inside the default 5 minute --dir-cache-time). I stopped rclone, removed the cache dir, and restarted rclone; it was still empty in Windows, which was unexpected. I did these steps again, this time including rebooting Windows, and within the 5 minutes it still appeared empty. Could it be that rclone persists the time and date of the last directory check somewhere other than the cache dir, so that when it is restarted it won't check again until --dir-cache-time has expired? I couldn't find any evidence of that being persisted anywhere else.

Before we make an issue, I need a way of reproducing this myself.

If I can reproduce it locally, I'll fix the bug! Fixing bugs I can't reproduce is much harder :slight_smile:

I believe I included enough information earlier in the thread to reproduce the issue. Are you saying you need more information, or just that you haven't had a chance to do it yet?

@luther_af - you are correct you provided enough information - I was just in too much of a hurry to take it in :slight_smile:

I've had time to go through it carefully now.

This is correct operation of the directory cache - the timeout is 5m and can be changed.

However this is definitely a bug and the one I am very interested in fixing.

I think both that error and the incorrect size of the object in the cache are caused by rclone's copy of the s3 object metadata being out of sync with the actual metadata.

Rclone tried to request the larger sized object, but only the smaller one was available - hence the The requested range is not satisfiable.

The JSON cache file shows that only one 64k block of the file has been downloaded.

So it's read 64k of the larger file and cached that.

Now, let me see if I can convert your powershell to bash to run it here.

I'm trying to reproduce the illegal seek, so I swapped the order of your files. This is what I came up with:

#!/bin/bash
set -ve

# Config
CONFIG_NAME=s3
BUCKET_NAME=rclone-test-123456-bad-cache
REMOTE=${CONFIG_NAME}:${BUCKET_NAME}
RCLONE_CACHE=/tmp/rclone-cache
MNT=/mnt/tmp
RCLONE=rclone-v1.61.1

# Create the test files
${RCLONE} test makefile --seed 0 50k smaller.bin
${RCLONE} test makefile --seed 1 400k larger.bin

# create the bucket - same as the one in rclone.config
${RCLONE} mkdir ${REMOTE}

# upload the larger of the two files to s3
${RCLONE} copyto -v larger.bin ${REMOTE}/file.bin
sleep 2

# start rclone and give it time to get going
${RCLONE} mount ${REMOTE}/ ${MNT} --rc --vfs-cache-mode full --cache-dir ${RCLONE_CACHE} --vfs-cache-max-age 720h --vfs-cache-max-size 50G --dir-cache-time 10s &
sleep 10

# access the file through rclone mapped drive
ls -l ${MNT}/file.bin
${RCLONE} md5sum ${MNT}/file.bin
sleep 2
# show the json in the cache:
cat ${RCLONE_CACHE}/vfsMeta/${CONFIG_NAME}/${BUCKET_NAME}/file.bin

# Clear the directory cache and try again
# ${RCLONE} rc vfs/forget
# sleep 2

# Wait for the directory cache to expire
#sleep 12

# replace it with a smaller file; access & show; look at cache
${RCLONE} copyto -v smaller.bin ${REMOTE}/file.bin
sleep 2
ls -l ${MNT}/file.bin
${RCLONE} md5sum ${MNT}/file.bin
sleep 2
cat ${RCLONE_CACHE}/vfsMeta/${CONFIG_NAME}/${BUCKET_NAME}/file.bin

# cleanup
${RCLONE} rc core/quit
### uncomment the next line with care!! it will remove the entire bucket
### ${RCLONE} purge ${REMOTE}
rm -rf ${RCLONE_CACHE}

If you uncomment the sleep 12 then it reproduces the file with the empty Fingerprint - however I can't get it to reproduce the illegal seek.

Any thoughts? Any chance you could try my script under Linux (or WSL)?