Testing for new vfs cache mode features

Here is the latest beta

https://beta.rclone.org/branch/v1.52.1-067-g0f0e2476-vfs-beta/ (uploaded in 15-30 mins)

I've been filling out the unit tests, and in the process found a few small corner-case bugs:

- Fix Truncate on unopened file
- Fix bug when downloading segments to fill out file on close
- Fix bug when WriteAt extends the file and we don't mark space as used

I've got one more set of unit tests to write then I'll be thinking about merging this branch into the mainline beta.

Windows Build failed?

The new tests didn't work on Windows; here's a new set building now!

https://beta.rclone.org/branch/v1.52.1-068-ge02475b2-vfs-beta/ (uploaded in 15-30 mins)

This version works well on Windows, no issues.

I think I will switch from 1.51 to this beta and remove the cache backend

Great!

:slight_smile:

An upcoming change may invalidate all the files in the cache - as long as you are OK with that I'd be interested to know how it turns out.

I've had this running on all my mounts for a few days now, using mode=full, with no noticeable differences except faster display of cached items. It's working well for me at least.

Good news! I think this is nearly ready for merging :crossed_fingers:

@ncw: re the eviction policy, it might make sense to reuse the chunker that already exists and treat the cache as a chunked backend that can be written to, with individual chunks evicted as needed.

About --vfs-cache-mode full: if I watch an entire video file and the free space is less than the total size of the video (say I have 20 GB of free space and watch a 40 GB video), what will it do, since there's only one sparse file in the cache dir that keeps growing? Does --vfs-read-chunk-size-limit help limit the total size of a sparse file?

I've been trying the latest beta out today in writes mode, as the drawbacks of full mode's caching (mainly not being able to keep chunks of a file, only the whole file or nothing) don't seem worth it to me at the moment.

Read operations seem to all be working great, faster than with a cache remote in the mix like I had before. I have run into an issue writing/uploading files though. I dropped in 3 files in quick succession, and the first two uploaded correctly, but the third threw an error, and then the mount seemed to hang (or at least any process trying to access that file/directory was non-responsive). I aborted the FUSE connection, restarted the mount, and the file uploaded and everything seems happy now.

Unfortunately I didn't have debug logging on at the time, but here's what I got at the INFO level. This is a Google Drive remote with a crypt layer on top, which is exclusively accessed via this mount (hence the long timeouts/cache/checks) with the following mount options:

  --allow-other \
  --umask 002 \
  --transfers 6 \
  --dir-cache-time 48h \
  --poll-interval 24h \
  --vfs-cache-mode writes \
  --vfs-cache-max-age 24h \
  --vfs-cache-max-size 10G \
  --attr-timeout 0 \
  --rc \
  --use-mmap \
  --log-level INFO \
  --gid 1001 \
  --uid 1000 \
  --fast-list \
  --drive-skip-gdocs \
  --drive-use-trash=false \
  --drive-disable-http2=false

Of course, there are some other things in here possibly affecting things. I've got attr-timeout down to 0 after the fix in https://github.com/rclone/rclone/issues/4104, and I've also re-enabled http2 for Google Drive. Neither of these options seemed to cause any issues with the cache remote for the last few weeks.

Okay, finally, the one log message I got about this:

2020/06/14 17:57:58 ERROR : REDACTED.mp4: Failed to copy: Post "https://www.googleapis.com/upload/drive/v3/files?alt=json&fields=id%2Cname%2Csize%2Cmd5Checksum%2Ctrashed%2CmodifiedTime%2CcreatedTime%2CmimeType%2Cparents%2CwebViewLink%2CshortcutDetails&supportsAllDrives=true&uploadType=resumable&upload_id=AAANsUn0YL2KoZR5_fU1GKqvfTBF7pD_Ck2B_YUAqp3kj78cO-dFOHpPAX05UOgTaApuHzqqbwCTjfKRZJuzuPXSrWA": context canceled

Any thoughts?

While the file is in use it isn't eligible for deletion, even if your cache is over quota. Once it is closed it becomes eligible, so in this case the file would be deleted at the next cache clearing run.

Potentially rclone could delete an in-use file, which would immediately be recreated empty, so if the cache went over quota it would drop the whole file. That might be something to consider doing later.

I'm glad it uploaded on restart :smiley: That is a new feature!

An upload aborted because the context was cancelled is either a bug or fallout from a previous problem. Did you maybe modify the file, wait more than 5 seconds so it started uploading, then open and modify it again? That would cause the upload to be aborted. It should have been retried, though.

If you can cause the problem again with debugging that would be great. If the mount locks up then kill -QUIT it to get a full trace - it will be apparent why it locked up from that.

The file was post-processed, moved to the mount, and then Plex would have started scanning it immediately. I don't believe there would have been another write operation in there, but I can't be sure.

I actually did just cause this again by processing a single file, so there does seem to be something strange happening here with this config. I'm going to try this with a test mount with debug logging on, but won't be able to do it right now. I'll report back with full debug logs in a few hours hopefully.

I thought it downloaded chunks as needed into the sparse file.

Success! I replicated this by immediately attempting to read the file I had just copied into the mount (in this case by reading it with ffmpeg, similar to what Plex would do). That seems to trigger a change in the file modification time, which cancels the upload; rclone then seems to try to reopen the file read-only, and that's where things hang.

Log file is here https://gist.githubusercontent.com/dotsam/33a8be8947643704daf2223e61ca7fb7/raw/1bff09a2a5b089a66d63961927478263c1b789d0/rclone-vfs-debug.log, with the file copy/upload in question starting at 2020/06/14 21:14:39 in the log. The log begins with the upload of a previous invocation of this bug, and also has a much smaller file I tested with, but was unable to trigger as the upload happened too quickly. It seems like I only hit this bug when the file is modified while uploading? I did a few reads of the file before the upload began, and those worked fine.

Yes, that's my understanding as well, but when a file is evicted from the cache, eviction is based on the oldest file, not the oldest chunk, and it removes the entire file, not just some chunks. So if a file is opened for read more than once, the eviction time will be based on the initial open of the file, not on whether those earlier chunks are still being accessed. At least that's my understanding from @ncw's comments in this thread.

The issue (as I understand it) is that it's not storing individual chunks, but whole files. It has no per-chunk metadata (and it would be overly expensive to maintain such metadata), so the only evictable object is the file, and the only real metadata it has to work with is the filesystem's last access time.

It would have to store the files as individual chunks to enable per-chunk eviction (personally I think that's the way it should go, but I don't have time to even look at how to do that at the moment, so beggars can't be choosers).

Thanks for the log :slight_smile: I managed to work out what was happening from the traceback.

It looks like this will only be triggered when a file is modified when it is being uploaded.

I think I've fixed it here (but I haven't managed to reproduce locally) - can you try?

https://beta.rclone.org/branch/v1.52.1-082-g017d7a53-vfs-beta/ (uploaded in 15-30 mins)

The files are evicted based on last access time not creation time. So the first file to be evicted will be the one which hasn't been accessed for the longest time.

The whole file (all the chunks in the sparse file) will be evicted at that point.

We do have a record of exactly what the chunks are in the file so it would be possible to evict chunks of the file based on their access time. However that is a lot more complex and I don't want to go there unless absolutely necessary!
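That chunk bookkeeping might look something like this Go sketch (illustrative names only, not rclone's actual implementation): each downloaded span of the sparse file is recorded as a byte range, and a read can be served locally only if it falls entirely within a recorded range.

```go
package main

import "fmt"

// byteRange records one downloaded span in the sparse cache file.
// Hypothetical type: rclone keeps similar per-file range bookkeeping,
// but with merging of adjacent ranges, which is omitted here for brevity.
type byteRange struct{ pos, size int64 }

// present reports whether [off, off+n) is fully covered by a downloaded
// range, i.e. whether a read can be served from the cache without
// fetching anything from the remote.
func present(rs []byteRange, off, n int64) bool {
	for _, r := range rs {
		if off >= r.pos && off+n <= r.pos+r.size {
			return true
		}
	}
	return false
}

func main() {
	// Two chunks downloaded into the sparse file: bytes 0-1M and 4M-5M.
	rs := []byteRange{{0, 1 << 20}, {4 << 20, 1 << 20}}
	fmt.Println(present(rs, 0, 4096))     // true: start of file is cached
	fmt.Println(present(rs, 2<<20, 4096)) // false: hole in the sparse file
}
```

Per-chunk eviction would then mean attaching an access time to each byteRange and punching holes in the sparse file, which is the extra complexity referred to above.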

Gave this a try again, and I'm still seeing the issue. I have however simplified the test case and found a few more details on when exactly this seems to error.

The command I've run is as follows:

cp testfile.mp4 /data/rclone-beta/testdir/ && sleep 8 && file /data/rclone-beta/testdir/testfile.mp4 && sleep 1 && file /data/rclone-beta/testdir/testfile.mp4

Sleeping for 8 seconds puts me at the point where the file is uploading; the first invocation of file then interrupts the upload, but it is still able to read the file. Waiting a second and attempting to read the file again is where it deadlocks.

Log: https://gist.github.com/dotsam/3f43c64e8486ec8be0f67035ad87d90d/raw/36fa7ccad4c73507420e723074ce8802cee897e4/vfsdebug.log

Thanks for your repro - very useful!

I've managed to fix this twice!

  • fix the deadlock (again!)
  • stop rclone cancelling the uploads if you don't modify the file

This patch also

  • cancels uploads earlier if we do modify the file (before we close it)
  • fixes a double-upload bug that appeared once I'd fixed the deadlock!
  • allows ReadAt and WriteAt to run concurrently with themselves, which should speed things up for some workloads

Testing appreciated :slight_smile:

https://beta.rclone.org/branch/v1.52.1-097-g8578d214-vfs-beta/ (uploaded in 15-30 mins)
