Google Drive Mount With 300K+ files on Google Colab

What is the problem you are having with rclone?

I am trying to find the fastest way to read/copy/cache files from Google Drive to a Google Colab Notebook.

What is your rclone version (output from rclone version)

Rclone version 1.53.3

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Linux Ubuntu 18.04, 64 bit

Which cloud storage system are you using? (eg Google Drive)

I am using Google Drive

I was running this command:
rclone mount drive: 'gdrive/My Drive' --daemon --dir-cache-time 12h --vfs-cache-max-age 12h --fast-list --no-modtime --no-checksum --vfs-cache-mode full
I am using it to train a neural network. The training dataset is split into 8 categories, each with ~5500 folders, and each of these folders contains 8 files of between 10 KB and 100 KB.
Is there any way to make the reading faster? Currently (even with the cache off), it hangs for about 30-60 s every 10 batches or so, then goes on to read the required data in about 0.07 s.

Now, the machine is only alive for 12 hours, which is why I set the cache times to 12 hours, so subsequent runs would avoid that hang time. I do not know whether they already do. So what do you think would be faster? Copying all the files onto the machine at once? Syncing them to the machine? (I won't write anything to them.)
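For reference, a minimal sketch of the one-off copy I have in mind (drive:dataset and /content/dataset are placeholder paths, and the flag values are just starting points):

```
# One-off copy of the training data to the Colab VM's local disk.
# drive:dataset and /content/dataset are placeholder paths.
# --transfers/--checkers are raised because the dataset is many small files.
rclone copy drive:dataset /content/dataset \
  --transfers 16 --checkers 16 \
  --fast-list --progress
```

Since nothing is written back during training, copy and sync should end up doing the same work for a one-off transfer like this.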

PS: I started using rclone instead of the google.drive package from Python because Google Drive was timing out after some time.

--fast-list does nothing on a mount and can be removed.

I use a post-exec command to read in the cache. You'd want to see how long that takes before using the mount; you can run the same command without _async and time it.

ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572 _async=true

So in my mount, I use the remote control daemon as well.
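On Colab there is no systemd, so a rough equivalent is to start the mount with the remote control server enabled and then call vfs/refresh by hand. A sketch, where the rc address and the choice to skip rc auth are only illustrative:

```
# Start the mount with the remote control (rc) server enabled so the
# directory cache can be primed from outside.
rclone mount drive: 'gdrive/My Drive' --daemon \
  --dir-cache-time 12h --vfs-cache-max-age 12h --vfs-cache-mode full \
  --no-modtime --no-checksum \
  --rc --rc-addr 127.0.0.1:5572 --rc-no-auth

# Prime the whole directory tree. Leaving out _async=true makes the call
# block, so wrapping it in `time` shows how long a full refresh takes.
time rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572
```

Timing the synchronous refresh tells you whether subsequent training runs will actually hit a warm directory cache.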
