Performance issue on searching files

What is the problem you are having with rclone?

When filtering files, it looks like rclone fetches the full list of files from the remote and then applies the filters to that list. Why not call the remote's search API and search the files directly? For example, the Microsoft OneDrive search API can find files in less than 1 second. With the current rclone implementation, it took more than 10 seconds to find 2 files among more than 2000. The time will increase linearly as the number of files grows.
So, is there any reason for not using the remote search directly? Also, are there other ways to search files easily besides the filters?
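
To illustrate the behaviour I am describing, here is a rough Go sketch of client-side filtering. It is not rclone's actual filter code: the function names are made up, and matching only the base name with path.Match is just an approximation of an --include rule combined with --ignore-case. The point is that every listed entry has to be checked, however few of them match.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchIgnoreCase reports whether the base name of remotePath matches the
// glob pattern, ignoring case. This only approximates an rclone --include
// rule used together with --ignore-case.
func matchIgnoreCase(pattern, remotePath string) bool {
	name := strings.ToLower(path.Base(remotePath))
	ok, err := path.Match(strings.ToLower(pattern), name)
	return err == nil && ok
}

// filterListing checks every entry once, so the cost grows linearly with
// the size of the listing, no matter how few entries match.
func filterListing(pattern string, listing []string) []string {
	var matches []string
	for _, p := range listing {
		if matchIgnoreCase(pattern, p) {
			matches = append(matches, p)
		}
	}
	return matches
}

func main() {
	// Hypothetical listing standing in for what the remote returns.
	listing := []string{
		"Documents/KMS-setup.pdf",
		"Documents/notes.txt",
		"Archive/2023/kms_invoice.PDF",
		"Pictures/kms.png",
	}
	for _, p := range filterListing("*kms*.pdf", listing) {
		fmt.Println(p)
	}
}
```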

Run the command 'rclone version' and share the full output of the command.

rclone v1.68.2

Which cloud storage system are you using? (eg Google Drive)

Microsoft OneDrive

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone lsjson "onedrive:" --files-only --recursive --include "*kms*.pdf" --ignore-case

Please run 'rclone config redacted' and share the full output. If you get command not found, please make sure to update rclone.

[onedrive]
type = onedrive
client_id = XXX
client_secret = XXX
drive_type = personal
access_scopes = Files.Read Files.Read.All Sites.Read.All offline_access
token = XXX
drive_id = XXX

A log from the command that you were trying to run with the -vv flag

2024/12/30 15:36:59 DEBUG : rclone: Version "v1.68.2" starting with parameters ["./rclone" "lsjson" "onedrive:" "--files-only" "--recursive" "--include" "*kms*.pdf" "--ignore-case" "-vv"]
2024/12/30 15:36:59 DEBUG : Creating backend with remote "onedrive:"
2024/12/30 15:36:59 DEBUG : Using config file from "/Users/rhao/.config/rclone/rclone.conf"
[
2024/12/30 15:37:00 DEBUG : file_name Excluded (Path Filter)
2024/12/30 15:37:00 DEBUG : file_name: Excluded
and repeated thousands of times.
]
2024/12/30 15:37:09 DEBUG : 7 go routines active

Always using the OneDrive search API would require it to support all rclone filters, and it does not.

It could be implemented as separate, extra functionality, similar to the GDrive query. Until somebody implements it and submits a PR, though, it remains theoretical.

In the meantime, try --fast-list. Adding --onedrive-delta can also speed up all operations by a significant factor.

Another option is to use rclone mount. Prefetch the full directory listing with --vfs-refresh and set --dir-cache-time 9999h (the OneDrive backend polls the remote for changes, so all changes on the remote end will be picked up anyway). Then you can use whatever local search method you like.

Thanks kapitainsky. The --fast-list might not be the best solution for my use case. I'm working on a website that different users will use and most users will likely choose a subdirectory instead of the root folder. I will dig into the GDrive query and check whether I can implement a similar query for OneDrive, hoping it will not be too complex for me because I'm new to the Go language :grinning:
BTW, is the GDrive query feature implemented here rclone/backend/drive/drive.go at master · rclone/rclone · GitHub?

Have a look at the original GDrive query PR:
