Advice on writing a new backend

There are a couple of issues with a new backend I am working on that I would like to ask advice about.

Related issue: add support for pikpak · Issue #6429 · rclone/rclone · GitHub

  1. How to disable multipart copy

    The download URL from the API server doesn't allow multiple concurrent range requests on a single file, so the default MultiThreadStreams=4 causes an error when opening an object. As a temporary workaround, ci.MultiThreadStreams is set to 1 in NewFs() (see the sketch after this list). Is there a better way to disable multipart copy for this backend only?

  2. Incomplete md5sum

    The file info provided by the API server has a field for md5sum, but it is empty in many cases, and I don't know when or why that happens. Under these circumstances, is returning hash.Set(hash.MD5) still desirable? If so, some of the fs tests always fail, as follows:

    --- FAIL: TestSyncWithTrackRenames (6.07s)
    
    --- FAIL: TestHashSums (5.79s)
        --- FAIL: TestHashSums/Md5 (0.24s)
    
  3. test_all

    Due to the daily limit on creating objects, I run test_all with maxtries=1 and dissect the sub-tests one by one.

    However, they almost always fail on the first attempt and then all pass in subsequent retries. I believe this is a matter of backend reliability. Is this common for other backends? If not, my code needs to be fixed.

    Also, are the tests in the integration test independent of each other? Unlike the other tests such as fs/operations, fs/sync, and vfs, the integration tests always start failing with TestIntegration/FsMkdir/FsPutFiles/FsDirMove. This doesn't make sense to me.
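
For reference, the temporary workaround from 1. looks roughly like this - a minimal sketch assuming the standard rclone NewFs signature, with everything except the config tweak elided:

```go
package pikpak

import (
	"context"
	"errors"

	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/config/configmap"
)

// NewFs forces single-stream downloads as a stopgap, since the server rejects
// concurrent range requests on a single file.
func NewFs(ctx context.Context, name, root string, m configmap.Mapper) (fs.Fs, error) {
	ci := fs.GetConfig(ctx)
	if ci.MultiThreadStreams > 1 {
		ci.MultiThreadStreams = 1
	}
	// ... the real backend initialisation would follow here ...
	return nil, errors.New("initialisation elided in this sketch")
}
```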

There is no way to do this at the moment.

A new feature flag would be needed, say NoConcurrentReads, which would be checked in the multipart copy code. Note that this will also stop --vfs-cache-mode full from working.
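
To illustrate the idea only - this is a self-contained sketch, not rclone's actual types, and a NoConcurrentReads flag doesn't exist yet; in rclone it would live in fs.Features and be consulted by the multi-thread copy code:

```go
package main

import "fmt"

// Features stands in for fs.Features in this sketch.
type Features struct {
	NoConcurrentReads bool // backend cannot serve parallel range requests on one object
}

// useMultiThreadCopy decides whether a download may be split into parallel streams.
func useMultiThreadCopy(f Features, streams int) bool {
	return streams > 1 && !f.NoConcurrentReads
}

func main() {
	pikpakLike := Features{NoConcurrentReads: true}
	fmt.Println(useMultiThreadCopy(pikpakLike, 4)) // false: fall back to a single stream
}
```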

What does the server do when you open a concurrent range request?

Maybe we would need to get a new Download URL for each one?

Not being able to do concurrent reads will make the backend less functional than it could be.

You should be able to return an empty MD5 hash and still have the tests pass. For instance, S3 files uploaded in chunks may not have MD5 sums.
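
On the backend side that would look roughly like this - a sketch with a simplified Object that just keeps whatever md5sum the API returned (possibly empty):

```go
package pikpak

import (
	"context"

	"github.com/rclone/rclone/fs/hash"
)

// Object is simplified for this sketch; md5sum may be "" when the server omits it.
type Object struct {
	md5sum string
}

// Hash returns the MD5 of the object if known. An empty string with a nil
// error means "hash unknown", which the sync machinery accepts.
func (o *Object) Hash(ctx context.Context, t hash.Type) (string, error) {
	if t != hash.MD5 {
		return "", hash.ErrUnsupported
	}
	return o.md5sum, nil
}
```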

Some backends are like this. It's no big deal as long as the integration tests pass in the end.

Yes, they are independent, but they may be run simultaneously, which might be upsetting the host. You can set the oneonly flag to make sure only one runs at once.
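
That is set per backend in the test_all config - something like the entry below, though the remote name is made up and the exact field names should be checked against fstest/test_all/config.yaml:

```yaml
backends:
  - backend:  "pikpak"        # hypothetical entry for the new backend
    remote:   "TestPikPak:"
    fastlist: false
    oneonly:  true            # run this backend's tests one at a time
```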

It sounds like you are doing well with this backend - well done 🙂

I didn't check the detailed HTTP response, but the debug output looks like this:

<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: Starting multi-thread copy with 4 parts of size 356.438Mi
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 2/4 (373751808-747503616) size 356.438Mi starting
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 1/4 (0-373751808) size 356.438Mi starting
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 4/4 (1121255424-1494924591) size 356.359Mi starting
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 3/4 (747503616-1121255424) size 356.438Mi starting
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 1/4 failed: multipart copy: wrote 0 bytes but expected to write 373751808
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 2/4 failed: multipart copy: failed to open source: open file failed: Get "REDACTED_DOWNLOAD_URL": context canceled
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 3/4 failed: multipart copy: failed to open source: open file failed: Get "REDACTED_DOWNLOAD_URL": context canceled
<7>DEBUG : REDACTED_PATH/REDACTED_FILE.ext: multi-thread copy: stream 4/4 failed: multipart copy: failed to open source: open file failed: Get "REDACTED_DOWNLOAD_URL": context canceled
<3>ERROR : REDACTED_PATH/REDACTED_FILE.ext: Failed to copy: multipart copy: wrote 0 bytes but expected to write 373751808

I'll try, but there is a per-user limit. Currently it is 10 sessions and 2 concurrent range requests on a single file, i.e. --multi-thread-streams=1 x --transfers=10 or --multi-thread-streams=2 x --transfers=5. Anyway, I will try getting a new URL for each opening.
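
Roughly what I have in mind - a sketch only, where getDownloadURL stands in for the real API call and isn't implemented here:

```go
package pikpak

import (
	"context"
	"fmt"
	"io"
	"net/http"

	"github.com/rclone/rclone/fs"
)

// Object is simplified for this sketch.
type Object struct {
	remote string
}

// getDownloadURL would ask the API for a fresh download link for this object.
func (o *Object) getDownloadURL(ctx context.Context) (string, error) {
	return "", fmt.Errorf("not implemented in this sketch")
}

// Open fetches a new download URL on every call, so each concurrent range
// request uses its own link rather than sharing one.
func (o *Object) Open(ctx context.Context, options ...fs.OpenOption) (io.ReadCloser, error) {
	url, err := o.getDownloadURL(ctx)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	fs.OpenOptionAddHTTPHeaders(req.Header, options) // apply Range and other open options
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
		resp.Body.Close()
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return resp.Body, nil
}
```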

Returning an empty MD5 hash is already implemented, yet the two tests I mentioned still fail - does that mean something is wrong? In other words, is the success or failure of those two tests unrelated to the hash implementation?

Wow, that is really useful!

It's probably worth looking at the HTTP headers with -vv --dump headers.

👍

The backend tests will cope with empty MD5s - it looks like the operations tests won't. We can fix the tests, or you can ignore those tests in the integration tester (probably my preferred choice).
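
That would mean adding the failing test names to the backend's ignore list in the test_all config - something like the snippet below, though the exact key and name format should be checked against existing entries in fstest/test_all/config.yaml:

```yaml
    ignore:
      - TestHashSums/Md5
      - TestSyncWithTrackRenames
```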

Thanks for your advice. I will see you in the GitHub PR.
