Copyto overallocates disk space for byte-range requests

What is the problem you are having with rclone?

Using copyto with HTTP headers to make byte-range requests (downloading partial files), rclone seems to allocate disk space for the entire file. With many partial downloads of large files, the disk quickly becomes fragmented. I'm not sure whether there is another HTTP header I can add to avoid this, or whether this is a potential bug. To reproduce:

### Create a 10 MB test file
fallocate -l 10M file.txt

### Upload file to my-bucket
rclone copyto file.txt s3:my-bucket/AJA/file.txt

### Download a 100-byte range of data
rclone --header "Range: bytes=100-199" --ignore-checksum --ignore-size copyto s3:my-bucket/AJA/file.txt ./file_100-199.txt

### List apparent size
ls -lh ./file_100-199.txt
> -rw-r--r-- 1 ajaltomare ajaltomare 100 Oct 13 13:38 file_100-199.txt

### Show disk usage
du -h ./file_100-199.txt
> 10M     file_100-199.txt
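
As a cross-check (assuming GNU coreutils stat, as on Ubuntu), the apparent size and the allocated block count can be printed side by side:

### Show apparent size vs. allocated 512-byte blocks
stat -c '%n: %s bytes apparent, %b blocks allocated' ./file_100-199.txt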

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.2

  • os/version: ubuntu 20.04 (64 bit)
  • os/kernel: 5.10.102.1-microsoft-standard-WSL2 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.18.6
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone --log-file rclone.log -vv --header "Range: bytes=100-199" --ignore-checksum --ignore-size copyto s3:my-bucket/AJA/file.txt ./file.txt

The rclone config contents with secrets removed.

[s3]
type = s3
provider = AWS
env_auth = true
region = us-west-2
location_constraint = us-west-2
acl = private

A log from the command with the -vv flag

2022/10/13 13:14:36 DEBUG : rclone: Version "v1.59.2" starting with parameters ["rclone" "--log-file" "rclone.log" "-vv" "--header" "Range: bytes=100-199" "--ignore-checksum" "--ignore-size" "copyto" "s3:my-bucket/AJA/file.txt" "./file.txt"]
2022/10/13 13:14:36 DEBUG : Creating backend with remote "s3:my-bucket/AJA/file.txt"
2022/10/13 13:14:36 DEBUG : Using config file from "/home/ajaltomare/.config/rclone/rclone.conf"
2022/10/13 13:14:37 DEBUG : fs cache: adding new entry for parent of "s3:my-bucket/AJA/file.txt", "s3:my-bucket/AJA"
2022/10/13 13:14:37 DEBUG : Creating backend with remote "./"
2022/10/13 13:14:37 DEBUG : fs cache: renaming cache item "./" to be canonical "/home/ajaltomare/workspace/temp/dl"
2022/10/13 13:14:37 DEBUG : file.txt: Need to transfer - File not found at Destination
2022/10/13 13:14:37 INFO  : file.txt: Copied (new)
2022/10/13 13:14:37 INFO  :
Transferred:            100 B / 100 B, 100%, 0 B/s, ETA -
Transferred:            1 / 1, 100%
Elapsed time:         0.3s

2022/10/13 13:14:37 DEBUG : 6 go routines active

You'll probably be better off using rclone cat with these flags:

  --count int    Only print N characters (default -1)
  --discard      Discard the output instead of printing
  --head int     Only print the first N characters
  --offset int   Start printing at offset N (or from end if -ve)
  --tail int     Only print the last N characters
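
For example, the byte-range request above could be reproduced with something like:

rclone cat --offset 100 --count 100 s3:my-bucket/AJA/file.txt > ./file_100-199.txt

Only the requested 100 bytes are written to the redirected file, so nothing is preallocated on disk.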

I'm surprised copyto didn't give an error about truncated files. Ah I see you used --ignore-size to suppress that.

Thanks for your reply, Nick. cat does indeed solve the problem. Unfortunately, stdout is suboptimal for my use case, as I would like to process hundreds of byte-range requests in parallel. Any ideas?
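
One hedged sketch of how that could still work: each rclone cat invocation writes to its own redirected file, so many requests can run in parallel from the shell, e.g. driven by xargs. Here ranges.txt is a hypothetical input file with one "offset count outfile" triple per line:

### ranges.txt (hypothetical), e.g.:
###   100 100 file_100-199.txt
###   200 100 file_200-299.txt
xargs -P 8 -n 3 sh -c 'rclone cat --offset "$1" --count "$2" s3:my-bucket/AJA/file.txt > "$3"' sh < ranges.txt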

If you want to carry on with the copyto approach, then these two flags should help: --local-no-preallocate and --multi-thread-streams 0.

  --local-no-preallocate       Disable preallocation of disk space for transferred files
  --multi-thread-streams int   Max number of streams to use for multi-thread downloads (default 4)

Rclone doesn't interpret the Range request header, so using copyto like this seems like a bit of an abuse, but I can't think of a better idea right now!
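
Applied to the original command, that would look something like:

rclone --header "Range: bytes=100-199" --ignore-checksum --ignore-size \
    --local-no-preallocate --multi-thread-streams 0 \
    copyto s3:my-bucket/AJA/file.txt ./file_100-199.txt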

This works quite well. Thanks!
