Rclone and Linux readahead

though in rclone's case, it's basically running all those buffers in parallel. I'm wondering if there's a way to serialize them.

--transfers 1 is the documented way!

yes, but that doesn't help me max out my 50MB/s symmetric pipe :slight_smile: I only get 2MB/s or so to Google for whatever reason, but if I ratchet it up to 30-40 connections I saturate the pipe (while killing IOPS on my disk).

imho you are trying to optimize for a special case.
One might also have several mounts on multiple drives; serializing over all read requests would hurt performance a lot in that case.

If you want to try an experiment you could put a global mutex here

That would make sure that there is only one read from disk occurring at once, and you could experiment with increasing BufferSize

That might help...

was thinking the exact same thing as a hack :slight_smile:

yes/no. This would be optional, but even if one's sync spans multiple mount points, with multiple parallel transfers most of the transfers (if not all of them) would likely be on the same mount point, due to how rclone populates its queue directory by directory.

It's also conceptually possible to optimize at the block-device level, serializing reads per block device instead of just globally.

right now I'm trying to figure out if I can optimize it for my use case; if it's possible, the question then is how to generalize it to be useful in wider circumstances.

here's a quick and dirty experiment outside of rclone on my somewhat beefy laptop (ThinkPad T490, 48GB of RAM, SSD, 4-core / 8-thread i7-8565U CPU @ 1.80GHz). The test is 100 files of 1GB each.

test.go: this serializes reads, feeding independent threads over channels. The laptop stays responsive during the whole course of the test.

timing results:

real    0m35.710s
user    0m12.607s
sys     0m29.983s

code:

package main

import (
	"fmt"
	"io"
	"os"
	"runtime"
	"strconv"
)

type message struct {
	data []byte
}

var (
	bufferSize = 10 * 1024 * 1024
)

func main() {
	if len(os.Args) != 2 {
		fmt.Printf("usage:\n\t%v <# of files>\n", os.Args[0])
		os.Exit(1)
	}

	count, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(fmt.Sprintf("couldn't convert %v to an int", os.Args[1]))
	}

	l := make([]chan message, 0)
	end := make([]chan bool, 0)

	for i := 0; i < count; i++ {
		c := make(chan message, bufferSize/4096)
		l = append(l, c)
		e := make(chan bool)
		end = append(end, e)
		fmt.Printf("starting reader thread %v\n", i)
		go reader(i, c, e)
	}

	go reader_writer(l)

	fmt.Printf("waiting for ending\n")

	for _, e := range end {
		<-e
	}
}

func reader_writer(l []chan message) {
	files := make([]*os.File, 0)
	open := make([]bool, 0)

	for i := 0; i < len(l); i++ {
		f, err := os.Open(fmt.Sprintf("%v", i))
		if err != nil {
			panic(fmt.Sprintf("failed to open %v: %v", i, err))
		}

		files = append(files, f)
		open = append(open, true)
	}

	allFalse := false

	for !allFalse {
		allFalse = true

		for i, f := range files {
			if !open[i] {
				continue
			}

			allFalse = false

			// skip this file while its consumer's channel is more
			// than half full, so the send below never blocks for long
			if len(l[i]) > cap(l[i])/2 {
				continue
			}

			// allocate a fresh buffer per read: the chunks sent below
			// alias this buffer, so reusing one shared buffer would
			// overwrite data the consumers haven't read yet
			buf := make([]byte, bufferSize)

			read, err := f.Read(buf)
			if err != nil {
				if err == io.EOF {
					f.Close()
					open[i] = false
				} else {
					panic(fmt.Sprintf("read %v: %v", i, err))
				}
			}

			// hand the data to the consumer in 4 KiB chunks
			for j := 0; j < read; j += 4096 {
				writeLen := j + 4096
				if writeLen > read {
					writeLen = read
				}
				l[i] <- message{data: buf[j:writeLen]}
			}

			if !open[i] {
				close(l[i])
			}
		}

		// explicitly yield so the consumer goroutines get a turn
		runtime.Gosched()
	}
}

func reader(i int, c chan message, e chan bool) {
	bytesRead := 0
	// open write-only: os.Open returns a read-only handle, so
	// Write on it would fail with EBADF
	null, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
	if err != nil {
		panic("failed to open /dev/null")
	}

	for msg := range c {
		if _, err := null.Write(msg.data); err != nil {
			panic(fmt.Sprintf("%v: write: %v", i, err))
		}
		bytesRead += len(msg.data)
	}

	null.Close()

	fmt.Printf("%v: read %v bytes\n", i, bytesRead)

	e <- true
}

test1.go: same test, but with code that just reads and writes in parallel (i.e. naive). This locks up my laptop's X11 session for the duration of the test (literally, the mouse refuses to move).

timing results:

real    0m40.205s
user    0m26.470s
sys     0m54.352s

code:

package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Printf("usage:\n\t%v <# of files>\n", os.Args[0])
		os.Exit(1)
	}

	count, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(fmt.Sprintf("couldn't convert %v to an int", os.Args[1]))
	}

	end := make([]chan bool, 0)

	for i := 0; i < count; i++ {
		e := make(chan bool)
		end = append(end, e)
		fmt.Printf("starting reader thread %v\n", i)
		go simpleReaderWriter(i, e)
	}

	fmt.Printf("waiting for ending\n")

	for _, e := range end {
		<-e
	}
}

func simpleReaderWriter(i int, e chan bool) {
	bytesRead := 0

	f, err := os.Open(fmt.Sprintf("%v", i))
	if err != nil {
		panic(fmt.Sprintf("failed to open %v: %v", i, err))
	}
	defer f.Close()
	// open write-only: os.Open returns a read-only handle, so
	// Write on it would fail with EBADF
	null, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
	if err != nil {
		panic(fmt.Sprintf("%v: couldn't open /dev/null: %v", i, err))
	}
	defer null.Close()

	buf := make([]byte, 4096)
	finished := false

	for !finished {
		read, err := f.Read(buf)
		if err != nil {
			if err == io.EOF {
				finished = true
			} else {
				panic(fmt.Sprintf("read %v: %v", i, err))
			}
		}

		bytesRead += read
		// only write the bytes actually read, not the whole buffer
		null.Write(buf[:read])
	}

	fmt.Printf("%v: read %v bytes\n", i, bytesRead)
	e <- true
}

once the current rclone transfer is done (probably sometime tonight / tomorrow), I'll test it on my old reliable Dell R710 with a 5 × 8TB RAID5 array (what I'm generally transferring off of), 48GB of RAM, and 4 CPUs with 4 threads each. My expectation is that the results will favor test.go over test1.go even more heavily, but we'll see.

and for the sake of completeness, timing dd on a single 100GB file (i.e. the same amount of data)

25600000+0 records in
25600000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 45.6688 s, 2.3 GB/s

real    0m45.670s
user    0m8.406s
sys     0m32.603s
spotter@spotterT490:~/go/src/github.com/rclone/rclone$ time dd if=0 of=/dev/null bs=1M
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 44.033 s, 2.4 GB/s

real    0m44.042s
user    0m0.095s
sys     0m28.785s
spotter@spotterT490:~/go/src/github.com/rclone/rclone$ time dd if=0 of=/dev/null bs=10M
10000+0 records in
10000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 38.8603 s, 2.7 GB/s

real    0m38.865s
user    0m0.037s
sys     0m30.930s

What conclusion do you draw from that? I'd say that too much IO causes resource starvation elsewhere. If you run vmstat 5 while it is running I expect you will see the number of blocked processes increase.

It might be worth experimenting with ionice - however I've not had much success with preventing this sort of thing...

Also, as an aside, it's hard to get this Go threading right: without making any system calls, a goroutine might never get scheduled out. :confused:

Actually any function call is a scheduling point, not just a syscall. I've never had a problem with it in practice.

Go 1.14, which is in preview now, does pre-emptive scheduling BTW...

yeah, I saw the release notes for 1.14.

I wonder if len() counts as a scheduling point, because I was getting starvation in my code without the forced reschedule (which disappeared when I stuck in fmt.Printf's for debugging). Basically, whenever the len of a channel was more than half its capacity I would just loop, and the reader of the channel never got a chance to do anything (or at least that's my hypothesis).

on a different note: I looked at the asyncreader and am wondering what the point of the timeout is in the context of rclone. Perhaps it has value in the context of the fs mounting, but for most use cases it would seem self-defeating? i.e. if I read it, I will need it (as most files are read to completion), yet rclone is willing to throw it away if it hasn't been read within a certain period of time.

I would say it doesn't, as len() just reads the length from the slice/channel/map header - no function call required.

Are you referring to this?

bufferCacheFlushTime = 5 * time.Second // flush the cached buffers after this long

This is how long buffers hang around for re-use, not how long rclone keeps the data for. Rclone hangs onto the data indefinitely.

As I wanted to use a "huge" (100-300 MB, possibly GB) read-ahead buffer cache, I ended up creating a new FUSE-based (OpenStack Swift backed only) project. It takes all my time XD, and I'm searching for tweaks/hints in topics like this one.

Underlying http layer is https://github.com/131/random-read-http

Now, the FUSE binding layer I'm working with has been refactored into a multithreaded lib, and I am struggling with cache invalidation and non-serial reads. Here I can read Nick's pointer to the "async buffer reader" for rclone and search for more good solutions. Rclone is great. The Internet is great. Thank you Nick again for your amazing work.

Have a great day to you all.
