though in rclone's case, it's basically running all those buffers in parallel. I'm wondering if there's a way to serialize them.
--transfers 1
is the documented way!
yes, but that doesn't help me max out my 50MB/s symmetric pipe. A single transfer only gets 2MB/s or so to Google for whatever reason, but if I ratchet it up to 30-40 connections I saturate the pipe (while killing IOPS on my disk).
imho you are trying to optimize for a special case.
one might also have several mounts on multiple drives. serializing over all read requests would hurt performance a lot in that case.
If you want to try an experiment, you could put a global mutex here to make sure that there is only one read from disk occurring at once, and you could experiment with increasing BufferSize.
That might help...
was thinking the exact same thing as a hack
yes and no. This would be optional, but I do think that even if one's sync spans multiple mount points, with multiple parallel transfers most (if not all) of them will likely be on the same mount point, due to how rclone populates its queue on a directory-by-directory level.
It's also conceptually possible to optimize this at the block-device level, by serializing reads per block device instead of just globally.
right now I'm trying to figure out whether I can optimize it for my use case; if that's possible, the question then is how to generalize it to be useful in wider circumstances.
here's a quick and dirty experiment outside of rclone on my somewhat beefy laptop (ThinkPad T490, 48GB of RAM, SSD, 4-core / 8-thread i7-8565U CPU @ 1.80GHz). The test is 100 files of 1GB each.
test.go: this serializes disk reads, handing the data to independent goroutines over channels. The laptop stays responsive during the whole course of the test.
timing results:
real 0m35.710s
user 0m12.607s
sys 0m29.983s
code:
package main

import (
	"fmt"
	"io"
	"os"
	"runtime"
	"strconv"
)

type message struct {
	data []byte
}

var bufferSize = 10 * 1024 * 1024

func main() {
	if len(os.Args) != 2 {
		fmt.Printf("usage:\n\t%v <# of files>\n", os.Args[0])
		os.Exit(1)
	}
	count, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(fmt.Sprintf("couldn't convert %v to an int", os.Args[1]))
	}
	l := make([]chan message, 0, count)
	end := make([]chan bool, 0, count)
	for i := 0; i < count; i++ {
		c := make(chan message, bufferSize/4096)
		l = append(l, c)
		e := make(chan bool)
		end = append(end, e)
		fmt.Printf("starting reader thread %v\n", i)
		go reader(i, c, e)
	}
	go readerWriter(l)
	fmt.Printf("waiting for ending\n")
	for _, e := range end {
		<-e
	}
}

// readerWriter performs all disk reads serially, round-robin across
// the files, and fans the data out to the per-file goroutines.
func readerWriter(l []chan message) {
	files := make([]*os.File, 0, len(l))
	open := make([]bool, 0, len(l))
	for i := 0; i < len(l); i++ {
		f, err := os.Open(fmt.Sprintf("%v", i))
		if err != nil {
			panic(fmt.Sprintf("failed to open %v: %v", i, err))
		}
		files = append(files, f)
		open = append(open, true)
	}
	buf := make([]byte, bufferSize)
	allClosed := false
	for !allClosed {
		allClosed = true
		for i, f := range files {
			if !open[i] {
				continue
			}
			allClosed = false
			// Back-pressure: skip this file while its consumer
			// is more than half a buffer behind.
			if len(l[i]) > cap(l[i])/2 {
				continue
			}
			read, err := f.Read(buf)
			if err != nil {
				if err == io.EOF {
					f.Close()
					open[i] = false
				} else {
					panic(fmt.Sprintf("read %v: %v", i, err))
				}
			}
			for j := 0; j < read; j += 4096 {
				writeLen := j + 4096
				if writeLen > read {
					writeLen = read
				}
				// Copy the chunk: buf is reused on the next
				// Read, so sending a sub-slice of it directly
				// would race with the consumer.
				chunk := make([]byte, writeLen-j)
				copy(chunk, buf[j:writeLen])
				l[i] <- message{data: chunk}
			}
			if !open[i] {
				close(l[i])
			}
		}
		// Yield so the consumer goroutines get a chance to run
		// (needed on Go < 1.14, which lacked async preemption).
		runtime.Gosched()
	}
}

// reader drains its channel, writing the data to /dev/null.
func reader(i int, c chan message, e chan bool) {
	bytesRead := 0
	// os.Open is read-only; /dev/null must be opened for writing.
	null, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
	if err != nil {
		panic("failed to open /dev/null")
	}
	for msg := range c {
		null.Write(msg.data)
		bytesRead += len(msg.data)
	}
	null.Close()
	fmt.Printf("%v: read %v bytes\n", i, bytesRead)
	e <- true
}
test1.go: same test, but with naive code that just reads and writes in parallel. This locks up my laptop (X11) for the duration of the test (literally, the mouse refuses to move).
timing results
real 0m40.205s
user 0m26.470s
sys 0m54.352s
code:
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Printf("usage:\n\t%v <# of files>\n", os.Args[0])
		os.Exit(1)
	}
	count, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(fmt.Sprintf("couldn't convert %v to an int", os.Args[1]))
	}
	end := make([]chan bool, 0, count)
	for i := 0; i < count; i++ {
		e := make(chan bool)
		end = append(end, e)
		fmt.Printf("starting reader thread %v\n", i)
		go simpleReaderWriter(i, e)
	}
	fmt.Printf("waiting for ending\n")
	for _, e := range end {
		<-e
	}
}

// simpleReaderWriter reads its file and writes it to /dev/null with
// no coordination with the other goroutines.
func simpleReaderWriter(i int, e chan bool) {
	bytesRead := 0
	f, err := os.Open(fmt.Sprintf("%v", i))
	if err != nil {
		panic(fmt.Sprintf("failed to open %v: %v", i, err))
	}
	defer f.Close()
	// os.Open is read-only; /dev/null must be opened for writing.
	null, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
	if err != nil {
		panic(fmt.Sprintf("%v: couldn't open /dev/null: %v", i, err))
	}
	defer null.Close()
	buf := make([]byte, 4096)
	for {
		read, err := f.Read(buf)
		bytesRead += read
		// Write only the bytes actually read, not the whole buffer.
		null.Write(buf[:read])
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(fmt.Sprintf("read %v: %v", i, err))
		}
	}
	fmt.Printf("%v: read %v bytes\n", i, bytesRead)
	e <- true
}
once the current rclone transfer is done (probably sometime tonight / tomorrow), I'll test it on my old reliable Dell R710 with a 5 × 8TB RAID5 array (what I'm generally transferring off of), with 48GB of RAM and 4 CPUs with 4 threads each. My expectation is that the results will favor test.go over test1.go by an even wider margin, but we'll see.
and for the sake of completeness, timing dd on a single 100GB file (i.e. the same amount of data):
25600000+0 records in
25600000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 45.6688 s, 2.3 GB/s
real 0m45.670s
user 0m8.406s
sys 0m32.603s
spotter@spotterT490:~/go/src/github.com/rclone/rclone$ time dd if=0 of=/dev/null bs=1M
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 44.033 s, 2.4 GB/s
real 0m44.042s
user 0m0.095s
sys 0m28.785s
spotter@spotterT490:~/go/src/github.com/rclone/rclone$ time dd if=0 of=/dev/null bs=10M
10000+0 records in
10000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 38.8603 s, 2.7 GB/s
real 0m38.865s
user 0m0.037s
sys 0m30.930s
What conclusion do you draw from that? I'd say that too much IO causes resource starvation elsewhere. If you run vmstat 5 while the test is running, I expect you will see the number of blocked processes increase.
It might be worth experimenting with ionice - however, I've not had much success with preventing this sort of thing...
Also, as an aside, it's hard to get this Go threading right: without making any system calls, a goroutine might never be scheduled out.
Actually any function call is a scheduling point, not just a syscall. I've never had a problem with it in practice.
Go 1.14, which is in preview now, does pre-emptive scheduling, BTW...
yea, I saw the release notes for 1.14.
I wonder if len() counts as a scheduling point, because I was getting starvation in my code without the forced reschedule (which disappeared when I would stick in fmt.Printf's for debugging). Basically, whenever len() of a channel was more than half its capacity I would just loop, and the reader of the channel never got a chance to do anything (or at least that's my hypothesis).
on a different note: I looked at the asyncreader, and I'm wondering what the point of the timeout is in the context of rclone. Perhaps it has value in the context of fs mounting, but for most use cases it would seem self-defeating? I.e. if I read it, I will need it (as most files are read to completion), yet rclone is willing to throw it away if it hasn't been read within a certain period of time.
I would say it doesn't, as len() just reads the length from the slice/channel/map header - no function call required.
Are you referring to this?
bufferCacheFlushTime = 5 * time.Second // flush the cached buffers after this long
This is how long buffers hang around for re-use, not how long rclone keeps the data for. Rclone hangs onto the data indefinitely.
As I wanted to use a "huge" (100-300 MB, possibly GB) read-ahead buffer cache, I ended up creating a new FUSE-based project (OpenStack Swift backend only). It takes all my time XD, and I'm searching for tweaks/hints in topics like this one.
The underlying HTTP layer is https://github.com/131/random-read-http
Now the FUSE binding layer I'm working with has been refactored into a multithreaded lib, and I am suffering with cache invalidation and non-serial reads. Here I can read Nick's pointer to the "async buffer reader" for rclone and search for more good solutions. Rclone is great. The Internet is great. Thank you again, Nick, for your amazing work.
Have a great day, all.