Rclone mount S3 bucket runs out of memory after many directory listings

What is the problem you are having with rclone?

I mounted an S3 bucket. The memory usage slowly grows after many directory listings. Eventually the system runs out of ram and oom kills the rclone process:

[510779.073341] Out of memory: Killed process 4312 (rclone) total-vm:130459036kB, anon-rss:129545496kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:253980kB oom_score_adj:0

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.0

  • os/version: amazon 2 (64 bit)
  • os/kernel: 5.10.130-118.517.amzn2.x86_64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.18.3
  • go/linking: static
  • go/tags: none

I also tried rclone v1.60.0-beta.6373.2f461f13e and goofys version 0.24.0-45b8d78375af1b24604439d2e60c567654bcdf88. They both suffer from a memory problem after many listings. The problem might be caused by common library such as aws-sdk-go or fuse?

Which cloud storage system are you using? (eg Google Drive)

AWS S3 bucket with more than 100M objects

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone mount s3:<redacted> /mnt/<redacted> --config=/etc/rclone.conf --dir-cache-time=10s --allow-other --log-level=DEBUG --log-file=/var/log/rclone.log --use-mmap --rc

followed by some du executions on a large prefix.

The rclone config contents with secrets removed.

cat /etc/rclone.conf
[s3]
type = s3
provider = AWS
env_auth = true
region = eu-west-1
location_constraint = eu-west-1
acl = private
server_side_encryption = AES256

A log from the command with the -vv flag

2022/08/15 18:00:32 DEBUG : rclone: Version "v1.59.0" starting with parameters ["rclone" "mount" "s3:<REDACTED>" "/mnt/<REDACTED>" "--config=/etc/rclone.conf" "--dir-cache-time=10s" "--allow-other" "--log-level=DEBUG" "--log-file=/var/log/rclone.log" "--use-mmap" "--rc"]
2022/08/15 18:00:32 NOTICE: Serving remote control on http://localhost:5572/
2022/08/15 18:00:32 DEBUG : Creating backend with remote "s3:<REDACTED>"
2022/08/15 18:00:32 DEBUG : Using config file from "/etc/rclone.conf"
2022/08/15 18:00:32 INFO  : S3 bucket <REDACTED>: poll-interval is not supported by this remote
2022/08/15 18:00:32 DEBUG : S3 bucket <REDACTED>: Mounting on "/mnt/<REDACTED>"
2022/08/15 18:00:32 DEBUG : : Root: 
2022/08/15 18:00:32 DEBUG : : >Root: node=/, err=<nil>
2022/08/15 18:00:39 DEBUG : /: Attr: 
2022/08/15 18:00:39 DEBUG : /: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/08/15 18:00:40 DEBUG : /: ReadDirAll: 
<REDACTED>
2022/08/15 18:02:52 DEBUG : XX: Re-reading directory (11.545480592s old)
2022/08/15 18:02:55 DEBUG : XX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Attr: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Attr: attr=valid=1s ino=0 size=0 mode=drwxr-xr-x, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: ReadDirAll: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >ReadDirAll: item=20, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Lookup: name="0BCACE23-05EE-42EB-B31D-A3FB00050870.json"
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/0BCACE23-05EE-42EB-B31D-A3FB00050870.json, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/0BCACE23-05EE-42EB-B31D-A3FB00050870.json: Attr: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/0BCACE23-05EE-42EB-B31D-A3FB00050870.json: >Attr: a=valid=1s ino=0 size=2695 mode=-rw-r--r--, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Lookup: name="296863f4-9c8f-4914-ad2e-c7d4db733078"
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/296863f4-9c8f-4914-ad2e-c7d4db733078, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/296863f4-9c8f-4914-ad2e-c7d4db733078: Attr: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/296863f4-9c8f-4914-ad2e-c7d4db733078: >Attr: a=valid=1s ino=0 size=4245 mode=-rw-r--r--, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Lookup: name="448A6385-9894-4332-B9DC-8C39500C7A48.json"
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/448A6385-9894-4332-B9DC-8C39500C7A48.json, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/448A6385-9894-4332-B9DC-8C39500C7A48.json: Attr: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/448A6385-9894-4332-B9DC-8C39500C7A48.json: >Attr: a=valid=1s ino=0 size=2697 mode=-rw-r--r--, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Lookup: name="5f3444b9-a484-44a3-9c11-e1adef2610ba"
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/5f3444b9-a484-44a3-9c11-e1adef2610ba, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/5f3444b9-a484-44a3-9c11-e1adef2610ba: Attr: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/5f3444b9-a484-44a3-9c11-e1adef2610ba: >Attr: a=valid=1s ino=0 size=1855 mode=-rw-r--r--, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Lookup: name="63ed8805-d5f2-452a-80ac-4581c3128c70"
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/63ed8805-d5f2-452a-80ac-4581c3128c70, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/63ed8805-d5f2-452a-80ac-4581c3128c70: Attr: 
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/63ed8805-d5f2-452a-80ac-4581c3128c70: >Attr: a=valid=1s ino=0 size=1848 mode=-rw-r--r--, err=<nil>
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: Lookup: name="6f2c29af-d0c3-4bb1-ae45-5b0bd1998c13"
2022/08/15 18:02:55 DEBUG : XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/: >Lookup: node=XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/6f2c29af-d0c3-4bb1-ae45-5b0bd1998c13, err=<nil>
<REDACTED>

pprof top:

Showing nodes accounting for 1444.51MB, 99.24% of 1455.58MB total
Dropped 30 nodes (cum <= 7.28MB)
Showing top 10 nodes out of 34
      flat  flat%   sum%        cum   cum%
  860.12MB 59.09% 59.09%  1146.13MB 78.74%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.XMLToStruct
  252.01MB 17.31% 76.40%   252.01MB 17.31%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.(*XMLNode).findNamespaces (inline)
  106.52MB  7.32% 83.72%   106.52MB  7.32%  github.com/rclone/rclone/vfs.newFile
   77.51MB  5.33% 89.05%    77.51MB  5.33%  github.com/rclone/rclone/backend/s3.(*Fs).newObjectWithInfo
   62.28MB  4.28% 93.33%    62.28MB  4.28%  bazil.org/fuse/fs.(*Server).saveNode
   44.57MB  3.06% 96.39%   152.09MB 10.45%  github.com/rclone/rclone/vfs.(*Dir)._readDirFromEntries
   29.50MB  2.03% 98.41%    29.50MB  2.03%  encoding/xml.(*Decoder).name
   11.50MB  0.79% 99.20%       12MB  0.82%  github.com/rclone/rclone/cmd/mount.(*Dir).Lookup
    0.50MB 0.034% 99.24%       30MB  2.06%  encoding/xml.(*Decoder).rawToken
         0     0% 99.24%  1451.01MB 99.69%  bazil.org/fuse/fs.(*Server).Serve.func1

pprof heap

(pprof) list github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.XMLToStruct
Total: 1.42GB
ROUTINE ======================== github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.XMLToStruct in github.com/aws/aws-sdk-go@v1.44.29/private/protocol/xml/xmlutil/xml_to_struct.go
  860.12MB     2.24GB (flat, cum) 157.45% of Total
         .          .     44:   n.Children[child.Name.Local] = append(n.Children[child.Name.Local], child)
         .          .     45:}
         .          .     46:
         .          .     47:// XMLToStruct converts a xml.Decoder stream to XMLNode with nested values.
         .          .     48:func XMLToStruct(d *xml.Decoder, s *xml.StartElement) (*XMLNode, error) {
  267.02MB   267.02MB     49:   out := &XMLNode{}
         .          .     50:   for {
         .       30MB     51:           tok, err := d.Token()
         .          .     52:           if err != nil {
         .          .     53:                   if err == io.EOF {
         .          .     54:                           break
         .          .     55:                   } else {
         .          .     56:                           return out, err
         .          .     57:                   }
         .          .     58:           }
         .          .     59:
         .          .     60:           if tok == nil {
         .          .     61:                   break
         .          .     62:           }
         .          .     63:
         .          .     64:           switch typed := tok.(type) {
         .          .     65:           case xml.CharData:
   95.01MB    99.01MB     66:                   out.Text = string(typed.Copy())
         .          .     67:           case xml.StartElement:
         .          .     68:                   el := typed.Copy()
         .          .     69:                   out.Attr = el.Attr
         .          .     70:                   if out.Children == nil {
      27MB       27MB     71:                           out.Children = map[string][]*XMLNode{}
         .          .     72:                   }
         .          .     73:
         .          .     74:                   name := typed.Name.Local
         .          .     75:                   slice := out.Children[name]
         .          .     76:                   if slice == nil {
         .          .     77:                           slice = []*XMLNode{}
         .          .     78:                   }
         .     1.12GB     79:                   node, e := XMLToStruct(d, &el)
         .   120.51MB     80:                   out.findNamespaces()
         .          .     81:                   if e != nil {
         .          .     82:                           return out, e
         .          .     83:                   }
         .          .     84:                   node.Name = typed.Name
         .   131.51MB     85:                   node.findNamespaces()
  257.52MB   257.52MB     86:                   tempOut := *out
         .          .     87:                   // Save into a temp variable, simply because out gets squashed during
         .          .     88:                   // loop iterations
         .          .     89:                   node.parent = &tempOut
   21.50MB    21.50MB     90:                   slice = append(slice, node)
  192.06MB   192.06MB     91:                   out.Children[name] = slice
         .          .     92:           case xml.EndElement:
         .          .     93:                   if s != nil && s.Name.Local == typed.Name.Local { // matching end token
         .          .     94:                           return out, nil
         .          .     95:                   }
         .          .     96:                   out = &XMLNode{}

The XML nodes seem to have circular references, but the mark and sweep collector should be able to clean this. I don't understand why these stay in memory.

What's the memory on the server you are trying to run this?

I started with 32GB ram, later on I tried 128GB but that did not help.

Check out:

@ncw explains the memory usage pretty well there.

Thanks a lot! This is very interesting material. 1k of RAM per object sounds reasonable. I don't expect that I list more than 100k objects at the same time. That would cost around 100MB, which is then kept in cache for the dir-cache-time of 10 seconds. But I think I need to monitor the listing size more carefully to be sure.

I opened a pull request to reduce memory consumption: Reduce S3 mounts memory consumption by 75% by agterdenbos · Pull Request #6393 · rclone/rclone (github.com)

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.