Rclone OOM killed (error 137) during sync file check

What is the problem you are having with rclone?

I'm running rclone in the cloud, in a Jenkins agent container on Kubernetes. I have a job that syncs many files for different products, and the Jenkins agent pod had a 2GB memory limit. One of the products has grown to about 870k files in COS S3, and all of a sudden the sync for this product is failing. I'm syncing between S3 buckets.
I have increased the memory from 2GB to 5GB on the pod/container and it still crashes. It even looks like it crashes while checking for file differences.

Here is output from the Jenkins log:

16:41:54  2022-10-28 14:41:54 [INFO] Publishing **** to staging
16:42:55 2022/10/28 14:42:55 INFO  : 
16:42:55 Transferred:   	          0 B / 0 B, -, 0 B/s, ETA -
16:42:55 Checks:            299425 / 309432, 97%
16:42:55 Elapsed time:       1m0.5s
16:42:55 
16:43:07 /tmp/jenkins1505612189605231137.sh: line 42:   177 Killed 
rclone sync ${OVERWRITE}--no-update-modtime --metadata --checksum --checkers=6 --transfers=3 COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" -v
16:43:07 [INFO] Status: 137

and the Jenkins job continues as if nothing had happened, which means the container itself does not go OOM.

I have read several good posts here on the forum, so as you can see I have already tried tuning the checkers and transfers down from the defaults.

I would like to hear anyone's opinion on what could be going wrong, or what I should try in order to pinpoint the problem.

Run the command 'rclone version' and share the full output of the command.

rclone v1.60.0-beta.6355.67fd60275

  • os/version: ubuntu 18.04 (64 bit)
  • os/kernel: 4.15.0-175-generic (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.18.3
  • go/linking: static
  • go/tags: none

Which cloud storage system are you using? (eg Google Drive)

Cloud Object Storage S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

rclone sync ${OVERWRITE}--no-update-modtime --metadata --checksum --checkers=6 --transfers=3 COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" -v

Another command I tried was with DEBUG logging and a dry run, and it failed too:

DEBUG : rclone: Version "v1.60.0-beta.6355.67fd60275" starting with parameters ["rclone" "sync" "--no-update-modtime" "--metadata" "--checksum" "--checkers=6" "--transfers=3" "--dry-run" "--log-file=rclone_log.txt" "--log-level" "DEBUG" "COS_US_SOUTH:ibm-docs-dev/XYZ" "COS_US_SOUTH:ibm-docs-stage/XYZ"]

The rclone config contents with secrets removed.

[COS_US_SOUTH]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = ****
secret_access_key = ****
endpoint = s3.us-south.cloud-object-storage.appdomain.cloud

A log from the command with the -vv flag

I have used log level DEBUG and output to a file.
After about 1.5 minutes and 90MB of log output (about a million log lines) it just ends, with no errors.
The output is the same for every other file it was checking.

...
2022/10/31 09:00:39 DEBUG : p10eai/BC552409.htm: md5 = 7e43768d1a969854ec3c7fe063d2352d OK
2022/10/31 09:00:39 DEBUG : p10eai/BC552409.htm: Size and md5 of src and dst objects identical
2022/10/31 09:00:39 DEBUG : p10eai/BC552409.htm: Unchanged skipping
2022/10/31 09:00:39 DEBUG : p10eai/BC55240A.htm: md5 = 58f9375c20d0b77cc6c59f27022da0c1 OK
2022/10/31 09:00:39 DEBUG : p10eai/BC55240A.htm: Size and md5 of src and dst objects identical
2022/10/31 09:00:39 DEBUG : p10eai/BC55240A.htm: Unchanged skipping

The attached top screenshot, taken about 3 seconds before the rclone process was killed, shows it had reached about 31% MEM.

I have tried getting rid of --no-update-modtime and --metadata and adding --fast-list.
Same result: it ends after about a minute with no errors in the log. I wonder if the process hits some memory limit and is killed by the Kubernetes pod.

Hi Andrej,

Your observations share some similarities with this thread (depending on the characteristics of your data/folders/remotes):
https://forum.rclone.org/t/huge-memory-usage-when-copying-between-s3-like-services/33687

I therefore suggest you try:

rclone sync --dry-run --checkers=1 COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" -v

If it succeeds, then you can add the flags back one by one and then gradually increase the checkers to locate the breaking point.
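A simple loop could help locate that point, for example (just a sketch using the same remotes/paths and a dry-run, so nothing is modified):

# probe increasing numbers of checkers until one run gets killed (sketch)
for c in 1 2 3 4 6; do
  echo "=== checkers=$c ==="
  rclone sync --dry-run --checkers=$c \
    COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" -v \
    || { echo "stopped/killed at checkers=$c"; break; }
done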

If it fails, then I suggest you try to find out which remote is causing the issue by (dry-run) checking each of the remotes against a local folder using the above command.

What is the highest number of objects (files and folders) you have in a single folder (excluding objects in subfolders of the folder)?
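If you are not sure, a rough count per top-level folder can be obtained with plain rclone and shell tools, e.g. (a sketch; files sitting directly under the prefix will be counted individually):

# count objects per top-level folder under the prefix (sketch)
rclone lsf --files-only -R COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" \
  | awk -F/ '{print $1}' | sort | uniq -c | sort -rn | head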

Thanks for the lead.
I have tried lowering checkers to 1 and it was running for a long time (it takes 3.5h to check the whole product key), so I cancelled it and went looking for the sweet spot. And I saw how, with checkers=6 and --fast-list, it fills 5.6GB of memory in about a minute and a half.
The whole product has 870k files, with the largest folder holding 87k files in a single folder and some other folders around 52k files.
So does that mean the checkers check several folders at the same time and fill up the memory with the huge lists of files?

With
rclone sync --no-update-modtime --checksum --dry-run --checkers=5 COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" COS_US:ibm-docs-prod/"$PRODUCT_KEY" -v
I was able to run the job in 5 minutes and it checked 870k files.
It used a little more than 4GB of memory at the peak but did not crash.
Customers will be happy to wait 5 minutes instead of 3 hours. :slight_smile:
Need to do more testing.

Correct, the checkers concurrently check the folders, and each checker needs to hold the entire folder listing from both remotes in memory to do the comparison. Each entry takes 1-2K, to give you an impression.

So if all your folders were the exact same size, then 6 checkers would (roughly) require 6 times more memory - or perhaps only three times if perfectly out of phase.

If your folders are differently sized, then you might need enough memory to hold the 6 largest folders from both remotes in memory in the worst case - it typically plays out better in real life.

--fast-list will collect the entire listing of all folders from both remotes in memory using a single checker.

Once you remove --dry-run, there will also need to be memory to perform the --transfers; the more --transfers, the more memory.

So it is a balance between resources and speed - like always :smile:
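As a back-of-the-envelope illustration (just a sketch; it assumes the absolute worst case of all 6 checkers sitting in 87k-entry folders at once, at ~2 KB per entry, from both remotes):

# entries * 2 KB/entry * 2 remotes * 6 checkers, result in MB
echo $(( 87000 * 2 * 2 * 6 / 1024 )) MB   # ~2039 MB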

More information here:
https://rclone.org/docs/#checkers-n
https://rclone.org/docs/#fast-list


Thanks a lot. I wonder how much memory the transfers will need. Most of the files are kilobytes in size, and only a few are more than a MB. So I assume the buffering and streaming of transfers=6 won't go over a few hundred MB.


A rough calculation on the 4GB says it corresponds to holding 2-4M directory entries in memory, that is 1-2M from each remote. This sounds (much) too high; your worst case should be less than 87k + 4*52k = 295k objects, which should fit within 1GB. So something seems wrong with the memory usage - even if the Go garbage collector is a little behind.

I find it a bit disturbing and would probably try memory profiling to find the explanation/issue, if it was something I relied on to run my business. The instructions to collect the profile are in this post. Up to you.
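For reference, the basic recipe is roughly this (the --rc flag exposes the Go pprof endpoints on localhost:5572, and go tool pprof needs the Go toolchain available in the container):

# run the sync with the remote control API enabled (sketch; other flags/paths as used above)
rclone sync --rc --dry-run --checkers=6 \
  COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" -v &

# while it is running, grab a heap profile
go tool pprof -text http://localhost:5572/debug/pprof/heap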

I don't know. The default --buffer-size is 16M, so I would think something in the range of 20-40M per --transfer (using default settings), I guess independent of the file size. Nothing compared to the 4GB.
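If the transfer memory ever did become a concern, it could be capped, e.g. (a sketch; the values are only examples):

# smaller read-ahead buffer per transfer, e.g. 8M instead of the 16M default
rclone sync --transfers=3 --buffer-size=8M \
  COS_US_SOUTH:ibm-docs-dev/"$PRODUCT_KEY" COS_US_SOUTH:ibm-docs-stage/"$PRODUCT_KEY" -v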

I have run the rclone memory test on the biggest folder in the structure:

jenkins@jenkins-agent-5ns68:~/agent$ rclone test memory COS_US_SOUTH:ibm-docs-dev/****/p10eai
2022/10/31 14:20:26 NOTICE: 82632 objects took 299458512 bytes, 3624.0 bytes/object
2022/10/31 14:20:26 NOTICE: System memory changed from 42484744 to 432937048 bytes a change of 390452304 bytes

That shows 285MB of memory used for this folder, so assuming the worst case with 6 checkers: 2 x 6 x 285MB = 3.42GB.

I have also tried the profiler. When looking at top, it starts at 6GB of memory and then slowly goes down to about 1.5GB when rclone is OOM killed. I am not sure how to read the profile below. If anyone can have a look and see if anything is out of the ordinary:
rclone sync --rc --no-update-modtime --checksum --dry-run --checkers=6 COS_US_SOUTH:ibm-docs-dev/POWER10 COS_US_SOUTH:ibm-docs-stage/POWER10 -v &
Pod memory requests: 2GB, limits: 5GB

jenkins@jenkins-agent-9zb66:~/agent$ go tool pprof -text http://localhost:5572/debug/pprof/heap
Fetching profile over HTTP from http://localhost:5572/debug/pprof/heap
Saved profile in /home/jenkins/pprof/pprof.rclone.alloc_objects.alloc_space.inuse_objects.inuse_space.010.pb.gz
File: rclone
Type: inuse_space
Time: Oct 31, 2022 at 2:44pm (UTC)
Showing nodes accounting for 1558.08MB, 98.23% of 1586.19MB total
Dropped 88 nodes (cum <= 7.93MB)
      flat  flat%   sum%        cum   cum%
 1040.66MB 65.61% 65.61%  1481.18MB 93.38%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.XMLToStruct
  296.01MB 18.66% 84.27%   296.01MB 18.66%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.(*XMLNode).findNamespaces (inline)
   53.50MB  3.37% 87.64%   129.51MB  8.16%  encoding/xml.(*Decoder).Token
   53.01MB  3.34% 90.98%    53.01MB  3.34%  github.com/rclone/rclone/backend/s3.(*Fs).newObjectWithInfo
      39MB  2.46% 93.44%       76MB  4.79%  encoding/xml.(*Decoder).rawToken
      37MB  2.33% 95.78%       37MB  2.33%  encoding/xml.(*Decoder).name
      15MB  0.95% 96.72%       15MB  0.95%  encoding/xml.CharData.Copy (inline)
      12MB  0.76% 97.48%       12MB  0.76%  reflect.(*structType).Field
   11.40MB  0.72% 98.20%    64.41MB  4.06%  github.com/rclone/rclone/backend/s3.(*Fs).listDir.func1
    0.50MB 0.032% 98.23%  1579.10MB 99.55%  github.com/rclone/rclone/backend/s3.(*Fs).list
         0     0% 98.23%       37MB  2.33%  encoding/xml.(*Decoder).nsname
         0     0% 98.23%  1509.69MB 95.18%  github.com/aws/aws-sdk-go/aws/request.(*HandlerList).Run
         0     0% 98.23%  1509.69MB 95.18%  github.com/aws/aws-sdk-go/aws/request.(*Request).Send
         0     0% 98.23%  1509.19MB 95.15%  github.com/aws/aws-sdk-go/aws/request.(*Request).sendRequest
         0     0% 98.23%  1509.19MB 95.15%  github.com/aws/aws-sdk-go/private/protocol/restxml.Unmarshal
         0     0% 98.23%  1508.18MB 95.08%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.UnmarshalXML
         0     0% 98.23%    27.01MB  1.70%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.parse
         0     0% 98.23%    27.01MB  1.70%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.parseList
         0     0% 98.23%    27.01MB  1.70%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.parseStruct
         0     0% 98.23%  1509.69MB 95.18%  github.com/aws/aws-sdk-go/service/s3.(*S3).ListObjectsWithContext
         0     0% 98.23%  1579.10MB 99.55%  github.com/rclone/rclone/backend/s3.(*Fs).List
         0     0% 98.23%    53.01MB  3.34%  github.com/rclone/rclone/backend/s3.(*Fs).itemToDirEntry
         0     0% 98.23%  1509.69MB 95.18%  github.com/rclone/rclone/backend/s3.(*Fs).list.func1
         0     0% 98.23%  1579.10MB 99.55%  github.com/rclone/rclone/backend/s3.(*Fs).listDir
         0     0% 98.23%  1509.69MB 95.18%  github.com/rclone/rclone/fs.pacerInvoker
         0     0% 98.23%  1579.10MB 99.55%  github.com/rclone/rclone/fs/list.DirSorted
         0     0% 98.23%  1579.10MB 99.55%  github.com/rclone/rclone/fs/march.(*March).makeListDir.func1
         0     0% 98.23%   768.83MB 48.47%  github.com/rclone/rclone/fs/march.(*March).processJob.func1
         0     0% 98.23%   810.27MB 51.08%  github.com/rclone/rclone/fs/march.(*March).processJob.func2
         0     0% 98.23%  1509.69MB 95.18%  github.com/rclone/rclone/lib/pacer.(*Pacer).Call
         0     0% 98.23%  1509.69MB 95.18%  github.com/rclone/rclone/lib/pacer.(*Pacer).call
jenkins@jenkins-agent-9zb66:~/agent$ go tool pprof -text http://localhost:5572/debug/pprof/heap
Fetching profile over HTTP from http://localhost:5572/debug/pprof/heap
http://localhost:5572/debug/pprof/heap: Get "http://localhost:5572/debug/pprof/heap": read tcp 127.0.0.1:52990->127.0.0.1:5572: read: connection reset by peer
failed to fetch any source profiles
[1]+  Killed                  rclone sync --rc --no-update-modtime --checksum --dry-run --checkers=6 COS_US_SOUTH:ibm-docs-dev/POWER10 COS_US_SOUTH:ibm-docs-stage/POWER10 -v

I have assigned a little bit more memory to the Jenkins agent pod and rclone passed this time, but the last profile I gathered before rclone finished was this:
rclone sync --rc --no-update-modtime --checksum --dry-run --checkers=6 COS_US_SOUTH:ibm-docs-dev/POWER10 COS_US_SOUTH:ibm-docs-stage/POWER10 -v &
Pod memory requests: 4096MB, limits: 5GB

jenkins@jenkins-agent-g0w2f:~/agent$ go tool pprof -text http://localhost:5572/debug/pprof/heap
Fetching profile over HTTP from http://localhost:5572/debug/pprof/heap
Saved profile in /home/jenkins/pprof/pprof.rclone.alloc_objects.alloc_space.inuse_objects.inuse_space.008.pb.gz
File: rclone
Type: inuse_space
Time: Oct 31, 2022 at 3:03pm (UTC)
Showing nodes accounting for 19688.65kB, 100% of 19688.65kB total
      flat  flat%   sum%        cum   cum%
 4608.62kB 23.41% 23.41%  7737.71kB 39.30%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.XMLToStruct
 2560.35kB 13.00% 36.41%  2560.35kB 13.00%  github.com/aws/aws-sdk-go/aws/endpoints.init
 1089.33kB  5.53% 41.94%  1089.33kB  5.53%  regexp/syntax.(*compiler).inst (inline)
 1081.02kB  5.49% 47.44%  1081.02kB  5.49%  bytes.makeSlice
 1024.05kB  5.20% 52.64%  1024.05kB  5.20%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.(*XMLNode).findNamespaces (inline)
  561.50kB  2.85% 55.49%   561.50kB  2.85%  golang.org/x/net/html.init
  540.51kB  2.75% 58.23%  1052.51kB  5.35%  sync.(*Map).Store
  524.09kB  2.66% 60.90%   524.09kB  2.66%  github.com/rclone/rclone/fs/accounting.(*StatsInfo).removeTransfer
  520.04kB  2.64% 63.54%   520.04kB  2.64%  github.com/rclone/rclone/cmd/serve/http/data.GetTemplate
  516.64kB  2.62% 66.16%   516.64kB  2.62%  github.com/gogo/protobuf/proto.RegisterType
     514kB  2.61% 68.77%      514kB  2.61%  github.com/rclone/rclone/fs/march.(*March).processJob
     514kB  2.61% 71.38%      514kB  2.61%  reflect.unsafe_NewArray
  512.88kB  2.60% 73.99%   512.88kB  2.60%  regexp.onePassCopy
  512.75kB  2.60% 76.59%   512.75kB  2.60%  encoding/pem.Decode
  512.50kB  2.60% 79.19%   512.50kB  2.60%  runtime.allocm
  512.20kB  2.60% 81.80%   512.20kB  2.60%  runtime.malg
  512.07kB  2.60% 84.40%   512.07kB  2.60%  net/http.cloneURL (inline)
  512.05kB  2.60% 87.00%   512.05kB  2.60%  github.com/prometheus/client_golang/prometheus.(*goCollector).Describe
  512.02kB  2.60% 89.60%  2105.04kB 10.69%  encoding/xml.(*Decoder).rawToken
  512.01kB  2.60% 92.20%   512.01kB  2.60%  encoding/xml.(*Decoder).name
  512.01kB  2.60% 94.80%  8706.78kB 44.22%  github.com/rclone/rclone/backend/s3.(*Fs).list
  512.01kB  2.60% 97.40%   512.01kB  2.60%  strings.(*Builder).grow (inline)
     512kB  2.60%   100%      512kB  2.60%  sync.newEntry (inline)
         0     0%   100%  1081.02kB  5.49%  bufio.(*Reader).Read
         0     0%   100%  1081.02kB  5.49%  bufio.(*Reader).ReadByte
         0     0%   100%  1081.02kB  5.49%  bufio.(*Reader).fill
         0     0%   100%  1081.02kB  5.49%  bytes.(*Buffer).Grow (inline)
         0     0%   100%  1081.02kB  5.49%  bytes.(*Buffer).grow
         0     0%   100%   512.75kB  2.60%  crypto/tls.(*Conn).HandshakeContext (inline)
         0     0%   100%  1081.02kB  5.49%  crypto/tls.(*Conn).Read
         0     0%   100%   512.75kB  2.60%  crypto/tls.(*Conn).clientHandshake
         0     0%   100%   512.75kB  2.60%  crypto/tls.(*Conn).handshakeContext
         0     0%   100%  1081.02kB  5.49%  crypto/tls.(*Conn).readFromUntil
         0     0%   100%  1081.02kB  5.49%  crypto/tls.(*Conn).readRecord (inline)
         0     0%   100%  1081.02kB  5.49%  crypto/tls.(*Conn).readRecordOrCCS
         0     0%   100%   512.75kB  2.60%  crypto/tls.(*Conn).verifyServerCertificate
         0     0%   100%   512.75kB  2.60%  crypto/tls.(*clientHandshakeState).doFullHandshake
         0     0%   100%   512.75kB  2.60%  crypto/tls.(*clientHandshakeState).handshake
         0     0%   100%   512.75kB  2.60%  crypto/x509.(*CertPool).AppendCertsFromPEM
         0     0%   100%   512.75kB  2.60%  crypto/x509.(*Certificate).Verify
         0     0%   100%   512.75kB  2.60%  crypto/x509.initSystemRoots
         0     0%   100%   512.75kB  2.60%  crypto/x509.loadSystemRoots
         0     0%   100%   512.75kB  2.60%  crypto/x509.systemRootsPool (inline)
         0     0%   100%  2105.04kB 10.69%  encoding/xml.(*Decoder).Token
         0     0%   100%  1081.02kB  5.49%  encoding/xml.(*Decoder).getc
         0     0%   100%   540.51kB  2.75%  encoding/xml.(*Decoder).mustgetc
         0     0%   100%   512.01kB  2.60%  encoding/xml.(*Decoder).nsname
         0     0%   100%   540.51kB  2.75%  encoding/xml.(*Decoder).text
         0     0%   100%  8763.72kB 44.51%  github.com/aws/aws-sdk-go/aws/request.(*HandlerList).Run
         0     0%   100%  8763.72kB 44.51%  github.com/aws/aws-sdk-go/aws/request.(*Request).Send
         0     0%   100%   512.07kB  2.60%  github.com/aws/aws-sdk-go/aws/request.(*Request).SetContext
         0     0%   100%  8763.72kB 44.51%  github.com/aws/aws-sdk-go/aws/request.(*Request).sendRequest
         0     0%   100%   512.07kB  2.60%  github.com/aws/aws-sdk-go/aws/request.setRequestContext (inline)
         0     0%   100%  8763.72kB 44.51%  github.com/aws/aws-sdk-go/private/protocol/restxml.Unmarshal
         0     0%   100%  8763.72kB 44.51%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.UnmarshalXML
         0     0%   100%  1026.01kB  5.21%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.parse
         0     0%   100%  1026.01kB  5.21%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.parseList
         0     0%   100%  1026.01kB  5.21%  github.com/aws/aws-sdk-go/private/protocol/xml/xmlutil.parseStruct
         0     0%   100%  9275.79kB 47.11%  github.com/aws/aws-sdk-go/service/s3.(*S3).ListObjectsWithContext
         0     0%   100%   512.05kB  2.60%  github.com/prometheus/client_golang/prometheus.(*Registry).Register.func1
         0     0%   100%   512.88kB  2.60%  github.com/rclone/rclone/backend/googlephotos.dirPatterns.mustCompile (inline)
         0     0%   100%   512.88kB  2.60%  github.com/rclone/rclone/backend/googlephotos.init
         0     0%   100%  8706.78kB 44.22%  github.com/rclone/rclone/backend/s3.(*Fs).List
         0     0%   100%  9275.79kB 47.11%  github.com/rclone/rclone/backend/s3.(*Fs).list.func1
         0     0%   100%  8706.78kB 44.22%  github.com/rclone/rclone/backend/s3.(*Fs).listDir
         0     0%   100%   520.04kB  2.64%  github.com/rclone/rclone/cmd.Main
         0     0%   100%   520.04kB  2.64%  github.com/rclone/rclone/cmd.initConfig
         0     0%   100%   520.04kB  2.64%  github.com/rclone/rclone/cmd/serve/httplib.NewServer
         0     0%   100%  1052.51kB  5.35%  github.com/rclone/rclone/fs.init.1
         0     0%   100%  9275.79kB 47.11%  github.com/rclone/rclone/fs.pacerInvoker
         0     0%   100%   524.09kB  2.66%  github.com/rclone/rclone/fs/accounting.(*StatsInfo).PruneTransfers
         0     0%   100%   524.09kB  2.66%  github.com/rclone/rclone/fs/accounting.(*Transfer).Done
         0     0%   100%  8706.78kB 44.22%  github.com/rclone/rclone/fs/list.DirSorted
         0     0%   100%      514kB  2.61%  github.com/rclone/rclone/fs/march.(*March).Run.func1
         0     0%   100%  8706.78kB 44.22%  github.com/rclone/rclone/fs/march.(*March).makeListDir.func1
         0     0%   100%  2560.38kB 13.00%  github.com/rclone/rclone/fs/march.(*March).processJob.func1
         0     0%   100%  6146.40kB 31.22%  github.com/rclone/rclone/fs/march.(*March).processJob.func2
         0     0%   100%   520.04kB  2.64%  github.com/rclone/rclone/fs/rc/rcserver.Start
         0     0%   100%   520.04kB  2.64%  github.com/rclone/rclone/fs/rc/rcserver.newServer
         0     0%   100%   524.09kB  2.66%  github.com/rclone/rclone/fs/sync.(*syncCopyMove).pairChecker
         0     0%   100%  8194.78kB 41.62%  github.com/rclone/rclone/lib/pacer.(*Pacer).Call
         0     0%   100%  8735.29kB 44.37%  github.com/rclone/rclone/lib/pacer.(*Pacer).call
         0     0%   100%   520.04kB  2.64%  github.com/spf13/cobra.(*Command).Execute (inline)
         0     0%   100%   520.04kB  2.64%  github.com/spf13/cobra.(*Command).ExecuteC
         0     0%   100%   520.04kB  2.64%  github.com/spf13/cobra.(*Command).execute
         0     0%   100%   520.04kB  2.64%  github.com/spf13/cobra.(*Command).preRun (inline)
         0     0%   100%   544.67kB  2.77%  go.opencensus.io/resource.init
         0     0%   100%   544.67kB  2.77%  go.opencensus.io/trace/tracestate.init
         0     0%   100%  1081.02kB  5.49%  io.(*LimitedReader).Read
         0     0%   100%   520.04kB  2.64%  main.main
         0     0%   100%  1052.51kB  5.35%  mime.TypeByExtension
         0     0%   100%  1052.51kB  5.35%  mime.initMime
         0     0%   100%  1052.51kB  5.35%  mime.initMimeUnix
         0     0%   100%  1052.51kB  5.35%  mime.loadMimeGlobsFile
         0     0%   100%  1052.51kB  5.35%  mime.setExtensionType
         0     0%   100%   512.07kB  2.60%  net/http.(*Request).WithContext (inline)
         0     0%   100%  1081.02kB  5.49%  net/http.(*body).Read
         0     0%   100%  1081.02kB  5.49%  net/http.(*body).readLocked

Perfect, and a tough call for me.

The 3.6K per object and the high memory usage (65+18=83%) by the xmlutil in the AWS SDK do indeed challenge my view of normal rclone memory usage, but I am not expert enough to say whether this is normal for (some of the) S3 backend(s) - or whether we are looking at a potential issue.

What was the rclone command used to provoke the first memory profile? (Preferably added as an edit to the post)

If it isn't too much trouble, then I would also like to see an SVG from (one of) the two profiles; it has a lot more info in it.
(Now heavily inspired by @ncw in this post.)

I have run the same command for both profiles. The difference is the memory allocated to the Kubernetes pod.
The first one, which failed, had memory requests: 2GB, limits: 5GB.
The second one, which worked, had memory requests: 4GB, limits: 5GB.
I have a feeling that rclone was only using those 2GB of memory and was surpassing the 2GB limit without the container getting more memory from the cluster.
Or maybe the container was getting more memory, as I first saw 6GB available and then it dropped to 1.5GB and crashed. But maybe it was not assigned to rclone, or rclone could not allocate that newly obtained memory from the virtual machine. I am just guessing; I'm no pro at VMs or Docker.

I do not know how to generate the SVG image from the profile. I cannot run the -web option, as it is not an internet-facing pod and I would not be able to connect to it with my browser. Maybe with curl from within the shell.

Interesting, they look pretty different, but this could also be down to the timing of the checks and the garbage collector.

I am making educated guesses too - and you have probably tried more than me with VMs and Docker.

I once found a nice man page, but can't find it at the moment.
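If in doubt, one way to see which memory limit the container actually enforces could be to read the cgroup value from inside the pod (a sketch; this is the cgroup v1 path, which should match the 4.15 kernel shown above):

# the memory limit the kernel will enforce for this container (cgroup v1)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes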

It is as simple as using -svg instead of -text or -web, and it requires Graphviz. (I have also only tried -text up till now.)
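In a pod without browser access, something like this might work (a sketch; the pod name and path are placeholders for your setup, and go tool pprof needs Graphviz installed for -svg):

# render the heap profile to an SVG file inside the pod
go tool pprof -svg http://localhost:5572/debug/pprof/heap > heap.svg
# then copy it out, run from outside the pod:
kubectl cp jenkins-agent-9zb66:/home/jenkins/agent/heap.svg ./heap.svg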


There is only 19MB of memory in use according to this profile. Can you get one with more memory in use?
