We are trying to use rclone to sync a huge bucket in Alibaba Cloud with about 40,000,000 objects. We run rclone on an ECS instance with about 32GB of memory. When I monitor the memory usage, it keeps climbing.
After about 8,000,000 objects are synced, rclone crashes with an out-of-memory error. It seems that each object uses about 8K of memory, and the memory is never freed.
What is your rclone version (output from rclone version)
1.49.3
Which OS you are using and how many bits (eg Windows 7, 64 bit)
CentOS 7.3, 64 bit
Which cloud storage system are you using? (eg Google Drive)
Alibaba cloud OSS
The command you were trying to run (eg rclone copy /tmp remote:tmp)
rclone sync
A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)
How can rclone free the memory used by already-synced objects?
I am using Java to wrap rclone as a tool. I added GOGC=20 and set up 4 environments as follows:
Env1: 16GB memory. The previous code without GOGC setting
Env2: 16GB memory. GOGC = 20
Env3: 32GB memory. The previous code without GOGC setting
Env4: 32GB memory: GOGC = 20
Result:
Env1: The memory reaches 100% after 15 minutes.
Env2: The memory reaches 100% after 4.5 hours.
Env3: The memory climbs steadily; it is now at about 85%, and I expect it will run out of memory a few hours later.
Env4: Similar to Env3
I don't think the "GOGC" environment variable can solve this issue. Here is the wrapper code:
List<String> options = new ArrayList<>();
options.add(baseDirectory + "/bin/rclone");
options.add("sync");
options.add(src);
options.add(dest);
options.add("--config=" + baseDirectory + "/conf/rclone.conf");
options.add("--s3-provider=Alibaba");
options.add("--s3-access-key-id=" + ak);
options.add("--s3-secret-access-key=" + sk);
options.add("--s3-endpoint=" + setDefaultHttp(endpoint));
options.add("--no-check-certificate");
options.add(String.format("--retries=%s", RETRIES));
options.add(String.format("--retries-sleep=%ss", RETRIES_SLEEP));
options.add(String.format("--stats=%ss", STATS));
options.add("--transfers=" + numOfThreads);
options.add("--log-level=" + rcloneLogLevel);
options.add("--log-file=" + baseDirectory + "/logs/rclone/" + "backup_" + dateFormatter.format(new Date()) + "_" + sourcePart + ".log");
options.add("-P");
print(options);
ProcessBuilder processBuilder = new ProcessBuilder(options);
Map<String, String> env = processBuilder.environment();
env.put("GOGC", "20"); // lower the GC target to trade CPU for memory
Process p;
try {
    p = processBuilder.start();
} catch (IOException ex) {
    log.error("Failed to start rclone with error {}", StringUtility.getTrace(ex));
    throw ex;
}
I think I responded to you on the Slack group, but here is a longer explanation.
Rclone buffers each directory in memory while it is transferring data. Each object in a directory uses 0.5-1k of memory, so a directory with 23M objects in it will use up to 23GB of memory.
However, rclone checks multiple directories at once, and I think you have two very large directories, so you can help by using --checkers 1, which will cause rclone to load only one of the very large directories at a time.
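Applied to the Java wrapper above, that suggestion is just one more entry in the options list. A small sketch (--checkers is a real rclone flag, default 8; the helper and class names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

public class CheckersOption {
    // Setting --checkers to 1 means only one directory listing is
    // being loaded into memory at a time.
    static List<String> withSingleChecker(List<String> options) {
        List<String> out = new ArrayList<>(options);
        out.add("--checkers=1");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withSingleChecker(List.of("rclone", "sync", "src:", "dst:")));
    }
}
```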
I have set --checkers and the parameter works well. I have two huge folders: Folder1 (acesure) has about 10,000,000 files and Folder2 (webRdp) has about 20,000,000 files. With --checkers set to 1, I expected that when rclone finished syncing the first folder, it would free all the memory for that folder before continuing with the second, so the server would only need enough memory to hold the biggest folder. But it doesn't seem to work that way: when rclone finished syncing the first folder, it did not free the memory.
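One workaround, under the assumption that the OS reclaims all of a child process's memory when it exits: run a separate rclone process per top-level folder instead of one sync over the whole bucket, so peak usage is bounded by the largest single folder. A sketch (class and method names are hypothetical; the folder names are the ones from this post):

```java
import java.io.IOException;
import java.util.List;

public class PerFolderSync {
    // Runs one rclone process per top-level folder. Each child's memory
    // is returned to the OS when it exits, so the folders never hold
    // memory at the same time.
    static void syncFolders(String rcloneBin, String src, String dest,
                            List<String> folders) throws IOException, InterruptedException {
        for (String folder : folders) {
            ProcessBuilder pb = new ProcessBuilder(
                    rcloneBin, "sync",
                    src + "/" + folder, dest + "/" + folder,
                    "--checkers=1");
            pb.inheritIO(); // pass rclone's output straight through
            int code = pb.start().waitFor(); // wait so peak memory never overlaps
            if (code != 0) {
                throw new IOException("rclone failed for " + folder + " (exit " + code + ")");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // e.g. syncFolders("rclone", "remote:bucket", "remote2:backup",
        //                  List.of("acesure", "webRdp"));
    }
}
```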