System freezes when running rclone sync

The system works perfectly until I run the following, at which point it freezes:
rclone sync --log-level INFO --log-file /home/foo/bar/rclone.log /home/foo/bar/fee/ fofum:bucket

Once the freeze happens, I cannot interact with rclone in any way without freezing the system again. I can run rclone config, and that is it. If I run rclone ls fofum:bucket, the system freezes. If I log in as a different user and run rclone config, everything works perfectly, and I can use rclone ls to list the contents of the bucket without errors. As soon as I run the rclone sync command above, the system freezes and that user cannot use rclone anymore either.

If I create a new system, install nothing but rclone and run the same command I get the same results.

System is a VM on an ESXi 6.5 host, allocated 16GB RAM, 64GB disk, 4 vCPU. System is dedicated to running scripts, mostly Bash and Python scripts that move and manipulate data, run reports, and similar. Nothing unusual about the system at all. Ubuntu 18.04.x server with most of the utilities you would expect for running various Bash and Python scripts, LibreOffice headless, and mysql-client. System was running perfectly. System is only a few months old, a fresh install done when migrating from a physical system to a VM. Tried it on a different ESXi host with a new VM, same issue.

Contents of /home/foo/bar/fee/ is approximately 120 files in different directories, total around 200MB, nothing but PDF files.

I'm not new to Linux, been doing Unix/Linux Sysadmin for around 30 years. Never seen this before. Checked all the usual logs, including the ESXi logs, absolutely nothing unusual. Bizarre.

Anyone seen anything similar?

hello and welcome to the forum,

if you have a question, when creating the topic, it is good to use the question template.
it would have asked you the following questions.
can you please answer them so we can better help you?

can you make sure you are using the latest version v1.51.0?

can you re-run the command and use --log-level=DEBUG and check the log?

you mentioned that you created a new system, did you copy the rclone.conf file from the first server to the second server? if so, perhaps create a new config?

what is the version of the linux kernel you are running?

What is the problem you are having with rclone?

What is your rclone version (output from rclone version)

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Which cloud storage system are you using? (eg Google Drive)

The command you were trying to run (eg rclone copy /tmp remote:tmp)

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)
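
For anyone filling in the template, the kernel, OS, and rclone version details can be collected in one go. A minimal sketch, assuming a Linux box with /etc/os-release present; report_env is a hypothetical helper name, not part of rclone:

```shell
#!/usr/bin/env bash
# report_env: print the details the question template asks for.
# Hypothetical helper for illustration only.
report_env() {
    echo "kernel: $(uname -r)"
    echo "arch: $(uname -m)"
    # /etc/os-release is present on most modern distros (assumption).
    if [ -r /etc/os-release ]; then
        ( . /etc/os-release; echo "os: ${PRETTY_NAME:-unknown}" )
    fi
    # Only call rclone if it is actually installed.
    if command -v rclone >/dev/null 2>&1; then
        rclone version
    else
        echo "rclone: not found in PATH"
    fi
}
```

Calling report_env and pasting the output covers the version, OS, and kernel questions in a single reply.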

The version installed on the regular server is : rclone v1.51.0-021-g5c869d5b-beta
The version installed on the fresh VM is : rclone v1.51.0
The regular server initially had the stable version, but the beta was installed in an attempt to fix the problem.

What is the problem you are having with rclone?

After attempting an rclone sync the system freezes; subsequent attempts to do anything with rclone cause the same problem. Two different systems, one a fresh install with near zero modification, same issue.

What is your rclone version (output from rclone version )

Original system :
rclone v1.51.0-021-g5c869d5b-beta
- os/arch: linux/amd64
- go version: go1.13.7

New system :
rclone v1.51.0
- os/arch: linux/amd64
- go version: go1.13.7

Which OS you are using and how many bits (eg Windows 7, 64 bit)

Ubuntu 18.04.3 Server

The command you were trying to run (eg rclone copy /tmp remote:tmp )

rclone sync --log-level INFO --log-file /home/foo/bar/rclone.log /home/foo/bar/fee/ fofum:bucket
(foo/bar/fee/fofum are different of course)

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp )

No log, system freezes. Nothing written to the log file specified in the original command. Nothing output to the screen. System completely locked up. Cannot access via network or console. System has to be hard reset.

As a final test I just built ANOTHER new system. Fresh VM with the same configuration: 16GB RAM, 4 vCPU, 64GB disk. Freshly downloaded copy of the Ubuntu 18.04.3 server ISO. Installed the OS and made absolutely NO changes to the system. Copied the files to /home/foo/bar/fee in the default user's home directory. Installed rclone using the script from the website. Ran rclone config to set up the connection to the S3 bucket. Tested the connection using the following command:
rclone ls fofum:bucket

Got results as expected. Ran the rclone sync command and the system froze. After a hard reset, tried to run the rclone ls command again and the system froze. Complete system lockup. ESXi is running fine, other VMs on that host are running normally, and ESXi reports normal resource utilization for the VM, but the entire VM is in a frozen state.

1. you need to run the command using the debug log.

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp )

2. what is the linux kernel version?

Can you run the command in the foreground without using the log-file parameter?

rclone sync -vv /home/foo/bar/fee/ fofum:bucket

When it freezes, you aren't seeing any high CPU / memory or disk utilization? When you restart the system, are you seeing anything in the logs on what caused the freeze?

It's pretty odd for an application to cause a whole Linux box to freeze up.

do you think this is important?
"Login under a different user, run rclone config, everything works perfectly, I can use rclone ls to list the contents of the bucket without errors"

Yes, but the second I issue the command 'rclone sync.....' that stops working as well. That user can no longer use rclone. Create another user, repeat the process, and it looks fine until we try to sync, then it's dead.

Is there some hidden cache mechanism on the system, created or utilized by rclone, I'm not noticing? I was going to create another VM and see what changed in the entire file system. This is truly bizarre.
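
One way to do that before/after comparison is to snapshot a directory tree, run a command, and compare. A rough sketch, assuming GNU find (for -printf); snapshot and changed_since are hypothetical helper names, and the paths in the usage note are only examples:

```shell
#!/usr/bin/env bash
# snapshot DIR OUTFILE: record path, size and mtime of every file under DIR.
# Assumes GNU find; -xdev keeps the walk on one filesystem.
snapshot() {
    LC_ALL=C find "$1" -xdev -type f -printf '%p\t%s\t%T@\n' 2>/dev/null \
        | LC_ALL=C sort > "$2"
}

# changed_since BEFORE AFTER: print entries that are new or modified in AFTER.
# comm -13 keeps only the lines unique to the second (sorted) file.
changed_since() {
    LC_ALL=C comm -13 "$1" "$2"
}
```

For example: snapshot /home/foo /var/tmp/before.txt, run rclone ls, then snapshot /home/foo /var/tmp/after.txt and changed_since /var/tmp/before.txt /var/tmp/after.txt. Since the box hard-freezes on sync, the second snapshot is only practical for commands that actually complete.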

I opened two terminals, hoping to catch some resource utilization increase using htop since it updates fairly quickly. But it froze at the same second. ESXi shows resource utilization and it is normal, as if the system is running normally, then it drops to zero.

I've checked syslog, kernel log, etc. and it is like the system was turned off at the second I start the command. Absolutely no entries to the log files.

This is driving me a little crazy since I don't see anyone else reporting the problem. Two different ESXi hosts, operating normally for all other VMs on the system, running clean OS installs, freezes the second we try anything. Literally everything else on the system works perfectly. I even ran stress on the last VM to make sure there wasn't something unusual going on but it has run for an hour now with the system maxed and no problems.

not unless you choose to use the cache backend module, or mount the remote with --vfs-cache-mode writes. These are not on by default, and if they are not used then nothing is cached. Only a few small (16MB default) transfer buffers are in memory at any given time.

An rclone crash is rare. A full freeze is something I have never heard of...

what is the linux kernel version rclone is running under?

Hmm - I see that this is VMware based. I can definitely confirm that rclone works with VMware virtualization for Ubuntu specifically. That said - I use VMware Player on a Windows host - so there are probably low-level differences aplenty there.

Since you are a technical user it seems fair we call upon @ncw here. He might have some idea on how you can gather useful data for debugging the issue.

Aside from whatever he might suggest, all I can really suggest is running a test on a different virtualized OS to see if it's related to the guest OS or to the virtualization layer itself. If another quite-different linux distro doesn't do the trick then go ham and try a trial windows install. If the problem persists even there then I would think you can definitely place the problem outside the virtualized OS.

I agree that it is a very strange case.

When you run that in a terminal, you get no output at all? It just instantly hangs?

Correct, the system immediately freezes. Nothing is written to the log files, nothing is displayed on the screen.

The VMware ESXi virtual console freezes, any SSH sessions freeze, and the system will not respond to ping. I left a script running in a screen session that was writing 'date >> /home/foo/timedate.log' every 5 seconds to see if the system was still functioning and maybe the virtual terminals had overloaded, but that stopped writing at the same time. Just in case, I scheduled something similar to run every 5 seconds, but it stopped writing as well.
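
The heartbeat test described above can be sketched as a small loop. This is an illustration, not the exact script that was run: heartbeat is a hypothetical function name, and the sync call is an added assumption so that the last timestamp is flushed to disk and has a better chance of surviving a hard reset.

```shell
#!/usr/bin/env bash
# heartbeat LOGFILE [INTERVAL_SECONDS] [COUNT]
# Appends a timestamp to LOGFILE every INTERVAL_SECONDS (default 5).
# COUNT=0 (the default) means loop until interrupted.
heartbeat() {
    local logfile=$1 interval=${2:-5} count=${3:-0} i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S' >> "$logfile"
        sync    # assumption: flush now so the entry is on disk before a freeze
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Run as heartbeat /home/foo/timedate.log inside screen, the last line in the file after a reset marks roughly when the machine stopped, to within one interval.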

I'm going to try running this on a 19.10 desktop system I am installing now in case it is some bizarre kernel issue.

there have been some issues with rclone and certain linux kernels.
what kernel version is rclone running under?

Sorry, I already blew away the other new VMs I created - currently installing latest Debian 10 from iso I just downloaded.

The regular script server is currently : 4.15.0-76-generic

Can you try running an strace on it? That has to generate some output before it locks up.

strace -f -o /tmp/rclone.strace.txt rclone ......

Yes, I'll do that tonight.

But, I just finished loading two different new VMs. One running Debian 10 minimal and the other running Ubuntu 19.10 server. Absolutely nothing done to them other than installing sudo, curl, openssh-server and rclone (using script from website). I was able to sync without error on both of those.

So, maybe something strange with the kernel. I'm going to run the strace tonight and I'll upload the results here.

I think it is very likely a kernel problem - a userspace program like rclone shouldn't be able to kill the whole machine.

Unless possibly it was swapping very hard?
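
Swap pressure is easy to rule in or out from a second terminal before the freeze. A minimal sketch, assuming Linux and /proc/meminfo; swap_used_kb is a hypothetical helper name, and the optional argument only exists so the function can be pointed at a saved copy of meminfo:

```shell
#!/usr/bin/env bash
# swap_used_kb [MEMINFO_FILE]: print swap currently in use, in kB.
# Reads /proc/meminfo by default (Linux only).
swap_used_kb() {
    awk '/^SwapTotal:/ {t = $2}
         /^SwapFree:/  {f = $2}
         END           {print t - f}' "${1:-/proc/meminfo}"
}
```

Running vmstat 1 alongside and watching the si/so columns gives the same information as a live rate.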

So, I tried the strace -f as advised by Animosity022. Nothing was written to that file. This was a brand new install. First login. Default installation of Ubuntu 18.04.3 server. I recorded a video of the process to show what is happening, and what is shown in ESXi. All you'll see is me replicating the process; the most exciting thing was watching the CPU usage jump to around 30% when the system froze.

I need to blur a few things on the video then I'll post a link. But, since I can't replicate the problem on newer versions of Ubuntu or Debian I am going to assume this is an issue specific to 18.04.3 in my environment.

Not sure any of you should spend time on this since 20.04 should be released in April, but a lot of systems are still running 18.04, so maybe someone else will experience this problem?

Video I mentioned before. Nothing startling here, just a recording of it happening and showing various resource utilization. I blurred out some information on the screen to hide the access keys, etc. but you should be able to see what is going on otherwise. At the end of the video the system was completely frozen - no console access, no ssh access, not answering to ping. I left the system like that for an hour to make sure it wasn't going to time out or finish processing. During that time the rclone sync ran normally on an Ubuntu 19.10 system on the same ESXi server, and completed in under 1 minute.