Prometheus failed to write data

Jeff_Jia · October 11, 2022, 5:49am

What is the problem you are having with rclone?

Prometheus fails to write data to a filesystem mounted by rclone.

Run the command 'rclone version' and share the full output of the command.

rclone v1.59.2
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-50-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.18.6
- go/linking: static
- go/tags: none

Which cloud storage system are you using? (eg Google Drive)

minio

The command you were trying to run (eg `rclone copy /tmp remote:tmp`)

rclone \
    -vv  \
    --s3-provider Minio \
    --s3-endpoint http://10.43.6.247:9000 \
    --s3-access-key-id minio \
    --s3-secret-access-key supersecret \
    --alias-remote local \
    mount \
    --async-read \
    --attr-timeout=1s \
    --daemon \
    --log-file=01.log \
    --daemon-timeout=5s \
    --daemon-wait=10s \
    --debug-fuse \
    --dir-cache-time=5m \
    --poll-interval=1m \
    --vfs-cache-max-age=1h \
    --vfs-cache-poll-interval=1m \
    --vfs-fast-fingerprint \
    --vfs-cache-mode=full \
    --vfs-write-back=5s \
    --no-modtime \
    --write-back-cache \
    local:/test  /data/prometheus

prometheus error logs

https://bpa.st/TKMA

Ole · October 11, 2022, 8:53am

Hi Jeff,

What makes you think that the error is caused by rclone?

Do you see the same issue if using a minimum of flags/parameters?
(That is, try removing --async-read, --attr-timeout, --daemon-timeout, --daemon-wait, --dir-cache-time, --poll-interval, --vfs-cache-max-age, --vfs-cache-poll-interval, --vfs-write-back and --write-back-cache.)
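A minimal sketch of how the suggested flags could be filtered out of an argument list, in case it helps with the test. The helper name `strip_suspect_flags` and the variable `SUSPECT_FLAGS` are made up for illustration and are not part of rclone; the sketch only handles the `--flag` and `--flag=value` forms used in the command above.

```shell
# Flags suggested for removal (names only, without values).
SUSPECT_FLAGS="--async-read --attr-timeout --daemon-timeout --daemon-wait"
SUSPECT_FLAGS="$SUSPECT_FLAGS --dir-cache-time --poll-interval --vfs-cache-max-age"
SUSPECT_FLAGS="$SUSPECT_FLAGS --vfs-cache-poll-interval --vfs-write-back --write-back-cache"

# Hypothetical helper: print every argument on its own line, except those
# whose flag name (the part before "=") appears in SUSPECT_FLAGS.
# Note: "arg" and "name" are plain globals, since POSIX sh has no "local".
strip_suspect_flags() {
    for arg in "$@"; do
        name="${arg%%=*}"
        case " $SUSPECT_FLAGS " in
            *" $name "*) ;;               # suspect flag: drop it
            *) printf '%s\n' "$arg" ;;    # anything else: keep it
        esac
    done
}
```

Passing the mount flags from the original command through `strip_suspect_flags` would leave --daemon, --log-file=01.log, --debug-fuse, --vfs-fast-fingerprint, --vfs-cache-mode=full and --no-modtime.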

What is the redacted output of

rclone config show local:
rclone config show yourMinioRemote:

Jeff_Jia · October 11, 2022, 11:47am

It seems this problem is related to the kernel: these options work on x86_64, but not on my firefly-rk3588 (arm64). I'll find out why and post the result here.

Ole · October 11, 2022, 11:56am

Great finding! My suspects are --daemon-timeout, --daemon-wait and --write-back-cache

Ole · October 11, 2022, 12:11pm

... and --attr-timeout if the file is also being changed on the remote while written by Prometheus.

From the docs (with my emphasis):

The kernel can cache the info about a file for the time given by --attr-timeout. You may see corruption if the remote file changes length during this window. It will show up as either a truncated file or a file with garbage on the end. With --attr-timeout 1s this is very unlikely but not impossible. The higher you set --attr-timeout the more likely it is. The default setting of "1s" is the lowest setting which mitigates the problems above.

and the error you see:

panic: write header: write /data/prometheus/chunks_head/000001.tmp: bad file descriptor

Jeff_Jia · October 12, 2022, 7:38am

It turned out to be a false alarm; these options work well on ARM64 too.
I think the error is caused by my csi-s3-driver: when the driver pod gets recreated, the rclone mount process is terminated, so Prometheus can no longer write data to the mount.
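For anyone hitting the same thing, here is a rough sketch of a check that could confirm whether the mount is still alive before blaming the flags. The function names are hypothetical, and it assumes Linux, where an rclone mount usually shows up in /proc/mounts with a filesystem type beginning with "fuse". When the FUSE process dies, the entry can stay listed while accesses fail, so the second helper also tries a directory listing.

```shell
# Hypothetical helper: succeed if $1 is listed as a FUSE mount point.
# $2 is the mounts table to read (defaults to /proc/mounts).
# Fields per line: device mountpoint fstype options dump pass.
# Caveat: paths containing spaces are escaped in /proc/mounts (\040),
# which this sketch does not handle.
is_fuse_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^fuse/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Hypothetical helper: the mount counts as healthy only if it is listed
# AND a directory listing succeeds.
mount_is_healthy() {
    is_fuse_mounted "$1" "${2:-/proc/mounts}" && ls "$1" >/dev/null 2>&1
}
```

Something like `mount_is_healthy /data/prometheus || echo "rclone mount is gone"` could be run from the Prometheus pod or a liveness probe.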

Ole · October 12, 2022, 7:50am

Thanks for the update!
Good to know and glad you have found a possible explanation.
