What is the problem you are having with rclone?
We run rclone via the python_rclone library and have hit a failure mode you may be interested in.
The issue was encountered during an automated process that copies files from an incoming S3 bucket to an outgoing S3 bucket, using the include option on the copy to match/filter files. The trigger turned out to be an error in our IAM role policy (an incorrect s3:prefix), which prevented rclone from listing the source bucket. Somehow this resulted in the --include ** wildcard being expanded against the current working directory (which was the Apache Airflow home dir), so all of its files were added as additional arguments to the rclone command.
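For illustration only (this is an assumption about the mechanism, not something confirmed from the rclone or python_rclone source): if the wrapper hands the command to a shell as a single string, an unquoted ** is glob-expanded against the working directory before rclone ever parses its arguments. A minimal Python sketch, with echo standing in for rclone:

```python
import os
import subprocess
import tempfile

# Simulate the Airflow home directory with a few files in it.
workdir = tempfile.mkdtemp()
for name in ("airflow.cfg", "dags", "logs"):
    open(os.path.join(workdir, name), "w").close()

# shell=True runs the string through /bin/sh, which performs glob expansion.
# 'echo' stands in for rclone so we can inspect the argv the shell builds.
out = subprocess.run(
    "echo copy src:bucket dst:bucket --include **",
    shell=True, cwd=workdir, capture_output=True, text=True,
).stdout.split()

print(out)  # '**' is gone; the directory contents appear as extra arguments
```

If this is what happens inside the wrapper, it would explain why rclone received the Airflow home directory's contents as non-flag arguments.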
As you can see from the error message in the log:
Command copy needs 2 arguments maximum: you provided 12 non flag arguments: ["s3-source:source-bucket/incoming" "s3-destination:destination-bucket/outgoing/" "airflow.cfg" "config" "customer_stored_env" "dags" "logs" "mwaa_stored_env" "plugins" "requirements" "startup" "webserver_config.py"]
The following are files and directories from the Airflow home folder:
• airflow.cfg
• config
• customer_stored_env
• dags
• logs
• mwaa_stored_env
• plugins
• requirements
• startup
• webserver_config.py
So it does appear that the permission error is triggering incorrect and possibly dangerous behaviour in the rclone client.
Run the command 'rclone version' and share the full output of the command.
rclone: Version v1.66.0
N.B. This was run via an automated process, so I could not run the command directly; the version above is taken from the -vv debug log.
Which cloud storage system are you using? (eg Google Drive)
Amazon S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
This command was generated via the python_rclone library on a production system and I have changed the names for the remotes and buckets.
rclone copy s3-source:source-bucket/incoming s3-destination:destination-bucket/outgoing/ --ignore-existing --verbose --max-age 3d --checkers 25 --cache-workers 25 -vv --include ** --exclude exclude_nothing
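As a sanity check (echo stands in for rclone here, and the wrapper's real invocation path is an assumption on my part): when the arguments are passed as a list with no shell involved, the ** pattern reaches the command verbatim, regardless of what is in the working directory:

```python
import os
import subprocess
import tempfile

# Simulate a working directory that contains a stray file.
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, "airflow.cfg"), "w").close()

# Passing argv as a list (shell=False) skips shell glob expansion entirely,
# so '--include' is followed by the literal '**' pattern.
argv = ["echo", "copy", "src:bucket", "dst:bucket", "--include", "**"]
out = subprocess.run(argv, cwd=workdir, capture_output=True, text=True).stdout.split()

print(out)  # ['copy', 'src:bucket', 'dst:bucket', '--include', '**']
```

Quoting the pattern in the flag string (--include '**') would likely have the same protective effect if a shell is unavoidable.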
The rclone config contents with secrets removed.
N/A (not required for this issue).
A log from the command with the -vv flag
[2024-06-12, 03:33:32 UTC] {{s3.py:89}} INFO - Running rclone.copy s3-source:source-bucket/incoming to s3-destination:destination-bucket/outgoing/ with ['--ignore-existing', '--verbose', '--max-age 3d', '--checkers 25', '--cache-workers 25', '-vv', '--include **', '--exclude exclude_nothing']
[2024-06-12, 03:33:32 UTC] {{logging_mixin.py:188}} INFO - Copying incoming to outgoing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
2024/06/12 03:33:32 DEBUG : --max-age 3d to 2024-06-09 03:33:32.404070352 +0000 UTC m=-259199.931675387
2024/06/12 03:33:32 ERROR : Using --filter is recommended instead of both --include and --exclude as the order they are parsed in is indeterminate
2024/06/12 03:33:32 DEBUG : rclone: Version "v1.66.0" starting with parameters ["rclone" "copy" "--progress" "s3-source:source-bucket/incoming" "s3-destination:destination-bucket/outgoing/" "--ignore-existing" "--verbose" "--max-age" "3d" "--checkers" "25" "--cache-workers" "25" "-vv" "--include" "airflow-worker.pid" "airflow.cfg" "config" "customer_stored_env" "dags" "logs" "mwaa_stored_env" "plugins" "requirements" "startup" "webserver_config.py" "--exclude" "exclude_nothing"]
Usage:
rclone copy source:path dest:path [flags]
...
Use "rclone help backends" for a list of supported services.
Command copy needs 2 arguments maximum: you provided 12 non flag arguments: ["s3-source:source-bucket/incoming" "s3-destination:destination-bucket/outgoing/" "airflow.cfg" "config" "customer_stored_env" "dags" "logs" "mwaa_stored_env" "plugins" "requirements" "startup" "webserver_config.py"]
...