Using delete on backup-dirs

Hi. I have a script that syncs my home directory to pCloud. I'm using the --backup-dir flag to move locally deleted files to a special directory in the cloud; the script then makes a second rclone call, this time using delete with --min-age, to remove files in my backup-dir that are over a certain age (90 days in this case). Unfortunately, the modification time of a file moved to the backup-dir isn't changed, so if I delete a very old file locally, it is immediately deleted from my backup-dir as well. What I was hoping is that rclone would somehow know that a deleted file was deleted on day x, and that it should only be removed on day x+90. I'm not sure how that could be implemented, though, given that a move is happening and the modification time is preserved. Any thoughts…?
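
For concreteness, the two calls look roughly like this (a Python sketch of what my bash script does; the pcloud: paths are from my setup):

import subprocess

# Pass 1: sync, moving files deleted locally into the backup area on the
# remote. The server-side move preserves modification times, which is the
# problem: an old file deleted today still looks old afterwards.
subprocess.run(["rclone", "sync", "/home/reid",
                "pcloud:rclone_root/current_backup",
                "--backup-dir=pcloud:rclone_root/deleted_backups"], check=True)

# Pass 2: delete anything in the backup area with a mod time over 90 days.
subprocess.run(["rclone", "delete", "pcloud:rclone_root/deleted_backups",
                "--min-age", "90d"], check=True)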

Thanks,
Reid

P.S. I guess my script could “touch” all of the files under a newly-created backup-dir, which would do what I want at the cost of losing the original modification time. I’m not sure if rclone has a “touch” command, though…

I keep meaning to build this into rclone, but what I suggest is that you use different dated backup directories, e.g. 2018-08-01, and then every day delete the one that is 90 days old. This is slightly harder to script, though.
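
For example, in Python (remote:backups and remote:current are placeholders for your own paths):

from datetime import date, timedelta
import subprocess

today = date.today()
backup_dir = "remote:backups/" + today.strftime("%Y-%m-%d")
expired = "remote:backups/" + (today - timedelta(days=90)).strftime("%Y-%m-%d")

# Back up into today's dated directory...
subprocess.run(["rclone", "sync", "/home/reid", "remote:current",
                "--backup-dir", backup_dir], check=True)
# ...then drop the directory that is now 90 days old. No check=True here,
# since the directory won't exist for days on which nothing was deleted.
subprocess.run(["rclone", "purge", expired])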

Or you can use --suffix to keep all the backups in the same directory. If you used the current date as the suffix, you could then run through and find and delete the old ones.
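
A sketch of the suffix approach (placeholder paths again); the cleanup pass just parses the date back out of the file names:

from datetime import datetime, timedelta
import re
import subprocess

# Sync, keeping displaced files in place with today's date appended.
subprocess.run(["rclone", "sync", "/home/reid", "remote:current",
                "--suffix", datetime.now().strftime(".%Y-%m-%d")], check=True)

# Cleanup: list everything and delete files whose date suffix is too old.
cutoff = datetime.now() - timedelta(days=90)
listing = subprocess.run(["rclone", "lsf", "-R", "--files-only", "remote:current"],
                         capture_output=True, text=True, check=True)
for path in listing.stdout.splitlines():
    m = re.search(r"\.(\d{4}-\d{2}-\d{2})$", path)
    if m and datetime.strptime(m.group(1), "%Y-%m-%d") < cutoff:
        subprocess.run(["rclone", "deletefile", "remote:current/" + path],
                       check=True)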

I do plan to implement pretty much exactly this scheme in an rclone backup command. It would use a different directory per time interval and keep as many as you specified, or maybe keep them at exponentially increasing ages (so you have a backup from 1, 2, 4, 8, 16, 32, 64, 128… days ago).
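
The retention logic might look something like this hypothetical helper, which keeps the newest backup in each power-of-two age bucket (plus anything under a day old):

from datetime import date

def backups_to_keep(backup_dates, today=None):
    """Keep the newest backup in each exponentially growing age bucket:
    [1,2), [2,4), [4,8), ... [128,256) days old."""
    today = today or date.today()
    keep = {d for d in backup_dates if (today - d).days < 1}
    bucket = 1
    while bucket <= 128:
        in_bucket = [d for d in backup_dates
                     if bucket <= (today - d).days < bucket * 2]
        if in_bucket:
            keep.add(max(in_bucket))
        bucket *= 2
    return keep

Feeding it the dated directory names would tell you which ones to purge and which to leave alone.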


That could work quite easily: set --backup-dir to backups/2018-08-01, do the backup, then run something like this:

rclone lsf -R --files-only remote:backups/2018-08-01 | xargs -i rclone touch remote:backups/2018-08-01/{}

Unfortunately, rclone’s touch command only touches a single file at the moment, so this will be a bit inefficient. A flag to update the timestamps recursively would be easy enough to add, though.
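
In the meantime you can do the same per file from Python, something like:

import subprocess

backup = "remote:backups/2018-08-01"

# List the dated backup directory recursively, then touch each file in turn
# (rclone touch defaults to setting the modification time to now).
listing = subprocess.run(["rclone", "lsf", "-R", "--files-only", backup],
                         capture_output=True, text=True, check=True)
for path in listing.stdout.splitlines():
    subprocess.run(["rclone", "touch", backup + "/" + path], check=True)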

Thanks for the suggestions. I am using dated directory names for my backup-dir, but I think what I will do is use rclone’s lsd command (or lsjson) to get the modification time of the directories themselves. Until it’s built into rclone itself, I’m facing a parsing problem no matter what, so I might as well switch this over to Python instead of bash. That should work well without the drawback of the “touch” solution. Thanks again!

Reid

So I broke down and converted this into a Python program (my first, really) that does the syncing and deletes any backup-dirs and local log files older than a given number of days. I figure I’ll post it here in case anyone finds it useful.

Thanks for the assistance.

Reid

#!/usr/bin/env python3

import psutil
import os
import pwd
import subprocess
import json
from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse
import pytz
import glob

# Delete any rclone backup-dir entries in the cloud that are older than this, in days.
DELETED_OLD_DAYS_THRESHOLD = 90
# Delete any local log files that are older than this, in days.  If
# you use -v options with rclone, they can get huge.
LOGS_OLD_DAYS_THRESHOLD = 30
# Full path to log directory.
LOG_DIR = "/home/reid/rclone_logs"

RCLONE_NUMBER_CHECKERS = 10
RCLONE_NUMBER_TRANSFERS = 6

DATE = '{date:%Y-%m-%d_%H:%M:%S}'.format(date=datetime.now())
LOG_NAME = "rclone_" + DATE + ".log"
print(LOG_DIR + "/" + LOG_NAME)
def get_username():
    return pwd.getpwuid(os.getuid()).pw_name

def is_running(program):
    # Iterate over all process IDs found by psutil.
    for pid in psutil.pids():
        try:
            p = psutil.Process(pid)
            if program == p.name() and get_username() == p.username():
                return True
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # The process exited (or is inaccessible) between listing and
            # inspection; skip it and keep checking the rest.
            continue
    return False

if is_running('rclone'):
    print("rclone is already running; aborting...")
    exit(1)

# First start the sync process.
syncstart = datetime.now()
result = subprocess.run(["ionice", "-c", "3", "rclone", "-v", "--transfers=" + RCLONE_NUMBER_TRANSFERS, "--checkers=" + RCLONE_NUMBER_CHECKERS, "--delete-during", "--exclude-from", "/home/reid/bin/backup-rclone-excludes.txt", "--backup-dir=pcloud:rclone_root/deleted_backups/old_" + DATE, "--log-file=" + LOG_DIR + "/" + LOG_NAME, "sync", "/home/reid", "pcloud:rclone_root/current_backup"])
#print(result)
syncfinish = datetime.now()

# Now delete any old log files, appending to the same log file as rclone's.
with open(LOG_DIR + "/" + LOG_NAME, "a+") as logf:
    msg = "Started sync at: " + str(syncstart)
    logf.write(msg + "\n")
    msg = "Finished sync at: " + str(syncfinish)
    logf.write(msg + "\n")
    msg = "Timedelta: " + str(syncfinish - syncstart)
    logf.write(msg + "\n")
    msg = "Checking for local log files older than " + str(LOGS_OLD_DAYS_THRESHOLD) + " days to delete..."
    logf.write(msg + "\n")
    log_files = glob.glob(LOG_DIR + "/*.log")
    now = datetime.now()
    thresholdLogs = now - timedelta(days=LOGS_OLD_DAYS_THRESHOLD)
    for f in log_files:
        mtime = os.path.getmtime(f)
        date = datetime.fromtimestamp(mtime)
        if (date < thresholdLogs):
            msg = "Deleting old log file: " + str(f)
            logf.write(msg + "\n")
            os.remove(f)
    

    # Finally, delete any old rclone backup-dir entries in the cloud.
    msg = "Checking for deleted backups older than " + str(DELETED_OLD_DAYS_THRESHOLD) + " days to delete..."
    logf.write(msg + "\n")
    result = subprocess.run(["rclone", "lsjson", "pcloud:rclone_root/deleted_backups"], stdout=subprocess.PIPE)
    data = json.loads(result.stdout)
    now = datetime.now(pytz.utc)
    threshold = now - timedelta(days=DELETED_OLD_DAYS_THRESHOLD)
    for entry in data:
        date = parse(entry["ModTime"])
        if (date < threshold):
            fullPath = "pcloud:rclone_root/deleted_backups/" + entry["Path"]
            result2 = subprocess.run(["rclone", "purge", fullPath])
            logf.write("Deleted " + fullPath + "\n")

Well, I made a last-minute “safe” change to the code I posted, and it turns out that change broke the script. Like I said, this is my first Python program, so I’m still learning. Here is a fixed version. Sorry about the hassle.

#!/usr/bin/env python3

import psutil
import os
import pwd
import subprocess
import json
from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse
import pytz
import glob

# Delete any rclone backup-dir entries in the cloud that are older than this, in days.
DELETED_OLD_DAYS_THRESHOLD = 90
# Delete any local log files that are older than this, in days.  If
# you use -v options with rclone, they can get huge.
LOGS_OLD_DAYS_THRESHOLD = 30
# Full path to log directory.
LOG_DIR = "/home/reid/rclone_logs"

RCLONE_NUMBER_CHECKERS = 16
RCLONE_NUMBER_TRANSFERS = 6

DATE = '{date:%Y-%m-%d_%H:%M:%S}'.format(date=datetime.now())
LOG_NAME = "rclone_" + DATE + ".log"
print(LOG_DIR + "/" + LOG_NAME)
def get_username():
    return pwd.getpwuid(os.getuid()).pw_name

def is_running(program):
    # Iterate over all process IDs found by psutil.
    for pid in psutil.pids():
        try:
            p = psutil.Process(pid)
            if program == p.name() and get_username() == p.username():
                return True
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # The process exited (or is inaccessible) between listing and
            # inspection; skip it and keep checking the rest.
            continue
    return False

if is_running('rclone'):
    print("rclone is already running; aborting...")
    exit(1)

# First start the sync process.
syncstart = datetime.now()
result = subprocess.run(["ionice", "-c", "3", "rclone", "-v", "--transfers=" + str(RCLONE_NUMBER_TRANSFERS), "--checkers=" + str(RCLONE_NUMBER_CHECKERS), "--delete-during", "--exclude-from", "/home/reid/bin/backup-rclone-excludes.txt", "--backup-dir=pcloud:rclone_root/deleted_backups/old_" + DATE, "--log-file=" + LOG_DIR + "/" + LOG_NAME, "sync", "/home/reid", "pcloud:rclone_root/current_backup"])
#print(result)
syncfinish = datetime.now()

# Now delete any old log files, appending to the same log file as used by rclone.
with open(LOG_DIR + "/" + LOG_NAME, "a+") as logf:
    msg = "Started sync at: " + str(syncstart)
    logf.write(msg + "\n")
    msg = "Finished sync at: " + str(syncfinish)
    logf.write(msg + "\n")
    msg = "Timedelta: " + str(syncfinish - syncstart)
    logf.write(msg + "\n")
    msg = "Checking for local log files older than " + str(LOGS_OLD_DAYS_THRESHOLD) + " days to delete..."
    logf.write(msg + "\n")
    log_files = glob.glob(LOG_DIR + "/*.log")
    now = datetime.now()
    thresholdLogs = now - timedelta(days=LOGS_OLD_DAYS_THRESHOLD)
    for f in log_files:
        mtime = os.path.getmtime(f)
        date = datetime.fromtimestamp(mtime)
        if (date < thresholdLogs):
            msg = "Deleting old log file: " + str(f)
            logf.write(msg + "\n")
            os.remove(f)
    

    # Finally, delete any old rclone backup-dir entries in the cloud.
    msg = "Checking for deleted backups older than " + str(DELETED_OLD_DAYS_THRESHOLD) + " days to delete..."
    logf.write(msg + "\n")
    result = subprocess.run(["rclone", "lsjson", "pcloud:rclone_root/deleted_backups"], stdout=subprocess.PIPE)
    data = json.loads(result.stdout)
    now = datetime.now(pytz.utc)
    threshold = now - timedelta(days=DELETED_OLD_DAYS_THRESHOLD)
    for entry in data:
        date = parse(entry["ModTime"])
        if (date < threshold):
            fullPath = "pcloud:rclone_root/deleted_backups/" + entry["Path"]
            result2 = subprocess.run(["rclone", "purge", fullPath])
            logf.write("Deleted " + fullPath + "\n")