Backblaze B2 humming along as a replacement for ACD

The setup is fairly vanilla.

I point the data disks at the mount point folders for the individual disks — not the pool.

I store the parity information on other non-data disks.

I exclude these:

exclude *.unrecoverable
exclude Thumbs.db
exclude $RECYCLE.BIN
exclude \System Volume Information
exclude \Program Files
exclude \Program Files (x86)
exclude \Windows
exclude .covefs

I spread my .content files on multiple disks outside of the pool – including the Boot OS SSD.
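Pulled together, the whole layout corresponds to a snapraid.conf along these lines (drive letters, mount folders, and disk names here are hypothetical, not my actual config):

```
# Parity stored on a dedicated non-data disk
parity P:\snapraid.parity

# .content files spread across multiple disks outside the pool,
# including the boot OS SSD
content C:\snapraid\snapraid.content
content D:\snapraid\snapraid.content
content E:\snapraid\snapraid.content

# Data disks point at the per-disk mount folders, not the pool
data d1 C:\Mounts\Disk1
data d2 C:\Mounts\Disk2
data d3 C:\Mounts\Disk3
```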

I leave DrivePool balancing on, which in theory will move a bunch of files around but in practice doesn't. Occasionally I'll get a message during a snapraid sync that a file has moved.

I consider the SnapRaid layer to sit UNDER the DrivePool layer. SnapRaid gives me silent bit-rot detection and correction, plus an easy way to read-scrub the disks.

As for failure recovery, I'm not sure whether, if a disk actually failed, I would go through the trouble of recreating it via SnapRaid. I wouldn't lose anything thanks to DrivePool's duplication. I suppose a file evacuation could trigger some large-scale moves, which would screw with the SnapRaid parity.

I’m still optimizing my setup, and I’m still learning too!

Keith


Thanks for the explanation.
In the past I also thought about pairing DrivePool with SnapRaid, but the solution didn't seem optimal to me. In the future I will migrate to ZFS, which has quite thorough data protection.

For mostly static content, where the bulk of the data isn't changing, SnapRaid is great. I mostly make additions only, and only a few times a week at most. Any time I change a decent amount, I do a sync. The non-real-time nature of SnapRaid is fine for me. Some people script it as a cron job, but I usually just run the command manually after making big changes. It's two words, "snapraid sync"; it runs in the background and does the business.
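For those who do script it, a crontab fragment might look like this (Linux paths, purely illustrative; the same idea works with Windows Task Scheduler):

```
# Nightly parity sync at 03:00; weekly scrub on Sundays at 05:00
0 3 * * *  /usr/local/bin/snapraid sync  >> /var/log/snapraid.log 2>&1
0 5 * * 0  /usr/local/bin/snapraid scrub >> /var/log/snapraid.log 2>&1
```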

http://www.snapraid.it/ does a fine job of describing the benefits and ideal use-cases for it.

While I’m mostly in the minority here, and in other forums, I simply don’t like standard RAID or alternative file systems. The idea that the entire “thing” (pool, array, whatever) can be rendered completely unreadable due to a partial failure is just an untenable state for me. I can just imagine getting some cryptic error message and my stuff is just not accessible due to some weird synchronization problem. There are nightmare examples all over the place.

Of course you have a backup (as I do for my setup), but why put yourself in jeopardy?

With both DrivePool and SnapRaid, you can simply yank a drive out, and in the worst case you lose only the files on that drive. And dd_rescue can go to work on the pulled drive, with other NTFS-aware applications used for the recovery.

Don't like DrivePool? Just stop using it. Zero dependencies on any external metadata. You don't need to copy your data off or "migrate" your stuff. Take your DrivePool disks out of the original machine, plug them into another Windows box WITHOUT DrivePool, and all of your data is immediately accessible. If you use the duplication feature, you might have multiple copies of the same files; big deal.

Don't like SnapRaid? Just stop using it. Nothing gets "installed": it's a single executable in a directory. Delete the parity files and just never run the executable again. Nothing to rebuild or migrate.

Want to add a disk of arbitrary size containing existing files? Add the drive to DrivePool (takes a few seconds), then MOVE your files into the created directory (also a super-fast operation). Add a single line to your snapraid.conf file, and just run a snapraid sync.
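On the SnapRaid side, that single line really is all there is to it (disk name and mount path are hypothetical):

```
# snapraid.conf: one added line for the new disk,
# then run "snapraid sync"
data d4 C:\Mounts\Disk4
```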

Want to remove a disk? Drivepool automatically moves the data off the disk to other drives on command. If the drive contains just duplicated data, do a “fast remove” and allow the duplication to occur in the background.

I also believe most SnapRaid operations scale with the amount of data stored, not the disk capacity.

Everything sits on top of a native file system.

These are the reasons I really like DrivePool + SnapRaid.


We are together in the same minority :)
RAID and ZFS are good right up until things start failing in any way other than a broken disk. If single-disk performance is good enough and you don't care about high availability/uptime (in other words, if you just store your pictures or your "Linux ISOs"), just don't over-complicate things.


Well, there are three of us…

I've had/seen RAID cause more issues than it solved. People lost tons of data because the RAID controller died: the disks were fine, but they couldn't read them. I finally gave in and used RAID1 in a little NAS box, figuring at least that might be OK.

One disk died. While I was getting that replaced, the other one started having issues. Glad I had backups.


My raid controller started writing bad data to all my disks. Lovely.

That is why I am using the DrivePool without the SnapRaid - to not over-complicate things.

StableBit has plans for the future:

StableBit FileVault

Status: (future)

StableBit FileVault is planned to deal with data integrity and file protection, and will fit well into our existing line of StableBit products.

Top-level features:

- Indexing of your data for enhanced statistics and data integrity verification.
- Kernel-level protection of your data from accidental modification by applications or users.
- Reporting and auditing of your data.

Apparently, the first recovery of every file is free.
But $20/TB after that gets expensive very quickly. I have seen people here mention their 60+TB libraries, which would mean $1200+ for a second restore.

Well, 60TB would be $300/month, never mind the restore (which presumably you don't need very often).
On the other hand you ARE testing your backups, right? The complete backup, not just taking three small folders and comparing them, right?

This type of storage isn't meant to be used regularly. It's an archival backup, like Amazon Glacier: a "backup of the backup".

$300 a month is a fair bit, but 60TB is no joke; that's definitely enterprise-class data requirements. I do know how people roll around here. I've got about 40TB raw, but my actual needs are up to maybe 15TB in the foreseeable future, which is $75 a month. Cheaper than my internet and cellphone bills. Given the price of cheaper hard drives at about $25/TB, buying that amount of storage outright would cost $375 in raw HDs, a break-even of 5 months. Then there's the offsite aspect (the main selling feature), server cost and admin, high-bandwidth networking, maintaining the API, different interfaces like the web UI, and customer support.
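The break-even arithmetic above can be sanity-checked in a few lines (rates taken from the post: $75/month for 15TB implies $5/TB/month, and $25/TB for raw drives):

```python
TB = 15                  # foreseeable storage need
B2_PER_TB_MONTH = 5      # $/TB/month, implied by $75/month for 15TB
HD_PER_TB = 25           # $/TB for cheap raw drives

monthly_cost = TB * B2_PER_TB_MONTH       # recurring B2 cost
raw_hd_cost = TB * HD_PER_TB              # one-time drive purchase
break_even_months = raw_hd_cost / monthly_cost

print(monthly_cost, raw_hd_cost, break_even_months)  # 75 375 5.0
```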

Seems reasonable enough of a price to me?

I'm often the most careful guy around. Do you really think it's necessary to download the entire lot of files to verify that the backup works? You checksum, you upload, they checksum when they receive it, rejecting the upload as failed otherwise. The checksum is stored as metadata with the file, which you can also check when you download. I suppose this doesn't cover silent bit rot on their side.
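B2 does keep a SHA-1 with each uploaded file, so the local half of that check is just a streamed digest (a minimal sketch; `remote_sha1` would come from the file's B2 metadata):

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so big archives don't need to fit in RAM."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Compare against the checksum stored with the file's metadata:
# assert sha1_of_file("backup.bin") == remote_sha1
```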

If you spot-check the archive by downloading multiple different groups of important content, and all of it is perfect with no indication of any problems, I don't know that downloading everything actually buys you more. The point is, if the data had been uploaded successfully but later corrupted, that corruption could just as easily happen after the most recent download. One successful restore really doesn't mean the next one will be.

Where’s the flaw in my logic? I’m sure I’m missing something.

Thanks


This is how preserving your digital data works: you make multiple copies, and because you expect SOME copies (though, with enough of them, very unlikely all) to develop problems, you VERIFY all the copies periodically and replace what's bad. Besides keeping you closer to the desired number of copies, this periodic verification also gives you a measure of how often a particular way of storing data fails (at least for failures that appear over time, not one-time events like Google shutting down, for example).

OF COURSE this is all theory. Most of us won't seriously claim to go ahead and download all our TBs and check them every three months or whatever. We rely on the metadata that tells us the files are there, just as we do for our hard drives, though at least those can be (somewhat) easily scrubbed.

I'm considering migrating my ~1TB of data from ACD to Backblaze B2. My use case is "backup of backup" only, so Backblaze should be only slightly costlier than ACD as my data size gradually goes above 1TB.

What are your recommendations for migrating data directly from ACD to Backblaze B2? Due to Comcast data cap (1TB/month) & slow upload speed (5kbps), I don’t want to have to pull down my encrypted ACD files to my desktop & then re-upload to Backblaze.

People were using Google Compute (the $300 trial credit) and Scaleway to transfer their data off ACD. I have no experience with either, but there are plenty of posts with details and tricks.
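On such a VM, the transfer itself amounts to a single remote-to-remote copy, with the data flowing through the VM rather than your capped home connection (remote and bucket names here are hypothetical):

```
rclone copy acd:backup b2:my-bucket/backup --transfers 8 -v
```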

Is this documented anywhere? I couldn’t find it.

Documented anywhere? There might be a bug filed on rclone’s bug tracker. Or maybe a thread mentioning it.

I can tell you what happens though:

rclone does filename encryption locally, so it turns blah.jpg into klajsdfkljaklsdjfaklsdfj. B2, when versioning is enabled, appends a "-olderversion-date-time-stamp" suffix onto the old file server-side. It becomes something like "klajsdfkljaklsdjfaklsdfj-July_1_2017_10_35_12".

Whenever rclone goes to read it back, it tries to decrypt the "klajsdfkljaklsdjfaklsdfj-July_1_2017_10_35_12" filename using the key, but the decryption returns garbage, so you see a message like "invalid filename, skipping…".

Is this what you wanted to know? rclone could fix this (perhaps behind a switch enabled alongside versioning) by recognizing when a plaintext date has been appended to an encrypted filename: in the code path that would normally emit the error message, instead "walk backwards looking for a -", decrypt the original name, and keep the date/time for the local destination…
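A hypothetical sketch of that repair step (the suffix format is just the example above; it relies on the assumption that rclone's encrypted names never contain a "-", which holds for its base32-style alphabet):

```python
def split_versioned_name(name):
    """Split "<encrypted>-<plaintext timestamp>" into its two parts.

    Assumes the encrypted portion never contains '-', so everything
    after the first '-' is the appended version timestamp.
    """
    enc, sep, stamp = name.partition("-")
    return (enc, stamp) if sep else (name, None)

enc, stamp = split_versioned_name("klajsdfkljaklsdjfaklsdfj-July_1_2017_10_35_12")
# enc decrypts normally; stamp can be re-appended to the local destination name
```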

Keith

That is correct, though not with crypt…

Can you please make a new issue on GitHub with that in it? I don't think there is one, and it catches people out regularly.

The crypt module could peek at the --b2-versions flag maybe.

Alright, this is done.

Keith


Is there a way to have the data compressed before encryption? I like the Backblaze B2 solution very much and I've added encryption, but I seem to be missing an option to compress the data. How can I do this?

Thanks!