Backblaze b2 humming along as a replacement for ACD

As soon as I got wind of the ACD stuff going down, I set up rclone for Backblaze B2 and got my pro-rated refund from Amazon.
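For reference, the whole thing is just a couple of rclone commands. A rough sketch rather than my exact setup (the remote names, bucket, pool drive letter, and log path below are made up, and I'm assuming a crypt remote layered over the b2 remote):

rclone config
# create a "b2" remote with your account ID and application key,
# then a "crypt" remote (here called "b2crypt") wrapping b2:mybucket

rclone sync P:/ b2crypt: --transfers 8 -v --log-file rclone-b2.log
# P:/ stands in for the local pool drive; bump --transfers to taste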

https://www.backblaze.com/b2/cloud-storage-pricing.html

Pricing is $5/TB/month for storage, free uploading, and $20/TB for downloading. There are some minor API transaction costs, but it doesn't appear they will add up to anything significant.

From the East Coast of the US, Backblaze is pretty quick: easily 150-180 Mbps uploading, which is flooding my FiOS. Fantastic, really. I've got good enough speeds that I wasn't going to mess with a VPS or similar to do cloud-to-cloud.

My storage requirements are in the single-digit TBs. The only time I'll be downloading is to test the backup, which will consist of spot-checking: downloading tens of GBs from throughout the directories and then hashing the originals against the uploaded copies. I think B2 might store a checksum in its metadata; however, I'll have to think through whether that's sufficient. I suppose there could be some bitrot after the initial upload.
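Something like the following is what I have in mind for the spot checks. Paths and remote names are placeholders again; rclone check compares SHA1s where the remote exposes them, and cryptcheck is the equivalent when the remote is encrypted:

rclone check P:/Photos/2016 b2:mybucket/Photos/2016 -v
# or, against a crypt remote:
rclone cryptcheck P:/Photos/2016 b2crypt:Photos/2016 -v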

rclone doesn't support B2's versioning together with filename encryption, because of the date/time stamp appended to version filenames, and B2 doesn't support rclone's --backup-dir option because it lacks server-side moves.

While B2 costs much more than Google Drive, I don't feel like screwing around with yet another provider where I'm the undesirable part of the equation, basically just waiting to be marginalized, throttled, and excluded. I'm a full-paying customer of B2: I pay what every B2 customer pays for storage, and if I use more, I pay more.

No real quirks with b2 that I’ve found yet – everything seems to be working as before.

Keith

2 Likes

Agreed. B2 is a good fit for the long-term storage use case. It is really my backup of my backup.

Yup. I use StableBit DrivePool redundancy (3x on most stuff because wtf not!) and SnapRaid 2-parity as well. I'd have to have a power supply nuke the disks, serious water damage, or outright theft of the equipment before I'd need to download. B2 checks the "offsite" box and gives me some worst-case-scenario coverage. I could withstand at least a couple of simultaneous full disk failures (2 for SnapRaid, 3 for DrivePool) before the data would start to be threatened. Did I mention I'm running WD Golds? The server is on a UPS, which also helps. I'm using ECC RAM and do read scrubbing regularly to check for bitrot.

1 Like

Backblaze has an unlimited option for $5/month. Not a solution?

The $5/month plan doesn't have an API like B2 does; it's just a mirror of your local disks.

2 Likes

No, you've got to use their proprietary backup software to take advantage of that. While it's all right for single-computer backup, I've found that the upload isn't nearly as fast, and they remove files from their backup after the original has been deleted. It's not really long-term archival cold storage.

I do in fact use that solution too, but the offerings aren’t really close. I have a single VM that backs up that way. It could be a few TB up there. Initial backup probably took a week or so.

I really like using rclone (or similar) to simply sync a large NAS drive pool to B2 (or similar). I can do a few TB in a couple of days. Rclone gives me more control and more verbose status and logging.

Hello Keith_M,

I am also using StableBit DrivePool (3x duplication).
Could you please give me instructions on how to use it together with SnapRaid?

Thank you in advance.

The setup is fairly vanilla.

I point SnapRaid's data disks at the mount-point folders for the individual disks, not at the pool.

I store the parity information on other non-data disks.

I exclude these:

exclude *.unrecoverable
exclude Thumbs.db
exclude $RECYCLE.BIN
exclude \System Volume Information
exclude \Program Files
exclude \Program Files (x86)
exclude \Windows
exclude .covefs

I spread my .content files across multiple disks outside of the pool, including the boot OS SSD.
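To make that concrete, here is roughly the shape of my snapraid.conf. The drive letters, mount folders, and disk names are placeholders, so adjust them to your own layout:

# parity lives on non-pool disks
parity E:\snapraid.parity
2-parity F:\snapraid.2-parity

# content files spread across disks outside the pool, including the boot SSD
content C:\SnapRaid\snapraid.content
content E:\snapraid.content
content F:\snapraid.content

# data entries point at each disk's mount-point folder, not the pool drive
data d1 C:\Mount\Disk1\
data d2 C:\Mount\Disk2\
data d3 C:\Mount\Disk3\

# plus the exclude lines listed above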

I leave DrivePool balancing on, which in theory will move a bunch of files around but in practice doesn't. Occasionally I'll get a message during a snapraid sync that a file has moved.

I consider the SnapRaid layer to sit UNDER the DrivePool layer. SnapRaid gives me silent bit-rot detection and correction, and an easy way to read-scrub the disks.

As for failure handling, I'm not sure that, if an actual failure occurred, I would go through the trouble of recreating the failed disk via SnapRaid; I wouldn't lose anything thanks to DrivePool's duplication. I suppose during a file evacuation there could be some large-scale moves, which would screw with SnapRaid parity.

I’m still optimizing my setup, and I’m still learning too!

Keith

1 Like

Thanks for the explanation.
In the past I was also thinking about combining DrivePool with SnapRaid, but it seemed to me that this solution was not optimal. In the future I will migrate to ZFS, which is quite sophisticated when it comes to data protection.

For mostly static content, where the bulk of the data isn't changing, SnapRaid is great. I mostly just add files, and only a few times a week at most. Anytime I change a decent amount, I do a sync. The non-real-time nature of SnapRaid is fine for me. Some people script it as a cron job, but I usually just run the command manually after making big changes. It's two words, "snapraid sync"; it runs in the background and does the business.
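Spelled out, the after-big-changes routine looks something like this (the scrub percentage and age are just numbers I'd pick, nothing official):

snapraid sync                  # update parity to cover the new/changed files
snapraid scrub -p 10 -o 30     # read-scrub about 10% of the array, skipping blocks checked in the last 30 days
snapraid status                # show scrub coverage and any reported errors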

http://www.snapraid.it/

does a fine job of describing the benefits and ideal use-cases for it.

While I’m mostly in the minority here, and in other forums, I simply don’t like standard RAID or alternative file systems. The idea that the entire “thing” (pool, array, whatever) can be rendered completely unreadable due to a partial failure is just an untenable state for me. I can just imagine getting some cryptic error message and my stuff is just not accessible due to some weird synchronization problem. There are nightmare examples all over the place.

Of course you have a backup (as I do with my setup), but why put yourself in jeopardy?

With both DrivePool and SnapRaid, you can simply yank a drive out, and in the worst case you lose only the files on that drive. Then "dd_rescue" and other NTFS-aware applications can go to work on the pulled drive for the recovery.

Don't like DrivePool? Just stop using it. There are zero dependencies on any external metadata, and you don't need to copy your data off or "migrate" your stuff. Take your DrivePool disks out of the original machine, plug them into another Windows box WITHOUT DrivePool, and all of your data is immediately accessible. If you use the duplication feature, you might have multiple copies of the same files; big deal.

Don't like SnapRaid? Just stop using it. Nothing gets "installed"; it's a single executable in a directory. Delete the parity files and just never run the executable again. Nothing to rebuild or migrate.

Want to add a disk of arbitrary size containing existing files? Add the drive to DrivePool (takes a few seconds), then MOVE your files into the directory it creates (also a super-fast operation). Add a single line to your snapraid.conf file and run a snapraid sync.
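In snapraid.conf terms that's literally one new line plus the sync (the disk name and mount folder here are placeholders):

# added to snapraid.conf:
data d4 C:\Mount\Disk4\

# then, from a command prompt:
snapraid sync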

Want to remove a disk? DrivePool automatically moves the data off the disk to other drives on command. If the drive contains just duplicated data, do a "fast remove" and let the re-duplication happen in the background.

I also believe most SnapRaid operations scale linearly with the amount of data stored, not with the total disk capacity.

Everything sits on top of a native file system.

These are the reasons I really like DrivePool + SnapRaid.

1 Like

We are together in the same minority :slight_smile:
RAID and ZFS are good right up until things start failing in any other way than a broken disk. If single-disk performance is good enough and you don't care about high availability/uptime (in other words, if you just store your pictures or your "Linux ISOs"), just don't over-complicate things.

1 Like

Well, there are three of us…

I've had/seen RAID cause more issues than it solved. I've known people who lost tons of data because the RAID controller died: the disks were fine, but they couldn't read them. I finally gave in and used RAID1 in a little NAS box, figuring at least that might be OK.

One disk died. While I was getting that replaced, the other one started having issues. Glad I had backups.

1 Like

My RAID controller started writing bad data to all my disks. Lovely.

That is why I am using DrivePool without SnapRaid: to not over-complicate things.

StableBit has plans for the future:

StableBit FileVault

Status: (future)

StableBit FileVault is planned to deal with data integrity and file protection, and will fit well into our existing line of StableBit products.

Top level features:

Indexing of your data for enhanced statistics and data integrity verification.
Kernel-level protection of your data from accidental modification by applications or users.
Reporting and auditing of your data.
1 Like

Apparently, the first recovery of every file is free.
But $20/TB after that gets expensive very quickly. I have seen some people here mention their 60+TB libraries, which would result in $1,200+ for the second restore.

Well, 60TB would be $300/month, never mind the restore (which presumably you don't need very often).
On the other hand you ARE testing your backups, right? The complete backup, not just taking three small folders and comparing them, right?

This type of storage is not meant to ever be used. It's archival backup, like Amazon Glacier: a "backup of the backup".

$300 a month is a fair bit, but 60TB is no joke; that's definitely enterprise-class data requirements. I do know how people roll around here. I've got about 40TB raw. My actual needs are up to maybe 15TB in the foreseeable future, which is $75 a month, cheaper than my internet and cellphone bills. Given the price of cheaper hard drives at roughly $25/TB, just buying that amount of raw storage would cost about $375, which puts the break-even point at 5 months. Then there's the offsite thing (the main selling feature), plus their server costs and admin, high-bandwidth networking, maintaining the API, the different interfaces like the web UI, and customer support.
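Back-of-the-envelope, for anyone checking the math (assuming 15TB, $5/TB/month on B2, and $25/TB for bare drives):

echo $(( 15 * 5 ))                # 75  -> B2 storage cost per month, in dollars
echo $(( 15 * 25 ))               # 375 -> cost of the bare drives, in dollars
echo $(( (15 * 25) / (15 * 5) ))  # 5   -> months before the bare drives break even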

Seems like a reasonable enough price to me?

I'm often the most careful guy around. Do you really think it's necessary to download the entire lot of files in order to verify that the backup works? You checksum, you upload, and they checksum when they receive it, rejecting the upload as failed if the hashes don't match. The checksum is stored as metadata with the file, which you can also check when you download. I suppose this doesn't cover silent bit rot on their side.

If you spot-check the archive by downloading several different groups of important content and all of it is perfect, with no indication of any problems, I don't know that you're actually gaining anything by downloading all of it. The point is that if the data was uploaded successfully but corrupted later, the corruption could just as easily occur after your most recent download too. One successful restore really doesn't mean the next one would be.

Where’s the flaw in my logic? I’m sure I’m missing something.

Thanks

2 Likes

This is how preserving your digital data works: you make multiple copies, and because you expect SOME of the copies (but, since you have enough of them, very unlikely all of them) to develop problems, you periodically VERIFY all the copies and replace whatever is bad. Besides keeping you closer to the desired number of copies, this periodic verification also gives you a measure of how often a particular way of storing data fails (at least for failures that show up over time, not for one-time events like Google shutting a service down, for example).

OF COURSE this is all theory. Few of us can seriously say we go ahead and download all our TBs and check them every three months or whatever. We rely on the metadata that tells us the files are there, just as we do for our hard drives, though at least those can be (somewhat) easily scrubbed.

I'm considering migrating my ~1TB of data from ACD to Backblaze B2. My use case is "backup of the backup" only, so Backblaze should be only slightly costlier than ACD as my data size gradually goes above 1TB.

What are your recommendations for migrating data directly from ACD to Backblaze B2? Due to my Comcast data cap (1TB/month) and slow upload speed (5kbps), I don't want to have to pull my encrypted ACD files down to my desktop and then re-upload them to Backblaze.