I first read about gzip here:
But there is no mention of how to force gzip compression, merely a flag allowing it. Does this mean most S3 backends do not support gzip?
Why do I care? Again, I've been investigating how S3 Intelligent-Tiering works, hoping to solve my problem. One issue I've noticed is that objects smaller than 128 KB are billed as if they were 128 KB.
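To put rough numbers on why that floor worries me (hypothetical object sizes, just the billing rule as I understand it):

```python
# Intelligent-Tiering bills each small object as if it were 128 KiB.
# Hypothetical object sizes, just to see the padding:
KIB = 1024
FLOOR = 128 * KIB

sizes = [4 * KIB, 90 * KIB, 300 * KIB]
actual = sum(sizes)                            # what I actually stored
billed = sum(max(s, FLOOR) for s in sizes)     # what the tier charges for
print(actual // KIB, billed // KIB)            # 394 556
```

With millions of tiny files, that padding dominates the bill.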
This is bad for me. Some of my backups are proper disk images of my OS, but I have a lot of other, much lazier backups, for example the state of Python on my machine on some random day. Many thousands of files just uploaded loose, because Google Drive spoiled me by never forcing me to make proper backups.
A gzip might solve this issue. (I realize the g in gzip stands for GNU, not Google, but it did send me looking at Google's offering.) I also read about Google Cloud Storage, and from what I can see, Overview of Storage Intelligence | Google Cloud Documentation offers automated lifecycle procedures but nothing at all like Amazon S3's Intelligent-Tiering, where archived data can have hot metadata…
So, is it possible to force amazon s3 to use gzip? And is it possible to peak inside a gzip for say a list of files inside the gzip without downloading or otherwise fully accessing the gzip? In fact I swear I read that reading the file list of a gzip is a single HEAD api request? Rather than the potentially 1000-10000 requests that would be made for the file folder structure? Not to mention that S3 Intelligent-Tiering treats files smaller than 128kb not only as 128kb but keeps them hot and refuses to let them into cold storage.
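For what it's worth, from what I can tell gzip on its own compresses a single stream and keeps no member list; what I'd really be making is a tar.gz, and listing its members means decompressing through the whole stream. (A zip, by contrast, keeps a central directory at the end of the file, which S3 can serve with one ranged GET; that may be what I half-remembered as a "single HEAD request".) A local sketch with Python's stdlib, no S3 involved:

```python
import io
import tarfile

# Pack a few tiny in-memory "files" into a tar.gz (hypothetical data).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, payload in [("a/one.txt", b"hello"), ("a/b/two.txt", b"world!")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# Listing members preserves path and size, but it requires reading the
# whole gzip stream back -- gzip has no central index to seek to.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    members = [(m.name, m.size) for m in tar.getmembers()]
print(members)  # [('a/one.txt', 5), ('a/b/two.txt', 6)]
```

So the path structure survives inside the archive, but "peek without downloading" only works for formats with an index at a known offset, like zip.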
In summary: does gzip solve all my problems, where my backups, full of deep file path structures and tiny program files, are lazily just thrown at a cloud storage solution?
I am guessing no? If so, what do I do? What's the correct way to use S3 objects to back up all my data? I use proper full disk images for my OS, but I keep my OS partition small. I have large media files, large program files, and tiny program files. I also care almost MORE about the file path / folder structure and the date, size, and name metadata than I do about the data itself.
It feels like this makes S3 a poor fit for me. But only cold storage fits my needs: I have millions of files and 100 TB that I do not expect to EVER access AT ALL. BUT I do need access to its metadata to avoid adding duplicates to the collection. Making a disk image of every single folder to maintain the file path structure sounds insane, and it would also make accessing the metadata impossible. Gzip sounds like the best of both worlds? But I really need to find a way to have hot metadata and cold data, which again is what S3 Intelligent-Tiering offers. I'm just not sure my data is in a format that can work with S3 Intelligent-Tiering at all.
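One pattern that might give me hot metadata over cold data without relying on Intelligent-Tiering at all: pack each folder into one tar.gz, and write a small JSON manifest per archive. The tiny manifest stays in a hot/standard tier for duplicate checks, while the big archive goes straight to a cold class. A minimal local sketch (the helper names are my own invention):

```python
import io
import json
import tarfile


def build_archive(files):
    """Pack (path, bytes) pairs into an in-memory tar.gz (stand-in for a folder)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def manifest_for(tar_bytes):
    """JSON manifest of path and size for every member. This small file is
    what stays hot for dedup lookups while the archive itself goes cold."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        return json.dumps(
            [{"path": m.name, "size": m.size} for m in tar.getmembers()]
        )


archive = build_archive(
    [("backups/2024/one.bin", b"x" * 10), ("backups/2024/two.bin", b"y" * 20)]
)
print(manifest_for(archive))
```

The upload side (manifest to a standard class, archive to Glacier or similar) would be separate; this only shows that the path/size metadata I care about can live outside the cold object.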
TL;DR: does gzip solve all my inept fumbling problems? Does gzip work with Amazon S3 Intelligent-Tiering? I assume no. In which case, what can anyone recommend?