24 TB per Year
What does it cost to store an additional 24 terabytes each year? It depends on what you do with it. At first, Amazon Glacier sounds like write-once-read-never disaster recovery storage, but it can be more than that. These calculations are for the US East region in US dollars.
Storage costs $0.01 / GB per month. Retrieval requests complete within 3-5 hours. Upload and retrieval requests cost $0.05 per 1,000 requests regardless of data size. All data transfer in is free. All data transfer out to EC2 in the same region is free. Data transfer out to the internet is on a sliding scale, starting at free and moving to $0.12 / GB for your first 10 TB in a month. In addition to data transfer out fees and retrieval request fees, you are charged $0.01 / GB for retrievals in excess of 5% of your average monthly storage. Still, if some of your big data is write-once-read-infrequently, and you can stomach the delay, the cost is very competitive.
To make things simple I'll assume you upload 2 TB in 20 requests (i.e. ~100 GB chunks) on the first day of each month and never download. I round the request fee up to $0.05 per month, which is negligible either way. Since prices will change it's probably not reasonable to forecast further than five years.
jan = 0.05 + 0.01*1024*2*1
feb = 0.05 + 0.01*1024*2*2
...
dec = 0.05 + 0.01*1024*2*12

year one = 12*0.05 + (0.01*1024*2)(1+2+3+4+5+6+7+8+9+10+11+12)
         = $1,598.04
five years = 5*12*0.05 + (0.01*1024*2)(1+2+...+5*12)
           = 5*12*0.05 + (0.01*1024*2)(61*30)
           = $37,481.40
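The same month-by-month sum can be checked with a short Python sketch. It uses only the figures quoted above ($0.01 / GB per month for storage, a flat $0.05 per month for upload requests, 2 TB uploaded per month, 1024 GB per TB); the function and constant names are my own, and the prices are the ones in this post, not necessarily current AWS pricing.

```python
# Prices as quoted in this post (US East, USD); not current AWS pricing.
STORAGE_PER_GB_MONTH = 0.01    # Glacier storage, $ per GB per month
REQUEST_FEE_PER_MONTH = 0.05   # upload requests, billed as one 1,000-request block
UPLOAD_TB_PER_MONTH = 2
GB_PER_TB = 1024


def glacier_cost(months: int) -> float:
    """Total cost of uploading 2 TB on the first of each month and never downloading."""
    total = 0.0
    for month in range(1, months + 1):
        # Everything uploaded so far is still stored and billed this month.
        stored_gb = UPLOAD_TB_PER_MONTH * GB_PER_TB * month
        total += REQUEST_FEE_PER_MONTH + STORAGE_PER_GB_MONTH * stored_gb
    return total


if __name__ == "__main__":
    print(f"year one:   ${glacier_cost(12):,.2f}")
    print(f"five years: ${glacier_cost(60):,.2f}")
```

Because the stored volume grows linearly, the total grows with the square of the number of months, which is why five years costs far more than five times year one.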
How about the same thing in S3 reduced redundancy storage? Roughly speaking, uploads cost $0.005 per 1,000 requests, downloads cost $0.004 per 10,000 requests, data transfer in is free, data transfer out to EC2 in the same region is free, data transfer out to the internet is on a sliding scale starting at free and moving to $0.120 / GB for your first 10 TB in a month. Reduced redundancy storage is also on a sliding scale.
First 1 TB / month      $0.068 / GB
Next 49 TB / month      $0.060 / GB
Next 450 TB / month     $0.048 / GB
Next 500 TB / month     $0.044 / GB
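The "First ... Next ..." wording suggests the tiers are marginal: each slice of storage is billed at its own rate. Under that assumption (it is my reading of the table, not something stated here), one month's storage bill could be sketched as follows. The function name and the tier list layout are hypothetical.

```python
GB_PER_TB = 1024

# (tier size in TB, $ per GB per month), in the order listed in the table above.
RRS_TIERS = [(1, 0.068), (49, 0.060), (450, 0.048), (500, 0.044)]


def rrs_monthly_storage_cost(stored_tb: float) -> float:
    """One month's storage bill, assuming each tier applies only to its own slice."""
    remaining_tb = stored_tb
    cost = 0.0
    for tier_tb, price_per_gb in RRS_TIERS:
        in_tier_tb = min(remaining_tb, tier_tb)
        cost += in_tier_tb * GB_PER_TB * price_per_gb
        remaining_tb -= in_tier_tb
        if remaining_tb <= 0:
            return cost
    # The table above stops at 1,000 TB, so anything larger is out of scope here.
    raise ValueError("stored_tb exceeds the tiers listed in the table")


if __name__ == "__main__":
    for tb in (2, 50, 120):
        print(f"{tb:>4} TB stored: ${rrs_monthly_storage_cost(tb):,.2f} per month")
```

This is more precise than a flat per-GB rate, at the cost of making the multi-year sum harder to do by hand.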
To make things simple I'll assume you upload 2 TB in 20 requests (i.e. ~100 GB chunks) on the first day of each month and download 4 TB in 4,000,000 requests (i.e. ~1 MB chunks) each month. I'll also ignore the higher cost of the first TB, and I'll use the $0.048 / GB price of the 450 TB tier for everything stored from year 4 onward. Since prices will change it's probably not reasonable to forecast further than five years.
jan = 0.005 + 0.004*400 + 0.06*1024*2*1
feb = 0.005 + 0.004*400 + 0.06*1024*2*2
...
dec = 0.005 + 0.004*400 + 0.06*1024*2*12

year one = 12*(0.005 + 0.004*400) + (0.06*1024*2)(1+2+3+4+5+6+7+8+9+10+11+12)
         = $9,603.90
five years = 5*12*(0.005 + 0.004*400) + (0.06*1024*2)(1+2+...+36) + (0.048*1024*2)(37+38+...+60)
           = 5*12*(0.005 + 0.004*400) + (0.06*1024*2)(37*18) + (0.048*1024*2)(97*12)
           = $196,360.24

Note, I'm assuming a very low and non-increasing read rate. But of the roughly $196k over five years, only about $96 was request fees (data transfer out to EC2 in the same region is free), so there's lots of room to increase the read rate relative to the overall cost.
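Here is a minimal Python sketch of the S3 sum under the simplifications stated above: $0.06 / GB for months 1 to 36, $0.048 / GB on everything stored in months 37 to 60, and the first-TB premium ignored. In months 37 to 60 the stored volume runs from 74 TB to 120 TB, so those month indices sum to 37+38+...+60 = 1,164. The names are my own.

```python
GB_PER_TB = 1024
UPLOAD_TB_PER_MONTH = 2
PUT_FEE_PER_MONTH = 0.005        # 20 uploads, billed as one 1,000-request block
GET_FEE_PER_MONTH = 0.004 * 400  # 4,000,000 downloads at $0.004 per 10,000
RATE_MONTHS_1_TO_36 = 0.060      # $ per GB per month
RATE_MONTH_37_ON = 0.048         # $ per GB per month, applied to everything stored


def request_fees(months: int) -> float:
    """Upload plus download request fees over the whole period."""
    return months * (PUT_FEE_PER_MONTH + GET_FEE_PER_MONTH)


def s3_rrs_cost(months: int) -> float:
    """Request fees plus storage, summed month by month."""
    total = request_fees(months)
    for month in range(1, months + 1):
        stored_gb = UPLOAD_TB_PER_MONTH * GB_PER_TB * month
        rate = RATE_MONTHS_1_TO_36 if month <= 36 else RATE_MONTH_37_ON
        total += rate * stored_gb
    return total


if __name__ == "__main__":
    print(f"year one:     ${s3_rrs_cost(12):,.2f}")
    print(f"five years:   ${s3_rrs_cost(60):,.2f}")
    print(f"request fees: ${request_fees(60):,.2f}")
```

Storage dominates completely, so even a large increase in the number of read requests barely moves the total.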
See also The Cost Of Big Data.
Storing 500 TB in Amazon Glacier for 1 year costs about $60k USD (500*1024 GB at $0.01 / GB per month for 12 months = $61,440).
A roughly equivalent local solution would be to store each piece of data on two separate disks stored in two separate buildings.
statisticbrain.com (USD)
2013 average cost per GB: $0.05
2013 Seagate Barracuda, 3,000,000 MB at $129 = $0.043 per GB

WD Green 1 TB SATA III 6Gbit/s                       $70 CAD  = 70/TB
WD Green 2 TB SATA III 6Gbit/s                       $100 CAD = 50/TB
WD Green 3 TB SATA III 6Gbit/s                       $125 CAD = 42/TB
WD Green 4 TB SATA III 6Gbit/s                       $185 CAD = 46/TB
WD Blue 1 TB SATA III 6Gbit/s 7200 RPM               $70 CAD  = 70/TB
WD Red 1 TB SATA III 6Gbit/s (for NAS)               $80 CAD  = 80/TB
WD Red 2 TB SATA III 6Gbit/s 5400 RPM (for NAS)      $120 CAD = 60/TB
WD Red 4 TB SATA III 6Gbit/s (for NAS)               $220 CAD = 55/TB
WD 4 TB SATA III 6Gbit/s 7200 RPM (for NAS)          $324 CAD = 81/TB
WD Black 4 TB SATA III 6Gbit/s 7200 RPM              $320 CAD = 80/TB
WD Purple 4 TB SATA III 6Gbit/s                      $232 CAD = 58/TB

Best price = 125/3/1024 = 0.0407 CAD = $0.037 USD per GB
Best NAS price = 220/4/1024 = 0.0537 CAD = $0.049 USD per GB
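The per-GB figures can be reproduced from the price list with a few lines of Python. The drive data is copied from the list above; the exchange rate of 0.91 USD per CAD is my assumption, chosen because it is what the CAD-to-USD conversions above imply, and the names are hypothetical.

```python
GB_PER_TB = 1024
USD_PER_CAD = 0.91  # assumption: implied by the conversions in the text, not a quoted rate

# (label, capacity in TB, price in CAD, sold for NAS use)
DRIVES = [
    ("WD Green 1 TB", 1, 70, False),
    ("WD Green 2 TB", 2, 100, False),
    ("WD Green 3 TB", 3, 125, False),
    ("WD Green 4 TB", 4, 185, False),
    ("WD Blue 1 TB", 1, 70, False),
    ("WD Red 1 TB", 1, 80, True),
    ("WD Red 2 TB", 2, 120, True),
    ("WD Red 4 TB", 4, 220, True),
    ("WD 4 TB 7200 RPM", 4, 324, True),
    ("WD Black 4 TB", 4, 320, False),
    ("WD Purple 4 TB", 4, 232, False),
]


def cad_per_gb(drive) -> float:
    """Purchase price in CAD per GB of capacity."""
    _, capacity_tb, price_cad, _ = drive
    return price_cad / capacity_tb / GB_PER_TB


# Cheapest drive overall, and cheapest among the drives marked "for NAS".
best = min(DRIVES, key=cad_per_gb)
best_nas = min((d for d in DRIVES if d[3]), key=cad_per_gb)

if __name__ == "__main__":
    for label, drive in (("best", best), ("best NAS", best_nas)):
        usd = cad_per_gb(drive) * USD_PER_CAD
        print(f"{label}: {drive[0]} at ${usd:.3f} USD per GB")
```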
What you would probably do is keep the primary copy of the data on a RAID-5 NAS drive and the backup on unplugged individual disks in a separate building. Thus the Glacier equivalent is just the unplugged disks in the separate building. Storing 500 TB on bare disks costs about $19k (500*1024 GB at $0.037 / GB).
So Glacier is about three times the cost of the bare disks. Of course, that's across 125 drives, and I'm ignoring the cost of wires, controllers, and power supplies. And there's no redundancy. With that many drives you'd probably need RAID6, which eats up some of your storage. You also have to write your backup to the bare disks and walk them over to your storage building, whereas you could automate an upload to Glacier. On the other hand, the cost of the bare disks is incurred just once while the cost of Glacier recurs yearly. You'll probably have a hard time finding the correct disks when you need them, whereas a restore from Amazon could be automated. Then there's the manpower involved in maintaining your disk storage facility, heating, etc. And your disks will degrade over time.
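To make the "once versus yearly" point concrete, here is a rough Python sketch comparing the two for a fixed 500 TB. It uses only the $0.01 / GB per month Glacier price and the $0.037 / GB best disk price from above, and it deliberately ignores everything else listed in this paragraph (controllers, redundancy, labour, disk replacement), so treat it as a lower bound for the disk side.

```python
GB_PER_TB = 1024
DATA_TB = 500
GLACIER_PER_GB_MONTH = 0.01  # recurring, $ per GB per month
DISK_PER_GB = 0.037          # one-time purchase, $ per GB (best bare-disk price above)


def glacier_total(years: int) -> float:
    """Cumulative Glacier storage bill for a constant 500 TB."""
    return DATA_TB * GB_PER_TB * GLACIER_PER_GB_MONTH * 12 * years


def disk_total() -> float:
    """One-time cost of enough bare disks for 500 TB, with no redundancy."""
    return DATA_TB * GB_PER_TB * DISK_PER_GB


if __name__ == "__main__":
    disks = disk_total()
    for years in (1, 3, 5):
        glacier = glacier_total(years)
        print(f"{years} yr: Glacier ${glacier:,.0f} vs disks ${disks:,.0f} "
              f"({glacier / disks:.1f}x)")
```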
If you really were going to store the media unplugged, you'd probably use tapes instead, which have a better shelf life but aren't much more cost efficient. For example, FUJI LTO6 2.5 TB at $72.00 or HP LTO6 2.5 TB at $56.40 gives a best price of 56.40/2.5/1024 = $0.022 USD per GB. While these are roughly half the price of disks, the writer costs ~$4k.
In the above I didn't consider compression. LTO6 supports write-once-read-many with 2:1 compression (according to Wikipedia, 2.5:1 according to the sites that sell the tapes). Thus, at 2:1, storage of 500 TB for 1 year is about $5.6k with the HP tapes or $7.2k with the FUJI tapes (plus the cost of the writer, ~$4k).
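The tape figures depend on which tape price and which compression ratio you pick, so a small Python sketch that tabulates the combinations may help. It uses the two tape prices and the ~$4k writer quoted above, and prices the media pro rata per GB as the text does rather than rounding up to whole tapes. The names are my own.

```python
GB_PER_TB = 1024
DATA_TB = 500
TAPE_NATIVE_TB = 2.5   # native (uncompressed) LTO6 capacity
WRITER_USD = 4000      # approximate writer cost quoted above
TAPE_PRICES_USD = {"FUJI LTO6": 72.00, "HP LTO6": 56.40}


def media_cost(price_per_tape: float, compression: float = 1.0) -> float:
    """Tape media cost for 500 TB, priced pro rata per GB of effective capacity."""
    usd_per_gb = price_per_tape / TAPE_NATIVE_TB / GB_PER_TB
    return DATA_TB * GB_PER_TB * usd_per_gb / compression


if __name__ == "__main__":
    for name, price in TAPE_PRICES_USD.items():
        for ratio in (1.0, 2.0, 2.5):
            media = media_cost(price, ratio)
            print(f"{name} at {ratio}:1 -> media ${media:,.0f}, "
                  f"with writer ${media + WRITER_USD:,.0f}")
```

Real compression ratios depend heavily on the data; already-compressed files will get close to 1:1.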
Iron Mountain offsite tape vaulting in the US starts at $150/month, which gives you 3 compact cases (10 tapes each) and 2 small cases (20 tapes each). Additional compacts add $6/mo. Additional smalls add $10/mo. Plus you're in for the cost of the tapes. Iron Mountain will hold the cases of tapes indefinitely (and keep sending you a monthly bill). When you send them a tape, you can specify that you want it back in 10 years, thus effectively freeing up space in your cases.
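As a rough illustration of how that pricing scales, the Python sketch below estimates the monthly vaulting fee for a given number of tapes. It assumes the fee is simply the $150 base plus whichever mix of additional compact and small cases is cheapest, which is my simplification of the pricing described above. It does not include the tapes themselves or any transport charges.

```python
import math

BASE_FEE = 150                   # $ per month
INCLUDED_TAPES = 3 * 10 + 2 * 20  # 3 compact cases + 2 small cases = 70 tapes
COMPACT_TAPES, COMPACT_FEE = 10, 6  # additional compact case, $ per month
SMALL_TAPES, SMALL_FEE = 20, 10     # additional small case, $ per month


def monthly_fee(tapes: int) -> int:
    """Base fee plus the cheapest mix of extra cases that holds all the tapes."""
    extra = max(0, tapes - INCLUDED_TAPES)
    best_extra_fee = None
    # Try every count of extra small cases; fill what is left with compact cases.
    for smalls in range(math.ceil(extra / SMALL_TAPES) + 1):
        uncovered = max(0, extra - smalls * SMALL_TAPES)
        compacts = math.ceil(uncovered / COMPACT_TAPES)
        fee = smalls * SMALL_FEE + compacts * COMPACT_FEE
        if best_extra_fee is None or fee < best_extra_fee:
            best_extra_fee = fee
    return BASE_FEE + best_extra_fee


if __name__ == "__main__":
    # 500 TB on 2.5 TB tapes: 200 tapes uncompressed, 100 tapes at 2:1.
    for tapes in (70, 100, 200):
        print(f"{tapes} tapes: ${monthly_fee(tapes)} per month")
```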
Highly Reliable Systems has a great article discussing the advantages of drives versus tape. They show that unless you're writing to many tapes and archiving them in a write-once-read-never fashion, tapes plus infrastructure actually cost more than drives. It seems there are no hard facts on longevity and reliability. Shelf life has often been quoted at 10 years for hard drives and up to 30 years for LTO tape, but that may be because hard drive vendors aren't interested in opening themselves to the liability of a long life claim.
Finally, there are vendors like StrongBox that combine a NAS drive and tapes. Their base model (T10) has a 5.7 TB RAID5 and 4 LTO6 drives. As I understand it, the RAID5 caches the MRU files and the LTO drives cheaply expand the overall storage. With compression this represents 2.5*2.5*4+5.7 = 30.7 TB of data, although their datasheet suggests the T10 can manage up to 500 TB, so I may misunderstand the spec. Maybe it can hold more tapes than just the ones in the 4 tape drives.
In my experience, anything ten years old isn't worth having. I'm leaning towards hard drives in a scheme where data you're likely to never use again gets written to your oldest drives and you unplug them and store them in the building next door.