Tuesday, January 19, 2010

Storage Costs - Cost per GB? Not Always

Recently, using "cost per GB" as a metric to rank the acquisition cost for storage platforms has come up quite a bit... starting with a conversation amongst the storage fanatics on Twitter ('bod, myself, sysadmjim, and peglarr) followed by a post from tommyt (Xiotech) and then Storagebod.

First off, the general consensus seems to be that "cost per raw GB" is not ideal.  This is simply because, due to architectural differences (e.g. DIBs), RAID overhead, and other factors, the amount of usable capacity per "raw GB" can vary greatly.  Most people have settled on "cost per usable GB."  Over at Storagebod's blog, an unnamed vendor (but obviously one that has dedup, etc.) claims that "cost per used GB" is a better, albeit harder-to-measure, metric.
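To make the raw-versus-usable gap concrete, here is a minimal sketch of the arithmetic.  The prices and capacities are invented for illustration and do not come from any vendor quote; `cost_per_gb` is a hypothetical helper.

```python
def cost_per_gb(price, raw_gb, usable_gb):
    """Return (cost per raw GB, cost per usable GB) for one quote."""
    return price / raw_gb, price / usable_gb

# Two hypothetical quotes with the same price and raw capacity.
# Only the usable capacity differs (RAID overhead, spares, etc.).
array_a = cost_per_gb(price=100_000, raw_gb=50_000, usable_gb=25_000)
array_b = cost_per_gb(price=100_000, raw_gb=50_000, usable_gb=40_000)

print("Array A: raw $%.2f/GB, usable $%.2f/GB" % array_a)
print("Array B: raw $%.2f/GB, usable $%.2f/GB" % array_b)
```

Both quotes look identical on a raw basis, but the usable figures differ, which is why the raw number alone is a poor ranking metric.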

When migrating from a legacy environment, it is difficult to quantify how much primary storage dedup can save you.  From a career standpoint, it is a little risky to assume that you can purchase 80% of your currently allocated storage to meet your current and future needs based on what a vendor claims you can squeeze out of it.  While there are tools and benchmarks to help with this, all things being equal, I think it's safe to say people would be more comfortable recommending buying sufficient spinning rust to meet their capacity requirements than relying on dedup/TP.  While dedup/TP can save money (especially for future orders after experiencing how it behaves in a particular environment), for the initial outlay it is a little risky.  Additionally, you need to be careful that you're not spending more on dedup licenses (including their maintenance, etc.) than you would have spent on disk.
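The license-versus-disk comparison can be sketched as a simple break-even check.  Every number below is an assumption made up for the example (disk price, license price, maintenance, and the 80% purchase fraction), and `compare_dedup` is a hypothetical helper, not a vendor tool.

```python
def compare_dedup(required_gb, disk_cost_per_gb, purchase_fraction,
                  license_cost, license_maint_per_year, years=3):
    """Compare buying full capacity against buying less disk plus dedup.

    purchase_fraction is the share of required capacity you would buy
    if the vendor's savings claim holds (0.80 means "buy 80%").
    Returns (full_disk_cost, dedup_cost, dedup_is_cheaper).
    """
    full_disk_cost = required_gb * disk_cost_per_gb
    dedup_cost = (required_gb * purchase_fraction * disk_cost_per_gb
                  + license_cost
                  + license_maint_per_year * years)
    return full_disk_cost, dedup_cost, dedup_cost < full_disk_cost

# Hypothetical inputs: 100 TB required at $3/GB, buying 80% of it,
# with a $40k license and $8k/year license maintenance over 3 years.
full, with_dedup, cheaper = compare_dedup(
    required_gb=100_000, disk_cost_per_gb=3.0, purchase_fraction=0.80,
    license_cost=40_000, license_maint_per_year=8_000, years=3)

print(f"full disk: ${full:,.0f}  with dedup: ${with_dedup:,.0f}  cheaper: {cheaper}")
```

With these made-up inputs the license and its maintenance cost more than the disk they replace, which is exactly the trap to check for.  The sketch also ignores the risk that the claimed ratio never materializes.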

Another issue with cost per GB is that it can fluctuate based on what components need to be upgraded with any given order.  With the upfront costs of the cabinets, storage processors, cache, and connectivity being fairly significant, the ongoing cost of adding disk can be higher (or lower) depending on what you need to "scale" on the array.
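A rough sketch of that fluctuation, assuming a simplified model in which every block of drives needs one additional fixed-cost component.  The "step" here is a hypothetical stand-in for whatever the array needs to scale (cabinet, storage processor, cache, connectivity), and all prices are invented.

```python
import math

def expansion_cost(current_drives, added_drives, drive_cost,
                   drives_per_step, step_cost):
    """Cost of an order, including any fixed-cost steps it triggers."""
    steps_now = math.ceil(current_drives / drives_per_step)
    steps_after = math.ceil((current_drives + added_drives) / drives_per_step)
    return added_drives * drive_cost + (steps_after - steps_now) * step_cost

# Same 5-drive order, placed at two different points in the array's life.
fits_existing = expansion_cost(current_drives=10, added_drives=5,
                               drive_cost=1_000, drives_per_step=15,
                               step_cost=5_000)
needs_new_step = expansion_cost(current_drives=15, added_drives=5,
                                drive_cost=1_000, drives_per_step=15,
                                step_cost=5_000)

print(fits_existing, needs_new_step)
```

The two orders buy the same capacity, yet the second costs twice as much because it crosses a step boundary.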

In short, cost per GB is not always a good metric.  While it works when capacity is the primary requirement, it falls short when there are specific IOPS requirements that outstrip physical capacity.  For the sake of argument, let's assume a given corporation has standardized on 3 tiers of storage:
  • Tier 1:  High performance requirements - such that the capacity required does not fulfill the IOPS requirement (typical architecture:  RAID 10, 15k drives or SSDs).
  • Tier 2:  General storage requirements - generally, the capacity required will fulfill the IOPS requirement, especially when spread over enough hosts and disks (typical architecture:  RAID 5, 10k drives).
  • Tier 3:  Archive storage requirements - generally, IOPS is not a major consideration, and fat cheap disks are more important than speed (typical architecture:  RAID 6, 1-2TB SATA drives).
With these tiers, cost per GB is a fairly good metric for Tiers 2 and 3, but falls down in Tier 1, where after the capacity requirement is met, additional drives are allocated to meet a performance requirement.  The use of SSDs can reduce the cost of Tier 1, but in the past it has been difficult to lay out the allocations optimally to get the best use out of them.  Technologies such as FAST (EMC) and PAM (NetApp) help with the layout issue.  Wide striping can help performance in all Tiers, but in Tier 1 you have to be careful if you need to guarantee response time.
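The Tier 1 problem can be sketched as "buy enough drives for whichever requirement is larger."  The per-drive capacity, per-drive IOPS, and price below are placeholder assumptions rather than measured figures, and `tier1_drive_count` is a hypothetical helper.

```python
import math

def tier1_drive_count(capacity_gb, required_iops,
                      usable_gb_per_drive, iops_per_drive):
    """Return (drives needed, which requirement drove the count)."""
    for_capacity = math.ceil(capacity_gb / usable_gb_per_drive)
    for_iops = math.ceil(required_iops / iops_per_drive)
    bound_by = "iops" if for_iops > for_capacity else "capacity"
    return max(for_capacity, for_iops), bound_by

# Hypothetical workload: 2 TB of data that needs 9,000 IOPS.
# Assumed drive: 150 GB usable after RAID, 180 IOPS, $800 each.
drives, bound_by = tier1_drive_count(capacity_gb=2_000, required_iops=9_000,
                                     usable_gb_per_drive=150,
                                     iops_per_drive=180)
cost_per_needed_gb = drives * 800 / 2_000

print(drives, bound_by, cost_per_needed_gb)
```

In this example capacity alone would need 14 drives, but the IOPS target needs 50, so the cost per GB actually required is several times what a capacity-only quote would suggest.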

FAST v2 is an interesting technology.  Basically, EMC is betting that, with extremely granular intelligent data moves, most customers can eliminate the Tier 2 requirement, spread all the allocations over a lot more 'Tier 3' spindles, and handle all the hot spots by migrating hot blocks to SSDs.  This will make internal chargeback extremely difficult since it is dynamic and self-tuning.  It will also make performance testing on non-production servers difficult since, to my knowledge, there isn't a good way to "apply" a given layout to a different environment.

All of which basically says, going forward, straight "Cost per usable GB" is going to become less important for determining the total cost of a storage environment.  My recommendations?
  • Work with all vendors very closely to make sure that they (and you) have a good understanding of the requirements of the proposed environment.  Make sure the estimates are realistic, and in the cases of dedup and TP, make sure that the vendor will stand by the ratios that the solution depends on.
  • Make sure that you understand how the storage environment will grow - namely maintenance costs and stair-step upgrades (when purchasing additional disks requires cache, directors, additional licensing, etc.).  Make sure it is all in the contract, and pre-negotiate as much as possible.
  • Maintain competition between vendors.
Typically, someone will bring up using virtualization as a mechanism to ensure fair competition among vendors for capacity growth.  While there is some validity to that for existing environments where multiple vendors are already on the floor and under maintenance, in my opinion the "startup costs" of new arrays in new environments tend to negate any "negotiating benefit" you could get from virtualizing the environment.


ttrogden said...

Full Disclosure - I’m an Enterprise Architect with Xiotech.

Hey Techmute - nice blog post and thank you very much for the reference in the blog !!

You bring up some great points. At the end of the day, making sure that you size your solution to fit your needs is important. Whether you include dedupe, thin provisioning, or all sorts of other bloated bells and whistles, it all adds up at the end :)

With regard to your comments about sizing Tier 1 storage: I recently wrote a blog post called “Performance Starved Applications” (http://bit.ly/4WvpiK ) in which I talk about the phenomenon of undersizing storage arrays to the point that applications have performance issues. In your blog you mentioned having to add spindles in Tier 1 storage to solve a performance issue. I tell prospects all the time that, in typical storage solutions, the system will never be any faster than it is when it’s empty. As soon as you start adding data, the performance starts to degrade. Now, as good sales engineers, it’s our job to figure out how to size the solution correctly. The more information the customer can give us, the more accurate our sizing models can be. For instance, should we base their Tier 1 performance numbers on 50% full, 75% or even 90% full? If you look at a customer that have had their array for more than say 2 years, they are probably at 80% + full. I doubt seriously that their solution was ever sized (Performance wise) based on that capacity utilization.

So I agree, customers need to make sure they are looking at the big picture, not just for the immediate need, but for the needs over the next 12, 24 or even 36+ months. No one wants to go back to their CFO and ask for more money, and no one should have to overbuy their Tier 1 storage to fix performance issues.

Xiotech Enterprise Architect

techmute said...

Thanks for the comment Tommy. One addition:

"If you look at a customer that have had their array for more than say 2 years, they are probably at 80% + full. I doubt seriously that their solution was ever sized (Performance wise) based on that capacity utilization."

If the "performance tier" is at 80%+ full, I'd assume one of three things has occurred.

1. The customer has very closely watched the Tier and has pushed it pretty close to its maximum IOPS capacity for their workload (or else they are running SSDs).

2. There are enough allocations mis-tiered into Tier 1 that are "covering" for the allocations that require Tier 1 speed. EMC FAST can actually save some money by migrating this data to a less costly tier.

3. Tier 1 has been over-allocated to such an extent that it is no longer a performance tier. It is just a different, more costly iteration of their tier 2.