Wednesday, February 24, 2010

The Tiers are Drying Up

A few days ago, NetApp CEO Tom Georgens pronounced that storage tiering is dead. Simon Sharwood has a good post linking to the existing commentary... rather than re-link all the sites, I suggest you read his take (and the associated links). The general consensus is that tiering is not dead.

It's a sentiment I agree with, for the most part. Let's think about why most people tier, and what this pronouncement might actually indicate.
  • Many organizations tier storage to reduce the TCO of their environment. SSDs are expensive, SATA disks aren't, and not everything needs IOPS ahead of bulk capacity. All things being equal, if SSDs were the cheapest option for both capacity and IO, this reason for tiering would evaporate (best performance in most cases for the cheapest cost... who wouldn't like that?). A rough cost-per-GB versus cost-per-IOPS sketch follows this list.
  • Many organizations tier storage to meet unique requirements such as WORM.  Of course, it'd be silly to enact WORM policies on 100% of all storage in an enterprise.  Since NetApp offers SnapVault, it is obvious he doesn't consider this tiering at all... they still offer SnapVault, right?
  • So, what it basically comes down to is the direction the industry is already heading: the adoption of SSDs + SATA drives for all storage, colloquially referred to as "Flash and Trash." Tom Georgens seemed to be advocating placing 100% of data on SATA disks and using PAM cards as an extremely large, fast cache to maintain performance requirements. As discussed on Twitter, the main drawback is that if you lose PAM, your storage response time is impacted until the replacement card is primed with data. If you want more information, Mike Riley and Dimitris Krekoukias were the two NetApp resources on Twitter who were most engaged.
  • Additionally, the NetApp stance is that WAFL and RAID-DP provide a RAID-6 implementation that is faster than traditional vendors' RAID-10 implementations... so even the SATA drives should perform admirably. The NetApp resources took this as an assumed fact and most of the other people involved didn't, which made some of the retorts nonsensical at times. NetApp can show benchmarks indicating this performance level is realistic (well, as realistic as a benchmark can get), and other vendors can show benchmarks indicating that WAFL performance degrades as it fragments. As someone in the "not-vendor" space, I can't weigh in on how much of an advantage this is, nor whether it actually degrades over time. My gut feeling is that there isn't some magic algorithm that removes the parity impacts of RAID-6 once the volumes start filling up (the textbook write-penalty arithmetic is sketched after this list).
  • Probably the best information presented on Data ONTAP over the last few months was at Marc Farley's blog, StorageRap. The comments have a lot of good, understandable information on how WAFL works.
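
To put some rough numbers behind the TCO argument in the first bullet, here is a minimal sketch (in Python) comparing cost per usable GB against cost per IOPS for an SSD tier and a SATA tier. The capacities, IOPS figures, and prices are invented for illustration only, not anyone's actual pricing:

    # Hypothetical drive specs: (usable GB, IOPS, cost per drive in $).
    tiers = {
        "SSD":  (200, 5000, 2000),
        "SATA": (1000, 80, 300),
    }

    capacity_need_gb = 50_000   # a capacity-driven workload
    iops_need = 20_000          # an IOPS-driven workload

    for name, (gb, iops, cost) in tiers.items():
        drives_for_capacity = -(-capacity_need_gb // gb)  # ceiling division
        drives_for_iops = -(-iops_need // iops)
        print(f"{name}: ${cost / gb:.2f}/GB, ${cost / iops:.2f}/IOPS, "
              f"{drives_for_capacity} drives to hit capacity, "
              f"{drives_for_iops} drives to hit IOPS")

With numbers like these, SATA carries the capacity-driven workload for a fraction of the SSD cost, while SSD needs far fewer drives to hit the IOPS target; that gap is the entire economic case for tiering, and it only disappears if SSD wins both columns.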
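
On the RAID-DP point, the textbook write-penalty arithmetic is worth sketching, since it is what the "parity impact" gut feeling refers to. The penalties below (2 for RAID-10, 6 for a read-modify-write RAID-6) are the classic rules of thumb for traditional implementations; NetApp's counter-argument is precisely that WAFL's full-stripe writes avoid paying them. The workload mix is made up:

    # Classic backend IOPS estimate: reads pass through, writes are
    # multiplied by the RAID write penalty (traditional implementations).
    WRITE_PENALTY = {"RAID-10": 2, "RAID-6": 6}

    def backend_iops(front_end_iops: int, write_fraction: float, raid: str) -> float:
        reads = front_end_iops * (1 - write_fraction)
        writes = front_end_iops * write_fraction
        return reads + writes * WRITE_PENALTY[raid]

    for raid in WRITE_PENALTY:
        # 10,000 front-end IOPS at a 70/30 read/write mix (hypothetical).
        print(raid, backend_iops(10_000, 0.3, raid))

Whether WAFL actually dodges that multiplier as aggregates fill and fragment is exactly what the vendors were arguing about on Twitter.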
Going forward, my opinion is that storage tiering will remain important and will grow to include migrating data to private/public cloud and even cheaper storage platforms, making intelligent decisions along the way such as removing DR replication links post-migration to free up expensive licensing. Basically, the right data on the right platform, protected the right way... at all times.

Thursday, February 18, 2010

Cost per $metric - Part 2

Previously, I discussed storage costs and noted that, while cost per usable GB is perfectly fine for capacity-driven tiers, it has less use in IOPS-constrained tiers. There were a few aspects of storage costing that were not covered in that post.

First off, most arrays come with additional management software that is often licensed per TB threshold; you only incur additional license costs at certain capacity points (at 20TB, then again at 40TB, for example). You would need to work with the vendor to ensure that these upgrades are factored into the flat cost per GB for it to be a true "total cost."
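
As a quick illustration of how those thresholds behave (the step points and fees below are invented, not any vendor's price list), the effective cost per GB jumps every time capacity crosses a licensing tier unless the steps are blended into the quoted rate up front:

    # Hypothetical stepped license schedule: an extra fee each time
    # provisioned capacity crosses a threshold.
    LICENSE_STEPS_TB = [(20, 15_000), (40, 15_000), (60, 15_000)]
    HARDWARE_COST_PER_TB = 1_000  # invented flat hardware cost

    def total_cost(capacity_tb: float) -> float:
        license_cost = sum(fee for threshold, fee in LICENSE_STEPS_TB
                           if capacity_tb > threshold)
        return capacity_tb * HARDWARE_COST_PER_TB + license_cost

    for tb in (10, 25, 45, 80):
        print(f"{tb} TB -> ${total_cost(tb) / (tb * 1024):.2f} per usable GB")

A flat quoted cost per GB that ignores those steps understates the real number as soon as the first threshold is crossed.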

Similarly, Storagewonk (Xiotech) had a post discussing the sometimes prohibitive cost of the next TB once a customer has reached the capacity of the current array. It is a very valid point. Even with a good cost-per-GB pricing model, it is doubtful that it covers the cost of the next array, simply due to the substantial initial implementation costs and the lack of any guarantee of future growth. Xiotech's spin on this is that since theirs is such a modular platform, the ramp-up cost for additional footprints is substantially lower. Since I have no idea what an ISE costs, I'm not going to comment on how much of an expense advantage this is (I assume it is pretty substantial). But it did get me thinking... what other aspects can impact TCO and storage costs? For some of these, I'll use Xiotech's ISE as an example, since it really is a unique solution that demonstrates the necessity of thinking through the impacts of decisions; a rough cost roll-up sketch follows the list.
  • Xiotech's ISEs are all attached to the FC fabric. This could increase the number of fabric ports used to service array requests and should be accounted for; make sure to consider the density of the director blades you use for storage ports.
  • If hosts need storage from 2 distinct ISEs, I assume they'll need to be zoned to each ISE, which increases administration time.
  • If a host is spanned across multiple ISEs, how does that affect replication?  Is there a method to guarantee consistency across the ISEs?
  • Many up-and-coming solutions leverage host-level software to accomplish what used to be handled at the array (compression, replication, etc).  How does that affect VMware consolidation ratios?  Does it affect IO latency?  Make sure that you understand the cost of placing yet-another-host-agent on SAN attached hosts.  David Merrill (HDS) goes into this a little more on his blog.
  • Similarly, are there any per-host costs (such as management or MPIO software) that affect a solution's TCO?
  • What does migration look like if you have a lot of smaller-footprint arrays to replace in 3-5 years?
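Pulling those considerations together, a back-of-the-envelope roll-up makes it easier to compare a many-small-footprints design against a few larger arrays. Every input below is a hypothetical placeholder, not real pricing:

    # Rough TCO roll-up across arrays, fabric ports, per-host software,
    # and eventual migration effort. All figures are invented.
    def solution_tco(arrays, ports_per_array, port_cost,
                     hosts, per_host_sw_cost,
                     array_cost, migration_cost_per_array):
        fabric = arrays * ports_per_array * port_cost
        host_sw = hosts * per_host_sw_cost
        migration = arrays * migration_cost_per_array
        return arrays * array_cost + fabric + host_sw + migration

    # Many small footprints vs. a couple of larger frames.
    print(solution_tco(8, 4, 1_500, 100, 400, 75_000, 10_000))
    print(solution_tco(2, 8, 1_500, 100, 400, 300_000, 25_000))

The point isn't the specific totals; it's that fabric ports, per-host licensing, and future migrations all belong in the same spreadsheet as the arrays themselves.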
Any good IT architect will look at the total impact of implementing new technology into the environment. In a lot of cases (and I'd pick on Exchange on DAS here), short-term cost savings can be quickly eroded by long-term sustainability issues... especially in shared environments.

Wednesday, February 3, 2010

Guarantees and Lowest Common Denominator

In the last few days, 3PAR, Pillar, and HDS joined NetApp as vendors offering guarantees around storage efficiencies.  Chuck Hollis (EMC) posted why he feels that EMC (not including VMware, natch) won't offer blanket guarantees like this in the near future.  The comments showed that a lot of people were passionate about the topic, especially vendors.  It also showed that people who post on Chuck's blog apparently like to talk like press releases.

Honestly, I find this entire topic unnecessary and a little boring.  I don't think that guarantees necessarily mean that the vendor is selling snake oil, nor do I think that not having a guarantee shows the vendor is hiding something.  I'm still not sure how having an optional guarantee available for customers could ever be seen as being a "negative."

In a previous post, I discussed various ways to evaluate the cost of storage... cost/GB and cost/IOP. Certain vendors (NetApp, 3PAR, etc.) rely on software functionality such as primary storage deduplication and thin provisioning for competitive advantage. These features allow them to propose fewer disks and less capacity to meet a customer's IOP or GB requirements. What the guarantees show is that the vendor will stand behind those numbers.

If a customer is allowing a vendor to propose fewer spindles due to "secret sauce software," then I'd expect those terms to be written into the contract regardless of whether or not the vendor offers a guarantee. Other than marketing, I don't see much value in the guarantees that a decent purchasing contract wouldn't provide. Yes, my opinions have shifted a little since the original NetApp guarantee... it is still a great marketing instrument, but outside of that, there's not a lot of actual value.
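
To make that concrete, here is a minimal sketch of why the efficiency ratio belongs in writing. The 50TB requirement and the 1.5:1 guaranteed ratio are made-up figures:

    # If a proposal is sized assuming an efficiency ratio (dedupe plus
    # thin provisioning), the purchased capacity shrinks; so does the
    # shortfall someone has to cover if the ratio isn't achieved.
    usable_requirement_tb = 50     # hypothetical customer requirement
    guaranteed_efficiency = 1.5    # ratio the vendor sized the bid against
    achieved_efficiency = 1.2      # what actually materializes

    purchased_tb = usable_requirement_tb / guaranteed_efficiency
    delivered_tb = purchased_tb * achieved_efficiency
    shortfall_tb = usable_requirement_tb - delivered_tb

    print(f"Purchased (pre-efficiency) capacity: {purchased_tb:.1f} TB")
    print(f"Usable at the achieved ratio: {delivered_tb:.1f} TB")
    print(f"Shortfall to be made whole: {shortfall_tb:.1f} TB")

Whether that shortfall gets covered by a marketing guarantee or by a purchasing contract, somebody has to write it down; my point is that the contract route gets you there anyway.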

Various other notes from the comments...
"Since the SSD on V-max thru gateway CIFS debacle^W benchmark, it's not even apparent that a workaday NAS solution from EMC can crawl north of 45% storage efficiencies"
You shouldn't claim that SPC benchmarks have validity and then bash EMC's SPECsfs entry. Not many of the SPC entries have any more real-world relevance than EMC's entry here.
"Customers don’t want to have to bring in a team of neurologists to build a storage and data protection solution. NetApp offers simplicity and a great efficiency story."
Last time I checked, NetApp's guarantee required neurologists^W professional services.
"If a vendor is getting into my environment by selling some executive a useless empty guarantee we've started on the wrong foot from square one."
Hate to say it, the problem here really isn't the vendor with the guarantee... it's upper management not listening to their people.
"When I'm buying a car (infrequently, thank goodness) I am interested in the warranties and guarantees; it's a seller's mark of confidence in his product."
Which is why everyone buys Hyundai right now.  Or, in the storage realm, Xiotech.

NetApp has a great solution, as do 3PAR, HDS, and EMC. Conversations like this really don't help anyone involved, least of all the customers. I'd much rather see debates around various approaches to solving real-world problems than arguments like this, which seem to boil down to "who has the biggest contract."