Odds and Ends - Tiering, and Performance Planning
A few articles I wanted to briefly highlight:
- Storagebod posted a brief article on automated storage tiering. To briefly summarize, imperfect automated storage tiering is better than nothing... it is an easy way to get value out of SSDs in an existing environment and it provides a mechanism to move less-used data off of FC drives and onto SATA drives. One thing is certain... the importance of manual data layouts is decreasing. Between array architectures that don't 'allow' it (XIV being the most notorious example), architectures that don't 'need' it (NetApp FAS), and traditional architectures getting performance-driven automated storage tiering, using Excel to mismanage storage layouts could finally be over. Dimitris makes the point (among other comments) that due diligence still needs to be applied to allocations that require high performance only a few times a month, to ensure the volumes don't get migrated to the wrong tier. There are excellent comments on that post from EMC and NetApp discussing the two approaches.
- Dimitris also has a good post on vendor competition and under-sizing proposals to get the sale. It is worth reading just for the 'basics' explanation of performance-sizing small arrays - it also has some good information on Compellent's architecture. My comments regarding this vendor comparison are attached to that post. As always, prior to storage acquisitions, make sure you understand how the vendor determined their bid's sizing and get guarantees on performance/capacity if you are at all concerned about meeting your requirements.
- Chuck Hollis (EMC) and Marc Farley (3PAR) have excellent posts up on storage caching.
FAST & PAM Contrasted
** Updates Appended Below **
Over the past few days, I've been thinking about storage tiering... both in general, and specifically FAST and PAM II. Each takes a very different approach to providing better storage performance without highly specific tuning. This is an outsider's view based off of publicly available information (so, in cases where I'm wrong, both vendors have shown that they aren't shy in correcting misconceptions). First, some general definitions:
FASTv1: Released in December, it is the first version of EMC's Fully Automated Storage Tiering. It works at the LUN level, and requires identical LUN sizes across tiers. It is not compatible with Thin Provisioned/Virtual Provisioned LUNs.
FASTv2: Scheduled to be released in the second half of 2010, it is the next version of FAST that works at a sub-LUN level. It requires Thin Provisioning/Virtual Provisioning to manage the allocations since it utilizes that functionality to provide the granularity of migration.
PAM II: NetApp's Flash solution, Performance Acceleration Module. It acts as an additional layer of cache memory and does not have specific layout requirements.
Architecture Differences
FAST runs as a task on the processors of the DMX/VMAX. At user-specified windows, it will determine volumes/sub-volumes that would benefit from relocation and perform a migration to a different tier. This requires some IO capacity to migrate the data, so off-hours/weekends are ideal window candidates. It does a semi-permanent relocation so all reads/writes are serviced by the new location post-migration (semi-permanent since FAST can relocate an allocation back to the prior tier if the performance data indicates it is a good swap). Since RAID protection is maintained throughout the migration, the loss of a component does not substantially affect response time.
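To make the LUN-level behavior concrete, here is a minimal sketch of what one swap window might decide. This is a toy model and not EMC's algorithm: the `Lun` class, the `io_count` metric, and the `plan_swaps`/`apply_swaps` helpers are all hypothetical names. The only thing taken from the description above is that LUNs are the same size across tiers, so every move can be treated as a one-for-one swap.

```python
from dataclasses import dataclass


@dataclass
class Lun:
    name: str
    tier: str       # "ssd", "fc", or "sata"
    io_count: int   # I/Os seen since the last window (hypothetical metric)


def plan_swaps(luns, fast_tier="ssd"):
    """Pair the coldest LUNs on the fast tier with the hottest LUNs elsewhere.

    All LUNs are assumed to be the same size, so each move is a
    one-for-one swap and no tier changes its used capacity.
    """
    # Coldest first on the fast tier, hottest first everywhere else.
    fast = sorted((l for l in luns if l.tier == fast_tier),
                  key=lambda l: l.io_count)
    slow = sorted((l for l in luns if l.tier != fast_tier),
                  key=lambda l: l.io_count, reverse=True)
    swaps = []
    for cold, hot in zip(fast, slow):
        if hot.io_count <= cold.io_count:
            break  # nothing left that is busier than what is already on SSD
        swaps.append((hot, cold))
    return swaps


def apply_swaps(swaps):
    """Exchange tiers for each (hot, cold) pair chosen by plan_swaps."""
    for hot, cold in swaps:
        hot.tier, cold.tier = cold.tier, hot.tier


if __name__ == "__main__":
    luns = [Lun("a", "ssd", 10), Lun("b", "fc", 500),
            Lun("c", "sata", 5), Lun("d", "ssd", 900)]
    swaps = plan_swaps(luns)
    apply_swaps(swaps)
    for lun in luns:
        print(lun.name, lun.tier)
```

In this model nothing moves between windows, which is why a workload that spikes only occasionally can end up on the wrong tier.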
PAM II is treated as an extremely large read cache. Basically, as a given read-block is expired in memory, it trickles down to PAM until it is finally flushed and resides solely on disk. This gives PAM II a few nice features. First of all, there is no performance hit during the 'charging' of the PAM - since it is fed by expired 'tier-1' cache, there is no additional performance impact after the un-cached block is read. Secondly, it does not cache any writes. This is a giant assumption on my part, but I assume that due to the 'virtualization' WAFL provides, PAM does not need to track changed blocks on the disk. Since everything is pointer based (think of NetApp snaps), when the track is changed on disk, future reads hit the new disk location then get migrated through the cache levels like 'new' reads since the location has changed (the old location/data gets expired fairly quickly). The downside to this approach is that the loss of PAM requires all reads to be serviced by disk+tier 1 memory until it is replaced and recharged.
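The "trickle down" behavior described above can be sketched as a two-level read cache, where blocks evicted from memory land in a larger flash layer before being dropped entirely. This is only an illustration of the idea, not how Data ONTAP implements it. The class name, the LRU eviction policy, the promotion of flash hits back into memory, and the way writes simply invalidate cached copies are all assumptions made to keep the example small.

```python
from collections import OrderedDict


class TwoLevelReadCache:
    """Toy read cache: RAM in front, a larger flash layer fed by RAM evictions."""

    def __init__(self, ram_blocks, flash_blocks):
        self.ram = OrderedDict()      # least recently used entry is first
        self.flash = OrderedDict()
        self.ram_blocks = ram_blocks
        self.flash_blocks = flash_blocks

    def read(self, block, disk):
        """Return (data, source), where source is "ram", "flash", or "disk"."""
        if block in self.ram:
            self.ram.move_to_end(block)
            return self.ram[block], "ram"
        if block in self.flash:
            # Assumption: a flash hit is promoted back into RAM.
            data, source = self.flash.pop(block), "flash"
        else:
            data, source = disk[block], "disk"
        self._insert_ram(block, data)
        return data, source

    def _insert_ram(self, block, data):
        self.ram[block] = data
        if len(self.ram) > self.ram_blocks:
            # The expired RAM block trickles down into flash; flash is
            # never filled by extra disk reads of its own.
            victim, victim_data = self.ram.popitem(last=False)
            self.flash[victim] = victim_data
            if len(self.flash) > self.flash_blocks:
                self.flash.popitem(last=False)  # now resides only on disk

    def write(self, block, data, disk):
        """Writes are not cached; stale cached copies are simply dropped."""
        disk[block] = data
        self.ram.pop(block, None)
        self.flash.pop(block, None)


if __name__ == "__main__":
    disk = {i: f"d{i}" for i in range(5)}
    cache = TwoLevelReadCache(ram_blocks=2, flash_blocks=2)
    for block in (0, 1, 2, 0, 0):
        print(block, cache.read(block, disk)[1])
```

Losing the flash layer in this model is the same as emptying `flash`: reads still succeed, but everything outside RAM goes back to disk until the layer refills.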
One thing that the NetApp resources on Twitter kept repeating was the benefit of PAM as an extension of cache. I assume the main benefit of taking this approach to Flash is that it is accessed via memory routines (fewer layers/translation to execute through) rather than disk routines. Whether or not this is a significant performance benefit, I really can't say.
From the initial implementation, PAM will provide almost immediate benefit as data expires from cache. FAST will require a few iterations of the swap window before things have optimized. Taking a longer view, FAST will work best with consistent workloads... after a few weeks, the migrations should hit an equilibrium and response times should be stable and fast. Component failures should not adversely affect response time. PAM, as an extension of cache, will continuously optimize whatever blocks are getting hit hardest at any given moment. While this is more flexible day-to-day than data migrations, consistent performance could be an issue. Additionally, the IO hit of losing PAM would increase response times, but the impact of this is somewhat reduced by the fact that ramping up PAM is much faster than the data migrations that FAST requires.
Both solutions make various trade-offs between performance, stability, and consistency. Understanding these trade-offs will benefit customers as they choose which tools to leverage in their environment. Following are a few considerations...
Considerations
- Many customers have performance testing environments. Since both of these approaches optimize as tests run, what relationship can be drawn between the 3rd-5th week of integration/performance testing and the production implementation? Theoretically, if the data is identical between performance testing and production, NetApp dedup could leverage performance testing optimizations during the production implementation.
- Can customers run both FASTv1 and FASTv2 simultaneously since they have mutually exclusive volume requirements? Are both separately licensed? There are implementations where LUN level optimization may be preferred over sub-LUN.
- NetApp can simulate the benefits of PAM II in an environment. Can similar benefits be simulated for FAST prior to implementation?
- I assume that FAST will promote as much into SSD as possible to improve response time. How can customers determine when to grow that tier of storage?
- If a customer is using PAM II to meet a performance requirement, what can they do to reduce the impact of a PAM II failure?
- For both FASTv2 and PAM II, how can a customer migrate to a new array while keeping the current performance intact? With FASTv1, it is a simple LUN migration since it is determinable what tier a LUN is on. With FASTv2 and PAM II, it gets tricky (please note, I'm not talking about migrating the data, which is a standard procedure, I'm talking about making sure you hit performance requirements post-migration).
To be clear, this is an "apples to oranges" comparison. Each solution takes a completely different approach to implementing flash into an array, and the two solutions behave very differently.
Additionally, since I was focusing on Flash in particular, I neglected to compare cache capacity directly. DMX/VMAX has a much higher cache capacity than the NetApp arrays. Per Storagezilla on Twitter: "Symmetrix already has acres of globally accessible DRAM for read/write and doesn't need anything like PAM."
Finally, cost does play into comparing the two approaches, but I don't have access to any sort of real-world pricing.
The Tiers are Drying Up
A few days ago, NetApp CEO Tom Georgens pronounced that storage tiering is dead. Simon Sharwood has a good post linking to existing commentary... rather than re-link all the sites, I suggest you read his take on it (and associated links). The general consensus is that tiering is not dead.
A sentiment that I agree with, for the most part. Let's think about why most people tier, and what this pronouncement could possibly indicate.
- Many organizations tier storage to reduce the TCO of their environment. SSDs are expensive, SATA disks aren't, and not everything requires IOPS ahead of bulk capacity. All things being equal, if SSDs were the cheapest storage option for capacity and IO, then this reason for tiering would not be relevant (best performance in most cases for the cheapest cost... who wouldn't like that?).
- Many organizations tier storage to meet unique requirements such as WORM. Of course, it'd be silly to enact WORM policies on 100% of all storage in an enterprise. Since NetApp offers SnapVault, it is obvious he doesn't consider this tiering at all... they still offer SnapVault, right?
- So, what it basically comes down to is the standard direction that the industry is heading: the adoption of SSDs + SATA drives for all storage. This is colloquially referred to as "Flash and Trash." Tom Georgens seemed to be advocating placing 100% of all data on SATA disks and using PAM SSD cards as an extremely large and fast cache to maintain performance requirements. As discussed on Twitter, the main drawback of this is that if you lose PAM, your storage response time is impacted until the replacement PAM is primed with the data. If you want more information, Mike Riley and Dimitris Krekoukias were the two NetApp resources on Twitter who were most engaged.
- Additionally, the NetApp stance is that WAFL and RAID-DP provide a RAID-6 implementation that is faster than traditional vendors' RAID-10 implementations... so basically even the SATA drives should perform admirably. The NetApp resources took this as an assumed fact and most of the other people involved didn't, which made some of the retorts nonsensical at times. NetApp can show benchmarks that indicate this performance level is realistic (well, as realistic as a benchmark can get), and other vendors can show benchmarks indicating that WAFL performance degrades as it fragments. As someone in the "not-vendor" space, I can't weigh in on how much of an advantage this is, nor whether or not it actually degrades as time goes on. My gut feeling is that there isn't some magic algorithm that removes the parity impacts of RAID-6 after the volumes start filling up.
- Probably the best information presented on Data ONTAP over the last few months was over at Marc Farley's blog, StorageRap. The comments have a lot of good, understandable information on how WAFL works.
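The parity concern in the RAID-DP bullet above can be put in rough numbers using the common small-random-write rule of thumb: each host write costs about 2 back-end I/Os on RAID-10 and about 6 on a traditional read-modify-write RAID-6. The sketch below applies that rule of thumb only. It says nothing about WAFL, whose whole argument is that it avoids the read-modify-write case, and the disk count, per-disk IOPS, and read/write mix are made-up illustration values, not measurements.

```python
# Back-end I/Os generated by one small random host write (rule of thumb).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def host_iops(raw_iops, read_fraction, raid_level):
    """Estimate the host IOPS a disk group can sustain.

    Each host read costs one back-end I/O and each host write costs
    WRITE_PENALTY[raid_level] back-end I/Os, so:
        host * (reads + writes * penalty) = raw
    """
    write_fraction = 1.0 - read_fraction
    penalty = WRITE_PENALTY[raid_level]
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical group: 10 SATA disks at 80 IOPS each, 70% reads.
    raw = 10 * 80
    for level in ("raid10", "raid6"):
        print(level, round(host_iops(raw, 0.70, level)))
```

The gap between the two results widens as the write fraction grows, which is why the question of whether full-stripe writes hold up as volumes fill matters so much to this argument.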