UPDATE [8/6/2010]: If you're interested in more current XIV information, I recommend reading Tony Pearson's recent posts here and here. He also provided additional information in the comments to one of my posts here.
Over the past few weeks, between the Wave and the blog posts, I've been thinking about XIV quite a bit. It has taken IBM quite a while to attempt to explain the impact and risk of double drive failures on XIV.
IBM definitely has an explanation, one that could have been told quite a while ago. In fact, I'd assume it's the same explanation they've been giving customers who pushed the point: the risk is less than it seems because of quick rebuilds and the way mirrored data is distributed between interface and data nodes. I realize that UREs (unrecoverable read errors) are a very large concern, but to be honest, I bet less than 5% of customers even think about storage at that level. Perhaps the double drive failure issue is just a red herring that draws attention away from other issues.
One thing that continues to stick out in my mind is the ratio of interface nodes to data nodes. On the Google Wave, one of the IBM VARs made the following statement:
Remember there is more capacity in the data modules than in the interface modules. (9 data, 6 interface) Why they couldn't make this easy and have an equal number of both module types, I'll never know! :)
The interface nodes are only 40% of the array (6 of 15 modules). Even IBM VARs can't explain why this is a 40:60 ratio rather than 50:50. This imbalance increases the probability of double drive faults causing data loss at high capacity, and it is a pretty specific design decision.
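For anyone who wants to check the 40:60 figure, here's a quick sketch of the arithmetic using the module counts from the VAR's quote. It assumes (my assumption, not something stated above) that every module holds the same number and size of drives, so the share of modules equals the share of raw capacity.

```python
from fractions import Fraction

# Module counts quoted by the VAR: 6 interface modules, 9 data modules.
INTERFACE_MODULES = 6
DATA_MODULES = 9


def interface_share(interface: int, data: int) -> Fraction:
    """Fraction of all modules that are interface modules.

    Assumes every module holds identical drives, so this is also the
    interface modules' share of raw capacity."""
    return Fraction(interface, interface + data)


share = interface_share(INTERFACE_MODULES, DATA_MODULES)
print(f"interface: {float(share):.0%}, data: {float(1 - share):.0%}")
```

With 15 modules in total, an exact 50:50 split isn't possible anyway; the closest options would be 7:8 or 8:7.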
I wonder if it is related to the Gig-E interconnect and to getting "acceptable" performance out of the non-interface nodes. Jesse over at 50 Micron shares similar thoughts. Thinking this through (and this is all simply a hypothesis)... perhaps the latency and other limitations of the Gig-E interconnect are somewhat offset by having additional spindles (IOPS and throughput) on the "remote" data nodes. I'd like to load an XIV frame to 50% utilization, run a 100% read workload against it, and see whether the interface nodes are hit much harder than the data nodes (in effect, performing like RAID 0 rather than RAID 1). If that were true, for optimal performance you'd never want to load a frame past the point where new volumes would be allocated solely from data nodes.
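If someone with a frame did run that read test, one simple way to judge the result would be to compare average per-module load on the two module types. This is a hypothetical sketch: the function name is mine, and it assumes you can somehow collect read IOPS per module (how to pull those numbers from an XIV isn't covered here). A ratio near 1.0 means interface and data modules carry similar per-module load; a ratio well above 1.0 would be consistent with the hypothesis above.

```python
from statistics import mean


def per_module_load_ratio(interface_iops: list[float], data_iops: list[float]) -> float:
    """Mean read IOPS per interface module divided by mean read IOPS per data module.

    Hypothetical helper: inputs are per-module IOPS samples gathered by
    whatever means the array allows during a 100% read workload."""
    if not interface_iops or not data_iops:
        raise ValueError("need at least one sample for each module type")
    data_mean = mean(data_iops)
    if data_mean == 0:
        raise ValueError("data modules show no load; ratio is undefined")
    return mean(interface_iops) / data_mean
```

You would pass six interface-module samples and nine data-module samples from the same test window; averaging per module keeps the 6:9 difference in module count from skewing the comparison.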
I am not claiming this is true (I have no way to test it), but if XIV changes the interconnect to a different type (InfiniBand, for example), I'll find it interesting if "suddenly" there is a 50:50 ratio of interface to data nodes.