Big, Big Storage Systems - Fact or Fiction?
I talked to Dave Fellinger, CTO of Data Direct Networks, and the discussion made me think about big, big storage systems. We are entering the era of PB storage systems. But what is reality and what is fiction? There are performance, reliability, footprint and cost issues that must be considered. Here are some questions to ask:
1. If the storage vendor claims they can support up to a PB of capacity, ask them to explain how the system can run optimally with that much storage. Ask for real-world proof.
2. A storage system with that much capacity will inevitably have disk drive failures. What happens to the performance of primary I/O while a RAID rebuild is running? With RAID 5? With RAID 6?
3. How long does the RAID rebuild take? Days? Hours? Minutes? Seconds? Some next-generation storage systems claim to do RAID 6 rebuilds rapidly - in minutes or less. Ask them how they achieve this and, again, get proof.
4. With that many disk drives, your chances of silent data corruption go up. Check out the CERN report published in April 2007 that analyzes silent data corruption, and a great article by Robin Harris that analyzes the numbers. Based on those findings, a PB of data storage will statistically have 2,500 corrupt files that you won't even know about - and that is with non-compressed files; the number goes up with compressed files. This of course is unacceptable. How does your storage system deal with silent data corruption? Will it even detect silent data corruption? Will it fix it once detected? And do any of these processes impact performance?
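To make the detection question in item 4 concrete, here is a minimal sketch of one common application-level approach: record a checksum for every file when it is known to be good, then re-read the files later and compare. It assumes SHA-256 and a hypothetical in-memory manifest (the names `sha256_of`, `build_manifest` and `find_corrupt` are made up for illustration), and it is not a description of how Data Direct or any other array handles corruption internally.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a known-good checksum per file (hypothetical manifest format)."""
    return {path: sha256_of(path) for path in paths}


def find_corrupt(manifest):
    """Re-read every file and return the paths whose checksum no longer matches."""
    return sorted(p for p, expected in manifest.items() if sha256_of(p) != expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        good = os.path.join(workdir, "good.dat")
        bad = os.path.join(workdir, "bad.dat")
        for path in (good, bad):
            with open(path, "wb") as handle:
                handle.write(b"payload" * 1000)
        manifest = build_manifest([good, bad])

        # Simulate a silent bit flip: change one byte without changing the size.
        with open(bad, "r+b") as handle:
            handle.seek(10)
            handle.write(b"\x00")

        print(find_corrupt(manifest))  # lists only the path of bad.dat
```

Note that this only detects corruption; fixing it needs a redundant copy or parity to rebuild from. The verification pass also re-reads every byte, which is exactly why the performance question in item 4 matters.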
Dave had great answers for all of the above. I also found an ESG Lab report, published in 2008, that analyzed the Data Direct solution and architecture. I used to be a part of the ESG Lab team, so I was curious to read the report, and it validates a lot of what Dave claimed. For example, the ESG Lab report verified that Data Direct performed a RAID 6 rebuild of a 1 TB drive in just 30 seconds. That wasn't a typo - a 1 TB RAID 6 rebuild in 30 seconds! Some storage systems literally take days to perform this task.
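To put the rebuild numbers in perspective, here is a back-of-the-envelope sketch: rebuild time is simply capacity divided by the sustained rebuild rate. The 50 MB/s and 5 MB/s rates are assumptions picked for illustration, not measurements from the ESG Lab report, and the calculation assumes a decimal terabyte and a rebuild that rewrites the whole drive.

```python
TB = 10**12  # decimal terabyte, the unit drive vendors use
GB = 10**9
MB = 10**6


def rebuild_seconds(capacity_bytes, rate_bytes_per_sec):
    """Time to rewrite a whole drive at a given sustained rebuild rate."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rebuild rate must be positive")
    return capacity_bytes / rate_bytes_per_sec


def implied_rate(capacity_bytes, seconds):
    """Sustained rate needed to rewrite a whole drive in the given time."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return capacity_bytes / seconds


if __name__ == "__main__":
    # Both rates below are illustrative assumptions, not vendor figures.
    scenarios = (
        ("50 MB/s (assumed, idle array)", 50 * MB),
        ("5 MB/s (assumed, throttled under load)", 5 * MB),
    )
    for label, rate in scenarios:
        hours = rebuild_seconds(TB, rate) / 3600
        print(f"{label}: {hours:.1f} hours")
    print(f"1 TB in 30 s implies {implied_rate(TB, 30) / GB:.1f} GB/s")
```

Under these assumptions a full 1 TB rebuild takes about 5.6 hours at 50 MB/s and about 2.3 days at 5 MB/s, while a 30-second full rebuild would need roughly 33 GB/s. That is far more than a single SATA drive can write, so a good follow-up question for any vendor quoting a figure like this is what exactly gets rebuilt in those 30 seconds.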
ESG Lab also validated that Data Direct supports up to nearly 1 PB of capacity (in just 2 racks, which is pretty amazing) without a performance hit. Additionally, the report does a good job overall of explaining the Data Direct technology, and ESG Lab ran performance, reliability, and scalability tests with impressive results.
The claims of PB storage systems must be carefully considered - just physically housing lots and lots of disk drives doesn't mean that the storage system will perform optimally and reliably. And the questions of disk failures, RAID rebuilds and silent data corruption, and how the storage system handles them - especially one that is going to be used for 100s of TBs and potentially PBs - should be intelligently answered by any storage vendor that wants your business.



