More on Microsoft IT using L2 SAS DAS for Exchange
This is a follow-up post to last week's series about how Microsoft IT built an unusual Exchange message store architecture.
As a recap, here are the highlights of what Microsoft IT revealed:
- Not using disk arrays, but direct-attached storage
- Not using RAID or clustering, but continuous replication onto spare servers
- Not using tape backup, but cheap disk arrays
If you missed any of the previous parts of this series, you can click on the links above. Go ahead, I'll wait...
Some folks are assuming that the SAS -- serial attached SCSI -- drives referred to in the first post are the relatively expensive, 2.5" 15K RPM type. They're not: the clue is in the word "nearline" (aka Level 2).
What the market calls a Nearline SAS drive is typically a 3.5" drive that spins at 5K or 7K RPM -- just like its SATA cousins, which share similar mechanisms. The improved performance over Enterprise SATA drives, estimated by MSIT as about 25%, comes from better command queuing, not from faster platters.
In a heavily used email store at peak times, I/O command queues can grow surprisingly long. Depending on how heavily your users load up the server, you could plan for peaks on the order of one IOPS per user. In the Microsoft design, this means about 350 IOPS per spindle -- that's far more than the actual drive can manage, but remember that these are transient peaks, not a steady-state average.
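To put rough numbers on that, here's a back-of-the-envelope sketch in Python. Only the "about one IOPS per user" and "about 350 IOPS per spindle" figures come from the discussion above. The 350 users per spindle is simply what those two figures imply together, and the drive's sustained rate (100 random IOPS), the burst length and the off-peak load are my own illustrative assumptions, not anything MSIT disclosed.

```python
def peak_iops_per_spindle(users_per_spindle, peak_iops_per_user=1.0):
    """Peak demand on one drive if every user hits their peak at once."""
    return users_per_spindle * peak_iops_per_user


def backlog_after_burst(arrival_iops, service_iops, burst_seconds):
    """Commands left waiting after a burst that outpaces the drive."""
    return max(0.0, (arrival_iops - service_iops) * burst_seconds)


def drain_seconds(backlog, service_iops, steady_arrival_iops):
    """Time to clear the backlog once demand falls back to its steady level."""
    spare = service_iops - steady_arrival_iops
    if spare <= 0:
        return float("inf")  # the drive never catches up
    return backlog / spare


if __name__ == "__main__":
    # Assumed: 350 users per spindle, implied by ~1 IOPS/user and ~350 IOPS/spindle.
    peak = peak_iops_per_spindle(350)
    # Assumed: the drive sustains 100 random IOPS and the burst lasts 2 seconds.
    backlog = backlog_after_burst(peak, service_iops=100, burst_seconds=2)
    # Assumed: off-peak demand is 50 IOPS.
    wait = drain_seconds(backlog, service_iops=100, steady_arrival_iops=50)
    print(f"peak demand: {peak:.0f} IOPS")
    print(f"backlog after burst: {backlog:.0f} commands")
    print(f"time to drain: {wait:.0f} s")
```

Under those assumptions, a short burst leaves hundreds of commands waiting, which is why the way a drive handles its queue matters more than its headline speed.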
That's why good command queuing is important. The deeper queues provided by SCSI's TCQ are helpful here, as is the ability to tag commands for explicit ordering of execution, which NCQ doesn't offer. (Don't confuse SCSI TCQ with the ill-advised attempt to glue TCQ onto IDE drives a few years back.)
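As a rough illustration of why a deeper queue helps, here's a toy Python model. It is emphatically not how TCQ or NCQ firmware actually schedules commands: the track numbers are made up, the drive is assumed to pick the nearest pending track within whatever window of commands it can see, and rotational latency and ordered tags are ignored entirely.

```python
def nearest_first(start, tracks):
    """Order pending requests by repeatedly picking the track nearest the head."""
    pending = list(tracks)
    order = []
    pos = start
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


def total_travel(start, requests, queue_depth):
    """Head travel when the drive can reorder only `queue_depth` commands at a time.

    A queue depth of 1 is plain first-come, first-served.
    """
    travel = 0
    pos = start
    for i in range(0, len(requests), queue_depth):
        window = requests[i:i + queue_depth]
        for track in nearest_first(pos, window):
            travel += abs(track - pos)
            pos = track
    return travel


if __name__ == "__main__":
    # Hypothetical request stream that bounces the head back and forth.
    requests = [90, 10, 80, 20, 70, 30]
    for depth in (1, 2, 6):
        print(f"queue depth {depth}: head travel {total_travel(50, requests, depth)}")
```

With this particular request stream, the head travels 340 tracks with no reordering, 240 with a queue depth of two, and 120 when all six commands can be reordered together.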
Other performance factors include better vibrational tolerance and full-duplex operation:
- Vibration can be a significant problem when several drives are packed together. The tracks on today's ultra-dense drives are brain-meltingly tiny, so head positioning is critical. Vibration especially affects write performance -- head positioning for reads is less critical. If you're a fraction of the track pitch out of alignment, you can usually read the data just fine, but you need to be more accurate when writing and that extra fraction of accuracy takes time.
- Email database access patterns are a random mix of reads and writes (more reads than writes, naturally). Writes need to be synchronous for data resilience, so writes can cause half-duplex systems to stall. Again, TCQ's command ordering features are useful here.
Of course, it's all very well building up long command queues, but it's no good if interactive performance suffers. That's why MSIT has switched its users to Outlook cached mode. Users are insulated from database latency, because the Exchange MAPI/RPC operations are happening in the background, to populate a local cache. The user interacts with the cache, not directly with the server.
I didn't hear Microsoft IT disclose which vendor it uses for storage, but I have a feeling it's using Dell PowerVaults with Seagate ES.2 spindles. Dell and Seagate respectively quote a 30% or 33% IOPS advantage over the equivalent enterprise SATA drives (although, tragically, some of the Dell marketing bumf talks about 133% improvements).

