Isilon for SAS Grid Computing

In this post I want to share some information and experience regarding EMC Isilon for SAS Grid Computing.

What is SAS Grid Computing?

“SAS Grid Computing is an efficient way to analyze enormous amounts of data. SAS Grid Computing enables applications to deliver value in a highly effective manner for SAS analytics, data integration, data mining, and business intelligence, while enabling fine-tuning of the SAS grid environment to allow multiple applications to efficiently and dynamically use a virtual IT infrastructure.” [1]

Shared filesystems survey:

From an infrastructure point of view it is important to know that a SAS Grid Application (running on multiple nodes in the SAS Grid) requires access to the same data on every node. Therefore a shared filesystem is required.

SAS surveyed shared filesystems in [3], covering several representatives: IBM GPFS, Quantum StorNext, and GFS2. They also evaluated network-attached storage systems with NFS and CIFS.

In the case of NFS, SAS tested Isilon and made the following statement:

“The benchmarks were run on a variety of devices including EMC® Isilon® […] NFS benefits from isolating file system metadata from file system data. Some storage devices like Isilon allow this isolation. Excellent results were obtained from devices [like Isilon] that both isolate file system metadata and utilize SSD’s for file system metadata.” [3]

SAS workload profile:

Another important attribute of SAS Grid Computing is that its workload tends to be very sequential and demands very high bandwidth. A sizing guideline from SAS is to provide more than 75 MB/s of storage bandwidth per compute core. As you can see, these environments easily require multiple GB/s of aggregate bandwidth.

“Generally SAS I/O workloads are sequential reads and writes, and fall into the percentage range of 50/50 to 60/40 reads versus writes. For the purposes of this presentation, we will use the 50/50 read/write split. SAS will automatically adjust the I/O size based on the data set sizes. For data sets larger than a few MB, SAS will use 128KB chunks.” [2]
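As a back-of-the-envelope illustration of the 75 MB/s-per-core guideline, a small grid already needs several GB/s from the storage. The node and core counts below are made-up assumptions for the example, not values from this post:

```shell
# Hypothetical SAS Grid: 4 compute nodes with 16 cores each
NODES=4
CORES_PER_NODE=16
MB_PER_CORE=75   # SAS sizing guideline: >75 MB/s per compute core

TOTAL_CORES=$((NODES * CORES_PER_NODE))
# 64 cores * 75 MB/s = 4800 MB/s, i.e. roughly 4.7 GB/s of aggregate bandwidth
echo "Required aggregate bandwidth: $((TOTAL_CORES * MB_PER_CORE)) MB/s"
```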

Simulate SAS workload:

If you want to simulate a general SAS workload with a 50/50 sequential mix, FIO [5] is the tool you should use. FIO is capable of doing alternating sequential read and write operations to one file. In our tests we observed that this is very different from other benchmark tools like iozone, which can only start one thread doing sequential reads and another doing sequential writes.

There is a major difference at the client side which results from memory handling. Without tuning the Linux kernel, you get much lower bandwidth results with FIO than with iozone. We also tested with a real SAS application, and its performance and behavior were very close to the FIO results. If you want to read how to set up an FIO test, please have a look at [2].
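As a sketch, an FIO job file matching the profile quoted above (alternating sequential reads and writes, 50/50 split, 128 KB blocks) could look like the following. The file size, target directory, and job count are assumptions you would adapt to your environment, not values from this post:

```
; sas-sim.fio - rough sketch of a SAS-like workload (run with: fio sas-sim.fio)
[sas-sim]
; alternating sequential reads and writes, 50/50 split, 128 KB blocks
rw=rw
rwmixread=50
bs=128k
; assumed file size, target directory (an Isilon NFS mount), and parallelism
size=16g
directory=/mnt/isilon
numjobs=8
group_reporting
```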

Client tuning:

The kernel tuning required to achieve this bandwidth is not unusual and is well known from tuning Oracle databases running on NFS. [4]

For our testing we used following kernel memory parameters:

vm.swappiness = 0
vm.dirty_expire_centisecs = 10
vm.dirty_writeback_centisecs = 10
vm.dirty_ratio = 10

In essence, these settings tell the kernel to flush dirty pages to storage much more frequently instead of buffering large amounts of data in the page cache.
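For reference, these settings can be made persistent in /etc/sysctl.conf (or a file under /etc/sysctl.d/) and loaded with sysctl -p; this is standard sysctl usage on Linux, not something specific to SAS or Isilon:

```
# /etc/sysctl.conf excerpt - flush dirty pages early and often for NFS streaming
vm.swappiness = 0
vm.dirty_expire_centisecs = 10
vm.dirty_writeback_centisecs = 10
vm.dirty_ratio = 10
```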

Isilon/EMC Whitepaper:

Besides these personal experiences with Isilon and SAS, we (EMC) ran a benchmark with SAS a while ago. [1] This white paper is worth reading to better understand a SAS Grid Computing infrastructure with Isilon. Please note that the paper is not up to date; it was done with the previous major OneFS release (6.5). There will be an update based on a newer OneFS 7.x release.

In conclusion, Isilon not only achieves excellent performance but is also much less complex than other solutions. Furthermore, it can scale with growing SAS Grid Computing environments.

[4] (page 13)

Posted also on EMC Community Network
