Violin Memory Blog

Hear what the experts at Violin Memory have to say about a variety of topics on all things Flash.

SQL Server and the WFA: Part 2 – Latency Matters

August 28, 2014

Greetings!

The Windows Flash Array (WFA) is all about high throughput, low latency, scalability, and balanced performance. We previously discussed its high throughput, and in this blog, we are going to focus on how latency really matters, especially if you want your databases to hum along providing copious value to your enterprise.

[Diagram: SQL Server and the WFA]

Having a high level of IOPS may seem impressive (and it is), but that’s only part of the high-performance equation. Two other factors matter at least as much in SQL Server environments when you’re after the highest performance: latency and throughput.

Latency is a measure of the time it takes for a request to return its data. The more you can lower it, the more CPU utilization you gain and the more application processing time you save (shorter report durations, faster batch windows, and so on).

The other factor is throughput. Each query has a finite amount of data to process; faster storage doesn’t change the amount of data in a report, only the time it takes to deliver it. So a disk array that caps out at 200 MB/s will deliver a 4,000 MB report in 20 seconds. In contrast, the WFA, which sustains 4,000 MB/s, can complete the task in 1 second.

Poor storage performance leads to poor CPU utilization. CPUs spend more time waiting than executing, which makes servers expensive, mostly idle investments. This reality has not escaped the attention of CIOs, CFOs, and other senior management. The challenge grows more pronounced as the number of server cores in a system increases, and since multi-core is now par for the course, you are almost certainly facing it.
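To see why, here is a minimal back-of-envelope sketch with assumed numbers (not a benchmark): if each unit of work needs a fixed slice of CPU time plus one synchronous I/O, CPU utilization collapses as storage latency grows.

```python
# Toy model with assumed numbers: each unit of work costs 0.2 ms of CPU time
# plus one synchronous I/O wait. Utilization = cpu_time / (cpu_time + io_wait).
cpu_ms_per_unit = 0.2

for label, io_latency_ms in [("flash-class latency (~0.5 ms)", 0.5),
                             ("disk-class latency (~8 ms)", 8.0)]:
    utilization = cpu_ms_per_unit / (cpu_ms_per_unit + io_latency_ms)
    print(f"{label}: CPU busy about {utilization:.0%} of the time")
# flash-class: ~29% busy; disk-class: ~2% busy
```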

When an application (or in our case, a database server) is active, it will reside in one of three queues:

  1. The Running Queue is where the work gets done. It is a currently executing process, and the point at which your servers are earning their keep, and perhaps a bit more. You are on the express train, sitting in First Class.
  2. The Waiting Queue is what its name implies; the process is waiting for some resource (I/O, network, locks, latches, etc.) to become available. You ran out of things you can do, so you’re stuck waiting at the train station for the rest of your team to get it together.
  3. The Runnable Queue holds processes that have their resources ready, and are waiting their turn for some CPU time. You’re ready to go, but the train already left the station, so you’ll hop on the next one as soon as it arrives and the conductor will show you your seat.

It’s not bad when one user is vying for the system’s resources.

It gets a whole lot worse when multiple users are vying for the same resources.
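A standard queueing-theory sketch (M/M/1, with assumed request rates and service times, purely for illustration) shows why: as more users push the storage device toward full utilization, average response time climbs steeply.

```python
# M/M/1 queueing sketch with assumed numbers: average response time climbs
# steeply as concurrent users push the device's utilization toward 100%.
service_time_ms = 5.0                  # assumed average service time per request
mu = 1000.0 / service_time_ms          # requests the device can complete per second

for users in (1, 50, 100, 150, 190):
    lam = users * 1.0                  # assume each user issues ~1 request per second
    utilization = lam / mu
    response_ms = 1000.0 / (mu - lam)  # mean response time, valid while utilization < 1
    print(f"{users:3d} users: {utilization:.0%} busy, ~{response_ms:.1f} ms per request")
# 1 user: ~5 ms per request; 190 users: ~100 ms per request
```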


Typically, disk arrays show good throughput until the number of users increases, at which point logically sequential requests become physically random. Keep in mind that being logically sequential at the database layer is entirely different from being physically sequential on the storage medium. Defragmentation, index rebuilds, clustered indexes, read-ahead, large batch requests, and the like may lead the DBA to believe that storage is being accessed sequentially. In practice, a SAN layout with striped LUNs, multiple components, and support for multiple workloads means the data is almost certainly not laid out sequentially on disk, or access to it is randomized by concurrent requests from other users.
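Here is a toy illustration of that effect (purely illustrative block offsets): two clients each scan their own file sequentially, yet the merged request stream the array actually services jumps between distant regions on every I/O.

```python
# Two logically sequential scans, interleaved by concurrency: the combined
# request stream seen by the storage device alternates between distant
# offsets, so it behaves like random I/O.
from itertools import chain

scan_a = range(0, 8)          # client A reads blocks 0..7 of its file
scan_b = range(1000, 1008)    # client B reads blocks 1000..1007 of its file

merged = list(chain.from_iterable(zip(scan_a, scan_b)))
print(merged)                 # [0, 1000, 1, 1001, 2, 1002, ...]
```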

Our patented Flash Fabric Architecture™ overcomes this resource contention for storage. Put on your geek hats; here’s how it works:

  1. One 64K inbound write is split into 16 separate 4K packets
  2. Each 4K packet has parity calculated, creating a total of 5K of data to store
  3. The final 5K is split into five 1K packets, one of which is sent to each of the five VIMMs in a RAID group

As you can see, the write is broken into small pieces and processed in parallel; this is what helps drive down latency. In our large-capacity arrays there are 64 high-speed, custom-built flash controllers each processing 1K, versus commodity SSD controllers processing the full 64K (RAID 1 or RAID 10 will send the full packet to both SSDs that host a log or tempdb device). The result is that data is spread evenly across the array, so there is no clumping of data, no reliance on data locality, and ultimately no hot spots. This means the WFA can scale to support massive user concurrency for both random and sequential workloads. Connecting through SMB Direct can also free up to 30% of CPU usage; those cycles can be put toward BI initiatives or other value-added activities.
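As a thought experiment, here is a toy sketch of that split in code. It is a simplification of the published description above; the XOR parity is a stand-in, not Violin’s actual parity scheme.

```python
# Toy model of the write path described above: a 64 KB write becomes sixteen
# 4 KB packets, each packet gains 1 KB of parity (XOR here, as a stand-in for
# the real scheme), and the resulting 5 KB is spread 1 KB per VIMM across a
# five-VIMM RAID group.
KB = 1024

def split_write(payload):
    assert len(payload) == 64 * KB
    packets = [payload[i:i + 4 * KB] for i in range(0, 64 * KB, 4 * KB)]  # 16 x 4 KB
    stripes = []
    for packet in packets:
        chunks = [packet[i:i + KB] for i in range(0, 4 * KB, KB)]         # 4 x 1 KB
        parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*chunks))      # 1 KB parity
        stripes.append(chunks + [parity])   # 5 x 1 KB, one chunk per VIMM
    return stripes

stripes = split_write(bytes(64 * KB))
print(len(stripes), "packets,", len(stripes[0]), "chunks of", len(stripes[0][0]), "bytes each")
# -> 16 packets, 5 chunks of 1024 bytes each
```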

You can hammer away at OLTP workloads while also executing highly parallel BI with many simultaneous related queries, without dragging down your overall performance. For example, say your organization has a 24-64 core DW/BI solution; that gives you the capacity to run at a parallelization factor of 24-64. To justify a dedicated device for parallelization, you would need to push that factor beyond 64.

Running a mix of multiple database workloads against a single array is now not only possible, but you can do so with incredible performance, consistent latency, and overall higher server and storage efficiency. Of course, if consolidation or virtualization of SQL Server instances is your preferred approach, the WFA can enable you to increase virtualization density as well.


With the low latency of the WFA, you can achieve these levels of performance enhancement for your SQL Server environment without any storage tuning, upgrading, or modification:

  • 3-10x faster reports
  • 3-6x faster transactions
  • 2-5x higher consolidation of SQL Server VM farms without performance penalty
  • Streamlined re-index process that drops from 20+ milliseconds to sub-millisecond
  • Index and stats defrag locks that can be released in sub-millisecond time
  • Grow your database size without taking a performance hit
  • 24×7 maintenance (do backups, index maintenance, etc., during the day) so if there’s an issue, it’s when you’re at work, not when you’re asleep

As you can see from this and the previous blog, we can do a lot in 3U not only to boost your SQL Server performance, but also to transform the economics of SQL Server. Finally, you can have storage so fast that it doesn’t require you to change your behavior; you can run tasks during the day while you’re at work, so why not buy the storage solution that lets you do so?

Nonetheless, performance alone will not meet the needs of a 21st century data center. Transforming the efficiency and economics of the data center, which ultimately yields reduced OPEX and CAPEX, is just as essential for overall corporate success. We’ll continue on this theme the next time.

Cheers!

Learn how Violin and Microsoft collaboratively optimized this solution to leverage WFA’s unique performance profile so you can run your SQL Server environment in a Flash.

For more information on the WFA and SQL Server, go to http://www.violin-memory.com/windows-flash-array-sql-server-solution-brief/


SQL Server and the WFA: Part 1 – Lots of I/O

July 31, 2014

Greetings!

As you probably already know, Violin Memory has announced all-flash arrays integrated with Windows Storage Server 2012 R2, dubbed the Windows Flash Array. If you’ve had a chance to give them the once-over, it’s obvious that they are a far cry from a white-box server running Windows Storage Server with JBOD attached. The WFA is all about high throughput, consistent low latency, scalability, and balanced performance. It’s a true enterprise solution. Of course, this does not happen by combining generic technologies; it requires differentiated hardware, the tightest possible software integration, and solution-focused engineering with a vision to excel, or is that accel? Well OK, it’s both.

The unique performance of the WFA is compelling for SQL Server workloads. For example, a common database application such as OLTP generates lots of random I/O, which makes sense as it’s driven by copious ad hoc inquiries from all across the enterprise. However, such a workload is a mismatch for legacy storage’s linear read/write focus, and the result is high latency and poor application performance. Yet these queries are often associated with high-value real-time applications such as customer support, sales, inventory control, and the like. As a result, storage professionals have tried their best to coax more performance from their cylinders of spinning rust, but their success has been limited at best.

The Violin Windows Flash Array (WFA) combines Violin Memory’s patented Flash Fabric Architecture™ (FFA), Microsoft’s fast SMB Direct protocol, and Microsoft Windows Storage Server 2012 R2. That sounds very nice, but what does it mean? Look inside a WFA and you’ll note that there are no disks of any type (HDD or SSD). As a result, there is no need for striping techniques and optimization algorithms, nor any need to overprovision storage or add cache to controllers in an attempt to garner enough I/O capacity. The FFA delivers fast, consistent access to our all-flash storage, so you don’t have to continuously monitor data hot spots and tune accordingly. In other words, it’s fast without your constant worry and attention.

[Graph: SQL read and write performance]

How fast is fast? Pretty darn fast. We did some SQLIO load testing on a WFA-64 equipped with 56Gb FDR InfiniBand connectivity at a customer site; here’s the performance we observed:

The WFA delivered over 1.2 million IOPS on SQL reads. That’s up to 50% higher performance compared with an industry-standard all-flash array connected through Fibre Channel.

The WFA delivered 1.2 million IOPS on SQL writes, which was proportionally even more impressive. That’s up to twice the throughput compared with an industry-standard all-flash array with Fibre Channel.
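As a hedged back-of-envelope to connect throughput and latency (using the ~1.2 million IOPS above and the sub-500-microsecond nominal latency quoted for the WFA elsewhere in this series), Little’s Law says the number of I/Os in flight equals IOPS times latency:

```python
# Little's Law sketch: outstanding I/Os = throughput x latency.
# Figures are the ballpark numbers quoted in this series, not a new measurement.
iops = 1_200_000
latency_s = 500e-6            # ~500 microseconds nominal latency

in_flight = iops * latency_s
print(f"~{in_flight:.0f} I/Os in flight to sustain {iops:,} IOPS at {latency_s * 1e6:.0f} us")
# -> ~600 I/Os in flight
```

In other words, on the order of a few hundred concurrent outstanding I/Os is enough to keep the array running at that rate.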

From these numbers you can see that the WFA is up to the task of delivering sustained performance for short-block random read/write workloads. And there is no ongoing tuning required. But did you know that this performance was achieved over a file-based solution, not block?

The WFA delivers DAS-like performance thanks to its support for SMB Direct, also known as SMB over RDMA. You get the simplicity of a file-based environment without sacrificing the performance you’re accustomed to from block-based solutions. The advantages of file over block can fundamentally change your ease of use and efficiency, so much so that the topic deserves a discussion of its own; we’ll dive into it in depth in a future blog.

From an I/O perspective, we can do a lot in 3U to boost your SQL Server performance. But attaining a high IOPS number alone does not guarantee balanced performance for a mix of multiple workloads. Achieving consistent low latency is essential, as it will transform how your SQL database and related apps perform, and reset your expectations! We’ll continue on this theme next time.

Cheers!

Learn how Violin and Microsoft collaboratively optimized this solution to leverage WFA’s unique performance profile so you can run your SQL Server environment in a Flash.

For more information on the WFA and SQL Server, go to http://www.violin-memory.com/windows-flash-array-sql-server-solution-brief/


Why Implement an All Flash Data Center?

June 23, 2014

[Image: the all-flash data center]

The argument goes something like this: if flash costs more than disk, why would you spend the money on an all-flash data center? Some might suggest that you just use flash for I/O-intensive applications like databases, where you can justify the additional expense over disk.

What we see from our customers is different. Not all have gone all-flash, but for those that have, the benefits are many.

All-flash data centers can provide new sources of revenue. Lower operating costs. Elimination of slow I/O workarounds. Improved application response times. Faster report turnaround. Simplified operations. Lower capital costs.

As a storage subsystem manufacturer, we put together the best system we can design, but we are constantly being schooled by our customers. For instance, we had a large telecom customer who felt they were missing some billing opportunities and redesigned their customer accounting software. When they implemented it on their traditional storage system, they didn’t see much benefit. They saw that the application wanted even more I/O, and brought in Violin. As a result, they found over $100 million in new revenue. That paid for the project handsomely, of course. This is revenue that wasn’t available with traditional storage, but is captured thanks to Violin’s low latency.

Another example of how flash storage changes the data center is the impact of low latency on servers and the software that runs on them. Moving to a Violin All Flash Array speeds up I/O so much that the traditional layers of overprovisioning and caching can be eliminated. The result: better application performance at lower cost. Customers have also told me they can free up people from this consolidation to redeploy on more productive efforts, since there is no need to manage the overprovisioning and caching infrastructure.

However, not all all-flash solutions are created equal. SSD-based solutions are inferior to a backplane-based approach like Violin’s Flash Fabric Architecture™. Consider key operating metrics such as power and floor space. For instance, 70 raw TB from Violin takes 3RU of space, while common SSD-based solutions take 12RU (or more) for the same raw capacity. This density also translates into power: the Violin 70TB draws about 1500W, while common SSD approaches may draw over 3000W for the same capacity. That translates into operating expense savings. One customer recently estimated they would save 71% in operating costs with Violin over traditional storage.
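Using the figures above (vendor-quoted, so treat them as illustrative rather than audited), the comparison per rack unit and per raw terabyte looks like this:

```python
# Density and power comparison using the figures quoted above (vendor-supplied,
# so treat them as illustrative rather than audited).
systems = {
    "Violin (70 TB raw, 3 RU, ~1500 W)":             (70, 3, 1500),
    "Typical SSD array (70 TB raw, 12 RU, ~3000 W)": (70, 12, 3000),
}

for name, (raw_tb, rack_units, watts) in systems.items():
    print(f"{name}: {raw_tb / rack_units:.1f} TB/RU, {watts / raw_tb:.1f} W per raw TB")
# Violin: ~23.3 TB/RU and ~21.4 W/TB; typical SSD array: ~5.8 TB/RU and ~42.9 W/TB
```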

Additionally, the Violin Flash Fabric Architecture provides superior performance, thanks to array-wide striping of data and parallel paths for high throughput that hold up under heavy loads. It also provides better resiliency, since hot spots are essentially eliminated. The result is not just a big step up over traditional disk storage; it is a significant improvement over SSD-based arrays.

Customers who have gone all-flash for active data have found they can buy the new storage and server equipment and still have money left over. This is in addition to any new sources of revenue realized, such as in the telecom example. Flash is essentially free.

The last hurdle has been data services. Some customers who have Violin installed love the performance but were hesitant to put all their data on it because they wanted enterprise-level availability features. Capabilities such as synchronous and asynchronous replication, mirroring, and clustering give enterprises a robust toolkit. They can configure their data centers in a variety of ways to protect against local issues like fire, metro-area problems like hurricanes and typhoons, and regional issues with global replication. These capabilities now exist in the Concerto 7000 All Flash Array from Violin Memory, allowing enterprises that want transformative performance to also employ the operational capabilities they need to meet their data center design goals.

The move to the all-flash data center is upon us.

The question really is: Why wouldn’t you implement an all-flash data center with Violin?

For more information go to www.violin-memory.com

Violin & Microsoft’s High-Performance, All-Flash Enterprise Storage

April 24, 2014

Guest Blog:
Scott M. Johnson
Senior Program Manager, Windows Storage Server
@supersquatchy

Hi Folks –

Violin Memory today announced the Violin Windows Flash Array (WFA)—an all-flash, high-performance storage appliance powered by Windows Storage Server 2012 R2.

Here are the highlights:

  • The WFA is the result of a joint development effort between Violin and Microsoft, which spent more than 18 months working together on software optimizations to get the most out of the array’s unique performance profile.
     
  • It takes advantage of SMB Direct (SMB 3.0 over RDMA) and Violin’s flash memory optimization algorithms to deliver an ideal combination of low latency, high IOPS, and low CPU use.
     
  • Performance provided by the WFA makes it an ideal storage platform for major enterprise workloads, including Microsoft SQL Server, Hyper-V virtualization, and Virtual Desktop Infrastructure (VDI).

If you’re not already coveting a Windows Flash Array for your own IT infrastructure, let me put things another way: this incredibly powerful storage appliance delivers 70 terabytes (TB) of raw storage capacity, nominal latencies of less than 500 microseconds, and more than 750,000 4K IOPS. All of that fits in a 3U, dual-node cluster that draws about 1500 watts and can be deployed in as little as 30 minutes, without an advanced degree in storage networking!
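For a sense of scale, here is the quoted 4K IOPS figure converted into aggregate bandwidth (illustrative arithmetic only; it assumes every I/O moves a full 4 KiB block):

```python
# Convert the quoted 750,000 x 4K IOPS figure into aggregate bandwidth.
# Illustrative only: assumes each I/O transfers a full 4 KiB block.
iops = 750_000
block_bytes = 4 * 1024

bytes_per_second = iops * block_bytes
print(f"{bytes_per_second / 1e9:.1f} GB/s ({bytes_per_second / 2**30:.1f} GiB/s)")
# -> 3.1 GB/s (2.9 GiB/s)
```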

Now that I’ve covered the highlights, let’s take a deeper look at Architecture, Performance, Scalability, Availability, and Manageability in a three-part series.


Software Defined Datacenter for Windows

April 21, 2014

Windows in the enterprise has been a SQL Server, SharePoint, Exchange kind of thing, with a little Hyper-V thrown in. In many cases there might be several, or even hundreds of, SQL databases throughout the enterprise, sometimes consolidated under Hyper-V, sometimes not. Apps such as SharePoint and Exchange can grow to the point where they become difficult to maintain. One of the common sore points in datacenter computing, not just in Microsoft environments, is I/O. Mechanical storage is just too slow. Caching helps. Overprovisioning helps. They don’t fix the problem, however.

To really fix I/O problems, you need to rethink the datacenter. Microsoft and Violin have done that with the new Windows Flash Array. For maximum flexibility, availability, and affordability, you need to virtualize each of the three datacenter layers: compute, network, and storage. You’ve probably heard this described as the software-defined datacenter. In the first layer, compute, resources are virtualized by Hyper-V in a Microsoft datacenter. Although it has not been used extensively, it is included in your Windows Server license and is much improved, so use it. In the second layer, network, resources are virtualized in the latest Windows Server 2012 R2. This enables SMB Direct and multichannel networking, which provide link failover, bandwidth balancing, and, with SMB Direct, screaming performance via remote direct memory access (RDMA), which can bypass most of the OS software stack and operate at better than Fibre Channel speeds, for less money.


Finally, storage needs to be reformed. This comes in two pieces: the file system and the storage itself. Windows Server 2012 R2 includes support for SOFS, the Scale-Out File Server. This provides a uniform naming convention that allows storage to grow seamlessly as applications grow. This layer is critical for the software-defined datacenter, and it is now available in Microsoft Windows Storage Server 2012 R2. Violin’s Windows Flash Array provides the hardware platform that makes all this possible, with a flash array that has been tuned to take advantage of Windows Storage Server 2012 R2. That’s right: Violin’s Windows Flash Array and Windows Storage Server 2012 R2 have been designed to work together. So far, Violin is the only vendor so honored by Microsoft.

You might be thinking, “That sounds neat, but is there any real benefit to me?” Windows applications using Violin All Flash Storage Arrays already perform at an extreme level; Windows applications using the Violin Windows Flash Array take performance to a new level. By using SMB Direct to connect the application server and the Violin Windows Flash Array, you can get up to 2x the performance of the leading all-flash array. It also gives you the proven storage features of Windows Storage Server 2012 R2, such as deduplication, compression, thin provisioning, snapshots, mirroring, encryption, migration, tiering, and virtual desktops.

This is interesting enough, but taken to the next step, it provides a framework for remaking your Windows datacenter around SMB Direct with RDMA. Remaking your Windows datacenter with this architecture almost eliminates software-stack latency, and that is the basis for the humongous increase in performance, ease of use, and manageability. It does make one think.

