Memory Fault Tolerance


Violin memory systems are designed from the ground up for memory fault tolerance: a memory failure should not cause loss of application data or an application crash. Violin Switched Memory (VXM) provides high data reliability through the following capabilities:

ECC: ECC on each module protects against bit errors and individual device failures.
RAID: Data is striped across four modules and parity is placed in a redundant fifth module. If a module fails, data is recovered automatically using the parity data.
Global Hot Standby Modules: If an active module in a RAID group fails, its data is recovered to a Hot Standby module, which may be located in any slot. The standby module then becomes part of the RAID group.
Flexible RAID Groups: RAID groups may be created out of any set of modules within a large pool. This reduces the number of Hot Standby modules required.
Fail-in-Place Support: VXM's redundant topology, Flexible RAID Groups and Hot Standby Modules allow a Fail-in-Place maintenance strategy. Any two modules in a system can fail without interrupting service.
Non-disruptive Replacement: Failed modules may be replaced without disrupting the system or the applications.
Strong CRC: VXM uses 32-bit Cyclic Redundancy Checks (CRCs) on all internal links. This detects errors on those links and prevents corrupted data from being accepted.
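
The RAID and CRC capabilities above can be illustrated with a short sketch. It is not VXM's implementation: the text does not say how parity is computed or which CRC polynomial is used, so the sketch assumes byte-wise XOR parity (as in conventional parity RAID) and the CRC-32 provided by Python's zlib module. The function names and block contents are hypothetical.

```python
import zlib
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equal-length blocks (assumed parity scheme)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the single missing block from the four surviving blocks.

    With XOR parity, the missing block (data or parity) equals the XOR
    of all the other blocks in the stripe.
    """
    return xor_parity(surviving_blocks)


def crc_ok(payload, expected_crc):
    """Check a payload against its CRC-32 (zlib's polynomial, an assumption)."""
    return zlib.crc32(payload) == expected_crc


# One stripe: four data blocks, one per module, plus a fifth parity block.
# The contents are made up for illustration.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_parity(data)

# Simulate the failure of the third module and rebuild its block,
# for example onto a hot standby module.
survivors = data[:2] + data[3:] + [parity]
recovered = rebuild_missing(survivors)
assert recovered == data[2]

# Simulate a single-bit error on an internal link; the CRC check rejects it.
sent = data[0]
sent_crc = zlib.crc32(sent)
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]
assert crc_ok(sent, sent_crc)
assert not crc_ok(corrupted, sent_crc)
```

Under these assumptions, one parity block is enough to rebuild any single failed module in a stripe. Tolerating a second failure, as described under Fail-in-Place Support, depends on the first rebuild having completed onto a standby module.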

The following figure shows a VXM network with data striped across four modules and a fifth module storing the parity information.