
Copyright The TANEJA Group, Inc. 2008. All Rights Reserved
8 Elm Street, Suite 900 Hopkinton, MA 01748 Tel: 508-435-5040 Fax: 508-435-1530 www.tanejagroup.com
storage in a scant 4GB of RAM. This
supports the TS7650G’s industry-leading inline,
single-node throughput because element
identification and referencing are all
performed in main memory – no accesses to
disk are required. Competitive indexing
technologies such as hashing and content-
aware approaches have much less efficient
mapping algorithms, forcing them to
reference a disk-based index during the
capacity optimization process to map more
than roughly 20TB of base capacity. This
explains why alternative capacity
optimization technologies generally suffer
decreased throughput as the repository
grows; they run very fast when all the index
references can be handled in main memory,
but once they outgrow the available memory
and must touch disk, reference times can
slow down by two orders of magnitude. This
efficient index mapping design sets
HyperFactor apart, allowing it to scale
linearly for repositories up to 1PB in base
capacity. After HyperFactor completes the
de-duplication process, it then compresses
elements before they are stored.
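
The memory-versus-disk argument above can be illustrated with a rough cost model. This is a minimal sketch, not a description of HyperFactor or any competing product: the function name avg_lookup_cost, the uniform spread of lookups across the index, and the example index sizes are all assumptions made for illustration. The 100x disk penalty stands in for the "two orders of magnitude" slowdown cited above, and the 4GB figure matches the RAM footprint mentioned earlier.

```python
def avg_lookup_cost(index_gb, ram_gb, ram_cost=1.0, disk_penalty=100.0):
    """Average cost of one index reference, in units of one in-memory reference.

    Assumption: lookups are spread uniformly over the index, so the share
    served from memory equals the share of the index that fits in RAM.
    """
    if index_gb <= 0 or ram_gb <= 0:
        raise ValueError("sizes must be positive")
    in_ram = min(1.0, ram_gb / index_gb)
    # References that miss memory pay the disk penalty (100x by assumption).
    return in_ram * ram_cost + (1.0 - in_ram) * ram_cost * disk_penalty


# Hypothetical index sizes against 4GB of RAM.
for index_gb in (2, 4, 8, 16):
    print(index_gb, "GB index:", avg_lookup_cost(index_gb, ram_gb=4))
```

Under these assumptions the average reference cost stays flat while the index fits in memory and climbs steeply once it does not, which is the behavior the text attributes to disk-based indexes as the repository grows.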
The Importance of SCO VTL Clustering
With this announcement, IBM is unveiling
gateway clustering along with support for a
global repository. Although IBM supports
two-node configurations today, the
architecture is designed to support up to 16
nodes over time, providing a very scalable
growth path for high end customers.
Clustered TS7650Gs present a single VTL
image to backup servers across which single
system throughput can be scaled. Based on
data from ProtecTIER’s installed base, many
of its customers are seeing single-node
sustained throughput in the 450MB/sec
range, with peak throughputs topping
600MB/sec. In adding a second node and
supporting a global repository, IBM is
pushing the sustained throughput rate into
the 900MB/sec range, with peak
throughputs even higher. Because the entire
index is mapped into the main memory of
each node, it doesn’t matter which node a
backup stream hits: it will enjoy the same
high level of performance.
When it comes to throughput in clustered
environments, there is an important
distinction between single system and
aggregated throughput. Single system
throughput identifies a throughput number
against a single repository, access to which
may be spread across multiple VTLs and
multiple processing nodes. In the TS7650G’s
case, multiple gateways leverage a global
repository, which makes the single-node
throughput number additive as nodes are
added to scale the system. For example, a
single-node TS7650G can sustain speeds of
450MB/sec, while a two-node cluster can
sustain 900MB/sec, all while accessing a
single large repository. Competitors
talk about aggregate throughput numbers for
their clusters, which implies that they do not
support a global repository. In these
products, there is a separate repository for
each “node” so the performance numbers for
each node are not additive. Such products
lead to independent islands of storage, which
limits the capacity optimization ratios to
those achievable by a single node.
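
The "islands of storage" effect can be sketched with a toy model. Everything here is an assumption for illustration: fixed-size chunks identified by SHA-256 hashes are a stand-in only (the text contrasts HyperFactor with hashing, so this is not its method), and the function names and sample backup streams are hypothetical. The sketch shows only that duplicates spanning nodes are found when the repository is shared and missed when each node keeps its own.

```python
import hashlib


def chunk_ids(data, chunk_size=4):
    """Split data into fixed-size chunks and return one hash per chunk."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def stored_chunks_global(streams_by_node):
    """One shared repository: a chunk is stored once, whichever node saw it."""
    seen = set()
    for streams in streams_by_node.values():
        for data in streams:
            seen.update(chunk_ids(data))
    return len(seen)


def stored_chunks_islands(streams_by_node):
    """One repository per node: duplicates are found only within a node."""
    total = 0
    for streams in streams_by_node.values():
        seen = set()
        for data in streams:
            seen.update(chunk_ids(data))
        total += len(seen)
    return total


# Hypothetical backup streams: 12 chunks ingested in total across two nodes.
backups = {
    "node_a": [b"AAAABBBBCCCC", b"AAAABBBBDDDD"],
    "node_b": [b"AAAABBBBCCCC", b"EEEEBBBBCCCC"],
}
print("global repository:", stored_chunks_global(backups))
print("per-node islands :", stored_chunks_islands(backups))
```

In this toy data set the shared repository keeps 5 unique chunks while the two islands keep 8 between them, because each island must store its own copy of the chunks both nodes have seen.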
Enterprises that are looking to consolidate
their backup sets to improve efficiencies and
reduce management points necessarily
prefer solutions with high single system