
 
 
 
Copyright The TANEJA Group, Inc. 2008. All Rights Reserved 
87 Elm Street, Suite 900   Hopkinton, MA 01748   Tel: 508-435-5040   Fax: 508-435-1530   www.tanejagroup.com
storage in a scant 4GB of RAM. This
supports the TS7650G’s industry-leading
inline, single-node throughput because
element identification and referencing are
performed entirely in main memory; no disk
accesses are required. Competitive indexing
technologies such as hashing and
content-aware approaches have much less
efficient mapping algorithms, forcing them to
reference a disk-based index during the
capacity optimization process to map more
than around 20TB of base capacity. This
explains why alternative capacity
optimization technologies generally suffer
decreased throughput as the repository
grows: they run very fast when all the index
references can be handled in main memory,
but once they outgrow the available memory
and must touch disk, reference times can
slow down by two orders of magnitude.
HyperFactor’s efficient, memory-resident
index mapping sets it apart, allowing it to
scale linearly for repositories up to 1PB in
base capacity. After HyperFactor completes
the de-duplication process, it compresses
elements before they are stored.
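
To make the memory-versus-disk argument concrete, the following Python sketch models the average cost of an index reference as a repository outgrows main memory. It is an illustration only, not any vendor's implementation: the function name, the uniform-access assumption, and the choice to express time relative to one in-memory reference are all assumptions. The 20TB threshold and the hundredfold disk penalty are taken from the figures quoted above.

```python
def relative_reference_time(repo_tb, ram_covers_tb=20.0, disk_penalty=100.0):
    """Average index reference time, in units of one in-memory reference.

    Assumes (hypothetically) that index size grows in proportion to the
    repository and that references are spread uniformly over the index.
    """
    if repo_tb <= 0:
        raise ValueError("repo_tb must be positive")
    # Fraction of index references that can still be served from RAM.
    in_ram = min(1.0, ram_covers_tb / repo_tb)
    # RAM hits cost 1 unit; disk hits cost `disk_penalty` units.
    return in_ram * 1.0 + (1.0 - in_ram) * disk_penalty


if __name__ == "__main__":
    for repo_tb in (10, 20, 40, 100):
        print(f"{repo_tb:>4} TB -> {relative_reference_time(repo_tb):.1f}x")
```

Under these assumptions, a 40TB repository already averages about 50 times the in-memory reference cost, which is consistent with the throughput decline described above for disk-indexed designs.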
 
The Importance of SCO VTL Clustering
 
With this announcement, IBM is unveiling
gateway clustering along with support for a
global repository. Although only two-node
configurations are supported today, the
architecture is designed to support up to 16
nodes over time, providing a very scalable
growth path for high-end customers.
Clustered TS7650Gs present a single VTL
image to backup servers, across which single
system throughput can be scaled. Based on
data from ProtecTIER’s installed base, many
customers are seeing single-node
sustained throughput in the 450MB/sec
range, with peak throughput topping
600MB/sec. By adding a second node and
supporting a global repository, IBM is
pushing the sustained throughput rate into
the 900MB/sec range, with peak
throughput even higher. Because the entire
index is mapped into the main memory of
each node, it doesn’t matter which node a
backup stream hits: it will enjoy the same
high level of performance.
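
As a rough illustration of what these throughput figures mean for a backup window, the Python sketch below scales the sustained single-node rate by node count and converts it into hours per backup. The function names are hypothetical, decimal units are assumed (1TB = 1,000,000MB), and linear scaling beyond two nodes is an assumption: the figures above cover only one-node and two-node configurations.

```python
SINGLE_NODE_SUSTAINED_MB_S = 450  # sustained single-node rate quoted above
MAX_NODES = 16                    # architectural limit quoted above


def single_system_throughput(nodes, per_node_mb_s=SINGLE_NODE_SUSTAINED_MB_S):
    """Sustained MB/sec against one global repository.

    Assumes per-node throughput is strictly additive, which the text
    reports for two nodes; larger clusters are extrapolation.
    """
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_NODES}")
    return nodes * per_node_mb_s


def backup_hours(data_tb, throughput_mb_s):
    """Hours needed to ingest `data_tb` terabytes at a sustained rate."""
    megabytes = data_tb * 1_000_000  # decimal TB to MB
    return megabytes / throughput_mb_s / 3600


if __name__ == "__main__":
    for nodes in (1, 2):
        rate = single_system_throughput(nodes)
        print(f"{nodes} node(s): {rate} MB/sec, "
              f"10 TB in {backup_hours(10, rate):.2f} h")
```

Under these assumptions, doubling the node count halves the time needed to ingest a backup set of a given size.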
 
When it comes to throughput in clustered
environments, there is an important
distinction between single system and
aggregated throughput. Single system
throughput is the throughput measured
against a single repository, access to which
may be spread across multiple VTLs and
multiple processing nodes. In the TS7650G’s
case, multiple gateways leverage a global
repository, which makes the single-node
throughput number additive as nodes are
added to scale the system. For example, a
single-node TS7650G can sustain speeds of
450MB/sec, while a two-node cluster can
sustain 900MB/sec, all while accessing a
single large repository. Competitors
talk about aggregate throughput numbers for
their clusters, which implies that they do not
support a global repository. In these
products, there is a separate repository for
each “node,” so the performance numbers for
each node are not additive. Such products
lead to independent islands of storage, which
limit the capacity optimization ratios to
those achievable by a single node.
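
The effect of storage islands on optimization ratios can be seen in a small Python sketch. It compares one shared repository with per-node repositories using exact-match identifiers for data elements. This is a deliberately simplified stand-in, not HyperFactor's method or any competitor's; the function names and sample data are hypothetical.

```python
def ratio_global(streams):
    """Optimization ratio when all nodes share one repository."""
    total = sum(len(s) for s in streams)
    # Each distinct element is stored once, whichever node received it.
    stored = len(set().union(*streams))
    return total / stored if stored else 0.0


def ratio_islands(streams):
    """Optimization ratio when each node keeps its own repository."""
    total = sum(len(s) for s in streams)
    # Duplicates are only removed within a node, never across nodes.
    stored = sum(len(set(s)) for s in streams)
    return total / stored if stored else 0.0


if __name__ == "__main__":
    # Hypothetical element identifiers arriving at two nodes.
    node_a = ["os", "os", "db", "mail"]
    node_b = ["os", "db", "db", "logs"]
    print("global repository:", ratio_global([node_a, node_b]))
    print("separate islands: ", round(ratio_islands([node_a, node_b]), 2))
```

In this toy data set the shared repository stores four distinct elements for eight received, while the two islands store six between them, because elements common to both nodes are kept twice.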
Enterprises that are looking to consolidate
their backup sets to improve efficiencies and
reduce management points necessarily
prefer solutions with high single system