Data Archiving – Managing
the Mountain
Andrew Quinn
17/01/2012
IDC White Paper sponsored by EMC
“Extracting Value from Chaos”, June 2011
• The world’s collective data volume doubles
every two years
• Since 2005 the investment by enterprises has
increased 50%
• Over the next decade
– The amount of information managed by
enterprise datacentres will grow by ×50
– The number of files will grow by ×75
– The number of IT professionals will grow by less
than ×1.5
Challenges
• Increasing storage requirements mean
more hardware, rack space, power, and
cooling
• Databases are typically provisioned on
high performance storage – now this
storage is used up by inactive data
• Increasing database sizes push
performance maximums
Challenges
• Data migrations become more problematic
• Backup storage requirements grow
• Backup and recovery times increase –
windows are pushed and recovery time
objectives can no longer be met
• A lot of the time this wasn’t planned for
when the system was designed!
Thailand Flood Impact on Storage
Source: camelegg.com, 17/01/2012
What is Active / Online Archiving?
• Data is removed from the live system and
moved to an archive
• Archived data can be accessed directly by
users without the need to request retrieval
by an administrator
• Access is typically slower than from the
live system
Data Storage Tiers
• Regular Access: High Speed, High Cost
• Occasional Access: Slower, Moderate Cost
• Rare Access: Notification Required, Low Cost
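
The three tiers can be read as a placement rule driven by how recently data was accessed. A minimal sketch in Python, assuming a 90-day and a two-year threshold and the tier names "live", "archive" and "offline"; none of these values come from the presentation, they are purely illustrative.

```python
from datetime import date, timedelta

# Hypothetical thresholds, chosen for illustration only.
ACTIVE_WINDOW = timedelta(days=90)
ARCHIVE_WINDOW = timedelta(days=730)


def choose_tier(last_accessed: date, today: date) -> str:
    """Return the storage tier for an item based on its last access date."""
    age = today - last_accessed
    if age <= ACTIVE_WINDOW:
        return "live"      # regular access: high speed, high cost
    if age <= ARCHIVE_WINDOW:
        return "archive"   # occasional access: slower, moderate cost
    return "offline"       # rare access: notification required, low cost
```

In practice the thresholds would follow from retention requirements and observed access patterns rather than fixed constants.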
How Does Archiving Save Money?
• Compression
• Deduplication
• Cheaper Storage Medium
• Separation of Active and Inactive data for Backup
Compression
• Original file: aaabbbbbccccddd
• Identify repeated data: aaa bbbbb cccc ddd
• Replace repeated data with shorthand: 3a5b4c3d
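
The shorthand on this slide is run-length encoding, the simplest form of compression. A minimal sketch in Python that reproduces the slide's example; real archiving products are likely to use more general algorithms, and the function names here are illustrative.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode; assumes the original text contained no digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # collect multi-digit run lengths
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


print(rle_encode("aaabbbbbccccddd"))  # 3a5b4c3d
```

The 15-character original shrinks to 8 characters, and decoding restores it exactly.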
Duplication
• User A opens a template document from
SharePoint
• User A saves a working copy to their personal file
share
• User A emails a copy to Users B and C for
approval
• Users B and C give approval, User A saves to
project file share
• User A emails a copy to a client
Deduplication
[Diagram: data blocks numbered 1 to 4, each appearing more than once]
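
Deduplication can be sketched as storing each distinct piece of content once and keeping a reference for every copy. The Python sketch below works on whole files and identifies content by its SHA-256 digest; products may instead deduplicate at block level. The file paths and contents are hypothetical and loosely follow the template document scenario from the Duplication slide.

```python
import hashlib


def deduplicate(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Store each distinct content once and map every file name to its digest."""
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy
        index[name] = digest
    return store, index


# Hypothetical copies of one document scattered across systems.
copies = {
    "sharepoint/template.docx": b"proposal v1",
    "home/userA/working.docx": b"proposal v1",
    "mail/userB/attachment.docx": b"proposal v1",
    "mail/userC/attachment.docx": b"proposal v1",
    "projects/final.docx": b"proposal v2",
}
store, index = deduplicate(copies)
print(len(copies), "files,", len(store), "stored copies")  # 5 files, 2 stored copies
```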
Cheaper Storage
24 × 500GB SAS RAID 10
4 × 2TB SATA RAID 5
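
The two configurations can be compared by usable capacity. A short Python sketch using the standard capacity rules (RAID 10 mirrors every disk, RAID 5 gives up one disk's worth of space to parity) and taking 1TB as 1000GB; the function name is illustrative.

```python
def usable_gb(disks: int, disk_gb: int, raid_level: int) -> int:
    """Usable capacity under the standard RAID 5 and RAID 10 layouts."""
    if raid_level == 10:
        return disks // 2 * disk_gb    # every disk is mirrored
    if raid_level == 5:
        return (disks - 1) * disk_gb   # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


print(usable_gb(24, 500, 10))  # 6000
print(usable_gb(4, 2000, 5))   # 6000
```

Both arrays provide the same 6TB of usable space, one from 24 disks and the other from 4.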
Data Separation
• No easy way to back up
active and inactive data
separately
• Incremental backups not
a solution as they require
the restore of the last full
and any subsequent
incremental backups
• Regular backups of all
data are necessary
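
The restore dependency of incremental backups can be made concrete with a small sketch. It assumes a backup history given as (day, kind) tuples in chronological order; the function name and data layout are hypothetical.

```python
def restore_chain(backups: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Return the backups needed for a restore.

    A restore needs the last full backup plus every incremental taken
    after it. `backups` must be in chronological order and contain at
    least one "full" entry.
    """
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    return backups[last_full:]


history = [(1, "full"), (2, "incremental"), (3, "incremental"),
           (8, "full"), (9, "incremental"), (10, "incremental")]
print(restore_chain(history))
# [(8, 'full'), (9, 'incremental'), (10, 'incremental')]
```

The longer the gap between full backups, the longer this chain, and the longer the restore.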
Data Separation
• Active data can be
backed up regularly
• Inactive (archived) data
can be backed up once or
infrequently
• Smaller backup windows
• Shorter recovery time
• Less backup storage
required
Data Separation
• Archive can be
partitioned based on date
• Further separation and
refinement of backups
possible
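
Date-based partitioning can be sketched as grouping archived items by year, backing up closed partitions once and the current partition every time. A minimal Python sketch; the yearly granularity and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(items: dict[str, date]) -> dict[int, list[str]]:
    """Group archived item names by the year they were archived."""
    partitions: dict[int, list[str]] = defaultdict(list)
    for name, archived_on in items.items():
        partitions[archived_on.year].append(name)
    return dict(partitions)


def partitions_to_back_up(partitions: dict[int, list[str]],
                          already_backed_up: set[int],
                          current_year: int) -> list[int]:
    """Closed (past-year) partitions are backed up once; the current one every time."""
    return sorted(year for year in partitions
                  if year == current_year or year not in already_backed_up)
```

Once a past year's partition has been backed up, it drops out of every later backup run.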
What Systems Can Benefit?
• Microsoft Exchange
– Reduce mailbox size
– Reduce database size
– Eliminate need for PSTs
– Meet retention requirements
• Lotus Domino
– Reduce main NSF sizes
– Eliminate need for user-created NSFs
– Meet retention
requirements
What Systems Can Benefit?
• File Servers
– Reduce storage
requirements
– Provide version history
– Meet retention
requirements
• SharePoint
– Reduce SQL database
size
– Extend version history
– Meet retention
requirements
User Experience
• Depends on the type of data, and the product
being used
– Users may have to use a different program /
interface
– Users may use a different section of the same
program / interface
– Users may be redirected automatically
– Archived content may be wrapped into the
original program / interface via a plugin
– Archived content may be restored from archive
and accessed from the original location
Stub Files / Shortcuts
• Used to simplify the experience for users
• A small shortcut / stub replaces the
original file
• When users access the stub, an archive
retrieval is triggered behind the scenes
• After a short delay, the user is presented
with the full file
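
The stub mechanism can be modelled in a few lines. The Python sketch below is a toy in-memory model, not how any particular product implements stubs; the class name, method names and the "archive:" key format are hypothetical.

```python
class StubbedStore:
    """Toy model of a file share where archived files are replaced by stubs."""

    def __init__(self) -> None:
        self.live: dict[str, bytes] = {}     # full files on the live system
        self.stubs: dict[str, str] = {}      # path -> archive key
        self.archive: dict[str, bytes] = {}  # archived content

    def archive_file(self, path: str) -> None:
        """Move the content to the archive and leave a stub behind."""
        key = f"archive:{path}"
        self.archive[key] = self.live.pop(path)
        self.stubs[path] = key

    def open(self, path: str) -> bytes:
        """Return the file content, recalling it from the archive if stubbed."""
        if path in self.stubs:
            key = self.stubs[path]
            if key not in self.archive:
                raise FileNotFoundError(f"dead stub: {path}")
            return self.archive[key]         # retrieval happens behind the scenes
        return self.live[path]
```

The user calls `open` with the original path in both cases; only the stubbed path takes the detour through the archive.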
Stub Files / Shortcuts
Problems
• Dead or missing shortcuts
• Indexing / AV / Backups causing recalls
• Searching archived content
• Offline access to archived content
Where are the Costs?
• You’ll need to buy archiving licenses
– May be based on data volume or user count
• You’ll need a server to do your archiving
– May require new hardware, or Virtual
Infrastructure capacity
• You may need additional Operating
System or SQL licenses
• Who’s going to support it?
Case Study 1
• 150 mailboxes
• Mailbox sizes are… epic
• Mailbox downtime is very expensive
• Performance is suffering due to data volumes
• Migration planned which requires
downtime for mailbox moves
Case Study 1
• License cost is per mailbox
• Low mailbox count + huge data volume =
low cost + huge benefit
• Smaller mailboxes justify the cost through
reduced downtime during mailbox moves alone
• Archived!
Case Study 2
• 500 Mailboxes
• Mailbox sizes ~1-2GB
• Willing to spend money to avoid mailbox
downtime
• Performance currently adequate
• Migration planned – downtime not
necessary
Case Study 2
• License cost is per mailbox
• Larger number of mailboxes = larger cost
• Quite a bit of data, but mailboxes are fairly well
managed and there is no negative impact
• Anticipated to reduce storage by 0.5TB
(conservative estimate)
• If we spend the implementation cost on storage
how much could we get?
• A LOT more than 0.5TB!!!
• NOT Archived!
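
The reasoning in this case study can be sketched as a comparison between the storage that archiving frees and the storage the same budget would buy. In the Python sketch below, only the 0.5TB saving comes from the case study; the implementation cost and the price per TB are hypothetical figures chosen for illustration, and the function names are made up.

```python
def storage_bought_tb(implementation_cost: float, storage_cost_per_tb: float) -> float:
    """Capacity that the archiving budget would buy if spent on storage instead."""
    return implementation_cost / storage_cost_per_tb


def archiving_is_worthwhile(saved_tb: float,
                            implementation_cost: float,
                            storage_cost_per_tb: float) -> bool:
    """Archive only if it frees more capacity than the same money would buy."""
    return saved_tb > storage_bought_tb(implementation_cost, storage_cost_per_tb)


# Hypothetical prices; only the 0.5 TB saving is taken from the case study.
print(archiving_is_worthwhile(0.5, implementation_cost=20_000,
                              storage_cost_per_tb=2_000))  # False
```

This comparison looks at storage alone. Case Study 1 was justified by reduced downtime rather than by capacity, so a fuller model would need to price downtime as well.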
Personal Archives