Data Archiving – Managing the Mountain
Andrew Quinn
17/01/2012

IDC White Paper sponsored by EMC: “Extracting Value from Chaos”, June 2011
• The world’s collective data volume doubles every two years
• Since 2005, enterprise investment in the digital universe has increased by 50%
• Over the next decade:
– The amount of information managed by enterprise datacentres will grow by ×50
– The number of files will grow by ×75
– The number of IT professionals will grow by less than ×1.5

Challenges
• Increasing storage requirements mean more hardware, rack space, power and cooling
• Databases are typically provisioned on high-performance storage, and that storage is now being used up by inactive data
• Growing databases push systems towards their performance limits
• Data migrations become more problematic
• Backup storage requirements grow
• Backup and recovery times increase: backup windows are overrun and recovery time objectives can no longer be met
• Often, this growth wasn’t planned for when the system was designed!

Thailand Flood Impact on Storage
(chart; source: camelegg.com, 17/01/2012)

What is Active / Online Archiving?
• Data is removed from the live system and moved to an archive
• Users can access archived data directly, without asking an administrator to retrieve it
• Access is typically slower than from the live system

Data Storage Tiers
• Regular access: high speed, high cost
• Occasional access: slower, moderate cost
• Rare access: notification required, low cost

How Does Archiving Save Money?
• Compression
• Deduplication
• Cheaper storage medium
• Separation of active and inactive data for backup

Compression
• Original file: aaabbbbbccccddd
• Identify repeated data: aaa | bbbbb | cccc | ddd
• Replace repeated data with shorthand: 3a5b4c3d

Duplication
• User A opens a template document from SharePoint
• User A saves a working copy to their personal file share
• User A emails a copy to Users B and C for approval
• Users B and C give approval; User A saves the document to the project file share
• User A emails a copy to a client

Deduplication
(diagram of repeated data blocks, labelled 1 to 4)

Cheaper Storage
• 24 × 500GB SAS disks in RAID 10
• 4 × 2TB SATA disks in RAID 5
• Both give about 6TB of usable capacity (RAID 10 mirrors half of 12TB raw; RAID 5 loses one disk of 8TB raw to parity), but the SATA array needs a sixth of the disks

Data Separation (without archiving)
• There is no easy way to back up active and inactive data separately
• Incremental backups are not a solution: a restore needs the last full backup plus every subsequent incremental backup
• Regular backups of all data are therefore necessary

Data Separation (with archiving)
• Active data can be backed up regularly
• Inactive (archived) data can be backed up once, or infrequently
• Smaller backup windows
• Shorter recovery times
• Less backup storage required

Data Separation (partitioning)
• The archive can be partitioned by date
• Backups can then be separated and refined further

What Systems Can Benefit?
• Microsoft Exchange
– Reduce mailbox size
– Reduce database size
– Eliminate the need for PSTs
– Meet retention requirements
• Lotus Domino
– Reduce main NSF sizes
– Eliminate the need for user-created NSFs
– Meet retention requirements
• File Servers
– Reduce storage requirements
– Provide version history
– Meet retention requirements
• SharePoint
– Reduce SQL database size
– Extend version history
– Meet retention requirements

User Experience
• Depends on the type of data and the product being used:
– Users may have to use a different program / interface
– Users may use a different section of the same program / interface
– Users may be redirected automatically
– Archived content may be integrated into the original program / interface via a plugin
– Archived content may be restored from the archive and accessed from its original location

Stub Files / Shortcuts
• Used to simplify the experience for users
• A small shortcut / stub replaces the original file
• When a user opens the stub, an archive retrieval is triggered behind the scenes
• After a short delay, the user is presented with the full file

Stub Files / Shortcuts: Problems
• Dead or missing shortcuts
• Indexing, antivirus scans and backups triggering recalls
• Searching archived content
• Offline access to archived content

Where Are the Costs?
• You’ll need to buy archiving licenses
– May be based on data volume or user count
• You’ll need a server to do your archiving
– May require new hardware or Virtual Infrastructure capacity
• You may need additional operating system or SQL licenses
• Who’s going to support it?

Case Study 1
• 150 mailboxes
• Mailbox sizes are… epic
• Mailbox downtime is very expensive
• Performance is suffering due to data volumes
• A migration is planned, which requires downtime for mailbox moves
• License cost is per mailbox
• Low mailbox count + huge data volume = low cost + huge benefit
• Smaller mailboxes justify the cost through reduced downtime during mailbox moves alone
• Archived!
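The per-mailbox licensing argument above can be made concrete as a cost-per-GB comparison. The Python below is a minimal sketch under stated assumptions: a flat hypothetical price per mailbox license, a hypothetical lump sum for the server, OS/SQL licenses and implementation, and a hypothetical primary-storage price per GB. None of these prices come from the case studies, and the sketch counts storage only, ignoring the downtime savings that actually justified Case Study 1.

```python
from dataclasses import dataclass


@dataclass
class ArchiveScenario:
    """Inputs for a rough archiving cost comparison. All prices are hypothetical."""
    mailboxes: int
    license_per_mailbox: float  # hypothetical per-mailbox license price
    fixed_costs: float          # hypothetical server, OS/SQL licenses and implementation
    gb_archived: float          # estimated data moved off primary storage, in GB


def total_cost(s: ArchiveScenario) -> float:
    """Per-mailbox licenses scale with user count; fixed costs do not."""
    return s.mailboxes * s.license_per_mailbox + s.fixed_costs


def cost_per_gb_archived(s: ArchiveScenario) -> float:
    """Total archiving cost spread over the primary storage it frees."""
    if s.gb_archived <= 0:
        raise ValueError("gb_archived must be positive")
    return total_cost(s) / s.gb_archived


def cheaper_than_buying_storage(s: ArchiveScenario, storage_cost_per_gb: float) -> bool:
    """True if archiving frees primary storage for less than buying the same amount.

    Storage cost only: benefits such as shorter backups or faster mailbox
    moves are not counted here.
    """
    return cost_per_gb_archived(s) < storage_cost_per_gb


if __name__ == "__main__":
    # Hypothetical figures: few very large mailboxes vs many well-managed ones.
    few_large = ArchiveScenario(mailboxes=150, license_per_mailbox=50.0,
                                fixed_costs=5000.0, gb_archived=3000.0)
    many_small = ArchiveScenario(mailboxes=500, license_per_mailbox=50.0,
                                 fixed_costs=5000.0, gb_archived=500.0)
    for name, s in [("few large", few_large), ("many small", many_small)]:
        print(name, round(cost_per_gb_archived(s), 2),
              cheaper_than_buying_storage(s, storage_cost_per_gb=10.0))
```

With these made-up prices, the 150-mailbox scenario spreads its license cost over far more archived data (about 4.17 per GB against an assumed 10 per GB for storage), while the 500-mailbox scenario, which reuses the 500GB (0.5TB) conservative estimate from Case Study 2, comes out at 60 per GB. Substituting real quotes for the hypothetical prices is what turns this into an actual decision.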
Case Study 2
• 500 mailboxes
• Mailbox sizes ~1–2GB
• Willing to spend money to avoid mailbox downtime
• Performance currently adequate
• A migration is planned, but it does not require downtime
• License cost is per mailbox
• Larger number of mailboxes = larger cost
• Quite a bit of data, but mailboxes are fairly well managed and there is no negative impact
• Archiving is anticipated to reduce storage by 0.5TB (conservative estimate)
• If we spent the implementation cost on storage instead, how much could we buy?
• A LOT more than 0.5TB!!!
• NOT Archived!

Personal Archives