TECHNOLOGY IN BRIEF
THE OBJECT EVOLUTION
EMC OBJECT-BASED STORAGE FOR ACTIVE ARCHIVING AND
APPLICATION DEVELOPMENT
NOVEMBER 2012
A few years ago, object-based storage made a huge splash on-premise with the
promise of meaningful data relationships, information accessibility and strong
compliance. It remains an important component for information management
based on compliance and single-tenant architectures. However, the evolution of
object-based storage has big implications for the cloud and unstructured data:
new approaches to active archiving, web/mobile application development and a changing model for
cloud storage service providers.
Object storage is optimal for the web. It has a very different architecture from file systems, which
are frankly overkill for most cloud storage. On-premise can be a different story; having data close to
hand under single-tenant access control is right for some data storage. But on-premise stored data
requires that the enterprise maintain a primary data center, a cold data center for DR, replication,
continuous data protection, and so on. Given the right set of needs this is a fine trade-off, of course,
and we certainly do not counsel people to get rid of their internal data centers and redundant
systems.
However, cloud-based object architecture offers big benefits for storing unstructured data for
active archiving, global access to data, fast application development and much lower cost compared
to the high computing and data protection costs of on-premise NAS. EMC has engineered Atmos to
provide these capabilities and many more as a massively scalable, distributed cloud-based system.
In this Technology in Brief we will examine the fast-changing world of archiving and development
on the web, and how object-based storage is the best way to go for these monumental tasks.
When Object Trumps File
The go-to architecture for unstructured data has traditionally been an application-centric system
containing the operating system, the application, and a NAS filer using hierarchical file architecture.
This infrastructure works acceptably well in a slow-growth, consistent workload setting, although
even then it is far too easy to add complexity along with additional systems and filers.
However, business needs have evolved far beyond this sleepy storage model. Unstructured data
now comprises a massive portion of large data growth, and hierarchical file systems are difficult to
optimize and scale. For example, file system-based storage requires near-constant provisioning. As
storage requests grow (which they inevitably do), IT administrators must manually provision
storage to meet the expanded requirements. Meanwhile, large volume and spiky workloads make
provisioning both “up” and “down” an expensive and time-consuming proposition.
And difficult provisioning is hardly the only problem: siloed data protection with individual backup,
replication and archiving applications steadily raises OPEX. Scaling is an issue as well. Large critical
big data applications may warrant scale-out or scale-up file systems (which are challenges in and of
themselves). Most applications do not warrant this architecture, and instead reside on poorly scalable
systems. The number of these systems grows as applications come online, making it even harder for IT and
application owners to administer them and for users to get the value they need from the application.
This already difficult scenario gets even worse when NAS storage is used for what is
essentially a cloud use case, such as extending existing assets over the cloud.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
www.tanejagroup.com
Figure: Traditional NAS infrastructure
In contrast to hierarchical file system-based storage silos, object-based storage opens up a whole
new range of dynamic functionality. Object-based storage assigns unique object IDs to access data
across all federated locations. This goes a long way towards eliminating traditional, time-consuming storage management tasks like LUN creation and RAID groups. Active archives and
applications needing fast global access particularly benefit from global namespaces and location
transparency. The flat, universal namespace allows global access to stored content from anywhere
the distributed application runs. Applications can also efficiently associate metadata with stored
objects without using a dedicated database. Sharing vast storage resources means application
administrators do not need to modify application files. Object-based storage usually has elements of
file systems in order to handle processes like file archiving, but it is not founded on that
architecture and its drawbacks.
Object-based storage originally developed as a type of specialized NAS storage where the
hierarchical system was replaced with an object-oriented system that made file storage far more
secure and scalable. One of its most popular incarnations is still going strong today: Content-Addressable Storage (CAS). A subset of object-oriented storage, CAS derives each object’s ID from a
hash of its content, which ensures there is only one ID for any object. When the CAS object is
retrieved, it can be hashed again and checked against its ID to verify identity. CAS de-dupes at the
object level for copy control.
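The CAS behavior described above can be sketched in a few lines of Python. This is an illustrative toy, not Centera's implementation: the class name, the in-memory dictionary and the choice of SHA-256 as the hash function are all assumptions made for the example.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS (hypothetical): the object ID is a hash of the content."""

    def __init__(self):
        self._objects = {}  # object ID -> content bytes

    def put(self, content: bytes) -> str:
        # SHA-256 is an assumption; the source does not name a hash function.
        object_id = hashlib.sha256(content).hexdigest()
        # Identical content hashes to the same ID, so a duplicate is not stored twice.
        self._objects.setdefault(object_id, content)
        return object_id

    def get(self, object_id: str) -> bytes:
        content = self._objects[object_id]
        # Hash again on retrieval and compare with the ID to verify identity.
        if hashlib.sha256(content).hexdigest() != object_id:
            raise ValueError("stored content no longer matches its ID")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressableStore()
id_a = store.put(b"quarterly report")
id_b = store.put(b"quarterly report")     # duplicate content, same ID
id_c = store.put(b"quarterly report v2")  # different content, new ID
```

Storing the same bytes twice yields one ID and one stored copy, which is the object-level de-duplication the text refers to.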
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
2 of 12
www.tanejagroup.com
Technology in Brief
TABLE: CONTRASTING FILE SYSTEMS WITH OBJECT STORAGE

Characteristic: Metadata
File: File systems implement a centralized file layer metadata service that tracks directory
structures, permissions, and on-disk locations of files. All file requests must access metadata first
for permission and file information.
Object: Object metadata is stored along with the object data to avoid metadata service bottlenecks.
The object’s unique ID may also be used to verify and validate the data being stored.

Characteristic: Namespace
File: File systems have built-in namespace constraints for files and directories they can store and
manage. Hierarchical directory structures can become unwieldy, performing poorly at navigating
large numbers of users or files.
Object: Object storage provides a single flat namespace for objects. Replacing path and filenames
with object identifiers makes the address space practically infinite with very fast performance for
users and applications.

Characteristic: Interaction
File: File systems are designed to offer in-place editing and updating of files using sophisticated,
yet highly complex, locking and synchronization mechanisms. These methods make it difficult to
distribute or extend file systems across multiple locations.
Object: Objects are inherently immutable once stored under a unique ID, and can be easily
replicated and accessed globally. Programming for object storage leads to simpler, supportable, and
more reliable programs.

Characteristic: Cloud Applications
File: File systems present a real challenge for cloud-based archival management and mobile
application delivery. Poor scalability, lagging performance, and complex application development
make traditional file systems a poor choice for compelling new cloud usages.
Object: Object stores are simple, clean and quick to access. Since objects are easily distributed,
replicated, and globally accessible in the cloud, they are ideal for active global archives and
distributed mobile applications.

Object-based storage both on-premise and in the cloud requires certain key capabilities. On-premise
object storage has great benefits for local file storage, including multiple application access, massive
scaling, high availability and, in some architectures, information governance as well.

Multiple application access. Applications simultaneously leverage the same centralized
object-based storage infrastructure. This enables local object-based storage to execute
application-specific archiving management attributes for a complete chain of information
custody.

Massive scaling. Massive scaling is problematical with file-based archive solutions. As the file
system reaches its maximum capacity, administrators must expand the entire system’s
operating system, file system and application in order to scale the archive. By contrast, object-based
storage can expand in an open fashion into multiple petabytes due to its flat address
space.

High availability. Object storage often archives data that has heavy retention and government
requirements. In this environment, 5 9’s or higher availability (99.999%) is a necessity.
Mirroring and parity help to protect availability; other beneficial features include self-healing,
detecting and fixing soft corruptions in the background, and addressing hardware failures
before they impact data availability.

Information governance. A subset of object-based storage, Content-Addressable Storage
(CAS) is purpose-built for long-term defensible retention of fixed files and data. As opposed to
other archival storage methods like tape or monolithic “tar” files that bundle data up and/or
move it offline, CAS stores data as objects that can be strictly and individually managed for
governance and compliance and yet remain actively accessible on-line.
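To put the availability figure in the high availability item above in perspective, the short calculation below converts an availability percentage into a yearly downtime budget. The function name is hypothetical, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = max_downtime_minutes_per_year(99.999)   # roughly 5.3 minutes per year
three_nines = max_downtime_minutes_per_year(99.9)    # roughly 8.8 hours per year
```

Five nines therefore leaves only a few minutes of unavailability per year, which is why background self-healing matters more than manual repair in this setting.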
Best Practices: Object and the Cloud
We strongly support on-premise object storage such as CAS for local space savings, performance
and information governance. However, we find that object storage is roaring to life in the cloud,
where cloud-based active archiving and application development require highly distributed and
single namespace storage for unstructured content. These critical usage cases benefit far more from
object-based storage than they do from traditional file systems. Let’s look at best-practice
architectural features for object-based storage in the cloud.
DATA AND METADATA
When data is stored as an object, a unique object identifier is created out of a single universal global
namespace. The object ID is retained by the client application and used to subsequently retrieve
that object. Objects can effectively live anywhere in the cloud-wide system without the storage
client needing to know about actual data locations, file system structures or LUN details. This
provides complete location transparency, which reduces hands-on storage management
and inherently supports globally distributed access by web and mobile applications.
Because of the location transparency provided by the object storage layer, objects can be
automatically load-balanced across nodes, and replicated within and across sites without
disrupting applications or users. Wide data distribution and federation can be managed through
systematic policies to meet various service level goals for access, high availability, protection, cost
and performance.
The object layer abstraction also provides a great benefit to applications that previously might have
had to be intimately storage aware to avoid running out of space or had to otherwise actively
manage data locations. Because applications written to leverage object storage don’t have to embed
rules or code specific knowledge of storage infrastructure details, they avoid having to be rewritten or re-architected for “changing” storage assignments as users spread, features expand, and
data sets grow.
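A minimal sketch of the location transparency described in this section follows. Everything in it is hypothetical: the class, the node names and the naive "fewest objects" placement rule are assumptions chosen for illustration, not a description of any vendor's placement logic.

```python
import uuid


class ObjectStore:
    """Toy object store: clients see only object IDs, never node locations."""

    def __init__(self, nodes):
        self._nodes = {name: {} for name in nodes}  # node name -> {object_id: record}
        self._location = {}                         # object_id -> node name (internal only)

    def put(self, data: bytes, metadata: dict) -> str:
        object_id = uuid.uuid4().hex
        # Naive load balancing (an assumption): pick the node holding the fewest objects.
        node = min(self._nodes, key=lambda name: len(self._nodes[name]))
        self._nodes[node][object_id] = {"data": data, "metadata": dict(metadata)}
        self._location[object_id] = node
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._nodes[self._location[object_id]][object_id]["data"]

    def rebalance(self, object_id: str, target_node: str) -> None:
        """Move an object to another node; its ID, and so the client code, is unchanged."""
        source = self._location[object_id]
        self._nodes[target_node][object_id] = self._nodes[source].pop(object_id)
        self._location[object_id] = target_node


store = ObjectStore(["boston", "london"])
oid = store.put(b"scan-0001", {"type": "x-ray"})
store.rebalance(oid, "london")   # the storage layer moves the object
payload = store.get(oid)         # the client retrieves it with the same ID
```

The application holds only `oid`; moving the object between nodes requires no change on the client side.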
MULTI-TENANCY
Secure multi-tenancy is a key requirement of cloud object storage, which should support two levels
of multi-tenancy: tenants and sub-tenants. Tenants are top-level entities that each have their own
access points, security controls and master storage policies. Tenants share nothing with other
tenants and are fully isolated. Every node gets assigned to a specific tenant; tenants do not share
nodes and therefore each tenant has its own dedicated access points and storage. Within a large
company, a tenant could be set up for independently managed divisions or subsidiaries. In a service
provider implementation, the tenant might be mapped to a broad storage service offering.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
4 of 12
www.tanejagroup.com
Technology in Brief
Sub-tenants are then created within each tenant with security controls and defined management
policies assigned by the tenant. Each sub-tenancy defines a distinct storage environment with
isolated management for its own users, object namespace, and defined shares. A sub-tenant within
a company might correspond to a department, while a storage provider's sub-tenant might track to
a specific client account.
This highly functional multi-tenancy capability makes it easy to create private sandboxes or
implement a global content delivery scheme. With some planning, this scheme could enable large
corporations to facilitate aggregating “big data” distributed across the enterprise.
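The two-level tenancy model can be pictured with a small data model. The sketch below is an assumption-laden illustration: the class names, fields and the node-overlap check are invented for the example and only mirror the rules stated in the text (tenants do not share nodes; each sub-tenant has its own object namespace).

```python
from dataclasses import dataclass, field


@dataclass
class SubTenant:
    name: str
    objects: dict = field(default_factory=dict)  # private object namespace
    users: set = field(default_factory=set)


@dataclass
class Tenant:
    name: str
    nodes: frozenset                             # nodes dedicated to this tenant
    subtenants: dict = field(default_factory=dict)

    def add_subtenant(self, name: str) -> SubTenant:
        return self.subtenants.setdefault(name, SubTenant(name))


class Cloud:
    def __init__(self):
        self.tenants = {}

    def add_tenant(self, name: str, nodes) -> Tenant:
        nodes = frozenset(nodes)
        # Tenants share nothing: reject any node already assigned to another tenant.
        for other in self.tenants.values():
            if other.nodes & nodes:
                raise ValueError(f"nodes already assigned to tenant {other.name}")
        tenant = Tenant(name, nodes)
        self.tenants[name] = tenant
        return tenant


cloud = Cloud()
retail = cloud.add_tenant("retail-division", {"node-1", "node-2"})
finance = cloud.add_tenant("finance-division", {"node-3"})
retail.add_subtenant("marketing").objects["obj-1"] = b"campaign assets"
retail.add_subtenant("logistics")

try:
    cloud.add_tenant("rogue", {"node-2"})  # node-2 already belongs to retail
    shared_rejected = False
except ValueError:
    shared_rejected = True
```

An object stored under the marketing sub-tenant is invisible to logistics, and a second tenant cannot claim a node that is already assigned.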
ACCESS FROM ANYWHERE
As a cloud object storage service with a flat global namespace, an object can be accessed through
any site (although for performance, policies might strive to replicate objects to sites closer to where
they will be read). In addition, object storage for the cloud must present a broad range of access
methods including both web services and traditional file services.
REST (and SOAP) web services are key APIs. REST is the most common cloud storage access
method for browser and custom mobile applications. As an architectural style built on HTTP, REST was
designed to optimize web-style remote access to “resources”, and is an ideal match to object storage
where each object can be easily treated as a REST resource.
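Treating each object as a REST resource means the basic operations map directly onto HTTP methods. The sketch below only builds the requests and does not send them. The base URL and path layout are hypothetical and do not correspond to any particular product's API.

```python
from urllib.request import Request

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://objects.example.com/rest/objects"


def create_object(data: bytes, content_type: str) -> Request:
    """POST to the collection; a real service would return the new object ID."""
    return Request(BASE_URL, data=data, method="POST",
                   headers={"Content-Type": content_type})


def read_object(object_id: str) -> Request:
    """GET the object addressed directly by its ID."""
    return Request(f"{BASE_URL}/{object_id}", method="GET")


def delete_object(object_id: str) -> Request:
    """DELETE the object addressed directly by its ID."""
    return Request(f"{BASE_URL}/{object_id}", method="DELETE")


upload = create_object(b"hello", "text/plain")
fetch = read_object("abc123")
remove = delete_object("abc123")
```

Passing any of these to `urllib.request.urlopen` would issue the call; the point here is simply that one object corresponds to one URL and one verb per operation.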
Figure: Typical cloud-based object storage deployment
POLICY DRIVEN MANAGEMENT
A key benefit of object storage is the ability to use metadata to drive automatic data management
policies. Policies should support service levels, and should be triggered when data objects are
created, objects hit certain ages, or upon metadata updates. Policies can control data protection
operations including the number, type and target locations for replicas, inherent storage features
for striping, compression and de-duplication, retention locks and automatic deletion, and shifting
objects into different policies over time.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
5 of 12
www.tanejagroup.com
Technology in Brief
The policy mechanism should be highly flexible, targeting policies to any group of objects based on
both system and user defined metadata. Policies can be used to build service levels by defining the
amount of replication, implement archive rules for compliance, and optimize capacity and
performance as items age.
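One way to picture metadata-driven policy selection is as an ordered list of rules evaluated against an object's metadata whenever the object is created, ages, or has its metadata updated. The policy names, metadata keys and thresholds below are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    name: str
    matches: Callable[[dict], bool]  # predicate over an object's metadata
    replicas: int
    compress: bool = False


# Ordered rules (an assumption): the first policy whose predicate matches wins.
POLICIES = [
    Policy("legal-hold", lambda m: m.get("retention") == "legal-hold", replicas=3),
    Policy("aged-images",
           lambda m: m.get("type") == "image" and m.get("age_days", 0) >= 30,
           replicas=2, compress=True),
    Policy("default", lambda m: True, replicas=2),
]


def select_policy(metadata: dict) -> Policy:
    """Re-run on object creation, on aging, and on any metadata update."""
    return next(policy for policy in POLICIES if policy.matches(metadata))


new_image = select_policy({"type": "image", "age_days": 3})
old_image = select_policy({"type": "image", "age_days": 45})
held_doc = select_policy({"type": "document", "retention": "legal-hold"})
```

As the image ages past the threshold it shifts from the default policy to the compressed one without any change to the application that stored it.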
Primary Object Use Cases in the Cloud
Cloud-based archiving, particularly medical and file archiving, forms the primary use case for
object-based storage. Web application development is surging forward, and Archive-as-a-Service
and its providers round out the fastest-growing use cases.
PRIMARY USE CASE: ACTIVE ARCHIVES
Archived information is playing a more strategic role in workflows and business processes. On-premise archiving is essentially static and used to reduce storage costs, improve operational
efficiency, retention and compliance, and enable the business to use archived data to make better
business decisions. Cloud-based archiving retains elements of these features but adds new dynamic
ones: instant access from any device, archive as a service and federating to private or public cloud.
Atmos provides both the static and dynamic features that massive active archives require.

Federate to public or private clouds. Federation enables companies to treat on-premise and
cloud object storage as a single efficient infrastructure. Companies may pool distributed storage
assets including data, applications and policies to take full advantage of the cloud’s massive
scalability and global access features. Federation also lowers cost and risk: application
workloads run on cloud resources with a low execution cost, and if a cloud-based storage
system goes down the distributed workload remains protected. Federation extends internal
policies to cloud-based storage environments by applying existing policies and settings to
cloud-based storage.

Use metadata to drive business and storage decisions. We expect the use of metadata to
expand quickly to directly feed business exploitation processes, as well as support more
automatic and intelligent storage management decisions. A singly managed distributed system
that maintains directly accessible object metadata yields rich support for business decisions.
Object-based storage also enables IT to automate information lifecycle management across the
entire distributed data store, not just by storage silo. Policies should be flexible enough to be set
at the object, tenant or system levels, to automate archive decisions, set and manage retention,
expiration, and disposition.

Multi-tenancy for secure shared storage. Multiple applications can safely co-exist as separate
tenants. Isolation by tenant protects security while enabling the sharing of system-wide
resources and capacity. Multi-tenancy is also efficient since it is subscribed to a highly scalable
pool of storage, which can flexibly up-scale and down-scale on demand.

Massive scalability. Unstructured data storage is growing so fast that traditional storage
systems are straining purchase, maintenance and management resources to the brink.
Distributed object-based architecture yields near-limitless scale. Object also allows for
automatic load balancing whenever new objects are stored, which protects high performance
across the entire distributed system.

Multi-site active/active. Multi-site active/active architecture is an important component of
object-based storage, especially in the cloud. Cloud object storage systems span multiple sites
and provide for multi-site direct access to objects through both synchronous and asynchronous
replications. This model replicates between multiple storage nodes and sites, which not only
increases distributed availability and content distribution, but also supports disaster recovery.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
6 of 12
www.tanejagroup.com
Technology in Brief

Archive-as-a-service. The most agile and flexible way for IT to deliver archive services is with
the cloud model of self-service portals. This model manages and meters utilization and
bandwidth and supports third-party chargeback. Within an enterprise this flexibility and
instant storage relieves users of the temptation of using commercial cloud services simply
because they can get the storage they need fast – even though security might not be in place.
This approach also enables ISVs and MSPs to extend archive requirements and offerings.

Reduce manual tasks and provisioning across multiple archives. Cloud-based archives
must be easy to set up and, for reliability and consistency, must not require long or deep manual
configuration. They should also automate underlying complexities including security, audit,
retention, performance, and capacity growth. Atmos provides these features and more,
relieving the cloud administrator of enormous burdens. Distributed systems may be managed
as a single entity with policies to automate hundreds of management and data protection tasks.
And perhaps the most important of all, object-based systems like Atmos offer massive
scalability of capacity and performance thanks to their unique architecture.
FAST-GROWING USE CASE: WEB AND MOBILE APPLICATION DEVELOPMENT
Web and mobile applications development using unstructured data also has driving needs that
object-based cloud storage meets. Web application development requires quick access to storage
resources, test/dev environments capable of storing multiple copies of large data sets, and the
ability to test web applications in real-time online environments. These requirements are
understandably hard to achieve using traditional file-based storage systems.
Applications written to leverage object storage won’t need to be rewritten or even taken offline as
the object storage seamlessly (or elastically) expands over time. Atmos provides the key
capabilities that web application development requires, including location transparency, self-managing storage and REST APIs.

Enable instant access to data from any device. Web and mobile applications are inherently
geographically distributed, yet file systems are usually limited in both effective access points
(location) and number of files that they can manage. Object-based storage abstracts its storage
from physical locations, providing a secure access point in place of device-specific mount points.
Web services APIs and file-based access allow approved users to easily access their archives
from computers and a broad array of mobile devices. Integrated web services over REST and
SOAP are key to this instant access. Other support components are file-based access (CIFS / NFS
/ IFS / CAS), and expanded access via ISV applications.

Self-managing storage. In traditional development, applications have often been hard-coded
to specific data stores through pointers to identified LUNs or file system navigation paths. In
contrast, object storage provides a clean mapping from application to data through a simple
REST API with an immutable unique object ID to the stored object. This goes a long way
towards eliminating traditional, time-consuming storage management tasks like LUN creation
and RAID groups. Cloud owners may choose to extend self-management options to customers,
making it simple for users to grow storage capacity on demand.

Broad API support. Cloud object storage is basically shared storage accessed through web-based services. Atmos’ architecture supports rapid web application development with a broad
API set including REST and S3. The REST API leverages HTTP operations on objects that are directly
addressed, which reduces code complexity and provides the kind of easy, automatically
distributed, protected, persistent storage the developer needs. In addition to the REST API,
EMC Atmos also natively supports the Amazon S3 API. This provides customers with the ability
to simply point S3 applications to Atmos and seamlessly migrate their applications to any of the
more than 40 Atmos powered public clouds around the globe.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
7 of 12
www.tanejagroup.com
Technology in Brief
EMC and Object-Based Storage
EMC first introduced Centera CAS for archiving in 2002. Centera offers 5 9’s data availability: its
redundant array of independent nodes (RAIN) is interconnected via cube switches, protecting
data across independent nodes in a cube. Mirroring and parity provide additional protection and
availability.
Centera’s CAS architecture keeps the retained data from being compromised or deleted before the
end of its retention period. Centera assigns unique hash-code identifiers specific to each unique
object including content elements, metadata, and data/metadata relationships. This inextricably
links content elements with their metadata, which are stored within a flat address space – no need
for a separate database. This architecture ensures authenticity of the archived objects. Centera
abstracts the unique objects from their generating applications and operating systems, which
enables Centera to flexibly act as the single, highly optimized data store for previously siloed
archives.
Centera retains single instances of archived objects. In the case of multiple users of the same file –
such as a PowerPoint file sent over a distribution list – Centera retains metadata with information
about each user’s interaction with the file, but points to the single instance of the object. By cutting
down on data copies, this results in dramatic reductions in the quantity of archive storage.
Centera searches using metadata, rather than opening up the content objects on application-specific
storage. This results in much faster and more efficient searches without using application cycles.
This is possible because content and metadata stored on Centera are application, file and operating
system independent, and Centera offers a search engine right in its repository.
Centera’s content-based addressing integrates directly with application environments via APIs,
with no need for kernel-level dependencies. This means that multiple applications can
simultaneously use Centera, and that specific archiving management attributes – such as data aging
and data protection – can be executed per application. These capabilities create a complete chain of
custody once the data leaves the primary application to be archived on Centera. Media
independence also leverages Centera’s application support. Centera objects are independent of
specific storage media and protocols, which means that the storage system can migrate to new
storage media over time without disturbing the integrity of the archived objects. For long term
disk-based archiving, this represents significant risk mitigation and investment protection.
Centera architecture is highly scalable and self-managing. Traditional file systems scale based on
the amount of stored data versus remaining available address space – which may not be much. As
the file system reaches its maximum capacity, administrators must expand the entire file system
including operating system, file system, and application in order to scale the archive. In contrast,
Centera expands to petabyte-high capacities due to its flat address space. It also leverages its
architecture to distribute management controls across the entire archive infrastructure. For
example, if a Centera disk or node fails, the archive cluster knows how to self-heal without manual
intervention. This distributed management structure extends to cover the deployment, scaling,
recovery and protection of all the archival objects being stored by Centera.
Centera optimizes archiving, information governance and compliance. Users may choose from 300
native, integrated archiving applications to manage archival needs for email, files, medical imaging,
content management, video, voice, and more on the single Centera archiving platform. In addition,
Centera offers Compliance Edition Plus for compliance and eDiscovery, and Governance Edition for
data retention management.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
8 of 12
www.tanejagroup.com
Technology in Brief
Centera Compliance Edition Plus captures and preserves original content, protecting data and
proving chain of custody for legal eDiscovery and litigation. Retention classes assign a logical
reference to each electronic record object; policies enforce data retention and safe disposition.
Centera Governance Edition enforces internal policies for data retention and disposition. Policies
may be organizational or application-specific, which improves corporate accountability, reduces the
cost of eDiscovery and compliance, and proves the integrity of governance controls.
To the Cloud: Atmos Architecture
EMC’s Atmos supports the same CAS API as Centera for seamless migration, and brings object
storage into the cloud with massive scalability and geographic federation supported with multi-tenancy, cloud provisioning and global access features. While Atmos is readily leveraged to extend
active global archives, it also offers an exceptional platform for web and mobile application
development. Atmos even enables new opportunities for global “big” data aggregation and
distribution.
Atmos is at heart a software storage system for building private and public cloud storage. Atmos
implementations are available from EMC either already integrated into pre-packaged physical
building blocks or as a virtual machine solution for VMware vSphere that can leverage other EMC or
3rd party storage resources. Additionally, there is a rich ecosystem of service providers providing
Atmos as cloud Storage-as-a-Service directly. Any and all of these options can be federated together
as needed within and across a given organization.
EMC uses REST and SOAP web services, and has also implemented file services on top of Atmos to
serve underlying objects through the lens of either an NFS or CIFS file server. When NFS or CIFS
shares are defined, they are assigned to specific Atmos nodes (or dedicated pairs for HA) and utilize
the Atmos node’s inherent Linux capabilities (leveraging an Installable File System with the FUSE
extension). Layering a file system over Atmos imposes some constraints regarding universal access,
but also enables both traditional and transitional applications and file system type usage.
EMC Atmos Windows and Linux users can also leverage the EMC GeoDrive add-on that installs on a
single user workstation or server to provide remote virtual NFS/CIFS style access (over REST) to
Atmos object storage. GeoDrive supports local caching of files for offline use and eventual
synchronization on reconnection. One of the major benefits of GeoDrive is enabling a user to access
large amounts of protected storage from anywhere. It can also be used for the disaster recovery of
files pushed or mirrored into Atmos.
Atmos technically maintains a given piece of data as an object with associated metadata that
includes the object ID, system and user-defined metadata fields and the internal object layout
information (and parent/child information for objects saved through a file system “namespace”
interface). Applications and users can store arbitrary metadata with each object that can be
leveraged by group management policies. Policies can be created at the tenant level as a design
scheme to provide various service levels of performance, access, and data protection based on some
awareness of the multi-site architecture of the cloud implementation. They are then assigned to
sub-tenants, who need not be aware of the underlying implementation, to apply as target service
levels to their objects. For example, the power to explicitly enforce compression of image files (e.g.
jpegs) after a number of days would present a significant capacity optimization for a web-based
application dealing with millions of images.
In addition to supporting compliance and retention policies, metadata can be used to drive
automated file distribution, access control and data protection activities optimizing for the
appropriate level of data resiliency, performance and availability. For most applications, thoughtful
use of user metadata can remove any need to implement a separate management tracking database
for stored objects.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
9 of 12
www.tanejagroup.com
Technology in Brief
Replication is controlled by automated policies which can mirror data objects at many points in an
object’s lifecycle both within and across multiple sites. Within a data center site, replication might
for example be set to happen synchronously upon ingestion, while replication between
sites might be set asynchronously and launched with an arbitrary delay to allow for data settling.
Replications can be targeted to specific locations, or abstractly sent to “other” sites as the system
decides.
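A replication policy of this kind can be pictured as a short list of replica rules. The Python sketch below is our own illustration, not Atmos configuration syntax: the ReplicaRule class, the site names, the one-hour delay and the use of plain seconds for time are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReplicaRule:
    """Hypothetical replica rule (illustrative, not an Atmos structure)."""
    target: str              # a named site, or "other" to let the system pick
    synchronous: bool        # True: replica is written as part of ingestion
    delay_seconds: int = 0   # only meaningful for asynchronous replicas


def replica_schedule(ingest_time, rules):
    """Return (target, due_time, synchronous) for each rule, times in seconds."""
    schedule = []
    for rule in rules:
        delay = 0 if rule.synchronous else rule.delay_seconds
        schedule.append((rule.target, ingest_time + delay, rule.synchronous))
    return schedule


RULES = [
    ReplicaRule(target="local-site", synchronous=True),
    ReplicaRule(target="other", synchronous=False, delay_seconds=3600),
]

for target, due, sync in replica_schedule(ingest_time=1000, rules=RULES):
    mode = "sync" if sync else "async"
    print(f"{target}: {mode}, due at t={due}")
```

Here the local replica is due at ingestion time, while the replica sent to an unspecified "other" site is due one hour later.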
For performance and availability, replicas are all active for read access (objects are inherently
immutable, so there is no need to manage distributed locking mechanisms). Because Atmos
is “multi-site active/active”, any site can fulfill new object write requests when the local primary
site is unavailable.
In addition to full replication, EMC also provides an erasure coding option called GeoParity. Instead
of keeping two or more full copies, “9/12” erasure coding stores an “expanded”
object containing only 33% additional encoded “redundant” data, broken up into 12 segments. The
original data can be reconstructed dynamically from any 9 of the
segments. These segments are distributed so that the object can survive (and even be
accessed during) multiple failures. For greater protection there is also a “10/16” coding with a 60%
capacity overhead. Erasure coding does impact access performance, especially at ingestion, but
provides great fault tolerance while consuming much less capacity than full replicas. Of course, policies can be
written to convert replicated objects to erasure coded schemes as they age.
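The overhead figures follow directly from the segment counts. The short Python sketch below works through the arithmetic. It assumes that the 10/16 scheme obeys the same rule as 9/12 (any 10 of the 16 segments suffice), and it models two full copies as a "1/2" scheme purely for comparison; the function names are our own.

```python
from fractions import Fraction


def overhead(data_segments, total_segments):
    """Extra capacity stored, as a fraction of the original object size."""
    return Fraction(total_segments - data_segments, data_segments)


def tolerated_losses(data_segments, total_segments):
    """Segments that can be lost while the object remains reconstructable."""
    return total_segments - data_segments


# (label, segments needed to rebuild, segments stored)
SCHEMES = [("9/12", 9, 12), ("10/16", 10, 16), ("two full copies", 1, 2)]

for label, k, n in SCHEMES:
    pct = float(overhead(k, n))
    print(f"{label}: overhead {pct:.0%}, tolerates loss of {tolerated_losses(k, n)}")
```

Under these assumptions 9/12 tolerates the loss of 3 segments for one third extra capacity and 10/16 tolerates 6 for 60% extra, whereas two full copies cost 100% extra capacity to tolerate the loss of a single copy.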
With object stores there is generally no need for low-level RAID or disk level protection and Atmos
is no exception. Upon hardware failures, replications and/or GeoParity across nodes (RAIN)
combined with built-in node auto-healing features suffice to provide the full data protection as
determined by the service level “policies” implemented for each type of data object. Atmos can
withstand the loss of any disk, node, rack, or even site.
Atmos Pre-built Hardware Configurations
Each EMC Atmos pre-configured hardware “appliance” consists of a rack/cabinet containing from 4 to
16 Atmos nodes in various configurations and disk capacities. Flexible configurations enable
smooth scalability, and allow for mixes of capacity and performance in and across Atmos sites. An
Atmos storage node consists of a 1GbE server front-end running the Atmos storage services,
connected to one or more SAS-attached disk array enclosures (DAEs), each containing fifteen 1-3TB
7200RPM disks. Every node runs all object storage services (the first two nodes in each site also
run the site metadata locator service that indexes which node contains which objects), supporting
tremendous horizontal system scalability.
EMC has also introduced its new Atmos G3 series for new levels of density and energy efficiency.
G3-Dense-480 is the first in the Atmos G3 series and consists of 4, 6, or 8 nodes with 480 3TB disks in
40U.
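To put the density figure in perspective, the Python sketch below estimates raw and usable capacity for 480 disks at 3TB each. The usable figures are our own back-of-envelope estimates, not EMC-published numbers: they assume decimal terabytes, a single protection scheme applied to all data, and no capacity reserved for system metadata or spares.

```python
def raw_capacity_tb(disks, tb_per_disk):
    """Raw capacity in TB before any data protection is applied."""
    return disks * tb_per_disk


def usable_capacity_tb(raw_tb, data_parts, total_parts):
    """Capacity left for original data when data_parts of every total_parts stored are data."""
    return raw_tb * data_parts / total_parts


raw = raw_capacity_tb(disks=480, tb_per_disk=3)
print(f"raw: {raw} TB")

# (label, data parts, total parts stored); two full copies modeled as 1 of 2
for label, k, n in [("two full copies", 1, 2), ("10/16", 10, 16), ("9/12", 9, 12)]:
    print(f"{label}: about {usable_capacity_tb(raw, k, n):.0f} TB usable")
```

Under these assumptions the same 40U of hardware holds roughly 720 TB of original data with two full copies, 900 TB with 10/16 coding and 1,080 TB with 9/12 coding.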
TABLE: ALIGNING TOP CLOUD USE CASES WITH EMC ATMOS

Use case: Medical Archiving
Challenge: Over 800 million medical imaging procedures a year require huge storage scalability;
collaboration and compliance increase complexity.
Benefits: Vendor Neutral Archive (VNA) on Atmos: integrates with EMR/EHR and improves PACS for
better patient care and collaboration, improves data lifecycle management, reduces IT costs, and
preserves HIPAA compliance.

Use case: File Archiving
Challenge: Corporate file sharing is popular with employees but syncing and sharing are hard to
manage. Employees will frequently share files anyway over mobile devices, leaving corporations
accountable for risky behavior.
Benefits: With EMC Sync & Share, users can securely share Atmos files across mobile devices, Linux
and Windows. GeoDrive creates a Dropbox-like service that is secure and manageable, powered by
Atmos’ fast performance. Atmos policies monitor changes to data and provide access control,
benefitting regulated verticals like finance.

Use case: Archive as a Service
Challenge: Both the enterprise and storage service providers struggle to provide IT services to
their respective customers. Provisioning, maintenance, and security are all difficult issues in
traditional storage offerings.
Benefits: The Atmos Cloud Delivery Platform enables corporations and service providers to meter
capacity, bandwidth, and usage across tenants. Provisioning is automated by tenant, and Atmos allows
tenants to safely self-manage and access their own storage.

Use case: Managed Service Providers
Challenge: Many MSPs suffer from narrow profit margins because of the expense of delivering storage
to customers. Managing multiple tenants, manual provisioning and maintaining service level
agreements all cut into revenue and make it too expensive to add new storage services.
Benefits: Atmos lets MSPs efficiently offer storage as a service and better monetize new service
offerings. MSPs can monitor capacity and usage for chargeback, reduce provisioning costs, and
replace multiple tenant management systems with a single system. Dynamic scaling, high availability
and security cost-effectively meet service level requirements.

Use case: Content-Rich Web Applications
Challenge: Traditional storage is a poor environment for Web application development, which needs
highly scalable capacity for multiple large data sets, a secure environment for test/dev and
application testing in real-time environments.
Benefits: Atmos provides location transparency for global applications and a highly mobile user
base. The single namespace means that application developers never need to recode pathnames and
locations, and do not need to code for limited storage environments. Self-management options make
it easy for customers to provision their own storage, and REST APIs reduce application complexity.

Taneja Group Opinion
When on-premise archive solutions smoothly integrate with federated storage, then public and
private clouds provide extensive scalability and global availability. Yet we see too many end-users
treating the cloud as just another storage tier for low-value retained data. This is a huge waste of
cloud possibilities but we understand why it happens: cloud platforms with poor performance and
delivery mechanisms can make cloud-based storage more trouble than it’s worth.
But when we talk about EMC Atmos we are not talking about a low-cost storage tier, far from it. We
are describing the heart of business innovation based on highly secure and highly accessible global
data stores. EMC’s long expertise with object-based storage has kept Centera relevant and has
extended dynamic data management to the cloud with Atmos. The Atmos-fueled cloud replaces
hierarchical file storage while allowing the secure flow of information between the data center, the
distributed cloud, and global access points. Customers profit from greatly improved application and
data delivery, and from the deep business value inherent in their data.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
11 of 12
www.tanejagroup.com
Technology in Brief
When a company is dealing with geographic reach and large, growing volumes of rich content,
it should look to object-based storage in the cloud. We fully support EMC in its push to scale
capacity, performance, availability and management far beyond what traditional file systems are
capable of, and more massively than ever before.
NOTICE: The information and product recommendations made by Taneja Group are based upon public
information and sources and may also include personal opinions both of Taneja Group and others, all of which we
believe to be accurate and reliable. However, as market conditions change and are not within our control, the
information and recommendations are made without warranty of any kind. All product names used and
mentioned herein are the trademarks of their respective owners. Taneja Group, Inc. assumes no responsibility or
liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or
reliance upon, the information and recommendations presented herein, nor for any inadvertent errors that may
appear in this document.
Copyright The TANEJA Group, Inc. 2012. All Rights Reserved.
87 Elm Street, Suite 900  Hopkinton, MA 01748  T: 508.435.2556  F: 508.435.2557
www.tanejagroup.com