Solution Brief Bridging the Infrastructure Gap for Unstructured Data with Object Storage

89 Fifth Avenue, 7th Floor
New York, NY 10003
www.TheEdison.com
@EdisonGroupInc
212.367.7400
Printed in the United States of America
Copyright 2016 Edison Group, Inc. New York.
Edison Group offers no warranty either expressed or implied on the information contained
herein and shall be held harmless for errors resulting from its use.
All products are trademarks of their respective owners.
First Publication: January 2016
Produced by: Brandon Moore, Analyst; Manny Frishberg, Editor; Barry Cohen, Editor-in-Chief
Table of Contents
Opportunities and Challenges of Unstructured Data
Object Storage: The Solution for the Unstructured Data Deluge
Key Advantages of Object Storage
EMC ECS and Object Storage
Opportunities and Challenges of Unstructured Data
Whether you are a startup or have been in business for 100 years, the rules are the same.
Every business wants to get as close as possible to its customers. The closer you are to
them, the closer you are to revenue generation. The world is experiencing a data
revolution that can help put your company top of mind for all current and potential
customers. Companies and other entities, such as federal, state, and local governments,
have been collecting this data for years through surveys, banner ads, email campaigns,
social media, and targeted product recommendations. For example,
http://www.data.gov contains nearly 193,000 datasets from 422 publishers across 78
agencies and subagencies of the U.S. Government, a collection that changes monthly.
Access to these large data sets is just the tip of the proverbial iceberg. This data requires
new methods of storage. Collecting and managing it also raises non-IT challenges,
including security concerns, regulatory requirements on data locality, privacy laws, and
the total cost of ownership (TCO) when scaling past the petabyte range.
Business leaders want to collect and analyze data from each part of their revenue chain
to deliver the best products and services, while mitigating risk, generating revenue, and
maximizing margins.
As an IT professional, you view this problem from a different perspective:

How much data will be generated and where can I store it cost effectively?

Where is the data coming from?

What kind of data is it? Structured, unstructured, or both?

Who or what application needs to use it?

How do I protect that data?

Data generation is a “one-time” event; do I have what I need to collect this data?
These questions refer to the deluge often associated with unstructured data. The
challenge is being prepared for what the business has asked of you with these questions.
As more and more systems, devices and sensors in your company’s revenue chain
generate data that needs to be collected, you begin to understand the gaps in your
infrastructure’s ability: A storage gap and an application gap. The solution requires the
flexibility to address different kinds of applications, development cycles, and
infrastructure.
Edison: Bridging the Infrastructure Gap for Unstructured Data with Object Storage
Page 1
Figure 1 illustrates the gaps often found in IT infrastructure that is trying to address
this increase in data.
Figure 1: Infrastructure Gap for Unstructured Data Storage
At the center are the core systems that run your business, often referred to as systems of
record. These systems are some of the most protected, regulated, and secure assets in the
company. As a result, they were not built to have the flexibility to interface with systems
that generate large amounts of unstructured data with different speeds and sources. The
data generated by the web, people, devices, sensors, and your revenue chain is
unstructured, unpredictable, and unending. Looking at Figure 1, the infrastructure gap
between the systems interacting with your revenue chain and those currently running
your business becomes clear: Storage technology is at the foundation of this challenge.
A keystone of data collection in IT, storage technology is experiencing a revolution
centered on addressing the challenges of the unstructured data gap. Object storage is a
solution both for storing unstructured data and for the application systems that analyze
and transform it.
In this solution brief, Edison Group explores the infrastructure gap and evaluates object
storage and EMC Elastic Cloud Storage (ECS) as the solution to fill that gap.
Object Storage: The Solution for the Unstructured
Data Deluge
While not a new concept, object storage is one of the hottest terms in IT today. As a
result, startups are being acquired, and “born on the web” companies are innovating
heavily in this field. Object storage places data in a flat namespace, identifying each
object by a globally unique address and describing it with metadata. This method
reduces the overhead of storage management tasks such as:

LUN creation, expansion, or migration

Applying data protection schemes

Creating, extending, and managing filesystems
With object storage, data can be of any size and type. From documents and images to
audio and video, there is no need to apply special techniques to store these and other
data types. Along with a unique global ID, metadata is embedded with each object. Users
and applications can embed additional metadata to make objects easier to identify in
ways that suit their own needs.
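The ideas above (a flat namespace, a globally unique ID per object, and embedded metadata) can be sketched in a few lines of Python. This is only a toy in-memory illustration, not how any particular product is implemented; the names FlatObjectStore, put, get, and find are hypothetical and chosen for this example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy in-memory store (hypothetical): one flat namespace keyed by unique IDs."""

    def __init__(self):
        # Object ID -> StoredObject. There are no directories, LUNs, or filesystems.
        self._objects = {}

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())  # globally unique ID generated by the store
        # System metadata (size) is kept alongside caller-supplied metadata.
        self._objects[object_id] = StoredObject(data, {"size": len(data), **metadata})
        return object_id

    def get(self, object_id):
        return self._objects[object_id]

    def find(self, **criteria):
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
# Different data types sit side by side; only the ID and metadata distinguish them.
report_id = store.put(b"%PDF-1.4 ...", content_type="application/pdf", department="finance")
video_id = store.put(b"\x00\x01\x02", content_type="video/mp4", department="marketing")
```

Here store.find(department="finance") returns a list containing only report_id, showing how custom metadata, rather than a directory path, is used to locate an object.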
Key Advantages of Object Storage
Let’s explore the advantages of object storage for storing unstructured data. Traditional
storage systems are built to interact with operating systems and people. Object storage is
built to interact with applications and many different data sources. Block storage is
still needed to provide a place for operating systems to live and for applications to run
and generate data. However, some of the data that applications generate is not well
suited to block storage.
Examples of that type of data are:

Backup files (database dumps, virtual machine level backups, and other
backup/recover applications)

Content repositories for content archival (document archival, compliance data,
email, databases)
This type of data is considered referential from an application perspective, meaning it
needs to be available for recall but not accessed frequently. Additionally, the volume of
this data far exceeds what is used in the application's operation. As a result, the data
does not require the performance and availability characteristics of block storage. Object
storage excels at storing this type of data because applications can write data directly
using a TCP/IP connection to a programmable API on the object storage system. This
characteristic is what defines object storage systems as software defined storage.
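To make the idea of writing directly to a programmable API concrete, here is a minimal Python sketch that builds (but does not send) an HTTP PUT request for a backup file. The endpoint, the bucket/key URL layout, and the x-meta- header prefix are assumptions made for illustration; a real object storage system defines its own API, so consult its documentation for the actual URL scheme, headers, and authentication.

```python
import urllib.request

# Hypothetical endpoint, for illustration only.
ENDPOINT = "https://objects.example.internal"


def build_put_request(bucket, key, payload, metadata):
    """Build (but do not send) an HTTP PUT that would store one object."""
    request = urllib.request.Request(
        url=f"{ENDPOINT}/{bucket}/{key}",
        data=payload,
        method="PUT",
    )
    request.add_header("Content-Type", "application/octet-stream")
    # Custom metadata travels with the object as request headers
    # (the "x-meta-" prefix is an assumed convention, not a real API).
    for name, value in metadata.items():
        request.add_header(f"x-meta-{name}", value)
    return request


backup = build_put_request(
    "backups", "db-dump-2016-01.sql.gz", b"...dump bytes...", {"source": "orders-db"}
)
# Sending it would be a single call: urllib.request.urlopen(backup)
```

The point of the sketch is what is absent: the application needs no LUN, volume mount, or filesystem, only a network connection and an API call.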
Access connections for object storage are delivered via TCP/IP and Ethernet. Because no
setup of LUNs, RAID, or filesystems is required, integration into existing environments
is straightforward, which shortens time to value. Object storage systems deliver
virtually unlimited capacity expansion as a cost effective, high value solution for warm
archival of referential data. These characteristics of object storage also provide an
excellent solution for the following use cases:

Archiving files in place of local tapes and tape libraries

Offsite backup and archive storage for disaster recovery

Archive tiering for network attached storage (NAS)

Remote office and back office (ROBO)
Extending or replacing capacity on current NAS devices with object storage systems can
improve TCO for your storage environment without needing to purchase identical or
similar equipment. Object storage is also a viable way to meet disk-based recovery time
objectives in disaster recovery plans. Replication, another feature of object storage,
extends recoverability and mitigates risk because it can reach beyond the datacenter to
other locations under your security control.
Object storage simplifies not only initial setup but also ongoing management. Because of
the flat namespace, capacity expansion and upgrades can be executed with no downtime
in most cases.
Monitoring and administrative tools are web based, allowing for use anywhere on your
secured network. This ultimately means less of a learning curve to achieve operational
efficiency as the systems continue to grow. Once operational efficiency is achieved, the
TCO of your environment can be further reduced when adding more workloads.
The flat namespace also allows for multiple types of data to be stored side by side.
Regardless of the data type, everything is treated as an object with a globally unique ID and metadata.
This puts your company in excellent position to ingest and store data from the following
sources:

Large data sets: Financial, pharmaceutical, geospatial, biotech, and legal

Public data sets: Weather, government

Security, imagery, and social media: Images, videos, blogs

Revenue chain data: Sensors, devices, Internet of Things
Storing this data locally keeps it under your own IT security and reduces the exposure
your company would face if the data were stored in a public cloud. You also have
greater control to share data with your partners for support and service of your revenue
chain.
Being prepared for the data deluge is critical to your company’s future success. Figure 2
highlights how object storage and its advantages fill the previously identified
infrastructure gap.
Figure 2: How Object Storage Fills the Unstructured Data Storage Infrastructure Gap
Edison Group believes object storage technology is best equipped to address the
infrastructure gap between your business, customers, and the unstructured data they
both generate. We recommend that organizations begin to investigate this technology
within the next 3-6 months and plan to implement it in their environment within the
next 6-18 months, as there are likely several areas of immediate need for object storage.
Some of them include:

Backup and disaster recovery

Archive data

Content management repositories

Compliance and regulated data archives (Sarbanes-Oxley (SOX), Basel, etc.)

NAS migration/modernization

Remote office storage solutions

Enterprise data warehouse (EDW) data offload
Object storage systems, with their short time to value and easy petabyte scalability,
also enable your IT department to move quickly to address the concerns of
line-of-business (LoB) application development lifecycles. By providing “as a Service”
offerings and a foundational, in-house private cloud storage environment, object
storage allows your company to further big data analytics, data lakes, and Internet of
Things (IoT) development at a greatly reduced security risk and cost.
Based upon this evaluation, Edison recommends object storage solutions to meet the
demands of ever-increasing data. Now that we understand the challenge associated with
unstructured data and what technology is needed to bridge the infrastructure gap, let’s
get an overview of how an object storage solution, EMC ECS, can close your storage
infrastructure gap for unstructured data.
EMC ECS and Object Storage
EMC provides a cloud-scale object storage platform that meets the storage demands of
today and beyond through its ECS solution. ECS is a turnkey, on-site solution offering
all the advantages of commodity infrastructure with enterprise grade reliability,
availability, and serviceability. ECS can efficiently store petabytes of data, whether
billions of small files, large files, or both, in a cost-appropriate, state-of-the-art storage
system.
EMC ECS Appliance features include:

Universal protocol support in a single platform with support for object, file (NFS),
and HDFS

Single management view across multiple types of infrastructures

Geo-federated, active-active architecture with a single global namespace, enabling
the management of a geographically distributed environment as a single logical
resource using metadata-driven policies to distribute and protect content

Multi-tenancy support, detailed metering, and an intuitive self-service portal, as well
as billing integration
These features allow customers to extend automation capabilities and deliver improved
efficiencies across their storage environments, providing better control of operating
expenses as data growth continues to rise at unprecedented rates — one of the key pain
points customers face in the current IT landscape.
To help put this data growth in context, the Digital Universe is growing 40 percent
yearly into the next decade (see footnote 1). By 2020, it will contain as many digital bits
as there are stars in the universe. Storing, accessing, and managing this much data is
difficult, not to mention expensive. The way customers distribute and protect their data
at scale today will play a very important role in how successful they are in the future.
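To see what 40 percent yearly growth implies, the compounding can be worked out directly. The short Python sketch below assumes the rate stays constant, which is a simplification; the function names are illustrative.

```python
import math

ANNUAL_GROWTH = 0.40  # 40 percent per year, the growth rate cited above


def growth_factor(years, rate=ANNUAL_GROWTH):
    """How many times larger the data set is after the given number of years."""
    return (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the data volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + rate)


print(f"After 5 years:  {growth_factor(5):.1f}x")   # 5.4x
print(f"After 10 years: {growth_factor(10):.1f}x")  # 28.9x
print(f"Doubling time:  {doubling_time():.2f} years")  # 2.06 years
```

In other words, at this rate storage demand roughly doubles every two years, so a capacity plan sized for today's volume would need nearly 29 times that capacity within a decade.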
1. http://www.emc.com/leadership/programs/digital-universe.htm