Solution Brief Bridging the Infrastructure Gap for Unstructured Data with Object Storage
89 Fifth Avenue, 7th Floor, New York, NY 10003
www.TheEdison.com @EdisonGroupInc 212.367.7400

Printed in the United States of America
Copyright 2016 Edison Group, Inc. New York.
Edison Group offers no warranty either expressed or implied on the information contained herein and shall be held harmless for errors resulting from its use. All products are trademarks of their respective owners.
First Publication: January 2016
Produced by: Brandon Moore, Analyst; Manny Frishberg, Editor; Barry Cohen, Editor-in-Chief

Table of Contents
Opportunities and Challenges of Unstructured Data
Object Storage: The Solution for the Unstructured Data Deluge
Key Advantages of Object Storage
EMC ECS and Object Storage

Opportunities and Challenges of Unstructured Data

Whether you are a startup or have been in business for 100 years, the rules are the same. Every business wants to get as close as possible to its customers. The closer you are to them, the closer you are to revenue generation. The world is experiencing a data revolution that can help put your company top of mind for all current and potential customers. Using methods such as surveys, banner ads, email campaigns, social media, or targeted product recommendations, companies and other entities, such as federal, state, and local governments, have been collecting this data for years. For example, http://www.data.gov contains nearly 193,000 different datasets from 422 publishers with 78 agencies and sub-agencies of the U.S. Government, and these datasets change monthly. Access to these large data sets is just the tip of the proverbial iceberg.
This data requires new methods of storage. Collecting and managing data also presents non-IT challenges. These include security concerns, data locality based on regulations, privacy laws, and the total cost of ownership (TCO) when scaling past the petabyte range. Business leaders want to collect and analyze data from each part of their revenue chain to deliver the best products and services, while mitigating risk, generating revenue, and maximizing margins. As an IT professional, you view this problem from a different perspective:

- How much data will be generated and where can I store it cost effectively?
- Where is the data coming from?
- What kind of data is it? Structured, unstructured, or both?
- Who or what application needs to use it?
- How do I protect that data?
- Data generation is a “one-time” event; do I have what I need to collect this data?

These questions refer to the deluge often associated with unstructured data. The challenge is being prepared for what the business asks of you. As more and more systems, devices, and sensors in your company’s revenue chain generate data that needs to be collected, you begin to understand the gaps in your infrastructure’s capabilities: a storage gap and an application gap. The solution requires the flexibility to address different kinds of applications, development cycles, and infrastructure.

Figure 1 illustrates the gaps often found in IT infrastructure trying to address this increase in data.

Figure 1: Infrastructure Gap for Unstructured Data Storage

At the center are the core systems that run your business, often referred to as systems of record. These systems are some of the most protected, regulated, and secure assets in the company. As a result, they were not built with the flexibility to interface with systems that generate large amounts of unstructured data at different speeds and from different sources.
The data generated by the web, people, devices, sensors, and your revenue chain is unstructured, unpredictable, and unending. Looking at Figure 1, the infrastructure gap between the systems interacting with your revenue chain and those currently running your business becomes clear: storage technology is at the foundation of this challenge. A keystone of data collection in IT, storage technology is experiencing a revolution centered on addressing the challenges of the unstructured data gap. Object storage is a solution both for storing unstructured data and for the application systems that analyze and transform that data. In this solution brief, Edison Group explores the infrastructure gap and evaluates object storage and EMC Elastic Cloud Storage (ECS) as the solution to fill that gap.

Object Storage: The Solution for the Unstructured Data Deluge

While not a new concept, object storage is one of the hottest terms in IT today. As a result, startups are being acquired, and “born on the web” companies are innovating heavily in this field. Object storage uses a flat namespace, storing each piece of data under a globally unique address together with its metadata. This method reduces the overhead of storage management tasks such as:

- LUN creation, expansion, or migration
- Applying data protection schemes
- Creating, extending, and managing filesystems

With object storage, data can be of any size and type. Whether the data is documents, images, audio, or video, there is no need to apply special techniques to store these and other data types. Along with a unique global ID, metadata is embedded with each object. Users and applications can embed additional metadata to make objects easier to identify and to customize how they are identified.

Key Advantages of Object Storage

Let’s explore the advantages of object storage for storing unstructured data. Traditional storage systems are built to interact with operating systems and people.
Object storage is built to interact with applications and many different data sources. Block storage is needed to provide a place for operating systems to live and for applications to run and generate data. Some of the data that applications generate is not best suited for block storage. Examples of that type of data are:

- Backup files (database dumps, virtual machine level backups, and other backup/recover applications)
- Content repositories for content archival (document archival, compliance data, email, databases)

This type of data is considered referential from an application perspective, meaning it needs to be available for recall but is not accessed frequently. Additionally, the volume of this data far exceeds what is used in the application’s operation. As a result, the data does not require the performance and availability characteristics of block storage. Object storage excels at storing this type of data because applications can write data directly using a TCP/IP connection to a programmable API on the object storage system. This characteristic is what defines object storage systems as software defined storage.

Access connections for object storage are delivered via TCP/IP and Ethernet. Since no setup of LUNs, RAID, or filesystems is required for use, integration into existing environments is seamless, providing excellent time to value. Object storage systems deliver unlimited capacity expansion as a cost effective, high value solution for warm archival of referential data.
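To make the idea of writing directly to a programmable API over TCP/IP more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the API of ECS or any other product: the `/objects` path, the JSON response shape, and the names `ObjectHandler`, `put_object`, and `demo` are all hypothetical, and the "storage system" is a stand-in HTTP server running on the local machine.

```python
import http.client
import json
import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for an object storage endpoint (not a real product API).
OBJECTS = {}  # object ID -> (payload bytes, metadata dict)


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Read the payload and store it under a freshly generated ID.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        object_id = str(uuid.uuid4())
        metadata = {"content_type": self.headers.get("Content-Type", "")}
        OBJECTS[object_id] = (payload, metadata)
        body = json.dumps({"id": object_id}).encode("utf-8")
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def put_object(host, port, payload, content_type):
    """Write one object over HTTP and return the ID the server assigned."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("PUT", "/objects", body=payload,
                           headers={"Content-Type": content_type})
        response = connection.getresponse()
        return json.loads(response.read())["id"]
    finally:
        connection.close()


def demo():
    """Start the stand-in endpoint on a free local port and write one object."""
    server = HTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return put_object(host, port, b"nightly database dump",
                          "application/octet-stream")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the ID assigned to the stored object. The point of the sketch is that the application needs only a network connection and an API call; no LUN, RAID group, or filesystem is set up on the client side.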
These characteristics of object storage also provide an excellent solution for the following use cases:

- Archiving files in place of local tapes and tape libraries
- Offsite backup and archive storage for disaster recovery
- Archive tiering for network attached storage (NAS)
- Remote office and branch office (ROBO)

Extending or replacing capacity on current NAS devices with object storage systems can improve TCO for your storage environment without needing to purchase identical or similar equipment. Object storage is a viable option for meeting disk-based recovery time objectives in disaster recovery plans. Replication, another feature of object storage, extends recoverability and mitigates risk, as it can be extended outside of the datacenter to other locations under your security control.

Not only is the setup of an object storage system simple, but so is its ongoing management. Because of the flat namespace, capacity expansion and upgrades can be executed with no downtime in most cases. Monitoring and administrative tools are web based, allowing for use anywhere on your secured network. This ultimately means less of a learning curve to achieve operational efficiency as the systems continue to grow. Once operational efficiency is achieved, the TCO of your environment can be further reduced when adding more workloads.

The flat namespace also allows for multiple types of data to be stored side by side. Regardless of its type, all data is viewed as objects, each with a globally unique ID and metadata.
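The flat namespace described above can be sketched as a toy in-memory store. This is a simplified illustration, not how any particular object storage system is implemented; the class name `FlatObjectStore` and its methods are hypothetical, and a plain dictionary stands in for the distributed system.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory store: one flat namespace, no directories or LUNs."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data, **metadata):
        """Store data of any type or size and return its unique ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        """Return the (data, metadata) pair stored under object_id."""
        return self._objects[object_id]

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, (_, metadata) in self._objects.items()
            if all(metadata.get(key) == value for key, value in criteria.items())
        ]


store = FlatObjectStore()
image_id = store.put(b"\x89PNG...", kind="image", source="security-camera")
reading_id = store.put(b'{"temp_c": 21.5}', kind="sensor", source="device-17")
```

Here an image and a sensor reading sit side by side in the same namespace, and `store.find(kind="sensor")` locates the reading by its metadata rather than by a directory path.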
Storing varied data side by side puts your company in an excellent position to ingest and store data from the following sources:

- Large data sets: Financial, pharmaceutical, geospatial, biotech, and legal
- Public data sets: Weather, government
- Security, imagery, and social media: Images, videos, blogs
- Revenue chain data: Sensors, devices, Internet of Things

The ability to store this data locally lets you apply your own IT security and lessens the exposure your company would face if this data were stored in a public cloud. You also have greater control to share data with your partners for support and service of your revenue chain.

Being prepared for the data deluge is critical to your company’s future success. Figure 2 highlights how object storage and its advantages fill the previously identified infrastructure gap.

Figure 2: How Object Storage Fills the Unstructured Data Storage Infrastructure Gap

Edison Group believes object storage technology is best equipped to address the infrastructure gap between your business, customers, and the unstructured data they both generate. We recommend that organizations begin to investigate this technology within the next 3 to 6 months and plan to implement it in their environment within the next 6 to 18 months, as there are likely several areas of immediate need for object storage. Some of them include:

- Backup and disaster recovery
- Archive data
- Content management repositories
- Compliance and regulated data archives (Sarbanes-Oxley (SOX), Basel, etc.)
- NAS migration/modernization
- Remote office storage solutions
- Enterprise data warehouse (EDW) data offload

With a short time to value for implementation and easy petabyte scalability, object storage systems also enable your IT department to move quickly to address the concerns of line-of-business (LoB) application development lifecycles.
By providing “as a Service” offerings and a foundational, “in-house” private cloud storage environment, object storage allows your company to further big data analytics, data lakes, and Internet of Things (IoT) development at a greatly reduced security risk and cost. Based upon this evaluation, Edison recommends object storage solutions to meet the demands of ever-increasing data.

Now that we understand the challenge associated with unstructured data and what technology is needed to bridge the infrastructure gap, let’s get an overview of how an object storage solution, EMC ECS, can close your storage infrastructure gap for unstructured data.

EMC ECS and Object Storage

EMC provides a cloud-scale object storage platform that meets the storage demands of today and beyond through its ECS solution. ECS is a turnkey, on-site solution offering all the advantages of commodity infrastructure with enterprise grade reliability, availability, and serviceability. ECS can efficiently store petabytes of data, whether billions of small files, large files, or both, in a cost-appropriate, state-of-the-art storage system.
EMC ECS Appliance features include:

- Universal protocol support in a single platform with support for object, file (NFS), and HDFS
- Single management view across multiple types of infrastructures
- Geo-federated, active-active architecture with a single global namespace, enabling the management of a geographically distributed environment as a single logical resource using metadata-driven policies to distribute and protect content
- Multi-tenancy support, detailed metering, and an intuitive self-service portal, as well as billing integration

These features allow customers to extend automation capabilities and deliver improved efficiencies across their storage environments, providing better control of operating expenses as data growth continues to rise at unprecedented rates, one of the key pain points customers face in the current IT landscape.

To help put this data growth in context, the Digital Universe is growing 40 percent yearly into the next decade [1]. By 2020, it will contain as many digital bits as there are stars in the universe. This vast amount of data is difficult, not to mention expensive, to store, access, and manage. The way customers distribute and protect their data at scale today will play a very important role in how successful they are in the future.

[1] http://www.emc.com/leadership/programs/digital-universe.htm
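For a rough sense of what the cited 40 percent yearly growth implies, the compounding can be worked through in a few lines of Python. This is back-of-the-envelope arithmetic only: the ten-year horizon and the assumption of a constant rate are ours (the brief says only "into the next decade"), and the function names are illustrative.

```python
import math


def growth_factor(annual_rate, years):
    """Total multiplication after compounding annual_rate for the given years."""
    return (1 + annual_rate) ** years


def doubling_time(annual_rate):
    """Years needed for the total to double at a constant annual_rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Assumption: a ten-year horizon at a constant 40 percent per year.
print(round(growth_factor(0.40, 10), 1))  # 28.9
print(round(doubling_time(0.40), 2))      # 2.06
```

Under those assumptions, the volume of data doubles roughly every two years and grows about 29-fold over a decade, which is why capacity that expands without re-architecture matters for planning.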