Storage is the foundation of cloud services. All cloud services – defined as scalable, elastic, on-demand, and self-service – begin with storage. Almost universally, cloud storage services are virtualized, and hybrid cloud architectures that combine on-premise resources with colocation, private and public clouds result in highly redundant data environments. IDC’s FutureScape report finds “Over 80% of enterprise IT organizations will commit to hybrid cloud architectures, encompassing multiple public cloud services,…
Most people assume cloud storage is cheaper than on-premise storage. After all, why wouldn’t they? You can rent object storage for $276 per TB per year or less, depending on your performance and access requirements. Enterprise storage costs between $2,500 and $4,000 per TB per year, according to analysts at Gartner and ESG.
This comparison makes sense for primary data, but what happens when you make backups or copies of data for other reasons in the cloud? Imagine that an enterprise needs to retain 3 years of monthly backups of a 100TB data set. In the cloud, this easily equates to 3.6 PB of raw backup data, or a monthly bill of over $83,000. That’s about $1 million a year before you even factor in any data access or retrieval charges.
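The arithmetic behind that example can be sketched in a few lines. This is only a rough illustration: it assumes full, non-deduplicated monthly backups and a flat price of $23 per TB per month (the $276 per TB per year quoted above divided by 12), and the function name is hypothetical.

```python
def retained_backup_cost(dataset_tb, monthly_backups, price_per_tb_month):
    """Return (raw_tb, monthly_bill, annual_bill) for full, non-deduplicated backups."""
    raw_tb = dataset_tb * monthly_backups          # every backup is stored in full
    monthly_bill = raw_tb * price_per_tb_month     # flat object-storage rate (assumed)
    return raw_tb, monthly_bill, monthly_bill * 12

# 100 TB data set, 36 monthly backups, $23 per TB per month (assumed flat rate)
raw_tb, monthly, annual = retained_backup_cost(100, 36, 23.0)
print(f"{raw_tb} TB retained, ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

With these round inputs the monthly bill comes to $82,800 and the annual bill to just under $1 million, in line with the figures above; the small gap from $83,000 is presumably down to rounding or unit conversion in the original estimate.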
That is precisely why efficient deduplication is hugely important for both on-premise and cloud storage, especially when enterprises want to retain their secondary data (backup, archival, long-term retention) for weeks, months, and years. Cloud storage costs can add up quickly, surprising even astute IT professionals, especially as data sizes get bigger with web-scale architectures, data gets replicated, and they discover it can’t be deduplicated in the cloud.
The Promise of Cloud Storage: Cheap, Scalable, Forever Available
Cloud storage is viewed as cheap, reliable and infinitely scalable – which is generally true. Object storage like AWS S3 is available at just $23/TB per month for the standard tier, or $12.50/TB for the Infrequent Access tier. Many modern applications can take advantage of object storage. Cloud providers offer their own file or block options, such as AWS EBS (Elastic Block Storage) that starts at $100/TB per month, prorated hourly. Third-party solutions also exist that connect traditional file or block storage to object storage as a back-end.
Even AWS EBS, at $1,200/TB per year, compares favorably to on-premise solutions that cost 2-3 times as much and require high upfront capital expenditures. To recap, enterprises are gravitating to the cloud because the OPEX costs are significantly lower, there’s minimal up-front cost, and you pay as you go (vs. traditional storage, where you have to buy far ahead of actual need).
How Cloud Storage Costs Can Get Out of Hand: Copies, Copies Everywhere
The direct cost comparison between cloud storage and traditional on-premise storage can distract from managing storage costs in the cloud, particularly as more and more data and applications move there. There are three components to cloud storage costs to consider:
- Cost for storing the primary data, either on object or block storage
- Cost for any copies, snapshots, backups, or archive copies of data
- Transfer charges for data
We’ve covered the first one. Let’s look at the other two.
Copies of data. The issue is not how much data you put into the cloud: uploading data is free, and storing a single copy is cheap. It’s when you start making multiple copies of data, whether for backups, archives, or any other reason, that costs spiral if you’re not careful. Even if you don’t make explicit copies of the data, applications or databases often have built-in redundancy and replicate data automatically (in database parlance, according to a replication factor).
In the cloud, each copy you make of an object incurs the same cost as the original. Cloud providers may do some dedupe or compression behind the scenes, but this isn’t generally credited back to the customer. For example, in a consumer cloud storage service like DropBox, if you make a copy or ten copies of a file, each copy counts against your storage quota.
For enterprises, this means data snapshots, backups, and archived data all incur additional costs. As an example, AWS EBS charges $0.05/GB per month for storing snapshots. While the snapshots are compressed and only store incremental data, they’re not deduplicated. Storing a snapshot of that 100 TB dataset could cost $60,000 per year, and that’s assuming it doesn’t grow at all.
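As a quick check on the snapshot figure, here is a minimal sketch of the calculation. It assumes decimal terabytes (1 TB = 1,000 GB), a single full-size snapshot that never grows, and the $0.05/GB per month rate quoted above; the function name is made up for illustration.

```python
GB_PER_TB = 1000  # decimal terabytes, matching the article's round numbers (assumption)

def snapshot_annual_cost(dataset_tb, price_per_gb_month=0.05):
    """Annual cost of keeping one full-size snapshot at a flat per-GB monthly rate."""
    monthly = dataset_tb * GB_PER_TB * price_per_gb_month
    return monthly * 12

cost = snapshot_annual_cost(100)
print(f"${cost:,.0f} per year")
```

For the 100 TB data set this works out to $5,000 per month, or $60,000 per year.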
Data access. Public cloud providers generally charge for data transfer either between cloud regions or out of the cloud. For example, moving or copying a TB of AWS S3 data between Amazon regions costs $20, and transferring a TB of data out to the internet costs $90. Combined with GET, PUT, POST, LIST and DELETE request charges, data access costs can really add up.
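To get a feel for how transfer charges accumulate, the per-TB rates above can be plugged into a small estimator. This is a sketch under stated assumptions: the rates are the ones quoted in this article, request charges (GET, PUT and so on) are ignored, and the scenario of one cross-region copy plus a 10 TB restore is hypothetical.

```python
INTER_REGION_PER_TB = 20.0      # $ per TB moved between regions (rate quoted above)
INTERNET_EGRESS_PER_TB = 90.0   # $ per TB transferred out to the internet (rate quoted above)

def transfer_cost(inter_region_tb=0.0, egress_tb=0.0):
    """Estimate transfer charges in dollars, excluding per-request fees."""
    return (inter_region_tb * INTER_REGION_PER_TB
            + egress_tb * INTERNET_EGRESS_PER_TB)

# Hypothetical scenario: copy the 100 TB data set to a second region once,
# then restore 10 TB back to an on-premise site over the internet.
total = transfer_cost(inter_region_tb=100, egress_tb=10)
print(f"${total:,.0f}")
```

In this scenario the one-off bill is $2,900, before any request charges.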
Why Deduplication in the Cloud Matters
Cloud applications are distributed by design and are typically deployed on massively scalable non-relational databases. In non-relational databases, most data is redundant before you even make a copy. There are common blocks and objects, and databases like MongoDB or Cassandra typically use a replication factor (RF) of 3 to ensure data integrity in a distributed cluster, so you start out with three copies.
Backups or secondary copies are usually created and maintained via snapshots (for example, using EBS snapshots as noted earlier). The database architecture means that when you take a snapshot, you’re really making three copies of the data. Without any deduplication, this gets really expensive.
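To illustrate why deduplication helps here, the following is a minimal sketch of block-level deduplication by content hash. It is a generic illustration, not the method used by any particular product or cloud provider; the 4 KiB block size, the synthetic data set and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration

def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def dedup_store(copies):
    """Store each unique block once, keyed by its SHA-256 digest."""
    store = {}
    logical_blocks = 0
    for data in copies:
        for block in split_blocks(data):
            logical_blocks += 1
            store.setdefault(hashlib.sha256(block).digest(), block)
    return store, logical_blocks

# Synthetic 1 MiB data set made of 256 distinct blocks.
dataset = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(256))
replicas = [dataset] * 3  # replication factor of 3

store, logical = dedup_store(replicas)
print(f"logical blocks: {logical}, stored blocks: {len(store)}, "
      f"ratio {logical / len(store):.1f}:1")
```

Because the three replicas are identical, only one copy of each block is kept, so the snapshot of a replication-factor-3 cluster shrinks back to roughly the size of a single copy.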
Today there are solutions to the public cloud deduplication or data reduction conundrum. Permabit VDO can be easily deployed in public and/or private cloud solutions. Take a look at the following blog from Tom Cook (http://permabit.com/data-efficiency-in-public-clouds/) or, for the technical details, at one from Louis Imershein (http://permabit.com/effective-use-of-data-reduction-in-the-public-cloud/). Both provide examples and details on why and how to drive deduplication and compression solutions in a public cloud.
A few years ago, open source was the less-glamorous and low-cost alternative in the enterprise world, and no one would have taken the trouble to predict what its future could look like. Fast-forward to 2016, and many of us will be amazed by how open source has become the de facto standard for nearly everything inside an enterprise. Open source today is the primary engine for innovation and business transformation. Cost is probably the last reason for an organisation to go in for open source.
An exclusive market study conducted by North Bridge and Black Duck brought some fascinating statistics a few months ago. In the study titled “Future of Open Source”, about 90% of surveyed organisations said that open source improves efficiency, interoperability and innovation. What is even more significant is the finding that the adoption of open source for production environments outpaced proprietary software for the first time – more than 55% leverage OSS for production infrastructure.
OpenStack will rule the cloud world
OpenStack has already made its presence felt as an enterprise-class framework for the cloud. An independent study, commissioned by SUSE, reveals that 81 percent of senior IT professionals are planning to move or are already moving to OpenStack Private Cloud. What is more, the most innovative businesses and many Fortune 100 businesses have already adopted OpenStack for their production environment.
As cloud becomes the foundation on which your future business will be run, OpenStack gains the upper hand with its flexibility, agility, performance and efficiency. Significant cost reduction is another major consideration for organisations, especially large enterprises, because a proprietary cloud platform is excessively expensive to build and maintain, whereas OpenStack operations deliver baseline cost reductions. In addition, data reduction in an OpenStack deployment can further reduce operating costs.
Open source to be at the core of digital transformation
Digital transformation is, in fact, one of the biggest headaches for CIOs because of its sheer heterogeneous and all-pervading nature. With the data at the center of digital transformation, it is often impossible for CIOs to ensure that the information that percolates down is insightful and secure at the same time. They need a platform which is scalable, flexible, allows innovations and is quick enough to turn around. This is exactly what Open Source promises. Not just that, with the current heterogeneous environments that exist in enterprises, interoperability is going to be the most critical factor.
Technologies like Internet of Things (IoT) and SMAC (social, mobile, analytics and cloud) will make data more valuable and voluminous. The diversity of devices and standards that will emerge will make open source a great fit for enterprises to truly leverage these trends. It is surprising to know that almost all ‘digital enterprises’ in the world are already using open source platforms and tools to a great extent. The pace of innovation that open source communities can bring to the table is unprecedented.
Open source-defined data centers
A recent research paper from IDC states that 85 percent of the surveyed enterprises globally consider open source to be the realistic or preferred solution for migrating to software-defined infrastructure (SDI). IDC also recommends avoiding vendor lock-in by deploying open source solutions. Interestingly, many organisations seem to have already understood the benefits of open source clearly, with Linux adoption in data centers growing steadily at a pace of 15-20%.
The key drivers of SDI – efficiency, scalability and reliability at minimal investment – can be achieved only with the adoption of open source platforms. Open source helps enterprises to be more agile in building, deploying and maintaining applications. In the coming days, open source adoption is going to be essential for achieving true ‘zero-downtime’ in software-defined infrastructure.
Open source will have an especially large role to play in the software-defined storage (SDS) space. It will help organisations overcome the current challenges associated with SDS. Open SDS solutions can scale infinitely without a need to refresh the entire platform or disrupt the existing functioning environment.
Data reduction can easily be added to SDS or OS environments with Permabit VDO. Its simple plug-and-play approach, which enables 2X or more storage reduction, will add to the already efficient operations of open source deployments.
Open source to be decisive in enterprise DevOps journey
Today, software and applications have a direct impact on business success and performance. As a result, development, testing, delivery, and maintenance of applications have become crucial. In the customer-driven economy, it is imperative for organisations to have DevOps and containerisation technologies to speed up release cycles and improve the quality of applications.
Often, enterprises struggle to get the most out of the DevOps model. The investment associated with replicating the production environments for testing the apps is not negligible. They also fail to ensure that the existing systems are not disturbed while running a testing environment within containers.
Industry analysts believe that microservices running in Docker-like containers, on an open and scalable cloud infrastructure, are the future of applications. OpenStack-based cloud infrastructures are going to be an absolute necessity for enterprises for a successful DevOps journey. Flexibility and interoperability apart, the open cloud allows the DevOps team to reuse the same infrastructure as and when containers are created.
In 2017, open source is expected to become the first preference for organisations that are at the forefront of innovation.
The IT solutions market for cloud providers has nowhere to go but up.
A new forecast from IDC predicts that cloud IT infrastructure spending on servers, storage and network switches will jump 18.2 percent this year to reach $44.2 billion. Public clouds will generate 61 percent of that amount and off-premises private clouds will account for nearly 15 percent.
IDC research director Natalya Yezhkova said in a statement that over the next few quarters, “growth in spending on cloud IT infrastructure will be driven by investments done by new hyperscale data centers opening across the globe and increasing activity of tier-two and regional service providers.”
Businesses are also growing more adept at floating their own private clouds, she said. “Another significant boost to overall spending on cloud IT infrastructure will be coming from on-premises private cloud deployments as end users continue gaining knowledge and experience in setting up and managing cloud IT within their own data centers.”
Despite a 3 percent decline in spending on non-cloud IT infrastructure during 2017, the segment will still command the majority (57 percent) of all revenues. By 2020, however, the tables will turn.
Combined, the public and private data center infrastructure segments will reach a major tipping point in 2020, accounting for nearly 53 percent of the market, compared to just over 47 percent for traditional data center gear. Public cloud operators and private cloud environments will drive $48.1 billion in IT infrastructure sales by that year.
Indeed, the cloud computing market is growing by leaps and bounds.
The shifting sands are both predictable and evolutionary. Dominant data center spending has been platform specific and somewhat captive. As public cloud providers have demonstrated, efficient data center operations are being deployed with white box platforms and high-performance open-source software stacks that minimize costs and eliminate software bloat. Corporate IT professionals didn’t miss this evolution and have begun developing similar IT infrastructures. They are sourcing white box platforms, which are much less costly than branded platforms, and combining them with open-source software, including operating systems and software-defined storage with data reduction that drives down storage consumption too. The result is a more efficient data center with less costly hardware and open-source software that drives down acquisition and operating costs.
The shift is occurring and the equilibrium between public and private clouds will change, not just because of hardware but increasingly because of open-source software and the economic impact it has on building high-density data centers that run more efficiently than the branded platforms.
There are so many data storage applications out there that whittling down the list to a handful was quite a challenge. In fact, it proved impossible.
So we are doing two stories on this subject. Even then, there are many good candidates that aren’t included. To narrow things down a little, we omitted backup, disaster recovery (DR), performance tuning, WAN optimization and similar applications. Otherwise, we’d have to cover just about every storage app around.
We also tried to eliminate cloud-based storage services as there are so many of them. But that wasn’t entirely possible because the lines between on-premise and cloud services are blurring as software defined storage encroaches further on the enterprise. As a result, storage services from the likes of Microsoft Azure, Amazon and one or two others are included.
Storage Spaces Direct (S2D) for Windows Server 2016 uses a new software storage bus to turn servers with local-attached drives into highly available and scalable software-defined storage. The Microsoft pitch is that this is done at a tiny fraction of the cost of a traditional SAN or NAS. It can be deployed in a converged or hyper-converged architecture to make deployment relatively simple. S2D also includes caching, storage tiering, erasure coding, RDMA networking and the use of NVMe drives mounted directly on the PCIe bus to boost performance.
“S2D allows software-defined storage to manage direct attached storage (SSD and HDD) including allocation, availability, capacity and performance optimization,” said Greg Schulz, an analyst at StorageIO Group. “It is integrated with the Windows Server operating systems, so it is leveraging familiar tools and expertise to support Windows, Hyper-V, SQL Server and other workloads.”
Red Hat’s data storage application for OpenStack is Red Hat Ceph Storage. It is an open, scalable, software-defined storage system that runs on industry-standard hardware. Designed to manage petabytes of data as well as cloud and emerging workloads, Ceph is integrated with OpenStack to offer a single platform for all its block, object, and file storage needs. Red Hat Ceph Storage is priced based on the amount of storage capacity under management.
“Ceph is a software answer to the traditional storage appliance, and it brings all the benefits of modern software – it’s scale-out, flexible, tunable, and programmable,” said Daniel Gilfix, product marketing, Red Hat Storage. “New workloads are driving businesses towards an increasingly software-defined datacenter. They need greater cost efficiency, more control of data, less time-consuming maintenance, strong data protection and the agility of the cloud.”
Gilfix is also a fan of Virtual Data Optimizer (VDO) software from Permabit. This data efficiency software uses compression, deduplication and zero-elimination on the data you store, making it take up less space. It runs as a Linux kernel module, sitting underneath almost any software – including Gluster or Ceph. Pricing starts at $199 per node for up to 16 TB of storage. A 256 TB capacity-based license is available for $3,000.
“Just as server virtualization revolutionized the economics of compute, Permabit data reduction software has the potential to transform the economics of storage,” said Gilfix. “VDO software reduces the amount of disk space needed by 2:1 in most scenarios and up to 10:1 in virtual environments (vdisk).”
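The pricing and reduction ratios quoted above can be turned into a rough per-TB comparison. This sketch makes two assumptions the article does not confirm: that the licensed capacity refers to physical storage, and that the quoted 2:1 and 10:1 ratios apply uniformly to all stored data. The function names are hypothetical.

```python
def effective_capacity_tb(physical_tb, reduction_ratio):
    """Logical data that fits on physical_tb at a given reduction ratio (2.0 means 2:1)."""
    return physical_tb * reduction_ratio

def license_cost_per_tb(license_price, licensed_tb):
    """License price spread over the licensed capacity (assumed to be physical TB)."""
    return license_price / licensed_tb

per_node = license_cost_per_tb(199, 16)      # $199 per node, up to 16 TB
capacity = license_cost_per_tb(3000, 256)    # $3,000 for 256 TB
print(f"per-node license: ${per_node:.2f}/TB, capacity license: ${capacity:.2f}/TB")
print(f"16 TB at 2:1 holds {effective_capacity_tb(16, 2):.0f} TB, "
      f"at 10:1 holds {effective_capacity_tb(16, 10):.0f} TB")
```

Under those assumptions both license options land at roughly $12 per physical TB, and a 16 TB node would hold between 32 TB and 160 TB of logical data depending on the workload.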
VMware vSAN is a great way to pool internal disks for vSphere environments. It extends virtualization to storage and is fully integrated with vSphere. Policy-based management is also included, so you can set per-VM policies and automate provisioning. Due to its huge partner ecosystem, it supports a wide range of applications, containers, cloud services and more. When combined with VMware NSX, a vSAN-powered software defined data center can extend on-premise storage and management services across different public clouds to give a more consistent experience.
OpenIO is described as all-in-one object storage and data processing. It is available as a software-only solution or via the OpenIO SLS (ServerLess Storage) platform. The software itself is open source and available online. It allows users to operate petabytes of object storage. It wraps storage, data protection and processing in one package that can run on any hardware. OpenIO’s tiering enables automated load-balancing and establishes large data lakes for such applications as analytics.
The SLS version is a storage appliance that combines high-capacity drives, a 40Gb/s Ethernet backend and Marvell Armada-3700 dual-core 1.2GHz ARM processors. It can host up to 96 nodes, each with a 3.5″ HDD or SSD. This offers a petabyte scale-out storage system in a 4U chassis.
StarWind Virtual SAN is a virtualization infrastructure targeted at SMBs, remote offices and branch offices, as well as cloud providers. It is said to cut down on the cost of storage virtualization using a technique that mirrors internal hard disks and flash between hypervisor servers. This software-defined storage approach is also designed for ease of use. Getting started requires two licensed nodes and can be expanded beyond that. It comes with asynchronous replication, in-line and offline deduplication, multi-tiered RAM and flash cache.
IBM Spectrum Virtualize deals with block-oriented virtual storage. It is available as standalone software or can be used to power IBM all-flash products. The software provides data services such as storage virtualization, thin provisioning, snapshots, cloning, replication, data copying and DR. It makes it possible to virtualize all storage on the same Intel hardware without any additional software or appliances.
“Spectrum Virtualize supports common data services such as snapshots and replication in nearly 400 heterogeneous storage arrays,” said David Hill, Mesabi Group. “It simplifies operational storage management and is available for x86 servers.”
Dell EMC Elastic Cloud Storage (ECS) is available as a software-defined storage appliance or as software that could be deployed on commodity hardware. This object storage platform provides support for object, file and HDFS. It is said to make app development faster via API accessible storage, and it also enables organizations to consolidate multiple storage systems and content archives into a single, globally accessible content repository that can host many applications.
NetApp ONTAP Cloud is a software-only storage service operating on the NetApp ONTAP storage platform that provides NFS, CIFS and iSCSI data management for the cloud. It includes a single interface to all ONTAP-based storage in the cloud and on premises via its Cloud Manager feature. It is also cloud-agnostic, i.e., it is said to offer enterprise-class data storage management across cloud vendors. Thus it aims to combine cloud flexibility with high availability. Business continuity features are also included.
Quantum’s longstanding StorNext software continues to find new avenues of application in the enterprise. StorNext 5 is targeted at the high-performance shared storage market. It is said to accelerate complex information workflows. The StorNext 5 file system can manage Xcellis workflow storage, extended online storage and tape archives via advanced data management capabilities. Billed as the industry’s fastest streaming file system and policy-based tiering software, it is designed for large sets of large files and complex information workflows.
Information Age previews the storage landscape in 2017 – from the technologies that businesses will implement to the new challenges they will face.
The enthusiastic outsourcing to the cloud by enterprise CIOs in 2016 will start to tail off in 2017, as finance directors discover that the high costs are not viable long-term. Board-level management will try to reconcile the alluring simplicity they bought into against the lack of visibility into hardware and operations.
As enterprises attempt to solve the issue of maximising a return for using the cloud, many will realise that the arrangement they are in may not be suitable across the board and seek to bring some of their data back in-house.
It will sink in that using cloud for small data sets can work really well in the enterprise, but as soon as the volume of data grows to a sizeable amount, the outsourced model becomes extremely costly.
Enterprises will extract the most value from their IT infrastructures through hybrid cloud in 2017, keeping a large amount of data on-premise using private cloud and leveraging key aspects of public cloud for distribution, crunching numbers and cloud compute, for example.
‘The combined cost of managing all storage from people, software and full infrastructure is getting very expensive as retention rates on varying storage systems differ,’ says Matt Starr, CTO at Spectra Logic. ‘There is also the added pressure of legislation and compliance as more people want or need to keep everything forever.
‘We predict no significant uptick on storage spend in 2017, and certainly no drastic doubling of spend,’ says Starr. ‘You will see the transition from rotational to flash. Budgets aren’t keeping up with the rates that data is increasing.’
The prospect of a hybrid data centre will, however, trigger more investment eventually. The model is a more efficient capacity tier based on pure object storage at the drive level and above this a combination of high-performance HDD (hard disk drives) and SSD (solid state drives).
Hybrid technology has been used successfully in laptops and desktop computers for years, but it’s only just beginning to be considered for enterprise-scale data centres.
While the industry is in the very early stages of implementing this new method for enterprise, Fagan expects 70% of new data centres to be hybrid by 2020.
‘This is a trend that I expect to briskly pick up pace,’ he says. ‘As the need for faster and more efficient storage becomes more pressing, we must all look to make smart plans for the inevitable data.
One “must have” is data reduction technologies. By applying data reduction to the software stack, data density, costs and efficiency will improve. If Red Hat Linux is part of your strategy, deploying Permabit VDO data reduction is as easy as plug in and go. Storage consumption, data center footprint and operating costs will drop by 50% or more.
Permabit VDO Delivers Record-setting Performance on Samsung’s NVMe Reference Design Platform
CAMBRIDGE, Mass., Dec. 21, 2016 /PRNewswire/ -- Permabit Technology Corporation, the data reduction experts, announced today that its Virtual Data Optimizer (VDO) software for Linux has exceeded the 8GB/s performance throughput barrier for inline compression. This was accomplished running on a single Samsung NVMe All-Flash Reference Design node.
The latest version of VDO’s HIOPS Compression has been optimized to take advantage of today’s multi-core, multi-processor, scale-out architectures to deliver maximum performance in enterprise storage. To demonstrate this level of performance, Permabit combined VDO with Red Hat Ceph Storage software and 24 480GB Samsung PM953 U.2 NVMe PCIe SSDs (solid state drives) running on the Samsung NVMe Reference Design platform. Samsung Electronics is one of the first companies to offer U.2 Gen 3 X4 NVMe PCIe SSDs. The PM953 that was used in the testing also features nine watts TDP (Total Dissipated Power) and a Z-height of 7mm.
The resulting reference architecture delivered single-node performance of over 8 GB/s read and 3.6GB/s write performance under workloads generated by Ceph RADOS bench. These results are more than twice as fast as published compression performance numbers by proprietary single node storage arrays and were achieved without the use of hardware acceleration boards.
Today’s data center managers are increasingly turning to architectures built around Software-Defined Storage (SDS) to provide highly scalable solutions that control costs. SDS solutions (such as Red Hat Ceph Storage and Red Hat Gluster Storage) must be able to handle enterprise workloads such as databases, virtual servers and virtual desktops as well as, or better than, the proprietary systems that they are meant to replace. While data compression greatly reduces storage costs, one challenge up until now has been finding a compression approach that could run at high-end enterprise speeds on standard hardware in an open Linux environment. HIOPS compression technology, incorporated into VDO, addresses all of these requirements because it serves as a core service of the OS. Any SDS solution that runs on that OS can then scale out to support petabyte-sized deployments.
“Previous systems relied on proprietary hardware acceleration based on ASICs or FPGAs to deliver a similar level of performance. Permabit Labs has demonstrated for the first time that HIOPS compression can be achieved with industry-standard processors and platforms,” said Louis Imershein, VP Product for Permabit Technology Corporation. “We’re looking forward to also leveraging the full multi-node, scale-out capabilities of the Red Hat Ceph storage platform as we test further in 2017.”
Today Permabit announced that our Virtual Data Optimizer (VDO), using HIOPS Compression, achieved 8 GB/s throughput when tested on the Samsung NVMe Reference Design platform from Samsung Electronics, Ltd. This is astounding performance from a single node Intel server platform, particularly one that has no custom hardware acceleration feature for compression. As a follow on, I wanted to provide some background on why we wanted to collect these numbers, how we…
As the end of the year approaches, I’ve been thinking about trends we are seeing today and how they will impact data center purchase decisions and the storage industry in 2017. I’ve been following three industry shifts this year and I believe they will have major impact in 2017. Cost of Capacity over Performance For decades, data center managers have focused on the need for speed. As a result storage…
The 2017 Computer Weekly/TechTarget IT Priorities poll suggests the next 12 months will see enterprise IT buyers move to increase the hybrid-readiness of their datacentre facilities.
Connecting on-premise datacentre assets to public cloud resources will be a top investment priority for UK and European IT decision makers in 2017, research suggests.
According to the findings of the 2017 Computer Weekly/TechTarget IT Priorities survey, readying their on-premise infrastructure for hybrid cloud has been voted the number one datacentre investment priority by IT decision makers across the continent.