Fast data: The future of big data
It’s not news that big data is getting bigger by the second. In addition to sheer volume, however, there is increasing demand to act on that data faster than ever. An organization’s leaders want to gain a competitive advantage by turning raw data into actionable intelligence. But how can they quickly and efficiently pull together huge volumes of data from dozens or even hundreds of isolated and disparate data sources? Much of this data is not of the traditional, structured variety; instead, it is being driven by the growth of the Internet of Things (IoT) and the collection of data from digital human interactions. According to IDC, by 2025 there will be 80 billion connected devices, up from fewer than 20 billion today, with over 150,000 new connected devices being added every minute.
To handle the demand for speed and the volume of analytics, organizational leaders are starting down the path toward human/digital interaction and cognitive applications that mine data in order to react to change. The first steps down this road are being taken with the adoption of technologies such as Apache Spark, along with machine learning and deep learning. But computing power is only part of the answer: accessing and managing all of this data can create a significant bottleneck.
An organization’s big data typically resides in tens or hundreds of isolated systems, each associated with different applications or serving different lines of business. Moreover, Hadoop, the most commonly used framework for big data analytics, requires data from other systems to be copied over to the Hadoop Distributed File System (HDFS). This is a time-consuming process during which data can become stale. It is also a waste of resources, since it results in multiple copies of the data: the original plus the HDFS copy.
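To make the cost of that copy step concrete, here is a rough back-of-the-envelope sketch in Python. The dataset size, the transfer bandwidth, and the names `CopyEstimate` and `estimate_hdfs_copy` are hypothetical and chosen only for illustration. The replication factor of 3 is the HDFS default (`dfs.replication`), so the HDFS side may itself hold more than one physical copy.

```python
from dataclasses import dataclass

TB = 10**12  # bytes in one (decimal) terabyte


@dataclass
class CopyEstimate:
    hours_to_copy: float       # wall-clock time to move the data once
    total_bytes_stored: int    # original plus every HDFS replica


def estimate_hdfs_copy(dataset_bytes: int,
                       bandwidth_bytes_per_s: float,
                       hdfs_replication: int = 3) -> CopyEstimate:
    """Estimate the time and storage cost of copying a dataset into HDFS.

    Assumes a single sustained transfer rate and ignores protocol overhead,
    so the result is a lower bound on the copy time.
    """
    seconds = dataset_bytes / bandwidth_bytes_per_s
    # The source system keeps its original; HDFS stores `hdfs_replication` replicas.
    total = dataset_bytes * (1 + hdfs_replication)
    return CopyEstimate(hours_to_copy=seconds / 3600, total_bytes_stored=total)


# Hypothetical inputs: 100 TB dataset over a 10 Gb/s link (1.25 GB/s).
estimate = estimate_hdfs_copy(100 * TB, 1.25e9)
print(f"Copy time: {estimate.hours_to_copy:.1f} hours")
print(f"Total stored: {estimate.total_bytes_stored / TB:.0f} TB")
```

Under these assumed numbers, the copy alone takes roughly 22 hours, and 100 TB of source data occupies 400 TB once the original and three HDFS replicas are counted. Any data that changes at the source during that window is already out of date in HDFS.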
A solution for managing big data
The answer to these data access and management challenges is a high-performance data and file management solution designed to support big data analytics. IBM has announced the IBM All-Flash Elastic Storage Server (ESS) 5.2, whose solid-state storage improves data bandwidth performance by 60 percent over previous solutions.
Incorporating IBM Spectrum Scale, ESS spans an organization’s data lakes, creating one unified data ocean with a single namespace against which to run analytics quickly and efficiently. It supports a wide variety of network protocols and provides the ability to automatically and transparently tier data across flash, disk, tape and cloud. Another important advantage of IBM Spectrum Scale is that it provides direct access for Hadoop to underlying data storage without requiring data to be copied over into an HDFS environment.
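The idea of analyzing data where it lives, rather than copying it first, can be illustrated with a small Python sketch. It does not use Hadoop or any IBM API. It only relies on the fact that IBM Spectrum Scale presents a POSIX file system, so ordinary file I/O works against a mounted namespace. The function name `count_rows_in_place`, the directory layout, and the sample files are hypothetical, and a temporary directory stands in for a real shared mount point.

```python
import csv
import tempfile
from pathlib import Path


def count_rows_in_place(root: Path) -> dict[str, int]:
    """Count data rows in every CSV file under `root`, reading each file
    directly from its current location (no staging copy is made)."""
    counts = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if any
            counts[path.relative_to(root).as_posix()] = sum(1 for _ in reader)
    return counts


# A temporary directory stands in for a shared mount point
# (on a real system this would be wherever the file system is mounted).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sales").mkdir()
    (root / "sales" / "q1.csv").write_text("id,amount\n1,10\n2,20\n")
    (root / "sensors").mkdir()
    (root / "sensors" / "temps.csv").write_text("ts,celsius\n0,21.5\n")
    row_counts = count_rows_in_place(root)

print(row_counts)
```

Because both "lines of business" sit under one namespace in this sketch, a single traversal covers them, which is the property the unified data ocean is meant to provide at much larger scale.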
Rapid changes in the big data analytics ecosystem are being driven by open source and industry-wide improvements. IBM is partnering with Hortonworks, and IBM Spectrum Scale 4.2.3 has been certified with the Hortonworks Data Platform (HDP) 2.6/Ambari 2.5.
Managing multiple frameworks and versions requires advanced workload management. IBM Spectrum Scale software can be deployed with IBM Spectrum Conductor with Spark to provide a unique solution that optimizes performance, eases management and comes complete with Apache Spark.
In summary, the new IBM All-Flash Elastic Storage Server 5.2 expands the existing ESS family to provide industry-leading performance and efficiency in support of faster big data analytics and allows users to:
- Reduce performance bottlenecks on critical IT workloads such as backup.
- Run Hadoop and other big-data applications directly on enterprise storage.
- Share data across applications with unified storage for file and object data.
- Benefit from a high-availability design built for five nines of availability, with faster rebuilds of failed disks through erasure coding with declustered RAID technology, and fully redundant data pathways.
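For readers unfamiliar with the term, "five nines" means 99.999 percent availability. The short Python sketch below converts a number of nines into the downtime it permits per year. The function name `allowed_downtime_minutes` and the use of an average 365.25-day year are assumptions made for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for an availability
    of `nines` nines (for example, 5 means 99.999 percent)."""
    unavailability = 10 ** -nines  # fraction of time the system may be down
    return MINUTES_PER_YEAR * unavailability


# Compare three, four, and five nines.
downtime_by_nines = {n: allowed_downtime_minutes(n) for n in range(3, 6)}
for n, minutes in downtime_by_nines.items():
    print(f"{n} nines: {minutes:.2f} minutes of downtime per year")
```

Five nines works out to a little over five minutes of downtime per year, compared with almost nine hours at three nines.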