Big Data: yet another “game-changer” IT pros must grapple with these days. But not in the usual way.
Companies like Google and Facebook are demonstrating that a solid data management strategy can make a huge difference to a company’s bottom line. Corporations everywhere are paying attention; C-level executives are increasingly using insights gained from analyzing Big Data to make business decisions. As a result, companies are promoting IT from cost center to partner in strategic data management.
The term Big Data refers to the vast amounts of unstructured data that result from people’s interactions with the Internet, social media and mobile apps. It’s the kind of data that doesn’t fit neatly into rows and columns with clear relationships on which simple queries and reports can be based.
More and more, IT managers on the front lines are actively participating in efforts to extract meaning from the Big Data companies collect and store. Therefore, IT managers would do well to learn all they can about Big Data and what can be done to help their company mold a solid data management strategy.
Making sense of Big Data
Examples of Big Data are videos, images, transactions, web pages, email, social media content, click-stream data, search indexes, sensor data, etc. – a wide variety of raw, semi-structured and unstructured data that can’t be processed and analyzed using traditional processes and tools, like relational databases.
But the term Big Data also refers to the volume and velocity of the data generated today. IBM, in its e-book, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, explains it this way: the interconnectivity of people and things via technology generates data continuously; technology makes it possible to collect a massive amount of data; but, most of this data isn’t relational and can’t be processed by traditional database systems. Moreover, much of it needs to be analyzed in real time. According to this definition, Big Data encompasses data at rest and data in motion.
So it’s no wonder that Big Data is so unwieldy. The challenge is to formulate the right questions to extract meaning out of terabytes, even petabytes (and some day zettabytes!) of data, which organizations feel compelled to collect and store even though its value is not always immediately known. For some companies, putting two and two together may be the only thing standing in the way of greatness.
Except making that connection is really hard. It’s expensive and time-consuming to use traditional database tools to analyze Big Data, and it’s not always possible – there might be too much data in too many different formats. Plus, there’s a steep learning curve when it comes to Big Data – new tools require new skills.
For IT managers, this has several implications:
- Traditional relational databases cannot deal effectively with Big Data, making it necessary to search for alternatives. New processes and tools may be required to manage the Big Data flowing back and forth across the enterprise, and to process and analyze it.
- New storage systems may be needed as well. Data must be collected and stored whether its value is immediately known or not.
- Having a business person submit a written request to IT no longer makes sense when the need for real-time analysis of data in motion is factored in. So, new tools must be acquired or developed in-house for end users to get at the information they need on their own.
- Managers may need to add staff or train the current staff to bring the department up to speed on the hardware and software needed to handle Big Data.
Turning Big Data into something useful
For companies to get a handle on Big Data and use it to boost their productivity, IT must collaborate with the business side of the organization to develop a data management strategy that will deliver measurable results.
The components of an effective data management strategy are: 1) storage, 2) security, 3) data reconciliation, 4) information extraction, and 5) insight distribution throughout the organization.
While it may seem unnecessary to collect and store every piece of information ever generated, the Economist Intelligence Unit recently reported that “many industry experts believe that larger data sets are beneficial for comprehensive analysis and that new technologies are speeding up the results more effectively.”
Given that larger data sets can provide more reliable insights, storage is an integral part of a robust data management solution. Big Data storage refers to types of storage that can handle huge volumes of unstructured data. Traditional on-premises disk storage is often poorly suited to Big Data because of its cost, limited scalability and latency. Shared storage in a cloud-based environment is therefore often a better option.
Storing Big Data requires platforms designed specifically to organize the many server racks involved. The leading platforms include:
- NoSQL databases, also called non-relational databases, are increasingly used in place of traditional relational databases for managing this kind of data. Amazon DynamoDB is an example of a hosted, cloud-based NoSQL database service.
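To make the non-relational idea concrete, here is a minimal Python sketch of a schema-flexible table. It is a toy in-memory stand-in, not the API of DynamoDB or any real product; the class name `KeyValueTable`, its methods and the sample items are all hypothetical. The point is only that each item needs a key, while the remaining attributes can differ from item to item.

```python
class KeyValueTable:
    """Toy in-memory stand-in for a NoSQL table (hypothetical, not a real client)."""

    def __init__(self, key_name):
        self.key_name = key_name
        self._items = {}

    def put_item(self, item):
        # Only the key attribute is required; every other attribute is free-form.
        if self.key_name not in item:
            raise ValueError(f"item is missing key attribute {self.key_name!r}")
        self._items[item[self.key_name]] = dict(item)

    def get_item(self, key):
        # Returns None when no item is stored under the key.
        return self._items.get(key)


events = KeyValueTable(key_name="event_id")

# Three items with different shapes live side by side in the same table.
events.put_item({"event_id": "e1", "type": "click", "url": "/pricing"})
events.put_item({"event_id": "e2", "type": "video", "duration_s": 42, "tags": ["demo"]})
events.put_item({"event_id": "e3", "type": "sensor", "reading": {"temp_c": 21.5}})

print(events.get_item("e2"))
```

A relational table would need a column for every attribute up front; here a click, a video and a sensor reading can be stored without agreeing on a shared set of columns first.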
The security architecture needed to secure a company’s Big Data must be considered from the very beginning, with access control and encryption built in. Depending on the information collected, a leak outside the organization could put individuals and the company at risk. Each piece of information should be encrypted as it is captured, and access to it should require the appropriate credentials, both to protect the confidentiality and integrity of the data and to satisfy regulatory requirements.
Once the storage and security architectures are planned out, the next step is to introduce standards, validate and verify the data, and find ways to reduce redundancies. The data from various systems across the enterprise housed in information “silos” must be integrated as well. As the Economist Intelligence Unit report points out, “There are as many uses of data as there are types of data. They can inform strategy, increase efficiency, identify markets and enhance customer experiences. None of these can be accomplished, however, unless the data are clean, accurate and reliable.”
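The reconciliation steps described above (standardize, validate, remove redundancies, integrate silos) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed method: the two silos (`crm_records`, `billing_records`), the field names and the email-based validation rule are all hypothetical.

```python
import re

# Hypothetical records from two separate "silos"; field names are assumptions.
crm_records = [
    {"Email": " Ann@Example.com ", "name": "Ann Lee"},
    {"Email": "bob@example.com", "name": "Bob Ray"},
    {"Email": "not-an-email", "name": "Bad Row"},
]
billing_records = [
    {"email_address": "ann@example.com", "customer": "Ann Lee"},
    {"email_address": "cy@example.com", "customer": "Cy Orr"},
]

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize(record, email_key, name_key, source):
    """Map a silo-specific record onto one shared schema."""
    return {
        "email": record[email_key].strip().lower(),
        "name": record[name_key].strip(),
        "source": source,
    }


def is_valid(record):
    """Keep only records with a plausible email address and a non-empty name."""
    return bool(EMAIL_PATTERN.match(record["email"])) and bool(record["name"])


def reconcile(*batches):
    """Integrate batches, dropping invalid rows and duplicate email addresses."""
    merged = {}
    for batch in batches:
        for record in batch:
            if is_valid(record) and record["email"] not in merged:
                merged[record["email"]] = record
    return list(merged.values())


crm = [standardize(r, "Email", "name", "crm") for r in crm_records]
billing = [standardize(r, "email_address", "customer", "billing") for r in billing_records]
clean = reconcile(crm, billing)

for row in clean:
    print(row["email"], row["name"], row["source"])
```

In this sketch the first silo to supply a given email address wins, which is a simplifying choice; a real strategy would need an explicit rule for resolving conflicts between systems.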
As previously mentioned, the real challenge of Big Data is to extract meaning out of the massive volumes of data collected. Sophisticated business intelligence software is needed to accomplish this task. Big Data tools for analyzing data at rest that CIO.com recently reviewed include:
- Jaspersoft BI Suite
- Pentaho Business Analytics
- Karmasphere Studio and Analyst
- Talend Open Studio
- Skytree Server
- Tableau Desktop and Server
Analysis of Big Data in motion can be performed with the right technology. Special systems are required to handle a constant flow of data, such as that generated by social networking websites and sensors. “Stream computing” refers to a high-performance system that can take in multiple data streams from various sources, process them, and then emit the results as a single stream of data. IBM is a leader in this area.
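The basic shape of stream computing, several input streams merged and processed into one output stream, can be shown with Python generators. This is a toy sketch and says nothing about how IBM's or any other vendor's products are built; the two feeds, the event fields and the flagging rules are made up for illustration.

```python
import heapq


def sensor_stream():
    """Hypothetical sensor feed: (timestamp, source, value) tuples in time order."""
    yield (1, "sensor", 20.5)
    yield (4, "sensor", 21.0)
    yield (6, "sensor", 35.2)


def social_stream():
    """Hypothetical social media feed, also in time order."""
    yield (2, "social", "great product")
    yield (3, "social", "slow delivery")
    yield (5, "social", "love it")


def merge_streams(*streams):
    """Lazily merge several time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda event: event[0])


def process(events):
    """Tag each event as it flows past, without holding the whole stream in memory."""
    for timestamp, source, payload in events:
        if source == "sensor":
            flagged = payload > 30.0  # assumed threshold for an unusual reading
        else:
            flagged = "slow" in payload  # assumed keyword for a complaint
        yield {"t": timestamp, "source": source, "payload": payload, "flagged": flagged}


# Collected into a list here only so the result can be inspected;
# a real consumer would iterate over the generator as events arrive.
output = list(process(merge_streams(sensor_stream(), social_stream())))
for event in output:
    print(event)
```

Because every stage is a generator, each event is handled as it arrives rather than after the data has been stored, which is the distinction the article draws between data in motion and data at rest.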
Insight distribution refers to the need for processes and tools to disseminate the information gleaned from Big Data throughout the organization. Leaders need the information at their disposal in order to make better business decisions. This is where new tools that give end users on-demand, real-time analysis on their own become important.
Successfully Navigating Big Data
Analysis of Big Data has the potential to provide actionable insight that can generate financial windfalls for companies. But if a compelling business case can’t be made to justify the project, it may be doomed from the start, says Jill Dyche, Vice-President of Thought Leadership at DataFlux Corporation, in a recent blog post. Dyche advises companies to think hard about the answers to these five questions when contemplating an investment in Big Data:
- What are the goals of the project and what does the company want Big Data to help it accomplish?
- What current resources can the company build on to develop a comprehensive data management strategy?
- How will the company avoid scope creep?
- What are the criteria for success and how will progress be measured along the way?
- Can the company manage the structural and process changes that will inevitably result?
If the company can answer these questions to its satisfaction, then chances are that developing a solid data management strategy to deal with Big Data is worth the investment.