Thursday, January 16, 2014

NoSQL, Part 2: Grappling With Big Data

NoSQL databases are generally better-suited to handling Big Data than RDBMSes are. For example, in an RDBMS, data for a given record is spread across many tables, requiring joins and careful coordination of a transaction across them all, said MongoDB's Kelly Stirman. That means transactions must be very sophisticated and be able to address a variety of failure scenarios.

The Pain of Big Data

Social media contribute 90 percent or so of the data available today, while the use of geographical information systems, including location-based data systems, is growing rapidly.
In 2012, about 2.5 quintillion bytes of data were being produced every day. Meanwhile, the size of data sets was increasing, from a comparatively paltry few terabytes to several petabytes.
Businesses want to analyze the data because they yield important customer information. Walmart, for instance, is reported to have exhaustive consumer data on more than 145 million Americans that it shares with more than 50 third parties.
Further, companies are increasingly using predictive analytics to increase customer profitability and reduce customer churn, and that again requires managing Big Data.
However, analyzing such large quantities of data requires improvements in queries, the accuracy of responses to those queries and the speed of those responses.

NoSQL Databases Make Simplicity a Virtue

The relatively simple architecture of NoSQL databases is better-suited than that of RDBMSes to handling Big Data.
In an RDBMS, data for a given record is spread across many tables, requiring joins and careful coordination of a transaction across them all, Kelly Stirman, director of product marketing at MongoDB, told TechNewsWorld. That means transactions must be very sophisticated and be able to address a variety of failure scenarios.
Performing joins and transactions across tables becomes increasingly difficult as an RDBMS scales.
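To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in RDBMS and a plain dictionary as a stand-in document store. The table names, the key "order::100" and the sample data are hypothetical, chosen only for illustration; this is not how MongoDB or any particular product stores data.

```python
import json
import sqlite3

# Relational layout: one customer's order is spread over three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
""")

# Writing one logical record touches every table, so the writes are
# wrapped in a single transaction that commits or rolls back as a unit.
with conn:
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (100, 1)")
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(100, "book", 2), (100, "pen", 5)],
    )

# Reading the record back requires joining all three tables.
rows = conn.execute("""
    SELECT c.name, o.id, i.sku, i.qty
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN order_items i ON i.order_id = o.id
    WHERE c.id = 1
    ORDER BY i.sku
""").fetchall()

# Document layout: the same record is one self-contained document,
# written and read with a single key lookup (toy in-memory store).
order_doc = {
    "customer": {"id": 1, "name": "Ada"},
    "order_id": 100,
    "items": [{"sku": "book", "qty": 2}, {"sku": "pen", "qty": 5}],
}
document_store = {"order::100": json.dumps(order_doc)}
fetched = json.loads(document_store["order::100"])
```

The relational version needs a multi-statement transaction to write and a three-way join to read, while the document version does each with one operation on one key. The trade-off is that the document repeats customer details that a normalized schema would store once.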
"The schema flexibility and distributed nature of most NoSQL databases make them complementary in various types of Big Data environments," Nick Heudecker, a research director at Gartner, told TechNewsWorld.
For example, NoSQL databases using document store technologies such as Couchbase let users address the documents through unique keys that represent each document. Many also offer an application programming interface or query language that lets users retrieve documents based on their contents, while others allow for retrieval using MapReduce.
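The three access styles described above can be sketched in a few lines of Python. This is a toy in-memory store, not the Couchbase API: the function names get, find and map_reduce, the "hotel::N" keys and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory document store keyed by unique document IDs.
documents = {
    "hotel::1": {"type": "hotel", "city": "Nice", "stars": 4},
    "hotel::2": {"type": "hotel", "city": "Paris", "stars": 3},
    "hotel::3": {"type": "hotel", "city": "Nice", "stars": 5},
}

def get(key):
    """Key-value access: fetch one document by its unique key."""
    return documents.get(key)

def find(**criteria):
    """Content-based access: return keys of documents whose fields match."""
    return sorted(
        key for key, doc in documents.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )

def map_reduce(map_fn, reduce_fn):
    """MapReduce-style access: map emits (key, value) pairs, reduce folds each group."""
    groups = defaultdict(list)
    for doc in documents.values():
        for out_key, value in map_fn(doc):
            groups[out_key].append(value)
    return {out_key: reduce_fn(values) for out_key, values in groups.items()}

by_key = get("hotel::2")                      # lookup by unique key
nice_hotels = find(city="Nice")               # lookup by document contents
hotels_per_city = map_reduce(                 # count hotels in each city
    lambda doc: [(doc["city"], 1)], sum
)
```

Key lookup is the cheapest of the three because it needs no knowledge of what a document contains, which is why a document store can also be used as a plain key-value store.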
Global travel and tourism industry player Amadeus is running a pilot using Couchbase as "a very efficient key-value store," Dietmar Fauser, its vice president of Architecture, Quality & Governance divisions for Research and Development, told TechNewsWorld. "The document store aspects of Couchbase will be used in a second step."

Cutting Costs With Commodity Hardware

Another plus for NoSQL databases is that they are developed to run on clusters of commodity hardware, which is inexpensive to source and replace. That also means they are distributed and have no single point of failure.
"NoSQL is about building the next generation of operational databases that have to deal with a large data set that is semi-structured and needs a flexible data schema; a distributed scale-out architecture that provides elasticity and easy scaling; high performance and low latency for billions of users at Internet scale; and an always-on architecture that allows for upgrades and maintenance of a system on-the-fly with no maintenance downtimes," Rahim Yaseen, Couchbase's senior vice president of engineering, told TechNewsWorld.
For example, Couchbase is a scale-out topology with "a true Shared Nothing cluster architecture" so there is no contention for centralized resources, Yaseen explained. To scale up, users just add nodes to the Couchbase cluster.
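One generic way to see why adding nodes to a shared-nothing cluster can be cheap is consistent hashing, sketched below. This is an illustrative technique only, not a description of Couchbase's actual partitioning scheme; the Ring class, the node names and the "user::N" keys are hypothetical.

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring: each node owns the keys that hash just before its points."""

    def __init__(self, nodes, replicas=64):
        self.replicas = replicas      # virtual points per node, to even out the load
        self._points = []             # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        """Scale out: place the new node's virtual points on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._points, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect(self._points, (self._hash(key), ""))
        return self._points[idx % len(self._points)][1]

keys = [f"user::{n}" for n in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")               # grow the cluster by one node
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in moved ends up on the new node, and keys that stay put never change owners, so existing nodes do not have to reshuffle data among themselves when the cluster grows.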

Is ACID Necessary?

Because they are distributed, fault-tolerant and run on clusters of commodity servers, NoSQL databases make trade-offs on ACID -- Atomicity, Consistency, Isolation and Durability -- properties and other issues.
Some NoSQL databases offer ACID while others don't; yet others offer partial ACID support.
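For readers unfamiliar with what an ACID guarantee looks like in practice, here is a minimal sketch of atomicity using Python's built-in sqlite3 module, a single-node relational engine. The accounts table, the transfer function and the balances are hypothetical; the point is only that a failed multi-step update leaves no partial changes behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"   # overdrafts violate this constraint
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

def transfer(src, dst, amount):
    """Move funds in one transaction; any failure undoes both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("alice", "bob", 30)         # succeeds: both updates are committed
rejected = transfer("alice", "bob", 500)  # debit fails, so the credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the rejected transfer the credit to bob is applied first and then undone when the debit fails, which is the all-or-nothing behavior that becomes much harder to provide once the two rows live on different machines.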
Couchbase does not support full ACID because "for modern Internet applications with data and users at Internet scale, and with flexible schema data, it is more important to focus on consistency, durability and atomicity," Yaseen said.
MongoDB "provides strong consistency and guarantees ACID operations at the document level, which tends to be sufficient for most applications," the company's Stirman said.
"With very few exceptions, first-generation NoSQL databases do not support ACID transactions and therefore do not support SQL," Nick Lavezzo, cofounder of FoundationDB, told TechNewsWorld. "The lack of distributed ACID transaction support and, therefore, lack of perfect data consistency has been the biggest barrier holding NoSQL database technologies out of mission-critical tasks at large enterprises."
Some database vendors that do not support ACID transactions "confuse the issue by attempting to redefine ACID to mean some weaker set of guarantees than it has traditionally meant," Lavezzo stated.

Moving Towards Acidity

ACID transactions let FoundationDB support multiple data models on top of a single storage engine architecture, Lavezzo said.
New-generation distributed database technologies such as Google's Spanner/F1 and FoundationDB support both data models typical of NoSQL and true SQL, which requires ACID support, he pointed out.
However, the lack of ACID support will not be easily remedied.
"Having spent four years building FoundationDB -- and it apparently took Google about four years to build Spanner -- we know how hard this problem is," Lavezzo remarked. "Because of this, I don't see any NoSQL technologies adding true ACID transactions any time in the foreseeable future."
