There’s no denying it. Right now, there’s a plethora of newly established and emerging analytical databases aimed at enabling organizations to capture – and, hopefully, act on – ever-increasing amounts of data.
Getting to know Greenplum
The latest update of Yellowfin’s Business Intelligence (BI) software – Yellowfin 6.1 – was released in May this year, and included support for a number of new database types aimed at helping our customers pursue their Big Data analytics goals. Greenplum was one of those newly supported databases.
Greenplum – a massively parallel processing (MPP) database based on the open source database PostgreSQL – began life as Metapa back in 2000. Scott Yara and Luke Lonergan officially founded Greenplum in June 2003, and the first Greenplum Database was released in January 2005. In July 2010, Greenplum was acquired by EMC – which describes itself as an enabler that empowers organizations to deliver information technology as a service (ITaaS): “Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect and analyze their most valuable asset — information — in a more agile, trusted and cost-efficient way.”
The deal was purportedly worth well over $300 million at the time of the agreement. So they’re clearly enterprising – and we like that. But why was EMC so quick to pounce on Greenplum, and lay out such a large sum of money for a technology company still in its relative infancy?
Why was EMC so quick to pounce?
In an official statement back in 2010, Pat Gelsinger, President and Chief Operating Officer of EMC Information Infrastructure Products, said: “The data warehousing world is about to change. Greenplum’s massively-parallel, scale-out architecture, along with its self-service consumption model, has enabled it to separate itself from the incumbent players and emerge as the leader in this industry shift toward ‘Big Data’ analytics. Greenplum’s market-leading technology provides customers, today, with a best-of-breed solution for tomorrow’s ‘Big-Data’ challenges.”
Essentially, EMC showed no hesitation in putting up substantial funds for Greenplum because it was keenly aware of the increasing interest, willingness and ability of many organizations to collect and (attempt to) interpret growing amounts of data and numbers of data types.
A 2012 report by IDC – a prominent research, analysis and advisory firm – confirmed that EMC had accurately predicted the uptick in demand for analytics technologies capable of analyzing larger data volumes. IDC reported that the worldwide business analytics market grew 14.1 percent in 2011, “driven by the attention-grabbing headlines for Big Data and more than three decades of evolutionary and revolutionary developments in technology”.
According to IDC, annual enterprise investment in hardware, software, cloud services and IT staff has increased by 50 percent to $4 trillion since 2005 (as of 2011).
IDC attested to this growing demand in its fifth annual IDC Digital Universe Study (ironically sponsored in part by EMC). The study, released in late 2011, predicted that the volume of worldwide data is set to grow 50-fold by 2020. The study also concluded that current technologies and numbers of IT professionals would struggle to manage the rapid data growth.
“As an industry, we’ve done a tremendous job at lowering the cost of storing data. As a result, people and companies store more data,” said David Reinsel, IDC’s vice president of storage and semiconductor research, in an official statement relating to the 2011 report.
Reinsel cautioned that the real challenge, and opportunity, lies in the ability of enterprise technology companies to enable organizations to not just store more information, but also extract better value from those ever-growing data stores via Big Data analytics.
“This is where real opportunities lie, and where some folks may miss the boat. As soon as Big Data success stories are advertised and people see that there is gold in their data… then you will find more companies desiring to put more data online,” he said.
Reinsel was quick to reemphasize the fact that if organizations simply continued to capture and store more data, but were unable to leverage actionable insights from it, they would realize meager ROI on their data initiatives. So how do Greenplum’s technological components enable organizations to utilize Big Data?
Greenplum: Let’s get technical
EMC’s Gelsinger references – and alludes to – a number of underpinning technologies that have given Greenplum an edge in the analytical database marketplace. So what are these advantageous technologies? Well, they include:
- Its MPP architecture, which delivers fast loading and query performance, enabling data science teams to unlock value from Big Data stores at petabyte scale. Data loading and queries are automatically parallelized: all data is automatically partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion. Data is distributed across multiple segment servers, with each segment owning and managing a separate share of the data set – there’s no unnecessary overlap.
- Its Polymorphic Data Storage, with multi-storage and SSD support, which allows customers to combine multiple storage technologies to strike a suitable balance between performance and cost.
- And its multi-level partitioning with dynamic partition elimination, which skips irrelevant partitions in a table to reduce the amount of data scanned per query, yielding faster query response times.
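The two ideas behind that feature list – hash-distributing rows across segments so each owns a disjoint share, and skipping partitions a query’s predicate rules out – can be sketched in a few lines. This is a minimal, hypothetical Python simulation, not Greenplum code; the names (`NUM_SEGMENTS`, `segment_for`, `query_total`) and the toy monthly partitioning scheme are purely illustrative.

```python
from collections import defaultdict

NUM_SEGMENTS = 4

def segment_for(distribution_key):
    # Rows are assigned to a segment by hashing a distribution key,
    # so each segment owns a disjoint share of the data (no overlap).
    return hash(distribution_key) % NUM_SEGMENTS

# Toy table partitioned by month; every segment holds its own slice
# of each partition.
segments = [defaultdict(list) for _ in range(NUM_SEGMENTS)]

rows = [
    {"id": 1, "month": "2012-01", "amount": 10},
    {"id": 2, "month": "2012-02", "amount": 20},
    {"id": 3, "month": "2012-02", "amount": 5},
    {"id": 4, "month": "2012-03", "amount": 7},
]

for row in rows:
    segments[segment_for(row["id"])][row["month"]].append(row)

def query_total(month):
    # Dynamic partition elimination: only the partition matching the
    # predicate is scanned; every other partition is skipped entirely.
    # In a real MPP system each segment would scan its share in
    # parallel; here they are visited sequentially.
    return sum(
        row["amount"]
        for seg in segments
        for row in seg.get(month, [])
    )

print(query_total("2012-02"))  # → 25: only the 2012-02 partition is read
```

The point of the sketch is the shape of the work, not the mechanics: the query coordinator never touches the 2012-01 or 2012-03 partitions, and no row lives on more than one segment.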
In fact, Greenplum claims to offer its users up to 100 times the performance of traditional RDBMS products. Not bad, right? So the technology’s impressive too.
But there’s more to Greenplum than its enterprising approach and technical prowess. Its attitude is also impressive and noteworthy.
Greenplum: A database specialist that understands the Big Data needs of the modern organization
EMC Greenplum’s Mark Burnard encapsulated Greenplum’s approach to Big Data in a candid interview with Analyst First’s Stephen Samild in February this year. You really get the sense that Greenplum understands the unique requirements of Big Data analytics.
The full interview can be found – and should be read – HERE >
“I think now the game’s changed, and a six month turnaround time for a BI project is too slow. Marketing teams want to be able to spin off new products, sometimes one or two a month. Telcos, for example, are spinning off new plans, new product bundles, new marketing messages and solutions that require a lot of agility in the billing system and the provisioning system. Their core systems have to be a lot more agile and the data warehouse has to keep up with that.
“They have to have a way of managing information that caters for ad-hoc analysis, for marketing guys running quick, nimble little models to come up with some market segment so they can generate a campaign to a particular demographic.
“The pace of change and the complexity of doing business has changed, and the classic model that we’ve had for data warehousing has not significantly changed.
“The problem now is that we’ve taken that disciplined model – which is fantastic for reporting on financial numbers to regulators – and we’ve extended it into the domains of HR, and marketing, and across the entire enterprise data model. We wanted to fit the entire enterprise into the data warehouse. What I think we’re seeing already happen is that the locus of subjects that fit into the traditional data warehouse is shrinking to include only those where that level of rigor is required for reporting and analysis – areas such as finance, risk, and other things that go to regulators. All the other stuff, which is the other 95 percent of the business, will probably end up in a much more flexible, dynamic platform which we’re starting to call an analytical warehouse or analytic warehouse.”
Yellowfin and Greenplum: Making Big Data analytics easy.