SAND

What to look for when you’re considering column store

Home »  SAND News »  How To »  Steven Green »  What to look for when you’re considering column store

What to look for when you’re considering column store

On May 24, 2011, Posted by , In Steven Green, With No Comments

The first item you need to consider is why choose a column store over the standard row databases such as Oracle or DB2, or even the data partitioning varieties such as Teradata or Netezza.

###1. Do you need a column store?

Generally, row-oriented architectures are geared towards OLTP-type workloads which, as the acronym implies, are optimized for row based transactions. Column stores are optimal for OLAP-like workloads (e.g., data warehouses) which typically involve complex queries over large data volumes (possibly terabytes) by large user populations (possibly tens of thousands). If you have an environment which is more taxed by queries than loads, where the queries involve multiple fields (5, 20, or hundreds) out of tables that may have 1000s of columns, and where the queries may touch millions of records, a column store is ideal, especially if you do not know what the queries are in advance.

The main differentiators can be summed up as:

– Column-oriented systems are more efficient when an aggregate needs to be computed over many rows, for fewer columns than an entire row
– Column-oriented systems are more efficient when new values of a column are supplied for many rows at once, because that column can be updated efficiently without touching any other columns
– Column-oriented systems are more efficient at allowing multiple departments or applications to access the same data at the same time
– Column-oriented systems are more efficient at running ad-hoc queries, drilling down along a trend or mining analysis.
– Column-oriented systems have been shown to be more effective at aggregating the data.
– Row-oriented systems may be more efficient when most of the columns of a single row are required at the same time, and when row-size is relatively small, as the entire row can be retrieved with a single disk seek.
– Row-oriented systems may be more efficient when writing an entire new row if all of the column data is supplied at the same time

###Which column store do you need?

So now you realize you need a column store, which one should you choose? The criteria for a column store brings us back to the needs of a data warehouse: You need to store large volumes of data somewhere where you can have many concurrent users asking questions, which are not known in advance, with rapid response times that provide insight into where the company should be going. These criteria may include, but are not limited to:

– Support for wide tables (thousands of fields)
– No need for additional indexing or tuning
– Low administration, maintenance, or learning curve requirements
– Optimized for unknown queries
– Scalable on the number of concurrent queries
– Concurrent query and load/update capabilities
– Dynamically adjust to increased volumes of data
– Leverage hardware processing power
– Low I/O requirements
– High levels of compression
– Support standard platforms

Of course, once the basic need-to-have capabilities are covered, you can then look for the nice-to-have features such as:

– Personal views of the data
– 24/7 up time
– Instantaneous version control
– Secure with encryption, SSL, LDAP support
– Predictive modeling
– Text search
– Run across multiple platforms simultaneously (Linux, AIX, HP UX, Solaris, Windows)

The end result is finding the best product for the right environment, giving you the ability to get the most out of your valuable information.

Leave a Reply