Introducing Nearline 2.0
In today’s post, I want to introduce the notion of “Nearline 2.0”. While the name might seem esoteric, this concept represents the logical evolution of older data warehouse and information lifecycle approaches that have struggled to maintain acceptable performance levels in the face of the increasingly intense “data tsunami” that looms over today’s business world. Whereas older archiving solutions based their viability on the declining prices of hardware and storage, and rigid “Nearline 1.0” solutions were primarily designed to work with transactional systems, Nearline 2.0 embraces the dynamism of a software and services approach to fully leverage the potential of large enterprise data architectures.
Looking back, we can now see that the older data management solutions presented a paradox: in order to mitigate performance issues and meet Service Level Agreements (SLAs) with users, they actually prevented or limited ad-hoc access to data. On the basis of system monitoring and usage statistics, this inaccessible data was then declared to be unused, and that was cited as justification for locking it away entirely. In effect, users were told: “Since you can’t get at it, you can’t use it, and therefore we’re not going to give it to you!”
Nearline 2.0, by contrast, allows historical data to be accessed with near-online speeds, empowering business analysts to measure and perfect key business initiatives through analysis of actual historical details. In other words, Nearline 2.0 gives you all the data you want, when and how you want it. (And without impacting the performance of existing warehouse reporting systems!)
Aside from the obvious economic and environmental benefits of this software-centric approach and the associated best practices, the value of Nearline 2.0 can be assessed in terms of the core proposition cited by Tim O’Reilly when he coined the term “Web 2.0”:
“The value of the software is proportional to the scale and dynamism of the data it helps to manage.”
In this regard, Nearline 2.0 provides a number of important advantages over prior methodologies:
Keeps data accessible: Nearline 2.0 enables optimal performance from the online database while keeping all data easily accessible. This massively reduces the work required to identify, access and restore archived data, while minimizing the performance hit involved in doing so in a production environment.
Keeps the online database “lean”: Because Nearline 2.0 data can still be easily accessed by users at near-online speeds, it allows for much more recent data to be moved out of the online system than would be possible with archiving. This results in far better online system performance and greater flexibility to further support user requirements without performance trade-offs.
Relieves data management stress: Data can be moved to Nearline 2.0 without the substantial ongoing analysis of user access patterns that archiving products usually require. The process is typically driven by a rule as simple as “move all data older than x months from the ten largest tables” (a sketch of such a rule appears after this list).
Mitigates administrative risk: Unlike archived data, Nearline 2.0 data requires little or no additional ongoing administration, and no additional administrative intervention is required to access it.
Lets analysts be analysts: With Nearline 2.0, far less time is spent accessing and “cleansing” key data, so much more time can be spent running “what if” scenarios before recommending a course of action for the company. This improves not only the productivity but also the quality of work of key business analysts and statistical gurus.
Copes with data structure changes: Nearline 2.0 can easily deal with data model changes, making it possible to query data structured according to an older model alongside current data. With archive data, this would require considerable administrative work.
Leverages existing storage environments: Compared to older archiving products and strategies, the high degree of compression offered by Nearline 2.0 greatly increases the amount of information that can be stored, as well as the speed at which it can be accessed.
Keeps data private and secure: Nearline 2.0 offers optional privacy and security packages that prevent sensitive information (for example, names, social security numbers, and credit card data) from being exposed to business analysts running ad-hoc queries (see the masking sketch below).
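To make the movement rule above concrete: Nearline 2.0 is described here as an approach rather than a public API, so what follows is only a minimal Python sketch of what an age-based sweep might look like. The database file names, the `event_date` column (assumed to hold ISO-format dates), and the use of SQLite as a stand-in for both the online and nearline stores are all assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta

AGE_MONTHS = 13   # the "x" in "older than x months"
TOP_N = 10        # how many of the largest tables to sweep

# "online.db" and "nearline.db" are hypothetical stand-ins for the
# production warehouse and the compressed nearline store.
online = sqlite3.connect("online.db")
online.execute("ATTACH DATABASE 'nearline.db' AS nl")

# Rank tables by row count as a simple proxy for physical size statistics.
tables = [r[0] for r in online.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
tables.sort(key=lambda t: online.execute(
    f"SELECT COUNT(*) FROM {t}").fetchone()[0], reverse=True)

cutoff = (date.today() - timedelta(days=30 * AGE_MONTHS)).isoformat()

for t in tables[:TOP_N]:
    # Clone the schema on first use, copy the old rows to the nearline
    # store, then trim them from the online database.
    online.execute(f"CREATE TABLE IF NOT EXISTS nl.{t} AS SELECT * FROM {t} WHERE 0")
    online.execute(f"INSERT INTO nl.{t} SELECT * FROM {t} WHERE event_date < ?", (cutoff,))
    online.execute(f"DELETE FROM {t} WHERE event_date < ?", (cutoff,))

online.commit()
```

The point of the sketch is how little policy is involved: one age threshold and one table ranking, with no per-user access-pattern analysis.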
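The privacy packages mentioned in the last item can be pictured the same way. This is again a hypothetical sketch, not a product API: it tokenizes sensitive columns with a one-way hash so that ad-hoc queries can still join and count rows without ever seeing the raw values.

```python
import hashlib

# Hypothetical set of columns the privacy package is configured to protect.
SENSITIVE = {"customer_name", "ssn", "credit_card"}

def mask(value: str) -> str:
    """Replace a sensitive value with a stable one-way token, so identical
    values still match across rows but the original is never exposed."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def masked_rows(rows, columns):
    """Yield each row with its sensitive columns tokenized."""
    for row in rows:
        yield tuple(mask(v) if c in SENSITIVE else v
                    for c, v in zip(columns, row))

# What an ad-hoc analyst would see instead of the raw record: the name and
# SSN become opaque tokens, while the measure stays fully analyzable.
cols = ("customer_name", "ssn", "order_total")
for row in masked_rows([("Jane Doe", "123-45-6789", 99.50)], cols):
    print(row)
```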
In short, Nearline 2.0 offers a significant advantage over older Nearline 1.0 and archiving technologies. When data needs to be removed from the online database to improve performance, but must remain readily accessible to users for long-term analyses or for rebuilding aggregates/KPIs/InfoCubes for period-over-period analysis, Nearline 2.0 is currently the only workable solution available.
In my next post, I’ll discuss more specifically how implementing a Nearline 2.0 solution can benefit both your data warehouse and your business.
Richard Grondin