Nearline 2.0 Best Practices

Posted on October 13, 2008, by Richard Grondin

In previous posts, we introduced the concept of Nearline 2.0, showed how it represented a significant step forward from traditional archiving practices, and discussed how Nearline 2.0 could help your business. To recapitulate: the major advantage of Nearline 2.0 is its superior data access performance, which enables a more aggressive approach to migrating data out of the online repository to nearline (a process known as “data nearlining”) than is practical when using a traditional archiving product.

Today, I will consider when an enterprise should implement a Nearline 2.0 solution. Broadly speaking, such implementations fall into two categories: either they offer a “cure” for an existing data management problem, or they represent a proactive implementation of data management best practices within the organization.

Cure or Prevention?

The “cure” type of implementation is typically associated with a data warehouse “rescue” project. Such a project is undertaken when the data warehouse has grown to the point where its size causes major performance problems and jeopardizes its ability to meet Service Level Agreements (SLAs). In these situations, it is mainly the operations division of the organization that is affected, and it typically demands an immediate fix, which can take the form of a Nearline implementation. The key question here is: how quickly can the “cure” implementation stabilize warehouse performance and restore compliance with SLAs?

On the other hand, the best practice approach, much like current advice about healthy living, focuses on prevention rather than cure. Best practices dictate that a Nearline 2.0 implementation should start as soon as some of the data in the data warehouse becomes “infrequently accessed”. Typically, this means data older than 90 days, since granular data older than 90 days is usually accessed only rarely. The main idea is to nearline data as early as possible, so that the data warehouse does not grow for no good business reason. Ultimately, this should protect the enterprise from an operational crisis caused by deteriorating performance and unmet SLAs.
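The 90-day rule can be expressed as a simple age test. The sketch below is a minimal illustration, assuming the warehouse is partitioned by month and that the date of the newest row in each partition is known; the partition names and the `nearline_candidates` helper are hypothetical, and 90 days is the default from the text rather than a universal constant.

```python
from datetime import date, timedelta

# Default threshold from the text; each warehouse should tune it using its own access statistics.
AGE_THRESHOLD_DAYS = 90


def nearline_candidates(partitions, today, age_days=AGE_THRESHOLD_DAYS):
    """Return the names of partitions whose newest row is older than the threshold.

    `partitions` is a list of (name, newest_row_date) pairs (a hypothetical layout).
    """
    cutoff = today - timedelta(days=age_days)
    return [name for name, newest in partitions if newest < cutoff]


# Hypothetical monthly partitions of a sales fact table.
partitions = [
    ("sales_2008_06", date(2008, 6, 30)),
    ("sales_2008_07", date(2008, 7, 31)),
    ("sales_2008_08", date(2008, 8, 31)),
]

# The cutoff for 2008-10-13 is 2008-07-15, so only the June partition qualifies.
print(nearline_candidates(partitions, today=date(2008, 10, 13)))  # ['sales_2008_06']
```

Testing the newest row of each partition means a partition becomes a candidate only once all of its rows have crossed the threshold; a row-level rule would apply the same comparison to each row's date instead.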

To better judge the impact of either approach, it is important to understand the steps involved in the “Data Nearlining” process. What do we find when we “dissect” it?

Dissecting the “Data Nearlining” Process

“Data Nearlining” involves multiple processes, whose performance characteristics can significantly influence the speed at which data is migrated out of the online database. The various processes can be grouped into two major steps: data extraction and database housekeeping.

Data Extraction

  • The first step (optional in some cases) is to lock the data that is targeted by the nearlining process, in order to ensure that the data is not modified while the process is going on.
  • Next comes the extraction of the data to be migrated. This is usually achieved via an SQL statement based on business rules for data migration. Often, the extraction can be performed using multiple extraction/consumer processes working in parallel.
  • The next step is to secure the newly extracted data, so that it is recoverable.
  • Then, the integrity of the extracted data must be validated (normally by comparing it to its online counterpart).
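The four extraction steps can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. This is a minimal, single-process sketch under several assumptions: a hypothetical `sales` table, a hypothetical `sales_nearline` table in a separate database playing the role of the nearline store, a business rule of “sale date before 2008-07-15” (90 days before this post's date), and a SHA-256 checksum over key-ordered rows as the integrity check. A production tool would rely on its own locking, parallel extractors and recovery mechanisms.

```python
import hashlib
import sqlite3

# Hypothetical business rule: rows sold before this date are nearlined.
CUTOFF = "2008-07-15"
SELECT_SQL = "SELECT id, sale_date, amount FROM {table} WHERE sale_date < ? ORDER BY id"


def checksum(rows):
    """SHA-256 over rows that are already sorted by primary key."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(tuple(row)).encode("utf-8"))
    return h.hexdigest()


def extract_to_nearline(online, nearline, cutoff):
    """Lock, extract, secure and validate; returns the number of rows copied.

    `online` must be opened with isolation_level=None so that the explicit
    BEGIN IMMEDIATE below controls the transaction.
    """
    # 1. Lock: BEGIN IMMEDIATE takes SQLite's RESERVED lock, so no other
    #    connection can write to the online database while we work.
    online.execute("BEGIN IMMEDIATE")
    try:
        # 2. Extract with an SQL statement that encodes the business rule.
        rows = online.execute(SELECT_SQL.format(table="sales"), (cutoff,)).fetchall()
        # 3. Secure: commit the copy to the separate nearline store.
        nearline.executemany("INSERT INTO sales_nearline VALUES (?, ?, ?)", rows)
        nearline.commit()
        # 4. Validate: compare the nearline copy with its (still locked) online counterpart.
        copied = nearline.execute(
            SELECT_SQL.format(table="sales_nearline"), (cutoff,)).fetchall()
        if len(copied) != len(rows) or checksum(copied) != checksum(rows):
            raise RuntimeError("nearline copy does not match the online data")
        return len(rows)
    finally:
        # Nothing was written online, so rolling back simply releases the lock.
        online.execute("ROLLBACK")


online = sqlite3.connect(":memory:", isolation_level=None)
online.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)")
online.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                   [(1, "2008-05-01", 10.0), (2, "2008-06-20", 25.5), (3, "2008-09-30", 7.25)])
nearline = sqlite3.connect(":memory:")
nearline.execute("CREATE TABLE sales_nearline (id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)")

moved = extract_to_nearline(online, nearline, CUTOFF)
print(moved)  # 2
```

Holding the lock through validation guarantees that the online rows cannot change between extraction and comparison. The online rows are deliberately left in place: removing them belongs to the housekeeping step. In a real deployment, the extraction query would typically be split by key range across parallel workers, as noted in the list.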

Database Housekeeping

  • Next, delete the online data that has been moved to nearline.
  • Then, reorganize the tablespace of the deleted data.
  • Finally, rebuild/reorganize the indexes associated with the online table from which data has been nearlined.
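The housekeeping steps can be sketched the same way. Again, `sqlite3` is only a stand-in: `VACUUM` is SQLite's closest analogue to reorganizing a tablespace, and `REINDEX` rebuilds a table's indexes; enterprise databases use their own vendor-specific commands. The `housekeep` helper, the table and index names, and the sample rows are hypothetical, and the online database is a file because the purpose of `VACUUM` is to rewrite on-disk storage.

```python
import os
import sqlite3
import tempfile

SCHEMA = "CREATE TABLE {t} (id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"


def housekeep(online, nearline, cutoff):
    """Delete nearlined rows online, then reclaim space and rebuild indexes.

    `online` must be opened with isolation_level=None, because VACUUM
    cannot run inside a transaction.
    """
    # Only rows confirmed present in the nearline store are deleted.
    ids = [(r[0],) for r in nearline.execute(
        "SELECT id FROM sales_nearline WHERE sale_date < ?", (cutoff,))]
    # 1. Delete the nearlined rows in a single transaction.
    online.execute("BEGIN")
    online.executemany("DELETE FROM sales WHERE id = ?", ids)
    online.execute("COMMIT")
    # 2. Reorganize storage: VACUUM rewrites the database file and
    #    releases the pages freed by the delete.
    online.execute("VACUUM")
    # 3. Rebuild every index on the table.
    online.execute("REINDEX sales")
    return len(ids)


path = os.path.join(tempfile.mkdtemp(), "online.db")
online = sqlite3.connect(path, isolation_level=None)
online.execute(SCHEMA.format(t="sales"))
online.execute("CREATE INDEX idx_sales_date ON sales (sale_date)")
rows = [(1, "2008-05-01", 10.0), (2, "2008-06-20", 25.5), (3, "2008-09-30", 7.25)]
online.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Simulate a nearline store that already holds the two oldest rows.
nearline = sqlite3.connect(":memory:")
nearline.execute(SCHEMA.format(t="sales_nearline"))
nearline.executemany("INSERT INTO sales_nearline VALUES (?, ?, ?)", rows[:2])
nearline.commit()

deleted = housekeep(online, nearline, "2008-07-15")
print(deleted)  # 2
print(online.execute("SELECT id FROM sales").fetchall())  # [(3,)]
```

Deleting by the keys found in the nearline store, rather than by re-applying the business rule, ensures that no row is removed online unless a copy exists.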

The Database Housekeeping step is often the slowest part of the Data Nearlining process, and can therefore dictate the pace and scheduling of the implementation. In production environments, housekeeping is frequently decoupled from ongoing operations and performed over a weekend. It may be surprising that deleting data can be more expensive than inserting it, but every deleted row must be located, logged and removed from each index on the table, and the freed space must then be reclaimed. Just ask an enterprise DBA what is involved in deleting 1 TB from an Enterprise Data Warehouse: for many, fitting such a process into standard batch windows would be a nightmare.

Starting Nearline 2.0 early, as a best practice, keeps each nearlining run small: only the data that has crossed the 90-day threshold since the previous run needs to be extracted and deleted, rather than a backlog accumulated over years. This can massively reduce not only the cost of the implementation but also the time required to perform it. The main recommendation to take away from this discussion is therefore: don’t wait too long to embark on your Nearline 2.0 strategy!
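A back-of-the-envelope calculation makes the argument concrete. The figures below are purely hypothetical (a constant ingest of 2 GB per day, weekly nearlining runs, and a warehouse that has gone two years without any nearlining) and serve only to compare the size of a routine run with that of a late “rescue”; the helper names are likewise illustrative.

```python
# All volumes are hypothetical and assume a constant daily ingest rate.
INGEST_GB_PER_DAY = 2
AGE_THRESHOLD_DAYS = 90


def delete_per_run_gb(ingest_gb_per_day, days_between_runs):
    """Steady state: each run deletes the data that aged past the threshold since the last run."""
    return ingest_gb_per_day * days_between_runs


def rescue_backlog_gb(ingest_gb_per_day, warehouse_age_days, age_days=AGE_THRESHOLD_DAYS):
    """First run of a late rescue: everything older than the threshold must be deleted at once."""
    return ingest_gb_per_day * max(0, warehouse_age_days - age_days)


print(delete_per_run_gb(INGEST_GB_PER_DAY, days_between_runs=7))     # 14
print(rescue_backlog_gb(INGEST_GB_PER_DAY, warehouse_age_days=730))  # 1280
```

Under these assumptions, each proactive run deletes 14 GB, whereas the first rescue run must delete 1,280 GB in one go, a delete of the size that is so hard to fit into a batch window.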

That’s it for today. In my next post, I will take up the question of which data should initially be considered as candidates for migration.