Intelligent Information Management Part 2
In my previous post, I briefly introduced the concept of Intelligent Information Management (IIM). In today’s post, I discuss Information Lifecycle Management (ILM). ILM is one component of IIM best practices, dealing with the management of data from the moment of its creation up to its disposal. Studies have demonstrated that the rate of access for a given data set drops dramatically after 90 days. In fact, some studies claim that less than 30% of the data in an enterprise data warehouse (EDW) is currently being actively accessed by users. However, some organizations are responding to data retention regulations by keeping very rarely accessed data in the data warehouse “just in case”, causing unnecessary database growth and increased TCO. These organizations are essentially using the EDW as a storage device, and a very expensive one indeed!
According to ILM principles, data should be classified based on how it is used, and then assigned the corresponding level of service. This could mean archiving the data or moving it to less expensive storage media very quickly. Layered data storage that offers a range of data access speeds (hierarchical storage management, for example) is one way of implementing different service levels. Another approach involves defining data retention rules that classify data according to usage patterns, and moving it accordingly between different data storage providers (hardware and software) until it is deleted when no longer useful. This overall data management process could be described as “data right-sizing”.
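To make the idea of retention rules more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the tier names, the `RetentionRule` class and the `classify` function are hypothetical, and every threshold other than the 90-day figure mentioned above is an assumption chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical rule: data idle for at most max_idle_days belongs on tier."""
    max_idle_days: int
    tier: str


# Assumed thresholds, ordered from fastest/most expensive to slowest/cheapest.
# Only the 90-day boundary comes from the text; the others are illustrative.
RULES = [
    RetentionRule(max_idle_days=90, tier="online"),
    RetentionRule(max_idle_days=365, tier="nearline"),
    RetentionRule(max_idle_days=7 * 365, tier="archive"),
]

# Data idle for longer than every rule allows is a candidate for deletion.
DELETE = "delete"


def classify(last_accessed: date, today: date) -> str:
    """Return the storage tier for a data set, based on days since last access."""
    idle_days = (today - last_accessed).days
    for rule in RULES:
        if idle_days <= rule.max_idle_days:
            return rule.tier
    return DELETE


if __name__ == "__main__":
    today = date(2024, 1, 1)
    for idle in (10, 200, 1000, 3000):
        print(idle, classify(today - timedelta(days=idle), today))
```

In practice the rules would also have to reflect regulatory retention periods, not just access patterns, which is why they are kept in a separate table here rather than hard-coded into the classification logic.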
Various commercial software packages, such as Solix, HP/RIM, PBS or Optim Solution from IBM, focus on this kind of functionality, but are mainly designed for use with OLTP and ERP systems. Hardware vendors like EMC (Documentum and Centera) and Network Appliance, sometimes in conjunction with software providers such as OpenText, already have a good handle on the problem when it comes to the unstructured and semi-structured data that represents a large portion of the data generated by enterprises. Some RDBMSs have introduced this concept into their data management tools through the use of data partitioning and compression techniques. SAP has focused on the Enterprise Data Warehouse, introducing the concept of Near Line Storage to enable implementation of ILM in SAP NetWeaver BI. Also, niche players like CopperEye have developed solutions to deal with specific types of data produced by the operation of existing systems, such as application logs or system logs.
The main questions that need to be asked about these different approaches are:
- How easy will it be to get access to the data when required?
- How easy will it be to apply data model changes to the active data while continuing to have access to the historical data?
- Is the data secure?
- Can the data be used for an audit?
- Can data access be tracked to show who accessed what, and when?
- Can the overall TCO for enterprise data be reduced?
- Can the performance be improved for active (frequently used) data?
My next post will look at these questions in more depth, and will also describe more advanced techniques for extending Information Lifecycle Management to implement a full Intelligent Information Management approach. Such an approach can not only reduce overall TCO, but also improve the ability of the IT organization to deliver “just-in-time” Business Intelligence.