
Data Offloading - A Growing Trend

Author : Rajesh Krishnamoorthy, Head of IT
Friday, September 16, 2016

Uncertain economic conditions around the world, as well as changing customer behavior and the ready availability of information, have brought about a sea change in how data, and especially analytics (prescriptive and predictive), is looked at. Gone are the days when the whole purpose of building a data warehouse was to provide department heads and C-level executives with reports comparing performance against a fixed set of KPIs.

In today's world, any data store is looked on as a key weapon in the fight for survival and growth. When data engineering takes this perspective, it is a golden opportunity for product vendors to sell more licenses and for engineering to acquire more cores. Yet, at some point, the CFO will step in to question the ROI and ask how the spend has been utilized. Most organizations are far from answering this question to any degree of satisfaction, beyond a firm belief that incoming data should not be discarded.

In this conflict between the financial controllers and the IT/business teams, the usual result is a delay in acquiring new licenses and infrastructure. The impact is felt by on-the-ground teams as slowing ETL (Extract, Transform, Load) jobs, slow-running reports, stale or partially updated cubes, and the fraying nerves of the NOC (Network Operations Center) team in the face of increasing demands vis-à-vis inadequate architecture.

To provide at least a partial remedy, two technologies have matured: Cloud and Big Data.

Cloud: The cloud has proved to be the game-changer in the application and innovation space, reducing the cost and effort of entry while maturing to the point of being useful at an enterprise level. Cloud providers like Amazon Web Services and Azure have changed the way IT teams look at acquiring storage and processing, offering a pay-as-you-go model versus an up-front license model. Security is treated as a first-class citizen, with VPNs and AD (Active Directory) integrations provided out of the box. From a DW/BI perspective, one has only to look at AWS's Redshift and EMR to see how quickly and easily a pay-as-you-go RDBMS-based warehouse, a Big Data-based warehouse, or a combination can be set up.
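
As a minimal sketch of this pay-as-you-go provisioning, assuming the AWS boto3 Python SDK and entirely placeholder identifiers and credentials, a Redshift cluster can be created with a handful of calls:

    # Minimal sketch: provision a pay-as-you-go Redshift cluster with boto3.
    # The cluster identifier, database name, and credentials are placeholders.
    import boto3

    redshift = boto3.client('redshift', region_name='us-east-1')

    redshift.create_cluster(
        ClusterIdentifier='dw-offload-poc',   # hypothetical cluster name
        NodeType='dc1.large',
        ClusterType='single-node',
        MasterUsername='dwadmin',
        MasterUserPassword='REPLACE_ME',      # supply a real secret in practice
        DBName='warehouse',
    )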

Big Data: 2015 was the year when the big data distributions finally stepped up and provided a suite of features that could co-exist with enterprise systems at extremely low storage and processing costs. Reliable SQL on Hadoop (Apache Drill, Spark SQL), support for a variety of programming languages, and a huge variety of connectors to front-end tools have made this a must-have option for BI infrastructure teams.
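
To illustrate SQL on Hadoop, here is a brief sketch assuming Spark 2.0's PySpark API; the Parquet path, view name, and columns are hypothetical:

    # Sketch: querying offloaded data on Hadoop with Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('offload-query').getOrCreate()

    # Register offloaded sales history (stored as Parquet on HDFS) as a view.
    spark.read.parquet('hdfs:///warehouse/offloaded/sales_history') \
         .createOrReplaceTempView('sales_history')

    # Standard SQL runs directly against the low-cost storage tier.
    spark.sql("""
        SELECT region, SUM(amount) AS total_sales
        FROM sales_history
        GROUP BY region
    """).show()
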
The third aspect is the integrators. Most data integration tools, such as Pentaho, Informatica, and DataStage, support cloud databases and big data distributions like Cloudera, EMR, Hortonworks and others out of the box, with the capability to run MapReduce jobs on these systems.
The combination of Cloud, Big Data, and integration decouples cost from data size to a large extent, allowing teams to play with the various permutations of cost vs. time vs. flexibility to arrive at the optimum for the organization.

Data Offloading - Why?
With that said, data offloading is simply the practice of moving parts of the data warehouse to cheaper storage (most likely a Cloud DB or Big Data destination). What are the benefits of this approach to minimizing license costs? Why not move the entire data warehouse to the cloud? What parts can be moved?
While the idea of moving entirely to the cloud is very attractive, there are sunk costs in the current infrastructure that should continue to be utilized to increase data ROI. The goal is to have the flexibility to control costs, not merely to lock oneself in with a new vendor.
The other key issue with a full migration is re-establishing all the application sources and sinks, and retraining all current users.
Additionally, a full migration means bearing the cost of running the current infrastructure while also planning and implementing a similarly sized one in the cloud, significantly increasing spend without increasing utility.

Strategy

To ensure a smooth offloading experience, one must understand the data warehouse and its usage in its entirety. The usual approaches to this analysis are top-down (go through the usage and identify the data targets) and bottom-up (understand the data and see how frequently it's used in reports, whether raw or computed).
The next step is to identify shards of data that can be offloaded. To start with, these would be unused or least-used data, but in time one can conceivably store augmented data in a hybrid database based on the business demands of the enterprise.
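
One rough bottom-up heuristic for finding such shards, sketched below with entirely hypothetical usage figures that would in practice come from the warehouse's query or audit log, is to rank tables by how recently and how often they are read:

    # Sketch: rank warehouse tables as offload candidates by usage.
    # The sample data is made up; real figures come from the query/audit log.
    usage = [
        # (table_name, days_since_last_read, reads_in_last_90_days)
        ('sales_2012',      210,   1),
        ('sales_2016',        0, 842),
        ('customer_dim',      1, 310),
        ('web_clicks_raw',   95,   4),
    ]

    # Cold data: not read recently and rarely read at all.
    candidates = sorted(
        (t for t in usage if t[1] > 90 and t[2] < 10),
        key=lambda t: (-t[1], t[2]),
    )

    for name, idle_days, reads in candidates:
        print('offload candidate: %s (idle %d days, %d reads)'
              % (name, idle_days, reads))
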
After this, ETL jobs are built to create an initial load and then incremental updates of the offloaded data, while purging it from the original data warehouse.
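
The pattern is easier to see in code. In this sketch, sqlite3 stands in for both the EDW and the cheap target store, and the retention watermark is a hypothetical cutoff date; a real job would connect to the actual warehouse and destination:

    # Sketch of the offload ETL pattern: move rows past a retention
    # watermark to cheap storage, then purge them from the EDW.
    import sqlite3

    edw = sqlite3.connect(':memory:')      # stand-in for the EDW
    target = sqlite3.connect(':memory:')   # stand-in for cloud/Hadoop store

    edw.execute('CREATE TABLE sales (id INTEGER, sale_date TEXT, amount REAL)')
    edw.executemany('INSERT INTO sales VALUES (?, ?, ?)',
                    [(1, '2013-04-01', 10.0), (2, '2016-07-01', 25.0)])
    target.execute(
        'CREATE TABLE sales_archive (id INTEGER, sale_date TEXT, amount REAL)')

    WATERMARK = '2015-01-01'  # hypothetical retention boundary

    # Initial load: copy everything older than the watermark.
    rows = edw.execute(
        'SELECT id, sale_date, amount FROM sales WHERE sale_date < ?',
        (WATERMARK,)).fetchall()
    target.executemany('INSERT INTO sales_archive VALUES (?, ?, ?)', rows)
    target.commit()

    # Purge the moved rows; re-running with a later watermark
    # yields the incremental update.
    edw.execute('DELETE FROM sales WHERE sale_date < ?', (WATERMARK,))
    edw.commit()

    print('archived rows:',
          target.execute('SELECT COUNT(*) FROM sales_archive').fetchone()[0])
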
The usual way is to begin with a proof of concept and extend it along the offloading journey.

Conclusion

Organizations challenged with overburdened EDWs need cost-effective and reliable solutions that can offload the heavy lifting of ETL processing from the data warehouse to an alternative environment that can handle large data sets. Data offloading is an excellent technique for optimizing spend on data infrastructure while ensuring that existing data analytics endeavors are not disrupted.
