How Google's Technology could possibly Change SAP BI
- dkaps's blog
Posted on August 9th, 2006
Query execution efficiency is a major concern in organizations with large amounts of data! SAP has come out with a Business Intelligence Accelerator (BIA) tool that does a good job, but Google's technique and architecture are still the best!
One of the main pain points for SAP NetWeaver BI (SAP BI) has been its data access performance, due to its sophisticated relational OLAP model. That model works well for small to mid-sized business intelligence operations. Now, with the BI accelerator, large SAP BI implementations will enjoy the super-high data access speed that is necessary to implement 'real-time' embedded analytics for intelligent business solutions, a key to future business solutions based on SAP Enterprise Services Architecture (ESA).
Dan McWeeney (https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/4060) writes about using BIA and Google's architecture and how they could change query performance!
What happens today when a user executes a query? (Let's assume no aggregates, because as we all know the BIA does away with aggregates):
- BW reads the query definition and presents some user interface to the user.
- The user types some stuff in the boxes and hits execute.
- The OLAP processor looks at the query definition and splits the query into parts.
- It then executes a select statement on the database to pull all the records it needs to satisfy the query.
- It then summarizes those records up to the exact table the user asked for.
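To make the select and summarize steps concrete, here is a minimal Python sketch. It is not how BW actually implements them: the `sales_fact` table, its columns, and the `run_query` helper are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3
from collections import defaultdict


def demo_connection():
    """Build a tiny in-memory fact table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (region TEXT, year INTEGER, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [("EMEA", 2006, 100.0), ("EMEA", 2006, 50.0),
         ("APAC", 2006, 70.0), ("EMEA", 2005, 30.0)],
    )
    return conn


def run_query(conn, filters, group_by, key_figure):
    """Select the granular records, then summarize them on the requested key."""
    # Select step: one statement pulls every record the query needs.
    # Column names are interpolated only because this is a toy example;
    # the filter values themselves are passed as bound parameters.
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = f"SELECT {group_by}, {key_figure} FROM sales_fact WHERE {where}"
    rows = conn.execute(sql, list(filters.values())).fetchall()

    # Summarize step: roll the records up to the table the user asked for.
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


result = run_query(demo_connection(), {"year": 2006}, "region", "revenue")
```

Everything here runs on one machine against one database, which is exactly the part that gets slow when the fact table is huge.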
How can Google change this?
First they can take steps 3 through 5 and make use of the GFS and Map/Reduce to remove this from the BI server altogether. (Sounds familiar, doesn't it? BIA, anyone?) The beauty of the Google model is the fault tolerance and the ability to easily take large tasks and fracture them into little parts. The way this would be distributed would probably work something like this (keeping parts 1 and 2, for now):
- A client program constructs a map function that, based on the metadata for the GFS cluster the data resides in, determines what needs to be read and where.
- The reduce function handles all the calculations for the query execution, summarizing the records on the key the user requested (this could be a complex calculation, not just limited to the Default Aggregation allowed in BW).
- This map/reduce is sent out to a massive cluster of machines that in turn read their chunks of data and then begin to perform the reduce function.
- The client program gets each one of the chunks, performs a final reduce, and passes back the exact solution set for the user's query.
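The flow above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a thread pool stands in for the cluster, the `CHUNKS` list stands in for data stored in GFS, and every name (`make_map`, `reduce_fn`, `process_chunk`, `run_query`) is made up for the example rather than taken from Google's or SAP's software.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunks of granular records, standing in for data on a cluster.
CHUNKS = [
    [{"region": "EMEA", "year": 2006, "revenue": 100.0},
     {"region": "APAC", "year": 2006, "revenue": 70.0}],
    [{"region": "EMEA", "year": 2006, "revenue": 50.0},
     {"region": "EMEA", "year": 2005, "revenue": 30.0}],
]


def make_map(filters, group_by, key_figure):
    """Client side: build a map function from the query definition."""
    def map_fn(record):
        # Emit (key, value) only for records the query actually needs.
        if all(record[col] == val for col, val in filters.items()):
            yield record[group_by], record[key_figure]
    return map_fn


def reduce_fn(values):
    """Summarize all values for one key; a plain sum in this sketch."""
    return sum(values)


def process_chunk(chunk, map_fn):
    """Worker side: map over the local chunk, then reduce it locally."""
    grouped = defaultdict(list)
    for record in chunk:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(vals) for key, vals in grouped.items()}


def run_query(chunks, filters, group_by, key_figure):
    map_fn = make_map(filters, group_by, key_figure)
    # "Send out" the work: each chunk is processed independently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: process_chunk(c, map_fn), chunks))
    # Final reduce on the client: merge the per-chunk results by key.
    merged = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            merged[key].append(value)
    return {key: reduce_fn(vals) for key, vals in merged.items()}


result = run_query(CHUNKS, {"year": 2006}, "region", "revenue")
```

Reusing `reduce_fn` for both the per-chunk reduce and the final reduce only works because a sum can be applied to partial sums. A calculation like an average would need the chunks to return something mergeable, such as a (total, count) pair.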
This system would be highly fault tolerant, totally scalable, and would more than likely blow the doors off anything we have today. On top of that, the ways the data can now be summarized are endless, as the summary would be generated as a function to be called for each one of the smaller data sets and not restricted by database operations. There would be no need for aggregates, as the granular data can be searched so fast by the "swarm" of machines. In reality this is similar to what the BIA does, just on a much smaller scale: it's using a handful of blades all linked into some proprietary piece of hardware, while Google does it with dirt cheap off the shelf components and on a much larger scale.
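As a rough illustration of summarizing with an arbitrary function instead of a database operation, the sketch below computes an average and a maximum in one pass over the chunks. The function names and the numbers are invented for the example, and it assumes every chunk holds at least one value.

```python
from functools import reduce


def partial_stats(values):
    """Runs per chunk: boil the chunk down to a small, mergeable summary."""
    values = list(values)
    return (sum(values), len(values), max(values))  # (total, count, maximum)


def merge_stats(a, b):
    """Combine two summaries; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]))


def finalize(stats):
    """Runs once on the client, after the last merge."""
    total, count, maximum = stats
    return {"average": total / count, "max": maximum}


# Hypothetical revenue values, split across three chunks.
chunks = [[100.0, 50.0], [70.0], [30.0, 50.0]]
result = finalize(reduce(merge_stats, (partial_stats(c) for c in chunks)))
```

Each machine only ships back a three-number summary, so the final step stays cheap no matter how many granular records sit behind it.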
This approach allows much greater flexibility in terms of analysis, and at some point in the future will allow the user to just ask for what they want instead of having to navigate around this query thingy, and that is probably the "Holy Grail" of BI.
