Real-Time Analytics
with HBase & Kiji
Case Study of Quicklizard.com
Quicklizard
eCommerce Business-Intelligence Platform
We help eCommerce store owners make the right decisions
and react to changes in their store & marketplace
based on real-time and historical information
Measure anything that happens in an eCommerce store
Provide real-time and historical BI on events and trends
in the context of the client's store and marketplace
Answer questions like:
- How many times was a product viewed or purchased?
- Were views and purchases part of a campaign?
- How well is a landing page performing?
- How should you order a list of products to maximize revenue?
- What is the optimal price for a product in comparison to competition?
To do that, we process 100s of thousands of events per day,
both in real time and in batch, using Storm, HBase & Hadoop
The Challenge:
- Handle 100s of thousands of daily events
- Generate real-time BI for incoming data
- Analyze historical data and generate useful insights over time
- Use historical trends to make real-time decisions
Our Solution
- Storm - Real-time event aggregation and decision making
- HBase - Real-time & historical aggregated data store
- Kiji - Data collection & analysis framework for HBase
- Hadoop M/R - Offline data processing and analysis
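The Storm step aggregates events before they reach HBase. The sketch below illustrates one common way to do that: pre-aggregate counter increments in memory and flush them in batches, so the store sees one increment per key rather than one per event. It is plain Python, not Storm's API; `CounterAggregator`, the event fields (`store_id`, `product_id`, `type`, `ts`), the hourly bucket format and `flush_every` are all hypothetical, and a dict stands in for the HBase counter table.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts: float) -> str:
    """Truncate a Unix timestamp (UTC) to an hourly bucket such as '2013061514'."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d%H")


class CounterAggregator:
    """Hypothetical stand-in for a Storm bolt that batches counter increments.

    `store` is any dict-like object standing in for the HBase counter table.
    """

    def __init__(self, store, flush_every=1000):
        self.store = store
        self.flush_every = flush_every
        self.pending = Counter()  # key -> increment not yet written
        self.seen = 0

    def process(self, event):
        # One counter per (store, product, event type, hour).
        key = (event["store_id"], event["product_id"], event["type"],
               hour_bucket(event["ts"]))
        self.pending[key] += 1
        self.seen += 1
        if self.seen % self.flush_every == 0:
            self.flush()

    def flush(self):
        # Apply each key's summed increment once, then start a new batch.
        for key, delta in self.pending.items():
            self.store[key] = self.store.get(key, 0) + delta
        self.pending.clear()
```

In a real topology the flush would likely also be triggered by a timer (for example Storm tick tuples), so counters for quiet keys are still written promptly.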
HBase
The Hadoop Database
We chose HBase because:
- It's extremely robust & scalable
- It can handle huge amounts of data
- Perfect for data aggregation (counters, TTL, row scans)
- Supports Map/Reduce (thanks to Hadoop)
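To make "counters, TTL, row scans" concrete, here is a toy model (not the HBase client API) of a counter table keyed by a hypothetical `store|product|event|hour` row key. Because HBase keeps rows sorted by key, a zero-padded `YYYYMMDDHH` suffix lets a single range scan read every hour of a day for one product, and a TTL makes old counters disappear on read. The key layout and the `SortedCounterTable` class are assumptions for illustration.

```python
import bisect


def row_key(store_id, product_id, event_type, hour):
    """Hypothetical row key layout; `hour` is a zero-padded 'YYYYMMDDHH' string."""
    return f"{store_id}|{product_id}|{event_type}|{hour}"


class SortedCounterTable:
    """Toy model of an HBase table: rows sorted by key, one counter per row,
    and a column-family style TTL applied at read time."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.keys = []  # row keys, kept in sorted order
        self.rows = {}  # row key -> (counter value, last write time)

    def increment(self, key, delta, now):
        if key not in self.rows:
            bisect.insort(self.keys, key)
            self.rows[key] = (0, now)
        value, _ = self.rows[key]
        self.rows[key] = (value + delta, now)

    def scan(self, start, stop, now):
        """Yield (key, value) for start <= key < stop, skipping expired rows."""
        i = bisect.bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i] < stop:
            key = self.keys[i]
            value, written = self.rows[key]
            if now - written < self.ttl:
                yield key, value
            i += 1
```

Putting the time bucket last keeps one product's history contiguous for scans; a key that starts with the timestamp would instead send all current writes to a single region (a hotspot).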
BUT...
We're using HBase with a bit of help from
Kiji
Real-time data collection, aggregation and analysis for HBase
Kiji simplifies
- HBase schema management
- Data collection (keys, TTLs, counters, history)
- Data retrieval
- Data aggregation and analysis (M/R)
In other words, Kiji makes it easier to use HBase as a real-time BI platform
The End Result
Short and long-term data is aggregated
in a scalable, robust data store (HBase)
This data is used for real-time BI and decision-making,
since it's readily available (no Hadoop M/R needed).
Historical data (logs) is analyzed offline (Hadoop M/R)
and is fed back into the real-time data store (HBase).
Historical data can be analyzed and queried in near real-time
(coming soon, thanks to Kiji...)
We can answer questions like:
"How well is my campaign performing now in comparison to last week?"
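A week-over-week question like this can be answered from hourly counters alone, with no M/R job: sum a trailing window and the same window seven days earlier. This is a sketch under stated assumptions: `counts` is a hypothetical mapping from each hour's start time to that hour's count for one campaign (for example, read back from hourly HBase counters), and `now` falls on an hour boundary.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple


def window_total(counts: Dict[datetime, int], end: datetime, hours: int) -> int:
    """Sum the hourly buckets in [end - hours, end), keyed by each hour's start."""
    return sum(counts.get(end - timedelta(hours=h), 0) for h in range(1, hours + 1))


def week_over_week(counts: Dict[datetime, int], now: datetime,
                   hours: int = 24) -> Tuple[int, int, Optional[float]]:
    """Compare a trailing window with the same window one week earlier.

    Returns (current, previous, percent_change); percent_change is None
    when there is no data for last week.
    """
    current = window_total(counts, now, hours)
    previous = window_total(counts, now - timedelta(days=7), hours)
    change = None if previous == 0 else 100.0 * (current - previous) / previous
    return current, previous, change
```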
Or let you say things like:
"Email me if the conversion rate of white washing machines falls below 2%"
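An alert like this reduces to a threshold check on two counters, purchases divided by views, evaluated whenever fresh counts arrive. The rule below is a hypothetical sketch: `ConversionAlert`, its `min_views` guard and the message text are assumptions, and actually sending the email is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversionAlert:
    """Hypothetical rule: notify when purchases / views drops below a threshold."""

    segment: str          # e.g. "white washing machines"
    threshold: float      # 0.02 means 2%
    min_views: int = 100  # assumed guard against alerting on tiny samples

    def check(self, views: int, purchases: int) -> Optional[str]:
        """Return an alert message if the rule fires, otherwise None."""
        if views < self.min_views:
            return None
        rate = purchases / views
        if rate < self.threshold:
            return (f"Conversion rate for {self.segment} is {rate:.2%}, "
                    f"below {self.threshold:.0%}")
        return None
```

The `min_views` guard is a design choice: without it, a single early view with no purchase would read as a 0% conversion rate and fire the alert.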
In Real-Time
and over large amounts of data
Thank You
Questions?