Based in San Mateo, California, Rakuten Rewards is a shopping rewards company that makes money through affiliate marketing links across the web. In return, members earn reward points and cash back every time they make a purchase through a partner retailer.

Naturally, this generates a lot of user insight data – hundreds of terabytes on active recall, with more in cold storage.


In 2018 the business started to get serious about giving more users access to this insight – without requiring Python or Scala coding chops – while also reducing its capital expenditure on hardware, and started looking to the cloud.

‘SQL server machines don’t scale elegantly’

Formerly known as Ebates, the business was acquired in 2014 by the Japanese e-commerce giant Rakuten, and has been growing fast since, forcing a drive to modernize its technology stack and become more data-driven in the way it attracts and retains customers.

This starts with the architecture. In the past three years Rakuten Rewards has moved its big data estate from largely on-prem SQL to on-prem Hadoop to, today, a cloud data warehouse courtesy of Snowflake.

“SQL server machines don’t scale elegantly, so we went on-premises Hadoop with Cloudera, using Spark and Python to run ETL, and got some performance out of that,” VP for analytics at Rakuten Rewards, Mark Stange-Tregear, told InfoWorld.

“Managing that [Hadoop] structure is not trivial and somewhat complicated, so when we saw the cloud warehouses coming along we decided to move and have this centralized enterprise-level data warehouse and lake,” he said.

As former Bloomberg developer and big data consultant Mark Litwintschik argues in his blog post “Is Hadoop Dead?”, the world has moved on from Hadoop after the halcyon days of the early 2010s.

Now, cloud frameworks that take much of the heavy lifting away from data engineering teams are proving more popular with enterprises looking to reduce the cost of having on-prem machines sit idle – and to streamline their analytics operations overall.

Moving on from Hadoop

So in mid-2018 Stange-Tregear and lead data engineer Joji John decided to start a major data migration from the company's core systems to the Snowflake cloud data warehouse, running on Amazon Web Services (AWS) public cloud infrastructure.

That migration started with the reporting layer and some of the most-used data sets across the business, before moving ETL and actual data generation workloads, all of which was completed towards the end of 2019, barring some more sensitive HR and credit card information.

