Tag: hadoop

Migrating to Spark 2.0
Date: October 4, 2017 Categories: Apache Spark

Spark 2.0 provides a more mature ecosystem and a unified data abstraction API, and it sets new performance benchmarks, though it also brings some non-backward-compatible changes. Here, we look at some important things to learn and remember before migrating existing Spark projects to Spark 2.0. The following is not a complete list of points but presents…

MapReduce – Quick Intro
Date: July 25, 2017 Categories: Apache Spark

If you are reading this page, then I assume you have heard about MapReduce. Let us go through the MapReduce (MR) framework quickly, since understanding it is needed to appreciate Apache Spark. MapReduce is the core, de facto data processing framework of Apache Hadoop. The beauty of this framework was…
