Tag: java

Relational Set Theory Transformation
Date: August 12, 2017 Categories: Apache Spark

Relation/Set Theory transformations

We will be playing with the following program to understand the three important set-theory-based transformations.

    package com.mishudi.learn.spark.dataframe;

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class RelationalOrSetTheoryTransformations {
        public static void main(String[] args) {
            SparkConf sparkConf = new SparkConf().setAppName("RelationalOrSetTheoryTransformations");
            JavaSparkContext ctx = new JavaSparkContext(sparkConf);
            //…

Apache Spark Transformation – DataFrame
Date: August 7, 2017 Categories: Apache Spark

Apache Spark Transformation – DataFrame

A DataFrame can be created from any structured dataset such as JSON, a relational table, Parquet, or an existing RDD with a defined schema. The following program creates a DataFrame and queries it using SQL. Here is the JSON we will use to play with; copy the following lines into a file and save it in <SPARK_HOME>/bin…

General Spark Transformations
Date: July 31, 2017 Categories: Apache Spark

Apache Spark Transformations

In this post we will focus on general Apache Spark transformations on RDDs. We will keep it simple but try to go as deep as we can. A download link is provided at the bottom so you can run the programs and try them with your own input. Goal is to get familiar…

Anonymous inner class function implementation
Date: July 30, 2017 Categories: Apache Spark

The Word Count program in Java we saw here was written using lambda expressions, which are supported in Java 8. So, we passed functions as arguments to our transformation calls like mapToPair() and reduceByKey(). In this post we will try to write more detailed implementations of the lambda expressions that we used, as these are still fairly new…
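To make the idea concrete, here is a minimal sketch of the same function written first as a lambda and then as an anonymous inner class. It is not the code from the post: to stay runnable without Spark on the classpath it uses the JDK's own `java.util.function.BinaryOperator` as a stand-in for the function interface that reduceByKey() accepts, and the class and field names are hypothetical.

```java
import java.util.function.BinaryOperator;

public class LambdaVsAnonymous {

    // Lambda form: the compiler infers that this implements BinaryOperator.apply().
    // BinaryOperator is an assumed stand-in for Spark's own function interface.
    static final BinaryOperator<Integer> SUM_LAMBDA = (a, b) -> a + b;

    // Anonymous inner class form: the same behaviour with the interface
    // and its single method spelled out explicitly.
    static final BinaryOperator<Integer> SUM_ANONYMOUS = new BinaryOperator<Integer>() {
        @Override
        public Integer apply(Integer a, Integer b) {
            return a + b;
        }
    };

    public static void main(String[] args) {
        // Both objects are interchangeable wherever a BinaryOperator<Integer> is expected.
        System.out.println(SUM_LAMBDA.apply(2, 3));
        System.out.println(SUM_ANONYMOUS.apply(2, 3));
    }
}
```

The two forms behave identically; the anonymous class simply shows which interface and which method the lambda stands for, which can help when lambdas are unfamiliar.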

Apache Spark
Date: July 26, 2017 Categories: Apache Spark

If you haven’t read the previous article about MapReduce, I’d highly recommend reading it, because it lays a good foundation for appreciating why Spark exists.

Apache Spark – Introduction

I want to get to the practical exercises quickly, and I think there are enough resources on the internet that explain the theoretical view of the framework….

MapReduce – Quick Intro
Date: July 25, 2017 Categories: Apache Spark

MapReduce – Quick Intro

If you are reading this page, then I assume you have heard about MapReduce. Let us go through the MR framework quickly, as understanding it is needed to appreciate Apache Spark. MapReduce is the core, de facto data processing framework of Apache Hadoop. The beauty of this framework was…
