Relation/Set Theory transformations We will be playing with this following program to understand the three important set theory based transformations. package com.mishudi.learn.spark.dataframe; import java.util.Arrays; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.sql.DataFrame; import org.apache.spark.sql.SQLContext; public class RelationalOrSetTheoryTransformations { public static void main(String[] args) { SparkConf sparkConf = new SparkConf().setAppName(“RelationalOrSetTheoryTransformations”); JavaSparkContext ctx = new JavaSparkContext(sparkConf); //…
Read More →Tag: sparksql
Apache Spark Transformation – DataFrame DataFrame can be create from any structured dataset like JSON, relational table, parquet or an existing RDD with defined schema. Following program creates a DataFrame and queries using sql. Here is the json we will use to play with, copy these following lines into a file and save it inĀ <SPARK_HOME>/bin…
Read More →If you haven’t read the previous article about MapReduce, I’d highly recommend reading it because that will set a good foundation to appreciate Sparks existence. Apache Spark – Introduction I want to get to the practical exercises quickly and I think there are enough resources on the internet to explain theoretical view of the framework….
Read More →