Tag: sparksql

Relational SetTheory Tranformation
By: Date: August 12, 2017 Categories: Apache Spark Tags: , , , , , , , , , , , , , ,

Relation/Set Theory transformations We will be playing with this following program to understand the three important set theory based transformations. package com.mishudi.learn.spark.dataframe; import java.util.Arrays; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.sql.DataFrame; import org.apache.spark.sql.SQLContext; public class RelationalOrSetTheoryTransformations { public static void main(String[] args) { SparkConf sparkConf = new SparkConf().setAppName(“RelationalOrSetTheoryTransformations”); JavaSparkContext ctx = new JavaSparkContext(sparkConf); //…

Read More →
Apache Spark Transformation – DataFrame
By: Date: August 7, 2017 Categories: Apache Spark Tags: , , , , , , , ,

Apache Spark Transformation – DataFrame DataFrame can be create from any structured dataset like JSON, relational table, parquet or an existing RDD with defined schema. Following program creates a DataFrame and queries using sql. Here is the json we will use to play with, copy these following lines into a file and save it inĀ <SPARK_HOME>/bin…

Read More →
Apache Spark
By: Date: July 26, 2017 Categories: Apache Spark Tags: , , , , , ,

If you haven’t read the previous article about MapReduce, I’d highly recommend reading it because that will set a good foundation to appreciate Sparks existence. Apache Spark – Introduction I want to get to the practical exercises quickly and I think there are enough resources on the internet to explain theoretical view of the framework….

Read More →