Apache Spark Transformation – DataFrame
A DataFrame can be created from any structured dataset, such as JSON, a relational table, Parquet, or an existing RDD with a defined schema. The following program creates a DataFrame and queries it using SQL. Here is the JSON we will play with; copy the following lines into a file and save it in <SPARK_HOME>/bin…
Tag: transformation
Apache Spark DataFrame
So, let's recall the RDD (Resilient Distributed Dataset). It is an immutable, distributed collection of objects, and it is an interface. We have also seen how to apply transformations in the previous post. They are amazing, as they give us all the flexibility to deal with almost any kind of data: unstructured, semi-structured, and structured…
Apache Spark Transformations
In this post we will focus on general Apache Spark transformations against RDDs. We will keep it simple but try to go as deep as we can. A download link is provided at the bottom so you can run the programs and try them with your own input. The goal is to get familiar…
If you haven’t read the previous article about MapReduce, I’d highly recommend reading it, because it sets a good foundation for appreciating Spark's existence.
Apache Spark – Introduction
I want to get to the practical exercises quickly, and I think there are enough resources on the internet that explain the theoretical view of the framework…