Apache Spark introduced the Dataset API, which unified the programming experience, improved performance, and reduced the learning curve for Spark developers. This is a great link to get familiar with Dataset. If the link doesn’t work when you are reading this post, Google is your friend. I want to save time and get…
Read More →

Tag: dataset
Spark 2.0 provides a more mature ecosystem and a unified data abstraction API, and it sets new performance benchmarks, along with some non-backward-compatible changes. Here, we look at some important things to learn and remember before migrating existing Spark projects to Spark 2.0. The following is not a complete list of points but presents…
Read More →