With the advent of Apache Spark, there is a real need to understand what it is and how it can benefit organizations as they continue to put data and analytics at the heart of their business. While some tout Spark as the new “Swiss Army knife for analytics,” others argue that it is not the panacea for data processing and high-performance data analytics that people once believed it to be.
So, while we may like the idea of a simple solution to a complex problem, it’s important to delve into the detail to understand where the truth lies: in this case, what Spark is, what it does and what it cannot do.
This paper discusses Apache Spark and Spark SQL, and explains their benefits as well as their shortcomings when it comes to analytics on large and complex data volumes. Simply put, it explains why organizations should not be misled into thinking that one solution solves all problems, and should instead use the “right tool for the right job.”