Our survey on data analytics shows that Apache Spark has indeed become a notable player in the field of big data.
In our previous survey about data analytics, we found that performance is the top priority for our customers. Our second survey put the focus on Apache Spark. And, as the results show, Spark has indeed become a notable player in the field of big data. However, on closer inspection, it soon becomes apparent that Spark and Exasol are suited for different use cases and follow unique approaches.
There is more to themes such as big data, analytics and data science than first meets the eye. The belief in a one-size-fits-all tool that covers every use case is a misguided one: new applications and tools appear on the market every day, offering innovative ways to process and analyze data and to extract valuable insights for future business decisions.
Spark has generated considerable hype, there’s no doubt about it. The open source solution, based on a modern in-memory approach to distributed data processing, can be used for many different purposes. As such, 16% of participants in our survey said they are currently deploying Spark. Furthermore, 64% of the respondents believe that Spark is suited for deployment in certain departments or for certain tasks. Spark is best suited as a complement to Hadoop and can even partially replace some Hadoop components and concepts. For example, with Spark, users no longer have to run 10-year-old “MapReduce” components in their Hadoop infrastructure; Spark is a much better fit for the fast development cycles expected from a framework in a fast-evolving digital landscape. 10% of respondents believe that Spark is suited more to data processing than to analytics, and 6% think that Spark does not offer all the capabilities required for deep-dive data analytics. The remaining answers were split between Spark not being mature enough as a technology and not providing sufficient performance.
Which use cases are the solutions best suited for? Where does Exasol perform best?
Regardless of the percentages, we must first consider which specific problems need to be solved and what the elementary requirements for a solution are before testing and deploying one. For data scientists, for instance, Apache Spark is undoubtedly a valuable toolbox. Spark also works well for streaming analytics and for processing polystructured data. However, when maximum performance is key and the solution is to be deployed in business-critical departments, such as returns management, fleet management or fraud detection, Exasol is the superior solution. Exasol focuses specifically on optimization, performance and maturity, as well as comprehensive support and ease of use, which together make it the best choice.
Better together
Moreover, Exasol and Spark shouldn’t be seen as solutions that compete with each other. On the contrary, using them together, which is made easy by Exasol’s seamless Spark integration, delivers great results. To find out how, read the article “Setting Light to Apache Spark” by Jens Graupmann.