Businesses today have found ways of using data as a primary resource for value generation. These data-driven businesses use various data tools in this process, and data analysis is one of the major components. As a consequence, over the last few years data science has matured into a well-established profession receiving enormous attention from both industry and education.
Because data scientists’ activities are rather diverse, they involve quite a number of technical tools. Given the results of recent polls, it is clear that the availability of languages like R and Python is crucial for data scientists. Beyond the actual tools and details of the data, it is “the data science way of working” that is crucial. R and Python allow you to interact closely with data, to get your hands dirty, and to play with data in an ad hoc fashion, and this way of working needs to be preserved when moving to larger scales.
The consequence is obvious: because data scientists cannot work with very large data sets interactively, they fall back on one of two compensation strategies. Either they work in batch mode and lose major components of the unique style of working that makes them so powerful, or they work only on small subsets of the real data. In general, the latter strategy is fine for some data exploration activities, but it means that putting new insights into production remains sluggish.