Exasol can cope with unstructured data using User Defined Functions
“Structured data” is the kind of data that you could work with in a (sufficiently large) spreadsheet. Each record has fixed fields, and a data model defines which fields are stored and how each one is stored (numeric, alphabetic, length, precision etc.).
Structured data in a database is easily entered, stored, queried and analysed, using the STRUCTURED query language, which we all know and love as SQL.
Once upon a time, databases were just like big spreadsheets and that was the end of the story as far as data analysis was concerned. You either had to force the data to fit a structure or throw it away, which was bad news, because some experts suggest that unstructured data constitutes 80-90% of the total data in a business.
If you force unstructured data into a structure, you are likely to lose some of its meaning, and if you throw it away entirely, any value that data might have had is gone forever.
That was before the current “Big Data” movement – now we know that unstructured data has value and we’re not prepared to throw it away.
Here are some examples of unstructured data – imagine how difficult it would be to force it into a structure and imagine how much business value would be thrown away if you simply ignored it:
- Customer feedback on your website
- Tweets about your company’s products
- Email messages
The biggest myth in the world of Big Data is that SQL databases can’t work with unstructured data and you must have a completely different technology. Totally untrue – well, we at Exasol think so anyway.
We have the concept of user defined functions (UDFs), which let our users write functions in their favourite programming language to extract meaning from unstructured data. Those functions can then be used in SQL alongside any structured data they might have.
The upshot of this is that unstructured data can be stored and queried in parallel and in memory in our databases, just like structured data.
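To make the idea of a user defined function a little more concrete, here is a minimal sketch in Python of the kind of logic such a function might contain: it turns a free-text customer comment into a simple numeric score that SQL can then filter, sort or aggregate. Everything named here is an assumption made for illustration: the word lists, the sentiment_score helper, and the feedback column. The run(ctx) entry point, which receives one row and reads the input column as an attribute, is likewise an assumed shape, so check the Exasol documentation for the exact script syntax before relying on it.

```python
import re
from types import SimpleNamespace

# Hypothetical word lists, for illustration only. A real function would
# more likely use a trained model or a proper sentiment lexicon.
POSITIVE = {"great", "love", "fast", "excellent", "happy"}
NEGATIVE = {"slow", "broken", "hate", "poor", "refund"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    if text is None:  # a NULL value in the column
        return 0
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def run(ctx):
    # Assumption: the database calls run() once per row and exposes the
    # input column as an attribute, here a hypothetical column "feedback".
    return sentiment_score(ctx.feedback)


if __name__ == "__main__":
    # Stand-in for database rows so the sketch can be tried locally.
    comments = [
        "I love it, great and fast!",
        "Slow and broken. I want a refund.",
        "It arrived on Tuesday.",
    ]
    for comment in comments:
        row = SimpleNamespace(feedback=comment)
        print(run(row), comment)
```

Once logic like this is registered as a function in the database, the score could be selected next to ordinary columns such as a product or customer ID, for example to average it per product.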
You no longer have to throw unstructured data away and you don’t have to force it into a structure.
And in many cases you won’t have to buy, implement and maintain a separate “NoSQL” solution for your unstructured data.
Don’t ever let anybody tell you that SQL databases can’t cope with unstructured data – they are obviously living in the past.