A practical demonstration: EXASolution versus Natural Language Processing
I’d like to mark Thanksgiving by celebrating one of America’s greatest living poets, Shawn Carter, a.k.a. Jay-Z.
The thing I like most about Jay-Z’s poetry is that there is no way I could have written it. His upbringing, working and social lives are radically different from mine, and so the way he uses language couldn’t be more different. We both have English as a mother tongue, but we use it like we are from two different planets.
That’s the problem with attempting to use computers to analyse human language. They only have words to work with and no context regarding who is saying those words and for what reason. If two people say the same words in response to different questions, it’s not guaranteed that they intend the same meaning.
Let’s take as an example the simple sentence:
“That meal was all right.”
In some parts of Northern England this could mean “10 out of 10 – best meal I’ve had this year”, whereas in Southern England it could mean “5 out of 10 – I’ve had worse meals, but it wasn’t great”. It might even mean “2 out of 10 – but I’m too polite to tell the whole truth”.
One particular problem in language analysis is that it is difficult for a computer to detect sarcasm – to be honest, it’s difficult enough for humans.
Take for example the sentence:
“I loved this book so much that I threw it away before finishing the first page.”
A simple computer program looking to detect whether this review was positive or negative would see “I loved this book” and mark it as a positive review. However, most humans can see that this comment was not written by a happy reader.
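To make that failure concrete, here is a minimal sketch of such a simple program. The word lists and the function name `naive_sentiment` are my own illustrative assumptions, not part of any particular product or library. It just counts "happy" and "unhappy" words, which is exactly why the sarcastic review fools it.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
POSITIVE = {"love", "loved", "great", "excellent", "best"}
NEGATIVE = {"hate", "hated", "awful", "terrible", "worst"}


def naive_sentiment(text):
    """Label text by counting positive and negative words, ignoring all context."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("I loved this book so much that I threw it away "
          "before finishing the first page.")

# The only word the program recognises is "loved", so it misses the sarcasm.
print(naive_sentiment(review))
```

The same program labels “That meal was all right.” as neutral, whichever part of England the speaker comes from, because none of the context that gives the sentence its meaning is in the words themselves.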
One final difficulty – small mistakes can make a big difference. The Ten Commandments read very differently if you miss out the word “not”!
Even punctuation is important – my favourite “missing comma” example is:
“She finds inspiration in cooking her family and her dog.”
which is quite different from
“She finds inspiration in cooking, her family and her dog.”
Mining natural language for business insight
Despite the difficulties involved, the rewards of mining natural language for business insight are definitely worth having. If you can detect happy customers, angry customers or potential sales opportunities from masses of unfiltered comments from social media or elsewhere, then this gives you an edge over your competition which I would summarise as “happier customers and more of them”.
In order to demonstrate some of the basic features involved in Natural Language Processing, I have made a short video.
This video takes Jay-Z’s lyrics and demonstrates a couple of techniques using Python User Defined Functions on EXASolution. I like to keep my videos short, but hopefully it’s enough to show that you can extend EXASOL way beyond SQL and perform complex analysis on some very unstructured (but beautiful) data.
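The video is not reproduced here, so as a stand-in here is a rough sketch of the kind of plain Python logic that could sit inside a User Defined Function. The choice of technique (a word-frequency count) and the function name `word_counts` are assumptions of mine for illustration, not necessarily what the video shows, and the wrapping needed to register the function as a UDF is left out.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each lower-cased word appears in a piece of text."""
    # Simple tokeniser: runs of ASCII letters and straight apostrophes only.
    return Counter(re.findall(r"[a-z']+", text.lower()))


line = "May the best of your todays be the worst of your tomorrows."
counts = word_counts(line)

# Show the most frequent words in the line.
print(counts.most_common(3))
```

Counting words is about as basic as text analysis gets, but it is often the first step before anything more interesting, such as comparing vocabulary between songs or albums.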
Happy Thanksgiving to all, and as Jay-Z once said:
“May the best of your todays be the worst of your tomorrows.”