Last week I was lucky enough to catch up with Big Data World Europe speaker Professor Nigel Shadbolt for a discussion about the evolving world of data. Read on and enjoy part one of the interview. For part two click here.
Good morning, Nigel. Let's start off with a little bit about your background.
I've been an AI researcher since the late 70s, when I studied it at Edinburgh. I've seen lots of booms and busts of AI technology. It's one of those areas, being smart in software, whether to provide decision support or, going right back to early work around analytics, where AI has had some notable achievements over the years. Smart methods around cluster analysis, or predictive analysis of sets of data about how people are going to behave in the future, have been an enduring feature of computer technology and AI for the past 20 or 30 years. What is different now, of course, is that it is happening at web scale.
I was originally doing my AI in Edinburgh, then within the School of Psychology at Nottingham, where I was trying to understand how people made decisions and how human decision-making could help write better software programmes. So that thread has always been there: how do we understand how to be smart about making decisions, and what data do we need? When I then moved down to Southampton and became increasingly engaged with the web, I saw that the web represents the most incredible distributed knowledge base, not just in the sense that the data is on the web but because people are on the end of those IP presences. It is such a phenomenal resource in terms of the content they author and the information they provide, but also the problem solving they contribute. Whilst Big Data is absolutely a focus at the moment, what intrigues me as well is "Big Problem Solving": how people are connected together, and how together they can solve problems that individually they can't. It's not just about integrating data together.
So my background is very much about looking at the evolving, changing technological landscape. Moore's Law, which every 15 months gives us twice the power at half the price, means that we can bring to bear methods and techniques which, while they were predictable ten or twenty years ago, always seem to take us by surprise.
So would you say then that the rapid changes in the abilities of technology have surprised you?
Well, I think that it was in the wind; you could see the sheer amount of information that was becoming available. Take a practical problem such as finding a page on the web, the search engine problem: the web gets to a certain size and people start to think that the whole situation is just a counsel of despair. Then along comes Google with a novel algorithm that works out that the links human beings make on pages are made for a purpose. They are made because of the relevance of one link or page to another. So if you can work out the influence diagram of all of that, which are the most important pages and which are being pointed at by other important pages, and then rank that set, you get an incredibly powerful search method. So the web in itself was an obvious demonstration of the power and scale of data.
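The ranking idea Nigel describes here became known as PageRank. As a rough illustration only (the toy link graph, damping factor and function below are illustrative assumptions, not Google's actual code), the core intuition, that a page is important if important pages link to it, can be sketched with power iteration in a few lines of Python:

```python
# Toy link graph: each page lists the pages it links to (illustrative only).
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out but receives no links itself
}

def pagerank(links, damping=0.85, iterations=50):
    """Rank pages so that links from important pages confer importance."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score
    for _ in range(iterations):
        # Every page keeps a small base score, plus shares passed along links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:  # each outgoing link passes on a share
                new_rank[target] += share
        rank = new_rank
    return rank

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

On this toy graph, "home" ends up top ranked because every other page links to it, while "orphan", which nothing points at, settles at the minimum base score, exactly the "important pages pointed at by other important pages" effect described above.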
Now, before that, of course, people had been doing work on relational databases and data mining before the web was around, but both they and the data were in silos, disconnected from one another. You had your data in your database and you would run your analytics on that. But then to see this open network of content being added to, at a rate of addition that was truly staggering, was something else. To have the constant ability to look at the shape and structure of the web graph, that is a piece of supremely powerful analytics. And it is not just about the shape and structure of the web graph but what is in it, what the content is, whether it is in the blogs or the tweets or the Facebook posts. These are extremely rich sources of information, as are the search queries people make.
What the internet seems to have done is to break down silos. Nevertheless, because people are extremely guarded about their data, do you think that this open data model is at risk if mindsets do not change?
Well, I think that we shouldn't necessarily imagine that we have gotten rid of all the silos. Whilst the web undoubtedly provides for great connectivity, it is also the case that silos are alive and well, and it can actually be quite hard for businesses to break their data out of existing databases and integrate it. That really is an ongoing challenge: how do you surface the data from the deep web, or from the existing databases and data assets that are around? We certainly should not see this as a solved problem.
The other side of this is that as we put more and more of this stuff up there and more is made available, there is this worry around triangulation: more and more information being combined to target specific consumer buying habits. There is a really interesting range of attitudes around this. On the one hand, consumers are pleased to receive some services; they like the experiences they get with accurately targeted material. On the other hand, they are increasingly concerned about what they see as a one-sided contract: they don't feel empowered and they don't understand the consequences. The question, though, is what we should reasonably expect, because in a sense the art of the possible in terms of the analytics that can be performed is an area of almost constant research.
I do think that we have to take consumers' interests and concerns seriously. We do not want a world in which consumers withdraw their consent. Nevertheless, we must ask whether at the moment they really do give their consent in an informed way. Do they really grasp the complexities of the terms and conditions that they enter into? I think that it has to be a two-way process, a world in which there is a conversation between those who generate the data and those who use it. The benefits have to be outlined, as well as the rights and responsibilities.
To continue reading click here for part two of the interview.