Is transaction data more valuable than search data?

In Data & Analytics by Freya Smale

analytics, big data, paypal

Mok Oh, Chief Scientist at PayPal, began with a bold question: is transaction data more valuable than search data? The question was swiftly amended to a position statement that reflects PayPal's approach and how they are choosing to work with data. The retail sector has shifted from 4% to 6% in terms of online sales, with 37% of offline sales now influenced by online research about products. This blurring between online and offline relates to the concept of omni-platforms that Mok Oh is presenting. It should be expected, of course, that PayPal's bias would be towards transactions and not search, although it was interesting to see that Viktor Frankl had influenced some of the thinking behind the work, based on the idea that a space exists between stimulus and response. This relates directly to transactions, with search being a part of that process.

“What happens in a shopper's mind?” he asks, displaying a picture of Homer Simpson's brain. The sequence that unfolds in a buyer's mind is presented in six steps:

Want, Discover, Shop, Buy, Pay, Own.

Returning to Frankl's idea, looking at the spaces in between these steps can reveal a great deal more about transactions. An example is given of a shopper who searches for the product they want, looking to discover it, then attempting to find it in a shop and buy it, and so on. But PayPal know this sequence backwards, which isn't to say they know it so well; it just works for them in reverse. They know what you bought, so they can see where you bought it, then they can make assumptions using big data about how you discovered it and therefore what you want in the future (and when you might want it too).

PayPal exists in 190 markets around the world today, is available in 25 currencies and has 113m+ active users, who are described by 2m+ attributes. 70 Hz is shown on screen as the rate of transactions (per second) in Q1 2012, leading to a figure of $4,300 per second in generated revenue. We are then presented with a series of visualisations, the first a time-lapse map of the US, animated to show the time just before and during Black Friday, where a great flash of blue light appears to show the burst of online transactions. More visuals follow, including a globe showing live PayPal transactions across borders.
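To put those two on-screen figures in perspective, here is a quick back-of-the-envelope calculation. It assumes both rates are steady averages over the quarter, which the talk did not state, and the variable names are my own.

```python
# Figures quoted in the talk (Q1 2012), treated here as steady averages.
transactions_per_second = 70
dollars_per_second = 4_300

SECONDS_PER_DAY = 24 * 60 * 60

# Scale the per-second rate up to a full day.
transactions_per_day = transactions_per_second * SECONDS_PER_DAY

# Dollars attributed to each transaction, implied by the two quoted rates.
dollars_per_transaction = dollars_per_second / transactions_per_second

print(f"{transactions_per_day:,} transactions per day")
print(f"${dollars_per_transaction:.2f} per transaction")
```

On those assumptions, 70 transactions per second works out to roughly six million a day.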

Other types of data outside of transactions are used to augment and add meaning, such as blogs, social media, photos, check-ins and documents. PayPal are paying attention to all of these, and it all starts with a core of transactional data. Marrying online and offline data (all very ‘big data’) can provide more insights for the company, which of course benefits users.

Big Data is Big Science!
Functions of data can predict similarity. Mok explains this with an example of a real (anonymised) person: a working mother who is online savvy and hunts for good deals when shopping. PayPal uses her data to track down other users with similar patterns in their data. This is achieved through integration with Hunchworks and other big data sources that record user preferences. Mok then shows a diagram of ‘Bill’, another user, who likes Subway sandwiches and shopping at Home Depot (which is the US equivalent of B&Q, I think). Using only a basic model, PayPal can predict with 70% accuracy where and when you will spend money, with a more complex algorithm providing over 90% accuracy on shopping habits. This is incredible, but raises immediate issues about the privacy and commodification of data. Will PayPal sell this to others in the future, I wonder, or is it too valuable? Either way, there are privacy and/or security issues surrounding this use of data.

Mok then shows a visualisation of a ‘customer genome’. PayPal map out buying behaviours, with 113m rows of user data and True Industry categories in the columns. A yellow cell indicates that a person has the ‘shopping gene’ for Victoria's Secret, for example; from this and other ‘genes’, people can be matched against others with a similar retail disposition. Again, although not mentioned at all in the talk, the issues of privacy and security should not be overlooked. After all, if information is power, how much is too much?
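The talk did not say how the matching is actually done, so what follows is only a minimal sketch of the general idea: treat each user's ‘genes’ as a set and rank other users by how much their sets overlap (Jaccard similarity, my choice of measure, not PayPal's). The user names, the gene labels and the `most_similar` helper are all made up for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Share of genes two users have in common, between 0.0 and 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical genome: one row per user, holding the 'genes' that are switched on.
genome = {
    "user_a": {"Victoria's Secret", "Subway", "Home Depot"},
    "user_b": {"Victoria's Secret", "Subway"},
    "user_c": {"Home Depot"},
    "user_d": {"Electronics"},
}


def most_similar(target: str, genome: dict, top_n: int = 2) -> list:
    """Rank every other user by gene overlap with the target user."""
    scores = [
        (other, jaccard(genome[target], genes))
        for other, genes in genome.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


matches = most_similar("user_a", genome)
for name, score in matches:
    print(f"{name}: {score:.2f}")
```

Here `user_b` comes out as the closest match to `user_a` because they share two of three genes. At 113m rows a real system would need something far more scalable than comparing every pair of users.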

Blogged on behalf of guest blogger Paul Booth.