20th May 2013

Leaky data: How Wonga makes lending decisions

Wonga.com is not only the most high profile and controversial payday lender in the UK, it is also the most technologically advanced. By automatically sorting through 8,000 different data points, it claims to be particularly good at sorting borrowers who will repay from those who will not, based on its distinctive method of credit assessment. But, apart from Wonga insiders, no-one quite knows how this is done. I’m going to look at what you can find out from what is publicly available – once you know how to look – and what the implications of these kinds of practices might be, as they spread.

The individual controversies that have accompanied Wonga will be familiar to many. At the root of Wonga’s PR problems seem to be its particular combination of friendly, welcoming brand imagery – its latest adverts feature pensioner puppets ‘Betty, Joyce and Earl’ getting up to various mischief – with lending rates that currently stand at 4214% APR, a level branded ‘usurious’ by the then Bishop, now Archbishop of Canterbury. The rate is significantly higher than many other online payday lenders.

Wonga’s response to criticism has been to point to the unusually fast service it offers borrowers, to its transparency – the APR is, after all, displayed prominently on its homepage – and, echoing arguments made by an industry body, it asserts that APR, an annual, compound measure of interest, isn’t an appropriate measure for the short term world of payday lending.
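To see why an annualised, compounded figure balloons for very short loans, here is a minimal sketch. The fee and term are hypothetical, not Wonga's actual pricing, and the formula is the plain compounding convention rather than the exact regulatory APR calculation.

```python
def annualised_rate(principal: float, total_cost: float, term_days: int) -> float:
    """Compound the cost of a short-term loan over a 365-day year.

    Returns the rate as a fraction (1.5 means 150%).
    """
    period_rate = total_cost / principal   # cost of credit over the loan term
    periods_per_year = 365 / term_days     # how many such terms fit into a year
    return (1 + period_rate) ** periods_per_year - 1


# Hypothetical loan: borrow 100 for 30 days, pay 30 in interest and fees.
rate = annualised_rate(principal=100, total_cost=30, term_days=30)
print(f"30% over 30 days annualises to {rate:.0%}")
```

A charge of 30% over a single month comes out at well over 2,000% once it is compounded across a year, even though no borrower in this example ever pays more than 30 on their 100.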

Yet Wonga doesn’t want to be considered as just the most high profile of many potentially controversial payday lenders. It doesn’t even see finance as its primary arena, but technology, with its founder, Errol Damelin, likening the company to, amongst others, Amazon and PayPal.

For some, this is a deliberate muddying of the water, obscuring the real basis of Wonga’s economic success: the high cost of its loans. As British Member of Parliament and consumer lending campaigner Stella Creasy put it, “They need to think again about the idea that it is the technology that people are attracted to, rather than the credit”. She has a fair point, especially if you add in the polished marketing and website. Others have pointed out that all this talk of being “truly selective” seems to have done little to stem increasing levels of defaulting debts: £77m had to be written off in 2011, the most recent year for which accounts are publicly available – a sizeable amount, considering its total revenue in the same year was £185m. However, before we ourselves write off Wonga’s claims as either spin or marketing fluff, it is important to understand what, exactly, is going on when a borrower comes to Wonga and asks for a loan.

Instant decisions

Key to Wonga is its speed. Its short term, low value loans are delivered to potential customers fast – extremely fast, in fact. Decisions, customers are promised, are made within six minutes, with money wired direct to their accounts in fifteen. This means it has to use data that is available instantly, or nearly instantly.

Like many lenders, ‘scoring’ is also important to Wonga. Credit scoring means using whatever data is available on a borrower to come up with a single figure which will, with some exceptions, determine whether or not a loan is offered. But why would Wonga need 8,000 pieces of data? This seems an unnecessarily large amount. Surely the most important predictors of a borrower’s future intentions are fairly obvious, things like past payment record, their job and future earning prospects, their monthly incomings and outgoings?
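As a rough illustration of what "a single figure" plus a cutoff means in practice, here is a minimal scorecard sketch. The variables, weights and cutoff are all invented for the example; nothing here is taken from Wonga or from any real lender's model.

```python
import math

# Hypothetical weights. A real lender would fit these to repayment data.
WEIGHTS = {"missed_payments": -0.8, "years_in_job": 0.3, "income_thousands": 0.05}
INTERCEPT = 0.5
CUTOFF = 0.6  # hypothetical approval threshold


def score(applicant: dict) -> float:
    """Collapse an applicant's data into one number between 0 and 1."""
    z = INTERCEPT + sum(w * applicant.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))  # logistic function


def decide(applicant: dict) -> bool:
    """Offer the loan only if the score clears the cutoff."""
    return score(applicant) >= CUTOFF


steady = {"missed_payments": 0, "years_in_job": 3, "income_thousands": 20}
shaky = {"missed_payments": 3, "years_in_job": 0, "income_thousands": 15}
print(decide(steady), decide(shaky))
```

With only three variables the score is a blunt instrument. The point of collecting thousands more is to add further terms to that sum.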

Well, when it comes to credit assessment, each bit of data really can matter. As I’ve written about in the context of debt collection, creditors, at various points in the lending cycle, routinely feed as much information about the borrower as they can get their hands on into sophisticated analytical models. The aim is to predict how certain debtors, grouped together by shared characteristics, will act in future.

In this quest, familiar variables – things like income, employment, payment history – are the blunt instruments. When more fine grained distinctions are needed, less conventional characteristics, like the tendency not to pay by direct debit, or the precise relationship between a current balance and a credit limit, may reveal themselves as having particular ‘predictive power’. The more data you have at the outset based on the repayment behaviour of known debtors (those customers you already have), the more likely you are to find a variable, or more likely a combination of variables, that can help you isolate and differentiate new groups of debtors. And when it comes to really pushing profit margins, anything you can do to weed out potential defaulting debtors, without throwing out reliable ones, is seen as an unqualified ‘good thing’.
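The simplest way to look for 'predictive power' is to split known borrowers by some characteristic and compare how often each group defaulted. The sketch below does this for one made-up variable (whether the borrower pays by direct debit) on a tiny, invented repayment history.

```python
from collections import defaultdict

# Hypothetical history of known borrowers: (pays_by_direct_debit, defaulted)
history = [
    (True, False), (True, False), (True, False), (True, True),
    (False, True), (False, True), (False, False), (False, False),
]


def default_rate_by_group(records):
    """Return {group value: share of borrowers in that group who defaulted}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for group, defaulted in records:
        totals[group] += 1
        defaults[group] += int(defaulted)
    return {group: defaults[group] / totals[group] for group in totals}


rates = default_rate_by_group(history)
print(rates)
```

A large gap between the groups suggests the variable is worth feeding into a model. Real analyses would of course need far more data, and would test combinations of variables rather than one at a time.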

Credit agencies have, of course, known this for years. This is their bread and butter. One might reasonably presume, then, that data from credit agencies is part of a Wonga assessment.  A recent Guardian article, for instance, had one anonymous banker speculating that maybe “Wonga spends huge sums on databases and traditional credit-ratings agencies – anything to improve the accuracy of its assessments”.

While this is right about Wonga’s thirst for information, it’s not quite right about how dependent they are on conventional credit ratings. Wonga explicitly states that traditional, third party credit scores do not form the basis of their system. They do buy this data, but it appears to be just one element which is then overlaid and mixed in with Wonga’s own. This is worth pausing to reflect on: credit reference agencies have been the most important source of information for the huge majority of mainstream consumer credit lending decisions in recent decades, yet Wonga’s founder claims that its own model is ‘unbelievably’ and ‘dramatically’ more predictive.

Perhaps Wonga are also paying to access other databases, as the shy banker also suggests. Or perhaps, as another source in the same Guardian article guesses, Wonga draws on the wealth of free information that is available instantly online: electoral roll details, estimates of house values, even simple searches of the applicant’s name alongside certain key terms.

It is perfectly possible that some such data is indeed part of Wonga’s ratings system. But, given this information could potentially be available to credit reference agencies, wouldn’t it be a bit odd if Wonga, a relatively small, relatively new company, could so easily use just this data to beat an industry that has spent decades trying to master such methods?

There are a few other clues. A well placed source comes, I think, closest to the answer: “They use a lot of social media and other tools on the internet you don’t even think about”, she says. “That’s where the magic is”. This still leaves a lot of questions open. Magic? Social media? How?

Leaky data

At this point, I invite you to try an experiment. Visit wonga.com. You will see, at the centre of the page, two sliders. One with a loan amount and one with a repayment period. Without doing anything, note where the loan amount slider sits. Now, open a different browser. For example, if you generally use Firefox, perhaps try Internet Explorer, or Chrome. Again, visit Wonga. What is the loan amount this time? What about if you use your phone?

For many of you, the loan amount initially displayed will change depending on how you use the site. If you’re interested, you can pursue the experiment further. Try deleting your cookies and revisiting the site. Or using different search engines to look for “payday loans” or “wonga” and follow the links. As you will see, the loan amount you are initially presented with sometimes, although not always, differs.

Based on my own experiments and a not very scientific Facebook survey amongst friends, the loan amounts users are initially presented with currently tend to be either £111 or £265, although I have also achieved figures of £350 and £361. In my informal survey, those using Apple products (a Safari browser, or say an iPhone or an iPad) seemed to be most consistently offered £265. That said, tests with some obscure browsers suggest that it is less that you are ‘uprated’ for using Apple products than that you are ‘downrated’ for using more mainstream browsers like Firefox and Internet Explorer.

Another easily accessible and likely important bit of information is your IP address, which provides a pretty good indication of your rough location. Also seemingly influential is the number of prior visits. And of course, the route to Wonga – was it a ‘direct hit’, or via a search query? If it was a search, what were the terms you searched for? Did you click on an advert?

As those familiar with this area well know, when you visit a website, it is extremely difficult not to leak lots of information about precisely how you are accessing the site. Mobile devices are particularly leaky. Even this website routinely collects such information. As an example, here, via Google Analytics, are the top five mobile devices known to access the site:

1. Apple iPad
2. Apple iPhone
3. Samsung GT-I9100 Galaxy S II
4. Samsung GT-P5110 Galaxy Tab 2 10.1
5. RIM BlackBerry 9300 Curve 3G

Why does it matter where the slider is? After all, aren’t borrowers just going to slide it to whatever level they need? Perhaps. However Wonga themselves have confirmed that the ‘degree’ of slide matters – and that users that instantly push the slider to maximum are more likely to default. This behaviour is thus being translated into a new metric to feed into their calculations.

The slider does potentially one more important thing: after offering a base level loan for a user – a kind of initial, rough, undeclared confidence vote – we can hypothesise that, very much in the school of ‘nudge theory’, it is trying to encourage broadly riskier borrowers not to ask for too much. This will potentially improve the chance of their application being successful. After all, new users are quite unlikely to go through the whole process twice, with different loan amounts. This, then, would be about improving what is called a site’s ‘conversion rate’.

These different slider patterns are only the beginning. As Wonga has frequently made clear, the assessment process itself is being influenced by a user’s unwittingly leaked data, with such information forming part of their 8,000 data points. An important characteristic of the kind of information I’ve covered here is that it is fairly reliable (after all, who can be bothered enough to disguise their browser or IP address when applying for a loan?), it is easy to collect, and it can be accessed and fed into a database virtually instantly. This is crucial, otherwise it would be of no use to Wonga in its quest for quick, automated decision making.

Again, when it comes to algorithmic prediction, anything can matter, however apparently mundane. If it turns out that those that use one search engine rather than another tend to default on their loans more, then this may well be significant. Wonga has revealed that even the time of day a user accesses the site feeds into their calculations. Other mundane information we routinely leak includes the particular version of the browser we are using (up to date, or not?), our operating system, screen resolution, internet service provider, and so on. If you’re curious what data you’re leaking right now, there are plenty of sites that can quickly tell you.
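To make the leak concrete, here is a minimal sketch of how a web server could turn one ordinary page request into a handful of candidate variables. The feature names are my own invention, the IP address comes from a range reserved for documentation, and the substring checks on the user agent are deliberately crude. This is not a description of Wonga's system.

```python
from datetime import datetime
from urllib.parse import urlparse, parse_qs


def leaked_features(headers: dict, remote_ip: str, when: datetime) -> dict:
    """Derive a few illustrative variables from a single HTTP request."""
    user_agent = headers.get("User-Agent", "")
    referer = urlparse(headers.get("Referer", ""))
    # Many search engines pass the query as a "q" parameter in the referring URL.
    search_terms = parse_qs(referer.query).get("q", [""])[0]
    return {
        "ip": remote_ip,
        "hour_of_day": when.hour,
        "is_apple_device": any(t in user_agent for t in ("iPhone", "iPad", "Macintosh")),
        "is_mobile": "Mobile" in user_agent,
        "arrived_via": referer.netloc or "direct",
        "search_terms": search_terms,
    }


example_headers = {
    "User-Agent": ("Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) "
                   "AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 "
                   "Mobile/10B141 Safari/8536.25"),
    "Referer": "https://www.google.co.uk/search?q=payday+loans",
}
features = leaked_features(example_headers, "203.0.113.7", datetime(2013, 5, 20, 2, 30))
print(features)
```

None of this requires the visitor to type anything: every value is available the moment the page is requested.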

I don’t know what each of these various bits of information tells you about a potential borrower. However, I’d confidently bet that someone at Wonga does – for many, if not all of them. It is not that these would form the sole basis of their decisions – Wonga after all does collect conventional data (income, dependents, profession, etc) and probably uses conventional credit scores. Rather, these additional points of data each potentially add extra predictive power to such decisions.

There are other clues. Using a Firefox plugin called Ghostery, which helps users see which analytical ‘widgets’ a site you are visiting has working behind the scenes – the so-called ‘invisible web’ – we can learn that, unsurprisingly, Wonga is using a variety of ad tracking services. These are used by many sites to collect data on the effectiveness of different online advertising campaigns, or to monitor the impact of changes to the site design. We also learn that it uses Google Analytics. Although common, Google Analytics can feed the kind of leaky information I’ve described to website owners in real time – a crucial pre-requisite for Wonga’s fast decision making process.

Particularly interesting is one widget, called QuBit OpenTag. QuBit is a London based tech company, funded, perhaps coincidentally, by Balderton Capital, the same venture capitalist firm as Wonga. OpenTag itself is a tool partly designed to help companies improve their website’s performance and monitoring. But the company also helps websites provide exactly the kind of real time personalised content, based on things like browser type and IP address, that seems to make Wonga’s slider appear at different initial positions for different people. So for example, in a report designed to showcase the power of their analytics, QuBit describe how technology purchases by visitors using Safari are “around £30 more than any other browser”. This is, as it’s known in marketing, customer segmentation. Crucially, in this case, this segmentation can be done virtually instantly, based on variables that many users might assume to be irrelevant.

Other than OpenTag, it’s not possible to be sure about what services QuBit is providing for Wonga. If OpenTag is the extent of it, then Wonga seems to be mirroring QuBit’s approach extremely closely. There is, however, one third party about whom it’s possible to be pretty sure about their relationship with Wonga: and that’s Facebook.

Connecting with Facebook

After choosing their loan level and repayment period, users are asked to fill in some basic, familiar details. A field then pops up, which invites them to establish a connection to Facebook. In effect, this means installing Wonga’s ‘app’ on your Facebook account. This is optional and Wonga commits to maintain the user’s privacy and not to post anything on their page. Connecting, Wonga tells the potential customer, “helps us to know you better. This will improve your chances of being approved for a loan”.

It’s easy to see why Wonga might be keen on this. Apps granted the right levels of access to Facebook don’t just leak information to their owners, they gush it. Wonga would certainly not be the first company to realise that accessing customers’ Facebook details can play a potentially powerful role in helping to understand and further segment their user base. To take just one example, US based company Microstrategy offers a service which can help websites segment their users by a wide range of different Facebook generated criteria. As they note, “Facebook is the world’s most comprehensive and up-to-date database of people’s demographic information and interests”.

Wonga’s own app appears to be in the process of development: when you try to connect, you currently get an error message. That said, if you understand the basics of Facebook Connect, the url itself tells you all you need to know, providing a complete list of the permissions being requested. If granted, these permissions would include access to information that would help confirm the identity of a user, including birthday, hometown, and location. Wonga is also interested in a wealth of information that might supplement or undermine the income level declared by a potential borrower: educational history, work history, as well as relationship details. And then, perhaps more surprisingly, it is also interested in seeing ‘softer’ information. This includes the user’s listed interests, games activity, religious and political views, any subscriptions they might have, their ‘likes’, groups the user is part of, and their personal website. While Wonga is not allowed to copy details out of Facebook’s databases wholesale, it could search this information against a potentially infinite variety of terms and test the predictiveness of this as part of its own scoring models.
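For anyone who wants to read a permissions list out of such a url themselves, here is a minimal sketch. It assumes the permissions travel as a comma-separated `scope` query parameter. The url below is made up for illustration (it is not the one Wonga's site produced), and the permission names are examples in the style Facebook used at the time.

```python
from urllib.parse import urlparse, parse_qs


def requested_permissions(dialog_url: str) -> list[str]:
    """Pull the comma-separated permission names out of a login dialog URL."""
    params = parse_qs(urlparse(dialog_url).query)
    scope = params.get("scope", [""])[0]
    return [name.strip() for name in scope.split(",") if name.strip()]


# Hypothetical URL for illustration only.
example_url = (
    "https://www.facebook.com/dialog/oauth"
    "?client_id=APP_ID"
    "&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback"
    "&scope=user_birthday,user_hometown,user_work_history,user_likes,read_stream"
)
permissions = requested_permissions(example_url)
for name in permissions:
    print(name)
```

If a site uses a different parameter name, only the `"scope"` lookup needs changing.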

There is one further particularly powerful permission buried in Wonga’s request, called ‘read_stream’. This is what is called an Extended Permission, explicitly designed to grant access to what Facebook calls “sensitive info”. Specifically, read_stream allows access “to all the posts in the user’s News Feed and enables your application to perform searches against the user’s News Feed”. Bernhard Rieder, working at the University of Amsterdam, has been investigating this permission. As his experiments show, read_stream not only provides access to all of your posts – a highly intimate level of access in its own right – it also provides access to what your friends are doing on Facebook, as shown in your news feed. As he writes, what Facebook in its description breezes over as just “posts in the user’s News Feed” might be more accurately translated as “a minute account of your friends’ activities”. There are privacy settings within Facebook to stop the apps that your friends use being able to do this. It’s not easy, however. I thought my privacy settings were set pretty high, yet it turns out that if a friend gave an app the read_stream permission, it would be able to access my name, gender, list of friends, activities, interests, things I like, current city, and app activity. In fact, the first three items are only possible to hide by barring all apps access to your Facebook account.

When you add together all the potential sources of leaked information that I’ve covered – and I have likely only scratched the surface – you can see how Wonga might begin to get up to the 8,000 data point mark. Once the Facebook app is up and running, it is reasonable to expect this figure to climb. For all we know, their analysis could show that their most profitable borrowers are Mac-using freelancers, living in London, who like Werner Herzog documentaries and who access Wonga in the early hours of the morning. This is quite different from making a lending decision based on a conventional credit rating.

While Wonga’s use of this sort of data for assessing potential borrowers is a significant departure from assessments relying on conventional credit scores, its approach is not unique. Around the end of 2011, for instance, tech blogs buzzed with reports about a company called Lenddo that was trying to do something similar in the USA – although they seem to be mainly operating in the Philippines and Colombia. But perhaps the most significant of Wonga’s technological rivals is a German based company called Kreditech, recent recipients of investment totalling around £5million. It specialises in what it calls ‘big data scoring’ and assesses potential borrowers by analysing exactly the same number of data points as Wonga. It claims that its method can potentially do without conventional credit references altogether. Kreditech is a little more open about its methods, which means it’s possible to speculate by analogy a little more about other kinds of leaky data that Wonga might be using. This includes things like users’ shopping behaviours on other sites, the apps installed on their devices, and the precise way in which customers move around a site’s webpage.

Kreditech themselves are currently busy preparing to launch new payday loan sites in Australia and across Eastern Europe and Central and Southern America, to add to their existing sites in the Czech Republic, Poland and Spain. Its sites also invite users to establish a connection with their Facebook accounts, this time encouraged by a financial incentive. The Polish site, for instance, offers to knock 25 zloty (around £5) off the final repayment amount for users who connect via Facebook. And its app works just fine, with users receiving the following notification, which they are asked to ‘Okay’:

Kredito24.pl would like to access your public profile, friend list, email address, custom friends lists, messages, News Feed, birthday, chat status, work history, status updates, checkins, education history, groups, hometown, interests, current city, photos, website, personal description, likes and your friends’ birthdays, work histories, status updates, checkins, education histories, events, groups, hometowns, interests, current cities, photos, websites, personal descriptions and likes.

Be sure to read this list very slowly. This lender is quite open about its interest not only in a staggering amount of information about you, but also about your friends. Okay?

Different ethical engagements

I am not pointing all this out so that we can marvel at the ingenuity of such companies. That the high interest rates charged by payday lenders are becoming normalised and internationally formalised in this way matters a great deal, certainly. Given the brazenness of such requests, we should also think carefully about how such technologies sit within what academics like Jason Pridmore have identified as the rise of ‘consumer surveillance’. At the same time, it is also important not to overstate their current power. Just because companies like Wonga and Kreditech clearly see the potential of such technologies, it doesn’t mean they are yet fully exploiting them. Aspects of these systems are clearly new and under development, while analysing and understanding such a deluge of information will not be straightforward.

That said, what is unambiguous about these new approaches to credit assessment is their ambition. It is therefore vital to understand what exactly is being attempted by Wonga and its technological rivals. For, when compared to assessments using conventional credit ratings, such systems imply quite different ethical engagements between lender and borrower. Credit ratings have often been talked about by sociologists as eliciting practices of ‘entrepreneurial self-government’. This is the phenomenon of individuals’ actions becoming ever more oriented towards these hugely powerful but often hard to understand credit scores. In this situation, it is increasingly difficult to resist the pull of the credit score, and the desire to improve it, if we want to function successfully in such a credit led society. The kind of assessments that I have explored here are, however, of a different order. I will touch on just two of the questions that they raise.

First, is it fair that those who do not need to use such potentially expensive sources of credit do not have to submit themselves to this highly personal, often surreptitious form of scrutiny – which might even extend to the analysis of their friends? And, second, what would happen if such practices were extended to other lending arenas? Would I have to worry about which browser to use when approaching a lender? Should I include my educational history on Facebook? Which internet provider should I sign up to? What new information am I leaking which I didn’t know about before? I am certain that, if mortgage applications were being influenced by our unwittingly leaked data, we would have heard far more about these kinds of practices. That we haven’t should give pause for thought.

Acknowledgements

Thanks to all those friends who participated in the Facebook poll, as well as for their technological support. Thanks also to Paul Langley, Liz McFall, Jose Ossandon, Bernhard Rieder and Lonneke van der Velden for their comments on an earlier draft. Image by Dave R Farmer, used under a Creative Commons license. This article is co-posted with Estudios de la Economía.

EDIT: I just came across this excellent article in Slate which also explores these technologies in some detail.