Algorithms are judging you on your credit scores. But are they getting it right?
Money2020, the largest finance tradeshow in the world, takes place each year in the Venetian Hotel in Las Vegas. At a recent gathering, above the din of slot machines on the casino floor downstairs, cryptocurrency startups pitched their latest coin offerings, while on the main stage, PayPal President and CEO Dan Schulman made an impassioned speech to thousands about the globe’s working poor and their need for access to banking and credit. The future, according to PayPal and many other companies, is algorithmic credit scoring, where payments and social media data coupled to machine learning will make lending decisions that another enthusiast argues are “better at picking people than people could ever be.”
Credit in China is now in the hands of a company called Alipay, which uses thousands of consumer data points—including what they purchase, what type of phone they use, what augmented reality games they play, and their friends on social media—to determine a credit score. In a culture where the elderly casually pull out their phones to pay for groceries and even the homeless don QR codes to accept donations, there’s plenty of data to draw on. And while the credit score can dictate the terms of a loan, it also acts as a proxy for general good character. In China, having a high credit rank can help your chances of accessing employment, for example, or of getting a visa to travel within Europe, and even finding a partner via online dating. One Chinese dating site, Baihe.com, offers greater visibility to users with high credit scores.
And all of it is dictated by the algorithm.
The decisions made by algorithmic credit scoring applications are not only said to be more accurate in predicting risk than traditional scoring methods; the technology’s champions argue they are also fairer, because the algorithm is unswayed by the racial, gender, and socioeconomic biases that have skewed access to credit in the past. It might not be clear why playing video games, owning an Android phone, and having 400 Facebook friends can help to determine whether or not a loan application is successful, but a decade after the financial crisis, the logic goes, we need to trust that the numbers don’t lie.
Alipay isn’t alone. Aside from Chinese competitors like WeChat Pay, other companies are using machine learning to make lending decisions in Sub-Saharan Africa. One such company, called Branch, is capitalizing on mobile phone adoption in Kenya, drawing on data gleaned from the hugely popular mobile payments platform M-Pesa to devise credit scores. And of course, algorithmic credit scoring isn’t confined to emerging credit markets. In Germany, Kreditech, a lending service determined to build the “Amazon for consumer finance,” is moving away from traditional metrics such as repayment histories to mine the personality clues hidden in the Facebook data its customers surrender. Meanwhile, a U.S. company called ZestFinance uses big data to target customers whose ratings arguably never recovered from the subprime mortgage crisis.
As Schulman’s Money2020 speech suggests, algorithmic credit scoring is fueled by a desire to capitalize on the world’s ‘unbanked,’ drawing in billions of customers who, for lack of a traditional financial history, have thus far been excluded. But the rise of algorithmic credit also responds to anxieties in developed economies, particularly in the aftermath of the financial crisis. A decade post-crash, there’s a whiff of hope that big data might finally shore up the risky business of consumer credit everywhere. Whether we ought to have faith in that promise remains an open question, and one that is hard to answer given the impenetrability of machine learning.
In 2002, J.P. Martin, an executive at Canadian Tire, began to analyze transactional data from the previous year. The company sold sports and recreation equipment, homewares, and automotive supplies, and issued a credit card that was widely accepted. By examining transactional histories, Martin traced correlations between the purchases that customers made and the likelihood they would default on their repayments. Responsible and socially-orientated purchases such as birdseed or tools to remove snow from roofs correlated with future creditworthiness, while cheap brands of motor oil indicated a higher likelihood of default.
Shortly afterwards, some credit card companies began using these and other discoveries to scrutinize their customers. In the US, every transaction processed by Visa or MasterCard is coded by a “merchant category”: 5122 for drugs, for example; 7277 for debt, marriage, or personal counseling; 7995 for betting and wagers; or 7273 for dating and escort services. Some companies curtailed their customers’ credit if charges appeared for counseling, because depression and marital strife were signs of potential job loss or expensive litigation.
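To make the mechanics concrete, here is a minimal Python sketch of how a card issuer might scan a statement for charges in watched categories. The four codes and their labels come from the passage above; the watch list, the sample statement, and the function name `flagged_charges` are hypothetical and do not describe any real issuer's rules.

```python
# Merchant category codes quoted in the article; labels follow its wording.
MERCHANT_CATEGORIES = {
    "5122": "drugs",
    "7277": "debt, marriage, or personal counseling",
    "7995": "betting and wagers",
    "7273": "dating and escort services",
}

# Hypothetical watch list: categories a lender might treat as warning signs.
FLAGGED_CODES = {"7277", "7995"}


def flagged_charges(transactions):
    """Return the transactions whose category code is on the watch list."""
    return [t for t in transactions if t["mcc"] in FLAGGED_CODES]


# Made-up statement for illustration only. The code 5251 is not from the
# article; it simply stands in for an unremarkable purchase.
statement = [
    {"merchant": "Corner Pharmacy", "mcc": "5122", "amount": 18.40},
    {"merchant": "Family Counseling Center", "mcc": "7277", "amount": 120.00},
    {"merchant": "Hardware Store", "mcc": "5251", "amount": 34.99},
]

for charge in flagged_charges(statement):
    label = MERCHANT_CATEGORIES.get(charge["mcc"], "unknown")
    print(f'{charge["merchant"]}: {label}')
```

The point of the sketch is how little it takes: a four-digit code attached to each purchase is enough to single out a counseling session, with no knowledge of why the customer was there.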
While companies are generally up-front about what data is input to refine and upgrade the decision-making processes, the black box of the algorithm dictates that no one person really knows what data, or what combinations of data, will prove significant. With a little trial and error, for example, Joe Deville, a researcher at Lancaster University in the U.K., discovered that simply changing the screen resolution on his phone seemed to result in a different score for some algorithmic lenders, while others have suggested that actions as mysterious as charging your phone more often may produce a more favorable result. Meanwhile, the chief executive of Branch speaks whimsically of their machine-learning algorithm as a “robot in the sky,” a kind of AI fairy that makes lending decisions based on whether its users are naughty or nice. If you’re unhappy with the number that emerges from the black box, there’s little you can do to change or dispute it.
Algorithmic credit scores might seem futuristic, but these practices do have roots in credit scoring practices of yore. Early credit agencies, for example, hired human reporters to dig into their customers’ credit histories. The reports were largely compiled from local gossip and colored by the speculations of the predominantly white, male, middle-class reporters. Remarks about race and class, asides about housekeeping, and speculations about sexual orientation all abounded. One credit reporter from Buffalo, New York noted that “prudence in large transactions with all Jews should be used,” while a reporter in Georgia described a liquor store he was profiling as “a low Negro shop.” Similarly, the Retail Credit Company (now Equifax), founded in 1899, made use of information gathered by Welcome Wagon representatives to collate files on millions of Americans for the next 60 years.
By 1935, whole neighborhoods in the US were classified according to their credit characteristics. A map from that year of Greater Atlanta comes color-coded in shades of blue (desirable), yellow (definitely declining) and red (hazardous). The legend recalls a time when an individual’s chances of receiving a mortgage were shaped by their geographic status. The neighborhoods that received a hazardous rating were frequently poor or dominated by racial and ethnic minorities. The scoring practice, known today as redlining, acted as a device to reduce mobility and to keep African American families from moving into neighborhoods dominated by whites.
The Fair Credit Reporting Act in 1970 and the 1974 Equal Credit Opportunity Act were attempts to rectify these discriminatory practices. Today, or so the fintech narrative goes, we have detailed and unbiased scoring algorithms that are perceptually blind to gender, class, and ethnicity in their search for a creditworthy individual. And yet a growing body of research suggests that the way algorithms classify and make decisions mirrors these historic geographies of exclusion, leading academics such as Cathy O’Neil and Frank Pasquale, who study the social, economic, and political effects of algorithmic decision making, to point to emergent practices of “weblining,” where algorithmic scores reproduce the same old credit castes and inequalities. Because these systems learn from existing data sets, it often follows that existing bias shapes what the machine decides is good, bad, normal or creditworthy.
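A toy Python sketch can show how this happens even when a model never sees a protected attribute. Everything here is an assumption made for illustration: the eight hand-made records, the neighborhood and group labels, the 0.5 threshold, and the "model" itself, which merely memorizes past approval rates per neighborhood. It is not how any lender named in this article works.

```python
from collections import defaultdict

# Synthetic, hand-made history of past lending decisions (not real data).
# Each record: (neighborhood, group, approved). "group" stands in for a
# protected attribute that the model below never looks at.
HISTORY = [
    ("north", "A", True), ("north", "A", True), ("north", "A", True),
    ("north", "B", True),
    ("south", "B", False), ("south", "B", False), ("south", "B", True),
    ("south", "A", False),
]


def train(history):
    """Learn the past approval rate for each neighborhood, ignoring group."""
    counts = defaultdict(lambda: [0, 0])  # neighborhood -> [approved, total]
    for neighborhood, _group, approved in history:
        counts[neighborhood][0] += int(approved)
        counts[neighborhood][1] += 1
    return {n: ok / total for n, (ok, total) in counts.items()}


def approve(model, neighborhood, threshold=0.5):
    """Approve applicants whose neighborhood has a high enough past rate."""
    return model.get(neighborhood, 0.0) >= threshold


def approval_rate_by_group(model, applicants):
    """Share of each group that the group-blind model would approve."""
    tally = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for neighborhood, group in applicants:
        tally[group][0] += int(approve(model, neighborhood))
        tally[group][1] += 1
    return {g: ok / total for g, (ok, total) in tally.items()}


model = train(HISTORY)
applicants = [(n, g) for n, g, _ in HISTORY]
print(approval_rate_by_group(model, applicants))
```

On this made-up data the model approves 75 percent of group A and 25 percent of group B, although group membership was never an input. Neighborhood does the work as a proxy, which is the dynamic the term “weblining” describes.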
These systems are fast becoming the norm. The Chinese government is now close to launching its own algorithmic “Social Credit System” for its 1.4 billion citizens, a metric that uses online data to rate trustworthiness. As these systems become pervasive, and scores come to stand for individual worth, determining access to finance, services, and basic freedoms, the stakes of one bad decision are that much higher. This is to say nothing of the legitimacy of using such algorithmic proxies in the first place.
While it might seem obvious to call for greater transparency in these systems, with machine learning and massive datasets it’s extremely difficult to locate bias. Even if we could peer inside the black box, we probably wouldn’t find a clause in the code instructing the system to discriminate against the poor, or people of color, or even people who play too many video games. More important than understanding how these scores get calculated is giving users meaningful opportunities to dispute and contest adverse decisions that are made about them by the algorithm.
Maybe then we can really see if these systems are giving credit where credit is due.