Big Data or big problem?




Lionel Messi equalises v Chelsea in the 2018 Champions League Round of 16 tie at Stamford Bridge © UEFA.com

There is no doubt that sports science in football can get players fitter quicker, pre-empt injury and avoid over-training. But when it comes to the grander claims that the appliance of science can engineer the randomness out of match preparation and player recruitment I remain to be convinced.



As a movement, the weight of numbers is reaching critical mass, and as Big Data enjoys its moment in every other area of life, its adoption within football seems inevitable, because everyone appears to have skin in the game.

The academics crave the reflected glamour of the beautiful game. The VC-funded start-ups smell the money. The journalists crave stats that inform their features. The fans want a ready-reckoner description of what they’re watching. And then there are the football people themselves. They want to appear progressive, intelligent and modern in their embracing of technology.

The sales pitch is as slick as it is relentless, but it is hard to escape the impression that the game itself is being short-changed by a hotchpotch of half-baked ideas portrayed as applied business intelligence.

Sharing my scout’s scepticism is Dr Robert Webb of the University of Nottingham.



As the UK’s pre-eminent academic in his field, the Associate Professor in Banking is in an excellent position to talk with some authority. Both his parents were scouts and he spent his teenage years as a talented ball-playing centre half on the books at Nottingham Forest and Leeds United. He says that limitations of perception are very evident on both sides.

“There are sticking points with quants in football. Managers are seduced by reducing uncertainty in outcome (as we all are) and that is what they feel they’re buying. I’m not convinced that they fully understand what the modelling means or that how it is interpreted is a matter of opinion rather than a matter of objective fact.”

He says: “A lot of the current analysis smacks of people trying to make a quick buck as football has so much money swilling around within it.”

As Webb explains, financial institutions ‘measure the measurables’ and will have over a million data points from decisions made in the past to extrapolate from. Football does not have a massive database of similar, relevant data that can act as a control group. And only a small amount of what a player really does can be measured or modelled.

He says: “Human behaviour is way more complicated than just how many tackles and how far did you run. The interaction with your 11 teammates is the key – not individual actions.”



In short: stats form a useful topography for describing surface-level events within the game you’ve seen. But they are of negligible value for predicting what is going to happen in the game you are about to watch.

It is an observation that points up the limits of the current obsession with big data solutions and football analytics.

Data points, recording physical things that happen in a game, are outputs. What they are not is outcomes. Correlation does not imply causation, and because football is a game played by players and teams in opposition, what happens in specific games may not be relevant to how teams or players react in other scenarios, in other games and against other direct opponents.

Take a player’s movement, for example. In a game where even the best players rarely take more than 50 touches per game and where players can expect to be in possession for somewhere around 2-3 minutes per ninety, what happens off the ball generally conditions how good teams and players are and how the game pans out.

Where players run to, when they run, how they support their teammates, how they control space. This all conditions when and where chances are created – at both ends of the field.

And of course, it is the team without the ball that sets the tempo of the game because where, how, when and how fast they try to regain possession creates the dynamics of every match. This is Newton’s Third Law of Motion in action. Formally stated it is: “For every action, there is an equal and opposite reaction.” All the way down the chain. And nothing happens in isolation.

In football terms that reaction can vary from an imperceptibly hurried pass or turnover of possession to the creation of a chance in reaction to a half-hearted press. Analytics can tell us what has happened (in the broadest sense) but what it can’t do is tell us why it happened or even if a specific event is important as a predictor of future events.

A player’s ability off the ball simply can’t be measured objectively because movement in football is an activity whose meaning is typically obscure, potentially even unconscious, and also conditioned by an adherence to specific pre-game or in-game instructions (or not) that the observer is not privy to.

The temptation is to look at games as a rigid script: 4-4-2 v 4-2-3-1, a game of personal duels and the outputs from yards run, passes, tackles and shots, a puzzle to be solved. The reality is that a football match is more like a musical score, interpreted in the moment by players. Almost all of the time football is bad jazz rather than good time rock ‘n’ roll and it is a wholly different experience live than ‘on the record’.

However, we’ve been sold the argument so well that human beings are inherently biased and that this is a bad thing, that very few people, and particularly those involved in football, implicitly trust the evidence of their eyes. Even when that’s exactly what they should do.

And yet anecdote based on the scout’s recognition of situations they’ve seen recur over years of watching matches through the lens of their own eyes does create insights and experience. It’s a form of bias, of subjectivity, that is actually worth its weight in gold, running contrary to technocratic wisdom.

Take, for example, this anecdote told to me by Sir Alex Ferguson, based on his immersion in the collected wisdom of the great Scottish scouts John Barr and Jimmy Dickie as a young manager at St Mirren.

He said: “I remember when David Beckham was just a young boy his parents came to see me and asked if I’d consider releasing David. He was still quite small and their concern was that it would break his heart not to make it at United and that it might be better to release him early. I assured them though that David Beckham could have a career at United and also that he would grow to be able to compete at the top level. I’d like to think that view was in some part shaped by the experience of watching young players with John when I was starting out as a manager in Scotland.”

The $60m question here is would David Beckham have been passed over by a more ‘scientific’ rendering of his ability and potential? We’ll never know but there is some evidence to suggest that the impact of ‘the numbers’ is far from benign, or ‘value free’.

Johan Cruyff is on record as saying: ‘I find it terrible when talents are rejected based on computer stats. Based on the criteria at Ajax now I would have been rejected.

Johan Cruyff outpaces Berti Vogts in the 1974 World Cup Final © Bundesarchiv

‘When I was 15, I couldn’t kick a ball 15 metres with my left and 20 with my right. My qualities, technique and vision, aren’t detectable by computer.’

And of course, there is no reason to believe that the same fallacies are not being perpetrated again and again in the recruitment and evaluation of players of all ages today.

Ironically, it is Arsene Wenger, the English game’s greatest advocate of data, who somewhat lets the cat out of the bag when discussing the transfer recommendations of StatDNA, the company Arsenal purchased for £2.1m to be their in-house assistants on recruitment and analysis.

As reported in The New York Times: “When Wenger mentioned a talented young wing at Spain’s Real Sociedad, he was told that his metrics were not overly impressive. Wenger smiled, and remarked that he would be keen to see how the player, Antoine Griezmann, now one of the most coveted strikers in Europe, had developed.”

Modelling outputs as if they are objective records of outcomes creates its own law of unintended consequences because favouring outputs is a credo that creates athletes rather than natural footballers. That’s players who play the numbers to the exclusion of ‘being brave’ and taking legitimate risks. Antoine Griezmann failed Arsenal’s test but Gabriel Paulista, shipped out to Valencia, did not, with Mustafi and Elneny also said to be StatDNA-promoted. Arsenal are yet to deliver a poster boy for their current scouting method.

Gabriel Paulista signed for Arsenal in 2015 on the recommendation of StatDNA © Arsenal.com

“We look at interceptions, defensive errors, winning tackles, set piece receptions. Gabriel has good stats,” Wenger said at the time he signed Paulista. “Of course, it is difficult to watch all the games. But what I mean is that if the numbers confirm the eye, it gives you more. If a guy comes home and says ‘I’ve seen a good player’, you can statistically observe this player for five, six, seven games.”

Yet look at the quoted criteria and, in isolation, they throw up more questions than answers. What if his best work is done clearing up his own mess from poor positioning or reading of developing situations? How would a player that doesn’t dive in but uses his body and intelligence compare? And how would the fact that the player didn’t speak a word of English on his arrival in London impact on his stats? In short, how would he fit in as a teammate, part of a defensive unit? Only a scout in the ground could really hazard a guess as to how these questions might work out. There is a danger that the generated raw data acts not as reassurance, a second opinion or as due diligence, but as a toxic red herring.

It is a subtle but no less significant dynamic. Let’s say that the modelling of a player’s profile is irrelevant to a successful signing. But as the clubs’ buy-in is so significant and because objective tech is viewed as superior to subjective eye there is no doubt that seductive stats have a pernicious, corrosive influence on sound opinions formed of experience.

Sarah Rudd from StatDNA explains the work of her company in a presentation at the 2016 Knowledge Discovery and Data Mining conference.

Remember that good scouts will see game intelligence in action whenever it occurs – even if it doesn’t lead to a positive outcome or indeed any outcome at all. Sometimes body language, a panicked look, a shout or gesture will speak far more eloquently than numbers.

Similarly, experienced scouts can also recognise when a player of either low confidence or dubious character is deliberately closing down his own space (‘hiding’) because he doesn’t actively want the ball. A scout that knows what he’s watching will apply praise and condemnation where it is due. A cocksure PhD with a video monitor probably wouldn’t register this unfolding, never mind be able to model it.

Another apposite example is the stats-head’s obsession with Lionel Messi’s lack of distance covered per game, relative to others. The implication in terms of outputs is that a lack of measurable interventions is somehow indicative of a lack of effort, commitment or effectiveness.

A colleague of mine sums up the ‘Messi anomaly’ rather well when he says: “Messi isn’t considered the best just because he has great ball control, unparalleled dribbling skills, and amazing finishing and passing. He’s the best because he’s always in the right position, thinking, watching, imagining, studying.”

Lionel Messi’s moments of magic are conceived in moments of instinct, muscle memory, supreme pattern recognition. And how do you measure that? It exists at a level beyond most humans’ comprehension, never mind direct experience. It can’t be reduced to rows on a spreadsheet – ‘measured’, even when it can be clearly described, or sifted for relevance.

The key question is not the pace or distance of Messi’s movement but how he moves, where he moves to and how that prompts his teammates and the defenders trying to stop him.

Champions magazine, UEFA’s sadly defunct house organ of the Champions League, produced a fantastic analysis of this point in breaking down a typical Barcelona goal by David Villa against Man United in the 2011 Wembley Champions League Final.

You’ll have to watch the relevant footage a few times to catch it. And you’ll be rewarded with the perfect encapsulation of instinctive football intelligence. The goal is actually made by Messi simply taking two steps to the right to create an angle for a through ball in the lead-up to Barca scoring and then unsighting the keeper by moving across his eyeline as Villa curls it around him and inside the goalkeeper’s left-hand post.

Make no mistake, Messi’s two steps made the goal. Yet it wouldn’t register as significant on any conventional metric. And indeed, I had to watch the clip a few times from the various angles, after reading about it, just to really see it for what it was.

It is typical though that ‘expert’ people can measure everything yet understand the value of nothing. It is the classic case of the madness of the Prozone effect: measuring outputs as outcomes and ‘more’ as ‘better’, ‘less’ as ‘worse’. It promotes measured quantity as a substitute for elusive quality simply because ‘the data funnel abhors a vacuum’.

Does it really matter? Is Messi simply an outlier, an exception to the rule?

While the answer to both questions is ‘yes’, it doesn’t alter the fact that this MBA corporate approach to analysing on-field action and informing multi-million-pound recruitment decisions is harming the game we all love. This football business is killing the business of football. And the game you see as fans is a direct reflection of this ideological battle between applied emotional intelligence and commercially motivated propaganda, where the latter has the upper hand in terms of prestige and resources.

In Issue 4 of Nutmeg Joel Sked suggested that the way ahead was ‘buying a couple of nerds, a few laptops and a Wyscout subscription’. My position, in a world where that’s exactly what everyone else is doing, is that this is the last thing clubs should currently do. Better to host your own party than be late to everyone else’s.

My advice: invest instead in people, the trusted and biased eyes of a band of committed, all-weather scouts, experts with the game’s lifeblood coursing through their veins.

Good eyes don’t need an algorithm to recognise that the left back over-covers whenever the ball is switched to his side of the park and that the inconsistency of the opponent’s winger is a function of the fact that he goes to pieces whenever he’s on the same side as the main stand boo boys within his club’s support. Good eyes’ insights are delivered with minimal fanfare but their value, when acted upon, will win you games.

Greg Gordon has over a decade’s experience as a next opponent analyst. He works with www.prep4pro.com and is the creator of www.howtowatchfootball.co.uk.

This feature originally appeared in Issue 6 of the Scottish Football periodical Nutmeg and is supported by the podcast above. You can find out more about Nutmeg here: https://www.nutmegmagazine.co.uk/

