Here’s how mainFT covered April’s UK CPI numbers:

UK inflation dropped less than forecast to 2.3 per cent in April, in a blow to hopes that the Bank of England would be ready to cut interest rates as soon as next month.

Forecasts matter: using them, the inflation print gets immediately contextualised in terms of the forecast-based framework of data-led monetary policy. In other words, it means reporting looks forward from the data for effects, rather than backwards for often-trivial comparisons.

Where do these forecasts come from? Although plenty of attention gets directed towards central bank predictions (when they’re available, and especially when they’re wrong), the comparators typically used by the financial press are derived from guesses submitted by economists and compiled by Reuters or Bloomberg.

In the case of UK macro stats, most of these economists work for familiar names: lenders like Bank of America, Barclays, Goldman Sachs and SocGen, or consultancies like Capital Economics, Pantheon Macroeconomics, and EY. Some work for names that are possibly less well-known to Brits, such as Colombia’s Acciones Y Valores, Poland’s Bank Gospodarstwa Krajowego and Switzerland’s Zurcher Kantonalbank.

Are these economists good at guessing outcomes? It’s complicated.

Let’s look at the basics first. Bloomberg uses a median figure from all of these economists’ predictions as its consensus figure, which shows up on its ECO screens. The outcome, and the predictions, go to one decimal place. Here’s how the survey looked for April’s CPI data (yes, this article took ages to do), which was a big miss:

We’ve crudely recreated that histogram, sans the distribution curve — hover or prod to see which firms/economists were in each bucket:

Even in this single example, there’s… a lot to unpack.

Clearly, April was a bad outing for the sellside, who as a pack overestimated the drop in inflation. Only Philip Shaw and Sandra Horsfield from Investec called the headline number correctly. The presence of a 1.5 per cent estimate, from Argyll Economics, is baffling.
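To make the mechanics concrete, here’s a minimal sketch — in Python, with entirely made-up submissions rather than the real April survey — of how that consensus figure comes together: take the median of the one-decimal-place guesses on file and compare it with the print.

    # Illustrative only: hypothetical submissions, not the actual April survey
    from statistics import median

    submissions = {        # firm -> forecast, % year on year, to one decimal place
        "Firm A": 2.0,
        "Firm B": 2.1,
        "Firm C": 2.1,
        "Firm D": 2.2,
        "Firm E": 2.3,
    }

    consensus = round(median(submissions.values()), 1)  # Bloomberg's consensus is the median
    actual = 2.3                                         # April's print
    surprise = round(actual - consensus, 1)              # positive = hotter than expected

    print(f"consensus {consensus}, actual {actual}, surprise {surprise:+.1f}pp")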

Let’s dig.

The sellside finally got its implied collective inflation call correct in May: the average of all responses gathered by Bloomberg was 2 per cent, and the reading was 2 per cent. Hooray!

The last time before then that the economists had called UK CPI correctly in aggregate was December 2022. In the 16 readings between then and May’s, every inflation reading beat or missed expectations:

‘Correct’ consensus is rare, to be fair. The Terminal has data for economist surveys of UK CPI going back to May 2003, since when there have been 253 monthly readings. During that time, the economists have collectively got the reading right only 63 times, a hit rate of about 25 per cent.

Here’s how that looks on a histogram…

…and, probably less usefully, as a timeline:

As a 12-month moving average calculated independently of the direction of the miss (so being 0.3 higher and being 0.3 lower count as the same amount of error), economist accuracy reached an all-time low last year, and is still pretty bad by historic standards:

This is obviously an over-simplistic framework — as the overall volatility of inflation increases, small errors look less acute. Being 0.1 per cent off in a month when inflation was flat, during a period of low inflation, is probably worse than being 0.1 per cent off in a month where inflation jumped 3 per cent year-on-year.

We could try to devise a better system, but it’s worth making the point that, for users of these surveys, an error is an error regardless of whether it occurs in an economic moment with a greater propensity for errors.
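For the curious, here’s roughly the sum behind that accuracy line — a 12-month rolling mean of the absolute consensus error — plus one candidate ‘better system’: scaling each miss by how choppy inflation itself was at the time. The column names and the volatility adjustment are our own assumptions, not anything Bloomberg publishes.

    # Sketch, not the exact FTAV calculation. Assumes a DataFrame indexed by month
    # with columns "actual" and "consensus" (both % year on year).
    import pandas as pd

    def accuracy_measures(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["abs_error"] = (out["actual"] - out["consensus"]).abs()  # direction ignored
        out["ma12_error"] = out["abs_error"].rolling(12).mean()      # the simple measure
        # One crude refinement: deflate each miss by the trailing volatility of the
        # series itself, so errors made in turbulent months count for less.
        vol = out["actual"].diff().abs().rolling(12).mean()
        out["scaled_error"] = out["abs_error"] / vol
        return out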

What effects do these errors have? From the perspective of a financial blog, there are two main ones:
— They provide journalists with exciting copy
— They produce negative financial outcomes for people who traded on the assumption that the consensus was correct

Thinking deeply about the first one isn’t worth anyone’s time.

The second is more interesting. Let’s hypothesise:

— Some economists are better at guessing inflation than others.
— Following these economists individually and basing trades on their predictions would produce better investment outcomes than if you followed their rivals.
— Some economists may be better at guessing inflation than the aggregate.
— Following these economists individually and basing trades on their predictions would produce better investment outcomes than if you followed the consensus.
— Some economists may be better at guessing inflation than other economists, but worse than the aggregate of all economists.
— A consensus drawn from a basket composed of the best (ie most accurate) economists should be better than a consensus drawn from all economists.

How would one form such a custom basket? The obvious system would involve scoring economists based on how they did at guessing inflation.

Bloomberg provides this service, sort of. All economists who submit estimates to the Borg get a score, subject to certain criteria. Here’s how the leaderboard looked for UK CPI following the April release:

The Terminal’s user guide says this screen:

assists you with deciding who to follow to help shape your expectations of future releases.

Only the top seven are ranked. To understand why, we need to read Bloomberg’s methodological notes. They say:

Ranks are shown for the top 10 qualified (meets inclusion rules) economists, or 20% of qualified economists, whichever is lower.

Qualified economists meet the following standards:
— Minimum number of submitted forecasts: At least 62.5% out of the total number of qualified releases during the two year period prior to the release date under consideration.
— Consecutive forecast minimums: For weekly indicators, two forecasts within the last eight qualified releases. For all other indicators, two forecasts within the last six qualified releases.
— All indicators: At least one forecast in last three qualified releases.

There are 54 firms on the list, so the seven ranked appear to represent 20 per cent of the roughly 35 firms that qualified for ranking at the time.
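In code, that cut-off is just a min(); assuming the roughly 35 qualifiers above, you rank whichever is smaller of ten or a fifth of the field.

    # Our reading of Bloomberg's rule: rank min(10, 20% of qualified economists)
    def ranked_slots(qualified: int) -> int:
        return min(10, int(qualified * 0.20))

    print(ranked_slots(35))  # -> 7, matching the seven names ranked after April's release
    print(ranked_slots(54))  # -> 10, if every firm on the list had qualified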

(Quick note: we made these charts before the May release so they’re mildly out of date, and we should note that TD Securities is now ranked #1*)

Setting aside the fact that Bloomberg’s own economists came top of a ranking Bloomberg created (👀), the obvious questions are these: how would an aggregate of the top seven have performed? Are these good scores? And how is everyone else, outside the top seven, doing?

We can answer the first one pretty easily, with these caveats:
— Robert Wood recently moved from Bank of America to Pantheon Macroeconomics, while Sam Tombs has moved to covering the US for Pantheon, so although Wood’s guesses form a continuous series, it’s worth noting that most of them were made at his former employer.
— Dan Hanson was submitting forecasts solo for the Borg from 2016-22, before joining forces with Ana Andrade and Niraj Shah. We’re going to combine those into a single series (see the sketch after this list).
— We’ll have to limit our series to the last few years to avoid the pack thinning out too much.
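For transparency, this is roughly how we stitched the moved economists into continuous series — a sketch in pandas, with illustrative column names rather than the Terminal’s actual field names.

    # Prefer the newer affiliation's forecast for each month, fall back to the older one
    import pandas as pd

    def stitch(forecasts: pd.DataFrame, old_col: str, new_col: str, name: str) -> pd.Series:
        """forecasts: one row per release, one column per firm/economist; NaN = no submission."""
        return forecasts[new_col].combine_first(forecasts[old_col]).rename(name)

    # wood   = stitch(forecasts, "Bank of America", "Pantheon Macroeconomics", "Robert Wood")
    # hanson = stitch(forecasts, "Dan Hanson (solo)", "Bloomberg Economics", "Hanson et al")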

Here’s the overall distribution of responses from this group, the UK CPI Magnificent Seven (CM7) as of April, versus the actual:

And here’s the median average of their responses against the actual, starting from January 2020 — the first month when at least five of them submitted guesses (an entirely vibes-based threshold) — and their spread performance against the whole pack:

TL;DR: Averaging only the top-ranked economists (as of April) would generally have produced better results than averaging all of them over the past four years or so. Hedge funds, if you’d like to pay us for this retrofittable wisdom, please get in touch.**
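For what it’s worth, here’s a rough sketch of that comparison, assuming you have everyone’s monthly forecasts in a wide DataFrame (one column per firm) alongside the actual prints. The CM7 list and column names below are illustrative stand-ins, not a point-in-time copy of Bloomberg’s ranking.

    # Sketch of the CM7-versus-pack comparison; firm list and column names are assumptions
    import pandas as pd

    CM7 = ["TD Securities", "Bloomberg Economics", "Itau Unibanco", "Pantheon",
           "Citi", "Bank of America", "Investec"]  # stand-ins for the top-ranked firms

    def consensus_errors(forecasts: pd.DataFrame, actual: pd.Series) -> pd.DataFrame:
        """forecasts: one row per release, one column per firm (NaN = no submission)."""
        top = forecasts[[c for c in CM7 if c in forecasts.columns]].median(axis=1)
        pack = forecasts.median(axis=1)
        return pd.DataFrame({
            "top7_abs_error": (top - actual).abs(),
            "pack_abs_error": (pack - actual).abs(),
        })

    # errors = consensus_errors(forecasts, actual)
    # print(errors.mean())  # the lower mean absolute error is the better consensus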

The other questions (are these good scores?/how are the people without a rank doing?) are a bit harder, and require even closer inspection of the sausage-making process.

Bloomberg’s in-house economics team held the top rank with a score of 71.58. How is that calculated? Borg sayeth:

A “Z-score” based statistical model… is employed to calculate the probability of the forecast error. The score is then equated to the probability of the forecast error being larger than the observed error for the given economist.

If the economist’s prediction is perfect (zero error), then by definition the probability is 100%, and this would become the score. Conversely, if the error is very large, the probability value would be low, resulting in an expectedly low score. The period-specific scores are then averaged to form an overall score for each economist to arrive at the final economist score per indicator.

Essentially, Bloomberg compares each economist’s error against an assumed normal distribution to arrive at a “probability score” — the “probability” that a forecast error would be larger than the one actually observed for a given guess. We spoke to some statisticians, who called the probability score an arguably unnecessary extra step (one could just report the Z-score), but said the approach was ultimately statistically sound and helpful for comparing across indicators as diverse as CPI inflation and employment reports.
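Our best guess at the arithmetic, with a heavy caveat: Bloomberg doesn’t say exactly what the errors are standardised against, so the choice below (the standard deviation of all economists’ errors for that release) is our assumption — and presumably part of why our recreation, described below, never matched perfectly.

    # Hedged recreation of the probability score -- NOT Bloomberg's exact formula
    from scipy.stats import norm

    def probability_score(forecast: float, actual: float, error_std: float) -> float:
        """P(a forecast error would be larger than this one), assuming normal errors.

        error_std is our assumed standardisation step, eg the standard deviation
        of every economist's error for that particular release.
        """
        z = abs(forecast - actual) / error_std
        return 2 * norm.sf(z) * 100  # two-sided tail probability, expressed as 0-100

    # A perfect call scores 100; a miss several standard deviations out scores near 0.
    # Per-release scores are then averaged into each economist's headline number.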

Including only economists with a sufficient number of predictions is also statistically sensible, and listing only the top performers, rather than shaming the low scorers, is generous to the less accurate economists.

But this is Alphaville, and we believe in radical transparency (when our IT policy allows it).

So, to the best of our (limited) abilities, we attempted to recreate Bloomberg’s scoring system. But, unlike Bloomberg, we threw caution to the wind on sample size, on the grounds that even a single guess deserves to be celebrated (or shamed).

It didn’t go perfectly. Despite several weeks of work and consultations with statisticians and economists, we could not fully crack the Borg — the scores we generated were consistently a bit different.

BUT we did generate scores that follow the same internal logic spelled out in Bloomberg’s instructions, and that matched the point-in-time rankings on the Bloomberg terminal. To quote Michael Bloomberg’s (ill-fated) 2020 presidential campaign:

In God we trust, everyone else bring data

Basically, we tried. Is it the fairest possible assessment? Maybe not. Could we visualise it without creating something incredibly cursed? Of course not. Is it internally consistent? You betcha.

Here are the results up to April. Prepare for a scroll (use the controls to swap between ranks, which are much clearer, and scores):

We hope that was enjoyable, or at least functional.

What did we find out? The top spots have generally been held by TD Securities, Bloomberg, Itau Unibanco and Pantheon (both Tombs and, latterly, Wood), with Citi and Bank of America (ie Wood passim) not too far behind.

But even these titans of guesswork are prone to blunder. In March of this year, both Bloomberg and TD Securities majorly missed — getting a point-in-time score of just 34 per cent.

Post-Wood BofA is looking very strong, while Modupe Adegbembo made a solid start with her opening guess for Jefferies. (Both also got May’s print bang on, so we will watch their careers with great interest.)

Elsewhere, UBS was once towards the top, but their predictions have really dropped off: their average score has fallen from 58 per cent to 43 per cent over the past two years.

At the bottom of the current ranking are Natixis and Argyll Europe. Both have had spotty records, missing the target by such a large margin that they received scores of 0 on nearly half of their predictions. Argyll Europe has at least had some redeeming moments, as one of the few firms to perfectly predict February 2024’s reading. But Natixis has tended to be very far off. In fact, just guessing the prior month’s CPI reading each time over the past four years would have yielded a higher score than either firm’s average.
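That last-month’s-number baseline is easy to check in spirit, at least: a sketch below, reusing the probability-score idea from above and assuming you have a monthly series of actual prints (the standardisation, again, is our assumption).

    # Naive baseline: forecast each month's CPI as the prior month's print,
    # scored the same way as the probability-score sketch above
    import pandas as pd
    from scipy.stats import norm

    def naive_baseline_score(actual: pd.Series, error_std: float) -> float:
        naive_forecast = actual.shift(1)              # carry forward last month's reading
        z = (naive_forecast - actual).abs() / error_std
        scores = pd.Series(2 * norm.sf(z) * 100, index=actual.index)
        return float(scores.dropna().mean())          # average per-release 0-100 score

    # A firm whose average sits below this number has, on this measure, done worse
    # than simply repeating the previous month's reading.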

Swiss Life Holding AG also has a patchy record. They have only made eight predictions in the past four years, most of which have been very poor. But their star is rising — they got CPI perfectly right in September 2023, and have recently had a better hit rate.

Though we highlight Swiss Life and Argyll’s shoddy performance, ultimately they deserve some praise for sticking in the game. Most bad predictors cut their losses far earlier: Commonwealth Bank of Australia, MUFG Bank, Sterna Partners and a couple of others have only logged a handful of predictions in the past few years, with scores ranging from 12 per cent to 36 per cent, and understandably pulled out soon after.

And there are also those who quit while they were ahead. Exoduspoint Capital got CPI right on the money in February of 2020, and then quit the UK inflation prediction game. We salute you.

So… we’ve written a lot of words and made several charts. Is there any meaningful takeaway from all this?

Well, we promised we wouldn’t get caught up on media ethics, but it’s at least interesting that the default yardstick against which an economic data release is deemed good or bad is (in this example, anyway) partially built from such mixed components.

Otherwise, it’s simply hard proof that there are material differences between research outfits, and further evidence of Borg supremacy. Oh well.

Further reading
The mystery of the £39 orange (FTAV)


*Latest official table here:

**We assume that anyone who trades based on survey average vs actual print has already figured out a way of improving the composition of that survey, but to reiterate: we would accept the money.

Copyright The Financial Times Limited 2024. All rights reserved.