The first instinct of a lot of academics, when given a natural language processor, is to feed it on something deeply unnatural. How else to explain the recent rash of research that pits AI against Fedspeak?
To be fair, there are very good reasons for wanting to ignore central bankers, and the desire to delegate the chore of listening to their strangulated equivocalities predates ChatGPT by at least a century. But it’s the possibility of automating the process that has been causing excitement recently — such as in papers here, here, here and here, as well as in an Alphaville post here.
Now it’s the subject of a flagship JPMorgan note that we can’t link to, so will summarise here. It opens with a big claim:
With central bank communications now at the frontlines of policy setting, everything from official policy statements to individual speeches are scrutinized for hints of policy guidance. It is against this backdrop that machine learning and natural language processing (NLP) find fertile ground.
The use of NLP to assess central bank communications has been around for some time. However, prior attempts failed to gain traction because they lacked the sophistication to generate actionable results. Simply put, the technology was not yet ready for primetime. This has changed. We believe NLP is ready for the successful application that many have long waited for.
Here’s what many have long waited for:
Y tho? If monetary policy guidance is of such fundamental importance, and if clear communication is the post-GFC prerequisite, why pass the job to an algorithm? JPMorgan suggests five reasons.
AI offers a second opinion to human economists; its interpretations are systematic and transparent; it’s quicker to reach a conclusion; it spits out metrics rather than essays; and its findings are invulnerable to retrospection. “While an economist can offer an informed judgment of a particular central bank speech, this assessment is often multi-dimensional and may be dependent on context which can be lost in a matter of weeks or even days,” JPMorgan says. “By contrast, the HDS is singular and permanent in the historical record, making it ideal for gauging how central bank thinking is changing over time and how it compares to past episodes.”
The HDS referred to above is the JPMorgan Hawk-Dove Score. It’s an upgrade of the bank’s 2019 attempt at Fedspeak processing using BERT, a language model developed by Google. The rebuild uses ChatGPT and in theory can make sense of any central bank on the planet, because it frames its whole world around three rules.
And here’s how it rates individual Federal Open Market Committee members, based on recent speeches, with positive numbers meaning hawkish and negative ones meaning dovish . . .
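The mechanics, as described, reduce to scoring each speech on a hawk-dove scale and averaging per speaker. Here's a minimal sketch of that shape — the actual HDS runs on ChatGPT and its prompt and scale are proprietary, so the keyword stand-in, the -100..+100 range and both function names below are illustrative assumptions only:

```python
# Toy sketch of a hawk-dove scorer. In JPMorgan's system the scoring step
# is an LLM call; here a crude keyword tally stands in for it. The keyword
# lists and the -100 (dovish) to +100 (hawkish) scale are assumptions.

HAWKISH = {"inflation", "tighten", "hikes", "restrictive", "overheating"}
DOVISH = {"accommodation", "easing", "slack", "patient", "downside"}

def hawk_dove_score(speech: str) -> float:
    """Score one speech from -100 (dovish) to +100 (hawkish)."""
    words = [w.strip(".,;:") for w in speech.lower().split()]
    h = sum(w in HAWKISH for w in words)
    d = sum(w in DOVISH for w in words)
    if h + d == 0:
        return 0.0  # no signal either way
    return 100.0 * (h - d) / (h + d)

def speaker_average(speeches: list[str]) -> float:
    """A committee member's rating: mean score over recent speeches."""
    return sum(map(hawk_dove_score, speeches)) / len(speeches)
```

Averaging per speaker is what lets the chart above rank committee members — and, as the Barr and Bullard caveats below show, it is also where off-topic speeches drag the average around.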
… which isn’t how humans see things at all. Bullard and Kashkari are generally considered the most hawkish, Cook is the dove and Barr stays in the middle of the pack next to Powell. Luckily, because the FOMC is the world’s most micro-analysed committee, it’s reasonably easy to guess where garbage in has become garbage out:
One reason Governor Barr comes across as the most dovish member of the FOMC is that he is the vice chair for supervision and many of his speeches are less relevant to macro monetary policy while also having numerous references to financial stability—a concept that can be interpreted as dovish (though not always). [ . . . ] President Bullard is often noted for having a wide range of views that are hard to pin down. Moreover, because he often presents in slide-format without a speech, he is harder to quantify.
The hawk-dove ratios for the European Central Bank and Bank of England committees also need humans to add context.
Schnabel is probably too near the centre of the ECB because her most hawkish speeches have all been recent, so they haven’t yet moved the average. Broadbent probably over-indexes as a dove because he speaks rarely and cagily. The opposite may be true of Pill and his unique presentation style. Etc.
Whether it’s possible to apply this level of granular analysis to less studied rate-setting committees is a question the paper does not investigate.
One interesting theme in JPMorgan’s hawk-dove study is that in all three committees examined, the chairs tilt hawkish. That’s a surprise, as chairs are expected to be in the middle of the pack, but is probably an accurate reflection of recent communications. It’s possible that because of the background noise, chairs are having to take a bigger role communicating their committee’s direction of travel — though a speech-by-speech analysis doesn’t make it easy to spot any pattern:
To be clear, the hawk-dove detector works. JPMorgan’s machine is at least as good as the average economist at identifying changes in the mood music:
But the paper only touches briefly on whether it’s a lead indicator or a temperature check, and its findings are complicated by periods when rates were zero-lower-bounded.
Broadly speaking, JPMorgan finds that when the three-month average of its hawkishness-of-speakers measure rises between meetings by 10 points, it’s worth roughly 10 basis points to short-term interest rates with a one-week lead. That’s what the chart below apparently illustrates:
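As back-of-envelope arithmetic, the rule of thumb above is just a trailing average, a difference between meetings, and a roughly one-for-one mapping to basis points. A sketch, where the sample readings and the exact 1-point-to-1bp mapping are illustrative assumptions rather than JPMorgan's data:

```python
# Rough illustration of the note's finding: a 10-point rise in the
# three-month average hawkishness score ~ 10bp of short-rate pressure,
# with a one-week lead. Sample numbers below are made up.

def rolling_mean(scores: list[float], window: int = 3) -> list[float]:
    """Trailing mean over the last `window` observations."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def implied_rate_move_bp(avg_prev: float, avg_now: float) -> float:
    """~1bp per point of change in the smoothed hawkishness score."""
    return avg_now - avg_prev

monthly_scores = [20.0, 25.0, 45.0, 60.0]   # hypothetical HDS readings
avgs = rolling_mean(monthly_scores)
move = implied_rate_move_bp(avgs[-2], avgs[-1])
```

The smoothing is doing real work here: a single hawkish speech barely moves a three-observation average, which is presumably why the signal is quoted on the smoothed series rather than raw scores.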
“Debate about these rankings is as likely as debate over who is the best footballer or baseball player,” JPMorgan says, accurately.
While ten basis points of performance is not to be sniffed at, the enterprise brings to mind sports performance metrics like expected goals, which often seem more useful for prolonging arguments than for predicting outcomes. And in the end, isn’t a prolonged argument what (human) economists want most of all?