
Finance and Economics Discussion Series

Federal Reserve Board, Washington, D.C.

ISSN 1936-2854 (Print) ISSN 2767-3898 (Online)

Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models

Leland D. Crane, Akhil Karra, Paul E. Soto

2025-044

Please cite this paper as:
Crane, Leland D., Akhil Karra, and Paul E. Soto (2025). "Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models," Finance and Economics Discussion Series 2025-044. Washington: Board of Governors of the Federal Reserve System, https://doi.org/10.17016/FEDS.2025.044.

NOTE: Staff working papers in the Finance and Economics Discussion Series (FEDS) are preliminary materials circulated to stimulate discussion and critical comment. The analysis and conclusions set forth are those of the authors and do not indicate concurrence by other members of the research staff or the Board of Governors. References in publications to the Finance and Economics Discussion Series (other than acknowledgement) should be cleared with the author(s) to protect the tentative character of these papers.

Total Recall? Evaluating the Macroeconomic Knowledge of Large Language Models*

Leland D. Crane†, Akhil Karra‡, Paul E. Soto†

June 24, 2025

Abstract

We evaluate the ability of large language models (LLMs) to estimate historical macroeconomic variables and data release dates. We find that LLMs have precise knowledge of some recent statistics, but performance degrades as we go farther back in history. We highlight two particularly important kinds of recall errors: mixing together first print data with subsequent revisions (i.e., smoothing across vintages) and mixing data for past and future reference periods (i.e., smoothing within vintages). We also find that LLMs can often recall individual data release dates accurately, but aggregating across series shows that on any given day the LLM is likely to believe it has data in hand which has not been released. Our results indicate that while LLMs have impressively accurate recall, their errors point to some limitations when used for historical analysis or to mimic real-time forecasters.

*We thank Gary Cornwall, Anne Hansen, participants in Board brown bags, and participants at the 2025 SGE conference for useful comments. We thank Betsy Vrankovich for her technical expertise. Opinions expressed herein are those of the authors alone and do not necessarily reflect the views of the Federal Reserve System or the Board of Governors.

†Board of Governors of the Federal Reserve System. ‡Carnegie Mellon University.


1 Introduction

The rise of large language models (LLMs) has generated interest in how they can be used for economic analysis and forecasting (e.g., Korinek, 2023). The utility of LLMs depends on their understanding of economics-related facts and their ability to follow instructions precisely. We evaluate LLMs on several dimensions related to these capabilities. First, how well do LLMs estimate important macroeconomic variables from the past? Second, to what extent are LLMs' estimates contaminated with future information? And third, how well do LLMs recall data release dates? LLMs which have accurate knowledge of economic history (including data release dates) will likely be more useful when generating hypotheses and doing analysis. Separately, if LLMs can provide realistic quasi-real-time estimates, simulating forecasters from the past, then we can better understand how the LLM's forecasting process relates to human forecasts. On the other hand, LLM estimates which are inaccurate or contaminated with look-ahead bias may be of more limited use.

We find that for some variables LLMs have remarkable recall.[1] The LLM we focus on, Claude Sonnet 3.5, can recall the quarterly values of the unemployment rate and CPI with fairly high accuracy back to WWII. However, it fares much more poorly on more volatile real activity series like real GDP growth and industrial production (IP) growth. The LLM appears to miss many of the high-frequency swings in these series, though it does capture business cycle variation well.

Focusing on GDP, we develop evidence that the LLM estimate is a mixture of the first print value for the reference period and subsequent revised values for that reference period. This smoothing across data vintages appears regardless of whether we ask the LLM to provide the first print or the fully revised number. LLMs are trained on an enormous amount of data and, unless every part of the corpus is clearly date stamped and that information is embedded in the model weights by the training process, it won't always be clear when the text was written or which vintage of GDP it is referring to. The mixing of first print and fully revised data is problematic, because it means (1) the model has a less than accurate retrospective understanding of the economic situation, and (2) the model will have difficulty simulating a real-time forecaster.

[1] We use the term "recall" when the LLM is estimating a historical quantity which was (presumably) in its training data. This is distinct from "retrieval" in the context of retrieval augmented generation, where the LLM is backed by a search engine and reference documents. Our focus is on the LLM in isolation, and which historical facts it is able to estimate accurately.

A related but distinct question is whether LLM estimates for a given reference period are influenced by future and past reference periods, keeping the vintage constant. In other words, are LLM estimates of data published for date t affected by published data values from t+1? We develop a test for whether the LLM's estimate for a particular date is influenced by future shocks to the series, controlling for expectations. We find suggestive evidence that LLMs do indeed use future reference period values when constructing an estimate, even when instructed to ignore future information. Any such smoothing is again a challenge for historical analysis and using LLMs to mimic real-time forecasters.

Finally, we document the LLM's knowledge of economic data release dates. We find that LLMs often have an accurate idea of when historical data releases occurred. However, they sometimes miss the true release date by a few days. The results are also sensitive to the details of the prompt; we find that varying the prompt to reduce the number of estimated release dates that are late leads to an increase in estimated release dates that are too early. Our prompt engineering doesn't lead to a strategy that increases accuracy to a very high level; rather, we end up trading off different types of errors. The conclusion is that the LLM doesn't have a very strong conception of the individual data release dates. We find that, aggregating across major economic indicators, on a typical day there is a good chance the LLM falsely believes at least some major data releases have occurred. Interestingly, these errors are exactly the kind we would expect a human to make: sometimes too early, sometimes too late, and attempts to reduce one kind of error increase the other.

Our results paint a mixed picture of current LLM capabilities. LLM recall of historical data values and release dates is often very impressive. That said, there are also significant shortcomings in LLM recall, and the errors are often correlated with information from after the reference date. At a high level these errors are very human, in that they can be interpreted as a good-faith effort to follow instructions while being hampered by a fuzzy recollection of the past. These patterns suggest that look-ahead bias may be an important challenge when using LLMs.

2 Literature Review

A number of recent papers have used LLMs for economic forecasting and analysis. Kim et al. (2024) find that an LLM can predict firm earnings when prompted with anonymized accounting data. Cook et al. (2023) use LLMs to analyze earnings calls. Pham and Cunningham (2024) present out-of-sample (i.e., post-knowledge-cutoff) forecasts for inflation and Academy Awards. Schoenegger et al. (2024) show that GPT-4 can help human forecasters on a variety of financial and political forecasting tasks, all of which occurred after the knowledge cutoff. Similarly, Phan et al. (2024) compare LLM forecasts with crowd-sourced forecasts. Jha et al. (2024) feed earnings call transcripts to GPT-3.5 and show that it can help forecast capital investment and abnormal returns. As part of their robustness exercises they restrict the sample to the post-knowledge-cutoff period, and separately try to anonymize the transcripts. Glasserman and Lin (2023) examine GPT-3.5's ability to forecast stock returns from news headlines; they anonymize company names to avoid an in-sample "distraction" effect. Faria-e-Castro and Leibovici (2023) evaluate inflation forecasts from an LLM, both before and after the knowledge cutoff. Zarifhonarvar (2024) studies how different prompts and access to different information affect GPT-4's inflation expectations. Separately, a strand of the literature has used LLMs as stand-ins for humans in surveys or strategic games (Manning et al. (2024), Kazinnik (2024), Tranchero et al. (2024)). Hansen et al. (2024) contribute to both literatures, simulating Survey of Professional Forecasters (SPF) respondents and evaluating the properties of the LLM-derived forecasts. Finally, a number of papers use LLMs as classifiers for things like news headlines, and then use the classifications to build indicators like sentiment indexes (Shapiro et al., 2022; Bybee, 2023; Cajner et al., 2024; van Binsbergen et al., 2024).

Many of these papers acknowledge look-ahead bias (the potential for an LLM that is supposed to mimic an agent acting at time t to use information from t+1 or later) and attempt to address it with anonymization, post-knowledge-cutoff comparisons, and prompting techniques. Somewhat less has been done to directly measure the extent of look-ahead bias.[2] Sarkar and Vafa (2024) is one exception; they show look-ahead bias arises in two contexts where GPT-4 is asked to act as a real-time forecaster: first, when assessing pre-pandemic earnings calls for risk factors, the LLM sometimes mentions pandemics and Covid. Second, the LLM is often able to "forecast" the winner of close elections. Lopez-Lira et al. (2025) evaluate recall and look-ahead bias for financial and macroeconomic variables; interestingly, their estimates of recall accuracy are higher than ours, suggesting some model- or prompt-specific effects. We complement these papers by developing more formal tests of data leakage in the macroeconomic setting and exploring the LLM's understanding of data release dates, a critical factor for real-time forecasting.

Ludwig et al. (2025) also discuss look-ahead bias in the context of congressional legislation and financial news. To address these concerns, Sarkar (2024) and He et al. (2025) develop sequences of LLMs trained only on data up to a known point in time, but of course these models are much smaller than the commercially available ones and do not have the full set of capabilities available with frontier models.

Look-ahead bias is also a focus of our paper; we add to the literature by quantifying several practically important types of look-ahead bias, e.g., the contamination of an LLM's memories of first-print data with later revisions and uncertainty about the timing of data releases. We also develop a test for whether an LLM's estimates are contaminated by future data values.

Assessing look-ahead bias is hard. LLMs have attracted attention from forecasters precisely because there is reason to think they might prove useful for prediction. This means that high accuracy at forecasting cannot be counted as strong evidence of look-ahead bias; LLMs are capable forecasters and we should expect them to beat some other forecasts. In this paper we take an indirect approach, focusing on the LLM's recall of historical data values and release dates. It appears easier to show that errors in recall are influenced by future information than it is to prove that a forecast is "too accurate". Note that Hansen et al. (2024) prompt the LLM with recent values of macroeconomic indicators to ground it and help improve performance; this strategy may also help mitigate look-ahead bias. Our work complements theirs by documenting the capabilities and limitations of the raw LLM without additional information passed into the prompt.

[2] See Croushore (2011) for a detailed discussion of the related topics of data revisions and forecast instability in traditional forecasting.

Our assessment goes beyond the topic of look-ahead bias, as we test whether the LLM can accurately recall economic statistics in general. An analyst using an LLM to explore economic hypotheses would want the model to have a clear, precise understanding of economic history. Documenting the extent of recall and the limitations on LLMs' knowledge will assist researchers considering how to use these tools.

3 Models and Data

For most of the paper we focus on four macroeconomic time series: GDP, inflation, industrial production, and unemployment. Similarly to Hansen et al. (2024), we restrict our attention to quarterly values so that we can compare to the SPF. The details of the series are as follows:

• Gross Domestic Product (GDP): the seasonally adjusted, annualized one-quarter growth rate of real GDP

• Inflation: the four-quarter change in the seasonally adjusted Consumer Price Index (CPI)

• Industrial Production (IP): the seasonally adjusted, annualized one-quarter growth rate of IP

• Unemployment: the one-quarter average of the seasonally adjusted level of the unemployment rate

We use both the fully-revised (current vintage) numbers, as well as the first-print values.
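The growth-rate conventions above can be sketched numerically; a minimal illustration with hypothetical index levels (not actual BEA or BLS data):

```python
def annualized_qoq_growth(level_prev, level_curr):
    """Annualized one-quarter growth rate, in percent (as for GDP and IP)."""
    return ((level_curr / level_prev) ** 4 - 1) * 100

def four_quarter_change(level_4q_ago, level_curr):
    """Four-quarter percent change, in percent (as for CPI inflation)."""
    return (level_curr / level_4q_ago - 1) * 100

# Hypothetical index values for illustration only
print(round(annualized_qoq_growth(100.0, 100.5), 2))  # 2.02
print(round(four_quarter_change(100.0, 103.0), 2))    # 3.0
```

Compounding the one-quarter growth factor four times is what makes a 0.5 percent quarterly gain read as roughly a 2 percent annualized rate.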

3.1 Models

We use Anthropic's Claude Sonnet 3.5 large language model as provisioned through AWS Bedrock.[3] Sonnet 3.5 is widely considered to be comparable to OpenAI's contemporaneous offerings (though it does not have the reasoning capabilities of o1 and later models), and it performs very well on benchmarks. Note that this model does not have internet search or tool use enabled; it cannot access any updated information aside from what is included in the prompt. We do not use OpenAI's models because we do not have an easy way to access them.

3.2 Methodology

Our main queries instruct the LLM to think step by step, write out its reasoning, and only write the final answer at the end. This is intended to improve performance, as LLMs can benefit from reasoning step by step before committing to an answer (Wei et al., 2022). The system prompt can be found in Figure 18, and an example user prompt is shown in Figure 19.

The responses to the queries are verbose. We use a secondary "summarizer" LLM and prompt to extract the estimate from the responses. The summarizer is instructed to read the original response and return an answer approximately of the form "Answer: {estimate}", where {estimate} is the desired estimate. We then parse the summarizer's answers with a regular expression (regex) to extract the numeric point estimate.
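A minimal sketch of this parsing step (illustrative only, not the authors' actual regex); a case-insensitive match guards against inconsistent capitalization of "Answer":

```python
import re

# Case-insensitive pattern for lines like "Answer: -1.6" or "answer: 3.25";
# allows negative signs and decimals.
ANSWER_RE = re.compile(r"answer:\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)

def extract_estimate(summary_text):
    """Return the numeric point estimate from a summarizer response, or None."""
    match = ANSWER_RE.search(summary_text)
    return float(match.group(1)) if match else None

print(extract_estimate("Answer: -1.6"))  # -1.6
print(extract_estimate("answer: 3.25"))  # 3.25
```

Responses with no parseable number return None, which is useful for flagging failures to answer.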

It is worth noting that the development of the prompts is an iterative process. Our initial attempts yielded many ranges (not point estimates) and many failures to answer. To address this we added instructions to always produce an answer and to avoid giving ranges. As another example, our parser would sometimes fail to locate the answer. We found this was because the summarizer was not consistent about capitalizing "Answer", which we fixed by changing the regex.

[3] The model ID is anthropic.claude-3-5-sonnet-20240620-v1:0. This is the original Sonnet 3.5, not the newer version of Sonnet 3.5 released in October 2024.

3.3 Nondeterminism in Answers

In typical use, LLM responses are stochastic. The LLM generates a response one token at a time, and the token generated is a function of the text (either in the prompt or the incomplete response) up to that point in time.[4] The LLM generates tokens by sampling from the model's probability distribution of next tokens, so more probable completions are chosen more often.

Several parameters govern the sampling process. In older, smaller LLMs (like GPT-2) the most important is the temperature. In simple LLMs a temperature of zero corresponds to an essentially deterministic response. However, frontier models include other factors (like mixture of experts) that introduce other sources of randomness.
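The roles of temperature and top-k can be illustrated with a toy next-token sampler. This is a simplified sketch of how a simple LLM samples, not Bedrock's internal implementation; as noted above, frontier models add further sources of randomness:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from logits with temperature and optional top-k."""
    if temperature == 0:
        # Greedy decoding: always pick the most probable token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the top_k highest-logit tokens; others get zero probability.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Softmax (shifted by the max for numerical stability)
    weights = [math.exp(s - max(scaled)) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]
print(sample_token(logits, temperature=0))  # 0 (greedy)
```

With temperature zero (or top-k equal to one) this toy sampler is fully deterministic; the point of the discussion below is that real frontier models are not.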

We run each query several times and average estimates in order to attenuate the randomness in LLM responses. We also calculate the standard error of this mean estimate and use it to plot confidence intervals. The averaged responses are close to deterministic, and the confidence intervals show us where there is still significant randomness.
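The averaging and confidence-interval step can be sketched as follows (hypothetical draws for illustration; the paper uses 10 repetitions per query):

```python
import statistics

def mean_and_ci(estimates, z=1.96):
    """Mean of repeated LLM estimates with a 95% confidence interval."""
    mean = statistics.fmean(estimates)
    # Standard error of the mean: sample st. dev. / sqrt(n)
    se = statistics.stdev(estimates) / len(estimates) ** 0.5
    return mean, (mean - z * se, mean + z * se)

# Hypothetical draws from 10 repetitions of the same query
draws = [2.1, 2.3, 1.9, 2.0, 2.2, 2.1, 2.4, 2.0, 1.8, 2.2]
mean, (lo, hi) = mean_and_ci(draws)
print(round(mean, 2))  # 2.1
```

A tight interval indicates the averaged estimate is close to deterministic; a wide one flags quarters where the LLM's answers still vary substantially.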

3.4 Choosing the Temperature

We need to evaluate how much the temperature parameter matters in our context and what value to set it to. Figure 1 shows two GDP estimates: one with the temperature set to one (the default), and one with the temperature set to zero.[5] The two series are extremely similar.

[Figure 1: Temperature and Recall of GDP. LLM estimates of GDP (percent) under different temperature parameters, 1940Q1-2030Q1. Temp. = 0, corr. w/ actual: .7855; temp. = 1, corr. w/ actual: .7907. Correlations are with actual, final print GDP. Covid period not plotted to keep scale readable. Source: Authors' calculations, BEA]

Their correlations with actual first-print GDP are also similar, though the temp. = 1 series has a marginally higher correlation. Based on this, and the fact that the temperature is set to one by default, we use temp. = 1 as the main specification in most of what follows.

3.4.1 Digression: Nondeterminism at Temperature = 0

Interestingly, the (within-quarter) standard deviations of the different temperature series are also very similar. In particular, for the temp. = 1 series the average within-quarter standard deviation of the estimates is 0.786, while the average standard deviation for the temp. = 0 series is 0.7616. While the temp. = 0 series appears to have marginally less variability, the size of the effect is very small.

A lack of complete determinism with temp. = 0 is understood to be a feature of the larger LLMs.[6] But the near-identical results we see above raise questions as to whether the temperature parameter has any material impact at all, or whether our codebase is setting it correctly. Table 1 shows that we can in fact document some effect of temperature. For this exercise we look at the raw, text response of the LLM, before parsing and summarization. We fix a character length N (say, 50 characters) and compare the first N characters of two random responses. The comparison is done within quarters, so the prompts for the two responses are identical. We check whether the first N characters of the responses are identical, and record an indicator variable that equals 1 for a match and 0 for a difference. Thus each pair of responses generates a single indicator variable, and we repeat the process many times. Table 1 shows the results. When looking at the first 50 characters, with temperature set to zero 42 percent of response pairs are identical; this drops to 22 percent with temperature set to one. This amounts to a significant change in the variability of the responses, though there is obviously a great deal of variation in the zero temperature responses.

It appears that setting the temperature to zero for Sonnet 3.5 on Bedrock does indeed make the response string more deterministic, as measured by increasing response similarity across identical queries. However, setting temperature to zero does not remove randomness by any means, and makes very little difference for the substance of the response: the GDP estimate. Our results generally mirror those of Ouyang et al. (2025), who show significant non-determinism in OpenAI's GPT-3.5 and GPT-4 models even with temperature set to 0. We would caution users against assuming that temp. = 0 ensures deterministic or even mostly deterministic results. Even with temp. = 0, averaging across several queries still seems necessary to ensure that results are reproducible.

[4] Tokens are words or word parts; for example, "the" may be a single token but "generates" might be tokenized as "generat", "es".

[5] For the temp. = 0 version we also set the "top k" parameter equal to one; in a simple LLM this would ensure that the LLM chooses only the most probable next token conditional on the set of available tokens and their probabilities. Like setting temp. = 0, this would make the response deterministic in a simpler LLM.

[6] The documentation for Claude mentions that "Note that even with temperature of 0.0, the results will not be fully deterministic." See also Ouyang et al. (2025).
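The pairwise prefix-comparison exercise behind Table 1 can be sketched as follows (with made-up response strings, not actual LLM output):

```python
from itertools import combinations

def prefix_match_indicator(resp_a, resp_b, n):
    """1 if the first n characters of two responses are identical, else 0."""
    return int(resp_a[:n] == resp_b[:n])

# Hypothetical responses to the same prompt (i.e., within one quarter)
responses = [
    "Let me think step by step. GDP growth in 1995Q3 was...",
    "Let me think step by step. GDP growth in 1995Q3 was...",
    "To answer this, I recall that GDP growth in 1995Q3...",
]
matches = [prefix_match_indicator(a, b, 50) for a, b in combinations(responses, 2)]
print(sum(matches) / len(matches))  # fraction of identical pairs
```

Averaging the indicator over many pairs gives the "Mean" column of Table 1; the standard deviation of a 0/1 indicator gives the "St. Dev." column.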

Sequence Length    Temperature    Obs.    Mean    St. Dev.
50 chars.          0              3150    0.42    0.49
50 chars.          1              3150    0.22    0.42
100 chars.         0              3150    0.37    0.48
100 chars.         1              3150    0.14    0.34
200 chars.         0              3150    0.27    0.44
200 chars.         1              3150    0.03    0.18

Table 1: Fraction of responses identical at various sequence lengths

4 Testing LLM Recall

In this section we test how well LLMs recall important macroeconomic statistics. The prompt, shown in Figure 19, asks the LLM to use all information available to it (i.e., the LLM is not instructed to behave as a real-time forecaster). We ask the LLM for estimates through 2027, which it provides even though its knowledge cutoff is in 2024. Examining the LLM responses shows that in these cases it decides to provide a forecast. The most recent actual data available as of this writing is for 2025Q1.

Figure 2 shows the results for CPI inflation and the unemployment rate. In each panel the blue line is the true, fully-revised series. The red line is the average estimate returned by the LLM, and the pink band is the 95 percent confidence interval based on the variability of the 10 iterations of each query. It is evident that the LLM generally recalls something very close to truth for both series. The only major visible gaps appear for pre-1990 CPI inflation, where the LLM seems to be biased up when inflation is low. In addition, the confidence bands are tight, indicating little variability in the LLM responses.

Figure 3 shows the same exercise for real GDP growth and industrial production growth. Here, the story is quite different. The LLM consistently misses the high-frequency swings in these series, though it does track many business cycle movements. Note that the year 2020 is not plotted since the pandemic real activity swings would dwarf the rest of the variation.
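The accuracy comparisons in this section (e.g., the correlations with actual data reported in the figure notes) are ordinary Pearson correlations between the recalled and actual series; a minimal sketch with hypothetical values:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical actual vs. recalled quarterly growth rates (illustration only)
actual = [3.1, -0.5, 2.2, 4.0, 1.1]
recalled = [2.8, 0.1, 2.0, 3.5, 1.4]
print(round(pearson(actual, recalled), 3))
```

A high correlation can coexist with missed high-frequency swings: an estimate that tracks the business cycle but smooths over quarter-to-quarter volatility still correlates strongly with the actual series.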

It is easier to see the dynamics in Figures 4 and 5, which focus on the 1990-2019 period. During this period, CPI inflation and the unemployment rate are recalled precisely. On the

[Figure 2: LLM Recall of CPI and Unemployment. Panels: CPI and Unemployment (percent), 1940Q1-2030Q1, each showing the fully revised series, the Sonnet 3.5 estimate, and a confidence interval. Note: LLM estimates of quarterly variables. 95% confidence intervals based on 10 repetitions of the same query. Data go through 2025Q1, LLM estimates through 2027Q1. Source: BLS, authors' calculations]

12

Percent

GDP

20

GDP:FullyRevisedSonnet3.5estimateConfidenceinterval

10

0

-10

1940q11950q11960q11970q11980q11990q12000q12010q12020q12030q1

Percent

40

20

0

-20

-40

IP

IP:FullyRevised

Sonnet3.5estimateConfidenceinterval

1940q11950q11960q11970q11980q11990q12000q12010q12020q12030q1

Note:LLMestimatesofquarterlyvariables.95%Confidenceintervalsbasedon10repetitionsofthesamequery.Covidperiodnotplottedtokeepscalereadable.Datagothrough2025Q1,LLMestimatesthrough2027Q1.

Source:BEA,FederalReserveBoard,authors’calculations

Figure3:LLMRecallofGDPandIP

[Figure 4: Pre-Pandemic Recent History: CPI and Unemployment. Panels: CPI and Unemployment (percent), 1990Q1-2020Q1, each showing the fully revised series, the Sonnet 3.5 estimate, and a confidence interval. Note: LLM estimates of quarterly variables. 95% confidence intervals based on 10 repetitions of the same query. Source: BLS, authors' calculations]

[Figure 5: Pre-Pandemic Recent History: GDP and IP. Panels: GDP and IP (percent), 1990Q1-2020Q1, each showing the fully revised series, the Sonnet 3.5 estimate, and a confidence interval. Note: LLM estimates of quarterly variables. 95% confidence intervals based on 10 repetitions of the same query. Source: BEA, Federal Reserve Board, authors' calculations]

[Figure 6: Post-2021 CPI and Unemployment. Panels: CPI and Unemployment (percent), 2021Q3-2027Q3, each showing the fully revised series, the Sonnet 3.5 estimate, and a confidence interval. Note: LLM estimates of quarterly variables. 95% confidence intervals based on 10 repetitions of the same query. Vertical line shows Sonnet 3.5's knowledge cutoff (April 2024). Data go through 2025Q1, LLM estimates through 2027Q1. Source: BLS, authors' calculations]

[Figure: Post-2021 GDP and IP. Panels: GDP and IP (percent), 2021Q1-2027Q1, each showing the fully revised series, the Sonnet 3.5 estimate, and a confidence interval. Note: LLM estimates of quarterly variables.]
