Interest Rate Volatility and No-Arbitrage Affine Term Structure Models ∗

Scott Joslin † and Anh Le ‡

This draft: April 3, 2016

Abstract

An important aspect of any dynamic model of volatility is the requirement that

volatility be positive. We show that for no-arbitrage aﬃne term structure models, this

admissibility constraint gives rise to a tension in simultaneous ﬁtting of the physical

and risk-neutral yields forecasts. In resolving this tension, the risk-neutral dynamics

is typically given more priority, thanks to its superior identiﬁcation. Consequently,

the time-series dynamics are derived partly from the cross-sectional information; thus,

time-series yields forecasts are strongly inﬂuenced by the no-arbitrage constraints. We

ﬁnd that this feature in turn underlies the well-known failure of these models with

stochastic volatility to explain the deviations from the Expectations Hypothesis observed

in the data.

∗ We thank Caio Almeida, Francisco Barillas, Riccardo Colacito, Hitesh Doshi, Greg Duffee, Michael Gallmeyer, Bob Kimmel, Jacob Sagi, Ken Singleton, Anders Trolle and seminar participants at the Banco de España - Bank of Canada Workshop on Advances in Fixed Income Modeling, Emory Goizueta, EPFL/Lausanne, Federal Reserve Bank of San Francisco, Federal Reserve Board, Gerzensee Asset Pricing Meetings (evening sessions), the 2012 Annual SoFiE meeting, the 2013 China International Conference in Finance, and University of Houston Bauer for helpful comments.

† University of Southern California, Marshall School of Business, [email protected]

‡ Pennsylvania State University, Smeal College of Business, anh [email protected]

1 Introduction

One of the key challenges for stochastic volatility models of the term structure, as observed by

Dai and Singleton (2002), is the “tension in matching simultaneously the historical properties

of the conditional means and variances of yields.” Similarly, Duﬀee (2002) notes that the

overall goodness of ﬁt “is increased by giving up ﬂexibility in forecasting to acquire ﬂexibility

in ﬁtting conditional variances.” Although the diﬃculty in matching both ﬁrst and second

moments in aﬃne term structure models has been a robust ﬁnding in the literature, the

exact mechanism that underlies this tension is not well understood. In this paper, we show

that the key element in understanding the tension between ﬁrst and second moments is the

no-arbitrage restriction inducing the additional requirement to match ﬁrst moments under

the risk-neutral distribution. Moreover, we show that precise inference about the risk-neutral

distribution has a number of important implications for stochastic volatility term structure

models.

The literature has largely attributed the failure of stochastic volatility term structure models to match key properties of the data to the tension between the physical first and second moments. To see the importance of the no-arbitrage constraints, consider, for example, the deviations from the expectations hypothesis (EH). Campbell and Shiller (1991) show that when the EH holds, a regression coefficient of $\phi_n = 1$ should be obtained in the regression

$$y_{n-1,t+1} - y_{n,t} = \alpha_n + \phi_n \left( \frac{y_{n,t} - y_{1,t}}{n-1} \right) + \epsilon_{n,t+1}, \tag{1}$$

where $y_{n,t}$ is the $n$-month yield at time $t$. However, in the data, the empirical $\phi_n$ coefficient estimates are all negative and increasingly so with maturity. Dai and Singleton (2002)

(hereafter DS) show that no-arbitrage models with constant volatility are consistent with

the downward sloping pattern in the data. However, the no-arbitrage models with one or

two stochastic volatility factors are unable to match the pattern in the data. Their results

are replicated in Figure 1.

DS conjecture “the likelihood function seems to give substantial

weight to ﬁtting volatility at the expense of matching [deviations from the EH]”.
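As an illustration of how the coefficient in regression (1) is obtained, the following is a minimal sketch using ordinary least squares on synthetic data. The function name `campbell_shiller_coefficient`, the synthetic yield series, and the value of `true_phi` are assumptions made for this example only; they are not the data or estimates used in the paper.

```python
import numpy as np

def campbell_shiller_coefficient(y_n, y_nm1_next, y_1, n):
    """OLS slope phi_n in regression (1).

    y_n[t]        : n-period yield observed at time t
    y_nm1_next[t] : (n-1)-period yield observed at time t+1
    y_1[t]        : one-period yield observed at time t
    """
    y_n = np.asarray(y_n, dtype=float)
    y_nm1_next = np.asarray(y_nm1_next, dtype=float)
    y_1 = np.asarray(y_1, dtype=float)
    lhs = y_nm1_next - y_n                 # change in the long yield
    rhs = (y_n - y_1) / (n - 1)            # scaled yield spread
    X = np.column_stack([np.ones_like(rhs), rhs])
    coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    return coef[1]                         # slope on the scaled spread

# Synthetic, noise-free data built so that the true slope is known.
rng = np.random.default_rng(0)
T, n = 500, 24
true_phi = -1.5                            # hypothetical value, for illustration
y_1 = 0.03 + 0.01 * rng.standard_normal(T)
y_n = y_1 + 0.02 + 0.005 * rng.standard_normal(T)
y_nm1_next = y_n + 0.001 + true_phi * (y_n - y_1) / (n - 1)

phi_hat = campbell_shiller_coefficient(y_n, y_nm1_next, y_1, n)
```

Under the EH the estimated slope would be close to one; a negative value, as constructed here, corresponds to the pattern the text describes in the data.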

We estimate stochastic volatility factor models that do not impose no arbitrage but ﬁt

stochastic volatility of yields. In stark contrast to the no arbitrage models, the stochastic

volatility factor models can almost perfectly match the empirical patterns of bond risk

premia as characterized by regression coeﬃcients. This ﬁnding clariﬁes that ﬁtting stochastic

volatility is not an issue per se. Rather, it is the restrictiveness associated with the no-

arbitrage structure that underlies the well documented failure of the no arbitrage stochastic

volatility models to rationalize the deviations from the EH in the data.

The tension between first and second moments arises because volatility must be a positive process, which in turn requires that forecasts of volatility also be positive. This type of tension, observed by Dai and Singleton (2002) and Duffee (2002), is generally present in affine stochastic volatility models, even when no arbitrage restrictions are not imposed. In a no arbitrage model, volatility must also be a positive process under the risk-neutral measure. This induces an additional tension with risk-neutral first moments, creating a three-way tension between first moments under the physical and risk-neutral measures and second moments. The relative importance of these moments (and their role in the tension) is determined by the precision with which they can be estimated.

See Section 5 for additional details on the data and our estimation.

[Figure 1: Campbell-Shiller regression coefficients (vertical axis) plotted against maturity (horizontal axis, 1 to 10) for the data and the fitted models.]

Figure 1: Violations of the Expectations Hypothesis. This figure plots the coefficients $\phi_n$ from the Campbell-Shiller regression in (1). When risk premia are constant so that the expectations hypothesis holds, the coefficients should be uniformly equal to one across all maturities. The models $A_m(3)$ are three-factor no arbitrage models with $m = 0$, 1, or 2 factors driving volatility. The models $F_m(3)$ are three-factor models that do not impose no arbitrage with $m = 1$ or 2 factors driving volatility.

At the heart of our result is the fact that the $\mathbb{Q}$ dynamics is estimated much more precisely than its historical counterpart. Intuitively, although we have only one historical time

series with which to estimate physical forecasts, each observation of the yield curve directly

represents a term structure of risk neutral expectations of yields. Due to this asymmetry,

it is typically “costly” for standard objective functions to “give up” cross-sectional ﬁts for

time-series ﬁts in estimation. As a result, when faced with the “ﬁrst moments” tension

– the trade-oﬀ between ﬁtting time series and risk-neutral forecasts – standard objective

functions typically settle on a rather uneven resolution in which cross-sectional pricing errors

are highly optimized at the expense of ﬁts to time series forecasts. The resulting impact on

the time series dynamics in turn deprives the estimated model of its ability to replicate the

CS regressions, which are meant to capture the time series properties of the data.

Our ﬁndings add to the recent discussion that suggests that no arbitrage restrictions are

completely or nearly irrelevant for the estimation of Gaussian dynamic term structure models

(DTSM).

Still left open by the existing literature is the question of whether the no arbitrage

restrictions are useful in the estimation of DTSMs with stochastic volatility. Our results show

that the answer to this question is a resounding yes – an answer that is surprising (given the

existing evidence regarding Gaussian DTSMs) but can now be intuitively explained in light

of our results. That is, the “first moments” tension essentially provides a channel through which relatively more precise $\mathbb{Q}$ information will spill over and influence the estimation of the $\mathbb{P}$ dynamics. This channel does not exist in the context of Gaussian DTSMs in which the admissibility constraint ensuring positive volatility is not needed.

Our findings also help clarify the nature of the relationship between the no arbitrage structure and volatility instruments extracted from the cross-section of bond yields documented by several recent studies. For example, we show that for the $A_1(N)$ class of models (an $N$ factor model with a single factor driving volatility), the cross-section of bonds will reveal up to $N$ linear combinations of yields, given by the $N$ left eigenvectors of the risk neutral feedback matrix ($K_1^{\mathbb{Q}}$), that can serve as instruments for volatility. The no arbitrage structure then essentially implies nothing more for the properties of volatility beyond the assumed one factor structure and the admissibility conditions. Furthermore, we show that the estimates of $K_1^{\mathbb{Q}}$ are very strongly identified and essentially invariant to volatility considerations. For a variety of sampling and modeling choices, we show that the estimates of $K_1^{\mathbb{Q}}$ are virtually identical across models with or without stochastic volatility. This invariance implies the striking conclusion that a Gaussian term structure model, with constant volatility, can reveal which instruments would be admissible for a stochastic volatility model. An elaborate example illustrating this point is provided in Section 5.2.

Finally, our results help identify aspects of model specifications that may or may not have any significant bearing on the model implied volatility outputs. For example, we show that within the $A_1(N)$ class of models, different specifications of the market prices of risks are unlikely to significantly affect the identification of the volatility factor. To see this, recall from the preceding paragraph that volatility instruments for an $A_1(N)$ model are determined by left eigenvectors of the risk neutral feedback matrix. Intuitively, since the market prices of risks serve as the linkage between the $\mathbb{P}$ and $\mathbb{Q}$ measures, and since the $\mathbb{Q}$ dynamics is very strongly identified, different forms of the market prices of risks are most likely to result in different estimates for the $\mathbb{P}$ dynamics while leaving estimates of the risk neutral feedback matrix essentially intact. This implies that volatility instruments are likely identical across these models with different risk price specifications. Our intuition is consistent with the almost identical performances of volatility estimates implied by $A_1(3)$ models with different (completely affine and essentially affine) risk price specifications as reported in Jacobs and Karoui (2009).

See, for example, Duffee (2011), Joslin, Singleton, and Zhu (2011), and Joslin, Le, and Singleton (2012).

For example, Collin-Dufresne, Goldstein, and Jones (2009) find an extracted volatility factor from the cross-section of yields through a no arbitrage model to be negatively correlated with model-free estimates. Jacobs and Karoui (2009), in contrast, find that volatility extracted from affine models is generally positively related, though in some cases they also find a negative correlation. Almeida, Graveline, and Joslin (2011) also find a positive relationship.

In addition to our results, findings by Campbell (1986) and Joslin (2013b) also suggest that risk neutral forecasts of yields are largely invariant to any volatility considerations.

A practical convenience of this result is that we can use the Gaussian model to generate very good starting points for the $A_1(N)$ models. In our estimation, these starting values take only a few minutes to converge to their global estimates.

The rest of the paper is organized as follows. In Section 2, we provide the basic intuition

as to how the “ﬁrst moments” tension arises. In Section 3, we lay out the general setup of

the term structure models with stochastic volatility that we subsequently consider. Section 4

empirically evaluates the admissibility restrictions under both the physical and risk neutral

measures. Section 5 provides a comparison between the stochastic volatility and pure Gaussian term structure models. Section 6 examines the impact of no arbitrage restrictions on various model performance statistics. Section 8 provides some extensions. Section 9 concludes.

2 Basic Intuition

In this section, we develop some basic intuition for our results before elaborating in more

detail both theoretically and empirically. We ﬁrst describe three basic moments that a term

structure model should match. We then show how tensions arise in a no arbitrage term

structure model in matching those moments. In particular, we show that the presence of

stochastic volatility induces a tension between matching first moments under the historical distribution ($\mathbb{P}$) and the risk-neutral distribution ($\mathbb{Q}$). This tension accentuates the difficulty in matching first and second moments under the historical distribution.

2.1 Moments in a term structure model

A term structure model should match:

1. $M_1(\mathbb{P})$: the conditional first moments of yields under the historical distribution,

2. $M_1(\mathbb{Q})$: the conditional first moments of yields under the risk-neutral distribution, and

3. $M_2$: the conditional second moments of yields.

A number of basic stylized facts are well-known about these moments (see, for example, Piazzesi (2010) or Dai and Singleton (2003)). Empirically, the slope and curvature of the yield curve (as well as the level to a slight extent) exhibit some amount of mean reversion. Also, an upward sloping yield curve often predicts (slightly) lower interest rates in the future. $M_1(\mathbb{P})$ should capture these types of patterns. Recall that risk-neutral forecasts are convexity-adjusted forward rates and therefore matching first moments under the risk-neutral measure, $M_1(\mathbb{Q})$, is closely related to the ability of the model to price bonds. The volatility of yields is time-varying and persistent. Volatility is also related at least partially to the level and shape of the yield curve. $M_2$ should deliver such features of volatility.

We make no distinction between second moments under the historical and risk-neutral distribution though this is possible in some contexts. In Section 8.2 we discuss also the case where there is unspanned stochastic volatility.

It is worth noting that we could equivalently replace $M_1(\mathbb{Q})$ with matching risk premia. Dai and Singleton (2003) and others take this approach. In this context, the model should match time-variation in expected excess returns found in the data such as the fact that when the yield curve is upward sloping, excess returns for holding long maturity bonds are on average higher. Since excess returns are related to differences between actual and risk-neutral forecasts (i.e. the expected excess return is the difference between an expected future spot rate and a forward rate), such an approach is equivalent to our approach. As we explain later, focusing on risk-neutral expectations has the benefit of isolating parameters which are both estimated precisely and, importantly, invariant to the volatility specification.

2.2 The ﬁrst moments tension

We now develop some intuition for how the “first moments” tension, that is, a tension between matching $M_1(\mathbb{P})$ and $M_1(\mathbb{Q})$, arises.

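Consider the affine class of models, $A_M(N)$, formalized by Dai and Singleton (2000). Due to the affine structure, the processes for the first $N$ principal components of the yield curve (e.g., level, slope, and curvature), denoted $\mathcal{P}_t$, can be written as:

$$d\mathcal{P}_t = (K_0^{\mathbb{P}} + K_1^{\mathbb{P}} \mathcal{P}_t)\,dt + \sqrt{\Sigma_t}\,dB_t^{\mathbb{P}}, \tag{2}$$

$$d\mathcal{P}_t = (K_0^{\mathbb{Q}} + K_1^{\mathbb{Q}} \mathcal{P}_t)\,dt + \sqrt{\Sigma_t}\,dB_t^{\mathbb{Q}}, \tag{3}$$

where $B_t^{\mathbb{P}}$ and $B_t^{\mathbb{Q}}$ are standard Brownian motions under the historical measure, $\mathbb{P}$, and the risk neutral measure, $\mathbb{Q}$, respectively. $\Sigma_t$ is the diffusion process of $\mathcal{P}_t$, taking values as an $N \times N$ positive semi-definite matrix:

$$\Sigma_t = \Sigma_0 + \Sigma_1 V_{1,t} + \ldots + \Sigma_M V_{M,t}, \quad \text{and} \quad V_{i,t} = \alpha_i + \beta_i \cdot \mathcal{P}_t, \tag{4}$$

where the $V_{i,t}$'s are strictly positive volatility factors and conditions are imposed to maintain positive semi-definite (psd) $\Sigma_t$.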

A rich body of literature has shown that the volatility of the yield curve is, at least partially, related to

the shape of the yield curve. For example, volatility of interest rates is usually high when interest rates are

high and when the yield curve exhibits higher curvature (see Cox, Ingersoll, and Ross (1985), Litterman,

Scheinkman, and Weiss (1991), and Longstaﬀ and Schwartz (1992), among others).

Importantly, diffusion invariance implies that the diffusion, $\Sigma_t$, is the same under both measures. Since $\Sigma_t$ is the same under both the historical and risk-neutral measures, it must be that the coefficients in (4) are the same under both measures. A caveat applies that at a finite horizon, there may be differences in the coefficients in (4). These arise because of differences in $E^{\mathbb{P}}[\mathcal{P}_{t+\Delta t}]$ and $E^{\mathbb{Q}}[\mathcal{P}_{t+\Delta t}]$. Importantly, however, $(\alpha^{\Delta t}, \beta^{\Delta t})$ will not depend on the measure. These differences will manifest in differences in the other coefficients. That is, there will be $(\Sigma_0^{\Delta t,\mathbb{Q}}, \ldots, \Sigma_M^{\Delta t,\mathbb{Q}})$ which will be different from $(\Sigma_0^{\Delta t,\mathbb{P}}, \ldots, \Sigma_M^{\Delta t,\mathbb{P}})$. These differences will not be important for our analysis. Even so, in typical applications, the time horizon is small (from daily to at most one quarter), so even these differences will be minor. See also Section 4 and Appendix B.

Alternatively, one could express the diffusion as $\Sigma_t = \Sigma_0 + \Sigma_1 \mathcal{P}_{1,t} + \ldots + \Sigma_N \mathcal{P}_{N,t}$. When the model falls in the $A_M(N)$ class, the matrices $(\Sigma_1, \ldots, \Sigma_N)$ will lie in an $M$-dimensional subspace, allowing the representation in (4).

The one-factor structure of volatility

For the sake of clarity, let us first specialize to the case $M = 1$. Due to the positivity of the one volatility factor, $V_t = \alpha + \beta \cdot \mathcal{P}_t$ (where for simplicity we drop the indices in equation (4)), forecasts of $V_t$ at all horizons must remain positive. Thus, to avoid negative forecasts, the $(N - 1)$ non-volatility factors must not be allowed to forecast $V_t$. This in turn requires that the drift of $V_t$ must depend on only $V_t$.

According to equation (2), the drift of $V_t$ (ignoring constant) is given by $\beta' K_1^{\mathbb{P}} \mathcal{P}_t$. For this to depend only on $V_t$, and thus $\beta' \mathcal{P}_t$, it must be the case that $\beta' K_1^{\mathbb{P}}$ is a multiple of $\beta'$. That is, $\beta$ must be a left-eigenvector of $K_1^{\mathbb{P}}$. Equivalently, $\beta$ must be an eigenvector of $(K_1^{\mathbb{P}})'$.

Likewise, applying similar logic under the risk-neutral measure, it must follow that $\beta$ is a left-eigenvector of $K_1^{\mathbb{Q}}$. Thus, the volatility loading vector $\beta$ must be a left eigenvector to both the risk neutral feedback matrix, $K_1^{\mathbb{Q}}$, and physical feedback matrix, $K_1^{\mathbb{P}}$. This establishes a tight connection between the physical and risk neutral yields forecasts since $K_1^{\mathbb{P}}$ and $K_1^{\mathbb{Q}}$ are forced to share one common left eigenvector.
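The left-eigenvector requirement can be checked numerically. The sketch below uses a hypothetical three-factor feedback matrix `K1` (illustrative numbers, not estimates from the paper) and a helper `drift_loading_residual`, introduced here only for illustration, that measures the part of the drift loading $\beta' K_1$ that is not proportional to $\beta'$. The residual is zero when $\beta$ is a left eigenvector, so the drift of the volatility factor depends on the state only through the factor itself.

```python
import numpy as np

# Hypothetical 3-factor feedback matrix (illustrative numbers only).
K1 = np.array([[-0.5, 0.2, 0.0],
               [0.1, -1.0, 0.3],
               [0.0, 0.4, -2.0]])

# Left eigenvectors of K1 are the (right) eigenvectors of its transpose.
eigvals, left_vecs = np.linalg.eig(K1.T)
idx = np.argmax(eigvals.real)          # pick one eigenvalue for the example
c = eigvals[idx].real
beta = left_vecs[:, idx].real          # candidate volatility loading

def drift_loading_residual(b, K):
    """Component of b'K that is not a multiple of b' (zero iff b is a left eigenvector)."""
    row = b @ K
    scale = (row @ b) / (b @ b)
    return row - scale * b

res_eig = drift_loading_residual(beta, K1)                         # approximately zero
res_other = drift_loading_residual(np.array([1.0, 0.0, 0.0]), K1)  # nonzero
```

With a loading that is not a left eigenvector (here the first unit vector), the residual is nonzero, which means the other factors would forecast the volatility factor.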

With this in mind, an unconstrained estimate of $K_1^{\mathbb{P}}$, for example one obtained by fitting to a VAR(1) analogous to (2), may not be optimal. The reason is that such an unconstrained estimate might force $K_1^{\mathbb{Q}}$ to admit a left eigenvector of $K_1^{\mathbb{P}}$ as one of its own. Such an imposition can result in poor cross-sectional fits. Likewise, an unconstrained estimate of $K_1^{\mathbb{Q}}$ can significantly impact the time series dynamics, by imposing one of its own left eigenvectors upon $K_1^{\mathbb{P}}$. By stapling the $\mathbb{P}$ and $\mathbb{Q}$ forecasts together, the common left eigenvector constraint potentially triggers some tradeoff as the $\mathbb{P}$ and $\mathbb{Q}$ dynamics “compete” to match $M_1(\mathbb{P})$ and $M_1(\mathbb{Q})$.

More general settings

More generally, since the volatility factors $V_{i,t}$ must remain positive, their conditional expectations at all horizons must be positive. For given $\beta_i$'s, only some values of $(K_0^{\mathbb{P}}, K_1^{\mathbb{P}})$ will induce positive forecasts of $V_{i,t}$ for all possible values of $\mathcal{P}_t$. This is the well-documented tension between matching first and second moments ($M_1(\mathbb{P})$ and $M_2$) seen in the literature. We would like to choose a particular volatility instrument ($\beta_i$'s) to satisfy $M_2$, but the best choice of $\beta_i$'s to match $M_2$ may rule out the best choice of $(K_0^{\mathbb{P}}, K_1^{\mathbb{P}})$ to match $M_1(\mathbb{P})$.

Even within an affine factor model with stochastic volatility (that is, a factor model that does not impose conditions for no arbitrage so that (2) applies but not (3)), this tension would arise. That is, no arbitrage does not directly affect this tension. However, for no-arbitrage affine term structure models, the above logic applies equally to both the $\mathbb{P}$ and $\mathbb{Q}$ measures. As before, for a given choice of $\beta_i$'s, we will be restricted on the choice of $(K_0^{\mathbb{Q}}, K_1^{\mathbb{Q}})$, so that the drift of $V_{i,t}$ under the risk-neutral measure guarantees that risk-neutral forecasts of $V_{i,t}$ remain positive. Thus the no arbitrage structure adds a tension between $M_2$ and $M_1(\mathbb{Q})$. That is, the best choice of $\beta_i$'s to match $M_2$ may be incompatible with the best choice of $(K_0^{\mathbb{Q}}, K_1^{\mathbb{Q}})$ to match $M_1(\mathbb{Q})$.

In the affine model we consider, the possible values of $\mathcal{P}_t$ will be an affine transformation of $\mathbb{R}_+^M \times \mathbb{R}^{N-M}$ for some $(M, N)$.

This implies a three-way tension between $M_1(\mathbb{P})$, $M_1(\mathbb{Q})$, and $M_2$. When a model matches $M_2$ and either $M_1(\mathbb{P})$ or $M_1(\mathbb{Q})$, it may not be possible to match the other first moment. Since the risk-neutral dynamics are typically estimated very precisely, this can lead to a difficulty matching $M_1(\mathbb{P})$ when $M_2$ is also matched.

3 Stochastic Volatility Term Structure Models

This section gives an overview of the stochastic volatility models that we consider. First, we

establish a general factor time-series model with stochastic volatility that does not impose

conditions for the absence of arbitrage. Within these models, arbitrary linear combinations of

yields serve as instruments for volatility. An important consideration here is the admissibility

conditions required to maintain a positive volatility process. Next, we show how no arbitrage

conditions imply constraints on the general factor model. A key result that we show is that

no arbitrage imposes that the volatility instrument is entirely determined by risk neutral

expectations. Finally, we investigate further the links between volatility and the cross-sectional

properties of the yield curve within the no arbitrage model. For simplicity, we focus in the

main text on the case of a single volatility factor under a continuous time setup; modiﬁcations

for discrete time processes and more technical details are described in Appendix B.

3.1 General admissibility conditions in latent factor models

We first review the conditions required for a well-defined positive volatility process within a multi-factor setting. Following Dai and Singleton (2000), hereafter DS, we refer to these conditions as admissibility conditions. Recall the $N$-factor $A_1(N)$ process of DS. This process has an $N$-dimensional state variable composed of a single volatility factor, $V_t$, and $(N - 1)$ conditionally Gaussian state variables, $X_t$. The state variable $Z_t = (V_t, X_t')'$ follows the Itô diffusion

$$d \begin{pmatrix} V_t \\ X_t \end{pmatrix} = \mu_{Z,t}\,dt + \sqrt{\Sigma_{Z,t}}\,dB_t^{\mathbb{P}}, \tag{5}$$

where

$$\mu_{Z,t} = \begin{pmatrix} K_{0V} \\ K_{0X} \end{pmatrix} + \begin{pmatrix} K_{1V} & K_{1VX} \\ K_{1XV} & K_{1X} \end{pmatrix} \begin{pmatrix} V_t \\ X_t \end{pmatrix}, \quad \text{and} \quad \Sigma_{Z,t} = \Sigma_{0Z} + \Sigma_{1Z} V_t, \tag{6}$$

and $B_t^{\mathbb{P}}$ is a standard $N$-dimensional Brownian motion under the historical measure, $\mathbb{P}$. Duffie, Filipovic, and Schachermayer (2003) show that this is the most general affine process on $\mathbb{R}_+ \times \mathbb{R}^{N-1}$.

In order to ensure that the volatility factor, $V_t$, remains positive, we need that when $V_t$ is zero: (a) the expected change of $V_t$ is non-negative, and (b) the volatility of $V_t$ becomes zero. Otherwise there would be a positive probability that $V_t$ will become negative. Imposing additionally the Feller condition for boundary non-attainment, our admissibility conditions are then

$$K_{1VX} = 0, \quad \Sigma_{0Z,11} = 0, \quad \text{and} \quad K_{0V} \geq \tfrac{1}{2} \Sigma_{1Z,11}. \tag{7}$$

A consequence of these conditions is that volatility must follow an autonomous process under $\mathbb{P}$ since the conditional mean and variance of $V_t$ depend only on $V_t$ and not on $X_t$.
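The three admissibility conditions in (7) are simple enough to check directly on a candidate parameter set. The following is a minimal sketch, assuming the state is ordered with the volatility factor first and that the Feller condition carries the usual factor of one half; the function name `is_admissible`, the tolerance, and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def is_admissible(K0, K1, Sigma0, Sigma1, tol=1e-12):
    """Check the conditions in (7) for a state Z = (V, X')' with V ordered first."""
    K0 = np.asarray(K0, dtype=float)
    K1 = np.asarray(K1, dtype=float)
    Sigma0 = np.asarray(Sigma0, dtype=float)
    Sigma1 = np.asarray(Sigma1, dtype=float)
    no_feedback = np.all(np.abs(K1[0, 1:]) <= tol)   # K_{1VX} = 0
    no_const_vol = abs(Sigma0[0, 0]) <= tol          # Sigma_{0Z,11} = 0
    feller = K0[0] >= 0.5 * Sigma1[0, 0]             # K_{0V} >= Sigma_{1Z,11} / 2
    return bool(no_feedback and no_const_vol and feller)

# Hypothetical two-factor example (one volatility factor, one Gaussian factor).
K0 = [0.05, 0.0]
K1_ok = [[-0.5, 0.0], [0.2, -1.0]]    # X does not enter the drift of V
K1_bad = [[-0.5, 0.3], [0.2, -1.0]]   # X feeds into the drift of V
Sigma0 = [[0.0, 0.0], [0.0, 0.01]]
Sigma1 = [[0.04, 0.0], [0.0, 0.02]]

ok = is_admissible(K0, K1_ok, Sigma0, Sigma1)
bad = is_admissible(K0, K1_bad, Sigma0, Sigma1)
```

The second parameter set fails only because the Gaussian factor enters the drift of the volatility factor, which is the restriction that later becomes the left-eigenvector condition on the volatility instrument.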

We now show how to embed the $A_1(N)$ specification into generic term structure models where no arbitrage is not imposed and re-interpret these admissibility constraints in terms of conditions on the volatility instruments.

3.2 An $A_1(N)$ factor model without no arbitrage restrictions

We can extend the latent factor model of (5–6) to a factor model for yields by appending the factor equation

$$y_t = A_Z + B_Z Z_t, \tag{8}$$

where $(A_Z, B_Z)$ are free matrices. Importantly, there are no cross-sectional restrictions that tie the loadings $(A_Z, B_Z)$ together across the maturity spectrum. In this sense, this is a pure factor model without no arbitrage restrictions.

Given the parameters of the model, we can replace the unobservable state variable with observed yields through (8). Following Joslin, Singleton, and Zhu (2011), hereafter JSZ, we can identify the model by observing that equation (8) implies $\mathcal{P}_t \equiv W y_t = (W A_Z) + (W B_Z) Z_t$ for any given loading matrix $W$ such that $\mathcal{P}_t$ is of the same size as $Z_t$. Assuming $W B_Z$ is full rank, this in turn allows us to replace the latent state variable $Z_t$ with $\mathcal{P}_t$:

$$d\mathcal{P}_t = (K_0^{\mathbb{P}} + K_1^{\mathbb{P}} \mathcal{P}_t)\,dt + \sqrt{\Sigma_0 + \Sigma_1 V_t}\,dB_t^{\mathbb{P}}, \tag{9}$$

where we can write $V_t$ (the first entry in $Z_t$) as a linear function of $\mathcal{P}_t$: $V_t = \alpha + \beta \cdot \mathcal{P}_t$. Because the rotation from $Z_t$ to $\mathcal{P}_t$ is affine, individual yields must be related to the yield factors $\mathcal{P}_t$ through:

$$y_t = A + B \mathcal{P}_t. \tag{10}$$

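The admissibility conditions (7) map into:

$$\beta' K_1^{\mathbb{P}} = c\beta', \tag{11}$$

where $c$ is an arbitrary constant, and

$$\beta' \Sigma_0 \beta = 0, \quad \text{and} \quad \beta' K_0^{\mathbb{P}} \geq \tfrac{1}{2} \beta' \Sigma_1 \beta. \tag{12}$$

We will denote the stochastic volatility model in (9–10) by $F_1(N)$. The model is parameterized by $\Theta_F \equiv (K_0^{\mathbb{P}}, K_1^{\mathbb{P}}, \alpha, \beta, A, B)$ which is subject to the conditions in (12). Our development shows that the $F_1(N)$ model is the most general factor model with an underlying affine $A_1(N)$ state variable.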

This is overidentifying. For details, see JSZ. In the current case, this would rule out unspanned stochastic volatility in the factor model. We extend our logic to the case of partially unspanned volatility in Section 8.

To maintain internal consistency, we impose that $W A = 0$ and $W B = I_N$, as in JSZ. This guarantees that as we construct the yield factors by premultiplying $W$ to the right hand side of the yield pricing equation (10), we exactly recover $\mathcal{P}_t$.

We will refer to the first admissibility condition in (11) as condition $\mathcal{A}(\mathbb{P})$. This condition, needed so that $V_t$ is an autonomous process under $\mathbb{P}$, can be restated as the requirement that $\beta$ be a left eigenvector of $K_1^{\mathbb{P}}$. With this requirement, choosing a $\beta$ such that $V_t$ matches yields volatility ($M_2$) is equivalent to imposing a certain left eigenvector on the time series feedback matrix $K_1^{\mathbb{P}}$, which may hinder our ability to match the time series forecasts of bond yields ($M_1(\mathbb{P})$). When it is not possible to choose $K_1^{\mathbb{P}}$ to match $M_1(\mathbb{P})$ and $\beta$ to match $M_2$ in the presence of $\mathcal{A}(\mathbb{P})$, a tension will arise. We refer to the tension between first and second moments as the difficulty to match $M_1(\mathbb{P})$ and $M_2$ in the presence of the constraint $\mathcal{A}(\mathbb{P})$.

3.3 No arbitrage term structure models with stochastic volatility

The $A_1(N)$ no arbitrage model of DS represents a special case of the $F_1(N)$ model. That is, when one imposes additional constraints to the parameter vector $\Theta_F$ one will obtain a model consistent with no arbitrage. In this section, we first review the standard formulation of the $A_1(N)$ no arbitrage model. We then focus on the effect of no arbitrage on the volatility instrument through the restriction it implies on the loadings parameter $\beta$.

The latent factor specification of the $A_1(N)$ model

We now consider affine short rate models which take a latent variable $Z_t$ with dynamics given by (5–6) and append a short rate which is affine in a latent state variable. We consider the general market prices of risk of Cheridito, Filipovic, and Kimmel (2007). Joslin (2013a) shows that any such latent state term structure model can be drift normalized under $\mathbb{Q}$ so that we have the short rate equation

$$r_t = r_\infty + \rho_V V_t + \iota \cdot X_t, \tag{13}$$

where $\iota$ denotes a vector of ones, $\rho_V$ is either +1 or -1, and the canonical risk-neutral dynamics of $Z_t$ are given by

$$dZ_t = \left( \begin{pmatrix} K_{0V}^{\mathbb{Q}} \\ 0_{N-1 \times 1} \end{pmatrix} + \begin{pmatrix} \lambda_V^{\mathbb{Q}} & 0_{1 \times N-1} \\ 0_{N-1 \times 1} & \mathrm{diag}(\lambda_X^{\mathbb{Q}}) \end{pmatrix} Z_t \right) dt + \sqrt{\Sigma_{0Z} + \Sigma_{1Z} V_t}\,dB_t^{\mathbb{Q}}, \tag{14}$$

where $\lambda_X^{\mathbb{Q}}$ is ordered. To ensure the absence of arbitrage, we impose the Feller condition that $K_{0V}^{\mathbb{Q}} \geq \tfrac{1}{2} \Sigma_{1Z,11}$.

No arbitrage pricing then allows us to obtain the no arbitrage loadings that replace the unconstrained version of (8) in the $F_1(N)$ model with $y_t = A_Z + B_Z Z_t$, where $A_Z$ and $B_Z$ are dependent on the parameters underlying (13–14). From this, we again can rotate $Z_t$ to $\mathcal{P}_t \equiv W y_t$ to obtain a yield pricing equation in terms of $\mathcal{P}_t$. This is a constrained version of the yield pricing equation for the $F_1(N)$ model in (10). In addition to the time series dynamics in (9), we also obtain the dynamics of $\mathcal{P}_t$ under $\mathbb{Q}$:

$$d\mathcal{P}_t = (K_0^{\mathbb{Q}} + K_1^{\mathbb{Q}} \mathcal{P}_t)\,dt + \sqrt{\Sigma_0 + \Sigma_1 V_t}\,dB_t^{\mathbb{Q}}, \tag{15}$$

with $V_t = \alpha + \beta \cdot \mathcal{P}_t$.

Compared to the $F_1(N)$ model, one clear distinction of the $A_1(N)$ model is the role of the $\mathbb{Q}$ dynamics (15) in determining yields loadings ($B$) and the volatility loadings $\beta$. We provide an in-depth discussion of this dependence below. We first explain the impact of the no arbitrage restrictions on the volatility loadings $\beta$. Next, we provide an intuitive illustration as to how the no arbitrage restrictions will give rise to an intimate relation between the yields loadings $B$ and the volatility loadings $\beta$. This compares starkly with the $F_1(N)$ models for which $B$ and $\beta$ are completely independent.

Implications of the no arbitrage restrictions for the factor model

Ideally, we would like to characterize the no arbitrage model as restrictions on the parameter vector $\Theta_F$ in the $F_1(N)$ model. JSZ were able to succinctly characterize the parameter restrictions of the no arbitrage model as a special case of the factor VAR model. In their case, essentially the main restriction was that the factor loadings ($B$) belong to an $N$-parameter family characterized by the eigenvalues of the $\mathbb{Q}$ feedback matrix. In our current context of stochastic volatility models, such a simple characterization is not possible because changing the volatility parameters $\Sigma$ affects not only the volatility structure but also the loadings $B$. This is because higher volatility implies higher convexity and thus higher bond prices or lower yields. The fact that $\Sigma$ shows up both in volatility and in yields complicates a clean characterization of the restrictions on $\Theta_F$ that no arbitrage implies.

For this reason, we focus on a simpler but equally interesting question: what is the impact

of the no arbitrage restrictions on the volatility loadings β?

Recall from the previous subsection that for an $F_1(N)$ model, the two main conditions on $\beta$ are: (1) matching second moments ($M_2$); and (2) $\beta$ must be a left eigenvector of the physical feedback matrix $K_1^{\mathbb{P}}$ that matches the first moments under $\mathbb{P}$ ($M_1(\mathbb{P})$). Turning to the $A_1(N)$ model, these conditions are still applicable. Additionally, applying the admissibility conditions (7) to the risk-neutral dynamics in (15) results in a set of constraints analogous to (11):

$$\beta' K_1^{\mathbb{Q}} = c\beta', \tag{16}$$

for an arbitrary number $c$. We will refer to the condition in (16) for the no arbitrage model as the admissibility condition $\mathcal{A}(\mathbb{Q})$. This implies a third condition on $\beta$ for the no arbitrage model: $\beta$ must be a left eigenvector of the risk neutral feedback matrix $K_1^{\mathbb{Q}}$ that matches the first moments under $\mathbb{Q}$ ($M_1(\mathbb{Q})$).

The impact of the no arbitrage restrictions on $\beta$ depends on how strongly identifying the third condition is compared to the first two. Should $K_1^{\mathbb{Q}}$ be very precisely estimated from the data, the estimates of $\beta$ for the $A_1(N)$ models are strongly influenced by $\mathcal{A}(\mathbb{Q})$. Hence it is possible that $\beta$ estimates are different across the $F_1(N)$ and $A_1(N)$ models. To anticipate our empirical results, we compare these restrictions in subsequent sections and indeed find that the admissibility condition $\mathcal{A}(\mathbb{Q})$ (together with matching $M_1(\mathbb{Q})$ and $M_2$) is essentially the main restriction responsible for pinning down $\beta$ in no arbitrage models whereas the direct tension between first and second moments implied by $\mathcal{A}(\mathbb{P})$ has virtually no impact.

In the Gaussian case, $B$ is only dependent on the eigenvalues of the risk-neutral feedback matrix, and not on the volatility parameters.

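Why might $K_1^{\mathbb{Q}}$ be strongly pinned down in the data? Similar to JSZ, it can be shown that the no-arbitrage restriction on $K_1^{\mathbb{Q}}$ takes the following form:

$$K_1^{\mathbb{Q}} = (W B_Z)\,\mathrm{diag}(\lambda^{\mathbb{Q}})\,(W B_Z)^{-1}, \tag{17}$$

where $\lambda^{\mathbb{Q}} = (\lambda_V^{\mathbb{Q}}, \lambda_X^{\mathbb{Q}\prime})'$. This follows from the rotation from $Z_t$, whose dynamics is given by (14), to $\mathcal{P}_t$. Additionally, observe that the loadings $B_Z$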

depend only on (

, λ

). Since

is a normalization factor, it can be ignored. Σ

will aﬀect the yield loadings through the

Jensen eﬀects which are typically small and will be dominated by variation in risk neutral

expectations driven by

. Thus

will be well approximated by loadings obtained when

is set to zeros. These can be viewed as loadings from a Gaussian term structure model

which does not have a stochastic volatility eﬀect. Up to this approximation, the risk-neutral

feedback matrix is essentially a non-linear function of its eigenvalues, which are typically

estimated with considerable precision (for example, see JSZ).

Combined, this implies that

will be strongly identiﬁed in the data and thus

(up to scaling) is likely strongly aﬀected

by the no arbitrage restrictions due to A(Q).

To relate to the results of JSZ, we make the above arguments relying on the approximation

that convexity eﬀects are negligible. It is important to note that we can make our argument

more precise without resorting to approximations by a relatively more mechanical examination

of the above steps. In particular, we show in Appendix A that the volatility instrument

is in fact, up to a constant, completely determined by the (

N −

1) eigenvalues given in

Coupled with the observation that

is typically estimated with considerable precision, it is

clear that the volatility instruments are heavily aﬀected by the no arbitrage restrictions.

The relation between yield loadings and the volatility instrument

An alternative way of understanding the impact of the no arbitrage restrictions on the volatility instrument is through examining the linkage between yield loadings ($B$) and $\beta$. To begin, $B$ and $\beta$ are clearly independent for the $F_M(N)$ models since they are both free parameters. Intuitively, for these models the yield loadings $B$ are obtained from purely cross-sectional information (regressions of yields on the pricing factors $\mathcal{P}_t$), whereas the volatility loading $\beta$ is obtained purely from the time series information. In contrast, in the context of an $A_M(N)$ model, both $B$ and $\beta$ are influenced by $K_1^{\mathbb{Q}}$. This common dependence on the risk-neutral feedback matrix forces a potentially tight linkage between these two components. For the sake of intuition, we consider below a simple example and show that for no arbitrage models there is indeed an intimate relationship between $B$ and $\beta$.

Intuitively,

governs the persistence of yield loadings along the

maturity

dimension. As shown by

Joslin, Le, and Singleton (2012), the estimates of the loadings (obtained, for example, by projecting individual

yields onto

) are typically very smooth functions of yield maturities. This relative smoothness in turn

should translate into small statistical errors associated with estimates of

. This intuition is conﬁrmed by

examining the results of JSZ in which λ

is estimated with considerable precision.

Let us define the convexity-adjusted $n$-year forward rate on a one-year forward loan by:

$$f_t(n) = E^{\mathbb{Q}}_t\left[\int_{t+n}^{t+n+1} r_s\,ds\right]. \qquad (18)$$

In the spirit of Collin-Dufresne, Goldstein, and Jones (2008) we can write the following one

year ahead risk-neutral conditional expectation:





t+1

(0)

t+1

(1)





= constant +





0 0

0 0 1









(0)

(1)





. (19)

The ﬁrst row is due to the autonomous nature of

. The second row is the deﬁnition of

the forward rate in (18) for

= 1. The last row is obtained from the fact that in a three

factor aﬃne model, (

(0),

(1)) are informationally equivalent to the three underlying

states at time

. From the last row and by applying the law of iterated expectation to (18),

we have:

(2) = constant + a

+ a

(0) + a

(1). (20)

This equation may be solved to give

in terms of

(0),

(1), and

(2). Furthermore, since

(18) gives

t+1

(2) =

t+1

[

t+2

(1)] we can use (19) and (20) to express

[

t+1

(2)] in terms of

(0),

(1), and

(2). Putting these together allows us to substitute

out from (19) and

obtain





t+1

(0)

t+1

(1)

t+1

(2)





= constant +





0 1 0

0 0 1









(0)

(1)

(2)





. (21)

Simple calculations give

−a

− a

, and

. It follows from the

last row of (21) that:

(3) = constant + α

(0) + α

(1) + α

(2). (22)

Equation (22) reveals that if the forward rates can be empirically observed, the loadings

can in principle be pinned down simply by regressing

(3) on

(0),

(1), and

(2). Based

on the mappings from (

) to

, it follows that the regression implied by (22) will also

identify all the

coeﬃcients, except for

. In the context of equation (20), it means that the

volatility factor is tightly linked to the forward loadings, up to a translation and scaling eﬀect.

Since forwards and yields (and therefore yield portfolios) are simply rotated representations

of one another, this implies a close relationship between the volatility instrument and yields

loadings.

As is well known, yields and forwards at various maturities exhibit very high correlations.

The

’s obtained for cross-sectional regressions similar to (22) are typically close to 100%

with pricing errors in the range of a few basis points. Therefore we expect the standard errors

associated with

to be small and thus the volatility loadings

will be strongly identiﬁed

from cross-sectional loadings.

Repeated iterations of the above steps allow us to write any forward rate

(

) as a linear

function of (

(0)

, f

(1)

, f

(2)). Suppose that we use

+ 1 forwards in (

(0),

. . . f

(

)) in

estimation, then:







(0)

(1)

(2)

(3)

(4)

(J)













1 0 0

0 1 0

0 0 1

(α)

. . .

(α)











(0)

(1)

(2)





where (

, . . . , g

) represent the cross-sectional restrictions of no-arbitrage. This allows us to

think of the no-arbitrage restrictions as having two facets. First, it imposes a cross-section

to time series link through the fact that ﬁxing

constrains what the volatility factor must

look like, through

and

. Second, it induces cross-sectional restrictions on the loadings

, . . . g

), just as is seen with pure Gaussian term structure models.

4 Evaluating the Admissibility Restrictions

We have seen in Section 3 that in order to have a well-defined admissible volatility process, we must have both $A(\mathbb{P})$ and $A(\mathbb{Q})$, which can be restated as the requirement that $\beta$ be a common left eigenvector of the feedback matrices under $\mathbb{P}$ and $\mathbb{Q}$. These admissibility restrictions are

helpful in providing guidance on potential volatility instruments. For example, although level

is known to be related to volatility, it is unlikely to be an admissible instrument for volatility

by itself. To see this, recall the well-known result (for example Campbell and Shiller (1991))

that the slope of the yield curve predicts future changes in the level of interest rates. Up to

the associated uncertainty of such statistical evidence, this suggests that the slope of the yield

curve predicts the level and thus also that the level of interest rates is not an autonomous

process.

We evaluate empirically how helpful each of the admissibility restrictions can be in

identifying the potential volatility instrument which in turn depends on the accuracy with

which the feedback matrices can be estimated. For example, if the physical (risk-neutral)

feedback matrix is strongly identiﬁed in the data, then the condition

(

) (

(

)) must

provide helpful identifying information about

. As will be seen, our assessments are relatively

robust to the extent that we do not have to actually estimate the term structure models, nor

do we require that

be matched. Following Joslin, Le, and Singleton (2012) (hereafter JLS),

we use the monthly unsmoothed Fama Bliss zero yields with eleven maturities: 6–month,

one- out to ten-year. We start our sample in January 1973, due to the sparseness of longer

maturity yields prior to this period, and end in December 2007 to ensure our results are not

inﬂuenced by the ﬁnancial crisis.

We note that the affine dynamics for $\mathcal{P}_t$ in (9) implies that the one month ahead conditional expectation of $\mathcal{P}_{t+\Delta}$ is affine in $\mathcal{P}_t$:

$$E_t[\mathcal{P}_{t+\Delta}] = \text{constant} + e^{K_1^{\mathbb{P}}\Delta}\,\mathcal{P}_t \qquad (23)$$

where $\Delta = 1/12$. Thus $\mathcal{P}_t$, even when sampled monthly, follows a first order VAR. Importantly, we can show that any left eigenvector of $K_1^{\mathbb{P}}$ must also be a left eigenvector of the one-month ahead feedback matrix $e^{K_1^{\mathbb{P}}\Delta}$, denoted by $K^{\mathbb{P}}_{1,\Delta}$. In other words, the set of left eigenvectors of the instantaneous feedback matrix $K_1^{\mathbb{P}}$ and the one-month ahead feedback matrix $K^{\mathbb{P}}_{1,\Delta}$ must be identical. As a result, we can equivalently restate $A(\mathbb{P})$ as the requirement that the volatility loading $\beta$ be a left eigenvector of $K^{\mathbb{P}}_{1,\Delta}$. Since our data are sampled at the monthly interval, it is more convenient for us to focus on $K^{\mathbb{P}}_{1,\Delta}$ in our empirical analysis.

Similarly, the affine dynamics in (15) under $\mathbb{Q}$ also implies a first order VAR for $\mathcal{P}_t$ sampled at the monthly frequency:

$$E^{\mathbb{Q}}_t[\mathcal{P}_{t+\Delta}] = \text{constant} + \underbrace{e^{K_1^{\mathbb{Q}}\Delta}}_{K^{\mathbb{Q}}_{1,\Delta}}\,\mathcal{P}_t. \qquad (24)$$

Applying similar logic, we can again restate $A(\mathbb{Q})$ as the requirement that the volatility loading $\beta$ be a left eigenvector of the one-month ahead risk-neutral feedback matrix $K^{\mathbb{Q}}_{1,\Delta}$.

It is worth noting that for small $\Delta$, $K_{1,\Delta} \approx I + \Delta K_1$. So in some sense, we can view $K_1$ and $K_{1,\Delta}$ interchangeably. Importantly though, as the arguments above illustrate, our results do not rely on this approximation.
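The eigenvector claim can be checked numerically. The following is a minimal sketch, not the authors' code: the matrix `K1` is a hypothetical feedback matrix built from assumed real eigenvalues, and `expm_via_eig` is an illustrative helper that assumes `K1` is diagonalizable.

```python
import numpy as np

def expm_via_eig(K, delta):
    """Matrix exponential exp(K * delta), assuming K is diagonalizable."""
    lam, V = np.linalg.eig(K)
    return np.real(V @ np.diag(np.exp(lam * delta)) @ np.linalg.inv(V))

# Hypothetical instantaneous feedback matrix with known real eigenvalues.
V = np.array([[1.0, 0.2, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.4, 1.0]])
eigs = np.array([-0.05, -0.5, -2.0])
K1 = V @ np.diag(eigs) @ np.linalg.inv(V)

delta = 1.0 / 12.0
K1_delta = expm_via_eig(K1, delta)

# Left eigenvectors of K1 are the rows of inv(V).
left_vecs = np.linalg.inv(V)
beta, c = left_vecs[0], eigs[0]

# beta' exp(K1 * delta) should equal exp(c * delta) * beta'.
lhs = beta @ K1_delta
rhs = np.exp(c * delta) * beta
print(np.allclose(lhs, rhs))
```

The same check applies under either measure, since it only uses the eigen-structure of the feedback matrix.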

4.1 Admissibility restrictions under P

We first consider the restriction $A(\mathbb{P})$ which is present in both the $F_M(N)$ and $A_M(N)$ models. This restriction guarantees that $V_t$ is an autonomous process, which in turn is necessary for volatility to be a positive process under $\mathbb{P}$. This requires the volatility instrument, $\beta$, be a left eigenvector of the one-month ahead physical feedback matrix $K^{\mathbb{P}}_{1,\Delta}$. To the extent that the conditional mean is strongly identified by the time-series, this condition will pin down the admissible volatility instruments up to a sign choice and the choice of which of the $N$ left eigenvectors instruments volatility. However, in general even with a moderately long time series, such as our thirty five year sample, inferences on the conditional means are not very precise.

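To gauge how strongly identified the volatility instrument is from the autonomy requirement under $\mathbb{P}$, we implement the following exercise. First we estimate an unconstrained VAR on the first three principal factors, $\mathcal{P}_t$. Ignoring the intercepts, the estimates for our sample period are:

$$\mathcal{P}_{t+\Delta} = \text{constant} + \underbrace{\begin{pmatrix} 0.9902 & -0.0092 & -0.0472 \\ 0.0097 & 0.9548 & -0.0802 \\ -0.0021 & 0.0096 & 0.7991 \end{pmatrix}}_{K^{\mathbb{P}}_{1,\Delta}} \mathcal{P}_t + \text{noise}. \qquad (25)$$

Footnote: To see why left eigenvectors carry over to the one-month ahead feedback matrix, assume that $\beta$ is a left eigenvector of $K_1$ with a corresponding eigenvalue $c$. Applying the definition of left eigenvector, $\beta' K_1 = c\beta'$, repeatedly, it follows that $\beta$ is also a left eigenvector of $K_1^n$ for any $n$. Substituting these into $e^{K_1\Delta} = \sum_{n=0}^{\infty} K_1^n \Delta^n / n!$ implies that $\beta' e^{K_1\Delta} = e^{c\Delta}\beta'$. Thus $\beta$ is a left eigenvector of $e^{K_1\Delta}$ with the corresponding eigenvalue $e^{c\Delta}$.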

Then, for each potential volatility instrument $\beta\cdot\mathcal{P}_t$ (with $\beta$ roaming over all possible choices), we re-estimate the VAR under the constraint that $\beta$ is a left eigenvector of $K^{\mathbb{P}}_{1,\Delta}$. The VAR is easily estimated under this constraint after a change of variables so that the eigenvector constraint becomes a zero constraint (compare the constraints in (7) and (12)). We then conduct a likelihood ratio test of the unconstrained versus the constrained alternative and compute the associated probability value (p-value). A p-value close to one indicates that the evidence is consistent with such an instrument satisfying $A(\mathbb{P})$ while a p-value close to zero indicates contradicting evidence.
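The test can be sketched for $N = 3$ as follows. This is an illustration under our own assumptions rather than the paper's implementation: homoskedastic Gaussian innovations, simulated data, and a hypothetical feedback matrix `K`. After the change of variables only the equation for $\beta\cdot\mathcal{P}_t$ is restricted, so under these assumptions the likelihood ratio can be computed from that single equation: regress $\beta\cdot\mathcal{P}_{t+\Delta}$ on $\beta\cdot\mathcal{P}_t$ alone (restricted) and on all of $\mathcal{P}_t$ (unrestricted). With $N - 1 = 2$ restrictions, the chi-square survival function has the closed form $e^{-x/2}$.

```python
import math
import numpy as np

def ols_rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def autonomy_lr_pvalue(P, beta):
    """LR statistic and p-value for 'beta is a left eigenvector of the VAR(1) matrix'.

    P is a (T+1) x 3 array of factors observed at consecutive dates.
    """
    beta = np.asarray(beta, dtype=float)
    v_next = P[1:] @ beta
    ones = np.ones((P.shape[0] - 1, 1))
    X_u = np.hstack([ones, P[:-1]])                     # unrestricted regressors
    X_r = np.hstack([ones, (P[:-1] @ beta)[:, None]])   # restricted regressors
    T = v_next.shape[0]
    lr = max(T * math.log(ols_rss(v_next, X_r) / ols_rss(v_next, X_u)), 0.0)
    # Two restrictions when N = 3: chi-square(2) survival function is exp(-x/2).
    return lr, math.exp(-lr / 2.0)

# Simulated monthly factors from a hypothetical stable VAR(1).
rng = np.random.default_rng(0)
K = np.array([[0.99, -0.01, -0.05],
              [0.01, 0.95, -0.08],
              [0.00, 0.01, 0.80]])
P = np.zeros((421, 3))
for t in range(420):
    P[t + 1] = K @ P[t] + 0.01 * rng.standard_normal(3)

# OLS estimate of the feedback matrix and one of its real left eigenvectors.
X = np.hstack([np.ones((420, 1)), P[:-1]])
K_hat = np.linalg.lstsq(X, P[1:], rcond=None)[0][1:].T
lam, W = np.linalg.eig(K_hat.T)          # columns: left eigenvectors of K_hat
j = int(np.argmin(np.abs(lam.imag)))     # a 3x3 real matrix has a real eigenvalue
beta_hat = np.real(W[:, j])

lr_eig, p_eig = autonomy_lr_pvalue(P, beta_hat)        # p-value of one by construction
lr_lvl, p_lvl = autonomy_lr_pvalue(P, [1.0, 0.0, 0.0]) # "level only" instrument
```

Evaluating `autonomy_lr_pvalue` over a grid of `beta` values would trace out a p-value surface of the kind plotted against the PC2 and PC3 loadings.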

In conducting this experiment, we do not

force

β ·P

to forecast volatility nor is

required to satisfy

(

). In this sense, this exercise is

informative about the contribution of

(

) in shaping the volatility instrument independent

of both A(Q) and the requirement that M

be matched.

Since $\beta\cdot\mathcal{P}_t$ and its scaled version, $c\beta\cdot\mathcal{P}_t$, for any constant $c$, effectively give the same volatility factor (and hence deliver the same p-values in our exercise), we scale so that all elements of $\beta$ sum up to one (the loading on PC1 is $\beta(1) = 1 - \beta(2) - \beta(3)$). We plot the p-values against the corresponding pairs of loadings on PC2 and PC3 in Figure 2. For ease of presentation, in this graph the three PCs are scaled to have in-sample variances of one. We see that there are three peaks which correspond to the three left eigenvectors of the maximum likelihood estimate of $K^{\mathbb{P}}_{1,\Delta}$. When $\beta$ is equal to one of these left eigenvectors (up to scaling), the likelihood ratio test statistic must be zero and hence the corresponding p-value must be one, by construction. As our intuition suggests, many, though not all, instruments appear to potentially satisfy $A(\mathbb{P})$ according to the metric that we are considering. Thus we conclude that the admissibility requirement under the $\mathbb{P}$ measure in general still leaves a great deal of flexibility in forming the volatility instrument.

4.2 Admissibility restrictions under Q

Turning to $A(\mathbb{Q})$, to have a clean comparison, it is ideal if we can implement the same regression approach applied to $A(\mathbb{P})$ in the previous exercise. That is, we first run an unconstrained regression using the $\mathbb{Q}$ forecasts:

$$E^{\mathbb{Q}}_t[\mathcal{P}_{t+\Delta}] = \text{constant} + K^{\mathbb{Q}}_{1,\Delta}\,\mathcal{P}_t + \text{noise} \qquad (26)$$

to obtain an estimate of $K^{\mathbb{Q}}_{1,\Delta}$. An important difference here with the $\mathbb{P}$ case in (25) is that we now use $E^{\mathbb{Q}}_t[\mathcal{P}_{t+\Delta}]$ instead of $\mathcal{P}_{t+\Delta}$ on the left hand side in the regression. Next, for each potential volatility instrument $\beta\cdot\mathcal{P}_t$, we re-estimate the regression in (26) under the constraint that $\beta$ is a left eigenvector of $K^{\mathbb{Q}}_{1,\Delta}$. As in the previous exercise, the resulting likelihood ratios reveal whether or not the volatility instrument considered is consistent with the admissibility constraint $A(\mathbb{Q})$.

Footnote: We view the likelihood ratio test as an approximation since it assumes volatility of the residuals is constant. However, computations of p-values, accounting for heteroskedasticity of the errors, deliver very similar results.

Figure 2: Likelihood Ratio Tests of the Autonomy Restriction under $\mathbb{P}$. This figure reports the p-values of the likelihood ratio test of whether a particular linear combination of yields, $\beta\cdot\mathcal{P}_t$, is autonomous under $\mathbb{P}$, plotted against the loadings of PC2 and PC3. The loading of PC1 is one minus the loadings on PC2 and PC3 ($\beta(1) = 1 - \beta(2) - \beta(3)$). PC1, PC2, and PC3 are scaled to have in-sample variances of one.

Although we do not strictly observe the risk neutral forecasts $E^{\mathbb{Q}}_t[\mathcal{P}_{t+\Delta}]$ for stochastic volatility models due to the presence of convexity effects, we use a model-free approach to obtain a very good approximation. The insight again is that risk-neutral expectations are, up to convexity, observed as forward rates. The $n$-year forward rate that begins in one month, $f^{\Delta,n}_t = \frac{1}{n}\left((n+\Delta)\,y_{n+\Delta,t} - \Delta\, r_t\right)$, is, up to convexity effects:

$$f^{\Delta,n}_t \approx E^{\mathbb{Q}}_t[y_{n,t+\Delta}] \qquad (27)$$

where $y_{n,t}$ denotes the $n$-year zero yield observed at time $t$. Thus we can use (27) to approximate $E^{\mathbb{Q}}_t[y_{n,t+\Delta}]$ whereby we simply ignore any convexity terms. This approximation is reasonable for two reasons. First, Jensen terms are typically small. Second, notice that since our primary interest is not in the level of expected risk-neutral changes but in their variation (as captured by $K^{\mathbb{Q}}_{1,\Delta}$), it is only changes in stochastic convexity effects that will violate this approximation. Thus to the extent that changes in convexity effects are small this approximation will be valid for inference of $K^{\mathbb{Q}}_{1,\Delta}$.
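As a small worked illustration of the forward-rate construction, the sketch below computes the $n$-year yield starting $\Delta$ years ahead from two spot yields. The function name and arguments are ours, and the $1/n$ scaling reflects our reading of the forward rate as an annualized yield; convexity terms are ignored, as in the approximation above.

```python
def forward_yield(y_long, r_short, n, delta=1.0 / 12.0):
    """Annualized n-year yield starting delta years ahead, implied by today's curve.

    y_long  : annualized zero yield with maturity n + delta
    r_short : annualized zero yield with maturity delta
    """
    return ((n + delta) * y_long - delta * r_short) / n

# A flat curve implies a forward yield equal to the spot yield.
flat = forward_yield(0.05, 0.05, n=2.0)

# An upward-sloping curve implies a forward yield above the long spot yield.
steep = forward_yield(0.06, 0.03, n=1.0)
```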

Using this method, we extract observations on $E^{\mathbb{Q}}_t[y_{n,t+\Delta}]$ from forward rates which we can then convert into estimates of $E^{\mathbb{Q}}_t[\mathcal{P}_{t+\Delta}]$ using the weighting matrix $W$. We denote this approximation of $E^{\mathbb{Q}}_t[\mathcal{P}_{t+\Delta}]$ by $\mathcal{P}^f_t$. Whence regression (26) translates into:

$$\mathcal{P}^f_t = \text{constant} + K^{\mathbb{Q}}_{1,\Delta}\,\mathcal{P}_t + \text{noise}. \qquad (28)$$

Regression (28) draws a nice analogy to the time series VAR(1) of (25) that we use in examining $A(\mathbb{P})$. Importantly, as this regression can be implemented completely independently, abstracting from any time series considerations, it serves as a stand-alone assessment of $A(\mathbb{Q})$ up to the validity of our convexity approximation approach. Notably, (28) makes clear the (essentially) contemporaneous nature of the estimation of $K^{\mathbb{Q}}_{1,\Delta}$. Since $\mathcal{P}_t$ explains virtually all contemporaneous yields and forwards (and thus portfolios of forwards such as $\mathcal{P}^f_t$), the $R^2$'s of (28) are likely much higher than those for the time series VAR(1) at the monthly frequency. Therefore we expect much stronger identification for $K^{\mathbb{Q}}_{1,\Delta}$. Intuitively, although we observe only a single time series under the historical measure with which to draw inferences, we observe repeated term structures of risk-neutral expectations every month and this allows us to draw much more precise inferences.

Figure 3 plots the p-values for this test of the restrictions of various instruments to be autonomous under $\mathbb{Q}$. In stark contrast to Figure 2 and in accordance with our intuition, we see that the risk-neutral measure provides very strong evidence for which instruments are able to be valid volatility instruments. Most potential volatility instruments are strongly ruled out with p-values essentially at zero. Thus, our results here suggest that were it only up to $A(\mathbb{P})$ and $A(\mathbb{Q})$ to decide which volatility instrument to use, the latter would almost surely be the dominant force, with the remaining degrees of freedom being the sign choice and choosing which of the $N$ left eigenvectors of $K^{\mathbb{Q}}_{1,\Delta}$ is the volatility instrument. This evidence suggests that the no arbitrage restrictions can potentially have very strong impact in shaping volatility choices.

Left open by the model-free nature of our analysis in this section is, among other things,

the possibility that the deﬁning property of the volatility factor (

should match

) can be

powerful enough that it might dominate

(

) at identifying potential volatility instruments.

We take up an in depth examination of this possibility in the next section.

5 Comparison of Gaussian and Stochastic Volatility

Models

To understand the contribution of matching

on the identiﬁcation of the volatility loadings

, we estimate and compare the (Gaussian)

(

) models with stochastic volatility models.

Figure 3: Likelihood Ratio Tests of the Autonomy Restriction under $\mathbb{Q}$. This figure reports the p-values of the likelihood ratio test of whether a particular linear combination of yields, $\beta\cdot\mathcal{P}_t$, is autonomous under $\mathbb{Q}$, plotted against the loadings of PC2 and PC3. The loading of PC1 is one minus the loadings on PC2 and PC3 ($\beta(1) = 1 - \beta(2) - \beta(3)$). PC1, PC2, and PC3 are scaled to have in-sample variances of one.

Model                      N = 4                              N = 3
$A_0(N)$       0.998   0.027   0.032   0.014      0.997   0.028   0.025
              -0.007   0.957  -0.128  -0.042     -0.003   0.954  -0.098
              -0.010   0.006   0.895  -0.080     -0.005  -0.002   0.928
              -0.009  -0.009  -0.085   1.007
$A_1(N)$       0.999   0.027   0.030   0.013      0.998   0.028   0.024
              -0.005   0.959  -0.123  -0.037     -0.002   0.955  -0.097
              -0.010   0.006   0.902  -0.075     -0.005  -0.000   0.931
              -0.006  -0.007  -0.079   1.018
$A_2(N)$       0.998   0.028   0.031   0.013      0.997   0.029   0.025
              -0.005   0.956  -0.125  -0.040     -0.002   0.954  -0.099
              -0.009   0.003   0.899  -0.077     -0.005  -0.002   0.929
              -0.005  -0.012  -0.080   1.010
Regression     0.998   0.027   0.029   0.014      0.997   0.027   0.025
              -0.008   0.957  -0.118  -0.041     -0.003   0.958  -0.098
              -0.011   0.005   0.905  -0.077     -0.006   0.007   0.925
              -0.016  -0.006  -0.065   0.987

Table 1: $K^{\mathbb{Q}}_{1,\Delta}$ Estimates.

Clearly, matching

is relevant only in the latter and not the former. Since the

(

)

models are aﬃne models, the one month ahead conditional expectation of yields portfolios

also take an aﬃne form. Thus for both Gaussian and stochastic volatility models, we can

write:

[

t+∆

] =

constant

1,∆

under the risk neutral measure. Of particular interest are the estimates of the monthly risk-neutral feedback matrix, $K^{\mathbb{Q}}_{1,\Delta}$, implied by these models. As we will show in this section, estimates of $K^{\mathbb{Q}}_{1,\Delta}$ are highly similar across these models.

This suggests that the role of stochastic volatility (matching

) is inconsequential for the

estimation of

1,∆

. Thus identifying volatility instrument (

) is simply limited to making

the choice of which left eigenvector of

1,∆

and its sign can best match

. We use the same

dataset as in the preceding section and note that all of our results remain fully robust for a

shortened sample period that excludes the Fed experiment regime.

5.1 Comparison of K

1,∆

estimates

We estimate

(

) models, with

= 0

2 and

= 3

4, and then rotate the state

variables into low order yield PCs. For estimation, we assume these PCs are priced perfectly

while higher order PCs are observed with i.i.d. errors. JLS show that this assumption is

innocuous as it is likely to deliver estimates close to those obtained by Kalman ﬁltering where

all yields portfolios are observed with errors. Estimation details and full parameter estimates

are deferred to Appendix C.

Table 1 reports the estimates of

1,∆

implied by these models. Recall the deﬁning property

1,∆

given by equation (26) in which

1,∆

is informative about how

forecasts

t+∆

under the risk neutral measure. Since for each

is characterized by the same loading

matrix

(that corresponds to the ﬁrst

PCs of bond yields) across all models, it follows

that

1,∆

estimates are directly comparable across all models with the same number of

factors

. Focusing ﬁrst on the two models

(3) and

(3), the two estimates of

1,∆

are

strikingly close: most entries are essentially identical up to the third decimal place. This

evidence indicates that the identiﬁcation by the cross-sectional information (and possibly

other moments shared between the

(3) and

(3) models) for the parameter

1,∆

seems

overwhelmingly stronger than the restrictions coming from matching

. Enriching the

volatility structure to

= 2 does not overturn this observation: the

1,∆

estimate implied by

the A

(3) model remains essentially identical. Additionally, changing the number of factors

= 4 (results also reported Table 1) or

= 2 (results not reported) does not alter our

observation.

We have argued that variation in the one month ahead risk-neutral expectations, as

determined by

1,∆

, is well approximated by the regression based estimate of (28). This

estimate can be further improved by simple steps that take into account the aﬃne structure

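of bond yields. Specifically, up to convexity effects, the affine structure of bond yields implies that (with the loadings treated as row vectors):

$$B_{n+\Delta} = B_n\,K^{\mathbb{Q}}_{1,\Delta} + B_\Delta$$

where $B_n$ denotes the unannualized loadings of $n$-year zero yields on $\mathcal{P}_t$. This suggests we can recover $K^{\mathbb{Q}}_{1,\Delta}$ in two steps. First, we project yields of all maturities onto the states $\mathcal{P}_t$ to recover the loadings $B_n$. Second, an estimate of $K^{\mathbb{Q}}_{1,\Delta}$ is obtained by projecting $B_{n+\Delta} - B_\Delta$ onto $B_n$ (allowing for no intercepts).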
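The second step can be sketched as a no-intercept least-squares problem. This is a minimal illustration under our own assumptions: loadings are stored as row vectors on a monthly maturity grid, the recursion is read as $B_{n+\Delta} - B_\Delta = B_n K$, and both `K_true` and `B_delta` are hypothetical numbers used only to generate synthetic loadings. The first step (projecting yields on the states) is not shown.

```python
import numpy as np

def recover_feedback(B, B_delta):
    """Least-squares K from B[k+1] - B_delta = B[k] @ K, with no intercept.

    B : (J x N) array; row k holds the unannualized loadings for maturity (k+1)*delta.
    """
    X = B[:-1]
    Y = B[1:] - B_delta
    K, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return K

# Hypothetical feedback matrix and one-month loadings (illustration only).
K_true = np.array([[0.997, 0.027, 0.025],
                   [-0.003, 0.958, -0.098],
                   [-0.006, 0.007, 0.925]])
B_delta = np.array([1.0, -0.5, 0.25]) / 12.0

# Build synthetic loadings for maturities of 1 to 120 months via the recursion.
rows = [B_delta]
for _ in range(119):
    rows.append(B_delta + rows[-1] @ K_true)
B = np.vstack(rows)

K_rec = recover_feedback(B, B_delta)
print(np.max(np.abs(K_rec - K_true)))
```

With estimated rather than synthetic loadings, the fit would no longer be exact, and the regression residuals would reflect convexity effects and measurement error.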

As can be seen from the last panel of Table 1, this model-free estimate of $K^{\mathbb{Q}}_{1,\Delta}$ comes strikingly close to estimates obtained from the no arbitrage models. This evidence suggests that the cross-sectional information alone is sufficient to pin down the risk-neutral feedback matrix, and this identification is so strong that information from other constraints imposed by the models seems irrelevant.

Given the estimates of

(

) models, we are able to conﬁrm that the convexity eﬀects

on yield loadings are negligible. Speciﬁcally, holding

ﬁxed, varying

, and thereby varying

the degree of convexity eﬀects due to the presence of stochastic volatility, is completely

inconsequential for the yield loadings implied by diﬀerent models. Graphs (not reported) of

yield loadings on

plotted against the corresponding maturities (up to ten years) implied

by A

(N), A

(N), and A

(N) are virtually indistinguishable.

The observed invariance property of $K^{\mathbb{Q}}_{1,\Delta}$ estimates has a number of implications. First, as stated previously, this allows us to pin down the potential volatility instruments using the cross-section of yields due to the admissibility constraint. Essentially the volatility instrument is free in terms of the sign but must be one of the left eigenvectors of $K^{\mathbb{Q}}_{1,\Delta}$ which can be computed accurately from either the cross-sectional regression or from estimation of the $A_0(N)$ model which has constant volatility and can be estimated quite quickly as shown in JSZ.

Footnote: To obtain yields for the full range of maturities from the small set of maturities used in estimation, we can use simple interpolation techniques such as the constant forward bootstrap or simply a cubic spline.

Footnote: de los Rios (2013) develops a similar regression-based approach to obtain estimates of $K^{\mathbb{Q}}_{1,\Delta}$.

This observation also shows that, in some regards, the estimation of the no arbitrage $A_M(N)$ model is more tractable than the estimation of the $F_M(N)$ model. In the case of the Gaussian

models the opposite holds: the factor model is trivial to estimate as it amounts to a set of

ordinary least squares regressions while the no arbitrage model is slightly more diﬃcult to

estimate due to the non-linear constraints in the factor loadings. In the stochastic volatility

models, the admissibility conditions require a number of non-linear constraints in order

to ensure that volatility remains positive. The no arbitrage model essentially determines

the volatility instrument up to sign and choice of eigenvector. This actually simpliﬁes the

estimation since it reduces the set of non-linear constraints that need to be imposed.

The observation that $K^{\mathbb{Q}}_{1,\Delta}$ estimates are nearly invariant across Gaussian and stochastic volatility models leads us to the surprising conclusion that the $A_0(N)$ model with constant volatility allows us to essentially identify (up to choice of which eigenvector) the source of stochastic volatility in the $A_M(N)$ model. We provide an illustration of this point in the next subsection.

5.2 Volatility information revealed by the Gaussian model

Despite the similarity, the estimates of $K^{\mathbb{Q}}_{1,\Delta}$ reported in Table 1 still exhibit slight numerical differences. It is possible these small numerical differences might become more significant in terms of the left eigenvectors and thus among model implied volatility instruments. To show that this is not the case, we carry out the following exercise. Starting with the $K^{\mathbb{Q}}_{1,\Delta}$ estimate by the $A_0(3)$ model, we form three potential volatility instruments from the three left eigenvectors of $K^{\mathbb{Q}}_{1,\Delta}$ and then pick out the instrument with most predictive content for volatility. Specifically, we first project the level factor, $\mathcal{P}_{1,t+\Delta}$, onto $\mathcal{P}_t$ to obtain the forecast residuals and then choose the volatility candidate with most predictive content for the squared residuals. This way, from the $A_0(3)$ model, we can have a “guess” for what the volatility instrument of the $A_1(3)$ model looks like even before we actually estimate the $A_1(3)$ model. Finally, we compare this “guess” to the actual volatility instrument implied by the $A_1(3)$ model.
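The selection step can be sketched as follows. This is an illustrative outline under our own assumptions, not the paper's code: the factors are simulated with constant volatility, `K_q` is a hypothetical risk-neutral feedback matrix built to have real eigenvalues, and only the one-month horizon is considered.

```python
import numpy as np

def adj_r2(y, x):
    """Adjusted R^2 from regressing y on a constant and a single regressor x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

def rank_instruments(P, K_q):
    """Adjusted R^2 of each left-eigenvector instrument for squared level residuals."""
    lam, W = np.linalg.eig(K_q.T)            # columns: left eigenvectors of K_q
    if np.max(np.abs(np.imag(lam))) > 1e-12:
        raise ValueError("this sketch assumes real eigenvalues")
    order = np.argsort(-np.real(lam))        # eigenvalues from highest to lowest
    W = np.real(W[:, order])
    # Forecast residuals of the level factor (first column of P).
    X = np.column_stack([np.ones(len(P) - 1), P[:-1]])
    coef, *_ = np.linalg.lstsq(X, P[1:, 0], rcond=None)
    sq_resid = (P[1:, 0] - X @ coef) ** 2
    return [adj_r2(sq_resid, P[:-1] @ W[:, j]) for j in range(W.shape[1])]

# Hypothetical inputs for illustration.
V = np.array([[1.0, 0.2, 0.1], [0.3, 1.0, 0.2], [0.1, 0.4, 1.0]])
K_q = V @ np.diag([0.99, 0.95, 0.85]) @ np.linalg.inv(V)
rng = np.random.default_rng(1)
P = np.zeros((421, 3))
for t in range(420):
    P[t + 1] = K_q @ P[t] + 0.01 * rng.standard_normal(3)

scores = rank_instruments(P, K_q)
best = int(np.argmax(scores))                # index of the candidate instrument
```

Because the simulated data here have constant volatility, the scores carry no economic meaning; the sketch only shows the mechanics of forming and ranking the candidates.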

Table 2 reports the adjusted $R^2$ statistics (in percentage) of regressions in which each potential volatility instrument is used to predict the squared residuals of the level factor. Evidently, one of the instruments clearly dominates the others at all forecasting horizons from one to twelve months. Comparing this dominant instrument to the actual volatility factor of the $A_1(3)$ model results in a striking correlation of one. To see this more visually, we plot these two volatility instruments, normalized to have the same scaling and intercepts, in Figure 4. Clearly, the $A_0(3)$'s “guess” is very accurate as the two graphs are right on top of one another.

Footnote: Specifically, in constructing both volatility instruments, we drop the intercepts and scale the loading on the level factor ($\beta(1)$) to one.

Horizon   Instrument 1   Instrument 2   Instrument 3
   1          9.35          -0.00           0.58
   2          8.84          -0.21          -0.16
   3          8.66           0.92           2.17
   4          7.32           0.96           2.55
   5          6.51           0.50           1.84
   6          6.04          -0.24           0.12
   7          5.28          -0.07           0.57
   8          4.71          -0.24           0.01
   9          4.87           0.33           1.26
  10          4.59           0.51           1.66
  11          4.27           0.51           1.77
  12          4.21           0.03           0.95

Table 2: $R^2$s (in percentage) predicting squared residuals in forecasting the level factor by the three potential volatility instruments implied by the $A_0(3)$ model. Horizons are in months. Instruments 1, 2, 3 are formed from the left eigenvectors of the $K^{\mathbb{Q}}_{1,\Delta}$ matrix, corresponding to the eigenvalues ordered from highest to lowest.

[Figure 4 plot: time series over 1970 to 2010; legend: $A_0(3)$ guess, $A_1(3)$.]

Figure 4: Volatility instrument “guessed” by the $A_0(3)$ model and the actual volatility factor implied by the $A_1(3)$ model. The volatility instrument is normalized as $\beta\cdot\mathcal{P}_t$ where $\beta(1)$ is scaled to one.

This exercise and the content of the previous subsection clearly reveal the respective roles

of the cross-sectional and time series information in shaping the choice of volatility instrument

in an

(3) model. The cross-sectional information pins down the risk-neutral feedback

matrix

1,∆

. The identiﬁcation seems so strong that time series constraints from matching

(

) and

appear inconsequential. Regardless of whether the time series constraints are

applied (in the

(3) model) or not (in the

(3) model), the estimates of

1,∆

seem largely

unaﬀected. The precise estimate of

1,∆

together with

(

) dramatically reduces the choice

of potential volatility instruments from an uncountably inﬁnite set to a discrete choice among

the

left eigenvectors of

1,∆

. This is the very sense in which a constant volatility model

such as the

(

) model can reveal volatility information of stochastic volatility models.

The

(

) model pins the optimal volatility instrument to be one of the

left eigenvectors,

but it does not determine exactly which one. It is the role of the time series constraints in

picking the left eigenvector (and its sign) that best matches M

(P) and M

6 No Arbitrage Restrictions

In this section, we show that the clear distinction of the roles of cross-section and time

series information in determining the volatility of no arbitrage models can have important

implications for dynamic term structure models. Speciﬁcally, we reconsider the puzzling result

of Dai and Singleton (2002). They show that while Gaussian models are able to replicate the

deviations from the expectation hypothesis found in the data, aﬃne term structure models

with stochastic volatility are unable to match the patterns found in the data. This failure

may potentially be due to the tension between ﬁrst and second moments. However, we show

that the stochastic volatility factor model (where the ﬁrst and second moment tension still

applies) is able to match deviations from the expectations hypothesis. This demonstrates

that the tension created by also matching

(

) in addition to

(

)and

is what drives

the failures of the stochastic volatility models demonstrated by Dai and Singleton (2002).

These results, together with our previous results, show that recent findings about the irrelevancy of no arbitrage restrictions in Gaussian models do not extend to affine models with stochastic volatility. For example, Duffee (2011), Joslin, Singleton, and Zhu (2011), and Joslin, Le, and Singleton (2012) all show that no arbitrage is nearly irrelevant in Gaussian dynamic term structure models on a number of dimensions. In contrast, for the case of stochastic volatility models, the no arbitrage constraints on the factor model have material effects for both first and second moments.

6.1 Expectation hypothesis

A generic property of arbitrage-free dynamic term structure models is that risk-premium adjusted expected changes in bond yields are proportional to the slope of the yield curve. Under the expectation hypothesis (EH), risk premiums are constant. This implies that the coefficients $\phi_n$ in the projections

$$\text{Proj}\left[y_{n-\Delta,t+\Delta} - y_{n,t} \,\middle|\, y_{n,t} - y_{\Delta,t}\right] = \alpha_n + \phi_n\left(\frac{y_{n,t} - y_{\Delta,t}}{n-\Delta}\right), \quad \text{for all } n > \Delta, \qquad (29)$$

should be uniformly ones under the EH. Campbell and Shiller (1991) show robust evidence that the $\phi_n$'s are significantly different from one and become increasingly negative for large $n$'s. This puzzling pattern of $\phi_n$'s, which can be observed in Figure 1 for our sample period, has become one of the most studied empirical phenomena for the last twenty years.

Dai and Singleton (2002) show that constant volatility models are not “puzzled” by this pattern and that the population coefficients $\phi_n$ implied by estimated $A_0(N)$ models very closely match their data counterparts. However, Dai and Singleton (2002) show a stark contrast for the canonical $A_M(N)$ models with $M > 0$, which feature stochastic volatility. Here they find that the $\phi_n$'s typically stay close to the unit line, thereby counterfactually implying that the EH nearly holds.

What is behind the difference in performances of the Gaussian and stochastic volatility models? To begin, it is worth noting that the loadings $\phi_n$'s for all affine models (with or without no arbitrage restrictions) can be written as:

$$\phi_n = (n - \Delta)\, \frac{\left(B_{n-\Delta} K^{\mathbb{P}}_{1,\Delta} - B_n\right) \Sigma \left(B_n - B_\Delta\right)'}{\left(B_n - B_\Delta\right) \Sigma \left(B_n - B_\Delta\right)'}, \qquad (30)$$

where $\Sigma$ denotes the unconditional covariance matrix of the time series innovations and $B_n$ the loadings of the $n$-period yield $y_{n,t}$ on the principal components of yields $\mathcal{P}_t$. As noted earlier, the loadings $B_n$'s are essentially identical across models with and without stochastic volatility.

Furthermore, the covariance matrix $\Sigma$ appears in both the numerator and denominator of (30), thus its impact on $\phi_n$ is greatly dampened due to cancellation. This essentially leaves the one-month ahead physical feedback matrix $K^{\mathbb{P}}_{1,\Delta}$ as the natural focus in explaining the differences in $\phi_n$'s across the constant and stochastic volatility models.
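As a rough illustration of how (30) maps the physical feedback matrix into the Campbell-Shiller coefficients, the sketch below evaluates the ratio for a hypothetical one-factor model. The function names, the loading formula $B_n = (1 - k^n)/(n(1 - k))$ (which ignores convexity), the choice of measuring maturities in periods so that $\Delta = 1$, and all parameter values are assumptions made for the example, not quantities taken from the estimated models.

```python
import numpy as np

def campbell_shiller_phi(B, K1, Sigma, n, delta=1):
    """Population phi_n as in (30); B[m] is the loading vector of the m-period yield."""
    spread = B[n] - B[delta]                       # loadings of y_n - y_delta
    num = (B[n - delta] @ K1 - B[n]) @ Sigma @ spread
    den = spread @ Sigma @ spread
    return (n - delta) * num / den

def one_factor_loadings(k_q, max_n):
    """Hypothetical loadings B_n = (1 - k^n) / (n (1 - k)), convexity ignored."""
    return {m: np.array([(1 - k_q**m) / (m * (1 - k_q))]) for m in range(1, max_n + 1)}

B = one_factor_loadings(0.95, 120)                 # risk-neutral persistence 0.95 (assumed)
Sigma = np.array([[1.0]])

# Physical persistence equal to the risk-neutral one: risk premiums are constant.
phi_eh = campbell_shiller_phi(B, np.array([[0.95]]), Sigma, n=60)

# Physical persistence below the risk-neutral one: phi_n moves away from one.
phi_gap = campbell_shiller_phi(B, np.array([[0.90]]), Sigma, n=60)
```

In this toy setting the covariance term cancels exactly, so any departure of the coefficient from one comes from the gap between the physical feedback and the feedback embedded in the loadings, which is the channel the text emphasizes.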

One of the main findings of JSZ in the context of the Gaussian models is that no arbitrage restrictions are irrelevant for models’ forecasting performance. Equivalently, estimates of the one-month ahead physical feedback matrix $K^{\mathbb{P}}_{1,\Delta}$ from $A_0(N)$ models are exactly identical to those obtained from OLS regressions of $\mathcal{P}_{t+\Delta}$ on $\mathcal{P}_t$ and, thus, completely unaffected by no arbitrage restrictions. Turning to the $A_1(N)$ models, the concurrent presence of M(P) and M(Q) builds a strong link between the one-month ahead physical and risk neutral feedback matrices: $K^{\mathbb{P}}_{1,\Delta}$ and $K^{\mathbb{Q}}_{1,\Delta}$ must share one common left eigenvector. To the extent that $K^{\mathbb{Q}}_{1,\Delta}$ is very strongly pinned down by the cross-section information, it is likely to force the physical feedback matrix $K^{\mathbb{P}}_{1,\Delta}$ to accept one of the $N$ left eigenvectors of $K^{\mathbb{Q}}_{1,\Delta}$ as one of its own. Due to this coupling of $K^{\mathbb{P}}_{1,\Delta}$ and $K^{\mathbb{Q}}_{1,\Delta}$, the estimate of $K^{\mathbb{P}}_{1,\Delta}$ from the $A_1(N)$ model is likely strongly influenced by the no arbitrage restrictions and thus can be quite different from its OLS counterpart.
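The shared-left-eigenvector requirement can be checked numerically. The following is a minimal sketch, assuming hypothetical three-factor feedback matrices built from made-up eigenvectors and eigenvalues; `left_eigenvectors` and `shared_left_eigenvector` are illustrative helper names, not part of any estimation code used for the paper.

```python
import numpy as np

def left_eigenvectors(K):
    """Return eigenvalues of K and a matrix whose rows are left eigenvectors of K."""
    vals, vecs = np.linalg.eig(K.T)    # right eigenvectors of K' are left eigenvectors of K
    return vals, vecs.T

def shared_left_eigenvector(K_p, K_q, tol=1e-8):
    """Return (beta, eigenvalue under K_p) if a left eigenvector of K_q is also one of K_p."""
    _, lefts = left_eigenvectors(K_q)
    for beta in lefts:
        image = beta @ K_p
        scale = (image @ beta.conj()) / (beta @ beta.conj())
        if np.linalg.norm(image - scale * beta) < tol:   # beta K_p is proportional to beta
            return beta, scale
    return None

def build_feedback(U, eigenvalues):
    """K = U^{-1} diag(eigenvalues) U, so that the rows of U are left eigenvectors of K."""
    return np.linalg.inv(U) @ np.diag(eigenvalues) @ U

# Hypothetical matrices: U_p and U_q share their first row (the volatility instrument).
U_q = np.array([[1.0, 1.5, 0.3], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]])
U_p = np.array([[1.0, 1.5, 0.3], [0.2, 1.0, 0.0], [0.0, 0.3, 1.0]])
U_o = np.array([[1.0, 0.5, 0.0], [0.3, 1.0, 0.0], [0.0, 0.4, 1.0]])   # shares nothing with U_q

K_q = build_feedback(U_q, [0.99, 0.96, 0.85])
K_p = build_feedback(U_p, [0.97, 0.93, 0.80])
K_other = build_feedback(U_o, [0.97, 0.93, 0.80])

match = shared_left_eigenvector(K_p, K_q)
no_match = shared_left_eigenvector(K_other, K_q)
```

With the risk-neutral matrix held fixed, only a discrete set of directions is available to the physical matrix, which is why an unrestricted OLS estimate will generally fail the check.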

In fact, these loadings are very close to those obtained from OLS regressions of yields of individual

maturities onto the pricing factors P

Model       N = 4                                N = 3

$A_0(N)$     0.990  -0.009  -0.047  -0.017        0.990  -0.009  -0.047
             0.010   0.955  -0.080  -0.032        0.010   0.955  -0.080
            -0.002   0.010   0.799   0.030       -0.002   0.010   0.799
             0.000   0.012  -0.012   0.627

$A_1(N)$     0.990   0.015   0.002  -0.034        0.993   0.011  -0.035
             0.004   0.975  -0.080  -0.099        0.006   0.973  -0.100
             0.000   0.002   0.835   0.013       -0.001  -0.009   0.823
            -0.006   0.001  -0.031   0.703

$A_2(N)$     0.987   0.017   0.035  -0.004        0.992   0.015   0.009
            -0.002   0.958  -0.083  -0.106        0.004   0.974  -0.075
             0.002   0.013   0.870   0.052        0.013   0.011   0.902
             0.001   0.020  -0.025   0.729

$F_1(N)$     0.991  -0.013  -0.093  -0.010        0.997  -0.018  -0.081
             0.006   0.962  -0.054  -0.045        0.002   0.965  -0.052
             0.009  -0.012   0.867   0.048        0.005  -0.008   0.830
            -0.009   0.022   0.003   0.715

$F_2(N)$     0.994  -0.013  -0.086   0.028        0.998  -0.018  -0.056
             0.005   0.962  -0.067  -0.070        0.005   0.965  -0.010
             0.005  -0.012   0.876   0.032        0.004  -0.007   0.828
            -0.003  -0.011  -0.027   0.702

Table 3: $K^{\mathbb{P}}_{1,\Delta}$ Estimates

Table 3 reports estimates of $K^{\mathbb{P}}_{1,\Delta}$ for the $A_M(N)$ models with $M = 0, 1, 2$ and $N = 3, 4$. The sample period is 1973 through 2007 and we note, again, that all of our results for a shortened sample period that excludes the Fed experiment regime remain qualitatively similar. Comparing the $A_0(N)$ and $A_1(N)$ models reveals one interesting difference: for both $N = 3$ and $N = 4$, the (1,2) entry of the feedback matrix, which governs how slope this period forecasts level next month, is negative for the $A_0(N)$ model but positive for the $A_1(N)$ model.

A negative value for this entry means that higher slope leads to lower level and thus higher

return in the future whereas the opposite is true for a positive entry. As is well-known, the

Campbell and Shiller (1991) regression in (29) is equivalent to one in which future bonds’

excess returns are projected onto a measure of slope:

$$\mathrm{Proj}\left[\, xr^{n-\Delta}_{t+\Delta} \;\middle|\; y_{n,t} - y_{\Delta,t} \right] = (1 - \phi_n)(y_{n,t} - y_{\Delta,t}) \qquad (31)$$

where $xr^{n-\Delta}_{t+\Delta}$ denotes one-month excess returns on the $n - \Delta$ period bond, realized at time $t + \Delta$. Combining with the established empirical fact that the Campbell and Shiller (1991) loadings $\phi_n$'s are always below one (and mostly negative), (31) clearly reveals that higher slope must be followed by higher returns. It follows that the positive (1,2) entry of the feedback matrix, which counter-factually implies that higher slope must be followed by lower returns, is likely the key weak point of the $A_1(N)$ models. Moreover, the same weakness also applies to the $A_2(N)$ models as the (1,2) entries for these models are also similarly positive.

To examine whether the no-arbitrage restrictions are indeed forcing the physical feedback matrix of the stochastic volatility models to admit these counter-factual values, we estimate the $F_M(N)$ models established in Section 3. Recall that these are the counterparts to the $A_M(N)$ models with the no-arbitrage restrictions, and thus the “first moments” restrictions through M(Q), completely relaxed. We use the same sample period (1973 through 2007) and the same set of yields in estimation, thus the estimated $A_M(N)$ and $F_M(N)$ models are directly comparable. Examining the values of the $K^{\mathbb{P}}_{1,\Delta}$ matrix reported in the last two panels of Table 3, for all four $F_M(N)$ models ($M = 1, 2$ and $N = 3, 4$) the (1,2) entry of the feedback matrix is negative.

Although a negative (1,2) entry should now allow slope to forecast level with the right sign, the key question is whether the $F_M(N)$ models, without no arbitrage restrictions, can produce loadings $\phi_n$'s that match up with the Campbell and Shiller (1991) regression (31) in the data. The answer to this question is a definite yes! Examining the pattern of the loadings $\phi_n$ implied by the $A_1(3)$ and $A_2(3)$ models in Figure 1, we find the well-known result of Dai and Singleton (2002) in which these stochastic volatility models have a long way to go in matching the empirical Campbell and Shiller (1991) regression coefficients. Nevertheless, once the no arbitrage restrictions are dropped and the $A_1(3)$ and $A_2(3)$ models turn into the corresponding $F_1(3)$ and $F_2(3)$ models, the model-implied $\phi_n$'s now become extremely close to their empirical counterparts, arguably as close as those loadings implied by the $A_0(3)$ model. A graph (not reported) for four factor models shows very similar results.

In short, Figure 1 constitutes convincing evidence that the no arbitrage restrictions, and in particular those needed to match M(Q), seem directly behind the failure of the $A_M(N)$ models for $M > 0$ in explaining the deviations from the EH. In stark contrast, the admissibility restriction M(P) under the physical measure, present in both the $A_M(N)$ and the $F_M(N)$ models, appears largely inconsequential for a model’s ability to match the deviations from the EH.

6.2 Why does imposing no-arbitrage lead to slope predicting level

with a positive sign?

Whereas it seems clear that the presence of no-arbitrage forces slope to predict level with a positive sign, thereby impairing no arbitrage stochastic volatility models’ ability to match the empirical Campbell and Shiller (1991) regression coefficients, the exact mechanism is not obvious. To shed light on this, and with an emphasis on intuition, we focus on the $A_1(N)$ models and present two heuristic results. First, we show that as long as the risk-neutral dynamics of the non-volatility factors are not too close to being explosive, the loadings of the volatility factor on the level and slope factors ($\beta(1)$ and $\beta(2)$) will have the same sign. Second, we show that, given the tendency of the volatility factor to be relatively more persistent than the slope factor, the sign constraint on $\beta(1)$ and $\beta(2)$ necessarily causes slope to predict level with a positive sign.

To see the former result in the most simplified manner, let’s focus on the two-factor model $A_1(2)$ and think of the level and slope factors simply as the one-year yield, and the spread between the ten-year and one-year yield, respectively. We show in Appendix A that the volatility loadings (with the first entry normalized to one) can be written as:

$$\beta = \left(1,\; -W_1 B_X (W_2 B_X)^{-1}\right)$$

where $W_1 = (1, 0)$, $W_2 = (-1, 1)$. Furthermore, standard bond pricing calculations reveal that

$$B_X = \Delta \left( \frac{1 - e^{\lambda}}{1 - e^{\lambda\Delta}},\; \frac{1 - e^{10\lambda}}{10\,(1 - e^{\lambda\Delta})} \right)'$$

where $\lambda$ denotes the risk-neutral eigenvalue corresponding to the non-volatility factor as in (14). A few algebraic steps show that:

$$\beta(2) = -W_1 B_X (W_2 B_X)^{-1} = \frac{1 - e^{\lambda}}{(1 - e^{\lambda}) - \dfrac{1 - e^{10\lambda}}{10}}. \qquad (32)$$

Clearly, as long as $\lambda \le 0$, or equivalently the non-volatility factor is stationary, both the numerator and the denominator of the right hand side of (32) are positive. Therefore we can make the following statement for the $A_1(2)$ model:

As long as the non-volatility factor is $\mathbb{Q}$-stationary, the loading of the volatility factor on slope will always be of the same sign as the loading on level.

To the extent that the loading of volatility on level is generally positive, it implies that the loading on slope is also positive.
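The sign claim can be checked on a grid. The sketch below assumes that (32) reads $\beta(2) = (1 - e^{\lambda}) / \left[(1 - e^{\lambda}) - (1 - e^{10\lambda})/10\right]$, with level and slope taken as the one-year yield and the ten-year minus one-year spread; the function name `beta2` and the grid of eigenvalues are illustrative choices.

```python
import math

def beta2(lam, short=1.0, long=10.0):
    """Slope loading of the volatility instrument, level loading normalized to one.

    Assumes the reading of (32) stated above; `short` and `long` are the
    maturities (in years) of the yields defining level and slope.
    """
    short_loading = (1.0 - math.exp(lam * short)) / short
    long_loading = (1.0 - math.exp(lam * long)) / long
    return short_loading / (short_loading - long_loading)

# Strictly negative eigenvalues (lambda = 0 is a degenerate limit of the ratio).
grid = [-0.01 * k for k in range(1, 201)]
values = [beta2(lam) for lam in grid]
all_positive = all(v > 0 for v in values)
```

On this grid the loading stays positive and grows large as the eigenvalue approaches zero, whereas a clearly explosive value such as `beta2(0.1)` flips the sign, in line with the qualification that the risk-neutral dynamics must not be too close to explosive.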

Similar results hold up for more general loadings of the level and slope factors and for the $A_1(3)$ model. Adopting the loadings that correspond to the lower order yield PCs, we roam over the possible values of $\lambda \le 0$ for both cases $N = 2$ and $N = 3$ and plot in Figure 5 the corresponding values of $\log(1 + \beta(2))$ (again with $\beta(1)$ normalized to one). Note that this transformation, chosen for better scaling of the graphs, is positive if and only if $\beta(2)$ is positive. As can be seen clearly from the graphs, $\beta(2)$ is always positive, implying that both level and slope will load with the same sign in the volatility instrument of both the $A_1(2)$ and $A_1(3)$ models.

Turning to the second result, let’s start by noting that the normalized volatility factor of the $A_1(2)$ model can be written as:

$$V_t = \beta \cdot \mathcal{P}_t = L_t + \beta(2) S_t \qquad (33)$$

where $L_t$ is the level, $S_t$ is the slope, and $\beta(2)$ is positive. Now the admissibility restriction under the physical measure requires that only $V_t$ can forecast $V_{t+\Delta}$, or equivalently:

$$E_t[V_{t+\Delta}] = \text{constant} + \rho_V V_t. \qquad (34)$$

Figure 5: log(1 + β(2)) for various values of λ. Panel (a): $A_1(2)$. Panel (b): $A_1(3)$, with three curves labelled 0.75, 0.85, and 0.95 in the legend. The horizontal axis runs from 0.5 to 1.

Substituting (33) into (34) and evaluating the one month forecast of the level factor, we obtain:

$$E_t[L_{t+\Delta}] = \text{constant} + \rho_V L_t + \beta(2)\left(\rho_V S_t - E_t[S_{t+\Delta}]\right). \qquad (35)$$

Assuming slope forecasts future slope with a coefficient of $\rho_S$, it follows from (35) that slope forecasts future level with a coefficient of $\beta(2)(\rho_V - \rho_S)$.

Due to the sign restriction $\beta(2) > 0$, established above, the sign with which slope forecasts level depends on the difference between $\rho_V$, the persistence of the volatility factor, and $\rho_S$. Empirically, due to the well known volatility level effect, the volatility factor is typically quite persistent. In contrast, the slope factor operates at a relatively higher frequency. This suggests that $\rho_S$, which is closely related to the persistence of the slope factor, is likely much smaller than $\rho_V$, thus requiring slope to forecast level with a positive coefficient.
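The algebra behind this conclusion can be made concrete. The sketch below builds the 2x2 physical feedback matrix for (level, slope) that results from requiring $V_t = L_t + \beta(2) S_t$ to be autonomous with persistence $\rho_V$, while slope follows $E_t[S_{t+\Delta}] = \text{constant} + \rho_{SL} L_t + \rho_S S_t$. The helper names and the numerical values are hypothetical and chosen only to illustrate the sign.

```python
def physical_feedback(beta2, rho_v, rho_s, rho_sl=0.0):
    """Feedback matrix of (level, slope) implied by an autonomous V = L + beta2 * S."""
    return [
        [rho_v - beta2 * rho_sl, beta2 * (rho_v - rho_s)],   # level equation
        [rho_sl, rho_s],                                      # slope equation
    ]

def left_multiply(beta, K):
    """Row vector beta times the 2x2 matrix K."""
    return [sum(beta[i] * K[i][j] for i in range(2)) for j in range(2)]

# Hypothetical persistence values: volatility more persistent than slope.
K = physical_feedback(beta2=1.5, rho_v=0.99, rho_s=0.95, rho_sl=0.01)
beta = [1.0, 1.5]
image = left_multiply(beta, K)          # equals rho_v * beta when V is autonomous
slope_to_level = K[0][1]                # the (1,2) entry discussed in the text
```

By construction $(1, \beta(2))$ is a left eigenvector of this matrix with eigenvalue $\rho_V$, and the (1,2) entry is positive whenever $\beta(2) > 0$ and $\rho_V > \rho_S$.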

7 Risk price speciﬁcation and identiﬁcation of the volatil-

ity factor(s)

Within the class of

-aﬃne term structure models, many diﬀerent speciﬁcations for the market

prices of risks have been proposed. Starting with the completely aﬃne setup formalized by

Dai and Singleton (2000), we have seen more ﬂexible aﬃne forms such as Duﬀee (2002),

Cheridito, Filipovic, and Kimmel (2007), as well as non-aﬃne forms such as Duarte (2004).

Depending on the risk-price speciﬁcations, the physical dynamics can be aﬃne or non-aﬃne.

To be more precise, E

t+1

] = constant + ρ

+ ρ

but we focus only on the S

term.

(3) Duarte

(3) Diag

(3)

0.997 0.028 0.025 0.998 0.028 0.024 0.998 0.028 0.024

-0.003 0.954 -0.098 -0.002 0.955 -0.097 -0.002 0.955 -0.098

-0.005 -0.002 0.928 -0.005 -0.000 0.931 -0.005 -0.000 0.930

Table 4: K

1,∆

Estimates

Nonetheless, a common thread through all of these diﬀerent modelling choices is the fact

that the risk-neutral dynamics of the underlying states remain aﬃne and, importantly, free

of artiﬁcial constraints beyond those that guarantee admissibility.

Recall from Section 5 that our regression-based estimates of the risk-neutral feedback

matrix

1,∆

, which are completely independent of any physical dynamics, are quite close

to estimates implied by the

(

) models. It is therefore very likely that the risk neutral

feedback matrix

1,∆

will be very strongly identiﬁed regardless of their risk price speciﬁca-

tions. Moreover, specializing to models with one volatility factor, due to

(

), the strong

identiﬁcation of

1,∆

translates into a virtually discrete choice of the volatility instruments

from the

left eigenvectors of

1,∆

. As is seen earlier, given the

left eigenvectors, the

choice of which volatility instrument seems to rest on the matching of

and much less on

the functional form of risk prices. We therefore conjecture that the volatility factor implied

by these models is likely highly similar across diﬀerent risk price speciﬁcations.

To confirm our conjecture, we use the same data used earlier to estimate two term structure models with one volatility factor for $N = 3$ with different risk price specifications. The first adopts the non-affine approach of Duarte (2004) to include a square-root term in the risk price of the volatility factor. The second restricts the conditional feedback matrix corresponding to the PC yields portfolios to be diagonal under the physical measure. Similar diagonal restrictions have been considered by Joslin, Le, and Singleton (2012), among others. We refer to these models as Duarte $A_1(3)$ and Diag $A_1(3)$, respectively.

As is evident from Table 4, the risk neutral feedback matrices implied by Duarte $A_1(3)$ and Diag $A_1(3)$ are virtually identical and are extremely close to the $K^{\mathbb{Q}}_{1,\Delta}$ matrix implied by the constant volatility model $A_0(3)$ (which is shown earlier to be very close to that implied by the $A_1(3)$ model). Comparing the volatility factors implied by Duarte $A_1(3)$ and Diag $A_1(3)$ in Figure 6 clearly reveals that these volatility factors are virtually indistinguishable. Very similar results (not reported) are obtained for four factor models and for a shortened sample period that excludes the Fed regime. In short, we find strong evidence that altering the price of risk specification is relatively inconsequential for the identification of volatility.

8 Extensions

In this section, we show how our results extend to the case of multiple volatility factors and unspanned stochastic volatility. As before, the tension arises because of the relationship between the left eigenspaces of the feedback matrices under $\mathbb{P}$ and $\mathbb{Q}$. Additionally, new conditions arise because of the requirement of positive feedback amongst CIR factors.

Figure 6: Volatility factor ($\beta \cdot \mathcal{P}_t$ where $\beta(1)$ is scaled to one) implied by the $A_1(3)$, Duarte $A_1(3)$, and Diag $A_1(3)$ models. The horizontal axis runs from 1970 to 2010.

8.1 Multiple volatility factors

In an

(

) model, as

rises relative to

, the restriction that the univariate volatility

factor is an autonomous process weakens to the requirement that the

-dimensional volatility

process is autonomous. This allows for richer feedback among the factors. While the conditions

are weaker in this sense, new conditions arise where now feedback amongst the volatility

factors must be positive. Additionally, such factors are now required to be independent.

Although the proofs of the results that we state below are generally direct, we omit them due

to their tedious nature.

An A

(N) model has a latent factor representation as

= (K

+ K

) dt + +

+ Σ

1,1

+ . . . + Σ

t,M

(36)

with r

= ρ

+ ρ

· X

and the admissibility requirements:

≥

1,i,ii

, i ≤ M, (37)

1,ij

≥ 0, i, j ≤ M, i ≠ j, (38)

1,ij

= 0, i ≤ M < j, (39)

0,ii

= 0, i ≤ M, (40)

1,i,jj

= 0, i, j ≤ M, i ≠ j, (41)

and Σ

, . . . ,

are positive semi-deﬁnite symmetric matrices. Similar relations apply for

the risk-neutral dynamics. As before we have the relation

= A

+ B

, (42)

where A

and B

are dependent on the underlying parameters.

When

M >

1 there will be multiple volatility instruments given by a set of vectors

{β

, β

, . . . , β

}

. As before, we will require that under both the historical and risk-

neutral measures there are CIR-type factors which do not have any feedback from the

conditionally Gaussian factors. In the

(

) case, this required that

is a left eigenvector

and

. For the general case of

M ≥

1, we require that there exists

left eigenvectors

, . . . , e

}

, and

left eigenvectors of

, . . . , f

}

so that the span of

is the same as the span of both

, . . . , e

}

and

, . . . , f

}

This will give a direct tension

between ﬁrst moments.

The conditions for no-feedback between conditionally Gaussian and CIR factors required

by (39) imply eigenvector restrictions that give rise to a tension between

(

) and

(

The restriction of positive feedback among the CIR factors given by (38) also implies a tension

between

(

) and

(

). To illustrate these tensions, let us consider the case of an

(2)

model. In this case, since M = N, the eigenvector conditions do not bind.

Suppose that

has real eigenvalues

{λ

, λ

}

with corresponding eigenvectors

, u

}

Then

−1

diag

(

, λ

) where

= [

, u

]. Straightforward but tedious computations

imply that the positive feedback conditions in (38) require that Γ =

−1

must satisfy

one of:

1. Γ

≥ 0, or

2. Γ

< 0 < Γ

and Γ

− Γ

> 2

√

−Γ

, or

3. Γ

< 0 < Γ

and Γ

− Γ

> 2

√

−Γ

Note that by our previous logic,

is likely to be strongly identiﬁed by the cross-section

of yields and largely invariant to the choice of

. Therefore

should be estimated very

precisely. If one considers

to be ﬁxed (i.e. estimated without error), then the conditions

Alternatively, we can characterize the relation that there exists a single fixed matrix

so that for each

both (i) K

(Uβ

) is in the span of B and (ii) (K

)

(Uβ

) is in the span of B.

above translate into quadratic constraints on

. This is the new tension between

(

)

and M

(Q) not present in A

(N) models.

In addition, the tensions with

become more pronounced. To see this, suppose that

· P

, c

· P

}

have uncorrelated innovations. That is, the covariance matrix of innovations

, where

= [

, c

]

, is diagonal. Then it must be that

−1

and

−1

has

non-negative oﬀ-diagonal elements. Again taking the extreme example that

is estimated

without any error, this will impose a restriction on which possible linear combinations of

are uncorrelated.

8.2 Unspanned stochastic volatility

With some modiﬁcations, our results also apply in models with unspanned stochastic volatility

as in Collin-Dufresne and Goldstein (2002) and Bikbov and Chernov (2009), among others.

We now discuss these results for the case of one or more stochastic volatility factors as well

as for the cases with purely unspanned stochastic volatility or a mixture of spanned and

unspanned stochastic volatility. Similar results would apply in cases with unspanned risk

such as inﬂation risk (see, e.g. Chernov and Mueller (2009)), provided additional data is

available to price this risk such as inﬂation swaps.

In the case of an

(

) model with unspanned stochastic volatility, the tension implies

that volatility is an autonomous process under

(

) and

(

). The clean separation of

yields and volatility will imply that our results are informative only about volatility and will

require that volatility is an autonomous process, which is well understood in the literature.

For example, Almeida, Graveline, and Joslin (2011) ﬁnd correlations between yield curve

factors and volatility measures as high as 84% while Jacobs and Karoui (2009) ﬁnd

high as 75%.

When

M >

1, several possibilities arise. Joslin (2013a) develops an

(

) model

where there are both spanned and unspanned stochastic volatility. Such a model would be

consistent with the moderate

(but less than 100%) that are available from projecting

volatility/variance measures onto the yield curve. Our existing results will apply directly

to the spanned component of volatility. For example, the possible choices for the spanned

instrument for volatility will be determined by

. Moreover, spanned and unspanned

volatility factors will follow an

(2) process. The previous discussion of

(2) models will

now apply. In this case,

(

) and

(

) will be determined by actual and risk-neutral

forecasts of volatility. Risk-neutral forecasts of volatility will now be identiﬁed by a cross-

section of option prices instead of bond prices. For example, if

= (

span

, V

unspan

) and

has uncorrelated innovations, then our results in Section 8.1 show that the risk-neutral and

actual forecasts will have to satisfy a number of constraints.

Another alternative is to have a model with multiple unspanned stochastic volatility

factors. Joslin (2013a) and Trolle and Schwartz (2009) show how to construct such models.

In this case, our results regarding

(2) models (and more generally

(

)) will apply

to the volatility factors. Thus, again, there will be tensions as we have outlined between

risk-neutral and actual forecasts of volatility as well as the volatility of volatility. Moreover,

our logic would generally apply to multiple unspanned volatility factors where the risk-neutral

dynamics could be identiﬁed by options as in Carr, Gabaix, and Wu (2009).

9 Conclusion

In the context of no arbitrage affine term structure models with stochastic volatility, we

document a strong tension between the ﬁrst moments of bond yields under the time series and

risk neutral measures. We show that, beyond other types of tensions documented so far in

the existing literature, this tension is key in understanding important economic implications

of no-arbitrage aﬃne models with stochastic volatility. In particular, this tension underlies

the well-known failure of the

(

) class of models in explaining the deviations from the

EH in bond data.

Our primary results are driven by the fact that an aﬃne drift requires a number of

constraints in order to assure that volatility stays positive. A number of alternative models

could be considered. First, one could consider a model with unspanned or nearly unspanned

volatility. This, however, can only partially counteract our results in the sense that the

projection of volatility onto yields must still mathematically be a positive process. So several

of our insights maintain. Another possible model to consider is a model with non-linear drift.

That is, we can suppose that there is a latent state variable

with the drift of

linear

without any constraints provided that volatility (or its instrument) is far from the

boundary. Near the zero boundary, the drift of the volatility may be non-linear in such a way

as to maintain positivity. Provided that the probability of entering this non-linear region is

small (under Q), pricing equations similar to those of the standard affine setting will be obtained.

A Dependence of Volatility Loadings β and the Eigen-

values under Q

Our arguments in Section 3.3 rest on the approximation that convexity eﬀects are negligible.

Although we conﬁrm empirically that this approximation holds up in the data, we now

make our arguments more precise by showing that the volatility instrument

is in fact

completely determined by the (N − 1) eigenvalues given in λ

. This can be seen as follows.

Let

(1)

(

(2)

) denote the ﬁrst entry (entries two to N) of

where

(

)

refers to the ﬁrst row (rows 2 to N) of

. Let

and

denote the yield loadings on

and

, respectively. Thus,

corresponds to the ﬁrst column, and

the remaining

columns of

. Notably, due to the block structure of the feedback matrix in (14) and the

fact that the non-volatility factor

does not give rise to Jensen eﬀects, it can be shown

that B

only depends on λ

. Then we have,

(1)

= constant + W

+ W

(2)

= constant + W

+ W

This gives two equations and two unknowns, so we can subtract

(

)

−1

times the

second equation from the ﬁrst equation to eliminate X

and obtain

(1)

− W

)

−1

(2)

= constant + cV

where

is a constant. This shows directly that no arbitrage imposes the restriction (up to

scaling):

β = (1, −W

)

−1

). (43)

We see that the volatility instrument is in fact determined entirely by

. Coupled with our

reasoning earlier that

should be strongly pinned down in the data, it is clear that the

volatility instruments are heavily aﬀected by the no arbitrage restrictions. Equation (43) also

makes clear the nature of the (close) relationship between the volatility instrument and yield

loadings discussed earlier.

B A Canonical Form for Discrete-Time Term Struc-

ture with Stochastic Volatility

In this section, drawing on the construction in Le, Singleton, and Dai (2010) (LSD), we lay out

canonical forms for discrete-time aﬃne term structure models with stochastic volatility. As is

shown by LSD, for monthly data, these provide very good approximation to the continuous

time A

(N) models in the main text.

We start by assuming that the economy is fully characterized by the

-variate state

vector

= (

, X

)

where

is a strictly positive

-variate volatility process and

conditionally Gaussian. The time interval is ∆.

B.1 Risk-neutral dynamics and bond pricing

Under Q, our states follow:

t+∆

∼ CAR(ρ

, c

, ν

), (44)

t+∆

∼ N(K

1V,∆

+ K

1X,∆

, Σ

0,∆

i=1

i,∆

i,t

), independent of V

t+∆

(45)

= r

∞

+ ρ

+ ι

. (46)

CAR denotes a compound autoregressive gamma process. See LSD for more details.

Each CAR process is fully characterized by three non-negative parameters:

(

M × M

(M × 1), and ν

(M × 1). The Laplace transform for a CAR variable:

t+1

] = e

a(u)+b(u)Z

where a(u) = −

log(1 − u

), b(u) =

1 − u

where the subscript i indexes the i

element for vectors and the i

row for matrices.

From the Laplace transform, standard bond pricing calculations show that bond prices for

all maturities are exponentially aﬃne. Denoting by

n,t

the price of a zero coupon bond with

periods (

∆ years) until maturity, we can show that

logP

n,t

−A

− B

V,n

− B

X,n

with loadings given by:

X,n

= ι

+ B

X,n−1

, (47)

V,n,i

= ρ

V,i

k=1

(k, i)

V,n−1,k

1 + B

V,n−1,k

+ B

X,n−1

(:, i) −

X,n−1

i,X

X,n−1

(48)

= r

∞

+ A

n−1

log(1 + B

V,n−1,i

) −

X,n−1

, (49)

starting from: A

= B

V,0

= B

X,0

≡ 0.

B.2 Physical dynamics

Under P, the state variables follow:

t+∆

∼ bivariate CAR(ρ, c, ν), (50)

t+∆

∼ N(K

0,∆

+ K

1V,∆

+ K

1X,∆

, Σ

0,∆

i=1

i,∆

i,t

), independent of V

t+∆

(51)

Non-attainment under P requires the Feller condition: ν ≥ 1.

B.3 The continuous time limit

The conditional mean $E_t[V_{t+1}]$ and conditional covariance matrix $\mathrm{Var}_t[V_{t+1}]$ implied by the Laplace transform of the CAR process are

$$E_t[V_{t+1}](i) = \nu_i c_i + \rho_i V_t, \qquad \mathrm{Var}_t[V_{t+1}](i, i) = \nu_i c_i^2 + 2 c_i \rho_i V_t, \qquad (52)$$

and the off-diagonal elements of $\mathrm{Var}_t[V_{t+1}]$ are all zero (correlation occurs only through the feedback matrix).
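The moments in (52) can be checked by simulation in the scalar case. The sketch below assumes the usual Poisson-mixture-of-gammas representation of an autoregressive gamma process, in which $V_{t+1}/c$ is gamma distributed with shape $\nu + Z$ and $Z$ is Poisson with mean $\rho V_t / c$; this representation, the function names, and the parameter values are assumptions made for illustration (see LSD for the exact construction used in the paper).

```python
import numpy as np

def car_moments(v, rho, c, nu):
    """Conditional mean and variance of a scalar CAR step, read off (52)."""
    return nu * c + rho * v, nu * c**2 + 2.0 * c * rho * v

def simulate_car_step(v, rho, c, nu, size, rng):
    """Draw V_{t+1} given V_t = v via a Poisson mixture of gamma variables (assumed form)."""
    mixing = rng.poisson(rho * v / c, size=size)    # Z ~ Poisson(rho * v / c)
    return c * rng.gamma(nu + mixing, 1.0)          # V' = c * Gamma(nu + Z, 1)

rng = np.random.default_rng(0)
draws = simulate_car_step(v=0.05, rho=0.9, c=0.01, nu=2.0, size=200_000, rng=rng)
mean, var = car_moments(0.05, 0.9, 0.01, 2.0)
```

Every draw is strictly positive by construction, which is the admissibility property that the discrete-time canonical form is designed to preserve.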

That this process converges to the multi-factor CIR process can be seen by letting

M×M

− κ

∆

, and

2(κθ)

, where

is a

M × M

matrix and

is a

M ×

vector. In the limit as ∆t → 0, the V

converges to:

= κ(θ − V

)dt + σ

diag(V

)dB

where σ is a N × N diagonal matrix with i

diagonal element given by σ

For the conditionally Gaussian variables, it is straightforward to see that if we let

0,∆

∆

1V,∆

∆

1X,V

I −K

∆

, and Σ

i,∆

= Σ

∆

, in the time limit, the

process converges to:

= (K

+ K

)dt +

i=1

i,t

. (53)

B.4 Technical conditions

We now discuss two technical issues related to this parameterization. First, consider the

market prices of variance risk:

V ar

t+1

]

−1

t+1

] − E

t+1

]).

As discussed by Cheridito, Filipovic, and Kimmel (2007), when

V ar

[

t+1

] approaches zero,

there is the issue of exploding market prices of risks unless the intercept terms of

[

t+1

]

and

[

t+1

] are the same (hence the numerator too approaches zero at the same rate as

the denominator). Nevertheless, in our discrete time setup, as long as

and

are strictly

positive,

V ar

[

t+1

] is bounded strictly away from zero. As a result, we don’t have to directly

deal with this issue. If one wishes to avoid this issue even in the continuous time limit, then

a suﬃcient restriction on the parameters is:

vc = v

Finally, the scale parameters (

and

) in principle can be any pair of positive numbers

in our discrete time setup. Nevertheless, the diﬀusion invariance property of the CIR process

requires that these two parameters have the same continuous time limit (

). To be

consistent with diﬀusion invariance of

in the continuous time limit, then a suﬃcient

restriction on the parameters is:

c = c

C Estimation

For estimation, we use the monthly unsmoothed Fama Bliss zero yields with eleven maturities:

6-month, and one- out to ten-year. We start our sample in January 1973, due to the sparseness of

longer maturity yields prior to this period, and end in December 2007 to ensure our results

are not inﬂuenced by the ﬁnancial crisis.

Using the canonical form laid out in Appendix B and assuming that the ﬁrst

PCs of

bond yields are priced perfectly and the remaining PCs are priced with iid errors and one

common variance, we compute the model implied one-month ahead conditional means and

variances and implement estimation using QMLE.

In the main text, we note that the one-month ahead conditional mean of the yields

portfolios

take an aﬃne form. The same also holds for the one-month ahead conditional

variance. That is:

t+∆

] =K

0,∆

+ K

1,∆

, (54)

t+∆

] =K

0,∆

+ K

1,∆

, (55)

V ar

t+∆

] =Σ

0,P,∆

i=1

i,P,∆

i,t

, (56)

where

. Thus one way to fully characterize each of the

(

) model is through

the set of parameters Θ

∆

= (

0,∆

, K

1,∆

, K

0,∆

, K

1,∆

i,P,∆

, α, β

). In the main text, we have

reported estimates of

1,∆

and

1,∆

for the

(

) models. For completeness, we report

here all the remaining estimates. Table 5 contains estimates of the intercept terms

0,∆

and

0,∆

. Table 6 reports the estimates of the volatility loadings

and

. Table 7 and Table 8

report the Cholesky decomposition of the variance parameters Σ

i,P,∆

for 4-factor models

and 3-factor models, respectively.

P Q

M=0 M=1 M=2 M=0 M=1 M=2

(4)

0.04 0.05 0.09 0.01 0.01 0.01

-0.03 0.01 0.10 -0.03 -0.04 -0.04

-0.30 -0.24 -0.26 -0.04 -0.04 -0.03

0.24 0.23 0.12 -0.07 -0.10 -0.09

(4)

0.04 -0.02 -0.07

-0.03 0.02 0.04

-0.30 -0.25 -0.19

0.24 0.23 0.24

(3)

0.03 -0.03 0.02 0.01 0.01 0.01

-0.05 -0.10 -0.05 -0.02 -0.04 -0.03

-0.28 -0.21 -0.25 -0.05 -0.06 -0.06

(3)

0.03 -0.04 -0.02

-0.05 0.01 0.04

-0.28 -0.25 -0.24

Table 5: Estimates of K

0,∆

and K

0,∆

        A_1(N)          A_2(N)                  F_1(N)          F_2(N)
             α       β       α       β               α       β       α       β
N=4      -7.20    1.94   -6.58    1.75    1.63   -5.43    2.00   -3.87    1.07    1.13
                  1.32    5.80    1.28   -0.58           -0.64   -2.94   -0.18   -0.28
                 -0.50           -0.47    1.40           -1.29           -0.57   -0.90
                 -0.70           -0.74    0.72           -0.20            0.79   -0.84
N=3     -10.32    2.75  -10.41    2.74    1.23   -1.30    1.00    0.72    3.20    2.19
                  1.79   21.87    1.82    0.65           -0.54   -2.66    3.90   -1.21
                 -1.58           -1.60    2.99           -0.32           -1.32   -0.67

Table 6: Estimates of α and β

Model     Σ_{0,P,∆}                      Σ_{1,P,∆}                      Σ_{2,P,∆}
A_0(4)     0.40
          -0.11   0.40
           0.11   0.07   0.40
           0.01  -0.02  -0.02   0.62
A_1(4)     0.08                           0.12
          -0.00   0.01                   -0.00   0.13
           0.20  -0.03   0.00             0.03   0.02   0.11
           0.05  -0.02   0.00   0.00      0.01  -0.01  -0.01   0.16
A_2(4)     0.03                           0.08                           0.07
          -0.01   0.01                    0.02   0.13                   -0.03   0.01
           0.03  -0.01   0.00            -0.05   0.04   0.08             0.08  -0.01   0.00
           0.02  -0.02   0.00   0.00     -0.06   0.01  -0.13   0.07      0.06  -0.02   0.00   0.00
F_0(4)     0.40
          -0.11   0.40
           0.11   0.07   0.40
           0.01  -0.02  -0.02   0.62
F_1(4)     0.21                           0.10
           0.18   0.18                   -0.06   0.07
           0.23  -0.07   0.05            -0.00  -0.03   0.12
           0.01  -0.09   0.02   0.00      0.02   0.02  -0.02   0.18
F_2(4)     0.19                           0.10                           0.10
           0.18   0.17                   -0.06   0.02                   -0.04   0.11
           0.23  -0.05   0.04            -0.02  -0.09   0.03             0.02  -0.02   0.13
          -0.05  -0.00  -0.01   0.04      0.13   0.09   0.11   0.00     -0.08   0.01   0.07   0.12

Table 7: Estimates of Σ_{i,P,∆} (Cholesky decomposition) for the A_M(4) and F_M(4) models
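To make the link between a reported Cholesky factor and the variance parameter concrete, the sketch below rebuilds a variance matrix from the first 4 x 4 triangular block of Table 7. It assumes the reported factor is lower triangular with $\Sigma = LL^\top$; the table does not state the convention or the scaling, so treat both as assumptions of this example.

```python
import numpy as np

# First triangular block of Table 7, read as a lower-triangular factor L.
# The convention Sigma = L @ L.T is an assumption of this sketch.
L = np.array([
    [ 0.40,  0.00,  0.00, 0.00],
    [-0.11,  0.40,  0.00, 0.00],
    [ 0.11,  0.07,  0.40, 0.00],
    [ 0.01, -0.02, -0.02, 0.62],
])

# Implied variance parameter: symmetric and positive definite by construction,
# since every diagonal entry of L is strictly positive.
Sigma0 = L @ L.T

# Factoring Sigma0 again recovers L, which confirms the round trip.
L_check = np.linalg.cholesky(Sigma0)
```

Reporting the factor rather than the matrix itself guarantees that each estimate maps to a positive semidefinite matrix, which is a common reason to parameterize variance matrices this way.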

Model     Σ_{0,P,∆}               Σ_{1,P,∆}               Σ_{2,P,∆}
A_0(3)     0.41
          -0.11   0.40
           0.10   0.07   0.41
A_1(3)     0.10                    0.09
           0.07   0.00            -0.02   0.10
           0.25  -0.03   0.00      0.01   0.00   0.08
A_2(3)     0.03                    0.08                    0.03
           0.02   0.00            -0.03   0.09             0.02   0.00
           0.07  -0.02   0.00     -0.02  -0.02   0.02      0.07  -0.01   0.00
F_0(3)     0.40
          -0.11   0.40
           0.11   0.07   0.41
F_1(3)     0.19                    0.14
           0.23   0.11            -0.09   0.09
           0.21  -0.17   0.06      0.01  -0.02   0.14
F_2(3)     0.09                    0.03                    0.09
           0.02   0.03             0.05   0.01            -0.07   0.03
           0.26  -0.02   0.06      0.01  -0.02   0.00      0.02   0.08   0.04

Table 8: Estimates of Σ_{i,P,∆} (Cholesky decomposition) for the A_M(3) and F_M(3) models

References

Almeida, C., J. J. Graveline, and S. Joslin, 2011, “Do interest rate options contain information

about excess returns?,” Journal of Econometrics.

Bikbov, R., and M. Chernov, 2009, “Unspanned Stochastic Volatility in Affine Models:

Evidence from Eurodollar Futures and Options,” Management Science.

Campbell, J., 1986, “A defense of the traditional hypotheses about the term structure of

interest rates,” Journal of Finance.

Campbell, J., and R. Shiller, 1991, “Yield Spreads and Interest Rate Movements: A Bird’s

Eye View,” Review of Economic Studies, 58, 495–514.

Carr, P., X. Gabaix, and L. Wu, 2009, “Linearity-Generating Processes, Unspanned Stochastic

Volatility, and Interest-Rate Option Pricing,” Discussion paper, New York University.

Cheridito, P., D. Filipovic, and R. Kimmel, 2007, “Market Price of Risk Specifications for

Aﬃne Models: Theory and Evidence,” Journal of Financial Economics, 83, 123 – 170.

Chernov, M., and P. Mueller, 2009, “The Term Structure of Inﬂation Expectations,” Discussion

paper, London Business School.

Collin-Dufresne, P., R. Goldstein, and C. Jones, 2008, “Identiﬁcation of Maximal Aﬃne Term

Structure Models,” Journal of Finance, LXIII, 743–795.

Collin-Dufresne, P., R. Goldstein, and C. Jones, 2009, “Can Interest Rate Volatility Be

Extracted From the Cross Section of Bond Yields?,” Journal of Financial Economics, 94,

47–66.

Collin-Dufresne, P., and R. S. Goldstein, 2002, “Do Bonds Span the Fixed Income Markets?

Theory and Evidence for ‘Unspanned’ Stochastic Volatility,” Journal of Finance, 57,

1685–1730.

Cox, J., J. Ingersoll, and S. Ross, 1985, “An Intertemporal General Equilibrium Model of

Asset Prices,” Econometrica, 53, 363–384.

Dai, Q., and K. Singleton, 2000, “Speciﬁcation Analysis of Aﬃne Term Structure Models,”

Journal of Finance, 55, 1943–1978.

Dai, Q., and K. Singleton, 2002, “Expectations Puzzles, Time-Varying Risk Premia, and

Aﬃne Models of the Term Structure,” Journal of Financial Economics, 63, 415–441.

Dai, Q., and K. Singleton, 2003, “Term Structure Dynamics in Theory and Reality,” Review

of Financial Studies, 16, 631–678.

de los Rios, A. D., 2013, “A New Linear Estimator for Gaussian Dynamic Term Structure

Models,” Discussion paper, Bank of Canada.

Duarte, J., 2004, “Evaluating an Alternative Risk Preference in Aﬃne Term Structure Models,”

Review of Financial Studies, 17, 379–404.

Duﬀee, G., 2002, “Term Premia and Interest Rates Forecasts in Aﬃne Models,” Journal of

Finance, 57, 405–443.

Duﬀee, G., 2011, “Forecasting with the Term Structure: the Role of No-Arbitrage,” Discussion

paper, Johns Hopkins University.

Duﬃe, D., D. Filipovic, and W. Schachermayer, 2003, “Aﬃne Processes and Applications in

Finance,” Annals of Applied Probability, 13, 984–1053.

Jacobs, K., and L. Karoui, 2009, “Conditional volatility in aﬃne term-structure models:

Evidence from Treasury and swap markets,” Journal of Financial Economics, 91, 288–318.

Joslin, S., 2013a, “Can Unspanned Stochastic Volatility Models Explain the Cross Section of

Bond Volatilities?,” Discussion paper, USC.

Joslin, S., 2013b, “Pricing and Hedging Volatility in Fixed Income Markets,” Discussion

paper, Working Paper, USC.

Joslin, S., A. Le, and K. Singleton, 2012, “Why Gaussian Macro-Finance Term Structure

Models Are (Nearly) Unconstrained Factor-VARs,” Journal of Financial Economics, forthcoming.

Joslin, S., K. Singleton, and H. Zhu, 2011, “A New Perspective on Gaussian DTSMs,” Review

of Financial Studies.

Le, A., K. Singleton, and Q. Dai, 2010, “Discrete-Time Affine Term Structure Models with

Generalized Market Prices of Risk,” Review of Financial Studies, 23, 2184–2227.

Litterman, R., J. Scheinkman, and L. Weiss, 1991, “Volatility and the Yield Curve,” Journal

of Fixed Income, 1, 49–53.

Longstaﬀ, F. A., and E. S. Schwartz, 1992, “Interest Rate Volatility and the Term Structure:

A Two-Factor General Equilibrium Model,” Journal of Finance, 47, 1259–1282.

Piazzesi, M., 2010, “Affine Term Structure Models,” in Y. Ait-Sahalia, and L. Hansen (eds.),

Handbook of Financial Econometrics, chap. 12, pp. 691–766, Elsevier B.V.

Trolle, A. B., and E. S. Schwartz, 2009, “A general stochastic volatility model for the pricing

of interest rate derivatives,” Review of Financial Studies, 22, 2007–2057.