3  Introduction to the Multivariate Direct Filter Analysis

Chapter 2 introduced the LPP and its ideal solution. This chapter extends the discussion by determining optimal solutions without restricting to parametric models.

3.1 Background on Multivariate Filtering

Recall the spectral representation of \(\{ X_t \}\) via Equation 2.2. Because there are \(n\) series in the orthogonal increments process \(\mathcal{Z}\), we have \[ X_{t,j} = \int_{-\pi}^{\pi} e^{i \omega t} \, d\mathcal{Z}_j (\omega ) \] for each \(1 \leq j \leq n\), and hence for a scalar target \(\{ Y_t \}\) we have \[ Y_t = \sum_{j=1}^n \int_{-\pi}^{\pi} e^{i \omega t} \, \Psi_{1 j} (e^{-i \omega }) \, d\mathcal{Z}_j (\omega ). \tag{3.1}\] Each of the functions \(\Psi_{1j} (e^{-i \omega })\) is complex scalar-valued, and can be decomposed in terms of its gain and phase functions. We here provide some background on these functions, because they provide an interpretation of the action of the linear filter on the input time series.

Any complex number \(\zeta\) is decomposed in terms of its real \(\Re \zeta\) and imaginary \(\Im \zeta\) parts: \[ \zeta = \Re \zeta + i \, \Im \zeta. \] The magnitude of \(\zeta\) is defined via \[ | \zeta | = \sqrt{ { \Re \zeta }^2 + { \Im \zeta }^2 }. \] If this is positive, then \(\zeta / |\zeta|\) is a complex number with unit modulus, and hence can be represented as \(\exp \{ -i \, \mbox{Arg} \zeta \}\) for some angle in \([0, 2 \pi]\) known as \(\mbox{Arg} \zeta\), or the angular portion of \(\zeta\). It follows that \[ \zeta = | \zeta | \, \exp \{ -i \, \mbox{Arg} \zeta \}, \] which is known as the Polar decomposition of \(\zeta\). Sometimes it is of interest to use a negative Polar decomposition based upon the negative magnitude, which can still be written in terms of \(\mbox{Arg} \zeta\) via \[ \zeta = - | \zeta| \, \exp \{ -i \, [ \pi + \mbox{Arg} \zeta ] \}, \] using \(e^{-i \pi} = -1\). The angular portion of \(\zeta\) can be directly computed from the real and imaginary parts of \(\zeta\) via \[ \mbox{Arg} \zeta = \arctan \left( \frac{ -\Im \zeta }{ \Re \zeta} \right). \] Now when \(\zeta\) is a function of \(\omega \in [-\pi, \pi]\), then the magnitude and angular portions also become functions of \(\omega\). In particular, a scalar frequency response function (frf) has a magnitude function (called the gain function) and an angular function (called the phase function). In the case of some scalar filter \(\Psi (L)\) (e.g., the component filter \(\Psi_{1j} (L)\)) we obtain \[ \Psi (e^{-i \omega }) = | \Psi (e^{-i \omega })| \, \exp \{ - i \, \mbox{Arg} \Psi (e^{-i \omega }) \}. \] At \(\omega = 0\), we know the frequency response function is \(\Psi (1) = \sum_{\ell \in \mathbb Z} \psi (\ell)\), which is real; hence the phase function at \(\omega = 0\) must be an integer multiple of \(\pi\). It is advantageous to ensure the phase function takes the value zero, as this will facilitate the definition of the phase delay function discussed below. We can ensure this condition by allowing the gain function to be signed. Denoting these by \(A(\omega )\) (for amplitude, or signed gain) and \(\Phi (\omega )\) (for continuous phase), we have \[ \Psi (e^{-i \omega }) = A(\omega ) \, \exp \{ - i \, \Phi (\omega ) \}. \tag{3.2}\] There may be frequencies \(\omega\) for which the frf equals zero; because the real and imaginary parts of zero are both zero, there is an indeterminacy to the angular portion. However, because the frf is continuous in \(\omega\) (which follows from summability of the coefficients) we should define the phase function at the zeroes such that it is continuous. By adjusting the angular portion by an integer multiple of \(\pi\), we can ensure that it will be a continuous function of \(\omega\); this adjustment can be compensated by inserting a sign change in the gain function. In this way, the signed gain and continuous phase functions can be computed: both \(A\) and \(\Phi\) will be continuous functions of \(\omega\), and \(\Phi (0) = 0\) as well.

Example 3.1 Consider the scalar filter \(\Psi (L) = (1 + L + L^2)/3\). The frf is \[ \Psi (e^{-i \omega}) = \frac{1 + e^{-i \omega} + e^{-i 2 \omega} }{3}, \] and so the modulus works out to be \[ | \Psi (e^{-i \omega}) | = \frac{ \sqrt{1 - \cos (3 \omega) }}{ 3 \sqrt{1 - \cos \omega}}. \] We can alter this function to take on certain negative values, and thereby obtain a continuous phase function. Observe that \[ \Psi (e^{-i \omega}) = \frac{e^{i \omega} + 1 + e^{-i \omega}}{3} \, e^{-i \omega} = \frac{1 + 2 \cos \omega}{3} \, e^{-i \omega}. \] When \(\omega \in [0, 2\pi/3)\), the function \(1 + 2 \cos \omega\) is real and positive, but when \(\omega \in (2 \pi/3, \pi]\) the function is negative. So we see that \(A (\omega) = (1 + 2 \cos \omega)/3\) is the signed gain function, and \(\Phi (\omega) = \omega\) is the continuous phase function.
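
The decomposition in Example 3.1 is easily verified numerically. The following sketch uses only base R (no functions from the book's package) to check that the proposed signed gain and continuous phase reproduce the frf.

omega <- seq(-pi,pi,length.out=501)
frf <- (1 + exp(-1i*omega) + exp(-2i*omega))/3    # frf of (1 + L + L^2)/3
A.gain <- (1 + 2*cos(omega))/3                    # signed gain from Example 3.1
Phi.phase <- omega                                # continuous phase
print(max(Mod(frf - A.gain*exp(-1i*Phi.phase)))) # zero up to rounding error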

Substituting Equation 3.2 into the spectral representation Equation 3.1 of the target, where \(A_j\) and \(\Phi_j\) are the gain and phase functions of \(\Psi_{1j} (e^{-i \omega }),\) yields \[ Y_t = \sum_{j=1}^n \int_{-\pi}^{\pi} e^{i \omega \, [ t - \omega^{-1} \, \Phi_j (\omega ) ] } \, A_j ( \omega ) \, d\mathcal{Z}_j (\omega ). \] This representation is interpreted as follows: the target is the sum of \(n\) filtered series, where each orthogonal increments process \(\mathcal{Z}_j\) has been dilated by the signed gain function \(A_j\), and the timing of the sinusoidal component \(e^{i \omega t }\) has been delayed by \(\omega ^{-1} \, \Phi_j (\omega )\). This quantity, called the phase delay function, is well-defined in a neighborhood of zero, as seen in the following result.

Proposition 3.1 If the scalar filter \(\Psi (L)\) satisfies \(\sum_{\ell \in \mathbb Z} | \ell \, \psi (\ell) | < \infty\), \(\Psi (1) \neq 0\), and the phase function is continuously defined, then the phase delay function \[ \phi (\omega ) = \frac{\Phi (\omega ) }{ \omega } \] is well-defined for \(\omega \in [-\pi, \pi]\), and \[ \phi (0) = \dot{\Phi} (0) = \frac{ \sum_{\ell \in \mathbb Z} \ell \, \psi (\ell) }{ \sum_{\ell \in \mathbb Z} \psi (\ell) }. \]

Proof. First note that the signed gain function is even, and hence \(\dot{A} (0) = 0\). Differentiating Equation 3.2 and evaluating at zero yields \[ -i \, \sum_{\ell \in \mathbb Z} \ell \, \psi (\ell) = \frac{\partial}{\partial \omega } \Psi (e^{-i \omega }) \vert_{\omega = 0} = \dot{A} (0) \, e^{-i \, \Phi (0) } + A(0) \, e^{-i \, \Phi (0) } \, (-i \, \dot{\Phi} (0)). \] Using \(\dot{A} (0) = 0\), \(\Phi (0) = 0\), and \(A(0) = \Psi (1)\) yields \[ \dot{\Phi} (0) = \frac{ \sum_{\ell \in \mathbb Z} \ell \, \psi (\ell) }{ \sum_{\ell \in \mathbb Z} \psi (\ell) }. \quad \Box \]
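
For the filter of Example 3.1 the phase delay at zero can be checked against Proposition 3.1: the coefficients are \(\psi (\ell) = 1/3\) for \(\ell = 0, 1, 2\), so that \(\phi (0) = (0 + 1 + 2)/3 = 1\), matching \(\Phi (\omega) = \omega\). A short base R verification:

psi <- rep(1/3,3)                # coefficients at lags 0, 1, 2
print(sum((0:2)*psi)/sum(psi))   # phase delay at zero equals 1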

The amplitude effects can be understood as dilations of the input spectral densities. If the spectral density matrix of the input process is denoted \(F\), then \(F_{jj}\) is the spectral density of the \(j\)th component input series; its contribution to the target output involves the increment \[ A_j ( \omega ) \, d\mathcal{Z}_j (\omega), \] and the associated spectral density is \[ {| A_j (\omega) |}^2 \, F_{jj} (\omega). \] There are approximate empirical versions of these relations, which can be described in terms of the DFT. Applying the definition Equation 2.3 to the scalar output \(\{ Y_t \}\), and utilizing Equation 3.1, we obtain \[ \widetilde{Y} (\xi) = \int_{-\pi}^{\pi} T^{-1/2} \, \sum_{t=1}^T e^{i (\omega - \xi) t} \, \sum_{j=1}^n \Psi_{1j} (e^{-i \omega}) \, d\mathcal{Z}_j (\omega). \] Note that the summation is bounded as \(T \rightarrow\infty\) unless \(\omega = \xi\); it can be shown that the variance of the difference between \(\widetilde{Y} (\xi)\) and \(\sum_{j=1}^n \Psi_{1j} (e^{-i \xi}) \, \widetilde{X}_j (\xi) = \Psi (e^{-i \xi}) \, \widetilde{X} (\xi)\) tends to zero (see Proposition 9.7.3 of T. S. McElroy and Politis (2019)), so that we have the approximate result \[ \widetilde{Y} (\omega) \approx \Psi (e^{-i \omega}) \, \widetilde{X} (\omega). \tag{3.3}\] From a practical point of view we can ignore the approximation error in Equation 3.3, due to the uniform super-consistency argument for integrals and discrete sums; see Wildi (2005) and Wildi (2008). Utilizing Equation 2.4, we obtain an approximate relation of periodograms: \[ \widehat{F}_Y (\omega) \approx \Psi(e^{-i \omega}) \, \widehat{F}_X (\omega) \, { \Psi(e^{i \omega}) }^{\prime}. \tag{3.4}\] This approximation, when averaged over Fourier frequencies, will be used below to justify Direct Filter Analysis.
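
The approximation Equation 3.3 is easy to examine by simulation. The following base R sketch (assuming the DFT convention \(T^{-1/2} \sum_{t=1}^T X_t e^{-i \omega t}\) of Equation 2.3) filters an AR(1) sample and compares the DFT of the output to the frf times the DFT of the input; the two quantities are close but not equal, with the discrepancy vanishing as \(T \rightarrow\infty\).

set.seed(1)
T.len <- 400
x <- as.numeric(arima.sim(list(ar=.7),n=T.len))
y <- stats::filter(x,rep(1/3,3),sides=1)    # apply Psi(L) = (1 + L + L^2)/3
y[is.na(y)] <- 0                            # two edge values; negligible effect
dft <- function(z,omega) sum(z*exp(-1i*omega*seq_along(z)))/sqrt(length(z))
omega <- 2*pi*10/T.len                      # a Fourier frequency
frf <- (1 + exp(-1i*omega) + exp(-2i*omega))/3
print(c(dft(y,omega),frf*dft(x,omega)))     # approximately equal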

3.2 Multivariate Direct Filter Analysis of the LPP

We can now discuss a more general solution to the LPP. One perspective on Proposition 2.1 is that it provides a particular class of concurrent filters that arise from specified models. However, if these models are mis-specified, the resulting concurrent filters will be sub-optimal. Therefore, it may be possible to improve performance by utilizing broader classes of concurrent filters that are not derived from a particular model. The Direct Filter Analysis (DFA) seeks a concurrent filter \(\widehat{\Psi} (L)\) that optimizes the MSE in a given LPP. While DFA was originally developed for univariate time series, its multivariate generalization – Multivariate Direct Filter Analysis (MDFA) – is designed for the broader context of LPPs discussed in Chapter 2.

The entire class of concurrent filters corresponds to the collection of power series in \(L\). Here we are interested in scalar targets given \(n\) input series, so the coefficient matrices of the concurrent filters are \(1 \times n\). We may be interested in some subcollection \(\mathcal{G}\) of all concurrent filters. For instance, \(\mathcal{G}\) could consist of the optimal solutions to an LPP for a particular process, i.e., all \(\widehat{\Psi} (L)\) given in Equation 3.16 for a particular \(\Psi (L)\) and \(\Theta (L)\). Or we might consider much broader classes of filters that are described in terms of the rate of decay of the filter coefficients, e.g., \[ \mathcal{G} = \{ \widehat{\Psi} (L) : \widehat{\Psi} (e^{-i \omega}) \; \mbox{is twice continuously differentiable at } \; \omega = 0 \}. \] Alternatively, \(\mathcal{G}\) might consist of all VARMA filters of a particular AR and MA order, or might consist of all Zero-Pole Combination (ZPC) filters of a given specification (Wildi 2007). The original univariate DFA of Wildi (2007) approached the LPP with \(\mathcal{G}\) consisting of appropriately restricted ZPC filters.

For now, we shall suppose that the concurrent filters of \(\mathcal{G}\) belong to some parametric family described by a parameter \(\vartheta\) belonging to a parameter manifold. Because we seek elements of \(\mathcal{G}\) that will solve an LPP, i.e., be a good concurrent approximation to \(\Psi (L)\), we use the notation \[ \mathcal{G} = \{ \widehat{\Psi}_{\vartheta} (L) : \vartheta \; \mbox{belongs to a parameter space} \}. \tag{3.5}\] Whereas the model-based approach to the LPP discussed in Chapter 2 involves minimizing a particular parametric form of the filter error MSE – namely the function \(J_{\Psi} (\vartheta, G)\) for \(G\) corresponding either to the periodogram or true spectrum – a more direct approach is to minimize a general expression for the filter error MSE over a given set \(\mathcal{G}\). The real-time estimation error is given in Equation 2.8, which has mean zero and variance \[ \mathbb E[ E_t^2 ] = { \langle \left[ \Psi (z) - \widehat{\Psi}_{\vartheta} (z) \right] \, \widetilde{F} \, { \left[ \Psi (z) - \widehat{\Psi}_{\vartheta} (z) \right] }^{*} \rangle }_0. \tag{3.6}\] This suggests the criterion function \(D_{\Psi} (\vartheta, G)\) for any Hermitian function \(G\), defined as \[ D_{\Psi} (\vartheta, G) = { \langle \left[ \Psi (z) - \widehat{\Psi}_{\vartheta} (z) \right] \, G \, { \left[ \Psi (z) - \widehat{\Psi}_{\vartheta} (z) \right] }^{*} \rangle }_0. \tag{3.7}\] This is the MDFA criterion function. An equivalent formula to Equation 3.7 that can be useful for calculations is \[ D_{\Psi} (\vartheta, G) = \mbox{tr} \{ { \langle G \, M_{\vartheta} \rangle }_0 \} \qquad M_{\vartheta} (z) = { \left[ \Psi (z) - \widehat{\Psi}_{\vartheta} (z) \right] }^{*} \, { \left[ \Psi (z) - \widehat{\Psi}_{\vartheta} (z) \right] }. \tag{3.8}\] Given a filter class \(\mathcal{G}\), the best possible concurrent filter is given by \(\widehat{\Psi}_{\vartheta (\widetilde{F})}\), where \(\vartheta (\widetilde{F})\) is a minimizer of \(D_{\Psi} (\vartheta, \widetilde{F})\). This \(\vartheta (\widetilde{F})\) is the pseudo-true value (PTV) for the filter parameter, in analogy with the terminology for model parameters. Clearly, if the set \(\mathcal{G}\) is rendered sufficiently large to include the optimal concurrent filter for that particular LPP and process – as given in Proposition 2.1 – then there exists some \(\widetilde{\vartheta}\) such that \(\widehat{\Psi}_{\widetilde{\vartheta}}\) is identical with the optimal filter. However, if \(\mathcal{G}\) is smaller, then \(\widehat{\Psi}_{\vartheta (\widetilde{F})}\) is as close as possible to the optimal filter, as measured by the \(D_{\Psi}\) discrepancy.

A case of interest arises from taking a very broad class \(\mathcal{G}\): let \(\mathcal{G}\) consist of all length \(q\) concurrent filters, denoted \(\mathcal{M}_q\), with \[ \vartheta^{\prime} = \left[ \widehat{\psi} (0), \widehat{\psi} (1), \ldots, \widehat{\psi} (q-1) \right]. \tag{3.9}\] So \(\vartheta\) is a column vector of length \(q n\). Then the criterion Equation 3.7 can be rewritten as \[ D_{\Psi} (\vartheta, G) = \vartheta^{\prime} \, B \, \vartheta - \vartheta^{\prime} \, b - b^{\prime} \, \vartheta + { \langle \Psi (z) \, G \, { \Psi (z) }^* \rangle }_0, \tag{3.10}\] where \[ b^{\prime} = \left[ { \langle \Psi (z) \, G \rangle }_{0}, { \langle \Psi (z) \, G \rangle }_{1}, \ldots, { \langle \Psi (z) \, G \rangle }_{q-1} \right], \tag{3.11}\] and \(B\) is a block matrix, where the \(jk\)th \(n \times n\) block of \(B\) is \({ \langle G \rangle }_{k-j}\) for \(1 \leq j,k \leq q\).

Proposition 3.2 The minimizer of the MDFA criterion Equation 3.7, given that \(\mathcal{G} = \mathcal{M}_q\) (the set of length \(q\) concurrent filters), is \[ \vartheta = B^{-1} \, b, \] where the \(jk\)th block of \(B\) is \({ \langle G \rangle }_{k-j}\), and \(b\) is given by Equation 3.11. The minimal value is \[ { \langle \Psi (z) \, G \, { \Psi (z) }^* \rangle }_0 - b^{\prime} \, B^{-1} \, b. \tag{3.12}\]

Proof. First note that the typical component of \(b\) has the form \[ { \langle \Psi (z) \, G \rangle }_{\ell} = \sum_{k \in \mathbb Z} \psi (k) \, { \langle G \rangle }_{\ell-k} \tag{3.13}\] for \(0 \leq \ell < q\), which shows that \(b\) is real-valued. The objective function is a quadratic in \(\vartheta\), and therefore the minimizer is obtained by computing the gradient and Hessian, which are \(-2 b + 2 B \, \vartheta\) and \(2 B\) respectively, yielding the solution. Plugging back into \(D_{\Psi}\) yields Equation 3.12. \(\quad \Box\)

Remark 3.1 To implement Proposition 3.2 in practice, \(G\) is given by the periodogram so that \({ \langle G \rangle }_h = \widehat{\Gamma} (h)\). It is necessary to compute \(b\), given by Equation 3.11, and we can proceed by approximating the integrals over a Riemann mesh corresponding to Fourier frequencies; this is discussed further below.

This broad class \(\mathcal{G}\) of filters will furnish concurrent filters that closely approximate those of Proposition 2.1 as \(q \rightarrow\infty\).

Example 3.2 (One-step Ahead Forecasting.) Suppose we consider the one-step ahead forecasting of a stationary time series and \(\mathcal{G}\) corresponds to all moving average filters of length \(q\) (i.e., the set \(\mathcal{M}_q\)), where
\[ \vartheta = \mbox{vec} [{\widehat{\psi} (0) }^{\prime}, {\widehat{\psi} (1) }^{\prime}, \ldots, {\widehat{\psi} (q-1) }^{\prime} ]. \] With \(\Psi (L) = L^{-1}\) from Equation 3.7 we have \[ \begin{align} D_{\Psi} (\vartheta, G) & = { \langle \left[ z^{-1} e_1^{\prime} - \widehat{\Psi}_{\vartheta} (z) \right] \, G \, { \left[ z^{-1} e_1^{\prime} - \widehat{\Psi}_{\vartheta} (z) \right] }^{*} \rangle }_0 \\ & = { \langle \left[ e_1^{\prime} - \sum_{\ell = 0}^{q-1} \widehat{\psi} (\ell) \, z^{\ell+1} \right] \, G \, { \left[ e_1^{\prime} - \sum_{\ell = 0}^{q-1} \widehat{\psi} (\ell) \, z^{\ell+1} \right] }^{*} \rangle }_0 \\ & = { \langle G \rangle }_0 - 2 \, \vartheta^{\prime} \, { \langle G \rangle }_{1:q} \, e_1 + \vartheta^{\prime} \, { \langle G \rangle }_{0:(q-1),0:(q-1)} \, \vartheta. \end{align} \] Hence the optimizer is \[ \vartheta (G) = { \langle G \rangle }_{0:(q-1),0:(q-1)}^{-1} \, { \langle G \rangle }_{1:q} \, e_1, \] which is the first component of the solution to the Yule-Walker system of order \(q\) determined by \(G\). Therefore the MDFA solution is the same as the fit of a VAR(\(q\)) using Proposition 2.1.
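The equivalence asserted in Example 3.2 can be checked numerically in the scalar case \(n = 1\). Below is a base R sketch for an AR(1) process with parameter \(\phi = .8\), for which the autocovariances are \(\Gamma (h) = \phi^h / (1 - \phi^2)\): the solution \(B^{-1} b\) of Proposition 3.2 recovers the Yule-Walker predictor \((\phi, 0, \ldots, 0)\).

phi <- .8
q <- 5
gamma <- phi^(0:q)/(1-phi^2)     # autocovariances <G>_h for h = 0,...,q
B <- toeplitz(gamma[1:q])        # jk-th entry is <G>_{k-j}
b <- gamma[2:(q+1)]              # <G>_ell for ell = 1,...,q
print(solve(B,b))                # equals (.8,0,0,0,0)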

The empirical problem is solved by minimizing \(D_{\Psi} (\vartheta, \widehat{F})\), yielding the estimator \(\vartheta (\widehat{F})\). The empirical criterion can be simply computed using Equation 3.8 and Equation 2.6, namely \[ D_{\Psi} (\vartheta, \widehat{F}) = \sum_{|h| < T } \mbox{tr} \{ \widehat{\Gamma} (h) \, { \langle M_{\vartheta} \rangle }_{-h} \}. \] Filtering with \(\widehat{\Psi}_{\vartheta (\widehat{F})}\) instead of \(\widehat{\Psi}_{\vartheta (\widetilde{F})}\) involves some statistical error, which vanishes as \(T \rightarrow\infty\) because \(\vartheta ( \widehat{F})\) is consistent for the PTV. We can quantify this additional error if we know the statistical properties of the estimate; under fairly broad conditions, it follows a central limit theorem. As in Chapter 2, we assume the HT conditions and that the Hessian \(H(\vartheta) = \nabla \nabla^{\prime} D_{\Psi} (\vartheta, \widetilde{F})\) of \(D_{\Psi}\) is positive definite at the PTV. The function \(M\) is defined in Equation 3.8.

Theorem 3.1 Suppose that \(\vartheta (\widetilde{F})\) exists uniquely in the interior of the filter parameter space, and that \(H(\vartheta (\widetilde{F}))\) is positive definite. Suppose that \(\{ X_t \}\) has finite fourth moments, conditions (HT1)-(HT6) of Taniguchi and Kakizawa (2012) hold, and that the fourth order cumulant function of \(\{ X_t \}\) is zero. Then the estimator is consistent for the PTV, and \[ \sqrt{T} \, \left( \vartheta( \widehat{F} ) - \vartheta (\widetilde{F}) \right) \stackrel{{\cal L}}{\Longrightarrow }\mathcal{N} \left( 0, { H(\vartheta (\widetilde{F})) }^{-1} \, V (\vartheta (\widetilde{F})) \, { H(\vartheta (\widetilde{F})) }^{-1} \right) \] as \(T \rightarrow\infty\), where \[ V_{jk} (\vartheta) = \mbox{tr} \{ { \langle \partial_j M_{\vartheta} (z) \, \widetilde{F} \, \partial_k M_{\vartheta} (z) \, \widetilde{F} \rangle }_0 \}. \]

Proof. This is proved in the same way as Theorem 2.1.

We designate the resulting prediction function \(\widehat{\Psi}_{\widehat{\vartheta}}\) as a Linear Prediction Filter (LPF).

Example 3.3 (VAR(1).) Again consider a VAR(1) process, and suppose we wish to use MDFA to approximate the optimal LPP solution – even though we don’t know the true dynamics. Let \(\mathcal{G} = \mathcal{M}_q\), and let \(G\) be the spectral density of the VAR(1); the solution given by Proposition 3.2 can be compared to that of the LPP, which has the first \(q\) components given by \[ \varphi^{\prime} = [ \psi (0) + A_{\Psi} (\Phi), \psi (1), \ldots, \psi (q-1)]. \] This is an approximate solution to the system \(\vartheta^{\prime} \, B = b^{\prime}\), because \(\varphi^{\prime} \, B\) has \((j+1)\)th component, for \(0 \leq j \leq q-1\), equal to \[ \sum_{\ell=0}^{q-1} \psi (\ell) \, {\langle G \rangle }_{j-\ell} + A_{\Psi} (\Phi) \, \Gamma (j). \] Noting that \[ A_{\Psi} (\Phi) \, \Gamma (j) = \sum_{\ell < 0 } \psi (\ell) \, \Phi^{-\ell} \, \Gamma (j) = \sum_{\ell < 0} \psi (\ell) \, \Gamma (j- \ell), \] because for a VAR(1) process \(\Gamma (h) = \Phi^h \, \Gamma (0)\) when \(h \geq 0\), we see that component \(j+1\) of \(\varphi^{\prime} \, B\) is \[ \sum_{\ell \leq q-1} \psi (\ell) \, \Gamma (j-\ell) = {[ \Re b^{\prime} ]}_{j+1} - \sum_{\ell \geq q} \psi (\ell) \, \Gamma (j-\ell). \] As \(q \rightarrow\infty\) the error term vanishes (for each \(j\)), indicating that \(\varphi^{\prime} \, B \approx b^{\prime}\), or \(\vartheta \approx \varphi\). So the difference in this VAR(1) case between the LPP solution and that of MDFA is due to the fact that the former (LPP) assumes an infinite-length filter; this is reflected in the truncation error term above.

3.3 Computation of the Linear Prediction Filter

Here we discuss the calculation of the quantities \(B\) and \(b\) appearing in Proposition 3.2. In the case that \(G\) corresponds to the periodogram only a finite number of sample autocovariances are non-zero, and Equation 3.13 simplifies. More generally, suppose that for some \(r > 0\) we have \({ \langle G \rangle }_h = 0\) for all \(|h| \geq r\). Then Equation 3.13 can be written in matrix form as \[ b = \left[ \begin{array}{ccccccc} { \langle G \rangle }_{1-r} & \ldots & { \langle G \rangle }_0 & \ldots & { \langle G \rangle }_{r-1} & 0 & \ldots \\ \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots \\ \ldots & 0 & {\langle G \rangle}_{1-r} & \ldots & { \langle G \rangle }_{0} & \ldots & { \langle G \rangle }_{r-1} \end{array} \right] \, \left[ \begin{array}{c} {\psi (1-r) }^{\prime} \\ \vdots \\ {\psi (r+q-2) }^{\prime} \end{array} \right]. \] The matrix in this product has \(2r+q-2\) block columns, and \(q\) block rows. The matrix \(B\) is of dimension \(qn \times qn\), and is banded: its \(jk\)th block \({ \langle G \rangle }_{k-j}\) vanishes when \(|k-j| \geq r\). This formulation requires a time-domain specification of the target filter, whereas in applications it is often more convenient to utilize the frequency-domain. To that end, we discuss the computation of \(b\) and \(B\) via discretization of the appropriate integrals over a Riemann mesh corresponding to Fourier frequencies. We let square brackets denote the floor of a real number.

Definition 3.1 Given integer \(T\), the Fourier frequencies are a set of \(T\) numbers in \([-\pi,\pi]\) of the form \(\omega_{j} = 2 \pi j/ T\) for \(-[T/2] \leq j \leq [T/2]\) (when \(T\) is odd) and \(-[T/2] \leq j \leq [T/2]-1\) (when \(T\) is even).

Remark 3.2 In the case that \(T\) is odd, there exists \(m\) such that \(T = 2 m + 1\), and in this case \(-m \leq j \leq m\). In the case that \(T\) is even, then \(T = 2m\) for some \(m\), and \(-m \leq j \leq m-1\). Clearly, \(m = [T/2]\) in either case.
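
Definition 3.1 translates directly into code. The base R helper below (the name fourier.freqs is ours, for illustration) returns the Fourier frequencies for any \(T\).

fourier.freqs <- function(T.len)
{
  m <- floor(T.len/2)
  j <- if(T.len %% 2 == 1) seq(-m,m) else seq(-m,m-1)
  return(2*pi*j/T.len)
}
print(fourier.freqs(5))   # T odd:  -m <= j <= m
print(fourier.freqs(4))   # T even: -m <= j <= m-1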

The Fourier frequencies form the basis for a transformation of the time-domain sample \(\underline{X}\) to the frequency-domain, known as the DFT, cf. Equation 2.3. By restricting the DFT to Fourier frequencies, we obtain a linear transformation from the \(T \times n\) matrix of the sample to a \(T \times n\) matrix of DFTs. To show this result, let \[ \mathcal{X} = [ X_1, X_2, \ldots, X_T], \] so that \(\mbox{vec} [\mathcal{X}] = \underline{X}\). Similarly, denote the matrix of DFTs by \(\widetilde{\mathcal{X}}\), with \(j\)th column (\(1 \leq j \leq T\)) given by \(\widetilde{X} (\omega_{j - [T/2]-1})\). In this way, the matrix of DFTs begins with \(\widetilde{X} (\omega_{-[T/2]})\) in the first column, and proceeds to \(\widetilde{X} (\omega_{T-[T/2]-1})\), with frequency corresponding to either \([T/2]\) or \([T/2] -1\) depending on whether \(T\) is odd or even. Letting \(C\) denote the \(T \times T\) linear transformation such that \(\widetilde{\mathcal{X}}^{\prime} = C \, \mathcal{X}^{\prime}\), we see that \[ C_{jt} = T^{-1/2} \, \exp \{ - i \, 2 \pi t \, (j- [T/2]-1)/T \}, \] for \(1 \leq j,t \leq T\). This follows directly from Equation 2.3. Moreover, the original sample can be recovered from the DFT matrix by applying \(C^{-1}\), which equals the conjugate transpose.

Proposition 3.3 The DFT matrix \(C\) is unitary, i.e., \(C^{-1} = \overline{C}^{\prime}\).

Proof. \[ \begin{align} {\left[ \overline{C}^{\prime} \, C \right] }_{jk} & = T^{-1} \, \sum_{t=1}^T \exp \{ i \, 2 \pi j \, (t-[T/2]-1)/T \} \, \exp \{ - i \, 2 \pi k \, (t - [T/2]-1)/T \} \\ & = \left( T^{-1} \, \sum_{t=1}^T \exp \{ i \, 2 \pi (t [j-k]) /T \} \right) \, \exp \{ i \, 2 \pi (k-j) \, ([T/2]+1)/T \}, \end{align} \] and the expression in parentheses equals zero unless \(k=j\), in which case it equals one (this is easily verified by using the formula for the partial summation of a geometric series). Hence the \(jk\)th element of \(\overline{C}^{\prime} \, C\) corresponds to the \(jk\)th element of the identity matrix. \(\quad \Box\)
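
Proposition 3.3 can also be confirmed numerically; the following base R sketch builds \(C\) for \(T = 7\) and checks that \(\overline{C}^{\prime} \, C\) is the identity up to rounding error.

T.len <- 7
m <- floor(T.len/2)
C <- outer(seq(1,T.len),seq(1,T.len),
    function(j,t) exp(-1i*2*pi*t*(j-m-1)/T.len))/sqrt(T.len)
print(max(Mod(Conj(t(C)) %*% C - diag(T.len))))   # zero up to rounding error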

To compute the quantities given in Proposition 3.2, and more generally to compute the MDFA criterion Equation 3.7, we propose to approximate each integral by an average over Fourier frequencies. Although finer meshes could clearly be implemented, the Fourier frequency mesh is sufficient for statistical purposes – this is because when considering the asymptotic properties of linear functionals of the periodogram (i.e., weighted linear combinations of periodogram ordinates), there is no difference between averaging over the Fourier frequencies and integrating over every frequency. Moreover, using the Fourier frequencies produces an empirical criterion function that is a closer approximation to the sample mean squared error, which is shown by the following arguments. Recalling that the real-time filter error \(E_t = Y_t - \widehat{Y}_t\) has variance given by Equation 3.6, the sample variance is \[ T^{-1} \, \sum_{t=1}^T E_t^2 = T^{-1} \sum_{j=1}^T \widehat{F}_E (\omega_{j-[T/2]-1}), \] where \(\widehat{F}_E\) is the periodogram of the filter errors. This equality is a discrete version of the Plancherel identity; the right hand side is approximated by \[ T^{-1} \sum_{j=1}^T \left[ \Psi - \widehat{\Psi} \right](\omega_{j-[T/2]-1}) \, \widehat{F}_X (\omega_{j-[T/2]-1}) \, {\left[ \Psi - \widehat{\Psi} \right]}^{\prime} (-\omega_{j-[T/2]-1}), \tag{3.14}\] using Equation 3.4. Note that Equation 3.14 is a super-consistent estimate of the sample mean squared error (see Theorem 4.8 (assertion 4) in Wildi (2005)), and is exactly the MDFA criterion Equation 3.7 with the integrals replaced by Riemann sums over the Fourier frequencies, and \(G\) replaced by the periodogram.

With this justification, we see that the entries of the matrix \(B\) in Proposition 3.2 are approximately computed via \[ B_{j,k} \approx T^{-1} \sum_{\ell=1}^T G (\omega_{\ell-[T/2]-1}) \, \exp \{ i \, (k-j) (\omega_{\ell-[T/2]-1}) \} \] for \(1 \leq j,k \leq q\). Moreover, for \(0 \leq k \leq q-1\) \[ b_k^{\prime} \approx T^{-1} \sum_{\ell=1}^T \Psi ( \exp\{ -i \,\omega_{\ell-[T/2]-1} \}) \, G (\omega_{\ell-[T/2]-1}) \, \exp \{ i \, k (\omega_{\ell-[T/2]-1}) \}, \] where \(b^{\prime} = [ b_0^{\prime}, \ldots, b_{q-1}^{\prime} ]\). Finally, \[ { \langle \Psi (z) \, G \, { \Psi (z) }^* \rangle }_0 \approx T^{-1} \sum_{\ell=1}^T \Psi ( \exp\{ -i \,\omega_{\ell-[T/2]-1} \}) \, G (\omega_{\ell-[T/2]-1}) \, \Psi^{\prime} ( \exp\{ i \,\omega_{\ell-[T/2]-1} \}). \] Several exercises illustrate the implementation of these formulas, and their applications to filtering problems. Our implementation in mdfa.filter is written with a multivariate target in view; by focusing upon the first row, the applications of this chapter can be obtained.
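
Before turning to the exercises, we give a minimal univariate sketch (\(n = 1\)) of these Riemann-sum formulas; the function name mdfa.scalar is ours, a simplified stand-in for the package's mdfa.unconstrained. As a sanity check, one-step ahead forecasting of an AR(1) with \(\phi = .8\) should recover the concurrent filter \((.8, 0, \ldots, 0)\).

mdfa.scalar <- function(frf.target,G,q)
{
  grid <- length(G)
  m <- floor(grid/2)
  freq <- 2*pi*grid^{-1}*(seq(1,grid) - (m+1))
  E <- exp(1i*outer(freq,0:(q-1)))        # column k+1 holds e^{i k omega}
  B <- Re(Conj(t(E)) %*% (G*E))/grid      # Riemann sum for B_{j,k}
  b <- Re(t(E) %*% (frf.target*G))/grid   # Riemann sum for b_k
  return(solve(B,b))                      # Proposition 3.2: B^{-1} b
}
grid <- 512
m <- floor(grid/2)
freq <- 2*pi*grid^{-1}*(seq(1,grid) - (m+1))
G <- 1/Mod(1 - .8*exp(-1i*freq))^2        # AR(1) spectral density
print(round(mdfa.scalar(exp(1i*freq),G,4),4))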

Exercise 3.1 (Correct VAR(1) LPP.) This exercise compares LPP and MDFA when the model is correct. Simulate a sample of size \(T=100\) from a bivariate VAR(1) process with \[ \Phi = \left[ \begin{array}{cc} 1 & 1/2 \\ -1/5 & 3/10 \end{array} \right] \] and \(\Sigma\) equal to the identity. The eigenvalues are \(4/5\) and \(1/2\). Implement MDFA for this sample, using the moving average filters (Proposition 3.2) of length \(q = 20\) to approximate the optimal LPP filter (Exercise 2.1), for the \(2\)-step ahead forecasting LPP (Example 2.1). Does the MDFA concurrent filter closely match the optimal LPP filter? Repeat for \(T=200\) and \(T=500\).

Solution

First, we simulate a Gaussian VAR(1) of sample size \(T=100\). For \(T=200\) or \(T=500\), alter T.sim.

set.seed(888)
T.sim <- 100
N <- 2
phi.matrix <- rbind(c(1,.5),c(-.2,.3))
innovar.matrix <- diag(2)
x.sim <- varp.sim(array(phi.matrix,c(2,2,1)),innovar.matrix,T.sim)

Compute sample autocovariances.

x.acf <- acf(x.sim,type="covariance",plot=FALSE,lag.max=T.sim)[[1]]
x.acf <- aperm(aperm(x.acf,c(3,2,1)),c(2,1,3))

Next, we obtain the solution to the 2-step ahead forecasting LPP.

psi.array <- array(0,c(1,2,2))
psi.array[,,1] <- c(0,0)
psi.array[,,2] <- c(1,0)
acf.array <- x.acf[,,1:3]
theta <- rnorm(4)
var1.fit.2step <- optim(theta,lpp.var1,psi.array=psi.array,
  acf.array=acf.array,method="BFGS")

We evaluate the 2-step ahead forecast error criterion function at its minimizer, obtaining 2.9337. Then we compute the estimate of the VAR(1) coefficient \(\Phi\) from the LPP, denoted by phi.lpp. The 2-step ahead forecast fore.lpp is obtained by multiplying the present observation by \(\Phi^2\).

phi.lpp <- var.pre2par(var1.fit.2step$par,1,2)[,,1]
fore.lpp <- phi.lpp %^% 2
fore.lpp <- fore.lpp[1,]

Now we set up MDFA, first selecting the filter length \(q=20\) and computing the Fourier frequencies.

q <- 20
grid <- T.sim
m <- floor(grid/2)
lambda.ft <- exp(-1i*2*pi*grid^{-1}*(seq(1,grid) - (m+1)))

Next, we compute the frf for the 2-step ahead forecasting target, and calculate the periodogram at the Fourier frequencies using mdfa.pergram.

frf.psi <- matrix(lambda.ft^{-2},nrow=1) %x% diag(N)
frf.psi <- array(frf.psi,c(N,N,grid))
spec.hat <- mdfa.pergram(x.sim,1)

Finally, we compute the MDFA filter. We do this through the function mdfa.unconstrained, which calls mdfa.filter without any constraints on the filter coefficients (filter constraints are studied later in Section 4.1).

fore.mdfa <- mdfa.unconstrained(frf.psi,spec.hat,q)

The results of the LPP and MDFA methods can then be compared by examining the first coefficient of each. For the LPP we have

print(fore.lpp)
[1] 0.8422086 0.6642738

and for MDFA we have

print(fore.mdfa[[1]][1,,1])
[1] 1.0383626 0.2279001

Exercise 3.2 (Incorrect VAR(1) LPP.) This exercise compares LPP and MDFA when the model is wrong. Simulate a sample of size \(T=100\) from a bivariate VMA(1) with \[ \Theta = \left[ \begin{array}{cc} 1 & 0 \\ 2/5 & 2 \end{array} \right] \] and \(\Sigma\) equal to the identity. Use the moving average filter MDFA (Proposition 3.2) with \(q=20\) to find the best concurrent filter, for the \(2\)-step ahead forecasting LPP (Example 2.1). Compare these results to the VAR(1) LPP filter previously obtained (Exercise 2.2), based on the mis-specified VAR(1) model. Which filter, LPP or MDFA, more closely approximates the ideal filter? Repeat for \(T=200\) and \(T=500\), and explain your results.

Solution

For \(T=200\) or \(T=500\), alter T.sim in the following code block.

set.seed(888)
T.sim <- 100
N <- 2
theta.matrix <- rbind(c(1,0),c(.4,2))
innovar.matrix <- diag(2)
x.sim <- vmaq.sim(array(theta.matrix,c(2,2,1)),innovar.matrix,T.sim)

Compute sample autocovariances.

x.acf <- acf(x.sim,type="covariance",plot=FALSE,lag.max=T.sim)[[1]]
x.acf <- aperm(aperm(x.acf,c(3,2,1)),c(2,1,3))

Next, we obtain the solution to the 2-step ahead forecasting LPP.

psi.array <- array(0,c(1,2,2))
psi.array[,,1] <- c(0,0)
psi.array[,,2] <- c(1,0)
acf.array <- x.acf[,,1:3]
theta <- rnorm(4)
var1.fit.2step <- optim(theta,lpp.var1,psi.array=psi.array,
  acf.array=acf.array,method="BFGS")

We compute the estimate of the VAR(1) coefficient \(\Phi\); the 2-step ahead forecast is obtained by multiplying the present observation by \(\Phi^2\).

phi.lpp <- var.pre2par(var1.fit.2step$par,1,2)[,,1]
fore.lpp <- phi.lpp %^% 2
fore.lpp <- fore.lpp[1,]

Now we set up MDFA, first selecting the filter length \(q=20\) and computing the Fourier frequencies.

q <- 20
grid <- T.sim
m <- floor(grid/2)
lambda.ft <- exp(-1i*2*pi*grid^{-1}*(seq(1,grid) - (m+1)))

Next, we compute the frf for the 2-step ahead forecasting target, and calculate the periodogram.

frf.psi <- matrix(lambda.ft^{-2},nrow=1) %x% diag(N)
frf.psi <- array(frf.psi,c(N,N,grid))
spec.hat <- mdfa.pergram(x.sim,1)

Finally, we compute the MDFA filter.

fore.mdfa <- mdfa.unconstrained(frf.psi,spec.hat,q)

The results of the LPP and MDFA methods can then be compared by examining the first coefficient of each. For the LPP we have

print(fore.lpp)
[1] -0.1142641  0.1013095

and for MDFA we have

print(fore.mdfa[[1]][1,,1])
[1] 0.01040808 0.03128738

Whereas the LPP method relies on a mis-specified model to construct the 2-step ahead forecasting filter, MDFA avoids modeling altogether, and therefore has superior performance. In particular, the 2-step ahead forecast error criterion at the LPP solution is 2.0394, while the same error for the MDFA is 1.4725.

Exercise 3.3 (MDFA VAR(1) Filtering.) This exercise examines MDFA applied to the trend of a VAR(1) process. Simulate a sample of size \(T=5000\) from a bivariate VAR(1) process with \[ \Phi = \left[ \begin{array}{cc} 1 & 1/2 \\ -1/5 & 3/10 \end{array} \right] \] and \(\Sigma\) equal to the identity. The eigenvalues are \(.8\) and \(.5\). Apply the ideal low-pass filter (cf. Example 2.2) with \(\mu = \pi/6\) to the sample (truncate the filter to \(1000\) coefficients on each side). Use the moving average filter MDFA (Proposition 3.2) to find the best concurrent filter, setting \(q= 20\). Apply this concurrent filter to the simulation, and compare the relevant portions to the ideal trend. Also determine the in-sample performance, in comparison to the criterion value (Equation 3.12). Target the trends for both time series.

Solution

We begin with a simulation of length \(5000\).

set.seed(1234)
T.sim <- 5000
N <- 2
phi.matrix <- rbind(c(1,.5),c(-.2,.3))
innovar.matrix <- diag(2)
x.sim <- varp.sim(array(phi.matrix,c(2,2,1)),innovar.matrix,T.sim)

Next, we construct the ideal low-pass filter. We use the custom function mvar.filter to do multivariate filtering.

mu <- pi/6
len <- 1000
lp.filter <- c(mu/pi,sin(seq(1,len)*mu)/(pi*seq(1,len)))
lp.filter <- c(rev(lp.filter),lp.filter[-1])
x.trend.ideal <- mvar.filter(x.sim,array(t(lp.filter) %x% diag(N),c(N,N,(2*len+1))))

Now we set up MDFA, first selecting the filter length \(q=20\) and computing the Fourier frequencies.

q <- 20
grid <- T.sim - 2*len
m <- floor(grid/2)
freq.ft <- 2*pi*grid^{-1}*(seq(1,grid) - (m+1))

Next, we compute the frf for the ideal low-pass target, and calculate the periodogram.

frf.psi <- rep(0,grid)
frf.psi[abs(freq.ft) <= mu] <- 1
frf.psi <- matrix(frf.psi,nrow=1) %x% diag(N)
frf.psi <- array(frf.psi,c(N,N,grid))
spec.hat <- mdfa.pergram(x.sim[(len+1):(T.sim-len),],1)

Finally, we compute and apply the MDFA filter. We have to trim the output appropriately so that it can be correctly matched to the ideal trend.

lp.mdfa <- mdfa.unconstrained(frf.psi,spec.hat,q)
x.trend.mdfa <- mvar.filter(x.sim,lp.mdfa[[1]])[(len-q+2):(T.sim-q+1-len),]

The ideal trend can now be compared to the MDFA trend.

par(oma=c(2,0,0,0),mar=c(2,4,2,2)+0.1,mfrow=c(2,1),cex.lab=.8)
plot(ts(x.trend.ideal[,1]),ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(x.trend.mdfa[,1],col=grey(.7))
plot(ts(x.trend.ideal[,2]),ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(x.trend.mdfa[,2],col=grey(.7))
mtext("Time",side = 1,line = 1,outer=TRUE)
invisible(dev.off())

The in-sample MSE for both components is

print(c(mean((x.trend.ideal[,1] - x.trend.mdfa[,1])^2),
    mean((x.trend.ideal[,2] - x.trend.mdfa[,2])^2)))
[1] 0.4282337 0.1453833

which is close to the criterion values 0.4296039 and 0.147079.

Figure 3.1 shows the tracking of the ideal trends by the MDFA real-time trends. The MDFA criterion attempts to find a real-time filter \(\widehat{\Psi}\) that is close to the target \(\Psi\) at frequencies that are emphasized by spectral content in the time series, which is assessed through the periodogram. This particular optimization concept can be understood by analyzing real-time filter outputs and filter characteristics, i.e., amplitude and phase delay functions.

Exercise 3.4 (MDFA VAR(1) Filtering Characteristics.) This exercise examines MDFA applied to the trend of a trivariate VAR(1) process, which is essentially three univariate AR(1) processes. Simulate a sample of size \(T=5000\) from a trivariate VAR(1) process with \[ \Phi = \left[ \begin{array}{ccc} 9/10 & 0 & 0 \\ 0 & 1/10 & 0 \\ 0 & 0 & -9/10 \end{array} \right] \] and \(\Sigma\) equal to the identity. Apply the ideal low-pass filter with \(\mu = \pi/6\) to the sample (truncate the filter to \(1000\) coefficients on each side). Use the moving average filter MDFA (Proposition 3.2) to find the best concurrent filter, setting \(q= 20\). Apply this concurrent filter to the simulation, and compare the relevant portions to the ideal trend. Also determine the in-sample performance, in comparison to the criterion value (Equation 3.12). Target the trends for all three time series, and compare the results graphically. Finally, compute and graphically compare the amplitude and phase delay functions for each of the three trend targets.

Solution

We begin with a simulation of length \(5000\).

set.seed(1234)
T.sim <- 5000
N <- 3
phi.matrix <- rbind(c(.9,0,0),c(0,.1,0),c(0,0,-.9))
innovar.matrix <- diag(3)
x.sim <- varp.sim(array(phi.matrix,c(3,3,1)),innovar.matrix,T.sim)

Next, we construct the ideal low-pass filter.

mu <- pi/6
len <- 1000
lp.filter <- c(mu/pi,sin(seq(1,len)*mu)/(pi*seq(1,len)))
lp.filter <- c(rev(lp.filter),lp.filter[-1])
x.trend.ideal <- mvar.filter(x.sim,array(t(lp.filter) %x% diag(N),c(N,N,(2*len+1))))

Now we set up MDFA, first selecting the filter length \(q=20\) and computing the Fourier frequencies.

q <- 20
grid <- T.sim - 2*len
m <- floor(grid/2)
freq.ft <- 2*pi*grid^{-1}*(seq(1,grid) - (m+1))

Next, we compute the frf for the ideal low-pass target, and calculate the periodogram.

frf.psi <- rep(0,grid)
frf.psi[abs(freq.ft) <= mu] <- 1
frf.psi <- matrix(frf.psi,nrow=1) %x% diag(N)
frf.psi <- array(frf.psi,c(N,N,grid))
spec.hat <- mdfa.pergram(x.sim[(len+1):(T.sim-len),],1)

Finally, we compute and apply the MDFA filter.

lp.mdfa <- mdfa.unconstrained(frf.psi,spec.hat,q)
x.trend.mdfa <- mvar.filter(x.sim,lp.mdfa[[1]])[(len-q+2):(T.sim-q+1-len),]

The ideal trend can now be compared to the MDFA trend.

par(oma=c(2,0,0,0),mar=c(2,4,2,2)+0.1,mfrow=c(3,1),cex.lab=.8)
plot(ts(x.trend.ideal[,1]),ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(x.trend.mdfa[,1],col=grey(.7))
plot(ts(x.trend.ideal[,2]),ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(x.trend.mdfa[,2],col=grey(.7))
plot(ts(x.trend.ideal[,3]),ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(x.trend.mdfa[,3],col=grey(.7))
mtext("Time", side = 1, line = 1,outer=TRUE)
invisible(dev.off())

The in-sample MSE for the three components is

print(c(mean((x.trend.ideal[,1] - x.trend.mdfa[,1])^2),
    mean((x.trend.ideal[,2] - x.trend.mdfa[,2])^2),
    mean((x.trend.ideal[,3] - x.trend.mdfa[,3])^2)))
[1] 0.29050583 0.08095840 0.02257534

which is close to the criterion values 0.2926971, 0.0814786, and 0.0226081. The gain and phase delay functions are computed next; we use mdfa.frf, which computes the frf of a given multivariate filter at a specified mesh size of Fourier frequencies.

frf.psi <- frf.psi[1,1,]
gain.psi <- abs(frf.psi)
phased.psi <- Arg(frf.psi)/freq.ft
lp.frf <- mdfa.frf(lp.mdfa[[1]],0,grid)
lp.gain1 <- abs(lp.frf[1,1,])
lp.gain2 <- abs(lp.frf[2,2,])
lp.gain3 <- abs(lp.frf[3,3,])
lp.phased1 <- -Arg(lp.frf[1,1,])/freq.ft
lp.phased2 <- -Arg(lp.frf[2,2,])/freq.ft
lp.phased3 <- -Arg(lp.frf[3,3,])/freq.ft

These are displayed below:

par(oma=c(2,0,0,0),mar=c(2,4,2,2)+0.1,mfrow=c(3,1),cex.lab=.8)
plot(ts(gain.psi,start=-1,frequency=m),col=1,ylim=c(0,1),main="Gain",
    ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(ts(lp.gain1,start=-1,frequency=m),col="orange")
lines(ts(lp.gain2,start=-1,frequency=m),col="green")
lines(ts(lp.gain3,start=-1,frequency=m),col="violet")
plot(ts(phased.psi,start=-1,frequency=m),col=1,
    ylim=c(0,max(na.exclude(lp.phased1),na.exclude(lp.phased2),
    na.exclude(lp.phased3))),main="Phase Delay",
    ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(ts(lp.phased1,start=-1,frequency=m),col="orange")
lines(ts(lp.phased2,start=-1,frequency=m),col="green")
lines(ts(lp.phased3,start=-1,frequency=m),col="violet")
plot(ts(rep(NA,grid),start=-1,frequency=m),col=1,
    ylim=c(0,max(Re(spec.hat[1,1,]),Re(spec.hat[2,2,]),Re(spec.hat[3,3,]))/6),
    main="Periodogram",ylab="",xlab="",yaxt="n",xaxt="n")
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(ts(Re(spec.hat[1,1,]),start=-1,frequency=m),col="orange")
lines(ts(Re(spec.hat[2,2,]),start=-1,frequency=m),col="green")
lines(ts(Re(spec.hat[3,3,]),start=-1,frequency=m),col="violet")
mtext("Cycles",side = 1,line = 1,outer=TRUE)
invisible(dev.off())

Figure 3.3: Gain functions (upper panel), Phase Delay Functions (center panel), and Periodograms (bottom panel) for series one (orange), two (green), and three (violet).

A visual inspection of Figure 3.2, regarding the trends and trend estimates in Exercise 3.4, indicates an apparent conflict with the criterion values: although the first series has the largest MSE, the fit of the concurrent estimator to the target trend appears best. The explanation is that the filtering task for the first series is easiest: its target is smooth and persistent, whereas a noisy target is harder to replicate in a mean square error sense. Another feature is that the real-time estimates appear to be systematically shifted to the right (they are delayed); the first series seems to be least affected. These observations indicate that the difficulty of the estimation task depends on the DGP (as specified by the entries of \(\Phi\)): larger eigenvalues correspond to greater persistence of the process, and an easier estimation problem. In contrast, small eigenvalues correspond to a noisier process, and a harder estimation problem.

These properties are further confirmed by the gain and phase delay functions displayed in Figure 3.3. The noise in real-time estimates \(\widehat{Y}_t\) is due to the incomplete matching of the estimated gain (orange, green, or violet lines in the upper panel) to the ideal gain function (black); note that the concurrent filters allow some content at frequencies greater than \(\mu = \pi/6\) to penetrate. On the other hand, the observed delay in the real-time estimates can be explained through the fact that the phase delay functions of the concurrent filters (orange, green, or violet lines in the center panel) do not vanish, unlike the ideal filter’s phase delay (black). Chapter 6 proposes a more general optimization paradigm that will address these issues explicitly.

Also observe that the phase delay function of the first series (orange line, center panel), which has the strongest autocorrelation, remains comparatively small. Its gain function (orange line, upper panel) is the farthest away from the target in the stop-band \(|\omega| >\pi/6\), but most closely resembles the target in the pass-band \(|\omega| \leq\pi/6\). Apparently, the optimization criterion concedes poorer high-frequency damping to obtain improved pass-band properties. In summary, \(\widehat{\Psi}\) tracks \(\Psi\) at the pivotal frequencies, i.e., those that are important to the process’ dynamics, as quantified by the periodogram (bottom panel) in Figure 3.3. Similar findings apply to the other two series.

3.4 Qualitative Easing by Leading Indicators: an Empirical Study

In this section we quantify performance gains obtained by inclusion of a leading indicator into a univariate design. In particular, consider the process \[ \begin{align} X_{t,1} & = \phi \, X_{t-1,1} + \epsilon_{t,1} \notag \\ X_{t,2} & = X_{t+\delta,1} + \sigma \, \epsilon_{t,2}, \end{align} \tag{3.15}\] where \(\{ \epsilon_t \}\) is i.i.d. with mean zero and identity covariance matrix. Clearly, \(\{ X_{t,2} \}\) is a leading indicator of \(\{ X_{t,1} \}\) when the time-shift \(\delta > 0\). The scaling factor \(\sigma\) determines the extent to which the indicator is effective, with larger values of \(\sigma\) implying that the indicator is less informative about the target \(X_{t,1}\).

3.4.1 Bivariate MDFA versus Univariate DFA

Here we select \(\sigma=1\), corresponding to a weak idiosyncratic component, and set \(\delta=1\) so that the indicator leads by one time unit.

Exercise 3.5 (Strong Leading Indicator.) Simulate a sample of size \(T=200\) from the process Equation 3.15 with \(\phi = .9\), \(\delta = 1\), and \(\sigma = 1\). The target is one-step ahead forecasting of \(\{ X_{t,1} \}\), i.e., \(Y_t = X_{t+1,1}\). Apply univariate DFA by specializing the MDFA methodology, and compare to results obtained from MDFA (Proposition 3.2), in each case setting \(q=20\). Apply both concurrent filters to the simulation, and compare the relevant portions to the actual target. Also determine the in-sample performance, in comparison to the criterion value Equation 3.12, for both the DFA and MDFA methods. Compare the results graphically.

Solution

We begin with a simulation of length \(200\).

set.seed(654)
T.sim <- 200
N <- 2
phi.matrix <- matrix(.9,1,1)
innovar.matrix <- diag(1)
y.sim <- varp.sim(array(phi.matrix,c(1,1,1)),innovar.matrix,T.sim+1)
sigma <- 1
z.sim <- y.sim[-1] + sigma*rnorm(T.sim)
x.sim <- cbind(y.sim[-(T.sim+1)],z.sim)

Now we set up MDFA, first selecting the filter length \(q=20\) and computing the Fourier frequencies.

q <- 20
grid <- T.sim
m <- floor(grid/2)
lambda.ft <- exp(-1i*2*pi*grid^{-1}*(seq(1,grid) - (m+1)))

Next, we compute the frf for 1-step ahead forecasting, and calculate the periodogram.

frf.psi <- matrix(lambda.ft^{-1},nrow=1) %x% diag(N)
frf.psi <- array(frf.psi,c(N,N,grid))
spec.hat <- mdfa.pergram(x.sim,1)

The MDFA and DFA filters are computed next, and applied to the simulation.

fore.mdfa <- mdfa.unconstrained(frf.psi,spec.hat,q)
fore.udfa <- mdfa.unconstrained(frf.psi[1,1,,drop=FALSE],spec.hat[1,1,,drop=FALSE],q)
x.fore.mdfa <- mvar.filter(x.sim,fore.mdfa[[1]])
x.fore.udfa <- mvar.filter(x.sim[,1,drop=FALSE],fore.udfa[[1]])

The MDFA and DFA forecasts of the first series are displayed against the target, which is the simulation advanced one step forward:

par(mar=c(4,4,2,2)+0.1,cex.lab=.8,mfrow=c(1,1))
plot(ts(x.sim[(q+1):T.sim,1]),ylab="",xlab="Time",yaxt="n",xaxt="n",col=grey(.7))
axis(1,cex.axis=.5)
axis(2,cex.axis=.5)
lines(x.fore.mdfa[-(T.sim-q+1),1],col=grey(.5),lty=1)
lines(x.fore.udfa[-(T.sim-q+1),1],col=1,lty=2)
invisible(dev.off())

Figure 3.4: One-step ahead forecasts based upon MDFA (solid dark grey) and univariate DFA (black dashed), with target in solid light grey.

The in-sample MSE for MDFA and DFA, focusing on the first component, is

print(c(mean((x.sim[(q+1):T.sim,1] - x.fore.mdfa[-(T.sim-q+1),1])^2),
    mean((x.sim[(q+1):T.sim,1] - x.fore.udfa[-(T.sim-q+1)])^2)))
[1] 0.8386949 0.9194815

which is close to the corresponding criterion values (Equation 3.12).

We see that there is a substantial improvement to performance of the MDFA over the univariate DFA; this can be visualized by the tracking of the target shown in Figure 3.4. This is possible because the MDFA filter assigns more weight to the second series (the leading indicator), which is not available to the univariate DFA.

3.4.2 Measuring Lead and Signal-to-Noise Effects of a Leading Indicator

Intuitively, increasing \(\delta\) or \(\sigma\) should result in a harder forecasting problem: \(1/ \sigma\) measures the signal-to-noise ratio (snr), and low values indicate that real-time signal extraction becomes more difficult. On the other hand, a high lead time \(\delta\) requires one to do long-term forecasting, which is known to be difficult. Through the fabricated process (Equation 3.15) we can disentangle the effects of the lead time \(\delta\) and the noise scale \(\sigma\). By allowing for non-integer shifts \(\delta_j=j/4\), \(j=0,1,2,3,4\) we can quantify real-time forecasting performance. Note that if the time units are annual, then the \(\delta_j\) correspond to quarterly forecasts; more generally, taking \(\delta < 1\) corresponds to now-casting of a time series, and has many practical applications. The target filter has frf of the form \[ \Psi (e^{-i \omega}) = \exp \{ i \, \omega \delta \}, \] which can be immediately implemented in the frequency domain (whereas in time domain, the filter is difficult to express when \(\delta\) is non-integer).
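
A base R sketch of this frequency-domain implementation follows (a stand-in for the package function mdfa.coeff used in the solution below): the coefficients recovered by inverting the frf over a Riemann mesh agree closely with the exact values \(\psi (\ell) = \sin (\pi [\delta + \ell]) / (\pi [\delta + \ell])\).

delta <- 1/4
grid <- 1000
m <- floor(grid/2)
freq <- 2*pi*grid^{-1}*(seq(1,grid) - (m+1))
frf <- exp(1i*freq*delta)          # frf of the fractional shift
psi <- sapply(-3:3,function(l) Re(mean(frf*exp(1i*freq*l))))
exact <- sin(pi*((-3:3)+delta))/(pi*((-3:3)+delta))
print(round(rbind(psi,exact),3))   # coefficients at lags -3,...,3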

Exercise 3.6 (Now-casting with a Leading Indicator.) Simulate a sample of size \(T=2500\) from the process Equation 3.15 with \(\phi = .9\), \(\delta_j=j/4\), \(j=0,1,2,3,4\) and \(\sigma = 0,0.1,0.5,1,2\). The target is \(\delta\)-step ahead nowcasting of \(\{ X_{t,1} \}\), i.e., \(Y_t = X_{t+\delta,1}\). Filter to obtain the now-cast target, so as to retain a target series of length \(500\). Combine with the leading indicator, and apply univariate DFA and MDFA methodology (Proposition 3.2), in each case setting \(q=20\). Apply both concurrent filters to the simulation, and compare the relevant portions to the actual target. Record the criterion values (Equation 3.12) for both the DFA and MDFA methods.

Solution

The basic solution of Exercise 3.5 is applied, where we loop over various choices of \(\delta\) and \(\sigma\). We begin with some global settings:

set.seed(654)
T.sim <- 2500
len <- 1000
N <- 2
phi.matrix <- matrix(.9,1,1)
innovar.matrix <- diag(1)

Next, obtain the Fourier frequencies.

grid.big <- T.sim
m <- floor(grid.big/2)
lambda.ft.big <- exp(-1i*2*pi*grid.big^{-1}*(seq(1,grid.big) - (m+1)))
grid.small <- T.sim - 2*len
m <- floor(grid.small/2)
lambda.ft.small <- exp(-1i*2*pi*grid.small^{-1}*(seq(1,grid.small) - (m+1)))

Now we define some matrices and the MDFA filter length.

delta.vals <- c(0,1,2,3,4)/4
sigma.vals <- c(0,.1,.5,1,2)
critmdfa.mat <- matrix(0,5,5,dimnames=list(delta.vals,sigma.vals))
critudfa.mat <- matrix(0,5,5,dimnames=list(delta.vals,sigma.vals))
q <- 20

Next, loop and do the main calculations. The function mdfa.coeff computes filter coefficients from a given frf.

y.sim <- varp.sim(array(phi.matrix,c(1,1,1)),innovar.matrix,T.sim)
for(delta in delta.vals) 
{

  frf.psi.big <- matrix(lambda.ft.big^{-delta},nrow=1)
  frf.psi.big <- array(frf.psi.big,c(1,1,grid.big))
  nowcast.filter <- mdfa.coeff(frf.psi.big,-len,len)
  y.target <- mvar.filter(y.sim,nowcast.filter[1,1,,drop=FALSE])

  for(j in 1:5) 
  {

    sigma <- sigma.vals[j]
    z.sim <- y.target + sigma*rnorm(T.sim-2*len)
    x.sim <- cbind(y.sim[(len+1):(T.sim-len)],z.sim)
    frf.psi.small <- matrix(lambda.ft.small^{-delta},nrow=1) %x% diag(N)
    frf.psi.small <- array(frf.psi.small,c(N,N,grid.small))
    spec.hat <- mdfa.pergram(x.sim,1)

    fore.udfa <- mdfa.unconstrained(frf.psi.small[1,1,,drop=FALSE],
                                    spec.hat[1,1,,drop=FALSE],q)
    x.fore.udfa <- mvar.filter(x.sim[,1,drop=FALSE],fore.udfa[[1]])
    if(j > 1) { 
        fore.mdfa <- mdfa.unconstrained(frf.psi.small,spec.hat,q) 
      x.fore.mdfa <- mvar.filter(x.sim,fore.mdfa[[1]])
    } else 
    { 
      fore.mdfa <- fore.udfa
      x.fore.mdfa <- cbind(x.fore.udfa,rep(0,(T.sim-2*len-q+1)))
    }

    i <- delta*4 + 1
    critmdfa.mat[i,j] <- round(fore.mdfa[[2]][1,1],digits=5)
    critudfa.mat[i,j] <- round(fore.udfa[[2]][1,1],digits=5)
    
  }
}

Finally, we display two tables that summarize the results.

tab <- as.table(critmdfa.mat)
dimnames(tab) <- list(delta = delta.vals, 
                      sigma  = sigma.vals) 
print(tab)

Table 3.1: MDFA criterion values by lead time \(\delta\) (rows) and noise scale \(\sigma\) (columns).

      sigma
delta        0     0.1     0.5       1       2
  0    0.00000 0.00000 0.00000 0.00000 0.00000
  0.25 0.06617 0.01175 0.05229 0.05579 0.06311
  0.5  0.29806 0.02053 0.13006 0.19737 0.27083
  0.75 0.64488 0.02176 0.16080 0.37165 0.53507
  1    0.98893 0.01723 0.19467 0.48642 0.76505
tab <- as.table(critudfa.mat)
dimnames(tab) <- list(delta = delta.vals, 
                      sigma  = sigma.vals) 
print(tab)

Table 3.2: Univariate DFA criterion values by lead time \(\delta\) (rows) and noise scale \(\sigma\) (columns).

      sigma
delta        0     0.1     0.5       1       2
  0    0.00000 0.00000 0.00000 0.00000 0.00000
  0.25 0.06617 0.06617 0.06617 0.06617 0.06617
  0.5  0.29806 0.29806 0.29806 0.29806 0.29806
  0.75 0.64488 0.64488 0.64488 0.64488 0.64488
  1    0.98893 0.98893 0.98893 0.98893 0.98893

The results for MDFA and univariate DFA, respectively, are given in Table 3.1 and Table 3.2. Regarding the design of Exercise 3.6, we make the following comments. When \(\sigma = 0\) the leading indicator exactly matches the target; if \(\delta = 0\) as well, then \(X_{t,1} = X_{t,2}\) and there is redundancy in the data – this will lead to a singularity in the periodogram. When \(\delta > 0\) (but \(\sigma = 0\)) then \(\{ X_{t,1} \}\) and \(\{ X_{t,2} \}\) are perfectly coherent, as the relation \(X_{t,2} = L^{-\delta} \, X_{t,1}\) holds. This full coherency indicates that a type of redundancy is still present, and the MDFA method is singular. Therefore, in these cases we restrict the MDFA to univariate DFA, and hence the first columns of Table 3.1 and Table 3.2 are identical.

The first rows of Table 3.1 and Table 3.2 are trivially zero, because when \(\delta = 0\) the target is observable, and hence both MDFA and DFA select the identity filter. Broadly, the patterns are what we would expect: increasing \(\delta\) and/or \(\sigma\) generates worse performance (higher MSE), although the MDFA is superior to DFA. The advantage of MDFA over DFA narrows as \(\sigma\) increases, irrespective of \(\delta\), which makes sense: there is less benefit to the leading indicator when the snr is low, in which case DFA should generate a competitive real-time filter. In particular, reading across the rows of Table 3.2 we see the MSE is constant – DFA does not utilize the leading indicator, so the value of \(\sigma\) has no impact on the univariate criterion. As for the MDFA, decreased performance due to lower snr could be compensated by decreasing the lead-time \(\delta\). Again, when \(\sigma\) is low (second column of Table 3.1) the leading indicator is very useful, and the MSE does not depend greatly on \(\delta\). This pattern is reversed when \(\sigma\) is high (fifth column), as increased \(\delta\) negatively affects performance.

These results suggest the pertinence of a mixed-frequency approach, whereby information at differing sampling frequencies (such as monthly and quarterly data) is combined. The higher-frequency data stream could be used to update the filter for the lower-frequency time series; this is further discussed in Chapter 7.

3.5 Multivariate DFA with Multiple Targets

We now consider a slight generalization of the LPP, where the target is also multivariate.

3.5.1 LPP with Multiple Targets

Definition 3.2 A target is defined to be the output of any known linear filter acting on the data process, i.e., \(\{Y_t \}\) is a target time series corresponding to a given filter \(\Psi (L)\) acting on a given observed time series \(\{ X_t \}\) if and only if we can write \(Y_t = \Psi (L) \, X_t\) for all integers \(t\).

We revisit the examples considered in Chapter 2, but now incorporated with multiple targets.

Example 3.4 (Multi-step Ahead Forecasting.) Suppose that our goal is to forecast all of the component series \(h\) steps ahead, where \(h \geq 1\) is the given forecast lead. Hence the target is expressed as \(Y_t = X_{t+h}\) for all \(t \in \mathbb Z\). This target corresponds to \(\Psi (L) = L^{-h} \, 1_n\). Thus each \(\psi (\ell)\) is an \(n \times n\) matrix; all are zero except \(\psi (-h)\), which is given by \(1_n\).

Example 3.5 (Ideal Low-Pass.) The ideal low-pass target is the same for each series, and hence \[ \Psi (e^{-i \omega}) = \chi_{ [ -\mu, \mu ]} (\omega) \, 1_n \] for some cutoff \(\mu \in (0, \pi)\) that separates the pass-band from the stop-band. The coefficients are given by \[ \psi (\ell) = \frac{ \sin (\ell \mu) }{ \pi \ell } \, 1_n \] for \(\ell \neq 0\) and \(\psi (0) = \mu/\pi \, 1_n\).

Example 3.6 (The Future VAR(1) Shock.) Consider the VAR(1) process Equation 2.11, and suppose our target is the future shock (or innovation), so that \[ Y_t = X_{t+1} - \Phi \, X_{t}. \] Hence \(\Psi (L) = L^{-1} 1_n - \Phi\). Note that this target is model-based; one must know the process to compute the target. Also, each component of the multivariate target depends on all the series: \[ Y_{t,j} = X_{t+1,j} - \sum_{k=1}^n \Phi_{jk} X_{t,k} \] for \(1 \leq j \leq n\).

As we see from these examples, the targets of real-time signal extraction are features of the stochastic process that are of interest to a particular user. Targets can be ad hoc (cf. Example 3.5) or model-based (cf. Example 3.6), and may depend upon all the components of \(X_t\).

Definition 3.3 The Linear Prediction Problem (LPP) seeks a linear estimate such that the filter error Equation 2.8 has mean zero, and such that the determinant of the filter error variance \(\mbox{Var} [ E_t ]\) is minimized.

The filter error variance matrix is referred to as the filter MSE; the diagonal entries correspond to the scalar LPPs considered earlier in this book. We now state the solution to the general LPP with multiple targets.

Proposition 3.4 Suppose that \(\{ X_t \}\) is mean zero and weakly stationary with Wold decomposition expressed as \(X_t = \Theta (L) \, \epsilon_t\), where \(\Theta (L)\) is invertible. Then the solution to the LPP posed by a target \(Y_t = \Psi (L) \, X_t\) is given by \[ \widehat{\Psi} (L) = \sum_{\ell \geq 0 } \psi (\ell) \, L^{\ell} + \sum_{\ell < 0 } \psi (\ell) \, { [ \Theta (L) ]}_{-\ell}^{ \infty } \, L^{\ell} \, {\Theta (L) }^{-1} = { [ \Psi (L) \Theta (L) ]}_0^{\infty} {\Theta (L) }^{-1}. \tag{3.16}\] Moreover, the MSE corresponding to this solution is given by \[ { \langle {[ \Psi (e^{-i \omega}) \Theta (e^{-i \omega}) ]}_{-\infty}^{-1} \Sigma { {[ \Psi (e^{-i \omega}) \Theta (e^{-i \omega}) ]}_{-\infty}^{-1} }^* \rangle }_0. \tag{3.17}\]

Proof. We first prove the first formula of Equation 3.16. In order for a linear solution to be MSE optimal, it is sufficient that the resulting error process be uncorrelated with the present and past data, denoted \(X_{t:}\). If we can show that the real-time signal extraction error process \(\{ E_t \}\) depends only on future innovations, then by the causality of \(\{ X_t \}\) the error process must be uncorrelated with \(X_{t:}\), establishing optimality. The filter error of the putative solution is given by \[ \begin{align} \Psi (L) - \widehat{\Psi} (L) & = \sum_{\ell < 0 } \psi (\ell) \, L^{\ell} \, \left( 1 - {[ \Theta (L) ]}_{-\ell}^{\infty} \, { \Theta (L) }^{-1} \right) \\ & = \sum_{\ell < 0 } \psi (\ell) \, L^{\ell} \, {[ \Theta (L) ]}_{0}^{ -(\ell + 1)} \, { \Theta (L) }^{-1}. \end{align} \] Applying this to \(\{ X_t \}\) yields \[ E_t = \sum_{\ell =1 }^{\infty} \psi (-\ell) \, {[ \Theta (L) ]}_0^{\ell - 1} \, \epsilon_{t + \ell }. \] Noting that \({[ \Theta (L) ]}_0^{\ell - 1}\) is an order \(\ell-1\) polynomial in \(L\), and is applied to \(\epsilon_{t+ \ell}\), it is apparent that \(E_t\) is a linear function of future innovations \(\{ \epsilon_{t+1}, \epsilon_{t+2}, \ldots \}\). For the second formula, we use the fact that \(\Theta (z)\) is a power series to obtain \[ \begin{align} \sum_{\ell < 0 } \psi (\ell) \, z^{\ell} \, {[ \Theta (z) ]}_{0}^{ -(\ell + 1)} & = \sum_{\ell = -\infty }^{-1} \psi (\ell) \, \sum_{k = -\infty}^{-\ell-1} \theta (k) z^{k + \ell} \\ & = \sum_{\ell = -\infty }^{-1} \psi (\ell) \, \sum_{k \geq 0} \theta (-\ell-1-k) z^{-1-k} \\ & = \sum_{k \geq 0} z^{-1-k} \sum_{\ell = -\infty }^{\infty} \psi (\ell) \, \theta (-\ell-1-k) \\ & = \sum_{k \geq 0} z^{-1-k} { \langle \Psi (z) \Theta (z) \rangle }_{-1-k} \\ & = { [ \Psi (z) \Theta (z) ]}_{- \infty}^{-1}. \end{align} \] From this expression it follows that \[ \widehat{\Psi} (z) = \Psi (z) - { [ \Psi (z) \Theta (z) ]}_{- \infty}^{-1} { \Theta (z)}^{-1} = { [ \Psi (z) \Theta (z) ]}_{0}^{\infty} { \Theta (z)}^{-1}. \] This calculation also yields the variance of \(E_t\), and the expression for the minimal MSE. \(\quad \Box\)
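As a quick illustration of Equation 3.16 (a worked instance of our own, not from the original development), take \(\Psi (L) = L^{-1} \, 1_n\) (one-step ahead forecasting) and a VAR(1) process, for which \(\Theta (L) = { ( 1_n - \Phi L ) }^{-1}\). Then \(\Psi (z) \Theta (z) = \sum_{k \geq 0} \Phi^k z^{k-1}\), whose causal part is \(\sum_{k \geq 1} \Phi^k z^{k-1} = \Phi \, \Theta (z)\), so that \[ \widehat{\Psi} (z) = \Phi \, \Theta (z) \, { \Theta (z) }^{-1} = \Phi, \] recovering the familiar one-step ahead forecast \(\widehat{X}_{t+1} = \Phi \, X_t\).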

3.5.2 MDFA for Multiple Targets

Suppose that the causal filters of interest belong to a class \(\mathcal{G}\) described by a vector parameter \(\vartheta\) belonging to a parameter manifold. We now generalize the definition of \(\mathcal{G}\) given by Equation 3.5, where now the filters are \(n \times n\). Likewise, we generalize the real-time estimation error given in Equation 2.8 by noting that now \(E_t\) is an \(n\)-dimensional vector. Hence Equation 3.6 becomes \[ \mathbb E[ E_t \, E_t^{\prime} ] = { \langle \left[ \Psi ( e^{-i \omega} ) - \widehat{\Psi}_{\vartheta} (e^{-i \omega}) \right] \, F (\omega) \, { \left[ \Psi (e^{i \omega}) - \widehat{\Psi}_{\vartheta} (e^{i \omega}) \right] }^{\prime} \rangle }_0. \tag{3.18}\] This suggests as a generalization of Equation 3.7 the criterion function \(\det D_{\Psi} (\vartheta, G)\) for any Hermitian function \(G\), defined via \[ D_{\Psi} (\vartheta, G) = { \langle \left[ \Psi (e^{-i \omega}) - \widehat{\Psi}_{\vartheta} (e^{-i \omega}) \right] \, G (\omega) \, { \left[ \Psi (e^{i \omega}) - \widehat{\Psi}_{\vartheta} (e^{i \omega}) \right] }^{\prime} \rangle }_0. \tag{3.19}\] In the following development, setting \(G = F\) yields an ideal criterion based on the process, whereas setting \(G = \widehat{F}\) (the periodogram) yields an empirical criterion, providing estimates that we can compute from data. Taking the determinant of Equation 3.19 yields the MDFA criterion function. In the case \(n=1\) we recover the univariate DFA; Proposition A.3 of T. S. McElroy and Wildi (2020) shows that by including additional time series the MDFA will improve over DFA for any filter class \(\mathcal{G}\).
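For concreteness, the criterion Equation 3.19 (and its determinant) can be evaluated numerically on a frequency grid. The sketch below is our own illustration: `psi_frf` and `psi_hat_frf` are assumed to be user-supplied functions returning the \(n \times n\) target and candidate frequency responses, and `G` holds the Hermitian matrices \(G(\omega)\) on the same grid.

```python
import numpy as np

def mdfa_criterion(psi_frf, psi_hat_frf, G, omegas):
    """Evaluate det D_Psi(theta, G) of Equation 3.19 over a frequency grid.

    psi_frf, psi_hat_frf : callables giving Psi(e^{-i w}) and its candidate
    G : array of shape (m, n, n) with G(omega) at each grid frequency
    omegas : the m grid frequencies, e.g. Fourier frequencies in [-pi, pi)
    """
    n = G.shape[1]
    D = np.zeros((n, n))
    for w, Gw in zip(omegas, G):
        diff = psi_frf(w) - psi_hat_frf(w)
        # for real filter coefficients, the transposed response at +w equals
        # the conjugate transpose of the response at -w
        D += (diff @ Gw @ diff.conj().T).real / len(omegas)
    return np.linalg.det(D)
```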

The best possible concurrent filter is given by \(\widehat{\Psi}_{\vartheta (F)}\), where \(\vartheta (F)\) is a minimizer of \(\det D_{\Psi} (\vartheta, F)\). This \(\vartheta (F)\) is the Pseudo-True Value for the filter parameter. A case of interest arises from taking a very broad class \(\mathcal{G}\), namely letting \(\mathcal{G}\) consist of all length-\(q\) concurrent filters (\(\mathcal{M}_q\)), with \(\vartheta = \mbox{vec} [ P^{\prime}]\) and \[ P = {\left[ \widehat{\psi} (0), \widehat{\psi} (1), \ldots, \widehat{\psi} (q-1) \right] }^{\prime}. \tag{3.20}\] So \(P\) is a \(q n \times n\) matrix. Then the criterion Equation 3.19 can be rewritten as \[ D_{\Psi} (\vartheta, G) = P^{\prime} \, B \, P - P^{\prime} \, A - A^{\prime} \, P + { \langle \Psi (e^{-i \omega}) \, G (\omega) \, { \Psi (e^{i \omega}) }^{\prime} \rangle }_0, \tag{3.21}\] where \[ A^{\prime} = \left[ { \langle \Psi (e^{-i \omega}) \, G (\omega) \rangle }_{0}, { \langle \Psi (e^{-i \omega}) \, G (\omega) \rangle }_{1}, \ldots, { \langle \Psi (e^{-i \omega}) \, G (\omega) \rangle }_{q-1} \right], \tag{3.22}\] and \(B\) is a block matrix whose \(jk\)th \(n \times n\) block is \({ \langle G \rangle }_{k-j}\) for \(1 \leq j,k \leq q\). (Because \(G\) is Hermitian, \({ \langle G \rangle }_{k-j}\) is real, and it follows that \(A\) is real as well.)

Proposition 3.5 The minimizer of the MDFA criterion given by the determinant of Equation 3.19 with respect to \(\mathcal{G} = \mathcal{M}_q\) is \(P (G) = B^{-1} \, A\) (expressing \(P\) as a function of \(G\)), where the \(jk\)th block of \(B\) is \({ \langle G \rangle }_{k-j}\), and \(A\) is given by Equation 3.22. The minimal value is the determinant of \[ { \langle \Psi (e^{-i \omega}) \, G (\omega) \, { \Psi (e^{i \omega}) }^{\prime} \rangle }_0 - A^{\prime} \, B^{-1} \, A. \tag{3.23}\]

Proof. First note that the typical component of \(A\) has the form \[ { \langle \Psi (z) \, G \rangle }_{\ell} = \sum_{k = - \infty}^{\infty} \psi (k) \, { \langle G \rangle }_{\ell-k} \tag{3.24}\] for \(0 \leq \ell < q\), which shows that \(A\) is real-valued. The argument follows the same method as in T. McElroy and Findley (2014); each entry of the matrix objective function is a quadratic in \(P\), and therefore the minimizer is obtained by computing the gradient and Hessian, which are \(-2 A + 2 B \, P\) and \(2 B\) respectively, yielding the solution. Plugging back into \(D_{\Psi}\) yields Equation 3.23. \(\quad \Box\)

Example 3.7 (One-step Ahead Forecasting.) We now generalize the result of Example 3.2 by considering multiple series at once. Consider one-step ahead forecasting of a stationary time series, with \(\mathcal{G} = \mathcal{M}_q\) (i.e., the set of moving average filters of length \(q\)) and \(\vartheta = \mbox{vec} [{\widehat{\psi} (0) }^{\prime}, {\widehat{\psi} (1) }^{\prime}, \ldots, {\widehat{\psi} (q-1) }^{\prime} ]\). With \(\Psi (z) = z^{-1} 1_n\), from Equation 3.19 we have \[ \begin{align} D_{\Psi} (\vartheta, G) & = { \langle \left[ e^{i \omega} 1_n - \widehat{\Psi}_{\vartheta} (e^{-i \omega}) \right] \, G (\omega) \, { \left[ e^{-i \omega} 1_n - \widehat{\Psi}_{\vartheta} (e^{i \omega}) \right] }^{\prime} \rangle }_0 \\ & = { \langle \left[ 1_n - \sum_{\ell = 0}^{q-1} \widehat{\psi} (\ell) \, e^{-i \omega (\ell+1) } \right] \, G \, { \left[ 1_n - \sum_{\ell = 0}^{q-1} \widehat{\psi} (\ell) \, e^{i \omega (\ell+1 ) } \right] }^{\prime} \rangle }_0 \\ & = { \langle G \rangle }_0 - 2 \, P^{\prime} \, { \langle G \rangle }_{1:q} + P^{\prime} \, B \, P. \end{align} \] Hence the optimizer is \(P (G) = B^{-1} \, { \langle G \rangle }_{1:q}\), which is the first component of the solution to the Yule-Walker system of order \(q\) determined by \(G\). Therefore the MDFA solution is the same as fitting a VAR(\(q\)) and forecasting via Proposition 3.4.
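The solve in Example 3.7 amounts to one block-Toeplitz linear system. Here is a minimal sketch, assuming a user-supplied `gamma` function returning \({\langle G \rangle}_h\) at any integer lag \(h\) (all names are illustrative):

```python
import numpy as np

def one_step_filter(gamma, q):
    """Solve P(G) = B^{-1} <G>_{1:q} as in Example 3.7.

    gamma : callable giving the n x n autocovariance <G>_h at lag h
            (gamma(-h) must equal gamma(h).T)
    Returns the filter matrices psi-hat(0), ..., psi-hat(q-1).
    """
    n = gamma(0).shape[0]
    # B is block Toeplitz: its (j,k) block is <G>_{k-j}
    B = np.block([[gamma(k - j) for k in range(q)] for j in range(q)])
    rhs = np.vstack([gamma(h) for h in range(1, q + 1)])  # stacked <G>_{1:q}
    P = np.linalg.solve(B, rhs)                           # qn x n
    # block row ell of P holds psi-hat(ell) transposed (cf. Equation 3.20)
    return [P[ell * n:(ell + 1) * n, :].T for ell in range(q)]
```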

We designate the resulting prediction function \(\widehat{\Psi}_{ {\vartheta} (G)}\) as a Linear Prediction Filter (LPF). Again, when \(G=F\) this LPF is a theoretical object, but when \(G = \widehat{F}\) the LPF can be constructed directly from the sample. When \(\mathcal{G}\) is large enough to include the optimal MB filter \(\widehat{\Psi}\) of Proposition 3.4, then \(\widehat{\Psi}_{ {\vartheta} (F)}\) corresponds to this \(\widehat{\Psi}\) (assuming the model is correctly specified).

Example 3.8 (VAR(\(1\)).) We revisit the VAR(1) Example 3.3. Now the solution given by Proposition 3.5 can be compared to that of the LPP, which has the first \(q\) components given by \(\varphi^{\prime} = [ \psi (0) + A_{\Psi} (\Phi), \psi (1), \ldots, \psi (q-1) ]\). This is an approximate solution to the system \(P^{\prime} \, B = A^{\prime}\), because \(\varphi^{\prime} \, B\) has \((j+1)\)th component, for \(0 \leq j \leq q-1\), equal to \(\sum_{\ell=0}^{q-1} \psi (\ell) \, \Gamma_{j-\ell} + A_{\Psi} (\Phi) \, \Gamma_j\) (taking \(G = F\), so that \({\langle G \rangle }_h = \Gamma_h\)). Noting that \[ A_{\Psi} (\Phi) \, \Gamma_j = \sum_{\ell = 1 }^{\infty} \psi (-\ell) \, \Phi^{\ell} \, \Gamma_j = \sum_{\ell = 1}^{\infty} \psi (-\ell) \, \Gamma_{j + \ell}, \] because for a VAR(\(1\)) process \(\Gamma_h = \Phi^h \, \Gamma_0\) when \(h \geq 0\), we see that component \(j+1\) of \(\varphi^{\prime} \, B\) is \[ \sum_{\ell = -\infty }^{ q-1} \psi (\ell) \, \Gamma_{j-\ell} = {[ A^{\prime} ]}_{j+1} - \sum_{\ell = q}^{\infty} \psi (\ell) \, \Gamma_{j-\ell}. \] As \(q \rightarrow \infty\) the error term vanishes (for each \(j\)), indicating that \(\varphi^{\prime} \, B \approx A^{\prime}\), or \(P \approx \varphi\).

For computation, we update the treatment given earlier to the case of a multivariate filter output. To compute the quantities given in Proposition 3.5 and the MDFA criterion Equation 3.19, we approximate each integral by an average over Fourier frequencies: \[ T^{-1} \, \sum_{t=1}^T E_t \, E_t^{\prime} = T^{-1} \sum_{j=-[T/2]}^{T-[T/2]-1} \widehat{F}_{E} (\omega_{j}), \] where \(\widehat{F}_{E}\) is the periodogram of the filter errors and \(\omega_j = 2 \pi \, j/T\) is a Fourier frequency. The right hand side (with \(\widehat{F}_X\) denoting the periodogram of the process) is approximated by \[ T^{-1} \sum_{j=-[T/2]}^{T-[T/2]-1} \left[ \Psi (e^{-i \omega_{j} }) - \widehat{\Psi} (e^{-i \omega_{j} }) \right] \, \widehat{F}_X (\omega_{j}) \, {\left[ \Psi (e^{i \omega_{j} }) - \widehat{\Psi}( e^{i \omega_{j} }) \right]}^{\prime}. \] This is exactly the criterion \(D_{\Psi} (\vartheta, \widehat{F}_X)\) of Equation 3.19 with the integrals replaced by Riemann sums over the Fourier frequencies. With this justification, we see that the blocks of the matrix \(B\) in Proposition 3.5 are approximately computed via \[ B_{j,k} \approx T^{-1} \sum_{\ell=-[T/2]}^{T-[T/2]-1} G (\omega_{\ell}) \, \exp \{ i \, (k-j) \, \omega_{\ell} \} \] for \(1 \leq j,k \leq q\). Moreover, for \(0 \leq k \leq q-1\) \[ A_k^{\prime} \approx T^{-1} \sum_{\ell=-[T/2]}^{T-[T/2]-1} \Psi ( e^{ -i \omega_{\ell} }) \, G (\omega_{\ell}) \, e^{ i k \omega_{\ell} }, \] where \(A^{\prime} = [ A_0^{\prime}, \ldots, A_{q-1}^{\prime} ]\). Finally, \[ { \langle \Psi (e^{-i \omega}) \, G (\omega) \, { \Psi (e^{i \omega}) }^{\prime} \rangle }_0 \approx T^{-1} \sum_{\ell=-[T/2]}^{T-[T/2]-1} \Psi ( e^{ -i \omega_{\ell} } ) \, G (\omega_{\ell}) \, {\Psi ( e^{ i \omega_{\ell} } ) }^{\prime}. \]
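Assembling these approximations gives a direct computational recipe. The sketch below is again our own illustration, not code from the text: the data are assumed to sit in an array `x` of shape \((T, n)\), and `psi_frf` is a user-supplied function returning the \(n \times n\) target response \(\Psi (e^{-i \omega})\). The function forms the periodogram, the Riemann-sum versions of \(B\) and \(A\), and the empirical coefficients \(P (\widehat{F}_X) = B^{-1} A\).

```python
import numpy as np

def mdfa_fit(x, psi_frf, q):
    """Empirical MDFA solution of Proposition 3.5 with G = the periodogram.

    x       : data array of shape (T, n), assumed mean zero
    psi_frf : callable returning the n x n target response Psi(e^{-i w})
    q       : length of the concurrent filter class M_q
    """
    T, n = x.shape
    omegas = 2 * np.pi * np.arange(-(T // 2), T - T // 2) / T
    # DFT d(w) = sum_t x_t e^{-i w t}; periodogram F-hat(w) = d(w) d(w)* / T
    d = np.array([x.T @ np.exp(-1j * w * np.arange(1, T + 1)) for w in omegas])
    G = np.einsum('mj,mk->mjk', d, d.conj()) / T
    Psi = np.array([psi_frf(w) for w in omegas])  # target frf on the grid
    PsiG = Psi @ G                                # batched matrix products

    def acf(H, h):
        # <H>_h approximated by the Riemann sum T^{-1} sum_w H(w) e^{i h w}
        return np.tensordot(np.exp(1j * h * omegas), H, axes=1).real / T

    B = np.block([[acf(G, k - j) for k in range(q)] for j in range(q)])
    A = np.vstack([acf(PsiG, k).T for k in range(q)])  # A' = [A_0,...,A_{q-1}]
    P = np.linalg.solve(B, A)                          # qn x n coefficient stack
    return [P[l * n:(l + 1) * n, :].T for l in range(q)]  # psi-hat(ell)
```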

McElroy, Tucker S., and Dimitris N. Politis. 2019. Time Series: A First Course with Bootstrap Starter. CRC Press.
McElroy, Tucker S., and Marc Wildi. 2020. “The Multivariate Linear Prediction Problem: Model-Based and Direct Filtering Solutions.” Econometrics and Statistics 14: 112–30.
McElroy, Tucker, and David Findley. 2014. “Fitting Constrained Vector Autoregression Models.” In Empirical Economic and Financial Research: Theory, Methods and Practice, 451–70. Springer.
Taniguchi, Masanobu, and Yoshihide Kakizawa. 2012. Asymptotic Theory of Statistical Inference for Time Series. Springer Science & Business Media.
Wildi, Marc. 2005. Signal Extraction: Efficient Estimation, ’Unit-Root’-Tests and Early Detection of Turning-Points. Springer-Verlag.
———. 2007. “Real-Time Signal Extraction.” Lecture Notes in Economics and Mathematical Systems 547.
———. 2008. Real-Time Signal Extraction: Beyond Maximum Likelihood Principles. IDP-Working Book.