PRML演習問題 12.2(標準) - 機械学習基礎理論独習

問題

$(12.15)$ で与えられる主成分分析の歪み尺度 $J$ の、正規直交条件 $(12.7)$ の下での ${\bf u}_i$ に対する最小値は、
${\bf u}_i$ がデータ共分散行列 ${\bf S}$ の固有ベクトルであるときに得られることを示せ。
これを行うために、ラグランジュ乗数の行列 $\bf H$ を導入し、制約条件のそれぞれを取り込む。
その結果、歪み尺度の式は修正を受け、行列形式で表すと、

$\begin{eqnarray} \widetilde{J}={\rm Tr}\left(\widehat{\bf U}^\top{\bf S}\widehat{\bf U}\right)+{\rm Tr}\left({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U})\right)\tag{12.93} \end{eqnarray}$

のようになる。
ここで、 $\widehat{\bf U}$ は $D\times (D-M)$ 行列で、その列ベクトルは ${\bf u}_i$ で与えられる。
$\widehat{\bf U}$ についてこの $\widetilde{J}$ を最小化し、その解が ${\bf S}\widehat{\bf U}=\widehat{\bf U}{\bf H}$ を満たすことを示せ。
明らかに、可能なひとつの解は、 $\widehat{\bf U}$ の列ベクトルが $\bf S$ の固有ベクトルとなっている場合である。
その場合、 $\bf H$ は対応する固有値を持った対角行列となる。
一般的な解を得るために、 $\bf H$ が対称行列と仮定できることを示し、その固有ベクトル展開を用いるいることで、
${\bf S}\widehat{\bf U}=\widehat{\bf U}{\bf H}$ の一般解が、 $\bf U$ の列ベクトルを $\bf S$ の固有ベクトルに選ぶという特解と同じ $\widetilde{J}$ の値を与えることを示せ。
これらの解は等価なので、固有ベクトルの方の解を選んだ方が楽である。

参照

$\begin{eqnarray} {\bf u}_i^\top{\bf u}_j=\delta_{ij}\tag{12.7} \end{eqnarray}$

解答

一般的な $M,D(M < D)$ について考えます。

制約条件 ${\bf u}_i^\top{\bf u}_j=\delta_{ij}$ の下で、歪み尺度 $J$ を最小化します。
ラグランジュの未定乗数法を用いると、ラグランジュ関数は以下のようになります。

$\begin{eqnarray} \tilde{J}=\sum_{i=M+1}^D{\bf u}_i^\top{\bf S}{\bf u}_i+\sum_{i=M+1}^D\lambda_i(1-{\bf u}_i^\top{\bf u}_i)+\sum_{i=M+1}^{D-1}\sum_{j=i+1}^D\mu_{ij}(-{\bf u}_j^\top{\bf u}_i)\tag{1} \end{eqnarray}$

$\widehat{\bf U}\in\mathbb{R}^{D\times(D-M)}$ を以下のように定義します。

$\begin{eqnarray} \widehat{\bf U}=\begin{pmatrix}{\bf u}_{M+1} & \cdots & {\bf u}_D\end{pmatrix}\tag{2} \end{eqnarray}$

以下で、 $\tilde{J}$ を式変形します。

$\widehat{\bf U}^\top{\bf S}\widehat{\bf U}$ を計算します。

$\begin{eqnarray} \widehat{\bf U}^\top{\bf S}\widehat{\bf U}&=&\begin{pmatrix}{\bf u}_{M+1}^\top\\ \vdots \\ {\bf u}_D^\top\end{pmatrix}{\bf S}\begin{pmatrix}{\bf u}_{M+1} & \cdots & {\bf u}_D\end{pmatrix}\\ &=&\begin{pmatrix}{\bf u}_{M+1}^\top\\ \vdots \\ {\bf u}_D^\top\end{pmatrix}\begin{pmatrix}{\bf S}{\bf u}_{M+1} & \cdots & {\bf S}{\bf u}_D\end{pmatrix}\\ &=&\begin{pmatrix} {\bf u}_{M+1}^\top{\bf S}{\bf u}_{M+1} & \cdots & {\bf u}_{M+1}^\top{\bf S}{\bf u}_{D}\\ \vdots & \ddots & \vdots\\ {\bf u}_{D}^\top{\bf S}{\bf u}_{M+1} & \cdots & {\bf u}_{D}^\top{\bf S}{\bf u}_{D}\\ \end{pmatrix}\tag{3} \end{eqnarray}$

$\widehat{\bf U}^\top{\bf S}\widehat{\bf U}$ の $i$ 行 $i$ 列は以下のようになります。

$\begin{eqnarray} (\widehat{\bf U}^\top{\bf S}\widehat{\bf U})_{ii}={\bf u}_{M+i}^\top{\bf S}{\bf u}_{M+i}\tag{4} \end{eqnarray}$

よって、 $\widehat{\bf U}^\top{\bf S}\widehat{\bf U}$ のトレースは以下のようになります。

$\begin{eqnarray} {\rm Tr}\left(\widehat{\bf U}^\top{\bf S}\widehat{\bf U}\right)=\sum_{i=M+1}^D{\bf u}_i^\top{\bf S}{\bf u}_i\tag{5} \end{eqnarray}$

${\bf H}\in\mathbb{R}^{(D-M)\times(D-M)}$ を以下のように定義します。

$\begin{eqnarray} &&{\bf H}=\begin{pmatrix} \lambda_{M+1} & \frac{1}{2}\mu_{M+1,M+2} & \cdots & \frac{1}{2}\mu_{M+1,D}\\ \frac{1}{2}\mu_{M+1,M+2} & \lambda_{M+2} & \cdots & \vdots\\ \vdots & \vdots & \ddots & \vdots\\ \frac{1}{2}\mu_{M+1,D} & \cdots & \cdots & \lambda_D \end{pmatrix}\tag{6} \end{eqnarray}$

${\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U})$ を計算します。

$\begin{eqnarray} &&{\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U})\\ &=&{\bf H}\begin{pmatrix} 1-{\bf u}_{M+1}^\top{\bf u}_{M+1} & -{\bf u}_{M+1}^\top{\bf u}_{M+2} & \cdots & -{\bf u}_{M+1}^\top{\bf u}_{D}\\ -{\bf u}_{M+2}^\top{\bf u}_{M+1} & 1-{\bf u}_{M+2}^\top{\bf u}_{M+2} & \cdots & \vdots\\ \vdots & \vdots & \ddots & \vdots\\ -{\bf u}_{D}^\top{\bf u}_{M+1} & \cdots & \cdots & 1-{\bf u}_{D}^\top{\bf u}_{D}\\ \end{pmatrix}\\ &=&\begin{pmatrix} \lambda_{M+1} & \frac{1}{2}\mu_{M+1,M+2} & \cdots & \frac{1}{2}\mu_{M+1,D}\\ \frac{1}{2}\mu_{M+1,M+2} & \lambda_{M+2} & \cdots & \vdots\\ \vdots & \vdots & \ddots & \vdots\\ \frac{1}{2}\mu_{M+1,D} & \cdots & \cdots & \lambda_M \end{pmatrix} \begin{pmatrix} 1-{\bf u}_{M+1}^\top{\bf u}_{M+1} & -{\bf u}_{M+1}^\top{\bf u}_{M+2} & \cdots & -{\bf u}_{M+1}^\top{\bf u}_{D}\\ -{\bf u}_{M+2}^\top{\bf u}_{M+1} & 1-{\bf u}_{M+2}^\top{\bf u}_{M+2} & \cdots & \vdots\\ \vdots & \vdots & \ddots & \vdots\\ -{\bf u}_{D}^\top{\bf u}_{M+1} & \cdots & \cdots & 1-{\bf u}_{D}^\top{\bf u}_{D}\\ \end{pmatrix}\tag{7} \end{eqnarray}$

一般の $M,D$ において、 ${\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))$ は以下のようになります。(ここの導出は下の方で補足しています。)

$\begin{eqnarray} {\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))=\sum_{i=M+1}^D\lambda_{i}(1-{\bf u}_{i}^\top{\bf u}_{i})+\sum_{i=M}^{D-1}\sum_{j=i+1}^D\mu_{ij}(-{\bf u}_j^\top{\bf u}_i)\tag{8} \end{eqnarray}$

$(5),(8)$ より、 $(1)$ は次のように書けます。

$\begin{eqnarray} \tilde{J}={\rm Tr}\left(\widehat{\bf U}^\top{\bf S}\widehat{\bf U}\right)+{\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))\tag{9} \end{eqnarray}$

$\tilde{J}$ を ${\bf U}$ で微分して、 $={\bf O}$ とおきます。

$\begin{eqnarray} &&\frac{\partial}{\partial{\bf U}}\tilde{J}={\bf O}\\ &&\Leftrightarrow\frac{\partial}{\partial{\bf U}}{\rm Tr}\left(\widehat{\bf U}^\top{\bf S}\widehat{\bf U}\right)+\frac{\partial}{\partial{\bf U}}{\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))={\bf O}\\ &&\Leftrightarrow\frac{\partial}{\partial{\bf U}}{\rm Tr}\left(\widehat{\bf U}^\top{\bf S}\widehat{\bf U}\right)+\frac{\partial}{\partial{\bf U}}{\rm Tr}({\bf H})-\frac{\partial}{\partial{\bf U}}{\rm Tr}({\bf H}\widehat{\bf U}^\top\widehat{\bf U})={\bf O}\\ &&\Leftrightarrow{\bf S}\widehat{\bf U}+{\bf S}^\top\widehat{\bf U}-\left(\widehat{\bf U}{\bf H}+\widehat{\bf U}{\bf H}^\top\right)={\bf O}\\ &&\Leftrightarrow{\bf S}\widehat{\bf U}=\widehat{\bf U}{\bf H}\tag{10} \end{eqnarray}$

ここで、 ${\bf H},\widehat{\bf U}$ が以下の形で表されると仮定すると

$\begin{eqnarray} &&{\bf H}=\begin{pmatrix}\lambda_{M+1} & & 0\\ & \ddots & \\ 0 & & \lambda_D\end{pmatrix}\tag{11}\\ &&\widehat{\bf U}=\begin{pmatrix}{\bf u}_{M+1} & \cdots & {\bf u}_D\end{pmatrix}\tag{12} \end{eqnarray}$

$(10)$ は成立します。

この時、 $\tilde{J}$ は以下のようになります。

$\begin{eqnarray} \tilde{J}&=&{\rm Tr}\left(\widehat{\bf U}^\top{\bf S}\widehat{\bf U}\right)+{\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))\\ &=&{\rm Tr}\left(\widehat{\bf U}^\top\widehat{\bf U}{\bf H}\right)+{\rm Tr}({\bf H})-{\rm Tr}({\bf H}\widehat{\bf U}^\top\widehat{\bf U})\\ &=&{\rm Tr}\left({\bf H}\widehat{\bf U}^\top\widehat{\bf U}\right)+{\rm Tr}({\bf H})-{\rm Tr}({\bf H}\widehat{\bf U}^\top\widehat{\bf U})\\ &=&{\rm Tr}({\bf H})\\ &=&\sum_{i=M+1}^D\lambda_i\tag{13} \end{eqnarray}$

したがって、 $\bf S$ の固有値のうち小さい方の $D-M$ 個に対応する固有ベクトルを選んだ時に歪み尺度 $J$ の最小値を得ることができます。
よって、主部分空間は $\bf S$ の固有値のうち大きい方の $M$ 個に対応する固有ベクトルを選ぶとよいことが分かります。

$(11),(12)$ を仮定しない場合の $(10)$ の解を ${\bf K},\widehat{\bf V}$ とします。

$\begin{eqnarray} {\bf S}\widehat{\bf V}=\widehat{\bf V}{\bf K}\tag{14} \end{eqnarray}$

$\bf K$ の固有値を $\nu_1\ldots,\nu_{D-M}$ 、固有ベクトルを $\boldsymbol\phi_1,\ldots,{\boldsymbol\phi}_{D-M}$ とします。
${\bf N},{\boldsymbol\Phi}$ を以下のようにおきます。

$\begin{eqnarray} &&{\bf N}=\begin{pmatrix}\nu_{1} & & 0\\ & \ddots & \\ 0 & & \nu_{D-M}\end{pmatrix}\tag{15}\\ &&{\boldsymbol\Phi}=\begin{pmatrix}{\boldsymbol\phi}_{1} & \cdots & {\boldsymbol\phi}_{D-M}\end{pmatrix}\tag{16} \end{eqnarray}$

このとき、

$\begin{eqnarray} {\bf K}{\boldsymbol\Phi}={\boldsymbol\Phi}{\bf N}\tag{17} \end{eqnarray}$

が成り立ちます。
$(23)$ の両辺に右から $\boldsymbol\Phi$ を掛けます。

$\begin{eqnarray} &&{\bf S}\widehat{\bf V}{\boldsymbol\Phi}=\widehat{\bf V}{\bf K}{\boldsymbol\Phi}\\ &&\Leftrightarrow{\bf S}\widehat{\bf V}{\boldsymbol\Phi}=\widehat{\bf V}{\boldsymbol\Phi}{\bf N}\\ &&\Leftrightarrow{\bf S}\tilde{\bf V}=\tilde{\bf V}{\bf N}\tag{18} \end{eqnarray}$

$(18)$ で $\tilde{\bf V}=\widehat{\bf V}{\boldsymbol\Phi}$ とおきました。
$(18)$ は $\nu_i$ が $\bf S$ の固有値であることを示しています。

この時、 $\tilde{J}$ は以下のようになります。

$\begin{eqnarray} \tilde{J}&=&{\rm Tr}\left(\widehat{\bf V}^\top{\bf S}\widehat{\bf V}\right)+{\rm Tr}({\bf K}({\bf I}-\widehat{\bf V}^\top\widehat{\bf V}))\\ &=&{\rm Tr}\left(\widehat{\bf V}^\top\widehat{\bf V}{\bf H}\right)+{\rm Tr}({\bf K})-{\rm Tr}({\bf K}\widehat{\bf V}^\top\widehat{\bf V})\\ &=&{\rm Tr}\left({\bf H}\widehat{\bf V}^\top\widehat{\bf V}\right)+{\rm Tr}({\bf K})-{\rm Tr}({\bf K}\widehat{\bf V}^\top\widehat{\bf V})\\ &=&{\rm Tr}({\bf K})\\ &=&\sum_{i=1}^{D-M}\nu_i\tag{19} \end{eqnarray}$

$(11),(12)$ を仮定してもしなくても、主部分空間は $\bf S$ の固有値のうち大きい方の $M$ 個に対応する固有ベクトルを選ぶとよいことが分かりました。

補足

$M=3,D=7$ として、 ${\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))$ を計算します。

$\begin{eqnarray} &&{\rm Tr}({\bf H}({\bf I}-\widehat{\bf U}^\top\widehat{\bf U}))\\ &=&{\rm Tr}\left(\begin{pmatrix} \lambda_{4} & \frac{1}{2}\mu_{4,5} & \frac{1}{2}\mu_{4,6} & \frac{1}{2}\mu_{4,7}\\ \frac{1}{2}\mu_{4,5} & \lambda_{5} & \frac{1}{2}\mu_{5,6} & \frac{1}{2}\mu_{5,7}\\ \frac{1}{2}\mu_{4,6} & \frac{1}{2}\mu_{5,6} & \lambda_6 & \frac{1}{2}\mu_{6,7}\\ \frac{1}{2}\mu_{4,7} & \frac{1}{2}\mu_{5,7} & \frac{1}{2}\mu_{4,5} & \lambda_7 \end{pmatrix} \begin{pmatrix} 1-{\bf u}_{4}^\top{\bf u}_{4} & -{\bf u}_{4}^\top{\bf u}_{5} & -{\bf u}_{4}^\top{\bf u}_{6} & -{\bf u}_{4}^\top{\bf u}_{7}\\ -{\bf u}_{5}^\top{\bf u}_{4} & 1-{\bf u}_{5}^\top{\bf u}_{5} & -{\bf u}_{5}^\top{\bf u}_{6} & -{\bf u}_{5}^\top{\bf u}_{7}\\ -{\bf u}_{6}^\top{\bf u}_{4} & {\bf u}_{6}^\top{\bf u}_{5} & 1-{\bf u}_{6}^\top{\bf u}_{6} & -{\bf u}_{6}^\top{\bf u}_{7}\\ -{\bf u}_{7}^\top{\bf u}_{4} & {\bf u}_{7}^\top{\bf u}_{5} & -{\bf u}_{7}^\top{\bf u}_{6} & 1-{\bf u}_{7}^\top{\bf u}_{7}\\ \end{pmatrix}\right)\\ &=&(1-{\bf u}_{4}^\top{\bf u}_{4})\lambda_{4}-\frac{1}{2}\mu_{4,5}{\bf u}_{5}^\top{\bf u}_{4}-\frac{1}{2}\mu_{4,6}{\bf u}_{6}^\top{\bf u}_{4}-\frac{1}{2}\mu_{4,7}{\bf u}_{7}^\top{\bf u}_{4}\\ &&-\frac{1}{2}\mu_{4,5}{\bf u}_{4}^\top{\bf u}_{5}+(1-{\bf u}_{5}^\top{\bf u}_{5})\lambda_{5}-\frac{1}{2}\mu_{5,6}{\bf u}_{6}^\top{\bf u}_{5}-\frac{1}{2}\mu_{5,7}{\bf u}_{7}^\top{\bf u}_{5}\\ &&-\frac{1}{2}\mu_{4,6}{\bf u}_{4}^\top{\bf u}_{6}-\frac{1}{2}\mu_{5,6}{\bf u}_{5}^\top{\bf u}_{6}+(1-{\bf u}_{6}^\top{\bf u}_{6})\lambda_{6}-\frac{1}{2}\mu_{6,7}{\bf u}_{7}^\top{\bf u}_{6}\\ &&-\frac{1}{2}\mu_{4,7}{\bf u}_{4}^\top{\bf u}_{7}-\frac{1}{2}\mu_{5,7}{\bf u}_{5}^\top{\bf u}_{7}-\frac{1}{2}\mu_{6,7}{\bf u}_{6}^\top{\bf u}_{7}+(1-{\bf u}_{7}^\top{\bf u}_{7})\lambda_{7}\\ &=&\lambda_{4}(1-{\bf u}_{4}^\top{\bf u}_{4})+\lambda_{5}(1-{\bf u}_{5}^\top{\bf u}_{5})+\lambda_{6}(1-{\bf u}_{6}^\top{\bf u}_{6})+\lambda_{7}(1-{\bf u}_{7}^\top{\bf u}_{7})\\ &&-\mu_{4,5}{\bf u}_{5}^\top{\bf u}_{4}-\mu_{4,6}{\bf u}_{6}^\top{\bf u}_{4}-\mu_{4,7}{\bf u}_{7}^\top{\bf u}_{4}-\mu_{5,6}{\bf u}_{6}^\top{\bf u}_{5}-\mu_{5,7}{\bf u}_{7}^\top{\bf u}_{5}-\mu_{6,7}{\bf u}_{7}^\top{\bf u}_{6}\\ &=&\sum_{i=4}^7\lambda_{i}(1-{\bf u}_{i}^\top{\bf u}_{i})+\sum_{i=4}^6\sum_{j=i+1}^7\mu_{ij}(-{\bf u}_j^\top{\bf u}_i)\tag{8} \end{eqnarray}$