(A) 特異値分解を用いた方法¶

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D

(A-1) データの読み込み、偏差行列の作成¶

サンプルデータ：data/sample_data.csv を読み込む（30人の国語、数学、英語の点数）。
以下のような偏差行列($X$)を作成する（バーは各科目の平均）。

$$ \begin{align} X = \left[\begin{array}{cc} x_{11} - \overline x_1&x_{12} - \overline x_2 \\ \vdots&\vdots \\ x_{n1} - \overline x_1&x_{n2} - \overline x_2 \\ \end{array}\right] \end{align} $$

# データの読み込み
file = "./sample_data.csv"
# --> file = "data/dd/sample_data.csv"に変更のこと

X_org = pd.read_csv(file,encoding='shift_jis',header=0,index_col=None)# headerは一行目、index列はなし、の意味
print(X_org.head())

# 偏差行列にする（データは三列あるが、まずは簡単のため最初の二列の数値だけ取り出す)
columns = X_org.columns
data = X_org.values[:,0:2] # .valuesで数値を取り出す（ラベルは剥がす）

mean = data.mean(axis=0) #　平均
anom = data - mean #　平均偏差行列
m,n = anom.shape # 配列の大きさ

   国語  数学  英語 (金谷(2003)表6.2より)
0  49  51                   59
1  58  58                   63
2  64  56                   68
3  65  70                   77
4  54  45                   55

(A-2) 特異値分解¶

$m \times n$ $(m >n)$の偏差行列$X$について：

$$ \begin{align} X &= \sum_{k=1}^n \sigma_k \boldsymbol{u}_k \boldsymbol{v}_k^T \\ & = UDV^T \end{align} $$$$ U = (\boldsymbol{u}_1,...,\boldsymbol{u}_m), \ D = \left[\begin{array}{ccc} \sigma_1&\cdots&O \\ \vdots&\ddots&\vdots \\ O&\cdots&\sigma_n \\ &O& \\ \end{array}\right], \ V^T = \left( \begin{array}{ccc} \boldsymbol{v}_1 \\ \vdots \\ \boldsymbol{v}_n \\ \end{array} \right) $$

### 主成分ベクトル（EOF$k$）: $v_k$

### 主成分得点（PC$k$）: $\sigma_k \boldsymbol{u}_k$

　　元の偏差行列$X$の$v_k$への射影と考えれば良い：

　　$$ 　　X\boldsymbol{v}_k = \sigma_k \boldsymbol{u}_k 　　$$

特異値分解にはNumpyの関数numpy.linalg.svdを用いる。

[使い方] U, s, Vh = np.linalg.svd(a)
- a: （入力）行列
- U: （出力）直交行列。$k$番目の左特異ベクトルは$U[:,k]$
- s: （出力）特異値。降順にソートされて出力。
- Vh:（出力）直交行列。$k$番目の右特異ベクトルは$Vh[k,:]$

- 参考: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.svd.html

# 特異値分解(Numpyの関数を使う)
U, s, Vh = np.linalg.svd(anom)

# 描画
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# -------（左パネル）------------
## x,y軸をそれぞれ国語、算数の得点として、(1) それぞれの得点をプロット ＋ (2) 固有ベクトルを重ね書き

## (1) オリジナルデータを描画
ax.scatter(anom[:,0],anom[:,1])

## (2) 主成分ベクトル(EOF1, EOF2)を描画
colors = ["red","blue"]
labels = ["EOF1","EOF2"]
for i in range(2):
    
    #　主成分ベクトル．ベクトルの大きさは，主成分得点の標準偏差とする（Vhは大きさ１に規格化されていることに注意）
    x2 = s[i]*Vh[i,0]/m**0.5; y2 = s[i]*Vh[i,1]/m**0.5
    print('EOF',i,' = ', [x2, y2])
    
    ax.quiver([0],[0],[x2],[y2],color=colors[i],label ="EOF"+str(i),angles='xy',scale_units='xy',scale=1)
    
ax.legend()
ax.set_xlabel("Japanese"); ax.set_ylabel("Math")
ax.set_xlim(-20,20); ax.set_ylim(-20,20)

# -----（左パネル、ここから）-----


# -------(ここから　右パネル）-------
## x,y軸をそれぞれEOF1, EOF2として（基底を取り直して）、それぞれの主成分得点をプロット

pc1 = s[0]*U[:,0]
pc2 = s[1]*U[:,1]

#元の偏差行列の主成分ベクトルへの射影を計算しても良い
#V = Vh.T#V(=Vhの転置)
#eof1 = anom @ V[:,0]# "@" は行列の積
#eof2 = anom @ V[:,1]

ax2.scatter(pc1,pc2)
ax2.set_xlabel('PC1'); ax2.set_ylabel('PC2')
ax2.set_xlim(-20,20); ax2.set_ylim(-20,20)

# ------（右パネル、ここまで）------

plt.show()

EOF 0  =  [6.582483921460372, 7.416763851505768]
EOF 1  =  [-4.137607317351972, 3.6721856304291904]

（課題）三次元で同様の処理をして結果を描画せよ。¶

三次元領域でプロットするには、サブプロット領域の作成時に3Dオプションをつけるだけで良い：
- ax = fig.add_subplot(121,projection='3d')

%matplotlib notebookと最初に書いておけば（インターラクティブなグラフになり）マウスで視点を動かすことができる-> 動かないので、%matplotlib inlineに変更（インターアクティブにはならないが）

#%matplotlib notebook
%matplotlib inline

data = X_org.values # .valuesで数値を取り出す（ラベルは剥がす）

mean = data.mean(axis=0) #　平均
anom = data - mean #　平均偏差行列
m,n = anom.shape

# 特異値分解(Numpyの関数を使う)
U, s, Vh = np.linalg.svd(anom)

# 描画
fig = plt.figure(figsize=(10,5))

ax = fig.add_subplot(121,projection='3d')
ax2 = fig.add_subplot(122,projection='3d')

# -------（左パネル）------------
## x,y軸をそれぞれ国語、算数の得点として、(1) それぞれの得点をプロット ＋ (2) 固有ベクトルを重ね書き

## (1) オリジナルデータを描画
ax.scatter(anom[:,0],anom[:,1],anom[:,2])

## (2) 主成分ベクトル(EOF1, EOF2)を描画
colors = ["red","blue","green"]
labels = ["EOF1","EOF2","EOF3"]
for i in range(3):
    
    #　主成分ベクトル．ベクトルの大きさは，主成分得点の標準偏差とする
    x2 = s[i]*Vh[i,0]/m**0.5; y2 = s[i]*Vh[i,1]/m**0.5; z2 = s[i]*Vh[i,2]/m**0.5
    print('EOF',i,' = ', [x2, y2, z2])
    
    ax.quiver([0],[0],[0],[x2],[y2],[z2],color=colors[i],label ="EOF"+str(i))#,angles='xyz',scale_units='xyz',scale=1)
    
ax.legend()
ax.set_xlabel("Japanese"); ax.set_ylabel("Math");  ax.set_zlabel("English")
ax.set_xlim(-20,20); ax.set_ylim(-20,20); ax.set_zlim(-20,20)

# -----（左パネル、ここから）-----


# -------(ここから　右パネル）-------
## x,y軸をそれぞれEOF1, EOF2として（基底を取り直して）、それぞれの主成分得点をプロット

pc1 = s[0]*U[:,0]
pc2 = s[1]*U[:,1]

ax2.scatter(pc1,pc2)
ax2.set_xlabel('PC1'); ax2.set_ylabel('PC2')
ax2.set_xlim(-20,20); ax2.set_ylim(-20,20)

# ------（右パネル、ここまで）------

plt.show()

EOF 0  =  [6.577725325210972, 7.336368286043271, 7.158742315321274]
EOF 1  =  [4.072183626793399, -3.7452687775358355, 0.096520545218501]
EOF 2  =  [0.7744281403996967, 0.8024932952287641, -1.5339792199307516]

(B) 共分散行列から求める方法¶

(1) 共分散行列を作る: $$ \begin{align} S_x = \frac{1}{m}\tilde X^T \tilde X \end{align} $$

(2) 固有値問題を解くことで、主成分ベクトル($\boldsymbol v_k$)を求める:

$$ \begin{align} S_x \boldsymbol v_k = \lambda \boldsymbol v_k \end{align} $$

(3) 偏差行列を$v_k$に射影することで$\sigma_k \boldsymbol u_k$を求める。

（課題）上記の手順により、（A）と同様の結果が得られることを確認せよ。¶

なお、固有値分解にはNumpyの関数numpy.linalg.eigを用いる。

[使い方] w, v = np.linalg.eig(a)
- a: （入力）行列
- w: （出力）固有値。ソートされていない。
- v:（出力）直交行列。$w[k]$に対応する固有ベクトルは$v[:,k]$

- 参考: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eig.html

(B-1) 共分散行列の作成¶

data = X_org.values[:,0:2] # .valuesで数値を取り出す（ラベルは剥がす）
mean = data.mean(axis=0) #　平均
anom = data - mean #　平均偏差行列
m,n = anom.shape


# 共分散行列
Cov = 1/m*np.dot(anom.T,anom)

print(Cov)

[[60.44888889 33.62666667]
 [33.62666667 68.49333333]]

(B-2) 固有値分解¶

# 固有値を求める
w, v = np.linalg.eig(Cov)
#固有値iのi番目の固有ベクトルはv[:,i]
print("Singular values", (w*m)**0.5)
print("Eigenvectors:", v[:,0],v[:,1])

coeff=np.dot(anom[:,:],v[:,0].T)/w[0]**0.5

Singular values [30.30086217 54.31504781]
Eigenvectors: [-0.7479196   0.66378933] [-0.66378933 -0.7479196 ]

var = np.sum( anom.var(axis=0) )
print("Variance in original coorinate: ", var)
print("Variance in EOF1-2 coordinate: ", np.sum(w))

l = w.shape

fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# 元データ
ax.scatter(anom[:,0],anom[:,1])

# 新しい軸
x0 = [0,0]; y0 = [0,0]

c = ["red","blue"]
labels = ["EOF1","EOF2"]
for i in range(len(w)):
    idx = np.argsort(w)[::-1][i] #降順に並べ直す
    print("index", idx)
    ll = w[idx]; print(w[idx])
    x2 = ll**0.5*v[0,idx]; y2 = ll**0.5*v[1,idx]
    #ax.plot([0,x2],[0,y2],color=c[i],label=labels[i])
    ax.quiver([0],[0],[x2],[y2],color=colors[i],label ="EOF"+str(i),angles='xy',scale_units='xy',scale=1)
#ax.quiver(x0,y0,x1,x2,c,angles='xy')

ax.set_title('Score of Japanese & Math')

ax.set_xlabel("Japanese"); ax.set_ylabel("Math")
ax.set_xlim(-20,20); ax.set_ylim(-20,20)


# -------(ここから　右パネル）-------
## x,y軸をそれぞれEOF1, EOF2として（基底を取り直して）、それぞれの主成分得点をプロット
#元の偏差行列の主成分ベクトルへの射影

pc1 = anom @ v[:,1]# "@" は行列の積, 固有値の順番を考慮する（2番目が第一モード）
pc2 = anom @ v[:,0]

ax2.scatter(pc1,pc2)
ax2.set_xlabel('PC1'); ax2.set_ylabel('PC2')
ax2.set_xlim(-20,20); ax2.set_ylim(-20,20)


ax.legend() #凡例をかく
plt.show()

Variance in original coorinate:  128.94222222222223
Variance in EOF1-2 coordinate:  128.94222222222223
index 1
98.33748060528703
index 0
30.6047416169352

(A)と(B)を比べて分かるように、固有ベクトルに符号の不定性はある。

（課題）EOF解析によって風の主軸を求める¶

# データの読み込み
file = "data/Maizuru_1hr/Maizuru_2017_uv.csv"
# -->適宜変更

X_org = pd.read_csv(file,encoding='shift_jis',index_col=[0],parse_dates=[0])
print(X_org.head())

# 偏差行列にする（データは三列あるが、まずは簡単のため最初の二列の数値だけ取り出す)
columns = X_org.columns
data = X_org.values[:,0:2] # .valuesで数値を取り出す（ラベルは剥がす）

                        zonal wind (m/s)  meridional wind (m/s)
hours since 2017/01/01                                         
1                               0.565685           5.656854e-01
2                               1.293431           5.357568e-01
3                              -0.739104           3.061467e-01
4                              -0.803635          -1.940147e+00
5                               1.100000           2.020667e-16

<ipython-input-8-a5f9a9992a35>:5: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  X_org = pd.read_csv(file,encoding='shift_jis',index_col=[0],parse_dates=[0])

mean = data.mean(axis=0) #　平均
anom = data - mean #　平均偏差行列
m,n = anom.shape

# 特異値分解(Numpyの関数を使う)
U, s, Vh = np.linalg.svd(anom)

# 描画
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# -------（左パネル）------------
## x,y軸をそれぞれ国語、算数の得点として、(1) それぞれの得点をプロット ＋ (2) 固有ベクトルを重ね書き

## (1) オリジナルデータを描画
ax.scatter(anom[:,0],anom[:,1])

print(anom[:,0].std(),anom[:,1].std())

## (2) 主成分ベクトル(EOF1, EOF2)を描画
colors = ["red","blue"]
labels = ["EOF1","EOF2"]
for i in range(2):
    
    #　主成分ベクトル．ベクトルの大きさは，主成分得点の標準偏差とする
    x2 = s[i]*Vh[i,0]/m**0.5; y2 = s[i]*Vh[i,1]/m**0.5
    print('EOF',i,' = ', [x2, y2])
    
    ax.quiver([0],[0],[x2],[y2],color=colors[i],label ="EOF"+str(i),angles='xy',scale_units='xy',scale=1)
    
ax.legend()
ax.set_xlabel("U-wind"); ax.set_ylabel("V-wind")
ax.set_xlim(-20,20); ax.set_ylim(-20,20)

# -----（左パネル、ここから）-----


# -------(ここから　右パネル）-------
## x,y軸をそれぞれEOF1, EOF2として（基底を取り直して）、それぞれの主成分得点をプロット

pc1 = s[0]*U[:,0]
pc2 = s[1]*U[:,1]

#元の偏差行列の主成分ベクトルへの射影を計算しても良い
#V = Vh.T#V(=Vhの転置)
#eof1 = anom @ V[:,0]# "@" は行列の積
#eof2 = anom @ V[:,1]

ax2.scatter(pc1,pc2,s=3)
ax2.set_xlabel('PC1'); ax2.set_ylabel('PC2')
ax2.set_xlim(-20,20); ax2.set_ylim(-20,20)

# ------（右パネル、ここまで）------

plt.show()

1.7929952618734795 2.3398401897119125
EOF 0  =  [-1.2697194370151959, -2.225582824897192]
EOF 1  =  [-1.265955986741467, 0.7222417897862975]