In this section, we explain the proposed method. First, before introducing the optimization problem, we explain our reason for using the inner product of the difference between correlation matrices instead of the difference alone. The model of the proposed method is then introduced, and its optimization problem is presented. Finally, we explain the simulation design of our numerical study and the fMRI data from a mental arithmetic task used to demonstrate the proposed approach.

#### 2.1. Model of Proposed Method

Here, the model of the proposed method is explained. Let $\mathit{R}_{o}^{(1)}=\left(r_{ijo}^{(1)}\right),\; r_{ijo}^{(1)}\in[-1,1]\;(i,j=1,2,\cdots,p;\; o=1,2,\cdots,n)$ and $\mathit{R}_{o}^{(2)}=\left(r_{ijo}^{(2)}\right),\; r_{ijo}^{(2)}\in[-1,1]\;(i,j=1,2,\cdots,p;\; o=1,2,\cdots,n)$ be the correlation matrices between variables under condition 1 and condition 2, respectively, where $n$ is the number of subjects and $p$ is the number of variables. From $\mathit{R}_{o}^{(1)}$ and $\mathit{R}_{o}^{(2)}\;(o=1,2,\cdots,n)$, the difference is calculated as follows:

where $\mathit{D}={({\mathit{d}}_{1},{\mathit{d}}_{2},\cdots,{\mathit{d}}_{p})}^{T}=\left(d_{ij}\right),\; d_{ij}\in[-2,2]\;(i,j=1,2,\cdots,p)$. The clustering structure of $\mathit{D}$ is assumed to be of the following form:

where $\mathit{D}_{\ell}\in{[-2,2]}^{p_{\ell}\times p_{\ell}}\;(\ell=1,2,\cdots,k)$ is a block matrix whose elements tend to be larger in absolute value than those of $\mathit{O}_{\ell s}\;(\ell\ne s)$, each element of which is 0. That is, $\mathit{D}$ is assumed to be transformed into the form of Equation (1) through a permutation of its rows and columns.

However, even if $\mathit{D}$ has such a clustering structure, the structure is masked because the $d_{ij}$ values are observed with noise and the corresponding correlations are bounded within the range $-1$ to 1. Therefore, it is difficult to capture the clustering structure using a low-rank approximation based on $\mathit{D}$ alone. To overcome this problem, we focus on the inner product of $\mathit{D}$, rather than $\mathit{D}$ itself, because the clustering structure is emphasized in the inner product. Here, the inner product of $\mathit{D}$ is defined as follows:

where the elements of $\mathsf{\Delta}$ are defined as $\delta_{ij}\in\mathbb{R}\;(i,j=1,2,\cdots,p)$, and ${}^{T}$ indicates transposition. We focus on the inner product and construct the model based on a low-rank approximation of $\delta_{ij}$. First, from the property of the inner product, for an arbitrary $\delta_{ij}$ there exists $\theta_{ij}\;(0<\theta_{ij}\le 2\pi)$ such that

where ${\mathit{\delta}}_{j}={(\delta_{1j},\delta_{2j},\cdots,\delta_{pj})}^{T}\;(j=1,2,\cdots,p)$, and $\|\cdot\|$ is the Euclidean norm. Here, $\cos\theta_{ij}\in[-1,1]$ can be considered a correlation, and its matrix representation is

Based on Equations (3) and (4), Equation (2) is expressed as follows:

where $\mathit{d}=(\|{\mathit{d}}_{1}\|,\|{\mathit{d}}_{2}\|,\cdots,\|{\mathit{d}}_{p}\|)^{T}$, and $\odot$ is the Hadamard product.
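This factorization can be checked numerically. The sketch below assumes the inner product is $\mathsf{\Delta}=\mathit{D}^{T}\mathit{D}$ (so that $\delta_{ij}={\mathit{d}}_{i}^{T}{\mathit{d}}_{j}$); under that assumption, dividing each $\delta_{ij}$ by $\|{\mathit{d}}_{i}\|\,\|{\mathit{d}}_{j}\|$ recovers the bounded $\cos\theta_{ij}$ terms of Equation (3). The variable names are ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6

# A symmetric stand-in for the difference of two correlation matrices,
# with rows d_1, ..., d_p.
D = rng.uniform(-1.0, 1.0, size=(p, p))
D = (D + D.T) / 2.0

Delta = D.T @ D                   # assumed inner product: delta_ij = d_i^T d_j
d = np.linalg.norm(D, axis=1)     # d = (||d_1||, ..., ||d_p||)^T
Theta = Delta / np.outer(d, d)    # cos(theta_ij), bounded in [-1, 1]

# Delta factors exactly as (d d^T) ⊙ Theta, with cos(theta_ij) acting
# as a correlation (unit diagonal, entries in [-1, 1]).
assert np.allclose(Delta, np.outer(d, d) * Theta)
assert np.all(np.abs(Theta) <= 1.0 + 1e-12)
assert np.allclose(np.diag(Theta), 1.0)
```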

From Equation (5), we then construct the following model:

where $\mathit{X}=\left(x_{iq}\right),\; x_{iq}\in\mathbb{R}\;(i=1,2,\cdots,p;\; q=1,2,\cdots,d)$ is a coordinate matrix with rank $d\;(d\le p)$, $d$ is determined by the researchers, and $\mathit{E}=\left(e_{ij}\right),\; e_{ij}\in\mathbb{R}\;(i,j=1,2,\cdots,p)$ is an error matrix. In addition, the following constraint is added to $\mathit{X}$:

where ${\mathit{x}}_{i}={(x_{i1},x_{i2},\cdots,x_{id})}^{T}$. Under the constraint of Equation (7), ${\mathit{x}}_{i}^{T}{\mathit{x}}_{j}$ becomes the correlation coefficient between variables $i$ and $j$ with rank $d$ [20,21], and $\mathit{X}{\mathit{X}}^{T}$ becomes a correlation matrix with rank $d$. The reason why $\mathit{X}{\mathit{X}}^{T}$ becomes a correlation matrix is explained in [15].

To reiterate, the purpose of the proposed method is to estimate a low-rank matrix such that the clustering structure is emphasized. To achieve this, $\mathit{X}$ is estimated such that the sum of squares of $\mathit{E}$ in Equation (6) is minimized.
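To make the estimation criterion concrete, the following sketch evaluates the sum of squares of $\mathit{E}$ under the model of Equation (6), assuming the least-squares objective $\|\mathsf{\Delta}-(\mathit{d}\mathit{d}^{T})\odot\mathit{X}\mathit{X}^{T}\|_{F}^{2}$ together with the unit-norm constraint on the rows of $\mathit{X}$; the function name `loss` is ours.

```python
import numpy as np

def loss(X, Delta, d):
    """Sum of squares of E in the model Delta = (d d^T) ⊙ X X^T + E."""
    fitted = np.outer(d, d) * (X @ X.T)
    return np.sum((Delta - fitted) ** 2)

rng = np.random.default_rng(1)
p, rank = 8, 2

# Coordinate matrix X with unit-norm rows (the constraint on X).
X = rng.normal(size=(p, rank))
X /= np.linalg.norm(X, axis=1, keepdims=True)

D = rng.uniform(-1, 1, size=(p, p))
D = (D + D.T) / 2
Delta = D.T @ D
d = np.linalg.norm(D, axis=1)

# With unit-norm rows, X X^T has unit diagonal and entries in [-1, 1],
# i.e., it behaves like a rank-d correlation matrix.
G = X @ X.T
assert np.allclose(np.diag(G), 1.0)
assert np.all(np.abs(G) <= 1.0 + 1e-12)
assert loss(X, Delta, d) >= 0.0
```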

#### 2.3. Algorithm for Estimating Low-Rank Correlation Matrix Based on MM Algorithm

This subsection provides a detailed description of the MM algorithm for estimating $\mathit{X}$. First, we derive the majorizing function of Equation (8) in the same manner as in [24,25]. The updating formula for ${\mathit{x}}_{i}$ is then derived from the majorizing function: the majorizing function converts the problem into a linear one, so the updating formula can be obtained via the method of Lagrange multipliers under the constraint of Equation (9).

First, however, before the majorizing function is derived, the principle of the MM algorithm is explained; for more details on the MM algorithm, see [24,25]. Let $f(\theta)$ be an objective function with parameter $\theta$, and let $g(\theta\,|\,\theta^{(t)})$ be a real-valued function, where $\theta^{(t)}$ is a fixed value of $\theta$. When $g(\theta\,|\,\theta^{(t)})$ satisfies

$g(\theta\,|\,\theta^{(t)})$ is said to be a majorizing function. In general, deriving an updating formula for $g(\theta\,|\,\theta^{(t)})$ is expected to be easier than deriving one for $f(\theta)$. Therefore, in the proposed approach, the majorizing function of Equation (8) is derived in the same manner as in [24,25].
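The eigenvalue bound that underlies the majorization used below can be checked numerically: for a symmetric matrix $\mathit{B}$ with largest eigenvalue $\lambda$, $\mathit{x}^{T}\mathit{B}\mathit{x}\le\lambda$ for every unit vector $\mathit{x}$, so $\mathit{B}-\lambda\mathit{I}$ is negative semi-definite. A minimal NumPy check (the matrices here are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# A symmetric positive semi-definite matrix standing in for B_i.
A = rng.normal(size=(d, d))
B = A @ A.T
lam = np.linalg.eigvalsh(B).max()    # largest eigenvalue of B

# For any unit vector x, x^T B x <= lam ...
for _ in range(100):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    assert x @ B @ x <= lam + 1e-10

# ... equivalently, B - lam * I is negative semi-definite.
assert np.all(np.linalg.eigvalsh(B - lam * np.eye(d)) <= 1e-10)
```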

The objective function of Equation (8) can now be defined as follows:

The third term of Equation (10) corresponding to $i$ is expressed as follows:

where ${\mathit{B}}_{i}=\|{\mathit{d}}_{i}\|^{2}\sum_{j\ne i}\|{\mathit{d}}_{j}\|^{2}\,{\mathit{x}}_{j}{\mathit{x}}_{j}^{T}$. Let $\lambda_{i}$ be the maximum eigenvalue of ${\mathit{B}}_{i}$. For any $\mathit{x}\in\mathbb{R}^{d}$ with the constraint $\|\mathit{x}\|=1$, the following inequality is satisfied:

where ${\mathit{I}}_{d}$ is an identity matrix of size $d$. That is, ${\mathit{B}}_{i}-\lambda_{i}{\mathit{I}}_{d}$ is negative semi-definite. Let ${\mathit{x}}_{i}\in\mathbb{R}^{d}$ and ${\mathit{z}}_{i}\in\mathbb{R}^{d}$ be the coordinate vectors of variable $i$ corresponding to the current step and the previous step, respectively. Based on Equation (12), the following inequality is satisfied:

If ${\mathit{x}}_{i}={\mathit{z}}_{i}$, Equation (13) holds with equality. Substituting Equation (13) into the third term of Equation (10) results in

From Equation (14), the updating formula for ${\mathit{x}}_{i}$ is derived as follows:

For the algorithm based on Equation (15), see Algorithm 1.

**Algorithm 1** Estimating the correlation matrix with rank $d$

**Require:** Inner product $\mathsf{\Delta}$, rank $d$, initial vectors ${\mathit{z}}_{i}\;(i=1,2,\cdots,p)$, and $\epsilon>0$
**Ensure:** Coordinate matrix $\mathit{X}$ with rank $d$
1. $t\leftarrow 0$
2. **while** ${L}^{(t-1)}(\mathit{X}\,|\,\mathsf{\Delta})-{L}^{(t)}(\mathit{X}\,|\,\mathsf{\Delta})\ge\epsilon$ **do**
3. **for** $i=1,2,\cdots,p$ **do**
4. ${\mathit{x}}_{i}\leftarrow\left(\sum_{j\ne i}\delta_{ij}\|{\mathit{d}}_{i}\|\,\|{\mathit{d}}_{j}\|\,{\mathit{x}}_{j}-({\mathit{B}}_{i}-\lambda_{i}{\mathit{I}}_{d}){\mathit{z}}_{i}\right)\big/\left\|\sum_{j\ne i}\delta_{ij}\|{\mathit{d}}_{i}\|\,\|{\mathit{d}}_{j}\|\,{\mathit{x}}_{j}-({\mathit{B}}_{i}-\lambda_{i}{\mathit{I}}_{d}){\mathit{z}}_{i}\right\|$
5. ${\mathit{z}}_{i}\leftarrow{\mathit{x}}_{i}$
6. **end for**
7. $t\leftarrow t+1$
8. **end while**
9. **return** $\mathit{X}$

Finally, to detect the clustering structure of the variables, k-means is applied to $\mathit{X}$, whereby k clusters in the d-dimensional space are detected from ${\mathit{x}}_{i}\in\mathbb{R}^{d}\;(i=1,2,\cdots,p)$.

In short, the proposed method estimates the p by d matrix $\mathit{X}$, which is composed of the coordinates of the variables on d dimensions, from the full-rank inner product matrix $\mathsf{\Delta}$. Therefore, estimating $\mathit{X}$ from $\mathsf{\Delta}$ can be considered a dimension reduction. Furthermore, k-means is applied to $\mathit{X}$ such that the rows of $\mathit{X}$ are classified into k clusters.
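A minimal NumPy sketch of the updating step of Algorithm 1 is given below. It assumes the least-squares objective of Equation (6), runs a fixed number of sweeps instead of the $\epsilon$-based stopping rule, and the function name `estimate_low_rank` is ours. k-means (e.g., `sklearn.cluster.KMeans`) would then be applied to the rows of the returned matrix.

```python
import numpy as np

def estimate_low_rank(Delta, d_norms, rank, n_sweeps=50, seed=0):
    """MM sweeps for X under ||x_i|| = 1 (a sketch of Algorithm 1)."""
    rng = np.random.default_rng(seed)
    p = Delta.shape[0]
    X = rng.normal(size=(p, rank))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(n_sweeps):
        for i in range(p):
            mask = np.arange(p) != i
            w = d_norms[i] * d_norms[mask]           # ||d_i|| ||d_j||, j != i
            # B_i = ||d_i||^2 * sum_{j != i} ||d_j||^2 x_j x_j^T
            B = d_norms[i] ** 2 * (X[mask].T * d_norms[mask] ** 2) @ X[mask]
            lam = np.linalg.eigvalsh(B).max()        # largest eigenvalue of B_i
            # Update of Equation (15): linear term minus majorizer term, normalized.
            v = (Delta[i, mask] * w) @ X[mask] - (B - lam * np.eye(rank)) @ X[i]
            X[i] = v / np.linalg.norm(v)             # projection onto the unit sphere
    return X

rng = np.random.default_rng(3)
p = 10
D = rng.uniform(-1, 1, size=(p, p))
D = (D + D.T) / 2
Delta = D.T @ D
d_norms = np.linalg.norm(D, axis=1)

X = estimate_low_rank(Delta, d_norms, rank=2)
assert np.allclose(np.linalg.norm(X, axis=1), 1.0)   # constraint of Equation (7) holds
```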

#### 2.4. Simulation Study

In this subsection, the superiority of the proposed method is shown via the results of a numerical simulation. In particular, the recovery of the true clustering structure is evaluated in this simulation.

First, we describe the simulation design. To evaluate the clustering results, artificial data with a true clustering structure are generated, and correlation matrices between variables are calculated from the data. Dimension reduction clustering approaches are then applied to the difference between the two correlation matrices, and clustering results are obtained. In this numerical simulation, the true number of clusters is assumed to be known beforehand. Finally, the adjusted Rand index (ARI) [28] between the true and estimated clustering structures is calculated, and the effectiveness of the proposed method is compared with that of existing approaches. Here, the ARI is a similarity index between two clustering results: it takes the value 1 when two clustering results are completely equivalent, and it is close to 0 when they agree only at the level of chance.
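For reference, the ARI is invariant to a relabeling of the clusters, which is why it is suitable for comparing an estimated partition with the true one. A small example using `sklearn.metrics.adjusted_rand_score` (the label vectors are illustrative):

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# The identical partition up to a relabeling of clusters: ARI = 1.
assert adjusted_rand_score(true_labels, [1, 1, 1, 2, 2, 2, 0, 0, 0]) == 1.0

# A partition that mixes every true cluster agrees only at chance level,
# so the ARI drops to around 0 (here it is negative).
assert adjusted_rand_score(true_labels, [0, 1, 2, 0, 1, 2, 0, 1, 2]) < 0.0
```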

Next, we explain the method for generating the artificial data. Multivariate data representing condition 1 and condition 2 are generated as follows:

where $X$ and $Y$ are random vectors of conditions 1 and 2, respectively; ${\mathbf{0}}_{p}$ is a vector of length $p$ whose elements are all zero; and $\Sigma_{X}\in{[-1,1]}^{p\times p}$ and $\Sigma_{Y}\in{[-1,1]}^{p\times p}$ are the true correlation matrices of condition 1 and condition 2, respectively. In this simulation, $p$ is set to 120. From Equation (16), $p$-dimensional vectors are generated 60 times for each condition, and the sample correlation matrices of condition 1 and condition 2 are calculated as $\mathit{R}\left(X\right)\in{[-1,1]}^{p\times p}$ and $\mathit{R}\left(Y\right)\in{[-1,1]}^{p\times p}$, respectively. The input data are then calculated as $\mathit{D}=\mathit{R}\left(X\right)-\mathit{R}\left(Y\right)$.
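The data-generation step can be sketched as follows. The matrices `Sigma_X` and `Sigma_Y` below are simple stand-ins (the actual blockwise construction is specified under Factor 4); the rest follows the text: 60 draws per condition, sample correlations, and $\mathit{D}=\mathit{R}(X)-\mathit{R}(Y)$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n_obs = 120, 60

# Illustrative true correlation matrices (stand-ins for the Factor 4 designs).
Sigma_X = np.full((p, p), 0.1)
np.fill_diagonal(Sigma_X, 1.0)
Sigma_Y = np.eye(p)

# Equation (16): zero-mean multivariate normal draws, 60 per condition.
X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=n_obs)
Y = rng.multivariate_normal(np.zeros(p), Sigma_Y, size=n_obs)

R_X = np.corrcoef(X, rowvar=False)   # sample correlation, condition 1
R_Y = np.corrcoef(Y, rowvar=False)   # sample correlation, condition 2
D = R_X - R_Y                        # input difference matrix

assert D.shape == (p, p)
assert np.all(np.abs(D) <= 2.0)      # differences of correlations lie in [-2, 2]
assert np.allclose(np.diag(D), 0.0)  # both diagonals equal 1
```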

In this simulation, four factors are utilized; a summary of the simulation is shown in Table 1. As a result, there are $4\times 3\times 3\times 2=72$ patterns in this simulation. For each pattern, the corresponding artificial data are generated 100 times and evaluated using the ARI. The factors are described as follows.

**Factor 1: Methods**

For this factor, we evaluate four methods. The purpose of this factor is to evaluate the effects of using the inner product and of estimating a bounded low-rank matrix. Here, the proposed method is referred to as method 1. Method 2 is designed to be similar to the proposed approach, except that it is based only on the difference and not on the inner product; a comparison between the proposed method and method 2 therefore evaluates the effect of using the inner product. In method 3, the low-rank matrix is estimated from the difference alone via eigenvalue decomposition, and the estimated values are unconstrained; a comparison between the proposed method and method 3 therefore evaluates the combined effect of the inner product and the bounded constraint. Finally, in method 4, the low-rank matrix is estimated from the inner product, but the estimated values are again unconstrained; a comparison between the proposed method and method 4 therefore evaluates the effect of the bounded constraint.

These four methods are now explained from the perspective of calculation. The first method is the proposed approach based on the inner product of $\mathit{D}$. The second method adopts an approach based on the difference, not on the inner product model, and consists of two steps, as does the proposed method. First, $\mathit{D}$ is decomposed as $\mathit{D}={\mathit{B}}^{T}\mathit{B}$ using Cholesky decomposition, where $\mathit{B}=({\mathit{b}}_{1},{\mathit{b}}_{2},\cdots,{\mathit{b}}_{p}),\; {\mathit{b}}_{i}\in{\mathbb{R}}^{q}\;(i=1,2,\cdots,p;\; q\le p)$. Let ${\mathit{b}}^{\dagger}=(\|{\mathit{b}}_{1}\|,\|{\mathit{b}}_{2}\|,\cdots,\|{\mathit{b}}_{p}\|)^{T}$; then there exists $\cos\varphi_{ij}$ such that $d_{ij}=\|{\mathit{b}}_{i}\|\,\|{\mathit{b}}_{j}\|\cos\varphi_{ij}\;(0\le\varphi_{ij}<2\pi)$. From this decomposition, the optimization problem of the second method is formulated as follows:

where ${\mathit{X}}_{2}=({\mathit{x}}_{1}^{(2)},{\mathit{x}}_{2}^{(2)},\cdots,{\mathit{x}}_{p}^{(2)}),\; {\mathit{x}}_{i}^{(2)}\in{\mathbb{R}}^{d}$. The parameters of Equation (17) can be estimated in the same manner as in the proposed method. Subsequently, k-means is applied to the estimated ${\mathit{X}}_{2}$.

The third method also consists of two steps. Eigenvalue decomposition is applied to $\mathit{D}$, and the low-rank matrix corresponding to $\lambda_{1},\lambda_{2},\cdots,\lambda_{d}$ is estimated, where $\lambda_{1}\ge\lambda_{2}\ge\cdots\ge\lambda_{q}\;(q\le p)$ are the eigenvalues of $\mathit{D}$. Afterward, k-means is applied to the estimated low-rank matrix. The fourth method is similar to the third, the only difference being that the eigenvalue decomposition is applied to the inner product matrix of $\mathit{D}$ rather than to $\mathit{D}$ itself.
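The shared low-rank step of methods 3 and 4 can be sketched as follows, assuming the usual construction in which the leading eigenvectors are scaled by the square roots of the corresponding eigenvalues; the function name `truncated_eig` is ours. k-means would then be applied to the rows of the result.

```python
import numpy as np

def truncated_eig(M, rank):
    """Low-rank coordinates from the top-`rank` eigenpairs of a symmetric M."""
    vals, vecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:rank]           # take the largest `rank` of them
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

rng = np.random.default_rng(5)
p, rank = 12, 3
D = rng.uniform(-1, 1, size=(p, p))
D = (D + D.T) / 2

X3 = truncated_eig(D, rank)         # method 3: decompose D itself
X4 = truncated_eig(D.T @ D, rank)   # method 4: decompose the inner product of D

# Unlike the proposed method, these coordinates carry no bounded constraint,
# so their entries are not interpretable as correlations.
assert X3.shape == (p, rank)
assert X4.shape == (p, rank)
```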

Methods 2, 3, and 4 are referred to as tandem approaches [29], each consisting of two steps. First, an existing dimension reduction method is applied to the data, and a low-rank data matrix is estimated. Second, a typical clustering algorithm, such as k-means, is applied to the estimated low-rank data matrix. In practical situations, these approaches are sometimes used.

Both the third and fourth methods provide us with low-rank matrices and clustering results. However, from these results, it is difficult to interpret the degree of the relation because these estimated values are not bounded. On the other hand, both the first and second methods allow us to interpret the results easily because these estimated values are bounded within the range $-1$ to 1.

**Factor 2: Rank**

For the four methods mentioned above, the rank must be determined. In this simulation, the rank is set to 2, 3, and 4.

**Factor 3: Number of clusters**

All four methods mentioned under Factor 1 adopt k-means clustering. In the simulation, we assume that the number of true clusters is known beforehand. The number of clusters is set to $k=2,\;3$, and 4.

**Factor 4: Difference between true correlations**

First, the generation of the true clustering structures is determined by $k=2,\;3$, and 4. The true clustering structure depends on $\Sigma_{X}$ and $\Sigma_{Y}$. When $k=2$, $\Sigma_{X}$ and $\Sigma_{Y}$ are

where ${\mathit{O}}_{60}\in{\{0\}}^{60\times 60}$, ${\mathit{I}}_{60}$ is a $60\times 60$ identity matrix, $\Sigma_{1}=\left(\sigma_{ij}^{(1)}\right)\;(i,j=1,2,\cdots,60)$, and $\sigma_{ij}^{(1)}=1$ if $i=j$. To reiterate, there are two levels for simulation factor 4: the first and second levels are $\sigma_{ij}^{(1)}=0.1\;(i\ne j)$ and $\sigma_{ij}^{(1)}=0.2\;(i\ne j)$, respectively. When $\sigma_{ij}^{(1)}=0.1\;(i\ne j)$, the corresponding difference matrix is more strongly affected by observational error than when $\sigma_{ij}^{(1)}=0.2\;(i\ne j)$. With this factor, the degree of the effect of observational error is evaluated.
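The within-cluster block $\Sigma_{1}$ is fully specified by the text (unit diagonal, constant off-diagonal of 0.1 or 0.2) and can be constructed directly. The final lines show only one plausible blockwise layout of $\Sigma_{1}$, ${\mathit{O}}_{60}$, and ${\mathit{I}}_{60}$ for $k=2$; the exact arrangement is given by the displayed equations.

```python
import numpy as np

def make_block(size, off_diag):
    """Within-cluster block: ones on the diagonal, `off_diag` elsewhere."""
    S = np.full((size, size), off_diag)
    np.fill_diagonal(S, 1.0)
    return S

Sigma_1 = make_block(60, 0.1)    # level 1 of factor 4; level 2 uses 0.2

assert Sigma_1.shape == (60, 60)
assert np.all(np.diag(Sigma_1) == 1.0)
assert Sigma_1[0, 1] == 0.1

# One plausible k = 2 layout (assumed, for illustration only):
O_60 = np.zeros((60, 60))
Sigma_X = np.block([[Sigma_1, O_60],
                    [O_60, np.eye(60)]])
assert Sigma_X.shape == (120, 120)
```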

For $k=3$,

where $\Sigma_{2}\in{\{0,\;0.1,\;0.2,\;1\}}^{40\times 40}$ is defined in the same manner as $\Sigma_{1}$. Finally, for $k=4$,

where $\Sigma_{3}\in{\{0,\;0.1,\;0.2,\;1\}}^{30\times 30}$ is defined in the same manner as both $\Sigma_{1}$ and $\Sigma_{2}$.

With the aforementioned $\Sigma_{X}$ and $\Sigma_{Y}$, a true clustering structure is described. For $k=2,\;3$, and 4, all clusters are of equal size; that is, the cluster sizes are 60, 40, and 30, respectively.