[High Dimensional Statistics] Maximum Mean Discrepancy (MMD)


References

  • Yonsei Univ. STA9104: Statistical Theory for High Dimensional and Big Data
  • Gretton et al., A Kernel Two-Sample Test, JMLR, 2012.

1. Introduction

1.1. Two-sample test problem setting

$X_1, \cdots, X_m \overset{iid}{\sim} p$ and $Y_1, \cdots, Y_n \overset{iid}{\sim} q$, with the two samples mutually independent.


Hypothesis: $H_0 : p=q \quad \text{vs.} \quad H_1 : p \neq q$


  • Classical methods for the two-sample problem

    There are many classical approaches to the two-sample problem, including the following (a short illustration follows these lists):

    1. The t-test, for 1-dimensional data
    2. Hotelling's $T^2$ test, which generalizes the t-test to $\mathbb{R}^d$
      • However, it only applies when $n > d$, and it is only sensitive to mean differences
    3. Tests based on the empirical CDF, e.g. the Kolmogorov-Smirnov (K-S) test and the Cramér-von Mises (CvM) test
      • Sensitive to any difference in distribution
      • However, hard to generalize to $\mathbb{R}^d$


  • MMD resolves many of these problems:
    1. It works well for multivariate data
    2. It is sensitive to general alternatives
    3. It is easy to estimate
    4. It is an example of an IPM (Integral Probability Metric)

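To make the contrast concrete, here is a minimal sketch of two of the classical univariate tests above, applied to hypothetical toy data where the two distributions differ in scale but not in mean (the sample sizes, distributions, and seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)  # sample from p
y = rng.normal(loc=0.0, scale=1.5, size=200)  # sample from q: same mean, larger scale

# t-test compares means only, so it has little power against this alternative
t_stat, t_pval = stats.ttest_ind(x, y, equal_var=False)

# K-S test compares empirical CDFs, so it can detect the scale difference
ks_stat, ks_pval = stats.ks_2samp(x, y)

print(f"t-test p-value:   {t_pval:.3f}")  # typically large (fails to reject)
print(f"K-S test p-value: {ks_pval:.3f}")  # typically small (rejects)
```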

1.2. Integral Probability Metrics (IPM)

Definition (Integral Probability Metric). For a class of functions $\mathcal{F}$, the IPM between $p$ and $q$ is defined as

\[IPM(p,q) = \underset{f \in \mathcal{F}}{\sup} \bigg \vert \mathbb{E}_p [f(X)] - \mathbb{E}_q [f(X)] \bigg\vert\]

Examples include the total variation distance, the Wasserstein distance, and the Kolmogorov distance.
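Each of these arises from a particular choice of $\mathcal{F}$ (standard facts, stated here for orientation):

\[\begin{align} \mathcal{F} = \{f : \Vert f \Vert_\infty \leq 1\} \quad &\Longrightarrow \quad \text{total variation distance (up to a convention-dependent constant)} \\ \mathcal{F} = \{f : f \text{ is 1-Lipschitz}\} \quad &\Longrightarrow \quad \text{Wasserstein-1 distance (Kantorovich duality)} \\ \mathcal{F} = \{ \mathbb{1}_{(-\infty, t]} : t \in \mathbb{R}\} \quad &\Longrightarrow \quad \text{Kolmogorov distance} \end{align}\]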

2. Maximum Mean Discrepancy (MMD)

  • If $\mathcal{F}$ is too large, it is difficult to estimate $IPM(p,q)$. Conversely, if $\mathcal{F}$ is too small, $IPM(p,q)$ can equal $0$ even when $p \neq q$.

  • MMD takes a middle ground between these two extremes:

    it takes $\mathcal{F}$ to be the unit ball of an RKHS $\mathcal{H}$


2.1. Reproducing Kernel Hilbert Space

Definition (Reproducing kernel Hilbert space)

An RKHS is a Hilbert space $\mathcal{H}$ equipped with an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ that admits a reproducing kernel:

\[\begin{align} \mathcal{k} : \mathcal{X} \times \mathcal{X} &\longrightarrow \mathbb{R} \quad \text{such that} \\ \forall x \in \mathcal{X} ,\quad \forall f \in \mathcal{H}, \quad &\underbrace{\langle f, \mathcal{k}(\cdot, x)\rangle_{\mathcal{H}} = f(x)}_{\text{Reproducing property}} \end{align}\]
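As a concrete sanity check, here is a minimal sketch using a quadratic kernel on $\mathbb{R}^2$, whose RKHS is finite-dimensional so the feature map $\varphi$ (with $\mathcal{k}(x,y) = \langle \varphi(x), \varphi(y)\rangle$) can be written out explicitly; the specific kernel, points, and coefficients are illustrative assumptions:

```python
import numpy as np

def k(x, y):
    """Quadratic kernel k(x, y) = (x . y)^2 on R^2."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map satisfying <phi(x), phi(y)> = k(x, y)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

rng = np.random.default_rng(0)
x = rng.normal(size=2)
z = rng.normal(size=(3, 2))
alpha = rng.normal(size=3)

# An element of the RKHS: f = sum_i alpha_i k(., z_i), represented by the vector w
w = sum(a * phi(zi) for a, zi in zip(alpha, z))

f_x_direct = sum(a * k(zi, x) for a, zi in zip(alpha, z))  # evaluate f at x directly
f_x_inner = np.dot(w, phi(x))                              # <f, k(., x)>_H

print(np.isclose(k(x, z[0]), np.dot(phi(x), phi(z[0]))))   # kernel = inner product of features
print(np.isclose(f_x_direct, f_x_inner))                   # reproducing property holds
```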

2.2. Definition of MMD

Definition (Maximum Mean Discrepancy)

Let $\mathcal{F}$ be the unit ball in an RKHS $\mathcal{H}$, i.e., $\forall f \in \mathcal{F}$, $\Vert f \Vert_{\mathcal{H}} = \sqrt{\langle f,f \rangle_\mathcal{H}} \leq 1$. Then,

\[MMD(p,q) = \underset{f\in\mathcal{F}}{\sup} \bigg\vert \mathbb{E}_p [f(X)] - \mathbb{E}_q [f(X)] \bigg\vert\]

That is, MMD is an IPM with $\mathcal{F}$ taken to be the unit ball of an RKHS.


2.3. Properties of MMD

  • Property 1

    If the RKHS is associated with a "characteristic kernel", such as the Gaussian kernel, then

\[MMD(p,q) = 0 \quad \text{if and only if} \quad p = q\]

​    (For contrast, the linear kernel $\mathcal{k}(x,y) = x^\top y$ is not characteristic: with it, MMD only compares the means of $p$ and $q$.)

  • Property 2

\[\begin{align} MMD^2(p,q) &= \bigg\lbrack \underset{f\in\mathcal{F}}{\sup} \bigg\vert \mathbb{E}_p [f(X)] - \mathbb{E}_q [f(Y)] \bigg\vert \bigg\rbrack^2\\ &= \bigg\lbrack \underset{f\in\mathcal{F}}{\sup} \bigg\vert \big\langle f,\ \mathbb{E}_p [\mathcal{k} (\cdot, X) ] - \mathbb{E}_q [\mathcal{k} (\cdot, Y) ] \big\rangle_\mathcal{H} \bigg\vert \bigg\rbrack^2 &&\longleftarrow \text{by the reproducing property} \\ & \leq \Vert f \Vert _{\mathcal{H}}^2 \,\, \big\Vert \mathbb{E}_p [\mathcal{k} (\cdot, X) ] - \mathbb{E}_q [\mathcal{k} (\cdot, Y) ] \big\Vert_\mathcal{H}^2 &&\longleftarrow \text{Cauchy-Schwarz} \\ & \leq \big\langle \mathbb{E}_p [\mathcal{k} (\cdot, X) ] - \mathbb{E}_q [\mathcal{k} (\cdot, Y)] ,\ \mathbb{E}_p [\mathcal{k} (\cdot, X) ] - \mathbb{E}_q [\mathcal{k} (\cdot, Y)] \big\rangle_\mathcal{H} &&\longleftarrow \Vert f \Vert_\mathcal{H} \leq 1 \\ &= \big\langle \mathbb{E}_p [\mathcal{k}(\cdot, X)],\, \mathbb{E}_p [\mathcal{k}(\cdot, X')]\big\rangle_\mathcal{H} + \big\langle \mathbb{E}_q [\mathcal{k}(\cdot, Y)],\, \mathbb{E}_q [\mathcal{k}(\cdot, Y')]\big\rangle_\mathcal{H} - 2 \big\langle \mathbb{E}_p [\mathcal{k}(\cdot, X)],\, \mathbb{E}_q [\mathcal{k}(\cdot, Y)]\big\rangle_\mathcal{H} \\ &=\mathbb{E}_{X,X' \overset{iid}{\sim} p} [\mathcal{k} (X,X')] + \mathbb{E}_{Y,Y' \overset{iid}{\sim} q} [\mathcal{k} (Y,Y')] - 2\, \mathbb{E}_{X \sim p,\, Y \sim q} [\mathcal{k} (X,Y)] &&\longleftarrow \underbrace{\mathcal{k}(x,x') = \langle \mathcal{k}(\cdot,x), \mathcal{k}(\cdot,x')\rangle _\mathcal{H}}_{\text{RKHS property}} \end{align}\]

​ In fact, the upper bound is attained by some $f \in \mathcal{F}$, hence

\[\begin{align} MMD^2(p,q) =\mathbb{E}_{X,X' \overset{iid}{\sim} p} [\mathcal{k} (X,X')] + \mathbb{E}_{Y,Y' \overset{iid}{\sim} q} [\mathcal{k} (Y,Y')] - 2\, \mathbb{E}_{X \sim p,\, Y \sim q} [\mathcal{k} (X,Y)] \end{align}\]

holds with equality. Even though the definition involves a supremum, the supremum is attained and has the closed form above; that is, the optimization problem inside the IPM (computing the supremum) is bypassed.
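Concretely, writing $\mu_p = \mathbb{E}_p[\mathcal{k}(\cdot, X)]$ and $\mu_q = \mathbb{E}_q[\mathcal{k}(\cdot, Y)]$ for the mean embeddings, the supremum is attained by the witness function

\[f^* = \frac{\mu_p - \mu_q}{\Vert \mu_p - \mu_q \Vert_\mathcal{H}},\]

for which the Cauchy-Schwarz step above holds with equality.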

Moreover, no $f$ appears in the final expression, which makes MMD very straightforward to estimate.


​ For estimation, consider the plug-in estimator (a V-statistic in our context):

\[\begin{align} \widehat{MMD}^2(p,q) &= \bigg\lbrack \underset{f \in \mathcal{F}}{\sup} \bigg\vert \frac{1}{m} \sum_{i=1}^m f(x_i) - \frac{1}{n} \sum_{j=1}^n f(y_j)\bigg\vert \bigg\rbrack ^2 \\ &= \frac{1}{m^2} \sum_{i,j=1}^m \mathcal{k}(x_i, x_j) +\frac{1}{n^2} \sum_{i,j=1}^n \mathcal{k}(y_i, y_j) - \frac{2}{mn} \sum_{i=1}^m\sum_{j=1}^n \mathcal{k}(x_i, y_j) \end{align}\]
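In code, this estimator is just three kernel-matrix averages. Below is a minimal sketch with a Gaussian kernel; the kernel choice, bandwidth `sigma`, and toy data are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Plug-in (V-statistic) estimator of MMD^2 for samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    return Kxx.sum() / m**2 + Kyy.sum() / n**2 - 2 * Kxy.sum() / (m * n)

# Toy usage: same distribution vs. same mean but different covariance
rng = np.random.default_rng(0)
Xa = rng.normal(size=(300, 5))
Xb = rng.normal(size=(300, 5))
Y = 1.5 * rng.normal(size=(300, 5))
print(mmd2_biased(Xa, Xb), mmd2_biased(Xa, Y))  # near 0 vs. clearly positive
```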

2.4. Convergence of MMD

Let $X_1, \cdots , X_m$ be $i.i.d.$ observations from $p$ and $Y_1 , \cdots, Y_n$ be $i.i.d.$ observations from $q$. Assume that the two samples are mutually independent. Let $\mathcal{F}$ be the unit ball in a reproducing kernel Hilbert space (RKHS) associated with kernel $\mathcal{k}$, and define

\[\begin{align} &MMD = \underset{f \in \mathcal{F} }{\sup} \bigg\vert \mathbb{E}_p [f(X)] - \mathbb{E}_q [f(Y)] \bigg\vert \\ &\widehat{MMD} = \underset{f \in \mathcal{F}}{\sup} \bigg\vert \frac{1}{m} \sum_{i=1}^m f(X_i) - \frac{1}{n} \sum_{j=1}^n f(Y_j)\bigg\vert \end{align}\]

Also, assume that the kernel $\mathcal{k}$ is bounded: $0 \leq \mathcal{k}(x,y) \leq K$ for all $x,y$ in the domain $\mathcal{X}$.


Want to show:

\(\mathbb{P} \bigg\lbrack \vert \widehat{MMD} - MMD \vert > 2 \bigg( \sqrt{\frac{K}{m}} + \sqrt{\frac{K}{n}} \bigg) +t \bigg\rbrack \leq \exp \bigg(- \frac{t^2 mn}{2K(m+n)} \bigg)\)

Proof)

Step 1) Obtain an upper bound on the Rademacher complexity

Let $\epsilon_1, \cdots , \epsilon_m$ be $i.i.d.$ Rademacher random variables taking values in $\{-1, +1\}$, where $\mathbb{P}(\epsilon_1 = -1) =\mathbb{P}(\epsilon_1 = 1)= \frac{1}{2}$. Then, the Rademacher complexity of $\mathcal{F}$ satisfies

\[\mathcal{R}_m (\mathcal{F}) = \mathbb{E} _{X, \epsilon} \bigg\lbrack \underset{f\in \mathcal{F}}{\sup} \bigg \vert \frac{1 }{m} \sum_{i=1}^{m} \epsilon_i f(X_i) \bigg\vert \bigg\rbrack \leq \sqrt{\frac{K}{m}}\]

Why?

\[\begin{align} \mathcal{R}_m (\mathcal{F}) &= \mathbb{E} _{X, \epsilon} \bigg\lbrack \underset{f\in \mathcal{F}}{\sup} \bigg \vert \frac{1 }{m} \sum_{i=1}^{m} \epsilon_i f(X_i) \bigg\vert \bigg\rbrack \\ &= \mathbb{E}_{X,\epsilon} \bigg\lbrack \underset{f\in \mathcal{F}}{\sup} \bigg \vert \big\langle f,\ \frac{1}{m} \sum_{i=1}^m \epsilon_i \mathcal{k} (\cdot, X_i ) \big\rangle_{\mathcal{H}} \bigg \vert \bigg \rbrack &&\longleftarrow \text{reproducing property} \\ & \leq \frac{1}{m }\mathbb{E}_{X,\epsilon} \bigg\lbrack \underset{f\in \mathcal{F}}{\sup} \Vert f \Vert_\mathcal{H} \,\, \big\Vert \sum_{i=1}^m \epsilon_i \mathcal{k} (\cdot, X_i )\big\Vert_{\mathcal{H}} \bigg\rbrack &&\longleftarrow \text{Cauchy-Schwarz inequality}\\ &\leq \frac{1}{m} \mathbb{E}_{X,\epsilon} \bigg\lbrack \big\Vert \sum_{i=1}^m \epsilon_i \mathcal{k} (\cdot, X_i )\big\Vert_{\mathcal{H}} \bigg\rbrack &&\longleftarrow \mathcal{F} \text{ is the unit ball}\\ & = \frac{1}{m} \mathbb{E}_{X,\epsilon} \bigg\lbrack \big\langle \sum_{i=1}^m \epsilon_i \mathcal{k} (\cdot, X_i ) , \sum_{j=1}^m \epsilon_j \mathcal{k} (\cdot, X_j )\big\rangle_{\mathcal{H}}^{\frac{1}{2}}\bigg\rbrack \\ &= \frac{1}{m} \mathbb{E}_{X,\epsilon} \bigg\lbrack \bigg( \sum_{i=1}^m\sum_{j=1}^m \epsilon_i \epsilon_j\mathcal{k}(X_i, X_j ) \bigg)^{\frac{1}{2}}\bigg\rbrack &&\longleftarrow \text{reproducing property} \\ & \leq \frac{1}{m} \bigg( \sum_{i=1}^m\sum_{j=1}^m \mathbb{E}_{X,\epsilon}\big\lbrack \epsilon_i \epsilon_j\mathcal{k}(X_i, X_j ) \big\rbrack\bigg)^{\frac{1}{2}} &&\longleftarrow \text{Jensen's inequality (concavity of } \sqrt{\cdot}\,) \\ & = \frac{1}{m} \bigg( \sum_{i=1}^m \mathbb{E}_{X_i}\big\lbrack \mathcal{k}(X_i, X_i ) \big\rbrack \bigg)^{\frac{1}{2}} &&\longleftarrow \mathbb{E}[\epsilon_i \epsilon_j ]=0 \text{ for } i\neq j \\ &\leq \frac{1} {m} (mK)^{\frac{1}{2}} \\ & = \sqrt{\frac{K}{m}} \end{align}\]
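Since $\underset{f\in\mathcal{F}}{\sup} \vert \langle f, g\rangle_\mathcal{H}\vert = \Vert g \Vert_\mathcal{H}$ over the unit ball, the Rademacher average conditional on the data is exactly computable from the kernel matrix as $\sqrt{\epsilon^\top \mathbf{K} \epsilon}/m$, so the bound can be checked numerically. A minimal sketch, assuming a Gaussian kernel (for which $K = 1$) and arbitrary toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials = 100, 2000
X = rng.normal(size=(m, 2))

# Gaussian kernel matrix; sup_x k(x, x) = 1, so K = 1
sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
Kmat = np.exp(-sq / 2)

# ||(1/m) sum_i eps_i k(., X_i)||_H = sqrt(eps^T Kmat eps) / m
draws = [np.sqrt(max(eps @ Kmat @ eps, 0.0)) / m
         for eps in rng.choice([-1.0, 1.0], size=(trials, m))]

print(f"Monte Carlo Rademacher average: {np.mean(draws):.4f}")
print(f"Bound sqrt(K/m):                {np.sqrt(1 / m):.4f}")
```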

Step 2) Apply McDiarmid's inequality

To use McDiarmid's inequality, we need to verify a bounded difference condition.
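For reference, McDiarmid's inequality states: if $W_1, \cdots, W_N$ are independent and $g$ satisfies the bounded difference condition $\vert g(w_1, \cdots, w_i, \cdots, w_N) - g(w_1, \cdots, w_i', \cdots, w_N) \vert \leq L_i$ for all arguments, then

\[\mathbb{P}\big\lbrack g(W_1, \cdots, W_N) - \mathbb{E}[g(W_1, \cdots, W_N)] > t \big\rbrack \leq \exp\bigg( - \frac{2t^2}{\sum_{i=1}^N L_i^2} \bigg)\]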


First, bound $\vert \widehat{MMD} - MMD \vert$ by a quantity satisfying the bounded difference condition.

\[\begin{align} \vert \widehat{MMD} - MMD \vert &= \bigg\vert \underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m f(X_i) - \frac{1}{n} \sum_{j=1}^n f(Y_j)\big\vert - \underset{f \in \mathcal{F} }{\sup} \big\vert \mathbb{E}_p [f(X)] - \mathbb{E}_q [f(Y)] \big\vert \bigg\vert\\ &\leq \underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m f(X_i) - \frac{1}{n} \sum_{j=1}^n f(Y_j) - \mathbb{E}_p [f(X)] + \mathbb{E}_q [f(Y)] \big\vert \quad \longleftarrow \text{reverse triangle inequality} \end{align}\]

Here, let us denote

\[\triangle(X_1, \cdots ,X_m , Y_{m+1}, \cdots ,Y_{m+n}) \equiv \triangle(X,Y) := \underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m f(X_i)- \frac{1}{n} \sum_{j=1}^n f(Y_j) - \mathbb{E}_p [f(X)] + \mathbb{E}_q [f(Y)] \big\vert\]


Then, we can establish the bounded difference condition for $\triangle(X,Y)$.

For $i = 1,2, \cdots, m$,

\[\begin{align} &\quad\,\,\vert \triangle(X_1 , \cdots, X_i, \cdots, X_m, Y_{m+1} , \cdots, Y_{m+n}) - \triangle(X_1 , \cdots, X_i', \cdots, X_m, Y_{m+1} , \cdots, Y_{m+n}) \vert \\ &\leq \frac{1}{m} \underset{f\in\mathcal{F}}{\sup} \vert f(X_i) - f(X_i') \vert\\ &=\frac{1}{m}\underset{f\in\mathcal{F}}{\sup} \big\vert \big\langle f, \mathcal{k}(\cdot, X_i )\big\rangle_\mathcal{H} - \big\langle f, \mathcal{k}(\cdot, X_i ')\big\rangle_\mathcal{H} \big\vert &&\longleftarrow \text{reproducing property} \\ & =\frac{1}{m}\underset{f\in\mathcal{F}}{\sup} \big\vert \big\langle f,\ \mathcal{k}(\cdot, X_i )- \mathcal{k}(\cdot, X_i ')\big\rangle_\mathcal{H} \big\vert\\ & \leq\frac{1}{m}\underset{f\in\mathcal{F}}{\sup} \Vert f \Vert_{\mathcal{H}} \,\,\big\Vert\mathcal{k}(\cdot, X_i )- \mathcal{k}(\cdot, X_i ')\big\Vert_{\mathcal{H}} &&\longleftarrow \text{Cauchy-Schwarz} \\ &\leq\frac{1}{m} \big\Vert\mathcal{k}(\cdot, X_i )- \mathcal{k}(\cdot, X_i ')\big\Vert_{\mathcal{H}} &&\longleftarrow \mathcal{F} \text{ is the unit ball} \\ & = \frac{1}{m} \big\langle\mathcal{k}(\cdot, X_i )- \mathcal{k}(\cdot, X_i ') ,\ \mathcal{k}(\cdot, X_i )- \mathcal{k}(\cdot, X_i ') \big\rangle_\mathcal{H}^{\frac{1}{2}} \\ & = \frac{1}{m} \big\vert \mathcal{k}(X_i, X_i) + \mathcal{k}(X_i ', X_i ') -2 \mathcal{k} (X_i , X_i ') \big\vert^{\frac{1}{2}} &&\longleftarrow \text{reproducing property} \\ & \leq \frac{1}{m} \big( \vert \mathcal{k}(X_i, X_i)\vert + \vert\mathcal{k}(X_i ', X_i ')\vert + \vert2 \mathcal{k} (X_i , X_i ') \vert\big)^{\frac{1}{2}} &&\longleftarrow \text{triangle inequality} \\ & \leq \frac{2 K^{\frac{1}{2}}}{m} \end{align}\]

Therefore, applying the same argument to $i = m+1, \cdots, m+n$, the bounded difference constants are

\[L_i = \begin{cases} \dfrac{2 K^{\frac{1}{2}}}{m} & \text{for } i = 1,\cdots,m \\[4pt] \dfrac{2 K^{\frac{1}{2}}}{n} & \text{for } i = m+1, \cdots, m+n \end{cases}\]

Because $\sum_{i=1}^{m+n} L_i^2 = \frac{4K}{m} + \frac{4K}{n} = \frac{4K(m+n)}{mn}$, applying McDiarmid's inequality with $\frac{2t^2}{\sum_i L_i^2} = \frac{t^2 mn}{2K(m+n)}$ gives

\[\mathbb{P} \bigg\lbrack \triangle (X,Y) - \mathbb{E }[{\triangle(X,Y)}] > t \bigg\rbrack \leq \exp \bigg( - \frac{t^2 mn} {2K(m+n)}\bigg)\]

Step 3) Control $\mathbb{E}[\triangle(X,Y)]$

To control $\mathbb{E}[\triangle(X,Y)]$, symmetrization and the Rademacher complexity bound from Step 1 are used:

\[\begin{align} \mathbb{E}\big\lbrack \triangle(X,Y) \big\rbrack & = \mathbb{E}_{X,Y} \bigg\lbrack \underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m f(X_i) - \frac{1}{n} \sum_{j=1}^n f(Y_j) - \mathbb{E}_p [f(X)] + \mathbb{E}_q [f(Y)] \big\vert \bigg\rbrack \\ & = \mathbb{E}_{X,Y} \bigg\lbrack\underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m f(X_i) - \frac{1}{n} \sum_{j=1}^n f(Y_j) - \mathbb{E}_{X'} \big\lbrack\frac{1}{m} \sum_{i=1}^m f(X_i')\big\rbrack + \mathbb{E}_{Y'} \big\lbrack\frac{1}{n} \sum_{j=1}^n f(Y_j')\big\rbrack \big\vert \bigg\rbrack &&\longleftarrow \text{ghost samples } X', Y' \\ & \leq \mathbb{E}_{X,X', Y,Y'} \bigg\lbrack\underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m \big(f(X_i)-f(X_i')\big) - \frac{1}{n} \sum_{j=1}^n \big(f(Y_j) - f(Y_j ') \big) \big\vert \bigg\rbrack &&\longleftarrow \text{Jensen's inequality} \\ & = \mathbb{E}_{X,X', Y,Y', \epsilon, \epsilon '} \bigg\lbrack\underset{f \in \mathcal{F}}{\sup} \big\vert \frac{1}{m} \sum_{i=1}^m \epsilon_i\big(f(X_i)-f(X_i')\big) - \frac{1}{n} \sum_{j=1}^n \epsilon_j' \big(f(Y_j) - f(Y_j ') \big) \big\vert \bigg\rbrack &&\longleftarrow \text{Rademacher variables (symmetrization)}\\ & \leq \mathbb{E}_{X, \epsilon} \bigg\lbrack \underset{f \in \mathcal{F}}{\sup}\bigg\vert\frac{1}{m} \sum_{i=1}^m \epsilon_i f(X_i) \bigg\vert\bigg\rbrack+ \mathbb{E}_{X', \epsilon}\bigg\lbrack \underset{f \in \mathcal{F}}{\sup}\bigg\vert\frac{1}{m} \sum_{i=1}^m \epsilon_i f(X_i') \bigg\vert \bigg\rbrack \\ &\quad + \mathbb{E}_{Y, \epsilon'}\bigg\lbrack \underset{f \in \mathcal{F}}{\sup}\bigg\vert\frac{1}{n} \sum_{j=1}^n \epsilon_j 'f(Y_j) \bigg\vert \bigg\rbrack+ \mathbb{E}_{Y', \epsilon'}\bigg\lbrack \underset{f \in \mathcal{F}}{\sup}\bigg\vert\frac{1}{n} \sum_{j=1}^n \epsilon_j 'f(Y_j') \bigg\vert\bigg\rbrack &&\longleftarrow \text{triangle inequality} \\ &=2 \mathcal{R}_m (\mathcal{F}) + 2 \mathcal{R}_n (\mathcal{F}) \end{align}\]

Applying the bound from Step 1 ($\mathcal{R}_m(\mathcal{F}) \leq \sqrt{K/m}$), it follows that

\[\mathbb{E}\big\lbrack \triangle(X,Y) \big\rbrack \leq 2 \sqrt{\frac{K}{m}} + 2\sqrt{\frac{K}{n}}\]

Step 4) Merge the ingredients

Merging the two results: since $\mathbb{E}[\triangle(X,Y)] \leq 2\sqrt{K/m} + 2\sqrt{K/n}$,

\[\mathbb{P} \bigg\lbrack \triangle (X,Y) > 2 \sqrt{\frac{K}{m}} + 2\sqrt{\frac{K}{n}} + t \bigg\rbrack \leq \mathbb{P} \bigg\lbrack \triangle (X,Y) > \mathbb{E }[{\triangle(X,Y)}] + t \bigg\rbrack \leq \exp \bigg( - \frac{t^2 mn} {2K(m+n)}\bigg)\]

Since $\vert \widehat{MMD} - MMD \vert \leq \triangle(X,Y)$, the claimed bound follows. $\blacksquare$

2.5. Testing based on MMD

Suppose that we want to test $H_0 : p=q$ against $H_1 : p \neq q$. Based on the above inequality, we can construct a valid level-$\alpha$ test $\phi : \{ X_1 , \cdots, X_m, Y_1, \cdots, Y_n\} \rightarrow \{0 , 1\}$ such that $\mathbb{P} (\phi=1) \leq \alpha$ under $H_0 : p=q$.


Because $p=q$ under $H_0$, $MMD = 0$ and we can treat $Y_i = X_i'$ ($\because$ both come from the same distribution $p=q$; assume $m=n$ for simplicity). Therefore,

\[\widehat{MMD} - MMD = \widehat{MMD} = \underset{ f \in \mathcal{F} }{\sup} \bigg\vert \frac{1}{m} \sum_{i=1}^m \big(f(X_i) - f(X_i ')\big) \bigg\vert\]


Applying a similar argument as in Section 2.4, we find

\[\mathbb{P} (\widehat{MMD} > t + \sqrt{\frac{2K}{m}}) \leq \exp (-\frac{t^2m}{4K})\]

Setting $\alpha = \exp (-\frac{t^2m}{4K})$ and solving for $t$ gives $t = \sqrt{\frac{4K\log (1/\alpha)}{m}}$, so

\[\mathbb{P} \bigg\lbrack \widehat{MMD} > \sqrt{\frac{2K}{m}} + \sqrt{\frac{4K\log (1/\alpha)}{m}} \bigg\rbrack \leq \alpha\]

Therefore, we reject $H_0$ if the computed estimate exceeds $\sqrt{\frac{2K}{m}} + \sqrt{\frac{4K\log (1/\alpha)}{m}}$.


  • No knowledge of the underlying distributions is needed: the threshold contains no term related to $p$ or $q$, so the test is distribution-free.
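A minimal sketch of this distribution-free test, reusing `mmd2_biased` from the Section 2.3 sketch; the Gaussian kernel (for which $K=1$) and the parameter defaults are illustrative assumptions:

```python
import numpy as np

def mmd_threshold_test(X, Y, alpha=0.05, sigma=1.0, K=1.0):
    """Reject H0: p = q if sqrt(MMD^2-hat) exceeds the distribution-free threshold.

    Assumes m = n (as in the derivation above) and 0 <= k <= K;
    for the Gaussian kernel, K = 1. Uses mmd2_biased from the Section 2.3 sketch.
    """
    m = len(X)
    mmd_hat = np.sqrt(max(mmd2_biased(X, Y, sigma), 0.0))
    threshold = np.sqrt(2 * K / m) + np.sqrt(4 * K * np.log(1 / alpha) / m)
    return mmd_hat > threshold, mmd_hat, threshold
```

In practice this threshold tends to be conservative, which motivates the permutation test in the next section.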


2.6. Permutation test based on MMD

Procedure

  • Calculate the observed statistic $\widehat{MMD}^2_{obs}$ on the original data ($X_1 , \cdots, X_m , Y_1 , \cdots , Y_n$)

  • For $i$ = 1 to $B$:

    • Randomly permute the pooled sample ($X_1 , \cdots, X_m , Y_1 , \cdots , Y_n$)

      $\longrightarrow$ ($\tilde{X}, \tilde{Y}$) = ($\tilde{X}_1 , \cdots, \tilde{X}_m , \tilde{Y}_1 , \cdots , \tilde{Y}_n$)

    • Calculate $\widehat{MMD}_i ^2(\tilde{X}, \tilde{Y})$

  • Reject the null of $p=q$ if $\widetilde{pval} = \frac{1}{B+1} \bigg( \sum_{i=1}^B I\big(\widehat{MMD}_i ^2 \geq \widehat{MMD}^2_{obs}\big) +1 \bigg) \leq \alpha$
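In code, this is a short loop. A minimal sketch, again reusing `mmd2_biased` from the Section 2.3 sketch ($B$, the kernel bandwidth, and the seed are illustrative choices):

```python
import numpy as np

def mmd_permutation_test(X, Y, B=500, sigma=1.0, seed=0):
    """Permutation p-value for H0: p = q based on the biased MMD^2 statistic."""
    rng = np.random.default_rng(seed)
    m = len(X)
    Z = np.concatenate([X, Y])           # pooled sample
    mmd_obs = mmd2_biased(X, Y, sigma)   # observed statistic (Section 2.3 sketch)

    count = 0
    for _ in range(B):
        perm = rng.permutation(len(Z))   # random relabeling of the pooled sample
        Xp, Yp = Z[perm[:m]], Z[perm[m:]]
        if mmd2_biased(Xp, Yp, sigma) >= mmd_obs:
            count += 1

    return (count + 1) / (B + 1)         # reject H0 if this p-value is <= alpha
```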
