[High Dimensional Statistics] Maximum Mean Discrepancy (MMD)
References
- Yonsei Univ. STA9104: Statistical Theory for High-Dimensional and Big Data
- Gretton et al., "A Kernel Two-Sample Test", JMLR, 2012.
1. Introduction

1.1. Two-sample test problem setting
Hypothesis: given independent samples $X_1, \dots, X_m \overset{iid}{\sim} P$ and $Y_1, \dots, Y_n \overset{iid}{\sim} Q$, we test

$$H_0 : P = Q \quad \text{vs} \quad H_1 : P \neq Q$$
Classical methods for the two-sample test

There are many classical approaches to the two-sample problem. They include:

- The t-test, in the 1-dimensional case
- Hotelling's $T^2$ test, which generalizes the t-test to $\mathbb{R}^d$
  - However, it only applies when the sample size exceeds the dimension
  - It is only sensitive to differences in the mean
- Tests based on the empirical CDF, e.g. the Kolmogorov–Smirnov (K.S.) test and the Cramér–von Mises (CvM) test
  - Sensitive to any difference in distribution
  - However, hard to generalize to the multivariate setting
- MMD solves many of these problems:
  - It works well for multivariate data
  - It is sensitive to general alternatives
  - It is easy to estimate
  - It is an example of an IPM (Integral Probability Metric)
1.2. Integral Probability Metrics (IPM)

Definition (Integral Probability Metric)
For a class of functions $\mathcal{F}$,

$$d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y) \right|$$

Examples are the total variation distance ($\mathcal{F}$ = functions bounded by 1), the Wasserstein distance ($\mathcal{F}$ = 1-Lipschitz functions), and the Kolmogorov distance ($\mathcal{F}$ = indicators of half-lines).
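As a concrete illustration of an IPM (a minimal sketch of my own; the function and variable names are not from the source), the Kolmogorov distance takes the supremum over indicator functions $f_t(x) = \mathbf{1}\{x \le t\}$, which reduces to the largest gap between the two empirical CDFs:

```python
import numpy as np

def kolmogorov_ipm(x, y):
    """Kolmogorov distance as an IPM: sup_t |E_P 1{X <= t} - E_Q 1{Y <= t}|,
    i.e. the maximum gap between the two empirical CDFs."""
    # The sup over all t is attained at an observed point, because
    # empirical CDFs are step functions that only jump at sample values.
    ts = np.concatenate([x, y])
    gaps = [abs(np.mean(x <= t) - np.mean(y <= t)) for t in ts]
    return max(gaps)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
y = rng.normal(0.5, 1.0, size=500)
print(kolmogorov_ipm(x, y))  # equivalently, the two-sample K.S. statistic
```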
2. Maximum Mean Discrepancy (MMD)

- If $\mathcal{F}$ is too large, it is difficult to estimate the IPM $d_{\mathcal{F}}(P, Q)$. Also, if $\mathcal{F}$ is too small, $d_{\mathcal{F}}(P, Q) = 0$ can hold even if $P \neq Q$.
- MMD takes a middle ground between these two extremes: it takes $\mathcal{F}$ to be the unit ball in an RKHS.
2.1. Reproducing Kernel Hilbert Space

Definition (Reproducing kernel Hilbert space)
An RKHS is a Hilbert space $\mathcal{H}$ of functions $f : \mathcal{X} \to \mathbb{R}$ with a kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ such that

- $k(\cdot, x) \in \mathcal{H}$ for all $x \in \mathcal{X}$
- (Reproducing property) $f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$ and $x \in \mathcal{X}$
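As a quick worked example of the reproducing property (this computation is mine, not from the source): for a finite linear combination $f = \sum_i \alpha_i k(\cdot, x_i)$,

$$f(x) = \Big\langle \sum_i \alpha_i k(\cdot, x_i),\ k(\cdot, x) \Big\rangle_{\mathcal{H}} = \sum_i \alpha_i \langle k(\cdot, x_i), k(\cdot, x) \rangle_{\mathcal{H}} = \sum_i \alpha_i k(x_i, x),$$

using in the last step the special case $k(x, x') = \langle k(\cdot, x), k(\cdot, x') \rangle_{\mathcal{H}}$.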
2.2. Definition of MMD

Definition (Maximum Mean Discrepancy)
Let $\mathcal{F} = \{ f \in \mathcal{H} : \|f\|_{\mathcal{H}} \le 1 \}$ be the unit ball of an RKHS $\mathcal{H}$. Then

$$\mathrm{MMD}(P, Q) = \sup_{\|f\|_{\mathcal{H}} \le 1} \left| \mathbb{E}_P f(X) - \mathbb{E}_Q f(Y) \right|$$

That is, MMD is an IPM with $\mathcal{F}$ the unit ball of an RKHS.
2.3. Properties of MMDPermalink
-
Property 1
If RKHS is associated with a “characteristic kernel”, such as gaussian kernel, then
- Property 2
In fact, the upper bound can be achievd by some
is achived. Even if it is sup, since it can be achieved, we can say that sup is achieved. That is, optimization problem in IPM calculation, which is obtaining supremum, is bypassed.
Also, there is no
For estimation, consider Plug-in estimator, or a V-statistic in our context :
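The V-statistic estimator can be sketched in a few lines of NumPy (a minimal illustration; the Gaussian kernel and the bandwidth `sigma=1.0` are my choices for the example, not prescribed by the source):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_v(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2:
    (1/m^2) sum k(x,x) - (2/mn) sum k(x,y) + (1/n^2) sum k(y,y)."""
    kxx = gaussian_kernel(x, x, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    return kxx.mean() - 2 * kxy.mean() + kyy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(1.0, 1.0, size=(200, 2))
print(mmd2_v(x, y))  # noticeably positive: P and Q differ in mean
```

Note that each double sum is just the mean of the corresponding kernel matrix, which is why the estimator reduces to three `.mean()` calls.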
2.4. Convergence of MMD

Let $X_1, \dots, X_m \overset{iid}{\sim} P$ and $Y_1, \dots, Y_n \overset{iid}{\sim} Q$.
Also, assume that the kernel is bounded: $0 \le k(x, y) \le K$ for all $x, y$.

Want to show: for any $\epsilon > 0$,

$$P\left( \left| \widehat{\mathrm{MMD}}_b - \mathrm{MMD}(P, Q) \right| > 2\left( \sqrt{K/m} + \sqrt{K/n} \right) + \epsilon \right) \le 2 \exp\left( -\frac{\epsilon^2 mn}{2K(m+n)} \right)$$

proof)

Step 1) Obtain an upper bound for the Rademacher complexity

Let $\mathcal{F}$ be the unit ball of the RKHS and $\sigma_1, \dots, \sigma_m$ be iid Rademacher signs. Then

$$R_m(\mathcal{F}) = \mathbb{E} \sup_{\|f\|_{\mathcal{H}} \le 1} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(X_i) \le \sqrt{K/m}$$

Why? By the reproducing property, $\sup_{\|f\| \le 1} \frac{1}{m} \sum_i \sigma_i f(X_i) = \left\| \frac{1}{m} \sum_i \sigma_i k(\cdot, X_i) \right\|_{\mathcal{H}}$, and by Jensen's inequality,

$$\mathbb{E} \left\| \frac{1}{m} \sum_i \sigma_i k(\cdot, X_i) \right\|_{\mathcal{H}} \le \sqrt{ \frac{1}{m^2} \sum_{i=1}^{m} \mathbb{E}\, k(X_i, X_i) } \le \sqrt{K/m},$$

where the cross terms vanish because $\mathbb{E}\, \sigma_i \sigma_j = 0$ for $i \neq j$.

Step 2) Apply McDiarmid's inequality

To use McDiarmid's inequality, we need to derive the bounded difference condition.
First, bound the change caused by a single sample. Here, let's denote $\widehat{\mathrm{MMD}}_b$ as $g(X_1, \dots, X_m, Y_1, \dots, Y_n)$.
Then we can find the bounded difference condition for $g$: replacing one $X_i$ with $X_i'$ moves $\hat{\mu}_P$ by at most $\frac{1}{m} \| k(\cdot, X_i) - k(\cdot, X_i') \|_{\mathcal{H}} \le \frac{2\sqrt{K}}{m}$, so by the triangle inequality

$$\left| g(\dots, X_i, \dots) - g(\dots, X_i', \dots) \right| \le \frac{2\sqrt{K}}{m} \quad \text{for } i = 1, \dots, m$$

Therefore, if we do a similar procedure for the $Y_j$'s, each of their bounded differences is $2\sqrt{K}/n$.
Because $\sum_i c_i^2 = m \left( \frac{2\sqrt{K}}{m} \right)^2 + n \left( \frac{2\sqrt{K}}{n} \right)^2 = \frac{4K(m+n)}{mn}$, McDiarmid's inequality gives

$$P\left( |g - \mathbb{E} g| > \epsilon \right) \le 2 \exp\left( -\frac{2\epsilon^2}{4K(m+n)/(mn)} \right) = 2 \exp\left( -\frac{\epsilon^2 mn}{2K(m+n)} \right)$$

Step 3) Control $\left| \mathbb{E} g - \mathrm{MMD}(P, Q) \right|$

To control the bias term, use a symmetrization argument. Applying the Rademacher bound from Step 1, it is easy to verify that

$$\left| \mathbb{E}\, \widehat{\mathrm{MMD}}_b - \mathrm{MMD}(P, Q) \right| \le 2 R_m(\mathcal{F}) + 2 R_n(\mathcal{F}) \le 2\left( \sqrt{K/m} + \sqrt{K/n} \right)$$

Step 4) Merge the ingredients

Merging the two results, with probability at least $1 - 2\exp\left( -\frac{\epsilon^2 mn}{2K(m+n)} \right)$,

$$\left| \widehat{\mathrm{MMD}}_b - \mathrm{MMD}(P, Q) \right| \le |g - \mathbb{E} g| + \left| \mathbb{E} g - \mathrm{MMD}(P, Q) \right| \le \epsilon + 2\left( \sqrt{K/m} + \sqrt{K/n} \right) \qquad \blacksquare$$
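A small seeded simulation can sanity-check this deviation bound (my own sketch; the sample sizes and Gaussian kernel are illustrative assumptions). Under $P = Q$ we have $\mathrm{MMD}(P, Q) = 0$, and the Gaussian kernel is bounded by $K = 1$, so $\widehat{\mathrm{MMD}}_b$ should fall well below $2(\sqrt{K/m} + \sqrt{K/n})$:

```python
import numpy as np

def mmd_b(x, y, sigma=1.0):
    """Biased MMD estimate (square root of the V-statistic) with a Gaussian kernel."""
    def k(a, b):
        sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-sq / (2 * sigma**2))
    v = k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()
    return np.sqrt(max(v, 0.0))  # clip tiny negative values from rounding

m = n = 500
K = 1.0  # the Gaussian kernel satisfies 0 <= k <= 1
rng = np.random.default_rng(42)
x = rng.normal(size=(m, 2))
y = rng.normal(size=(n, 2))  # same distribution, so MMD(P, Q) = 0

bias_bound = 2 * (np.sqrt(K / m) + np.sqrt(K / n))
print(mmd_b(x, y), "<=", bias_bound)
```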
2.5. Testing based on MMD

Suppose that we want to test whether $H_0 : P = Q$ holds.
Because $\mathrm{MMD}(P, Q) = 0$ under $H_0$,
applying a similar procedure as in 2.4, we can find

$$P_{H_0}\left( \widehat{\mathrm{MMD}}_b > 2\left( \sqrt{K/m} + \sqrt{K/n} \right) + \epsilon \right) \le \exp\left( -\frac{\epsilon^2 mn}{2K(m+n)} \right)$$

By letting the right-hand side equal $\alpha$, i.e. $\epsilon = \sqrt{ \frac{2K(m+n)}{mn} \log \frac{1}{\alpha} }$, we obtain a level-$\alpha$ test.
Therefore, we reject $H_0$ if

$$\widehat{\mathrm{MMD}}_b > 2\left( \sqrt{K/m} + \sqrt{K/n} \right) + \sqrt{ \frac{2K(m+n)}{mn} \log \frac{1}{\alpha} }$$

- Because information about the distributions is not needed, this threshold is distribution-free: there is no term related to $P$ or $Q$.
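To make the rejection threshold concrete, here is a quick computation (a sketch of my own; $K = 1$ holds for the Gaussian kernel, and the sample sizes and level are example values):

```python
import numpy as np

def mmd_threshold(m, n, K=1.0, alpha=0.05):
    """Distribution-free level-alpha rejection threshold for MMD_b:
    2 (sqrt(K/m) + sqrt(K/n)) + sqrt(2 K (m + n) / (m n) * log(1 / alpha))."""
    return 2 * (np.sqrt(K / m) + np.sqrt(K / n)) + np.sqrt(
        2 * K * (m + n) / (m * n) * np.log(1 / alpha)
    )

# The threshold shrinks as the sample sizes grow, at rate O(1 / sqrt(n))
print(mmd_threshold(500, 500), mmd_threshold(2000, 2000))
```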
2.6. Permutation test based on MMD

Procedure

- For $b = 1$ to $B$:
  - Randomly permute the pooled sample: let $(Z_1, \dots, Z_{m+n}) = (X_1, \dots, X_m, Y_1, \dots, Y_n)$, draw a random permutation $\pi$ of $\{1, \dots, m+n\}$, and set $(X^{(b)}_1, \dots, X^{(b)}_m) = (Z_{\pi(1)}, \dots, Z_{\pi(m)})$ and $(Y^{(b)}_1, \dots, Y^{(b)}_n) = (Z_{\pi(m+1)}, \dots, Z_{\pi(m+n)})$
  - Calculate $\widehat{\mathrm{MMD}}^{(b)}$ on the permuted samples
- Reject the null of $H_0 : P = Q$ if $\widehat{\mathrm{MMD}}_b$ computed on the original samples exceeds the $(1 - \alpha)$ quantile of $\{ \widehat{\mathrm{MMD}}^{(1)}, \dots, \widehat{\mathrm{MMD}}^{(B)} \}$
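The procedure above can be sketched as follows (a minimal illustration; the Gaussian kernel, the function names, and the parameter values are my assumptions, not from the source):

```python
import numpy as np

def mmd2_v(x, y, sigma=1.0):
    """Biased (V-statistic) MMD^2 with a Gaussian kernel."""
    def k(a, b):
        sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-sq / (2 * sigma**2))
    return k(x, x).mean() - 2 * k(x, y).mean() + k(y, y).mean()

def mmd_permutation_test(x, y, num_perms=200, alpha=0.05, seed=0):
    """Permutation two-sample test: reject H0 if the observed MMD^2
    exceeds the (1 - alpha) quantile of its permutation distribution."""
    rng = np.random.default_rng(seed)
    m = len(x)
    z = np.vstack([x, y])                      # pooled sample (Z_1, ..., Z_{m+n})
    observed = mmd2_v(x, y)
    perm_stats = []
    for _ in range(num_perms):
        idx = rng.permutation(len(z))          # random relabeling of the pool
        perm_stats.append(mmd2_v(z[idx[:m]], z[idx[m:]]))
    threshold = np.quantile(perm_stats, 1 - alpha)
    return observed > threshold, observed, threshold

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(100, 2))
y = rng.normal(1.0, 1.0, size=(100, 2))
reject, obs, thr = mmd_permutation_test(x, y)
print(reject)  # with a mean shift this large, the test should reject
```

Working with $\widehat{\mathrm{MMD}}^2$ instead of $\widehat{\mathrm{MMD}}$ gives the same test, since the square root is monotone.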