Literature Review of a Two-Sample Test
Lee, J. S., Cox, D. D., & Follen, M. (2015). A Two Sample Test for Functional Data. Communications for Statistical Applications and Methods,
22(2), 121-135. doi:10.5351/csam.2015.22.2.121.
The study “A Two Sample Test for Functional Data” was published in the journal Communications for Statistical Applications and Methods. It includes an extensive simulation that shows how the proposed test performs in comparison with other methods under a variety of alternatives. The authors report that it is among the best of the methods compared across all of the alternatives. The paper also includes an application to real-world data to demonstrate the applicability of the method.
For the simulations they used MATLAB, except that the smoothing used to create the functional data was done in R. They also used a randomization approach to achieve a specified significance level. Overall, the simulations were used to assess power. Table 1 presents the eight testing procedures considered in the simulation studies, and Table 2 lists the six alternatives used in the simulations. They also included plots of all six alternatives. The second figure covers the six alternatives again, showing the empirical c.d.f.s of the p-values.
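To make the idea of assessing power by simulation more concrete, here is a minimal sketch in Python. It is not the authors' MATLAB or R code. It assumes scalar observations and a toy two-sided z-test with known unit variance as a stand-in for the eight procedures in Table 1, and all function names (z_test_p_value, simulate_p_values, empirical_cdf, empirical_power) are hypothetical. The point is only to show how repeated simulated p-values lead to an empirical c.d.f. and a power estimate.

```python
import math
import numpy as np

def z_test_p_value(x, y):
    """Two-sided p-value for equal means, assuming unit variance in both
    groups. Toy stand-in test, not one of the procedures from the paper."""
    z = (x.mean() - y.mean()) / math.sqrt(1.0 / x.size + 1.0 / y.size)
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_p_values(shift, n=25, n_rep=2000, seed=0):
    """Draw n_rep pairs of samples whose means differ by `shift` and
    return the p-value of the toy test for each pair."""
    rng = np.random.default_rng(seed)
    return np.array([
        z_test_p_value(rng.normal(0.0, 1.0, n), rng.normal(shift, 1.0, n))
        for _ in range(n_rep)
    ])

def empirical_cdf(p_values, grid):
    """Fraction of simulated p-values that are <= each grid point."""
    p_sorted = np.sort(p_values)
    return np.searchsorted(p_sorted, grid, side="right") / p_sorted.size

def empirical_power(p_values, alpha=0.05):
    """Estimated rejection rate at level alpha."""
    return float(np.mean(p_values <= alpha))

# shift = 0 is the null case; shift = 1 is one (assumed) alternative.
null_p = simulate_p_values(shift=0.0)
alt_p = simulate_p_values(shift=1.0)
grid = np.linspace(0.0, 1.0, 21)
null_curve = empirical_cdf(null_p, grid)  # should lie close to the diagonal
alt_curve = empirical_cdf(alt_p, grid)    # rises steeply when power is high
```

Under the null hypothesis the p-values are roughly uniform, so the empirical c.d.f. follows the diagonal. Under an alternative, the height of the curve at 0.05 is the estimated power at that level, which is one way to read plots like the paper's second figure.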
However, some issues remain unresolved and are instead listed as potential research questions. One is the bias of the sample eigenvalues, whose correction is a very difficult task. Another is establishing the consistency of the Hotelling’s T-squared test, which can be very challenging. Despite these issues, they concluded that the proposed method provides good results in a variety of situations and is very useful as a functional data analytic tool.
They discussed other studies of a similar nature, primarily those dealing with multivariate analysis using Hotelling’s T-squared test. They state that, in general, there is a methodology for constructing tests that work well in nonparametric inference problems for single functions, which is referred to as the Adaptive Neyman methodology. It has been shown to work very well for a variety of problems, as it has good power against a wide range of alternatives.
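For reference, the classical two-sample Hotelling’s T-squared statistic from multivariate analysis can be sketched as follows. This is the textbook multivariate version with a pooled covariance matrix, not the authors' functional-data procedure, and the function name hotelling_t2 is hypothetical. It assumes each row is one observation, at least two variables, and more observations than variables.

```python
import numpy as np

def hotelling_t2(x, y):
    """Classical two-sample Hotelling's T-squared statistic.

    x, y: arrays of shape (n_x, p) and (n_y, p), one observation per row.
    Assumes p >= 2 and n_x + n_y - 2 >= p so the pooled covariance
    matrix is invertible.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.shape[0], y.shape[0]
    # Difference between the two sample mean vectors.
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled (unbiased) covariance estimate across both samples.
    pooled = ((n_x - 1) * np.cov(x, rowvar=False)
              + (n_y - 1) * np.cov(y, rowvar=False)) / (n_x + n_y - 2)
    # Solve a linear system rather than forming the inverse explicitly.
    return float(n_x * n_y / (n_x + n_y) * diff @ np.linalg.solve(pooled, diff))
```

When the number of variables is large relative to the sample size, as with densely observed curves, the pooled covariance matrix becomes singular or poorly estimated, so the statistic cannot be applied directly without some form of dimension reduction.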
Furthermore, they extended the null hypothesis to assume equal distributions for the two populations being sampled. Under this assumption, the observations are interchangeable between the two populations. They therefore randomly permuted the population labels and recomputed the test statistic for each permutation, which gives an estimated p-value for the test statistic of the real data. This permutation methodology is easy to implement and easy to understand.
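The permutation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each curve is assumed to be stored as one row of values on a common grid. The statistic used here (the sum of squared differences between the two mean curves) is a simple stand-in rather than the statistic proposed in the paper, and the function names are hypothetical.

```python
import numpy as np

def mean_diff_statistic(x, y):
    """Sum of squared differences between the two sample mean curves.
    Stand-in statistic for illustration only, not the one from the paper."""
    return float(np.sum((x.mean(axis=0) - y.mean(axis=0)) ** 2))

def permutation_p_value(x, y, statistic=mean_diff_statistic, n_perm=999, seed=0):
    """Estimate a p-value by randomly permuting the population labels.

    x, y: arrays of shape (n_x, m) and (n_y, m), one curve per row.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = statistic(x, y)
    count = 0
    for _ in range(n_perm):
        # Shuffle the rows, then split them back into two groups
        # with the original sample sizes.
        idx = rng.permutation(pooled.shape[0])
        permuted = statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
        if permuted >= observed:
            count += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (count + 1) / (n_perm + 1)
```

Because the null hypothesis is equality of the two distributions, every relabeling is equally likely under it, so the proportion of permuted statistics at least as large as the observed one estimates the p-value. Any other test statistic can be passed in through the statistic argument without changing the rest of the procedure.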
I agree with the conclusions made in this study. The concerns they pointed out were clear when evaluating all of the data. They provided multiple plots that give visual support for the information in Tables 1 and 2. The plots show how each of the six alternatives behaves in the simulation studies. Also, as seen in the figures, there are differences in the means of their probes, which could be enough for an empirical analysis but not for the intent here, which is to determine significance analytically. They also indicated that they would like to perform another study in which different variables would be controlled. Their reasoning is that these factors could have affected the previous analysis and could have led to misleading results. Another issue that should be considered is the statement that preliminary studies have indicated that there tend to be multiple sources of variability in the measurements used.