Jorge Numata (firstname.lastname@example.org)
Oliver Ebenhöh (email@example.com)
Ernst-Walter Knapp (firstname.lastname@example.org)
 Macromolecular Modeling Group, Freie Universität Berlin, Takustr. 6, Berlin, 14195 Germany
 Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm, 14476 Germany
We evaluate non-linear correlation measures based on mutual information for quantifying statistical dependencies among metabolic data points in two-dimensional space. Whereas the Pearson correlation coefficient is rigorously applicable only to strictly linear correlations with Gaussian noise, the mutual information coefficient is more generally valid. Here, we use recent distribution-free (non-parametric) mutual information estimators based on k-nearest neighbor distances. The mutual information algorithm of Kraskov et al. is found to yield estimates with low systematic and statistical error. The significance of the different methods is probed for artificial sets of tens to hundreds of data points, a size currently typical for metabolomic data. We apply these procedures to experimental data on metabolite concentrations from Arabidopsis thaliana. Mutual information detected additional non-linear correlations that the Pearson coefficient could not detect.
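To illustrate the kind of estimator referred to above, the following is a minimal sketch of a k-nearest-neighbor mutual information estimate in the style of the first algorithm of Kraskov et al., written in plain Python with a brute-force neighbor search. It is not the implementation used in this work. The function names (digamma_int, ksg_mutual_information, pearson), the default k = 3, and the synthetic quadratic test data are assumptions made for illustration only. The estimate is returned in nats and assumes continuous data without tied values.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(xs, ys, k=3):
    """k-nearest-neighbor estimate of I(X;Y) in nats (brute force, O(N^2 log N))."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to all other points in the joint (x, y) space.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count points strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example (illustrative data, not from the paper):
# a symmetric quadratic dependence that is strongly non-linear.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [xi ** 2 + rng.gauss(0.0, 0.05) for xi in x]

print("Pearson r:", pearson(x, y))
print("Mutual information (nats):", ksg_mutual_information(x, y, k=3))
```

For a symmetric dependence of this kind, the Pearson coefficient is expected to stay close to zero while the mutual information estimate is clearly positive. A production implementation would typically replace the brute-force search with a spatial index, since the cost here grows quadratically with the number of data points.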