
Kernel Methods for Fusing Heterogeneous Data

Course by Gunnar Rätsch at Bio-IT 2010 in Hannover, October 4, 2010.

This presentation is part of the CHI Molecular Diagnostics pre-conference course Introduction to Biomedical Data Fusion.


Abstract

Kernel methods, in particular support vector machines, have established themselves as a very powerful and versatile paradigm for learning from high-dimensional data. Kernels have been developed to deal not only with numerical data but also with sequence information and even graphs representing, e.g., protein-protein interaction data. Their widespread use for developing molecular signatures, as well as the large number and diversity of bioinformatics applications, testifies to the power of this approach.

In addition, the ability to combine various kernels irrespective of their underlying data type, and to learn optimal combinations from the data itself, provides a unique tool for achieving optimal prediction performance and data understanding through data fusion.
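To illustrate the idea of combining kernels across data types, here is a minimal sketch in plain Python. It is not taken from the course material or from the Shogun toolbox: the function names, the toy data, and the fixed equal weights are all assumptions made for illustration. In a multiple kernel learning setting the weights would be learned from the data rather than fixed by hand.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel on numeric vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def spectrum_kernel(s, t, k=2):
    """Spectrum kernel on strings: inner product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return float(sum(cs[m] * ct[m] for m in cs))


def gram(kernel, items):
    """Kernel (Gram) matrix over a list of items."""
    return [[kernel(a, b) for b in items] for a in items]


def normalize(K):
    """Scale K so that every diagonal entry is 1 (cosine normalization)."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]


def combine(kernels, weights):
    """Weighted sum of Gram matrices; non-negative weights keep it a valid kernel."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: the same three samples seen through two data sources.
expression = [[0.1, 1.2], [0.0, 1.0], [2.0, -0.5]]   # numeric measurements
sequences = ["ACGTAC", "ACGTTC", "TTTTGG"]           # sequence data

K_num = gram(rbf_kernel, expression)
K_seq = normalize(gram(spectrum_kernel, sequences))

# Assumption: equal, hand-picked weights; MKL would estimate these instead.
K_fused = combine([K_num, K_seq], weights=[0.5, 0.5])
```

The fused matrix `K_fused` is again a symmetric kernel matrix over the same samples, so it could be passed to any SVM implementation that accepts a precomputed kernel.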

This course will give a brief introduction to kernel methods, provide an overview of the various types of kernels relevant to biological data, and discuss the use of kernel combination for data fusion. It will also present a corresponding machine learning toolbox that Dr. Rätsch's group has developed for unified large-scale learning from a broad range of data, including the fusion of data from very diverse sources.



Overview

The course will be structured as follows:

  • Introduction to Support Vector Machines (SVMs)
  • The kernel concept
  • Kernels for non-vectorial data
  • Integration of heterogeneous data
  • Illustrative examples
  • Shogun Software

For the tutorial paper we have developed a Galaxy-based web service and a toolbox (easySVM) that can easily be used for most of the problems considered in the tutorial. Examples of using the software can be found here.


Further reading

Books:

  • Kernel Methods in Computational Biology
  • Introduction to Computational Genomics
  • Learning with Kernels
  • Large-Scale Kernel Machines
  • Semi-Supervised Learning
  • Structured Output Learning


Acknowledgements

I gratefully acknowledge help from Sören Sonnenburg and Cheng Soon Ong in preparing an earlier version of this tutorial. Moreover, slides were contributed by Peter Gehler, Karsten Borgwardt and Petra Philips.


Contact

If you have comments, problems, or questions, feel free to contact Dr. Gunnar Rätsch.

Dr. Rätsch heads the research group for Machine Learning in Biology at the Friedrich Miescher Laboratory of the Max-Planck Society. His earlier work on boosting and support vector machines led to his current interest in applying machine learning to real-world problems from computational biology. Besides its work on using kernel methods for data fusion, his group focuses, e.g., on novel analysis methods for next-generation sequencing data, the prediction of MHC binding, ab initio gene finding in nematode genomes, and the prediction and validation of transcriptional regulation (e.g. alternative splicing).
