Machine learning methods for precision medicine with omics data

Sunday January 10th, 9AM-12:30PM

Course Description

A key bioinformatics task in precision medicine is the precise taxonomy of diseases using genomics data and other types of omics data. Machine learning methods, including feature selection from high-dimensional data, supervised classification and unsupervised clustering, play major roles in this task.

Many such studies have been carried out on small-scale genomics data over the past 15 years. They have illustrated the power of machine learning approaches in the field, and they have also revealed many pitfalls when these approaches are applied improperly. Recent Precision Medicine initiatives in the US and several other nations are expected to scale up genomics and other omics data by several orders of magnitude. These data will also be combined with big data from medical practice. Applying machine learning methods efficiently and correctly is crucial for extracting correct knowledge from such big data.

At the same time, the machine learning and artificial intelligence fields have developed rapidly in recent years. New methods such as deep learning and probabilistic learning, among many others, have achieved near-human-level performance in tasks like image recognition, text mining and the analysis of big data on the internet. These new methods have high potential for applications on big biological data, but biological data have their own unique characteristics. Unlike the kits and protocols of well-developed bench experiments, most advanced machine learning methods are unlikely to yield reliable new knowledge from big omics data if they are applied as automated tools without a real understanding of the principles behind them and of the biological question under investigation.

This tutorial will introduce the framework of machine learning theory, explain the major principles of classical and newly emerging machine learning methods, present details of some representative methods and their application examples, and discuss their potential, open questions and common pitfalls in applying them to precision medicine studies.

Instructors

Xuegong Zhang, Professor of Pattern Recognition and Bioinformatics, Tsinghua University
Rui Jiang, Associate Professor of Bioinformatics and Pattern Recognition, Tsinghua University

Who should attend

Bioinformatics Researchers who are interested in recent advances in machine learning, and who have had previous exposure to machine learning concepts/methods but may lack a systematic view of the field.
Biologists and Medical Scientists who are curious about the recent buzz around machine learning and AI, and who are wondering about opportunities for ML applications in their work.

Short Course Agenda

Introduction (~20 min)
- Introduction to Precision Medicine concepts
- Introduction to Machine Learning concepts
- Performance assessment of a learning machine
Supervised Learning (~40 min)
- Classical methods: a quick list
- Artificial Neural Networks
- Support Vector Machines and the VC theory about generalization
Unsupervised Learning (~30 min)
- Classical clustering methods
- Density estimation
- Latent variable models
Deep Learning (~60 min)
- Deep learning vs. “shallow learning”
- Convolutional neural networks (CNN)
- Auto-encoders (AE)
- Deep belief networks (DBN) and restricted Boltzmann machines (RBM)
Enabling technologies for machine learning with big data: a brief view (~20 min)
- Parallel computing
- Heterogeneous computing (GPU, FPGA, etc.)
- Distributed computing
- Cloud computing
Useful Resources and Discussions (~20 min)

Registration

Please refer to the Registration section for pricing and registration link.