UMAP for Data Integration

In this article, I will demonstrate how to perform graph-based integration of single-cell Omics (scOmics) data across modalities, using a graph intersection approach with igraph and UMAP.

This is the twentieth article in the column Mathematical Statistics and Machine Learning for Life Sciences, where I try to explain some mysterious analytical techniques used in Bioinformatics and Computational Biology in a simple way. Data integration is an important next step toward improving analysis accuracy: combining multiple sources of information lets us exploit the synergistic effects between them. In Computational Biology and Biomedicine, data integration is making particular advances in the single-cell research area. Last year, Nature Methods named Single-Cell Multimodal Omics its Method of the Year 2019.

Idea Behind Data Integration

When combining data that originate from different statistical distributions, it is probably not a good idea to simply concatenate them without taking their individual distributions into account, since some may be binary and others categorical or continuous. Working with scOmics data, one often deals with continuous scRNAseq (transcriptomics) / scProteomics (proteomics) data and binary scBSseq (methylation) / scATACseq (open chromatin regions) data. Suppose the individual data types can distinguish cells from sick and healthy individuals with accuracies of 78%, 83% and 75% (the numbers are made up). Then one expectation from combining the multiple scOmics would be a boost in prediction accuracy (e.g. up to 96%), because technology-specific noise is reduced and the signal that is consistent across scOmics is enhanced.
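
To get a feeling for why combining sources can help, here is a minimal back-of-the-envelope sketch in Python. It rests on two assumptions of my own that the setup above does not state: the three data types make their errors independently of each other, and they are combined by a simple majority vote. Real integration methods are more sophisticated, and the 96% above remains an illustrative number, so take this only as an indication of the direction of the effect. The function name majority_vote_accuracy is hypothetical.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """Probability that a majority of classifiers is correct,
    ASSUMING their errors are statistically independent."""
    n = len(accuracies)
    total = 0.0
    # Enumerate every correct/wrong pattern across the classifiers
    for outcome in product([True, False], repeat=n):
        if sum(outcome) > n / 2:  # strict majority is correct
            prob = 1.0
            for correct, acc in zip(outcome, accuracies):
                prob *= acc if correct else 1.0 - acc
            total += prob
    return total


# Made-up accuracies of the individual scOmics from the text
single = [0.78, 0.83, 0.75]
combined = majority_vote_accuracy(single)

print(f"best single modality:   {max(single):.3f}")
print(f"majority vote of three: {combined:.3f}")
```

Under these assumptions the majority vote comes out at about 88%, above the best single data type (83%). Correlated errors between technologies would shrink this gain, which is one reason why the way the data are combined matters.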

There are different approaches to combining the individual scOmics. One possible way is to convert the scOmics into a common non-parametric space where they lose the memory of their technology of origin. This is how, for example, Artificial Neural Networks (ANN) are capable of collapsing multiple sources of information, as I described in detail in one of my previous posts. Similarity Network Fusion (SNF) also belongs to this type of approach. Another approach is to explicitly model the individual data distributions and combine them into a joint probability using Bayes rule. Finally, one can try to extract the common variation (i.e. disentangle the technology-specific and shared variation in the data) and factor it (split it into orthogonal components) for better interpretation, which is the way used by PLS, CCA and Factor Analysis.
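
The Bayes rule approach can be sketched for a single cell as follows. This is only a toy illustration under my own assumptions, not a method used later in the article: one continuous feature (say, an expression value) is modeled as Gaussian, one binary feature (say, an open / closed chromatin region) as Bernoulli, and the two are assumed to be conditionally independent given the class, as in naive Bayes. All parameter values and names (params, posterior) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, used for the continuous modality."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def bernoulli_pmf(x, p):
    """Probability of a binary observation, used for the binary modality."""
    return p if x == 1 else 1.0 - p


# Hypothetical, made-up class parameters for illustration only
params = {
    "sick":    {"prior": 0.5, "expr_mean": 2.0, "expr_sd": 1.0, "open_prob": 0.8},
    "healthy": {"prior": 0.5, "expr_mean": 0.0, "expr_sd": 1.0, "open_prob": 0.3},
}


def posterior(expression, is_open):
    """P(class | both modalities), assuming the modalities are
    conditionally independent given the class (naive Bayes)."""
    joint = {}
    for label, p in params.items():
        joint[label] = (
            p["prior"]
            * gaussian_pdf(expression, p["expr_mean"], p["expr_sd"])
            * bernoulli_pmf(is_open, p["open_prob"])
        )
    evidence = sum(joint.values())  # normalizing constant in Bayes rule
    return {label: value / evidence for label, value in joint.items()}


# Expression of 1.0 lies exactly between the two class means, so the
# continuous modality alone cannot decide; the binary modality can.
post = posterior(expression=1.0, is_open=1)
print(post)
```

In this toy case the continuous feature is completely ambiguous, and it is the binary feature that tips the posterior towards "sick" (about 0.73). Each data type keeps its own distribution, and the combination happens only at the level of probabilities.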
