Unsupervised | | | | |
Network-based methods (variables) | Identification of subgroups of variables that span several data types | Coexpression measures are used to build a coexpression network across all data types, where variables are nodes and links represent correlations; clustering algorithms are later applied to obtain groups of variables | Outperforms single data type network analysis; combines the network with known pathways; large multilayer network density complicates biological interpretation | PAthway Representation and Analysis by DIrect Reference on Graphical Models (PARADIGM) [70]; Lemon-Tree [71] |
Network-based methods (patients) | Classifies patients on the basis of the similarity of their omic data | Constructs networks of samples for each available data type and efficiently fuses these into one comprehensive network | Identifies homogeneous clusters of individuals; goes beyond dichotomous patient classification by capturing continuous phenotypes; does not indicate which biological features drive the clustering of individuals | Similarity network fusion (SNF) [69] |
Bayesian methods | Builds a Bayesian model of relations across the different omic layers, with the aim of creating classifiers when linked to clinical information | Computes probabilistic relationships among variables that can express mutual dependencies between them | Allows prior assumptions about the distribution of each data type and about relations between data sets; good at modelling nonlinear relationships; high computational cost; difficulty of choosing prior distributions | Multiple dataset integration (MDI) [72]; patient-specific data fusion (PSDF) [73]; Bayesian consensus clustering (BCC) [74]; COpy Number and EXpression In Cancer (CONEXIC) [75] |
Matrix factorisation methods | Projects variations among data sets onto a dimension-reduced space | Algorithms aim to unravel a latent data matrix of reduced dimensionality that best explains the variation of observed variables across all data sets | Noise, dimension and heterogeneity reduction; requires heavy computation time and large memory | Joint non-negative matrix factorisation (NMF) [76]; iCluster+ [77]; joint Bayes factor [78] |
Supervised | | | | |
Network-based methods | Clinically meaningful groups are chosen as input for constructing different networks | Network construction relies on clinical characteristics of patients so as to allow analysis of underlying networks | Network models are tailored for prognosis or diagnosis; comparison of resulting networks is not straightforward | Analysis Tool for Heritable and Environmental Network Associations (ATHENA) [79]; jActiveModules [80] |
Machine learning | Clinical covariates and omics data are included in a machine learning model for prediction or classification | Machine learning methods use various kernel-based frameworks for data transformation, integration and classifier training | Model performance improves when large amounts of data are used for training; machine learning kernel parameterisation can be difficult | Semidefinite programming/support vector machine (SDP/SVM) [81]; feature selection multiple kernel learning (FSMKL) [82] |
Multistep analysis | A step-by-step semiautomated data integration process | Works independently on the different layers prior to integrating them by identifying variables differentially expressed in several layers | User has more control and flexibility over the workflow, filtering and selection of relevant biological/clinical information; relatively lower statistical or predictive power | Integrative Bayesian analysis of genomics data (iBAG) [83]; multiple concerted disruption (MCD) [84]; Anduril [85] |
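
The principle behind the unsupervised network-based methods (variables) can be illustrated with a minimal sketch. It assumes that all layers are measured on the same samples, uses absolute Pearson correlation as the coexpression measure, and takes connected components of the thresholded network as a simple stand-in for a clustering algorithm. The function names, the threshold of 0.8 and the toy values are hypothetical, and this is not the algorithm used by PARADIGM or Lemon-Tree.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_modules(layers, threshold=0.8):
    """Group variables from several omic layers into correlated modules.

    layers: {layer_name: {variable_name: [values across the same samples]}}
    threshold: assumed cut-off on |r| for drawing a link (illustrative only).
    """
    # Nodes are (layer, variable) pairs pooled across all data types.
    nodes = {
        (layer, var): values
        for layer, variables in layers.items()
        for var, values in variables.items()
    }
    # Link two nodes when their absolute correlation reaches the threshold.
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if abs(pearson(nodes[a], nodes[b])) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Connected components serve as a deliberately simple clustering step.
    modules, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(component)
    return modules


# Toy data: five samples measured in three layers (made-up values).
layers = {
    "transcriptome": {
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneB": [2.0, 5.0, 1.0, 4.0, 3.0],
    },
    "proteome": {"protA": [1.1, 2.1, 2.9, 4.2, 5.0]},
    "metabolome": {"metB": [2.1, 4.8, 1.2, 3.9, 3.0]},
}
modules = coexpression_modules(layers)
for module in modules:
    print(sorted(module))
```

In this toy example each module spans two data types, which is the kind of cross-layer grouping the table row describes. With real data the number of pairwise correlations grows quadratically with the number of variables, which relates to the network density issue noted in the table.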