| Title: | Functional Segmentation of the Methylome |
|---|---|
| Description: | Implements FUSE (Functional Segmentation of DNA methylation data), a data-driven method for identifying spatially coherent DNA methylation segments from whole-genome bisulfite sequencing (WGBS) count data. The method performs hierarchical clustering of CpG sites based on methylated and unmethylated read counts across multiple samples and determines the optimal number of segments using an information criterion (AIC or BIC). Resulting segments represent regions with homogeneous methylation profiles across the input cohort while allowing sample-specific methylation levels. The package provides functions for clustering, model selection, tree cutting, segment-level summarization, and visualization. Input can be supplied as count matrices or extracted directly from 'BSseq' and 'methrix' objects. |
| Authors: | Susanna Holmström [aut, cre] (ORCID: <https://orcid.org/0009-0005-7542-5913>), Antti Hakkinen [ctb, cph] |
| Maintainer: | Susanna Holmström <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-06-03 10:00:55 UTC |
| Source: | https://github.com/holmsusa/methfuse |
Produces a hierarchical clustering tree based on the input matrices of counts.
fuse.cluster(x, ...) ## Default S3 method: fuse.cluster(x, K1, chr = NULL, pos = NULL, ...) ## S3 method for class 'BSseq' fuse.cluster(x, ...) ## S3 method for class 'methrix' fuse.cluster(x, ...)fuse.cluster(x, ...) ## Default S3 method: fuse.cluster(x, K1, chr = NULL, pos = NULL, ...) ## S3 method for class 'BSseq' fuse.cluster(x, ...) ## S3 method for class 'methrix' fuse.cluster(x, ...)
x |
Input object. One of:
|
... |
Additional arguments if K0 is a matrix. |
K1 |
Methylated count matrix (if x is matrix). |
chr |
Chromosome labels (if x is matrix). |
pos |
Genomic positions (if x is matrix. |
A clustering tree of class hclust.
# Example: Clustering generated data set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) tree# Example: Clustering generated data set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) tree
Divides the clustering tree into a specified number of clusters.
fuse.cut.tree(tree, k)fuse.cut.tree(tree, k)
tree |
Clustering tree of class |
k |
Number of clusters |
A vector indicating which cluster each element in the original data frame belonged to
# Example: Cutting small tree in 2 segments set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) segments <- fuse.cut.tree(tree, 2) segments# Example: Cutting small tree in 2 segments set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) segments <- fuse.cut.tree(tree, 2) segments
Performs the full FUSE segmentation workflow: hierarchical clustering, model selection, tree cutting, and genomic segment summarization.
fuse.segment(x, ...) ## Default S3 method: fuse.segment(x, K1, chr, pos, method = c("BIC", "AIC"), ...) ## S3 method for class 'BSseq' fuse.segment(x, method = c("BIC", "AIC"), ...) ## S3 method for class 'methrix' fuse.segment(x, method = c("BIC", "AIC"), ...)fuse.segment(x, ...) ## Default S3 method: fuse.segment(x, K1, chr, pos, method = c("BIC", "AIC"), ...) ## S3 method for class 'BSseq' fuse.segment(x, method = c("BIC", "AIC"), ...) ## S3 method for class 'methrix' fuse.segment(x, method = c("BIC", "AIC"), ...)
x |
Input object. One of:
|
... |
Additional arguments (matrix input only) |
K1 |
Methylated count matrix (if x is matrix). |
chr |
Chromosome labels (if x is matrix). |
pos |
Genomic positions (if x is matrix. |
method |
Information criterion for model selection:
For internal use, |
fuse.segment() is an S3 generic with methods for:
matrixRaw count matrices (K0, K1) with genomic annotation.
BSseqBioconductor BSseq objects.
methrixBioconductor methrix objects (supports DelayedMatrix).
An object of class fuse_summary, containing:
Data frame with one row per genomic segment.
Matrix of per-sample methylation estimates.
Per-CpG methylation estimates.
Genomic positions of CpGs.
For BSseq objects:
Methylated counts are obtained via getCoverage(x, "M")
Unmethylated counts via getCoverage(x, "Cov") - M
Chromosome and position from rowRanges(x)
For methrix objects:
Methylated counts via get_matrix(x, "M")
Total coverage via get_matrix(x, "C")
Unmethylated counts computed as C - M
Genomic coordinates extracted from locus metadata
Summarizes FUSE segmentation results into one row per segment, including genomic coordinates, CpG count, segment length, average methylation (beta), and stability flag based on likelihood testing. Also returns per-sample methylation estimates for each segment. Result can be visualized using plot(result).
fuse.summary(K0, K1, chr, pos, segments)fuse.summary(K0, K1, chr, pos, segments)
K0 |
Integer or numeric matrix of unmethylated counts. |
K1 |
Integer or numeric matrix of methylated counts. |
chr |
Character vector giving the chromosome for each site. |
pos |
Numeric vector giving genomic coordinates for each site. |
segments |
Integer vector giving segment IDs for each site in K0 and K1. |
A list with four elements:
A data frame with one row per segment and the following columns:
Segment: Segment ID
Chr: Chromosome
Start: Start genomic coordinate
End: End genomic coordinate
CpGs: Number of CpGs in the segment
Length: Genomic length (End - Start + 1)
Beta: Average methylation across samples and CpGs
Coherent: Logical indicator (TRUE if segment is coherently methylated, else FALSE)
Matrix of per-sample methylation estimates for each segment (rows = segments, columns = samples).
Average beta per CpG site, used for plotting.
Genomic position for every given CpG site, used for plotting.
set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) segments <- fuse.cut.tree(tree, 4) res <- fuse.summary(K0, K1, rep("chr1", nrow(K0)), 1:nrow(K0), segments) head(res$summary) head(res$betas_per_segment)set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) segments <- fuse.cut.tree(tree, 4) res <- fuse.summary(K0, K1, rep("chr1", nrow(K0)), 1:nrow(K0), segments) head(res$summary) head(res$betas_per_segment)
Determines the optimal number of clusters to cut a hierarchical clustering tree, based on the selected information criterion (e.g., BIC or AIC).
number.of.clusters(tree, n, method = c("BIC", "AIC"))number.of.clusters(tree, n, method = c("BIC", "AIC"))
tree |
Clustering tree of class |
n |
Number of samples in the original data. |
method |
Information criterion method. One of |
An integer representing the optimal number of clusters.
# Example: Determine number of clusters in dummy data set set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) k <- number.of.clusters(tree, ncol(K0), 'BIC') k# Example: Determine number of clusters in dummy data set set.seed(1234) K0 <- matrix( rep(c(sample(0:20, 200, replace = TRUE), sample(20:40, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) K1 <- matrix( rep(c(sample(20:40, 200, replace = TRUE), sample(0:20, 200, replace = TRUE)), 2), nrow = 100, byrow = TRUE ) tree <- fuse.cluster(K0, K1) k <- number.of.clusters(tree, ncol(K0), 'BIC') k
Plotting method for fuse_summary.
## S3 method for class 'fuse_summary' plot(x, ..., segments_to_plot = 1:50)## S3 method for class 'fuse_summary' plot(x, ..., segments_to_plot = 1:50)
x |
A fuse_summary object |
... |
Additional arguments |
segments_to_plot |
Integer vector of segment indices |
Raw CpG-level methylation values are shown as grey points. Segment-level methylation is shown as horizontal bars (red = hypermethylated, blue = hypomethylated).
No return value, called for side effects.