The epiCMIT (epigenetically-determined Cumulative MIToses) mitotic clock represents a relative measure of the total proliferative history of normal and neoplastic B cells. It is built considering the highest score from their two underlying hyper- and hypomethylation-based mitotic clocks, called epiCMIT-hyper and the epiCMIT-hypo, respectively. The code I provide here calculates the three mitotic clocks for each sample. All of them range from 0 to 1, depending on low or high relative proliferative history. Based on the data analyzed in the manuscript, considering hyper- or hypomethylation separately may not be sufficient to capture the entire mitotic history of cancer cells. A comprehensive selection of CpGs was performed to build the epiCMIT. Nonetheless, given the careful CpG filtering it is likely that epiCMIT represent a pan-cancer mitotic clock. Please note that the proliferative history of B-cell tumors include the proliferative history of normal B-cell development and the proliferative history of malignant transformation and progression. The epiCMIT should be compared then among B-cell tumors with the same cellular origin! The current code supports several DNA methylation approaches, including 450k and EPIC Illumina arrays, and the next generation sequencing (NGS) approaches single (S) and paired end (PE) RRBS, ERRBS and WGBS. The original epiCMIT implementation for Illumina arrays was developed in Duran-Ferrer M 2020, whereas the strategy for NGS-based approaches was developed in the CLL-map study Knisbacher, Lin, Hahn, Nadeu, Duran-Ferrer, 2022.
The epiCMIT mitotic clock, from Strati, P., Green, M.R. A, Nat Cancer, 2020.
Please, make sure you have at least version 1.44.0 of the GenomicRanges package to run the following lines of code. A complete list of packages and versions used to compile this vignette is available at the end.
## Load packages and set general options
options(stringsAsFactors = F,error=NULL)
##needed packages
library(GenomicRanges)
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
library(data.table)
##
## Attaching package: 'data.table'
## The following object is masked from 'package:GenomicRanges':
##
## shift
## The following object is masked from 'package:IRanges':
##
## shift
## The following objects are masked from 'package:S4Vectors':
##
## first, second
We will download the data from github. Make sure you have internet connection.
download.file("https://github.com/Duran-FerrerM/Pan-B-cell-methylome/raw/master/data/Estimate.epiCMIT.RData", destfile = "Estimate.epiCMIT.RData", method="libcurl")
load("Estimate.epiCMIT.RData")
## Take a look at RRBS data. Please, note that your data can be a matrix, data.frame or GRange object.
print(RRBS.SE.hg19.Examples$CW101)
## NULL
The methodology to estimate the epiCMIT is the following:
DNAm.to.epiCMIT
. This function converts a DNA methylation matrix (DNAm) into a suitable GRange object to calculate the epiCMIT and contains the following arguments:DNAm
: DNA methylation matrix. Should be a matrix, data.frame or GRanges object.DNAm.genome.assembly
: The genome assembly of the DNAm matrix. Supported versions are “hg19” and “hg38”.map.DNAm.to
: Map the data to Illumina-450k epiCMIT-CpGs or WGBS epiCMIT-CpGs. Should be “Illumina.450K.epiCMIT” or “WGBS.epiCMIT”.min.epiCMIT.CpGs
: Integer, default 800 CpGs. Recommended a minimum of 800 epiCMIT-CpGs, less CpGs have not been thoroughly tested.CpGs.as.DNAm.rownames
: logical, default TRUE. Are rownames of your matrix CpG names? If so they will be used for mapping. If FALSE, coordinates should be provided. If DNAm
is of class matrix and CpGs.as.DNAm.rownames=FALSE
, CpG coordinates and DNAm info columns from DNAm should be provided in the following arguments:DNAm.chr
, DNAm.start
, DNAm.end
, DNAm.meth
DNAm.to.epiCMIT
with the epiCMIT
function, which contains the following arguments:DNAm.epiCMIT
:The GRange object after running DNAm.to.epiCMIT
function.return.epiCMIT.annot
: logical indicating whether to export the epiCMIT-CpGs used and metadata info.export.results
: logical, indicating whether to export the results.export.results.dir
: character indicating the path to export the results.export.results.name
: character string with the name for the exported file.The original epiCMIT estimation is calculated from 450k and EPIC DNA methylation data.
##
## Calculate epiCMIT in example Illumina 450K data from Duran-Ferrer 2020, Nature Cancer
##
head(Illumina.450k.hg19.example)
## S1 S1.1 Bcells.NBC Bcells.NBC.1 Bcells.GC.CB
## cg19715410 0.03281870 0.03204408 0.1315335 0.0700329 0.1107626
## cg18279094 0.02607567 0.02970688 0.1606909 0.1352674 0.2294113
## cg11823511 0.04829795 0.06345230 0.2384474 0.1095039 0.2925793
## cg24453699 0.08637016 0.06725103 0.2145321 0.1373994 0.4452114
## cg03989260 0.08927458 0.10001880 0.3149984 0.2808355 0.3639931
## cg14565725 0.08295045 0.10734008 0.1662055 0.1278814 0.2003273
## Bcells.GC.CB.1 Bcells.ncsMBC Bcells.ncsMBC.1 Bcells.bmPC
## cg19715410 0.2497778 0.2458823 0.4375922 0.5352242
## cg18279094 0.4184098 0.4309570 0.5719976 0.6644706
## cg11823511 0.4131022 0.3857834 0.5039130 0.6403677
## cg24453699 0.4590231 0.3955716 0.4534276 0.6935022
## cg03989260 0.4303416 0.4102787 0.4789490 0.5853757
## cg14565725 0.3058882 0.3349291 0.3574508 0.5383428
## Bcells.bmPC.1 ALL_637 ALL_111 MCL.M199 MCL.M195
## cg19715410 0.5361781 0.6234612 0.5251786 0.2117264 0.2466298
## cg18279094 0.7122690 0.6979892 0.7992767 0.8451157 0.9494396
## cg11823511 0.6835919 0.6410632 0.2016332 0.7028695 0.9289529
## cg24453699 0.6827635 0.3946649 0.7623484 0.8900264 0.9667127
## cg03989260 0.5905094 0.7772374 0.6203727 0.8037098 0.9220087
## cg14565725 0.5540709 0.5948486 0.9020117 0.5273358 0.4137479
## DLBCL.DL30.D2611 DLBCL.DL33.D1450 CLL.131 CLL.519 MM_34475
## cg19715410 0.7191328 0.5545599 0.7516146 0.3368856 0.5450536
## cg18279094 0.8543303 0.5978679 0.8471168 0.8086633 0.8061107
## cg11823511 0.7264451 0.4858303 0.9281268 0.4706775 0.7506979
## cg24453699 0.8233838 0.4958754 0.9069049 0.4134561 0.8409076
## cg03989260 0.8014943 0.4929729 0.3928446 0.2600882 0.5510840
## cg14565725 0.7680909 0.5080088 0.2889624 0.2526042 0.2311253
## MM_38267
## cg19715410 0.4488824
## cg18279094 0.7297705
## cg11823511 0.8005615
## cg24453699 0.9191897
## cg03989260 0.2505447
## cg14565725 0.7836650
DNAm.epiCMIT <- DNAm.to.epiCMIT(DNAm = Illumina.450k.hg19.example,
DNAm.genome.assembly = "hg19",
map.DNAm.to = "Illumina.450K.epiCMIT",
min.epiCMIT.CpGs = 800 # minimum recommended
)
## Mapping the DNAm methylation matrix in hg19 genome assembly to 450K-based epiCMIT as reported in Duran-Ferrer.M 2020.
DNAm.epiCMIT
## GRanges object with 1348 ranges and 20 metadata columns:
## seqnames ranges strand | S1 S1.1 Bcells.NBC
## <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
## cg19715410 chr1 29139021 * | 0.0328187 0.0320441 0.131534
## cg18279094 chr1 63790044 * | 0.0260757 0.0297069 0.160691
## cg11823511 chr1 91183697 * | 0.0482980 0.0634523 0.238447
## cg24453699 chr1 91190891 * | 0.0863702 0.0672510 0.214532
## cg03989260 chr1 119529360 * | 0.0892746 0.1000188 0.314998
## ... ... ... ... . ... ... ...
## cg04819337 chr21 44982367 * | 0.964340 0.941688 0.825908
## cg05705496 chr21 45666671 * | 0.915620 0.919199 0.801698
## cg07443748 chr22 17073594 * | 0.935098 0.895979 0.818871
## cg02724647 chr22 17646207 * | 0.942553 0.933896 0.820176
## cg09331835 chr22 21692413 * | 0.915718 0.927865 0.836925
## Bcells.NBC.1 Bcells.GC.CB Bcells.GC.CB.1 Bcells.ncsMBC
## <numeric> <numeric> <numeric> <numeric>
## cg19715410 0.0700329 0.110763 0.249778 0.245882
## cg18279094 0.1352674 0.229411 0.418410 0.430957
## cg11823511 0.1095039 0.292579 0.413102 0.385783
## cg24453699 0.1373994 0.445211 0.459023 0.395572
## cg03989260 0.2808355 0.363993 0.430342 0.410279
## ... ... ... ... ...
## cg04819337 0.854883 0.644030 0.568468 0.548715
## cg05705496 0.873003 0.544333 0.520256 0.631695
## cg07443748 0.843503 0.623809 0.579205 0.543690
## cg02724647 0.862859 0.741641 0.672352 0.639689
## cg09331835 0.874563 0.815323 0.675565 0.646640
## Bcells.ncsMBC.1 Bcells.bmPC Bcells.bmPC.1 ALL_637 ALL_111
## <numeric> <numeric> <numeric> <numeric> <numeric>
## cg19715410 0.437592 0.535224 0.536178 0.623461 0.525179
## cg18279094 0.571998 0.664471 0.712269 0.697989 0.799277
## cg11823511 0.503913 0.640368 0.683592 0.641063 0.201633
## cg24453699 0.453428 0.693502 0.682763 0.394665 0.762348
## cg03989260 0.478949 0.585376 0.590509 0.777237 0.620373
## ... ... ... ... ... ...
## cg04819337 0.520183 0.306169 0.332087 0.881284 0.933870
## cg05705496 0.568901 0.340151 0.351566 0.483426 0.427183
## cg07443748 0.504945 0.322319 0.342618 0.432292 0.179971
## cg02724647 0.615848 0.436913 0.423863 0.913431 0.911068
## cg09331835 0.637598 0.403076 0.382335 0.815445 0.723059
## MCL.M199 MCL.M195 DLBCL.DL30.D2611 DLBCL.DL33.D1450 CLL.131
## <numeric> <numeric> <numeric> <numeric> <numeric>
## cg19715410 0.211726 0.246630 0.719133 0.554560 0.751615
## cg18279094 0.845116 0.949440 0.854330 0.597868 0.847117
## cg11823511 0.702870 0.928953 0.726445 0.485830 0.928127
## cg24453699 0.890026 0.966713 0.823384 0.495875 0.906905
## cg03989260 0.803710 0.922009 0.801494 0.492973 0.392845
## ... ... ... ... ... ...
## cg04819337 0.495737 0.3719058 0.430579 0.446953 0.413158
## cg05705496 0.302435 0.3551045 0.252200 0.728992 0.519884
## cg07443748 0.545637 0.0859663 0.362245 0.576011 0.211775
## cg02724647 0.519634 0.2636301 0.571378 0.543850 0.189855
## cg09331835 0.613091 0.2608929 0.728922 0.686221 0.539346
## CLL.519 MM_34475 MM_38267
## <numeric> <numeric> <numeric>
## cg19715410 0.336886 0.545054 0.448882
## cg18279094 0.808663 0.806111 0.729771
## cg11823511 0.470678 0.750698 0.800561
## cg24453699 0.413456 0.840908 0.919190
## cg03989260 0.260088 0.551084 0.250545
## ... ... ... ...
## cg04819337 0.731931 0.273938 0.209698
## cg05705496 0.663109 0.374354 0.299490
## cg07443748 0.408895 0.305450 0.217465
## cg02724647 0.858257 0.368123 0.226500
## cg09331835 0.688484 0.300242 0.266912
## -------
## seqinfo: 22 sequences from an unspecified genome; no seqlengths
##calculate epiCMIT
epiCMIT.Illumina <- epiCMIT(DNAm.epiCMIT = DNAm.epiCMIT,
return.epiCMIT.annot = FALSE,
export.results = TRUE,
export.results.dir = ".",
export.results.name = "Illumina.450k.example_"
)
## Calculating the epiCMIT mitotic clock with DNAm.to.epiCMIT() parameters:
## DNAm.class = data.frame
## DNAm.genome.assembly = hg19
## map.DNAm.to = Illumina.450K.epiCMIT
## CpGs.as.DNAm.rownames = TRUE
## DNAm.chr = NULL
## DNAm.start = NULL
## DNAm.end = NULL
## DNAm.meth = NULL
## min.epiCMIT.CpGs = 800
head(epiCMIT.Illumina$epiCMIT.scores)
## Samples epiCMIT epiCMIT.hyper epiCMIT.hypo
## S1 S1 0.06727728 0.06428607 0.06727728
## S1.1 S1.1 0.06690286 0.06087516 0.06690286
## Bcells.NBC Bcells.NBC 0.18628517 0.18150005 0.18628517
## Bcells.NBC.1 Bcells.NBC.1 0.15363033 0.14141785 0.15363033
## Bcells.GC.CB Bcells.GC.CB 0.33186043 0.22794942 0.33186043
## Bcells.GC.CB.1 Bcells.GC.CB.1 0.40810107 0.35610520 0.40810107
epiCMIT.Illumina.results <- cbind(epiCMIT.Illumina$epiCMIT.scores,
epiCMIT.CpGs=epiCMIT.Illumina$epiCMIT.run.info$epiCMIT.CpGs,
epiCMIT.hyper.CpGs=epiCMIT.Illumina$epiCMIT.run.info$epiCMIT.hyper.CpGs,
epiCMIT.hypo.CpGs=epiCMIT.Illumina$epiCMIT.run.info$epiCMIT.hypo.CpGs
)
DT::datatable(epiCMIT.Illumina.results, options = list(scrollX = T, scrollY=T), rownames = F)
##export
fwrite(epiCMIT.Illumina.results,"epiCMIT.Illumina.arrays.scores.tsv",sep="\t")
For NGS-based DNA methylaion data, a new strategy was developed in the CLL-map study. Briefly, an extended epiCMIT-CpG catalogue was found using Whole Genome Bisulphite data and is systematically searched in each sample separately, irrespective of CpG numbers and identity. All epiCMIT-CpG set behaves very similar and thus can be applied. You can check a more detailed tutorial here.
To estimate the epiCMIT in all samples, we can then create a loop with the R function lapply
and run both functions at the same time for all the samples to subsequently store all the results into a list. To do so, we can execute the following lines of code:
##
## Example with RRBS-SE hg19 data
##
epiCMIT.RRBS <- suppressWarnings( ## supress warning so that all warning do not appear in the html document.
suppressMessages(
lapply(names(RRBS.SE.hg19.Examples),function(sample.i){
##transform data to run epiCMIT() function.
DNAm.epiCMIT <- DNAm.to.epiCMIT(DNAm = RRBS.SE.hg19.Examples[[sample.i]],
DNAm.genome.assembly = "hg19",
map.DNAm.to = "WGBS.epiCMIT",
min.epiCMIT.CpGs = 800 # minimum recommended
)
##calculate epiCMIT
epiCMIT.results <- epiCMIT(DNAm.epiCMIT = DNAm.epiCMIT,
return.epiCMIT.annot = TRUE,
export.results = FALSE, ## change to TRUE to export results for every single sample
export.results.dir = ".",
export.results.name = paste0("RRBS-SE.",sample.i,"_example_")
)
})
)
)
We now can gather all the relevant results into a matrix-like structure so that we can export all the table at once.
## prepare data to export it.
epiCMIT.RRBS.scores <- do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.scores"]]}))
epiCMIT.RRBS.scores$epiCMIT.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hyper.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hyper.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hypo.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hypo.CpGs"]]})))
head(epiCMIT.RRBS.scores[,-1])
## epiCMIT epiCMIT.hyper epiCMIT.hypo epiCMIT.CpGs epiCMIT.hyper.CpGs
## DFCI-5069 0.4691847 0.2696861 0.4691847 4826 1038
## DFCI-5071 0.5942540 0.5942540 0.4730709 3065 743
## DFCI-5102 0.5817348 0.4411374 0.5817348 5674 1236
## DFCI-5072 0.4945748 0.4693410 0.4945748 4362 955
## DFCI-5073 0.5578167 0.4235328 0.5578167 4258 958
## DFCI-5074 0.6760768 0.6065394 0.6760768 1096 241
## epiCMIT.hypo.CpGs
## DFCI-5069 3788
## DFCI-5071 2322
## DFCI-5102 4438
## DFCI-5072 3407
## DFCI-5073 3300
## DFCI-5074 855
DT::datatable(epiCMIT.RRBS.scores, options = list(scrollX = T, scrollY=T), rownames = F)
##export
fwrite(epiCMIT.RRBS.scores,"epiCMIT.RRBS.scores.tsv",sep="\t")
knitr::opts_chunk$set(echo = TRUE)
## Load packages and set general options
options(stringsAsFactors = F,error=NULL)
##needed packages
library(GenomicRanges)
library(data.table)
download.file("https://github.com/Duran-FerrerM/Pan-B-cell-methylome/raw/master/data/Estimate.epiCMIT.RData", destfile = "Estimate.epiCMIT.RData", method="libcurl")
load("Estimate.epiCMIT.RData")
## Take a look at RRBS data. Please, note that your data can be a matrix, data.frame or GRange object.
print(RRBS.SE.hg19.Examples$CW101)
##
## Calculate epiCMIT in example Illumina 450K data from Duran-Ferrer 2020, Nature Cancer
##
head(Illumina.450k.hg19.example)
DNAm.epiCMIT <- DNAm.to.epiCMIT(DNAm = Illumina.450k.hg19.example,
DNAm.genome.assembly = "hg19",
map.DNAm.to = "Illumina.450K.epiCMIT",
min.epiCMIT.CpGs = 800 # minimum recommended
)
DNAm.epiCMIT
##calculate epiCMIT
epiCMIT.Illumina <- epiCMIT(DNAm.epiCMIT = DNAm.epiCMIT,
return.epiCMIT.annot = FALSE,
export.results = TRUE,
export.results.dir = ".",
export.results.name = "Illumina.450k.example_"
)
head(epiCMIT.Illumina$epiCMIT.scores)
epiCMIT.Illumina.results <- cbind(epiCMIT.Illumina$epiCMIT.scores,
epiCMIT.CpGs=epiCMIT.Illumina$epiCMIT.run.info$epiCMIT.CpGs,
epiCMIT.hyper.CpGs=epiCMIT.Illumina$epiCMIT.run.info$epiCMIT.hyper.CpGs,
epiCMIT.hypo.CpGs=epiCMIT.Illumina$epiCMIT.run.info$epiCMIT.hypo.CpGs
)
DT::datatable(epiCMIT.Illumina.results, options = list(scrollX = T, scrollY=T), rownames = F)
##export
fwrite(epiCMIT.Illumina.results,"epiCMIT.Illumina.arrays.scores.tsv",sep="\t")
##
## Example with RRBS-SE hg19 data
##
epiCMIT.RRBS <- suppressWarnings( ## supress warning so that all warning do not appear in the html document.
suppressMessages(
lapply(names(RRBS.SE.hg19.Examples),function(sample.i){
##transform data to run epiCMIT() function.
DNAm.epiCMIT <- DNAm.to.epiCMIT(DNAm = RRBS.SE.hg19.Examples[[sample.i]],
DNAm.genome.assembly = "hg19",
map.DNAm.to = "WGBS.epiCMIT",
min.epiCMIT.CpGs = 800 # minimum recommended
)
##calculate epiCMIT
epiCMIT.results <- epiCMIT(DNAm.epiCMIT = DNAm.epiCMIT,
return.epiCMIT.annot = TRUE,
export.results = FALSE, ## change to TRUE to export results for every single sample
export.results.dir = ".",
export.results.name = paste0("RRBS-SE.",sample.i,"_example_")
)
})
)
)
## prepare data to export it.
epiCMIT.RRBS.scores <- do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.scores"]]}))
epiCMIT.RRBS.scores$epiCMIT.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hyper.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hyper.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hypo.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hypo.CpGs"]]})))
head(epiCMIT.RRBS.scores[,-1])
DT::datatable(epiCMIT.RRBS.scores, options = list(scrollX = T, scrollY=T), rownames = F)
##export
fwrite(epiCMIT.RRBS.scores,"epiCMIT.RRBS.scores.tsv",sep="\t")
sessionInfo()
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] data.table_1.14.2 GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
## [4] IRanges_2.30.0 S4Vectors_0.34.0 BiocGenerics_0.42.0
## [7] BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] rstudioapi_0.13 XVector_0.36.0 knitr_1.39
## [4] magrittr_2.0.3 zlibbioc_1.42.0 R6_2.5.1
## [7] rlang_1.0.2 fastmap_1.1.0 stringr_1.4.0
## [10] tools_4.2.0 DT_0.23 xfun_0.31
## [13] cli_3.3.0 jquerylib_0.1.4 crosstalk_1.2.0
## [16] htmltools_0.5.2 yaml_2.3.5 digest_0.6.29
## [19] bookdown_0.27 GenomeInfoDbData_1.2.8 BiocManager_1.30.18
## [22] htmlwidgets_1.5.4 bitops_1.0-7 sass_0.4.1
## [25] RCurl_1.98-1.7 evaluate_0.15 rmarkdown_2.14
## [28] stringi_1.7.6 compiler_4.2.0 bslib_0.3.1
## [31] jsonlite_1.8.0