1 Introduction

The epiCMIT (epigenetically-determined Cumulative MIToses) mitotic clock represents a relative measure of the total proliferative history of normal and neoplastic B cells. It is built considering the highest score from their two underlying hyper- and hypomethylation-based mitotic clocks, called epiCMIT-hyper and the epiCMIT-hypo, respectively. The code I provide here calculates the three mitotic clocks for each sample. All of them range from 0 to 1, depending on low or high relative proliferative history. A comprehensive selection of CpGs was performed to build the epiCMIT. Nonetheless, given the careful CpG filtering it is likely that epiCMIT represent a pan-cancer mitotic clock. The current code supports several DNA methylation approaches, including 450k and EPIC Illumina arrays, and the next generation sequencing (NGS) approaches single (S) and paired end (PE) RRBS, ERRBS and WGBS, which are subsequently explained in the following section.

The epiCMIT mitotic clock, from Strati, P., Green, M.R. A, Nat Cancer, 2020.

Here, I provide the necessary steps to calculate the epiCMIT in NGS DNA methylation data. Like in the original strategy, the epiCMIT is built considering the highest score from their two underlying hyper- and hypomethylation-based mitotic clocks, called epiCMIT-hyper and the epiCMIT-hypo, respectively. Please, note that the epiCMIT should always be analyzed together with the CLL epitypes (or at least with IGHV status), as the total proliferative history is dependent on the CLL cellular origin, with n-CLL/unmutated CLL showing lower baseline epiCMIT compared to m-CLL/mutated CLLs.

The validity of the NGS-based epiCMIT mitotic score has been validated in the CLL 1100 study (Knisbacher, Lin, Hahn, Nadeu, Duran-Ferrer et al, Nat. Genet 2022,accepted) and can be summarized in the following figure. Briefly, the new NGS-based epiCMIT estimation is strongly correlated (R=0.94, P=4.5e-12) with the original methodology in Duran-Ferrer 2020 (panel a). This new NGS-based epiCMIT was compared in paired samples profiled with different platforms , including 450k Illumina array data, RRBS-SE and RRBS-PE data. Next, unsupervised principal component analysis (PCA) shows similar epiCMIT gradients irrespective of the number and identity of the epiCMIT-CpGs used, cohort compositions and batches, further supporting its robustness (panel b). Finally and most importantly, the epiCMIT is the one of the most important -and the only- continuous variable predicting clinical outcome in CLL in a multi-omic clincal data modelling (panel c). As the range in epiCMIT is 0.19-0.84, this implies that a n-CLL patient with an epiCMIT of 0.84 shows 6.2 times higher risk of dying compared to a n-CLL patient with 0.19, as the hazard ratio for each 0.1 epiCMIT increments is 1.4 in the multi-omic clinical data modelling reported in the CLL1100 study.

2 Load required data.

2.1 Load R packages.

## Load packages and set general options
options(stringsAsFactors = F,error=NULL)
##needed packages
library(GenomicRanges)

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## Loading required package: IRanges

## Loading required package: GenomeInfoDb

library(data.table)

## 
## Attaching package: 'data.table'

## The following object is masked from 'package:GenomicRanges':
## 
##     shift

## The following object is masked from 'package:IRanges':
## 
##     shift

## The following objects are masked from 'package:S4Vectors':
## 
##     first, second

2.2 Download and load required data into R.

download.file("https://github.com/Duran-FerrerM/Pan-B-cell-methylome/raw/master/data/Estimate.epiCMIT.RData", destfile = "Estimate.epiCMIT.RData", method="libcurl")
load("Estimate.epiCMIT.RData")

3 Analyses

The methodology to estimate the epiCMIT is the following:

Map your DNA methylation matrix to epiCMIT-CpGs and the correct genome assembly using the function DNAm.to.epiCMIT. This function converts a DNA methylation matrix (DNAm) into a suitable GRange object to calculate the epiCMIT and contains the following arguments:
- DNAm: DNA methylation matrix. Should be a matrix, data.frame or GRanges object.
- DNAm.genome.assembly: The genome assembly of the DNAm matrix. Supported versions are “hg19” and “hg38”.
- map.DNAm.to: Map the data to Illumina-450k epiCMIT-CpGs or WGBS epiCMIT-CpGs. Should be “Illumina.450K.epiCMIT” or “WGBS.epiCMIT”. +min.epiCMIT.CpGs: Integer, default 800 CpGs. Recommended a minimum of 800 epiCMIT-CpGs, less CpGs have not been thoroughly tested.
- CpGs.as.DNAm.rownames: logical, default TRUE. Are rownames of your matrix CpG names? If so they will be used for mapping. If FALSE, coordinates should be provided. If DNAm is of class matrix and CpGs.as.DNAm.rownames=FALSE, CpG coordinates and DNAm info columns from DNAm should be provided in the following arguments:DNAm.chr, DNAm.start, DNAm.end, DNAm.meth
Calculate the epiCMIT after running DNAm.to.epiCMIT with the epiCMITfunction, which contains the following arguments:
- DNAm.epiCMIT:The GRange object after running DNAm.to.epiCMIT function.
- return.epiCMIT.annot: logical indicating whether to export the epiCMIT-CpGs used and metadata info.
- export.results: logical, indicating whether to export the results.
- export.results.dir: character indicating the path to export the results.
- export.results.name: character string with the name for the exported file.

3.1 `DNAm.to.epiCMIT` and `epiCMIT` functions.

We can create a loop with the R function lapply and run both functions at the same time for all the samples to subsequently store all the results into a list. To do so, we can execute the following lines of code:

##
## Example with RRBS-SE hg19 data
##

epiCMIT.RRBS <- suppressWarnings( ## supress warning so that all warning do not appear in the html document.
  suppressMessages(
    lapply(names(RRBS.SE.hg19.Examples),function(sample.i){
  
  ##transform data to run epiCMIT() function.
  DNAm.epiCMIT <- DNAm.to.epiCMIT(DNAm = RRBS.SE.hg19.Examples[[sample.i]],
                                  DNAm.genome.assembly = "hg19",
                                  map.DNAm.to = "WGBS.epiCMIT",
                                  min.epiCMIT.CpGs = 800 # minimum recommended
  )
  
  ##calculate epiCMIT
  epiCMIT.results <- epiCMIT(DNAm.epiCMIT = DNAm.epiCMIT,
                             return.epiCMIT.annot = TRUE,
                             export.results = FALSE, ## change to TRUE to export results for every single sample
                             export.results.dir = ".",
                             export.results.name = paste0("RRBS-SE.",sample.i,"_example_")
  )
})
)
)

We now can gather all the relevant results into a matrix-like structure so that we can export all the table at once.

## prepare data to export it.
epiCMIT.RRBS.scores <- do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.scores"]]}))
epiCMIT.RRBS.scores$epiCMIT.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hyper.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hyper.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hypo.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hypo.CpGs"]]})))
head(epiCMIT.RRBS.scores[,-1])

##             epiCMIT epiCMIT.hyper epiCMIT.hypo epiCMIT.CpGs epiCMIT.hyper.CpGs
## DFCI-5069 0.4691847     0.2696861    0.4691847         4826               1038
## DFCI-5071 0.5942540     0.5942540    0.4730709         3065                743
## DFCI-5102 0.5817348     0.4411374    0.5817348         5674               1236
## DFCI-5072 0.4945748     0.4693410    0.4945748         4362                955
## DFCI-5073 0.5578167     0.4235328    0.5578167         4258                958
## DFCI-5074 0.6760768     0.6065394    0.6760768         1096                241
##           epiCMIT.hypo.CpGs
## DFCI-5069              3788
## DFCI-5071              2322
## DFCI-5102              4438
## DFCI-5072              3407
## DFCI-5073              3300
## DFCI-5074               855

##export
fwrite(epiCMIT.RRBS.scores,"epiCMIT.RRBS.scores.tsv",sep="\t")

3.2 Get all code at once

knitr::opts_chunk$set(echo = TRUE)

## Load packages and set general options
options(stringsAsFactors = F,error=NULL)
##needed packages
library(GenomicRanges)
library(data.table)


download.file("https://github.com/Duran-FerrerM/Pan-B-cell-methylome/raw/master/data/Estimate.epiCMIT.RData", destfile = "Estimate.epiCMIT.RData", method="libcurl")
load("Estimate.epiCMIT.RData")




##
## Example with RRBS-SE hg19 data
##

epiCMIT.RRBS <- suppressWarnings( ## supress warning so that all warning do not appear in the html document.
  suppressMessages(
    lapply(names(RRBS.SE.hg19.Examples),function(sample.i){
  
  ##transform data to run epiCMIT() function.
  DNAm.epiCMIT <- DNAm.to.epiCMIT(DNAm = RRBS.SE.hg19.Examples[[sample.i]],
                                  DNAm.genome.assembly = "hg19",
                                  map.DNAm.to = "WGBS.epiCMIT",
                                  min.epiCMIT.CpGs = 800 # minimum recommended
  )
  
  ##calculate epiCMIT
  epiCMIT.results <- epiCMIT(DNAm.epiCMIT = DNAm.epiCMIT,
                             return.epiCMIT.annot = TRUE,
                             export.results = FALSE, ## change to TRUE to export results for every single sample
                             export.results.dir = ".",
                             export.results.name = paste0("RRBS-SE.",sample.i,"_example_")
  )
})
)
)



## prepare data to export it.
epiCMIT.RRBS.scores <- do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.scores"]]}))
epiCMIT.RRBS.scores$epiCMIT.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hyper.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hyper.CpGs"]]})))
epiCMIT.RRBS.scores$epiCMIT.hypo.CpGs <- as.numeric(do.call(rbind,lapply(epiCMIT.RRBS,function(x){x[["epiCMIT.run.info"]][["epiCMIT.hypo.CpGs"]]})))
head(epiCMIT.RRBS.scores[,-1])

##export
fwrite(epiCMIT.RRBS.scores,"epiCMIT.RRBS.scores.tsv",sep="\t")


sessionInfo()

4 Session Information

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] data.table_1.14.2    GenomicRanges_1.48.0 GenomeInfoDb_1.32.2 
## [4] IRanges_2.30.0       S4Vectors_0.34.0     BiocGenerics_0.42.0 
## [7] BiocStyle_2.24.0    
## 
## loaded via a namespace (and not attached):
##  [1] rstudioapi_0.13        XVector_0.36.0         knitr_1.39            
##  [4] magrittr_2.0.3         zlibbioc_1.42.0        R6_2.5.1              
##  [7] rlang_1.0.2            fastmap_1.1.0          stringr_1.4.0         
## [10] tools_4.2.0            xfun_0.31              cli_3.3.0             
## [13] jquerylib_0.1.4        htmltools_0.5.2        yaml_2.3.5            
## [16] digest_0.6.29          bookdown_0.27          GenomeInfoDbData_1.2.8
## [19] BiocManager_1.30.18    bitops_1.0-7           sass_0.4.1            
## [22] RCurl_1.98-1.7         evaluate_0.15          rmarkdown_2.14        
## [25] stringi_1.7.6          compiler_4.2.0         bslib_0.3.1           
## [28] jsonlite_1.8.0

epiCMIT mitotic clock estimation in NGS DNA methylation data.

05/08/2022

1 Introduction

2 Load required data.

2.1 Load R packages.

2.2 Download and load required data into R.

3 Analyses

3.1 `DNAm.to.epiCMIT` and `epiCMIT` functions.

3.2 Get all code at once

4 Session Information

epiCMIT mitotic clock estimation in NGS DNA methylation data.

05/08/2022

1 Introduction

2 Load required data.

2.1 Load R packages.

2.2 Download and load required data into R.

3 Analyses

3.1 DNAm.to.epiCMIT and epiCMIT functions.

3.2 Get all code at once

4 Session Information

3.1 `DNAm.to.epiCMIT` and `epiCMIT` functions.