1 Introduction

B-cell tumors comprise a variety of neoplasias derived from normal B cells of the haematopoietic system. Collectively, acute lymphoblastic leukemia (ALL), mantle cell lymphoma (MCL), chronic lymphocytic leukemia (CLL), diffuse large B-cell lymphoma (DLBCL) and multiple myeloma (MM) represent the majority of diagnosed B-cell neoplasias. They are further classified into subtypes with different clinicobiological features.

The code provided here implements a Pan B-cell tumor classifier algorithm with two steps. In the first step, an unknown sample is classified into one of the five major B-cell tumor entities listed above. In the second step, these major entities are further classified into their subtypes:

  • ALL subtypes: HeH, 11q23/MLL, t(12;21), t(1;19), t(9;22) and dic(9;20).
  • MCL subtypes: C1/cMCL and C2/nnMCL.
  • CLL subtypes: n-CLL/low programmed CLL, i-CLL/intermediate programmed CLL and m-CLL/high programmed CLL.
  • DLBCL subtypes: ABC and GCB.

Although an unbiased prediction can serve as an internal control, we recommend specifying the entity to the classifier algorithm when it is reliably known. The CpGs of the classifier are present in both Illumina 450k and EPIC arrays, so both platforms are supported. An optional preliminary step predicts the B-cell tumor content of the samples; it is strongly recommended, as the predictor assumes a minimum of 60% tumor cell content. Note that the tumor cell content prediction based on DNA methylation may not be accurate for some DLBCL and for MM cases, as we reported in Duran-Ferrer 2020. The strategy and accuracy of the classifier can be seen in the following figure:

2 Load required data

2.1 Load R packages

## Load packages and set general options
options(stringsAsFactors = FALSE, error = NULL)
## Required packages
library(e1071)      # SVM functions
library(xlsx)       # export of results to .xlsx files
library(data.table) # fwrite() for exporting tables

2.2 Download and load required data into R

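## mode = "wb" requests a binary transfer, which keeps the .RData file intact on Windows
download.file("https://github.com/Duran-FerrerM/Pan-B-cell-methylome/raw/master/data/B.cell.tumor.classifier.RData", destfile = "B.cell.tumor.classifier.RData", method = "libcurl", mode = "wb")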
load("B.cell.tumor.classifier.RData")
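
Because load() restores objects under their saved names, it can silently overwrite objects that already exist in the workspace. As a minimal sketch, loading the file into a separate environment first shows what it contains. The helper name list.rdata.objects is hypothetical, and the demonstration below uses a temporary file with toy objects rather than the downloaded classifier data.

```r
## Hypothetical helper: load an .RData file into its own environment and
## report the class of each object it contains, without touching the workspace.
list.rdata.objects <- function(path) {
  env <- new.env()
  object.names <- load(path, envir = env)  # load() returns the restored names
  vapply(object.names, function(nm) class(get(nm, envir = env))[1], character(1))
}

## Demonstration on a temporary file with toy objects (not the real classifier data)
toy.file <- tempfile(fileext = ".RData")
toy.betas <- matrix(runif(6), nrow = 3)
toy.function <- function(x) x
save(toy.betas, toy.function, file = toy.file)
toy.contents <- list.rdata.objects(toy.file)
print(toy.contents)
```

Applied to the downloaded file, list.rdata.objects("B.cell.tumor.classifier.RData") should list at least the two objects used in this document, Deconvolute.Bcell.tumor and example.betas.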

3 Analyses

3.1 Deconvolute.Bcell.tumor function

B-cell tumors are predicted with the function Deconvolute.Bcell.tumor, which takes the following arguments:

  • data: data.frame or matrix of methylation beta values, with named rows (CpGs) and columns (samples).
  • predict.tumor.cell.content: logical indicating whether to predict B-cell tumor content from the DNA methylation values.
  • microenvironment.CpGs: type of microenvironmental cells used for deconvolution. Bcells.2CpGs only allows the prediction of the B-cell proportion.
  • predict.Bcell.tumors: logical indicating whether to predict B-cell tumor entities and subtypes.
  • which.predictor: character string specifying whether to apply the full prediction (entity + subtype) or only one of the five individual predictors. Possible values are “entity.subtype”, “entity”, “ALL”, “CLL”, “DLBCL” or “MCL”. Defaults to “entity.subtype” (full prediction).
  • impute.missings: logical indicating whether to impute missing beta values. Defaults to FALSE. Use with caution: the effect of missing values on the predictions has not been extensively tested.
  • export: logical indicating whether to export the results to an .xlsx file. The file is named “methy.classifier.results.xlsx” and stored in the working directory.
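
Since the data argument expects named rows and columns and beta values, a quick check of the input before prediction can catch formatting problems early. The following is only a sketch: check.beta.matrix is a hypothetical helper that is not part of the classifier, the CpG identifiers are made up, and the assumption that beta values lie between 0 and 1 is the usual definition rather than something the function is known to enforce.

```r
## Hypothetical helper: basic sanity checks on a beta-value matrix before prediction.
## Returns a data.frame listing every missing value as a CpG/sample pair.
check.beta.matrix <- function(betas) {
  betas <- as.matrix(betas)
  stopifnot(is.numeric(betas))
  if (is.null(rownames(betas)) || is.null(colnames(betas))) {
    stop("rows (CpGs) and columns (samples) must both be named")
  }
  if (any(betas < 0 | betas > 1, na.rm = TRUE)) {
    stop("beta values must lie between 0 and 1")
  }
  missing.idx <- which(is.na(betas), arr.ind = TRUE)
  data.frame(CpG = rownames(betas)[missing.idx[, "row"]],
             Sample = colnames(betas)[missing.idx[, "col"]])
}

## Toy data; cg00000001 to cg00000003 are made-up identifiers, not classifier CpGs
toy.betas <- matrix(c(0.10, 0.85, NA, 0.40, 0.95, 0.02), nrow = 3,
                    dimnames = list(c("cg00000001", "cg00000002", "cg00000003"),
                                    c("Sample.A", "Sample.B")))
toy.missing <- check.beta.matrix(toy.betas)
print(toy.missing)
```

A table of missing values such as this one can help decide whether setting impute.missings to TRUE is acceptable for a given data set.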

NOTE: In leukemia or lymphoma samples, few normal B/T lymphocytes are expected to infiltrate the sample, and thus the B/T-cell signature present in the data can be understood as a surrogate of tumor cell content. This strategy proved correct for a large number of samples, as we reported in Duran-Ferrer M 2020 for MCL and CLL patients. This may not be the case for other tumors.
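
The 60% minimum tumor cell content assumed by the predictor can also be checked by hand once tumor content estimates are available. The sketch below uses a toy named vector, because the column holding the tumor fraction in the function output is not shown in this document. The helper flag.low.tumor.content is hypothetical, and fractions are assumed to be on a 0 to 1 scale.

```r
## Hypothetical helper: names of samples whose estimated tumor cell content
## falls below a threshold (fractions assumed to be on a 0-1 scale).
flag.low.tumor.content <- function(tumor.fraction, threshold = 0.6) {
  stopifnot(is.numeric(tumor.fraction), !is.null(names(tumor.fraction)))
  names(tumor.fraction)[!is.na(tumor.fraction) & tumor.fraction < threshold]
}

## Toy estimates, not real predictions
toy.fraction <- c(Sample.A = 0.92, Sample.B = 0.35, Sample.C = 0.60, Sample.D = 0.58)
low.samples <- flag.low.tumor.content(toy.fraction)
print(low.samples)
```

Samples exactly at the threshold are not flagged here; whether the function itself treats the boundary this way is not stated.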

The output consists of several tables: one with the cellular composition of the B-cell tumor samples, which depends on the assumed microenvironmental cells; further tables with the estimated probabilities of the B-cell tumor entities and subtypes; and a final table with the predicted entity/subtype for each sample. The output includes a raw SVM prediction and a suggested prediction that takes into account confusion among entities/subtypes. If only one predictor is applied (see arguments), the output only includes a table with the predictions of that specific predictor.
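
When reading the probability tables, it can help to reduce each one to the most probable class per sample together with its probability. The sketch below does this for a toy table, assuming samples in rows and classes in columns. The helper top.class is hypothetical and is not how the function derives its raw or suggested predictions; it is only a reading aid.

```r
## Hypothetical helper: most probable class and its probability for each sample.
## Assumes samples in rows and classes in columns.
top.class <- function(probabilities) {
  probabilities <- as.matrix(probabilities)
  best <- max.col(probabilities, ties.method = "first")
  data.frame(Sample = rownames(probabilities),
             Top.class = colnames(probabilities)[best],
             Top.probability = probabilities[cbind(seq_len(nrow(probabilities)), best)])
}

## Toy probabilities, not real classifier output
toy.probabilities <- matrix(c(0.90, 0.05, 0.05,
                              0.20, 0.45, 0.35),
                            nrow = 2, byrow = TRUE,
                            dimnames = list(c("Sample.A", "Sample.B"),
                                            c("ALL", "CLL", "MCL")))
toy.top <- top.class(toy.probabilities)
print(toy.top)
```

A top probability that is only slightly above the second-best class, as for Sample.B in the toy table, is a reason to look at the full set of probabilities rather than at the predicted label alone.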

3.2 Example data

## Full prediction (entity + subtype) on the example data, with tumor cell content estimation
Results <- Deconvolute.Bcell.tumor(data = example.betas,
                        predict.tumor.cell.content = TRUE,
                        microenvironment.CpGs = "Pan.Bcell.microenvironment",
                        predict.Bcell.tumors = TRUE,
                        which.predictor = "entity.subtype",
                        impute.missings = TRUE,
                        export = FALSE
                        )
## Loading required namespace: quadprog
## Warning in Deconvolute.Bcell.tumor(data = example.betas, predict.tumor.cell.content = T, : The following samples show <60% tumor cell content and thus B-cell tumor predictions may be inaccurate:
## S1;S1.2;S1.5;Gran;Gran.2;Gran.5;Tcells.CD8T.1;Tcells.CD4T.1;Tcells.CD8T.5;Mono;Mono.2;Mono.3;MACi;MACi.2;MACm;EC.HDLEC;EC.HDLEC.9;EC.HDLEC.4;PBMC.2;PBMC.4;PBMC.3;WB.2;WB;WB.3;ALL_222SF;MCL.M215;CLL.1425;DLBCL.DL110.D2538;DLBCL.DL51.D1454;MM_41956;NK;NK.1;NK.2
## Warning in Deconvolute.Bcell.tumor(data = example.betas, predict.tumor.cell.content = T, : Missing values were imputed at:
## S1.2 in cg23186952
## Combine all output tables into a single table with one row per sample
Results <- cbind(Results$Cellular.proportions, ## Cellular composition
                 Results$combined.prediction, ## Combined prediction
                 do.call(cbind, Results[grep("^Cellular.proportions$|^combined.prediction$", names(Results), invert = TRUE)]) ## All probabilities for all predictors, including entity and subtypes
                 )

DT::datatable(Results, options = list(scrollX = TRUE, scrollY = TRUE), rownames = FALSE)
##export
fwrite(Results,"B.cell.tumors.composition.tsv",sep="\t")
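
When the entity of the samples is reliably known, a single predictor can be applied instead of the full prediction. The sketch below restricts the example data to CLL samples and applies only the CLL predictor, using the arguments described in section 3.1. Selecting samples by a "CLL" name prefix is an assumption about the column names of example.betas, and the call is guarded so that it only runs once the classifier objects have been loaded.

```r
## Assumption: the objects from B.cell.tumor.classifier.RData are already loaded.
## Assumption: CLL samples in example.betas can be recognised by a "CLL" name prefix.
if (exists("Deconvolute.Bcell.tumor") && exists("example.betas")) {
  cll.samples <- grep("^CLL", colnames(example.betas), value = TRUE)
  Results.CLL <- Deconvolute.Bcell.tumor(data = example.betas[, cll.samples, drop = FALSE],
                          predict.tumor.cell.content = TRUE,
                          microenvironment.CpGs = "Pan.Bcell.microenvironment",
                          predict.Bcell.tumors = TRUE,
                          which.predictor = "CLL",
                          impute.missings = FALSE,
                          export = FALSE
                          )
  ## Inspect the top-level structure of the returned object
  str(Results.CLL, max.level = 1)
}
```

The exact layout of the returned object for a single predictor is not shown in this document, so str() is used here to inspect it rather than assuming particular element names.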

3.3 Get all code at once

knitr::opts_chunk$set(echo = TRUE)

## Load packages and set general options
options(stringsAsFactors = FALSE, error = NULL)
## Required packages
library(e1071)      # SVM functions
library(xlsx)       # export of results to .xlsx files
library(data.table) # fwrite() for exporting tables


## mode = "wb" requests a binary transfer, which keeps the .RData file intact on Windows
download.file("https://github.com/Duran-FerrerM/Pan-B-cell-methylome/raw/master/data/B.cell.tumor.classifier.RData", destfile = "B.cell.tumor.classifier.RData", method = "libcurl", mode = "wb")
load("B.cell.tumor.classifier.RData")


## Full prediction (entity + subtype) on the example data, with tumor cell content estimation
Results <- Deconvolute.Bcell.tumor(data = example.betas,
                        predict.tumor.cell.content = TRUE,
                        microenvironment.CpGs = "Pan.Bcell.microenvironment",
                        predict.Bcell.tumors = TRUE,
                        which.predictor = "entity.subtype",
                        impute.missings = TRUE,
                        export = FALSE
                        )


## Combine all output tables into a single table with one row per sample
Results <- cbind(Results$Cellular.proportions, ## Cellular composition
                 Results$combined.prediction, ## Combined prediction
                 do.call(cbind, Results[grep("^Cellular.proportions$|^combined.prediction$", names(Results), invert = TRUE)]) ## All probabilities for all predictors, including entity and subtypes
                 )

DT::datatable(Results, options = list(scrollX = TRUE, scrollY = TRUE), rownames = FALSE)


##export
fwrite(Results,"B.cell.tumors.composition.tsv",sep="\t")


sessionInfo()

4 Session Information

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] quadprog_1.5-8    data.table_1.14.2 xlsx_0.6.5        e1071_1.7-9      
## [5] BiocStyle_2.24.0 
## 
## loaded via a namespace (and not attached):
##  [1] rstudioapi_0.13     knitr_1.39          magrittr_2.0.3     
##  [4] R6_2.5.1            rlang_1.0.2         fastmap_1.1.0      
##  [7] stringr_1.4.0       tools_4.2.0         DT_0.23            
## [10] xfun_0.31           cli_3.3.0           jquerylib_0.1.4    
## [13] crosstalk_1.2.0     htmltools_0.5.2     class_7.3-20       
## [16] yaml_2.3.5          digest_0.6.29       bookdown_0.26      
## [19] rJava_1.0-6         BiocManager_1.30.18 htmlwidgets_1.5.4  
## [22] xlsxjars_0.6.1      sass_0.4.1          evaluate_0.15      
## [25] rmarkdown_2.14      proxy_0.4-26        stringi_1.7.6      
## [28] compiler_4.2.0      bslib_0.3.1         jsonlite_1.8.0