Cancer genomes differ from their matched normal genomes. Base pair-resolution comparison between cancer and matched normal genomes, now feasible with next-generation sequencing, has revealed various types of cancer genome-specific alterations. We believe that some of these changes are responsible for the initiation and progression of the disease, and that some have clinical relevance, as demonstrated by BCR-ABL1 fusion genes and EGFR L858R mutations, which can be pharmacologically inhibited to treat the corresponding diseases. Over the past decades, we have analyzed microarray- or sequencing-based genome and transcriptome data from various cancer types, including lung cancer (CCR, 2005), CML (Leukemia, 2006), hepatocellular carcinoma (IJC, 2008), cholangiocarcinomas (Oncotarget, 2016), and melanomas (Oncogene, 2017), with more cancer types in ongoing projects. We are also focusing on the clinical translation of genomic insights, for example by identifying clinically relevant biomarkers. We evaluated treatment response in terms of genomic/transcriptomic markers for dacomitinib in head-and-neck cancers (CCR, 2015) and dovitinib in lung cancers (Ann Oncol, 2017), and we are evaluating novel drug-cancer type combinations in ongoing projects.
PanCancer analysis and public dataset
Global efforts to obtain and analyze large-scale cancer genomes, such as TCGA (The Cancer Genome Atlas) and ICGC (International Cancer Genome Consortium), have made high-quality multiomics cancer genome datasets, e.g., 10,000 TCGA and 3,000 ICGC cancers, available to the research community. These public datasets can be valuable for identifying novel features of cancer genomes and can also serve as independent cohorts in meta-analyses. As a member of the TCGA consortium, we demonstrated that GBM tumors can be classified according to their microRNA expression levels (Cancer Res, 2011). We then analyzed sequencing data of colorectal cancers to identify a novel mutation type, microsatellite instability (MSI), which has recently been established as a predictive marker for immune checkpoint inhibitors (Cell, 2013 and Cancer Res, 2014). This effort was extended to 8,000 exome- and 3,000 genome-sequencing TCGA datasets to map the PanCancer-scale landscape of MSI (Nat Comm, 2017). These are good examples of how public cancer genome datasets can be exploited. More recently, we have been working to identify novel mutation signature types and their potential biological and clinical relevance using TCGA/ICGC mutation calls (ongoing project).
Immunology and tumor microenvironments
A recently introduced class of cancer therapeutics, monoclonal antibodies targeting immune checkpoints such as PD-1/PD-L1 and CTLA4, has revolutionized cancer medicine. However, as with targeted cancer therapies, unselected administration of immune checkpoint inhibitors has achieved clinical responses in only a small subset of patients. The mutator phenotype with MSI is the first genomic biomarker recognized by the FDA to predict therapeutic response to immune checkpoint inhibitors across cancer types. Furthermore, it has become clear that non-tumor cells in the tumor microenvironment, such as immune and stromal cells, play important roles in tumor development as well as in the therapeutic response to immune checkpoint inhibitors. From this perspective, we analyzed RNA-seq data of more than 7,000 TCGA tumors and evaluated how tumor purity, the ratio of tumor to non-tumor cells, may act as a confounding factor in the immunoprofiling of the tumor microenvironment as well as in clustering-based analyses (CIR, 2017). In addition, we analyzed the expression profiles of thyroid cancers from the TCGA consortium and identified a novel class of the disease that shows high expression of immune genes and an unfavorable patient prognosis (Cancers, 2019). In ongoing projects, we are analyzing combined DNA- and RNA-seq datasets for a refined characterization of tumors and their microenvironments. Moreover, single-cell RNA-seq has recently been introduced to facilitate tumor microenvironment profiling at the level of individual cells. We are currently analyzing single-cell RNA-seq data of five gastric cancers (ongoing).
Intratumoral heterogeneity has recently been recognized, and it has become clear that cancer genome evolution may be more dynamic than previously thought. We have used multiregion sequencing or primary-vs.-metastasis comparative sequencing to evaluate heterogeneity and cancer genome evolution in prostate cancers (J Pathol, 2014), early-vs.-advanced gastric cancers (J Pathol, 2014), colorectal cancers with liver metastases (CCR, 2015), synchronous colorectal cancers (Oncotarget, 2015), ovarian cancers with peritoneal seeding (J Pathol, 2016), gastric cancers with lymph node metastases (Gastric Cancer, 2018), and cancers arising in benign gastric/colorectal lesions (Cancers, 2020). We are also investigating genomic evolution in longitudinal biopsies obtained from patients with bladder cancers and in primary-vs.-metastatic cervical cancers (ongoing).
Bioinformatics algorithms and database
We have developed a number of bioinformatics algorithms, e.g., a method to search for transcription factor binding motifs using gene expression data (BSEA, In Silico Biology, 2006), concordance-based expression enrichment analysis (GSECA, BMC Bioinformatics, 2007), enrichment-based copy number alteration analysis (GEAR, Bioinformatics, 2008), and clustering-based pathway analysis (PathCluster, Bioinformatics, 2008). We also modified the Smith-Waterman algorithm to identify read depth-based copy number alterations from sequencing data (rSW-seq, BMC Bioinformatics, 2010) and mutation hotspot residues (MutClustSW, IEEE-TCBB, 2019). In addition, 8,000 PanCancer-scale copy number profiles obtained from public databases were collectively subjected to meta-analysis, with the results and datasets made available in a database (MetaCGH, Genome Res, 2013).
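To give a rough idea of how a Smith-Waterman-style recursion can be applied to read-depth data, the sketch below scans a series of per-bin log2 tumor/normal depth ratios and reports the highest-scoring contiguous segment as a candidate copy number gain. It is a minimal illustration and not the rSW-seq implementation; the function names (`max_scoring_segment`, `find_gain_segment`), the per-bin input, and the `penalty` value are assumptions made for this example.

```python
from typing import List, Tuple


def max_scoring_segment(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best contiguous segment.

    Uses the local-alignment recursion S[i] = max(0, S[i-1] + score[i]),
    so the running score resets whenever it would drop below zero.
    `end` is exclusive; (0, 0, 0.0) means no positive segment was found.
    """
    best = (0, 0, 0.0)
    running = 0.0
    start = 0
    for i, s in enumerate(scores):
        running = max(0.0, running + s)
        if running == 0.0:
            start = i + 1  # next candidate segment begins after the reset
        elif running > best[2]:
            best = (start, i + 1, running)
    return best


def find_gain_segment(log_ratios: List[float], penalty: float) -> Tuple[int, int, float]:
    """Score each bin as (log ratio - penalty) and find the best gain segment.

    `penalty` is a hypothetical tuning parameter: bins with a log ratio
    below it count against the segment, bins above it count in favor.
    """
    return max_scoring_segment([r - penalty for r in log_ratios])


if __name__ == "__main__":
    # Toy per-bin log2 ratios (made-up values): bins 3..6 are elevated.
    ratios = [0.0, 0.1, -0.1, 0.9, 1.1, 1.0, 0.8, 0.0, -0.1, 0.1]
    start, end, score = find_gain_segment(ratios, penalty=0.3)
    print(start, end)  # 3 7
```

Candidate losses can be found the same way by negating the log ratios before scoring. In practice, the significance of a segment would still need to be assessed, for example by permutation, which this sketch does not attempt.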