||Takeru Fujii, Kazumitsu Maehara, Masatoshi Fujita, Yasuyuki Ohkawa, Discriminative feature of cells characterizes cell populations of interest by a small subset of genes, PLoS Computational Biology, 10.1371/journal.pcbi.1009579, 17, 11, e1009579, 2021.11, Organisms are composed of various cell types with specific states. To obtain a comprehensive understanding of the functions of organs and tissues, cell types have been classified and defined by identifying specific marker genes. Statistical tests, which often involve evaluating differences in the mean expression levels of genes, are critical for identifying marker genes. Differentially expressed gene (DEG)-based analysis has been the most frequently used method of this kind. However, as sample sizes increase, as in single-cell analysis, DEG-based analysis has faced difficulties associated with the inflation of P-values. Here, we propose the concept of discriminative feature of cells (DFC), an alternative to DEG-based approaches. We implemented DFC using logistic regression with an adaptive LASSO penalty to perform binary classification for discriminating a population of interest and variable selection to obtain a small subset of defining genes. Using artificial data, we demonstrated that DFC prioritized gene pairs with non-independent expression, and we showed that DFC enabled characterization of the muscle satellite/progenitor cell population. The results revealed that DFC captured cell-type-specific markers, specific gene expression patterns, and subcategories of this cell population well. DFC may complement DEG-based methods for interpreting large data sets. DEG-based analysis uses lists of genes with differences in expression between groups, while DFC, which can be termed a discriminative approach, has potential applications in the task of cell characterization. With recent advances in the high-throughput analysis of single cells, data from cell characterization methods such as scRNA-seq can be effectively analyzed with discriminative methods..
||Akihito Harada, Kazumitsu Maehara, Tetsuya Handa, Yasuhiro Arimura, Jumpei Nogami, Yoko Hayashi-Takanaka, Katsuhiko Shirahige, Hitoshi Kurumizaka, Hiroshi Kimura, Yasuyuki Ohkawa, A chromatin integration labelling method enables epigenomic profiling with lower input, Nature Cell Biology, 10.1038/s41556-018-0248-3, 21, 2, 287-296, 2019.02, Chromatin plays a crucial role in gene regulation, and chromatin immunoprecipitation followed by sequencing (ChIP–seq) has been the standard technique for examining protein–DNA interactions across the whole genome. However, it is difficult to obtain epigenomic information from limited numbers of cells by ChIP–seq because of sample loss during chromatin preparation and inefficient immunoprecipitation. In this study, we established an immunoprecipitation-free epigenomic profiling method named chromatin integration labelling (ChIL), which enables the amplification of genomic sequences closely associated with the target molecules before cell lysis. Using ChIL followed by sequencing (ChIL–seq), we reliably detected the distributions of histone modifications and DNA-binding factors in 100–1,000 cells. In addition, ChIL–seq successfully detected genomic regions associated with histone marks at the single-cell level. Thus, ChIL–seq offers an alternative method to ChIP–seq for epigenomic profiling using small numbers of cells, in particular, those attached to culture plates and after immunofluorescence..
||Akihito Harada, Kazumitsu Maehara, Yusuke Ono, Hiroyuki Taguchi, Kiyoshi Yoshioka, Yasuo Kitajima, Yan Xie, Yuko Sato, Takeshi Iwasaki, Jumpei Nogami, Seiji Okada, Tetsuro Komatsu, Yuichiro Semba, Tatsuya Takemoto, Hiroshi Kimura, Hitoshi Kurumizaka, Yasuyuki Ohkawa, Histone H3.3 sub-variant H3mm7 is required for normal skeletal muscle regeneration, Nature Communications, 10.1038/s41467-018-03845-1, 9, 1, 2018.05, Regulation of gene expression requires selective incorporation of histone H3 variant H3.3 into chromatin. Histone H3.3 has several subsidiary variants, but their functions are unclear. Here we characterize the function of the histone H3.3 sub-variant H3mm7, which is expressed in skeletal muscle satellite cells. H3mm7 knockout mice demonstrate an essential role of H3mm7 in skeletal muscle regeneration. Chromatin analysis reveals that H3mm7 facilitates transcription by forming an open chromatin structure around promoter regions, including those of myogenic genes. The crystal structure of the nucleosome containing H3mm7 reveals that, unlike the S57 residue of other H3 proteins, the H3mm7-specific A57 residue cannot form a hydrogen bond with the R40 residue of the cognate H4 molecule. Consequently, the H3mm7 nucleosome is unstable in vitro and exhibits higher mobility in vivo than the H3.3 nucleosome. We conclude that the unstable H3mm7 nucleosome may be required for proper skeletal muscle differentiation..
||Yuki Kuniyoshi, Kazumitsu Maehara, Takeshi Iwasaki, Masayasu Hayashi, Yuichiro Semba, Masatoshi Fujita, Yuko Sato, Hiroshi Kimura, Akihito Harada, Yasuyuki Ohkawa, Identification of immunoglobulin gene sequences from a small read number of mRNA-seq using hybridomas, PLoS One, 10.1371/journal.pone.0165473, 11, 10, 2016.10, Identification of immunoglobulin genes in hybridomas is essential for producing antibodies for research and clinical applications. Several methods, such as RACE and degenerate PCR, have been developed for determining the Igh and Igl/Igk coding sequences (CDSs), but it has been difficult to process many hybridomas both accurately and rapidly. Here, we propose a new strategy for antibody sequence determination by mRNA-seq of hybridomas. We demonstrated that hybridomas highly expressed the Igh and Igl/Igk genes and that de novo transcriptome assembly using mRNA-seq data enabled accurate identification of the CDSs of both Igh and Igl/Igk. Furthermore, we estimated that only 30,000 sequenced reads are required to identify immunoglobulin sequences from four different hybridoma clones. Thus, our approach could drastically facilitate the determination of variable-region CDSs..
||Jun Ya Kaimori, Kazumitsu Maehara, Yoko Hayashi-Takanaka, Akihito Harada, Masafumi Fukuda, Satoko Yamamoto, Naotsugu Ichimaru, Takashi Umehara, Shigeyuki Yokoyama, Ryo Matsuda, Tsuyoshi Ikura, Koji Nagao, Chikashi Obuse, Naohito Nozaki, Shiro Takahara, Toshifumi Takao, Yasuyuki Ohkawa, Hiroshi Kimura, Yoshitaka Isaka, Histone H4 lysine 20 acetylation is associated with gene repression in human cells, Scientific Reports, 10.1038/srep24318, 6, 2016.04, Histone acetylation is generally associated with gene activation and chromatin decondensation. Recent mass spectrometry analysis has revealed that histone H4 lysine 20, a major methylation site, can also be acetylated. To understand the function of H4 lysine 20 acetylation (H4K20ac), we have developed a specific monoclonal antibody and performed ChIP-seq analysis using HeLa-S3 cells. H4K20ac was enriched around the transcription start sites (TSSs) of minimally expressed genes and in the gene body of expressed genes, in contrast to most histone acetylation being enriched around the TSSs of expressed genes. The distribution of H4K20ac showed little correlation with known histone modifications, including histone H3 methylations. A motif search in H4K20ac-enriched sequences, together with transcription factor binding profiles based on ENCODE ChIP-seq data, revealed that most transcription activators are excluded from H4K20ac-enriched genes and a transcription repressor NRSF/REST co-localized with H4K20ac. These results suggest that H4K20ac is a unique acetylation mark associated with gene repression..
||Kazumitsu Maehara, Yasuyuki Ohkawa, Exploration of nucleosome positioning patterns in transcription factor function, Scientific Reports, 10.1038/srep19620, 6, 2016.01, The binding of transcription factors (TFs) triggers activation of specific chromatin regions through the recruitment and activation of RNA polymerase. Unique nucleosome positioning (NP) occurs during gene expression and has been suggested to be involved in various other chromatin functions. However, the diversity of NP that can occur for each function has not been clarified. Here we used MNase-Seq data to evaluate NP around 258 cis-regulatory elements in the mouse genome. Principal component analysis of the 258 elements revealed that NP consisted of five major patterns. Furthermore, the five NP patterns had predictive power for the level of gene expression. We also demonstrated that selective NP patterns appeared around TF binding sites. These results suggest that the NP patterns are correlated with specific functions on chromatin..
||Kazumitsu Maehara, Akihito Harada, Yuko Sato, Masaki Matsumoto, Keiichi Nakayama, Hiroshi Kimura, Yasuyuki Ohkawa, Tissue-specific expression of histone H3 variants diversified after species separation, Epigenetics and Chromatin, 10.1186/s13072-015-0027-3, 8, 1, 2015.09, Background: The selective incorporation of appropriate histone variants into chromatin is critical for the regulation of genome function. Although many histone variants have been identified, a complete list has not been compiled. Results: We screened mouse, rat and human genomes by in silico hybridization using canonical histone sequences. In the mouse genome, we identified 14 uncharacterized H3 genes, among which 13 are similar to H3.3 and do not have human or rat counterparts, and one is similar to the human testis-specific H3 variant H3T/H3.4 and has a rat paralog. Although some of these genes were previously annotated as pseudogenes, their tissue-specific expression was confirmed by sequencing the 3′-UTR regions of the transcripts. Certain new variants were also detected at the protein level by mass spectrometry. When expressed as GFP-tagged versions in mouse C2C12 cells, some variants were stably incorporated into chromatin, and the genome-wide distributions of most variants were similar to that of H3.3. Moreover, forced expression of H3 variants in chromatin resulted in altered gene expression patterns after cell differentiation. Conclusions: We comprehensively identified and characterized novel mouse H3 variant genes that encode amino acid sequences highly conserved relative to known histone H3. We speculated that the diversity of H3 variants acquired after species separation played a role in regulating tissue-specific gene expression in individual species. Their biological relevance and the evolutionary aspect involving pseudogene diversification will be addressed by further functional analysis..
||Kazumitsu Maehara, Yasuyuki Ohkawa, Agplus: A rapid and flexible tool for aggregation plots, Bioinformatics, 10.1093/bioinformatics/btv322, 31, 18, 3046-3047, 2015.07, Aggregation plots are frequently used in ChIP-Seq data analysis to evaluate signal distributions at user-specified points of interest. agplus, a new and simple command-line tool, enables rapid and flexible generation of text tables tailored for aggregation plots, from which users can easily construct multiple groups based on their own definitions, such as regulatory regions or transcription initiation sites..
||Kazumitsu Maehara, Jun Odawara, Akihito Harada, Tomohiko Yoshimi, Koji Nagao, Chikashi Obuse, Koichi Akashi, Taro Tachibana, Toshio Sakata, Yasuyuki Ohkawa, A co-localization model of paired ChIP-seq data using a large ENCODE data set enables comparison of multiple samples, Nucleic Acids Research, 10.1093/nar/gks1010, 41, 1, 54-62, 2013.01, Deep sequencing approaches, such as chromatin immunoprecipitation followed by sequencing (ChIP-seq), have been successful in detecting transcription factor-binding sites and histone modifications across the whole genome. An approach for comparing two different ChIP-seq data sets would be beneficial for predicting unknown functions of a factor. We propose a model to represent the co-localization of two different ChIP-seq data sets. We showed that a meaningful overlapping signal and a meaningless background signal can be separated by this model. We applied this model to compare ChIP-seq data of RNA polymerase II C-terminal domain (CTD) serine 2 phosphorylation with a large amount of peak-called data, including ChIP-seq and other deep sequencing data in the Encyclopedia of DNA Elements (ENCODE) project, and then extracted factors that were related to RNA polymerase II CTD serine 2 in HeLa cells. We further analyzed RNA polymerase II CTD serine 7 phosphorylation, whose function in HeLa cells is still unclear. Our results reflected the similarity of localization of transcription factors/histone modifications in the ENCODE data set, which suggests that our model is appropriate for understanding ChIP-seq data for factors whose functions are unknown..