Kyushu University Academic Staff Educational and Research Activities Database
List of Papers
Osamu Maruyama Last modified date:2023.10.03

Associate Professor / Modeling and Optimization / Department of Design Futures / Faculty of Design


Papers
1. Tsukasa Koga, Osamu Maruyama, CBOEP: Generating negative enhancer-promoter interactions to train classifiers, ACM-BCB 2023, 2023.09, For training and testing enhancer-promoter interaction (EPI) classifiers,
the question of which non-positive EPIs are selected as
negative instances must be answered. Most previous methods use
the dataset of the EPI classifier TargetFinder where negative EP
pairs are sampled from non-positive EP pairs. Consequently, over
92% of EPIs in the TargetFinder-positive and negative sets of cell
line GM12878 have a 2-fold or greater positive/negative class imbalance
of promoter occurrences between the positive and negative
EP pairs. This situation negatively impacts the predictability of EPI
classifiers trained using the datasets.
Thus, we first proposed the condition that the negative EPIs
should satisfy. Second, we devised a method called CBOEP (class
balanced occurrences of enhancers and promoters), to generate
negative EPI sets that approximately fulfil this condition for a given
positive EPI set. CBOEP solves the finding problem by reducing it to
the maximum-flow problem. Third, we applied the generated negative
EPI sets to existing EPI classifiers, TransEPI and TargetFinder.
The negative datasets lead to higher prediction performance than
the existing negative EPI datasets. The source code is available at
https://github.com/maruyama-lab-design/CBOEP.
2. Osamu Maruyama, Yinuo Li, Hiroki Narita, Hidehiro Toh, Wan Kin Au Yeung, Hiroyuki Sasaki, CMIC: predicting DNA methylation inheritance of CpG islands with embedding vectors of variable-length k-mers, BMC Bioinformatics, 10.1186/s12859-022-04916-3, 23, 371, 2022.09, [URL], Background: Epigenetic modifications established in mammalian gametes are largely
reprogrammed during early development, however, are partly inherited by the embryo
to support its development. In this study, we examine CpG island (CGI) sequences to
predict whether a mouse blastocyst CGI inherits oocyte-derived DNA methylation
from the maternal genome. Recurrent neural networks (RNNs), including that based on
gated recurrent units (GRUs), have recently been employed for variable-length inputs
in classification and regression analyses. One advantage of this strategy is the ability
of RNNs to automatically learn latent features embedded in inputs by learning their
model parameters. However, the available CGI dataset applied for the prediction of
oocyte-derived DNA methylation inheritance is not large enough to train the neural
networks.
Results: We propose a GRU-based model called CMIC (CGI Methylation Inheritance
Classifier) to augment CGI sequence by converting it into variable-length k-mers,
where the length k is randomly selected from the range kmin to kmax, N times, which
were then used as neural network input. N was set to 1000 in the default setting. In
addition, we proposed a new embedding vector generator for k-mers called splitDNA2vec.
The randomness of this procedure was higher than that of the previous work, dna2vec.
Conclusions: We found that CMIC can predict the inheritance of oocyte-derived DNA
methylation at CGIs in the maternal genome of blastocysts with a high F-measure
(0.93). We also show that the F-measure can be improved by increasing the parameter
N, that is, the number of sequences of variable-length k-mers derived from a single
CGI sequence. This implies the effectiveness of augmenting input data by converting a
DNA sequence to N sequences of variable-length k-mers. This approach can be applied
to different DNA sequence classification and regression analyses, particularly those
involving a small amount of data.
3. Wan Kin Au Yeung, Osamu Maruyama, Hiroyuki Sasaki, A convolutional neural network-based regression model to infer the epigenetic crosstalk responsible for CG methylation patterns, BMC Bioinformatics, 10.1186/s12859-021-04272-8, 22, 341, 2021.06, [URL].
4. Osamu Maruyama, Fumiko Matsuzaki, DegSampler3: Pairwise Dependency Model in Degradation Motif Site Prediction of Substrate Protein Sequences, Proc. of 19th IEEE International Conference on Bioinformatics and Bioengineering, 2019.10.
5. Osamu Maruyama, Fumiko Matsuzaki, DegSampler: Collapsed Gibbs sampler for detecting E3 binding sites, Proceedings - 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering, BIBE 2018, 10.1109/BIBE.2018.00009, 1-9, 2018.12, In this paper, we address the problem of finding sequence motifs in substrate proteins specific to E3 ubiquitin ligases (E3s). We formulated a posterior probability distribution of sites by designing a likelihood function based on amino acid indexing and a prior distribution based on the disorderness of protein sequences. These designs are derived from known characteristics of E3 binding sites in substrate proteins. Then, we devise a collapsed Gibbs sampling algorithm for the posterior probability distribution called DegSampler. We performed computational experiments using 36 sets of substrate proteins specific to E3s and compared the performance of DegSampler with those of popular motif finders, MEME and GLAM2. The results showed that DegSampler was superior to the others in finding E3 binding motifs. Thus, DegSampler is a promising tool for finding E3 motifs in substrate proteins.
6. Natsu Nakajima, Morihiro Hayashida, Jesper Jansson, Osamu Maruyama, Tatsuya Akutsu, Determining the minimum number of protein-protein interactions required to support known protein complexes, PLoS One, 10.1371/journal.pone.0195545, 13, 4, 2018.04, The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
7. Osamu Maruyama, Yuki Kuwahara, RocSampler: Regularizing Overlapping Protein Complexes in Protein-Protein Interaction Networks, BMC Bioinformatics, 10.1186/s12859-017-1920-5, 18, 51-62, 491, 2017.12, [URL], Background: In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources to elucidate various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. It is relatively difficult to identify them due to their simpler internal structure, but unfortunately complexes of such sizes are dominant in major protein complex databases, such as CYC2008. Another difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins because CYC2008 and other databases include such protein complexes. Thus, it is critical how to model overlaps between predicted complexes to identify them simultaneously. Results: In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), which exploits, as part of the whole scoring function, a regularization term for the overlaps of predicted complexes and that for the distribution of sizes of predicted complexes. We have implemented RocSampler in MATLAB and its executable file for Windows is available at the site, http://imi.kyushu-u.ac.jp/~om/software/RocSampler/. Conclusions: We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that the design of scoring functions including regularization terms is an effective approach for protein complex prediction.
8. Osamu Maruyama, Limsoon Wong, Regularizing predicted complexes by mutually exclusive protein-protein interactions, Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 1068-1075, 2015.08, Protein complexes are key entities in the cell responsible
for various cellular mechanisms and biological processes. We
propose here a method for predicting protein complexes from
a protein-protein interaction (PPI) network, using information
on mutually exclusive PPIs. If two interactions are mutually
exclusive, they are not allowed to exist simultaneously in the
same predicted complex. We introduce a new regularization term
which checks whether predicted complexes are connected by mutually
exclusive PPIs. This regularization term is added into the
scoring function of our earlier protein complex prediction tool,
PPSampler2. We show that PPSampler2 with mutually exclusive
PPIs outperforms the original one. Furthermore, the performance
is superior to well-known representative conventional protein
complex prediction methods. Thus, it is effective to use mutual
exclusiveness of PPIs in protein complex prediction.
9. So Kobiki, Osamu Maruyama, ReSAPP: Predicting overlapping protein complexes by merging multiple-sampled partitions of proteins, Journal of bioinformatics and computational biology, 12, 6, 1442004, 2014.12.
10. Chern Han Yong, Osamu Maruyama, Limsoon Wong, Discovery of small protein complexes from PPI networks with size-specific supervised weighting, BMC Systems Biology, 8, S3, 2014.12.
11. Osamu Maruyama, Shota Shikita, A scale-free structure prior for Bayesian inference of Gaussian graphical models, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014.11.
12. Yasuhiro Okamoto, Kensuke Koyanagi, Takayoshi Shoudai, Osamu Maruyama, Discovery of Tree Structured Patterns Using Markov Chain Monte Carlo Method, Proc. 7th IADIS International Conference on Information Systems 2014, 28th February - 2nd March 2014, Madrid, Spain, 95-102, 2014.02.
13. Peiying Ruan, Morihiro Hayashida, Osamu Maruyama, Tatsuya Akutsu, Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels, BMC Bioinformatics (APBC 2014), doi:10.1186/1471-2105-15-S2-S6, 15(Suppl 2), S6, 2014.01, [URL].
14. Yuta Taniguchi, Yasuhiro Yamada, Osamu Maruyama, Satoru Kuhara, Daisuke Ikeda, The Purity Measure for Genomic Regions Leads to Horizontally Transferred Genes, Journal of Bioinformatics and Computational Biology (JBCB), 11, 6, 1343002, 2013.12.
15. Chasanah Kusumastuti Widita, Osamu Maruyama, PPSampler2: Predicting Protein Complexes More Accurately and Efficiently by Sampling, BMC Systems Biology, 7, Suppl 6, S14, 2013.12, The problem of predicting sets of components of heteromeric protein complexes is a challenging problem in
Systems Biology. There have been many tools proposed to predict those complexes. Among them, PPSampler, a
protein complex prediction algorithm based on the Metropolis-Hastings algorithm, is reported to outperform other
tools. In this work, we improve PPSampler by refining scoring functions and a proposal distribution used inside the
algorithm so that predicted clusters are more accurate as well as the resulting algorithm runs faster. The new
version is called PPSampler2. In computational experiments, PPSampler2 is shown to outperform other tools
including PPSampler. The F-measure score of PPSampler2 is 0.67, which is at least 26% higher than those of the
other tools. In addition, about 82% of the predicted clusters that are unmatched with any known complexes are
statistically significant on the biological process aspect of Gene Ontology. Furthermore, the running time is
reduced to twenty minutes, which is 1/24 of that of PPSampler.
16. Osamu Maruyama, Heterodimeric protein complex identification by naïve Bayes classifiers, BMC Bioinformatics, doi:10.1186/1471-2105-14-347, 14, 347, 2013.12, [URL].
17. Peiying Ruan, Morihiro Hayashida, Osamu Maruyama, Tatsuya Akutsu, Prediction of Heterodimeric Protein Complexes from Weighted Protein-Protein Interaction Networks Using Novel Features and Kernel Functions, PLoS ONE, 10.1371/journal.pone.0065265, 8, 6, 2013.06, [URL], Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.
18. Daisuke Ikeda, Osamu Maruyama, Satoru Kuhara, Infrequent, Unexpected, and Contrast Pattern Discovery from Bacterial Genomes by Genome-wide Comparative Analysis, Proc. of 4th International Conference on Bioinformatics Models, Methods and Algorithms, 308-311, 2013.02.
19. Daisuke Tatsuke, Osamu Maruyama, Sampling Strategy for Protein Complex Prediction Using Cluster Size Frequency, Gene, Special issue of the 23rd International Conference on Genome Informatics (GIW), 2012.12, In this paper we propose a Markov chain Monte Carlo sampling method for
predicting protein complexes from protein-protein interactions (PPIs). Many
of the existing tools for this problem are designed more or less based on a
density measure of a subgraph of the PPI network. This kind of measure
is less effective for smaller complexes. On the other hand, it can be found
that the number of complexes of a size in a database of protein complexes
follows a power-law. Thus, most of the complexes are small-sized. For example,
in CYC2008, a database of curated protein complexes of yeast, 42% of
the complexes are heterodimeric, i.e., a complex consisting of two different
proteins. In this work, we propose a protein complex prediction algorithm,
called PPSampler (Proteins’ Partition Sampler), which is designed based on
the Metropolis-Hastings algorithm using a parameter representing a target
value of the relative frequency of the number of predicted protein complexes
of a particular size. In a performance comparison, PPSampler outperforms
other existing algorithms. Furthermore, about half of the predicted clusters
that are not matched with any known complexes in CYC2008 are statistically
significant by Gene Ontology terms. Some of them can be expected to
be true complexes.
20. Osamu Maruyama, Heterodimeric Protein Complex Identification, ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011, 2011.08.
21. Osamu Maruyama and Ayaka Chihara, NWE: Node-Weighted Expansion for Protein Complex Prediction Using Random Walk Distances, Proc. IEEE International Conference on Bioinformatics & Biomedicine (IEEE BIBM 2010), 590-594, 2010.12.
22. Yukio Yasukochi, Osamu Maruyama, Milind C. Mahajan, Carolyn Padden, Ghia M. Euskirchen, Vincent Schulz, Hideki Hirakawa, Satoru Kuhara, Xing-Hua Pan, Peter E. Newburger, Michael Snyder, and Sherman M. Weissman, X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils, Proceedings of the National Academy of Sciences of the United States of America (PNAS), 107, 3704-3709, 2010.02.
23. Osamu Maruyama, Hideki Hirakawa, Takao Iwayanagi, Yoshiko Ishida, Shizu Takeda, Jun Otomo, Satoru Kuhara, Evaluating Protein Sequence Signatures Inferred from Protein-Protein Interaction Data by Gene Ontology Annotations, 2008 IEEE International Conference on Bioinformatics and Biomedicine, 417-420, 2008.11.
24. Osamu Maruyama, Akiko Matsuda, and Satoru Kuhara, Reconstructing phylogenetic trees of prokaryote genomes by randomly sampling oligopeptides, International Journal of Bioinformatics Research and Applications (IJBRA), 1(4), 429-446, 2005 (preliminary version has appeared in the Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Lecture Notes in Computer Science 3514-6, Springer-Verlag, II-911-918, 2005), 2005.11.
25. Daichi Shigemizu and Osamu Maruyama, Searching for Regulatory Elements of Alternative Splicing Events Using Phylogenetic Footprinting, Proceedings of the 4th Workshop on Algorithms in Bioinformatics, Lecture Notes in Bioinformatics 3240, Springer-Verlag, 147-158, 2004.09.
26. Osamu Maruyama, Extensive Search for Discriminative Features of Alternative Splicing, Pacific Symposium on Biocomputing 2004, 54-65, 2004.01.
27. Osamu Maruyama, Finding optimal degenerate patterns in DNA sequences, Bioinformatics, 10.1093/bioinformatics/btg1079, 19(supplement 2), II206-II214, 2003.09.
28. Tatsuya Akutsu, Satoru Kuhara, Osamu Maruyama, and Satoru Miyano, Identification of genetic networks by strategic gene disruptions and gene overexpressions under a boolean model, Theoretical Computer Science, 10.1016/S0304-3975(02)00425-5, 298, 1, 235-251, 2003.01.
29. O. Maruyama, H. Bannai, Y. Tamada, S. Kuhara, and S. Miyano, Fast algorithm for extracting multiple unordered short motifs using bit operations, Information Sciences, 10.1016/S0020-0255(02)00219-0, 146, 1-4, 115-126, 2002.01.
30. H. Bannai, Y. Tamada, O. Maruyama, K. Nakai, and S. Miyano, Extensive feature detection of N-terminal protein sorting signals, Bioinformatics, 10.1093/bioinformatics/18.2.298, 18, 2, 298-305, 2002.01.
31. Osamu Maruyama, Markov Chain Monte Carlo Algorithms, A Mathematical Approach to Research Problems of Science and Technology, 349-363, 2014.
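
Illustrative note on entry 2 (CMIC): the abstract describes augmenting a single CGI sequence by converting it N times into variable-length k-mers, with k chosen at random between kmin and kmax. The sketch below is one possible reading of that description, not the authors' implementation. The function names, the uniform choice of k, the non-overlapping left-to-right segmentation, and the shorter final piece are all assumptions made for illustration.

```python
import random


def split_variable_kmers(seq, kmin, kmax, rng):
    """Cut seq left to right into consecutive pieces.

    Each piece length is drawn uniformly from [kmin, kmax] (an assumption);
    the final piece may be shorter than kmin if the sequence runs out.
    """
    pieces = []
    pos = 0
    while pos < len(seq):
        k = rng.randint(kmin, kmax)  # inclusive on both ends
        pieces.append(seq[pos:pos + k])
        pos += k
    return pieces


def augment_sequence(seq, kmin, kmax, n, seed=0):
    """Return n independent random segmentations of the same sequence."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [split_variable_kmers(seq, kmin, kmax, rng) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical toy sequence; the abstract states N = 1000 as the default.
    for kmers in augment_sequence("ACGTACGTACGTAACC", kmin=3, kmax=5, n=3):
        print(kmers)
```

Every segmentation concatenates back to the original sequence, so the N outputs differ only in where the k-mer boundaries fall.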