Kyushu University Researcher Information
Osamu Maruyama (丸山 修, まるやま おさむ) Data last updated: 2023.10.03

Associate Professor / Faculty of Design, Department of Design Futures, Modeling and Optimization


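Main Research Themes
Computational Biology
Keywords: sequence motif prediction, DNA methylation state prediction, algorithms, machine learning
1996.04.
Research Achievements
Major Books
1. Osamu Maruyama, Tatsuya Akutsu, Bioinformatics: Sequence Data Analysis and Structure Prediction (Series on the Science of Prediction and Discovery, Vol. 4), Asakura Shoten, 2007.05.
Major Original Papers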
1. Osamu Maruyama, Yinuo Li, Hiroki Narita, Hidehiro Toh, Wan Kin Au Yeung, Hiroyuki Sasaki, CMIC: predicting DNA methylation inheritance of CpG islands with embedding vectors of variable-length k-mers, BMC Bioinformatics, 10.1186/s12859-022-04916-3, 23, 371, 2022.09, [URL], Background: Epigenetic modifications established in mammalian gametes are largely reprogrammed during early development, however, are partly inherited by the embryo to support its development. In this study, we examine CpG island (CGI) sequences to predict whether a mouse blastocyst CGI inherits oocyte-derived DNA methylation from the maternal genome. Recurrent neural networks (RNNs), including that based on gated recurrent units (GRUs), have recently been employed for variable-length inputs in classification and regression analyses. One advantage of this strategy is the ability of RNNs to automatically learn latent features embedded in inputs by learning their model parameters. However, the available CGI dataset applied for the prediction of oocyte-derived DNA methylation inheritance are not large enough to train the neural networks.
Results: We propose a GRU-based model called CMIC (CGI Methylation Inheritance Classifier) to augment CGI sequence by converting it into variable-length k-mers, where the length k is randomly selected from the range kmin to kmax, N times, which were then used as neural network input. N was set to 1000 in the default setting. In addition, we proposed a new embedding vector generator for k-mers called splitDNA2vec. The randomness of this procedure was higher than the previous work, dna2vec.
Conclusions: We found that CMIC can predict the inheritance of oocyte-derived DNA methylation at CGIs in the maternal genome of blastocysts with a high F-measure (0.93). We also show that the F-measure can be improved by increasing the parameter N, that is, the number of sequences of variable-length k-mers derived from a single CGI sequence. This implies the effectiveness of augmenting input data by converting a DNA sequence to N sequences of variable-length k-mers. This approach can be applied to different DNA sequence classification and regression analyses, particularly those involving a small amount of data.
2. Wan Kin Au Yeung, Osamu Maruyama, Hiroyuki Sasaki, A convolutional neural network-based regression model to infer the epigenetic crosstalk responsible for CG methylation patterns, BMC Bioinform., 10.1186/s12859-021-04272-8, 22, 341-341, 2021.06, [URL].
3. Osamu Maruyama, Fumiko Matsuzaki, DegSampler3: Pairwise Dependency Model in Degradation Motif Site Prediction of Substrate Protein Sequences, Proc. of 19th IEEE International Conference on Bioinformatics and Bioengineering, 2019.10.
4. Osamu Maruyama, Fumiko Matsuzaki, DegSampler: Collapsed Gibbs sampler for detecting E3 binding sites, Proceedings - 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering, BIBE 2018, 10.1109/BIBE.2018.00009, 1-9, 2018.12, In this paper, we address the problem of finding sequence motifs in substrate proteins specific to E3 ubiquitin ligases (E3s). We formulated a posterior probability distribution of sites by designing a likelihood function based on amino acid indexing and a prior distribution based on the disorderness of protein sequences. These designs are derived from known characteristics of E3 binding sites in substrate proteins. Then, we devise a collapsed Gibbs sampling algorithm for the posterior probability distribution called DegSampler. We performed computational experiments using 36 sets of substrate proteins specific to E3s and compared the performance of DegSampler with those of popular motif finders, MEME and GLAM2. The results showed that DegSampler was superior to the others in finding E3 binding motifs. Thus, DegSampler is a promising tool for finding E3 motifs in substrate proteins.
5. Osamu Maruyama, Yuki Kuwahara, RocSampler: Regularizing Overlapping Protein Complexes in Protein-Protein Interaction Networks, BMC Bioinformatics, 10.1186/s12859-017-1920-5, 18, 51-62, 491, 2017.12, [URL], Background: In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources to elucidate various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. It is relatively difficult to identify them due to their simpler internal structure, but unfortunately complexes of such sizes are dominant in major protein complex databases, such as CYC2008. Another difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins because CYC2008 and other databases include such protein complexes. Thus, it is critical how to model overlaps between predicted complexes to identify them simultaneously. Results: In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), which exploits, as part of the whole scoring function, a regularization term for the overlaps of predicted complexes and that for the distribution of sizes of predicted complexes. We have implemented RocSampler in MATLAB and its executable file for Windows is available at the site, http://imi.kyushu-u.ac.jp/~om/software/RocSampler/. Conclusions: We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that the design of scoring functions including regularization terms is an effective approach for protein complex prediction.
6. Osamu Maruyama, Limsoon Wong, Regularizing predicted complexes by mutually exclusive protein-protein interactions, Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 1068-1075, 2015.08, Protein complexes are key entities in the cell responsible for various cellular mechanisms and biological processes. We propose here a method for predicting protein complexes from a protein-protein interaction (PPI) network, using information on mutually exclusive PPIs. If two interactions are mutually exclusive, they are not allowed to exist simultaneously in the same predicted complex. We introduce a new regularization term which checks whether predicted complexes are connected by mutually exclusive PPIs. This regularization term is added into the scoring function of our earlier protein complex prediction tool, PPSampler2. We show that PPSampler2 with mutually exclusive PPIs outperforms the original one. Furthermore, the performance is superior to well-known representative conventional protein complex prediction methods. Thus, it is effective to use mutual exclusiveness of PPIs in protein complex prediction.
7. So Kobiki, Osamu Maruyama, ReSAPP: Predicting overlapping protein complexes by merging multiple-sampled partitions of proteins, Journal of Bioinformatics and Computational Biology, 12, 6, 1442004, 2014.12.
8. Chern Han Yong, Osamu Maruyama, Limsoon Wong, Discovery of small protein complexes from PPI networks with size-specific supervised weighting, BMC Systems Biology, 8, S3-S3, 2014.12.
9. Osamu Maruyama, Shota Shikita, A scale-free structure prior for Bayesian inference of Gaussian graphical models, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014.11.
10. Chasanah Kusumastuti Widita, Osamu Maruyama, PPSampler2: Predicting Protein Complexes More Accurately and Efficiently by Sampling, BMC Systems Biology, 7, Suppl 6, S14, 2013.12, The problem of predicting sets of components of heteromeric protein complexes is a challenging problem in Systems Biology. There have been many tools proposed to predict those complexes. Among them, PPSampler, a protein complex prediction algorithm based on the Metropolis-Hastings algorithm, is reported to outperform other tools. In this work, we improve PPSampler by refining scoring functions and a proposal distribution used inside the algorithm so that predicted clusters are more accurate as well as the resulting algorithm runs faster. The new version is called PPSampler2. In computational experiments, PPSampler2 is shown to outperform other tools including PPSampler. The F-measure score of PPSampler2 is 0.67, which is at least 26% higher than those of the other tools. In addition, about 82% of the predicted clusters that are unmatched with any known complexes are statistically significant on the biological process aspect of Gene Ontology. Furthermore, the running time is reduced to twenty minutes, which is 1/24 of that of PPSampler.
11. Daisuke Tatsuke, Osamu Maruyama, Sampling Strategy for Protein Complex Prediction Using Cluster Size Frequency, Gene, Special issue of the 23rd International Conference on Genome Informatics (GIW), 2012.12, In this paper we propose a Markov chain Monte Carlo sampling method for predicting protein complexes from protein-protein interactions (PPIs). Many of the existing tools for this problem are designed more or less based on a density measure of a subgraph of the PPI network. This kind of measure is less effective for smaller complexes. On the other hand, it can be found that the number of complexes of a size in a database of protein complexes follows a power-law. Thus, most of the complexes are small-sized. For example, in CYC2008, a database of curated protein complexes of yeast, 42% of the complexes are heterodimeric, i.e., a complex consisting of two different proteins. In this work, we propose a protein complex prediction algorithm, called PPSampler (Proteins’ Partition Sampler), which is designed based on the Metropolis-Hastings algorithm using a parameter representing a target value of the relative frequency of the number of predicted protein complexes of a particular size. In a performance comparison, PPSampler outperforms other existing algorithms. Furthermore, about half of the predicted clusters that are not matched with any known complexes in CYC2008 are statistically significant by Gene Ontology terms. Some of them can be expected to be true complexes.
12. Osamu Maruyama, Heterodimeric Protein Complex Identification, ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011, 2011.08.
13. Osamu Maruyama and Ayaka Chihara, NWE: Node-Weighted Expansion for Protein Complex Prediction Using Random Walk Distances, Proc. IEEE International Conference on Bioinformatics & Biomedicine (IEEE BIBM 2010), 590-594, 2010.12.
14. Yukio Yasukochi, Osamu Maruyama, Milind C. Mahajan, Carolyn Padden, Ghia M. Euskirchen, Vincent Schulz, Hideki Hirakawa, Satoru Kuhara, Xing-Hua Pan, Peter E. Newburger, Michael Snyder, and Sherman M. Weissman, X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils, Proceedings of the National Academy of Sciences of the United States of America (PNAS), 107, 3704-3709, 2010.02.
15. Osamu Maruyama, Hideki Hirakawa, Takao Iwayanagi, Yoshiko Ishida, Shizu Takeda, Jun Otomo, Satoru Kuhara, Evaluating Protein Sequence Signatures Inferred from Protein-Protein Interaction Data by Gene Ontology Annotations, 2008 IEEE International Conference on Bioinformatics and Biomedicine, 417-420, 2008.11.
16. Osamu Maruyama, Akiko Matsuda, and Satoru Kuhara, Reconstructing phylogenetic trees of prokaryote genomes by randomly sampling oligopeptides, International Journal of Bioinformatics Research and Applications (IJBRA), 1(4), 429-446, 2005.11. (A preliminary version appeared in the Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Lecture Notes in Computer Science 3514-6, Springer-Verlag, II-911-918, 2005.)
17. Daichi Shigemizu and Osamu Maruyama, Searching for Regulatory Elements of Alternative Splicing Events Using Phylogenetic Footprinting, Proceedings of the 4th Workshop on Algorithms in Bioinformatics, Lecture Notes in Bioinformatics 3240, Springer-Verlag, 147-158, 2004.09.
18. Osamu Maruyama, Extensive Search for Discriminative Features of Alternative Splicing, Pacific Symposium on Biocomputing 2004, 54-65, 2004.01.
19. Osamu Maruyama, Finding optimal degenerate patterns in DNA sequences, Bioinformatics, 10.1093/bioinformatics/btg1079, 19 (Supplement 2), II206-II214, 2003.09.
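The data augmentation described in the CMIC abstract (paper 1 in the list above) can be illustrated with a minimal Python sketch. It assumes that each k-mer length is drawn uniformly from kmin to kmax and that the sequence is cut into consecutive, non-overlapping k-mers. The function names, the defaults kmin=3 and kmax=8, and the handling of the final short fragment are assumptions made for illustration; they are not taken from the paper or from the CMIC repository. Only the default N=1000 comes from the abstract.

```python
import random


def split_into_kmers(seq, kmin, kmax, rng):
    """Cut seq into consecutive k-mers, drawing each k uniformly from [kmin, kmax].

    Assumption: the last fragment may be shorter than kmin when the
    remaining bases run out.
    """
    kmers = []
    pos = 0
    while pos < len(seq):
        k = rng.randint(kmin, kmax)  # randint is inclusive on both ends
        kmers.append(seq[pos:pos + k])
        pos += k
    return kmers


def augment(seq, n=1000, kmin=3, kmax=8, seed=0):
    """Return n random variable-length k-mer sequences derived from one DNA sequence."""
    rng = random.Random(seed)
    return [split_into_kmers(seq, kmin, kmax, rng) for _ in range(n)]


if __name__ == "__main__":
    cgi = "ACGTCGCGCGTACGCGATCGCGCGTTAGCG"  # toy sequence, not real data
    for kmers in augment(cgi, n=3):
        print(" ".join(kmers))
```

Each of the n outputs concatenates back to the original sequence, so the augmentation changes only how the sequence is tokenized, not its content.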
Major Conference Presentations
1. Osamu Maruyama, Fumiko Matsuzaki, DegSampler3: Pairwise Dependency Model in Degradation Motif Site Prediction of Substrate Protein Sequences, 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019, 2019.10, [URL].
2. Osamu Maruyama, Fumiko Matsuzaki, DegSampler: Collapsed Gibbs Sampler for Detecting E3 Binding Sites, 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), 2018.12, In this paper, we address the problem of finding sequence motifs in substrate proteins specific to E3 ubiquitin ligases (E3s). We formulated a posterior probability distribution of sites by designing a likelihood function based on amino acid indexing and a prior distribution based on the disorderness of protein sequences. These designs are derived from known characteristics of E3 binding sites in substrate proteins. Then, we devise a collapsed Gibbs sampling algorithm for the posterior probability distribution called DegSampler. We performed computational experiments using 36 sets of substrate proteins specific to E3s and compared the performance of DegSampler with those of popular motif finders, MEME and GLAM2. The results showed that DegSampler was superior to the others in finding E3 binding motifs. Thus, DegSampler is a promising tool for finding E3 motifs in substrate proteins.
3. Osamu Maruyama, Limsoon Wong, Regularizing predicted complexes by mutually exclusive protein-protein interactions, International Symposium on Network Enabled Health Informatics, Biomedicine and Bioinformatics, HI-BI-BI 2015, 2015.08, [URL], Protein complexes are key entities in the cell responsible for various cellular mechanisms and biological processes. We propose here a method for predicting protein complexes from a protein-protein interaction (PPI) network, using information on mutually exclusive PPIs. If two interactions are mutually exclusive, they are not allowed to exist simultaneously in the same predicted complex. We introduce a new regularization term which checks whether predicted complexes are connected by mutually exclusive PPIs. This regularization term is added into the scoring function of our earlier protein complex prediction tool, PPSampler2. We show that PPSampler2 with mutually exclusive PPIs outperforms the original one. Furthermore, the performance is superior to well-known representative conventional protein complex prediction methods. Thus, it is effective to use mutual exclusiveness of PPIs in protein complex prediction.
4. Daisuke Tatsuke, Osamu Maruyama, Sampling Strategy for Protein Complex Prediction Using Cluster Size Frequency, The 23rd International Conference on Genome Informatics, 2012.12, [URL], In this paper we propose a Markov chain Monte Carlo sampling method for predicting protein complexes from protein-protein interactions (PPIs). Many of the existing tools for this problem are designed more or less based on a density measure of a subgraph of the PPI network. This kind of measure is less effective for smaller complexes. On the other hand, it can be found that the number of complexes of a size in a database of protein complexes follows a power-law. Thus, most of the complexes are small-sized. For example, in CYC2008, a database of curated protein complexes of yeast, 42% of the complexes are heterodimeric, i.e., a complex consisting of two different proteins. In this work, we propose a protein complex prediction algorithm, called PPSampler (Proteins’ Partition Sampler), which is designed based on the Metropolis-Hastings algorithm using a parameter representing a target value of the relative frequency of the number of predicted protein complexes of a particular size. In a performance comparison, PPSampler outperforms other existing algorithms. Furthermore, about half of the predicted clusters that are not matched with any known complexes in CYC2008 are statistically significant by Gene Ontology terms. Some of them can be expected to be true complexes.
5. Osamu Maruyama, Evaluating Protein Sequence Signatures Inferred from Protein-Protein Interaction Data by Gene Ontology Annotations, 2008 IEEE International Conference on Bioinformatics and Biomedicine, 2008.11, [URL].
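The constraint described in the abstract of presentation 3 above (two mutually exclusive interactions should not coexist in one predicted complex) can be illustrated with a small Python sketch. It only counts violations within a single complex. How PPSampler2 turns such information into a regularization term is not stated in the abstract, so the function name, the data layout, and the use of a plain count are assumptions made for illustration.

```python
def count_exclusive_violations(complex_proteins, exclusive_pairs):
    """Count pairs of mutually exclusive PPIs that both lie inside one complex.

    complex_proteins: set of protein names in a predicted complex.
    exclusive_pairs: iterable of (ppi_a, ppi_b), where each PPI is a
        frozenset of two protein names (hypothetical data layout).
    """
    members = set(complex_proteins)
    violations = 0
    for ppi_a, ppi_b in exclusive_pairs:
        # A PPI is inside the complex when both of its proteins are members.
        if ppi_a <= members and ppi_b <= members:
            violations += 1
    return violations


if __name__ == "__main__":
    # Toy example: interactions A-B and A-C are declared mutually exclusive.
    exclusive = [(frozenset({"A", "B"}), frozenset({"A", "C"}))]
    print(count_exclusive_violations({"A", "B", "C"}, exclusive))  # both PPIs inside
    print(count_exclusive_violations({"A", "B", "D"}, exclusive))  # only A-B inside
```

A scoring function could penalize each predicted complex in proportion to this count; that weighting is likewise an assumption, not a detail from the source.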
Works, Software, Databases, etc.
1. Osamu Maruyama, Yinuo Li, and Hiroki Narita, CMIC, 2022.06, https://github.com/maruyama-lab-design/CMIC.
2. Osamu Maruyama and Daisuke Tatsuke, PPSampler, 2013.06, PPSampler is a sampling algorithm for predicting protein complexes from a protein-protein interaction network.
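Academic Society Activities
Society Memberships
The Institute of Electronics, Information and Communication Engineers (IEICE)
International Society for Computational Biology
Japanese Society for Bioinformatics (JSBi)
Roles in Conferences, Meetings, Symposia, etc.
2019.10.28~2019.10.28, 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019, Session Chair.
2012.12.12~2012.12.14, GIW, Session Chair.
2010.12.13~2010.12.15, 2010 Annual Meeting of the Japanese Society for Bioinformatics, Session Chair.
2012.08.09~2012.08.09, 30th Meeting of the IPSJ Special Interest Group on Bioinformatics and Genomics, Session Chair.
2010.12.13~2010.12.15, 2010 Annual Meeting of the Japanese Society for Bioinformatics, Organizing Committee Member.
Participation in Editing of Society Journals, Magazines, and Books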
2023.08~2023.10, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), International, Reviewer.
2020.06~2020.06, 2020 Annual Meeting of the Japanese Society for Bioinformatics / 9th Joint Conference on Informatics in Biology, Medicine and Pharmacology (IIBMP2020), Domestic, Reviewer.
2020.06~2020.06, International Conference on Computational Systems-Biology and Bioinformatics (CSBio), International, Reviewer.
2019.06~2019.06, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), International, Reviewer.
2018.01~2018.12, The 9th IEEE International Conference on Awareness Science and Technology (iCAST 2018), International, Chair.
2018.01~2018.12, The 9th IEEE International Conference on Awareness Science and Technology (iCAST 2018), International, Reviewer.
2018.01~2018.12, Asia Pacific Bioinformatics Conference (APBC), International, Reviewer.
2017.01~2017.12, International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO), International, Reviewer.
2017.01~2017.12, International Conference on Bioinformatics (InCoB), International, Reviewer.
2015.01~2017.12, Workshop on Advances in Artificial Intelligence and Bioinformatics (IJCAI_BAI), International, Reviewer.
2015.01~2015.12, International Conference on Computational Systems-Biology and Bioinformatics (CSBio), International, Reviewer.
2013.01~2015.12, International Symposium on Network Enabled Health Informatics, Biomedicine and Bioinformatics (HI-BI-BI), International, Reviewer.
2016.01~2016.12, International Symposium on Network Analysis and Mining for Health Informatics, Biomedicine and Bioinformatics (Net-HI-BI-BI), International, Reviewer.
2013.01~2013.12, International Symposium on Network Analysis and Mining for Health Informatics, Biomedicine and Bioinformatics (Net-HI-BI-BI), International, Reviewer.
2017.01~2017.12, International Conference on Genome Informatics (GIW), International, Reviewer.
2012.01~2014.12, International Conference on Genome Informatics (GIW), International, Reviewer.
2004.01~2006.12, International Conference on Genome Informatics (GIW), International, Reviewer.
2007.01~2007.12, IEEE International Conference on Bioinformatics and Bioengineering (BIBE), International, Reviewer.
2007.01~2017.12, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), International, Reviewer.
2006.01~2016.12, International Workshop on Bioinformatics Research and Applications (ISBRA), International, Reviewer.
2006.03, Journal of Computational Intelligence in Bioinformatics (JCIB), International, Editorial Board Member.
2012.01~2012.01, International Conference on Genome Informatics (GIW), International, Reviewer.
2007.01~2007.01, International Symposium on Bioinformatics Research and Applications, International, Reviewer.
2007.01~2007.01, IEEE International Conference on Bioinformatics and Biomedicine, International, Reviewer.
2006.01~2006.01, International Workshop on Bioinformatics Research and Applications, International, Reviewer.
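Review of Academic Papers
Fiscal year, Foreign-language journal papers reviewed, Japanese-language journal papers reviewed, International conference papers reviewed, Domestic conference papers reviewed, Total
FY2022      
FY2021      
FY2019
FY2018     18    18 
FY2010
FY2009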
Other Research Activities
Overseas Travel, Overseas Education and Research Experience
University of the Philippines (UP) Diliman, University of the Philippines (UP) Manila, Philippines, 2020.02~2020.02.
ISMB2019, Switzerland, 2019.07~2019.07.
BIBE2019, Greece, 2019.10~2020.11.
University of the Philippines Manila, Ateneo de Manila University, Philippines, 2018.11~2018.11.
Queensland Univ., Australia, 2006.07~2007.09.
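Awards
Honorable Mention, 9th Telecom System Technology Student Award, The Telecommunications Advancement Foundation, 1994.03.
Research Funding
Grants-in-Aid for Scientific Research Awarded (MEXT, JSPS)
FY2021~FY2024, Grant-in-Aid for Scientific Research (B), Principal Investigator, Mathematical analysis and applications of the genome as a three-dimensional structural language.
FY2019~FY2023, Grant-in-Aid for Scientific Research (B), Co-Investigator, Commonality and diversity of neural representations related to the conscious sensory experience of color.
FY2018~FY2022, Grant-in-Aid for Specially Promoted Research, Co-Investigator, Elucidation of the molecular networks controlling oocyte developmental competence by multi-layer omics.
FY2017~FY2019, Grant-in-Aid for Scientific Research (C), Principal Investigator, Research on machine learning from heterogeneous biological datasets centered on mixed regularization modeling.
FY2014~FY2016, Grant-in-Aid for Scientific Research (C), Principal Investigator, Research on mixed regularization modeling and optimization sampling techniques for large-scale biological data.
FY2004~FY2006, Grant-in-Aid for Young Scientists (B), Principal Investigator, Construction of optimal pattern search algorithms for heterogeneous search spaces and their application to genome data.
FY2001~FY2002, Grant-in-Aid for Encouragement of Young Scientists (A), Principal Investigator, Research on methods for discovering DNA signal sequences through the creation and search of attributes.
FY1997~FY1998, Grant-in-Aid for Encouragement of Young Scientists (A), Principal Investigator, Research on the formulation and learning of graph formation rules for reconstructing graphs from local graph information.
Funding Awarded by JSPS (Other than Grants-in-Aid for Scientific Research)
FY2007~FY2007, Dispatch of Researchers to Specific Countries, Principal Investigator, Theoretical limits of motif discovery.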
