Faculty Profiles - OHSHIMA SATOSHI

Information

OHSHIMA SATOSHI

Organization

Research Institute for Information Technology Section of Advanced Computational Science Associate Professor
Joint Graduate School of Mathematics for Innovation （Concurrent）
Graduate School of Information Science and Electrical Engineering Department of Information Science and Technology（Concurrent）

Contact information

Homepage

https://lab.exth.net/
Ohshima Laboratory
https://exth.net/
Satoshi Ohshima's website

Research Areas

Informatics / High performance computing

Degree

Ph.D （ 2009.3 The University of Electro-Communications ）

Research History

Kyushu University Research Institute for Information Technology Associate Professor

2022.10 - Present
Nagoya University Information Technology Center Associate Professor

2019.7 - 2022.9
Kyushu University Research Institute for Information Technology Assistant Professor

2017.4 - 2019.6
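Ministry of Education, Culture, Sports, Science and Technology (MEXT), Research Promotion Bureau, Office for the Promotion of Computational Science and Technology under the Director for Information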

2014.4 - 2016.3
The University of Tokyo Information Technology Center Assistant Professor

2009.10 - 2017.3
The University of Electro-Communications Graduate School of Information Systems Academic Researcher

2009.4 - 2009.9

Education

The University of Electro-Communications Graduate School of Information Systems

2006.4 - 2009.3
The University of Electro-Communications Graduate School of Information Systems

2004.4 - 2006.3
The University of Electro-Communications Faculty of Electro-Communications, Department of Computer Science

2000.4 - 2004.3

Research Interests・Research Keywords

Research theme： Utilizing RT core of GPU on computational science

Keyword： GPU, RT core, computational science

Research period： 2021.4
Research theme： Communication avoiding CG method on various current parallel computer systems

Keyword： communication avoiding CG method

Research period： 2017.4 - 2020.12
Research theme： Parallel-in-Time integration

Keyword： Parallel-in-Time integration

Research period： 2017.4 - 2020.3
Research theme： Low-rank approximation method on GPU

Keyword： Low-rank approximation method, GPU

Research period： 2016.4
Research theme： Molecular dynamics simulation on multi-core and many-core processors

Keyword： Molecular dynamics simulation, multi-core CPU, many-core processor, Auto-Tuning

Research period： 2015.4 - 2020.12
Research theme： Auto-Tuning for parallel numerical calculation

Keyword： Auto-Tuning

Research period： 2009.10
Research theme： High-performance GPU computing

Keyword： GPU, GPGPU, GPU computing

Research period： 2004.5

Awards

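Yamashita SIG Research Award

2025.3 Information Processing Society of Japan (IPSJ)

Award type：Award from Japanese society, conference, symposium, etc.

A research award conferred by the Information Processing Society of Japan.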
Best Paper Award of PDSEC 2023

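2023.5 PDSEC committee The co-authored paper "Implementation of Radio Wave Propagation using RT Cores and Consideration of Programming Models", which describes research carried out with students at Nagoya University (previous position), received the Best Paper Award at PDSEC 2023, a workshop of the international conference IPDPS 2023.

A paper written and submitted mainly by a student supervised during the previous appointment at Nagoya University, together with the faculty members who co-supervised the student, received the Best Paper Award at the international workshop PDSEC.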

Papers

A Study on the Performance and Usability of Managed Memory and Unified Memory for Accelerating Numerical Calculation Program

2025.12

DOI： 10.1109/MCSoC67473.2025.00017
Large-Scale FMO-MP2 Calculations of the Spike Protein Droplet Model

Doi, H; Nakano, T; Sakakura, K; Akisawa, K; Okuwaki, K; Hirano, Y; Yamamoto, E; Yasuoka, K; Ohshima, S; Katagiri, T; Mochizuki, Y

JOURNAL OF COMPUTATIONAL CHEMISTRY 46 ( 4 ) e70052 2025.2 （ ISSN:0192-8651 eISSN:1096-987X ）

Language：English Publisher：Journal of Computational Chemistry

The spike protein of SARS-CoV-2 is a challenging target for theoretical approaches. Here we report a benchmark calculation of the spike protein droplet model by the fragment molecular orbital (FMO) at the second-order Møller-Plesset perturbation (MP2) level on the supercomputer Fugaku. One hundred structure samples from molecular dynamics (MD) simulations were used for both the closed and open forms of this protein (PDB IDs 6XLU and 6XM0 respectively). The number of total fragments is about 20,000, and the job time per structure was about 2 h on 8 racks of Fugaku.

DOI： 10.1002/jcc.70052

Web of Science

Scopus

PubMed
WaitIO-Hybrid: Communication for Coupling MPI Programs Among Heterogeneous Systems

Sumimoto S., Arakawa T., Sakaguchi Y., Matsuba H., Ohshima S., Yashiro H., Hanawa T., Nakajima K.

Lecture Notes in Computer Science 2025

DOI： 10.1007/978-981-96-4207-6_13
Accelerating Heterogeneous Coupling Computing with WaitIO Using RDMA

Sumimoto, S; Arakawa, T; Sakaguchi, Y; Matsuba, H; Ohshima, S; Yashiro, H; Hanawa, T; Nakajima, K

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION WORKSHOPS, HPC ASIA 2025 WORKSHOPS 1 - 10 2025 （ ISBN:979-8-4007-1342-2 ）

Publisher：Proceedings of International Conference on High Performance Computing in Asia Pacific Region Workshops HPC Asia 2025 Workshops

In this paper, we propose communication libraries WaitIO-Verbs and WaitIO-Tofu using RDMA communication to speed up communication performance in the h3-OpenSYS/WaitIO (WaitIO) library, which can connect multiple MPI programs across multiple heterogeneous systems. It is important to use industry-standard communication methods for communication between heterogeneous systems, and WaitIO implements WaitIO-Socket and WaitIO-File, which use POSIX-based specifications for socket and file IO. However, since POSIX specifications generally use system calls, there is a possibility that sufficient performance may not be obtained depending on the system. Therefore, to further speed up communication, we implemented WaitIO-Verbs and WaitIO-Tofu using user-level RDMA using industry-standard or default system communication specifications. As a result of implementation and evaluation, we achieved high communication performance and application performance. WaitIO achieved high application performance even between multiple heterogeneous clusters, which MPI could not achieve.

DOI： 10.1145/3703001.3724382

Web of Science

Scopus
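Accelerating Allreduce Across Heterogeneous Systems with WaitIO+MPI Hybrid

植野貴大, 住元真司, 中島研吾, 片桐孝洋, 大島聡史, 星野哲也, 河合直聡, 永井亨

IPSJ SIG Technical Reports (HPC-196) 2024.9
Performance Evaluation of the Supercomputer Genkai

大島聡史, 南里豪志, 美添一樹

IPSJ SIG Technical Reports (HPC-196) 2024.9
An Attempt to Improve GPU Computing Performance Through Multi-Process Execution

大島聡史, 伊田明弘, 河合直聡, 深谷猛, 横田理央, 山崎市太郎

29th Conference on Computational Engineering and Science 2024.6
Accelerating Allreduce Across Heterogeneous Systems with WaitIO+MPI Hybrid

佐賀一繁, 竹房あつ子, 合田憲人, 高倉弘喜, 栗本崇, 坂根栄作, 藤原一毅, 田中秀樹, 大島聡史, 山本啓二, 塙敏博

IPSJ SIG Technical Reports (HPC-194) 2024.5

Language：Japanese
Accelerating WaitIO Communication Using RDMA

住元真司, 荒川隆, 坂口吉生, 松葉浩也, 大島聡史, 八代尚, 塙敏博, 中島研吾

IPSJ SIG Technical Reports (HPC-194) 2024.5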
Xabclib: A Fully Auto-tuned Sparse Iterative Solver

ArXiv 2024

Web of Science
Development Status of ABINIT-MP in 2023

MOCHIZUKI Yuji, NAKANO Tatsuya, SAKAKURA Kota, OKUWAKI Koji, DOI Hideo, KATO Toshihiro, TAKIZAWA Hiroyuki, NARUSE Akira, OHSHIMA Satoshi, HOSHINO Tetsuya, KATAGIRI Takahiro

Journal of Computer Chemistry, Japan 23 ( 1 ) 4 - 8 2024 （ ISSN:13471767 eISSN:13473824 ）

Language：Japanese Publisher：Society of Computer Chemistry, Japan

In August 2023, we released the latest version of our ABINIT-MP program, Open Version 2 Revision 8. In this version, the most commonly used FMO-MP2 calculations are even faster than in the previous Revision 4. It is now also possible to calculate excitation and ionization energies for regions of interest. Improved interaction analysis is also available. In addition, we have started GPU-oriented modifications. In this preliminary report, we present the current status of ABINIT-MP.

DOI： 10.2477/jccj.2024-0001

CiNii Research

researchmap
Current Status and Future of the ABINIT-MP Program

MOCHIZUKI Yuji, NAKANO Tatsuya, SAKAKURA Kota, DOI Hideo, OKUWAKI Koji, KATO Toshihiro, TAKIZAWA Hiroyuki, OHSHIMA Satoshi, HOSHINO Tetsuya, KATAGIRI Takahiro

Journal of Computer Chemistry, Japan 23 ( 4 ) 85 - 97 2024 （ ISSN:13471767 eISSN:13473824 ）

Language：Japanese Publishing type：Research paper (scientific journal) Publisher：Society of Computer Chemistry, Japan

The fragment molecular orbital (FMO) program ABINIT-MP has a quarter-century history, and related research and development of the Open Version 2 series is currently underway. This paper first summarizes the current status of the latest Revision 8 (released on August 2023). It then describes future improvements and enhancements, including GPU support. The connection with coarse-grained simulation (dissipative particle dynamics) and the possibility of cooperation with quantum computation are also touched upon.

DOI： 10.2477/jccj.2024-0022

Web of Science

CiNii Research

researchmap
Adaptation of XAI to Auto-tuning for Numerical Libraries

Aoki S., Katagiri T., Ohshima S., Kawai M., Nagai T., Hoshino T.

Proceedings - 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2024 2024

DOI： 10.1109/MCSoC64144.2024.00095
Xabclib:A Fully Auto-tuned Sparse Iterative Solver.

Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima

CoRR abs/2405.01599 2024

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2405.01599

researchmap
Interim Report of the Survey Research on Resource Management for Next-Generation Computing Infrastructure

佐賀一繁, 竹房あつ子, 合田憲人, 高倉弘喜, 栗本崇, 坂根栄作, 藤原一毅, 田中秀樹, 大島聡史, 山本啓二, 塙敏博

IPSJ SIG Technical Reports (Web) 2024 ( HPC-194 ) 2024

J-GLOBAL

researchmap
Implementation and performance comparisons of radio wave propagation loss using various GPUs with ray tracing accelerators

イジュンヒョク, 大島聡史

IPSJ SIG Technical Reports (Web) 2024 ( HPC-195 ) 2024

J-GLOBAL

researchmap
Design of a Memory Registration Cache for RDMA-Enabled WaitIO

住元真司, 荒川隆, 坂口吉生, 八代尚, 大島聡史, 塙敏博, 中島研吾

IPSJ SIG Technical Reports (Web) 2024 ( ARC-259 ) 2024

J-GLOBAL

researchmap
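Introduction to the New Supercomputer System of the Research Institute for Information Technology, Kyushu University

大島聡史, 南里豪志, 美添一樹, 平島智将, 原田浩睦, 池田嗣穂

AXIES (Academic eXchange for Information Environment and Strategy) Annual Conference 2023 2023.12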

Language：Others
4D Reconstruction of PET Using GPU Supercomputer

OHSHIMA Satoshi, YUASA Yoshinao, MATSUMURA Kaito, YOKOTA Tatsuya, HONTANI Hidekata, SAKATA Muneyuki, KIMURA Yuichi, KATAGIRI Takahiro, NAGAI Toru, HANAWA Toshihiro, HOSHINO Tetsuya

Medical Imaging Technology 41 ( 4-5 ) 150 - 156 2023.11 （ ISSN:0288450X eISSN:21853193 ）

Language：Japanese Publisher：The Japanese Society of Medical Imaging Technology

With the development of medical imaging technology, various techniques have been developed and used to visually understand the inside of the living body. However, these technologies can only directly obtain images and videos, and diagnosis is still performed by human hands, such as physicians. There are great expectations for software that can reduce such labor, and an increasing number of technologies are already being used in the field of medicine, but the target is limited because they require knowledge and skills in both medicine (medical imaging) and computing technology. Therefore, in this study, researchers in the medical imaging and high-performance computing fields are collaborating to accelerate and scale up the image reconstruction of PET. This paper describes the details of this effort and the results obtained so far.

DOI： 10.11409/mit.41.150

CiNii Research
Bayesian phase difference estimation algorithm for direct calculation of fine structure splitting: accelerated simulation of relativistic and quantum many-body effects

Kenji Sugisaki, Srinivasa Prasannaa, Satoshi Ohshima, Takahiro Katagiri, Yuji Mochizuki, Bijaya Kumar Sahoo, Bhanu Pratap Das

Electronic Structure 5 ( 3 ) 2023.9 （ ISSN:2516-1075 eISSN:2516-1075 ）

Language：English Publishing type：Research paper (scientific journal) Publisher：IOP Publishing

Despite rapid progress in the development of quantum algorithms in quantum computing as well as numerical simulation methods in classical computing for atomic and molecular applications, no systematic and comprehensive electronic structure study of atomic systems that covers almost all of the elements in the periodic table using a single quantum algorithm has been reported. In this work, we address this gap by implementing the recently-proposed quantum algorithm, the Bayesian Phase Difference Estimation (BPDE) approach, to determine fine structure splittings of a wide range of boron-like atomic systems. Since accurate estimate of fine structure splittings strongly depend on the relativistic as well as quantum many-body effects, our study can test the potential of the BPDE approach to produce results close to the experimental values. Our numerical simulations reveal that the BPDE algorithm, in the Dirac–Coulomb–Breit framework, can predict fine structure splittings of ground states of the considered systems quite precisely. We performed our simulations of relativistic and electron correlation effects on Graphics Processing Unit by utilizing NVIDIA’s cuQuantum, and observe a ×42.7 speedup as compared to the Central Processing Unit-only simulations in an 18-qubit active space.

DOI： 10.1088/2516-1075/acf909

Web of Science

Scopus

researchmap

Other Link： https://iopscience.iop.org/article/10.1088/2516-1075/acf909/pdf
Large-Scale Acceleration of QR Factorization of BLR Matrices Using CUDA Fortran+MIG+UVM

大島聡史, 伊田明弘, 河合直聡, 横田理央, 山崎市太郎

IPSJ SIG Technical Reports (Web) 2023 ( HPC-190 ) 2023.7

Language：Others
A Case Study of Relativistic Quantum Chemistry Calculations Using the cuQuantum Quantum Simulator on "Flow" Type II

杉﨑研司, Prasannaa V. S, 大島聡史, 片桐孝洋, 森野慎也, 望月祐志, Sahoo B. K, Das B. P

Proceedings of the 28th Conference on Computational Engineering and Science 2023.6

Language：Others
An Adaptation of XAI to Auto-tuning for Numerical Calculation Library

青木将太, 片桐孝洋, 大島聡史, 永井亨, 星野哲也

Proceedings of the Conference on Computational Engineering and Science (edited by the Japan Society for Computational Engineering and Science) 28 904 - 907 2023.5

Language：Japanese
Implementation of Radio Wave Propagation using RT Cores and Consideration of Programming Models.

Shinya Hashinoki, Satoshi Ohshima, Takahiro Katagiri, Toru Nagai, Tetsuya Hoshino

IPDPS Workshops 673 - 681 2023.5

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1109/IPDPSW59300.2023.00115
QR Factorization of Block Low-Rank Matrices on Multi-Instance GPU Reviewed International journal

Satoshi Ohshima, Akihiro Ida, Rio Yokota, Ichitaro Yamazaki

Parallel and Distributed Computing, Applications and Technologies 2023.4

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-29927-8_28
QR Factorization of Block Low-Rank Matrices on Multi-instance GPU Reviewed

Satoshi Ohshima, Akihiro Ida, Rio Yokota, Ichitaro Yamazaki

Parallel and Distributed Computing, Applications and Technologies 13798 LNCS 359 - 369 2023.4 （ ISSN:0302-9743 ISBN:9783031299261, 9783031299278 eISSN:1611-3349 ）

Language：Others Publishing type：Part of collection (book) Publisher：Springer Nature Switzerland

The QR factorization, which is a fundamental operation in linear algebra, is used extensively in scientific simulations. The acceleration and memory reduction of it are important research targets. QR factorization using block low-rank matrices (BLR-QR) has previously been proposed to address this issue. In this study, we consider its implementation on a GPU. Current CPUs and GPUs have numerous computational cores and the performance consists of the total performance of them. Therefore, the degree of parallelism of the target calculation is important for obtaining high performance. By contrast, many applications, including BLR-QR, do not have sufficient parallelism. Batched computation has attracted attention for achieving high performance in such calculations. However, the use of it requires major code rewriting and is extremely laborious. Thus, we propose the use of the multi-instance GPU (MIG) feature of current GPUs. Using MIG, we succeeded in obtaining a 53.3% time reduction over the CPU and 77.6% over the GPU without MIG. From the above result, we succeeded in demonstrating rapid implementation of BLR-QR on MIG and usefulness of MIG.

DOI： 10.1007/978-3-031-29927-8_28

Scopus

researchmap
Autotuning Power Consumption and Computation Accuracy using ppOpen-AT.

Shouhei Yamanashi, Hisashi Yashiro, Takahiro Katagiri, Toru Nagai, Satoshi Ohshima

MCSoC 208 - 215 2023.1

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1109/MCSoC57363.2022.00041
Parallelization of Automatic Tuning for Hyperparameter Optimization of Pedestrian Route Prediction Applications using Machine Learning.

Sorataro Fujika, Yuga Yajima, Teruo Tanaka, Akihiro Fujii, Yuka Kato, Satoshi Ohshima, Takahiro Katagiri

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region 96 - 105 2023 （ ISBN:9781450398053 ）

Publishing type：Research paper (international conference proceedings) Publisher：ACM

We study software automatic tuning. Automatic tuning tools using iterative one-dimensional search estimate hyperparameters of machine learning programs. Iterative one-dimensional search searches the parameter space consisting of possible values of the parameters to be tuned by repeatedly measuring and evaluating the target program. Since it takes time to train a machine learning program, estimating the optimal hyperparameters is time-consuming. Therefore, we propose a method to reduce the time required for automatic tuning by parallelization of iterative one-dimensional search. For parallelization, we use multiple job execution on a supercomputer that can utilize multiple GPUs, which is effective for machine learning. In this method, each job measures different hyperparameters. The next search point is determined by referring to the data obtained from each job. The target program is a pedestrian path prediction application. This program predicts future routes and arrival points based on past pedestrian trajectory data. The program is intended to be used in a variety of locations, and the locations and movement patterns will vary depending on the dataset used for training. We hypothesized that the estimation results of one dataset could be used for automatic tuning of another dataset, thereby reducing the time required for automatic tuning. Experimental results confirm that the parallelized iterative one-dimensional search reduces the estimation time from 89.5 hours to 4 hours compared to the sequential search. We also show that the iterative one-dimensional search efficiently investigates the point at which the performance index improves. Moreover, the hyperparameters estimated for one data set are used as the initial point for the search and automatic tuning for another data set. Compared to the results of automatic tuning with the currently used hyperparameters as the initial values, both the number of executions and execution time were reduced.

DOI： 10.1145/3578178.3578235

Scopus

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2023.html#FujikaYTFKOK23
Autotuning by Changing Directives and Number of Threads in OpenMP using ppOpen-AT.

Toma Sakurai, Satoshi Ohshima, Takahiro Katagiri, Toru Nagai

CoRR abs/2312.05779 2023

Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2312.05779

researchmap
Direct Computational Methods for Fine-structure Splitting by Quantum Computers and Accelerated Numerical Simulation by GPUs

杉崎研司, PRASANNAA V. S., 大島聡史, 片桐孝洋, 望月祐志, SAHOO B. K., DAS B. P.

Extended Abstracts of the JSAP Spring Meeting (CD-ROM) 70th 2023 （ ISSN:2758-4704 ）

J-GLOBAL

researchmap
mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.

Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hill Hiroki Kobayashi, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuké Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi

DASC/PiCom/CBDCom/CyberSciTech 1 - 7 2022.12 （ ISBN:978-1-6654-6297-6 ）

Language：Others Publishing type：Research paper (other academic) Publisher：Proceedings of the 2022 IEEE International Conference on Dependable Autonomic and Secure Computing International Conference on Pervasive Intelligence and Computing International Conference on Cloud and Big Data Computing International Conference on Cyber Science and Technology Congress Dasc Picom Cbdcom Cyberscitech 2022

The growing amount of data and advances in data science have created a need for a new kind of cloud platform that provides users with flexibility, strong security, and the ability to couple with supercomputers and edge devices through high-performance networks. We have built such a nation-wide cloud platform, called "mdx" to meet this need. The mdx platform's virtualization service, jointly operated by 9 national universities and 2 national research institutes in Japan, launched in 2021, and more features are in development. Currently mdx is used by researchers in a wide variety of domains, including materials informatics, geo-spatial information science, life science, astronomical science, economics, social science, and computer science. This paper provides an overview of the mdx platform, details the motivation for its development, reports its current status, and outlines its future plans.

DOI： 10.1109/DASC/PiCom/CBDCom/Cy55231.2022.9927975

Web of Science

Scopus

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/dasc/dasc2022.html#SuzumuraSTINTKH22
Image-based Analysis for Turbulence Simulation Data of Magnetic Confined Plasmas (Special Topic: Recent Progress in Plasma and Fusion Simulation Research)

定方翼, 沼波政倫, 片桐孝洋, 大島聡史, 永井亨

Journal of the Japan Society for Simulation Technology (edited by the Japan Society for Simulation Technology) 41 ( 4 ) 228 - 233 2022.12

Language：Japanese
Development Status of ABINIT-MP in 2022 Invited Reviewed

MOCHIZUKI Yuji, NAKANO Tatsuya, SAKAKURA Kota, WATANABE Hiromasa, SATO Shinya, OKUWAKI Koji, AKISAWA Kazuki, DOI Hideo, OHSHIMA Satoshi, KATAGIRI Takahiro

J. Comp. Chem. Jpn. 21 ( 4 ) 106 - 110 2022.12 （ ISSN:13471767 eISSN:13473824 ）

Language：Japanese Publisher：Society of Computer Chemistry, Japan

We have been developing the ABINIT-MP program for fragment molecular orbital (FMO) calculations for over 20 years. Several improvements for accelerated processing were made after the release of Open Version 2 Revision 4 in September 2021. Functionalities were enhanced as well. In this short report, we summarize such developments toward the next release of Revision 8.

DOI： 10.2477/jccj.2022-0037

CiNii Research

researchmap
Proposal of a Code Transformation Framework for OpenMP/OpenACC Hybrid Parallelization

川崎真之, 大島聡史, 八巻隼人, 三輪忍, 本多弘樹

IPSJ SIG Technical Reports 2022-HPC-187 ( 8 ) 1 - 7 2022.11
WaitIO-Hybrid: An Inter-System Communication Library That Can Use a Shared File System and Sockets Together

住元真司, 荒川隆, 坂口吉生, 松葉浩也, 八代尚, 大島聡史, 塙敏博, 中島研吾

IPSJ SIG Technical Reports 2022-HPC-187 ( 6 ) 1 - 8 2022.11
QR Factorization of BLR Matrices on Multi-Instance GPU

大島聡史, 伊田明弘, 横田理央, 山崎市太郎

Proceedings of the JSIAM Annual Meeting (CD-ROM) 2022 2022.9

Language：Others
A Novel Approach for Data Analysis Based on Visualization of Phase Space Distribution Function in Plasma Turbulence Simulations

Tsubasa SADAKATA, Shuta KITAZAWA, Masanori NUNAMI, Takahiro KATAGIRI, Satoshi OHSHIMA, Toru NAGAI

Plasma and Fusion Research 17 2403079 - 2403079 2022.6 （ eISSN:1880-6821 ）

Language：Others Publishing type：Research paper (scientific journal) Publisher：Japan Society of Plasma Science and Nuclear Fusion Research

Gyrokinetic simulations are important for analyzing magnetically confined plasmas. However, the data obtained from the gyrokinetic simulations are time-series of a five-dimensional phase space distribution function, making analyzing the transport phenomena extremely difficult because of its high dimensionality and large data size. We propose a novel method for analyzing such phase space distribution functions. First, the two-dimensional velocity space distribution function is mapped into the wavenumber space and visualized as an image. This enables us to easily capture the global features and the features of the individual velocity space distribution functions. Second, we apply similarity analysis based on the local features of images and cluster analysis based on distances between images and the velocity space distribution function. The proposed method enables us to automatically extract similar structures in the velocity space distribution function and quantify the duration of these structures.

DOI： 10.1585/pfr.17.2403079

Web of Science

Scopus

researchmap
Adaptation and Evaluation of XAI to Performance Auto-tuning on an Accurate Precision Matrix-Matrix Library

片桐孝洋, 青木将太, 大島聡史, 永井亨

Proceedings of the Conference on Computational Engineering and Science (edited by the Japan Society for Computational Engineering and Science) 27 548 - 551 2022.6

Language：Japanese
mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.

Toyotaro Suzumura, Akiyoshi Sugiki, Hiroyuki Takizawa, Akira Imakura, Hiroshi Nakamura, Kenjiro Taura, Tomohiro Kudoh, Toshihiro Hanawa, Yuji Sekiya, Hill Hiroki Kobayashi, Shin Matsushima, Yohei Kuga, Ryo Nakamura, Renhe Jiang, Junya Kawase, Masatoshi Hanai, Hiroshi Miyazaki, Tsutomu Ishizaki, Daisuké Shimotoku, Daisuke Miyamoto, Kento Aida, Atsuko Takefusa, Takashi Kurimoto, Koji Sasayama, Naoya Kitagawa, Ikki Fujiwara, Yusuke Tanimura, Takayuki Aoki, Toshio Endo, Satoshi Ohshima, Keiichiro Fukazawa, Susumu Date, Toshihiro Uchibayashi

CoRR abs/2203.14188 2022.5

Language：Others Publishing type：Research paper (scientific journal)

DOI： 10.48550/arXiv.2203.14188

researchmap
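Performance Evaluation of MNIST Training Using Mesh Tensorflow

鵜野圭介, 大島聡史, 片桐孝洋, 永井亨

Proceedings of the 84th National Convention of IPSJ 2022 ( 1 ) 65 - 66 2022.2

Language：Japanese

Mesh Tensorflow is a language for distributed deep learning proposed in 2019. Although the software is characterized by training large-scale neural networks with model parallelism, it can be used only for neural networks with a BERT-type structure, and few performance evaluations have been reported. In this study, we therefore evaluate the performance of the Mesh Tensorflow sample code on a GPU supercomputer. Because the code has parameters such as batch size and layout rules, the evaluation takes them into account.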

CiNii Books

CiNii Research

researchmap
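Performance Evaluation of a Fragment Molecular Orbital Calculation Program on A64FX

満田晴紀, 片桐孝洋, 坂倉耕太, 中野達也, 望月祐志, 大島聡史, 永井亨

Proceedings of the 84th National Convention of IPSJ 2022 ( 1 ) 63 - 64 2022.2

Language：Japanese

We present the results of a performance evaluation of ABINIT-MP, software for fragment molecular orbital (FMO) calculations, on the Type I subsystem (CPU: Arm A64FX) of the supercomputer "Flow" provided by the Information Technology Center, Nagoya University.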

CiNii Books

CiNii Research

researchmap
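A Study on Applying CMOS Annealing to Support Vector Machines

福原諒河, 森下誠, 片桐孝洋, 大島聡史, 永井亨

Proceedings of the 84th National Convention of IPSJ 2022 ( 1 ) 69 - 70 2022.2

Language：Japanese

In this presentation, we examine the application of CMOS annealing to the support vector machine (SVM). An SVM is a method for constructing a two-class pattern classifier using linear input elements. The parameters of the linear input elements are learned from training samples under the criterion of finding the maximum-margin hyperplane, which maximizes the distance to each data point. This requires tuning hyperparameters that govern the tolerance for misclassification and the generalization performance; in this study, we tune them using cross-entropy loss as the evaluation metric. The CMOS annealing machine, on the other hand, is a non-von Neumann computer that makes annealing technology, which quickly finds optimal solutions to combinatorial optimization problems, usable at room temperature. We apply it to a two-class classification problem and compare it with the SVM.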
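Implementation of Radio Wave Propagation Loss Calculation Using RT Cores and Consideration of Programming Models

Ohshima Satoshi

IPSJ SIG Technical Reports: High Performance Computing (HPC) 2022-HPC-185 1 - 12 2022

CiNii Research
Evaluation of a GPU Offloading Method Using the Fortran Standard do concurrent

星野哲也, 塙敏博, 大島聡史

IPSJ SIG Technical Reports (Web) 2022-HPC-183 1 - 8 2022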

CiNii Research
Installation and trial use of AlphaFold2 on Flow Type II supercomputer

望月祐志, 大島聡史, 森脇由隆, 奥脇弘次, 秋澤和輝, 北原駿, 片桐孝洋

Extended Abstracts of the JSAP Spring Meeting (CD-ROM) 69th 2022 （ ISSN:2758-4704 ）

J-GLOBAL

researchmap
An Attempt to Apply XAI to Auto-Tuning: A Case Study with an Accuracy-Guaranteed Library

片桐孝洋, 青木将太, 大島聡史, 永井亨

Proceedings of the JSIAM Annual Meeting (CD-ROM) 2022 2022 （ ISSN:1345-3378 ）

J-GLOBAL

researchmap
Performance Evaluation of the x-means Clustering with Hybrid MPI-OpenMP Parallelization

定方翼, 沼波政倫, 片桐孝洋, 大島聡史, 永井亨

IPSJ SIG Technical Reports (Web) 2022 ( HPC-187 ) 2022

J-GLOBAL

researchmap
Parallelization of GKV benchmark using OpenACC.

Makoto Morishita, Satoshi Ohshima, Takahiro Katagiri, Toru Nagai

IEEE International Parallel and Distributed Processing Symposium Workshops 723 - 729 2021.6

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1109/IPDPSW52791.2021.00109
An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL.

Naruya Kitai, Daisuke Takahashi, Franz Franchetti, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai

IEEE International Parallel and Distributed Processing Symposium Workshops 789 - 797 2021.6

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1109/IPDPSW52791.2021.00117
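Auto-Tuning of Machine Learning Hyperparameters in a Multi-GPU Environment (1)

多部田敏樹, 藤家空太郎, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

Proceedings of the 83rd National Convention of IPSJ 2021 ( 1 ) 41 - 42 2021.3

Language：Japanese

We have proposed iterative one-dimensional search in the parameter space as a method for estimating multiple parameters simultaneously. The method automatically selects a combination of parameters, measures its execution performance, and then repeats the process with another combination. We apply the proposed method to a machine learning program. Machine learning involves multiple hyperparameters, and estimating an appropriate combination takes time. In this study, we estimate an appropriate combination of hyperparameters for the machine learning used in a pedestrian route prediction application and show that, by parallelizing the measurements in a multi-GPU environment, an estimation that takes about 15 days completes in about 12 hours.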
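Auto-Tuning of Machine Learning Hyperparameters in a Multi-GPU Environment (2)

藤家空太郎, 多部田敏樹, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

Proceedings of the 83rd National Convention of IPSJ 2021 ( 1 ) 43 - 44 2021.3

Language：Japanese

We are working on auto-tuning based on iterative one-dimensional search and are optimizing the hyperparameters of machine learning programs in a multi-GPU environment. In machine learning, even with identical hyperparameters, the results differ from run to run because, for example, the training data changes each time, so the auto-tuning results fluctuate. To address this fluctuation, we have previously proposed a method that improves the stability of auto-tuning by performing additional measurements for the estimated parameters. In this study, we apply the method to the machine learning program used in a pedestrian route prediction application and show how auto-tuning accuracy improves when the additional measurements of the hyperparameter values estimated in a multi-GPU environment are parallelized and run together multiple times.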
A Study of Auto-Tuning for Selecting Sparse Matrix Operation Implementations in Accurate Matrix-Matrix Multiplication

青木将太, 片桐孝洋, 大島聡史, 永井亨

83rd National Convention of IPSJ 1 - 2 2021.3

Language：Others
An Investigation of the Applicability of Combinatorial Optimization Problems on Quantum Annealing Machines

大山基樹, 森下誠, 片桐孝洋, 大島聡史, 永井亨

83rd National Convention of IPSJ 1 - 2 2021.3

Language：Others
Auto-Tuning of Machine Learning Hyperparameters in a Multi-GPU Environment (1)

多部田敏樹, 藤家空太郎, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

83rd National Convention of IPSJ 1 - 2 2021.3

Language：Others
Auto-Tuning of Machine Learning Hyperparameters in a Multi-GPU Environment (2)

藤家空太郎, 多部田敏樹, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

83rd National Convention of IPSJ 1 - 2 2021.3

Language：Others
Performance Evaluation of Hardware Ray Tracing Using RT Cores

枦木慎也, 大島聡史, 片桐孝洋, 永井亨

83rd National Convention of IPSJ 1 - 2 2021.3

Language：Others
Applying Auto-Tuning Parallelized on a GPU Cluster to Machine Learning Programs and Verifying Its Stability

藤家空太郎, 多部田敏樹, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

IPSJ SIG Technical Reports (HPC-178) 1 - 8 2021.3

Language：Others
Adaptation of A64 Scalable Vector Extension for Spiral

Naruya Kitai, Daisuke Takahashi, Franz Franchetti, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai

IPSJ SIG Technical Reports (HPC-178) 1 - 6 2021.3

Language：English
Performance Evaluation of OpenFOAM on the Supercomputer "Flow"

大島聡史, 今野雅

Open CAE and FrontISTR Joint Symposium 2020 1 - 10 2020.12

Language：Others
System Configuration and Performance of the Supercomputer "Flow"

大島聡史, 永井亨, 片桐孝洋

AXIES Annual Conference 2020 1 - 8 2020.12

Language：Others
Services and Ecosystem of the Supercomputer "Flow"

田島嘉則, 山田一成, 高橋一郎, 毛利晃大, 片桐孝洋, 大島聡史, 永井亨

AXIES Annual Conference 2020 1 - 4 2020.12

Language：Others
Building a Cold Storage System Using an Optical Disc Library on the Supercomputer "Flow"

高橋一郎, 大島聡史, 片桐孝洋

AXIES Annual Conference 2020 1 - 6 2020.12

Language：Others
Numerical Analysis of Flow Around a NACA0015 Hydrofoil Using a Custom Cavitation Model

池田拓士, 秋山善克, 今野雅, 大島聡史

Open CAE and FrontISTR Joint Symposium 2020 1 - 2 2020.12

Language：Others
Implementation of a Custom Cavitation Model in OpenFOAM

秋山善克, 池田拓士, 今野雅, 大島聡史

Open CAE and FrontISTR Joint Symposium 2020 1 - 3 2020.12

Language：Others
Mixing Flow Analysis of LNG of Different Densities in an LNG Tank

田村守淑, 今野雅, 大島聡史

Open CAE and FrontISTR Joint Symposium 2020 1 - 7 2020.12

Language：Others
Performance Evaluation of Accurate Matrix-Matrix Multiplication on GPU Using Sparse Matrix Multiplications Reviewed

Fumiya Ishiguro, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai

2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW) 178 - 184 2020.11

　More details

Language：English Publishing type：Research paper (other academic)

DOI： 10.1109/CANDARW51189.2020.00044
自動チューニング言語ppOpen-ATによる混合精度演算の最適化機能について

片桐孝洋, 山梨祥平, 八代尚, 大島聡史, 永井亨

日本応用数理学会年会講演予稿集(CD-ROM) 2020 2020.9

　More details

Language：Others
名古屋大学スーパーコンピュータ「不老」における医用画像処理

大島聡史, 小田昌宏, 片桐孝洋, 森健策

電子情報通信学会技報 MI2020-32(2020-09) 120 ( 156(MI2020 17-32) ) 69 - 74 2020.9

　More details

Language：Others
Parareal法における低精度演算・混合精度演算の活用

大島聡史, 飯塚幹夫, 小野謙二

日本応用数理学会 2020年年会 2020 64 - 65 2020.9

　More details

Language：Others
医用画像処理におけるLDDMMのマルチGPU高速化

杉浦拓未, 大島聡史, 片桐孝洋, 横田達也, 本谷秀堅, 永井亨

情報処理学会研究報告(HPC-175) 2020 ( HPC-175 ) 1 - 7 2020.7

　More details

Language：Others
スーパーコンピュータ「不老」の性能評価

大島聡史, 永井亨, 片桐孝洋

情報処理学会研究報告(HPC-175) 2020 ( HPC-175 ) 1 - 10 2020.7

　More details

Language：Others
Parareal法における低精度計算・混合精度計算の活用について

大島聡史, 飯塚幹夫, 小野謙二

第25回計算工学講演会 25 1 - 2 2020.6

　More details

Language：Others

Utilization of Low-precision and Mixed-precision Calculation in Parareal Method
Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures Reviewed

Kenji Ono, Toshihiro Kato, Satoshi Ohshima, Takeshi Nanri

HPCAsia2020: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region 11 - 21 2020.1

　More details

Language：English Publishing type：Research paper (other academic)

In the present paper, we propose an efficient direct-iterative hybrid solver for sparse matrices that can derive the scalability of the latest multi-core, many-core, and vector architectures and examine the execution performance of the proposed SLOR-PCR method. We also present an efficient implementation of the PCR algorithm for SIMD and vector architectures so that it is easy to output instructions optimized by the compiler. The proposed hybrid method has high cache reusability, which is favorable for modern low B/F architecture because efficient use of the cache can mitigate the memory bandwidth limitation. The measured performance revealed that the SLOR-PCR solver showed excellent scalability up to 352 cores on the cc-NUMA environment, and the achieved performance was higher than that of the conventional Jacobi and Red-Black ordering method by a factor of 3.6 to 8.3 on the SIMD architecture. In addition, the maximum speedup in computation time was observed to be a factor of 6.3 on the cc-NUMA architecture with 352 cores.

DOI： 10.1145/3368474.3368484
3次元医用画像データの再構成処理の並列化

中島大地, 大島聡史, 五嶋優詞, 横田達也, 片桐孝洋, 本谷秀堅, 永井亨, 岩本千佳, 大内田研宙, 橋爪誠

情報処理学会研究報告(Web) 2019 ( HPC-172 ) 1 - 7 2019.12

　More details

Language：Japanese
Optimization of Numerous Small Dense-Matrix-Vector Multiplications in H-matrix Arithmetic on GPU Reviewed

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) 9 - 16 2019.10

　More details

Language：English Publishing type：Research paper (other academic)

DOI： 10.1109/MCSoC.2019.00009
GPUによる階層型行列計算法の高速化に向けた多数の小密行列ベクトル積計算の最適化

大島聡史, 山崎市太郎, 伊田明弘, 横田理央

日本応用数理学会年会講演予稿集(CD-ROM) 2019 180‐181 2019.9

　More details

Language：Japanese

Performance evaluation of the MODYLAS application on modern multi-core and many-core environments Reviewed International journal

Satoshi Ohshima, Soichiro Suzuki, Tatsuya Sakashita, Masao Ogino, Takahiro Katagiri, Yoshimichi Andoh

In Proceedings of IPDPSW2019 2019.8

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Performance evaluation of the MODYLAS application on modern multi-core and many-core environments Reviewed

Satoshi Ohshima, Soichiro Suzuki, Tatsuya Sakashita, Masao Ogino, Takahiro Katagiri, Yoshimichi Andoh

IPDPSW2019 2019.5

　More details

Language：English Publishing type：Research paper (other academic)

高精度行列-行列積のためのBatched BLASおよび疎行列演算を用いた実装方式のGPU環境での性能評価

石黒史也, 片桐孝洋, 大島聡史, 永井亨, 荻野正雄

日本応用数理学会年会講演予稿集(CD-ROM) 2018 147‐148 2018.9

　More details

Language：Japanese

マルチコア・メニーコア計算機環境におけるChebyshev基底通信削減CG法の性能評価

大島聡史, 藤井昭宏, 田中輝雄, 深谷猛, 須田礼仁

日本応用数理学会年会講演予稿集(CD-ROM) 2018 471‐472 2018.9

　More details

Language：Japanese

512bit SIMD環境における分子動力学アプリケーションMODYLASの性能評価

大島聡史, 鈴木惣一朗, 坂下逹哉, 荻野正雄, 片桐孝洋, 安藤嘉倫

情報処理学会研究報告(Web) 2018 ( HPC-166 ) Vol.2018‐HPC‐166,No.14,1‐9 (WEB ONLY) 2018.9

　More details

Language：Japanese

512bit SIMD環境における分子動力学アプリケーションMODYLASの性能評価

Satoshi Ohshima, Soichiro Suzuki, Tatsuya Sakashita, Masao Ogino, Takahiro Katagiri

研究報告ハイパフォーマンスコンピューティング（HPC） 166 ( 14 ) 1 - 9 2018.9

　More details

Language：Japanese Publishing type：Research paper (scientific journal)
Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters

Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack Dongarra

32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018 930 - 939 2018.8

　More details

Language：English Publishing type：Research paper (other academic)

HACApK is a software package for solving dense linear systems of equations and is used in other software packages, like ppohBEM for solving boundary integral equations. To enable the solution of large-scale boundary value problems, HACApK hierarchically compresses the coefficient matrix and uses the BiConjugate Gradient Stabilized (BiCGStab) method for solving the linear system. To extend HACApK's capability, this paper outlines how we ported the HACApK linear solver onto GPU clusters. Though the potential of GPUs has been widely accepted in high-performance computing, it is still a challenge to utilize the GPUs for a solver, like HACApK, that requires fine-grained irregular computation and global communication. To utilize the GPUs, we integrated the variable-size batched GPU kernel that was recently released in the MAGMA software package. This is the first time the variable-size batched kernels were used in a solver or application code. We discuss several techniques to improve the performance of the batched kernel and demonstrate the effects of these techniques on two state-of-the-art GPU clusters. For instance, with two 14-core Intel Xeon CPUs and four NVIDIA P100 GPUs per node, the GPU kernel obtained a solver speedup of 8× on one node and 4× on eight nodes. We also show that when the inter-GPU communication becomes significant, the solution time can be further reduced by a factor of 2× by carefully designing the communication layer with the underlying node architecture in mind.

DOI： 10.1109/IPDPS.2018.00102
Chebyshev基底通信削減CG法のマルチコア・メニーコア計算環境における性能評価

Satoshi Ohshima, Akihiro Fujii, Teruo Tanaka, Takeshi Fukaya, Reiji Suda

研究報告ハイパフォーマンスコンピューティング（HPC） 165 ( 17 ) 1 - 9 2018.7

　More details

Language：Japanese Publishing type：Research paper (scientific journal)
Mellanox社のスイッチ装置への集団通信オフロード機能による集団通信隠蔽効果の調査

南里豪志, 大島聡史, 小野謙二

情報処理学会研究報告(Web) 2018 ( HPC-165 ) Vol.2018‐HPC‐165,No.12,1‐10 (WEB ONLY) 2018.7

　More details

Language：Japanese

GPGPUによる高精度行列-行列積アルゴリズムのためのBatched BLASを用いた実装方式の提案

石黒史也, 片桐孝洋, 大島聡史, 永井亨, 荻野正雄

情報処理学会研究報告(Web) 2018 ( HPC-165 ) Vol.2018‐HPC‐165,No.32,1‐8 (WEB ONLY) 2018.7

　More details

Language：Japanese

Chebyshev基底通信削減CG法のマルチコア・メニーコア計算環境における性能評価

大島聡史, 藤井昭宏, 田中輝雄, 深谷猛, 須田礼仁

情報処理学会研究報告(Web) 2018 ( HPC-165 ) Vol.2018‐HPC‐165,No.17,1‐9 (WEB ONLY) 2018.7

　More details

Language：Japanese

Mellanox社のスイッチ装置への集団通信オフロード機能による集団通信隠蔽効果の調査

Takeshi Nanri, Satoshi Ohshima, Kenji Ono

研究報告ハイパフォーマンスコンピューティング（HPC） 165 ( 12 ) 1 - 10 2018.7

　More details

Language：Japanese Publishing type：Research paper (scientific journal)
GPGPUによる高精度行列-行列積アルゴリズムのためのBatched BLASを用いた実装方式の提案

Fumiya Ishiguro, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai, Masao Ogino

研究報告ハイパフォーマンスコンピューティング（HPC） 165 ( 32 ) 1 - 8 2018.7

　More details

Language：Japanese Publishing type：Research paper (scientific journal)
階層型行列計算におけるソフトウェア自動チューニング

大島聡史, 山崎市太郎, 伊田明弘, 横田理央

計算工学講演会論文集(CD-ROM) 23 ROMBUNNO.D‐05‐03 2018.6

　More details

Language：Japanese

Software Auto-Tuning for Hierarchical Matrix Computation
A thread-level parallelization of pairwise additive potential and force calculations suitable for current many-core architectures Reviewed

Yoshimichi Andoh, Soichiro Suzuki, Satoshi Ohshima, Tatsuya Sakashita, Masao Ogino, Takahiro Katagiri, Noriyuki Yoshii, Susumu Okazaki

Journal of Supercomputing 74 ( 6 ) 2449 - 2469 2018.6

　More details

Language：English Publishing type：Research paper (scientific journal)

In molecular dynamics (MD) simulations, calculations of potentials and their derivatives by coordinate, i.e., forces, in a pairwise additive manner such as the Lennard–Jones interactions and a short-range part of the Coulombic interactions form the main part of arithmetic operations. It is essential to achieve high thread-level parallelization efficiency of these pairwise additive calculations of potentials and forces to use current supercomputers with many-core architectures effectively. In this paper, we propose four new thread-level parallelization algorithms for the pairwise additive potential and force calculations. We implement the four codes in a MD calculation code based on the fast multipole method. Performance benchmarks were taken on the FX100 supercomputer and Intel Xeon Phi coprocessor. The code succeeds in achieving high thread-level parallelization efficiency with 32 threads on the FX100 and up to 60 threads on the Xeon Phi.
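
As a point of reference for the pairwise additive calculation described above, the following is a minimal serial sketch of Lennard-Jones potential and force evaluation. The function name `lennard_jones_forces` and the reduced units (`epsilon = sigma = 1`) are assumptions for illustration; this is not one of the four parallel algorithms proposed in the paper.

```python
def lennard_jones_forces(positions, epsilon=1.0, sigma=1.0):
    """Return (total potential, per-particle forces) for a list of 3D positions.

    U(r) = 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6), summed over pairs.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    potential = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Displacement vector from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(x * x for x in d)
            sr6 = (sigma * sigma / r2) ** 3  # (sigma / r)**6
            potential += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (-dU/dr) / r, so that coeff * d is the force on particle i.
            coeff = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += coeff * d[k]
                # Newton's third law: equal and opposite force on particle j.
                forces[j][k] -= coeff * d[k]
    return potential, forces


if __name__ == "__main__":
    r0 = 2 ** (1 / 6)  # separation at the minimum of the potential
    print(lennard_jones_forces([[0.0, 0.0, 0.0], [r0, 0.0, 0.0]]))
```

At a separation of 2^(1/6) sigma the pair potential equals -epsilon and the force vanishes, which gives a quick sanity check. The update of `forces[j]` inside the loop over `i` is the kind of shared write that has to be handled carefully once the outer loop is split across threads.
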

DOI： 10.1007/s11227-018-2272-2
Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters. Reviewed

Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack J. Dongarra

2018 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, Vancouver, BC, Canada, May 21-25, 2018 930 - 939 2018.5

　More details

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1109/IPDPS.2018.00102
高精度行列-行列積アルゴリズムにおけるBatched BLASの適用

石黒史也, 片桐孝洋, 大島聡史, 永井亨, 荻野正雄

情報処理学会全国大会講演論文集 80th ( 1 ) 1.49‐1.50 2018.3

　More details

Language：Japanese

Optimization of hierarchical matrix computation on GPU Reviewed

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10776 274 - 292 2018.3

　More details

Language：English Publishing type：Research paper (other academic)

The demand for dense matrix computation in large scale and complex simulations is increasing; however, the memory capacity of current computer system is insufficient for such simulations. Hierarchical matrix method (H-matrices) is attracting attention as a computational method that can reduce the memory requirements of dense matrix computations. However, the computation of H-matrices is more complex than that of dense and sparse matrices; thus, accelerating the H-matrices is required. We focus on H-matrix-vector multiplication (HMVM) on a single NVIDIA Tesla P100 GPU. We implement five GPU kernels and compare execution times among various processors (the Broadwell-EP, Skylake-SP, and Knights Landing) by OpenMP. The results show that, although an HMVM kernel can compute many small GEMV kernels, merging such kernels to a single GPU kernel was the most effective implementation. Moreover, the performance of BATCHED BLAS in the MAGMA library was comparable to that of the manually tuned GPU kernel.

DOI： 10.1007/978-3-319-69953-0_16
Optimization of Hierarchical matrix computation on GPU Reviewed

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

Lecture Notes in Computer Science 10776 2018.3

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1007/978-3-319-69953-0_16
高精度行列-行列積アルゴリズムにおけるbatched BLASの適用

Fumiya Ishiguro, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai, Masao Ogino

第80回情報処理学会全国大会第80回情報処理学会全国大会予稿集 2018.3

　More details

Language：Japanese Publishing type：Research paper (other academic)
Optimization of Hierarchical matrix computation on GPU Reviewed International journal

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

In proceedings of Supercomputing Frontiers. SCFA 2018, Lecture Notes in Computer Science 10776 274 - 292 2018.3

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-319-69953-0_16

Other Link： https://doi.org/10.1007/978-3-319-69953-0_16
A thread-level parallelization of pairwise additive potential and force calculations suitable for current many-core architectures Reviewed International journal

Yoshimichi Andoh, Soichiro Suzuki, Satoshi Ohshima, Tatsuya Sakashita, Masao Ogino, Takahiro Katagiri, Noriyuki Yoshii, Susumu Okazaki

The Journal of Supercomputing 2018.2

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1007/s11227-018-2272-2

Other Link： https://link.springer.com/article/10.1007/s11227-018-2272-2
非ブロッキング集団通信の通信隠蔽効果に関する調査

南里豪志, 大島聡史, 小野謙二

情報処理学会研究報告(Web) 2017 ( HPC-162 ) Vol.2017‐HPC‐162,No.17,1‐11 (WEB ONLY) 2017.12

　More details

Language：Japanese

スーパーコンピュータシステムITOの性能評価

大島聡史, 南里豪志, 渡部善隆, 天野浩文, 小野謙二

情報処理学会研究報告(Web) 2017 ( HPC-162 ) Vol.2017‐HPC‐162,No.7,1‐9 (WEB ONLY) 2017.12

　More details

Language：Japanese

非ブロッキング集団通信の通信隠蔽効果に関する調査

Takeshi Nanri, Satoshi Ohshima, Kenji Ono

研究報告ハイパフォーマンスコンピューティング（HPC） 162 ( 17 ) 1 - 11 2017.12

　More details

Language：Japanese Publishing type：Research paper (scientific journal)

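In this paper, we measure and evaluate the practical effect of communication hiding with non-blocking collective communication, particularly when a progress thread is used. Conventional benchmark programs that measure only the communication hiding ratio do not reflect the loss of computational performance caused by the progress thread, which makes it difficult to verify practicality. We therefore created a hybrid-parallel benchmark program that combines thread parallelism and process parallelism in order to evaluate overall performance, including both computation and communication. Because this program specifies the amounts of communication and computation explicitly, it also allows measurements to be compared across runtime parameters such as how CPU cores are assigned to the progress thread and the thread scheduling policy. We ran this program on the Fujitsu PRIMERGY CX 400 and the Fujitsu PRIMEHPC FX 100 and measured its performance. The results show that, for Alltoall, overall program performance can be expected to improve when appropriate runtime parameters are selected. For Allreduce, on the other hand, performance can degrade, particularly when multiple processes are launched within a node. From these results, we confirmed that, when using non-blocking collective communication, it is important to investigate the effect in advance according to the type of collective communication, the message size, the amount of computation, and so on. We also carried out a preliminary evaluation of the communication hiding effect of an offload function, another means of advancing non-blocking collective communication, using Mellanox's SHArP function, and confirmed that a performance improvement through communication hiding can be expected.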
スーパーコンピュータシステムITOの性能評価

Satoshi Ohshima, Takeshi Nanri, Yoshitaka Watanabe, Hirofumi Amano, Kenji Ono

研究報告ハイパフォーマンスコンピューティング（HPC） 162 ( 7 ) 1 - 9 2017.12

　More details

Language：Japanese Publishing type：Research paper (scientific journal)

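The Research Institute for Information Technology, Kyushu University, has introduced the supercomputer system "ITO"; trial operation on part of the system began in October 2017, and service on the full system is scheduled to start in January 2018. In addition to being equipped with the latest CPUs and GPUs, it is an ambitious system designed with the use of open data and cooperation with public cloud services in mind. This paper introduces the design of ITO and presents performance evaluation results measured on Backend Subsystem B, which is already in trial operation.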
スーパーコンピュータ上でのDeep Learning学習環境の初期構築

野村行弘, 佐藤一誠, 塙敏博, 花岡昇平, 中尾貴祐, 竹永智美, 佐藤大介, 星野哲也, 関谷勇司, 大島聡史, 林直人, 阿部修

電子情報通信学会技術研究報告 117 ( 281(MI2017 47-62) ) 1‐2 2017.10

　More details

Language：Japanese

階層型行列計算のGPU向け最適化

大島聡史, 山崎市太郎, 伊田明弘, 横田理央

日本応用数理学会年会講演予稿集(CD-ROM) 2017 151‐152 2017.9

　More details

Language：Japanese

GPUクラスタ上における階層型行列計算の最適化

Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota

研究報告ハイパフォーマンスコンピューティング（HPC） 160 ( 14 ) 1 - 8 2017.7

　More details

Language：Japanese Publishing type：Research paper (scientific journal)

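A hierarchical matrix is a matrix composed of small dense matrices and low-rank approximation matrices. Approximating a dense matrix by a hierarchical matrix makes it possible to perform large-scale computations with less memory. However, computations with hierarchical matrices are complex, so optimization is required. We have previously implemented and evaluated an electrostatic field analysis problem based on the boundary element method with hierarchical matrices on multi-core CPUs and many-core processors. In this paper, we present the results of performance evaluation and optimization on GPU clusters for an iterative method applied to linear equations whose coefficient matrix is a hierarchical matrix. When we tried to speed up the dense matrix-vector multiplications that make up the hierarchical matrix-vector multiplication, the main computational part, by having MAGMA BLAS perform them, the execution time increased because of GPU kernel launch overhead, but using BATCHED MAGMA improved performance substantially. As experimental environments, we used TSUBAME 2.5 (up to 8 nodes, 1 GPU per node) and Reedbush-H (up to 8 nodes, 1 GPU per node). Performance improved up to 8 nodes on each system, but MPI processing time also became noticeable as the number of nodes increased, indicating that further optimization is needed.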
GPUクラスタ上における階層型行列計算の最適化

大島聡史, 山崎市太郎, 伊田明弘, 横田理央

情報処理学会研究報告(Web) 2017 ( HPC-160 ) Vol.2017‐HPC‐160,No.14,1‐8 (WEB ONLY) 2017.7

　More details

Language：Japanese

Auto-tuning on NUMA and Many-core Environments with an FDM code

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017 The Twelfth International Workshop on Automatic Performance Tuning (iWAPT2017) (In Conjunction with the IEEE IPDPS2017) 2017.6

　More details

Language：English Publishing type：Research paper (other academic)
Pascal vs KNL: Performance Evaluation with ICCG Solve Reviewed

Tetsuya Hoshino, Satoshi Ohshima, Toshihiro Hanawa, Kengo Nakajima, Akihiro Ida

HPC in Asia Workshop Poster Session, ISC High Performance 2017 2017.6

　More details

Language：English Publishing type：Research paper (other academic)

Auto-Tuning on NUMA and many-core environments with an FDM code Reviewed

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017 1399 - 1407 2017.6

　More details

Language：English Publishing type：Research paper (other academic)

In this paper, we focus on auto-tuning (AT) performance on nonuniform memory access (NUMA) and many-core architectures. Code from the finite difference method (FDM) is selected to evaluate AT performance, and results on the Xeon Phi (Knights Landing, KNL) for four kinds of memory (FLAT and CACHE) and cluster modes (QUADRANT and SNC4) yielded the following findings: (1) The KNL memory mode did not affect overall performance, except FLAT-SNC4. The difference of execution time for the CACHE mode to the FLAT mode was only 0.99%. (2) Hyper-threading (HT) technology worked well, and yielded 1.86x (baseline) and 1.50x (with AT). (3) Varying hybrid MPI/OpenMP execution was very effective for KNL. The maximum factors of speedups were 2.16x in the baseline and 2.91x with AT. (4) AT with code selection persisted as a powerful tool, even in KNL. We obtained speedups by AT for a maximum of 1.64x. Moreover, we had room to speedup by a further 1.31x by adapting AT for the fastest execution.

DOI： 10.1109/IPDPSW.2017.27
ポストムーア時代における有限差分法コードの自動チューニング技法の一考察

片桐孝洋, 大島聡史, 松本正晴

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 22 ROMBUNNO.C‐01‐1 2017.5

　More details

Language：Japanese

A Consideration of Auto-tuning Technology for Finite Difference Method in Post Moore's era
GPU搭載スーパーコンピュータReedbush‐Hの性能評価

塙敏博, 星野哲也, 中島研吾, 大島聡史, 伊田明弘

情報処理学会研究報告(Web) 2017 ( HPC-159 ) Vol.2017‐HPC‐159,No.9,1‐6 (WEB ONLY) 2017.4

　More details

Language：Japanese

OpenACCを用いたICCG法ソルバーのPascal GPUにおける性能評価

星野哲也, 大島聡史, 塙敏博, 中島研吾, 伊田明弘

情報処理学会研究報告(Web) 2017 ( HPC-158 ) Vol.2017‐HPC‐158,No.18,1‐9 (WEB ONLY) 2017.3

　More details

Language：Japanese

Xeon Phi+OmniPath環境におけるOpenMP,MPI性能最適化

塙敏博, 星野哲也, 中島研吾, 大島聡史, 伊田明弘

情報処理学会研究報告(Web) 2017 ( HPC-158 ) Vol.2017‐HPC‐158,No.21,1‐8 (WEB ONLY) 2017.3

　More details

Language：Japanese

ICCG法ソルバーのIntel Xeon Phi向け最適化

中島研吾, 大島聡史, 塙敏博, 星野哲也, 伊田明弘

情報処理学会研究報告(Web) 2016 ( HPC-157 ) Vol.2016‐HPC‐157,No.16,1‐8 (WEB ONLY) 2016.12

　More details

Language：Japanese

Optimization of ICCG Solver for Intel Xeon Phi
パイプライン型共役勾配法の性能評価

塙敏博, 中島研吾, 大島聡史, 星野哲也, 伊田明弘

情報処理学会研究報告(Web) 2016 ( HPC-157 ) Vol.2016‐HPC‐157,No.6,1‐9 (WEB ONLY) 2016.12

　More details

Language：Japanese

Performance Evaluation of Pipelined CG Method
データ解析・シミュレーション融合スーパーコンピュータシステムReedbush‐Uの性能評価

塙敏博, 中島研吾, 大島聡史, 伊田明弘, 星野哲也, 田浦健次朗

情報処理学会研究報告(Web) 2016 ( HPC-156 ) Vol.2016‐HPC‐156,No.10,1‐10 (WEB ONLY) 2016.9

　More details

Language：Japanese

高バンド幅メモリ環境における数値計算アルゴリズムの変革と自動チューニング技術~FDMコードを例にして~

片桐孝洋, 松本正晴, 大島聡史

日本応用数理学会年会講演予稿集(CD-ROM) 2016 ROMBUNNO.9GATSU12NICHI,09:30,3E,1 2016.9

　More details

Language：Japanese

3次元積層技術による高メモリバンド幅時代の自動チューニング~FDMコードを例にして~

片桐孝洋, 松本正晴, 大島聡史

情報処理学会研究報告(Web) 2016 ( HPC-155 ) Vol.2016‐HPC‐155,No.38,1‐8 (WEB ONLY) 2016.8

　More details

Language：Japanese

ppOpen-HPC: Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications on Post-Peta-Scale Supercomputers with Automatic Tuning (AT) Reviewed

Kengo Nakajima, Masaki Satoh, Takashi Furumura, Hiroshi Okuda, Takeshi Iwashita, Hide Sakaguchi, Takahiro Katagiri, Masaharu Matsumoto, Satoshi Ohshima, Hideyuki Jitsumoto, Takashi Arakawa, Futoshi Mori, Takeshi Kitayama, Akihiro Ida, Miki Y. Matsuo

OPTIMIZATION IN THE REAL WORLD: TOWARD SOLVING REAL-WORLD OPTIMIZATION PROBLEMS 13 15 - 35 2016.8

　More details

Language：English Publishing type：Research paper (other academic)

ppOpen-HPC is an open source infrastructure for development and execution of large-scale scientific applications on post-peta-scale (pp) supercomputers with automatic tuning (AT). ppOpen-HPC focuses on parallel computers based on many-core architectures and consists of various types of libraries covering general procedures for scientific computations. The source code, developed on a PC with a single processor, is linked with these libraries, and the parallel code generated is optimized for post-peta-scale systems. In this article, recent achievements and progress of the ppOpen-HPC project are summarized.

DOI： 10.1007/978-4-431-55420-2_2
階層型行列ベクトル積のメニーコア向け最適化

大島聡史, 伊田明弘, 河合直聡, 塙敏博

情報処理学会研究報告(Web) 2016 ( HPC-155 ) Vol.2016‐HPC‐155,No.39,1‐9 (WEB ONLY) 2016.8

　More details

Language：Japanese

FPGAを用いた階層型行列ベクトル積

塙敏博, 伊田明弘, 大島聡史, 河合直聡

情報処理学会研究報告(Web) 2016 ( HPC-155 ) Vol.2016‐HPC‐155,No.40,1‐9 (WEB ONLY) 2016.8

　More details

Language：Japanese

Auto-tuning of hybrid MPI/OpenMP execution with code selection by ppOpen-AT

Takahiro Katagiri, Masaharu Matsumoto, Satoshi Ohshima

30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 1488 - 1495 2016.7

　More details

Language：English Publishing type：Research paper (other academic)

In this paper, we propose an effective kernel implementation for an application of the finite difference method (FDM) by merging computations of central-difference and explicit time expansion schemes without IF statements inside the loops. The effectiveness of the implementation depends on the CPU architecture and execution situation, such as the problem size and the number of MPI processes and OpenMP threads. We adopt auto-tuning (AT) technology to select the best implementation. The AT function for the selection, referred to as "code selection", is implemented in an AT language, namely, ppOpen-AT. The results of experiments conducted using current advanced CPUs (Xeon Phi, Ivy Bridge, and FX10) indicated that crucial speedups of conventional AT are achieved by code selection. In particular, the heaviest kernels achieved speedups of 4.21x (Xeon Phi), 2.52x (Ivy Bridge), and 2.03x (FX10).
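
The code selection idea in this abstract can be sketched in a few lines: time several equivalent kernel variants on the actual input and keep the fastest. The two 1D update variants and the `select_fastest` helper below are hypothetical illustrations; they do not use ppOpen-AT syntax and are not the kernels from the paper.

```python
import time


def update_with_if(u, c):
    """Variant A: one loop over all points with a boundary IF inside the loop."""
    out = u[:]
    for i in range(len(u)):
        if 0 < i < len(u) - 1:
            out[i] = u[i] + c * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return out


def update_without_if(u, c):
    """Variant B: loop restricted to interior points, so no IF is needed."""
    out = u[:]
    for i in range(1, len(u) - 1):
        out[i] = u[i] + c * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return out


def select_fastest(variants, args, repeats=3):
    """Time each variant `repeats` times and return (best name, all timings)."""
    timings = {}
    for name, fn in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    u = [float(i % 7) for i in range(10_000)]
    variants = {"with_if": update_with_if, "without_if": update_without_if}
    name, timings = select_fastest(variants, (u, 0.1))
    print(name, timings)
```

Which variant wins depends on the machine and problem size, which is the reason the selection is made by measurement rather than fixed in advance.
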

DOI： 10.1109/IPDPSW.2016.49
Utilization and expansion of ppOpen-AT for OpenACC

Satoshi Ohshima, Takahiro Katagiri, Masaharu Matsumoto

30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016 1496 - 1505 2016.7

　More details

Language：English Publishing type：Research paper (other academic)

For application programmers, reducing efforts for optimizing programs is an important issue. Our solution of this issue is an auto-tuning (AT) technique. We are developing an AT language named ppOpen-AT. We have shown that this language is useful for multi-and many-core parallel programming. Today, OpenACC attracts attention as an easy and useful graphics processing unit (GPU) programming environment. While OpenACC is one possible parallel programming environment, users have to spend time and energy in order to optimize OpenACC programs. In this study, we investigate the usability of ppOpen-AT for OpenACC programs and propose to expand ppOpen-AT for further optimization of OpenACC.

DOI： 10.1109/IPDPSW.2016.123
エクサスケールコンピューティングに向けた自動性能チューニング研究の進展（AT研究動向とAT専用言語ppOpen-ATの最新機能の紹介／複数性能パラメタ空間における実行時AT機構／FFTにおけるAT）

片桐孝洋, 大島聡史, 松本正晴, 田中輝雄, 望月大義, 村田陸, 藤井昭宏, 高橋大介

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 ( 2016 ) 47 - 48 2016.5

　More details

Language：Japanese
Utilization and Expansion of ppOpen-AT for OpenACC Reviewed

Satoshi Ohshima, Takahiro Katagiri, Masaharu Matsumoto

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) 1496 - 1505 2016.5

　More details

Language：English Publishing type：Research paper (other academic)

For application programmers, reducing efforts for optimizing programs is an important issue. Our solution of this issue is an auto-tuning (AT) technique. We are developing an AT language named ppOpen-AT. We have shown that this language is useful for multi-and many-core parallel programming. Today, OpenACC attracts attention as an easy and useful graphics processing unit (GPU) programming environment. While OpenACC is one possible parallel programming environment, users have to spend time and energy in order to optimize OpenACC programs. In this study, we investigate the usability of ppOpen-AT for OpenACC programs and propose to expand ppOpen-AT for further optimization of OpenACC.

DOI： 10.1109/IPDPSW.2016.123
Auto-tuning of Hybrid MPI/OpenMP Execution with Code Selection by ppOpen-AT Reviewed

Takahiro Katagiri, Masaharu Matsumoto, Satoshi Ohshima

2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW) 1488 - 1495 2016.5

　More details

Language：English Publishing type：Research paper (other academic)

In this paper, we propose an effective kernel implementation for an application of the finite difference method (FDM) by merging computations of central-difference and explicit time expansion schemes without IF statements inside the loops. The effectiveness of the implementation depends on the CPU architecture and execution situation, such as the problem size and the number of MPI processes and OpenMP threads. We adopt auto-tuning (AT) technology to select the best implementation. The AT function for the selection, referred to as "code selection", is implemented in an AT language, namely, ppOpen-AT. The results of experiments conducted using current advanced CPUs (Xeon Phi, Ivy Bridge, and FX10) indicated that crucial speedups of conventional AT are achieved by code selection. In particular, the heaviest kernels achieved speedups of 4.21x (Xeon Phi), 2.52x (Ivy Bridge), and 2.03x (FX10).

DOI： 10.1109/IPDPSW.2016.49
分子動力学計算ソフトウェアMODYLASのメニーコアアーキテクチャ対応並列化に関する研究（分子動力学計算による研究の現状と課題／粒子対計算部分のメニーコア間スレッド並列の効率化／Xeon Phiによる分子動力学計算の高速化）

安藤嘉倫, 鈴木惣一朗, 大島聡史

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 ( 2016 ) 97 - 98 2016.5

　More details

Language：Japanese
ポストムーア時代に向けた階層型自動チューニング機能の性能評価

片桐孝洋, 松本正晴, 大島聡史

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 21 ROMBUNNO.F‐1‐1 2016.5

　More details

Language：Japanese

Performance Evaluation of a Hierarchical Auto-tuning Function toward to Post Moore's Era
FPGAを用いた疎行列数値計算の性能評価

大島聡史, 塙敏博, 片桐孝洋, 中島研吾

情報処理学会研究報告(Web) 2016 ( HPC-153 ) VOL.2016‐HPC‐153,NO.1 (WEB ONLY) 2016.2

　More details

Language：Japanese

有限要素法係数行列生成プロセスのメニィコア環境における最適化

中島研吾, 成瀬彰, 大島聡史, 塙敏博, 片桐孝洋, 田浦健次朗

情報処理学会研究報告(Web) 2015 ( HPC-152 ) VOL.2015‐HPC‐152,NO.12 (WEB ONLY) 2015.12

　More details

Language：Japanese

Optimization of matrix assembly process in FEM applications on manycore architectures
Directive-Based Auto-Tuning for the Finite Difference Method on the Xeon Phi

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015 Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015 1221 - 1230 2015.9

　More details

Language：English Publishing type：Research paper (other academic)

In this paper, we present a directive-based auto-tuning (AT) framework, called ppOpen-AT, and demonstrate its effect using simulation code based on the Finite Difference Method (FDM). The framework utilizes well-known loop transformation techniques. However, the codes used are carefully designed to minimize the software stack in order to meet the requirements of a many-core architecture currently in operation. The results of evaluations conducted using ppOpen-AT indicate that maximum speedup factors greater than 550% are obtained when it is applied in eight nodes of the Intel Xeon Phi. Further, in the AT for data packing and unpacking, a 49% speedup factor for the whole application is achieved. By using it with strong scaling on 32 nodes in a cluster of the Xeon Phi, we also obtain 24% speedups for the overall execution.

DOI： 10.1109/IPDPSW.2015.11
ppOpen‐ATによるOpenACCプログラムの自動チューニング

大島聡史, 片桐孝洋, 松本正晴

日本応用数理学会年会講演予稿集(CD-ROM) 2015 ROMBUNNO.9GATSU11NICHI,09:30,B,2 2015.9

　More details

Language：Japanese

ppOpen‐ATによる静的コード生成で実現する自動チューニング方式の評価

片桐孝洋, 松本正晴, 大島聡史

日本応用数理学会年会講演予稿集(CD-ROM) 2015 ROMBUNNO.9GATSU11NICHI,09:30,B,1 2015.9

　More details

Language：Japanese

ppOpen‐APPL/FVMを使用した並列有限要素法アプリケーション

中島研吾, 塙敏博, 大島聡史, 片桐孝洋

情報処理学会研究報告(Web) 2015 ( HPC-151 ) VOL.2015-HPC-151,NO.24 (WEB ONLY) 2015.9

　More details

Language：Japanese

Parallel FEM application using ppOpen‐APPL/FVM
SCG‐AT:静的コード生成のみによる自動チューニング実現方式

片桐孝洋, 松本正晴, 大島聡史

情報処理学会研究報告(Web) 2015 ( HPC-150 ) VOL.2015-HPC-150,NO.32 (WEB ONLY) 2015.7

　More details

Language：Japanese

ppOpen‐ATを用いたOpenACCプログラムの自動チューニング

大島聡史, 片桐孝洋, 松本正晴

情報処理学会研究報告(Web) 2015 ( HPC-150 ) VOL.2015-HPC-150,NO.30 (WEB ONLY) 2015.7

　More details

Language：Japanese

1ノード200超スレッド時代の自動チューニングに向けて : FDMコードを例にして

片桐孝洋, 大島聡史, 松本正晴

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 20 ROMBUNNO.E-1-4 2015.6

　More details

Language：Japanese

Towards Auto-tuning for Era of 200+ Threads Parallelism on One Node : An Adaptation of an FDM code
1ノード200超スレッド時代の自動チューニング手法 : FDMコード最適化を中心に (特集エクサスケール時代に向けた数値計算処理の自動チューニングの進展)

片桐孝洋, 大島聡史, 松本正晴

計算工学 20 ( 2 ) 3262 - 3265 2015.6

　More details

Language：Japanese

An Auto-Tuning Methodology in The Era of Over 200 Threads per Node : Adaptation of Code Optimization to an FDM Program
CFDツールOpenFOAM®への疎行列ライブラリXabclibの適用

櫻井隆雄, 片桐孝洋, 大島聡史, 黒田久泰, 猪貝光祥, 直野健

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 ( 2015 ) 84 - 84 2015.5

　More details

Language：Japanese
Directive-based Auto-tuning for the Finite Difference Method on the Xeon Phi Reviewed

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS 1221 - 1230 2015.5

Language：English Publishing type：Research paper (other academic)

In this paper, we present a directive-based auto-tuning (AT) framework, called ppOpen-AT, and demonstrate its effect using simulation code based on the Finite Difference Method (FDM). The framework utilizes well-known loop transformation techniques. However, the codes used are carefully designed to minimize the software stack in order to meet the requirements of a many-core architecture currently in operation. The results of evaluations conducted using ppOpen-AT indicate that maximum speedup factors greater than 550% are obtained when it is applied in eight nodes of the Intel Xeon Phi. Further, in the AT for data packing and unpacking, a 49% speedup factor for the whole application is achieved. By using it with strong scaling on 32 nodes in a cluster of the Xeon Phi, we also obtain 24% speedups for the overall execution.

DOI： 10.1109/IPDPSW.2015.11
1ノード200超スレッド時代の自動チューニング手法~FDMコード最適化を中心に~

片桐孝洋, 大島聡史, 松本正晴

計算工学 20 ( 2 ) 3262 - 3265 2015.4

Language：Japanese

An auto-tuning methodology for the era of over 200 threads per node: focusing on FDM code optimization
未知語の音声クエリに対する複数検索結果を用いた音声中の検索語検出

大島聡史, 小嶋和徳, 石亀昌明, 伊藤慶明

日本音響学会研究発表会講演論文集(CD-ROM) 2015 ROMBUNNO.1-P-6 2015.3

Language：Japanese

Spoken term detection for out-of-vocabulary spoken queries using multiple search results
動的な並列実行機構を用いたSpMV実装の性能評価

大島聡史, 片桐孝洋, 櫻井隆雄, 中島研吾, 黒田久泰, 直野健, 猪貝光祥

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2015 ( 3 ) 1 - 12 2015.2

Language：Japanese

Performance evaluation of SpMV implementations using dynamic parallel execution mechanisms
Auto-tuning of computation kernels from an FDM code with ppOpen-AT Reviewed

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014 91 - 98 2014.11

Language：English Publishing type：Research paper (other academic)

In this paper, we propose an Auto-tuning (AT) function with an AT language for a dedicated numerical library with respect to supercomputers in operation. The AT function is based on well-known loop transformation techniques, such as loop split, fusion, and re-ordering of statements. However, loop split with copies or increase of computations, and loop fusion to the split loop are taken into account by utilizing user knowledge.

DOI： 10.1109/MCSoC.2014.22
Performance optimization of SpMV using CRS format by considering OpenMP scheduling on CPUs and MIC Reviewed

Satoshi Ohshima, Takahiro Katagiri, Masaharu Matsumoto

Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014 253 - 260 2014.11

Language：English Publishing type：Research paper (other academic)

In this study, we evaluate the performance of sparse matrix-vector multiplication (SpMV) using the compressed row storage (CRS) format on CPUs and MIC. We focus on the relationship between OpenMP scheduling and performance. The performance of SpMV is measured using various OpenMP scheduling settings and the results are analyzed, which show that OpenMP scheduling has a considerable effect on the performance of SpMV. We confirm that some scheduling settings resulted in performance improvements compared with default scheduling for particular matrices. The results of the evaluation show that the performance of SpMV is improved by up to 1.57 times compared with SPARC64 IXfx, 2.47 times compared with Xeon Ivy Bridge-EP, and 2.26 times compared with Knights Corner. Next, we modify the SpMV function of OpenATLib, an auto-tuned numerical library, to consider the scheduling of optimization as an additional SpMV implementation. We measure the performance of the GMRES solver and obtain performance improvements of up to 11.4%. These results will help to improve the performance of various numerical calculation applications.

DOI： 10.1109/MCSoC.2014.43
有限要素法係数行列生成プロセスのマルチコア・メニィコア環境における最適化

中島研吾, 大島聡史, 塙敏博

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2014 ( 22 ) 1 - 7 2014.9

Language：Japanese

Optimization of matrix assembling process in FEM applications on multicore/manycore architectures
The finite-element method (FEM) is one of the best-known numerical methods for solving partial differential equations (PDEs) and is applied to various kinds of scientific simulations. Matrix assembly and the sparse matrix solver are the most expensive processes in finite-element procedures. In the present work, the matrix assembly process is parallelized using OpenMP, and three types of implementations are evaluated on various types of multicore/manycore architectures. Results and analyses of the computations, and strategies towards automatic tuning, will be described in the presentation.
1ノード200超スレッド時代の自動チューニング手法~FDMコードを例にして~

片桐孝洋, 大島聡史, 松本正晴

日本応用数理学会年会講演予稿集(CD-ROM) 2014 ROMBUNNO.9GATSU3NICHI,09:30,E,3 2014.8

Language：Japanese

An auto-tuning methodology for the era of over 200 threads per node: an FDM code as an example
疎行列ソルバーにおける自動チューニングを用いたOpenMP指示文の最適化

大島聡史, 松本正晴, 片桐孝洋

日本応用数理学会年会講演予稿集(CD-ROM) 2014 ROMBUNNO.9GATSU3NICHI,09:30,E,1 2014.8

Language：Japanese

Optimization of OpenMP directives using auto-tuning in sparse matrix solvers
様々な計算機環境におけるOpenMP/OpenACCを用いたICCG法の性能評価

大島聡史, 松本正晴, 片桐孝洋, 塙敏博, 中島研吾

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2014 ( 21 ) 1 - 10 2014.7

Language：Japanese

Performance evaluation of the ICCG method using OpenMP/OpenACC on various computing environments
Xeon PhiにおけるppOpen-ATを用いた有限差分法コードの自動チューニング

片桐孝洋, 大島聡史, 松本正晴

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 19 ROMBUNNO.F-6-3 2014.6

Language：Japanese

Auto-tuning for A Code from Finite Difference Method with ppOpen-AT on the Xeon Phi
Implementation and evaluation of an AMR framework for FDM applications Reviewed

Masaharu Matsumoto, Futoshi Mori, Satoshi Ohshima, Hideyuki Jitsumoto, Takahiro Katagiri, Kengo Nakajima

Procedia Computer Science 29 936 - 946 2014.6

Language：English Publishing type：Research paper (other academic)

In order to execute various finite-difference method applications on large-scale parallel computers with a reasonable cost of computer resources, a framework using an adaptive mesh refinement (AMR) technique has been developed. AMR can realize high-resolution simulations while saving computer resources by generating and removing hierarchical grids dynamically. In the AMR framework, a dynamic domain decomposition (DDD) technique, as a dynamic load balancing method, is also implemented to correct the computational load imbalance between each process associated with parallelization. By performing a 3D AMR test simulation, it is confirmed that dynamic load balancing can be achieved and execution time can be reduced by introducing the DDD technique. © The Authors. Published by Elsevier B.V.

DOI： 10.1016/j.procs.2014.05.084
通信削減アルゴリズムCAQRのRSDFTの直交化処理への適用と評価

片桐孝洋, 高山恒一, 米村崇, 熊洞宏樹, 猪貝光祥, 北上純一, 江口義之, 深谷猛, 山本有作, 岩田潤一, 内田和之, 大島聡史, 中島研吾

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2014 ( 3 ) 1 - 6 2014.5

Language：Japanese

Application and evaluation of the communication-avoiding algorithm CAQR for the orthogonalization process of RSDFT
レイテンシコアの高度化・高効率化による将来のHPCIシステムに関する調査研究のアプリケーションの異機種環境での評価～メニーコア環境を中心に～

片桐孝洋, 大島聡史, 中島研吾, 米村崇, 熊洞宏樹, 樋口清隆, 橋本昌人, 高山恒一, 藤堂眞治, 岩田潤一, 内田和之, 佐藤正樹, 羽角博康, 黒木聖夫, 安達斉, 江口義之

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2014 ( 26 ) 1 - 13 2014.2

Language：Japanese

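Evaluation in heterogeneous environments of applications for the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores: focusing on many-core environments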
Auto-tuning of computation kernels from an FDM code with ppOpen-AT

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

2014 8th IEEE International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014 Proceedings - 2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, MCSoC 2014 91 - 98 2014.1

Language：English Publishing type：Research paper (other academic)

In this paper, we propose an Auto-tuning (AT) function with an AT language for a dedicated numerical library with respect to supercomputers in operation. The AT function is based on well-known loop transformation techniques, such as loop split, fusion, and re-ordering of statements. However, loop split with copies or increase of computations, and loop fusion to the split loop are taken into account by utilizing user knowledge.

DOI： 10.1109/MCSoC.2014.22
Implementation and evaluation of an AMR framework for FDM applications Reviewed

Masaharu Matsumoto, Futoshi Mori, Satoshi Ohshima, Hideyuki Jitsumoto, Takahiro Katagiri, Kengo Nakajima

14th Annual International Conference on Computational Science, ICCS 2014 Procedia Computer Science 29 936 - 946 2014.1

Language：English Publishing type：Research paper (scientific journal)

In order to execute various finite-difference method applications on large-scale parallel computers with a reasonable cost of computer resources, a framework using an adaptive mesh refinement (AMR) technique has been developed. AMR can realize high-resolution simulations while saving computer resources by generating and removing hierarchical grids dynamically. In the AMR framework, a dynamic domain decomposition (DDD) technique, as a dynamic load balancing method, is also implemented to correct the computational load imbalance between each process associated with parallelization. By performing a 3D AMR test simulation, it is confirmed that dynamic load balancing can be achieved and execution time can be reduced by introducing the DDD technique.

DOI： 10.1016/j.procs.2014.05.084
実アプリを用いたさまざまなアーキテクチャからなる計算機システムの性能評価

深沢圭一郎, 片桐孝洋, 大宮学, 江川隆輔, 大島聡史, 青木尊之, 下川辺隆史, 荻野正雄, 岩下武史, 東田学

情報処理学会研究報告(Web) 2013 ( ARC-207 ) VOL.2013-ARC-207,NO.16 (WEB ONLY) - 7 2013.12

Language：Japanese

Performance Evaluation of Computer Systems Consisted of Various Architectures with Scientific Application
Energy optimization for scientific programs using auto-tuning language ppOpen-AT Reviewed

Takahiro Katagiri, Cheng Luo, Reiji Suda, Shoichi Hirasawa, Satoshi Ohshima

Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013 123 - 128 2013.11

Language：English Publishing type：Research paper (other academic)

In this paper, we demonstrate a new approach for power-consumption optimization using a dedicated Auto-tuning (AT) language. Our approach is based on recently developed technologies: (1) a power measurement application programming interface, (2) an AT mathematical core library. Preliminary performance evaluation enables us to select the best kernel for a real-world scientific program using either the CPU or Graphics Processing Unit, with respect to energy consumption. From the results of the evaluation, we found the performance-changing point in the experimental environment. © 2013 IEEE.

DOI： 10.1109/MCSoC.2013.14
Early experiences for adaptation of auto-tuning by ppOpen-AT to an explicit method Reviewed

Takahiro Katagiri, Satoshi Ito, Satoshi Ohshima

Proceedings - IEEE 7th International Symposium on Embedded Multicore/Manycore System-on-Chip, MCSoC 2013 153 - 158 2013.9

Language：English Publishing type：Research paper (other academic)

We present a code optimization technique by adapting an auto-tuning (AT) function to an explicit method with the static code generator FIBER. The AT function is evaluated with current multicore processors to match situations with high-thread parallelism (HTP). The results of performance evaluations indicate that the AT function is crucial for HTP, as the speedups of the explicit method with a static code generator are as much as 7.4x compared to that of original implementations based on compiler optimization only. © 2013 IEEE.

DOI： 10.1109/MCSoC.2013.15
メニーコアアーキテクチャ向けのSpMV最適化と自動チューニング

大島聡史, 金子勇, 片桐孝洋

日本応用数理学会年会講演予稿集(CD-ROM) 2013 ROMBUNNO.9098 2013.9

Language：Japanese

SpMV optimization and auto-tuning for many-core architectures
ppOpen‐ATにより自動生成されたppOpen‐HPCコードにおける自動チューニング機能の性能評価

片桐孝洋, 大島聡史, 松本正晴

日本応用数理学会年会講演予稿集(CD-ROM) 2013 ROMBUNNO.9059 2013.9

Language：Japanese

Performance evaluation of the auto-tuning function in ppOpen-HPC code automatically generated by ppOpen-AT
Xeon PhiにおけるSpMVの性能評価

大島聡史, 金子勇, 片桐孝洋

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2013 ( 33 ) 1 - 8 2013.7

Language：Japanese

Performance evaluation of SpMV on the Xeon Phi
陽解法カーネルのための自動チューニング記述言語ppOpen‐ATの新機能について

片桐孝洋, 大島聡史, 伊東聰

計算工学講演会論文集(CD-ROM) 18 ROMBUNNO.D-13-1 2013.6

Language：Japanese

On a new function of the auto-tuning description language ppOpen-AT for explicit-method kernels
A Sparse Matrix Library with Automatic Selection of Iterative Solvers and Preconditioners Reviewed

Takao Sakurai, Takahiro Katagiri, Hisayasu Kuroda, Ken Naono, Mitsuyoshi Igai, Satoshi Ohshima

2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 18 1332 - 1341 2013.6

Language：English Publishing type：Research paper (other academic)

Many iterative solvers and preconditioners have recently been proposed for linear iterative matrix libraries. Currently, library users have to manually select the solvers and preconditioners to solve their target matrix. However, if they select the wrong combination of the two, they have to spend a lot of time on calculations or they cannot obtain the solution. Therefore, an approach for the automatic selection of solvers and preconditioners is needed. We have developed a function that automatically selects an effective solver/preconditioner combination by referencing the history of relative residuals at run-time to predict whether the solver will converge or stagnate. Numerical evaluation with 50 Florida matrices showed that the proposed function can select effective combinations in all matrices. This suggests that our function can play a significant role in sparse iterative matrix computations. (C) 2013 The Authors. Published by Elsevier B.V. and peer review under responsibility of the organizers of the 2013 International Conference on Computational Science

DOI： 10.1016/j.procs.2013.05.300
陽解法カーネルのため自動チューニング記述言語ppOpen-ATの新機能について

片桐孝洋, 大島聡史, 伊東聰

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 18 4p 2013.6

Language：Japanese

A New Function of an Auto-tuning Description Language ppOpen-AT for Kernels of Explicit Method
メニーコアプロセッサXeon Phiの性能評価

大島聡史, 金子勇

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2013 ( 20 ) 1 - 6 2013.5

Language：Japanese

Performance evaluation of the Xeon Phi many-core processor
レイテンシコアの高度化・高効率化による将来のHPCIシステムに関する調査研究におけるアプリケーションの最適化と複数の計算機環境での性能評価

大島聡史, 片桐孝洋, 中島研吾, 米村崇, 熊洞宏樹, 樋口清隆, 橋本昌人, 高山恒一, 藤堂眞治, 岩田潤一, 内田和之, 佐藤正樹, 羽角博康, 黒木聖夫

先進的計算基盤システムシンポジウム論文集 2013 ( 2013 ) 111 - 111 2013.5

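Language：Japanese

Application optimization and performance evaluation on multiple computing environments in the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores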
レイテンシコアの高度化・高効率化による将来のHPCIシステムに関する調査研究のためのアプリケーション最適化と異機種計算機環境での性能評価

片桐孝洋, 大島聡史, 中島研吾, 米村崇, 熊洞宏樹, 樋口清隆, 橋本昌人, 高山恒一, 藤堂眞治, 岩田潤一, 内田和之, 佐藤正樹, 羽角博康, 黒木聖夫

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2013 ( 4 ) 1 - 9 2013.5

Language：Japanese

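Application optimization and performance evaluation on heterogeneous computing environments for the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores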
レイテンシコアの高度化・高効率化による将来のHPCIシステムに関する調査研究のためのアプリケーションと性能評価

片桐孝洋, 大島聡史, 中島研吾, 米村崇, 熊洞宏樹, 樋口清隆, 橋本昌人, 高山恒一, 藤堂眞治, 岩田潤一, 内田和之, 佐藤正樹, 羽角博康, 黒木聖夫

情報処理学会研究報告(CD-ROM) 2012 ( 5 ) ROMBUNNO.ARC-202,NO.2 2013.2

Language：Japanese

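Applications and performance evaluation for the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores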
超低消費電力高性能計算に向けた取り組み

大島聡史, Luo Cheng, 平澤将一, 片桐孝洋, 須田礼二, 本多弘樹

第54回プログラミング･シンポジウム予稿集 54th ( 2013 ) 75 - 80 2013.1

Language：Japanese

Research activity for Ultra Low Power HPC
Implementation and Evaluation of 3D Finite Element Method Application for CUDA Reviewed

Satoshi Ohshima, Masae Hayashi, Takahiro Katagiri, Kengo Nakajima

HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012 7851 140 - 148 2013.1

Language：English Publishing type：Research paper (other academic)

This paper describes a fast implementation of a FEM application on a GPU. We implemented our own FEM application and succeeded in obtaining a performance improvement in two of our application components: Matrix Assembly and Sparse Matrix Solver. Moreover, we found that accelerating our Boundary Condition Setting component on the GPU and omitting CPU-GPU data transfer between Matrix Assembly and Sparse Matrix Solver slightly further reduces execution time. As a result, the execution time of the entire FEM application was shortened from 44.65 sec on a CPU alone (Nehalem architecture, 4 cores, OpenMP) to 17.52 sec on a CPU with a GPU (Tesla C2050).

DOI： 10.1007/978-3-642-38718-0_16
Control formats for unsymmetric and symmetric sparse matrix-vector multiplications on OpenMP implementations Reviewed

Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7851 236 - 248 2013.1

Language：English Publishing type：Research paper (other academic)

In this paper, we propose "control formats" to obtain better thread performance of sparse matrix-vector multiplication (SpMV) for unsymmetric and symmetric matrices. By using the control formats, we established the following maximum speedups of SpMV in 16-thread execution on one node of the T2K Open Supercomputer: (1) 7.14x for an unsymmetric matrix by using the proposed Branchless Segmented Scan compared to the original Segmented Scan method; and (2) 12.7x for a symmetric matrix by using the proposed Zero-element Computation-free method compared to a simple SpMV implementation. © 2013 Springer-Verlag.

DOI： 10.1007/978-3-642-38718-0_24
レイテンシコアの高度化・高効率化による将来のHPCIシステムに関する調査研究のためのアプリケーションと性能評価

片桐孝洋, 大島聡史, 中島研吾, 米村崇, 熊洞宏樹, 樋口清隆, 橋本昌人, 高山恒一, 藤堂眞治, 岩田潤一, 内田和之, 佐藤正樹, 羽角博康, 黒木聖夫

研究報告ハイパフォーマンスコンピューティング（HPC） 2012 ( 2 ) 1 - 12 2012.12

Language：Japanese

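Applications and performance evaluation for the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores
In this report, we propose a method for classifying the characteristics of the target applications in the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores, from the viewpoints of computation patterns and communication patterns. We present profiling results obtained on the Fujitsu PRIMEHPC FX10 installed at the Information Technology Center, The University of Tokyo, and describe the characteristics of the applications in terms of the hardware performance of that machine.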
レイテンシコアの高度化・高効率化による将来のHPCIシステムに関する調査研究のためのアプリケーションと性能評価

片桐孝洋, 大島聡史, 中島研吾, 米村崇, 熊洞宏樹, 樋口清隆, 橋本昌人, 高山恒一, 藤堂眞治, 岩田潤一, 内田和之, 佐藤正樹, 羽角博康, 黒木聖夫

研究報告計算機アーキテクチャ（ARC） 2012 ( 2 ) 1 - 12 2012.12

Language：Japanese

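Applications and performance evaluation for the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores
In this report, we propose a method for classifying the characteristics of the target applications in the feasibility study on future HPCI systems through the advancement and efficiency improvement of latency cores, from the viewpoints of computation patterns and communication patterns. We present profiling results obtained on the Fujitsu PRIMEHPC FX10 installed at the Information Technology Center, The University of Tokyo, and describe the characteristics of the applications in terms of the hardware performance of that machine.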
BiCGStab法の前処理付きアルゴリズムに対する改善

伊藤祥司, 片桐孝洋, 櫻井隆雄, 猪貝光祥, 大島聡史, 黒田久泰, 直野健

情報処理学会論文誌トランザクション(CD-ROM) 2012 ( 1 ) ROMBUNNO.KONPYUTINGUSHISUTEMU,VOL.5,NO.3,11-21 2012.10

Language：Japanese

An Improvement in Preconditioned Algorithm of BiCGStab Method
大規模超並列スーパーコンピューターシステムOakleaf‐FX(Fujitsu PRIMEHPC FX10)の性能評価

大島聡史, 實本英之, 鴨志田良和, 片桐孝洋, 田浦健次朗, 中島研吾

情報処理学会研究報告(CD-ROM) 2012 ( 3 ) ROMBUNNO.HPC-135,NO.43 2012.10

Language：Japanese

Performance Evaluation of Oakleaf‐FX (Fujitsu PRIMEHPC FX10) Supercomputer System
SSG-AT: An auto-tuning method of sparse matrix-vector multiplication for semi-structured grids - An adaptation to OpenFOAM Reviewed

Satoshi Ito, Satoshi Ohshima, Takahiro Katagiri

Proceedings - IEEE 6th International Symposium on Embedded Multicore SoCs, MCSoC 2012 191 - 197 2012.9

Language：English Publishing type：Research paper (other academic)

We are developing ppOpen-AT, which is an infrastructure of auto-tuning (AT) for ppOpen-HPC. ppOpen-HPC is numerical middleware for the post-petascale era. In this study, we propose a new auto-tuning (AT) facility for semi-structured grids in OpenFOAM. We focus on sparse matrix-vector multiplication and the matrix storage formats. Using the features of input data and mesh connectivity, we propose a hybrid storage format that is suitable for semi-structured grids. We evaluate the proposed AT facility on the T2K supercomputer and an Intel Xeon cluster. For a typical computational fluid dynamics scenario, we obtain speedup factors of 1.3 on the T2K and 1.84 on the Xeon cluster. These results indicate that the proposed AT method has the potential to select the optimal data format according to features of the input sparse matrix. © 2012 IEEE.

DOI： 10.1109/MCSoC.2012.26
収束障害(Fault Convergence):数値計算ソフトウェアにおける新しい安全性の概念

片桐孝洋, 櫻井隆雄, 伊藤祥司, 猪貝光祥, 大島聡史, 黒田久泰, 直野健, 中島研吾

情報処理学会研究報告(CD-ROM) 2012 ( 2 ) ROMBUNNO.HPC-134,NO.9 2012.8

Language：Japanese

Fault Convergence: A New Concept of Safety for Numerical Computation Software
ポストペタスケール環境のための自動チューニング基盤ppOpen‐ATの新機能について

片桐孝洋, 伊東聰, 大島聡史

日本応用数理学会年会講演予稿集(CD-ROM) 2012 273 - 274 2012.8

Language：Japanese

On new functions of ppOpen-AT, an auto-tuning infrastructure for post-petascale environments
Xabclib:ソルバ・前処理自動選択機能を備えた疎行列ライブラリ

櫻井隆雄, 片桐孝洋, 直野健, 黒田久泰, 中島研吾, 猪貝光祥, 大島聡史, 伊藤祥司

日本応用数理学会年会講演予稿集(CD-ROM) 2012 281 - 282 2012.8

Language：Japanese

Xabclib: A sparse matrix library with automatic selection of solvers and preconditioners
GPUを用いた疎行列ベクトル積計算の最適化

大島聡史

日本応用数理学会年会講演予稿集(CD-ROM) 2012 299 - 300 2012.8

Language：Japanese

Optimization of sparse matrix-vector multiplication using GPUs
A Fully Run-time Auto-tuned Sparse Iterative Solver with OpenATLib Reviewed

Ken Naono, Takao Sakurai, Takahiro Katagiri, Satoshi Ohshima, Shoji Itoh, Kengo Nakajima, Mitsuyoshi Igai, Hisayasu Kuroda

2012 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT AND ADVANCED SYSTEMS (ICIAS), VOLS 1-2 143 - 148 2012.6

Language：English Publishing type：Research paper (other academic)

We propose a general application programming interface called OpenATLib for auto-tuning (AT). OpenATLib is carefully designed to establish the reusability of AT functions for sparse iterative solvers. Using the APIs of OpenATLib, we develop a fully auto-tuned sparse iterative solver called Xabclib. Xabclib has several novel runtime AT functions. We also develop a numerical computation policy that can optimize memory space and computational accuracy. Using the above functions and policies, we obtain the following important findings: (1) average memory space is reduced to 1/45 under lower memory policies, and (2) fault convergence, in which a conventional solver judges the iteration to have converged although it has not actually converged in the sense of the matrix before preconditioning, is avoided under higher accuracy policies. The results imply that policy-based runtime AT plays a significant role in sparse iterative matrix computations.

DOI： 10.1109/ICIAS.2012.6306176
BiCGStab法の前処理付きアルゴリズムに対する改善

伊藤祥司, 片桐孝洋, 櫻井隆雄, 猪貝光祥, 大島聡史, 黒田久泰, 直野健

情報処理学会論文誌コンピューティングシステム（ACS） 5 ( 3 ) 11 - 21 2012.5

Language：Japanese

An Improvement in Preconditioned Algorithm of BiCGStab Method
An improved preconditioned BiCGStab algorithm (improved PBiCGStab) is proposed. A rational preconditioned CGS algorithm is obtained by applying the derivation procedure of CGS to the preconditioned BiCG method. To extend this approach to BiCGStab, we newly examine, from a logical standpoint, the minimum residual (MR) operation that appears in BiCGStab and show that the approach is applicable. We show that the proposed algorithm is more rational than the conventional PBiCGStab, and numerical experiments demonstrate its effectiveness.
収束障害(Fault Convergence)：数値計算ソフトウェアにおける新しい安全性の概念

片桐孝洋, 櫻井隆雄, 伊藤祥司, 猪貝光祥, 大島聡史, 黒田久泰, 直野健, 中島研吾

研究報告ハイパフォーマンスコンピューティング（HPC） 2012 ( 9 ) 1 - 8 2012.5

Language：Japanese

Fault Convergence: A New Concept of Safety for Numerical Computation Software
In this paper, we propose the concept of "fault convergence", which we consider to arise in the numerical iterative methods widely used in numerical software. We discuss how it differs from "false convergence" as used in the field of numerical computation. By discussing convergence problems in numerical iterative methods using Laprie's model of the three threats to dependable computing (the fault-error-failure model), we present an example of fault convergence.
ppOpen-HPCのための自動チューニング基盤ppOpen-ATの開発

片桐孝洋, 大島聡史, 伊東聰

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 17 ROMBUNNO.E-7-6 2012.5

Language：Japanese

Development of Auto-tuning Infrastructure ppOpen-AT for ppOpen-HPC
ppOpen-ATにおけるOpenFOAM高速化の取り組み

伊東聰, 大島聡史, 片桐孝洋

計算工学講演会論文集 Proceedings of the Conference on Computational Engineering and Science 17 ROMBUNNO.E-7-7 2012.5

Language：Japanese

Study of OpenFOAM tuning in ppOpen-AT
大規模SMP並列スーパーコンピューター(HITACHI SR16000モデルM1)の性能評価

大島聡史, 實本英之, 鴨志田良和, 片桐孝洋, 田浦健次朗, 中島研吾

情報処理学会研究報告(CD-ROM) 2011 ( 6 ) ROMBUNNO.HPC-133,NO.5 2012.4

Language：Japanese

Performance Evaluation of HITACHI SR16000 model M1 Supercomputer System
複数GPU向けのCUDAコードを生成するOpenMP処理系の提案

長塚郁, 大島聡史, 平澤将一, 近藤正章, 本多弘樹

情報処理学会研究報告(CD-ROM) 2011 ( 6 ) ROMBUNNO.HPC-133,NO.12 2012.4

Language：Japanese

Proposal of an OpenMP compiler system that generates CUDA code for multiple GPUs
大規模SMP並列スーパーコンピューター(HITACHI SR16000モデルM1)の性能評価

大島聡史, 實本英之, 鴨志田良和, 片桐孝洋, 田浦健次朗, 中島研吾

研究報告ハイパフォーマンスコンピューティング（HPC） 2012 ( 5 ) 1 - 10 2012.3

Language：Japanese

Performance Evaluation of HITACHI SR16000 model M1 Supercomputer System
We report the performance of the HITACHI SR16000 model M1 supercomputer system (nicknamed Yayoi), which began operation in October 2011 at the Information Technology Center, The University of Tokyo. It is a state-of-the-art supercomputer system whose compute nodes are equipped with Power7 processors. We ran several benchmarks on the system and clarified its performance characteristics and the settings of important runtime environment variables.
複数GPU向けのCUDAコードを生成するOpenMP処理系の提案

長塚郁, 大島聡史, 平澤将一, 近藤正章, 本多弘樹

研究報告ハイパフォーマンスコンピューティング（HPC） 2012 ( 12 ) 1 - 8 2012.3

Language：Japanese

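Proposal of an OpenMP compiler system that generates CUDA code for multiple GPUs
We are developing OMPCUDA, a system that translates OpenMP programs into CUDA programs. In this paper, we describe the implementation of a feature in OMPCUDA that generates CUDA programs for multiple GPUs, and discuss the evaluation results for the generated CUDA code.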
BiCGStab法の前処理付きアルゴリズムに対する改善

伊藤祥司, 片桐孝洋, 櫻井隆雄, 猪貝光祥, 大島聡史, 黒田久泰, 直野健

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2012 ( 2012 ) 117 - 126 2012.1

Language：Japanese

An improvement in preconditioned algorithm of BiCGStab method
並列スーパーコンピュータSR16000/M1の構成と性能

大島聡史, 實本英之, 鴨志田良和, 片桐孝洋, 田浦健次朗, 中島研吾

ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集 2012 ( 2012 ) 84 - 84 2012.1

Language：Japanese

Configuration and performance of the SR16000/M1 parallel supercomputer
ヘテロ環境を目指した拡張階層型領域間分割に基づく高次フィルイン付き前処理手法の高速化

林雅江, 大島聡史, 中島研吾

情報処理学会研究報告(CD-ROM) 2011 ( 2 ) ROMBUNNO.HPC-130,NO.5 2011.8

Language：Japanese

Parallel ILU Preconditioner Based on Extended Hierarchical Interface Decomposition for Heterogeneous environments
自動チューニングインターフェースOpenATLibにおける自動チューニング機能の評価

櫻井隆雄, 片桐孝洋, 直野健, 黒田久泰, 中島研吾, 猪貝光祥, 大島聡史, 伊藤祥司

情報処理学会研究報告(CD-ROM) 2011 ( 2 ) ROMBUNNO.HPC-130,NO.43 2011.8

Language：Japanese

Evaluation of Auto‐Tuning Function on OpenATLib
三次元有限要素法アプリケーションにおける行列生成処理のCUDA向け実装

大島聡史, 林雅江, 片桐孝洋, 中島研吾

情報処理学会研究報告(CD-ROM) 2011 ( 2 ) ROMBUNNO.HPC-130,NO.11 2011.8

Language：Japanese

Implementation of Matrix Assembly in 3D Finite Element Method for CUDA
ヘテロ環境を目指した拡張階層型領域間分割に基づく高次フィルイン付き前処理手法の高速化

林雅江, 大島聡史, 中島研吾

研究報告ハイパフォーマンスコンピューティング（HPC） 2011 ( 5 ) 1 - 5 2011.7

Language：Japanese

Parallel ILU Preconditioner Based on Extended Hierarchical Interface Decomposition for Heterogeneous environments
Extended hierarchical interface decomposition (HID) is a parallelization method that can take into account high-level fill-ins from outside each domain, and its high locality of distributed data makes it a promising parallelization method for many-core environments. The thicker separators introduced in extended HID allow high-level fill-ins to be taken into account in parallel ILU preconditioners. In this study, we apply an iterative solver preconditioned with high-level fill-ins based on extended HID to a three-dimensional linear elasticity problem that is ill-conditioned because of heterogeneity in the distribution of material properties. In this report, using the T2K Open Supercomputer (T2K/Tokyo; 1 node, 16 cores), we evaluate the convergence of the OpenMP-parallel implementation and the parallel performance of the preconditioner with high-level fill-ins in a multicore environment, based on a comparison with multicolor (MC) ordering.
自動チューニングインターフェースOpenATLibにおける自動チューニング機能の評価

櫻井隆雄, 片桐孝洋, 直野健, 黒田久泰, 中島研吾, 猪貝光祥, 大島聡史, 伊藤祥司

研究報告ハイパフォーマンスコンピューティング（HPC） 2011 ( 43 ) 1 - 6 2011.7

Language：Japanese

Evaluation of Auto-Tuning Function on OpenATLib
Matrix computation libraries used in scientific computing require considerable effort to select and input the parameters that yield high performance, so a method for setting them automatically is needed. We are therefore developing the auto-tuning interface OpenATLib. In this paper, we describe one of the functions provided by OpenATLib: runtime auto-tuning of the restart frequency, that is, the size of the projection matrix in Krylov subspace methods. This function automatically selects the best restart frequency at runtime using the history of residual values. Performance evaluation with three solvers on the T2K Open Supercomputer shows a speedup of up to 38.5x over fixed values, confirming the effectiveness of the function.
三次元有限要素法アプリケーションにおける行列生成処理のCUDA向け実装

大島聡史, 林雅江, 片桐孝洋, 中島研吾

研究報告ハイパフォーマンスコンピューティング（HPC） 2011 ( 11 ) 1 - 6 2011.7

Language：Japanese

Implementation of Matrix Assembly in 3D Finite Element Method for CUDA
We describe a CUDA implementation of the matrix assembly process in 3D finite element method (FEM) applications. Because GPUs offer high computational and memory transfer performance, they are now used in various scientific applications, including FEM. In particular, much research targets GPU implementations of the sparse matrix solver, because the solver accounts for the largest share of FEM execution time. In this paper, we focus on the matrix assembly process, which accounts for the second largest share, and report implementations using one GPU, two GPUs, and two GPUs together with a CPU, along with their performance.
前処理付きBiCGStab法の問題点に対する改良

伊藤祥司, 片桐孝洋, 櫻井隆雄, 猪貝光祥, 大島聡史, 黒田久泰, 直野健, 中島研吾

計算工学講演会論文集(CD-ROM) 16 ROMBUNNO.F-4-3 2011.5

Language：Japanese

Improvements addressing problems in the preconditioned BiCGStab method
HxABCLibScript : 非均質計算機向け自動チューニング記述言語拡張 (ハイパフォーマンスコンピューティング(HPC) Vol.2011-HPC-129)

片桐孝洋, 大島聡史, 平澤将一, 本多弘樹

情報処理学会研究報告 2010 ( 6 ) 1 - 8 2011.4

Language：Japanese

HxABCLibScript : An Extension of an Auto-tuning Language for Heterogeneous Computing Environment
In this paper, we propose HxABCLibScript, a dedicated auto-tuning description language for heterogeneous computing environments that combine CPUs and GPUs (Graphics Processing Units), which allows arbitrary parts of a program to be executed on the most suitable computing resource. Performance evaluation showed that the code automatically generated from HxABCLibScript descriptions is optimized by switching appropriately between the CPU and the GPU according to the problem size or the number of iterations.
三次元有限要素法アプリケーションのCUDA向け実装と性能評価—Implementation and Evaluation of 3D Finite Element Method for CUDA

大島聡史, 林雅江, 片桐孝洋

情報処理学会研究報告 2010年度 ( 6 ) 1 - 6 2011.4

　More details

Language：Japanese

Implementation and Evaluation of 3D Finite Element Method for CUDA
三次元有限要素法アプリケーションのCUDA向け実装と性能評価

大島聡史, 林雅江, 片桐孝洋, 中島研吾

研究報告ハイパフォーマンスコンピューティング（HPC） 2011 ( 20 ) 1 - 6 2011.3

　More details

Language：Japanese

Implementation and Evaluation of 3D Finite Element Method for CUDA
本稿では三次元弾性静力学を対象とした有限要素法(Finite Element Method, FEM)のGPU(CUDA)向け実装と性能評価について述べる．高い演算性能・メモリ転送性能を持つGPUは様々な科学技術計算アプリケーションに利用されており，FEMについても多くの研究がなされている．本稿では特に前処理付き共役勾配法(Conjugate Gradient Method, CG法)による疎行列ソルバーと係数行列生成部分に注目し，CUDA向けの実装と性能評価を行った結果を報告する．
In this paper, we describe the implementation and evaluation of the Finite Element Method (FEM) on GPU (CUDA). Because GPUs offer high computational and memory transfer performance, they are now used in many scientific applications, including FEM. We report the results of implementation and performance evaluation, focusing on the sparse matrix solver based on the Conjugate Gradient method and on matrix creation.
Segmented Scan法のCUDA向け最適化実装

大島聡史, 櫻井隆雄, 片桐孝洋, 中島研吾, 黒田久泰, 直野健, 猪貝光祥, 伊藤祥司

情報処理学会研究報告(CD-ROM) 2010 ( 3 ) ROMBUNNO.HPC-126,NO.1 2010.10

　More details

Language：Japanese

Optimized Implementation of Segmented Scan Method for CUDA
ATのGPUへの展開

大島聡史

日本応用数理学会年会講演予稿集 2010 297 - 298 2010.9

　More details

Language：Japanese

ATのGPUへの展開
GPUコンピューティング向け中間言語の研究

平澤将一, 大島聡史, 本多弘樹

情報処理学会論文誌プログラミング（PRO） 3 ( 4 ) 66 - 66 2010.9

　More details

Language：Japanese

Research on Intermediate Language for GPU Computing
In this presentation, we discuss an intermediate language suitable for GPU computing. GPUs, as data-parallel processors, have very high peak performance for general-purpose computation and are attracting attention as accelerators in HPC (High Performance Computing). Accelerators usually have parallel execution models such as SIMD and SPMD, independent memory, and high-speed on-chip scratchpad memory. Intermediate languages used in CPU compilers cannot fully describe these features. Users work with a different programming environment for each accelerator and tune source code toward peak performance. We evaluate the execution performance of a native compilation environment based on Java bytecode and discuss an intermediate language suited to describing accelerator features.
Segmented Scan 法のCUDA向け最適化実装

大島聡史, 櫻井隆雄, 片桐孝洋, 中島研吾, 黒田久泰, 直野健, 猪貝光祥, 伊藤祥司

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 126 ( 1 ) A1 - A7 2010.8

　More details

Language：Japanese

Optimized Implementation of Segmented Scan Method for CUDA
本稿では Segmented Scan 法を用いた疎行列ベクトル積の CUDA 向け最適化実装について述べる．我々は実装の再利用性に着目した自動チューニングインターフェース OpenATLib の提案を行い，また OpenATLib の提供する機能の一つである疎行列ベクトル積においては Segmented Scan 方式を元にスカラ計算機向けに改良を行った Branchless Segmented Scan 方式を提案している．本稿ではこれらの方式を元にして CUDA 向けの新たな Segmented Scan 方式を考案し実装した．GPU 上で高速実行可能なようにアルゴリズムの改良や各種の最適化を行った結果，偏りの大きな行列に対して NVIDIA GeForceGTX285 上で最大で 3.26GFLOPS の性能を達成した．
We discuss an optimized CUDA implementation of sparse matrix-vector multiplication using the Segmented Scan method. We previously proposed the auto-tuning interface OpenATLib and, for its sparse matrix-vector multiplication feature, the Branchless Segmented Scan method, which adapts the Segmented Scan method to scalar computers. In this paper, we propose and implement a new Segmented Scan method for CUDA based on these two methods. With the optimized implementation, we achieved up to 3.26 GFLOPS on an NVIDIA GeForce GTX 285.
GPU向けソフトウェアキャッシュ機構の実装と評価

平澤将一, 下田和明, 大島聡史, 本多弘樹

情報処理学会研究報告(CD-ROM) 2009 ( 4 ) ROMBUNNO.ARC-186,9 2009.12

　More details

Language：Japanese

A Software Cache Implementation for GPU
GPU向けソフトウェアキャッシュ機構の実装と評価

平澤将一, 下田和明, 大島聡史, 本多弘樹

情報処理学会研究報告. 計算機アーキテクチャ研究会報告 186 I1 - I10 2009.11

　More details

Language：Japanese

A Software Cache Implementation for GPU
高性能コンピューティングにおいて GPU が注目されている．NVIDIA 製 GPU は CUDA において高性能なシェアードメモリを有効に用いるプログラミング技術により各種アプリケーションで非常に高いピーク性能が得られている一方，プログラミングの容易さ，汎用性に問題を残している．本研究においては CUDA においてユーザが明示的に使用するシェアードメモリの一部をデバイスメモリのキャッシュとするソフトウェアキャッシュ機構を提案する．本機構によりデバイスメモリからシェアードメモリへ暗黙的にデータ転送が行われ汎用計算の高速化が達成される．
GPUs are attracting attention in HPC. Although programming difficulty remains, very high peak performance can be achieved using NVIDIA GPUs. In this research, we propose a software cache mechanism that caches CUDA device memory in shared memory. With the software cache, user data is transferred implicitly, and the performance of general-purpose computation benchmark programs improves.
GPU向けソフトウェアキャッシュ機構の実装と評価

平澤将一, 下田和明, 大島聡史, 本多弘樹

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 123 ( 9 ) I1 - I10 2009.11

　More details

Language：Japanese

A Software Cache Implementation for GPU
高性能コンピューティングにおいて GPU が注目されている．NVIDIA 製 GPU は CUDA において高性能なシェアードメモリを有効に用いるプログラミング技術により各種アプリケーションで非常に高いピーク性能が得られている一方，プログラミングの容易さ，汎用性に問題を残している．本研究においては CUDA においてユーザが明示的に使用するシェアードメモリの一部をデバイスメモリのキャッシュとするソフトウェアキャッシュ機構を提案する．本機構によりデバイスメモリからシェアードメモリへ暗黙的にデータ転送が行われ汎用計算の高速化が達成される．
GPUs are attracting attention in HPC. Although programming difficulty remains, very high peak performance can be achieved using NVIDIA GPUs. In this research, we propose a software cache mechanism that caches CUDA device memory in shared memory. With the software cache, user data is transferred implicitly, and the performance of general-purpose computation benchmark programs improves.
OMPCUDA:GPU向けOpenMP処理系

大島聡史, 平澤将一, 本多弘樹

情報処理学会シンポジウム論文集 2009 ( 2 ) 42 2009.1

　More details

Language：Japanese

OMPCUDA:GPU向けOpenMP処理系
OMPCUDA:GPU向けOpenMPの実装

大島聡史, 平澤将一, 本多弘樹

情報処理学会シンポジウム論文集 2009 ( 2 ) 131 - 138 2009.1

　More details

Language：Japanese

OMPCUDA:GPU向けOpenMPの実装
OMPCUDA : GPU向け OpenMP の実装

大島聡史, 平澤将一, 本多弘樹

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 118 ( 125(HPC-118) ) 121 - 126 2008.12

　More details

Language：Japanese

OMPCUDA : Implementation of OpenMP for GPU
General-purpose computation using GPU (GPGPU) has been a focus of attention because of its performance, but the difficulty of GPGPU programming is a problem. We have therefore proposed a GPGPU programming style based on existing parallel programming styles. In this paper, we implemented OMPCUDA, an OpenMP implementation for CUDA-capable GPUs, to explore the possibilities of GPGPU using OpenMP. We then evaluated our implementation using a test program. As a result, we confirmed that OMPCUDA makes GPGPU parallel programming easy and can achieve speed-up.
メッセージ通信型GPGPUプログラミング(プログラミング環境,「ハイパフォーマンスコンピューティングとアーキテクチャの評価」に関する北海道ワークショップ(HOKKE-2008))

大島聡史, 平澤将一, 本多弘樹

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2008 ( 19 ) 109 - 114 2008.3

　More details

Language：Japanese

Message Passing GPGPU Programming
As GPU performance increases, general-purpose computation using GPUs (GPGPU) is attracting more and more interest. GPGPU is expected to surpass CPU performance thanks to its parallel processing capability, but programming for GPGPU is not easy because of its special programming style. In this paper, we propose a GPGPU programming style based on existing parallel programming styles. We take up several existing parallel programming styles, such as message passing, and examine how they can be applied to GPGPU programming.
メッセージ通信型GPGPUプログラミング

大島聡史, 平澤将一, 本多弘樹

情報処理学会研究報告. ARC,計算機アーキテクチャ研究会報告 177 ( 19(ARC-177 HPC-114) ) 109 - 114 2008.3

　More details

Language：Japanese

Message Passing GPGPU Programming
As GPU performance increases, general-purpose computation using GPUs (GPGPU) is attracting more and more interest. GPGPU is expected to surpass CPU performance thanks to its parallel processing capability, but programming for GPGPU is not easy because of its special programming style. In this paper, we propose a GPGPU programming style based on existing parallel programming styles. We take up several existing parallel programming styles, such as message passing, and examine how they can be applied to GPGPU programming.
既存の並列化手法を用いたGPGPUプログラミング

大島聡史, 平澤将一, 本多弘樹

第49回プログラミング･シンポジウム予稿集 49th ( 2008 ) 81 - 88 2008.1

　More details

Language：Japanese

GPGPU Programming Using Existing Parallelizing Method
既存の並列化手法を用いたGPGPUプログラミングの提案

大島聡史, 平澤将一, 本多弘樹

情報処理学会研究報告. ARC,計算機アーキテクチャ研究会報告 175 ( 115(ARC-175) ) 7 - 10 2007.11

　More details

Language：Japanese

Proposal of GPGPU Programming Using Existing Parallelizing Method
GPGPU, which uses the GPU's performance for general-purpose computation, is attracting much attention and is expected to deliver higher performance than the CPU. However, writing GPGPU programs is not easy because it requires programming methods peculiar to GPGPU. In this paper, we propose using existing parallelizing methods as a new way to make GPGPU programming easier. We also consider writing programs that run on GPUs with OpenMP and MPI, based on CUDA, a new GPGPU programming language.
ソフトウェア DSM Mocha とMPIの並列ベンチマークを用いた性能評価

今村昌之, 鈴木祥, 坂口朋也, 大島聡史, 片桐孝洋, 吉瀬謙二, 弓場敏嗣

情報処理学会研究報告. ARC,計算機アーキテクチャ研究会報告 172 ( 17(ARC-172 HPC-109) ) 103 - 108 2007.3

　More details

Language：Japanese

A Performance Comparison of Software-DSM Mocha and MPI Using Parallel Benchmarks
Software distributed shared memory (S-DSM) systems are friendlier and easier to program than the message passing interface. In this paper, we compared the performance of Mocha, an S-DSM system, with MPICH, a widely used parallel programming library with a message passing interface. Four applications (MM, SOR, IS, LU) were used as parallel benchmarks. To measure S-DSM system overhead, the execution time of interrupt handlers was measured. The results show that the following should be done to achieve high performance in the S-DSM system compared with MPI: 1. Introduction of prefetching for shared data before a page fault; 2. Improvement of lock acquisition performance; 3. Improvement of barrier synchronization performance.
ソフトウェアDSM MochaとMPIの並列ベンチマークを用いた性能評価(クラスタ,「ハイパフォーマンスコンピューティングとアーキテクチャの評価」に関する北海道ワークショップ(HOKKE-2007))

今村昌之, 鈴木祥, 坂口朋也, 大島聡史, 片桐孝洋, 吉瀬謙二, 弓場敏嗣

情報処理学会研究報告. [ハイパフォーマンスコンピューティング] 2007 ( 17 ) 103 - 108 2007.3

　More details

Language：Japanese

A Performance Comparison of Software-DSM Mocha and MPI Using Parallel Benchmarks
Software distributed shared memory (S-DSM) systems are friendlier and easier to program than the message passing interface. In this paper, we compared the performance of Mocha, an S-DSM system, with MPICH, a widely used parallel programming library with a message passing interface. Four applications (MM, SOR, IS, LU) were used as parallel benchmarks. To measure S-DSM system overhead, the execution time of interrupt handlers was measured. The results show that the following should be done to achieve high performance in the S-DSM system compared with MPI: 1. Introduction of prefetching for shared data before a page fault; 2. Improvement of lock acquisition performance; 3. Improvement of barrier synchronization performance.
CPUとGPUを用いた基本行列計算ライブラリ

大島聡史, 片桐孝洋, 弓場敏嗣, 平澤将一, 本多弘樹

情報処理学会シンポジウム論文集 2007 ( 1 ) 66 2007.1

　More details

Language：Japanese

CPUとGPUを用いた基本行列計算ライブラリ
Parallel processing of matrix multiplication in a CPU and GPU heterogeneous environment Reviewed

Satoshi Ohshima, Kenji Kise, Takahiro Katagiri, Toshitsugu Yuba

HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2006 4395 305 - 318 2007.1

　More details

Language：English Publishing type：Research paper (other academic)

GPUs are becoming an attractive alternative for numerical computations in research. In this paper, we propose a new parallel processing environment for matrix multiplication that uses both CPUs and GPUs. With our method, the execution time of matrix multiplication can be decreased to 40.1% compared with the faster of the CPU-only and GPU-only cases. Our method performs well when matrix sizes are large.
CPUとGPUを用いた並列GEMM演算の提案と実装

大島聡史, 吉瀬謙二, 片桐孝洋, 弓場敏嗣

情報処理学会論文誌コンピューティングシステム（ACS） 47 ( 12 ) 317 - 328 2006.9

　More details

Language：Japanese

Proposal and Implementation of Parallel GEMM Routine Using CPU and GPU
Using GPUs for numerical computation is becoming an attractive research topic. We have proposed a new GPU computation method that uses parallel processing on both the CPU and the GPU. In this paper, we apply this method to an existing numerical computation library. We examine a performance tuning method and run performance experiments using a benchmark program. We also apply the method to the GEMM routine of BLAS and run the HPL benchmark. As a result, performance can be improved to 1.45 times that of a CPU-only environment using a 3.0 GHz Pentium 4. Precision is limited by the GPU's arithmetic precision, but we show that our method has the potential to be applied to various applications.
MPIとの比較によるソフトウェアDSMの性能評価

今村昌之, 鈴木祥, 坂口朋也, 大島聡史, 片桐孝洋, 吉瀬謙二, 弓場敏嗣

情報処理学会研究報告. ARC,計算機アーキテクチャ研究会報告 169 ( 88 ) 157 - 162 2006.8

　More details

Language：Japanese

A Performance Comparison of Parallel Applications between Software-DSM and MPI
A software distributed shared memory (S-DSM) system provides virtual shared memory on a distributed memory environment such as a PC cluster without special hardware support. S-DSM systems are friendlier and easier to program than the message passing interface. In this paper, we compare the performance of Mocha, an S-DSM system, with MPICH, a popular parallel programming library with a message passing interface. Three applications (MM, SOR, LU) are used as benchmarks. The results show that an MPI application whose communication can be tuned outperforms its S-DSM counterpart, whereas an MPI application that cannot be tuned performs on par with its S-DSM counterpart.
相乗り通信を利用したソフトウェアDSMの通信回数削減手法

坂口朋也, 今村昌之, 鈴木祥, 大島聡史, 片桐孝洋, 吉瀬謙二, 弓場敏嗣

情報処理学会研究報告. ARC,計算機アーキテクチャ研究会報告 169 ( 88 ) 151 - 156 2006.8

　More details

Language：Japanese

Ainori Communication : A Method to Reduce the Page Transfer of S-DSM Systems
We discuss a method to speed up software distributed shared memory (S-DSM) systems. S-DSM systems can be sped up by predicting a page that will need to be transferred in the future and prefetching it. In this paper, we propose a method that transfers the predicted pages together with a message used in S-DSM systems. We call this method Ainori communication. We evaluate our implementation using four S-DSM benchmarks and show that it can decrease the number of communications and improve performance.
MPIとの比較によるソフトウェアDSMの性能評価

今村昌之, 鈴木祥, 坂口朋也, 大島聡史, 片桐孝洋, 吉瀬謙二, 弓場敏嗣

情報処理学会研究報告 2006 ( 88(ARC-169) ) 157 - 162 2006.7

　More details

Language：Japanese

A Performance Comparison of Parallel Applications between Software‐DSM and MPI
相乗り通信を利用したソフトウェアDSMの通信回数削減手法

坂口朋也, 今村昌之, 鈴木祥, 大島聡史, 片桐孝洋, 吉瀬謙二, 弓場敏嗣

情報処理学会研究報告 2006 ( 88(ARC-169) ) 151 - 156 2006.7

　More details

Language：Japanese

Ainori Communication: A Method to Reduce the Page Transfer of S‐DSM Systems
We discuss a method to speed up software distributed shared memory (S-DSM) systems. S-DSM systems can be sped up by predicting a page that will need to be transferred in the future and prefetching it. In this paper, we propose a method that transfers the predicted pages together with a message used in S-DSM systems. We call this method Ainori communication. We evaluate our implementation using four S-DSM benchmarks and show that it can decrease the number of communications and improve performance.
CPUとGPUを用いた並列GEMM演算の提案と実装

大島聡史, 吉瀬謙二, 片桐孝洋, 弓場敏嗣

情報処理学会シンポジウム論文集 2006 ( 5 ) 41 - 50 2006.5

　More details

Language：Japanese

Proposal and Implementation of Parallel GEMM Routine Using CPU and GPU
CPUとGPUを複数用いた並列数値計算環境の検討

大島聡史, 吉瀬謙二, 片桐孝洋, 本多弘樹, 弓場敏嗣

情報処理学会シンポジウム論文集 2006 ( 5 ) 252 - 253 2006.5

　More details

Language：Japanese

CPUとGPUを複数用いた並列数値計算環境の検討
CPUとGPUの並列処理による行列積和演算方式の提案 (2005年並列/分散/協調処理に関する『武雄』サマー・ワークショップ(SWoPP武雄2005)--研究会・連続同時開催)

大島聡史, 吉瀬謙二, 片桐孝洋, 弓場敏嗣

情報処理学会研究報告 2005 ( 80 ) 139 - 144 2005.8

　More details

Language：Japanese

Proposal of Matrix Multiply and Add Method by Parallel Processing Using CPU and GPU
Research that uses GPUs for numerical calculation is becoming active. In this paper, we not only solve a numerical problem using the GPU but also propose a method that divides a problem and computes it using both the CPU and the GPU. We measured the execution time of matrix multiply-and-add and found that parallel processing improved performance by 38.1% over the case solved with the CPU only. Moreover, we examine a method for finding the best problem distribution using the FLOPS of the CPU and the GPU obtained in a preliminary experiment. As a result, the predicted values were close to the experimental values.
GPUによるBLAS演算の性能評価

大島聡史, 吉瀬謙二, 片桐孝洋, 弓場敏嗣

情報処理学会シンポジウム論文集 2005 ( 5 ) 247 - 248 2005.5

　More details

Language：Japanese

GPUによるBLAS演算の性能評価
GPUによる高速な行列積の実装

大島聡史, 吉瀬謙二, 片桐孝洋, 弓場敏嗣

第67回全国大会講演論文集 2005 ( 1 ) 159 - 160 2005.3

　More details

Language：Japanese

GPUによる高速な行列積の実装
命令レベル並列性を利用したOpenMPによるプロセッサシミュレータの並列実行

大島聡史, 檜田敏克, 吉瀬謙二, 片桐孝洋, 本多弘樹, 弓場敏嗣

第66回全国大会講演論文集 2004 ( 1 ) 121 - 122 2004.3

　More details

Language：Japanese

命令レベル並列性を利用したOpenMPによるプロセッサシミュレータの並列実行


Books

演習で学ぶ数値計算 : C&Fortran

片桐孝洋, 大島聡史

共立出版 2022.3

　More details

Responsible for pages：Total pages：viii, 212p Language：Japanese

Presentations

GPUにおけるマルチプロセス実行の最適化について（MIGを使ったBLR-QRの高速化の最新状況＋α）

大島聡史

ATマイクロワークショップ2024 2024.11

　More details

Event date： 2024.11
Considering multi process calculations on current GPU Invited International conference

Satoshi Ohshima

ATAT in HPSC 2024 2024.3

　More details

Event date： 2024.3

Language：English Presentation type：Oral presentation (general)

Venue：National Center for High-Performance Computing in Hsinchu Science Park Country：Taiwan, Province of China
九大新スパコン玄界による限界のないコンピューティングへの挑戦 Invited

大島聡史

Supercomputing JAPAN 2024 2024.3

　More details

Event date： 2024.3

Language：Japanese Presentation type：Oral presentation (general)

Venue：タワーホール船堀 Country：Japan
一万計算コア超時代のGPUに向けたプログラム最適化と自動チューニングを考える

大島聡史

第15回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2023） 2023.12

　More details

Event date： 2023.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：工学院大学新宿キャンパスアーバンテックホール Country：Japan
QR Factorization of Block Low-rank Matrices on Multiple-/Multi-Instance GPUs Invited International conference

Satoshi Ohshima

ATAT in HPSC 2023 2023.3

　More details

Event date： 2023.3

Language：English Presentation type：Oral presentation (general)

Venue：NCU Center for Mathematics and Theoretical Physics Country：Taiwan, Province of China
BLR-QR on GPU：マルチインスタンスGPUを用いた多数の小密行列計算の高速化

大島聡史, 伊田明弘, 横田理央, 山崎市太郎

第14回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2022） 2022.12

　More details

Event date： 2022.12

Language：Others

Country：Other
QR Factorization of Block Low-Rank Matrices on Multi-Instance GPU International conference

@Satoshi Ohshima, Akihiro Ida, Rio Yokota, Ichitaro Yamazaki

The 23rd International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT' 22) 2022.12

　More details

Event date： 2022.12

Language：English Presentation type：Oral presentation (general)

Country：Japan
大規模分散医用画像処理アプリケーションの実用化に向けた研究

大島聡史

JHPCN: 学際大規模情報基盤共同利用・共同研究拠点第14回シンポジウム 2022.7

　More details

Event date： 2022.7

Language：Others

Country：Other
大規模分散医用画像処理に向けた医用画像処理アプリケーションの最適化

大島聡史

JHPCN: 学際大規模情報基盤共同利用・共同研究拠点第14回シンポジウム 2022.7

　More details

Event date： 2022.7

Language：Others

Country：Other
スーパーコンピュータ「不老」の"クラウド的な"利用の状況について

大島聡史

PCクラスタワークショップin 神戸2022「クラウドとHPC」 2022.5

　More details

Event date： 2022.5

Language：Others

Country：Other
Effectiveness of Low-/Mixed-Precision Computation on Parareal Method International conference

Satoshi Ohshima

ATAT in HPSC 2021 (2021 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing) 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
RTコアによるハードウェアレイトレーシングの性能評価

枦木慎也, 大島聡史, 片桐孝洋, 永井亨

情報処理学会第83回全国大会 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
高精度行列-行列積における疎行列演算実装選択の自動チューニングの検討

青木将太, 片桐孝洋, 大島聡史, 永井亨

情報処理学会第83回全国大会 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
量子アニーリングマシンにおける組み合わせ最適化問題の適用可能性の調査

大山基樹, 森下誠, 片桐孝洋, 大島聡史, 永井亨

情報処理学会第83回全国大会 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
マルチGPU環境における機械学習ハイパーパラメータの自動チューニング（２）

藤家空太郎, 多部田敏樹, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

情報処理学会第83回全国大会 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
マルチGPU環境における機械学習ハイパーパラメータの自動チューニング（１）

多部田敏樹, 藤家空太郎, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

情報処理学会第83回全国大会 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
Adaptation of A64 Scalable Vector Extension for Spiral

Naruya Kitai, Daisuke Takahasi, Franz Franchetti, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai

情報処理学会研究報告(HPC-178) 2021.3

　More details

Event date： 2021.3

Language：English

Country：Other
GPUクラスタを用いて並列化した自動チューニングの機械学習プログラムへの適用と安定性の検証

藤家空太郎, 多部田敏樹, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

情報処理学会研究報告(HPC-178) 2021.3

　More details

Event date： 2021.3

Language：Others

Country：Other
ジャイロ運動論シミュレーションにおける位相空間上の分布関数構造の可視化および自動類似度判定システムの開発に向けて

北澤修太, 沼波政倫, 大谷寛明, 片桐孝洋, 大島聡史, 永井亨

先進的描画技術を用いた可視化情報の研究会（VR2020） 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
ジャイロ運動論シミュレーションにおける位相空間上の分布関数構造の可視化と類似度判定

北澤修太, 沼波政倫, 大谷寛明, 片桐孝洋, 大島聡史, 永井亨

閉じ込め・輸送研究会2020 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
スーパーコンピュータ「不老」のシステム構成と性能

大島聡史, 永井亨, 片桐孝洋

大学ICT推進協議会 2020年度年次大会 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
スーパーコンピュータ「不老」における光ディスクライブラリを用いたコールドストレージシステムの構築

高橋一郎, 大島聡史, 片桐孝洋

大学ICT推進協議会 2020年度年次大会 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
スーパーコンピュータ「不老」のサービスとエコシステム

田島嘉則, 山田一成, 高橋一郎, 毛利晃大, 片桐孝洋, 大島聡史, 永井亨

大学ICT推進協議会 2020年度年次大会 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
Large-scale numerical simulation of fluid-rigid body interactions simulation based on a stabilized ISPH method with Chebyshev basis CG solver International conference

Bowen Liu, Masao Ogino, Mitsuteru Asai, Takahiro Katagiri, Satoshi Ohshima

COMPSAFE 2020 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
LNGタンク内の異密度LNGの混合流動解析

田村守淑, 今野雅, 大島聡史

オープンCAE・FrontISTR合同シンポジウム2020 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
スーパーコンピュータ「不老」におけるOpenFOAMの性能評価

大島聡史, 今野雅

オープンCAE・FrontISTR合同シンポジウム2020 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
カスタムキャビテーションモデルを用いたNACA0015水中翼周りの数値解析

池田拓士, 秋山善克, 今野雅, 大島聡史

オープンCAE・FrontISTR合同シンポジウム2020 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
OpenFOAMへのカスタムキャビテーションモデルの実装

秋山善克, 池田拓士, 今野雅, 大島聡史

オープンCAE・FrontISTR合同シンポジウム2020 2020.12

　More details

Event date： 2020.12

Language：Others

Country：Other
「不老」の特徴的な機能とベンチマーク結果の紹介

大島聡史

第1回スーパーコンピュータ「不老」ユーザ会 2020.8

　More details

Event date： 2020.8

Language：Others

Country：Other
ユーザプログラム利用状況の紹介

大島聡史

第1回スーパーコンピュータ「不老」ユーザ会 2020.8

　More details

Event date： 2020.8

Language：Others

Country：Other
Performance evaluation of the MODYLAS application on modern multi-core and many-core environments International conference

Satoshi Ohshima, Soichiro Suzuki, Tatsuya Sakashita, Masao Ogino, Takahiro Katagiri, Yoshimichi Andoh

The Fourteenth International Workshop on Automatic Performance Tuning (iWAPT2019, IPDPS2019 Workshop) 2019.5

　More details

Event date： 2019.5

Language：English Presentation type：Oral presentation (general)

Venue：Hilton Copacabana, Rio de Janeiro, Brazil Country：Brazil
Trying to accelerate many small BLAS calculations on GPU International conference

Satoshi Ohshima

ATAT in HPSC (2019 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing) 2019.2

　More details

Event date： 2019.2

Language：English Presentation type：Oral presentation (general)

Venue：National Sun Yat-sen University, Kaohsiung, Taiwan Country：Taiwan, Province of China
階層型行列計算におけるソフトウェア自動チューニング

大島聡史, 山崎市太郎, 伊田明弘, 横田理央

第23回計算工学講演会 2018.6

　More details

Event date： 2018.6

Language：Japanese Presentation type：Oral presentation (general)

Venue：ウインクあいち Country：Japan
Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters

Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack Dongarra

32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 2018.8

　More details

Event date： 2018.5

Language：English

Venue：Vancouver Country：Canada

HACApK is a software package for solving dense linear systems of equations and is used in other software packages, such as ppohBEM, for solving boundary integral equations. To enable the solution of large-scale boundary value problems, HACApK hierarchically compresses the coefficient matrix and uses the BiConjugate Gradient Stabilized (BiCGStab) method to solve the linear system. To extend HACApK's capability, this paper outlines how we ported the HACApK linear solver onto GPU clusters. Though the potential of GPUs has been widely accepted in high-performance computing, it is still a challenge to utilize GPUs for a solver, like HACApK, that requires fine-grained irregular computation and global communication. To utilize the GPUs, we integrated the variable-size batched GPU kernel that was recently released in the MAGMA software package. This is the first time the variable-size batched kernels were used in a solver or application code. We discuss several techniques to improve the performance of the batched kernel and demonstrate the effects of these techniques on two state-of-the-art GPU clusters. For instance, with two 14-core Intel Xeon CPUs and four NVIDIA P100 GPUs per node, the GPU kernel obtained a solver speedup of 8× on one node and 4× on eight nodes. We also show that when the inter-GPU communication becomes significant, the solution time can be further reduced by a factor of 2 by carefully designing the communication layer with the underlying node architecture in mind.
Optimization of Hierarchical matrix computation on GPU International conference

Satoshi Ohshima, @Ichitaro Yamazaki, @Akihiro Ida, @Rio Yokota

SC-Asia 2018 2018.3

　More details

Event date： 2018.3

Language：English Presentation type：Oral presentation (general)

Venue：Resorts World Convention Centre Sentosa Country：Singapore
スーパーコンピュータシステムITOの性能評価

大島聡史, 南里豪志, 渡部善隆, 天野浩文, 小野謙二

情報処理学会第162回ハイパフォーマンスコンピューティング研究発表会 2017.12

　More details

Event date： 2017.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：くまもと県民交流館パレア Country：Japan
非ブロッキング集団通信の通信隠蔽効果に関する調査

南里豪志, 大島聡史, 小野謙二

情報処理学会第162回ハイパフォーマンスコンピューティング研究発表会 2017.12

　More details

Event date： 2017.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：くまもと県民交流館パレア Country：Japan
通信削減CG法の性能評価@OFP

大島聡史

ATマイクロワークショップ@鳥羽 2017.10

　More details

Event date： 2017.10

Language：Japanese Presentation type：Symposium, workshop panel (public)

Venue：鳥羽シーサイドホテル Country：Japan
Auto-tuning of directives: tuning directives of OpenMP and OpenACC International conference

Satoshi Ohshima

Second International Workshop on Deepening Performance Models for Automatic Tuning (DPMAT) 2017.8

　More details

Event date： 2017.8

Language：English Presentation type：Oral presentation (general)

Venue：Nagoya University Country：Japan
GPUクラスタ上における階層型行列計算の最適化

大島聡史, @山崎市太郎, @伊田明弘, @横田理央

情報処理学会第160回ハイパフォーマンスコンピューティング研究発表会 2017.7

　More details

Event date： 2017.7

Language：Japanese Presentation type：Oral presentation (general)

Venue：秋田アトリオンビル Country：Japan
KNLを用いたFDMコードの自動チューニングとGPU適用の最新動向

@片桐孝洋，大島聡史，@松本正晴

2017年ハイパフォーマンスコンピューティングと計算科学シンポジウム(HPCS2017) 2017.6

　More details

Event date： 2017.6

Language：Japanese Presentation type：Oral presentation (general)

Venue：神戸大学先端融合研究環統合研究拠点コンベンションホール Country：Japan
Auto-tuning on NUMA and Many-core Environments with an FDM code International conference

@Takahiro Katagiri, Satoshi Ohshima, @Masaharu Matsumoto

The Twelfth International Workshop on Automatic Performance Tuning (iWAPT2017) (In Conjunction with the IEEE IPDPS2017) 2017.6

　More details

Event date： 2017.5 - 2017.6

Language：English Presentation type：Symposium, workshop panel (public)

Venue：Buena Vista Palace Hotel Country：United States
Auto-Tuning of Hierarchical Computations with ppOpen-AT

Takahiro Katagiri, Masaharu Matsumoto, Satoshi Ohshima

SIAM Conference on Parallel Processing for Scientific Computing (PP16), MS55 Auto-Tuning for the Post Moore's Era - Part I of II 2016.4

　More details

Event date： 2016.4

Language：English

Country：Other

Auto-Tuning of Hierarchical Computations with ppOpen-AT
マルチGPU環境における機械学習ハイパーパラメータの自動チューニング（２）

藤家空太郎, 多部田敏樹, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

第83回全国大会講演論文集 2021.3

　More details

Language：Japanese

Country：Japan

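We are working on auto-tuning based on iterative one-dimensional search and are optimizing the hyperparameters of machine learning programs in a multi-GPU environment. Machine learning does not produce identical results even with identical hyperparameters, for example because the training data changes on each run, so auto-tuning results fluctuate. To address this fluctuation, we have previously proposed a method that improves the stability of auto-tuning by performing additional measurements on the estimated parameters. In this study, we apply the method to a machine learning program used in a pedestrian path prediction application and show that auto-tuning accuracy improves when the additional measurements of the estimated hyperparameter values are parallelized and run several at a time in a multi-GPU environment.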
パネルディスカッション「スーパーコンピュータの産業利用と今後の産学共創のあり方」（パネリスト）

大島聡史

Cyber HPC Symposium 2021

　More details

Language：Others

Country：Other
GPUが支えるDX 変革の今、この先を考える－ GPUスパコンとOpenACC ー（パネリスト）

大島聡史

GPU Computing Workshop 2020

　More details

Language：Others

Country：Other
SIAM AN10(Conference Reports)

片桐孝洋, 大島聡史

応用数理 2010.12

　More details

Language：Japanese

Country：Japan

SIAM AN10(Conference Reports)
未踏ユースから育ったタレントたち【PART 1 若い未踏クリエータからのメッセージ】：13.発掘し，育成し，つなぐ場所

大島聡史

情報処理 2011.11

　More details

Language：Japanese

Country：Japan

IT Talents Who Sprang Out of the Mitoh-Youth : Field for Exploration, Growing, and Connection
高精度行列-行列積アルゴリズムにおけるbatched BLASの適用

石黒史也, 片桐孝洋, 大島聡史, 永井亨, 荻野正雄

第80回全国大会講演論文集 2018.3

　More details

Language：Japanese

Country：Other

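BLAS (Basic Linear Algebra Subprograms), a library that collects basic linear algebra operations such as matrix-matrix multiplication, is essential to many numerical computations. However, conventional numerical libraries focus on improving speed and pay insufficient attention to improving accuracy. In contrast, the high-precision matrix-matrix multiplication algorithm proposed by Ozaki (hereafter, Ozaki's method) can compute accurately up to the precision limit of the floating-point type in use. In this study, we propose an implementation of Ozaki's method that applies Batched BLAS, which can perform many small matrix-matrix multiplications at high speed. In the performance evaluation, we verify the effectiveness of Batched BLAS and of the proposed implementation in both CPU and GPU environments.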
3.基礎物理シミュレーション研究と可視化技術の進展 3.2 可視化技術

大谷寛明, 石黒静児, 宮澤順一, 大野暢亮, 陰山聡, 三浦英昭, 森高外征雄, 田村祐一, 北澤修太, 片桐孝洋, 大島聡史, 永井亨, 沼波政倫, 名倉成輝, 川原慎太郎, HU Kunqi, 小山田耕二, 後藤拓也, 嘉無木昇, 高丸尚教, PETROSKY Tomio, 田中智

プラズマ・核融合学会誌 2020.10

　More details

Language：Others

Country：Japan

3. Progress in Simulation Study of Fundamental Physics and Visualization Technology: 3.2: Visualization Technology
可視化技術—Visualization Technology—プロジェクトレビュー核融合科学研究所における数値実験炉研究プロジェクト ; 基礎物理シミュレーション研究と可視化技術の進展

大谷寛明, 石黒静児, 宮澤順一, 大野暢亮, 陰山聡, 三浦英昭, 森高外征雄, 田村祐一, 北澤修太, 片桐孝洋, 大島聡史, 永井亨, 沼波政倫, 名倉成輝, 川原慎太郎, 胡昆祁, 小山田耕二, 後藤拓也, 嘉無木昇, 高丸尚教, PETROSKY Tomio, 田中智

プラズマ・核融合学会誌 = Journal of plasma and fusion research / プラズマ・核融合学会編集委員会編 2020.10

　More details

Language：Japanese

Country：Japan
マルチGPU環境における機械学習ハイパーパラメータの自動チューニング（１）

多部田敏樹, 藤家空太郎, 藤井昭宏, 田中輝雄, 加藤由花, 大島聡史, 片桐孝洋

第83回全国大会講演論文集 2021.3

　More details

Language：Japanese

Country：Japan

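We have proposed iterative one-dimensional search in parameter space as a method for estimating multiple parameters simultaneously. The method searches by automatically selecting a combination of parameters, measuring its execution performance, and then repeatedly selecting another combination. We apply this method to a machine learning program. Machine learning involves multiple hyperparameters, and estimating an appropriate combination takes time. In this study, we estimate an appropriate combination of hyperparameters for the machine learning used in a pedestrian path prediction application and show that parallelizing the measurements in a multi-GPU environment reduces an estimation that takes about 15 days to about 12 hours.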


MISC

20th IEEE International Workshop on Automatic Performance Tuning (iWAPT 2025)

Takizawa H., Godoy W., Ohshima S., Marques O.A., Katagiri T., Pueschel M., Imamura T., Iwashita T., Lara V., Vuduc R., Wang W., Yamamoto Y., Ida A., Van Werkhoven B., Takahashi D., Xu F., Chung I.H., Chou J., Doerfert J., Komatsu K., Gerndt M., Sakamoto R., Hung S.H., Benkner S., Fukaya T., Low T.M.

Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2025) 209 - 210 2025 （ ISBN:9798331526436 ）

Publisher：Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2025)

DOI： 10.1109/IPDPSW66978.2025.00038

[Large-Scale Medical Image Processing with HPC] Four-Dimensional PET Reconstruction Using a GPU Supercomputer

大島聡史, 湯淺義尚, 松村海飛, 横田達也, 本谷秀堅, 坂田宗之, 木村裕一, 片桐孝洋, 永井亨, 塙敏博, 星野哲也

MEDICAL IMAGING TECHNOLOGY 41 ( 4-5 ) 150 - 155 2023.11 （ ISSN:0288-450X ）

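Language：Japanese Publisher：The Japanese Society of Medical Imaging Technology

Advances in medical image processing have produced a variety of techniques for visually understanding the inside of the body, and these are now in use. However, what they yield directly is images and video, and diagnosis is still carried out by people such as physicians. Expectations are high for software that reduces this workload, and a growing number of techniques are already used in clinical settings, but because knowledge and skills in both medicine (medical imaging) and computing are required, the range of targets remains limited. In this research, researchers from the medical image processing and high-performance computing fields are therefore working together to accelerate and scale up PET image reconstruction. This article presents the content of that effort and the results obtained so far. (Author's abstract)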

Professional Memberships

IPSJ SIG High Performance Computing

2005.1 - Present
IPSJ SIG Programming

2009.8 - Present
IPSJ SIG System Architecture

2004.12 - 2021.3
Information Processing Society of Japan, Game Informatics Research Group

2021.5 - 2024.3
Auto-Tuning Research Group

2010.3 - Present
Algorithms for Matrix / Eigenvalue Problems and their Applications, The Japan Society for Industrial and Applied Mathematics

2010.4 - Present
The Open CAE Society of Japan

2010.7 - Present
Association for Computing Machinery (ACM)

2010.5 - Present
Society for Industrial and Applied Mathematics (SIAM)

2010.5 - 2021.12

Committee Memberships

IPSJ SIG High Performance Computing Steering committee member Domestic

2025.4 - Present

Committee type：Academic society
Auto-Tuning Research Group Chair Domestic

2023.6 - 2027.6

Committee type：Academic society
Supercomputing Japan Board member Domestic

2023.3 - Present

Committee type：Academic society
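IPSJ SIG High Performance Computing Steering committee member Domestic

2016.4 - 2020.3
Auto-Tuning Research Group Organizer Domestic

2016.4 - 2020.3
The Japan Society for Industrial and Applied Mathematics, Young Researchers' Group Steering committee member Domestic

2015.4 - 2019.3
Algorithms for Matrix / Eigenvalue Problems and their Applications, The Japan Society for Industrial and Applied Mathematics Steering committee member Domestic

2013.4 - Present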

Committee type：Academic society
The Open CAE Society of Japan Executive Domestic

2011.4 - Present

Committee type：Academic society

Academic Activities

xSIG 2025 Program Chair

xSIG 2025 Organizing Committee 2025.8

Type：Competition, symposium, etc.
Open CAE Symposium 2024 Executive Committee Chair

Role(s)： Planning, management, etc.

Open CAE Symposium 2024 Executive Committee 2024.11

Type：Competition, symposium, etc.
xSIG 2024 PC member

cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG) 2024 （ Japan ） 2024.8

Type：Competition, symposium, etc.
ICS2024 Finance Chair International contribution

ICS 2024: International Conference on Supercomputing （ Kyoto Japan ） 2024.6

Type：Competition, symposium, etc.
iWAPT 2024 Steering Committee International contribution

International Workshop on Automatic Performance Tuning (iWAPT) 2024 （ San Francisco, United States of America ） 2024.5

Type：Competition, symposium, etc.
HPC Asia 2024 Proceedings Chair International contribution

The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2024) （ Nagoya Japan ） 2024.1

Type：Competition, symposium, etc.
PC member

cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG) 2023 （ Japan ） 2023.8

Type：Competition, symposium, etc.
Program Chair International contribution

International Workshop on Automatic Performance Tuning (iWAPT) 2023 （ St. Petersburg, Florida, United States of America ） 2023.5

Type：Competition, symposium, etc.
Secretary

The 64th Programming Symposium （ Japan ） 2023.1

Type：Competition, symposium, etc.
PC member

cross-disciplinary Workshop on Computing Systems, Infrastructures, and Programming (xSIG) 2022 （ Japan ） 2022.8

Type：Competition, symposium, etc.
Program Vice-chair International contribution

International Workshop on Automatic Performance Tuning (iWAPT) 2022 （ Online, Other ） 2022.6

Type：Competition, symposium, etc.
PC member

xSIG2018 （ Japan ） 2018.5

Type：Competition, symposium, etc.
PC member International contribution

GCA17 （ Japan ） 2017.11

Type：Competition, symposium, etc.
Executive committee member

SWoPP2017 （ Japan ） 2017.7

Type：Competition, symposium, etc.
PC member

HPCS2017 （ Japan ） 2017.6

Type：Competition, symposium, etc.
PC member

xSIG2017 （ Japan ） 2017.4

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2017

Type：Peer review

Number of peer-reviewed articles in foreign language journals：0

Number of peer-reviewed articles in Japanese journals：1

Proceedings of International Conference Number of peer-reviewed papers：11

Proceedings of domestic conference Number of peer-reviewed papers：3

Research Projects

Expanding the Applicability of Low-Rank Structured Matrix Methods and Exploiting Diverse Computing Architectures

Grant number：24K02949 2024.4 - 2027.3

Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

伊田明弘, 横田理央, 塙敏博, 岩下武史, 大島聡史, 星野哲也, 平石拓, 河合直聡

Grant type：Scientific research funding

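This research enhances the functionality of a low-rank structured matrix library. In scientific computing, numerical linear algebra libraries built on dense matrix operations are widely used. We extend the applicability of low-rank structured matrix methods so that dense matrix operations can be replaced by low-rank structured matrix operations, and we develop new numerical algorithms based on low-rank structured matrices. Algorithm development targets cluster computers built from the latest architectures such as GPUs and FPGAs, and implementations are optimized accordingly. We study implementation methods that extract the maximum performance from the hardware by assigning the most suitable architecture to each kind of low-rank structured matrix operation and by exploiting mixed-precision arithmetic and dynamic load balancing.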
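As a rough illustration of why a dense block can be replaced by a low-rank one, the sketch below compresses a single interaction block with a truncated SVD. It is a toy under stated assumptions and not the project's library: the kernel 1/|x - y|, the two point clusters, the tolerance, and the helper name `low_rank_block` are all chosen only for the example.

```python
import numpy as np


def low_rank_block(block, tol=1e-8):
    """Truncated SVD: keep singular values above tol relative to the largest."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :k] * s[:k], Vt[:k, :]      # block is approximately L @ R


# Two well-separated point clusters (hypothetical geometry).
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(3.0, 4.0, 200)
dense = 1.0 / np.abs(x[:, None] - y[None, :])   # 200 x 200 dense interaction block

L, R = low_rank_block(dense)
rank = L.shape[1]
rel_err = np.linalg.norm(dense - L @ R) / np.linalg.norm(dense)
print(rank, L.size + R.size, dense.size, rel_err)
```

Storing the two thin factors instead of the full block reduces both memory and the cost of a matrix-vector product; hierarchical formats apply this kind of compression block by block to the parts of a matrix that admit it.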

JHPCN2022：Research toward Practical Use of Large-Scale Distributed Medical Image Processing Applications

2022.4 - 2023.3

Authorship：Principal investigator

JHPCN jh220011
JHPCN2022：Large-Scale Aeroacoustic Analysis of Wind Instruments and Acoustic Devices

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220001
JHPCN2022：Large-Scale Simulation of Turbulent Transport in Multi-Particle Dispersed Systems

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220003
JHPCN2022：Hierarchical low-rank approximation methods on distributed memory and GPUs

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220009
JHPCN2022：Accelerating the FMO Program ABINIT-MP and Supporting Ultra-Large-Scale Systems

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220010
JHPCN2022：High-Performance and Highly Reliable Numerical Methods and Their Applications

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220022
JHPCN2022：Integration of Three-Dimensional Strong Ground Motion Simulation and Real-Time Data Assimilation

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220029
JHPCN2022：Application of Software Auto-Tuning Technology to Machine Learning Software

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220044
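JHPCN2022：Development of Inter-Site Transfer Technology for Large-Scale Data by Combining HPC and High-Speed Communication Technology, and System Demonstration Tests Using Real Data

2022.4 - 2023.3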

Authorship：Coinvestigator(s)

JHPCN jh220048
JHPCN2022：Innovative Multigrid Methods II

2022.4 - 2023.3

Authorship：Coinvestigator(s)

JHPCN jh220049
HPCI2022：Establishing a Computational Science Analysis Environment for Emerging Infectious Diseases

2022.4 - 2023.3

Authorship：Coinvestigator(s)

hp220025
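KAKENHI 2021-：Creation of High-Performance Computational Science Using Graphics Hardware Equipped with Ray Tracing Acceleration

2021.4 - 2024.3

Authorship：Principal investigator

KAKENHI Challenging Research (Exploratory)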
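KAKENHI 2021-：Numerical Linear Algebra Based on Lattice H-Matrices and High-Performance Implementation on the Latest Architectures

2021.4 - 2024.3

Authorship：Coinvestigator(s)

KAKENHI Grant-in-Aid for Scientific Research (B)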
Numerical Linear Algebra Based on Lattice H-Matrices and High-Performance Implementation on the Latest Architectures

2021 - 2023

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
Creation of High-Performance Computational Science Using Graphics Hardware Equipped with Ray Tracing Acceleration

2021 - 2023

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Challenging Research(Exploratory)

Authorship：Principal investigator Grant type：Scientific research funding
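KAKENHI 2020-：Applying Unconventional Linear Algebra Techniques to Continual Learning of Extremely Large Neural Networks

2020.4 - 2023.3

Authorship：Coinvestigator(s)

KAKENHI Challenging Research (Pioneering)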
Applying Unconventional Linear Algebra Techniques to Continual Learning of Extremely Large Neural Networks

2020 - 2022

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Challenging Research(Pioneering)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
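KAKENHI 2018-：Development of Linear System Solvers Well Suited to Machine Learning Hardware and Their High-Performance Massively Parallel Implementation

2018.4 - 2021.3

Authorship：Coinvestigator(s)

KAKENHI Grant-in-Aid for Scientific Research (B)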
Linear System Solvers Well Suited to Machine Learning Hardware

Grant number：18H03248 2018 - 2020

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
Creation of an Innovative Auto-Tuning Infrastructure Using Deep Learning

Grant number：18K19782 2018 - 2020

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Challenging Research(Exploratory)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
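KAKENHI 2017-：Functional Extension of an H-Matrix Library and Optimization for Next-Generation Supercomputers

2017.4 - 2020.3

Authorship：Coinvestigator(s)

KAKENHI Grant-in-Aid for Scientific Research (B)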
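KAKENHI 2017-：Mathematics of Parallel Computation for Sequential Problems, and Research, Development, and Demonstration of a Framework International coauthorship

2017.4 - 2019.3

Authorship：Coinvestigator(s)

KAKENHI Grant-in-Aid for Scientific Research (B)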
Mathematics of Parallel Computation for Sequential Problems, and Research, Development, and Demonstration of a Framework

2017 - 2019

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
Functional Extension of an H-Matrix Library and Optimization for Next-Generation Supercomputers

Grant number：17H01749 2017 - 2019

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
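KAKENHI 2016-：New Developments in Auto-Tuning Technology for Communication-Avoiding and Communication-Reducing Algorithms

2016.4 - 2019.3

Authorship：Collaborating Investigator(s) (not designated on Grant-in-Aid)

KAKENHI Grant-in-Aid for Scientific Research (B)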
Deepening Performance Models for Auto-Tuning through International Exchange International coauthorship

2016.4 - 2018.3

Bilateral international joint research project between Japan and Taiwan
JHPCN2016-：Hierarchical low-rank approximation methods on distributed memory and GPUs International coauthorship

2016.4 - 2017.3

JHPCN international joint research
New Developments in Auto-Tuning Technology for Communication-Avoiding and Communication-Reducing Algorithms

Grant number：16H02823 2016 - 2018

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Collaborating Investigator(s) (not designated on Grant-in-Aid) Grant type：Scientific research funding
JHPCN2015：Research on Parallelizing the Molecular Dynamics Software MODYLAS for Many-Core Architectures

2015.4 - 2016.3

JHPCN jh150015

Educational Activities

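Teaches courses in the Graduate School of Information Science and Electrical Engineering and in the Department of Electrical Engineering and Computer Science, School of Engineering. Accepts undergraduate thesis students and master's students.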

Class subject

Programming Exercises

2025.6 - 2025.8 Summer quarter
Advanced Topics in Large-Scale Computing

2024.10 - 2025.2 Second semester
Programming Exercises

2024.6 - 2024.8 Summer quarter
KIKAN Education Seminar

2024.4 - 2024.8 First semester
Advanced Topics in Large-Scale Computing

2023.10 - 2024.2 Second semester
KIKAN Education Seminar

2023.4 - 2023.8 First semester

Visiting, concurrent, or part-time lecturers at other universities, institutions, etc.

2025 Information Technology Center, Nagoya University Classification:Affiliate faculty Domestic/International Classification:Japan

Semester, Day Time or Duration：2025.4 - 2026.3
2024 Information Technology Center, Nagoya University Classification:Affiliate faculty Domestic/International Classification:Japan

Semester, Day Time or Duration：2024.4 - 2025.3
2023 Information Technology Center, Nagoya University Classification:Affiliate faculty Domestic/International Classification:Japan

Semester, Day Time or Duration：2023.4 - 2024.3
2022 School of Informatics, Nagoya University Classification:Part-time lecturer Domestic/International Classification:Japan

Semester, Day Time or Duration：Fall term II, Programming 2 (two class periods per week)
2022 Information Technology Center, Nagoya University Classification:Affiliate faculty Domestic/International Classification:Japan

Semester, Day Time or Duration：2022.10 - 2023.3

Teaching Student Awards

xSIG 2025 Outstanding Student Award

Classification of award-winning students：Postgraduate student Name of award-winning student：遠藤悠介

Outline of Social Contribution and International Cooperation activities

Has served as lead PM of "Fukuoka Mitou", a project adopted under the Ministry of Economy, Trade and Industry's "AKATSUKI Project", working to discover and develop young talent (ongoing since June 2023).

Activities contributing to policy formation, academic promotion, etc.

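2018.1 - 2020.1 Ministry of Education, Culture, Sports, Science and Technology (MEXT)

Member, Working Group on the Future of HPCI, HPCI Plan Promotion Committee, Research Promotion Bureau, MEXT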