Faculty Profiles - NANRI TAKESHI

Information


NANRI TAKESHI

Organization

Research Institute for Information Technology Section of Advanced Computational Science Associate Professor
Graduate School of Information Science and Electrical Engineering Department of Information Science and Technology（Concurrent）

Contact information

Profile

Research: My major research topic is the development of technologies for runtime systems of parallel programs. Parallel computers have become a popular platform, from PC servers to large-scale supercomputers. To utilize the computational power of those platforms, users must prepare parallel programs. When a parallel program executes on a parallel computer, a runtime system translates the operations written in the program into specific behavior of the computer. Runtime system technologies are therefore key to the efficient use of parallel computers. In particular, as the size of the computer grows, achieving better performance requires taking into account information available at runtime, such as the allocation of processes, the balance of load among processors, and contention for resources among jobs. In our research, we are developing technologies for dynamic optimization of runtime systems on large-scale parallel computers.

Homepage

https://nanrilab-kyushu-u.notion.site/top
Takeshi Nanri

External link

Research Areas

Informatics / High performance computing

Degree

Ph.D

Research History

Research Institute for Information Technology Section of Advanced Computational Science Associate Professor

2007.3 - Present

Education

Kyushu University Graduate School of Engineering Master Course of Information Engineering

- 1995.3


Country：Japan
Kyushu University School of Engineering

- 1993.3


Country：Japan

Research Interests・Research Keywords

Research theme： Fundamental techniques to enable highly scalable parallel computations

Keyword： scalability, parallel computation, high-performance computation

Research period： 2011.9
Research theme： Technologies for dynamic optimization of communication libraries on large-scale parallel computers

Keyword： Parallel Computing, Runtime Optimization

Research period： 2005.4
Research theme： Programming environment for hierarchical parallel environment

Keyword： Hierarchical parallel computer, distributed shared memory, communication optimization

Research period： 2003.4

Awards

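AXIES 2020 Annual Conference Excellent Paper Award

2021.4 Academic eXchange for Information Environment and Strategy (AXIES) Prototype of a Message Queuing System Using RDMA on DIMM-Slot-Attached Non-Volatile Memory
Yamashita SIG Research Award

2013.7 Information Processing Society of Japan (IPSJ) Awarded for the research presentation "Performance Analysis of Collective Communication Algorithms by Process Allocation Shape on the Tofu Network" at the 136th meeting of the IPSJ Special Interest Group on High Performance Computing.

As supercomputers grow in scale, more systems adopt low-cost multi-dimensional mesh/torus topologies for the inter-node interconnect network. On a multi-dimensional mesh/torus, performance varies greatly with the shape of the group of nodes on which processes are placed, even when the number of nodes used is the same. In this study, targeting the Tofu interconnect network used in the K computer and its compatible system, the Fujitsu PRIMEHPC FX10, we measured the effect of the process allocation shape on the performance of collective communication algorithms. Comparing the measured performance with the transfer wait time caused by communication contention, obtained with the performance analysis tool of the Tofu interconnect, we found that both vary with the process allocation shape in nearly the same way. These results show that performance estimation that takes the process allocation shape into account is important when selecting a collective communication algorithm.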

Papers

Supercomputers Supporting the Era after the K Computer: 5. Supercomputers Using Many Xeon Processors Invited

Takeshi Nanri

IPSJ Magazine (Joho Shori) 60 ( 12 ) 1198 - 1203 2019.11

Language：Japanese Publishing type：Research paper (scientific journal)
Static Control of a Software-Built Cache System on a Distributed Shared Memory System Reviewed

Takeshi Nanri, Hiroyuki Sato, Masaaki Shimasaki

IPSJ Journal 1997.9

Language：Japanese Publishing type：Research paper (scientific journal)
Portability in Implementing Distributed Shared Memory System on the Workstation Cluster Environment Reviewed

Takeshi Nanri, Hiroyuki Sato and Masaaki Shimasaki

Research Reports on Information Science and Electrical Engineering of Kyushu University 1997.3


Language：English Publishing type：Research paper (scientific journal)
Optimization of a GEMM Implementation using Intel AMX Reviewed International journal

Endo Y., Ohshima S., Nanri T.

Proceedings of Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026) 81 - 90 2026.1 （ ISBN:9798400720673 ）

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Proceedings of Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region (SCA/HPCAsia 2026)

In high-performance computing, general matrix multiplication (xGEMM) routines form the core of Level-3 BLAS kernels, which enable efficient matrix operations. Among these, low-precision GEMM, such as BFloat16 GEMM, has become indispensable in machine learning and deep learning because it reduces memory usage and power consumption. To meet this demand, recent hardware platforms are equipped with dedicated matrix computation units separate from the CPU, and research efforts have focused on maximizing their performance. Intel's Advanced Matrix Extensions (AMX) is one such hardware accelerator designed specifically for low-precision matrix operations. In this study, we implement and optimize matrix multiplication-accumulation using AMX by applying blocking and tile-register-level optimizations, and we evaluate its performance. Our results demonstrate a performance improvement in the range of 7.27-20.40% compared to that of the BFloat16 GEMM implementations provided by MKL and OpenBLAS.

DOI： 10.1145/3773656.3773660

Scopus
On the normalised energy dissipation rate in homogeneous isotropic turbulence Reviewed

Kitamura, T; Nagata, K; Shimoyama, K; Nanri, T

Journal of Fluid Mechanics 1010 2025.4 （ ISSN:0022-1120 eISSN:1469-7645 ）


Publisher：Journal of Fluid Mechanics

The Reynolds number dependence of the normalised energy dissipation rate C_ε = εL/u'^3 is studied, where ε is the energy dissipation rate, L is the integral length scale and u' is the root-mean-square velocity. We present the derivation of the exact relationship between the normalised energy dissipation rate and integrated form of the Kármán-Howarth equation in homogeneous isotropic turbulence. The present mathematical formulation is valid for both forced and decaying turbulence. The discussion of C_ε is developed under the assumption that the term resulting from the nonlinear energy transfer appearing in C_ε is constant at sufficiently high-Reynolds-number turbulence. The fact that the integrated term originating from nonlinear energy transfer is constant plays the role of a lower bound in C_ε, implying that the energy dissipation rate is finite in high-Reynolds-number turbulence. Furthermore, the origin of the non-equilibrium dissipation law could be the imbalance between and, the influence of external forces, or both. In decaying turbulence with forced turbulence as the initial condition, the imbalance between and causes the non-equilibrium dissipation law. The validity of the theoretical analysis is investigated using direct numerical simulations of the forced and decaying turbulence.

DOI： 10.1017/jfm.2025.316

Web of Science

Scopus
Introduction of the new supercomputer Genkai

HIRASHIMA Tomoyuki, SUGAO Takahiko, HARADA Hiroyoshi, NANRI Takeshi, OHSHIMA Satoshi

Proceedings of the Annual Conference of Academic Exchange for Information Environment and Strategy 2024 ( 0 ) 37 - 40 2024.12 （ eISSN:24349305 ）


Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Academic eXchange for Information Environment and Strategy

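The Research Institute for Information Technology, Kyushu University (hereafter "the Kyushu University center") began operating the supercomputer system Genkai in October 2024. This paper introduces the changes in usage policy from ITO and the various usage schemes offered on Genkai.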

DOI： 10.24669/axies.2024.0_37

CiNii Research
Optical properties of rutile TiO2 with Zr, Mo, Zn, Cd impurities Reviewed

Ohno K., Sahara R., Nanri T., Kawazoe Y.

Computational Condensed Matter 41 2024.12 （ ISSN:2352-2143 ）


Publisher：Computational Condensed Matter

To explore quasiparticle (QP) energy gaps and photoabsorption spectra of rutile TiO2 with nonmagnetic transition metal (Zr, Mo, Zn, Cd) impurities, we conducted a Γ-point only GW + Bethe–Salpeter equation (BSE) calculation on a 72 (or 71) atom supercell. Our findings reveal that Zn and Cd impurities must coexist, at least partly, with oxygen vacancies to maintain charge neutrality. Among the systems considered, Mo, Zn, or Cd doped rutile TiO2 may exhibit optical absorption and catalytic activity under visible light. The resulting QP energy gaps (Δɛ_QP) and photoabsorption energies (PAEs) are in fairly good agreement with both experimental and theoretical data currently available. The necessary conditions for the applicability of the Γ-point only approach in the GW + BSE framework were found to be: (1) The Γ-point only GW calculation should reproduce a reasonable band gap. (2) The “superficial” exciton binding energy (the diagonal element of W_{vc;vc} − 2X_{vc;vc} between v = VBM and c = CBM, where W and X are the direct and exchange terms of the BSE matrix elements, respectively) must be positive or marginally negative. (3) The “real” exciton binding energy (Δɛ_QP − the lowest PAE) should be positive, even if it is exceptionally small.

DOI： 10.1016/j.cocom.2024.e00977

Web of Science

Scopus
Forward and backward multi-particle dispersion in homogeneous isotropic turbulence Invited Reviewed

ARAKAWA Ryunosuke, KITAMURA Takuya, SONOBE Yohei, SAIMOTO Akihide, NANRI Takeshi

Transactions of the JSME (in Japanese) 90 ( 929 ) 1 - 9 2024.1 （ eISSN:21879761 ）


Language：Japanese Publishing type：Research paper (scientific journal) Publisher：The Japan Society of Mechanical Engineers

Turbulent diffusion in homogeneous isotropic turbulence is numerically investigated using direct numerical simulation (DNS). The four-dimensional turbulence database allows us to track fluid particles not only in the forward direction of time but also in the backward direction. Two-particle dispersion has been studied in previous studies and it is known that backward diffusion is faster than forward diffusion. However, little is known about multi-particle dispersion due to the difficulty of observing it experimentally. Studies on backward diffusion are also limited. In this study, multi-particle dispersion is numerically investigated and its properties are discussed, e.g., direction of time and geometry of a tetrahedron. The results show that forward and backward diffusions of multi-particles behave differently at the beginning and evolve similarly after the transient time, but the coefficients of the backward direction are larger than those of the forward direction.

DOI： 10.1299/transjsme.23-00281

CiNii Research
Introduction of the New Supercomputer System at Research Institute for Information Technology of Kyushu University

Ohshima Satoshi, Nanri Takeshi, Yoshizoe Kazuki, Hirashima Tomoyuki, Harada Hiroyoshi, Ikeda Tsuguho

Proceedings of the Annual Conference of Academic Exchange for Information Environment and Strategy 2023 ( 0 ) 89 - 96 2023.12 （ eISSN:24349305 ）


Language：Japanese Publishing type：Research paper (conference, symposium, etc.) Publisher：Academic eXchange for Information Environment and Strategy

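The Research Institute for Information Technology, Kyushu University will begin operating a new supercomputer system in July 2024. This paper gives an overview of the system, with reference to its improvements over the current system, ITO.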

DOI： 10.24669/axies.2023.0_89

CiNii Research
Implementation of Coupled Numerical Analysis of Magnetospheric Dynamics and Spacecraft Charging Phenomena via Code-To-Code Adapter (CoToCoA) Framework Reviewed

Miyake Y., Sunada Y., Tanaka Y., Nakazawa K., Nanri T., Fukazawa K., Katoh Y.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 14074 LNCS 438 - 452 2023 （ ISSN:03029743 ISBN:9783031360206 ）


Publisher：Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

This paper addresses the implementation of a coupled numerical analysis of the Earth’s magnetospheric dynamics and spacecraft charging (SC) processes based on our in-house Code-To-Code Adapter (CoToCoA). The basic idea is that the magnetohydrodynamic (MHD) simulation reproduces the global dynamics of the magnetospheric plasma, and its pressure and density data at local spacecraft positions are provided and used for the SC calculations. This allows us to predict spacecraft charging that reflects the dynamic changes of the space environment. CoToCoA defines three types of independent programs: Requester, Worker, and Coupler, which are executed simultaneously in the analysis. Since the MHD side takes the role of invoking the SC analysis, Requester and Worker positions are assigned to the MHD and SC calculations, respectively. Coupler then supervises necessary coordination between them. Physical data exchange between the models is implemented using MPI remote memory access functions. The developed program has been tested to ensure that it works properly as a coupled physical model. The numerical experiments also confirmed that the addition of the SC calculations has a rather small impact on the MHD simulation performance with up to about 500-process executions.

DOI： 10.1007/978-3-031-36021-3_46

Scopus
Numerical approach for aerodynamics around two tone holes of woodwind instruments Reviewed

Takanami S., Tabata R., Iwagami S., Ohno T., Nanri T., Kobayashi T., Takahashi K.

Proceedings of the International Congress on Acoustics 2022 （ ISSN:22267808 ）


Publisher：Proceedings of the International Congress on Acoustics

In this paper, we discuss the numerical reproducibility of the compressible fluid behavior around two tone holes of woodwind instruments by using compressible Large Eddy Simulation (LES). In particular, we focus on the situation in which the tone holes are opened and closed with moving pads above the tone holes, which is regarded as a moving boundary problem with topology change, and reproduce the change of the pitch when opening and closing the tone holes. Our two-dimensional model of a "recorder" has two tone holes. To reproduce the opening and closing of the tone holes, the pads are moved continuously. That is, the position of the pads was continuously changed in the order of "open - close". Our numerical results are consistent with Keefe's experimental results. We solved the moving boundary problem with topology change in the setting of acoustic fluid-structure interaction, and reproduced the pitch change when opening and closing the tone holes of the recorder-like woodwind instrument model.

Scopus
Compressible fluid analysis on basic properties of a thermoacoustic equipment Reviewed

Tashima Y., Ohno T., Nanri T., Kobayashi T., Takahashi K.

Proceedings of the International Congress on Acoustics 2022 （ ISSN:22267808 ）


Publisher：Proceedings of the International Congress on Acoustics

Two and three-dimensional models of a test-tube thermoacoustic engine were numerically analyzed using compressible Large Eddy Simulation (LES) to investigate initial transient behavior, i.e., generation mechanism of thermoacoustic waves in an initial state. In the model used in this study, a stack is placed near the bottom of the test-tube and a temperature gradient is applied between both ends of the stack. The model has an external region connecting the opening of the test-tube for radiations of sound waves and heat. As a result, a fluid flow was observed inside the stack, a strong pressure oscillation, i.e., acoustic resonance, was observed inside the test-tube, and sound radiation from the open end was also observed. Furthermore, the frequency of the sound vibration was almost the same as the theoretical estimation of the resonance frequency of the test-tube. Thus, we successfully reproduced the basic properties of the thermoacoustic engine in the initial state. However, noise components increased in time evolution and stationary oscillations were not attained yet. Thus, we need to improve our numerical method. We are also planning to analyze sound waves' generation mechanism taking into account aeroacoustic theory, e.g., Lighthill's acoustic analogy.

Scopus
Aeroacoustic analysis of port noise by using a three-dimensional numerical model of a bass reflex speaker system Reviewed

Uryuu K., Tabata R., Ohno T., Nanri T., Kobayashi T., Takahashi K.

Proceedings of the International Congress on Acoustics 2022 （ ISSN:22267808 ）


Publisher：Proceedings of the International Congress on Acoustics

We numerically study port noise observed for a bass reflex speaker system with a compressible fluid solver, Large-Eddy Simulation (LES), to explore the noise generation mechanism from the viewpoint of aeroacoustics. The port noise is considered an aerodynamic sound generated by vortices, created by the interaction between the acoustic field and port opening. However, the details of the sound generation mechanism are still an open problem. By using a 3D model, the port noise is well reproduced by compressible LES when the speaker system is acoustically driven at its resonance frequency, the Helmholtz resonance frequency of the bass reflex speaker system. Vortices are created near the edges of the port and generate broadband noise. However, noise is enhanced due to resonance in some bands, which correspond to the acoustic resonance frequencies of the port and enclosure themselves. We are also planning to investigate the noise generation mechanism by using Howe's energy corollary, which allows us to estimate the energy transfer between fluid dynamics and acoustics; namely, we consider where the aeroacoustic noise is generated and how much energy is transferred from the vortex motions to the acoustic field.

Scopus
Aeroacoustic analysis of oboe reeds with compressible direct numerical simulation Reviewed

Nakahara Y., Sumita R., Tabata R., Iwagami S., Nanri T., Kobayashi T., Hattori Y., Takahashi K.

Proceedings of the International Congress on Acoustics 2022 （ ISSN:22267808 ）


Publisher：Proceedings of the International Congress on Acoustics

A two-dimensional model of an oboe reed is studied numerically with a direct numerical simulation (DNS) of the compressible Navier-Stokes equations to investigate the sound generation mechanism from the viewpoint of aeroacoustics. The numerical tool is extremely accurate because its smallest mesh size is on the order of micrometers, and it successfully reproduces the details of fluid motion and acoustic vibrations inside and outside the reed. Particular attention is paid to the effect of reed vibration on the sound generation mechanism. When the reeds are fixed and a periodically varying flow is injected through the fixed reed slit, the aerodynamic sound created inside the reeds is nearly a single tone including a few overtones. On the other hand, when a flow is injected through periodically vibrating reeds from an oral cavity, more overtone components are observed and the pressure waveforms are similar to those observed in the experiment. This indicates that the richness of the overtones of the double-reed instrument is mainly attributed to the aerodynamic sound created by the flow injected through vibrating reeds, and that the bore, a linear resonator, just enhances characteristics of the instrument, e.g., formant.

Scopus
Numerical study of a French horn mouthpiece accompanied by vibrating lips and an oral cavity with compressible direct numerical simulation Reviewed

Sumita R., Tabata R., Iwagami S., Nakahara Y., Nanri T., Kobayashi T., Hattori Y., Takahashi K.

Proceedings of the International Congress on Acoustics 2022 （ ISSN:22267808 ）


Publisher：Proceedings of the International Congress on Acoustics

A two-dimensional model of a French horn mouthpiece is numerically studied with a 2D direct numerical simulation (DNS) of the compressible Navier-Stokes equations to investigate the sound generation mechanism from the viewpoint of aeroacoustics. That is, we consider the sounding mechanism of buzzing, when the mouthpiece without a bore is played. Our numerical tool is highly accurate because its minimum mesh size is on the order of micrometers, and details of fluid motion and acoustic oscillation inside and near the mouthpiece are successfully reproduced. In particular, we focus on the roles of vibrating lips and an oral cavity in the sound generation mechanism. When the mouthpiece without lips and an oral cavity is driven by a periodic flow with a certain frequency, a single tone without overtones is observed. On the other hand, when the mouthpiece is driven by vibrating lips with an oral cavity, the generated sound includes rich overtones and its waveform is similar to that observed experimentally. Since the bore is a linear element and cannot generate overtones from a single tone by itself, the sound of a horn including rich overtones is generated by a mouthpiece with the vibrating lips and oral cavity.

Scopus
Numerical study of the feedback mechanism of the edge tone Reviewed

Onomata T., Iwagami S., Tabata R., Ohno T., Nanri T., Kobayashi T., Takahashi K.

Proceedings of the International Congress on Acoustics 2022 （ ISSN:22267808 ）


Publisher：Proceedings of the International Congress on Acoustics

We numerically investigate fundamental problems of the edge tone with compressible Large Eddy Simulation (LES) together with the acoustic solver FDTD and incompressible LES. Jet oscillation and the edge tone in the first mode are successfully reproduced by a 3D model with compressible LES. Namely, the acoustic intensity changes with the jet velocity, closely following the sixth power law, which is evidence of the reliability of our numerical method. Next, we estimate the intensity of acoustic feedback in the following way. According to Kaykayoglu and Rockwell, effective pressure sources are considered to be located slightly downstream of the edge tip on both sides of the edge. Indeed, such a pair of positive and negative pressure spots periodically appears in our numerical calculation. Then, we set the pressure spots on both sides of the edge and reproduce acoustic waves radiated from them by FDTD. The acoustic particle velocity of the reproduced acoustic field at the nozzle outlet is regarded as acoustic feedback. Even though such acoustic feedback may contribute to driving the jet, we can consider that the fluid feedback is still dominant in a low-Reynolds-number regime, as pointed out by Paál et al. from the results of incompressible LES.

Scopus
Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets Reviewed

Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji

Lecture Notes in Computer Science 11203 243 - 257 2019.1


Language：English Publishing type：Research paper (scientific journal)
Hybrid storage system consisting of cache drive and multi-tier SSD for improved IO access when IO is concentrated Reviewed

Kazuichi Oe, Takeshi Nanri, Koji Okamura

IEICE Transactions on Information and Systems E102D ( 9 ) 1715 - 1730 2019.1


Language：English Publishing type：Research paper (scientific journal)

In previous studies, we determined that workloads often contain many input-output (IO) concentrations. Such concentrations are aggregations of IO accesses. They appear in narrow regions of a storage volume and continue for durations of up to about an hour. These narrow regions occupy a small percentage of the logical unit number capacity, include most IO accesses, and appear at unpredictable logical block addresses. We investigated these workloads by focusing on page-level regularity and found that they often include few regularities. This means that simple caching may not reduce the response time for these workloads sufficiently because the cache migration algorithm uses page-level regularity. We previously developed an on-the-fly automated storage tiering (OTFAST) system consisting of an SSD and an HDD. The migration algorithm identifies IO concentrations with moderately long durations and migrates them from the HDD to the SSD. This means that there is little or no reduction in the response time when the workload includes few such concentrations. We have now developed a hybrid storage system consisting of a cache drive with an SSD and HDD and a multi-tier SSD that uses OTFAST, called "OTF-AST with caching." The OTF-AST scheme handles the IO accesses that produce moderately long duration IO concentrations while the caching scheme handles the remaining IO accesses. Experiments showed that the average response time for our system was 45% that of Facebook FlashCache on a Microsoft Research Cambridge workload.

DOI： 10.1587/transinf.2018EDP7253
ATSMF Automated tiered storage with fast memory and slow flash storage to improve response time with concentrated input-output (IO) workloads Reviewed

Kazuichi Oe, Mitsuru Sato, Takeshi Nanri

IEICE Transactions on Information and Systems E101D ( 12 ) 2889 - 2901 2018.12


Language：English Publishing type：Research paper (scientific journal)

The response times of solid state drives (SSDs) have decreased dramatically due to the growing use of non-volatile memory express (NVMe) devices. Such devices have response times of less than 100 microseconds on average. The response times of all-flash-array systems have also decreased dramatically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads tend to contain many input-output (IO) concentrations, which are aggregations of IO accesses. They target narrow regions of the storage volume and can continue for up to an hour. These narrow regions occupy a few percent of the logical unit number capacity, are the target of most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response times for such workloads, we developed an automated tiered storage system called “automated tiered storage with fast memory and slow flash storage” (ATSMF) in which the data in targeted regions are migrated between storage devices depending on the predicted remaining duration of the concentration. The assumed environment is a server with non-volatile memory and directly attached SSDs, with the user applications executed on the server as this reduces the average response time. Our system predicts the effect of migration by using the previously monitored values of the increase in response time during migration and the change in response time after migration. These values are consistent for each type of workload if the system is built using both non-volatile memory and SSDs.
In particular, the system predicts the remaining duration of an IO concentration, calculates the expected response-time increase during migration and the expected response-time decrease after migration, and migrates the data in the targeted regions if the sum of response-time decrease after migration exceeds the sum of response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage only and that its memory access ratio is more than 50%.

DOI： 10.1587/transinf.2018PAP0005
Approaches for memory-efficient communication library and runtime communication optimization

Takeshi Nanri

Advanced Software Technologies for Post-Peta Scale Computing The Japanese Post-Peta CREST Research Project 121 - 138 2018.12


Authorship：Lead author Language：English

This article summarizes the work carried out in the Advanced Communication for Exa (ACE) project. The main motivation of this project was the severe demand for scalable communication toward exascale computation. In the project, we therefore built a PGAS-based communication library, Advanced Communication Primitives (ACP). Its fundamental communication model is one-sided, based on the PGAS model, so that its internal memory footprint is as small as possible. Based on this model, several applications, including simulations of magnetohydrodynamics, molecular orbitals, and particles, were tuned to achieve higher scalability. In addition, some communication optimization techniques were investigated. In particular, tuning methods for collective communications, such as message ordering, algorithm selection, and overlapping, were studied. A network simulator, NSIM-ACE, was also developed in this project. It simulates the behavior of packets for one-sided communications to study the effects of congestion on interconnects.

DOI： 10.1007/978-981-13-1924-2_7
Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets. Reviewed

Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji

High Performance Computing - ISC High Performance 2018 International Workshops, Frankfurt/Main, Germany, June 28, 2018, Revised Selected Papers 243 - 257 2018.6


Language：English Publishing type：Research paper (other academic)


DOI： 10.1007/978-3-030-02465-9_17
Performance Evaluation and Optimization of MagnetoHydroDynamic Simulation for Planetary Magnetosphere with Xeon Phi KNL Reviewed

Keiichiro Fukazawa, Takeshi Soga, Takayuki Umeda, Takeshi Nanri

Parallel Computing is Everywhere 178 - 187 2018.1


Language：English

Magnetohydrodynamic (MHD) simulation is often applied to study the global dynamics and configuration of a planetary magnetosphere for space weather. In this paper, the computational performance of an MHD code is evaluated on 128 Xeon Phi KNL nodes of a Cray XC40. The results show that the 2D and 3D domain decompositions with the SoA (structure of arrays) layout perform well, while the AoS (array of structures) layout and hybrid parallel computation perform poorly. After adding performance optimizations for Xeon Phi to our MHD simulation code, we obtained a 2.4% increase in overall execution efficiency and a 3 TFlops performance gain using 128 nodes.

DOI： 10.3233/978-1-61499-843-3-178
Analysis of the Quality of Academic Papers by the Words in Abstracts Invited Reviewed International journal

Tetsuya Nakatoh, Kenta Nagatani, Toshiro Minami, Sachio Hirokawa, Takeshi Nanri, Miho Funamori

HIMI 2017, Part II, LNCS 10274, Proc. of the 19th International Conference on Human-Computer Interaction (HCI International 2017) 2017.7


Language：English Publishing type：Research paper (international conference proceedings)
Trends in Communication Libraries for HPC Reviewed

Takeshi Nanri

Simulation (Journal of the Japan Society for Simulation Technology) 36 ( 2 ) 79 - 84 2017.6

Language：Japanese Publishing type：Research paper (scientific journal)
Assessing the Significance of Scholarly Articles using their Attributes Invited Reviewed International journal

Tetsuya Nakatoh, Sachio Hirokawa, Toshiro Minami, Takeshi Nanri, Miho Funamori

Proc. of the 22nd International Symposium on Artificial Life and Robotics (AROB2017) 742 - 746 2017.1


Language：English Publishing type：Research paper (scientific journal)
Reducing Computation Time by Using the Same Kind of Compiler and Execution on a Different Machine Reviewed

藤野清次, 小玉捷平, 南里豪志, 岩里洸介

Transactions of the Japan Society for Simulation Technology 8 ( 1 ) 21 - 24 2016.1


Language：Japanese Publishing type：Research paper (scientific journal)
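Proposal and Evaluation of a Tiered Storage System That Improves Performance by Automatically Extracting IO-Concentrated Regions Whose Duration and IO Access Count Promise a Performance Gain and Migrating Them to SSD Reviewed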

大江和一, 岩田聡, 南里豪志, 岡村耕二

IPSJ Journal 9 ( 1 ) 1 - 16 2016.1


Language：Japanese Publishing type：Research paper (scientific journal)
A Neighboring Communication Algorithm Using Effective Multiple Communication Devices on Direct Connection Network Reviewed

Yoshiyuki Morie, Takeshi Nanri

IPSJ Transactions on Advanced Computing Systems (Web) 8 ( 4 ) 26-35 (WEB ONLY) 2015.11

Language：Japanese Publishing type：Research paper (scientific journal)
A Study on the Implementation of the reduction Directive in Parallel Computing (並列計算における reduction指示の実装に関する考察) Reviewed

岩里洸介, 南里豪志, 藤野清次

日本シミュレーション学会論文誌 7 ( 4 ) 109 - 113 2015.7

Language：Japanese Publishing type：Research paper (scientific journal)
Performance Measurements of MHD Simulation for Planetary Magnetosphere on Peta-Scale Computer FX10 Reviewed

FUKAZAWA Keiichiro, Takeshi Nanri, Takayuki Umeda

Advances in Parallel Computing 2014.3

Language：English

DOI： 10.3233/978-1-61499-381-0-387
Performance evaluation of magnetohydrodynamics simulation for magnetosphere on K computer Reviewed

FUKAZAWA Keiichiro, Takeshi Nanri, Takayuki Umeda

Communications in Computer and Information Science 2013.12

Language：English

DOI： 10.1007/978-3-642-45037-2_61
Implementation of Neighbor Communication Algorithm Using Multi-NICs Effectively by Extended RDMA Interface Reviewed

Yoshiyuki Morie, Takeshi Nanri

SC13 Technical Posters 1 - 2 2013.11

Language：Others

Task Allocation Technique for Avoiding Contentions on Multi-dimensional Mesh/Torus (多次元メッシュ/トーラスにおける通信衝突を考慮したタスク配置最適化技術) Reviewed

森江善之, 南里豪志

情報処理学会 6 ( 3 ) 12 - 21 2013.9

Language：Japanese Publishing type：Research paper (scientific journal)
多次元メッシュ/トーラスにおける通信衝突を考慮したタスク配置最適化技術 Reviewed

森江善之, 南里豪志

情報処理学会論文誌トランザクションコンピューティングシステム(Web) 6 ( 3 ) 12-21 (WEB ONLY) 2013.9

Language：Japanese Publishing type：Research paper (scientific journal)

Task Allocation Technique for Avoiding Contentions on Multi-dimensional Mesh/Torus
A Neighbor Communication Algorithm with Making an Effective Use of NICs on Multidimensional-Mesh/torus Reviewed

Yoshiyuki Morie, Takeshi Nanri

International Conference on Simulation Technology (JSST2013) JSST2013 1 - 2 2013.9

Language：Others

Development of a CUDA Implementation of the 3D FDTD Method Reviewed International journal

Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO

IEEE Antennas and Propagation Magazine 54 ( 5 ) 186 - 195 2012.10

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1109/MAP.2012.6348145
Implementation and Evaluation of MPI_Allreduce on the K computer (MPI_Allreduceの「京」上での実装と評価) Reviewed

松本幸，安達知也，住元真司，曽我武史，南里豪志，宇野篤也，黒川原佳，庄司文由，横川三津夫

情報処理学会 ACS論文誌 ( 40 ) 2012.9

Language：Japanese Publishing type：Research paper (scientific journal)
Task Allocation Optimization for Neighboring Communication on Fat Tree Reviewed

Yoshiyuki Morie, Takeshi Nanri

4th IEEE International Conference on High Performance Computing and Communication 9th IEEE International Conference on Embedded Software and Systems, HPCC-ICESS 2012 1219 - 1225 2012.1

Language：Others

DOI： 10.1109/HPCC.2012.179
A Method for Predicting a Penalty of Contentions by Considering Priorities of Routing among Packets on Direct Interconnection Network Reviewed

Yoshiyuki Morie, Takeshi Nanri, Ryutaro Susukita

2011 Fourth International Joint Conference on Computational Sciences and Optimization 263 - 267 2011.4

Language：Others

DOI： 10.1109/CSO.2011.35
Task Allocation Method for Avoiding Contentions by the Information of Concurrent Communication Reviewed

Yoshiyuki Morie, Takeshi Nanri, Motoyoshi Kurokawa

The Tenth IASTED International Conference on Parallel and Distributed Computing and Networks 62 - 69 2011.2

Language：Others

DOI： 10.2316/P.2011.719-025
負荷バランスの動的最適化によるMPIブロードキャスト性能改善 Reviewed

曽我武史, 栗原康志, 南里豪志, 黒川原佳, 村上和彰

情報処理学会論文誌　コンピュータシステム 2008.12

Language：Japanese Publishing type：Research paper (scientific journal)

Dynamic Optimization of Load Balance in MPI Broadcast
Performance Models for MPI Collective Communications with Network Contention Reviewed

Hyacinthe Nzigou Mamadou, Takeshi Nanri and Kazuaki Murakami

IEICE Transactions on Communications 2008.5

Language：English Publishing type：Research paper (scientific journal)
A Study on Task Allocation Optimization for Reducing Contention (衝突削減のためのタスク配置最適化に関する研究) Reviewed

森江善之, 末安直樹, 松本透, 南里豪志, 石畑宏明, 井上弘士, 村上和彰

次世代スーパーコンピューティング・シンポジウム2007 2007 2007.10

Language：Others
通信タイミングを考慮した衝突削減のためのMPIランク配置最適化技術 Reviewed

森江善之, 末安直樹, 松本透, 南里豪志, 石畑宏明, 井上弘士, 村上和彰

情報処理学会論文誌 48 ( SIG13(ACS19) ) 192 - 202 2007.8

Language：Japanese Publishing type：Research paper (scientific journal)

Optimization of MPI Rank Allocation Considering Communication Timing for Reducing Contention

Presentations

Prototype of a Message Queuing System Using RDMA on Non-Volatile Memory Installed in DIMM Slots (DIMMスロット装着型不揮発性メモリ上のRDMAによるメッセージキューイングシステムの試作)

@南里豪志、@大江和一、@吉田英司、@大辻弘貴、@林英里香

大学ICT推進協議会2020年度年次大会 2020.12

Event date： 2020.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：Online Country：Japan
Implementation of Task Scheduler for Quantum-Classic Hybrid Environments

Takumi Tsuda, Taizo Kobayashi, Kin'ya Takahashi, @Takeshi Nanri

第198回ハイパフォーマンスコンピューティング・第14回量子ソフトウェア合同研究発表会 2025.3

Event date： 2025.3

Language：Japanese Presentation type：Oral presentation (general)

Venue：Sapporo
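Preliminary Evaluation Toward Performance Modeling of High-Throughput Asynchronous Collective Communication (高スループット非同期集団通信の性能モデル化に向けた予備評価)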

Yoshiyuki Morie, Yasutaka Wada, Ryohei Kobayashi, Ryuichi Sakamoto, @Takeshi Nanri

第198回ハイパフォーマンスコンピューティング・第14回量子ソフトウェア合同研究発表会 2025.3

Event date： 2025.3

Language：Japanese Presentation type：Oral presentation (general)

Venue：Sapporo
Implementation of a Benchmark That Enables Comparison of Overlapping Effects among Different Non-Blocking Collective Implementations

Takeru Narumi, @Takeshi Nanri

第198回ハイパフォーマンスコンピューティング・第14回量子ソフトウェア合同研究発表会 2025.3

Event date： 2025.3

Language：Japanese Presentation type：Oral presentation (general)

Venue：Sapporo
Implementation and Performance Evaluation of Discontinuous Data Transfer in Halo Communication with Tofu Interconnect

Rennma Arisako, Takeshi Nanri

2024年並列／分散／協調処理に関するサマー・ワークショップ 2024.8

Event date： 2024.8

Language：Japanese Presentation type：Oral presentation (general)

Venue：Tokushima
Implementation of Coupled Numerical Analysis of Magnetospheric Dynamics and Spacecraft Charging Phenomena via Code-To-Code Adapter (CoToCoA) Framework International conference

Y. Miyake, Y. Sunada, Y. Tanaka, K. Nakazawa, @T. Nanri, K. Fukazawa and Y. Katoh

ICCS 2023 2023.6

Event date： 2023.6

Language：English Presentation type：Oral presentation (general)

Venue：Prague Country：Czech Republic
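A Survey of Mutually Complementary Usage of a Hybrid Computing Environment Combining the Kyushu University Supercomputer and AWS Cloud Services (九州大学スーパーコンピュータとAWSクラウドサービスによるハイブリッド計算環境の相互補完的利用方法に関する調査)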

@南里豪志, 松山和広, 田代皓嗣, 原田浩睦

大学ICT推進協議会 2022年度年次大会 2022.12

Event date： 2022.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：Sendai International Center Country：Japan
Cross-reference simulation by Code-To-Code Adapter (CoToCoA) library for the study of multi-scale physics in planetary magnetospheres International conference

Yuto Katoh, Keiichiro Fukazawa, @Takeshi Nanri, Yohei Miyake

2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW) 2021.12

Event date： 2021.12

Language：English Presentation type：Oral presentation (general)

Country：Japan
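Challenges in Improving MPI Communication Performance of Practical Applications Using SHARP, a Key Switch Technology, and Expectations for Future Switch Technologies (実用アプリケーションのスイッチのキーテクノロジーであるSHARPを使用したMPI通信パフォーマンス向上の挑戦と、将来のスイッチテクノロジーへの期待) Invited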

南里豪志

GPU TECHNOLOGY CONFERENCE 2020.10

Event date： 2020.10

Language：Japanese Presentation type：Oral presentation (general)

Venue：Online Country：Japan
Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures

Kenji Ono, Toshihiro Kato, Satoshi Ohshima, Takeshi Nanri

International Conference on High Performance Computing in Asia-Pacific Region 2019.12

Event date： 2020.1

Language：English

Venue：Fukuoka Country：Japan
Application of cross-reference framework CoToCoA to Macro- and micro-scale simulations of planetary magnetospheres

Keiichiro Fukazawa, Yuto Katoh, Takeshi Nanri, Yohei Miyake

7th International Symposium on Computing and Networking Workshops, CANDARW 2019 2019.11

Event date： 2019.11

Language：English

Venue：Nagasaki Country：Japan

In this study, we introduce the Code-to-Code Adapter (CoToCoA) library to couple the magnetohydrodynamic (MHD) simulation and the Electron Hybrid (EH) simulation of planetary magnetospheres. CoToCoA has been newly developed to connect different codes easily. Its concept is that each code is modified as little as possible apart from adding data transfer functions, and that nothing needs to be known about the referenced code apart from its data format. With CoToCoA, we have been developing a cross-reference simulation of the macro (MHD) and micro (EH) scales in the magnetosphere. We then evaluated the performance of the cross-reference simulation using CoToCoA on a massively parallel computer system.
Hybrid Storage System to Achieve Efficient Use of Fast Memory Area

Kazuichi Oe, Takeshi Nanri

7th International Symposium on Computing and Networking, CANDAR 2019 2019.11

Event date： 2019.11

Language：English

Venue：Nagasaki Country：Japan

Hybrid storage techniques are useful methods to improve the cost performance for input-output (IO) intensive workloads. These techniques choose areas of concentrated IO accesses and migrate them to an upper tier to extract as much performance as possible through greater use of upper tier areas. Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system situated between non-volatile memories (NVMs) and solid-state drives (SSDs). ATSMF aims to reduce the average response time for IO accesses by migrating areas of concentrated IO access from an SSD to an NVM. When a concentrated IO access finishes, the system migrates these areas from the NVM back to the SSD. Unfortunately, the published ATSMF implementation temporarily consumes much NVM capacity upon migrating concentrated IO access areas to NVM, because its algorithm executes NVM migration with high priority. As a result, it often delays evicting areas in which IO concentrations have ended to the SSD. Therefore, to reduce the consumption of NVM while maintaining the average response time, we developed new techniques for making ATSMF more practical. The first is a queue handling technique based on the number of IO accesses for NVM migration and eviction. The second is an eviction method that selects only write-accessed partial regions in finished areas. The third is a technique for variable eviction timing to balance the NVM consumption and average response time. Experimental results indicate that the average response times of the proposed ATSMF are almost the same as those of the published ATSMF, while the NVM consumption is drastically lower.
Performance improvement of high-speed file transfer over JHPCN

Praphan Pavarangkoon, Ken T. Murata, Kazunori Yamamoto, Kazuya Muranaga, Takamichi Mizuhara, Keiichiro Fukazawa, Ryusuke Egawa, Takahiro Katagiri, Masao Ogino, Takeshi Nanri

17th IEEE International Conference on Dependable, Autonomic and Secure Computing, IEEE 17th International Conference on Pervasive Intelligence and Computing, IEEE 5th International Conference on Cloud and Big Data Computing, 4th Cyber Science and Technology Congress, DASC-PiCom-CBDCom-CyberSciTech 2019 2019.8

Event date： 2019.8

Language：English

Venue：Fukuoka Country：Japan

This paper proposes a novel file transfer tool to improve file transfer performance over Japan high performance computing and networking (JHPCN). We first develop a high-performance and flexible protocol (HpFP) for inter-datacenter transport networks. The original HpFP was designed for specific networks and puts more emphasis on latency and packet-loss tolerance than on fairness and friendliness, while an enhanced HpFP is more suitable for real network environments. Then, based on the enhanced HpFP, we implement a file transfer tool called high-performance copy (HCP). The performance of our file transfer tool is evaluated between datacenters of JHPCN using real datasets collected from supercomputer resources. The results show that HCP achieves higher throughput than a traditional tool for file transfer over JHPCN.
Non-volatile memory driver for applying automated tiered storage with fast memory and slow flash storage

Kazuichi Oe, Takeshi Nanri

6th International Symposium on Computing and Networking Workshops, CANDARW 2018 2018.12

Event date： 2018.11

Language：English

Venue：Takayama Country：Japan

Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system located between non-volatile memories (NVMs) and solid-state drives (SSDs). ATSMF aims to reduce the average response time for input-output (IO) accesses by migrating concentrated IO access areas from SSD to NVM. However, the current ATSMF implementation cannot reduce the average response time sufficiently because of the bottleneck caused by the Linux brd driver, which is used as the NVM access driver. The response time of the brd driver is more than ten times longer than the memory access time. To reduce the average response time sufficiently, we developed a block-level driver for NVM called a 'two-mode (2M) memory driver.' The 2M memory driver has both a map IO access mode and a direct IO access mode to reduce the response time while maintaining compatibility with the Linux device-mapper framework. The direct IO access mode has a drastically lower response time than the Linux brd driver because the ATSMF driver can execute the IO access function of the 2M memory driver directly. Experimental results also indicate that ATSMF using the 2M memory driver reduces the IO access response time to less than that of ATSMF using the Linux brd driver in most cases.
Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets

Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji

International Conference on High Performance Computing, ISC High Performance 2018 2018.1

Event date： 2018.6

Language：English

Venue：Frankfurt Country：Germany

This paper presents an in situ framework focused on time-varying simulations, which uses a novel temporal buffer for storing simulation results sampled at user-defined intervals. This framework has been designed to provide flexible data processing and visualization capabilities in modern HPC operational environments composed of powerful front-end systems, for pre- and post-processing purposes, along with traditional back-end HPC systems. The temporal buffer is implemented using the functionalities provided by the Open Address Space (OpAS) library, which enables asynchronous one-sided communication from outside processes to any exposed memory region on the simulator side. This buffer can store time-varying simulation results, and can be processed via in situ approaches with different proximities. We present a prototype of our framework and the code integration process with a target simulation code. The proposed in situ framework utilizes separate files to describe the initialization and execution codes, which are in the form of Python scripts. This framework also enables the runtime modification of these Python-based files, thus providing greater flexibility to the users, not only for data processing, such as visualization and analysis, but also for simulation steering.
Design of an In Transit Framework with Staging Buffer for Flexible Data Processing and Visualization of Time-Varying Data

Kenji Ono, Jorji Nonaka, Yoshiyuki Morie, Takeshi Nanri, Tomohiro Kawanabe

ISC WORKSHOP ON IN SITU VISUALIZATION 2018 2018.6

Event date： 2018.6

Language：English

Venue：Frankfurt Country：Germany
Proposal of Interface for Runtime Memory Manipulation of Applications via PGAS-based Communication Library Invited International conference

Takeshi Nanri

Workshop on PGAS programming models: Experiences and Implementations (PGAS-EI) 2018.1

Event date： 2018.1

Language：English Presentation type：Oral presentation (invited, special)

Venue：Tokyo Country：Japan
Automated Tiered Storage System Consisting of Memory and Flash Storage to Improve Response Time with Input-Output (IO) Concentration Workloads

Kazuichi Oe, Mitsuru Sato, Takeshi Nanri

5th International Symposium on Computing and Networking, CANDAR 2017 2018.4

Event date： 2017.11

Language：English

Venue：Aomori Country：Japan

The response time of solid-state drives (SSDs) has fallen dramatically with the spread of non-volatile memory express (NVMe) devices. These devices have response times of less than 100 microseconds on average. The response time of all-flash-array systems has also been drastically reduced through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads were found to contain many input-output (IO) concentrations. We define IO concentration in a declarative style. IO concentrations are aggregations of IO accesses. They appear in narrow regions of the storage volume and continue for periods of up to about an hour. These narrow regions occupy a few percent of the logical unit number capacity, include most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response time of these workloads, we developed an automated tiered storage system called 'automated tiered storage with fast memory and slow flash storage' (ATSMF). The memory component of ATSMF is a memory with a non-volatile feature. The system predicts the remaining duration of an IO concentration, calculates the response-time increase during migration and the response-time decrease after migration, and migrates the IO concentration if the response-time decrease after migration surpasses the response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage alone and that its memory access ratio is more than 50%.
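The migration rule stated in the abstract above (migrate only when the response-time decrease after migration exceeds the response-time increase during migration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ATSMF implementation: the cost model, the class, and every field name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConcentrationEstimate:
    """Hypothetical inputs for one IO-concentration area (illustrative names)."""
    remaining_seconds: float     # predicted remaining duration of the concentration
    migration_seconds: float     # time needed to copy the area to memory
    iops: float                  # IO accesses per second hitting the area
    flash_latency_us: float      # average response time on flash
    memory_latency_us: float     # average response time on memory
    migration_penalty_us: float  # assumed extra latency per access while migrating


def should_migrate(e: ConcentrationEstimate) -> bool:
    """Return True if the expected saving after migration exceeds its cost."""
    # Time during which accesses are actually served from memory.
    time_after = max(e.remaining_seconds - e.migration_seconds, 0.0)
    # Total response-time increase while the migration is in progress.
    increase = e.iops * e.migration_seconds * e.migration_penalty_us
    # Total response-time decrease once the area resides in memory.
    decrease = e.iops * time_after * (e.flash_latency_us - e.memory_latency_us)
    return decrease > increase
```

Under this assumed model, a concentration predicted to end before the copy completes is never migrated, because no accesses would benefit from the faster tier.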
Analysis of the quality of academic papers by the words in abstracts

Tetsuya Nakatoh, Kenta Nagatani, Toshiro Minami, Sachio Hirokawa, Takeshi Nanri, Miho Funamori

Thematic track on Human Interface and the Management of Information, held as part of the 19th International Conference on Human–Computer Interaction, HCI International 2017 2017.1

Event date： 2017.7

Language：English

Venue：Vancouver Country：Canada

The investigation of related research is very important for research activities. However, it is not easy to choose appropriate and important academic papers from among the huge number of candidates. A researcher searches by combining keywords and then selects papers to check using an index that can be evaluated. The citation count is commonly used as this index, but it provides no information about recently published papers. This research attempted to identify good papers using only the words included in the abstract. We constructed a classifier by machine learning and evaluated it using cross-validation. As a result, it was found that a certain degree of discrimination is possible.
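The cross-validation step mentioned in the abstract above can be sketched generically. The abstract does not specify the classifier, features, or number of folds, so the `fit` and `predict` callables and the helper names below are placeholders, not the authors' method.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(samples, labels, k, fit, predict):
    """Return the mean accuracy over k folds for the supplied callables."""
    scores = []
    for train, test in k_fold_indices(len(samples), k):
        # Train on k-1 folds, then score on the held-out fold.
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Each sample is used for testing exactly once, so the averaged accuracy estimates how well a classifier trained on abstracts would generalize to unseen papers.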
Parallel Application Experiences Using Advanced Communication Primitives International conference

Shinji Sumimoto, Yuichiro Ajima, Takafumi Nose, Kazushige Saga, Naoyuki Shida, Takeshi Nanri

25th Euromicro International Conference on Parallel, Distributed and network-based Processing 2017.3

Event date： 2017.3

Language：English Presentation type：Oral presentation (general)

Country：Russian Federation
Feasibility study for building hybrid storage system consisting of non-volatile DIMM and SSD

Kazuichi Oe, Takeshi Nanri, Koji Okamura

4th International Symposium on Computing and Networking, CANDAR 2016 2017.1

Event date： 2016.11

Language：English

Venue：Hiroshima Country：Japan

Various vendors are developing byte-accessible Nonvolatile Dual-Inline Memory Modules (NVDIMMs). The performance of the NVDIMM drastically surpasses that of the Solid State Drive (SSD), which is connected by PCI Express. However, the cost of the NVDIMM is much higher than that of the SSD. Therefore, a hybrid storage system combining the NVDIMM and SSD is an effective technique for improving cost-performance. If a system uses less NVDIMM capacity while maintaining performance, its cost-performance should improve. Our previous work involves on-the-fly automated storage tiering (OTF-AST). OTF-AST is a hybrid storage system consisting of an SSD and HDD. It aims to reduce the average response time of IO accesses by migrating only the IO concentration area to the SSD when IO concentration happens. Therefore, we construct OTF-AST with both the DIMM and SSD and evaluate it in order to understand how to build a cost-effective hybrid storage system with these devices. We use a DIMM instead of a byte-accessible NVDIMM, which is difficult to obtain. As a result, we found that the original OTF-AST is suitable for a hybrid storage system consisting of the DIMM and SSD. Moreover, we can improve the performance of OTF-AST if we replace its migration algorithm with a more positive migration algorithm. This is because the IO access response time barely increases while data is migrated between the DIMM and SSD. We will build a more positive migration algorithm in the near future.
Effect of Overlapping Halo Exchange with One-Sided Communication International conference

Takeshi Nanri, Keiichiro Fukazawa

5th JSST Annual Conference International Conference on Simulation Technology 2016.10

Event date： 2016.10

Language：English Presentation type：Oral presentation (general)

Venue：Kyoto Country：Japan
Development of A Memory Efficient Communication Method for Connecting MPI Programs by using ACP Library International conference

Hiroaki Honda, Yoshiyuki Morie, Takeshi Nanri

5th JSST Annual Conference International Conference on Simulation Technology 2016.10

Event date： 2016.10

Language：English Presentation type：Oral presentation (general)

Venue：Kyoto Country：Japan
Efficient communications of particle data in particle-based simulations International conference

Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri

5th JSST Annual Conference International Conference on Simulation Technology 2016.10

Event date： 2016.10

Language：English Presentation type：Oral presentation (general)

Venue：Kyoto Country：Japan
Performance Evaluation of MHD Simulation Code with X86 CPUs and Manycore Systems International conference

Keiichiro Fukazawa, Takayuki Umeda, Takeshi Nanri

5th JSST Annual Conference International Conference on Simulation Technology 2016.10

Event date： 2016.10

Language：English Presentation type：Oral presentation (general)

Venue：Kyoto Country：Japan
Effective calculation with halo communication using halo functions

Keiichiro Fukazawa, Yoshiyuki Morie, Toshiya Takami, Takeshi Nanri, Takeshi Soga

23rd European MPI Users' Group Meeting, EuroMPI 2016 2016.9

Event date： 2016.9

Language：English

Venue：Edinburgh Country：United Kingdom

Halo communication reduces parallel scalability. To overcome this issue, we previously introduced a "Halo thread" into our simulation code; however, this did not fundamentally solve the issue under strong scaling. In this study, we have developed Halo functions that perform the halo communication efficiently. With them, calculation and communication can be performed in a pipeline, and we obtained good performance.
The Design of Advanced Communication to Reduce Memory Usage for Exa-scale Systems International conference

Shinji Sumimoto, Yuichiro Ajima, Kazushige Saga, Takafumi Nose, Naoyuki Shida, Takeshi Nanri

12th International Meeting On High Performance Computing for Computational Science 2016.9

Event date： 2016.9

Language：English Presentation type：Oral presentation (general)

Country：Portugal
Improvement of Eisenstat-SSOR preconditioning using tolerance value International conference

Seiji FUJINO, Takeshi Nanri

5th IMA Conference on Numerical Linear Algebra and Optimization 2016.9

Event date： 2016.9

Language：English Presentation type：Oral presentation (general)

Venue：Birmingham Country：United Kingdom
Effective Calculation with Halo communication using Halo Functions International conference

Keiichiro Fukazawa, Toshiya Takami, Takeshi Soga, Yoshiyuki Morie, Takeshi Nanri

23rd European MPI Users' Group Meeting 2016.9

Event date： 2016.9

Language：English Presentation type：Oral presentation (general)

Venue：Edinburgh Country：United Kingdom
Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism International conference

Takeshi Nanri

4th Annual MVAPICH Users Group Meeting 2016.8

Event date： 2016.8

Language：English Presentation type：Oral presentation (general)

Venue：Columbus, Ohio Country：United States
NSIM-ACE: An Interconnection Network Simulator for Evaluating Remote Direct Memory Access International conference

Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri

International Conference on Simulation and Modeling Methodologies, Technologies and Applications 2016.7

Event date： 2016.7

Language：English Presentation type：Oral presentation (general)

Country：Portugal
The design of advanced communication to reduce memory usage for exa-scale systems

Shinji Sumimoto, Yuichiro Ajima, Kazushige Saga, Takafumi Nose, Naoyuki Shida, Takeshi Nanri

12th International Conference on High Performance Computing for Computational Science, VECPAR 2016 2017.1

Event date： 2016.6

Language：English

Venue：Porto Country：Portugal

Current MPI (Message Passing Interface) communication libraries require memory in proportion to the number of processes and therefore cannot be used for exa-scale systems. This paper proposes a global memory based communication design to reduce memory usage for exa-scale communication. To realize exa-scale communication, we propose true global memory based communication primitives called Advanced Communication Primitives (ACPs). ACPs provide global addressing, remote atomic memory operations on the global memory, an RDMA (Remote Direct Memory Access) based remote memory copy operation, a global heap allocator, and global data libraries. ACPs differ from other communication libraries in that they are global memory based, so housekeeping memory can be distributed to other processes and programmers can explicitly consider memory usage. The preliminary result of memory usage by ACPs is 70 MB on one million processes.
Memory Efficient One-Sided Communication Library "ACP" in Global Memory on Raspberry Pi 2

Yoshiyuki Morie, Hiroaki Honda, Takeshi Nanri, Taizo Kobayashi, Hidetomo Shibamura, Ryutaro Susukita, Yuichiro Ajima

36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016 2016.8

Event date： 2016.6

Language：English

Venue：Nara Country：Japan

Previously, communications in parallel programs for High Performance Computing (HPC) and Distributed Computing (DC) were mostly written with two-sided communication interfaces that are based on a pair of operations, Send and Receive. Since such interfaces require explicit synchronization between both sides of the communication, techniques for communication optimization such as overlapping cannot be described efficiently in many cases. On the other hand, the one-sided communication interface is becoming important as a method to describe asynchronous communications that enable highly overlapped communication with computation. As one such interface, Advanced Communication Primitives (ACP) is introduced in this demonstration. ACP is a portable interface that supports UDP, IBverbs of InfiniBand, and the Tofu library of the K Computer. In addition, it is designed to be memory efficient. For example, with 10 thousand processes, the memory consumption of ACP over UDP is estimated to be less than 1 MB. Since the number of computational elements is increasing more rapidly than the amount of available memory, this memory efficiency is becoming one of the keys for parallel programs in HPC and DC. To show this characteristic, we run the ACP library on Raspberry Pi 2 and examine its performance and memory consumption.
Evaluation of On-Demand Message-Passing Module over RDMA Network

Takeshi Nanri

ACSI2016 2016.1

Event date： 2016.1

Language：English Presentation type：Oral presentation (general)

Venue：Fukuoka Country：Japan
Analysis of Storage Workloads of Input-Output Access Locality and Designing of Hybrid Storage System International conference

Kazuichi Oe, Takeshi Nanri, KOJI OKAMURA

1st International Conference on Enterprise Architecture and Information Systems 2016.1

Event date： 2016.1

Language：English Presentation type：Oral presentation (general)

Country：Japan
Performance Evaluation of RDMA Communication Patterns by Means of Simulations International conference

Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri, Hidetomo Shibamura

2015 Joint International Mechanical, Electronic and Information Technology Conference 2015.12

Event date： 2015.12

Language：English Presentation type：Oral presentation (general)

Venue：Chongqing Country：China
On-The-Fly Automated Storage Tiering with Caching and both Proactive and Observational Migration International conference

Kazuichi Oe, Takeshi Nanri, KOJI OKAMURA

Workshop on Computer Systems and Architectures (CSA'15) 2015.12

Event date： 2015.12

Language：English Presentation type：Oral presentation (general)

Venue：Sapporo Country：Japan
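Proposal of a Neighbor Communication Algorithm That Effectively Uses Multiple Communication Devices on Direct Networks (直接網において複数の通信デバイスを有効に使用する隣接通信アルゴリズムの提案)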

森江善之, 南里豪志

2015 ハイパフォーマンスコンピューティングと計算科学シンポジウム 2015.5

Event date： 2015.5

Language：Japanese Presentation type：Oral presentation (general)

Country：Japan
Performance and memory usage evaluations for channel interface of Advanced Communication Primitives library International conference

Hiroaki Honda, Takeshi Nanri, Yoshiyuki Morie

1st Pan-American Congress on Computational Mechanics (PANACM 2015) 2015.4

Event date： 2015.4

Language：English Presentation type：Oral presentation (general)

Venue：Buenos Aires Country：Argentina
Channel Interface: a Primitive Model for Memory Efficient Communication International conference

Takeshi Nanri

23rd Euromicro International Conference on Parallel, Distributed and network-based Processing 2015.3

Event date： 2015.3

Language：English Presentation type：Oral presentation (general)

Venue：Turku Country：Finland
Design and Implementation of Channel Interface as a Memory Efficient Communication Model International conference

Takeshi Nanri

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1

Event date： 2015.1

Language：English Presentation type：Oral presentation (general)

Venue：Tsukuba Country：Japan
Proposal of HINT Interface for Runtime Tuning of Communication Links International conference

Takeshi Nanri

22nd Euromicro International Conference on Parallel, Distributed and network-based Processing 2014.2

Event date： 2014.2

Language：English Presentation type：Oral presentation (general)

Venue：Turin Country：Italy
Collective Communication Algorithm Selection Combining Performance Prediction and Measurement (性能予測と実測を併用した集団通信アルゴリズム選択)

児玉大器, 南里豪志

今後のHPC（基盤技術と応用）に関するワークショップ 2013.12

Event date： 2013.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：Nagasaki Country：Japan
Evaluation of an Interface for Providing Optimization Information in MPI (MPI における最適化情報提供のためのインターフェイスに関する評価)

南里豪志

今後のHPC（基盤技術と応用）に関するワークショップ 2013.12

Event date： 2013.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：長崎市 Country：Japan
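On Dynamic Optimization Techniques for Communication Libraries Using Hint Information from Programs

杉山裕宣, 南里豪志

Workshop on the Future of HPC (Fundamental Technologies and Applications) 2013.12

Event date： 2013.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：Nagasaki Country：Japan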
Performance Study of Non-blocking Collective Communication Implementations Toward Adaptive Selection International conference

Tsuyoshi Okuma, Takeshi Nanri

Networking, Computing, Systems and Software 2013.12

Event date： 2013.12

Language：English Presentation type：Oral presentation (general)

Venue：Matsuyama Country：Japan
Topology Aware Performance Prediction of Collective Communication Algorithms on Multi-Dimensional Mesh/Torus International conference

Hironobu Sugiyama, Takeshi Nanri

Networking, Computing, Systems and Software 2013.12

Event date： 2013.12

Language：English Presentation type：Oral presentation (general)

Venue：Matsuyama Country：Japan
Proposal of a Hint API to Support Automatic Tuning of Communication Libraries

南里豪志

141st High Performance Computing Research Meeting 2013.10

Event date： 2013.10

Language：Japanese Presentation type：Oral presentation (general)

Venue：Naha Country：Japan
A neighbor communication algorithm with making an effective use of NICs on multidimensional-mesh/torus International conference

Yoshiyuki Morie, Takeshi Nanri

International Conference on Simulation Technology 2013.9

Event date： 2013.9

Language：English Presentation type：Oral presentation (general)

Venue：Tokyo Country：Japan
What Communication Library Can do with a Little Hint from Programmers? International conference

Takeshi Nanri

MVAPICH User Group Meeting 2013.8

Event date： 2013.8

Language：English Presentation type：Oral presentation (general)

Venue：Columbus Country：United States
A Cost-Efficient Approach for Automatic Algorithm Selection of Collective Communications Invited International conference

Takeshi Nanri, Hironobu Sugiyama, FUKAZAWA Keiichiro

SIAM Conference on Computational Science and Engineering 2013.3

Event date： 2013.2 - 2013.3

Language：English Presentation type：Oral presentation (general)

Venue：Boston Country：United States
Proposal of a Method for Selecting Collective Communication Algorithms According to Process Placement on Multi-Dimensional Mesh/Torus

南里豪志, 杉山裕宣, 森江善之

138th High Performance Computing Research Meeting 2013.2

Event date： 2013.2

Language：Japanese Presentation type：Oral presentation (general)

Venue：Awara Country：Japan
Task Allocation Optimization Considering Communication Contention on Multi-Dimensional Mesh/Torus

森江善之, 南里豪志

High Performance Computing and Computational Science Symposium 2013.1

Event date： 2013.1

Language：Japanese Presentation type：Oral presentation (general)

Venue：Tokyo Country：Japan
Performance Prediction Technology for Collective Communication Algorithm on Multi-Dimensional Mesh/Torus International conference

Hironobu Sugiyama, Takeshi Nanri

International workshop on HPC, Krylov Subspace method and its application 2013.1

Event date： 2013.1

Language：English Presentation type：Oral presentation (general)

Venue：Beppu Country：Japan
Evaluation of Implementation Methods for Non-Blocking Collective Communications in Overlapping Communication and Computation International conference

Tsuyoshi Okuma, Takeshi Nanri

International workshop on HPC, Krylov Subspace method and its application 2013.1

Event date： 2013.1

Language：English Presentation type：Oral presentation (general)

Venue：Beppu Country：Japan
Task Allocation Method for Avoiding Contentions by the Information of Concurrent Communication International conference

Yoshiyuki Morie, Takeshi Nanri

International workshop on HPC, Krylov Subspace method and its application 2013.1

Event date： 2013.1

Language：English Presentation type：Oral presentation (general)

Venue：Beppu Country：Japan
Introduction of ACE(Advanced Communication library for Exa) Project Invited International conference

Takeshi Nanri

International workshop on HPC, Krylov Subspace method and its application 2013.1

Event date： 2013.1

Language：English Presentation type：Oral presentation (general)

Venue：Beppu Country：Japan
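Impact of Communication Timing Prediction Methods on Task Allocation Optimization for Reducing Communication Contention

森江善之, 南里豪志

Joint Meeting of the 194th Computer Architecture and 137th High Performance Computing Research Groups (HOKKE-20) 2012.12

Event date： 2012.12

Language：Japanese Presentation type：Oral presentation (general)

Venue：Sapporo Country：Japan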
An Alternative Domain Decomposition Technique for CUDA-based 3D FDTD Methods International conference

Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO

9th European Radar Conference 2012.11

Event date： 2012.10 - 2012.11

Language：English Presentation type：Oral presentation (general)

Venue：Amsterdam Country：Netherlands
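Performance Analysis of Collective Communication Algorithms by Process Placement Shape on the Tofu Network

南里豪志

High Performance Computing Research Presentation Meeting 2012.10

Event date： 2012.10

Language：Japanese Presentation type：Oral presentation (general)

Venue：Naha Country：Japan

As supercomputers grow in scale, more systems adopt low-cost multi-dimensional mesh/torus topologies for the inter-node interconnect. On a multi-dimensional mesh/torus, performance varies greatly with the shape of the group of nodes on which processes are placed, even when the number of nodes is the same. This study targets the Tofu interconnect used in the K computer and its compatible system, the Fujitsu PRIMEHPC FX10, and measures how the shape of the process placement affects the performance of collective communication algorithms. Comparing the measured performance with the transfer wait time caused by communication contention, obtained with the Tofu interconnect performance analysis tool, showed that both vary with the process placement shape in almost the same way. These results show that performance estimation that accounts for the process placement shape is important when selecting collective communication algorithms.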
Performance Evaluation of Different Scalar Architectures (x86, SPARC64) Using a Magnetohydrodynamic Code

深沢圭一郎, 南里豪志, 高見利也

High Performance Computing Research Presentation Meeting 2012.10

Event date： 2012.10

Language：Japanese Presentation type：Oral presentation (general)

Venue：Naha Country：Japan
Impact of GPU Memory Access Patterns on FDTD International conference

Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO

IEEE Antennas and Propagation Society International Symposium (APSURSI) 2012.7

Event date： 2012.7

Language：English Presentation type：Oral presentation (general)

Venue：Chicago Country：United States
Efficient Runtime Algorithm Selection of Collective Communication with Topology-Based Performance Models International conference

Takeshi Nanri, Motoyoshi Kurokawa

International Conference on Parallel and Distributed Processing Techniques and Applications 2012.7

Event date： 2012.7

Language：English Presentation type：Oral presentation (general)

Venue：Las Vegas Country：United States
Effective Performance of Large-Scale MHD Simulation for Planetary Magnetosphere with Massively Parallel Computer International conference

FUKAZAWA Keiichiro, Takeshi Nanri

JSST2012 International Conference on Simulation Technology 2012.7

Event date： 2012.7

Language：English Presentation type：Oral presentation (general)

Venue：Kobe Country：Japan
Balancing Communication and Execution Technique for Parallelized Sparse Matrix-Vector Multiplication International conference

Seiji FUJINO, Takeshi Nanri, Kenichirou Kusaba

4th International Conference on Future Computational Technologies and Applications 2012.7

Event date： 2012.7

Language：English Presentation type：Oral presentation (general)

Venue：Nice Country：France
Task Allocation Optimization for Neighboring Communication on Fat Tree International conference

Yoshiyuki Morie, Takeshi Nanri

14th IEEE International Conference on High Performance Computing and Communication 2012.6

Event date： 2012.6

Language：English Presentation type：Oral presentation (general)

Venue：Liverpool Country：United Kingdom
Performance of Large Scale MHD Simulation of Global Planetary Magnetosphere with Massively Parallel Scalar Type Supercomputer Including Post Processing International conference

FUKAZAWA Keiichiro, Takeshi Nanri

14th IEEE International Conference on High Performance Computing and Communication 2012.6

Event date： 2012.6

Language：English Presentation type：Oral presentation (general)

Venue：Liverpool Country：United Kingdom
Implementation and Evaluation of MPI Allreduce on the K computer

松本幸，安達知也，住元真司，曽我武史，南里豪志，宇野篤也，黒川原佳，庄司文由，横川三津夫

Symposium on Advanced Computing Systems and Infrastructures (SACSIS2012) 2012.5

Event date： 2012.5

Presentation type：Oral presentation (general)

Venue：Kobe Country：Japan
Performance Optimization of the Parallel FMO Program OpenFMO

稲富雄一、眞木淳、高見利也、本田宏明、小林泰三、南里豪志、青柳睦、南一生

133rd High Performance Computing Research Meeting 2012.3

Event date： 2012.3

Presentation type：Oral presentation (general)

Venue：Kobe Country：Japan
Proposal of a Dynamic Selection Technique for Collective Communication Algorithms According to Rank Placement

南里豪志、黒川原佳

133rd High Performance Computing Research Meeting 2012.3

Event date： 2012.3

Presentation type：Oral presentation (general)

Country：Japan
Implementation Techniques for Scalable Communication Libraries

南里豪志

8th Workshop on Strategic Development of High Performance Computing Systems 2012.2

Event date： 2012.2

Language：Japanese Presentation type：Oral presentation (general)

Venue：Tokyo Country：Japan
Runtime Automatic Tuning Techniques in Communication Libraries Invited

南里豪志

3rd Symposium on the Current State and Applications of Automatic Tuning Technology 2011.12

Event date： 2011.12

Presentation type：Oral presentation (general)

Venue：The University of Tokyo Country：Japan
Implementation and Evaluation of MPI Allreduce on the K computer

松本幸，安達知也，田中稔，住元真司，曽我武史，南里豪志

19th Hokkaido Workshop on High Performance Computing and Architecture Evaluation 2011.11

Event date： 2011.11

Presentation type：Oral presentation (general)

Country：Japan
Effect of Dynamic Algorithm Selection of All-to-All Communication on Environments with Unstable Network Speed International conference

Takeshi Nanri and Motoyoshi Kurokawa

International Conference on High Performance Computing & Simulation 2011.7

Event date： 2011.7

Presentation type：Oral presentation (general)

Venue：Istanbul Country：Turkey
A Method for Predicting a Penalty of Contentions by Considering Priorities of Routing among Packets on Direct Interconnection Network International conference

Yoshiyuki Morie, Takeshi Nanri, Ryutaro Susukita and Koji Inoue

International Joint Conference on Computational Sciences and Optimization 2011 2011.4

Event date： 2011.4

Presentation type：Oral presentation (general)

Venue：Kunming Country：China
Task Allocation Method for Avoiding Contentions by the Information of Concurrent Communications International conference

Yoshiyuki Morie, Takeshi Nanri, and Motoyoshi Kurokawa

The Tenth IASTED International Conference on Parallel and Distributed Computing and Networks 2011.2

Event date： 2011.2

Presentation type：Oral presentation (general)

Venue：Innsbruck Country：Austria
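Dynamic Load Balancing for Parallel Sparse Matrix-Vector Multiplication Considering Communication and Computation Loads

草場健一郎，南里豪志，藤野清次

2010 Summer Workshop on Parallel, Distributed, and Cooperative Processing in Kanazawa 2010.8

Event date： 2010.8

Presentation type：Oral presentation (general)

Venue：Kanazawa Country：Japan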
Runtime Load-balancing Technique for Sparse Matrix-Vector Multiplication International conference

Kenichiro Kusaba, Takeshi Nanri and Seiji Fujino

International Workshop on Innovative Architecture 2010.3

Event date： 2010.3

Presentation type：Oral presentation (general)

Venue：Kona Country：United States
A Robust Dynamic Optimization for MPI Alltoall Operation International conference

Hyacinthe Nzigou Mamadou, Takeshi Nanri, and Kazuaki Murakami

18th International Heterogeneity in Computing Workshop 2009.5

Event date： 2009.5

Presentation type：Oral presentation (general)

Venue：Rome Country：Italy
Implementation and Performance Evaluation of the Parallelized CG Method with PAGME Preconditioning on Hierarchical Parallel Computers

馬場慎也，南里豪志，藤野清次，染原一仁

Conference on Computational Engineering and Science 2009.5

Event date： 2009.5

Presentation type：Oral presentation (general)

Venue：Tokyo Country：Japan
A Dynamic Solution for Efficient MPI Collective Communications International conference

Hyacinthe Nzigou Mamadou, Feng Long Gu, Vivien Oddou, Takeshi Nanri, Kazuaki Murakami

International Workshop on HPC and Grid Applications 2009.4

Event date： 2009.4

Presentation type：Oral presentation (general)

Venue：Sanya, Hainan Country：China
Profiling Technique for Dynamic Optimization According to Waiting Time International conference

Takeshi Soga, Takeshi Nanri, Motoyoshi Kurokawa and Kazuaki Murakami

HPC Asia 2009.3

Event date： 2009.3

Presentation type：Oral presentation (general)

Venue：Kaohsiung Country：Taiwan, Province of China
Dependence on loop distribution of performance in hybrid-parallel IDR(s) method International conference

Shinya Baba, Yusuke Onoue, Takeshi Nanri and Seiji Fujino

HPC Asia 2009.3

Event date： 2009.3

Presentation type：Oral presentation (general)

Venue：Kaohsiung Country：Taiwan, Province of China
Performance Analysis of the Parallelized CG Method with PAGME

馬場慎也, 南里豪志, 藤野清次, 染原一仁

IPSJ High Performance Computing Research Meeting 2008.12

Event date： 2008.12

Presentation type：Oral presentation (general)

Venue：Fukuoka Country：Japan
Evaluation of Dynamic Algorithm Selection with Performance Prediction Models on Alltoall Operation

南里豪志, Hyacinthe Nzigou Mamadou, Feng Long Gu, 村上和彰

IPSJ High Performance Computing Research Meeting 2008.12

Event date： 2008.12

Presentation type：Oral presentation (general)

Venue：Fukuoka Country：Japan
Dependence of the Computation Time of the Hybrid-Parallel IDR(s) Method on the Combination of Process and Thread Counts

馬場慎也、南里豪志、藤野清次

IPSJ High Performance Computing Research Meeting 2008.5

Event date： 2008.5

Presentation type：Oral presentation (general)

Venue：Tokyo Country：Japan
Effect of Reordering Internal Messages in MPI Broadcast According to the Load Imbalance International conference

Takeshi Soga, Takeshi Nanri, Motoyoshi Kurokawa and Kazuaki Murakami

IWIA '08 2008.1

Presentation type：Oral presentation (general)

Venue：Hilo Country：United States
Performance Analysis and Linear Optimization Modeling of All-to-all Collective Communication Algorithms International conference

Hyacinthe Nzigou Mamadou, Takeshi Nanri and Kazuaki Murakami

SBAC-PAD 2007 2007.10

Presentation type：Oral presentation (general)

Venue：Gramado Country：Brazil
Dynamic Optimization of Load Balance in MPI Broadcast International conference

Takeshi Soga, Kouji Kurihara, Takeshi Nanri, Motoyoshi Kurokawa and Kazuaki Murakami

Euro PVM/MPI 2007 2007.10

Venue：Paris Country：France
SMMH - A Parallel Heuristic for Combinatorial Optimization Problems International conference

Guilherme Domingues, Yoshiyuki Morie, Feng Long Gu, Takeshi Nanri and Kazuaki Murakami

International Conference on Computational Methods in Science and Engineering 2007 2007.9

Presentation type：Oral presentation (general)

Venue：Corfu Country：Greece
Investigating the Performance of Collective Communications on SMP Clusters: A Case for MPI_Allgather International conference

Feng Long Gu, Hyacinthe Nzigou Mamadou, Guilherme Domingues, Takeshi Nanri and Kazuaki Murakami

International Conference on Computational Methods in Science and Engineering 2007 2007.9

Presentation type：Oral presentation (general)

Venue：Corfu Country：Greece
Evaluation of the Performance of Parallel Sparse-Matrix Multiplication and the Effect of Dynamic Load-Balancing International conference

Takeshi Nanri, Takeshi Soga, Koji Kurihara, Feng Long Gu, Hiroaki Ishihata and Kazuaki Murakami

International Conference on Computational Methods in Science and Engineering 2007 2007.9

Presentation type：Oral presentation (general)

Venue：Corfu Country：Greece
A Study of All-to-all Collective Communication Algorithms on Modern High Performance System Architectures International conference

Hyacinthe Nzigou Mamadou, Feng Long Gu, Takeshi Nanri, Kazuaki Murakami

High Performance Computing International Conference (HPC Asia) 2007 2007.9

Presentation type：Oral presentation (general)

Venue：Seoul Country：Korea, Republic of
A Study on Dynamic Optimization of MPI Broadcast Communication Considering Load Imbalance

栗原康志，Hyacinthe Nzigou Mamadou，南里豪志，末安直樹，松本透，井上弘士，村上和彰

SWoPP2007 2007.8

Presentation type：Oral presentation (general)

Venue：Asahikawa Country：Japan
MPI Rank Placement Optimization for Contention Reduction Considering Communication Timing

森江善之, 末安直樹, 松本透, 南里豪志, 石畑宏明, 井上弘士, 村上和彰

Symposium on Advanced Computing Systems and Infrastructures (SACSIS2007) 2007.5

Presentation type：Oral presentation (general)

Venue：Tokyo Country：Japan
MPI Rank Placement Optimization Considering Communication Timing

森江善之, 末安直樹, 松本透, 南里豪志, 石畑宏明, 井上弘士, 村上和彰

HOKKE2007 2007.3

Presentation type：Oral presentation (general)

Venue：Sapporo Country：Japan
Collective Communication Costs Analysis over Gigabit Ethernet and InfiniBand International conference

Hyacinthe Nzigou Mamadou, Takeshi Nanri and Kazuaki Murakami

High Performance Computing - HiPC 2006 2006.12

Presentation type：Oral presentation (general)

Country：India
Implementation of GAMESS on Parallel Computers: TCP/IP versus MPI International conference

Feng Long Gu, Takeshi Nanri and Kazuaki Murakami

International Conference of Computational Methods in Sciences and Engineering 2006.10

Presentation type：Oral presentation (general)

Country：Greece
Performance Evaluation of MPI Alltoall Communication Algorithms Toward Larger-Scale Parallel Computers

南里豪志

10th Symposium of the Kan-Setouchi Applied Mathematics Research Group 2006.7

Presentation type：Oral presentation (general)

Venue：Okinawa Country：Japan
Performance comparison of vector-calculations between Itanium2 and other processors International conference

T. Nanri, Y. Watanabe, H. Sato

International Workshop on Innovative Architecture 2005.1

Presentation type：Oral presentation (general)

Venue：Hawaii Country：United States
Design and Implementation of an Adaptive Distributed Shared Memory System International conference

Takeshi Nanri, Hiroyuki Sato and Masaaki Shimasaki

International Conference of Parallel and Distributed Computing and Systems 2001.8

Venue：Anaheim Country：United States
Preliminary Investigation of Distributed Shared Memory System on a Cluster of High Performance Clusters International conference

Takeshi Nanri, Yoshitaka Watanabe, Hiroyuki Sato and Masaaki Shimasaki

European Congress on Computational Methods in Applied Sciences and Engineering 2000.9

Venue：Barcelona Country：Spain
Effects of Scheduling Attributes on Multithread-Based Software DSM System International conference

Takeshi Nanri, Hiroyuki Sato and Masaaki Shimasaki

Workshop on Scheduling Algorithms for Parallel/Distributed Computing 1999.7

Venue：Rhodes Country：Greece
Implementation of PVM-based Distributed Shared Memory System International conference

Takeshi Nanri, Hiroyuki Sato and Masaaki Shimasaki

International Conference on Parallel and Distributed Processing Techniques and Applications 1998.7

Venue：Las Vegas Country：United States
Investigation of the Communication Hiding Effect of Non-Blocking Collective Communication

Takeshi Nanri, Satoshi Ohshima, Kenji Ono

2017.12

Language：Japanese

Country：Other
Performance Evaluation of the Supercomputer System ITO

Satoshi Ohshima, Takeshi Nanri, Yoshitaka Watanabe, Hirofumi Amano, Kenji Ono

2017.12

Language：Japanese

Country：Other
Attribute-based quality classification of academic papers

Tetsuya Nakatoh, Sachio Hirokawa, Toshiro Minami, Takeshi Nanri, Miho Funamori

2017.11

Language：English

Country：Other

Investigating the relevant literature is very important for research activities. However, it is difficult to select the most appropriate and important academic papers from the enormous number of papers published annually. Researchers search paper databases by combining keywords, and then select papers to read using some evaluation measure—often, citation count. However, the citation count of recently published papers tends to be very small because citation count measures accumulated importance. This paper focuses on the possibility of classifying high-quality papers superficially using attributes such as publication year, publisher, and words in the abstract. To examine this idea, we construct classifiers by applying machine-learning algorithms and evaluate these classifiers using cross-validation. The results show that our approach effectively finds high-quality papers.


MISC

Preliminary Evaluation Toward Performance Modeling of High-Throughput Asynchronous Collective Communication

森江善之, 和田康孝, 小林諒平, 坂本龍一, 南里豪志

IPSJ SIG Technical Reports (Web) 2025 ( HPC-198 ) 2025
Performance Evaluation of the Supercomputer Genkai

大島聡史, 南里豪志, 美添一樹

IPSJ SIG Technical Reports (Web) 2024 ( HPC-196 ) 2024
Implementation and Performance Evaluation of Discontinuous Data Transfer in Halo Communication with Tofu Interconnect

有迫廉真, 南里豪志

IPSJ SIG Technical Reports (Web) 2024 ( HPC-195 ) 2024
Introduction of multi-site sharing experiment of ultra-high-resolution meteorological satellite images using “JHPCN Wide Area Distributed Cloud” and tiled displays

川鍋友宏, 村田健史, 山本和憲, 村永和哉, 樋口篤志, 豊嶋紘一, 深沢圭一郎, 小野謙二, 南里豪志

Japan Geoscience Union Meeting Abstracts (Web) 2022 2022
Prototype Implementation and Evaluation of Persistent Collective Communication Functions on FX100

森江善之, 畑中正行, 高木将通, 堀敦史, 石川裕, 南里豪志

IPSJ SIG Technical Reports (Web) 2018.2

Language：Japanese
Performance Evaluation of the Supercomputer System ITO

大島聡史, 南里豪志, 渡部善隆, 天野浩文, 小野謙二

IPSJ SIG Technical Reports (Web) 2017.12

Language：Japanese
Evaluation of Communication Performance and Memory Usage of the ACP Library

森江善之, 本田宏明, 南里豪志

IPSJ SIG Technical Reports (Web) 2016.2

Language：Japanese
Replacing MPI_Comm_spawn with the ACP Library and Its Application to OpenFMO

本田宏明, 森江善之, 南里豪志, 稲富雄一, 高見利也

IPSJ SIG Technical Reports (Web) 2016.2

Language：Japanese
Development of an Effective Halo Communication and Calculation Model for Stencil Computation

深沢圭一郎, 森江善之, 曽我武史, 高見利也, 南里豪志

IPSJ SIG Technical Reports (Web) 2016.2

Language：Japanese
Implementation of the OpenFMO Program Using the ACP Communication Library

本田宏明, 森江善之, 南里豪志, 稲富雄一, 高見利也

Proceedings of the Annual Meeting of the Society of Computer Chemistry, Japan 2015.10

Language：Japanese
Effects of Halo Threads on Magnetohydrodynamic Simulation Toward Exascale Computing

深沢圭一郎, 森江善之, 曽我武史, 高見利也, 南里豪志

IPSJ SIG Technical Reports (Web) 2015.9

Language：Japanese
Interconnection Network Simulation of Synchronization Communication in RDMA

薄田竜太郎, 森江善之, 南里豪志, 柴村英智

IEICE Technical Report 2015.7

Language：Japanese
Implementation and Evaluation of the ACP Basic Layer on InfiniBand

森江善之, 南里豪志, 安島雄一郎, 本田宏明, 曽我武史, 小林泰三, 住元真司

IPSJ SIG Technical Reports (Web) 2015.2

Language：Japanese
Collective Communication Interface of the ACP Library

本田宏明, 山田博厚, 森江善之, 南里豪志, 高見利也

IPSJ SIG Technical Reports (Web) 2015.2

Language：Japanese
NSIM-ACE: A Simulator for Evaluating RDMA on Large-Scale Interconnection Networks

薄田竜太郎, 森江善之, 南里豪志, 柴村英智

IPSJ SIG Technical Reports (Web) 2014.12

Language：Japanese
Proposal of a Method for Selecting Collective Communication Algorithms According to Process Placement on Multi-Dimensional Mesh/Torus

南里豪志, 杉山裕宣, 森江善之

IPSJ SIG Technical Reports (CD-ROM) 2013.4

Language：Japanese
Impact of Communication Timing Prediction Methods on Task Allocation Optimization for Reducing Communication Contention

森江善之, 南里豪志

IPSJ SIG Technical Reports (CD-ROM) 2013.2

Language：Japanese
Evaluation of Task Allocation Optimization for Reducing Contention

森江善之, 南里豪志, 石畑宏明, 井上弘士, 村上和彰

IPSJ SIG Technical Reports 2008.3

Language：Japanese
Introduction to OpenMP (4)

南里豪志

Computational Engineering (Keisan Kogaku) 2007.7

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)
Introduction to OpenMP (3)

南里豪志

Computational Engineering (Keisan Kogaku) 2007.4

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)
Optimization of Rank Allocation Considering Communication Timing

森江善之, 末安直樹, 松本透, 南里豪志, 石畑宏明, 井上弘士, 村上和彰

IPSJ SIG Technical Reports 2007.3

Language：Japanese
Introduction to OpenMP (2)

南里豪志

Computational Engineering (Keisan Kogaku) 2007.1

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)
Introduction to OpenMP (1)

南里豪志

Computational Engineering (Keisan Kogaku) 2006.10

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)
Introduction to Parallel Programming with MPI

南里豪志

Journal of Plasma and Fusion Research 2003.8

Language：Japanese Publishing type：Article, review, commentary, editorial, etc. (scientific journal)


Industrial property rights

Patent	Number of applications: 1	Number of registrations: 1
Utility model	Number of applications: 0	Number of registrations: 0
Design	Number of applications: 0	Number of registrations: 0
Trademark	Number of applications: 0	Number of registrations: 0

Professional Memberships

Information Processing Society of Japan (IPSJ)
IEEE

Committee Memberships

IEICE Kyushu Section, General Affairs Secretary Domestic

2019.4 - 2021.3
IEEE Fukuoka Section, Secretary Domestic

2019.4 - 2021.3
IPSJ Special Interest Group on High Performance Computing, Steering committee member Domestic

2016.4 - 2020.3

Academic Activities

Program Chair International contribution

7th International Workshop on Large-scale HPC Application Modernization （ Japan ） 2020.11

Type：Competition, symposium, etc.
Track chair International contribution

HPC Asia 2020 （ Fukuoka Japan ） 2020.1

Type：Competition, symposium, etc.
Vice Chair of the Executive Committee

AXIES2019 （ Japan ） 2019.12

Type：Competition, symposium, etc.
Planning Committee Member

Gender Equality Symposium （ Japan ） 2019.9

Type：Competition, symposium, etc.
Journal of the Institute of Electronics, Information and Communication Engineers (IEICE)

2019.4 - 2021.3

Type：Academic society, research group, etc.
Local Arrangement

ACSI2016 （ Fukuoka Japan ） 2016.1

Type：Competition, symposium, etc.

Number of participants：110
Committee International contribution

International Workshop LENS (Language, Network and System Software) 2015 （ Tokyo Japan ） 2015.10

Type：Competition, symposium, etc.

Number of participants：50
Session Chair

2009 Summer Workshop on Parallel, Distributed, and Cooperative Processing in Sendai （ Japan ） 2009.8

Type：Competition, symposium, etc.
Executive Committee Member

SWoPP2007 （ Japan ） 2007.8 - Present

Type：Competition, symposium, etc.

Number of participants：200
Executive Committee Chair

SWoPP2006 （ Japan ） 2006.7 - Present

Type：Competition, symposium, etc.

Number of participants：200


Research Projects

Development of an Overlap Technique with Shared Worker Cores Toward Highly Scalable Parallel Computation

Grant number：26K14845 2026.4 - 2029.3

Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

南里豪志

Grant type：Scientific research funding
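Research and Development of Efficient Coupled Computation Methods for Numerical Simulation and Machine Learning and Their Application to PINNs

Grant number：25K15146 2025.4 - 2028.3

Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

深沢圭一郎, 南里豪志

Grant type：Scientific research funding

This research develops a method for in-memory coupled computation between numerical simulations and ML/AI processing that requires large volumes of simulation output at high frequency. Using the developed method, we couple physical simulations with physics-informed neural networks (PINNs), aiming to build high-accuracy surrogate models that satisfy physical laws. High-frequency output data is difficult to handle with I/O-based coupling, and existing in-memory coupling frameworks offer no method for handling it either. In addition, for high-dimensional, dynamically changing physical simulations, it is difficult to improve the accuracy of PINN-based surrogate models. This research develops methods to solve these problems.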
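Feasibility Study on Next-Generation Computing Infrastructure (MEXT)

2022.7 - 2024.3

RIKEN

Authorship：Coinvestigator(s)

Next-generation computing infrastructure is expected to serve as a platform for solving problems toward realizing the SDGs and Society 5.0. As the foundation for advanced digital twins that will bring "research DX" to future science, we aim to realize a highly versatile computing infrastructure that can execute whole workflows while making full use of a wide range of computational methods, simulation techniques, and large-scale data in close coordination. To that end, we investigate the desirable architecture and the system software and library technologies through co-design with applications.
In particular, as a basic principle of system design, we practice "FLOPS to Byte" oriented system construction, which secures the required computational performance while taking arithmetic precision into account and makes data movement more sophisticated and efficient under power constraints, from architecture development through algorithm design to application technology.
Under an All-Japan framework, we investigate system configurations and elemental technologies for a next-generation computing infrastructure that improve effective performance, and develop those elemental technologies, through co-design of architecture, system software, and applications.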
Implementation of Efficient Asynchronously Coupled Computation with Timed Buffer on NVDIMM

Grant number：22K12049 2022 - 2024

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

NANRI TAKESHI

Authorship：Principal investigator Grant type：Scientific research funding

We designed and implemented an interface for a time-series buffer area intended for asynchronous coupled simulations. In asynchronous coupled simulations, data generated by one program for each version is accessed as needed by another program. To facilitate this, we designed an interface based on a producer-consumer model, in which the program generating the data is registered as the producer and the program referencing the data is registered as the consumer, with operations tailored to each role. Using this interface, we successfully implemented in-situ visualization, a type of coupled simulation, and confirmed its functionality. Furthermore, by introducing NVIDIA's latest network card, BlueField-2, communication time was reduced, thereby enhancing practical usability.

Feasibility Study on System Software and Libraries

2022 - 2023

MEXT Feasibility Study Program on Next-Generation Computing Infrastructure

Authorship：Coinvestigator(s) Grant type：Contract research
Research and Development of Technology for Highly Efficient RDMA to Non-Volatile Memory

2020.10 - 2021.3

Joint research

Authorship：Principal investigator Grant type：Other funds from industry-academia collaboration
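Integrated Research and Development of Quantum Computing and Ising Computing Systems (NEDO)

2020.4 - 2027.3

National Institute of Advanced Industrial Science and Technology (AIST)

Authorship：Coinvestigator(s)

To realize a super-smart society, digitalization is advancing in diverse industrial fields such as advanced mobility services, smart factories, finance, and drug discovery, and the accompanying social demand for high-performance next-generation computing is rising rapidly. This project integrated three NEDO projects in April 2021: "Research and Development of Quantum Annealing Technology Using Superconducting Parametron Devices" (from FY2018), "Research and Development of a Common Software Platform for Ising Machines" (from FY2018), and "Development of an Integrated Quantum Computing System Combining Superconductor and Semiconductor Technologies" (from FY2020). It carries out full-stack integrated research and development based on industry-academia-government collaboration.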
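Integrated Research and Development of Quantum Computing and Ising Computing Systems

2020 - 2027

NEDO Development of AI Chip and Next-Generation Computing Technologies Enabling High-Efficiency, High-Speed Processing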

Authorship：Coinvestigator(s) Grant type：Contract research
Research and Development of Technology for Highly Efficient RDMA to Non-Volatile Memory

2019.9 - 2020.3

Joint research

Authorship：Principal investigator Grant type：Other funds from industry-academia collaboration
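Development of a Scalable Asynchronous Communication Layer Using Communication Buffers on NVDIMM

Grant number：19K11991 2019 - 2021

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

南里豪志

Authorship：Principal investigator Grant type：Scientific research funding

NVDIMM, a non-volatile memory that can be installed in DIMM slots, is attracting attention as a memory device that consumes less power, costs less, and scales to large capacity more easily than DRAM. This research develops a communication layer that uses NVDIMM as the buffer area inside a communication library. This enables communication hiding with non-blocking point-to-point communication on large-scale parallel computers, which is expected to improve the scalability of parallel applications. In addition, by switching between buffers on DRAM and buffers on NVDIMM according to runtime conditions such as communication frequency, we aim to reduce the performance impact of NVDIMM latency, which is expected to be 1 to 10 microseconds.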
Development of a Stencil Computation and Communication Model Achieving High Scalability in Massively Parallel Environments

2018 - 2020

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
Research on System Power Management Strategies for Energy-Efficient Exascale Supercomputers

2018 - 2020

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
Development of Time-Reversal Method for Detecting Multiple Moving Targets Behind the Wall International coauthorship

2017.4 - 2018.3

JHPCN (Japan)

Authorship：Principal investigator

Many imaging systems exist for seeing through walls or detecting cancer, such as MRI for medical imaging. However, current technologies are neither cheap nor available everywhere. One inexpensive alternative is microwave imaging using the Time Reversal (TR) method, which was first introduced in acoustics. TR has found applications in disciplines including non-destructive testing, underwater communications, and medicine. TR has also been studied for Ground Penetrating Radar (GPR) and Through the Wall Imaging (TWI). The TR method combined with super-resolution techniques such as Decomposition Of the Time-Reversal Operator (DORT, its French acronym) or MUltiple SIgnal Classification (MUSIC) requires more than 150 Fast Fourier Transforms and more than 20000 singular value decompositions even for a very small imaging system of 13 antenna elements. The current approach is therefore far from real time because of its long computation time. Furthermore, demand for detecting multiple moving targets is high, but work in this field is scarce. Detecting multiple moving targets behind a wall is one of the most challenging scenarios in through-the-wall microwave imaging. So far, Fumie Costen at the University of Manchester has developed spatio-temporal windowing for the differential MDM (multi-static data matrix) time-reversal algorithm to detect multiple moving objects in a simple canonical case. This project will develop and verify an algorithm that detects multiple moving targets with high computational efficiency.
スケーラブル通信ライブラリを用いた次世代惑星電磁圏連成計算技術の創出

2017 - 2019

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Challenging Research(Exploratory)

Authorship：Coinvestigator(s) Grant type：Scientific research funding
Research and Development of a Prepared Collective Communication Interface for MPI

2015 - 2017

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Authorship：Principal investigator Grant type：Scientific research funding
Research and Development of Communication-Hiding Techniques for Programs in the Parallel Language CAF

Grant number：24500068 2012 - 2014

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Authorship：Principal investigator Grant type：Scientific research funding
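Development of a Scalable Communication Library with Memory-Saving and Dynamic Optimization Technologies (JST CREST research area "Development of System Software Technologies for Post-Peta Scale High Performance Computing")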

2011.10 - 2017.3

Kyushu University (Japan)

Authorship：Principal investigator

Within a decade, the number of processing cores on supercomputers is predicted to be more than 100 million. This project researches technologies for memory saving and runtime optimizations to implement a scalable communication library that will be required on such large scale computers. In addition to that, the project also develops methods for building scalable applications by utilizing facilities of the communication library.
Development of a Scalable Communication Library with Memory-Saving and Dynamic Optimization Technologies

2011 - 2016

Grants-in-Aid for Scientific Research Strategic Basic Research Programs

Authorship：Principal investigator Grant type：Competitive funding other than Grants-in-Aid for Scientific Research
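Research on Communication Libraries and Numerical Libraries That Can Withstand Large-Scale Parallel Computing Environments with over 100 Million Cores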

2011

Education and Research Program / Research Hub Formation Project (special category: additional selection)

Authorship：Principal investigator Grant type：On-campus funds, funds, etc.
Development of Dynamic Communication Optimization Techniques for the Parallel Language CAF

Grant number：21700036 2009 - 2011

Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

Authorship：Principal investigator Grant type：Scientific research funding
Development of an OpenMP Processing Environment on Hierarchical Clusters Using IPv6 and Myrinet

Grant number：18700065 2006 - 2008

Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

Authorship：Principal investigator Grant type：Scientific research funding
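Development of Petascale System Interconnect Technologies (MEXT "Research and Development for Building Next-Generation IT Infrastructure", R&D area "Research and Development of Elemental Technologies for Future Supercomputing" (FY2005 to FY2007))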

2005.4 - 2008.3

Kyushu University (Japan)

Authorship：Coinvestigator(s)

PSI is one of the national projects on elemental technologies for peta-scale computing systems. The project focuses on three topics in system interconnection networks: optical switches, intelligent interconnects and performance prediction.
Research on the Development of a Compilation and Execution Environment for OpenMP Programs on Hierarchical Cluster Systems

Grant number：15700033 2003 - 2005

Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

Authorship：Principal investigator Grant type：Scientific research funding
Development of a Stencil Computation and Communication Model That Achieves High Scalability in Massively Parallel Environments

Grant number：18K11336

Keiichiro Fukazawa, Takeshi Nanri

Grant type：Scientific research funding

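This research aimed to develop a stencil computation and communication model that suffers no loss of scalability in exascale environments, and to develop the Halo communication functions used in that model.
First, for stencil simulations, we developed a model that divides threads between "computation" and "computation that requires communication, together with the communication itself". This removes the need for synchronization to learn that communication has finished, and so avoids degradation of parallel performance. Next, we packaged the communication model used there into a set of functions (Halo functions) so that other applications can use it easily. We measured their performance in an environment using 2,000 nodes and confirmed high scalability.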

CiNii Research

Educational Activities

For graduate students, I teach classes on high-performance parallel computing and networks.
For undergraduate students, I teach classes on programming and networks.

Class subject

Communication Networks B

2025.12 - 2026.2 Winter quarter
Communication Networks A

2025.10 - 2025.12 Fall quarter
High-Performance Parallel Computing II

2024.6 - 2024.8 Summer quarter
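Advanced High-Performance Parallel Computing II

2024.6 - 2024.8 Summer quarter
[Full year] Advanced Seminar in Information Science and Technology

2024.4 - 2025.3 Full year
[Full year] Exercises in Information Science and Technology

2024.4 - 2025.3 Full year
[Full year] Research in Information Science and Technology I

2024.4 - 2025.3 Full year
Discussion in Information Science and Technology I

2024.4 - 2024.9 First semester
Writing in Information Science and Technology I

2024.4 - 2024.9 First semester
Reading in Information Science and Technology

2024.4 - 2024.9 First semester
[Master's] Advanced High-Performance Parallel Computing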

2024.4 - 2024.9 First semester
High-Performance Parallel Computing I

2024.4 - 2024.6 Spring quarter
Advanced High-Performance Parallel Computing I

2024.4 - 2024.6 Spring quarter
Advanced Information Networks

2023.12 - 2024.2 Winter quarter
Communication Networks B

2023.12 - 2024.2 Winter quarter
Communication Networks II

2023.12 - 2024.2 Winter quarter
(IUPE)Int. to Information Processing II

2023.12 - 2024.2 Winter quarter
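Discussion in Information Science and Technology II

2023.10 - 2024.3 Second semester
Writing in Information Science and Technology II

2023.10 - 2024.3 Second semester
Presentation in Information Science and Technology

2023.10 - 2024.3 Second semester
(Second semester) Communication Networks

2023.10 - 2024.3 Second semester
Communication Networks A

2023.10 - 2023.12 Fall quarter
Communication Networks I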

2023.10 - 2023.12 Fall quarter
(IUPE)Int. to Information Processing I

2023.10 - 2023.12 Fall quarter
Advanced High-Performance Parallel Computing II

2023.6 - 2023.8 Summer quarter
High-Performance Parallel Computing II

2023.6 - 2023.8 Summer quarter
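[Full year] Research in Information Science and Technology I

2023.4 - 2024.3 Full year
[Full year] Advanced Seminar in Information Science and Technology

2023.4 - 2024.3 Full year
[Full year] Exercises in Information Science and Technology

2023.4 - 2024.3 Full year
[Master's] Advanced High-Performance Parallel Computing

2023.4 - 2023.9 First semester
Discussion in Information Science and Technology I

2023.4 - 2023.9 First semester
Writing in Information Science and Technology I

2023.4 - 2023.9 First semester
Reading in Information Science and Technology

2023.4 - 2023.9 First semester
Advanced High-Performance Parallel Computing I

2023.4 - 2023.6 Spring quarter
Fundamentals of Cybersecurity

2023.4 - 2023.6 Spring quarter
Fundamentals of Cybersecurity

2023.4 - 2023.6 Spring quarter
Introduction to Electrical and Information Engineering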

2023.4 - 2023.6 Spring quarter
High-Performance Parallel Computing I

2023.4 - 2023.6 Spring quarter
(IUPE)Int. to Information Processing II

2022.12 - 2023.2 Winter quarter
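Advanced Information Networks

2022.12 - 2023.2 Winter quarter
Communication Networks B

2022.12 - 2023.2 Winter quarter
Discussion in Information Science and Technology II

2022.10 - 2023.3 Second semester
Writing in Information Science and Technology II

2022.10 - 2023.3 Second semester
Presentation in Information Science and Technology

2022.10 - 2023.3 Second semester
(Second semester) Communication Networks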

2022.10 - 2023.3 Second semester
(IUPE)Int. to Information Processing I

2022.10 - 2022.12 Fall quarter
Communication Networks A

2022.10 - 2022.12 Fall quarter
High-Performance Parallel Computing II

2022.6 - 2022.8 Summer quarter
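Advanced High-Performance Parallel Computing II

2022.6 - 2022.8 Summer quarter
Advanced Seminar in Information Science and Technology

2022.4 - 2023.3 Full year
Research in Information Science and Technology I

2022.4 - 2023.3 Full year
Exercises in Information Science and Technology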

2022.4 - 2023.3 Full year
High-Performance Parallel Computing

2022.4 - 2022.9 First semester
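[Master's] Advanced High-Performance Parallel Computing

2022.4 - 2022.9 First semester
Reading in Information Science and Technology

2022.4 - 2022.9 First semester
Writing in Information Science and Technology I

2022.4 - 2022.9 First semester
Discussion in Information Science and Technology I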

2022.4 - 2022.9 First semester
High-Performance Parallel Computing I

2022.4 - 2022.6 Spring quarter
Fundamentals of Cybersecurity

2022.4 - 2022.6 Spring quarter
Fundamentals of Cybersecurity

2022.4 - 2022.6 Spring quarter
Advanced High-Performance Parallel Computing I

2022.4 - 2022.6 Spring quarter
Advanced Information Networks

2021.12 - 2022.2 Winter quarter
(IUPE)Int. to Information Processing II

2021.12 - 2022.2 Winter quarter
Advanced Information Networks

2021.12 - 2022.2 Winter quarter
(IUPE)Int. to Information Processing II

2021.12 - 2022.2 Winter quarter
(IUPE)Int. to Information Processing I

2021.10 - 2021.12 Fall quarter
(IUPE)Int. to Information Processing I

2021.10 - 2021.12 Fall quarter
(IUPE)Int. to Information Processing II

2020.12 - 2021.2 Winter quarter
(IUPE)Int. to Information Processing II

2020.12 - 2021.2 Winter quarter
(IUPE)Int. to Information Processing II

2020.12 - 2021.2 Winter quarter
Advanced Information Networks

2020.10 - 2021.3 Second semester
(IUPE)Int. to Information Processing I

2020.10 - 2020.12 Fall quarter
(IUPE)Int. to Information Processing I

2020.10 - 2020.12 Fall quarter
(IUPE)Int. to Information Processing I

2020.10 - 2020.12 Fall quarter
(IUPE)Int. to Information Processing II

2019.12 - 2020.2 Winter quarter
(IUPE)Int. to Information Processing II

2019.12 - 2020.2 Winter quarter
Advanced Information Networks

2019.10 - 2020.3 Second semester
Advanced Information Networks

2019.10 - 2020.3 Second semester
(IUPE)Int. to Information Processing I

2019.10 - 2019.12 Fall quarter
(IUPE)Int. to Information Processing I

2019.10 - 2019.12 Fall quarter
Introduction to Information Processing

2019.4 - 2019.6 Spring quarter
(IUPE) Introduction to Information Processing

2019.4 - 2019.6 Spring quarter
(IUPE) Introduction to Information Processing

2019.4 - 2019.6 Spring quarter
Advanced Information Networks

2018.10 - 2019.3 Second semester
Introduction to Information Processing

2018.4 - 2018.6 Spring quarter
Introduction to Information Processing

2018.4 - 2018.6 Spring quarter
Advanced Information Networks

2017.10 - 2018.3 Second semester
Introduction to Information Processing

2017.4 - 2017.9 First semester
Introduction to Information Processing

2017.4 - 2017.6 Spring quarter
Introduction to Information Processing

2017.4 - 2017.6 Spring quarter
Advanced Information Networks

2016.10 - 2017.3 Second semester
Introduction to Information Processing

2016.4 - 2016.9 First semester
Advanced Information Networks

2015.10 - 2016.3 Second semester
Introduction to Information Processing

2015.4 - 2015.9 First semester
Advanced Information Networks

2014.10 - 2015.3 Second semester
Introduction to Information Processing

2014.4 - 2014.9 First semester
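Advanced Information Networks

2013.10 - 2014.3 Second semester
Advanced Information Networks

2012.10 - 2013.3 Second semester
Advanced Information Networks

2011.10 - 2012.3 Second semester
Advanced Information Networks

2010.10 - 2011.3 Second semester
Introduction to Information Processing

2010.4 - 2010.9 First semester
Introduction to Information Processing

2009.4 - 2009.9 First semester
Introduction to Information Processing

2008.4 - 2008.9 First semester
Introduction to Information Processing

2007.10 - 2008.3 Second semester
Introduction to Information Processing

2007.4 - 2007.9 First semester
Introduction to Information Processing

2006.4 - 2006.9 First semester
Introduction to Information Processing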

2005.4 - 2005.9 First semester
[G]High-Performance Parallel Computing II

2025.6 - 2025.8 Summer quarter
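KIKAN Education Seminar

2025.6 - 2025.8 Summer quarter
Advanced High-Performance Parallel Computing II

2025.6 - 2025.8 Summer quarter
[Full year] Exercises in Information Science and Technology

2025.4 - 2026.3 Full year
[Full year] Research in Information Science and Technology I

2025.4 - 2026.3 Full year
[Full year] Advanced Seminar in Information Science and Technology

2025.4 - 2026.3 Full year
Reading in Information Science and Technology

2025.4 - 2025.9 First semester
Discussion in Information Science and Technology I

2025.4 - 2025.9 First semester
Writing in Information Science and Technology I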

2025.4 - 2025.9 First semester
[G]High-Performance Parallel Computing I

2025.4 - 2025.6 Spring quarter
Advanced High-Performance Parallel Computing I

2025.4 - 2025.6 Spring quarter
(IUPE)Int. to Information Processing II

2024.12 - 2025.2 Winter quarter
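Advanced Information Networks

2024.12 - 2025.2 Winter quarter
Communication Networks B

2024.12 - 2025.2 Winter quarter
Communication Networks II

2024.12 - 2025.2 Winter quarter
Presentation in Information Science and Technology

2024.10 - 2025.3 Second semester
Discussion in Information Science and Technology II

2024.10 - 2025.3 Second semester
Writing in Information Science and Technology II

2024.10 - 2025.3 Second semester
(Second semester) Communication Networks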

2024.10 - 2025.3 Second semester
(IUPE)Int. to Information Processing I

2024.10 - 2024.12 Fall quarter
Communication Networks A

2024.10 - 2024.12 Fall quarter
Communication Networks I

2024.10 - 2024.12 Fall quarter

Visiting, concurrent, or part-time lecturers at other universities, institutions, etc.

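2024 Kyushu Institute of Technology, School of Computer Science and Systems Engineering Classification:Part-time lecturer Domestic/International Classification:Japan
2023 Kyushu Institute of Technology, School of Computer Science and Systems Engineering Classification:Part-time lecturer Domestic/International Classification:Japan
2023 Okayama University, Faculty of Engineering Classification:Part-time lecturer Domestic/International Classification:Japan
2023 The Open University of Japan Classification:Affiliate faculty
2022 The Open University of Japan Classification:Affiliate faculty
2022 Kyushu Institute of Technology, School of Computer Science and Systems Engineering Classification:Part-time lecturer Domestic/International Classification:Japan
2021 The Open University of Japan Classification:Part-time lecturer Domestic/International Classification:Japan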

Semester, Day Time or Duration：In charge of face-to-face classes (8 class periods in total)

Other educational activity and Special note

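2023 Class Teacher Undergraduate school
2011 Special Affairs Joined the Aoyagi Laboratory of the Faculty of Information Science and Electrical Engineering and took substantive charge of supervising the graduation research of two undergraduate students. Also joined the Murakami Laboratory of the same faculty and took substantive charge of supervising the graduation research of one second-year master's student.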


Social Activities

A Very First Introduction to Supercomputers

Research Institute for Information Technology, Kyushu University 2020.10

Audience：General,　Scientific,　Company,　Civic organization,　Governmental agency

Type：Seminar, workshop

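Intended for people who know the word "supercomputer" but are not sure what one actually is. The seminar introduces the role of supercomputers and how they differ from personal computers.
Participated in the specification meetings for MPI (Message Passing Interface), the international standard for parallel programming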

2016

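Practical Supercomputing School for Working Professionals. Today's PCs have specifications far exceeding those of the machines once called mainframe computers, and they have become easy to obtain. In this seminar, participants use ordinary office-grade PCs to build an 8-node parallel computer (PC cluster), install software such as Linux and MPI, connect it to a network, run simulation codes from their own PCs, and evaluate the performance.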

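Foundation for Computational Science; Graduate School GP "Training of Leading-Edge Human Resources in Computational Science by a University Consortium"; Kobe University BT Center, Kobe Port Island 2009.6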

Audience：General,　Scientific,　Company,　Civic organization,　Governmental agency

Type：Seminar, workshop

Educational Activities for Highly-Specialized Professionals in Other Countries

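2020.2 - 2020.3 Japan Science and Technology Agency "Sakura Science Plan" science and technology training course "Graduate students in mathematics from Myanmar learn applications of mathematics to supercomputing"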

Main countries of student/trainee affiliation：Myanmar