Kyushu University Academic Staff Educational and Research Activities Database
List of Papers
Takeshi Nanri
Last modified date: 2022.05.13

Associate Professor / Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering / Section of Advanced Computational Science / Research Institute for Information Technology

1. Optimization of MPI Rank Allocation Considering Communication Timing for Reducing Contention.
2. Dynamic Optimization of Load Balance in MPI Broadcast.
3. A Neighboring Communication Algorithm Using Effective Multiple Communication Devices on Direct Connection Network.
4. Task Allocation Technique for Avoiding Contentions on Multi-dimensional Mesh/Torus.
5. Task Allocation Optimization for Neighboring Communication on Fat Tree.
6. Task Allocation Method for Avoiding Contentions by the Information of Concurrent Communication.
7. Takeshi Nanri, Hiroyuki Sato and Masaaki Shimasaki, Portability in Implementing Distributed Shared Memory System on the Workstation Cluster Environment, Research Reports on Information Science and Electrical Engineering of Kyushu University, Vol. 2, No. 2, pp. 185-190, 1997.03.
8. FUKAZAWA Keiichiro, Takeshi Nanri, Takayuki Umeda, Performance evaluation of magnetohydrodynamics simulation for magnetosphere on K computer, Communications in Computer and Information Science, 10.1007/978-3-642-45037-2_61, 2013.12.
9. Hyacinthe Nzigou Mamadou, Takeshi Nanri and Kazuaki Murakami, Performance Models for MPI Collective Communications with Network Contention, IEICE Transactions on Communications, Vol. E91-B, No. 4, pp. 1015-1024, 2008.05.
10. FUKAZAWA Keiichiro, Takeshi Nanri, Takayuki Umeda, Performance Measurements of MHD Simulation for Planetary Magnetosphere on Peta-Scale Computer FX10, Advances in Parallel Computing, 10.3233/978-1-61499-381-0-387, 2014.03.
11. Keiichiro Fukazawa, Takeshi Soga, Takayuki Umeda, Takeshi Nanri, Performance Evaluation and Optimization of MagnetoHydroDynamic Simulation for Planetary Magnetosphere with Xeon Phi KNL, Parallel Computing is Everywhere, 10.3233/978-1-61499-843-3-178, 178-187, 2018.01, The magnetohydrodynamic (MHD) simulation is often applied to study the global dynamics and configuration of a planetary magnetosphere for space weather. In this paper, the computational performance of an MHD code is evaluated on 128 Xeon Phi KNL nodes of a Cray XC40. The results show that the 2D and 3D domain decompositions with SoA (structure of arrays) perform effectively, while AoS (array of structures) and hybrid parallel computation perform poorly. By adding performance optimizations for Xeon Phi to our MHD simulation code, we obtained a 2.4% increase in overall execution efficiency and a performance gain of 3 TFlops using 128 nodes.
12. Implementation of Neighbor Communication Algorithm Using Multi-NICs Effectively by Extended RDMA Interface.
13. Kazuichi Oe, Takeshi Nanri, Koji Okamura, Hybrid storage system consisting of cache drive and multi-tier SSD for improved IO access when IO is concentrated, IEICE Transactions on Information and Systems, 10.1587/transinf.2018EDP7253, E102D, 9, 1715-1730, 2019.01, In previous studies, we determined that workloads often contain many input-output (IO) concentrations. Such concentrations are aggregations of IO accesses. They appear in narrow regions of a storage volume and continue for durations of up to about an hour. These narrow regions occupy a small percentage of the logical unit number capacity, include most IO accesses, and appear at unpredictable logical block addresses. We investigated these workloads by focusing on page-level regularity and found that they often include few regularities. This means that simple caching may not reduce the response time for these workloads sufficiently because the cache migration algorithm uses page-level regularity. We previously developed an on-the-fly automated storage tiering (OTF-AST) system consisting of an SSD and an HDD. The migration algorithm identifies IO concentrations with moderately long durations and migrates them from the HDD to the SSD. This means that there is little or no reduction in the response time when the workload includes few such concentrations. We have now developed a hybrid storage system consisting of a cache drive with an SSD and HDD and a multi-tier SSD that uses OTF-AST, called "OTF-AST with caching." The OTF-AST scheme handles the IO accesses that produce moderately long duration IO concentrations while the caching scheme handles the remaining IO accesses. Experiments showed that the average response time for our system was 45% that of Facebook FlashCache on a Microsoft Research Cambridge workload.
14. Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO, Development of a CUDA Implementation of the 3D FDTD Method, IEEE Antennas and Propagation Magazine, 10.1109/MAP.2012.6348145, 54, 5, 186-195, 2012.10.
15. Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji, Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets, Lecture Notes in Computer Science, 11203, 243-257, 2019.01.
16. Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji, Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets, High Performance Computing - ISC High Performance 2018 International Workshops, Frankfurt/Main, Germany, June 28, 2018, Revised Selected Papers, 10.1007/978-3-030-02465-9_17, 243-257, 2018.06.
17. Tetsuya Nakatoh, Sachio Hirokawa, Toshiro Minami, Takeshi Nanri, Miho Funamori, Assessing the Significance of Scholarly Articles using their Attributes, Proc. of the 22nd International Symposium on Artificial Life and Robotics (AROB2017), 742-746, 2017.01.
18. Takeshi Nanri, Approaches for memory-efficient communication library and runtime communication optimization, Advanced Software Technologies for Post-Peta Scale Computing The Japanese Post-Peta CREST Research Project, 10.1007/978-981-13-1924-2_7, 121-138, 2018.12, This article summarizes the work carried out in the Advanced Communication for Exa (ACE) project. The main motivation of this project was the severe demand for scalable communication toward exa-scale computation. Therefore, in the project, we built a PGAS-based communication library, Advanced Communication Primitives (ACP). Its fundamental communication model is one-sided, based on the PGAS model, so that its internal memory footprint is as small as possible. Based on this model, several applications, including simulations of magnetohydrodynamics, molecular orbitals, and particles, were tuned to achieve higher scalability. In addition, several communication optimization techniques were investigated. In particular, tuning methods for collective communications, such as message ordering, algorithm selection, and overlapping, were studied. A network simulator, NSIM-ACE, was also developed in this project. It simulates the behavior of packets in one-sided communications to study the effects of congestion on interconnects.
19. Tetsuya Nakatoh, Kenta Nagatani, Toshiro Minami, Sachio Hirokawa, Takeshi Nanri, Miho Funamori, Analysis of the Quality of Academic Papers by the Words in Abstracts, HIMI 2017, Part II, LNCS 10274, Proc. of the 19th International Conference on Human-Computer Interaction (HCI International 2017), 2017.07.
20. Kazuichi Oe, Mitsuru Sato, Takeshi Nanri, ATSMF: Automated tiered storage with fast memory and slow flash storage to improve response time with concentrated input-output (IO) workloads, IEICE Transactions on Information and Systems, 10.1587/transinf.2018PAP0005, E101D, 12, 2889-2901, 2018.12, The response times of solid state drives (SSDs) have decreased dramatically due to the growing use of non-volatile memory express (NVMe) devices. Such devices have response times of less than 100 microseconds on average. The response times of all-flash-array systems have also decreased dramatically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads tend to contain many input-output (IO) concentrations, which are aggregations of IO accesses. They target narrow regions of the storage volume and can continue for up to an hour. These narrow regions occupy a few percent of the logical unit number capacity, are the target of most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response times for such workloads, we developed an automated tiered storage system called “automated tiered storage with fast memory and slow flash storage” (ATSMF) in which the data in targeted regions are migrated between storage devices depending on the predicted remaining duration of the concentration. The assumed environment is a server with non-volatile memory and directly attached SSDs, with the user applications executed on the server as this reduces the average response time. Our system predicts the effect of migration by using the previously monitored values of the increase in response time during migration and the change in response time after migration. These values are consistent for each type of workload if the system is built using both non-volatile memory and SSDs. In particular, the system predicts the remaining duration of an IO concentration, calculates the expected response-time increase during migration and the expected response-time decrease after migration, and migrates the data in the targeted regions if the total response-time decrease after migration exceeds the total response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage only and that its memory access ratio is more than 50%.
21. A Neighbor Communication Algorithm with Making an Effective Use of NICs on Multidimensional-Mesh/torus.
22. A Method for Predicting a Penalty of Contentions by Considering Priorities of Routing among Packets on Direct Interconnection Network.