Kyushu University Academic Staff Educational and Research Activities Database
List of Presentations
Takeshi Nanri Last modified date: 2022.05.13

Associate Professor / Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering / Section of Advanced Computational Science / Research Institute for Information Technology


Presentations
1. Yuto Katoh, Keiichiro Fukazawa, Takeshi Nanri, Yohei Miyake, Cross-reference simulation by Code-To-Code Adapter (CoToCoA) library for the study of multi-scale physics in planetary magnetospheres, 2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW), 2021.12.
2. Kenji Ono, Toshihiro Kato, Satoshi Ohshima, Takeshi Nanri, Scalable Direct-Iterative Hybrid Solver for Sparse Matrices on Multi-Core and Vector Architectures, International Conference on High Performance Computing in Asia-Pacific Region, 2019.12.
3. Keiichiro Fukazawa, Yuto Katoh, Takeshi Nanri, Yohei Miyake, Application of cross-reference framework CoToCoA to Macro- and micro-scale simulations of planetary magnetospheres, 7th International Symposium on Computing and Networking Workshops, CANDARW 2019, 2019.11, In this study, we have introduced the Code-to-Code Adapter (CoToCoA) library to couple the magnetohydrodynamic (MHD) simulation and the Electron Hybrid (EH) simulation of planetary magnetospheres. CoToCoA has been newly developed to connect different codes easily. The concept of CoToCoA is that each code is modified as little as possible apart from adding data transfer functions, and that nothing needs to be known about the referenced code apart from its data format. With CoToCoA, we have been developing a cross-reference simulation of the macro (MHD) and micro (EH) scales in the magnetosphere. We then evaluated the performance of the cross-reference simulation using CoToCoA on a massively parallel computer system.
4. Kazuichi Oe, Takeshi Nanri, Hybrid Storage System to Achieve Efficient Use of Fast Memory Area, 7th International Symposium on Computing and Networking, CANDAR 2019, 2019.11, Hybrid storage techniques are useful methods to improve the cost performance for input-output (IO) intensive workloads. These techniques choose areas of concentrated IO accesses and migrate them to an upper tier to extract as much performance as possible through greater use of upper tier areas. Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system situated between non-volatile memories (NVMs) and solid-state drives (SSDs). ATSMF aims to reduce the average response time for IO accesses by migrating areas of concentrated IO access from an SSD to an NVM. When a concentrated IO access finishes, the system migrates these areas from the NVM back to the SSD. Unfortunately, the published ATSMF implementation temporarily consumes much NVM capacity upon migrating concentrated IO access areas to NVM, because its algorithm executes NVM migration with high priority. As a result, it often delays evicting areas in which IO concentrations have ended to the SSD. Therefore, to reduce the consumption of NVM while maintaining the average response time, we developed new techniques for making ATSMF more practical. The first is a queue handling technique based on the number of IO accesses for NVM migration and eviction. The second is an eviction method that selects only write-accessed partial regions in finished areas. The third is a technique for variable eviction timing to balance the NVM consumption and average response time. Experimental results indicate that the average response times of the proposed ATSMF are almost the same as those of the published ATSMF, while the NVM consumption is drastically lower.
5. Praphan Pavarangkoon, Ken T. Murata, Kazunori Yamamoto, Kazuya Muranaga, Takamichi Mizuhara, Keiichiro Fukazawa, Ryusuke Egawa, Takahiro Katagiri, Masao Ogino, Takeshi Nanri, Performance improvement of high-speed file transfer over JHPCN, 17th IEEE International Conference on Dependable, Autonomic and Secure Computing, IEEE 17th International Conference on Pervasive Intelligence and Computing, IEEE 5th International Conference on Cloud and Big Data Computing, 4th Cyber Science and Technology Congress, DASC-PiCom-CBDCom-CyberSciTech 2019, 2019.08, This paper proposes a novel file transfer tool to improve file transfer performance over Japan high performance computing and networking (JHPCN). We first develop a high-performance and flexible protocol (HpFP) for inter-datacenter transport networks. The original HpFP was designed for specified networks and puts more emphasis on latency and packet loss tolerances than on fairness and friendliness, while the enhanced HpFP is more suitable for real network environments. Then, based on the enhanced HpFP, we implement a file transfer tool, called high-performance copy (HCP). The performance of our file transfer tool is evaluated between datacenters of JHPCN using real datasets collected from supercomputer resources. The results show that HCP achieves higher throughput than traditional tools for file transfer over JHPCN.
6. Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji, Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets, International Conference on High Performance Computing, ISC High Performance 2018, 2018.01, This paper presents an in situ framework that focuses on time-varying simulations and uses a novel temporal buffer for storing simulation results sampled at user-defined intervals. This framework has been designed to provide flexible data processing and visualization capabilities in modern HPC operational environments composed of powerful front-end systems, for pre- and post-processing purposes, along with traditional back-end HPC systems. The temporal buffer is implemented using the functionalities provided by the Open Address Space (OpAS) library, which enables asynchronous one-sided communication from outside processes to any exposed memory region on the simulator side. This buffer can store time-varying simulation results, which can be processed via in situ approaches with different proximities. We present a prototype of our framework and the code integration process with a target simulation code. The proposed in situ framework utilizes separate files to describe the initialization and execution codes, which are in the form of Python scripts. This framework also enables the runtime modification of these Python-based files, thus providing greater flexibility to the users, not only for data processing, such as visualization and analysis, but also for simulation steering.
7. Kazuichi Oe, Takeshi Nanri, Non-volatile memory driver for applying automated tiered storage with fast memory and slow flash storage, 6th International Symposium on Computing and Networking Workshops, CANDARW 2018, 2018.12, Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system located between non-volatile memories (NVMs) and solid state drives (SSDs). ATSMF aims to reduce average response time for input-output (IO) accesses by migrating concentrated IO access areas from SSD to NVM. However, the current ATSMF implementation cannot reduce average response time sufficiently because of the bottleneck caused by the Linux brd driver, which is used for the NVM access driver. The response time of the brd driver is more than ten times larger than memory access speed. To reduce the average response time sufficiently, we developed a block-level driver for NVM called a 'two-mode (2M) memory driver.' The 2M memory driver has both the map IO access mode and direct IO access mode to reduce the response time while maintaining compatibility with the Linux device-mapper framework. The direct IO access mode has a drastically lower response time than the Linux brd driver because the ATSMF driver can execute the IO access function of the 2M memory driver directly. Experimental results also indicate that ATSMF using the 2M memory driver reduces the IO access response time to less than that of ATSMF using the Linux brd driver in most cases.
8. Kenji Ono, Jorji Nonaka, Yoshiyuki Morie, Takeshi Nanri, Tomohiro Kawanabe, Design of an In Transit Framework with Staging Buffer for Flexible Data Processing and Visualization of Time-Varying Data, ISC WORKSHOP ON IN SITU VISUALIZATION 2018, 2018.06.
9. Kazuichi Oe, Mitsuru Sato, Takeshi Nanri, Automated Tiered Storage System Consisting of Memory and Flash Storage to Improve Response Time with Input-Output (IO) Concentration Workloads, 5th International Symposium on Computing and Networking, CANDAR 2017, 2018.04, The response time of solid state drives (SSDs) has dropped dramatically with the spread of non-volatile memory express (NVMe) devices. These devices have response times of less than 100 microseconds on average. The response time of all-flash-array systems has also dropped drastically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads were found to contain many input-output (IO) concentrations. We define IO concentration by using a declarative style. IO concentrations are aggregations of IO accesses. They appear in narrow regions of the storage volume and continue for periods of up to about an hour. These narrow regions occupy a few percent of the logical unit number capacity, include most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response time of these workloads, we developed an automated tiered storage system called 'automated tiered storage with fast memory and slow flash storage' (ATSMF). The memory component of ATSMF is a memory with a non-volatile feature. The system predicts the remaining duration of IO concentration, calculates the response-time increase during migration and response-time decrease after migration, and migrates the IO concentrations if the response-time decrease after migration surpasses the response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage only and its memory access ratio is more than 50%.
10. Takeshi Nanri, Proposal of Interface for Runtime Memory Manipulation of Applications via PGAS-based Communication Library, Workshop on PGAS programming models: Experiences and Implementations (PGAS-EI), 2018.01.
11. Tetsuya Nakatoh, Sachio Hirokawa, Toshiro Minami, Takeshi Nanri, Miho Funamori, Attribute-based quality classification of academic papers, 2017.11, Investigating the relevant literature is very important for research activities. However, it is difficult to select the most appropriate and important academic papers from the enormous number of papers published annually. Researchers search paper databases by combining keywords, and then select papers to read using some evaluation measure, often citation count. However, the citation count of recently published papers tends to be very small because citation count measures accumulated importance. This paper focuses on the possibility of classifying high-quality papers superficially using attributes such as publication year, publisher, and words in the abstract. To examine this idea, we construct classifiers by applying machine-learning algorithms and evaluate these classifiers using cross-validation. The results show that our approach effectively finds high-quality papers.
12. Tetsuya Nakatoh, Kenta Nagatani, Toshiro Minami, Sachio Hirokawa, Takeshi Nanri, Miho Funamori, Analysis of the quality of academic papers by the words in abstracts, Thematic track on Human Interface and the Management of Information, held as part of the 19th International Conference on Human–Computer Interaction, HCI International 2017, 2017.01, The investigation of related research is very important for research activities. However, it is not easy to choose an appropriate and important academic paper from among the huge number of possible papers. The researcher searches by combining keywords and then selects a paper to check using an index that can be evaluated. The citation count is commonly used as this index, but it provides no information about recently published papers. This research attempted to identify good papers using only the words included in the abstract. We constructed a classifier by machine learning and evaluated it using cross validation. As a result, it was found that a certain degree of discrimination is possible.
13. Kazuichi Oe, Takeshi Nanri, Koji Okamura, Feasibility study for building hybrid storage system consisting of non-volatile DIMM and SSD, 4th International Symposium on Computing and Networking, CANDAR 2016, 2017.01, Various vendors are developing byte-accessible Nonvolatile Dual-Inline Memory Modules (NVDIMMs). The performance of the NVDIMM drastically surpasses that of the Solid State Drive (SSD), which is connected by PCI express. However, the cost of the NVDIMM is much higher than that of the SSD. Therefore, a hybrid storage system between the NVDIMM and SSD is an effective technique for improving cost-performance. If a system uses the NVDIMM less while maintaining performance, its cost-performance should be improved. Our previous work involves on-the-fly automated storage tiering (OTF-AST). OTF-AST is a hybrid storage system consisting of an SSD and HDD. It aims to reduce the average response time of IO accesses by migrating only the IO concentration area to the SSD when IO concentration happens. Therefore, we construct OTF-AST with both the DIMM and SSD and evaluate it in order to understand how to build a cost-effective hybrid storage system with these devices. We use a DIMM instead of a byte-accessible NVDIMM, which is difficult to obtain. As a result, we found that the original OTF-AST is suitable for a hybrid storage system consisting of the DIMM and SSD. Moreover, we can improve the performance of OTF-AST if we replace its migration algorithm with a more positive migration algorithm. This is because the IO access response time barely increases when the data migration between the DIMM and SSD is done. We will build a more positive migration algorithm in the near future.
14. Shinji Sumimoto, Yuichiro Ajima, Kazushige Saga, Takafumi Nose, Naoyuki Shida, Takeshi Nanri, The design of advanced communication to reduce memory usage for exa-scale systems, 12th International Conference on High Performance Computing for Computational Science, VECPAR 2016, 2017.01, Current MPI (Message Passing Interface) communication libraries require larger memories in proportion to the number of processes, and cannot be used for exa-scale systems. This paper proposes a global memory based communication design to reduce memory usage for exa-scale communication. To realize exa-scale communication, we propose true global memory based communication primitives called Advanced Communication Primitives (ACPs). ACPs provide global address, which is able to use remote atomic memory operations on the global memory, RDMA (Remote Direct Memory Access) based remote memory copy operation, global heap allocator and global data libraries. ACPs are different from the other communication libraries because ACPs are global memory based so that housekeeping memories can be distributed to other processes and programmers explicitly consider memory usage by using ACPs. The preliminary result of memory usage by ACPs is 70 MB on one million processes.
15. Keiichiro Fukazawa, Yoshiyuki Morie, Toshiya Takami, Takeshi Nanri, Takeshi Soga, Effective calculation with halo communication using halo functions, 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016.09, A known issue with halo communication is that it degrades parallel scalability. To overcome this issue, we have introduced a "Halo thread" into our simulation code. However, this has not fundamentally solved the issue under strong scaling. In this study, we have developed Halo functions that perform the halo communication effectively. With them, we can perform the calculation and communication in a pipeline, and we obtained good performance.
16. Yoshiyuki Morie, Hiroaki Honda, Takeshi Nanri, Taizo Kobayashi, Hidetomo Shibamura, Ryutaro Susukita, Yuichiro Ajima, Memory Efficient One-Sided Communication Library "aCP" in Globary Memory on Raspberry Pi 2, 36th IEEE International Conference on Distributed Computing Systems, ICDCS 2016, 2016.08, Previously, communications in parallel programs for High Performance Computing (HPC) and Distributed Computing (DC) are mostly written with two-sided communication interfaces that are based on a pair of operations, Send and Receive. Since such an interface requires explicit synchronization between both sides of the communication, techniques for communication optimization such as overlapping are not efficiently described in many cases. On the other hand, the one-sided communication interface is becoming important as a method to describe asynchronous communications to enable highly overlapped communication with computation. As one such interface, in this demonstration, Advanced Communication Primitives (ACP) is introduced. ACP is a portable interface that supports UDP, IBverbs of InfiniBand and the Tofu library of K Computer. In addition to that, it is designed to be memory efficient. For example, with 10 thousand processes, the memory consumption of ACP over UDP is estimated to be less than 1MB. Since the number of computational elements is increasing more rapidly than the amount of available memory, this memory efficiency is becoming one of the keys for parallel programs in HPC and DC. To show this characteristic, we run the ACP library on Raspberry Pi 2, and examine its performance and memory consumption.
17. Takeshi Nanri, Keiichiro Fukazawa, Effect of Overlapping Halo Exchange with One-Sided Communication, 5th JSST Annual Conference International Conference on Simulation Technology, 2016.10.
18. Keiichiro Fukazawa, Takayuki Umeda, Takeshi Nanri, Performance Evaluation of MHD Simulation Code with X86 CPUs and Manycore Systems, 5th JSST Annual Conference International Conference on Simulation Technology, 2016.10.
19. Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri, Efficient communications of particle data in particle-based simulations, 5th JSST Annual Conference International Conference on Simulation Technology, 2016.10.
20. Hiroaki Honda, Yoshiyuki Morie, Takeshi Nanri, Development of A Memory Efficient Communication Method for Connecting MPI Programs by using ACP Library, 5th JSST Annual Conference International Conference on Simulation Technology, 2016.10.
21. Shinji Sumimoto, Yuichiro Ajima, Kazushige Saga, Takafumi Nose, Naoyuki Shida, Takeshi Nanri, The Design of Advanced Communication to Reduce Memory Usage for Exa-scale Systems, 12th International Meeting On High Performance Computing for Computational Science, 2016.09.
22. Takeshi Nanri, Runtime Algorithm Selection of Collective Communication with RMA-based Monitoring Mechanism, 4th Annual MVAPICH Users Group Meeting, 2016.08.
23. Keiichiro Fukazawa, Toshiya Takami, Takeshi Soga, Yoshiyuki Morie, Takeshi Nanri, Effective Calculation with Halo communication using Halo Functions, 23rd European MPI Users' Group Meeting, 2016.09.
24. Seiji FUJINO, Takeshi Nanri, Improvement of Eisenstat-SSOR preconditioning using tolerance value, 5th IMA Conference on Numerical Linear Algebra and Optimization, 2016.09.
25. Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri, NSIM-ACE: An Interconnection Network Simulator for Evaluating Remote Direct Memory Access, International Conference on Simulation and Modeling Methodologies, Technologies and Applications, 2016.07.
26. Kazuichi Oe, Takeshi Nanri, KOJI OKAMURA, Analysis of Storage Workloads of Input-Output Access Locality and Designing of Hybrid Storage System, 1st International Conference on Enterprise Architecture and Information Systems, 2016.01.
27. Dependence on combination with number of processes and threads for computation times of hybrid-parallel version of IDR(s) Method.
28. Performance analysis of the CG method with parallelized PAGME.
29. Evaluation of Dynamic Algorithm Selection with Performance Prediction Models on Alltoall Operation.
30. Implementation and Performance Evaluation of Parallelized CG method with PAGME - Preconditioning Method on Hierarchical Parallel Computers.
31. Yoshiyuki Morie, Takeshi Nanri, Task Allocation Optimization for Neighboring Communication on Fat Tree, 14th IEEE International Conference on High Performance Computing and Communication, 2012.06.
32. FUKAZAWA Keiichiro, Takeshi Nanri, Performance of Large Scale MHD Simulation of Global Planetary Magnetosphere with Massively Parallel Scalar Type Supercomputer Including Post Processing, 14th IEEE International Conference on High Performance Computing and Communication, 2012.06.
33. Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO, Impact of GPU Memory Access Patterns on FDTD, IEEE Antennas and Propagation Society International Symposium (APSURSI), 2012.07.
34. Seiji FUJINO, Takeshi Nanri, Kenichirou Kusaba, Balancing Communication and Execution Technique for Parallelized Sparse Matrix-Vector Multiplication, 4th International Conference on Future Computational Technologies and Applications, 2012.07.
35. FUKAZAWA Keiichiro, Takeshi Nanri, Effective Performance of Large-Scale MHD Simulation for Planetary Magnetosphere with Massively Parallel Computer, JSST2012 International Conference on Simulation Technology, 2012.07.
36. Takeshi Nanri, Motoyoshi Kurokawa, Efficient Runtime Algorithm Selection of Collective Communication with Topology-Based Performance Models, International Conference on Parallel and Distributed Processing Techniques and Applications, 2012.07.
37. Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO, An Alternative Domain Decomposition Technique for CUDA-based 3D FDTD Methods, 9th European Radar Conference, 2012.11.
38. Takeshi Nanri, Introduction of ACE(Advanced Communication library for Exa) Project, International workshop on HPC, Krylov Subspace method and its application, 2013.01.
39. Yoshiyuki Morie, Takeshi Nanri, Task Allocation Method for Avoiding Contentions by the Information of Concurrent Communication, International workshop on HPC, Krylov Subspace method and its application, 2013.01.
40. Tsuyoshi Okuma, Takeshi Nanri, Evaluation of Implementation Methods for Non-Blocking Collective Communications in Overlapping Communication and Computation, International workshop on HPC, Krylov Subspace method and its application, 2013.01.
41. Hironobu Sugiyama, Takeshi Nanri, Performance Prediction Technology for Collective Communication Algorithm on Multi-Dimensional Mesh/Torus, International workshop on HPC, Krylov Subspace method and its application, 2013.01.
42. Takeshi Nanri, Hironobu Sugiyama, FUKAZAWA Keiichiro, A Cost-Efficient Approach for Automatic Algorithm Selection of Collective Communications, SIAM Conference on Computational Science and Engineering, 2013.03.
43. Takeshi Nanri, What Communication Library Can do with a Little Hint from Programmers?, MVAPICH User Group Meeting, 2013.08.
44. Yoshiyuki Morie, Takeshi Nanri, A neighbor communication algorithm with making an effective use of NICs on multidimensional-mesh/torus, International Conference on Simulation Technology, 2013.09.
45. Tsuyoshi Okuma, Takeshi Nanri, Performance Study of Non-blocking Collective Communication Implementations Toward Adaptive Selection, Networking, Computing, Systems and Software, 2013.12.
46. Hironobu Sugiyama, Takeshi Nanri, Topology Aware Performance Prediction of Collective Communication Algorithms on Multi-Dimensional Mesh/Torus, Networking, Computing, Systems and Software, 2013.12.
47. Takeshi Nanri, Proposal of HINT Interface for Runtime Tuning of Communication Links, 22nd Euromicro International Conference on Parallel, Distributed and network-based Processing, 2014.02.
48. Takeshi Nanri, Design and Implementation of Channel Interface as a Memory Efficient Communication Model, Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015, 2015.01.
49. Takeshi Nanri, Channel Interface: a Primitive Model for Memory Efficient Communication, 23rd Euromicro International Conference on Parallel, Distributed and network-based Processing, 2015.03.
50. Hiroaki Honda, Takeshi Nanri, Yoshiyuki Morie, Performance and memory usage evaluations for channel interface of Advanced Communication Primitives library, 1st Pan-American Congress on Computational Mechanics (PANACM 2015), 2015.04.
51. Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri, Hidetomo Shibamura, Performance Evaluation of RDMA Communication Patterns by Means of Simulations, 2015 Joint International Mechanical, Electronic and Information Technology Conference, 2015.12.
52. Takeshi Nanri, Evaluation of On-Demand Message-Passing Module over RDMA Network, ACSI2016, 2016.01.
53. Kazuichi Oe, Takeshi Nanri, KOJI OKAMURA, On-The-Fly Automated Storage Tiering with Caching and both Proactive and Observational Migration, Workshop on Computer Systems and Architectures (CSA'15), 2015.12.
54. Shinji Sumimoto, Yuichiro Ajima, Takafumi Nose, Kazushige Saga, Naoyuki Shida, Takeshi Nanri, Parallel Application Experiences Using Advanced Communication Primitives, 25th Euromicro International Conference on Parallel, Distributed and network-based Processing, 2017.03.