Kyushu University Researcher Information
List of Papers
南里 豪志 (Takeshi Nanri) Last updated: 2023.12.06

Associate Professor / 情報基盤研究開発センター 先端計算科学研究部門 大学院システム情報科学研究院 情報知能工学部門


Original Papers
1. 森江善之, 末安直樹, 松本透, 南里豪志, 石畑宏明, 井上弘士, 村上和彰, 通信タイミングを考慮した衝突削減のためのMPIランク配置最適化技術, 情報処理学会論文誌, 48, SIG13(ACS19), 192-202, 2007.08.
2. 曽我 武史, 栗原 康志, 南里 豪志, 黒川 原佳, 村上 和彰, 負荷バランスの動的最適化によるMPIブロードキャスト性能改善, 情報処理学会論文誌 コンピュータシステム, Vol. 1, No. 3, pp. 67-82, 2008.12.
3. 森江 善之, 末安 直樹, 松本 透, 南里 豪志, 石畑 宏明, 井上 弘士, 村上 和彰, 衝突削減のためのタスク配置最適化に関する研究, 次世代スーパーコンピューティング・シンポジウム2007, 2007, 2007.10.
4. 森江善之, 南里豪志, 直接網において複数の通信デバイスを有効に使用する隣接通信アルゴリズムの提案, 情報処理学会論文誌トランザクション コンピューティングシステム(Web), 8, 4, 26-35 (WEB ONLY), 2015.11.
5. 大江 和一, 岩田 聡, 南里 豪志, 岡村 耕二, 性能向上を期待できる継続時間とIOアクセス数を満たしたIOアクセス集中領域を自動抽出してSSDに移動することで性能向上を図る階層型ストレージシステムの提案と評価, 情報処理学会論文誌, 9, 1, 1-16, 2016.01.
6. 森江 善之, 南里 豪志, 多次元メッシュ/トーラスにおける通信衝突を考慮したタスク配置最適化技術, 情報処理学会, 6, 3, 12-21, 2013.09.
7. 森江善之, 南里豪志, 多次元メッシュ/トーラスにおける通信衝突を考慮したタスク配置最適化技術, 情報処理学会論文誌トランザクション コンピューティングシステム(Web), 6, 3, 12-21 (WEB ONLY), 2013.09.
8. 藤野 清次, 小玉 捷平, 南里 豪志, 岩里 洸介, 同種コンパイラーと他機種実行を利用した計算時間の短縮, 日本シミュレーション学会論文誌, 8, 1, 21-24, 2016.01.
9. 南里 豪志, 佐藤周行, 島崎眞昭, 分散共有メモリシステム上にソフトウェアによって構築されたキャッシュシステムの静的制御, 情報処理学会論文誌, Vol. 38, No. 9, pp. 1859-1868, 1997.09.
10. 岩里 洸介, 南里 豪志, 藤野 清次, 並列計算における reduction指示の実装に関する考察, 日本シミュレーション学会論文誌, 7, 4, 109-113, 2015.07.
11. 南里 豪志, 「京」の後の時代を支えるスパコン:5.多数のXeonプロセッサを用いるスパコン, 情報処理, 60, 12, 1198-1203, 2019.11.
12. Yoshiyuki Morie, Takeshi Nanri, Task Allocation Optimization for Neighboring Communication on Fat Tree, 4th IEEE International Conference on High Performance Computing and Communication 9th IEEE International Conference on Embedded Software and Systems, HPCC-ICESS 2012, 10.1109/HPCC.2012.179, 1219-1225, 2012.01.
13. Yoshiyuki Morie, Takeshi Nanri, Motoyoshi Kurokawa, Task Allocation Method for Avoiding Contentions by the Information of Concurrent Communication, The Tenth IASTED International Conference on Parallel and Distributed Computing and Networks, 10.2316/P.2011.719-025, 62-69, 2011.02.
14. Takeshi Nanri, Hiroyuki Sato and Masaaki Shimasaki, Portability in Implementing Distributed Shared Memory System on the Workstation Cluster Environment, Research Reports on Information Science and Electrical Engineering of Kyushu University, Vol. 2, No. 2, pp. 185-190, 1997.03.
15. FUKAZAWA Keiichiro, Takeshi Nanri, Takayuki Umeda, Performance evaluation of magnetohydrodynamics simulation for magnetosphere on K computer, Communications in Computer and Information Science, 10.1007/978-3-642-45037-2_61, 2013.12.
16. Hyacinthe Nzigou Mamadou, Takeshi Nanri and Kazuaki Murakami, Performance Models for MPI Collective Communications with Network Contention, IEICE Transactions on Communications, Vol. E91-B, No. 4, pp. 1015-1024, 2008.05.
17. FUKAZAWA Keiichiro, Takeshi Nanri, Takayuki Umeda, Performance Measurements of MHD Simulation for Planetary Magnetosphere on Peta-Scale Computer FX10, Advances in Parallel Computing, 10.3233/978-1-61499-381-0-387, 2014.03.
18. Keiichiro Fukazawa, Takeshi Soga, Takayuki Umeda, Takeshi Nanri, Performance Evaluation and Optimization of MagnetoHydroDynamic Simulation for Planetary Magnetosphere with Xeon Phi KNL, Parallel Computing is Everywhere, 10.3233/978-1-61499-843-3-178, 178-187, 2018.01, The magnetohydrodynamic (MHD) simulation is often applied to study the global dynamics and configuration of a planetary magnetosphere for the space weather. In this paper, the computational performance of MHD code is evaluated with 128 nodes Xeon Phi KNL of Cray XC40. As the results, the 2D and 3D domain decompositions of SoA (structure of array) make the effective performances and AoS (array of structure) and hybrid parallel computation become low performances. Adding the performance optimizations for Xeon Phi to our MHD simulation code, then we have obtained 2.4 % increase of execution efficiency in total and we achieved 3 TFlops performance gain using 128 nodes.
19. 松本 幸, 安達 知也, 住元 真司, 曽我 武史, 南里 豪志, 宇野 篤也, 黒川 原佳, 庄司 文由, 横川 三津夫, MPI_Allreduceの「京」上での実装と評価, 情報処理学会 ACS論文誌, 40, accepted, 2012.09.
20. Yoshiyuki Morie, Takeshi Nanri, Implementation of Neighbor Communication Algorithm Using Multi-NICs Effectively by Extended RDMA Interface, SC13 Technical Posters, 1-2, 2013.11.
21. Kazuichi Oe, Takeshi Nanri, Koji Okamura, Hybrid storage system consisting of cache drive and multi-tier SSD for improved IO access when IO is concentrated, IEICE Transactions on Information and Systems, 10.1587/transinf.2018EDP7253, E102D, 9, 1715-1730, 2019.01, In previous studies, we determined that workloads often contain many input-output (IO) concentrations. Such concentrations are aggregations of IO accesses. They appear in narrow regions of a storage volume and continue for durations of up to about an hour. These narrow regions occupy a small percentage of the logical unit number capacity, include most IO accesses, and appear at unpredictable logical block addresses. We investigated these workloads by focusing on page-level regularity and found that they often include few regularities. This means that simple caching may not reduce the response time for these workloads sufficiently because the cache migration algorithm uses page-level regularity. We previously developed an on-the-fly automated storage tiering (OTFAST) system consisting of an SSD and an HDD. The migration algorithm identifies IO concentrations with moderately long durations and migrates them from the HDD to the SSD. This means that there is little or no reduction in the response time when the workload includes few such concentrations. We have now developed a hybrid storage system consisting of a cache drive with an SSD and HDD and a multi-tier SSD that uses OTFAST, called "OTF-AST with caching." The OTF-AST scheme handles the IO accesses that produce moderately long duration IO concentrations while the caching scheme handles the remaining IO accesses. Experiments showed that the average response time for our system was 45% that of Facebook FlashCache on a Microsoft Research Cambridge workload.
22. 南里 豪志, HPCにおける通信ライブラリの動向, シミュレーション, 36, 2, 79-84, 2017.06.
23. Matthew Livesey, James Francis Stack, Jr., Fumie Costen, Takeshi Nanri, Norimasa Nakashima, Seiji FUJINO, Development of a CUDA Implementation of the 3D FDTD Method, IEEE Antennas and Propagation Magazine, 10.1109/MAP.2012.6348145, 54, 5, 186-195, 2012.10.
24. Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji, Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets, Lecture Notes in Computer Science, 11203, 243-257, 2019.01.
25. Kenji Ono, Jorji Nonaka, Hiroyuki Yoshikawa, Takeshi Nanri, Yoshiyuki Morie, Tomohiro Kawanabe, Fumiyoshi Shoji, Design of a Flexible In Situ Framework with a Temporal Buffer for Data Processing and Visualization of Time-Varying Datasets., High Performance Computing - ISC High Performance 2018 International Workshops, Frankfurt/Main, Germany, June 28, 2018, Revised Selected Papers, 10.1007/978-3-030-02465-9_17, 243-257, 2018.06.
26. Tetsuya Nakatoh, Sachio Hirokawa, Toshiro Minami, Takeshi Nanri, Miho Funamori, Assessing the Significance of Scholarly Articles using their Attributes, Proc. of the 22nd International Symposium on Artificial Life and Robotics (AROB2017), 742-746, 2017.01.
27. Takeshi Nanri, Approaches for memory-efficient communication library and runtime communication optimization, Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, 10.1007/978-981-13-1924-2_7, 121-138, 2018.12, This article summarizes the works established in Advanced Communication for Exa (ACE) project. The most important motivation of this project was the severe demands for scalable communication toward Exa-scale computations. Therefore, in the project, we have built a PGAS-based communication library, Advanced Communication Primitives (ACP). Its fundamental communication model is one-sided, based on PGAS model, so that it can consume internal memory footprint as small as possible. Based on this model, several applications including simulations of magnetohydrodynamic, molecular orbitals, and particles were tuned to achieve higher scalability. In addition to that, some communication optimization techniques have been investigated. Especially, tuning methods of collective communications, such as message ordering, algorithm selection, and overlapping, are studied. Also, in this project, a network simulator NSIM-ACE is developed. It simulates behavior of packets for one-sided communications to study the effects of congestions on interconnects.
28. Tetsuya Nakatoh, Kenta Nagatani, Toshiro Minami, Sachio Hirokawa, Takeshi Nanri, Miho Funamori, Analysis of the Quality of Academic Papers by the Words in Abstracts, HIMI 2017, Part II, LNCS 10274, Proc. of the 19th International Conference on Human-Computer Interaction (HCI International 2017), 2017.07.
29. Kazuichi Oe, Mitsuru Sato, Takeshi Nanri, ATSMF: Automated tiered storage with fast memory and slow flash storage to improve response time with concentrated input-output (IO) workloads, IEICE Transactions on Information and Systems, 10.1587/transinf.2018PAP0005, E101D, 12, 2889-2901, 2018.12, The response times of solid state drives (SSDs) have decreased dramatically due to the growing use of non-volatile memory express (NVMe) devices. Such devices have response times of less than 100 micro seconds on average. The response times of all-flash-array systems have also decreased dramatically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads tend to contain many input-output (IO) concentrations, which are aggregations of IO accesses. They target narrow regions of the storage volume and can continue for up to an hour. These narrow regions occupy a few percent of the logical unit number capacity, are the target of most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response times for such workloads, we developed an automated tiered storage system called “automated tiered storage with fast memory and slow flash storage” (ATSMF) in which the data in targeted regions are migrated between storage devices depending on the predicted remaining duration of the concentration. The assumed environment is a server with non-volatile memory and directly attached SSDs, with the user applications executed on the server as this reduces the average response time. Our system predicts the effect of migration by using the previously monitored values of the increase in response time during migration and the change in response time after migration. These values are consistent for each type of workload if the system is built using both non-volatile memory and SSDs. In particular, the system predicts the remaining duration of an IO concentration, calculates the expected response-time increase during migration and the expected response-time decrease after migration, and migrates the data in the targeted regions if the sum of response-time decrease after migration exceeds the sum of response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage only and that its memory access ratio is more than 50%.
30. Yoshiyuki Morie, Takeshi Nanri, A Neighbor Communication Algorithm with Making an Effective Use of NICs on Multidimensional-Mesh/torus, International Conference on Simulation Technology (JSST2013), JSST2013, 1-2, 2013.09.
31. Yoshiyuki Morie, Takeshi Nanri, Ryutaro Susukita, A Method for Predicting a Penalty of Contentions by Considering Priorities of Routing among Packets on Direct Interconnection Network, 2011 Fourth International Joint Conference on Computational Sciences and Optimization, 10.1109/CSO.2011.35, 263-267, 2011.04.
