Kyushu University [Tanimoto Teruo (Associate Professor) Faculty of Information Science and Electrical Engineering, Department of Advanced Information Technology]

Tanimoto Teruo

Last modified date: 2023.09.29

Associate Professor / Department of Advanced Information Technology / Faculty of Information Science and Electrical Engineering

Papers

1.	Kuan Yi Ng, Aalaa M. A. Babai, Teruo Tanimoto, Satoshi Kawakami, and Koji Inoue, Empirical Power-Performance Analysis of Layer-wise CNN Inference on Single Board Computers, Journal of Information Processing, 31, 478-494, 2023.08.
2.	Wang Liao, Yasunari Suzuki, Teruo Tanimoto, Yosuke Ueno, and Yuuki Tokunaga, WIT-Greedy: Hardware System Design of Weighted ITerative Greedy Decoder for Surface Code, Proceedings of the 28th Asia and South Pacific Design Automation Conference (ASP-DAC ’23), 2023.01.
3.	Yasunari Suzuki, Takanori Sugiyama, Tomochika Arai, Wang Liao, Koji Inoue, and Teruo Tanimoto, Q3DE: A fault-tolerant quantum computer architecture for multi-bit burst errors by cosmic rays, Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO-55), 1110-1125, 2022.10.
4.	Ilkwon Byun, Junpyo Kim, Dongmoon Min, Ikki Nagaoka, Kosuke Fukumitsu, Iori Ishikawa, Teruo Tanimoto, Masamitsu Tanaka, Koji Inoue, and Jangwoo Kim, XQsim: Modeling Cross-Technology Control Processors for 10+K Qubit Quantum Computers, Proceedings of ACM/IEEE International Symposium on Computer Architecture (ISCA ’22), 366-382, 2022.06.
5.	Koki Ishida, Il-Kwon Byun, Ikki Nagaoka, Kosuke Fukumitsu, Masamitsu Tanaka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Jangwoo Kim, and Koji Inoue, Superconductor Computing for Neural Networks, IEEE Micro, vol.41, no.3, pp.19–26, 2021.05, The superconductor single-flux-quantum (SFQ) logic family has been recognized as a promising solution for the post-Moore era, thanks to the ultrafast and low-power switching characteristics of superconductor devices. Researchers have made tremendous efforts in various aspects, especially in device and circuit design. However, there has been little progress in designing a convincing SFQ-based architectural unit due to a lack of understanding about its potentials and limitations at the architectural level. This article provides the design principles for SFQ-based architectural units with an extremely high-performance neural processing unit (NPU). To achieve our goal, we developed and validated a simulation framework to identify critical architectural bottlenecks in designing a performance-effective SFQ-based NPU. We propose SuperNPU, which outperforms a conventional state-of-the-art NPU by 23 times in terms of computing performance and 1.23 times in power efficiency even with the cooling cost of the 4K environment.
6.	Koki Ishida, Il-Kwon Byun, Ikki Nagaoka, Kosuke Fukumitsu, Masamitsu Tanaka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Jangwoo Kim, Koji Inoue, SuperNPU: Architecting an Extremely Fast Neural Processing Unit Using Superconducting Logic Devices, Proceedings of the 53rd IEEE/ACM International Symposium on Microarchitecture, 10.1109/MICRO50266.2020.00018, 58-72, 2020.10.
7.	Teruo Tanimoto, Shuhei Matsuo, Satoshi Kawakami, Yutaka Tabuchi, Masao Hirokawa, Koji Inoue, How Many Trials Do We Need for Reliable NISQ Computing?, 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 10.1109/isvlsi49217.2020.00059, 288-290, 2020.07.
8.	Teruo Tanimoto, Shuhei Matsuo, Satoshi Kawakami, Yutaka Tabuchi, Masao Hirokawa, Koji Inoue, Practical Error Modeling Toward Realistic NISQ Simulation, 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 10.1109/isvlsi49217.2020.00060, 291-293, 2020.07.
9.	Koki Ishida, Masamitsu Tanaka, Ikki Nagaoka, Takatsugu Ono, Satoshi Kawakami, Teruo Tanimoto, Akira Fujimaki, Koji Inoue, 32 GHz 6.5 mW Gate-Level-Pipelined 4-Bit Processor using Superconductor Single-Flux-Quantum Logic, Proceedings of the IEEE Symposium on VLSI Circuits, 10.1109/vlsicircuits18222.2020.9162826, 1-2, 2020.06.
10.	Keitaro Oka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Enhancing a manycore-oriented compressed cache for GPGPU, Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 10.1145/3368474.3368491, 22-31, 2020.01.
11.	Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Dependence graph model for accurate critical path analysis on out-of-order processors, Journal of Information Processing, 10.2197/ipsjjip.25.983, 25, 983-992, 2017.12, The dependence graph model of out-of-order (OoO) instruction execution is a powerful representation used for the critical path analysis. However, most, if not all, of the previous models are out-of-date and lack enough detail to model modern OoO processors, or are too specific and complicated which limit their generality and applicability. In this paper, we propose an enhanced dependence graph model which remains simple but greatly improves the accuracy over prior models. The evaluation results using the gem5 simulator with configurations similar to Intel’s Haswell and Silvermont architecture show that the proposed enhanced model achieves CPI errors of 2.1% and 4.4% which are 90.3% and 77.1% improvements from the state-of-the-art model.
12.	Teruo Tanimoto, Takatsugu Ono, Koji Inoue, CPCI Stack: Metric for Accurate Bottleneck Analysis on OoO Microprocessors, Proceedings of the Fifth International Symposium on Computing and Networking (CANDAR ’17), 10.1109/CANDAR.2017.60, 166-172, 2017.11.
13.	Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, and Simha Sethumadhavan, Why Do Programs Have Heavy Tails?, Proceedings of the 2017 IEEE International Symposium on Workload Characterization (IISWC ’17), 135-145, 2017.10.
14.	Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Hiroshi Sasaki, Enhanced Dependence Graph Model for Critical Path Analysis on Modern Out-of-Order Processors, IEEE Computer Architecture Letters, 10.1109/LCA.2017.2684813, 16, 2, 111-114, 2017.07, The dependence graph model of out-of-order (OoO) instruction execution is a powerful representation used for the critical path analysis. However most, if not all, of the previous models are out-of-date and lack enough detail to model modern OoO processors, or are too specific and complicated which limit their generality and applicability. In this paper, we propose an enhanced dependence graph model which remains simple but greatly improves the accuracy over prior models. The evaluation results using the gem5 simulator show that the proposed enhanced model achieves CPI error of 2.1 percent which is a 90.3 percent improvement against the state-of-the-art model.
15.	Hiroshi Sasaki, Fang-Hsiang Su, Teruo Tanimoto, Simha Sethumadhavan, Heavy Tails in Program Structure, IEEE Computer Architecture Letters, 10.1109/LCA.2016.2574350, 16, 1, 34-37, 2017.01, Designing and optimizing computer systems require deep understanding of the underlying system behavior. Historically many important observations that led to the development of essential hardware and software optimizations were driven by empirical observations about program behavior. In this paper, we report an interesting property of program structures by viewing dynamic program execution as a changing network. By analyzing the communication network created as a result of dynamic program execution, we find that communication patterns follow heavy-tailed distributions. In other words, a few instructions have consumers that are orders of magnitude larger than most instructions in a program. Surprisingly, these heavy-tailed distributions follow the iconic power law previously seen in man-made and natural networks. We provide empirical measurements based on the SPEC CPU2006 benchmarks to validate our findings as well as perform semantic analysis of the source code to reveal the causes of such behavior.
16.	Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, Jun Tanaka, A Flexible Direct Attached Storage for a Data Intensive Application, IEICE Transactions on Information and Systems, 10.1587/transinf.2015PAP0029, E98D, 12, 2168-2177, 2015.12, Big data analysis and a data storing applications require a huge volume of storage and a high I/O performance. Applications can achieve high level of performance and cost efficiency by exploiting the high I/O performance of direct attached storages (DAS) such as internal HDDs. With the size of stored data ever increasing, it will be difficult to replace servers since internal HDDs contain huge amounts of data. Generally, the data is copied via Ethernet when transferring the data from the internal HDDs to the new server. However, the amount of data will continue to rapidly increase, and thus, it will be hard to make these types of transfers through the Ethernet since it will take a long time. A storage area network such as iSCSI can be used to avoid this problem because the data can be shared with the servers. However, this decreases the level of performance and increases the costs. Improving the flexibility without incurring I/O performance degradation is required in order to improve the DAS architecture. In response to this issue, we propose FlexDAS, which improves the flexibility of direct attached storage by using a disk area network (DAN) without degradation the I/O performance. A resource manager connects or disconnects the computation nodes to the HDDs via the FlexDAS switch, which supports the SAS or SATA protocols. This function enables for the servers to be replaced in a short period of time. We developed a prototype FlexDAS switch and quantitatively evaluated the architecture. Results show that the FlexDAS switch can disconnect and connect the HDD to the server in just 1.16 seconds. We also confirmed that the FlexDAS improves the performance of the data intensive applications by up to 2.84 times compared with the iSCSI.
17.	Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, and Jun Tanaka, FlexDAS: A Flexible Direct Attached Storage for I/O Intensive Applications, Proceedings of IEEE International Conference on Big Data (IEEE BigData ’14), 147-152, 2014.10.
18.	Hiroshi Sasaki, Teruo Tanimoto, Koji Inoue, and Hiroshi Nakamura, Scalability-based Manycore Partitioning, Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT ’12), 107-116, 2012.09.
