Kyushu University Academic Staff Educational and Research Activities Database
List of Presentations
Takatsugu Ono Last modified date: 2023.06.28

Associate Professor / Department of Advanced Information Technology / Faculty of Information Science and Electrical Engineering

1. Masamitsu Tanaka, Ikki Nagaoka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Akira Fujimaki, High-throughput single-flux-quantum circuits based on gate-level-pipelining toward artificial intelligence applications, 15th Superconducting SFQ VLSI Workshop (SSV) / 4th Workshop on Quantum and Classical Cryogenic Devices, Circuits, and Systems (QCCC), 2022.09.
2. Takumi Inaba, Takatsugu Ono, Koji Inoue, Satoshi Kawakami, Evaluating floating-point multipliers with opto-electrical hybrid circuits, ACM International Conference on Computing Frontiers, 2023.05.
3. Takatsugu Ono, An online program identification technology for a secure processor, Workshop on Secure Society in Future, 2021.07.
4. Koki Ishida, Masamitsu Tanaka, Takatsugu Ono, Koji Inoue, Prototype Design of 31 GHz Single-Flux-Quantum Gate-Level-Pipelined Microprocessor, 12th Superconducting SFQ VLSI Workshop, 2019.01.
5. Takatsugu Ono, Hardware-based malware detection for IoT microprocessors, The 6th International Workshop on Cyber Security-Workshop on IoT Security: Secure Smart Homes, 2018.09.
6. Ghadeer Almusaddar, Takatsugu Ono, Smruti Sarangi, Koji Inoue, Whitelisting Approach Using Hardware Performance Counters in IoT Microprocessors, IEICE Tech. Rep., 2018.04.
7. Masamitsu Tanaka, Yuki Hatanaka, Yuichi Matsui, Ikki Nagaoka, Koki Ishida, Kyosuke Sano, Taro Yamashita, Takatsugu Ono, Koji Inoue, Akira Fujimaki, 30GHz Operation of Datapath for Bit-Parallel, Gate-Level-Pipelined Rapid Single-Flux-Quantum Microprocessors, Applied Superconductivity Conference, 2018.10.
8. Takatsugu Ono, A Network Simulator for On/Off Links of Large-Scale Interconnection Networks, NII Shonan Meeting Seminar 134 Advances in Heterogeneous Computing from Hardware to Software, 2018.09.
9. Ghadeer Almusaddar, Takatsugu Ono, Smruti Sarangi, Koji Inoue, Whitelisting Approach Using Hardware Performance Counters in IoT Microprocessors, IEICE Tech. Rep., 2018.04.
10. Teruo Tanimoto, Takatsugu Ono, Koji Inoue, CPCI Stack: Metric for Accurate Bottleneck Analysis on OoO Microprocessors, 5th International Symposium on Computing and Networking, CANDAR 2017, 2018.04, Correctly understanding microarchitectural bottlenecks is important for optimizing the performance and energy of OoO (Out-of-Order) processors. Although the CPI (Cycles Per Instruction) stack has been used for this purpose, it stacks architectural events heuristically by counting how many times the events occur, and the order of stacking affects the result, which may be misleading. This is because the CPI stack does not consider the execution path of dynamic instructions. Critical path analysis (CPA) is a well-known method to identify the critical execution path of dynamic instruction execution on OoO processors. The critical path consists of the sequence of events that determines the execution time of a program on a certain processor. We develop a novel representation, the CPCI stack (Cycles Per Critical Instruction stack), which is a CPI stack based on CPA. The main challenge in constructing the CPCI stack is how to analyze a large number of paths, because CPA often results in numerous critical paths. In this paper, we show that there are more than ten to the tenth power critical paths in the execution of only one thousand instructions in 35 benchmarks out of 48 from SPEC CPU2006. Then, we propose a statistical method to analyze all the critical paths and show a case study using the benchmarks.
11. Takatsugu Ono, Yuta Kakibuka, Nikhil Jain, Abhinav Bhatele, Shinobu Miwa, Koji Inoue, Extending A Network Simulator for Power/Performance Prediction of Large Scale Interconnection Networks, SIAM Conference on Parallel Processing for Scientific Computing, 2018.03.
12. Takatsugu Ono, Secure Computing Platform for IoT Devices, The 6th International Cybersecurity Workshop, 2018.01.
13. Takatsugu Ono, Protecting an IoT Device from Malware - A Processor Architecture Perspective, Workshop on Architectural Implications of Security in IoT Processors, 2017.11.
14. Masamitsu Tanaka, Ryo Sato, Yuki Hatanaka, Yuichi Matsui, Hiroyuki Akaike, Akira Fujimaki, Koki Ishida, Takatsugu Ono, Koji Inoue, High-Throughput Bit-Parallel Arithmetic Logic Unit Using Rapid Single-Flux-Quantum Logic, International Superconductive Electronics Conference, 2017.06.
15. Koki Ishida, Masamitsu Tanaka, Takatsugu Ono, Koji Inoue, Logic Design of a Single-Flux-Quantum Gate-Level-Pipelined Microprocessor, Superconducting SFQ VLSI Workshop, 2017.02.
16. Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa, Power-Efficient Breadth-First Search with DRAM Row Buffer Locality-Aware Address Mapping, 2016 High Performance Graph Data Management and Processing, HPGDMP 2016, 2017.01, Graph analysis applications have been widely used in real services such as road-traffic analysis and social network services. Breadth-first search (BFS) is one of the most representative algorithms for such applications; therefore, many researchers have tuned it to maximize performance. On the other hand, owing to the strict power constraints of modern HPC systems, it is necessary to improve power efficiency (i.e., performance per watt) when executing BFS. In this work, we focus on the power efficiency of DRAM and investigate the memory access pattern of a state-of-the-art BFS implementation using a cycle-accurate processor simulator. The results reveal that the conventional address mapping schemes of modern memory controllers do not efficiently exploit row buffers in DRAM. Thus, we propose a new scheme called per-row channel interleaving and improve the DRAM power efficiency by 30.3% compared to a conventional scheme for a certain simulator setting. Moreover, we demonstrate that this proposed scheme is effective for various configurations of memory controllers.
17. Koki Ishida, Masamitsu Tanaka, Takatsugu Ono, Koji Inoue, Single-flux-quantum cache memory architecture, 13th International SoC Design Conference, ISOCC 2016, 2016.12, Single-flux-quantum (SFQ) logic is a promising technology for realizing microprocessors that operate at over 100 GHz, owing to its ultra-high-speed and ultra-low-power nature. Although previous work has demonstrated a prototype of an SFQ microprocessor, the SFQ-based L1 cache memory has not been well optimized: it suffers from a large access latency and strictly limited scalability. This paper proposes a novel SFQ cache architecture to support fast accesses. The sub-arrayed structure applied to the cache provides better scalability in terms of capacity. Evaluation results show that the proposed cache achieves 1.8x faster access.
18. Masamitsu Tanaka, Ryo Sato, Yuki Hatanaka, Yuki Ando, Takahiro Kawaguchi, Koki Ishida, Akira Fujimaki, Kazuyoshi Takagi, Naofumi Takagi, Takatsugu Ono, Koji Inoue, Energy-Efficient, High-Performance Microprocessors Based on Single-Flux-Quantum Logic, 29th International Symposium on Superconductivity, 2016.12.
19. Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa, Power-Efficient Breadth-First Search with DRAM Row Buffer Locality-Aware Address Mapping, 1st High Performance Graph Data Management and Processing workshop, 2016.11.
20. Yoshihiro Tanaka, Keitaro Oka, Takatsugu Ono, Koji Inoue, Accuracy analysis of machine learning-based performance modeling for microprocessors, 4th International Japan-Egypt Conference on Electronic, Communication and Computers, JEC-ECC 2016, 2016.07, This paper analyzes the accuracy of performance models generated by a machine learning-based empirical modeling methodology. Although the accuracy strongly depends on the quality of the learning procedure, it is not clear which learning algorithms and training data sets (or features) should be used. This paper comprehensively explores the learning space of processor performance modeling as a case study. We focus on static architectural parameters, such as cache size and clock frequency, as the training data set. Experimental results show that tree-based non-linear regression modeling is superior to stepwise linear regression modeling. Another observation is that clock frequency is the most important feature for improving prediction accuracy.
21. Yusuke Inoue, Takatsugu Ono, Koji Inoue, Adaptive Frame-Rate Optimization for Energy-Efficient Object Tracking, The 20th International Conference on Image Processing, Computer Vision & Pattern Recognition, 2016.07.
22. Koji Inoue, Yuichi Inadomi, Takatsugu Ono, Challenges in Power Constrained High Performance Computing, 2nd Annual Meeting on Advanced Computing System and Infrastructure, 2016.01.
23. Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, Jun Tanaka, FlexDAS: A Flexible Direct Attached Storage for I/O Intensive Applications, IEEE International Conference on Big Data, 2014.10.
24. Teruo Tanimoto, Takatsugu Ono, Kohta Nakashima, Takashi Miyoshi, Hardware-assisted scalable flow control of shared receive queue, 28th ACM International Conference on Supercomputing, ICS 2014, 2014.01, The total number of processor cores in supercomputers is increasing while memory size per core is decreasing due to the adoption of processors with multiple cores. Shared Receive Queue is a technique that effectively reduces the memory usage of buffers, but the absence of flow control results in excess buffer pools. We propose a hardware-assisted flow control that reduces flow control latency by 95.1%, thus enabling scalable supercomputers with multi-core processors.
25. Takatsugu Ono, Koji Inoue, Kazuaki Murakami, Adaptive cache-line size management on 3D integrated microprocessors, 2009 International SoC Design Conference, ISOCC 2009, 2009.12, Memory bandwidth can be dramatically improved by stacking the main memory (DRAM) on processor cores and connecting them with wide on-chip buses composed of through-silicon vias (TSVs). 3D stacking makes it possible to reduce the cache miss penalty because a large amount of data can be transferred from the main memory to the cache at a time. If a large cache line size is employed, we can expect a prefetching effect. However, it might worsen system performance if programs do not have enough spatial locality in their memory references. To solve this problem, we introduce a software-controllable variable line-size cache scheme. In this paper, we apply it to an L1 data cache with a 3D stacked DRAM organization. In our evaluation, our approach reduces the energy consumption of the L1 data cache and stacked DRAM by up to 75% compared to a conventional cache.