Kyushu University Researcher Information
List of Papers
Takatsugu Ono (小野 貴継) Data last updated: 2021.07.15

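Associate Professor / Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering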


Original Papers
1. Koki Ishida, Il-Kwon Byun, Ikki Nagaoka, Kosuke Fukumitsu, Masamitsu Tanaka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Jangwoo Kim, Koji Inoue, Architecting an Extremely Fast Neural Processing Unit Using Superconducting Logic Devices, The 53rd IEEE/ACM International Symposium on Microarchitecture, 58-72, 2020.10.
2. Koki Ishida, Il-Kwon Byun, Ikki Nagaoka, Kosuke Fukumitsu, Masamitsu Tanaka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Jangwoo Kim, Koji Inoue, Superconductor Computing for Neural Networks, IEEE Micro, 41, 3, 19-26, 2021.05.
3. Ikki Nagaoka, Koki Ishida, Masamitsu Tanaka, Kyosuke Sano, Taro Yamashita, Takatsugu Ono, Koji Inoue, Akira Fujimaki, Demonstration of a 52-GHz Bit-Parallel Multiplier Using Low-Voltage Rapid Single-Flux-Quantum Logic, IEEE Transactions on Applied Superconductivity, 31, 5, 1-5, 2021.08.
4. 坂田昂亮, 湯野剛史, 小野貴継, 川邊武俊, 車両軌道計画問題を対象としたマルチコアプロセッサによるC/GMRES演算高速処理実装法, 自動車技術会論文集, 51, 3, 497-502, 2020.05.
5. Koki Ishida, Masamitsu Tanaka, Ikki Nagaoka, Takatsugu Ono, Satoshi Kawakami, Teruo Tanimoto, Akira Fujimaki, Koji Inoue, 32 GHz 6.5 mW Gate-Level-Pipelined 4-bit Processor using Superconductor Single-Flux-Quantum Logic, IEEE 2020 Symposia on VLSI Technology and Circuits, 2020.06.
6. Keitaro Oka, Satoshi Kawakami, Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Enhancing a Manycore-Oriented Compressed Cache for GPGPU, International Conference on High Performance Computing in Asia-Pacific Region, 2020.01.
7. Giorgis Georgakoudis, Nikhil Jain, Takatsugu Ono, Koji Inoue, Shinobu Miwa, Abhinav Bhatele, Evaluating the Impact of Energy Efficient Networks on HPC Workloads, 26th IEEE International Conference on High Performance Computing, Data, and Analytics, 2019.12.
8. Sandeep Kumar, Diksha Moolchandani, Takatsugu Ono and Smruti Sarangi, F-LaaS: A Control-Flow-Attack Immune License-as-a-Service Model, IEEE International Conference on Services Computing, 2019.07.
9. 川上哲志, 小野貴継, 納富雅也, 井上弘士, ナノフォトニック・ニューラルネットワークアクセラレータ向け統合評価環境, 電子情報通信学会論文誌, J102-A, 6, 182-193, 2019.06.
10. Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Critical Path based Microarchitectural Bottleneck Analysis for Out-of-Order Execution, IEICE, E102-A, 6, 758-766, 2019.06.
11. 田中雅光, 佐藤諒, 石田浩貴, 畑中湧貴, 松井祐一, 小野貴継, 井上弘士, 藤巻 朗, 超伝導単一磁束量子回路による50GHzビット並列演算マイクロプロセッサに向けた要素回路設計, 電子情報通信学会論文誌, J101-C, 10, 2864-2877, 2019.06.
12. Takatsugu Ono, Zhe Chen, Koji Inoue, Improving lifetime in MLC phase change memory using slow writes, 2018 Proceedings of the Japan-Africa Conference on Electronics, Communications, and Computations, JAC-ECC 2018, 10.1109/JEC-ECC.2018.8679540, 65-68, 2019.04, This paper reports the performance and endurance impacts of a slow-write approach for a multi-level cell (MLC) of phase change memory (PCM). An MLC improves the density of PCM, but the endurance is a critical issue. To extend the lifetime of the cell, a slow-write approach is one of the techniques that is used. However, the slow-write approach increases the program execution time because it takes a long time. In this paper, we discuss three types of slow-write approach for MLC and evaluate the endurance and performance quantitatively to understand the effectiveness of our approach. Our evaluation results show that one of the approaches enhances the endurance of MLC PCM 1.57 times with a 1.41% performance degradation on average compared with the conventional write operation.
13. Mihiro Sonoyama, Takatsugu Ono, Haruichi Kanaya, Osamu Muta, Smruti R. Sarangi, Koji Inoue, Radio propagation characteristics-based spoofing attack prevention on wireless connected devices, Journal of Information Processing, 10.2197/ipsjjip.27.322, 27, 322-334, 2019.01, A spoofing attack is a critical issue in wireless communication in which a malicious transmitter outside a system attempts to be genuine. As a countermeasure against this, we propose a device-authentication method based on position identification using radio-propagation characteristics (RPCs). Not depending on information processing such as encryption technology, this method can be applied to sensing devices etc. which commonly have many resource restrictions. We call the space from which attacks succeed the “attack space.” In order to confine the attack space inside of the target system to prevent spoofing attacks from the outside, formulation of the relationship between combinations of transceivers and the attack space is necessary. In this research, we consider two RPCs, the received signal strength ratio (RSSR) and the time difference of arrival (TDoA), and construct the attack-space model which uses these RPCs simultaneously. We take a tire pressure monitoring system (TPMS) as a case study of this method and execute a security evaluation based on radio-wave-propagation simulation. The simulation results assuming multiple noise environments all indicate that it is possible to eliminate the attack possibility from a distant location.
14. Satoshi Kawakami, Takatsugu Ono, Toshiyuki Ohtsuka, Koji Inoue, Parallel precomputation with input value prediction for model predictive control systems, IEICE Transactions on Information and Systems, 10.1587/transinf.2018PAP0003, E101D, 12, 2864-2877, 2018.12, We propose a parallel precomputation method for real-time model predictive control. The key idea is to use predicted input values produced by model predictive control to solve an optimal control problem in advance. It is well known that control systems are not suitable for multi- or many-core processors because feedback-loop control systems are inherently based on sequential operations. However, since the proposed method does not rely on conventional thread-/data-level parallelism, it can be easily applied to such control systems without changing the algorithm in applications. A practical evaluation using three real-world model predictive control system simulation programs demonstrates drastic performance improvement without degrading control quality offered by the proposed method.
15. Yusuke Inoue, Takatsugu Ono, Koji Inoue, Real-Time frame-rate control for energy-efficient on-line object tracking, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 10.1587/transfun.E101.A.2297, E101A, 12, 2297-2307, 2018.12, On-line object tracking (OLOT) has been a core technology in computer vision, and its importance has been increasing rapidly. Because this technology is utilized for battery-operated products, energy consumption must be minimized. This paper describes a method of adaptive frame-rate optimization to satisfy that requirement. An energy trade-off occurs between image capturing and object tracking. Therefore, the method optimizes the frame rate based on the constantly changing object speed to minimize the total energy while taking the trade-off into account. Simulation results show a maximum energy reduction of 50.0%, and an average reduction of 35.9% without serious tracking accuracy degradation.
16. Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa, Evaluating energy-efficiency of DRAM channel interleaving schemes for multithreaded programs, IEICE Transactions on Information and Systems, 10.1587/transinf.2017EDP7296, E101D, 9, 2247-2257, 2018.09, The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them is increased. Especially, the capacity of DRAM continues to grow, and it is not rare that DRAM consumes higher power than processors on modern servers. Therefore, a reduction in the DRAM energy consumption is a critical challenge to reduce the system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective to reduce the DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied to modern servers including multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. In order to address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces the DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively.
17. Koki Ishida, Masamitsu Tanaka, Takatsugu Ono, Koji Inoue, Towards Ultra High-Speed Cryogenic Single-Flux-Quantum Computing, IEICE Transactions on Electronics, Vol.E101-C, No.5, 359-369, 2018.05.
18. 川上哲志, 小野貴継, 井上弘士, 納富 雅也, ナノフォトニック・ニューラルアクセラレータ向け性能評価環境の構築, 回路とシステムのワークショップ, 42-47, 2018.05.
19. Koki Ishida, Masamitsu Tanaka, Takatsugu Ono, Koji Inoue, Towards ultra-high-speed cryogenic single-flux-quantum computing, IEICE Transactions on Electronics, 10.1587/transele.E101.C.359, E101C, 5, 359-369, 2018.05, CMOS microprocessors are limited in their capacity for clock speed improvement because of increasing computing power, i.e., they face a power-wall problem. Single-flux-quantum (SFQ) circuits offer a solution with their ultra-fast-speed and ultra-low-power natures. This paper introduces our contributions towards ultra-high-speed cryogenic SFQ computing. The first step is to design SFQ microprocessors. From qualitatively and quantitatively evaluating past-designed SFQ microprocessors, we have found that revisiting the architecture of SFQ microprocessors and on-chip caches is the first critical challenge. On the basis of cross-layer discussions and analysis, we came to the conclusion that a bit-parallel gate-level pipeline architecture is the best solution for SFQ designs. This paper summarizes our current research results targeting SFQ microprocessors and on-chip cache architectures.
20. Mihiro Sonoyama, Takatsugu Ono, Osamu Muta, Haruichi Kanaya, Koji Inoue, Wireless Spoofing-Attack Prevention Using Radio-Propagation Characteristics, Proceedings - 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 2017 IEEE 15th International Conference on Pervasive Intelligence and Computing, 2017 IEEE 3rd International Conference on Big Data Intelligence and Computing and 2017 IEEE Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec 2017, 10.1109/DASC-PICom-DataCom-CyberSciTec.2017.94, 2018-January, 502-510, 2018.03, A spoofing attack is a critical issue in wireless communication in embedded systems in which a malicious transmitter outside a system attempts to be genuine. As a countermeasure against this, we propose a device-authentication method based on position identification using radio-propagation characteristics (RPCs). Since RPCs are natural phenomena, this method does not depend on information processing such as encryption technology. We call the space from which attacks achieve success the "attack space". By formulating the relationship between combinations of transceivers and the attack space, this method can be used in embedded systems. In this research, we consider two RPCs, the received signal strength ratio (RSSR) and the time difference of arrival (TDoA), and construct the attack-space model which uses these RPCs simultaneously for preventing wireless spoofing attacks. We explain the results of a validity evaluation for the proposed model based on radio-wave-propagation simulation assuming free space and a noisy environment.
21. Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Dependence Graph Model for Accurate Critical Path Analysis on Out-of-Order Processors, Journal of Information Processing, Vol.25, 983-992, 2017.12.
22. Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Dependence graph model for accurate critical path analysis on out-of-order processors, Journal of Information Processing, 10.2197/ipsjjip.25.983, 25, 983-992, 2017.12, The dependence graph model of out-of-order (OoO) instruction execution is a powerful representation used for the critical path analysis. However, most, if not all, of the previous models are out-of-date and lack enough detail to model modern OoO processors, or are too specific and complicated which limit their generality and applicability. In this paper, we propose an enhanced dependence graph model which remains simple but greatly improves the accuracy over prior models. The evaluation results using the gem5 simulator with configurations similar to Intel’s Haswell and Silvermont architecture show that the proposed enhanced model achieves CPI errors of 2.1% and 4.4% which are 90.3% and 77.1% improvements from the state-of-the-art model.
23. Mihiro Sonoyama, Takatsugu Ono, Osamu Muta, Haruichi Kanaya, Koji Inoue, Wireless Spoofing-Attack Prevention Using Radio-Propagation Characteristics, IEEE International Conference on Dependable, Autonomic and Secure Computing, 502-510, 2017.11.
24. Teruo Tanimoto, Takatsugu Ono, Koji Inoue, CPCI Stack: Metric for Accurate Bottleneck Analysis on OoO Microprocessors, International Symposium on Computing and Networking, 166-172, 2017.11.
25. Masamitsu Tanaka, Ryo Sato, Yuki Hatanaka, Yuichi Matsui, Hiroyuki Akaike, Akira Fujimaki, Koki Ishida, Takatsugu Ono, Koji Inoue, 1.4-mW, 56-GHz Arithmetic Logic Unit Based on Superconductor Single-Flux-Quantum Logic Circuit, IEEE/ACM International Symposium on Low Power Electronics and Design, 2017.07.
26. Masamitsu Tanaka, Ryo Sato, Yuki Hatanaka, Yuichi Matsui, Hiroyuki Akaike, Akira Fujimaki, Koki Ishida, Takatsugu Ono, Koji Inoue, High-Throughput Bit-Parallel Arithmetic Logic Unit Using Rapid Single-Flux-Quantum Logic, International Superconductive Electronics Conference, 2017.06.
27. 石田 浩貴, 田中雅光, 小野 貴継, 井上 弘士, 単一磁束量子回路向けマイクロプロセッサのアーキテクチャ探索, 情報処理学会論文誌, 58, 3, 629-643, 2017.03.
28. Teruo Tanimoto, Takatsugu Ono, Koji Inoue, Hiroshi Sasaki, Enhanced Dependence Graph Model for Critical Path Analysis on Modern Out-of-Order Processors, IEEE Computer Architecture Letters, PP, 99, 1-1, 2017.03.
29. Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono, Evaluating the Impacts of Code-Level Performance Tunings on Power Efficiency, IEEE International Conference on Big Data (Big Data), 362-369, 2016.12.
30. Koki Ishida, Masamitsu Tanaka, Takatsugu Ono, Koji Inoue, Single-Flux-Quantum Cache Memory Architecture, 13th International SoC Design Conference, 106-107, 2016.10.
31. Yusuke Inoue, Takatsugu Ono, Koji Inoue, Adaptive Frame-Rate Optimization for Energy-Efficient Object Tracking, 20th International Conference on Image Processing, Computer Vision & Pattern Recognition, 158-164, 2016.07.
32. Yoshihiro Tanaka, Keitaro Oka, Takatsugu Ono, Koji Inoue, Accuracy Analysis of Machine Learning-Based Performance Modeling for Microprocessors, Fourth International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC), 87-90, 2016.05.
33. Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono, Evaluating the impacts of code-level performance tunings on power efficiency, Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 10.1109/BigData.2016.7840624, 362-369, 2016.01, As the power consumption of HPC systems will be a primary constraint for exascale computing, a main objective in HPC communities has recently become maximizing power efficiency (i.e., performance per watt) rather than performance. Although programmers have spent considerable effort to improve performance by tuning HPC programs at a code level, tunings for improving power efficiency are now required. In this work, we select two representative HPC programs (Graph500 and SDPARA) and evaluate how traditional code-level performance tunings applied to these programs affect power efficiency. We also investigate the impacts of the tunings on power efficiency at various operating frequencies of CPUs and/or GPUs. The results show that the tunings significantly improve power efficiency, and different types of tunings exhibit different trends in power efficiency by varying CPU frequency. Finally, the scalability and power efficiency of state-of-the-art Graph500 implementations are explored on both a single-node platform and a 960-node supercomputer. With their high scalability, they achieve 27.43 MTEPS/Watt with 129.76 GTEPS on the single-node system and 4.39 MTEPS/Watt with 1,085.24 GTEPS on the supercomputer.
34. Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, Jun Tanaka, A Flexible Direct Attached Storage for a Data Intensive Application, IEICE Transactions on Information and Systems, E98-D, 12, 2168-2177, 2015.12.
35. Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, Jun Tanaka, A flexible direct attached storage for a data intensive application, IEICE Transactions on Information and Systems, 10.1587/transinf.2015PAP0029, E98D, 12, 2168-2177, 2015.12, Big data analysis and data storing applications require a huge volume of storage and a high I/O performance. Applications can achieve a high level of performance and cost efficiency by exploiting the high I/O performance of direct attached storages (DAS) such as internal HDDs. With the size of stored data ever increasing, it will be difficult to replace servers since internal HDDs contain huge amounts of data. Generally, the data is copied via Ethernet when transferring the data from the internal HDDs to the new server. However, the amount of data will continue to rapidly increase, and thus, it will be hard to make these types of transfers through the Ethernet since it will take a long time. A storage area network such as iSCSI can be used to avoid this problem because the data can be shared with the servers. However, this decreases the level of performance and increases the costs. Improving the flexibility without incurring I/O performance degradation is required in order to improve the DAS architecture. In response to this issue, we propose FlexDAS, which improves the flexibility of direct attached storage by using a disk area network (DAN) without degrading the I/O performance. A resource manager connects or disconnects the computation nodes to the HDDs via the FlexDAS switch, which supports the SAS or SATA protocols. This function enables the servers to be replaced in a short period of time. We developed a prototype FlexDAS switch and quantitatively evaluated the architecture. Results show that the FlexDAS switch can disconnect and connect the HDD to the server in just 1.16 seconds. We also confirmed that the FlexDAS improves the performance of the data intensive applications by up to 2.84 times compared with the iSCSI.
36. Takatsugu Ono, Yotaro Konishi, Teruo Tanimoto, Noboru Iwamatsu, Takashi Miyoshi, Jun Tanaka, FlexDAS: A flexible direct attached storage for I/O intensive applications, Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014, 10.1109/BigData.2014.7004224, 147-152, 2015.01, Big data analysis and data storing applications require a huge volume of storage and a high I/O performance. Applications can achieve high levels of performance and cost efficiency by exploiting the high I/O performances of direct attached storages (DAS) such as internal HDDs. With the size of stored data ever increasing, it will be difficult to replace servers since internal HDDs contain huge amounts of data. In response to this issue, we propose FlexDAS, which improves the flexibility of direct attached storage by using a disk area network (DAN) without degrading the I/O performance. We developed a prototype FlexDAS switch and quantitatively evaluated the architecture. Results show that the FlexDAS switch can disconnect and connect the HDD to the server in just 1.16 seconds. The I/O performances of the disks connected via the FlexDAS switch were almost the same as the conventional DAS architecture.
37. 小西 洋太郎, 小野 貴継, 三吉 貴史, ディスクエリアネットワークを用いたオブジェクトストレージの高速なデータ復旧手法, 情報処理学会論文誌コンピューティングシステム(ACS), 6, 4, 38-48, 2013.10.
38. 小野 貴継, 井上 弘士, 村上 和彰, シミュレーション結果の再利用によるキャッシュ・ミス率予測技術, 情報処理学会論文誌, 52, 23, 3172-3183, 2011.12.
39. Takatsugu Ono, Koji Inoue, Kazuaki Murakami, Kenji Yoshida, Reducing On-Chip DRAM Energy via Data Transfer Size Optimization, IEICE Transactions on Electronics, E92-C, 4, 433-443, 2009.04.
40. Takatsugu Ono, Koji Inoue, Kazuaki Murakami, Kenji Yoshida, Reducing On-Chip DRAM energy via data transfer size optimization, IEICE Transactions on Electronics, 10.1587/transele.E92.C.433, E92-C, 4, 433-443, 2009.01, This paper proposes a software-controllable variable line-size (SC-VLS) cache architecture for low power embedded systems. High bandwidth between logic and a DRAM is realized by means of advanced integrated technology. System-in-Silicon is one of the architectural frameworks to realize the high bandwidth. An ASIC and a specific SRAM are mounted onto a silicon interposer. Each chip is connected to the silicon interposer by eutectic solder bumps. In the framework, it is important to reduce the DRAM energy consumption. The specific DRAM needs a small cache memory to improve the performance. We exploit the cache to reduce the DRAM energy consumption. During application program executions, an adequate cache line size which produces the lowest cache miss ratio is varied because the amount of spatial locality of memory references changes. If we employ a large cache line size, we can expect the effect of prefetching. However, the DRAM energy consumption is larger than with a small line size because a huge number of banks are accessed. The SC-VLS cache is able to change a line size to an adequate one at runtime with small area and power overheads. We analyze the adequate line size and insert line size change instructions at the beginning of each function of a target program before executing the program. In our evaluation, it is observed that the SC-VLS cache reduces the DRAM energy consumption up to 88%, compared to a conventional cache with fixed 256 B lines.
41. 小野 貴継, 井上 弘士, 村上 和彰, メモリアクセスの特徴を活用した高速かつ正確なメモリアーキテクチャ・シミュレーション法, 情報処理学会論文誌コンピューティングシステム(ACS), 48, 13, 303-313, 2007.08.
