Kyushu University Academic Staff Educational and Research Activities Database
List of Papers
Katsuki Fujisawa Last modified date: 2024.02.02

Professor / Division for Intelligent Societal Implementation of Mathematical Computation / Institute of Mathematics for Industry


Papers
1. Hiroki Ishikura, Nariaki Tateiwa, Shingo Egi, Issa Oe, Nozomi Hata, Toru Mitsutake, Keiichiro Yamamura, Miyu Fujii, Katsuki Fujisawa, Scheduling system for automated storage and retrieval system with multiple machines using a time-expanded network, Japan Journal of Industrial and Applied Mathematics, 10.1007/s13160-023-00619-1, 2023.10.
2. Diversified Adversarial Attacks based on Conjugate Gradient Method. Deep learning models are vulnerable to adversarial examples, and adversarial attacks used to generate such examples have attracted considerable research interest. Although existing methods based on steepest descent have achieved high attack success rates, ill-conditioned problems occasionally reduce their performance. To address this limitation, we utilize the conjugate gradient (CG) method, which is effective for this type of problem, and propose a novel attack algorithm inspired by the CG method, named the Auto Conjugate Gradient (ACG) attack. The results of large-scale evaluation experiments conducted on the latest robust models show that, for most models, ACG was able to find more adversarial examples with fewer iterations than the existing state-of-the-art algorithm Auto-PGD (APGD). We investigated the difference in search performance between ACG and APGD in terms of diversification and intensification, and defined a measure called the Diversity Index (DI) to quantify the degree of diversity. The analysis of diversity using this index shows that the more diverse search of the proposed method remarkably improves its attack success rate..
3. Nariaki Tateiwa, Yuji Shinano, Keiichiro Yamamura, Akihiro Yoshida, Shizuo Kaji, Masaya Yasuda, Katsuki Fujisawa, CMAP-LAP: Configurable Massively Parallel Solver for Lattice Problems, 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), 10.1109/hipc53243.2021.00018, 42-52, 2021.12, Lattice problems are a class of optimization problems that are notably hard. There are no classical or quantum algorithms known to solve these problems efficiently. Their hardness has made lattices a major cryptographic primitive for post-quantum cryptography. Several different approaches have been used for lattice problems with different computational profiles; some suffer from super-exponential time, and others require exponential space. This motivated us to develop a novel lattice problem solver, CMAP-LAP, based on the clever coordination of different algorithms that run massively in parallel. With our flexible framework, heterogeneous modules run asynchronously in parallel on a large-scale distributed system while exchanging information, which drastically boosts the overall performance. We also implement full checkpoint-and-restart functionality, which is vital to high-dimensional lattice problems. CMAP-LAP facilitates the implementation of large-scale parallel strategies for lattice problems since all the functions are designed to be customizable and abstract. Through numerical experiments with up to 103,680 cores, we evaluated the performance and stability of our system and demonstrated its high capability for future massive-scale experiments..
4. Akira Tanaka, Chansu Han, Takeshi Takahashi, Katsuki Fujisawa, Internet-Wide Scanner Fingerprint Identifier Based on TCP/IP Header, The 4th IEEE International Symposium on Future Cyber Security Technologies (FCST 2021), in conjunction with The 8th International Conference on Internet of Things: Systems, Management and Security (IoTSMS 2021), Gandia, Spain, December 6-9, 2021, 10.1109/FMEC54266.2021.9732414, 1-6, 2021.12.
5. Akira Tanaka, Nariaki Tateiwa, Nozomi Hata, Akihiro Yoshida, Takashi Wakamatsu, Shota Osafune, Katsuki Fujisawa, Offline map matching using time-expanded graph for low-frequency data, Transportation Research Part C: Emerging Technologies, 10.1016/j.trc.2021.103265, 130, 2021.09, Map matching is an essential preprocessing step for most trajectory-based intelligent transport system services. Due to device capability constraints and the lack of a high-performance model, map matching for low-sampling-rate trajectories is of particular interest. Therefore, we developed a time-expanded graph matching (TEG-matching) that has three advantages (1) high speed and accuracy, as it is robust for spatial measurement error and a pause such as at traffic lights; (2) being parameter-free, that is, our algorithm has no predetermined hyperparameters; and (3) only requiring ordered locations for map matching. Given a set of low-frequency GPS data, we construct a time-expanded graph (TEG) whose path from source to sink represents a candidate route. We find the shortest path on TEG to obtain the matching route with a small area between the vehicle trajectory. Additionally, we introduce two general speedup techniques (most map matching methods can apply) bottom-up segmentation and fractional cascading. Numerical experiments with worldwide vehicle trajectories in a public dataset show that TEG-matching outperforms state-of-the-art algorithms in terms of accuracy and speed, and we verify the effectiveness of the two general speedup techniques..
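The time-expanded-graph idea can be sketched as a shortest-path search over (time step, candidate location) nodes. The graph, weights, and node names below are hypothetical toy data, not the paper's construction; edge weights stand in for the area between the trajectory and a candidate route:

```python
from heapq import heappush, heappop

def shortest_path(graph, src, dst):
    """Dijkstra on a small time-expanded graph whose nodes are (time, place)."""
    dist = {src: 0.0}
    prev = {}
    pq = [(0.0, src)]
    while pq:
        d, u = heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

# Toy TEG: three GPS fixes (t = 0, 1, 2), candidate road points "A"/"B" per fix.
teg = {
    "src":    [((0, "A"), 0.2), ((0, "B"), 0.8)],
    (0, "A"): [((1, "A"), 0.1), ((1, "B"), 0.9)],
    (0, "B"): [((1, "A"), 0.5), ((1, "B"), 0.2)],
    (1, "A"): [((2, "B"), 0.3)],
    (1, "B"): [((2, "B"), 0.6)],
    (2, "B"): [("snk", 0.0)],
}
cost, route = shortest_path(teg, "src", "snk")
print(cost, route)
```

The source-to-sink path visits exactly one candidate per time step, which is how a path in the TEG encodes a whole matching route.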
6. Obstacle-Avoidable G2-Continuous Trajectory Generated with a Clothoid Spline Solution. This research generates trajectories with continuous curvature and applies them to blending cases. The clothoid spline has an important property: its curvature is linearly related to its arclength, which provides curvature (G2) continuity in the interpolation operation. However, it is rarely used in trajectory planning because its shape is hard to control for obstacle avoidance. This paper uses a collision-free corridor to constrain the generated trajectory so that obstacles are avoided. In a transformable system that connects line segments with clothoid spiral segments, the computational complexity can be greatly reduced by setting the starting conditions to zero. With this computation, a unique solution for the spiral segment is found. The computation process and the corresponding algorithm are elaborated in this paper..
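The defining property used here, curvature varying linearly with arclength, can be illustrated by numerically integrating a clothoid. This is a toy sketch; the paper's corridor construction and boundary conditions are not reproduced:

```python
import math

def clothoid_points(s_len, kappa0, sharpness, n=1000):
    """Integrate a clothoid whose curvature grows linearly with arclength:
    kappa(s) = kappa0 + sharpness * s.  That linearity is exactly what
    gives G2 (curvature) continuity when segments are joined."""
    ds = s_len / n
    x = y = 0.0
    pts = [(0.0, 0.0)]
    for i in range(n):
        s = (i + 0.5) * ds                                  # midpoint rule
        theta = kappa0 * s + sharpness * s * s / 2.0        # heading angle
        x += ds * math.cos(theta)
        y += ds * math.sin(theta)
        pts.append((x, y))
    return pts

# Zero sharpness degenerates to a straight line; a pure spiral bends away smoothly.
line = clothoid_points(1.0, 0.0, 0.0)
spiral = clothoid_points(1.0, 0.0, math.pi)
print(line[-1], spiral[-1])
```

Because both curvature and heading are continuous functions of arclength, joining a line segment (curvature 0) to a clothoid starting at curvature 0 keeps the composite curve G2 continuous.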
7. Huiqiao Ren, Katsuki Fujisawa, G2 B-Spline Computation for Continuous Trajectory Generation., 2021 6th Asia-Pacific Conference on Intelligent Robot Systems, ACIRS 2021, 10.1109/ACIRS52449.2021.9519319, 1-7, 2021.04, This paper introduces a scheme of B-spline calculation used for G2-continuous trajectory generation. The B-spline is composed of line segments, Euler spirals, and arcs. By simplifying and exploring the properties of Euler spirals, we discover a valid spiral segment criterion. This criterion can be used to decide how to compose the B-spline. One main trait of the trajectory is that the B-spline trajectory is assumed and generated inside a collision-free corridor in every interpolation scenario. Hence, the trajectory can be G2 continuous and avoid immediate obstacles in every scenario..
8. Huiqiao Ren, Fulin Zhou, Katsuki Fujisawa, Real-Time Automatic Anomaly Detection Approach Designed for Electrified Railway Power System., 2021 7th International Conference on Mechatronics and Robotics Engineering, ICMRE 2021, 10.1109/ICMRE51691.2021.9384838, 116-120, 2021.02, An automatic and intelligent scheme for detecting abnormal electrical processes is crucial for protecting the stability and power quality of an electrical power system and, further, the operation of the future grid. This paper introduces an automatic monitoring system for electrified railway power systems and designs a framework based on a convolutional neural network for abnormal electrical process detection, integrating data processing, feature extraction, and classification into one model. Inception blocks are then introduced as a kernel-wise approach to boost performance. Applied to data from a railway electrification system, the scheme achieves a high abnormal electrical process recognition rate of 97%..
9. Nariaki Tateiwa, Yuji Shinano, Satoshi Nakamura, Akihiro Yoshida, Shizuo Kaji, Masaya Yasuda, Katsuki Fujisawa, Massive parallelization for finding shortest lattice vectors based on ubiquity generator framework., The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), held 15-20 November 2020 in Atlanta, GA, USA., 10.1109/SC41405.2020.00064, 2020-November, 60-60, 2020.11, Lattice-based cryptography has received attention as a next-generation encryption technique, because it is believed to be secure against attacks by classical and quantum computers. Its essential security depends on the hardness of solving the shortest vector problem (SVP). In cryptography, to determine security levels, it is becoming significantly more important to estimate the hardness of the SVP by high-performance computing. In this study, we develop the world's first distributed and asynchronous parallel SVP solver, the MAssively Parallel solver for SVP (MAP-SVP). It can parallelize algorithms for solving the SVP by applying the Ubiquity Generator framework, which is a generic framework for branch-and-bound algorithms. The MAP-SVP is suitable for massive-scale parallelization, owing to its small memory footprint, low communication overhead, and rapid checkpoint and restart mechanisms. We demonstrate the performance and scalability of the MAP-SVP by using up to 100,032 cores to solve instances of the Darmstadt SVP Challenge..
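For intuition about the SVP itself (not about MAP-SVP's parallel branch-and-bound), a brute-force enumeration over small integer coefficient vectors finds the shortest nonzero vector of a toy two-dimensional lattice; real solvers use enumeration with pruning or sieving, which scale far better:

```python
from itertools import product

def shortest_vector(basis, bound=3):
    """Enumerate integer combinations c1*b1 + ... + cn*bn with |ci| <= bound
    and return the shortest nonzero lattice vector found (squared norm)."""
    dim = len(basis)
    best, best_norm = None, float("inf")
    for coeffs in product(range(-bound, bound + 1), repeat=dim):
        if all(c == 0 for c in coeffs):
            continue                       # exclude the zero vector
        v = [sum(c * basis[i][j] for i, c in enumerate(coeffs)) for j in range(dim)]
        n = sum(x * x for x in v)
        if n < best_norm:
            best, best_norm = v, n
    return best, best_norm

basis = [[1, 2], [3, 1]]                   # toy 2-dimensional lattice basis
v, n2 = shortest_vector(basis)
print(v, n2)
```

The cost of this enumeration grows exponentially with the dimension, which is exactly the hardness that lattice-based cryptography relies on and that the solver above attacks with massive parallelism.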
10. Katsuki Fujisawa, Nested Subspace Arrangement for Representation of Relational Data, Thirty-seventh International Conference on Machine Learning (ICML2020), abs/2007.02007, 4127-4137, 2020.07.
11. Huiqiao Ren, Fujisawa Katsuki, Circular Arc Based Obstacle Avoiding Blending Trajectory Plan, 2020 5th International Conference on Control and Robotics Engineering, ICCRE 2020, 10.1109/ICCRE49379.2020.9096450, 15-18, 2020.04, This paper addresses the problem of generating a trajectory capable of dodging potential obstacles in a blending situation under a non-holonomic premise. For this problem, the paper offers a solution that generates a circular blending curve from given directions, lanes, and obstacles. This technique reforms the two-dimensional information into a one-dimensional representation, so that obstacles can be represented by the constraints generated with the technique. A collision-free corridor can then be formed from these constraints. The corridor-generating algorithm produces its outcome at high speed and caters to both single- and multiple-obstacle situations..
12. Akihiro Yoshida, Tatsuru Higurashi, Masaki Maruishi, Nariaki Tateiwa, Nozomi Hata, Akira Tanaka, Takashi Wakamatsu, Kenichi Nagamatsu, Akira Tajima, Katsuki Fujisawa, New Performance Index “Attractiveness Factor” for Evaluating Websites via Obtaining Transition of Users’ Interests, Data Science and Engineering, Springer, 10.1007/s41019-019-00112-1, 5, 1, 48-64, 2020.01.
13. Akihiro Yoshida, Yosuke Yatsushiro, Nozomi Hata, Tatsuru Higurashi, Nariaki Tateiwa, Takashi Wakamatsu, Akira Tanaka, Kenichi Nagamatsu, Katsuki Fujisawa, Practical End-to-End Repositioning Algorithm for Managing Bike-Sharing System, 2019 IEEE International Conference on Big Data, Big Data 2019 Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019, 10.1109/BigData47090.2019.9005986, 1251-1258, 2019.12, One of the most critical problems in bike-sharing services is the bicycle repositioning problem: how service providers must relocate their bicycles to maintain the quality of service. In this paper, we propose an end-to-end approach to the bike repositioning problem, which realizes an operator-feasible repositioning plan with cooperation among multiple trucks. Our proposed algorithm consists of three procedures. First, we predict the number of rented and returned bicycles at each station with deep learning based on bicycle usage information. Second, we determine the optimal number of bicycles to satisfy the availability of each station by solving an integer optimization problem. Finally, we solve the vehicle routing problem, formulated as another integer optimization problem. Based on our algorithm, service operators can actually perform a relocation task with reference to the truck capacity, routes, and the number of bicycles to be loaded and unloaded. We demonstrate the applicability of our algorithm in the real world through numerical experiments on real bicycle data of a Japanese company..
14. Akihiro Yoshida, Yosuke Yatsushiro, Nozomi Hata, Tatsuru Higurashi, Nariaki Tateiwa, Takashi Wakamatsu, Akira Tanaka, Kenichi Nagamatsu, Katsuki Fujisawa, Practical End-to-End Repositioning Algorithm for Managing Bike-Sharing System, 2019 IEEE International Conference on Big Data (Big Data), 10.1109/BigData47090.2019.9005986, 1251-1258, 2019.12, One of the most critical problems in bike-sharing services is the bicycle repositioning problem: how service providers must relocate their bicycles to maintain the quality of service. In this paper, we propose an end-to-end approach to the bike repositioning problem, which realizes an operator-feasible repositioning plan with cooperation among multiple trucks. Our proposed algorithm consists of three procedures. First, we predict the number of rented and returned bicycles at each station with deep learning based on bicycle usage information. Second, we determine the optimal number of bicycles to satisfy the availability of each station by solving an integer optimization problem. Finally, we solve the vehicle routing problem, formulated as another integer optimization problem. Based on our algorithm, service operators can actually perform a relocation task with reference to the truck capacity, routes, and the number of bicycles to be loaded and unloaded. We demonstrate the applicability of our algorithm in the real world through numerical experiments on real bicycle data of a Japanese company..
15. Nozomi Hata, Takashi Nakayama, Akira Tanaka, Takashi Wakamatsu, Akihiro Yoshida, Nariaki Tateiwa, Yuri Nishikawa, Jun Ozawa, Katsuki Fujisawa, Mobility Optimization on Cyber Physical System via Multiple Object Tracking and Mathematical Programming, 2018 IEEE International Conference on Big Data, Big Data 2018 Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, 10.1109/BigData.2018.8622146, 4026-4035, 2019.01, Cyber-Physical Systems (CPSs) are attracting significant attention from a number of industries, including social infrastructure, manufacturing, and retail, among others. We can easily gather big datasets of people and transportation movements by utilizing camera and sensor technologies, and create new industrial applications by optimizing and simulating social mobility in cyberspace. In this paper, we develop a system that automatically performs a series of processes, including object detection, multiple object tracking, and mobility optimization. The mobility of humans and objects is one of the essential components of the real world; therefore, our system can be widely applied to various application fields. Our major contributions in this paper are a remarkable performance improvement in multiple object tracking and a new mobility optimization engine. For the former, we improve the multiple object tracker using K-Shortest Paths (KSP), which achieves significant data reduction and acceleration by identifying and deleting unnecessary nodes. Numerical experiments show that our proposed tracker is over three times faster than the original KSP tracker while maintaining accuracy. For the latter, we formulate the mobility optimization problem as a SATisfiability problem (SAT) and an Integer Programming problem (IP). Numerical experiments demonstrate that the total transit time can be reduced from 30 s to 10 s. We discuss the characteristics of the solutions obtained by the two formulations, so that the appropriate optimization method can be selected according to the constraints on calculation time and accuracy for real applications..
16. Akihiro Yoshida, Tatsuru Higurashi, Masaki Maruishi, Nariaki Tateiwa, Nozomi Hata, Akira Tanaka, Takashi Wakamatsu, Kenichi Nagamatsu, Akira Tajima, Katsuki Fujisawa, New Performance Index “Attractiveness Factor” for Evaluating Websites via Obtaining Transition of Users’ Interests, Data Science and Engineering, 10.1007/s41019-019-00112-1, 2019.01, Studies of browsing behavior have gained increasing attention in web analysis for providing better service. Most conventional approaches focus on simple indices such as average dwell time and conversion rate. These indices give similar evaluations to websites even when their features are significantly different. Moreover, such statistical indices are not sensitive to the dynamics of users’ interests. In this paper, we propose a new framework for measuring a website’s attractiveness that takes into account both the distribution and the dynamics of users’ interests. Within the framework, we define a new index for a website, called the Attractiveness Factor, which evaluates the degree of users’ attention. It consists of three procedures: First, we capture the transition of users’ interests during browsing by solving a nonnegative matrix factorization and constrained network flow problems. To accommodate multiple types of interests of a user, we apply soft clustering, as opposed to hard clustering, to model the attributes of users and websites. Second, for each website, the feature of each cluster is obtained by fitting the dwell-time distribution with a Weibull distribution. Finally, we calculate the Attractiveness Factor of a website by applying the results of clustering and fitting. The Attractiveness Factor depends on the distribution of the dwell time of users interested in the website, which reflects the change of users’ interest. Numerical experiments with real web access data of Yahoo Japan News are conducted by solving extremely large-scale optimization problems. They show that the Attractiveness Factor captures information about browsing behavior more effectively than widely used indices. The Attractiveness Factor gives low ratings to category pages but can assign high ratings to websites that attract many people, such as hot-topic news about the 2018 FIFA World Cup, Japan’s new imperial era ‘REIWA’, and the North Korea–United States Hanoi Summit. Moreover, we demonstrate that the Attractiveness Factor can detect the tendency of users’ attention to each website in a given time interval of the day..
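The dwell-time fitting step of this entry can be illustrated in miniature: for a known Weibull shape k, the scale has a closed-form maximum-likelihood estimate. The shape, scale, and sample size below are hypothetical toy values; the actual pipeline also estimates the shape and combines the fit with clustering:

```python
import math
import random

def weibull_scale_mle(samples, shape):
    """Closed-form MLE of the Weibull scale for a known shape k:
    lambda_hat = (mean of x**k) ** (1/k)."""
    k = shape
    return (sum(x ** k for x in samples) / len(samples)) ** (1.0 / k)

random.seed(0)
k_true, lam_true = 1.5, 30.0          # hypothetical dwell-time distribution (seconds)
dwell = [random.weibullvariate(lam_true, k_true) for _ in range(20000)]
lam_hat = weibull_scale_mle(dwell, k_true)
print(lam_hat)
```

With 20,000 synthetic dwell times the recovered scale lands close to the true value, illustrating why the Weibull family is convenient for summarizing per-cluster dwell-time distributions.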
17. Katsuki Fujisawa, Toyotaro Suzumura, Hitoshi Sato, Koji Ueno, Satoshi Imamura, Ryo Mizote, Akira Tanaka, Nozomi Hata, Toshio Endo, Advanced computing and optimization infrastructure for extremely large-scale graphs on post-peta-scale supercomputers, Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, 10.1007/978-981-13-1924-2_11, 207-226, 2018.12, In this paper, we present our ongoing research project. The objective of many ongoing research projects in high-performance computing (HPC) areas is to develop an advanced computing and optimization infrastructure for extremely large-scale graphs on post-peta-scale supercomputers. The extremely large-scale graphs that have recently emerged in various application fields, such as transportation, social networks, cybersecurity, disaster prevention, and bioinformatics, require fast and scalable analysis. The Graph500 benchmark measures the performance of any supercomputer performing a breadth-first search (BFS) in terms of traversed edges per second (TEPS). From 2014 to 2017, our project team achieved about 38.6 TeraTEPS on the K computer and won the 8th and 10th to 15th Graph500 benchmarks. We commenced our research project for developing the Urban OS (Operating System) for a large-scale city in 2013. The Urban OS, which is regarded as one of the emerging applications of the cyber-physical system (CPS), gathers big data sets of the distribution of people and transportation movements by utilizing sensor technologies and storing them in a cloud storage system. In the next step, we apply optimization, simulation, and deep learning techniques to solve the resulting problems and check the validity of the solutions obtained in cyberspace. The Urban OS employs the graph analysis system developed by this research project and provides feedback to a predicting and controlling center to optimize many social systems and services. We briefly explain our ongoing research project for realizing the Urban OS..
18. Nozomi Hata, Takashi Nakayama, Akira Tanaka, Takashi Wakamatsu, Akihiro Yoshida, Nariaki Tateiwa, Yuri Nishikawa, Jun Ozawa, Katsuki Fujisawa, Mobility Optimization on Cyber Physical System via Multiple Object Tracking and Mathematical Programming., Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, 10.1109/BigData.2018.8622146, 4026-4035, 2018.12, Cyber-Physical Systems (CPSs) are attracting significant attention from a number of industries, including social infrastructure, manufacturing, and retail, among others. We can easily gather big datasets of people and transportation movements by utilizing camera and sensor technologies, and create new industrial applications by optimizing and simulating social mobility in cyberspace. In this paper, we develop a system that automatically performs a series of processes, including object detection, multiple object tracking, and mobility optimization. The mobility of humans and objects is one of the essential components of the real world; therefore, our system can be widely applied to various application fields. Our major contributions in this paper are a remarkable performance improvement in multiple object tracking and a new mobility optimization engine. For the former, we improve the multiple object tracker using K-Shortest Paths (KSP), which achieves significant data reduction and acceleration by identifying and deleting unnecessary nodes. Numerical experiments show that our proposed tracker is over three times faster than the original KSP tracker while maintaining accuracy. For the latter, we formulate the mobility optimization problem as a SATisfiability problem (SAT) and an Integer Programming problem (IP). Numerical experiments demonstrate that the total transit time can be reduced from 30 s to 10 s. We discuss the characteristics of the solutions obtained by the two formulations, so that the appropriate optimization method can be selected according to the constraints on calculation time and accuracy for real applications..
19. Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa, Evaluating energy-efficiency of DRAM channel interleaving schemes for multithreaded programs, IEICE Transactions on Information and Systems, 10.1587/transinf.2017EDP7296, E101D, 9, 2247-2257, 2018.09, The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them increases. In particular, the capacity of DRAM continues to grow, and it is not rare for DRAM to consume more power than the processors on modern servers. Therefore, reducing DRAM energy consumption is a critical challenge in reducing system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective in reducing DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache-line-grained channel interleaving schemes, which are widely applied to modern servers with multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. To address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache-line-grained schemes. Our evaluation shows that it reduces DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively..
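The difference between cache-line-grained and row-grained channel interleaving can be sketched with a toy address-to-channel mapping. The channel count, row size, and simple modulo mapping below are illustrative assumptions, not the evaluated server's actual configuration:

```python
def channel_of(addr, granularity, n_channels=4):
    """Map a physical address to a memory channel: consecutive chunks of
    `granularity` bytes rotate round-robin across channels."""
    return (addr // granularity) % n_channels

CACHE_LINE = 64        # bytes per cache line
ROW = 8 * 1024         # assumed DRAM row size; real devices vary

# A sequential 1 KiB access stream, one access per cache line.
stream = range(0, 1024, 64)
fine = {channel_of(a, CACHE_LINE) for a in stream}   # cache-line interleaving
coarse = {channel_of(a, ROW) for a in stream}        # row interleaving
print(sorted(fine), sorted(coarse))
```

Cache-line interleaving scatters the sequential stream over all channels (good bank-level parallelism, but each channel sees a fragmented stream, hurting row buffer locality), while row-grained interleaving keeps the stream within one row on one channel, preserving RBL as the entry describes.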
20. Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa, Evaluating energy-efficiency of DRAM channel interleaving schemes for multithreaded programs, IEICE Transactions on Information and Systems, 10.1587/transinf.2017EDP7296, E101D, 9, 2247-2257, 2018.09, The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them increases. In particular, the capacity of DRAM continues to grow, and it is not rare for DRAM to consume more power than the processors on modern servers. Therefore, reducing DRAM energy consumption is a critical challenge in reducing system-level energy consumption. Although it is well known that improving row buffer locality (RBL) and bank-level parallelism (BLP) is effective in reducing DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache-line-grained channel interleaving schemes, which are widely applied to modern servers with multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. To address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache-line-grained schemes. Our evaluation shows that it reduces DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively..
21. Nariaki Tateiwa, Nozomi Hata, Akira Tanaka, Takashi Nakayama, Akihiro Yoshida, Takashi Wakamatsu, Katsuki Fujisawa, Hybrid Vehicle Control and Optimization with a New Mathematical Method, IFAC-PapersOnLine, 10.1016/j.ifacol.2018.10.037, 51, 31, 201-206, 2018.09, For hybrid electric vehicle (HEV) systems, studies using model-based simulators have been actively conducted. A vehicle powertrain simulator makes it easier to evaluate the powertrain system. In this paper, we utilize a Toyota Hybrid System (THS) simulator to obtain a long-term control that optimizes fuel efficiency when the vehicle speed over a certain period is given. Our proposed method obtains optimal long-term control by solving the shortest path problem with state-of-charge (SOC) constraints after constructing a graph expressing the transition of fuel and battery consumption. We also propose a search method for vehicle control using bicubic spline interpolation without preparing a controller. Finally, we remove up to 97.2% of the edges from the graph through 0-1 integer linear programming, which enables a 3.88x speedup in obtaining the optimal vehicle control..
22. Nariaki Tateiwa, Nozomi Hata, Akira Tanaka, Takashi Nakayama, Akihiro Yoshida, Takashi Wakamatsu, Katsuki Fujisawa, Hybrid Vehicle Control and Optimization with a New Mathematical Method, IFAC-PapersOnLine, 10.1016/j.ifacol.2018.10.037, 51, 31, 201-206, 2018, For hybrid electric vehicle (HEV) systems, studies using model-based simulators have been actively conducted. A vehicle powertrain simulator makes it easier to evaluate the powertrain system. In this paper, we utilize a Toyota Hybrid System (THS) simulator to obtain a long-term control that optimizes fuel efficiency when the vehicle speed over a certain period is given. Our proposed method obtains optimal long-term control by solving the shortest path problem with state-of-charge (SOC) constraints after constructing a graph expressing the transition of fuel and battery consumption. We also propose a search method for vehicle control using bicubic spline interpolation without preparing a controller. Finally, we remove up to 97.2% of the edges from the graph through 0-1 integer linear programming, which enables a 3.88x speedup in obtaining the optimal vehicle control..
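The SOC-constrained shortest-path idea can be sketched as dynamic programming over a (time step, state-of-charge) grid. The demand profile, discrete SOC levels, and unit fuel cost below are toy assumptions, not the THS simulator's model (regenerative braking and efficiency maps are ignored):

```python
def optimal_control(demand, soc0, soc_min=0, soc_max=4):
    """Forward DP over a (time step, state-of-charge) graph.  Each arc picks a
    SOC change d: d = -1 discharges the battery (offsetting demand), d = +1
    charges it (adding engine load); the engine burns fuel for the rest."""
    INF = float("inf")
    best = {soc0: 0.0}                         # SOC level -> minimal fuel so far
    for p in demand:
        nxt = {}
        for soc, fuel in best.items():
            for d in (-1, 0, 1):
                s2 = soc + d
                if not soc_min <= s2 <= soc_max:
                    continue                   # SOC constraint prunes the arc
                engine = max(p + d, 0)         # engine output, i.e. fuel cost
                if fuel + engine < nxt.get(s2, INF):
                    nxt[s2] = fuel + engine
        best = nxt
    return best

demand = [2, 2, 0, 2]                          # power demand per time step (toy units)
final = optimal_control(demand, soc0=2)
print(final)                                   # minimal fuel for each final SOC level
```

Reading off the table for a required final SOC gives the optimal long-term control; the paper's graph plays the same role at much finer granularity, which is why edge pruning via 0-1 integer programming pays off.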
23. Akira Tanaka, Nozomi Hata, Nariaki Tateiwa, Katsuki Fujisawa, Practical approach to evacuation planning via network flow and deep learning., Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, 10.1109/BigData.2017.8258322, 2018-January, 3368-3377, 2017.10, In this paper, we propose a practical approach to evacuation planning by utilizing network flow and deep learning algorithms. In recent years, large amounts of data have rapidly been stored in cloud systems, and effective data utilization for solving real-world problems is required more than ever. The Hierarchical Data Analysis and Optimization System (HDAOS) enables us to select appropriate algorithms according to the degree of difficulty in solving problems and the time given for the decision-making process, and such selection helps address real-world problems. In the field of emergency evacuation planning, however, the Lexicographically Quickest Flow (LQF) algorithm has an extremely long computation time on a large-scale network, and is therefore not a practical solution. For Osaka city, the second-largest city in Japan, we must solve maximum flow problems on a large-scale network with over 8.3M nodes and 32.8M arcs to obtain an optimal plan, so no feedback can be provided for making an evacuation plan. To solve this problem, we utilize the optimal solution as training data for a deep Convolutional Neural Network (CNN). We train the CNN using the results of the LQF algorithm in normal times, and in emergencies predict the evacuation completion time (ECT) immediately with the well-trained CNN. Our approach predicts the ECT almost precisely, achieving an average regression error of about 2%. We provide several techniques for combining LQF with a CNN and for handling numerous movements as the CNN's input, which has rarely been considered in previous studies. Hodge decomposition also demonstrates that LQF is efficient from the standpoint of the total distance traveled by all evacuees, which reinforces the validity of using the LQF algorithm for deep learning..
24. Akira Tanaka, Nozomi Hata, Nariaki Tateiwa, Katsuki Fujisawa, Practical approach to evacuation planning via network flow and deep learning, 5th IEEE International Conference on Big Data, Big Data 2017 Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, 10.1109/BigData.2017.8258322, 3368-3377, 2017.07, In this paper, we propose a practical approach to evacuation planning by utilizing network flow and deep learning algorithms. In recent years, large amounts of data are rapidly being stored in the cloud system, and effective data utilization for solving real-world problems is required more than ever. Hierarchical Data Analysis and Optimization System (HDAOS) enables us to select appropriate algorithms according to the degree of difficulty in solving problems and a given time for the decision-making process, and such selection helps address real-world problems. In the field of emergency evacuation planning, however, the Lexicographically Quickest Flow (LQF) algorithm has an extremely long computation time on a large-scale network, and is therefore not a practical solution. For Osaka city, which is the second-largest city in Japan, we must solve the maximum flow problems on a large-scale network with over 8.3M nodes and 32.8M arcs for obtaining an optimal plan. Consequently, we can feed back nothing to make an evacuation plan. To solve the problem, we utilize the optimal solution as training data of a deep Convolutional Neural Network (CNN). We train a CNN by using the results of the LQF algorithm in normal time, and in emergencies predict the evacuation completion time (ECT) immediately by the well-learned CNN. Our approach provides almost precise ECT, achieving an average regression error of about 2%. We provide several techniques for combining LQF with CNN and addressing numerous movements as CNN's input, which has rarely been considered in previous studies. 
Hodge decomposition also demonstrates that LQF is efficient from the standpoint of the total distance traveled by all evacuees, which reinforces the validity of utilizing the LQF algorithm for deep learning..
25. Akifumi Kira, Hidenao Iwane, Hirokazu Anai, Yutaka Kimura, Katsuki Fujisawa, An indirect search algorithm for disaster restoration with precedence and synchronization constraints, PACIFIC JOURNAL OF MATHEMATICS FOR INDUSTRY, 10.1186/s40736-017-0032-5, 9, 2017.07, When a massive disaster occurs, repairing the damaged parts of lifeline networks requires planning to appropriately allocate tasks to two or more restoration teams and to optimize their traveling routes. However, precedence and synchronization constraints make the restoration teams interdependent on one another and impede a successful solution by standard local search. In this paper, we propose an indirect local search method using the product set of team-wise permutations as an auxiliary search space. It is shown that our method successfully avoids the interdependence problem induced by the precedence and synchronization constraints, and that it has the significant advantage that non-deteriorating perturbations are available for iterated local search..
26. Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka, Efficient Breadth-First Search on Massively Parallel and Distributed-Memory Machines, Data Science and Engineering, 10.1007/s41019-016-0024-y, 2, 1, 22-35, 2017.03, There are many large-scale graphs in the real world, such as Web graphs and social graphs, and interest in large-scale graph analysis has grown in recent years. Breadth-First Search (BFS) is one of the most fundamental graph algorithms and is used as a component of many other graph algorithms. Our new method for distributed parallel BFS can compute BFS for a one-trillion-vertex graph within half a second using large supercomputers such as the K-Computer. With our proposed algorithm, the K-Computer was ranked 1st in the Graph500 lists of June and November 2015 and June 2016, using all 82,944 available nodes and achieving 38,621.4 GTEPS. Based on the hybrid BFS algorithm by Beamer (Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, IPDPSW '13, IEEE Computer Society, Washington, 2013), we devise sets of optimizations for scaling to an extreme number of nodes, including a new efficient graph data structure and several optimization techniques such as vertex reordering and load balancing. Our performance evaluation on the K-Computer shows that our new BFS is 3.19 times faster on 30,720 nodes than the base version using the previously known best techniques..
27. Katsuki Fujisawa, Efficient Breadth-First Search on Massively Parallel and Distributed Memory Machines, The proceedings of the IEEE BigData2016, 10.1007/s41019-016-0024-y, 2, 1, 22-35, 2017.03.
28. Satoshi Imamura, Yuichiro Yasui, Koji Inoue, Takatsugu Ono, Hiroshi Sasaki, Katsuki Fujisawa, Power-Efficient Breadth-First Search with DRAM Row Buffer Locality-Aware Address Mapping, 2016 High Performance Graph Data Management and Processing, HPGDMP 2016 Proceedings of HPGDMP 2016 High Performance Graph Data Management and Processing - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, Storage and Analysis, 10.1109/HPGDMP.2016.010, 17-24, 2017.01, Graph analysis applications have been widely used in real services such as road-traffic analysis and social network services. Breadth-first search (BFS) is one of the most representative algorithms for such applications; therefore, many researchers have tuned it to maximize performance. On the other hand, owing to the strict power constraints of modern HPC systems, it is necessary to improve power efficiency (i.e., performance per watt) when executing BFS. In this work, we focus on the power efficiency of DRAM and investigate the memory access pattern of a state-of-the-art BFS implementation using a cycle-accurate processor simulator. The results reveal that the conventional address mapping schemes of modern memory controllers do not efficiently exploit row buffers in DRAM. Thus, we propose a new scheme called per-row channel interleaving and improve the DRAM power efficiency by 30.3% compared to a conventional scheme for a certain simulator setting. Moreover, we demonstrate that this proposed scheme is effective for various configurations of memory controllers..
29. Yuichiro Yasui, Katsuki Fujisawa, Fast, Scalable, and Energy-Efficient Parallel Breadth-First Search, ROLE AND IMPORTANCE OF MATHEMATICS IN INNOVATION, 10.1007/978-981-10-0962-4_6, 25, 61-75, 2017.01, The breadth-first search (BFS) is one of the most central processes in graph theory. In this paper, we present a fast, scalable, and energy-efficient BFS for a non-uniform memory access (NUMA)-based system, in which the NUMA architecture was carefully considered. Our implementation achieved performance rates of 175 billion edges per second for a Kronecker graph with 2^33 vertices and 2^37 edges on two racks of an SGI UV 2000 system with 1,280 threads, and the fastest entries for a shared-memory system in the June 2014 and November 2014 Graph500 lists. It also produced the most energy-efficient entries in the first and second (small data category) and third, fourth, fifth, and sixth (big data category) Green Graph500 lists on a 4-socket Intel Xeon E5-4640 system..
30. Katsuki Fujisawa, Evaluating the Impacts of Code-Level Performance Tunings on Power Efficiency, The proceedings of the IEEE BigData2016, 10.1109/BigData.2016.7840624, 362-369, 2016.12.
31. Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka, Extreme scale breadth-first search on supercomputers., Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 10.1109/BigData.2016.7840705, 1040-1047, 2016.12, Breadth-First Search (BFS) is one of the most fundamental graph algorithms and is used as a component of many other graph algorithms. Our new method for distributed parallel BFS can compute BFS for a one-trillion-vertex graph within half a second using large supercomputers such as the K-Computer. With our proposed algorithm, the K-Computer was ranked 1st in the Graph500 lists of June and November 2015 and June 2016, using all 82,944 available nodes and achieving 38,621.4 GTEPS. Based on the hybrid-BFS algorithm by Beamer[3], we devise sets of optimizations for scaling to an extreme number of nodes, including a new efficient graph data structure and optimization techniques such as vertex reordering and load balancing. Performance evaluation on the K-Computer shows our new BFS is 3.19 times faster on 30,720 nodes than the base version using the previously known best techniques..
32. Yuichiro Yasui, Katsuki Fujisawa, Eng Lim Goh, John Baron, Atsushi Sugiura, Takashi Uchiyama, NUMA-aware Scalable Graph Traversal on SGI UV Systems, PROCEEDINGS OF THE ACM WORKSHOP ON HIGH PERFORMANCE GRAPH PROCESSING (HPGP'16), 10.1145/2915516.2915522, 19-26, 2016.10, Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's direction-optimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation that reduces remote memory access in the top-down direction of the direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 2^34 vertices and 2^38 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list presented in November 2015, which includes our previous implementation..
33. Katsuki Fujisawa, Fast and power efficient computation for sparse modeling, Journal of the Institute of Electronics, Information and Communication Engineers, 99, 5, 444-449, 2016.05.
34. Yuichiro Yasui, Katsuki Fujisawa, Eng Lim Goh, John Baron, Atsushi Sugiura, Takashi Uchiyama, NUMA-aware scalable graph traversal on SGI UV Systems, ACM Workshop on High Performance Graph Processing, HPGP 2016 HPGP 2016 - Proceedings of the ACM Workshop on High Performance Graph Processing, Co-located with HPDC 2016, 10.1145/2915516.2915522, 19-26, 2016.05, Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's direction-optimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation that reduces remote memory access in the top-down direction of the direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 2^34 vertices and 2^38 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list presented in November 2015, which includes our previous implementation..
35. Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers., Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10.1007/978-3-319-42432-3_33, 9725, 265-274, 2016.03, In this talk, we present our ongoing research project. The objective of this project is to develop advanced computing and optimization infrastructures for extremely large-scale graphs on post-petascale supercomputers. We explain our challenge on the Graph 500 and Green Graph 500 benchmarks, which are designed to measure the performance of a computer system for applications that require irregular memory and network access patterns. The 1st Graph500 list was released in November 2010. The Graph500 benchmark measures the performance of any supercomputer performing a BFS (Breadth-First Search) in terms of traversed edges per second (TEPS). In 2014 and 2015, our project team was a winner of the 8th, 10th, and 11th Graph500 and the 3rd to 6th Green Graph500 benchmarks, respectively. We also present our parallel implementation for the large-scale SDP (SemiDefinite Programming) problem. The semidefinite programming (SDP) problem is a predominant problem in mathematical optimization. The primal-dual interior-point method (PDIPM) is one of the most powerful algorithms for solving SDP problems, and many research groups have employed it for developing software packages. We solved the largest SDP problem (which has over 2.33 million constraints), thereby creating a new world record. Our implementation also achieved 1.774 PFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs on the TSUBAME 2.5 supercomputer..
36. Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Advanced computing and optimization infrastructure for extremely large-scale graphs on post peta-scale supercomputers, 5th International Conference on Mathematical Software, ICMS 2016 Mathematical Software - 5th International Conference, ICMS 2016, Proceedings, 10.1007/978-3-319-42432-3_33, 265-274, 2016.01, In this talk, we present our ongoing research project. The objective of this project is to develop advanced computing and optimization infrastructures for extremely large-scale graphs on post-petascale supercomputers. We explain our challenge on the Graph 500 and Green Graph 500 benchmarks, which are designed to measure the performance of a computer system for applications that require irregular memory and network access patterns. The 1st Graph500 list was released in November 2010. The Graph500 benchmark measures the performance of any supercomputer performing a BFS (Breadth-First Search) in terms of traversed edges per second (TEPS). In 2014 and 2015, our project team was a winner of the 8th, 10th, and 11th Graph500 and the 3rd to 6th Green Graph500 benchmarks, respectively. We also present our parallel implementation for the large-scale SDP (SemiDefinite Programming) problem. The semidefinite programming (SDP) problem is a predominant problem in mathematical optimization. The primal-dual interior-point method (PDIPM) is one of the most powerful algorithms for solving SDP problems, and many research groups have employed it for developing software packages. We solved the largest SDP problem (which has over 2.33 million constraints), thereby creating a new world record. Our implementation also achieved 1.774 PFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs on the TSUBAME 2.5 supercomputer..
37. Satoshi Imamura, Keitaro Oka, Yuichiro Yasui, Yuichi Inadomi, Katsuki Fujisawa, Toshio Endo, Koji Ueno, Keiichiro Fukazawa, Nozomi Hata, Yuta Kakibuka, Koji Inoue, Takatsugu Ono, Evaluating the impacts of code-level performance tunings on power efficiency, 4th IEEE International Conference on Big Data, Big Data 2016 Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 10.1109/BigData.2016.7840624, 362-369, 2016.01, As the power consumption of HPC systems will be a primary constraint for exascale computing, a main objective in the HPC community has recently become maximizing power efficiency (i.e., performance per watt) rather than performance. Although programmers have spent considerable effort improving performance by tuning HPC programs at the code level, tunings for improving power efficiency are now also required. In this work, we select two representative HPC programs (Graph500 and SDPARA) and evaluate how traditional code-level performance tunings applied to these programs affect power efficiency. We also investigate the impacts of the tunings on power efficiency at various operating frequencies of CPUs and/or GPUs. The results show that the tunings significantly improve power efficiency, and that different types of tunings exhibit different trends in power efficiency as CPU frequency varies. Finally, the scalability and power efficiency of state-of-the-art Graph500 implementations are explored on both a single-node platform and a 960-node supercomputer. With their high scalability, they achieve 27.43 MTEPS/Watt with 129.76 GTEPS on the single-node system and 4.39 MTEPS/Watt with 1,085.24 GTEPS on the supercomputer..
38. Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, Satoshi Matsuoka, Extreme scale breadth-first search on supercomputers, 4th IEEE International Conference on Big Data, Big Data 2016 Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 10.1109/BigData.2016.7840705, 1040-1047, 2016.01, Breadth-First Search (BFS) is one of the most fundamental graph algorithms and is used as a component of many other graph algorithms. Our new method for distributed parallel BFS can compute BFS for a one-trillion-vertex graph within half a second using large supercomputers such as the K-Computer. With our proposed algorithm, the K-Computer was ranked 1st in the Graph500 lists of June and November 2015 and June 2016, using all 82,944 available nodes and achieving 38,621.4 GTEPS. Based on the hybrid-BFS algorithm by Beamer[3], we devise sets of optimizations for scaling to an extreme number of nodes, including a new efficient graph data structure and optimization techniques such as vertex reordering and load balancing. Performance evaluation on the K-Computer shows our new BFS is 3.19 times faster on 30,720 nodes than the base version using the previously known best techniques..
39. Yuki Tsujita, Toshio Endo, Katsuki Fujisawa, The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs, 1st International Workshop on Extreme Scale Programming Models and Middleware, ESPM2 2015 Proceedings of ESPM2 2015 1st International Workshop on Extreme Scale Programming Models and Middleware - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis, 10.1145/2832241.2832245, 38-45, 2015.11, The Cholesky factorization is an important linear algebra kernel used in the semidefinite programming (SDP) problem. However, the large computational cost of the Cholesky factorization of the Schur complement matrix (SCM) has been an obstacle to solving large-scale problems. This paper describes a brand-new version of the parallel SDP solver SDPARA, which has been equipped with a new Cholesky factorization implementation and demonstrated 1.7 PFlops performance with over two million constraints using 4,080 GPUs. The performance and scalability are further improved by introducing a data-driven approach rather than the traditional synchronous approach. We also point out that typical data-driven implementations have limitations in scalability, and demonstrate the efficiency of the proposed approach via experiments on the TSUBAME 2.5 supercomputer..
40. Yuichiro Yasui, Katsuki Fujisawa, Fast and Scalable NUMA-based Thread Parallel Breadth-first Search, PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2015), 10.1109/HPCSim.2015.7237065, 377-385, 2015.10, The breadth-first search (BFS) is one of the most central kernels in graph processing. Beamer's direction-optimizing BFS algorithm, which selects one of two traversal directions at each level, can reduce unnecessary edge traversals. In a previous paper, we presented an efficient BFS for a non-uniform memory access (NUMA)-based system, in which the NUMA architecture was carefully considered. In this paper, we investigate the locality of memory accesses, in terms of communication with remote memories, in a BFS for a NUMA system, and describe a fast and highly scalable implementation. Our new implementation achieves performance rates of 174.704 billion edges per second for a Kronecker graph with 2^33 vertices and 2^37 edges on two racks of an SGI UV 2000 system with 1,280 threads. The implementations described in this paper achieved the fastest entries for a shared-memory system in the June 2014 and November 2014 Graph500 lists, and produced the most energy-efficient entries in the second, third, and fourth Green Graph500 lists (big data category)..
41. Y. Kira, K. Fujisawa, H. Iwane, H. Anai, A Matrix Scheduling Heuristic to Disaster Restoration of Lifeline Networks, ISMP 2015 | 22nd International Symposium on Mathematical Programming, 2015.07.
42. Yuichiro Yasui, Katsuki Fujisawa, Fast and scalable NUMA-based thread parallel breadth-first search, 13th International Conference on High Performance Computing and Simulation, HPCS 2015 Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, 10.1109/HPCSim.2015.7237065, 377-385, 2015.01, The breadth-first search (BFS) is one of the most central kernels in graph processing. Beamer's direction-optimizing BFS algorithm, which selects one of two traversal directions at each level, can reduce unnecessary edge traversals. In a previous paper, we presented an efficient BFS for a non-uniform memory access (NUMA)-based system, in which the NUMA architecture was carefully considered. In this paper, we investigate the locality of memory accesses, in terms of communication with remote memories, in a BFS for a NUMA system, and describe a fast and highly scalable implementation. Our new implementation achieves performance rates of 174.704 billion edges per second for a Kronecker graph with 2^33 vertices and 2^37 edges on two racks of an SGI UV 2000 system with 1,280 threads. The implementations described in this paper achieved the fastest entries for a shared-memory system in the June 2014 and November 2014 Graph500 lists, and produced the most energy-efficient entries in the second, third, and fourth Green Graph500 lists (big data category)..
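Several entries above (e.g., nos. 40 and 42) build on Beamer's direction-optimizing BFS, which picks a top-down or bottom-up sweep at each level depending on the frontier size. As a minimal illustrative sketch only (the threshold `alpha` and the set-based bookkeeping are assumptions for clarity, not the authors' tuned NUMA implementation):

```python
def hybrid_bfs(adj, source, alpha=14.0):
    """Toy direction-optimizing BFS (after Beamer): per level, choose a
    top-down or bottom-up sweep based on frontier size. The threshold
    `alpha` is an illustrative assumption, not a tuned value; `adj` is an
    adjacency list of an undirected graph with vertices 0..n-1."""
    n = len(adj)
    parent = {source: source}
    frontier = {source}
    while frontier:
        nxt = set()
        if len(frontier) * alpha < n - len(parent):
            # Top-down: expand the outgoing edges of each frontier vertex.
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        nxt.add(v)
        else:
            # Bottom-up: each unvisited vertex scans for a frontier parent,
            # stopping at the first hit (this is what saves edge traversals).
            for v in range(n):
                if v not in parent:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            nxt.add(v)
                            break
        frontier = nxt
    return parent  # BFS tree over all vertices reachable from `source`
```

For example, on the 4-cycle `[[1, 2], [0, 3], [0, 3], [1, 2]]` with source 0, the returned tree reaches all four vertices in two levels. The NUMA-optimized versions in the papers additionally partition the adjacency list column-wise so each thread touches only local memory.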
43. Keita Iwabuchi, Hitoshi Sato, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka, NVM-based Hybrid BFS with memory efficient data structure, 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014, 10.1109/BigData.2014.7004270, 529-538, 2015.01, We introduce a memory-efficient implementation of the NVM-based Hybrid BFS algorithm that merges redundant data structures into a single graph data structure while offloading infrequently accessed graph data to NVMs, based on a detailed analysis of access patterns, and demonstrate extremely fast BFS execution for large-scale unstructured graphs whose size exceeds the DRAM capacity of the machine. Experimental results for Kronecker graphs compliant with the Graph500 benchmark on a 2-way Intel Xeon E5-2690 machine with 256 GB of DRAM show that our proposed implementation can achieve 4.14 GTEPS for a SCALE31 graph problem with 2^31 vertices and 2^35 edges, whose size is 4 times larger than the machine can accommodate using DRAM alone, with only 14.99% performance degradation. We also show that the power efficiency of our proposed implementation achieves 11.8 MTEPS/W. Based on this implementation, we achieved the 3rd and 4th positions of the Green Graph500 list (June 2014) in the Big Data category..
44. Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka, Hybrid BFS approach using semi-external memory, 28th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014 Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014, 10.1109/IPDPSW.2014.189, 1698-1707, 2014.11, NVM devices will greatly expand the possibility of processing extremely large-scale graphs that exceed the DRAM capacity of the nodes; however, efficient implementations based on a detailed performance analysis of the access patterns of unstructured graph kernels on systems that utilize a mixture of DRAM and NVM devices have not been well investigated. We introduce a graph data offloading technique using NVMs that augments the hybrid BFS (Breadth-first search) algorithm widely used in the Graph500 benchmark, and conduct a performance analysis to demonstrate the utility of NVMs for unstructured data. Experimental results for a Scale27 problem of a Kronecker graph compliant with the Graph500 benchmark show that our approach maximally sustains 4.22 Giga TEPS (Traversed Edges Per Second), reducing DRAM size by half with only 19.18% performance degradation on a 4-way AMD Opteron 6172 machine heavily equipped with NVM devices. Although direct comparison is difficult, this is significantly greater than the result of 0.05 GTEPS for a SCALE 36 problem using 1 TB of DRAM and 12 TB of NVM reported by Pearce et al. Although our approach uses a higher DRAM-to-NVM ratio, we show that a good compromise is achievable between the performance and capacity ratios for processing large-scale graphs. This result, as well as a detailed performance analysis of the proposed technique, suggests that we can process extremely large-scale graphs per node with minimal performance degradation by carefully considering the data structures of a given graph and the access patterns to both DRAM and NVM devices. 
As a result, our implementation achieved 4.35 MTEPS/W (mega TEPS per watt) and was ranked 4th in the November 2013 edition of the Green Graph500 list in the Big Data category using only a single fat server heavily equipped with NVMs..
45. Jun Ya Gotoh, Katsuki Fujisawa, Convex optimization approaches to maximally predictable portfolio selection, Optimization, 10.1080/02331934.2012.741237, 63, 11, 1713-1735, 2014.11, In this article, we propose a simple heuristic algorithm for approaching the maximally predictable portfolio, which is constructed so that the return model of the resulting portfolio attains the largest goodness-of-fit. It is obtained by solving a fractional program in which a ratio of two convex quadratic functions is maximized, and the number of variables associated with its nonconcavity has been a bottleneck despite continuing endeavors toward its global optimization. The proposed algorithm can be implemented by simply solving a series of convex quadratic programs, and computational results show that it yields within a few seconds a (near) Karush-Kuhn-Tucker solution to each of the instances that were solved via a global optimization method in [H. Konno, Y. Takaya and R. Yamamoto, A maximal predictability portfolio using dynamic factor selection strategy, Int. J. Theor. Appl. Fin. 13 (2010) pp. 355-366]. To confirm the solution accuracy, we also propose a semidefinite programming relaxation approach, which succeeds in ensuring the near global optimality of the proposed approach. Our findings through computational experiments encourage us not to employ the global optimization approach, but to employ the local search algorithm for solving fractional programs of much larger size..
46. Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, Katsuki Fujisawa, Satoshi Matsuoka, Hybrid BFS approach using semi-external memory, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS, 10.1109/IPDPSW.2014.189, 1698-1707, 2014.11, NVM devices will greatly expand the possibility of processing extremely large-scale graphs that exceed the DRAM capacity of the nodes; however, efficient implementations based on a detailed performance analysis of the access patterns of unstructured graph kernels on systems that utilize a mixture of DRAM and NVM devices have not been well investigated. We introduce a graph data offloading technique using NVMs that augments the hybrid BFS (Breadth-first search) algorithm widely used in the Graph500 benchmark, and conduct a performance analysis to demonstrate the utility of NVMs for unstructured data. Experimental results for a Scale27 problem of a Kronecker graph compliant with the Graph500 benchmark show that our approach maximally sustains 4.22 Giga TEPS (Traversed Edges Per Second), reducing DRAM size by half with only 19.18% performance degradation on a 4-way AMD Opteron 6172 machine heavily equipped with NVM devices. Although direct comparison is difficult, this is significantly greater than the result of 0.05 GTEPS for a SCALE 36 problem using 1 TB of DRAM and 12 TB of NVM reported by Pearce et al. Although our approach uses a higher DRAM-to-NVM ratio, we show that a good compromise is achievable between the performance and capacity ratios for processing large-scale graphs. This result, as well as a detailed performance analysis of the proposed technique, suggests that we can process extremely large-scale graphs per node with minimal performance degradation by carefully considering the data structures of a given graph and the access patterns to both DRAM and NVM devices. 
As a result, our implementation achieved 4.35 MTEPS/W (mega TEPS per watt) and was ranked 4th in the November 2013 edition of the Green Graph500 list in the Big Data category using only a single fat server heavily equipped with NVMs..
47. Katsuki Fujisawa, NVM-based Hybrid BFS with Memory Efficient Data Structure, The proceedings of the IEEE BigData2014, 2014.09.
48. Katsuki Fujisawa, Fast and Energy-efficient Breadth-first Search on a single NUMA system, International Supercomputing Conference (ISC 14), 2014.06.
49. Katsuki Fujisawa, Hybrid BFS Approach Using Semi-External Memory, International Workshop on High Performance Data Intensive Computing (HPDIC2014) in Conjunction with IEEE IPDPS 2014, 2014.05.
50. Katsuki Fujisawa, Peta-scale General Solver for Semidefinite Programming Problems with over Two Million Constraints, The 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2014), 2014.05, We are also working on the fast computation of optimization problems and their real-world applications. For example, the semidefinite programming (SDP) problem has an extremely wide range of applications, including combinatorial optimization, systems and control, data science, financial engineering, and quantum chemistry, and is currently one of the most actively studied problems in the field of optimization research. For SDP, the interior-point method provides a fast and stable iterative algorithm, but the computation of huge systems of linear equations (generation of the matrix elements and Cholesky factorization of the matrix) is a major bottleneck. In recent results, by utilizing large numbers of GPUs and techniques for overlapping computation and communication, we succeeded in accelerating the Cholesky factorization of the linear system, one of the main bottlenecks, and in rapidly solving the world's largest SDP problems (achieving a peak performance of 1.713 PFlops)..
51. Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki, Petascale general solver for semidefinite programming problems with over two million constraints, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS, 10.1109/IPDPS.2014.121, 1171-1180, 2014.05, The semidefinite programming (SDP) problem is one of the central problems in mathematical optimization. The primal-dual interior-point method (PDIPM) is one of the most powerful algorithms for solving SDP problems, and many research groups have employed it for developing software packages. However, two well-known major bottlenecks, i.e., the generation of the Schur complement matrix (SCM) and its Cholesky factorization, exist in the algorithmic framework of the PDIPM. We have developed a new version of the semidefinite programming algorithm parallel version (SDPARA), which is a parallel implementation on multiple CPUs and GPUs for solving extremely large-scale SDP problems with over a million constraints. SDPARA can automatically extract the unique characteristics of an SDP problem and identify its bottleneck. When the generation of the SCM becomes a bottleneck, SDPARA can attain high scalability using a large number of CPU cores together with processor affinity and memory interleaving techniques. SDPARA can also perform parallel Cholesky factorization using thousands of GPUs and techniques for overlapping computation and communication if an SDP problem has over two million constraints and Cholesky factorization constitutes a bottleneck. We demonstrate that SDPARA is a high-performance general solver for SDPs in various application fields through numerical experiments conducted on the TSUBAME 2.5 supercomputer, and we solved the largest SDP problem (which has over 2.33 million constraints), thereby creating a new world record. 
Our implementation also achieved 1.713 PFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs..
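The SDPARA entries above repeatedly identify Cholesky factorization of the Schur complement matrix as the dominant kernel. As a plain-Python sketch of that kernel alone (the petascale versions in these papers distribute a blocked variant across thousands of CPUs and GPUs; this is only the textbook serial recurrence):

```python
import math

def cholesky(M):
    """Textbook Cholesky factorization M = L L^T for a symmetric positive
    definite matrix, the kernel that SDPARA parallelizes at petascale.
    Returns the lower-triangular factor L as a list of lists."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares of row j computed so far.
        s = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        # Fill in the column below the diagonal.
        for i in range(j + 1, n):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

# 2x2 check: [[4, 2], [2, 3]] factors into [[2, 0], [1, sqrt(2)]].
L = cholesky([[4.0, 2.0], [2.0, 3.0]])
```

Once L is available, the linear system at each interior-point iteration reduces to one forward and one backward triangular solve, which is why accelerating this factorization dominates the solver's wall-clock time.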
52. Yuichiro Yasui, Katsuki Fujisawa, Yukinori Sato, Fast and energy-efficient breadth-first search on a single NUMA system, 29th International Supercomputing Conference, ISC 2014 Supercomputing - 29th International Conference, ISC 2014, Proceedings, 10.1007/978-3-319-07518-1_23, 365-381, 2014.01, Breadth-first search (BFS) is an important graph analysis kernel. The Graph500 benchmark measures a computer's BFS performance using the traversed edges per second (TEPS) ratio. Our previous nonuniform memory access (NUMA)-optimized BFS reduced memory accesses to remote RAM on a NUMA architecture system; its performance was 11 GTEPS (giga TEPS) on a 4-way Intel Xeon E5-4640 system. Herein, we investigated the computational complexity of the bottom-up, a major bottleneck in NUMA-optimized BFS. We clarify the relationship between vertex out-degree and bottom-up performance. In November 2013, our new implementation achieved a Graph500 benchmark performance of 37.66 GTEPS (fastest for a single node) on an SGI Altix UV1000 (one-rack) and 31.65 GTEPS (fastest for a single server) on a 4-way Intel Xeon E5-4650 system. Furthermore, we achieved the highest Green Graph500 performance of 153.17 MTEPS/W (mega TEPS per watt) on an Xperia-A SO-04E with a Qualcomm Snapdragon S4 Pro APQ8064..
54. Yuichiro Yasui, Katsuki Fujisawa, Kazushige Goto, NUMA-optimized Parallel Breadth-first Search on Multicore Single-node System, 2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 10.1109/BigData.2013.6691600, 394-402, 2013.10, The breadth-first search (BFS) is one of the most important kernels in graph theory. The Graph500 benchmark measures the performance of any supercomputer performing a BFS in terms of traversed edges per second (TEPS). Previous studies have proposed hybrid approaches that combine a well-known top-down algorithm and an efficient bottom-up algorithm for large frontiers. This reduces some unnecessary searching of outgoing edges in the BFS traversal of a small-world graph, such as a Kronecker graph. In this paper, we describe a highly efficient BFS using column-wise partitioning of the adjacency list while carefully considering the non-uniform memory access (NUMA) architecture. We explicitly manage the way in which each working thread accesses a partial adjacency list in local memory during BFS traversal. Our implementation has achieved a processing rate of 11.15 billion edges per second on a 4-way Intel Xeon E5-4640 system for a scale-26 problem of a Kronecker graph with 2^26 vertices and 2^30 edges. Not all of the speedup techniques in this paper are limited to the NUMA architecture system. With our winning Green Graph500 submission of June 2013, we achieved 64.12 MTEPS/W on an ASUS Pad TF700T with an NVIDIA Tegra 3 mobile processor..
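The top-down/bottom-up hybrid described in the entry above can be sketched in a few lines. The switching threshold and data structures below are illustrative only, not those of the NUMA-optimized implementation:

```python
def hybrid_bfs(adj, source, alpha=4):
    """Toy direction-optimizing BFS: expand the frontier top-down while it
    is small, and switch to a bottom-up sweep (each unvisited vertex looks
    back for a visited neighbour) once the frontier exceeds n/alpha.
    `adj` maps vertex -> list of neighbours of an undirected graph.
    Returns a dict of BFS levels from `source`."""
    n = len(adj)
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        nxt = []
        if len(frontier) * alpha < n:
            # top-down: scan outgoing edges of the frontier
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = level
                        nxt.append(v)
        else:
            # bottom-up: unvisited vertices check their own edges, which
            # skips redundant edge scans when the frontier is large
            fset = set(frontier)
            for v in adj:
                if v not in dist and any(u in fset for u in adj[v]):
                    dist[v] = level
                    nxt.append(v)
        frontier = nxt
    return dist
```

On a path graph 0-1-2-3 starting from 0, this returns levels {0: 0, 1: 1, 2: 2, 3: 3} regardless of which branch is taken.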
55. Makoto Yamashita, Katsuki Fujisawa, Mituhiro Fukuda, Kazuhide Nakata, Maho Nakata, Algorithm 925: Parallel Solver for Semidefinite Programming Problem having Sparse Schur Complement Matrix, ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 10.1145/2382585.2382591, 39, 1, 6-22, 2012.11, A SemiDefinite Programming (SDP) problem is one of the most central problems in mathematical optimization. SDP provides an effective computation framework for many research fields. Some applications, however, require solving a large-scale SDP whose size exceeds the capacity of a single processor both in terms of computation time and available memory. SDPARA (SemiDefinite Programming Algorithm paRAllel package) [Yamashita et al. 2003b] was designed to solve such large-scale SDPs. Its parallel performance is outstanding for general SDPs in most cases. However, the parallel implementation is less successful for some sparse SDPs obtained from applications such as Polynomial Optimization Problems (POPs) or Sensor Network Localization (SNL) problems, since this version of SDPARA cannot directly handle sparse Schur Complement Matrices (SCMs). In this article we improve SDPARA by focusing on the sparsity of the SCM and we propose a new parallel implementation using the formula-cost-based distribution along with a replacement of the dense Cholesky factorization. We verify numerically that these features are key to solving SDPs with sparse SCMs more quickly on parallel computing systems. The performance is further enhanced by multithreading and the new SDPARA attains considerable scalability in general. It also finds solutions for extremely large-scale SDPs arising from POPs which cannot be obtained by other solvers..
56. Katsuki Fujisawa, Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Makoto Yamashita, Maho Nakata, High-performance general solver for extremely large-scale semidefinite programming problems, International Conference for High Performance Computing, Networking, Storage and Analysis, SC, 10.1109/SC.2012.67, 93-93, 2012.11, Semidefinite programming (SDP) is one of the most important optimization problems at present. It is relevant to a wide range of fields such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network localization, and data mining. The capability to solve extremely large-scale SDP problems will have a significant effect on the current and future applications of SDP. In 1995, Fujisawa et al. started the SDPA (SemiDefinite Programming Algorithm) Project, aimed at solving large-scale SDP problems with high numerical stability and accuracy. SDPA is one of the main codes for solving general SDPs. SDPARA is a parallel version of SDPA on multiple processors with distributed memory, and it replaces the two major bottleneck parts of SDPA (the generation of the Schur complement matrix and its Cholesky factorization) with parallel implementations. In particular, it has been successfully applied to combinatorial optimization and truss topology optimization. The new version of SDPARA (7.5.0-G) on a large-scale supercomputer called TSUBAME 2.0 at the Tokyo Institute of Technology has successfully been used to solve the largest SDP problem (which has over 1.48 million constraints), and created a new world record. Our implementation has also achieved 533 TFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs. © 2012 IEEE..
57. Makoto Yamashita, Katsuki Fujisawa, Mituhiro Fukuda, Kazuhiro Kobayashi, Kazuhide Nakata, Maho Nakata, Latest developments in the SDPA family for solving large-scale SDPs, International Series in Operations Research and Management Science, 10.1007/978-1-4614-0769-0_24, 687-713, 2012.01.
58. Toyotaro Suzumura, Koji Ueno, Hitoshi Sato, Katsuki Fujisawa, Satoshi Matsuoka, Performance characteristics of Graph500 on large-scale distributed environment, Proceedings - 2011 IEEE International Symposium on Workload Characterization, IISWC - 2011, 10.1109/IISWC.2011.6114175, 149-158, 2011.11, Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory super-computers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation "replicated-csr" based on distributed level-synchronized breadth-first search solves a large graph problem with 2^31 vertices and 2^35 edges (approximately 2.15 billion vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines. © 2011 IEEE..
59. Optimization Software SDPA
The optimization software SDPA, which has been developed by our group, is a solver for symmetric cone programs. The symmetric cone program is a broad framework that includes linear programs, second-order cone programs, and semidefinite programs. It has many applications covering various fields such as combinatorial optimization, systems and control theory, robust optimization, and quantum chemistry. Primal-dual interior-point methods, which are polynomial-time algorithms, were proposed to solve symmetric cone programs, and SDPA is based on the primal-dual interior-point method. In addition, SDPA exploits the sparsity of data in several ways and uses parallel computation to solve huge problems efficiently. Using SDPA, we can obtain the solution of symmetric cone programs easily without knowing the details of the algorithm and its implementation techniques. This paper briefly explains SDPA and its variants, and then outlines the algorithmic framework of the primal-dual interior-point method..
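As a concrete, drastically simplified illustration of the path-following idea behind primal-dual interior-point methods, consider the one-variable problem min c·x subject to x ≥ 1 with c > 0. Its log-barrier subproblem has a closed-form minimizer, and driving the barrier parameter μ toward zero traces the central path to the optimum x* = 1. This toy is ours for illustration; it is not SDPA's algorithm:

```python
def central_path_point(c, mu):
    """Minimizer of the barrier subproblem  c*x - mu*log(x - 1)  for the
    toy problem  min c*x  s.t. x >= 1  (c > 0): setting the derivative
    c - mu/(x - 1) to zero gives x(mu) = 1 + mu/c."""
    return 1.0 + mu / c

def follow_central_path(c, mu0=1.0, shrink=0.5, tol=1e-9):
    """Shrink mu geometrically, re-solving the barrier subproblem at each
    step; x(mu) converges to the optimum x* = 1 as mu -> 0.  Real solvers
    take Newton steps here instead of using a closed form."""
    mu = mu0
    x = central_path_point(c, mu)
    while mu > tol:
        mu *= shrink
        x = central_path_point(c, mu)
    return x
```

For c = 2 the path starts at x(1) = 1.5 and follow_central_path(2.0) returns a value within 1e-6 of the optimum 1.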
60. Maho Nakata, Bastiaan J Braams, Katsuki Fujisawa, Mituhiro Fukuda, Jerome K Percus, Makoto Yamashita, Zhengji Zhao, Variational calculation of second-order reduced density matrices by strong N-representability conditions and an accurate semidefinite programming solver, The Journal of Chemical Physics, 10.1063/1.2911696, 128, 16, 164113-164113, 2008.04, The reduced density matrix (RDM) method, which is a variational calculation based on the second-order reduced density matrix, is applied to the ground-state energies and dipole moments for 57 different states of atoms and molecules, and to the ground-state energies and the elements of the 2-RDM for the Hubbard model. We explore the well-known N-representability conditions (P, Q, and G) together with the more recent and much stronger T1 and T2′ conditions. The T2′ condition was recently rederived, and it implies the T2 condition. Using these N-representability conditions, we can usually reproduce between 100% and 101% of the correlation energy, an accuracy similar to that of CCSD(T), and even better for high-spin states or anionic systems where CCSD(T) fails. Highly accurate calculations are carried out by handling equality constraints and/or by developing multiple-precision arithmetic in the semidefinite programming (SDP) solver. Results show that handling equality constraints correctly improves the accuracy by 0.1 to 0.6 mhartree. Additionally, improvements from replacing the T2 condition with the T2′ condition are typically 0.1-0.5 mhartree. The newly developed multiple-precision version of the SDP solver calculates extraordinarily accurate energies for the one-dimensional Hubbard model and the Be atom: it gives at least 16 significant digits for the energies, where double-precision calculations give only two to eight digits. It also provides physically meaningful results for the Hubbard model in the high-correlation limit..
61. Katsuki Fujisawa, Kazuhide Nakata, Makoto Yamashita, Mituhiro Fukuda, SDPA Project: Solving large-scale semidefinite programs, Journal of the Operations Research Society of Japan, 10.15807/jorsj.50.278, 50, 4, 278-298, 2007.12, The Semidefinite Program (SDP) has recently attracted much attention from researchers in various fields for the following reasons: (i) it has been intensively studied in both theoretical and numerical aspects; in particular, the primal-dual interior-point method is known as a powerful tool for solving large-scale SDPs with accuracy. (ii) Many practical problems in various fields such as combinatorial optimization, control and systems theory, robust optimization, and quantum chemistry can be modeled or approximated using SDPs. (iii) Several software packages for solving SDPs and related problems (e.g., the second-order cone program, SOCP) are available on the Internet. In 1995, we started the SDPA Project, aimed at solving large-scale SDPs with numerical stability and accuracy. The SDPA (SemiDefinite Programming Algorithm) is a C++ implementation of a Mehrotra-type primal-dual predictor-corrector interior-point method for solving the standard form SDP and its dual. We have also developed some variants of the SDPA to handle SDPs with various features. The SDPARA is a parallel version of the SDPA on multiple processors and distributed memory, which replaces two major bottleneck components of the SDPA with parallel implementations using MPI and ScaLAPACK. The SDPARA on parallel computers has attractive features: it can load a large-scale SDP into the distributed memory and solve it in a reasonable time. In this paper, we show through some numerical experiments that the SDPARA attains high performance. The SDPARA-C is an integration of the two software packages SDPARA and SDPA-C, the latter of which is a primal-dual interior-point method using the positive definite matrix completion technique. 
The SDPARA-C requires a small amount of memory when we solve sparse SDPs with a large-scale matrix variable and/or a large number of equality constraints. The paper also explains a grid portal system for solving SDPs, which we call the SDPA Online Solver. In this paper, we review the major achievements of the SDPA Project on solving large-scale SDPs. This paper provides introductory and comprehensive material for researchers who are interested in the practical computational aspects of SDPs..
62. Makoto Yamashita, Katsuki Fujisawa, Kazuhide Nakata, Parallel Solver for SemiDefinite Programming, International Journal of Logistics and SCM systems, 2, 1, 22-29, 2007.07.
63. Construction and Operation of the Grid Challenge Testbed
This paper presents a case study of operating the Grid testbed for the Grid Challenge at SACSIS2006. The Grid Challenge is a programming competition on a Grid testbed composed of multiple computing resources installed at universities and laboratories. In the last competition, a Grid testbed with more than 1,200 CPUs was operated. The paper describes the hardware and software specifications of the Grid testbed and reports on operational experience, including accounting, job management, and troubleshooting..
64. Makoto Yamashita, Katsuki Fujisawa, Mituhiro Fukuda, Masakazu Kojima, Kazuhide Nakata, Parallel Primal-Dual Interior-Point Methods for SemiDefinite Programs, Parallel Combinatorial Optimization, 10.1002/9780470053928.ch9, 211-238, 2006.04.
65. Katsuki Fujisawa, Mituhiro Fukuda, Kazuhide Nakata, Preprocessing sparse semidefinite programs via matrix completion, Optimization Methods and Software, 10.1080/10556780512331319523, 21, 1, 17-39, 2006.02, Considering that preprocessing is an important phase in linear programming, it should be more systematically incorporated in semidefinite programming (SDP) solvers. The conversion method proposed by the authors [Fukuda, M., Kojima, M., Murota, K. and Nakata, K., 2000, SIAM Journal on Optimization, 11, 647-674 and Nakata, K., Fujisawa, K., Fukuda, M., Kojima, M. and Murota, K., 2003, Mathematical Programming (Series B), 95, 303-327] is a preprocessing method for sparse SDPs based on matrix completion. This article proposes a new version of the conversion method, which employs a flop estimation function inside its heuristic procedure. Extensive numerical experiments are included, showing the advantage of preprocessing by the conversion method for certain classes of very sparse SDPs..
66. Takashi Funasaka, Masami Iwase, Katsuki Fujisawa, Shoshiro Hatakeyama, Visualization of stability of dynamical systems by 3D graphics supported by cluster computing, Proceedings of the Third Workshop - 2005 IEEE Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2005, 10.1109/IDAACS.2005.283052, 588-592, 2005.10, The stabilization of a nonlinear system depends strongly on its initial state and parameters. The initial states and parameters with which the system can be stabilized form a distinctive geometric structure, which is difficult and sometimes impossible to analyze analytically. It therefore becomes important to display and analyze the structure of the parameters and initial states numerically and visually. In this paper, we present a method to draw and visualize such regions and structures in three-dimensional space. In general, a projection of the original high-dimensional space onto a lower-dimensional one is required for visual analysis, so it is convenient if the viewpoint can be moved without delay in whatever direction the analyst wishes to examine. Every time the viewpoint moves, recomputation must be performed as quickly as possible to keep the motion smooth. Drawing the region, however, requires a large amount of computation, so high-performance hardware is needed for real-time drawing. To overcome this problem, FPGAs and cluster computing are used in this paper. Illustrative examples demonstrate that FPGAs and cluster computing achieve high performance in drawing, in 3D, the regions of parameters and initial states with which z_{n+1} = z_n^2 + C can be stabilized, i.e., the Mandelbrot and Julia sets, respectively. ©2005 IEEE..
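The per-point computation underlying such a visualization of z_{n+1} = z_n^2 + C is the classic escape-time test. A minimal serial sketch for intuition; the paper's contribution is accelerating millions of such evaluations with FPGAs and cluster computing, not this loop itself:

```python
def escape_time(c, max_iter=100, bound=2.0):
    """Iterate z_{n+1} = z_n^2 + c from z_0 = 0 and return the iteration
    index at which |z| first exceeds `bound`, or None if it never does
    within max_iter steps (in which case c is treated as belonging to
    the Mandelbrot set).  One such evaluation is performed per pixel
    when drawing the stability region."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > bound:
            return n
        z = z * z + c
    return None
```

For example, escape_time(0j) returns None (the origin is in the set), while escape_time(2 + 0j) returns 2, since the orbit 0, 2, 6 leaves the bound on the third check.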
67. M Kojima, Y Dai, K Fujisawa, S Kim, A Takeda, Parallel implementation of polyhedral continuation methods for systems of polynomial equations, MATHEMATICAL SOFTWARE, PROCEEDINGS, 283-284, 2002.10.
68. Maho Nakata, Hiroshi Nakatsuji, Masahiro Ehara, Mituhiro Fukuda, Kazuhide Nakata, Katsuki Fujisawa, Variational calculations of fermion second-order reduced density matrices by semidefinite programming algorithm, The Journal of Chemical Physics, 10.1063/1.1360199, 114, 19, 8282-8292, 2001.08.
69. TOPOLOGY OPTIMIZATION OF TRUSSES FOR SPECIFIED MULTIPLE LINEAR BUCKLING LOAD FACTORS BY USING SEMIDEFINITE PROGRAMMING
An algorithm based on Semi-Definite Programming (SDP) is proposed for the truss topology optimization problem for specified linear buckling load factor, and optimal topologies of trusses are computed by using the Semi-Definite Programming Algorithm (SDPA). It is well known that optimizing structures for specified buckling load factor is difficult because of non-differentiability of the buckling load factor for the case of multimodal solutions. It is shown, in the examples, that the proposed algorithm is applicable to multimodal cases, and an optimal topology with five-fold buckling load factors is found without any difficulty..
70. Kazuhide Nakata, Katsuki Fujisawa, Mituhiro Fukuda, Masakazu Kojima, Kazuo Murota, Solving Sparse Semidefinite Programs by Matrix Completion (Part 2) (Mathematical Science of Optimization), RIMS Kôkyûroku, 1174, 130-137, 2000.10.
71. Kazuhide Nakata, Katsuki Fujisawa, Mituhiro Fukuda, Masakazu Kojima, Kazuo Murota, Matrix Completion and Semidefinite Programming, Cooperative Research Report of the Institute of Statistical Mathematics, 135, 223-237, 2000.10.
72. A Wide-Area Parallel Computing System for SDPA, Software for Semidefinite Programming Problems.
73. Interior-Point Method Software for Semidefinite Programming Problems.
74. Kazuhide Nakata, Katsuki Fujisawa, Masakazu Kojima, Semidefinite Programming with the Conjugate Gradient Method, Cooperative Research Report of the Institute of Statistical Mathematics, 1113, 224-247, 1998.04.
75. Katsuki Fujisawa, Publications and Research Reports
http://sdpa.imi.kyushu-u.ac.jp/~fujisawa/research.html.