Kyushu University Academic Staff Educational and Research Activities Database
List of Presentations
Katsuki Fujisawa Last modified date: 2024.02.02

Professor / Division for Intelligent Societal Implementation of Mathematical Computation / Institute of Mathematics for Industry


Presentations
1. Katsuki Fujisawa, Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers, SC15, 2015.11.
2. Katsuki Fujisawa, Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers, HPCCON, 2015.10.
3. Katsuki Fujisawa, How to win Graph500 – A Challenge to Graph500 Benchmark –, Summer School for Combinatorial Optimization, Co@work, 2015.10.
4. Katsuki Fujisawa, A Challenge to Graph500 Benchmark: Trillion-Scale Graph Processing on K Computer, iDB2015, 2015.08.
5. Katsuki Fujisawa, A Challenge to Graph500 Benchmark: Trillion-Scale Graph Processing on K Computer, ISC15: HPC in Asia 02, 2015.07.
6. Katsuki Fujisawa, Large-Scale Graph Analysis for Cyber Security on Post Peta-Scale Supercomputers, Kyushu University Cybersecurity Center Opening Ceremony and Cybersecurity Symposium, 2015.07.
7. Katsuki Fujisawa, Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers, 2014 ATIP Workshop: Japanese Research Toward Next-Generation Extreme Computing, 2014.11.
8. Katsuki Fujisawa, Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers, IMI Workshop on Optimization in the Real World, 2014.10.
9. Katsuki Fujisawa, Petascale General Solver for Semidefinite Programming Problems with over Two Million Constraints, RTE-IBM Workshop Semi-Definite Programming for Optimal Power Flow Problem, 2014.04.
10. Katsuki Fujisawa, Toshio Endo, Yuichiro Yasui, Hitoshi Sato, Naoki Matsuzawa, Satoshi Matsuoka, Hayato Waki, Petascale general solver for semidefinite programming problems with over two million constraints, 28th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2014, 2014.01, The semidefinite programming (SDP) problem is one of the central problems in mathematical optimization. The primal-dual interior-point method (PDIPM) is one of the most powerful algorithms for solving SDP problems, and many research groups have employed it for developing software packages. However, two well-known major bottlenecks, i.e., the generation of the Schur complement matrix (SCM) and its Cholesky factorization, exist in the algorithmic framework of the PDIPM. We have developed a new version of SDPARA (SemiDefinite Programming Algorithm paRAllel version), a parallel implementation on multiple CPUs and GPUs for solving extremely large-scale SDP problems with over a million constraints. SDPARA can automatically extract the unique characteristics of an SDP problem and identify the bottleneck. When the generation of the SCM becomes the bottleneck, SDPARA can attain high scalability using a large number of CPU cores together with processor-affinity and memory-interleaving techniques. SDPARA can also perform parallel Cholesky factorization using thousands of GPUs and techniques for overlapping computation and communication when an SDP problem has over two million constraints and Cholesky factorization constitutes the bottleneck. We demonstrate through numerical experiments on the TSUBAME 2.5 supercomputer that SDPARA is a high-performance general solver for SDPs in various application fields, and we solved the largest SDP problem to date (over 2.33 million constraints), thereby creating a new world record. Our implementation also achieved 1.713 PFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs.
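Entry 10 above attributes the PDIPM's cost to two steps: forming the Schur complement matrix (SCM) and its Cholesky factorization. The minimal dense sketch below (toy data, single node, NumPy only; not the SDPARA implementation) shows where those two steps sit in one iteration's linear-algebra core, using the standard HKM-style element formula B[i,j] = Tr(A_i X A_j Z^{-1}):

```python
import numpy as np

# Toy SDP data: m symmetric constraint matrices A_1..A_m of size n x n.
rng = np.random.default_rng(0)
n, m = 20, 10
A = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]
X = np.eye(n)       # current primal iterate (toy choice)
Zinv = np.eye(n)    # inverse of the current dual iterate Z

# Bottleneck 1: generate the m x m Schur complement matrix,
# B[i, j] = trace(A_i @ X @ A_j @ Zinv).
B = np.empty((m, m))
for j in range(m):
    T = X @ A[j] @ Zinv                 # intermediate shared by column j
    for i in range(m):
        B[i, j] = np.sum(A[i] * T)      # = trace(A_i @ T), A_i symmetric

# Bottleneck 2: Cholesky-factorize B, then solve for the direction dy.
L = np.linalg.cholesky(B)
rhs = rng.standard_normal(m)
dy = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
```

At scale, the entry above parallelizes the first loop nest over many CPU cores and offloads the Cholesky step to thousands of GPUs.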
11. Yuichiro Yasui, Katsuki Fujisawa, Kazushige Goto, NUMA-optimized parallel breadth-first search on multicore single-node system, 2013 IEEE International Conference on Big Data, Big Data 2013, 2013.12, The breadth-first search (BFS) is one of the most important kernels in graph theory. The Graph500 benchmark measures the performance of any supercomputer performing a BFS in terms of traversed edges per second (TEPS). Previous studies have proposed hybrid approaches that combine a well-known top-down algorithm and an efficient bottom-up algorithm for large frontiers. This reduces unnecessary searching of outgoing edges in the BFS traversal of a small-world graph, such as a Kronecker graph. In this paper, we describe a highly efficient BFS using column-wise partitioning of the adjacency list while carefully considering the non-uniform memory access (NUMA) architecture. We explicitly manage the way in which each working thread accesses a partial adjacency list in local memory during BFS traversal. Our implementation has achieved a processing rate of 11.15 billion edges per second on a 4-way Intel Xeon E5-4640 system for a scale-26 problem of a Kronecker graph with 2^26 vertices and 2^30 edges. Not all of the speedup techniques in this paper are limited to NUMA architecture systems. With our winning Green Graph500 submission of June 2013, we achieved 64.12 GTEPS per kilowatt hour on an ASUS Pad TF700T with an NVIDIA Tegra 3 mobile processor.
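The hybrid top-down/bottom-up traversal named in entry 11 can be summarized in a few lines. Below is a serial, illustrative Python sketch only; the switching threshold `alpha` and the dict-of-lists graph format are assumptions for the example, and the actual implementation uses bitmaps, column-wise NUMA-local adjacency partitions, and many threads:

```python
def hybrid_bfs(adj, source, alpha=4):
    """Serial sketch of a direction-optimizing (hybrid) BFS.
    adj: dict mapping each vertex to a list of neighbours."""
    n = len(adj)
    dist = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        if len(frontier) * alpha < n - len(dist):
            # Top-down step: scan outgoing edges of the (small) frontier.
            nxt = {w for v in frontier for w in adj[v] if w not in dist}
        else:
            # Bottom-up step: each unvisited vertex searches its own edge
            # list for a frontier parent and stops at the first hit, which
            # skips the unnecessary edge scans mentioned in the abstract.
            nxt = {v for v in adj if v not in dist
                   and any(w in frontier for w in adj[v])}
        level += 1
        for v in nxt:
            dist[v] = level
        frontier = nxt
    return dist

# 4-cycle example: BFS distances from vertex 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(hybrid_bfs(adj, 0))   # distances {0: 0, 1: 1, 2: 1, 3: 2}
```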
12. Katsuki Fujisawa, Hitoshi Sato, Satoshi Matsuoka, Toshio Endo, Makoto Yamashita, Maho Nakata, High-performance general solver for extremely large-scale semidefinite programming problems, 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012, 2012.12, Semidefinite programming (SDP) is one of the most important classes of optimization problems at present. It is relevant to a wide range of fields such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network location and data mining. The capability to solve extremely large-scale SDP problems will have a significant effect on the current and future applications of SDP. In 1995, Fujisawa et al. started the SDPA (SemiDefinite Programming Algorithm) Project, aimed at solving large-scale SDP problems with high numerical stability and accuracy. SDPA is one of the main codes for solving general SDPs. SDPARA is a parallel version of SDPA for multiple processors with distributed memory; it replaces the two major bottleneck parts of SDPA (the generation of the Schur complement matrix and its Cholesky factorization) with parallel implementations. In particular, it has been successfully applied to combinatorial optimization and truss topology optimization. The new version of SDPARA (7.5.0-G), running on the large-scale TSUBAME 2.0 supercomputer at the Tokyo Institute of Technology, has successfully solved the largest SDP problem to date (over 1.48 million constraints), creating a new world record. Our implementation has also achieved 533 TFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs.
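For context, the standard-form primal-dual pair that SDPA and SDPARA target is the textbook formulation below (data C, A_i, b; variables X, y, Z; the inner product is Tr(CX)). The m equality constraints index the m x m Schur complement matrix, which is why entries with millions of constraints are the hard cases:

```latex
% Standard-form SDP pair solved by SDPA/SDPARA:
\begin{aligned}
\text{(P)}\quad &\min_{X}\ \langle C, X\rangle
  &&\text{s.t.}\ \langle A_i, X\rangle = b_i\ (i = 1,\dots,m),\ X \succeq 0,\\
\text{(D)}\quad &\max_{y,\,Z}\ b^{\top}y
  &&\text{s.t.}\ \sum_{i=1}^{m} y_i A_i + Z = C,\ Z \succeq 0.
\end{aligned}
```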
13. Toyotaro Suzumura, Koji Ueno, Hitoshi Sato, Katsuki Fujisawa, Satoshi Matsuoka, Performance characteristics of Graph500 on large-scale distributed environment, 2011 IEEE International Symposium on Workload Characterization, IISWC - 2011, 2011.12, Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory supercomputers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation "replicated-csr", based on distributed level-synchronized breadth-first search, solves a large scale-free graph problem with 2^31 vertices and 2^35 edges (approximately 2.15 billion vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes, numbers of compute nodes, and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.
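The figure of merit quoted in entry 13 follows directly from the problem size and the BFS time; a quick check of the numbers in the abstract:

```python
# TEPS = (edges in the searched graph) / (BFS time in seconds).
edges = 2 ** 35          # about 34.4 billion edges
seconds = 3.09
print(f"{edges / seconds / 1e9:.1f} GTEPS")   # -> 11.1, i.e. ~11 giga-edges/s
```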
14. Makoto Yamashita, Katsuki Fujisawa, Efficient parallel software for large-scale SemiDefinite Programs, 2010 IEEE International Symposium on Computer-Aided Control System Design, CACSD 2010, 2010.12, SemiDefinite Program (SDP) is one of the principal problems in mathematical programming. Its application range is very wide and covers problems arising from control theory, for example, stability conditions for differential inclusions and discrete-time optimal control problems. Solving these applications, however, sometimes requires long computation times, since they generate large-scale SDPs. When Primal-Dual Interior-Point Methods (PDIPMs) are employed for solving large-scale SDPs, most of the computation time is occupied by computation related to the Schur Complement Matrix (SCM). We have developed SDPARA (SemiDefinite Programming Algorithm paRAllel version) to deal with such large-scale SDPs. In particular, the latest version of SDPARA can handle sparse SCMs adequately. In this paper, we concisely describe how the parallel implementation of SDPARA shortens the computation time of the SCM and then discuss the latest implementation for sparse SCMs. Numerical results show that SDPARA achieves remarkable parallel scalability and enables us to solve large-scale SDPs from control theory.
15. Takashi Funasaka, Masami Iwase, Katsuki Fujisawa, Shoshiro Hatakeyama, Visualization of stability of dynamical systems by 3D graphics supported by cluster computing, 3rd IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2005, 2005.01, The stabilization of nonlinear systems depends strongly on the initial state and the parameters of the system. The initial states and parameters with which the system can be stabilized form a distinctive geometrical structure, which is, however, difficult and sometimes impossible to analyze analytically. It therefore becomes important to exhibit and analyze the structure of the parameters and initial states numerically and visually. In this paper, we present a method to draw and visualize such regions and structures in three-dimensional space. In general, a projection of the original high-dimensional space onto a lower-dimensional one is required for visual analysis, so it is convenient if the viewpoint can be moved, without delay, in whatever direction the analyst would like to look. Whenever the viewpoint moves, recomputation must be performed as quickly as possible to keep the motion smooth; drawing the region, however, clearly takes a large amount of computation and time, so high-performance computing is needed for real-time rendering. To overcome this problem, FPGAs and cluster computing are used in this paper. Illustrative examples then demonstrate that FPGAs and cluster computing deliver the performance needed to draw, in 3D, the regions of parameters and initial states for which z_{n+1} = z_n^2 + C remains stable, that is, the Mandelbrot and Julia sets, respectively.
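The stability region in entry 15 is the classic escape-time computation for z_{n+1} = z_n^2 + C. The sketch below is the serial per-point kernel that the paper's FPGA/cluster setup parallelizes and re-renders as the viewpoint moves; the grid bounds, resolution, and iteration cap are illustrative choices, not the paper's settings:

```python
import numpy as np

def escape_time(z0, c, max_iter=100, radius=2.0):
    """Iterate z <- z**2 + c; return the step at which |z| exceeds
    `radius`, or max_iter if the orbit stays bounded (deemed stable)."""
    z = z0
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c
    return max_iter

# Mandelbrot set: fix z0 = 0 and sweep the parameter C over a grid.
# (A Julia set fixes C instead and sweeps the initial state z0.)
ys, xs = np.mgrid[-1.5:1.5:256j, -2.0:1.0:256j]
img = np.array([[escape_time(0j, complex(x, y)) for x, y in zip(rx, ry)]
                for rx, ry in zip(xs, ys)])
```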
16. Katsuki Fujisawa, Masakazu Kojima, Akiko Takeda, Makoto Yamashita, High performance grid and cluster computing for some optimization problems, Proceedings - 2004 International Symposium on Applications and the Internet Workshops (SAINT 2004 Workshop), 2004.06, The aim of this short article is to show that grid and cluster computing provides tremendous power to optimization methods. The methods that the article takes up are a successive convex relaxation method for quadratic optimization problems, a polyhedral homotopy method for polynomial systems of equations, and a primal-dual interior-point method for semidefinite programming problems. Their parallel implementations on grids and clusters, together with numerical results, are reported.
17. Katsuki Fujisawa, Yukinobu Hamuro, Naoki Katoh, Takeshi Tokuyama, Katsutoshi Yada, Approximation of optimal two-dimensional association rules for categorical attributes using semidefinite programming, 2nd International Conference on Discovery Science, DS 1999, 1999.01, We consider the problem of finding two-dimensional association rules for categorical attributes. Suppose we have two conditional attributes A and B, both of whose domains are categorical, and one binary target attribute whose domain is {“positive”, “negative”}. We want to split the Cartesian product of the domains of A and B into two subsets so that a certain objective function is optimized, i.e., we want to find a good segmentation of the domains of A and B. We consider in this paper the objective function that maximizes the confidence under an upper-bound constraint on the support size. We first prove that the problem is NP-hard, and then propose an approximation algorithm based on semidefinite programming. To evaluate the effectiveness and efficiency of the proposed algorithm, we carry out computational experiments on problem instances generated from real sales data consisting of attributes whose domain sizes are at most a few hundred. Approximation ratios of the obtained solutions, measured against the semidefinite programming relaxation, range from 76% to 95%. We observe that the performance of the generated association rules is significantly superior to that of one-dimensional rules.
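The relaxation in entry 17 follows the usual lift-and-round pattern for segmentation problems with two-sided (±1) labels. The generic template below is stated for orientation only; it is not the paper's exact formulation, which additionally carries the support-size constraint. Labelling each category i with x_i = ±1 according to its side of the segmentation and collecting the objective into a matrix W:

```latex
% Generic lift-and-round SDP relaxation of a +/-1 segmentation problem:
\max_{x \in \{-1,+1\}^{n}} x^{\top} W x
\quad\longrightarrow\quad
\max_{X \succeq 0}\ \operatorname{Tr}(W X)
\quad\text{s.t.}\ X_{ii} = 1\ (i = 1,\dots,n)
```

The relaxation is obtained by writing X = xx^T and dropping the rank-one requirement; rounding the relaxed solution back to a segmentation and comparing its value with the relaxation bound is the kind of comparison behind the 76% to 95% ratios reported above.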