Faculty Profiles - NAKASHIMA YUTO

Information

写真a

NAKASHIMA YUTO

Organization

Faculty of Information Science and Electrical Engineering Department of Informatics Associate Professor
Joint Graduate School of Mathematics for Innovation （Concurrent）
School of Sciences Department of Physics（Concurrent）
Graduate School of Information Science and Electrical Engineering Department of Information Science and Technology（Concurrent）

Profile

文字列の組合せ論および文字列データ処理アルゴリズムの研究を行なっている．教育では主に，理学部物理学科情報理学コースの講義・演習を担当している．

Homepage

https://sites.google.com/site/yuto0nakashima/

External link

Research Areas

Informatics / Theory of informatics

Degree

Ph.D.(Information Science)

Research History

Kyushu University Faculty of Information Science and Electrical Engineering, Department of Informatics Associate Professor

2024.6 - Present

　 More details

Country：Japan

researchmap
Kyushu University Faculty of Information Science and Electrical Engineering, Department of Informatics Assistant Professor

2017.4 - 2024.5

　 More details

researchmap

Education

Kyushu University 大学院システム情報科学府情報学専攻

2014.4 - 2017.3
Kyushu University 大学院システム情報科学府情報学専攻

2012.4 - 2014.3
Kyushu University 理学部物理学科

2008.4 - 2012.3

Research Interests・Research Keywords

Research theme： String Algorithm

Keyword： String Algorithm

Research period： 2024 - Present
Research theme： Combinatorics on words

Keyword： Combinatorics on words

Research period： 2024 - Present
Research theme： String Processing Algorithms, Combinatorics on Words

Keyword： Pattern Matching, Streaming Model, Lyndon Words

Research period： 2017.4

Awards

IPSJ Yamashita SIG Research Award

2024.3 Information Processing Society of Japan アルファベット順による lex-parse サイズ比

　More details

researchmap
SPIRE2020 Best Paper Award

2020.10 SPIRE2020

Papers

On the Compressiveness of the Burrows-Wheeler Transform. Reviewed

Hideo Bannai, Tomohiro I., Yuto Nakashima

Proceedings of the 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025) 17:1 - 17:15 2025.6

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.4230/LIPIcs.CPM.2025.17

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/cpm/cpm2025.html#BannaiI025
Computing maximal palindromes in non-standard matching models. Reviewed

Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Inf. Comput. 304 2025.5

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.ic.2025.105283

researchmap
Tight bounds for the sensitivity of CDAWGs with left-end edits. Reviewed

Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga

Acta Informatica 62 2025.3

　More details

Publishing type：Research paper (scientific journal)

DOI： 10.1007/s00236-025-00478-y

researchmap
Maximal α-Gapped Repeats in a Fibonacci String Reviewed

Kazuma Yamane, Yuto Nakashima, Kazuhisa Seto, Takashi Horiyama

Proceedings of 50th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2025) 337 - 350 2025.1

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-82697-9_25

researchmap
On the Number of Non-equivalent Parameterized Squares in a String Reviewed

Rikuya Hamai, Kazushi Taketsugu, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

Proceedings of 31th International Symposium on String Processing and Information Retrieval (SPIRE 2024) 174 - 183 2024.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-72200-4_13

researchmap
Edit and Alphabet-Ordering Sensitivity of Lex-Parse Reviewed

Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, Hideo Bannai

Proceedings of the 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024) 75:1 - 75:15 2024.8

　More details

Language：English Publishing type：Research paper (international conference proceedings)

researchmap
Computing Longest Common Subsequence Under Cartesian-Tree Matching Model. Reviewed

Taketo Tsujimoto, Hiroki Shibata, Takuya Mieno, Yuto Nakashima 0001, Shunsuke Inenaga

Proceedings of the 35th International Workshop on Combinatorial Algorithms (IWOCA 2024) 369 - 381 2024.7

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-63021-7_28

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/iwoca/iwoca2024.html#TsujimotoSMNI24
Computing Maximal Palindromes in Non-standard Matching Models. Reviewed

Mitsuru Funakoshi, Takuya Mieno, Yuto Nakashima 0001, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of the 35th International Workshop on Combinatorial Algorithms (IWOCA 2024) 165 - 179 2024.7

　More details

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-63021-7_13

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/iwoca/iwoca2024.html#FunakoshiMNIBT24
Faster space-efficient STR-IC-LCS computation Reviewed

Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

Theoretical Computer Science 1003 114607 - 114607 2024.5 （ ISSN:0304-3975 ）

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Elsevier BV

One of the most fundamental method for comparing two given strings A and B is the longest common subsequence (LCS), where the task is to find (the length) of an LCS of A and B. In this paper, we deal with the STR-IC-LCS1 problem which is one of the constrained LCS problems proposed by Chen and Chao [J. Comb. Optim, 2011]. A string Z is said to be an STR-IC-LCS of three given strings A, B, and P, if Z is a longest string satisfying that (1) Z includes P as a substring and (2) Z is a common subsequence of A and B. We present three efficient algorithms for this problem: First, we begin with a space-efficient solution which computes the length of an STR-IC-LCS in O(n2) time and O((ℓ+1)(n−ℓ+1)) space, where ℓ is the length of an LCS of A and B of length n. When ℓ=O(1) or n−ℓ=O(1), then this algorithm uses only linear O(n) space. Second, we present a faster algorithm that works in O(nr/log⁡r+n(n−ℓ+1)) time, where r is the length of P, while retaining the O((ℓ+1)(n−ℓ+1)) space efficiency. Third, we give an alternative algorithm that runs in O(nr/log⁡r+n(n−ℓ′+1)) time with O((ℓ′+1)(n−ℓ′+1)) space, where ℓ′ denotes the STR-IC-LCS length for input strings A, B, and P.

DOI： 10.1016/j.tcs.2024.114607

Scopus

researchmap
Largest Repetition Factorization of Fibonacci Words Reviewed

Kaisei Kishi, Yuto Nakashima, Shunsuke Inenaga

Proceedings of 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023) 2023.9

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Linear-Time Computation of Generalized Minimal Absent Words for Multiple Strings. Reviewed

Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

Proceedings of 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023) 14240 LNCS 331 - 344 2023.9 （ ISSN:0302-9743 ISBN:9783031439797, 9783031439803 eISSN:1611-3349 ）

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer Nature Switzerland

A string w is called a minimal absent word (MAW) for a string S if w does not occur as a substring in S and all proper substrings of w occur in S. MAWs are well-studied combinatorial string objects that have potential applications in areas including bioinformatics, musicology, and data compression. In this paper, we generalize the notion of MAWs to a set $$\mathcal {S} = \{S_1, \ldots, S_k\}$$ of multiple strings. We first describe our solution to the case of $$k = 2$$ strings, and show how to compute the set $$\textsf{M}$$ of MAWs in optimal $$O(n + |\textsf{M}|)$$ time and with O(n) working space, where n denotes the total length of the strings in $$\mathcal {S}$$. We then move on to the general case of $$k > 2$$ strings, and show how to compute the set $$\textsf{M}$$ of MAWs in $$O(n \lceil k / \log n \rceil + |\textsf{M}|)$$ time and with $$O(n (k + \log n))$$ bits of working space, in the word RAM model with machine word size $$\omega = \log n$$. The latter algorithm runs in optimal $$O(n + |\textsf{M}|)$$ time for $$k = O(\log n)$$.

DOI： 10.1007/978-3-031-43980-3_27

Scopus

researchmap
Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph. Reviewed

Hiroki Arimura, Shunsuke Inenaga, Yasuaki Kobayashi, Yuto Nakashima, Mizuki Sue

Proceedings of 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023) 14240 LNCS 28 - 34 2023.9 （ ISSN:03029743 ISBN:9783031439797 ）

　More details

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where $$r, z \leqslant e$$. As main results, we showed that the above structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without a text in O(e) worst-case time and words of working space. To obtain the above results, we devised techniques for enumerating a particular subset of suffixes in the lexicographic and text orders using the forward and backward search on the CDAWG by extending the result by Belazzougui et al. in 2015.

DOI： 10.1007/978-3-031-43980-3_3

Scopus

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/spire/spire2023.html#ArimuraIKNS23
Computing SEQ-IC-LCS of Labeled Graphs Reviewed International journal

Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga

Proceedings of the Prague Stringology Conference 2023 (PSC 2023) 2023.8

　More details

Language：English Publishing type：Research paper (international conference proceedings)
On Sensitivity of Compact Directed Acyclic Word Graphs Reviewed International journal

Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga

Proceedings of 14th International Conference on Words (WORDS 2023) 168 - 180 2023.6

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Optimal LZ-End Parsing Is Hard Reviewed International journal

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, Takeaki Uno

Proceedings of the 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) 3:1 - 3:11 2023.6

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Space-Efficient STR-IC-LCS Computation Reviewed International journal

Yuuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

Proceedings of 47th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2023) 372 - 384 2023.1

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Space-Efficient STR-IC-LCS Computation. Reviewed

Yuuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

SOFSEM 372 - 384 2023

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-23101-8_25

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/sofsem/sofsem2023.html#YonemotoNIB23
Largest Repetition Factorization of Fibonacci Words. Reviewed

Kaisei Kishi, Yuto Nakashima, Shunsuke Inenaga

SPIRE 284 - 296 2023

　More details

Publishing type：Research paper (international conference proceedings)

DOI： 10.1007/978-3-031-43980-3_23

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/spire/spire2023.html#KishiNI23
Parameterized DAWGs: Efficient constructions and bidirectional pattern searches Reviewed

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

Theoretical Computer Science 933 21 - 42 2022.10 （ ISSN:0304-3975 eISSN:1879-2294 ）

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Elsevier BV

Two strings x and y over Σ∪Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f:Σ∪Π→Σ∪Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n2) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give an O(n|Π|log⁡(|Π|+|Σ|))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: Given the parameterized suffix tree for the reversal T‾ of the input string T, one can build the PDAWG of T in O(n) time in an offline manner; One can perform bidirectional p-matching in O(mlog⁡(|Π|+|Σ|)+occ) time using O(n) space, where m denotes the pattern length and occ is the number of pattern occurrences in the text T.

DOI： 10.1016/j.tcs.2022.09.008

Web of Science

Scopus

researchmap
Combinatorics of minimal absent words for a sliding window Reviewed

Tooru Akagi, Yuki Kuhara, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theoretical Computer Science 927 109 - 119 2022.6 （ ISSN:0304-3975 eISSN:1879-2294 ）

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Elsevier BV

A string w is called a minimal absent word (MAW) for another string T if w does not occur in T but the proper substrings of w occur in T. For example, let Σ={a,b,c} be the alphabet. Then, the set of MAWs for string w=abaab is {aaa,aaba,bab,bb,c}. In this paper, we study combinatorial properties of MAWs in the sliding window model, namely, how the set of MAWs changes when a sliding window of fixed length d is shifted over the input string T of length n, where 1≤d<n. We present tight upper and lower bounds on the maximum number of changes in the set of MAWs for a sliding window over T, both in the cases of general alphabets and binary alphabets. Our bounds improve on the previously known best bounds [Crochemore et al., 2020].

DOI： 10.1016/j.tcs.2022.06.002

Web of Science

Scopus

researchmap
Minimal Absent Words on Run-Length Encoded Strings Reviewed International journal

Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga

Proceedings of the 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022) 27:1 - 27:17 2022.6

　More details

Language：English Publishing type：Research paper (international conference proceedings)
c-trie++: A dynamic trie tailored for fast prefix searches Reviewed

Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Information and Computation 285 104794 - 104794 2022.5 （ ISSN:0890-5401 ）

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Elsevier BV

Given a dynamic set K of k strings of total length n whose characters are drawn from an alphabet of size σ, a keyword dictionary is a data structure built on K that provides lookup, prefix search, and update operations on K. Under the assumption that α=w/lg⁡σ characters fit into a single machine word of w bits, we propose a keyword dictionary that represents K in either nlg⁡σ+Θ(klg⁡n) or |T|lg⁡σ+Θ(kw) bits of space, where |T| is the number of nodes of a trie representing K. It supports all operations in O(m/α+lg⁡α) expected time on an input string of length m in the word RAM model. An evaluation of our implementation highlights the practical usefulness of the proposed data structure, especially for prefix searches — one of the most essential keyword dictionary operations.

DOI： 10.1016/j.ic.2021.104794

Scopus

researchmap
Factorizing strings into repetitions Reviewed International journal

Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theory of Computing Systems 66 ( 2 ) 484 - 501 2022.4

　More details

Language：English Publishing type：Research paper (scientific journal)
Palindromic Trees for a Sliding Window and Its Applications Reviewed

Takuya Mieno, Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Information Processing Letters 173 106174 - 106174 2022.1 （ ISSN:0020-0190 eISSN:1872-6119 ）

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Elsevier BV

The palindromic tree (a.k.a. eertree) for a string S of length n is a tree-like data structure that represents the set of all distinct palindromic substrings of S, using O(n) space [Rubinchik and Shur, 2018]. It is known that, when S is over an alphabet of size σ and is given in an online manner, then the palindromic tree of S can be constructed in O(nlog⁡σ) time with O(n) space. In this paper, we consider the sliding window version of the problem: For a sliding window of length at most d, we present two versions of an algorithm which maintains the palindromic tree of size O(d) for every sliding window S[i..j] over S, where 1≤j−i+1≤d. The first version works in O(nlog⁡σ′) time with O(d) space where σ′≤d is the maximum number of distinct characters in the windows, and the second one works in O(n+dσ) time with (d+2)σ+O(d) space. We also show how our algorithms can be applied to efficient computation of minimal unique palindromic substrings (MUPS) and minimal absent palindromic words (MAPW) for a sliding window.

DOI： 10.1016/j.ipl.2021.106174

Web of Science

Scopus

researchmap
Factorizing Strings into Repetitions. Reviewed

Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theory Comput. Syst. 66 ( 2 ) 484 - 501 2022

　More details

Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media {LLC}

DOI： 10.1007/s00224-022-10070-3

researchmap
The Parameterized Suffix Tray Reviewed International journal

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 12th International Conference on Algorithms and Complexity (CIAC 2021) 258 - 270 2021.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Efficiently computing runs on a trie Reviewed

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theoretical Computer Science 887 143 - 151 2021.10

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.tcs.2021.07.011
Position Heaps for Cartesian-Tree Matching on Strings and Tries Reviewed International journal

Akio Nishimoto, Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga

Proceedings of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021) 241 - 254 2021.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)
On the Approximation Ratio of LZ-End to LZ77 Reviewed International journal

Takumi Ideue, Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

Proceedings of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021) 114 - 126 2021.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Longest Common Rollercoasters Reviewed International journal

Kosuke Fujita, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedinds of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021) 21 - 32 2021.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Grammar Index by Induced Suffix Sorting Reviewed International journal

Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021) 85 - 99 2021.10

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Counting Lyndon Subsequence Reviewed

Ryo Hirakawa, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

Proceedings of the Prague Stringology Conference 2021 (PSC 2021) 53 - 60 2021.8

　More details

Language：English Publishing type：Research paper (international conference proceedings)
Computing Minimal Unique Substrings for a Sliding Window Reviewed

Takuya Mieno, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Algorithmica 84 ( 3 ) 670 - 693 2021.8 （ ISSN:0178-4617 eISSN:1432-0541 ）

　More details

Language：English Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

<title>Abstract</title>A substring <italic>u</italic> of a string <italic>T</italic> is called a <italic>minimal unique substring</italic> (<italic>MUS</italic>) of <italic>T</italic> if <italic>u</italic> occurs exactly once in <italic>T</italic> and any proper substring of <italic>u</italic> occurs at least twice in <italic>T</italic>. In this paper, we study the problem of computing MUSs for a sliding window over a given string <italic>T</italic>. We first show how the set of MUSs can change when the window slides over <italic>T</italic>. We then present an <inline-formula><alternatives><tex-math>$$O(n\log \sigma ')$$</tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
<mml:mrow>
<mml:mi>O</mml:mi>
<mml:mo>(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>log</mml:mo>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:math></alternatives></inline-formula>-time and <italic>O</italic>(<italic>d</italic>)-space algorithm to compute MUSs for a sliding window of size <italic>d</italic> over the input string <italic>T</italic> of length <italic>n</italic>, where <inline-formula><alternatives><tex-math>$$\sigma '\le d$$</tex-math><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
<mml:mrow>
<mml:msup>
<mml:mi>σ</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mo>≤</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:math></alternatives></inline-formula> is the maximum number of distinct characters in every window.

DOI： 10.1007/s00453-021-00864-1

Web of Science

Scopus

researchmap

Other Link： https://link.springer.com/article/10.1007/s00453-021-00864-1/fulltext.html
Compressed Communication Complexity of Hamming Distance Reviewed

Shiori Mitsuya, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Algorithms 14 ( 4 ) 116 - 116 2021.4

　More details

Language：English Publishing type：Research paper (scientific journal)

DOI： 10.3390/a14040116
Computing longest palindromic substring after single-character or block-wise edits. Reviewed

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theor. Comput. Sci. 859 116 - 133 2021.3

　More details

Language：Others Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.tcs.2021.01.014
Space-efficient algorithms for computing minimal/shortest unique substrings. Reviewed

Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theor. Comput. Sci. 845 230 - 242 2020.12

　More details

Language：Others Publishing type：Research paper (scientific journal)

DOI： 10.1016/j.tcs.2020.09.017
Towards Efficient Interactive Computation of Dynamic Time Warping Distance. Reviewed

Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 27th International Symposium on String Processing and Information Retrieval (SPIRE 2020) 27 - 41 2020.10

　More details

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1007/978-3-030-59212-7_3
On Repetitiveness Measures of Thue-Morse Words. Reviewed

Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 27th International Symposium on String Processing and Information Retrieval (SPIRE 2020) 213 - 220 2020.10

　More details

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1007/978-3-030-59212-7_15
Lyndon Words, the Three Squares Lemma, and Primitive Squares. Reviewed

Hideo Bannai, Takuya Mieno, Yuto Nakashima

Proceedings of 27th International Symposium on String Processing and Information Retrieval (SPIRE 2020) 265 - 273 2020.10

　More details

Language：Others Publishing type：Research paper (other academic)

DOI： 10.1007/978-3-030-59212-7_19
Grammar-compressed Self-index with Lyndon Words Reviewed

Kazuya Tsuruta, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

情報処理学会論文誌数理モデル化と応用（TOM） 13 ( 2 ) 84 - 92 2020.8

　More details

Language：English

Grammar-compressed Self-index with Lyndon Words
We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of O(g) words of spacethat can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern P of length m in O(m + lg m lg n + occ lg g) time, where n is the length of T, g is the size of the Lyndon SLP for T, and occ is the number of occurrences of P in T.We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of O(g) words of spacethat can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern P of length m in O(m + lg m lg n + occ lg g) time, where n is the length of T, g is the size of the Lyndon SLP for T, and occ is the number of occurrences of P in T.
Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara

31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 2020.6

　More details

Language：English Publishing type：Research paper (other academic)

The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n2) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(n log2 n) and O(n log n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case P = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints. 2012 ACM Subject Classification Mathematics of computing ! Combinatorial algorithms.

DOI： 10.4230/LIPIcs.CPM.2020.12
DAWGs for Parameterized Matching Online Construction and Related Indexing Structures

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 2020.6

　More details

Language：English Publishing type：Research paper (other academic)

Two strings x and y over of equal length are said to parameterized match (p-match) if there is a renaming bijection f : That is identity on and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have(n2) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n log( + ))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. 2012 ACM Subject Classification Theory of computation ! Pattern matching.

DOI： 10.4230/LIPIcs.CPM.2020.26
Practical grammar compression based on maximal repeats Reviewed

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

Algorithms 13 ( 4 ) 2020.4

　More details

Language：English Publishing type：Research paper (scientific journal)

This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which considers the one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. The results of the experiments comparing the size of constructed grammars and execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts.

DOI： 10.3390/A13040103
C-Trie++ A dynamic trie tailored for fast prefix searches

Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

2020 Data Compression Conference, DCC 2020 Proceedings - DCC 2020 Data Compression Conference 243 - 252 2020.3

　More details

Language：English Publishing type：Research paper (other academic)

Given a dynamic set K of k strings of total length n whose characters are drawn from an alphabet of size σ, a keyword dictionary is a data structure built on K that provides locate, prefix search, and update operations on K. Under the assumption that α = w / lg σ characters fit into a single machine word w, we propose a keyword dictionary that represents K in n lg σ + Θ(k lg n) bits of space, supporting all operations in Θ(m / α + lg α) expected time on an input string of length m in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations.

DOI： 10.1109/DCC47342.2020.00032
Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings Reviewed

Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theory of Computing Systems 2020.1

　More details

Language：English Publishing type：Research paper (scientific journal)

For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s,t] in S, if S[i.j] occurs exactly once in S, the interval [i,j] contains [s,t], and every palindromic substring containing [s,t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLE_S of size m in O(m) space and O(mlogσRLES+mlogm/loglogm) time so that all SUPSs for any subsequent query interval can be answered in O(logm/loglogm+α) time, where α is the number of outputs, and σRLES is the number of distinct runs of RLE_S. Additionaly, we consider a variant of the SUPS problem where a query interval is also given in a run-length encoded form. For this variant of the problem, we present two alternative algorithms with faster queries. The first one answers queries in O(loglogm/logloglogm+α) time and can be built in O(mlogσRLES+mlogm/loglogm) time, and the second one answers queries in O(log log m+ α) time and can be built in O(mlogσRLES) time. Both of these data structures require O(m) space.

DOI： 10.1007/s00224-020-09980-x
Minimal Unique Substrings and Minimal Absent Words in a Sliding Window

Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020 148 - 160 2020.1

　More details

Language：English Publishing type：Research paper (other academic)

A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017).

DOI： 10.1007/978-3-030-38919-2_13
Faster STR-EC-LCS Computation

Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020 125 - 135 2020.1

　More details

Language：English Publishing type：Research paper (other academic)

The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings A and B. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string Z is said to be an STR-EC-LCS of two given strings A and B excluding P if, Z is one of the longest common subsequences of A and B that does not contain P as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in O(mnr) time and space where [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in time where denotes the set of distinct characters occurring in both A and B, and L is the length of the STR-EC-LCS. This algorithm is faster than the O(mnr)-time algorithm for short/long STR-EC-LCS (namely, or), and is at least as efficient as the O(mnr)-time algorithm for all cases.

DOI： 10.1007/978-3-030-38919-2_11
An improved data structure for left-right maximal generic words problem

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 30th International Symposium on Algorithms and Computation, ISAAC 2019 2019.12

　More details

Language：English Publishing type：Research paper (other academic)

For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p| + r · log n + o · log² n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.

DOI： 10.4230/LIPIcs.ISAAC.2019.40
On the size of the smallest alphabet for Lyndon trees Reviewed

Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theoretical Computer Science 792 131 - 143 2019.11

　More details

Language：English Publishing type：Research paper (scientific journal)

We consider the problem of reverse-engineering the Lyndon tree, i.e., given a full binary ordered tree T with n leaves as input, we are to compute a string w of length n of which Lyndon tree is isomorphic to the input tree T. Hereby we call such a string a solution string. Although the problem is easily solvable in linear time for binary alphabets and unbounded-size alphabets, it is not known how to efficiently find the smallest alphabet size for a solution string. In this paper, we show several new observations concerning this problem. Namely, we show that: 1) For any positive integer n, there exists a full binary ordered tree T with n leaves, s.t. the smallest alphabet size of a solution string for T is ⌊ [Formula presented] ⌋+1. 2) For any full binary ordered tree T with n leaves, there exists a solution string w over an alphabet of size at most ⌊ [Formula presented] ⌋+1. 3) For any full binary ordered tree T, there exists a solution string w over an alphabet of size at most h+1, where h is the height of T. 4) For any complete binary ordered tree T with 2^k leaves, there exists a solution string w over an alphabet of size at most 4.

DOI： 10.1016/j.tcs.2018.06.044
Direct Linear Time Construction of Parameterized Suffix and LCP Arrays for Constant Alphabets

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 382 - 391 2019.10

　More details

Language：English Publishing type：Research paper (other academic)

We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be built first. More formally, for a string over static alphabet and parameterized alphabet, our algorithm runs in time and O(n) words of space, where is the number of distinct symbols of in the string.

DOI： 10.1007/978-3-030-32686-9_27
On Longest Common Property Preserved Substring Queries

Kazuki Kai, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Tomasz Kociumaka

Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 162 - 174 2019.10

　More details

Language：English Publishing type：Research paper (other academic)

We revisit the problem of longest common property preserving substring queries introduced by Ayad et al. (SPIRE 2018, arXiv 2018). We consider a generalized and unified on-line setting, where we are given a set X of k strings of total length n that can be pre-processed so that, given a query string y and a positive integer (Formula presented), we can determine the longest substring of y that satisfies some specific property and is common to at least (Formula presented) strings in X. Ayad et al. considered the longest square-free substring in an on-line setting and the longest periodic and palindromic substring in an off-line setting. In this paper, we give efficient solutions in the on-line setting for finding the longest common square, periodic, palindromic, and Lyndon substrings. More precisely, we show that X can be pre-processed in O(n) time resulting in a data structure of O(n) size that answers queries in (Formula presented) time and O(1) working space, where (Formula presented) is the size of the alphabet, and the common substring must be a square, a periodic substring, a palindrome, or a Lyndon word.

DOI： 10.1007/978-3-030-32686-9_12
Compact Data Structures for Shortest Unique Substring Queries

Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 107 - 123 2019.10

　More details

Language：English Publishing type：Research paper (other academic)

Given a string T of length n, a substring of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e.), and (c) every substring v of T with containing [s, t] occurs at least twice in T. Given a query interval, the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a bits data structure answering an interval SUS query in output-sensitive time, where is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for. Here, we propose a bits data structure answering a point SUS query in the same output-sensitive time.

DOI： 10.1007/978-3-030-32686-9_8
Computing maximal palindromes and distinct palindromes in a trie

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

23rd Prague Stringology Conference, PSC 2019 Proceedings of the Prague Stringology Conference, PSC 2019 3 - 15 2019.8

　More details

Language：English Publishing type：Research paper (other academic)

It is known that all maximal palindromes of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. Also, all distinct palindromes in T can be computed in O(n) time [Groult et al., Inf. Process. Lett. 2010]. In this paper, we consider the problem of computing maximal palindromes and distinct palindromes of a given trie T (i.e. rooted edge-labeled tree). A trie is a natural generalization of a string which can be seen as a single path tree. We propose algorithms to compute all maximal palindromes and all distinct palindromes in T in O(N log h) time and O(N) space, where N is the number of edges in T and h is the height of T . To our knowledge these are the first sub-quadratic time solutions to these problems.
Shortest unique palindromic substring queries on run-length encoded strings

Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 30th International Workshop on Combinatorial Algorithms, IWOCA 2019 430 - 441 2019.7

　More details

Language：English Publishing type：Research paper (other academic)

For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if S[i.j] occurs exactly once in S, the interval [i, j] contains [s, t], and every palindromic substring containing [s, t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLE_S of size m in O(m) space and O(mlog σRLE_S + m√log m/log logm) time so that all SUPSs for any subsequent query interval can be answered in O(√log m/log logm+ α) time, where α is the number of outputs, and σRLE_S is the number of distinct runs of RLE_S.

DOI： 10.1007/978-3-030-25005-8_35
Computing runs on a trie

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019 2019.6

　More details

Language：English Publishing type：Research paper (other academic)

A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n√log n log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

DOI： 10.4230/LIPIcs.CPM.2019.23
On the size of overlapping Lempel-Ziv and Lyndon factorizations

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019 2019.6

　More details

Language：English Publishing type：Research paper (other academic)

Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.

DOI： 10.4230/LIPIcs.CPM.2019.29
Faster queries for longest substring palindrome after block edit

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019 2019.6

　More details

Language：English Publishing type：Research paper (other academic)

Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ + log log n) time, after a substring in T is replaced by a string of arbitrary length ℓ. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(ℓ + log n) time for each query.

DOI： 10.4230/LIPIcs.CPM.2019.27
The Parameterized Position Heap of a Trie

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Proceedings of 11th International Conference on Algorithms and Complexity, CIAC 2019 237 - 248 2019.5

　More details

Language：English Publishing type：Research paper (other academic)

Let and be disjoint alphabets of respective size and. Two strings over of equal length are said to parameterized match (p-match) if there is a bijection such that (1) f is identity on and (2) f maps the characters of one string to those of the other string so that the two strings become identical. We consider the p-matching problem on a (reversed) trie and a string pattern P such that every path that p-matches P has to be reported. Let N be the size of the given trie. In this paper, we propose the parameterized position heap for that occupies O(N) space and supports p-matching queries in time, where m is the length of a query pattern P and is the number of paths in to report. We also present an algorithm which constructs the parameterized position heap for a given trie in time and working space.

DOI： 10.1007/978-3-030-17402-6_20
MR-RePair Grammar Compression Based on Maximal Repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

2019 Data Compression Conference, DCC 2019 Proceedings - DCC 2019 2019 Data Compression Conference 508 - 517 2019.3

　More details

Language：English Publishing type：Research paper (other academic)

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

DOI： 10.1109/DCC.2019.00059
Recovering, counting and enumerating strings from forward and backward suffix arrays

Yuki Kuhara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

25th International Symposium on String Processing and Information Retrieval, SPIRE 2018 String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings 254 - 267 2018.10

　More details

Language：English Publishing type：Research paper (other academic)

The suffix array SA_w of a string w of length n is a permutation of [1..n] such that SA_w[i]=j iff w[j, n] is the lexicographically i-th suffix of w. In this paper, we consider variants of the reverse-engineering problem on suffix arrays with two given permutations P and Q of [1..n], such that P refers to the forward suffix array of some string w and Q refers to the backward suffix array of the reversed string w^R. Our results are the following: (1) An algorithm which computes a solution string over an alphabet of the smallest size, in O(n) time. (2) The exact number of solution strings over an alphabet of size σ. (3) An efficient algorithm which computes all solution strings in the lexicographical order, in time near optimal up to log n factor.

DOI： 10.1007/978-3-030-00479-8_21
Algorithms and combinatorial properties on shortest unique palindromic substrings Reviewed

Hiroe Inoue, Yuto Nakashima, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Journal of Discrete Algorithms 52-53 122 - 132 2018.9

　More details

Language：English Publishing type：Research paper (scientific journal)

A palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s,t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s,t], and every palindromic substring of S which contains interval [s,t] and is shorter than P occurs at least twice in S. The SUPS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUPSs for interval [s,t] can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string S of length n in O(n) time and space so that all SUPSs for any subsequent query interval can be answered in O(α+1) time, where α is the number of outputs. We also discuss the number of SUPSs in a string.

DOI： 10.1016/j.jda.2018.11.009
Right-to-left online construction of parameterized position heaps

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

22nd Prague Stringology Conference, PSC 2018 Proceedings of the Prague Stringology Conference, PSC 2018 91 - 102 2018.8

　More details

Language：English Publishing type：Research paper (other academic)

Two strings of equal length are said to parameterized match if there is a bijection that maps the characters of one string to those of the other string, so that two strings become identical. The parameterized pattern matching problem is, given two strings T and P, to find the occurrences of substrings in T that parameterized match P. Diptarama et al. [Position Heaps for Parameterized Strings, CPM 2017] proposed an indexing data structure called parameterized position heaps, and gave a left-toright online construction algorithm. In this paper, we present a right-to-left online construction algorithm for parameterized position heaps. For a text string T of length n over two kinds of alphabets Σand II of respective size σ and π, our construction algorithm runs in O(n log(σ+π)) time with O(n) space. Our right-to-left parameterized position heaps support pattern matching queries in O(mlog(σ+π)+mπ+pocc)) time, where m is the length of a query pattern P and pocc is the number of occurrences to report. Our construction and pattern matching algorithms are as efficient as Diptarama et al.'s algorithms.
O(n log n)-time text compression by LZ-style longest first substitution

Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

22nd Prague Stringology Conference, PSC 2018 Proceedings of the Prague Stringology Conference, PSC 2018 12 - 26 2018.8

　More details

Language：English Publishing type：Research paper (other academic)

Mauer et al. [A Lempel-Ziv-style Compression Method for Repetitive Texts, PSC 2017] proposed a hybrid text compression method called LZ-LFS which has both features of Lempel-Ziv 77 factorization and longest first substitution. They showed that LZ-LFS can achieve better compression ratio for repetitive texts, compared to some state-of-the-art compression algorithms. The drawback of Mauer et al.'s method is that their LZ-LFS compression algorithm takes O(n2) time on an input string of length n. In this paper, we show a faster LZ-LFS compression algorithm that works in O(n log n) time. We also propose a simpler version of LZ-LFS that can be computed in O(n) time.
Lyndon factorization of grammar compressed texts revisited

Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 241 - 2410 2018.5

　More details

Language：English Publishing type：Research paper (other academic)

We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n,N) + Q(n,N)n log logN) time and O(n logN + S(n,N)) space where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

DOI： 10.4230/LIPIcs.CPM.2018.24
Longest substring palindrome after edit

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 105 121 - 1214 2018.5

　More details

Language：English Publishing type：Research paper (other academic)

It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log(min{ω, log n})) time after single character substitution, insertion, or deletion, where ω denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ+log n) time, after an existing substring in T is replaced by a string of arbitrary length ℓ.

DOI： 10.4230/LIPIcs.CPM.2018.12
Longest lyndon substring after edit

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 191 - 1910 2018.5

　More details

Language：English Publishing type：Research paper (other academic)

The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T′) where T′ is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(l log σ + log n) time for any block edit where σ is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T′ for both problems.

DOI： 10.4230/LIPIcs.CPM.2018.19
Faster online elastic degenerate string matching

Kotaro Aoyama, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 91 - 910 2018.5

　More details

Language：English Publishing type：Research paper (other academic)

An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm √mlogm+ N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm² + N) time.

DOI： 10.4230/LIPIcs.CPM.2018.9
Shortest unique palindromic substring queries in optimal time

Yuto Nakashima, Hiroe Inoue, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

28th International Workshop on Combinational Algorithms, IWOCA 2017 Combinatorial Algorithms - 28th International Workshop, IWOCA 2017, Revised Selected Papers 397 - 408 2018.1

　More details

Language：English Publishing type：Research paper (other academic)

palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s, t], and every palindromic substring of S which contains interval [s, t] and is shorter than P occurs at least twice in S. The SUPS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUPSs for interval [s, t] can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string S of length n in O(n) time and space so that all SUPSs for any subsequent query interval can be answered in O(α + 1) time, where α is the number of outputs.

DOI： 10.1007/978-3-319-78825-8_32
Almost linear time computation of maximal repetitions in run length encoded strings

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

28th International Symposium on Algorithms and Computation, ISAAC 2017 28th International Symposium on Algorithms and Computation, ISAAC 2017 2017.12

　More details

Language：English Publishing type：Research paper (other academic)

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m α (m)) time and O(m) space, where α denotes the inverse Ackermann function.

DOI： 10.4230/LIPIcs.ISAAC.2017.33
The "runs" theorem Reviewed

Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta

SIAM Journal on Computing 46 ( 5 ) 1501 - 1514 2017.9

　More details

Language：English Publishing type：Research paper (scientific journal)

We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture [R. M. Kolpakov andG.Kucherov, Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, Los Alamitos, CA, 1999, pp. 596-604]), which states that the maximum number of runs ρ(n) in a string of length n is less than n. The proof is remarkably simple, considering the numerous endeavors to tackle this problem in the last 15 years, and significantly improves our understanding of how runs can occur in strings. In addition, we obtain an upper bound of 3n for the maximum sum of exponents σ(n) of runs in a string of length n, improving on the best known bound of 4.1n by Crochemore et al. [J. Discrete Algorithms, 14 (2012), pp. 29-36], as well as other improved bounds on related problems. The characterization also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string. We also establish a relationship between runs and nodes of the Lyndon tree, which gives a simple optimal solution to the 2-period query problem that was recently solved by Kociumaka et al. [Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA) 2015, San Diego, CA, SIAM, Philadelphia, 2015, pp. 532-551].

DOI： 10.1137/15M1011032
On reverse engineering the Lyndon tree

Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

21st Prague Stringology Conference, PSC 2017 Proceedings of the Prague Stringology Conference, PSC 2017 108 - 117 2017.8

　More details

Language：English Publishing type：Research paper (other academic)

We consider the problem of reverse engineering the Lyndon tree, i.e., given a full binary ordered tree T with n leaves as input, compute a string w of length n for which it’s Lyndon tree is isomorphic to the input tree. Although the problem is easy and solvable in linear time when assuming a binary alphabet or when there is no limit on the alphabet size, how to efficiently find the smallest alphabet size for a solution string is not known. We show several new observations concerning this problem. Namely, we show that: 1) For any full binary ordered tree T, there exists a solution string w over an alphabet of size at most h + 1, where h is the height of T. 2) For any positive n, there exists a full binary ordered tree T with n leaves, s.t. the smallest alphabet size of the solution string for T is ⌊ⁿ₂ ⌋ + 1.
Inferring strings from Lyndon factorization Reviewed

Yuto Nakashima, Takashi Okabe, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theoretical Computer Science 689 147 - 156 2017.8

　More details

Language：English Publishing type：Research paper (scientific journal)

The Lyndon factorization of a string w is a unique factorization ℓ₁^p₁,…,ℓ_m^p_m of w such that ℓ₁,…,ℓ_m is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S=((s₁,p₁),…,(s_m,p_m)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓ₁^p₁,…,ℓ_m^p_m with |ℓ_i|=s_i for all 1≤i≤m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S.

DOI： 10.1016/j.tcs.2017.05.038
On the size of Lempel-Ziv and Lyndon factorizations

Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, Arseny M. Shur

34th Symposium on Theoretical Aspects of Computer Science, STACS 2017 34th Symposium on Theoretical Aspects of Computer Science, STACS 2017 2017.3

　More details

Language：English Publishing type：Research paper (other academic)

Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describing a new, non-trivial family of strings) it is always less than twice the size.

DOI： 10.4230/LIPIcs.STACS.2017.45
Faster Lyndon factorization algorithms for SLP and LZ78 compressed text Reviewed

Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Theoretical Computer Science 656 215 - 224 2016.12

　More details

Language：English Publishing type：Research paper (scientific journal)

We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n that describes w, the first algorithm runs in O(n²+P(n,N)+Q(n,N)nlog⁡n) time and O(n²+S(n,N)) space, where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Given the Lempel–Ziv 78 encoding of size s for w, the second algorithm runs in O(slog⁡s) time and space.

DOI： 10.1016/j.tcs.2016.03.005
Longest common Abelian factors and large alphabets

Golnaz Badkobeh, Travis Gagie, Szymon Grabowski, Yuto Nakashima, Simon J. Puglisi, Shiho Sugimoto

23rd International Symposium on String Processing and Information Retrieval, SPIRE 2016 String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Proceedings 254 - 259 2016.10

　More details

Language：English Publishing type：Research paper (other academic)

Two strings X and Y are considered Abelian equal if the letters of X can be permuted to obtain Y (and vice versa). Recently, Alatabbi et al. (2015) considered the longest common Abelian factor problem in which we are asked to find the length of the longest Abelian-equal factor present in a given pair of strings. They provided an algorithm that uses O(σn²) time and O(σn) space, where n is the length of the pair of strings and σ is the alphabet size. In this paper we describe an algorithm that uses O(n² log² n log∗ n) time and O(n log² n) space, significantly improving Alatabbi et al.’s result unless the alphabet is small. Our algorithm makes use of techniques for maintaining a dynamic set of strings under split, join, and equality testing (Melhorn et al., Algorithmica 17(2), 1997).

DOI： 10.1007/978-3-319-46049-9_24
Computing smallest and largest repetition factorizations in O(n log n) time

Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

20th Prague Stringology Conference, PSC 2016 Proceedings of the Prague Stringology Conference, PSC 2016 135 - 145 2016.8

　More details

Language：English Publishing type：Research paper (other academic)

A factorization f1, . . . , fm of a string w is called a repetition factorization of w if each factor fi is a repetition, namely, fi = xkx' for some non-empty string x, an integer k ≥ 2, and x' being a proper prefix of x. Dumitran et al. (Proc. SPIRE 2015) proposed an algorithm which computes a repetition factorization of a given string w in O(n) time, where n is the length of w. In this paper, we propose two algorithms which compute smallest/largest repetition factorizations in O(n log n) time. The first algorithm is a simple O(n log n) space algorithm while the second one uses only O(n) space.
Constructing LZ78 tries and position heaps in linear time for large alphabets Reviewed

Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Information Processing Letters 115 ( 9 ) 655 - 659 2015.9

　More details

Language：English Publishing type：Research paper (scientific journal)

We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is given as a trie.

DOI： 10.1016/j.ipl.2015.04.002
Computing left-right maximal generic words

Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

19th Prague Stringology Conference, PSC 2015 Proceedings of the Prague Stringology Conference 2015, PSC 2015 5 - 16 2015.1

　More details

Language：English Publishing type：Research paper (other academic)

The maximal generic words problem was proposed by Kucherov et al. (SPIRE 2012). Let D be a set of documents. In this problem, given a pattern P and a threshold δ ≤ |D|, we want to compute all right-maximal extensions of P which occur in at least δ distinct documents. They proposed an O(n)-space data structure which can solve this problem in O(|P| + rocc) time where n is the total length of documents in D and rocc is the number of right-maximal extensions of P. The data structure can be constructed in O(n) time. In this paper, we propose a more generalized problem. Given a pattern P and a threshold δ ≤ |D|, we want to compute all left-right-maximal extensions of P which occur in at least δ distinct documents. We propose an O(n log n)- space data structure which can solve this problem in O(|P| + occ log² n + rocc log n) time where occ is the number of left-right-maximal extensions of P.
A new characterization of maximal repetitions by Lyndon trees

Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta

26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 562 - 571 2015.1

　More details

Language：English Publishing type：Research paper (other academic)

We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs p(n) in a string of length n. Furthermore, we show an upper bound of p(n) < 1.5n, which improves on the best upper bound 1.6n (Crochemore & Hie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string.

DOI： 10.1137/1.9781611973730.38
Inferring strings from Lyndon factorization

Yuto Nakashima, Takashi Okabe, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

39th International Symposium on Mathematical Foundations of Computer Science, MFCS 2014 Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Proceedings 565 - 576 2014.1

　More details

Language：English Publishing type：Research paper (other academic)

The Lyndon factorization of a string w is a unique factorization ℓ^p1₁,⋯, ℓ^pm_m of w s.t. ℓ₁,⋯, ℓ_m is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S = ((s₁, p₁),⋯, (s_m, p _m)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓ^p1₁,⋯, ℓ^pm_m with |ℓ_i| = s_i for all 1 ≤ i ≤ m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S.

DOI： 10.1007/978-3-662-44465-8_48
Faster lyndon factorization algorithms for SLP and LZ78 compressed text

Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

20th International Symposium on String Processing and Information Retrieval, SPIRE 2013 String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings 174 - 185 2013.10

　More details

Language：English Publishing type：Research paper (other academic)

We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n and height h that describes w, the first algorithm runs in O(nh(n + log N log n)) time and O(n²) space. Given the Lempel-Ziv 78 encoding of size s for w, the second algorithm runs in O(s log s) time and space.

DOI： 10.1007/978-3-319-02432-5_21
Efficient Lyndon factorization of grammar compressed text

Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

24th Annual Symposium on Combinatorial Pattern Matching, CPM 2013 Combinatorial Pattern Matching - 24th Annual Symposium, CPM 2013, Proceedings 153 - 164 2013.1

　More details

Language：English Publishing type：Research paper (other academic)

We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in O(n⁴ + mn³ h) time and O(n ²) space, where m is the size of the Lyndon factorization, n is the size of the SLP, and h is the height of the derivation tree of the SLP. Since the length of the decompressed string can be exponentially large w.r.t. n, m and h, our result is the first polynomial time solution when the string is given as SLP.

DOI： 10.1007/978-3-642-38905-4_16
The position heap of a trie

Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

19th International Symposium on String Processing and Information Retrieval, SPIRE 2012 String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings 360 - 371 2012.10

　More details

Language：English Publishing type：Research paper (other academic)

The position heap is a text indexing structure for a single text string, recently proposed by Ehrenfeucht et al. [Position heaps: A simple and dynamic text indexing data structure, Journal of Discrete Algorithms, 9(1):100-121, 2011]. In this paper we introduce the position heap for a set of strings, and propose an efficient algorithm to construct the position heap for a set of strings which is given as a trie. For a fixed alphabet our algorithm runs in time linear in the size of the trie. We also show that the position heap can be efficiently updated after addition/removal of a leaf of the input trie.

DOI： 10.1007/978-3-642-34109-0-38

▼display all

Presentations

Maximal α-Gapped Repeats in a Fibonacci String International conference

Kazuma Yamane, Yuto Nakashima, Kazuhisa Seto, Takashi Horiyama

50th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2025) 2025.1

　More details

Event date： 2025.1

Language：English Presentation type：Oral presentation (general)

Country：Slovakia
On the Number of Non-equivalent Parameterized Squares in a String International conference

Rikuya Hamai, Kazushi Taketsugu, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

1st International Symposium on String Processing and Information Retrieval (SPIRE 2024) 2024.9

　More details

Event date： 2024.9

Language：English Presentation type：Oral presentation (general)

Country：Mexico
Edit and Alphabet-Ordering Sensitivity of Lex-parse International conference

Yuto Nakashima, Dominik Koeppl, Mitsuru Funakoshi, Shunsuke Inenaga, Hideo Bannai

49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024) 2024.8

　More details

Event date： 2024.8

Language：English Presentation type：Oral presentation (general)

Country：Slovakia
Computing Longest Common Subsequence under Cartesian-Tree Matching Model International conference

Taketo Tsujimoto, Hiroki Shibata, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga

35th International Workshop on Combinatorial Algorithms (IWOCA 2024) 2024.7

　More details

Event date： 2024.7

Language：English Presentation type：Oral presentation (general)

Country：Italy
Computing maximal palindromes in non-standard matching models International conference

Mitsuru Funakoshi, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

35th International Workshop on Combinatorial Algorithms (IWOCA 2024) 2024.7

　More details

Event date： 2024.7

Language：English Presentation type：Oral presentation (general)

Country：Italy
Linear-Time Computation of Generalized Minimal Absent Words for Multiple Strings International conference

Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

30th International Symposium on String Processing and Information Retrieval (SPIRE 2023) 2023.9

　More details

Event date： 2023.9

Language：English

Country：Italy
Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph International conference

Hiroki Arimura, Shunsuke Inenaga, Yasuaki Kobayashi, Yuto Nakashima, Mizuki Sue

30th International Symposium on String Processing and Information Retrieval (SPIRE 2023) 2023.9

　More details

Event date： 2023.9

Language：English

Country：Italy
Largest Repetition Factorization of Fibonacci Words International conference

Kaisei Kishi, Yuto Nakashima, Shunsuke Inenaga

30th International Symposium on String Processing and Information Retrieval (SPIRE 2023) 2023.9

　More details

Event date： 2023.9

Language：English

Country：Italy
Computing SEQ-IC-LCS of Labeled Graphs International conference

Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga

Prague Stringology Conference 2023 (PSC 2023) 2023.8

　More details

Event date： 2023.8

Language：English

Country：Czech Republic
Optimal LZ-End Parsing Is Hard International conference

Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, Takeaki Uno

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023) 2023.6

　More details

Event date： 2023.6

Language：English Presentation type：Oral presentation (general)

Venue：Marne-la-Vallée Country：France
On Sensitivity of Compact Directed Acyclic Word Graphs International conference

Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga

14th International Conference on Words (WORDS 2023) 2023.6

　More details

Event date： 2023.6

Language：English Presentation type：Oral presentation (general)

Venue：Umeå Country：Sweden
Space-Efficient STR-IC-LCS Computation International conference

Yuuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

48th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2023) 2023.1

　More details

Event date： 2023.1 - 2020.1

Language：English

Venue：Nový Smokovec Country：Slovakia
Minimal Absent Words on Run-Length Encoded Strings International conference

Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga

33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022) 2022.6

　More details

Event date： 2022.6

Language：English Presentation type：Oral presentation (general)

Venue：Czech Technical University Country：Czech Republic
Longest Common Rollercoasters International conference

Kosuke Fujita, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

28th International Symposium on String Processing and Information Retrieval (SPIRE 2021) 2021.10

　More details

Event date： 2021.10

Language：English

Venue：Online Country：France
On the approximation ratio of LZ-End to LZ77 International conference

Takumi Ideue, Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

28th International Symposium on String Processing and Information Retrieval (SPIRE 2021) 2021.10

　More details

Event date： 2021.10

Language：English

Venue：Online Country：France
Position Heaps for Cartesian-tree Matching on Strings and Tries International conference

Akio Nishimoto, Noriki. Fujisato, Yuto Nakashima, Shunsuke Inenaga

28th International Symposium on String Processing and Information Retrieval (SPIRE 2021) 2021.10

　More details

Event date： 2021.10

Language：English

Venue：Online Country：France
Grammar Index By Induced Suffix Sorting International conference

Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

28th International Symposium on String Processing and Information Retrieval (SPIRE 2021) 2021.10

　More details

Event date： 2021.10

Language：English

Venue：Online Country：France
Counting Lyndon Subsequence International conference

Ryo Hirakawa, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

Prague Stringology Conference 2021 (PSC 2021) 2021.8

　More details

Event date： 2021.8

Language：English

Venue：Online Country：Czech Republic
The Parameterized Suffix Tray International conference

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

12th International Conference on Algorithms and Complexity (CIAC 2021) 2021.5

　More details

Event date： 2021.5

Language：English

Venue：Online Country：Cyprus
On repetitiveness measures of Thue-Morse words International conference

Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

27th International Symposium on String Processing and Information Retrieval (SPIRE 2020) 2020.10

　More details

Event date： 2020.10

Language：English

Venue：Online Country：United States
Lyndon Words, the Three Squares Lemma, and Primitive Squares International conference

Hideo Bannai, Takuya Mieno, Yuto Nakashima

27th International Symposium on String Processing and Information Retrieval (SPIRE 2020) 2020.10

　More details

Event date： 2020.10

Language：English Presentation type：Oral presentation (general)

Venue：Online Country：United States
Towards Efficient Interactive Computation of Dynamic Time Warping Distance International conference

Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

27th International Symposium on String Processing and Information Retrieval (SPIRE 2020) 2020.10

　More details

Event date： 2020.10

Language：English

Venue：Online Country：United States
DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures International conference

Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020) 2020.6

　More details

Event date： 2020.6

Language：English Presentation type：Oral presentation (general)

Venue：Online Country：Denmark
Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences International conference

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara

31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020) 2020.6

　More details

Event date： 2020.6

Language：English Presentation type：Oral presentation (general)

Venue：Online Country：Denmark
c-Trie++: A Dynamic Trie Tailored for Fast Prefix Searches International conference

Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Data Compression Conference 2020 2020.3

　More details

Event date： 2020.3

Language：English

Venue：オンライン会議 Country：Other
Faster STR-EC-LCS Computation International conference

Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020 2020.1

　More details

Event date： 2020.1

Language：English

Venue：Limassol Country：Cyprus

The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings A and B. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string Z is said to be an STR-EC-LCS of two given strings A and B excluding P if, Z is one of the longest common subsequences of A and B that does not contain P as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in O(mnr) time and space where [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in time where denotes the set of distinct characters occurring in both A and B, and L is the length of the STR-EC-LCS. This algorithm is faster than the O(mnr)-time algorithm for short/long STR-EC-LCS (namely, or), and is at least as efficient as the O(mnr)-time algorithm for all cases.
Minimal Unique Substrings and Minimal Absent Words in a Sliding Window International conference

Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020 2020.1

　More details

Event date： 2020.1

Language：English

Venue：Limassol Country：Cyprus

A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017).
An improved data structure for left-right maximal generic words problem International conference

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

30th International Symposium on Algorithms and Computation, ISAAC 2019 2019.12

　More details

Event date： 2019.12

Language：English

Venue：Shanghai Country：China

For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p| + r · log n + o · log² n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.
Compact Data Structures for Shortest Unique Substring Queries International conference

Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 2019.10

　More details

Event date： 2019.10

Language：English

Venue：Segovia Country：Spain

Given a string T of length n, a substring of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e.), and (c) every substring v of T with containing [s, t] occurs at least twice in T. Given a query interval, the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a bits data structure answering an interval SUS query in output-sensitive time, where is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for. Here, we propose a bits data structure answering a point SUS query in the same output-sensitive time.
On Longest Common Property Preserved Substring Queries International conference

Kazuki Kai, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Tomasz Kociumaka

26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 2019.10

　More details

Event date： 2019.10

Language：English

Venue：Segovia Country：Spain

We revisit the problem of longest common property preserving substring queries introduced by Ayad et al. (SPIRE 2018, arXiv 2018). We consider a generalized and unified on-line setting, where we are given a set X of k strings of total length n that can be pre-processed so that, given a query string y and a positive integer (Formula presented), we can determine the longest substring of y that satisfies some specific property and is common to at least (Formula presented) strings in X. Ayad et al. considered the longest square-free substring in an on-line setting and the longest periodic and palindromic substring in an off-line setting. In this paper, we give efficient solutions in the on-line setting for finding the longest common square, periodic, palindromic, and Lyndon substrings. More precisely, we show that X can be pre-processed in O(n) time resulting in a data structure of O(n) size that answers queries in (Formula presented) time and O(1) working space, where (Formula presented) is the size of the alphabet, and the common substring must be a square, a periodic substring, a palindrome, or a Lyndon word.
Direct Linear Time Construction of Parameterized Suffix and LCP Arrays for Constant Alphabets International conference

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

26th International Symposium on String Processing and Information Retrieval, SPIRE 2019 2019.10

　More details

Event date： 2019.10

Language：English

Venue：Segovia Country：Spain

We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be built first. More formally, for a string over static alphabet and parameterized alphabet, our algorithm runs in time and O(n) words of space, where is the number of distinct symbols of in the string.
Computing Maximal Palindromes and Distinct Palindromes in a Trie International conference

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

Prague Stringology Conference 2019 2019.8

　More details

Event date： 2019.8

Language：English

Country：Czech Republic
Shortest unique palindromic substring queries on run-length encoded strings International conference

Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

30th International Workshop on Combinatorial Algorithms, IWOCA 2019 2019.7

　More details

Event date： 2019.7

Language：English

Venue：Pisa Country：Italy

For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if S[i.j] occurs exactly once in S, the interval [i, j] contains [s, t], and every palindromic substring containing [s, t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLE_S of size m in O(m) space and O(mlog σRLE_S + m√log m/log logm) time so that all SUPSs for any subsequent query interval can be answered in O(√log m/log logm+ α) time, where α is the number of outputs, and σRLE_S is the number of distinct runs of RLE_S.
Computing runs on a trie International conference

Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019 2019.6

　More details

Event date： 2019.6

Language：English

Venue：Pisa Country：Italy

A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n√log n log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.
On the size of overlapping Lempel-Ziv and Lyndon factorizations International conference

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019 2019.6

　More details

Event date： 2019.6

Language：English

Venue：Pisa Country：Italy

Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.
Faster queries for longest substring palindrome after block edit International conference

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019 2019.6

　More details

Event date： 2019.6

Language：English

Venue：Pisa Country：Italy

Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ + log log n) time, after a substring in T is replaced by a string of arbitrary length ℓ. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(ℓ + log n) time for each query.
The Parameterized Position Heap of a Trie International conference

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

11th International Conference on Algorithms and Complexity, CIAC 2019 2019.5

　More details

Event date： 2019.5

Language：English

Venue：Rome Country：Italy

Let and be disjoint alphabets of respective size and. Two strings over of equal length are said to parameterized match (p-match) if there is a bijection such that (1) f is identity on and (2) f maps the characters of one string to those of the other string so that the two strings become identical. We consider the p-matching problem on a (reversed) trie and a string pattern P such that every path that p-matches P has to be reported. Let N be the size of the given trie. In this paper, we propose the parameterized position heap for that occupies O(N) space and supports p-matching queries in time, where m is the length of a query pattern P and is the number of paths in to report. We also present an algorithm which constructs the parameterized position heap for a given trie in time and working space.
MR-RePair Grammar Compression Based on Maximal Repeats

Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

2019 Data Compression Conference, DCC 2019 2019.3

　More details

Event date： 2019.3

Language：English

Venue：Snowbird Country：United States

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.
Recovering, counting and enumerating strings from forward and backward suffix arrays International conference

Yuki Kuhara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

25th International Symposium on String Processing and Information Retrieval, SPIRE 2018 2018.10

　More details

Event date： 2018.10

Language：English

Venue：Lima Country：Peru

The suffix array SA_w of a string w of length n is a permutation of [1..n] such that SA_w[i]=j iff w[j, n] is the lexicographically i-th suffix of w. In this paper, we consider variants of the reverse-engineering problem on suffix arrays with two given permutations P and Q of [1..n], such that P refers to the forward suffix array of some string w and Q refers to the backward suffix array of the reversed string w^R. Our results are the following: (1) An algorithm which computes a solution string over an alphabet of the smallest size, in O(n) time. (2) The exact number of solution strings over an alphabet of size σ. (3) An efficient algorithm which computes all solution strings in the lexicographical order, in time near optimal up to log n factor.
Right-to-left online construction of parameterized position heaps International conference

Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda.

Prague Stringology Conference 2018 2018.8

　More details

Event date： 2018.8

Language：English Presentation type：Oral presentation (general)

Country：Czech Republic
O(n log n)-time text compression by LZ-style longest first substitution International conference

Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda.

Prague Stringology Conference 2018 2018.8

　More details

Event date： 2018.8

Language：English Presentation type：Oral presentation (general)

Country：Czech Republic
Faster online elastic degenerate string matching

Kotaro Aoyama, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 2018.7

　More details

Event date： 2018.7

Language：English

Venue：Qingdao Country：China

An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm √mlogm+ N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm² + N) time.
Longest substring palindrome after edit

Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 2018.7

　More details

Event date： 2018.7

Language：English

Venue：Qingdao Country：China

It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log(min{ω, log n})) time after single character substitution, insertion, or deletion, where ω denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ+log n) time, after an existing substring in T is replaced by a string of arbitrary length ℓ.
Longest lyndon substring after edit

Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 2018.7

　More details

Event date： 2018.7

Language：English

Venue：Qingdao Country：China

The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T′) where T′ is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(l log σ + log n) time for any block edit where σ is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T′ for both problems.
Lyndon 文字列とテキスト圧縮 Invited

中島祐人

2018年電子情報通信学会総合大会 2018.3

　More details

Event date： 2018.6

Language：Japanese Presentation type：Oral presentation (general)

Venue：東京電機大学（東京） Country：Japan
ストリーミングモデルにおける最長Lyndon文字列

中島祐人

冬のLAシンポジウム2017 2018.2

　More details

Event date： 2018.6

Language：Japanese Presentation type：Oral presentation (general)

Venue：京都大学（京都市） Country：Japan
Almost linear time computation of maximal repetitions in run length encoded strings International conference

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

The 28th International Symposium on Algorithms and Computation (ISAAC 2017) 2017.12

　More details

Event date： 2017.12

Language：English Presentation type：Oral presentation (general)

Country：Thailand
On Reverse Engineering the Lyndon Tree International conference

Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

The Prague Stringology Conference 2017 (PSC 2017) 2017.8

　More details

Event date： 2017.8

Language：English Presentation type：Oral presentation (general)

Country：Czech Republic
Almost linear time computation of maximal repetitions in run length encoded strings International conference

Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

12th Workshop on Compression, Text and Algorithms 2017.9

　More details

Event date： 2017.6

Language：English Presentation type：Oral presentation (general)

Country：Italy

▼display all

Professional Memberships

情報処理学会

2021.4 - Present

Committee Memberships

The Prague Stringology Conference 2025: Program Committee Member

2025.4 - 2025.8

　 More details

Committee type：Other

researchmap
電子情報通信学会英文論文誌小特集編集委員会編集委員

2025.2 - Present

　 More details

Committee type：Academic society

researchmap
14th International Conference on Algorithms and Complexity: Program Committee Member

2024.9 - 2025.6

　 More details

Committee type：Other

researchmap
情報処理学会アルゴリズム研究運営委員会幹事

2024.4 - Present

　 More details

researchmap
The Prague Stringology Conference 2024: Program Committee Member

2024.4 - 2024.8

　 More details

Committee type：Other

researchmap
35th Annual Symposium on Combinatorial Pattern Matching: Local organizing committee Chair

2023.3 - 2024.9

　 More details

Committee type：Other

researchmap
EATCS Japan Chapter Organizer Foreign country

2023.1 - Present
EATCS Japan Chapter Secretary

2023.1 - Present

　 More details

researchmap
34th Annual Symposium on Combinatorial Pattern Matching: Program Committee member

2022.11 - 2023.6

　 More details

Committee type：Other

researchmap
電子情報通信学会情報・システムソサイエティ誌編集委員会編集委員

2022.6 - Present

　 More details

researchmap
情報処理学会アルゴリズム研究会運営委員 Domestic

2022.6 - 2026.6
情報処理学会アルゴリズム研究運営委員会運営委員

2022.4 - 2024.3

　 More details

researchmap
電子情報通信学会コンピュテーション研究専門委員会専門委員

2020.6 - Present

　 More details

researchmap
電子情報通信学会コンピュテーション研究会専門委員 Domestic

2020.6 - 2022.5

▼display all

Academic Activities

組織委員長 International contribution

35th Annual Symposium on Combinatorial Pattern Matching （ Japan ） 2024.6

　More details

Type：Competition, symposium, etc.
PC memer International contribution

34th Annual Symposium on Combinatorial Pattern Matching （ France ） 2023.6

　More details

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2023

　More details

Type：Peer review

Proceedings of International Conference Number of peer-reviewed papers：8
運営委員

STRセミナー2022　若手研究者のための大学間合同研究集会（ Japan ） 2022.9

　More details

Type：Competition, symposium, etc.
電子情報通信学会　情報・システムソサイエティ誌

2022.6 - 2024.6

　More details

Type：Academic society, research group, etc.
Screening of academic papers

Role(s)： Peer review

2022

　More details

Type：Peer review

Number of peer-reviewed articles in foreign language journals：3

Proceedings of International Conference Number of peer-reviewed papers：7
Screening of academic papers

Role(s)： Peer review

2021

　More details

Type：Peer review

Number of peer-reviewed articles in foreign language journals：2

Proceedings of International Conference Number of peer-reviewed papers：4
座長

冬のLAシンポジウム2019 （ Japan ） 2020.2

　More details

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2020

　More details

Type：Peer review

Number of peer-reviewed articles in foreign language journals：4

Proceedings of International Conference Number of peer-reviewed papers：4
座長

コンピュテーション研究会（ Japan ） 2019.10

　More details

Type：Competition, symposium, etc.
コメンテータ

第11回データ工学と情報マネジメントに関するフォーラム（ Japan ） 2019.3

　More details

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2019

　More details

Type：Peer review

Number of peer-reviewed articles in foreign language journals：0

Number of peer-reviewed articles in Japanese journals：0

Proceedings of International Conference Number of peer-reviewed papers：6

Proceedings of domestic conference Number of peer-reviewed papers：0
座長

夏のLAシンポジウム2018 （ Japan ） 2018.7

　More details

Type：Competition, symposium, etc.
座長

2018年電子情報通信学会総合大会（ Japan ） 2018.3

　More details

Type：Competition, symposium, etc.
座長

アルゴリズム研究会（167回）（ Japan ） 2018.3

　More details

Type：Competition, symposium, etc.
座長

冬のLAシンポジウム2017 （ Japan ） 2018.2

　More details

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2018

　More details

Type：Peer review

Number of peer-reviewed articles in foreign language journals：4

Number of peer-reviewed articles in Japanese journals：0

Proceedings of International Conference Number of peer-reviewed papers：3

Proceedings of domestic conference Number of peer-reviewed papers：0
Screening of academic papers

Role(s)： Peer review

2017

　More details

Type：Peer review

Number of peer-reviewed articles in foreign language journals：1

Number of peer-reviewed articles in Japanese journals：0

Proceedings of International Conference Number of peer-reviewed papers：2

Proceedings of domestic conference Number of peer-reviewed papers：0

▼display all

Research Projects

文字列に対するアルファベット順序最適化理論

Grant number：25K00136 2025.4 - 2030.3

Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

中島祐人, 坂内英夫, 栗田和宏

　 More details

Grant type：Scientific research funding

文字列を対象としたアルゴリズムは古くから盛んに研究され，データを扱うあらゆるシステムの基盤としての役割を果たしている．特に，辞書式順序依存な文字列処理においては，アルファベット順序の変更により同じ入力に対する振る舞いが異なりうる．つまり，入力や処理に応じた良いアルファベット順序を選択できれば，文字列処理の効率化につながる．本研究では，辞書式順序依存な文字列処理に対して最適なアルファベット順序を計算する「アルファベット順序最適化問題」への文字列組合せ論に基づいた数理的理解を推し進めるとともに，効率的なアルゴリズム開発に取り組む．
辞書式順序依存問題の複雑さの解明

Grant number：23H04386 2023 - 2024

Japan Society for the Promotion of Science・Ministry of Education, Culture, Sports, Science and Technology Grants-in-Aid for Scientific Research Grant-in-Aid for Transformative Research Areas (A)

中島祐人

　 More details

Authorship：Principal investigator Grant type：Scientific research funding

効率的な文字列アルゴリズムの開発には，文字列が持つ数理的性質の理解が重要である.文字列の特徴を捉える際に利用される構造や性質は，辞書式順序依存な構造と辞書式順序非依存な構造に大別される.辞書式順序依存な構造におけるこれまでの研究では，暗に与えられた辞書式順序のみを考えているに過ぎなかった.本研究では辞書式順序に依存する問題に着目し，組合せ論・計算量理論・アルゴリズム論の三方向から，辞書式順序が与える文字列構造への影響を解明する．

CiNii Research
広義文字列のアルゴリズムと組合せ論

Grant number：23K24808 2022.4 - 2026.3

Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

稲永俊介, 坂内英夫, 中島祐人, Koeppl Dominik

　 More details

Authorship：Coinvestigator(s) Grant type：Scientific research funding

文字列とは，文字あるいは記号の一本鎖列である．自然言語テキスト，サーバログ，DNA配列など，計算機処理可能な様々なデータを文字列と見なすことができる．1970年代から半世紀にわたって，検索・圧縮・比較・解析・発見などの重要タスクを文字列データ上で高速実行するアルゴリズムが多数提案されてきた．本提案課題では，文字列の定義域を拡張した「広義文字列」という新概念を提唱し，木型文字列／グラフ型文字列／２次元文字列／時系列の高速処理アルゴリズム開発と，それを支える組合せ論的性質の解明に取り組む．

CiNii Research
Lyndon文字列による反復性指標解析

Grant number：21K17705 2021 - 2024

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Early-Career Scientists

中島祐人

　 More details

Authorship：Principal investigator Grant type：Scientific research funding

文字列データを対象としたデータ圧縮においては，辞書式圧縮と呼ばれる様々な手法が知られている．一般に，部分文字列の反復(繰り返し)が多いほど，文字列は圧縮されやすい傾向にあるため，LZ 分解などの文字列構造のサイズは，文字列の反復性を捉えた反復性指標と見なすことができる．最近では，String attractor や部分文字列複雑性に基づいた反復性指標が提案され，反復性への理解が進められているが，本申請課題では，辞書式圧縮とは直接関係のない Lyndon 文字列および関連する文字列構造を導入することで，新たな視点から反復性指標の解析を行い，その本質を明らかにする．

CiNii Research
辞書式順序に基づいた文字列データ処理法の構築

Grant number：JPMJAX200K 2020 - 2022

科学技術振興機構 ACT-X JST Strategic Basic Research Program (Ministry of Education, Culture, Sports, Science and Technology)

　 More details

Authorship：Principal investigator Grant type：Contract research
情報爆縮に基づくIoTデータ処理基盤の構築

Grant number：18H04098 2018 - 2022

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

竹田正幸, 定兼邦彦, 坂内英夫, 井智弘, 瀧本英二, 坂本比呂志, 畑埜晃平, 稲永俊介, 喜田拓也, 中島祐人

　 More details

Authorship：Coinvestigator(s) Grant type：Scientific research funding

IoT時代が到来し, 製造・産業界, モビリティ・交通インフラ, 医療・ヘルスケアなど, 広範な分野において技術革新が生み出され新たな価値が創出されると期待されている. 一方, IoTデータの急増に伴い, クラウドサーバへのデータ処理の集中, ネットワーク回線の逼迫, データ送信の遅延などの技術的問題が深刻化している. 現在のIoTに対する取り組みは実証フェーズのものが多くIoTを用いた価値創出に重点が置かれているため, 上記の問題の抜本的な解決は後回しとなっている. 代表者らは, これまでに, 入力データや計算に用いるデータ構造に潜む冗長性の除去に基づいて計算そのものの効率化を図る「情報爆縮」の研究を行ってきた. 本研究プロジェクトでは, この情報爆縮技術を核に据え, 理論と実用的の両面から, エッジ側とクラウド側の処理を効率化する技術を開発し低コストでデータを収集・集約・送信・蓄積・検索・解析できる新しいIoTデータ処理基盤の構築を目指す.
このために, 以下の3つの研究項目をおいて研究を行った.
(A) 圧縮データストリームに対するリアルタイム抽出・集計技術.
(B) 圧縮データに対する高速質問処理技術.
(C) 解釈可能な圧縮データ錬成技術.
前年度に引き続き, 特に(A), (B)に力点を置いて研究を行い、(A)スライド窓における極小ユニーク文字列の列挙・回文木の構築/維持のアルゴリズム、(B)接尾辞配列構築アルゴリズムを基にした高速な圧縮索引構造の提案等、多くの研究成果を得た.

CiNii Research
Lyndon文字列による簡潔で高速な文字列処理アルゴリズム

Grant number：18K18002 2018 - 2020

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Early-Career Scientists

中島祐人

　 More details

Authorship：Principal investigator Grant type：Scientific research funding

本研究の目的は，Lyndon文字列の性質に基づいて，簡潔で高速な文字列処理アルゴリズムを開発することである．
本目的の達成のために，Lyndon文字列を中心に，繰り返し構造や回文構造など広く文字列処理アルゴリズムや文字列組合せ論の問題に取り組んだ．Lyndon文字列を中心に，様々な文字列構造の性質の理解や，それらの性質を利用した効率的なアルゴリズムを提案した．

CiNii Research
情報爆縮基盤技術

Grant number：25240003 2017

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

　 More details

Authorship：Coinvestigator(s) Grant type：Scientific research funding
ストリーミングモデルにおける文字列処理アルゴリズム基盤

Grant number：17H06923 2017

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Research Activity start-up

　 More details

Authorship：Principal investigator Grant type：Scientific research funding

▼display all

Educational Activities

主に，理学部物理学科情報理学コースの演習科目を担当している．

Class subject

情報科学

2024.10 - 2025.3 Second semester
電気情報工学セミナーA

2024.10 - 2024.11 Fall quarter
情報代数学演習

2024.4 - 2024.9 First semester
形式言語理論演習

2024.4 - 2024.9 First semester
情報代数学演習

2023.4 - 2023.9 First semester
形式言語理論演習

2023.4 - 2023.9 First semester
情報代数学演習

2022.4 - 2022.9 First semester
情報論理学演習

2022.4 - 2022.9 First semester
形式言語理論演習

2022.4 - 2022.9 First semester
情報論理学演習

2021.4 - 2021.9 First semester
情報代数学演習

2021.4 - 2021.9 First semester
形式言語理論演習

2021.4 - 2021.9 First semester
形式言語理論演習

2020.4 - 2020.9 First semester
情報代数学演習

2020.4 - 2020.9 First semester
情報論理学演習

2020.4 - 2020.9 First semester
形式言語理論演習

2019.4 - 2019.9 First semester
情報代数学演習

2019.4 - 2019.9 First semester
情報論理学演習

2019.4 - 2019.9 First semester
情報科学

2018.10 - 2019.3 Second semester
形式言語理論演習

2018.4 - 2018.9 First semester
情報代数学演習

2018.4 - 2018.9 First semester
情報論理学演習

2018.4 - 2018.9 First semester
情報論理学演習

2017.4 - 2017.9 First semester
形式言語理論演習

2017.4 - 2017.9 First semester
情報代数学演習

2017.4 - 2017.9 First semester

▼display all

FD Participation

2024.3 Role：Participation Title：【シス情FD】高度データサイエンティスト育成事業の取り組みについて

Organizer：[Undergraduate school/graduate school/graduate faculty]
2023.10 Role：Participation Title：【シス情FD】価値創造型半導体人材育成センターについて

Organizer：[Undergraduate school/graduate school/graduate faculty]
2023.4 Role：Speech Title：【シス情FD】若手教員による研究紹介⑧

Organizer：[Undergraduate school/graduate school/graduate faculty]
2022.5 Role：Participation Title：【シス情FD】若手教員による研究紹介④「量子コンピュータ・システム・アーキテクチャの研究～道具になることを目指して～」

Organizer：[Undergraduate school/graduate school/graduate faculty]
2022.4 Role：Participation Title：【シス情FD】第4期中期目標・中期計画等について

Organizer：[Undergraduate school/graduate school/graduate faculty]
2022.1 Role：Participation Title：【シス情FD】シス情関連の科学技術に対する国の政策動向（に関する私見）

Organizer：[Undergraduate school/graduate school/graduate faculty]
2021.10 Role：Participation Title：【シス情FD】熊本高専と九大システム情報との交流・連携に向けてー 3年半で感じた高専の実像ー

Organizer：[Undergraduate school/graduate school/graduate faculty]
2021.7 Role：Participation Title：若手教員による研究紹介　及び　科研取得のポイント、その他について　②

Organizer：[Undergraduate school/graduate school/graduate faculty]
2021.6 Role：Participation Title：若手教員による研究紹介　及び　科研取得のポイントについて　①

Organizer：[Undergraduate school/graduate school/graduate faculty]
2020.11 Role：Participation Title：マス・フォア・イノベーション卓越大学院について

Organizer：[Undergraduate school/graduate school/graduate faculty]
2020.4 Role：Participation Title：新型コロナウイルスが誘起した社会変化に対するシステム情報科学からの提言

Organizer：[Undergraduate school/graduate school/graduate faculty]
2018.7 Role：Participation Title：論文剽窃ソフトの活用方法について

Organizer：[Undergraduate school/graduate school/graduate faculty]

▼display all

Travel Abroad

2025.2 - 2025.3

Staying countory name 1：Poland Staying institution name 1：The University of Warsaw