Kyushu University Academic Staff Educational and Research Activities Database
List of Papers
Yuto Nakashima Last modified date:2022.04.20

Assistant Professor / Mathematical Informatics / Department of Informatics / Faculty of Information Science and Electrical Engineering


Papers
1. Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, The position heap of a trie, 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012 String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings, 10.1007/978-3-642-34109-0-38, 360-371, 2012.10, The position heap is a text indexing structure for a single text string, recently proposed by Ehrenfeucht et al. [Position heaps: A simple and dynamic text indexing data structure, Journal of Discrete Algorithms, 9(1):100-121, 2011]. In this paper we introduce the position heap for a set of strings, and propose an efficient algorithm to construct the position heap for a set of strings which is given as a trie. For a fixed alphabet our algorithm runs in time linear in the size of the trie. We also show that the position heap can be efficiently updated after addition/removal of a leaf of the input trie..
2. Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Efficient Lyndon factorization of grammar compressed text, 24th Annual Symposium on Combinatorial Pattern Matching, CPM 2013 Combinatorial Pattern Matching - 24th Annual Symposium, CPM 2013, Proceedings, 10.1007/978-3-642-38905-4_16, 153-164, 2013.01, We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in O(n4 + mn3 h) time and O(n 2) space, where m is the size of the Lyndon factorization, n is the size of the SLP, and h is the height of the derivation tree of the SLP. Since the length of the decompressed string can be exponentially large w.r.t. n, m and h, our result is the first polynomial time solution when the string is given as SLP..
3. Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Faster lyndon factorization algorithms for SLP and LZ78 compressed text, 20th International Symposium on String Processing and Information Retrieval, SPIRE 2013 String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings, 10.1007/978-3-319-02432-5_21, 174-185, 2013.10, We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n and height h that describes w, the first algorithm runs in O(nh(n + log N log n)) time and O(n2) space. Given the Lempel-Ziv 78 encoding of size s for w, the second algorithm runs in O(s log s) time and space..
4. Yuto Nakashima, Takashi Okabe, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Inferring strings from Lyndon factorization, 39th International Symposium on Mathematical Foundations of Computer Science, MFCS 2014 Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Proceedings, 10.1007/978-3-662-44465-8_48, 565-576, 2014.01, The Lyndon factorization of a string w is a unique factorization ℓp11,⋯, ℓpmm of w s.t. ℓ1,⋯, ℓm is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S = ((s1, p1),⋯, (sm, p m)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓp11,⋯, ℓpmm with |ℓi| = si for all 1 ≤ i ≤ m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S..
5. Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Computing left-right maximal generic words, 19th Prague Stringology Conference, PSC 2015 Proceedings of the Prague Stringology Conference 2015, PSC 2015, 5-16, 2015.01, The maximal generic words problem was proposed by Kucherov et al. (SPIRE 2012). Let D be a set of documents. In this problem, given a pattern P and a threshold δ ≤ |D|, we want to compute all right-maximal extensions of P which occur in at least δ distinct documents. They proposed an O(n)-space data structure which can solve this problem in O(|P| + rocc) time where n is the total length of documents in D and rocc is the number of right-maximal extensions of P. The data structure can be constructed in O(n) time. In this paper, we propose a more generalized problem. Given a pattern P and a threshold δ ≤ |D|, we want to compute all left-right-maximal extensions of P which occur in at least δ distinct documents. We propose an O(n log n)- space data structure which can solve this problem in O(|P| + occ log2 n + rocc log n) time where occ is the number of left-right-maximal extensions of P..
6. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta, A new characterization of maximal repetitions by Lyndon trees, 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, 10.1137/1.9781611973730.38, 562-571, 2015.01, We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs p(n) in a string of length n. Furthermore, we show an upper bound of p(n) < 1.5n, which improves on the best upper bound 1.6n (Crochemore & Hie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string..
7. Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Constructing LZ78 tries and position heaps in linear time for large alphabets, Information Processing Letters, 10.1016/j.ipl.2015.04.002, 115, 9, 655-659, 2015.09, We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is given as a trie..
8. Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Computing smallest and largest repetition factorizations in O(n log n) time, 20th Prague Stringology Conference, PSC 2016 Proceedings of the Prague Stringology Conference, PSC 2016, 135-145, 2016.08, A factorization f1, . . . , fm of a string w is called a repetition factorization of w if each factor fi is a repetition, namely, fi = xkx' for some non-empty string x, an integer k ≥ 2, and x' being a proper prefix of x. Dumitran et al. (Proc. SPIRE 2015) proposed an algorithm which computes a repetition factorization of a given string w in O(n) time, where n is the length of w. In this paper, we propose two algorithms which compute smallest/largest repetition factorizations in O(n log n) time. The first algorithm is a simple O(n log n) space algorithm while the second one uses only O(n) space..
9. Golnaz Badkobeh, Travis Gagie, Szymon Grabowski, Yuto Nakashima, Simon J. Puglisi, Shiho Sugimoto, Longest common Abelian factors and large alphabets, 23rd International Symposium on String Processing and Information Retrieval, SPIRE 2016 String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Proceedings, 10.1007/978-3-319-46049-9_24, 254-259, 2016.10, Two strings X and Y are considered Abelian equal if the letters of X can be permuted to obtain Y (and vice versa). Recently, Alatabbi et al. (2015) considered the longest common Abelian factor problem in which we are asked to find the length of the longest Abelian-equal factor present in a given pair of strings. They provided an algorithm that uses O(σn2) time and O(σn) space, where n is the length of the pair of strings and σ is the alphabet size. In this paper we describe an algorithm that uses O(n2 log2 n log∗ n) time and O(n log2 n) space, significantly improving Alatabbi et al.’s result unless the alphabet is small. Our algorithm makes use of techniques for maintaining a dynamic set of strings under split, join, and equality testing (Melhorn et al., Algorithmica 17(2), 1997)..
10. Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Faster Lyndon factorization algorithms for SLP and LZ78 compressed text, Theoretical Computer Science, 10.1016/j.tcs.2016.03.005, 656, 215-224, 2016.12, We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n that describes w, the first algorithm runs in O(n2+P(n,N)+Q(n,N)nlog⁡n) time and O(n2+S(n,N)) space, where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Given the Lempel–Ziv 78 encoding of size s for w, the second algorithm runs in O(slog⁡s) time and space..
11. Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, Arseny M. Shur, On the size of Lempel-Ziv and Lyndon factorizations, 34th Symposium on Theoretical Aspects of Computer Science, STACS 2017 34th Symposium on Theoretical Aspects of Computer Science, STACS 2017, 10.4230/LIPIcs.STACS.2017.45, 2017.03, Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describing a new, non-trivial family of strings) it is always less than twice the size..
12. Yuto Nakashima, Takashi Okabe, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Inferring strings from Lyndon factorization, Theoretical Computer Science, 10.1016/j.tcs.2017.05.038, 689, 147-156, 2017.08, The Lyndon factorization of a string w is a unique factorization ℓ1p1,…,ℓmpm of w such that ℓ1,…,ℓm is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S=((s1,p1),…,(sm,pm)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓ1p1,…,ℓmpm with |ℓi|=si for all 1≤i≤m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S..
13. Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, On reverse engineering the Lyndon tree, 21st Prague Stringology Conference, PSC 2017 Proceedings of the Prague Stringology Conference, PSC 2017, 108-117, 2017.08, We consider the problem of reverse engineering the Lyndon tree, i.e., given a full binary ordered tree T with n leaves as input, compute a string w of length n for which it’s Lyndon tree is isomorphic to the input tree. Although the problem is easy and solvable in linear time when assuming a binary alphabet or when there is no limit on the alphabet size, how to efficiently find the smallest alphabet size for a solution string is not known. We show several new observations concerning this problem. Namely, we show that: 1) For any full binary ordered tree T, there exists a solution string w over an alphabet of size at most h + 1, where h is the height of T. 2) For any positive n, there exists a full binary ordered tree T with n leaves, s.t. the smallest alphabet size of the solution string for T is ⌊n2 ⌋ + 1..
14. Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta, The "runs" theorem, SIAM Journal on Computing, 10.1137/15M1011032, 46, 5, 1501-1514, 2017.09, We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture [R. M. Kolpakov andG.Kucherov, Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, Los Alamitos, CA, 1999, pp. 596-604]), which states that the maximum number of runs ρ(n) in a string of length n is less than n. The proof is remarkably simple, considering the numerous endeavors to tackle this problem in the last 15 years, and significantly improves our understanding of how runs can occur in strings. In addition, we obtain an upper bound of 3n for the maximum sum of exponents σ(n) of runs in a string of length n, improving on the best known bound of 4.1n by Crochemore et al. [J. Discrete Algorithms, 14 (2012), pp. 29-36], as well as other improved bounds on related problems. The characterization also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string. We also establish a relationship between runs and nodes of the Lyndon tree, which gives a simple optimal solution to the 2-period query problem that was recently solved by Kociumaka et al. [Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA) 2015, San Diego, CA, SIAM, Philadelphia, 2015, pp. 532-551]..
15. Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Almost linear time computation of maximal repetitions in run length encoded strings, 28th International Symposium on Algorithms and Computation, ISAAC 2017 28th International Symposium on Algorithms and Computation, ISAAC 2017, 10.4230/LIPIcs.ISAAC.2017.33, 2017.12, We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m α (m)) time and O(m) space, where α denotes the inverse Ackermann function..
16. Yuto Nakashima, Hiroe Inoue, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Shortest unique palindromic substring queries in optimal time, 28th International Workshop on Combinational Algorithms, IWOCA 2017 Combinatorial Algorithms - 28th International Workshop, IWOCA 2017, Revised Selected Papers, 10.1007/978-3-319-78825-8_32, 397-408, 2018.01, palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s, t], and every palindromic substring of S which contains interval [s, t] and is shorter than P occurs at least twice in S. The SUPS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUPSs for interval [s, t] can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string S of length n in O(n) time and space so that all SUPSs for any subsequent query interval can be answered in O(α + 1) time, where α is the number of outputs..
17. Kotaro Aoyama, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Faster online elastic degenerate string matching, 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, 10.4230/LIPIcs.CPM.2018.9, 91-910, 2018.05, An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm √mlogm+ N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm2 + N) time..
18. Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Longest lyndon substring after edit, 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, 10.4230/LIPIcs.CPM.2018.19, 191-1910, 2018.05, The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T′) where T′ is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(l log σ + log n) time for any block edit where σ is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T′ for both problems..
19. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Longest substring palindrome after edit, 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, 10.4230/LIPIcs.CPM.2018.12, 105, 121-1214, 2018.05, It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log(min{ω, log n})) time after single character substitution, insertion, or deletion, where ω denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ+log n) time, after an existing substring in T is replaced by a string of arbitrary length ℓ..
20. Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Lyndon factorization of grammar compressed texts revisited, 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, 10.4230/LIPIcs.CPM.2018.24, 241-2410, 2018.05, We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n,N) + Q(n,N)n log logN) time and O(n logN + S(n,N)) space where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible..
21. Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, O(n log n)-time text compression by LZ-style longest first substitution, 22nd Prague Stringology Conference, PSC 2018 Proceedings of the Prague Stringology Conference, PSC 2018, 12-26, 2018.08, Mauer et al. [A Lempel-Ziv-style Compression Method for Repetitive Texts, PSC 2017] proposed a hybrid text compression method called LZ-LFS which has both features of Lempel-Ziv 77 factorization and longest first substitution. They showed that LZ-LFS can achieve better compression ratio for repetitive texts, compared to some state-of-the-art compression algorithms. The drawback of Mauer et al.'s method is that their LZ-LFS compression algorithm takes O(n2) time on an input string of length n. In this paper, we show a faster LZ-LFS compression algorithm that works in O(n log n) time. We also propose a simpler version of LZ-LFS that can be computed in O(n) time..
22. Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Right-to-left online construction of parameterized position heaps, 22nd Prague Stringology Conference, PSC 2018 Proceedings of the Prague Stringology Conference, PSC 2018, 91-102, 2018.08, Two strings of equal length are said to parameterized match if there is a bijection that maps the characters of one string to those of the other string, so that two strings become identical. The parameterized pattern matching problem is, given two strings T and P, to find the occurrences of substrings in T that parameterized match P. Diptarama et al. [Position Heaps for Parameterized Strings, CPM 2017] proposed an indexing data structure called parameterized position heaps, and gave a left-toright online construction algorithm. In this paper, we present a right-to-left online construction algorithm for parameterized position heaps. For a text string T of length n over two kinds of alphabets Σand II of respective size σ and π, our construction algorithm runs in O(n log(σ+π)) time with O(n) space. Our right-to-left parameterized position heaps support pattern matching queries in O(mlog(σ+π)+mπ+pocc)) time, where m is the length of a query pattern P and pocc is the number of occurrences to report. Our construction and pattern matching algorithms are as efficient as Diptarama et al.'s algorithms..
23. Hiroe Inoue, Yuto Nakashima, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Algorithms and combinatorial properties on shortest unique palindromic substrings, Journal of Discrete Algorithms, 10.1016/j.jda.2018.11.009, 52-53, 122-132, 2018.09, A palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s,t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s,t], and every palindromic substring of S which contains interval [s,t] and is shorter than P occurs at least twice in S. The SUPS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUPSs for interval [s,t] can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string S of length n in O(n) time and space so that all SUPSs for any subsequent query interval can be answered in O(α+1) time, where α is the number of outputs. We also discuss the number of SUPSs in a string..
24. Yuki Kuhara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Recovering, counting and enumerating strings from forward and backward suffix arrays, 25th International Symposium on String Processing and Information Retrieval, SPIRE 2018 String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings, 10.1007/978-3-030-00479-8_21, 254-267, 2018.10, The suffix array SAw of a string w of length n is a permutation of [1..n] such that SAw[i]=j iff w[j, n] is the lexicographically i-th suffix of w. In this paper, we consider variants of the reverse-engineering problem on suffix arrays with two given permutations P and Q of [1..n], such that P refers to the forward suffix array of some string w and Q refers to the backward suffix array of the reversed string wR. Our results are the following: (1) An algorithm which computes a solution string over an alphabet of the smallest size, in O(n) time. (2) The exact number of solution strings over an alphabet of size σ. (3) An efficient algorithm which computes all solution strings in the lexicographical order, in time near optimal up to log n factor..
25. Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida, MR-RePair
Grammar Compression Based on Maximal Repeats, 2019 Data Compression Conference, DCC 2019 Proceedings - DCC 2019 2019 Data Compression Conference, 10.1109/DCC.2019.00059, 508-517, 2019.03, We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts..
26. Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, The Parameterized Position Heap of a Trie, Proceedings of 11th International Conference on Algorithms and Complexity, CIAC 2019, 10.1007/978-3-030-17402-6_20, 237-248, 2019.05, Let and be disjoint alphabets of respective size and. Two strings over of equal length are said to parameterized match (p-match) if there is a bijection such that (1) f is identity on and (2) f maps the characters of one string to those of the other string so that the two strings become identical. We consider the p-matching problem on a (reversed) trie and a string pattern P such that every path that p-matches P has to be reported. Let N be the size of the given trie. In this paper, we propose the parameterized position heap for that occupies O(N) space and supports p-matching queries in time, where m is the length of a query pattern P and is the number of paths in to report. We also present an algorithm which constructs the parameterized position heap for a given trie in time and working space..
27. Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Computing runs on a trie, Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, 10.4230/LIPIcs.CPM.2019.23, 2019.06, A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n√log n log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more..
28. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Faster queries for longest substring palindrome after block edit, Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, 10.4230/LIPIcs.CPM.2019.27, 2019.06, Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ + log log n) time, after a substring in T is replaced by a string of arbitrary length ℓ. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(ℓ + log n) time for each query..
29. Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, On the size of overlapping Lempel-Ziv and Lyndon factorizations, Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, 10.4230/LIPIcs.CPM.2019.29, 2019.06, Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization..
30. Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Shortest unique palindromic substring queries on run-length encoded strings, Proceedings of 30th International Workshop on Combinatorial Algorithms, IWOCA 2019, 10.1007/978-3-030-25005-8_35, 430-441, 2019.07, For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if S[i.j] occurs exactly once in S, the interval [i, j] contains [s, t], and every palindromic substring containing [s, t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLES of size m in O(m) space and O(mlog σRLES + m√log m/log logm) time so that all SUPSs for any subsequent query interval can be answered in O(√log m/log logm+ α) time, where α is the number of outputs, and σRLES is the number of distinct runs of RLES..
31. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Computing maximal palindromes and distinct palindromes in a trie, 23rd Prague Stringology Conference, PSC 2019 Proceedings of the Prague Stringology Conference, PSC 2019, 3-15, 2019.08, It is known that all maximal palindromes of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. Also, all distinct palindromes in T can be computed in O(n) time [Groult et al., Inf. Process. Lett. 2010]. In this paper, we consider the problem of computing maximal palindromes and distinct palindromes of a given trie T (i.e. rooted edge-labeled tree). A trie is a natural generalization of a string which can be seen as a single path tree. We propose algorithms to compute all maximal palindromes and all distinct palindromes in T in O(N log h) time and O(N) space, where N is the number of edges in T and h is the height of T . To our knowledge these are the first sub-quadratic time solutions to these problems..
32. Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Compact Data Structures for Shortest Unique Substring Queries, Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019, 10.1007/978-3-030-32686-9_8, 107-123, 2019.10, Given a string T of length n, a substring of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e.), and (c) every substring v of T with containing [s, t] occurs at least twice in T. Given a query interval, the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a bits data structure answering an interval SUS query in output-sensitive time, where is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for. Here, we propose a bits data structure answering a point SUS query in the same output-sensitive time..
33. Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Direct Linear Time Construction of Parameterized Suffix and LCP Arrays for Constant Alphabets, Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019, 10.1007/978-3-030-32686-9_27, 382-391, 2019.10, We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be built first. More formally, for a string over static alphabet and parameterized alphabet, our algorithm runs in time and O(n) words of space, where is the number of distinct symbols of in the string..
34. Kazuki Kai, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Tomasz Kociumaka, On Longest Common Property Preserved Substring Queries, Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019, 10.1007/978-3-030-32686-9_12, 162-174, 2019.10, We revisit the problem of longest common property preserving substring queries introduced by Ayad et al. (SPIRE 2018, arXiv 2018). We consider a generalized and unified on-line setting, where we are given a set X of k strings of total length n that can be pre-processed so that, given a query string y and a positive integer (Formula presented), we can determine the longest substring of y that satisfies some specific property and is common to at least (Formula presented) strings in X. Ayad et al. considered the longest square-free substring in an on-line setting and the longest periodic and palindromic substring in an off-line setting. In this paper, we give efficient solutions in the on-line setting for finding the longest common square, periodic, palindromic, and Lyndon substrings. More precisely, we show that X can be pre-processed in O(n) time resulting in a data structure of O(n) size that answers queries in (Formula presented) time and O(1) working space, where (Formula presented) is the size of the alphabet, and the common substring must be a square, a periodic substring, a palindrome, or a Lyndon word..
35. Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, On the size of the smallest alphabet for Lyndon trees, Theoretical Computer Science, 10.1016/j.tcs.2018.06.044, 792, 131-143, 2019.11, We consider the problem of reverse-engineering the Lyndon tree, i.e., given a full binary ordered tree T with n leaves as input, we are to compute a string w of length n of which Lyndon tree is isomorphic to the input tree T. Hereby we call such a string a solution string. Although the problem is easily solvable in linear time for binary alphabets and unbounded-size alphabets, it is not known how to efficiently find the smallest alphabet size for a solution string. In this paper, we show several new observations concerning this problem. Namely, we show that: 1) For any positive integer n, there exists a full binary ordered tree T with n leaves, s.t. the smallest alphabet size of a solution string for T is ⌊ [Formula presented] ⌋+1. 2) For any full binary ordered tree T with n leaves, there exists a solution string w over an alphabet of size at most ⌊ [Formula presented] ⌋+1. 3) For any full binary ordered tree T, there exists a solution string w over an alphabet of size at most h+1, where h is the height of T. 4) For any complete binary ordered tree T with 2k leaves, there exists a solution string w over an alphabet of size at most 4..
36. Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, An improved data structure for left-right maximal generic words problem, Proceedings of 30th International Symposium on Algorithms and Computation, ISAAC 2019, 10.4230/LIPIcs.ISAAC.2019.40, 2019.12, For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p| + r · log n + o · log2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents..
37. Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings, Theory of Computing Systems, 10.1007/s00224-020-09980-x, 2020.01, For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s,t] in S, if S[i.j] occurs exactly once in S, the interval [i,j] contains [s,t], and every palindromic substring containing [s,t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLES of size m in O(m) space and O(mlogσRLES+mlogm/loglogm) time so that all SUPSs for any subsequent query interval can be answered in O(logm/loglogm+α) time, where α is the number of outputs, and σRLES is the number of distinct runs of RLES. Additionaly, we consider a variant of the SUPS problem where a query interval is also given in a run-length encoded form. For this variant of the problem, we present two alternative algorithms with faster queries. The first one answers queries in O(loglogm/logloglogm+α) time and can be built in O(mlogσRLES+mlogm/loglogm) time, and the second one answers queries in O(log log m+ α) time and can be built in O(mlogσRLES) time. Both of these data structures require O(m) space..
38. Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Faster STR-EC-LCS Computation, Proceedings of 46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020, 10.1007/978-3-030-38919-2_11, 125-135, 2020.01, The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings A and B. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string Z is said to be an STR-EC-LCS of two given strings A and B excluding P if, Z is one of the longest common subsequences of A and B that does not contain P as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in O(mnr) time and space where [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in time where denotes the set of distinct characters occurring in both A and B, and L is the length of the STR-EC-LCS. This algorithm is faster than the O(mnr)-time algorithm for short/long STR-EC-LCS (namely, or), and is at least as efficient as the O(mnr)-time algorithm for all cases..
39. Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Minimal Unique Substrings and Minimal Absent Words in a Sliding Window, Proceedings of 46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020, 10.1007/978-3-030-38919-2_13, 148-160, 2020.01, A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017)..
40. Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, C-Trie++
A dynamic trie tailored for fast prefix searches, 2020 Data Compression Conference, DCC 2020 Proceedings - DCC 2020 Data Compression Conference, 10.1109/DCC47342.2020.00032, 243-252, 2020.03, Given a dynamic set K of k strings of total length n whose characters are drawn from an alphabet of size σ, a keyword dictionary is a data structure built on K that provides locate, prefix search, and update operations on K. Under the assumption that α = w / lg σ characters fit into a single machine word w, we propose a keyword dictionary that represents K in n lg σ + Θ(k lg n) bits of space, supporting all operations in Θ(m / α + lg α) expected time on an input string of length m in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations..
41. Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida, Practical grammar compression based on maximal repeats, Algorithms, 10.3390/A13040103, 13, 4, 2020.04, This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which considers the one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. The results of the experiments comparing the size of constructed grammars and execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts..
42. Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda, DAWGs for Parameterized Matching
Online Construction and Related Indexing Structures, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, 10.4230/LIPIcs.CPM.2020.26, 2020.06, Two strings x and y over of equal length are said to parameterized match (p-match) if there is a renaming bijection f : That is identity on and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have(n2) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n log( + ))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. 2012 ACM Subject Classification Theory of computation ! Pattern matching..
43. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara, Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, 10.4230/LIPIcs.CPM.2020.12, 2020.06, The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n2) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(n log2 n) and O(n log n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case P = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints. 2012 ACM Subject Classification Mathematics of computing ! Combinatorial algorithms..
44. Kazuya Tsuruta, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Grammar-compressed Self-index with Lyndon Words, 情報処理学会論文誌数理モデル化と応用(TOM), 13, 2, 84-92, 2020.08, We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of O(g) words of spacethat can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern P of length m in O(m + lg m lg n + occ lg g) time, where n is the length of T, g is the size of the Lyndon SLP for T, and occ is the number of occurrences of P in T.We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of O(g) words of spacethat can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern P of length m in O(m + lg m lg n + occ lg g) time, where n is the length of T, g is the size of the Lyndon SLP for T, and occ is the number of occurrences of P in T..
45. Shiori Mitsuya, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Compressed Communication Complexity of Hamming Distance, Algorithms, 10.3390/a14040116, 14, 4, 116-116, 2021.04.
46. Akio Nishimoto, Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Position Heaps for Cartesian-Tree Matching on Strings and Tries, Proc. of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021), 241-254, 2021.10.
47. Takumi Ideue, Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda, On the Approximation Ratio of LZ-End to LZ77, Proc. of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021), 114-126, 2021.10.
48. Kosuke Fujita, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Longest Common Rollercoasters, Proc. of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021), 21-32, 2021.10.
49. Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Grammar Index by Induced Suffix Sorting, Proc. of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021), 85-99, 2021.10.
50. Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, The Parameterized Suffix Tray, Proc. of 12th International Conference on Algorithms and Complexity (CIAC 2019), 258-270, 2021.10.