2024/10/07 更新

お知らせ

 

写真a

ナカシマ ユウト
中島 祐人
NAKASHIMA YUTO
所属
システム情報科学研究院 情報学部門 准教授
理学部 物理学科(併任)
システム情報科学府 情報理工学専攻(併任)
職名
准教授
プロフィール
文字列の組合せ論および文字列データ処理アルゴリズムの研究を行なっている. 教育では主に,理学部物理学科情報理学コースの演習科目を担当している.
外部リンク

研究分野

  • 情報通信 / 情報学基礎論

学位

  • 博士(情報科学)

経歴

  • 九州大学 大学院システム情報科学研究院 情報学部門 准教授 

    2024年6月 - 現在

      詳細を見る

    国名:日本国

    researchmap

  • 九州大学 大学院システム情報科学研究院 情報学部門 助教 

    2017年4月 - 2024年5月

      詳細を見る

研究テーマ・研究キーワード

  • 研究テーマ: 文字列組合せ論

    研究キーワード: 文字列組合せ論

    研究期間: 2024年

  • 研究テーマ: 文字列アルゴリズム

    研究キーワード: 文字列アルゴリズム

    研究期間: 2024年

  • 研究テーマ: 文字列情報処理, 文字列の組合せ論に関する研究

    研究キーワード: パターン照合, ストリームモデル, Lyndon文字列

    研究期間: 2017年4月

受賞

  • 2023年度山下記念研究賞

    2024年3月   一般社団法人情報処理学会   アルファベット順による lex-parse サイズ比

     詳細を見る

  • SPIRE2020 Best Paper Award

    2020年10月   SPIRE2020  

論文

  • On the Number of Non-equivalent Parameterized Squares in a String 査読

    Rikuya Hamai, Kazushi Taketsugu, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

    Proceedings of 31th International Symposium on String Processing and Information Retrieval (SPIRE 2024)   174 - 183   2024年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    researchmap

  • Edit and Alphabet-Ordering Sensitivity of Lex-Parse 査読

    Yuto Nakashima, Dominik Köppl, Mitsuru Funakoshi, Shunsuke Inenaga, Hideo Bannai

    Proceedings of the 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)   75:1 - 75:15   2024年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    researchmap

  • Computing Longest Common Subsequence Under Cartesian-Tree Matching Model. 査読

    Taketo Tsujimoto, Hiroki Shibata, Takuya Mieno, Yuto Nakashima 0001, Shunsuke Inenaga

    Proceedings of the 35th International Workshop on Combinatorial Algorithms (IWOCA 2024)   369 - 381   2024年7月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    DOI: 10.1007/978-3-031-63021-7_28

    researchmap

    その他リンク: https://dblp.uni-trier.de/db/conf/iwoca/iwoca2024.html#TsujimotoSMNI24

  • Computing Maximal Palindromes in Non-standard Matching Models. 査読

    Mitsuru Funakoshi, Takuya Mieno, Yuto Nakashima 0001, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of the 35th International Workshop on Combinatorial Algorithms (IWOCA 2024)   165 - 179   2024年7月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    DOI: 10.1007/978-3-031-63021-7_13

    researchmap

    その他リンク: https://dblp.uni-trier.de/db/conf/iwoca/iwoca2024.html#FunakoshiMNIBT24

  • Faster space-efficient STR-IC-LCS computation 査読

    Yonemoto Y., Nakashima Y., Inenaga S., Bannai H.

    Theoretical Computer Science   1003   114607 - 114607   2024年5月   ISSN:0304-3975

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)   出版者・発行元:Theoretical Computer Science  

    One of the most fundamental method for comparing two given strings A and B is the longest common subsequence (LCS), where the task is to find (the length) of an LCS of A and B. In this paper, we deal with the STR-IC-LCS1 problem which is one of the constrained LCS problems proposed by Chen and Chao [J. Comb. Optim, 2011]. A string Z is said to be an STR-IC-LCS of three given strings A, B, and P, if Z is a longest string satisfying that (1) Z includes P as a substring and (2) Z is a common subsequence of A and B. We present three efficient algorithms for this problem: First, we begin with a space-efficient solution which computes the length of an STR-IC-LCS in O(n2) time and O((ℓ+1)(n−ℓ+1)) space, where ℓ is the length of an LCS of A and B of length n. When ℓ=O(1) or n−ℓ=O(1), then this algorithm uses only linear O(n) space. Second, we present a faster algorithm that works in O(nr/log⁡r+n(n−ℓ+1)) time, where r is the length of P, while retaining the O((ℓ+1)(n−ℓ+1)) space efficiency. Third, we give an alternative algorithm that runs in O(nr/log⁡r+n(n−ℓ′+1)) time with O((ℓ′+1)(n−ℓ′+1)) space, where ℓ′ denotes the STR-IC-LCS length for input strings A, B, and P.

    DOI: 10.1016/j.tcs.2024.114607

    Scopus

    researchmap

  • Largest Repetition Factorization of Fibonacci Words 査読

    Kaisei Kishi, Yuto Nakashima, Shunsuke Inenaga

    Proceedings of 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023)   2023年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Linear-Time Computation of Generalized Minimal Absent Words for Multiple Strings 査読

    Okabe K., Mieno T., Nakashima Y., Inenaga S., Bannai H.

    Proceedings of 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023)   14240 LNCS   331 - 344   2023年9月   ISSN:0302-9743 ISBN:9783031439797, 9783031439803 eISSN:1611-3349

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)   出版者・発行元:Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)  

    A string w is called a minimal absent word (MAW) for a string S if w does not occur as a substring in S and all proper substrings of w occur in S. MAWs are well-studied combinatorial string objects that have potential applications in areas including bioinformatics, musicology, and data compression. In this paper, we generalize the notion of MAWs to a set $$\mathcal {S} = \{S_1, \ldots, S_k\}$$ of multiple strings. We first describe our solution to the case of $$k = 2$$ strings, and show how to compute the set $$\textsf{M}$$ of MAWs in optimal $$O(n + |\textsf{M}|)$$ time and with O(n) working space, where n denotes the total length of the strings in $$\mathcal {S}$$. We then move on to the general case of $$k > 2$$ strings, and show how to compute the set $$\textsf{M}$$ of MAWs in $$O(n \lceil k / \log n \rceil + |\textsf{M}|)$$ time and with $$O(n (k + \log n))$$ bits of working space, in the word RAM model with machine word size $$\omega = \log n$$. The latter algorithm runs in optimal $$O(n + |\textsf{M}|)$$ time for $$k = O(\log n)$$.

    DOI: 10.1007/978-3-031-43980-3_27

    Scopus

    researchmap

  • Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph 査読

    Arimura H., Inenaga S., Kobayashi Y., Nakashima Y., Sue M.

    Proceedings of 30th International Symposium on String Processing and Information Retrieval (SPIRE 2023)   14240 LNCS   28 - 34   2023年9月   ISSN:03029743 ISBN:9783031439797

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)   出版者・発行元:Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)  

    In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, called the Compact Directed Acyclic Word Graph (CDAWG), of size e for a text T of length n into other text indexing structures for the same text, suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where $$r, z \leqslant e$$. As main results, we showed that the above structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without a text in O(e) worst-case time and words of working space. To obtain the above results, we devised techniques for enumerating a particular subset of suffixes in the lexicographic and text orders using the forward and backward search on the CDAWG by extending the result by Belazzougui et al. in 2015.

    DOI: 10.1007/978-3-031-43980-3_3

    Scopus

    researchmap

    その他リンク: https://dblp.uni-trier.de/db/conf/spire/spire2023.html#ArimuraIKNS23

  • Computing SEQ-IC-LCS of Labeled Graphs 査読 国際誌

    Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga

    Proceedings of the Prague Stringology Conference 2023 (PSC 2023)   2023年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • On Sensitivity of Compact Directed Acyclic Word Graphs 査読 国際誌

    Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga

    Proceedings of 14th International Conference on Words (WORDS 2023)   168 - 180   2023年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Optimal LZ-End Parsing Is Hard 査読 国際誌

    Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, Takeaki Uno

    Proceedings of the 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)   3:1 - 3:11   2023年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Space-Efficient STR-IC-LCS Computation 査読 国際誌

    Yuuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

    Proceedings of 47th International Conference on Current Trends in Theory and Practice of Informatics (SOFSEM 2023)   372 - 384   2023年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Parameterized DAWGs: Efficient constructions and bidirectional pattern searches 査読

    Nakashima, K; Fujisato, N; Hendrian, D; Nakashima, Y; Yoshinaka, R; Inenaga, S; Bannai, H; Shinohara, A; Takeda, M

    THEORETICAL COMPUTER SCIENCE   933   21 - 42   2022年10月   ISSN:0304-3975 eISSN:1879-2294

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)   出版者・発行元:Theoretical Computer Science  

    Two strings x and y over Σ∪Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f:Σ∪Π→Σ∪Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ(n2) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give an O(n|Π|log⁡(|Π|+|Σ|))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: Given the parameterized suffix tree for the reversal T‾ of the input string T, one can build the PDAWG of T in O(n) time in an offline manner; One can perform bidirectional p-matching in O(mlog⁡(|Π|+|Σ|)+occ) time using O(n) space, where m denotes the pattern length and occ is the number of pattern occurrences in the text T.

    DOI: 10.1016/j.tcs.2022.09.008

    Web of Science

    Scopus

    researchmap

  • Combinatorics of minimal absent words for a sliding window 査読

    Akagi, T; Kuhara, Y; Mieno, T; Nakashima, Y; Inenaga, S; Bannai, H; Takeda, M

    THEORETICAL COMPUTER SCIENCE   927   109 - 119   2022年6月   ISSN:0304-3975 eISSN:1879-2294

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)   出版者・発行元:Theoretical Computer Science  

    A string w is called a minimal absent word (MAW) for another string T if w does not occur in T but the proper substrings of w occur in T. For example, let Σ={a,b,c} be the alphabet. Then, the set of MAWs for string w=abaab is {aaa,aaba,bab,bb,c}. In this paper, we study combinatorial properties of MAWs in the sliding window model, namely, how the set of MAWs changes when a sliding window of fixed length d is shifted over the input string T of length n, where 1≤d<n. We present tight upper and lower bounds on the maximum number of changes in the set of MAWs for a sliding window over T, both in the cases of general alphabets and binary alphabets. Our bounds improve on the previously known best bounds [Crochemore et al., 2020].

    DOI: 10.1016/j.tcs.2022.06.002

    Web of Science

    Scopus

    researchmap

  • Minimal Absent Words on Run-Length Encoded Strings 査読 国際誌

    Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga

    Proceedings of the 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)   27:1 - 27:17   2022年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • c-trie++: A dynamic trie tailored for fast prefix searches 査読

    Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Information and Computation   285   104794 - 104794   2022年5月   ISSN:0890-5401

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)   出版者・発行元:Information and Computation  

    Given a dynamic set K of k strings of total length n whose characters are drawn from an alphabet of size σ, a keyword dictionary is a data structure built on K that provides lookup, prefix search, and update operations on K. Under the assumption that α=w/lg⁡σ characters fit into a single machine word of w bits, we propose a keyword dictionary that represents K in either nlg⁡σ+Θ(klg⁡n) or |T|lg⁡σ+Θ(kw) bits of space, where |T| is the number of nodes of a trie representing K. It supports all operations in O(m/α+lg⁡α) expected time on an input string of length m in the word RAM model. An evaluation of our implementation highlights the practical usefulness of the proposed data structure, especially for prefix searches — one of the most essential keyword dictionary operations.

    DOI: 10.1016/j.ic.2021.104794

    Scopus

    researchmap

  • Factorizing strings into repetitions 査読 国際誌

    Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theory of Computing Systems   66 ( 2 )   484 - 501   2022年4月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

  • Palindromic trees for a sliding window and its applications 査読

    Mieno, T; Watanabe, K; Nakashima, Y; Inenaga, S; Bannai, H; Takeda, M

    INFORMATION PROCESSING LETTERS   173   106174 - 106174   2022年1月   ISSN:0020-0190 eISSN:1872-6119

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)   出版者・発行元:Information Processing Letters  

    The palindromic tree (a.k.a. eertree) for a string S of length n is a tree-like data structure that represents the set of all distinct palindromic substrings of S, using O(n) space [Rubinchik and Shur, 2018]. It is known that, when S is over an alphabet of size σ and is given in an online manner, then the palindromic tree of S can be constructed in O(nlog⁡σ) time with O(n) space. In this paper, we consider the sliding window version of the problem: For a sliding window of length at most d, we present two versions of an algorithm which maintains the palindromic tree of size O(d) for every sliding window S[i..j] over S, where 1≤j−i+1≤d. The first version works in O(nlog⁡σ′) time with O(d) space where σ′≤d is the maximum number of distinct characters in the windows, and the second one works in O(n+dσ) time with (d+2)σ+O(d) space. We also show how our algorithms can be applied to efficient computation of minimal unique palindromic substrings (MUPS) and minimal absent palindromic words (MAPW) for a sliding window.

    DOI: 10.1016/j.ipl.2021.106174

    Web of Science

    Scopus

    researchmap

  • The Parameterized Suffix Tray 査読 国際誌

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 12th International Conference on Algorithms and Complexity (CIAC 2021)   258 - 270   2021年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Efficiently computing runs on a trie 査読

    Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theoretical Computer Science   887   143 - 151   2021年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    DOI: 10.1016/j.tcs.2021.07.011

  • Position Heaps for Cartesian-Tree Matching on Strings and Tries 査読 国際誌

    Akio Nishimoto, Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga

    Proceedings of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021)   241 - 254   2021年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • On the Approximation Ratio of LZ-End to LZ77 査読 国際誌

    Takumi Ideue, Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

    Proceedings of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021)   114 - 126   2021年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Longest Common Rollercoasters 査読 国際誌

    Kosuke Fujita, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedinds of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021)   21 - 32   2021年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Grammar Index by Induced Suffix Sorting 査読 国際誌

    Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 28th International Symposium in String Processing and Information Retrieval (SPIRE 2021)   85 - 99   2021年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Counting Lyndon Subsequence 査読

    Ryo Hirakawa, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

    Proceedings of the Prague Stringology Conference 2021 (PSC 2021)   53 - 60   2021年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

  • Computing Minimal Unique Substrings for a Sliding Window 査読

    Mieno, T; Fujishige, Y; Nakashima, Y; Inenaga, S; Bannai, H; Takeda, M

    ALGORITHMICA   84 ( 3 )   670 - 693   2021年8月   ISSN:0178-4617 eISSN:1432-0541

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)   出版者・発行元:Algorithmica  

    A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. In this paper, we study the problem of computing MUSs for a sliding window over a given string T. We first show how the set of MUSs can change when the window slides over T. We then present an O(nlog σ′) -time and O(d)-space algorithm to compute MUSs for a sliding window of size d over the input string T of length n, where σ′≤ d is the maximum number of distinct characters in every window.

    DOI: 10.1007/s00453-021-00864-1

    Web of Science

    Scopus

    researchmap

    その他リンク: https://link.springer.com/article/10.1007/s00453-021-00864-1/fulltext.html

  • Compressed Communication Complexity of Hamming Distance 査読

    Shiori Mitsuya, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Algorithms   14 ( 4 )   116 - 116   2021年4月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    DOI: 10.3390/a14040116

  • Computing longest palindromic substring after single-character or block-wise edits. 査読

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theor. Comput. Sci.   859   116 - 133   2021年3月

     詳細を見る

    記述言語:その他   掲載種別:研究論文(学術雑誌)  

    DOI: 10.1016/j.tcs.2021.01.014

  • Space-efficient algorithms for computing minimal/shortest unique substrings. 査読

    Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theor. Comput. Sci.   845   230 - 242   2020年12月

     詳細を見る

    記述言語:その他   掲載種別:研究論文(学術雑誌)  

    DOI: 10.1016/j.tcs.2020.09.017

  • Towards Efficient Interactive Computation of Dynamic Time Warping Distance. 査読

    Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 27th International Symposium on String Processing and Information Retrieval (SPIRE 2020)   27 - 41   2020年10月

     詳細を見る

    記述言語:その他   掲載種別:研究論文(その他学術会議資料等)  

    DOI: 10.1007/978-3-030-59212-7_3

  • On Repetitiveness Measures of Thue-Morse Words. 査読

    Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 27th International Symposium on String Processing and Information Retrieval (SPIRE 2020)   213 - 220   2020年10月

     詳細を見る

    記述言語:その他   掲載種別:研究論文(その他学術会議資料等)  

    DOI: 10.1007/978-3-030-59212-7_15

  • Lyndon Words, the Three Squares Lemma, and Primitive Squares. 査読

    Hideo Bannai, Takuya Mieno, Yuto Nakashima

    Proceedings of 27th International Symposium on String Processing and Information Retrieval (SPIRE 2020)   265 - 273   2020年10月

     詳細を見る

    記述言語:その他   掲載種別:研究論文(その他学術会議資料等)  

    DOI: 10.1007/978-3-030-59212-7_19

  • Grammar-compressed Self-index with Lyndon Words 査読

    Kazuya Tsuruta, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    情報処理学会論文誌数理モデル化と応用(TOM)   13 ( 2 )   84 - 92   2020年8月

     詳細を見る

    記述言語:英語  

    We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of O(g) words of spacethat can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern P of length m in O(m + lg m lg n + occ lg g) time, where n is the length of T, g is the size of the Lyndon SLP for T, and occ is the number of occurrences of P in T.We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of O(g) words of spacethat can be built from a string T in O(n lg n) expected time, retrieving the starting positions of all occurrences of a pattern P of length m in O(m + lg m lg n + occ lg g) time, where n is the length of T, g is the size of the Lyndon SLP for T, and occ is the number of occurrences of P in T.

  • Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara

    31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020   2020年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The equidistant subsequence pattern matching problem is considered. Given a pattern string P and a text string T, we say that P is an equidistant subsequence of T if P is a subsequence of the text such that consecutive symbols of P in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield o(n2) time algorithms for finding k-(sub-)cadences and equidistant subsequences. Furthermore, O(n log2 n) and O(n log n) time algorithms, respectively for equidistant and Abelian equidistant matching for the case P = 3, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints. 2012 ACM Subject Classification Mathematics of computing ! Combinatorial algorithms.

    DOI: 10.4230/LIPIcs.CPM.2020.12

  • DAWGs for Parameterized Matching Online Construction and Related Indexing Structures

    Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

    31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020   2020年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Two strings x and y over of equal length are said to parameterized match (p-match) if there is a renaming bijection f : That is identity on and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have(n2) nodes and edges but PDAWGs have only O(n) nodes and edges, where n is the length of an input string. We also give O(n log( + ))-time O(n)-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. 2012 ACM Subject Classification Theory of computation ! Pattern matching.

    DOI: 10.4230/LIPIcs.CPM.2020.26

  • Practical grammar compression based on maximal repeats 査読

    Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

    Algorithms   13 ( 4 )   2020年4月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which considers the one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. The results of the experiments comparing the size of constructed grammars and execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts.

    DOI: 10.3390/A13040103

  • C-Trie++ A dynamic trie tailored for fast prefix searches

    Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    2020 Data Compression Conference, DCC 2020 Proceedings - DCC 2020 Data Compression Conference   243 - 252   2020年3月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Given a dynamic set K of k strings of total length n whose characters are drawn from an alphabet of size σ, a keyword dictionary is a data structure built on K that provides locate, prefix search, and update operations on K. Under the assumption that α = w / lg σ characters fit into a single machine word w, we propose a keyword dictionary that represents K in n lg σ + Θ(k lg n) bits of space, supporting all operations in Θ(m / α + lg α) expected time on an input string of length m in the word RAM model. This data structure is underlined with an exhaustive practical evaluation, highlighting the practical usefulness of the proposed data structure, especially for prefix searches - one of the most elementary keyword dictionary operations.

    DOI: 10.1109/DCC47342.2020.00032

  • Faster STR-EC-LCS Computation

    Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020   125 - 135   2020年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings A and B. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string Z is said to be an STR-EC-LCS of two given strings A and B excluding P if, Z is one of the longest common subsequences of A and B that does not contain P as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in O(mnr) time and space where [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in time where denotes the set of distinct characters occurring in both A and B, and L is the length of the STR-EC-LCS. This algorithm is faster than the O(mnr)-time algorithm for short/long STR-EC-LCS (namely, or), and is at least as efficient as the O(mnr)-time algorithm for all cases.

    DOI: 10.1007/978-3-030-38919-2_11

  • Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings 査読

    Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theory of Computing Systems   2020年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s,t] in S, if S[i.j] occurs exactly once in S, the interval [i,j] contains [s,t], and every palindromic substring containing [s,t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLES of size m in O(m) space and O(mlogσRLES+mlogm/loglogm) time so that all SUPSs for any subsequent query interval can be answered in O(logm/loglogm+α) time, where α is the number of outputs, and σRLES is the number of distinct runs of RLES. Additionaly, we consider a variant of the SUPS problem where a query interval is also given in a run-length encoded form. For this variant of the problem, we present two alternative algorithms with faster queries. The first one answers queries in O(loglogm/logloglogm+α) time and can be built in O(mlogσRLES+mlogm/loglogm) time, and the second one answers queries in O(log log m+ α) time and can be built in O(mlogσRLES) time. Both of these data structures require O(m) space.

    DOI: 10.1007/s00224-020-09980-x

  • Minimal Unique Substrings and Minimal Absent Words in a Sliding Window

    Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020   148 - 160   2020年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017).

    DOI: 10.1007/978-3-030-38919-2_13

  • An improved data structure for left-right maximal generic words problem

    Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 30th International Symposium on Algorithms and Computation, ISAAC 2019   2019年12月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p| + r · log n + o · log2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.

    DOI: 10.4230/LIPIcs.ISAAC.2019.40

  • On the size of the smallest alphabet for Lyndon trees 査読

    Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theoretical Computer Science   792   131 - 143   2019年11月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    We consider the problem of reverse-engineering the Lyndon tree, i.e., given a full binary ordered tree T with n leaves as input, we are to compute a string w of length n of which Lyndon tree is isomorphic to the input tree T. Hereby we call such a string a solution string. Although the problem is easily solvable in linear time for binary alphabets and unbounded-size alphabets, it is not known how to efficiently find the smallest alphabet size for a solution string. In this paper, we show several new observations concerning this problem. Namely, we show that: 1) For any positive integer n, there exists a full binary ordered tree T with n leaves, s.t. the smallest alphabet size of a solution string for T is ⌊ [Formula presented] ⌋+1. 2) For any full binary ordered tree T with n leaves, there exists a solution string w over an alphabet of size at most ⌊ [Formula presented] ⌋+1. 3) For any full binary ordered tree T, there exists a solution string w over an alphabet of size at most h+1, where h is the height of T. 4) For any complete binary ordered tree T with 2k leaves, there exists a solution string w over an alphabet of size at most 4.

    DOI: 10.1016/j.tcs.2018.06.044

  • Compact Data Structures for Shortest Unique Substring Queries

    Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019   107 - 123   2019年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Given a string T of length n, a substring of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e.), and (c) every substring v of T with containing [s, t] occurs at least twice in T. Given a query interval, the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a bits data structure answering an interval SUS query in output-sensitive time, where is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for. Here, we propose a bits data structure answering a point SUS query in the same output-sensitive time.

    DOI: 10.1007/978-3-030-32686-9_8

  • Direct Linear Time Construction of Parameterized Suffix and LCP Arrays for Constant Alphabets

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019   382 - 391   2019年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be built first. More formally, for a string over static alphabet and parameterized alphabet, our algorithm runs in time and O(n) words of space, where is the number of distinct symbols of in the string.

    DOI: 10.1007/978-3-030-32686-9_27

  • On Longest Common Property Preserved Substring Queries

    Kazuki Kai, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Tomasz Kociumaka

    Proceedings of 26th International Symposium on String Processing and Information Retrieval, SPIRE 2019   162 - 174   2019年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We revisit the problem of longest common property preserving substring queries introduced by Ayad et al. (SPIRE 2018, arXiv 2018). We consider a generalized and unified on-line setting, where we are given a set X of k strings of total length n that can be pre-processed so that, given a query string y and a positive integer (Formula presented), we can determine the longest substring of y that satisfies some specific property and is common to at least (Formula presented) strings in X. Ayad et al. considered the longest square-free substring in an on-line setting and the longest periodic and palindromic substring in an off-line setting. In this paper, we give efficient solutions in the on-line setting for finding the longest common square, periodic, palindromic, and Lyndon substrings. More precisely, we show that X can be pre-processed in O(n) time resulting in a data structure of O(n) size that answers queries in (Formula presented) time and O(1) working space, where (Formula presented) is the size of the alphabet, and the common substring must be a square, a periodic substring, a palindrome, or a Lyndon word.

    DOI: 10.1007/978-3-030-32686-9_12

  • Computing maximal palindromes and distinct palindromes in a trie

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    23rd Prague Stringology Conference, PSC 2019 Proceedings of the Prague Stringology Conference, PSC 2019   3 - 15   2019年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    It is known that all maximal palindromes of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. Also, all distinct palindromes in T can be computed in O(n) time [Groult et al., Inf. Process. Lett. 2010]. In this paper, we consider the problem of computing maximal palindromes and distinct palindromes of a given trie T (i.e. rooted edge-labeled tree). A trie is a natural generalization of a string which can be seen as a single path tree. We propose algorithms to compute all maximal palindromes and all distinct palindromes in T in O(N log h) time and O(N) space, where N is the number of edges in T and h is the height of T . To our knowledge these are the first sub-quadratic time solutions to these problems.

  • Shortest unique palindromic substring queries on run-length encoded strings

    Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 30th International Workshop on Combinatorial Algorithms, IWOCA 2019   430 - 441   2019年7月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if S[i.j] occurs exactly once in S, the interval [i, j] contains [s, t], and every palindromic substring containing [s, t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLES of size m in O(m) space and O(mlog σRLES + m√log m/log logm) time so that all SUPSs for any subsequent query interval can be answered in O(√log m/log logm+ α) time, where α is the number of outputs, and σRLES is the number of distinct runs of RLES.

    DOI: 10.1007/978-3-030-25005-8_35

  • Computing runs on a trie

    Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019   2019年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n√log n log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

    DOI: 10.4230/LIPIcs.CPM.2019.23

  • On the size of overlapping Lempel-Ziv and Lyndon factorizations

    Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019   2019年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.

    DOI: 10.4230/LIPIcs.CPM.2019.29

  • Faster queries for longest substring palindrome after block edit

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019   2019年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ + log log n) time, after a substring in T is replaced by a string of arbitrary length ℓ. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(ℓ + log n) time for each query.

    DOI: 10.4230/LIPIcs.CPM.2019.27

  • The Parameterized Position Heap of a Trie

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Proceedings of 11th International Conference on Algorithms and Complexity, CIAC 2019   237 - 248   2019年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Let and be disjoint alphabets of respective size and. Two strings over of equal length are said to parameterized match (p-match) if there is a bijection such that (1) f is identity on and (2) f maps the characters of one string to those of the other string so that the two strings become identical. We consider the p-matching problem on a (reversed) trie and a string pattern P such that every path that p-matches P has to be reported. Let N be the size of the given trie. In this paper, we propose the parameterized position heap for that occupies O(N) space and supports p-matching queries in time, where m is the length of a query pattern P and is the number of paths in to report. We also present an algorithm which constructs the parameterized position heap for a given trie in time and working space.

    DOI: 10.1007/978-3-030-17402-6_20

  • MR-RePair Grammar Compression Based on Maximal Repeats

    Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

    2019 Data Compression Conference, DCC 2019 Proceedings - DCC 2019 2019 Data Compression Conference   508 - 517   2019年3月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

    DOI: 10.1109/DCC.2019.00059

  • Recovering, counting and enumerating strings from forward and backward suffix arrays

    Yuki Kuhara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    25th International Symposium on String Processing and Information Retrieval, SPIRE 2018 String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Proceedings   254 - 267   2018年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The suffix array SAw of a string w of length n is a permutation of [1..n] such that SAw[i]=j iff w[j, n] is the lexicographically i-th suffix of w. In this paper, we consider variants of the reverse-engineering problem on suffix arrays with two given permutations P and Q of [1..n], such that P refers to the forward suffix array of some string w and Q refers to the backward suffix array of the reversed string wR. Our results are the following: (1) An algorithm which computes a solution string over an alphabet of the smallest size, in O(n) time. (2) The exact number of solution strings over an alphabet of size σ. (3) An efficient algorithm which computes all solution strings in the lexicographical order, in time near optimal up to log n factor.

    DOI: 10.1007/978-3-030-00479-8_21

  • Algorithms and combinatorial properties on shortest unique palindromic substrings 査読

    Hiroe Inoue, Yuto Nakashima, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Journal of Discrete Algorithms   52-53   122 - 132   2018年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    A palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s,t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s,t], and every palindromic substring of S which contains interval [s,t] and is shorter than P occurs at least twice in S. The SUPS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUPSs for interval [s,t] can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string S of length n in O(n) time and space so that all SUPSs for any subsequent query interval can be answered in O(α+1) time, where α is the number of outputs. We also discuss the number of SUPSs in a string.

    DOI: 10.1016/j.jda.2018.11.009

  • Right-to-left online construction of parameterized position heaps

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    22nd Prague Stringology Conference, PSC 2018 Proceedings of the Prague Stringology Conference, PSC 2018   91 - 102   2018年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Two strings of equal length are said to parameterized match if there is a bijection that maps the characters of one string to those of the other string, so that two strings become identical. The parameterized pattern matching problem is, given two strings T and P, to find the occurrences of substrings in T that parameterized match P. Diptarama et al. [Position Heaps for Parameterized Strings, CPM 2017] proposed an indexing data structure called parameterized position heaps, and gave a left-toright online construction algorithm. In this paper, we present a right-to-left online construction algorithm for parameterized position heaps. For a text string T of length n over two kinds of alphabets Σand II of respective size σ and π, our construction algorithm runs in O(n log(σ+π)) time with O(n) space. Our right-to-left parameterized position heaps support pattern matching queries in O(mlog(σ+π)+mπ+pocc)) time, where m is the length of a query pattern P and pocc is the number of occurrences to report. Our construction and pattern matching algorithms are as efficient as Diptarama et al.'s algorithms.

  • O(n log n)-time text compression by LZ-style longest first substitution

    Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    22nd Prague Stringology Conference, PSC 2018 Proceedings of the Prague Stringology Conference, PSC 2018   12 - 26   2018年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Mauer et al. [A Lempel-Ziv-style Compression Method for Repetitive Texts, PSC 2017] proposed a hybrid text compression method called LZ-LFS which has both features of Lempel-Ziv 77 factorization and longest first substitution. They showed that LZ-LFS can achieve better compression ratio for repetitive texts, compared to some state-of-the-art compression algorithms. The drawback of Mauer et al.'s method is that their LZ-LFS compression algorithm takes O(n2) time on an input string of length n. In this paper, we show a faster LZ-LFS compression algorithm that works in O(n log n) time. We also propose a simpler version of LZ-LFS that can be computed in O(n) time.

  • Faster online elastic degenerate string matching

    Kotaro Aoyama, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018   91 - 910   2018年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm √mlogm+ N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm2 + N) time.

    DOI: 10.4230/LIPIcs.CPM.2018.9

  • Lyndon factorization of grammar compressed texts revisited

    Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018   241 - 2410   2018年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n,N) + Q(n,N)n log logN) time and O(n logN + S(n,N)) space where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

    DOI: 10.4230/LIPIcs.CPM.2018.24

  • Longest substring palindrome after edit

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018   105   121 - 1214   2018年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log(min{ω, log n})) time after single character substitution, insertion, or deletion, where ω denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ+log n) time, after an existing substring in T is replaced by a string of arbitrary length ℓ.

    DOI: 10.4230/LIPIcs.CPM.2018.12

  • Longest lyndon substring after edit

    Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018   191 - 1910   2018年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T′) where T′ is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(l log σ + log n) time for any block edit where σ is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T′ for both problems.

    DOI: 10.4230/LIPIcs.CPM.2018.19

  • Shortest unique palindromic substring queries in optimal time

    Yuto Nakashima, Hiroe Inoue, Takuya Mieno, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    28th International Workshop on Combinational Algorithms, IWOCA 2017 Combinatorial Algorithms - 28th International Workshop, IWOCA 2017, Revised Selected Papers   397 - 408   2018年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    palindrome is a string that reads the same forward and backward. A palindromic substring P of a string S is called a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if P occurs exactly once in S, this occurrence of P contains interval [s, t], and every palindromic substring of S which contains interval [s, t] and is shorter than P occurs at least twice in S. The SUPS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUPSs for interval [s, t] can be answered quickly. We present an optimal solution to this problem. Namely, we show how to preprocess a given string S of length n in O(n) time and space so that all SUPSs for any subsequent query interval can be answered in O(α + 1) time, where α is the number of outputs.

    DOI: 10.1007/978-3-319-78825-8_32

  • Almost linear time computation of maximal repetitions in run length encoded strings

    Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    28th International Symposium on Algorithms and Computation, ISAAC 2017 28th International Symposium on Algorithms and Computation, ISAAC 2017   2017年12月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m α (m)) time and O(m) space, where α denotes the inverse Ackermann function.

    DOI: 10.4230/LIPIcs.ISAAC.2017.33

  • The "runs" theorem 査読

    Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta

    SIAM Journal on Computing   46 ( 5 )   1501 - 1514   2017年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture [R. M. Kolpakov andG.Kucherov, Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), IEEE Computer Society, Los Alamitos, CA, 1999, pp. 596-604]), which states that the maximum number of runs ρ(n) in a string of length n is less than n. The proof is remarkably simple, considering the numerous endeavors to tackle this problem in the last 15 years, and significantly improves our understanding of how runs can occur in strings. In addition, we obtain an upper bound of 3n for the maximum sum of exponents σ(n) of runs in a string of length n, improving on the best known bound of 4.1n by Crochemore et al. [J. Discrete Algorithms, 14 (2012), pp. 29-36], as well as other improved bounds on related problems. The characterization also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string. We also establish a relationship between runs and nodes of the Lyndon tree, which gives a simple optimal solution to the 2-period query problem that was recently solved by Kociumaka et al. [Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, (SODA) 2015, San Diego, CA, SIAM, Philadelphia, 2015, pp. 532-551].

    DOI: 10.1137/15M1011032

  • On reverse engineering the Lyndon tree

    Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    21st Prague Stringology Conference, PSC 2017 Proceedings of the Prague Stringology Conference, PSC 2017   108 - 117   2017年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We consider the problem of reverse engineering the Lyndon tree, i.e., given a full binary ordered tree T with n leaves as input, compute a string w of length n for which it’s Lyndon tree is isomorphic to the input tree. Although the problem is easy and solvable in linear time when assuming a binary alphabet or when there is no limit on the alphabet size, how to efficiently find the smallest alphabet size for a solution string is not known. We show several new observations concerning this problem. Namely, we show that: 1) For any full binary ordered tree T, there exists a solution string w over an alphabet of size at most h + 1, where h is the height of T. 2) For any positive n, there exists a full binary ordered tree T with n leaves, s.t. the smallest alphabet size of the solution string for T is ⌊n2 ⌋ + 1.

  • Inferring strings from Lyndon factorization 査読

    Yuto Nakashima, Takashi Okabe, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theoretical Computer Science   689   147 - 156   2017年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    The Lyndon factorization of a string w is a unique factorization ℓ1p1,…,ℓmpm of w such that ℓ1,…,ℓm is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S=((s1,p1),…,(sm,pm)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓ1p1,…,ℓmpm with |ℓi|=si for all 1≤i≤m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S.

    DOI: 10.1016/j.tcs.2017.05.038

  • On the size of Lempel-Ziv and Lyndon factorizations

    Juha Kärkkäinen, Dominik Kempa, Yuto Nakashima, Simon J. Puglisi, Arseny M. Shur

    34th Symposium on Theoretical Aspects of Computer Science, STACS 2017 34th Symposium on Theoretical Aspects of Computer Science, STACS 2017   2017年3月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Lyndon factorization and Lempel-Ziv (LZ) factorization are both important tools for analysing the structure and complexity of strings, but their combinatorial structure is very different. In this paper, we establish the first direct connection between the two by showing that while the Lyndon factorization can be bigger than the non-overlapping LZ factorization (which we demonstrate by describing a new, non-trivial family of strings) it is always less than twice the size.

    DOI: 10.4230/LIPIcs.STACS.2017.45

  • Faster Lyndon factorization algorithms for SLP and LZ78 compressed text 査読

    Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Theoretical Computer Science   656   215 - 224   2016年12月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n that describes w, the first algorithm runs in O(n2+P(n,N)+Q(n,N)nlog⁡n) time and O(n2+S(n,N)) space, where P(n,N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Given the Lempel–Ziv 78 encoding of size s for w, the second algorithm runs in O(slog⁡s) time and space.

    DOI: 10.1016/j.tcs.2016.03.005

  • Longest common Abelian factors and large alphabets

    Golnaz Badkobeh, Travis Gagie, Szymon Grabowski, Yuto Nakashima, Simon J. Puglisi, Shiho Sugimoto

    23rd International Symposium on String Processing and Information Retrieval, SPIRE 2016 String Processing and Information Retrieval - 23rd International Symposium, SPIRE 2016, Proceedings   254 - 259   2016年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    Two strings X and Y are considered Abelian equal if the letters of X can be permuted to obtain Y (and vice versa). Recently, Alatabbi et al. (2015) considered the longest common Abelian factor problem in which we are asked to find the length of the longest Abelian-equal factor present in a given pair of strings. They provided an algorithm that uses O(σn2) time and O(σn) space, where n is the length of the pair of strings and σ is the alphabet size. In this paper we describe an algorithm that uses O(n2 log2 n log∗ n) time and O(n log2 n) space, significantly improving Alatabbi et al.’s result unless the alphabet is small. Our algorithm makes use of techniques for maintaining a dynamic set of strings under split, join, and equality testing (Melhorn et al., Algorithmica 17(2), 1997).

    DOI: 10.1007/978-3-319-46049-9_24

  • Computing smallest and largest repetition factorizations in O(n log n) time

    Hiroe Inoue, Yoshiaki Matsuoka, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    20th Prague Stringology Conference, PSC 2016 Proceedings of the Prague Stringology Conference, PSC 2016   135 - 145   2016年8月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    A factorization f1, . . . , fm of a string w is called a repetition factorization of w if each factor fi is a repetition, namely, fi = xkx' for some non-empty string x, an integer k ≥ 2, and x' being a proper prefix of x. Dumitran et al. (Proc. SPIRE 2015) proposed an algorithm which computes a repetition factorization of a given string w in O(n) time, where n is the length of w. In this paper, we propose two algorithms which compute smallest/largest repetition factorizations in O(n log n) time. The first algorithm is a simple O(n log n) space algorithm while the second one uses only O(n) space.

  • Constructing LZ78 tries and position heaps in linear time for large alphabets 査読

    Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Information Processing Letters   115 ( 9 )   655 - 659   2015年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is given as a trie.

    DOI: 10.1016/j.ipl.2015.04.002

  • Computing left-right maximal generic words

    Takaaki Nishimoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    19th Prague Stringology Conference, PSC 2015 Proceedings of the Prague Stringology Conference 2015, PSC 2015   5 - 16   2015年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The maximal generic words problem was proposed by Kucherov et al. (SPIRE 2012). Let D be a set of documents. In this problem, given a pattern P and a threshold δ ≤ |D|, we want to compute all right-maximal extensions of P which occur in at least δ distinct documents. They proposed an O(n)-space data structure which can solve this problem in O(|P| + rocc) time where n is the total length of documents in D and rocc is the number of right-maximal extensions of P. The data structure can be constructed in O(n) time. In this paper, we propose a more generalized problem. Given a pattern P and a threshold δ ≤ |D|, we want to compute all left-right-maximal extensions of P which occur in at least δ distinct documents. We propose an O(n log n)- space data structure which can solve this problem in O(|P| + occ log2 n + rocc log n) time where occ is the number of left-right-maximal extensions of P.

  • A new characterization of maximal repetitions by Lyndon trees

    Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta

    26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015 Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015   562 - 571   2015年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We give a new characterization of maximal repetitions (or runs) in strings, using a tree defined on recursive standard factorizations of Lyndon words, called the Lyndon tree. The characterization leads to a remarkably simple novel proof of the linearity of the maximum number of runs p(n) in a string of length n. Furthermore, we show an upper bound of p(n) < 1.5n, which improves on the best upper bound 1.6n (Crochemore & Hie 2008) that does not rely on computational verification. The proof also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string.

    DOI: 10.1137/1.9781611973730.38

  • Inferring strings from Lyndon factorization

    Yuto Nakashima, Takashi Okabe, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    39th International Symposium on Mathematical Foundations of Computer Science, MFCS 2014 Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Proceedings   565 - 576   2014年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The Lyndon factorization of a string w is a unique factorization ℓp11,⋯, ℓpmm of w s.t. ℓ1,⋯, ℓm is a sequence of Lyndon words that is monotonically decreasing in lexicographic order. In this paper, we consider the reverse-engineering problem on Lyndon factorization: Given a sequence S = ((s1, p1),⋯, (sm, p m)) of ordered pairs of positive integers, find a string w whose Lyndon factorization corresponds to the input sequence S, i.e., the Lyndon factorization of w is in a form of ℓp11,⋯, ℓpmm with |ℓi| = si for all 1 ≤ i ≤ m. Firstly, we show that there exists a simple O(n)-time algorithm if the size of the alphabet is unbounded, where n is the length of the output string. Secondly, we present an O(n)-time algorithm to compute a string over an alphabet of the smallest size. Thirdly, we show how to compute only the size of the smallest alphabet in O(m) time. Fourthly, we give an O(m)-time algorithm to compute an O(m)-size representation of a string over an alphabet of the smallest size. Finally, we propose an efficient algorithm to enumerate all strings whose Lyndon factorizations correspond to S.

    DOI: 10.1007/978-3-662-44465-8_48

  • Faster lyndon factorization algorithms for SLP and LZ78 compressed text

    Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    20th International Symposium on String Processing and Information Retrieval, SPIRE 2013 String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings   174 - 185   2013年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We present two efficient algorithms which, given a compressed representation of a string w of length N, compute the Lyndon factorization of w. Given a straight line program (SLP) S of size n and height h that describes w, the first algorithm runs in O(nh(n + log N log n)) time and O(n2) space. Given the Lempel-Ziv 78 encoding of size s for w, the second algorithm runs in O(s log s) time and space.

    DOI: 10.1007/978-3-319-02432-5_21

  • Efficient Lyndon factorization of grammar compressed text

    Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    24th Annual Symposium on Combinatorial Pattern Matching, CPM 2013 Combinatorial Pattern Matching - 24th Annual Symposium, CPM 2013, Proceedings   153 - 164   2013年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in O(n4 + mn3 h) time and O(n 2) space, where m is the size of the Lyndon factorization, n is the size of the SLP, and h is the height of the derivation tree of the SLP. Since the length of the decompressed string can be exponentially large w.r.t. n, m and h, our result is the first polynomial time solution when the string is given as SLP.

    DOI: 10.1007/978-3-642-38905-4_16

  • The position heap of a trie

    Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    19th International Symposium on String Processing and Information Retrieval, SPIRE 2012 String Processing and Information Retrieval - 19th International Symposium, SPIRE 2012, Proceedings   360 - 371   2012年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(その他学術会議資料等)  

    The position heap is a text indexing structure for a single text string, recently proposed by Ehrenfeucht et al. [Position heaps: A simple and dynamic text indexing data structure, Journal of Discrete Algorithms, 9(1):100-121, 2011]. In this paper we introduce the position heap for a set of strings, and propose an efficient algorithm to construct the position heap for a set of strings which is given as a trie. For a fixed alphabet our algorithm runs in time linear in the size of the trie. We also show that the position heap can be efficiently updated after addition/removal of a leaf of the input trie.

    DOI: 10.1007/978-3-642-34109-0-38

▼全件表示

講演・口頭発表等

  • Optimal LZ-End Parsing Is Hard 国際会議

    Hideo Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, Takeaki Uno

    34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)  2023年6月 

     詳細を見る

    開催年月日: 2023年6月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Marne-la-Vallée   国名:フランス共和国  

  • Space-Efficient STR-IC-LCS Computation 国際会議

    Yuuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

    48th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2023)  2023年1月 

     詳細を見る

    開催年月日: 2023年1月 - 2020年1月

    記述言語:英語  

    開催地:Nový Smokovec   国名:スロバキア共和国  

  • Minimal Absent Words on Run-Length Encoded Strings 国際会議

    Tooru Akagi, Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga

    33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)  2022年6月 

     詳細を見る

    開催年月日: 2022年6月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Czech Technical University   国名:チェコ共和国  

  • Longest Common Rollercoasters 国際会議

    Kosuke Fujita, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    28th International Symposium on String Processing and Information Retrieval (SPIRE 2021)  2021年10月 

     詳細を見る

    開催年月日: 2021年10月

    記述言語:英語  

    開催地:Online   国名:フランス共和国  

  • On the approximation ratio of LZ-End to LZ77 国際会議

    Takumi Ideue, Takuya Mieno, Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

    28th International Symposium on String Processing and Information Retrieval (SPIRE 2021)  2021年10月 

     詳細を見る

    開催年月日: 2021年10月

    記述言語:英語  

    開催地:Online   国名:フランス共和国  

  • Position Heaps for Cartesian-tree Matching on Strings and Tries 国際会議

    Akio Nishimoto, Noriki. Fujisato, Yuto Nakashima, Shunsuke Inenaga

    28th International Symposium on String Processing and Information Retrieval (SPIRE 2021)  2021年10月 

     詳細を見る

    開催年月日: 2021年10月

    記述言語:英語  

    開催地:Online   国名:フランス共和国  

  • Grammar Index By Induced Suffix Sorting 国際会議

    Tooru Akagi, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    28th International Symposium on String Processing and Information Retrieval (SPIRE 2021)  2021年10月 

     詳細を見る

    開催年月日: 2021年10月

    記述言語:英語  

    開催地:Online   国名:フランス共和国  

  • Counting Lyndon Subsequence 国際会議

    Ryo Hirakawa, Yuto Nakashima, Shunsuke Inenaga, Masayuki Takeda

    Prague Stringology Conference 2021 (PSC 2021)  2021年8月 

     詳細を見る

    開催年月日: 2021年8月

    記述言語:英語  

    開催地:Online   国名:チェコ共和国  

  • The Parameterized Suffix Tray 国際会議

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda

    12th International Conference on Algorithms and Complexity (CIAC 2021)  2021年5月 

     詳細を見る

    開催年月日: 2021年5月

    記述言語:英語  

    開催地:Online   国名:キプロス共和国  

  • On repetitiveness measures of Thue-Morse words 国際会議

    Kanaru Kutsukake, Takuya Matsumoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    27th International Symposium on String Processing and Information Retrieval (SPIRE 2020)  2020年10月 

     詳細を見る

    開催年月日: 2020年10月

    記述言語:英語  

    開催地:Online   国名:アメリカ合衆国  

  • Lyndon Words, the Three Squares Lemma, and Primitive Squares 国際会議

    Hideo Bannai, Takuya Mieno, Yuto Nakashima

    27th International Symposium on String Processing and Information Retrieval (SPIRE 2020)  2020年10月 

     詳細を見る

    開催年月日: 2020年10月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Online   国名:アメリカ合衆国  

  • Towards Efficient Interactive Computation of Dynamic Time Warping Distance 国際会議

    Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    27th International Symposium on String Processing and Information Retrieval (SPIRE 2020)  2020年10月 

     詳細を見る

    開催年月日: 2020年10月

    記述言語:英語  

    開催地:Online   国名:アメリカ合衆国  

  • DAWGs for Parameterized Matching: Online Construction and Related Indexing Structures 国際会議

    Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

    31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)  2020年6月 

     詳細を見る

    開催年月日: 2020年6月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Online   国名:デンマーク王国  

  • Detecting k-(Sub-)Cadences and Equidistant Subsequence Occurrences 国際会議

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara

    31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)  2020年6月 

     詳細を見る

    開催年月日: 2020年6月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Online   国名:デンマーク王国  

  • c-Trie++: A Dynamic Trie Tailored for Fast Prefix Searches 国際会議

    Kazuya Tsuruta, Dominik Köppl, Shunsuke Kanda, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Data Compression Conference 2020  2020年3月 

     詳細を見る

    開催年月日: 2020年3月

    記述言語:英語  

    開催地:オンライン会議   国名:その他  

  • Faster STR-EC-LCS Computation 国際会議

    Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020  2020年1月 

     詳細を見る

    開催年月日: 2020年1月

    記述言語:英語  

    開催地:Limassol   国名:キプロス共和国  

    The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of given two strings A and B. More recently, a set of four constrained LCS problems (called generalized constrained LCS problem) were proposed by Chen and Chao [J. Comb. Optim, 2011]. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string Z is said to be an STR-EC-LCS of two given strings A and B excluding P if, Z is one of the longest common subsequences of A and B that does not contain P as a substring. Wang et al. proposed a dynamic programming solution which computes an STR-EC-LCS in O(mnr) time and space where [Inf. Process. Lett., 2013]. In this paper, we show a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in time where denotes the set of distinct characters occurring in both A and B, and L is the length of the STR-EC-LCS. This algorithm is faster than the O(mnr)-time algorithm for short/long STR-EC-LCS (namely, or), and is at least as efficient as the O(mnr)-time algorithm for all cases.

  • Minimal Unique Substrings and Minimal Absent Words in a Sliding Window 国際会議

    Takuya Mieno, Yuki Kuhara, Tooru Akagi, Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    46th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2020  2020年1月 

     詳細を見る

    開催年月日: 2020年1月

    記述言語:英語  

    開催地:Limassol   国名:キプロス共和国  

    A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in Crochemore et al. (2017).

  • An improved data structure for left-right maximal generic words problem 国際会議

    Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    30th International Symposium on Algorithms and Computation, ISAAC 2019  2019年12月 

     詳細を見る

    開催年月日: 2019年12月

    記述言語:英語  

    開催地:Shanghai   国名:中華人民共和国  

    For a set D of documents and a positive integer d, a string w is said to be d-left-right maximal, if (1) w occurs in at least d documents in D, and (2) any proper superstring of w occurs in less than d documents. The left-right-maximal generic words problem is, given a set D of documents, to preprocess D so that for any string p and for any positive integer d, all the superstrings of p that are d-left-right maximal can be answered quickly. In this paper, we present an O(n log m) space data structure (in words) which answers queries in O(|p| + o log log m) time, where n is the total length of documents in D, m is the number of documents in D and o is the number of outputs. Our solution improves the previous one by Nishimoto et al. (PSC 2015), which uses an O(n log n) space data structure answering queries in O(|p| + r · log n + o · log2 n) time, where r is the number of right-extensions q of p occurring in at least d documents such that any proper right extension of q occurs in less than d documents.

  • Compact Data Structures for Shortest Unique Substring Queries 国際会議

    Takuya Mieno, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    26th International Symposium on String Processing and Information Retrieval, SPIRE 2019  2019年10月 

     詳細を見る

    開催年月日: 2019年10月

    記述言語:英語  

    開催地:Segovia   国名:スペイン  

    Given a string T of length n, a substring of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e.), and (c) every substring v of T with containing [s, t] occurs at least twice in T. Given a query interval, the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a bits data structure answering an interval SUS query in output-sensitive time, where is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for. Here, we propose a bits data structure answering a point SUS query in the same output-sensitive time.

  • On Longest Common Property Preserved Substring Queries 国際会議

    Kazuki Kai, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Tomasz Kociumaka

    26th International Symposium on String Processing and Information Retrieval, SPIRE 2019  2019年10月 

     詳細を見る

    開催年月日: 2019年10月

    記述言語:英語  

    開催地:Segovia   国名:スペイン  

    We revisit the problem of longest common property preserving substring queries introduced by Ayad et al. (SPIRE 2018, arXiv 2018). We consider a generalized and unified on-line setting, where we are given a set X of k strings of total length n that can be pre-processed so that, given a query string y and a positive integer (Formula presented), we can determine the longest substring of y that satisfies some specific property and is common to at least (Formula presented) strings in X. Ayad et al. considered the longest square-free substring in an on-line setting and the longest periodic and palindromic substring in an off-line setting. In this paper, we give efficient solutions in the on-line setting for finding the longest common square, periodic, palindromic, and Lyndon substrings. More precisely, we show that X can be pre-processed in O(n) time resulting in a data structure of O(n) size that answers queries in (Formula presented) time and O(1) working space, where (Formula presented) is the size of the alphabet, and the common substring must be a square, a periodic substring, a palindrome, or a Lyndon word.

  • Direct Linear Time Construction of Parameterized Suffix and LCP Arrays for Constant Alphabets 国際会議

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    26th International Symposium on String Processing and Information Retrieval, SPIRE 2019  2019年10月 

     詳細を見る

    開催年月日: 2019年10月

    記述言語:英語  

    開催地:Segovia   国名:スペイン  

    We present the first worst-case linear time algorithm that directly computes the parameterized suffix and LCP arrays for constant sized alphabets. Previous algorithms either required quadratic time or the parameterized suffix tree to be built first. More formally, for a string over static alphabet and parameterized alphabet, our algorithm runs in time and O(n) words of space, where is the number of distinct symbols of in the string.

  • Computing Maximal Palindromes and Distinct Palindromes in a Trie 国際会議

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Prague Stringology Conference 2019  2019年8月 

     詳細を見る

    開催年月日: 2019年8月

    記述言語:英語  

    国名:チェコ共和国  

  • Shortest unique palindromic substring queries on run-length encoded strings 国際会議

    Kiichi Watanabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    30th International Workshop on Combinatorial Algorithms, IWOCA 2019  2019年7月 

     詳細を見る

    開催年月日: 2019年7月

    記述言語:英語  

    開催地:Pisa   国名:イタリア共和国  

    For a string S, a palindromic substring S[i.j] is said to be a shortest unique palindromic substring (SUPS) for an interval [s, t] in S, if S[i.j] occurs exactly once in S, the interval [i, j] contains [s, t], and every palindromic substring containing [s, t] which is shorter than S[i.j] occurs at least twice in S. In this paper, we study the problem of answering SUPS queries on run-length encoded strings. We show how to preprocess a given run-length encoded string RLES of size m in O(m) space and O(mlog σRLES + m√log m/log logm) time so that all SUPSs for any subsequent query interval can be answered in O(√log m/log logm+ α) time, where α is the number of outputs, and σRLES is the number of distinct runs of RLES.

  • Computing runs on a trie 国際会議

    Ryo Sugahara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019  2019年6月 

     詳細を見る

    開催年月日: 2019年6月

    記述言語:英語  

    開催地:Pisa   国名:イタリア共和国  

    A maximal repetition, or run, in a string, is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where the endpoints of the path must be a descendant/ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also show an O(n√log n log log n) time and O(n) space algorithm for counting and finding the shallower endpoint of all runs. We further show an O(n log n) time and O(n) space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

  • On the size of overlapping Lempel-Ziv and Lyndon factorizations 国際会議

    Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019  2019年6月 

     詳細を見る

    開催年月日: 2019年6月

    記述言語:英語  

    開催地:Pisa   国名:イタリア共和国  

    Lempel-Ziv (LZ) factorization and Lyndon factorization are well-known factorizations of strings. Recently, Kärkkäinen et al. studied the relation between the sizes of the two factorizations, and showed that the size of the Lyndon factorization is always smaller than twice the size of the non-overlapping LZ factorization [STACS 2017]. In this paper, we consider a similar problem for the overlapping version of the LZ factorization. Since the size of the overlapping LZ factorization is always smaller than the size of the non-overlapping LZ factorization and, in fact, can even be an O(log n) factor smaller, it is not immediately clear whether a similar bound as in previous work would hold. Nevertheless, in this paper, we prove that the size of the Lyndon factorization is always smaller than four times the size of the overlapping LZ factorization.

  • Faster queries for longest substring palindrome after block edit 国際会議

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019  2019年6月 

     詳細を見る

    開催年月日: 2019年6月

    記述言語:英語  

    開催地:Pisa   国名:イタリア共和国  

    Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of views. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ + log log n) time, after a substring in T is replaced by a string of arbitrary length ℓ. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(ℓ + log n) time for each query.

  • The Parameterized Position Heap of a Trie 国際会議

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    11th International Conference on Algorithms and Complexity, CIAC 2019  2019年5月 

     詳細を見る

    開催年月日: 2019年5月

    記述言語:英語  

    開催地:Rome   国名:イタリア共和国  

    Let and be disjoint alphabets of respective size and. Two strings over of equal length are said to parameterized match (p-match) if there is a bijection such that (1) f is identity on and (2) f maps the characters of one string to those of the other string so that the two strings become identical. We consider the p-matching problem on a (reversed) trie and a string pattern P such that every path that p-matches P has to be reported. Let N be the size of the given trie. In this paper, we propose the parameterized position heap for that occupies O(N) space and supports p-matching queries in time, where m is the length of a query pattern P and is the number of paths in to report. We also present an algorithm which constructs the parameterized position heap for a given trie in time and working space.

  • MR-RePair Grammar Compression Based on Maximal Repeats

    Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Takuya Kida

    2019 Data Compression Conference, DCC 2019  2019年3月 

     詳細を見る

    開催年月日: 2019年3月

    記述言語:英語  

    開催地:Snowbird   国名:アメリカ合衆国  

    We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces step by step the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

  • Recovering, counting and enumerating strings from forward and backward suffix arrays 国際会議

    Yuki Kuhara, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    25th International Symposium on String Processing and Information Retrieval, SPIRE 2018  2018年10月 

     詳細を見る

    開催年月日: 2018年10月

    記述言語:英語  

    開催地:Lima   国名:ペルー共和国  

    The suffix array SAw of a string w of length n is a permutation of [1..n] such that SAw[i]=j iff w[j, n] is the lexicographically i-th suffix of w. In this paper, we consider variants of the reverse-engineering problem on suffix arrays with two given permutations P and Q of [1..n], such that P refers to the forward suffix array of some string w and Q refers to the backward suffix array of the reversed string wR. Our results are the following: (1) An algorithm which computes a solution string over an alphabet of the smallest size, in O(n) time. (2) The exact number of solution strings over an alphabet of size σ. (3) An efficient algorithm which computes all solution strings in the lexicographical order, in time near optimal up to log n factor.

  • Right-to-left online construction of parameterized position heaps 国際会議

    Noriki Fujisato, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda.

    Prague Stringology Conference 2018  2018年8月 

     詳細を見る

    開催年月日: 2018年8月

    記述言語:英語   会議種別:口頭発表(一般)  

    国名:チェコ共和国  

  • O(n log n)-time text compression by LZ-style longest first substitution 国際会議

    Akihiro Nishi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda.

    Prague Stringology Conference 2018  2018年8月 

     詳細を見る

    開催年月日: 2018年8月

    記述言語:英語   会議種別:口頭発表(一般)  

    国名:チェコ共和国  

  • Faster online elastic degenerate string matching

    Kotaro Aoyama, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018  2018年7月 

     詳細を見る

    開催年月日: 2018年7月

    記述言語:英語  

    開催地:Qingdao   国名:中華人民共和国  

    An Elastic-Degenerate String [Iliopoulus et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm √mlogm+ N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm2 + N) time.

  • Longest substring palindrome after edit

    Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018  2018年7月 

     詳細を見る

    開催年月日: 2018年7月

    記述言語:英語  

    開催地:Qingdao   国名:中華人民共和国  

    It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log(min{ω, log n})) time after single character substitution, insertion, or deletion, where ω denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(ℓ+log n) time, after an existing substring in T is replaced by a string of arbitrary length ℓ.

  • Longest lyndon substring after edit

    Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018  2018年7月 

     詳細を見る

    開催年月日: 2018年7月

    記述言語:英語  

    開催地:Qingdao   国名:中華人民共和国  

    The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T′) where T′ is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T′) in O(l log σ + log n) time for any block edit where σ is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T′ for both problems.

  • Lyndon 文字列とテキスト圧縮 招待

    中島祐人

    2018年電子情報通信学会総合大会  2018年3月 

     詳細を見る

    開催年月日: 2018年6月

    記述言語:日本語   会議種別:口頭発表(一般)  

    開催地:東京電機大学(東京)   国名:日本国  

  • ストリーミングモデルにおける最長Lyndon文字列

    中島祐人

    冬のLAシンポジウム2017  2018年2月 

     詳細を見る

    開催年月日: 2018年6月

    記述言語:日本語   会議種別:口頭発表(一般)  

    開催地:京都大学(京都市)   国名:日本国  

  • Almost linear time computation of maximal repetitions in run length encoded strings 国際会議

    Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    The 28th International Symposium on Algorithms and Computation (ISAAC 2017)  2017年12月 

     詳細を見る

    開催年月日: 2017年12月

    記述言語:英語   会議種別:口頭発表(一般)  

    国名:タイ王国  

  • On Reverse Engineering the Lyndon Tree 国際会議

    Yuto Nakashima, Takuya Takagi, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    The Prague Stringology Conference 2017 (PSC 2017)  2017年8月 

     詳細を見る

    開催年月日: 2017年8月

    記述言語:英語   会議種別:口頭発表(一般)  

    国名:チェコ共和国  

  • Almost linear time computation of maximal repetitions in run length encoded strings 国際会議

    Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    12th Workshop on Compression, Text and Algorithms  2017年9月 

     詳細を見る

    開催年月日: 2017年6月

    記述言語:英語   会議種別:口頭発表(一般)  

    国名:イタリア共和国  

  • On Sensitivity of Compact Directed Acyclic Word Graphs 国際会議

    Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga

    14th International Conference on Words (WORDS 2023)  2023年6月 

     詳細を見る

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Umeå   国名:スウェーデン王国  

  • Computing SEQ-IC-LCS of Labeled Graphs 国際会議

    Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga

    Prague Stringology Conference 2023 (PSC 2023)  2023年8月 

     詳細を見る

    記述言語:英語  

    国名:チェコ共和国  

  • Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph 国際会議

    Hiroki Arimura, Shunsuke Inenaga, Yasuaki Kobayashi, Yuto Nakashima, Mizuki Sue

    30th International Symposium on String Processing and Information Retrieval (SPIRE 2023)  2023年9月 

     詳細を見る

    記述言語:英語  

    国名:チェコ共和国  

  • Largest Repetition Factorization of Fibonacci Words 国際会議

    Kaisei Kishi, Yuto Nakashima, Shunsuke Inenaga

    30th International Symposium on String Processing and Information Retrieval (SPIRE 2023)  2023年9月 

     詳細を見る

    記述言語:英語  

    国名:チェコ共和国  

  • Linear-Time Computation of Generalized Minimal Absent Words for Multiple Strings 国際会議

    Kouta Okabe, Takuya Mieno, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

    30th International Symposium on String Processing and Information Retrieval (SPIRE 2023)  2023年9月 

     詳細を見る

    記述言語:英語  

    国名:チェコ共和国  

▼全件表示

委員歴

  • 情報処理学会 アルゴリズム研究運営委員会   幹事  

    2024年4月 - 現在   

      詳細を見る

  • EATCS Japan Chapter   幹事   国際

    2023年1月 - 現在   

  • EATCS Japan Chapter   幹事  

    2023年1月 - 現在   

      詳細を見る

  • 電子情報通信学会 情報・システムソサイエティ誌 編集委員会   編集委員  

    2022年6月 - 現在   

      詳細を見る

  • 情報処理学会   アルゴリズム研究会運営委員   国内

    2022年6月 - 2026年6月   

  • 情報処理学会 アルゴリズム研究運営委員会   運営委員  

    2022年4月 - 2024年3月   

      詳細を見る

  • 電子情報通信学会 コンピュテーション研究専門委員会   専門委員  

    2020年6月 - 現在   

      詳細を見る

  • 電子情報通信学会   コンピュテーション研究会 専門委員   国内

    2020年6月 - 2022年5月   

▼全件表示

学術貢献活動

  • 組織委員長 国際学術貢献

    35th Annual Symposium on Combinatorial Pattern Matching  ( 福岡 ) 2024年6月

     詳細を見る

    種別:大会・シンポジウム等 

  • PC memer 国際学術貢献

    34th Annual Symposium on Combinatorial Pattern Matching  ( France ) 2023年6月

     詳細を見る

    種別:大会・シンポジウム等 

  • 学術論文等の審査

    役割:査読

    2023年

     詳細を見る

    種別:査読等 

    国際会議録 査読論文数:8

  • 運営委員

    STRセミナー2022 若手研究者のための大学間合同研究集会  ( 神戸市 ) 2022年9月

     詳細を見る

    種別:大会・シンポジウム等 

  • 電子情報通信学会 情報・システムソサイエティ誌

    2022年6月 - 2024年6月

     詳細を見る

    種別:学会・研究会等 

  • 学術論文等の審査

    役割:査読

    2022年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:3

    国際会議録 査読論文数:7

  • 学術論文等の審査

    役割:査読

    2021年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:2

    国際会議録 査読論文数:4

  • 座長

    冬のLAシンポジウム2019  ( 京都大学数理解析研究所(京都府京都市) ) 2020年2月

     詳細を見る

    種別:大会・シンポジウム等 

  • 学術論文等の審査

    役割:査読

    2020年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:4

    国際会議録 査読論文数:4

  • 座長

    コンピュテーション研究会  ( 北海道大学札幌キャンパス(北海道札幌市) ) 2019年10月

     詳細を見る

    種別:大会・シンポジウム等 

  • コメンテータ

    第11回データ工学と情報マネジメントに関するフォーラム  ( ホテルオークラJRハウステンボス(長崎県佐世保市) ) 2019年3月

     詳細を見る

    種別:大会・シンポジウム等 

  • 学術論文等の審査

    役割:査読

    2019年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:0

    日本語雑誌 査読論文数:0

    国際会議録 査読論文数:6

    国内会議録 査読論文数:0

  • 座長

    夏のLAシンポジウム2018  ( サンライズ九十九里(千葉県山武郡) ) 2018年7月

     詳細を見る

    種別:大会・シンポジウム等 

  • 座長

    2018年電子情報通信学会総合大会  ( 東京電機大学(東京) ) 2018年3月

     詳細を見る

    種別:大会・シンポジウム等 

  • 座長

    アルゴリズム研究会(167回)  ( サンポートホール高松(香川県高松市) ) 2018年3月

     詳細を見る

    種別:大会・シンポジウム等 

  • 座長

    冬のLAシンポジウム2017  ( 京都大学(京都市) ) 2018年2月

     詳細を見る

    種別:大会・シンポジウム等 

  • 学術論文等の審査

    役割:査読

    2018年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:4

    日本語雑誌 査読論文数:0

    国際会議録 査読論文数:3

    国内会議録 査読論文数:0

  • 学術論文等の審査

    役割:査読

    2017年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:1

    日本語雑誌 査読論文数:0

    国際会議録 査読論文数:2

    国内会議録 査読論文数:0

▼全件表示

共同研究・競争的資金等の研究課題

  • 辞書式順序依存問題の複雑さの解明

    研究課題/領域番号:23H04386  2023年 - 2024年

    日本学術振興会・文部科学省  科学研究費助成事業  学術変革領域研究(A)

    中島 祐人

      詳細を見る

    担当区分:研究代表者  資金種別:科研費

    効率的な文字列アルゴリズムの開発には,文字列が持つ数理的性質の理解が重要である.文字列の特徴を捉える際に利用される構造や性質は,辞書式順序依存な構造と辞書式順序非依存な構造に大別される.辞書式順序依存な構造におけるこれまでの研究では,暗に与えられた辞書式順序のみを考えているに過ぎなかった.本研究では辞書式順序に依存する問題に着目し,組合せ論・計算量理論・アルゴリズム論の三方向から,辞書式順序が与える文字列構造への影響を解明する.

    CiNii Research

  • 広義文字列のアルゴリズムと組合せ論

    研究課題/領域番号:23K24808  2022年4月 - 2026年3月

    科学研究費助成事業  基盤研究(B)

    稲永 俊介, 坂内 英夫, 中島 祐人, Koeppl Dominik

      詳細を見る

    資金種別:科研費

    文字列とは,文字あるいは記号の一本鎖列である.自然言語テキスト,サーバログ,DNA配列など,計算機処理可能な様々なデータを文字列と見なすことができる.1970年代から半世紀にわたって,検索・圧縮・比較・解析・発見などの重要タスクを文字列データ上で高速実行するアルゴリズムが多数提案されてきた.本提案課題では,文字列の定義域を拡張した「広義文字列」という新概念を提唱し,木型文字列/グラフ型文字列/2次元文字列/時系列の高速処理アルゴリズム開発と,それを支える組合せ論的性質の解明に取り組む.

    CiNii Research

  • 広義文字列のアルゴリズムと組合せ論

    研究課題/領域番号:22H03551  2022年 - 2025年

    日本学術振興会  科学研究費助成事業  基盤研究(B)

      詳細を見る

    担当区分:研究分担者  資金種別:科研費

  • Lyndon文字列による反復性指標解析

    研究課題/領域番号:21K17705  2021年 - 2024年

    日本学術振興会  科学研究費助成事業  若手研究

    中島 祐人

      詳細を見る

    担当区分:研究代表者  資金種別:科研費

    文字列データを対象としたデータ圧縮においては,辞書式圧縮と呼ばれる様々な手法が知られている.一般に,部分文字列の反復(繰り返し)が多いほど,文字列は圧縮されやすい傾向にあるため,LZ 分解などの文字列構造のサイズは,文字列の反復性を捉えた反復性指標と見なすことができる.最近では,String attractor や部分文字列複雑性に基づいた反復性指標が提案され,反復性への理解が進められているが,本申請課題では,辞書式圧縮とは直接関係のない Lyndon 文字列および関連する文字列構造を導入することで,新たな視点から反復性指標の解析を行い,その本質を明らかにする.

    CiNii Research

  • 辞書式順序に基づいた文字列データ処理法の構築

    2020年 - 2022年

    戦略的創造研究推進事業 (文部科学省)

      詳細を見る

    担当区分:研究代表者  資金種別:受託研究

  • 情報爆縮に基づくIoTデータ処理基盤の構築

    研究課題/領域番号:18H04098  2018年 - 2022年

    日本学術振興会  科学研究費助成事業  基盤研究(A)

    竹田 正幸, 定兼 邦彦, 坂内 英夫, 井 智弘, 瀧本 英二, 坂本 比呂志, 畑埜 晃平, 稲永 俊介, 喜田 拓也, 中島 祐人

      詳細を見る

    担当区分:研究分担者  資金種別:科研費

    IoT時代が到来し, 製造・産業界, モビリティ・交通インフラ, 医療・ヘルスケアなど, 広範な分野において技術革新が生み出され新たな価値が創出されると期待されている. 一方, IoTデータの急増に伴い, クラウドサーバへのデータ処理の集中, ネットワーク回線の逼迫, データ送信の遅延などの技術的問題が深刻化している. 現在のIoTに対する取り組みは実証フェーズのものが多くIoTを用いた価値創出に重点が置かれているため, 上記の問題の抜本的な解決は後回しとなっている. 代表者らは, これまでに, 入力データや計算に用いるデータ構造に潜む冗長性の除去に基づいて計算そのものの効率化を図る「情報爆縮」の研究を行ってきた. 本研究プロジェクトでは, この情報爆縮技術を核に据え, 理論と実用的の両面から, エッジ側とクラウド側の処理を効率化する技術を開発し低コストでデータを収集・集約・送信・蓄積・検索・解析できる新しいIoTデータ処理基盤の構築を目指す.
    このために, 以下の3つの研究項目をおいて研究を行った.
    (A) 圧縮データストリームに対するリアルタイム抽出・集計技術.
    (B) 圧縮データに対する高速質問処理技術.
    (C) 解釈可能な圧縮データ錬成技術.
    前年度に引き続き, 特に(A), (B)に力点を置いて研究を行い、(A)スライド窓における極小ユニーク文字列の列挙・回文木の構築/維持のアルゴリズム、(B)接尾辞配列構築アルゴリズムを基にした高速な圧縮索引構造の提案等、多くの研究成果を得た.

    CiNii Research

  • Lyndon文字列による簡潔で高速な文字列処理アルゴリズム

    研究課題/領域番号:18K18002  2018年 - 2020年

    日本学術振興会  科学研究費助成事業  若手研究

    中島 祐人

      詳細を見る

    担当区分:研究代表者  資金種別:科研費

    本研究の目的は,Lyndon文字列の性質に基づいて,簡潔で高速な文字列処理アルゴリズムを開発することである.
    本目的の達成のために,Lyndon文字列を中心に,繰り返し構造や回文構造など広く文字列処理アルゴリズムや文字列組合せ論の問題に取り組んだ.Lyndon文字列を中心に,様々な文字列構造の性質の理解や,それらの性質を利用した効率的なアルゴリズムを提案した.

    CiNii Research

  • 情報爆縮基盤技術

    研究課題/領域番号:25240003  2017年

    日本学術振興会  科学研究費助成事業  基盤研究(A)

      詳細を見る

    担当区分:研究分担者  資金種別:科研費

  • ストリーミングモデルにおける文字列処理アルゴリズム基盤

    研究課題/領域番号:17H06923  2017年

    日本学術振興会  科学研究費助成事業  研究活動スタート支援

      詳細を見る

    担当区分:研究代表者  資金種別:科研費

▼全件表示

教育活動概要

  • 主に,理学部物理学科情報理学コースの演習科目を担当している.

担当授業科目

  • 情報科学

    2024年10月 - 2025年3月   後期

  • 情報代数学演習

    2024年4月 - 2024年9月   前期

  • 形式言語理論演習

    2024年4月 - 2024年9月   前期

  • 形式言語理論演習

    2023年4月 - 2023年9月   前期

  • 情報代数学演習

    2023年4月 - 2023年9月   前期

  • 形式言語理論演習

    2022年4月 - 2022年9月   前期

  • 情報代数学演習

    2022年4月 - 2022年9月   前期

  • 情報論理学演習

    2022年4月 - 2022年9月   前期

  • 情報論理学演習

    2021年4月 - 2021年9月   前期

  • 情報代数学演習

    2021年4月 - 2021年9月   前期

  • 形式言語理論演習

    2021年4月 - 2021年9月   前期

  • 情報論理学演習

    2020年4月 - 2020年9月   前期

  • 形式言語理論演習

    2020年4月 - 2020年9月   前期

  • 情報代数学演習

    2020年4月 - 2020年9月   前期

  • 形式言語理論演習

    2019年4月 - 2019年9月   前期

  • 情報代数学演習

    2019年4月 - 2019年9月   前期

  • 情報論理学演習

    2019年4月 - 2019年9月   前期

  • 情報科学

    2018年10月 - 2019年3月   後期

  • 形式言語理論演習

    2018年4月 - 2018年9月   前期

  • 情報代数学演習

    2018年4月 - 2018年9月   前期

  • 情報論理学演習

    2018年4月 - 2018年9月   前期

  • 情報論理学演習

    2017年4月 - 2017年9月   前期

  • 形式言語理論演習

    2017年4月 - 2017年9月   前期

  • 情報代数学演習

    2017年4月 - 2017年9月   前期

▼全件表示

FD参加状況

  • 2024年3月   役割:参加   名称:【シス情FD】高度データサイエンティスト育成事業の取り組みについて

    主催組織:部局

  • 2023年10月   役割:参加   名称:【シス情FD】価値創造型半導体人材育成センターについて

    主催組織:部局

  • 2023年4月   役割:講演   名称:【シス情FD】若手教員による研究紹介⑧

    主催組織:部局

  • 2022年5月   役割:参加   名称:【シス情FD】若手教員による研究紹介④「量子コンピュータ・システム・アーキテクチャの研究~道具になることを目指して~」

    主催組織:部局

  • 2022年4月   役割:参加   名称:【シス情FD】第4期中期目標・中期計画等について

    主催組織:部局

  • 2022年1月   役割:参加   名称:【シス情FD】シス情関連の科学技術に対する国の政策動向(に関する私見)

    主催組織:部局

  • 2021年10月   役割:参加   名称:【シス情FD】熊本高専と九大システム情報との交流・連携に向けて ー 3年半で感じた高専の実像 ー

    主催組織:部局

  • 2021年7月   役割:参加   名称:若手教員による研究紹介 及び 科研取得のポイント、その他について ②

    主催組織:部局

  • 2021年6月   役割:参加   名称:若手教員による研究紹介 及び 科研取得のポイントについて ①

    主催組織:部局

  • 2020年11月   役割:参加   名称:マス・フォア・イノベーション卓越大学院について

    主催組織:部局

  • 2020年4月   役割:参加   名称:新型コロナウイルスが誘起した社会変化に対するシステム情報科学からの提言

    主催組織:部局

  • 2018年7月   役割:参加   名称:論文剽窃ソフトの活用方法について

    主催組織:部局

▼全件表示