Kyushu University Researcher Information
List of Papers
IWANA BRIAN KENJI    Data last updated: 2024.04.01

Associate Professor / Faculty of Information Science and Electrical Engineering, Department of Advanced Information Technology


Original Papers
1. Ke Xiao, Anna Zhu, Brian Kenji Iwana, Cheng-Lin Liu, Scene text recognition via dual character counting-aware visual and semantic modeling network, Science China Information Sciences, 10.1007/s11432-023-3935-8, 67, 3, 139101:1-139101:1, 2024.02.
2. Karthikeyan Suresh, Brian Kenji Iwana, Using Motif-Based Features to Improve Signal Classification with Temporal Neural Networks, Asian Conference on Pattern Recognition (ACPR), 2023.11.
3. Yusuke Nagata, Brian Kenji Iwana, Seiichi Uchida, Contour Completion by Transformers and Its Application to Vector Font Data, International Conference on Document Analysis and Recognition (ICDAR), 2023.08.
4. Brian Kenji Iwana, Akihiro Kusuda, Vision Conformer: Incorporating Convolutions into Vision Transformer Layers, International Conference on Document Analysis and Recognition (ICDAR), 2023.08.
5. Wei Pan, Anna Zhu, Xinyu Zhou, Brian Kenji Iwana, Shilin Li, Few shot font generation via transferring similarity guided global style and quantization local style, International Conference on Computer Vision (ICCV), 2023.10.
6. Guangtao Lyu, Kun Liu, Anna Zhu, Seiichi Uchida, Brian Kenji Iwana, FETNet: Feature erasing and transferring network for scene text removal, Pattern Recognition, 2023.08.
7. Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida, Deep attentive time warping, Pattern Recognition, 10.1016/j.patcog.2022.109201, 136, 2023.04.
8. Sangjun Han, Brian Kenji Iwana, Satoru Uchida, Classification of Polysemous and Homograph Word Usages using Semi-Supervised Learning, Annual Conference of the Association for Natural Language Processing (NLP), 2023.03.
9. Anna Zhu, Zhanhui Yin, Brian Kenji Iwana, Xinyu Zhou, Shengwu Xiong, Text Style Transfer based on Multi-factor Disentanglement and Mixture, ACM Multimedia, 2022.10.
10. Daisuke Oba, Brian Kenji Iwana, Shinnosuke Matsuo, Dynamic Data Augmentation with Gating Networks for Time Series Recognition, International Conference on Pattern Recognition (ICPR), 2022.08.
11. Brian Kenji Iwana, On Mini-Batch Training with Varying Length Time Series, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.05.
12. Yuchen Zheng, Brian Kenji Iwana, Muhammad Imran Malik, Sheraz Ahmed, Wataru Ohyama, Seiichi Uchida, Learning the micro deformations by max-pooling for offline signature verification, Pattern Recognition, 10.1016/j.patcog.2021.108008, 118, 108008, 2021.10.
13. Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, and Seiichi Uchida, Attention to Warp: Deep Metric Learning for Multivariate Time Series, International Conference on Document Analysis and Recognition (ICDAR), 2021.09.
14. Taiga Miyazono, Daichi Haraguchi, Seiichi Uchida, and Brian Kenji Iwana, Font Style that Fits an Image -- Font Generation Based on Image Context, International Conference on Document Analysis and Recognition (ICDAR), 2021.09.
15. Wensheng Zhang, Yan Zheng, Taiga Miyazono, Seiichi Uchida, and Brian Kenji Iwana, Towards Book Cover Design via Layout Graphs, International Conference on Document Analysis and Recognition (ICDAR), 2021.09.
16. Kaigen Tsuji, Seiichi Uchida, Brian Kenji Iwana, Using Robust Regression to Find Font Usage Trends, ICDAR Workshop on Machine Learning, 2021.09.
17. Brian Kenji Iwana, Seiichi Uchida, An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks, PLOS ONE, 2021.07, In recent times, deep artificial neural networks have achieved many successes in pattern recognition. Part of this success can be attributed to the reliance on big data to increase generalization. However, in the field of time series recognition, many datasets are often very small. One method of addressing this problem is through the use of data augmentation. In this paper, we survey data augmentation techniques for time series and their application to time series classification with neural networks. We propose a taxonomy and outline the four families of time series data augmentation, including transformation-based methods, pattern mixing, generative models, and decomposition methods. Furthermore, we empirically evaluate 12 time series data augmentation methods on 128 time series classification datasets with six different types of neural networks. Through the results, we are able to analyze the characteristics, advantages, and disadvantages of each data augmentation method and offer recommendations. This survey aims to help in the selection of time series data augmentation for neural network applications.
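
As a concrete illustration of the transformation-based family named in the survey above, the following is a minimal sketch (Python/NumPy; not the authors' code) of two standard augmentations, jittering and window warping; the function names and parameter defaults are illustrative assumptions.

    import numpy as np

    def jitter(x, sigma=0.03):
        # Add Gaussian noise to a univariate series x of shape (T,).
        return x + np.random.normal(0.0, sigma, size=x.shape)

    def window_warp(x, window_ratio=0.1, scale=2.0):
        # Stretch one random window of x by `scale`, then resample to length T.
        T = len(x)
        w = max(2, int(T * window_ratio))
        start = np.random.randint(0, T - w)
        stretched = np.interp(np.linspace(0, w - 1, int(w * scale)),
                              np.arange(w), x[start:start + w])
        warped = np.concatenate([x[:start], stretched, x[start + w:]])
        # Resample the whole series back to its original length.
        return np.interp(np.linspace(0, len(warped) - 1, T),
                         np.arange(len(warped)), warped)

    x = np.sin(np.linspace(0, 4 * np.pi, 128))  # toy series
    x_aug = window_warp(jitter(x))
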
18. Seokjun Kang, Seiichi Uchida, Brian Kenji Iwana, Tunable U-Net: Controlling image-to-image outputs using a tunable scalar value, IEEE Access, 10.1109/ACCESS.2021.3096530, 2021.07.
19. Shinnosuke Matsuo, Brian Kenji Iwana, and Seiichi Uchida, Self-Augmented Multi-Modal Feature Embedding, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.06.
20. Brian Kenji Iwana, Seiichi Uchida, Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher, International Conference on Pattern Recognition (ICPR), 2021.01, Neural networks have become a powerful tool in pattern recognition and part of their success is due to generalization from using large datasets. However, unlike other domains, time series classification datasets are often small. In order to address this problem, we propose a novel time series data augmentation called guided warping. While many data augmentation methods are based on random transformations, guided warping exploits the element alignment properties of Dynamic Time Warping (DTW) and shapeDTW, a high-level DTW method based on shape descriptors, to deterministically warp sample patterns. In this way, the time series are mixed by warping the features of a sample pattern to match the time steps of a reference pattern. Furthermore, we introduce a discriminative teacher in order to serve as a directed reference for the guided warping. We evaluate the method on all 85 datasets in the 2015 UCR Time Series Archive with a deep convolutional neural network (CNN) and a recurrent neural network (RNN). The code with an easy-to-use implementation can be found at this https URL.
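
A minimal sketch of the core idea, assuming plain DTW rather than the paper's shapeDTW variant: the sample's features are warped onto the reference's time steps along the DTW alignment path. All names are illustrative, and the discriminative-teacher selection of the reference is omitted.

    import numpy as np

    def dtw_path(a, b):
        # Dynamic programming table for the DTW alignment of 1-D series a, b.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        # Backtrack from (n, m) to recover the matched index pairs.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]

    def guided_warp(sample, reference):
        # Average the sample elements matched to each reference time step.
        out = np.zeros(len(reference))
        cnt = np.zeros(len(reference))
        for r, s in dtw_path(reference, sample):
            out[r] += sample[s]
            cnt[r] += 1
        return out / np.maximum(cnt, 1)

    ref = np.sin(np.linspace(0, 2 * np.pi, 100))  # intra-class reference
    smp = np.cos(np.linspace(0, 2 * np.pi, 100))  # sample to be warped
    augmented = guided_warp(smp, ref)
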
21. Seokjun Kang, Brian Kenji Iwana, Seiichi Uchida, Complex image processing with less data—Document image binarization by integrating multiple pre-trained U-Net modules, Pattern Recognition, 10.1016/j.patcog.2020.107577, 106, 107577, 2021.01.
22. Gantugs Atarsaikhan, Brian Kenji Iwana, and Seiichi Uchida, Neural Style Difference Transfer and Its Application to Font Generation, International Workshop on Document Analysis Systems (DAS), 2020.10, Designing fonts requires a great deal of time and effort. It requires professional skills, such as sketching, vectorizing, and image editing. Additionally, each letter has to be designed individually. In this paper, we introduce a method to create fonts automatically. In our proposed method, the difference of font styles between two different fonts is found and transferred to another font using neural style transfer. Neural style transfer is a method of stylizing the contents of an image with the styles of another image. We propose novel neural style difference and content difference losses for the neural style transfer. With these losses, new fonts can be generated by adding or removing font styles from a font. We provide experimental results with various combinations of input fonts and discuss limitations and future development for the proposed method.
23. Keisuke Kanda, Brian Kenji Iwana, and Seiichi Uchida, What is the Reward for Handwriting? --- Handwriting Generation by Imitation Learning, International Conference on Frontiers in Handwriting Recognition (ICFHR), 2020.09, Analyzing the handwriting generation process is an important issue and has been tackled by various generation models, such as kinematics based models and stochastic models. In this study, we use a reinforcement learning (RL) framework to realize handwriting generation with careful future-planning ability. In fact, the handwriting process of human beings is also supported by their future planning ability; for example, this ability is necessary to generate a closed trajectory like '0' because any shortsighted model, such as a Markovian model, cannot generate it. For the algorithm, we employ generative adversarial imitation learning (GAIL). Typical RL algorithms require the manual definition of the reward function, which is crucial for controlling the generation process. In contrast, GAIL trains the reward function along with the other modules of the framework. In other words, through GAIL, we can understand the reward of the handwriting generation process from handwriting examples. Our experimental results qualitatively and quantitatively show that the learned reward captures the trends in handwriting generation and thus GAIL is well suited for the acquisition of handwriting behavior.
24. Hiroki Tokunaga, Brian Kenji Iwana, Yuki Teramoto, Akihiko Yoshizawa, and Ryoma Bise, Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology, European Conference on Computer Vision (ECCV), 2020.08, We propose a weakly-supervised cell tracking method that can train a convolutional neural network (CNN) by using only the annotation of "cell detection" (i.e., the coordinates of cell positions) without association information, in which cell positions can be easily obtained by nuclear staining. First, we train a co-detection CNN that detects cells in successive frames by using weak-labels. Our key assumption is that the co-detection CNN implicitly learns association in addition to detection. To obtain the association information, we propose a backward-and-forward propagation method that analyzes the correspondence of cell positions in the detection maps output by the co-detection CNN. Experiments demonstrated that the proposed method can match positions by analyzing the co-detection CNN. Even though the method uses only weak supervision, the performance of our method was almost the same as the state-of-the-art supervised method.
25. Masaya Ikoma, Brian Kenji Iwana, and Seiichi Uchida, Effect of Text Color on Word Embeddings, International Workshop on Document Analysis Systems (DAS), 2020.07, In natural scenes and documents, we can find the correlation between a text and its color. For instance, the word, "hot", is often printed in red, while "cold" is often in blue. This correlation can be thought of as a feature that represents the semantic difference between the words. Based on this observation, we propose the idea of using text color for word embeddings. While text-only word embeddings (e.g. word2vec) have been extremely successful, they often represent antonyms as similar since they are often interchangeable in sentences. In this paper, we try two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from the book cover images and analyze the correlation between the color and meaning of the word. Second, we try to retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the word embeddings of synonyms and antonyms before and after re-training, we aim to understand the kind of words that have positive or negative effects in their word embeddings when incorporating text color information.
26. Gantugs Atarsaikhan, Brian Kenji Iwana, and Seiichi Uchida, Guided neural style transfer for shape stylization, PLOS ONE, 10.1371/journal.pone.0233489, 15, 6, e0233489, 2020.06, Designing logos, typefaces, and other decorated shapes can require professional skills. In this paper, we aim to produce new and unique decorated shapes by stylizing ordinary shapes with machine learning. Specifically, we combined parametric and non-parametric neural style transfer algorithms to transfer both local and global features. Furthermore, we introduced a distance-based guiding to the neural style transfer process, so that only the foreground shape will be decorated. Lastly, qualitative evaluation and ablation studies are provided to demonstrate the usefulness of the proposed method.
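
A minimal sketch of the distance-based guiding idea, assuming SciPy's Euclidean distance transform: a soft mask that is 1 inside the foreground shape and decays with distance outside it, which would then gate the style-transfer updates. Names and the falloff constant are illustrative, not the paper's formulation.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def guide_mask(binary_shape, falloff=10.0):
        # Distance (in pixels) from each background pixel to the shape.
        dist_outside = distance_transform_edt(binary_shape == 0)
        # 1 on the shape itself, exponentially decaying with distance.
        return np.exp(-dist_outside / falloff)

    shape = np.zeros((64, 64))
    shape[16:48, 16:48] = 1.0  # toy foreground square
    mask = guide_mask(shape)
    # During optimization, the image gradient would be multiplied by
    # `mask` so that only (near-)foreground pixels are stylized.
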
27. Seokjun Kang, Brian Kenji Iwana, and Seiichi Uchida, ACMU-Net: Advanced Cascading Modular U-Nets incorporated Squeeze and Excitation Blocks, International Workshop on Document Analysis Systems (DAS), 2020.06, In document analysis research, image-to-image conversion models such as the U-Net have shown significant performance. Recently, research on cascaded U-Nets has been suggested for solving complex document analysis tasks. However, improving performance by adding U-Net modules requires too many parameters in cascaded U-Nets. Therefore, in this paper, we propose a method for enhancing the performance of cascaded U-Nets. We suggest a novel document image binarization method by utilizing Cascading Modular U-Nets (CMU-Nets) and Squeeze and Excitation blocks (SE-blocks). Through verification experiments, we point out the problems caused by the use of SE-blocks in existing CMU-Nets and suggest how to use SE-blocks in CMU-Nets. We use the Document Image Binarization (DIBCO) 2017 dataset to evaluate the proposed model.
28. Daichi Haraguchi, Shota Harada, Yuto Shinahara, Brian Kenji Iwana, and Seiichi Uchida, Character-independent font identification, International Workshop on Document Analysis Systems (DAS), 2020.06, There are countless fonts with various shapes and styles. In addition, there are many fonts that only have subtle differences in features. Due to this, font identification is a difficult task. In this paper, we propose a method of determining if any two characters are from the same font or not. This is difficult due to the difference between fonts typically being smaller than the difference between alphabet classes. Additionally, the proposed method can be used with fonts regardless of whether they exist in the training data or not. In order to accomplish this, we use a Convolutional Neural Network (CNN) trained with various font image pairs. In the experiment, the network is trained on image pairs of various fonts. We then evaluate the model on a different set of fonts that are unseen by the network. The evaluation achieves an accuracy of 92.27%. Moreover, we analyze the relationship between character classes and font identification accuracy.
29. Anna Zhu, Xiongbo Lu, Xiang Bai, Seiichi Uchida, Brian Kenji Iwana, Shengwu Xiong, Few-Shot Text Style Transfer via Deep Feature Similarity, IEEE Transactions on Image Processing, 10.1109/TIP.2020.2995062, 2020.05, Generating text to have a consistent style with only a few observed highly-stylized text samples is a difficult task for image processing. The text style involving the typography, i.e., font, stroke, color, decoration, effects, etc., should be considered for transfer. In this paper, we propose a novel approach to stylize target text by decoding weighted deep features from only a few referenced samples. The deep features, including content and style features of each referenced text, are extracted from a Convolutional Neural Network (CNN) that is optimized for character recognition. Then, we calculate the similarity scores of the target text and the referenced samples by measuring the distance along the corresponding channels from the content features of the CNN when considering only the content, and assign them as the weights for aggregating the deep features. To enforce the stylized text to be realistic, a discriminative network with adversarial loss is employed. We demonstrate the effectiveness of our network by conducting experiments on three different datasets which have various styles, fonts, languages, etc. Additionally, the coefficients for character style transfer, including the character content, the effect of similarity matrix, the number of referenced characters, the similarity between characters, and performance evaluation by a new protocol are analyzed for a better understanding of our proposed framework.
30. Adriano Lucieri, Huzaifa Sabir, Shoaib Ahmed Siddiqui, Syed Tahseen Raza Rizvi, Brian Kenji Iwana, Seiichi Uchida, Andreas Dengel, and Sheraz Ahmed, Benchmarking Deep Learning Models for Classification of Book Covers, Springer Nature Computer Science, 10.1007/s42979-020-00132-z, 1, 139, 1-16, 2020.04, Book covers usually provide a good depiction of a book’s content and its central idea. The classification of books in their respective genre usually involves subjectivity and contextuality. Book retrieval systems would greatly benefit from an automated framework that is able to classify a book’s genre based on an image, specifically for archival documents where digitization of the complete book for the purpose of indexing is an expensive task. While various modalities are available (e.g., cover, title, author, abstract), benchmarking image-based classification systems based on minimal information is a particularly exciting field due to the recent advancements in the domain of image-based deep learning and its applicability. For that purpose, a natural question arises regarding the plausibility of solving the problem of book classification by only utilizing an image of its cover along with the current state-of-the-art deep learning models. To answer this question, this paper makes a three-fold contribution. First, the publicly available book cover dataset comprising 57k book covers belonging to 30 different categories is thoroughly analyzed and corrected. Second, it benchmarks the performance of a battery of state-of-the-art image classification models on the task of book cover classification. Third, it uses explicit attention mechanisms to identify the regions that the network focused on in order to make the prediction. All of our evaluations were performed on a subset of the mentioned public book cover dataset. Analysis of the results revealed the inefficacy of the most powerful models for solving the classification task. With the obtained results, it is evident that significant efforts need to be devoted in order to solve this image-based classification task to a satisfactory level.
31. Yirong Zhao, Brian Kenji Iwana, Kun Qian, Expression Education Incorporating Drama Methods in School Education: A Case Study of Educational Practice with Junior High School Students, 九州地区国立大学教育系・文系研究論文集, 6, 1/2, 2020.03, In this study, we aim to investigate the use of the drama approach and its effectiveness within a classroom setting. The drama approach is an educational practice that aims to develop communication, creativity, and critical thinking skills through group interaction and performance. This case study examines the implementation and effectiveness of using the drama approach in a secondary school in Japan. A total of 266 grade 7 students participated in the activities. To analyze the effectiveness, a post-class survey was provided. The student participants were given a survey containing 23 semantic differential questions and one free-response question. The homeroom teachers of the classes also answered five free-response questions. The student survey had a 99.6% survey recovery rate and a 93.6% effective response rate. The results of the survey were positive and demonstrated that the drama activities were able to facilitate communication between peers. Specifically, factor analysis of the semantic differential questions found that there were three significant factors: (1) anxiety and opinion on the activities, (2) communication and cooperation, and (3) personal feelings. Furthermore, the free-response questions to the students and homeroom teachers indicated that the activities were enjoyable, encouraged cooperation and discussion, and helped deepen relationships between students. Through this, we found that the drama approach-based activities were beneficial for the students and worth including in Japanese education.
32. Brian Kenji Iwana and Seiichi Uchida, Time series classification using local distance-based features in multi-modal fusion networks, Pattern Recognition, 10.1016/j.patcog.2019.107024, 97, 107024, 2020.01, We propose the use of a novel feature, called local distance features, for time series classification. The local distance features are extracted using Dynamic Time Warping (DTW) and classified using Convolutional Neural Networks (CNN). DTW is classically used as a robust distance measure for distance-based time series recognition methods. However, by using DTW strictly as a global distance measure, information about the matching is discarded. We show that this information can further be used as supplementary input information in temporal CNNs. This is done by using both the raw data and the features extracted from DTW in multi-modal fusion CNNs. Furthermore, we explore the effects that different prototype selection methods, prototype numbers, and data fusion schemes induce on the accuracy. We perform experiments on a wide range of time series datasets including three Unipen handwriting datasets, four UCI Machine Learning Repository datasets, and 85 UCR Time Series Classification Archive datasets.
33. Brian Kenji Iwana, Volkmar Frinken, and Seiichi Uchida, DTW-NN: A novel neural network for time series recognition using dynamic alignment between inputs and weights, Knowledge-Based Systems, 10.1016/j.knosys.2019.104971, 188, 104971, 2020.01, This paper describes a novel model for time series recognition called a Dynamic Time Warping Neural Network (DTW-NN). DTW-NN is a feedforward neural network that exploits the elastic matching ability of DTW to dynamically align the inputs of a layer to the weights. This weight alignment replaces the standard dot product within a neuron with DTW. In this way, the DTW-NN is able to tackle difficulties with time series recognition such as temporal distortions and variable pattern length within a feedforward architecture. We demonstrate the effectiveness of DTW-NNs on four distinct datasets: online handwritten characters, accelerometer-based active daily life activities, spoken Arabic numeral Mel-Frequency Cepstrum Coefficients (MFCC), and one-dimensional centroid-radii sequences from leaf shapes. We show that the proposed method is an effective general approach to temporal pattern learning by achieving state-of-the-art results on these datasets.
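
A minimal sketch of the neuron described above, under the assumption that the pre-activation is simply the DTW dissimilarity between the input sequence and a learnable weight sequence; the paper's exact formulation and training procedure are not reproduced here.

    import numpy as np

    def dtw_distance(x, w):
        # Classic O(len(x) * len(w)) DTW between two 1-D sequences.
        n, m = len(x), len(w)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(x[i - 1] - w[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    rng = np.random.default_rng(0)
    x = rng.standard_normal(60)       # input series of arbitrary length
    W = rng.standard_normal((8, 30))  # 8 neurons, each a weight sequence
    pre_act = np.array([dtw_distance(x, w) for w in W])
    hidden = np.exp(-pre_act)         # small distance -> strong activation
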
34. Yirong Zhao and Brian Kenji Iwana, Implementation and evaluation of classes incorporating drama approach methods for interactive learning in a primary school setting, International Conference of Education, Research and Innovation (ICERI), 10.21125/iceri.2019.1004, 4007-4014, 2019.11.
35. Brian Kenji Iwana, Ryohei Kuroki, Seiichi Uchida, Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation, ICCV Workshops, 10.1109/ICCVW.2019.00513, 4176-4185, 2019.10, Convolutional Neural Networks (CNN) have become state-of-the-art in the field of image classification. However, not everything is understood about their inner representations. This paper tackles the interpretability and explainability of the predictions of CNNs for multi-class classification problems. Specifically, we propose a novel visualization method of pixel-wise input attribution called Softmax-Gradient Layer-wise Relevance Propagation (SGLRP). The proposed model is a class-discriminative extension to Deep Taylor Decomposition (DTD) using the gradient of softmax to back propagate the relevance of the output probability to the input image. Through qualitative and quantitative analysis, we demonstrate that SGLRP can successfully localize and attribute the regions on input images which contribute to a target object's classification. We show that the proposed method excels at discriminating the target object's class from the other possible objects in the images. We confirm that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP) based methods and can help in the understanding of the decision process of CNNs.
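
A minimal sketch of SGLRP's starting signal, using the standard softmax-gradient identity: the relevance seeded at the output layer is the gradient of the target-class probability with respect to the logits, which an LRP-style backward pass (omitted here) would then propagate to the pixels. Function names are illustrative.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sglrp_seed(logits, target):
        # d p_target / d z_k = p_target * (delta_tk - p_k) for each class k.
        p = softmax(logits)
        one_hot = np.zeros_like(p)
        one_hot[target] = 1.0
        return p[target] * (one_hot - p)

    R = sglrp_seed(np.array([2.0, 0.5, -1.0]), target=0)
    # R is positive for the target class and negative for the competing
    # classes, which makes the propagated relevance class-discriminative.
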
36. Yuchen Zheng, Brian Kenji Iwana, and Seiichi Uchida, Mining the Displacement of Max-pooling for Text Recognition, Pattern Recognition, 10.1016/j.patcog.2019.05.014, 93, 558-569, 2019.09, The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract a novel feature from pooling layers, called displacement features, and combine them with the features resulting from max-pooling to capture the structural deformations for text recognition tasks. The displacement features record the location of the maximal value in a max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. The extensive experimental results and discussions demonstrate that the proposed displacement features can improve the performance of the CNN based architectures and tackle the issues with the structural deformations of max-pooling in the text recognition tasks.
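
A minimal sketch of extracting such displacement features, assuming PyTorch: max_pool2d can return the flat index of each maximum, from which the in-window offset is recovered and stacked next to the pooled values. Shapes and names are illustrative, not the paper's code.

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 1, 8, 8)                   # toy feature map
    pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)

    # idx holds flat positions into the 8x8 plane; convert to offsets.
    rows, cols = idx // 8, idx % 8                # absolute (row, col)
    top = torch.arange(0, 8, 2).view(1, 1, 4, 1)  # window top rows
    left = torch.arange(0, 8, 2).view(1, 1, 1, 4) # window left cols
    dy, dx = rows - top, cols - left              # displacements in {0, 1}

    # Use the displacements as extra channels next to the pooled values.
    features = torch.cat([pooled, dy.float(), dx.float()], dim=1)
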
37. Seokjun Kang, Brian Kenji Iwana, Seiichi Uchida, Cascading Modular U-Nets for Document Image Binarization, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2019.00113, 675-680, 2019.09, In recent years, the U-Net has achieved good results in various image processing tasks. However, conventional U-Nets need to be re-trained for individual tasks with a sufficient amount of images with ground-truth. This requirement makes U-Nets inapplicable to tasks with small amounts of data. In this paper, we propose to use "modular" U-Nets, each of which is pre-trained to perform an existing image processing task, such as dilation, erosion, and histogram equalization. Then, to accomplish a specific image processing task, such as binarization of historical document images, the modular U-Nets are cascaded with inter-module skip connections and fine-tuned to the target task. We verified the proposed model using the Document Image Binarization Competition (DIBCO) 2017 dataset.
38. Kohei Baba, Seiichi Uchida, and Brian Kenji Iwana, On the Ability of a CNN to Realize Image-to-Image Language Conversion, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2019.00078, 448-453, 2019.09, The purpose of this paper is to reveal the ability that Convolutional Neural Networks (CNN) have on the novel task of image-to-image language conversion. We propose a new network to tackle this task by converting images of Korean Hangul characters directly into images of the phonetic Latin character equivalent. The conversion rules between Hangul and the phonetic symbols are not explicitly provided. The results of the proposed network show that it is possible to perform image-to-image language conversion. Moreover, it shows that it can grasp the structural features of Hangul even from limited learning data. In addition, it introduces a new network to use when the input and output have significantly different features.
39. Ryo Nakao, Brian Kenji Iwana, and Seiichi Uchida, Selective Super-Resolution for Scene Text Images, International Conference on Document Analysis and Recognition (ICDAR), 401-406, 2019.09, In this paper, we realize the enhancement of super-resolution for images with scene text. Specifically, this paper proposes the use of Super-Resolution Convolutional Neural Networks (SRCNN) which are constructed to tackle issues associated with characters and text. We demonstrate that standard SRCNNs trained for general object super-resolution are not sufficient and that the proposed method is a viable method for creating a robust model for text. To do so, we analyze the characteristics of SRCNNs through quantitative and qualitative evaluations with scene text data. In addition, analysis using the correlation between layers by Singular Vector Canonical Correlation Analysis (SVCCA) and comparison of filters of each SRCNN using t-SNE is performed. Furthermore, in order to create a unified super-resolution model specialized for both text and objects, a model using SRCNNs trained with the different data types and Content-wise Network Fusion (CNF) is used. We integrate the SRCNN trained for character images with the SRCNN trained for general object images, and verify the accuracy improvement of scene images which include text. We also examine how each SRCNN affects super-resolution images after fusion.
40. Taichi Sumi, Brian Kenji Iwana, Hideaki Hayashi, and Seiichi Uchida, Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2019.00072, 407-412, 2019.09, This research attempts to construct a network that can convert online and offline handwritten characters to each other. The proposed network consists of two Variational Auto-Encoders (VAEs) with a shared latent space. The VAEs are trained to generate online and offline handwritten Latin characters simultaneously. In this way, we create a cross-modal VAE (Cross-VAE). During training, the proposed Cross-VAE is trained to minimize the reconstruction loss of the two modalities, the distribution loss of the two VAEs, and a novel third loss called the space sharing loss. This third loss, the space sharing loss, is used to encourage the modalities to share the same latent space by calculating the distance between the latent variables. Through the proposed method, mutual conversion of online and offline handwritten characters is possible. In this paper, we demonstrate the performance of the Cross-VAE through qualitative and quantitative analysis.
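
A minimal sketch of the training objective, assuming PyTorch and standard Gaussian VAE terms: two reconstruction-plus-KL losses and a space sharing term that penalizes the distance between the two modalities' latent codes. Encoders and decoders are omitted, and the weighting lam is an illustrative assumption.

    import torch
    import torch.nn.functional as F

    def vae_loss(x, x_hat, mu, logvar):
        # Reconstruction error plus KL divergence to the unit Gaussian.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl

    def cross_vae_loss(on, on_hat, mu_on, lv_on,
                       off, off_hat, mu_off, lv_off, lam=1.0):
        # Space sharing loss: pull the two latent codes together.
        space_sharing = F.mse_loss(mu_on, mu_off, reduction="sum")
        return (vae_loss(on, on_hat, mu_on, lv_on)
                + vae_loss(off, off_hat, mu_off, lv_off)
                + lam * space_sharing)
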
41. Xiaomeng Wu, Akisato Kimura, Brian Kenji Iwana, Seiichi Uchida, and Kunio Kashino, Deep Dynamic Time Warping: End-to-End Local Representation Learning for Online Signature Verification, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2019.00179, 1103-1110, 2019.09, Siamese networks have been shown to be successful in learning deep representations for multivariate time series verification. However, most related studies optimize a global distance objective and suffer from a low discriminative power due to the loss of temporal information. To address this issue, we propose an end-to-end, neural network-based framework for learning local representations of time series, and demonstrate its effectiveness for online signature verification. This framework optimizes a Siamese network with a local embedding loss, and learns a feature space that preserves the temporal location-wise distances between time series. To achieve invariance to non-linear temporal distortion, we propose building a dynamic time warping block on top of the Siamese network, which will greatly improve the accuracy for local correspondences across intra-personal variability. Validation with respect to online signature verification demonstrates the advantage of our framework over existing techniques that use either handcrafted or learned feature representations.
42. Yuchen Zheng, Wataru Ohyama, Brian Kenji Iwana, and Seiichi Uchida, Capturing Micro Deformations from Pooling Layers for Offline Signature Verification, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2019.00180, 1111-1116, 2019.09, In this paper, we propose a novel Convolutional Neural Network (CNN) based method that extracts the location information (displacement features) of the maximums in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between the genuine signatures and skilled forgeries as a feature extraction procedure. After the feature extraction procedure, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. The extensive experimental results on GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate the genuine signatures and their corresponding skilled forgeries well and achieve state-of-the-art results on these datasets.
43. Brian Kenji Iwana and Seiichi Uchida, Dynamic Weight Alignment for Temporal Convolutional Neural Networks, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 10.1109/ICASSP.2019.8682908, 3827-3831, 2019.05, In this paper, we propose a method of improving temporal Convolutional Neural Networks (CNN) by determining the optimal alignment of weights and inputs using dynamic programming. Conventional CNN convolutions linearly match the shared weights to a window of the input. However, it is possible that there exists a more optimal alignment of weights. Thus, we propose the use of Dynamic Time Warping (DTW) to dynamically align the weights to the input of the convolutional layer. Specifically, the dynamic alignment overcomes issues such as temporal distortion by finding the minimal distance matching of the weights and the inputs under constraints. We demonstrate the effectiveness of the proposed architecture on the Unipen online handwritten digit and character datasets, the UCI Spoken Arabic Digit dataset, and the UCI Activities of Daily Life dataset.
44. Shailza Jolly, Brian Kenji Iwana, Ryohei Kuroki, and Seiichi Uchida, How do Convolutional Neural Networks Learn Design?, International Conference on Pattern Recognition (ICPR), 10.1109/ICPR.2018.8545624, 1085-1090, 2018.08, In this paper, we aim to understand the design principles in book cover images which are carefully crafted by experts. Book covers are designed in a unique way, specific to genres which convey important information to their readers. By using Convolutional Neural Networks (CNN) to predict book genres from cover images, visual cues which distinguish genres can be highlighted and analyzed. In order to understand these visual cues contributing towards the decision of a genre, we present the application of Layer-wise Relevance Propagation (LRP) on the book cover image classification results. We use LRP to explain the pixel-wise contributions of book cover design and highlight the design elements contributing towards particular genres. In addition, with the use of state-of-the-art object and text detection methods, insights about genre-specific book cover designs are discovered.
45. Brian Kenji Iwana, Minoru Mori, Akisato Kimura, and Seiichi Uchida, Introducing Local Distance-based Features to Temporal Convolutional Neural Networks, International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR-2018.2018.00025, 92-97, 2018.08, In this paper, we propose the use of local distance-based features determined by Dynamic Time Warping (DTW) for temporal Convolutional Neural Networks (CNN). Traditionally, DTW is used as a robust distance metric for time series patterns. However, this traditional use of DTW only utilizes the scalar distance metric and discards the local distances between the dynamically matched sequence elements. This paper proposes recovering these local distances, or DTW features, and utilizing them for the input of a CNN. We demonstrate that these features can provide additional information for the classification of isolated handwritten digits and characters. Furthermore, we demonstrate that the DTW features can be combined with the spatial coordinate features in multi-modal fusion networks to achieve state-of-the-art accuracy on the Unipen online handwritten character datasets.
46. Yuchen Zheng, Brian Kenji Iwana, and Seiichi Uchida, Discovering Class-Wise Trends of Max-Pooling in Subspace, International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR-2018.2018.00026, 98-103, 2018.08, The traditional max-pooling operation in Convolutional Neural Networks (CNNs) only obtains the maximal value from a pooling window. However, it discards the information about the precise position of the maximal value. In this paper, we extract the location of the maximal value in a pooling window and transform it into a "displacement feature". We analyze and discover the class-wise trends of the displacement features in many ways. The experimental results and discussion demonstrate that the displacement features have beneficial behaviors for solving the problems in max-pooling.
47. Gantugs Atarsaikhan, Brian Kenji Iwana, and Seiichi Uchida, Contained Neural Style Transfer for Decorated Logo Generation, International Workshop on Document Analysis Systems (DAS), 10.1109/DAS.2018.78, 2018.04, Making decorated logos requires image editing skills; without sufficient skills, it can be a time-consuming task. While there are many on-line web services to make new logos, they have limited designs and duplicates can be made. We propose using neural style transfer with clip art and text for the creation of new and genuine logos. We introduce a new loss function based on the distance transform of the input image, which allows the preservation of the silhouettes of text and objects. The proposed method confines style transfer to only a designated area. We demonstrate the characteristics of the proposed method. Finally, we show the results of logo generation with various input images.
48. Kotaro Abe, Brian Kenji Iwana, Viktor Gösta Holmér, and Seiichi Uchida, Font Creation Using Class Discriminative Deep Convolutional Generative Adversarial Networks, Asian Conference on Pattern Recognition (ACPR), 10.1109/ACPR.2017.99, 232-237, 2017.11, In this research, we attempt to generate fonts automatically using a modification of a Deep Convolutional Generative Adversarial Network (DCGAN) by introducing class consideration. DCGANs are an application of generative adversarial networks (GAN) which make use of convolutional and deconvolutional layers to generate data through adversarial training. The conventional GAN is comprised of two neural networks that work in series. Specifically, it approaches an unsupervised method of data generation with the use of a generative network whose output is fed into a second discriminative network. While DCGANs have been successful on natural images, we show their limited ability on font generation due to the high variation of fonts combined with the need for rigid structures of characters. We propose a class discriminative DCGAN which uses a classification network to work alongside the discriminative network to refine the generative network. The results of our experiment show a dramatic improvement over the conventional DCGAN.
49. Gantugs Atarsaikhan, Brian Kenji Iwana, Atsushi Narusawa, Keiji Yanai, and Seiichi Uchida, Neural font style transfer, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2017.328, 51-56, 2017.11, In this paper, we choose an approach to generate fonts by using neural style transfer. Neural style transfer uses Convolutional Neural Networks (CNN) to transfer the style of one image to another. By modifying neural style transfer, we can achieve neural font style transfer. We also demonstrate the effects of using different weighted factors, character placements, and orientations. In addition, we show the results of using non-Latin alphabets, non-text patterns, and non-text images as style images. Finally, we provide insight into the characteristics of style transfer with fonts.
50. Brian Kenji Iwana, Letao Zhou, Kumiko Tanaka-Ishii, and Seiichi Uchida, Component Awareness in Convolutional Neural Networks, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2017.72, 394-399, 2017.11, In this work, we investigate the ability of Convolutional Neural Networks (CNN) to infer the presence of components that comprise an image. In recent years, CNNs have achieved powerful results in classification, detection, and segmentation. However, these models learn from instance-level supervision of the detected object. In this paper, we determine if CNNs can detect objects using image-level weakly supervised labels without localization. To demonstrate that a CNN can infer awareness of objects, we evaluate a CNN's classification ability with a database constructed of Chinese characters with only character-level labeled components. We show that the CNN is able to achieve a high accuracy in identifying the presence of these components without specific knowledge of the component. Furthermore, we verify that the CNN is deducing the knowledge of the target component by comparing the results to an experiment with the component removed. This research is important for applications with large amounts of data without robust annotation such as Chinese character recognition.
51. Jinho Lee, Brian Kenji Iwana, Shota Ide, Hideaki Hayashi, and Seiichi Uchida, Globally Optimal Object Tracking with Complementary Use of Single Shot Multibox Detector and Fully Convolutional Network, Pacific-Rim Symposium on Image and Video Technology (PSIVT), 110-112, 2017.07, Tracking is one of the most important but still difficult tasks in computer vision and pattern recognition. The main difficulties in the tracking field are appearance variation and occlusion. Most traditional tracking methods set the parameters or templates to track target objects in advance and should be modified accordingly. Thus, we propose a new and robust tracking method using a Fully Convolutional Network (FCN) to obtain an object probability map and Dynamic Programming (DP) to seek the globally optimal path through all frames of video. Our proposed method solves the object appearance variation problem with the use of a FCN and deals with occlusion by DP. We show that our method is effective in tracking various single objects through video frames.
52. Brian Kenji Iwana, Kaspar Riesen, Volkmar Frinken, and Seiichi Uchida, Efficient temporal pattern recognition by means of dissimilarity space embedding with discriminative prototypes, Pattern Recognition, 10.1016/j.patcog.2016.11.013, 64, 268-276, 2017.04, Dissimilarity space embedding (DSE) presents a method of representing data as vectors of dissimilarities. This representation is interesting for its ability to use a dissimilarity measure to embed various patterns (e.g. graph patterns with different topology and temporal patterns with different lengths) into a vector space. The method proposed in this paper uses a dynamic time warping (DTW) based DSE for the purpose of the classification of massive sets of temporal patterns. However, using large data sets introduces the problem of a high computational cost. To address this, we consider a prototype selection approach. A vector space created by DSE offers us the ability to treat its independent dimensions as features, allowing for the use of feature selection. The proposed method exploits this and reduces the number of prototypes required for accurate classification. To validate the proposed method, we use two-class classification on a data set of handwritten on-line numerical digits. We show that by using DSE with ensemble classification, high accuracy classification is possible with very few prototypes.
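
A minimal sketch of the DSE pipeline under stated assumptions: DTW dissimilarities to a prototype set form the feature vector (here via the tslearn package, an assumed dependency), and an ensemble classifier's feature importances stand in for the paper's prototype/feature selection.

    import numpy as np
    from tslearn.metrics import dtw                 # assumed dependency
    from sklearn.ensemble import AdaBoostClassifier

    def embed(series_list, prototypes):
        # Each series becomes its vector of DTW distances to the prototypes.
        return np.array([[dtw(s, p) for p in prototypes] for s in series_list])

    rng = np.random.default_rng(0)
    train = [rng.standard_normal(int(rng.integers(40, 80))) for _ in range(100)]
    labels = rng.integers(0, 2, size=100)
    prototypes = train[:20]                         # naive prototype pool

    X = embed(train, prototypes)
    clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)
    # One embedding dimension per prototype, so feature importances tell
    # us which prototypes can be dropped with little loss in accuracy.
    keep = np.argsort(clf.feature_importances_)[-5:]
    X_reduced = X[:, keep]
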
53. Seiichi Uchida, Shota Ide, Brian Kenji Iwana, and Anna Zhu, A further step to perfect accuracy by training CNN with larger data, International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR.2016.0082, 405-410, 2016.10, Convolutional Neural Networks (CNN) are at the forefront of accurate character recognition. This paper explores CNNs at their maximum capacity by implementing the use of large datasets. We show a near-perfect performance by using a dataset of about 820,000 real samples of isolated handwritten digits, much larger than the conventional MNIST database. In addition, we report a near-perfect performance on the recognition of machine-printed digits and multi-font digital-born digits. Also, in order to progress toward a universal OCR, we propose methods of combining the datasets into one classifier. This paper reveals the effects of combining the datasets prior to training and the effects of transfer learning during training. The results of the proposed methods also show an almost perfect accuracy, suggesting the ability of the network to generalize to all forms of text.
54. Brian Kenji Iwana, Volkmar Frinken, and Seiichi Uchida, A Robust Dissimilarity-Based Neural Network for Temporal Pattern Recognition, International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR.2016.0058, 265-270, 2016.10, Temporal pattern recognition is challenging because temporal patterns require extra considerations over other data types, such as order, structure, and temporal distortions. Recently, there has been a trend in using large data and deep learning; however, many of the tools cannot be directly used with temporal patterns. Convolutional Neural Networks (CNN), for instance, are traditionally used for visual and image pattern recognition. This paper proposes a method using a neural network to classify isolated temporal patterns directly. The proposed method uses dynamic time warping (DTW) as a kernel-like function to learn dissimilarity-based feature maps as the basis of the network. We show that using the proposed DTW-NN, efficient classification of on-line handwritten digits is possible with accuracies comparable to state-of-the-art methods.
55. Anna Zhu, Guoyou Wang, Yangbo Dong, and Brian Kenji Iwana, Detecting text in natural scene images with conditional clustering and convolution neural network, Journal of Electronic Imaging, 10.1117/1.JEI.24.5.053019, 24, 5, 053019, 2015.09, We present a robust method of detecting text in natural scenes. The work consists of four parts. First, the images are automatically partitioned into different layers based on conditional clustering. The clustering operates in two sequential ways. One has a constrained clustering center and conditionally determined cluster numbers, which generate small-size subregions. The other has fixed cluster numbers, which generate full-size subregions. After the clustering, we obtain a bunch of connected components (CCs) in each subregion. In the second step, the convolutional neural network (CNN) is used to classify those CCs as character components or noncharacter ones. The output score of the CNN can be transformed into the posterior probability of characters. Then we group the candidate characters into text strings based on the probability and location. Finally, we use a verification step. We choose a multichannel strategy to evaluate the performance on the public datasets: ICDAR2011 and ICDAR2013. The experimental results demonstrate that our algorithm achieves a superior performance compared with the state-of-the-art text detection algorithms.
56. Brian Kenji Iwana, Seiichi Uchida, Kaspar Riesen, and Volkmar Frinken, Tackling Temporal Pattern Recognition by Vector Space Embedding, International Conference on Document Analysis and Recognition (ICDAR), 10.1109/ICDAR.2015.7333875, 816-820, 2015.08, This paper introduces a novel method of reducing the number of prototype patterns necessary for accurate recognition of temporal patterns. The nearest neighbor (NN) method is an effective tool in pattern recognition, but the downside is it can be computationally costly when using large quantities of data. To solve this problem, we propose a method of representing the temporal patterns by embedding dynamic time warping (DTW) distance based dissimilarities in vector space. Adaptive boosting (AdaBoost) is then applied for classifier training and feature selection to reduce the number of prototype patterns required for accurate recognition. With a data set of handwritten digits provided by the International Unipen Foundation (iUF), we successfully show that a large quantity of temporal data can be efficiently classified to produce similar results to the established NN method while performing at a much smaller cost.
