Updated on 2025/04/24

Information

 


 
IWANA KENJI BRIAN
 
Organization
Faculty of Information Science and Electrical Engineering Department of Advanced Information Technology Associate Professor
School of Engineering Department of Electrical Engineering and Computer Science(Concurrent)
Graduate School of Information Science and Electrical Engineering Department of Information Science and Technology(Concurrent)
Title
Associate Professor
Contact information
Email address
Profile
  • 2001-2005: B.S. Computer Engineering, University of California Irvine (Irvine, USA)

  • 2005-2007: IT Consultant, Elite Development Group (Aliso Viejo, USA)

  • 2007-2009: Software Developer, Talisman LBS, LLC (Honolulu, USA)

  • 2010-2014: Software Developer, National Aeronautics and Space Administration (NASA) (Mountain View, USA)

  • 2015-2018: Ph.D. Information Science and Electrical Engineering, Kyushu University (Fukuoka, Japan)

  • 2018-2020: Assistant Professor, Kyushu University (Fukuoka, Japan)

  • 2020-current: Associate Professor, Kyushu University (Fukuoka, Japan)
Homepage

Research Areas

  • Informatics / Robotics and intelligent system

  • Informatics / Intelligent informatics

Degree

  • Ph.D. Information Science and Electrical Engineering

Education

  • Kyushu University   Information Science and Electrical Engineering  

    2015.4 - 2018.3


    Country:Japan

  • University of California Irvine   Electrical and Computer Engineering  

    2001.9 - 2005.6


    Country:United States

Research Interests・Research Keywords

  • Research theme: Dynamic Neural Architecture Warping for Time Series Recognition

    Keyword: Dynamic Time Warping, Neural Networks

    Research period: 2021.3 - 2022.7

  • Research theme: Time series feature extraction

    Keyword: Dynamic Time Warping, Neural Networks

    Research period: 2018.4 - 2021.4

  • Research theme: Dynamic weight alignment for convolutional neural networks

    Keyword: Dynamic Time Warping, Neural Networks

    Research period: 2018.3 - 2020.1

  • Research theme: Dynamic Time Warping Neural Network (DTW-NN)

    Keyword: Dynamic Time Warping, Neural Networks

    Research period: 2016.4 - 2019.1

  • Research theme: Prototype selection for dissimilarity space embedding using AdaBoost

    Keyword: Boosting, Dissimilarity Space Embedding, Dynamic Time Warping

    Research period: 2014.11 - 2017.1

Papers

  • Facial Gesture Classification with Few-shot Learning Using Limited Calibration Data from Photo-reflective Sensors on Smart Eyewear Reviewed

    Katsutoshi Masai, Maki Sugimoto, Brian Kenji Iwana

    International Conference on Mobile and Ubiquitous Multimedia   432 - 438   2024.12


    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3701571.3701595

  • Improving the Robustness of Time Series Neural Networks from Adversarial Attacks Using Time Warping Reviewed

    Yoh Yamashita and Brian Kenji Iwana

    International Conference on Pattern Recognition (ICPR)   2024.12


    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

  • Improving Online Handwriting Recognition with Transfer Learning Using Out-of-Domain and Different-Dimensional Sources Reviewed

    Jiseok Lee, Masaki Akiba, and Brian Kenji Iwana

    International Conference on Pattern Recognition (ICPR)   2024.12


    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

  • Model Selection with a Shapelet-based Distance Measure for Multi-source Transfer Learning in Time Series Classification Reviewed

    Jiseok Lee and Brian Kenji Iwana

    International Conference on Pattern Recognition (ICPR)   2024.12


    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

  • Test Time Augmentation as a Defense Against Adversarial Attacks on Online Handwriting Reviewed International journal

#Yoh Yamashita, Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)   2024.9


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Scene text recognition via dual character counting-aware visual and semantic modeling network Reviewed International journal

    @Ke Xiao, Anna Zhu, Brian Kenji Iwana, Cheng-Lin Liu

    Science China Information Sciences   67 ( 3 )   139101:1 - 139101:1   2024.2


    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1007/s11432-023-3935-8

  • Using Motif-Based Features to Improve Signal Classification with Temporal Neural Networks Reviewed International journal

    @Karthikeyan Suresh, Brian Kenji Iwana

    Asian Conference on Pattern Recognition (ACPR)   2023.11


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Few shot font generation via transferring similarity guided global style and quantization local style Reviewed International journal

    Wei Pan, Anna Zhu, Xinyu Zhou, Brian Kenji Iwana, Shilin Li

    International Conference on Computer Vision (ICCV)   2023.10


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Contour Completion by Transformers and Its Application to Vector Font Data Reviewed International journal

    #Yusuke Nagata, Brian Kenji Iwana, Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   2023.8


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • FETNet: Feature erasing and transferring network for scene text removal Reviewed International journal

    Guangtao Lyu, Kun Liu, #Anna Zhu, Seiichi Uchida, Brian Kenji Iwana

    Pattern Recognition   2023.8


    Language:English   Publishing type:Research paper (scientific journal)  

  • Vision Conformer: Incorporating Convolutions into Vision Transformer Layers Reviewed International journal

    Brian Kenji Iwana, @Akihiro Kusuda

    International Conference on Document Analysis and Recognition (ICDAR)   2023.8


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Deep attentive time warping Reviewed International journal

#Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida

    Pattern Recognition   136   2023.4


    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.patcog.2022.109201

  • Classification of Polysemous and Homograph Word Usages using Semi-Supervised Learning

    #Sangjun Han, Brian Kenji Iwana, Satoru Uchida

    Annual Conference of the Association for Natural Language Processing (NLP)   2023.3


    Language:English   Publishing type:Research paper (other academic)  

  • Text Style Transfer based on Multi-factor Disentanglement and Mixture Reviewed International journal

    Anna Zhu, Zhanhui Yin, Brian Kenji Iwana, Xinyu Zhou, Shengwu Xiong

    ACM Multimedia   2022.10


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Dynamic Data Augmentation with Gating Networks for Time Series Recognition Reviewed International journal

    #Daisuke Oba, Brian Kenji Iwana, #Shinnosuke Matsuo

    International Conference on Pattern Recognition (ICPR)   2022.8


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • On Mini-Batch Training with Varying Length Time Series Reviewed International journal

    Brian Kenji Iwana

    International Conference on Acoustics, Speech, and Signal Processing (ICASSP)   2022.5


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Learning the micro deformations by max-pooling for offline signature verification Reviewed International journal

    #Yuchen Zheng, Brian Kenji Iwana, Muhammad Imran Malik, Sheraz Ahmed, Wataru Ohyama, Seiichi Uchida

    Pattern Recognition   118   108008   2021.10


    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.patcog.2021.108008

  • Attention to Warp: Deep Metric Learning for Multivariate Time Series Reviewed International journal

    #Shinnosuke Matsuo, Xiaomeng Wu, #Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, and Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   2021.9


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Using Robust Regression to Find Font Usage Trends Reviewed International journal

    #Kaigen Tsuji, Seiichi Uchida, Brian Kenji Iwana

    ICDAR Workshop on Machine Learning   2021.9


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Towards Book Cover Design via Layout Graphs Reviewed International journal

    #Wensheng Zhang, #Yan Zheng, #Taiga Miyazono, Seiichi Uchida, and Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)   2021.9


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Font Style that Fits an Image -- Font Generation Based on Image Context Reviewed International journal

    #Taiga Miyazono, #Daichi Haraguchi, Seiichi Uchida, and Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)   2021.9


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks Reviewed International journal

    Brian Kenji Iwana, Seiichi Uchida

    PLOS ONE   2021.7


    Language:English   Publishing type:Research paper (scientific journal)  

    In recent times, deep artificial neural networks have achieved many successes in pattern recognition. Part of this success can be attributed to the reliance on big data to increase generalization. However, in the field of time series recognition, many datasets are often very small. One method of addressing this problem is through the use of data augmentation. In this paper, we survey data augmentation techniques for time series and their application to time series classification with neural networks. We propose a taxonomy and outline the four families in time series data augmentation, including transformation-based methods, pattern mixing, generative models, and decomposition methods. Furthermore, we empirically evaluate 12 time series data augmentation methods on 128 time series classification datasets with six different types of neural networks. Through the results, we are able to analyze the characteristics, advantages and disadvantages, and recommendations of each data augmentation method. This survey aims to help in the selection of time series data augmentation for neural network applications.

    Repository Public URL: https://hdl.handle.net/2324/7341549

    Open data URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0254841
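    The transformation-based family in the taxonomy above can be illustrated with a minimal numpy sketch of two common augmentations, jittering and window warping. The function names and default parameters here are illustrative assumptions, not taken from the paper:

    ```python
    import numpy as np

    def jitter(x, sigma=0.03, rng=None):
        """Transformation-based augmentation: add Gaussian noise to each time step."""
        rng = np.random.default_rng(rng)
        return x + rng.normal(0.0, sigma, size=x.shape)

    def window_warp(x, scale=1.5, frac=0.1, rng=None):
        """Stretch (or shrink) a random window of the series by `scale`,
        then resample back to the original length (a simple time warp)."""
        rng = np.random.default_rng(rng)
        n = len(x)
        w = max(2, int(n * frac))                  # window size
        start = rng.integers(0, n - w)
        # linearly interpolate the window to a new length
        warped = np.interp(np.linspace(0, w - 1, int(w * scale)),
                           np.arange(w), x[start:start + w])
        y = np.concatenate([x[:start], warped, x[start + w:]])
        # resample the whole series back to the original length
        return np.interp(np.linspace(0, len(y) - 1, n), np.arange(len(y)), y)

    x = np.sin(np.linspace(0, 4 * np.pi, 128))
    x_aug = window_warp(jitter(x, rng=0), scale=1.5, rng=0)
    ```

    Both transforms preserve the sequence length, so augmented samples can be mixed freely with originals when training a classifier.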

  • Tunable U-Net: Controlling image-to-image outputs using a tunable scalar value Reviewed International journal

    #Seokjun Kang, Seiichi Uchida, Brian Kenji Iwana

    IEEE Access   2021.7


    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/ACCESS.2021.3096530

  • Self-Augmented Multi-Modal Feature Embedding Reviewed International journal

    #Shinnosuke Matsuo, Brian Kenji Iwana, and Seiichi Uchida

    International Conference on Acoustics, Speech, and Signal Processing (ICASSP)   2021.6


    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher Reviewed

    Brian Kenji Iwana, Seiichi Uchida

    International Conference on Pattern Recognition (ICPR)   2021.1


    Language:English   Publishing type:Research paper (other academic)  

    Neural networks have become a powerful tool in pattern recognition and part of their success is due to generalization from using large datasets. However, unlike other domains, time series classification datasets are often small. In order to address this problem, we propose a novel time series data augmentation called guided warping. While many data augmentation methods are based on random transformations, guided warping exploits the element alignment properties of Dynamic Time Warping (DTW) and shapeDTW, a high-level DTW method based on shape descriptors, to deterministically warp sample patterns. In this way, the time series are mixed by warping the features of a sample pattern to match the time steps of a reference pattern. Furthermore, we introduce a discriminative teacher in order to serve as a directed reference for the guided warping. We evaluate the method on all 85 datasets in the 2015 UCR Time Series Archive with a deep convolutional neural network (CNN) and a recurrent neural network (RNN). The code with an easy to use implementation can be found at this https URL .
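    The core idea, warping a sample pattern onto the time steps of a reference via the DTW alignment path, can be sketched as follows. This is a simplified illustration using plain DTW on 1-D series; the paper's shapeDTW descriptors and the discriminative teacher for choosing the reference are omitted, and all names here are illustrative:

    ```python
    import numpy as np

    def dtw_path(a, b):
        """Classic DTW between two 1-D series; returns the optimal
        alignment path as a list of (i, j) index pairs."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
        # backtrack from (n, m) to (0, 0)
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]

    def guided_warp(sample, reference):
        """Warp `sample` onto the time axis of `reference` by averaging the
        sample elements that the DTW path aligns to each reference step."""
        reference = np.asarray(reference, dtype=float)
        out = np.zeros_like(reference)
        counts = np.zeros(len(reference))
        for i, j in dtw_path(sample, reference):
            out[j] += sample[i]
            counts[j] += 1
        return out / np.maximum(counts, 1)
    ```

    Because the warping is driven by an alignment to a real reference pattern rather than a random transformation, the augmented sample stays deterministic given the pair.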

  • Complex image processing with less data—Document image binarization by integrating multiple pre-trained U-Net modules Reviewed International journal

    #Seokjun Kang, Brian Kenji Iwana, Seiichi Uchida

    Pattern Recognition   106   107577   2021.1


    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1016/j.patcog.2020.107577

  • Neural Style Difference Transfer and Its Application to Font Generation Reviewed

    #Gantugs Atarsaikhan, Brian Kenji Iwana, and Seiichi Uchida

    International Workshop on Document Analysis Systems (DAS)   2020.10


    Language:English   Publishing type:Research paper (other academic)  

    Designing fonts requires a great deal of time and effort. It requires professional skills, such as sketching, vectorizing, and image editing. Additionally, each letter has to be designed individually. In this paper, we will introduce a method to create fonts automatically. In our proposed method, the difference of font styles between two different fonts is found and transferred to another font using neural style transfer. Neural style transfer is a method of stylizing the contents of an image with the styles of another image. We proposed a novel neural style difference and content difference loss for the neural style transfer. With these losses, new fonts can be generated by adding or removing font styles from a font. We provided experimental results with various combinations of input fonts and discussed limitations and future development for the proposed method.

  • What is the Reward for Handwriting? --- Handwriting Generation by Imitation Learning Reviewed

    #Keisuke Kanda, Brian Kenji Iwana, and Seiichi Uchida

    International Conference on Frontiers in Handwriting Recognition (ICFHR)   2020.9


    Language:English   Publishing type:Research paper (other academic)  

    Analyzing the handwriting generation process is an important issue and has been tackled by various generation models, such as kinematics based models and stochastic models. In this study, we use a reinforcement learning (RL) framework to realize handwriting generation with the careful future planning ability. In fact, the handwriting process of human beings is also supported by their future planning ability; for example, the ability is necessary to generate a closed trajectory like '0' because any shortsighted model, such as a Markovian model, cannot generate it. For the algorithm, we employ generative adversarial imitation learning (GAIL). Typical RL algorithms require the manual definition of the reward function, which is very crucial to control the generation process. In contrast, GAIL trains the reward function along with the other modules of the framework. In other words, through GAIL, we can understand the reward of the handwriting generation process from handwriting examples. Our experimental results qualitatively and quantitatively show that the learned reward catches the trends in handwriting generation and thus GAIL is well suited for the acquisition of handwriting behavior.

  • Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology Reviewed

    #Hiroki Tokunaga, Brian Kenji Iwana, Yuki Teramoto, Akihiko Yoshizawa, and Ryoma Bise

    European Conference on Computer Vision (ECCV)   2020.8


    Language:English   Publishing type:Research paper (other academic)  

    We propose a weakly-supervised cell tracking method that can train a convolutional neural network (CNN) by using only the annotation of "cell detection" (i.e., the coordinates of cell positions) without association information, in which cell positions can be easily obtained by nuclear staining. First, we train a co-detection CNN that detects cells in successive frames by using weak-labels. Our key assumption is that the co-detection CNN implicitly learns association in addition to detection. To obtain the association information, we propose a backward-and-forward propagation method that analyzes the correspondence of cell positions in the detection maps output of the co-detection CNN. Experiments demonstrated that the proposed method can match positions by analyzing the co-detection CNN. Even though the method uses only weak supervision, the performance of our method was almost the same as the state-of-the-art supervised method.

  • Effect of Text Color on Word Embeddings Reviewed

    #Masaya Ikoma, Brian Kenji Iwana, and Seiichi Uchida

    International Workshop on Document Analysis Systems (DAS)   2020.7


    Language:English   Publishing type:Research paper (other academic)  

    In natural scenes and documents, we can find the correlation between a text and its color. For instance, the word, "hot", is often printed in red, while "cold" is often in blue. This correlation can be thought of as a feature that represents the semantic difference between the words. Based on this observation, we propose the idea of using text color for word embeddings. While text-only word embeddings (e.g. word2vec) have been extremely successful, they often represent antonyms as similar since they are often interchangeable in sentences. In this paper, we try two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from the book cover images and analyze the correlation between the color and meaning of the word. Second, we try to retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the word embeddings of synonyms and antonyms before and after re-training, we aim to understand the kind of words that have positive or negative effects in their word embeddings when incorporating text color information.

  • Guided neural style transfer for shape stylization Reviewed International journal

    #Gantugs Atarsaikhan, Brian Kenji Iwana, and Seiichi Uchida

    PLOS ONE   15 ( 6 )   e0233489   2020.6


    Language:English   Publishing type:Research paper (scientific journal)  

    Designing logos, typefaces, and other decorated shapes can require professional skills. In this paper, we aim to produce new and unique decorated shapes by stylizing ordinary shapes with machine learning. Specifically, we combined parametric and non-parametric neural style transfer algorithms to transfer both local and global features. Furthermore, we introduced a distance-based guiding to the neural style transfer process, so that only the foreground shape will be decorated. Lastly, qualitative evaluation and ablation studies are provided to demonstrate the usefulness of the proposed method.

    DOI: 10.1371/journal.pone.0233489

    Other Link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0233489

  • Character-independent font identification Reviewed

    #Daichi Haraguchi, #Shota Harada, Yuto Shinahara, Brian Kenji Iwana, and Seiichi Uchida

    International Workshop on Document Analysis Systems (DAS)   2020.6


    Language:English   Publishing type:Research paper (other academic)  

    There are a countless number of fonts with various shapes and styles. In addition, there are many fonts that only have subtle differences in features. Due to this, font identification is a difficult task. In this paper, we propose a method of determining if any two characters are from the same font or not. This is difficult due to the difference between fonts typically being smaller than the difference between alphabet classes. Additionally, the proposed method can be used with fonts regardless of whether they exist in the training or not. In order to accomplish this, we use a Convolutional Neural Network (CNN) trained with various font image pairs. In the experiment, the network is trained on image pairs of various fonts. We then evaluate the model on a different set of fonts that are unseen by the network. The evaluation is performed with an accuracy of 92.27%. Moreover, we analyzed the relationship between character classes and font identification accuracy.

  • ACMU-Net: Advanced Cascading Modular U-Nets incorporated Squeeze and Excitation Blocks Reviewed

    #Seokjun Kang, Brian Kenji Iwana, and Seiichi Uchida

    International Workshop on Document Analysis Systems (DAS)   2020.6


    Language:English   Publishing type:Research paper (other academic)  

    In document analysis research, image-to-image conversion models such as the U-Net have shown significant performance. Recently, cascaded U-Nets have been suggested for solving complex document analysis problems. However, improving performance by adding U-Net modules requires too many parameters in cascaded U-Nets. Therefore, in this paper, we propose a method for enhancing the performance of cascaded U-Nets. We suggest a novel document image binarization method by utilizing Cascading Modular U-Nets (CMU-Nets) and Squeeze and Excitation blocks (SE-blocks). Through verification experiments, we point out the problems caused by the use of SE-blocks in existing CMU-Nets and suggest how to use SE-blocks in CMU-Nets. We use the Document Image Binarization (DIBCO) 2017 dataset to evaluate the proposed model.

  • Few-Shot Text Style Transfer via Deep Feature Similarity Reviewed International journal

    Anna Zhu, Xiongbo Lu, Xiang Bai, Seiichi Uchida, Brian Kenji Iwana, Shengwu Xiong

    IEEE Transactions on Image Processing   2020.5


    Language:English   Publishing type:Research paper (scientific journal)  

    Generating text to have a consistent style with only a few observed highly-stylized text samples is a difficult task for image processing. The text style involving the typography, i.e., font, stroke, color, decoration, effects, etc., should be considered for transfer. In this paper, we propose a novel approach to stylize target text by decoding weighted deep features from only a few referenced samples. The deep features, including content and style features of each referenced text, are extracted from a Convolutional Neural Network (CNN) that is optimized for character recognition. Then, we calculate the similarity scores of the target text and the referenced samples by measuring the distance along the corresponding channels from the content features of the CNN when considering only the content, and assign them as the weights for aggregating the deep features. To enforce the stylized text to be realistic, a discriminative network with adversarial loss is employed. We demonstrate the effectiveness of our network by conducting experiments on three different datasets which have various styles, fonts, languages, etc. Additionally, the coefficients for character style transfer, including the character content, the effect of similarity matrix, the number of referenced characters, the similarity between characters, and performance evaluation by a new protocol are analyzed for better understanding our proposed framework.

    DOI: 10.1109/TIP.2020.2995062

    Other Link: https://ieeexplore.ieee.org/document/9098082

  • Benchmarking Deep Learning Models for Classification of Book Covers Reviewed International journal

    @Adriano Lucieri, @Huzaifa Sabir, @Shoaib Ahmed Siddiqui, @Syed Tahseen Raza Rizvi, Brian Kenji Iwana, Seiichi Uchida, Andreas Dengel, and Sheraz Ahmed

    Springer Nature Computer Science   1 ( 139 )   1 - 16   2020.4


    Language:English   Publishing type:Research paper (scientific journal)  

    Book covers usually provide a good depiction of a book’s content and its central idea. The classification of books in their respective genre usually involves subjectivity and contextuality. Book retrieval systems would utterly benefit from an automated framework that is able to classify a book’s genre based on an image, specifically for archival documents where digitization of the complete book for the purpose of indexing is an expensive task. While various modalities are available (e.g., cover, title, author, abstract), benchmarking the image-based classification systems based on minimal information is a particularly exciting field due to the recent advancements in the domain of image-based deep learning and its applicability. For that purpose, a natural question arises regarding the plausibility of solving the problem of book classification by only utilizing an image of its cover along with the current state-of-the-art deep learning models. To answer this question, this paper makes a three-fold contribution. First, the publicly available book cover dataset comprising of 57k book covers belonging to 30 different categories is thoroughly analyzed and corrected. Second, it benchmarks the performance on a battery of state-of-the-art image classification models for the task of book cover classification. Third, it uses explicit attention mechanisms to identify the regions that the network focused on in order to make the prediction. All of our evaluations were performed on a subset of the mentioned public book cover dataset. Analysis of the results revealed the inefficacy of the most powerful models for solving the classification task. With the obtained results, it is evident that significant efforts need to be devoted in order to solve this image-based classification task to a satisfactory level.

    DOI: 10.1007/s42979-020-00132-z

    Other Link: https://link.springer.com/article/10.1007%2Fs42979-020-00132-z

  • Expression Education Incorporating Drama Methods in School Education: A Case Study of Educational Practice with Junior High School Students (学校教育における演劇的手法を取り入れた表現教育 : 中学生を対象とした教育実践を事例に) Reviewed

    #Yirong Zhao, Brian Kenji Iwana, Kun Qian

    九州地区国立大学教育系・文系研究論文集   6 ( 1/2 )   2020.3


    Language:English   Publishing type:Research paper (scientific journal)  

    Drama Approach as an Educational Practice in Secondary Education: A Case Study in a Japanese Middle School

    Other Link: https://catalog.lib.kyushu-u.ac.jp/opac_detail_md/?lang=0&amode=MD100000&bibid=2559291

  • Time series classification using local distance-based features in multi-modal fusion networks Reviewed International journal

    Brian Kenji Iwana and Seiichi Uchida

    Pattern Recognition   97   107024   2020.1


    Language:English   Publishing type:Research paper (scientific journal)  

    We propose the use of a novel feature, called local distance features, for time series classification. The local distance features are extracted using Dynamic Time Warping (DTW) and classified using Convolutional Neural Networks (CNN). DTW is classically used as a robust distance measure for distance-based time series recognition methods. However, by using DTW strictly as a global distance measure, information about the matching is discarded. We show that this information can further be used as supplementary input information in temporal CNNs. This is done by using both the raw data and the features extracted from DTW in multi-modal fusion CNNs. Furthermore, we explore the effects that different prototype selection methods, prototype numbers, and data fusion schemes have on the accuracy. We perform experiments on a wide range of time series datasets including three Unipen handwriting datasets, four UCI Machine Learning Repository datasets, and 85 UCR Time Series Classification Archive datasets.

    DOI: 10.1016/j.patcog.2019.107024

    Other Link: https://www.sciencedirect.com/science/article/pii/S0031320319303279?via%3Dihub

  • DTW-NN: A novel neural network for time series recognition using dynamic alignment between inputs and weights Reviewed International journal

    Brian Kenji Iwana, Volkmar Frinken, and Seiichi Uchida

    Knowledge-Based Systems   188   104971   2020.1


    Language:English   Publishing type:Research paper (scientific journal)  

    This paper describes a novel model for time series recognition called a Dynamic Time Warping Neural Network (DTW-NN). DTW-NN is a feedforward neural network that exploits the elastic matching ability of DTW to dynamically align the inputs of a layer to the weights. This weight alignment replaces the standard dot product within a neuron with DTW. In this way, the DTW-NN is able to tackle difficulties with time series recognition such as temporal distortions and variable pattern length within a feedforward architecture. We demonstrate the effectiveness of DTW-NNs on four distinct datasets: online handwritten characters, accelerometer-based active daily life activities, spoken Arabic numeral Mel-Frequency Cepstrum Coefficients (MFCC), and one-dimensional centroid-radii sequences from leaf shapes. We show that the proposed method is an effective general approach to temporal pattern learning by achieving state-of-the-art results on these datasets.

    DOI: 10.1016/j.knosys.2019.104971

    Other Link: https://www.sciencedirect.com/science/article/abs/pii/S0950705119303995?via%3Dihub
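    The central idea, replacing a neuron's dot product with an elastic DTW match against a learnable weight sequence, can be sketched as below. This shows the forward pass only; the class name, bias, and tanh activation are illustrative assumptions, not the paper's implementation, and training through the alignment is described in the paper:

    ```python
    import numpy as np

    def dtw_distance(a, b):
        """Standard DTW distance between two 1-D sequences."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
        return D[n, m]

    class DTWNeuron:
        """A DTW-NN style neuron: the inner product between input and weights
        is replaced by a DTW match to a weight *sequence*, so variable-length,
        temporally distorted inputs can be scored directly."""
        def __init__(self, weights, bias=0.0):
            self.w = np.asarray(weights, dtype=float)
            self.b = bias

        def forward(self, x):
            # a smaller elastic distance yields a larger activation
            return np.tanh(-dtw_distance(x, self.w) + self.b)
    ```

    A perfectly matching input gives distance zero and hence the maximal pre-bias activation, which is what lets the weight sequence act as a learned temporal template.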

  • Implementation and evaluation of classes incorporating drama approach methods for interactive learning in a primary school setting Reviewed

    #Yirong Zhao and Brian Kenji Iwana

    International Conference of Education, Research and Innovation (ICERI)   4007 - 4014   2019.11


    Language:English   Publishing type:Research paper (other academic)  

    DOI: 10.21125/iceri.2019.1004

  • Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation Reviewed

    Brian Kenji Iwana, #Ryohei Kuroki, Seiichi Uchida

    ICCV Workshops   4176 - 4185   2019.10


    Language:English   Publishing type:Research paper (other academic)  

    Convolutional Neural Networks (CNN) have become state-of-the-art in the field of image classification. However, not everything is understood about their inner representations. This paper tackles the interpretability and explainability of the predictions of CNNs for multi-class classification problems. Specifically, we propose a novel visualization method of pixel-wise input attribution called Softmax-Gradient Layer-wise Relevance Propagation (SGLRP). The proposed model is a class-discriminative extension to Deep Taylor Decomposition (DTD) using the gradient of softmax to back propagate the relevance of the output probability to the input image. Through qualitative and quantitative analysis, we demonstrate that SGLRP can successfully localize and attribute the regions on input images which contribute to a target object's classification. We show that the proposed method excels at discriminating the target object's class from the other possible objects in the images. We confirm that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP) based methods and can help in the understanding of the decision process of CNNs.

    DOI: 10.1109/ICCVW.2019.00513

  • Mining the Displacement of Max-pooling for Text Recognition Reviewed International journal

    #Yuchen Zheng, Brian Kenji Iwana, and Seiichi Uchida

    Pattern Recognition   93   558 - 569   2019.9

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    The max-pooling operation in convolutional neural networks (CNNs) downsamples the feature maps of convolutional layers. However, in doing so, it loses some spatial information. In this paper, we extract a novel feature from pooling layers, called displacement features, and combine them with the features resulting from max-pooling to capture the structural deformations for text recognition tasks. The displacement features record the location of the maximal value in a max-pooling operation. Furthermore, we analyze and mine the class-wise trends of the displacement features. The extensive experimental results and discussions demonstrate that the proposed displacement features can improve the performance of the CNN based architectures and tackle the issues with the structural deformations of max-pooling in the text recognition tasks.

    DOI: 10.1016/j.patcog.2019.05.014

    Other Link: https://www.sciencedirect.com/science/article/abs/pii/S003132031930189X?via%3Dihub
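
    The displacement feature described above can be sketched in a few lines: alongside each max-pooled value, we also record where the maximum occurred inside its pooling window. A minimal NumPy sketch (illustrative only, not the authors' implementation):

```python
import numpy as np

def maxpool_with_displacement(feature_map, pool=2):
    """Max-pool a 2-D feature map and record where each maximum came
    from inside its pooling window (the 'displacement feature').
    Illustrative sketch; the paper's exact formulation may differ."""
    h, w = feature_map.shape
    oh, ow = h // pool, w // pool
    pooled = np.zeros((oh, ow))
    displacement = np.zeros((oh, ow), dtype=int)  # flat index in 0..pool*pool-1
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
            pooled[i, j] = window.max()
            displacement[i, j] = window.argmax()  # location, normally discarded
    return pooled, displacement

fm = np.array([[1., 5., 2., 0.],
               [3., 2., 4., 7.],
               [0., 1., 2., 2.],
               [8., 0., 3., 1.]])
pooled, disp = maxpool_with_displacement(fm)
# pooled: [[5., 7.], [8., 3.]]
```

    Here the displacement is stored as a flat within-window index; the paper goes further and mines class-wise trends of these locations, which is beyond this sketch.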

  • Capturing Micro Deformations from Pooling Layers for Offline Signature Verification Reviewed

    #Yuchen Zheng, Wataru Ohyama, Brian Kenji Iwana, and Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   1111 - 1116   2019.9

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this paper, we propose a novel Convolutional Neural Network (CNN) based method that extracts the location information (displacement features) of the maximums in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between the genuine signatures and skilled forgeries as a feature extraction procedure. After the feature extraction procedure, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. The extensive experimental results on GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate the genuine signatures and their corresponding skilled forgeries well and achieve state-of-the-art results on these datasets.

    DOI: 10.1109/ICDAR.2019.00180

  • Deep Dynamic Time Warping: End-to-End Local Representation Learning for Online Signature Verification Reviewed

    Xiaomeng Wu, Akisato Kimura, Brian Kenji Iwana, Seiichi Uchida, and Kunio Kashino

    International Conference on Document Analysis and Recognition (ICDAR)   1103 - 1110   2019.9

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Siamese networks have been shown to be successful in learning deep representations for multivariate time series verification. However, most related studies optimize a global distance objective and suffer from a low discriminative power due to the loss of temporal information. To address this issue, we propose an end-to-end, neural network-based framework for learning local representations of time series, and demonstrate its effectiveness for online signature verification. This framework optimizes a Siamese network with a local embedding loss, and learns a feature space that preserves the temporal location-wise distances between time series. To achieve invariance to non-linear temporal distortion, we propose building a dynamic time warping block on top of the Siamese network, which will greatly improve the accuracy for local correspondences across intra-personal variability. Validation with respect to online signature verification demonstrates the advantage of our framework over existing techniques that use either handcrafted or learned feature representations.

    DOI: 10.1109/ICDAR.2019.00179

  • Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders Reviewed

    #Taichi Sumi, Brian Kenji Iwana, Hideaki Hayashi, and Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   407 - 412   2019.9

     More details

    Language:English   Publishing type:Research paper (other academic)  

    This research attempts to construct a network that can convert online and offline handwritten characters to each other. The proposed network consists of two Variational Auto-Encoders (VAEs) with a shared latent space. The VAEs are trained to generate online and offline handwritten Latin characters simultaneously. In this way, we create a cross-modal VAE (Cross-VAE). During training, the proposed Cross-VAE is trained to minimize the reconstruction loss of the two modalities, the distribution loss of the two VAEs, and a novel third loss called the space sharing loss. This third loss, the space sharing loss, encourages the modalities to share the same latent space by calculating the distance between the latent variables. Through the proposed method, mutual conversion of online and offline handwritten characters is possible. In this paper, we demonstrate the performance of the Cross-VAE through qualitative and quantitative analysis.

    DOI: 10.1109/ICDAR.2019.00072

  • Selective Super-Resolution for Scene Text Images Reviewed

    #Ryo Nakao, Brian Kenji Iwana, and Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   401 - 406   2019.9

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this paper, we realize the enhancement of super-resolution using images with scene text. Specifically, this paper proposes the use of Super-Resolution Convolutional Neural Networks (SRCNN) which are constructed to tackle issues associated with characters and text. We demonstrate that standard SRCNNs trained for general object super-resolution are not sufficient and that the proposed method is a viable approach to creating a robust model for text. To do so, we analyze the characteristics of SRCNNs through quantitative and qualitative evaluations with scene text data. In addition, analysis using the correlation between layers by Singular Vector Canonical Correlation Analysis (SVCCA) and comparison of the filters of each SRCNN using t-SNE is performed. Furthermore, in order to create a unified super-resolution model specialized for both text and objects, a model using SRCNNs trained with the different data types and Content-wise Network Fusion (CNF) is used. We integrate the SRCNN trained on character images with the SRCNN trained on general object images, and verify the accuracy improvement on scene images which include text. We also examine how each SRCNN affects super-resolution images after fusion.

  • On the Ability of a CNN to Realize Image-to-Image Language Conversion Reviewed

    #Kohei Baba, Seiichi Uchida, and Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)   448 - 453   2019.9

     More details

    Language:English   Publishing type:Research paper (other academic)  

    The purpose of this paper is to reveal the ability that Convolutional Neural Networks (CNN) have on the novel task of image-to-image language conversion. We propose a new network to tackle this task by converting images of Korean Hangul characters directly into images of their phonetic Latin character equivalents. The conversion rules between Hangul and the phonetic symbols are not explicitly provided. The results of the proposed network show that image-to-image language conversion is possible and that the network can grasp the structural features of Hangul even from limited training data. In addition, we introduce a new network for use when the input and output have significantly different features.

    DOI: 10.1109/ICDAR.2019.00078

  • Cascading Modular U-Nets for Document Image Binarization Reviewed

    #Seokjun Kang, Brian Kenji Iwana, Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   675 - 680   2019.9

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In recent years, U-Net has achieved good results in various image processing tasks. However, conventional U-Nets need to be re-trained for individual tasks with a sufficient amount of ground-truthed images. This requirement makes U-Nets inapplicable to tasks with small amounts of data. In this paper, we propose to use "modular" U-Nets, each of which is pre-trained to perform an existing image processing task, such as dilation, erosion, and histogram equalization. Then, to accomplish a specific image processing task, such as the binarization of historical document images, the modular U-Nets are cascaded with inter-module skip connections and fine-tuned to the target task. We verified the proposed model using the Document Image Binarization Competition (DIBCO) 2017 dataset.

    DOI: 10.1109/ICDAR.2019.00113

  • Dynamic Weight Alignment for Temporal Convolutional Neural Networks Reviewed

    Brian Kenji Iwana and Seiichi Uchida

    International Conference on Acoustics, Speech, and Signal Processing (ICASSP)   3827 - 3831   2019.5

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this paper, we propose a method of improving temporal Convolutional Neural Networks (CNN) by determining the optimal alignment of weights and inputs using dynamic programming. Conventional CNN convolutions linearly match the shared weights to a window of the input. However, a better alignment of the weights may exist. Thus, we propose the use of Dynamic Time Warping (DTW) to dynamically align the weights to the input of the convolutional layer. Specifically, the dynamic alignment overcomes issues such as temporal distortion by finding the minimal-distance matching of the weights and the inputs under constraints. We demonstrate the effectiveness of the proposed architecture on the Unipen online handwritten digit and character datasets, the UCI Spoken Arabic Digit dataset, and the UCI Activities of Daily Life dataset.

    DOI: 10.1109/ICASSP.2019.8682908
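
    The dynamic programming at the heart of this idea is standard DTW. A minimal sketch of the DTW cost table, using an absolute-difference local cost (illustrative only; the paper applies the resulting alignment to convolution weights rather than computing a plain distance):

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic time warping between two 1-D sequences.
    Builds the cumulative-cost table; its bottom-right entry is the
    minimal warped distance, and backtracking the same table gives the
    element-wise alignment. Sketch with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # each cell extends the cheapest of the three predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]

a = [1, 2, 3]
b = [1, 1, 2, 3]
D = dtw(a, b)
# D[-1, -1] == 0.0: b is a warped copy of a, so the warped distance is zero
```

    In the paper's setting, one sequence would be the filter weights and the other the input window, with the alignment (rather than the scalar distance) used to apply the weights.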

  • How do Convolutional Neural Networks Learn Design? Reviewed

    @Shailza Jolly, Brian Kenji Iwana, #Ryohei Kuroki, and Seiichi Uchida

    International Conference on Pattern Recognition (ICPR)   1085 - 1090   2018.8

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this paper, we aim to understand the design principles in book cover images which are carefully crafted by experts. Book covers are designed in a unique way, specific to genres which convey important information to their readers. By using Convolutional Neural Networks (CNN) to predict book genres from cover images, visual cues which distinguish genres can be highlighted and analyzed. In order to understand these visual clues contributing towards the decision of a genre, we present the application of Layer-wise Relevance Propagation (LRP) on the book cover image classification results. We use LRP to explain the pixel-wise contributions of book cover design and highlight the design elements contributing towards particular genres. In addition, with the use of state-of-the-art object and text detection methods, insights about genre-specific book cover designs are discovered.

    DOI: 10.1109/ICPR.2018.8545624

  • Discovering Class-Wise Trends of Max-Pooling in Subspace Reviewed

    #Yuchen Zheng, Brian Kenji Iwana, and Seiichi Uchida

    International Conference on Frontiers in Handwriting Recognition (ICFHR)   98 - 103   2018.8

     More details

    Language:English   Publishing type:Research paper (other academic)  

    The traditional max-pooling operation in Convolutional Neural Networks (CNNs) only obtains the maximal value from a pooling window. However, it discards the information about the precise position of the maximal value. In this paper, we extract the location of the maximal value in a pooling window and transform it into "displacement feature". We analyze and discover the class-wise trend of the displacement features in many ways. The experimental results and discussion demonstrate that the displacement features have beneficial behaviors for solving the problems in max-pooling.

    DOI: 10.1109/ICFHR-2018.2018.00026

  • Introducing Local Distance-based Features to Temporal Convolutional Neural Networks Reviewed

    Brian Kenji Iwana, Minoru Mori, Akisato Kimura, and Seiichi Uchida

    International Conference on Frontiers in Handwriting Recognition (ICFHR)   92 - 97   2018.8

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this paper, we propose the use of local distance-based features determined by Dynamic Time Warping (DTW) for temporal Convolutional Neural Networks (CNN). Traditionally, DTW is used as a robust distance metric for time series patterns. However, this traditional use of DTW only utilizes the scalar distance metric and discards the local distances between the dynamically matched sequence elements. This paper proposes recovering these local distances, or DTW features, and utilizing them for the input of a CNN. We demonstrate that these features can provide additional information for the classification of isolated handwritten digits and characters. Furthermore, we demonstrate that the DTW features can be combined with the spatial coordinate features in multi-modal fusion networks to achieve state-of-the-art accuracy on the Unipen online handwritten character datasets.

    DOI: 10.1109/ICFHR-2018.2018.00025

  • Contained Neural Style Transfer for Decorated Logo Generation Reviewed

    #Gantugs Atarsaikhan, Brian Kenji Iwana, and Seiichi Uchida

    International Workshop on Document Analysis Systems (DAS)   2018.4

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Making decorated logos requires image editing skills; without sufficient skills, it can be a time-consuming task. While there are many online web services for making new logos, they have limited designs and duplicates can be made. We propose using neural style transfer with clip art and text for the creation of new and genuine logos. We introduce a new loss function based on the distance transform of the input image, which allows the preservation of the silhouettes of text and objects. The proposed method confines the style transfer to a designated area. We demonstrate the characteristics of the proposed method. Finally, we show the results of logo generation with various input images.

    DOI: 10.1109/DAS.2018.78

  • Font Creation Using Class Discriminative Deep Convolutional Generative Adversarial Networks Invited Reviewed International journal

    #Kotaro Abe, Brian Kenji Iwana, @Viktor Gösta Holmér, and Seiichi Uchida

    Asian Conference on Pattern Recognition (ACPR)   232 - 237   2017.11

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    In this research, we attempt to generate fonts automatically using a modification of a Deep Convolutional Generative Adversarial Network (DCGAN) by introducing class consideration. DCGANs are an application of generative adversarial networks (GAN) which make use of convolutional and deconvolutional layers to generate data through adversarial detection. The conventional GAN comprises two neural networks that work in series. Specifically, it approaches unsupervised data generation with the use of a generative network whose output is fed into a second, discriminative network. While DCGANs have been successful on natural images, we show their limited ability at font generation due to the high variation of fonts combined with the rigid structures required of characters. We propose a class discriminative DCGAN which uses a classification network to work alongside the discriminative network to refine the generative network. The results of our experiment show a dramatic improvement over the conventional DCGAN.

    DOI: 10.1109/ACPR.2017.99

  • Component Awareness in Convolutional Neural Networks Reviewed

    Brian Kenji Iwana, #Letao Zhou, Kumiko Tanaka-Ishii, and Seiichi Uchida

    International Conference on Document Analysis and Recognition (ICDAR)   394 - 399   2017.11

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this work, we investigate the ability of Convolutional Neural Networks (CNN) to infer the presence of components that comprise an image. In recent years, CNNs have achieved powerful results in classification, detection, and segmentation. However, these models learn from instance-level supervision of the detected object. In this paper, we determine if CNNs can detect objects using image-level weakly supervised labels without localization. To demonstrate that a CNN can infer awareness of objects, we evaluate a CNN's classification ability with a database constructed of Chinese characters with only character-level labeled components. We show that the CNN is able to achieve a high accuracy in identifying the presence of these components without specific knowledge of the component. Furthermore, we verify that the CNN is deducing the knowledge of the target component by comparing the results to an experiment with the component removed. This research is important for applications with large amounts of data without robust annotation such as Chinese character recognition.

    DOI: 10.1109/ICDAR.2017.72

  • Neural font style transfer Reviewed

    #Gantugs Atarsaikhan, Brian Kenji Iwana, Atsushi Narusawa, Keiji Yanai, and Seiichi Uchida

    51 - 56   2017.11

     More details

    Language:English   Publishing type:Research paper (other academic)  

    In this paper, we take an approach to generating fonts using neural style transfer. Neural style transfer uses Convolutional Neural Networks (CNN) to transfer the style of one image to another. By modifying neural style transfer, we can achieve neural font style transfer. We also demonstrate the effects of using different weighting factors, character placements, and orientations. In addition, we show the results of using non-Latin alphabets, non-text patterns, and non-text images as style images. Finally, we provide insight into the characteristics of style transfer with fonts.

    DOI: 10.1109/ICDAR.2017.328

  • Globally Optimal Object Tracking with Complementary Use of Single Shot Multibox Detector and Fully Convolutional Network Reviewed

    #Jinho Lee, Brian Kenji Iwana, #Shota Ide, Hideaki Hayashi, and Seiichi Uchida

    Pacific-Rim Symposium on Image and Video Technology (PSIVT)   110 - 112   2017.7

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Tracking is one of the most important but still difficult tasks in computer vision and pattern recognition. The main difficulties in the tracking field are appearance variation and occlusion. Most traditional tracking methods set parameters or templates for tracking target objects in advance, and these must be modified accordingly. Thus, we propose a new and robust tracking method that uses a Fully Convolutional Network (FCN) to obtain an object probability map and Dynamic Programming (DP) to seek the globally optimal path through all frames of a video. Our proposed method solves the object appearance variation problem with the use of an FCN and deals with occlusion by DP. We show that our method is effective in tracking various single objects through video frames.

  • Efficient temporal pattern recognition by means of dissimilarity space embedding with discriminative prototypes Reviewed International journal

    Brian Kenji Iwana, Kaspar Riesen, Volkmar Frinken, and Seiichi Uchida

    Pattern Recognition   64   268 - 276   2017.4

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    Dissimilarity space embedding (DSE) presents a method of representing data as vectors of dissimilarities. This representation is interesting for its ability to use a dissimilarity measure to embed various patterns (e.g. graph patterns with different topology and temporal patterns with different lengths) into a vector space. The method proposed in this paper uses a dynamic time warping (DTW) based DSE for the classification of massive sets of temporal patterns. However, using large data sets introduces the problem of high computational cost. To address this, we consider a prototype selection approach. A vector space created by DSE offers us the ability to treat its independent dimensions as features, allowing for the use of feature selection. The proposed method exploits this and reduces the number of prototypes required for accurate classification. To validate the proposed method, we use two-class classification on a data set of handwritten on-line numerical digits. We show that by using DSE with ensemble classification, high-accuracy classification is possible with very few prototypes.

    DOI: 10.1016/j.patcog.2016.11.013

    Other Link: https://www.sciencedirect.com/science/article/abs/pii/S0031320316303739?via%3Dihub
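
    The embedding itself is simple to sketch: a variable-length sequence becomes the fixed-length vector of its DTW distances to a set of prototypes. A minimal Python sketch (illustrative only; the paper's key contribution, AdaBoost-based selection of discriminative prototypes, is not shown):

```python
import numpy as np

def dtw_dist(a, b):
    """Minimal DTW distance between 1-D sequences (absolute-difference
    local cost). Sketch, not a tuned implementation."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def dse_embed(x, prototypes):
    """Dissimilarity space embedding: a (possibly variable-length)
    sequence is represented by its DTW distances to the prototypes,
    giving a fixed-length vector usable by any vector-space classifier."""
    return [dtw_dist(x, p) for p in prototypes]

prototypes = [[0, 0, 0], [1, 2, 3]]
vec = dse_embed([1, 1, 2, 3], prototypes)
# vec[1] == 0.0: the query warps exactly onto the second prototype
```

    Each dimension of `vec` is one prototype's dissimilarity, which is what allows feature selection over prototypes as described in the abstract.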

  • A further step to perfect accuracy by training CNN with larger data Reviewed

    Seiichi Uchida, #Shota Ide, Brian Kenji Iwana, and Anna Zhu

    International Conference on Frontiers in Handwriting Recognition (ICFHR)   405 - 410   2016.10

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Convolutional Neural Networks (CNN) are at the forefront of accurate character recognition. This paper explores CNNs at their maximum capacity by using large datasets. We show near-perfect performance using a dataset of about 820,000 real samples of isolated handwritten digits, much larger than the conventional MNIST database. In addition, we report near-perfect performance on the recognition of machine-printed digits and multi-font digital-born digits. Also, in order to progress toward a universal OCR, we propose methods of combining the datasets into one classifier. This paper reveals the effects of combining the datasets prior to training and the effects of transfer learning during training. The results of the proposed methods also show an almost perfect accuracy, suggesting the ability of the network to generalize to all forms of text.

    DOI: 10.1109/ICFHR.2016.0082

  • A Robust Dissimilarity-Based Neural Network for Temporal Pattern Recognition Reviewed

    Brian Kenji Iwana, Volkmar Frinken, and Seiichi Uchida

    International Conference on Frontiers in Handwriting Recognition (ICFHR)   265 - 270   2016.10

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Temporal pattern recognition is challenging because temporal patterns require extra considerations over other data types, such as order, structure, and temporal distortions. Recently, there has been a trend toward using large data and deep learning; however, many of the tools cannot be directly used with temporal patterns. Convolutional Neural Networks (CNN), for instance, are traditionally used for visual and image pattern recognition. This paper proposes a method using a neural network to classify isolated temporal patterns directly. The proposed method uses dynamic time warping (DTW) as a kernel-like function to learn dissimilarity-based feature maps as the basis of the network. We show that using the proposed DTW-NN, efficient classification of on-line handwritten digits is possible with accuracies comparable to state-of-the-art methods.

    DOI: 10.1109/ICFHR.2016.0058

  • Detecting text in natural scene images with conditional clustering and convolution neural network Reviewed International journal

    Anna Zhu, Guoyou Wang, Yangbo Dong, and Brian Kenji Iwana

    Journal of Electronic Imaging   24 ( 5 )   053019   2015.9

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    We present a robust method of detecting text in natural scenes. The work consists of four parts. First, the images are automatically partitioned into different layers based on conditional clustering. The clustering operates in two sequential ways. One has a constrained clustering center and conditionally determined cluster numbers, which generate small-size subregions. The other has fixed cluster numbers, which generate full-size subregions. After the clustering, we obtain a set of connected components (CCs) in each subregion. In the second step, a convolutional neural network (CNN) is used to classify those CCs as character components or non-character ones. The output score of the CNN can be converted into the posterior probability of characters. Then we group the candidate characters into text strings based on the probability and location. Finally, we use a verification step. We choose a multichannel strategy to evaluate the performance on the public datasets ICDAR2011 and ICDAR2013. The experimental results demonstrate that our algorithm achieves superior performance compared with state-of-the-art text detection algorithms.

    DOI: 10.1117/1.JEI.24.5.053019

    Other Link: https://www.spiedigitallibrary.org/journals/Journal-of-Electronic-Imaging/volume-24/issue-5/053019/Detecting-text-in-natural-scene-images-with-conditional-clustering-and/10.1117/1.JEI.24.5.053019.short

  • Tackling Temporal Pattern Recognition by Vector Space Embedding Reviewed

    Brian Kenji Iwana, Seiichi Uchida, Kaspar Riesen, and Volkmar Frinken

    International Conference on Document Analysis and Recognition (ICDAR)   816 - 820   2015.8

     More details

    Language:English   Publishing type:Research paper (other academic)  

    This paper introduces a novel method of reducing the number of prototype patterns necessary for accurate recognition of temporal patterns. The nearest neighbor (NN) method is an effective tool in pattern recognition, but the downside is that it can be computationally costly when using large quantities of data. To solve this problem, we propose a method of representing temporal patterns by embedding dynamic time warping (DTW) distance based dissimilarities in vector space. Adaptive boosting (AdaBoost) is then applied for classifier training and feature selection to reduce the number of prototype patterns required for accurate recognition. With a data set of handwritten digits provided by the International Unipen Foundation (iUF), we show that a large quantity of temporal data can be efficiently classified, producing results similar to the established NN method at a much smaller computational cost.

    DOI: 10.1109/ICDAR.2015.7333875

▼display all

Presentations

  • Model Selection with a Shapelet-based Distance Measure for Multi-source Transfer Learning in Time Series Classification International conference

    Jiseok Lee

    International Conference on Pattern Recognition (ICPR) 

     More details

    Event date: 2024.12

    Language:English  

    Venue:Kolkata   Country:India  

  • Improving the Robustness of Time Series Neural Networks from Adversarial Attacks Using Time Warping International conference

    Yoh Yamashita

    International Conference on Pattern Recognition (ICPR) 

     More details

    Event date: 2024.12

    Language:English  

    Venue:Kolkata   Country:India  

  • Improving Online Handwriting Recognition with Transfer Learning Using Out-of-Domain and Different-Dimensional Sources International conference

    Jiseok Lee

    International Conference on Pattern Recognition (ICPR) 

     More details

    Event date: 2024.12

    Language:English  

    Venue:Kolkata   Country:India  

  • Test Time Augmentation as a Defense Against Adversarial Attacks on Online Handwriting International conference

    #Yoh Yamashita, Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)  2024.9 

     More details

    Event date: 2024.4

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Athens   Country:Greece  

  • Using Motif-Based Features to Improve Signal Classification with Temporal Neural Networks International conference

    Brian Kenji Iwana

    Asian Conference on Pattern Recognition (ACPR)  2023.11 

     More details

    Event date: 2023.10

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Kitakyushu   Country:Japan  

  • Vision Conformer: Incorporating Convolutions into Vision Transformer Layers International conference

    Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)  2023.8 

     More details

    Event date: 2023.10

    Language:English  

    Venue:San Jose   Country:United States  

  • On Mini-Batch Training with Varying Length Time Series International conference

    Brian Kenji Iwana

    International Conference on Acoustics, Speech, and Signal Processing (ICASSP)  2022.5 

     More details

    Event date: 2022.5

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Singapore   Country:Singapore  

  • Font Style that Fits an Image -- Font Generation Based on Image Context International conference

    Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)  2021.9 

     More details

    Event date: 2021.9

    Language:English   Presentation type:Oral presentation (general)  

    Country:Switzerland  

  • Time Series Data Augmentation for Neural Networks by Time Warping with a Discriminative Teacher International conference

    Brian Kenji Iwana

    International Conference on Pattern Recognition (ICPR)  2021.1 

     More details

    Event date: 2021.1

    Language:English   Presentation type:Oral presentation (general)  

    Country:Italy  

  • What is the Reward for Handwriting? --- Handwriting Generation by Imitation Learning International conference

    Brian Kenji Iwana

    International Conference on Frontiers in Handwriting Recognition (ICFHR)  2020.8 

     More details

    Event date: 2020.10

    Language:English  

    Venue:Online   Country:Other  

    Analyzing the handwriting generation process is an important issue and has been tackled by various generation models, such as kinematics-based models and stochastic models. In this study, we use a reinforcement learning (RL) framework to realize handwriting generation with careful future-planning ability. In fact, the handwriting process of human beings is also supported by their future-planning ability; for example, this ability is necessary to generate a closed trajectory like '0' because any shortsighted model, such as a Markovian model, cannot generate it. For the algorithm, we employ generative adversarial imitation learning (GAIL). Typical RL algorithms require the manual definition of the reward function, which is crucial for controlling the generation process. In contrast, GAIL trains the reward function along with the other modules of the framework. In other words, through GAIL, we can understand the reward of the handwriting generation process from handwriting examples. Our experimental results qualitatively and quantitatively show that the learned reward captures the trends in handwriting generation and thus GAIL is well suited for the acquisition of handwriting behavior.

  • Effect of Text Color on Word Embeddings International conference

    Brian Kenji Iwana

    International Workshop on Document Analysis Systems (DAS)  2020.6 

     More details

    Event date: 2020.10

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Online   Country:Other  

    In natural scenes and documents, we can find a correlation between a text and its color. For instance, the word "hot" is often printed in red, while "cold" is often in blue. This correlation can be thought of as a feature that represents the semantic difference between words. Based on this observation, we propose the idea of using text color for word embeddings. While text-only word embeddings (e.g., word2vec) have been extremely successful, they often represent antonyms as similar because antonyms are often interchangeable in sentences. In this paper, we try two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from book cover images and analyze the correlation between the color and meaning of each word. Second, we retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the embeddings of synonyms and antonyms before and after retraining, we aim to understand which kinds of words are affected positively or negatively by incorporating text color information.

  • Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance Propagation International conference

    Brian Kenji Iwana

    International Conference on Computer Vision Workshops  2019.10 

     More details

    Event date: 2019.10

    Language:English   Presentation type:Symposium, workshop panel (public)  

    Venue:Seoul   Country:Korea, Republic of  

    Convolutional Neural Networks (CNNs) have become state-of-the-art in the field of image classification. However, not everything is understood about their inner representations. This paper tackles the interpretability and explainability of the predictions of CNNs for multi-class classification problems. Specifically, we propose a novel visualization method of pixel-wise input attribution called Softmax-Gradient Layer-wise Relevance Propagation (SGLRP). The proposed model is a class-discriminative extension of Deep Taylor Decomposition (DTD) that uses the gradient of the softmax to backpropagate the relevance of the output probability to the input image. Through qualitative and quantitative analysis, we demonstrate that SGLRP can successfully localize and attribute the regions of input images that contribute to a target object's classification. We show that the proposed method excels at discriminating the target object's class from the other possible objects in the image. We confirm that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP) based methods and can help in understanding the decision process of CNNs.
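    Although the full SGLRP propagation is beyond a short sketch, the softmax-gradient term that initializes the relevance can be illustrated in a few lines of NumPy. This is an illustrative sketch of the standard softmax Jacobian row for a target class, s_t(δ_tk − s_k); the function name is a hypothetical label, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_gradient_relevance(logits, target):
    """Sketch of the relevance initialization idea in SGLRP: the
    gradient of the target-class softmax probability with respect
    to each logit, s_t * (delta_tk - s_k)."""
    s = softmax(np.asarray(logits, dtype=float))
    grad = -s[target] * s          # -s_t * s_k for every class k
    grad[target] += s[target]      # add s_t for the target class itself
    return grad
```

A useful property of this initialization is that the relevance is positive for the target class and negative for the competing classes, and sums to zero, which is what makes the propagation class-discriminative.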

  • Selective Super-Resolution for Scene Text Images International conference

    Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)  2019.9 

     More details

    Event date: 2019.9

    Language:English  

    Venue:Sydney   Country:Australia  

    In this paper, we enhance super-resolution for images containing scene text. Specifically, we propose Super-Resolution Convolutional Neural Networks (SRCNNs) constructed to tackle issues associated with characters and text. We demonstrate that a standard SRCNN trained for general object super-resolution is not sufficient and that the proposed method is a viable way to create a robust model for text. To do so, we analyze the characteristics of SRCNNs through quantitative and qualitative evaluations with scene text data. In addition, we analyze the correlation between layers using Singular Vector Canonical Correlation Analysis (SVCCA) and compare the filters of each SRCNN using t-SNE. Furthermore, in order to create a unified super-resolution model specialized for both text and objects, we use SRCNNs trained with the different data types combined by Content-wise Network Fusion (CNF). We fuse the SRCNN trained on character images with the SRCNN trained on general object images, and verify the accuracy improvement on scene images that include text. We also examine how each SRCNN affects the super-resolved images after fusion.

  • Modality Conversion of Handwritten Patterns by Cross Variational Autoencoders International conference

    Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)  2019.9 

     More details

    Event date: 2019.9

    Language:English  

    Venue:Sydney   Country:Australia  

    This research attempts to construct a network that can convert online and offline handwritten characters to each other. The proposed network consists of two Variational Auto-Encoders (VAEs) with a shared latent space. The VAEs are trained to generate online and offline handwritten Latin characters simultaneously; in this way, we create a cross-modal VAE (Cross-VAE). During training, the proposed Cross-VAE is trained to minimize the reconstruction loss of the two modalities, the distribution loss of the two VAEs, and a novel third loss called the space sharing loss. This third loss encourages the modalities to share the same latent space by penalizing the distance between the latent variables. Through the proposed method, mutual conversion of online and offline handwritten characters is possible. In this paper, we demonstrate the performance of the Cross-VAE through qualitative and quantitative analysis.
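    The space sharing loss described above can be sketched minimally. The abstract only states that the loss is a distance between the latent variables of the two encoders, so the squared-L2 form and the function name below are assumptions for illustration.

```python
import numpy as np

def space_sharing_loss(z_online, z_offline):
    """Sketch of a space sharing loss: pull the latent codes of the
    online and offline encoders for the same character toward the
    same point in the shared latent space (assumed squared L2)."""
    z_online = np.asarray(z_online, dtype=float)
    z_offline = np.asarray(z_offline, dtype=float)
    return float(np.mean((z_online - z_offline) ** 2))
```

Paired samples whose latent codes coincide incur zero loss, so minimizing this term alongside the reconstruction and distribution losses drives the two modalities into one shared space.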

  • Dynamic Weight Alignment for Temporal Convolutional Neural Networks International conference

    Brian Kenji Iwana

    International Conference on Acoustics, Speech, and Signal Processing (ICASSP)  2019.5 

     More details

    Event date: 2019.5

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Brighton   Country:United Kingdom  

    In this paper, we propose a method of improving temporal Convolutional Neural Networks (CNNs) by determining the optimal alignment of weights and inputs using dynamic programming. Conventional CNN convolutions linearly match the shared weights to a window of the input. However, a better alignment of the weights may exist. Thus, we propose the use of Dynamic Time Warping (DTW) to dynamically align the weights to the input of the convolutional layer. Specifically, the dynamic alignment overcomes issues such as temporal distortion by finding the minimal-distance matching of the weights and the inputs under constraints. We demonstrate the effectiveness of the proposed architecture on the Unipen online handwritten digit and character datasets, the UCI Spoken Arabic Digit dataset, and the UCI Activities of Daily Life dataset.
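    As a rough illustration of the idea (not the authors' implementation), the following NumPy sketch aligns a 1-D weight vector to an input window with DTW and then sums the weight-input products along the warping path instead of index by index. The function names and the absolute-difference local cost are assumptions.

```python
import numpy as np

def dtw_alignment(w, x):
    """DTW between a 1-D weight vector w and an input window x;
    returns the DTW distance and the optimal warping path."""
    n, m = len(w), len(x)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(w[i - 1] - x[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the corner to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

def warped_convolution(w, x):
    """Dot product of weights and inputs matched along the DTW path
    rather than by the usual linear (index-by-index) matching."""
    _, path = dtw_alignment(w, x)
    return sum(w[i] * x[j] for i, j in path)
```

When the input is undistorted, the warping path is the diagonal and the result reduces to the ordinary convolution dot product; under temporal distortion, the path re-pairs weights and inputs to minimize the matching distance.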

  • Capturing Micro Deformations from Pooling Layers for Offline Signature Verification International conference

    Brian Kenji Iwana

    International Conference on Document Analysis and Recognition (ICDAR)  2018.8 

     More details

    Event date: 2018.8

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Niagara Falls   Country:United States  

    In this paper, we propose a novel Convolutional Neural Network (CNN) based method that extracts the location information (displacement features) of the maxima in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between genuine signatures and skilled forgeries as a feature extraction procedure. After feature extraction, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. Extensive experimental results on the GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate well between genuine signatures and their corresponding skilled forgeries, achieving state-of-the-art results on these datasets.
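    The displacement features can be pictured as the within-window coordinates of each max-pooling maximum. Below is a minimal NumPy sketch under that reading; the function name and the 2x2 window are assumptions for illustration, not the paper's code.

```python
import numpy as np

def maxpool_with_displacement(x, pool=2):
    """Max pooling over non-overlapping pool x pool windows that also
    returns the (row, col) offset of each maximum inside its window,
    i.e. a simple form of displacement features."""
    h, w = x.shape
    oh, ow = h // pool, w // pool
    pooled = np.zeros((oh, ow))
    disp = np.zeros((oh, ow, 2), dtype=int)
    for i in range(oh):
        for j in range(ow):
            window = x[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool]
            r, c = divmod(int(np.argmax(window)), pool)
            pooled[i, j] = window[r, c]
            disp[i, j] = (r, c)
    return pooled, disp
```

The intuition is that a skilled forgery can reproduce the pooled magnitudes while the positions of the maxima shift slightly; keeping the offsets preserves those micro deformations for the classifier.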

  • Introducing Local Distance-based Features to Temporal Convolutional Neural Networks International conference

    Brian Kenji Iwana

    International Conference on Frontiers in Handwriting Recognition (ICFHR)  2018.8 

     More details

    Event date: 2018.8

    Language:English   Presentation type:Oral presentation (general)  

    Venue:Niagara Falls   Country:United States  

    In this paper, we propose the use of local distance-based features determined by Dynamic Time Warping (DTW) for temporal Convolutional Neural Networks (CNN). Traditionally, DTW is used as a robust distance metric for time series patterns. However, this traditional use of DTW only utilizes the scalar distance metric and discards the local distances between the dynamically matched sequence elements. This paper proposes recovering these local distances, or DTW features, and utilizing them for the input of a CNN. We demonstrate that these features can provide additional information for the classification of isolated handwritten digits and characters. Furthermore, we demonstrate that the DTW features can be combined with the spatial coordinate features in multi-modal fusion networks to achieve state-of-the-art accuracy on the Unipen online handwritten character datasets.
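    The recovery of local distances described above can be sketched for scalar sequences: run DTW, then keep the per-step distances along the warping path instead of only their sum. The function name and absolute-difference cost are assumptions, not the paper's code.

```python
import numpy as np

def dtw_local_distances(a, b):
    """DTW between scalar sequences a and b, returning the sequence of
    local distances |a_i - b_j| along the optimal warping path (the
    'DTW features' idea) rather than the single scalar DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack, collecting the local distance at every matched pair.
    feats, i, j = [], n, m
    while i > 0 and j > 0:
        feats.append(abs(a[i - 1] - b[j - 1]))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return feats[::-1]
```

Against a class prototype, this yields a distance sequence aligned to the input, which can then be stacked with the raw coordinates as additional input channels for a temporal CNN.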

  • Advances in Temporal Pattern Recognition and its Effect on AI Invited

    Brian Kenji Iwana

    2024.5  Q-AOS

     More details

    Language:English   Presentation type:Oral presentation (invited, special)  

  • Facial Gesture Classification with Few-shot Learning Using Limited Calibration Data from Photo-reflective Sensors on Smart Eyewear International coauthorship International conference

    Katsutoshi Masai

    International Conference on Mobile and Ubiquitous Multimedia (MUM)  2024.12 

     More details

    Language:English   Presentation type:Oral presentation (general)  

    Country:Sweden  

    DOI: 10.1145/3701571.3701595


Works

Industrial property rights

Patent   Number of applications: 0   Number of registrations: 1
Utility model   Number of applications: 0   Number of registrations: 0
Design   Number of applications: 0   Number of registrations: 0
Trademark   Number of applications: 0   Number of registrations: 0

Professional Memberships

  • IEEE

  • IEICE

Academic Activities

  • Co-Chair International contribution

    Role(s): Planning, management, etc., Review, evaluation, Peer review

    International Workshop on Document Analysis of Low-resource Languages (DALL)  2025.9

  • Program Committee International contribution

    Role(s): Review, evaluation, Peer review

    Int. Conf. on Document Analysis and Recognition  2025.9

  • Reviewer International contribution

    Role(s): Review, evaluation, Peer review

    IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)  2025.2 - 2025.3

  • Program Committee International contribution

    Role(s): Review, evaluation, Peer review

    Int. Conf. on Pattern Recognition Applications and Methods  2025.2

  • Program Committee International contribution

    Int. Conf. on Document Analysis and Recognition  ( Greece / United States of America ) 2024.8 - 2024.9

     More details

    Type:Competition, symposium, etc. 

  • Session Chair

    Joint Conference of the Kyushu Branches of Information-related Societies (情報関係学会九州支部連合大会)  ( Kumamoto Japan ) 2023.9

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    Int. Conf. on Document Analysis and Recognition  ( San Jose, United States of America ) 2023.8

     More details

    Type:Competition, symposium, etc. 

  • Reviewer International contribution

    Int. Conf. on Acoustics, Speech, and Signal Processing  ( Rhodes Island Greece ) 2023.6

     More details

    Type:Competition, symposium, etc. 

  • Senior Committee Member International contribution

    Int. Conf. on Frontiers in Handwriting Recognition  ( Hyderabad India ) 2022.12

     More details

    Type:Competition, symposium, etc. 

  • Reviewer International contribution

    Int. Conf. on Pattern Recognition  ( Montreal Canada ) 2022.8

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    International Workshop on Document Analysis Systems  ( La Rochelle France ) 2022.5

     More details

    Type:Competition, symposium, etc. 

  • Reviewer International contribution

    Winter Conference on Applications of Computer Vision  ( Waikoloa, United States of America ) 2022.1

     More details

    Type:Competition, symposium, etc. 

  • Screening of academic papers

    Role(s): Peer review

    2022

     More details

    Type:Peer review 

    Number of peer-reviewed articles in foreign language journals:6

    Number of peer-reviewed articles in Japanese journals:0

    Proceedings of International Conference Number of peer-reviewed papers:17

    Proceedings of domestic conference Number of peer-reviewed papers:0

  • Program Committee International contribution

    Int. Conf. on Document Analysis and Recognition  ( Lausanne Switzerland ) 2021.9

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    ICDAR Workshop on Machine Learning  ( Lausanne Switzerland ) 2021.9

     More details

    Type:Competition, symposium, etc. 

  • Springer Nature Computer Science International contribution

    2021.8 - 2032.7

     More details

    Type:Academic society, research group, etc. 

  • Program Committee International contribution

    AAAI Conf. on Artificial Intelligence  ( Vancouver Canada ) 2021.2

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    Int. Conf. on Frontiers in Handwriting Recognition  ( Dortmund Germany ) 2020.9

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    International Workshop on Document Analysis Systems  ( Online (originally Wuhan) China ) 2020.5

     More details

    Type:Competition, symposium, etc. 

  • Screening of academic papers

    Role(s): Peer review

    2020

     More details

    Type:Peer review 

    Number of peer-reviewed articles in foreign language journals:12

    Number of peer-reviewed articles in Japanese journals:0

    Proceedings of International Conference Number of peer-reviewed papers:7

    Proceedings of domestic conference Number of peer-reviewed papers:0

  • Program Committee International contribution

    ICDAR Workshop on Machine Learning  ( Sydney Australia ) 2019.9

     More details

    Type:Competition, symposium, etc. 

  • Screening of academic papers

    Role(s): Peer review

    2019

     More details

    Type:Peer review 

    Number of peer-reviewed articles in foreign language journals:15

    Number of peer-reviewed articles in Japanese journals:0

    Proceedings of International Conference Number of peer-reviewed papers:12

    Proceedings of domestic conference Number of peer-reviewed papers:0

  • Session Chair

    Joint Workshop on Machine Perception and Robotics  ( Fukuoka Japan ) 2018.10

     More details

    Type:Competition, symposium, etc. 

  • Reviewer International contribution

    Int. Conf. on Pattern Recognition  ( Beijing China ) 2018.8

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    International Workshop on Document Analysis Systems  ( Vienna Austria ) 2018.4

     More details

    Type:Competition, symposium, etc. 

  • Program Committee International contribution

    ICDAR Workshop on Machine Learning  ( Kyoto Japan ) 2017.11

     More details

    Type:Competition, symposium, etc. 


Research Projects

  • Tackling real-world time series using dynamic neural networks

    Grant number:23K16949  2023 - 2025

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Early-Career Scientists

      More details

    Authorship:Principal investigator  Grant type:Scientific research funding

  • Dynamic Neural Architecture Warping for Time Series Recognition

    Grant number:21K17808  2021 - 2023

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Early-Career Scientists

      More details

    Authorship:Principal investigator  Grant type:Scientific research funding

  • Tackling Real-World Data with Multi-Modal Representation and Augmentation

    2021

    Q-Dai Jump Wakaba Challenge

      More details

    Authorship:Principal investigator  Grant type:Contract research

  • Time series recognition

    2018.4 - 2023.3

    Joint research

      More details

    Authorship:Coinvestigator(s)  Grant type:Other funds from industry-academia collaboration

Educational Activities

  • Pattern Recognition and Data Processing English version

Class subject

  • Fundamentals of Computer Systems A/B

    2025.10 - 2026.2   Second semester

  • Fundamentals of Computer Systems A/B

    2024.10 - 2025.3   Second semester

  • Fundamentals of Computer Systems A/B

    2023.10 - 2024.3   Second semester

  • Fundamentals of Electrical Engineering and Computer Science I

    2022.10 - 2023.3   Second semester

  • Fundamentals of Computer Systems A/B

    2022.10 - 2023.3   Second semester

  • Fundamentals of Computer Systems A/B

    2021.10 - 2022.3   Second semester

  • SLS Biological Data Processing (English version)

    2021.4 - 2021.6   Spring quarter

  • Fundamentals of Computer Systems A/B

    2020.10 - 2021.3   Second semester

  • SLS Biological Data Processing (English version)

    2019.4 - 2019.6   Spring quarter

  • ISEE Pattern Recognition (English version)

    2019.4 - 2019.6   Spring quarter
