Kyushu University Academic Staff Educational and Research Activities Database
List of Papers
Seiichi Uchida Last modified date: 2024.03.01

Professor / Real World Robotics / Department of Advanced Information Technology / Faculty of Information Science and Electrical Engineering


Papers
1. Wataru Shimoda, Daichi Haraguchi, Seiichi Uchida, Kota Yamaguchi, Towards Diverse and Consistent Typography Generation, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV 2024), 2024.01.
2. Kaoru Takabayashi, Taku Kobayashi, Katsuyoshi Matsuoka, Barrett G Levesque, Takuji Kawamura, Kiyohito Tanaka, Takeaki Kadota, Ryoma Bise, Seiichi Uchida, Takanori Kanai, Haruhiko Ogata, Artificial intelligence quantifying endoscopic severity of ulcerative colitis in gradation scale., Digestive endoscopy : official journal of the Japan Gastroenterological Endoscopy Society, 10.1111/den.14677, 2023.09, OBJECTIVES: Existing endoscopic scores for ulcerative colitis (UC) objectively categorize disease severity based on the presence or absence of endoscopic findings; therefore, it may not reflect the range of clinical severity within each category. However, inflammatory bowel disease (IBD) expert endoscopists categorize the severity and diagnose the overall impression of the degree of inflammation. This study aimed to develop an artificial intelligence (AI) system that can accurately represent the assessment of the endoscopic severity of UC by IBD expert endoscopists. METHODS: A ranking-convolutional neural network (ranking-CNN) was trained using comparative information on the UC severity of 13,826 pairs of endoscopic images created by IBD expert endoscopists. Using the trained ranking-CNN, the UC Endoscopic Gradation Scale (UCEGS) was used to express severity. Correlation coefficients were calculated to ensure that there were no inconsistencies in assessments of severity made using UCEGS diagnosed by the AI and the Mayo Endoscopic Subscore, and the correlation coefficients of the mean for test images assessed using UCEGS by four IBD expert endoscopists and the AI. RESULTS: Spearman's correlation coefficient between the UCEGS diagnosed by AI and Mayo Endoscopic Subscore was approximately 0.89. The correlation coefficients between IBD expert endoscopists and the AI of the evaluation results were all higher than 0.95 (P 
3. Kanae Masuda, Eriko Kuwada, Maria Suzuki, Tetsuya Suzuki, Takeshi Niikawa, Seiichi Uchida, Takashi Akagi, Transcriptomic interpretation on explainable AI-guided intuition uncovers premonitory reactions of disordering fate in persimmon fruit., Plant & cell physiology, 10.1093/pcp/pcad050, 2023.05, Deep neural network (DNN) techniques, as an advanced machine learning framework, have allowed various image diagnoses in plants, which often achieve better prediction performance than human experts in each specific field. Notwithstanding, in plant biology, the application of deep neural networks is still mostly limited to rapid and effective phenotyping. Recent development of explainable CNN frameworks has allowed visualization of the features in the prediction by convolutional neural network (CNN), which potentially contributes to the understanding of physiological mechanisms in objective phenotypes. In this study, we propose an integration of explainable CNN and transcriptomic approach to make a physiological interpretation of a fruit internal disorder in persimmon, rapid over-softening. We constructed CNN models to accurately predict the fate to be rapid softening in persimmon cv. Soshu, only with photo images. The explainable CNNs, such as Grad-CAM and Guided Grad-CAM, visualized specific featured regions relevant to the prediction of rapid-softening, which would correspond to the premonitory symptoms in a fruit. Transcriptomic analyses to compare the featured regions of predicted rapid-softening and control fruits suggested that rapid softening is triggered by precocious ethylene signal-dependent cell wall modification, despite exhibiting no direct phenotypic changes. Further transcriptomic comparison between the featured and non-featured regions in predicted rapid-softening fruit suggested that premonitory symptoms reflected hypoxia and the related stress signals finally to induce ethylene signals. 
These results would provide a good example for the collaboration of image analysis and omics approaches in plant physiology, which uncovered a novel aspect of fruit premonitory reactions in the rapid softening fate.
4. Cluster-Guided Semi-Supervised Domain Adaptation for Imbalanced Medical Image Classification.
5. Disease Severity Regression with Continuous Data Augmentation.
6. Hirofumi Ohga, Koki Shibata, Ryo Sakanoue, Takuma Ogawa, Hajime Kitano, Satoshi Kai, Kohei Ohta, Naoki Nagano, Tomoya Nagasako, Seiichi Uchida, Tetsushi Sakuma, Takashi Yamamoto, Sangwan Kim, Kosuke Tashiro, Satoru Kuhara, Koichiro Gen, Atushi Fujiwara, Yukinori Kazeto, Takanori Kobayashi, Michiya Matsuyama, Development of a chub mackerel with less-aggressive fry stage by genome editing of arginine vasotocin receptor V1a2., Scientific reports, 10.1038/s41598-023-30259-x, 13, 1, 3190-3190, 2023.02, Genome editing is a technology that can remarkably accelerate crop and animal breeding via artificial induction of desired traits with high accuracy. This study aimed to develop a chub mackerel variety with reduced aggression using an experimental system that enables efficient egg collection and genome editing. Sexual maturation and control of spawning season and time were technologically facilitated by controlling the photoperiod and water temperature of the rearing tank. In addition, appropriate low-temperature treatment conditions for delaying cleavage, shape of the glass capillary, and injection site were examined in detail in order to develop an efficient and robust microinjection system for the study. An arginine vasotocin receptor V1a2 (V1a2) knockout (KO) strain of chub mackerel was developed in order to reduce the frequency of cannibalistic behavior at the fry stage. Video data analysis using bioimage informatics quantified the frequency of aggressive behavior, indicating a significant 46% reduction (P = 0.0229) in the frequency of cannibalistic behavior than in wild type. Furthermore, in the V1a2 KO strain, the frequency of collisions with the wall and oxygen consumption also decreased. Overall, the manageable and calm phenotype reported here can potentially contribute to the development of a stable and sustainable marine product..
7. Masato Suzuki, Shikiho Kawai, Chean Fei Shee, Ryoga Yamada, Seiichi Uchida, Tomoyuki Yasukawa, Development of a simultaneous electrorotation device with microwells for monitoring the rotation rates of multiple single cells upon chemical stimulation., Lab on a chip, 10.1039/d2lc00627h, 23, 4, 692-701, 2023.02, Here, we described a unique simultaneous electrorotation (ROT) device for monitoring the rotation rate of Jurkat cells via chemical stimulation without fluorescent labeling and an algorithm for estimating cell rotation rates. The device comprised two pairs of interdigitated array electrodes that were stacked orthogonally through a 20 μm-thick insulating layer with rectangular microwells. Four microelectrodes (two were patterned on the bottom of the microwells and the other two on the insulating layer) were arranged on each side of the rectangular microwells. The cells, which were trapped in the microwells, underwent ROT when AC voltages were applied to the four microelectrodes to generate a rotating electric field. These microwells maintained the cells even in fluid flows. Thereafter, the ROT rates of the trapped cells were estimated and monitored during the stimulation. We demonstrated the feasibility of estimating the chemical efficiency of cells by monitoring the ROT rates of the cells. After introducing a Jurkat cell suspension into the device, the cells were subjected to ROT by applying an AC signal. Further, the rotating cells were chemically stimulated by adding an ionomycin (a calcium ionophore)-containing aliquot. The ROT rate of the ionomycin-stimulated cells decreased gradually to 90% of the initial rate after 30 s. The ROT rate was reduced by an increase in membrane capacitance. Thus, our device enabled the simultaneous chemical stimulation-induced monitoring of the alterations in the membrane capacitances of many cells without fluorescent labeling..
8. Takashi Akagi, Kanae Masuda, Eriko Kuwada, Kouki Takeshita, Taiji Kawakatsu, Tohru Ariizumi, Yasutaka Kubo, Koichiro Ushijima, Seiichi Uchida, Genome-wide cis-decoding for expression design in tomato using cistrome data and explainable deep learning., The Plant cell, 10.1093/plcell/koac079, 34, 6, 2174-2187, 2022.05, In the evolutionary history of plants, variation in cis-regulatory elements (CREs) resulting in diversification of gene expression has played a central role in driving the evolution of lineage-specific traits. However, it is difficult to predict expression behaviors from CRE patterns to properly harness them, mainly because the biological processes are complex. In this study, we used cistrome datasets and explainable convolutional neural network (CNN) frameworks to predict genome-wide expression patterns in tomato (Solanum lycopersicum) fruit from the DNA sequences in gene regulatory regions. By fixing the effects of trans-acting factors using single cell-type spatiotemporal transcriptome data for the response variables, we developed a prediction model for crucial expression patterns in the initiation of tomato fruit ripening. Feature visualization of the CNNs identified nucleotide residues critical to the objective expression pattern in each gene, and their effects were validated experimentally in ripening tomato fruit. This cis-decoding framework will not only contribute to the understanding of the regulatory networks derived from CREs and transcription factor interactions, but also provides a flexible means of designing alleles for optimized expression..
9. Yuma Cho, Daichi Haraguchi, Kenta Shigetomi, Kenji Matsuzawa, Seiichi Uchida, Junichi Ikenouchi, Tricellulin secures the epithelial barrier at tricellular junctions by interacting with actomyosin., The Journal of cell biology, 10.1083/jcb.202009037, 221, 4, 2022.04, The epithelial cell sheet functions as a barrier to prevent invasion of pathogens. It is necessary to eliminate intercellular gaps not only at bicellular junctions, but also at tricellular contacts, where three cells meet, to maintain epithelial barrier function. To that end, tight junctions between adjacent cells must associate as closely as possible, particularly at tricellular contacts. Tricellulin is an integral component of tricellular tight junctions (tTJs), but the molecular mechanism of its contribution to the epithelial barrier function remains unclear. In this study, we revealed that tricellulin contributes to barrier formation by regulating actomyosin organization at tricellular junctions. Furthermore, we identified α-catenin, which is thought to function only at adherens junctions, as a novel binding partner of tricellulin. α-catenin bridges tricellulin attachment to the bicellular actin cables that are anchored end-on at tricellular junctions. Thus, tricellulin mobilizes actomyosin contractility to close the lateral gap between the TJ strands of the three proximate cells that converge on tricellular junctions..
10. Yukako Oda, Chisato Takahashi, Shota Harada, Shun Nakamura, Daxiao Sun, Kazumi Kiso, Yuko Urata, Hitoshi Miyachi, Yoshinori Fujiyoshi, Alf Honigmann, Seiichi Uchida, Yasushi Ishihama, Fumiko Toyoshima, Discovery of anti-inflammatory physiological peptides that promote tissue repair by reinforcing epithelial barrier formation., Science advances, 10.1126/sciadv.abj6895, 7, 47, eabj6895, 2021.11, Epithelial barriers that prevent dehydration and pathogen invasion are established by tight junctions (TJs), and their disruption leads to various inflammatory diseases and tissue destruction. However, a therapeutic strategy to overcome TJ disruption in diseases has not been established because of the lack of clinically applicable TJ-inducing molecules. Here, we found TJ-inducing peptides (JIPs) in mice and humans that corresponded to 35 to 42 residue peptides of the C terminus of alpha 1-antitrypsin (A1AT), an acute-phase anti-inflammatory protein. JIPs were inserted into the plasma membrane of epithelial cells, which promoted TJ formation by directly activating the heterotrimeric G protein G13. In a mouse intestinal epithelial injury model established by dextran sodium sulfate, mouse or human JIP administration restored TJ integrity and strongly prevented colitis. Our study has revealed TJ-inducing anti-inflammatory physiological peptides that play a critical role in tissue repair and proposes a previously unidentified therapeutic strategy for TJ-disrupted diseases..
11. Brian Kenji Iwana, Seiichi Uchida, An empirical survey of data augmentation for time series classification with neural networks., PloS one, 10.1371/journal.pone.0254841, 16, 7, e0254841, 2021.07, In recent times, deep artificial neural networks have achieved many successes in pattern recognition. Part of this success can be attributed to the reliance on big data to increase generalization. However, in the field of time series recognition, many datasets are often very small. One method of addressing this problem is through the use of data augmentation. In this paper, we survey data augmentation techniques for time series and their application to time series classification with neural networks. We propose a taxonomy and outline the four families in time series data augmentation, including transformation-based methods, pattern mixing, generative models, and decomposition methods. Furthermore, we empirically evaluate 12 time series data augmentation methods on 128 time series classification datasets with six different types of neural networks. Through the results, we are able to analyze the characteristics, advantages and disadvantages, and recommendations of each data augmentation method. This survey aims to help in the selection of time series data augmentation for neural network applications..
12. What is AI? Introductory edition (Special feature: Let's get started with AI: Recommendations for introducing AI into civil engineering).
13. Kana Aoki, Shota Harada, Keita Kawaji, Kenji Matsuzawa, Seiichi Uchida, Junichi Ikenouchi, STIM-Orai1 Signaling Regulates Fluidity of Cytoplasm during Membrane Blebbing, Nature Communications, 10.1038/s41467-020-20826-5, 2021.01.
14. Heon Song, Daiki Suehiro, Seiichi Uchida, Adaptive aggregation of arbitrary online trackers with a regret bound, Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, 10.1109/WACV45572.2020.9093613, 670-678, 2020.03, We propose an online visual-object tracking method that is robust even in an adversarial environment, where various disturbances may occur on the target appearance, etc. The proposed method is based on a delayed-Hedge algorithm for aggregating multiple arbitrary online trackers with adaptive weights. The robustness in the tracking performance is guaranteed theoretically in terms of "regret" by the property of the delayed-Hedge algorithm. Roughly speaking, the proposed method can achieve a similar tracking performance as the best one among all the trackers to be aggregated in an adversarial environment. The experimental study on various tracking tasks shows that the proposed method could achieve state-of-the-art performance by aggregating various online trackers.
15. Kana Aoki, Shinsuke Satoi, Shota Harada, Seiichi Uchida, Yoh Iwasa, Junichi Ikenouchi, Coordinated changes in cell membrane and cytoplasm during maturation of apoptotic bleb, Molecular Biology of the Cell, 10.1091/MBC.E19-12-0691, 31, 8, 833-844, 2020.03, Apoptotic cells form membrane blebs, but little is known about how the formation and dynamics of membrane blebs are regulated. The size of blebs gradually increases during the progression of apoptosis, eventually forming large extracellular vesicles called apoptotic bodies that have immune-modulating activities. In this study, we investigated the molecular mechanism involved in the differentiation of blebs into apoptotic blebs by comparing the dynamics of the bleb formed during cell migration and the bleb formed during apoptosis. We revealed that the enhanced activity of ROCK1 is required for the formation of small blebs in the early phase of apoptosis, which leads to the physical disruption of nuclear membrane and the degradation of Lamin A. In the late phase of apoptosis, the loss of asymmetry in phospholipids distribution caused the enlargement of blebs, which enabled translocation of damage-associated molecular patterns to the bleb cytoplasm and maturation of functional apoptotic blebs. Thus, changes in cell membrane dynamics are closely linked to cytoplasmic changes during apoptotic bleb formation..
16. Xiaotong Ji, Yuchen Zheng, Daiki Suehiro, Seiichi Uchida, Optimal Rejection Function Meets Character Recognition Tasks, Proceedings of the 5th Asian Conference on Pattern Recognition, 2019.11.
17. Brian Kenji Iwana, Ryohei Kuroki, Seiichi Uchida, Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation, 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019 Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, 10.1109/ICCVW.2019.00513, 4176-4185, 2019.10, Convolutional Neural Networks (CNN) have become state-of-the-art in the field of image classification. However, not everything is understood about their inner representations. This paper tackles the interpretability and explainability of the predictions of CNNs for multi-class classification problems. Specifically, we propose a novel visualization method of pixel-wise input attribution called Softmax-Gradient Layer-wise Relevance Propagation (SGLRP). The proposed model is a class discriminate extension to Deep Taylor Decomposition (DTD) using the gradient of softmax to back propagate the relevance of the output probability to the input image. Through qualitative and quantitative analysis, we demonstrate that SGLRP can successfully localize and attribute the regions on input images which contribute to a target object's classification. We show that the proposed method excels at discriminating the target objects class from the other possible objects in the images. We confirm that SGLRP performs better than existing Layer-wise Relevance Propagation (LRP) based methods and can help in the understanding of the decision process of CNNs..
18. Yuchen Zheng, Wataru Ohyama, Brian Kenji Iwana, Seiichi Uchida, Capturing micro deformations from pooling layers for offline signature verification, 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019 Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 10.1109/ICDAR.2019.00180, 1111-1116, 2019.09, In this paper, we propose a novel Convolutional Neural Network (CNN) based method that extracts the location information (displacement features) of the maximums in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between the genuine signatures and skilled forgeries as a feature extraction procedure. After the feature extraction procedure, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. The extensive experimental results on GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate the genuine signatures and their corresponding skilled forgeries well and achieve state-of-the-art results on these datasets..
19. Takuro Karamatsu, Daiki Suehiro, Seiichi Uchida, Logo design analysis by ranking, 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019 Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 10.1109/ICDAR.2019.00238, 1482-1487, 2019.09, In this paper, we analyze logo designs by using machine learning, as a promising trial of graphic design analysis. Specifically, we will focus on favicon images, which are tiny logos used as company icons on web browsers, and analyze them to understand their trends in individual industry classes. For example, if we can catch the subtle trends in favicons of financial companies, they will suggest to us how professional designers express the atmosphere of financial companies graphically. For the purpose, we will use top-rank learning, which is one of the recent machine learning methods for ranking and very suitable for revealing the subtle trends in graphic designs..
20. Joonho Lee, Hideaki Hayashi, Wataru Ohyama, Seiichi Uchida, Page segmentation using a convolutional neural network with trainable co-occurrence features, 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019 Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 10.1109/ICDAR.2019.00167, 1023-1028, 2019.09, In document analysis, page segmentation is a fundamental task that divides a document image into semantic regions. In addition to local features, such as pixel-wise information, co-occurrence features are also useful for extracting texture-like periodic information for accurate segmentation. However, existing convolutional neural network (CNN)-based methods do not have any mechanisms that explicitly extract co-occurrence features. In this paper, we propose a method for page segmentation using a CNN with trainable multiplication layers (TMLs). The TML is specialized for extracting co-occurrences from feature maps, thereby supporting the detection of objects with similar textures and periodicities. This property is also considered to be effective for document image analysis because of regularity in text line structures, tables, etc. In the experiment, we achieved promising performance on a pixel-wise page segmentation task by combining TMLs with U-Net. The results demonstrate that TMLs can improve performance compared to the original U-Net. The results also demonstrate that TMLs are helpful for detecting regions with periodically repeating features, such as tables and main text..
21. Yuto Shinahara, Takuro Karamatsu, Daisuke Harada, Kota Yamaguchi, Seiichi Uchida, Serif or sans: Visual font analytics on book covers and online advertisements, 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019 Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019, 10.1109/ICDAR.2019.00170, 1041-1046, 2019.09, In this paper, we conduct a large-scale study of font statistics in book covers and online advertisements. Through the statistical study, we try to understand how graphic designers relate fonts and content genres and identify the relationship between font styles, colors, and genres. We propose an automatic approach to extract font information from graphic designs by applying a sequence of character detection, style classification, and clustering techniques to the graphic designs. The extracted font information is accumulated together with genre information, such as romance or business, for further trend analysis. Through our unique empirical study, we show that the collected font statistics reveal interesting trends in terms of how typographic design represents the impression and the atmosphere of the content genres.
22. Scene word recognition from pieces to whole.
23. Ryoma Bise, Kentaro Abe, Hideaki Hayashi, Kiyohito Tanaka, Seiichi Uchida, Efficient Soft-Constrained Clustering for Group-Based Labeling, 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2019 Medical Image Computing and Computer Assisted Intervention – MICCAI 2019 - 22nd International Conference, Proceedings, 10.1007/978-3-030-32254-0_47, 421-430, 2019.01, We propose a soft-constrained clustering method for group-based labeling of medical images. Since the idea of group-based labeling is to attach the label to a group of samples at once, we need to have groups (i.e., clusters) with high purity. The proposed method is formulated to achieve high purity even for difficult clustering tasks such as medical image clustering, where image samples of the same class are often very distant in their feature space. In fact, those images degrade the performance of conventional constrained clustering methods. Experiments with an endoscopy image dataset demonstrated that our method outperformed various state-of-the-art methods..
24. Comic Text Detection Using Neural Network Approach.
25. Daisuke Matsuoka, Masuo Nakano, Daisuke Sugiyama, Seiichi Uchida, Deep learning approach for detecting tropical cyclones and their precursors in the simulation by a cloud-resolving global nonhydrostatic atmospheric model, Progress in Earth and Planetary Science, 10.1186/s40645-018-0245-y, 5, 1, 2018.12, We propose a deep learning approach for identifying tropical cyclones (TCs) and their precursors. Twenty year simulated outgoing longwave radiation (OLR) calculated using a cloud-resolving global atmospheric simulation is used for training two-dimensional deep convolutional neural networks (CNNs). The CNNs are trained with 50,000 TCs and their precursors and 500,000 non-TC data for binary classification. Ensemble CNN classifiers are applied to 10 year independent global OLR data for detecting precursors and TCs. The performance of the CNNs is investigated for various basins, seasons, and lead times. The CNN model successfully detects TCs and their precursors in the western North Pacific in the period from July to November with a probability of detection (POD) of 79.9–89.1% and a false alarm ratio (FAR) of 32.8–53.4%. Detection results include 91.2%, 77.8%, and 74.8% of precursors 2, 5, and 7 days before their formation, respectively, in the western North Pacific. Furthermore, although the detection performance is correlated with the amount of training data and TC lifetimes, it is possible to achieve high detectability with a POD exceeding 70% and a FAR below 50% during TC season for several ocean basins, such as the North Atlantic, with a limited sample size and short lifetime.
29. A Trainable Multiplication Layer for Auto-correlation and Co-occurrence Extraction.
32. Qier Meng, Kiyohito Tanaka, Shin'ichi Satoh, Masaru Kitsuregawa, Yusuke Kurose, Tatsuya Harada, Hideaki Hayashi, Ryoma Bise, Seiichi Uchida, Masahiro Oda, Kensaku Mori, Anatomical location classification of gastroscopic images using DenseNet trained from Cyclical Learning Rate, MIRU2018, PS1-51, 2018.08.
33. Tsukamoto M, Chiba K, Sobu Y, Shiraki Y, Okumura Y, Hata S, Kitamura A, Nakaya T, Uchida S, Kinjo M, Taru H, Toshiharu S, The cytoplasmic region of the amyloid β-protein precursor (APP) is necessary and sufficient for the enhanced fast velocity of APP transport by kinesin-1., FEBS letters, 10.1002/1873-3468.13204, 592, 16, 2716-2724, 2018.08.
34. An Image-Based Representation for Graph Classification..
35. Discovering Class-Wise Trends of Max-Pooling in Subspace..
36. How do Convolutional Neural Networks Learn Design?.
37. Introducing Local Distance-Based Features to Temporal Convolutional Neural Networks..
38. On Fast Sample Preselection for Speeding up Convolutional Neural Network Training..
39. Biosignal Data Augmentation Based on Generative Adversarial Networks..
40. CNN Training with Graph-Based Sample Preselection: Application to Handwritten Character Recognition..
41. Contained Neural Style Transfer for Decorated Logo Generation..
42. Text Line Extraction Based on Integrated K-Shortest Paths Optimization..
43. Liuan Wang, Seiichi Uchida, Anna Zhu, Jun Sun, Human Reading Knowledge Inspired Text Line Extraction, Cognitive Computation, 10.1007/s12559-017-9490-4, 10, 1, 84-93, 2018.02, Text in images contains exact semantic information and the text knowledge can be utilized in many image cognition and understanding applications. The human reading habits can provide the clues of text line structure for text line extraction. In this paper, we propose a novel human reading knowledge inspired text line extraction method based on k-shortest paths global optimization. Firstly, the candidate character extraction is reformulated as Maximal Stable Extremal Region (MSER) algorithm on gray, red, blue, and green channels of the target images, and the extracted MSERs are fed into Convolutional Neural Network (CNN) to remove the noise components. Then, the directed graph is built upon the character component nodes with edges inspired by human reading sense. The directed graph can automatically construct the relationship to eliminate the disorder of candidate text components. The text line paths optimization is inspired by the human reading ability in planning of a text line path sequentially. Therefore, the text line extraction problem can be solved using the k-shortest paths optimization algorithm by taking advantage of the human reading sense structure of the directed graph. It can extract the text lines iteratively to avoid the exhaustive searching and obtain global optimized text line number. The proposed method achieves the f-measure of 0.820 and 0.812 on public ICDAR2011 and ICDAR2013 dataset, respectively. The experimental results demonstrate the effectiveness of the proposed human reading knowledge inspired text line extraction method in comparison with state-of-the-art methods. This paper presents one human reading knowledge inspired text line extraction method, which proves that the human reading knowledge can benefit the text line extraction and image text discovery..
44. Brian Kenji Iwana, Letao Zhou, Kumiko Tanaka-Ishii, Seiichi Uchida, Component Awareness in Convolutional Neural Networks, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2017.72, 1, 394-399, 2018.01, In this work, we investigate the ability of Convolutional Neural Networks (CNN) to infer the presence of components that comprise an image. In recent years, CNNs have achieved powerful results in classification, detection, and segmentation. However, these models learn from instance-level supervision of the detected object. In this paper, we determine if CNNs can detect objects using image-level weakly supervised labels without localization. To demonstrate that a CNN can infer awareness of objects, we evaluate a CNN's classification ability with a database constructed of Chinese characters with only character-level labeled components. We show that the CNN is able to achieve a high accuracy in identifying the presence of these components without specific knowledge of the component. Furthermore, we verify that the CNN is deducing the knowledge of the target component by comparing the results to an experiment with the component removed. This research is important for applications with large amounts of data without robust annotation such as Chinese character recognition..
45. Shota Ide, Seiichi Uchida, How Does a CNN Manage Different Printing Types?, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2017.167, 1, 1004-1009, 2018.01, In past OCR research, different OCR engines are used for different printing types, i.e., machine-printed characters, handwritten characters, and decorated fonts. A recent research, however, reveals that convolutional neural networks (CNN) can realize a universal OCR, which can deal with any printing types without pre-classification into individual types. In this paper, we analyze how CNN for universal OCR manage the different printing types. More specifically, we try to find where a handwritten character of a class and a machine-printed character of the same class are 'fused' in CNN. For analysis, we use two different approaches. The first approach is statistical analysis for detecting the CNN units which are sensitive (or insensitive) to type difference. The second approach is network-based visualization of pattern distribution in each layer. Both analyses suggest the same trend that types are not fully fused in convolutional layers but the distributions of the same class from different types become closer in upper layers..
46. Gantugs Atarsaikhan, Brian Kenji Iwana, Atsushi Narusawa, Keiji Yanai, Seiichi Uchida, Neural Font Style Transfer, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2017.328, 5, 51-56, 2018.01, In this paper, we chose an approach to generate fonts by using neural style transfer. Neural style transfer uses Convolutional Neural Networks (CNN) to transfer the style of one image to another. By modifying neural style transfer, we can achieve neural font style transfer. We also demonstrate the effects of using different weighted factors, character placements, and orientations. In addition, we show the results of using non-Latin alphabets, non-text patterns, and non-text images as style images. Finally, we provide insight into the characteristics of style transfer with fonts..
47. Toshiki Nakamura, Anna Zhu, Keiji Yanai, Seiichi Uchida, Scene Text Eraser, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2017.141, 1, 832-837, 2018.01, The character information in natural scene images contains various personal information, such as telephone numbers, home addresses, etc. It is a high risk of leakage the information if they are published. In this paper, we proposed a scene text erasing method to properly hide the information via an inpainting convolutional neural network (CNN) model. The input is a scene text image, and the output is expected to be text erased image with all the character regions filled up the colors of the surrounding background pixels. This work is accomplished by a CNN model through convolution to deconvolution with interconnection process. The training samples and the corresponding inpainting images are considered as teaching signals for training. To evaluate the text erasing performance, the output images are detected by a novel scene text detection method. Subsequently, the same measurement on text detection is utilized for testing the images in benchmark dataset ICDAR2013. Compared with direct text detection way, the scene text erasing process demonstrates a drastically decrease on the precision, recall and f-score. That proves the effectiveness of proposed method for erasing the text in natural scene images..
48. Anna Zhu, Seiichi Uchida, Scene Text Relocation with Guidance, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2017.212, 1, 1289-1294, 2018.01, Applying object proposal technique for scene text detection becomes popular for its significant improvement in speed and accuracy for object detection. However, some of the text regions after the proposal classification are overlapped and hard to remove or merge. In this paper, we present a scene text relocation system that refines the detection from text proposals to text. An object proposal-based deep neural network is employed to get the text proposals. To tackle the detection overlapping problem, a refinement deep neural network relocates the overlapped regions by estimating the text probability inside, and locating the accurate text regions by thresholding. Since the space between words in different text lines are various, a guidance mechanism is proposed in text relocation to guide where to extract the text regions in word level. This refinement procedure helps boost the precision after removing multiple overlapped text regions or joint cracked text regions. The experimental results on standard benchmark ICDAR 2013 demonstrate the effectiveness of the proposed approach..
49. Font Creation Using Class Discriminative Deep Convolutional Generative Adversarial Networks..
50. Jinho Lee, Brian Kenji Iwana, Shouta Ide, Hideaki Hayashi, Seiichi Uchida, Globally Optimal Object Tracking with Complementary Use of Single Shot Multibox Detector and Fully Convolutional Network, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10.1007/978-3-319-75786-5_10, 10749, 110-122, 2017.11, Object tracking is one of the most important but still difficult tasks in computer vision and pattern recognition. The main difficulties in the tracking task are appearance variation of target objects and occlusion. To deal with those difficulties, we propose an object tracking method combining Single Shot Multibox Detector (SSD), Fully Convolutional Network (FCN) and Dynamic Programming (DP). SSD and FCN provide a probability value of the target object which allows for appearance variation within each category. DP provides a globally optimal tracking path even with severe occlusions. Through several experiments, we confirmed that their combination realized a robust object tracking method. Also, in contrast to traditional trackers, initial position and a template of the target do not need to be specified. We show that the proposed method has a higher performance than the traditional trackers in tracking various single objects through video frames..
51. Kotaro Abe, Brian Kenji Iwana, Viktor Gösta Holmér and Seiichi Uchida, Font Creation Using Generative Adversarial Networks with Class Discrimination, Proceedings of Asian Conference on Pattern Recognition (ACPR2017, Nanjing, China), 2017.10.
52. Brian Kenji Iwana, Volkmar Frinken, Kaspar Riesen, Seiichi Uchida, Efficient temporal pattern recognition by means of dissimilarity space embedding with discriminative prototypes, PATTERN RECOGNITION, 10.1016/j.patcog.2016.11.013, 64, 268-276, 2017.04, Dissimilarity space embedding (DSE) presents a method of representing data as vectors of dissimilarities. This representation is interesting for its ability to use a dissimilarity measure to embed various patterns (e.g. graph patterns with different topology and temporal patterns with different lengths) into a vector space. The method proposed in this paper uses a dynamic time warping (DTW) based DSE for the purpose of the classification of massive sets of temporal patterns. However, using large data sets introduces the problem of requiring a high computational cost. To address this, we consider a prototype selection approach. A vector space created by DSE offers us the ability to treat its independent dimensions as features allowing for the use of feature selection. The proposed method exploits this and reduces the number of prototypes required for accurate classification. To validate the proposed method we use two-class classification on a data set of handwritten on-line numerical digits. We show that by using DSE with ensemble classification, high accuracy classification is possible with very few prototypes..
53. Kenji Kimura, Alexandre Mamane, Tohru Sasaki, Kohta Sato, Jun Takagi, Ritsuya Niwayama, Lars Hufnagel, Yuta Shimamoto, Jean-Francois Joanny, Seiichi Uchida, Akatsuki Kimura, Endoplasmic-reticulum-mediated microtubule alignment governs cytoplasmic streaming, NATURE CELL BIOLOGY, 10.1038/ncb3490, 19, 4, 399-+, 2017.04, Cytoplasmic streaming refers to a collective movement of cytoplasm observed in many cell types(1-7). The mechanism of meiotic cytoplasmic streaming (MeiCS) in Caenorhabditis elegans zygotes is puzzling as the direction of the flow is not predefined by cell polarity and occasionally reverses(6). Here, we demonstrate that the endoplasmic reticulum (ER) network structure is required for the collective flow. Using a combination of RNAi, microscopy and image processing of C. elegans zygotes, we devise a theoretical model, which reproduces and predicts the emergence and reversal of the flow. We propose a positive-feedback mechanism, where a local flow generated along a microtubule is transmitted to neighbouring regions through the ER. This, in turn, aligns microtubules over a broader area to self-organize the collective flow. The proposed model could be applicable to various cytoplasmic streaming phenomena in the absence of predefined polarity. The increased mobility of cortical granules by MeiCS correlates with the efficient exocytosis of the granules to protect the zygotes from osmotic and mechanical stresses..
54. Koichi Kise, Shinichiro Omachi, Seiichi Uchida, Masakazu Iwamura, Masahiko Inami, Kai Kunze, Reading-life log as a new paradigm of utilizing character and document media, Human-Harmonized Information Technology, 10.1007/978-4-431-56535-2_7, 2, 197-233, 2017.04, "You are what you read." As this sentence implies, reading is important for building our minds. We are investing a huge amount of time for reading to input information. However the activity of "reading" is done only by each individual in an analog way and nothing is digitally recorded and reused. In order to solve this problem, we record reading activities as digital data and analyze them for various goals. We call this research "reading-life log." In this chapter, we describe our achievements of the reading-life log. A target of the reading-life log is to analyze reading activities quantitatively and qualitatively: when, how much, what you read, and how you read in terms of your interests and understanding. Body-worn sensors including intelligent eyewear are employed for this purpose. Another target is to analyze the contents of documents based on the users' reading activities: for example, which are the parts most people feel difficult/interesting. Materials to be read are not limited to books and documents. Scene texts are also important materials which guide human activities..
55. Tomohiro Nakayasu, Masaki Yasugi, Soma Shiraishi, Seiichi Uchida, Eiji Watanabe, Three-Dimensional Computer Graphic Animations for Studying Social Approach Behaviour in Medaka Fish: Effects of Systematic Manipulation of Morphological and Motion Cues, PLoS ONE, 2017.04.
56. Tomohiro Nakayasu, Masaki Yasugi, Soma Shiraishi, Seiichi Uchida, Eiji Watanabe, Three-dimensional computer graphic animations for studying social approach behaviour in medaka fish: Effects of systematic manipulation of morphological and motion cues, PloS one, 10.1371/journal.pone.0175059, 12, 4, 2017.04, We studied social approach behaviour in medaka fish using three-dimensional computer graphic (3DCG) animations based on the morphological features and motion characteristics obtained from real fish. This is the first study which used 3DCG animations and examined the relative effects of morphological and motion cues on social approach behaviour in medaka. Various visual stimuli, e.g., lack of motion, lack of colour, alternation in shape, lack of locomotion, lack of body motion, and normal virtual fish in which all four features (colour, shape, locomotion, and body motion) were reconstructed, were created and presented to fish using a computer display. Medaka fish presented with normal virtual fish spent a long time in proximity to the display, whereas time spent near the display was decreased in other groups when compared with normal virtual medaka group. The results suggested that the naturalness of visual cues contributes to the induction of social approach behaviour. Differential effects between body motion and locomotion were also detected. 3DCG animations can be a useful tool to study the mechanisms of visual processing and social behaviour in medaka..
63. Masanori Goto, Ryosuke Ishida, Seiichi Uchida, A Preselection-Based Fast Support Vector Machine Learning for Large-Scale Pattern Sets using Compressed Relative Neighborhood Graph, Research Reports on Information Science and Electrical Engineering of Kyushu University, 22, 1, 1-7, 2017.01, We propose a pre-selection method for training support vector machines (SVM) with a large-scale dataset. Specifically, the proposed method selects patterns around the class boundary and the selected data is fed to train an SVM. For the selection, that is, searching for boundary patterns, we utilize a compressed representation of relative neighborhood graph (Clustered-RNG). A Clustered-RNG is a network of neighboring patterns which have a different class label and thus, we can find boundary patterns between different classes. Through large-scale handwritten digit pattern recognition experiments, we show that the proposed pre-selection method accelerates SVM training process 10 times faster without degrading recognition accuracy..
64. Yuki Sato, Kei Nagatoshi, Ayumi Hamano, Yuko Imamura, David Huss, Seiichi Uchida, Rusty Lansford, Basal filopodia and vascular mechanical stress organize fibronectin into pillars bridging the mesoderm-endoderm gap, Development (Cambridge), 10.1242/dev.141259, 144, 2, 281-291, 2017.01, Cells may exchange information with other cells and tissues by exerting forces on the extracellular matrix (ECM). Fibronectin (FN) is an important ECM component that forms fibrils through cell contacts and creates directionally biased geometry. Here, we demonstrate that FN is deposited as pillars between widely separated germ layers, namely the somitic mesoderm and the endoderm, in quail embryos. Alongside the FN pillars, long filopodia protrude from the basal surfaces of somite epithelial cells. Loss-of-function of Ena/VASP, α5β1-integrins or talin in the somitic cells abolished the FN pillars, indicating that FN pillar formation is dependent on the basal filopodia through these molecules. The basal filopodia and FN pillars are also necessary for proper somite morphogenesis. We identified a new mechanism contributing to FN pillar formation by focusing on cyclic expansion of adjacent dorsal aorta. Maintenance of the directional alignment of the FN pillars depends on pulsatile blood flow through the dorsal aortae. These results suggest that the FN pillars are specifically established through filopodia-mediated and pulsating force-related mechanisms..
65. Takano, Shigeru, Hori, Maiya, Goto, Takayuki, Uchida, Seiichi, Kurazume, Ryo, Taniguchi, Rin-ichiro, Deep Learning-based Prediction Method for People Flows and Their Anomalies, ICPRAM: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 10.5220/0006248806760683, 676-683, 2017.01, This paper proposes prediction methods for people flows and anomalies in people flows on a university campus. The proposed methods are based on deep learning frameworks. By predicting the statistics of people flow conditions on a university campus, it becomes possible to create applications that predict future crowded places and the time when congestion will disappear. Our prediction methods will be useful for developing applications for solving problems in cities..
66. Seiichi Uchida, Yuto Shinahara, What Does Scene Text Tell Us?, 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 10.1109/ICPR.2016.7900267, 4047-4052, 2016.12, Scene text is one of the most important information sources for our daily life because it has particular functions such as disambiguation and navigation. In contrast, ordinary document text has no such function. Consequently, it is natural to have a hypothesis that scene text and document text have different characteristics. This paper tries to prove this hypothesis by semantic analysis of texts by word2vec, which is a neural network model to give a vector representation of each word. By the vector representation, we can have the semantic distributions of scene text and document text in Euclidean space and then determine their semantic categories by simple clustering. Experimental study reveals several differences between scene text and document text. For example, it is found that scene text is a semantic subset of document text and several semantic categories are very specific to scene text..
67. Anna Zhu, Renwu Gao, Seiichi Uchida, Could scene context be beneficial for scene text detection?, PATTERN RECOGNITION, 10.1016/j.patcog.2016.04.011, 58, 204-215, 2016.10, Scene text detection and scene segmentation are meaningful tasks in the computer vision field. Could semantic scene segmentation assist scene text detection to any degree? For example, can we expect the probability of a region being text to be low if its surrounding segment, i.e., its context, is labeled as sky? In this paper, we give a positive answer by constructing a scene context-based text detection model. In this model, we use texton features and a fully-connected conditional random field (CRF) to estimate pixel-level scene class probabilities, which are used as the image's context feature. Meanwhile, maximally stable extremal regions (MSERs) are extracted, integrated and extended as image patches of character candidates. Then, each image patch is fed to a simple two-layer convolutional neural network (CNN) to automatically extract its character feature. The averaged context feature of the corresponding patch is considered as the patch's context feature. The character feature and context feature are fused as the input into a support vector machine for text/non-text determination. Finally, as a post-processing step, neighboring text regions are grouped hierarchically. The performance evaluation on ICDAR2013 and SVT databases, as well as a preliminary evaluation on a patch-level database, proves that the scene context can improve the performance of scene text detection. Moreover, the comparative study with state-of-the-art methods shows the top-level performance of our method..
68. Seiichi Uchida, Shota Ide, Brian Kenji Iwana, Anna Zhu, A Further Step to Perfect Accuracy by Training CNN with Larger Data, PROCEEDINGS OF 2016 15TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 10.1109/ICFHR.2016.77, 405-410, 2016.10, Convolutional Neural Networks (CNN) are at the forefront of accurate character recognition. This paper explores CNNs at their maximum capacity by using large datasets. We show a near-perfect performance by using a dataset of about 820,000 real samples of isolated handwritten digits, much larger than the conventional MNIST database. In addition, we report a near-perfect performance on the recognition of machine-printed digits and multi-font born-digital digits. Also, in order to progress toward a universal OCR, we propose methods of combining the datasets into one classifier. This paper reveals the effects of combining the datasets prior to training and the effects of transfer learning during training. The results of the proposed methods also show an almost perfect accuracy, suggesting the ability of the network to generalize to all forms of text..
69. Brian Kenji Iwana, Volkmar Frinken, Seiichi Uchida, A Robust Dissimilarity-based Neural Network for Temporal Pattern Recognition, PROCEEDINGS OF 2016 15TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 10.1109/ICFHR.2016.53, 265-270, 2016.10, Temporal pattern recognition is challenging because temporal patterns require extra considerations over other data types, such as order, structure, and temporal distortions. Recently, there has been a trend toward using large data and deep learning; however, many of the tools cannot be directly used with temporal patterns. Convolutional Neural Networks (CNN), for instance, are traditionally used for visual and image pattern recognition. This paper proposes a method using a neural network to classify isolated temporal patterns directly. The proposed method uses dynamic time warping (DTW) as a kernel-like function to learn dissimilarity-based feature maps as the basis of the network. We show that using the proposed DTW-NN, efficient classification of on-line handwritten digits is possible with accuracies comparable to state-of-the-art methods..
70. O. Nedzvedz, S. Ablameyko, S. Uchida, Extraction and tracking living cells in medical images, IDT 2016 - Proceedings of the International Conference on Information and Digital Technologies 2016, 10.1109/DT.2016.7557173, 198-202, 2016.08, One of the important problems in cytological image analysis is cell segmentation. Today, the most promising direction in cytological image analysis is the investigation of living cells. Images of living cells pose many difficulties for cell analysis. In this paper, we propose a solution to one such problem: extracting individual living cells from their aggregations and measuring their 3D characteristics..
71. Shigeru Matsumura, Tomoko Kojidani, Yuji Kamioka, Seiichi Uchida, Tokuko Haraguchi, Akatsuki Kimura, Fumiko Toyoshima, Interphase adhesion geometry is transmitted to an internal regulator for spindle orientation via caveolin-1, NATURE COMMUNICATIONS, 10.1038/ncomms11858, 7, ncomms11858, 2016.06, Despite theoretical and physical studies implying that cell-extracellular matrix adhesion geometry governs the orientation of the cell division axis, the molecular mechanisms that translate interphase adhesion geometry to the mitotic spindle orientation remain elusive. Here, we show that the cellular edge retraction during mitotic cell rounding correlates with the spindle axis. At the onset of mitotic cell rounding, caveolin-1 is targeted to the retracting cortical region at the proximal end of retraction fibres, where ganglioside GM1-enriched membrane domains with clusters of caveola-like structures are formed in an integrin and RhoA-dependent manner. Furthermore, G alpha i1-LGN-NuMA, a well-known regulatory complex of spindle orientation, is targeted to the caveolin-1-enriched cortical region to guide the spindle axis towards the cellular edge retraction. We propose that retraction-induced cortical heterogeneity of caveolin-1 during mitotic cell rounding sets the spindle orientation in the context of adhesion geometry..
72. Markus Goldstein, Seiichi Uchida, A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data, PLOS ONE, 10.1371/journal.pone.0152173, 11, 4, e0152173, 2016.04, Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks..
73. Liuan Wang, Wei Fan, Jun Sun, Seiichi Uchida, Globally Optimal Text Line Extraction based on K-Shortest Paths algorithm, PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, (DAS 2016), 10.1109/DAS.2016.12, 335-339, 2016.04, The task of text line extraction in images is a crucial prerequisite for content-based image understanding applications. In this paper, we propose a novel text line extraction method based on k-shortest paths global optimization in images. First, the candidate connected components are obtained from Maximally Stable Extremal Region (MSER) results in images. Then, a directed graph is built upon the connected component nodes, with edges comprising unary and pairwise cost functions. Finally, the text line extraction problem is solved using the k-shortest paths optimization algorithm by taking advantage of the particular structure of the directed graph. Experimental results on a public dataset demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods..
74. Kana Aoki, Fumiyo Maeda, Tomoya Nagasako, Yuki Mochizuki, Seiichi Uchida, Junichi Ikenouchi, A RhoA and Rnd3 cycle regulates actin reassembly during membrane blebbing, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 10.1073/pnas.1600968113, 113, 13, E1863-E1871, 2016.03, The actin cytoskeleton usually lies beneath the plasma membrane. When the membrane-associated actin cytoskeleton is transiently disrupted or the intracellular pressure is increased, the plasma membrane detaches from the cortex and protrudes. Such protruded membrane regions are called blebs. However, the molecular mechanisms underlying membrane blebbing are poorly understood. This study revealed that epidermal growth factor receptor kinase substrate 8 (Eps8) and ezrin are important regulators of rapid actin reassembly for the initiation and retraction of protruded blebs. Live-cell imaging of membrane blebbing revealed that local reassembly of actin filaments occurred at Eps8- and activated ezrin-positive foci of membrane blebs. Furthermore, we found that a RhoA-ROCK-Rnd3 feedback loop determined the local reassembly sites of the actin cortex during membrane blebbing..
75. Markus Goldstein, Seiichi Uchida, A Comparative Study on Outlier Removal from a Large-scale Dataset using Unsupervised Anomaly Detection, Proceedings of The 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM2016), 263-269, 2016.02, Outlier removal from training data is a classical problem in pattern recognition. Nowadays, this problem is becoming more important for large-scale datasets for the following two reasons: First, we will have a higher risk of "unexpected" outliers, such as mislabeled training data. Second, a large-scale dataset makes it more difficult to grasp the distribution of outliers. On the other hand, many unsupervised anomaly detection methods have been proposed, which can also be used for outlier removal. In this paper, we present a comparative study of nine different anomaly detection methods in the scenario of outlier removal from a large-scale dataset. For accurate performance observation, we need to use a simple and describable recognition procedure and thus utilize a nearest neighbor-based classifier. As an adequate large-scale dataset, we prepared a handwritten digit dataset comprising more than 800,000 manually labeled samples. With a data dimensionality of 16×16=256, it is ensured that each digit class has at least 100 times more instances than data dimensionality. The experimental results show that the common understanding that outlier removal improves classification performance on small datasets is not true for high-dimensional large-scale datasets. Additionally, it was found that local anomaly detection algorithms perform better on this data than their global equivalents..
81. Jiamin Xu, Palaiahnakote Shivakumara, Tong Lu, Chew Lim Tan, Seiichi Uchida, A new method for multi-oriented graphics-scene-3D text classification in video, PATTERN RECOGNITION, 10.1016/j.patcog.2015.07.002, 49, 19-42, 2016.01, Text detection and recognition in video is challenging due to the presence of different types of texts, namely, graphics (video caption), scene (natural text), 2D, 3D, static and dynamic texts. Developing a universal method that works well for all the types is hard. In this paper, we propose a novel method for classifying graphics-scene and 2D-3D texts in video to enhance text detection and recognition accuracies. We first propose an iterative method to classify static and dynamic clusters based on the fact that static texts have zero velocity while dynamic texts have non-zero velocity. This results in text candidates for both static and dynamic texts regardless of 2D and 3D types. We then propose symmetry for text candidates using stroke width distances and medial axis values, which results in potential text candidates. We group potential text candidates using their geometrical properties to form text regions. Next, for each text region, we study the distribution of the dominant medial axis values given by ring radius transform in a new way to classify graphics and scene texts. Similarly, we study the proximity among the pixels that satisfy the gradient directions symmetry to classify 2D and 3D texts. We evaluate each step of the proposed method in terms of classification and recognition rates through classification with the existing methods to show that video text classification is effective and necessary for enhancing the capability of current text detection and recognition systems..
82. Hiroaki Takebe, Yusuke Uehara, Seiichi Uchida, Efficient Anchor Graph Hashing with Data-Dependent Anchor Selection, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.2015EDL8060, E98D, 11, 2030-2033, 2015.11, Anchor graph hashing (AGH) is a promising hashing method for nearest neighbor (NN) search. AGH realizes efficient search by generating and utilizing a small number of points that are called anchors. In this paper, we propose a method for improving AGH, which considers data distribution in a similarity space and selects suitable anchors by performing principal component analysis (PCA) in the similarity space..
83. Volkmar Frinken, Seiichi Uchida, Deep BLSTM Neural Networks for Unconstrained Continuous Handwritten Text Recognition, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333894, 911-915, 2015.08, Recently, two different trends in neural network-based machine learning could be observed. The first one is the introduction of Bidirectional Long Short-Term Memory (BLSTM) neural networks (NN), which made sequences with long-distance dependencies amenable to neural network-based processing. The second one is deep learning techniques, which greatly increased the performance of neural networks by making use of many hidden layers. In this paper, we propose to combine these two ideas for the task of unconstrained handwriting recognition. Extensive experimental evaluation on the IAM database demonstrates an increase in recognition performance when using deep learning approaches over commonly used BLSTM neural networks, and provides insight into how different types of hidden layers affect the recognition accuracy..
84. Seiichi Uchida, Yuji Egashira, Kota Sato, Exploring the World of Fonts for Discovering the Most Standard Fonts and the Missing Fonts, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333800, 441-445, 2015.08, This paper has two contributions toward understanding the principles in font design. The first contribution of this paper is to discover the most standard font shape of each letter class by analyzing thousands of different fonts. For this analysis, two different methods are used. The first method is congealing for aligning multiple images based on a nonlinear geometric transformation model. The average of the aligned image is considered as a standard font shape. The second method is network analysis for representing font variations as a large-scale relative neighborhood graph (RNG) and then finding its center. The font corresponding to the center is considered as the standard font shape. Both of the standard font shapes given by the two methods are plain without decoration, serif, or slant, and thus give an objective reason why we consider the plain font as the typical font shape. The second contribution is to utilize the RNG and the pairwise congealing technique for discovering unexplored font designs and then generating totally new fonts automatically..
85. Dimosthenis Karatzas, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, Lukas Neumann, Vijay Ramaseshan Chandrasekhar, Shijian Lu, Faisal Shafait, Seiichi Uchida, Ernest Valveny, ICDAR 2015 Competition on Robust Reading, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333942, 1156-1160, 2015.08, Results of the ICDAR 2015 Robust Reading Competition are presented. A new Challenge 4 on Incidental Scene Text has been added to the Challenges on Born-Digital Images, Focused Scene Images and Video Text. Challenge 4 is run on a newly acquired dataset of 1,670 images evaluating Text Localisation, Word Recognition and End-to-End pipelines. In addition, the dataset for Challenge 3 on Video Text has been substantially updated with more video sequences and more accurate ground truth data. Finally, tasks assessing End-to-End system performance have been introduced to all Challenges. The competition took place in the first quarter of 2015, and received a total of 44 submissions. Only the tasks newly introduced in 2015 are reported on. The datasets, the ground truth specification and the evaluation protocols are presented together with the results and a brief summary of the participating methods..
86. Ryosuke Kakisako, Seiichi Uchida, Volkmar Frinken, Learning Non-Markovian Constraints for Handwriting Recognition, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333801, 446-450, 2015.08, Recently, the horizon of dynamic time warping (DTW) for matching two sequential patterns has been extended to deal with non-Markovian constraints. The non-Markovian constraints regulate the matching on a wider scale, whereas Markovian constraints regulate the matching only locally. The global optimization of the non-Markovian DTW is proved to be solvable in polynomial time by a graph cut algorithm. The main contribution of this paper is to reveal the best constraint for handwriting recognition by using the non-Markovian DTW. The result showed that the best constraint is not a Markovian but a totally non-Markovian constraint that regulates the matching between very distant points; that is, it was shown that the conventional Markovian DTW has a clear limitation and that the non-Markovian DTW deserves more focus in future research..
87. Masanori Goto, Ryosuke Ishida, Seiichi Uchida, Preselection of Support Vector Candidates by Relative Neighborhood Graph for Large-Scale Character Recognition, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333773, 306-310, 2015.08, We propose a pre-selection method for training support vector machines (SVM) with a large-scale dataset. Specifically, the proposed method selects patterns around the class boundary, and the selected data is used to train an SVM. For the selection, that is, searching for boundary patterns, we utilize a relative neighborhood graph (RNG). An RNG has an edge for each pair of neighboring patterns, and thus we can find boundary patterns by looking for edges connecting patterns from different classes. Through large-scale handwritten digit pattern recognition experiments, we show that the proposed pre-selection method accelerates the SVM training process by a factor of 5-15 without degrading recognition accuracy..
88. D. Barbuzzi, G. Pirlo, S. Uchida, V. Frinken, D. Impedovo, Similarity-based Regularization for Semi-Supervised Learning for Handwritten Digit Recognition, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333734, 101-105, 2015.08, This paper presents an experimental analysis on the use of semi-supervised learning in the handwritten digit recognition field. More specifically, two new feedback-based techniques for retraining individual classifiers in a multi-expert scenario are discussed. These new methods analyze the final decision provided by the multi-expert system so that samples classified with a confidence greater than a specific threshold are used to update the system itself. Experimental results carried out on the CEDAR (handwritten digits) database are presented. In particular, error rate, similarity index and a new correlation score among them are considered in order to evaluate the best retraining rule. For the experimental evaluation, an SVM classifier and five different combination techniques at abstract and measurement level have been used. Finally, the results show that, when iterating the feedback process on different multi-expert systems built with the five combination techniques, one retraining rule outperforms the other with respect to the best correlation score..
89. Brian Iwana, Seiichi Uchida, Kaspar Riesen, Volkmar Frinken, Tackling Temporal Pattern Recognition by Vector Space Embedding, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333875, 816-820, 2015.08, This paper introduces a novel method of reducing the number of prototype patterns necessary for accurate recognition of temporal patterns. The nearest neighbor (NN) method is an effective tool in pattern recognition, but the downside is that it can be computationally costly when using large quantities of data. To solve this problem, we propose a method of representing the temporal patterns by embedding dynamic time warping (DTW) distance-based dissimilarities in vector space. Adaptive boosting (AdaBoost) is then applied for classifier training and feature selection to reduce the number of prototype patterns required for accurate recognition. With a data set of handwritten digits provided by the International Unipen Foundation (iUF), we successfully show that a large quantity of temporal data can be efficiently classified, producing results similar to the established NN method at a much smaller cost..
90. Renwu Gao, Shoma Eguchi, Seiichi Uchida, True Color Distributions of Scene Text and Background, 2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2015.7333813, 506-510, 2015.08, Color, as one of the low-level features, plays an important role in image processing, object recognition and other fields. For example, in the task of scene text detection and recognition, many methodologies employ features that utilize the color contrast between text and the corresponding background for connected component extraction. However, the true color distributions of text and its background have still not been examined, because this requires a sufficiently large scene text database with pixel-level labelled text/non-text ground truth. To clarify the relationship between text and its background, in this paper, we aim at investigating the non-parametric color distributions of text and its background using a large database that contains 3018 scene images and 98,600 characters. The results of our experiments show that text and its background can be discriminated by means of color; therefore, color features can be used for scene text detection..
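91. General Assignment of Supplementary Information
To avoid misrecognitions that cannot, in principle, be avoided by using features alone, a framework of pattern recognition with supplementary information has been proposed. In this scheme, a small amount of information that assists class determination, called supplementary information, is used together with the features during recognition in order to improve recognition performance. Supplementary information can be set freely and is usually set so as to minimize the misrecognition rate. The problem is how to set the supplementary information so that the misrecognition rate is minimized. Under the ideal condition that correct supplementary information is always obtained, the problem has already been formulated and an assignment method has been derived. For use in real environments, however, an assignment method that takes observation errors in the supplementary information into account is required. In this paper, we therefore newly formulate the problem by taking observation errors of the supplementary information into account. The formulation is general and remains valid when the supplementary information is error-free. We illustrate through experiments using the Mahalanobis distance that the derived assignment method works effectively..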
92. Andreas Fischer, Seiichi Uchida, Volkmar Frinken, Kaspar Riesen, Horst Bunke, Improving Hausdorff edit distance using structural node context, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10.1007/978-3-319-18224-7_15, 9069, 148-157, 2015.05, In order to cope with the exponential time complexity of graph edit distance, several polynomial-time approximation algorithms have been proposed in recent years. The Hausdorff edit distance is a quadratic-time matching procedure for labeled graphs which reduces the edit distance to a correspondence problem between local substructures. In its original formulation, nodes and their adjacent edges have been considered as local substructures. In this paper, we integrate a more general structural node context into the matching procedure based on hierarchical subgraphs. In an experimental evaluation on diverse graph data sets, we demonstrate that the proposed generalization of Hausdorff edit distance can significantly improve the accuracy of graph classification while maintaining low computational complexity..
93. Koichi Kise, Shinichiro Omachi, Seiichi Uchida, Masakazu Iwamura, Marcus Liwicki, Data Embedding into Characters, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.2014MUI0002, E98D, 1, 10-20, 2015.01, This paper reviews several trials of re-designing conventional communication medium, i.e., characters, for enriching their functions by using data-embedding techniques. For example, characters are redesigned to have better machine-readability even under various geometric distortions by embedding a geometric invariant into each character image to represent class label of the character. Another example is to embed various information into handwriting trajectory by using a new pen device, called a data-embedding pen. An experimental result showed that we can embed 32-bit information into a handwritten line of 5 cm length by using the pen device. In addition to those applications, we also discuss the relationship between data-embedding and pattern recognition in a theoretical point of view. Several theories tell that if we have appropriate supplementary information by data-embedding, we can enhance pattern recognition performance up to 100%..
94. Renwu Gao, Seiichi Uchida, Asif Shahab, Faisal Shafait, Volkmar Frinken, Visual Saliency Models for Text Detection in Real World, PLOS ONE, 10.1371/journal.pone.0114539, 9, 12, e114539, 2014.12, This paper evaluates the degree of saliency of texts in natural scenes using visual saliency models. A large scale scene image database with pixel level ground truth is created for this purpose. Using this scene image database and five state-of-the-art models, visual saliency maps that represent the degree of saliency of the objects are calculated. The receiver operating characteristic curve is employed in order to evaluate the saliency of scene texts, which is calculated by visual saliency models. A visualization of the distribution of scene texts and non-texts in the space constructed by three kinds of saliency maps, which are calculated using Itti's visual saliency model with intensity, color and orientation features, is given. This visualization of distribution indicates that text characters are more salient than their non-text neighbors, and can be captured from the background. Therefore, scene texts can be extracted from the scene images. With this in mind, a new visual saliency architecture, named hierarchical visual saliency model, is proposed. Hierarchical visual saliency model is based on Itti's model and consists of two stages. In the first stage, Itti's model is used to calculate the saliency map, and Otsu's global thresholding algorithm is applied to extract the salient region that we are interested in. In the second stage, Itti's model is applied to the salient region to calculate the final saliency map. An experimental evaluation demonstrates that the proposed model outperforms Itti's model in terms of captured scene texts..
95. Kyoko Chiba, Masahiko Araseki, Keisuke Nozawa, Keiko Furukori, Yoichi Araki, Takahide Matsushima, Tadashi Nakaya, Saori Hata, Yuhki Saito, Seiichi Uchida, Yasushi Okada, Angus C Nairn, Roger J Davis, Tohru Yamamoto, Masataka Kinjo, Hidenori Taru, Toshiharu Suzuki, Quantitative analysis of APP axonal transport in neurons: role of JIP1 in enhanced APP anterograde transport., Molecular biology of the cell, 10.1091/mbc.E14-06-1111, 25, 22, 3569-80, 2014.11, Alzheimer's β-amyloid precursor protein (APP) associates with kinesin-1 via JNK-interacting protein 1 (JIP1); however, the role of JIP1 in APP transport by kinesin-1 in neurons remains unclear. We performed a quantitative analysis to understand the role of JIP1 in APP axonal transport. In JIP1-deficient neurons, we find that both the fast velocity (∼2.7 μm/s) and high frequency (66%) of anterograde transport of APP cargo are impaired to a reduced velocity (∼1.83 μm/s) and a lower frequency (45%). We identified two novel elements linked to JIP1 function, located in the central region of JIP1b, that interact with the coiled-coil domain of kinesin light chain 1 (KLC1), in addition to the conventional interaction of the JIP1b 11-amino acid C-terminal (C11) region with the tetratricopeptide repeat of KLC1. High frequency of APP anterograde transport is dependent on one of the novel elements in JIP1b. Fast velocity of APP cargo transport requires the C11 domain, which is regulated by the second novel region of JIP1b. Furthermore, efficient APP axonal transport is not influenced by phosphorylation of APP at Thr-668, a site known to be phosphorylated by JNK. Our quantitative analysis indicates that enhanced fast-velocity and efficient high-frequency APP anterograde transport observed in neurons are mediated by novel roles of JIP1b..
96. Wenjie Cai, Seiichi Uchida, Hiroaki Sakoe, Comparative performance analysis of stroke correspondence search methods for stroke-order free online multi-stroke character recognition, FRONTIERS OF COMPUTER SCIENCE, 10.1007/s11704-014-3207-6, 8, 5, 773-784, 2014.10, For stroke-order free online multi-stroke character recognition, stroke-to-stroke correspondence search between an input pattern and a reference pattern plays an important role in dealing with stroke-order variation. Although various methods of stroke correspondence have been proposed, no comparative study clarifying the relative superiority of those methods has been done before. In this paper, we first review the approaches for solving the stroke-order variation problem. Then, five representative methods of stroke correspondence proposed by different groups, including cube search (CS), bipartite weighted matching (BWM), individual correspondence decision (ICD), stable marriage (SM), and deviation-expansion model (DE), are experimentally compared, mainly with regard to recognition accuracy and efficiency. The experimental results on an online Kanji character dataset showed that CS, BWM, ICD, SM, and DE attained recognition rates of 99.17%, 99.17%, 96.37%, 98.54%, and 96.59%, respectively. Extensive discussions are made on their relative superiorities and practicalities..
97. Volkmar Frinken, Ryosuke Kakisako, Seiichi Uchida, A Novel HMM Decoding Algorithm Permitting Long-Term Dependencies and its Application to Handwritten Word Recognition, 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR.2014.29, 128-133, 2014.09, A new decoding algorithm for hidden Markov models is presented. As opposed to the commonly used Viterbi algorithm, it is based on the Min-Cut/Max-Flow algorithm instead of dynamic programming. Therefore, non-Markovian long-term dependencies can easily be added to influence the decoding path while still finding the optimal decoding in polynomial time. We demonstrate through an experimental evaluation how these constraints can be used to improve an HMM-based handwritten word recognition system that models words via linear character-HMMs by restricting the length of each character..
98. Muhammad Imran Malik, Marcus Liwicki, Andreas Dengel, Seiichi Uchida, Volkmar Frinken, Automatic Signature Stability Analysis And Verification Using Local Features, 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR.2014.109, 621-626, 2014.09, The purpose of this paper is two-fold. First, it presents a novel signature stability analysis based on a signature's local / part-based features. Speeded Up Robust Features (SURF) are used for the local analysis, which gives various clues about the potential areas from which the features should be exclusively considered while performing signature verification. Second, based on the results of the local stability analysis, we present a novel signature verification system and evaluate this system on the publicly available dataset of the forensic signature verification competition, 4NSigComp2010, which contains genuine, forged, and disguised signatures. The proposed system achieved an equal error rate of 15%, which is considerably low when compared against all the participants of the said competition. Furthermore, we also compare the proposed system with some of the earlier reported systems on the said data. The proposed system also outperforms these systems..
99. Ryota Ogata, Minoru Mori, Volkmar Frinken, Seiichi Uchida, Constrained AdaBoost for Totally-Ordered Global Features, 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), 10.1109/ICFHR.2014.72, 393-398, 2014.09, This paper proposes a constrained AdaBoost algorithm for utilizing global features in a dynamic time warping (DTW) framework. Global features are defined as a spatial relationship between temporally-distant points of a temporal pattern and are useful to represent global structure of the pattern. An example is the spatial relationship between the first and the last points of a handwritten pattern of the digit "0". Those temporally-distant points should be spatially close enough to form a closed circle, whereas those points of "6" should be distant enough. For a temporal pattern of an N-point sequence, it is possible to have N(N - 1)/2 global features. One problem of using the global features is that they are not ordered as a one-dimensional sequence any more. Consequently, it is impossible to use them in a left-to-right Markovian model, such as DTW and HMM. The proposed constrained AdaBoost algorithm can select a totally-ordered subset from the set of N (N - 1)/2 global features. Since the totally-ordered features can be arranged as a one-dimensional sequence, they can be incorporated into a DTW framework for compensating nonlinear temporal fluctuation. Since the selection is governed by the AdaBoost framework, the selected features can retain discriminative power..
100. Volkmar Frinken, Nilanjana Bhattacharya, Seiichi Uchida, Umapada Pal, Improved BLSTM Neural Networks for Recognition of On-Line Bangla Complex Words, STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 10.1007/978-3-662-44415-3_41, 8621, 404-413, 2014.08, While bi-directional long short-term memory (BLSTM) neural networks have been demonstrated to perform very well for English or Arabic, the huge number of different output classes (characters) encountered in many Asian fonts poses a severe challenge. In this work we investigate different encoding schemes of Bangla compound characters and compare the recognition accuracies. We propose to model complex characters not as unique symbols, which are represented by individual nodes in the output layer. Instead, we exploit the property of long-distance-dependent classification in BLSTM neural networks. We classify only basic strokes and use special nodes which react to semantic changes in the writing, i.e., distinguishing inter-character spaces from intra-character spaces. We show that our approach outperforms the common approaches to BLSTM neural network-based handwriting recognition considerably..
101. Volkmar Frinken, Yutaro Iwakiri, Ryosuke Ishida, Kensho Fujisaki, Seiichi Uchida, Improving Point of View Scene Recognition by Considering Textual Data, 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 10.1109/ICPR.2014.512, 2966-2971, 2014.08, At the current rate of technological advancement and social acceptance thereof, it will not be long before wearable devices will be common that constantly record the field of view of the user. We introduce a new database of image sequences, taken with a first person view camera, of realistic, everyday scenes. As a distinguishing feature, we manually transcribed the scene text of each image. This way, sophisticated OCR algorithms can be simulated that can help in the recognition of the location and the activity. To test this hypothesis, we performed a set of experiments using visual features, textual features, and a combination of both. We demonstrate that, although not very powerful when considered alone, the textual information improves the overall recognition rates..
102. Markus Weber, Marcus Liwicki, Didier Stricker, Christopher Schoelzel, Seiichi Uchida, LSTM-Based Early Recognition of Motion Patterns, 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 10.1109/ICPR.2014.611, 3552-3557, 2014.08, In this paper a method for Early Recognition (ER) of Motion Templates (MTs) is presented. We define ER as an algorithm to provide recognition results before a motion sequence is completed. In our experiments we apply Long Short-Term Memory (LSTM) and optimize the training for the task of recognizing the motion template as early as possible. The evaluation has shown that, for frame-by-frame classification, the LSTM achieves a recognition accuracy of 88% if no training data of the person him/herself is included, and 92% if the training data also contains motion sequences of the person. Furthermore, the average earliness - the number of time frames it takes before the LSTM correctly classifies a motion pattern - is around 24.77 frames, which is less than a second with the used tracking technology, i.e., the Microsoft Kinect..
103. Kohei Inai, Marten Palsson, Volkmar Frinken, Yaokai Feng, Seiichi Uchida, Selective Concealment of Characters for Privacy Protection, 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 10.1109/ICPR.2014.66, 333-338, 2014.08, A method of concealing characters is proposed for degrading legibility of privacy sensitive textual information in natural scene, such as car license plate numbers and name tags. An important property of the proposed method is that it realizes selective concealing of characters; that is, the proposed method degrades legibility of character regions without degrading the quality of non-character regions. This selective concealment is realized because characters have special concealment characteristics. Specifically, character legibility can be degraded by damaging stroke structure by using exemplar-based image inpainting, which does not affect non-character regions. Experimental results of qualitative and quantitative evaluations have proven that the selective concealment is practically possible. Furthermore, the quantitative evaluation through a subjective experiment revealed appropriate setups of image inpainting for maximizing selective concealment performance..
104. Minoru Mori, Seiichi Uchida, Hitoshi Sakano, Global feature for online character recognition, PATTERN RECOGNITION LETTERS, 10.1016/j.patrec.2013.03.036, 35, 142-148, 2014.01, This paper focuses on the importance of global features for online character recognition. Global features represent the relationship between two temporally distant points in a handwriting pattern. For example, it can be defined as the relative vector of two xy-coordinate features of two temporally separated points. Most existing online character recognition methods do not utilize global features, since their non-Markovian property prevents the use of the traditional recognition methodologies, such as dynamic time warping and hidden Markov models. However, we can understand the importance of, for example, the relationship between the starting and the ending points by attempting to discriminate "0" and "6". This relationship cannot be represented by local features defined at individual points but by global features. Since O(N^2) global features can be extracted from a handwriting pattern with N points, selecting those that are truly discriminative is very important. In this paper, AdaBoost is employed for feature selection. Experiments prove that many global features are discriminative and the combined use of local and global features can improve the recognition accuracy. (C) 2013 Elsevier B.V. All rights reserved..
105. Marcus Liwicki, Seiichi Uchida, Akira Yoshida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, More than ink - Realization of a data-embedding pen, PATTERN RECOGNITION LETTERS, 10.1016/j.patrec.2012.09.001, 35, 246-255, 2014.01, In this paper we present a novel digital pen device, called data-embedding pen, for enhancing the value of handwriting on physical paper. This pen produces an additional ink-dot sequence along a written stroke during writing. This ink-dot sequence represents arbitrary information, such as writer's name and writing date. Since the information is placed on the paper as an ink-dot sequence, it can be retrieved just by scanning or photographing the paper. In addition to the hardware of the data-embedding pen, this paper also proposes a coding scheme for reliable data-embedding and retrieval. In fact, the physical data-embedding on a paper will undergo various severe errors and therefore a robust coding scheme is necessary. Through experiments on data written by two writers, we show that we can embed 32 bits on short and simple or even on more complex patterns and finally retrieve them with a high reliability. (C) 2012 Elsevier B.V. All rights reserved..
106. Megumi Chikano, Koichi Kise, Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, Recovery and localization of handwritings by a camera-pen based on tracking and document image retrieval, PATTERN RECOGNITION LETTERS, 10.1016/j.patrec.2012.10.003, 35, 214-224, 2014.01, We propose a camera-based method for digital recovery of handwritings on ordinary paper. Our method is characterized by the following two points: (1) it requires no special device such as special paper other than a camera-pen to recover handwritings, (2) if the handwriting is on a printed document, the method is capable of localizing it onto an electronic equivalent of the printed document. The above points are enabled by the following processing. The handwriting is recovered by the LK tracking to trace the move of the pen-tip. The recovered shape is localized onto the corresponding part of the electronic document with the help of document image retrieval called LLAH (locally likely arrangement hashing). A new framework for stably estimating the homography from a camera-captured image to the corresponding electronic document allows us to localize the recovered handwritings accurately. We experimentally evaluate the accuracy, processing time and memory usage of the proposed method using 30 handwritings. From the comparison to other methods that implement alternative ways for realizing the same functionality, we have confirmed that the proposed method is superior to those other methods. (C) 2012 Elsevier B.V. All rights reserved..
107. Kyoko Chiba, Yuki Shimada, Masataka Kinjo, Toshiharu Suzuki, Seiichi Uchida, Simple and Direct Assembly of Kymographs from Movies Using KYMOMAKER, TRAFFIC, 10.1111/tra.12127, 15, 1, 1-11, 2014.01, In tracking analysis, the movement of cargos by motor proteins in axons is often represented by a time-space plot termed a 'kymograph'. Manual creation of kymographs is time-consuming and complicated for cell biologists. Therefore, we developed KYMOMAKER, a simple system that automatically creates a kymograph from a movie without generating multiple time-dissected movie stacks. In addition, KYMOMAKER can automatically extract faint vesicle traces, and can thereby effectively analyze cargos expressed at low levels in axons. A filter can be applied to remove traces of non-physiological movements and to extract meaningful traces of anterograde or retrograde cargo transport. For example, only cargos that move at a speed of >0.4 μm/second for a distance of >1 μm can be included. Another function of KYMOMAKER is to create a color kymograph in which the color of the trace varies according to the position of the fluorescent particle in the axis perpendicular to the long axis of the axon. Such positional information is completely lost in conventional kymographs. KYMOMAKER is an open access program that can be easily used to analyze vesicle transport in axons by cell biologists who do not have specific knowledge of bioimage informatics..
108. Seiichi Uchida, Text localization and recognition in images and video, Handbook of Document Image Processing and Recognition, 10.1007/978-0-85729-859-1_28, 843-883, 2014.01, This chapter reviews techniques on text localization and recognition in scene images captured by camera. Since properties of scene texts are very different from scanned documents in various aspects, specific techniques are necessary to localize and recognize them. In fact, localization of scene text is a difficult and important task because there is no prior information on the location, layout, direction, size, typeface, and color of texts in a scene image in general and there are many textures and patterns similar to characters. In addition, recognition of scene text is also a difficult task because there are many characters distorted by blurring, perspective, nonuniform lighting, and low resolution. Decoration of characters makes the recognition task far more difficult. As reviewed in this chapter, those difficult tasks have been tackled with not only modified versions of conventional OCR techniques but also state-of-the-art computer vision and pattern recognition methodologies..
109. R. Huang, K. H. Rhee, S. Uchida, A parallel image encryption method based on compressive sensing, Multimedia Tools and Applications, 10.1007/s11042-012-1337-0, 72, 1, 71-93, 2013.12, Recently, compressive sensing-based encryption methods which combine sampling, compression and encryption together have been proposed. However, since the quantized measurement data obtained from linear dimension reduction projection directly serve as the encrypted image, the existing compressive sensing-based encryption methods fail to resist against the chosen-plaintext attack. To enhance the security, a block cipher structure consisting of scrambling, mixing, S-box and chaotic lattice XOR is designed to further encrypt the quantized measurement data. In particular, the proposed method works efficiently in the parallel computing environment. Moreover, a communication unit exchanges data among the multiple processors without collision. This collision-free property is equivalent to optimal diffusion. The experimental results demonstrate that the proposed encryption method not only achieves the remarkable confusion, diffusion and sensitivity but also outperforms the existing parallel image encryption methods with respect to the compressibility and the encryption speed. © 2012 Springer Science+Business Media New York..
110. Kensho Fujisaki, Ayumi Hamano, Kenta Aoki, Yaokai Feng, Seiichi Uchida, Masahiko Araseki, Yuki Saito, Toshiharu Suzuki, Detection and Tracking Protein Molecules in Fluorescence Microscopic Video, 2013 FIRST INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 10.1109/CANDAR.2013.47, 270-274, 2013.12, This paper provides a bioimage informatics system for detecting and tracking protein molecules, called APP-GFPs, in a live-cell video captured by a fluorescent microscope. Since both processes encounter many difficulties, such as many targets, little appearance information, and heavy background noise, we design the system to be as robust as possible. Specifically, for the detection, a machine learning-based method is employed. For tracking, a method based on a global optimization strategy is newly developed. Experimental results showed that the speed and direction distributions of molecular motion obtained by the proposed system were very similar to those obtained by manual inspection..
111. Ayumi Hamano, Kensho Fujisaki, Seiichi Uchida, Osamu Shiku, Stable Marriage Algorithm for Tracking Intracellular Objects, 2013 FIRST INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 10.1109/CANDAR.2013.53, 305-307, 2013.12, Development of automatic multiple intracellular-objects tracking methods is one of the significant challenges in Bioimage-Informatics. The challenge becomes more difficult in case the tracking targets have the same shape and appearance. In order to obtain stable results under that condition, we propose a tracking method based on global optimization. Particularly, we first detect tracking targets by our proposed detection method. Then, we formulate the multiple object tracking problem as a combinatorial optimization problem over a pair of consecutive frames. Finally, we solve the problem by the stable marriage algorithm. In this paper, we describe our proposed detection and tracking methods..
112. Koichi Ogawara, Masahiro Fukutomi, Seiichi Uchida, Yaokai Feng, A Voting-Based Sequential Pattern Recognition Method, PLOS ONE, 10.1371/journal.pone.0076980, 8, 10, e76980, 2013.10, We propose a novel method for recognizing sequential patterns such as motion trajectory of biological objects (i.e., cells, organelle, protein molecules, etc.), human behavior motion, and meteorological data. In the proposed method, a local classifier is prepared for every point (or timing or frame) and then the whole pattern is recognized by majority voting of the recognition results of the local classifiers. The voting strategy has a strong benefit that even if an input pattern has a very large deviation from a prototype locally at several points, they do not severely influence the recognition result; they are treated just as several incorrect votes and thus will be neglected successfully through the majority voting. For regularizing the recognition result, we introduce partial-dependency to local classifiers. An important point is that this dependency is introduced to not only local classifiers at neighboring point pairs but also to those at distant point pairs. Although, the dependency makes the problem non-Markovian (i.e., higher-order Markovian), it can still be solved efficiently by using a graph cut algorithm with polynomial-order computations. The experimental results revealed that the proposed method can achieve better recognition accuracy while utilizing the above characteristics of the proposed method..
113. Kai Kunze, Masakazu Iwamura, Koichi Kise, Seiichi Uchida, Shinichiro Omachi, Activity Recognition for the Mind: Toward a Cognitive "Quantified Self", COMPUTER, 10.1109/MC.2013.339, 46, 10, 105-108, 2013.10, Applying mobile sensing technology to cognitive tasks will enable novel forms of activity recognition..
114. Rong Huang, Palaiahnakote Shivakumara, Yaokai Feng, Seiichi Uchida, Scene Character Detection and Recognition with Cooperative Multiple-Hypothesis Framework, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.E96.D.2235, E96D, 10, 2235-2244, 2013.10, To handle the variety of scene characters, we propose a cooperative multiple-hypothesis framework which consists of an image operator set module, an Optical Character Recognition (OCR) module and an integration module. Multiple image operators activated by multiple parameters probe suspected character regions. The OCR module is then applied to each suspected region and returns multiple candidates with weight values for future integration. Without the aid of the heuristic rules which impose constraints on segmentation area, aspect ratio, color consistency, text line orientations, etc., the integration module automatically prunes the redundant detection/recognition and pads the missing detection/recognition. The proposed framework bridges the gap between scene character detection and recognition, in the sense that a practical OCR engine is effectively leveraged for result refinement. In addition, the proposed method achieves the detection and recognition at the character level, which enables dealing with special scenarios such as single character, text along arbitrary orientations or text along curves. We perform experiments on the benchmark ICDAR 2011 Robust Reading Competition dataset which includes a text localization task and a word recognition task. The quantitative results demonstrate that multiple hypotheses outperform a single hypothesis and are comparable with state-of-the-art methods in terms of recall, precision, F-measure, character recognition rate, total edit distance and word recognition rate. Moreover, two additional experiments are conducted to confirm the simplicity of parameter setting in this proposal..
115. Wenjie Cai, Seiichi Uchida and Hiroaki Sakoe, An Efficient Radical-Based Algorithm for Stroke-Order Free and Stroke-Number Free Online Kanji Character Recognition, Proceedings of the 16th International Graphonomics Society Conference (IGS 2013, Nara, Japan), 82-85, 2013.08.
116. Takafumi Matsuo, Song Wang, Yaokai Feng and Seiichi Uchida, Exploring the Ability of Parts on Recognizing Handwriting Characters, Proceedings of the 16th International Graphonomics Society Conference (IGS 2013, Nara, Japan), 66-69, 2013.08.
117. Chihiro Nakamoto, Rong Huang, Sota Koizumi, Ryosuke Ishida, Yaokai Feng and Seiichi Uchida, Font Distribution Analysis by Network, Proceedings of The 12th International Conference on Document Analysis and Recognition (ICDAR 2013, Washington DC, USA), 2013.08.
118. Song Wang, Seiichi Uchida, Marcus Liwicki, Yaokai Feng, Part-based methods for handwritten digit recognition, FRONTIERS OF COMPUTER SCIENCE, 10.1007/s11704-013-2297-x, 7, 4, 514-525, 2013.08, In this paper, we intensively study the behavior of three part-based methods for handwritten digit recognition. The principle of the proposed methods is to represent a handwritten digit image as a set of parts and recognize the image by aggregating the recognition results of individual parts. Since part-based methods do not rely on the global structure of a character, they are expected to be more robust against various deformations which may damage the global structure. The proposed three methods are based on the same principle but different in their details, for example, the way of aggregating the individual results. Thus, those methods have different performances. Experimental results show that even the simplest part-based method can achieve recognition rate as high as 98.42% while the improved one achieved 99.15%, which is comparable or even higher than some state-of-the-art method. This result is important because it reveals that characters can be recognized without their global structure. The results also show that the part-based method has robustness against deformations which usually appear in handwriting..
119. Renwu Gao, Faisal Shafait, Seiichi Uchida, Yaokai Feng, Saliency inside Saliency - A Hierarchical Usage of Visual Saliency for Scene Character Detection, Proceedings of The 12th International Conference on Document Analysis and Recognition (ICDAR 2013, Washington DC, USA), 2013.08.
120. Renwu Gao, Faisal Shafait, Seiichi Uchida, Yaokai Feng, A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes, CAMERA-BASED DOCUMENT ANALYSIS AND RECOGNITION, CBDAR 2013, 10.1007/978-3-319-05167-3_2, 8357, 18-29, 2013.08, Visual saliency models have been introduced to the field of character recognition for detecting characters in natural scenes. Researchers believe that characters have different visual properties from their non-character neighbors, which make them salient. With this assumption, characters should respond well to computational models of visual saliency. However, in some situations, characters belonging to scene text might not be as salient as one might expect. For instance, a signboard is usually very salient but the characters on the signboard might not necessarily be so salient globally. In order to analyze this hypothesis in more depth, we first give a view of how much these background regions, such as sign boards, affect the task of saliency-based character detection in natural scenes. Then we propose a hierarchical-saliency method for detecting characters in natural scenes. Experiments on a dataset with over 3,000 images containing scene text show that when using saliency alone for scene text detection, our proposed hierarchical method is able to capture a larger percentage of text pixels as compared to the conventional single-pass algorithm..
121. Masanori Goto, Ryosuke Ishida, Yaokai Feng, Seiichi Uchida, Analyzing the Distribution of a Large-scale Character Pattern Set Using Relative Neighborhood Graph, 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2013.10, 3-7, 2013.08, The goal of this research is to understand the true distribution of character patterns. Advances in computer technology for mass storage and digital processing have paved the way to process massive datasets for various pattern recognition problems. If we can represent and analyze the distribution of a large-scale character pattern set directly and understand its relationships deeply, it should be helpful for improving character recognizers. For this purpose, we propose a network analysis method to represent the distribution of patterns using a relative neighborhood graph and its clustered version. In this paper, the properties and validity of the proposed method are confirmed on 410,564 machine-printed digit patterns and 622,660 handwritten digit patterns which were manually ground-truthed and resized to 16 × 16 pixels. Our network analysis method represents the distribution of the patterns without any assumption, approximation or loss.
122. Chihiro Nakamoto, Rong Huang, Sota Koizumi, Ryosuke Ishida, Yaokai Feng, Seiichi Uchida, Font Distribution Observation by Network-Based Analysis, CAMERA-BASED DOCUMENT ANALYSIS AND RECOGNITION, CBDAR 2013, 10.1007/978-3-319-05167-3_7, 8357, 83-97, 2013.08, The off-the-shelf Optical Character Recognition (OCR) engines return mediocre performance on the decorative characters which usually appear in natural scenes such as signboards. A reasonable way towards the so-called camera-based OCR is to collect a large-scale font set and analyze the distribution of font samples for realizing some character recognition engine which is tolerant to font shape variations. This paper is concerned with the issue of font distribution analysis by network. Minimum Spanning Tree (MST) is employed to construct font network with respect to Chamfer distance. After clustering, some centrality criterion, namely closeness centrality, eccentricity centrality or betweenness centrality, is introduced for extracting typical font samples. The network structure allows us to observe the font shape transition between any two samples, which is useful to create new fonts and recognize unseen decorative characters. Moreover, unlike the Principal Component Analysis (PCA), the font network fulfills distribution visualization through measuring the dissimilarity between samples rather than the lossy processing of dimensionality reduction. Compared with K-means algorithm, network-based clustering has the ability to preserve small size font clusters which generally consist of samples taking special appearances. Experiments demonstrate that the proposed network-based analysis is an effective way to grasp font distribution, and thus provides helpful information for decorative character recognition..
123. Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Masakazu Iwamura, Lluis Gomez i Bigorda, Sergi Robles Mestre, Joan Mas, David Fernandez Mota, Jon Almazan Almazan, Lluis Pere de las Heras, ICDAR 2013 Robust Reading Competition, 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2013.221, 1484-1493, 2013.08, This report presents the final results of the ICDAR 2013 Robust Reading Competition. The competition is structured in three Challenges addressing text extraction in different application domains, namely born-digital images, real scene images and real-scene videos. The Challenges are organised around specific tasks covering text localisation, text segmentation and word recognition. The competition took place in the first quarter of 2013, and received a total of 42 submissions over the different tasks offered. This report describes the datasets and ground truth specification, details the performance evaluation protocols used and presents the final results along with a brief summary of the participating methods..
124. Yugo Terada, Rong Huang, Yaokai Feng, Seiichi Uchida, On the Possibility of Structure Learning-Based Scene Character Detector, 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2013.101, 472-476, 2013.08, In this paper, we propose a structure learning-based scene character detector which is inspired by the observation that characters have their own inherent structures compared with the background. Graphs are extracted from the thinned binary image to represent the topological line structures of scene contents. Then, a graph classifier, namely the gBoost classifier, is trained with the intent to seek out the inherent structures of characters and their non-character counterparts. The experimental results show that the proposed detector achieves a classification accuracy of about 70%, which demonstrates the existence and separability of the inherent structures.
125. Wang Song, Seiichi Uchida, Marcus Liwicki, Part-Based Recognition of Arbitrary Fonts, 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2013.41, 170-174, 2013.08, In this paper, the part-based recognition method is introduced and applied to the arbitrary font recognition. The principle of the part-based method is to represent the character image as a set of parts and then recognize the image by finding the most possible parts set from the reference database. Since the part-based method does not rely on the global structure of a character, it is supposed to be robust against the variant appearances of the character. The experiment results indicate that it is possible to apply the part-based method to the font recognition, which is always considered as a difficult task by most of the researchers..
126. Rong Huang, Palaiahnakote Shivakumara, Seiichi Uchida, Scene Character Detection by an Edge-Ray Filter, 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2013.99, 462-466, 2013.08, Edges are a valuable clue for the scene character detection task. Generally, the existing edge-based methods rely on the assumption of a straight text line to prune away the non-character candidates. This paper proposes a new edge-based method, called the edge-ray filter, to detect scene characters. The main contribution of the proposed method lies in filtering out complex backgrounds by fully utilizing the essential spatial layout of edges instead of the assumption of a straight text line. Edges are extracted by a combination of Canny and the Edge Preserving Smoothing Filter (EPSF). To effectively boost the filtering strength of the designed edge-ray filter, we employ a new Edge Quasi-Connectivity Analysis (EQCA) to unify complex edges as well as contours of broken characters. Label Histogram Analysis (LHA) then filters out non-character edges and redundant rays by setting proper thresholds. Finally, two frequently used heuristic rules, namely aspect ratio and occupation, are exploited to remove distinct false alarms. In addition to having the ability to handle special scenarios, the proposed method can accommodate dark-on-bright and bright-on-dark characters simultaneously, and provides accurate character segmentation masks. We perform experiments on the benchmark ICDAR 2011 Robust Reading Competition dataset as well as scene images with special scenarios. The experimental results demonstrate the validity of our proposal.
127. Takashi Kimura, Rong Huang, Seiichi Uchida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, The Reading-life Log - Technologies to Recognize Texts That We Read, 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 10.1109/ICDAR.2013.26, 91-95, 2013.08, A reading-life log is a type of technique for automatically and unconsciously recording people's reading intentions, interests and habits. Besides, it can also serve as various assistants in our daily life. In this paper, a reading-life log system is implemented by a head-mounted and unobtrusive video camera with a high resolution and a high shutter speed. We utilize DP matching, and propose a text-based frame mosaicing method to integrate multiple frames in a clip. The developed system is tested in various indoor and outdoor environments. The experimental results show that our system can provide reliable outputs with respect to the most correct responses. The infrequent misregistration between lines also indicates the feasibility and validity of the text-based frame mosaicing.
128. Soma Shiraishi, Yaokai Feng, Seiichi Uchida, Skew Estimation by Parts, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.E96.D.1503, E96D, 7, 1503-1512, 2013.07, This paper proposes a new part-based approach for skew estimation of document images. The proposed method first estimates skew angles on rather small areas, which are the local parts of characters, and subsequently determines the global skew angle by aggregating those local estimations. A local skew estimation on a part of a skewed character is performed by finding an identical part from prepared upright character images and calculating the angular difference. Specifically, a keypoint detector (e.g. SURF) is used to determine the local parts of characters, and once the parts are described as feature vectors, a nearest neighbor search is conducted in the instance database to identify the parts. Finally, a local skew estimation is acquired by calculating the difference of the dominant angles of brightness gradient of the parts. After the local skew estimation, the global skew angle is estimated by the majority voting of those local estimations, disregarding some noisy estimations. Our experiments have shown that the proposed method is more robust to short and sparse text lines and non-text backgrounds in document images compared to conventional methods..
129. Koichi Kise, Riki Kudo, Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, A Proposal of Writing-Life Log and Its Implementation Using a Retrieval-Based Camera-Pen, Proceedings of the 16th International Graphonomics Society Conference (IGS 2013), 86-89, 2013.06.
130. Seiichi Uchida, Image processing and recognition for biological images, DEVELOPMENT GROWTH & DIFFERENTIATION, 10.1111/dgd.12054, 55, 4, 523-549, 2013.05, This paper reviews image processing and pattern recognition techniques, which will be useful to analyze bioimages. Although this paper does not provide their technical details, it will be possible to grasp their main tasks and typical tools to handle the tasks. Image processing is a large research area to improve the visibility of an input image and acquire some valuable information from it. As the main tasks of image processing, this paper introduces gray-level transformation, binarization, image filtering, image segmentation, visual object tracking, optical flow and image registration. Image pattern recognition is the technique to classify an input image into one of the predefined classes and also has a large research area. This paper overviews its two main modules, that is, feature extraction module and classification module. Throughout the paper, it will be emphasized that bioimage is a very difficult target for even state-of-the-art image processing and pattern recognition techniques due to noises, deformations, etc. This paper is expected to be one tutorial guide to bridge biology and image processing researchers for their further collaboration to tackle such a difficult target..
131. Seiichi Uchida, Masahiro Fukutomi, Koichi Ogawara, Yaokai Feng, Non-Markovian Dynamic Time Warping, 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2294-2297, 2012.11, This paper proposes a new dynamic time warping (DTW) method, called non-Markovian DTW. In the conventional DTW, the warping function is generally optimized by dynamic programming (DP) subject to some Markovian constraints which restrict the relationship between neighboring time points. In contrast, the non-Markovian DTW can introduce non-Markovian constraints for dealing with the relationship between points with a large time interval. This new and promising ability of DTW is realized by using graph cut as the optimizer of the warping function instead of DP. Specifically, the conventional DTW problem is first converted into an equivalent minimum cut problem on a graph and then edges representing the non-Markovian constraints are added to the graph. An experiment on online character recognition showed the advantage of using non-Markovian constraints during DTW.
132. Song Wang, Seiichi Uchida, Marcus Liwicki, Part-Based Method on Handwritten Texts, 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 339-342, 2012.11, This paper reports a trial of handwritten text recognition by a part-based method. The part-based method recognizes individual characters by their parts without considering their whole shape. This realizes great robustness to severe deformations. This robustness is also effective for text recognition. Especially, for handwritten texts whose segmentation into individual characters is very difficult by deep touching and heavy slant, the part-based method still can recognize them because it does not request segmentation results to provide their whole shapes. Experimental results using digit sequences proved this robustness..
133. Rong Huang, Shinpei Oba, Shivakumara Palaiahnakote, Seiichi Uchida, Scene Character Detection and Recognition Based on Multiple Hypotheses Framework, 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 717-720, 2012.11, To handle the diversity of scene characters, we propose a multiple hypotheses framework which consists of an image operator set module, an optical character recognition (OCR) module, and an integration module. Image operators detect multiple suspicious character areas. The OCR engine is then applied to each detected area and returns multiple candidates with weight values for future integration. Without the aid of heuristic constraints on area, aspect ratio, color, etc., the integration module prunes redundant detections and pads missing detections based on the outputs of OCR. The experimental results demonstrate that the whole multiple hypotheses framework outperforms each operator's hypotheses and is comparable with existing methods in terms of recall, precision, F-measure and recognition rate.
134. Seiichi Uchida, Satoshi Hokahori, Yaokai Feng, Analytical Dynamic Programming Matching, COMPUTER VISION - ECCV 2012: WORKSHOPS AND DEMONSTRATIONS, PT I, 10.1007/978-3-642-33863-2_10, 7583, 92-101, 2012.09, In this paper, we show that the truly two-dimensional elastic image matching problem can be solved analytically using dynamic programming (DP) in polynomial time if the problem is formulated as a maximum a posteriori problem using Gaussian distributions for the likelihood and prior. After giving the derivation of the analytical DP matching algorithm, we evaluate its performance on handwritten character images containing various nonlinear deformations, and compare it with other elastic image matching methods.
135. Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng, Character Image Patterns as Big Data, 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 10.1109/ICFHR.2012.190, 479-484, 2012.09, The ambitious goal of this research is to understand the real distribution of character patterns. Ideally, if we can collect all possible character patterns, we can totally understand how they are distributed in the image space. In addition, we also have the perfect character recognizer because we know the correct class for any character image. Of course, it is practically impossible to collect all those patterns; however, if we collect character patterns massively and analyze how the distribution changes according to the increase of patterns, we will be able to estimate the real distribution asymptotically. For this purpose, we use 822,714 manually ground-truthed 32 x 32 handwritten digit patterns in this paper. The distribution of those patterns is observed by nearest neighbor analysis and network analysis, both of which do not make any approximation (such as low-dimensional representation) and thus do not corrupt the details of the distribution.
136. Minoru Mori, Seiichi Uchida, Hitoshi Sakano, Dynamic Programming Matching with Global Features for Online Character Recognition, 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 10.1109/ICFHR.2012.199, 348-353, 2012.09, This paper proposes a dynamic programming (DP) matching method with global features for online character recognition. Many online character recognition methods have utilized the ability of DP matching on compensating temporal fluctuation. On the other hand, DP requires the Markovian property on its matching process. Consequently, most traditional DP matching methods have utilized local information of strokes such as xy-coordinates or local directions as features, because it is easy to satisfy the Markovian property with those features. Unfortunately, these local features cannot represent global structure of character shapes. Although global features that extract global structures of characters have high potential to represent various key characteristics of character shapes, conventional DP matching methods cannot handle global features. This is because the incorporation of global features is not straightforward due to the Markovian property of DP. In this paper we propose a new scheme for DP matching using global features. Our method first selects global features which not only satisfy the Markovian property but also have sufficient discrimination ability. By embedding the selected global features into DP matching process, we can compensate temporal fluctuation while considering the global structure of the pattern. Experimental results show that our methods can enhance the recognition accuracy for online numeral characters..
137. Yutaro Iwakiri, Soma Shiraishi, Yaokai Feng, Seiichi Uchida, On the Possibility of Instance-Based Stroke Recovery, 13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 10.1109/ICFHR.2012.248, 29-34, 2012.09, This paper tackles the stroke recovery problem, which is a typical ill-posed inverse problem, by an instance-based method. The basic idea of the instance-based stroke recovery is to refer to the drawing order of a similar instance. The instance-based method has a strong merit that it can deal with multi-stroke characters and other complex characters without any special consideration. However, it requires a sufficient number of instances to cover those various characters. As an initial trial of the instance-based stroke recovery method, this paper describes the principle of the method and then provides several experimental results. The experimental results indicate the potential of the proposed method on recovering the drawing order of complex characters, as expected.
138. Masakazu Iwamura, Akira Horimatsu, Ryo Niwa, Koichi Kise, Seiichi Uchida, Shinichiro Omachi, Affine-invariant character recognition by progressive removing, ELECTRICAL ENGINEERING IN JAPAN, 10.1002/eej.22276, 180, 2, 55-63, 2012.07, Recognizing characters in scene images suffering from perspective distortion is a challenge. Although there are some methods to overcome this difficulty, they are time-consuming. In this paper, we propose a set of affine-invariant features and a new recognition scheme called progressive removing that can help reduce the processing time. Progressive removing gradually removes less feasible categories and skew angles by using multiple classifiers. We observed that progressive removing and the use of the affine-invariant features reduced the processing time by about 60% in comparison to a trivial algorithm without decreasing the recognition rate. (c) 2012 Wiley Periodicals, Inc.
139. Y. Furusawa, M. Imanishi, S. Hirata, S. Uchida, K. Nakano, K. Hayashi, Fluorescence Sensing Film for Odor Imaging, Proceedings of the 6th Asia-Pacific Conference on Transducers and Micro/Nano Technologies, 2012.07.
140. Michael Blumenstein, Umapada Pal, Seiichi Uchida, Message from general chair and program chairs, Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, 10.1109/DAS.2012.55, xii-xiii, 2012.05.
141. Soma Shiraishi, Yaokai Feng, Seiichi Uchida, A part-based skew estimation method, Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, 10.1109/DAS.2012.7, 185-189, 2012.03, In this paper we propose a part-based skew estimation method which is more robust to larger varieties of text images, such as camera-captured scene images. Specifically, the skew angle at each local part of the input image is estimated independently by referring to the local parts of upright character images stored in a database. Then the global skew angle is estimated by aggregating the estimated local skews. The proposed method does not assume that characters are laid out in straight lines and thus is more robust to the varieties of text images than conventional methods. The experimental results show the advantage of the proposed method over the conventional methods under several conditions. © 2012 IEEE.
142. Minoru Mori, Seiichi Uchida, Hitoshi Sakano, How important is global structure for characters?, Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, 10.1109/DAS.2012.41, 255-260, 2012.03, This paper studies the importance of the features that represent the global structure of character strokes to character recognition. Most existing character recognition methods based on character stroke features utilize a set or a sequence of local features such as xy-coordinates and local direction of strokes. This is natural from the viewpoint that each stroke is a trajectory and thus can be represented as a sequence of local features. This viewpoint, however, has a clear limitation in that local features cannot deal with global structure directly. For example, the sequence of local features cannot deal with the fact that the two end points of character "0" should be close to each other. In this paper we propose a simple and novel global feature that describes the global structure of the character shape of each class. We prove the importance of the global feature through a feature selection experiment. Specifically, we show that the global features are more often selected than local features to enhance classification accuracy under the AdaBoost-based machine learning framework. Recognition experiments using online numeral data show also that the use of global features improves recognition accuracy. © 2012 IEEE..
143. Asif Shahab, Faisal Shafait, Andreas Dengel, Seiichi Uchida, How salient is scene text?, Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, 10.1109/DAS.2012.42, 317-321, 2012.03, Computational models of visual attention use image features to identify salient locations in an image that are likely to attract human attention. Attention models have been quite effectively used for various object detection tasks. However, their use for scene text detection is under-investigated. As a general observation, scene text often conveys important information and is usually prominent or salient in the scene itself. In this paper, we evaluate four state-of-the-art attention models for their response to scene text. Initial results indicate that saliency maps produced by these attention models can be used for aiding scene text detection algorithms by suppressing non-text regions. © 2012 IEEE..
144. Wang Song, Seiichi Uchida, Marcus Liwicki, Toward part-based document image decoding, Proceedings - 10th IAPR International Workshop on Document Analysis Systems, DAS 2012, 10.1109/DAS.2012.90, 266-270, 2012.03, Document image decoding (DID) is a trial to understand the contents of a whole document without any reference information about font, language, etc. Typically, DID approaches assume the correct segmentation of the document and some a priori knowledge about the language or the script. Unfortunately, this assumption will not hold if we deal with various documents, such as documents with various sized fonts, camera-captured documents, free-layout documents, or historical documents. In this paper, we propose a part-based character identification method where no segmentation into characters is necessary and no a priori information about the document is needed. The approach clusters similar key points and groups frequent neighboring key point clusters. Then a second iteration is performed, i.e., the groups are again clustered and, optionally, pairs of frequent group clusters are detected. Our first experimental results on multi font-size documents already look very promising. We could find nearly perfect correspondences between characters and detected group clusters. © 2012 IEEE.
145. Hirotaka Matsuo, Yudai Furusawa, Masashi Imanishi, Seiichi Uchida, Kenshi Hayashi, Optical odor imaging by fluorescence probes, Journal of Robotics and Mechatronics, 10.20965/jrm.2012.p0047, 24, 1, 47-54, 2012.02, Odor gas detection is important for the detection of explosives, environmental sensing, biometrics, foodstuffs and a comfortable life. Such odor-source localization is an active research area for robotics. In this study, we tried to detect odor chemicals with an optical method that can be applied for the spatiotemporal detection of odor. We used four types of fluorescence dyes: tryptophan, quinine sulfate, acridine orange, and 1-anilinonaphthalene-8-sulfonate (ANS). As analytes, we measured the following four odor chemicals: 2-furaldehyde, vanillin, acetophenone, and benzaldehyde. The fluorescence-quenching mechanism of PET (Photoinduced Electron Transfer) or FRET (Fluorescence Resonance Energy Transfer), which occurs between fluorescence dyes and odor compounds, could prevent unintended detection of various odorants that is caused by their unspecific adsorption onto the detecting materials. The fluorescence changes were then observed. Thus, we could detect the odor substances through fluorescence quenching by using the fluorescence dyes. Odor information could be obtained by response patterns across all the fluorescence dyes. Moreover, we captured odor images with a cooled CCD camera. Shapes of the targets that emitted odor could be roughly recognized by the odor-shape images. From the spatiotemporal images of odors, the two-dimensional odor expanse could be obtained.
146. Soma Shiraishi, Yaokai Feng, Seiichi Uchida, Part-Based Skew Estimation for Mathematical Expressions, Proceedings of The International Workshop on "Digitization and E-Inclusion in Mathematics and Science 2012" (DEIMS12, Tokyo, Japan), 2012.02.
147. Hirotaka Matsuo, Yudai Furusawa, Masashi Imanishi, Seiichi Uchida, Kenshi Hayashi, Optical odor imaging by fluorescence probes, Journal of Robotics and Mechatronics, 10.20965/jrm.2012.p0047, 24, 1, 47-54, 2012.02, Odor gas detection is important for the detection of explosives, environmental sensing, biometrics, foodstuffs and a comfortable life. Such odor-source localizations is an active research area for robotics. In this study, we tried to detect odor chemicals with an optical method that can be applied for the spatiotemporal detection of odor. We used four types of fluorescence dyes; tryptophan, quinine sulfate, acridine orange, and 1-anilinonaphthalene-8-sulfonate (ANS). As analyses, we measured the following four odor chemicals, 2-furaldehyde, vanillin, acetophenone, and benzaldehyde. The fluorescence-quenching mechanism of PET (Photoinduced Electron Transfer) or FRET (Fluorescence Resonance Electron Transfer), which occur between fluorescence dyes and odor compounds, could prevent unintended detection of various odorants that is caused by their unspecific adsorption onto the detecting materials. The fluorescence changes were then observed. Thus, we could detect the odor substances through fluorescent quenching by using the fluorescence dyes. Odor information could be obtained by response patterns across all the fluorescence dyes. Moreover, we captured odor images with a cooled CCD camera. Shapes of the targets that emitted odor could be roughly recognized by the odor-shape images. From the spatiotemporal images of odors, twodimensional odor expanse could be obtained..
148. Hirotaka Matsuo, Yudai Furusawa, Masashi Imanishi, Seiichi Uchida, Kenshi Hayashi, Optical odor imaging by fluorescence probes, Journal of Robotics and Mechatronics, 10.20965/jrm.2012.p0047, 24, 1, 47-54, 2012.02, Odor gas detection is important for the detection of explosives, environmental sensing, biometrics, foodstuffs and a comfortable life. Such odor-source localizations is an active research area for robotics. In this study, we tried to detect odor chemicals with an optical method that can be applied for the spatiotemporal detection of odor. We used four types of fluorescence dyes; tryptophan, quinine sulfate, acridine orange, and 1-anilinonaphthalene-8-sulfonate (ANS). As analyses, we measured the following four odor chemicals, 2-furaldehyde, vanillin, acetophenone, and benzaldehyde. The fluorescence-quenching mechanism of PET (Photoinduced Electron Transfer) or FRET (Fluorescence Resonance Electron Transfer), which occur between fluorescence dyes and odor compounds, could prevent unintended detection of various odorants that is caused by their unspecific adsorption onto the detecting materials. The fluorescence changes were then observed. Thus, we could detect the odor substances through fluorescent quenching by using the fluorescence dyes. Odor information could be obtained by response patterns across all the fluorescence dyes. Moreover, we captured odor images with a cooled CCD camera. Shapes of the targets that emitted odor could be roughly recognized by the odor-shape images. From the spatiotemporal images of odors, twodimensional odor expanse could be obtained..
152. Hirotaka Matsuo, Yudai Furusawa, Masashi Imanishi, Seiichi Uchida, and Kenshi Hayashi, Optical Odor Imaging by Fluorescence Probes, Journal of Robotics and Mechatronics, 2012.01.
153. Wenjie Cai, Seiichi Uchida, Hiroaki Sakoe, Toward Forensics by Stroke Order Variation - Performance Evaluation of Stroke Correspondence Methods, 4th International Workshop, IWCF 2010 Tokyo, Japan, November 11-12, 2010, Revised Selected Papers, 10.1007/978-3-642-19376-7_4, 6540, 43-+, 2011.11, We consider personal identification using stroke order variations of online handwritten character patterns, which are written on, e.g., electric tablets. To extract the stroke order variation of an input character pattern, it is necessary to establish the accurate stroke correspondence between the input pattern and the reference pattern of the same category. In this paper we compare five stroke correspondence methods: the individual correspondence decision (ICD), the cube search (CS), the bipartite weighted matching (BWM), the stable marriage (SM), and the deviation-expansion model (DE). After a brief review, they are compared experimentally and quantitatively in terms of not only stroke correspondence accuracy but also character recognition accuracy. The experimental results showed the superiority of CS and BWM over ICD, SM and DE.
154. Seiichi Uchida, Toru Sasaki, Feng Yaokai, A Generative Model for Handwritings Based on Enhanced Feature Desynchronization, 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 10.1109/ICDAR.2011.124, 589-593, 2011.09, A new generative model of handwriting patterns is proposed for interpreting their deformations. The model is based on feature desynchronization, which is a coupling process of x and y coordinate features of different timings. By changing the timings to be coupled, the model can generate various deformed patterns from a single pattern. The model is further enhanced by incorporating an adaptive rotation at each timing for increasing the variety of deformed patterns. An important fact is that this enhanced desynchronization model can be interpreted intuitively as a deformation process in actual handwriting. Experimental results showed that the model can generate various handwriting patterns close to actual deformed patterns..
155. Seiichi Uchida, Yuki Shigeyoshi, Yasuhiro Kunishige, Feng Yaokai, A Keypoint-Based Approach Toward Scenery Character Detection, 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 10.1109/ICDAR.2011.168, 819-823, 2011.09, This paper proposes a new approach toward scenery character detection. This is a keypoint-based approach where local features and a saliency map are fully utilized. Local features, such as SIFT and SURF, have been commonly used for computer vision and object pattern recognition problems; however, they have rarely been employed in character recognition and detection problems. Local features, however, are similar to directional features, which have been employed in character recognition applications. In addition, local features can detect corners and are thus suitable for detecting characters, which are generally comprised of many corners. To evaluate the performance of the local features, an experiment was conducted, and its results showed that SURF, i.e., a simple gradient feature, can detect about 70% of characters in scenery images. Then the saliency map was employed as an additional feature to the local feature. This trial is based on the expectation that scenery characters are generally printed to be salient and thus a more salient area will have a higher probability of being a character area. An experimental result showed that this expectation was reasonable and that better discrimination accuracy can be obtained with the saliency map.
156. Soma Shiraishi, Yaokai Feng, Seiichi Uchida, A new approach for instance-based skew estimation, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10.1007/978-3-642-23866-6_21, 6884, 4, 195-203, 2011.09, This paper proposes a new approach to estimating the skew angle of a rotated document image. This is realized by using Speeded-Up Robust Features (SURF), and the goal is to enable the image to be rotated back to the correct orientation. SURF detects a number of keypoints both from the reference image, on which a set of standard alphabet letters (e.g., the letters 'a' through 'z' in a certain font) is written, and from the image of the rotated document. The two nearest features, one from the reference image and one from the input image, are compared to decide by how many degrees the feature in the input image is rotated. Finally, the skew angle of the whole input image (the global skew angle) is decided by the majority of the votes of the angles calculated as described above. © 2011 Springer-Verlag.
157. Wang Song, Seiichi Uchida, Marcus Liwicki, Comparative Study of Part-Based Handwritten Character Recognition Methods, 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 10.1109/ICDAR.2011.167, 814-818, 2011.09, The purpose of this paper is to introduce three part-based methods for handwritten character recognition and then compare their performances experimentally. All of those methods decompose handwritten characters into "parts". Then some recognition processes are done in a part-wise manner and, finally, the recognition results at all the parts are combined via voting to have the recognition result of the entire character. Since part-based methods do not rely on the global structure of the character, we can expect their robustness against various deformations. Three voting methods have been investigated for the combination: single voting, multiple voting, and class distance. All of them use different strategies for voting. Experimental results on the MNIST database showed the relative superiority of the class distance method and the robustness of the multiple voting method against the reduction of training set..
158. Akira Yoshida, Marcus Liwicki, Seiichi Uchida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Handwriting on paper as a cybermedium, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10.1007/978-3-642-23866-6_22, 6884, 4, 204-211, 2011.09, In this paper, we report recent work on the data-embedding pen, which adds an ink-dot sequence along a handwritten pattern during writing. The ink-dot sequence represents some information, such as the writer's name, the date of writing, and a URL. This information drastically increases the value of handwriting on paper. The embedded information can be extracted from the handwritten pattern by image processing techniques and a stroke recovery technique. Consequently, we can augment the handwritten pattern by the data-embedding pen to carry arbitrary information. © 2011 Springer-Verlag.
159. Wang Song, Seiichi Uchida, Marcus Liwicki, Look Inside the World of Parts of Handwritten Characters, 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 10.1109/ICDAR.2011.161, 784-788, 2011.09, Part-based recognition is expected to be robust in difficult handwritten character recognition tasks. This is because part-based recognition is based on aggregation of independent recognition results at individual local parts without considering their global relations and thus is robust against various deformations, such as partial occlusion, overlap, broken stroke, etc. Since part-based recognition is a new approach, there are still several open problems toward its practical use. For example, compared with entire images, local parts are more ambiguous, i.e., less discriminative. For better recognition accuracy and less computations, we need to know the characteristics of local parts and then, for example, discard less discriminative parts. The purpose of this paper is to conduct some experiments in order to observe and analyze how the local parts of multiple classes are distributed in feature spaces. By handling parts appropriately based on the analysis, we will be able to enhance the usefulness of the part-based method..
160. Marcus Liwicki, Yoshida Akira, Seiichi Uchida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Reliable Online Stroke Recovery from Offline Data with the Data-Embedding Pen, 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 10.1109/ICDAR.2011.278, 1384-1388, 2011.09, In this paper we propose a complete system for online stroke recovery from offline data. The key idea of our approach is to use a novel pen device which is able to embed meta information into the ink during writing the strokes. This pen-device overcomes the need to get access to any memory on the pen when trying to recover the information, which is especially useful in multi-writer or multi-pen scenarios. The actual data-embedding is achieved by an additional ink-dot sequence along a handwritten pattern during writing. We design the ink-dot sequence in such a way that it is possible to retrieve the writing direction from a scanned image. Furthermore, we propose novel processing steps in order to retrieve the original writing direction and finally the embedded data. In our experiments we show that we can reliably recover the writing direction of various patterns. Our system is able to determine the writing direction of straight lines, simple patterns with crossings (e.g., "x" and "ll"), and even more complex patterns like handwritten words and symbols..
161. Yasuhiro Kunishige, Feng Yaokai, Seiichi Uchida, Scenery Character Detection with Environmental Context, 11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 10.1109/ICDAR.2011.212, 1049-1053, 2011.09, For scenery character detection, we introduce environmental context, which is modeled by scene components, such as sky and building. Environmental context is expected to regulate the probability of character existence at a specific region in a scenery image. For example, if a region looks like a part of a building, the region has a higher probability than another region like a part of the sky. In this paper, environmental context is represented by state-of-the-art texture and color features and utilized in two different ways. Through experimental results, it was clearly shown that the environmental context has an effect of improving detection accuracy..
162. Seiichi Uchida, Wenjie Cai, Akira Yoshida, Yaokai Feng, WATCHING PATTERN DISTRIBUTION VIA MASSIVE CHARACTER RECOGNITION, 2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 10.1109/MLSP.2011.6064640, 1-6, 2011.09, The purpose of this paper is to analyze how image patterns are distributed inside their feature space. For this purpose, 832,612 manually ground-truthed handwritten digit patterns are used. The use of character patterns instead of general visual object patterns is essential for our purpose. First, since there are only 10 classes for digits, it is possible to have a sufficient number of patterns per class. Second, since the feature space of small binary character images is rather compact, it is easier to observe the precise pattern distribution with a fixed number of patterns. Third, the classes of character patterns can be defined far more clearly than those of visual objects. Through nearest neighbor analysis on 832,612 patterns, their distribution in the 32 x 32 binary feature space is observed quantitatively and qualitatively. For example, the visual similarity of nearest neighbors and the existence of outliers, which are surrounded by patterns from different classes, are observed.
163. Affine invariant character recognition by progressive removing
Recognizing characters in scene images suffering from perspective distortion is a challenge. Although there are some methods to overcome this difficulty, they are time-consuming. In this paper, we propose a set of affine invariant features and a new recognition scheme called "progressive removing" that can help reduce the processing time. Progressive removing gradually removes less feasible categories and skew angles by using multiple classifiers. We observed that progressive removing and the use of the affine invariant features reduced the processing time by about 60% in comparison to a trivial one without decreasing the recognition rate. © 2011 The Institute of Electrical Engineers of Japan..
164. A. Nedzved, O. Nedzved, Sergey Ablameyko, Seiichi Uchida, Object Extraction at Nano-Surface Images, Proceedings of The Eleventh International Conference on Pattern Recognition and Information Processing (PRIP2011, Minsk, Belarus), 2011.05.
165. Object tracking with RFID
This paper reports a new method for visual tracking of humans using active RFID technology. Previous studies were based on the assumption that the radio intensity from an RFID tag will be linearly proportional to the distance between the tag and the antenna or will remain unchanged; however, in reality, the intensity fluctuates significantly and changes drastically with a small change in the environment. The proposed method helps to overcome this problem by using only accurate binary information that reveals whether the target person is close to the antenna. Several experimental results have shown that the information from the RFID tag was useful for reliable tracking of humans. © 2011 The Institute of Electrical Engineers of Japan.
166. Wenjie Cai, Yaokai Feng, Seiichi Uchida, Massive character recognition with a large ground-truthed database, Proceedings of the ACM Symposium on Applied Computing, 10.1145/1982185.1982241, 240-244, 2011.03, In character recognition, multiple prototype classifiers, where multiple patterns are prepared as representative patterns of each class, have often been employed to improve recognition accuracy. Our question is how we can improve the recognition accuracy by increasing prototypes massively in the multiple prototype classifier. In this paper, we will answer this question through several experimental analyses, using a simple 1-nearest neighbor (1-NN) classifier and about 550,000 manually labeled handwritten numeral patterns. The analysis results under the leave-one-out evaluation showed not only a simple fact that more prototypes provide fewer recognition errors, but also a more important fact that the error rate decreases approximately to 40% by increasing the prototypes 10 times. The analysis results also showed other phenomena in massive character recognition, such that the NN prototypes become visually closer to the input pattern by increasing the prototypes. © 2011 ACM..
167. Seiichi Uchida, Ikko Fujimura, Hiroki Kawano, Yaokai Feng, Analytical Dynamic Programming Tracker, COMPUTER VISION-ACCV 2010, PT I, 10.1007/978-3-642-19315-6_23, 6492, 296-309, 2010.11, Visual tracking is formulated as an optimization problem of the position of a target object on video frames. This paper proposes a new tracking method based on dynamic programming (DP). Conventional DP-based tracking methods have utilized DP as an efficient breadth-first search algorithm. Thus, their computational complexity becomes prohibitive if the search breadth becomes large according to the increase of the number of parameters to be optimized. In contrast, the proposed method can avoid this problem by utilizing DP as an analytical solver rather than the conventional breadth-first search algorithm. In addition to experimental evaluations, it will be revealed that the proposed method has a close relation to the well-known KLT tracker..
168. Akihiro Mori, Seiichi Uchida, Ryo Kurazume, Rin-ichiro Taniguchi, Tsutomu Hasegawa, Automatic Construction of Gesture Network for Gesture Recognition, TENCON 2010: 2010 IEEE REGION 10 CONFERENCE, 10.1109/TENCON.2010.5686549, 923-928, 2010.11, This paper is concerned with an automatic construction algorithm for gesture networks. A gesture network is a network model of gestures for gesture recognition, especially early recognition and motion prediction. Manual construction of a gesture network is inefficient because the network has to be reconstructed whenever the target gestures are changed; thus, an automatic construction method is desired. This paper proposes an automatic construction algorithm for gesture networks based on logical DP matching. An experiment was conducted to evaluate the performance of the automatically constructed gesture network. The experimental result indicated that the proposed automatic construction algorithm can be an alternative to manual construction.
169. Marcus Liwicki, Seiichi Uchida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Embedding Meta-information in handwriting - Reed-solomon for reliable error correction, Proceedings - 12th International Conference on Frontiers in Handwriting Recognition, ICFHR 2010, 10.1109/ICFHR.2010.127, 51-56, 2010.11, In this paper a more compact and more reliable coding scheme for the data-embedding pen is proposed. The data-embedding pen produces an additional ink-dot sequence along a handwritten pattern during writing. The ink-dot sequence represents, for example, meta-information (such as the writer's name and the date of writing) and thus drastically increases the value of the handwriting on a physical paper. There is no need to get access to any memory on the pen to recover the information, which is especially useful in multi-writer or multi-pen scenarios. In this paper we focus on the compactness of the encoded information. The aim of this paper is to encode as much information as possible in short stroke sequences. In our experiments we show that we can embed more information in shorter strokes than in previous work. In straight lines as short as 5 cm, 32 bits can successfully be embedded. Furthermore, the new encoding scheme also works reliably on more complex patterns. © 2010 IEEE..
170. Seiichi Uchida, Marcus Liwicki, Part-based recognition of handwritten characters, Proceedings - 12th International Conference on Frontiers in Handwriting Recognition, ICFHR 2010, 10.1109/ICFHR.2010.90, 545-550, 2010.11, In the part-based recognition method proposed in this paper, a handwritten character image is represented by just a set of local parts. Then, each local part of the input pattern is recognized by a nearest-neighbor classifier. Finally, the category of the input pattern is determined by aggregating the local recognition results. This approach is opposed to conventional character recognition approaches which try to benefit from the global structure information as much as possible. Despite a pessimistic expectation, we have reached recognition rates much higher than 90% for a digit recognition task. In this paper we provide a detailed analysis in order to understand the results and find the merits of the local approach. © 2010 IEEE..
171. Kazumasa Iwata, Koichi Kise, Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, Tracking and retrieval of pen tip positions for an intelligent camera pen, Proceedings - 12th International Conference on Frontiers in Handwriting Recognition, ICFHR 2010, 10.1109/ICFHR.2010.50, 277-282, 2010.11, This paper presents a method of recovering digital ink for an intelligent camera pen, which is characterized by the functions that (1) it works on ordinary paper and (2) if an electronic document is printed on the paper the recovered digital ink is associated with the document. Two technologies called paper fingerprint and document image retrieval are integrated for realizing the above functions. The key of the integration is the introduction of image mosaicing and fast retrieval of previously seen fingerprints based on hashing of SURF local features. From the experimental results of 50 handwritings, we have confirmed that the proposed method is effective to recover and locate the digital ink from the handwriting on a physical paper. © 2010 IEEE..
172. Seiichi Uchida, Marcus Liwicki, Analysis of local features for handwritten character recognition, Proceedings - International Conference on Pattern Recognition, 10.1109/ICPR.2010.479, 1945-1948, 2010.08, This paper investigates a part-based recognition method of handwritten digits. In the proposed method, the global structure of digit patterns is discarded by representing each pattern by just a set of local feature vectors. The method is then comprised of two steps. First, each of J local feature vectors of a target pattern is recognized into one of ten categories ("0"-"9") by the nearest neighbor discrimination with a large database of reference vectors. Second, the category of the target pattern is determined by the majority voting on the J local recognition results. Despite a pessimistic expectation, we have reached recognition rates much higher than 90% for the task of digit recognition. © 2010 IEEE..
173. Toru Wakahara, Seiichi Uchida, Hierarchical decomposition of handwriting deformation vector field for improving recognition accuracy, Proceedings - International Conference on Pattern Recognition, 10.1109/ICPR.2010.459, 1860-1863, 2010.08, This paper addresses the problem of how to extract, describe, and evaluate handwriting deformation from the deterministic viewpoint for improving recognition accuracy. The key ideas are threefold. The first is to extract handwriting deformation vector field (DVF) between a pair of input and target images by 2D warping. The second is to hierarchically decompose the DVF by a parametric deformation model of global/local affine transformation, where local affine transformation is iteratively applied to the DVF by decreasing window sizes. The third is to accept only low-order deformation components as natural, within-class handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B show that correlation-based matching absorbing components of global affine transformation and local affine transformation up to the 3rd order achieved a higher recognition rate of 92.1% than that of 87.0% obtained by original 2D warping. © 2010 IEEE..
174. Akio Fujiyoshi, Masakazu Suzuki, Seiichi Uchida, Grammatical verification for mathematical formula recognition based on context-free tree grammar, Mathematics in Computer Science, 10.1007/s11786-010-0023-8, 3, 3, 279-298, 2010.05, This paper proposes the use of a formal grammar for the verification of mathematical formulae for a practical mathematical OCR system. Like a C compiler detecting syntax errors in a source file, we want to have a verification mechanism to find errors in the output of mathematical OCR. A linear monadic context-free tree grammar (LM-CFTG) is employed as a formal framework to define "well-formed" mathematical formulae. A cubic time parsing algorithm for LM-CFTGs is presented. For the purpose of practical evaluation, a verification system for mathematical OCR is developed, and the effectiveness of the system is demonstrated by using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study. © 2010 Birkhäuser Verlag Basel/Switzerland..
175. A General Assignment of Supplementary Information
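To avoid misrecognition that is essentially unavoidable with features alone, a framework of pattern recognition using supplementary information has been proposed. In this framework, a small amount of information that assists class determination, called supplementary information, is used together with the features during pattern recognition, with the aim of improving recognition performance. The supplementary information can be set freely and is usually set so that the misrecognition rate is minimized. The issue here is how to assign the supplementary information so that the misrecognition rate is minimized. Under the ideal condition that correct supplementary information is always obtained, the problem has already been formulated and an assignment method has been derived. For use in real environments, however, an assignment method that takes into account the observation errors arising in the supplementary information is required. In this paper, we therefore newly formulate the problem while taking the observation errors of the supplementary information into account. This formulation is a general one that is also valid when the supplementary information contains no errors. We illustrate through experiments using the Mahalanobis distance that the derived assignment method works effectively.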
176. Recognition of Sequential Patterns by Combining Mutually Constrained Local Classifiers
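In this paper, we study a method for recognizing sequential patterns in which recognition, i.e., the determination of a class label, is performed at each sample point (each time), and the class is finally decided by majority voting over the class labels. One of its characteristics is that, where necessary, mutual constraints are imposed among multiple sample points so that they are labeled with the same class as far as possible. This makes it possible to control how class labels are assigned and enables highly flexible classification. The number of combinations of class-label assignments increases exponentially with the total number of sample points. We therefore use a minimum-cut algorithm on graphs, the so-called graph cut, to realize computation in polynomial time with respect to the total number of sample points. Recognition experiments on online character data were conducted to verify the effectiveness of the method.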
177. Real-Time Nonlinear FEM-Based Simulator for Deforming Volume Model of Soft Organ by Neural Network
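In this paper, we propose a new method for simulating the deformation of a soft organ model using a neural network. The proposed method estimates the deformation of the model based on combinations of basic model deformations (hereafter called deformation modes). That is, the deformation modes are obtained in advance by the nonlinear finite element method, and the relationship between the external forces applied to the organ and the corresponding deformation modes is learned by a neural network. The trained neural network imitates the estimation of the model's behavior by nonlinear finite element analysis. Experimental results showed that the proposed method substantially reduces the computational cost while maintaining almost the same accuracy as nonlinear finite element analysis.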
178. Digital pen
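This article gives an overview of digital pens. A digital pen is an excellent interface that provides smooth input of characters and figures as well as quick pointing. Digital pens are broadly divided into those restricted to particular writing surfaces and those that can write on arbitrary surfaces. This article describes technologies that have already been commercialized, such as tablets and the Anoto pen, and discusses future issues.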
179. Trachea and Esophagus Classification by AdaBoost
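In tracheal intubation, one of the methods of airway management, laryngoscopy is usually first performed with a laryngoscope and the position of the glottis is confirmed visually. In actual clinical practice, however, the position of the glottis can be difficult to confirm visually for various reasons such as upper airway obstruction. If this incomplete confirmation leads to erroneous intubation into the esophagus, not only is the airway not secured, which is dangerous, but forced visual inspection may also cause complications such as injury to the cervical spine or teeth. Toward safe and reliable tracheal intubation, we aim to develop an automatic endotracheal intubation system with a small camera mounted on the tip of a stylet. In this paper, as one of its component functions, we propose a method that automatically identifies, from the images acquired by the camera, whether the intubation tube has been inserted into the trachea or the esophagus. Since the cricoid cartilage around the trachea is characteristically observed in tracheal images, the method first defines features suited to describing this ring pattern and then builds a trachea/esophagus classifier based on them using AdaBoost. In experiments, the trachea and esophagus could be discriminated with a high classification rate of 97.6%, confirming the effectiveness of the proposed method.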
180. Walaa Aly, Seiichi Uchida and Masakazu Suzuki, Extract Baseline Information Using Support Vector Machine, Proceedings of The 9th Asian Symposium on Computer Mathematics, 2009.12.
181. Walaa Aly, Seiichi Uchida, Masakazu Suzuki, Automatic Classification of Spatial Relationships among Mathematical Symbols Using Geometric Features, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.E92.D.2235, E92D, 11, 2235-2243, 2009.11, Machine recognition of mathematical expressions on printed documents is not trivial even when all the individual characters and symbols in an expression can be recognized correctly. In this paper, an automatic classification method of spatial relationships between the adjacent symbols in a pair is presented. This classification is important to realize an accurate structure analysis module of math OCR. Experimental results on very large databases showed that this classification worked well with an accuracy of 99.525% by using distribution maps which are defined by two geometric features, relative size and relative position, with careful treatment on document-dependent characteristics..
182. Visual Tracking Based on Global Optimization : DP Tracking
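Tracking of an object in video is formulated as an optimal estimation problem of the object's displacement between frames. In this paper, we propose a tracking method using dynamic programming (DP) to obtain the globally optimal solution. DP optimization, conventionally treated as a kind of breadth-first search, has the problem that the search breadth becomes very large and the computational cost increases as the image size or the number of parameters grows. In contrast, this paper applies an analytical solution of DP to the tracking problem. This is a method that introduces derivative-based optimization into the DP optimization process by approximating the local error function used for evaluating the optimization with a quadratic function. The optimal solution can be obtained analytically and quickly without breadth-first search, which makes the method particularly effective for tracking problems. This paper presents the formulation of the method and experimental results.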
183. Koichi Kise, Kazumasa Iwata, Tomohiro Nakai, Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, Document-Level Positioning of a Pen Tip by Retrieval of Image Fragments, Proceedings of the Third International Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2009), 61-68, 2009.07.
184. Seiichi Uchida, Katsuhiro Itou, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, On a Possibility of Pen-Tip Camera for the Reconstruction of Handwritings, Proceedings of the Third International Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2009), 119-126, 2009.07.
185. Seiichi Uchida, Ryoji Hattori, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Selecting and Evaluating Conspicuous Character Patterns, Proceedings of the Third International Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2009), 111-118, 2009.07.
186. Kazumasa Iwata, Koichi Kise, Tomohiro Nakai, Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, Capturing digital ink as retrieving fragments of document images, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2009.192, 1236-1240, 2009.07, This paper presents a new method of capturing digital ink for pen-based computing. Current technologies such as tablets, ultrasonic pens and the Anoto pen rely on special mechanisms for locating the pen tip, which limits their applicability. Our proposal is to ease this problem - a camera pen that allows us to write on ordinary paper for capturing digital ink. A document image retrieval method called LLAH is tuned to locate the pen tip efficiently and accurately on the coordinates of a document only by capturing its tiny fragment. In this paper, we report some results on captured digital ink and evaluate their quality. ©2009 IEEE.
187. Seiichi Uchida, Ryoji Hattori, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Conspicuous character patterns, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2009.196, 16-20, 2009.07, Detection of characters in scenery images is often a very difficult problem. Although many researchers have tackled this difficult problem and achieved a good performance, it is still difficult to suppress many false alarms and misses. This paper investigates a conspicuous character pattern, which is a special pattern designed for easier detection. In order to have an example of the conspicuous character pattern, we select a character font with a larger distance from a non-character pattern distribution and, simultaneously, with a smaller distance from a character pattern distribution. Experimental results showed that the character font selected by this method is actually more conspicuous (i.e., detected more easily) than other fonts. © 2009 IEEE.
188. Toru Wakahara, Seiichi Uchida, Hierarchical decomposition of handwriting deformation vector field using 2D warping and global/local affine transformation, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2009.33, 1141-1145, 2009.07, This paper addresses the basic problem of how to extract, describe, and evaluate handwriting deformation from not the statistical but the deterministic viewpoint. The key ideas are threefold. The first idea is to apply 2D warping to extraction of handwriting deformation vector field (DVF) between a pair of input and target images. The second idea is to hierarchically decompose the DVF by a parametric deformation model of global/local affine transformation. As a result, the DVF is expressed by a series of deformation components each of which is characterized by a window size of local affine transformation. The third idea is interrupting of the series of deformation components to obtain natural, reasonable handwriting deformation. Experiments using the handwritten numeral database IPTP CDROM1B show that 31.1% of the handwriting DVF is expressed by global affine transformation, and the subsequent few local affine transformations successfully discriminate natural handwriting deformation from unnatural one. © 2009 IEEE..
189. Walaa Aly, Seiichi Uchida, Akio Fujiyoshi, Masakazu Suzuki, Statistical classification of spatial relationships among mathematical symbols, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2009.90, 1350-1354, 2009.07, In this paper, a statistical decision method for automatic classification of spatial relationships between each adjacent pair is proposed. Each pair is composed of mathematical symbols and/or alphabetical characters. Special treatment of mathematical symbols with variable size is important. This classification is important to recognize an accurate structure analysis module of math OCR. Experimental results on a very large database showed that the proposed method worked well with an accuracy of 99.57% by using two important geometric features, relative size and relative position. © 2009 IEEE..
190. Yoshinori Katayama, Seiichi Uchida, Hiroaki Sakoe, Stochastic model of stroke order variation, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2009.146, 803-807, 2009.07, A stochastic model of stroke order variation is proposed and applied to the stroke-order free on-line Kanji character recognition. The proposed model is a hidden Markov model (HMM) with a special topology to represent all stroke order variations. A sequence of state transitions from the initial state to the final state of the model represents one stroke order and provides a probability of the stroke order. The distribution of the stroke order probability can be trained automatically by using an EM algorithm from a training set of on-line character patterns. Experimental results on large-scale test patterns showed that the proposed model could represent actual stroke order variations appropriately and improve recognition accuracy by penalizing incorrect stroke orders. © 2009 IEEE..
191. Akio Fujiyoshi, Masakazu Suzuki, Seiichi Uchida, Syntactic detection and correction of misrecognitions in mathematical OCR, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.2009.150, 1360-1364, 2009.07, This paper proposes a syntactic method for detection and correction of misrecognized mathematical formulae for a practical mathematical OCR system. Linear monadic context-free tree grammar (LM-CFTG) is employed as a formal framework to define syntactically acceptable mathematical formulae. For the purpose of practical evaluation, a verification system is developed, and the effectiveness of the method is demonstrated by using the ground-truthed mathematical document database InftyCDB-1 and a misrecognition database newly constructed for this study. A satisfactory number of misrecognitions are detected and delivered to the correction process. © 2009 IEEE..
192. Visual Tracking of an Object with its Motion Information
Tracking of a moving robot in surveillance video is an important task for coexistence of human beings with robots. An essential technology to manage coexistence environment of human beings and moving robots is separation and tracking of moving robots. For this task, the moving robot should be separated from other moving objects, i.e., human beings. We assume that the robot provides its additional motion information to the surveillance system to ease the task. The robot can be tracked from the other objects as a moving region being consistent with the additional motion information. For this purpose, we modify a tracking algorithm based on particle filter in order to incorporate the additional motion information. The results of an experiment on real surveillance video sequences have indicated that the proposed framework can separate and track a moving robot under the existence of several walking persons..
193. Masakazu Iwamura, Ryo Niwa, Akira Horimatsu, Koichi Kise, Seiichi Uchida, Shinichiro Omachi, Layout-free dewarping of planar document images, Proceedings of SPIE - The International Society for Optical Engineering, 10.1117/12.806122, 7247, 1-10, 2009.01, For user convenience, processing of document images captured by a digital camera has attracted much attention. However, most existing processing methods require an upright image such as one captured by a scanner. Therefore, we have to cancel perspective distortion of a camera-captured image before processing. Although there are rectification methods of the distortion, most of them work under certain assumptions on the layout; the borders of a document are available, text lines are in parallel, a stereo camera or a video image is required and so on. In this paper, we propose a layout-free rectification method which requires none of the above assumptions. We confirm the effectiveness of the proposed method by experiments. © 2009 SPIE-IS&T..
194. Walaa Aly, Seiichi Uchida, Masakazu Suzuki, Identifying subscripts and superscripts in mathematical documents, Mathematics in Computer Science, 10.1007/s11786-008-0051-9, 2, 2, 195-209, 2008.12, In mathematical OCR, it is necessary to analyze two-dimensional structures of the component characters and symbols in mathematical expressions printed in scientific documents. In this paper, we analyze the positional relationships between adjacent characters for the purpose of automatic discrimination between baseline characters, subscripts, and superscripts, which is one of the most important and delicate parts of structure analysis. It has been proven through a large-scale experiment that this discrimination can be carried out almost perfectly (~99.89%) by using the relative size and position of adjacent characters. © 2008 Birkhäuser Verlag Basel/Switzerland..
195. Yoshinori Katayama, Seiichi Uchida, Hiroaki Sakoe, A New HMM for On-Line Character Recognition Using Pen-Direction and Pen-Coordinate Features, 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 10.1109/ICPR.2008.4761449, 781-784, 2008.12, A new hidden Markov model (HMM) is proposed for on-line character recognition using two typical features, pen-direction feature and pen-coordinate feature. These two features are quite different in their stationarity; pen-direction feature is stationary within every line segment of a stroke whereas pen-coordinate feature is not. In the proposed HMM, these contrasting features are used in a separative and selective way. Specifically speaking, pen-direction feature is outputted repeatedly at intra-state transition whereas pen-coordinate feature is outputted once at inter-state transition. The superiority of the proposed HMM over the conventional HMMs was shown through single-stroke and multi-stroke character recognition experiments..
196. Seiichi Uchida, Kazuma Amamoto, Early Recognition of Sequential Patterns by Classifier Combination, 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 10.1109/ICPR.2008.4761137, 3011-3014, 2008.12, This paper proposes an early recognition method, i.e., a method for recognizing sequential patterns at their beginning parts. The method is based on a combination of frame classifiers prepared at individual frames. The training patterns misrecognized by the frame classifier at a certain frame are heavily weighted for the complementary training of the frame classifier at the next frame. The method was applied to an online character recognition task for showing its usefulness..
197. Akihiro Mori, Seiichi Uchida, Fast Image Mosaicing Based on Histograms, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1093/ietisy/e91-d.11.2701, E91D, 11, 2701-2708, 2008.11, This paper introduces a fast image mosaicing technique that does not require costly search on image domain (e.g., pixel-to-pixel correspondence search on the image domain) and the iterative optimization (e.g., gradient-based optimization, iterative optimization, and random optimization) of geometric transformation parameter. The proposed technique is organized in a two-step manner. At both steps, histograms are fully utilized for high computational efficiency. At the first step, a histogram of pixel feature values is utilized to detect pairs of pixels with the same rare feature values as candidates of corresponding pixel pairs. At the second step, a histogram of transformation parameter values is utilized to determine the most reliable transformation parameter value. Experimental results showed that the proposed technique can provide reasonable mosaicing results in most cases with very feasible computations..
198. Shinichiro Omachi, Masakazu Iwamura, Seiichi Uchida, Koichi Kise, Information Embedment with Cross Ratio of Areas for Accurate Camera-Based Character Recognition, Proceedings of the Third Korea-Japan Joint Workshop on Pattern Recognition (KJPR 2008), 111-112, 2008.11.
199. A Primary Study on a Data-Embedding Pen.
200. Walaa Aly, Seiichi Uchida, Masakazu Suzuki, A Large-Scale Analysis of Mathematical Expressions for an Accurate Understanding of Their Structure, PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 10.1109/DAS.2008.53, 549-556, 2008.09, A wide variety of mathematical expressions printed in scientific and technical reports can be recognized by analyzing the two-dimensional layout structure. In this paper, the position relation between adjacent characters is analyzed for the purpose of automatic discrimination between baseline, subscript, and superscript characters. This analysis is one of the most important parts of structure analysis. The proposed method is very promising, as the results reached up to 99.76% over a very large database by using a distribution map. This distribution map is defined by two important features, i.e., relative size and relative position..
201. Akira Horimatsu, Ryo Niwa, Masakazu Iwamura, Koichi Kise, Seiichi Uchida, Shinichiro Omachi, Affine Invariant Recognition of Characters by Progressive Pruning, PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 10.1109/DAS.2008.88, 237-+, 2008.09, There are many problems to realize camera-based character recognition. One of the problems is that characters in scenes are often distorted by geometric transformations such as affine distortions. Although some methods that remove the affine distortions have been proposed, they cannot remove a rotation transformation of a character. Thus a skew angle of a character has to be determined by examining all the possible angles. However, this consumes quite a bit of time. In this paper, in order to reduce the processing time for an affine invariant recognition, we propose a set of affine invariant features and a new recognition scheme called "progressive pruning." The progressive pruning gradually prunes less feasible categories and skew angles using multiple classifiers. We confirmed that the progressive pruning with the affine invariant features reduced the processing time to less than half without decreasing the recognition rate..
202. Ken'ichi Morooka, Xin Chen, Ryo Kurazume, Seiichi Uchida, Kenji Hara, Yumi Iwashita, Makoto Hashizume, Real-Time Nonlinear FEM with Neural Network for Simulating Soft Organ Model Deformation, MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION - MICCAI 2008, PT II, PROCEEDINGS, 10.1007/978-3-540-85990-1_89, 5242, Pt 2, 742-749, 2008.09, This paper presents a new method for simulating the deformation of organ models by using a neural network. The proposed method is based on the idea proposed by Chen et al. [2] that deformed model can be estimated from the superposition of basic deformation modes. The neural network finds a relationship between external forces and the models deformed by the forces. The experimental results show that the trained network can achieve a real-time simulation while keeping the acceptable accuracy compared with the nonlinear FEM computation..
203. Seiichi Uchida, Megumi Sakai, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Skew Estimation by Instances, PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 10.1109/DAS.2008.22, 201-208, 2008.09, This paper proposes a novel skew estimation method by instances. The instances to be learned (i.e., stored) are rotation invariants and a rotation variant for each character category. Using the instances, it is possible to estimate a skew angle of each individual character on a document. This fact implies that the proposed method can estimate the skew angle of a document where characters do not form long straight text lines. Thus, the proposed method will be applicable to various documents such as signboard images captured by a camera. Experimental evaluation using synthetic and real images revealed the expected robustness against various character layouts..
204. Seiichi Uchida, Kazuya Niyagawa, Hiroaki Sakoe, Feature Desynchronization in Online Character Recognition, Proceedings of the 11th International Conference on Frontiers of Handwriting Recognition, 2008.08.
205. HMM for On-Line Handwriting Recognition by Selective Use of Pen-Coordinate Feature and Pen-Direction Feature
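This paper proposes a hidden Markov model (HMM) that can selectively and appropriately use directional features and coordinate features for high-accuracy on-line character recognition. Both are basic features for on-line character recognition, yet they exhibit entirely different properties: the directional feature is stationary within a line segment, whereas the coordinate feature is always non-stationary. Treating both features equally within the HMM framework is therefore problematic. Indeed, conventional methods have often used only the directional feature without the coordinate feature. In the proposed HMM, the directional feature is used as the output symbol at intra-state self-transitions, and the coordinate feature is used as the output symbol at inter-state transitions. In this way, the directional feature is evaluated in stationary parts where the segment direction is constant, and the coordinate feature is evaluated in transient parts where the segment direction changes. Through stroke-order-free recognition experiments on multi-stroke characters (Kanji) and a detailed analysis, we show that this selective use of features greatly improves recognition accuracy over conventional methods..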
206. Christopher Malon, Seiichi Uchida, Masakazu Suzuki, Mathematical symbol recognition with support vector machines, PATTERN RECOGNITION LETTERS, 10.1016/j.patrec.2008.02.005, 29, 9, 1326-1332, 2008.07, Single-character recognition of mathematical symbols poses challenges from its two-dimensional pattern, the variety of similar symbols that must be recognized distinctly, the imbalance and paucity of training data available, and the impossibility of final verification through spell check. We investigate the use of support vector machines to improve the classification of InftyReader, a free system for the OCR of mathematical documents. First, we compare the performance of SVM kernels and feature definitions on pairs of letters that InftyReader usually confuses. Second, we describe a successful approach to multi-class classification with SVM, utilizing the ranking of alternatives within InftyReader's confusion clusters. The inclusion of our technique in InftyReader reduces its misrecognition rate by 41%. (c) 2008 Elsevier B.V. All rights reserved..
207. Akio Fujiyoshi, Masakazu Suzuki, Seiichi Uchida, Verification of mathematical formulae based on a combination of context-free grammar and tree grammar, INTELLIGENT COMPUTER MATHEMATICS, PROCEEDINGS, 10.1007/978-3-540-85110-3_35, 5144, 415-+, 2008.07, This paper proposes the use of a formal grammar for the verification of mathematical formulae for a practical mathematical OCR system. Like a C compiler detecting syntax errors in a source file, we want to have a verification mechanism to find errors in the output of mathematical OCR. Linear monadic context-free tree grammar (LM-CFTG) was employed as a formal framework to define "well-formed" mathematical formulae. For the purpose of practical evaluation, a verification system for mathematical OCR was developed, and the effectiveness of the system was demonstrated by using the ground-truthed mathematical document database InftyCDB-1..
208. An HMM Representing Stroke Order Variations and Its Application to Online Character Recognition
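Aiming at higher accuracy in stroke-order-free on-line character recognition, this paper examines two points: (i) the construction of a statistical model of stroke order variation and (ii) its use in recognition. In general, stroke-order-free recognition suffers from misrecognitions caused by allowing unnatural stroke correspondences, but these can be suppressed by using the proposed stroke order variation model. The model is formulated as a probabilistic extension of a graph model for stroke-order-free recognition (the cube graph) and, as a result, is a kind of hidden Markov model (HMM) that can handle the likelihood of character shape and the likelihood of stroke order simultaneously. Recognition experiments using the publicly available on-line character database "HANDS-kuchibue.d-97-06-10" demonstrated the effectiveness and validity of introducing the stroke order variation model..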
209. 2D/3D Registration by Back Projection and Geometrical Constraints
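To easily realize texture mapping, in which a texture image captured by a color sensor is pasted onto a geometric model acquired by a range sensor, it is desirable to determine the relative position between the color and range sensors from the texture image and the geometric model alone. This paper proposes a method that combines a global method based on geometric constraints with a local method based on edge correspondence to estimate the relative position and orientation between the sensors accurately and robustly against variation in the initial values, thereby registering the texture image with the geometric model. The method first extracts ridge lines and planar regions from the texture image. Next, it back-projects these ridge lines and planar regions onto the geometric model and, while estimating the geometric constraints of the object, obtains an initial estimate of the relative position and orientation between the sensors under these constraints. Finally, it determines the relative position and orientation between the sensors based on the correspondence between edges of the texture image and edges of the geometric model. In experiments, the registration success rate improved from 41% to 75% compared with a conventional method based on edge correspondence..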
210. Yumi Iwashita, Ryo Kurazume, Kenji Hara, Seiichi Uchida, Ken'ichi Morooka, Tsutomu Hasegawa, Fast 3D reconstruction of human shape and motion tracking by Parallel Fast Level Set Method, 2008 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-9, 10.1109/ROBOT.2008.4543332, 980-+, 2008.05, This paper presents a parallel algorithm of the Level Set Method named the Parallel Fast Level Set Method, and its application for real-time 3D reconstruction of human shape and motion. The Fast Level Set Method is an efficient implementation algorithm of the Level Set Method and has been applied to several applications such as object tracking in video images and 3D shape reconstruction using multiple stereo cameras. In this paper, we implement the Fast Level Set Method on a PC cluster and develop a real-time motion capture system for arbitrary viewpoint image synthesis. To obtain high performance on a PC cluster, efficient load-balancing and resource allocation algorithms are crucial problems. We develop a novel optimization technique of load distribution based on the estimation of moving direction of object boundaries. In this technique, the boundary motion is estimated in the framework of the Fast Level Set Method, and the optimum load distribution is predicted and performed according to the estimated boundary motion and the current load balance. Experiments of human shape reconstruction and arbitrary viewpoint image synthesis using the proposed system are successfully carried out..
211. Seiichi Uchida, Hiromitsu Miyazaki, Hiroaki Sakoe, Mosaicing-by-recognition for video-based text recognition, PATTERN RECOGNITION, 10.1016/j.patcog.2007.08.005, 41, 4, 1230-1240, 2008.04, Text recognition captured in multiple frames by a hand-held video camera is a challenging task because it is possible to capture and recognize a longer line of text while improving the quality of the text image by utilizing the redundancy of the overlapping areas between the frames. For this task, the video frames should be registered, i.e., mosaiced, after compensating for their distortions due to camera shakes. In this paper, a mosaicing-by-recognition technique is proposed where the problems of video mosaicing and text recognition are formulated as a unified optimization problem and solved by a dynamic programming-based optimization algorithm simultaneously and collaboratively. Experimental results indicate that, even if the frames undergo various distortions such as rotation, scaling, translation, and nonlinear speed fluctuation of camera movement, the proposed technique provides fine mosaic image by accurate distortion estimation (around 90% of perfect estimation) and character recognition accuracy (over 95%). (c) 2007 Elsevier Ltd. All rights reserved..
212. Seiichi Uchida, Elastic matching techniques for handwritten character recognition, Pattern Recognition Technologies and Applications: Recent Advances, 10.4018/978-1-59904-807-9.ch002, 17-38, 2008.04, This chapter reviews various elastic matching techniques for handwritten character recognition. Elastic matching is formulated as an optimization problem of planar matching, or pixel-to-pixel correspondence, between two character images under a certain matching model, such as affine and nonlinear. Use of elastic matching instead of rigid matching improves the robustness of recognition systems against geometric deformations in handwritten character images. In addition, the optimized matching represents the deformation of handwritten characters and thus is useful for statistical analysis of the deformation. This chapter argues the general property of elastic matching techniques and their classification by matching models and optimization strategies. It also argues various topics and future work related to elastic matching for emphasizing theoretical and practical importance of elastic matching. © 2008, IGI Global..
213. Yuji Shinomura, Tomotaka Harano, Toru Tamaki, Toshiyuki Amano, Kazufumi Kaneda, Seiichi Uchida, Comparative study of path normalizations for path prediction, Proceedings of 14th Korea-Japan Joint Workshop on Frontiers of Computer Vision, 2008.01.
214. Document Skew Estimation by Instance-Based Learning
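We propose a method for estimating the rotation angle of a document image by learning, as instances, the variants and invariants of each character under rotation and using them for estimation. Because the method efficiently estimates the rotation angle character by character, it does not need the assumption that text lines are laid out straight and in parallel, and it is therefore applicable to documents with various layouts..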
215. Supplementary Information Embedment with Area Ratio for Camera-Based Character Recognition
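To recognize characters in real environments with high accuracy using a digital camera as the input device, methods that present supplementary information for aiding recognition together with the character image have been studied. The supplementary information must be presented in a form natural to humans and must be extractable robustly against geometric deformations. This paper proposes a method of presenting supplementary information using an area ratio, which meets these requirements. Specifically, assuming that a character pattern is printed in two colors, the pattern is designed so that the ratio of the areas of the two color regions takes a specific value. Concretely, a shadow is added to the character or its outline is drawn in a different color. Such treatments are already used in character pattern design, and the proposed method merely changes their line widths and areas. The proposed method can therefore be applied widely to various uses. The area ratio is invariant to affine transformations, so it is expected to be extracted without error even under affine transformation. We create character patterns with embedded supplementary information, conduct an experiment on extracting the information from character patterns in images captured by a digital camera, and confirm the effectiveness of the proposed method. We also conduct a character recognition experiment using the supplementary information and confirm that recognition accuracy improves..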
216. Uchida, Seiichi, Mori, Akihiro, Kurazume, Ryo, Taniguchi, Rin-ichiro, Hasegawa, Tsutomu, Logical DP matching for detecting similar subsequence, COMPUTER VISION - ACCV 2007, PT I, PROCEEDINGS, 10.1007/978-3-540-76386-4_59, 4843, 628-+, 2007.11, A logical dynamic programming (DP) matching algorithm is proposed for extracting similar subpatterns from two sequential patterns. In the proposed algorithm, local similarity between two patterns is measured by a logical function, called support. The DP matching with the support can extract all similar subpatterns simultaneously while compensating nonlinear fluctuation. The performance of the proposed algorithm was evaluated qualitatively and quantitatively via an experiment of extracting motion primitives, i.e., common subpatterns in gesture patterns of different classes..
217. Recognition of Engineering Drawing Entities: Review of Approaches..
218. Ryoji Hattori, Seiichi Uchida, Color quantization for scene change detection, Proceedings of The First International Symposium on Information and Computer Elements, 2007.09.
219. Seiichi Uchida, Megumi Sakai, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Instance-Based Skew Estimation of Document Images by a Combination of Variant and Invariant, Proceedings of the Second International Workshop on Camera-Based Document Analysis and Recognition 2007 (CBDAR 2007), 53-60, 2007.09.
220. Ken'ichi Morooka, Hiroshi Masuda, Ryo Kurazume, Xian Chen, Seiichi Uchida, Kenji Hara, Makoto Hashizume, Real time estimation of deforming organs by neural network for endoscopic surgery simulator, Proceedings of The First International Symposium on Information and Computer Elements, 2007.09.
221. Masakazu Iwamura, Ryo Niwa, Koichi Kise, Seiichi Uchida, Shinichiro Omachi, Rectifying Perspective Distortion into Affine Distortion Using Variants and Invariants, Proceedings of the Second International Workshop on Camera-Based Document Analysis and Recognition 2007 (CBDAR 2007), 138-145, 2007.09.
222. Atsutoshi Shimeno, Seiichi Uchida, Ryo Kurazume, Rin-ichiro Taniguchi, Tsutomu Hasegawa, Separation and tracking of moving object using rough motion information from the object, Proceedings of The First International Symposium on Information and Computer Elements, 2007.09.
223. Seiichi Uchida, Megumi Sakai, Masakazu Iwamura, Shinichiro Omachi, Extraction of embedded class information from universal character pattern, ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 10.1109/ICDAR.2007.4378747, 437-+, 2007.09, This paper is concerned with a universal pattern, which is defined as a character pattern designed to have high machine-readability. This universal pattern is a character pattern printed with stripes. The cross ratio calculated from the widths of the stripes represents the character class. Thus, if the boundaries of the stripes can be detected for measuring the widths, the class can be determined without ordinary recognition process. Furthermore, since the cross ratio is invariant to projective distortions, the correct class will be still determined under those distortions. This paper describes a practical scheme to recognize this universal pattern. The proposed scheme includes a novel algorithm to detect the stripe boundaries stably even from the universal pattern image contaminated by non-uniform lighting and noise. The algorithm is realized by a combination of a dynamic programming-based optimal boundary detection and a finite state automaton which represents the property of the universal pattern. Experimental results showed the proposed scheme could recognize 99.6% of the universal pattern images which underwent heavy projective distortions and non-uniform lighting..
224. V. Bucha, S. Uchida, S. Ablameyko, Image pixel force fields and their application for color map vectorisation, ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 10.1109/ICDAR.2007.4377111, 1228-+, 2007.09, Pixel force field is a novel image representation where at each pixel a two-dimensional vector is defined for representing the circumstance of the pixel. The vector is oriented to the center of the region composed of vectors having the same qualitative property, such as color and gray-scale level. Using the pixel force field, that is, the orientation and the magnitude of the vector, many fundamental and specific image processing tasks can be solved. As examples of the tasks, the force field is applied to color image thinning, color image segmentation, and color map vectorisation..
225. Roman Bertolami, Seiichi Uchida, Matthias Zimmermann, Horst Bunke, Non-uniform slant correction for handwritten text line recognition, ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 10.1109/ICDAR.2007.4378668, 18-+, 2007.09, In this paper we apply a novel non-uniform slant correction preprocessing technique to improve the recognition of offline handwritten text lines. The local slant correction is expressed as a global optimisation problem of the sequence of local slant angles. This is different to conventional slant removal techniques that rely on the average slant angle. Experiments based on a state-of-the-art handwritten text line recogniser show a significant gain in word level accuracy for the investigated preprocessing methods..
226. Daiki Baba, Seiichi Uchida, Hiroaki Sakoe, Predictive DP matching for on-line character recognition, ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 10.1109/ICDAR.2007.4377000, 674-678, 2007.09, For on-line character recognition, predictive DP matching is proposed where two physically different features, coordinate features and directional features, are handled in a unified manner. For this unification, the distance of the directional features is converted into a distance of the coordinate features by a feature prediction technique. An experimental result showed that the predictive DP matching could attain a recognition rate comparable to the rate by the conventional DP matching which requires the costly optimization of the weight to balance the two features..
227. FSA-Guided Optimal Segmentation and Its Application to Camera-Based Character Recognition
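This paper proposes an optimal segmentation method for one-dimensional signals based on a combination of dynamic programming (DP) and a finite state automaton (FSA). Specifically, a property of the signal (for example, that intervals of high and low signal values alternate) is represented as an FSA and incorporated into the segmentation problem as a constraint, and the globally optimal segmentation under that constraint is obtained efficiently by DP. Introducing the FSA excludes segmentation results that are inconsistent with the property of the signal, so improved accuracy can be expected. In addition, the correspondence between FSA states and intervals makes it possible to assign a meaning to each interval. This paper describes the method in detail and evaluates its effectiveness by applying it to a certain kind of character image recognition task in real environments..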
228. Fast 3D Shape Reconstruction of Moving Object by Parallel Fast Level Set Method
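The volume intersection method and the multi-view stereo method have been proposed as methods for acquiring the geometric and photometric information of the entire surface of target objects in a scene with many cameras and generating images from arbitrary viewpoints. However, these methods target a single object or multiple objects without occlusion; when multiple objects exist in a scene and mutual occlusion occurs between them, it has been difficult to reconstruct the shape of each object simultaneously. To address this problem, we have previously built a system that applies the Fast Level Set Method, a fast boundary tracking technique, to multiple stereo range images and reconstructs the 3D shapes of multiple target objects robustly against occlusion. In this paper, we implement that system on a PC cluster consisting of eight computers and achieve faster 3D shape reconstruction through parallel computation of the Fast Level Set Method processing. We also propose a method that, when a target object moves, predicts its direction of movement and reduces the computational load of the computers processing the moving object, so that the accurate 3D shape of the moving object is reconstructed without delay. Furthermore, a measurement experiment on dance shows that, even when the target moves quickly, more accurate 3D shape reconstruction is possible than with the conventional system..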
229. Analytical DP Matching
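Among the elastic matching techniques widely used in pattern recognition and image processing is matching by dynamic programming, so-called DP matching. DP matching is a solution method based on breadth-first search of a discretized optimization problem, and it has therefore been difficult to apply to problems in which the breadth of the search becomes very large. To solve this problem, this paper proposes analytical DP matching. In this method, the local error function used to evaluate the matching is approximated by a quadratic function, so that an approximate solution (the exact solution of the quadratically approximated problem) can be given analytically without breadth-first search. This paper derives a matching algorithm for one-dimensional patterns and verifies experimentally, using on-line character data, that it can be applied to real problems..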
230. Detection of Similar Sub-Sequence by Logical DP Matching
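This paper proposes a method for detecting similar subsequences by logical DP matching. Logical DP matching is an algorithm that performs nonlinear matching between two patterns using a logical function called support as the criterion. A feature of the method is that the start and end points of the multiple similar subsequences existing between the patterns are determined optimally in the course of matching. As an application for evaluating the effectiveness of the method, we also examine the extraction of motion primitives from gestures. The experimental results demonstrated the basic performance of the method..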
231. Masakazu Suzuki, Christopher Malon, Seiichi Uchida, Databases of mathematical documents, Research Reports on Information Science and Electrical Engineering of Kyushu University, 12, 1, 302-306, 2007.04, This paper describes the specifications for three ground-truthed mathematical character and symbol image databases, called InftyCDB-1, InftyCDB-2, and InftyCDB-3. In the former two databases, the ground-truth of each character is composed of type, font, quality (touching/broken) and link (relative position), etc. InftyCDB-1 includes all the characters and symbols of 30 articles on mathematics, and is organized so that it can be used as word image database or as mathematical formula image database. InftyCDB-2, which is a continuation of InftyCDB-1, includes 37 articles including French and German articles and is organized like InftyCDB-1. InftyCDB-3 is a single character database for training and evaluating single-character recognition engines..
237. Pattern Recognition with Supplementary Information
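This paper examines a scheme that prevents misrecognition by feeding information about the class to which a pattern belongs (supplementary information) into the classifier together with the pattern, and deriving an answer that is consistent with both the pattern and the supplementary information. In this scheme, the recognition rate approaches 100% as the amount of supplementary information increases. The problem is therefore not how to improve recognition performance, as in conventional pattern recognition, but how small the amount of supplementary information needed to achieve a given recognition rate can be made. The paper derives the relationship between the way supplementary information is assigned and recognition performance, and demonstrates it experimentally.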
238. Early Recognition and Prediction of Gestures for Embodied Proactive Human Interface
This paper concerns three topics for realizing an embodied “proactive” human interface, in which a humanoid serves as an interface capable of reacting to a user's gesture input before the gesture ends. The first topic is early recognition of gestures: the recognition result of a gesture is provided at the beginning part of the gesture. The second topic is motion prediction: the subsequent posture of the person making a gesture is predicted by using the result of early recognition. The third topic is a network model constructed for improving the performance of early recognition and motion prediction. The effectiveness of these methods was shown by experimental results.
239. Seiichi Uchida, Kazuhiro Tanaka, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, A Data-Embedding Pen, Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR-10), 2006.10.
240. Yutaka Araki, Daisaku Arita, Rin-ichiro Taniguchi, Seiichi Uchida, Ryo Kurazume, Tsutomu Hasegawa, Construction of symbolic representation from human motion information, KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS, 10.1007/11893004_27, 4252, 212-219, 2006.10, In general, avatar-based communication has the merit that it can represent non-verbal information. The simplest way of representing the non-verbal information is to capture the human action/motion by a motion capture system and to visualize the received motion data through the avatar. However, transferring raw motion data often makes the avatar motion unnatural or unrealistic because the body structure of the avatar is usually somewhat different from that of a human. We think this can be solved by transferring the meaning of the motion, instead of the raw motion data, and by properly visualizing the meaning depending on the characteristics of the avatar's function and body structure. Here, the key issue is how to symbolize the motion meanings. In particular, the problem is what kind of motions we should symbolize. In this paper, we introduce an algorithm that decides the symbols to be recognized by referring to accumulated communication data, i.e., motion data.
241. Recognition and Understanding of Characters and Documents Using Digital Cameras
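With the spread and development of digital still cameras and video cameras, there is growing demand to use the characters and documents in captured images for information processing. This article explains what can be gained through camera-based recognition and understanding of characters and documents, what problems stand in the way of realizing it, and what efforts are currently under way. It also touches on the remaining research issues and introduces new attempts that the authors are pursuing from the viewpoint of application to user interfaces.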
242. Shinichiro Omachi, Seiichi Uchida, Masakazu Iwamura, Koichi Kise, Affine invariant information embedment for accurate camera-based character recognition, 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 10.1109/ICPR.2006.229, 1098-+, 2006.08, Recognizing characters in a scene image taken by a digital camera has been studied for decades. However, it is still a challenging problem to achieve high accuracy. In this paper, we propose a method of embedding information in a character pattern so that the class of the character can be identified. The information should be robust against geometric distortions, since an image taken by a digital camera is usually geometrically distorted. In the proposed method, a character pattern is designed in two colors so that the information is embedded as the area ratio of the regions of the two colors. Since the area ratio is affine invariant, it is expected to be extracted correctly even if a character image is affine-transformed. We generate character patterns with the embedded information and discuss the effectiveness of the proposed method.
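The affine invariance of an area ratio mentioned in entry 242 can be checked numerically. The following is a minimal sketch, not the authors' implementation: the two polygons, the affine coefficients, and the function names `polygon_area` and `affine` are all hypothetical and chosen only for illustration. An affine map scales every area by the same factor (the absolute value of its determinant), so the ratio of two areas is unchanged.

```python
from fractions import Fraction


def polygon_area(points):
    """Unsigned area of a simple polygon via the shoelace formula."""
    total = Fraction(0)
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def affine(points, a, b, c, d, tx, ty):
    """Apply (x, y) -> (a*x + b*y + tx, c*x + d*y + ty) to every vertex."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Hypothetical regions standing in for the two colors of a character pattern.
region_a = [(0, 0), (4, 0), (4, 4), (0, 4)]  # square, area 16
region_b = [(0, 0), (4, 0), (0, 2)]          # triangle, area 4

# Hypothetical affine distortion with determinant 2*3 - 1*(-1) = 7.
params = (2, 1, -1, 3, 5, -2)

ratio_before = polygon_area(region_a) / polygon_area(region_b)
ratio_after = (polygon_area(affine(region_a, *params))
               / polygon_area(affine(region_b, *params)))
```

Exact rational arithmetic is used so that the two ratios can be compared for equality without floating-point tolerance.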
243. Wenjie Cai, Seiichi Uchida, Hiroaki Sakoe, An efficient radical-based algorithm for stroke-order-free online Kanji character recognition, 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 10.1109/ICPR.2006.241, 986-+, 2006.08, This paper investigates improvements to an online handwriting stroke-order analysis algorithm, cube search, which is based on the cube graph stroke-order generation model and dynamic programming (DP). By dividing a character into radicals, the model is decomposed into intra-radical graphs and an inter-radical graph. This decomposition considerably reduces the time complexity of the stroke-order search DP. Experimental results showed a significant improvement in operational speed. Additionally, recognition accuracy was improved by prohibiting unnatural stroke orders.
244. Akihiro Mori, Seiichi Uchida, Ryo Kurazume, Early recognition and prediction of gestures, 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, PROCEEDINGS, 10.1109/ICPR.2006.467, 560-+, 2006.08, This paper is concerned with an algorithm for early recognition and prediction of gestures. Early recognition provides recognition results before input gestures are completed. Motion prediction predicts the subsequent posture of the performer by using early recognition. In addition, this paper considers a gesture network for improving the performance of these algorithms. The performance of the proposed algorithm was evaluated by experiments on real-time control of a humanoid by gestures.
245. Ryo Kurazume, Hiroaki Omasa, Seiichi Uchida, Rinichiro Taniguchi, Tsutomu Hasegawa, Embodied Proactive Human Interface "PICO-2", 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 10.1109/ICPR.2006.488, 1233-+, 2006.08, We are conducting research on the "Embodied Proactive Human Interface". The aim of this research is to develop a new human-friendly active interface based on two key technologies: a mechanism for estimating human intention to support natural communication, named "Proactive Interface", and a tangible device using robot technology. This paper introduces the humanoid-type two-legged robot named "PICO-2", which was developed as a tangible telecommunication device for the proactive human interface. In order to achieve embodied telecommunication with PICO-2, we propose a new technique for tracking human gestures using a monocular video camera mounted on PICO-2, and natural gesture reproduction by PICO-2 that absorbs the difference in body structure between the user and the robot.
246. A. Nedzved, S. Uchida, S. Ablameyko, Gray-scale thinning by using a pseudo-distance map, 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 10.1109/ICPR.2006.618, 239-+, 2006.08, In this paper, an algorithm for thinning gray-scale images based on a pseudo-distance map (PDM) is proposed. The PDM is a simplified distance map of a gray-scale image and uses only those features of the image and objects that are necessary to build a skeleton. The algorithm works fast for large gray-scale images and allows a high-quality skeleton to be constructed.
247. V. Bucha, S. Uchida, S. Ablameyko, Interactive road extraction with pixel force fields, 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 10.1109/ICPR.2006.720, 829-+, 2006.08, The pixel force field (PFF) is a novel image representation in which a two-dimensional vector is defined at each pixel to represent the interaction of pixels. The vector is oriented toward the center of the region composed of pixels having the same qualitative property, such as color or gray-scale level. Using the pixel force field and an improved live-wire segmentation technique, the task of interactive road extraction from remote sensing images is solved.
248. Seiichi Uchida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, OCR fonts revisited for camera-based character recognition, 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 10.1109/ICPR.2006.891, 1134-+, 2006.08, In order to realize accurate camera-based character recognition, machine-readable class information is embedded into each character image. Specifically, each character image is printed with a pattern that comprises five stripes, and the cross ratio derived from the pattern represents the class information. Since the cross ratio is a projective invariant, the class information is extracted correctly regardless of camera angle. The results of simulation experiments showed that recognition rates over 99% were obtained by the extracted cross ratio under heavy projective distortions.
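The projective invariance of the cross ratio that entry 248 relies on can be illustrated with a short numerical check. This is only a sketch under stated assumptions: the four boundary positions, the map coefficients, and the helper names `cross_ratio` and `projective_map` are hypothetical, and the entry does not say how the cross ratio is derived from the five stripes. The sketch uses one common definition of the cross ratio of four collinear points and a one-dimensional projective map, which is what a perspective view induces along a line.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio of four distinct collinear points given as 1-D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, h):
    """Apply the 1-D projective map x -> (h0*x + h1) / (h2*x + h3)."""
    h0, h1, h2, h3 = h
    return (h0 * x + h1) / (h2 * x + h3)


# Hypothetical boundary positions measured along a line crossing the stripes.
boundaries = [Fraction(0), Fraction(2), Fraction(5), Fraction(9)]

# Hypothetical non-degenerate map (h0*h3 - h1*h2 is non-zero).
h = (Fraction(3), Fraction(1), Fraction(1, 10), Fraction(2))

before = cross_ratio(*boundaries)
after = cross_ratio(*(projective_map(x, h) for x in boundaries))
```

Individual stripe widths change under the map, but `before` and `after` are equal, which is why a value of this kind can serve as a camera-angle-independent code.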
249. Christopher Malon, Seiichi Uchida, Masakazu Suzuki, Support vector machines for mathematical symbol recognition, STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 10.1007/11815921_14, 4109, 136-144, 2006.08, Mathematical formulas challenge an OCR system with a range of similar-looking characters whose bold, calligraphic, and italic varieties must be recognized distinctly, though the fonts to be used in an article are not known in advance. We describe the use of support vector machines (SVM) to learn and predict about 300 classes of styled characters and symbols.
250. Category Data Embedding for Camera-Based Character Recognition
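This research aims to recognize character patterns in the three-dimensional real environment with accuracy comparable to that of barcodes. Character patterns in the real environment undergo various distortions, such as projective distortion, depending on the imaging conditions. Achieving this goal as an extension of ordinary character recognition techniques is therefore expected to be considerably difficult. This paper instead examines a scheme that embeds, in the character itself, information that reinforces its machine readability. Specifically, a stripe pattern is embedded in the character image. The cross ratio computed from the widths of the stripes that make up this pattern always takes a constant value, however the character pattern is distorted by projective transformation. Therefore, if categories and cross-ratio values are associated in advance, the extracted cross ratio can be used as a clue for discrimination at recognition time. Simulation experiments showed that using the cross ratio together with character shape information yields very high recognition accuracy even under projective distortion.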
251. S Toyota, S Uchida, M Suzuki, Structural analysis of mathematical formulae with verification based on formula description grammar, DOCUMENT ANALYSIS SYSTEMS VII, PROCEEDINGS, 10.1007/11669487_14, 3872, 153-163, 2006.02, In this paper, a reliable and efficient structural analysis method for mathematical formulae is proposed for practical mathematical OCR. The proposed method consists of three steps. In the first step, a fast structural analysis algorithm is performed on each mathematical formula to obtain a tree representation of the formula. This step generally provides a correct tree representation but sometimes provides an erroneous one. Therefore, the tree representation is verified by the following two steps. In the second step, the result of the analysis step (i.e., a tree representation) is converted into a one-dimensional representation. The third step is a verification step in which the one-dimensional representation is parsed by a formula description grammar, which is a context-free grammar specialized for mathematical formulae. If the one-dimensional representation is not accepted by the grammar, the result of the analysis step is flagged as erroneous and reported to OCR users. This three-step organization achieves reliable and efficient structural analysis without any two-dimensional grammars.
252. Design and recognition of human-readable and machine-readable patterns
In this paper, the design and recognition of human-readable and machine-readable patterns are investigated. Specifically, we design character images printed with a horizontal stripe pattern, called a cross ratio pattern. The cross ratio derived from the cross ratio pattern represents the class information of the character. Since the cross ratio is invariant to projective distortion, the class information is extracted correctly regardless of camera angle. The character image itself is human-readable, and therefore the character image with the cross ratio pattern is not only human-readable but also machine-readable and can be used as a medium for human-machine communication.
253. Seiichi Uchida, Akihiro Nomura, Masakazu Suzuki, Quantitative analysis of mathematical documents, International Journal on Document Analysis and Recognition, 10.1007/s10032-005-0142-y, 7, 4, 211-218, 2005.09, Mathematical documents are analyzed from several viewpoints for the development of practical OCR for mathematical and other scientific documents. Specifically, four viewpoints are quantified using a large-scale database of mathematical documents, containing 690,000 manually ground-truthed characters: (i) the number of character categories, (ii) abnormal characters (e.g., touching characters), (iii) character size variation, and (iv) the complexity of the mathematical expressions. The result of these analyses clarifies the difficulties of recognizing mathematical documents and then suggests several promising directions to overcome them. © Springer-Verlag Berlin/Heidelberg 2005.
254. S Uchida, H Sakoe, A survey of elastic matching techniques for handwritten character recognition, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1093/ietisy/e88-d.8.1781, E88D, 8, 1781-1790, 2005.08, This paper presents a survey of elastic matching (EM) techniques employed in handwritten character recognition. EM is often called deformable template, flexible matching, or nonlinear template matching, and is defined as the optimization problem of two-dimensional warping (2DW), which specifies the pixel-to-pixel correspondence between the two character image patterns being compared. The pattern distance evaluated under optimized 2DW is invariant to a certain range of geometric deformations. Thus, by using the EM distance as a discriminant function, recognition systems robust to the deformations of handwritten characters can be realized. In this paper, EM techniques are classified according to the type of 2DW, and the properties of each class are outlined. Several topics around EM, such as the category-dependent deformation tendency of handwritten characters, are also discussed.
255. Seiichi Uchida, Masakazu Iwamura, Shinichiro Omachi, Koichi Kise, Data Embedding for Camera-Based Character Recognition, Proceedings of the First International Workshop on Camera-Based Document Analysis and Recognition (CBDAR2005), 60-67, 2005.08.
256. Seiichi Uchida, Hiromitsu Miyazaki, and Hiroaki Sakoe, Mosaicing-by-recognition for recognizing texts captured in multiple video frames, First International Workshop on Camera-Based Document Analysis and Recognition 2005 (CBDAR 2005, Seoul, Korea), 3-9, 2005.08, Text recognition in video frames is promising because it has the following advantages over text recognition in a still camera image: (1) it is possible to recognize longer texts by concatenating the frames, and (2) it is also possible to improve the quality of the text image by integrating the frames. In this paper, a mosaicing-by-recognition technique is proposed in which video mosaicing and text recognition are performed simultaneously and collaboratively in a one-step manner by a dynamic programming-based optimization algorithm. In this optimization algorithm, rotation, scaling, vertical shift, and speed fluctuation of camera motion are efficiently compensated. The results of experiments evaluating not only the accuracy of text recognition but also that of video mosaicing indicate that the proposed technique is practical and can provide reasonable results in most cases.
257. Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, Koichi Kise, Recognition with Supplementary Information -How Many Bits Are Lacking for 100% Recognition?-, Proceedings of the First International Workshop on Camera-Based Document Analysis and Recognition (CBDAR2005), 68-75, 2005.08.
261. M Suzuki, S Uchida, A Nomura, A ground-truthed mathematical character and symbol image database, Eighth International Conference on Document Analysis and Recognition, Vols 1 and 2, Proceedings, 10.1109/ICDAR.2005.14, 675-679, 2005.08, This paper describes the specifications for our ground-truthed mathematical character and symbol image database, called InftyCDB-1. The ground truth of each character is composed of type, font, quality (touched/broken), link (relative position), etc. The database includes all the characters and symbols of 467 pages of 30 articles on mathematics, and is organized so that it can be used as a word image database or as a mathematical formula image database. InftyCDB-1 is a public database that is freely usable for research and development purposes.
262. D Okumura, S Uchida, H Sakoe, An HMM implementation for on-line handwriting recognition based on pen-coordinate feature and pen-direction feature, Eighth International Conference on Document Analysis and Recognition, Vols 1 and 2, Proceedings, 10.1109/ICDAR.2005.50, 26-30, 2005.08, An on-line handwritten character recognition technique based on a new HMM is proposed. In the proposed HMM, both the pen-direction feature and the pen-coordinate feature are utilized, separately, to describe the shape variation of on-line characters accurately. Specifically, the proposed HMM outputs a pen-coordinate feature at each inter-state transition and a pen-direction feature at each intra-state transition, i.e., self-transition. Thus, each state of the proposed HMM can specify the starting position and the direction of a line segment by its incoming inter-state transition and its intra-state transition, respectively. The results of recognition experiments on 10-stroke Chinese characters show that the proposed HMM outperforms the conventional HMM, which does not use the pen-coordinate feature because of its non-stationarity.
263. H Ezaki, S Uchida, A Asano, H Sakoe, Dewarping of document image by global optimization, EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 10.1109/ICDAR.2005.87, 302-306, 2005.08, This paper proposes a novel dewarping technique for document images of bound volumes. The technique is a kind of model-fitting technique that estimates the warp of each text line by fitting an elastic curve model to the text line. Unlike conventional techniques, the proposed technique is applicable to document images that include local irregularities such as formulae, short text lines, and figures, since it dewarps whole document images by fitting splines while considering a global optimality criterion that specifies the desirable relationship among the splines. The experimental results on several document images including such local irregularities indicated the effectiveness of the proposed technique. The results also indicated that dividing a document image vertically into several partial document images is effective for more accurate dewarping.
265. H Miyazaki, S Uchida, H Sakoe, Mosaicing-by-recognition: a technique for video-based text recognition, EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 10.1109/ICDAR.2005.161, 904-908, 2005.08, In this paper, a mosaicing-by-recognition technique is proposed in which video mosaicing and text recognition are simultaneously and collaboratively optimized in a one-step manner. Specifically, multiple frames capturing a long text line are optimally concatenated under the guidance of the text recognition framework. In this optimization process, rotation, scaling, vertical shift, and speed fluctuation, which often appear in video frames captured by hand-held cameras, are compensated. The optimization is performed by a DP-based algorithm. The results of experiments evaluating not only the accuracy of text recognition but also that of video mosaicing indicate that the proposed technique is practical and can provide reasonable results in most cases.
266. H Mitoma, S Uchida, H Sakoe, Online character recognition based on elastic matching and quadratic discrimination, EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 10.1109/ICDAR.2005.178, 36-40, 2005.08, We try to link elastic matching with a statistical discrimination framework to overcome the overfitting problem which often degrades the performance of elastic matching based online character recognizers. In the proposed technique, elastic matching is used just as an extractor of a feature vector representing the difference between input and reference patterns. Then quadratic discrimination is performed under the assumption that the feature vector is governed by a Gaussian distribution. The result of a recognition experiment on UNIPEN database (Train-R01/V071a) showed that the proposed technique can attain a high recognition rate (97.95%) and outperforms a recent elastic matching-based recognizer.
268. Realization of 100% Recognition Rate with Supplementary Information - For Seamless Man-Machine Communication -.
269. An Efficient Stroke-Order-Free On-Line Character Recognition Algorithm Based on Radical Reference Pattern
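This work examines improvements in the operating speed and recognition accuracy of the cube search method, an online character recognition method that achieves stroke-order freedom by determining the optimal stroke correspondence through graph search. Stroke-order variation is separated into intra-radical variation and inter-radical variation, and a two-stage cube search algorithm based on radical-level reference patterns is constructed. Guidelines for dividing characters into radicals, including a condition that minimizes the amount of computation, are also given. Recognition experiments on the Kyoiku Kanji (the kanji taught in Japanese elementary education) under a fixed stroke-count condition confirmed improvements in both speed and accuracy, and also confirmed the validity of the computation-minimizing radical division condition.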
270. Seiichi Uchida, Hiroaki Sakoe, Category-dependent elastic matching based on a linear combination of eigen-deformations, Systems and Computers in Japan, 10.1002/scj.20229, 36, 5, 13-22, 2005.05, A new elastic image matching (EM) technique based on a category-dependent deformation model is proposed. In the deformation model, any deformation of a category is described by a linear combination of eigen-deformations, which are frequent deformation directions of the category and can be estimated statistically from the actual deformations. Experimental results on handwritten characters show that the proposed technique can attain higher recognition rates than conventional EM techniques based on the affine deformation model, which is a typical category-independent deformation model. The results also show the superiority of the proposed technique over those conventional EM techniques in computational efficiency. © 2005 Wiley Periodicals, Inc.
271. Foreword: Special section on document image understanding and digital documents.
272. Online Character Recognition Using Elastic Matching and Eigen-deformations(Image Processing)(Next Generation Mobile Communication Systems)
In online character recognition based on elastic matching, such as DP matching, many misrecognitions are due to overfitting, the phenomenon in which a wrong reference pattern is closely fitted to an input pattern by the matching. In this report, a technique to reduce those misrecognitions is proposed, in which frequent deformations of each category, called eigen-deformations, are employed. In the case of overfitting, the matching between the two patterns will not be expressed by the eigen-deformations of the category of the reference pattern. Thus, the overfitting can be detected by evaluating the divergence of the matching result from the eigen-deformations. The results of a recognition experiment showed the usefulness of the proposed technique.
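The overfitting described in this abstract can be seen in even the simplest form of DP matching. The sketch below is a generic dynamic time warping distance on one-dimensional sequences, swapped in for illustration; it is not the formulation used in the report, and the function name `dp_matching_distance`, the symmetric step pattern, and the sample sequences are assumptions.

```python
def dp_matching_distance(ref, inp):
    """Generic DP matching (dynamic time warping) distance between two
    1-D sequences, with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(ref), len(inp)
    # cost[i][j]: minimum accumulated cost of matching ref[:i] with inp[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - inp[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # ref advances, inp sample is reused
                cost[i][j - 1],      # inp advances, ref sample is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two sequences with very different timing are matched at zero cost,
# because the warping is free to stretch either sequence.
stretched = dp_matching_distance([1, 2, 3], [1, 1, 1, 1, 2, 3])
shifted = dp_matching_distance([0, 0], [1, 1])
```

Because the warp itself is unconstrained here, an unnatural alignment costs nothing, which is the kind of behavior that a posterior check of the matching result against category-specific deformations is meant to penalize.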
273. R Mitoma, S Uchida, H Sakoe, Online character recognition using eigen-deformations, NINTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, PROCEEDINGS, 10.1109/IWFHR.2004.79, 3-8, 2004.08, In online character recognition based on elastic matching, such as dynamic programming matching, many misrecognitions are caused by overfitting, which is the phenomenon in which the distance between a reference pattern of an incorrect category and an input pattern is underestimated by unnatural matching. In this paper, a new recognition technique is proposed where category-specific deformations, called eigen-deformations, are utilized to suppress those misrecognitions. Generally, matching results at overfitting are not consistent with the eigen-deformations. Thus, the overfitting can be detected and penalized by a posterior evaluation of this inconsistency. The result of a recognition experiment showed the usefulness of the proposed technique..
274. N Matsumoto, S Uchida, H Sakoe, Prototype setting for elastic matching-based image pattern recognition, PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, 10.1109/ICPR.2004.1334064, 224-227, 2004.08, The purpose of this paper is to emphasize the importance of the consistency between the distance measures used for prototype setting and for discrimination in elastic matching (EM)-based recognition. Specifically, this paper focuses on the following points: (i) confirmation of performance degradation when Euclidean distance is used for prototype setting whereas EM-distance is used for discrimination, and (ii) proposal of a new prototype setting algorithm where this inconsistency is avoided. Through an experiment of handwritten character recognition, the effectiveness of the proposed algorithm was quantified..
275. A Clustering Algorithm for Elastic Matching-Based Image Pattern Recognition
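This paper describes a method for setting reference patterns (prototypes) for image pattern recognition based on elastic matching. The method is a kind of clustering method; however, whereas conventional methods use the Euclidean distance as the criterion, the present method uses the elastic matching distance, the same distance used at discrimination..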
276. R. Taniguchi, D. Arita, S. Uchida, R. Kurazume, and T. Hasegawa, Human action sensing for proactive human interface: Computer vision approach, Proceedings of International workshop on Processing Sensory Information for Proactive Systems (PSIPS 2004, Oulu, Finland), 2004.06.
277. E Taira, S Uchida, H Sakoe, Nonuniform slant correction for handwritten word recognition, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E87D, 5, 1247-1253, 2004.05, Slant correction is a preprocessing technique to improve segmentation and recognition accuracy for handwritten word recognition. All conventional slant correction techniques were performed by the estimation of the average slant angle and the shear transformation. In this paper, a nonuniform slant correction technique for handwritten word recognition is proposed where the slant correction problem is formulated as a global optimal estimation problem of the sequence of local slant angles. The optimal estimation is performed by a dynamic programming based algorithm. From experimental results it was shown that the present technique outperforms conventional uniform slant correction techniques..
278. Eiji Taira, Seiichi Uchida, and Hiroaki Sakoe, Block boundary detection and title extraction for automatic bookshelf inspection, Tenth Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV2005, Fukuoka, Japan), 2004.02.
279. Category-Dependent Elastic Matching Based on a Linear Combination of Eigen-Deformations
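In image pattern recognition, elastic matching has been studied as a means of compensating for deformations occurring in patterns. Whereas conventional methods assume deformation characteristics common to all categories, this paper proposes a method that incorporates deformation characteristics specific to each category. Specifically, any deformation of a category is expressed as a linear combination of several deformations intrinsic to that category. As a result, only the deformations that occur within each category are appropriately compensated, which yields effects such as the suppression of excessive deformation and improved computational efficiency. The method is formulated as a kind of nonlinear optimization problem. This paper also describes a solution method for that problem and verifies its effectiveness through experiments..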
280. Bookshelf Image Analysis Based on Model Fitting
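This paper proposes a method for detecting the boundaries of individual books in a bookshelf image, with the aim of managing books by image processing. Conventional methods detect book boundaries from edges and shadows using line detection methods such as the Hough transform. The present method considers not only such local information but also global optimality: the bookshelf image is optimally segmented (into the spine region of each book and the bookshelf background region) by a dynamic programming based algorithm, and the boundaries of the books are detected. Furthermore, when the task is formulated as an optimization problem, a grammatical model of bookshelf images is incorporated to improve accuracy. Experiments confirmed the effectiveness of the method both qualitatively and quantitatively..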
281. Eiji Taira, Seiichi Uchida, and Hiroaki Sakoe, A model-based book boundary detection technique for bookshelf image analysis, Asian Conference on Computer Vision (ACCV2004, Jeju Island, Korea), 2004.01.
282. Wenjie Cai, Seiichi Uchida, and Hiroaki Sakoe, A comparative study of stroke correspondence search algorithms for online kanji character recognition, International Symposium on Information Science and Electrical Engineering, E83D, 1, 109-111, 2003.11.
283. Eiji Taira, Seiichi Uchida, and Hiroaki Sakoe, Book boundary detection from bookshelf image based on model fitting, International Symposium on Information Science and Electrical Engineering, 534-537, 2003.11.
284. INFTY: an integrated OCR system for mathematical documents..
285. S Uchida, H Sakoe, Eigen-deformations for elastic matching based handwritten character recognition, PATTERN RECOGNITION, 10.1016/S0031-3203(03)00039-6, 36, 9, 2031-2040, 2003.09, Deformations in handwritten characters have category-dependent tendencies. In this paper, the estimation and the utilization of such tendencies called eigen-deformations are investigated for the better performance of elastic matching based handwritten character recognition. The eigen-deformations are estimated by the principal component analysis of actual deformations automatically collected by the elastic matching. From experimental results it was shown that typical deformations of each category can be extracted as the eigen-deformations. It was also shown that the recognition performance can be improved significantly by using the eigen-deformations for the detection of overfitting, which is the main cause of the misrecognition in the elastic matching based recognition methods. (C) 2003 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved..
286. A Nomura, K Michishita, S Uchida, M Suzuki, Detection and segmentation of touching characters in mathematical expressions, SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 10.1109/ICDAR.2003.1227645, 126-130, 2003.08, A technique for the detection and the segmentation of touching characters in mathematical expressions is presented. In the detection stage, a connected component initially recognized into some category is judged as a candidate of touched characters if its feature values deviate from the standard feature values of the category. In the segmentation stage, two component characters of the candidate are decided by the comparison with touching character images synthesized from two single character images. Experimental results showed the effectiveness on the accuracy improvement of the recognition of mathematical expressions..
287. S Uchida, H Sakoe, Handwritten character recognition using elastic matching based on a class-dependent deformation model, SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 10.1109/ICDAR.2003.1227652, 163-167, 2003.08, For handwritten character recognition, a new elastic image matching (EM) technique based on a class-dependent deformation model is proposed. In the deformation model, any deformation of a class is described by a linear combination of eigen-deformations, which are intrinsic deformation directions of the class. The eigen-deformations can be estimated statistically from the actual deformations of handwritten characters. Experimental results show that the proposed technique can attain higher recognition rates than conventional EM techniques based on class-independent deformation models. The results also show the superiority of the proposed technique over those conventional EM techniques in computational efficiency..
288. S Uchida, H Sakoe, A handwritten character recognition method based on unconstrained elastic matching and eigen-deformations, EIGHTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION: PROCEEDINGS, 10.1109/IWFHR.2002.1030887, 72-77, 2002.08, A fast elastic matching based handwritten character recognition method is investigated. In the present method, an unconstrained elastic matching technique, where the matching is optimized locally and individually on each pixel, is utilized together with its a posteriori evaluation based on the eigen-deformations of handwritten characters. Our experimental results show that high recognition rates can be attained by the present method with feasible computations..
289. S Uchida, MA Ronee, H Sakoe, Using eigen-deformations in handwritten character recognition, 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL I, PROCEEDINGS, 10.1109/ICPR.2002.1044795, 572-575, 2002.08, Deformations in handwritten characters have class-dependent tendencies. For example, characters of class "A" are often deformed by global slant transformation and never deformed to be similar to "R". In this paper, the extraction and the utilization of such tendencies called eigen-deformations are investigated for better performance of elastic matching based recognition systems. The eigen-deformations are extracted by the principal component analysis of actual deformations automatically collected by elastic matching. From experimental results it was shown that the extracted eigen-deformations represent typical deformations of each class. It was also shown that the recognition performance can be improved significantly by using the eigen-deformations in detecting overfitting, which often results in misrecognition..
290. R Bogush, S Maltsev, S Ablameyko, S Uchida, S Kamata, An efficient correlation computation method for binary images based on matrix factorisation, SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 10.1109/ICDAR.2001.953805, 312-316, 2001.09, A novel algorithm for complexity reduction in binary image processing, namely for computation of the correlation between an image and an object template, is proposed. This algorithm is based on direct computation of vector-matrix multiplication with utilisation of a binary matrix factorisation approach. Comparison with other algorithms is given and it is shown that our approach reduces the time and complexity of this task..
291. MA Ronee, S Uchida, H Sakoe, Handwritten character recognition using piecewise linear two-dimensional warping, SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 10.1109/ICDAR.2001.953751, 39-43, 2001.09, In this paper, the effectiveness of piecewise linear two-dimensional warping, a dynamic programming-based elastic image matching technique, in handwritten character recognition is investigated. The present technique is capable of providing compensation for most variations in character patterns with tractable computation. The superiority of the present technique over several conventional two-dimensional warping techniques in variation compensation is experimentally justified. Another comparison with monotonic and continuous two-dimensional warping, a more flexible matching technique, reveals that the present method takes far less computation than the latter, yet provides almost the same recognition accuracy for most categories..
292. S Uchida, E Taira, H Sakoe, Nonuniform slant correction using dynamic programming, SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 10.1109/ICDAR.2001.953827, 434-438, 2001.09, Slant correction is an indispensable technique for handwritten word recognition systems. Conventional slant correction techniques estimate the average slant angle of component characters and then correct the slant uniformly. Thus these conventional techniques will perform successfully under the assumption that each word is written with a constant slant. However, it is a more widely acceptable assumption that the slant angle fluctuates while a word is being written. In this paper, a nonuniform slant correction technique is presented where the slant correction problem is formulated as an optimal estimation problem of local slant angles at all horizontal positions. The optimal estimation is governed by a criterion function and several constraints for the global and local validity of the local angles. The optimal local slant angles which maximize the criterion while satisfying the constraints are searched for efficiently by a dynamic programming based algorithm. Experimental results show the advantageous characteristics of the present technique over the uniform slant correction techniques..
293. Piecewise Linear Two-Dimensional Warping.
294. S Uchida, H Sakoe, Piecewise linear two-dimensional warping, 15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, PROCEEDINGS, 10.1109/ICPR.2000.903601, 534-537, 2000.09, A new efficient dynamic programming (DP) algorithm for 2D elastic matching is proposed. The present DP algorithm requires far less complexity than previous DP-based elastic matching algorithms. This complexity reduction results from piecewise linearization of a 2D-2D warping which specifies an elastic matching between two given images. Since this linearization can be guided by a priori knowledge related to image patterns to be matched, the present DP algorithm often provides sufficient matching as is shown by experimental results..
295. A Handwritten Character Recognition Experiment Using Monotonic and Continuous Two-Dimensional Warping.
296. S Uchida, H Sakoe, An approximation algorithm for two-dimensional warping, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E83D, 1, 109-111, 2000.01, A new efficient two-dimensional warping algorithm is presented, in which sub-optimal warping is attained by iterating DP-based local optimization of warp on partially overlapping subplane sequence. From an experimental comparison with a conventional approximation algorithm based on beam search DP, relative superiority of the proposed algorithm is established..
297. Seiichi Uchida, Hiroaki Sakoe, Handwritten character recognition using monotonic and continuous two-dimensional warping, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 10.1109/ICDAR.1999.791834, 503-506, 1999.09, In this paper, a handwritten character recognition experiment using a monotonic and continuous two-dimensional warping algorithm is reported. This warping algorithm is based on dynamic programming and searches for the optimal pixel-to-pixel mapping between two given images subject to two-dimensional monotonicity and continuity constraints. Experimental comparisons with rigid matching and local perturbation show the performance superiority of the monotonic and continuous warping in character recognition..
298. S Uchida, H Sakoe, An efficient two-dimensional warping algorithm, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E82D, 3, 693-700, 1999.03, A new dynamic programming (DP) based algorithm for monotonic and continuous two-dimensional warping (2DW) is presented. This algorithm searches for the optimal pixel-to-pixel mapping between a pair of images subject to monotonicity and continuity constraints with by far less time complexity than the algorithm previously reported by the authors. This complexity reduction results from a refinement of the multi-stage decision process representing the 2DW problem. As an implementation technique, a polynomial order approximation algorithm incorporated with beam search is also presented. Theoretical and experimental comparisons show that the present approximation algorithm yields better performance than the previous approximation algorithm..
299. S Uchida, H Sakoe, A monotonic and continuous two-dimensional warping based on dynamic programming, FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, 10.1109/ICPR.1998.711195, 521-524, 1998.08, A novel two-dimensional warping algorithm is presented which searches for the optimal pixel mapping subject to continuity and monotonicity constraints. These constraints enable us to preserve topological structure in images. The search algorithm is based on dynamic programming (DP). As implementation techniques, acceleration by beam search and excessive warp suppression by penalty and/or range limitation are investigated. Experimental results show that this method provides successful warpings between images..
300. Monotonic and Continuous Two-Dimensional Warping Based on Dynamic Programming
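Two-dimensional warping, defined as the pixel-to-pixel mapping that achieves the maximum match between two images, can be regarded as a template matching method that adapts to deformations occurring in patterns. This paper proposes a new framework for two-dimensional warping and presents a basic study of it. The first feature of the method is that it can construct a warp that preserves the topology of the pattern while retaining two-dimensional degrees of freedom. This property is realized by monotonicity and continuity constraints on the warp. The second feature is that dynamic programming (DP), constructed so that optimality over the whole image is guaranteed, is used to search for the maximum match. The use of DP also brings advantages such as the absence of any differentiability requirement on the criterion function. Experiments confirmed the basic properties of the proposed method..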