||Satoshi Fukuda, Emi Ishita, Yoichi Tomiura, Douglas W. Oard, Automating the Choice Between Single or Dual Annotation for Classifier Training, The 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021.12, Many emerging digital library applications rely on automated classifiers that are trained using manually assigned labels. Accurately labeling training data for text classification requires either highly trained coders or multiple annotations, either of which can be costly. Previous studies have shown that there is a quality-quantity trade-off for this labeling process, and the optimal balance between quality and quantity varies depending on the annotation task. In this paper, we present a method that learns to choose between higher-quality annotation that results from dual annotation and higher-quantity annotation that results from the use of a single annotator per item. We demonstrate the effectiveness of this approach through an experiment in which a binary classifier is constructed for assigning human value categories to sentences in newspaper editorials.
||Mei Kodama, Emi Ishita, Yukiko Watanabe, Yoichi Tomiura, Usage of E-books During the COVID-19 Pandemic: A Case Study of Kyushu University Library, Japan, iConference 2021, 2021.03.
||Emi Nishida, Emi Ishita, Yukiko Watanabe, Yoichi Tomiura, Description of research data in laboratory notebooks: Challenges and opportunities, ASIS&T 2020, 2020.10.
||Emi Ishita, Satoshi Fukuda, Yoichi Tomiura, Douglas W. Oard, Using text classification to improve annotation quality by improving annotator consistency, ASIS&T 2020, 2020.10.
||Xiaofan Zheng, Yoichi Tomiura, Kenshi Hayashi, Takaaki Soeda, Profile-Decomposing Output of Multi-Channel Odor Sensor Array, IMCS 2020, 2020.05.
||Emi Ishita, Satoshi Fukuda, Toru Oga, Yoichi Tomiura, Douglas W. Oard, Kenneth R. Fleischmann, Cost-effective learning for classifying human values, iConference 2020, 2020.03.
||M. Kodama, K. Abe, K. Fukushima, E. Hayashi, Z. Hua, M. Jiang, P. Kang, E. Nishida, S. Sakai, Y. Tomiura, Y. Watanabe, E. Ishita, Content Analysis of Library Use on Microblog: Pre-coding Results, 9th Asia-Pacific Conference on Library & Information Education and Practice (A-LIEP 2019), 2019.11.
||H. Uchiyama, E. Ishita, Y. Watanabe, Y. Tomiura, A. Shimada, M. Yamada, A framework for sharing learner generated contents in collaborative learning, 9th Asia-Pacific Conference on Library & Information Education and Practice (A-LIEP 2019), 2019.11.
||K. Fukushima, E. Ishita, Y. Tomiura, Y. Watanabe, H. Uchiyama, Photovoice for Student Out-of-Class Learning, 9th Asia-Pacific Conference on Library & Information Education and Practice (A-LIEP 2019), 2019.11.
||Keiya Maekawa, Yoichi Tomiura, Satoshi Fukuda, Emi Ishita, Hideaki Uchiyama, Improving OCR for Historical Documents by Modeling Image Distortion, 21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), 2019.11, Archives hold printed historical documents, many of which have deteriorated. It is difficult to extract text from images of such documents without errors using optical character recognition (OCR), and these errors reduce the accuracy of information retrieval. It is therefore necessary to improve OCR performance on images of deteriorated documents. One approach is to convert images of deteriorated documents into clear images that an OCR system can recognize more easily. Performing this conversion with a neural network requires training data. Pairs consisting of a deteriorated image and the same image with the deterioration removed are hard to prepare; however, pairs consisting of a clear image and a copy of it with added noise are easy to prepare. In this study, PDFs of historical documents were collected and converted to text and JPEG images. Noise was added to the JPEG images to create a dataset in which the images had noise similar to that of the actual printed documents. U-Net, a type of neural network, was trained on this dataset. OCR performance on noisy test images was compared with OCR performance on the images generated from them by the trained U-Net, and an improvement in the OCR recognition rate was confirmed.
||T. Soeda, Z. Yang, Z. Xiofan, F. Sassa, Y. Tomiura and K. Hayashi, 2D LSPR multi gas sensor array with 4-segmented subpixel using Au/Ag core shell structure, IEEE Sensors, 2019.10.
||Satoshi Fukuda, Yoichi Tomiura, Emi Ishita, Research Paper Search Using a Topic-Based Boolean Query Search and a General Query-Based Ranking Model, 30th International Conference on Database and Expert Systems Applications (DEXA 2019), 2019.08.
||Takaaki Soeda, Zhongyuan Yang, Zheng Xiofan, Fumihiro Sassa, Yoichi Tomiura, Kenshi Hayashi, Two dimensional LSPR gas sensor with Au/Ag core-shell structure, 18th International Symposium on Olfaction and Electronic Nose, ISOEN 2019, 2019.05, Quickly recognizing the distribution of dangerous gases would be useful in places such as disaster scenes. Localized surface plasmon resonance (LSPR) gas sensors are known for their fast response and recovery and their high spatial resolution. However, conventional LSPR gas sensors lack molecular selectivity, which makes it difficult to identify gas species. We fabricated a gas-selective pixelated LSPR substrate based on an Au/Ag core-shell structure by photo-induced growth, using an exposure system with a photomask.
||R. Marciano, V. Lemieux, M. Hedges, Y. Tomiura, S. Katuu, J. Greenberg, W. Underwood, K. Fenlon, A. Kriesberg, M. Kendig, G. Jansen, P. Piety, D. Weintrop, M. Kurtz, Establishing an International Computational Network for Librarians and Archivists, 14th International Conference on Information in Contemporary Society (iConference 2019), 2019.04.
||Emi Ishita, Satoshi Fukuda, Toru Oga, Douglas W. Oard, Kenneth R. Fleischmann, Yoichi Tomiura, An-Shou Cheng, Toward Three-Stage Automation of Annotation for Human Values, 14th International Conference on Information in Contemporary Society (iConference 2019), 2019.04, Prior work on automated annotation of human values has sought to train text classification techniques to label text spans with labels that reflect specific human values such as freedom, justice, or safety. This confounds three tasks: (1) selecting the documents to be labeled, (2) selecting the text spans that express or reflect human values, and (3) assigning labels to those spans. This paper proposes a three-stage model in which separate systems can be optimally trained for each of the three stages. Experiments on the first stage, document selection, indicate that annotation diversity trumps annotation quality. This suggests that when multiple annotators are available, the traditional practice of adjudicating conflicting annotations of the same documents is not as cost-effective as an alternative in which each annotator labels different documents. Preliminary results for the second stage, selecting value sentences, indicate that high recall (94%) can be achieved on that task with levels of precision (above 80%) that seem suitable for use as part of a multi-stage annotation pipeline. The annotations created for these experiments are being made freely available, and the content that was annotated is available from commercial sources at modest cost.
||Yoichi Tomiura, Emi Ishita, Hideaki Uchiyama, Satoshi Fukuda, A Comprehensive Study for Constructing a Large Scale Information Infrastructure of Paper-based Historical Materials, 10th Asia Library and Information Research Group (ALIRG) Workshop, 2018.12.
||Satoshi Fukuda and Yoichi Tomiura, A Study for the Support of a Search Formula Creation for the Exhaustive search of an Academic Paper based on a User’s Information Need, 10th Asia Library and Information Research Group (ALIRG) Workshop, 2018.12.
||Emi Ishita, Yasuko Hagiwara, Yoichi Tomiura, Users’ searching behavior for academic papers, Workshop at ICADL2018, 2018.11.
||Satoshi Fukuda and Yoichi Tomiura, Toward a Search Formula Creation Support for the Exhaustive Search of an Academic Paper, Workshop at ICADL2018, 2018.11.
||Satoshi Fukuda and Yoichi Tomiura, Clustering of Research Papers based on Sentence Roles, 20th International Conference on Asia-Pacific Digital Libraries (ICADL 2018), 2018.11.
||Satoshi Fukuda and Yoichi Tomiura, Exhaustive Search of Academic Paper Using Topic-Based Boolean Query, The 2018 International Symposium on Information Technology Convergence (ISITC 2018), 2018.10.
||Emi Ishita, Yasuko Hagiwara, Yukiko Watanabe, Yoichi Tomiura, Which Parts of Search Results do Researchers Check when Selecting Academic Documents?, JCDL 2018, 2018.06, Our goal is to propose an alternative retrieval system for academic documents based on how researchers actually behave. In this study, a questionnaire survey was conducted. The question items were developed from the findings of a previous observational study of researchers' behavior. Among the 46 respondents, the three elements most often checked in search results were the title, the abstract, and the full text. When looking for previous research in an unfamiliar field, respondents also checked the “Introduction” section of the full text more than other sections. These results indicate that researchers select documents in different ways depending on the type of document they are looking for.
||Satoshi Fukuda, Yoichi Tomiura, Using Topic Analysis Techniques to Support Comprehensive Research Paper Searches, 21st International Conference on Asian Language Processing, IALP 2017, 2017.12, In an academic paper search to confirm the originality of a user's research, it is important that the search return comprehensive results relevant to the user's information need. To achieve comprehensive results, users often relax an initially restrictive search formula. They may add synonyms of, and expressions similar to, the search words with the OR operator, replace AND with OR, or both. However, it is difficult to anticipate all the terms that authors of relevant papers might have used. In addition, replacing AND with OR in search phrases can return a large number of unrelated papers. To overcome these issues, we propose a research paper search method based on topic analysis, which uses Boolean search based on the topics assigned to the search words in the search formula and the abstracts that contain any search word. Our method takes into account synonyms and expressions similar to the search words that a user might not anticipate, while limiting the number of papers unrelated to the information need in the search results. To investigate the effectiveness of our method, we conducted experiments using the NTCIR-1 and NTCIR-2 datasets. We confirmed that our method reduces the number of unrelated papers while maintaining high coverage.
||Yasuko Hagiwara, Emi Ishita, Emiko Mizutani, Kana Fukushima, Yukiko Watanabe, Yoichi Tomiura, Identifying Key Elements of Search Results for Document Selection in the Digital Age: An Observational Study, 19th International Conference on Asia-Pacific Digital Libraries, ICADL 2017, 2017.11, Academic database systems are vitally important tools for enabling researchers to find relevant, useful articles. Identifying how researchers select documents from search results is extremely useful for improving the functions or interfaces of academic retrieval systems. This study aims to reveal which elements researchers check, and in what order, when they select from among search results. It consists of two steps: an observational study of search sessions performed by volunteer researchers, and a questionnaire to confirm whether the extracted elements and patterns are used. This article reports findings from the observational study and introduces the questions we developed based on it. In the observational study we obtained data from nine participants who were asked to search for documents using information retrieval systems. The search sessions were recorded with a voice recorder and by capturing screen images. The participants were also asked to state which elements they checked in selecting documents, along with the reasons for their selections. Three patterns in the order of checking were found. In pattern 1, seven researchers used titles and abstracts as the primary elements. In pattern 2, the others used titles and then accessed the full text before deciding on their selection. In pattern 3, one participant searched for images and accessed the full text from the links in those images. We also found that participants used novel elements for selecting documents. We subsequently developed questionnaire items reflecting these findings.
||Emi Ishita, Toru Oga, An-Shou Cheng, Kenneth R. Fleischmann, Yasuhiro Takayama, Douglas W. Oard, Yoichi Tomiura, Toward automating detection of human values in the nuclear power debate, Association for Information Science and Technology, 2017.10.
||Yasuko Hagiwara, Emi Ishita, Emiko Mizutani, Yukiko Watanabe, Yoichi Tomiura, A Preliminary Experiment and Analysis to Identify Key Elements in Document Selection, ISIC 2016, 2016.09.
||Takafumi Yamamoto, Yoichi Tomiura, Constructing Corpus of Scientific Abstracts Annotated with Sentence Roles, Seventh International Conference on E-Service and Knowledge Management, 2016.07.
||Kohei Omori, Yoichi Tomiura, Kenshi Hayashi, Statistical analysis for clustering of areas on the olfactory bulb and estimation of the physico-chemical properties detected by glomeruli in each area, ISOT 2016, 2016.06.
||Liang Shang, Chuanjun Liu, Yoichi Tomiura, Kenshi Hayashi, Artificial odor cluster map of odorant molecular parameters and odor maps in rat olfactory bulbs, ISOT 2016, 2016.06.
||Takeshi Shirai, Yoichi Tomiura, Shosaku Tanaka, Ryutaro Ono, Mining Latent Research Groups within Institutions Using an Author‐Topic Model, ICADL 2015, 2015.12.
||Kosuke Furusawa, Hongjun Fan, Yoichi Tomiura, Emi Ishita, Encompassing Retrieval of Academic Papers for User's Information Need, ICADL 2015, 2015.12.
||Yasuhiro Takayama, Yoichi Tomiura, Kenneth R. Fleischmann, An-Shou Cheng, Douglas W. Oard, Emi Ishita, An Automatic Dictionary Extraction and Annotation Method Using Simulated Annealing for Detecting Human Values, Sixth International Conference on E-Service and Knowledge Management, 2015.07.
||Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, Yoichi Tomiura, Yasuhiro Takayama, An-Shou Cheng, Learning curves for automating content analysis: How much human annotation is needed?, Sixth International Conference on E-Service and Knowledge Management, 2015.07.
||Kenneth R. Fleischmann, Yasuhiro Takayama, An-Shou Cheng, Yoichi Tomiura, Douglas W. Oard, Emi Ishita, Thematic Analysis of Words that Invoke Values in the Net Neutrality Debate, iConference 2015, 2015.03.
||Shinjiro Okaku, Yoichi Tomiura, Emi Ishita, Shosaku Tanaka, Towards Generating Multiple-Choice Tests for Supporting Extensive Reading, The Seventh International Conference on Mobile, Hybrid, and On-line Learning (eLmL 2015), 2015.02, We propose a method for generating a multiple-choice test, together with its answer, for an English text selected by a learner. The test lets learners assess for themselves whether they have understood the text after reading it. In our method, the system extracts several important sentences from the text and replaces one word in each of these sentences with a synonym where possible. One of these sentences is then selected as the correct option. The polarities or nouns of the remaining sentences are further changed to generate distractor options for the multiple-choice test. Our method has the potential to make extensive reading in English more effective.
||Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, An-Shou Cheng, A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values, ACM International Conference on Information and Knowledge Management (CIKM 2014), 2014.12, This paper describes a probabilistic latent variable model that is designed to detect human values, such as justice or freedom, that a writer has sought to reflect or appeal to when participating in a public debate. The proposed model treats the words in a sentence as having been chosen based on specific values; the values reflected by each sentence are then estimated by aggregating the values associated with each word. The model determines the human values for each word while taking into account the influence of the preceding word. This design choice was motivated by syntactic structures such as noun+noun, adjective+noun, and verb+adjective. The classifier based on the model was evaluated on a test collection of 102 manually annotated documents focusing on one contentious political issue, net neutrality. It achieved the highest classification effectiveness reported for this task. We also compared the proposed classifier with a second human annotator and found that the classifier's effectiveness is statistically comparable to that of human annotators.
||Shuhei Otani, Yoichi Tomiura, Extraction of Key Expressions Indicating the Important Sentence from Article Abstracts, ESKM 2014, 2014.09.
||Shinjiro Okaku, Yoichi Tomiura, Kou Shu, Shosaku Tanaka, Towards Generating Multiple-Choice Tests for Evaluating Comprehension of Arbitrary English Texts, ESKM 2014, 2014.09.
||Toshiaki Funatsu, Yoichi Tomiura, Emi Ishita, Kosuke Furusawa, Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation, eKNOW 2014 (Digital World 2014), 2014.03, Determining the topics of a document is necessary for understanding its content efficiently. Latent Dirichlet Allocation (LDA) is a method of analyzing topics. In LDA, a topic is treated as an unobservable variable that defines a probability distribution over words. A topic can be interpreted from the list of words that appear in it with high probability. This works well for topics found in many documents with a wide variety of content. However, a set of article abstracts retrieved by a keyword search has limited and similar content, so it is difficult to interpret its topics using conventional LDA alone. We propose a method to estimate representative words of each topic from an LDA result. Experimental results show that our method provides better information for interpreting a topic than LDA does.
||Yuichiro Kobayashi, Shosaku Tanaka, Yoichi Tomiura, Yoshinori Miyazaki, Michio Tokumi, Identifying Discipline-Specific Expressions Based on Institutional Repository, Digital Humanities, 2014.03.
||Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Zheng Wang, Douglas W. Oard, Kenneth R. Fleischmann, An-Shou Cheng, Improving Automatic Sentence-Level Annotation of Human Values Using Augmented Feature Vectors, Pacling, 2013.09, This paper describes an effort to improve the identification of human values that are directly or indirectly invoked within the prepared statements of witnesses before legislative and regulatory hearings. We automatically code human values at the sentence level using supervised machine learning techniques trained on a few thousand annotated sentences. To simulate a realistic setting, we treat a quarter of the data as labeled training data and the remaining three quarters as unlabeled test data. We find that augmenting the feature space using a combination of lexical and statistical co-occurrence evidence can yield about a 6% relative improvement in F1 using a Support Vector Machine classifier.
||Analysis of Answering Method with Probability Conversion for Internet Research,
Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA,
Fifth IEEE Consumer Communications & Networking Conference (CCNC'08).
||Tracing Japanese EFL Learners' Development in Productive Vocabulary.
||Robust Language Identification for Similar Languages and short texts using Low-Frequent Byte Strings,
Kensei Yukino, Shosaku Tanaka, Yoichi Tomiura and Hideki Matsumoto,
Pacific Association for Computational Linguistics 2005 (Pacling 2005).
||A System for Extensive Slash Reading Using Web,
Shosaku TANAKA, Yoichi TOMIURA, Kensei YUKINO,
The 18th Pacific Asia Conference on Language, Information and Computation (PACLIC 18).
||Discrimination of Native/Non-native Documents Based on Skew Divergence,
H. Fuji, S. Tanaka, Y. Tomiura,
Forum on Information Technology 2004 (FIT2004).
||Problems of FGREP Module and Their Solution,
M. Motoki, Y. Tomiura, N. Takahashi,
3rd IEEE International Conference on Cognitive Informatics.
||A Method for Retrieving Translations of Collocation in Web Data,
M. Shibata, Y. Tomiura, S. Tanaka,
IJCNLP-04 Satellite Symposium.
||Placement of Nouns in n-Dimensional Space Based on Cooccurrency
Yoichi TOMIURA, Sho TANAKA, Toru HITAKA
IPSJ SIG Notes, Vol.2003, No.23 (2003-NL-154).
||Judgement on Validity of Candidates of Translations Derived via English Using Sets of Synonyms
Shosaku TANAKA, Yoichi TOMIURA
Forum on Information Technology.
||PP-attachment Ambiguity Resolution Using a Neural Network with Modified FGREP Method
TAKAHASHI Naoto, MOTOKI Minoru, SHIMAZU Yoshio, TOMIURA Yoichi, HITAKA Toru
the 2nd Workshop on Natural Language Processing and Neural Networks (post-conference workshop of NLPRS2001).
||Extraction of Candidates having Semantic Relations of Japanese Noun Phrase "NP1 'no' NP2" by Dependency Structures
Shosaku TANAKA, Yoichi TOMIURA, Toru HITAKA
Technical Report of IEICE, NLC 2001-46.
||Discriminant Analysis with Incomplete Data and its Application to Estimation of Words' Cooccurrency
Yoichi TOMIURA, Shosaku TANAKA, Toru HITAKA
Technical Report of IEICE, NLC 2000-76.
||How to Strengthen a Context Free Grammar Expressing Dependency Constraint
D. Toushinbatto, Yoichi TOMIURA, Toru HITAKA
IPSJ SIG Notes, Vol.98, No.99 (98-NL-128).
||A Parameter Estimation of a Probabilistic Context Free Grammar on a Sparse Sample
Yoichi TOMIURA, Takeshi NISHIDA, Toru HITAKA
Technical Report of IEICE, NLC 98-12.
||Acquisition of Semantic Relations of Japanese Noun Phrases "NP 'no' NP" by using Statistical Property
Shosaku TANAKA, Yoichi TOMIURA, Toru HITAKA
Technical Report of IEICE, NLC 98-4.