Kyushu University [Yoichi Tomiura (Professor) Faculty of Information Science and Electrical Engineering, Department of Informatics]

Researcher information

Yoichi Tomiura
Professor
Intelligence Science
Department of Informatics
Faculty of Information Science and Electrical Engineering
Last modified date: 2024.04.29

Graduate School

Department of Informatics
Graduate School of Information Science and Electrical Engineering

Department of Library Science
Graduate School of Integrated Frontier Sciences

Undergraduate School

Department of Electrical Engineering and Computer Science
School of Engineering

Department of Physics
School of Sciences

Other Organization

Library
Data-Driven Innovation Initiative
Other

E-Mail

Homepage

Phone

Academic Degree

Country of degree conferring institution (Overseas)

Field of Specialization

Total Period of education and research career in the foreign country

Outline Activities

Research

Educational

1.	Motokazu Yamasaki, Yoichi Tomiura, Toshiyuki Shimizu, Investigation of ChatGPT Use in Research Data Retrieval, Proceedings of International Conference on Asian Digital Libraries 2023, 36-40, 2023.12, In recent years, huge amounts of research data have been generated, and it has become important to search them efficiently and accurately in order to make use of research data. Existing search engines and keyword-based search methods require users to enter appropriate keywords or phrases, and it is difficult to obtain satisfactory results if users do not have detailed information about the desired data. In this study, we investigated whether ChatGPT could be used to reach the desired research data by users who are not familiar with them. Specifically, we investigated whether users could find the research data cited in a research paper by entering the abstract of the paper into ChatGPT and then asking for the data necessary to write the research paper. The results showed that research data could be found in 65% of the cases, confirming that the use of ChatGPT increases the discoverability of research data.
2.	Xiaofan Zheng, Masato Matsuoka, Kenshi Hayashi, Yoichi Tomiura, Extract spatial distribution of a specific gas from mixed gas data measured by the LSPR gas sensor, 10.1109/SENSORS56945.2023.10324923, 1-4, 2023.10, Visualizing invisible gas molecules can be a great help to our lives. At present, gas sensors can already visualize the spatial distribution of a gas mixture; however, the visualization of a specific gas requires further analysis of the measurement data. In this study, matrix decomposition is used to analyze the measurement data of a localized surface plasmon resonance (LSPR) gas sensor. To satisfy the linear relationship between the concentration of gas and the output of the device required for applying matrix decomposition, we formulated a procedure for processing the measurement data instead of using them directly. To obtain the diffusion trace of a specific gas, we designed a method to obtain the characteristic output of the specific gas; then, by using the characteristic output as the known information, the corresponding diffusion trace can be estimated better through the matrix decomposition algorithm. We used the designed method to analyze the measurement data, and the results show that our method can obtain the spatial distribution of some gas.
3.	Xiaofan Zheng, Yoichi Tomiura, Kenshi Hayashi, Investigation of the structure-odor relationship using a Transformer model, Journal of Cheminformatics, https://doi.org/10.1186/s13321-022-00671-y, 2022.12, The relationships between molecular structures and their properties are subtle and complex, and the properties of odor are no exception. Molecules with similar structures, such as a molecule and its optical isomer, may have completely different odors, whereas molecules with completely distinct structures may have similar odors. Many works have attempted to explain the molecular structure-odor relationship from chemical and data-driven perspectives. The Transformer model is widely used in natural language processing and computer vision, and the attention mechanism included in the Transformer model can identify relationships between inputs and outputs. In this paper, we describe the construction of a Transformer model for predicting molecular properties and interpreting the prediction results. The SMILES data of 100,000 molecules are collected and used to predict the existence of molecular substructures, and our proposed model achieves an F1 value of 0.98. The attention matrix is visualized to investigate the substructure annotation performance of the attention mechanism, and we find that certain atoms in the target substructures are accurately annotated. Finally, we collect 4462 molecules and their odor descriptors and use the proposed model to infer 98 odor descriptors, obtaining an average F1 value of 0.33. For the 19 odor descriptors that achieved F1 values greater than 0.45, we also attempt to summarize the relationship between the molecular substructures and odor quality through the attention matrix.
4.	Yasuko Hagiwara, Emi Ishita, Yukiko Watanabe, Yoichi Tomiura, Identifying Scholarly Search Skills Based on Resource and Document Selection Behavior among Researchers and Master’s Students in Engineering, College & Research Libraries, https://doi.org/10.5860/crl.83.4.610, 83, 4, 610-630, 2022.07.
5.	Satoshi Fukuda, Emi Ishita, Yoichi Tomiura, Douglas W. Oard, Automating the Choice Between Single or Dual Annotation for Classifier Training, Proceedings of the 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 10.1007/978-3-030-91669-5_19, 233-248, 2021.12, Many emerging digital library applications rely on automated classifiers that are trained using manually assigned labels. Accurately labeling training data for text classification requires either highly trained coders or multiple annotations, either of which can be costly. Previous studies have shown that there is a quality-quantity trade-off for this labeling process, and the optimal balance between quality and quantity varies depending on the annotation task. In this paper, we present a method that learns to choose between higher-quality annotation that results from dual annotation and higher-quantity annotation that results from the use of a single annotator per item. We demonstrate the effectiveness of this approach through an experiment in which a binary classifier is constructed for assigning human value categories to sentences in newspaper editorials.
6.	Xiaofan Zheng, Yoichi Tomiura, Kenshi Hayashi, Takaaki Soeda, Profile-Decomposing Output of Multi-Channel Odor Sensor Array, ECS Meeting Abstracts, MA2020-01, 2020.05.
7.	Keiya Maekawa, Yoichi Tomiura, Satoshi Fukuda, Emi Ishita, Hideaki Uchiyama, Improving OCR for Historical Documents by Modeling Image Distortion, Lecture Notes in Computer Science, 10.1007/978-3-030-34058-2_31, 11853, 312-316, 2019.11.
8.	Satoshi Fukuda, Yoichi Tomiura, Emi Ishita, Research Paper Search Using a Topic-Based Boolean Query Search and a General Query-Based Ranking Model, Lecture Notes in Computer Science, 10.1007/978-3-030-27618-8_5, 11707, 65-75, 2019.08.
9.	Emi Ishita, Satoshi Fukuda, Toru Oga, Douglas W. Oard, Kenneth R. Fleischmann, Yoichi Tomiura, An-Shou Cheng, Toward Three-Stage Automation of Annotation for Human Values, iConference 2019, 2019.03, Prior work on automated annotation of human values has sought to train text classification techniques to label text spans with labels that reflect specific human values such as freedom, justice, or safety. This confounds three tasks: (1) selecting the documents to be labeled, (2) selecting the text spans that express or reflect human values, and (3) assigning labels to those spans. This paper proposes a three-stage model in which separate systems can be optimally trained for each of the three stages. Experiments from the first stage, document selection, indicate that annotation diversity trumps annotation quality, suggesting that when multiple annotators are available, the traditional practice of adjudicating conflicting annotations of the same documents is not as cost effective as an alternative in which each annotator labels different documents. Preliminary results for the second stage, selecting value sentences, indicate that high recall (94%) can be achieved on that task with levels of precision (above 80%) that seem suitable for use as part of a multi-stage annotation pipeline. The annotations created for these experiments are being made freely available, and the content that was annotated is available from commercial sources at modest cost.
10.	Shinjiro Okaku, Yoichi Tomiura, Emi Ishita, Shosaku Tanaka, Towards Generating Multiple-Choice Tests for Supporting Extensive Reading, Proc. the Seventh International Conference on Mobile, Hybrid, and On-line Learning (eLmL 2015), 2015.02, We propose a method for generating a multiple-choice test, together with its answer, for an English text selected by a learner; the test is used by the learner to self-assess whether they comprehended the text after reading it. In our method, the system extracts several important sentences from the text, and replaces one word in each of these sentences with its synonym (if possible). One of these sentences is then selected as a correct optional sentence, while further changes to the polarities or nouns in the remaining sentences are carried out to generate distractor optional sentences for the multiple-choice test. Our method has potential to make extensive reading in English more effective.
11.	Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, An-Shou Cheng, A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values, Proc. 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014), 1-10, 2014.12, This paper describes a probabilistic latent variable model that is designed to detect human values such as justice or freedom that a writer has sought to reflect or appeal to when participating in a public debate. The proposed model treats the words in a sentence as having been chosen based on specific values; values reflected by each sentence are then estimated by aggregating values associated with each word. The model can determine the human values for the word in light of the influence of the previous word. This design choice was motivated by syntactic structures such as noun+noun, adjective+noun, and verb+adjective. The classifier based on the model was evaluated on a test collection containing 102 manually annotated documents focusing on one contentious political issue --- Net neutrality, achieving the highest reported classification effectiveness for this task. We also compared our proposed classifier with a human second annotator. As a result, the proposed classifier effectiveness is statistically comparable with human annotators.
12.	Toshiaki Funatsu, Yoichi Tomiura, Emi Ishita, Kosuke Furusawa, Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation, Proc. The Sixth International Conference on Information, Process, and Knowledge Management (eKNOW 2014), 2014.03, Determining the topic of a document is necessary to understand the content of the document efficiently. Latent Dirichlet Allocation (LDA) is a method of analyzing topics. In LDA, a topic is treated as an unobservable variable to establish a probabilistic distribution of words. We can interpret the topic with a list of words that appear with high probability in the topic. This method works well when determining a topic included in many documents having a variety of contents. However, it is difficult to interpret the topic just using conventional LDA when determining the topic in a set of article abstracts found by a keyword search, because their contents are limited and similar. We propose a method to estimate representative words of each topic from an LDA result. Experimental results show that our method provides better information for interpreting a topic than LDA does.
13.	Relationship between Errors and Corrections in Verb Selection: Basic Research for Composition Support, Journal of Natural Language Processing, Vol.18, No.1, pp.3-29.
14.	A System Providing Appropriate Alternative Candidates for Japanese Writing using Word Co-occurrence, T. Nakano, Y. Tomiura, Jpn. J. Educ. Technol., 34(3), pp.181-189 (2010).
15.	Teiko NAKANO, Yoichi TOMIURA, Providing Appropriate Alternative Co-occurrence Candidates; Towards a Japanese Composition Support System, Proc. of the Ninth IASTED International Conference on Web-Based Education, pp. 173--179, 2010.03.
16.	Method for Selecting Appropriate Sentence from Documents on the WWW for the Open-ended Conversation Dialog System.
17.	Masahiro Shibata, Tomomi Nishiguchi, Yoichi Tomiura, Dialog System for Open-ended Conversation Using Web Documents, Informatica, Vol.33, No.3, pp.277-284, 2009.10.
18.	M. Shibata, Y. Tomiura, T. Mizuta, Identification among Similar Languages Using Statistical Hypothesis Testing, Proc. of Pacific Association for Computational Linguistics (PACLING'09), pp.47--52, 2009.09.
19.	Analysis of Answering Method with Probability Conversion for Internet Research.
20.	Discernment of Nativeness of English Documents Based on Statistical Hypothesis Testing, Y. Tomiura, S. Aoki, M. Shibata, K. Yukino, Journal of Natural Language Processing, Vol.16, No.1, pp.23-46 (2009).
21.	Masahiro Shibata, Tomomi Nishiguchi, Yoichi Tomiura, A Method for Automatically Generating Proper Responses to User's Utterances in Open-ended Conversation by Retrieving Documents on the Web, Proc. of 2008 IEEE International Conference on Information Reuse and Integration (IEEE IRI'08), pp.268-279, 2008.07.
22.	Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA, Optimization of Answering Method with Probability Conversion, Proc. of 2008 International Symposium on Applications and the Internet (SAINT'08), pp.249-252, 2008.07.
23.	Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA, Analysis of Answering Method with Probability Conversion for Internet Research, Fifth IEEE Consumer Communications & Networking Conference (CCNC'08), pp.110-111, 2008.01.
24.	Documents Discrimination between Native English Documents and Nonnative Ones Based on Language Identification Technique, S. Aoki, Y. Tomiura, K. Yukino, R. Tanigawa, Information Technology Letters, Vol.5, pp.85-88.
25.	A Learning Method of a Layered Neural Network Whose Inputs and Outputs are Symbol Sequences, M. Motoki, Y. Tomiura, N. Takahashi, IPSJ Journal, Vol.47, No.8, pp.2279--2791.
26.	Language Identification Using Low-frequent Byte-strings, K. Yukino, S. Tanaka, Y. Tomiura, H. Matsumoto, IPSJ Journal, Vol.47, No.4, pp.1287--1294.
27.	Y. Tomiura, S. Tanaka, T. Hitaka, Estimating Satisfactoriness of Selectional Restriction from Corpus without Thesaurus, ACM Transactions on Asian Language Information Processing, Vol.4, No.4, pp.400--416, 2005.12.
28.	Estimation of Nativeness of Documents Based on Skew Divergence, H. Fujii, Y. Tomiura, S. Tanaka, Journal of Natural Language Processing, Vol. 12, No. 4, pp.79-96 (2005).
29.	K. YUKINO, S. TANAKA, Y. TOMIURA, H. MATSUMOTO, Robust Language Identification for Similar Languages and short texts using Low-Frequent Byte Strings, Pacific Association for Computational Linguistics 2005 (Pacling 2005), pp.368-373, 2005.08.
30.	Assisting with Translating Collocations Based on the Word Co-occurrence on the Web Texts, M. Shibata, Y. Tomiura, S. Tanaka, IPSJ Journal, Vol.46, No.6, pp.1480-1491 (2005).
31.	M. Motoki, Y. Tomiura, N. Takahashi, Problems of FGREP Module and Their Solution, 3rd IEEE International Conference on Cognitive Informatics (ICCI2004), 10.1109/COGINF.2004.1327479, pp.220-227, 2004.08.
32.	Masahiro SHIBATA, Yoichi TOMIURA, Shosaku TANAKA, A Method for Retrieving Translations of Collocation in Web Data, Asian Symposium on Natural Language Processing to Overcome Language Barriers (in conjunction with IJCNLP-04), 2004.03.
33.	Placement of Nouns in a Multi-Dimensional Space Based on Words' Cooccurrency, Yoichi TOMIURA, Shosaku TANAKA, Toru HITAKA, Transactions of the Japanese Society for Artificial Intelligence, Vol.19, No.1A, pp.1-9.
34.	Estimation of Words' Cooccurrency from Corpus, Yoichi TOMIURA, Toru HITAKA, IPSJ Journal, Vol.45, No.1, pp.324-332.
35.	TAKAHASHI Naoto, MOTOKI Minoru, SHIMAZU Yoshio, TOMIURA Yoichi, HITAKA Toru, PP-attachment Ambiguity Resolution Using a Neural Network with Modified FGREP Method, the 2nd Workshop on Natural Language Processing and Neural Networks (post-conference workshop of NLPRS2001), pp.1-7, 2001.11.
36.	Context Free Grammar Expressing Dependency Constraint and its Application to Japanese Language, Toshifumi TANABE, Yoichi TOMIURA, Toru HITAKA, IPSJ Journal, Vol.41, No.1, pp.36 - 45.
37.	A Parameter Estimation of PCFG Expressing Dependency Constraints on a Sparse Sample, Yoichi TOMIURA, Toru HITAKA, IPSJ Journal, Vol.40, No.11, pp.4055 - 4063.
38.	Classification of Syntactic Categories of Nouns by the Scattering of Semantic Categories, Shosaku TANAKA, Yoichi TOMIURA, Toru HITAKA, IPSJ Journal, Vol.40, No.9, pp.3387 - 3396.
39.	Semantic Structure of Japanese Noun Phrases "NP 'no' NP", Yoichi TOMIURA, Teigo NAKAMURA, Toru HITAKA, IPSJ Journal, Vol.36, No.6, pp.1441-1448 (1995).
40.	Preference on Common Sense Reasoning and Application to Contextual Processing, Yoichi TOMIURA, Natsuki ICHIMARU, Toru HITAKA, IPSJ Journal, Vol.35, No.11, pp.2239 - 2248 (1994).
41.	A New Data Structure for Searching Prefix Words: Prefix-Closed B-tree, Yoichi TOMIURA, Teigo NAKAMURA, Toru HITAKA, IPSJ Journal, Vol.35, No.5, pp.779-789 (2007).
42.	Semantic Validity of Japanese Noun Phrases with Adnominal Particles, Teigo NAKAMURA, Yoichi TOMIURA, Toru HITAKA, in Proc. of PRICAI'92, Vol.1, No.2, pp.433-437 (1992).
43.	Y. TOMIURA, T. NAKAMURA, T. HITAKA, S. YOSHIDA, Logical Form of Hierarchical Relation on Verbs and Extracting it from Definition Sentences in a Japanese Dictionary, Proc. of the th International Conference on Computational Linguistics (Coling-92), Vol.2, No.14, pp.574-580, 1992.07.
44.	Extracting the Superordinate-Subordinate Relation between Verbs from Definition Sentences in Japanese Dictionaries, Yoichi TOMIURA, Toru HITAKA, Sho YOSHIDA, IPSJ Journal, Vol.32, No.1, pp.42 - 49 (1991).

1.	Motokazu Yamasaki, Yoichi Tomiura, Toshiyuki Shimizu, Investigation of ChatGPT Use in Research Data Retrieval, International Conference on Asian Digital Libraries 2023, 2023.12, In recent years, huge amounts of research data have been generated, and it has become important to search them efficiently and accurately in order to make use of research data. Existing search engines and keyword-based search methods require users to enter appropriate keywords or phrases, and it is difficult to obtain satisfactory results if users do not have detailed information about the desired data. In this study, we investigated whether ChatGPT could be used to reach the desired research data by users who are not familiar with them. Specifically, we investigated whether users could find the research data cited in a research paper by entering the abstract of the paper into ChatGPT and then asking for the data necessary to write the research paper. The results showed that research data could be found in 65% of the cases, confirming that the use of ChatGPT increases the discoverability of research data.
2.	Satoshi Fukuda, Emi Ishita, Yoichi Tomiura, Douglas W. Oard, Automating the Choice Between Single or Dual Annotation for Classifier Training, The 23rd International Conference on Asia-Pacific Digital Libraries (ICADL 2021), 2021.12, Many emerging digital library applications rely on automated classifiers that are trained using manually assigned labels. Accurately labeling training data for text classification requires either highly trained coders or multiple annotations, either of which can be costly. Previous studies have shown that there is a quality-quantity trade-off for this labeling process, and the optimal balance between quality and quantity varies depending on the annotation task. In this paper, we present a method that learns to choose between higher-quality annotation that results from dual annotation and higher-quantity annotation that results from the use of a single annotator per item. We demonstrate the effectiveness of this approach through an experiment in which a binary classifier is constructed for assigning human value categories to sentences in newspaper editorials.
3.	Xiaofan Zheng, Yoichi Tomiura, Kenshi Hayashi, Takaaki Soeda, Profile-Decomposing Output of Multi-Channel Odor Sensor Array, IMCS 2020, 2020.05.
4.	Keiya Maekawa, Yoichi Tomiura, Satoshi Fukuda, Emi Ishita, Hideaki Uchiyama, Improving OCR for Historical Documents by Modeling Image Distortion, 21st International Conference on Asia-Pacific Digital Libraries (ICADL 2019), 2019.11, Archives hold printed historical documents, many of which have deteriorated. It is difficult to extract text from such images without errors using optical character recognition (OCR). This problem reduces the accuracy of information retrieval. Therefore, it is necessary to improve the performance of OCR for images of deteriorated documents. One approach is to convert images of deteriorated documents to clear images, to make it easier for an OCR system to recognize text. To perform this conversion using a neural network, data is needed to train it. It is hard to prepare training data consisting of pairs of a deteriorated image and an image from which deterioration has been removed; however, it is easy to prepare training data consisting of pairs of a clear image and an image created by adding noise to it. In this study, PDFs of historical documents were collected and converted to text and JPEG images. Noise was added to the JPEG images to create a dataset in which the images had noise similar to that of the actual printed documents. U-Net, a type of neural network, was trained using this dataset. The performance of OCR for an image with noise in the test data was compared with the performance of OCR for an image generated from it by the trained U-Net. An improvement in the OCR recognition rate was confirmed.
5.	Satoshi Fukuda, Yoichi Tomiura, Emi Ishita, Research Paper Search Using a Topic-Based Boolean Query Search and a General Query-Based Ranking Model, 30th International Conference on Database and Expert Systems Applications (DEXA 2019), 2019.08.
6.	Kohei Omori, Yoichi Tomiura, Kenshi Hayashi, Statistical analysis for clustering of areas on the olfactory bulb and estimation of the physico-chemical properties detected by glomeruli in each area, ISOT 2016, 2016.06.
7.	Shinjiro Okaku, Yoichi Tomiura, Emi Ishita, Shosaku Tanaka, Towards Generating Multiple-Choice Tests for Supporting Extensive Reading, The Seventh International Conference on Mobile, Hybrid, and On-line Learning (eLmL 2015), 2015.02, We propose a method for generating a multiple-choice test, together with its answer, for an English text selected by a learner; the test is used by the learner to self-assess whether they comprehended the text after reading it. In our method, the system extracts several important sentences from the text, and replaces one word in each of these sentences with its synonym (if possible). One of these sentences is then selected as a correct optional sentence, while further changes to the polarities or nouns in the remaining sentences are carried out to generate distractor optional sentences for the multiple-choice test. Our method has potential to make extensive reading in English more effective.
8.	Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, An-Shou Cheng, A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values, ACM International Conference on Information and Knowledge Management (CIKM2014), 2014.12, This paper describes a probabilistic latent variable model that is designed to detect human values such as justice or freedom that a writer has sought to reflect or appeal to when participating in a public debate. The proposed model treats the words in a sentence as having been chosen based on specific values; values reflected by each sentence are then estimated by aggregating values associated with each word. The model can determine the human values for the word in light of the influence of the previous word. This design choice was motivated by syntactic structures such as noun+noun, adjective+noun, and verb+adjective. The classifier based on the model was evaluated on a test collection containing 102 manually annotated documents focusing on one contentious political issue --- Net neutrality, achieving the highest reported classification effectiveness for this task. We also compared our proposed classifier with a human second annotator. As a result, the proposed classifier effectiveness is statistically comparable with human annotators.
9.	Toshiaki Funatsu, Yoichi Tomiura, Emi Ishita, Kosuke Furusawa, Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation, eKNOW 2014 (Digital World 2014), 2014.03, Determining the topic of a document is necessary to understand the content of the document efficiently. Latent Dirichlet Allocation (LDA) is a method of analyzing topics. In LDA, a topic is treated as an unobservable variable to establish a probabilistic distribution of words. We can interpret the topic with a list of words that appear with high probability in the topic. This method works well when determining a topic included in many documents having a variety of contents. However, it is difficult to interpret the topic just using conventional LDA when determining the topic in a set of article abstracts found by a keyword search, because their contents are limited and similar. We propose a method to estimate representative words of each topic from an LDA result. Experimental results show that our method provides better information for interpreting a topic than LDA does.
10.	Analysis of Answering Method with Probability Conversion for Internet Research, Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA, Fifth IEEE Consumer Communications & Networking Conference (CCNC'08).
11.	Robust Language Identification for Similar Languages and short texts using Low-Frequent Byte Strings, Kensei Yukino, Shosaku Tanaka, Yoichi Tomiura and Hideki Matsumoto, Pacific Association for Computational Linguistics 2005 (Pacling 2005).
12.	Problems of FGREP Module and Their Solution, M. Motoki, Y. Tomiura, N. Takahashi, 3rd IEEE International Conference on Cognitive Informatics.
13.	A Method for Retrieving Translations of Collocation in Web Data, M. Shibata, Y. Tomiura, S. Tanaka, IJCNLP-04 Satellite Symposium.