Kyushu University Academic Staff Educational and Research Activities Database
Yoichi Tomiura Last modified date:2017.05.29



Graduate School
Undergraduate School
Other Organization


E-Mail
Phone
092-802-3584
Academic Degree
Dr. Eng.
Field of Specialization
Natural Language Processing
Outline Activities
He works in the field of Natural Language Processing.
His current research interests include disambiguation of the
syntactic structure of sentences, acquisition of knowledge
about word meanings from corpora, organization of documents
on the WWW, and the use of documents on the WWW
for language education.
Research
Research Interests
  • olfactory information processing using active pattern of glomeruli in the olfactory bulb
    keyword : sense of smell, primitives of olfactory information, clustering of glomeruli
    2014.10.
  • Extracting Latent Research Cluster using Institute Repository
    keyword : Author Topic Model, Topic Analysis, Collaboration, Research Administrator
    2013.04.
  • Automatically Generating Questions and Answers about Content of English Document Arbitrarily Selected by Learner Using NLP
    keyword : generation of questions and answers, extensive reading, learning support, NLP
    2012.04~2016.03.
  • Automatic Sentence-Level Annotation of Human Values
    keyword : human values, opinion sentence, SVM, latent variable, Gibbs Sampling
    2012.02.
  • Organization of Scientific Papers for Scientific Information Retrieval
    keyword : clustering, k-means, latent variable, statistic language model, distributional similarity, Gibbs Sampling
    2011.04.
  • Organization of Documents on WWW
    keyword : Document Clustering, Estimation of Topic, Estimation of Relation between Clusters, Latent Class, Stochastic Language Model, BIC
    2010.10~2012.03.
  • Analysis of Answering Method with Probability Conversion for Internet Research
    keyword : Internet Research, anonymity, Probability Conversion
    2007.07~2009.12.
  • Construction and publication of native/non-native English paper corpus gathered from Web and its application
    keyword : Web document, discernment of nativeness, learner's corpus, English language education, supporting system for writing in English, NLP
    2007.10~2012.03.
  • Providing Appropriate Alternative Co-occurrence Candidates; Towards a Japanese Composition Support System for Foreign Students
    keyword : similarity of occurring environments, natural cooccurrence, Japanese composition support system, natural language processing
    2008.06~2011.03.
  • Knowledge Acquisition about Meaning from Large Language Corpus
    keyword : semantic category, case-frame, causal relation, self-organization, statistical language model, NLP
    2003.08~2010.09.
  • Discernment of Nativeness of English Documents Based on Statistical Language Model
    keyword : Web documents, Discernment of Nativeness, Statistical Language Model, Statistical Hypothesis Testing, NLP
    2003.06~2009.09.
  • A Method for Automatically Generating Proper Responses to User's Utterances in Open-ended Conversation by Retrieving Documents on the Web
    keyword : dialogue, open-domain, cohesion, coherence
    2005.09~2009.08
    A knowledge-based dialog system gives correct answers but is unsuitable for open-ended input. Eliza, on the other hand, holds open-ended conversations but gives no new information to the user. We propose a new type of dialog system that lies between the two types described above: it converses about various topics and gives information related to the user's utterances. This type of dialog is useful for generating new ideas when a user has a vague desire for information about his or her interests but no concrete goal. Our system selects a sentence from a corpus as the response to a user's utterance. The most appropriate sentence is selected according to whether it satisfies surface cohesion and semantic coherence with the user's utterance. We built a trial system that converses about movies, and it showed that the proposed method is effective for finding appropriate responses to a user's open-ended utterances.
  • Estimating Satisfactoriness of Selectional Restriction from Corpus
    keyword : word cooccurrence, syntactic disambiguation, multiple regression model, natural language processing
    1999.06~2004.03
    A selectional restriction specifies what combinations of words are semantically valid in a particular syntactic construction. It is a basic and important piece of knowledge in Natural Language Processing and has been used for syntactic disambiguation and word sense disambiguation. When acquiring selectional restrictions for many word combinations from a corpus, it is necessary to estimate whether a word combination that is not observed in the corpus satisfies the selectional restriction. This study proposed a new method, based on a multiple regression model, for estimating the degree of satisfaction of the selectional restriction for a word combination from a tagged corpus. The independent variables of this model correspond to modifiers. Unlike in conventional multiple regression analysis, the independent variables are also parameters to be learned. We experimented on estimating the degree of satisfaction of the selectional restriction for Japanese word combinations. The experimental results indicate that our method estimates the degree of satisfaction of a word combination not observed in the corpus very well, and that syntactic disambiguation using the cooccurrencies estimated by our method is more accurate than disambiguation using cooccurrence probabilities smoothed by previous methods.
  • Placement of Nouns in a Multi-Dimensional Space Based on Words' Cooccurrency
    keyword : word vector, cooccurrency, example-based method, natural language processing
    2002.07~2003.10
    The semantic similarity (or distance) between words is a basic piece of knowledge in Natural Language Processing. Several previous studies have measured this similarity (or distance) using word vectors in a multi-dimensional space. In those studies, high-dimensional feature vectors of words are built from word cooccurrence in a corpus or from reference relations in a dictionary, and the word vectors are then calculated from the feature vectors by a method such as principal component analysis. This study proposed a new method for placing nouns in a multi-dimensional space based on word cooccurrence in a corpus. The proposed method does not use high-dimensional feature vectors of words, but is based on the idea that "vectors corresponding to nouns which cooccur with a word w in a relation f constitute a group in the multi-dimensional space". Although the word vectors obtained by the proposed method do not reflect the whole meaning of the nouns, the semantic similarity (or distance) between nouns defined with these word vectors is suitable for an example-based disambiguation method.
  • Robust Language Identification for Similar Languages and Short Texts
    keyword : Language Identification, Similar Language, WWW, Information Retrieval
    2004.08~2009.09
    Language identification estimates the language in which a document is written and is an important preprocessing technology for information retrieval and natural language processing. This research proposes a language identification method that uses low-frequency byte-strings as language features. A general method identifies the language of a document by choosing the language whose probability distribution of byte-strings is most similar to that of the document. Most previous methods, whose similarity measures are based on frequencies of byte-strings, never use the low-frequency byte-strings because their frequencies fluctuate. However, many low-frequency byte-strings tend to appear in one particular language and are therefore effective for language identification. A similarity measure that uses low-frequency byte-strings as well as frequent ones should be robust to fluctuation of the probabilities while still being sufficiently influenced by the low-frequency byte-strings. The similarity measure used in the proposed method is based on the size of the intersection of byte-strings between each language and a target document. Two kinds of preliminary examinations show that the proposed method is more accurate than previous methods and has an advantage in identifying similar languages and short target documents. We are now investigating revisions to the similarity measure and ways to reduce the number of language features.
  • Assisting with Translating Japanese Collocations Based on the Word Co-occurrence on the Web Texts
    keyword : translation, Web document, cooccurrency, Word Sense Disambiguation
    2003.08~2005.09
    A Method for Retrieving Translations of a Collocation in Web Data
    When a Japanese writer translates a Japanese collocation (where v^J is a Japanese verb, n^J is the object of v^J, and "WO" is the postpositional particle) into an English collocation, knowing that n^E is the translation of n^J, there are two ways to find the proper translation v^E: (1) looking up v^J in a Japanese-English dictionary and choosing the proper translation of v^J by referring to examples, or (2) looking up some candidates for v^J in English documents. The first way sometimes fails to yield the proper translation, and the second requires a lot of time and manual effort. If candidates for the proper v^E can be extracted from documents on the WWW together with example sentences, it becomes easier to find the proper translation. This study proposes a new method for retrieving the proper English expression corresponding to a Japanese collocation using web data.
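The language identification item above describes a similarity measure based on the size of the intersection of byte-strings between each language and a target document. A minimal Python sketch of that idea follows. The function names, the UTF-8 encoding, the byte-string lengths of 1 to 3, and the tiny training samples in the usage note are illustrative assumptions, not details taken from the research itself.

```python
from typing import Dict, Set


def byte_strings(text: str, n_min: int = 1, n_max: int = 3) -> Set[bytes]:
    """Return the set of all byte-strings of length n_min..n_max in text.

    Assumption: the text is encoded as UTF-8 and lengths 1-3 are used;
    the original method may use a different encoding or range.
    """
    data = text.encode("utf-8")
    return {
        data[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(data) - n + 1)
    }


def build_profiles(samples: Dict[str, str]) -> Dict[str, Set[bytes]]:
    """Build one byte-string set per language from training text.

    Because a set ignores counts, a byte-string seen once weighs as much
    as a frequent one, so low-frequency byte-strings still contribute.
    """
    return {lang: byte_strings(text) for lang, text in samples.items()}


def identify(document: str, profiles: Dict[str, Set[bytes]]) -> str:
    """Pick the language whose profile shares the most byte-strings
    with the document (intersection size as the similarity measure)."""
    doc = byte_strings(document)
    return max(profiles, key=lambda lang: len(doc & profiles[lang]))
```

With hypothetical profiles built from one English and one German sentence, `identify("the dog", profiles)` picks the English profile, since every byte-string of the query occurs in the English sample. Real use would need far larger training text per language.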
Current and Past Project
  • Research on supporting social science research through computer-based information science approaches such as NLP
Academic Activities
Papers
1. Shinjiro Okaku, Yoichi Tomiura, Emi Ishita, Shosaku Tanaka, Towards Generating Multiple-Choice Tests for Supporting Extensive Reading, Proc. the Seventh International Conference on Mobile, Hybrid, and On-line Learning (eLmL 2015), 2015.02.
2. Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, An-Shou Cheng, A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values, Proc. 23rd ACM International Conference on Information and Knowledge Management (CIKM 2014), 2014.12.
3. Toshiaki Funatsu, Yoichi Tomiura, Emi Ishita, Kosuke Furusawa, Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation, Proc. The Sixth International Conference on Information, Process, and Knowledge Management (eKNOW 2014), 2014.03.
4. Relationship between Errors and Corrections in Verb Selection: Basic Research for Composition Support,
Journal of Natural Language Processing, Vol.18, No.1, pp.3-29.
5. A System Providing Appropriate Alternative Candidates for Japanese Writing using Word Co-occurrence,
T. Nakano, Y. Tomiura, Jpn. J. Educ. Technol., 34(3), pp.181-189 (2010).
6. Teiko NAKANO, Yoichi TOMIURA, Providing Appropriate Alternative Co-occurrence Candidates; Towards a Japanese Composition Support System, Proc. of the Ninth IASTED International Conference on Web-Based Education, pp. 173--179, 2010.03.
7. Method for Selecting Appropriate Sentence from Documents on the WWW for the Open-ended Conversation Dialog System.
8. Masahiro Shibata, Tomomi Nishiguchi, Yoichi Tomiura, Dialog System for Open-ended Conversation Using Web Documents, Informatica, Vol.33, No.3, pp.277-284, 2009.10.
9. M. Shibata, Y. Tomiura, T. Mizuta, Identification among Similar Languages Using Statistical Hypothesis Testing, Proc. of Pacific Association for Computational Linguistics (PACLING'09), pp.47--52, 2009.09.
10. Analysis of Answering Method with Probability Conversion for Internet Research.
11. Discernment of Nativeness of English Documents Based on Statistical Hypothesis Testing,
Y. Tomiura, S. Aoki, M. Shibata, K. Yukino,
Journal of Natural Language Processing, Vol.16, No.1, pp.23-46 (2009).
12. Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA, Optimization of Answering Method with Probability Conversion, Proc. of 2008 International Symposium on Applications and the Internet (SAINT'08), pp.249-252, 2008.07.
13. Masahiro Shibata, Tomomi Nishiguchi, Yoichi Tomiura, A Method for Automatically Generating Proper Responses to User's Utterances in Open-ended Conversation by Retrieving Documents on the Web, Proc. of 2008 IEEE International Conference on Information Reuse and Integration (IEEE IRI'08), pp.268-279, 2008.07.
14. Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA, Analysis of Answering Method with Probability Conversion for Internet Research, Fifth IEEE Consumer Communications & Networking Conference (CCNC'08), pp.110-111, 2008.01.
15. Documents Discrimination between Native English Documents and Nonnative Ones Based on Language Identification Technique,
S. Aoki, Y. Tomiura, K. Yukino, R. Tanigawa,
Information Technology Letters, Vol.5, pp.85-88.
16. A Learning Method of a Layered Neural Network Whose Inputs and Outputs are Symbol Sequences,
M. Motoki, Y. Tomiura, N. Takahashi,
IPSJ Journal, Vol.47, No.8, pp.2279--2791.
17. Language Identification Using Low-frequent Byte-strings,
K. Yukino, S. Tanaka, Y. Tomiura, H. Matsumoto,
IPSJ Journal, Vol.47, No.4, pp.1287--1294.
18. Y. Tomiura, S. Tanaka, T. Hitaka, Estimating Satisfactoriness of Selectional Restriction from Corpus without Thesaurus, ACM Transactions on Asian Language Information Processing, Vol.4, No.4, pp.400--416, 2005.12.
19. K. YUKINO, S. TANAKA, Y. TOMIURA, H. MATSUMOTO, Robust Language Identification for Similar Languages and short texts using Low-Frequent Byte Strings, Pacific Association for Computational Linguistics 2005 (Pacling 2005), pp.368-373, 2005.08.
20. Estimation of Nativeness of Documents Based on Skew Divergence,
H. Fujii, Y. Tomiura, S. Tanaka,
Journal of Natural Language Processing, Vol. 12, No. 4, pp.79-96 (2005).
21. Assisting with Translating Collocations Based on the Word Co-occurrence on the Web Texts,
M. Shibata, Y. Tomiura, S. Tanaka,
IPSJ Journal, Vol.46, No.6, pp.1480-1491 (2005).
22. M. Motoki, Y. Tomiura, N. Takahashi, Problems of FGREP Module and Their Solution, 3rd IEEE International Conference on Cognitive Informatics (ICCI2004), pp.220-227, 2004.08.
23. Masahiro SHIBATA, Yoichi TOMIURA, Shosaku TANAKA, A Method for Retrieving Translations of Collocation in Web Data, Asian Symposium on Natural Language Processing to Overcome Language Barriers (in conjunction with IJCNLP-04), 2004.03.
24. Estimation of Words' Cooccurrency from Corpus,
Yoichi TOMIURA, Toru HITAKA,
IPSJ Journal, Vol.45, No.1, pp.324-332.
25. Placement of Nouns in a Multi-Dimensional Space Based on Words' Cooccurrency,
Yoichi TOMIURA, Shosaku TANAKA, Toru HITAKA,
Transactions of the Japanese Society for Artificial Intelligence, Vol.19, No.1A, pp.1-9.
26. TAKAHASHI Naoto, MOTOKI Minoru, SHIMAZU Yoshio, TOMIURA Yoichi, HITAKA Toru, PP-attachment Ambiguity Resolution Using a Neural Network with Modified FGREP Method, the 2nd Workshop on Natural Language Processing and Neural Networks (post-conference workshop of NLPRS2001), pp.1-7, 2001.11.
27. Context Free Grammar Expressing Dependency Constraint and its Application to Japanese Language,
Toshifumi TANABE, Yoichi TOMIURA, Toru HITAKA,
IPSJ Journal, Vol.41, No.1, pp.36-45.
28. A Parameter Estimation of PCFG Expressing Dependency Constraints on a Sparse Sample,
Yoichi TOMIURA, Toru HITAKA,
IPSJ Journal, Vol.40, No.11, pp.4055-4063.
29. Classification of Syntactic Categories of Nouns by the Scattering of Semantic Categories,
Shosaku TANAKA, Yoichi TOMIURA, Toru HITAKA,
IPSJ Journal, Vol.40, No.9, pp.3387-3396.
30. Semantic Structure of Japanese Noun Phrases "NP 'no' NP",
Yoichi TOMIURA, Teigo NAKAMURA, Toru HITAKA,
IPSJ Journal, Vol.36, No.6, pp.1441-1448 (1995).
31. Preference on Common Sense Reasoning and Application to Contextual Processing,
Yoichi TOMIURA, Natsuki ICHIMARU, Toru HITAKA,
IPSJ Journal, Vol.35, No.11, pp.2239 - 2248 (1994).
32. A New Data Structure for Searching Prefix Words: Prefix-Closed B-tree,
Yoichi TOMIURA, Teigo NAKAMURA, Toru Hitaka,
IPSJ Journal, Vol.35, No.5, pp.779-789 (1994).
33. Semantic Validity of Japanese Noun Phrases with Adnominal Particles,
Teigo NAKAMURA, Yoichi TOMIURA, Toru HITAKA,
in Proc. of PRICAI'92, Vol.1, No.2, pp.433-437 (1992).
34. Y. TOMIURA, T. NAKAMURA, T. HITAKA, S. YOSHIDA, Logical Form of Hierarchical Relation on Verbs and Extracting it from Definition Sentences in a Japanese Dictionary, Proc. of the International Conference on Computational Linguistics (Coling-92), Vol.2, No.14, pp.574-580, 1992.07.
35. Extracting the Superordinate-Subordinate Relation between Verbs from Definition Sentences in Japanese Dictionaries,
Yoichi TOMIURA, Toru HITAKA, Sho YOSHIDA,
IPSJ Journal, Vol.32, No.1, pp.42 - 49 (1991).
Presentations
1. Kohei Omor, Yoichi Tomiura, Kenshi Hayashi, Statistical analysis for clustering of areas on the olfactory bulb and estimation of the physico-chemical properties detected by glomeruli in each area, ISOT 2016, 2016.06.08.
2. Shinjiro Okaku, Yoichi Tomiura, Emi Ishita, Shosaku Tanaka, Towards Generating Multiple-Choice Tests for Supporting Extensive Reading, The Seventh International Conference on Mobile, Hybrid, and On-line Learning (eLmL 2015), 2015.02.23.
3. Yasuhiro Takayama, Yoichi Tomiura, Emi Ishita, Douglas W. Oard, Kenneth R. Fleischmann, An-Shou Cheng, A Word-Scale Probabilistic Latent Variable Model for Detecting Human Values, ACM International Conference on Information and Knowledge Management (CIKM2014), 2014.12.06.
4. Toshiaki Funatsu, Yoichi Tomiura, Emi Ishita, Kosuke Furusawa, Extracting Representative Words of a Topic Determined by Latent Dirichlet Allocation, eKNOW 2014 (Digital World 2014), 2014.03.26.
5. Analysis of Answering Method with Probability Conversion for Internet Research,
Atsushi TAGAMI, Chikara SAKAKI, Teruyuki HASEGAWA, Shigehiro ANO, Yoichi TOMIURA,
Fifth IEEE Consumer Communications & Networking Conference (CCNC'08).
6. Robust Language Identification for Similar Languages and short texts using Low-Frequent Byte Strings,
Kensei Yukino, Shosaku Tanaka, Yoichi Tomiura and Hideki Matsumoto,
Pacific Association for Computational Linguistics 2005 (Pacling 2005).
7. Problems of FGREP Module and Their Solution,
M. Motoki, Y. Tomiura, N. Takahashi,
3rd IEEE International Conference on Cognitive Informatics.
8. A Method for Retrieving Translations of Collocation in Web Data,
M. Shibata, Y. Tomiura, S. Tanaka,
IJCNLP-04 Satellite Symposium.
Educational