1. Takahiko Suzuki, Tsukasa Kamimasu, Tetsuya Nakatoh, Sachio Hirokawa, Identification of Unnatural Subsets in Statistical Data, ESKM 2018 (IIAI AAI 2018), 2018.07. Benford’s law is an observation about the frequency distribution of the first significant digits in natural numerical data. We can measure the unnaturalness of data by evaluating how far the frequency distribution of their leading digits departs from Benford’s distribution. However, this does not precisely identify which part of the data is unnatural. In this study, we focus on the fact that statistical data are generally provided in tabular form. We specify a subset of the target data by using the item names of the rows and columns that define each cell of the table, or words appearing in the table title. By measuring the degree of divergence of each subset from Benford’s distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.
2. When shopping online, the reviews attached to products are an important element in purchase decisions. However, some reviews, such as fake reviews, lack objectivity and simply praise the item. Many useless reviews also exist, and they hinder purchase decisions. We created filters to exclude such useless reviews and evaluated their performance. To create the filter, we evaluated the reviews of each product and extracted praise reviews from the distribution of reviews, focusing on products for which such reviews had been posted at or above a certain rate.
3. When learning a new word, a language learner faces two problems: how difficult the word itself is, and in what kinds of situations it is used. In a previous paper, we defined a quantitative measure of word ambiguity based on the static structure of WordNet and investigated the relationship between this static ambiguity and the difficulty of words. In this paper, we newly define the ‘dynamic’ ambiguity of words in sentences by using the output of a Word Sense Disambiguation (WSD) system. We analyze the relationship between the dynamic ambiguity of words and their difficulty, and compare the results with those of the previous paper. For dynamic ambiguity, the use of knowledge and learning affects its relationship to word difficulty.
4. The conceptual dictionary Japanese WordNet is a useful tool in natural language processing. However, it has been officially announced that the Japanese WordNet contains errors at a rate of 5%. In this paper, we classify the errors in the Japanese WordNet and discuss methods for detecting them automatically.
5. The lexical database Japanese WordNet is a useful tool in natural language processing. However, it has been officially announced that the Japanese WordNet contains errors at a rate of 5%. In this paper, we discuss methods for detecting errors in the Japanese WordNet.
6. The conceptual dictionary WordNet contains many verbs that have two or more meanings, i.e., that are ambiguous. This paper proposes a formulation of this ambiguity and analyzes whether there is any relationship between the degree of ambiguity of a verb and its difficulty. As an index of the difficulty of a word, the level of the Japanese-Language Proficiency Test (JLPT) is used. The paper also considers the possibility of disambiguating verbs.
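For the first abstract (identification of unnatural subsets), the following Python sketch shows how the leading-digit distribution of a subset of table cells can be compared with Benford's distribution, P(d) = log10(1 + 1/d), where a subset is selected by a keyword appearing in the table title or in a row or column item name. The abstract does not name the divergence measure; Kullback-Leibler divergence is used here purely as an assumption, and the names `Cell`, `kl_from_benford`, `subset_divergences` and the `min_size` cutoff are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Expected Benford share of leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Hypothetical cell layout: (table title, row item name, column item name, value).
Cell = Tuple[str, str, str, float]


def leading_digit(x: float) -> Optional[int]:
    """First significant digit of x, or None for zero / non-finite values."""
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{abs(x):.15e}"[0])


def kl_from_benford(values: Iterable[float]) -> Optional[float]:
    """KL divergence D(observed || Benford) of the leading-digit distribution.

    Assumption: KL is only one possible divergence measure.
    """
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        return None
    counts = Counter(digits)
    n = len(digits)
    # Digits that never occur contribute 0; Benford shares are never 0.
    return sum((c / n) * math.log((c / n) / BENFORD[d]) for d, c in counts.items())


def subset_divergences(cells: List[Cell], keywords: Iterable[str],
                       min_size: int = 30) -> Dict[str, Tuple[int, float]]:
    """For each keyword, select the cells whose title, row or column name
    contains it and score that subset by its divergence from Benford."""
    scores = {}
    for kw in keywords:
        subset = [v for title, row, col, v in cells
                  if kw in title or kw in row or kw in col]
        div = kl_from_benford(subset)
        # Skip subsets too small for the digit distribution to be meaningful.
        if len(subset) >= min_size and div is not None:
            scores[kw] = (len(subset), div)
    # Largest divergence first: the most "unnatural" candidate subsets.
    return dict(sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True))
```

Because small subsets can deviate from Benford's distribution by chance alone, some minimum subset size (here the hypothetical `min_size`) is needed before a high score is read as a sign of unnaturalness.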
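The second abstract (filtering useless reviews) leaves the filtering rule underspecified. One possible reading, sketched below purely as an assumption, is to compute for each product the share of reviews that give the top rating, then drop the top-rated reviews of products whose share reaches a threshold. The 1 to 5 star scale, the 0.8 threshold and the names `praise_share` and `filter_praise` are hypothetical, not the authors' settings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical review layout: (product id, star rating 1-5, review text).
Review = Tuple[str, int, str]


def praise_share(reviews: List[Review], top: int = 5) -> Dict[str, float]:
    """Share of each product's reviews that give a rating of at least `top`."""
    total: Dict[str, int] = defaultdict(int)
    praise: Dict[str, int] = defaultdict(int)
    for product, rating, _ in reviews:
        total[product] += 1
        if rating >= top:
            praise[product] += 1
    return {p: praise[p] / total[p] for p in total}


def filter_praise(reviews: List[Review], threshold: float = 0.8,
                  top: int = 5) -> List[Review]:
    """Drop top-rated reviews of products whose praise share is >= threshold."""
    shares = praise_share(reviews, top)
    flagged = {p for p, s in shares.items() if s >= threshold}
    # Keep every review except the praise reviews of flagged products.
    return [r for r in reviews if not (r[0] in flagged and r[1] >= top)]
```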
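The third abstract contrasts a static ambiguity derived from the structure of WordNet with a dynamic ambiguity derived from WSD output, but gives neither formula. As an illustration only, the sketch below assumes that static ambiguity is log2 of a word's number of WordNet senses and that dynamic ambiguity is the Shannon entropy, in bits, of the senses a WSD system assigned to the word's occurrences in a corpus. Both definitions, the function names and the sense labels are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def static_ambiguity(num_senses: int) -> float:
    """Hypothetical static measure: log2 of the number of WordNet senses."""
    return math.log2(num_senses) if num_senses > 0 else 0.0


def dynamic_ambiguity(assigned_senses: Iterable[str]) -> float:
    """Hypothetical dynamic measure: Shannon entropy (bits) of the senses a
    WSD system assigned to one word across its occurrences in a corpus."""
    counts = Counter(assigned_senses)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    # p * log2(1/p) summed over senses; a single observed sense gives 0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

Under these assumptions, the dynamic value never exceeds the static one as long as the WSD system only assigns senses listed in WordNet, because the entropy of a distribution over k outcomes is at most log2 k.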
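For the sixth abstract, one standard way to test for a relationship between a verb's degree of ambiguity and its JLPT level is a rank correlation, because JLPT levels are ordinal. The sketch below computes Spearman's rho, giving tied values their average rank; the abstract does not say which statistic the authors used, and encoding JLPT levels as integers is an assumption.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (sx * sy)
```

Here `xs` might hold a per-verb ambiguity score and `ys` the verb's JLPT level as an integer. Since many verbs share the same level, tie handling has a real effect on the result.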