Associate Professor / Research Institute for Information Technology, Section of Applied Data Science
|1.||Yuko Kusakari, Takahiko Suzuki, Sachio Hirokawa, Construction of over-praise removal filter for useful product review extraction, AROB 24th 2019, 389-392, 2019.01, When shopping on an e-commerce (EC) site, product reviews are important factors in purchase decisions. However, it is well known that product reviews include many useless reviews, including fake ones. It would be useful to filter out such useless reviews automatically. We propose a method for constructing a filter that removes ‘over-praise’ reviews, and we evaluate the filter's performance. Constructing the filter with supervised learning requires a certain number of positive and negative examples, but preparing such example sets is not a simple task. We introduce a simple method of constructing positive and negative examples of over-praise reviews, focusing on products for which the share of 5-star (maximum score) reviews exceeds a certain ratio. We use SVM with feature selection (SVM-FS) for supervised learning and examine the feature words selected by SVM-FS.|
|2.||Takahiko Suzuki, Tsukasa Kamimasu, Tetsuya Nakatoh, Sachio Hirokawa, Identification of Unnatural Subsets in Statistical Data, ESKM 2018 (IIAI AAI 2018), 2018.07, Benford’s law is an observation on the frequency distribution of first significant digits in natural numerical data. We can measure the unnaturalness of the data by evaluating the estrangement of the frequency distribution of leading digits of the data in relation to Benford’s distribution. However, we cannot identify the unnatural part of the data precisely. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the item names of the rows and columns that define each cell of the table, or words appearing in the table title. By measuring the degree of divergence of the subset from Benford’s distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.|
|3.||Kumiko Kanekawa, Takahiko Suzuki, Tetsuya Nakatoh, Sachio Hirokawa, Analyzing Researcher Stage with Last Authorship Ratio: Who is the last author of your paper?, Proc. of 6th International Congress on Advanced Applied Informatics (IIAI
|4.||Kumiko Kanekawa, Tetsuya Nakatoh, Takahiko Suzuki, and Sachio Hirokawa, Assessment of Doctoral Supervision of International Students, 2018.03.|
|5.||Kumiko Kanekawa, Takahiko Suzuki, Tetsuya Nakatoh, Sachio Hirokawa, Analysis of Scientific Citation Context, Proc AICS 2017, 2017.08.|
|6.||Takahiko Suzuki, Sachio Hirokawa, Nao Wariishi, Interactive Visualization System of Contexts, Proc. ICCTD 2017, 2017.02, Manyo-shu is the oldest Japanese waka anthology. It contains about 4,500 waka poems written between the 7th and 8th centuries. Kokin-waka-shu, on the other hand, is the first chokusen-waka-shu (a waka anthology recording waka selected on behalf of the Emperor). Kokin-waka-shu is said to have been edited in 905 and contains about 1,100 waka poems. There is much research on Manyo-shu and Kokin-waka-shu, and the differences between the two are often pointed out. In this paper, we focus on such differences and visualize the contexts in which seasonal words are used in Manyo-shu and Kokin-waka-shu. Using the visualized results, we could extract the relation between “cherry blossom” and “scatter” in Kokin-waka-shu.|
|7.||Takahiko Suzuki, Sachio Hirokawa, Nao Wariishi, Discrepancy between Similarity and Polarity of Words Used in Sales Reports, Proc. AROB 2016, 2017.01, Evaluating and improving the quality of sales activities are important issues in corporate development. Recent results on sales report analysis show that differences in business activities between salespeople with good sales results and the others can be identified by SVM (Support Vector Machine) using a limited number of words in the sales reports. We scrutinized the result and found that some pairs of words have reverse polarity even though they are synonyms in the Japanese WordNet. One such example is the pair ‘success’ and ‘achievement’. In this paper, we quantitatively evaluate the performance of SVM in distinguishing salespeople when the features are limited to such pairs of synonyms with reverse polarities. The result shows an accuracy of 0.8 when 100 synonym pairs with reverse polarity are used to distinguish salespeople with good sales results.|
|8.||Takahiko Suzuki, Sachio Hirokawa, Koki Miyata, Difficulty of Words and Their Ambiguity Estimated from the Result of Word Sense Disambiguation, Proc. KICSS 2016, 2016.10, When learning a new word in language learning, there are two problems. One is how difficult the word itself is. The other is in what kind of situation it is used. A previous study defined a quantitative ambiguity of words based on the structure of WordNet and then investigated the relationship between this ambiguity and the difficulty level of words. In this paper, we redefine the ambiguity of word occurrences in text by using the results of a word sense disambiguation technique. We analyze the relationship between the ambiguity of words and their difficulty level, and compare the results with those of the previous research. The use of knowledge and training data affects the relationship between the difficulty and ambiguity of words.|
|9.||Takahiko Suzuki, Sachio Hirokawa, Takuya Hirao, Nao Wariishi, Kyota Hashimoto, Evaluation of Integrity of WordNet by Combining Word Similarity and Random Forest, AROB 2016, 2016.01, The Japanese WordNet contains erroneous synonyms. We have been working on the detection of erroneous synonyms in the Japanese WordNet. A previous study showed that the combination of word2vec and a decision tree is promising for distinguishing synonyms and related words from unrelated words. In this paper, we introduce an error detection method based on the combination of word2vec, WordNet structure, and Random Forest. We discuss current results and possible improvements.|
|10.||Nao Wariishi, Takahiko Suzuki, Sachio Hirokawa, Shuichi Mitarai, Text mining of daily sales reports, AROB 2016, 2016.01, Improving salespeople's business activities is one of the most important issues for companies in any industry, because the improvement leads to corporate achievements. In many companies, salespeople record their activities in daily sales reports. For example, they record when they contacted their customers, what they talked about with them, the future outlook of the negotiation, and so on. In this paper, by applying machine learning, we extract the factors that identify the differences in business activities between salespeople with good sales results and the other salespeople. Furthermore, we evaluate the classification performance using the importance of each factor.|
|11.||Takahiko Suzuki, Sachio Hirokawa, Kyota Hashimoto, Yuusuke Yoshida, Correspondence of Clustering of Questions and Clustering of Answers, Proc. AROB2016, 2016.01.|
|12.||Takahiko Suzuki, Sachio Hirokawa, Nao Wariishi, Takuya Hirao, Vector Similarity of Related Words and Synonyms in the Japanese WordNet, Information Engineering Express, IIAI, 1, 4, 21, 2015.12, Word2vec is a tool that produces vector representations of words from a large amount of text data. In this study, we show that only a part of the vector space produced by word2vec is sufficient to represent the collective sense of a set of related words in the Japanese WordNet. Furthermore, we show that there is a subspace of the vector space that does not relate to the collective sense of related words and synonyms. We construct a compact decision tree by using the vectors to distinguish whether a given word belongs to the set of related words.|
|13.||Takahiko Suzuki, Sachio Hirokawa, Nao Wariishi, Takuya Hirao, Vector Similarity of Related Words in the Japanese WordNet, Proc ESKM 2015 IIAI, 21, 2015.08, Word2vec is a tool that produces vector representations of words from a large amount of text data. In this paper, we show that only a part of the vector space produced by word2vec is enough to represent the collective sense of a set of related words in the Japanese WordNet. Further, we show that there is a subspace of the vector space that does not relate to the collective sense. We construct a compact decision tree by using the vectors in order to distinguish whether a given word belongs to the set of related words.|
|14.||Takahiko Suzuki, Sachio Hirokawa, Nao Wariishi, Sentiment Analysis of Wine Aroma, Proc. ESKM2015, IIAI, 2015.08, The growth of the Internet has made it easy for us to send information. Such information includes reputation. Analyzing a large amount of reputation information and summarizing the analysis is very helpful for consumers deciding which goods to buy. It is also helpful for companies making marketing decisions. Much research tries to classify documents according to whether they have positive or negative sentiment. In this paper, we focus on more complex sentiment. We show a result of machine classification of wine reviews from the point of view of “aroma” using Support Vector Machine (SVM).|
|15.||Takahiko Suzuki, Sachio Hirokawa, Brendan Flanagan, Nao Wariishi, Predicting and Visualizing Wine Characteristics Through Analysis of Tasting Notes From Viewpoints, Proceedings of HCI2015, 2015.08.|
|16.||Takahiko Suzuki, Sachio Hirokawa, Takuya Hirao, Kohki Miyata, Detection Method for Misplacement of Synonyms in the Japanese WordNet, 15, 2, 26-35, 2014.12, The Japanese WordNet lexical database is a useful tool in natural language processing. However, it has been officially announced that the Japanese WordNet contains 5% errors. In this paper, we discuss error detection methods for the Japanese WordNet.|
|17.||Takahiko Suzuki, Sachio Hirokawa, Tetsuya Nakatoh, Hiroto Nakae, Discovery of Implicit Feature Words of Place, Proceedings of International Conference on Advanced Applied Informatics (AAI 2014), Kitakyushu, Japan, 2014.8.31-9.4., 2014.09, Individual opinions and experiences are published on the Web as CGM (consumer generated media). A tourism blog in which a tourist writes about experiences and impressions of a certain area is very helpful information for other tourists. However, a user cannot obtain such precious information without knowing the relation between blog articles and concrete place-names. We paid attention to the hierarchical structure of place-names. In this paper, we propose a method of connecting related words to a place-name that does not appear explicitly in a blog article, using the hierarchical structure of place-names. From 45,553 blog articles about the Karatsu area in Saga Prefecture, potential related words were extracted for 78 place-names of Saga Prefecture that did not appear in the blogs. Four subjects judged that meaningful related words were obtained for 80% or more of the place-names. However, the direct relationships between the place-names and the related words could not be guessed easily.|
|18.||Takahiko Suzuki, Sachio Hirokawa, Takuya Hirao, Kohki Miyata, Detection of Misplacement of Synonyms in the Japanese WordNet, Proc. of International Conference on Advanced Applied Informatics (AAI 2014), Kitakyushu, Japan, 2014.8.31-9.4., 2014.09, The Japanese WordNet lexical database is a useful tool in natural language processing. However, it has been officially announced that the Japanese WordNet contains 5% errors. In this paper, we classify errors in the Japanese WordNet and discuss error detection methods.|
|19.||Takahiko Suzuki, Kohki Miyata, Sachio Hirokawa, Difficulty and Ambiguity of Verbs - Analysis based on Synsets in Japanese WordNet -, IIAI AAI AIT 2013, 2013.12, When foreign students learn Japanese words, they encounter two types of problems. The first is the difficulty of the word itself. The second is related to the situation in which the word is used: the meaning of a word may differ as the situation changes, so it is necessary to understand the ambiguity of the word. In this paper, we propose three simple formulas representing the ambiguity of words based on the Synset structure in the Japanese WordNet. We then analyze the relationship between the ambiguity and the difficulty of Japanese words, particularly verbs. We use the vocabulary levels of the Japanese-Language Proficiency Test (old version) as the difficulty measure. The result shows that easy (not difficult) words are more ambiguous than difficult words.|
|20.||Takahiko Suzuki, Takeshi Michiwaki, Irregularities In Candidates Selection Of Multi-Aimed University Entrance Examinations, IIAI AAI LTLE 2013, 2013.09, In Japanese university entrance examinations, an examinee is sometimes allowed to aim at two or more departments in a single entrance examination opportunity. We call such an examination a ‘multi-aimed’ examination. We define a model of a ‘multi-aimed’ examination and list the requirements that should be satisfied when selecting successful candidates in multi-aimed examinations. We show that there are certain drawbacks in the traditional method of selection.|
|21.||Chengjiu Yin, Sachio Hirokawa, Brendan Flanagan, Takahiko Suzuki, Mistake discovery and generation of exercises automaticity in context, LTLE2012, 2012.09.|
|22.||Jun Zeng, Toshihiko Sakai, Chengjiu Yin, Takahiko Suzuki, Sachio Hirokawa, Automatic Generation of Tourism Quiz using Blogs, 7th International Symposium on Artificial Life and Robotics (AROB2012)