Researcher Database

Researcher Profile and Settings

Affiliation (Master)

  • Faculty of Information Science and Technology, Media and Network Technologies, Information Media Science and Technology

researchmap

Affiliation

  • Hokkaido University, Graduate School of Information Science and Technology, Associate Professor

Profile and Settings

  • Name

    Itoh Toshihiko
  • ID

    200901035152015271


Achievement

Research Interests

  • Dialogue control   Utterance intention   Spoken dialogue   Spoken dialogue systems   Dialogue rhythm   User satisfaction   Spoken language understanding   Utterance timing   Prosody   Embodiment   Video   Turn-taking   Collaborative completion   Pen input   Animation generation   Speech interfaces   Learning support systems   Name speech recognition   Teaching-material knowledge base   MULTEXT   Literature retrieval   Japanese language education   Facial expression   Prosodic corpora   Mobile information terminals   Context processing   Fundamental frequency   Form input   Hand   Personal-name speech recognition   Spoken language information processing   Speech Language Processing

Research Areas

  • Informatics / Intelligent robotics
  • Informatics / Perceptual information processing
  • Humanities & social sciences / Educational technology
  • Informatics / Intelligent informatics

Research Experience

  • 2007 - 2010 Hokkaido University, Graduate School of Information Science and Technology, Associate Professor
  • 1999 - 2002 Shizuoka University, Research Assistant

Education

  •        - 1999  Toyohashi University of Technology  Graduate School, Division of Engineering
  •        - 1996  Toyohashi University of Technology  Faculty of Engineering

Awards

  • 2018/12 IEICE, FY2018 Human Communication Award
     An automatic correction system for erroneously input text by PC note-takers
    Recipients: 平井康義; Toshihiko Itoh

Published Papers

  • Noriki Fujiwara, Toshihiko Itoh, Kenji Araki, Atsuhiko Kai, Tatsuhiro Konishi, Yukihiro Itoh
    Systems and Computers in Japan 38 (9) 21 - 31 0882-1666 2007/08 [Not refereed][Not invited]
     
    In the real environment, it is hard for a speech recognizer to avoid misrecognitions completely. When misrecognitions occur, the user's intentions are usually misunderstood by a conventional language understanding technique, which simply gives priority to the higher-rank hypotheses of a speech recognition result (N-best). The utterances in a dialogue are coherent, and the user's correct intentions may appear in the lower-rank hypotheses of the N-best. To understand the user's speech intentions in the real environment, we propose a language understanding technique that utilizes the dialogue context and a confidence measure, namely the word posterior probability. The experimental results show that the proposed technique is more effective (by about 15%) than the conventional technique. © 2007 Wiley Periodicals, Inc.
  • Noriki Fujiwara, Toshihiko Itoh, Kenji Araki
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS 4629 564 - 573 0302-9743 2007 [Refereed][Not invited]
     
    We consider that factors such as the prosody of a system's utterances and dialogue rhythm are important to attain natural human-machine dialogue. However, the relations between dialogue rhythm and a speaker's various states in task-oriented dialogue have not been revealed. In this study, we collected task-oriented dialogues and analyzed the relations between "dialogue structures, kinds of dialogue acts (contents of utterances), Aizuchi (backchannel/acknowledgment), Repeat and interjection" and "dialogue rhythm (response timing, F0, and speech rate)".
  • Shinya Yamada, Toshihiko Itoh, Kenji Araki
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 481 - 484 2006 [Not refereed][Not invited]
     
    This paper presents the characteristic differences of linguistic and acoustic features observed in different spoken dialogue situations and with different dialogue partners: human-human vs. human-machine interactions. It also presents the influence of user awareness on those characteristics. We compare the linguistic and acoustic features of the user's speech to a spoken dialogue system and to a human operator in several goal-setting and destination-database-searching tasks for a car navigation system. Because it is not clear whether different dialogue situations and different dialogue partners cause any differences in the linguistic or acoustic features of one's utterances in a speech interface system, we have performed experiments in several dialogue situations [4]. However, those experiments did not consider conditions such as voice quality or user awareness, such as impressions of the partner and prejudices against a system, and so we collected a set of spoken dialogues in new dialogue situations. To investigate the influence of voice quality, we also prepared recorded voices for the dialogue partners' responses and compared the influences of voice type (natural, synthetic, and recorded). We also had users answer a questionnaire before and after the experiments and investigated characteristic differences caused by user awareness. Additionally, in order to confirm the usefulness of the results of all experiments, we applied acoustic features of users' utterances and identified the utterances made to a system.
  • JA Xu, T Itoh, K Araki, K Tochinai
    IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2 103 - 108 2004 [Not refereed][Not invited]
     
    This paper describes a basic idea of how to realize an intelligent learning room system. Such a system needs dynamic adaptive capability for each user. We have proposed a method to predict user actions using Inductive Learning with N-gram. A system based on our proposed method is able to acquire rules automatically from data pairs through Inductive Learning; unified with N-gram, it demonstrates high predictive accuracy. Moreover, the acquired rules express the user's habits and preferences, so the system can adapt dynamically to each user. The user needs to proofread the errors in the prediction results, and the prediction ability therefore improves and the number of errors decreases. This paper unifies N-gram and Inductive Learning to develop the Point-Pass-Based Prediction system. The system was found to have good accuracy, with a highest prediction accuracy of about 89.3%, and high dynamic adaptive ability.
  • JA Xu, T Itoh, K Araki, K Tochinai
    2004 7TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS 1-3 1605 - 1609 2004 [Not refereed][Not invited]
     
    As society ages, an intelligent room is needed for the aged or handicapped. The key ingredient of such a system is how to predict the next action. In this paper we describe how to solve the problem of predicting inhabitant actions in an intelligent room that we call a learning room. We have proposed a method to predict user actions using Inductive Learning (IL) with N-gram. A system based on our proposed method is able to acquire the immanent causality rules automatically from data pairs by means of IL. Since our system unifies IL and N-gram, it demonstrates good accuracy on the simulated data. The system showed high dynamic adaptive capability.
  • K ARAKI, T ITOH
    IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES 29 (9) 911 - 916 0018-9480 1981 [Refereed][Not invited]

MISC

  • 平井康義, 伊藤敏彦  IEICE Technical Report  117-  (502(WIT2017 66-91))  2018
  • 平井康義, 伊藤敏彦  Proceedings of the Joint Convention of the Hokkaido Chapters of the Institutes of Electrical and Information Engineers (CD-ROM)  2017-  2017
  • 小川翼, 伊藤敏彦  JSAI SIG-SLUD Technical Report  68th-  2013
  • 伊藤敏彦, 小川翼  JSAI SIG-SLUD Technical Report  68th-  2013
  • Ema Junki, Wang Longbiao, Kai Atsuhiko, Itoh Toshihiko  Proceedings of the Society Conference of IEICE  2010-  (0)  166  -166  2010/08/31  [Not refereed][Not invited]
  • Keiko Katsuragawa, Takeshi Oono, Minoru Tomikashi, Satoru Kogure, Toshihiko Itoh, Tatsuhiro Konishi, Yukihiro Itoh  IPSJ Journal  50-  (1)  181  -192  2009/01/15  
    Recently, cell phones, PCs and car navigation systems are increasingly used to take advantage of a single software service. Although such a system typically offers different interfaces according to the users' environments, not every user is familiar with the operation of all the available devices; hence, users face difficulties in switching from one device to another. One of the biggest problems of a service accessible from multiple devices is that users must learn different operations on different interfaces. Needless to say, this is a heavy burden on the users, and it is desirable to alleviate the load. To solve this problem, we propose a mirror-effect-based mutual tutorial system to support learning operations on different interfaces. The basic functions of our tutorial system involve the following two procedures. 1) By focusing on a pair of a program and its input data for conducting a task, the system generates operation procedures to perform the same task on different interfaces. 2) By demonstrating the generated procedures, the system helps users learn operations on different interfaces. It is experimentally confirmed that the tutorial system improves usability: the task completion time is reduced by about 24% and the input acceptance rate is increased by about 17%.
  • ITOH TOSHIHIKO, KITAOKA NORIHIDE, NISHIMURA RYOTA  IEICE Technical Report  108-  (283(NLC2008 19-23))  7  -12  2008/11/03  [Not refereed][Not invited]
     
    In order to examine the validity of our previous findings on dialogue rhythm, we made dialogue samples with various rhythms and utterance timings and evaluated them subjectively from the viewpoints of naturalness of the whole dialogue, unnaturalness of the synthesized speech, and intelligibility of the dialogues. We used short task-oriented four-turn dialogues produced with a speech synthesizer in Experiment 1, and approximately one-minute chat-like dialogues using natural human utterances and synthesized voices in Experiment 2. The results of these experiments supported our previous analysis that utterance timing is important for natural dialogue and that the timing of each utterance mainly depends on its content.
  • ITOH TOSHIHIKO, KITAOKA NORIHIDE, NISHIMURA RYOTA  IPSJ SIG Technical Report  2008-  (68(SLP-72))  99  -104  2008/07/11  [Not refereed][Not invited]
  • NAKANO MIKIO, FUNAKOSHI KOTARO, ITOH TOSHIHIKO, ARAKI KENJI, HASEGAWA YUJI, TSUJINO HIROSHI  Proceedings of the Annual Conference of JSAI (CD-ROM)  22nd-  1H1-04  -4  2008  [Not refereed][Not invited]
  • FUJIWARA Noriki, ITOH Toshihiko, ARAKI Kenji  JSAI SIG-SLUD  50th-  45  -50  2007/07/23  [Not refereed][Not invited]
  • FUJIWARA Noriki, ITOH Toshihiko, ARAKI Kenji  IPSJ SIG Notes  2007-  (47)  37  -42  2007/05/24  [Not refereed][Not invited]
     
    We consider that factors such as the prosody of a system's utterances and dialogue rhythm are important to attain natural human-machine dialogue. However, the relations between dialogue rhythm and a speaker's various states in task-oriented dialogue have not been revealed. In this study, we collected task-oriented dialogues and analyzed the relations between "dialogue structures, kinds of dialogue acts (contents of utterances), Aizuchi (backchannel/acknowledgment), Repeat and interjection" and "dialogue rhythm (response timing, F0, and speech rate)". From the results, we found that dialogue rhythm is significantly affected by dialogue structures and dialogue acts, and moreover that utterances of Aizuchi and Repeat conform to restrictions that keep the dialogue rhythm.
  • Toshihiko Itoh, Shinya Yamada, Kenji Araki  Journal of the Acoustical Society of Japan  63-  (5)  251  -261  2007/05/01  [Not refereed][Not invited]
  • ONJI YUTA, KOGURE SATORU, ITOH TOSHIHIKO, KAI ATSUHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  JSAI SIG-SLUD  49th-  57  -62  2007/03/02  [Not refereed][Not invited]
  • IWASAKI YOSHINORI, KOGURE SATORU, ITOH TOSHIHIKO, KAI ATSUHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  IPSJ SIG Technical Report  2007-  (11(HI-122 SLP-65))  67  -72  2007/02/09  [Not refereed][Not invited]
     
    Recently, speech recognition and natural language processing technologies and computer performance have improved greatly, so speech interfaces can be used for in-car information services. However, in-car spoken dialogue systems, such as existing navigation systems, often misrecognize user utterances. In this paper, the system predicts frequently uttered words using contextual information and the system response, and raises the occurrence probabilities of those words so that the correct words appear in the recognition result more easily. In the evaluation experiment, the word recognition rate rose from 83.5% to 85.1% with the proposed method, showing its effectiveness.
  • Yuki Ikegaya, Yasuhiro Noguchi, Satoru Kogure, Toshihiko Itoh, Tatsuhiro Konishi, Makoto Kondo, Hideki Asoh, Akira Takagi, Yukihiro Itoh  Transactions of the Japanese Society for Artificial Intelligence  22-  (3)  291  -310  2007  [Not refereed][Not invited]
     
    This paper describes how to perform syntactic parsing and semantic analysis in a dialog system. The paper especially deals with how to disambiguate potentially ambiguous sentences using the contextual information. Although syntactic parsing and semantic analysis are often studied independently of each other, correct parsing of a sentence often requires the semantic information on the input and/or the contextual information prior to the input. Accordingly, we merge syntactic parsing with semantic analysis, which enables syntactic parsing taking advantage of the semantic content of an input and its context. One of the biggest problems of semantic analysis is how to interpret dependency structures. We employ a framework for semantic representations that circumvents the problem. Within the framework, the meaning of any predicate is converted into a semantic representation which only permits a single type of predicate: an identifying predicate "aru". The semantic representations are expressed as sets of "attribute-value" pairs, and those semantic representations are stored in the context information. Our system disambiguates syntactic/semantic ambiguities of inputs referring to the attribute-value pairs in the context information. We have experimentally confirmed the effectiveness of our approach; specifically, the experiment confirmed high accuracy of parsing and correctness of generated semantic representations.
  • Shigeta Yoshihiro, Ikegaya Yuki, Noguchi Yasuhiro, Kogure Satoru, Itoh Toshihiko, Konishi Tatsuhiro, Kondo Makoto, Itoh Yukihiro  Proceedings of the Forum on Information Technology (FIT)  5-  (2)  157  -158  2006/08/21  [Not refereed][Not invited]
  • FUJIWARA NORIKI, ITOH TOSHIHIKO, ARAKI KENJI, KAI ATSUHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  The IEICE Transactions on Information and Systems (Japanese Edition)  J89-D-  (7)  1493  -1503  2006/07/01  [Not refereed][Not invited]
     
    In real-environment use of spoken dialogue systems, it is difficult to avoid misrecognition. When misrecognition occurs, the system often responds in a way far from what the user expects, and the dialogue frequently stalls. In this study we propose a spoken language understanding method that can estimate the user's intention correctly, even when the speech recognizer misrecognizes, by using recognition confidence and the dialogue history. The method exploits three observations: even when the recognizer errs, the correct answer is usually contained in the multiple candidates (N-best); when the system misrecognizes, the user generally shows a correcting reaction; and task-oriented dialogue has strong coherence, so users basically utter only content that is semantically and contextually related. In addition, the proposed method holds all recognizable words in advance as understanding candidates, and the dialogue strategy of the language understanding component considers, among other factors, their semantic relatedness to the words in the recognition result. This makes it possible to estimate the correct intention from the recognition results of multiple user utterances even when part of the correct answer is not contained in the N-best. On the evaluation data, the proposed method achieved an understanding rate of 72.2% (21,430/29,670 dialogues) per dialogue and 87.1% (77,544/89,010 words) per word, which is effective compared with 57.9% (17,178/29,670 dialogues) and 75.4% (67,084/89,010 words) for a conventional system that simply prefers the top candidates of the latest recognition result.
  • YAMADA SHIN'YA, ITOH TOSHIHIKO, ARAKI KENJI  IPSJ SIG Technical Report  2006-  (40(SLP-61))  7  -12  2006/05/11  [Not refereed][Not invited]
     
    This paper presents the usefulness of identifying a user's utterances made to a spoken dialogue system, using machine learning on acoustic features of the user's utterances recorded in various situations. We have already performed dialogue experiments with two speakers (human-human or human-machine patterns) in several situations, and we newly performed experiments with three speakers (human-human-machine). The dialogue task simulates voice control of a car navigation system, where users perform goal setting or look up the goal in a destination database. We prepared a spoken dialogue system for all experiments and a human operator for the two-speaker experiment. Using the dialogue data obtained from the experiments, we identified the user's utterances made to the spoken dialogue system. Additionally, by comparison with utterances collected from different situations, we investigated the influence of various conditions on the performance of utterance identification.
  • SHOJI KEISUKE, TAKAHASHI MIKA, IBARA SEIYA, ITOH TOSHIHIKO, ARAKI KENJI  IPSJ SIG Technical Report  2006-  (40(SLP-61))  43  -48  2006/05/11  [Not refereed][Not invited]
     
    The best rhythm of a conversation between humans develops during the conversation, and users conscious of rhythm can be expected to converse more smoothly. In this paper, as one approach to copying human communication abilities as much as possible, we develop a spoken dialog system that emphasizes the rhythm of the dialog, aiming to improve user satisfaction by encouraging the user to utter naturally. The elements of dialog rhythm we focus on are speaking rate and the timing of utterances and backchanneling from the system. To realize such natural rhythm, we newly designed three modules: an Understanding Component that predicts the user's task intention in the middle of an utterance while performing language understanding by pause units; a Response Generator that generates responses considering rhythm and uses a user model; and a Rhythm Generator that performs speaker-change judgment, including backchanneling judgment, and rhythm synchronization in real time. These components are used to construct a task-oriented spoken dialog system.
  • TAKAGI HIROYOSHI, KOGURE SATORU, ITOH TOSHIHIKO, KAI ATSUHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  JSAI SIG-SLUD  46th-  33  -38  2006/03/03  [Not refereed][Not invited]
  • YAMADA Shinya, ITOH Toshihiko, ARAKI Kenji  IPSJ SIG Notes  2005-  (127)  67  -72  2005/12/21  [Not refereed][Not invited]
     
    This paper presents our analyses of human-human and human-machine interactions and the characteristic differences of linguistic and acoustic features observed in different spoken dialogue situations and with different dialogue partners. The linguistic and acoustic features of the user's speech to a spoken dialogue system and a human operator are compared in several goal setting and destination database searching tasks for a car navigation system. It is said that it is not clear enough whether different dialogue situations, different dialogue partners and different speech recognition rate ca...
  • SUZUKI Sadayuki, KOGURE Satoru, ITOH Toshihiko, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro  IPSJ SIG Notes  2005-  (127)  115  -120  2005/12/21  [Not refereed][Not invited]
     
    In this paper, we propose the technique for improving the N-best candidates accuracy in a spontaneous utterance by destination setting task with car navigation, combining the N-best candidates using a grammatical constraint at sentence and the word lattice using word spotting, to improve performance in the framework of speech understanding from the N-best candidates in early research. The system calculates the reliability of the word of each utterance by using the word lattice. We use the reliability to raise the word likelihood and to exchange the word of the N-best candidate by grammatica...
  • SUZUKI Sadayuki, KOGURE Satoru, ITOH Toshihiko, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro  IEICE technical report. Speech  105-  (496)  25  -30  2005/12/15  [Not refereed][Not invited]
     
    In this paper, we propose the technique for improving the N-best candidates accuracy in a spontaneous utterance by destination setting task with car navigation, combining the N-best candidates using a grammatical constraint at sentence and the word lattice using word spotting, to improve performance in the framework of speech understanding from the N-best candidates in early research. The system calculates the reliability of the word of each utterance by using the word lattice. We use the reliability to raise the word likelihood and to exchange the word of the N-best candidate by grammatica...
  • YAMADA Shinya, ITOH Toshihiko, ARAKI Kenji  IEICE technical report. Speech  105-  (495)  67  -72  2005/12/14  [Not refereed][Not invited]
     
    This paper presents our analyses of human-human and human-machine interactions and the characteristic differences of linguistic and acoustic features observed in different spoken dialogue situations and with different dialogue partners. The linguistic and acoustic features of the user's speech to a spoken dialogue system and a human operator are compared in several goal setting and destination database searching tasks for a car navigation system. It is said that it is not clear enough whether different dialogue situations, different dialogue partners and different speech recognition rate ca...
  • ITOH TOSHIHIKO, YAMADA SHIN'YA, ARAKI KENJI  IPSJ SIG Technical Report  2005-  (50(NL-167 SLP-56))  101  -106  2005/05/26  [Not refereed][Not invited]
     
    This paper presents the characteristic differences of linguistic and acoustic features observed in different spoken dialogue situations and with different dialogue partners: human-human vs. human-machine interactions. We compare the linguistic and acoustic features of the user's speech to a spoken dialogue system and to a human operator in several goal-setting and destination-database-searching tasks for a car navigation system. It has been pointed out that speech-based interaction has the potential to distract the driver's attention and degrade safety. On the other hand, it is not clear whether different dialogue situations and different dialogue partners cause any differences in the linguistic or acoustic features of one's utterances in a speech interface system. In addition, research on the influence of the speech recognition rate is also insufficient. We collected a set of spoken dialogues from 24 subject speakers for each experiment under several dialogue situations. For the car-driving situation, we prepared a virtual driving simulation system. We also prepared two patterns with two dialogue partners having different speech recognition rates (100% and about 80%). We analyzed the characteristic differences of user utterances caused by the different dialogue situations and the different dialogue partners in the two above-mentioned patterns.
  • XU Jin'an, ITOH Toshihiko, ARAKI Kenji  Human interface. The Transaction of Human Interface Society  7-  (1)  55  -67  2005/02/25  [Not refereed][Not invited]
  • MIZUNO SATOSHI, TAKAGI HIROYOSHI, KOGURE SATORU, KAI ATSUHIKO, ITOH TOSHIHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  IPSJ SIG Technical Report  2005-  (12(SLP-55))  77  -82  2005/02/04  [Not refereed][Not invited]
     
    Spoken dialogue interfaces and task-oriented dialogue systems have come into use as speech recognition, language understanding technologies, and computer performance have improved. More robust language understanding is needed for such systems to be used more generally. This paper deals with a speech intent presumption method that uses the confidence score of speech recognition and the dialogue history for robust meaning understanding. The language understanding results are generated using the speech recognition results (n-best) and the identification results; thus, the accuracy of category identification influences the language understanding accuracy. We therefore use presumption of the user's speech intention to improve the language understanding accuracy. The evaluation experiment shows that the language understanding performance of our proposed method is higher than that of a method which simply gives priority to the first hypothesis of an n-best.
  • XU J, 伊藤敏彦, 荒木健治  ヒューマンインタフェース学会論文誌  7-  (1)  55  -67  2005/02  [Not refereed][Not invited]
  • 薬袋直貴, 白鳥雄史, 伊藤敏彦, 小西達裕, 近藤真, 伊東幸宏  教育システム情報学会全国大会講演論文集  29th-  37  -38  2004/08/20  [Not refereed][Not invited]
  • Rzepka Rafal, Itoh Toshihiko, Araki Kenji  IPSJ SIG Notes  2004-  (73)  11  -18  2004/07/15  [Not refereed][Not invited]
     
    In this paper we introduce some ideas for reusing cognitive science concepts that were previously impossible to realize due to technical limits. We concentrate on Schankian scripts, which could help to build plans as the basic method for achieving goals. In contrast to the authors of the classic cognitivist ideas, we can currently use powerful computers and terabytes of data, which could help to make their concepts usable in unrestricted domains for any kind of application using commonsense knowledge. Many useful projects were abandoned because of difficulties due to the manual in...
  • 桐山伸也, 北沢茂良, 伊藤敏彦  日本音響学会研究発表会講演論文集  2004-  237  -238  2004/03/17  [Not refereed][Not invited]
  • 北沢茂良, 桐山伸也, 伊藤敏彦  日本音響学会研究発表会講演論文集  2004-  349  -350  2004/03/17  [Not refereed][Not invited]
  • 伊東幸宏, 小西達裕, 近藤真, 伊藤敏彦  静岡大学情報学研究  9-  119  -123  2004/03/10  [Not refereed][Not invited]
  • SUZUKI YUKIKO, IKEGAYA YUKI, NOGUCHI YASUHIRO, ITOH TOSHIHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO, TAKAGI AKIRA  人工知能学会言語・音声理解と対話処理研究会資料  40th-  73  -78  2004/03/05  [Not refereed][Not invited]
  • Shiraki Masayuki, Itoh Toshihiko, Kai Atsuhiko, Nakatani, Hiromasa  IPSJ SIG Notes  2004-  (15)  69  -74  2004/02/06  [Not refereed][Not invited]
     
    Recently, research on spoken dialog systems has become active with progress in speech recognition technology. However, it is difficult to extract user intentions correctly from natural utterances. Most of these difficulties are due to errors in the speech recognition results and the variety of linguistic phenomena found in natural utterances. We propose statistical methods to extract user intentions from natural utterances. By learning from examples, a set of rules robust to various linguistic phenomena can be acquired automatically. In this paper, an N-gram model, a vector space model, and a Support Vector Machine (SVM) are used for understanding user intentions. We conduct intention understanding experiments and evaluate the performance of these methods.
  • SHIRAKI MASAYUKI, ITOH TOSHIHIKO, KAI ATSUHIKO, NAKATANI HIROMASA  情報処理学会研究報告  2004-  (15(SLP-50))  69  -74  2004/02/06  [Not refereed][Not invited]
     
    Recently, research on spoken dialog systems has become active with progress in speech recognition technology. However, it is difficult to extract user intentions correctly from natural utterances. Most of these difficulties are due to errors in the speech recognition results and the variety of linguistic phenomena found in natural utterances. We propose statistical methods to extract user intentions from natural utterances. By learning from examples, a set of rules robust to various linguistic phenomena can be acquired automatically. In this paper, an N-gram model, a vector space model, and a Support Vector Machine (SVM) are used for understanding user intentions. We conduct intention understanding experiments and evaluate the performance of these methods.
  • IKEGAYA YUKI, NOGUCHI YASUHIRO, SUZUKI YUKIKO, ITOH TOSHIHIKO, KONISHI TATSUHIRO, KONDO MAKOTO, TAKAGI AKIRA, NAKASHIMA HIDEYUKI, ITO YUKIHIRO  人工知能学会全国大会論文集(CD-ROM)  18th-  3E2-10  2004  [Not refereed][Not invited]
  • Yuasa Hiroki, Mizuno Satoshi, Itoh Toshihiko, Kai Atsuhiko, Konishi Tatsuhiro, Itoh Yukihiro  IPSJ SIG Notes  2003-  (124)  199  -204  2003/12/18  [Not refereed][Not invited]
     
    This paper deals with the construction of a spoken dialogue system which interprets an input by using the situation/context. The system restricts input styles to "Operate an object" and "An attribute is a value" in order to achieve a higher recognition rate. The system further accepts more than one input in an utterance. We conducted an evaluation experiment with 20 subjects, which involved operating an air-conditioner and a stereo in a car. By analyzing the collected dialogues, the validity of language interpretation using the situation/context has been confirmed. In additi...
  • MORITA Hiroyasu, HAYASHI Michihiro, ITOH Toshihiko, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro, KATSURAGAWA Keiko, OONO Takeshi  IPSJ SIG Notes  2003-  (124)  205  -210  2003/12/18  [Not refereed][Not invited]
     
    When a human interface accepts voice input, the vocabulary and sentence styles to be used differ from those of other devices accepting voice input. The increase of such devices forces users to learn different input methods. In this paper, we propose a spoken language interface using a consistent input method which can be applied to every voice input device. We examine problems of voice input in car navigation systems and describe the technique for unifying sentence styles. We have implemented a system for destination search and conducted two experiments for evaluating the system.
  • YUASA HIROKI, MIZUNO SATOSHI, ITOH TOSHIHIKO, KAI ATSUHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  電子情報通信学会技術研究報告  103-  (517(NLC2003 50-90))  199  -204  2003/12/18  [Not refereed][Not invited]
  • MORITA HIROYASU, HAYASHI MICHIHIRO, ITOH TOSHIHIKO, KAI ATSUHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO, KATSURAGAWA KEIKO, ONO TAKESHI  電子情報通信学会技術研究報告  103-  (517(NLC2003 50-90))  205  -210  2003/12/18  [Not refereed][Not invited]
  • KATSURAGAWA KEIKO, YANAGI TAKURA, OONO KEN, WATANABE MASAKI, ITOH TOSHIHIKO, KONISHI TATSUHIRO, ITOH YUKIHIRO  Transactions of Information Processing Society of Japan  44-  (12)  2990  -3001  2003/12/15  
    In this paper, we propose a drive planning system that supports users in making a plan for a trip. We introduce a sub-system named DPS-PC which runs on a stand-alone PC. We believe that if a trip plan can be registered with an ITS system in advance, the services the ITS provides will be richer. DPS-PC has functions to help users decide several factors of a trip: multiple destinations and waypoints, arrival and departure times, the number of days the trip will take, and the route. The drive is planned interactively through a dialog with the system via a natural language interface. We discuss what conditions such a drive planning system should accept, describe the implementation of a prototype of DPS-PC, and present the results of an evaluation of its usefulness. We had 10 subjects each construct a drive plan for a two-day trip using DPS-PC. The system accepted 92.4% of the sentences the subjects input, and all subjects constructed their plans within 15 minutes. Moreover, we confirm that our natural language interface can accept various requirements in one sentence, while it takes multiple actions to specify such requirements using a conventional GUI.
  • KATSURAGAWA KEIKO, YANAGI TAKURA, OONO KEN, WATANABE MASAKI, ITOH TOSHIHIKO, KONISHI TATSUHIRO, ITOH YUKIHIRO  IPSJ Journal  44-  (12)  2990  -3001  2003/12/15  [Not refereed][Not invited]
     
    In this paper, we propose a drive planning system that supports users in making a plan for a trip. We introduce a sub-system named DPS-PC which runs on a stand-alone PC. We believe that if a trip plan can be registered with an ITS system in advance, the services the ITS provides will be richer. DPS-PC has functions to help users decide several factors of a trip: multiple destinations and waypoints, arrival and departure times, the number of days the trip will take, and the route. The drive is planned interactively through a dialog with the system via a natural language interface. We discuss what conditions such a drive planning system should accept, describe the implementation of a prototype of DPS-PC, and present the results of an evaluation of its usefulness. We had 10 subjects each construct a drive plan for a two-day trip using DPS-PC. The system accepted 92.4% of the sentences the subjects input, and all subjects constructed their plans within 15 minutes. Moreover, we confirm that our natural language interface can accept various requirements in one sentence, while it takes multiple actions to specify such requirements using a conventional GUI.
  • KATSURAGAWA KEIKO, YANAGI TAKURA, ONO KEN, WATANABE MASAKI, ITOH TOSHIHIKO, KONISHI TATSUHIRO, ITO YUKIHIRO  情報処理学会論文誌  44-  (12)  2990  -3001  2003/12/15  [Not refereed][Not invited]
  • Yuasa Hiroki, Mizuno Satoshi, Itoh Toshihiko, Kai Atsuhiko, Konishi Tatsuhiro, Itoh Yukihiro  IEICE technical report. Natural language understanding and models of communication  103-  (517)  199  -204  2003/12/11  [Not refereed][Not invited]
     
    This paper deals with the construction of a spoken dialogue system which interprets an input by using the situation/context. The system restricts input styles to "Operate an object" and "An attribute is a value" in order to achieve a higher recognition rate. The system further accepts more than one input in an utterance. We conducted an evaluation experiment with 20 subjects, which involved operating an air-conditioner and a stereo in a car. By analyzing the collected dialogues, the validity of language interpretation using the situation/context has been confirmed. In additi...
  • MORITA Hiroyasu, HAYASHI Michihiro, ITOH Toshihiko, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro, KATSURAGAWA Keiko, OONO Takeshi  IEICE technical report. Natural language understanding and models of communication  103-  (517)  205  -210  2003/12/11  [Not refereed][Not invited]
     
    When a human interface accepts voice input, the vocabulary and sentence styles to be used differ from those of other devices accepting voice input. The increase of such devices forces users to learn different input methods. In this paper, we propose a spoken language interface using a consistent input method which can be applied to every voice input device. We examine problems of voice input in car navigation systems and describe the technique for unifying sentence styles. We have implemented a system for destination search and conducted two experiments for evaluating the system.
  • Yuasa Hiroki, Mizuno Satoshi, Itoh Toshihiko, Kai Atsuhiko, Konishi Tatsuhiro, Itoh Yukihiro  IEICE technical report. Speech  103-  (519)  199  -204  2003/12/11  [Not refereed][Not invited]
     
    This paper deals with the construction of a spoken dialogue system which interprets an input by using the situation/context. The system restricts input styles to "Operate an object" and "An attribute is a value" in order to achieve a higher recognition rate. The system further accepts more than one input in an utterance. We conducted an evaluation experiment with 20 subjects, which involved operating an air-conditioner and a stereo in a car. By analyzing the collected dialogues, the validity of language interpretation using the situation/context has been confirmed. In additi...
  • MORITA Hiroyasu, HAYASHI Michihiro, ITOH Toshihiko, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro, KATSURAGAWA Keiko, OONO Takeshi  IEICE technical report. Speech  103-  (519)  205  -210  2003/12/11  [Not refereed][Not invited]
     
    When a human interface accepts voice input, the vocabulary and sentence styles to be used differ from those of other devices accepting voice input. The increase of such devices forces users to learn different input methods. In this paper, we propose a spoken language interface using a consistent input method which can be applied to every voice input device. We examine problems of voice input in car navigation systems and describe the technique for unifying sentence styles. We have implemented a system for destination search and conducted two experiments for evaluating the system.
  • 田中勝, 伊藤敏彦, 竹内一雅, 七海憲  ネットワークポリマー講演討論会講演要旨集  53rd-  89  -92  2003/10/23  [Not refereed][Not invited]
  • 竹内一雅, 田中勝, 伊藤敏彦, 七海憲  ネットワークポリマー講演討論会講演要旨集  53rd-  85  -88  2003/10/23  [Not refereed][Not invited]
  • KIRIYAMA SHIN'YA, MITSUTA YOSHIFUMI, HOSOKAWA YUTA, ITOH TOSHIHIKO, KITAZAWA SHIGEYOSHI  電子情報通信学会技術研究報告  103-  (332(SP2003 94-102))  35  -40  2003/09/30  [Not refereed][Not invited]
     
    We have developed methods to generate prosodic labels automatically, utilizing linguistic information. Large-scale prosodic databases have been strongly desired for years; however, the construction of such databases depends on hand labeling because of the diversity of prosody. Our purpose is the development of "a prosodic labeling support system." We aim not at automating the whole labeling process, but at making the hand labeling work more efficient by providing labelers with appropriate support information. Methods for auto-generating initial phoneme and prosodic labels utilizing linguistic information are proposed and evaluated. The experimental results showed that more than 70% of J-ToBI labels were correctly generated, proving the efficiency of the proposed methods. The results also enabled us to study how to generate support information based on the tendency of timing errors of the phoneme labels for each phoneme, and on the possibility of plural candidates of accentual phrase boundaries for J-ToBI labels.
  • KIRIYAMA Shinya, MITSUTA Yoshifumi, HOSOKAWA Yuta, ITOH Toshihiko, KITAZAWA Shigeyoshi  Technical report of IEICE. DSP  103-  (330)  35  -40  2003/09/23  [Not refereed][Not invited]
     
    We have developed methods to generate prosodic labels automatically, utilizing linguistic information. Large-scale prosodic databases have been strongly desired for years; however, the construction of such databases depends on hand labeling because of the diversity of prosody. Our purpose is the development of "a prosodic labeling support system." We aim not at automating the whole labeling process, but at making the hand labeling work more efficient by providing labelers with appropriate support information. Methods for auto-generating initial phoneme and prosodic labels utilizing linguistic information are proposed and evaluated. The experimental results showed that more than 70% of J-ToBI labels were correctly generated, proving the efficiency of the proposed methods. The results also enabled us to study how to generate support information based on the tendency of timing errors of the phoneme labels for each phoneme, and on the possibility of plural candidates of accentual phrase boundaries for J-ToBI labels.
  • KIRIYAMA Shinya, MITSUTA Yoshifumi, HOSOKAWA Yuta, ITOH Toshihiko, KITAZAWA Shigeyoshi  IEICE technical report. Speech  103-  (332)  35  -40  2003/09/23  [Not refereed][Not invited]
     
    We have developed methods to generate prosodic labels automatically, utilizing linguistic information. Large-scale prosodic databases have been strongly desired for years; however, the construction of such databases depends on hand labeling because of the diversity of prosody. Our purpose is the development of "a prosodic labeling support system." We aim not at automating the whole labeling process, but at making the hand labeling work more efficient by providing labelers with appropriate support information. Methods for auto-generating initial phoneme and prosodic labels utilizing linguistic information are proposed and evaluated. The experimental results showed that more than 70% of J-ToBI labels were correctly generated, proving the efficiency of the proposed methods. The results also enabled us to study how to generate support information based on the tendency of timing errors of the phoneme labels for each phoneme, and on the possibility of plural candidates of accentual phrase boundaries for J-ToBI labels.
  • 三ツ田佳史, 桐山伸也, 北沢茂良, 伊藤敏彦  日本音響学会研究発表会講演論文集  2003-  363  -364  2003/09/17  [Not refereed][Not invited]
  • 伊藤佳世, 桐山伸也, 北沢茂良, 伊藤敏彦, 北村達也  日本音響学会研究発表会講演論文集  2003-  361  -362  2003/09/17  [Not refereed][Not invited]
  • はつ川友宏, 伊藤敏彦, 坂根裕, 新谷誠, 小西達裕, 伊東幸宏  教育システム情報学会全国大会講演論文集  28th-  141  -142  2003/08/30  [Not refereed][Not invited]
  • 白鳥雄史, 伊藤敏彦, 小西達裕, 近藤真, 伊東幸宏  教育システム情報学会全国大会講演論文集  28th-  33  -34  2003/08/30  [Not refereed][Not invited]
  • NOGUCHI YASUHIRO, IKEGAYA YUKI, SUZUKI YUKIKO, ITOH TOSHIHIKO, KONISHI TATSUHIRO, KONDO MAKOTO, TAKAGI AKIRA, NAKASHIMA HIDEYUKI, ITO YUKIHIRO  人工知能学会全国大会論文集  17th-  (Pt.1)  1C1.05,1-4  2003/06/23  [Not refereed][Not invited]
  • IKEGAYA YUKI, NOGUCHI YASUHIRO, SUZUKI YUKIKO, ITOH TOSHIHIKO, KONISHI TATSUHIRO, KONDO MAKOTO, TAKAGI AKIRA, NAKASHIMA HIDEYUKI, ITO YUKIHIRO  人工知能学会全国大会論文集  17th-  (Pt.2)  3B1.05,1-4  -4  2003/06/23  [Not refereed][Not invited]
  • MITSUTA Yoshifumi, KIRIYAMA Shinya, KITAZAWA Shigeyoshi, ITOH Toshihiko  日本音響学会研究発表会講演論文集  2003-  (1)  379  -380  2003/03/18  [Not refereed][Not invited]
  • KIRIYAMA Shinya, ITOH Toshihiko, KITAZAWA Shigeyoshi  日本音響学会研究発表会講演論文集  2003-  (1)  381  -382  2003/03/18  [Not refereed][Not invited]
  • MOCHIZUKI Kazuya, KIRIYAMA Shinya, ITOH Toshihiko, KITAZAWA Shigeyoshi  日本音響学会研究発表会講演論文集  2003-  (1)  383  -384  2003/03/18  [Not refereed][Not invited]
  • 桐山伸也, 伊藤敏彦, 北沢茂良  日本音響学会研究発表会講演論文集  2003-  381  -382  2003/03/18  [Not refereed][Not invited]
  • MIZUTANI Makoto, ITOH Toshihiko, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro  IPSJ SIG Notes  2003-  (14)  113  -118  2003/02/07  [Not refereed][Not invited]
     
    Although the car-navigation system attracts attention as one of the spoken dialogue interfaces, a dialogue will not progress smoothly when misrecognition occurs under the influence of natural speech and running noise, and the user will feel displeasure. Thus, this research aims at the construction of a dialogue system which can achieve a smooth dialogue and a high degree of user satisfaction by performing language understanding and response generation using a confidence measure (CM) based on a continuous speech recognizer (CSR) and the dialogue history. This paper shows the spoken language unders...
  • Itoh Yukihiro, Konishi Tatsuhiro, Kondo Makoto, Itoh Toshihiko  Studies in information, Shizuoka University  9-  (0)  119  -123  2003  [Not refereed][Not invited]
  • 小暮 悟, 伊藤 敏彦, 中川 聖一  言語・音声理解と対話処理研究会  36-  (0)  71  -76  2002/11/07  [Not refereed][Not invited]
  • Kogure Satoru, Itoh Toshihiko, Nakagawa Seiichi  情報科学技術フォーラム一般講演論文集  2002-  (3)  467  -468  2002/09/13  [Not refereed][Not invited]
  • ITO Toshihiko, KAI Atsuhiko, IWAMOTO Yoshiyuki, MIZUTANI Makoto, YUASA Hiroki, KONISHI Tatsuhiro, ITOH Yukihiro  Transactions of Information Processing Society of Japan  43-  (7)  2118  -2129  2002/07/15  
    This paper presents the characteristic differences of acoustic and linguistic features observed for different spoken dialogue situations in human-human and human-machine interactions. We compare the acoustic and linguistic features of the user's dialogue speech both for a spoken dialogue system and an actual human-operator service in several landmark-setting tasks for a car navigation system. It is known that speech-based interaction has the potential to distract drivers and degrade safety. On the other hand, it is not clear whether a different dialogue situation causes acoustic or linguistic differences in users' utterances in a speech interface system. We collected a set of spoken dialogue data from 10 subject speakers under several dialogue situations. For the car-driving condition, we prepared a virtual driving simulation system. We analyzed the characteristic differences of user utterances caused by different dialogue situations or by system understanding errors. As a result, we observed that the existence of a car-driving task affects some prosodic features, and that the difference between human-machine and human-human dialogue conditions affects other acoustic and linguistic features, while no significant differences are observed for the remaining acoustic and linguistic features whether or not a car-driving task was performed.
  • ITOH TOSHIHIKO, KAI ATSUHIKO, IWAMOTO YOSHIYUKI, MIZUTANI MAKOTO, YUASA HIROKI, KONISHI TATSUHIRO, ITOH YUKIHIRO  IPSJ Journal  43-  (7)  2118  -2129  2002/07/15  [Not refereed][Not invited]
     
    This paper presents the characteristic differences of acoustic and linguistic features observed for different spoken dialogue situations in human-human and human-machine interactions. We compare the acoustic and linguistic features of the user's dialogue speech both for a spoken dialogue system and an actual human-operator service in several landmark-setting tasks for a car navigation system. It is known that speech-based interaction has the potential to distract drivers and degrade safety. On the other hand, it is not clear whether a different dialogue situation causes acoustic or linguistic differences in users' utterances in a speech interface system. We collected a set of spoken dialogue data from 10 subject speakers under several dialogue situations. For the car-driving condition, we prepared a virtual driving simulation system. We analyzed the characteristic differences of user utterances caused by different dialogue situations or by system understanding errors. As a result, we observed that the existence of a car-driving task affects some prosodic features, and that the difference between human-machine and human-human dialogue conditions affects other acoustic and linguistic features, while no significant differences are observed for the remaining acoustic and linguistic features whether or not a car-driving task was performed.
  • Iwamoto Yoshiyuki, Itoh Toshihiko, Kai Atsuhiko, Konishi Tatsuhiro, Itoh Yukihiro  IPSJ SIG Notes  2002-  (50)  61  -67  2002/05/24  [Not refereed][Not invited]
     
    We investigated the characteristic changes of utterances under different dialogue situations: using a machine's voice interface versus talking with a human, and talking while driving versus while not driving. The result of a statistical analysis revealed that a driving task does not affect the linguistic features of utterances, which differed from our assumption. Since this result may be due to a relatively low cognitive load in the driving task, we conducted a dialogue experiment under the situation of a concurrent driving task with different difficul...
  • 岩本 善行, 伊藤 敏彦, 甲斐 充彦, 小西 達裕, 伊東 幸宏  情報処理学会研究報告. 自然言語処理研究会報告  2002-  (44)  125  -131  2002/05/23  [Not refereed][Not invited]
     
    To investigate how the characteristics of utterances change with the dialogue situation when using a speech input interface (whether the dialogue partner is a human or a machine, and whether the user is driving or stopped), we collected and analyzed dialogues. Statistical analysis of the transcriptions and of the linguistic and acoustic features showed that the presence or absence of driving did not affect the linguistic features of utterances, contrary to our hypothesis. However, since this may have been caused by the driving task being too easy, we analyzed the linguistic and acoustic features of utterances while varying the cognitive load required for the driving operations. As a result, it became clear that the linguistic features of utterances were hardly affected by the driving task, while the acoustic features were slightly affected.
  • Itoh Yukihiro, Konishi Tatsuhiro, Itoh Toshihiko, Katsuragawa Keiko  Journal of Japanese Society for Artificial Intelligence  17-  (3)  285  -290  2002/05/01  [Not refereed][Not invited]
  • KITAMURA Tatsuya, ITOH Kayo, ITOH Toshihiko, KITAZAWA Shigeyoshi  IEICE technical report. Speech  102-  (35)  61  -66  2002/04/19  [Not refereed][Not invited]
     
    This paper studies the influence of prosodic features, context, and word order on the identification of focused clauses in Japanese dialogue, using a psychoacoustic experiment. In the experiment, question and answer speech was used as stimuli. The questions served to create two different contexts in the stimuli, and the answers had focal prominence at different clauses and different word orders. The experimental results indicate that (1) prosodic characteristics are more significant for focus identification, (2) context has some effect on identification, and (3) it is probable that the wo...
  • KITAMURA Tatsuya, ITOH Kayo, ITOH Toshihiko, KITAZAWA Shigeyoshi  Technical report of IEICE. EA  102-  (33)  61  -66  2002/04/19  [Not refereed][Not invited]
     
    This paper studies the influence of prosodic features, context, and word order on the identification of focused clauses in Japanese dialogue, using a psychoacoustic experiment. In the experiment, question and answer speech was used as stimuli. The questions served to create two different contexts in the stimuli, and the answers had focal prominence at different clauses and different word orders. The experimental results indicate that (1) prosodic characteristics are more significant for focus identification, (2) context has some effect on identification, and (3) it is probable that the wo...
  • MOCHIZUKI Kazuya, KITAZAWA Shigeyoshi, KITAMURA Tatsuya, ITOH Toshihiko  日本音響学会研究発表会講演論文集  2002-  (1)  369  -370  2002/03/18  [Not refereed][Not invited]
  • 成瀬 聡, 鈴木 正浩, 伊藤 敏彦  知的教育システム研究会  34-  (0)  99  -104  2002/03/02  [Not refereed][Not invited]
  • KOGURE Satoru, ITOH Toshihiko, NAKAGAWA Seiichi  情報処理学会研究報告. HI, ヒューマンインタフェース研究会報告  2002-  (10)  139  -144  2002/02/01  [Not refereed][Not invited]
     
    Recently, the technology of speech recognition and language processing for spoken dialogue systems has improved, and speech recognition systems and dialogue systems have been developed to the point of practical use. To become more practical, not only these fundamental techniques but also techniques for portability and expansibility should be developed. We have already presented the portability of spoken dialogue systems: in our past research, we demonstrated the portability of the speech recognition module and the interpreter. In this paper, we focus on the portability of the dialogue m...
  • KOGURE Satoru, ITOH Toshihiko, NAKAGAWA Seiichi  IPSJ SIG Notes  2002-  (10)  139  -144  2002/02/01  [Not refereed][Not invited]
     
    Recently, the technology of speech recognition and language processing for spoken dialogue systems has improved, and speech recognition systems and dialogue systems have been developed to the point of practical use. To become more practical, not only these fundamental techniques but also techniques for portability and expansibility should be developed. We have already presented the portability of spoken dialogue systems: in our past research, we demonstrated the portability of the speech recognition module and the interpreter. In this paper, we focus on the portability of the dialogue m...
  • KITAMURA Tatsuya, ITOH Toshihiko, MOCHIZUKI Kazuya, KITAZAWA Shigeyoshi  IEICE technical report. Speech  101-  (603)  23  -30  2002/01/17  [Not refereed][Not invited]
     
    A very detailed segmentation of prosodic phrases has been carried out in order to construct a Japanese prosodic database. The database, referred to here as "Japanese Multext", contains read-style speech and spontaneous-style speech by three male speakers and three female speakers of the Tokyo dialect. The "prosodic phrase", which we introduced as the unit of segmentation, was defined and regarded as a unit of spoken language perception. For exact segmentation, the wide-band spectrum, the narrow-band spectrum, the fine speech wave and fundamental frequency shapes, and the transition of amplitude of the higher...
  • KAI Atsuhiko, ISHIMARU Akiko, ITOH Toshihiko, KONISHI Tatsuhiro, ITOH Yukihiro  日本音響学会研究発表会講演論文集  2001-  (2)  63  -64  2001/10/01  [Not refereed][Not invited]
  • ITOH Toshihiko, IWAMOTO Yoshiyuki, MIZUTANI Makoto, YUASA Hiroki, KAI Atsuhiko, KONISHI Tatsuhiro, ITOH Yukihiro  日本音響学会研究発表会講演論文集  2001-  (2)  65  -66  2001/10/01  [Not refereed][Not invited]
  • KITAZAWA Shigeyoshi, KITAMURA Tatsuya, MOCHIZUKI Kazuya, ITOH Toshihiko  日本音響学会研究発表会講演論文集  2001-  (2)  227  -228  2001/10/01  [Not refereed][Not invited]
  • 桂川 景子, 丹羽 教泰, 柳 拓良, 渡部 眞幸, 伊藤 敏彦, 小西 達裕, 伊東 幸宏  情報処理学会研究報告. MBL, [モバイルコンピューティングとワイヤレス通信]  2001-  (83)  229  -236  2001/09/06  [Not refereed][Not invited]
     
    In this paper, we propose a drive planning system that supports the creation of travel plans for trips and drives made by car. The system extends the destination-setting function of a car navigation system, allowing users to set multiple destinations together with the associated arrival and departure times, the number of days, the route, and so on. We report a method for setting these multiple parameters through natural language dialogue. In particular, we examine the language analysis component in detail and introduce an implemented prototype system. We also evaluate the prototype and clarify its usefulness and remaining problems.
  • Katsuragawa Keiko, Niwa Michihiro, Yanagi Takura, Watanabe Masaki, Itoh Toshihiko, Konishi Tatsuhiro, Itoh Yukihiro  情報処理学会研究報告. ITS, [高度交通システム]  2001-  (83)  229  -236  2001/09/06  [Not refereed][Not invited]
     
    In this paper, we propose a drive planning system that supports users in making a plan for a trip. This system has the function to help users decide several factors of a trip: multiple destinations and waypoints, arrival and departure times, the number of days that the trip will take and the route. It also proposes taking a rest on a long distance trip in order to ensure safe driving. The drive is planned interactively by a dialog with the system through a natural language interface. We propose a method to construct such a drive planning system, describe the implementation of a prototype di...
  • Niwa Michihiro, Akiyama Taizou, Yanagi Takura, Watanabe Masaki, Itoh Toshihiko, Konishi Tatsuhiro, Itoh Yukihiro  IPSJ SIG Notes  2000-  (101)  55  -60  2000/10/27  [Not refereed][Not invited]
     
    In this paper, we describe a natural language interface for a Drive Planning System which supports drivers in making a plan for a trip. The system enables us to make a trip plan interactively using natural language. We propose the following methods: a parsing technique for restricted sentence patterns in a specific domain, a method for semantic analysis integrated into the parsing process, and a method for contextual analysis identifying the references of pronouns and omitted words. We implemented a prototype dialogue system for planning a trip by car and evaluated the system.
  • Itoh Toshihiko, Minematsu Nobuaki, Nakagawa Seiichi  The Journal of the Acoustical Society of Japan  55-  (5)  333  -342  1999/05/01  [Not refereed][Not invited]
     
    This study focuses on the filled pauses found in monologues and dialogues, and examines through listening experiments what function filled pauses in an utterance have for the listener, and whether filled pauses are effective or necessary when generating cooperative system responses. The experiments yielded several findings about filled pauses. Based on these findings, we devised the insertion of filled pauses into system response speech, aiming at more natural system responses and at reducing the unnaturalness caused by the silence that inevitably arises during information retrieval and response generation. We then conducted an evaluation experiment on system responses with inserted filled pauses, using a WOZ (Wizard of Oz) spoken dialogue system. The results showed that filled pauses are useful for securing time to generate response sentences, for holding the turn, and as a sign that the system is running, confirming the effect of filled-pause insertion.
  • Araki Masahiro, Itoh Toshihiko, Kumagai Tomoko, Ishizaki Masato  Journal of Japanese Society for Artificial Intelligence  14-  (2)  251  -260  1999/03/01  [Not refereed][Not invited]
     
    In this paper, we propose a standard utterance-unit tagging scheme, which has been developed by the discourse tagging working group under SIG-SLUD, JSAI. Utterance-unit tagging mainly addresses the type of illocutionary force and the role of the interaction unit. We made a first version of the tagging scheme by surveying existing tagging schemes developed by several research groups. We evaluated it on an experimental basis and thereby revised it into the new version that we propose as a standard scheme. The reliability of this scheme is demonstrated by another tagging experiment.
  • KOGURE Satoru, ITOH Toshihiko, NAKAGAWA Seiichi  IPSJ SIG Notes  99-  (14)  13  -18  1999/02/05  [Not refereed][Not invited]
     
    Recently, studies of robustness and usability for speech recognition and language processing have been established, and speech recognition systems and dialogue systems have been developed to the point of practical use. But if these systems are to become practical, it is important that not only these fundamental techniques but also techniques for portability and expansibility be developed. Based on this consideration, we examined the portability of our system by transferring its domain from Mt. Fuji sightseeing guidance to Mikawa sightseeing guidance. We also designed a domai...
  • Kogure Satoru, Itoh Toshihiko, Hirose Yoshifumi, Kai Atsuhiko, Nakagawa Seiichi  全国大会講演論文集  57-  (2)  239  -240  1998/10/05  [Not refereed][Not invited]
  • ITOH Toshihiko, NAKAGAWA Seiichi  IPSJ SIG Notes  98-  (68)  61  -66  1998/07/24  [Not refereed][Not invited]
     
    We investigated filled pauses found in lecture speech and dialogue speech from the following viewpoints: 1) the role of filled pauses in the listener's understanding, and 2) the necessity or effectiveness of generating filled pauses to make responses more cooperative. A series of listening tests was carried out. As a result, we obtained several findings on the above issues. Based on these findings, in this paper, we propose that a speech dialogue system should insert filled pauses into response sentences to increase the naturalness of the responses and to exclude unnatural silent segments (corr...
  • ITOH Toshihiko, MINEMATSU Nobuaki, NAKAGAWA Seiichi  人工知能学会全国大会論文集 = Proceedings of the Annual Conference of JSAI  12-  (0)  499  -502  1998/06/16  [Not refereed][Not invited]
  • ITO Toshihiko, KOGURE Satoru, NAKAGAWA Seiichi  Transactions of Information Processing Society of Japan  39-  (5)  1248  -1257  1998/05/15  
    We have developed a robust dialogue system which aids users in information retrieval through spontaneous speech. A dialog system using natural language must be designed so that it can respond cooperatively to users. Based on this consideration, we developed a cooperative response generator for the dialogue system. The response generator is composed of a dialog manager, a problem solver, knowledge databases, and a response sentence generator. It receives a semantic representation (that is, a semantic network), which the interpreter builds from the user's utterance, and generates response sentences that are as cooperative as possible. For example, if a user's query does not have enough conditions/information for the system to answer the question, or if there are many information retrieval candidates from the knowledge database for the user's question, the dialog manager queries the user to obtain the necessary conditions and to select the information. Further, if the system cannot retrieve any information related to the user's question, the generator proposes an alternative plan. Evaluation experiments are described showing how the above improvements increase the "convenience of the system".
  • ITOH TOSHIHIKO, KOGURE SATORU, NAKAGAWA SEIICHI  IPSJ Journal  39-  (5)  1248  -1257  1998/05/15  [Not refereed][Not invited]
     
    We have developed a robust dialogue system which aids users in information retrieval through spontaneous speech. A dialog system using natural language must be designed so that it can respond cooperatively to users. Based on this consideration, we developed a cooperative response generator for the dialogue system. The response generator is composed of a dialog manager, a problem solver, knowledge databases, and a response sentence generator. It receives a semantic representation (that is, a semantic network), which the interpreter builds from the user's utterance, and generates response sentences that are as cooperative as possible. For example, if a user's query does not have enough conditions/information for the system to answer the question, or if there are many information retrieval candidates from the knowledge database for the user's question, the dialog manager queries the user to obtain the necessary conditions and to select the information. Further, if the system cannot retrieve any information related to the user's question, the generator proposes an alternative plan. Evaluation experiments are described showing how the above improvements increase the "convenience of the system".
  • Denda Akihiro, Itoh Toshihiko, Nakagawa Seiichi  全国大会講演論文集  56-  (2)  86  -87  1998/03/17  [Not refereed][Not invited]
  • Nakagawa Seiichi, Denda Akihiro, Itoh Toshihiko  Journal of Japanese Society for Artificial Intelligence  13-  (2)  241  -251  1998/03/01  [Not refereed][Not invited]
     
    Recent improvements in speech recognition and natural language processing enable dialogue systems to deal with spontaneous speech. With the aim of supporting these systems, multi-modal man-machine interfaces have been widely introduced. We have been aiming at the realization of a robust dialogue system using spontaneous speech as the main input modality. Although our conventional system was developed with a robust natural language interpreter, since its user interface was built only on speech, the system did not always give enough usability. However, in this case, response sentences bec...
  • A Spoken Dialogue System with Cooperative Responses and Its Evaluation
    情報処理学会論文誌 (IPSJ Journal)  55-  (5)  333  -342  1998  [Not refereed][Not invited]
  • DENDA Akihiro, ITOH Toshihiko, KOGURE Satoru, NAKAGAWA Seiichi  IPSJ SIG Notes  97-  (101)  39  -46  1997/10/24  [Not refereed][Not invited]
     
    In our laboratory, we have developed a multi-modal interface with speech input/output, graphical output, and touch input for our spoken dialogue system, "Mt. Fuji Sightseeing Guidance System by Spoken Japanese". Furthermore, we implemented an agent interface with a real face image/animation and real/synthesized speech in the system and carried out evaluation experiments, consisting of task completions and questionnaires, to evaluate the interface and the whole system. The results indicate that users prefer a "mechanical/artificial" and "consistent" agent. They also indicate the usefulness o...
  • Itoh Toshihiko, Minematsu Nobuaki, Nakagawa Seiichi  全国大会講演論文集  55-  (2)  27  -28  1997/09/24  [Not refereed][Not invited]
     
    This study focuses on interjections occurring in cooperative problem-solving dialogue speech and examines them through perception experiments from two viewpoints: "What function do interjections in an utterance serve for the listener?" and "Are interjections effective or necessary in cooperative response generation?" From dialogue speech we prepared the following stimuli: 1) speech with the interjection excised and the gap closed, 2) speech with the interjection replaced by silence of the same duration, 3) speech with the interjection replaced by the same type of interjection uttered at a different point, 4) speech with the interjection replaced by a different type of interjection, 5) stimuli of type 2) with the silent interval varied in length, and 6) stimuli with the silent interval immediately preceding the interjection varied in length; these were presented to the subjects. For stimuli 1) through 4), the subjects responded that the speech sounded natural (no sense of incongruity at all). For 5) and 6), there were some responses that "long silent intervals sound unnatural." Below we describe the purpose, design, results, and discussion of this experiment. Note that "replacement by silence" in this paper means replacement by background noise.
  • Itoh Toshihiko, Kai Atsuhiko, Yamamoto Kazumasa, Nakagawa Seiichi  全国大会講演論文集  55-  (2)  33  -34  1997/09/24  [Not refereed][Not invited]
     
    In recent years, the performance of personal computers (PCs) has improved, and many computation-intensive multimedia applications handling speech and video have appeared. As a result, software-based speech recognition, which was previously difficult to realize because of its computational cost, has become usable as an application input interface alongside the keyboard and mouse. Several speech recognition systems that run purely in software on a PC have been proposed. Based on a speech recognition system developed on workstations, we have developed a speech recognition system that runs on a PC. This system consists of a speech input/analysis client and a speech recognition server, and can recognize sequences of multiple words (continuous speech), such as sentences and phrases, over a network.
  • KAI Atsuhiko, ITOH Toshihiko, YAMAMOTO Kazumasa, NAKAGAWA Seiichi  日本音響学会研究発表会講演論文集  1997-  (2)  175  -176  1997/09/01  [Not refereed][Not invited]
  • ITOH TOSHIHIKO, Nakagawa Seiichi  全国大会講演論文集  54-  (2)  235  -236  1997/03/12  [Not refereed][Not invited]
     
    In dialogue systems based on natural language, it is important that the system carries on the dialogue cooperatively with the user. In cooperative response generation for database retrieval, many approaches provide additional information beyond the answer to a query, or present reasons and alternatives for failed queries. For example, when the user's query lacks information required for retrieval, or when the number of retrieval results is large, the system asks the user a question; when the retrieval results the user wants cannot be obtained, it offers an alternative. Through such cooperative responses, we attempt to reduce the user's burden and anxiety. In this paper, we describe evaluation experiments, focusing on the "usability of the system" and "cooperative responses," conducted on our spoken dialogue system improved with respect to cooperative response generation.
  • DENDA Akihiro, ITOH Toshihiko, KOGURE Satoru, NAKAGAWA Seiichi  IPSJ SIG Notes  97-  (16)  47  -52  1997/02/07  [Not refereed][Not invited]
     
    Recent improvements in speech recognition and natural language processing enable dialogue systems to deal with spontaneous speech. With the aim of supporting these systems, multi-modal man-machine interfaces have been widely introduced. In addition to increasing the total performance of the systems, the multi-modal interface is expected to make the dialogues between a user and the system more natural and richer in content. In our laboratory, we have developed a multi-modal interface with speech input/output, graphical output, and touch input for our spoken dialogue system, "M...
  • ITOH TOSHIHIKO, Nakagawa Seiichi  全国大会講演論文集  53-  (2)  353  -354  1996/09/04  [Not refereed][Not invited]
     
    In dialogue systems based on natural language, it is important that the system carries on the dialogue cooperatively with the user. Approaches to deciding utterance content include one that focuses on discourse cohesion and decides the content using information such as modification structure and discourse focus, and one that regards discourse as a plan for some goal, where the system infers the discourse goal as the user's question intention and generates, as cooperative utterances, the content necessary to achieve that goal. In cooperative response generation for database retrieval, many systems provide additional information beyond the answer to a query, or present reasons and alternatives for failed queries. In this paper, we describe a response generation system with cooperative response functions, built to remedy the problems of the response generation system identified in the evaluation experiments of our Mt. Fuji sightseeing guidance spoken dialogue system.
  • Denda Akihiro, Itoh Toshihiko, Nakagawa Seiichi  情報処理学会研究報告. SLP, 音声言語情報処理  96-  (74)  53  -54  1996/07/26  [Not refereed][Not invited]
     
    In this paper, we propose a drive planning system that supports users in making a plan for a trip. This system has the function to help users decide several factors of a trip: multiple destinations and waypoints, arrival and departure times, the number of days that the trip will take and the route. It also proposes taking a rest on a long distance trip in order to ensure safe driving. The drive is planned interactively by a dialog with the system through a natural language interface. We propose a method to construct such a drive planning system, describe the implementation of a prototype di...
  • YAMAMOTO MIKIO, ITOH TOSHIHIKO, HIDANO MASARU, NAKAGAWA SEIICHI  IPSJ Journal  37-  (4)  471  -482  1996/04/15  [Not refereed][Not invited]
     
    With current speech recognition technology, an interpreter that receives the recognized sentences must be developed so as to cope not only with spontaneous sentences but also with illegal sentences containing recognition errors, in order to improve a spoken dialogue system. Therefore, we carried out experiments to investigate how humans modify or correct recognized sentences which might include errors. Although 43% of the sentences were the results of misrecognition, the results showed that subjects who were familiar with the system could correctly interpret 87% of all the sentences. And s...
  • ITOH Toshihiko, NAKAGAWA Seiichi  情報処理学会研究報告. HI, ヒューマンインタフェース研究会報告  96-  (21)  105  -110  1996/02/29  [Not refereed][Not invited]
     
    We have developed a robust dialogue system which aids users in information retrieval through spontaneous speech. A dialogue system based on natural language must be designed so that it can respond cooperatively to users. Based on this consideration, we developed a cooperative response generator in the dialogue system. The response generator is composed of a dialogue manager, a problem solver, knowledge databases, and a response sentence generator. The response generator receives a semantic representation (that is, a semantic network) which the interpreter builds for the user's utterance and generates as coo...
  • Itoh Toshihiko, Hidano Masaru, Yamamoto Mikio, Nakagawa Seiichi  IPSJ SIG Notes  95-  (73)  139  -144  1995/07/20  [Not refereed][Not invited]
     
    A spoken dialogue system that can understand spontaneous speech needs to handle a more extensive range of speech than the read speech that has been studied so far. Spoken language has looser grammatical restrictions than written language and exhibits ambiguous phenomena such as interjections, ellipses, inversions, repairs, unknown words and so on. It must also be noted that a recognizer may output sentences that a human being would never speak. Therefore, the interpreter must cope not only with spontaneous sentences but also with illegal sentences having recognition errors. We explain...
  • Hidano Masaru, ITOH TOSHIHIKO, Yamamoto Mikio, Nakagawa Seiichi  全国大会講演論文集  50-  (2)  467  -468  1995/03/15  [Not refereed][Not invited]
     
    In a spoken dialogue system, understanding sentences that contain interjections, particle omission, repairs, inversions, and other phenomena of natural utterances, and recovering the original utterance from misrecognized sentences, are indispensable for improving the quality of the dialogue system. In this paper, we investigate through subject experiments how humans recover such sentences, devise recovery strategies based on the findings, and construct a robust semantic understanding system.
  • A Robust Spoken Dialogue System Based on the Understanding Mechanism of Human Beings
    36-  (4)  471  -481  1995  [Not refereed][Not invited]
  • ITOH TOSHIHIKO, Otani Koji, Hidano Masaru, Yamamoto Mikio, Nakagawa Seiichi  IPSJ SIG Notes  94-  (109)  49  -56  1994/12/15  [Not refereed][Not invited]
     
    It is difficult to recognize and understand spontaneous speech, because spontaneous speech contains many phenomena of ambiguity such as omissions, inversions, repairs and so on. Since there is a trade-off between the looseness of linguistic constraints and recognition precision, the recognizer cannot perfectly recognize the completely free speech of the user with the current state of the art of speech recognition. Therefore some problems arise. The first problem is that there are gaps between the sentences a dialogue system can accept and the sentences the user wants to say. The second problem is that the semantic analyzer has ...
  • Itoh Toshihiko, Ohtani Kohji, Hidano Masaru, Yamamoto Mikio, Nakagawa Seiichi  IEICE technical report. Speech  94-  (398)  49  -56  1994/12/15  [Not refereed][Not invited]
     
    It is difficult to recognize and understand spontaneous speech, because spontaneous speech contains many phenomena of ambiguity such as omissions, inversions, repairs and so on. Since there is a trade-off between the looseness of linguistic constraints and recognition precision, the recognizer cannot perfectly recognize the completely free speech of the user with the current state of the art of speech recognition. Therefore some problems arise. The first problem is that there are gaps between the sentences a dialogue system can accept and the sentences the user wants to say. The second problem is that the semantic analyzer has to und...
  • Yamamoto Mikio, Hidano Masaru, Itoh Toshihiko, Kai Atsuhiko, Nakagawa Seiichi  IPSJ SIG Notes  94-  (57)  91  -98  1994/07/07  [Not refereed][Not invited]
     
    This paper describes a spoken dialogue system for spontaneous speech. It is difficult to recognize and understand spontaneous speech, because spontaneous speech contains many phenomena of ambiguity such as omissions, inversions, repairs and so on. Since there is a trade-off between the looseness of linguistic constraints and recognition precision, the recognition rate of the speech recognizer is limited. Therefore, the interpretation part must cope not only with spontaneous sentences but also with illegal sentences with recognition errors. We developed a robust interpretation method and applied it to the dialog...

Association Memberships

  • Information Processing Society of Japan   Japanese Society for Artificial Intelligence   Acoustical Society of Japan   

Research Projects

  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Young Scientists (B))
    Date (from‐to) : 2008 -2010 
    Author : Toshihiko ITOH
     
    In this research, we study how dialogue rhythm influences the user's comfort and sense of reliability, and we propose a new framework for building spoken interfaces that takes these factors into account. Although we confirmed increased user satisfaction and smoothness of conversation, we have not reached the level of naturalness of human-to-human dialogue. To achieve this, we improved our model for generating rhythmical dialogues, re-implemented it in the system, and increased the processing speed. As a result, we achieved better human-likeness and reliability compared to the previous system, but we could not reach evaluation s...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Young Scientists (B))
    Date (from‐to) : 2005 -2007 
    Author : Toshihiko Itoh
     
    This research aims to clarify how much dialogue rhythm and embodiment affect the user's comfort and sense of safety in speech interfaces, and to propose a new framework for introducing these elements into speech interfaces. By last year, we had built a basic spoken dialogue system that takes dialogue rhythm into account. It was realized by machine-learning utterance timing from human-human dialogue data and deciding the system's utterance timing from the user's acoustic and linguistic features. Preliminary evaluation experiments confirmed improvements in user satisfaction and ease of speaking, but the system did not yet give users a feeling close to that of human-human dialogue. To investigate the cause, we collected human-human dialogue data and classified and compared utterance timing and prosodic features according to differences in utterance intention (utterance content). The results suggest that a speaker's utterance timing in dialogue is not determined solely by the dialogue partner's utterance features, but is strongly affected by the speaker's utterance intention (content), the importance of the utterance, emotion, and so on. In other words, a spoken dialogue system does not feel human-like (reassuring) merely by speaking rhythmically; to feel human-like, it is important that it speaks at appropriate timing that also takes into account utterance intention (content), importance, and emotion. Moreover, listeners too, from changes and deviations in the speaker's utterance timing, ...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research on Priority Areas)
    Date (from‐to) : 2006 -2006 
    Author : Norihide Kitaoka, Seiichi Nakagawa, Toshihiko Itoh
     
    When considering dialogue between humans and machines, smoother dialogue can be expected if the machine can naturally return various responses, such as backchannels, as in human-human conversation. In this research, focusing in particular on chat-like dialogue, we proposed generation methods for response timing and prosodic synchrony, which are the most important elements of natural casual dialogue. We further proposed a dialogue system framework that can generate various chat-like dialogue phenomena using these methods, and prototyped a dialogue system based on it. First, for user-system dialogue, we realized a method in which the system, moment by moment, applies decision rules to the features of the user's utterance to decide on backchannels and turn-taking and their timing, and responds in real time. We showed that this method can handle various chat phenomena occurring in natural dialogue, such as overlapping backchannels and turn-taking, as well as "collaborative completion," in which the system predicts the content of the partner's utterance and speaks in overlap. For timing generation and selection of utterance content, the surface linguistic information and prosodic information (pitch and power change patterns) of the user's last utterance were used as information sources. Furthermore, analysis of actual human-human dialogues confirmed that when a dialogue is smooth and lively, the prosody of the participants, especially voice pitch, fluctuates in synchrony. To realize this in the system, we proposed a prosody control model that follows the user's prosody and showed that its behavior resembles that of humans...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research on Priority Areas)
    Date (from‐to) : 2003 -2004 
    Author : Yukihiro Itoh, Tatsuhiro Konishi, Toshihiko Itoh
     
    (1) Redesign of the knowledge representation: In building a practical-scale knowledge base, problems arose beyond the mere increase of cost with scale. In general, as problem-solving situations and learning progress, knowledge about the same object may come to have inconsistent representations. For example, in high-school chemistry, when reproducing a chemical phenomenon it is sometimes sufficient to think at the level of correspondences between molecules and atoms (the reaction-formula level), while in other cases one should think at the level of chemical reactions in real space, including substances not directly involved in the reaction. When knowledge must be used differently depending on the situation in this way, it is difficult to design the knowledge representation and inference mechanism under a completely uniform architecture. To address this problem, in this research we designed and implemented (a) a knowledge representation method that allows one concept to have multiple attribute values, or multiple pieces of knowledge representing one concept, and (b) a problem-solving engine that selects the appropriate knowledge according to the problem. (2) Rebuilding the system: Until last year, the system was developed in the TCL/TK language in a UNIX environment. However, through exchanges with teachers in the field, we found that development in a Java environment running on a Web browser is preferable in terms of portability to educational settings, compatibility with the current educational computing environment in high schools, ease of system operation, and processing speed. Although the knowledge representation is basically independent of the programming language, some parts required modification, and we reviewed them. (3) Toward the design of an authoring tool: ...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research (B))
    Date (from‐to) : 2002 -2004 
    Author : Yukihiro ITOH, Toshihiko Itoh, Yugo Takeuchi, Tatsuhiro Konishi, Satoru Kogure
     
    1. For the system generating verbal and visual explanations of target programs: (1) Expansion of our program understanding mechanism: In our previous work, we proposed a mechanism to understand the behavior of a program in the domain world of "greater and lessen". We have developed an extended method for another domain world, "two-dimensional space", which is used for numerical analyses such as 'Newton method' or 'Simpson method'. We achieved it by using heuristic rules to specify the correspondence between a variable and an attribute of an entity in the domain world. (2) Development of a method to gen...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research (B))
    Date (from‐to) : 2001 -2003 
    Author : Seiichi NAKAGAWA, Atsuhiko Kai, Norihide Kitaoka, Satoshi Kobayashi, Takashi Nakano, Toshihiko Itoh
     
    While some speech interface systems have been developed for accessing Web resources, they are limited to accessing specific contents and do not provide a universal interface for arbitrary information retrieval services on the WWW. We propose an interactive speech user interface system which can be applied to many form-based information retrieval services on the WWW. In particular, our system was implemented based on a client-server, Web-proxy-centered architecture and employs information extraction and language processing of HTML documents for providing a general-purpose...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research on Priority Areas (B); Scientific Research on Priority Areas)
    Date (from‐to) : 2000 -2003 
    Author : Shigeyoshi Kitazawa, Tatsuya Kitamura, Nick Campbell, Shuichi Itahashi, Toshihiko Itoh, Akira Ichikawa, Shinya Kiriyama
     
    1. Creation of a new prosodic corpus (Shizuoka University): As a prosodic corpus, we completed J-ToBI prosodic tagging of the 40 passages of the Japanese MULTEXT prosody database, and applied J-ToBI tagging in the same manner to the existing speech corpora of the groups at the University of Tsukuba, Chiba University, the University of Tokyo, and the Tokyo Institute of Technology: various read guidance texts, simulated dialogues and dialogue speech, multimodal dialogue speech, weather forecasts, and simulated emotional speech. Research support staff were employed for this labeling work. We published research results on a prosodic labeling method using linguistic information, on automatic phoneme segmentation to support phoneme labeling, and on the details of acoustic features at connective boundaries. 2. Prosodic analysis of existing speech corpora and creation of a prosodic corpus (University of Tsukuba): For three existing corpora (the various read guidance sentences and simulated dialogues of the ASJ "Continuous Speech Database for Research" and the dialogue speech corpus of the priority-area research project "Spoken Dialogue"), we performed fundamental frequency analysis and assigned utterance labels. Taking speech intervals delimited by silent intervals of 200 ms or more as utterance units, we compared utterance-unit length between read speech and simulated dialogue. In simulated dialogue, utterance units become shorter because of interjections and interruptions. We found that the standard deviations of speech power and fundamental frequency are concentrated in a narrower range for read speech than for dialogue. 3. Recording of dialogue speech with gestures and facial expressions (Chiba University): To record and analyze gaze, nodding, and other gestures in spoken dialogue, ...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research (C))
    Date (from‐to) : 1999 -2001 
    Author : Hiromasa NAKATANI, Toshihiko Itoh, Hitoshi Saji
     
    Conventional interactions between humans and machines are performed mainly with keyboards or special pointing devices. In this project, we have investigated a human interface that analyzes the user's gestures and facial expressions and identifies their intentions and emotions. The main subjects of this project are as follows: 1. 3D facial motion measuring system: We have developed a method for measuring three-dimensional moving facial shapes. The system uses two light sources and a slit pattern projector. Natural facial motions can be entered into the system with a sampling rate of up to 2/15 sec. 2. H...
  • Ministry of Education, Culture, Sports, Science and Technology: Grants-in-Aid for Scientific Research (Scientific Research on Priority Areas (A))
    Date (from‐to) : 2000 -2000 
    Author : Yukihiro Itoh, Tatsuhiro Konishi, Makoto Kondo, Hiromasa Nakatani, Toshihiko Itoh
     
    1) Study on improving the semantic interpretation of input sentences: 1-1) We set up cooperative tasks effective for dialogue training and performed case analyses of the concepts, vocabulary, and styles that need to be handled. 1-2) We developed a semantic representation method that enables absorption of synonymous expressions, positioning of sentence meaning in context, and integration (accumulation) of sentence meanings. This method has the following characteristics: meaning can be expressed in a fixed representation form regardless of the surface dependency structure, and each kind of semantic content has a fixed place where it is positioned. 2) Study on dialogue control oriented toward dialogue training: 2-1) We developed a planning method for cooperative tasks and designed knowledge especially for the tasks set in 1-1). 2-2) We implemented, as dialogue strategies, the educational actions the system should take based on the effectiveness of the learner's utterances for the task. 3) Study on task setting: Aiming to develop a method for automatically setting tasks suited to achieving learning goals effectively, this year we organized the relations between the tasks given to learners and the items learned through them. 4) Construction of a prototype system: Based on the above results, we constructed a Japanese dialogue system using dialogues centered on hotel search as a testbed. At present, with the dialogue range limited to hotel search, sightseeing guidance, and the like, the system can accept grammatically correct input sentences such as "Please find a hotel" and "Where is the Nagoya TV Tower?" These inputs include "requests" and "verb te-...
  • Cooperative speech dialogue control
  • Statistical speech language processing
  • Cooperative Speech Dialogue Management
  • Stochastic Speech Language Processing

Industrial Property Rights


