Masataka Sawayama (澤山 正貴)

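Associate Professor, Faculty of Information Science and Technology, Division of Media and Network Technologies, Information Media Science and Technology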
Last Updated: 2025/06/07

■Basic Researcher Information

Degree

  • Ph.D., Chiba University

Profile

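  • 2010 - 2013: Doctoral Program, Division of Information Science, Graduate School of Advanced Integration Science, Chiba University
    2013 - 2016: Research Associate, Sensory Representation Research Group, Human Information Science Laboratory, NTT Communication Science Laboratories
    2016 - 2018: Researcher, Sensory Representation Research Group, Human Information Science Laboratory, NTT Communication Science Laboratories
    2018 - 2020: Senior Researcher, Sensory Representation Research Group, Human Information Science Laboratory, NTT Communication Science Laboratories
    2021 - 2022: Postdoctoral Researcher, Inria, Team Flowers
    2022 - 2023: Project Lecturer, Graduate School of Information Science and Technology, The University of Tokyo
    2023 - 2025: Lecturer, Graduate School of Information Science and Technology, The University of Tokyo
    2025 - present: Associate Professor, Faculty of Information Science and Technology, Hokkaido University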

Researchmap personal page

Research Fields

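  • Informatics, Perceptual information processing
  • Informatics, Computational science
  • Informatics, Intelligent informatics
  • Humanities & social sciences, Cognitive science
  • Humanities & social sciences, Experimental psychology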

■Research Activities

Papers

  • Probing the link between vision and language in material perception using psychophysics and unsupervised learning
    Chenxi Liao, Masataka Sawayama, Bei Xiao
    PLOS Computational Biology, 20, 10, e1012481, e1012481, Public Library of Science (PLoS), 2024/10/03, [Peer-reviewed]
    English, Research paper (scientific journal), We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to describe what we see and communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression to understand how visual features relate to semantic representations in human cognition. We use deep generative models to generate images of realistic materials. Interpolating between the generative models enables us to systematically create material appearances in both well-defined and ambiguous categories. Using these stimuli, we compared the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language on a categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among ambiguous materials morphed between known categories. Moreover, visual judgments exhibit more individual differences compared to verbal descriptions. Our results show that while verbal descriptions capture material qualities on the coarse level, they may not fully convey the visual nuances of material appearances. Analyzing the image representation of materials obtained from various pre-trained deep neural networks, we find that similarity structures in human visual judgments align more closely with those of the vision-language models than purely vision-based models. Our work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.
  • Temporal and Spatial Analysis of Event-Related Potentials in Response to Color Saliency Differences Among Various Color Vision Types
    Naoko Takahashi, Masataka Sawayama, Xu Chen, Yuki Motomura, Hiroshige Takeichi, Satoru Miyauchi, Chihiro Hiramatsu
    Frontiers in Human Neuroscience, 18, 1, 17, 2024/10/02, [Peer-reviewed], [International journal]
    English, Research paper (scientific journal), Introduction: Human color vision exhibits significant diversity that cannot be fully explained by categorical classifications. Understanding how individuals with different color vision phenotypes perceive, recognize, and react to the same physical stimuli provides valuable insights into sensory characteristics. This study aimed to identify behavioral and neural differences between different color visions, primarily classified as typical trichromats and anomalous trichromats, in response to two chromatic stimuli, blue-green and red, during an attention-demanding oddball task.

    Methods: We analyzed the P3 component of event-related potentials (ERPs), associated with attention, and conducted a broad spatiotemporal exploration of neural differences. Behavioral responses were also analyzed to complement neural data. Participants included typical trichromats (n = 13) and anomalous trichromats (n = 5), and the chromatic stimuli were presented in an oddball paradigm.

    Results: Typical trichromats exhibited faster potentiation from the occipital to parietal regions in response to the more salient red stimulus, particularly in the area overlapping with the P3 component. In contrast, anomalous trichromats revealed faster potentiation to the expected more salient blue-green stimulus in the occipital to parietal regions, with no other significant neural differences between stimuli. Comparisons between the color vision types showed no significant overall neural differences.

    Discussion: The large variability in red-green sensitivity among anomalous trichromats, along with neural variability not fully explained by this sensitivity, likely contributed to the absence of clear neural distinctions based on color saliency. While reaction times were influenced by red-green sensitivity, neural signals showed ambiguity regarding saliency differences. These findings suggest that factors beyond red-green sensitivity influenced neural activity related to color perception and cognition in minority color vision phenotypes. Further research with larger sample sizes is needed to more comprehensively explore these neural dynamics and their broader implications.
  • Stick to your role! Stability of personal values expressed in large language models
    Grgur Kovač, Rémy Portelas, Masataka Sawayama, Peter Ford Dominey, Pierre-Yves Oudeyer
    PLOS ONE, 19, 8, e0309114, e0309114, Public Library of Science (PLoS), 2024/08/26, [Peer-reviewed], [International co-authorship], [International journal]
    English, Research paper (scientific journal), The standard way to study Large Language Models (LLMs) through benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g. multiple choice questions). However, due to LLMs’ highly context-dependent nature, conclusions from such minimal-context evaluations may be only weakly informative about the model’s behavior in deployment (where it will be exposed to many new contexts). We argue that context-dependence should be studied as another dimension of LLM comparison alongside others such as cognitive abilities, knowledge, or model size. In this paper, we present a case-study about the stability of value expression over different contexts (simulated conversations on different topics), and as measured using a standard psychology questionnaire (PVQ) and behavioral downstream tasks. We consider 21 LLMs from six families. Reusing methods from psychology, we study Rank-order stability on the population (interpersonal) level, and Ipsative stability on the individual (intrapersonal) level. We explore two settings: with and without instructing LLMs to simulate particular personalities. We observe similar trends in the stability of models and model families (Mixtral, Mistral, GPT-3.5 and Qwen families being more stable than LLaMa-2 and Phi) over those two settings, two different simulated populations, and even on three downstream behavioral tasks. When instructed to simulate particular personas, LLMs exhibit low Rank-Order stability, and this stability further diminishes with conversation length. This highlights the need for future research directions on LLMs that can coherently simulate a diversity of personas, as well as how context-dependence can be studied in more thorough and efficient ways. This paper provides a foundational step in that direction, and, to our knowledge, it is the first study of value stability in LLMs. The project website with code is available at https://sites.google.com/view/llmvaluestability.
  • Decoding time-resolved neural representations of orientation ensemble perception
    Ryuto Yashiro, Masataka Sawayama, Kaoru Amano
    Frontiers in Neuroscience, 18, 1, 13, Frontiers Media SA, 2024/08/01, [Peer-reviewed], [International journal]
    English, Research paper (scientific journal), The visual system can compute summary statistics of several visual elements at a glance. Numerous studies have shown that an ensemble of different visual features can be perceived over 50–200 ms; however, the time point at which the visual system forms an accurate ensemble representation associated with an individual’s perception remains unclear. This is mainly because most previous studies have not fully addressed time-resolved neural representations that occur during ensemble perception, particularly lacking quantification of the representational strength of ensembles and their correlation with behavior. Here, we conducted orientation ensemble discrimination tasks and electroencephalogram (EEG) recordings to decode orientation representations over time while human observers discriminated an average of multiple orientations. We modeled EEG signals as a linear sum of hypothetical orientation channel responses and inverted this model to quantify the representational strength of orientation ensemble. Our analysis using this inverted encoding model revealed stronger representations of the average orientation over 400–700 ms. We also correlated the orientation representation estimated from EEG signals with the perceived average orientation reported in the ensemble discrimination task with adjustment methods. We found that the estimated orientation at approximately 600–700 ms significantly correlated with the individual differences in perceived average orientation. These results suggest that although ensembles can be quickly and roughly computed, the visual system may gradually compute an orientation ensemble over several hundred milliseconds to achieve a more accurate ensemble representation.
  • Unsupervised learning reveals interpretable latent representations for translucency perception
    Liao C, Sawayama M, Xiao B
    PLoS Computational Biology, 19, 2, e1010878, 1, 31, 2023/01, [Peer-reviewed]
    English, Research paper (scientific journal)
  • An Open-Source Cognitive Test Battery to Assess Human Attention and Memory
    Maxime Adolphe, Masataka Sawayama, Denis Maurel, Alexandra Delmas, Pierre-Yves Oudeyer, Hélène Sauzéon
    Frontiers in Psychology, 13, 1, 16, Frontiers Media SA, 2022/06/10, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal), Cognitive test batteries are widely used in diverse research fields, such as cognitive training, cognitive disorder assessment, or brain mechanism understanding. Although they need flexibility according to their usage objectives, most test batteries are not available as open-source software and cannot be tuned by researchers in detail. The present study introduces an open-source cognitive test battery to assess attention and memory, using a JavaScript library, p5.js. Because of the ubiquitous nature of dynamic attention in our daily lives, it is crucial to have tools for its assessment or training. For that purpose, our test battery includes seven cognitive tasks (multiple-objects tracking, enumeration, go/no-go, load-induced blindness, task-switching, working memory, and memorability), common in cognitive science literature. By using the test battery, we conducted an online experiment to collect the benchmark data. Results from sessions conducted on 2 separate days showed high cross-day reliability. Specifically, the task performance did not change substantially across days. Besides, our test battery captures diverse individual differences and can evaluate them based on the cognitive factors extracted from latent factor analysis. Since we share our source code as open-source software, users can expand and manipulate experimental conditions flexibly. Our test battery is also flexible in terms of the experimental environment, i.e., it is possible to experiment either online or in a laboratory environment.
  • Language-biased image classification: evaluation based on semantic representations
    Lemesle*, Y, Sawayama*, M, Valle-Perez, G, Adolphe, M, Sauzéon, H, Oudeyer, P. Y
    International Conference on Learning Representations (ICLR) 2022, 1, 19, 2022/04, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (international conference proceedings)
  • Visual discrimination of optical material properties: a large-scale study.
    Sawayama, M, Dobashi, Y, Okabe, M, Hosokawa, K, Koumura, T, Saarela, T, Olkkonen, M, Nishida, S
    Journal of Vision, 22(2), 17, 1, 24, 2022/02, [Peer-reviewed], [Lead author, Corresponding author], [International journal]
    English, Research paper (scientific journal), Complex visual processing involved in perceiving the object materials can be better elucidated by taking a variety of research approaches. Sharing stimulus and response data is an effective strategy to make the results of different studies directly comparable and can assist researchers with different backgrounds to jump into the field. Here, we constructed a database containing several sets of material images annotated with visual discrimination performance. We created the material images using physically based computer graphics techniques and conducted psychophysical experiments with them in both laboratory and crowdsourcing settings. The observer's task was to discriminate materials on one of six dimensions (gloss contrast, gloss distinctness of image, translucent vs. opaque, metal vs. plastic, metal vs. glass, and glossy vs. painted). The illumination consistency and object geometry were also varied. We used a nonverbal procedure (an oddity task) applicable for diverse use cases, such as cross-cultural, cross-species, clinical, or developmental studies. Results showed that the material discrimination depended on the illuminations and geometries and that the ability to discriminate the spatial consistency of specular highlights in glossiness perception showed larger individual differences than in other tasks. In addition, analysis of visual features showed that the parameters of higher order color texture statistics can partially, but not completely, explain task performance. The results obtained through crowdsourcing were highly correlated with those obtained in the laboratory, suggesting that our database can be used even when the experimental conditions are not strictly controlled in the laboratory. Several projects using our dataset are underway.
  • Crystal or Jelly? Effect of Color on the Perception of Translucent Materials with Photographs of Real-world Objects.
    Liao, C, Sawayama, M, Xiao, B
    Journal of Vision, 22(2), 6, 1, 23, 2022/02, [Peer-reviewed]
    English, Research paper (scientific journal)
  • The roles of lower- and higher-order surface statistics in tactile texture perception
    Scinob Kuroki, Masataka Sawayama, Shin’ya Nishida
    Journal of Neurophysiology, 126, 1, 95, 111, American Physiological Society, 2021/07/01, [Peer-reviewed], [International journal]
    English, Research paper (scientific journal), Humans can discriminate subtle spatial pattern differences in the surrounding world through their hands, but the underlying computation remains poorly understood. Here, we 3-D-printed textured surfaces and analyzed the tactile discrimination performance regarding the sensitivity to surface statistics. The results suggest that observers are sensitive to lower-order statistics but not to higher-order statistics. That is, touch differs from vision not only in spatiotemporal resolution but also in (in)sensitivity to high-level surface statistics.
  • Discounting mechanism underlies extinction illusion
    Lana Okubo, Kazuhiko Yokosawa, Masataka Sawayama, Takahiro Kawabe
    Consciousness and Cognition, 90, 103100, 103100, Elsevier BV, 2021/04, [Peer-reviewed]
    English, Research paper (scientific journal)
  • A computational mechanism for seeing dynamic deformation.
    Kawabe, T, Sawayama, M
    eNeuro, 7, 2, 1, 14, 2020/03, [Peer-reviewed], [International journal]
    English, Research paper (scientific journal), Human observers perceptually discriminate the dynamic deformation of materials in the real world. However, the psychophysical and neural mechanisms responsible for the perception of dynamic deformation have not been fully elucidated. By using a deforming bar as the stimulus, we showed that the spatial frequency of deformation was a critical determinant of deformation perception. Simulating the response of direction-selective units (i.e., MT pattern motion cells) to stimuli, we found that the perception of dynamic deformation was well explained by assuming a higher-order mechanism monitoring the spatial pattern of direction responses. Our model with the higher-order mechanism also successfully explained the appearance of a visual illusion wherein a static bar apparently deforms against a tilted drifting grating. In particular, it was the lower spatial frequencies in this pattern that strongly contributed to the deformation perception. Finally, by manipulating the luminance of the static bar, we observed that the mechanism for the illusory deformation was more sensitive to luminance than contrast cues.
  • Image-based translucency transfer through correlation analysis over multi-scale spatial color distribution
    Todo, H., Yatagawa, T., Sawayama, M., Dobashi, Y., & Kakimoto, M.
    The Visual Computer, 35, 6-8, 811, 822, 2019/06, [Peer-reviewed]
    English, Research paper (scientific journal)
  • Motion Perception: From Detection to Interpretation
    Nishida, S., Kawabe, T., Sawayama, M., & Fukiage, T.
    Annual Review of Vision Science, 4, 1, 501, 523, 2018/09, [Peer-reviewed]
    English, Research paper (scientific journal)
  • Material and shape perception based on two types of intensity gradient information
    Masataka Sawayama, Shin'ya Nishida
    PLoS Computational Biology, 14, 4, 1, 40, Public Library of Science, 2018/04/01, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal)
  • Haptic texture perception on 3D-printed surfaces transcribed from visual natural textures
    Scinob Kuroki, Masataka Sawayama, Shin’ya Nishida
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10893, 102, 112, Springer Verlag, 2018, [Peer-reviewed]
    English, Research paper (international conference proceedings)
  • Animating static objects by illusion-based projection mapping
    Taiki Fukiage, Takahiro Kawabe, Masataka Sawayama, Shin'ya Nishida
    Journal of the Society for Information Display, 25, 7, 434, 443, 2017/07, [Peer-reviewed]
    English, Research paper (scientific journal)
  • Visual wetness perception based on image color statistics
    Masataka Sawayama, Edward H. Adelson, Shin'ya Nishida
    Journal of Vision, 17, 5, 1, 24, 2017/05, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal)
  • Human perception of subresolution fineness of dense textures based on image intensity statistics
    Masataka Sawayama, Shin'ya Nishida, Mikio Shinya
    Journal of Vision, 17, 4, 1, 18, 2017/04, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal)
  • Deformation Lamps: A Projection Technique to Make Static Objects Perceptually Dynamic
    Takahiro Kawabe, Taiki Fukiage, Masataka Sawayama, Shin'ya Nishida
    ACM Transactions on Applied Perception, 13, 2, Article No. 10, 2016/03, [Peer-reviewed]
    English, Research paper (scientific journal)
  • Deformation Lamps: A projection technique to make a static picture dynamic
    Takahiro Kawabe, Masataka Sawayama, Shin'ya Nishida
    ACM SIGGRAPH 2015 Emerging Technologies, SIGGRAPH 2015, Association for Computing Machinery, Inc, 2015/07/31, [Peer-reviewed]
    English, Research paper (international conference proceedings)
  • Stain on texture: Perception of a dark spot having a blurred edge on textured backgrounds
    Masataka Sawayama, Eiji Kimura
    Vision Research, 109, 209, 220, 2015/04, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal)
  • Spatial organization affects lightness perception on articulated surrounds
    Masataka Sawayama, Eiji Kimura
    Journal of Vision, 13, 5, 1, 14, 2013, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal)
  • Local computation of lightness on articulated surrounds
    Masataka Sawayama, Eiji Kimura
    i-Perception, 3, 8, 505, 514, 2012, [Peer-reviewed], [Lead author, Corresponding author]
    English, Research paper (scientific journal)

Other Activities and Achievements