Faculty Profiles - THOMAS GABRIEL FRANCIS DIEGO

Information

THOMAS GABRIEL FRANCIS DIEGO

Organization

Faculty of Information Science and Electrical Engineering Department of Advanced Information Technology Associate Professor
School of Engineering Department of Electrical Engineering and Computer Science（Concurrent）
Graduate School of Information Science and Electrical Engineering Department of Information Science and Technology（Concurrent）

Contact information

Tel

0928023596

Profile

My research is related to 3D reconstruction and digital humans, including RGB-D SLAM, AI-based 3D shape estimation from few images, and multiview 3D human body capture. I have developed memory-efficient 3D scene reconstruction using planar patches and cheap depth sensors for accurate indoor mapping. Leveraging deep neural networks, I have created methods to reconstruct detailed 3D human body shapes from a single 2D image, later extending this to unsupervised shape estimation from sparse views. My work on neural shape representations advanced high-accuracy 3D shape estimation from multiview images. Additionally, my collaborative research enables millimetric-accuracy 3D human body capture from multiview images with real-time free-viewpoint rendering.

Education:
- Year 2022-Now: Associate professor at Kyushu University, Fukuoka (Japan)
- Year 2017-2022: Assistant professor at Kyushu University, Fukuoka (Japan)
- Year 2015-2017: JSPS post-doc researcher at Kyushu University, Fukuoka (Japan)
- Year 2012-2015: Post-doc researcher at the National Institute of Informatics, Tokyo (Japan)
- Year 2009-2012: PhD course at the National Institute of Informatics, Tokyo (Japan) as a student of SOKENDAI (Best student award)
- Year 2005-2008: Master course at the ENSIMAG-INPG (Engineering school of Computer Science and Mathematics), Grenoble, option IRV (Image and Virtual Reality). Diploma with honors.
- Year 2003-2005: Two-year intensive course in science and mathematics at undergraduate level as preparation for the competitive entrance exam to French engineering schools, Lycée Champollion, Grenoble

Homepage

https://diegothomas.github.io/DigitalHumans-lab/

Research Areas

Informatics / Human interface and interaction

Degree

Master (France)
Ph.D.

Research History

Kyushu University Associate Professor

2022.12 - Present
Kyushu University Assistant Professor

2017.4 - 2022.11
I was a post doc researcher at the National Institute of Informatics from April 2012 to March 2015.

Education

Kyushu University Digital Humans 1-2

2023.4
Kyushu University Information Science

2023.10

Research Interests・Research Keywords

Research theme： multimodal data to 3D motion generation

Keyword： motion diffusion model

Research period： 2024.9 - Present
Research theme： Digital humans

Keyword： generative AI; 3D and 4D capture; motion retargeting; gesture

Research period： 2023.1
Research theme： multiview 3D reconstruction

Keyword： 3D reconstruction; differentiable rendering

Research period： 2022.8 - Present
Research theme： AI-based avatar animation synthesis

Keyword： deep learning; avatar animation; dense deformation; texture

Research period： 2021.6 - 2022.6
Research theme： Aerial-based outdoor 3D scene mapping

Keyword： aerial drone; RGB-D SLAM; outdoor scene

Research period： 2020.4 - 2022.4
Research theme： 3D shape from a single image

Keyword： Deep learning, 3D shape estimation

Research period： 2019.4 - 2021.8
Research theme： Mediated Reality Agents for educational applications targeting young children

Keyword： Education support, Virtual Reality, Augmented Reality, Mixed Reality, Virtual Assistant, RGB-D camera.

Research period： 2018.5 - 2020.6
Research theme： High frame-rate 3D reconstruction with multiple cameras

Keyword： RGB-D camera; high frame rate; multi-view set-up; real time; distributed system; GPU optimization; volumetric reconstruction; fast and uncontrolled motion

Research period： 2017.12 - 2018.2
Research theme： Human body 3D reconstruction in dynamic scenes

Keyword： RGB-D camera; fast motion; skeleton; deforming bounding boxes; volumetric depth fusion; ICP; GPU optimization; large-scale scene

Research period： 2017.4 - 2018.2
Research theme： Facial 3D reconstruction and expression tracking

Keyword： RGB-D camera; Facial expression; Blendshape; Template mesh; Texturing; 3D modeling; Retargeting; Deviation mapping; Real-time.

Research period： 2015.4 - 2018.2
Research theme： 3D reconstruction of large-scale static indoor scenes using consumer-grade RGB-D cameras

Keyword： RGB-D camera; SLAM; Depth Fusion; 3D modeling; Camera tracking; Loop closure

Research period： 2012.4 - 2017.4

Awards

MIRU Nagao Award

2024.8 Meeting on Image Recognition and Understanding (MIRU) 3D Shape Modeling with Adaptive Centroidal Voronoi Tesselation on Signed Distance Field

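Diego Thomas (Kyushu University), Jean-Sebastien Franco (INRIA), Edmond Boyer (INRIA)

Award type：Award from Japanese society, conference, symposium, etc. Country：Japan

Reason for award: This research addresses 3D reconstruction from multi-view images and proposes a method based on a new neural field representation that uses adaptive centroidal Voronoi tessellation. Exploiting the representation's ability to directly express object surfaces with arbitrary normals, it achieves high reconstruction accuracy with fewer cells than conventional methods based on axis-aligned discretization. It also proposes implementation refinements such as a fast optimization method and differentiable rendering. The paper is highly accomplished in both idea and implementation and is worthy of the MIRU Nagao Award.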
MIRU Excellent Paper Award

2024.8 Meeting on Image Recognition and Understanding (MIRU) Text-Guided Diverse Scene Interaction Synthesis by Disentangling Actions from Scenes

Hitoshi Teshima (Kyushu University), Naoki Wake (Microsoft), Diego Thomas (Kyushu University), Yuta Nakashima (Osaka University), Hiroshi Kawasaki (Kyushu University), Katsushi Ikeuchi (Microsoft)

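Country：Japan

Reason for award: This research tackles the challenging task of generating motion that involves interaction with objects in a scene. To this end, it proposes a practical approach that can be trained using only existing datasets that contain no scene information: a pipeline that uses existing foundation models to estimate how a key pose, extracted from motion generated from the action instruction alone, connects to the scene, and then generates a trajectory leading to that pose and the resulting motion. Taking on the difficult problem setting of motion generation adapted to the surrounding environment, and confirming the method's effectiveness, deserves high praise.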
Best paper award

2019.11 The 9th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2019) This prize was received after the presentation by Akihiko Sayo about his joint work on "Human shape reconstruction with loose clothes from partially observed data by pose specific deformation"
Best poster presentation award

2019.11 Machine Perception and Robotics (MPR 2019) This prize was obtained after the presentation of Hayato Onizuka about his work on "Regression of 3D body shapes from a single image in a tetrahedral volume"
Outstanding research achievement and contribution to APSCIT 2019 Annual Meeting Invited Presentation

2019.7 Asia Pacific Society for Computing and Information Technology This prize was received after an invited talk at APSCIT 2019 organized in Sapporo.
Best poster award

2019.2 IW-FCV2019 This prize was received after the poster presentation of Maxence Remy about the joint research "Merging SLAM and photometric stereo for 3D reconstruction with a moving camera"
Outstanding reviewer

2015.7 MIRU 2015 Outstanding reviewer
Best student award

2012.3 National Institute of Informatics Best student award at the end of my Ph.D course.

Papers

ProbeSDF: Light Field Probes For Neural Surface Reconstruction Reviewed International coauthorship

B Toussaint, D Thomas, JS Franco

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025) 11026 - 11035 2025.6

Language：English Publishing type：Research paper (international conference proceedings)
Millimetric Human Surface Capture in Minutes Reviewed International coauthorship

Briac Toussaint, Laurence Boissieux, Diego Thomas, Edmond Boyer, Jean-Sébastien Franco

SIGGRAPH Asia 2024 Conference Papers 1 - 12 2024.12

Language：English Publishing type：Research paper (international conference proceedings)
3D Shape Modeling with Adaptive Centroidal Voronoi Tesselation on Signed Distance Field Reviewed

Diego Thomas, Jean-Sébastien Franco, Edmond Boyer

Meeting on Image Recognition and Understanding 2024.8

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (conference, symposium, etc.)
TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell Reviewed International journal

Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020.6

Language：English Publishing type：Research paper (international conference proceedings)

Recovering the 3D shape of a person from its 2D appearance is ill-posed due to ambiguities. Nevertheless, with the help of convolutional neural networks (CNN) and prior knowledge on the 3D human body, it is possible to overcome such ambiguities to recover detailed 3D shapes of human bodies from single images. Current solutions, however, fail to reconstruct all the details of a person wearing loose clothes. This is because of either (a) huge memory requirement that cannot be maintained even on modern GPUs or (b) the compact 3D representation that cannot encode all the details. In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression. Our proposed model is compact, dense, accurate, and yet well suited for CNN-based regression task. Our proposed PCN allows us to learn the distribution of the TSDF in the tetrahedral volume from a single image in an end-to-end manner. Results show that our proposed method allows to reconstruct detailed shapes of humans wearing loose clothes from single RGB images.
Human shape reconstruction with loose clothes from partially observed data by pose specific deformation Reviewed International journal

#Akihiko Sayo, #Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, and Katsushi Ikeuchi

The 9th Pacific-Rim Symposium on Image and Video Technology 2019.11

Language：English Publishing type：Research paper (international conference proceedings)

Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes. Such a model can be fitted to a point cloud of a person wearing loose clothes; however, it cannot represent the detailed shape of loose clothes, such as wrinkles and/or folds. In this paper, we propose a method that reconstructs a 3D model of a full-body human with loose clothes by reproducing the deformations as displacements from the skin-tight body mesh. We take advantage of a statistical shape model as the base shape of the full-body human mesh, and then obtain displacements from the base mesh by non-rigid registration. To efficiently represent such displacements, we use lower dimensional embeddings of the deformations. This enables us to regress the coefficients corresponding to the small number of bases. We also propose a method to reconstruct shape only from a single 3D scanner, which is realized by shape fitting to only visible meshes as well as intra-frame shape interpolation. Our experiments with both unknown scene and partial body scans confirm the reconstruction ability of our proposed method.
Revisiting Depth Image Fusion with Variational Message Passing Reviewed International journal

Diego Thomas, Ekaterina Sirazitdinova, Akihiro Sugimoto, Rin-ichiro Taniguchi

International Conference on 3D Vision 2019 2019.9

Language：English Publishing type：Research paper (international conference proceedings)

The running average approach has long been perceived as the best choice for fusing depth measurements captured by a consumer-grade RGB-D camera into a global 3D model. This strategy, however, assumes exact correspondences between points in a 3D model and points in the captured RGB-D images. Such an assumption does not hold true in many cases because of errors in motion tracking, noise, occlusions, or inconsistent surface sampling during measurements. Accordingly, reconstructed 3D models suffer from unpleasant visual artifacts. In this paper, we revisit the depth fusion problem from a probabilistic viewpoint and formulate it as a probabilistic optimization using variational message passing in a Bayesian network. Our formulation enables us to fuse depth images robustly, accurately, and fast for high quality RGB-D keyframe creation, even if exact point correspondences are not always available. Our formulation also allows us to smoothly combine depth and color information for further improvements without increasing computational cost. The quantitative and qualitative comparative evaluation on built keyframes of indoor scenes shows that our proposed framework achieves promising results for reconstructing accurate 3D models while using low computational power and being robust against misalignment errors without post-processing.
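For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of per-pixel running-average depth fusion. It is not the method of the paper (which replaces this scheme with variational message passing). The function name, the `max_weight` cap, and the use of `None` for missing measurements are assumptions made for illustration only.

```python
def fuse_running_average(depth, weight, new_depth, new_weight=1.0, max_weight=64.0):
    """Fuse one new depth image into a per-pixel weighted running average.

    depth, weight: current fused depth and accumulated weight per pixel.
    new_depth:     new measurement per pixel, or None where no depth was observed.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    fused_depth, fused_weight = [], []
    for d, w, z in zip(depth, weight, new_depth):
        if z is None:
            # No measurement for this pixel: keep the current estimate.
            fused_depth.append(d)
            fused_weight.append(w)
            continue
        total = w + new_weight
        # The weighted mean assumes pixel i observes the same surface point every frame.
        fused_depth.append((w * d + new_weight * z) / total)
        # Capping the weight keeps the model responsive to new data.
        fused_weight.append(min(total, max_weight))
    return fused_depth, fused_weight


if __name__ == "__main__":
    depth, weight = [0.0, 0.0], [0.0, 0.0]
    depth, weight = fuse_running_average(depth, weight, [1.0, None])
    depth, weight = fuse_running_average(depth, weight, [2.0, 4.0])
    print(depth, weight)
```

The averaging step is only valid when each pixel is matched to the same surface point across frames, which is exactly the exact-correspondence assumption the abstract calls into question.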
Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes Reviewed International journal

Hayato Onizuka, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi

Proceedings of the IEEE International Conference on Computer Vision Workshops 2019.9

Language：English Publishing type：Research paper (international conference proceedings)

Blendshape models are commonly used to track and re-target facial expressions to virtual avatars using RGB-D cameras and without using any facial marker. When using blendshape models, the target avatar model must possess a set of key-shapes that can be blended depending on the estimated facial expression. Creating a realistic set of key-shapes is extremely difficult and requires time and professional expertise. As a consequence, blendshape-based re-targeting technology can only be used with a limited number of pre-built avatar models, which is not attractive for the general public. In this paper, we propose an automatic method to easily generate realistic key-shapes of any avatar that map directly to the source blendshape model (the user is only required to select a few facial landmarks on the avatar mesh). By doing so, captured facial motion can be easily re-targeted to any avatar, even when the avatar has largely different shape and topology compared with the source template mesh. Our experimental results show the accuracy of our proposed method compared with the state-of-the-art method for mesh deformation transfer.
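As background for how key-shapes are "blended depending on the estimated facial expression", here is a minimal sketch of the commonly used delta blendshape formulation, where each vertex is the neutral position plus a weighted sum of offsets toward each key-shape. This illustrates the general technique only, not the paper's key-shape generation method. The function name and the list-of-tuples mesh layout are assumptions.

```python
def blend_key_shapes(neutral, key_shapes, weights):
    """Return the blended mesh: neutral + sum_i w_i * (key_shape_i - neutral).

    neutral:    list of (x, y, z) vertices for the neutral expression.
    key_shapes: list of meshes with the same vertex count and order as neutral.
    weights:    one blending weight per key-shape (typically in [0, 1]).
    The data layout is hypothetical, chosen to keep the example self-contained.
    """
    if len(key_shapes) != len(weights):
        raise ValueError("need exactly one weight per key-shape")
    blended = []
    for index, base in enumerate(neutral):
        vertex = list(base)
        for shape, w in zip(key_shapes, weights):
            for axis in range(3):
                # Add the weighted offset of this key-shape from the neutral pose.
                vertex[axis] += w * (shape[index][axis] - base[axis])
        blended.append(tuple(vertex))
    return blended


if __name__ == "__main__":
    neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    smile = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
    print(blend_key_shapes(neutral, [smile], [0.5]))
```

Because the blend is computed per vertex, every key-shape must share the avatar's own vertex count and ordering, which is why each avatar needs its own set of key-shapes.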
Dense 3D reconstruction by combining photometric stereo and key frame-based SLAM with a moving smartphone and its flashlight Reviewed International journal

@Remy Maxence, Hideaki Uchiyama, Hiroshi Kawasaki, Diego Thomas, Vincent Nozick, Hideo Saito

International Conference on 3D vision 2019.9

Language：English Publishing type：Research paper (international conference proceedings)

The standard photometric stereo is a technique to densely reconstruct objects’ surfaces using light variation under the assumption of a static camera with a moving light source. In this work, we use photometric stereo to reconstruct dense 3D scenes while moving the camera and the light together. In such a non-static case, camera poses as well as correspondences between pixels of each frame are required to apply photometric stereo. ORB-SLAM is a technique that can be used to estimate camera poses. To retrieve correspondences, our idea is to start from a sparse 3D mesh obtained with ORB-SLAM and then densify the mesh by a plane sweep method using a multi-view photometric consistency. By combining ORB-SLAM and photometric stereo, it is possible to reconstruct dense 3D scenes with an off-the-shelf smartphone and its embedded torchlight. Note that SLAM systems usually struggle with textureless objects, which is effectively compensated by the photometric stereo in our method. Experiments are conducted to show that our proposed method gives better results than SLAM alone or COLMAP, especially for partially textureless surfaces.
SegmentedFusion: 3D human body reconstruction using stitched bounding boxes Reviewed International journal

Shih Hsuan Yao, Diego Thomas, Akihiro Sugimoto, Shang-Hong Lai, Rin-Ichiro Taniguchi

2018 International Conference on 3D Vision (3DV) 2018.9

Language：English Publishing type：Research paper (international conference proceedings)

This paper presents SegmentedFusion, a method possessing the capability of reconstructing non-rigid 3D models of a human body by using a single depth camera with skeleton information. Our method estimates a dense volumetric 6D motion field that warps the integrated model into the live frame by segmenting a human body into different parts and building a canonical space for each part. The key feature of this work is that a deformed and connected canonical volume for each part is created, and it is used to integrate data. The dense volumetric warp field of one volume is represented efficiently by blending a few rigid transformations. Overall, SegmentedFusion is able to scan a non-rigidly deformed human surface as well as to estimate the dense motion field by using a consumer-grade depth camera. The experimental results demonstrate that SegmentedFusion is robust against fast inter-frame motion and topological changes. Since our method does not require prior assumption, SegmentedFusion can be applied to a wide range of human motions.
Augmented blendshapes for real-time simultaneous 3d head modeling and facial motion capture Reviewed International journal

Diego Thomas, Rin-Ichiro Taniguchi

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016.6

Language：English Publishing type：Research paper (international conference proceedings)

We propose a method to build in real-time animated 3D head models using a consumer-grade RGB-D camera. Our framework is the first one to provide simultaneously comprehensive facial motion tracking and a detailed 3D model of the user’s head. Anyone’s head can be instantly reconstructed and his facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is free to move, talk and change his face expression as he wills otherwise. The facial motion is tracked using a blendshape representation while the fine geometric details are captured using a Bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on-the-fly and in real-time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
Range Image Registration Using a Photometric Metric under Unknown Lighting Reviewed International journal

Diego Thomas, Akihiro Sugimoto

IEEE transactions on pattern analysis and machine intelligence 2013.9

Language：English Publishing type：Research paper (scientific journal)

Based on the spherical harmonics representation of image formation, we derive a new photometric metric for evaluating the correctness of a given rigid transformation aligning two overlapping range images captured under unknown, distant, and general illumination. We estimate the surrounding illumination and albedo values of points of the two range images from the point correspondences induced by the input transformation. We then synthesize the color of both range images using albedo values transferred using the point correspondences to compute the photometric reprojection error. This way allows us to accurately register two range images by finding the transformation that minimizes the photometric reprojection error. We also propose a practical method using the proposed photometric metric to register pairs of range images devoid of salient geometric features, captured under unknown lighting. Our method uses a hypothesize-and-test strategy to search for the transformation that minimizes our photometric metric. Transformation candidates are efficiently generated by employing the spherical representation of each range image. Experimental results using both synthetic and real data demonstrate the usefulness of the proposed metric.
Neural SDF for Shadow-Aware Unsupervised Structured Light International coauthorship

Kazuto Ichimaru, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 287 - 296 2025.2

Language：English Publishing type：Research paper (international conference proceedings)
VortSDF: 3D Modeling with Centroidal Voronoi Tessellation on Signed Distance Field Reviewed International coauthorship

Diego Thomas, Briac Toussaint, Jean-Sebastien Franco, Edmond Boyer

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 495 - 504 2025.2

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings)
Sparse-View 3D Reconstruction of Clothed Humans via Normal Maps Reviewed International coauthorship

Jane Wu, Diego Thomas, Ronald Fedkiw

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 11 - 22 2025.2

Language：English Publishing type：Research paper (international conference proceedings)
Text-Guided Diverse Scene Interaction Synthesis by Disentangling Actions From Scenes

Teshima H., Wake N., Thomas D., Nakashima Y., Kawasaki H., Ikeuchi K.

IEEE Access 13 73818 - 73830 2025

Publisher：IEEE Access

Generating human motion within 3D scenes from textual descriptions remains a challenging task because of the scarcity of hybrid datasets encompassing text, 3D scenes, and motion. Existing approaches suffer from fundamental limitations: a lack of datasets that integrate text, 3D scenes, and motion, and a reliance on end-to-end methods, which constrain the diversity and realism of generated human-scene interactions. In this paper, we propose a novel method to generate motions of humans interacting with objects in a 3D scene given a textual prompt. Our key innovation focuses on decomposing the motion generation task into distinct steps: 1) generating key poses from textual and scene contexts and 2) synthesizing full motion trajectories guided by these key poses and path planning. This approach eliminates the need for hybrid datasets by leveraging independent text-motion and pose datasets, significantly expanding action diversity and overcoming the constraints of prior works. Unlike previous methods, which focus on limited action types or rely on scarce datasets, our approach enables scalable and adaptable motion generation. Through extensive experiments, we demonstrate that our framework achieves unparalleled diversity and contextually accurate motions, advancing the state-of-the-art in human-scene interaction synthesis.

DOI： 10.1109/ACCESS.2025.3562086
VR and computer vision based facade complexity analysis for building design

Jarrin F., Koga Y., Thomas D.

Journal of Asian Architecture and Building Engineering 2025 （ ISSN:13467581 ）

Publisher：Journal of Asian Architecture and Building Engineering

Architectural practice is evolving through digital fabrication, enabling complex designs that challenge the uniformity of barren walls and fully glazed facades that often dominate contemporary streetscapes. This paper addresses the challenge of quantifying complexity in architectural facade design. It asks whether a Virtual Reality (VR) and Computer Vision (CV) approach can effectively measure facade complexity and align with user perceptions. The study employs the Computational Image Complexity Analysis (CICA) system, integrating VR and CV algorithms, to assess reactions to various facade complexities. Results reveal an average standard deviation of 9% between the system’s complexity measurements and participants’ perceptions, with a preference for moderate complexity. These findings highlight the importance of aligning architectural complexity with user preferences to enhance sustainability and satisfaction and the potential of this approach to quantify complexity and guide data-driven building design processes. Future research should explore the long-term impact of complex facades on user well-being and environmental sustainability.

DOI： 10.1080/13467581.2025.2458791
Fast direct multi-person radiance fields from sparse input with dense pose priors

Lima J.P., Uchiyama H., Thomas D., Teichrieb V.

Computers and Graphics Pergamon 124 2024.11 （ ISSN:00978493 ）

Publisher：Computers and Graphics Pergamon

Volumetric radiance fields have been popular in reconstructing small-scale 3D scenes from multi-view images. With additional constraints such as person correspondences, reconstructing a large 3D scene with multiple persons becomes possible. However, existing methods fail for sparse input views or when person correspondences are unavailable. In such cases, the conventional depth image supervision may be insufficient because it only captures the relative position of each person with respect to the camera center. In this paper, we investigate an alternative approach by supervising the optimization framework with a dense pose prior that represents correspondences between the SMPL model and the input images. The core ideas of our approach consist in exploiting dense pose priors estimated from the input images to perform person segmentation and incorporating such priors into the learning of the radiance field. Our proposed dense pose supervision is view-independent, significantly speeding up computational time and improving 3D reconstruction accuracy, with fewer floaters and less noise. We confirm the advantages of our proposed method with extensive evaluation in a subset of the publicly available CMU Panoptic dataset. When training with only five input views, our proposed method achieves an average improvement of 6.1% in PSNR, 3.5% in SSIM, 17.2% in LPIPS (vgg), 19.3% in LPIPS (alex), and 39.4% in training time.

DOI： 10.1016/j.cag.2024.104063
Virtual reality-based site layout planning for building design

Jarrin F., Koga Y., Thomas D., Kawasaki H.

Automation in Construction 167 2024.11 （ ISSN:09265805 ）

Publisher：Automation in Construction

This paper addresses the challenge of integrating optimized design solutions in Site Layout Planning (SLP) through Virtual Reality (VR), questioning how VR simulations can enhance their acceptance. The methodology involves a multi-objective optimization model that evaluates critical factors like earthwork volume, cost, and environmental impact, integrated into a VR framework for interactive participant evaluations. Results show a notable 48.3% increase in decision-making accuracy among participants using VR, highlighting VR's potential to significantly improve comprehension and application of complex data-driven designs in SLP. This underlines the transformative impact of VR on enhancing stakeholder engagement and optimizing design outcomes. Future research will broaden the participant base and further investigate the long-term effects of VR integration in professional environments.

DOI： 10.1016/j.autcon.2024.105690
A Practical Calibration Method for Cameras and Multiple Line-Lasers in Light Sectioning Systems for Underwater Environments Reviewed

Takaki Ikeda, Takafumi Iwaguchi, Diego Thomas, Hiroshi Kawasaki

2024 IEEE International Conference on Image Processing (ICIP) 1602 - 1608 2024.10

Language：English Publishing type：Research paper (international conference proceedings)
Two-stage pose optimization algorithm using color information for underwater SLAM with light-sectioning-based 3D scanning method Reviewed

Takaki Ikeda, Takafumi Iwaguchi, Diego Thomas, Hiroshi Kawasaki

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 10182 - 10189 2024.10

Language：English Publishing type：Research paper (international conference proceedings)
Mean Teacher for Unsupervised Domain Adaptation in Multi-View 3D Pedestrian Detection Reviewed International coauthorship

João Paulo Lima, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

2024 37th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) 1 - 6 2024.9

Language：English Publishing type：Research paper (international conference proceedings)
ActiveNeuS: Neural Signed Distance Fields for Active Stereo Reviewed International journal

Kazuto Ichimaru, Takaki Ikeda, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki

International Conference on 3D Vision 2024.3

Language：English Publishing type：Research paper (international conference proceedings)
Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps International journal

Wu, Jane, Diego Thomas, and Ronald Fedkiw

arXiv 2023.11

Language：English Publishing type：Research paper (scientific journal)
A Two-step Approach for Interactive Animatable Avatars Reviewed International journal

#Takumi Kitamura, Diego Thomas, Hiroshi Kawasaki, Naoya Iwamoto

COMPUTER GRAPHICS INTERNATIONAL 2023 2023.9

Language：English Publishing type：Research paper (international conference proceedings)

We propose a new two-step human body animation technique based on displacement mapping that can learn a detailed deformation space, runs at interactive rates (more than 30 fps) and can be directly integrated into standard animation environments. To achieve real-time animation we employ the template-based approach and model pose-dependent deformations with 2D displacement images. We propose our own template model to facilitate and automate training data preparation. Key to achieving detailed animation with few artifacts is to learn pose-dependent displacements directly in the pose space, without having to predict skinning weights. In order to generalize to totally new motions we employ a two-step approach where the first step contains knowledge about general human motion while the second step contains information about user-specific motion. Our experimental results show that our proposed method can animate an avatar up to 300 times faster than baselines while keeping a similar or even better level of detail.
ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation

Teshima H., Wake N., Thomas D., Nakashima Y., Kawasaki H., Ikeuchi K.

Proceedings of the ACM on Computer Graphics and Interactive Techniques 6 ( 3 ) 2023.8

Publisher：Proceedings of the ACM on Computer Graphics and Interactive Techniques

The recent increase in remote work, online meetings, and tele-operation tasks has made people realize that gestures for avatars and communication robots are more important than previously thought. Gesture is one of the key factors in achieving smooth and natural communication between humans and AI systems and has been intensively researched. Current gesture generation methods are mostly based on deep neural networks using text, audio and other information as input; however, they generate gestures mainly based on audio, which are called beat gestures. Although beat gestures account for more than 70% of actual human gestures, content-based gestures sometimes play an important role in making avatars more realistic and human-like. In this paper, we propose attention-based contrastive learning for text-to-gesture (ACT2G), where generated gestures represent the content of the text by estimating an attention weight for each word of the input text. In the method, since text and gesture features calculated by the attention weight are mapped to the same latent space by contrastive learning, once text is given as input, the network outputs a feature vector which can be used to generate gestures related to the content. A user study confirmed that the gestures generated by ACT2G were better than those of existing methods. In addition, it was demonstrated that a wide variation of gestures could be generated from the same text when creators changed the attention weights.

DOI： 10.1145/3606940

ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation Reviewed International journal

#Teshima Hitoshi, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

ACM on Computer Graphics and Interactive Techniques 6 2023.8

Language：English Publishing type：Research paper (international conference proceedings)
Deep Gesture Generation for Social Robots Using Type-Specific Libraries Reviewed International journal

Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022.10

Language：English Publishing type：Research paper (international conference proceedings)

Body language such as conversational gesture is a powerful way to ease communication. Conversational gestures not only make speech more lively but also carry semantic meaning that helps to stress important information in the discussion. In the field of robotics, giving conversational agents (humanoid robots or virtual avatars) the ability to properly use gestures is critical, yet remains a task of extraordinary difficulty. This is because, given only text as input, there are many possibilities and ambiguities in generating an appropriate gesture. Unlike previous work, we propose a new method that explicitly takes into account the gesture types to reduce these ambiguities and generate human-like conversational gestures. Key to our proposed system is a new gesture database built on the TED dataset that allows us to map a word to one of three types of gestures: “Imagistic” gestures, which express the content …
Self-calibration of multiple-line-lasers based on coplanarity and Epipolar constraints for wide area shape scan using moving camera Reviewed International journal

Genki Nagamatsu, Takaki Ikeda, Takafumi Iwaguchi, Diego Thomas, Jun Takamatsu, Hiroshi Kawasaki

2022 26th International Conference on Pattern Recognition (ICPR) 2022.8

Language：English Publishing type：Research paper (international conference proceedings)

High-precision three-dimensional scanning systems have been intensively researched and developed. Recently, for acquiring large-scale scenes at high density, simultaneous localisation and mapping (SLAM) techniques have been preferred because of their simplicity: a single sensor is moved around freely during 3D scanning. However, to integrate multiple scans, the captured data as well as the position of each sensor must be highly accurate, making these systems difficult to use in environments not accessible to humans, such as underwater, inside the body, or in outer space. In this paper, we propose a new, flexible system with multiple line lasers that reconstructs dense and accurate 3D scenes. The advantages of our proposed system are (1) no need for synchronization or precalibration between the lasers and the camera, and (2) the ability to reconstruct 3D scenes in extreme conditions, such as underwater. We propose a …
3D pedestrian localization using multiple cameras: a generalizable approach Reviewed International journal

João Paulo Lima, Rafael Roberto, Lucas Figueiredo, Francisco Simões, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

Machine Vision and Applications 2022.7

Language：English Publishing type：Research paper (scientific journal)

Pedestrian detection is a critical problem in many areas, such as smart cities, surveillance, monitoring, autonomous driving, and robotics. AI-based methods have made tremendous progress in the field in the last few years, but good performance is limited to data that match the training datasets. We present a multi-camera 3D pedestrian detection method that does not need to be trained using data from the target scene. The core idea of our approach consists in formulating consistency in multiple views as a graph clique cover problem. We estimate pedestrian ground location on the image plane using a novel method based on human body poses and person’s bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the ground plane and fuse them with a new formulation of a clique cover problem from graph theory. We propose a new vertex ordering strategy to define fusion …
Generalizable Online 3D Pedestrian Tracking with Multiple Cameras Reviewed International journal

Victor Lyra, Isabella de Andrade, Joao Paulo Lima, Rafael Roberto, Lucas Figueiredo, Joao Marcelo Teixeira, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) 2022.3

Language：English Publishing type：Research paper (international conference proceedings)

3D pedestrian tracking using multiple cameras is still a challenging task with many applications such as surveillance, behavioral analysis, statistical analysis, and more. Many of the existing tracking solutions involve training the algorithms on the target environment, which requires extensive time and effort. We propose an online 3D pedestrian tracking method for multi-camera environments based on a generalizable detection solution that does not require training with data of the target scene. We establish temporal relationships between people detected in different frames by using a combination of a graph matching algorithm and a Kalman filter. Our proposed method obtained a MOTA and MOTP of 77.1% and 96.4%, respectively, on the test split of the public WILDTRACK dataset. Such results correspond to an improvement of approximately 3.4% and 22.2%, respectively, compared to the best existing online technique. Our experiments also demonstrate the advantages of using appearance information to improve the tracking performance.
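The Kalman filter mentioned above can be sketched for a single ground-plane coordinate. The abstract does not give the state model or noise settings, so the constant-velocity model, the class name, and all noise values below are assumptions for illustration only; a tracker would typically run one such filter per axis and per person.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements.

    Assumption: constant-velocity motion with hand-picked noise values.
    """

    def __init__(self, position, velocity=0.0, variance=1.0,
                 process_noise=1e-2, measurement_noise=1e-1):
        self.x = [position, velocity]
        self.P = [[variance, 0.0], [0.0, variance]]
        self.q = process_noise
        self.r = measurement_noise

    def predict(self, dt=1.0):
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance, with H = [1, 0]
        k0, k1 = a / s, c / s       # Kalman gain
        y = z - self.x[0]           # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

# A target that moves one unit per frame, observed without noise.
kf = ConstantVelocityKalman1D(position=0.0)
for t in range(1, 21):
    kf.predict(dt=1.0)
    kf.update(float(t))
```

After a few frames the filter's velocity estimate settles near one unit per frame, so its predicted position can be used to associate the track with the next detection.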
Refining OpenPose With a New Sports Dataset for Robust 2D Pose Estimation Reviewed International journal

Takumi Kitamura, Hitoshi Teshima, Diego Thomas, Hiroshi Kawasaki

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2022.1

Language：English Publishing type：Research paper (international conference proceedings)

3D marker-less motion capture can be achieved by triangulating 2D poses estimated in multiple views. However, when the 2D pose estimation fails, the 3D motion capture also fails. This is particularly challenging for athletes' sports performances, which involve extreme poses. In extreme poses (such as having the head down), state-of-the-art 2D pose estimators such as OpenPose do not work at all. In this paper, we propose a new method to improve the training of 2D pose estimators for extreme poses by leveraging a new sports dataset and our proposed data augmentation strategy. Our results show significant improvements over previous methods for 2D pose estimation of athletes performing acrobatic moves, while keeping state-of-the-art performance on standard datasets.
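To make the triangulation step concrete, here is a minimal two-view sketch. The abstract does not say which triangulation method is used, so the midpoint method below is a stand-in chosen for its closed form; it assumes each 2D joint has already been back-projected to a viewing ray (camera centre plus direction), and every name is hypothetical.

```python
def triangulate_midpoint(center_1, dir_1, center_2, dir_2):
    """Return the 3D point midway between the closest points of two viewing rays.

    Assumption: rays are given as (camera centre, direction); returns None for
    (near-)parallel rays, where the joint cannot be triangulated.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w = [p - q for p, q in zip(center_1, center_2)]
    a, b, c = dot(dir_1, dir_1), dot(dir_1, dir_2), dot(dir_2, dir_2)
    d, e = dot(dir_1, w), dot(dir_2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom   # parameter along ray 1
    t = (a * e - b * d) / denom   # parameter along ray 2
    on_ray_1 = [p + s * q for p, q in zip(center_1, dir_1)]
    on_ray_2 = [p + t * q for p, q in zip(center_2, dir_2)]
    return tuple((u + v) / 2.0 for u, v in zip(on_ray_1, on_ray_2))

# Two cameras one unit apart, both seeing a joint located at (0.5, 0, 2).
joint = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.0, 2.0),
                             (1.0, 0.0, 0.0), (-0.5, 0.0, 2.0))
parallel = triangulate_midpoint((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                                (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

If a 2D detector misplaces a joint in one view, the corresponding ray shifts and the triangulated point moves with it, which is why 2D failures in extreme poses propagate directly into the 3D result.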
Integration of gesture generation system using gesture library with DIY robot design kit Reviewed International journal

Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, David Baumert, Hiroshi Kawasaki, Katsushi Ikeuchi

2022 IEEE/SICE International Symposium on System Integration (SII) 2022.1

Language：English Publishing type：Research paper (international conference proceedings)
3D pedestrian localization using multiple cameras: a generalizable approach.

João Paulo Lima, Rafael Roberto, Lucas Figueiredo, Francisco Simões, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

Machine Vision and Applications 33 ( 4 ) 61 - 61 2022 （ ISSN:09328092 ）

Publishing type：Research paper (scientific journal) Publisher：Machine Vision and Applications

Pedestrian detection is a critical problem in many areas, such as smart cities, surveillance, monitoring, autonomous driving, and robotics. AI-based methods have made tremendous progress in the field in the last few years, but good performance is limited to data that match the training datasets. We present a multi-camera 3D pedestrian detection method that does not need to be trained using data from the target scene. The core idea of our approach consists in formulating consistency in multiple views as a graph clique cover problem. We estimate pedestrian ground location on the image plane using a novel method based on human body poses and person’s bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the ground plane and fuse them with a new formulation of a clique cover problem from graph theory. We propose a new vertex ordering strategy to define fusion priority based on both detection distance and vertex degree. We also propose an optional step for exploiting pedestrian appearance during fusion by using a domain-generalizable person re-identification model. Finally, we compute the final 3D ground coordinates of each detected pedestrian with a method based on keypoint triangulation. We evaluated the proposed approach on the challenging WILDTRACK and MultiviewX datasets. Our proposed method significantly outperformed state of the art in terms of generalizability. It obtained a MODA that was approximately 15% and 2% better than the best existing generalizable detection technique on WILDTRACK and MultiviewX, respectively.

DOI： 10.1007/s00138-022-01323-9

Generalizable Online 3D Pedestrian Tracking with Multiple Cameras.

Victor Gouveia de M. Lyra, Isabella de Andrade, João Paulo Lima, Rafael Roberto, Lucas Figueiredo, João Marcelo X. N. Teixeira, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 5 820 - 827 2022 （ ISSN:21845921 ISBN:9789897585555 ）

Publishing type：Research paper (international conference proceedings) Publisher：SCITEPRESS

3D pedestrian tracking using multiple cameras is still a challenging task with many applications such as surveillance, behavioral analysis, statistical analysis, and more. Many of the existing tracking solutions involve training the algorithms on the target environment, which requires extensive time and effort. We propose an online 3D pedestrian tracking method for multi-camera environments based on a generalizable detection solution that does not require training with data of the target scene. We establish temporal relationships between people detected in different frames by using a combination of a graph matching algorithm and a Kalman filter. Our proposed method obtained a MOTA and MOTP of 77.1% and 96.4%, respectively, on the test split of the public WILDTRACK dataset. Such results correspond to an improvement of approximately 3.4% and 22.2%, respectively, compared to the best existing online technique. Our experiments also demonstrate the advantages of using appearance information to improve the tracking performance.

DOI： 10.5220/0010842800003124

Other Link： https://dblp.uni-trier.de/db/conf/visapp/visapp2022-2.html#LyraALRFTTUT22
Unsupervised Multi-view Multi-person 3D Pose Estimation Using Reprojection Error

De Franca Silva, DW; Do Monte Lima, JPS; Macedo, D; Zanchettin, C; Thomas, DGF; Uchiyama, H; Teichrieb, V

Artificial Neural Networks and Machine Learning - ICANN 2022, Part III 13531 482 - 494 2022 （ ISSN:0302-9743 ISBN:978-3-031-15933-6 eISSN:1611-3349 ）

Publisher：Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

This work addresses multi-view multi-person 3D pose estimation in synchronized and calibrated camera views. Recent approaches estimate neural network weights in a supervised way; they rely on ground truth annotated datasets to compute the loss function and optimize the weights in the network. However, manually labeling ground truth datasets is labor-intensive, expensive, and prone to errors. Consequently, it is preferable not to rely heavily on labeled datasets. This work proposes an unsupervised approach to estimating 3D human poses requiring only an off-the-shelf 2D pose estimation method and the intrinsic and extrinsic camera parameters. Our approach uses reprojection error as a loss function instead of comparing the predicted 3D pose with the ground truth. First, we estimate the 3D pose of each person using the plane sweep stereo approach, in which the depth of each 2D joint related to each person is estimated in a selected target view. The estimated 3D pose is then projected onto each of the other views using camera parameters. Finally, the 2D reprojection error in the image plane is computed by comparing it with the estimated 2D pose corresponding to the same person. The 2D poses that correspond to the same person are identified using virtual depth planes, where each 3D pose is projected onto the reference view and compared to find the nearest 2D pose. Our proposed method learns to estimate 3D pose in an end-to-end unsupervised manner and does not require any manual parameter tuning, yet we achieved results close to state-of-the-art supervised methods on a public dataset. Our method scores only 5.8 percentage points below the fully supervised state-of-the-art method and only 5.1 percentage points below the best geometric approach on the Campus dataset.
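The reprojection-error loss described above can be sketched as follows. This is only an illustration under stated assumptions: a pinhole camera with a single focal length and no lens distortion, a mean Euclidean pixel distance as the error, and hypothetical function and parameter names; the paper's exact loss and camera model are not given in the abstract.

```python
import math

def project(point_3d, focal, principal, rotation, translation):
    """Pinhole projection of one 3D point: world -> camera -> pixel coordinates."""
    cam = [sum(rotation[r][c] * point_3d[c] for c in range(3)) + translation[r]
           for r in range(3)]
    u = focal * cam[0] / cam[2] + principal[0]
    v = focal * cam[1] / cam[2] + principal[1]
    return (u, v)

def reprojection_loss(joints_3d, joints_2d, camera):
    """Mean pixel distance between projected 3D joints and detected 2D joints."""
    total = 0.0
    for joint, (u_obs, v_obs) in zip(joints_3d, joints_2d):
        u, v = project(joint, **camera)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(joints_3d)

# Hypothetical camera at the world origin looking down the +z axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera = {"focal": 500.0, "principal": (320.0, 240.0),
          "rotation": identity, "translation": [0.0, 0.0, 0.0]}
joints_3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
perfect_2d = [(320.0, 240.0), (570.0, 240.0)]
shifted_2d = [(323.0, 244.0), (570.0, 240.0)]
loss_perfect = reprojection_loss(joints_3d, perfect_2d, camera)
loss_shifted = reprojection_loss(joints_3d, shifted_2d, camera)
```

Because the loss only compares projections with 2D detections, it needs no 3D ground truth, which is what makes the training unsupervised.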

DOI： 10.1007/978-3-031-15934-3_40

Self-calibrated dense 3D sensor using multiple cross line-lasers based on light sectioning method and visual odometry Reviewed International journal

Genki Nagamatsu, Jun Takamatsu, Takafumi Iwaguchi, Diego Thomas, Hiroshi Kawasaki

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021.9

Language：English Publishing type：Research paper (international conference proceedings)
PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation Reviewed International journal

Akihiko Sayo, Diego Thomas, Hiroshi Kawasaki, Yuta Nakashima, Katsushi Ikeuchi

2021 IEEE International Conference on Image Processing (ICIP) 2021.9

Language：English Publishing type：Research paper (international conference proceedings)

We propose a new 2D pose refinement network that learns to predict the human bias in the estimated 2D pose. There are biases in 2D pose estimations that are due to differences between annotations of 2D joint locations based on annotators’ perception and those defined by motion capture (MoCap) systems. These biases are crafted into publicly available 2D pose datasets and cannot be removed with existing error reduction approaches. Our proposed pose refinement network allows us to efficiently remove the human bias in the estimated 2D poses and achieve highly accurate multi-view 3D human pose estimation.
Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video Reviewed International journal

#Cheng Sun, Diego Thomas, Hiroshi Kawasaki

2020 25th International Conference on Pattern Recognition (ICPR) 5959 - 5964 2021.1

Language：English Publishing type：Research paper (international conference proceedings)
Analysis and classification of gestures in TED talks Reviewed International journal

Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

IEICE Technical Report; IEICE Tech. Rep. 2020.10

Language：English Publishing type：Research paper (international conference proceedings)
On-the-fly Extrinsic Calibration of Non-Overlapping in-Vehicle Cameras based on Visual SLAM under 90-degree Backing-up Parking Reviewed International journal

Kazuki Nishiguchi, Hideaki Uchiyama, Kazutaka Hayakawa, Jun Adachi, Diego Thomas, Atsushi Shimada, Rin-Ichiro Taniguchi

2020 IEEE Intelligent Vehicles Symposium (IV) 2020.10

Language：English Publishing type：Research paper (international conference proceedings)
Real-time Simultaneous 3D Head Modeling and Facial Motion Capture with an RGB-D camera International journal

Diego Thomas

arXiv preprint arXiv:2004.10557 2020.4

Language：English Publishing type：Research paper (scientific journal)
Generating a consistent global map under intermittent mapping conditions for large-scale vision-based navigation Reviewed International journal

Kazuki Nishiguchi, Walid Bousselham, Hideaki Uchiyama, Diego Thomas, Atsushi Shimada, Rin Ichiro Taniguchi

15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2020 2020.1

Language：English Publishing type：Research paper (international conference proceedings)

Localization is the process of computing sensor poses based on vision technologies such as visual Simultaneous Localization And Mapping (vSLAM). It can generally be applied to navigation systems. To achieve this, a global map is essential because the relocalization process requires a single consistent map represented in a unified coordinate system. However, a large-scale global map cannot be created at once due to insufficient visual features at some moments. This paper presents an interactive method to generate a consistent global map from intermittent maps created independently by vSLAM, using global reference points. First, vSLAM is applied to individual image sequences to create maps independently. At the same time, multiple reference points with known latitude and longitude are interactively recorded in each map. Then, the coordinate system of each individual map is converted into one that has metric scale and unified axes using the reference points. Finally, the individual maps are merged into a single map based on the relative position of each origin. In the evaluation, we show the result of map merging and relocalization with our dataset to confirm the effectiveness of our method for navigation tasks. In addition, we report on our participation in a navigation competition in a practical environment.
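The coordinate conversion step can be illustrated with a small 2D sketch. It assumes the reference points have already been converted from latitude and longitude to local planar coordinates in metres, and it fits a similarity transform (scale, rotation, translation) from exactly two reference points; the paper may use more points and a different estimator, and all names below are hypothetical.

```python
def fit_similarity_2d(src_a, src_b, dst_a, dst_b):
    """Fit dst = a * src + b from two point pairs given as complex numbers.

    The complex factor a encodes scale (abs(a)) and rotation (its phase);
    b is the translation. Assumption: src_a and src_b are distinct points.
    """
    a = (dst_b - dst_a) / (src_b - src_a)
    b = dst_a - a * src_a
    return a, b

def apply_similarity(transform, point):
    """Map a point from the vSLAM map frame into the unified metric frame."""
    a, b = transform
    return a * point + b

# Two reference points recorded in an arbitrary-scale vSLAM map ...
map_ref_1, map_ref_2 = complex(0.0, 0.0), complex(1.0, 0.0)
# ... and the same points in a local metric frame (hypothetical values, in metres).
metric_ref_1, metric_ref_2 = complex(10.0, 5.0), complex(10.0, 7.0)

transform = fit_similarity_2d(map_ref_1, map_ref_2, metric_ref_1, metric_ref_2)
scale = abs(transform[0])
converted = apply_similarity(transform, complex(0.0, 1.0))
```

Once every individual map has been converted this way, all maps share the same axes and metric scale, so merging reduces to placing each map at its own origin in the common frame.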
Regression of 3D human body shapes from a single image in a tetrahedral volume International journal

#Hayato Onizuka, Diego Thomas, @Zehra Hayirci, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

The 15th Joint Workshop on Machine Perception and Robotics 2019.11

Language：English Publishing type：Research paper (bulletin of university, research institution)

Reconstructing a 3D shape from a single 2D image is an ill-posed problem. This is because different 3D shapes may produce the same 2D image. Nevertheless, under some conditions and with the help of deep neural networks (DNN), approximate solutions can be obtained. The recent advances in convolutional neural networks (CNNs) for 3D object shape reconstruction from a single image are particularly thrilling for the case of 3D human body shape retrieval. The 3D human body has been extensively studied and modelled using standard computer vision techniques, which give us a sufficient amount of prior knowledge to constrain the 3D shape recovery problem using DNN. Current solutions, however, fail to reconstruct the fine details of the body due to a required huge amount of memory that cannot be maintained even on modern GPUs. In this paper, we propose the tetrahedral volumetric truncated signed distance function (TSDF) model for the human body, and its corresponding part connection network (PCN) for detailed shape regression. Our proposed 3D representation requires a low amount of memory and allows us to reconstruct detailed shapes from a single RGB image. Experimental results using real data demonstrate that our proposed method is promising.
Real-Time Facial Motion Capture Using RGB-D Images Under Complex Motion and Occlusions Reviewed International journal

@Joao Otavio de Lucena, Joao Paulo Lima, Diego Thomas, Veronica Teichrieb

The Symposium on Virtual and Augmented Reality 2019 2019.10

Language：English Publishing type：Research paper (international conference proceedings)

We present a technique for capturing facial performance in real time using an RGB-D camera. Such a method can be applied to face augmentation by leveraging facial expression changes. The technique is able to perform both 3D facial modeling and facial motion tracking without the need for pre-scanning or training for a specific user.
The proposed approach builds on an existing method that we refer to as FaceCap, which uses a blendshape representation and a Bump image for tracking facial motion and capturing geometric details. The original FaceCap algorithm fails in some scenarios with complex motion and occlusions, mainly due to problems in the face detection and tracking steps. FaceCap also has problems with the Bump image filtering step, which generates outliers, causing more distortion in the 3D augmented blendshape.
In order to solve these problems, we propose two refinements: (a) a new framework for face detection and landmark localization based on the state-of-the-art methods MTCNN and CE-CLM, respectively; and (b) a simple but effective modification in the filtering step, removing reconstruction failures in the eye region.
Experiments showed that the proposed approach can deal with unconstrained scenarios, such as large head pose variations and partial occlusions, while achieving real-time execution.
MeRA: An Interactive Mediated Reality Agent for Educational Application Reviewed International journal

@Guillaume Quiniou, Frederic Rayar, Diego Thomas

International Symposium on Mixed and Augmented Reality | ISMAR 2019 2019.10

Language：English Publishing type：Research paper (international conference proceedings)

The recent developments of Mixed Reality devices and advances in 3D scene understanding and mapping unlock new possibilities for richer interactions among users, the surrounding 3D environment, and virtual agents. In this work, we present MeRA: an interactive Mediated Reality agent for ludic and educational applications. The agent evolves in a mediated tabletop environment and can help the user to learn, play, or create Tangram, a jigsaw-like traditional game. This opens exciting new perspectives for educational support of young children, who require active and human-like interactions.
Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes Reviewed International journal

#Hayato Onizuka, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi

The 2nd Workshop on 3D Reconstruction in the Wild (3DRW2019) in conjunction with ICCV2019 2019.10

Language：English Publishing type：Research paper (international conference proceedings)

Blendshape models are commonly used to track and re-target facial expressions to virtual avatars using RGB-D cameras and without using any facial markers. When using blendshape models, the target avatar model must possess a set of key-shapes that can be blended depending on the estimated facial expression. Creating a realistic set of key-shapes is extremely difficult and requires time and professional expertise. As a consequence, blendshape-based re-targeting technology can only be used with a limited number of pre-built avatar models, which is not attractive for the general public. In this paper, we propose an automatic method to easily generate realistic key-shapes of any avatar that map directly to the source blendshape model (the user is only required to select a few facial landmarks on the avatar mesh). By doing so, captured facial motion can be easily re-targeted to any avatar, even when the avatar has a largely different shape and topology compared with the source template mesh. Our experimental results show the accuracy of our proposed method compared with the state-of-the-art method for mesh deformation transfer.
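For readers unfamiliar with blendshapes, the blending of key-shapes mentioned above can be sketched as a weighted sum of per-vertex offsets from a neutral mesh. This shows only the standard delta-blendshape formula, not the key-shape generation method proposed in the paper; the function name and the tiny two-vertex meshes are hypothetical.

```python
def blend(neutral, key_shapes, weights):
    """Blend key-shapes: vertex = neutral + sum_k w_k * (key_shape_k - neutral).

    neutral and each key-shape are lists of (x, y, z) vertices with identical
    ordering; weights holds one blending coefficient per key-shape.
    """
    blended = []
    for i, base in enumerate(neutral):
        vertex = list(base)
        for shape, weight in zip(key_shapes, weights):
            for axis in range(3):
                vertex[axis] += weight * (shape[i][axis] - base[axis])
        blended.append(tuple(vertex))
    return blended

# Hypothetical two-vertex mesh with a single "smile" key-shape.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smile = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
half_smile = blend(neutral, [smile], [0.5])
```

Re-targeting then amounts to applying the weights estimated on the source model to the avatar's own key-shapes, which is why the avatar needs a key-shape for every expression in the source blendshape model.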
Blended-Keyframes for Mobile Mediated Reality Applications Reviewed International journal

#Yu Xue, Diego Thomas, Frederic Rayar, Hideaki Uchiyama, Rin-ichiro Taniguchi, Baocai Yin

IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2019.10

Language：English Publishing type：Research paper (international conference proceedings)

With the recent developments of Mixed Reality (MR) devices and advances in 3D scene understanding, MR applications on mobile devices are becoming available to a large part of society. These applications allow users to mix virtual content into the surrounding environment. However, the ability to mediate (i.e., modify or alter) the surrounding environment remains a difficult and unsolved problem that limits the degree of immersion of current MR applications on mobile devices. In this paper, we present a method to mediate 2D views of a real environment using a single consumer-grade RGB-D camera and without the need for pre-scanning the scene. Our proposed method creates, in real time, a dense and detailed keyframe-based 3D map of the real scene and takes advantage of semantic instance segmentation to isolate target objects. We show that our proposed method makes it possible to remove target objects from the environment and to replace them with their virtual counterparts, which are built on the fly. Such an approach is well suited for creating mobile Mediated Reality applications.
An Application for Next-Generation Education Using Mediated Reality

#Xue Yu, Diego Thomas, Frederic Rayar, Hideaki Uchiyama, Yin Baocai, Rin-ichiro Taniguchi

The 22nd Meeting on Image Recognition and Understanding (MIRU2019) 2019.8

Language：Japanese Publishing type：Research paper (other academic)

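In recent years, research has been conducted on Mediated Reality devices and on 3D scene understanding related to 3D reconstruction. This has made new means of interaction among virtual agents, people, and the surrounding environment possible. In this paper, we introduce a mediated reality agent and propose a method for building a mediated reality environment on a mobile device using a commercially available handheld RGB-D camera.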
3D Body and Background Reconstruction in a Large-scale Indoor Scene using Multiple Depth Cameras Reviewed International journal

Daisuke Kobayashi, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi

2019 12th Asia Pacific Workshop on Mixed and Augmented Reality (APMAR) 2019.3

Language：English Publishing type：Research paper (international conference proceedings)

3D reconstruction of indoor scenes that contain a non-rigidly moving human body using depth cameras is a task of extraordinary difficulty. Despite intensive efforts from researchers in the 3D vision community, existing methods are still limited to reconstructing small-scale scenes. This is because of the difficulty of tracking the camera motion when a target person moves in a totally different direction. Due to the narrow field of view (FoV) of consumer-grade red-green-blue-depth (RGB-D) cameras, a target person (generally placed about 2-3 meters from the camera) covers most of the FoV of the camera. Therefore, there are not enough features from the static background to track the motion of the camera. In this paper, we propose a system that reconstructs a moving human body and the background of an indoor scene using multiple depth cameras. Our system is composed of three Kinects that are set approximately on the same line and facing the same direction so that their FoVs do not overlap (to avoid interference). Owing to this setup, we capture images of a person moving in a large-scale indoor scene. The three Kinect cameras are calibrated with a robust method that uses three large non-parallel planes. A moving person is detected by using human skeleton information and is reconstructed separately from the static background. By separating the human body and the background, static 3D reconstruction can be adopted for the static background area while a method specialized for the human body area can be used to reconstruct the 3D model of the moving person. The experimental results show the performance of the proposed system for a human body in a large-scale indoor scene.
Solving monocular visual odometry scale factor with adaptive step length estimates for pedestrians using handheld devices Reviewed International journal

Nicolas Antigny, Hideaki Uchiyama, Myriam Servières, Valérie Renaudin, Diego Thomas, Rin-ichiro Taniguchi

MDPI Sensors 2019.1

Language：English Publishing type：Research paper (scientific journal)

Urban environments are challenging areas for handheld device pose estimation (i.e., 3D position and 3D orientation) over large displacements. It is even more challenging with the low-cost sensors and computational resources that are available in pedestrian mobile devices (i.e., a monocular camera and an Inertial Measurement Unit). To address these challenges, we propose a continuous pose estimation based on monocular Visual Odometry. To solve the scale ambiguity and suppress the scale drift, an adaptive pedestrian step length estimation is used for the displacements on the horizontal plane. To complete the estimation, a handheld equipment height model, defined with respect to the Digital Terrain Model contained in Geographical Information Systems, is used for the displacement on the vertical axis. In addition, an accurate pose estimation based on the recognition of known objects is occasionally used to correct the pose estimate and reset the monocular Visual Odometry. To validate the benefit of our framework, experimental data have been collected on a 0.7 km pedestrian path in an urban environment for various people. The proposed solution achieves a positioning error of 1.6–7.5% of the walked distance and confirms the benefit of using an adaptive step length compared to a fixed step length.
Incremental 3D Cuboid Modeling with Drift Compensation Reviewed International journal

Masashi Mishima, Hideaki Uchiyama, Diego Thomas, Rin-ichiro Taniguchi, Rafael Roberto, João Lima, Veronica Teichrieb

MDPI Sensors 2019.1

Language：English Publishing type：Research paper (scientific journal)

This paper presents a framework for incremental 3D cuboid modeling that uses the mapping results of an RGB-D camera based simultaneous localization and mapping (SLAM) system. This framework is useful for accurately creating cuboid CAD models from a point cloud in an online manner. While performing the RGB-D SLAM, planes are incrementally reconstructed from a point cloud in each frame to create a plane map. Then, cuboids are detected in the plane map by analyzing the positional relationships between the planes, such as orthogonality, convexity, and proximity. Finally, the position, pose, and size of a cuboid are determined by computing the intersection of three perpendicular planes. To suppress false detections, the cuboid shapes are incrementally updated with sequential measurements to check the uncertainty of the cuboids. In addition, the drift error of the SLAM is compensated by the registration of the cuboids. As an application of our framework, an augmented reality-based interactive cuboid modeling system was developed. In the evaluation in cluttered environments, the precision and recall of the cuboid detection were investigated and compared with a batch-based cuboid detection method to clarify the advantages of our proposed method.
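The intersection of three planes used to locate a cuboid corner reduces to a 3x3 linear system. The sketch below solves it with Cramer's rule; it assumes each plane is given as a normal n and an offset d with n . x = d, which is a representation chosen here for illustration rather than one stated in the abstract, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def intersect_three_planes(planes):
    """Return the common point of three planes (normal, offset), or None.

    Each plane satisfies normal . x = offset. None is returned when the
    normals are (near-)linearly dependent, so no unique corner exists.
    """
    normals = [list(normal) for normal, _ in planes]
    offsets = [offset for _, offset in planes]
    det = det3(normals)
    if abs(det) < 1e-9:
        return None
    point = []
    for col in range(3):
        # Cramer's rule: replace one column of the normal matrix by the offsets.
        m = [row[:] for row in normals]
        for r in range(3):
            m[r][col] = offsets[r]
        point.append(det3(m) / det)
    return tuple(point)

# Three mutually perpendicular planes x = 1, y = 2, z = 3 meet at one corner.
corner = intersect_three_planes([((1.0, 0.0, 0.0), 1.0),
                                 ((0.0, 1.0, 0.0), 2.0),
                                 ((0.0, 0.0, 1.0), 3.0)])
degenerate = intersect_three_planes([((1.0, 0.0, 0.0), 1.0),
                                     ((1.0, 0.0, 0.0), 2.0),
                                     ((0.0, 0.0, 1.0), 3.0)])
```

For perpendicular planes the system is well conditioned, so the corner is stable; the three normals then also give the cuboid's orientation.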
Indoor Positioning System Based on Chest-Mounted IMU Reviewed International journal

Chuanhua Lu, Hideaki Uchiyama, Diego Thomas, Atsushi Shimada, Rin-ichiro Taniguchi

MDPI Sensors 2019.1

Language：English Publishing type：Research paper (scientific journal)

Demand for indoor navigation systems has been rapidly increasing with regard to location-based services. As a cost-effective choice, inertial measurement unit (IMU)-based pedestrian dead reckoning (PDR) systems have been developed for years because they do not require external devices to be installed in the environment. In this paper, we propose a PDR system based on a chest-mounted IMU as a novel installation position for body-suit-type systems. Since the IMU is mounted on a part of the upper body, the framework of the zero-velocity update cannot be applied because there are no periodic moments of zero velocity. Therefore, we propose a novel regression model that estimates step lengths from accelerations alone, so that step displacement can be computed correctly from the IMU data acquired at the chest. In addition, we integrated an efficient map-matching algorithm based on particle filtering into our system to improve positioning and heading accuracy. Since our system was designed for 3D navigation, in which position is estimated in a multifloor building, we used a barometer to update pedestrian altitude, and the components of our map are designed to explicitly represent building-floor information. With our complete PDR system, we were awarded second place among 10 teams in the IPIN 2018 Competition Track 2, achieving a mean error of 5.2 m after the 800 m walking event.
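The core position update of pedestrian dead reckoning can be sketched as follows. This is only an illustration of the general technique, not the system described above: the step lengths and headings are taken as given inputs (the regression model and particle filter are not reproduced), and the heading convention is an assumption.

```python
import math


def dead_reckon(start, steps):
    """Accumulate (step_length_m, heading_rad) pairs into a 2D track.
    Heading is measured counter-clockwise from the +x axis (assumed convention)."""
    x, y = start
    track = [(x, y)]
    for length, heading in steps:
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        track.append((x, y))
    return track


# Hypothetical walk: two steps along +x, then one step along +y.
track = dead_reckon((0.0, 0.0), [(0.7, 0.0), (0.7, 0.0), (0.6, math.pi / 2)])
```

Because every step is added to the previous position, errors in step length or heading accumulate, which is why PDR systems commonly add corrections such as map matching.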
FusionMLS: Highly dynamic 3D reconstruction with consumer-grade RGB-D cameras Reviewed International journal

Siim Meerits, Diego Thomas, Vincent Nozick, Hideo Saito

Computational Visual Media 2018.12

Language：English Publishing type：Research paper (scientific journal)

Multi-view dynamic three-dimensional reconstruction has typically required the use of custom shutter-synchronized camera rigs in order to capture scenes containing rapid movements or complex topology changes. In this paper, we demonstrate that multiple unsynchronized low-cost RGB-D cameras can be used for the same purpose. To alleviate issues caused by unsynchronized shutters, we propose a novel depth frame interpolation technique that allows synchronized data capture from highly dynamic 3D scenes. To manage the resulting huge number of input depth images, we also introduce an efficient moving least squares-based volumetric reconstruction method that generates triangle meshes of the scene. Our approach does not store the reconstruction volume in memory, making it memory-efficient and scalable to large scenes. Our implementation is completely GPU based and works in real time. The results shown herein, obtained with real data, demonstrate the effectiveness of our proposed method and its advantages compared to state-of-the-art approaches.
Sparse cost volume for efficient stereo matching Reviewed International journal

Chuanhua Lu, Hideaki Uchiyama, Diego Thomas, Atsushi Shimada, Rin-ichiro Taniguchi

MDPI Remote Sensing 2018.11

Language：English Publishing type：Research paper (scientific journal)

Stereo matching has been solved as a supervised learning task with convolutional neural networks (CNNs). However, CNN-based approaches generally require a large amount of memory. In addition, it is still challenging to find correct correspondences between images in ill-posed dim regions and regions with sensor noise. To solve these problems, we propose the Sparse Cost Volume Net (SCV-Net), which achieves high accuracy, low memory cost, and fast computation. The idea of the cost volume for stereo matching was initially proposed in GC-Net. In our work, by making the cost volume compact and proposing an efficient similarity evaluation for the volume, we achieved faster stereo matching while improving the accuracy. Moreover, we propose to use weight normalization instead of the commonly used batch normalization for stereo matching tasks. This improves robustness not only to sensor noise in images but also to the batch size used in training. We evaluated our proposed network on the Scene Flow and KITTI 2015 datasets, where its overall performance surpasses that of GC-Net. Compared with GC-Net, our SCV-Net (1) reduces GPU memory cost by 73.08%, (2) reduces processing time by 61.11%, and (3) improves the 3PE from 2.87% to 2.61% on the KITTI 2015 dataset.
RGB-D SLAM based incremental cuboid modeling Reviewed International journal

Masashi Mishima, Hideaki Uchiyama, Diego Thomas, Rin-ichiro Taniguchi, Rafael Roberto, Veronica Teichrieb

The European Conference on Computer Vision (ECCV) workshops, 2018 2018.9

Language：English Publishing type：Research paper (international conference proceedings)

This paper presents a framework for incremental 3D cuboid modeling combined with RGB-D SLAM. While performing RGB-D SLAM, planes are incrementally reconstructed from point clouds. Then, cuboids are detected in the planes by analyzing the positional relationships between the planes: orthogonality, convexity, and proximity. Finally, the position, pose, and size of a cuboid are determined by computing the intersection of three perpendicular planes. In addition, the cuboid shapes are incrementally updated with sequential measurements to suppress false detections. As an application of our framework, an augmented reality based interactive cuboid modeling system is introduced. In an evaluation in a cluttered environment, the precision and recall of the cuboid detection are improved with our framework, owing to stable plane detection, compared with a batch-based method.
Live structural modeling using RGB-D SLAM Reviewed International journal

Nicolas Olivier, Hideaki Uchiyama, Masashi Mishima, Diego Thomas, Rin-Ichiro Taniguchi, Rafael Roberto, João Paulo Lima, Veronica Teichrieb

2018 IEEE International Conference on Robotics and Automation (ICRA) 2018.5

Language：English Publishing type：Research paper (international conference proceedings)

This paper presents a method for localizing primitive shapes in a dense point cloud computed by the RGB-D SLAM system. To stably generate a shape map containing only primitive shapes, the primitive shape is incrementally modeled by fusing the shapes estimated at previous frames in the SLAM, so that an accurate shape can be finally generated. Specifically, the history of the fusing process is used to avoid the influence of error accumulation in the SLAM. The point cloud of the shape is then updated by fusing the points in all the previous frames into a single point cloud. In the experimental results, we show that metric primitive modeling in texture-less and unprepared environments can be achieved online.
Synthesis of environment maps for mixed reality Reviewed International journal

David R Walton, Diego Thomas, Anthony Steed, Akihiro Sugimoto

2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2017.10

Language：English Publishing type：Research paper (international conference proceedings)

When rendering virtual objects in a mixed reality application, it is helpful to have access to an environment map that captures the appearance of the scene from the perspective of the virtual object. It is straightforward to render virtual objects into such maps, but capturing and correctly rendering the real components of the scene into the map is much more challenging. This information is often recovered from physical light probes, such as reflective spheres or fisheye cameras, placed at the location of the virtual object in the scene. For many application areas, however, real light probes would be intrusive or impractical.
Ideally, all of the information necessary to produce detailed environment maps could be captured using a single device. We introduce a method using an RGBD camera and a small fisheye camera, contained in a single unit, to create environment maps at any location in an indoor scene. The method combines the output from both cameras to correct for their limited field of view and the displacement from the virtual object, producing complete environment maps suitable for rendering the virtual content in real time. Our method improves on previous probeless approaches by its ability to recover high-frequency environment maps. We demonstrate how this can be used to render virtual objects which shadow, reflect and refract their environment convincingly.
Fast 3D point cloud segmentation using supervoxels with geometry and color for 3D scene understanding Reviewed International journal

Francesco Verdoja, Diego Thomas, Akihiro Sugimoto

IEEE International Conference on Multimedia and Expo (ICME), 2017 2017.7

Language：English Publishing type：Research paper (international conference proceedings)

Segmentation of 3D colored point clouds is a research field with renewed interest thanks to the recent availability of inexpensive consumer RGB-D cameras and its importance as an unavoidable low-level step in many robotic applications. However, the nature of 3D data makes the task challenging and, thus, many different techniques are being proposed, all of which incur high computational costs. This paper presents a novel fast method for 3D colored point cloud segmentation. It starts with supervoxel partitioning of the cloud, i.e., an oversegmentation of the points in the cloud. Then it leverages a novel metric exploiting both geometry and color to iteratively merge the supervoxels, obtaining a 3D segmentation in which the hierarchical structure of partitions is maintained. The algorithm also has computational complexity linear in the size of the input. Experimental results on two publicly available datasets demonstrate that our proposed method outperforms state-of-the-art techniques.
Parametric surface representation with bump image for dense 3D modeling using an RGB-D camera Reviewed International journal

Diego Thomas, Akihiro Sugimoto

International Journal of Computer Vision 123 ( 2 ) 2017.6

Language：English Publishing type：Research paper (scientific journal)

When constructing a dense 3D model of an indoor static scene from a sequence of RGB-D images, the choice of the 3D representation (e.g. 3D mesh, cloud of points or implicit function) is of crucial importance. In the last few years, the volumetric truncated signed distance function (TSDF) and its extensions have become popular in the community and largely used for the task of dense 3D modelling using RGB-D sensors. However, as this representation is voxel based, it offers few possibilities for manipulating and/or editing the constructed 3D model, which limits its applicability. In particular, the amount of data required to maintain the volumetric TSDF rapidly becomes huge which limits possibilities for portability. Moreover, simplifications (such as mesh extraction and surface simplification) significantly reduce the accuracy of the 3D model (especially in the color space), and editing the 3D model is difficult. We propose a novel compact, flexible and accurate 3D surface representation based on parametric surface patches augmented by geometric and color texture images. Simple parametric shapes such as planes are roughly fitted to the input depth images, and the deviations of the 3D measurements to the fitted parametric surfaces are fused into a geometric texture image (called the Bump image). A confidence and color texture image are also built. Our 3D scene representation is accurate yet memory efficient. Moreover, updating or editing the 3D model becomes trivial since it is reduced to manipulating 2D images. Our experimental results demonstrate the advantages of our proposed 3D representation through a concrete indoor scene reconstruction application.
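The idea of storing deviations from a fitted plane in a 2D image can be sketched roughly as below. This is a simplified illustration, not the paper's method: the running-mean fusion, the dictionary used as a sparse image, the cell size, and all names are assumptions made for the example.

```python
import math


def signed_distance(point, normal, offset):
    """Signed distance from `point` to the plane normal . x = offset.
    The normal is assumed to have unit length."""
    return sum(p * n for p, n in zip(point, normal)) - offset


def plane_pixel(point, origin, u_axis, v_axis, cell_size):
    """Project `point` onto the plane axes and return its integer (u, v) cell."""
    rel = [p - o for p, o in zip(point, origin)]
    u = sum(r * a for r, a in zip(rel, u_axis))
    v = sum(r * a for r, a in zip(rel, v_axis))
    return math.floor(u / cell_size), math.floor(v / cell_size)


def fuse(bump_image, pixel, deviation):
    """Update the running mean of the deviation stored at `pixel`."""
    mean, count = bump_image.get(pixel, (0.0, 0))
    count += 1
    bump_image[pixel] = (mean + (deviation - mean) / count, count)


# Hypothetical plane z = 0 with axes x and y, and two noisy depth samples.
normal, offset = (0.0, 0.0, 1.0), 0.0
origin, u_axis, v_axis = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
bump_image = {}
for sample in [(0.6, 0.9, 0.01), (0.6, 0.9, 0.03)]:
    pixel = plane_pixel(sample, origin, u_axis, v_axis, cell_size=0.25)
    fuse(bump_image, pixel, signed_distance(sample, normal, offset))
```

Both samples fall into the same cell, so the stored value is the mean of their two deviations; editing the model then amounts to editing entries of this 2D structure.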
Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras Reviewed International journal

Diego Thomas, Akihiro Sugimoto

Computer Vision and Image Understanding 2017.4

Language：English Publishing type：Research paper (scientific journal)

Hand-held consumer depth cameras have become a commodity tool for constructing 3D models of indoor environments in real time. Recently, many methods to fuse low-quality depth images into a single dense and high-fidelity 3D model have been proposed. Nonetheless, dealing with large-scale scenes remains a challenging problem. In particular, the accumulation of small errors due to imperfect camera localization becomes critical at large scale and results in dramatic deformations of the built 3D model. These deformations have to be corrected whenever possible (for example, when a loop exists). To facilitate such correction, we use a structured 3D representation where points are clustered into several planar patches that compose the scene. We then propose a two-stage framework to build a detailed large-scale 3D model in real time. The first stage (the local mapping) generates local structured 3D models with rigidity constraints from short subsequences of RGB-D images. The second stage (the global mapping) aggregates all local 3D models into a single global model in a geometrically consistent manner. Thanks to our structured 3D representation, minimizing deformations of the global model reduces to re-positioning the planar patches of the local models. This allows efficient yet accurate computations. Our experiments using real data confirm the effectiveness of our proposed method.
Multi-view facial landmark detector learned by the Structured Output SVM Reviewed International journal

Michal Uřičář, Vojtěch Franc, Diego Thomas, Akihiro Sugimoto, Václav Hlaváč

Image and Vision Computing 47 2016.3

Language：English Publishing type：Research paper (scientific journal)

We propose a real-time multi-view landmark detector based on Deformable Part Models (DPM). The detector is composed of a mixture of tree-based DPMs, each component describing landmark configurations in a specific range of viewing angles. The use of view-specific DPMs makes it possible to capture a large range of poses and to deal with the problem of self-occlusions. Parameters of the detector are learned from annotated examples by the Structured Output Support Vector Machines algorithm. The learning objective is directly related to the performance measure used for detector evaluation. The tree-based DPM makes it possible to find a globally optimal landmark configuration by dynamic programming. We propose a coarse-to-fine search strategy that allows real-time processing by dynamic programming even on high-resolution images. Empirical evaluation on “in the wild” images shows that the proposed detector is competitive with state-of-the-art methods in terms of speed and accuracy, yet it keeps the guarantee of finding a globally optimal estimate, in contrast to other methods.

DOI： https://doi.org/10.1016/j.imavis.2016.02.004
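The abstract above relies on dynamic programming to find a globally optimal landmark configuration in a tree-structured model. A minimal sketch of that idea for the simplest tree, a chain of landmarks, is given below; the scores, the deformation function, and all names are hypothetical and do not reproduce the detector itself.

```python
def best_chain_configuration(unary, pairwise):
    """Max-sum dynamic programming over a chain of landmarks.

    unary[i][k] is the appearance score of candidate k for landmark i.
    pairwise(i, k_prev, k) is the deformation score between landmark i - 1
    at candidate k_prev and landmark i at candidate k.
    Returns (best_total_score, list of chosen candidate indices)."""
    score = list(unary[0])
    back = []
    for i in range(1, len(unary)):
        new_score, pointers = [], []
        for k, u in enumerate(unary[i]):
            # Best predecessor for candidate k of landmark i.
            best_prev = max(
                range(len(score)), key=lambda j: score[j] + pairwise(i, j, k)
            )
            new_score.append(score[best_prev] + pairwise(i, best_prev, k) + u)
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final candidate.
    k = max(range(len(score)), key=lambda j: score[j])
    total = score[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return total, path


# Hypothetical scores: three landmarks with two candidates each,
# and a deformation term that penalises switching candidates.
unary = [[2.0, 0.0], [0.0, 3.0], [1.5, 0.0]]
total, path = best_chain_configuration(unary, lambda i, a, b: -abs(a - b))
```

The cost grows linearly with the number of landmarks and quadratically with the number of candidates per landmark, which is why a coarse-to-fine reduction of candidates helps on high-resolution images.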
Real-time multi-view facial landmark detector learned by the structured output SVM Reviewed International journal

Michal Uřičář, Vojtěch Franc, Diego Thomas, Akihiro Sugimoto, Václav Hlaváč

11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015 2015.5

Language：English Publishing type：Research paper (international conference proceedings)

While the problem of facial landmark detection has recently been receiving considerable attention in the computer vision community, most methods deal only with near-frontal views, and only a few truly multi-view detectors are available that are capable of detection over a wide range of yaw angles (e.g., Φ ∈ (-90°, 90°)). We describe a multi-view facial landmark detector based on Deformable Part Models, which treats the problem of simultaneous landmark detection and viewing angle estimation within a structured output classification framework. We present an easily extensible and flexible framework that provides real-time performance on “in the wild” images, evaluated on the challenging “Annotated Facial Landmarks in the Wild” database. We show that our detector achieves better results than the current state of the art in terms of localization error.

DOI： 10.1109/FG.2015.7284810
A two-stage strategy for real-time dense 3D reconstruction of large-scale scenes. Reviewed International journal

Diego Thomas, Akihiro Sugimoto

IEEE European Conference on Computer Vision Workshops (ECCVW), 2014 2014.9

Language：English Publishing type：Research paper (international conference proceedings)

The frame-to-global-model approach is widely used for accurate 3D modeling from sequences of RGB-D images. Because no perfect camera tracking system exists yet, the accumulation of small errors generated when registering and integrating successive RGB-D images causes deformations of the 3D model being built. In particular, the deformations become significant when the scale of the scene to model is large. To tackle this problem, we propose a two-stage strategy to build a detailed large-scale 3D model with minimal deformations. The first stage creates accurate small-scale 3D scenes in real time from short subsequences of RGB-D images, while the second stage re-organises all the results from the first stage in a geometrically consistent manner to reduce deformations as much as possible. By employing planar patches as the 3D scene representation, our proposed method runs in real time to build accurate 3D models with minimal deformations even for large-scale scenes. Our experiments using real data confirm the effectiveness of our proposed method.
A flexible scene representation for 3D reconstruction using an RGB-D camera Reviewed International journal

Diego Thomas, Akihiro Sugimoto

IEEE International Conference on Computer Vision (ICCV), 2013 2013.12

Language：English Publishing type：Research paper (international conference proceedings)

Updating a global 3D model with live RGB-D measurements has proven to be successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm have been introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is expensive in memory when constructing and updating the global model. As a consequence, the method is not well scalable to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes reduces significantly the size of the scene representation and thus it allows us to generate a global textured 3D model with lower memory requirement while keeping accuracy and easiness to update with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction, while keeping the scalability for large indoor scenes.
Learning to discover objects in RGB-D images using correlation clustering Reviewed International journal

Michael Firman, Diego Thomas, Simon Julier, Akihiro Sugimoto

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013 1107 - 1112 2013.11

Language：English Publishing type：Research paper (international conference proceedings)

We introduce a method to discover objects from RGB-D image collections which does not require a user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver to both find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method has superior performance to traditional clustering methods.
Compact and accurate 3-D face modeling using an RGB-D camera: Let's open the door to 3-D video conference Reviewed International journal

Pavan Kumar Anasosalu, Diego Thomas, Akihiro Sugimoto

The IEEE International Conference on Computer Vision (ICCV) Workshops 2013.5

Language：English Publishing type：Research paper (international conference proceedings)

We present a method for producing an accurate and compact 3-D face model in real time using a low-cost RGB-D sensor such as the Kinect camera. We extend and use Bump Images for highly accurate 3-D reconstruction of the human face with low memory consumption. Bump Images are generated by representing the Cartesian coordinates of points on the face in the spherical coordinate system whose origin is the center of the head. After initialization, the Bump Images are updated in real time with every RGB-D frame with respect to the current viewing direction and head pose, which are estimated using the frame-to-global-model registration strategy. While the high accuracy of the representation allows fine details to be recovered, the low memory use opens new possible applications of consumer depth cameras, such as 3-D video conferencing. We validate our approach by quantitatively comparing our result with the result obtained by a commercial high-resolution laser scanner. We also discuss the potential of our proposed method for a 3-D video conferencing application with existing internet speeds.
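The change of coordinates mentioned above, from Cartesian points to a spherical system centered on the head, can be sketched as follows. The axis convention, the image resolution, and the mapping of angles to pixels are assumptions made for illustration, not details taken from the paper.

```python
import math


def to_spherical(point, center):
    """Return (azimuth, polar, radius) of `point` relative to `center`.
    Azimuth lies in (-pi, pi] and polar in [0, pi]; the axis convention
    is an assumption."""
    x, y, z = (p - c for p, c in zip(point, center))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the center")
    return math.atan2(y, x), math.acos(z / r), r


def to_pixel(azimuth, polar, width, height):
    """Map the two angles to integer (column, row) indices of an image."""
    col = min(int((azimuth + math.pi) / (2 * math.pi) * width), width - 1)
    row = min(int(polar / math.pi * height), height - 1)
    return col, row


# Hypothetical face point one unit in front of the head center along +x.
azimuth, polar, radius = to_spherical((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
pixel = to_pixel(azimuth, polar, width=360, height=180)
```

In such a layout the two angles index the image and the radius is the value that could be stored per pixel, so a full 3-D surface is held in a single 2-D array.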
Robust simultaneous 3D registration via rank minimization Reviewed International journal

Diego Thomas, Yasuyuki Matsushita, Akihiro Sugimoto

Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2012 2012.10

Language：English Publishing type：Research paper (international conference proceedings)

We present a robust and accurate 3D registration method for a dense sequence of depth images taken from unknown viewpoints. Our method simultaneously estimates multiple extrinsic parameters of the depth images to obtain a registered full 3D model of the scanned scene. By arranging the depth measurements in a matrix form, we formulate the problem as a simultaneous estimation of multiple extrinsics and a low-rank matrix, which corresponds to the aligned depth images as well as a sparse error matrix. Unlike previous approaches that use sequential or heuristic global registration approaches, our solution method uses an advanced convex optimization technique for obtaining a robust solution via rank minimization. To achieve accurate computation, we develop a depth projection method that has minimum sensitivity to sampling by reading projected depth values in the input depth images. We demonstrate the effectiveness of the proposed method through extensive experiments and compare it with previous standard techniques.
Range Image Registration Based on Photometry Reviewed

Diego Thomas

PhD thesis, The National Institute of Informatics, SOKENDAI, Tokyo, Japan 2012.3

Language：English Publishing type：Research paper (scientific journal)

3D modeling of a real scene consists of constructing a virtual representation of the scene, generally simplified, that can be used or modified at will. Constructing such a 3D model by hand is a laborious and time-consuming task, and automating the whole process has attracted growing interest in the computer vision field. In particular, the task of registering (i.e. aligning) different parts of the scene (called range images) acquired from different viewpoints is of crucial importance when constructing 3D models. During the last decades, researchers have concentrated their efforts on this problem and proposed several methodologies to automatically register range images. Key-point detectors and descriptors have been used to match points across different range images using geometric features or textural features. Several similarity metrics have also been proposed to identify the overlapping regions. In spite of the advantages of the current methods, several limitations have been reported. In particular, when the scene lacks discriminative geometric features, the difficulty of accounting for the changes in appearance of the scene observed in different poses, or from different viewpoints, significantly degrades the performance of the current methods. We address this issue by investigating the use of photometry (i.e. the relationship between geometry, reflectance properties and illumination) for range image registration. First, we propose a robust descriptor using albedo that is tolerant of errors in the illumination estimation. Second, we propose an albedo extraction technique for specular surfaces that enlarges the range of materials we can deal with. Third, we propose a photometric metric under unknown lighting that allows registration of range images without any assumptions on the illumination. With these proposed methods, we significantly enlarge the practicability and range of applications of range image registration.
Illumination-free photometric metric for range image registration Reviewed International journal

Diego Thomas, Akihiro Sugimoto

IEEE Workshop on Applications of Computer Vision (WACV), 2012 2012.1

Language：English Publishing type：Research paper (international conference proceedings)

This paper presents an illumination-free photometric metric for evaluating the goodness of a rigid transformation aligning two overlapping range images, under the assumption of Lambertian surface. Our metric is based on photometric re-projection error but not on feature detection and matching. We synthesize the color of one image using albedo of the other image to compute the photometric re-projection error. The unknown illumination and albedo are estimated from the correspondences induced by the input transformation using the spherical harmonics representation of image formation. This way allows us to derive an illumination-free photometric metric for range image alignment. We use a hypothesize-and-test method to search for the transformation that minimizes our illumination-free photometric function. Transformation candidates are efficiently generated by employing the spherical representation of each image. Experimental results using synthetic and real data show the usefulness of the proposed metric.
Robustly registering range images using local distribution of albedo Reviewed International journal

Diego Thomas, Akihiro Sugimoto

Computer Vision and Image Understanding 115 ( 5 ) 649 - 667 2011.5

Language：English Publishing type：Research paper (scientific journal)

We propose a robust method for registering overlapping range images of a Lambertian object under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, the albedo is promising for range image registration of Lambertian objects that lack discriminative geometric features under variable illumination. We use adaptive regions in our method to model the local distribution of albedo, which enables us to stably extract reliable attributes of each point despite errors in the illumination estimate. We use a level-set method to grow robust and adaptive regions to define these attributes. A similarity metric between two attributes is also defined to match points in the overlapping area. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.

DOI： https://doi.org/10.1016/j.cviu.2010.11.016
Range image registration of specular objects under complex illumination Reviewed International journal

Diego Thomas, Akihiro Sugimoto

Fifth International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT2010) 2010.6

Language：English Publishing type：Research paper (international conference proceedings)

We present a method for range image registration of specular objects devoid of salient geometric properties under a complex lighting environment. Our method uses illumination consistency on two range images to detect specular highlights, which are used to obtain diffuse reflection components. By using light information estimated from the specular highlights and the diffuse reflection components, we extract albedo at the surface of an object, even under an unknown complex lighting environment. We then robustly register the two range images using the extracted albedo. This technique can handle various kinds of illumination situations and can be applied to a wide range of materials. Our experiments using synthetic data and real data show the effectiveness, robustness, and accuracy of our proposed method.
Range image registration of specular objects Reviewed International journal

Diego Thomas, Akihiro Sugimoto

Proc. of CVWW’10 2010.2

Language：English Publishing type：Research paper (international conference proceedings)

We present a method for range image registration of specular objects devoid of salient geometric properties under a complex lighting environment. We propose to use illumination consistency on two range images to detect specular highlights, which are used to obtain diffuse reflection components. By using light information estimated from the specular highlights and the diffuse reflection components, we extract photometric features invariant to changes in pose and illumination, even under an unknown complex lighting environment. We then robustly register the two range images using these features. This technique can handle various kinds of illumination situations and can be applied to a wide range of materials. Our experiments using synthetic data show the effectiveness, robustness, and accuracy of our proposed method.
Robust range image registration using local distribution of albedo Reviewed International journal

Diego Thomas, Akihiro Sugimoto

IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), 2009 2009.9

Language：English Publishing type：Research paper (international conference proceedings)

We propose a robust registration method for range images under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, they are promising for range image registration of objects that lack discriminative geometric features under variable illumination. In our method, we use adaptive regions to model the local distribution of reflectance, which enables us to stably extract reliable attributes of each point despite errors in the illumination estimation. We use a level set method to grow robust and adaptive regions to define these attributes. A similarity metric between two attributes is defined using principal component analysis to find matches. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.

Presentations

ProbeSDF: Light Field Probes For Neural Surface Reconstruction International coauthorship International conference

B Toussaint, D Thomas, JS Franco

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025) 2025.6

Event date： 2025.6

Language：English Presentation type：Poster presentation
VortSDF: 3D Modeling with Centroidal Voronoi Tessellation on Signed Distance Field International coauthorship International conference

D Thomas, B Toussaint, JS Franco, E Boyer

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025) 2025.2

Event date： 2025.2

Language：English Presentation type：Oral presentation (general)

Country：United States
Neural SDF for Shadow-Aware Unsupervised Structured Light International conference

K Ichimaru, D Thomas, T Iwaguchi, H Kawasaki

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025.2

Event date： 2025.2

Language：English Presentation type：Oral presentation (general)
Sparse-View 3D Reconstruction of Clothed Humans via Normal Maps International coauthorship International conference

J Wu, D Thomas, R Fedkiw

2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025.2

Event date： 2025.2

Language：English Presentation type：Oral presentation (general)
Millimetric Human Surface Capture in Minutes International coauthorship International conference

B Toussaint, L Boissieux, D Thomas, E Boyer, JS Franco

SIGGRAPH Asia 2024 2024.12

Event date： 2024.12

Language：English Presentation type：Oral presentation (general)
3D Shape Modeling with Adaptive Centroidal Voronoi Tessellation on Signed Distance Field International coauthorship

Diego Thomas, Jean-Sébastien Franco, Edmond Boyer

Meeting on Image Recognition and Understanding 2024.8

Event date： 2024.8

Language：English Presentation type：Oral presentation (general)
TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell International conference

Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020.6

Event date： 2021.5

Language：English

Country：Other
Human shape reconstruction with loose clothes from partially observed data by pose specific deformation International conference

Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

Pacific-Rim Symposium on Image and Video Technology 2019.11

Event date： 2021.5

Language：English

Country：Australia
A Practical Calibration Method for Cameras and Multiple Line-Lasers in Light Sectioning Systems for Underwater Environments International conference

Takaki Ikeda, Takafumi Iwaguchi, Diego Thomas, Hiroshi Kawasaki

2024 IEEE International Conference on Image Processing (ICIP) 2024.10

Event date： 2024.10

Language：English
Two-stage pose optimization algorithm using color information for underwater SLAM with light-sectioning-based 3D scanning method International conference

Takaki Ikeda, Takafumi Iwaguchi, Diego Thomas, Hiroshi Kawasaki

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024.10

Event date： 2024.10
Neural Active Structure-from-Motion in Dark and Textureless Environment International conference

Kazuto Ichimaru, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki

Proceedings of the Asian Conference on Computer Vision 2024.10

Event date： 2024.10

Language：English
Mean Teacher for Unsupervised Domain Adaptation in Multi-View 3D Pedestrian Detection International coauthorship International conference

João Paulo Lima, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

2024 37th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) 2024.9

Event date： 2024.9

Language：English
ActiveNeuS: Neural Signed Distance Fields for Active Stereo International conference

Kazuto Ichimaru, Takaki Ikeda, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki

International Conference on 3D Vision 2024.3

Event date： 2024.3

Language：English

Venue：Davos Country：Switzerland
A Two-step Approach for Interactive Animatable Avatars International conference

Takumi Kitamura, Naoya Iwamoto, Hiroshi Kawasaki, Diego Thomas

COMPUTER GRAPHICS INTERNATIONAL 2023 2023.8

Event date： 2023.8 - 2023.9

Language：English Presentation type：Oral presentation (general)

Venue：Shanghai Country：China

We propose a two-step human body animation technique that generates pose-dependent detailed deformations in real time on a standard animation pipeline. To achieve real-time animation, we use a template-based approach and represent pose-dependent deformations with 2D displacement maps. To generalize to entirely new motions, we employ a two-step strategy: 1) the first step aligns the topology of the Skinned Multi-Person Linear (SMPL) model [23] to our proposed template model; 2) the second step models detailed clothing and muscle deformations for the specific motion. Our experimental results show that our proposed method can animate an avatar up to 30 times faster than baselines while keeping a similar or even better level of detail.
Refining OpenPose With a New Sports Dataset for Robust 2D Pose Estimation International conference

Takumi Kitamura, Hitoshi Teshima, Diego Thomas, Hiroshi Kawasaki

IEEE/CVF Winter Conference on Applications of Computer Vision 2022.1

Event date： 2022.1

Language：English Presentation type：Oral presentation (general)

Venue：Hawaii (online) Country：United States

3D marker-less motion capture can be achieved by triangulating 2D poses estimated from multiple views. However, when 2D pose estimation fails, 3D motion capture also fails. This is particularly challenging for athletes' sports performances, which involve extreme poses. In extreme poses (such as having the head down), state-of-the-art 2D pose estimators such as OpenPose do not work at all. In this paper, we propose a new method to improve the training of 2D pose estimators for extreme poses by leveraging a new sports dataset and our proposed data augmentation strategy. Our results show significant improvements over previous methods for 2D pose estimation of athletes performing acrobatic moves, while keeping state-of-the-art performance on standard datasets.

Other Link： https://openaccess.thecvf.com/content/WACV2022W/CV4WS/html/Kitamura_Refining_OpenPose_With_a_New_Sports_Dataset_for_Robust_2D_WACVW_2022_paper.html
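
The triangulation step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes two calibrated views and that each detected 2D joint has already been back-projected into a 3D ray (camera centre plus direction). The function names `dot` and `triangulate_midpoint` are hypothetical, and the midpoint method is only one of several possible triangulation schemes.

```python
# Midpoint triangulation of one joint seen from two views (illustrative only).
# Each view contributes a ray: camera centre c plus viewing direction d.

def dot(u, v):
    """Dot product of two 3D vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the point halfway between the closest points of two rays."""
    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    # Ray parameters minimising the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]
```

If the 2D detection in either view is wrong, its ray points in the wrong direction and the recovered 3D joint is wrong too, which is why the abstract stresses robust 2D pose estimation.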
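Efficient OpenPose Retraining for Markerless Motion Capture of Athletes

#Takumi Kitamura, Hiroshi Kawasaki, Diego Thomas

IPSJ Joint Meeting of the 248th SIG on Natural Language Processing and the 226th SIG on Computer Vision and Image Media 2021.5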

Event date： 2021.5

Language：Japanese Presentation type：Symposium, workshop panel (public)

Country：Japan
Analysis and Classification of Gestures in TED Talks

Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

Pattern Recognition and Media Understanding Technical Meeting (PRMU 2020) 2020.10

Event date： 2021.5

Language：Japanese Presentation type：Symposium, workshop panel (public)

Country：Japan
Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video International conference

Cheng Sun, Diego Thomas, Hiroshi Kawasaki

25th International Conference on Pattern Recognition (ICPR) 2021.1

Event date： 2021.5

Language：English

Country：Other
3D human body reconstruction using RGB-D camera Invited International conference

Diego Thomas

Asia Pacific Society for Computing and Information Technology 2019 Annual Meeting 2019.7

Event date： 2019.7

Language：English Presentation type：Oral presentation (general)

Venue：Sapporo, Hokkaido Country：Japan

Consumer-grade RGB-D cameras have become the commodity tool for building dense 3D models of indoor scenes. Motivated by the strong demand for high-fidelity personal 3D avatars, much effort is now devoted to using RGB-D cameras to automatically reconstruct high-quality 3D models of the human body. This is a very difficult task because the human body moves non-rigidly during the scanning process. Simultaneously reconstructing the detailed 3D shape of the body while accurately tracking its non-rigid motion is the main challenge that all successful systems must solve. In addition, to be used on portable devices such as smartphones, solutions must require little memory and low computational power. In this talk, we will first briefly review existing successful strategies for real-time 3D human body reconstruction. Then, we will present our proposed solution for 3D human body reconstruction, which is light in memory consumption and computational power. Our main idea is to separate the full-body non-rigid reconstruction into multiple nearly rigid reconstructions of body parts that are tightly stitched together.
VMPFusion: Variational Message Passing for dynamic 3D face reconstruction Invited International conference

Diego Thomas

IDS/JFLI workshop 2018.5

Event date： 2019.6

Language：English Presentation type：Oral presentation (general)

Venue：Osaka Country：Japan

In this talk, I will describe a probabilistic approach for dynamic 3D face modeling using a consumer-grade RGB-D camera. In this research, my goal is to formulate a strategy to fuse noisy 3D measurements captured with a Kinect camera into a 3D facial model without relying on explicit point correspondences. We propose to tackle this challenging problem with the Variational Message Passing (VMP) algorithm, which optimizes a variational distribution using a message passing procedure on a graphical model. We show the validity of our formulation with real-data experiments.
3D Modeling of Large-Scale Indoor Scenes Using RGB-D Cameras Invited International conference

Diego Thomas, Akihiro Sugimoto

The 1st International Conference on Advanced Imaging 2015.6

Event date： 2018.6

Language：English Presentation type：Oral presentation (general)

Venue：National Center of Science, Tokyo, Japan Country：Japan
Synthesis of environment maps for mixed reality

David R. Walton, Diego Gabriel Francis Thomas, Anthony Steed, Akihiro Sugimoto

16th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2017 2017.11

Event date： 2017.10

Language：English

Venue：Nantes Country：France

When rendering virtual objects in a mixed reality application, it is helpful to have access to an environment map that captures the appearance of the scene from the perspective of the virtual object. It is straightforward to render virtual objects into such maps, but capturing and correctly rendering the real components of the scene into the map is much more challenging. This information is often recovered from physical light probes, such as reflective spheres or fisheye cameras, placed at the location of the virtual object in the scene. For many application areas, however, real light probes would be intrusive or impractical. Ideally, all of the information necessary to produce detailed environment maps could be captured using a single device. We introduce a method using an RGBD camera and a small fisheye camera, contained in a single unit, to create environment maps at any location in an indoor scene. The method combines the output from both cameras to correct for their limited field of view and the displacement from the virtual object, producing complete environment maps suitable for rendering the virtual content in real time. Our method improves on previous probeless approaches by its ability to recover high-frequency environment maps. We demonstrate how this can be used to render virtual objects which shadow, reflect and refract their environment convincingly.
Fast 3D point cloud segmentation using supervoxels with geometry and color for 3D scene understanding

Francesco Verdoja, Diego Gabriel Francis Thomas, Akihiro Sugimoto

2017 IEEE International Conference on Multimedia and Expo, ICME 2017 2017.8

Event date： 2017.7

Language：English

Venue：Hong Kong

Segmentation of 3D colored point clouds is a research field with renewed interest thanks to the recent availability of inexpensive consumer RGB-D cameras and its importance as an unavoidable low-level step in many robotic applications. However, the nature of 3D data makes the task challenging and, thus, many different techniques are being proposed, all of which incur high computational costs. This paper presents a novel fast method for 3D colored point cloud segmentation. It starts with supervoxel partitioning of the cloud, i.e., an oversegmentation of the points in the cloud. Then it leverages a novel metric exploiting both geometry and color to iteratively merge the supervoxels to obtain a 3D segmentation where the hierarchical structure of partitions is maintained. The algorithm also has computational complexity linear in the size of the input. Experimental results over two publicly available datasets demonstrate that our proposed method outperforms state-of-the-art techniques.
Augmented blendshapes for real-time simultaneous 3D head modeling and facial motion capture

Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi

2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 2016

Event date： 2016.6 - 2016.7

Language：English

Venue：Las Vegas Country：United States

We propose a method to build animated 3D head models in real time using a consumer-grade RGB-D camera. Our framework is the first to simultaneously provide comprehensive facial motion tracking and a detailed 3D model of the user's head. Anyone's head can be instantly reconstructed and their facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is otherwise free to move, talk and change facial expression at will. The facial motion is tracked using a blendshape representation while the fine geometric details are captured using a Bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on-the-fly and in real time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
Dense 3D reconstruction using RGB-D cameras Invited International conference

Diego Thomas

International Conference on 3DVision 2014 2014.12

Event date： 2014.12

Language：English Presentation type：Public lecture, seminar, tutorial, course, or other speech

Venue：3DV2014, Tokyo, Japan. Country：Japan

The generation of fine 3D models from RGB-D (color plus depth) measurements is of great interest for the computer vision community. Although the 3D reconstruction pipeline has been widely studied in the last decades, a new era has started recently with the advent of low cost consumer depth cameras (called RGB-D cameras) that capture RGB-D images at a video rate (e.g., Microsoft Kinect or Asus Xtion Pro). The introduction to the public of 3D measurements has brought its own revolution to the scientific community with many projects and applications using RGB-D cameras.

In this tutorial, we will give an overview of the existing 3D reconstruction methods that work with a single RGB-D camera and use various 3D representations, including point-based representations (SURFELS), implicit volumetric representations (TSDF), patch-based representations and parametric representations. These different 3D scene representations give us powerful tools to build virtual representations of the real world in real time from RGB-D cameras. We can reconstruct not only small-scale static scenes but also large-scale scenes and dynamic scenes. We will also discuss current trends in depth sensing and future challenges for 3D scene reconstruction.
A two-stage strategy for real-time dense 3D reconstruction of large-scale scenes

Diego Gabriel Francis Thomas, Akihiro Sugimoto

13th European Conference on Computer Vision, ECCV 2014 2015.1

Event date： 2014.9

Language：English

Venue：Zurich Country：Switzerland

The frame-to-global-model approach is widely used for accurate 3D modeling from sequences of RGB-D images. Because no perfect camera tracking system exists yet, the accumulation of small errors generated when registering and integrating successive RGB-D images causes deformations of the 3D model being built. In particular, the deformations become significant when the scale of the scene to model is large. To tackle this problem, we propose a two-stage strategy to build a detailed large-scale 3D model with minimal deformations: the first stage creates accurate small-scale 3D scenes in real time from short subsequences of RGB-D images, while the second stage re-organises all the results from the first stage in a geometrically consistent manner to reduce deformations as much as possible. By employing planar patches as the 3D scene representation, our proposed method runs in real time to build accurate 3D models with minimal deformations even for large-scale scenes. Our experiments using real data confirm the effectiveness of our proposed method.
A flexible scene representation for 3D reconstruction using an RGB-D camera

Diego Gabriel Francis Thomas, Akihiro Sugimoto

2013 14th IEEE International Conference on Computer Vision, ICCV 2013 2013

Event date： 2013.12

Language：English

Venue：Sydney, NSW Country：Australia

Updating a global 3D model with live RGB-D measurements has proven to be successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm have been introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is expensive in memory when constructing and updating the global model. As a consequence, the method does not scale well to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes significantly reduces the size of the scene representation and thus allows us to generate a global textured 3D model with lower memory requirements while keeping accuracy and ease of updating with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction while remaining scalable to large indoor scenes.
Compact and accurate 3-D face modeling using an RGB-D camera: Let's open the door to 3-D video conference

Pavan Kumar Anasosalu, Diego Gabriel Francis Thomas, Akihiro Sugimoto

2013 14th IEEE International Conference on Computer Vision Workshops, ICCVW 2013 2013

Event date： 2013.12

Language：English

Venue：Sydney, NSW Country：Australia

We present a method for producing an accurate and compact 3-D face model in real time using a low-cost RGB-D sensor such as the Kinect camera. We extend and use Bump Images for highly accurate 3-D reconstruction of the human face with low memory consumption. Bump Images are generated by representing the Cartesian coordinates of points on the face in the spherical coordinate system whose origin is the center of the head. After initialization, the Bump Images are updated in real time with every RGB-D frame with respect to the current viewing direction and head pose, which are estimated using the frame-to-global-model registration strategy. While the high accuracy of the representation allows fine details to be recovered, low memory use opens new possible applications of consumer depth cameras such as 3-D video conferencing. We validate our approach by quantitatively comparing our result with the result obtained by a commercial high-resolution laser scanner. We also discuss the potential of our proposed method for a 3-D video conferencing application with existing internet speeds.
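
The change of coordinates described in the abstract above can be sketched as follows. This is an illustrative assumption, not the authors' code: the angle conventions, the image resolution, and the function names `cartesian_to_spherical` and `spherical_to_pixel` are all hypothetical. The idea shown is that each face point is expressed by two angles and a radius around the head center, so the two angles can index a 2D image whose pixel stores the radius.

```python
import math

def cartesian_to_spherical(point, center):
    """Return (azimuth, polar, radius) of `point` relative to `center`."""
    x, y, z = (p - c for p, c in zip(point, center))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point coincides with the head center")
    azimuth = math.atan2(y, x)                       # in [-pi, pi]
    polar = math.acos(max(-1.0, min(1.0, z / r)))    # in [0, pi]
    return azimuth, polar, r

def spherical_to_pixel(azimuth, polar, width, height):
    """Map the two angles to integer pixel coordinates (assumed layout)."""
    u = int((azimuth + math.pi) / (2.0 * math.pi) * (width - 1))
    v = int(polar / math.pi * (height - 1))
    return u, v
```

Storing one radius per pixel instead of a full 3D point per sample is one way to see why such a representation can be compact in memory.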
Learning to discover objects in RGB-D images using correlation clustering

Michael Firman, Diego Gabriel Francis Thomas, Simon Julier, Akihiro Sugimoto

2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013 2013.12

Event date： 2013.11

Language：English

Venue：Tokyo Country：Japan

We introduce a method to discover objects from RGB-D image collections which does not require a user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver to both find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method has superior performance to traditional clustering methods.
Robust simultaneous 3D registration via rank minimization

Diego Gabriel Francis Thomas, Yasuyuki Matsushita, Akihiro Sugimoto

2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2012 2012

Event date： 2012.10

Language：English

Venue：Zurich Country：Switzerland

We present a robust and accurate 3D registration method for a dense sequence of depth images taken from unknown viewpoints. Our method simultaneously estimates multiple extrinsic parameters of the depth images to obtain a registered full 3D model of the scanned scene. By arranging the depth measurements in a matrix form, we formulate the problem as a simultaneous estimation of multiple extrinsics and a low-rank matrix, which corresponds to the aligned depth images as well as a sparse error matrix. Unlike previous approaches that use sequential or heuristic global registration approaches, our solution method uses an advanced convex optimization technique for obtaining a robust solution via rank minimization. To achieve accurate computation, we develop a depth projection method that has minimum sensitivity to sampling by reading projected depth values in the input depth images. We demonstrate the effectiveness of the proposed method through extensive experiments and compare it with previous standard techniques.
Illumination-free photometric metric for range image registration

Diego Gabriel Francis Thomas, Akihiro Sugimoto

2012 IEEE Workshop on the Applications of Computer Vision, WACV 2012 2012

Event date： 2012.1

Language：English

Venue：Breckenridge, CO Country：United States

This paper presents an illumination-free photometric metric for evaluating the goodness of a rigid transformation aligning two overlapping range images, under the assumption of a Lambertian surface. Our metric is based on photometric re-projection error rather than on feature detection and matching. We synthesize the color of one image using the albedo of the other image to compute the photometric re-projection error. The unknown illumination and albedo are estimated from the correspondences induced by the input transformation using the spherical harmonics representation of image formation. This allows us to derive an illumination-free photometric metric for range image alignment. We use a hypothesize-and-test method to search for the transformation that minimizes our illumination-free photometric function. Transformation candidates are efficiently generated by employing the spherical representation of each image. Experimental results using synthetic and real data show the usefulness of the proposed metric.
Robust range image registration using local distribution of albedo

Diego Gabriel Francis Thomas, Akihiro Sugimoto

2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009 2009

Event date： 2009.9 - 2009.10

Language：English

Venue：Kyoto Country：Japan

We propose a robust registration method for range images under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, they are promising for range image registration of objects lacking discriminative geometric features under variable illumination. In our method, we use adaptive regions to model the local distribution of reflectance, which enables us to stably extract reliable attributes of each point against illumination estimation. We use a level set method to grow robust and adaptive regions to define these attributes. A similarity metric between two attributes is defined using principal component analysis to find matches. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.

Professional Memberships

IEEE

Committee Memberships

IPSJ Domestic

2024.4 - Present

Committee type：Academic society

Academic Activities

Symposium on Virtual and Augmented Reality 2025

Role(s)： Planning, management, etc.

SBC 2025.9 - 2025.10

Type：Competition, symposium, etc.
Shonan meeting #226: The Power of Geometric Algebra in Modern Computer Vision International contribution

Role(s)： Planning, management, etc.

NII Shonan Meeting 2025.5

Type：Academic society, research group, etc.

Number of participants：18
Area chair

The 27th Meeting on Image Recognition and Understanding (MIRU2024) （ Kumamoto Japan ） 2024.8

Type：Competition, symposium, etc.

Number of participants：1,000
Screening of academic papers

Role(s)： Peer review

2023

Type：Peer review

Number of peer-reviewed articles in foreign language journals：3

Proceedings of International Conference Number of peer-reviewed papers：25

Proceedings of domestic conference Number of peer-reviewed papers：3
Program committee International contribution

CVPR2022 （ New Orleans, Louisiana United States of America ） 2022.6

Type：Competition, symposium, etc.

Number of participants：10,000
Senior Program Committee International contribution

AAAI 2022 （ Vancouver Canada ） 2022.2 - 2022.3

Type：Competition, symposium, etc.

Number of participants：8,000
Screening of academic papers

Role(s)： Peer review

2022

Type：Peer review

Number of peer-reviewed articles in foreign language journals：8

Proceedings of International Conference Number of peer-reviewed papers：18

Proceedings of domestic conference Number of peer-reviewed papers：3
Senior Program Committee Member International contribution

30th International Joint Conference on Artificial Intelligence (IJCAI-21) （ Montreal Canada ） 2021.8 - 2021.5

Type：Competition, symposium, etc.

Number of participants：1,000
Program committee International contribution

CVPR 2021 （ Online United States of America ） 2021.6 - 2021.5

Type：Competition, symposium, etc.

Number of participants：10,000
Program committee International contribution

WACV 2021 （ United States of America ） 2021.3 - 2021.5

Type：Competition, symposium, etc.

Number of participants：400
Session chair

The 83rd National Convention of IPSJ （ Online Japan ） 2021.3 - 2021.5

Type：Competition, symposium, etc.

Number of participants：100
Local chair International contribution

3D Vision (3DV 2020) （ Fukuoka Japan ） 2020.11

Type：Competition, symposium, etc.

Number of participants：300
Program committee International contribution

CVPR 2020 （ Seattle, Washington United States of America ） 2020.6

Type：Competition, symposium, etc.

Number of participants：8,000
Program committee International contribution

WACV 2020 （ Aspen United States of America ） 2020.3

Type：Competition, symposium, etc.

Number of participants：1,000
Screening of academic papers

Role(s)： Peer review

2020

Type：Peer review

Number of peer-reviewed articles in foreign language journals：4

Number of peer-reviewed articles in Japanese journals：2

Proceedings of International Conference Number of peer-reviewed papers：25

Proceedings of domestic conference Number of peer-reviewed papers：4
Program chair International contribution

Machine Perception and Robotics (MPR 2019) （ Biwako Kusatsu Campus (BKC), Ritsumeikan University Japan ） 2019.11

Type：Competition, symposium, etc.

Number of participants：80
Area chair International contribution

The 9th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2019) （ Sydney Australia ） 2019.11

Type：Competition, symposium, etc.

Number of participants：80
Screening of academic papers

Role(s)： Peer review

2019

Type：Peer review

Number of peer-reviewed articles in foreign language journals：15

Proceedings of International Conference Number of peer-reviewed papers：25
Publicity chair International contribution

The 12th International Workshop on Information Search, Integration, and Personalization (ISIP2018) （ Kyushu University, Fukuoka, Japan Japan ） 2018.5

Type：Competition, symposium, etc.

Number of participants：40
Publicity chair International contribution

The 12th International Workshop on Information Search, Integration, and Personalization (ISIP 2018) （ Kyushu University, Fukuoka Japan ） 2018.5

Type：Competition, symposium, etc.

Number of participants：50
Screening of academic papers

Role(s)： Peer review

2018

Type：Peer review

Number of peer-reviewed articles in foreign language journals：20

Proceedings of International Conference Number of peer-reviewed papers：20
Local arrangement chair International contribution

JFLI-KYUDAI JOINT WORKSHOP ON INFORMATICS （ Ito Campus, Kyushu University, Fukuoka, Japan Japan ） 2017.9

Type：Competition, symposium, etc.

Number of participants：15
Screening of academic papers

Role(s)： Peer review

2017

Type：Peer review

Number of peer-reviewed articles in foreign language journals：10

Proceedings of International Conference Number of peer-reviewed papers：24
Program Committee International contribution

SITIS2016 （ Naples Italy ） 2016.11 - 2016.12

Type：Competition, symposium, etc.
Program Committee

MIRU2016 （ Hamamatsu Japan ） 2016.8

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2016

Type：Peer review

Number of peer-reviewed articles in foreign language journals：2

Proceedings of International Conference Number of peer-reviewed papers：19

Proceedings of domestic conference Number of peer-reviewed papers：1
Program Committee

MIRU2015 （ Osaka Japan ） 2015.7

Type：Competition, symposium, etc.
Screening of academic papers

Role(s)： Peer review

2015

Type：Peer review

Number of peer-reviewed articles in foreign language journals：1

Proceedings of International Conference Number of peer-reviewed papers：12
Program committee

MIRU2014 （ Okayama Japan ） 2014.7

Type：Competition, symposium, etc.

Research Projects

Development of Digital Twin of Common Marmoset

Grant number：25wm0625404h0002 2024.9 - 2029.3

AMED Neurointegration Program, Team Type B

Authorship：Coinvestigator(s) Grant type：Competitive funding other than Grants-in-Aid for Scientific Research
A new data-driven approach to bring humanity into virtual worlds with computer vision

Grant number：23H03439 2023 - 2025

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Authorship：Principal investigator Grant type：Scientific research funding
NeRF-based multi-view 3D shape reconstruction using Centroidal Voronoi Tessellation International coauthorship

2022.4 - 2023.6

Kyushu University (Japan)

Authorship：Principal investigator

We investigate the use of CVT to jointly optimize 3D shape, appearance and discretization of the 3D space for high definition 3D mesh reconstruction from multi-view images.
Multi-Camera 3D Pedestrian Detection with Domain Adaptation and Generalization

2022 - 2023

Japan Society for the Promotion of Science JSPS Invitational Fellowships for Research in Japan (short term)

Authorship：Principal investigator Grant type：Joint research
AI-based animation of 3D avatars.

2021.6 - 2022.5

Joint research

Authorship：Principal investigator Grant type：Other funds from industry-academia collaboration
Deep human avatar animation International coauthorship

2021.5 - 2022.5

Japan

Authorship：Principal investigator

This is a joint research project with HUAWEI on learning to generate avatar animations from 2D videos in real time.
Realistic environment rendering with real humans for architecture project visualization

2021.4 - 2022.5

Authorship：Principal investigator

This is a joint project with Professor Koga (architecture design) and Professor Ochiai (Maths for industry) on generating immersive virtual environments of architectural projects to support design and evaluation.
Multi-view 3D pedestrian localisation International coauthorship

2021.3 - 2023.4

Brazil

Authorship：Coinvestigator(s)

The project is about identifying, localizing and tracking pedestrians in 3D from multi-view videos.
A new approach for supporting architectural works with virtual reality environments.

2021 - 2022

QR Tsubasa (Tsubasa Project)

Authorship：Principal investigator Grant type：On-campus funds, funds, etc.
Weakly-supervised human 3D body shape estimation from single images International coauthorship

2020.9 - 2021.8

U.S.A

Authorship：Coinvestigator(s)

We are working on a method that learns to estimate the 3D shape of human bodies from 2D observations in an unsupervised manner.
Dynamic human motion tracking using dual quaternion algebra International coauthorship

2020.7 - 2022.3

Japan

Authorship：Coinvestigator(s)

Joint research project with Vincent Nozick from Gustave-Eiffel University in France. This project is about reconstructing the non-rigid motion of human bodies captured by RGB-D cameras.
Human body 3D shape estimation, animation and gesture synthesis

2020.4 - 2021.3

Joint research

Authorship：Principal investigator Grant type：Other funds from industry-academia collaboration
Personalized avatars with real emotions for next generation holoportation systems International coauthorship

2020.1 - 2021.1

Microsoft Research Asia

Authorship：Principal investigator

Personalized avatars are key to more natural communication in virtual spaces. Being able to express yourself not only with your own voice but also with your own body, expressions, and emotions allows you to communicate better. It is also a powerful way to avoid being deceived by fake characters. There is also a huge demand for real avatars and emotes, which represents a big business opportunity. When communicating in a virtual space, it is important to transmit real expressions and real emotions, but it is also important to keep the possibility of remaining anonymous. While ultra-realistic avatars that have someone’s own appearance, skin and face will surely break anonymity, body motion and gesture can convey a large part of real expressions and emotions without revealing a person’s identity. In this project, we aim to capture full-body 3D motion and fine gestures and re-target them into a mixed reality telepresence system (also called holoportation) deployed on the Microsoft HoloLens. To achieve our objective, there are three main challenges to tackle: (1) detailed 3D motion of the human body must be captured from standard RGB cameras; (2) the human motion must be faithfully re-targeted to a virtual avatar, which may have different animation characteristics than the human; (3) the avatar must be displayed in 3D with the HoloLens while taking the surrounding illumination conditions into account. The fundamental findings of the project will provide new insights into human motion estimation, re-targeting to other bodies with different kinematics, and environment mapping with mixed reality devices.
2 years training and international research

2020 - 2022

SENTAN-Q

Authorship：Principal investigator Grant type：On-campus funds, funds, etc.
3D shape estimation and motion retargeting from 2D videos for future Holoportation systems.

2020

QR Wakaba challenge

Authorship：Principal investigator Grant type：On-campus funds, funds, etc.
Unifying multiple RGB and depth cameras for real-time large-scale dynamic 3D modeling with unmanned micro aerial vehicles.

2019.4 - 2021.4

KAKENHI

Authorship：Principal investigator

The project is about real-time 3D reconstruction of large-scale dynamic scenes (i.e., scenes containing one or more moving objects to be modeled, possibly with shape deformation) from unmanned micro aerial vehicles. The objective is to investigate the fusion of multiple RGB and depth sensors mounted on multiple micro aerial vehicles for real-time 3D reconstruction of large-scale dynamic 3D scenes. The fundamental algorithms developed here will be used to build large-scale dynamic 3D models and to provide the necessary tools for real-time automatic dynamic 3D scene understanding.
Unifying multiple RGB and depth cameras for real-time large-scale dynamic 3D modeling with unmanned micro aerial vehicles

Grant number：19K20297 2019 - 2020

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Early-Career Scientists

Authorship：Principal investigator Grant type：Scientific research funding
Facial motion capture International coauthorship

2017.10 - 2018.9

Huawei Technologies Japan K.K (China).

Authorship：Collaborating Investigator(s) (not designated on Grant-in-Aid)

This project is divided into three stages. The first stage roughly evaluates our base algorithm. The second stage evaluates the robustness of the overall reconstruction (expression) ability of the facial impression transfer from any person to any 3D avatar. The third stage improves the quality of the facial model (to provide a complete facial model, we need to add the eyeballs and the mouth).
Facial motion capture system

2017.9 - 2018.8

Joint research

Authorship：Collaborating Investigator(s) (not designated on Grant-in-Aid) Grant type：Other funds from industry-academia collaboration
Free-form dynamic 3D scene reconstruction at high resolution

2017 - 2018

Start-up support funding

Authorship：Coinvestigator(s) Grant type：On-campus funds, funds, etc.
Large-scale and dynamic 3D reconstruction using an RGB-D camera

2015 - 2017

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows

Authorship：Principal investigator Grant type：Scientific research funding

Educational Activities

I teach practical exercises in data science. In this class, we teach students how to implement programs for their own research subjects, and we provide individual guidance during lecture time.
I teach the class "Information Science" for first-year students of the University.
I have been teaching the class "Programming in Python" for first-year students of the University.
I teach the class "Digital Humans I & II" at the Faculty of Information Science and Electrical Engineering.
I teach the experimental class "Distributed robots" at the Faculty of Information Science and Electrical Engineering.
I teach the class "Methodologies for practical data analysis 1 & 2" at the School of Interdisciplinary Science and Innovation.

Class subject

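Digital Humans II

2024.6 - 2024.8 Summer quarter
Digital Humans I

2024.4 - 2024.6 Spring quarter
Information Science (English)

2023.10 - 2024.3 Second semester
Information Science and Technology: Discussion II

2023.10 - 2024.3 Second semester
Information Science and Technology: Exposition II

2023.10 - 2024.3 Second semester
Information Science and Technology: Demonstration

2023.10 - 2024.3 Second semester
Digital Humans II

2023.6 - 2023.8 Summer quarter
[Full year] Information Science and Technology: Advanced Study

2023.4 - 2024.3 Full year
[Full year] Information Science and Technology: Research I

2023.4 - 2024.3 Full year
[Full year] Information Science and Technology: Exercises

2023.4 - 2024.3 Full year
Information Science and Technology: Discussion I

2023.4 - 2023.9 First semester
Information Science and Technology: Reading Comprehension

2023.4 - 2023.9 First semester
Information Science and Technology: Exposition I

2023.4 - 2023.9 First semester
Distributed Robots Experiment

2023.4 - 2023.6 Spring quarter
Digital Humans I

2023.4 - 2023.6 Spring quarter
Information Science (English)

2022.10 - 2023.3 Second semester
Data Science Exercises I

2022.10 - 2023.3 Second semester
Data Science Exercises II

2022.10 - 2023.3 Second semester
Advanced Information Technology Exercises III

2021.10 - 2022.3 Second semester
Advanced Information Technology Exercises I

2021.10 - 2022.3 Second semester
Data Science Exercises II

2021.10 - 2022.3 Second semester
Data Science Exercises I

2021.10 - 2022.3 Second semester
Programming Exercises (P)

2021.6 - 2021.8 Summer quarter
Advanced Information Technology Exercises II

2021.4 - 2021.9 First semester
Data Science Exercises I

2020.10 - 2021.3 Second semester
Data Science Exercises II

2020.10 - 2021.3 Second semester
Information Science

2020.4 - 2020.9 First semester
Information Science

2019.10 - 2020.3 Second semester
Data Science Exercises II

2019.4 - 2019.9 First semester
Data Science Exercises I

2019.4 - 2019.9 First semester
Data Science Exercises II

2018.4 - 2018.9 First semester
Data Science Exercises I

2018.4 - 2018.9 First semester
Data Science Exercises II

2017.4 - 2017.9 First semester
Data Science Exercises I

2017.4 - 2017.9 First semester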

FD Participation

2025.3 Role：Participation Title：Toward the strategic acquisition of awards, fellow titles, and similar honors

Organizer：University-wide
2025.2 Role：Participation Title：Pre-application support for international students by the Pre-Admission Support Desk (PSD): benefits of its introduction

Organizer：[Undergraduate school/graduate school/graduate faculty]
2025.1 Role：Participation Title：Molecular mapping of synapses in the brain and elucidation of their information-processing mechanisms

Organizer：[Undergraduate school/graduate school/graduate faculty]
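2025.1 Role：Participation Title：Introduction to the human resource development programs of the Japan Society for the Promotion of Science and its promotion of gender equality: the Research Fellowship program, the JSPS Prize, and others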

Organizer：University-wide

Outline of Social Contribution and International Cooperation activities

I have been actively involved in international research collaboration with France, Canada and Brazil in recent years. I have hosted several international research internship students, visited foreign labs, and co-organized international workshops and conferences.

Social Activities

JSPS Science Dialogue

Fukui Prefectural Wakasa High School (Wakasa-city, Fukui) 2017.1

Audience：Infants, Schoolchildren, Junior students, High school students

Type：Seminar, workshop

Acceptance of Foreign Researchers, etc.

Acceptance period： 2025.6 - 2025.12 （Period）：1 month or more

Nationality：Taiwan, Province of China

Business entity：Japan Science and Technology Agency
Acceptance period： 2025.2 - 2025.12 （Period）：1 month or more

Nationality：France

Business entity：Japan Science and Technology Agency
Max Planck Institute

Acceptance period： 2025.2 - 2025.3 （Period）：1 month or more

Nationality：France

Business entity：Japan Science and Technology Agency

Travel Abroad

2016.12

Staying country name 1：France Staying institution name 1：INRIA Grenoble
2011.3 - 2011.7

Staying country name 1：China Staying institution name 1：Microsoft Research Asia
2010.2

Staying country name 1：Czech Republic Staying institution name 1：Center for Machine Perception (CMP)