Kyushu University Researcher Information
THOMAS DIEGO GABRIEL FRANCIS (Data updated: 2024.03.29)

Associate Professor / Faculty of Information Science and Electrical Engineering, Department of Advanced Information Technology, Real World Robotics Laboratory


Main Research Themes
Digital humans
Keywords: generative AI; 3D and 4D capture; motion retargeting; gesture
2023.01.
Aerial-based outdoor 3D scene mapping
Keywords: aerial drone; RGB-D SLAM; outdoor scene
2020.04~2022.04.
AI-based avatar animation synthesis
Keywords: deep learning; avatar animation; dense deformation; texture
2021.06~2022.06.
3D shape estimation from a single image
Keywords: deep learning; 3D shape estimation
2019.04~2021.08.
Virtual assistants for early childhood education
Keywords: 3D scene understanding; educational applications; 3D virtual assistants
2018.05~2020.06.
High-frame-rate 3D reconstruction with multiple cameras
Keywords: RGB-D camera; high frame rate; multi-view set-up; real time; distributed system; GPU optimization; volumetric reconstruction; fast and uncontrolled motion
2017.12~2018.02.
3D reconstruction of human bodies in dynamic scenes
Keywords: RGB-D camera; fast motion; skeleton; deforming bounding boxes; volumetric depth fusion; ICP; GPU optimization; large-scale scene
2017.04~2018.02.
3D facial reconstruction and expression tracking
Keywords: RGB-D camera; facial expression; blendshape; template mesh; texturing; 3D modeling; retargeting; deviation mapping; real time
2015.04~2018.02.
3D modeling
Keywords: RGB-D camera; SLAM; 3D modeling
2012.04~2017.04.
Ongoing Research Projects
NeRF-based multi-view 3D shape reconstruction using Centroidal Voronoi Tessellation
2022.04~2023.06, Principal investigator: Diego Thomas, Kyushu University (Japan)
This project investigates the use of CVT to jointly optimize the 3D shape, appearance, and discretization of 3D space in order to reconstruct high-resolution 3D meshes from multi-view images.
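The core CVT operation can be illustrated with Lloyd relaxation, which alternates between assigning samples to their nearest site and moving each site to the centroid of its region. The sketch below is a generic, minimal version of this idea, not the project's actual joint optimization; all names and data are hypothetical.

```python
# Minimal Lloyd relaxation toward a centroidal Voronoi tessellation (CVT).
# Illustrative sketch only: the project jointly optimizes shape, appearance,
# and discretization, which is far richer than this toy example.
import numpy as np
from scipy.spatial import cKDTree

def lloyd_cvt(sites, samples, n_iters=50):
    """Move each site to the centroid of the samples closest to it."""
    for _ in range(n_iters):
        nearest = cKDTree(sites).query(samples)[1]  # closest site per sample
        for i in range(len(sites)):
            members = samples[nearest == i]
            if len(members) > 0:
                sites[i] = members.mean(axis=0)     # centroid update
    return sites

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(20000, 3))    # dense samples of the domain
sites = rng.uniform(0.0, 1.0, size=(64, 3))         # initial Voronoi sites
sites = lloyd_cvt(sites, samples)                   # sites spread out evenly
```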
Multi-view 3D pedestrian localisation
2021.03~2023.04, Principal investigator: Joao Paulo Lima, University of Pernambuco (Brazil)
This project is about identifying, localizing, and tracking pedestrians in 3D from multi-view videos.
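Localizing a person in 3D from multiple calibrated views reduces, at its simplest, to triangulating matched 2D detections. The following is a textbook direct linear transform (DLT) sketch, not the project's specific method; the toy cameras are hypothetical.

```python
# Standard DLT triangulation: recover a 3D point from its 2D projections in
# several calibrated views by solving a homogeneous least-squares system.
import numpy as np

def triangulate(points_2d, proj_mats):
    """Stack two linear equations per view and solve by SVD."""
    A = []
    for (u, v), P in zip(points_2d, proj_mats):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]
    return X[:3] / X[3]                              # de-homogenize

# Two toy pinhole cameras observing the point (0, 0, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted on x
X_true = np.array([0.0, 0.0, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate([x1, x2], [P1, P2]))               # ~ [0, 0, 5]
```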
Realistic environment rendering with real humans for architecture project visualization
2021.04~2022.05, Principal investigator: Diego Thomas, Kyushu University
This is a joint project with Professor Koga (architectural design) and Professor Ochiai (Maths for Industry) about generating immersive virtual environments of architectural projects to support design and evaluation.
Deep human avatar animation
2021.05~2022.05, Principal investigator: Diego Thomas, Kyushu University, Japan
This is a joint research project with HUAWEI about learning to generate avatar animations from 2D videos in real-time.
Dynamic human motion tracking using dual quaternion algebra
2020.07~2022.03, Principal investigator: Stephane Breuil, National Institute of Informatics, Japan
Joint research project with Vincent Nozick from Gustave-Eiffel University in France. This project is about reconstructing the non-rigid motion of human bodies captured by RGB-D cameras.
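As background, a dual quaternion encodes a rigid transform in eight numbers, and weighted sums of dual quaternions (followed by re-normalization) blend rigid motions smoothly. A minimal dual quaternion blending (DQB) sketch follows; it is generic background, not the project's formulation.

```python
# Minimal dual quaternion blending (DQB) of rigid transforms.
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dual_quat(q_rot, t):
    """Rigid transform (rotation quaternion, translation) -> dual quaternion."""
    q_dual = 0.5 * qmul(np.array([0.0, *t]), q_rot)
    return q_rot, q_dual

def blend(dqs, weights):
    """Weighted sum of dual quaternions, re-normalized by the real part."""
    real = sum(w * dq[0] for dq, w in zip(dqs, weights))
    dual = sum(w * dq[1] for dq, w in zip(dqs, weights))
    n = np.linalg.norm(real)
    return real / n, dual / n

# Blend a pure translation with a 90-degree rotation about z, half-half.
qz = np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)])
dq1 = dual_quat(np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
dq2 = dual_quat(qz, np.array([0.0, 0.0, 0.0]))
real, dual = blend([dq1, dq2], [0.5, 0.5])
```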
Weakly-supervised human 3D body shape estimation from single images
2020.09~2021.08, Principal investigator: Jane Wu, Stanford University, U.S.A.
We are working on a solution to learn to estimate the 3D shape of human bodies from 2D observations in an unsupervised manner.
Personalized avatars with real emotions for next generation holoportation systems
2020.01~2021.01, Principal investigator: Diego Thomas, Kyushu University, Microsoft Research Asia
Personalized avatars are the key to more natural communication in virtual spaces. If you can express yourself not only with your own voice but also with your own body, expressions, and emotions, you can communicate better. This is also a powerful way to avoid being deceived by fake characters, and there is huge demand for realistic avatars and emotes, with a significant business opportunity. When communicating in a virtual space it is important to transmit real expressions and real emotions, but it is also important to preserve the possibility of remaining anonymous. While ultra-realistic avatars that reproduce someone's own appearance, skin, and face will surely break anonymity, body motion and gesture can convey a large part of real expressions and emotions without revealing a person's identity. In this project, we aim to capture full-body 3D motion and fine gestures and re-target them into a mixed reality telepresence system (also called holoportation) deployed on the Microsoft HoloLens. To achieve our objective there are three main challenges to tackle: (1) detailed 3D motion of the human body must be captured from standard RGB cameras; (2) the human motion must be faithfully re-targeted to a virtual avatar, which may have different animation characteristics than the human; (3) the avatar must be displayed in 3D with the HoloLens while accounting for the surrounding illumination conditions. The fundamental findings unveiled in this project will provide new insights into human motion estimation, re-targeting to bodies with different kinematics, and environment mapping with mixed reality devices.
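As a toy illustration of challenge (2), re-targeting in its most basic form keeps the source's joint rotations and rescales the global translation to the target's proportions. The sketch below uses a hypothetical data layout and omits everything that makes the real problem hard (different kinematics, fine gestures).

```python
# Toy motion re-targeting: copy local joint rotations to the target skeleton,
# scale the root trajectory by the height ratio. Hypothetical data layout.
import numpy as np

def retarget(src_rotations, src_root_pos, src_height, tgt_height):
    """Keep joint rotations; scale global translation by the height ratio."""
    scale = tgt_height / src_height
    return src_rotations.copy(), src_root_pos * scale

# One frame of a 24-joint pose: per-joint rotation matrices plus root position.
src_rotations = np.tile(np.eye(3), (24, 1, 1))
tgt_rotations, tgt_root = retarget(src_rotations,
                                   src_root_pos=np.array([0.0, 0.9, 0.0]),
                                   src_height=1.80, tgt_height=1.20)
```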
Unifying multiple RGB and depth cameras for real-time large-scale dynamic 3D modeling with unmanned micro aerial vehicles.
2019.04~2021.04, Principal investigator: Diego Thomas, Kyushu University, KAKENHI
This project concerns real-time 3D reconstruction of large-scale dynamic scenes from unmanned micro aerial vehicles. The objective is to investigate the fusion of multiple RGB and depth images captured by multiple micro aerial vehicles for real-time 3D reconstruction of large-scale dynamic 3D scenes. The fundamental methods developed here will be used to build large-scale dynamic 3D models and to provide the algorithms needed to understand dynamic 3D scenes in real time.
Facial motion capture
2017.10~2018.09, Principal investigator: Sun Fujiang, President of Huawei Japan Research Center, Huawei Technologies Japan K.K. (China)
This project is divided into three stages: the first stage roughly evaluates our base algorithm; the second stage evaluates the robustness of the overall reconstruction (expression) capability of facial expression transfer from any person to any 3D avatar; and the third stage improves facial model quality (to provide a complete facial model, we need to add the eyeballs and mouth).
Research Achievements
Selected Original Papers
1. Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi, TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6011-6020, 2020.06, Recovering the 3D shape of a person from its 2D appearance is ill-posed due to ambiguities. Nevertheless, with the help of convolutional neural networks (CNN) and prior knowledge on the 3D human body, it is possible to overcome such ambiguities to recover detailed 3D shapes of human bodies from single images. Current solutions, however, fail to reconstruct all the details of a person wearing loose clothes. This is because of either (a) huge memory requirements that cannot be met even on modern GPUs or (b) a compact 3D representation that cannot encode all the details. In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression. Our proposed model is compact, dense, accurate, and yet well suited for CNN-based regression tasks. Our proposed PCN allows us to learn the distribution of the TSDF in the tetrahedral volume from a single image in an end-to-end manner. Results show that our proposed method can reconstruct detailed shapes of humans wearing loose clothes from single RGB images.
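For readers unfamiliar with the representation, a (projective) TSDF stores at each sample point the clamped difference between the observed depth and the point's depth along the camera ray. A generic sketch follows; the tetrahedral outer shell and the PCN regressor of the paper are not reproduced here.

```python
# Minimal projective TSDF: for each 3D sample point, compare its depth along
# the camera ray with the depth map and truncate the difference to [-1, 1].
import numpy as np

def projective_tsdf(voxels_cam, depth, K, trunc=0.05):
    """voxels_cam: (N,3) points in camera coords; depth: (H,W) depth map."""
    z = voxels_cam[:, 2]
    uv = (K @ voxels_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sdf = np.full(len(z), np.nan)
    sdf[valid] = depth[v[valid], u[valid]] - z[valid]  # + in front of surface
    return np.clip(sdf / trunc, -1.0, 1.0)             # truncate to [-1, 1]

K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 2.0)                       # flat wall 2 m away
voxels = np.array([[0.0, 0.0, 1.9], [0.0, 0.0, 2.0], [0.0, 0.0, 2.1]])
print(projective_tsdf(voxels, depth, K))               # ~ [1.0, 0.0, -1.0]
```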
2. Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, and Katsushi Ikeuchi, Human shape reconstruction with loose clothes from partially observed data by pose specific deformation, The 9th Pacific-Rim Symposium on Image and Video Technology, 2019.11, Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes. Such a model can be fitted to a point cloud of a person wearing loose clothes; however, it cannot represent the detailed shape of loose clothes, such as wrinkles and/or folds. In this paper, we propose a method that reconstructs a 3D model of a full-body human with loose clothes by reproducing the deformations as displacements from the skin-tight body mesh. We take advantage of a statistical shape model as the base shape of the full-body human mesh, and then obtain displacements from the base mesh by non-rigid registration. To efficiently represent such displacements, we use lower-dimensional embeddings of the deformations. This enables us to regress the coefficients corresponding to the small number of bases. We also propose a method to reconstruct shape from only a single 3D scanner, which is realized by shape fitting to only visible meshes as well as intra-frame shape interpolation. Our experiments with both unknown scenes and partial body scans confirm the reconstruction ability of our proposed method.
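The lower-dimensional embedding idea can be sketched with plain PCA: per-vertex displacements from the base mesh are compressed to a few coefficients that are cheap to regress. The data shapes below are hypothetical.

```python
# PCA embedding of per-vertex displacement fields: only K coefficients per
# frame need to be regressed instead of V*3 raw offsets. Shapes hypothetical.
import numpy as np

F, V, K = 200, 6890, 16                          # frames, vertices, modes
rng = np.random.default_rng(0)
D = rng.normal(size=(F, V * 3))                  # flattened displacement fields

mean = D.mean(axis=0)
_, _, vt = np.linalg.svd(D - mean, full_matrices=False)
basis = vt[:K]                                   # K principal displacement modes

coeffs = (D[0] - mean) @ basis.T                 # encode one frame: K numbers
recon = mean + coeffs @ basis                    # decode back to V*3 offsets
```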
3. Diego Thomas, Ekaterina Sirazitdinova, Akihiro Sugimoto, Rin-ichiro Taniguchi, Revisiting Depth Image Fusion with Variational Message Passing, International Conference on 3D Vision (3DV 2019), 2019.09, The running average approach has long been perceived as the best choice for fusing depth measurements captured by a consumer-grade RGB-D camera into a global 3D model. This strategy, however, assumes exact correspondences between points in a 3D model and points in the captured RGB-D images. Such an assumption does not hold true in many cases because of errors in motion tracking, noise, occlusions, or inconsistent surface sampling during measurements. Accordingly, reconstructed 3D models suffer from unpleasant visual artifacts. In this paper, we revisit the depth fusion problem from a probabilistic viewpoint and formulate it as a probabilistic optimization using variational message passing in a Bayesian network. Our formulation enables us to fuse depth images robustly, accurately, and quickly for high-quality RGB-D keyframe creation, even if exact point correspondences are not always available. It also allows us to smoothly combine depth and color information for further improvements without increasing computational cost. Quantitative and qualitative comparative evaluations on built keyframes of indoor scenes show that our proposed framework achieves promising results for reconstructing accurate 3D models while using low computational power and being robust against misalignment errors without post-processing.
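The running-average baseline that the paper revisits can be written in a few lines: each model point keeps an incrementally updated weighted mean of its depth measurements. The sketch shows that baseline only, not the variational message passing formulation.

```python
# Running-average depth fusion for a single model point: an incremental
# weighted mean with a weight cap so the model stays adaptive over time.
import numpy as np

def fuse_running_average(value, weight, new_value, new_weight=1.0, w_max=100.0):
    """Return the updated weighted mean and the (capped) accumulated weight."""
    fused = (weight * value + new_weight * new_value) / (weight + new_weight)
    return fused, min(weight + new_weight, w_max)

d, w = 0.0, 0.0
for measurement in [1.02, 0.98, 1.01, 0.99]:     # noisy depths for one point
    d, w = fuse_running_average(d, w, measurement)
print(d)                                          # ~1.0
```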
4. Remy Maxence, Hideaki Uchiyama, Hiroshi Kawasaki, Diego Thomas, Vincent Nozick, Hideo Saito, Dense 3D reconstruction by combining photometric stereo and key frame-based SLAM with a moving smartphone and its flashlight, International Conference on 3D Vision, 2019.09, Standard photometric stereo is a technique to densely reconstruct objects' surfaces using light variation under the assumption of a static camera with a moving light source. In this work, we use photometric stereo to reconstruct dense 3D scenes while moving the camera and the light together. In such a non-static case, camera poses as well as correspondences between the pixels of each frame are required to apply photometric stereo. ORB-SLAM is a technique that can be used to estimate camera poses. To retrieve correspondences, our idea is to start from a sparse 3D mesh obtained with ORB-SLAM and then densify the mesh by a plane sweep method using multi-view photometric consistency. By combining ORB-SLAM and photometric stereo, it is possible to reconstruct dense 3D scenes with an off-the-shelf smartphone and its embedded torchlight. Note that SLAM systems usually struggle with textureless objects, which is effectively compensated for by the photometric stereo in our method. Experiments show that our proposed method gives better results than SLAM alone or COLMAP, especially for partially textureless surfaces.
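For context, classic Lambertian photometric stereo recovers a surface normal and albedo at a pixel from three or more images under known light directions by solving a small least-squares system. The sketch below is this textbook setup, not the paper's moving-light, SLAM-coupled formulation.

```python
# Classic Lambertian photometric stereo for one pixel: with known light
# directions L and observed intensities I, solve I = L @ (albedo * n).
import numpy as np

L = np.array([[0.0, 0.0, 1.0],          # three known light directions
              [0.7, 0.0, 0.7],
              [0.0, 0.7, 0.7]])
n_true = np.array([0.0, 0.6, 0.8])      # ground-truth unit normal, albedo 0.5
I = L @ (0.5 * n_true)                  # synthesized observed intensities

g, *_ = np.linalg.lstsq(L, I, rcond=None)
albedo = np.linalg.norm(g)              # |g| gives the albedo
normal = g / albedo                     # direction gives the normal
print(albedo, normal)                   # ~0.5, ~[0, 0.6, 0.8]
```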
5. Hayato Onizuka, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi, Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes, Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.09, Blendshape models are commonly used to track and re-target facial expressions to virtual avatars using RGB-D cameras and without using any facial markers. When using blendshape models, the target avatar model must possess a set of key-shapes that can be blended depending on the estimated facial expression. Creating a realistic set of key-shapes is extremely difficult and requires time and professional expertise. As a consequence, blendshape-based re-targeting technology can only be used with a limited number of pre-built avatar models, which is not attractive for the general public. In this paper, we propose an automatic method to easily generate realistic key-shapes of any avatar that map directly to the source blendshape model (the user is only required to select a few facial landmarks on the avatar mesh). By doing so, captured facial motion can be easily re-targeted to any avatar, even when the avatar has a largely different shape and topology compared with the source template mesh. Our experimental results show the accuracy of our proposed method compared with the state-of-the-art method for mesh deformation transfer.
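The blendshape equation that such re-targeting relies on is simple: a facial pose is the neutral mesh plus a weighted sum of key-shape offsets. The sketch below shows only this blending step with hypothetical shapes; the paper's contribution is generating the key-shapes automatically for arbitrary avatars.

```python
# Basic blendshape evaluation: pose = neutral + sum_i w_i * (keyshape_i - neutral).
import numpy as np

def blend(neutral, keyshapes, weights):
    """neutral: (V,3); keyshapes: (K,V,3); weights: (K,) in [0,1]."""
    offsets = keyshapes - neutral[None]           # per-key-shape displacement
    return neutral + np.tensordot(weights, offsets, axes=1)

V, K = 5000, 52                                   # hypothetical mesh, 52 shapes
rng = np.random.default_rng(0)
neutral = rng.normal(size=(V, 3))
keyshapes = neutral[None] + 0.01 * rng.normal(size=(K, V, 3))
pose = blend(neutral, keyshapes, rng.uniform(0, 1, size=K))
```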
6. Shih Hsuan Yao, Diego Thomas, Akihiro Sugimoto, Shang-Hong Lai, Rin-Ichiro Taniguchi, SegmentedFusion: 3D human body reconstruction using stitched bounding boxes, 2018 International Conference on 3D Vision (3DV), pages 190-198, 2018.09, This paper presents SegmentedFusion, a method capable of reconstructing non-rigid 3D models of a human body using a single depth camera with skeleton information. Our method estimates a dense volumetric 6D motion field that warps the integrated model into the live frame by segmenting the human body into different parts and building a canonical space for each part. The key feature of this work is that a deformed and connected canonical volume for each part is created and used to integrate data. The dense volumetric warp field of one volume is represented efficiently by blending a few rigid transformations. Overall, SegmentedFusion is able to scan a non-rigidly deforming human surface as well as estimate the dense motion field using a consumer-grade depth camera. The experimental results demonstrate that SegmentedFusion is robust against fast inter-frame motion and topological changes. Since our method does not require prior assumptions, SegmentedFusion can be applied to a wide range of human motions.
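The idea of representing a dense warp field by blending a few rigid transformations can be sketched as follows: each point is warped by a distance-weighted combination of 4x4 rigid matrices. Linear blending and Gaussian weights are assumptions of this sketch, not necessarily the paper's exact scheme.

```python
# Dense warp field from a few rigid node transforms: blend 4x4 matrices with
# distance-based weights, then apply the blended transform to each point.
import numpy as np

def warp_points(points, transforms, node_centers, sigma=0.1):
    """points: (N,3); transforms: (M,4,4); node_centers: (M,3)."""
    d2 = ((points[:, None] - node_centers[None]) ** 2).sum(-1)   # (N, M)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)                            # normalize
    blended = np.einsum('nm,mij->nij', w, transforms)            # (N, 4, 4)
    homog = np.hstack([points, np.ones((len(points), 1))])
    out = np.einsum('nij,nj->ni', blended, homog)
    return out[:, :3]

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(100, 3))
T = np.tile(np.eye(4), (8, 1, 1))                # 8 identity node transforms
nodes = rng.uniform(-1, 1, size=(8, 3))
out = warp_points(pts, T, nodes)                 # identity blend -> unchanged
```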
7. Diego Thomas, Rin-Ichiro Taniguchi, Augmented blendshapes for real-time simultaneous 3D head modeling and facial motion capture, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3299-3308, 2016.06, We propose a method to build animated 3D head models in real time using a consumer-grade RGB-D camera. Our framework is the first to simultaneously provide comprehensive facial motion tracking and a detailed 3D model of the user's head. Anyone's head can be instantly reconstructed and their facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is otherwise free to move, talk, and change facial expression at will. The facial motion is tracked using a blendshape representation while fine geometric details are captured using a bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on-the-fly and in real time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
8. Diego Thomas, Akihiro Sugimoto, Range Image Registration Using a Photometric Metric under Unknown Lighting, IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 2252-2269, 2013.09, Based on the spherical harmonics representation of image formation, we derive a new photometric metric for evaluating the correctness of a given rigid transformation aligning two overlapping range images captured under unknown, distant, and general illumination. We estimate the surrounding illumination and the albedo values of points of the two range images from the point correspondences induced by the input transformation. We then synthesize the color of both range images using albedo values transferred via the point correspondences to compute the photometric reprojection error. This allows us to accurately register two range images by finding the transformation that minimizes the photometric reprojection error. We also propose a practical method using the proposed photometric metric to register pairs of range images devoid of salient geometric features, captured under unknown lighting. Our method uses a hypothesize-and-test strategy to search for the transformation that minimizes our photometric metric. Transformation candidates are efficiently generated by employing the spherical representation of each range image. Experimental results using both synthetic and real data demonstrate the usefulness of the proposed metric.
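The image-formation model behind the metric is that, under distant illumination, a Lambertian point's color is its albedo times a low-order spherical harmonics expansion of the lighting evaluated at its normal. A simplified first-order (4-coefficient) sketch of the resulting photometric error follows; the paper develops the full metric and registration strategy on top of this.

```python
# First-order spherical harmonics shading and a per-point photometric error.
# Simplified sketch: SH normalization constants are folded into the lighting
# coefficients, which are hypothetical values here.
import numpy as np

def sh_irradiance(normal, light_coeffs):
    """First-order SH lighting with basis [1, nx, ny, nz]."""
    basis = np.array([1.0, normal[0], normal[1], normal[2]])
    return basis @ light_coeffs

def photometric_error(albedo_src, normal_tgt, color_tgt, light):
    """Synthesize the target color with the source albedo and compare."""
    return (albedo_src * sh_irradiance(normal_tgt, light) - color_tgt) ** 2

light = np.array([1.0, 0.2, 0.3, 0.4])           # estimated illumination
print(photometric_error(0.5, np.array([0.0, 0.0, 1.0]), 0.7, light))
```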
Selected Conference Presentations
1. Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi, TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell, IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.06.
2. Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi, Human shape reconstruction with loose clothes from partially observed data by pose specific deformation, Pacific-Rim Symposium on Image and Video Technology, 2019.11.
Academic Society Activities
Society Memberships
IEEE
Roles at Academic Conferences, Meetings, and Symposia
2024.08.06~2024.08.09, The 27th Meeting on Image Recognition and Understanding (MIRU 2024), Area chair.
2022.02.22~2022.03.01, AAAI 2022, Senior Program Committee.
2022.06.19~2022.06.24, CVPR 2022, Program committee.
2021.03, WACV 2021, Program committee.
2021.08, 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Senior Program Committee Member.
2021.06, CVPR 2021, Program committee.
2021.03, The 83rd National Convention of IPSJ, Session chair.
2020.11.25~2020.11.28, 3D Vision (3DV 2020), Local chair.
2020.06.14~2020.06.19, CVPR 2020, Program committee.
2020.03.01~2020.03.05, WACV 2020, Program committee.
2019.11.20~2019.11.23, Machine Perception and Robotics (MPR 2019), Program chair.
2019.11.18~2019.11.22, The 9th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2019), Area chair.
2018.05.14~2018.05.15, The 12th International Workshop on Information Search, Integration, and Personalization (ISIP 2018), Publicity chair.
2017.09.25~2017.09.26, JFLI-KYUDAI JOINT WORKSHOP ON INFORMATICS, Local arrangement chair.
2016.11.27~2016.12.01, SITIS2016, Program Committee.
2016.08.01~2016.08.04, MIRU2016, Program Committee.
2015.07.27~2015.07.30, MIRU2015, Program Committee.
2014.07.28~2014.07.31, MIRU2014, Program committee.
Peer Review of Academic Papers
Fiscal year | Foreign-language journals | Japanese-language journals | International conference proceedings | Domestic conference proceedings | Total
FY2023 | - | - | 25 | - | 31
FY2022 | - | - | 18 | - | 29
FY2020 | 25 | - | - | - | 35
FY2019 | 15 | - | 25 | - | 40
FY2018 | 20 | - | 20 | - | 40
FY2017 | 10 | - | 24 | - | 34
FY2016 | - | - | 19 | - | 22
FY2015 | - | - | 12 | - | 13
Other Research Activities
Overseas Visits and Education/Research Experience Abroad
Center for Machine Perception (CMP), Czech Republic, 2010.02~2010.02.
INRIA Grenoble, France, 2016.12~2016.12.
Microsoft Research Asia, China, 2011.03~2011.07.
Awards
Best poster presentation award, Machine Perception and Robotics (MPR 2019), 2019.11.
Best paper award, The 9th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2019), 2019.11.
Best poster award, IW-FCV2019, 2019.02.
Outstanding research achievement and contribution to ASPCIT 2019 Annual Meeting Invited Presentation, Asia Pacific Society for Computing and Information Technology, 2019.07.
Outstanding reviewer, MIRU 2015, 2015.07.
Best student award, National Institute of Informatics, 2012.03.
Research Funding
Grants-in-Aid for Scientific Research (MEXT / JSPS)
FY2023~FY2025, Grant-in-Aid for Scientific Research (B), Principal investigator, A new data-driven approach to bring humanity into virtual worlds with computer vision.
FY2019~FY2020, Grant-in-Aid for Early-Career Scientists, Principal investigator, Unifying multiple RGB and depth cameras for real-time large-scale dynamic 3D modeling with unmanned micro aerial vehicles.
FY2015~FY2017, Grant-in-Aid for JSPS Fellows, Principal investigator, Large-scale and dynamic 3D reconstruction using an RGB-D camera.
JSPS Programs (other than Grants-in-Aid for Scientific Research)
FY2022~FY2023, JSPS Invitational Fellowships for Research in Japan (short term), Principal investigator, Multi-Camera 3D Pedestrian Detection with Domain Adaptation and Generalization.
Joint Research and Contract Research (excluding competitive funding)
2020.04~2021.03, Principal investigator, Human body 3D shape estimation, animation and gesture synthesis.
2021.06~2022.05, Principal investigator, AI-based animation of 3D avatars.
2017.09~2018.08, Collaborator, Facial motion capture system.
Internal University Funding
FY2021~FY2022, QR Tsubasa (Tsubasa Project), Principal investigator, A new approach for supporting architectural works with virtual reality environments.
FY2020~FY2022, SENTAN-Q, Principal investigator, Two years of training and international research.
FY2020~FY2020, QR Wakaba Challenge, Principal investigator, 3D shape estimation and motion retargeting from 2D videos for future holoportation systems.
FY2017~FY2018, Start-up support funds, Co-investigator, Free-form dynamic 3D scene reconstruction at high resolution.
