Kyushu University Academic Staff Educational and Research Activities Database
Diego Gabriel Francis Thomas Last modified date:2023.06.06

Associate Professor / Graduate School of Information Science and Electrical Engineering
Department of Advanced Information Technology
Faculty of Information Science and Electrical Engineering

Academic Degree
Ph. D, Master (France)
Country of degree conferring institution (Overseas)
Yes Bachelor Master
Outline Activities
My research is related to the automatic 3D modeling of static and dynamic indoor scenes using consumer-grade RGB-D cameras. At first, I focused on the 3D registration task (i.e., how to align two parts of the same object captured from different viewpoints). I then focused on deriving more suitable 3D models (i.e., 3D mathematical representations) for efficiently fusing videos of (aligned) depth images for large-scale, real-time 3D modeling of indoor scenes. Recently, my research has aimed at handling animation in the 3D modeling process. More precisely, I have focused on the animation and 3D modeling of the human body.

-Year 2017-Now: Assistant professor at Kyushu University, Fukuoka (Japan)
-Year 2015-2017: JSPS Post-doc researcher at Kyushu University, Fukuoka (Japan)
-Year 2012-2015: Post-doc researcher at the National Institute of Informatics, Tokyo (Japan).
-Year 2009-2012: PhD course at the National Institute of Informatics, Tokyo (Japan) as a student of SOKENDAI. (Best student award)
-Year 2005-2008: Master course at the ENSIMAG-INPG (Engineering school of Computer Science and Mathematics), Grenoble, option IRV (Image and Virtual Reality). Diploma with honors.
-Year 2003-2005: Two-year intensive undergraduate-level course in science and mathematics, in preparation for the competitive entrance exams to French engineering schools, at Lycée Champollion, Grenoble.
Research Interests
  • Aerial-based outdoor 3D scene mapping
    keyword : aerial drone; RGB-D SLAM; outdoor scene
  • AI-based avatar animation synthesis
    keyword : deep learning; avatar animation; dense deformation; texture
  • 3D shape from a single image
    keyword : Deep learning, 3D shape estimation
  • Mediated Reality Agents for educational applications targeting young children
    keyword : Education support, Virtual Reality, Augmented Reality, Mixed Reality, Virtual Assistant, RGB-D camera.
  • High frame-rate 3D reconstruction with multiple cameras
    keyword : RGB-D camera; high frame rate; multi-view set-up; real time; distributed system; GPU optimization; volumetric reconstruction; fast and uncontrolled motion
  • Human body 3D reconstruction in dynamic scenes
    keyword : RGB-D camera; fast motion; skeleton; deforming bounding boxes; volumetric depth fusion; ICP; GPU optimization; large-scale scene
  • Facial 3D reconstruction and expression tracking
    keyword : RGB-D camera; Facial expression; Blendshape; Template mesh; Texturing; 3D modeling; Retargeting; Deviation mapping; Real-time.
  • 3D reconstruction of large-scale static indoor scenes using consumer-grade RGB-D cameras
    keyword : RGB-D camera; SLAM; Depth Fusion; 3D modeling; Camera tracking; Loop closure
Current and Past Projects
  • We investigate the use of CVT to jointly optimize 3D shape, appearance and discretization of the 3D space for high definition 3D mesh reconstruction from multi-view images.
  • The project is about real-time 3D reconstruction of large-scale dynamic scenes (i.e., scenes containing one or more moving objects to be modeled, possibly with shape deformation) from unmanned micro aerial vehicles. The objective is to investigate the fusion of multiple RGB and depth sensors mounted on multiple micro aerial vehicles for real-time 3D reconstruction of large-scale dynamic 3D scenes. The fundamental algorithms developed in this project will be used to build large-scale dynamic 3D models and to provide the necessary tools for real-time automatic dynamic 3D scene understanding.
  • This project is divided into three stages. The first stage roughly evaluates our base algorithm. The second stage evaluates the robustness of the overall reconstruction (expression) ability when transferring the facial impression of any person to any 3D avatar. The third stage improves the quality of the facial model (to provide a complete facial model, we need to add eyeballs and a mouth).
Academic Activities
1. Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi, TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6011-6020, 2020.06, Recovering the 3D shape of a person from its 2D appearance is ill-posed due to ambiguities. Nevertheless, with the help of convolutional neural networks (CNN) and prior knowledge on the 3D human body, it is possible to overcome such ambiguities to recover detailed 3D shapes of human bodies from single images. Current solutions, however, fail to reconstruct all the details of a person wearing loose clothes. This is because of either (a) huge memory requirement that cannot be maintained even on modern GPUs or (b) the compact 3D representation that cannot encode all the details. In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression. Our proposed model is compact, dense, accurate, and yet well suited for CNN-based regression task. Our proposed PCN allows us to learn the distribution of the TSDF in the tetrahedral volume from a single image in an end-to-end manner. Results show that our proposed method can reconstruct detailed shapes of humans wearing loose clothes from single RGB images.
2. Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, and Katsushi Ikeuchi, Human shape reconstruction with loose clothes from partially observed data by pose specific deformation, The 9th Pacific-Rim Symposium on Image and Video Technology, 2019.11, Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes. Such a model can be fitted to a point cloud of a person wearing loose clothes, however, it cannot represent the detailed shape of loose clothes, such as wrinkles and/or folds. In this paper, we propose a method that reconstructs 3D model of full-body human with loose clothes by reproducing the deformations as displacements from the skin-tight body mesh. We take advantage of a statistical shape model as base shape of full-body human mesh, and then, obtain displacements from the base mesh by non-rigid registration. To efficiently represent such displacements, we use lower dimensional embeddings of the deformations. This enables us to regress the coefficients corresponding to the small number of bases. We also propose a method to reconstruct shape only from a single 3D scanner, which is realized by shape fitting to only visible meshes as well as intra-frame shape interpolation. Our experiments with both unknown scene and partial body scans confirm the reconstruction ability of our proposed method.
3. Diego Thomas, Ekaterina Sirazitdinova, Akihiro Sugimoto, Rin-ichiro Taniguchi, Revisiting Depth Image Fusion with Variational Message Passing, International Conference on 3D Vision 2019, 2019.09, The running average approach has long been perceived as the best choice for fusing depth measurements captured by a consumer-grade RGB-D camera into a global 3D model. This strategy, however, assumes exact correspondences between points in a 3D model and points in the captured RGB-D images. Such assumption does not hold true in many cases because of errors in motion tracking, noise, occlusions, or inconsistent surface sampling during measurements. Accordingly, reconstructed 3D models suffer unpleasant visual artifacts. In this paper, we visit the depth fusion problem from a probabilistic viewpoint and formulate it as a probabilistic optimization using variational message passing in a Bayesian network. Our formulation enables us to fuse depth images robustly, accurately, and fast for high quality RGB-D keyframe creation, even if exact point correspondences are not always available. Our formulation also allows us to smoothly combine depth and color information for further improvements without increasing computational speed. The quantitative and qualitative comparative evaluation on built keyframes of indoor scenes show that our proposed framework achieves promising results for reconstructing accurate 3D models while using low computational power and being robust against misalignment errors without post-processing.
4. Remy Maxence, Hideaki Uchiyama, Hiroshi Kawasaki, Diego Thomas, Vincent Nozick, Hideo Saito, Dense 3D reconstruction by combining photometric stereo and key frame-based SLAM with a moving smartphone and its flashlight, International Conference on 3D Vision, 2019.09, The standard photometric stereo is a technique to densely reconstruct objects’ surfaces using light variation under the assumption of a static camera with a moving light source. In this work, we use photometric stereo to reconstruct dense 3D scenes while moving the camera and the light altogether. In such non-static case, camera poses as well as correspondences between pixels of each frame to apply photometric stereo are required. ORB-SLAM is a technique that can be used to estimate camera poses. To retrieve correspondences, our idea is to start from a sparse 3D mesh obtained with ORB-SLAM and then densify the mesh by a plane sweep method using a multi-view photometric consistency. By combining ORB-SLAM and photometric stereo, it is possible to reconstruct dense 3D scenes with an off-the-shelf smartphone and its embedded torchlight. Note that SLAM systems usually struggle with textureless objects, which is effectively compensated by the photometric stereo in our method. Experiments are conducted to show that our proposed method gives better results than SLAM alone or COLMAP, especially for partially textureless surfaces.
5. Hayato Onizuka, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi, Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes, Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.09, Blendshape models are commonly used to track and re-target facial expressions to virtual avatars using RGB-D cameras and without using any facial marker. When using blendshape models, the target avatar model must possess a set of key-shapes that can be blended depending on the estimated facial expression. Creating realistic set of key-shapes is extremely difficult and requires time and professional expertise. As a consequence, blendshape-based re-targeting technology can only be used with a limited amount of pre-built avatar models, which is not attractive for the large public. In this paper, we propose an automatic method to easily generate realistic key-shapes of any avatar that map directly to the source blendshape model (the user is only required to select a few facial landmarks on the avatar mesh). By doing so, captured facial motion can be easily re-targeted to any avatar, even when the avatar has largely different shape and topology compared with the source template mesh. Our experimental results show the accuracy of our proposed method compared with the state-of-the-art method for mesh deformation transfer.
6. Shih Hsuan Yao, Diego Thomas, Akihiro Sugimoto, Shang-Hong Lai, Rin-Ichiro Taniguchi, SegmentedFusion: 3D human body reconstruction using stitched bounding boxes, 2018 International Conference on 3D Vision (3DV), pages 190-198, 2018.09, This paper presents SegmentedFusion, a method possessing the capability of reconstructing non-rigid 3D models of a human body by using a single depth camera with skeleton information. Our method estimates a dense volumetric 6D motion field that warps the integrated model into the live frame by segmenting a human body into different parts and building a canonical space for each part. The key feature of this work is that a deformed and connected canonical volume for each part is created, and it is used to integrate data. The dense volumetric warp field of one volume is represented efficiently by blending a few rigid transformations. Overall, SegmentedFusion is able to scan a non-rigidly deformed human surface as well as to estimate the dense motion field by using a consumer-grade depth camera. The experimental results demonstrate that SegmentedFusion is robust against fast inter-frame motion and topological changes. Since our method does not require prior assumption, SegmentedFusion can be applied to a wide range of human motions.
7. Diego Thomas, Rin-Ichiro Taniguchi, Augmented blendshapes for real-time simultaneous 3d head modeling and facial motion capture, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3299-3308, 2016.06, We propose a method to build in real-time animated 3D head models using a consumer-grade RGB-D camera. Our framework is the first one to provide simultaneously comprehensive facial motion tracking and a detailed 3D model of the user’s head. Anyone’s head can be instantly reconstructed and his facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is free to move, talk and change his face expression as he wills otherwise. The facial motion is tracked using a blendshape representation while the fine geometric details are captured using a Bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on-the-fly and in real-time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
8. Diego Thomas, Akihiro Sugimoto, Range Image Registration Using a Photometric Metric under Unknown Lighting, IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 2252-2269, 2013.09, Based on the spherical harmonics representation of image formation, we derive a new photometric metric for evaluating the correctness of a given rigid transformation aligning two overlapping range images captured under unknown, distant, and general illumination. We estimate the surrounding illumination and albedo values of points of the two range images from the point correspondences induced by the input transformation. We then synthesize the color of both range images using albedo values transferred using the point correspondences to compute the photometric reprojection error. This way allows us to accurately register two range images by finding the transformation that minimizes the photometric reprojection error. We also propose a practical method using the proposed photometric metric to register pairs of range images devoid of salient geometric features, captured under unknown lighting. Our method uses a hypothesize-and-test strategy to search for the transformation that minimizes our photometric metric. Transformation candidates are efficiently generated by employing the spherical representation of each range image. Experimental results using both synthetic and real data demonstrate the usefulness of the proposed metric.
1. Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi, TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell, IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.06.
2. Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi, Human shape reconstruction with loose clothes from partially observed data by pose specific deformation, Pacific-Rim Symposium on Image and Video Technology, 2019.11.
Membership in Academic Society
  • IEEE
Educational Activities
I teach practical exercises in data science.
Students implement programs for their own research subjects, and we provide individual guidance during lecture time.
I guide students in the laboratory in their research for their Master's and Bachelor's theses.

I teach the class "Information Science" for first-year students of the University.
I teach the class "Programming in Python" for first-year students of the University.

I participate in the Kyuso project class of the University by giving a talk during the class.