Kyushu University Academic Staff Educational and Research Activities Database
List of Presentations
Diego Gabriel Francis Thomas　Last modified date: 2024.03.29

Associate Professor / Graduate School of Information Science and Electrical Engineering / Department of Advanced Information Technology / Faculty of Information Science and Electrical Engineering


Presentations
1. Kazuto Ichimaru, Takaki Ikeda, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki, ActiveNeuS: Neural Signed Distance Fields for Active Stereo, International Conference on 3D Vision, 2024.03.
2. Takumi Kitamura, Naoya Iwamoto, Hiroshi Kawasaki, Diego Thomas, A Two-step Approach for Interactive Animatable Avatars, COMPUTER GRAPHICS INTERNATIONAL 2023, 2023.08, We propose a two-step human body animation technique that generates pose-dependent detailed deformations in real time in a standard animation pipeline. To accomplish real-time animation, we use a template-based approach and represent pose-dependent deformations with 2D displacement maps. To generalize to entirely new motions, we employ a two-step strategy: 1) the first step aligns the topology of the Skinned Multi-Person Linear (SMPL) model to our proposed template model; 2) the second step models the detailed deformation of clothes and muscles for the specific motion. Our experimental results show that our proposed method can animate an avatar up to 30 times faster than baselines while keeping a similar or even better level of detail.
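As an aside, the core idea of storing pose-dependent surface detail in a 2D displacement map can be made concrete in a few lines: sample the map at each template vertex's UV coordinate and offset the vertex along its normal. This is an illustrative toy, not the authors' pipeline; the function name, nearest-neighbour sampling and single-channel map are assumptions made for brevity:

```python
import numpy as np

def apply_displacement_map(verts, normals, uvs, disp_map):
    """Offset template vertices along their normals by values sampled
    from a 2D displacement map at each vertex's UV coordinate.

    verts, normals: (N, 3) arrays; uvs: (N, 2) in [0, 1]; disp_map: (H, W)."""
    h, w = disp_map.shape
    # Nearest-neighbour lookup of the displacement value for each vertex.
    px = np.clip((uvs[:, 0] * (w - 1)).astype(int), 0, w - 1)
    py = np.clip((uvs[:, 1] * (h - 1)).astype(int), 0, h - 1)
    d = disp_map[py, px]                  # (N,) scalar offsets
    return verts + normals * d[:, None]   # displace along the normals

# Toy example: a flat patch pushed up by a constant-valued map.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
uvs = np.array([[0.0, 0.0], [1.0, 0.0]])
out = apply_displacement_map(verts, normals, uvs, np.full((4, 4), 0.1))
print(out)  # each vertex lifted 0.1 along +z
```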
3. Takumi Kitamura, Hitoshi Teshima, Diego Thomas, Hiroshi Kawasaki, Refining OpenPose With a New Sports Dataset for Robust 2D Pose Estimation, IEEE/CVF Winter Conference on Applications of Computer Vision, 2022.01, [URL], 3D marker-less motion capture can be achieved by triangulating estimated multi-view 2D poses. However, when 2D pose estimation fails, the 3D motion capture fails as well. This is particularly challenging for the sports performances of athletes, which involve extreme poses. In extreme poses (such as having the head down), state-of-the-art 2D pose estimators such as OpenPose do not work at all. In this paper, we propose a new method to improve the training of 2D pose estimators for extreme poses by leveraging a new sports dataset and our proposed data augmentation strategy. Our results show significant improvements over previous methods for 2D pose estimation of athletes performing acrobatic moves, while keeping state-of-the-art performance on standard datasets.
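The kind of data augmentation motivated above (synthesising extreme orientations, such as head-down poses, from upright training images) can be illustrated by rotating 2D keypoints about the image centre. A minimal sketch, not the paper's actual augmentation code; the function name and the 180-degree example are illustrative:

```python
import numpy as np

def rotate_keypoints(kpts, angle_deg, center):
    """Rotate 2D keypoints around an image centre; a common augmentation
    for synthesising upside-down or sideways poses from upright ones."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return (kpts - center) @ R.T + center  # rotate about the centre

kpts = np.array([[100.0, 50.0]])           # one keypoint above the centre
center = np.array([100.0, 100.0])
flipped = rotate_keypoints(kpts, 180.0, center)
print(flipped)  # -> approximately [[100., 150.]]
```

In practice the same rotation is applied to the image so that keypoint annotations stay consistent with the pixels.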
4. Cheng Sun, Diego Thomas, Hiroshi Kawasaki, Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video, 25th International Conference on Pattern Recognition (ICPR), 2021.01.
5. Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi, TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell, IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.06.
6. Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi, Human shape reconstruction with loose clothes from partially observed data by pose specific deformation, Pacific-Rim Symposium on Image and Video Technology, 2019.11.
7. Diego Thomas, 3D human body reconstruction using RGB-D camera, Asia Pacific Society for Computing and Information Technology 2019 Annual Meeting, 2019.07, Consumer-grade RGB-D cameras have become the commodity tool for building dense 3D models of indoor scenes. Motivated by the strong demand for high-fidelity personal 3D avatars, many efforts are now being made to use RGB-D cameras to automatically reconstruct high-quality 3D models of the human body. This is a very difficult task because the human body moves non-rigidly during the scanning process. How to simultaneously reconstruct the detailed 3D shape of the body while accurately tracking the non-rigid motion is the main challenge that all successful systems must solve. In addition, to be usable on portable devices such as smartphones, solutions that require little memory and low computational power are needed. In this talk, we first briefly review existing successful strategies for real-time 3D human body reconstruction. Then, we present our proposed solution for 3D human body reconstruction that is light in memory consumption and computational power. Our main idea is to separate the full-body non-rigid reconstruction into multiple nearly rigid reconstructions of body parts that are tightly stitched together.
8. Diego Thomas, VMPFusion: Variational Message Passing for dynamic 3D face reconstruction, IDS/JFLI workshop, 2018.05, In this talk I describe a probabilistic approach for dynamic 3D face modeling using a consumer-grade RGB-D camera. The goal of this research is to formulate a strategy to fuse noisy 3D measurements captured with a Kinect camera into a 3D facial model without relying on explicit point correspondences. We propose to tackle this challenging problem with the Variational Message Passing (VMP) algorithm, which optimizes a variational distribution using a message-passing procedure on a graphical model. We show the validity of our formulation with real-data experiments.
9. David R. Walton, Diego Gabriel Francis Thomas, Anthony Steed, Akihiro Sugimoto, Synthesis of environment maps for mixed reality, 16th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2017, 2017.11, When rendering virtual objects in a mixed reality application, it is helpful to have access to an environment map that captures the appearance of the scene from the perspective of the virtual object. It is straightforward to render virtual objects into such maps, but capturing and correctly rendering the real components of the scene into the map is much more challenging. This information is often recovered from physical light probes, such as reflective spheres or fisheye cameras, placed at the location of the virtual object in the scene. For many application areas, however, real light probes would be intrusive or impractical. Ideally, all of the information necessary to produce detailed environment maps could be captured using a single device. We introduce a method using an RGB-D camera and a small fisheye camera, contained in a single unit, to create environment maps at any location in an indoor scene. The method combines the output from both cameras to correct for their limited field of view and the displacement from the virtual object, producing complete environment maps suitable for rendering the virtual content in real time. Our method improves on previous probeless approaches through its ability to recover high-frequency environment maps. We demonstrate how this can be used to render virtual objects which shadow, reflect and refract their environment convincingly.
10. Francesco Verdoja, Diego Gabriel Francis Thomas, Akihiro Sugimoto, Fast 3D point cloud segmentation using supervoxels with geometry and color for 3D scene understanding, 2017 IEEE International Conference on Multimedia and Expo, ICME 2017, 2017.08, Segmentation of 3D colored point clouds is a research field with renewed interest thanks to the recent availability of inexpensive consumer RGB-D cameras, and its importance as an unavoidable low-level step in many robotic applications. However, the nature of 3D data makes the task challenging and, thus, many different techniques have been proposed, all of which incur high computational costs. This paper presents a novel, fast method for segmentation of 3D colored point clouds. It starts with a supervoxel partitioning of the cloud, i.e., an oversegmentation of the points in the cloud. It then leverages a novel metric, exploiting both geometry and color, to iteratively merge the supervoxels into a 3D segmentation in which the hierarchical structure of partitions is maintained. The algorithm also has computational complexity linear in the size of the input. Experimental results on two publicly available datasets demonstrate that our proposed method outperforms state-of-the-art techniques.
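The iterative merging step described in the abstract can be sketched as a greedy agglomerative loop over adjacent supervoxels. This is an illustrative stand-in, not the paper's algorithm: the weighted normal-plus-color dissimilarity below and the plain feature averaging are assumptions, and the paper's actual metric and hierarchy bookkeeping are not reproduced:

```python
import numpy as np

def merge_supervoxels(normals, colors, edges, n_clusters, w_geo=0.5):
    """Greedily merge adjacent supervoxels until n_clusters remain.

    normals, colors: dicts mapping supervoxel id -> np.array(3,) (unit
    normal, mean color); edges: set of frozenset pairs of adjacent ids.
    The dissimilarity (normal misalignment + color distance) is a toy
    stand-in for the paper's geometry/color metric."""
    def dist(i, j):
        geo = 1.0 - abs(float(np.dot(normals[i], normals[j])))
        col = float(np.linalg.norm(colors[i] - colors[j]))
        return w_geo * geo + (1.0 - w_geo) * col

    edges = set(edges)
    while len(normals) > n_clusters and edges:
        i, j = sorted(min(edges, key=lambda e: dist(*e)))
        # Merge j into i: average the features (size weighting omitted).
        n_sum = normals[i] + normals[j]
        normals[i] = n_sum / np.linalg.norm(n_sum)
        colors[i] = (colors[i] + colors[j]) / 2.0
        del normals[j], colors[j]
        # Redirect j's edges to i; drop the merged edge (now a self-loop).
        edges = {frozenset(i if v == j else v for v in e) for e in edges}
        edges = {e for e in edges if len(e) == 2}
    return sorted(normals)

# Toy example: two coplanar red supervoxels adjacent to one blue one.
z, x = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
red, blue = np.array([60.0, 40.0, 30.0]), np.array([20.0, -30.0, -40.0])
ids = merge_supervoxels({0: z, 1: z.copy(), 2: x},
                        {0: red, 1: red.copy(), 2: blue},
                        {frozenset((0, 1)), frozenset((1, 2))}, 2)
print(ids)  # the two similar supervoxels merge first -> [0, 2]
```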
11. Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi, Augmented blendshapes for real-time simultaneous 3D head modeling and facial motion capture, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, 2016, We propose a method to build, in real time, animated 3D head models using a consumer-grade RGB-D camera. Our framework is the first to provide both comprehensive facial motion tracking and a detailed 3D model of the user's head. Anyone's head can be instantly reconstructed and their facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is otherwise free to move, talk and change facial expression at will. The facial motion is tracked using a blendshape representation, while the fine geometric details are captured using a bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on the fly and in real time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
12. Diego Thomas, Akihiro Sugimoto, 3D Modeling of Large-Scale Indoor Scenes Using RGB-D Cameras, The 1st International Conference on Advanced Imaging, 2015.06.
13. Diego Gabriel Francis Thomas, Akihiro Sugimoto, A two-stage strategy for real-time dense 3D reconstruction of large-scale scenes, 13th European Conference on Computer Vision, ECCV 2014, 2015.01, The frame-to-global-model approach is widely used for accurate 3D modeling from sequences of RGB-D images. Because no perfect camera tracking system yet exists, the accumulation of small errors generated when registering and integrating successive RGB-D images causes deformations of the 3D model being built. In particular, the deformations become significant when the scale of the scene to model is large. To tackle this problem, we propose a two-stage strategy to build a detailed large-scale 3D model with minimal deformations: the first stage creates accurate small-scale 3D scenes in real time from short subsequences of RGB-D images, while the second stage re-organises all the results of the first stage in a geometrically consistent manner to reduce deformations as much as possible. By employing planar patches as the 3D scene representation, our proposed method runs in real time and builds accurate 3D models with minimal deformations even for large-scale scenes. Our experiments using real data confirm the effectiveness of our proposed method.
14. Diego Thomas, Dense 3D reconstruction using RGB-D cameras, International Conference on 3D Vision 2014, 2014.12, The generation of fine 3D models from RGB-D (color plus depth) measurements is of great interest to the computer vision community. Although the 3D reconstruction pipeline has been widely studied over the last decades, a new era began recently with the advent of low-cost consumer depth cameras (called RGB-D cameras) that capture RGB-D images at video rate (e.g., Microsoft Kinect or Asus Xtion Pro). The introduction of 3D measurement to the general public has brought its own revolution to the scientific community, with many projects and applications using RGB-D cameras.

In this tutorial, we give an overview of existing 3D reconstruction methods that use a single RGB-D camera, covering various 3D representations: point-based representations (surfels), implicit volumetric representations (TSDF), patch-based representations and parametric representations. These different 3D scene representations give us powerful tools to build virtual representations of the real world in real time from RGB-D cameras. We can reconstruct not only small-scale static scenes but also large-scale scenes and dynamic scenes. We also discuss current trends in depth sensing and future challenges for 3D scene reconstruction.
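Of the representations mentioned in the tutorial, the TSDF is the easiest to illustrate. The sketch below shows one KinectFusion-style integration step: project every voxel centre into the depth image and update its truncated signed distance by a weighted running average. It is a simplified illustration, not any specific system's code; the axis-aligned grid at the origin, the lack of invalid-depth masking and the function name are all assumptions:

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, depth, K, pose, voxel_size, trunc):
    """One volumetric integration step in the KinectFusion style. The voxel
    grid is assumed axis-aligned at the world origin, and the depth map is
    assumed dense and valid everywhere (real systems mask invalid pixels)."""
    dims = tsdf.shape
    ii, jj, kk = np.meshgrid(*[np.arange(d) for d in dims], indexing="ij")
    pts = np.stack([ii, jj, kk], axis=-1).reshape(-1, 3) * voxel_size
    # World -> camera, then pinhole projection with intrinsics K.
    cam = (np.linalg.inv(pose) @ np.c_[pts, np.ones(len(pts))].T).T[:, :3]
    z = cam[:, 2]
    front = z > 1e-6
    u = np.zeros(len(pts), dtype=int)
    v = np.zeros(len(pts), dtype=int)
    u[front] = np.round(K[0, 0] * cam[front, 0] / z[front] + K[0, 2]).astype(int)
    v[front] = np.round(K[1, 1] * cam[front, 1] / z[front] + K[1, 2]).astype(int)
    h, w = depth.shape
    ok = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sdf = np.zeros(len(pts))
    sdf[ok] = depth[v[ok], u[ok]] - z[ok]   # signed distance along the ray
    ok &= sdf > -trunc                      # drop voxels far behind the surface
    new = np.clip(sdf / trunc, -1.0, 1.0)   # truncate to [-1, 1]
    f, wt = tsdf.reshape(-1).copy(), weights.reshape(-1).copy()
    f[ok] = (f[ok] * wt[ok] + new[ok]) / (wt[ok] + 1.0)  # running average
    wt[ok] += 1.0
    return f.reshape(dims), wt.reshape(dims)
```

Averaging per-voxel distances over many frames is what lets noisy individual depth maps fuse into a smooth surface, at the memory cost the planar-representation entries below aim to avoid.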
15. Michael Firman, Diego Gabriel Francis Thomas, Simon Julier, Akihiro Sugimoto, Learning to discover objects in RGB-D images using correlation clustering, 2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013, 2013.12, We introduce a method to discover objects from RGB-D image collections which does not require a user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver to both find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method has superior performance to traditional clustering methods.
16. Diego Gabriel Francis Thomas, Akihiro Sugimoto, A flexible scene representation for 3D reconstruction using an RGB-D camera, 2013 14th IEEE International Conference on Computer Vision, ICCV 2013, 2013, Updating a global 3D model with live RGB-D measurements has proven successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm were introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is memory-expensive when constructing and updating the global model. As a consequence, the method does not scale well to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes significantly reduces the size of the scene representation, which allows us to generate a global textured 3D model with lower memory requirements while remaining accurate and easy to update with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction while keeping scalability to large indoor scenes.
17. Pavan Kumar Anasosalu, Diego Gabriel Francis Thomas, Akihiro Sugimoto, Compact and accurate 3-D face modeling using an RGB-D camera: Let's open the door to 3-D video conference, 2013 14th IEEE International Conference on Computer Vision Workshops, ICCVW 2013, 2013, We present a method for producing an accurate and compact 3-D face model in real time using a low-cost RGB-D sensor such as the Kinect camera. We extend and use bump images for highly accurate, low-memory 3-D reconstruction of the human face. Bump images are generated by representing the Cartesian coordinates of points on the face in a spherical coordinate system whose origin is the center of the head. After initialization, the bump images are updated in real time with every RGB-D frame, with respect to the current viewing direction and head pose estimated using the frame-to-global-model registration strategy. While the high accuracy of the representation allows fine details to be recovered, the low memory use opens up new applications of consumer depth cameras such as 3-D video conferencing. We validate our approach by quantitatively comparing our result with that obtained by a commercial high-resolution laser scanner. We also discuss the potential of our proposed method for a 3-D video conferencing application at existing internet speeds.
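The spherical-coordinate mapping at the heart of the bump image representation can be made concrete in a few lines: a 3-D point is converted to azimuth and polar angle around the head centre, those two angles index a pixel, and the radius is the value stored there. A simplified sketch; the paper's actual resolution, axis conventions and update logic are not reproduced:

```python
import numpy as np

def point_to_bump_pixel(p, center, width, height):
    """Map a 3-D point on the face to a pixel of a spherical 'bump image':
    azimuth and polar angle around the head centre select the pixel, and
    the radius is the value that would be stored there."""
    d = p - center
    r = float(np.linalg.norm(d))
    theta = np.arctan2(d[1], d[0])          # azimuth in (-pi, pi]
    phi = np.arccos(d[2] / r)               # polar angle in [0, pi]
    u = int((theta + np.pi) / (2.0 * np.pi) * (width - 1))
    v = int(phi / np.pi * (height - 1))
    return u, v, r

# A point 1 m in front of the head centre along +x lands mid-image.
u, v, r = point_to_bump_pixel(np.array([1.0, 0.0, 0.0]), np.zeros(3), 360, 180)
print(u, v, r)  # -> 179 89 1.0
```

Because the face is roughly star-shaped around the head centre, a single radius per direction suffices, which is what makes the representation so compact.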
18. Diego Gabriel Francis Thomas, Akihiro Sugimoto, Illumination-free photometric metric for range image registration, 2012 IEEE Workshop on the Applications of Computer Vision, WACV 2012, 2012, This paper presents an illumination-free photometric metric for evaluating the goodness of a rigid transformation aligning two overlapping range images, under the assumption of Lambertian surfaces. Our metric is based on photometric re-projection error rather than on feature detection and matching. We synthesize the color of one image using the albedo of the other to compute the photometric re-projection error. The unknown illumination and albedo are estimated from the correspondences induced by the input transformation, using the spherical harmonics representation of image formation. This allows us to derive an illumination-free photometric metric for range image alignment. We use a hypothesize-and-test method to search for the transformation that minimizes our illumination-free photometric function. Transformation candidates are efficiently generated by employing the spherical representation of each image. Experimental results using synthetic and real data show the usefulness of the proposed metric.
19. Diego Gabriel Francis Thomas, Yasuyuki Matsushita, Akihiro Sugimoto, Robust simultaneous 3D registration via rank minimization, 2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2012, 2012, We present a robust and accurate 3D registration method for a dense sequence of depth images taken from unknown viewpoints. Our method simultaneously estimates the extrinsic parameters of all the depth images to obtain a registered full 3D model of the scanned scene. By arranging the depth measurements in matrix form, we formulate the problem as the simultaneous estimation of the extrinsics and of a low-rank matrix, which corresponds to the aligned depth images, together with a sparse error matrix. Unlike previous approaches that use sequential or heuristic global registration, our method uses an advanced convex optimization technique to obtain a robust solution via rank minimization. To achieve accurate computation, we develop a depth projection method with minimal sensitivity to sampling by reading projected depth values in the input depth images. We demonstrate the effectiveness of the proposed method through extensive experiments and compare it with previous standard techniques.
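The low-rank-plus-sparse decomposition underlying this formulation can be sketched with a basic inexact augmented Lagrange multiplier solver for principal component pursuit. This is a generic stand-in, not the paper's solver (which additionally estimates the extrinsics); the parameter choices below are common defaults, assumed for illustration:

```python
import numpy as np

def rpca(D, lam=None, tol=1e-7, max_iter=500):
    """Split D into a low-rank part L plus a sparse part S (principal
    component pursuit) with a basic inexact augmented Lagrange multiplier
    scheme: alternate singular value thresholding for L and entrywise
    soft thresholding for S."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    Y = np.zeros_like(D)
    S = np.zeros_like(D)
    mu = 1.25 / np.linalg.norm(D, 2)        # spectral norm of D
    rho = 1.5
    for _ in range(max_iter):
        # L-step: shrink the singular values of D - S + Y/mu.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft thresholding promotes sparsity.
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Y += mu * (D - L - S)
        mu *= rho
        if np.linalg.norm(D - L - S) <= tol * np.linalg.norm(D):
            break
    return L, S

# Toy example: a rank-1 matrix corrupted by one large sparse error.
rng = np.random.default_rng(0)
D = np.outer(rng.standard_normal(10), rng.standard_normal(10))
D[2, 3] += 10.0
L, S = rpca(D)  # S should absorb the spike at (2, 3)
```

In the registration setting, the low-rank matrix plays the role of the consistently aligned depth measurements and the sparse matrix absorbs outliers such as occlusions.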
20. Diego Gabriel Francis Thomas, Akihiro Sugimoto, Robust range image registration using local distribution of albedo, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, 2009, We propose a robust registration method for range images under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, they are promising for the registration of range images of objects that lack discriminative geometric features under variable illumination. In our method, we use adaptive regions to model the local distribution of reflectance, which enables us to stably extract reliable attributes of each point despite errors in illumination estimation. We use a level-set method to grow robust and adaptive regions that define these attributes. A similarity metric between two attributes is defined using principal component analysis to find matches. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.