Updated 2025/03/06

THOMAS DIEGO GABRIEL FRANCIS
Affiliation
Faculty of Information Science and Electrical Engineering, Department of Advanced Information Technology, Associate Professor
School of Engineering, Department of Electrical Engineering and Computer Science (concurrent)
Graduate School of Information Science and Electrical Engineering, Department of Information Science and Technology (concurrent)
Position
Associate Professor
Contact
Email address
Phone
092-802-3571
Profile
My research has focused on automatic 3D modeling. I first worked on the 3D registration problem, that is, how to align two range images captured from different viewpoints. I then moved on to merging range images to build more accurate 3D models. More recently, I have been working on reconstructing dynamic environments, and in particular on simultaneously capturing human body motion and the corresponding 3D model. At Kyushu University, I currently teach the Practical Data Science special course series (Introduction II, Exercise I, and Exercise II).

Degrees

  • Master's degree (France)

  • Ph.D.

Career

  • Postdoctoral researcher at the National Institute of Informatics from April 2012 to March 2015.

Research Themes and Keywords

  • Research theme: Digital humans

    Keywords: generative AI; 3D and 4D capture; motion retargeting; gesture

    Period: January 2023

  • Research theme: AI-based avatar animation synthesis

    Keywords: deep learning; avatar animation; dense deformation; texture

    Period: June 2021 - June 2022

  • Research theme: Aerial-based outdoor 3D scene mapping

    Keywords: aerial drone; RGB-D SLAM; outdoor scene

    Period: April 2020 - April 2022

  • Research theme: 3D shape from a single image

    Keywords: deep learning; 3D shape estimation

    Period: April 2019 - August 2021

  • Research theme: Virtual assistants for early childhood education

    Keywords: 3D scene understanding; educational applications; 3D virtual assistants

    Period: May 2018 - June 2020

  • Research theme: High frame rate 3D reconstruction with multiple cameras

    Keywords: RGB-D camera; high frame rate; multi-view set-up; real time; distributed system; GPU optimization; volumetric reconstruction; fast and uncontrolled motion

    Period: December 2017 - February 2018

  • Research theme: 3D reconstruction of the human body in dynamic scenes

    Keywords: RGB-D camera; fast motion; skeleton; deforming bounding boxes; volumetric depth fusion; ICP; GPU optimization; large-scale scene

    Period: April 2017 - February 2018

  • Research theme: 3D face reconstruction and facial expression tracking

    Keywords: RGB-D camera; facial expression; blendshape; template mesh; texturing; 3D modeling; retargeting; deviation mapping; real time

    Period: April 2015 - February 2018

  • Research theme: 3D modeling

    Keywords: RGB-D camera; SLAM; 3D modeling

    Period: April 2012 - April 2017

Awards

  • MIRU Nagao Award

    August 2024   Meeting on Image Recognition and Understanding (MIRU)   3D Shape Modeling with Adaptive Centroidal Voronoi Tesselation on Signed Distance Field

    Diego Thomas (Kyushu University), Jean-Sebastien Franco (INRIA), Edmond Boyer (INRIA)

    Award type: award from a domestic society, conference, or symposium   Country: Japan

    Citation: This work addresses multi-view 3D reconstruction with a method built on a new neural-field representation using adaptive centroidal Voronoi tessellation. Exploiting the representation's ability to directly express object surfaces with arbitrary normals, it achieves higher reconstruction accuracy with fewer cells than conventional methods constrained to axis-aligned discretization. The paper also proposes implementation refinements, including a fast optimization scheme and differentiable rendering. Highly polished in both idea and implementation, it is a paper worthy of the MIRU Nagao Award.

  • MIRU Excellence Award

    August 2024   Meeting on Image Recognition and Understanding (MIRU)   Text-Guided Diverse Scene Interaction Synthesis by Disentangling Actions from Scenes

    Hitoshi Teshima (Kyushu University), Naoki Wake (Microsoft), Diego Thomas (Kyushu University), Yuta Nakashima (Osaka University), Hiroshi Kawasaki (Kyushu University), Katsushi Ikeuchi (Microsoft)

    Country: Japan

    Citation: This work tackles the challenging problem of generating motions that involve interactions with objects in a scene. Its pipeline extracts key poses from motions generated from action instructions alone, estimates their attachment to the scene with an existing foundation model, and then generates the trajectories leading to those poses; this provides a practical way to train on existing datasets that contain no scene information. Taking on the difficult setting of motion generation that responds to the surrounding environment, and confirming its effectiveness, deserves high praise.

  • Best paper award

    November 2019   The 9th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2019)   This prize was awarded for the presentation by Akihiko Sayo of his joint work on "Human shape reconstruction with loose clothes from partially observed data by pose specific deformation"

  • Best poster presentation award

    November 2019   Machine Perception and Robotics (MPR 2019)   This prize was awarded for the presentation by Hayato Onizuka of his work on "Regression of 3D body shapes from a single image in a tetrahedral volume"

  • Outstanding research achievement and contribution to APSCIT 2019 Annual Meeting Invited Presentation

    July 2019   Asia Pacific Society for Computing and Information Technology   This prize was awarded after an invited talk at APSCIT 2019, organized in Sapporo.

  • Best poster award

    February 2019   IW-FCV2019   This prize was awarded for the poster presentation by Maxence Remy of the joint research "Merging SLAM and photometric stereo for 3D reconstruction with a moving camera"

  • Outstanding reviewer

    July 2015   MIRU 2015   Outstanding reviewer

  • Best student award

    March 2012   National Institute of Informatics   Best student award at the end of my Ph.D. course.


Papers

  • TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell Refereed International

    Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition   June 2020

    Language: English   Type: Research paper (international conference proceedings)

    Recovering the 3D shape of a person from its 2D appearance is ill-posed due to ambiguities. Nevertheless, with the help of convolutional neural networks (CNN) and prior knowledge on the 3D human body, it is possible to overcome such ambiguities to recover detailed 3D shapes of human bodies from single images. Current solutions, however, fail to reconstruct all the details of a person wearing loose clothes. This is because of either (a) huge memory requirement that cannot be maintained even on modern GPUs or (b) the compact 3D representation that cannot encode all the details. In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression. Our proposed model is compact, dense, accurate, and yet well suited for CNN-based regression task. Our proposed PCN allows us to learn the distribution of the TSDF in the tetrahedral volume from a single image in an end-to-end manner. Results show that our proposed method allows to reconstruct detailed shapes of humans wearing loose clothes from single RGB images.
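
    The truncation at the heart of any TSDF model (including this tetrahedral variant) is simple to state; below is a minimal sketch of just that clamping step, with a made-up truncation band, not the paper's PCN or shell construction:

        import numpy as np

        def truncated_sdf(signed_dist, trunc=0.05):
            """Clamp signed distances to [-1, 1] within a truncation band.

            signed_dist: signed distances (m) from sample points to the surface
            trunc: truncation band width in meters (hypothetical value)
            """
            return np.clip(signed_dist / trunc, -1.0, 1.0)

        # Points 1 cm outside, exactly on, and 10 cm inside the surface:
        print(truncated_sdf(np.array([0.01, 0.0, -0.10])))  # [ 0.2  0. -1. ]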

  • Human shape reconstruction with loose clothes from partially observed data by pose specific deformation Refereed International

    #Akihiko Sayo, #Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, and Katsushi Ikeuchi

    The 9th Pacific-Rim Symposium on Image and Video Technology   November 2019

    Language: English   Type: Research paper (international conference proceedings)

    Recent approaches for full-body reconstruction use a statistical shape model, which is built upon accurate full-body scans of people in skin-tight clothes. Such a model can be fitted to a point cloud of a person wearing loose clothes; however, it cannot represent the detailed shape of loose clothes, such as wrinkles and/or folds. In this paper, we propose a method that reconstructs a 3D model of a full-body human with loose clothes by reproducing the deformations as displacements from the skin-tight body mesh. We take advantage of a statistical shape model as the base shape of the full-body human mesh, and then obtain displacements from the base mesh by non-rigid registration. To efficiently represent such displacements, we use lower dimensional embeddings of the deformations. This enables us to regress the coefficients corresponding to the small number of bases. We also propose a method to reconstruct shape only from a single 3D scanner, which is realized by shape fitting to only visible meshes as well as intra-frame shape interpolation. Our experiments with both unknown scene and partial body scans confirm the reconstruction ability of our proposed method.

  • Revisiting Depth Image Fusion with Variational Message Passing Refereed International

    Diego Thomas, Ekaterina Sirazitdinova, Akihiro Sugimoto, Rin-ichiro Taniguchi

    International Conference on 3D Vision (3DV) 2019   September 2019

    Language: English   Type: Research paper (international conference proceedings)

    The running average approach has long been perceived as the best choice for fusing depth measurements captured by a consumer-grade RGB-D camera into a global 3D model. This strategy, however, assumes exact correspondences between points in a 3D model and points in the captured RGB-D images. Such assumption does not hold true in many cases because of errors in motion tracking, noise, occlusions, or inconsistent surface sampling during measurements. Accordingly, reconstructed 3D models suffer unpleasant visual artifacts. In this paper, we visit the depth fusion problem from a probabilistic viewpoint and formulate it as a probabilistic optimization using variational message passing in a Bayesian network. Our formulation enables us to fuse depth images robustly, accurately, and fast for high quality RGB-D keyframe creation, even if exact point correspondences are not always available. Our formulation also allows us to smoothly combine depth and color information for further improvements without increasing computational speed. The quantitative and qualitative comparative evaluation on built keyframes of indoor scenes show that our proposed framework achieves promising results for reconstructing accurate 3D models while using low computational power and being robust against misalignment errors without post-processing.
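
    For contrast, the running-average baseline that this paper revisits can be written in a few lines; a simplified per-pixel sketch (the paper replaces this update with variational message passing in a Bayesian network):

        def fuse_depth_running_average(fused, weight, new_depth, new_weight=1.0):
            # Classic weighted running average; assumes the new measurement
            # corresponds exactly to the fused point -- the assumption the
            # paper identifies as the source of visual artifacts.
            fused = (weight * fused + new_weight * new_depth) / (weight + new_weight)
            return fused, weight + new_weight

        d, w = 1.50, 4.0                       # fused depth (m), accumulated weight
        d, w = fuse_depth_running_average(d, w, 1.46)
        print(d, w)                            # 1.492 5.0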

  • Landmark-guided deformation transfer of template facial expressions for automatic generation of avatar blendshapes Refereed International

    Hayato Onizuka, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi

    Proceedings of the IEEE International Conference on Computer Vision Workshops   September 2019

    Language: English   Type: Research paper (international conference proceedings)

    Blendshape models are commonly used to track and re-target facial expressions to virtual avatars using RGB-D cameras and without using any facial marker. When using blendshape models, the target avatar model must possess a set of key-shapes that can be blended depending on the estimated facial expression. Creating realistic set of key-shapes is extremely difficult and requires time and professional expertise. As a consequence, blendshape-based re-targeting technology can only be used with a limited amount of pre-built avatar models, which is not attractive for the large public. In this paper, we propose an automatic method to easily generate realistic key-shapes of any avatar that map directly to the source blendshape model (the user is only required to select a few facial landmarks on the avatar mesh). By doing so, captured facial motion can be easily re-targeted to any avatar, even when the avatar has largely different shape and topology compared with the source template mesh. Our experimental results show the accuracy of our proposed method compared with the state-of-the-art method for mesh deformation transfer.

  • Dense 3D reconstruction by combining photometric stereo and key frame-based SLAM with a moving smartphone and its flashlight Refereed International

    @Remy Maxence, Hideaki Uchiyama, Hiroshi Kawasaki, Diego Thomas, Vincent Nozick, Hideo Saito

    International Conference on 3D Vision   September 2019

    Language: English   Type: Research paper (international conference proceedings)

    The standard photometric stereo is a technique to densely reconstruct objects’ surfaces using light variation under the assumption of a static camera with a moving light source. In this work, we use photometric stereo to reconstruct dense 3D scenes while moving the camera and the light altogether. In such a non-static case, camera poses as well as correspondences between pixels of each frame are required to apply photometric stereo. ORB-SLAM is a technique that can be used to estimate camera poses. To retrieve correspondences, our idea is to start from a sparse 3D mesh obtained with ORB-SLAM and then densify the mesh by a plane sweep method using a multi-view photometric consistency. By combining ORB-SLAM and photometric stereo, it is possible to reconstruct dense 3D scenes with an off-the-shelf smartphone and its embedded torchlight. Note that SLAM systems usually struggle with textureless objects, which is effectively compensated by the photometric stereo in our method. Experiments are conducted to show that our proposed method gives better results than SLAM alone or COLMAP, especially for partially textureless surfaces.

  • SegmentedFusion: 3D human body reconstruction using stitched bounding boxes Refereed International

    Shih Hsuan Yao, Diego Thomas, Akihiro Sugimoto, Shang-Hong Lai, Rin-Ichiro Taniguchi

    2018 International Conference on 3D Vision (3DV)   September 2018

    Language: English   Type: Research paper (international conference proceedings)

    This paper presents SegmentedFusion, a method possessing the capability of reconstructing non-rigid 3D models of a human body by using a single depth camera with skeleton information. Our method estimates a dense volumetric 6D motion field that warps the integrated model into the live frame by segmenting a human body into different parts and building a canonical space for each part. The key feature of this work is that a deformed and connected canonical volume for each part is created, and it is used to integrate data. The dense volumetric warp field of one volume is represented efficiently by blending a few rigid transformations. Overall, SegmentedFusion is able to scan a non-rigidly deformed human surface as well as to estimate the dense motion field by using a consumer-grade depth camera. The experimental results demonstrate that SegmentedFusion is robust against fast inter-frame motion and topological changes. Since our method does not require prior assumption, SegmentedFusion can be applied to a wide range of human motions.

  • Augmented blendshapes for real-time simultaneous 3d head modeling and facial motion capture Refereed International

    Diego Thomas, Rin-Ichiro Taniguchi

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition   June 2016

    Language: English   Type: Research paper (international conference proceedings)

    We propose a method to build in real-time animated 3D head models using a consumer-grade RGB-D camera. Our framework is the first one to provide simultaneously comprehensive facial motion tracking and a detailed 3D model of the user’s head. Anyone’s head can be instantly reconstructed and his facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is free to move, talk and change his face expression as he wills otherwise. The facial motion is tracked using a blendshape representation while the fine geometric details are captured using a Bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on-the-fly and in real-time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
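
    The blendshape representation referred to above is a linear model: the tracked face is the neutral mesh plus a weighted sum of expression offsets. A toy sketch of that blending (illustrative only; the paper's contribution is growing the model and the Bump image on-the-fly):

        import numpy as np

        def blend(neutral, key_shapes, weights):
            # neutral: (V, 3) vertices; key_shapes: (K, V, 3); weights: (K,)
            offsets = key_shapes - neutral            # per-expression displacements
            return neutral + np.tensordot(weights, offsets, axes=1)

        neutral = np.zeros((4, 3))                    # toy 4-vertex mesh
        keys = np.random.randn(2, 4, 3) * 0.01        # two toy key-shapes
        print(blend(neutral, keys, np.array([0.7, 0.3])).shape)   # (4, 3)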

  • Range Image Registration Using a Photometric Metric under Unknown Lighting Refereed International

    Diego Thomas, Akihiro Sugimoto

    IEEE Transactions on Pattern Analysis and Machine Intelligence   September 2013

    Language: English   Type: Research paper (academic journal)

    Based on the spherical harmonics representation of image formation, we derive a new photometric metric for evaluating the correctness of a given rigid transformation aligning two overlapping range images captured under unknown, distant, and general illumination. We estimate the surrounding illumination and albedo values of points of the two range images from the point correspondences induced by the input transformation. We then synthesize the color of both range images using albedo values transferred using the point correspondences to compute the photometric reprojection error. This way allows us to accurately register two range images by finding the transformation that minimizes the photometric reprojection error. We also propose a practical method using the proposed photometric metric to register pairs of range images devoid of salient geometric features, captured under unknown lighting. Our method uses a hypothesize-and-test strategy to search for the transformation that minimizes our photometric metric. Transformation candidates are efficiently generated by employing the spherical representation of each range image. Experimental results using both synthetic and real data demonstrate the usefulness of the proposed metric.
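
    In equation form (my notation, following the standard second-order spherical-harmonics image-formation model rather than the paper's exact symbols): the predicted intensity at a surface point p with albedo \rho and normal n, and the photometric score of a candidate transformation T over its induced correspondences \mathcal{C}(T), are

        I(p) = \rho(p) \sum_{k=1}^{9} l_k \, Y_k\big(n(p)\big),
        \qquad
        E(T) = \sum_{(p,q) \in \mathcal{C}(T)} \big( I_1(p) - \hat{I}_1(p;\, \rho_2(q),\, l_1) \big)^2 ,

    where l = (l_1, ..., l_9) are the estimated lighting coefficients, Y_k are the spherical-harmonic basis functions, and \hat{I}_1 synthesizes the color of the first range image using albedo transferred from the second; registration seeks the T that minimizes E(T).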

  • ActiveNeuS: Neural Signed Distance Fields for Active Stereo Refereed International

    Kazuto Ichimaru, Takaki Ikeda, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki

    International Conference on 3D Vision   March 2024

    Language: English   Type: Research paper (international conference proceedings)

  • Weakly-Supervised 3D Reconstruction of Clothed Humans via Normal Maps International

    Jane Wu, Diego Thomas, Ronald Fedkiw

    arXiv   November 2023

    Language: English   Type: Research paper (academic journal)

  • A Two-step Approach for Interactive Animatable Avatars Refereed International

    #Takumi Kitamura, Diego Thomas, Hiroshi Kawasaki, Naoya Iwamoto

    Computer Graphics International 2023   September 2023

    Language: English   Type: Research paper (international conference proceedings)

    We propose a new two-step human body animation technique based on displacement mapping that can learn a detailed deformation space, works at interactive time (more than 30 fps) and can be directly integrated into standard animation environments. To achieve real-time animation we employ the template-based approach and model pose-dependent deformations with 2D displacement images. We propose our own template model to facilitate and automatize training data preparation. Key to achieving detailed animation with few artifacts is to learn pose-dependent displacements directly in the pose space, without having to predict skinning weights. In order to generalize to totally new motions we employ a two-step approach where the first step contains knowledge about general human motion while the second step contains information about user-specific motion. Our experimental results show that our proposed method can animate an avatar up to 300 times faster than baselines while keeping a similar or even better level of detail.

  • ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation Refereed International

    #Teshima Hitoshi, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

    Proceedings of the ACM on Computer Graphics and Interactive Techniques, 6   August 2023

    Language: English   Type: Research paper (international conference proceedings)

  • Deep Gesture Generation for Social Robots Using Type-Specific Libraries Refereed International

    Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

    2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)   October 2022

    Language: English   Type: Research paper (international conference proceedings)

    Body language such as conversational gesture is a powerful way to ease communication. Conversational gestures not only make a speech more lively but also contain semantic meaning that helps to stress important information in the discussion. In the field of robotics, giving conversational agents (humanoid robots or virtual avatars) the ability to properly use gestures is critical, yet remains a task of extraordinary difficulty. This is because, given only a text as input, there are many possibilities and ambiguities to generate an appropriate gesture. Different from previous works, we propose a new method that explicitly takes into account the gesture types to reduce these ambiguities and generate human-like conversational gestures. Key to our proposed system is a new gesture database built on the TED dataset that allows us to map a word to one of three types of gestures: “Imagistic” gestures, which express the content …

  • Self-calibration of multiple-line-lasers based on coplanarity and Epipolar constraints for wide area shape scan using moving camera Refereed International

    Genki Nagamatsu, Takaki Ikeda, Takafumi Iwaguchi, Diego Thomas, Jun Takamatsu, Hiroshi Kawasaki

    2022 26th International Conference on Pattern Recognition (ICPR)   August 2022

    Language: English   Type: Research paper (international conference proceedings)

    High-precision three-dimensional scanning systems have been intensively researched and developed. Recently, for acquisition of large-scale scenes with high density, the simultaneous localisation and mapping (SLAM) technique has been preferred because of its simplicity: a single sensor is moved around freely during 3D scanning. However, to integrate multiple scans, the captured data as well as the position of each sensor must be highly accurate, making these systems difficult to use in environments not accessible by humans, such as underwater, inside the body, or in outer space. In this paper, we propose a new, flexible system with multiple line lasers that reconstructs dense and accurate 3D scenes. The advantages of our proposed system are (1) no need for synchronization nor precalibration between the lasers and a camera, and (2) the system can reconstruct 3D scenes in extreme conditions, such as underwater. We propose a …

  • 3D pedestrian localization using multiple cameras: a generalizable approach Refereed International

    João Paulo Lima, Rafael Roberto, Lucas Figueiredo, Francisco Simões, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

    Machine Vision and Applications   July 2022

    Language: English   Type: Research paper (academic journal)

    Pedestrian detection is a critical problem in many areas, such as smart cities, surveillance, monitoring, autonomous driving, and robotics. AI-based methods have made tremendous progress in the field in the last few years, but good performance is limited to data that match the training datasets. We present a multi-camera 3D pedestrian detection method that does not need to be trained using data from the target scene. The core idea of our approach consists in formulating consistency in multiple views as a graph clique cover problem. We estimate pedestrian ground location on the image plane using a novel method based on human body poses and person’s bounding boxes from an off-the-shelf monocular detector. We then project these locations onto the ground plane and fuse them with a new formulation of a clique cover problem from graph theory. We propose a new vertex ordering strategy to define fusion …
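
    A minimal sketch of the clique-cover fusion idea under simplifying assumptions (a consistency graph whose edges link cross-camera detections that land close together on the ground plane, covered greedily; the paper's vertex-ordering strategy and projection model are more involved):

        import numpy as np
        from itertools import combinations

        def fuse_ground_points(points, cam_ids, radius=0.5):
            # points: (N, 2) ground-plane locations; cam_ids: camera of each detection.
            # One clique in the cover = one pedestrian; return fused positions.
            n = len(points)
            adj = {i: set() for i in range(n)}
            for i, j in combinations(range(n), 2):
                if cam_ids[i] != cam_ids[j] and np.linalg.norm(points[i] - points[j]) < radius:
                    adj[i].add(j); adj[j].add(i)
            unused, pedestrians = set(range(n)), []
            while unused:
                seed = max(unused, key=lambda v: len(adj[v] & unused))
                clique = {seed}
                for v in sorted(adj[seed] & unused):
                    if all(v in adj[u] for u in clique):
                        clique.add(v)
                unused -= clique
                pedestrians.append(points[sorted(clique)].mean(axis=0))
            return pedestrians

        pts = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]])
        print(fuse_ground_points(pts, cam_ids=[0, 1, 0]))  # two pedestrians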

  • Generalizable Online 3D Pedestrian Tracking with Multiple Cameras Refereed International

    Victor Lyra, Isabella de Andrade, Joao Paulo Lima, Rafael Roberto, Lucas Figueiredo, Joao Marcelo Teixeira, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb

    Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022)   March 2022

    Language: English   Type: Research paper (international conference proceedings)

    3D pedestrian tracking using multiple cameras is still a challenging task with many applications such as surveillance, behavioral analysis, statistical analysis, and more. Many of the existing tracking solutions involve training the algorithms on the target environment, which requires extensive time and effort. We propose an online 3D pedestrian tracking method for multi-camera environments based on a generalizable detection solution that does not require training with data of the target scene. We establish temporal relationships between people detected in different frames by using a combination of graph matching algorithm and Kalman filter. Our proposed method obtained a MOTA and MOTP of 77.1% and 96.4%, respectively on the test split of the public WILDTRACK dataset. Such results correspond to an improvement of approximately 3.4% and 22.2%, respectively, compared to the best existing online technique. Our experiments also demonstrate the advantages of using appearance information to improve the tracking performance.

  • Refining OpenPose With a New Sports Dataset for Robust 2D Pose Estimation Refereed International

    Takumi Kitamura, Hitoshi Teshima, Diego Thomas, Hiroshi Kawasaki

    Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision   January 2022

    Language: English   Type: Research paper (international conference proceedings)

    3D marker-less motion capture can be achieved by triangulating estimated multi-view 2D poses. However, when the 2D pose estimation fails, the 3D motion capture also fails. This is particularly challenging for the sports performances of athletes, which involve extreme poses. In extreme poses (like having the head down), state-of-the-art 2D pose estimators such as OpenPose do not work at all. In this paper, we propose a new method to improve the training of 2D pose estimators for extreme poses by leveraging a new sports dataset and our proposed data augmentation strategy. Our results show significant improvements over previous methods for 2D pose estimation of athletes performing acrobatic moves, while keeping state-of-the-art performance on standard datasets.

  • Integration of gesture generation system using gesture library with DIY robot design kit Refereed International

    Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, David Baumert, Hiroshi Kawasaki, Katsushi Ikeuchi

    2022 IEEE/SICE International Symposium on System Integration (SII)   January 2022

    Language: English   Type: Research paper (international conference proceedings)

  • PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation Refereed International

    Akihiko Sayo, Diego Thomas, Hiroshi Kawasaki, Yuta Nakashima, Katsushi Ikeuchi

    2021 IEEE International Conference on Image Processing (ICIP)   September 2021

    Language: English   Type: Research paper (international conference proceedings)

    We propose a new 2D pose refinement network that learns to predict the human bias in the estimated 2D pose. There are biases in 2D pose estimations that are due to differences between annotations of 2D joint locations based on annotators’ perception and those defined by motion capture (MoCap) systems. These biases are crafted into publicly available 2D pose datasets and cannot be removed with existing error reduction approaches. Our proposed pose refinement network allows us to efficiently remove the human bias in the estimated 2D poses and achieve highly accurate multi-view 3D human pose estimation.
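
    The refinement amounts to an additive correction: given an estimated pose x, predict the annotation bias b(x) and output x - b(x). A toy sketch with a linear regressor standing in for the paper's network (synthetic data, entirely illustrative):

        import numpy as np

        rng = np.random.default_rng(0)
        poses = rng.normal(size=(200, 34))        # 17 joints x (u, v), toy estimates
        mocap = poses - (0.05 * poses + 0.1)      # synthetic bias-free MoCap targets

        # Fit a linear model that predicts the bias from the estimated pose.
        X = np.hstack([poses, np.ones((200, 1))])
        W, *_ = np.linalg.lstsq(X, poses - mocap, rcond=None)

        def refine(pose):
            return pose - np.append(pose, 1.0) @ W   # subtract predicted bias

        print(np.abs(refine(poses[0]) - mocap[0]).max())   # ~0: bias removed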

  • Self-calibrated dense 3D sensor using multiple cross line-lasers based on light sectioning method and visual odometry Refereed International

    Genki Nagamatsu, Jun Takamatsu, Takafumi Iwaguchi, Diego Thomas, Hiroshi Kawasaki

    2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)   September 2021

    Language: English   Type: Research paper (international conference proceedings)

  • Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video Refereed International

    #Cheng Sun, Diego Thomas, Hiroshi Kawasaki

    2020 25th International Conference on Pattern Recognition (ICPR)   5959 - 5964   January 2021

    Language: English   Type: Research paper (international conference proceedings)

  • On-the-fly Extrinsic Calibration of Non-Overlapping in-Vehicle Cameras based on Visual SLAM under 90-degree Backing-up Parking Refereed International

    Kazuki Nishiguchi, Hideaki Uchiyama, Kazutaka Hayakawa, Jun Adachi, Diego Thomas, Atsushi Shimada, Rin-Ichiro Taniguchi

    2020 IEEE Intelligent Vehicles Symposium (IV)   October 2020

    Language: English   Type: Research paper (international conference proceedings)

  • Analysis and classification of gestures in TED talks Refereed International

    Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

    IEICE Technical Report   October 2020

    Language: English   Type: Research paper (international conference proceedings)

  • Real-time Simultaneous 3D Head Modeling and Facial Motion Capture with an RGB-D camera International

    Diego Thomas

    arXiv preprint arXiv:2004.10557   April 2020

    Language: English   Type: Research paper (academic journal)

  • Generating a consistent global map under intermittent mapping conditions for large-scale vision-based navigation Refereed International

    Kazuki Nishiguchi, Walid Bousselham, Hideaki Uchiyama, Diego Thomas, Atsushi Shimada, Rin Ichiro Taniguchi

    15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2020   January 2020

    Language: English   Type: Research paper (international conference proceedings)

    Localization is the process of computing sensor poses based on vision technologies such as visual Simultaneous Localization And Mapping (vSLAM). It can generally be applied to navigation systems. To achieve this, a global map is essential because the relocalization process requires a single consistent map represented in a unified coordinate system. However, a large-scale global map cannot be created at once due to insufficient visual features at some moments. This paper presents an interactive method to generate a consistent global map from intermittent maps created by vSLAM independently, via global reference points. First, vSLAM is applied to individual image sequences to create maps independently. At the same time, multiple reference points with known latitude and longitude are interactively recorded in each map. Then, the coordinate system of each individual map is converted into one that has metric scale and unified axes using the reference points. Finally, the individual maps are merged into a single map based on the relative position of each origin. In the evaluation, we show results of map merging and relocalization on our dataset to confirm the effectiveness of our method for navigation tasks. In addition, we report on our participation in a navigation competition in a practical environment.
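
    Converting an individual map into the metric, unified coordinate system is essentially a similarity-transform fit between map points and their known reference coordinates; a compact sketch using the closed-form Umeyama alignment (my illustration, not the paper's code):

        import numpy as np

        def umeyama(src, dst):
            # Least-squares similarity transform: dst ~ s * R @ src + t
            n = len(src)
            mu_s, mu_d = src.mean(0), dst.mean(0)
            cov = (dst - mu_d).T @ (src - mu_s) / n
            U, D, Vt = np.linalg.svd(cov)
            S = np.eye(3)
            if np.linalg.det(U) * np.linalg.det(Vt) < 0:
                S[2, 2] = -1.0
            R = U @ S @ Vt
            s = np.trace(np.diag(D) @ S) / (((src - mu_s) ** 2).sum() / n)
            return s, R, mu_d - s * R @ mu_s

        src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], float)  # map units
        dst = 2.0 * src + np.array([10.0, 5.0, 0.0])              # metric references
        s, R, t = umeyama(src, dst)
        print(round(s, 3), t)   # 2.0 [10.  5.  0.]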

  • Regression of 3D human body shapes from a single image in a tetrahedral volume International

    #Hayato Onizuka, Diego Thomas, @Zehra Hayirci, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

    The 15th Joint Workshop on Machine Perception and Robotics   November 2019

    Language: English   Type: Research paper (university/research institution bulletin)

    Reconstructing a 3D shape from a single 2D image is an ill-posed problem. This is because different 3D shapes may produce the same 2D image. Nevertheless, under some conditions and with the help of deep neural networks (DNN), approximate solutions can be obtained. The recent advances in convolutional neural networks (CNNs) for 3D object shape reconstruction from a single image are particularly thrilling for the case of 3D human body shape retrieval. The 3D human body has been extensively studied and modelled using standard computer vision techniques, which give us a sufficient amount of prior knowledge to constrain the 3D shape recovery problem using DNN. Current solutions, however, fail to reconstruct the fine details of the body due to a required huge amount of memory that cannot be maintained even on modern GPUs. In this paper, we propose the tetrahedral volumetric truncated signed distance function (TSDF) model for the human body, and its corresponding part connection network (PCN) for detailed shape regression. Our proposed 3D representation requires a low amount of memory and allows us to reconstruct detailed shapes from a single RGB image. Experimental results using real data demonstrate that our proposed method is promising.

  • Real-Time Facial Motion Capture Using RGB-D Images Under Complex Motion and Occlusions Refereed International

    @Joao Otavio de Lucena, Joao Paulo Lima, Diego Thomas, Veronica Teichrieb

    The Symposium on Virtual and Augmented Reality 2019   October 2019

    Language: English   Type: Research paper (international conference proceedings)

    We present a technique for capturing facial performance in real time using an RGB-D camera. Such a method can be applied to face augmentation by leveraging facial expression changes. The technique is able to perform both 3D facial modeling and facial motion tracking without the need of pre-scanning or training for a specific user.
    The proposed approach builds on an existing method that we refer to as FaceCap, which uses a blendshape representation and a Bump image for tracking facial motion and capturing geometric details. The original FaceCap algorithm fails in some scenarios with complex motion and occlusions, mainly due to problems in the face detection and tracking steps. FaceCap also has problems with the Bump image filtering step, which generates outliers, causing more distortion of the 3D augmented blendshape.
    In order to solve these problems, we propose two refinements: (a) a new framework for face detection and landmark localization based on the state-of-the-art methods MTCNN and CE-CLM, respectively; and (b) a simple but effective modification in the filtering step, removing reconstruction failures in the eye region.
    Experiments showed that the proposed approach can deal with unconstrained scenarios, such as large head pose variations and partial occlusions, while achieving real-time execution.

  • MeRA: An Interactive Mediated Reality Agent for Educational Application Refereed International

    @Guillaume Quiniou, Frederic Rayar, Diego Thomas

    International Symposium on Mixed and Augmented Reality | ISMAR 2019   October 2019

    Language: English   Type: Research paper (international conference proceedings)

    The recent developments of Mixed Reality devices and advances in 3D scene understanding and mapping unlock new possibilities for richer interactions between users, the surrounding 3D environment, and virtual agents. In this work, we present MeRA: an interactive Mediated Reality agent for ludic and educational applications. The agent evolves in a mediated tabletop environment and can help the user to learn, play or create Tangram, a jigsaw-like traditional game. This opens exciting new perspectives for the educational support of young children, who require active and human-like interactions.

  • Blended-Keyframes for Mobile Mediated Reality Applications Refereed International

    #Yu Xue, Diego Thomas, Frederic Rayar, Hideaki Uchiyama, Rin-ichiro Taniguchi, Baocai Yin

    IEEE International Symposium on Mixed and Augmented Reality (ISMAR)   October 2019

    Language: English   Type: Research paper (international conference proceedings)

    With the recent developments of Mixed Reality (MR) devices and advances in 3D scene understanding, MR applications on mobile devices are becoming available to a large part of society. These applications allow users to mix virtual content into the surrounding environment. However, the ability to mediate (i.e., modify or alter) the surrounding environment remains a difficult and unsolved problem that limits the degree of immersion of current MR applications on mobile devices. In this paper, we present a method to mediate 2D views of a real environment using a single consumer-grade RGB-D camera and without the need of pre-scanning the scene. Our proposed method creates in real-time a dense and detailed keyframe-based 3D map of the real scene and takes advantage of a semantic instance segmentation to isolate target objects. We show that our proposed method allows us to remove target objects in the environment and to replace them by their virtual counterparts, which are built on-the-fly. Such an approach is well suited for creating mobile Mediated Reality applications.

  • An application toward next-generation education using Mediated Reality

    #Xue Yu, Diego Thomas, Frederic Rayar, Hideaki Uchiyama, Yin Baocai, Rin-ichiro Taniguchi

    The 22nd Meeting on Image Recognition and Understanding (MIRU2019)   August 2019

    Language: Japanese   Type: Research paper (other academic conference materials)

    In recent years, research on Mediated Reality hardware and on 3D scene understanding for 3D reconstruction has progressed, making interaction among virtual agents, people, and the surrounding environment feasible. In this paper, we introduce a Mediated Reality agent and propose a method for building a Mediated Reality environment on a mobile device using an off-the-shelf handheld RGB-D camera.

  • 3D Body and Background Reconstruction in a Large-scale Indoor Scene using Multiple Depth Cameras Refereed International

    Daisuke Kobayashi, Diego Thomas, Hideaki Uchiyama, Rin-ichiro Taniguchi

    2019 12th Asia Pacific Workshop on Mixed and Augmented Reality (APMAR)   March 2019

    Language: English   Type: Research paper (international conference proceedings)

    3D reconstruction of indoor scenes that contain non-rigidly moving human body using depth cameras is a task of extraordinary difficulty. Despite intensive efforts from the researchers in the 3D vision community, existing methods are still limited to reconstruct small scale scenes. This is because of the difficulty to track the camera motion when a target person moves in a totally different direction. Due to the narrow field of view (FoV) of consumer-grade red-green-blue-depth (RGB-D) cameras, a target person (generally put at about 2-3 meters from the camera) covers most of the FoV of the camera. Therefore, there are not enough features from the static background to track the motion of the camera. In this paper, we propose a system which reconstructs a moving human body and the background of an indoor scene using multiple depth cameras. Our system is composed of three Kinects that are approximately set in the same line and facing the same direction so that their FoV do not overlap (to avoid interference). Owing to this setup, we capture images of a person moving in a large scale indoor scene. The three Kinect cameras are calibrated with a robust method that uses three large non parallel planes. A moving person is detected by using human skeleton information, and is reconstructed separately from the static background. By separating the human body and the background, static 3D reconstruction can be adopted for the static background area while a method specialized for the human body area can be used to reconstruct the 3D model of the moving person. The experimental result shows the performance of proposed system for human body in a large-scale indoor scene.

  • Solving monocular visual odometry scale factor with adaptive step length estimates for pedestrians using handheld devices Refereed International

    Nicolas Antigny, Hideaki Uchiyama, Myriam Servières, Valérie Renaudin, Diego Thomas, Rin-ichiro Taniguchi

    MDPI Sensors   January 2019

    Language: English   Type: Research paper (academic journal)

    Urban environments represent challenging areas for handheld device pose estimation (i.e., 3D position and 3D orientation) over large displacements. It is even more challenging with the low-cost sensors and computational resources available in pedestrian mobile devices (i.e., a monocular camera and an Inertial Measurement Unit). To address these challenges, we propose a continuous pose estimation based on monocular Visual Odometry. To solve the scale ambiguity and suppress the scale drift, an adaptive pedestrian step length estimation is used for the displacements on the horizontal plane. To complete the estimation, a handheld equipment height model, with respect to the Digital Terrain Model contained in Geographical Information Systems, is used for the displacement on the vertical axis. In addition, an accurate pose estimation based on the recognition of known objects is punctually used to correct the pose estimate and reset the monocular Visual Odometry. To validate the benefit of our framework, experimental data have been collected on a 0.7 km pedestrian path in an urban environment for various people. The proposed solution achieves a positioning error of 1.6–7.5% of the walked distance, and confirms the benefit of the use of an adaptive step length compared to the use of a fixed-step length.
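
    The scale correction itself reduces to ratioing the metric pedestrian displacement (detected steps times estimated step length) against the unscaled Visual Odometry displacement over the same window; a toy sketch under that assumption:

        import numpy as np

        def vo_scale(vo_track, step_lengths):
            # vo_track: (N, 2) unscaled VO positions on the horizontal plane
            # step_lengths: metric length (m) of each step detected in the window
            vo_dist = np.linalg.norm(np.diff(vo_track, axis=0), axis=1).sum()
            return sum(step_lengths) / vo_dist

        track = np.array([[0.0, 0.0], [0.8, 0.0], [1.6, 0.1]])   # arbitrary VO units
        print(vo_scale(track, step_lengths=[0.72, 0.70]))         # metres per VO unit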

  • Incremental 3D Cuboid Modeling with Drift Compensation Refereed International

    Masashi Mishima, Hideaki Uchiyama, Diego Thomas, Rin-ichiro Taniguchi, Rafael Roberto, João Lima, Veronica Teichrieb

    MDPI Sensors   January 2019

    Language: English   Type: Research paper (academic journal)

    This paper presents a framework of incremental 3D cuboid modeling by using the mapping results of an RGB-D camera based simultaneous localization and mapping (SLAM) system. This framework is useful in accurately creating cuboid CAD models from a point cloud in an online manner. While performing the RGB-D SLAM, planes are incrementally reconstructed from a point cloud in each frame to create a plane map. Then, cuboids are detected in the plane map by analyzing the positional relationships between the planes, such as orthogonality, convexity, and proximity. Finally, the position, pose, and size of a cuboid are determined by computing the intersection of three perpendicular planes. To suppress the false detection of the cuboids, the cuboid shapes are incrementally updated with sequential measurements to check the uncertainty of the cuboids. In addition, the drift error of the SLAM is compensated by the registration of the cuboids. As an application of our framework, an augmented reality-based interactive cuboid modeling system was developed. In the evaluation at cluttered environments, the precision and recall of the cuboid detection were investigated, compared with a batch-based cuboid detection method, so that the advantages of our proposed method were clarified.

  • Indoor Positioning System Based on Chest-Mounted IMU Refereed International

    Chuanhua Lu, Hideaki Uchiyama, Diego Thomas, Atsushi Shimada, Rin-ichiro Taniguchi

    MDPI Sensors   January 2019

    Language: English   Type: Research paper (academic journal)

    Demand for indoor navigation systems has been rapidly increasing with regard to location-based services. As a cost-effective choice, inertial measurement unit (IMU)-based pedestrian dead reckoning (PDR) systems have been developed for years because they do not require external devices to be installed in the environment. In this paper, we propose a PDR system based on a chest-mounted IMU as a novel installation position for body-suit-type systems. Since the IMU is mounted on a part of the upper body, the framework of the zero-velocity update cannot be applied because there are no periodic moments of zero velocity. Therefore, we propose a novel regression model for estimating step lengths only from accelerations to correctly compute step displacement using the IMU data acquired at the chest. In addition, we integrated the idea of an efficient map-matching algorithm based on particle filtering into our system to improve positioning and heading accuracy. Since our system was designed for 3D navigation, which can estimate position in a multifloor building, we used a barometer to update pedestrian altitude, and the components of our map are designed to explicitly represent building-floor information. With our complete PDR system, we were awarded second place among 10 teams in the IPIN 2018 Competition Track 2, achieving a mean error of 5.2 m after the 800 m walking event.
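
    Because zero-velocity updates are unavailable at the chest, each detected step advances the position by a regressed step length along the current heading; the dead-reckoning core is just the following (sketch with a made-up linear step-length model; the paper's regression, map matching, and barometric floor handling are omitted):

        import math

        def pdr_step(x, y, heading_rad, acc_feature, a=0.3, b=0.4):
            # Step length from an acceleration feature via a toy linear
            # model L = a * acc_feature + b (coefficients are invented).
            L = a * acc_feature + b
            return x + L * math.cos(heading_rad), y + L * math.sin(heading_rad)

        pos = (0.0, 0.0)
        for feat, yaw in [(1.0, 0.0), (1.1, 0.0), (1.0, math.pi / 2)]:
            pos = pdr_step(*pos, yaw, feat)
        print(pos)   # roughly (1.43, 0.70)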

  • FusionMLS: Highly dynamic 3D reconstruction with consumer-grade RGB-D cameras Refereed International

    Siim Meerits, Diego Thomas, Vincent Nozick, Hideo Saito

    Computational Visual Media   December 2018

    Language: English   Type: Research paper (academic journal)

    Multi-view dynamic three-dimensional reconstruction has typically required the use of custom shutter-synchronized camera rigs in order to capture scenes containing rapid movements or complex topology changes. In this paper, we demonstrate that multiple unsynchronized low-cost RGB-D cameras can be used for the same purpose. To alleviate issues caused by unsynchronized shutters, we propose a novel depth frame interpolation technique that allows synchronized data capture from highly dynamic 3D scenes. To manage the resulting huge number of input depth images, we also introduce an efficient moving least squares-based volumetric reconstruction method that generates triangle meshes of the scene. Our approach does not store the reconstruction volume in memory, making it memory-efficient and scalable to large scenes. Our implementation is completely GPU based and works in real time. The results shown herein, obtained with real data, demonstrate the effectiveness of our proposed method and its advantages compared to state-of-the-art approaches.

  • Sparse cost volume for efficient stereo matching Refereed International

    Chuanhua Lu, Hideaki Uchiyama, Diego Thomas, Atsushi Shimada, Rin-ichiro Taniguchi

    MDPI Remote Sensing   November 2018

    Language: English   Type: Research paper (academic journal)

    Stereo matching has been solved as a supervised learning task with convolutional neural networks (CNN). However, CNN-based approaches generally require huge memory use. In addition, it is still challenging to find correct correspondences between images in ill-posed dim regions and under sensor noise. To solve these problems, we propose Sparse Cost Volume Net (SCV-Net), achieving high accuracy, low memory cost and fast computation. The idea of the cost volume for stereo matching was initially proposed in GC-Net. In our work, by making the cost volume compact and proposing an efficient similarity evaluation for the volume, we achieve faster stereo matching while improving the accuracy. Moreover, we propose to use weight normalization instead of the commonly used batch normalization for stereo matching tasks. This improves the robustness not only to sensor noise in images but also to the batch size in the training process. We evaluated our proposed network on the Scene Flow and KITTI 2015 datasets; its performance overall surpasses GC-Net. Compared with GC-Net, our SCV-Net (1) reduces GPU memory cost by 73.08%; (2) reduces processing time by 61.11%; (3) improves the 3PE from 2.87% to 2.61% on the KITTI 2015 dataset.
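
    The "sparse" idea is to sample the disparity axis with a stride instead of densely; a schematic NumPy construction of such a volume (illustrative; the actual network builds it from learned feature maps and scores it with a compact 3D CNN):

        import numpy as np

        def sparse_cost_volume(feat_l, feat_r, max_disp=64, stride=2):
            # feat_l, feat_r: (C, H, W) feature maps from the two views.
            # Returns a (max_disp // stride, 2C, H, W) concatenation volume.
            C, H, W = feat_l.shape
            disparities = range(0, max_disp, stride)
            vol = np.zeros((len(disparities), 2 * C, H, W), feat_l.dtype)
            for i, d in enumerate(disparities):
                vol[i, :C] = feat_l
                vol[i, C:, :, d:] = feat_r[:, :, :W - d]   # shift right view by d
            return vol

        fl = np.random.rand(8, 16, 32).astype(np.float32)
        fr = np.random.rand(8, 16, 32).astype(np.float32)
        print(sparse_cost_volume(fl, fr).shape)   # (32, 16, 16, 32)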

  • RGB-D SLAM based incremental cuboid modeling Refereed International

    Masashi Mishima, Hideaki Uchiyama, Diego Thomas, Rin-ichiro Taniguchi, Rafael Roberto, Veronica Teichrieb

    The European Conference on Computer Vision (ECCV) Workshops, 2018   September 2018

    Language: English   Type: Research paper (international conference proceedings)

    This paper presents a framework for incremental 3D cuboid modeling combined with RGB-D SLAM. While performing RGB-D SLAM, planes are incrementally reconstructed from point clouds. Then, cuboids are detected in the planes by analyzing the positional relationships between the planes: orthogonality, convexity, and proximity. Finally, the position, pose and size of a cuboid are determined by computing the intersection of three perpendicular planes. In addition, the cuboid shapes are incrementally updated to suppress false detections with sequential measurements. As an application of our framework, an augmented reality based interactive cuboid modeling system is introduced. In an evaluation in a cluttered environment, the precision and recall of the cuboid detection are improved with our framework owing to stable plane detection, compared with a batch-based method.

  • Live structural modeling using RGB-D SLAM Refereed International

    Nicolas Olivier, Hideaki Uchiyama, Masashi Mishima, Diego Thomas, Rin-Ichiro Taniguchi, Rafael Roberto, João Paulo Lima, Veronica Teichrieb

    2018 IEEE International Conference on Robotics and Automation (ICRA)   May 2018

    Language: English   Type: Research paper (international conference proceedings)

    This paper presents a method for localizing primitive shapes in a dense point cloud computed by the RGB-D SLAM system. To stably generate a shape map containing only primitive shapes, the primitive shape is incrementally modeled by fusing the shapes estimated at previous frames in the SLAM, so that an accurate shape can be finally generated. Specifically, the history of the fusing process is used to avoid the influence of error accumulation in the SLAM. The point cloud of the shape is then updated by fusing the points in all the previous frames into a single point cloud. In the experimental results, we show that metric primitive modeling in texture-less and unprepared environments can be achieved online.

  • Synthesis of environment maps for mixed reality Refereed International

    David R Walton, Diego Thomas, Anthony Steed, Akihiro Sugimoto

    2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)   October 2017

    Language: English   Type: Research paper (international conference proceedings)

    When rendering virtual objects in a mixed reality application, it is helpful to have access to an environment map that captures the appearance of the scene from the perspective of the virtual object. It is straightforward to render virtual objects into such maps, but capturing and correctly rendering the real components of the scene into the map is much more challenging. This information is often recovered from physical light probes, such as reflective spheres or fisheye cameras, placed at the location of the virtual object in the scene. For many application areas, however, real light probes would be intrusive or impractical.
    Ideally, all of the information necessary to produce detailed environment maps could be captured using a single device. We introduce a method using an RGBD camera and a small fisheye camera, contained in a single unit, to create environment maps at any location in an indoor scene. The method combines the output from both cameras to correct for their limited field of view and the displacement from the virtual object, producing complete environment maps suitable for rendering the virtual content in real time. Our method improves on previous probeless approaches by its ability to recover high-frequency environment maps. We demonstrate how this can be used to render virtual objects which shadow, reflect and refract their environment convincingly.

  • Fast 3D point cloud segmentation using supervoxels with geometry and color for 3D scene understanding Refereed International

    Francesco Verdoja, Diego Thomas, Akihiro Sugimoto

    IEEE International Conference on Multimedia and Expo (ICME), 2017   July 2017

    Language: English   Type: Research paper (international conference proceedings)

    Segmentation of 3D colored point clouds is a research field with renewed interest thanks to the recent availability of inexpensive consumer RGB-D cameras and its importance as an unavoidable low-level step in many robotic applications. However, the nature of 3D data makes the task challenging and, thus, many different techniques have been proposed, all of which require expensive computational costs. This paper presents a novel fast method for 3D colored point cloud segmentation. It starts with supervoxel partitioning of the cloud, i.e., an oversegmentation of the points in the cloud. Then it leverages a novel metric exploiting both geometry and color to iteratively merge the supervoxels and obtain a 3D segmentation where the hierarchical structure of partitions is maintained. The algorithm also has computational complexity linear in the size of the input. Experimental results over two publicly available datasets demonstrate that our proposed method outperforms state-of-the-art techniques.
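
    A greedy pairwise-merge sketch of the geometry-plus-color criterion, with an invented weighting (the paper's metric and its hierarchy bookkeeping are more refined):

        import numpy as np

        def merge_supervoxels(normals, colors, n_segments=2, w_geo=0.5):
            # Each segment is represented by its first supervoxel (toy choice);
            # repeatedly merge the pair with the smallest combined distance.
            segs = [[i] for i in range(len(normals))]
            def dist(a, b):
                geo = 1.0 - abs(normals[a] @ normals[b])             # normal disagreement
                col = np.linalg.norm(colors[a] - colors[b]) / 441.7  # normalized RGB gap
                return w_geo * geo + (1.0 - w_geo) * col
            while len(segs) > n_segments:
                _, i, j = min((dist(s[0], t[0]), i, j)
                              for i, s in enumerate(segs)
                              for j, t in enumerate(segs) if i < j)
                segs[i] += segs.pop(j)
            return segs

        normals = np.array([[0, 0, 1], [0, 0, 1], [1, 0, 0]], float)
        colors = np.array([[250, 0, 0], [240, 0, 0], [0, 0, 255]], float)
        print(merge_supervoxels(normals, colors))   # [[0, 1], [2]]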

  • Parametric surface representation with bump image for dense 3D modeling using an RGB-D camera Refereed International

    Diego Thomas, Akihiro Sugimoto

    International Journal of Computer Vision   123 (2)   June 2017

    Language: English   Type: Research paper (academic journal)

    When constructing a dense 3D model of an indoor static scene from a sequence of RGB-D images, the choice of the 3D representation (e.g. 3D mesh, cloud of points or implicit function) is of crucial importance. In the last few years, the volumetric truncated signed distance function (TSDF) and its extensions have become popular in the community and largely used for the task of dense 3D modelling using RGB-D sensors. However, as this representation is voxel based, it offers few possibilities for manipulating and/or editing the constructed 3D model, which limits its applicability. In particular, the amount of data required to maintain the volumetric TSDF rapidly becomes huge which limits possibilities for portability. Moreover, simplifications (such as mesh extraction and surface simplification) significantly reduce the accuracy of the 3D model (especially in the color space), and editing the 3D model is difficult. We propose a novel compact, flexible and accurate 3D surface representation based on parametric surface patches augmented by geometric and color texture images. Simple parametric shapes such as planes are roughly fitted to the input depth images, and the deviations of the 3D measurements to the fitted parametric surfaces are fused into a geometric texture image (called the Bump image). A confidence and color texture image are also built. Our 3D scene representation is accurate yet memory efficient. Moreover, updating or editing the 3D model becomes trivial since it is reduced to manipulating 2D images. Our experimental results demonstrate the advantages of our proposed 3D representation through a concrete indoor scene reconstruction application.
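
    The core of the representation is rasterizing point-to-primitive deviations into a 2D image; for a planar patch this fits in a few lines (my illustration of the idea, not the paper's pipeline):

        import numpy as np

        def bump_image(points, res=8, extent=2.0):
            # Fit a plane by PCA, then store each point's signed deviation
            # from the plane in a res x res "Bump" image over the patch.
            c = points.mean(0)
            _, _, Vt = np.linalg.svd(points - c)
            u, v, n = Vt                        # in-plane axes, plane normal
            bump = np.zeros((res, res))
            for p in points - c:
                i = int(np.clip((p @ u / extent + 0.5) * (res - 1), 0, res - 1))
                j = int(np.clip((p @ v / extent + 0.5) * (res - 1), 0, res - 1))
                bump[j, i] = p @ n              # signed deviation from the plane
            return bump

        pts = np.random.rand(200, 3) - 0.5
        pts[:, 2] = 0.02 * np.sin(6 * pts[:, 0])   # small bumps on a plane
        print(np.abs(bump_image(pts)).max())       # max deviation stored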

  • Modeling large-scale indoor scenes with rigid fragments using RGB-D cameras Refereed International

    Diego Thomas, Akihiro Sugimoto

    Computer Vision and Image Understanding   April 2017

    Language: English   Type: Research paper (international conference proceedings)

    Hand-held consumer depth cameras have become a commodity tool for constructing 3D models of indoor environments in real time. Recently, many methods to fuse low quality depth images into a single dense and high fidelity 3D model have been proposed. Nonetheless, dealing with large-scale scenes remains a challenging problem. In particular, the accumulation of small errors due to imperfect camera localization becomes crucial (at large scale) and results in dramatic deformations of the built 3D model. These deformations have to be corrected whenever it is possible (when a loop exists for example). To facilitate such correction, we use a structured 3D representation where points are clustered into several planar patches that compose the scene. We then propose a two-stage framework to build in details and in real-time a large-scale 3D model. The first stage (the local mapping) generates local structured 3D models with rigidity constraints from short subsequences of RGB-D images. The second stage (the global mapping) aggregates all local 3D models into a single global model in a geometrically consistent manner. Minimizing deformations of the global model reduces to re-positioning the planar patches of the local models thanks to our structured 3D representation. This allows efficient, yet accurate computations. Our experiments using real data confirm the effectiveness of our proposed method.

  • Multi-view facial landmark detector learned by the Structured Output SVM 査読 国際誌

    Michal Uřičář, Vojtěch Franc, Diego Thomas, Akihiro Sugimoto, Václav Hlaváč

    Image and Vision Computing   47   2016年3月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    We propose a real-time multi-view landmark detector based on Deformable Part Models (DPM). The detector is composed of a mixture of tree-based DPMs, each component describing landmark configurations in a specific range of viewing angles. The use of view-specific DPMs allows us to capture a large range of poses and to deal with the problem of self-occlusions. Parameters of the detector are learned from annotated examples by the Structured Output Support Vector Machines algorithm. The learning objective is directly related to the performance measure used for detector evaluation. The tree-based DPM allows us to find a globally optimal landmark configuration by dynamic programming. We propose a coarse-to-fine search strategy which enables real-time processing with dynamic programming even on high-resolution images. Empirical evaluation on “in the wild” images shows that the proposed detector is competitive with state-of-the-art methods in terms of speed and accuracy, yet it keeps the guarantee of finding a globally optimal estimate, in contrast to other methods.

    DOI: https://doi.org/10.1016/j.imavis.2016.02.004
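    The globally optimal configuration on the tree-structured model is found by dynamic programming; the sketch below shows the chain special case (a Viterbi-style pass over per-landmark candidate positions), which conveys the DP idea without reproducing the paper's tree model or learned scores; all names are illustrative:

```python
import numpy as np

def best_chain_configuration(unary, pairwise):
    """DP over a chain of landmarks: pick one candidate per landmark maximizing
    the sum of appearance scores (unary) and deformation scores (pairwise).
    unary: list of (K_i,) arrays; pairwise: list of (K_i, K_{i+1}) arrays."""
    score = unary[0].astype(float)
    back = []
    for i in range(1, len(unary)):
        # total[j, k] = best score ending at candidate k of landmark i via candidate j.
        total = score[:, None] + pairwise[i - 1] + unary[i][None, :]
        back.append(total.argmax(axis=0))
        score = total.max(axis=0)
    cfg = [int(score.argmax())]          # backtrack the optimal configuration
    for bp in reversed(back):
        cfg.append(int(bp[cfg[-1]]))
    return cfg[::-1], float(score.max())

# Toy usage: 3 landmarks with 4 candidate positions each.
rng = np.random.default_rng(2)
unary = [rng.standard_normal(4) for _ in range(3)]
pairwise = [rng.standard_normal((4, 4)) for _ in range(2)]
print(best_chain_configuration(unary, pairwise))
```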

  • Real-time multi-view facial landmark detector learned by the structured output SVM 査読 国際誌

    Michal Uřičář, Vojtěch Franc, Diego Thomas, Akihiro Sugimoto, Václav Hlaváč

    11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015   2015年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    While the problem of facial landmark detection has recently received considerable attention in the computer vision community, most methods deal only with near-frontal views, and only a few truly multi-view detectors are available that are capable of detection over a wide range of yaw angles (e.g. φ ∈ (−90°, 90°)). We describe a multi-view facial landmark detector based on the Deformable Part Models, which treats the problem of simultaneous landmark detection and viewing-angle estimation within a structured output classification framework. We present an easily extensible and flexible framework which provides real-time performance on “in the wild” images, evaluated on the challenging “Annotated Facial Landmarks in the Wild” database. We show that our detector achieves better results than the current state of the art in terms of localization error.

    DOI: 10.1109/FG.2015.7284810

  • A two-stage strategy for real-time dense 3D reconstruction of large-scale scenes. 査読 国際誌

    Diego Thomas, Akihiro Sugimoto

    IEEE European Conference on Computer Vision Workshops (ECCVW), 2014   2014年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    The frame-to-global-model approach is widely used for accurate 3D modeling from sequences of RGB-D images. Because no perfect camera tracking system yet exists, the accumulation of small errors generated when registering and integrating successive RGB-D images causes deformations of the 3D model being built up. In particular, the deformations become significant when the scale of the scene to model is large. To tackle this problem, we propose a two-stage strategy to build, in detail, a large-scale 3D model with minimal deformations, where the first stage creates accurate small-scale 3D scenes in real time from short subsequences of RGB-D images, while the second stage re-organises all the results from the first stage in a geometrically consistent manner to reduce deformations as much as possible. By employing planar patches as the 3D scene representation, our proposed method runs in real time to build accurate 3D models with minimal deformations even for large-scale scenes. Our experiments using real data confirm the effectiveness of our proposed method.

  • A flexible scene representation for 3D reconstruction using an RGB-D camera 査読 国際誌

    Diego Thomas, Akihiro Sugimoto

    IEEE International Conference on Computer Vision (ICCV), 2013   2013年12月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    Updating a global 3D model with live RGB-D measurements has proven to be successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm have been introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is expensive in memory when constructing and updating the global model. As a consequence, the method is not well scalable to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes significantly reduces the size of the scene representation, which allows us to generate a global textured 3D model with lower memory requirements while keeping accuracy and ease of updating with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction while keeping scalability for large indoor scenes.

  • Learning to discover objects in RGB-D images using correlation clustering 査読 国際誌

    Michael Firman, Diego Thomas, Simon Julier, Akihiro Sugimoto

    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013   1107 - 1112   2013年11月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    We introduce a method to discover objects from RGB-D image collections which does not require a user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver to both find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method has superior performance to traditional clustering methods.
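    As a hedged illustration of the clustering step (the paper uses an exact correlation clustering solver; this is only a simple greedy heuristic on a toy similarity matrix with positive "attract" and negative "repel" entries):

```python
import numpy as np

def greedy_correlation_clustering(S):
    """Greedy heuristic: repeatedly merge the cluster pair with the largest positive
    total inter-cluster similarity; stop when every remaining merge would hurt.
    Note that the number of clusters is recovered, not given."""
    clusters = [[i] for i in range(len(S))]
    while True:
        best, pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gain = sum(S[i, j] for i in clusters[a] for j in clusters[b])
                if gain > best:
                    best, pair = gain, (a, b)
        if pair is None:
            return clusters
        a, b = pair
        clusters[a] += clusters.pop(b)

# Toy usage: segments {0,1} attract, {2,3} attract, and the two groups repel.
S = np.array([[ 0.0,  2.0, -1.0, -1.0],
              [ 2.0,  0.0, -1.0, -1.0],
              [-1.0, -1.0,  0.0,  3.0],
              [-1.0, -1.0,  3.0,  0.0]])
print(greedy_correlation_clustering(S))  # [[0, 1], [2, 3]]
```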

  • Compact and accurate 3-D face modeling using an RGB-D camera: Let's open the door to 3-D video conference 査読 国際誌

    Pavan Kumar Anasosalu, Diego Thomas, Akihiro Sugimoto

    The IEEE International Conference on Computer Vision (ICCV) Workshops   2013年12月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    We present a method for producing an accurate and compact 3-D face model in real time using a low-cost RGB-D sensor like the Kinect camera. We extend and use Bump Images for highly accurate and low-memory-consumption 3-D reconstruction of the human face. Bump Images are generated by representing the Cartesian coordinates of points on the face in the spherical coordinate system whose origin is the center of the head. After initialization, the Bump Images are updated in real time with every RGB-D frame with respect to the current viewing direction and head pose, which are estimated using the frame-to-global-model registration strategy. While the high accuracy of the representation allows us to recover fine details, the low memory use opens new possible applications of consumer depth cameras such as 3-D video conferencing. We validate our approach by quantitatively comparing our result with the result obtained by a commercial high-resolution laser scanner. We also discuss the potential of our proposed method for a 3-D video conferencing application with existing internet speeds.
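    The spherical Bump image construction can be sketched as follows (a minimal single-frame numpy version assuming a known head center and nearest-pixel binning; the real system fuses the radii over frames using the tracked head pose):

```python
import numpy as np

def spherical_bump_image(points, center, res=(128, 128)):
    """Store the radial distance r(theta, phi) of face points, expressed in
    spherical coordinates around the head center, into a 2D 'Bump image'."""
    d = points - center
    r = np.linalg.norm(d, axis=1)
    theta = np.arccos(np.clip(d[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))  # polar, [0, pi]
    phi = np.arctan2(d[:, 1], d[:, 0])                                    # azimuth, (-pi, pi]
    iy = np.clip((theta / np.pi * (res[0] - 1)).astype(int), 0, res[0] - 1)
    ix = np.clip(((phi + np.pi) / (2.0 * np.pi) * (res[1] - 1)).astype(int), 0, res[1] - 1)
    bump = np.zeros(res)
    bump[iy, ix] = r   # later frames would fuse r with a confidence-weighted average
    return bump

# Toy usage: points sampled on a unit sphere around the origin.
rng = np.random.default_rng(3)
p = rng.standard_normal((2000, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
print(spherical_bump_image(p, center=np.zeros(3)).max())  # ~1.0
```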

  • Robust simultaneous 3D registration via rank minimization 査読 国際誌

    Diego Thomas, Yasuyuki Matsushita, Akihiro Sugimoto

    Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2012   2012年10月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    We present a robust and accurate 3D registration method for a dense sequence of depth images taken from unknown viewpoints. Our method simultaneously estimates multiple extrinsic parameters of the depth images to obtain a registered full 3D model of the scanned scene. By arranging the depth measurements in a matrix form, we formulate the problem as the simultaneous estimation of multiple extrinsics, a low-rank matrix that corresponds to the aligned depth images, and a sparse error matrix. Unlike previous approaches that use sequential or heuristic global registration approaches, our solution method uses an advanced convex optimization technique for obtaining a robust solution via rank minimization. To achieve accurate computation, we develop a depth projection method that has minimal sensitivity to sampling by reading projected depth values in the input depth images. We demonstrate the effectiveness of the proposed method through extensive experiments and compare it with previous standard techniques.
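    As a rough illustration of the low-rank-plus-sparse model behind this formulation (not the paper's convex solver), the sketch below alternates singular-value thresholding for the low-rank part with soft thresholding for the sparse error term; the thresholds and toy data are illustrative:

```python
import numpy as np

def lowrank_sparse(D, lam=0.1, tau=1.0, iters=200):
    """Alternating proximal steps for tau*||L||_* + lam*||S||_1 + 0.5*||D-L-S||_F^2,
    a toy stand-in for the rank-minimization solver used in the paper."""
    L, S = np.zeros_like(D), np.zeros_like(D)
    for _ in range(iters):
        U, sv, Vt = np.linalg.svd(D - S, full_matrices=False)
        L = (U * np.maximum(sv - tau, 0.0)) @ Vt                   # shrink singular values
        S = np.sign(D - L) * np.maximum(np.abs(D - L) - lam, 0.0)  # shrink entries
    return L, S

# Toy usage: a rank-1 'aligned depth' matrix corrupted by two sparse errors.
rng = np.random.default_rng(4)
D = rng.standard_normal((20, 1)) @ rng.standard_normal((1, 30))
D[3, 5] += 10.0
D[12, 7] -= 8.0
L, S = lowrank_sparse(D)
print("rank(L) ~", np.linalg.matrix_rank(L, tol=1e-3),
      "| large sparse entries:", int((np.abs(S) > 1.0).sum()))
```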

  • Range Image Registration Based on Photometry 査読

    Diego Thomas

    PhD thesis, The National Institute of Informatics, SOKENDAI, Tokyo, Japan   2012年3月

     詳細を見る

    記述言語:英語   掲載種別:学位論文(博士)  

    3D modeling of a real scene means constructing a virtual representation of the scene, generally simplified, that can be used or modified at will. Constructing such a 3D model by hand is a laborious and time-consuming task, and automating the whole process has attracted growing interest in the computer vision field. In particular, the task of registering (i.e. aligning) different parts of the scene (called range images) acquired from different viewpoints is of crucial importance when constructing 3D models. During the last decades, researchers have concentrated their efforts on this problem and proposed several methodologies to automatically register range images. Thereby, key-point detectors and descriptors have been utilized to match points across different range images using geometric or textural features. Several similarity metrics have also been proposed to identify the overlapping regions. In spite of the advantages of the current methods, several limitations have been reported. In particular, when the scene lacks discriminative geometric features, the difficulty of accounting for changes in the appearance of the scene observed in different poses, or from different viewpoints, significantly degrades the performance of current methods. We address this issue by investigating the use of photometry (i.e. the relationship between geometry, reflectance properties and illumination) for range image registration. First, we propose a robust descriptor using albedo that is tolerant to errors in the illumination estimate. Second, we propose an albedo extraction technique for specular surfaces that enlarges the range of materials we can deal with. Third, we propose a photometric metric under unknown lighting that allows registration of range images without any assumptions on the illumination. With these proposed methods, we significantly enlarge the practicability and range of applications of range image registration.

  • Illumination-free photometric metric for range image registration 査読 国際誌

    Diego Thomas, Akihiro Sugimoto

    IEEE Workshop on Applications of Computer Vision (WACV), 2012   2012年1月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    This paper presents an illumination-free photometric metric for evaluating the goodness of a rigid transformation aligning two overlapping range images, under the assumption of Lambertian surfaces. Our metric is based on photometric re-projection error rather than on feature detection and matching. We synthesize the color of one image using the albedo of the other image to compute the photometric re-projection error. The unknown illumination and albedo are estimated from the correspondences induced by the input transformation using the spherical harmonics representation of image formation. This allows us to derive an illumination-free photometric metric for range image alignment. We use a hypothesize-and-test method to search for the transformation that minimizes our illumination-free photometric function. Transformation candidates are efficiently generated by employing the spherical representation of each image. Experimental results using synthetic and real data show the usefulness of the proposed metric.
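    The lighting-estimation step can be sketched with a first-order spherical-harmonics model (the paper uses the full Lambertian SH representation; the least-squares structure is the same, and all names here are illustrative):

```python
import numpy as np

def photometric_error(normals, albedo, intensity):
    """Fit low-order SH lighting by least squares given albedo and normals,
    then return the photometric re-projection error and the coefficients."""
    B = np.hstack([np.ones((len(normals), 1)), normals])  # SH basis up to order 1
    A = albedo[:, None] * B
    l, *_ = np.linalg.lstsq(A, intensity, rcond=None)     # lighting coefficients
    return np.linalg.norm(A @ l - intensity), l

# Toy usage: synthesize Lambertian intensities under known lighting, then recover it.
rng = np.random.default_rng(5)
n = rng.standard_normal((1000, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
rho = rng.uniform(0.2, 1.0, 1000)
l_true = np.array([0.8, 0.1, -0.3, 0.5])
I = rho * (np.hstack([np.ones((1000, 1)), n]) @ l_true)
err, l_est = photometric_error(n, rho, I)
print(round(err, 6), np.allclose(l_est, l_true, atol=1e-6))  # ~0.0 True
```

    In the hypothesize-and-test loop, this error would be evaluated for each candidate transformation and the minimizer kept.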

  • Robustly registering range images using local distribution of albedo 査読 国際誌

    Diego Thomas and Akihiro Sugimoto

    Computer Vision and Image Understanding   115 ( 5 )   649 - 667   2011年5月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(学術雑誌)  

    We propose a robust method for registering overlapping range images of a Lambertian object under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, the albedo is promising for range image registration of Lambertian objects lacking discriminative geometric features under variable illumination. We use adaptive regions in our method to model the local distribution of albedo, which enables us to stably extract reliable attributes of each point against illumination estimates. We use a level-set method to grow robust and adaptive regions to define these attributes. A similarity metric between two attributes is also defined to match points in the overlapping area. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.

    DOI: https://doi.org/10.1016/j.cviu.2010.11.016

  • Range image registration of specular objects under complex illumination 査読 国際誌

    Diego Thomas, Akihiro Sugimoto

    Fifth International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT2010)   2010年6月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    We present a method for range image registration of specular objects devoid of salient geometric properties under complex lighting environments. Our method uses illumination consistency on two range images to detect specular highlights, which are used to obtain diffuse reflection components. By using light information estimated from the specular highlights and the diffuse reflection components, we extract the albedo at the surface of an object, even under an unknown complex lighting environment. We then robustly register the two range images using the extracted albedo. This technique can handle various kinds of illumination conditions and can be applied to a wide range of materials. Our experiments using synthetic and real data show the effectiveness, robustness and accuracy of our proposed method.

  • Range image registration of specular objects 査読 国際誌

    Diego Thomas, Akihiro Sugimoto

    Proc. of CVWW’10   2010年2月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    We present a method for range image registration of specular objects devoid of salient geometric properties under complex lighting environments. We propose to use illumination consistency on two range images to detect specular highlights, which are used to obtain diffuse reflection components. By using light information estimated from the specular highlights and the diffuse reflection components, we extract photometric features invariant to changes in pose and illumination, even under an unknown complex lighting environment. We then robustly register the two range images using these features. This technique can handle various kinds of illumination conditions and can be applied to a wide range of materials. Our experiments using synthetic data show the effectiveness, robustness and accuracy of our proposed method.

  • Robust range image registration using local distribution of albedo 査読 国際誌

    Diego Thomas, Akihiro Sugimoto

    IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), 2009   2009年9月

     詳細を見る

    記述言語:英語   掲載種別:研究論文(国際会議プロシーディングス)  

    We propose a robust registration method for range images under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, they are promising for range image registration of objects lacking discriminative geometric features under variable illumination. In our method, we use adaptive regions to model the local distribution of reflectance, which enables us to stably extract reliable attributes of each point against errors in the illumination estimate. We use a level-set method to grow robust and adaptive regions to define these attributes. A similarity metric between two attributes is defined using principal component analysis to find matches. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.


講演・口頭発表等

  • TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell 国際会議

    Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

    IEEE/CVF Conference on Computer Vision and Pattern Recognition  2020年6月 

     詳細を見る

    開催年月日: 2020年6月

    記述言語:英語  

    国名:その他  

  • Human shape reconstruction with loose clothes from partially observed data by pose specific deformation 国際会議

    Akihiko Sayo, Hayato Onizuka, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

    Pacific-Rim Symposium on Image and Video Technology  2019年11月 

     詳細を見る

    開催年月日: 2019年11月

    記述言語:英語  

    国名:オーストラリア連邦  

  • ActiveNeuS: Neural Signed Distance Fields for Active Stereo 国際会議

    Kazuto Ichimaru, Takaki Ikeda, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki

    International Conference on 3D Vision  2024年3月 

     詳細を見る

    開催年月日: 2024年3月

    記述言語:英語  

    開催地:Davos   国名:スイス連邦  

  • A Two-step Approach for Interactive Animatable Avatars 国際会議

    Takumi Kitamura, Naoya Iwamoto, Hiroshi Kawasaki, Diego Thomas

    COMPUTER GRAPHICS INTERNATIONAL 2023  2023年8月 

     詳細を見る

    開催年月日: 2023年8月 - 2023年9月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Shanghai   国名:中華人民共和国  

    We propose a two-step human body animation technique that generates pose-dependent detailed deformations in real time on a standard animation pipeline. To accomplish real-time animation, we utilize a template-based approach and represent pose-dependent deformations using 2D displacement maps. To generalize to entirely new motions, we employ a two-step strategy: 1) the first step aligns the topology of the Skinned Multi-Person Linear Model (SMPL) [23] to our proposed template model; 2) the second step models detailed clothing and muscle deformation for the specific motion. Our experimental results show that our proposed method can animate an avatar up to 30 times faster than baselines while keeping a similar or even better level of detail.
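    The second step's use of 2D displacement maps can be sketched as follows (a minimal numpy version assuming per-vertex normals and UV coordinates with nearest-neighbour sampling; all names are illustrative):

```python
import numpy as np

def apply_displacement_map(vertices, normals, uvs, disp_map):
    """Offset each skinned vertex along its normal by a value sampled
    (nearest neighbour) from a 2D displacement map at the vertex's UV."""
    h, w = disp_map.shape
    px = np.clip((uvs[:, 0] * (w - 1)).astype(int), 0, w - 1)
    py = np.clip((uvs[:, 1] * (h - 1)).astype(int), 0, h - 1)
    return vertices + disp_map[py, px][:, None] * normals

# Toy usage: push one vertex of a flat patch outward.
v = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 1.0, 0.0]])
n = np.tile([0.0, 0.0, 1.0], (3, 1))
uv = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
dmap = np.zeros((8, 8))
dmap[7, 3] = 0.05                     # bump sampled by the third vertex's UV
print(apply_displacement_map(v, n, uv, dmap))
```

    Because the deformation lives in a plain 2D image, it fits directly into a standard real-time animation pipeline as a texture.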

  • Refining OpenPose With a New Sports Dataset for Robust 2D Pose Estimation 国際会議

    Takumi Kitamura, Hitoshi Teshima, Diego Thomas, Hiroshi Kawasaki

    IEEE/CVF Winter Conference on Applications of Computer Vision  2022年1月 

     詳細を見る

    開催年月日: 2022年1月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Hawaii (online)   国名:アメリカ合衆国  

    3D marker-less motion capture can be achieved by triangulating estimated multi-view 2D poses. However, when the 2D pose estimation fails, the 3D motion capture also fails. This is particularly challenging for the sports performances of athletes, which involve extreme poses. In extreme poses (such as having the head down), state-of-the-art 2D pose estimators such as OpenPose do not work at all. In this paper, we propose a new method to improve the training of 2D pose estimators for extreme poses by leveraging a new sports dataset and our proposed data augmentation strategy. Our results show significant improvements over previous methods for 2D pose estimation of athletes performing acrobatic moves, while keeping state-of-the-art performance on standard datasets.

    その他リンク: https://openaccess.thecvf.com/content/WACV2022W/CV4WS/html/Kitamura_Refining_OpenPose_With_a_New_Sports_Dataset_for_Robust_2D_WACVW_2022_paper.html
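    The augmentation idea can be sketched as follows (a minimal numpy version; the paper's actual dataset and augmentation parameters are not reproduced here, and the matching image rotation, e.g. with cv2.warpAffine, is omitted):

```python
import numpy as np

def rotate_keypoints(keypoints, center, angle_deg):
    """Rotate annotated 2D joint coordinates about the image center so that
    extreme orientations (e.g. head-down poses) appear during training."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return (keypoints - center) @ R.T + center

# Toy usage: a 180-degree rotation flips a keypoint across the center.
kp = np.array([[100.0, 40.0]])
print(rotate_keypoints(kp, center=np.array([128.0, 128.0]), angle_deg=180.0))
# [[156. 216.]]
```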

  • スポーツ選手のマーカレスモーションキャプチャーのための効率的なOpenpose再学習

    #北村 卓弥, 川崎 洋, ディエゴ トマ

    情報処理学会 第248回自然言語処理・第226回コンピュータビジョンとイメージメディア合同研究発表会  2021年5月 

     詳細を見る

    開催年月日: 2021年5月

    記述言語:日本語   会議種別:シンポジウム・ワークショップ パネル(公募)  

    国名:日本国  

  • Analysis and Classification of Gestures in TED Talks

    Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

    パターン認識・メディア理解研究会 (PRMU 2020)  2020年10月 

     詳細を見る

    開催年月日: 2020年10月

    記述言語:日本語   会議種別:シンポジウム・ワークショップ パネル(公募)  

    国名:日本国  

  • Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video 国際会議

    Cheng Sun, Diego Thomas, Hiroshi Kawasaki

    25th International Conference on Pattern Recognition (ICPR)  2021年1月 

     詳細を見る

    開催年月日: 2021年1月

    記述言語:英語  

    国名:その他  

  • 3D human body reconstruction using RGB-D camera 招待 国際会議

    Diego Thomas

    Asia Pacific Society for Computing and Information Technology 2019 Annual Meeting  2019年7月 

     詳細を見る

    開催年月日: 2019年7月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Sapporo, Hokkaido   国名:日本国  

    Consumer-grade RGB-D cameras have become the commodity tool to build dense 3D models of indoor scenes. Motivated by the strong demand for high-fidelity personal 3D avatars, many efforts are now being made to use RGB-D cameras to automatically reconstruct high-quality 3D models of the human body. This is a very difficult task because the human body moves non-rigidly during the scanning process. How to simultaneously reconstruct the detailed 3D shape of the body while accurately tracking the non-rigid motion is the main challenge that all successful systems must solve. In addition, to be usable in portable devices such as smartphones, solutions that require little memory and low computational power are needed. In this talk, we will first briefly review existing successful strategies for real-time 3D human body reconstruction. Then, we will present our proposed solution for 3D human body reconstruction that is light in memory consumption and computational power. Our main idea is to separate the full-body non-rigid reconstruction into multiple nearly-rigid reconstructions of body parts that are tightly stitched together.

  • VMPFusion: Variational Message Passing for dynamic 3D face reconstruction 招待 国際会議

    Diego Thomas

    IDS/JFLI workshop  2018年5月 

     詳細を見る

    開催年月日: 2018年5月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:Osaka   国名:日本国  

    In this talk I will describe a probabilistic approach for dynamic 3D face modeling using a consumer-grade RGB-D camera. In this research, my goal is to formulate a strategy to fuse noisy 3D measurements captured with a Kinect camera into a 3D facial model without relying on explicit point correspondences. We propose to tackle this challenging problem with the Variational Message Passing (VMP) algorithm, which optimizes a variational distribution using a message-passing procedure on a graphical model. We show the validity of our formulation with real-data experiments.

  • 3D Modeling of Large-Scale Indoor Scenes Using RGB-D Cameras 招待 国際会議

    Diego Thomas, Akihiro Sugimoto

    The 1st International Conference on Advanced Imaging  2015年6月 

     詳細を見る

    開催年月日: 2015年6月

    記述言語:英語   会議種別:口頭発表(一般)  

    開催地:National Center of Science, Tokyo, Japan   国名:日本国  

  • Synthesis of environment maps for mixed reality

    David R. Walton, Diego Gabriel Francis Thomas, Anthony Steed, Akihiro Sugimoto

    16th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2017  2017年11月 

     詳細を見る

    開催年月日: 2017年10月

    記述言語:英語  

    開催地:Nantes   国名:フランス共和国  

    When rendering virtual objects in a mixed reality application, it is helpful to have access to an environment map that captures the appearance of the scene from the perspective of the virtual object. It is straightforward to render virtual objects into such maps, but capturing and correctly rendering the real components of the scene into the map is much more challenging. This information is often recovered from physical light probes, such as reflective spheres or fisheye cameras, placed at the location of the virtual object in the scene. For many application areas, however, real light probes would be intrusive or impractical. Ideally, all of the information necessary to produce detailed environment maps could be captured using a single device. We introduce a method using an RGBD camera and a small fisheye camera, contained in a single unit, to create environment maps at any location in an indoor scene. The method combines the output from both cameras to correct for their limited field of view and the displacement from the virtual object, producing complete environment maps suitable for rendering the virtual content in real time. Our method improves on previous probeless approaches by its ability to recover high-frequency environment maps. We demonstrate how this can be used to render virtual objects which shadow, reflect and refract their environment convincingly.
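    Projecting the reconstructed colored geometry into an environment map at the virtual object's location can be sketched as follows (a minimal equirectangular splatting version; the actual system also blends in the fisheye view and handles occlusions and holes):

```python
import numpy as np

def splat_env_map(points, colors, probe, res=(64, 128)):
    """Splat colored 3D points into an equirectangular environment map
    centered at the virtual object's location 'probe' (nearest-pixel splat)."""
    d = points - probe
    r = np.linalg.norm(d, axis=1)
    theta = np.arccos(np.clip(d[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))  # polar angle
    phi = np.arctan2(d[:, 1], d[:, 0])                                    # azimuth
    iy = np.clip((theta / np.pi * (res[0] - 1)).astype(int), 0, res[0] - 1)
    ix = np.clip(((phi + np.pi) / (2.0 * np.pi) * (res[1] - 1)).astype(int), 0, res[1] - 1)
    env = np.zeros(res + (3,))
    env[iy, ix] = colors   # a real system would blend by depth and confidence
    return env

# Toy usage: a red point above the probe lands at the top of the map.
pts = np.array([[0.0, 0.0, 2.0]])
cols = np.array([[1.0, 0.0, 0.0]])
print(splat_env_map(pts, cols, probe=np.zeros(3))[0].sum())  # 1.0
```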

  • Fast 3D point cloud segmentation using supervoxels with geometry and color for 3D scene understanding

    Francesco Verdoja, Diego Gabriel Francis Thomas, Akihiro Sugimoto

    2017 IEEE International Conference on Multimedia and Expo, ICME 2017  2017年8月 

     詳細を見る

    開催年月日: 2017年7月

    記述言語:英語  

    開催地:Hong Kong  

    Segmentation of 3D colored point clouds is a research field with renewed interest thanks to the recent availability of inexpensive consumer RGB-D cameras, and due to its importance as an unavoidable low-level step in many robotic applications. However, the nature of 3D data makes the task challenging, and many different techniques have been proposed, all of which require expensive computation. This paper presents a novel fast method for 3D colored point cloud segmentation. It starts with a supervoxel partitioning of the cloud, i.e., an oversegmentation of the points in the cloud. It then leverages a novel metric exploiting both geometry and color to iteratively merge the supervoxels and obtain a 3D segmentation in which the hierarchical structure of partitions is maintained. The algorithm also has computational complexity linear in the size of the input. Experimental results on two publicly available datasets demonstrate that our proposed method outperforms state-of-the-art techniques.
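    The merging criterion can be sketched as follows (a hedged toy version: the weights, the exact distance, and the data layout are illustrative, not the paper's definition):

```python
import numpy as np

def merge_distance(sv_a, sv_b, w_color=0.5):
    """Combine geometry (angle between supervoxel normals) and color
    (Euclidean distance in a normalized color space) into one distance."""
    geo = 1.0 - abs(float(np.dot(sv_a["normal"], sv_b["normal"])))  # 0 if coplanar
    col = float(np.linalg.norm(sv_a["color"] - sv_b["color"]))
    return (1.0 - w_color) * geo + w_color * col

def merge_pass(supervoxels, adjacency, threshold=0.2):
    """One hierarchical pass: merge every adjacent pair below the threshold
    (iterated passes with decreasing thresholds would build the hierarchy)."""
    return [(a, b) for a, b in adjacency
            if merge_distance(supervoxels[a], supervoxels[b]) < threshold]

# Toy usage: two coplanar same-color patches merge; a perpendicular one does not.
svs = [{"normal": np.array([0.0, 0.0, 1.0]), "color": np.array([0.20, 0.2, 0.2])},
       {"normal": np.array([0.0, 0.0, 1.0]), "color": np.array([0.25, 0.2, 0.2])},
       {"normal": np.array([1.0, 0.0, 0.0]), "color": np.array([0.20, 0.2, 0.2])}]
print(merge_pass(svs, [(0, 1), (1, 2)]))  # [(0, 1)]
```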

  • Augmented blendshapes for real-time simultaneous 3D head modeling and facial motion capture

    Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi

    2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016  2016年 

     詳細を見る

    開催年月日: 2016年6月 - 2016年7月

    記述言語:英語  

    開催地:Las Vegas   国名:アメリカ合衆国  

    We propose a method to build animated 3D head models in real time using a consumer-grade RGB-D camera. Our framework is the first to simultaneously provide comprehensive facial motion tracking and a detailed 3D model of the user's head. Anyone's head can be instantly reconstructed and their facial motion captured without requiring any training or pre-scanning. The user starts facing the camera with a neutral expression in the first frame, but is otherwise free to move, talk and change facial expression at will. The facial motion is tracked using a blendshape representation while fine geometric details are captured using a Bump image mapped over the template mesh. We propose an efficient algorithm to grow and refine the 3D model of the head on-the-fly and in real time. We demonstrate robust and high-fidelity simultaneous facial motion tracking and 3D head modeling results on a wide range of subjects with various head poses and facial expressions. Our proposed method offers interesting possibilities for animation production and 3D video telecommunications.
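    The blendshape part of the representation is a linear model, sketched below (a minimal numpy version; the Bump image detail layer and the tracking optimization are omitted, and all names are illustrative):

```python
import numpy as np

def blendshape_face(neutral, blendshapes, weights):
    """Blendshape model: the tracked face is the neutral mesh plus a weighted
    sum of expression deltas; a Bump image would add fine detail on top."""
    deltas = blendshapes - neutral[None, :, :]          # (K, V, 3) expression offsets
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy usage: one 'jaw open' blendshape applied at half strength to 2 vertices.
neutral = np.zeros((2, 3))
jaw_open = np.array([[[0.0, -1.0, 0.0],
                      [0.0,  0.0, 0.0]]])               # (1, 2, 3)
print(blendshape_face(neutral, jaw_open, np.array([0.5])))
# vertex 0 moves down by 0.5
```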

  • Dense 3D reconstruction using RGB-D cameras 招待 国際会議

    Diego Thomas

    International Conference on 3D Vision 2014  2014年12月 

     詳細を見る

    開催年月日: 2014年12月

    記述言語:英語   会議種別:公開講演,セミナー,チュートリアル,講習,講義等  

    開催地:3DV2014, Tokyo, Japan.   国名:日本国  

    The generation of fine 3D models from RGB-D (color plus depth) measurements is of great interest to the computer vision community. Although the 3D reconstruction pipeline has been widely studied in the last decades, a new era has recently started with the advent of low-cost consumer depth cameras (called RGB-D cameras) that capture RGB-D images at video rate (e.g., Microsoft Kinect or Asus Xtion Pro). The introduction of 3D measurements to the public has brought its own revolution to the scientific community, with many projects and applications using RGB-D cameras. In this tutorial, we will give an overview of existing 3D reconstruction methods using a single RGB-D camera with various 3D representations, including point-based representations (surfels), implicit volumetric representations (TSDF), patch-based representations and parametric representations. These different 3D scene representations give us powerful tools to build virtual representations of the real world in real time from RGB-D cameras. We can reconstruct not only small-scale static scenes but also large-scale scenes and dynamic scenes. We will also discuss current trends in depth sensing and future challenges for 3D scene reconstruction.

  • A two-stage strategy for real-time dense 3D reconstruction of large-scale scenes

    Diego Gabriel Francis Thomas, Akihiro Sugimoto

    13th European Conference on Computer Vision, ECCV 2014  2015年1月 

     詳細を見る

    開催年月日: 2014年9月

    記述言語:英語  

    開催地:Zurich   国名:スイス連邦  

    The frame-to-global-model approach is widely used for accurate 3D modeling from sequences of RGB-D images. Because no perfect camera tracking system yet exists, the accumulation of small errors generated when registering and integrating successive RGB-D images causes deformations of the 3D model being built up. In particular, the deformations become significant when the scale of the scene to model is large. To tackle this problem, we propose a two-stage strategy to build, in detail, a large-scale 3D model with minimal deformations, where the first stage creates accurate small-scale 3D scenes in real time from short subsequences of RGB-D images, while the second stage re-organises all the results from the first stage in a geometrically consistent manner to reduce deformations as much as possible. By employing planar patches as the 3D scene representation, our proposed method runs in real time to build accurate 3D models with minimal deformations even for large-scale scenes. Our experiments using real data confirm the effectiveness of our proposed method.

  • A flexible scene representation for 3D reconstruction using an RGB-D camera

    Diego Gabriel Francis Thomas, Akihiro Sugimoto

    2013 14th IEEE International Conference on Computer Vision, ICCV 2013  2013年 

     詳細を見る

    開催年月日: 2013年12月

    記述言語:英語  

    開催地:Sydney, NSW   国名:オーストラリア連邦  

    Updating a global 3D model with live RGB-D measurements has proven to be successful for 3D reconstruction of indoor scenes. Recently, a Truncated Signed Distance Function (TSDF) volumetric model and a fusion algorithm have been introduced (KinectFusion), showing significant advantages such as computational speed and accuracy of the reconstructed scene. This algorithm, however, is expensive in memory when constructing and updating the global model. As a consequence, the method is not well scalable to large scenes. We propose a new flexible 3D scene representation using a set of planes that is cheap in memory use and, nevertheless, achieves accurate reconstruction of indoor scenes from RGB-D image sequences. Projecting the scene onto different planes significantly reduces the size of the scene representation, which allows us to generate a global textured 3D model with lower memory requirements while keeping accuracy and ease of updating with live RGB-D measurements. Experimental results demonstrate that our proposed flexible 3D scene representation achieves accurate reconstruction while keeping scalability for large indoor scenes.

  • Compact and accurate 3-D face modeling using an RGB-D camera: Let's open the door to 3-D video conference

    Pavan Kumar Anasosalu, Diego Gabriel Francis Thomas, Akihiro Sugimoto

    2013 14th IEEE International Conference on Computer Vision Workshops, ICCVW 2013  2013年 

     詳細を見る

    開催年月日: 2013年12月

    記述言語:英語  

    開催地:Sydney, NSW   国名:オーストラリア連邦  

    We present a method for producing an accurate and compact 3-D face model in real time using a low-cost RGB-D sensor like the Kinect camera. We extend and use Bump Images for highly accurate and low-memory-consumption 3-D reconstruction of the human face. Bump Images are generated by representing the Cartesian coordinates of points on the face in the spherical coordinate system whose origin is the center of the head. After initialization, the Bump Images are updated in real time with every RGB-D frame with respect to the current viewing direction and head pose, which are estimated using the frame-to-global-model registration strategy. While the high accuracy of the representation allows us to recover fine details, the low memory use opens new possible applications of consumer depth cameras such as 3-D video conferencing. We validate our approach by quantitatively comparing our result with the result obtained by a commercial high-resolution laser scanner. We also discuss the potential of our proposed method for a 3-D video conferencing application with existing internet speeds.

  • Learning to discover objects in RGB-D images using correlation clustering

    Michael Firman, Diego Gabriel Francis Thomas, Simon Julier, Akihiro Sugimoto

    2013 26th IEEE/RSJ International Conference on Intelligent Robots and Systems: New Horizon, IROS 2013  2013年12月 

     詳細を見る

    開催年月日: 2013年11月

    記述言語:英語  

    開催地:Tokyo   国名:日本国  

    We introduce a method to discover objects from RGB-D image collections which does not require a user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver to both find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method has superior performance to traditional clustering methods.

  • Robust simultaneous 3D registration via rank minimization

    Diego Gabriel Francis Thomas, Yasuyuki Matsushita, Akihiro Sugimoto

    2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2012  2012年 

     詳細を見る

    開催年月日: 2012年10月

    記述言語:英語  

    開催地:Zurich   国名:スイス連邦  

    We present a robust and accurate 3D registration method for a dense sequence of depth images taken from unknown viewpoints. Our method simultaneously estimates multiple extrinsic parameters of the depth images to obtain a registered full 3D model of the scanned scene. By arranging the depth measurements in a matrix form, we formulate the problem as the simultaneous estimation of multiple extrinsics, a low-rank matrix that corresponds to the aligned depth images, and a sparse error matrix. Unlike previous approaches that use sequential or heuristic global registration approaches, our solution method uses an advanced convex optimization technique for obtaining a robust solution via rank minimization. To achieve accurate computation, we develop a depth projection method that has minimal sensitivity to sampling by reading projected depth values in the input depth images. We demonstrate the effectiveness of the proposed method through extensive experiments and compare it with previous standard techniques.

  • Illumination-free photometric metric for range image registration

    Diego Gabriel Francis Thomas, Akihiro Sugimoto

    2012 IEEE Workshop on the Applications of Computer Vision, WACV 2012  2012年 

     詳細を見る

    開催年月日: 2012年1月

    記述言語:英語  

    開催地:Breckenridge, CO   国名:アメリカ合衆国  

    This paper presents an illumination-free photometric metric for evaluating the goodness of a rigid transformation aligning two overlapping range images, under the assumption of Lambertian surfaces. Our metric is based on photometric re-projection error rather than on feature detection and matching. We synthesize the color of one image using the albedo of the other image to compute the photometric re-projection error. The unknown illumination and albedo are estimated from the correspondences induced by the input transformation using the spherical harmonics representation of image formation. This allows us to derive an illumination-free photometric metric for range image alignment. We use a hypothesize-and-test method to search for the transformation that minimizes our illumination-free photometric function. Transformation candidates are efficiently generated by employing the spherical representation of each image. Experimental results using synthetic and real data show the usefulness of the proposed metric.

  • Robust range image registration using local distribution of albedo

    Diego Gabriel Francis Thomas, Akihiro Sugimoto

    2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009  2009年 

     詳細を見る

    開催年月日: 2009年9月 - 2009年10月

    記述言語:英語  

    開催地:Kyoto   国名:日本国  

    We propose a robust registration method for range images under a rough estimate of illumination. Because reflectance properties are invariant to changes in illumination, they are promising for range image registration of objects lacking discriminative geometric features under variable illumination. In our method, we use adaptive regions to model the local distribution of reflectance, which enables us to stably extract reliable attributes of each point against errors in the illumination estimate. We use a level-set method to grow robust and adaptive regions to define these attributes. A similarity metric between two attributes is defined using principal component analysis to find matches. Moreover, remaining mismatches are efficiently removed using the rigidity constraint of surfaces. Our experiments using synthetic and real data demonstrate the robustness and effectiveness of our proposed method.


所属学協会

  • IEEE

学術貢献活動

  • Area chair

    第27回 画像の認識・理解シンポジウム MIRU2024  ( Kumamoto Japan ) 2024年8月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:1,000

  • 学術論文等の審査

    役割:査読

    2023年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:3

    国際会議録 査読論文数:25

    国内会議録 査読論文数:3

  • Program commitee 国際学術貢献

    CVPR2022  ( New Orleans, Louisiana United States of America ) 2022年6月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:10,000

  • Senior Program Committee 国際学術貢献

    AAAI 2022  ( Vancouver Canada ) 2022年2月 - 2022年3月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:8,000

  • 学術論文等の審査

    役割:査読

    2022年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:8

    国際会議録 査読論文数:18

    国内会議録 査読論文数:3

  • Senior Program Committee Member 国際学術貢献

    30th International Joint Conference on Artificial Intelligence (IJCAI-21)  ( Montreal Canada ) 2021年8月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:1,000

  • Program committee 国際学術貢献

    CVPR 2021  ( Online United States of America ) 2021年6月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:10,000

  • Program committee 国際学術貢献

    WACV 2021  ( United States of America ) 2021年3月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:400

  • 講演座長

    情報処理学会第83回全国大会  ( Online Japan ) 2021年3月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:100

  • Local chair 国際学術貢献

    3D Vision (3DV 2020)  ( Fukuoka Japan ) 2020年11月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:300

  • Program committee 国際学術貢献

    CVPR 2020  ( Seattle, Washington United States of America ) 2020年6月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:8,000

  • Program committee 国際学術貢献

    WACV 2020  ( Aspen United States of America ) 2020年3月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:1,000

  • 学術論文等の審査

    役割:査読

    2020年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:4

    日本語雑誌 査読論文数:2

    国際会議録 査読論文数:25

    国内会議録 査読論文数:4

  • Program chair 国際学術貢献

    Machine Perception and Robotics (MPR 2019)  ( Biwako Kusatsu Campus (BKC), Ritsumeikan University Japan ) 2019年11月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:80

  • Area chair 国際学術貢献

    The 9th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2019)  ( Sydney Australia ) 2019年11月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:80

  • 学術論文等の審査

    役割:査読

    2019年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:15

    国際会議録 査読論文数:25

  • Publicity chair 国際学術貢献

    The 12th International Workshop on Information Search, Integration, and Personalization (ISIP 2018)  ( Kyushu University, Fukuoka Japan ) 2018年5月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:50

  • 学術論文等の審査

    役割:査読

    2018年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:20

    国際会議録 査読論文数:20

  • Local arrangement chair 国際学術貢献

    JFLI-KYUDAI JOINT WORKSHOP ON INFORMATICS  ( Ito Campus, Kyushu University, Fukuoka Japan ) 2017年9月

     詳細を見る

    種別:大会・シンポジウム等 

    参加者数:15

  • 学術論文等の審査

    役割:査読

    2017年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:10

    国際会議録 査読論文数:24

  • Program Committee 国際学術貢献

    SITIS2016  ( Naples Italy ) 2016年11月 - 2016年12月

     詳細を見る

    種別:大会・シンポジウム等 

  • Program Committee

    MIRU2016  ( Hamamatsu Japan ) 2016年8月

     詳細を見る

    種別:大会・シンポジウム等 

  • 学術論文等の審査

    役割:査読

    2016年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:2

    国際会議録 査読論文数:19

    国内会議録 査読論文数:1

  • Program Committee

    MIRU2015  ( Osaka Japan ) 2015年7月

     詳細を見る

    種別:大会・シンポジウム等 

  • 学術論文等の審査

    役割:査読

    2015年

     詳細を見る

    種別:査読等 

    外国語雑誌 査読論文数:1

    国際会議録 査読論文数:12

  • Program committee

    MIRU2014  ( Okayama Japan ) 2014年7月

     詳細を見る

    種別:大会・シンポジウム等 


共同研究・競争的資金等の研究課題

  • A new data-driven approach to bring humanity into virtual worlds with computer vision

    研究課題/領域番号:23H03439  2023年 - 2025年

    日本学術振興会  科学研究費助成事業  基盤研究(B)

      詳細を見る

    担当区分:研究代表者  資金種別:科研費

  • NeRF-based multi-view 3D shape reconstruction using Centroidal Voronoi Tessellation 国際共著

    2022年4月 - 2023年6月

    Kyushu University (Japan) 

      詳細を見る

    担当区分:研究代表者 

    To reconstruct high-resolution 3D meshes from multi-view images, we investigate the use of CVT to jointly optimize the 3D shape, the appearance, and the discretization of 3D space.

  • Multi-Camera 3D Pedestrian Detection with Domain Adaptation and Generalization

    2022年 - 2023年

    日本学術振興会  JSPS Invitational Fellowships for Research in Japan (short term)

      詳細を見る

    担当区分:研究代表者  資金種別:共同研究

  • AI-based animation of 3D avatars.

    2021年6月 - 2022年5月

    共同研究

      詳細を見る

    担当区分:研究代表者  資金種別:その他産学連携による資金

  • Deep human avatar animation 国際共著

    2021年5月 - 2022年5月

    Japan 

      詳細を見る

    担当区分:研究代表者 

    This is a joint research project with HUAWEI about learning to generate avatar animations from 2D videos in real time.

  • Realistic environment rendering with real humans for architecture project visualization

    2021年4月 - 2022年5月

      詳細を見る

    担当区分:研究代表者 

    This is a joint project with Professor Koga (architectural design) and Professor Ochiai (Mathematics for Industry) about generating immersive virtual environments of architectural projects to support design and evaluation.

  • Multi-view 3D pedestrian localisation 国際共著

    2021年3月 - 2023年4月

    Brazil 

      詳細を見る

    担当区分:研究分担者 

    The project is about identifying, localizing and tracking pedestrians in 3D from multi-view videos.

  • A new approach for supporting architectural works with virtual reality environments.

    2021年 - 2022年

    QR Tsubasa (つばさプロジェクト)

      詳細を見る

    担当区分:研究代表者  資金種別:学内資金・基金等

  • Weakly-supervised human 3D body shape estimation from single images 国際共著

    2020年9月 - 2021年8月

    U.S.A 

      詳細を見る

    担当区分:研究分担者 

    We are working on a solution that learns to estimate the 3D shape of human bodies from 2D observations in an unsupervised manner.

  • Dynamic human motion tracking using dual quaternion algebra 国際共著

    2020年7月 - 2022年3月

    Japan 

      詳細を見る

    担当区分:研究分担者 

    This is a joint research project with Vincent Nozick from Gustave Eiffel University in France. The project is about reconstructing the non-rigid motion of human bodies captured with RGB-D cameras.

  • Human body 3D shape estimation, animation and gesture synthesis

    2020年4月 - 2021年3月

    共同研究

      詳細を見る

    担当区分:研究代表者  資金種別:その他産学連携による資金

  • Personalized avatars with real emotions for next generation holoportation systems 国際共著

    2020年1月 - 2021年1月

    Microsoft Research Asia 

      詳細を見る

    担当区分:研究代表者 

    Personalized avatars are the key towards more natural communication in the virtual space. If you can express yourself with not only your own voice but also your own body, expressions and emotions, you can communicate better. This is also a powerful way to avoid being deceived by fake characters, and there is a huge demand for realistic avatars and emotes, with a big business opportunity. When communicating in the virtual space it is important to transmit real expressions and real emotions, but it is also important to keep the possibility to remain anonymous. While ultra-realistic avatars that have someone’s own appearance, skin and face will surely break anonymity, body motion and gesture can convey a large part of real expressions and emotions without revealing a person’s identity. In this project, we aim at capturing full-body 3D motion and fine gestures and re-targeting them into a mixed reality telepresence system (also called holoportation) deployed on the Microsoft HoloLens. To achieve our objective there are three main challenges to tackle: (1) detailed 3D motion of the human body must be captured from standard RGB cameras; (2) the human motion must be faithfully re-targeted to a virtual avatar, which may have different animation characteristics than the human; (3) the avatar must be displayed in 3D with the HoloLens while considering the surrounding illumination conditions. Fundamental findings unveiled in the project will provide new insights for human motion estimation, re-targeting to other bodies with different kinematics, and environment mapping with mixed reality devices.

  • 2 years training and international research

    2020年 - 2022年

    SENTAN-Q

      詳細を見る

    担当区分:研究代表者  資金種別:学内資金・基金等

  • 3D shape estimation and motion retargeting from 2D videos for future Holoportation systems.

    2020年

    QR Wakaba challenge

      詳細を見る

    担当区分:研究代表者  資金種別:学内資金・基金等

  • Unifying multiple RGB and depth cameras for real-time large-scale dynamic 3D modeling with unmanned micro aerial vehicles.

    2019年4月 - 2021年4月

    KAKENHI 

      詳細を見る

    担当区分:研究代表者 

    This project investigates real-time 3D reconstruction of large-scale dynamic scenes from unmanned micro aerial vehicles. The objective is to investigate the fusion of multiple RGB and range images captured on board multiple micro aerial vehicles for real-time 3D reconstruction of large-scale dynamic scenes. The fundamental methods developed here will be used to build large-scale dynamic 3D models and to provide the algorithms needed to understand dynamic 3D scenes in real time.

  • Unifying multiple RGB and depth cameras for real-time large-scale dynamic 3D modeling with unmanned micro aerial vehicles

    研究課題/領域番号:19K20297  2019年 - 2020年

    日本学術振興会  科学研究費助成事業  若手研究

      詳細を見る

    担当区分:研究代表者  資金種別:科研費

  • Facial motion capture 国際共著

    2017年10月 - 2018年9月

    Huawei Technologies Japan K.K (China). 

      詳細を見る

    担当区分:連携研究者 

    This project is divided into three stages: the first stage roughly evaluates our base algorithm; the second stage evaluates the robustness of the overall reconstruction (expression) ability, i.e., transferring the facial expressions of any person to any 3D avatar; and the third stage improves the quality of the facial model (to provide a complete facial model, we need to add eyeballs and a mouth).

  • Facial motion capture system

    2017年9月 - 2018年8月

    共同研究

      詳細を見る

    担当区分:連携研究者  資金種別:その他産学連携による資金

  • Free-form dynamic 3D scene reconstruction at high resolution

    2017年 - 2018年

    スタートアップ支援経費

      詳細を見る

    担当区分:研究分担者  資金種別:学内資金・基金等

  • Large-scale and dynamic 3D reconstruction using an RGB-D camera

    2015年 - 2017年

    日本学術振興会  科学研究費助成事業  特別研究員奨励費

      詳細を見る

    担当区分:研究代表者  資金種別:科研費


教育活動概要

  • I teach practical data science exercises. In this course, I supervise each student in implementing programs suited to their own research topic, and I also provide individual guidance during lecture hours.
    I teach the course 「情報科学」 (Information Science) for first-year undergraduate students.
    I teach the course 「Pythonによるプログラミング」 (Programming in Python) for first-year undergraduate students.
    I teach the courses 「デジタルヒューマンⅠ・Ⅱ」 (Digital Human I and II) in the Faculty of Information Science and Electrical Engineering.
    I teach the laboratory course 「分散ロボット実験」 (Distributed Robotics Experiments) in the School of Information Science and Engineering.

担当授業科目

  • デジタルヒューマンII

    2024年6月 - 2024年8月   夏学期

  • デジタルヒューマンⅠ

    2024年4月 - 2024年6月   春学期

  • 情報科学(英語)

    2023年10月 - 2024年3月   後期

  • 情報理工学論議Ⅱ

    2023年10月 - 2024年3月   後期

  • 情報理工学論述Ⅱ

    2023年10月 - 2024年3月   後期

  • 情報理工学演示

    2023年10月 - 2024年3月   後期

  • デジタルヒューマンⅡ

    2023年6月 - 2023年8月   夏学期

  • 【通年】情報理工学講究

    2023年4月 - 2024年3月   通年

  • 【通年】情報理工学研究Ⅰ

    2023年4月 - 2024年3月   通年

  • 【通年】情報理工学演習

    2023年4月 - 2024年3月   通年

  • 情報理工学論議Ⅰ

    2023年4月 - 2023年9月   前期

  • 情報理工学読解

    2023年4月 - 2023年9月   前期

  • 情報理工学論述Ⅰ

    2023年4月 - 2023年9月   前期

  • 分散ロボット実験

    2023年4月 - 2023年6月   春学期

  • デジタルヒューマンⅠ

    2023年4月 - 2023年6月   春学期

  • 情報科学(英語)

    2022年10月 - 2023年3月   後期

  • データサイエンス演習第一

    2022年10月 - 2023年3月   後期

  • データサイエンス演習第二

    2022年10月 - 2023年3月   後期

  • 情報知能工学演習 第三

    2021年10月 - 2022年3月   後期

  • 情報知能工学演習 第一

    2021年10月 - 2022年3月   後期

  • データサイエンス演習第二

    2021年10月 - 2022年3月   後期

  • データサイエンス演習第一

    2021年10月 - 2022年3月   後期

  • プログラミング演習(P)

    2021年6月 - 2021年8月   夏学期

  • 情報知能工学演習 第二

    2021年4月 - 2021年9月   前期

  • データサイエンス演習第一

    2020年10月 - 2021年3月   後期

  • データサイエンス演習第二

    2020年10月 - 2021年3月   後期

  • 情報科学

    2020年4月 - 2020年9月   前期

  • 情報科学

    2019年10月 - 2020年3月   後期

  • データサイエンス演習第一

    2019年4月 - 2019年9月   前期

  • データサイエンス演習第二

    2019年4月 - 2019年9月   前期

  • データサイエンス演習第一

    2018年4月 - 2018年9月   前期

  • データサイエンス演習第二

    2018年4月 - 2018年9月   前期

  • データサイエンス演習第二

    2017年4月 - 2017年9月   前期

  • データサイエンス演習第一

    2017年4月 - 2017年9月   前期


社会貢献活動

  • JSPS Science Dialogue

    Fukui Prefectural Wakasa High School (Wakasa-city, Fukui)  2017年1月

     詳細を見る

    対象:幼稚園以下, 小学生, 中学生, 高校生

    種別:セミナー・ワークショップ

海外渡航歴

  • 2016年12月

    滞在国名1:フランス共和国   滞在機関名1:INRIA Grenoble

  • 2011年3月 - 2011年7月

    滞在国名1:中華人民共和国   滞在機関名1:Microsoft Research Asia

  • 2010年2月

    滞在国名1:チェコ共和国   滞在機関名1:Center for Machine Perception (CMP)