|Hideaki Uchiyama (Uchiyama Hideaki)||Data updated: 2019.07.02|
Associate Professor / University Library
|1.||Nicolas Antigny, Hideaki Uchiyama, Myriam Servières, Valérie Renaudin, Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi, Solving monocular visual odometry scale factor with adaptive step length estimates for pedestrians using handheld devices, Sensors (Switzerland), 10.3390/s19040953, 19, 4, 2019.02, [URL], Urban environments represent challenging areas for handheld device pose estimation (i.e., 3D position and 3D orientation) over large displacements. The task is even more challenging with the low-cost sensors and computational resources available in pedestrian mobile devices (i.e., a monocular camera and an Inertial Measurement Unit). To address these challenges, we propose continuous pose estimation based on monocular Visual Odometry. To solve the scale ambiguity and suppress scale drift, adaptive pedestrian step length estimation is used for displacements on the horizontal plane. To complete the estimation, a handheld equipment height model, with respect to the Digital Terrain Model contained in Geographical Information Systems, is used for displacement on the vertical axis. In addition, accurate pose estimation based on the recognition of known objects is used punctually to correct the pose estimate and reset the monocular Visual Odometry. To validate the benefit of our framework, experimental data were collected from various people on a 0.7 km pedestrian path in an urban environment. The proposed solution achieves a positioning error of 1.6-7.5% of the walked distance, confirming the benefit of an adaptive step length compared with a fixed step length.|
|2.||Daisuke Kobayashi, Diego Gabriel Francis Thomas, Hideaki Uchiyama, Rin-Ichiro Taniguchi, 3D Body and Background Reconstruction in a Large-scale Indoor Scene using Multiple Depth Cameras, 12th Asia Pacific Workshop on Mixed and Augmented Reality, APMAR 2019
Proceedings of the 2019 12th Asia Pacific Workshop on Mixed and Augmented Reality, APMAR 2019, 10.1109/APMAR.2019.8709280, 2019.05, [URL], 3D reconstruction of indoor scenes that contain a non-rigidly moving human body using depth cameras is a task of extraordinary difficulty. Despite intensive efforts from researchers in the 3D vision community, existing methods are still limited to reconstructing small-scale scenes. This is because of the difficulty of tracking the camera motion when a target person moves in a totally different direction. Due to the narrow field of view (FoV) of consumer-grade red-green-blue-depth (RGB-D) cameras, a target person (generally at about 2-3 meters from the camera) covers most of the FoV of the camera. Therefore, there are not enough features from the static background to track the motion of the camera. In this paper, we propose a system that reconstructs a moving human body and the background of an indoor scene using multiple depth cameras. Our system is composed of three Kinects that are approximately set in the same line and face the same direction so that their FoVs do not overlap (to avoid interference). Owing to this setup, we can capture images of a person moving in a large-scale indoor scene. The three Kinect cameras are calibrated with a robust method that uses three large non-parallel planes. A moving person is detected using human skeleton information and is reconstructed separately from the static background. By separating the human body and the background, static 3D reconstruction can be adopted for the static background area, while a method specialized for the human body area can be used to reconstruct the 3D model of the moving person. The experimental results show the performance of the proposed system for a human body in a large-scale indoor scene.
|3.||Chao Ma, Atsushi Shimada, Hideaki Uchiyama, Hajime Nagahara, Rin-Ichiro Taniguchi, Fall detection using optical level anonymous image sensing system, Optics and Laser Technology, 10.1016/j.optlastec.2018.07.013, 110, 44-61, 2019.02, [URL], Falls are one of the leading causes of injury for elderly individuals. Systems that automatically detect falls can significantly reduce the delay of assistance. Most commercialized fall detection systems are based on wearable devices, which elderly individuals tend to forget to wear. Using surveillance cameras to detect falls based on computer vision is ideal, because anyone within the monitoring scope can be under protection. However, surveillance cameras raise persistent privacy concerns. To effectively protect privacy, we propose an optical-level anonymous image sensing system, which protects privacy by hiding facial regions optically at the video capturing phase. We apply the system to fall detection. To detect falls, we propose a neural network that combines a 3D convolutional neural network for feature extraction and an autoencoder for modelling normal behaviors. The learned autoencoder reconstructs the features extracted from videos with normal behaviors with smaller average errors than those extracted from videos with falls. We evaluated our neural network in a hold-out validation experiment and showed its effectiveness. In field tests, we showed and discussed the applicability of the optical-level anonymous image sensing system for privacy protection and fall detection.|
|4.||Masashi Mishima, Hideaki Uchiyama, Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi, Rafael Roberto, João Paulo Lima, Veronica Teichrieb, Incremental 3D cuboid modeling with drift compensation, Sensors (Switzerland), 10.3390/s19010178, 19, 1, 2019.01, [URL], This paper presents a framework for incremental 3D cuboid modeling that uses the mapping results of an RGB-D camera-based simultaneous localization and mapping (SLAM) system. This framework is useful for accurately creating cuboid CAD models from a point cloud in an online manner. While performing the RGB-D SLAM, planes are incrementally reconstructed from the point cloud in each frame to create a plane map. Then, cuboids are detected in the plane map by analyzing the positional relationships between the planes, such as orthogonality, convexity, and proximity. Finally, the position, pose, and size of a cuboid are determined by computing the intersection of three perpendicular planes. To suppress false detections of cuboids, the cuboid shapes are incrementally updated with sequential measurements to check the uncertainty of the cuboids. In addition, the drift error of the SLAM is compensated by the registration of the cuboids. As an application of our framework, an augmented reality-based interactive cuboid modeling system was developed. In evaluations in cluttered environments, the precision and recall of the cuboid detection were investigated and compared with a batch-based cuboid detection method, clarifying the advantages of our proposed method.|
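The cuboid corner computation described in the abstract above (the intersection of three perpendicular planes) reduces to a 3x3 linear solve. Below is a minimal NumPy sketch of that step; the function name and example values are illustrative, not the authors' implementation.

```python
import numpy as np

def plane_intersection(normals, offsets):
    """Intersect three planes n_i . x = d_i by solving the 3x3 linear system.

    normals: 3x3 array, one plane normal per row; offsets: length-3 array.
    Returns the common corner point, assuming the normals are linearly
    independent (always true for three mutually perpendicular planes).
    """
    N = np.asarray(normals, dtype=float)
    d = np.asarray(offsets, dtype=float)
    return np.linalg.solve(N, d)

# Example: the axis-aligned planes x=1, y=2, z=3 meet at (1, 2, 3).
corner = plane_intersection(np.eye(3), [1.0, 2.0, 3.0])
```

For noisy SLAM planes, the same solve applies to the estimated normals and offsets; near-degenerate normal configurations would make the system ill-conditioned and are presumably filtered by the uncertainty check the abstract mentions.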
|5.||Chuanhua Lu, Hideaki Uchiyama, Diego Gabriel Francis Thomas, Atsushi Shimada, Rin-Ichiro Taniguchi, Indoor positioning system based on chest-mounted IMU, Sensors (Switzerland), 10.3390/s19020420, 19, 2, 2019.01, [URL], Demand for indoor navigation systems has been increasing rapidly with regard to location-based services. As a cost-effective choice, inertial measurement unit (IMU)-based pedestrian dead reckoning (PDR) systems have been developed for years, because they do not require external devices to be installed in the environment. In this paper, we propose a PDR system based on a chest-mounted IMU as a novel installation position for body-suit-type systems. Since the IMU is mounted on the upper body, the zero-velocity update framework cannot be applied because there are no periodic moments of zero velocity. Therefore, we propose a novel regression model for estimating step lengths only from accelerations, to correctly compute step displacement using the IMU data acquired at the chest. In addition, we integrated an efficient map-matching algorithm based on particle filtering into our system to improve positioning and heading accuracy. Since our system is designed for 3D navigation, which can estimate position in a multifloor building, we use a barometer to update pedestrian altitude, and the components of our map explicitly represent building-floor information. With our complete PDR system, we were awarded second place among 10 teams in the IPIN 2018 Competition Track 2, achieving a mean error of 5.2 m after the 800 m walking event.|
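As a rough illustration of the acceleration-only step length regression described above, the sketch below uses a linear model on the standard deviation of acceleration norms within one detected step. The function name and coefficients are hypothetical placeholders, not the trained model from the paper.

```python
import numpy as np

def estimate_step_length(acc_norms, a=0.3, b=0.25):
    """Estimate one step's length (meters) from acceleration norms in the step.

    A linear model on the acceleration standard deviation, in the spirit of
    the regression the abstract describes; the coefficients a and b here are
    illustrative placeholders, not values fitted to chest-mounted IMU data.
    """
    acc = np.asarray(acc_norms, dtype=float)
    return a * acc.std() + b

# Larger acceleration variation implies a longer stride under this model.
gentle = estimate_step_length([9.7, 9.8, 9.9, 9.8])
brisk = estimate_step_length([8.0, 9.8, 12.0, 9.8])
```

Summing such per-step displacements along the estimated heading gives the dead-reckoned trajectory that the map-matching stage then corrects.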
|6.||Masashi Mishima, Hideaki Uchiyama, Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi, Rafael Roberto, João Paulo Lima, Veronica Teichrieb, RGB-D SLAM based incremental cuboid modeling, 15th European Conference on Computer Vision, ECCV 2018
Computer Vision – ECCV 2018 Workshops, Proceedings, 10.1007/978-3-030-11009-3_25, 414-429, 2019.01, [URL], This paper presents a framework for incremental 3D cuboid modeling combined with RGB-D SLAM. While performing RGB-D SLAM, planes are incrementally reconstructed from point clouds. Then, cuboids are detected in the planes by analyzing the positional relationships between the planes: orthogonality, convexity, and proximity. Finally, the position, pose, and size of a cuboid are determined by computing the intersection of three perpendicular planes. In addition, the cuboid shapes are incrementally updated with sequential measurements to suppress false detections. As an application of our framework, an augmented reality-based interactive cuboid modeling system is introduced. In an evaluation in a cluttered environment, the precision and recall of the cuboid detection are improved with our framework owing to stable plane detection, compared with a batch-based method.
|7.||Chuanhua Lu, Hideaki Uchiyama, Diego Gabriel Francis Thomas, Atsushi Shimada, Rin-Ichiro Taniguchi, Sparse cost volume for efficient stereo matching, Remote Sensing, 10.3390/rs10111844, 10, 11, 2018.11, [URL], Stereo matching has been solved as a supervised learning task with convolutional neural networks (CNNs). However, CNN-based approaches generally require a huge amount of memory. In addition, it is still challenging to find correct correspondences between images in ill-posed dim regions and regions with sensor noise. To solve these problems, we propose Sparse Cost Volume Net (SCV-Net), which achieves high accuracy, low memory cost, and fast computation. The idea of the cost volume for stereo matching was initially proposed in GC-Net. In our work, by making the cost volume compact and proposing an efficient similarity evaluation for the volume, we achieve faster stereo matching while improving the accuracy. Moreover, we propose to use weight normalization instead of the commonly used batch normalization for stereo matching tasks. This improves the robustness not only to sensor noise in images but also to the batch size in the training process. We evaluated our proposed network on the Scene Flow and KITTI 2015 datasets; its overall performance surpasses that of GC-Net. Compared with GC-Net, our SCV-Net (1) reduces GPU memory cost by 73.08%; (2) reduces processing time by 61.11%; and (3) improves the 3PE from 2.87% to 2.61% on the KITTI 2015 dataset.|
|8.||Clément Glédel, Hideaki Uchiyama, Yuji Oyamada, Rin-Ichiro Taniguchi, Texture synthesis for stable planar tracking, 24th ACM Symposium on Virtual Reality Software and Technology, VRST 2018
Proceedings - VRST 2018
24th ACM Symposium on Virtual Reality Software and Technology, 10.1145/3281505.3283399, 2018.11, [URL], We propose a texture synthesis method to enhance the trackability of a target planar object by embedding natural features into the object during its design process. To transform an input object into an easy-to-track object at design time, we extend an inpainting method to naturally embed the features into the texture. First, a feature-less region in the input object is extracted based on feature-distribution-based segmentation. Then, the region is filled using an inpainting method with a feature-rich region retrieved from an object database. By using context-based region search, the inpainted region can be consistent with the object context while improving the feature distribution.
|9.||Hideaki Uchiyama, Yuji Oyamada, Transparent Random Dot Markers, 24th International Conference on Pattern Recognition, ICPR 2018
2018 24th International Conference on Pattern Recognition, ICPR 2018, 10.1109/ICPR.2018.8545845, 254-259, 2018.11, [URL], This paper presents random dot markers (RDM) printed on transparent sheets as transparent fiducial markers. They are extremely unobtrusive and useful for developing novel user interfaces. However, marker identification must be robust to the visible back sides of the transparent sheets. To realize such markers, we propose a graph-based framework for geometric-feature-based robust point matching for RDM. Instead of building one-to-one correspondences, we first build one-to-many correspondences using a 2D affinity matrix, and then globally optimize the matching assignment from the matrix. In particular, we incorporate pairwise relationships between neighboring points into the matrix using local geometric descriptors, and finally solve it with spectral matching. In the evaluation, we investigate the effectiveness of the global assignment from one-to-many correspondences, and show that our proposed method is sufficiently robust to identify overlapping markers.
|10.||Yoshiki Hashimoto, Daisaku Arita, Atsushi Shimada, Takashi Yoshinaga, Takashi Okayasu, Hideaki Uchiyama, Rin-Ichiro Taniguchi, Yield visualization based on farm work information measured by smart devices, Sensors (Switzerland), 10.3390/s18113906, 18, 11, 2018.11, [URL], This paper proposes a new approach to visualizing the spatial variation of plant status in a tomato greenhouse based on farm work information collected from laborers. Farm work information consists of a farm laborer’s position and action. A farm laborer’s position is estimated based on radio wave strength measured using a smartphone carried by the farm laborer and Bluetooth beacons placed in the greenhouse. A farm laborer’s action is recognized based on motion data measured using smartwatches worn on both wrists of the farm laborer. As an experiment, harvesting information from one farm laborer in a part of a tomato greenhouse was obtained, and the spatial distribution of yields in the experimental field, called a harvesting map, was visualized. The mean absolute error of the number of harvested tomatoes in each small section of the experimental field was 0.35. An interview with the farm manager showed that the harvesting map is useful for intuitively grasping the state of the greenhouse.|
|11.||Nicolas Olivier, Hideaki Uchiyama, Masashi Mishima, Diego Gabriel Francis Thomas, Rin-Ichiro Taniguchi, Rafael Roberto, Joao Paulo Lima, Veronica Teichrieb, Live structural modeling using RGB-D SLAM, 2018 IEEE International Conference on Robotics and Automation, ICRA 2018
2018 IEEE International Conference on Robotics and Automation, ICRA 2018, 10.1109/ICRA.2018.8460973, 6352-6358, 2018.09, [URL], This paper presents a method for localizing primitive shapes in a dense point cloud computed by an RGB-D SLAM system. To stably generate a shape map containing only primitive shapes, each primitive shape is incrementally modeled by fusing the shapes estimated in previous frames of the SLAM, so that an accurate shape can finally be generated. Specifically, the history of the fusing process is used to avoid the influence of error accumulation in the SLAM. The point cloud of the shape is then updated by fusing the points from all previous frames into a single point cloud. In the experimental results, we show that metric primitive modeling in texture-less and unprepared environments can be achieved online.
|12.||Atsutoshi Hanasaki, Hideaki Uchiyama, Atsushi Shimada, Rin-Ichiro Taniguchi, Deep Localization on Panoramic Images, 25th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2018
25th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2018 - Proceedings, 10.1109/VR.2018.8446048, 567-568, 2018.08, [URL], Sensor pose estimation is an essential technology for various applications. For instance, it can be used not only to display immersive content according to user movements in Virtual Reality (VR) but also to superimpose computer-generated objects onto camera images in Augmented Reality (AR). As a technical term, camera localization with respect to a pre-created map database is specifically referred to as image-based localization, memory-based localization, or camera relocalization.
|13.||Rafael Roberto, João Paulo Lima, Hideaki Uchiyama, Clemens Arth, Veronica Teichrieb, Rin-Ichiro Taniguchi, Dieter Schmalstieg, Incremental Structural Modeling Based on Geometric and Statistical Analyses, 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, 10.1109/WACV.2018.00110, 2018-January, 955-963, 2018.05, [URL], Finding high-level semantic information in a point cloud is a challenging task with various applications. For instance, it is useful for compactly representing the scene structure and efficiently understanding the scene context. The task is even more challenging when using a hand-held monocular visual SLAM system that outputs a noisy sparse point cloud. To tackle this issue, we propose an incremental primitive modeling method that uses both geometric and statistical analyses of such point clouds. The main idea is to select only reliably modeled shapes by analyzing the geometric relationship between the point cloud and the estimated shapes. In addition, a statistical evaluation is incorporated to filter wrongly detected primitives in a noisy point cloud. As a result, our approach largely improves precision compared with state-of-the-art methods. We also show the impact of segmenting and representing a scene using primitives instead of a point cloud.
|14.||Tsubasa Minematsu, Atsushi Shimada, Hideaki Uchiyama, Vincent Charvillat, Rin-Ichiro Taniguchi, Reconstruction-based change detection with image completion for a free-moving camera, Sensors (Switzerland), 10.3390/s18041232, 18, 4, 2018.04, [URL], Reconstruction-based change detection methods are robust to camera motion. These methods learn to reconstruct input images from background images, and foreground regions are detected based on the magnitude of the difference between an input image and its reconstruction. Because only background images are used for learning, foreground regions yield larger differences than background regions. Traditional reconstruction-based methods have two problems. One is over-reconstruction of foreground regions. The other is that the change detection decision depends only on the magnitudes of the differences; it is difficult to distinguish the magnitudes of differences in foreground regions when the foreground regions are completely reconstructed in patch images. We propose a framework for reconstruction-based change detection for a free-moving camera using patch images. To avoid over-reconstruction of foreground regions, our method reconstructs a masked central region in a patch image from the region surrounding it. Differences in foreground regions are enhanced because foreground regions in patch images are removed by the masking procedure. Change detection is learned automatically from a patch image and a reconstructed image, and the decision procedure directly uses patch images rather than the differences between them. Our method achieves better accuracy than traditional reconstruction-based methods that do not mask patch images.|
|15.||Hideaki Uchiyama, Shunsuke Sakurai, Yoshiki Hashimoto, Atsutoshi Hanasaki, Daisaku Arita, Takashi Okayasu, Atsushi Shimada, Rin-Ichiro Taniguchi, Sensing technologies for advanced smart agricultural systems, 11th International Conference on Sensing Technology, ICST 2017
2017 11th International Conference on Sensing Technology, ICST 2017, 10.1109/ICSensT.2017.8304451, 1-4, 2018.02, [URL], We introduce our sensing technologies for acquiring agricultural information, such as image-based plant phenotyping, harvest quantity data, and localization information using a camera in a greenhouse. Commercial systems exist that support agriculture, but many unresolved issues remain regarding the optimization of farming sustainability and productivity. Therefore, we intend to apply state-of-the-art information and communication technology (ICT) to tackle these agricultural issues and to investigate their limitations for developing advanced smart agricultural systems.
|16.||Hideaki Uchiyama, Shunsuke Sakurai, Masashi Mishima, Daisaku Arita, Takashi Okayasu, Atsushi Shimada, Rin-Ichiro Taniguchi, An Easy-to-Setup 3D Phenotyping Platform for KOMATSUNA Dataset, 16th IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, 10.1109/ICCVW.2017.239, 2038-2045, 2018.01, [URL], We present a 3D phenotyping platform that measures both plant growth and environmental information in small indoor environments for plant image datasets. Our objective is to construct a compact and complete platform from commercial devices so that any researcher can begin plant phenotyping in their laboratory. In addition, we introduce our annotation tool for manually but effectively creating leaf labels in plant images on a pixel-by-pixel basis. Finally, we present our RGB-D and multiview datasets containing images of the early growth stages of komatsuna with leaf annotations.
|17.||Tsubasa Minematsu, Atsushi Shimada, Hideaki Uchiyama, Rin-Ichiro Taniguchi, Analytics of deep neural network-based background subtraction, Journal of Imaging, 10.3390/jimaging4060078, 4, 6, 2018.01, [URL], Deep neural network-based (DNN-based) background subtraction has demonstrated excellent performance for moving object detection. DNN-based background subtraction automatically learns background features from training images and outperforms conventional background modeling based on handcrafted features. However, previous works fail to detail why DNNs work well for change detection; this discussion helps in understanding the potential of DNNs in background subtraction and in improving them. In this paper, we directly observe the feature maps in all layers of the DNN used in our investigation. The DNN provides feature maps with the same resolution as the input image. These feature maps help in analyzing DNN behaviors because the feature maps and the input image can be compared simultaneously. Furthermore, we analyzed the filters important for detection accuracy by removing specific filters from the trained DNN. From the experiments, we found that the DNN consists of subtraction operations in convolutional layers and thresholding operations in bias layers, and that scene-specific filters are generated to suppress false positives from dynamic backgrounds. In addition, we discuss the characteristics and issues of the DNN based on our observations.|
|18.||Shunsuke Sakurai, Hideaki Uchiyama, Atsushi Shimada, Daisaku Arita, Rin-Ichiro Taniguchi, Two-step transfer learning for semantic plant segmentation, 7th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2018
ICPRAM 2018 - Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods, 332-339, 2018.01, We discuss the applicability of a fully convolutional network (FCN), which provides promising performance in semantic segmentation tasks, to plant segmentation tasks. The challenge lies in training the network with a small dataset, because plant image datasets contain far fewer samples than object image datasets such as ImageNet and PASCAL VOC. The proposed method is inspired by transfer learning, but involves a two-step adaptation. In the first step, we apply transfer learning from a source domain that contains many objects with a large amount of labeled data to a major category in the plant domain. In the second step, category adaptation is performed from the major category to a minor category with few samples within the plant domain. On the leaf segmentation challenge (LSC) dataset, the experimental results confirm the effectiveness of the proposed method: the F-measure was, for instance, 0.953 for the A2 dataset, which was 0.355 higher than that of direct adaptation and 0.527 higher than that of non-adaptation.
|19.||Daisaku Arita, Yoshiki Hashimoto, Atsushi Shimada, Hideaki Uchiyama, Rin-Ichiro Taniguchi, Visualization of farm field information based on farm worker activity sensing, 6th International Conference on Distributed, Ambient and Pervasive Interactions, DAPI 2018 Held as Part of HCI International 2018
Distributed, Ambient and Pervasive Interactions
Understanding Humans - 6th International Conference, DAPI 2018, Held as Part of HCI International 2018, Proceedings, 10.1007/978-3-319-91125-0_16, 191-202, 2018.01, [URL], Our research goal is to construct a system that measures farm labor activities in a farm field and visualizes farm field information based on those activities. As a first step toward this goal, this paper proposes a method to measure the harvesting activity of farm laborers in a tomato greenhouse and to visualize the tomato yield distribution in the greenhouse, called a harvesting map, to support farm managers in making decisions. A harvesting map shows daily, weekly, and monthly tomato yields in the small sections into which the tomato greenhouse is divided.
|21.||Chao Ma, Ngo Thanh Trung, Hideaki Uchiyama, Hajime Nagahara, Atsushi Shimada, Rin-Ichiro Taniguchi, Adapting local features for face detection in thermal image, Sensors (Switzerland), 10.3390/s17122741, 17, 12, 2017.12, [URL], A thermal camera captures the temperature distribution of a scene as a thermal image. In thermal images, the facial appearances of different people under different lighting conditions are similar, because facial temperature distribution is generally constant and unaffected by lighting conditions. This similarity in facial appearance is advantageous for face detection. To detect faces in thermal images, cascade classifiers with Haar-like features are generally used. However, there are few studies exploring local features for face detection in thermal images. In this paper, we introduce two approaches relying on local features for face detection in thermal images. First, we create new feature types by extending Multi-Block LBP, considering a margin around the reference and the generally constant distribution of facial temperature. In this way, we make the features more robust to image noise and more effective for face detection in thermal images. Second, we propose an AdaBoost-based training method to obtain cascade classifiers with multiple types of local features; because these feature types have different advantages, combining them enhances the description power of the local features. We performed a hold-out validation experiment and a field experiment. In the hold-out validation experiment, we captured a dataset from 20 participants, comprising 14 males and 6 females. For each participant, we captured 420 images with 10 variations in camera distance, 21 poses, and 2 appearances (with/without glasses). We compared the performance of cascade classifiers trained with different sets of features; the results showed that the proposed approaches effectively improve the performance of face detection in thermal images. In the field experiment, we compared face detection performance in realistic scenes using thermal and RGB images, and discussed the results.|
|22.||Tsubasa Minematsu, Hideaki Uchiyama, Atsushi Shimada, Hajime Nagahara, Rin-Ichiro Taniguchi, Adaptive background model registration for moving cameras, Pattern Recognition Letters, 10.1016/j.patrec.2017.03.010, 96, 86-95, 2017.09, [URL], We propose a framework for adaptively registering background models with an image for background subtraction with moving cameras. Existing methods search for a background model using a fixed window size to suppress the number of false positives when detecting the foreground. However, these approaches result in many false negatives because they may use inappropriate window sizes; the appropriate size depends on various factors of the target scene. To suppress false detections, we propose adaptively controlling the method parameters, which are typically determined heuristically. More specifically, the search window size for background registration and the foreground detection threshold are automatically determined using the re-projection error computed from the homography-based camera motion estimate. Our method is based on the fact that the error at a pixel is low if the pixel belongs to the background and high if it does not. We quantitatively confirmed that the proposed framework improves background subtraction accuracy when applied to images from moving cameras in various public datasets.|
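The per-pixel decision described in the abstract above hinges on the re-projection error under the estimated homography. Below is a minimal NumPy sketch of that quantity; the function name and example values are illustrative, not the authors' code.

```python
import numpy as np

def reprojection_error(H, pts_prev, pts_curr):
    """Per-point re-projection error under a 3x3 homography H.

    Each point in pts_prev is mapped by H and compared with its observed
    position in pts_curr; background points should yield small errors and
    foreground points large ones, as the abstract describes.
    pts_prev and pts_curr are Nx2 arrays of pixel coordinates.
    """
    p = np.asarray(pts_prev, dtype=float)
    q = np.asarray(pts_curr, dtype=float)
    hom = np.hstack([p, np.ones((len(p), 1))])   # lift to homogeneous coords
    proj = hom @ np.asarray(H, dtype=float).T    # apply the homography
    proj = proj[:, :2] / proj[:, 2:3]            # back to pixel coordinates
    return np.linalg.norm(proj - q, axis=1)

# With the identity homography, a static point has zero error while a
# point that moved 5 pixels has an error of 5.
err = reprojection_error(np.eye(3), [[0, 0], [10, 10]], [[0, 0], [15, 10]])
```

Thresholding such errors (with the threshold itself adapted to the error statistics, as proposed) separates pixels consistent with the camera motion from likely foreground.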
|23.||Rafael Roberto, Hideaki Uchiyama, Joao Paulo Lima, Hajime Nagahara, Rin-Ichiro Taniguchi, Veronica Teichrieb, Incremental structural modeling on sparse visual SLAM, 15th IAPR International Conference on Machine Vision Applications, MVA 2017
Proceedings of the 15th IAPR International Conference on Machine Vision Applications, MVA 2017, 10.23919/MVA.2017.7986765, 30-33, 2017.07, [URL], This paper presents an incremental structural modeling approach that improves the precision and stability of existing batch-based methods on sparse and noisy point clouds from visual SLAM. The main idea is to effectively exploit the incremental generation of point clouds in SLAM. First, a batch-based method is applied to the point clouds that are incrementally generated by SLAM. Then, the temporal history of reconstructed geometric primitives is statistically merged to suppress incorrect reconstructions. The evaluation shows that both precision and stability are improved compared with a batch-based method, and that the proposed method is suitable for real-time structural modeling.
|24.||Takafumi Taketomi, Hideaki Uchiyama, Sei Ikeda, Visual SLAM algorithms: a survey from 2010 to 2016, IPSJ Transactions on Computer Vision and Applications, 2017.06.|
|25.||Rafael A. Roberto, Hideaki Uchiyama, João Paulo S. M. Lima, Hajime Nagahara, Rin-ichiro Taniguchi, Veronica Teichrieb, Incremental structural modeling on sparse visual SLAM, IPSJ Transactions on Computer Vision and Applications, 9, 5, 2017.03.|
|27.||Chao Ma, Ngo Thanh Trung, Hideaki Uchiyama, Hajime Nagahara, Atsushi Shimada, Rin-Ichiro Taniguchi, Mixed features for face detection in thermal image, 13th International Conference on Quality Control by Artificial Vision, QCAV 2017
Thirteenth International Conference on Quality Control by Artificial Vision 2017, 10.1117/12.2266836, 2017.01, [URL], An infrared (IR) camera captures the temperature distribution of an object as an IR image. Because facial temperature is almost constant, an IR camera has the potential to detect facial regions in IR images. However, simple temperature thresholding does not always work reliably for detecting faces. The standard face detection algorithm is AdaBoost with local features, such as Haar-like, MB-LBP, and HOG features, applied to visible images; however, there are few studies using these local features in IR image analysis. In this paper, we propose an AdaBoost-based training method that mixes these local features for face detection in thermal images. In an experiment, we captured a dataset from 20 participants, comprising 14 males and 6 females, with 10 variations in camera distance, 21 poses, and participants with and without glasses. Using leave-one-out cross-validation, we show that the proposed mixed features have an advantage over each of the regular local features.
|28.||Tsubasa Minematsu, Atsushi Shimada, Hideaki Uchiyama, Rin-Ichiro Taniguchi, Simple Combination of Appearance and Depth for Foreground Segmentation, 19th International Conference on Image Analysis and Processing, ICIAP 2017
New Trends in Image Analysis and Processing – ICIAP 2017 - ICIAP International Workshops, WBICV, SSPandBE, 3AS, RGBD, NIVAR, IWBAAS, and MADiMa 2017, Revised Selected Papers, 10.1007/978-3-319-70742-6_25, 266-277, 2017.01, [URL], In foreground segmentation, the depth information is robust to problems of the appearance information such as illumination changes and color camouflage; however, the depth information is not always measured and suffers from depth camouflage. In order to compensate for the disadvantages of the two pieces of information, we define an energy function based on the two likelihoods of depth and appearance backgrounds and minimize the energy using graph cuts to obtain a foreground mask. The two likelihoods are obtained using background subtraction. We use the farthest depth as the depth background in the background subtraction according to the depth information. The appearance background is defined as the appearance with a large likelihood of the depth background to eliminate appearances of foreground objects. In the computation of the likelihood of the appearance background, we also use the likelihood of the depth background for reducing false positives owing to illumination changes. In our experiment, we confirm that our method is sufficiently accurate for indoor environments using the SBM-RGBD 2017 dataset..
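As a flavor of how the two background likelihoods can be combined, the toy sketch below fuses per-pixel depth and appearance foreground likelihoods into unary energies and labels each pixel independently. The pairwise smoothness term that the paper minimizes with graph cuts is deliberately omitted, and the weighting scheme is a hypothetical stand-in.

```python
import numpy as np

def fuse_unaries(p_fg_depth, p_fg_appearance, w_depth=0.6):
    # Combine per-pixel foreground likelihoods from the depth and
    # appearance background models into unary energies (negative
    # log-likelihoods) and pick the lower-energy label per pixel.
    # NOTE: omits the pairwise smoothness term minimized with graph
    # cuts in the paper; w_depth is a hypothetical weight.
    p = w_depth * p_fg_depth + (1.0 - w_depth) * p_fg_appearance
    e_fg = -np.log(np.clip(p, 1e-9, 1.0))
    e_bg = -np.log(np.clip(1.0 - p, 1e-9, 1.0))
    return e_fg < e_bg                     # True = foreground
```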
|30.||Eric Marchand, Hideaki Uchiyama, Fabien Spindler, Pose Estimation for Augmented Reality: A Hands-On Survey, IEEE Transactions on Visualization and Computer Graphics, 10.1109/TVCG.2015.2513408, 22, 12, 2633-2651, 2016.12, [URL], Augmented reality (AR) makes it possible to seamlessly insert virtual objects into an image sequence. In order to accomplish this goal, it is important that synthetic elements are rendered and aligned in the scene in an accurate and visually acceptable way. This problem can be cast as a pose estimation or, equivalently, a camera localization process. This paper presents a brief but almost self-contained introduction to the most important approaches to vision-based camera localization, along with a survey of several extensions proposed in recent years. For most of the presented approaches, we also provide links to the code of short examples. This should allow readers to easily bridge the gap between theoretical aspects and practical implementations..
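Among the building blocks such a survey covers, homography estimation from point correspondences is one of the simplest. A bare direct-linear-transform (DLT) sketch, without the point normalization and RANSAC that practical pipelines add, might look like:

```python
import numpy as np

def dlt_homography(src, dst):
    # Direct Linear Transform: each correspondence (x, y) -> (u, v)
    # contributes two rows to A; h is the right singular vector of A
    # with the smallest singular value (null-space solution).
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                     # fix the arbitrary scale
```

In practice one would precondition the coordinates (Hartley normalization) and wrap the estimate in RANSAC to reject mismatched keypoints.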
|31.||Liming Yang, Hideaki Uchiyama, Jean Marie Normand, Guillaume Moreau, Hajime Nagahara, Rin-Ichiro Taniguchi, Real-time surface of revolution reconstruction on dense SLAM, 4th International Conference on 3D Vision, 3DV 2016
Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016, 10.1109/3DV.2016.13, 28-36, 2016.12, [URL], We present a fast and accurate method for reconstructing surfaces of revolution (SoR) on 3D data and its application to structural modeling of a cluttered scene in real-time. To estimate a SoR axis, we derive an approximately linear cost function for fast convergence. Also, we design a framework for reconstructing SoR on dense SLAM. In the experiment results, we show our method is accurate, robust to noise and runs in real-time..
|33.||João Paulo Silva do Monte Lima, Francisco Paulo Magalhães Simões, Hideaki Uchiyama, Veronica Teichrieb, Eric Marchand, Depth-assisted rectification for real-time object detection and pose estimation, Machine Vision and Applications, 10.1007/s00138-015-0740-8, 27, 2, 193-219, 2016.02, [URL], RGB-D sensors have in recent years become easily accessible to general users. They provide both a color image and a depth image of the scene and, besides being used for object modeling, they can also offer important cues for object detection and tracking in real time. In this context, the work presented in this paper investigates the use of consumer RGB-D sensors for object detection and pose estimation from natural features. Two methods based on depth-assisted rectification are proposed, which transform features extracted from the color image to a canonical view using depth data in order to obtain a representation invariant to rotation, scale and perspective distortions. While one method is suitable for textured objects, either planar or non-planar, the other method focuses on texture-less planar objects. Qualitative and quantitative evaluations of the proposed methods are performed, showing that they can obtain better results than some existing methods for object detection and pose estimation, especially when dealing with oblique poses..|
|34.||Mohamed A. Abdelwahab, Moataz M. Abdelwahab, Hideaki Uchiyama, Atsushi Shimada, Rin-Ichiro Taniguchi, Video object segmentation based on superpixel trajectories, 13th International Conference on Image Analysis and Recognition, ICIAR 2016
Image Analysis and Recognition - 13th International Conference, ICIAR 2016, Proceedings, 10.1007/978-3-319-41501-7_22, 9730, 191-197, 2016.01, [URL], In this paper, a video object segmentation method utilizing the motion of superpixel centroids is proposed. Our method achieves the same advantages as methods based on clustering point trajectories; furthermore, obtaining dense clustering labels from sparse ones becomes very easy: for each superpixel, the label of its centroid is simply propagated to all of its pixels. In addition to the motion of superpixel centroids, the histogram of oriented optical flow (HOOF) extracted from each superpixel is used as a second feature. After segmenting each object, we distinguish between foreground objects and the background utilizing the obtained clustering results..
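The propagation of a centroid's cluster label to every pixel of its superpixel amounts to a single indexing operation; a tiny sketch with made-up ids and labels:

```python
import numpy as np

# Per-pixel superpixel ids (from any over-segmentation) and one cluster
# label per superpixel, e.g. from clustering the centroid trajectories.
# Values here are hypothetical.
superpixel_map = np.array([[0, 0, 1],
                           [2, 2, 1]])
centroid_labels = np.array([0, 1, 0])

# Dense labels: index the per-superpixel label array by the id map.
dense_labels = centroid_labels[superpixel_map]
```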
|35.||Ryo Kawahata, Atsushi Shimada, Takayoshi Yamashita, Hideaki Uchiyama, Rin-Ichiro Taniguchi, Design of a low-false-positive gesture for a wearable device, 5th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2016
ICPRAM 2016 - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods, 581-588, 2016, As smartwatches are becoming more widely used in society, gesture recognition, as an important aspect of interaction with smartwatches, is attracting attention. An accelerometer that is incorporated in a device is often used to recognize gestures. However, a gesture is often detected falsely when a similar pattern of action occurs in daily life. In this paper, we present a novel method of designing a new gesture that reduces false detection. We refer to such a gesture as a low-false-positive (LFP) gesture. The proposed method enables a gesture design system to suggest LFP motion gestures automatically. The user of the system can design LFP gestures more easily and quickly than what has been possible in previous work. Our method combines primitive gestures to create an LFP gesture. The combination of primitive gestures is recognized quickly and accurately by a random forest algorithm using our method. We experimentally demonstrate the good recognition performance of our method for a designed gesture with a high recognition rate and without false detection..
|36.||Tsubasa Minematsu, Hideaki Uchiyama, Atsushi Shimada, Hajime Nagahara, Rin-Ichiro Taniguchi, Adaptive search of background models for object detection in images taken by moving cameras, IEEE International Conference on Image Processing, ICIP 2015
2015 IEEE International Conference on Image Processing, ICIP 2015 - Proceedings, 10.1109/ICIP.2015.7351278, 2015-December, 2626-2630, 2015.12, [URL], We propose a strategy of background subtraction for an image sequence captured by a moving camera. To adapt for camera motion, it is necessary to estimate the relation between consecutive frames in background subtraction. However, simple background subtraction using the relation between consecutive frames results in many false detections. We use re-projection error to handle this problem. The re-projection error has a low value in a background region. According to re-projection error, our method searches neighboring background models and tunes a threshold value for detection in order to reduce false detections. We evaluated the accuracy of detection of our method in experiments. Our method provided better detection than a method that does not search neighboring background models. Our method thus reduced the number of false detections..
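The idea of searching neighboring background models to absorb small registration errors can be sketched as follows. The window radius, threshold, and single-channel setup are illustrative assumptions; the paper additionally compensates camera motion between frames and tunes the threshold using the re-projection error.

```python
import numpy as np

def subtract_with_neighbor_search(frame, bg, radius=1, thresh=0.1):
    # For each pixel, compare against background values within a
    # (2*radius+1)^2 neighborhood and keep the smallest difference,
    # which tolerates small misalignment left over from camera-motion
    # compensation. radius and thresh are illustrative values.
    best = np.full(frame.shape, np.inf)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(bg, dy, axis=0), dx, axis=1)
            best = np.minimum(best, np.abs(frame - shifted))
    return best > thresh                   # True = foreground
```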
|37.||Yosuke Nakagawa, Hideaki Uchiyama, Hajime Nagahara, Rin-Ichiro Taniguchi, Estimating Surface Normals with Depth Image Gradients for Fast and Accurate Registration, 2015 International Conference on 3D Vision, 3DV 2015
Proceedings - 2015 International Conference on 3D Vision, 3DV 2015, 10.1109/3DV.2015.80, 640-647, 2015.11, [URL], We present a fast registration framework with estimating surface normals from depth images. The key component in the framework is to utilize adjacent pixels and compute the normal at each pixel on a depth image by following three steps. First, image gradients on a depth image are computed with a 2D differential filtering. Next, two 3D gradient vectors are computed from horizontal and vertical depth image gradients. Finally, the normal vector is obtained from the cross product of the 3D gradient vectors. Since the horizontal and vertical adjacent pixels at each pixel are considered to compose a local 3D plane, the 3D gradient vectors are equivalent to tangent vectors of the plane. Compared with existing normal estimation based on fitting a plane to a point cloud, our depth image gradient based normal estimation is much faster because it needs only a few mathematical operations. We apply it to normal space sampling based 3D registration and validate the effectiveness of our registration framework by evaluating its accuracy and computational cost with a public dataset..
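The three steps above amount to only a few array operations. A sketch with assumed pinhole intrinsics (fx, fy, cx, cy) and central differences standing in for the paper's specific 2D differential filter:

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    # 1) back-project every pixel to a 3D point (pinhole model assumed)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.dstack([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth])
    # 2) horizontal and vertical 3D gradient (tangent) vectors via
    #    central differences along the image axes
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    # 3) the normal is the cross product of the two tangent vectors
    n = np.cross(du, dv)
    norm = np.linalg.norm(n, axis=2, keepdims=True)
    return n / np.where(norm > 0, norm, 1.0)
```

A production version would also mask out invalid (zero) depth pixels before differencing.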
|38.||Hideaki Uchiyama, Takafumi Taketomi, Sei Ikeda, Joao Paulo Silva Do Monte Lima, [POSTER] Abecedary tracking and mapping: A toolkit for tracking competitions, 14th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2015
Proceedings of the 2015 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2015, 10.1109/ISMAR.2015.63, 198-199, 2015.11, [URL], This paper introduces a toolkit with camera calibration, monocular visual Simultaneous Localization and Mapping (vSLAM) and registration with a calibration marker. With the toolkit, users can perform the whole procedure of the ISMAR on-site tracking competition in 2015. Since the source code is designed to be well-structured and highly-readable, users can easily install and modify the toolkit. By providing the toolkit, we encourage beginners to learn tracking techniques and to participate in the competition..
|39.||Tsubasa Minematsu, Hideaki Uchiyama, Atsushi Shimada, Hajime Nagahara, Rin-Ichiro Taniguchi, Evaluation of foreground detection methodology for a moving camera, 2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision, FCV 2015
2015 Frontiers of Computer Vision, FCV 2015, 10.1109/FCV.2015.7103752, 2015.01, [URL], Detection of moving objects is one of the key steps for vision based applications. Many previous works leverage background subtraction using background models and assume that image sequences are captured from a stationary camera. These methods cannot be directly applied to image sequences from a moving camera because both foreground and background objects move with respect to the camera. One approach to tackle this problem is to estimate the background movement by computing pixel correspondences between frames, such as a homography. With this approach, moving objects can be detected by using existing background subtraction. In this paper, we evaluate detection of foreground objects for image sequences from a moving camera. In particular, we focus on homography as the camera motion model. In our evaluation, we vary the following parameters: the choice of feature points, their number, and the homography estimation method. We analyze their effect on the detection of moving objects with regard to detection accuracy and processing time. Through experiments, we show the requirements of background models for image sequences from a moving camera..
|40.||Hao Liu, Atsushi Shimada, Xing Xu, Hajime Nagahara, Hideaki Uchiyama, Rin-Ichiro Taniguchi, Query expansion with pairwise learning in object retrieval challenge, 2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision, FCV 2015
2015 Frontiers of Computer Vision, FCV 2015, 10.1109/FCV.2015.7103703, 2015.01, [URL], Making a reasonable ranking of images in a dataset is one of the main objectives of an object retrieval challenge, and in this paper we intend to improve the ranking quality. We follow the idea of query expansion from previous research. Based on the bag-of-visual-words model, tf-idf scoring and spatial verification, a previous method applied pointwise learning in the query expansion stage, using but not fully exploring the verification results. We extend this learning approach for better discriminative power in retrieval. In the re-ranking stage, we propose a method using pairwise learning instead of the pointwise learning previously used, and obtain a more reliable ranking on a shortlist of examples: if the verification itself is reliable, a good re-ranking should best preserve this sub-ranking order. Thus, in our proposed method, we leverage pairwise learning to incorporate the ranking's sequential information more efficiently. We evaluate and compare our proposed method with previous methods on the Oxford 5k dataset, a standard benchmark, where our method achieves better mean average precision and shows better discriminative power..
|41.||Hideaki Uchiyama, Shinichiro Haruyama, Atsushi Shimada, Hajime Nagahara, Rin-Ichiro Taniguchi, Spatially-multiplexed MIMO markers, 2015 10th IEEE Symposium on 3D User Interfaces, 3DUI 2015
2015 IEEE Symposium on 3D User Interfaces, 3DUI 2015 - Proceedings, 10.1109/3DUI.2015.7131765, 191-192, 2015.01, [URL], We present spatially-multiplexed fiducial markers based on the framework of code division multiple access (CDMA), a technique from the field of communications. Since CDMA-based multiplexing is robust to signal noise and interference, multiplexed markers can be demultiplexed under various kinds of image noise and transformation. With this framework, we explore the paradigm of multiple-input and multiple-output (MIMO) for fiducial markers, so that the data capacity of markers can be improved and different users can receive different data from a multiplexed marker..
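The CDMA principle behind such markers can be illustrated with orthogonal Walsh codes: two bit streams are spread, summed into one signal, and each is recovered by correlating with its own code even after additive noise. The code length, noise level, and function names are illustrative assumptions, not the marker encoding used in the paper.

```python
import numpy as np

# Length-4 Walsh codes: mutually orthogonal spreading sequences.
W = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]], dtype=float)

def spread(bits, code):
    # map bits {0,1} -> chips {-1,+1} and multiply by the spreading code
    return np.outer(2 * np.asarray(bits) - 1, code).ravel()

def despread(signal, code):
    # correlate each chip group with the code; orthogonality cancels the
    # other stream's contribution, and averaging suppresses the noise
    corr = signal.reshape(-1, len(code)) @ code / len(code)
    return (corr > 0).astype(int)

rng = np.random.default_rng(0)
tx = spread([1, 0, 1], W[1]) + spread([0, 1, 1], W[2])  # multiplexed signal
rx = tx + rng.normal(0.0, 0.2, tx.shape)                # additive noise
```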
|42.||João Paulo Lima, Francisco Simões, Hideaki Uchiyama, Veronica Teichrieb, Eric Marchand, Depth-Assisted Rectification of Patches: Using RGB-D consumer devices to improve real-time keypoint matching, 8th International Conference on Computer Vision Theory and Applications, VISAPP 2013
VISAPP 2013 - Proceedings of the International Conference on Computer Vision Theory and Applications, 1, 651-656, 2013.05, This paper presents a method named Depth-Assisted Rectification of Patches (DARP), which exploits depth information available in RGB-D consumer devices to improve keypoint matching of perspectively distorted images. This is achieved by generating a projective rectification of a patch around the keypoint, which is normalized with respect to perspective distortions and scale. The DARP method runs in real-time and can be used with any local feature detector and descriptor. Evaluations with planar and non-planar scenes show that DARP can obtain better results than existing keypoint matching approaches in oblique poses..
|43.||Hideaki Uchiyama, Recent trends on visual tracking for augmented reality, 19th International Display Workshops in Conjunction with Asia Display 2012, IDW/AD 2012
Society for Information Display - 19th International Display Workshops 2012, IDW/AD 2012, 1742-1745, 2012.12, Computer vision technologies play an important role for estimating and tracking a camera pose in augmented reality applications. This paper reports state-of-the-art visual tracking technologies and classifies them into three categories: fiducial marker based augmented reality, object template based augmented reality and wide area augmented reality..
|44.||João Paulo Lima, Hideaki Uchiyama, Veronica Teichrieb, Eric Marchand, Texture-less planar object detection and pose estimation using Depth-Assisted Rectification of Contours, 11th IEEE and ACM International Symposium on Mixed and Augmented Reality, ISMAR 2012
ISMAR 2012 - 11th IEEE International Symposium on Mixed and Augmented Reality 2012, Science and Technology Papers, 10.1109/ISMAR.2012.6402582, 297-298, 2012.12, [URL], This paper presents a method named Depth-Assisted Rectification of Contours (DARC) for detection and pose estimation of texture-less planar objects using RGB-D cameras. It consists of matching contours extracted from the current image to previously acquired template contours. In order to achieve invariance to rotation, scale and perspective distortions, a rectified representation of the contours is obtained using the available depth information. DARC requires only a single RGB-D image of the planar objects in order to estimate their pose, as opposed to some existing approaches that need to capture a number of views of the target object. It also does not require generating warped versions of the templates, which is commonly needed by existing object detection techniques. It is shown that the DARC method runs in real-time and its detection and pose estimation quality are suitable for augmented reality applications..
|45.||Sandy Martedi, Hideaki Uchiyama, Guillermo Enriquez, Hideo Saito, Tsutomu Miyashita, Takenori Hara, Foldable augmented maps, IEICE Transactions on Information and Systems, 10.1587/transinf.E95.D.256, E-95-D, 1, 256-266, 2012.01, [URL], This paper presents a folded surface detection and tracking method for augmented maps. First, we model a folded surface as two connected planes. Therefore, in order to detect a folded surface, the plane detection method is iteratively applied to the 2D correspondences between an input image and a reference plane. In order to compute the exact folding line from the detected planes for visualization purposes, the intersection line of the planes is computed from their positional relationship. After the detection is done, each plane is individually tracked by the frame-by-frame descriptor update method. We overlay virtual geographic data on each detected plane. As a scenario of use, some interactions on the folded surface are introduced. Experimental results show the accuracy and performance of folded surface detection for evaluating the effectiveness of our approach..|
|46.||Hideaki Uchiyama, Eric Marchand, Deformable random dot markers, 2011 10th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2011
2011 10th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2011, 10.1109/ISMAR.2011.6092394, 237-238, 2011.12, [URL], We extend planar fiducial markers using random dots to non-rigidly deformable markers. Because the recognition and tracking of random dot markers are based on keypoint matching, we can estimate the deformation of the markers with non-rigid surface detection from keypoint correspondences. First, the initial pose of the markers is computed from a homography with RANSAC as a planar detection. Second, deformations are estimated by minimizing a cost function for deformable surface fitting. We show augmentation results of 2D surface deformation recovery with several markers..
|47.||Tomoki Hayashi, Hideaki Uchiyama, Julien Pilet, Hideo Saito, Tabletop augmented reality system with omnidirectional camera using specific object recognition, Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, 10.3169/itej.65.976, 65, 7, 976-982, 2011.12, [URL], We present an augmented reality (AR) system that is based on a tabletop system and has a hemispherical omnidirectional camera. We set the camera perpendicularly on a tabletop display and show an omnidirectional image on the display screen. When users present objects to the camera, these objects are captured and recognized by using a specific object recognition technique. On the basis of the recognition result, AR contents are overlaid onto the omnidirectional image on the screen. Our system has the following two advantages. First, users can interact with each other through the relationship among the users, objects, and their AR contents, because the entire surroundings of the tabletop are shown as a circular image on the screen. Second, multiple users can interact without the need for any specific devices. To describe the applicability of our system, we present various examples of applications exploiting these advantages..|
|48.||Hideaki Uchiyama, Eric Marchand, Toward augmenting everything: Detecting and tracking geometrical features on planar objects, 2011 10th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2011
2011 10th IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2011, 10.1109/ISMAR.2011.6092366, 17-25, 2011.12, [URL], This paper presents an approach for detecting and tracking various types of planar objects with geometrical features. We combine traditional keypoint detectors with Locally Likely Arrangement Hashing (LLAH) for geometrical feature based keypoint matching. Because the stability of keypoint extraction affects the accuracy of the keypoint matching, we set the criteria of keypoint selection on keypoint response and the distance between keypoints. In order to be robust to scale changes, we build a non-uniform image pyramid according to the keypoint distribution at each scale. In the experiments, we evaluate the applicability of traditional keypoint detectors with LLAH for the detection. We also compare our approach with SURF and finally demonstrate that it is possible to detect and track different types of textures including colorful pictures, binary fiducial markers and handwritings..
|49.||Hideaki Uchiyama, Hideo Saito, Myriam Servières, Guillaume Moreau, Camera tracking by online learning of keypoint arrangements using LLAH in augmented reality applications, Virtual Reality, 10.1007/s10055-010-0173-7, 15, 2-3, 109-117, 2011.06, [URL], We propose a camera-tracking method based on online learning of keypoint arrangements for augmented reality applications. As target objects, we deal with intersection maps from GIS and text documents, which are not handled well by the popular SIFT and SURF descriptors. For keypoint matching by keypoint arrangement, we use locally likely arrangement hashing (LLAH), in which the descriptors of an arrangement observed from one viewpoint are not invariant over a wide range of viewpoints, because the arrangement changes with the viewpoint. To solve this problem, we propose online learning of descriptors using new configurations of keypoints at new viewpoints. The proposed method allows keypoint matching to proceed under new viewpoints. We evaluate the performance and robustness of our tracking method under view changes..|
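LLAH-style matching builds hash keys from affine-invariant measures of each keypoint's neighbor arrangement. The simplified sketch below uses the ratio of two triangle areas over 4-point subsets of the neighbors, with a crude quantization; it illustrates the idea only and is not the exact LLAH formulation.

```python
import numpy as np
from itertools import combinations

def tri_area(p, q, r):
    # magnitude of the signed area of triangle pqr
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (q[1] - p[1]) * (r[0] - p[0])) / 2.0

def arrangement_descriptor(neighbors, q_levels=8):
    # Simplified LLAH-style descriptor: for every 4-point subset of a
    # keypoint's nearest neighbors, the ratio of two triangle areas is
    # invariant to affine transformations; quantize the ratios into a
    # hashable tuple. (Crude quantization, illustrative only.)
    vals = []
    for s in combinations(neighbors, 4):
        a, b, c, d = (np.asarray(p, dtype=float) for p in s)
        vals.append(tri_area(a, b, c) / max(tri_area(a, b, d), 1e-9))
    return tuple(min(int(v), q_levels - 1) for v in vals)
```

Because the ratios are affine invariants, the same tuple is obtained after an affine warp of the points, which is what makes the descriptors usable as hash keys across moderate viewpoint changes.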
|50.||Hideaki Uchiyama, Hideo Saito, Random dot markers, 18th IEEE Virtual Reality Conference, VR 2011
VR 2011 - IEEE Virtual Reality 2011, Proceedings, 10.1109/VR.2011.5759433, 35-38, 2011.05, [URL], This paper presents a novel approach for detecting and tracking markers with randomly scattered dots for augmented reality applications. Compared with traditional markers with square pattern, our random dot markers have several significant advantages for flexible marker design, robustness against occlusion and user interaction. The retrieval and tracking of these markers are based on geometric feature based keypoint matching and tracking. We experimentally demonstrate that the discriminative ability of forty random dots per marker is applicable for retrieving up to one thousand markers..
|51.||Hideaki Uchiyama, Hideo Saito, Random dot markers, 18th IEEE Virtual Reality Conference, VR 2011
VR 2011 - IEEE Virtual Reality 2011, Proceedings, 10.1109/VR.2011.5759503, 271-272, 2011.05, [URL], We introduce a novel type of markers with randomly scattered dots for augmented reality applications. Compared with traditional square markers, our markers have several significant advantages for flexible marker design, robustness against occlusion and user interaction. Our markers do not need to have a black frame, and their shape is not limited to square because the retrieval and tracking of the markers are based on geometric feature based keypoint matching. In our demonstration, we show real-time simultaneous retrieval and tracking of the markers on a laptop..
|52.||Yusuke Yamamoto, Hideaki Uchiyama, Yasuaki Kakehi, OnNote: A musical interface using markerless physical scores, ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference, SIGGRAPH'11
ACM SIGGRAPH 2011 Posters, SIGGRAPH'11, 10.1145/2037715.2037771, 2011, [URL].
|53.||Byung Kuk Seo, Hideaki Uchiyama, Jong Il Park, StAR: Visualizing constellations with star retrieval, SIGGRAPH Asia 2011 Posters, SA'11
SIGGRAPH Asia 2011 Posters, SA'11, 10.1145/2073304.2073364, 2011, [URL], Thanks to their ability to intuitively visualize virtual contents by superimposing them on real scenes, augmented reality (AR) technologies have been widely used in various fields such as entertainment, advertisement, education, tourism, and industrial/medical applications. In particular, AR applications in education have provided good solutions for attracting the interest of users and enhancing their understanding and learning capabilities. For example, AR-based books are very helpful for users to easily understand text-based knowledge by augmenting supplementary contents on the books..
|54.||Yusuke Yamamoto, Hideaki Uchiyama, Yasuaki Kakehi, onNote: Playing printed music scores as a musical instrument, 24th Annual ACM Symposium on User Interface Software and Technology, UIST'11
UIST'11 - Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 10.1145/2047196.2047249, 413-422, 2011, [URL], This paper presents a novel musical performance system named onNote that directly utilizes printed music scores as a musical instrument. This system can make users believe that sound is indeed embedded on the music notes in the scores. The users can play music simply by placing, moving and touching the scores under a desk lamp equipped with a camera and a small projector. By varying the movement, the users can control the playing sound and the tempo of the music. To develop this system, we propose an image processing based framework for retrieving music from a music database by capturing printed music scores. From a captured image, we identify the scores by matching them with the reference music scores, and compute the position and pose of the scores with respect to the camera. By using this framework, we can develop novel types of musical interactions..
|55.||Sandy Martedi, Hideaki Uchiyama, Hideo Saito, Clickable augmented documents, 2010 IEEE International Workshop on Multimedia Signal Processing, MMSP2010
2010 IEEE International Workshop on Multimedia Signal Processing, MMSP2010, 10.1109/MMSP.2010.5662012, 162-166, 2010.12, [URL], This paper presents an Augmented Reality (AR) system for physical text documents that enables users to click a document. In the system, we track the relative pose between the camera and the document to continuously overlay virtual contents on the document. In addition, we compute the trajectory of a fingertip based on skin color detection for the clicking interaction. By combining document tracking with an interaction technique, we have developed a novel tangible document system. As an application, we develop an AR dictionary system that overlays the meaning and explanation of words when the user clicks on the document. In the experiments, we present the accuracy of the clicking interaction and the robustness of our document tracking method against occlusion..
|56.||Sandy Martedi, Hideaki Uchiyama, Guillermo Enriquez, Hideo Saito, Tsutomu Miyashita, Takenori Hara, Foldable augmented maps, 9th IEEE International Symposium on Mixed and Augmented Reality 2010: Science and Technology, ISMAR 2010
9th IEEE International Symposium on Mixed and Augmented Reality 2010: Science and Technology, ISMAR 2010 - Proceedings, 10.1109/ISMAR.2010.5643552, 65-72, 2010.12, [URL], This paper presents folded surface detection and tracking for augmented maps. For the detection, plane detection is iteratively applied to 2D correspondences between an input image and a reference plane because the folded surface is composed of multiple planes. In order to compute the exact folding line from the detected planes, the intersection line of the planes is computed from their positional relationship. After the detection is done, each plane is individually tracked by frame-by-frame descriptor update. For a natural augmentation on the folded surface, we overlay virtual geographic data on each detected plane. The user can interact with the geographic data by finger pointing because the fingertip of the user is also detected during the tracking. As a scenario of use, some interactions on the folded surface are introduced. Experimental results show the accuracy and performance of folded surface detection for evaluating the effectiveness of our approach..
|57.||Tomoki Hayashi, Hideaki Uchiyama, Julien Pilet, Hideo Saito, An augmented reality setup with an omnidirectional camera based on multiple object detection, 2010 20th International Conference on Pattern Recognition, ICPR 2010
Proceedings - 2010 20th International Conference on Pattern Recognition, ICPR 2010, 10.1109/ICPR.2010.776, 3171-3174, 2010.11, [URL], We propose a novel augmented reality (AR) setup with an omnidirectional camera on a tabletop display. The table acts as a mirror on which real playing cards appear augmented with virtual elements. The omnidirectional camera captures and recognizes its surroundings using a feature-based image retrieval approach that achieves fast and scalable registration. This allows our system to superimpose virtual visual effects onto the omnidirectional camera image. In our AR card game, users sit around a tabletop display and show a card to the other players. The system recognizes it and augments it with virtual elements in the omnidirectional image acting as a mirror. While playing the game, the users can interact with each other directly and through the display. Our setup is a new, simple, and natural approach to augmented reality. It opens new doors to traditional card games..
|58.||Hideaki Uchiyama, Julien Pilet, Hideo Saito, On-line document registering and retrieving system for AR annotation overlay, 1st Augmented Human International Conference, AH'10
Proceedings of the 1st Augmented Human International Conference, AH '10, 10.1145/1785455.1785478, 2010.07, [URL], We propose a system that registers and retrieves text documents to annotate them on-line. The user registers a text document captured from a nearly top view and adds virtual annotations. When the user thereafter captures the document again, the system retrieves and displays the appropriate annotations, in real-time and at the correct location. Registering and deleting documents is done by user interaction. Our approach relies on LLAH, a hashing based method for document image retrieval. At the on-line registering stage, our system extracts keypoints from the input image and stores their descriptors computed from their neighbors. After registration, our system can quickly find the stored document corresponding to an input view by matching keypoints. From the matches, our system estimates the geometrical relationship between the camera and the document for accurately overlaying the annotations. In the experimental results, we show that our system can achieve on-line and real-time performances..
|59.||Hideaki Uchiyama, Hideo Saito, Myriam Servières, Guillaume Moreau, AR geovisualization framework based on on-line geographic data matching between a map with intersections and its intersection database on GIS, Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, 10.3169/itej.64.563, 64, 4, 563-569, 2010.04, [URL], We propose a novel geovisualization framework based on geographic data matching between maps with intersections and their corresponding intersection database on GIS. First, users select several regions to view. The GIS generates maps with intersections and registers these intersections and their contents in our system. The users then view the map content in augmented reality (AR). To identify several dozens of area maps, we propose using LLAH (Locally Likely Arrangement Hashing), a point retrieval method using local geometry of neighbor points, when retrieving the intersections. We have improved LLAH to the point where it can now retrieve intersections with a free-moving camera. Experimental results indicate that intersection retrieval-based map retrieval is possible within several dozens of maps. Additional evaluations demonstrate the robustness of our improved LLAH from several different views..|
|60.||Fumihisa Shibata, Sei Ikeda, Takeshi Kurata, Hideaki Uchiyama, An Intermediate Report of TrakMark WG - International voluntary activities on establishing benchmark test schemes for AR/MR geometric registration and tracking methods, 9th IEEE International Symposium on Mixed and Augmented Reality 2010: Science and Technology, ISMAR 2010
9th IEEE International Symposium on Mixed and Augmented Reality 2010: Science and Technology, ISMAR 2010 - Proceedings, 10.1109/ISMAR.2010.5643613, 298-302, 2010, [URL], In the AR/MR field, tracking and geometric registration are important and actively discussed topics. In particular, tracking research is flourishing, and many algorithms are proposed every year. With this trend in mind, we, the TrakMark WG, proposed benchmark test schemes for geometric registration and tracking in AR/MR at ISMAR 2009. This paper is an intermediate report of the TrakMark WG, which describes its activities and the first proposal of benchmark image sequences..
|61.||Sandy Martedi, Hideaki Uchiyama, Guillermo Enriquez, Hideo Saito, Tsutomu Miyashita, Takenori Hara, Foldable augmented maps, 9th IEEE International Symposium on Mixed and Augmented Reality 2010: Science and Technology, ISMAR 2010
9th IEEE International Symposium on Mixed and Augmented Reality 2010: Science and Technology, ISMAR 2010 - Proceedings, 10.1109/ISMAR.2010.5643621, 312, 2010, [URL], This demonstration presents folded surface detection and tracking for augmented maps. We model the folded surface as multiple planes. To detect a folded surface, plane detection is iteratively applied to 2D correspondences between an input image and a reference plane. The exact folding line is then computed as the intersection line of the detected planes, derived from their positional relationship. After detection, each plane is tracked individually with a frame-by-frame descriptor update. For a natural augmentation on the folded surface, we overlay virtual geographic data on each detected plane..
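The folding-line computation described above reduces to intersecting two planes. A minimal sketch of that geometric step, assuming planes given as n·x = d (this is generic geometry, not the paper's actual detection pipeline):

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Intersection line of planes n1.x = d1 and n2.x = d2.
    Returns (point_on_line, unit_direction)."""
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    direction = np.cross(n1, n2)
    norm = np.linalg.norm(direction)
    if norm < 1e-9:
        raise ValueError("planes are parallel")
    # Solve for the point on the line closest to the origin: it satisfies
    # both plane equations and is orthogonal to the line direction.
    A = np.vstack([n1, n2, direction])
    b = np.array([d1, d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / norm

# Example: floor z = 0 meets wall x = 1, giving the line x = 1, z = 0
# along the y-axis: p = [1, 0, 0], d = [0, 1, 0]
p, d = plane_intersection([0, 0, 1], 0.0, [1, 0, 0], 1.0)
```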
|62.||Hideaki Uchiyama, Hideo Saito, Myriam Servières, Guillaume Moreau, AR city representation system based on map recognition using topological information, 3rd International Conference on Virtual and Mixed Reality, VMR 2009. Held as Part of HCI International 2009
Virtual and Mixed Reality - Third International Conference, VMR 2009, Held as Part of HCI International 2009, Proceedings, 10.1007/978-3-642-02771-0_15, 128-135, 2009.12, [URL], This paper presents a system for overlaying 3D GIS data, such as 3D buildings, onto a 2D physical urban map. We propose a map recognition framework that analyzes the distribution of local intersections in order to recognize the area of the physical map within a whole map. The retrieval of the geographical area described by the physical map is based on a hashing scheme called LLAH. In the results, we show several applications that augment additional information on the map..
|63.||Hideaki Uchiyama, Hideo Saito, Augmenting text document by on-line learning of local arrangement of keypoints, 8th IEEE 2009 International Symposium on Mixed and Augmented Reality, ISMAR 2009 - Science and Technology
Science and Technology Proceedings - IEEE 2009 International Symposium on Mixed and Augmented Reality, ISMAR 2009, 10.1109/ISMAR.2009.5336491, 95-98, 2009.12, [URL], We propose a technique for text document tracking over a large range of viewpoints. Since the popular SIFT or SURF descriptors typically fail on such documents, our method instead considers the local arrangement of keypoints. We extend Locally Likely Arrangement Hashing (LLAH), which is limited to fronto-parallel images: we handle a large range of viewpoints by learning the behavior of keypoint patterns as the camera viewpoint changes. Our method starts tracking a document from a nearly frontal view. As the document then undergoes motion, new configurations of keypoints appear. The database is incrementally updated to reflect these new observations, allowing the system to detect the document under the new viewpoint. We demonstrate the performance and robustness of our method by comparing it with the original LLAH..
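The keypoint-arrangement idea behind LLAH can be illustrated with a toy hash built from ratios of neighbor triangle areas, which are affine-invariant quantities. This is a simplified sketch for illustration only, not the published LLAH descriptor:

```python
import numpy as np

def llah_like_hash(center, neighbors, n_bins=8):
    """Toy arrangement hash for a keypoint (a simplified LLAH-like sketch,
    not the published algorithm): sort the neighbors by angle, quantize
    ratios of adjacent triangle areas, and pack the bin indices into one
    integer key. Area ratios are preserved by affine maps."""
    c = np.asarray(center, float)
    pts = np.asarray(neighbors, float) - c
    # Sort neighbors by angle around the center point
    pts = pts[np.argsort(np.arctan2(pts[:, 1], pts[:, 0]))]
    n = len(pts)
    # Area of triangle (center, p_i, p_{i+1}) via the 2D cross product
    areas = [abs(pts[i][0] * pts[(i + 1) % n][1]
                 - pts[i][1] * pts[(i + 1) % n][0]) / 2.0
             for i in range(n)]
    key = 0
    for i in range(n):
        ratio = areas[i] / (areas[i] + areas[(i + 1) % n] + 1e-12)
        key = key * n_bins + int(ratio * n_bins)  # quantize the ratio
    return key
```

Because the key depends only on area ratios, it is unchanged, for example, under a uniform rescaling of the whole point set; the real LLAH additionally handles viewpoint-induced reordering of the neighbors, which this sketch does not.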
|64.||Naoyuki Yazawa, Hideaki Uchiyama, Hideo Saito, Myriam Servières, Guillaume Moreau, Image based view localization system retrieving from a panorama database by SURF, 11th IAPR Conference on Machine Vision Applications, MVA 2009
Proceedings of the 11th IAPR Conference on Machine Vision Applications, MVA 2009, 118-121, 2009.12, We propose a system for estimating a user's location and view direction from a captured image by retrieving the corresponding region from panoramas in a database. Our database stores 104 panoramas captured within a local area. To retrieve the user's location, the query image captured by the user's planar-projection camera is compared with all the panoramas in the database. SURF is used to find corresponding points between the query and each panorama. The panorama with the maximum number of corresponding points is selected as the location. In addition, the user's view direction is estimated by reprojecting the center of the query onto the selected panorama. As a result, image-based view localization with panoramas can be achieved..
|65.||Hideaki Uchiyama, Hideo Saito, Rotated image based photomosaic using combination of principal component hashing, 3rd Pacific Rim Symposium on Image and Video Technology, PSIVT 2009
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10.1007/978-3-540-92957-4_58, 5414 LNCS, 668-679, 2009.02, [URL], This paper introduces a new photomosaic method that uses tiled images rotated within a restricted range. The tiled images are selected from a database by a hashing method based on principal component analysis of the database. After computing the principal components of the database, hash tables based on linear combinations of the principal components are prepared beforehand. Using our hashing method, we can reduce the computation time for selecting the tiled images through approximate nearest-neighbor search that takes the distribution of the data in the database into account. Experimental results on a large number of high-dimensional data demonstrate the effectiveness of our hashing method and the improved appearance of our tilings..
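The principal-component hashing idea above can be sketched as follows: project database vectors onto the leading principal components, quantize the projections into bins, and use the bin indices as a hash key that yields approximate nearest-neighbor candidates. This is a single-table sketch; the paper combines several such tables and handles rotated tiles:

```python
import numpy as np

def build_pca_hash(db, n_components=4, n_bins=4):
    """Hash high-dimensional vectors by quantizing their projections onto
    the database's principal components (a single-table sketch of
    PCA-based hashing). Returns (hash table, key function)."""
    mean = db.mean(axis=0)
    # Principal components from the SVD of the centered data
    _, _, vt = np.linalg.svd(db - mean, full_matrices=False)
    comps = vt[:n_components]
    # Bin edges at percentiles, so each bin holds roughly equal mass
    proj = (db - mean) @ comps.T
    edges = np.percentile(proj, np.linspace(0, 100, n_bins + 1)[1:-1], axis=0)

    def key(x):
        p = (np.asarray(x, float) - mean) @ comps.T
        return tuple(np.searchsorted(edges[:, i], p[i])
                     for i in range(n_components))

    table = {}
    for idx, x in enumerate(db):
        table.setdefault(key(x), []).append(idx)
    return table, key
```

A query then only scans the bucket `table.get(key(q), [])` for an exact match, instead of the whole database.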
|66.||Hideaki Uchiyama, Masaki Yoshino, Sinichiro Haruyama, Hideo Saito, Masao Nakagawa, Takao Kakehashi, Naoki Nagamoto, A Photogrammetric System Based on Visible Light Communications Using Light Markers, Journal of the Institute of Image Electronics Engineers of Japan, 10.11371/iieej.38.703, 38, 5, 703-711, 2009.01, [URL], We propose a photogrammetric system based on visible light communications that uses a light marker as a reference mark. Our system automatically matches the light markers captured from multiple viewpoints. Each light blinks as a signal based on visible light communications, and this signal enables the automatic matching of the light markers. In addition, automatic detection of the light markers is achieved through the rules of pulse position modulation and a cyclic redundancy check. In our experimental results, we discuss the detection accuracy of the light areas and the photogrammetry using light markers..|
|67.||Hideaki Uchiyama, Hideo Saito, Myriam Servières, Guillaume Moreau, AR GIS on a physical map based on map image retrieval using LLAH tracking, 11th IAPR Conference on Machine Vision Applications, MVA 2009
Proceedings of the 11th IAPR Conference on Machine Vision Applications, MVA 2009, 382-385, 2009, This paper presents a method for retrieving the map corresponding to a captured map image from a map database. Our method is inspired by LLAH-based Document Image Retrieval (DIR). LLAH recognizes a point using a feature computed from its neighboring points. Since Map Image Retrieval (MIR) is achieved by analyzing the distribution of intersections, the LLAH feature is used to describe that distribution. In our method, registration and retrieval in LLAH-based DIR are improved to reduce the computational cost of retrieval. In addition, the LLAH features are updated while the camera moves. Our improvements enable MIR even under strong camera tilt, occlusion, and fewer intersections..
|68.||Hideaki Uchiyama, Masaki Yoshino, Hideo Saito, Masao Nakagawa, Shinichiro Haruyama, Takao Kakehashi, Naoki Nagamoto, Photogrammetric system using visible light communication, 34th Annual Conference of the IEEE Industrial Electronics Society, IECON 2008
Proceedings - 34th Annual Conference of the IEEE Industrial Electronics Society, IECON 2008, 10.1109/IECON.2008.4758222, 1771-1776, 2008.01, [URL], We propose an automated photogrammetric system using visible light communication. Our system can be applied to the measurement of a variety of distances using a light as a reference point. In addition, the matching of the same lights across different viewpoints is done automatically using unique blinking patterns. A light area is extracted based on the rule of the lighting patterns, without a predetermined threshold. Experimental results show that our system provides sufficient accuracy for photogrammetry..
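The blinking-pattern matching above relies on each light transmitting a self-checking identifier so that a receiver can reject mis-decoded signals; the companion journal version mentions pulse position modulation and a cyclic redundancy check. A generic CRC-8 framing sketch follows, where the polynomial (x^8 + x^2 + x + 1, a common choice) and the frame layout are assumptions, not the papers' actual code:

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8 over a byte sequence (polynomial 0x07 is an
    assumption; the papers do not specify their code)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; XOR in the polynomial on carry-out
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def make_frame(marker_id):
    """Append a CRC so a receiver can reject corrupted blink sequences."""
    payload = [marker_id & 0xFF]
    return payload + [crc8(payload)]

def check_frame(frame):
    """A frame is accepted only if its CRC matches its payload."""
    return crc8(frame[:-1]) == frame[-1]
```

Any single-bit decoding error flips the CRC check, which is what lets the receiver discard unreliable light observations before matching.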
|69.||Hideaki Uchiyama, Hideo Saito, AR display of visual aids for supporting pool games by online markerless tracking, 17th International Conference on Artificial Reality and Telexistence, ICAT 2007
Proceedings 17th International Conference on Artificial Reality and Telexistence, ICAT 2007, 10.1109/ICAT.2007.17, 172-179, 2007.12, [URL], This paper presents a supporting system for pool games based on computer-vision-based augmented reality technology. The main purpose of this system is to present visual aids drawn on a pool table through the LCD display of a camera-mounted handheld device, without any artificial markers. Since the pool table is rectangular, the balls are spherical, and each has a specific color, these natural features serve as substitutes for artificial markers. Using these natural features, the registration of visual aids such as the shooting direction and ball behavior is achieved. Our supporting information is computed from the rules of pool games and includes the next shot, obtained by simulating ball behavior. Experimental results show that the accuracy of the ball positions is sufficient for computing our supporting information..
|70.||Hideaki Uchiyama, Hideo Saito, Position estimation of solid balls from handy camera for pool supporting system, 1st Pacific Rim Symposium on Image and Video Technology, PSIVT 2006
Advances in Image and Video Technology - First Pacific Rim Symposium, PSIVT 2006, Proceedings, 10.1007/11949534-39, 393-402, 2006.12, [URL], This paper presents a method for estimating the positions of solid balls from images captured with a handy camera moving around the pool table. Since the camera moves around by hand, its motion in 3D space must be estimated. For the camera motion estimation, a homography is calculated by extracting the green felt region of the table top, which is approximated by a polygon. The balls are then extracted from the table-top region to obtain their positions. The 3D position of each ball is estimated using a projection matrix determined by the homography. The ball areas are classified by the distribution of RGB values in each area. We apply our method to image sequences taken with a handy camera to evaluate the accuracy of the ball position estimation. The experiment confirms that the estimated positions have at most 18 mm of error, which is sufficiently small for displaying the strategy information in the pool supporting system..
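The homography step above can be sketched with the standard direct linear transform (DLT): estimate H from four point correspondences, such as the table corners, then map detected ball centers from image coordinates onto the table plane. This is a generic sketch; the paper's polygon extraction and 3D ball lifting are not reproduced:

```python
import numpy as np

def homography_dlt(src, dst):
    """Homography H mapping src points to dst points via the DLT:
    each correspondence gives two linear constraints on the 9 entries
    of H, and the null vector of the stacked system is H (up to scale)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def map_point(H, pt):
    """Map a 2D point through homography H (homogeneous divide)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With the four corners of the felt region as `src` and the table's known rectangle as `dst`, `map_point(H, ball_center)` gives the ball position in table coordinates.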