Depth perception technology divides broadly into the perception of static objects and of dynamic objects. Static perception generally means environmental perception, which requires rendering the geometry, color, and other feature details of the real environment as faithfully as possible; dynamic perception must detect objects that appear in the environment in real time, such as people, animals, and vehicles, and generally requires deploying low-power, high-performance real-time algorithms on the device side. With advances in deep learning and big data, and the spread of mobile GPUs, high-precision static perception and real-time dynamic perception are now achievable (as shown in Figure 1).
Figure 1. Autonomous driving: 3D depth perception
Main research content: Research a series of low-cost, efficient perception algorithms for static scenes such as autonomous-driving road information; develop algorithms and systems for 3D perception of dynamic objects that appear at random, combining color cameras, millimeter-wave radar, and big-data priors.
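The camera-radar combination above can be sketched as a minimal fusion step: project millimeter-wave radar returns into the image with a pinhole camera model and attach a depth to each 2-D detection box. The intrinsics, detection box, and radar points below are all hypothetical, illustrative values, not part of any specific system described here.

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy) -- illustrative values only.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_radar_points(points_xyz, K):
    """Project 3D radar returns (camera frame, Z forward) to pixel coordinates."""
    pts = points_xyz[points_xyz[:, 2] > 0]   # keep points in front of the camera
    uv = (K @ pts.T).T                       # homogeneous pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]              # perspective divide
    return uv, pts[:, 2]                     # pixel coords and their depths

def box_depth(box, uv, depths):
    """Median radar depth of points falling inside a 2D box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    inside = (uv[:, 0] >= x0) & (uv[:, 0] <= x1) & \
             (uv[:, 1] >= y0) & (uv[:, 1] <= y1)
    return float(np.median(depths[inside])) if inside.any() else None

# Toy example: three radar returns on a pedestrian ~10 m ahead, one clutter point.
radar = np.array([[ 0.1, 0.0, 10.0],
                  [ 0.0, 0.2, 10.2],
                  [-0.1, 0.1,  9.8],
                  [ 5.0, 0.0, 30.0]])
uv, depths = project_radar_points(radar, K)
depth = box_depth((280, 180, 360, 300), uv, depths)   # camera box + radar depth
```

The median makes the depth estimate robust to stray radar returns; in a real system the box would come from an on-device detector rather than being hard-coded.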
Direction 2
High-fidelity digitization of natural scenes is an important trend in the future of visual imaging technology, supporting revolutionary applications in frontier fields such as navigation, human-computer interaction, virtual and augmented reality (VR/AR), online cultural tourism, and engineering design (as shown in Figure 2). Traditional scene reconstruction relies on multi-view stereo (MVS) methods, which suffer from feature-point mismatches, an inability to handle dynamic objects in the scene, and excessive storage consumption for large scenes. Scene reconstruction based on implicit representations and differentiable rendering can refine matches through backpropagation, model dynamic and static objects separately, reduce memory consumption through implicit storage, and deliver high-quality immersive scene experiences.
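The implicit-representation idea above can be sketched in miniature: store a 1-D "scene" (a density profile) not as samples but as the weights of a Fourier-feature model, and optimize those weights by gradient descent, the analogue of refining the representation via backpropagation. The scene, frequency bands, and learning rate are all illustrative choices, not any particular published model.

```python
import numpy as np

# Toy 1-D "scene": a density profile we want to store implicitly.
xs = np.linspace(0.0, 1.0, 200)
target = np.exp(-(xs - 0.3) ** 2 / 0.02) + 0.5 * np.exp(-(xs - 0.75) ** 2 / 0.01)

# Fourier-feature encoding: map x to sin/cos features plus bias and linear terms.
freqs = np.arange(1, 9)                       # 8 frequency bands (illustrative)
def encode(x):
    ang = np.outer(x, freqs) * np.pi
    return np.concatenate(
        [np.ones((len(x), 1)), x[:, None], np.sin(ang), np.cos(ang)], axis=1)

Phi = encode(xs)                              # (200, 18) feature matrix
w = np.zeros(Phi.shape[1])                    # the implicit "weights" to optimize

# Gradient descent on squared reconstruction error -- the backprop analogue.
lr = 0.05
for _ in range(2000):
    grad = Phi.T @ (Phi @ w - target) / len(xs)
    w -= lr * grad

mse = float(np.mean((Phi @ w - target) ** 2))   # reconstruction error
```

The 18 weights replace 200 stored samples; this compression-by-optimization is, in spirit, how implicit representations cut memory for large scenes.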
Figure 2. Large-scale scene reconstruction
Main research content: High-precision scene reconstruction for complex environments; research on drone-based autonomous collection and reconstruction algorithms and systems, including autonomous path planning, scene understanding, and three-dimensional reconstruction.
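The autonomous path-planning step can be illustrated with a minimal boustrophedon (lawnmower) sweep, a common coverage pattern for drone-based scene capture; the area size, line spacing, and altitude below are illustrative placeholders, with spacing in practice derived from the camera footprint and desired image overlap.

```python
import numpy as np

def lawnmower_waypoints(width, height, spacing, altitude):
    """Boustrophedon (back-and-forth) sweep over a width x height area.

    Returns an (N, 3) array of (x, y, z) waypoints at a fixed altitude;
    `spacing` is the distance between adjacent sweep lines.
    """
    ys = np.arange(0.0, height + 1e-9, spacing)   # sweep-line y positions
    wps = []
    for i, y in enumerate(ys):
        # Alternate sweep direction on each line to avoid dead transits.
        x_start, x_end = (0.0, width) if i % 2 == 0 else (width, 0.0)
        wps.append((x_start, y, altitude))
        wps.append((x_end, y, altitude))
    return np.array(wps)

# 100 m x 60 m area, 20 m between lines, flown at 30 m -- illustrative numbers.
path = lawnmower_waypoints(width=100.0, height=60.0, spacing=20.0, altitude=30.0)
```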
Direction 3
High-fidelity digitization of natural humans is one of the core tasks of digital twinning: reconstructing high-fidelity three-dimensional digital objects from real targets as the basis for subsequent interaction and generation. With the rapid evolution of devices such as color cameras, depth cameras, and drones, and of technologies such as artificial intelligence, big data, and the Internet, three-dimensional reconstruction has entered a new era. Reconstructing a whole natural human, including shape, expression, and body animation, is a challenging task (as shown in Figure 3). With more advanced devices and algorithms that integrate rule-guided explicit reconstruction with data-driven implicit reconstruction, high-precision scene and human-body reconstruction can be achieved at low cost.
Figure 3. High-precision digital human modeling based on MetaHuman software
Main research content: To meet the needs of high-precision, high-efficiency, and diversified digital-human capture and reconstruction, build a high-precision capture rig based on a spherical, controllable light-field camera array.
Direction 4
Perceptual interaction emphasizes collaboration with technologies from other fields, including tracking and positioning, immersive sound fields, gesture tracking, eye tracking, three-dimensional reconstruction, machine vision, electromyographic sensing, voice recognition, smell simulation, virtual locomotion, tactile feedback, and brain-computer interfaces. For service robots, semantic, natural understanding is built on high-precision 3D reconstruction of the environment. Cameras, sensors, and other devices, or virtual signals, capture the gestures, actions, expressions, and voices of natural or virtual humans, and these input signals drive and control high-quality digital humans and bionic robots (as shown in Figure 4).
Figure 4. Robot and environment interaction
Main research content: Research on scene reconstruction, scene understanding, natural-human gesture recognition, posture estimation, action understanding, expression analysis, and voice recognition; intelligent analysis of human interaction intent from multimodal signals such as images, video, and audio gathered by multiple perception devices. Research multimodal, coordinated digital-twin and remote-control technology for industrial production, and establish mapping and inverse-mapping control techniques and model systems between driving signals and body hardware for "natural human-robot" and "digital human-robot" interaction.
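The mapping/inverse-mapping idea between driving signals and body hardware can be sketched with a toy 2-link planar arm: forward kinematics maps joint angles to an end-effector pose (the mapping), and an iterative Jacobian-based solver recovers joint angles from a captured target pose (the inverse mapping). The link lengths, target, and damping factor are hypothetical, illustrative values.

```python
import numpy as np

L1, L2 = 0.3, 0.25   # link lengths (m) of a hypothetical 2-joint arm

def forward(q):
    """Mapping: joint angles q = (q1, q2) -> end-effector position (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Partial derivatives of the end-effector position w.r.t. joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def inverse_map(target, q0, iters=200, step=0.5):
    """Inverse mapping: iterate damped Newton updates toward a target pose."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - forward(q)
        q += step * np.linalg.pinv(jacobian(q)) @ err
    return q

target = np.array([0.35, 0.2])            # e.g. a captured hand position
q = inverse_map(target, q0=(0.3, 0.3))
err = float(np.linalg.norm(forward(q) - target))
```

A real "natural human-robot" pipeline would feed captured poses into such an inverse mapping continuously, with joint limits and singularity handling added.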
Direction 5
Realistic rendering is the core technology for presenting three-dimensional digital content (the production process) on terminal devices (the display process). Current mainstream approaches include foveated (gaze-point) rendering, cloud rendering, dedicated rendering chips, and light-field rendering. Three-dimensional (3D) display is the ultimate goal of display technology. Integral imaging 3D display requires no auxiliary equipment or coherent light source, causes no viewing fatigue, and can provide full-parallax, continuous-viewpoint, full-color, real-time true 3D images, making it one of the most promising 3D display technologies (as shown in Figure 5). However, its resolution, field of view, and depth range are limited by its principles and devices, and still need improvement.
Figure 5. Three-dimensional holographic display of a car model
Main research content: Research on eye-tracking-based foveated rendering, foveated optics, AI-based rendering, and cloud-network-edge collaborative rendering; high-resolution integral imaging 3D display, 2D/3D-switchable integral imaging 3D display, floating integral imaging 3D display, and large-field-of-view integral imaging 3D display systems.
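As a minimal illustration of the geometry behind integral imaging, a pinhole model of a lens array predicts where a 3-D point appears in each elemental image on the sensor; replaying those elemental images back through the same array reconstructs the point in space. The pitch, lens-to-sensor gap, and lens count below are illustrative, not from any specific display described here.

```python
import numpy as np

# Pinhole model of a lens array for integral imaging (illustrative parameters).
pitch = 1.0     # lens pitch (mm)
gap = 3.0       # lens-to-sensor gap (mm)
n_lens = 5      # lenses in a row (1-D slice of the array, for simplicity)

def elemental_positions(x_obj, z_obj):
    """Sensor-plane x of a point's image behind each lens.

    The point sits at lateral offset x_obj, depth z_obj in front of the array;
    each lens centre acts as a pinhole, and the ray through it is extended by
    `gap` to the sensor plane.
    """
    lens_x = (np.arange(n_lens) - n_lens // 2) * pitch   # lens centres
    return lens_x + (lens_x - x_obj) * gap / z_obj

# On-axis point 30 mm in front of the array.
pos = elemental_positions(x_obj=0.0, z_obj=30.0)
```

The per-lens disparity in `pos` is what encodes depth: closer points (smaller `z_obj`) spread their images further apart, which is also why depth range is tied to pitch and gap.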