PhD Candidate in Point Cloud Prediction

Updated: over 1 year ago
Deadline: 14 October 2022 (Institutt for datateknologi og informatikk [Department of Computer Science], Gjøvik; temporary position)

This is NTNU

NTNU is a broad-based university with a technical-scientific profile and a focus on professional education. The university is located in three cities, with headquarters in Trondheim.

At NTNU, 9,000 employees and 42,000 students work to create knowledge for a better world.

You will find more information about working at NTNU and the application process here.

   





About the position

Volumetric video (VV) is becoming a key technology for creating highly realistic and immersive environments. Given the expected growth of immersive technology in the coming years (e.g., the Metaverse project launched by Facebook), it becomes necessary to develop innovative AI-based solutions to improve the immersive media production and consumption workflow.

In this PhD project, we propose to focus on point cloud (PC) data prediction, an important processing step that complements and benefits other immersive media production and consumption steps. More specifically, PC prediction is closely connected to volumetric video compression and will therefore have a strong impact on the quality of user experience.

The objectives of the PhD project are to develop neural network-based point cloud prediction models and to validate them in the context of VV coding and rendering. Just as conventional video compression algorithms rely on motion estimation and compensation, efficient point cloud prediction is expected to improve the coding performance of volumetric data. Moreover, point cloud prediction can provide a better understanding of the future user pose and help ensure low-latency VV streaming. Concretely, for an input sequence of n point cloud frames, the objective is to predict one or more future frames given some reference frames.
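
The prediction task described above can be illustrated with a minimal sketch of how a sequence of n frames might be split into (reference frames, future frames) pairs. This is only an illustration under stated assumptions: the function name make_prediction_samples and the parameters num_reference and num_future are hypothetical and do not come from the project description, and a frame is simply represented as a list of (x, y, z) tuples.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]  # one point cloud frame; the point order carries no meaning


def make_prediction_samples(
    sequence: Sequence[Frame],
    num_reference: int,
    num_future: int = 1,
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Slide a window over the sequence and return (references, targets) pairs.

    Hypothetical helper: each sample holds `num_reference` consecutive frames
    as model input and the following `num_future` frames as prediction target.
    """
    if num_reference < 1 or num_future < 1:
        raise ValueError("num_reference and num_future must be at least 1")

    window = num_reference + num_future
    samples = []
    for start in range(len(sequence) - window + 1):
        references = list(sequence[start:start + num_reference])
        targets = list(sequence[start + num_reference:start + window])
        samples.append((references, targets))
    return samples


# Toy sequence of 5 frames, each with a single point moving along the x axis.
toy_sequence = [[(float(t), 0.0, 0.0)] for t in range(5)]
toy_samples = make_prediction_samples(toy_sequence, num_reference=3, num_future=1)
```

With 5 frames, 3 reference frames and 1 future frame, the window fits in two positions, so two samples are produced. A prediction model would then be trained to map each list of reference frames to its target frame(s).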


Unlike classic (single- and multi-view) video prediction, which has been widely investigated in the literature, very few works address point cloud prediction. The latter is a more challenging problem because point cloud data are unordered and unstructured. With the ultimate goal of achieving efficient and accurate prediction, and motivated by the great success of deep learning in image and video processing, this work will focus on graph- and neural network-based approaches. Our research approach will first consist in developing new graph neural network-based prediction models. These models will then be validated on public PC datasets and evaluated using quantitative and qualitative techniques (i.e., objective and subjective quality assessment).
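
To make the "unordered" property concrete, the sketch below computes a symmetric Chamfer distance between two point clouds. The Chamfer distance is not named in the project description; it is used here only as an assumed example of an objective measure that does not depend on the order in which points are stored. The code is plain Python for readability and has quadratic cost, so it is suitable for toy inputs only.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(cloud_a: List[Point], cloud_b: List[Point]) -> float:
    """Symmetric Chamfer distance (mean squared nearest-neighbour distance).

    Assumed illustrative metric: for every point in one cloud, take the
    squared distance to its nearest point in the other cloud, average,
    and sum the two directions.
    """
    if not cloud_a or not cloud_b:
        raise ValueError("both point clouds must be non-empty")

    def one_way(source: List[Point], target: List[Point]) -> float:
        total = sum(min(squared_distance(p, q) for q in target) for p in source)
        return total / len(source)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reordered = [cloud[2], cloud[0], cloud[1]]           # same points, different order
shifted = [(x, y, z + 1.0) for (x, y, z) in cloud]   # every point moved by 1 along z

same_points_distance = chamfer_distance(cloud, reordered)
shifted_distance = chamfer_distance(cloud, shifted)
```

Reordering the points leaves the distance at zero, whereas a per-index comparison of the two lists would report a large error. This is one reason why prediction models and quality measures for point clouds cannot simply reuse the pixel-grid assumptions of conventional video.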

While a few deep learning-based PC prediction approaches have been developed, mainly based on PointNet and recurrent neural networks, our method will rely on spatial-temporal transformers, which have already shown promising results in conventional video motion estimation and prediction. The extension of such transformers to dynamic PCs will therefore be investigated in this thesis. Moreover, to better exploit local information and topology as well as inter-frame correlation, we will resort to graph-based representations and will mainly design new graph-based neural networks for effective motion estimation and video prediction. Finally, regarding the loss function used to train the developed models, we will investigate new perceptual metrics that better reflect the quality of user experience.
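
As an illustration of what a graph-based representation of a point cloud can look like, the sketch below builds a k-nearest-neighbour graph over the points of a single frame. The choice of a k-NN graph, the function name knn_graph and the parameter k are assumptions made for this example; the project description does not specify how the graphs will be constructed.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Edge = Tuple[int, int]  # directed edge (source index, neighbour index)


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_graph(points: List[Point], k: int) -> List[Edge]:
    """Connect every point to its k nearest neighbours (brute force, O(n^2 log n)).

    Hypothetical helper: returns directed edges, so (i, j) being present
    does not imply that (j, i) is.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")

    edges: List[Edge] = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties are broken by index.
        candidates = sorted(
            (squared_distance(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in candidates[:k])
    return edges


# Two well-separated pairs of points on the x axis.
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
frame_edges = knn_graph(frame, k=1)
```

With k=1, each point is linked only to the other member of its pair, so the graph captures the local neighbourhood structure that the raw, unordered list of coordinates does not expose. Edges between points of consecutive frames could be added in a similar way to represent inter-frame correlation, although that extension is not shown here.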


