PhD position in 3D Reconstruction of Humans in Interaction from Images

Job Type: Temporary
Deadline: 31 Aug 2022

Do you want to help computers see, understand, and assist us humans in our everyday lives? Are you excited about artificial intelligence (AI), mixed reality (MR), 3D spatial computing, 3D human avatars and the "metaverse"? Do you aspire to conduct internationally visible research in one of the world's most exciting cities? We are looking for a strong PhD candidate to push the state of the art together with us!

Humans constantly interact with objects, spaces and other humans to perform tasks. This is reflected in the photos and videos that we upload to Facebook, Instagram, and YouTube, or that we capture through smart glasses (Microsoft's HoloLens, Meta's Aria). Our long-term goal is to develop human-centered AI that accurately perceives humans in images while they perform tasks, and assists them in these tasks. This is important for Ambient Intelligence, Virtual Assistants, Human-Computer and Human-Robot Interaction, and Augmented/Virtual Reality (AR/VR).

To this end, we first need to "make sense" of the observed scene, i.e., to model how people, objects, and spaces look, to estimate their shape and pose, to infer their semantics and spatial relationships, and to do all of these in 3D, as our bodies and world are also 3D. Think of this as "mirroring" the observed scene, with the humans and objects contained in it, to a "replica" virtual scene with 3D humans and objects. This holistic 3D reconstruction (or 4D over time) endows computers with the ability to recognize what is in the scene, infer the state of humans and objects, and analyze the semantic and spatial configuration of the observed scene and action.

Although this perceptual capability seems effortless for humans, it has proven to be hard for computers. Challenges exist at all levels of abstraction, from the ill-posed inference of 3D structure from 2D images to the semantic interpretation of that structure.

Among other things, this project involves challenges such as:

  • Reconstructing deformable 3D human bodies and hands from single-/multi-view images;
  • Additionally reconstructing 3D physical objects from single-/multi-view images;
  • Dealing with the strong occlusions during realistic human-object interactions;
  • Representing (possibly through "learning") the spatial relations (e.g., proximal distances, contact, penetration) and semantics (e.g., affordances) of human-object interactions;
  • Accounting for the low-data regime - possible directions: collecting novel datasets for training and evaluation, weakly-supervised approaches, optimization-based approaches;
  • Extending 3D reconstruction over time (4D);
  • Potentially using the above to develop novel pose/motion generation methods.

These are hot research problems for both academia and industry, and interest in them shows no signs of slowing down. For representative papers, see the "projects" and "publications" tabs.

Each project in this PhD can be tailored to the shared interests of the PhD candidate and the advisors. The goal is fundamental research that pushes the state of the art, is published at top-tier venues, releases data and code useful to the community, and introduces new research problems.

What are you going to do

Your tasks and responsibilities will be to:

  • Develop and evaluate new methods at the intersection of (3D) computer vision, computer graphics and machine learning, within the project context described above;
  • Collaborate with other researchers in the Computer Vision (CV) lab and at the University of Amsterdam (UvA), as well as (inter-)nationally;
  • Complete and defend a PhD thesis within the official appointment duration of four years;
  • Regularly present intermediate research results at top-tier international conferences and workshops, and publish them in proceedings and journals;
  • Provide a reviewing service for top-tier conferences and journals;
  • Develop exciting demos for both the research community and for public outreach;
  • Assist in relevant teaching activities, e.g., lectures, labs, co-advising BSc/MSc students.


You will be co-advised by Dimitrios Tzionas (DT) and Theo Gevers (TG). We work at the intersection of (3D) computer vision, graphics and machine learning. We publish at top international venues (CVPR, ICCV, ECCV, SIGGRAPH/TOG, IJCV, CVIU, TPAMI). We have expertise in statistical 3D models for human bodies/hands, and in 3D human shape/pose and (inter-)action understanding from images. Recently, a paper co-authored by DT was a best-paper finalist at CVPR 2022, and two project-relevant startups of TG had a successful "exit". A surrounding team within the CV lab of 7 senior researchers and 14 PhD students (and growing) is available for interaction and possible collaboration. Our strong international network (ELLIS society and beyond) can also lead to collaborations and/or internships.
