Research data scientist intern : Machine learning-based data imputation and augmentation for prediction of response to immunotherapy

Location: Azur, AQUITAINE
Deadline: 08 Jan 2023

Reference: 2022-05572

Level of qualifications required: Master's or equivalent


Function: Research internship


About the research centre or Inria department

The Inria Sophia Antipolis - Méditerranée center comprises 34 research teams and 8 support departments. Its staff (about 500 people, including 320 Inria employees) is made up of scientists of different nationalities (250 foreigners from 50 countries), engineers, technicians and administrative staff. One third of the staff are civil servants; the others are contract employees.


Context

The internship will take place within the Inria-Inserm team COMPO (COMputational Pharmacology in Oncology), located at the University Hospitals of Marseille (AP-HM). The team brings together mathematicians, pharmacists and clinicians, forming a unique multidisciplinary environment focused on developing novel computational tools for decision-making in clinical oncology.

The intern will join the QUANTIC project, which consists of the statistical and machine learning analysis of the large-scale, multimodal data collected in the national PIONeeR clinical study [1]. The project aims to predict response or resistance to immune checkpoint inhibitors in patients with advanced non-small cell lung carcinoma. The data consist of multimodal, high-dimensional features from quantitative digital pathology, lesion sizes, pharmacokinetics, immunoprofiling, soluble biomarkers and sequencing data.

[1] https://marseille-immunopole.org/the-pioneer-project/


Assignment

Although the dataset contains a large number of variables (1167), many of them have missing values, in some cases at high rates (>50%). The intern will be in charge of exploring the missingness pattern of the data (missing completely at random, missing at random, or missing not at random) and of assessing state-of-the-art machine learning methods for missing-value imputation in this concrete application.
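To make the three missingness mechanisms concrete, here is a minimal Python sketch on simulated toy data. It is not the PIONeeR data, and the two-variable setup (an always-observed covariate x and a variable y = 2x + noise), the function names and the masking thresholds are all illustrative assumptions. The sketch masks y under each mechanism, checks whether missingness depends on the observed x, and compares mean imputation with a simple least-squares regression imputation.

```python
import random
import statistics


def simulate(n, mechanism, seed=0):
    """Toy data (illustrative assumption): x is always observed, y = 2x + noise may be masked.

    MCAR: y masked with probability 0.5, independently of everything.
    MAR:  y masked when the observed covariate x > 0.
    MNAR: y masked when y itself > 0 (depends on the unobserved value).
    Returns a list of (x, y_true, y_observed_or_None) triples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            missing = rng.random() < 0.5
        elif mechanism == "MAR":
            missing = x > 0
        elif mechanism == "MNAR":
            missing = y > 0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, y, None if missing else y))
    return rows


def x_gap(rows):
    """Mean of x where y is missing minus mean of x where y is observed.

    A gap near 0 is consistent with MCAR; a large gap shows that missingness
    depends on x, which rules out MCAR but cannot separate MAR from MNAR.
    """
    miss = [x for x, _, y in rows if y is None]
    obs = [x for x, _, y in rows if y is not None]
    return statistics.mean(miss) - statistics.mean(obs)


def mean_impute(rows):
    """Fill every missing y with the mean of the observed y values."""
    fill = statistics.mean(y for _, _, y in rows if y is not None)
    return [fill if y is None else y for _, _, y in rows]


def regression_impute(rows):
    """Fill missing y with an ordinary least-squares prediction from x,
    fitted on the complete rows only."""
    xs = [x for x, _, y in rows if y is not None]
    ys = [y for _, _, y in rows if y is not None]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, _, y in rows]


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        rows = simulate(20_000, mech)
        true_mean = statistics.mean(y for _, y, _ in rows)
        print(
            f"{mech}: x-gap={x_gap(rows):+.2f}  "
            f"bias(mean imp.)={statistics.mean(mean_impute(rows)) - true_mean:+.2f}  "
            f"bias(regression imp.)={statistics.mean(regression_impute(rows)) - true_mean:+.2f}"
        )
```

Under MAR, regression imputation from x removes most of the bias that mean imputation introduces, while under MNAR both remain biased. The x-gap check can rule out MCAR, but it cannot tell MAR from MNAR, because MNAR depends on values that are never observed.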

In addition, the intern will explore data augmentation methods, with a focus on high-dimensional longitudinal data.

Keywords: Machine learning; clinical oncology; data imputation; data augmentation


Main activities
  • Literature review
  • Data visualization
  • Exploratory data analysis
  • Statistical analysis
  • Programming (R/Python)
  • Implementation of machine learning algorithms for data imputation
  • Implementation of machine learning algorithms for data augmentation

Skills

Technical skills and level required:

  • Data analysis
  • Statistics
  • Computer programming (R/Python)
  • Basic knowledge of cancer biology, immunology and/or medicine is a plus

Other appreciated qualities: motivation to develop computational tools for concrete clinical applications, and general enthusiasm for teamwork.


Benefits package
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

The stipend amount is based on the hourly rate in effect in public institutions and is indexed to changes in the maximum hourly rate set by the French social security system.