NLP Data Scientist

Updated: 3 months ago
Location: Hinxton, ENGLAND
Job Type: Full-time
Deadline: 16 Feb 2024

26 Jan 2024
Job Information
Organisation/Company

EMBL-EBI - European Bioinformatics Institute
Research Field

Biological sciences
Biological sciences » Biology
Medical sciences » Medicine
Technology » Biotechnology
Computer science
Mathematics » Statistics
Researcher Profile

First Stage Researcher (R1)
Established Researcher (R3)
Country

United Kingdom
Application Deadline

16 Feb 2024 - 21:59 (UTC)
Type of Contract

Permanent
Job Status

Full-time
Is the job funded through the EU Research Framework Programme?

Not funded by an EU programme
Is the Job related to staff position within a Research Infrastructure?

No

Offer Description

NLP Data Scientist
EMBL-EBI - European Bioinformatics Institute
Hinxton, United Kingdom

Organisation data: Chemogenomics
Job Number: EBI02211
Contract Type: Staff Member
Contract Duration-Length of Time (years/months): 3 years (Project based contract)
Advertised Grade-Grading: Grade 5 or 6 (£3,090 or £3,456 per month after tax) plus benefits depending on personal circumstances
Closing date: 16 February 2024


About the team/job

We are looking for an enthusiastic and talented (NLP) data scientist to join the newly initiated AI knowledge management project, initially for a period of 3 years. This position will be situated in the Chemical Biology Services Team, which also develops and delivers several globally recognised resources including ChEMBL, ChEBI, SureChEMBL, and UniChem. We are particularly interested in applications of AI and machine learning to mine the research literature for additional types of entities relevant to drug discovery that are not currently available to Open Targets (OT), such as variants, biomarkers, tissues/cell types, adverse events, and assay conditions. This position provides a real opportunity to make a significant impact on a critical problem in drug discovery for the many users of the OT Platform, and to contribute to the open source models and code associated with biological and drug discovery entities.

You will be embedded into a multi-disciplinary project team that also includes machine learning data experts and data scientists/engineers. You will need to be able to demonstrate the ability to work well with colleagues and to collaborate with external partners. You must have excellent communication and interpersonal skills and enjoy working in a stimulating, international environment.


Your role
  • Collect benchmark data sets from the open domain as training sets for NLP models;
  • Collect specifications, test prototypes and deliver tools for benchmarking;
  • Develop and utilise statistically robust methods for data analysis and benchmarking;
  • Work with other team members to find and use suitable pre-trained NLP models from the public domain (e.g., Hugging Face);
  • Work with other team members to retrain publicly available NLP models on the open scientific literature available in Europe PMC to ensure they are optimised for the project's needs;
  • Support the team to modernise and extend the current entity recognition workflows to cover an array of additional types of entities relevant to drug discovery;
  • Support the team in developing new machine learning, deep learning or NLP protocols to enhance curation workflows;
  • Analyse the newly extracted entity relationships as part of specific use cases (e.g., explore their scientific value);
  • Collaborate with the OT partners to assess, prioritise, validate and refine the developed methods;
  • Work closely with the OT core team for the seamless integration of data and workflows into the OT Platform;
  • Actively disseminate the outcomes of the project to the scientific community and stakeholders through well-crafted presentations and publications.

You have
  • Advanced degree (MSc, PhD) in biology, biomedical sciences or related discipline;
  • Proficiency in at least one modern programming/scripting language (e.g. Python);
  • Experience of biological data curation and knowledge of bioinformatics databases;
  • Experience with advanced big data preprocessing, cleaning, and transformation techniques specific to textual data including ontologies;
  • Good understanding of statistical methods and their application to data analysis and use of data visualisation tools and libraries (such as Matplotlib, Seaborn) to effectively communicate data insights;
  • Excellent attention to detail;
  • Strong communication skills, both in presentations and in conversation;
  • Experience working in a team-oriented environment;
  • Able to work independently, to manage your time and work to deadlines.

You might also have
  • Experience working in a drug discovery and development environment;
  • Proficiency in text analytics methods and/or machine learning;
  • Knowledge of version control systems (e.g., Git/GitHub);
  • Knowledge and practical experience with bioinformatics methods including systems biology and genetics analysis.

Why join us
Do something meaningful

At EMBL-EBI you can apply your talent and passion to accelerate science and tackle some of humankind's greatest challenges. EMBL-EBI, part of the European Molecular Biology Laboratory, is a worldwide leader in the storage, analysis and dissemination of large biological datasets. We provide the global research community with access to publicly available databases and tools which are crucial for the advancement of healthcare, food security, and biodiversity.


Join a culture of innovation

We are located on the Wellcome Genome Campus, alongside other prominent research and biotech organisations, and surrounded by beautiful Cambridgeshire countryside. This is a highly collaborative and inclusive community where our employees enjoy a relaxed atmosphere. We are committed to ensuring our employees feel valued, supported and empowered to reach their professional potential.


Enjoy lots of benefits
  • Financial incentives: Monthly family, child and non-resident allowances, annual salary review, pension scheme, death benefit, long-term care, accident-at-work and unemployment insurances;
  • Flexible working arrangements;
  • Private medical insurance for you and your immediate family (including all prescriptions and generous dental & optical cover);
  • Generous time off: 30 days annual leave per year, in addition to eight bank holidays;
  • Relocation package including installation grant (if required);
  • Campus life: Free shuttle bus to and from work, on-site library, subsidised on-site gym and cafeteria, casual dress code, extensive sports and social club activities (on campus and remotely);
  • Family benefits: On-site nursery, 10 days of child sick leave, generous parental leave, holiday clubs on campus and monthly family and child allowances;
  • Benefits for non-UK residents: Visa exemption, education grant for private schooling, financial support to travel back to your home country every second year and a monthly non-resident allowance.

For more details, please see our employee benefits page.


What else you need to know
  • Contract duration: This position is a project-based 3-year contract which will expire in June 2027;
  • International applicants: We recruit internationally and successful candidates are offered visa exemptions. Read more on our page for international applicants;
  • Diversity and inclusion: At EMBL-EBI, we strongly believe that inclusive and diverse teams benefit from higher levels of innovation and creative thought. We encourage applications from women, LGBTQ+ individuals and people of all nationalities;
  • EMBL is a signatory of DORA. Find out how we implement best practices in research assessment in our recruitment processes here;
  • ...

Work Location(s)
Number of offers available
1
Company/Institute
EMBL-EBI - European Bioinformatics Institute
Country
United Kingdom
City
Hinxton

Where to apply
Website

https://www.eurosciencejobs.com/job_display/251949/NLP_Data_Scientist_EMBL_EBI_…

Contact
City

Hinxton
E-Mail

[email protected]

STATUS: EXPIRED