PhD Studentship: Sustainable and fast inference and fine-tuning for large language models

Location: Southampton, England
Job Type: Full Time
Deadline: 08 Apr 2024

PhD Supervisor: Dr. Zhanxing Zhu

Supervisory Team: Dr. Zhanxing Zhu and Dr. Stuart Middleton

Project description:

Recently, large language models (LLMs), such as OpenAI’s ChatGPT and Google’s Bard, have attracted enormous public attention. They can generate remarkably realistic, coherent text based on a user's input and have the potential to become general-purpose tools used throughout society, e.g. for customer service, summarizing texts, answering questions, code generation and even solving maths problems. These LLMs are typically based on Transformer architectures with tens of billions of parameters and are trained on trillions of tokens.

However, the large model size and high computational complexity of LLMs result in computational power and storage requirements far surpassing what is currently available on standard consumer hardware. For instance, even the relatively modest-sized model LLaMA-65B needs 130 GB of GPU RAM for inference and more than 780 GB for fine-tuning on new data. The aim of this PhD project is therefore to develop fast and efficient inference and fine-tuning strategies for LLMs so that they can be made accessible and deployed with limited computational resources. This will promote sustainable use of language models in more scenarios, in terms of both power and hardware consumption.

In this project, the student will explore various pathways to achieve sustainable LLM inference and tuning, including:

  • Model quantization, i.e. quantizing high-precision values into low-bit representations such as 4-bit, including weight quantization and KV-cache quantization from an information-theoretic perspective. In particular, task-adapted quantization, where the quantization method could factor in information from the expected target-task corpus distribution as well as the original LLM pre-training corpus (see the sketch after this list).
  • Data selection during fine-tuning, where mechanisms for selecting data will be developed instead of using the entire training dataset;
  • Model compression and distillation, i.e. compressing large models into much smaller ones with a significantly reduced number of parameters.
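
As a purely illustrative example of the first direction, the sketch below shows minimal round-to-nearest 4-bit weight quantization of a single weight matrix using per-channel absmax scaling. All function names and details are hypothetical and assumed for illustration only; they indicate the flavour of the problem, not the methods to be developed in the project.

```python
import numpy as np

def quantize_absmax_4bit(weights: np.ndarray):
    """Round-to-nearest 4-bit quantization with per-output-channel absmax scaling.

    weights: (out_features, in_features) float matrix.
    Returns integer codes in [-8, 7] and the per-channel scales.
    """
    # One scale per output channel so the largest-magnitude weight maps to about +/-7.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0.0, 1.0, scales)  # avoid division by zero
    codes = np.clip(np.round(weights / scales), -8, 7).astype(np.int8)
    return codes, scales

def dequantize(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float matrix from codes and scales."""
    return codes.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 16)).astype(np.float32)  # stand-in for one LLM weight matrix
    codes, scales = quantize_absmax_4bit(w)
    w_hat = dequantize(codes, scales)
    print("mean absolute quantization error:", np.abs(w - w_hat).mean())
```

Task-adapted quantization, as described above, would go beyond such uniform rounding, e.g. by choosing scales or codebooks that minimise error on data drawn from the target-task corpus rather than treating all weights equally.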

The student will work in a collaborative team involving experts in both machine learning and natural language processing. We strongly encourage motivated students to dive into research on sustainable large language models and help make GPT-type models accessible even with limited computational resources.

This is a 4-year integrated PhD (iPhD) programme and is part of the UKRI AI Centre for Doctoral Training in AI for Sustainability (SustAI). For more information about SustAI, please see: https://sustai.info/

If you wish to discuss any details of the project informally, please contact Professor Enrico Gerding, Director of the SustAI CDT, Email: [email protected].

Entry Requirements

A very good undergraduate degree (at least a UK 2:1 honours degree, or its international equivalent).

Closing date: 8 April 2024.

Applications will be considered in the order that they are received.

Funding: We offer a range of funding opportunities for both UK and international students, including Bursaries and Scholarships. For more information, please visit PhD Scholarships | Doctoral College | University of Southampton. Funding will be awarded on a rolling basis, so apply early for the best opportunity to be considered.

How To Apply

Apply online by clicking the 'Apply' button above.

Select programme type (Research), 2024/25, Faculty of Engineering and Physical Sciences; on the next page, select “PhD iPhD AI for Sustainability (Full time)”. In Section 2 of the application form, you should insert the name of the supervisor.

Applications should include:

  • Research Proposal
  • Curriculum Vitae
  • Two reference letters
  • Degree Transcripts/Certificates to date

For further information please contact: [email protected]