Instructor Lingpeng Kong (lpk AT cs.hku.hk)
Season Fall 2022
Location CBA, Chow Yei Ching Building
TA Qintong Li (qtli AT connect.hku.hk)
Course description:

Natural language processing (NLP) is the study of human language from a computational perspective. The course focuses on machine learning and corpus-based methods and algorithms. We will cover syntactic, semantic, and discourse processing models, and describe how these methods and models are used in applications including syntactic parsing, information extraction, statistical machine translation, dialogue systems, and summarization. The course starts with language models (LMs), which are front and center in NLP, and then introduces key machine learning (ML) ideas that students should grasp (e.g., feature-based models, log-linear models, and then neural models). We will conclude with modern generic meaning representation methods (e.g., BERT and GPT-3) and the idea of pretraining and fine-tuning.

Prerequisites:

COMP3314 or COMP3340, MATH1853

Assessment:

50% continuous assessment, 50% examination

Schedule

Lecture     Topic/papers Recommended reading Materials
Part I    
Sept. 2 Introduction to NLP, Language Models [slides] [J&M Ch. 1] [Lee, 2004]
Sept. 6 Language Models [slides] [J&M Ch. 4] [J&M Ch. 7] [M. Collins, Notes 1]
Sept. 9 Language Models, Smoothing [slides] same as last lecture
Sept. 13 RNNLM, BERT, Pretraining + Fine-tuning [slides] [BERT paper]
Sept. 16 Computational Graphs [slides] [J&M Ch. 8.1 - 8.3] [M. Collins, Notes][C. Dyer, LSTM Notes] Assignment 1 [Requirements] [Colab]
Sept. 20 Computational Graphs and Sequence to Sequence Models [slides] [Sutskever et al., 2014]
Sept. 21 Attention Mechanism [slides] [Bahdanau et al., 2015]
Sept. 27 & Sept. 30 Transformers [slides] [Vaswani et al, 2017] [The Annotated Transformer]
Part II    
Oct. 7 Parsing, Context-free Grammars [slides] [M. Collins, Notes] [J&M Ch. 12]
Oct. 18 Probabilistic Context-free Grammars [slides]
Oct. 21 Recursive Neural Networks, Shift-reduce Parsing [slides] [Stanford Sentiment Treebank] [Socher et al., 2013]
Oct. 25 Recurrent Neural Network Grammars [slides] [Dyer et al., 2016] [Kuncoro et al., 2017] Assignment 2 [Deadline: Nov 15, 6:00am] [Requirements] [Colab]
Nov. 1 Dependency Parsing [slides] [J&M Ch. 14]
Part III    
Nov. 4 Large Pretrained Models [slides] [BART] [T5] [InfoWord] [GPT-3] [ELMo]
Nov. 8 Prompt, Prefix-Tuning and Adapters [slides] [Liu et al., 2021] [Li and Liang, 2021] [Houlsby et al., 2019]
Nov. 11 Natural Language Generation [slides] [Holtzman et al., 2019] [Ghazvininejad et al., 2017] [Dathathri et al., 2020] Assignment 3 [Deadline: Dec 10, 6:00am] [Exercises]
Nov. 15 Question Answering [slides] [Rajpurkar et al., 2017] [Seo et al., 2017] [J&M Ch. 23] [Joshi et al., 2020] [Lee et al., 2019]
Nov. 18 Multilinguality [slides] [Universal Dependencies] [Pires et al., 2019] [Lample and Conneau, 2019] [Liu et al., 2020] [Assignment 3]
Nov. 22 Multimodality, NLP + Vision [slides] [VQA] [Xu et al., 2015] [GQA] [Hudson and Manning, 2018]
Nov. 25 Model Interpretability [slides] [Wu et al., 2020] [Tenney et al., 2020]
Nov. 29 All Questions Answered

Assignments

Submission:

Please submit to Moodle a zip file that contains (1) your code, (2) a write-up (PDF) that explains your model, and (3) your model’s predictions (strictly following the required format). Please name your zip file in the format UniversityNumber.zip.
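As an illustration only, the packaging step could be scripted along the following lines. This is a minimal sketch, not an official course tool: the function name `build_submission`, the `code/` folder inside the archive, and the example file names are all assumptions, and the required prediction format is still whatever each assignment's requirements specify.

```python
import zipfile
from pathlib import Path


def build_submission(university_number, code_dir, writeup_pdf, predictions_file, out_dir="."):
    """Bundle code, write-up, and predictions into <university_number>.zip.

    Hypothetical helper: the archive layout (a code/ folder plus two
    top-level files) is an assumption, not a course requirement.
    """
    zip_path = Path(out_dir) / f"{university_number}.zip"
    code_root = Path(code_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # (1) every file under the code directory, stored under code/
        for path in sorted(code_root.rglob("*")):
            if path.is_file():
                arcname = (Path("code") / path.relative_to(code_root)).as_posix()
                zf.write(path, arcname)
        # (2) the write-up and (3) the predictions, stored at the top level
        zf.write(writeup_pdf, Path(writeup_pdf).name)
        zf.write(predictions_file, Path(predictions_file).name)
    return zip_path


# Example call (placeholder number and file names):
# build_submission("1234567890", "src", "report.pdf", "predictions.txt")
```

Keeping the output zip outside the code directory avoids the archive trying to include itself.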

Programming:

For each assignment, you may use any programming language (e.g., Python or C++) and any deep learning framework (e.g., PyTorch or TensorFlow); your choice may differ from one assignment to the next.

Evaluation:

We will review your work individually to ensure that you receive due credit for your work. Please note that both your project output and logic will be considered for marking.

Policy and honor code:

You are free to discuss ideas and implementation details with other students. However, copying others’ code will not help your studies and will instead jeopardize them. We will check your work against other submissions and Internet sources, and it is easy to tell whether you did your own work. To be clear, we encourage you to discuss with your classmates, but you MUST do your work independently and CANNOT simply copy. If plagiarism is identified, you may face serious consequences under Faculty and University policy.