Deep Learning for Vision and Multimodal Data
Fall 2024

Syllabus

Administrivia

Lecture: Friday 1:30 - 4:20 PM

Instructor: Qifeng Chen (cqf@ust.hk)


Course description

This course covers advanced deep neural network architectures and their applications in computer vision, signal processing, graph analysis, and natural language processing. State-of-the-art neural network models will be introduced, including graph neural networks, normalizing flows, point cloud models, sparse convolutions, and neural architecture search. Students will have the opportunity to implement deep learning models for AI-related tasks such as visual perception, image processing and generation, graph processing, speech enhancement, sentiment classification, and novel view synthesis.

Course outline

• Weeks 1-2: Overview of deep learning: basic architectures (CNN, RNN), backpropagation, loss functions
• Week 3: Neural networks for image and video recognition tasks
• Week 4: Neural networks for image and video processing tasks
• Week 5: Deep 3D learning for point clouds, meshes, and volumetric data
• Week 6: Deep 3D learning for stereo and multi-view data
• Week 7: Graph neural networks for graph processing and analysis
• Week 8: Sequential modeling and signal processing: transformers
• Week 9: Deep generative models: normalizing flows, GANs, diffusion models
• Week 10: Efficient neural networks
• Week 11: Neural architecture search
• Weeks 12-13: Final project presentations and project report submission

Recommended texts

    Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
    Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2019. https://d2l.ai

Grading

The breakdown below is subject to change as a whole, and it may be adjusted on a per-student basis in exceptional cases. This is the general breakdown we'll be using:

Homework: 30%
Midterm: 35%
Presentation: 5%
Final project: 30%
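
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every component is scored on a 0-100 scale and that the course total is a plain weighted sum; the syllabus does not state the scoring scale or any curving, and the function name and example scores are hypothetical.

```python
# Weights copied from the grading breakdown above.
# Assumption: each component is scored 0-100 and the total is a plain weighted sum.
WEIGHTS = {
    "homework": 0.30,
    "midterm": 0.35,
    "presentation": 0.05,
    "final_project": 0.30,
}


def weighted_total(scores):
    """Combine per-component scores (0-100) into a course total (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"homework": 90, "midterm": 80, "presentation": 100, "final_project": 85}
print(round(weighted_total(example), 2))  # 85.5
```

Since the breakdown may change, only the WEIGHTS table would need updating in this sketch.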

Maintained by Qifeng Chen. EESM 5900V: Deep Learning for Vision and Multimodal Data, Fall 2024