Approx. 4 months

Assumes 6hrs/wk (work at your own pace)

You get
Instructor videos
Learn by doing exercises

Course Summary

This course provides an introduction to computer vision including fundamentals of image formation, camera imaging geometry, feature detection and matching, multiview geometry including stereo, motion estimation and tracking, and classification. We’ll develop basic methods for applications that include finding known models in images, depth recovery from stereo, camera calibration, image stabilization, automated alignment (e.g. panoramas), tracking, and action recognition. We focus less on the machine learning aspect of CV as that is really classification theory best learned in an ML course.

The focus of the course is to develop the intuitions and mathematics of the methods in lecture, and then to learn about the difference between theory and practice in the problem sets. All algorithms work perfectly in the slides. But remember what Yogi Berra said: "In theory there is no difference between theory and practice. In practice there is." (Einstein said something similar but who knows more about real life?) In this course you do not, for the most part, apply high-level library functions but use low- to mid-level algorithms to analyze images and extract structural information.
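To give a flavor of what "low- to mid-level" can mean in practice, here is a minimal sketch of linear filtering written directly against NumPy arrays instead of calling a ready-made filtering routine. It is an illustration only, not taken from the course problem sets: the function name `correlate2d_same`, the zero-padding choice, and the tiny test image are all assumptions made for this example.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Cross-correlate a 2D grayscale image with an odd-sized kernel.

    Hypothetical helper for illustration: zero-pads the border so the
    output has the same shape as the input.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            # Weighted sum of the neighborhood centered on pixel (r, c).
            window = padded[r:r + kh, c:c + kw]
            out[r, c] = np.sum(window * kernel)
    return out

# A 3x3 box (averaging) filter applied to a single bright pixel.
box = np.full((3, 3), 1.0 / 9.0)
image = np.zeros((5, 5))
image[2, 2] = 9.0
smoothed = correlate2d_same(image, box)
print(smoothed)
```

The explicit double loop is slow but makes the operation visible: each output pixel is a weighted sum of its neighbors. The single bright pixel is spread evenly over its 3x3 neighborhood, which is the kind of behavior the problem sets ask you to reason about and verify yourself.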

Why Take This Course?

Images have become ubiquitous in computing. Sometimes we forget that images often capture the light reflected from a physical scene. This course gives you both insight into the fundamentals of image formation and analysis and the ability to extract information well above the pixel level. These skills are useful for anyone interested in operating on images in a context-aware manner or where images from multiple scenarios need to be combined or organized in an appropriate way.

Prerequisites and Requirements

  • Data structures: You'll be writing code that builds representations of images, features, and geometric constructions.
  • A good working knowledge of Matlab and/or Python with NumPy. The lecture videos use Matlab for occasional demonstrations because the instructor is too old to change. Problem sets will be done in Matlab or Python. As mentioned in the resources note below, you can use either Matlab or the open-source alternative Octave.
  • This course has more math than many CS courses: Linear algebra, vector calculus, and linear algebra (that is not a typo).
  • No prior knowledge of vision is assumed, though any experience with signal processing is helpful.

See the Technology Requirements for using Udacity.


A brief outline of units is given below, grouped into 10 parts:

1 Introduction

  • 1A Introduction

2 Image Processing for Computer Vision

  • 2A Linear image processing
  • 2B Model fitting
  • 2C Frequency domain analysis

3 Camera Models and Views

  • 3A Camera models
  • 3B Stereo geometry
  • 3C Camera calibration
  • 3D Multiple views

4 Image Features

  • 4A Feature detection
  • 4B Feature descriptors
  • 4C Model fitting

5 Lighting

  • 5A Photometry
  • 5B Lightness
  • 5C Shape from shading

6 Image Motion

  • 6A Overview
  • 6B Optical flow

7 Tracking

  • 7A Introduction to tracking
  • 7B Parametric models
  • 7C Non-parametric models
  • 7D Tracking considerations

8 Classification and Recognition

  • 8A Introduction to recognition
  • 8B Classification: Generative models
  • 8C Classification: Discriminative models
  • 8D Action recognition

9 Useful Methods

  • 9A Color spaces and segmentation
  • 9B Binary morphology
  • 9C 3D perception

10 Human Visual System

  • 10A The retina
  • 10B Vision in the brain

GT OMSCS Students

Note: Please refer to your course website/schedule for further details, assignments, etc.

Spring 2015 resources (old):

  • Schedule: Suggested pace, assignments, deadlines, references.
  • Course website: Course information, problem sets, academic policies, grading scheme.
  • Piazza forum: Discussions, announcements, clarifications.
  • T-Square site: Problem set submissions.

Note: This course was previously offered as CS 4495.

Instructors & Partners

Aaron Bobick

Aaron Bobick, PhD, joined Washington University in St. Louis as Dean of the School of Engineering & Applied Science and the James M. McKelvey Professor on July 1, 2015. Prior to Washington University, he was a professor and founding chair of the School of Interactive Computing at the Georgia Institute of Technology, where he had been a member of the faculty since 1999. He has B.Sc. degrees from MIT in Mathematics (1981) and Computer Science (1981) and a Ph.D. from MIT in Cognitive Science (1987). He joined the MIT Media Laboratory faculty in 1992, where he was a pioneer in the area of action recognition by computer vision. In 1999 Prof. Bobick moved to Georgia Tech, where he became the Director of the GVU Center, an internationally known research center in computer vision, graphics, ubiquitous computing, and HCI. In 2005 the School of Interactive Computing was created with Prof. Bobick serving as the founding Chair. Prof. Bobick is both an IEEE Fellow and an ACM Distinguished Scientist. He has served as a senior area chair for most international computer vision conferences, including serving as Program Chair of the IEEE Conference on Computer Vision and Pattern Recognition. He has also served on the advisory boards or boards of directors of a variety of surveillance-focused computer vision and medical imaging technology companies.

Irfan Essa

Irfan Essa is a Professor in the School of Interactive Computing (iC) and Associate Dean in the College of Computing (CoC) at the Georgia Institute of Technology (GA Tech) in Atlanta, Georgia, USA. Professor Essa works in the areas of Computer Vision, Computer Graphics, Computational Perception, Robotics and Computer Animation, Machine Learning, and Social Computing, with potential impact on Video Analysis and Production (e.g., Computational Photography & Video, Image-based Modeling and Rendering, etc.), Human Computer Interaction, Artificial Intelligence, Computational Behavioral/Social Sciences, and Computational Journalism research. He has published over 150 scholarly articles in leading journals and conference venues on these topics, and several of his papers have won best paper awards. He has been awarded the NSF CAREER award and was elected to the grade of IEEE Fellow. He has held extended research consulting positions with Disney Research and Google Research and was also an Adjunct Faculty Member at Carnegie Mellon’s Robotics Institute. He joined the GA Tech faculty in 1996 after earning his MS (1990) and Ph.D. (1994) and holding a research faculty position at the MIT Media Lab (1988-1996).

Arpan Chakraborty

Arpan likes to find computing solutions to everyday problems. He is interested in human-computer interaction, robotics and cognitive science. He obtained his PhD from North Carolina State University, focusing on biologically-inspired computer vision. At Udacity, he spends a good chunk of time designing interactive exercises for his courses, besides working on pet projects to improve or automate workflow.