Chapter 12 – Computer vision

Contents

12.1  Overview
12.2  Introduction
12.2.1  Why Computer Vision Is Difficult
12.2.2  Phases of Computer Vision
12.3  Digitisation and Signal Processing
12.3.1  Digitising Images
12.3.2  Thresholding
12.3.3  Digital Filters
12.3.3.1  Linear Filters
12.3.3.2  Smoothing
12.3.3.3  Gaussian Filters
12.3.3.4  Practical Considerations
12.4  Edge Detection
12.4.1  Identifying Edge Pixels
12.4.1.1  Gradient Operators
12.4.1.2  Robert's Operator
12.4.1.3  Sobel's Operator
12.4.1.4  Laplacian Operator
12.4.1.5  Successive Refinement and Marr's Primal Sketch
12.4.2  Edge Following
12.5  Region Detection
12.5.1  Region Growing
12.5.2  The Problem of Texture
12.5.3  Representing Regions – Quadtrees
12.5.4  Computational Problems
12.6  Reconstructing Objects
12.6.1  Inferring Three-Dimensional Features
12.6.1.1  Problems with Labelling
12.6.2  Using Properties of Regions
12.7  Identifying Objects
12.7.1  Using Bitmaps
12.7.2  Using Summary Statistics
12.7.3  Using Outlines
12.7.4  Using Paths
12.8  Facial and Body Recognition
12.9  Neural Networks for Images
12.9.1  Convolutional Neural Networks
12.9.2  Autoencoders
12.10  Generative Adversarial Networks
12.10.1  Generated Data
12.10.2  Diffusion Models
12.10.3  Bottom-up and Top-down Processing
12.11  Multiple Images
12.11.1  Stereo Vision
12.11.2  Moving Pictures
12.12  Summary

Glossary items referenced in this chapter

accuracy, active vision, ambiguous image, aspect ratio, auto-associative memory, autoencoder, backpropagation, Bayes Theorem, binary image, bitmap image, Boltzmann machine, bottom-up reasoning, camera (pan), camera (zoom), clustering, connectionist model, constraint satisfaction, constraints, contour following, convolutional neural network, convolutions, correlation, crowdsourcing, data structure, database, deep fakes, deep neural network, differential (calculus), diffusion models, digital filtering, digital signal processing, digitisation, edge detection, edge following, emotion recognition, facial recognition, false positive, frame of video, game playing, Gaussian filter, generative adversarial network, geographic information system, geometric constraints, gesture recognition, GIS, Google, GPU, gradient descent, gradient operators, grey-scale image, ground truth, handwriting recognition, heuristic evaluation function, Hopfield networks, human perception, hybrid architecture, image thresholding, image understanding, labelling, Laplacian operator, Laplacian-of-Gaussian filter, line labelling, linear filter, machine learning, Marr's primal sketch, moving images, multiple images, neural network, neural-network architecture, Normal distribution, normalisation, object identification, object identification (bitmaps), object identification (outlines), object identification (paths), object identification (summary statistics), object recognition, OCR (optical character recognition), octree, optical flow, parallax, parallel processing, pattern matching, pen-based systems, position independent, pre-processing, privacy, quadtree, reasoning with uncertainty, receptive field, region detection, region growing, Restricted Boltzmann Machine, ridge, Robert's operator, robotics, segmentation, sensation, sensor fusion, sharpening filters, signal processing, similarity measure, Skynet, smoothing, Sobel's operator, standard deviation, stereo vision, successive refinement, template matching, texture, three-dimensional objects, threshold, time series, unsupervised learning, voxel, Waltz's algorithm, wavelet transform, zero-sum game

Prolog examples (from 1st ed.)

eximages.p

image processing utilities
images from examples in book

image.p

image processing utilities
simple representation of a pixel image

gimage.p

image processing utilities
the 'gimage' representation of a pixel image

filter.p

digital filters
gradient filters

threshold.p

thresholding