Explore the subtopics of low-level visual computing, such as image estimation, de-noising, restoration, and super resolution techniques. Delve into the theories and engineering concepts behind these topics.
Visual Computing Theory and Engineering Topic: Low Level Group 1
Topic: Low-level • Group Members (Group 1): • 车朝晖,李伟,蔡春磊,陈琳,张烨珣,李高磊 • 陈卉,郑策,王敏思,朱文瀚 • Subtopics: • Image estimation • Image de-noising • Image restoration • Super resolution
Outline • ‘Low-level’ and ‘High-level’ • Subtopics of ‘Low-level’: • Image estimation • Image de-noising • Image restoration • Super resolution • Summary
Low-level & High-level • Low-level[1]: • Low-level image processing is mainly concerned with extracting descriptions from images (descriptions that are usually represented as images themselves). • There may be multiple, largely independent descriptions, such as edge fragments, spots, reflectance, line fragments, etc. • High-level: • High-level vision deals with things we can directly see and recognize, such as object classification, recognition, segmentation and so on. [1] Low-level image processing http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MARBLE/low/low.htm
Subtopics of ‘Low-level’: • Image estimation • Image de-noising • Image restoration • Super resolution
Image Estimation • Generative Image Modeling Using Spatial LSTMs • Zhaohui Che 车朝晖 • Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs • Gaolei Li 李高磊
MCGSM: mixture of conditional Gaussian scale mixtures • The pixels are given an ordering, and the distribution of each pixel is specified conditioned on its parent pixels. • SLSTM: spatial long short-term memory • Core parts: memory units c_ij and hidden units h_ij, which at each location (i, j) are updated by gated operations on the causal inputs and the states of the left and upper neighbours.
σ is the logistic sigmoid function, ⊙ denotes a pointwise product, and T_{A,b} is an affine transformation which depends on the only parameters of the network, A and b. The gating units i_ij and o_ij determine which memory units are affected by the inputs through g_ij, and which memory states are written to the hidden units h_ij. RIDE (recurrent image density estimator): uses pixels in a much larger region for prediction, and nonlinearly transforms the pixels before applying the MCGSM.
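To make the gate description above concrete, the following is a minimal NumPy sketch of a single spatial LSTM update at one location. The two forget gates for the left and upper memory states, and the exact layout of the affine transformation T_{A,b}, are assumptions based on the usual spatial LSTM formulation rather than details taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slstm_step(x_causal, h_left, h_above, c_left, c_above, A, b):
    """One spatial LSTM update at location (i, j) -- a sketch, not the paper's exact code.

    x_causal        : features of the causal neighbourhood x_<ij
    h_left, h_above : hidden states of the left / upper neighbours
    c_left, c_above : memory states of the left / upper neighbours
    A, b            : the only parameters of the network, defining T_{A,b}
    """
    d = h_left.shape[0]
    # Affine transformation T_{A,b} of the concatenated inputs, then split into units.
    z = A @ np.concatenate([x_causal, h_left, h_above]) + b
    g   = np.tanh(z[0:d])           # candidate memory content
    o   = sigmoid(z[d:2 * d])       # output gate
    i   = sigmoid(z[2 * d:3 * d])   # input gate
    f_c = sigmoid(z[3 * d:4 * d])   # forget gate for the left memory state (assumption)
    f_r = sigmoid(z[4 * d:5 * d])   # forget gate for the upper memory state (assumption)
    # Inputs affect the memory through g, gated by i; old memories are partly kept.
    c_ij = g * i + c_left * f_c + c_above * f_r
    # The output gate decides which memory states are written to the hidden units.
    h_ij = np.tanh(c_ij * o)
    return c_ij, h_ij
```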
Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs 李高磊
Problems • Depth and surface normal estimation from single monocular color images
Related Methods • Conditional random fields (CRFs). • Regression on deep convolutional neural network (DCNN).
Steps • Depth regression with CNNs. • Refining the results via hierarchical CRF.
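As a rough illustration of the two steps, here is a hedged sketch in which the paper's hierarchical CRF is replaced by a much simpler pairwise quadratic smoothness model over superpixels, solvable in closed form; the affinity weights, the regularisation weight lam, and the superpixel graph are all illustrative assumptions.

```python
import numpy as np
from scipy.sparse import identity, lil_matrix
from scipy.sparse.linalg import spsolve

def refine_depth(cnn_depth, edges, weights, lam=1.0):
    """Refine per-superpixel CNN depth regressions with a pairwise smoothness term.

    Minimises  sum_p (d_p - z_p)^2 + lam * sum_(p,q) w_pq (d_p - d_q)^2,
    a simplified quadratic stand-in for the hierarchical CRF of the paper.
    cnn_depth : (N,) depth z_p regressed by the CNN for each superpixel
    edges     : list of (p, q) index pairs of neighbouring superpixels
    weights   : pairwise affinities w_pq (e.g. colour similarity), same length as edges
    """
    n = len(cnn_depth)
    L = lil_matrix((n, n))                 # graph Laplacian of the smoothness term
    for (p, q), w in zip(edges, weights):
        L[p, p] += w
        L[q, q] += w
        L[p, q] -= w
        L[q, p] -= w
    A = identity(n, format="csr") + lam * L.tocsr()
    return spsolve(A, cnn_depth)           # closed-form minimiser of (I + lam*L) d = z
```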
Experimental Results • NYU2 data set
Conclusions • In this paper, we have presented a new and common framework for depth and surface normal estimation from single monocular images, which consists of regression using deep CNNs and refining via a hierarchical CRF. With this simple framework, we have achieved promising results for both tasks of depth and surface normal estimation. In the future, we plan to investigate different data augmentation methods to improve the performance in handling real-world image transformations. • Furthermore, we plan to explore the use of deeper CNNs. Our preliminary results show that improved depth estimation can be obtained with VGGNet, compared with AlexNet. In addition, the effect of joint depth and semantic class estimation with deep CNN features also deserves attention.
Image de-noising • Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal • Ce Zheng 郑策 • Patch Group Based Nonlocal Self-Similarity Prior Learning for Image De-noising • Wenhan Zhu 朱文瀚
Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal 郑策
Introduction • Image deblurring aims at recovering a sharp image from a blurry image caused by camera shake, object motion or defocus. • Estimate the probabilities of motion kernels at the patch level using a convolutional neural network (CNN). • Fuse the patch-based estimations into a dense field of motion kernels using a Markov random field (MRF) model. • Effectively estimate the spatially varying motion kernels, which enables the motion blur to be removed well.
CNN for Motion Blur Estimation • First predict the probabilities of different motion kernels for each image patch. • Then estimate dense motion blur kernels for the whole image using a Markov random field model enforcing motion smoothness. Representation of motion blur kernel by motion vector and generation of motion kernel candidates
Patch-level Motion Kernel Estimation by CNN Structure of CNN for motion kernel prediction It is composed of six convolutional and fully connected layers, outputting the probability of each candidate motion kernel through a soft-max layer. Motion kernel estimation on a rotated patch
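A minimal PyTorch sketch of such a patch-level motion-kernel classifier is given below. The patch size (30x30), the channel widths, and the number of candidate kernels (73 length/orientation combinations) are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MotionKernelCNN(nn.Module):
    """Sketch of a CNN that outputs a probability for each candidate motion kernel."""

    def __init__(self, num_kernels=73):            # 73 candidate kernels is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 1024), nn.ReLU(),
            nn.Linear(1024, num_kernels),           # one score per candidate kernel
        )

    def forward(self, patch):                       # patch: (B, 1, 30, 30) blurry patches
        logits = self.classifier(self.features(patch))
        return torch.softmax(logits, dim=1)         # soft-max: per-kernel probabilities

# probs = MotionKernelCNN()(torch.randn(8, 1, 30, 30))   # (8, 73) kernel probabilities
```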
Dense Motion Field Estimation by MRF The left of (b) shows four blurry patches cropped from (a). Each color map on the right of (b) shows the probabilities of motion kernels at different motion lengths and orientations estimated for each blurry patch by the CNN. Note that the high-probability regions are local in each map. (c) shows our final motion kernel estimation. Examples of motion kernel probabilities
Dense Motion Field Estimation by MRF Example of non-uniform motion kernel estimation (b) Estimation using the unary term of Eqn.(6), i.e., choosing the motion kernel with highest confidence for each pixel. (c) Estimation using the full model of Eqn.(6) with motion smoothness constraint. (d) Ground-truth motion blur.
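For concreteness, the kind of MRF energy being minimised can be sketched as below: a unary term built from the CNN kernel confidences plus a motion-smoothness term between neighbouring pixels. The exact form of Eqn.(6) differs; the quadratic penalty and the weight lam are assumptions.

```python
import numpy as np

def mrf_energy(labels, unary_conf, candidates, lam=1.0):
    """Energy of a dense motion-kernel labelling (simplified stand-in for Eqn.(6)).

    labels     : (H, W) index of the chosen candidate kernel at each pixel
    unary_conf : (H, W, K) CNN-derived confidence of each candidate kernel per pixel
    candidates : (K, 2) motion vectors (u, v) of the K candidate kernels
    """
    H, W = labels.shape
    rows, cols = np.indices((H, W))
    # Unary term: negative confidence of the kernel selected at each pixel.
    unary = -unary_conf[rows, cols, labels].sum()
    # Pairwise term: penalise differing motion vectors between 4-connected neighbours.
    mv = candidates[labels]                                   # (H, W, 2) motion field
    smooth = np.sum((mv[1:, :] - mv[:-1, :]) ** 2) \
           + np.sum((mv[:, 1:] - mv[:, :-1]) ** 2)
    return unary + lam * smooth
```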
Experiments Figure 9 presents four examples with strongly non-uniform motion blur captured for scenes with complex depth layers. The first three examples are real-captured blurry images, and the final example is a synthetic blurry image. All these examples show that our CNN-based approach can effectively predict the spatially varying motion kernels.
Conclusion In this paper, we have proposed a novel CNN-based non-uniform motion deblurring approach. We learn an effective CNN for estimating motion kernels from local patches. Using an MRF model, we are able to well predict the non-uniform motion blur field. This leads to state-of-the-art motion deblurring results.
Patch Group Based Nonlocal Self-Similarity Prior Learning for Image Denoising Wenhan Zhu
Purpose • Background: there is no explicit NSS prior model learned from natural images for image restoration. • Major work: in this paper, they propose to learn explicit NSS models from natural images and apply the learned prior models to noisy images for high-performance denoising. • They develop a patch group (PG) based NSS prior learning scheme to improve the performance of image restoration.
Flowchart of PGPD • Flowchart of the proposed patch group based prior learning and image de-noising framework.
Patch group • A PG is formed by grouping the M most similar patches, denoted by {x_m}, m = 1, …, M. • The mean vector of this PG is μ = (1/M) Σ_m x_m. • x̄_m = x_m − μ is the group-mean-subtracted patch vector. • The set {x̄_m} is called the PG.
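A minimal NumPy sketch of this PG construction (block matching of the M most similar patches followed by group-mean subtraction) is shown below; the NSS prior that PGPD then learns over these PGs is not reproduced here.

```python
import numpy as np

def build_patch_group(patches, ref_idx, M):
    """Form a patch group (PG) around a reference patch.

    patches : (N, d) vectorised image patches from a local search window
    ref_idx : index of the reference patch
    M       : number of similar patches to group
    Returns the group-mean-subtracted patch vectors {x_m - mu}.
    """
    ref = patches[ref_idx]
    dists = np.sum((patches - ref) ** 2, axis=1)   # squared Euclidean patch distances
    idx = np.argsort(dists)[:M]                    # indices of the M most similar patches
    group = patches[idx]                           # {x_m}, m = 1..M
    mu = group.mean(axis=0)                        # mean vector of the PG
    return group - mu                              # the PG: mean-subtracted patch vectors
```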
Results: • Compare the proposed PGPD algorithm with BM3D, EPLL, LSSC, NCSR and WNNM. • PGPD has higher PSNR values than BM3D, LSSC, EPLL and NCSR, and is only slightly inferior to WNNM. However, PGPD is much more efficient than WNNM. • In summary, the proposed PGPD method demonstrates powerful de-noising ability quantitatively and qualitatively, and it is highly efficient.
Image Restoration • Conformal and Low-Rank Sparse Representation for Image Restoration • Yexun Zhang 张烨珣
Conformal and Low-Rank Sparse Representation for Image Restoration Yexun Zhang 张烨珣
Objective: Image Restoration • Method: Obtaining an appropriate dictionary is the key point when sparse representation is applied to computer vision or image processing problems such as image restoration. • Opportunity: Many existing dictionary learning methods handle training samples individually and miss the relationships between samples, which results in dictionaries with redundant atoms but poor representation ability.
Sparse Representation: a model which suggests that there exists a dictionary that can reconstruct the signals. Each signal can be represented by a sparse linear combination of atoms in the dictionary. • How to obtain the dictionary? • Analytically predefined dictionaries • Dictionaries learned from training samples
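A small scikit-learn sketch of the second option: learn an overcomplete dictionary from training samples, then code a new signal as a sparse combination of its atoms. The random data and all sizes are purely illustrative.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 64))        # 200 training signals of dimension 64

# Learn an overcomplete dictionary (128 atoms for 64-dimensional signals).
learner = DictionaryLearning(n_components=128, max_iter=10, random_state=0)
learner.fit(X_train)
D = learner.components_.T                       # dictionary D of shape (64, 128)

x = rng.standard_normal(64)                     # a new signal to be represented
alpha = orthogonal_mp(D, x, n_nonzero_coefs=5)  # sparse coding vector (5 active atoms)
x_hat = D @ alpha                               # reconstruction x ≈ D @ alpha
```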
Early dictionary learning methods • K-SVD algorithm • Focuses on the reconstruction power of the dictionary • Depends on a large training dataset • Fixed dictionary size, leading to dictionary redundancy • Methods that add discrimination constraints • Train samples individually or only consider the discrimination between classes • Ignore local relationships between samples or the structure of the data manifold, leading to redundant dictionaries with poor representation ability
Consider the relationship • Local perspective: • Local samples have similar features and form a local subspace, reflecting the affinities among them. • Global perspective: • Samples with similar features are linearly related, and thus they lie on a low-dimensional latent space. Key: How to embed these two relationships into dictionary learning and sparse representation?
The paper’s work • Conformal and Low-rank Sparse Representation (CLRSR) • The conformal property is introduced by preserving the angles of the local geometry formed by neighboring samples in the feature space. • Imposing a low-rank constraint on the coefficient matrix can lead to more faithful subspaces and capture the global structure of the data.
The data's inner structures can be better modelled through Conformal Eigenmaps, which projects data from a high-dimensional space to a low-dimensional manifold while preserving the angles formed by neighboring samples. • These angle relationships are called the conformal property. • Enforce the coefficient matrix to be low-rank to involve the global structure of the data. • The samples extracted from an image/video are relevant to each other, thus these samples lie on low-rank subspaces. • Samples having similar features will have similar sparse representations, resulting in similar coding coefficients. Therefore, the coefficient matrix A is expected to be low-rank.
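To illustrate how a low-rank constraint on the coefficient matrix A is typically enforced in practice, here is a sketch of singular value soft-thresholding, the proximal operator of the nuclear norm; the full CLRSR objective also contains sparsity and conformal terms that are not reproduced here.

```python
import numpy as np

def singular_value_threshold(A, tau):
    """Proximal operator of the nuclear norm: shrink the singular values of A by tau.

    Used inside iterative solvers to push a coefficient matrix towards low rank.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)       # soft-threshold the singular values
    return (U * s_shrunk) @ Vt                # lower-rank approximation of A
```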
Super Resolution • Convolutional Sparse Coding for Image Super-resolution • Lin Chen 陈琳,Chunlei Cai 蔡春磊 • Deep Networks for Image Super-Resolution with Sparse Prior • Minsi Wang 王敏思 • Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution • Wei Li 李伟 • Video Super-Resolution via Deep Draft-Ensemble Learning • Hui Chen 陈卉
Convolutional Sparse Coding for Image Super-resolution 2015 IEEE International Conference on Computer Vision Reporter: Chunlei Cai 蔡春磊 Lin Chen 陈琳
Some Fundamental Concepts • Definitions: • Super-resolution (SR): the purpose of super-resolution is to reconstruct a high resolution (HR) image from a single low resolution (LR) image or a sequence of LR images. • Sparse coding (SC): sparse representation encodes a signal vector x as the linear combination of a few atoms in a dictionary D, i.e., x ≈ Dα, where α is the sparse coding vector.
Related Work • Single Image Super Resolution (SISR) methods: • Most SISR methods utilize prior knowledge on image patches and can be grouped into three categories: example-based methods, mapping-based methods, and sparse coding based methods. • What is the disadvantage of these methods? These methods partition the image into overlapped patches and process each patch separately; they ignore the consistency of pixels in overlapped patches, which is a strong constraint for image reconstruction.
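For reference, the patch-wise sparse coding SR scheme that CSC improves on can be sketched as follows; the coupled LR/HR dictionaries D_l and D_h sharing a single sparse code follow the classical SC-based SISR formulation and are an assumption here, not part of the CSC method itself.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def sc_super_resolve_patch(y_lr, D_l, D_h, sparsity=3):
    """Classical sparse-coding SR for one patch (for comparison with CSC).

    y_lr      : vectorised LR patch (or its feature vector)
    D_l, D_h  : coupled LR / HR dictionaries assumed to share the same sparse codes
    """
    alpha = orthogonal_mp(D_l, y_lr, n_nonzero_coefs=sparsity)  # code on the LR dictionary
    return D_h @ alpha                                          # reconstruct the HR patch
```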
The Contribution of this Paper • Contribution: • Propose a convolutional sparse coding (CSC) based SR method. Compared with conventional sparse coding methods, which process each overlapped patch independently, the global decomposition strategy in CSC is more suitable for image reconstruction. • Train a sparse mapping function. To take full advantage of the feature maps generated by the convolutional coding, this paper utilizes the feature-space information to train a sparse mapping function. Such a mechanism reduces the number of filters used to decompose the LR input image. • Experiments show better results. Experiments on commonly used test images show that the proposed method achieves SR results that are very competitive with the state-of-the-art methods, not only in PSNR but also in visual quality.
Details About the Proposed Algorithm • Flowchart: In order to obtain sparser feature maps, this paper decomposes the LR image into one smooth component and one residual component before SR. The smooth component is simply enlarged by the bicubic interpolator, and the proposed CSC-SR model is performed on the residual component.
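A minimal sketch of this decomposition step: split the LR image into a smooth and a residual component and upscale only the smooth part with cubic interpolation. The Gaussian low-pass filter and its sigma are assumptions, and the CSC-based SR of the residual component is not implemented here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def decompose_lr_image(lr_image, scale=2, sigma=1.5):
    """Split an LR image into smooth + residual components, as in the flowchart.

    Returns the upscaled smooth component and the LR residual component that the
    CSC-based SR model would reconstruct.
    """
    smooth = gaussian_filter(lr_image, sigma=sigma)   # low-frequency (smooth) component
    residual = lr_image - smooth                      # high-frequency residual component
    smooth_hr = zoom(smooth, scale, order=3)          # cubic-spline upscaling (≈ bicubic)
    return smooth_hr, residual
```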
Details About the Proposed Algorithm • Algorithm: the corresponding learning models (Eqns. 1, 2 and 4)
Experimental Results • Convergence Analysis: In most experiments, the algorithm converges within 10 iterations.