Explore the subtopics of low-level visual computing, such as image estimation, de-noising, restoration, and super resolution techniques. Delve into the theories and engineering concepts behind these topics.
Visual Computing Theory and Engineering Topic: Low Level Group 1
Topic: Low-level • Group Members (Group 1): • 车朝晖,李伟,蔡春磊,陈琳,张烨珣,李高磊 • 陈卉,郑策,王敏思,朱文瀚 • Subtopics: • Image estimation • Image de-noising • Image restoration • Super-resolution
Outline • ‘Low-level’ and ‘High-level’ • Subtopics of ‘Low-level’: • Image estimation • Image de-noising • Image restoration • Super-resolution • Summary
Low-level & High-level • Low-level [1]: • Low-level image processing is mainly concerned with extracting descriptions from images; these descriptions are usually represented as images themselves. • There may be multiple, largely independent descriptions, such as edge fragments, spots, reflectance, and line fragments. • High-level: • High-level processing deals with things we can directly see and recognize; typical tasks are object classification, recognition, segmentation, and so on. [1] Low-level image processing, http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/MARBLE/low/low.htm
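As a concrete example of a low-level description that is itself an image, the sketch below computes an edge-magnitude map with Sobel filters. This is a generic illustration, not taken from the cited page; `image` is a placeholder grayscale array.

```python
import numpy as np
from scipy import ndimage

def edge_magnitude(image: np.ndarray) -> np.ndarray:
    """Low-level description: an edge-strength map the same size as the input."""
    gx = ndimage.sobel(image.astype(float), axis=1)  # horizontal gradient
    gy = ndimage.sobel(image.astype(float), axis=0)  # vertical gradient
    return np.hypot(gx, gy)

# Illustrative input: a synthetic grayscale image (bright square on a dark background).
image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0
edges = edge_magnitude(image)  # strong responses along the square's border
```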
Subtopics of ‘Low-level’: • Image estimation • Image de-noising • Image restoration • Super-resolution
Image Estimation • Generative Image Modeling Using Spatial LSTMs • Zhaohui Che 车朝晖 • Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs • Gaolei Li 李高磊
MCGSM: mixture of conditional Gaussian scale mixtures • The pixels are given an ordering, and the distribution of each pixel is specified conditioned on its parent pixels. • SLSTM: spatial long short-term memory • Core part: memory units c(ij) and hidden units h(ij). For each location (i,j), the operation is:
σ is the logistic sigmoid function, ⊙ denotes a pointwise (element-wise) product, and T(A,b) is an affine transformation which depends on the only parameters of the network, A and b. The gating units i(ij) and o(ij) determine which memory units are affected by the inputs through g(ij), and which memory states are written to the hidden units h(ij). RIDE (recurrent image density estimator): uses pixels in a much larger region for prediction and nonlinearly transforms the pixels before applying the MCGSM.
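A minimal NumPy sketch of one SLSTM step in the spirit of the description above, assuming the usual LSTM-style gating: sigmoid gates i and o, two forget gates for the left and upper memory states, and a tanh candidate g, all computed from the affine map T(A,b) of the causal inputs. The paper's exact parameterization may differ, so treat this purely as an illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slstm_step(x, h_left, h_up, c_left, c_up, A, b):
    """One spatial-LSTM update at location (i, j).

    Illustrative gate layout (an assumption, not the paper's exact definition).
    x            : causal input features at (i, j), e.g. neighboring pixel values
    h_left, h_up : hidden states of the left and upper neighbors
    c_left, c_up : corresponding memory states
    A, b         : the affine transformation T(A, b); A maps the concatenated
                   inputs to 5 * d outputs, where d is the hidden-state size
    """
    d = h_left.shape[0]
    z = A @ np.concatenate([x, h_left, h_up]) + b   # T(A,b)(x, h_left, h_up)
    g = np.tanh(z[0:d])                             # candidate values
    o = sigmoid(z[d:2 * d])                         # output gate
    i = sigmoid(z[2 * d:3 * d])                     # input gate
    f_col = sigmoid(z[3 * d:4 * d])                 # forget gate for the left memory
    f_row = sigmoid(z[4 * d:5 * d])                 # forget gate for the upper memory
    c = g * i + c_left * f_col + c_up * f_row       # new memory unit c(ij)
    h = np.tanh(c * o)                              # new hidden unit h(ij)
    return c, h
```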
Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs 李高磊
Problems • Depth and surface normal estimation from a single monocular color image
Related Methods • Conditional random fields (CRFs). • Regression on deep convolutional neural networks (DCNNs).
Steps • Depth regression with CNNs. • Refining the results via hierarchical CRF.
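The refinement step can be pictured, in a heavily simplified form, as minimizing a quadratic energy that keeps each (super)pixel's depth close to the CNN prediction while encouraging neighboring depths to agree. The sketch below solves such an energy through a sparse linear system; it is a generic stand-in for the paper's hierarchical CRF, and `cnn_depth`, `edges`, and `weights` are hypothetical inputs.

```python
import numpy as np
from scipy.sparse import identity, coo_matrix
from scipy.sparse.linalg import spsolve

def refine_depth(cnn_depth, edges, weights, lam=1.0):
    """Minimize sum_i (d_i - z_i)^2 + lam * sum_(i,j) w_ij (d_i - d_j)^2.

    Simplified CRF-like smoothing, not the paper's actual hierarchical CRF.
    cnn_depth : (N,) unary depth predictions z_i from the regression CNN
    edges     : list of (i, j) index pairs of neighboring (super)pixels
    weights   : matching pairwise affinities w_ij
    """
    n = cnn_depth.shape[0]
    rows, cols, vals = [], [], []
    for (i, j), w in zip(edges, weights):
        # Graph Laplacian entries for the pairwise smoothness term.
        rows += [i, j, i, j]
        cols += [i, j, j, i]
        vals += [w, w, -w, -w]
    L = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    A = identity(n, format="csr") + lam * L
    return spsolve(A, cnn_depth)   # closed-form minimizer of the quadratic energy

# Hypothetical usage on a 4-superpixel chain:
z = np.array([1.0, 1.2, 3.0, 3.1])
d = refine_depth(z, edges=[(0, 1), (1, 2), (2, 3)], weights=[1.0, 0.1, 1.0])
```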
Experimental Results • NYU2 data set
Conclusions • In this paper, we have presented a new and common framework for depth and surface normal estimation from single monocular images, which consists of regression using deep CNNs and refining via a hierarchical CRF. With this simple framework, we have achieved promising results for both tasks of depth and surface normal estimation. In the future, we plan to investigate different data augmentation methods to improve the performance in handling real-world image transformations. • Furthermore, we plan to explore the use of deeper CNNs. Our preliminary results show that improved depth estimation can be obtained with VGGNet, compared with AlexNet. In addition, the effect of joint depth and semantic class estimation with deep CNN features also deserves attention.
Image de-noising • Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal • Ce Zheng 郑策 • Patch Group Based Nonlocal Self-Similarity Prior Learning for Image De-noising • Wenhan Zhu 朱文瀚
Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal 郑策
Introduction • Image deblurring aims at recovering a sharp image from a blurry image caused by camera shake, object motion, or defocus. • Estimate the probabilities of motion kernels at the patch level using a convolutional neural network (CNN). • Fuse the patch-based estimations into a dense field of motion kernels using a Markov random field (MRF) model. • This effectively estimates the spatially varying motion kernels, which enables us to remove the motion blur well.
CNN for Motion Blur Estimation • First predict the probabilities of different motion kernels for each image patch. • Then estimate dense motion blur kernels for the whole image using a Markov random field model enforcing motion smoothness. Representation of motion blur kernel by motion vector and generation of motion kernel candidates
Patch-level Motion Kernel Estimation by CNN Structure of the CNN for motion kernel prediction: it is composed of 6 convolutional and fully connected layers, outputting the probability of each candidate motion kernel through a soft-max layer. Motion kernel estimation on a rotated patch
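A compact PyTorch sketch of a patch classifier in this spirit: convolutional layers followed by fully connected layers and a soft-max over the candidate motion kernels. The layer sizes, the 30x30 patch size, and the 73 candidate kernels are placeholder choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MotionKernelCNN(nn.Module):
    """Predicts a probability for each candidate motion kernel of a blurry patch.

    Illustrative architecture only; layer sizes are assumptions.
    """

    def __init__(self, num_kernels: int = 73, patch_size: int = 30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(),   # 30 -> 26
            nn.MaxPool2d(2),                              # 26 -> 13
            nn.Conv2d(32, 64, kernel_size=4), nn.ReLU(),  # 13 -> 10
            nn.MaxPool2d(2),                              # 10 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 5 * 5, 256), nn.ReLU(),
            nn.Linear(256, num_kernels),                  # one score per candidate kernel
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(patches))
        return torch.softmax(logits, dim=1)               # kernel probabilities per patch

# Hypothetical usage: probabilities for a batch of eight 30x30 color patches.
probs = MotionKernelCNN()(torch.randn(8, 3, 30, 30))      # shape (8, 73)
```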
Dense Motion Field Estimation by MRF The left of (b) shows four blurry patches cropped from (a). Each color map on the right of (b) shows the probabilities of motion kernels with different motion lengths and orientations, estimated for each blurry patch by the CNN. Note that the high-probability regions are localized in each map. (c) shows our final motion kernel estimation. Examples of motion kernel probabilities
Dense Motion Field Estimation by MRF Example of non-uniform motion kernel estimation (b) Estimation using the unary term of Eqn.(6), i.e., choosing the motion kernel with highest confidence for each pixel. (c) Estimation using the full model of Eqn.(6) with motion smoothness constraint. (d) Ground-truth motion blur.
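One way to read the MRF fusion step: each pixel carries a unary cost derived from the CNN probabilities (negative log-probability) plus a pairwise cost that penalizes neighboring pixels choosing dissimilar kernels. The sketch below minimizes such an energy with iterated conditional modes (ICM); this is a generic MRF solver for illustration, not the optimization used in the paper.

```python
import numpy as np

def icm_motion_field(unary, pairwise, lam=1.0, iters=5):
    """Approximately minimize an MRF energy over a dense label field (generic ICM).

    unary    : (H, W, K) costs, e.g. -log of the CNN kernel probabilities
    pairwise : (K, K) cost between kernel labels of 4-connected neighbors
    Returns an (H, W) array of kernel indices.
    """
    H, W, K = unary.shape
    labels = unary.argmin(axis=2)                 # start from the unary-only solution
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                cost = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        cost += lam * pairwise[:, labels[ny, nx]]
                labels[y, x] = cost.argmin()      # greedy update of one pixel's label
    return labels
```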
Experiments Figure 9 presents four examples with strongly non-uniform motion blur captured for scenes with complex depth layers. The first three examples are real-captured blurry images, and the final example is a synthetic blurry image. All these examples show that our CNN-based approach can effectively predict the spatially varying motion kernels.
Conclusion In this paper, we have proposed a novel CNN-based non-uniform motion deblurring approach. We learn an effective CNN for estimating motion kernels from local patches. Using an MRF model, we are able to well predict the non-uniform motion blur field. This leads to state-of-the-art motion deblurring results.
Patch Group Based Nonlocal Self-Similarity Prior Learning for Image Denoising Wenhan Zhu
Purpose • Background: There is no explicit NSS prior model learned from natural images for image restoration. • Major work: In this paper, the authors propose to learn explicit NSS models from natural images and apply the learned prior models to noisy images for high-performance denoising. • They develop a patch group based NSS prior learning scheme to improve the performance of image restoration.
Flowchart of PGPD • Flowchart of the proposed patch group based prior learning and image de-noising framework.
Patch group • A patch group (PG) is formed by grouping the M most similar patches, denoted {x(m)}, m = 1, ..., M. • The mean vector of this PG is μ = (1/M) Σ x(m). • The group-mean-subtracted patch vector is x̄(m) = x(m) − μ, and the set of these vectors is called the mean-subtracted PG.
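A small NumPy sketch of how a PG could be formed: take the M patches most similar to a reference patch, compute the group mean, and subtract it. Patch extraction and the search window are simplified placeholders rather than the PGPD paper's exact settings.

```python
import numpy as np

def build_patch_group(patches, ref_index, M=10):
    """Form a PG: the M patches most similar to patches[ref_index], mean-subtracted.

    Simplified illustration; the paper's search window and patch size are not reproduced.
    patches : (N, d) array of vectorized candidate patches from a search window
    Returns (group, mean): the mean-subtracted PG and its mean vector.
    """
    dists = np.sum((patches - patches[ref_index]) ** 2, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:M]                               # M most similar patches
    group = patches[nearest]
    mean = group.mean(axis=0)                                     # PG mean vector
    return group - mean, mean                                     # group-mean-subtracted PG

# Hypothetical usage with 200 random 8x8 patches flattened to length-64 vectors.
patches = np.random.rand(200, 64)
pg, mu = build_patch_group(patches, ref_index=0, M=10)
```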
Results: • Compare the proposed PGPD algorithm with BM3D, EPLL, LSSC, NCSR and WNNM. • PGPD has higher PSNR values than BM3D, LSSC, EPLL and NCSR, and is only slightly inferior to WNNM. However, PGPD is much more efficient than WNNM. • In summary, the proposed PGPD method demonstrates powerful de-noising ability quantitatively and qualitatively, and it is highly efficient.
Image Restoration • Conformal and Low-Rank Sparse Representation for Image Restoration • Yexun Zhang 张烨珣
Conformal and Low-Rank Sparse Representation for Image Restoration Yexun Zhang 张烨珣
Objective: Image Restoration • Method: Obtaining an appropriate dictionary is the key point when sparse representation is applied to computer vision or image processing problems such as image restoration. • Opportunity: Many existing dictionary learning methods handle training samples individually and miss the relationships between samples, which results in dictionaries with redundant atoms but poor representation ability.
Sparse Representation: a model which suggests that there exists a dictionary that can reconstruct the signals; each signal can be represented by a sparse linear combination of atoms in the dictionary. • How to obtain the dictionary? Either use an analytically predefined dictionary, or learn the dictionary from training samples.
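To make x ≈ Dα concrete, the sketch below codes a signal over a given dictionary with orthogonal matching pursuit from scikit-learn; the random dictionary and the sparsity level are illustrative choices only, not tied to any of the papers above.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))                 # illustrative dictionary: 256 atoms of dimension 64
D /= np.linalg.norm(D, axis=0)                     # unit-norm atoms

# Build a signal that truly is a sparse combination of 5 atoms.
alpha_true = np.zeros(256)
alpha_true[rng.choice(256, size=5, replace=False)] = rng.standard_normal(5)
x = D @ alpha_true

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5)
omp.fit(D, x)                                      # solves x ~= D @ alpha with 5 nonzeros
alpha = omp.coef_
reconstruction_error = np.linalg.norm(x - D @ alpha)
```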
Early dictionary learning methods • K-SVD algorithm • Focuses on the reconstruction power of the dictionary • Depends on a large training dataset • Fixed dictionary size, leading to dictionary redundancy • Methods adding discrimination constraints • Only train samples individually or consider the discrimination between classes • Ignore local relationships between samples and the structure of the data manifold, leading to redundant dictionaries with poor representation ability
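For reference, a dictionary can also be learned from data rather than predefined; the sketch below uses scikit-learn's MiniBatchDictionaryLearning as a generic stand-in for K-SVD-style learning (it is not K-SVD itself, and the patch data are random placeholders).

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Placeholder training set: 1000 vectorized 8x8 patches.
patches = np.random.rand(1000, 64)
patches -= patches.mean(axis=1, keepdims=True)       # remove the DC component

learner = MiniBatchDictionaryLearning(
    n_components=128,          # fixed dictionary size, as criticized above
    transform_algorithm="omp", # sparse-code samples with OMP
    transform_n_nonzero_coefs=5,
    random_state=0,
)
codes = learner.fit(patches).transform(patches)      # sparse coefficients, shape (1000, 128)
D = learner.components_                              # learned dictionary, shape (128, 64)
```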
Consider the relationship • local perspective: • Local samples have similar features and form a local subspace, reflecting the affinities between each other. • Global perspective: • Samples with similar features are linearly related, and thus they lie on a low-dimensional latent space. Key: How to embed these two relationships into dictionary learning and sparse representation?
The paper’s work • Conformal and Low-Rank Sparse Representation (CLRSR) • The conformal property is introduced by preserving the angles of the local geometry formed by neighboring samples in the feature space. • Imposing a low-rank constraint on the coefficient matrix can lead to more faithful subspaces and capture the global structure of the data.
The inner structures of the data can be better modelled through Conformal Eigenmaps, which project data from a high-dimensional space to a low-dimensional manifold while preserving the angles formed by neighboring samples. • These angle relationships are called the conformal property. • Enforce the coefficient matrix to be low-rank to capture the global structure of the data. • The samples extracted from images/videos are related to each other, so these samples lie on low-rank subspaces. • Samples with similar features will have similar sparse representations, resulting in similar coding coefficients. Therefore, the coefficient matrix A is expected to be low-rank.
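The low-rank constraint on the coefficient matrix is commonly relaxed to a nuclear-norm penalty, whose proximal step is singular value soft-thresholding; the sketch below shows that single building block, not the full CLRSR optimization.

```python
import numpy as np

def singular_value_threshold(A, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)              # soft-threshold the spectrum
    return (U * s_shrunk) @ Vt                       # lower-rank approximation of A

# Hypothetical coefficient matrix: nearly rank-2 plus noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
A += 0.01 * rng.standard_normal((50, 40))
A_lowrank = singular_value_threshold(A, tau=0.5)
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A_lowrank, tol=1e-3))
```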
Super Resolution • Convolutional Sparse Coding for Image Super-resolution • Lin Chen 陈琳,Chunlei Cai 蔡春磊 • Deep Networks for Image Super-Resolution with Sparse Prior • Minsi Wang 王敏思 • Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution • Wei Li 李伟 • Video Super-Resolution via Deep Draft-Ensemble Learning • Hui Chen 陈卉
Convolutional Sparse Coding for Image Super-resolution 2015 IEEE International Conference on Computer Vision Reporter: Chunlei Cai 蔡春磊 Lin Chen 陈琳
Some Fundamental Concepts • Definitions: • Super-resolution (SR): the purpose of super-resolution is to reconstruct a high-resolution (HR) image from a single low-resolution (LR) image or a sequence of LR images. • Sparse coding (SC): sparse representation encodes a signal vector x as a linear combination of a few atoms in a dictionary D, i.e., x ≈ Dα, where α is the sparse coding vector.
Related Work • Single image super-resolution (SISR) methods: • Most SISR methods utilize prior knowledge about image patches and can be grouped into three categories: example-based methods, mapping-based methods, and sparse-coding-based methods. • What is the disadvantage of these methods? They partition the image into overlapped patches and process each patch separately, ignoring the consistency of pixels in overlapped patches, which is a strong constraint for image reconstruction.
The Contributions of this Paper • Contributions: • Propose a convolutional sparse coding (CSC) based SR method. Compared with conventional sparse coding methods, which process each overlapped patch independently, the global decomposition strategy in CSC is more suitable for image reconstruction. • Train a sparse mapping function. To take full advantage of the feature maps generated by the convolutional coding, the paper uses the feature-space information to train a sparse mapping function; this mechanism reduces the number of filters needed to decompose the LR input image. • Experiments show better results. Experiments on commonly used test images show that the proposed method achieves very competitive SR results compared with state-of-the-art methods, not only in PSNR but also in visual quality.
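The CSC model behind the first contribution decomposes the whole image globally as x ≈ Σ_k f_k * z_k, i.e., a sum of filters convolved with sparse feature maps. The sketch below only evaluates that reconstruction for given (randomly generated, placeholder) filters and feature maps; solving for the feature maps requires an iterative CSC solver that is not shown.

```python
import numpy as np
from scipy.signal import convolve2d

def csc_reconstruct(feature_maps, filters):
    """Reconstruct an image as the sum over k of (filter_k convolved with map_k)."""
    return sum(
        convolve2d(z, f, mode="same") for z, f in zip(feature_maps, filters)
    )

# Placeholder example: 8 sparse feature maps (64x64) and 8 small 5x5 filters.
rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 5, 5))
feature_maps = rng.standard_normal((8, 64, 64))
feature_maps *= rng.random((8, 64, 64)) < 0.05        # keep only ~5% nonzeros (sparsity)
image_estimate = csc_reconstruct(feature_maps, filters)   # shape (64, 64)
```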
Details of the Proposed Algorithm • Flowchart: In order to obtain sparser feature maps, the paper decomposes the LR image into one smooth component and one residual component before SR. The smooth component is simply enlarged by bicubic interpolation, and the proposed CSC-SR model is applied to the residual component.
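A minimal sketch of the decomposition step in the flowchart: split the LR image into a smooth component and a residual, upscale the smooth part with (bi)cubic interpolation, and leave the residual for the CSC-SR model. The Gaussian low-pass filter here is an assumed stand-in for whatever smoothing the paper actually uses.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def decompose_and_upscale(lr_image, scale=2, sigma=1.5):
    """Split an LR image into smooth + residual and cubic-upscale the smooth part.

    The Gaussian smoothing is an assumption, not the paper's exact decomposition.
    """
    smooth = gaussian_filter(lr_image, sigma=sigma)        # low-frequency component
    residual = lr_image - smooth                           # component fed to the CSC-SR model
    smooth_hr = zoom(smooth, scale, order=3)               # order=3: cubic spline interpolation
    return smooth_hr, residual

# Hypothetical usage; the HR residual would come from the CSC-SR model (not shown).
lr = np.random.rand(32, 32)
smooth_hr, residual = decompose_and_upscale(lr, scale=2)
```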
Details of the Proposed Algorithm • Algorithm and the corresponding learning models (Eqns. 1, 2 and 4 of the paper).
Experimental Results • Convergence Analysis: In most experiments, the algorithm converges within 10 iterations.