Visual Computing Theory and Engineering. Topic: Descriptors. Group Members: 马悦, 郭超世, 胡欢武, 刘国超, 宋志超, 王丹, 肖勖, 徐阳, 杨一诚, 朱璐瑶, 白立勋.
Descriptors • In computer vision, visual descriptors (or image descriptors) are descriptions of the visual features of the content of images or videos, or the algorithms and applications that produce such descriptions. • They describe elementary characteristics such as shape, color, texture, or motion, among others.
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching 马悦
Introduction • MatchNet: a patch-matching system • We propose and evaluate a unified approach for patch-based image matching that jointly learns: • A deep convolutional neural network for local patch representation • A network for robust feature comparison
Contributions • A new state-of-the-art system for patch-based matching using deep convolutional networks that significantly improves on the previous results. • Improved performance over the previous state of the art using smaller descriptors. • Provide a public release of MatchNet trained using our own large collection of patches.
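The two-component design above (a feature tower producing a descriptor, plus a learned metric network that compares two descriptors) can be illustrated with a toy numpy forward pass. This is a minimal sketch, not the paper's architecture: the layer sizes, single-layer "tower", and parameter names (`W`, `V`, `v_out`) are all illustrative stand-ins for the deep convolutional and fully connected stacks MatchNet actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_tower(patch, W):
    """Toy stand-in for the convolutional feature tower:
    flattens a patch and applies one ReLU layer to get a descriptor."""
    h = W @ patch.ravel()
    return np.maximum(h, 0.0)

def metric_net(f1, f2, V, v_out):
    """Toy stand-in for the metric network: concatenates the two
    descriptors and maps them to a match probability."""
    h = np.maximum(V @ np.concatenate([f1, f2]), 0.0)
    z = v_out @ h
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid -> probability of a match

# Hypothetical sizes: 8x8 patches, 16-D descriptors.
W = rng.standard_normal((16, 64)) * 0.1
V = rng.standard_normal((32, 32)) * 0.1
v_out = rng.standard_normal(32) * 0.1

p1, p2 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
score = metric_net(feature_tower(p1, W), feature_tower(p2, W), V, v_out)
print(score)  # a value in (0, 1)
```

The key point the sketch captures is that both patches pass through the *same* tower (shared weights), and the comparison is itself a learned network rather than a fixed distance such as L2.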
Local Convolutional Features with Unsupervised Training for Image Retrieval 115034910135 王丹
Summary • Aim: patch matching (stereo) and content-based image retrieval • Contribution: patch-level descriptors (Patch-CKN) learned in an unsupervised framework • Dataset: RomePatches
RomePatches • Top: examples of matching patches • Bottom: Images of the same bundle
Image Retrieval Pipeline • Interest point detection: the Hessian-Affine detector, stable under changes in viewpoint or illumination • Interest point description: map a normalized patch M to a feature representation φ(M) in a Euclidean space • Patch aggregation: build a fixed-length image descriptor using the VLAD representation
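The last pipeline stage, VLAD aggregation, turns a variable number of local patch descriptors into one fixed-length image vector. A minimal numpy sketch of standard VLAD (nearest-centroid assignment, residual accumulation, L2 normalization); the descriptor and codebook sizes are illustrative:

```python
import numpy as np

def vlad(descriptors, centroids):
    """VLAD: assign each local descriptor to its nearest centroid,
    accumulate the residuals per centroid, then L2-normalise the
    flattened result."""
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)            # nearest visual word per descriptor
    K, D = centroids.shape
    v = np.zeros((K, D))
    for i, k in enumerate(assign):
        v[k] += descriptors[i] - centroids[k]
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

rng = np.random.default_rng(1)
descs = rng.standard_normal((100, 8))     # e.g. 100 patch descriptors
cents = rng.standard_normal((4, 8))       # K = 4 visual words
image_desc = vlad(descs, cents)
print(image_desc.shape)  # (32,) = K * D, fixed length regardless of patch count
```

Because the output length is K·D whatever the number of detected interest points, images can be compared directly with a dot product or Euclidean distance.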
Convolutional Descriptors • Convolutional Neural Networks (CNN): three convolutional layers and one fully connected layer; takes 64x64 patches as input and produces a 512-dimensional output • Convolutional Kernel Networks (CKN): three input variants, CKN-white, CKN-grad, and CKN-raw
Comparison • Comparison with state-of-the-art image retrieval results.
End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression Speaker: 白立勋
What are DPMs, ConvNets, and non-maximum suppression, respectively? • Deformable Parts Models and Convolutional Networks have each achieved notable performance in object detection. • DPMs are well-versed in object composition, modeling fine-grained spatial relationships between parts. • ConvNets are adept at producing powerful image features, having been discriminatively trained directly on the pixels.
What is the goal of this article? • They propose a new model that combines these two approaches, obtaining the advantages of each. How is this model achieved? • They train it using a new structured loss function that considers all bounding boxes within an image, rather than isolated object instances. • This enables the non-maximal suppression (NMS) operation to be integrated into the model.
What are the advantages of doing so? • They use a DPM for detection, but replace the HOG features with features learned by a convolutional network. This allows the use of complex image features while still preserving the spatial relationships between object parts during inference.
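For context, the conventional greedy NMS operation that the paper folds into its structured loss can be sketched in a few lines of numpy. This is the standard test-time procedure, not the paper's differentiable, training-time integration; the `iou_thresh` value is an illustrative parameter.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop every remaining box that overlaps it above iou_thresh, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # intersection-over-union of box i with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[i:i + 1])[0] + area(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 is suppressed by box 0
```

Because this greedy step is non-differentiable, folding its effect into a structured loss over all boxes (as the paper does) is what allows end-to-end training.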
Deep Multi-Patch Aggregation Network for Image Style, Aesthetics, and Quality Estimation Reporter: 宋志超 (Zhichao Song)
The problems to be investigated • Image style recognition • Aesthetic quality categorization • Image quality estimation
Drawbacks of traditional methods • They ignore fine-grained, high-resolution details in images • The performance of single-column neural networks leaves room for improvement
Improved methods • Multiple image resolutions • Multi-column neural network
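The multi-patch idea can be sketched as follows: run several high-resolution crops of the same image through a shared column, then combine the per-patch features with order-independent statistics so the result does not depend on how many patches were sampled or in what order. This is a minimal numpy sketch, not the paper's network; the single linear+ReLU "column" and the min/max/mean aggregation are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

def patch_features(patches, W):
    """Toy shared column: one linear + ReLU layer per patch
    (stand-in for a CNN column with shared weights)."""
    return np.maximum(patches.reshape(len(patches), -1) @ W, 0.0)

def aggregate(feats):
    """Order-independent aggregation across patches: statistics over
    the patch axis give one fixed-length vector for the whole image."""
    return np.concatenate([feats.min(0), feats.max(0), feats.mean(0)])

patches = rng.standard_normal((5, 16, 16))   # 5 high-resolution crops
W = rng.standard_normal((256, 32)) * 0.05
img_repr = aggregate(patch_features(patches, W))
print(img_repr.shape)  # (96,) regardless of how many patches were sampled
```

Because min, max, and mean are permutation-invariant, shuffling the patches leaves the image representation unchanged, which is exactly the property a multi-patch aggregation layer needs.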
The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification Lecturer: 郭超世 (Chaoshi Guo) Student ID: 115034910120
Goal: fine-grained image classification Difficulty: intra-class variance can be larger than inter-class variance, so fine-grained classification is technically challenging Method: two-level attention models in a deep convolutional neural network
Most fine-grained classification systems follow the pipeline: find the foreground object or object parts (where), then extract discriminative features from them (what). A bottom-up process is commonly used for the first step. • Advantage: the method has high recall • Disadvantage: low precision, and existing pipelines require strong supervision • Given the above, finding the foreground object and its parts can be regarded as a two-level attention process, one at the object level and another at the part level
Two-level Attention Models: 1) Object-Level Attention Model • Patch selection using object-level attention • Training a DomainNet • Classification using object-level attention
Two-level Attention Models: 2) Part-Level Attention Model • Building the part detector • Building the part-based classifier
Two-level Attention Models: 3) The Complete Pipeline
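The object-level attention step above amounts to filtering candidate patches by how confidently a classifier believes they contain something relevant to the domain. A minimal numpy sketch of that idea; the confidence measure (peak softmax probability) and all sizes here are illustrative assumptions, not the paper's FilterNet specifics.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def select_patches(patch_scores, k=3):
    """Object-level attention as patch filtering: score each candidate
    patch with a (hypothetical) relevance classifier and keep the
    top-k most confident ones for the downstream DomainNet."""
    conf = np.array([softmax(s).max() for s in patch_scores])
    return np.argsort(conf)[::-1][:k]

rng = np.random.default_rng(3)
scores = rng.standard_normal((10, 5))   # 10 candidate patches, 5 classes
kept = select_patches(scores, k=3)
print(kept)  # indices of the 3 most confidently classified patches
```

Only a class label is needed to train the scoring classifier, which is why this kind of attention works under the weakest supervision setting described in the conclusions.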
Conclusions We propose a fine-grained classification pipeline combining bottom-up and two top-down attentions. The object-level attention feeds the network with patches relevant to the task domain at different views and scales. Both levels of attention bring significant gains, and they compensate each other nicely with late fusion. One important advantage of our method is that the attention is derived from a CNN trained on the classification task itself, so it can be conducted under the weakest supervision setting, where only the class label is provided.
Fully Connected Object Proposals for Video Segmentation 肖勖 115034910141
Fully Connected Object Proposals for Video Segmentation • Introduction: a novel approach to video segmentation using multiple object proposals • Method: combine appearance with long-range point tracks • Advantage: ensure robustness with respect to fast motion and occlusions over longer video sequences
Fully Connected Object Proposals for Video Segmentation • We demonstrate robustness to challenging situations typical of unconstrained videos, such as: • fast motion and motion blur, • color ambiguities between foreground and background, • and partial occlusions.
Fully Connected Object Proposals for Video Segmentation • Algorithm: • First: a rough classification and subsampling of the data is performed using a self-trained Support Vector Machine (SVM) classifier • Next: maximum a posteriori (MAP) inference is performed on a fully connected conditional random field (CRF) • Finally: each labeled proposal casts a vote to all pixels that it overlaps. The aggregate result yields the final foreground-background segmentation.
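The final voting step of the algorithm can be sketched directly: each labeled proposal casts a vote on every pixel it overlaps, and the aggregated votes decide foreground versus background. This is a minimal numpy sketch of that aggregation only, with uniform ±1 votes as a simplifying assumption (the SVM/CRF stages that produce the labels are not modeled).

```python
import numpy as np

def vote_segmentation(masks, labels):
    """Voting sketch: each proposal mask votes +1 (foreground) or -1
    (background) on the pixels it covers; the sign of the summed
    votes gives the per-pixel segmentation."""
    votes = np.zeros(masks.shape[1:], float)
    for m, lab in zip(masks, labels):
        votes += np.where(m, 1.0 if lab == 1 else -1.0, 0.0)
    return votes > 0

masks = np.zeros((3, 4, 4), bool)
masks[0, :2, :2] = True   # proposal 0 covers the top-left block
masks[1, :2, :2] = True   # proposal 1 covers the same block
masks[2, :, :] = True     # proposal 2 covers the whole frame
labels = [1, 1, 0]        # two foreground proposals, one background
seg = vote_segmentation(masks, labels)
print(seg.sum())  # 4: only the doubly-supported top-left block is foreground
```

Aggregating over many overlapping proposals is what makes the output robust to any single bad proposal, since an erroneous label is outvoted wherever correct proposals overlap it.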
Fully Connected Object Proposals for Video Segmentation • Process: • Object Proposal Generation • Candidate Proposal Pruning • Feature Extraction and Training • Classification and Resampling • Fully Connected Proposal Labeling
Fully Connected Object Proposals for Video Segmentation • Left: Precision-Recall curves and F-score isolines for the SVM and CRF classification of object proposals into foreground and background. • Right: Average, maximum and minimum F-score.
Fully Connected Object Proposals for Video Segmentation • Limitations: • It requires a sufficiently high video resolution such that the computation of proposals using existing techniques produces meaningful results.
Image Based Relighting Using Neural Networks 徐阳 XU Yang 1150349101421
OUTLINE • Relighting the image • Neural network algorithm • Results
Light Transport Reconstruction • Brute-force methods • directly sample all the entries of the light transport matrix from the scene • Sparsity-based methods • model light transport using a sparse representation recovered from images of the scene lit with designed illumination patterns • Coherence-based methods • exploit the data coherence in light transport to reconstruct the light transport matrix from a subset of rows/columns sampled from the scene
Neural Networks for Light Transport • Formulate the light transport matrix as discrete samples of a continuous light transport function • Approximate the transport function using neural networks
Light transport function Model the light transport matrix as discrete samples of a continuous light transport function Ψ(p, l): M(i, j) = Ψ(p(i), l(j)), where M(i, j) is an element of the light transport matrix that corresponds to pixel i and light source j, p(i) denotes the image coordinates of pixel i, and l(j) is the position of light source j in the 2D light domain. By expressing the 2D light transport matrix as a continuous 4D light transport function, the coherence of light transport in both the image domain and the light domain can be more readily exploited.
Neural network approximation • We approximate the light transport function with multilayer acyclic feed-forward neural networks
Light Transport Reconstruction • To reconstruct the light transport matrix, we recover the light transport function through neural network regression on captured images. • A light transport matrix element is approximated by averaging the outputs of all the base neural networks Φn: M(i, j) ≈ (1/Ne) Σn Φn(p(i), l(j); wn), where Ne is the number of base neural networks in the ensemble, and wn is the weight vector of base neural network Φn.
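The ensemble evaluation above can be sketched in numpy: each base network maps a 4D (pixel, light) coordinate to a transport value, and the matrix entry is the average over the ensemble. This is a minimal sketch under simplifying assumptions; the single-hidden-layer networks, tanh activation, ensemble size, and parameter shapes are all illustrative, not the paper's trained configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

def base_net(x, W1, W2):
    """One small base network Phi_n: a single hidden layer mapping a
    4-D (pixel, light) coordinate to a scalar transport value."""
    return W2 @ np.tanh(W1 @ x)

# Hypothetical ensemble of Ne = 5 randomly initialised base networks.
Ne, hidden = 5, 8
params = [(rng.standard_normal((hidden, 4)) * 0.5,
           rng.standard_normal(hidden) * 0.5) for _ in range(Ne)]

def transport(p, l):
    """Approximate M(i, j) = Psi(p(i), l(j)) by averaging the ensemble."""
    x = np.concatenate([p, l])           # 2-D pixel coord + 2-D light coord
    return sum(base_net(x, W1, W2) for W1, W2 in params) / Ne

val = transport(np.array([0.2, 0.7]), np.array([0.5, 0.1]))
print(val)  # a scalar transport value
```

Because the function is continuous in (p, l), entries of the transport matrix that were never directly captured can still be evaluated, which is the point of recovering Ψ rather than sampling M exhaustively.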
Adaptive Fuzzy Clustering • Fuzzy clustering • Adaptive fuzzy clustering