
Structured Modeling and Learning in Generalized Data Compression and Processing

This webinar lecture explores structured modeling and learning techniques in generalized data compression and processing, with a focus on multimedia communication and networks. Topics include sparse representation, high-dimensional signal processing, heterogeneous data compression, and more.



Presentation Transcript


  1. VALSE Webinar Lecture: Structured Modeling and Learning in Generalized Data Compression and Processing. Hongkai Xiong (熊红凯), Department of Electronic Engineering, Shanghai Jiao Tong University, http://ivm.sjtu.edu.cn, 13 Jan. 2016

  2. Sparse Representation • Sparse representation x = Ψθ, where x is an N-dimensional signal, Ψ is an N × L dictionary, and the coefficient vector θ has only a few non-zero entries.
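A minimal sketch of this model (the dictionary, sparsity level, and recovery routine below are illustrative assumptions, not the lecture's implementation): it synthesizes x = Ψθ with a k-sparse θ over a random dictionary and recovers the coefficients with orthogonal matching pursuit.

import numpy as np

def omp(Psi, x, k):
    """Recover a k-sparse coefficient vector theta with x ~= Psi @ theta
    via orthogonal matching pursuit (illustrative, not the lecture's code)."""
    residual = x.copy()
    support = []
    theta = np.zeros(Psi.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = np.argmax(np.abs(Psi.T @ residual))
        support.append(j)
        # Re-fit least squares on the selected atoms and update the residual.
        coef, *_ = np.linalg.lstsq(Psi[:, support], x, rcond=None)
        residual = x - Psi[:, support] @ coef
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)
N, L, k = 64, 128, 5                              # signal length, dictionary size, sparsity
Psi = rng.standard_normal((N, L)) / np.sqrt(N)    # overcomplete dictionary (L > N)
theta_true = np.zeros(L)
theta_true[rng.choice(L, k, replace=False)] = rng.standard_normal(k)
x = Psi @ theta_true                              # x = Psi * theta with only k non-zeros in theta
theta_hat = omp(Psi, x, k)
print("recovery error:", np.linalg.norm(theta_hat - theta_true))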

  3. Multimedia Communication • Sources: 1D signals (audio), 2D signals (image), and high-dimensional 3D signals (video), with coding schemes such as SVC and DVC • Video coding: advances toward higher dimensions and higher resolutions, aiming at better R-D behavior and greater compression ratios • Networks: evolve toward multiple data streams within heterogeneous network structures, aiming at higher throughput and transmission reliability; delivery ranges from unicast (one-to-one) to multicast (one-to-many and many-to-many)

  4. Generalized Context Modeling in Signal Processing • Wenrui Dai, Hongkai Xiong, J. Wang, S. Cheng, and Y. F. Zheng, "Generalized context modeling with multi-directional structuring and MDL-based model selection for heterogeneous data compression," IEEE Transactions on Signal Processing, 2015. • Wenrui Dai, Hongkai Xiong, J. Wang, et al., "Discriminative structured set prediction modeling with max-margin Markov network for lossless image coding," IEEE Transactions on Image Processing, 2014. • Wenrui Dai, Hongkai Xiong, X. Jiang, et al., "Structured set intra prediction with discriminative learning in max-margin Markov network for High Efficiency Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 11, pp. 1941-1956, 2013.

  5. Heterogeneous Data Compression: Data • Heterogeneous data are generated by multiple interlaced sources complying with different, incomplete distribution statistics • Image & video: spatial correlations are characterized as piecewise smooth with local oscillatory patterns such as multiscale edges and textures • Genome sequence: repeated patterns of nucleotides in various regions • Executable files: multiple interlaced data streams, e.g., opcodes, displacements, and immediate data

  6. Heterogeneous Data Compression: Framework • Pipeline: data to be predicted → context model (estimated probability) → coder → encoded bitstream (01011101……) • Structured probabilistic model: captures regular patterns, is optimized for specific data, and performs context-based set prediction • Classic context model: variable order, sequential prediction, weighted estimation

  7. Background • In classical context modeling, a context is a suffix of the data symbols; each context s defines the set of subsequences whose suffix is s. • Disjoint property: no string in the context set is a suffix of any other string in the set. • Exhaustive property: every subsequence of data symbols can find its suffix in the context set.
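As a small illustration of these two properties (the binary example sets below are assumptions for demonstration, not from the lecture), the following sketch checks whether a candidate context set is disjoint and exhaustive:

from itertools import product

def is_suffix(s, t):
    """True if string s is a suffix of string t."""
    return t.endswith(s)

def is_valid_context_set(contexts, alphabet="01"):
    """Check the two defining properties of a classical context set:
    disjoint   -- no context is a suffix of another context;
    exhaustive -- every sequence at least as long as the longest context
                  ends with some context in the set."""
    disjoint = not any(is_suffix(s, t)
                       for s in contexts for t in contexts if s != t)
    depth = max(len(s) for s in contexts)
    exhaustive = all(
        any(is_suffix(s, "".join(seq)) for s in contexts)
        for seq in product(alphabet, repeat=depth))
    return disjoint, exhaustive

print(is_valid_context_set(["0", "01", "11"]))   # (True, True): proper context set
print(is_valid_context_set(["0", "11"]))         # (True, False): "...01" matches no context
print(is_valid_context_set(["0", "1", "11"]))    # (False, True): "1" is a suffix of "11"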

  8. Foundation: Structured Prediction Model • Graphical probabilistic model • A structured prediction model can be represented as a graph: each node stands for a random variable to predict, and each edge stands for an interdependency among nodes. • The model estimates the joint or conditional distribution of its nodes. • Learning methods • Markov random field (MRF) • Max-margin Markov network: MRF + SVM • Reasoning algorithms • Belief propagation (BP) • Expectation propagation (EP)

  9. Structured Probabilistic Model: Motivation • Complex data structure • Cannot be represented analytically • Features are captured adaptively with learning-based algorithms • Incomplete distribution • Parameters of the actual distribution cannot be estimated exactly • Context-based predictive model built with learning-based algorithms • Structural coherence • Isolated prediction cannot guarantee the structural coherence of the prediction task • A structured probabilistic model constrains the prediction task with structural coherence

  10. Perceptual Intuition • Intuition 1: Structural coherence for heterogeneous data • Example: images generated with the same local distribution • A natural image is not merely a 2D array of pixels generated by a probabilistic distribution; structural coherence must be maintained to keep the image meaningful. • The structured prediction model is proposed to maintain such coherence.

  11. Generalized Context Modeling: Motivation • Intuition 2: Complex structure for heterogeneous data • The statistics of heterogeneous data are not sequential with a uniform distribution; they are flexible, with interlaced, complex distributions. • Prediction therefore moves from sequential contexts with uniform distribution to flexibly constructed contexts with interlaced, complex distributions.

  12. Problem Definition • Pixel-wise prediction impairs the structure even when blocks share similar PDFs. • Challenges: linear prediction lacks high-order nonlinearity; independent prediction lacks inter-dependency. • Parallel (joint) prediction keeps the structure. • Contribution: structural consistency → structured set prediction.

  13. Structured Probabilistic Model: Example • For the 4×4 block at coordinate (401, 113) in Lena: least squares gives MSE 82.06, while the structured probabilistic model gives MSE 68.75.

  14. Motivation • Theoretical support: sequential source coding • Viswanathan & Berger (2000): given random variables X1 and X2, under arbitrary distortions D1 and D2, the rate for describing them jointly is no greater than the rate for describing them separately. • (Block diagram: Encoder 1 / Decoder 1, Encoder 2 / Decoder 2.)

  15. Contribution • GCM for heterogeneous data compression • Structured probabilistic model for genome compression: MRF for the dependency with side information, optimized with BP • Structured probabilistic model for lossless image coding: M3N for joint spatial statistics and structural coherence • Structured probabilistic model for intra-frame video coding: M3N optimized with EP

  16. Contribution • (Overview diagram: universal coding of heterogeneous data, covering executables with syntax & semantics, genome, image, and video, relating data dependency, GCM, learning, input space, and feature space. Wenrui Dai @ SJTU, 2014.)

  17. Background: Heterogeneous Data • Genomic data: long repeats, up to insertions, deletions, and substitutions. • Image and video: spatially spanned along structures, e.g., edges and textures. • Executables: interlaced data streams, e.g., opcodes and immediate data.

  18. Definition: Heterogeneous Data • Heterogeneous data are generated by interlacing M data streams with a time-varying random process. • The j-th data stream is emitted from a stationary Markov source of finite order. • Each symbol is drawn from the data stream selected by the switching process at that instant. • The resulting interlaced sequence is neither stationary nor wide-sense stationary.
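A hedged sketch of such a source (the transition matrices and the switching process are made-up parameters): two stationary Markov streams are interlaced by a time-varying switching process, and the resulting sequence is no longer a single stationary Markov chain.

import numpy as np

rng = np.random.default_rng(1)

def markov_stream(transition, length, rng):
    """Emit a sequence from a stationary first-order Markov source
    with the given row-stochastic transition matrix."""
    states = transition.shape[0]
    seq = [rng.integers(states)]
    for _ in range(length - 1):
        seq.append(rng.choice(states, p=transition[seq[-1]]))
    return seq

# Two component sources with very different statistics (illustrative parameters).
T0 = np.array([[0.9, 0.1], [0.1, 0.9]])   # slowly varying stream
T1 = np.array([[0.2, 0.8], [0.8, 0.2]])   # rapidly alternating stream
streams = [iter(markov_stream(T, 200, rng)) for T in (T0, T1)]

# A time-varying switching process decides which stream emits the next symbol,
# so the interlaced output is neither stationary nor a single Markov chain.
switch = markov_stream(np.array([[0.7, 0.3], [0.3, 0.7]]), 200, rng)
heterogeneous = [next(streams[j]) for j in switch]
print(heterogeneous[:40])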

  19. Clue • In Memory of Ben Taskar (1977-2013): a rising star in machine learning, computational linguistics, and computer vision, and the founder of the max-margin Markov network.

  20. Generalized Context Modeling • Scenario: coding based on context modeling, with predictive models for heterogeneous data compression • Pipeline: symbols to predict → context model (estimated probability) → coding engine → encoded bits (01011101……) • Structured prediction model: captures the intrinsic structure of complex data, adapts to specific data, and performs optimal set prediction based on the observed contexts

  21. Generalized Context Modeling: Topology Illustration • Generalized context modeling (GCM) with combinatorial structuring and multi-dimensional extension • (Illustration: around the current symbol, a sequential context and an extended context with combinatorial structuring; a 2-D context and an M-D context with multi-dimensional extension.)

  22. Generalized Context Modeling: Graphical Model for Prediction • Graphical model for GCM with D-order, M-directional contexts. • Symbols to predict are correlated with their neighboring symbols. • The context component in each direction serves as an observation for the prediction. • A conditional random field represents the dependencies among the symbols to predict and the context-based correlations.

  23. Definition: Context Set • Given a context s, consider the set of subsequences containing s over the index set of s. • A generalized context model is valid if it satisfies, in each of its directions: • Exhaustive property: for any subsequence in the j-th direction, there exists a context s in the model that the subsequence contains. • Disjoint property: any subsequence in the j-th direction matches at most one of two arbitrary contexts s and s′.

  24. Modeling & Prediction: Model Graph • Trellis-like graph rooted at the M-ary vector (∅, · · ·, ∅). • Each node corresponds to an index set for a finite-order combination of predicted symbols; a succeeding node's index set contains it, and a preceding node's index set is contained in it. • For a GCM with given M and D, the possible context structures are located in DM+1 vertical slices of the graph.

  25. Generalized Context Modeling Model Tree example

  26. Representation & Prediction: Model Graph • Model graph with depth D = 3 and M = 2 directions • The solid (red) and dashed (blue) paths share some common nodes

  27. Generalized Context Modeling Problem Statement • Model tree to represent generalized context models and their successive relationship. • Minimum description length (MDL) principle to select optimal class of context models. • Normalized maximum likelihood (NML) to determine optimal weighted probability for prediction.

  28. Generalized Context Modeling: Model Tree • For contexts with maximum order D and M directions, the model tree elaborates all possible combinations of finite-order predicted symbols and their successive relationships. • Its root is the M-directional empty vector (∅, · · ·, ∅), and each node corresponds to the index set of one combination of predicted symbols. • The nodes on the paths from the root to the leaf nodes constrain the context selection.

  29. Model Selection: Separable Context Modeling • Prediction based on contexts with multi-directional structuring can be performed separately in each of its directions. • As a result, the size of the model class grows linearly with M.

  30. Generalized Context Modeling: Model Selection • The NML function with the MDL principle is used for model selection: the contexts in each direction are compared via the NML function (maximum-likelihood code assignment plus model complexity) to find the optimal context for predicting the current symbol. • For M-interlaced autoregressive sources, the model complexity is constant with respect to the data size N.
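To illustrate the NML/MDL principle behind this selection (not the GCM selection rule itself; the memoryless binary model class below is an assumption for demonstration), the sketch computes the NML code length of a sequence as its maximum-likelihood code length plus the logarithm of the parametric complexity, so candidate models can be compared by code length:

import math

def bernoulli_nml_code_length(x):
    """NML code length (in bits) of a binary sequence under the class of
    i.i.d. Bernoulli models: -log2 P_ML(x) + log2 C_N, where C_N is the
    parametric complexity (Shtarkov sum) of the model class."""
    n, k = len(x), sum(x)
    def ml(n, j):
        # Maximum-likelihood probability of a sequence with j ones out of n.
        if j in (0, n):
            return 1.0
        p = j / n
        return p ** j * (1 - p) ** (n - j)
    max_lik = ml(n, k)
    complexity = sum(math.comb(n, j) * ml(n, j) for j in range(n + 1))
    return -math.log2(max_lik) + math.log2(complexity)

# MDL-style comparison: the more regular sequence receives the shorter code.
print(bernoulli_nml_code_length([0, 0, 0, 1, 0, 0, 0, 0]))  # skewed, shorter code
print(bernoulli_nml_code_length([0, 1, 1, 0, 1, 0, 0, 1]))  # balanced, longer code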

  31. Generalized Context Modeling: Weighted Probability for Prediction • The estimated probability for generalized context modeling is computed sequentially, symbol by symbol. • Each context s is assigned a weight, and the per-context estimates are combined into a weighted probability for prediction.
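A toy sketch in the spirit of sequential probability assignment and weighted mixing (the Krichevsky-Trofimov estimator and the order-0/order-1 model split are illustrative assumptions, not the GCM weighting defined over the model graph):

def kt_sequential_prob(bits):
    """Krichevsky-Trofimov probability assigned to a binary sequence,
    built up sequentially: P(next = 1) = (ones + 1/2) / (n + 1)."""
    prob, zeros, ones = 1.0, 0, 0
    for b in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1)
        prob *= p_one if b == 1 else (1.0 - p_one)
        ones, zeros = ones + b, zeros + (1 - b)
    return prob

data = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Each candidate context model assigns the data a sequential probability.
# Model "order-0": a single KT estimator for every symbol.
p0 = kt_sequential_prob(data)
# Model "order-1": a separate KT estimator per preceding symbol
# (the leading symbol is priced by its own KT estimator for simplicity).
p1 = (kt_sequential_prob(data[:1])
      * kt_sequential_prob([b for prev, b in zip(data, data[1:]) if prev == 0])
      * kt_sequential_prob([b for prev, b in zip(data, data[1:]) if prev == 1]))

# Posterior weight of each model is proportional to the probability it has
# assigned to the data so far; the CTW-style mixture uses a uniform prior.
total = p0 + p1
print("posterior weights:", {"order-0": p0 / total, "order-1": p1 / total})
print("mixture probability (uniform prior):", 0.5 * p0 + 0.5 * p1)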

  32. Generalized Context Modeling: Model Redundancy • For the generalized model class M with maximum order D and M directions, the model redundancy introduced by the multi-directional extension is bounded in terms of the alphabet size L, with η a compensation term for the various contexts. • This redundancy depends only on the maximum order D and the number of directions M, and is independent of the data size N.

  33. Generalized Context Modeling: Experimental Results • On the Calgary corpus, GCM outperforms CTW by 7%-12% on executable files and seismic data. • In executable file compression, GCM outperforms PPMd and PPMonstr by 10% and 4%, respectively, and is comparable to the best compressor, PAQ8, with less computational complexity. • ML-based estimation does not fully exploit the statistics of heterogeneous data; as an alternative, learning of structured prediction models is proposed.

  34. Model Redundancy • For the generalized model class M with maximum order D and M directions, the model redundancy introduced by combinatorial structuring is bounded in terms of the alphabet size L, with η a compensation term for the various contexts. • This redundancy depends only on the maximum order D and is independent of the data size N.

  35. Conceptual Diagram: Image • Discriminative prediction separates the actual pixel values from other possible estimates by a maximum margin based on contexts, but cannot exploit the structure among predictions. • A Markov network maintains the structural coherence in the regions to be predicted, but cannot optimize the context-based prediction. • The max-margin Markov network jointly optimizes both.

  36. Diagram • Flow diagram for the structured set prediction model: sampling → context-based prediction for each pixel with imaging constraints on the set of pixels (structural coherence) → encoding; decoding → context-based prediction for each pixel with imaging constraints on the set of pixels (structural coherence) → reconstruction.

  37. Prediction • Given y, the block of pixels to encode, and x, the reconstructed pixels serving as contexts, the prediction is derived in a concurrent (joint) form. • The local spatial statistics are represented by a linear combination of a class of feature functions, weighted by the trained model parameters.
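A minimal sketch of such concurrent prediction (the feature functions, weights, and candidate range are hypothetical, not the trained max-margin Markov network): the block is predicted jointly as the candidate maximizing a linear combination of feature functions.

import numpy as np
from itertools import product

# Hypothetical feature functions over (context x, candidate block y):
# one "data" feature per pixel and one "smoothness" feature per neighboring pair.
def features(x, y):
    data_term = -np.abs(np.asarray(y) - x.mean())   # agreement with the context
    smooth_term = -np.abs(np.diff(y))               # coherence inside the block
    return np.concatenate([data_term, smooth_term])

w = np.ones(5)                        # trained model parameters (3 pixels + 2 pairs, all weight 1)
x = np.array([100, 102, 101])         # reconstructed context pixels

# Concurrent prediction: search the block of pixels jointly rather than pixel by pixel.
candidates = product(range(98, 106), repeat=3)
y_hat = max(candidates, key=lambda y: float(w @ features(x, y)))
print("jointly predicted block:", y_hat)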

  38. Training • The model parameter w is trained by joint optimization over the collection of training data S = {xi, yi}. • The feature functions {fi} establish the conditional probabilistic model for prediction based on the various contexts derived from the assumed prediction direction. • The loss function evaluates the prediction, enforces structural coherence, and adjusts the model parameter w.

  39. Loss Function • The M-ary estimated output ŷ(i) for the block of pixels y is measured over the generated graphical model. • A log-Gaussian function is used for node cliques and a Dirac function for edge cliques, with prediction error ϵi = ŷ(i) − y(i) and variance σ² over the errors.
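A hedged sketch of these two clique losses (the exact functional forms and the variance value are assumptions for illustration): a log-Gaussian penalty on each node's prediction error, and a Dirac-style penalty on whether each edge reproduces the neighboring state transition.

import numpy as np

def node_loss(y_hat, y, sigma=4.0):
    """Log-Gaussian loss per node clique: negative log of a Gaussian on the
    prediction error eps_i = y_hat(i) - y(i) with variance sigma^2."""
    eps = np.asarray(y_hat, float) - np.asarray(y, float)
    return 0.5 * eps**2 / sigma**2 + 0.5 * np.log(2 * np.pi * sigma**2)

def edge_loss(y_hat, y):
    """Dirac-style loss per edge clique: zero when the state transition of a
    neighboring pair is reproduced exactly, one otherwise."""
    return (np.diff(y_hat) != np.diff(y)).astype(float)

y     = np.array([100, 102, 105, 104])    # true pixels
y_hat = np.array([101, 102, 104, 104])    # estimated pixels
print("node losses:", node_loss(y_hat, y))
print("edge losses:", edge_loss(y_hat, y))
print("total loss :", node_loss(y_hat, y).sum() + edge_loss(y_hat, y).sum())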

  40. Solution • Solving the min-max formulation with standard quadratic programming (QP) incurs high computational cost for problems with a large alphabet; as an alternative, its dual is solved. • Sequential minimal optimization (SMO) breaks the dual problem into a series of small QP subproblems over cliques and takes an ascent step that modifies the fewest variables, where αi(y) is the marginal distribution of the i-th clique. • SMO iteratively chooses pairs of y to update with respect to the KKT conditions.

  41. Solution • A junction tree is built for the loopy Markov network by adding edges to link cliques; the resulting junction tree is unique. • Each clique is predicted along the junction tree. • Belief propagation (BP) serves as the message passing algorithm for inference and updates the potential of each clique.
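A minimal sketch of sum-product message passing (a three-node chain with made-up potentials, where BP is exact; the actual model passes messages over the junction tree of the loopy network):

import numpy as np

# Node potentials for binary variables x1, x2, x3 and a shared pairwise
# potential that favors equal neighboring states (illustrative numbers).
psi_node = np.array([[0.9, 0.1],     # x1 prefers state 0
                     [0.5, 0.5],     # x2 uninformative
                     [0.2, 0.8]])    # x3 prefers state 1
psi_edge = np.array([[0.8, 0.2],
                     [0.2, 0.8]])

# Forward and backward messages along the chain.
m12 = psi_edge.T @ psi_node[0]                 # message x1 -> x2
m23 = psi_edge.T @ (psi_node[1] * m12)         # message x2 -> x3
m32 = psi_edge @ psi_node[2]                   # message x3 -> x2
m21 = psi_edge @ (psi_node[1] * m32)           # message x2 -> x1

# Beliefs (unnormalized marginals) combine the local potential and incoming messages.
b1 = psi_node[0] * m21
b2 = psi_node[1] * m12 * m32
b3 = psi_node[2] * m23
for i, b in enumerate([b1, b2, b3], start=1):
    print(f"P(x{i}) =", b / b.sum())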

  42. Upper Bound of Prediction Error • Theorem: given the trained weighting vector w and an arbitrary constant η > 0, with probability at least 1 − exp(−η) the prediction error is asymptotically equivalent to the error obtained over the training data: the average prediction error is upper-bounded by the γ-relaxed average training error plus an additional term that converges to zero as N grows. • Remark: the prediction error is upper-bounded by the well-tuned training error, which ensures the predictive performance of the structured prediction model.

  43. Upper Bound of Prediction Error • In probabilistic terms: given the trained weighting vector w and an arbitrary constant η > 0, with sufficient sampling there exists ε(L, γ, N, η) → 0 satisfying the bound. • The prediction error is thus upper-bounded by the well-tuned training error, which ensures the predictive performance of the structured set prediction model.

  44. Implementation • Combined with a variance-based predictor for smooth regions, structured set prediction serves as an alternative mode. • The coding costs of the two alternative modes are compared to select the optimal one. • A log-Gaussian loss function yields optimal coding of the residual under the assumed Gaussian distribution.

  45. Experimental Results • Performance exceeds JPEG-LS by 10% and the JPEG 2000 lossless mode by 14% on average in bits per pixel. • Performance exceeds the minimum rate predictor (MRP, the optimal predictor) by 1.35% on average in bits per pixel.

  46. Conceptual Diagram: Video • Conceptual description of the structured prediction model: trained model parameters, feature functions, and loss function with structural coherence under joint optimization. • Optimal joint prediction by the max-margin Markov network. • Max-margin estimation is conditioned directly on the predicted pixels for context-based prediction. • The Markov network maintains the structural coherence in the regions to be predicted.

  47. Loss Function • A Laplacian loss function is used for the M-ary estimation error. • Laplacian errors are derived for each node, and the state transitions of neighboring nodes for each edge; for each node the prediction error is ϵi = ŷ(i) − y(i), with variance σ² over the errors. • The Laplacian loss function matches the DCT transform, and the structured prediction model optimizes it for minimal coding length under the HEVC framework.
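A hedged sketch of the Laplacian node loss (the scale value and pixel numbers are illustrative assumptions): the negative log-likelihood of each prediction error under a zero-mean Laplacian, the heavy-tailed shape that the slide associates with DCT-coded residuals.

import numpy as np

def laplacian_node_loss(y_hat, y, b=2.0):
    """Negative log-likelihood of the prediction error eps_i = y_hat(i) - y(i)
    under a zero-mean Laplacian with scale b (illustrative scale value)."""
    eps = np.abs(np.asarray(y_hat, float) - np.asarray(y, float))
    return eps / b + np.log(2 * b)

y     = np.array([60, 62, 70, 69])
y_hat = np.array([61, 62, 66, 70])
print("per-node Laplacian losses:", laplacian_node_loss(y_hat, y))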

  48. Expectation Propagation for Message Passing • SMO is used to solve the standard quadratic programming (QP) problem of the max-margin Markov network; a junction tree is then generated, and message passing is conducted along it to find the most probable states of each pixel. • Lossy intra video coding does not require propagating the actual states along the junction tree, yet statistics such as means and variances cannot otherwise be selected and propagated for robust, convergent message passing. • Expectation propagation (EP) utilizes such statistics, so the actual distribution is approximated within the exponential family; the approximation metric can be varied based on the video data. • Prediction based on EP is proven to converge to an upper bound.
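The core EP step is moment matching: an awkward distribution is projected onto an exponential-family member with the same sufficient statistics. A minimal sketch (the bimodal pixel distribution below is made up for illustration, not the paper's model):

import numpy as np

# Project a discrete pixel-value distribution onto a Gaussian by matching its
# mean and variance; in EP, such summarized statistics are what get propagated.
values = np.arange(256)
p = (np.exp(-0.5 * ((values - 80) / 3.0) ** 2)
     + 0.4 * np.exp(-0.5 * ((values - 120) / 5.0) ** 2))
p /= p.sum()                              # a bimodal "true" distribution

mean = np.sum(values * p)                 # matched first moment
var = np.sum((values - mean) ** 2 * p)    # matched second central moment
print(f"Gaussian approximation: mean = {mean:.2f}, variance = {var:.2f}")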

  49. Implementation • Structured prediction model as an alternative mode: MODE_STRUCT. • Integrated into the current HEVC framework without additional syntax elements. • Mode decision by rate-distortion optimization. • A Laplacian-based loss function for the residual gives the best coding performance under the DCT transform.
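A hedged sketch of rate-distortion mode decision (the Lagrange multiplier and the per-mode distortion/rate numbers are invented for illustration): the encoder evaluates J = D + λR for each candidate mode, including MODE_STRUCT, and keeps the cheapest one.

LAMBDA = 0.85   # Lagrange multiplier derived from the quantization parameter (assumed value)

candidate_modes = {
    "INTRA_PLANAR":  {"distortion": 310.0, "rate_bits": 96},
    "INTRA_ANGULAR": {"distortion": 270.0, "rate_bits": 112},
    "MODE_STRUCT":   {"distortion": 255.0, "rate_bits": 118},  # structured prediction mode
}

def rd_cost(stats, lam=LAMBDA):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return stats["distortion"] + lam * stats["rate_bits"]

best = min(candidate_modes, key=lambda m: rd_cost(candidate_modes[m]))
for m, s in candidate_modes.items():
    print(f"{m:14s} J = {rd_cost(s):7.2f}")
print("selected mode:", best)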

  50. Experimental Results • Performance exceeds the HEVC common test model by 2.26% in BD-rate, with BD-PSNR gains of up to 0.38 dB. • Performance exceeds HEVC with combined intra coding (CIP) by 1.31% in BD-rate. • Test sequences: Foreman (352×288), BlowingBubbles (416×240), BQMall (832×480), Cactus (1920×1080).
