Three Novel Algorithms for Hiding Data in PDF Files Based on Incremental Updates

Li Lei, School of Information Science and Technology, Sun Yat-Sen University

Contents
1 Introduction
2 The Structure of PDF Files
3 Incremental Updates
4 Proposed Algorithms
5 Experimental Results
6 Future Work

Introduction • PDF (Portable Document Format): a widely used electronic document format with high printing quality, cross-platform applicability, and device independence.

Introduction • Hiding information in PDF files: used for secret message transmission, and for marking the source and transmission path of a document.

Introduction • Existing algorithms • First category: slightly varying the line, word, or character spacing, or certain other display attributes [2,3,4,5,6,7]. The obvious defects are that the page display is disturbed and that information security is relatively low. • Second category: adding to or changing the content of PDF file streams [8,9,10]. These methods are, to some degree, weak in guaranteeing large capacity, high security, and robustness.

The structure of PDF file • File structure (physical structure): a PDF file consists of the header, the body (which contains the objects), the cross-reference table (which holds information about the indirect objects in the file), and the trailer. The file structure determines how the objects are stored in a PDF file.

The structure of PDF file • Document structure (logical structure): a PDF document can be regarded as a hierarchy of objects contained in the body section of a PDF file. The document structure is organized as an object tree rooted at the Catalog, with five subtrees: Page tree, Outline hierarchy, Article threads, Named destinations, and Interactive form.

The structure of PDF file • Object An object is the basic element in PDF files. PDF supports eight basic types of objects: Boolean Object, Numeric Object, String Object, Name Object, Array Object, Dictionary Object, Stream Object and Null Object. Objects may be labeled so that they can be referred to by other objects. A labeled object is called an indirect object.

The structure of PDF file • Content stream: the content streams belonging to the Page tree contain almost all the information about a PDF file's text contents and display attributes. Each page's contents are divided into blocks and saved in objects called Contents objects. Each Contents object contains text objects and text state. A text object describes the text contents, and the text state is a collection of display attributes.

Incremental updates The contents of a PDF file can be updated incrementally without rewriting the entire file. Changes are appended to the end of the file, leaving its original contents intact. In an incremental update, any new or changed objects are appended to the file as an updated body, followed by a new cross-reference section and a new trailer.
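
The append-only layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function name `append_incremental_update` is hypothetical, it assumes a classic cross-reference table (not a cross-reference stream) and generation number 0, and it reads `/Size`, `/Root` and `startxref` with simple regular expressions, which would not be robust on arbitrary real files. The `base` file is a toy, not a complete document.

```python
import re

def append_incremental_update(pdf: bytes, new_objects: dict) -> bytes:
    """Return pdf plus an appended body, cross-reference section and trailer.

    new_objects maps an object number to the raw bytes of that object's value.
    Assumes a classic cross-reference table and generation number 0.
    """
    if not new_objects:
        raise ValueError("nothing to append")

    # Values taken from the most recent trailer of the existing file.
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    prev_size = int(re.findall(rb"/Size\s+(\d+)", pdf)[-1])
    root = re.findall(rb"/Root\s+(\d+\s+\d+\s+R)", pdf)[-1]

    out = bytearray(pdf)  # the original bytes are kept unchanged
    if not out.endswith(b"\n"):
        out += b"\n"

    # Updated body: each new or changed object, with its byte offset recorded.
    offsets = {}
    for num in sorted(new_objects):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + new_objects[num] + b"\nendobj\n"

    # New cross-reference section: one single-entry subsection per object.
    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(offsets):
        out += b"%d 1\n" % num
        out += b"%010d %05d n \n" % (offsets[num], 0)  # 20-byte entry

    # New trailer; /Prev points back to the previous cross-reference section.
    size = max(prev_size, max(offsets) + 1)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

# Toy file used only to exercise the function.
base = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n45\n%%EOF\n"
)
updated = append_incremental_update(base, {2: b"<< /Note (added later) >>"})
```

The result begins with the original bytes, and the new trailer chains to the earlier cross-reference section through `/Prev`, which is what lets a reader find both the old and the new versions of an object.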

Incremental updates • When do incremental updates occur? • When document properties are modified (right-click and modify properties) • When editing operations are saved with “Save”

Proposed algorithms • A compensated version of modifying display attributes The text state in a Contents object specifies the attributes of text display. Each attribute is set by an operator keyword: character spacing (Tc), word spacing (Tw), horizontal scaling (Tz), leading (TL), font size (Tf), rendering mode (Tr), rise (Ts), etc. The values set by these operators in the content stream can be modified to hide information.
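
As an illustration of hiding data in a text-state value, the sketch below stores one bit in each `Tc` (character spacing) operand of a content stream. The encoding is an assumption made for this example, not the scheme from the slides: each value is quantised to thousandths and the parity of the last digit carries the bit. The names `embed_bits` and `extract_bits` are hypothetical, and the content stream is treated as already decompressed text.

```python
import re

# A number followed by the Tc operator, e.g. "0.050 Tc".
TC = re.compile(r"(-?\d*\.?\d+)\s+Tc\b")

def embed_bits(content: str, bits: str) -> str:
    """Hide one bit per Tc operand in the parity of its thousandths digit."""
    if len(bits) > len(TC.findall(content)):
        raise ValueError("not enough Tc operators to carry the message")
    remaining = iter(bits)

    def repl(match):
        bit = next(remaining, None)
        if bit is None:              # message exhausted: leave the value as is
            return match.group(0)
        k = round(float(match.group(1)) * 1000)
        if k % 2 != int(bit):        # nudge by 0.001 to set the parity
            k += 1
        return f"{k / 1000:.3f} Tc"

    return TC.sub(repl, content)

def extract_bits(content: str, n: int) -> str:
    """Read back the first n hidden bits from the Tc operands."""
    values = TC.findall(content)[:n]
    return "".join(str(round(float(v) * 1000) % 2) for v in values)

sample = "BT /F1 12 Tf 0.050 Tc (Hello) Tj 0.120 Tc (World) Tj ET"
stego = embed_bits(sample, "10")
recovered = extract_bits(stego, 2)
```

Each modified value differs from the original by at most 0.001 text-space units, which is small but not zero.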

Proposed algorithms • A compensated version of modifying display attributes However, algorithms of this kind affect the display of the PDF file.

Proposed algorithms • A compensated version of modifying display attributes We can compensate for the effect of data hiding by using incremental updates of PDF files: after the text states of Contents objects are altered to embed information, the original Contents objects are written into the updated body. A viewer uses the most recent version of each object, so the page is displayed unchanged.

Proposed algorithms • Algorithms based on new body and cross-reference section • Algorithm 2: in the updated body, the actual embedding carrier is an indirect object. Considering the complexity of inserting objects, content security, capacity, and other factors, we select stream objects as the carrier. • Algorithm 3: the new cross-reference section is the carrier of the covert information. Information is embedded by controlling the 10-digit byte offset in each cross-reference entry; the difference between the offsets of adjacent entries represents the covert information.
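
The offset-difference idea can be sketched as follows. The slides do not give the exact encoding, so the details here are assumptions for illustration: each difference carries `BITS_PER_GAP = 6` bits, and `MIN_GAP` is a hypothetical minimum object length added so that a real object could fit between two offsets. All function names are hypothetical. In an actual file the appended objects would have to be padded so that they really start at these offsets.

```python
BITS_PER_GAP = 6   # assumption: bits carried by one offset difference
MIN_GAP = 64       # assumption: minimum object length in bytes

def offsets_from_bits(bits: str, start: int) -> list:
    """Turn a bit string into object offsets whose differences encode it."""
    padded_len = -(-len(bits) // BITS_PER_GAP) * BITS_PER_GAP
    bits = bits.ljust(padded_len, "0")          # pad the last chunk with zeros
    offsets = [start]
    for i in range(0, len(bits), BITS_PER_GAP):
        gap = MIN_GAP + int(bits[i:i + BITS_PER_GAP], 2)
        offsets.append(offsets[-1] + gap)
    return offsets

def xref_entries(offsets: list) -> bytes:
    """Format in-use cross-reference entries: 10-digit offset, 5-digit generation."""
    return b"".join(b"%010d %05d n \n" % (off, 0) for off in offsets)

def bits_from_entries(entries: bytes, n_bits: int) -> str:
    """Recover the bit string from the differences of adjacent entry offsets."""
    offsets = [int(entries[i:i + 10]) for i in range(0, len(entries), 20)]
    chunks = [
        format(b - a - MIN_GAP, "0%db" % BITS_PER_GAP)
        for a, b in zip(offsets, offsets[1:])
    ]
    return "".join(chunks)[:n_bits]

message = "1011001110001111" * 8            # 128 bits
offsets = offsets_from_bits(message, start=5000)
entries = xref_entries(offsets)
recovered = bits_from_entries(entries, len(message))
```

The extractor needs only the cross-reference entries and the two shared constants; it never has to parse the objects themselves.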

Proposed algorithms • Algorithms based on new body and cross-reference section

The experimental results and analysis • Data embedding capacity [Figure: user interface]

The experimental results and analysis • Perceptual transparency As the comparison figure shows, embedding data causes no change in the display of the cover file.

The experimental results and analysis • The robustness to reading and editing operations 1. Robustness to annotating and marking operations: we used Adobe Acrobat 9 Pro to annotate and mark the stego PDF file in various ways, then tried to extract the covert information from it. The experimental result shows that the extraction accuracy is 100%.

The experimental results and analysis • The robustness to reading and editing operations 2. Robustness to interactive form editing: (a) is the stego file without any editing, and (b) is file (a) after some content has been written into it. We tried to extract the covert information from (b), and the experimental result shows that the extraction accuracy is 100%.

The experimental results and analysis • Increase in the size of carrier file 1. Algorithm 1 (embedding 128 bits): rewriting a Contents object by incremental update increases the size of the original file by 1 to 8 KB, depending on the size of the original Contents object. In our experiments, the average increase in file size is around 1%.

The experimental results and analysis • Increase in the size of carrier file 2. Algorithms 2 and 3 (embedding 128 bits): the size increase caused by algorithm 2 is independent of the original file. Using 4 objects to embed 128 bits adds no more than 1 KB to the original PDF file (0.5% for a 200 KB file). The size increase caused by algorithm 3 is also independent of the original file. Using 22 cross-reference entries (which requires adding 22 new objects) to embed 128 bits, the maximum size increase is around 4 to 5 KB (2.5% for a 200 KB file).

The experimental results and analysis • Performance Comparison

Future work Different versions of PDF are in use at present. Some higher versions use cross-reference streams to store the information about indirect objects. Improving compatibility across different PDF versions is the focus of our next step.

Reference
1. Adobe Systems Incorporated. PDF Reference, fifth edition, version 1.6. http://www.adobe.com/devnet/pdf/pdfs/PDFReference16.pdf, 2006.
2. S. H. Low and N. F. Maxemchuk. Performance comparison of two text marking methods. IEEE Journal on Selected Areas in Communications, Vol. 16, No. 4, 1998, pp. 561-572.
3. J. T. Brassil, et al. Electronic marking and identification techniques to discourage document copying. IEEE Journal on Selected Areas in Communications, Vol. 13, No. 8, 1995, pp. 1495-1504.
4. Shangping Zhong, Tierui Chen. Information Steganography Algorithm Based on PDF Documents. Computer Engineering, Vol. 32, No. 3, Feb. 2006, pp. 161-163.
5. S. H. Low, et al. Document marking and identification using both line and word shifting. In Proceedings INFOCOM’95, Boston, MA, Apr. 1995, pp. 853-860.
6. N. F. Maxemchuk and S. H. Low. Marking text documents. In Proceedings, International Conference on Image Processing, Santa Barbara, CA, Oct. 1997, pp. 13-17.
7. E. Franz and A. Pfitzmann. Steganography secure against Cover-Stego-Attack. 3rd International Workshop, Information Hiding 1999, 2000, pp. 29-46.
8. wbStego Studio. The steganography tool wbStego4. http://www.wbailer.com/wbstego, 2007.
9. Youji Liu, Xingming Sun, Gang Luo. A Novel Information Hiding Algorithm Based on Structure of PDF Document. Computer Engineering, Vol. 32, No. 17, Sep. 2006, pp. 230-232.
10. Xingtong Liu, Quan Zhang, Chaojing Tang, Jingjing Zhao and Jian Liu. A Steganographic Algorithm for Hiding Data in PDF Files Based on Equivalent Transformation. In Information Processing (ISIP), 2008 International Symposiums on, 23-25 May 2008, pp. 417-421.

That’s all. Thanks!
