Approaches to Persistence in Java Philip Johnson Collaborative Software Development Laboratory Information and Computer Sciences University of Hawaii Honolulu HI 96822
Part 1: Small Scale Persistence • Small scale persistence: • Dozens to hundreds of users • Single machine • All persistent objects can fit in memory at once • No transaction, rollback, or fail-over • Cheap and fast to implement • Large scale persistence: • Thousands to millions of users • Clusters of machines, shared caching • Lazy/incremental loading of objects • Transaction, rollback, and fail-over required • Costly and time-consuming to implement
Common motivations • Persistence across application restarts/failure • Avoid data loss • Checkpoints • Allow rollback to previous state • Transfer of data between multiple applications • Synchronous vs. asynchronous • Example: network vs. file system • Application-specific vs. portable • Example: Serialized object vs. XML • Caching of intermediate results • Avoid computation loss
Some flavors of persistence • Simple, key-value: • java.util.Properties • java.util.prefs package • JNDI • Object-based: • java.io.Serializable • JavaBean persistence • Java Data Objects (JDO) • XML-based: • JDOM • JAXB • Database: • Relational • Object-oriented • Enterprise Java Beans: • Entity & Session beans • CMP & BMP
Persistence creates development issues • Persistence tends to slow down development. • Adds cost & risk to major design changes. • Tends to “lock in” early (bad) design decisions. • Why is persistence a problem? • The Object-Relational Impedance Mismatch • Multiple design issues and constraints • How can we maintain development velocity in face of need for persistence? • A “Late-binding persistence” development strategy
Object-Relational Impedance Mismatch • Object paradigm: • Networks of objects with state and behavior • Processing via: traversal • Classes, inheritance, polymorphism, etc. • Relational paradigm: • Tables of entities with only data • Processing via: selection/joining of rows • Tables, columns, keys, indices, etc. • The intrinsic differences between the paradigms create design problems.
Example • Consider a family tree. • Consider the query “Return all of the grandchildren of Family Member X” • Which representation would make this query easiest to implement? • A network (tree) of family members • A set of database tables and SQL statements
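To make the comparison concrete, here is a minimal sketch of the grandchildren query in both styles. The `FamilyMember` class, the `member(id, name, parent_id)` table, and all names below are hypothetical, chosen only for illustration; the SQL is shown as a string and is not executed.

```java
import java.util.ArrayList;
import java.util.List;

/** A node in an in-memory family tree (hypothetical class for illustration). */
final class FamilyMember {
    private final String name;
    private final List<FamilyMember> children = new ArrayList<>();

    FamilyMember(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Adds a child and returns it so deeper levels can be attached. */
    FamilyMember addChild(String childName) {
        FamilyMember child = new FamilyMember(childName);
        children.add(child);
        return child;
    }

    /** Object paradigm: grandchildren are found by traversing two levels. */
    List<FamilyMember> grandchildren() {
        List<FamilyMember> result = new ArrayList<>();
        for (FamilyMember child : children) {
            result.addAll(child.children);
        }
        return result;
    }
}

final class FamilyTreeDemo {
    /** Relational paradigm: the same query as a self-join on an assumed table. */
    static final String GRANDCHILDREN_SQL =
        "SELECT gc.name FROM member c "
      + "JOIN member gc ON gc.parent_id = c.id "
      + "WHERE c.parent_id = ?";

    /** Builds a small tree: X has children A and B; A has A1; B has B1 and B2. */
    static FamilyMember sampleTree() {
        FamilyMember x = new FamilyMember("X");
        x.addChild("A").addChild("A1");
        FamilyMember b = x.addChild("B");
        b.addChild("B1");
        b.addChild("B2");
        return x;
    }

    static List<String> grandchildNames(FamilyMember member) {
        List<String> names = new ArrayList<>();
        for (FamilyMember grandchild : member.grandchildren()) {
            names.add(grandchild.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(grandchildNames(sampleTree())); // prints [A1, B1, B2]
    }
}
```

The traversal reads like the question being asked, while the SQL version needs a join of the table with itself. Deeper queries (all descendants, say) widen that gap.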
Addressing the OO-DB IM • 1. Eliminate the OO: • User interface manipulates SQL. • Pros: single paradigm, simplicity • Cons: complexity of “advanced” processing (stored procedures, etc.) • 2. Eliminate the relational DB: • OODBs, Serialized objects, etc. • Pros: single paradigm, simplicity • Cons: potential loss of relational data integrity (normalization)
Addressing the OO-DB IM • 3. Hide the DB: • Object-to-relational mappings, JDO, EJB... • Pros: Allows use of back-end RDBMSs • Cons: Complexity, lock-in, overhead • 4. Stop whining and just deal with it: • Manual mapping between objects and tables • Pros: Flexibility • Cons: Maintenance and complexity
Choice of persistence depends upon many design issues • Simplicity: How complicated to set up for me? For my users? • Financial cost: Do I have to pay for it? Do my users? • Data specificity: What kinds of data am I saving? Can I use a “special purpose” persistence mechanism? • Design lock-in: How much code will I have to change if I need to change my mechanism? • Longevity: How long must the persistent data exist? • Scalability: What usage level do I expect over the next six months? • Integrity: Do I require transactions? Rollback? Fail-over? • OO-Relational impedance mismatch: Do I mind the cost? • Optimization: Do I need something faster than a relational database?
One development approach: Late-binding persistence • Initial development: No persistence. • Deploy initial versions to users on a “trial basis” with no persistence guarantees. • Early “live” releases: simple, “non-scalable”. • Enable data migration. • Determine true bottlenecks/integrity issues. • Maintain application evolvability. • Ongoing development: • Think about multiple persistence approaches. • Example: Preferences + XML + RDBMS • Each approach optimized to its persistence requirements. • Applicability of this approach depends upon the nature of the system/requirements!
Preference and configuration data • java.util.Properties: • Well known, easy to use • No standards as to where data should reside • Problems for backup, or transfer to other machines. • JNDI (Java Naming and Directory Interface): • Back-end neutrality • Large, complicated to set up • java.util.prefs (JDK 1.4): • Back-end neutrality of JNDI • Simplicity of java.util.Properties • Can be invoked by multiple threads safely
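As a rough illustration of the key-value style, the sketch below round-trips a `java.util.Properties` object through its text format. It writes to an in-memory string rather than a file, which sidesteps the "where should the data reside" question; the class name `PropertiesDemo` and the keys are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesDemo {
    /** Serializes the properties to the standard key=value text format. */
    static String save(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "application settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses key=value text back into a Properties object. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties settings = new Properties();
        settings.setProperty("window.width", "800");
        settings.setProperty("user.name", "guest");

        Properties restored = load(save(settings));
        System.out.println(restored.getProperty("window.width"));           // prints 800
        System.out.println(restored.getProperty("missing.key", "default")); // prints default
    }
}
```

In a real application the writer and reader would wrap a file whose location the application has to choose itself. `java.util.prefs.Preferences` offers a similar `put`/`get` style while leaving the storage location to the platform.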
Object-based persistence: java.io.Serializable • Pros: • Converts an object (and all internal objects) into a stream of bytes that can be later deserialized into a copy of the original object (and all internal objects). • Fast, simple, compact representation of an object graph. • May be a great choice for *temporary* storage of data. • Cons: • Creates long-term maintenance issues • Harder to evolve objects and maintain backward compatibility with the serialized representation. • See “Effective Java”, Chapter 10, for a good description of issues with Serialization
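The round trip described above can be sketched as follows. `Checkpoint` is a hypothetical class invented for this example, and the bytes are kept in memory instead of being written to a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** A small object graph: a label plus a list of strings (hypothetical example class). */
final class Checkpoint implements Serializable {
    // Declaring the version explicitly avoids an auto-computed ID that changes with the class.
    private static final long serialVersionUID = 1L;

    private final String label;
    private final List<String> completedSteps;

    Checkpoint(String label, List<String> completedSteps) {
        this.label = label;
        this.completedSteps = new ArrayList<>(completedSteps);
    }

    String getLabel() {
        return label;
    }

    List<String> getCompletedSteps() {
        return completedSteps;
    }
}

final class SerializationDemo {
    /** Writes the object and everything it references to a byte array. */
    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds a copy of the object graph from the bytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> steps = new ArrayList<>();
        steps.add("parse");
        steps.add("index");
        Checkpoint original = new Checkpoint("run-1", steps);

        Checkpoint copy = (Checkpoint) fromBytes(toBytes(original));
        System.out.println(copy.getLabel() + " " + copy.getCompletedSteps());
        System.out.println(copy != original); // a distinct object with the same state
    }
}
```

The maintenance cost shows up later: renaming or retyping a field of `Checkpoint` can make previously stored bytes unreadable, which is why this suits temporary storage better than long-lived data.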
XML file-based persistence: JDOM and JAXB • Pros: • Very high level of data portability. • Simple • Cons: • Space-inefficient • Complex graph structures problematic. • For data structures: • JDOM • For bi-directional object mappings: • JAXB
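For a feel of XML as a storage format, here is a minimal sketch that saves a record as XML text and reads it back. It deliberately uses the DOM and transformer APIs bundled with the JDK rather than JDOM or JAXB, so that it runs without extra libraries; the `user` and `email` element names and the `XmlPersistenceDemo` class are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class XmlPersistenceDemo {
    /** Builds a <user name="..."><email>...</email></user> document and renders it as text. */
    static String toXml(String name, String email) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();

            Element user = doc.createElement("user");
            user.setAttribute("name", name);
            Element mail = doc.createElement("email");
            mail.setTextContent(email);
            user.appendChild(mail);
            doc.appendChild(user);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    /** Parses the XML text and returns {name, email}. */
    static String[] fromXml(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element user = doc.getDocumentElement();
            String email = user.getElementsByTagName("email").item(0).getTextContent();
            return new String[] {user.getAttribute("name"), email};
        } catch (Exception e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml("Ada", "ada@example.org");
        System.out.println(xml);
        String[] restored = fromXml(xml);
        System.out.println(restored[0] + " <" + restored[1] + ">");
    }
}
```

The text form is readable by any XML-aware tool, which is where the portability comes from. The verbosity of both the code and the output hints at the space cost, and a flat record like this avoids the harder problem of shared or cyclic references in an object graph.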
Java-based RDBMS • Most important one is Derby. • http://db.apache.org/derby • Will be included in JDK 1.6! • Can run as either ‘embedded’ in your application JVM or as a stand-alone network server. • To embed, just add derby.jar to your classpath!
Open Source Java Persistence Frameworks • Hibernate (www.hibernate.org): • Object to RDBMS binding • “Hibernate Query Language” • Claims to be very fast, very scalable, very efficient. • Most popular open source framework for object/relational mapping in Java.
Others • Enterprise Java Beans • Public standard framework • Simple reference implementation • Support for clustering, fail-over, etc. in distributed applications. • Firestorm/DAO • Automatically generates Java source code for accessing relational databases.
Things to think about • Sometimes simple is better • Try the least complicated persistence mechanism first. • Sometimes you can mix and match • Not all data must necessarily be persisted the same way • You can evolve your solution over time • Especially if you design your system to encapsulate your persistence mechanism.
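One way to encapsulate the persistence mechanism is to put it behind a small interface, so that the rest of the application never names a concrete storage technology. The sketch below is only an illustration of that idea; `NoteStore`, `InMemoryNoteStore`, and `NoteService` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** The only persistence API the rest of the application is allowed to see. */
interface NoteStore {
    void save(String id, String text);

    Optional<String> find(String id);
}

/** Simplest possible implementation: no persistence at all, just a map. */
final class InMemoryNoteStore implements NoteStore {
    private final Map<String, String> notes = new LinkedHashMap<>();

    @Override
    public void save(String id, String text) {
        notes.put(id, text);
    }

    @Override
    public Optional<String> find(String id) {
        return Optional.ofNullable(notes.get(id));
    }
}

/** Application logic depends on the interface, not on any storage technology. */
final class NoteService {
    private final NoteStore store;

    NoteService(NoteStore store) {
        this.store = store;
    }

    void addNote(String id, String text) {
        store.save(id, text.trim());
    }

    String show(String id) {
        return store.find(id).orElse("(no such note)");
    }
}

final class EncapsulationDemo {
    public static void main(String[] args) {
        // Swapping in a file-, XML-, or database-backed store would only change this line.
        NoteService service = new NoteService(new InMemoryNoteStore());
        service.addNote("n1", "  buy milk  ");
        System.out.println(service.show("n1")); // prints buy milk
        System.out.println(service.show("n2")); // prints (no such note)
    }
}
```

Starting with the in-memory store and replacing it later keeps early development fast, and different kinds of data can sit behind different implementations of the same style of interface.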
Things to think about • Your persistence strategy might depend on context: • Java First: • You’ve developed/inherited some Java code and need someplace to store the objects. • Hibernate • Database First: • You’ve developed/inherited a database and want to access it in Java. • Firestorm/DAO • Spaghetti Junction: • You’ve inherited Java code and a DB and want to put the two together. • Uh oh.
Things to think about • IF: Client-side, single thread, simple structure, installation simplicity • DB optional, consider XML. • IF: Multiple clients need access to data • DB highly recommended • IF: Transaction support, fail-over, etc: • DB required