
All times are UTC - 5 hours [ DST ]



 Post subject: I'm new to Java/Hibernate...
PostPosted: Thu Dec 09, 2004 1:34 pm 
Newbie

Joined: Thu Dec 09, 2004 12:51 pm
Posts: 7
Location: Boston, MA
Hello, I'm a developer with about 10 years' experience, about 5 of which have been in database-backed web applications. I'm looking into using Java for some new projects, and Hibernate caught my eye as a good ORM component. I've been working my way through the HIA (Hibernate in Action) book and reading some of the posts on this forum.

I'm very excited and impressed with most of what I'm seeing in Hibernate.
A few items in the book and in the forums currently disturb me greatly, however. Could you provide links or references to documents that back up the following opinions of the Hibernate team, and/or that document how my use of Hibernate will be limited if I design my mapping/database against the team's advice?

1. Extreme bias against natural keys.
I've always found that natural keys keep my databases more usable and increase performance, while possibly requiring additional storage space. They remove many joins whose sole purpose is to "look up" a useful value for humans, both in debugging and in user interface display. Of course there are times when surrogate keys are more appropriate as well. I was also a little surprised to see HIA's definition of primary key include the third bullet, "The value or values of a particular row never change." That is not a normal requirement for a primary key, though of course it is a very strong recommendation. The databases I've worked with have good support for referential integrity, and on the rare occasions when I've had to update a PK (natural or surrogate) the ON UPDATE clauses have allowed the change to propagate to all needed locations in the schema.

In short, the reading, developing and experimenting I've done indicate that the natural vs. surrogate key decision is almost always a matter of personal opinion. If I choose a natural key on the occasions where I feel it is more appropriate than a surrogate key, am I crippling myself with regard to Hibernate's ability to help manage my objects?

2. Denormalization, pseudo-"repeating groups"
The HIA book promotes "more classes than tables". The suggestion in section 3.5 to store all the columns for two separate types of addresses in the same row as the user information invokes "denormalization" for performance reasons. I'm a fairly strong supporter of Date's, Darwen's and Pascal's writing, so any such mention is a red flag to me. Furthermore, extending the table with the two sets of columns is a "smell" close to repeating groups. The book suggests an Address class to deal with the repeated columns on the application side; why not an Address table to store them? If I choose a design/mapping that "promotes" as many tables as classes, or more, will my use of Hibernate suffer? I've yet to fully understand some of the implications of user-defined types in Hibernate, which might provide a happy middle ground for me.

The first two chapters of the book appeared to embrace relational theory; it's part of what drew me to the book and Hibernate. Now it seems less "friendly".


 Post subject:
PostPosted: Thu Dec 09, 2004 5:22 pm 
Hibernate Team
Hibernate Team

Joined: Tue Aug 26, 2003 3:00 pm
Posts: 1816
Location: Austin, TX
Quote:
I've always found that natural keys keep my databases more usable and increase performance

Huh? The tale I love to tell regarding the usability of natural keys is about the time my manager decided that a user's username would make a great PK for the user table. The part I loved is when executives came saying that they wanted their usernames changed because they did not like them, and my manager had not the balls to say no. Guess who ended up finding and cleaning up all those FK references? The point is that a value with meaning to the application which will never change is *extremely* difficult to find.

And as for increasing performance, how did you surmise that? My experience has been the exact opposite. Synthetic keys tend to be simple numerics like integers, which are very fast to index.

But regardless, Hibernate has support for defining things either way. Of course natural keys cannot be auto-generated for you (that'd be against their definition) and so you will miss out on a bunch of the niceties afforded by Hibernate in regards to auto-generated ids.
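To make the rename cost discussed above concrete, here is a minimal in-memory sketch in plain Java. It is not Hibernate code and not real SQL: each list stands in for a column, and every class, method and value name is hypothetical. With the username as the key, a rename has to touch the user row plus every referencing row (the work an ON UPDATE CASCADE clause would do inside the database); with a surrogate id, only the user row changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-ins for two schema designs.
// No database and no Hibernate are involved; each list plays the role of a column.
class KeyRenameSketch {

    // Design A: USERS is keyed by username, and ORDERS.user_fk stores that username.
    // Returns how many rows had to be rewritten to rename one user.
    static int renameWithNaturalKey(List<String> userKeys, List<String> orderUserFks,
                                    String oldName, String newName) {
        int idx = userKeys.indexOf(oldName);
        if (idx < 0) {
            return 0; // unknown user, nothing to do
        }
        userKeys.set(idx, newName);
        int touched = 1;
        // Every referencing row must follow the key change.
        for (int i = 0; i < orderUserFks.size(); i++) {
            if (orderUserFks.get(i).equals(oldName)) {
                orderUserFks.set(i, newName);
                touched++;
            }
        }
        return touched;
    }

    // Design B: USERS is keyed by a surrogate id (here simply the list index),
    // and the username is an ordinary column. Referencing rows store the id,
    // so they are unaffected by a rename.
    static int renameWithSurrogateKey(List<String> usernamesById, int userId, String newName) {
        usernamesById.set(userId, newName);
        return 1; // only the user row changes
    }

    // Small helper that builds a mutable "column" of values.
    static List<String> rows(String... values) {
        return new ArrayList<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        List<String> users = rows("jsmith", "mdoe");
        List<String> orderFks = rows("jsmith", "mdoe", "jsmith");
        System.out.println("natural key rows touched: "
                + renameWithNaturalKey(users, orderFks, "jsmith", "john.smith"));
        System.out.println("surrogate key rows touched: "
                + renameWithSurrogateKey(rows("jsmith", "mdoe"), 0, "john.smith"));
    }
}
```

The sketch only counts rows; whether that extra work matters in practice depends on how often keys change and on how well the DBMS handles cascading updates.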


So why is denormalizing for read inherently a bad thing? It's interesting that you use the example of an address (or maybe that's what the book used). People have addresses; companies have addresses; etc. So most data modelers end up modeling this exactly as you described. But here's the thing. When is the last time you saw an application in which a person(s) and a company(s) shared an address pk value? I've seen one application in my 10+ years of programming where an ADDRESS table row could validly be referenced by more than one defined FK. So why reflexively build in the performance overhead of normalizing this? For storage, right? That's a practical consideration, not a "relational modeling" consideration. Or perhaps you are concerned about having to add a UNIVERSE column to ADDRESS in a few years :)

But, again, regardless, Hibernate will let you do this either way you want. You just need to consider that Hibernate defines separate lifecycles for entities (address as a table) and for components (address as denormalized columns).
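As a rough analogy for the entity/component distinction, here is a plain-Java sketch. It contains no Hibernate mapping or API calls, and all class and field names are made up for illustration. The component-style address has no identifier and is compared by value, so it only exists as part of its owning user; the entity-style address has its own identifier, is compared by that identifier, and can be referenced by more than one owner.

```java
import java.util.Objects;

// Component-style address: a value with no identifier of its own.
// It lives inside its owner (think HOME_STREET, HOME_CITY, ... columns on the user row).
final class AddressValue {
    final String street;
    final String city;
    final String zip;

    AddressValue(String street, String city, String zip) {
        this.street = street;
        this.city = city;
        this.zip = zip;
    }

    // Two values are the same address if all their fields match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AddressValue)) return false;
        AddressValue other = (AddressValue) o;
        return street.equals(other.street) && city.equals(other.city) && zip.equals(other.zip);
    }

    @Override
    public int hashCode() {
        return Objects.hash(street, city, zip);
    }
}

// Entity-style address: a row in its own table with its own identifier.
// Identity comes from the id, not from the current field values.
final class AddressEntity {
    final long id;
    String street;

    AddressEntity(long id, String street) {
        this.id = id;
        this.street = street;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AddressEntity && ((AddressEntity) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

// Owner holding two component addresses (the "more classes than tables" layout).
class UserWithComponents {
    AddressValue homeAddress;
    AddressValue billingAddress;
}

// Owner holding a reference to a separately stored address.
class UserWithAddressEntity {
    AddressEntity address;
}
```

In this analogy, two users can point at the same `AddressEntity` and both see a change made through either reference, while each `UserWithComponents` carries its own copies. That difference in sharing and lifetime is the design question, independent of how many columns end up in which table.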


 Post subject:
PostPosted: Thu Dec 09, 2004 7:23 pm 
Newbie

Joined: Thu Dec 09, 2004 12:51 pm
Posts: 7
Location: Boston, MA
steve wrote:
And as for increasing performance, how did you surmise that? My experience has been the exact opposite. Synthetic keys tend to be simple numerics like integers, which are very fast to index.


My first schema for an application was a completely normalized, all surrogate key design.

About 10% of my (select) queries are based around a self join of a large table with supporting joins to between 4 and 8 other tables, sometimes 8-16 (if both ends of the self join need the full supporting information). Normally half of these supporting joins were needed only to convert a surrogate key to a displayable value. Another 89% of my queries are typically trivial single-table queries on indexed columns. The remaining 1% are even larger joins, but rarely needed.

Switching those relations that could support a reasonably strong argument for a natural key over to the natural key cut out over half the joins and reduced the query execution time by a factor of 8. The simple queries showed a very slight increase in execution time that, even under load, was typically lost in the network latency from the webserver to the user's browser. I didn't notice any measurable changes in the updates/inserts/deletes.
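A small plain-Java sketch of where those joins come from, using a hypothetical order-status look-up table (the names and values are invented, and nothing here measures performance; the factor of 8 above came from my real schema). With a surrogate key, every row needs a trip through the look-up table before a human can read it; with the natural key, the referencing column already holds the displayable value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: arrays stand in for a column of a large table,
// and the map stands in for a small look-up table (LUT).
class LookupJoinSketch {

    // Surrogate-key design: rows store status ids, so producing a displayable
    // value needs one look-up per row (the in-memory equivalent of a join).
    static List<String> displayWithSurrogateKeys(int[] statusIds, Map<Integer, String> statusLut) {
        List<String> out = new ArrayList<>();
        for (int id : statusIds) {
            out.add(statusLut.get(id));
        }
        return out;
    }

    // Natural-key design: rows store the status code itself, so no look-up is needed.
    // The cost is a wider value repeated in every referencing row.
    static List<String> displayWithNaturalKeys(String[] statusCodes) {
        return new ArrayList<>(Arrays.asList(statusCodes));
    }

    public static void main(String[] args) {
        Map<Integer, String> lut = new HashMap<>();
        lut.put(1, "OPEN");
        lut.put(2, "SHIPPED");
        System.out.println(displayWithSurrogateKeys(new int[] {2, 1, 2}, lut));
        System.out.println(displayWithNaturalKeys(new String[] {"SHIPPED", "OPEN", "SHIPPED"}));
    }
}
```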

Now, maybe two-thirds of the eliminated joins could probably be dealt with by the second-level caching offered by Hibernate. It's definitely something I'll need to experiment with; these are small (<25 rows) to moderate (<200 rows), relatively constant look-up tables from the perspective of most of the application.

My second-generation schema uses surrogate keys only when no good natural key suggests itself, and a composite key is almost never considered "good" unless I can't even dream of a case where I'd want to refer to it... and even then I'm cautious. The schema is also very useful from a development standpoint, as ad-hoc queries return much more readable data when you don't need to join into the LUT.

Quote:
So why is denormalizing for read inherently a bad thing? It's interesting that you use the example of an address (or maybe that's what the book used). People have addresses; companies have addresses; etc. So most data modelers end up modeling this exactly as you described. But here's the thing. When is the last time you saw an application in which a person(s) and a company(s) shared an address pk value? I've seen one application in my 10+ years of programming where an ADDRESS table row could validly be referenced by more than one defined FK. So why reflexively build in the performance overhead of normalizing this? For storage, right? That's a practical consideration, not a "relational modeling" consideration. Or perhaps you are concerned about having to add a UNIVERSE column to ADDRESS in a few years :)

(Yes that is the example the book used.)
No, no need for a UNIVERSE column, nor do I often take normalization to the extreme that would split city/state off to a separate table referenced by zip_code (assuming US only, for now).

However, in my application I commonly have multiple entities referencing the same address, so that example jumped right out at me. I do agree with you that in general it's not commonly a shared reference, in which case I'd probably produce a data model with person_address, company_address, etc. tables. Possibly "over-normalization", but it keeps the row size down, which can be a win in some cases, typically in heavily used tables.

With the single exception of the zip-code functional dependency, every time I've denormalized, I've regretted it later. When I've found a place where it appeared denormalization could help improve a poor performance area, I've also found that something else was more to blame: perhaps a bad algorithm, perhaps a poorly thought-out query. In some instances it was better to produce a temporary table (or materialized view, depending on your DBMS) for some reporting process, especially with historic data.

But I'll have to remember that, due to the language in which I've done most of my database-backed (web) applications, I haven't had a very rich OO environment in which to work. Lessons learned might not apply... Part of the reason I'm looking at Java and Hibernate (and other parts of the Java alphabet soup) is to get a better OO model, and if that means evolving my understanding of DB usage then I had better do that...

Thank you for your reply.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.