Optimization – ByATool

So something I’ve come across time and time again since starting to use ORMs (about three years ago) is the infamous “hydrate a reference object just to save another.” If you haven’t run into this before, you probably will, so here’s what it is. Say I have a “Manny” object and I need to assign some kind of “fertility drug type” to it. This “fertility drug type” basically represents a table that has an ID and a description and nothing else. Now the question is: if I know the ID of “HCG” is 1, should I even bother getting it from the database? I mean, after all:

  // First(predicate) returns the matching Drug; Select(predicate) would only project booleans.
  Drug totallyNotUsingVitamins = toolContext.Drugs.First(drug => drug.Name == "HCG");
  manny.DrugOfChoice = totallyNotUsingVitamins;

Looks pointless when I can do this:

  playersObject.DrugTypeReference.EntityKey =
     new EntityKey("ToolEntities.BallPlayers", "drugOfChoiceID", 1);

Which was taken from here.

So here’s the problem: Why should I have to hit the database if I know what the ID is? I mean, that’s a trip I don’t have to make right?

If you’re asking that question, then it’s possible you don’t understand how a lot of ORMs work, or in this site’s case: The Entity Framework.

What it really comes down to is a few arguments:

What is the cost if it hits the database?
Is it actually hitting the database?
It’s just easier to set the ID

What is the cost if it hits the database?

Now, for the first one you could almost make an argument. After all, what’s the point of hitting the database if you don’t need to? This might seem silly:

  using(ToolEntities toolContext = new ToolEntities())
  {
    Drug totallyNotUsingVitamins = toolContext.Drugs.First(drug => drug.Name == "HCG");
    manny.DrugOfChoice = totallyNotUsingVitamins;
  }

if that is the only time you’re using it. A one-off like that does seem like overdoing it. But is it really? This is probably the strongest argument for setting the ID directly, so what is the real cost? In order to persist the object, a connection has to be created and the entity has to be saved. So in reality you have to create a connection anyhow. Getting and hydrating the reference object is just tagged onto that, and unless there is something wrong with the table or the connection, it adds a minimal amount of time.

Is it actually hitting the database?

But what if this has to be run a million times? Then what? I have to get that item a million times and now my hits have gone from one million to two million?

  using(ToolEntities toolContext = new ToolEntities())
  {
    for(Int32 loopCounter = 0; loopCounter < someNumber; loopCounter ++)
    {
      // Where filters the rows; First then returns the single matching Drug.
      Drug totallyNotUsingVitamins = toolContext.Drugs
                                       .Where(drug => drug.Name == "HCG")
                                       .First();
      manny.DrugOfChoice = totallyNotUsingVitamins;
    }
  }

OK, not the best example, but the point is that within that using block you would be updating a million records, so you would assume that

  Drug totallyNotUsingVitamins = toolContext.Drugs.First(drug => drug.Name == "HCG");

would be run a million times, for a possible two million hits in total. You could make the argument that Entity Framework does run a select against the database even if the object is already in the context. This is something I’ve thought rather odd myself. However, it’s just a query on an already open connection. And since the query is built the first time around, running the actual query again takes minimal effort. Basically, the hard work was done the first time around. The next time is nothing to really worry about.
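If the repeated lookup still bothers you, one option is to fetch the reference object once, before the loop, and reuse it for every record. What follows is only a minimal sketch. It uses an in-memory stand-in instead of a real Entity Framework context so that it can run on its own. `FakeDrugStore`, `FindByName`, `QueryCount`, `Player`, and `HoistedLookupDemo` are hypothetical names made up for illustration; only `Drug` and `DrugOfChoice` come from the examples above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Drug
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Player
{
    public Drug DrugOfChoice { get; set; }
}

// Hypothetical stand-in for the context: it only counts how many lookups are run.
public class FakeDrugStore
{
    private readonly List<Drug> drugs = new List<Drug>
    {
        new Drug { Id = 1, Name = "HCG" }
    };

    public int QueryCount { get; private set; }

    public Drug FindByName(string name)
    {
        QueryCount++;
        return drugs.First(drug => drug.Name == name);
    }
}

public static class HoistedLookupDemo
{
    // Looks the drug up once, assigns it to every player,
    // and returns how many lookups the store has served.
    public static int AssignToAll(FakeDrugStore store, IList<Player> players)
    {
        Drug totallyNotUsingVitamins = store.FindByName("HCG");
        foreach (Player player in players)
        {
            player.DrugOfChoice = totallyNotUsingVitamins;
        }
        return store.QueryCount;
    }

    public static void Main()
    {
        List<Player> players = Enumerable.Range(0, 1000)
                                         .Select(_ => new Player())
                                         .ToList();
        int lookups = AssignToAll(new FakeDrugStore(), players);
        Console.WriteLine(lookups); // prints 1
    }
}
```

The code stays just as readable as the version inside the loop, since it still says “HCG” by name, but the lookup happens once no matter how many records get updated.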

It’s just easier to set the ID

OK, so this one is hard to discredit, since it is easier. It’s about one line easier. Is one line really worth it? Is it worth giving up the ease of understanding? In the first example, it’s pretty easy to understand what I’m setting: select a drug type of “HCG”. The second one, not so much. All I see is an ID, so if I want to know what drug that is, I have to go searching for what that ID means. Also, at a spec level, it’s clearer to say:

Set DrugOfChoice to “HCG”

Than

Set DrugOfChoice to 1

Now I may be reaching on this last example, but as I said, it will be hard to prove that the second example isn’t “easier” coding-wise. After all, it’s one less line. However, when things like readability come in, I can’t see the second being the winner in that contest.

Once a person understands how an ORM like the Entity Framework works, it becomes apparent that some of the old ways of thinking need to be abandoned, as in the long run they only hold overall development back. And beyond that, sometimes when we think we’re optimizing, we could actually be doing more harm than good. Losing readability for the sake of ridiculously small amounts of time just doesn’t make sense anymore.


Just Use the Stupid Functionality