Dictionary Index Lookup Vs Contains Key Vs List Contains Vs Linq… Speed Test/Texas Tornado Match

Ok so with MVC comes the use of Routes which calls in the need to compare request values to see which route to use. Now before I even bother with that headache (Although it’s getting better AND will be a post) I ran into a situation where I would have to check a passed in string against a list of strings to see if it matches any of them.

One thing I like to do is use Dictionaries. They are just plain convenient when it comes to looking things up or matching values to get methods. But what if I don’t really have a value to find with a key? What if finding the key is all that matters? Say I have a list of strings and I just want to know if the list contains that string, sounds like a job for an array or list right? Wouldn’t it be silly to create a dictionary like:

  Dictiontary<String, String> someList = new Dictiontary<String, String>();
  someList.Add("INeedThis", ""); someList.Add("ThisToo", "");

and do this:

  if(someList.ContainsKey("INeedThis"))

If I don’t actually care about the attached value? I’m sure I’m breaking a rule somewhere… but what if it was faster overall? What if ContainsKey is faster than a list using Any, Contains, FirstOrDefault, or where? Turns out it is. Here’s the method I used.

  public void TimeRun(Holder toHold)
  {
    Int32 maxLength = 1000;

    Dictionary<String, String> stringDictionary = Enumerable.Range(0, maxLength).Select(item => RandomTool.RandomString(item)).ToDictionary(item => item, item => item);
    List<String> stringList = stringDictionary.Select(item => item.Key).ToList();

    String chosenString = stringList[RandomTool.RandomInt32(0, maxLength)];

    Stopwatch runTime = new Stopwatch();

    runTime.Start();
    stringDictionary.ContainsKey(chosenString);
    runTime.Stop();
    toHold.DictionaryContainsKeyTime = runTime.ElapsedTicks;
    runTime.Reset();

    runTime.Start();
    String junk = stringDictionary[chosenString];
    runTime.Stop();
    toHold.DictionaryStraightIndexCheck = runTime.ElapsedTicks;
    runTime.Reset();

    runTime.Start();
    Boolean junkThree = stringList.Contains(chosenString);
    runTime.Stop();
    toHold.ListContains = runTime.ElapsedTicks;
    runTime.Reset();

    runTime.Start();
    Boolean junkTwo = stringList.Any(item => item == chosenString);
    runTime.Stop();
    toHold.ListLinqAny = runTime.ElapsedTicks;
    runTime.Reset();

    runTime.Start();
    String junkFour = stringList.First(item => item == chosenString);
    runTime.Stop();
    toHold.ListLinqFirst = runTime.ElapsedTicks;
    runTime.Reset();

    runTime.Start();
    IEnumerable<String> junkFive = stringList.Where(item => item == chosenString);
    if (junkFive.FirstOrDefault() != String.Empty)
    {

    }
    runTime.Stop();
    toHold.ListLinqWhere = runTime.ElapsedTicks;
    runTime.Reset();
  }

Crazy simple, and why shouldn’t it? Am I right? Am I right? Ok. As you can see, I gave all the methods a run and timed them using StopWatch. And then I ran it a given amount of times, 200 in this code but I tried up to 10000 also. (I’ll put the test code at the end) The test was to go through a list of a thousand strings, each string increasing in length. (Yeah I could have done random size strings but I’m lazy)

What did I find out? Well if it didn’t throw an exception, a straight index search on a dictionary is fastest:

someList["INeedThis"]

And pretty consistently fast. Around 2600 ticks or so on average on multiple runs. (so 10 iterations of parent method running 200-10000 interations of the test method) Next fastest was the ContainsKey method on the dictionary, usually around 2-4 times faster than the next in line good old List.Contains. What I did find surprising is that all the Linq methods failed on this one. I figured that once the first run was through, it would be at least as fast as Contains. (Linq always sucks the first time through) Yeah not so much though. Contains was always faster. Sometimes it was close. Sometimes not even. Here are some example runs:

Dictionary_ContainsKey: 15805
Dictionary_StraightIndexCheck: 2926
List_Contains: 34559
List_LinqAny: 96575
List_LinqFirst: 56541
List_LinqWhere: 64678 

Dictionary_ContainsKey: 7264
Dictionary_StraightIndexCheck: 2676
List_Contains: 29970
List_LinqAny: 41280
List_LinqFirst: 58313
List_LinqWhere: 45669 

Dictionary_ContainsKey: 6773
Dictionary_StraightIndexCheck: 2636
List_Contains: 32366
List_LinqAny: 38670
List_LinqFirst: 33859
List_LinqWhere: 41288

All in ticks. Now mind you, none of these are horribly slow so it probably just comes down to reability and ease of understanding. Personally I like the Dictionary way, so at least speed wise I’m on track. As for looks? That’s a personal thing.

Rest of the code. Here is the parent method. This is a unit test hense the .Assert but it could easily be adapted to any output.

  [TestMethod]
  public void RunTime()
  {
    Int64 overallDictionaryContainsKeyTime = 0;
    Int64 overallDictionaryStraightIndexCheck = 0;
    Int64 overallListContains = 0;
    Int64 overallListLinqAny = 0;
    Int64 overallListLinqFirst = 0;
    Int64 overallListLinqWhere = 0;
    Int32 loopMax = 200;

    for (Int32 loopCounter = 0; loopCounter < loopMax; loopCounter++)
    {
      Holder currentHolder = new Holder();

      TimeRun(currentHolder);
      overallDictionaryContainsKeyTime += currentHolder.DictionaryContainsKeyTime;
      overallDictionaryStraightIndexCheck += currentHolder.DictionaryStraightIndexCheck;
      overallListContains += currentHolder.ListContains;
      overallListLinqAny += currentHolder.ListLinqAny;
      overallListLinqFirst += currentHolder.ListLinqFirst;
      overallListLinqWhere += currentHolder.ListLinqWhere;
    }

    Assert.IsTrue
    (
      false,
      " Dictionary_ContainsKey: " + (overallDictionaryContainsKeyTime / loopMax) +
      " Dictionary_StraightIndexCheck: " + (overallDictionaryStraightIndexCheck / loopMax) +
      " List_Contains: " + (overallListContains / loopMax) +
      " List_LinqAny: " + (overallListLinqAny / loopMax) +
      " List_LinqFirst: " + (overallListLinqFirst / loopMax) +
      " List_LinqWhere: " + (overallListLinqWhere / loopMax)
    );
  }

And the holder class which is a nothing class. I just didn’t care for having to add parameters to the child mehod.

  public class Holder
  {
    public Int64DictionaryContainsKeyTime { get; set; }
    public Int64DictionaryStraightIndexCheck { get; set; }
    public Int64ListLinqAny { get; set; }
    public Int64ListContains { get; set; }
    public Int64ListLinqFirst { get; set; }
    public Int64ListLinqWhere { get; set; }
  }

Couple Notes:

  • StopWatch is in System.Diagnostics
  • RandomTool is actual a class of mine. Nothing special about it. Just makes a string of X length with all random letters.
  • This can not be rebroadcast or retransmitted without the express written permission of my mom.

Hashing and you

This is really simple, but don’t feel bad if you didn’t know about this. No one tells me nothin’ either.

The other day while reading through a blog, I got schooled without going to class. I had no idea that passwords should be one way only, as in you shouldn’t be able to retrieve a forgotten password, only reset it. Well shoot, I missed that one. So they start talking about hashing the password and saving that. Good thing I had built something to hash request items in a url to help stop people with screwing with a site url. Well ok, I didn’t totally build it. I got the idea from somewhere else. But I f-ing integrated it so don’t look at me like that.

Basically you take a password, add a salt (fancy name for a predetermined string or number) to the password, and get the hash value for that.

Why getting the same hash everytime sucks

Well the problem with hashing, or really the reason why we are doing this, is that the word “pass” creates the same hash value every time. Now if some sort of mean person figures out the hash for “pass” (That isn’t too hard since most people use some sort of standard like the .net MD5CryptoServiceProvider), he/she could search for a slew of typical password words. See the problem? Now enters the salt. The salt is some word that you come up with to add anywhere you want. This makes it a little difficult to figure out since the word “passSalt” is no longer like “pass”. one could try every word known and still fail because said person doesn’t know to add the word “Salt”.

Why getting the same hash everytime is great:

Right now you might be wonder what the point of having a password saved that can’t be unhashed to check against. Easy, you don’t have to unhash. Since the hash for “passSalt” will always be the same, the password in the database will always match the entered password + the salt. So basically the user enters the password, the system adds the salt to the password, the system then gets the hash, and finally checks that against the database record. Fun huh?

Now for the code:


public class HashString
{
 private const String DEFAULT_SALT = "8745";

 public static String CreateHash(String originalValue, String salt)
 {
   Byte[] computedHash;
   StringBuilder hashedValues;
   StringBuilder hashString;

   //Take the string and append your "secret" string at the end.
   hashString = new StringBuilder(originalValue);
   hashString.Append(salt);

   //Get the hash for the new word.
   computedHash = new MD5CryptoServiceProvider().ComputeHash(Encoding.UTF8.GetBytes(hashString.ToString()));
   hashedValues = new StringBuilder();

   //Go through computedHash and create a string from it.
   computedHash.ToList().ForEach(currentItem =>
ashedValues.Append(currentItem.ToString("x2")));

   return hashedValues.ToString();
 }

 //Not needed overload, just have it here for ease of use
 public static String CreateHash(String originalValue)
 {
   return CreateHash(originalValue, DEFAULT_SALT);
 }
}

And I even have a test for you… if you have VS Professional or higher. If not, no big deal. Just remove the asserts, attributes, and step through.


[TestClass]
public class HashStringTest
{
 private class UserTable
 {
   private const String SALT_VALUE = "SomeValue";
   private List table;

   public UserTable()
   {
     table = new List();
   }

   public void AddUser(String password, String userName)
   {
     table.Add(new UserRow(){Password = HashString.CreateHash(password, SALT_VALUE), UserName = userName});
   }

   public Boolean UserExists(String password, String userName)
   {
     String hashedPassword;

     hashedPassword = HashString.CreateHash(password, SALT_VALUE);

     var query = from user in table
                 where user.Password == hashedPassword && user.UserName == userName
                 select user;

     return query.ToList().Count == 1;
   }

   private class UserRow
   {
     public String Password { get; set; }
     public String UserName { get; set; }
   }
 }

 [TestMethod]
 public void CreateHash_PasswordMatchPass()
 {
   UserTable table;
   String userName;
   String goodPassword;
   String failedPassword;

   userName = "SomeUser";
   goodPassword = "goodPassword";

   table = new UserTable();
   table.AddUser(goodPassword, userName);
   Assert.IsTrue(table.UserExists(goodPassword, userName));

   failedPassword = "failedPassword";
   Assert.IsFalse(table.UserExists(userName, failedPassword));
 }
}

And last, the namespaces:


using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;
using System.Text;
using System.Security.Cryptography;
using Microsoft.VisualStudio.TestTools.UnitTesting;

For the record, I hate the word salt in this use.