Sunday, July 1, 2012

LINQ Join, LINQ Distinct, LINQ Union, overriding equality on C# Structs, and lookups are all jammed into this fat blog posting!

I've been reading about LINQ in "C# 4.0 in a Nutshell" by Joseph Albahari and Ben Albahari! When working against local queries (LINQ over in-memory collections), it may be more efficient to do a Join instead of a Select. In a joinesque selection, a Select iterates through the outer collection and, for each item in it, iterates through the entire inner collection all over again, so the work grows with the product of the two collections' sizes. A Join instead puts the inner collection into a lookup, which has a dictionaryesque grab-item-by-key flavor, so the pertinent inner objects for each outer item are found by key without crawling through all of their kin. The four tests which follow show off joining, and all of them pass.
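Before the tests, here is a small standalone sketch of the Select-versus-Join difference described above. It is an illustration under assumptions, not code from the book: Owner, Vehicle, and JoinSketch are hypothetical stand-ins for RegisteredDriver and Car. NestedScan is the Select-style approach, LookupProbe builds the key index by hand with ToLookup, and BuiltInJoin uses Join; all three should produce the same rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-ins for RegisteredDriver and Car.
public class Owner
{
    public int License;
    public Owner(int license) { License = license; }
}

public class Vehicle
{
    public string Plate;
    public int OwnerLicense;
    public Vehicle(string plate, int ownerLicense) { Plate = plate; OwnerLicense = ownerLicense; }
}

public static class JoinSketch
{
    static readonly List<Owner> Owners = new List<Owner> { new Owner(1), new Owner(2), new Owner(3) };
    static readonly List<Vehicle> Vehicles = new List<Vehicle>
    {
        new Vehicle("AAA", 1), new Vehicle("BBB", 2), new Vehicle("CCC", 2)
    };

    // Select-style join: for every owner, Where crawls the whole vehicle list again,
    // so the work grows with Owners.Count * Vehicles.Count.
    public static string NestedScan()
    {
        return string.Join(", ", Owners.SelectMany(
            o => Vehicles.Where(v => v.OwnerLicense == o.License),
            (o, v) => o.License + ":" + v.Plate));
    }

    // Lookup-style join: index the vehicles by key once, then probe the index per owner.
    // A lookup returns an empty sequence for a missing key, so owner 3 yields nothing.
    public static string LookupProbe()
    {
        ILookup<int, Vehicle> byOwner = Vehicles.ToLookup(v => v.OwnerLicense);
        return string.Join(", ", Owners.SelectMany(
            o => byOwner[o.License],
            (o, v) => o.License + ":" + v.Plate));
    }

    // The built-in operator, which builds the same kind of lookup over the inner sequence.
    public static string BuiltInJoin()
    {
        return string.Join(", ", Owners.Join(
            Vehicles, o => o.License, v => v.OwnerLicense,
            (o, v) => o.License + ":" + v.Plate));
    }

    public static void Main()
    {
        Console.WriteLine(NestedScan());   // 1:AAA, 2:BBB, 2:CCC
        Console.WriteLine(LookupProbe());  // 1:AAA, 2:BBB, 2:CCC
        Console.WriteLine(BuiltInJoin());  // 1:AAA, 2:BBB, 2:CCC
    }
}
```

With a handful of items the difference is invisible; it starts to matter as both collections grow, since NestedScan does roughly outer-count times inner-count comparisons while the lookup versions make roughly one pass over each collection.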

using System;
using System.Linq;
using System.Collections.Generic;
using LinqStuff.Core;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace LinqStuff.Tests
{
   [TestClass]
   public class LinqTests
   {
      public LinqTests()
      {
      }
      
      [TestMethod]
      public void JoinTest()
      {
         Tuple<IEnumerable<RegisteredDriver>, IEnumerable<Car>> tuple =
               GetDriversAndCars();
         IEnumerable<RegisteredDriver> drivers = tuple.Item1;
         IEnumerable<Car> cars = tuple.Item2;
         List<RegisteredDriver> list = drivers.Join(
            cars,
            d => d.LicenseNumber,
            c => c.OwnersLicenseNumber,
            (d,c) => new RegisteredDriver()
                  {
                     LicenseNumber = d.LicenseNumber,
                     LicenseExpiration = d.LicenseExpiration,
                     IsRequiredToWearGlasses = d.IsRequiredToWearGlasses,
                     Cars = new List<Car>(){c}
                  }
            ).ToList();
         // Assert.AreEqual takes the expected value first, then the actual value.
         Assert.AreEqual(4, list.Count);
         Assert.AreEqual(33489432, list[0].LicenseNumber);
         Assert.AreEqual(1, list[0].Cars.Count());
         Assert.AreEqual("DIE HOT", list[0].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(90343534, list[1].LicenseNumber);
         Assert.AreEqual(1, list[1].Cars.Count());
         Assert.AreEqual("888 BAR", list[1].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(90343534, list[2].LicenseNumber);
         Assert.AreEqual(1, list[2].Cars.Count());
         Assert.AreEqual("FOO 42X", list[2].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(54234989, list[3].LicenseNumber);
         Assert.AreEqual(1, list[3].Cars.Count());
         Assert.AreEqual("ABC 123", list[3].Cars.ElementAt(0).LicensePlate);
      }
      
      [TestMethod]
      public void JoinTestWithDistinct()
      {
         Tuple<IEnumerable<RegisteredDriver>, IEnumerable<Car>> tuple =
               GetDriversAndCars();
         IEnumerable<RegisteredDriver> drivers = tuple.Item1;
         IEnumerable<Car> cars = tuple.Item2;
         List<RegisteredDriver> list = drivers.Join(
            cars,
            d => d.LicenseNumber,
            c => c.OwnersLicenseNumber,
            (d,c) => new RegisteredDriver()
                  {
                     LicenseNumber = d.LicenseNumber,
                     LicenseExpiration = d.LicenseExpiration,
                     IsRequiredToWearGlasses = d.IsRequiredToWearGlasses,
                     Cars = cars.Where(x => x.OwnersLicenseNumber ==
                           d.LicenseNumber).ToList()
                  }
            ).Distinct().ToList();
         Assert.AreEqual(3, list.Count);
         Assert.AreEqual(33489432, list[0].LicenseNumber);
         Assert.AreEqual(1, list[0].Cars.Count());
         Assert.AreEqual("DIE HOT", list[0].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(90343534, list[1].LicenseNumber);
         Assert.AreEqual(2, list[1].Cars.Count());
         Assert.AreEqual("888 BAR", list[1].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual("FOO 42X", list[1].Cars.ElementAt(1).LicensePlate);
         Assert.AreEqual(54234989, list[2].LicenseNumber);
         Assert.AreEqual(1, list[2].Cars.Count());
         Assert.AreEqual("ABC 123", list[2].Cars.ElementAt(0).LicensePlate);
      }
      
      [TestMethod]
      public void JoinTestWithUnionAndDistinct()
      {
         Tuple<IEnumerable<RegisteredDriver>, IEnumerable<Car>> tuple =
               GetDriversAndCars();
         IEnumerable<RegisteredDriver> drivers = tuple.Item1;
         IEnumerable<Car> cars = tuple.Item2;
         List<RegisteredDriver> list = drivers.Join(
            cars,
            d => d.LicenseNumber,
            c => c.OwnersLicenseNumber,
            (d,c) => new RegisteredDriver()
                  {
                     LicenseNumber = d.LicenseNumber,
                     LicenseExpiration = d.LicenseExpiration,
                     IsRequiredToWearGlasses = d.IsRequiredToWearGlasses,
                     Cars = cars.Where(x => x.OwnersLicenseNumber ==
                           d.LicenseNumber).ToList()
                  }
            ).Distinct().Union(drivers).ToList();
         Assert.AreEqual(5, list.Count);
         Assert.AreEqual(33489432, list[0].LicenseNumber);
         Assert.AreEqual(1, list[0].Cars.Count());
         Assert.AreEqual("DIE HOT", list[0].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(90343534, list[1].LicenseNumber);
         Assert.AreEqual(2, list[1].Cars.Count());
         Assert.AreEqual("888 BAR", list[1].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual("FOO 42X", list[1].Cars.ElementAt(1).LicensePlate);
         Assert.AreEqual(54234989, list[2].LicenseNumber);
         Assert.AreEqual(1, list[2].Cars.Count());
         Assert.AreEqual("ABC 123", list[2].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(83242343, list[3].LicenseNumber);
         Assert.AreEqual(0, list[3].Cars.Count());
         Assert.AreEqual(29423477, list[4].LicenseNumber);
         Assert.AreEqual(0, list[4].Cars.Count());
      }
      
      [TestMethod]
      public void JoinTestWithDefaultIfEmptyAndDistinct()
      {
         Tuple<IEnumerable<RegisteredDriver>, IEnumerable<Car>> tuple =
               GetDriversAndCars();
         IEnumerable<RegisteredDriver> drivers = tuple.Item1;
         IEnumerable<Car> cars = tuple.Item2;
         List<RegisteredDriver> list = (from d in drivers
            join c in cars on d.LicenseNumber equals c.OwnersLicenseNumber into gunk
            from g in gunk.DefaultIfEmpty()
            select new RegisteredDriver
                  {
                     LicenseNumber = d.LicenseNumber,
                     LicenseExpiration = d.LicenseExpiration,
                     IsRequiredToWearGlasses = d.IsRequiredToWearGlasses,
                     Cars = cars.Where(x => x.OwnersLicenseNumber ==
                           d.LicenseNumber).ToList()
                  }).Distinct().ToList();
         Assert.AreEqual(5, list.Count);
         Assert.AreEqual(33489432, list[0].LicenseNumber);
         Assert.AreEqual(1, list[0].Cars.Count());
         Assert.AreEqual("DIE HOT", list[0].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual(90343534, list[1].LicenseNumber);
         Assert.AreEqual(2, list[1].Cars.Count());
         Assert.AreEqual("888 BAR", list[1].Cars.ElementAt(0).LicensePlate);
         Assert.AreEqual("FOO 42X", list[1].Cars.ElementAt(1).LicensePlate);
         Assert.AreEqual(83242343, list[2].LicenseNumber);
         Assert.AreEqual(0, list[2].Cars.Count());
         Assert.AreEqual(29423477, list[3].LicenseNumber);
         Assert.AreEqual(0, list[3].Cars.Count());
         Assert.AreEqual(54234989, list[4].LicenseNumber);
         Assert.AreEqual(1, list[4].Cars.Count());
         Assert.AreEqual("ABC 123", list[4].Cars.ElementAt(0).LicensePlate);
      }
      
      public Tuple<IEnumerable<RegisteredDriver>, IEnumerable<Car>>
            GetDriversAndCars()
      {
         IEnumerable<RegisteredDriver> drivers = new List<RegisteredDriver>()
               {
                  new RegisteredDriver()
                        {
                           LicenseNumber = 33489432,
                           LicenseExpiration = new DateTime(2013, 6, 15),
                           IsRequiredToWearGlasses = true,
                           Cars = new List<Car>()
                        },
                  new RegisteredDriver()
                        {
                           LicenseNumber = 90343534,
                           LicenseExpiration = new DateTime(2012, 12, 7),
                           IsRequiredToWearGlasses = false,
                           Cars = new List<Car>()
                        },
                  new RegisteredDriver()
                        {
                           LicenseNumber = 83242343,
                           LicenseExpiration = new DateTime(2012, 7, 31),
                           IsRequiredToWearGlasses = false,
                           Cars = new List<Car>()
                        },
                  new RegisteredDriver()
                        {
                           LicenseNumber = 29423477,
                           LicenseExpiration = new DateTime(2012, 8, 29),
                           IsRequiredToWearGlasses = false,
                           Cars = new List<Car>()
                        },
                  new RegisteredDriver()
                        {
                           LicenseNumber = 54234989,
                           LicenseExpiration = new DateTime(2013, 1, 14),
                           IsRequiredToWearGlasses = false,
                           Cars = new List<Car>()
                        }
               };
         IEnumerable<Car> cars = new List<Car>()
               {
                  new Car()
                        {
                           YearMakeModel = "1994 Daihatsu Charade",
                           LicensePlate = "DIE HOT",
                           RegistrationExpiration = new DateTime(2012, 8, 16),
                           IsInsured = true,
                           OwnersLicenseNumber = 33489432
                        },
                  new Car()
                        {
                           YearMakeModel = "2011 Ford Taurus",
                           LicensePlate = "888 BAR",
                           RegistrationExpiration = new DateTime(2012, 12, 30),
                           IsInsured = true,
                           OwnersLicenseNumber = 90343534
                        },
                  new Car()
                        {
                           YearMakeModel = "2005 Honda Accord",
                           LicensePlate = "FOO 42X",
                           RegistrationExpiration = new DateTime(2010, 11, 11),
                           IsInsured = false,
                           OwnersLicenseNumber = 90343534
                        },
                  new Car()
                        {
                           YearMakeModel = "2003 Nissan Sentra",
                           LicensePlate = "ABC 123",
                           RegistrationExpiration = new DateTime(2012, 9, 3),
                           IsInsured = true,
                           OwnersLicenseNumber = 54234989
                        }
               };
         return new Tuple<IEnumerable<RegisteredDriver>,
               IEnumerable<Car>>(drivers,cars);
      }
   }
}

 
 

Let's talk through the four tests:

  1. JoinTest gives us only the RegisteredDrivers who have Cars, and each of those RegisteredDrivers gets a separate entry for every Car it owns, so the driver with two Cars appears twice. For the matching to work, each Car carries the driver's license number of its owner:
    using System;
    namespace LinqStuff.Core
    {
       public class Car
       {
          public string YearMakeModel { get; set; }
          public string LicensePlate { get; set; }
          public DateTime RegistrationExpiration { get; set; }
          public bool IsInsured { get; set; }
          public int OwnersLicenseNumber { get; set; }
       }
    }

    This is a basic Join with poorly shaped end data. We can do better! Shall we say that our end goal should be to get all of the RegisteredDrivers without duplicates and without dropping RegisteredDrivers which do not have Cars? I tinkered with how to accomplish this and ultimately concluded I would need to look beyond the Join itself.
     
  2. JoinTestWithDistinct uses .Distinct() to get rid of duplicate RegisteredDrivers; beneath the hood, Distinct compares RegisteredDrivers against one another and drops perceived duplicates. My first move was to change RegisteredDriver from a class to a struct: a class is a reference type, and by default two different RegisteredDriver references holding identical getsetter values, but pointing at different places on the heap, are not seen as equal. The switch to struct alone was not enough, and the Cars getsetter is why. The default equality for a struct compares every field, and each joined row gets its own List<Car> from .ToList(); a List<Car> compares by reference, so two rows for the same driver still never match. In the end, I overrode the Equals method on RegisteredDriver so that equality comparisons look at just one of the getsetters, LicenseNumber, instead of all of them. Because Distinct hashes items before it compares them, an Equals override should always be paired with a GetHashCode override that agrees with it.
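    using System;
    using System.Collections.Generic;
    namespace LinqStuff.Core
    {
       public struct RegisteredDriver
       {
          public int LicenseNumber { get; set; }
          public DateTime LicenseExpiration { get; set; }
          public bool IsRequiredToWearGlasses { get; set; }
          public IEnumerable<Car> Cars { get; set; }
          
          // Two RegisteredDrivers are equal when their license numbers match.
          // The "is" check avoids an exception when other is null or another type.
          public override bool Equals(object other)
          {
             return other is RegisteredDriver &&
                   LicenseNumber == ((RegisteredDriver)other).LicenseNumber;
          }
          
          // Must agree with Equals: Distinct and Union compare hash codes first.
          public override int GetHashCode()
          {
             return LicenseNumber.GetHashCode();
          }
       }
    }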

    This test also uses a Where when assigning Cars, so that each RegisteredDriver gets every Car it owns. In the previous test we basically had flat data, but now we have hierarchical data. Note that the Where crawls through the entire cars collection once per joined row, which gives back some of the efficiency the Join was supposed to buy. The problem we have not beaten yet, however, is how to include RegisteredDrivers that have no Cars.
     
  3. JoinTestWithUnionAndDistinct adds the RegisteredDrivers without Cars to our collection by way of a Union. A Union between two collections produces no duplicates: it yields the items of the first collection, then only those items of the second that it has not already yielded (not so with Concat, which keeps everything). So a Union of the subset of RegisteredDrivers that have been assigned Cars with the full set of RegisteredDrivers more or less mirrors the full set, save that some of the RegisteredDrivers now have Cars. The ordering does change: the drivers without Cars land at the end. Also, I approached the Union like so:
    subset.Union(fullset)
    Doing the Union the other way around would have cost us the Car data, because the first occurrence of each driver wins and the full set's copies carry empty Cars lists.
     
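  4. JoinTestWithDefaultIfEmptyAndDistinct uses .DefaultIfEmpty() in query syntax to do what Union did in the prior test, ultimately with a different ordering: every driver keeps its position from the original drivers collection. The join ... into clause is a group join, .DefaultIfEmpty() turns each empty group into a single default (null) Car so that drivers without Cars still produce a row, and .Distinct() collapses the extra rows for drivers with several Cars. As for emulating .DefaultIfEmpty() in fluent (lambda) syntax, there is a way: the compiler translates every query expression into method calls, so join ... into becomes GroupJoin and the second from clause becomes SelectMany over gunk.DefaultIfEmpty().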
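Item 2 switches RegisteredDriver to a struct, but Distinct only needs an equality definition and does not care whether the type is a class or a struct. The sketch below is a minimal illustration, assuming a hypothetical Motorist class in place of RegisteredDriver and a hypothetical DistinctSketch helper: the type stays a class, and Equals and GetHashCode are overridden together, both keyed on LicenseNumber.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for RegisteredDriver, deliberately left as a class.
public class Motorist
{
    public int LicenseNumber { get; set; }
    public List<string> Plates { get; set; }

    // Equal when the license numbers match, whichever List instance Plates holds.
    public override bool Equals(object other)
    {
        Motorist m = other as Motorist;
        return m != null && m.LicenseNumber == LicenseNumber;
    }

    // Must agree with Equals: Distinct buckets items by hash code first
    // and only calls Equals on items whose hash codes match.
    public override int GetHashCode()
    {
        return LicenseNumber.GetHashCode();
    }
}

public static class DistinctSketch
{
    // Three joined rows; the last two describe the same motorist with separate lists.
    public static List<Motorist> JoinedRows()
    {
        return new List<Motorist>
        {
            new Motorist { LicenseNumber = 1, Plates = new List<string> { "AAA" } },
            new Motorist { LicenseNumber = 2, Plates = new List<string> { "BBB", "CCC" } },
            new Motorist { LicenseNumber = 2, Plates = new List<string> { "BBB", "CCC" } }
        };
    }

    // Distinct keeps the first occurrence of each license number.
    public static List<Motorist> Deduplicated()
    {
        return JoinedRows().Distinct().ToList();
    }

    public static void Main()
    {
        List<Motorist> unique = Deduplicated();
        Console.WriteLine(unique.Count);            // 2
        Console.WriteLine(unique[1].Plates.Count);  // 2
    }
}
```

If you would rather not touch the type at all, Distinct also has an overload that accepts an IEqualityComparer<T>, so the same LicenseNumber rule can live in a separate comparer class.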
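The Union behavior that item 3 relies on (the first occurrence wins, order follows the first collection and then whatever the second adds, and Concat keeps everything) can be checked with a small sketch. Entry, KeyComparer, and UnionSketch are hypothetical names; Key stands in for LicenseNumber and Payload for the Cars data. Rather than overriding Equals on the type, this sketch passes a comparer to the Union overload that accepts an IEqualityComparer<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: Key plays the role of LicenseNumber, Payload the role of Cars.
public class Entry
{
    public int Key;
    public string Payload;
    public Entry(int key, string payload) { Key = key; Payload = payload; }
    public override string ToString() { return Key + "=" + Payload; }
}

// Compares Entries by Key alone, much like the overridden Equals on RegisteredDriver.
public class KeyComparer : IEqualityComparer<Entry>
{
    public bool Equals(Entry x, Entry y) { return x.Key == y.Key; }
    public int GetHashCode(Entry e) { return e.Key.GetHashCode(); }
}

public static class UnionSketch
{
    static readonly List<Entry> Subset = new List<Entry> { new Entry(1, "cars"), new Entry(2, "cars") };
    static readonly List<Entry> Fullset = new List<Entry>
    {
        new Entry(1, "none"), new Entry(2, "none"), new Entry(3, "none")
    };

    static string Show(IEnumerable<Entry> entries)
    {
        return string.Join(", ", entries.Select(e => e.ToString()));
    }

    // First occurrence wins: all of Subset, then whatever Fullset adds.
    public static string SubsetFirst() { return Show(Subset.Union(Fullset, new KeyComparer())); }

    // Reversed: the payload-less copies win, so the "cars" payloads are lost.
    public static string FullsetFirst() { return Show(Fullset.Union(Subset, new KeyComparer())); }

    // Concat removes nothing.
    public static string Concatenated() { return Show(Subset.Concat(Fullset)); }

    public static void Main()
    {
        Console.WriteLine(SubsetFirst());   // 1=cars, 2=cars, 3=none
        Console.WriteLine(FullsetFirst());  // 1=none, 2=none, 3=none
        Console.WriteLine(Concatenated());  // 1=cars, 2=cars, 1=none, 2=none, 3=none
    }
}
```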
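Item 4's query-syntax join can be written in fluent (lambda) syntax by spelling out the method calls the compiler produces. The sketch below uses hypothetical Owner, Vehicle, and LeftJoinSketch types standing in for RegisteredDriver and Car, runs both forms, and should get the same rows, including a placeholder row for the owner with no vehicles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-ins for RegisteredDriver and Car.
public class Owner
{
    public int License;
    public Owner(int license) { License = license; }
}

public class Vehicle
{
    public string Plate;
    public int OwnerLicense;
    public Vehicle(string plate, int ownerLicense) { Plate = plate; OwnerLicense = ownerLicense; }
}

public static class LeftJoinSketch
{
    static readonly List<Owner> Owners = new List<Owner> { new Owner(1), new Owner(2), new Owner(3) };
    static readonly List<Vehicle> Vehicles = new List<Vehicle>
    {
        new Vehicle("AAA", 1), new Vehicle("BBB", 2), new Vehicle("CCC", 2)
    };

    // Query syntax, shaped like JoinTestWithDefaultIfEmptyAndDistinct.
    public static string QuerySyntax()
    {
        var rows = from o in Owners
                   join v in Vehicles on o.License equals v.OwnerLicense into gunk
                   from g in gunk.DefaultIfEmpty()
                   select o.License + ":" + (g == null ? "-" : g.Plate);
        return string.Join(", ", rows);
    }

    // Fluent syntax: join ... into becomes GroupJoin, and the second
    // from clause becomes SelectMany over gunk.DefaultIfEmpty().
    public static string FluentSyntax()
    {
        var rows = Owners
            .GroupJoin(Vehicles, o => o.License, v => v.OwnerLicense,
                       (o, gunk) => new { o, gunk })
            .SelectMany(x => x.gunk.DefaultIfEmpty(),
                        (x, g) => x.o.License + ":" + (g == null ? "-" : g.Plate));
        return string.Join(", ", rows);
    }

    public static void Main()
    {
        Console.WriteLine(QuerySyntax());   // 1:AAA, 2:BBB, 2:CCC, 3:-
        Console.WriteLine(FluentSyntax());  // 1:AAA, 2:BBB, 2:CCC, 3:-
    }
}
```

The anonymous type new { o, gunk } plays the part of the hidden intermediate that query syntax builds for you behind the scenes.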
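The end goal stated under item 1 (every RegisteredDriver, no duplicates, drivers without Cars kept, each driver holding all of its Cars) is also what GroupJoin produces on its own: one result per outer element, in outer order, each paired with a possibly empty sequence of matching inner elements, found through the same kind of lookup that Join uses. A sketch with hypothetical Owner, Vehicle, OwnerWithVehicles, and GroupJoinSketch types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-ins for RegisteredDriver and Car.
public class Owner
{
    public int License;
    public Owner(int license) { License = license; }
}

public class Vehicle
{
    public string Plate;
    public int OwnerLicense;
    public Vehicle(string plate, int ownerLicense) { Plate = plate; OwnerLicense = ownerLicense; }
}

// Hierarchical result: one owner with all of its vehicles.
public class OwnerWithVehicles
{
    public int License;
    public List<Vehicle> Vehicles;
}

public static class GroupJoinSketch
{
    static readonly List<Owner> AllOwners = new List<Owner> { new Owner(1), new Owner(2), new Owner(3) };
    static readonly List<Vehicle> AllVehicles = new List<Vehicle>
    {
        new Vehicle("AAA", 1), new Vehicle("BBB", 2), new Vehicle("CCC", 2)
    };

    // One result per owner, in owner order; an owner with no vehicles gets an empty list.
    // No Distinct, Union, or per-row Where scan is needed.
    public static List<OwnerWithVehicles> ByOwner()
    {
        return AllOwners
            .GroupJoin(AllVehicles, o => o.License, v => v.OwnerLicense,
                       (o, owned) => new OwnerWithVehicles { License = o.License, Vehicles = owned.ToList() })
            .ToList();
    }

    public static void Main()
    {
        foreach (OwnerWithVehicles row in ByOwner())
        {
            Console.WriteLine(row.License + ": " + row.Vehicles.Count + " vehicle(s)");
        }
        // Prints 1: 1 vehicle(s), then 2: 2 vehicle(s), then 3: 0 vehicle(s)
    }
}
```

Carried over to the tests above, the same idea would amount to selecting Cars = gunk.ToList() right after join ... into gunk, with no DefaultIfEmpty, Distinct, or Union; that adaptation is a suggestion, not one of the four tests.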
