Welcome to EMC Consulting Blogs Sign in | Join | Help

Howard van Rooijen's EMC Consulting Blog (2004 - 2010)

This blog has now moved to http://howard.vanrooijen.co.uk/blog - please update your subscriptions if you wish to receive new content.

LINQ to EXCEL Provider + PostSharp = Cleaner Code

Earlier in the year when I was doing some research into writing my own LINQ Provider I stumbled across a great listing of LINQ Providers, two entries looked very interesting; the first was LINQ to RDF(Semantic Web), the second was LINQ to Excel. The latter caught my attention as it was a very simple implementation of a LINQ Provider, which replicates LINQ to SQL functionality, but using Excel, rather than SQL Server as your data source. I also thought it could be a valuable addition to my developer tool belt as generally with Greenfield projects, before you have a proper data model or extract, you are much more likely to be given an Excel spreadsheet containing some sample data, thus LINQ to Excel could be used as a very crude ETL tool. I thought it would also come in handy as a data source for unit testing. So with that in mind, I decided to kick its tires.

When I’m doing this kind of research – I like to use “real data” rather than Microsoft samples (Northwinds / AdventureWorks) as they are generally more interesting. Luckily in the UK there is a charity UK Citizens Online Democracy that runs a wonderful project called MySociety:

mySociety has two missions. The first is to be a charitable project which builds websites that give people simple, tangible benefits in the civic and community aspects of their lives. The second is to teach the public and voluntary sectors, through demonstration, how to use the internet most efficiently to improve lives.

One of MySociety’s projects is They Work for You - whose aim is to help bridge the growing democratic disconnect between Citizen and Government – the volunteers keep tabs on their elected MPs, and their unelected Peers, and comment on what goes on in Parliament. One of the great features of the MySociety projects and especially They Work for You is their Source Code, APIs and Raw Data Feeds. For this code sample I’m using their data feed about All Members of Parliament and in particular Members of Parliament who have died while in office (the extract goes back to 1815!).

In Excel, the data appears as below:

image

To get up and running the first step is to create a class that describes the data represented in the Excel WorkSheet. I implemented it using the sample provided with In the original LINQ to Excel 2.5 release:

  1. Add the  [ExcelSheet(Name = "Deceased MPs")] attribute to the Data Model class, where Name is the name of the worksheet
  2. Make the class implement the System.ComponentModel.INotifyPropertyChanged interface – this is used by the LINQ to Excel plumbing to track changes to ObjectState for persisting the object back to the Excel WorkSheet.
  3. Add the PropertyChangedEventHandler PropertyChanged public event.
  4. Implement public properties with backing fields
  5. Add the [ExcelColumn(Name = "fromdate", Storage = "fromDate")] attribute to the property, where Name is the column name and Storage is the name of the backing field. 
  6. Implement the INotifyPropertyChanged behaviour in each property setter by adding the following call  this.SendPropertyChanged("FromDate");  where "FromDate" is the name of the property.

You can see the full source of this Data Model class below:

[ExcelSheet(Name = "Deceased MPs")]
public class MemberOfParliamentDataModel : System.ComponentModel.INotifyPropertyChanged
{
private string constituency;
private string firstName;
private string fromDate;
private string fromWhy;
private string lastName;
private string party;
private string toDate;
private string toWhy;

public event System.ComponentModel.PropertyChangedEventHandler PropertyChanged;

[ExcelColumn(Name = "constituency", Storage = "constituency")]
public string Constituency
{
get { return this.constituency; }
set
{
this.constituency = value;
this.SendPropertyChanged("Constituency");
}
}

[ExcelColumn(Name = "firstname", Storage = "firstName")]
public string FirstName
{
get { return this.firstName; }
set
{
this.firstName = value;
this.SendPropertyChanged("FirstName");
}
}

[ExcelColumn(Name = "fromdate", Storage = "fromDate")]
public string FromDate
{
get { return this.fromDate; }
set
{
this.fromDate = value;
this.SendPropertyChanged("FromDate");
}
}

[ExcelColumn(Name = "fromwhy", Storage = "fromWhy")]
public string FromWhy
{
get { return this.fromWhy; }
set
{
this.fromWhy = value;
this.SendPropertyChanged("FromWhy");
}
}

[ExcelColumn(Name = "lastname", Storage = "lastName")]
public string LastName
{
get { return this.lastName; }
set
{
this.lastName = value;
this.SendPropertyChanged("LastName");
}
}

[ExcelColumn(Name = "party", Storage = "party")]
public string Party
{
get { return this.party; }
set
{
this.party = value;
this.SendPropertyChanged("Party");
}
}

[ExcelColumn(Name = "todate", Storage = "toDate")]
public string ToDate
{
get { return this.toDate; }
set
{
this.toDate = value;
this.SendPropertyChanged("ToDate");
}
}

[ExcelColumn(Name = "towhy", Storage = "toWhy")]
public string ToWhy
{
get { return this.toWhy; }
set
{
this.toWhy = value;
this.SendPropertyChanged("ToWhy");
}
}

protected virtual void SendPropertyChanged(string propertyName)
{
System.ComponentModel.PropertyChangedEventHandler handler = this.PropertyChanged;

if (handler != null)
{
handler(this, new System.ComponentModel.PropertyChangedEventArgs(propertyName));
}
}
}

Next you need to ensure that you build the application in x86 configuration mode (this is crucial if you are running on a 64bit environment) otherwise the application will not be able to resolve the OLEDB drivers.

Once you have Data Model class set up and configured the build, you can now perform a LINQ query against the Excel document, first, create a new Excel LINQ Provider, pointing to the document, then perform you LINQ Query against the specified WorkSheet:

var provider = ExcelProvider.Create(Path.Combine(targetDirectory.FullName, "Documents\\deceased-members.xlsx"));

If you wanted to retrieve a list of all Labour MPs who have died in office – you can just perform a normal LINQ query:

var xMps = from p in provider.GetSheet<MemberOfParliamentDataModel>() select p;
var xLabourMps = from xMp in xMps 
where xMp.Party == "Lab"
select xMp;

foreach (var mp in xLabourMps)
{
Console.WriteLine(
string.Format(
"{0} {1} - {2} - {3} - {4}",
mp.FirstName,
mp.LastName,
mp.Party,
mp.Constituency,
mp.ToDate));
}
If you want to add a new record the syntax is as follows:
var diedToday = new MemberOfParliamentDataModel
{
Constituency = "Somewhere",
FirstName = "John",
FromDate = "2000-01-01",
FromWhy = "general_election",
LastName = "Doe",
Party = "Ind",
ToDate = DateTime.Now.ToShortDateString(),
ToWhy = "died"
};

provider.GetSheet<MemberOfParliamentDataModel>().InsertOnSubmit(diedToday);
provider.SubmitChanges();

And that’s it. Very simple, very flexible, very powerful. But the Data Model class is also very ugly, very verbose; bloated with boilerplate commodity behaviour. At this point I decided to change the tires.

The first step was to refactor the source code; originally all the source code was in a single file, so I moved each class into it’s own file – instantly you get a better feel for the shape and complexity of the codebase:

image

Next I went through the codebase and cleaned it up a little, renamed some of the variables, added more spacing to make the flow of logic easier to follow with the eye and reordered some of the object creation, so that it’s easier to differentiate dependency creation from actual business logic. Now it’s much easier to see the magic that the author wove. Next I added some better null (empty cell handling) that I was seeing with some of my test data.

Whenever I see code that’s “verbose, bloated with boilerplate commodity behaviour” I always think about PostSharp, the Open Source AOP Framework and how this can be used to remove commodity. Randomly at about the same time an internal discussion started about eliminating magic strings from INotifyPropertyChanged implementations and Jonathan George rose to the challenge and implemented a PostSharp Aspect called ImplementINotifyPropertyChanged to weave the INotifyPropertyChanged behaviour into the decorated object at compile time. Once he published the code – I refactored my sample and was amazed at the result. The code for the aspect is included in the example at the end of the post – but be warned, it’s black voodoo magik performed in “here be dragons” land. But the simplifying effect on the code is utterly amazing – the PostSharp Aspect is responsible for all the INotifyPropertyChanged concerns, which means our Data Model can be converted to use automatic properties, we can now take advantage of some default behaviours inside LINQ to Excel that resolves the Storage field of the ExcelColumn attribute to be the name of the property it decorates:

[ImplementINotifyPropertyChanged]
[ExcelSheet(Name = "Deceased MPs")]
public class MemberOfParliamentDataModel
{
[ExcelColumn(Name = "constituency")]
public string Constituency { get; set; }

[ExcelColumn(Name = "firstname")]
public string FirstName { get; set; }

[ExcelColumn(Name = "fromdate")]
public string FromDate { get; set; }

[ExcelColumn(Name = "fromwhy")]
public string FromWhy { get; set; }

[ExcelColumn(Name = "lastname")]
public string LastName { get; set; }

[ExcelColumn(Name = "party")]
public string Party { get; set; }

[ExcelColumn(Name = "todate")]
public string ToDate { get; set; }

[ExcelColumn(Name = "towhy")]
public string ToWhy { get; set; }
}

If you download the sample, build it and then use .NET Reflector to decompile LINQtoExcel.Sample.exe and examine all the extra plumbing that PostSharp has woven into the MemberOfParliamentDataModel class. AOP / PostSharp are commonly used to remove commodity code such as Logging / Caching / Exception Handling – but as Jonathan’s code (and Ralf Westphal’s Software Transactional Memory example) demonstrates it can be used in much more sophisticated scenarios.

LINQ to Excel should be a great addition to your developer tool belt – it’s especially useful for playing with sample data in simple ETL scenarios. When combined with PostSharp implementation of your data model is vastly simplified and will get you up and running even faster. Don’t work hard, work smart and utilise many of the powerful Open Source tools at your disposal.

@HowardvRooijen

Published Thursday, August 27, 2009 11:08 PM by howard.vanrooijen

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Howard van Rooijen's Blog said:

Visual Studio 2005 offered a number of User Experience improvements within the Debugging feature set

August 28, 2009 9:15 AM
 

Jonathan Sharratt said:

Very clean,  have you checked out LinqToSharepoint it was something I was playing around with a while ago pretty sweet.

Went down the route of MVC pulling out data from SharePoint to present a clean presentation layer but being able to use SharepPoint as the content manager.

http://www.codeplex.com/LINQtoSharePoint

August 28, 2009 12:24 PM
 

howard.vanrooijen said:

That's awesome Jon and exactly what I need for a side project. Will have a play this weekend! Hope all is well with you.

h

August 28, 2009 1:48 PM
 

Jonathan George's Blog said:

Since I first encountered it a year or so ago, I’ve been dabbling with PostSharp , an Aspect Oriented

September 10, 2009 11:33 PM
 

RamMan said:

I have had some bad experiences with Linq to XML, doing some work on a document tended to balloon memory usage to abotu 3x the document size (better than XMLDOM which was 4x-6x but still big). Do you think the same could be said for the Excel provider, especially since Excel 2007 can have more than 64k rows? Maybe layering in some kind of filesystem-backing stream would let the provider work with any size document?

Just some thoughts...

Thanks.

October 7, 2009 4:21 PM
 

howard.vanrooijen said:

@RamMan,

If you look at the code in the Provider - essentially what it does is convert your LINQ Query into a ADO.NET Query which it executes against the Excel Driver... so the performance should be comparable to any other way of accessing Excel data via that driver - not sure whether it brings all data back as a blob - or as a recordset, or as a stream where you can buffer the data rather than reading it all into memory. Worth investigating if you a good sample data set.

I generally use it as a quick and dirty tool for converting data - so perf generally isn't a concern.

h

October 7, 2009 4:31 PM
 

Paul said:

When you have time, I would recommend checking out another custom Linq to Excel provider located on google code (http://code.google.com/p/linqtoexcel/)

This library has a much cleaner interface with no need of decorating classes or properties with any attributes. Nor do you have to implement the INotifyPropertyChanged interface. You can use any class in your project as is to retrieve data from Excel.

Check out the introductory video here: http://www.vimeo.com/7689508

December 11, 2009 2:04 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

This blog has now moved - please visit http://howard.vanrooijen.co.uk/blog for new content!
Add to Live.com
Powered by Community Server (Personal Edition), by Telligent Systems