Earlier in the year when I was doing some research into writing my own LINQ Provider I stumbled across a great listing of LINQ Providers, two entries looked very interesting; the first was LINQ to RDF(Semantic Web), the second was LINQ to Excel. The latter caught my attention as it was a very simple implementation of a LINQ Provider, which replicates LINQ to SQL functionality, but using Excel, rather than SQL Server as your data source. I also thought it could be a valuable addition to my developer tool belt as generally with Greenfield projects, before you have a proper data model or extract, you are much more likely to be given an Excel spreadsheet containing some sample data, thus LINQ to Excel could be used as a very crude ETL tool. I thought it would also come in handy as a data source for unit testing. So with that in mind, I decided to kick its tires.
When I’m doing this kind of research – I like to use “real data” rather than Microsoft samples (Northwinds / AdventureWorks) as they are generally more interesting. Luckily in the UK there is a charity UK Citizens Online Democracy that runs a wonderful project called MySociety:
mySociety has two missions. The first is to be a charitable project which builds websites that give people simple, tangible benefits in the civic and community aspects of their lives. The second is to teach the public and voluntary sectors, through demonstration, how to use the internet most efficiently to improve lives.
One of MySociety’s projects is They Work for You - whose aim is to help bridge the growing democratic disconnect between Citizen and Government – the volunteers keep tabs on their elected MPs, and their unelected Peers, and comment on what goes on in Parliament. One of the great features of the MySociety projects and especially They Work for You is their Source Code, APIs and Raw Data Feeds. For this code sample I’m using their data feed about All Members of Parliament and in particular Members of Parliament who have died while in office (the extract goes back to 1815!).
In Excel, the data appears as below:

To get up and running the first step is to create a class that describes the data represented in the Excel WorkSheet. I implemented it using the sample provided with In the original LINQ to Excel 2.5 release:
- Add the [ExcelSheet(Name = "Deceased MPs")] attribute to the Data Model class, where Name is the name of the worksheet
- Make the class implement the System.ComponentModel.INotifyPropertyChanged interface – this is used by the LINQ to Excel plumbing to track changes to ObjectState for persisting the object back to the Excel WorkSheet.
- Add the PropertyChangedEventHandler PropertyChanged public event.
- Implement public properties with backing fields
- Add the [ExcelColumn(Name = "fromdate", Storage = "fromDate")] attribute to the property, where Name is the column name and Storage is the name of the backing field.
- Implement the INotifyPropertyChanged behaviour in each property setter by adding the following call this.SendPropertyChanged("FromDate"); where "FromDate" is the name of the property.
You can see the full source of this Data Model class below:
[ExcelSheet(Name = "Deceased MPs")]
public class MemberOfParliamentDataModel : System.ComponentModel.INotifyPropertyChanged
{
private string constituency;
private string firstName;
private string fromDate;
private string fromWhy;
private string lastName;
private string party;
private string toDate;
private string toWhy;
public event System.ComponentModel.PropertyChangedEventHandler PropertyChanged;
[ExcelColumn(Name = "constituency", Storage = "constituency")]
public string Constituency
{
get { return this.constituency; }
set
{
this.constituency = value;
this.SendPropertyChanged("Constituency");
}
}
[ExcelColumn(Name = "firstname", Storage = "firstName")]
public string FirstName
{
get { return this.firstName; }
set
{
this.firstName = value;
this.SendPropertyChanged("FirstName");
}
}
[ExcelColumn(Name = "fromdate", Storage = "fromDate")]
public string FromDate
{
get { return this.fromDate; }
set
{
this.fromDate = value;
this.SendPropertyChanged("FromDate");
}
}
[ExcelColumn(Name = "fromwhy", Storage = "fromWhy")]
public string FromWhy
{
get { return this.fromWhy; }
set
{
this.fromWhy = value;
this.SendPropertyChanged("FromWhy");
}
}
[ExcelColumn(Name = "lastname", Storage = "lastName")]
public string LastName
{
get { return this.lastName; }
set
{
this.lastName = value;
this.SendPropertyChanged("LastName");
}
}
[ExcelColumn(Name = "party", Storage = "party")]
public string Party
{
get { return this.party; }
set
{
this.party = value;
this.SendPropertyChanged("Party");
}
}
[ExcelColumn(Name = "todate", Storage = "toDate")]
public string ToDate
{
get { return this.toDate; }
set
{
this.toDate = value;
this.SendPropertyChanged("ToDate");
}
}
[ExcelColumn(Name = "towhy", Storage = "toWhy")]
public string ToWhy
{
get { return this.toWhy; }
set
{
this.toWhy = value;
this.SendPropertyChanged("ToWhy");
}
}
protected virtual void SendPropertyChanged(string propertyName)
{
System.ComponentModel.PropertyChangedEventHandler handler = this.PropertyChanged;
if (handler != null)
{
handler(this, new System.ComponentModel.PropertyChangedEventArgs(propertyName));
}
}
}
Next you need to ensure that you build the application in x86 configuration mode (this is crucial if you are running on a 64bit environment) otherwise the application will not be able to resolve the OLEDB drivers.
Once you have Data Model class set up and configured the build, you can now perform a LINQ query against the Excel document, first, create a new Excel LINQ Provider, pointing to the document, then perform you LINQ Query against the specified WorkSheet:
var provider = ExcelProvider.Create(Path.Combine(targetDirectory.FullName, "Documents\\deceased-members.xlsx"));
If you wanted to retrieve a list of all Labour MPs who have died in office – you can just perform a normal LINQ query:
var xMps = from p in provider.GetSheet<MemberOfParliamentDataModel>() select p;
var xLabourMps = from xMp in xMps
where xMp.Party == "Lab"
select xMp;
foreach (var mp in xLabourMps)
{
Console.WriteLine(
string.Format(
"{0} {1} - {2} - {3} - {4}",
mp.FirstName,
mp.LastName,
mp.Party,
mp.Constituency,
mp.ToDate));
}
If you want to add a new record the syntax is as follows:
var diedToday = new MemberOfParliamentDataModel
{
Constituency = "Somewhere",
FirstName = "John",
FromDate = "2000-01-01",
FromWhy = "general_election",
LastName = "Doe",
Party = "Ind",
ToDate = DateTime.Now.ToShortDateString(),
ToWhy = "died"
};
provider.GetSheet<MemberOfParliamentDataModel>().InsertOnSubmit(diedToday);
provider.SubmitChanges();
And that’s it. Very simple, very flexible, very powerful. But the Data Model class is also very ugly, very verbose; bloated with boilerplate commodity behaviour. At this point I decided to change the tires.
The first step was to refactor the source code; originally all the source code was in a single file, so I moved each class into it’s own file – instantly you get a better feel for the shape and complexity of the codebase:
Next I went through the codebase and cleaned it up a little, renamed some of the variables, added more spacing to make the flow of logic easier to follow with the eye and reordered some of the object creation, so that it’s easier to differentiate dependency creation from actual business logic. Now it’s much easier to see the magic that the author wove. Next I added some better null (empty cell handling) that I was seeing with some of my test data.
Whenever I see code that’s “verbose, bloated with boilerplate commodity behaviour” I always think about PostSharp, the Open Source AOP Framework and how this can be used to remove commodity. Randomly at about the same time an internal discussion started about eliminating magic strings from INotifyPropertyChanged implementations and Jonathan George rose to the challenge and implemented a PostSharp Aspect called ImplementINotifyPropertyChanged to weave the INotifyPropertyChanged behaviour into the decorated object at compile time. Once he published the code – I refactored my sample and was amazed at the result. The code for the aspect is included in the example at the end of the post – but be warned, it’s black voodoo magik performed in “here be dragons” land. But the simplifying effect on the code is utterly amazing – the PostSharp Aspect is responsible for all the INotifyPropertyChanged concerns, which means our Data Model can be converted to use automatic properties, we can now take advantage of some default behaviours inside LINQ to Excel that resolves the Storage field of the ExcelColumn attribute to be the name of the property it decorates:
[ImplementINotifyPropertyChanged]
[ExcelSheet(Name = "Deceased MPs")]
public class MemberOfParliamentDataModel
{
[ExcelColumn(Name = "constituency")]
public string Constituency { get; set; }
[ExcelColumn(Name = "firstname")]
public string FirstName { get; set; }
[ExcelColumn(Name = "fromdate")]
public string FromDate { get; set; }
[ExcelColumn(Name = "fromwhy")]
public string FromWhy { get; set; }
[ExcelColumn(Name = "lastname")]
public string LastName { get; set; }
[ExcelColumn(Name = "party")]
public string Party { get; set; }
[ExcelColumn(Name = "todate")]
public string ToDate { get; set; }
[ExcelColumn(Name = "towhy")]
public string ToWhy { get; set; }
}
If you download the sample, build it and then use .NET Reflector to decompile LINQtoExcel.Sample.exe and examine all the extra plumbing that PostSharp has woven into the MemberOfParliamentDataModel class. AOP / PostSharp are commonly used to remove commodity code such as Logging / Caching / Exception Handling – but as Jonathan’s code (and Ralf Westphal’s Software Transactional Memory example) demonstrates it can be used in much more sophisticated scenarios.
LINQ to Excel should be a great addition to your developer tool belt – it’s especially useful for playing with sample data in simple ETL scenarios. When combined with PostSharp implementation of your data model is vastly simplified and will get you up and running even faster. Don’t work hard, work smart and utilise many of the powerful Open Source tools at your disposal.
@HowardvRooijen