
Claypole's World - The SQL Server Side

Concerning development of bespoke applications and database administration.

Focus on Fast Track : Understanding the –E Startup Parameter

One of my focuses at the moment is the work that Microsoft have done with various hardware providers to deliver “Fast Track” – a reference architecture for SQL Server Data Warehousing that delivers predictable, blistering performance on prescribed commodity hardware. Sounds great, I think you’ll agree. What is even better is that the architecture comes with a series of best practices on how to make the most of your Fast Track system.

As you might expect there are some SQL Server settings you need to change. However, I am impressed by the fact that the team have gone further than this. There is also a focus on the build of your solution and how you should load data into a Fast Track Data Warehouse.

It’s inspired me to get blogging again and I’ve decided to write a series called “Focus on Fast Track”.  This should follow on nicely for those of you who attended my session at SQLBits VI entitled “Fast Track Foundations : Serious about Sequential I/O”. If you didn’t see this presentation it shouldn’t matter too much – especially as all the sessions are recorded at SQLBits and are made available for free over the internet once all the editing / post production work has been done. 

Fast Track is an important step for Microsoft. They are helping customers get the most from SQL Server, and not just from the software perspective. What is being offered is a clear deployment solution for data warehousing. Tackling the physical deployment challenge with real hardware specifications is a big step forward. I hope it is the first of many.

In this first post I’d like to talk about one of the recommended settings for SQL Server in a Fast Track system: the use of the –E startup parameter. Actually it’s very relevant to all SQL Server Data Warehousing scenarios. It has also had a chequered past in terms of how this parameter has been documented. I’d therefore like to try and bring a new insight into what this parameter does and why it is important.

Life without –E

Before identifying what –E does or doesn’t do I think it would be helpful to clearly state what happens when –E has not been configured on a SQL Server.

Imagine a database with a primary and a secondary filegroup. The primary contains a single file and only holds the internal tables etc., while the secondary holds the user data. The secondary filegroup contains multiple files that are evenly sized and spread across multiple LUNs. It would look something like the database below.

CREATE DATABASE [EParam] ON  PRIMARY 
    (NAME = N'EParam'
    ,FILENAME = N'C:\Temp\EParam.mdf' 
    ,SIZE = 51200KB 
    ,MAXSIZE = UNLIMITED
    ,FILEGROWTH = 1024KB 
    ) 
    ,FILEGROUP [Data]  DEFAULT 
    (NAME = N'EParam1'
    ,FILENAME = N'C:\Temp\EParam1.ndf' 
    ,SIZE = 10240KB 
    ,MAXSIZE = UNLIMITED
    ,FILEGROWTH = 1024KB
    )
    ,
    (NAME = N'EParam2'
    ,FILENAME = N'C:\Temp\EParam2.ndf' 
    ,SIZE = 10240KB 
    ,MAXSIZE = UNLIMITED
    ,FILEGROWTH = 1024KB 
    )
    ,
    (NAME = N'EParam3'
    ,FILENAME = N'C:\Temp\EParam3.ndf'
    ,SIZE = 10240KB 
    ,MAXSIZE = UNLIMITED
    ,FILEGROWTH = 1024KB
    )
    , 
    (NAME = N'EParam4'
    ,FILENAME = N'C:\Temp\EParam4.ndf' 
    ,SIZE = 10240KB 
    ,MAXSIZE = UNLIMITED
    ,FILEGROWTH = 1024KB
    )
LOG ON 
    (NAME = N'EParam_log'
    ,FILENAME = N'C:\Temp\EParam_log.ldf' 
    ,SIZE = 1024KB 
    ,MAXSIZE = 2048GB 
    ,FILEGROWTH = 10%
    )

When a database has been laid out as above, SQL Server will use a “round robin, proportional fill” algorithm to evenly distribute the data. Every time a table requests an extent, SQL Server satisfies the request by allocating an extent from the next file in the filegroup. This ensures that the files fill up evenly. Note that the round robin is not determined at the table level. We can see this in the following example script.

Use EParam

Create Table NoET1 (ECol1 char(8000)) ON Data;
Create Table NoET2 (ECol1 char(8000)) ON Data;

declare @cnt int = 0

While @cnt <8
BEGIN
    Insert into NoET1
    Values (REPLICATE('X',8000));
    
    Set @cnt = @cnt +1
END
GO
declare @cnt int = 0

While @cnt <8
BEGIN
    Insert into NoET2
    Values (REPLICATE('X',8000));
    
    Set @cnt = @cnt +1
END
GO
declare @cnt int = 0

While @cnt <8
BEGIN
    Insert into NoET1
    Values (REPLICATE('X',8000));
    
    Set @cnt = @cnt +1
END
GO
declare @cnt int = 0

While @cnt <8
BEGIN
    Insert into NoET2
    Values (REPLICATE('X',8000));
    
    Set @cnt = @cnt +1
END
--and so on

I’ll use the excellent Internals Viewer to illustrate what has happened.

[Image: Internals Viewer showing extents for NoET1 and NoET2 allocated round robin across the data files]

Each time a new extent has been required it has been granted from the next data file in the sequence. This has kept the data distribution even.

If proportional fill operated at the table level then the NoET1 (blue/purple) table would have data in all four files, but it does not.
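If you don’t have Internals Viewer to hand, you can get the same file-by-file picture from T-SQL. The sketch below uses %%physloc%% and sys.fn_PhysLocCracker, which are undocumented diagnostics in SQL Server 2008, so treat it as an unsupported way of peeking rather than a production query:

```sql
-- Count each table's data pages per database file.
-- %%physloc%% returns the physical row locator; fn_PhysLocCracker
-- (undocumented) splits it into file_id / page_id / slot_id.
SELECT  'NoET1' AS TableName
        ,ploc.file_id
        ,COUNT(DISTINCT ploc.page_id) AS PageCnt
FROM    NoET1
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS ploc
GROUP BY ploc.file_id

UNION ALL

SELECT  'NoET2'
        ,ploc.file_id
        ,COUNT(DISTINCT ploc.page_id)
FROM    NoET2
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS ploc
GROUP BY ploc.file_id
ORDER BY TableName, file_id;
```

With the default proportional fill rate you should see each table’s pages spread across the files rather than clustered in one.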

The Gaps (Slight Tangent)

You may be wondering why there are a couple of odd-looking things going on in the picture above. Let’s look at file 5 as a good example.

  1. Why are the first set of squares empty?
  2. Why are there holes and mixed colours at the start of the allocation process but not at the end?

[Image: Internals Viewer close-up of the start of file 5]

At the beginning of every file is a bunch of pages that define the file. They are the File Header (5:0), PFS (5:1), GAM (5:2), SGAM (5:3), DCM (5:6) and BCM (5:7) pages. The numbers in brackets are the (File Number : Page Number). That takes care of the first eight grey squares; (5:4) and (5:5) are simply not allocated.

The reason the next extent has three colours is because it is a mixed extent. It contains one page from NoET1, six pages from NoET2 and an IAM page.

IAM pages are distributed as needed through the files and manage the extents for a 4 GB range in a given database file for a given allocation unit. Each table can have up to three types of allocation unit associated with it: IN_ROW_DATA, LOB_DATA and ROW_OVERFLOW_DATA. All our rows are IN_ROW_DATA. Furthermore, you will get one IAM page for each file on which an allocation unit has extents.

So in our case let’s quickly look at NoET1. This table only contains IN_ROW_DATA and so is associated with a single allocation unit. However, it contains data in three files, so we have three IAM pages for this table.
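You can see those IAM pages directly with DBCC IND, another undocumented command (so the usual health warnings apply):

```sql
-- List every page belonging to NoET1 (-1 = all indexes/heaps).
-- In the output, rows with PageType = 10 are the IAM pages, and the
-- PageFID column shows which file each one lives in.
DBCC IND ('EParam', 'NoET1', -1);
```

For NoET1 you should see three PageType = 10 rows, one per file that holds data for the table.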

Abbreviation  Meaning
GAM           Global Allocation Map
SGAM          Shared Global Allocation Map
DCM           Differential Change Map
BCM           Bulk Changed Map
PFS           Page Free Space

Mixed extents, GAM, SGAM, DCM, BCM and IAM pages are all covered in Books Online.

Where is the Confusion?

I don’t believe the functionality of –E has been completely documented. There is a subtlety to how –E works that hasn’t, in my view, been adequately described. Even the Fast Track paper references an incorrect/out-of-date KB article. Let’s highlight the sources of incorrect info and look at what they say.

  1. Fast Track White Paper
  2. KB Article 329526
  3. My Blog :o(
  4. Data Loading Performance Guide

Fast Track White Paper

For the clarity of the post I've copied verbatim what the Fast Track White Paper says:

“-E should be added to the start-up options. This increases the number of contiguous extents in each file that are allocated to a database table as it grows. This improves sequential disk access. Microsoft Knowledge Base Article 329526 describes the -E option in more detail.”

What is wrong with this statement? –E does not increase the number of contiguous extents in each file that are allocated to a table. Maybe it should – an interesting argument – but it does not.

KB Article 329526

The source: 329526.

“The default allocation mechanism allocates one extent per file, using a round robin proportional fill factor algorithm, before it switches to the next file in the filegroup.
In the SQL Server 64-bit release, you can allocate four consecutive extents per given file, before switching to the next file in the filegroup.”

So what is wrong with this statement? SQL Server does not allocate four consecutive extents per given file before switching to the next file in the filegroup; certainly not on SQL Server 2008. This KB article hasn’t been updated to say it covers 2008, but I don’t believe the code here changed. I would ignore this KB article completely.

My Blog

The SQL Gods may strike me down for this one. Rule no.1 don’t believe everything you read.  I’d referenced 329526 for a series of posts about enterprise options but clearly hadn’t tested this one properly. Naughty Claypole. I would ignore this post if I were you.

Data Loading Performance Guide

The Data Loading Performance Guide says the following about –E :

“When SQL Server bulks load into a table allocated to this filegroup, extents are allocated in a round robin fashion. In the default configuration, SQL Server allocates one extent to each file before moving to the next in the chain. It is possible to increase this to 64 extents using the –E startup flag.”

File  Default Extent Allocation  With –E Startup
0     Extent 1                   Extents 1-64
1     Extent 2                   Extents 65-128
0     Extent 3                   Extents 129-192
1     Extent 4                   Extents 193-256
etc   etc                        etc

This is easily the closest to the actual answer. I’d say the wording “It is possible to increase this to 64 extents using the –E startup flag” is slightly ambiguous and doesn’t deal with a small subtlety, which is what I hope to show you in the next section. Suffice it to say that SQL Server does not increase the allocation to 64 extents when –E has been enabled. Remember, an allocation means a reservation. If a table received 64 extents every time it received an allocation then the Total pages column in sys.allocation_units would increase by 512 pages. It does not.
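You can watch that reservation counter yourself. A sketch of the check I mean: run this before and after one of the insert loops and, even with –E enabled, total_pages climbs 8 pages (one extent) per allocation rather than jumping by 512.

```sql
-- total_pages / used_pages for the IN_ROW_DATA allocation unit of NoET1.
SELECT  au.type_desc
        ,au.total_pages      -- pages reserved (the "allocation")
        ,au.used_pages       -- pages actually in use
FROM    sys.allocation_units AS au
JOIN    sys.partitions AS p
        ON p.hobt_id = au.container_id
WHERE   p.object_id = OBJECT_ID('NoET1');
```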

So what does –E do?

-E changes the rate at which the proportional fill occurs. The Data Loading Performance Guide alludes to this, especially in the table. Rather than moving to the next file after a single extent allocation, the movement occurs after 64 extents have been allocated. These extent allocations do not need to come from a single table. If I were loading two tables in parallel, for example, and had –E enabled, then both tables would consume those 64 extents in the single file before moving on to the next file.

-E does not alter the fact that SQL Server allocates one extent at a time.

Therefore when –E is enabled you will only see Total Pages increase by 8 pages (1 extent) every time an allocation occurs.

That sounds like I might be splitting hairs but I believe the distinction is important to understand. It is especially important for Fast Track as Fast Track is completely dependent on sequential I/O for its performance. –E is enabled to enhance the sequential scanning performed in a read-ahead so that more data is contiguously available in a single file.  If two tables that share the same files/filegroup are loaded in parallel then the advantage of –E is removed. The extents will be fragmented and this will in turn reduce the performance of the read-ahead.

Now to show that I have learned my lesson. Here comes the proof.

Proving –E Functionality

First of all we need to enable the –E startup parameter.  You can do this by amending the startup parameters on the SQL Server Service.  You should be using SQL Server Configuration Manager to do this.

Step 1 : Enable –E Startup Parameter

I have actually documented setting the –E startup parameter in another post. There are a couple of pitfalls that I thought were worth bringing out and so if you are unsure about setting Startup Parameters I’d urge you to read it. It includes a step by step guide to setting this correctly (yes you can set it incorrectly).

However, if you just want to get on with the show, make sure you append exactly ;-E to the end of the list in the Startup Parameters property of the SQL Server service, using SQL Server Configuration Manager.
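If you want to double-check the parameter from T-SQL rather than trust the GUI, later builds (SQL Server 2008 R2 SP1 onwards) expose the startup arguments through the sys.dm_server_registry DMV; on earlier builds you would have to inspect the registry or the first few lines of the SQL Server error log instead. A sketch, assuming a build where the DMV exists:

```sql
-- Each SQLArg value is one startup parameter; -E should appear
-- in the list if it was set correctly.
SELECT  value_name, value_data
FROM    sys.dm_server_registry
WHERE   value_name LIKE 'SQLArg%'
ORDER BY value_name;
```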

Step 2 : Create a multi file database

Something like this should be fine.

CREATE DATABASE [EParam] ON  PRIMARY 
    ( NAME = N'EParam'
    , FILENAME = N'C:\Temp\EParam.mdf' 
    , SIZE = 51200KB 
    , MAXSIZE = UNLIMITED
    , FILEGROWTH = 1024KB 
    )
, FILEGROUP [Data]  DEFAULT 
    ( NAME = N'EParam1'
    , FILENAME = N'C:\Temp\EParam1.ndf' 
    , SIZE = 10240KB 
    , MAXSIZE = UNLIMITED
    , FILEGROWTH = 1024KB 
    )
    , 
    ( NAME = N'EParam2'
    , FILENAME = N'C:\Temp\EParam2.ndf' 
    , SIZE = 10240KB 
    , MAXSIZE = UNLIMITED
    , FILEGROWTH = 1024KB 
    )
    , 
    ( NAME = N'EParam3'
    , FILENAME = N'C:\Temp\EParam3.ndf' 
    , SIZE = 10240KB 
    , MAXSIZE = UNLIMITED
    , FILEGROWTH = 1024KB 
    )
    , 
    ( NAME = N'EParam4'
    , FILENAME = N'C:\Temp\EParam4.ndf' 
    , SIZE = 10240KB 
    , MAXSIZE = UNLIMITED
    , FILEGROWTH = 1024KB 
    )
     LOG ON 
    ( NAME = N'EParam_log'
    , FILENAME = N'C:\Temp\EParam_log.ldf'
    , SIZE = 1024KB 
    , MAXSIZE = 2048GB 
    , FILEGROWTH = 10%
    )
GO

Step 3 : Create two tables

Create Table ETestTable (ECol1 char(8000)) ON Data;
Create Table ETestTable2 (ECol1 char(8000)) ON Data;

Internals Viewer shows that the database is still empty. As no data has yet been added nothing has been allocated.

[Image: Internals Viewer showing the empty EParam database with no user allocations]

Step 4 : Load in 64 extents of data into ETestTable

declare @cnt int = 0

While @cnt <504
BEGIN
    Insert into ETestTable
    Values (REPLICATE('X',8000));
    
    Set @cnt = @cnt +1
END

Internals Viewer shows that all 504 records went into a single file. The round robin has not yet kicked in. We load in 504 records because the first extent is a mixed extent and our first 8 page requests will be allocated to mixed extents. The situation gets slightly muddled because one of the pages in the first extent is an IAM page, meaning that we spill over into a second mixed extent. Therefore the first TWO extents allocated will be mixed extents. However, the 9th page request will allocate a uniform extent, as we can see below.

[Image: Internals Viewer showing all 504 records allocated in a single file]

Step 5 : Load another 64 extents of data into ETestTable

This time we have no issue. We can happily load in 512 records for our 64 extents.

Internals Viewer shows us that EParam3 has allocated all the extents.

[Image: Internals Viewer showing all 64 extents allocated from EParam3]

Clearly, then, we can see 64 extents being allocated from one file before the proportional fill round robin kicks in.

However, notice that there is no IAM page in the EParam3 file. Where is it?

Well it is actually in the mixed extent on EParam4.

[Image: Internals Viewer showing the mixed extent in EParam4]

The two pages I have highlighted in the image above are the IAM pages. The first IAM page, on the left, handles IN_ROW_DATA allocations for ETestTable in the EParam4 file, and the IAM page on the right handles IN_ROW_DATA extent allocations for ETestTable in the EParam3 data file.
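If you’d like to confirm the per-file counts without Internals Viewer, you can capture the DBCC IND output into a temporary table and aggregate it. The column list below matches the SQL Server 2008 output and DBCC IND is undocumented, so treat this as a diagnostic sketch:

```sql
-- Shred DBCC IND output and count ETestTable's pages per file.
CREATE TABLE #Pages
    (PageFID int, PagePID int, IAMFID int, IAMPID int
    ,ObjectID int, IndexID int, PartitionNumber int, PartitionID bigint
    ,iam_chain_type varchar(30), PageType int, IndexLevel int
    ,NextPageFID int, NextPagePID int, PrevPageFID int, PrevPagePID int);

INSERT INTO #Pages
EXEC ('DBCC IND (''EParam'', ''ETestTable'', -1)');

-- PageType 10 = IAM page; everything else is data/allocation pages.
SELECT  PageFID AS FileId
        ,SUM(CASE WHEN PageType = 10 THEN 1 ELSE 0 END) AS IamPages
        ,COUNT(*) AS TotalPages
FROM    #Pages
GROUP BY PageFID
ORDER BY PageFID;

DROP TABLE #Pages;
```

With –E enabled you should see the 64 freshly allocated extents concentrated in one FileId rather than spread evenly.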

Step 6 : Parallel Load Two Tables

Step 6a : Load 64 Extents into second table

First of all, let’s even up proceedings with an initial load into ETestTable2.

declare @cnt int = 0

While @cnt <512
BEGIN
    Insert into ETestTable2
    Values (REPLICATE('X',8000));
    
    Set @cnt = @cnt +1
END
[Image: Internals Viewer after the initial load into ETestTable2]

This time we can load in 512 records in our first attempt and get to a full 64 extents, because the previously allocated mixed extents have consumed 5 of our initial 8 page requests. Also note that ETestTable2 needs 2 IAM pages, as its data has now spread into two files.

[Image: Internals Viewer showing ETestTable2’s two IAM pages]

Step 6b : Parallel load ETestTable and ETestTable2 with 32 Extents each

To simulate the parallel load I’ve inserted a row for each table in the following script and halved the number of loops:

declare @cnt int = 0

While @cnt <256
BEGIN
    Insert into ETestTable
    Values (REPLICATE('X',8000));
    Insert into ETestTable2
    Values (REPLICATE('X',8000));    
    Set @cnt = @cnt +1
END

What are the results?

[Image: Internals Viewer showing ETestTable and ETestTable2 extents interleaved within a single file]

We have a striped and perfectly interleaved set of extents. 

Clearly, then, –E does not allocate any more than one extent at a time, and it does nothing to prevent this situation from occurring.

For completeness let’s just see where those IAM Pages are:

[Image: Internals Viewer showing the IAM pages located in the earlier mixed extent]

They are of course in that mixed extent that we used earlier.

Conclusion

So there we have it. Clearly –E does not allocate more than one extent at a time. What adding it as a startup parameter changes is the round robin behaviour of the proportional fill algorithm: the round robin does not move on to the next file until 64 extents have been allocated from the current file.

Many thanks to Thomas Kejser for his help and to Henk van der Valk for suggesting Internals Viewer as the best way to showcase this behaviour.

In my next post in the Focus on Fast Track series I will look at the implications of –E before moving on to look at Sequential I/O and the impact –E has on it.

Until the next time,

JRJ


Comments

 

Paul Randal said:

The confusion comes from the docs. The -E was originally put into the code to allow index builds and rebuilds to use 256k of each file before round-robining, so get better sequential IO for large DWs. The only thing that's changed in the code in 2008 is the size of the total allocations in a file before round-robin kicks in again - but the docs don't do an adequate job of explaining, as you've found out :-)

It was originally a 'benchmark special' and had very little testing, it was only doc'd in the KB article because it was used during benchmarks and so had to be documented.

April 25, 2010 2:56 PM
 

James Rowland-Jones said:

Thanks Paul - great insight.

April 25, 2010 3:23 PM
 

TheSQLGuru said:

Thanks for this series James!  I am getting pretty heavy into the Fast Track stuff, and just spent several days at a Microsoft Technical Center getting familiar with the HP FT setup.  I look forward to the remainder of your series!

Best,

Kevin Boles

May 2, 2010 2:59 PM
 

Eric Kraemer said:

Excellent analysis James. You correctly note that the v2.0 documentation is ambiguous at best on the actual effect of the -E option. We should, at the least, state that -E increases the maximum "potential" number of contiguous extents allocated as a file grows. This underscores the importance of managing parallel and/or concurrent load operations and the value of partitioning and filegroup staging for any SQL Server deployment that values sequential data layout.

I look forward to the rest of your series on Fast Track.

May 3, 2010 12:40 AM
 

James.Rowland-Jones said:

Thanks guys. Indeed this can be solved in the design of the ETL and this is something I hope to address in this series.

Having had time to reflect on Paul's comments a little more I think it's fair to say that -E has moved on from the realm of the benchmark special to a feature that underpins a reference architecture. It should therefore feature in BOL more prominently. The next fast track white paper and other sources can then point to that. Well that would be my suggestion.  I'll post something into connect and see if it gets traction...

Cheers, James

May 3, 2010 6:33 AM
 

John Alan said:

Just came across this blog after a link was posted on stackexchange.

Excellent explanation of -E

Makes you rethink about using it if you rely on SSDs

June 5, 2013 7:56 AM


About James.Rowland-Jones

James is an Advisory Practice Consultant with EMC Consulting. He works primarily with SQL Server technologies in architecture, development and administration capacities. He also co-ordinates our Internal SQL Server Community.

