Welcome to EMC Consulting Blogs Sign in | Join | Help

Data Based

Just some thoughts of mine

  • 2 + 2 = ?

    As a concise example of the evils of type coercion I came up with the following:

    INSERT INTO t
    VALUES (
    CAST(2 AS INT),
    CAST(2 AS INT));

    SELECT
    CASE WHEN x + z = 4 THEN '4' ELSE 'NOT 4' END xPLUSz
    FROM t;

    Result:

    xPLUSz
    ------
    NOT 4

    It is left as an exercise for the reader to work out what types x and z might be!

  • The Relational Model turns 40

    The Relational Data Model is 40 years old. Edgar Codd first described the idea that databases could be constructed from mathematical relations in an internal IBM Research report in August 1969. The following year his more famous paper on the same topic was published for public consumption. So began a revolution. Data was liberated from its physical representation and thereafter the most complex data processing problems could be solved using only a few simple building blocks (the eight basic operators of Codd’s relational algebra). Few concepts in the history IT have made such an impact or endured for so long.

    Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks

    At this point I could try to write something profound about what the next 40 years might hold... I’m not foolish enough to predict anything like that far ahead though! For sure there is a lot more that could be done. The SQL systems that most of us use are still some way short of what Codd envisaged even four decades ago. It seems pretty clear that technology already exists to make possible much better database systems than we have today. Solid-state storage for example should make it possible to achieve far greater Physical Data Independence (database representation that is more abstracted from its internal storage structures). Richer type support in databases is something that database vendors have been improving but there is still some way to go – in particular type-inheritance isn’t well-supported by most DBMSs. Self-optimising databases is one area where I expect we’ll see a lot more innovation.  And then of course there is ever more scalability and distributed database technology, which brings me to the cloud and distributed databases…

    My potentially subversive question is this: Do we actually need new data models for the cloud? It seems that we do need different and hopefully better implementations of database systems than we have in centralised data centres. We will go on using XML stores and other types of storage too. But the relational model still ought to have a central role to play in cloudy, distributed data stores as well as the centralised ones.

    Brewer conjectures that we may need to accept partial or eventual consistency of data if we desire a system that’s both available and partitioned. Two observations about this seem important. Firstly, consistency guaranteed at some point will continue to be critical for the vast majority of non-trivial systems. No business is ever giving up consistency and accuracy as desirable goals because trust is the basis of every business model out there. Secondly, eventual consistency means that fewer business rules are implemented in the declarative manner that we are used to, i.e. they are evaluated some time after the event rather than at transaction time.

    This could actually suit the relational model very well. Because the RM doesn't have any navigational structures it can easily accommodate data that is late arriving. Provided no constraint is actually broken (as if we aren’t using declarative referential integrity for example) it is perfectly OK for some tuples to arrive “late” in our relational model. We can still query and join on the tuples that are present, we can derive place-holders for tuples that are missing (think outer joins) and we can apply integrity rules to data when it does arrive. This is in stark contrast to navigational models (like XML and other hierarchies) where an element must exist before its child elements can exist.

    Another desirable property of large distributed systems is a flexible and dynamic structure. RM is good here too. In principle relational schemas can be highly dynamic but unfortunately this is one area where SQL systems are found wanting. In a relational system, changing the set of attributes of a relation ought to be as simple as writing the query that transforms the data and then assigning the result of that query to a new relation variable. Potentially this needs to involve little or no change to the underlying storage and can therefore be a near-instantaneous metadata operation. SQL DBMSs make extremely heavy work of such a simple task however, which is why SQL databases are often difficult to change and adapt.

    Yet another thing on the distributed storage wish-list is rich type support – support for storing documents, images, XML and other objects in databases just as easily as strings and numbers. As already mentioned, SQL systems are doing some of that already and vendors will no doubt continue to improve type support.

    The many familiar and valid criticisms of SQL (see for example the “NoSQL” line of thought) are just that – problems with the SQL model and SQL DBMS limitations rather than with the relational model. (See also NoSQL and the Relational Model: don’t throw the baby out with the bathwater). So even after 40 years I still think the relational model is the coolest game in data-land, and perhaps the best is still to come because RM’s potential has never yet been fully realised in mainstream database software. Kudos Mr Codd!

    E.F.Codd

  • UPDATE FROM the Sybase code museum

    UPDATE FROM is dead … or it should be. Hugo Kornelis has done a good job of explaining why.

    Apart from a badly thought out and non-portable syntax, UPDATE FROM suffers a notorious bug/feature. I say “notorious” but it’s probably not notorious enough because it is still common to see people get tripped up by it. If I write:

    UPDATE Alpha
    SET z = Beta.z
    FROM Alpha
    JOIN Beta
    ON Alpha.x = Beta.x ;

    and if Beta.x is not unique then Alpha.z will get updated with some random value from one of the matching rows in Beta.z. This is documented in Books Online as follows:

    “The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic.”

    MERGE handles the same situation more wisely and safely: it returns a runtime error if the join is non-deterministic:

    MERGE INTO Alpha
    USING Beta
    ON Alpha.x = Beta.x
    WHEN MATCHED THEN UPDATE
    SET z = Beta.z ;

    Msg 8672, Level 16, State 1, Line 42
    The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

    MERGE is also standard SQL, works in other DBMSs and frequently results in more efficient execution plans than UPDATE FROM.

    UPDATE FROM isn’t originally Microsoft’s fault because it’s a legacy of SQL Server’s ancient historical roots at Sybase. But it is Microsoft’s fault they haven’t deprecated it yet. You can vote for deprecating UPDATE FROM on Connect:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=332437

  • DB2 9.7 and Database Type Safety

    DB2 has some new features that are aimed at customers who want to transition from Oracle to DB2:

    http://www-05.ibm.com/ch/events/symposium/pdf/9_M_Obrecht_IBM_Symposium_2009.pdf

    IBM are emphasising the improved snapshot isolation, PL/SQL support and packages (“modules” in IBM’s terms).

    One “feature” that caught my eye is support for implicit conversion. This is a surprise to me. In most areas of software engineering strong typing is considered not just desirable but pretty essential to program correctness and early identification of bugs. I don’t think that reducing or removing type safety is a good way to improve database languages.

    In SQL Server, Oracle and other DBMSs we already have to live with implicit typing (type coercion) and the problems it causes. SQL Server won’t complain at compile time if I write the following two lines in the same procedure or script:

    SELECT col1 FROM tbl WHERE col2 = CAST(1 AS INT);
    SELECT col1 FROM tbl WHERE col2 = CAST('2' AS VARCHAR(10));

    The effect is that SQL Server recklessly and silently casts strings as numbers and won’t even give a runtime error until the data itself causes the conversion to break. Getting it right therefore requires much more effort than it would do if types were checked at compile time.

    Erland Sommarskog has posted a feedback item requesting that stricter compile time checking be incorporated in a future version of SQL Server. Please rate it if you agree:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=260762

    Posted 24 June 2009 22:01 by David.Portas | 1 Comments
    Filed under:
  • SQL Server 2008 - Why I'm not waiting for SP1!

    Whether and when to upgrade to SQL Server 2008? It seems that there is still a school of thought that the release version of a product is not to be trusted and that the best policy is to "wait for SP1". Now this has never made much sense to me, not least because Service Packs can mean quite different things in different products. Sticking with SQL Server 2008 though, I believe there are plenty of reasons to upgrade now and NOT to wait for SP1.

    Firstly, SQL Server 2008 is an extremely high quality and stable product and has already notched up a great many server-years of collective testing, both before release and after. I personally have been developing and testing with it on projects for a year now and it has been the best quality preview version of the product I've ever worked with. SQL Server 2008 RTM is in very much better shape now than SQL Server 2005 was at this point in its life.

    Secondly, it has to be said that the history of SQL Server Service Packs is not an entirely happy one. There have been some very unfortunate problems with SQL SPs. For sure, the SQL Server team will learn from their mistakes but the fact is that an SP is never going to get the level of testing at customer sites that a major release gets. This is natural enough because SPs rarely contain new functionality so it's harder to justify the investment needed to build and test with the beta versions of SPs.

    In terms of quantifiable risk then, installing a 2008 service pack right after release is potentially far more risky than installing SQL Server 2008 RTM right now.

    Thirdly, there are the real returns you could get from upgrading to 2008 even with minimal effort. Upgrading from 2000 to 2005 was and is a big undertaking. Between 2000 and 2005 there is a whole raft of new features, deprecated features, breaking changes - all of which adds up to a lot of regression testing and potential development effort to take advantage of new features. In comparison, the additional effort to move from 2005 to 2008 or from 2000 to 2008  is a far smaller piece of work. Fewer new things will be broken or need re-working. Plus you can take advantage of new features like compression and policy based management with zero change to existing applications.

    Andrew Fryer says that you may have to wait a long time for a SQL 2008 SP. At the UK User Group meeting last night he also had a suggestion. Even if you aren't upgrading your servers yet, at least install and use 2008 Management Studio so that you can take advantage of the new features in the tool!

  • SQL Server 2008 RTM

    SQL Server 2008

    SQL Server 2008 is out! All editions are on MSDN now except for Express, which should hopefully follow soon. Enjoy!

  • Update on T-SQL MERGE

    The MERGE syntax I used here has changed. In 2008 RC0 the "WHEN SOURCE NOT MATCHED" clause is replaced with "WHEN NOT MATCHED BY SOURCE" which is supposed to make the meaning clearer.

    Although MERGE is part of standard SQL 2003 it's worth noting that the SOURCE NOT MATCHED / NOT MATCHED BY SOURCE clause is not part of the standard. It's an extension added by Microsoft - a very useful one in my view.

    MERGE INTO a
    USING b
    ON a.keycol = b.keycol
    WHEN MATCHED THEN
    UPDATE SET
      col1 = b.col1,
      col2 = b.col2,
      col3 = b.col3
    WHEN NOT MATCHED THEN 
      INSERT (keycol, col1, col2, col3) 
      VALUES (b.keycol, b.col1, b.col2, b.col3)
    WHEN NOT MATCHED BY SOURCE THEN 
      DELETE;

  • SQL Server 2008 RC0

    SQL Server 2008 Release Candidate 0 (build 10.0.1442.32) was released to MSDN and TechNet subscribers on Friday. It should be available to everyone else very soon via the Connect site.

    http://msdn.microsoft.com/en-us/subscriptions/default.aspx

    http://technet.microsoft.com/en-us/subscriptions/default.aspx

     

    Apart from the new branding (see above) the goodies in this release include some Management Studio enhancements: object search, T-SQL debugger and PowerShell support. Yes, the debugger is back for those who miss it from SQL Server 2000 days!

    What's also new is that multiple editions are now included in a single package, so you get Express, Standard, Enterprise, etc instead of just Developer. A product key is required to install the paid-for editions.

     

    SQLServer2008RC0Setup

     

    SQL Server Express Edition 64 bit

    I have seen no info at all from Microsoft about this but RC0 does include a 64 bit version of Express Edition. This is the first native 64 bit release of Express. SQL Server 2005 Express Edition is 32 bit and WOW only.

     

    SELECT @@VERSION;

    --------------------------------------------------------------------------------------------------------
    Microsoft SQL Server 2008 (CTP) - 10.0.1442.32 (X64)
        May 29 2008 23:52:04
        Copyright (c) 1988-2007 Microsoft Corporation
        Express Edition with Advanced Services (64-bit) on Windows NT 5.2 <X64> (Build 3790: Service Pack 2)

    (1 row(s) affected)

  • Not the model, Bill


    According to InfoWorld, Bill Gates thinks that declarative modelling of business logic has not caught on because of "weak models".

    This is a surprise. E.F.Codd's earliest papers on the Relational Model described a relational calculus for the purpose of supporting declarative business logic. Even the SQL standards committee added some powerful features to standard SQL, such as CREATE ASSERTION and generalised CHECK constraints with subqueries. Despite this, InfoWorld identifies the Relational Model as one of the "weak" models referred to by Gates!

    What is stopping people putting declarative business logic in their data models today? The answer is not the model. It is the DBMS products that are supposed to implement the model. Here's what Terry Halpin said about declarative constraint support in SQL back in 2001

    "Although the above SQL syntax [CREATE DOMAIN and CHECK constraint with subquery] is legal as far back as SQL-92, some commercial versions of SQL do not yet support all of this syntax. For data definition, some versions have barely progressed beyond the old SQL-89 standard (which had no domain clauses, and restricted check clauses to conditions on a single row). In practice, some features of a relational schema may need to be specified as a procedural code rather than declaratively."

    (Information Modeling and Relational Databases, p411)

    In 2000, Fabian Pascal wrote:

    "most commercial DBMSs have not implemented the [SQL] standard's advanced level integrity features. Some implement integrity procedurally via user-written stored procedures expressed in proprietary SQL extensions that can be quite complex, vary across products and have various limitations."


    (Practical Issues in Database Management, p64)

    And here's my take on the same problem:
    http://blogs.conchango.com/davidportas/archive/2007/02/19/Trouble-with-CHECK-Constraints.aspx

    Database practitioners have been saying for years that most SQL DBMSs do a poor job of supporting declarative business logic. Dear Bill: the relational model is just great. Please fix the omissions in the current products before going on to invent whole new ones!



  • SQL Server 2008 November CTP

    CTP5 is available today!

    http://www.microsoft.com/downloads/details.aspx?FamilyId=3BF4C5CA-B905-4EBC-8901-1D4C1D1DA884

    Jamie has already blogged about some of the new goodies in CTP5, which include Intellisense for Management Studio. Also new in this release is the FileStream feature, which is something I've been working with recently. I plan to have more to say about FileStream soon.

Powered by Community Server (Personal Edition), by Telligent Systems