Monday, July 16, 2007

SQL hierarchial query

"Solving problems that are hierarchical and nested in nature are best helped with a record oriented approach rather than a SET ORIENTED approach"

This assertion comes from the Visual FoxPro environment, where developers can navigate through result sets (VFP cursors…not to be confused with SQL Server Cursors) at the record level, very quickly and easily. No question, VFP is a great database tool. So great in fact, that some have used it for so long, and to solve so many problems, that it's easy to conclude that VFP's offerings are the only optimal hammer in the toolbox for certain operations.

SQL Server 2005 conforms more closely to the ANSI-99 SQL spec – and some of the features of the ANSI-99 standard making querying more flexible than before. One such feature is Common Table Expressions and recursive querying of hierarchical data. I'm going to devote some blog entries over the next few weeks to recursive querying, as I believe developers regularly face this challenge.

I'm going to start with a very simple example, to demonstrate the mechanics of recursive querying in SQL 2005. First, here's a simple set of hierarchical data – a product hierarchy with Brands, Groups, and Items. Each row contains a unique integer identifier, a description, and a reference to the row's parent:

DECLARE @tProducts TABLE (ID int, Name char(50), ParentID int)

INSERT INTO @tProducts VALUES (1, 'Brand 1', null)
INSERT INTO @tProducts VALUES (2, 'Brand 2', null)
INSERT INTO @tProducts VALUES (3, 'Brand 3', null)

INSERT INTO @tProducts VALUES (6, 'Brand 1, Group 1', 1)
INSERT INTO @tProducts VALUES (7, 'Brand 1, Group 2', 1)
INSERT INTO @tProducts VALUES (8, 'Brand 1, Group 3', 1)

INSERT INTO @tProducts VALUES ( 9, 'Brand 2, Group 1', 2)
INSERT INTO @tProducts VALUES (10, 'Brand 2, Group 2', 2)

INSERT INTO @tProducts VALUES ( 11, 'Brand 3, Group 3', 3)

INSERT INTO @tProducts VALUES (12, 'Brand 1, Group 1, Item 1', 6)
INSERT INTO @tProducts VALUES (13, 'Brand 1, Group 1, Item 2', 6)


INSERT INTO @tProducts
VALUES (14, 'Brand 1, Group 1, Item 1, SKU 1', 12)


Here's the challenge: for any specific item in this table, I want to know all the child records, going all the way down the hierarchy. Think about how you'd handle this query in SQL 2000 using SQL statements. While it was doable, querying hierarchical data was never an easy task. Fortunately, SQL 2005 makes this much simpler. Here's the entire query, which sets a search string, and then retrieves all the child records for that search string:

DECLARE @cSearch char(50)
SET @cSearch = 'Brand 1, Group 1'

; WITH ProductCTE AS
-- Anchor query
(SELECT ID, Name, ParentID
FROM @tProducts
WHERE Name = @cSearch
UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID = Prod.parentID )

SELECT * FROM ProductCTE

I'll confess, when I first looked at CTE and recursive query syntax, I felt pretty stupid that I couldn't grasp it. So, I took an online example, completely changed the sample data, built my own example, and then I didn't feel quite so stupid. ;)

Here's the deal/trick with recursive queries - in a "normal" SQL SELECT statement, you're querying from a table that isn't changing during the lifetime of the query (OK, data could be inserted from another session, but that's a different topic). You're querying directly FROM one table, and into a totally separate result set.

In a recursive query, some of the data you need to query is actually a result of a prior phase of the query. So results become the source for future results. Think about this with respect to our simple example, where we want all the children of a particular row – we can't pull the lowest level (SKU) without pulling the level above that (Item), and we can't do that without pulling the level above that (Group) and so on.

So let's take a look at the anatomy of the query, which has three parts. The first part is defining the Common Table Expression…

; WITH ProductCTE AS

As I said in my last blog post, a CTE is like a temporary derived table or view, with a very short lifespan. What we're going to do in the next two steps is query into ProductCTE for the search string, and then continue to query all the way down the hierarchy by comparing IDs to ParentIDs.

The second part is called the main or anchor query:

(SELECT ID, Name, ParentID
FROM @tProducts
WHERE Name = @cSearch

The anchor query executes first, so we know that ProductCTE will contain the ID, Name, and ParentID for the single row that matches the search condition.

The final part is the actual recursive query, which is connected to the anchor query with a UNION ALL:

UNION ALL
-- Recursive query
SELECT Prod.ID, Prod.Name, Prod.ParentID
FROM @tProducts Prod
INNER JOIN ProductCTE
ON ProductCTE.ID = Prod.ParentID )

Note that the anchor query is performing a join between the original list of products and the ProductCTE. So each "hit" between an ID and a ParentID will go into ProductCTE, and the query will continue until no more matches are found.


OK, so that handles querying DOWNWARD….suppose we wanted to find all the parent rows instead? Just switch the column names on the INNER JOIN in the recursive query:

ON ProductCTE.ParentID = Prod.ID )

If you're wondering about any limit to the number of recursions, the default is 100, which can be configured: check out MAXRECURSION in the online help if you ever have a hierarchy with more than 100 levels. (I'm almost afraid to see a hierarchy with that many levels!)

See, that wasn't so bad? There are many examples of hierarchical data out there, such as Bill of Material data, organization chart data, etc. Anyone who has ever worked with e-commerce applications is likely VERY familiar with the challenges of hierarchical data. SQL Server 2005 makes life much easier to deal with these challenges.

No comments: