Sunday, March 29, 2009

XQuery – How ye disappoint

 

I’m updating a legacy app from VB6 to ASP and then to .NET. The ASP is a transitional step so that I can stop dealing with COM+ objects.

Anyhoo… I picked one of the common pages. It prints a grid, basically.

  1. page calls a vbscript function
  2. the vbscript function executes a query and gets back a recordset
  3. a series of nested loops covert the recordset to xml
  4. the xml is returned to the page
  5. the page uses XSLT to convert the XML to HTML

Sweet.

Now that I’m converting it to .NET, though, I wanted to try exciting new possibilities. The application is on SQL 2000, but updating to 2005 is a reasonable expectation. (I won’t push my luck with 2008).

The Intent

The idea is to transform this data

image

into this xml

image

In the current app, that transformation is done in ASP VbScript.

 

SQL XML

I’ve dabbled with some of the XML capabilities in 2005. I’ve used it to join to tables and to shred the xml. I’ve also used it to create xml documents without that pesky !TAG! syntax. But, they were all meager efforts.

I started by hoping that such meagerness would be sufficient.

   1: declare @startDate datetime



   2: declare @endDate datetime



   3: declare @theaterId int



   4:  



   5: select



   6:     @startDate = '10/31/2003',



   7:     @endDate = '11/6/2003',



   8:     @theaterId = 170



   9:  



  10:     select 



  11:         v.FilmId "film/@film-id",



  12:         v.FilmName "film/@film-name",



  13:         v.PrintId "film/print/@print-id",



  14:         dates.[Date] "film/print/date/@date",



  15:         a.AuditoriumName "film/print/date/auditorium/@auditorium-name",



  16:         a.AuditoriumId "film/print/date/auditorium/@auditorium-id"



  17:     from 




But its not. That built the the hierarchy, but it repeats itself over and over. It doesn’t group itself the way I need. If there was/is a way to do everything I need by specifying the paths like that, then it would be a good day.



SQL XQUERY



Next, I started dabbling with XQUERY. My only XQUERY experience has been via SQL Server 2005, and in its simplest form.



I went Google-Crazy and read up on some stuff. I was able to write a query (albeit a crappy one) that does the job.





   1: select @output.query('



   2:     <theater>



   3:         {



   4:             for $filmId in distinct-values(/theater/film/@film-id)



   5:             return 



   6:             <film>



   7:                 { attribute id { $filmId }}



   8:                 { attribute name { /theater/film [@film-id = $filmId][1]/@film-name }}



   9:                 {



  10:                     for $printId in distinct-values(/theater/film [@film-id=$filmId]/@print-id)



  11:                         order by $printId



  12:                         return 



  13:                         <print>



  14:                             { attribute id { $printId }}



  15:                             {



  16:                                 for $date in distinct-values(/theater/film [@film-id=$filmId and @print-id=$printId]/@date)



  17:                                     order by $date



  18:                                     return 



  19:                                     <date>



  20:                                         {attribute date { $date }}



  21:                                         {



  22:                                             for $auditoriumId in distinct-values(/theater/film [@film-id=$filmId and @print-id = $printId and @date=$date]/@auditorium-id)



  23:                                             return



  24:                                             <auditorium>



  25:                                                 { attribute id { $auditoriumId }}



  26:                                                 { attribute name { /theater/film [@film-id=$filmId and @print-id = $printId and @date=$date and @auditorium-id=$auditoriumId]/@auditorium-name }}



  27:                                             </auditorium>



  28:                                         }                                            



  29:                                     </date>



  30:                             }



  31:                         </print>



  32:                 }



  33:             </film>



  34:         }



  35:     </theater>



  36: ')




How does this suck? Let me count the ways




  1. 4 levels of nesting. Its not pretty. But, the VbScript has the same layers. (The logic is different, but its just as nested)


  2. Each layer has to go back to the top and work its way back down based on the key information collected thus far


  3. In the loops, I can only order by the loop indexer. For example: Auditorium. I don’t want to sort on “auditorium id”. I want to sort on display order. I can’t, because “auditorium id” is a value, not a node. If it was a node, I’d be able to get to a sibling attribute.


  4. It offers a handy distinct-values, but does not offer a handy distinct-nodes. (there are example how to do distinct-nodes, but the few I’ve seen use the LET statement, which you can’t do in SQL 2005)



What doesn’t suck



Obviously I’m having problems with it, but that may just be due to my staggering 45 minutes of inexperience with it.




  1. I like the syntax of specifying the attributes (shown) and elements (not shown) through the {} syntax


  2. I like that the comments are smiley faces (not shown).  (: this is an xquery comment :)


  3. In principle, I like how you can do the layering.



The Problem



I got the XML that I want, but its slow.  The SQL XQUERY consists of 2 parts: the query to get the data as xml, and the xquery to transform it the way I’d like it.



The first part comes back instantaneously. The 2nd part takes anywhere from 2 to 16 seconds. One time, it took a minute and 54 seconds?!. Its really inconsistent. I looked at the execution plan multiple times. Every time it says that the first query accounts for 0% of the time, and the 2nd query accounts for 100% of the time.



The legacy app does all it needs to do, including rendering it on the page, in a 1/2 second or less. You don’t even see it happen; you just click the link and the page renders.



I know that my xquery is amateur. If I can rewrite it the way it should be written and try again, maybe the results will be drastically improved. (At least I hope they are.)



Things that Would Help




  • SQL 2008 supports the LET statement. If I had that in 2005, then I could assign node sets at the various levels, and treat that as the root for that level. Then it wouldn’t have to go to the top of the document every time. (At least, it seems like I’d be able to do that)


  • If I could do a distinct-nodes instead distinct-values, then as I loop through, I can get the other stuff I need relative to the attribute. IE: $film-id/../@film-name.


  • Knowledge of XQuery would sure be helpful.



Next Steps / Conclusions



I wanted the source doc to be hierarchical so that its would be an accurate representation of the data. Since the XQuery didn’t work, I may end up doing it in C#. Then, the page will use an XSLT to render it. (I’ll look into using XQUERY to render it, but I don’t think that’s a viable option yet).



I developed the original application starting in 2001. Over the first few years, I spent a lot of time performance testing the quickest way to get the data out of the database and onto a page. I always lean towards XML and XSLT so that you can easily render it different ways. I want to keep it transformable.



Of all the things I tried, the quickest thing has always been:




  1. Run the query and get back a flat dataset


  2. Use code to convert the dataset to xml



Despite the repeating data, and despite the manual conversion, it wins every time.



Things I may try




  1. Convert the SQL XML to my XML via XSLT


  2. Convert the SQL XML to my XML via C# code (the old fashioned way with a new language)


  3. Read more about XQuery to determine how off-target my query really is

4 comments:

Troy said...

Have you tried the FOR XML PATH syntax for creating the output xml?

Jay Allard said...

Yep.

The first query, prior to the xquery, does the PATH to get the flat xml. I just showed the select since the FROM and WHERE would've cluttered it. I was able to build the hierarchy I needed pretty easy, but the top 3 levels kept repeating themselves.

So, I'd have
film/print/date/auditorium[1]
film/print/date/auditorium[1]

But I want
film/print/date/auditorium[1]
/auditorium[2]

Jay Allard said...

I realize I can do it using PATH and nested queries. I'm trying to avoid the nested query part, but I will put that on the list of things to try. If its the speed I'm looking for, swell.

Jay Allard said...

I just spent hours playing with this.

I tried using a CTE and subqueries. It didn't perform very well. I traded in the CTE for a table variable, and it performed much better.

The problem in the non-flat approach, I've determined, is the deepest subquery. When I pull everything back flat, it comes back easy. When I break the query in two, it hurts. I tried using another @table and having the subquery hit that, and it helped, but its still not as fast as the flat.