Pages

Sunday, March 27, 2011

Using FOR XML to Flatten Detail Rows

Here is a technique to flatten multiple detail rows into a single row using the FOR XML clause of the SELECT statement. FOR XML is used to convert a record set into an XML document, but, as we shall see, it has other non XML related uses as well. As an example we will take a look at the sys.indexes and sys.index_columns views to generate a list of all the indexes in your database.

We will start with a simple query against sys.indexes:

SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) 'object_name'
, i.name 'index_name'
, i.type_desc 'index_type'
FROM sys.indexes i
WHERE i.type > 0 AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
ORDER BY 'object_name', i.index_id


This query returns a simple list of all the indexes in your database, excluding objects in the sys schema, with one row per index. Although this is a quick and handy way to see what indexes you have, and which objects they belong to, it does not show you which columns are involved in each index. To get this detail information we need to include the sys.index_columns and sys.columns views:

SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) 'object_name'
, i.name 'index_name'
, i.type_desc 'index_type'
, c.name 'index_key_column'
FROM sys.indexes i
JOIN sys.index_columns ic
    ON ic.object_id = i.object_id
    AND ic.index_id = i.index_id
JOIN sys.columns c
    ON c.object_id = ic.object_id
    AND c.column_id = ic.column_id
WHERE i.type > 0 AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
ORDER BY 'object_name', i.index_id, ic.key_ordinal


Although we can now see which columns are involved in each index this result set is difficult to read because it now returns one row per index column instead of the one row per index result set we got from the first query. The reason, of course, is because there is a one-to-many (1:N) relationship between sys.indexes and sys.index_columns. This is an example of a master-detail record set where we get one row from the master side of the Join operation, and multiple rows from the detail side. When these record sets are combined using an Inner Join we end up with one row in the combined record set for each row from the detail side. But what we want to make our result set more readable is to "flatten" the detail rows that represent the index columns so that we can again have one row in the result set for each index. We could do this by converting the index column names into a comma separated list but that would normally require converting our simple query into a T-SQL script using variables and control flow expressions. Another option would be to use multiple Left Outer Joins of the sys.index_columns view for each possible column involved in the index and then concatenate the names into a single column but since SQL Server allows you to define up to 16 columns as index keys that would result in a very long and ugly (not to mention slow) query. This is where the FOR XML clause comes in. To see how this works we will first extract the detail side of the Join and use FOR XML with the PATH mode to create a simple XML document of all the detail rows:

SELECT ic.object_id, ic.index_id
, c.name 'index_column'
FROM sys.index_columns ic
JOIN sys.columns c
    ON c.object_id = ic.object_id
    AND c.column_id = ic.column_id
FOR XML PATH

Running this query in SSMS will produce a single row of XML in the Results pane. Clicking this row will open up another tab showing the results as an XML file:



Next we can strip out all the XML elements from the results by removing all the column aliases and passing an empty string ('') to the PATH modifier. Then we can concatenate a comma to the sys.columns.name column to create a comma separated list of all the index column names in the database:

SELECT c.name + ', '
FROM sys.index_columns ic
JOIN sys.columns c
    ON c.object_id = ic.object_id
    AND c.column_id = ic.column_id
FOR XML PATH('')


This gives us all our detail data in a single row of comma separated values. Now we can parse this list into smaller lists and combine them with the indexes they belong to by embedding this query as a column sub-query in the SELECT list of the master query. By sorting the sub-query on the sys.index_columns.key_ordinal column we can make sure that the index key columns appear in the same order they appear in the index definition:

SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) 'object_name'
, i.name 'index_name'
, i.type_desc 'index_type'
, (SELECT c.name + ', '
    FROM sys.index_columns ic
    JOIN sys.columns c
        ON c.object_id = ic.object_id
        AND c.column_id = ic.column_id
    WHERE ic.object_id = i.object_id
        AND ic.index_id = i.index_id
    ORDER BY ic.key_ordinal
    FOR XML PATH('')) 'index_key_columns'
FROM sys.indexes i
WHERE i.type > 0 AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
ORDER BY 'object_name', i.index_id


The only thing that remains now is to strip off the trailing comma in each comma separated list. To do this we concatenate the comma before the index column name and then use the SUBSTRING function to chop off the first two characters of each list to remove the leading comma and space:

SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) 'object_name'
, i.name 'index_name'
, i.type_desc 'index_type'
, SUBSTRING((SELECT ', ' + c.name 
    FROM sys.index_columns ic
    JOIN sys.columns c
        ON c.object_id = ic.object_id
        AND c.column_id = ic.column_id
    WHERE ic.object_id = i.object_id
        AND ic.index_id = i.index_id
    ORDER BY ic.key_ordinal
    FOR XML PATH('')
    ), 3, 2048) 'index_key_columns'
FROM sys.indexes i
WHERE i.type > 0 AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
ORDER BY 'object_name', i.index_id


And there you have it - a simple query that combines master and detail records with the detail rows flattened into a comma separated list. Click here to download an SQL file containing all the examples used in this post.

Monday, March 14, 2011

Querying Active Directory with T-SQL

Recently I was asked to extract Windows domain user information from Active Directory, and load it into an employee database used by HR, running on SQL Server 2008. I had heard it was possible to query AD from SQL Server, but I had never done it before so the task turned out to be a good learning experience.

Now I will not pretend to be an expert on Active Directory or provide a detailed explanation of what it is and how it works. Instead, here is a TechNet article containing an introduction to AD.

Also, keep in mind that every company will have a different Active Directory structure so the example queries I have provided will probably have to be modified to work with your own AD service.

The first thing you need to set up in order to begin querying AD from SQL Server is to create a data source. The Active Directory Services Interface (ADSI) is an OLE DB provider available in SQL Server 7.0 and later that allows you to execute distributed queries against AD using a linked server. You can create the linked server using the GUI in Management Studio, or by using sp_addlinkedserver:

EXEC master.dbo.sp_addlinkedserver
  @server = N'ADSI'
, @srvproduct=N'Active Directory Services'
, @provider=N'ADsDSOObject'
, @datasrc=N'server_name.your_domain.com'

where @datasrc is the DNS name of your Windows domain controller. Next you will need to create a linked server login:

EXEC master.dbo.sp_addlinkedsrvlogin
  @rmtsrvname=N'ADSI'
, @useself=N'False'
, @locallogin=NULL
, @rmtuser=N'your_domain\domain_user'
, @rmtpassword='********'

where @rmtuser is a service account with permission to view AD objects, and @rmtpassword is the password for that account.

You should now be able to send a query to the ADSI linked server and get back data from AD. To do this use OPENQUERY. The following query should return a list of AD object distinguished names:

SELECT * FROM OPENQUERY (
  ADSI,
  'SELECT distinguishedName
  FROM ''LDAP://DC=your_domain,DC=com'''
)

Notice the LDAP URL in the FROM clause of the query that is passed to OPENQUERY. Replace your_domain with the name of your domain, and com with the appropriate top level domain if yours is not a .com (ie - net, org, etc). As you can see the LDAP URL needs to be encased in single quotes, which must be escaped with another pair of single quotes since the entire query that is passed to OPENQUERY must be encased in single quotes as well.

Assuming this statement returns data, and not an error, you should be able to move on to more useful AD queries. This is where things can get a little tricky though. Active Directory is not a relational database, so you can not simply browse through the schema in Management Studio, like you would a normal database, in order to figure out what objects to query. Attempting to open the Tables folder under the ADSI linked server object using the Object Explorer will return the following error:

Cannot obtain the required interface ("IID_IDBSchemaRowset") from OLE DB provider "ADsDSOObject" for linked server "ADSI". (Microsoft SQL Server, Error: 7301)

Fortunately, there is an MMC snap-in called Active Directory Schema that I found very useful for figuring out what classes, objects and attributes are available to query. The other problem I ran into is the syntax expected by ADSI is not ANSI standard SQL, and the error messages you get back from ADSI are not very helpful, so constructing the appropriate statement involved some trial and error. The following query demonstrates some of the syntactical differences in the way NULLs and wildcards are handled. Also, notice the addition of an Organizational Unit in the LDAP URL that narrows the query to just domain users, eliminating other resources like computers, printers, etc.

SELECT sn as LastName, givenName as FirstName,
  employeeNumber, userPrincipalName as UserName,
  mail as EmailAddr, title, department, manager
FROM OPENQUERY (
  ADSI,
  'SELECT sn, givenName, employeeNumber, userPrincipalName,
     mail, title, department, manager
  FROM ''LDAP://OU=Domain Users,DC=mydomain,DC=com''
  WHERE department = ''Marketing''
    AND userPrincipalName = ''*'' --IS NOT NULL
    AND userPrincipalName <> ''SI*''' --NOT LIKE 'SI%'
)

One BIG limitation of ADSI is that it will only return 1,000 rows in a single result set.
This limitation is actually hard coded into the OLE DB provider so there is no way around it other than to page through the desired AD objects with multiple query executions. The following code uses dynamic SQL to populate a table with domain user info by looping through the alphabet for the first letter of each user's last name:

DECLARE
  @adsiQuery NVARCHAR(1000),
  @sql NVARCHAR(1000),
  @alpha TINYINT

CREATE TABLE #DomainUser (
  UserName  VARCHAR(255),
  FirstName  VARCHAR(255),
  LastName  VARCHAR(255),
  EmployeeNum  VARCHAR(255),
  Email  VARCHAR(255),
  JobTitle  VARCHAR(255),
  Department  VARCHAR(255),
  Manager  VARCHAR(255)
)

SET @adsiQuery = N'SELECT userPrincipalName, givenName, sn, ' +
  'employeeNumber, mail, title, department, manager ' +
  'FROM OPENQUERY (ADSI, ''SELECT userPrincipalName, ' +
  'givenName, sn, employeeNumber, mail, title, department, ' +
  'manager ' +
  'FROM ''''LDAP://OU=Domain Users,DC=aruplab,DC=net'''' ' +
  'WHERE userPrincipalName = ''''*'''' ' +
  'AND userPrincipalName <> ''''SI*'''' ' +
  'AND sn = ''''<alpha>*'''''')'

SET @alpha = ASCII('A')
WHILE @alpha <= ASCII('Z')
BEGIN
  SET @sql = N'INSERT #DomainUser ' +
    '( UserName, FirstName, LastName, EmployeeNum, Email, ' +
    'JobTitle, Department, Manager ) ' +
    REPLACE(@adsiQuery, '<alpha>', CHAR(@alpha))
  EXEC(@sql)
  SET @alpha = @alpha + 1
END

Of course, this approach assumes there are not more than 1,000 users whose last names begin with the same letter! If you work for a very large company then you will probably need to take a different approach. You could loop through departments instead of last names, or even combine both with a nested loop.