P3.NET

Simplifying ADO.NET Data Access, Part 1

Even with the advent of ORMs, under the hood ADO.NET is still the core infrastructure used to access databases. There are many cases where using ADO.NET is still the easiest solution and oftentimes you find that you have to mix in some ADO.NET with your ORM calls. This is a testament to the solid design of ADO.NET. The biggest issue with ADO.NET is that it requires quite a bit of boilerplate code to retrieve data from the database. In this series of articles I’m going to demonstrate how I do data access when working with ADO.NET. The end result will be simple code to access any database in a (relatively) generic manner without the need for boilerplate code. Here’s a rough outline of where I’m going.

History

The code I’m going to show has been in use for 15 years in one form or another across multiple projects and databases. The original base code came from SqlHelper which was part of Microsoft’s Data Access Application Block. In the years I’ve been using and maintaining it the code has evolved from simple extensions to an entire rewrite to make the library general enough to be used across databases. Today the library is used to talk with SQL and Oracle code with only a line change. Other databases can be added with little difficulty. The library has also evolved to support generics, extend the base ADO.NET libraries (when needed) and expose a more fluent API where needed. The library itself continues to evolve as our usage changes (more on that later).

Where We’re Going

To start the series off we’ll take a look at typical ADO.NET code for retrieving data from a database. The actual database layout, column types and data returned isn’t relevant to the discussion so we’ll keep it simple. Here’s how you might go about retrieving some data using ADO.NET today.

public IEnumerable<Role> GetRoles ( )
{
	var roles = new List<Role>();

	using (var conn = new SqlConnection(m_connectionString))
	{
		using (var cmd = new SqlCommand("SELECT Id, Name FROM Roles", conn))
		{
			conn.Open();

			using (var dr = cmd.ExecuteReader())
			{
				while (dr.Read())
				{
					roles.Add(new Role(dr.GetInt32(0)) { Name = dr.GetString(1) });
				};
			};
		};
	};

	return roles;
}

By the time this series is complete it’ll collapse down to this.

public IEnumerable<Role> GetRoles ()
{
    var query = new AdhocQuery("SELECT Id, Name FROM Roles");

    return new SqlConnectionManager(m_connectionString).ExecuteQueryWithResults(query, dr =>
    {
        new Role(dr.GetInt32OrDefault("Id")) { Name = dr.GetStringOrDefault("Name") }
    });
}

In this example I am purposefully using a data reader rather than relying on a DataSet. In my experience a DataSet is useful for only a limited set of cases and doesn’t really provide a good real world example. While the simplifications I will show will work with them it won’t be the focus.

ADO.NET Basics

Before we get to far into the series it is important to understand some of the core tenets of ADO.NET. To make sure we’re on the same page here’s a quick overview of the above ADO.NET code. Note that in this discussion I’ll be referring to the generic, abstract versions of the types rather than a database-specific type.

To connect to a database you need a connection object (DbConnection). There is a specialized type for each database server product (SQL, Oracle, etc.). The connection requires a connection string identifying the database to connect to and any connection parameters. A connection isn’t made until the Open method is called. Once the connection is open it has to be closed either explicitly or when the instance is disposed. Because the connection uses an unmanaged resource it implements IDisposable. In the majority of cases the connection should be wrapped in a using statement. Under the hood the connection is reasonable for creating all the connection-specific types.

To execute a command you need a command (DbCommand). The command represents the text being sent to the database. It requires the query (adhoc or stored procedure), any parameters and a connection to run against. The command also implements IDisposable but only a couple of the existing implementations actually do anything with it. Nevertheless it needs to be wrapped in a using statement.

Any parameters to a command are represented by yet another object (DbParameter). The parameter contains the name (in database-specific format), type, value, direction and any other attribute needed to use it in the command. Since the parameter represents a value it does not need to be disposed.

To execute a command you call one of the various Execute methods depending upon whether you want a DataSet or DbDataReader returned. A full discussion of when to use each is beyond the scope of this post but, in general, small sets of data that you want to use directly should prefer DataSet while large sets of data or where you want to populate custom objects should use the reader. A DataSet is a snapshot of data in memory so it does not need to be cleaned up. A reader on the other hand is a live stream of data. When you are done with it then dispose of it. To simplify things the reader can automatically close the associated command to reduce the effort.

The final type of interest is DbTransaction. In many cases a set of changes need to be made as an atomic unit. A transaction allows a set of changes to be made as a single unit. They either all succeed or they all fail. Committing the transaction is an explicit step that must be done. If the transaction is disposed before the commit occurs then all changes are rolled back. Not every command needs a transaction but it is there if you need it.

Providers

One of the bigger issues with ADO.NET is that the types are strongly tied to the database server type (the provider). The boiler plate code remains basically the same except for the specific types being used. ADO.NET tried to resolve this in later versions by introducing the DbProviderFactory. Following the Factory pattern, this type allows you to create command, parameters and other types without regard for the actual provider. The factory returns the server-agnostic types mentioned earlier.

To get to the factory you can use one of several approaches:

By wrapping this in a simple connection manager type you can hide the actual provider information from the remainder of your code making your code more database agnostic. The provider factory will come into play later when we generalize the database types. The factory will allow us to write a general purpose layer that we can use for any database provider without the need for specialization. Of course for some core providers we will still specialize the functionality but this will get us up and running quicker.

Before moving on it is important to ask the question “Do we really need to write database-agnostic code? What do we gain by doing so?” I tend to agree that when writing code you will already have selected your provider and it is unlikely to change. But there is something to be said for being able to take a single code block and use it anywhere you need a database query without having to make lots of little changes just because of the database provider. So while reuse within a single project is not necessarily a goal it is certainly beneficial across multiple projects involving differing databases.

Getting Results

When executing a command you can get 1 of several possible results back. For each of these results ADO.NET provides a separate method. It is useful to look at those now.

  • ExecuteNonQuery – This method is used for executing non-select commands like INSERT or UPDATE. It does return a value but the return value is pretty much ignored.
  • ExecuteScalar – This method is used to return a single value from a query. This is useful when you’re returning a single value from a query like a count or something.
  • ExecuteReader – This is the primary method used to retrieve results from a SELECT statement. It uses a reader to stream the results one row at a time. It is the most efficient approach when dealing with large result sets. The data read must be stored by the application before moving on to the next row.
  • FillDataSet – This method fills a dataset from a query. Unlike the other methods this method requires that you use a DataAdapter. The adapter associates the command(s) to execute to operate on the data set.

If you have never worked with a data set then here is a quick summary. A data set represents a set of (generally) related data. It is very much like a database. Within a data set (and database) are tables (DataTable). A data set allows you to specify the relationships between tables so that constraints can be enforced. Each table has a set of columns (DataColumn) that describe the data it stores. Additionally each table has a set of rows (DataRow) that represent the actual data. Before data can be stored in a row the columns have to be defined.

Each row of a table tracks its state (added, modified, deleted). In ADO.NET you generally update the entire data set when saving changes so that all the relationships are saved together. Because each row tracks its own state ADO.NET can identify what needs to be inserted, updated, deleted and apply only those changes to the database. Because any of these operations can be performed you need to associate multiple commands to the update. This is handled by using a DataAdapter. Under the hood all queries are performed using a reader. When you are working with a data set ADO.NET simply enumerates the reader and builds the tables for you.

Here’s my recommendations based upon your query needs.

  • INSERT, UPDATE, DELETE – Use ExecuteNonQuery
  • SELECT of a single value – Use ExecuteScalar
  • Retrieving a small number of rows – Use FillDataSet
  • Retrieving any number of rows – Use ExecuteReader

Next time we will take a look at making ADO.NET types easier to use.

Download the Demo Code

Download the Library Code

Comments

  1. Hi,

    First of all, great blog. I’ve learned a lot from your posts. My sincere thanks.

    On a second note, I’m sorry for posting it here, but the original entry doesn’t allow comments: http://blogs.msmvps.com/p3net/2013/01/11/hosting-a-private-visual-studio-gallery/

    I followed your steps, that are basically the ONLY info about that on the web (and for that, thanks again), but when I try to upload MSI extensions I’m unable to, since Visual Studio always sees it as a VSIX. In your post you say that you say that “not much changes” but changing the URL and so on, makes Visual Studio treat it the same way as a VSIX, and therefore, gives an error for “Not Supported VSIX Package”.

    Can you please share your knowledge on the matter? I’ve tried everything I could think of, but the documentation, as you said, is almost inexistent, and so far your blog is the only good source on the matter.

    Thanks in advance!

    1. If you’re using the ATOM feed gallery that I posted about then it doesn’t support MSI files, just VSIX. The author has released a new version of the gallery that is a more feature rich version but I haven’t had time to play around with it yet.