P3.NET

Adding Dates to C#

UPDATE: A bug was recently found when determining the difference between 2 dates across a year boundary. A corrected version has been uploaded to the site.

In .NET the DateTime type represents both a date and a time. .NET doesn’t really have a pure date type. Ironically it does have a pure time type in the form of TimeSpan. In general this isn’t an issue, but when you really just need a date (e.g. the end of a grading period) you have to be aware of the time. For example, when comparing 2 date/time values the time is included in the comparison even if it does not matter. To eliminate time from the comparison you would need to set both to the same value (e.g. 00:00:00). Another example is conversion to a string. A date/time value will include the time, so you have to use a format string to eliminate it. Yet another issue is that creating specific dates using DateTime isn’t exactly readable (e.g. July 4th, 2000 is new DateTime(2000, 7, 4)).
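A minimal C# sketch of the usual workarounds described above, assuming ordinary DateTime values: comparing the Date property drops the time of day, and a date-only format string drops it from the text. DateOnlyDemo, SameDay and DateText are hypothetical names, and "yyyy-MM-dd" is just one possible format.

```csharp
using System;
using System.Globalization;

public static class DateOnlyDemo
{
    // True when both values fall on the same calendar day; Date truncates the time to 00:00:00.
    public static bool SameDay(DateTime a, DateTime b) => a.Date == b.Date;

    // Formats only the date portion; the invariant culture keeps the output stable across machines.
    public static string DateText(DateTime value) =>
        value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
}
```

For example, 9 AM and 5 PM on new DateTime(2000, 7, 4) compare unequal as DateTime values but SameDay returns true for them.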


Language Friendly Type Names

.NET uses Type everywhere to represent type information. Not surprisingly, Type is language-agnostic. In many cases it is useful to get the friendly (aka language-specific) name for a Type object. .NET does not provide this easily. There are several different approaches but none of them work really well if you want the name that would have been used in your favorite language. This post will discuss some of the options available and then provide a more general solution to the problem that doesn’t actually require much effort.
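As an illustration of the problem, typeof(List<int>).Name returns the CLR name List`1 rather than anything C#-like. The following is a rough sketch, not the solution this post goes on to describe: it maps the built-in types to their C# keywords and recurses into arrays, nullables and generic arguments, while ignoring namespaces, nested types and pointers. TypeNameSketch and GetCSharpName are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeNameSketch
{
    // C# keyword aliases for the built-in types.
    private static readonly Dictionary<Type, string> Aliases = new Dictionary<Type, string>
    {
        [typeof(bool)] = "bool", [typeof(byte)] = "byte", [typeof(sbyte)] = "sbyte",
        [typeof(char)] = "char", [typeof(short)] = "short", [typeof(ushort)] = "ushort",
        [typeof(int)] = "int", [typeof(uint)] = "uint", [typeof(long)] = "long",
        [typeof(ulong)] = "ulong", [typeof(float)] = "float", [typeof(double)] = "double",
        [typeof(decimal)] = "decimal", [typeof(string)] = "string",
        [typeof(object)] = "object", [typeof(void)] = "void"
    };

    public static string GetCSharpName(Type type)
    {
        if (Aliases.TryGetValue(type, out var alias))
            return alias;

        // int[] -> "int[]", int[,] -> "int[,]"
        if (type.IsArray)
            return GetCSharpName(type.GetElementType()) + "[" + new string(',', type.GetArrayRank() - 1) + "]";

        if (type.IsGenericType)
        {
            if (type.GetGenericTypeDefinition() == typeof(Nullable<>))
                return GetCSharpName(type.GetGenericArguments()[0]) + "?";

            // Strip the `N arity suffix the CLR appends to generic type names.
            int tick = type.Name.IndexOf('`');
            string name = tick < 0 ? type.Name : type.Name.Substring(0, tick);
            return name + "<" + string.Join(", ", type.GetGenericArguments().Select(GetCSharpName)) + ">";
        }

        return type.Name;
    }
}
```

With this, typeof(Dictionary<string, int[]>) comes out as Dictionary<string, int[]> and typeof(int?) as int?.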


Framework Compatibility Going Downhill

UPDATE: My post triggered an e-mail exchange with Rowan Miller from the Entity Framework group. He wanted to explain and clarify some of my comments around EF. I’ve added them in the appropriate sections below. Thanks for the clarifications Rowan!

I understand the need for making some breaking changes to the framework as we transition to newer versions but one of the big selling features of .NET has been writing apps without having to worry too much about having the exact same version on end machines. Microsoft seems to be moving away from this goal with newer releases. .NET 4.5, as a framework, has a lot of nice, new features. But as the next version of the framework I think it fails on many levels. I’m fearful that this is becoming the new trend at Microsoft given the recent releases of .NET and other libraries.


Reading and Writing to Streams

Streams in .NET are prevalent. Almost everything that requires input or output accepts a stream. The issue with streams is that they are too generic. They only support reading and writing bytes (or byte arrays). Since a stream can be read-only or write-only this makes some sense. The reality, though, is that most of the time you know whether a stream is readable or writable. If worst comes to worst you can query the stream for read and write access (CanRead and CanWrite). But you’re still stuck with reading and writing bytes. To make working with streams easier Microsoft introduced the BinaryReader/BinaryWriter types. These reader/writer types allow you to read and write CLR types to an underlying stream. The theory is that the code is more readable if you explicitly create a reader or writer. Here is an example of writing some data to a stream.
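A minimal sketch of that kind of code, assuming an in-memory stream: it writes an int, a string and a double with BinaryWriter and then reads them back, in the same order, with BinaryReader. StreamDemo and RoundTrip are hypothetical names; the leaveOpen constructor overloads require .NET 4.5 or later.

```csharp
using System.IO;
using System.Text;

public static class StreamDemo
{
    // Writes a few CLR values to a stream, rewinds it and reads them back.
    public static (int id, string name, double score) RoundTrip()
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen: true so disposing the writer does not close the MemoryStream.
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
            {
                writer.Write(42);          // Int32, 4 bytes
                writer.Write("Widget");    // length-prefixed UTF-8 string
                writer.Write(3.5);         // Double, 8 bytes
            }

            stream.Position = 0;
            using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
            {
                // Values must be read back in exactly the order they were written.
                return (reader.ReadInt32(), reader.ReadString(), reader.ReadDouble());
            }
        }
    }
}
```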


Comparing Characters

.NET provides great support for comparing strings.  Using StringComparer we can compare strings using the current culture settings or with case insensitivity.  This makes it easy to use strings with dictionaries or just compare them directly.  As an example we can determine if a user is a member of a particular group by using the following code.

bool IsInSpecialGroup ( string user )
{
   //GetUserGroups is an application-specific helper (not shown) that returns the user's group names
   var groups = GetUserGroups(user);

   var comparer = StringComparer.CurrentCultureIgnoreCase;

   foreach (var group in groups)
   {
      if (comparer.Compare(group, "group1") == 0 ||
          comparer.Compare(group, "group2") == 0 ||
          comparer.Compare(group, "group3") == 0)
         return true;
   }

   return false;
}

Characters have most of the same problems as strings do (culture, case, etc.) but .NET provides almost no support for comparing them.  If we want to compare characters without regard to case or in a culture sensitive manner we have to write the code ourselves.  Fortunately it is easy to emulate this using the existing infrastructure.  CharComparer is a parallel type to StringComparer.  It provides identical functionality except it works against characters instead of strings.  As an example the following code determines if a character is a vowel or not.

bool IsAVowel ( char ch )
{
   var vowels = new char[] { 'a', 'e', 'i', 'o', 'u' };

   var comparer = CharComparer.CurrentCultureIgnoreCase;

   foreach (var vowel in vowels)
   {
      if (comparer.Compare(vowel, ch) == 0)
         return true;
   }

   return false;
}

Just like StringComparer, CharComparer has static properties exposing the standard comparison types available in .NET.  Furthermore since StringComparison is commonly used in string comparisons CharComparer provides the static method GetComparer that accepts a StringComparison and returns the corresponding CharComparer instance. 

static bool ContainsACharacter ( this IEnumerable<char> source, char value, StringComparison comparison )
{
   //Map the StringComparison to the matching CharComparer and let LINQ's Contains do the work
   var comparer = CharComparer.GetComparer(comparison);

   return source.Contains(value, comparer);
}

CharComparer doesn’t actually do comparisons directly.  This process can be difficult to get right so it just defers to StringComparer internally.  Naturally this means that CharComparer doesn’t actually do anything different than you would normally do nor does it perform any better.  What it does do is provide an abstraction over the actual process and simplify it down to a couple of lines of code.  If, one day, .NET exposes a better way of comparing characters then CharComparer can be updated without breaking existing code.  Even better is that your code can use CharComparer and StringComparer almost interchangeably without worrying about the details under the hood.

CharComparer implements the standard comparison interfaces: IComparer<char>, IComparer, IEqualityComparer<char> and IEqualityComparer.  The non-generic versions are implemented explicitly (privately) to enforce type safety.  The generic methods are abstract, as is CharComparer itself.  Comparison is specific to the culture being used.  CharComparer defines a couple of nested, private types to implement the culture-specific details.  The nested types are responsible for providing the actual implementation of the comparison methods.  Refer to the source code for the gory details.  Note that this pattern is derived from how StringComparer works.
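Since the CharComparer source is not shown here, the following is only a sketch of the pattern described above, under the assumption that every comparison is delegated to a StringComparer by converting each char to a one-character string. SimpleCharComparer and its nested StringBacked type are hypothetical names, chosen so they do not clash with the real CharComparer.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for CharComparer: abstract base with static instances,
// explicit non-generic interfaces and a private nested implementation type.
public abstract class SimpleCharComparer : IComparer<char>, IEqualityComparer<char>, IComparer, IEqualityComparer
{
    public static SimpleCharComparer CurrentCulture { get; } = new StringBacked(StringComparer.CurrentCulture);
    public static SimpleCharComparer CurrentCultureIgnoreCase { get; } = new StringBacked(StringComparer.CurrentCultureIgnoreCase);
    public static SimpleCharComparer InvariantCulture { get; } = new StringBacked(StringComparer.InvariantCulture);
    public static SimpleCharComparer InvariantCultureIgnoreCase { get; } = new StringBacked(StringComparer.InvariantCultureIgnoreCase);
    public static SimpleCharComparer Ordinal { get; } = new StringBacked(StringComparer.Ordinal);
    public static SimpleCharComparer OrdinalIgnoreCase { get; } = new StringBacked(StringComparer.OrdinalIgnoreCase);

    // Maps a StringComparison value to the matching comparer.
    public static SimpleCharComparer GetComparer(StringComparison comparison)
    {
        switch (comparison)
        {
            case StringComparison.CurrentCulture: return CurrentCulture;
            case StringComparison.CurrentCultureIgnoreCase: return CurrentCultureIgnoreCase;
            case StringComparison.InvariantCulture: return InvariantCulture;
            case StringComparison.InvariantCultureIgnoreCase: return InvariantCultureIgnoreCase;
            case StringComparison.Ordinal: return Ordinal;
            case StringComparison.OrdinalIgnoreCase: return OrdinalIgnoreCase;
            default: throw new ArgumentOutOfRangeException(nameof(comparison));
        }
    }

    public abstract int Compare(char x, char y);
    public abstract bool Equals(char x, char y);
    public abstract int GetHashCode(char obj);

    // Explicit implementations: only reachable through the interface; non-char arguments fail the unboxing cast.
    int IComparer.Compare(object x, object y) => Compare((char)x, (char)y);
    bool IEqualityComparer.Equals(object x, object y) => Equals((char)x, (char)y);
    int IEqualityComparer.GetHashCode(object obj) => GetHashCode((char)obj);

    // Private nested implementation that defers all work to a StringComparer.
    private sealed class StringBacked : SimpleCharComparer
    {
        private readonly StringComparer _inner;

        public StringBacked(StringComparer inner) { _inner = inner; }

        public override int Compare(char x, char y) => _inner.Compare(x.ToString(), y.ToString());
        public override bool Equals(char x, char y) => _inner.Equals(x.ToString(), y.ToString());
        public override int GetHashCode(char obj) => _inner.GetHashCode(obj.ToString());
    }
}
```

Because it implements IEqualityComparer<char>, an instance can be passed straight to LINQ, e.g. "AEIOU".Contains('e', SimpleCharComparer.OrdinalIgnoreCase) with System.Linq imported.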

Feel free to use this code in any of your projects and provide feedback on any issues found.  Unfortunately I’m not posting the unit tests for this class at this time.  However I’ve used this type in several projects and haven’t run into any problems with it.  But, as always, test your code before using it in production.

ServiceBase.OnStart Peculiarity

When implementing a service you really have to have a good understanding of how Windows services work.  If you do it wrong then your service won’t work properly, or worse, can cause problems in Windows.  Services must be responsive and be a good citizen when working with the Service Control Manager (SCM).  The .NET implementation hides a lot of these details but there is a hidden complexity under the hood that you must be aware of.  But first a brief review of how Windows services work.

Windows Services Internals (Brief)

All services run under the context of the SCM.  The SCM is responsible for managing the lifetime of a service.  All interaction with a service must go through the SCM.  The SCM must be thread safe since any number of processes may be interacting with a service at once.  In order to ensure that a single service does not cause the entire system to grind to a halt the SCM manages each service on a separate thread.  The exact internal details are not formally documented but we know that the SCM uses threads to work with each service. 

Each service is in one of several different states such as started, paused or stopped.  The SCM relies on the state of a service to determine what the service will and will not support.  Since state changes can take a while most states have a corresponding pending state such as start pending or stop pending.  The SCM expects a service to update its state as it runs.  For example when the SCM tells a service to start the service is expected to move to the start pending state and, eventually, the started state.  The SCM won’t wait forever for a service to respond.  If a service does not transition fast enough then the SCM considers the service hung.  To allow for longer state changes a service must periodically notify the SCM that it needs more time.

One particularly important state change is the stop request.  When Windows shuts down, the SCM sends a stop request to all services.  Every service is expected to stop quickly.  The SCM gives each service a (configurable) amount of time to stop before it is forcibly terminated.  If it weren’t for this behavior a hung or errant service could cause Windows shutdown to freeze.

A Day In the Life Of a Service

A service is normally a standard Windows process and hence has a WinMain.  However, a single process can host multiple services (many of the Windows services work this way), so WinMain itself is not the service entry point.  Instead a service process must register the list of supported services and their entry points with the SCM via a call to StartServiceCtrlDispatcher.  This function, which is a blocking call, hooks up the process to the SCM and doesn’t return until all listed services are stopped.  It takes a table of service names and their entry points (normally called ServiceMain).  When the SCM needs to start a service it calls the entry point on a separate thread (hence each service gets its own thread in addition to the process’s main thread).  The entry point is required to call RegisterServiceCtrlHandlerEx to register a function that handles service requests (the control handler).  It also must set the service state to start pending.  Finally it should initialize the service and then exit.  The thread will go away but the service will continue to run.

One caveat to the startup process is the fact that it must be quick.  The SCM uses an internal lock to serialize startup.  Therefore services cannot start at the same time, and a service that is slow to start can stall the entire startup process.  For this reason the general algorithm is to set the state to start pending, spawn a worker thread to do the real work and then set the service to running.  Any other variant can slow the entire system down.

All future communication with the service will go through the control handler function.  Each time the function is called (which can be on different threads) the service will generally change state.  This will normally involve changing to the pending state, doing the necessary work and then setting the service to the new state.  Note that in all cases the SCM expects the service to respond quickly.

.NET Implementation

In .NET the ServiceBase class hides most of the state details from a developer.  To ensure that the service is a good citizen the .NET implementation hides all this behind a few virtual methods that handle start, stop, pause, etc.  All a developer need do is implement each one.  The base class handles setting the state to pending and to the final state while the virtual call is sandwiched in between.  However the developer is still responsible for requesting additional time if needed.  Even the registration process is handled by the framework.  All a developer needs to do is call ServiceBase.Run and pass in the service(s) to host.

All is wonderful and easy in .NET land, or is it?  If you read the documentation carefully you’ll see a statement that says the base implementation hides all the details of threading, so you can just implement the state methods as needed, but this is not entirely true.  All the implementations except OnStart behave the same way.  When the control handler is called it sets the service to the pending state, executes the corresponding virtual method asynchronously and returns.  Hence the thread used to send the request is not the same thread that handles the request and ultimately sets the service state.  This makes sense and meets the requirements of the SCM.  More importantly, it means the service can take as long as it needs to perform the request without negatively impacting the SCM.

The start request is markedly different.  When the start request is received the base class moves the service to the start pending state, executes the OnStart virtual method asynchronously and then…waits for it to complete before moving the service to the running state.  See the difference?  The start request thread won’t actually return until OnStart completes.  Why does the implementation bother to call the method asynchronously just to block waiting for it to complete?  Perhaps the goal was to make all the methods behave symmetrically in terms of thread use.  Perhaps the developers didn’t want the service to see the real SCM thread.  Nevertheless it could have used a synchronous call and behaved the same way.

What does this mean for the service developer?  It means your OnStart method still needs to run very fast (create a thread and get out), even in the .NET implementation, whereas all the other control methods can be implemented without regard for the SCM.  If OnStart takes too long then it’ll block the SCM.  More importantly, if OnStart cannot avoid lengthy work it needs to periodically request additional time using RequestAdditionalTime to keep the SCM from thinking it is hung.
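One way to keep OnStart short is to push the real work onto a background task and return immediately. The sketch below assumes a hypothetical WorkerHost class (not part of ServiceBase) that a service would call from OnStart and OnStop; it uses only the Task API, so it can be exercised outside a service.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: OnStart calls Start, OnStop calls Stop.
public sealed class WorkerHost
{
    private readonly Action<CancellationToken> _work;
    private CancellationTokenSource _cancel;
    private Task _worker;

    public WorkerHost(Action<CancellationToken> work)
    {
        _work = work ?? throw new ArgumentNullException(nameof(work));
    }

    // Schedules the long-running loop on its own thread and returns immediately,
    // so the SCM's start request is not held up.
    public void Start()
    {
        _cancel = new CancellationTokenSource();
        CancellationToken token = _cancel.Token;
        _worker = Task.Factory.StartNew(() => _work(token), token,
                                        TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    // Signals cancellation and waits up to 'timeout'; returns false if the worker is still running.
    public bool Stop(TimeSpan timeout)
    {
        _cancel.Cancel();
        try
        {
            return _worker.Wait(timeout);
        }
        catch (AggregateException)
        {
            // The worker ended by throwing (including OperationCanceledException), so it is no longer running.
            return true;
        }
    }
}
```

In the service itself OnStart would just call Start, and OnStop would call Stop with a timeout, calling RequestAdditionalTime first if shutdown may take longer than the SCM would otherwise wait.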

Summary

When implementing a service in .NET it is still important to understand how native services and the SCM work together.  The OnStart method must be fast and well behaved to avoid causing problems with Windows.  The other control methods are less restrictive but still require awareness of how the base implementation works.  Writing a service is trivial as far as coding goes but services require a great deal of care in order to ensure they behave properly.  This doesn’t even get into the more complex issues of security, installation, event logging and error handling which are broad topics unto themselves.

String Extension Methods

Haven’t posted in a while and don’t have a lot of time today, so I’ll just throw up a copy of the string extension methods I’ve been using over the years.  Here is a summary of the provided functions (note that not all of them have been fully tested):

  • Combine – Basically acts like String.Join but handles cases where the delimiters are already part of the string.
  • Is… – Equivalent to Char.Is… but applies to an entire string.
  • Left/Right – Gets the leftmost/rightmost N characters in a string.
  • LeftOf/RightOf – Gets the portion of a string to the left/right of a character or string.
  • Mid – Gets a portion of a string.
  • IndexOfNone/LastIndexOfNone – Finds the index of the first character NOT IN a list of tokens.
  • ReplaceAll – Replaces all occurrences of a token with another token.
  • ToCamel/ToPascal – Camel or Pascal cases a string.
  • ToMultipleWords – Pretty prints a string such as taking SomeValue and converting it to “Some Value”.
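The library code itself is not reproduced here. As a rough idea of what a few of these look like, here is a hypothetical sketch of Left, Right and ToMultipleWords; the null handling (treat null as an empty string) and the word-break rule (break before an upper-case letter that follows a lower-case letter or digit) are assumptions, not necessarily what the library does.

```csharp
using System.Text;

public static class StringExtensionSketch
{
    // Leftmost 'count' characters; the whole string if it is shorter. Null is treated as empty.
    public static string Left(this string source, int count)
    {
        if (string.IsNullOrEmpty(source) || count <= 0)
            return "";
        return count >= source.Length ? source : source.Substring(0, count);
    }

    // Rightmost 'count' characters; the whole string if it is shorter. Null is treated as empty.
    public static string Right(this string source, int count)
    {
        if (string.IsNullOrEmpty(source) || count <= 0)
            return "";
        return count >= source.Length ? source : source.Substring(source.Length - count);
    }

    // "SomeValue" -> "Some Value"; runs of capitals such as "ID" are left alone.
    public static string ToMultipleWords(this string source)
    {
        if (string.IsNullOrEmpty(source))
            return "";

        var builder = new StringBuilder(source.Length + 8);
        builder.Append(source[0]);
        for (int i = 1; i < source.Length; ++i)
        {
            char previous = source[i - 1];
            if (char.IsUpper(source[i]) && char.IsLetterOrDigit(previous) && !char.IsUpper(previous))
                builder.Append(' ');
            builder.Append(source[i]);
        }
        return builder.ToString();
    }
}
```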


Default Task Scheduler

Firstly, the new Task API in .NET v4 is awesome.  It really makes working with threading easier.  Unfortunately you really have to understand how it works; otherwise you will likely make the code harder to read and use.  Here’s an interesting case I ran into recently.

Attached is a simple WinForms application that is, I suspect, relatively common.  The application displays a simple UI with a button.  When the button is clicked a lengthy operation is performed.  To let the user know that the application is working a progress bar is shown.  The user can cancel the operation through a button at any time.  When the task completes a message box appears displaying success or failure.  Here are the important parts of the code used to start the lengthy work on another thread using the Task API.

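private void StartWork ( )
{
   m_cancel = new CancellationTokenSource();

   //Continuations run on the UI thread so they can update the UI
   var scheduler = TaskScheduler.FromCurrentSynchronizationContext();

   //DoWork receives the token as its state argument; for OnCancelled to run it must end by
   //throwing OperationCanceledException for this token (e.g. via ThrowIfCancellationRequested)
   var task = Task.Factory.StartNew(DoWork, m_cancel.Token, m_cancel.Token);

   //Attach every continuation to the original task so exactly one of them runs
   task.ContinueWith(OnFinished, CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion, scheduler);
   task.ContinueWith(OnCancelled, CancellationToken.None, TaskContinuationOptions.OnlyOnCanceled, scheduler);
   task.ContinueWith(OnError, CancellationToken.None, TaskContinuationOptions.OnlyOnFaulted, scheduler);
}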

Straightforward stuff – start a new task and, based upon the results, call one of several methods to finish up.  The cleanup methods are all called on the UI thread so we can update the UI.  DoWork just loops for 10 seconds checking the cancellation flag.  If you run the sample application then you should see the UI being responsive while the task runs.  Here’s the cancel method.

private void OnCancelled ( Task task )
{
   try
   {
      //Observe the task's outcome; Wait throws because the task was cancelled
      task.Wait();
   } catch (AggregateException)
   { /* Eat it */ }

   if (MessageBox.Show("Task cancelled.  Try again?", "Question", MessageBoxButtons.YesNo) == DialogResult.Yes)
   {
      StartWork();
   }
}

Remember that a task’s outcome should be observed once it completes.  Such activities include checking the Exception or Result properties or calling Wait.  (In .NET 4 an exception from a faulted task that is never observed is rethrown when the task is finalized, which terminates the process.)  The cancel method calls Wait to observe the task and then asks the user if they want to try again.  If the user clicks Yes then a new task is started.

Think about what is happening during this process.  Initially, when the user clicks the button to start the task, the UI thread calls StartWork.  StartWork starts a new task on a non-UI thread and returns.  In the cancel method, which is run on the UI thread, the StartWork method is called to do the same thing.  They should behave identically – but they don’t.  Try running the program, starting the task, cancelling the task and then clicking Yes to restart the task.  The UI freezes while the (new) task is being run.  What is going on here?

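At first glance this looks like a bug, but it is how the default task factory is designed.  When no scheduler is specified, Task.Factory.StartNew uses TaskScheduler.Current, which is the scheduler of whatever task is currently running, and only falls back to TaskScheduler.Default when no task is running.  The first call to StartWork comes straight from the button click, so the work goes to the thread pool.  The second call comes from OnCancelled, which is itself a continuation running on the UI scheduler, so the new task is queued to the UI thread as well.  Fortunately the fix is easy.  You just need to make sure that you ALWAYS specify a scheduler when creating a new task.  In this case the default scheduler is fine.  If you modify StartWork to look like this then the problem goes away.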

private void StartWork ( )
{
   m_cancel = new CancellationTokenSource();

   var scheduler = TaskScheduler.FromCurrentSynchronizationContext();

   //var task = Task.Factory.StartNew(DoWork, m_cancel.Token, m_cancel.Token)
   var task = Task.Factory.StartNew(DoWork, m_cancel.Token, m_cancel.Token, TaskCreationOptions.None, TaskScheduler.Default)
                    …
}

The only difference is that we are now specifying that the default scheduler should be used.  You would assume that a task created without a scheduler would use TaskScheduler.Default, but the default task factory uses TaskScheduler.Current instead.  It is by design, but it is a really odd behavior and easy to trip over.
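The behavior can be reproduced without a UI. In this sketch the exclusive scheduler of a ConcurrentExclusiveSchedulerPair (.NET 4.5 or later) stands in for the UI scheduler, which is an assumption made purely for demonstration; SchedulerProbe and InnerScheduler are hypothetical names. An outer task runs on that scheduler and starts an inner task either with no scheduler or with TaskScheduler.Default, and the inner task reports TaskScheduler.Current.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerProbe
{
    // Runs an outer task on 'outer'; from inside it, starts an inner task and
    // returns the scheduler the inner task actually ran on.
    public static TaskScheduler InnerScheduler(TaskScheduler outer, bool pinDefault)
    {
        Task<Task<TaskScheduler>> outerTask = Task.Factory.StartNew(() =>
            pinDefault
                // Explicit scheduler: always the thread pool.
                ? Task.Factory.StartNew(() => TaskScheduler.Current, CancellationToken.None,
                                        TaskCreationOptions.None, TaskScheduler.Default)
                // No scheduler: the factory picks TaskScheduler.Current, i.e. 'outer'.
                : Task.Factory.StartNew(() => TaskScheduler.Current),
            CancellationToken.None, TaskCreationOptions.None, outer);

        // Wait from outside the outer scheduler so an exclusive scheduler cannot deadlock.
        return outerTask.Unwrap().Result;
    }
}
```

InnerScheduler(pair.ExclusiveScheduler, false) returns the exclusive scheduler itself, the equivalent of the frozen UI, while passing true returns TaskScheduler.Default, the equivalent of the fix.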

Extension Method Guidelines Rebuffed

With the addition of extension methods developers have a powerful new way of extending existing types.  As extension methods become more and more common guidelines will be needed to ensure that extension methods “fit in” with the general design of the library.  If you are not aware of what an extension method is then the quick summary would be that it is a method that appears as an instance method of a type but is, in fact, defined in a separate static class.  This allows us to extend existing types without modifying the type itself.  It really becomes a powerful feature when applied to interfaces.  We can enhance an interface to expose a lot of new functionality while not requiring the interface implementer to do any extra work.  LINQ is a good example of this extensibility.  I personally tend to view extension methods more as static methods on a type that can be called using the instance-method syntax.  This, to me, gives a more accurate picture of what is happening and, as you will see, what is allowed.
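To make the “static method called with instance syntax” view concrete, here is a small hypothetical extension method on an interface. It duplicates LINQ’s Count overload and exists purely for illustration; EnumerableExtension and CountWhere are made-up names.

```csharp
using System;
using System.Collections.Generic;

// Extension class for IEnumerable<T>, named without the leading "I".
public static class EnumerableExtension
{
    // Every IEnumerable<T> implementer gains this method without any extra work.
    // A null source throws NullReferenceException from the foreach, just like an instance call would.
    public static int CountWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        int count = 0;
        foreach (T item in source)
        {
            if (predicate(item))
                ++count;
        }
        return count;
    }
}
```

With int[] numbers = { 1, 2, 3, 4 }, numbers.CountWhere(n => n % 2 == 0) and EnumerableExtension.CountWhere(numbers, n => n % 2 == 0) compile to the same static call and both return 2.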

Currently there are a lot of “guidelines” floating around that people like to mention.  Unfortunately it is still too early to determine which of these are truly best practices and which are opinions.  One guideline in particular rubs me the wrong way: extension methods should throw a NullReferenceException if the source is null.

Let’s start with a regular method as an example.  This method will return everything to the left of a particular string in another string.

static string Leftof ( string source, string value )
{
   if (String.IsNullOrEmpty(source))
      return "";

   if (String.IsNullOrEmpty(value))
      return "";
   …
}

This is a pretty handy function so it makes sense to make it an extension method so we might do this.

static class StringExtension
{
   public static string Leftof ( this string source, string value )
   {
      if (String.IsNullOrEmpty(source))
         return "";

      if (String.IsNullOrEmpty(value))
         return "";
      …
   }
}

So far so good, but it violates the aforementioned rule about throwing an exception if a null is passed in.  The rationale is that an extension method appears to the developer as an instance method and should therefore behave like one.  While that is a reasonable argument I don’t think it applies in all cases.  Why do we have static methods on types, such as String.IsNullOrEmpty?  The general reason is that the method does not need an instance to perform its work.  But if the method itself considers null to be valid then the method would also need to be static, since an instance method can never be invoked on a null reference.  So while an instance method cannot accept a null instance, a static method (and, by definition, an extension method) can.

One of the big arguments against this approach is that “extension methods should look and act like instance methods”.  Why?  The only real reason I can see is that they look like instance methods when used, but we, as developers, are used to the actual code being quite different from what we expect.  Assignment, for example, we would never expect to throw an exception, yet it can if it invokes a user-defined implicit conversion operator that does something inappropriate.  Addition, subtraction and other operators are the same way.  How about the relational operators?

Data value1 = null;
Data value2 = null;
            
bool result = value1 < value2;

We would never expect the above code to throw an exception and yet a method call is occurring under the hood.  If that method throws an exception then so does the above code.  As developers we have to be aware that we don’t know everything that is going on under the hood so we should expect errors anywhere.
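The post never defines Data, so here is a hypothetical version to show what the comparison compiles to: operator &lt; is just a static method, and a careless implementation that dereferences its operands turns value1 &lt; value2 into a NullReferenceException.

```csharp
// Hypothetical stand-in for the Data type used above.
public sealed class Data
{
    public int Value { get; }

    public Data(int value) { Value = value; }

    // C# requires < and > to be declared as a pair.
    // Both dereference their operands, so null inputs throw NullReferenceException.
    public static bool operator <(Data left, Data right) => left.Value < right.Value;
    public static bool operator >(Data left, Data right) => left.Value > right.Value;
}
```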

Another problem I have with this guideline is the solution.  The guideline is that extension methods should behave like instance methods, but that simply isn’t possible.  If you call an instance method with a null reference then you will get a NullReferenceException.  This particular exception is one of a handful of system exceptions.  A system exception is an exception that is raised by the runtime proper.  Normal code, aka ours, should not throw system exceptions.  If you explicitly throw a system exception then code analysis tools like FxCop will generate CA warnings about it, which is exactly what you want.

Instead we have “application” exceptions.  Hence when we receive an argument that is null and we do not allow null, we throw ArgumentNullException.  So to make our extension method behave like an instance method we would need to either throw a system exception explicitly (which we aren’t supposed to do) or dereference the source variable.  Here’s what we’d have to do to our earlier example to get it to throw the appropriate exception while still making code analysis happy.

public static string Leftof ( this string source, string value )
{
   //Force a NullReferenceException when source is null
   int len = source.Length;

   if (String.IsNullOrEmpty(value))
      return "";

   …
}

I have a really big problem writing code just to force something to happen like throw a specific exception.  If I really wanted the exception I’d just throw ArgumentNullException.  But if I do that then I’m no longer making my extension method act like an instance method.  This becomes even more of an issue if you can short-circuit the method return based upon other argument values. 

A final argument for the guideline is that if an extension method eventually becomes an instance method then your application should behave the same.  The reality, though, is that you should have unit tests to verify your application’s behavior, so a major change like going from an extension method to an instance method should be thoroughly tested anyway.  Here are my guidelines for extension methods.  Take them with a grain of salt.

  • Extension methods should be placed in a static class called TypeExtension.  For interfaces the “I” can be left off.
  • Extension classes should be only for extension methods and should not contain non-extension code.
  • Each extension method’s first parameter should be this Type source.
  • Use an extension method only if the source parameter makes sense and is used in the method.  Use a normal static method otherwise.
  • An extension method should not throw ArgumentNullException for source.  Other arguments are fine.  If source cannot be null then referencing the value will be sufficient to generate the correct exception.
  • If null is not necessarily an invalid value then do not throw any exceptions.  An extension method does not need to follow instance method semantics.
  • Document the behavior if source is null if it is not going to throw an exception.
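A hypothetical extension class written to these guidelines might look like the following. The post elides the body of Leftof, so the behavior here (ordinal search, empty string when value is missing or not found) is an assumption; the point is the shape: a StringExtension class containing only extension methods, a this string source first parameter, no ArgumentNullException for source, and documented null behavior.

```csharp
using System;

// Extension methods for System.String only.
public static class StringExtension
{
    /// <summary>Returns the text to the left of the first (ordinal) occurrence of value.</summary>
    /// <remarks>
    /// If source is null or empty an empty string is returned rather than throwing.
    /// If value is null, empty or not found an empty string is returned (assumed behavior).
    /// </remarks>
    public static string LeftOf(this string source, string value)
    {
        if (String.IsNullOrEmpty(source) || String.IsNullOrEmpty(value))
            return "";

        int index = source.IndexOf(value, StringComparison.Ordinal);
        return (index < 0) ? "" : source.Substring(0, index);
    }
}
```

Because null is a documented, valid input, ((string)null).LeftOf("=") simply returns an empty string.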

When Regular Expressions Go Bad

As a compiler guy I’m very comfortable with “standard” regular expressions (REs).  After all, scanners for compilers are predominantly written using regular grammars (from which REs are derived).  REs have been around since the early days of computing and yet are still not truly standardized.  Each implementation adds its own twists and rules.  As a result an RE written for one implementation might or might not work (or behave) the same in another.  Therefore whenever REs are involved proper testing is important.  That caveat aside, REs are really, really useful for validating string formats.  While REs cannot validate all string formats they can handle a large subset.  Hence I tend to use REs for validating string formats whenever possible.  One of the big benefits of REs is speed.  Most parsing can be done through REs faster and more easily than through hand-written code.

In .NET the Regex class is available for writing REs.  Here is a sample piece of code I wrote to validate that a string is a properly formatted DNS name.

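private static bool Validate ( string value )
{
   //Each label starts with a letter, ends with a letter or digit and may contain word characters or '-' in between
   Match m = Regex.Match(value, @"^(?<label>[a-zA-Z]([\w-]*[a-zA-Z\d])?)(\.(?<label>[a-zA-Z]([\w-]*[a-zA-Z\d])?))*$");
   if (!m.Success)
      return false;

   //Validate the length of each label; every repetition of the "label" group is a separate capture
   foreach (Capture label in m.Groups["label"].Captures)
   {
      if (label.Length > 63)
         return false;
   }

   return true;
}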

In a nutshell a properly formatted DNS name obeys the following rules:

  • The name is divided into one or more parts. Parts are separated by a dot.
  • Each part must begin with a letter.
  • Each part must end with a letter or digit.
  • Parts may contain letters, digits or dashes (and underscores if you are supporting NetBIOS).
  • A part must be between 1 and 63 characters long.

The above code uses an RE to validate all but the last rule.  In .NET REs you can capture substrings out of the matched expression using capture groups.  In the above code I capture each part separately and then, if the RE validates, enumerate the parts to ensure they are of the appropriate length.   Overall the code is small and easy to read.  I tested the code (using an outside program) against valid and invalid strings to ensure all the rules were correct.  This code is about a year old and is used in my application as part of UI validation, amongst other things.
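The difference between a group and its captures is easy to miss. In this hypothetical sketch, which reuses the DNS pattern above, Groups["label"].Value holds only the last label matched, while Groups["label"].Captures holds one entry per repetition, which is what the per-label length check needs. CaptureDemo, Labels and LastLabelOnly are made-up names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private const string Pattern =
        @"^(?<label>[a-zA-Z]([\w-]*[a-zA-Z\d])?)(\.(?<label>[a-zA-Z]([\w-]*[a-zA-Z\d])?))*$";

    // Every label captured by the repeated named group, in order.
    public static string[] Labels(string dnsName)
    {
        Match m = Regex.Match(dnsName, Pattern);
        if (!m.Success)
            return new string[0];

        return m.Groups["label"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    // What Group.Value alone reports: only the final repetition.
    public static string LastLabelOnly(string dnsName) =>
        Regex.Match(dnsName, Pattern).Groups["label"].Value;
}
```

For www.example.com, Labels returns www, example and com, while LastLabelOnly returns just com.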

The other day I was writing some code that relied on the DNS validation code shown earlier.  Basically, as the user typed in a textbox the string was validated and controls were enabled or disabled (along with error messages).  So far so good.  I was trying to test an aspect of the UI, so I happened to enter a medium-sized string and accidentally added a semicolon to the end (dfafddafafadfafsadf;).  My application locked up.  Concerned, I stepped through my code and found that it was locking up inside my validator (in the RE matching).  Hmm.  It appeared that the RE caused an infinite loop.  To verify it was .NET and not something in my code, I copied the RE and the input to an external program that happens to use .NET as well, and it also locked up.

Actually it was not an infinite loop; it just took a really, really long time to finish.  That’s odd.  What makes this particular input interesting is that it was long and would ultimately fail validation because of an invalid character.  I checked my RE to determine if I was doing anything that might cause the parser to have to backtrack.  Backtracking occurs when a parser reaches a point where it cannot continue, so it backs up the input to an earlier decision point and tries a different path.  As strings get longer this can become very time consuming.  With this particular input the parser would reach the semicolon and, in this case, deem the input invalid.  Nowhere in the RE is a semicolon legal.  In my mind the parser should not have to do any backtracking because the semicolon is not valid anywhere in the RE.  Backtracking would do nothing but waste time.  Nevertheless this does not seem to be the case with the .NET version.  Note that this doesn’t mean the parser has a bug but rather that the implementation has some worst-case paths for some grammars.

Theoretically I could probably modify my RE to use atomic (non-backtracking) groups and/or different rules, but this would require me to: look up how to do it in .NET, document this behavior for other devs who might need to maintain it, and test it thoroughly since it might introduce a completely separate set of issues.  Instead I decided that the validation rules are simple enough that I could just ditch the RE and implement the logic myself.  Here is the new version sans RE.

private static bool Validate ( string value )
{
   string[] parts = value.Split('.');
   foreach (string part in parts)
   {
      //A part must be between 1 and 63 characters long
      int len = part.Length;
      if (len < 1 || len > 63)
         return false;

      //A part must begin with a letter and end with a letter or digit
      if (!Char.IsLetter(part[0]) || !Char.IsLetterOrDigit(part, len - 1))
         return false;

      //Everything else may be a letter, digit, underscore or dash (Skip requires System.Linq)
      foreach (char ch in part.Skip(1))
      {
         if (!Char.IsLetterOrDigit(ch) && (ch != '_') && (ch != '-'))
            return false;
      }
   }

   return true;
}

After implementing this new version I confirmed that my inputs (valid and invalid) still work.  Even better is that the performance is just as fast as the original RE.  Even better than that, the code is even more readable with no loss of functionality or time.  Note that I haven’t fully tested this code yet so if you borrow it then be sure to test it before using it.

Consider it a lesson learned.  Use REs when they meet your needs.  When you do use REs be sure to create enough test cases to verify the behavior.  Specifically, make sure you test invalid characters throughout strings of varying length and verify the parser does not take too long to run.  If you do run across worst-case inputs then either special-case them or switch to parsing the input manually.
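Since .NET 4.5, Regex also accepts a match timeout, which turns a pathological input into a RegexMatchTimeoutException instead of a frozen UI. A minimal sketch, assuming the DNS pattern above and an arbitrarily chosen 250 ms limit; DnsNameRegex and TryValidate are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DnsNameRegex
{
    // Same label rules as the DNS pattern above; the timeout bounds worst-case backtracking.
    private static readonly Regex Pattern = new Regex(
        @"^(?<label>[a-zA-Z]([\w-]*[a-zA-Z\d])?)(\.(?<label>[a-zA-Z]([\w-]*[a-zA-Z\d])?))*$",
        RegexOptions.CultureInvariant,
        TimeSpan.FromMilliseconds(250));

    // true = valid, false = invalid, null = the engine gave up (treat as "too slow", not "invalid").
    public static bool? TryValidate(string value)
    {
        try
        {
            Match m = Pattern.Match(value);
            if (!m.Success)
                return false;

            foreach (Capture label in m.Groups["label"].Captures)
            {
                if (label.Length > 63)
                    return false;
            }
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

A test suite can then assert that none of the long, invalid inputs it feeds in ever come back as null.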