Using Regular Expressions to Convert Most Formatted Dates into DateTime Stamps (AutoHotkey Tool)

AutoHotkey Offers Many Techniques for Converting the DateTime Stamp (yyyymmdd) into Formatted Dates, But What About Going in the Other Direction? Use RegEx to Identify Date Formats!

The HowLongYearsMonthsDays.ahk function calculates the difference in years, months, and days between any two dates. To manually set the two dates, the script employs two DateTime GUI controls, with the input dates saved in the DateTime Stamp format (i.e. yyyymmdd) and the output shown in years, months, and days. But wouldn’t you find it easier if you could highlight the dates in any document or Web page regardless of format, then use AutoHotkey to convert and copy the DateTime Stamps directly into the DateTime GUI controls?

Converting formatted dates into the DateTime Stamp used in most AutoHotkey functions and commands gets complicated by the many date formats found in documents. The date October 22, 2018 appears as “Monday, October 22, 2018”, “Oct 22, 2018”, “10/22/2018”, “10-22-18”, and more, and those are only the American date formats.

Some Solutions Can Get Too Complex

I did a search of the Web for solutions to this problem and found one AutoHotkey Regular Expression function which “sort of” does the job. (Take a look at the linked forum thread. The RegEx will blow your mind.) While that expression works most of the time, it had problems with the differences between US and British date formats, as well as with all-numeric date formats where ambiguity occurs (e.g. US 12-11-2018 versus UK 11-12-2018).

While I use Regular Expressions in my solutions, rather than a one-size-fits-all solution, I deal with each issue independently:

  1. US date formats put the text name of the month first (e.g. January 23, 2018, or Jan 23, 2018). AutoHotkey recognizes these dates based upon the location of the text month name (first).
  2. British date formats put the text name of the month second (e.g. 23 January 2018 or 23 Jan 2018). AutoHotkey recognizes these dates based upon the location of the text month name (second).
  3. AutoHotkey can instantly recognize certain numeric date formats:
    • US numeric date formats which place the month first with a day of the month greater than 12 (e.g. 1/23/2018, 1-23-18, etc). If a number over 12 occurs in the second set of digits, AutoHotkey recognizes these dates as the US format based upon that second number (over 12) appearing in the expression.
    • British numeric date formats which place the month second with a day of the month greater than 12 (e.g. 23/1/2018, 23-1-18, etc). If a number over 12 occurs in the first set of digits, AutoHotkey recognizes these dates as the British format based upon that first number (over 12) appearing in the expression.
    • However, when both the month number and day of the month show up as 12 or less, we find the US/British date formats ambiguous. You must tell the computer which one to use. Therefore, the script asks the user to pick a format (US or British) to clear up the confusion.
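The numeric rule in the list above can be sketched in a few lines. This is a Python illustration rather than the AutoHotkey code from the script, and the function name classify_numeric_date is hypothetical:

```python
def classify_numeric_date(first: int, second: int) -> str:
    """Classify the first two numbers of an all-numeric date.

    Mirrors the rule described above: a value over 12 can only be a day,
    so its position reveals the format.
    """
    if first > 12:
        return "British"    # day first, e.g. 23/1/2018
    if second > 12:
        return "US"         # month first, e.g. 1/23/2018
    return "ambiguous"      # both 12 or less, so the user must decide


print(classify_numeric_date(1, 23))
print(classify_numeric_date(23, 1))
print(classify_numeric_date(10, 1))
```

The order of the checks follows the order the article uses; the sketch does not try to reject impossible input such as two values over 12.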

The DateStampConvert.ahk script parses selected dates in whatever format they appear in documents, e-mail, Web pages, and other sources. It captures each selection with the Standard AutoHotkey Clipboard Routine, then returns the appropriate standard DateTime Stamp.

By separating the format types into conditions, I created solutions which add more universality than that shown in the linked Regular Expression while using reasonably simple RegExs. The trade-off: the user must designate the input format type (US or British) when the script encounters those ambiguous numeric dates.

Note: I say British date formats rather than European date formats because the various spellings of the months in other languages complicate the problem. However, you can easily rewrite this conversion function for a specific language by adding the appropriate text switches to the MonthConvert() function found within the script.

The DateStampConvert.ahk script converts dates selected in documents into DateTime Stamps for use in other commands, functions, and scripts (such as HowLongYearsMonthsDays.ahk). In this blog, I discuss the RegExs required to discern the various date formats. In future blogs, I plan to discuss a number of other AutoHotkey learning points exposed in this script:

  1. The MakeStamp() function returns the DateTime Stamp.
  2. The MonthConvert() function, a switch-like function using the ternary operator, converts month names into their respective numeric values.
  3. The YearCheck() function converts two-digit years into four-digit years in the appropriate century.
  4. A discussion of global variables and returned values versus ByRef values in functions.
  5. How to test for a valid date.

RegExs for Date Formats

By far, Regular Expressions offer the best method for identifying a variety of date formats. Any other method would require much more tedious techniques.

Note: The script in this blog uses Regular Expressions (RegEx) to power through without otherwise complicated matching functions. If you don’t know RegEx, the book Beginner’s Guide to Using Regular Expressions in AutoHotkey offers a gentle introduction into how to use RegEx for maximum effectiveness.

Identifying Alphanumeric Dates

Most date formats you see in documents include three pieces: the day (always numeric), the month (text or numeric), and the year (either two or four digits). The US date format puts the month first while British dates place the day first. Most common date formats appearing in everyday communications end with the year. By resolving these formats into sets of conditions, we create alternative expressions for each possibility: mixed alphabetic/numeric format (text month names for both US and British formats) and all numeric dates (numeric month names for US formats, British formats, and ambiguous formats).

US Alphabetic Month Format

The first RegEx below matches the Clipboard contents to three pieces of the date. First, it looks for alphabetic month characters, then two sets of digits—day (one or two digits) and year (two or four digits). In between the three, the RegEx consumes any type of spaces, symbols, or characters using the ungreedy .*? wildcard.

Using the RegExMatch() function, the routine checks for a valid month “name/month day/year” date format:

RegExMatch(Clipboard, "^([[:alpha:]]+).*?(\d\d?).*?(\d\d\d?\d?)$", Date)

The RegEx opens with the beginning anchor ^ and terminates with the end anchor $. This limits the match to the selected date enclosed between the anchors, preventing some possible matching errors. To stop the accidental inclusion of any space characters when selecting dates in a document, the script adds:

 Clipboard := Trim(Clipboard)

to the Standard AutoHotkey Clipboard Routine shown in the DateStampConvert.ahk script.
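To see why trimming matters with an anchored pattern, here is a small Python sketch (not the script's AutoHotkey code). It assumes [A-Za-z] in place of [[:alpha:]], because Python's re module does not support POSIX bracket expressions, and it uses strip() as a rough stand-in for Trim(); strip() also removes newlines, which Trim() does not do by default.

```python
import re

# Same shape as the US alphabetic pattern, with [A-Za-z] standing in for [[:alpha:]].
DATE_RE = re.compile(r"^([A-Za-z]+).*?(\d\d?).*?(\d\d\d?\d?)$")

selection = "  October 22, 2018 "   # stray spaces picked up while highlighting

untrimmed = DATE_RE.match(selection)          # anchors reject the padded text
trimmed = DATE_RE.match(selection.strip())    # matches once the padding is gone

print(untrimmed)          # None
print(trimmed.groups())   # ('October', '22', '2018')
```

Without the trim, the ^ anchor fails on the leading space and the $ anchor fails on the trailing one, so a correctly formatted date would be reported as no date at all.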

The key to identifying this type of US date format is an expression which matches only alphabetic characters, not numbers, for the text month name, regardless of its length (e.g. Jan or January). We cannot use the universal \w syntax (equivalent to [a-zA-Z0-9_]) since it matches digits and the underscore as well as letters.

The POSIX Bracket Expressions (POSIX stands for “Portable Operating System Interface for uniX”) offer abbreviated terms for matching types of characters. Here we use [[:alpha:]] rather than [a-zA-Z] to match any alphabetic form of the month, although either one works. From the documentation:

The following POSIX named sets are also supported via the form [[:xxx:]], where xxx is one of the following words: alnum, alpha, ascii (0-127), blank (space or tab), cntrl (control character), digit (0-9), xdigit (hex digit), print, graph (print excluding space), punct, lower, upper, space (whitespace), word (same as \w).
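The difference between \w and a letters-only class is easy to check. The sketch below uses Python, whose re module lacks the POSIX named sets, so [A-Za-z] stands in for [[:alpha:]]; the sample string is made up for illustration.

```python
import re

sample = "Jan23"

# \w accepts letters, digits, and the underscore, so it swallows the day too.
word_match = re.match(r"\w+", sample).group()

# A letters-only class stops at the first digit, leaving the day for the next subpattern.
alpha_match = re.match(r"[A-Za-z]+", sample).group()

print(word_match)    # Jan23
print(alpha_match)   # Jan
```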

The \d\d? expression matches the numeric day. The question mark after the second digit \d? allows the expression to optionally match one or two digits.

The \d\d\d?\d? expression matches the year. The question marks following the third and fourth digits make both optional, so the expression accepts two-digit and four-digit years (and, as a side effect, three-digit numbers).

We enclose all three subpatterns in parentheses (month, day, year). With Date as the output variable, AutoHotkey stores them as the month name Date1, the day of the month Date2, and the year Date3. The script converts the month name to a number and concatenates the pieces into the final DateTime Stamp.
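As a quick way to experiment with the three subpatterns outside AutoHotkey, here is a Python sketch. The helper name split_us_alpha is hypothetical, and [A-Za-z] replaces [[:alpha:]] because Python's re module does not support POSIX bracket expressions.

```python
import re

US_ALPHA_RE = re.compile(r"^([A-Za-z]+).*?(\d\d?).*?(\d\d\d?\d?)$")


def split_us_alpha(text):
    """Return (month_name, day, year) as strings, or None if the text does not match."""
    m = US_ALPHA_RE.match(text.strip())
    return m.groups() if m else None


print(split_us_alpha("January 23, 2018"))   # ('January', '23', '2018')
print(split_us_alpha("Jan 3, 18"))          # ('Jan', '3', '18')
print(split_us_alpha("23 January 2018"))    # None: the text starts with a digit
```

The three tuple positions correspond to Date1, Date2, and Date3 in the AutoHotkey version. Turning the month name into a number is a separate step, which this sketch leaves out.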

British Alphabetic Date Format

To create the British RegEx we exchange the position of the first two expressions.

RegExMatch(Clipboard, "^(\d\d?).*?([[:alpha:]]+).*?(\d\d\d?\d?)$", Date)

Swapping the first two terms in the RegEx places the alphabetic month name subpattern second (Date2).

Tip: Since the .*? symbols consume zero or more characters, both 13Jan18 and 13 Jan 18 yield the same valid result.
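The tip can be verified with a short Python sketch of the British pattern (again with [A-Za-z] standing in for [[:alpha:]], since Python's re module has no POSIX named sets):

```python
import re

UK_ALPHA_RE = re.compile(r"^(\d\d?).*?([A-Za-z]+).*?(\d\d\d?\d?)$")

compact = UK_ALPHA_RE.match("13Jan18").groups()
spaced = UK_ALPHA_RE.match("13 Jan 18").groups()
long_form = UK_ALPHA_RE.match("23 January 2018").groups()

print(compact)     # ('13', 'Jan', '18')
print(spaced)      # ('13', 'Jan', '18')
print(long_form)   # ('23', 'January', '2018')
```

Here the month name lands in the second group, matching Date2 in the AutoHotkey version.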

US/British Numeric Date Format (Days Over 12)

This next all numeric RegEx parses both US and British dates:

RegExMatch(Clipboard, "^(\d\d?).*?(\d\d?).*?(\d\d\d?\d?)$", Date)

AutoHotkey can only determine the format type based upon the values for the numeric month and day. Whenever one of the two exceeds 12, it represents the day.

This requires two conditions:

  If (Date1 > 12)        ; British date format
      Return MakeStamp(Date3,Date2,Date1)    ; 13-10-18
  Else If (Date2 > 12)   ; US date format
      Return MakeStamp(Date3,Date1,Date2)    ; 10-13-2018

Notice that the MakeStamp(Year,Month,Day) function passes parameters in the DateTime Stamp order. Date1 and Date2 reverse position between US and British dates.

Ambiguous US/British Numeric Date Formats (Days 12 and Under)

Unfortunately, if both the day and the month are 12 or less, AutoHotkey cannot determine the format on its own. Since both formats produce valid dates, the user takes their pick with this MsgBox command routine:

MsgBox, 4,, US date format? (press Yes)`rBritish date format? (press No)
IfMsgBox Yes
      Return MakeStamp(Date3,Date1,Date2)    ;  10-1-18 US
else
      Return MakeStamp(Date3,Date2,Date1)    ;  1-10-18 British
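To make the argument order concrete, here is a Python sketch that goes from an all-numeric date to a yyyymmdd stamp. Several pieces are assumptions for illustration only: make_stamp is a stand-in for the script's MakeStamp(), the ambiguous_is_us flag replaces the Yes/No MsgBox, and two-digit years are simply rejected because the YearCheck() rule is not shown in this post.

```python
import re

NUMERIC_RE = re.compile(r"^(\d\d?).*?(\d\d?).*?(\d\d\d?\d?)$")


def make_stamp(year, month, day):
    """Hypothetical stand-in for MakeStamp(): zero-pad and join as yyyymmdd."""
    return f"{int(year):04d}{int(month):02d}{int(day):02d}"


def numeric_to_stamp(text, ambiguous_is_us):
    """Convert an all-numeric date to yyyymmdd; the flag plays the role of the MsgBox answer."""
    m = NUMERIC_RE.match(text.strip())
    if m is None:
        return None
    first, second, year = m.groups()
    if len(year) != 4:
        return None                             # century handling is out of scope here
    if int(first) > 12:                         # British: day first
        return make_stamp(year, second, first)
    if int(second) > 12 or ambiguous_is_us:     # US: month first
        return make_stamp(year, first, second)
    return make_stamp(year, second, first)      # ambiguous, answered "British"


print(numeric_to_stamp("13-10-2018", ambiguous_is_us=True))    # 20181013
print(numeric_to_stamp("10/13/2018", ambiguous_is_us=False))   # 20181013
print(numeric_to_stamp("10-1-2018", ambiguous_is_us=True))     # 20181001
print(numeric_to_stamp("10-1-2018", ambiguous_is_us=False))    # 20180110
```

Note that the flag only matters for the last two calls, where both leading numbers are 12 or less.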

RegEx in DateStampConvert.ahk

The following snippet of code shows all of the required RegEx code in context within the script:

; US alphabetic date formats: October 1, 2018
If (RegExMatch(Clipboard, "^([[:alpha:]]+).*?(\d\d?).*?(\d\d\d?\d?)$", Date))
{
    ; Convert alphabetic month to numeric month
    NewMonth := MonthConvert(Date1)
    ; Expression form: a legacy If would treat the quote marks as part of the text
    If (NewMonth = "Not found!")
    {
        MsgBox Not a valid date!
        Return
    }
    Else
        Return MakeStamp(Date3,NewMonth,Date2)
}

; British alphabetic date formats: 1 October 2018
Else If (RegExMatch(Clipboard, "^(\d\d?).*?([[:alpha:]]+).*?(\d\d\d?\d?)$", Date))
{
    ; Convert alphabetic month to numeric month
    NewMonth := MonthConvert(Date2)
    If (NewMonth = "Not found!")
    {
        MsgBox Not a valid date!
        Exit
    }
    Else
        Return MakeStamp(Date3,NewMonth,Date1)
}

; Numeric date formats (10-13-18 and 13-10-2018)
Else If (RegExMatch(Clipboard, "^(\d\d?).*?(\d\d?).*?(\d\d\d?\d?)$", Date))
{
    ; Numeric US/British date formats (10-13-18 and 13-10-2018)
    If (Date1 > 12)        ; British date format
        Return MakeStamp(Date3,Date2,Date1)    ; 13-10-18
    Else If (Date2 > 12)   ; US date format
        Return MakeStamp(Date3,Date1,Date2)    ; 10-13-2018
    Else
    {
        ; Ambiguous US/British date formats (10-1-18 and 1-10-18)
        MsgBox, 4,, US date format? (press Yes)`rBritish date format? (press No)
        IfMsgBox Yes
            Return MakeStamp(Date3,Date1,Date2)    ; 10-1-18 US
        else
            Return MakeStamp(Date3,Date2,Date1)    ; 1-10-18 British
    }
}
Else
    MsgBox No date found!
Clipboard := OldClipboard    ; only reached when no date matched
}    ; closes a block opened earlier in the script (not shown in this snippet)

To see this snippet within the script, check out DateStampConvert.ahk.

Coming Soon!

The DateStampConvert.ahk script also includes the following functions:

  1. The MakeStamp() function which returns the DateTime Stamp for the selected formatted date.
  2. The MonthConvert() function which converts the text month name into its numeric value.
  3. The YearCheck() function which converts two-digit years into four-digit years.

These functions, along with a discussion of global variables and returned values versus ByRef values in functions and how to check for a valid date, will serve as topics for future blogs.

jack

