AutoHotkey Tip of the Week—Powerful RegEx Text Search Shorthand (~=)

AutoHotkey Provides an Abbreviated Regular Expression RegExMatch() Operator ( ~= ) for Quick Wildcard Text Matches

Regular Expressions (RegEx) can get confusing, but once understood, they pay tremendous dividends. Acting almost as another programming language, Regular Expressions in AutoHotkey provide a method for accomplishing complex search and/or replacement with only one line of code. While not impossible, doing the same thing without RegEx often requires complex tricks and many lines of code. In the beginning, learning RegEx may feel daunting, but you’ll find it well worth the journey.

In spite of the initial learning curve, you don’t need to learn how the two primary AutoHotkey RegEx functions (RegExMatch() and RegExReplace()) work to make good use of a RegEx. The shorthand RegEx operator ( ~= ) provides a method for doing a complex string match without the limitations of the InStr() function: Regular Expressions search for patterns, while the InStr() function searches for exact strings.
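
To make the pattern-versus-exact-string difference concrete, here is a minimal sketch in AutoHotkey v1 syntax. The order-number text is hypothetical. InStr() finds only the literal characters it is given, while ~= accepts any text fitting the pattern "a # sign followed by one or more digits" ( #\d+ ).

```autohotkey
; Hypothetical sample text
Text  := "Order #4711 shipped"
Other := "Order #9902 shipped"

; InStr() searches for the exact string "#4711".
PosExact   := InStr(Text, "#4711")     ; 7
OtherExact := InStr(Other, "#4711")    ; 0 (different number, no match)

; ~= searches for a pattern: "#" followed by one or more digits.
PosPattern   := (Text ~= "#\d+")       ; 7
OtherPattern := (Other ~= "#\d+")      ; 7 (any number fits the pattern)

MsgBox, InStr: %PosExact% and %OtherExact%`n~=: %PosPattern% and %OtherPattern%
```

The parentheses around each ~= expression simply make the evaluation order obvious when assigning the result.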

Many of my scripts use Regular Expressions in a variety of forms. See the following list for a few of my AutoHotkey scripts using RegEx:

  • In the DateStampConvert.ahk script, I use RegEx to convert US and British date formats into the standard DateStamp format (yyyymmdd) required for many AutoHotkey commands and functions.
  • The MultiPaste.ahk script breaks a range of data into individual paste items. It uses RegEx to identify US and UK postal codes for placement in the variable array listed in the MsgBox.
  • In the RhymeMenu.ahk script, a RegEx pulls various rhyming words from a Web page.
  • The SynonymLookup.ahk script culls replacement words with RegEx from the Thesaurus.com site for adding to a selection menu.
  • The IPFind.ahk script digs through a Web page with a RegEx to identify an IP address’s world location.

While some of the posted apps use RegEx techniques as their backbone, even more of them make quick RegEx checks. Whenever I find that a simple InStr() search won’t work, I turn to RegEx.

For example, the InstantHotstring.ahk script uses RegEx sparingly to validate the Hotstring format in external files. When loading Hotstrings from a file, the script must pick out for processing every line that uses the standard Hotstring syntax and completely ignore all other lines.

Use RegEx Shorthand to Identify Hotstring Syntax in Any AutoHotkey (.ahk) Script

The RegExMatch() shorthand uses the following format:

Haystack ~= Needle
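
Like the RegExMatch() function, the expression evaluates to the position where the first match starts, or 0 when nothing matches. That makes it usable both as a number and as a true/false test. A short sketch (AutoHotkey v1 syntax, hypothetical sample text):

```autohotkey
; Hypothetical sample text
Haystack := "The quick brown fox"

; ~= returns the starting position of the first match, or 0 if there is none.
Pos1 := (Haystack ~= "b\w+")    ; 11 ("brown" starts at the 11th character)
Pos2 := (Haystack ~= "\d")      ; 0 (Haystack contains no digits)

; Since 0 counts as false and any position counts as true, the result works
; directly in an If. The "i)" option makes this match case-insensitive.
If (Haystack ~= "i)QUICK")
    MsgBox, Found "quick" (ignoring case).

MsgBox, Pos1 = %Pos1%`nPos2 = %Pos2%
```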

Haystack comprises the larger body of text under inspection. Needle contains the RegEx code for matching a piece of Haystack. By using this shorthand for the RegExMatch() function, we can write a wildcard test for an If conditional which, while looping through a file, identifies any line written in Hotstring format and skips the non-matching lines:

If (A_Loopfield ~= "^:.*?:.+?::[^\s]")

Note: We use the RegEx shown only to identify valid Hotstring formats in an AutoHotkey (or other) file. This allows us to loop through a file processing only the lines that contain a valid Hotstring.
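
Put together, such a loop might look like the following minimal sketch (AutoHotkey v1 syntax). To keep it self-contained, a hypothetical string stands in for the file. In a real script, a command such as FileRead, Contents, %FileName% (with FileName holding your file path) would load the text first.

```autohotkey
; Hypothetical file contents; a real script would load these with FileRead.
Contents := ":*:btw::by the way`n; a comment`n::sig::`nMsgBox, Hello`n::ahk::AutoHotkey"

Found := ""
Count := 0
; Split the text into lines; the `r omit character trims Windows line endings.
Loop, Parse, Contents, `n, `r
{
    ; Process only lines in one-line Hotstring format; skip everything else.
    If (A_LoopField ~= "^:.*?:.+?::[^\s]")
    {
        Count++
        Found .= "Line " A_Index ": " A_LoopField "`n"
    }
}
MsgBox, %Count% Hotstrings found:`n%Found%
```

Only lines 1 and 5 of the sample pass the test. The comment, the action Hotstring (::sig::), and the MsgBox line are ignored.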

The RegEx ( ^:.*?:.+?::[^\s] ) looks for a string of characters at the beginning of a line which matches the Hotstring colon/double-colon format (:options:abbreviation::replacement):

  1. The Up Arrow ( ^ ) tells AutoHotkey to anchor the match at the beginning of Haystack (here, the current line), which must start with the first colon ( ^: ).
  2. Since Hotstring options reside between the first two colons, RegEx uses the any-character dot wildcard ( . ) with the repeat-none-or-more-times modifier ( * ) to consume all the Hotstring option characters ( .* ). Because the count can be zero, a Hotstring with no options (for example, ::btw::by the way) also matches.
  3. The question mark ( ? ) switches the none-or-more wildcard to non-greedy, forcing the RegEx to stop matching characters at the next colon ( .*?: ). (Without it, the greedy .* would first grab everything to the end of the line and then back up only as far as needed, so the options portion could swallow later colons.)
  4. After the second colon, the RegEx uses the one-or-more wildcard combination ( .+?:: ) to match the entire activating Hotstring, again non-greedy, this time stopping at the next double-colon. The plus sign ( + ) requires at least one character, since a Hotstring needs an abbreviation.
  5. The last part of the RegEx uses a negated range ( [^\s] ) to exclude any action Hotstrings which do not contain replacement text immediately after the double-colon. Inside the square brackets, the Up Arrow ( ^ ) means do-not-match, and \s matches any whitespace character (space, tab, carriage return, or linefeed). Because the range must consume one character, the match fails when a space, a return, or the end of the line follows the double-colon.
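
To check the steps above against typical input, this sketch (AutoHotkey v1 syntax; the sample lines are hypothetical) runs the same pattern over a few lines and reports which ones it accepts.

```autohotkey
Needle := "^:.*?:.+?::[^\s]"

Samples := []
Samples.Push(":*:btw::by the way")     ; options, abbreviation, text: match
Samples.Push("::ahk::AutoHotkey")      ; empty options (zero characters): match
Samples.Push("::sig::")                ; action code on the next line: no match
Samples.Push("; :*:btw::by the way")   ; commented out, no leading colon: no match
Samples.Push("MsgBox, Hello")          ; ordinary code: no match

Result := ""
MatchCount := 0
For Index, Line in Samples
{
    If (Line ~= Needle)
    {
        MatchCount++
        Result .= "MATCH     " Line "`n"
    }
    Else
        Result .= "no match  " Line "`n"
}
MsgBox, %Result%
```

The ::sig:: line fails at step 5: nothing follows the double-colon, so [^\s] has no character to consume.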

See the online AutoHotkey RegEx Quick Reference for more information on these and other Regular Expressions (RegEx).

Note: Although the full RegExMatch() function offers more functionality than we need in this situation, it produces the same result:

If (RegExMatch(A_Loopfield,"^:.*?:.+?::[^\s]"))

Note: Although not used in the conditional above, the full RegExMatch() function can save the matched string to an output variable and lets you specify the position in Haystack where the search begins. Use the shorthand form for InStr()-type testing and the full function form for data extraction.
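
As a sketch of that data-extraction side (AutoHotkey v1 syntax; the sample line is hypothetical), the version below adds capturing parentheses to the article's pattern. That change is mine, not part of the original RegEx. With an OutputVar named Part, v1's default mode stores the whole match in Part and each parenthesized group in Part1, Part2, and Part3.

```autohotkey
Line := ":*:btw::by the way"    ; hypothetical sample line

; Capturing groups added to the article's pattern:
; (1) options, (2) abbreviation, (3) replacement text starting with a non-space.
If (RegExMatch(Line, "^:(.*?):(.+?)::(\S.*)", Part))
    MsgBox, Options: %Part1%`nAbbreviation: %Part2%`nReplacement: %Part3%

; The 4th parameter, StartingPosition, begins the search later in Haystack.
; Starting at character 9 skips "btw" and finds the first word of the replacement.
Pos := RegExMatch(Line, "\w+", Word, 9)   ; Pos = 9, Word = "by"
```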

*          *          *

The book, A Beginner’s Guide to Using Regular Expressions in AutoHotkey, covers a number of Regular Expressions (RegEx) topics introducing practical techniques for making your AutoHotkey scripts more powerful. The first chapters introduce the concept and implementation of RegEx.

  • Chapter Five discusses eliminating double words.
  • Chapter Six discusses fixing contractions with RegEx.
  • Chapter Seven shows how to swap letters or words.
  • Chapter Eight uses RegEx to extract world location information about an IP address from a Web page.
  • Chapter Nine shows you how to remove HTML tags from any document or source code.
  • Chapter Ten demonstrates how to extract Web links from Web pages.
  • Chapter Eleven offers a RegEx for verifying valid e-mail addresses with AutoHotkey.

*          *          *

(Also, see the book Jack’s Motley Assortment of AutoHotkey Tips for many more RegEx tips.)

The beauty of this RegExMatch() shorthand lies in its ability to identify Hotstring format code in any file while ignoring all the other code. It skips action Hotstrings whose code starts on the next line because the Hotstring() function cannot process multiple lines of code. (Hotstring lines using the X option and a sole subroutine Label or function will work as long as the subroutine or function appears in the calling script.)
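
As a rough illustration of that workflow, this sketch is not the InstantHotstring.ahk code itself. It uses the ~= test as a gate and then hands each one-line Hotstring to the Hotstring() function, which requires AutoHotkey v1.1.28 or later. The sample text is hypothetical (a real script would read it from a file), and the sketch handles only plain replacement-text Hotstrings, not X-option lines.

```autohotkey
#Persistent
; Hypothetical file contents; a real script would load them with FileRead.
Contents := ":*:btw::by the way`n::sig::`n::ahk::AutoHotkey"

Created := 0
Loop, Parse, Contents, `n, `r
{
    ; Skip lines that are not one-line Hotstrings (same test as the article).
    If (!(A_LoopField ~= "^:.*?:.+?::[^\s]"))
        Continue
    ; Split ":options:abbreviation" from the replacement text.
    If (RegExMatch(A_LoopField, "^(:.*?:.+?)::(.+)$", Part))
    {
        Hotstring(Part1, Part2)    ; e.g. Hotstring(":*:btw", "by the way")
        Created++
    }
}
MsgBox, %Created% Hotstrings are now active.
```

The ::sig:: line never reaches Hotstring() because the ~= gate rejects it.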

Whenever you encounter a text match too complex for the InStr() function, consider using the RegExMatch() function shorthand operator ( ~= ).

Click the Follow button at the top of the sidebar on the right of this page for e-mail notification of new blogs. (If you’re reading this on a tablet or your phone, then you must scroll all the way to the end of the blog, past any comments, to find the Follow button.)

jack

This post was proofread by Grammarly
(Any other mistakes are all mine.)

(Full disclosure: If you sign up for a free Grammarly account, I get 20¢. I use the spelling/grammar checking service all the time, but, then again, I write a lot more than most people. I recommend Grammarly because it works and it’s free.)
