A Mini-Regular Expressions (RegEx) Tutorial Using the RegExHotstrings() Function for Word Swapping and Double Word Auto-Delete
While the RegExHotstrings() function has its limitations (discussed in “Dynamic Regular Expressions (RegEx) for Math Calculating Hotstrings”), we can quickly implement some short yet powerful dynamic Hotstrings using a one-line function call. The RegExHotstrings() function offers a few advantages over the traditional Hotstring format. Regular Expressions (RegEx) used in the function bust through the fixed-text limitations of the double-colon format (e.g. ::lol::laugh out loud). RegEx allows you to match string patterns, making wildcard text replacements possible. To explain how the RegExHotstrings() function works, I use one-line function calls to replace ambiguous text with targeted results.
In order to make the best use of the RegExHotstrings() function, we need an understanding of the key concepts driving the function. Once we get a hang of how to operate this dynamic Hotstring function, we can analyze the parentheses-enclosed expressions in each example to develop a better grasp of how RegEx works.
In this blog, I highlight two different RegExHotstrings() function word editing operations: one for swapping the order of two errant words; the second for auto-deleting duplicate words. After introducing RegExHotstrings() key concepts, I explain step-by-step how each RegEx behaves.
How the RegExHotstrings() Function Works
After loading the RegExHotstrings() function, almost every key on your Windows keyboard turns into a Hotkey. The function records each keystroke and monitors the entire string until it encounters a match for one of the called active RegEx parameters. In the same manner as traditional Hotstrings, the RegEx Hotstring action fires when it encounters the matching text. It then resets and starts looking for the next match. The function only records keystrokes as far back as the last reset. After activation, Hotstring action won’t repeat until recognizing new keyed-in text conforming with one of the active expressions.
Note: You can use the backspace key to make corrections, but not other cursor movement keys or the mouse. While Space, Tab, Backspace, Esc and punctuation keys don’t reset the RegExHotstrings() function, a mouse button click, text cursor arrow action (←↑→↓), and other cursor movement keys do. (Although the Delete key does not force a reset, it provides no help since Delete only removes unmonitored characters off the right end of the text.)
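The monitoring behavior described above can be modeled in a few lines. The following is only a rough sketch in Python (not AutoHotkey, and not the actual RegExHotstrings() source): the names type_keys and RESET_KEYS are hypothetical, the sketch assumes the expression is tested against the end of the recorded text, and it ignores where the cursor actually lands after a click.

```python
import re

# Hypothetical stand-ins for keys that force a reset (illustrative names only).
RESET_KEYS = {"{Left}", "{Right}", "{Up}", "{Down}", "{Click}"}

def type_keys(keys, pattern, replacement):
    """Feed keys one at a time; return (screen_text, buffer) afterwards."""
    screen = ""   # what ends up in the document (appended text only)
    buffer = ""   # what the monitor remembers since the last reset
    for key in keys:
        if key in RESET_KEYS:
            buffer = ""                      # cursor moved: forget everything
            continue
        if key == "{Backspace}":
            screen, buffer = screen[:-1], buffer[:-1]
            continue
        screen += key
        buffer += key
        match = re.search(pattern + r"$", buffer)
        if match:
            # Swap the matched tail for the replacement, then reset.
            new_text = match.expand(replacement)
            screen = screen[: len(screen) - len(match.group(0))] + new_text
            buffer = ""
    return screen, buffer

# Fires normally, and also after a Backspace correction.
print(type_keys(list("teh "), r"teh\s", "the "))
print(type_keys(list("tex") + ["{Backspace}"] + list("h "), r"teh\s", "the "))
# A mouse click in the middle resets the record, so nothing fires.
print(type_keys(list("te") + ["{Click}"] + list("h "), r"teh\s", "the "))
```

The third call shows why a correction made after a click or arrow key never triggers: the recorded text only holds what was typed since the reset.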
The Key to Understanding How to Use the RegExHotstrings() Function: Capturing and Manipulating Subpatterns
In Regular Expressions, you capture subpatterns by placing a set of parentheses around critical portions of the expression (a backreference then refers back to a captured subpattern). In RegEx, parentheses serve a number of different purposes, one of which is capturing any enclosed text separately from the entire match. Using this subpattern-capturing feature of RegEx in the RegExHotstrings() function allows the clever manipulation of text. Therein lies the function’s power.
Each set of parentheses creates a sequential variable containing the text from the matched subpattern (in the order that they appear in the RegEx) for later use—either as a replacement value ($1, $2, $3, …) or as a backreference inside the RegEx itself (\1, \2, \3, …). Understanding how the RegExHotstrings() function uses subpatterns and backreferences allows you to create flexible wildcard Hotstrings.
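To see the two uses of a captured subpattern side by side, here is a small sketch using Python's re module (chosen only because it runs anywhere; it is not the RegExHotstrings() function). Note one difference in notation: Python writes replacement references as \1, \2 where the examples in this post use $1, $2.

```python
import re

# Use 1: captured subpatterns reused in the replacement text.
swapped = re.sub(r"(\w+)-(\w+)", r"\2-\1", "left-right")
print(swapped)  # right-left

# Use 2: a backreference inside the pattern itself. \1 must repeat
# exactly what the first set of parentheses captured.
has_double = bool(re.search(r"(\w)\1", "letter"))   # "tt" repeats
no_double = bool(re.search(r"(\w)\1", "leader"))    # no doubled letter
print(has_double, no_double)  # True False
```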
Tip: If your RegExHotstrings() function call does not require capturing either a subpattern (e.g. $1) or backreference (e.g. \1), then most likely you can drop the function completely and get the job done with either a traditional double-colon Hotstring or the Hotstring() function.
Swapping Two Words
Our first RegEx Hotstring swaps the order of two freshly-typed words when the user types the less-than sign (<) immediately after entering the misplaced words. For example, if I accidentally key in “that and this”, I can instantly switch the order to “this and that” by pressing the less-than sign key (Shift+comma):
that and this< ⇒ this and that
This RegEx Hotstring requires only one line of code—the RegExHotstrings() function call:
RegExHotstrings("([\w'-]+)([,;]?\s(?:and\s|or\s)?)([\w'-]+)<", "%$3%%$2%%$1%")
Note: You must make the RegExHotstrings() function available to the script either by embedding the function in the script, pulling it in with #Include, or maintaining it in a Function Library (see “Guidelines for AutoHotkey Function Libraries”).
Let’s take a look at the RegEx in the first parameter of the function.
How the Word Swap RegEx Works
When analyzing how the RegEx matches the monitored keystrokes, we break it into its component pieces. First, we look at the RegExHotstrings() function subpatterns (enclosed in parentheses) needed as replacement text:
([\w'-]+)([,;]?\s(?:and\s|or\s)?)([\w'-]+)<
When adding sets of parentheses to a RegEx, we create and save subpatterns. The RegExHotstrings() function assigns the same standard variable names used in the built-in RegExReplace() function (i.e. $1, $2, $3, …) in the order of subpattern appearance in the expression:
([\w'-]+) ⇒ $1
([,;]?\s(?:and\s|or\s)?) ⇒ $2
([\w'-]+) ⇒ $3
Identifying Words
We immediately see that $1 and $3 use the identical RegEx ([\w'-]+). This expression employs the following symbols to match and save any single word as a subpattern:
- The square brackets ([…]) enclose a character class, which matches any one of the characters in the list. The \w symbol matches any alphabetic character, numeric digit, or underscore. In addition, you find the apostrophe ( ' ) and hyphen ( - ) characters as options, enabling the RegEx to acquire contractions and hyphenated words, respectively.
- The plus sign ( + ) appended to the class tells RegEx to continue including one or more characters in the word, as long as each matches a letter, a digit, an underscore, an apostrophe, or a hyphen.
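As a quick check of what the word expression accepts, this sketch runs the same character class through Python's re module (used here only for illustration; the sample sentence is made up). Note that \w also accepts the underscore.

```python
import re

WORD = re.compile(r"[\w'-]+")

# Contractions, hyphenated words, underscores, and digits all count as one word.
words = WORD.findall("don't re-enter the well_known room 42")
print(words)  # ["don't", 're-enter', 'the', 'well_known', 'room', '42']

# A space is not in the class, so two words never match as a single word.
print(WORD.fullmatch("half way") is None)  # True
```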
The Space Between the Words
The expression ([,;]?\s(?:and\s|or\s)?) stores the space (and other designated text) found between the two words in the $2 subpattern variable. If the separation between the typed words does not match this RegEx, then the Hotstring won’t fire. The included symbols show the following matching behavior:
- The range [,;] matches either a comma or semi-colon. Adding a question mark (?) to the range makes the match optional—none or one. That means we can swap two words—even if a comma or semi-colon sits between them.
- The \s symbol recognizes spaces, tabs, and newlines. Sitting on its own with no added modifier tells the RegEx that one space character must appear in the final match.
- The next set of parentheses (?:and\s|or\s)? offers the possibility of intervening conjunctions (“and” or “or”) appearing between the two words—followed by a space character (\s).
- When included within parentheses, the vertical pipe character ( | ) separates matching string options—one (and\s) or the other (or\s), but not both.
- The question mark following the preceding set of parentheses (?:and\s|or\s)? makes the conjunction followed by a space optional—none or one.
- The ?: inside the parentheses (?:and\s|or\s)? prevents the creation of a new, unhelpful subpattern variable.
This RegEx works for any two words (including contractions and hyphenated words) separated by either: a sole space; a comma or a semicolon followed by a single space; or, as an appended option, a conjunction (“and” or “or”) followed by another space. When recognizing the need to swap two words, immediately press the less-than sign key ( < ) to activate the swap.
The second function parameter "%$3%%$2%%$1%" simply reverses the order of the saved subpatterns as replacement text for the original matched RegEx.
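To try the swap expression outside AutoHotkey, the sketch below applies the same pattern with Python's re module. The helper name swap_last_two is hypothetical, the trailing $ reflects my assumption that the match is tested at the end of the typed text, and Python's \3\2\1 plays the role of %$3%%$2%%$1%.

```python
import re

# Same expression as the Hotstring, anchored to the end of the typed text.
SWAP = re.compile(r"([\w'-]+)([,;]?\s(?:and\s|or\s)?)([\w'-]+)<$")

def swap_last_two(text):
    """Swap the last two words when the text ends with the < trigger."""
    return SWAP.sub(r"\3\2\1", text)

print(swap_last_two("that and this<"))    # this and that
print(swap_last_two("apples, oranges<"))  # oranges, apples
print(swap_last_two("one two three<"))    # one three two
print(swap_last_two("no trigger here"))   # unchanged: no trigger here
```

Because the separator lives in its own subpattern, the comma or conjunction stays in the middle while only the two words trade places.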
Recognize that you must activate this Hotstring swap immediately. Coming back later won’t do since the Hotstring resets after any mouse click or arrow key cursor movement. For later repair action, you might need Hotkey behavior similar to that discussed in Chapter Seven “A Simple Beginner’s Trick for Swapping Letters and Words” in the book A Beginner’s Guide to Using Regular Expressions in AutoHotkey. The final RegEx example in the chapter allows the swapping of any two words at the beginning and end of selected text in a paragraph.
Auto-Delete Duplicate Words with a Backreference
Our second example uses a RegEx backreference to recognize when someone types the same word twice in a row, then deletes the second:
RegExHotstrings("(\b\w+\b)\s+\1","%$1%")
This RegEx Hotstring actually replaces the entire matching expression with only the first occurrence of the word—as if deleting the second word.
In the RegEx parameter of the RegExHotstrings() function, the set of parentheses creates both a backreference for matching the reoccurrence of the first word \1 and the replacement subpattern $1:
(\b\w+\b) ⇒ \1 ⇒ $1
This example demonstrates both the use of a backreference matching an identical subpattern in a RegEx (\1) and the use of the same value as replacement text ($1). RegEx numbers backreferences sequentially, in the same manner as the matching subpatterns.
The subpattern expression matches one or more word characters (\w+) bounded on each side by a word boundary (\b), identifying a whole word. The \b symbol used for marking the beginning and end of the word does not consume any characters. I added the word boundary (\b) because without it the expression fires too early: as soon as the first letter of the repeated word matches the last letter of the word before it, the RegEx treats that single letter as the duplicate and deletes it along with the space:
test test ⇒ testest
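The early firing is easier to see when the text is typed one character at a time. This is a simplified model in Python, not the real function: type_text is a hypothetical helper, and it assumes the expression is checked against the end of the recorded text after every keystroke and that the record resets after each replacement.

```python
import re

def type_text(text, pattern):
    """Type text one character at a time; when the pattern matches the end
    of the buffer, keep only subpattern 1 and reset the buffer."""
    screen = buffer = ""
    for ch in text:
        screen += ch
        buffer += ch
        m = re.search(pattern + r"$", buffer)
        if m:
            screen = screen[: len(screen) - len(m.group(0))] + m.group(1)
            buffer = ""
    return screen

# Without \b, "t t" already matches after typing "test t".
print(type_text("test test", r"(\w+)\s+\1"))      # testest
# With \b, the match waits for the whole word.
print(type_text("test test", r"(\b\w+\b)\s+\1"))  # test
# \s+ tolerates more than one space between the words.
print(type_text("the  the", r"(\b\w+\b)\s+\1"))   # the
```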
If you want to include hyphenated words and contractions, then insert a range […] into the expression and add the hyphen and apostrophe:
(\b[\w'-]+\b) ⇒ \1 ⇒ $1 accepts hyphens and apostrophes
Typing any word twice with one or more intervening whitespace characters (\s+ matches spaces, tabs, and newlines) instantly deletes the second word:
the the ⇒ the
Pretty cool!
Only your creativity limits these RegEx Hotstring techniques. However, I suggest that you keep RegEx strings relatively short. Long text input increases the risk of typing errors and cursor movement—resetting the dynamic Hotstrings action far too often.
I have an idea for a RegEx Hotstring which inserts a word multiple times by merely adding the number of instances at the end of the word:
Rah!3 ⇒ Rah! Rah! Rah!
go,4 ⇒ go, go, go, go,
I’m pretty sure that I would need to write a subroutine with a loop, although I don’t know how much I would use this type of Hotstring trick.
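As a thought experiment only, here is how the repeat logic might look in Python. Nothing here comes from RegExHotstrings(): the repeat_word name, the single-digit count, and the rule that the word must end in a non-digit are all my own assumptions for the sketch.

```python
import re

# A word ending in a non-digit, followed by one digit (1-9) at the end.
REPEAT = re.compile(r"(\S*[^\s\d])([1-9])$")

def repeat_word(text):
    """Replace a trailing word+count with the word repeated count times."""
    m = REPEAT.search(text)
    if not m:
        return text
    word, count = m.group(1), int(m.group(2))
    pieces = []
    for _ in range(count):        # the loop the idea seems to call for
        pieces.append(word)
    return text[: m.start()] + " ".join(pieces)

print(repeat_word("Rah!3"))    # Rah! Rah! Rah!
print(repeat_word("go,4"))     # go, go, go, go,
print(repeat_word("room 42"))  # unchanged: room 42
```

A pattern this loose would also expand ordinary text such as "mp3", so a real Hotstring would probably want a dedicated trigger character.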
I was hoping to use this for the auto-capitalize as you type, but I get this error when I run it:
https://i.imgur.com/fljXVZC.png
Also, when you said “April 21, 2020: Added word capitalization while you type and word/phrase repeat.”
Does that mean I can be typing in my web browser and it will auto-cap for me?
I’ve used AHK for a long time and the AutoSentenceCap.ahk is too slow for me or something.
So now I’m looking for alternatives for the specific feature, such as LaTeX.
Are you aware of any programs that will auto-cap while you type?