Strip HTML Tags from Text (AutoHotkey Quick Tip)

Use This AutoHotkey Trick to Remove HTML Code from Any Text

Last time in “Alternative Web Page HTML Download Techniques (AutoHotkey Tip),” I mentioned how I updated the GooglePhraseFix.ahk script by aaston86 to get it working again and make it a little more robust. The script uses a Google search page to autocorrect common expressions and people’s names. (It only works if Google thinks you may have made an error.)

For example, if you type “Ralph Nadal” when you mean the Spanish tennis player, selecting the name and pressing the CTRL+ALT+G Hotkey combination changes “Ralph Nadal” to “Rafael Nadal.” It only works for obvious possibilities, but may come in handy for correcting hard-to-remember spellings (e.g. “Jocavic” turns into “Djokovic”).

I added the phrase “Showing results for ” to the script as a search key in the Google results page. Google includes the phrase when it senses that you may have made a mistake. The original script used the StringReplace command to remove some HTML code and convert the HTML apostrophe entity (&#39;) back to a plain apostrophe ('):

   StringReplace, clipboard, match2, <b><i>,, All
   StringReplace, clipboard, clipboard, </i></b>,, All
   StringReplace, clipboard, clipboard, &#39;,', All

The StringReplace command works for fixed, known HTML tags, but you need a separate command for each tag (or set of tags). By using the RegExReplace() function, you can remove all HTML tags with a single call.

HTML Tag Stripping Regular Expressions (RegEx) Using the RegExReplace() Function

The selected section of the Google page now includes a lot more HTML code than merely italics <i> and bold <b>. Using the following expression removes it all:

   var := RegExReplace(var,"<.+?>")

You don’t need to know anything about AutoHotkey Regular Expressions (RegEx) to use the above RegExReplace() function. The function removes every run of text in var enclosed in angle brackets (< … >), including the brackets themselves.
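As a quick illustration, here is a minimal sketch that wraps the same expression in a function. The function name StripTags and the sample string are made up for this example; only the RegExReplace() expression comes from the text above.

```autohotkey
; Hypothetical helper (not part of GooglePhraseFix.ahk): remove anything between < and >
StripTags(html) {
    ; <.+?> matches the shortest run of text that starts with < and ends with >
    return RegExReplace(html, "<.+?>")
}

; Made-up sample resembling a fragment of a results page
sample := "<p>Showing results for <b><i>Rafael Nadal</i></b></p>"
clean := StripTags(sample)   ; clean now holds: Showing results for Rafael Nadal
```

By default the dot does not match a line break, so a tag split across two lines would survive. Adding the s) option ("s)<.+?>") is one way to cover that case.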

Suppose you want to copy all the text from a Web page to a file. You could use the URLDownloadToFile command to copy the page source code, then execute the above RegExReplace() function to remove all of the HTML tags. What remains is mostly plain text, although the contents of any script or style blocks and character entities such as &amp; stay in place.
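A rough sketch of that download-then-strip idea might look like the following. The function name PageText, the temporary file name, and the example address in the usage comment are all assumptions for illustration; the sketch has no handling for blocked requests such as the Google 403 error.

```autohotkey
; Hypothetical helper: download a page's source and return it with the tags removed
PageText(url) {
    tempFile := A_Temp . "\~page_source.tmp"   ; assumed scratch file name
    URLDownloadToFile, %url%, %tempFile%
    if ErrorLevel                               ; ErrorLevel is 1 when the download fails
        return ""
    FileRead, html, %tempFile%
    FileDelete, %tempFile%
    return RegExReplace(html, "<.+?>")          ; same expression as above
}

; Usage (placeholder address and output file; uncomment to try it):
; FileAppend, % PageText("https://www.example.com/"), %A_Desktop%\page_text.txt
```

Note that FileAppend adds to an existing file rather than replacing it.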


jack


Alternative Web Page HTML Download Techniques (AutoHotkey Tip)

When One Method for Downloading HTML Code Breaks, Try the Alternative AutoHotkey Command

I could quickly get the latitude and longitude for any location with a Google search in a browser, but when I attempted to download that page using the GetWebPage() function code taken from the AutoHotkey documentation (shown below in the first script), Google stopped me. The Google server denied the download attempt of the coordinates for San Diego with the following statement:

403. That’s an error.

Your client does not have permission to get URL /search?q=latitude+longitude+san+diego+decimal&rlz=1C1GEWG_enUS953US953 from this server.

Thwarted by Google again (see my “Switched IPFind.ahk to OpenStreetMap.org for Reliable AutoHotkey GUI Map Embedding” blog), I wanted to find an alternative source for the same information.

I searched for an unblocked Web page providing the latitude and longitude. I didn’t have to look very far. (The site name appears in the AutoHotkey snippet below.) I wrote the following test code for proof of function:

Continue reading

AutoHotkey Tip of the Week: Repeat Words and Phrases with RegEx Hotstrings

Save Time with This RegEx Hotstring for Inserting Repeated Words or Sentences: “Blah!” Instantly Turns Into “Blah! Blah! Blah!”

At the end of my last blog, I postulated the possibility of a word-duplicating RegEx Hotstring. While I don’t know how many people would ever use it, I do remember a time when the technique would have come in handy (as shown in the cartoon on the left). I thought that I would leave the problem as a reader’s challenge and move on, but I found that I couldn’t just abandon the loose end.

While this trick may not embody the most essential Hotstring, the technique might stimulate other AutoHotkey users to venture forward with their own variations on RegEx Hotstrings. I would love to hear about other innovative applications of the RegExHotstrings() function, doing things that prove difficult (or impossible) with either traditional double-colon Hotstrings or the built-in Hotstring() function. Continue reading

AutoHotkey Tip of the Week: Word Manipulating Dynamic AutoHotkey Hotstrings

A Mini-Regular Expressions (RegEx) Tutorial Using the RegExHotstrings() Function for Word Swapping and Double Word Auto-Delete

While the RegExHotstrings() function has its limitations (discussed in “Dynamic Regular Expressions (RegEx) for Math Calculating Hotstrings”), we can quickly implement some simple (yet complex) dynamic Hotstrings using a one-line function call. The RegExHotstrings() function offers a few advantages over the traditional Hotstring format. Regular Expressions (RegEx) used in the function bust through the fixed-text limitations of the double-colon format (e.g. ::lol::laugh out loud). RegEx allows you to match string patterns, making wildcard text replacements possible. To explain how the RegExHotstrings() function works, I use one-line function calls to replace ambiguous text with targeted results.


To make the best use of the RegExHotstrings() function, we need an understanding of the key concepts driving the function. Once we get the hang of how to operate this dynamic Hotstring function, we can analyze the parentheses-enclosed expressions in each example to develop a better grasp of how RegEx works.

In this blog, I highlight two different RegExHotstrings() function word editing operations: one for swapping the order of two errant words; the second for auto-deleting duplicate words. After introducing RegExHotstrings() key concepts, I explain step-by-step how each RegEx behaves.
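To give a feel for the two operations, here is a minimal sketch that uses the built-in RegExReplace() function rather than RegExHotstrings(). The function names and patterns are assumptions made for illustration and are not the expressions from the full post.

```autohotkey
; Hypothetical helper: swap the first two words in a string
SwapWords(text) {
    ; $1 and $3 are the two words, $2 is the whitespace between them; the limit of 1 stops after one swap
    return RegExReplace(text, "(\w+)(\s+)(\w+)", "$3$2$1", count, 1)
}

; Hypothetical helper: collapse a word repeated back to back into a single copy
DropDuplicateWords(text) {
    ; \1 refers back to the word captured by the first set of parentheses; i) ignores case
    return RegExReplace(text, "i)\b(\w+)(\s+\1\b)+", "$1")
}

swapped := SwapWords("world hello")             ; hello world
deduped := DropDuplicateWords("the the cat")    ; the cat
```

The parentheses do double duty here: they group part of the pattern and capture the matched text for reuse in the replacement.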

Continue reading

AutoHotkey Tips of the Week: The ComObjCreate() Function for Web Page Downloads, E-Mail, and Text Audio

While AutoHotkey Directly Supports Most Windows Features, the Flexibility of the ComObjCreate() Function Adds More Useful Capabilities—Especially for Capturing Web Data, Sending E-mail, and Reading Text Out-Loud

A number of my scripts use the ComObjCreate() function in various forms. Most of them I copied from the AutoHotkey Forums and modified for my own purposes. In this blog, I highlight the ComObjCreate() applications I use most, then offer a list of other forms of the function you may find useful.

How I Use ComObjCreate()

The SynonymLookup.ahk script pulls replacement terms for the highlighted word “Page” from the Web.

While AutoHotkey supports many of these features in one form or another, directly accessing the COM (Component Object Model) might provide a solution you can get by no other method. I use the ComObjCreate() function in three ways:

  1. Collect data from Web pages (ComObjCreate("WinHttp.WinHttpRequest.5.1")).
  2. Send e-mail directly from an AutoHotkey script (ComObjCreate("CDO.Message")), with no mail program required.
  3. Use the computer voice to read text (ComObjCreate("SAPI.SpVoice")).
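The first and third uses can be sketched in a few lines. The wrapper names FetchPageText and SayText are invented for this example. The e-mail case is left out because CDO.Message needs SMTP server settings that depend on your mail provider.

```autohotkey
; Hypothetical wrapper: download a page's source into a variable
FetchPageText(url) {
    whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    whr.Open("GET", url, true)   ; true = asynchronous request
    whr.Send()
    whr.WaitForResponse()        ; wait until the response has arrived
    return whr.ResponseText
}

; Hypothetical wrapper: read text aloud with the default Windows voice
SayText(text) {
    voice := ComObjCreate("SAPI.SpVoice")
    voice.Speak(text)
}

; Usage (placeholder address; uncomment to try it):
; SayText("Downloaded " . StrLen(FetchPageText("https://www.example.com/")) . " characters")
```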

While I haven’t found much additional information about the ComObjCreate() function posted on the new AutoHotkey forum, the old forum contains a useful COM Object reference list. You don’t need to know how they work—just how to use them. Continue reading

AutoHotkey Tip of the Week: Windows Trick for Adding Embedded Folder Icons to QuickLinks Menus

This Technique Accesses Icons Embedded in Windows Folders for Inserting into Pop-up Menus—Plus, the New Combined Switch/Case Statement QuickLinks QL_GetIcon() Function

I completely rewrote the functions from the last blog for adding icons to the menus in the QuickLinks.ahk script combining the two into a shorter prioritized list using Switch/Case statements. In the process—after investigating how to read icons embedded in Windows folder/directory listings—I discovered an interesting Windows secret. It turns out that this procedure requires a totally different Windows maneuver than that used for reading Windows Shortcut file icons.

The Windows Desktop.ini File

Ryan’s UnHideFiles.ahk script makes Windows Registry changes to hide and unhide files.

When you embed an icon into a Windows folder (right-click on the folder name in Windows File Explorer, select Properties and the Customize tab, then click Change Icon… and browse for icons), rather than saving the icon path and icon number in the folder itself (as Windows does for shortcut files), it creates a special hidden file named desktop.ini in that same folder. With Windows set to “Show hidden files, folders, and drives” in the View tab of the Folder Options window, you can view the hidden desktop.ini file in that folder. (Tip: You can use Ryan’s UnHideFiles.ahk script to hide and unhide files and folders.) Continue reading
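Because desktop.ini is an ordinary INI file, the IniRead command can pull the icon setting out of it. The sketch below is not the QL_GetIcon() function from the post. The function name FolderIconSetting is made up, and it assumes Windows stored the icon under the IconResource key (or the older IconFile and IconIndex keys) in the [.ShellClassInfo] section.

```autohotkey
; Hypothetical helper: return "iconfile,iconnumber" for a folder, or blank if none is set
FolderIconSetting(folder) {
    ini := folder . "\desktop.ini"
    ; %A_Space% as the default makes IniRead return a blank value instead of the word ERROR
    IniRead, icon, %ini%, .ShellClassInfo, IconResource, %A_Space%
    if (icon = "") {
        ; Fall back to the older pair of keys
        IniRead, file, %ini%, .ShellClassInfo, IconFile, %A_Space%
        IniRead, index, %ini%, .ShellClassInfo, IconIndex, 0
        if (file != "")
            icon := file . "," . index
    }
    return icon
}
```

The returned value still needs splitting at its last comma before the two parts can go into a Menu, MenuName, Icon command.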

AutoHotkey Tip of the Week: Use Regular Expressions (RegEx) to Convert Repetitious AutoHotkey Code

Regular Expressions (RegEx) Can Simplify a Tedious Code Reformatting Problem

Recently, I received the following comment from Thom:

Greetings,

A small improvement to the Autocorrect AHK script. I have been using this script for years and find it very useful. I was always a bit intrigued about the section of ambiguous entries which was commented out and not much use.

I was fascinated to read about your TextMenu function [found in the book Beginning AutoHotkey Hotstrings] to display the various choices. I found a simple way with RegEx to change all the entries in the section.

For example:

::electon::election, electron

To:

::electon::
  TextMenu("election, electron")
Return

I copied and pasted the list into Notepad++ and then ran this find-and-replace.

Find:

(::\w+::)(.+)

Replace:

$1\n TextMenu\(\"$2\"\)\nReturn

And presto it works—some entries need tweaking but it works well. Continue reading
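The same conversion can be sketched inside AutoHotkey itself with RegExReplace(). The function name ConvertAmbiguous is an assumption. The pattern is Thom's, with ^ and $ anchors plus the m (multiline) and `a (any newline) options added so that each line of a multi-line list is converted separately.

```autohotkey
; Hypothetical helper: turn "::typo::choice1, choice2" lines into TextMenu() Hotstrings
ConvertAmbiguous(list) {
    ; $1 = the ::typo:: part, $2 = the comma-separated choices
    ; `n inserts a line break; "" inside a quoted string is a literal quotation mark
    return RegExReplace(list, "m`a)^(::\w+::)(.+)$", "$1`n  TextMenu(""$2"")`nReturn")
}

converted := ConvertAmbiguous("::electon::election, electron")
; converted now holds three lines:
; ::electon::
;   TextMenu("election, electron")
; Return
```

As with the Notepad++ version, entries containing characters that \w does not match (an apostrophe, for example) are left unchanged and need tweaking by hand.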

AutoHotkey Tip of the Week: A Look at the New Switch/Case Command

In the DateStampConvert.ahk Script, Rather than Using a Series of If-Else Statements (or the Ternary Operator), the New Switch Command Sets Up Case Statements for Alternative Results—Plus, Easily Add Conversions for Spanish, German, French, and Italian Date Formats

Over a year ago, I used a cascading series of ternary operators to convert English text month names into their numeric values within a single function (“Use the Ternary Operator to Create Conditional Case Statements or Switches”). The ternary operator shortcut acts as If-Else statements in abbreviated form.

In the DateStampConvert.ahk script, a technique similar to Switch/Case statements converts the name of a month into its corresponding numeric value.
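For comparison, a month lookup written with Switch/Case might look like the sketch below. This is not the code from DateStampConvert.ahk. The function name MonthToNumber is made up, it matches on only the first three letters of the name, and it needs AutoHotkey v1.1.31 or later for the Switch command. The Spanish abbreviations show how one Case can list several values.

```autohotkey
; Hypothetical helper: convert an English or Spanish month name into its number (0 if unknown)
MonthToNumber(name) {
    Switch SubStr(name, 1, 3)   ; string comparisons ignore case by default
    {
    Case "Jan", "Ene":
        return 1
    Case "Feb":
        return 2
    Case "Mar":
        return 3
    Case "Apr", "Abr":
        return 4
    Case "May":
        return 5
    Case "Jun":
        return 6
    Case "Jul":
        return 7
    Case "Aug", "Ago":
        return 8
    Case "Sep":
        return 9
    Case "Oct":
        return 10
    Case "Nov":
        return 11
    Case "Dec", "Dic":
        return 12
    Default:
        return 0
    }
}
```

Adding another language means appending its abbreviations to the matching Case lines, which is harder to keep readable in a chain of ternary operators.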

Continue reading

AutoHotkey Tip of the Week: Add Single-Key Shortcuts to Pop-up Menus—September 16, 2019

Sometimes It’s Just Easier to Use the Keyboard Rather Than Your Mouse

If a menu busts in while you type, it forces you to switch to your mouse to resolve it. This can get pretty annoying if your script uses a number of pop-up menus. For example, Chapter Eight, “Make Your Own Text AutoCorrect Hotstring Pop-up Menus with AutoHotkey” and Chapter Nine, “How to Turn AutoHotkey Hotstring AutoCorrect Pop-up Menus into a Function” of the book Beginning AutoHotkey Hotstrings show you how to set up a list of alternative corrections. It works well for offering options but, at times, wouldn’t you prefer to hit a single key to make the selection rather than first fetching the mouse, then clicking?

Recent Question from a Reader:

Is there any way to improve the script in order to, once the menu appears, select an option using a given key combination?

For instance: If I typed “alt+1” AutoHotkey would automatically select the option “again”, if I typed “alt+2” it would select the option “a gin” and so on so forth until alt+0?

*          *          *
Continue reading

Total the Numbers Found in Any Document (AutoHotkey RegEx Tips Part 5)

For a Quick-and-Dirty Calculator, Use Regular Expressions (RegEx) to Pull Numbers from Documents or Web Pages and Total Them Up—Plus, a RegEx for Removing (or Extracting) Numeric IP Addresses

Shifting gears, I end the discussion of the MultiPaste.ahk script, which parses copied data into component parts for easier paste operations into other documents. With this blog, I start working on another tool for simplifying a Windows task: addition.

Sometimes I see a list of numbers in either a document or a Web page which I would like to quickly total without loading a separate calculator. For example, the shopping cart program I use for my book sales offers a summary table of all recent sales. While I can use a reports section of the site to get more information (e.g. monthly sales), I want a tool to quickly highlight the desired entries and give me the total of the individual sales. To do that I use a Regular Expression (RegEx) specifically for extracting those sales numbers. Continue reading
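One way to sketch the idea is a loop that calls RegExMatch() repeatedly and adds up whatever it finds. The function name SumNumbers and the pattern are assumptions, not the expression from the full post. The pattern treats commas as thousands separators and ignores minus signs, so it would misread a comma-separated list such as 1,2,3 or a dotted IP address.

```autohotkey
; Hypothetical helper: total every number found in a block of text
SumNumbers(text) {
    total := 0
    pos := 1
    ; Match digits with optional commas and an optional decimal part, e.g. 1,250.00
    while (pos := RegExMatch(text, "\d[\d,]*(?:\.\d+)?", match, pos)) {
        total += StrReplace(match, ",")   ; drop the commas before adding
        pos += StrLen(match)              ; continue searching after this match
    }
    return total
}

count := SumNumbers("3 apples and 4 pears")   ; 7
```

In a working tool, the text would come from the clipboard after copying the highlighted entries.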