Strip HTML Tags from Text (AutoHotkey Quick Tip)

Use This AutoHotkey Trick to Remove HTML Code from Any Text

Last time in “Alternative Web Page HTML Download Techniques (AutoHotkey Tip),” I mentioned how I updated the GooglePhraseFix.ahk script by aaston86 to get it working again and make it a little more robust. The script uses a Google search page to autocorrect common expressions and people’s names. (It only works if Google thinks you may have made an error.)

For example, if you type “Ralph Nadal” when you mean the Spanish tennis player, selecting the name and pressing the CTRL+ALT+G Hotkey combination changes “Ralph Nadal” to “Rafael Nadal.” It only works for obvious possibilities, but may come in handy for correcting hard-to-remember spellings (e.g., “Jocavic” turns into “Djokovic”).

I added the phrase “Showing results for ” to the script as a search key in the Google results page. Google includes the phrase when it senses that you may have made a mistake. The original script used the StringReplace command to remove some HTML code and convert the HTML apostrophe entity (&#39;) into a plain apostrophe ('):

   StringReplace, clipboard, match2, <b><i>,, All
   StringReplace, clipboard, clipboard, </i></b>,, All
   StringReplace, clipboard, clipboard, &#39;,', All

The StringReplace command works for fixed, known HTML tags, but you need a separate command for each tag (or set of tags). By using the RegExReplace() function, you can remove all HTML tags with one command.

HTML Tag Stripping Regular Expressions (RegEx) Using the RegExReplace() Function

The selected section of the Google page now includes a lot more HTML code than merely italics <i> and bold <b>. Using the following expression removes it all:

   var := RegExReplace(var,"<.+?>")

You don’t need to know anything about AutoHotkey Regular Expressions (RegEx) to use the above RegExReplace() function. It removes all text in var enclosed in angle brackets (< … >), including the brackets themselves.

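Suppose you want to copy all the text from a Web page to a file. You could use the URLDownloadToFile command to save the page source code, read the file into a variable, then run the above RegExReplace() function to remove the HTML tags. What remains is mostly plain text, although the contents of any <script> and <style> blocks and HTML entities such as &amp; stay in place.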
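
The download-and-strip idea might look something like the following minimal sketch. The URL and the temporary file name are placeholders I made up for illustration; they don't come from any of my scripts.

```autohotkey
; Sketch only: the URL and file path below are assumed placeholders.
url := "https://www.example.com/"
tempFile := A_Temp . "\page_source.html"

URLDownloadToFile, %url%, %tempFile%      ; save the page source to disk
if ErrorLevel
{
    MsgBox, Download failed.
    return
}

FileRead, html, %tempFile%                ; load the saved source into a variable
text := RegExReplace(html, "<.+?>")       ; remove everything from < to the next >
MsgBox, % SubStr(text, 1, 500)            ; show the first 500 characters
```

Note that without the s) option the dot does not match newlines, so a tag that spans more than one line survives. Using "s)<.+?>" as the expression would likely catch those too.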

jack

Embedding Google Maps in the IPFind.ahk GUI (AutoHotkey Web Trick)

Write a Local File to Load HTML iFrame Embedding Code into the ActiveX Control

May 26, 2021, Alert: Wow! That was fast! Google has already disabled this iFrame map embedding technique…at least for Google Maps. Oh, well, I’ve already reverted to using the original IPFindMap.ahk script using OpenStreetMap.org (“Use ActiveX Control to Embed World Maps in AutoHotkey GUI” May 10, 2021). The technique remains valid. I’ll offer another iFrame embedding application soon—this time probably with a weather forecast.

Last time, in “Embed Google Maps in an AutoHotkey GUI (No API Required!),” I discussed how you can bypass much of the clutter on Web pages by embedding the map, video, or image in an HTML iFrame read directly into the AutoHotkey ActiveX GUI control from a local file. Sites offering this service often supply HTML code generators for copying the appropriate link. Sometimes, as in the case of Google Maps, you will only find the legacy code by searching the Web. (Google wants you to sign up for the API.)

I don’t know how long this Google Map feature will work, but for now, it provides a reasonable solution for AutoHotkey users wanting to embed a simple map into an application.

This time, I modified the IPFindMap.ahk script to write the HTML iFrame code to a separate .html file, then use that filename as the destination URL for the AutoHotkey ActiveX GUI control. This allows AutoHotkey to load an interactive Google map for each IP address found in the selected text.
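
As a rough sketch of the technique (not the actual IPFindMap.ahk code), writing the iFrame to a local file and loading it might look like this. The file name, the iFrame source URL, and the control sizes are all hypothetical placeholders.

```autohotkey
; Sketch only: file name, iFrame source, and sizes are assumed placeholders,
; not the values used in IPFindMap.ahk.
htmlFile := A_Temp . "\EmbedFrame.html"
frameSrc := "https://www.example.com/"

; Build a bare page holding nothing but the iFrame ("" is a literal quote).
html := "<!DOCTYPE html><html><body style=""margin:0"">"
      . "<iframe width=""600"" height=""400"" frameborder=""0"" src=""" . frameSrc . """></iframe>"
      . "</body></html>"

FileDelete, %htmlFile%                    ; start with a fresh file each time
FileAppend, %html%, %htmlFile%

Gui, Add, ActiveX, w620 h420 vWB, Shell.Explorer   ; embedded browser control
WB.Navigate(htmlFile)                     ; load the local file, not a Web URL
Gui, Show,, Embedded iFrame
return

GuiClose:
ExitApp
```

Rewriting the file and calling Navigate() again would be one way to swap in a new map for each IP address.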

I appreciate this solution because the embedded Google map looks cleaner than the previous OpenStreetMap.org map and displays the foreign map names in English.

Use ActiveX Control to Embed World Maps in AutoHotkey GUI

By Directly Loading a Map from OpenStreetMap.org into Your AutoHotkey Graphical User Interface (GUI) Pop-up Window, You Can Add Interactive Geographic Locations to All Your Apps

I have some good news and some bad news about using AutoHotkey tools to directly access Web data through the Internet. First the bad news. Since the AutoHotkey tools for downloading and reading Web pages use Internet Explorer (built into Windows but no longer supported by Microsoft), Web providers can effectively block access by identifying that user browser. Now the good news: you rarely need to use the sites that block simple little personal apps such as my IPFind.ahk script. So many other sites support location data for IP addresses that I don’t have a problem keeping the script up and running.

For a quick glance at the geographic location of an IP address, insert an OpenStreetMap.org Web window into an AutoHotkey GUI. Hold the mouse cursor over a map and scroll in or out to zoom in or out.

Previously, I had repaired other issues caused by changes in the source Web page and converted the IPFind.ahk script to use a GUI window rather than a MsgBox command. This upgrade facilitated adding links to the app (see “Adding Web Links to the AutoHotkey IPFind.ahk Script”) and made the current insertion of interactive maps using the ActiveX GUI control possible. I fixed the IPFind.ahk script problems by switching to another source Web page and added an interactive map from OpenStreetMap.org.

Adding Web Links to the AutoHotkey IPFind.ahk Script

While Fixing the IPFind.ahk Script for Listing the Geographic Location of an IP Address, I Added Links for the IP Identification Site and OpenStreetMap

Occasionally, Web page scraping apps fail (or display strange results) due to changes in source page data formats. It usually only takes a few minutes to review the code and make the necessary RegEx adjustments to restore acceptable results. This time while repairing the IPFind.ahk script, I noticed that the Web page source code also offered IP longitude and latitude. I thought, “Why not add a map link to the display window for anyone curious about its geographic position?” The IP site (which I also added as a link) includes a map, but I wanted one with greater detail.

An IP address site can provide a great deal of information—including approximate longitude and latitude.

Note: I recently discussed the Link GUI control in “Turn Web Addresses into Hotlinks for the AHK File Peek Window.”

Using Parts to Build a New AutoHotkey Script (HowLongInstant.ahk)

While Many Users Find the Original GUI Based HowLong Script Valuable, Combining Snippets of Code Creates a New Instant HowLong Script

Last time in “Extracting Multiple Dates from Text Using AutoHotkey RegEx,” I wrote a Regular Expression (RegEx) that copied the first and last date (in a variety of formats) found in a selection from a document or Web page. (I recently updated that RegEx to make it more robust.) That represented the first step in building an instant HowLongYearsMonthsDays.ahk script. The goal, as defined by the reader, included highlighting a section of text which bounds two dates, pressing a Hotkey combination, then immediately calculating and displaying the timespan, with no input GUI or calculate button delaying the process. As with many new scripts, I took pieces from other scripts and integrated them to produce a new one.

The chunks I used to produce the new script included:

  1. The Standard Clipboard Routine for capturing the selected text.
  2. The RegEx for identifying and capturing the target dates. (Discussed in my last blog.)
  3. The DateConvert() function found in the DateStampConvert.ahk script for formatting the parsed dates as the standard TimeDate stamp (YYYYMMDD).
  4. The HowLong() function found in the HowLongYearsMonthsDays.ahk script for calculating the timespan between the two TimeDate stamp parameters.
  5. A MsgBox for instantly displaying the results.
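
For readers who haven't seen it, the first and last pieces in that list can be sketched roughly as follows. The Hotkey combination is an arbitrary choice for this example, and the middle steps (the RegEx, DateConvert(), and HowLong()) are only marked with a comment rather than reproduced.

```autohotkey
^!h::                              ; hypothetical Hotkey: Ctrl+Alt+H
    OldClipboard := ClipboardAll   ; save whatever the clipboard holds now
    Clipboard := ""                ; empty it so ClipWait can detect the copy
    Send, ^c                       ; copy the selected text
    ClipWait, 1                    ; wait up to one second for the text
    if ErrorLevel
    {
        MsgBox, No text selected.
        Clipboard := OldClipboard
        return
    }
    Selection := Clipboard
    Clipboard := OldClipboard      ; restore the original contents
    OldClipboard := ""             ; free the memory
    ; Pieces 2 through 4 (date RegEx, DateConvert(), HowLong()) would go here.
    MsgBox, % "Captured text:`n" . Selection
return
```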

Extracting Multiple Dates from Text Using AutoHotkey RegEx

While Not Simple (and a Little Bit “Greedy”), the RegEx for Two-Date Parsing Only Requires One Selection

I received the following query from a reader:

Regular Expressions (RegEx) can be mysterious in any language.

Hi! Is it possible to highlight the entire date range (e.g. 16 March 2021 to 21 May 2021) when the Hotkey is triggered, feed it into the timespan ahk, and share the timespan as result?

Working with AutoHotkey Date Formats and Timespan Calculations

Yes, it is! The key to success is using Regular Expressions (RegEx) to parse both dates from the text simultaneously. Plus, you’ll want to streamline the process by eliminating the GUI and feeding the dates directly into the HowLong() function found in the HowLongYearsMonthsDays.ahk script. Implementing the instant calculation requires three steps:

  1. Writing a RegEx for identifying and capturing the target dates. (Discussed in this blog.)
  2. Using DateStampConvert.ahk code to format the parsed dates in the standard TimeDate stamp (YYYYMMDD).
  3. Calculating the timespan by running the HowLong() function using the two dates as parameters.

This approach should provide you with an instant timespan calculation between any two dates matched in a text selection.
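
To give a feel for the last step, here is a simplified sketch that assumes the two dates have already been converted to YYYYMMDD stamps. It uses the built-in EnvSub command to count days only; it is not the HowLong() function, which breaks the span into years, months, and days. The variable names are made up for this example.

```autohotkey
; Sketch only: stamps assumed to be already converted to YYYYMMDD.
StartStamp := "20210316"            ; 16 March 2021
EndStamp   := "20210521"            ; 21 May 2021

Span := EndStamp
EnvSub, Span, %StartStamp%, Days    ; Span = EndStamp minus StartStamp, in days
MsgBox, % Span . " days"            ; 66 days for these two dates
```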

I have not done all the work, but I have developed a RegEx which locates the first and last date in a text selection:

sx)(\b[[:alpha:]]+.?\s\d\d?,?\s\d?\d?\d\d|\b\d\d?[-\s]?[[:alpha:]]+[-\s]?\d\d\d?\d?|\b\d\d?[-/]\d\d?[-/]\d\d\d?\d?)
.*(\b[[:alpha:]]+.?\s\d\d?,?\s\d?\d?\d\d|\b\d\d?[-\s]?[[:alpha:]]+[-\s]?\d\d\d?\d?|\b\d\d?[-/]\d\d?[-/]\d\d\d?\d?)

Update March 26, 2021: \w in original RegEx changed to [[:alpha:]] to include only alphabetic characters.
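
One way to try the expression is with the RegExMatch() function, as in this minimal sketch. The sample text and variable names are invented for the example. Since both capture groups use the same date pattern, I build the needle from one shared string rather than typing it twice (with a plain [[:alpha:]] at the start of both groups).

```autohotkey
; Sketch only: sample text and variable names are made up for illustration.
SampleText := "The offer runs from 16 March 2021 to 21 May 2021 inclusive."

; The three date alternatives, shared by both capture groups.
DatePattern := "\b[[:alpha:]]+.?\s\d\d?,?\s\d?\d?\d\d"
             . "|\b\d\d?[-\s]?[[:alpha:]]+[-\s]?\d\d\d?\d?"
             . "|\b\d\d?[-/]\d\d?[-/]\d\d\d?\d?"

; First date, then a greedy .* so the second group lands on the last date.
Needle := "sx)(" . DatePattern . ").*(" . DatePattern . ")"

if (RegExMatch(SampleText, Needle, Date))
    MsgBox, % "First date: " . Date1 . "`nLast date: " . Date2
else
    MsgBox, No pair of dates found.
```

The captured subpatterns arrive in Date1 and Date2, ready for conversion to the YYYYMMDD format.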

While I don’t discuss every aspect of this RegEx here, I cover the important aspects of its construction. (I’ve written numerous blogs and an entire book discussing the basics of AutoHotkey Regular Expressions.)

Working with AutoHotkey Date Formats and Timespan Calculations

AutoHotkey Date and Time Calculations Require Special Handling—Check Out This List of How-to’s for Working with Dates

Over the years, I’ve written a number of blogs and many chapters about formatting and calculating dates, but the AutoHotkey apps that I think best demonstrate the full range of these capabilities are the scripts HowLongYearsMonthsDays.ahk and DateStampConvert.ahk.

When combined with the HowLongYearsMonthsDays.ahk script, the DateStampConvert.ahk script directly converts various ambiguous date formats selected in documents or Web pages into the standard datetime stamp format for inserting into the time-span calculating GUI pop-up.

Rather than using AutoHotkey commands for converting the standard datetime stamp into one of the numerous worldwide date formats, this conversion tool does the reverse and reformats selected dates into the universal datetime stamp.

Sending E-Mail and AutoHotkey

After Working Out the Kinks, AutoHotkey Sends Individual E-Mails Smoothly

The scourge of the Internet, Spam haunts our daily lives, whether in the form of phishing e-mails or unwanted phone calls. While we can never eliminate it, we minimize its impact through filtering and blocking. As a side effect of our efforts, we now commonly check our Spam folder when searching for an errant missive. Due to this problem, e-mail providers now add layers of protection to their servers, usually in the form of limits on the content we can transmit, message size, and the number of e-mails sent in a specific period of time.

Generally, we never think about these limitations because our local e-mail program restricts us enough to prevent our abusing the system. This confines us safely within the parameters of our e-mail provider. Only setting up our own e-mail server removes these restrictions.

It is important to understand that sending a mass email through your Gmail does have some limits (a total of more than 500 recipients in a single email and or more than 500 emails sent in a day). There is a maximum of email recipients a user can have in one single email, as well as a maximum amount of emails a user can send in 24 hours. It will not work by sending them at 11:50 pm and again at 12:05 am; the system requires a full 24 hours to pass.

How to Send Mass Email in Gmail – Few Easy Options

Adapting Web Scraping Routines to Changing Web Pages (AutoHotkey Tip)

When the Horoscope Web Page I Use for E-mails Altered Its Format, I Quickly Adjusted the Script

Last year, I wrote a script that e-mails a daily horoscope to my wife, “E-mail the Daily Horoscope to Yourself (AutoHotkey Trick).” Every morning she receives on her tablet an e-mail containing her daily horoscope. (I don’t send it to myself because I don’t want to know that much about my future—and I don’t listen to advice.) Recently, she pointed out that the e-mail started coming up blank. I immediately realized that the target Web site had changed its source code. (I’ve experienced the same problem with the SynonymLookup.ahk script.) I knew I could repair the Regular Expression (RegEx) in the broken script fairly quickly by following some basic steps:

  1. Access the source code for the target Web page and locate the key text.
  2. Copy the critical portion of the source code, including any unique HTML tags surrounding the target text, then paste the selection into Ryan’s RegEx Tester.
  3. Adjust the RegEx to include key unique tags surrounding the text, then extract the paragraph.
  4. In the script, replace the old RegEx found in the RegExMatch() function with the new one from Ryan’s RegEx Tester.
  5. Make any necessary adjustments to the RegEx—primarily escaping double quotation marks.
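
The last two steps might look something like this sketch. The page fragment, the class name, and the RegEx are all hypothetical stand-ins; the real horoscope page uses different tags.

```autohotkey
; Sketch only: the page fragment and tag names are assumed placeholders.
PageSource := "<div class=""horoscope""><p>Today favors careful planning.</p></div>"

; In the tester the RegEx reads:  <div class="horoscope"><p>(.*?)</p>
; Inside an AutoHotkey expression string, each " must be doubled to "".
if (RegExMatch(PageSource, "s)<div class=""horoscope""><p>(.*?)</p>", Match))
    MsgBox, % Match1                 ; the text captured between the tags
else
    MsgBox, Target text not found. The page format may have changed again.
```

Anchoring the expression on a unique tag just before the target text keeps the match from wandering into other parts of the page.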

The new horoscope e-mail script now includes more details and a link to the site.

Turn Web Addresses into Hotlinks for the AHK File Peek Window (AutoHotkey Tip)

Using the AutoHotkey GUI Link Control to Display AHK File Notes Allows You to Turn Web Links Hot

While perusing the notes in various .ahk scripts using the subroutine ReadNotes—which I had added to the AutoStartupControl.ahk script and discussed in my blog “Peeking at Notes Inside Auto-Startup AHK Script Files (AutoHotkey Startup Control)“—I noticed that many scripts included URLs to reference sites. A common practice used by scriptwriters when giving credit to another script or offering additional information about the source, these sites can offer valuable insight or resources. Usually, a Web address appears as a complete URL including the HTTP(S)://. I wondered, “Wouldn’t it be great to just click a link in the Notes window to load the page?”

Since we write AutoHotkey scripts in plain text, attempting to provide hotlinks inside the file using HTML code (or other techniques) doesn’t make much sense. I can open the file, copy the Web address, and paste it into my browser, but a hotlink in the Notes window would save a lot of time. I immediately switched from using the Text GUI control to the Link GUI control. By inserting the Link control into the AutoStartupControl Notes GUI window, I can turn any URL into a hotlink, as long as I use a Regular Expression (RegEx) to wrap it in the link tags.

The Link GUI control in the Notes window can turn any fully formed Web address into a hotlink for immediate access.

Using the Link GUI control comes with a couple of foibles, but, for the most part, it behaves in a manner very similar to the Text GUI control.
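
As a minimal sketch of the idea (the RegEx here is a simple one I put together for illustration, not necessarily the one in AutoStartupControl.ahk), wrapping each URL in anchor tags before handing the text to the Link control might look like this. The sample note text is made up.

```autohotkey
; Sketch only: sample notes and the URL RegEx are assumptions for illustration.
Notes := "Based on a script from https://www.autohotkey.com/docs/ (see the site for details)."

; Wrap each http(s) address in <a href="...">...</a>; $0 is the whole match.
Linked := RegExReplace(Notes, "i)https?://[^\s<>""()]+", "<a href=""$0"">$0</a>")

Gui, Add, Link, w400, %Linked%       ; the Link control opens the href when clicked
Gui, Show,, Notes
return

GuiClose:
ExitApp
```

This simple expression stops at the first space, quote, or parenthesis, so a URL followed directly by a period would take the period along with it.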