Strip HTML Tags from Text (AutoHotkey Quick Tip)

Use This AutoHotkey Trick to Remove HTML Code from Any Text

Last time in “Alternative Web Page HTML Download Techniques (AutoHotkey Tip),” I mentioned how I updated the GooglePhraseFix.ahk script by aaston86 to get it working again and make it a little more robust. The script uses a Google search page to autocorrect common expressions and people’s names. (It only works if Google thinks you may have made an error.)

For example, if you type “Ralph Nadal” when you mean the Spanish tennis player, selecting the name and pressing the CTRL+ALT+G Hotkey combination changes it to “Rafael Nadal.” It only works for obvious possibilities, but it may come in handy for correcting hard-to-remember spellings (e.g., “Jocavic” turns into “Djokovic”).

I added the phrase “Showing results for ” to the script as a search key in the Google results page. Google includes the phrase when it senses that you may have made a mistake. The original script used the StringReplace command to remove some HTML code and convert any HTML-encoded apostrophes (&#39;) back into plain apostrophes ('):

   StringReplace, clipboard, match2, <b><i>,, All
   StringReplace, clipboard, clipboard, </i></b>,, All
   StringReplace, clipboard, clipboard, &#39;,', All

The StringReplace command can work for unchanging HTML tags, but you need to add the command for each tag (or set of tags). By using the RegExReplace() function, you can remove all HTML tags with one command.

HTML Tag Stripping Regular Expressions (RegEx) Using the RegExReplace() Function

The selected section of the Google page now includes a lot more HTML code than merely italics <i> and bold <b>. Using the following expression removes it all:

   var := RegExReplace(var,"<.+?>")

You don’t need to know anything about AutoHotkey Regular Expressions (RegEx) to use the above RegExReplace() function. It removes every run of text in var bounded by angle brackets (< … >). Because the third parameter (the replacement text) is omitted, each match is replaced with nothing.
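
As a quick illustration of that one-liner, here is a minimal sketch for AutoHotkey v1.1. The sample string is made up for the demonstration and is not taken from the GooglePhraseFix.ahk script:

```autohotkey
; Made-up sample resembling a fragment of a Google results page.
sample := "Showing results for <b><i>Rafael Nadal</i></b>"

; The lazy pattern <.+?> matches each tag separately: <b>, <i>, </i>, </b>.
clean := RegExReplace(sample, "<.+?>")

MsgBox, %clean%   ; displays: Showing results for Rafael Nadal
```

The question mark matters. Without it, the greedy pattern <.+> would match from the first < to the last > and take the name along with the tags.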

Suppose you want to copy all the text from a Web page to a file. You could use the URLDownloadToFile command to save the page source code, read that file into a variable, then execute the above RegExReplace() function to remove all of the HTML tags. What remains is the text that sat between the tags.
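
The download-and-strip idea might look something like the following sketch (AutoHotkey v1.1). The URL and file names are placeholders, and the extra script/style cleanup and the s) option are my additions for the sketch rather than part of the original tip:

```autohotkey
; Placeholder URL and file names; substitute your own.
url      := "https://www.example.com/"
tempFile := A_Temp . "\page_source.html"
outFile  := A_ScriptDir . "\page_text.txt"

UrlDownloadToFile, %url%, %tempFile%
if ErrorLevel
{
    MsgBox, Download failed for %url%
    ExitApp
}

FileRead, html, %tempFile%

; Optional extra step (an assumption of this sketch): drop script and style
; blocks first, since their contents are code rather than readable text.
html := RegExReplace(html, "is)<(script|style)\b.*?</\1>")

; The s) option lets the dot match newlines, so tags split across lines also go.
text := RegExReplace(html, "s)<.+?>")

FileDelete, %outFile%            ; remove any earlier copy before writing
FileAppend, %text%, %outFile%
```

HTML entities such as &#39; survive this step and would still need their own replacements.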

jack

Alternative Web Page HTML Download Techniques (AutoHotkey Tip)

When One Method for Downloading HTML Code Breaks, Try the Alternative AutoHotkey Command

I could quickly get the latitude and longitude for any location with a Google search in a browser. But when I attempted to download that page using the GetWebPage() function code taken from the AutoHotkey documentation (shown below in the first script), Google stopped me. The Google server denied the download attempt of the coordinates for San Diego with the following statement:

403. That’s an error.

Your client does not have permission to get URL /search?q=latitude+longitude+san+diego+decimal&rlz=1C1GEWG_enUS953US953 from this server.

Thwarted by Google again (see my “Switched IPFind.ahk to OpenStreetMap.org for Reliable AutoHotkey GUI Map Embedding” blog), I wanted to find an alternative source for the same information.
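
A refusal like this one can be detected in a script by checking the HTTP status before using the response. The following is only a sketch built on the WinHttpRequest COM object (AutoHotkey v1.1); it is not the GetWebPage() code mentioned above, and the URL is a placeholder:

```autohotkey
; Placeholder URL, not the Google search shown above.
url := "https://www.example.com/"

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", url, true)       ; true = asynchronous request
whr.Send()
whr.WaitForResponse()

if (whr.Status = 200)
    html := whr.ResponseText
else
{
    ; A blocked request such as the one quoted above would report 403 here.
    MsgBox, % "Download refused. HTTP status: " . whr.Status . " " . whr.StatusText
    html := ""
}
```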

I searched for an unblocked Web page providing the latitude and longitude. I didn’t have to look very far. (The site name appears in the AutoHotkey snippet below.) I wrote the following test code for proof of function:

Embed Google Maps in an AutoHotkey GUI (No API Required!)

While Not Commonly Advertised, You Can Add a Google Map (YouTube Video or Weather Forecast) to a GUI Without Using an API

May 26, 2021, Alert: Wow! That was fast! Google has already disabled this iFrame map embedding technique…at least for Google Maps. Oh, well, I’ve already reverted to using the original IPFindMap.ahk script using OpenStreetMap.org (“Use ActiveX Control to Embed World Maps in AutoHotkey GUI” May 10, 2021). The technique remains valid. I’ll offer another iFrame embedding application soon—this time probably with a weather forecast.

In my blog “Build a Barebones Web Browser Using the AutoHotkey ActiveX GUI Control,” I discuss a method for viewing Web pages using an ActiveX GUI control. It works fine for my Free AutoHotkey Scripts page, but as soon as I started viewing other common pages such as Google Maps, I ran into problems. If you want to embed a map in your AutoHotkey GUI, then Google wants you to sign up for an Application Programming Interface (API) using a valid credit card. When you register, Google effectively gives you the API key free since it offers a $200 monthly credit for each account. Google wants the credit card number just in case…

As part of this initiative, Google has advised that from 16th July 2018, websites using Google Maps are now required to have a valid API key and a linked Google Cloud Platform Account with enabled credit card billing.

Changes to Google Maps API and Google’s New Billing Structure

You can access an extensive amount of information when using APIs and, for many people, that’s the way to go. Although I’ve never ventured into using APIs with AutoHotkey, Joe Glines has published an extensive amount of information—including the tutorial “Connecting to API / Web services.” I’ve considered digging into the topic and may do so in the future.

For now, I plan to demonstrate a trick for displaying a Google Map in an AutoHotkey GUI without signing up for an API. The trick may prove useful in other apps such as playing YouTube videos or embedding weather forecasts in AutoHotkey GUIs without all the extra clutter.

Adding Web Links to the AutoHotkey IPFind.ahk Script

While Fixing the IPFind.ahk Script for Listing the Geographic Location of an IP Address, I Added Links for the IP Identification Site and OpenStreetMap

Occasionally, Web page scraping apps fail (or display strange results) due to changes in source page data formats. It usually only takes a few minutes to review the code and make the necessary RegEx adjustments to restore acceptable results. This time while repairing the IPFind.ahk script, I noticed that the Web page source code also offered IP longitude and latitude. I thought, “Why not add a map link to the display window for anyone curious about its geographic position?” The IP site (which I also added as a link) includes a map, but I wanted one with greater detail.

An IP address site can provide a great deal of information—including approximate longitude and latitude.
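
To give a rough idea of the technique, the sketch below (AutoHotkey v1.1) pulls two coordinates out of a piece of text and turns them into an OpenStreetMap link in a Link GUI control. The sample text and the RegEx are invented for illustration; the real IP site's source code and the actual IPFind.ahk expressions will differ:

```autohotkey
; Made-up sample standing in for the downloaded page source.
source := "Latitude: 32.7157 Longitude: -117.1611"

; Hypothetical pattern: capture two signed decimal numbers.
if RegExMatch(source, "i)Latitude:\s*(-?\d+\.\d+).*?Longitude:\s*(-?\d+\.\d+)", coord)
{
    ; coord1 = latitude, coord2 = longitude
    mapUrl := "https://www.openstreetmap.org/?mlat=" . coord1 . "&mlon=" . coord2
            . "#map=12/" . coord1 . "/" . coord2
    Gui, Add, Link,, <a href="%mapUrl%">Open map</a>
    Gui, Show,, IP Location
}
return

GuiClose:
ExitApp
```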

Note: I recently discussed the Link GUI control in “Turn Web Addresses into Hotlinks for the AHK File Peek Window.”

Turn Web Addresses into Hotlinks for the AHK File Peek Window (AutoHotkey Tip)

Using the AutoHotkey GUI Link Control to Display AHK File Notes Allows You to Turn Web Links Hot

While perusing the notes in various .ahk scripts using the subroutine ReadNotes (which I had added to the AutoStartupControl.ahk script and discussed in my blog “Peeking at Notes Inside Auto-Startup AHK Script Files (AutoHotkey Startup Control)”), I noticed that many scripts included URLs to reference sites. Scriptwriters commonly add these when giving credit to another script or offering additional information about the source, and the sites can offer valuable insight or resources. Usually, a Web address appears as a complete URL including the HTTP(S)://. I wondered, “Wouldn’t it be great to just click a link in the Notes window to load the page?”

Since we write AutoHotkey scripts in plain text, attempting to provide hotlinks inside the file using HTML code (or other techniques) doesn’t make much sense. I can open the file, copy the Web address, and paste it into my browser, but a hotlink in the Notes window would save a lot of time. I immediately switched from using the Text GUI control to the Link GUI control. By inserting the Link control into the AutoStartupControl Notes GUI window, I can turn any URL into a hotlink, as long as I use a Regular Expression (RegEx) to wrap it in link tags.

The Link GUI control in the Notes window can turn any fully formed Web address into a hotlink for immediate access.

Using the Link GUI control comes with a couple of foibles, but, for the most part, it behaves in a manner very similar to the Text GUI control.
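
The general idea can be sketched as follows (AutoHotkey v1.1). The notes text is a made-up sample, and the RegEx is a simple one written for this illustration, not necessarily the expression used in AutoStartupControl.ahk:

```autohotkey
; Made-up sample of a note line containing a full URL.
notes := "Credit: see https://www.autohotkey.com/docs/ for details."

; Wrap anything starting with http:// or https:// in the <a href="..."> tags
; that the Link control understands. $1 is the captured URL.
linked := RegExReplace(notes, "i)(https?://[^\s""<>]+)", "<a href=""$1"">$1</a>")

Gui, Add, Link, w400, %linked%
Gui, Show,, Notes
return

GuiClose:
ExitApp
```

This simple pattern stops at whitespace, so punctuation typed directly after a URL (a trailing period, for example) would become part of the link.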

Moving Forward with AutoHotkey Chrome.ahk Tools

My Last Three Blogs Offer a Basic Introduction to Installing and Running the Chrome.ahk Web Page Automation Tools—Find More Resources for these Useful Functions

In my earlier blogs, I posted a beginner’s introduction to GeekDude’s Chrome.ahk Web page automation tools:

I wrote these columns to bridge the gap between the novice-level user and the videos produced by GeekDude and Joe Glines—even causing me to take time to allow the techniques to ferment in my brainpan. While the videos provide excellent information, they assume a certain level of user experience. Hopefully, my blogs provide enough insight to allow new users to:

  1. Develop a basic understanding of how Chrome.ahk functions facilitate the completion of Web forms while highlighting the complications from HTML and Javascript code.
  2. Make a decision about whether they will continue to pursue these Web automation techniques.

After this reference blog, unless someone asks me specific questions about Chrome.ahk, I intend to move on to other topics.

Using Chrome.ahk AutoHotkey Tools to Automatically Fill-in Web Forms (Part 2)

How to Write Javascript Code for Web Page Automation Using AutoHotkey Chrome.ahk Tools—Digging into the Quirks of Javascript

In my last blog (“Using Chrome.ahk AutoHotkey Tools to Automatically Fill-in Web Forms (Part 1)”), I discussed how to reveal Web page control names in the source code. This time, I explain how to use those control names to write Javascript expressions for inserting data into text fields and activating menu items and buttons.

Javascript Code

HTML code creates the Web page structure, including editing fields, menus, and buttons. We use Javascript commands to initiate action within the static HTML Web page. The functions found in the Chrome.ahk AutoHotkey tools use Javascript expressions to send commands to the active Web page by channeling those directives through a Chrome debugger channel. You must use Javascript to communicate with the Web page.
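
A minimal sketch of that flow follows, based on my understanding of the Chrome.ahk library's published examples (AutoHotkey v1.1). The URL and the control ids fname and submit are hypothetical, so substitute the names found in your own target page:

```autohotkey
#Include Chrome.ahk    ; assumes Chrome.ahk sits in the script folder

FileCreateDir, ChromeProfile                 ; profile folder for the debug session
ChromeInst := new Chrome("ChromeProfile")    ; launches Chrome with debugging enabled
PageInst := ChromeInst.GetPage()             ; connects to the open tab

; Placeholder URL for a page containing a form.
PageInst.Call("Page.navigate", {"url": "https://www.example.com/form.html"})
PageInst.WaitForLoad()

; Build the Javascript expression as an AutoHotkey string, then send it.
firstName := "Jack"
js := "document.getElementById('fname').value = '" . firstName . "';"
PageInst.Evaluate(js)

; Activate a button the same way (hypothetical id).
PageInst.Evaluate("document.getElementById('submit').click();")

PageInst.Disconnect()
```

If an id does not exist on the page, the Javascript expression fails and Evaluate() reports the error, so checking the control names first pays off.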

Using Chrome.ahk AutoHotkey Tools to Automatically Fill-in Web Forms (Part 1)

Analyze Web Page HTML Code to Find Control Names and/or IDs for Writing Javascript Expressions for Automating Web Forms Using the Chrome.ahk Library

Logging into online accounts ranks as one of the most common motivations for AutoHotkey users automating Web pages. Using screen-level AutoHotkey Web page automation can get cumbersome. For more reliable and accurate solutions consider source-level automation using the AutoHotkey Chrome.ahk Library of tools. However, before automating any Web forms with these functions, you need to accomplish two tasks:

  1. Analyze the Web page to identify the name or id of each target HTML control (e.g., text fields and buttons).
  2. Write Javascript action expressions for use with the Chrome.ahk library.

In this blog, I introduce how to identify the controls required to fill in a Web form. In my next blog, I’ll address the more complex task of writing the Javascript expressions for Web page input.

Installing Chrome.ahk AutoHotkey Web Page Automation Tools

Although It Comes with a Bit of a Learning Curve, the Chrome.ahk AutoHotkey Library Offers More Precise Source-Level Web Page Automation

(Updated November 5, 2020) Last time, I highlighted the limited techniques available for automating Web pages at the screen-level. The Web browser insulates the user from the underlying HTML and Javascript page code, preventing the use of control names for automating Web pages.

This time, I introduce source-level Web page automation by running a short test script after installing a set of Google Chrome AutoHotkey source-level Web page automation tools: GeekDude’s Chrome.ahk Library. I’ve set up a test page called “Jack’s AutoHotkey Chrome Test Page” for a quick trial of the tools. (When initially viewing the test Web page, you should see a set of three empty input fields: First Name, Last Name, and Street Address.) In this blog, I discuss how to install and set up the Chrome.ahk tools, then test the setup by running a sample AutoHotkey script that automatically fills in the three input fields:

The test script inserts data into the three input fields, then displays a Chrome message box displaying, “Hello World!”

If you can get this test script running with your Chrome browser, then a totally new world of Web page automation opens up.

Fixing AutoHotkey Web Lookup Scripts

If a Web Page Changes Format, the Data-Extracting Regular Expressions (RegEx) May Need Updating—Fixing the SynonymLookup.ahk Script

When writing a blog, I tend to use certain words over and over again. While rereading early versions, these redundant words jump out at me. Not only do they point out my limited vocabulary, but the repetitions tend to render my blogs a little more starchy and boring. That’s why I often resort to my always-loaded SynonymLookup.ahk script. This app saves time while making me look a little smarter.

The current version of the SynonymLookup.ahk script lists more possibilities and marks antonyms (most of the time) with a caution sign.

After I discover a duplicated word, I highlight it, then hit the Ctrl+Alt+L Hotkey combination. A menu of possible replacements pops up. I click on the one that best fits my intent and the new term immediately displaces the original text. I habitually use this script.

When the SynonymLookup.ahk Script Breaks

Over the life of the script, I’ve encountered the menu shown at right a couple of times. This menu pops up whenever the script downloads and scans the source code 10 times without getting a RegEx hit—usually the result of code changes made by the source page Webmaster.
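
The retry behavior described above can be sketched like this (AutoHotkey v1.1). The URL and the RegEx are placeholders invented for the illustration; they are not the ones SynonymLookup.ahk actually uses:

```autohotkey
; Placeholder URL and hypothetical pattern.
url     := "https://www.example.com/lookup?word=test"
pattern := "s)<li class=""synonym"">(.+?)</li>"
found   := false

Loop, 10
{
    UrlDownloadToFile, %url%, %A_Temp%\lookup.html
    if ErrorLevel                ; download failed, try again
        continue
    FileRead, source, %A_Temp%\lookup.html
    if RegExMatch(source, pattern, match)
    {
        found := true
        break
    }
    Sleep, 500                   ; short pause before the next attempt
}

if found
    MsgBox, First match: %match1%
else
    MsgBox, No RegEx hit after 10 attempts. The page format may have changed.
```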
