AutoHotkey Tip of the Week: Cull Web Links from a Web Page and Activate Each in a Pop-up GUI

This Time I Combine a Number of AutoHotkey Techniques to Put Active Links in a Graphical User Interface (GUI) Pop-up Saving Space with GUI Tabs

As I pondered the GetActiveBrowserURL() function from last time, I looked for more ways to use this unique function by reviewing Chapter Ten, “An App for Extracting Web Links from Web Pages,” from A Beginner’s Guide to Using Regular Expressions in AutoHotkey. By combining the function with the UrlDownloadToFile command and a couple of GUI controls (Link and Tab), I quickly wrote a script that collects all of the external links from a Web page into a pop-up window displaying a list of active links. Just click one to follow it.

The GUI contains 10 tabs—most with 20 hot links each scraped from the ComputorEdge Free AutoHotkey Scripts page.

This process included a number of learning points worth discussing:

  1. I found the GetActiveBrowserURL() function more reliable and robust than using the Standard Clipboard Routine.
  2. Depending upon the target Web site, you may need to tailor your Regular Expressions (RegEx) to produce the most useful results.
  3. The GUI Link control creates hot Internet links for immediate action.
  4. The GUI Tab control wraps long lists for scenarios where no scroll bars exist and column wrapping proves impractical.
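
As a rough illustration of points 3 and 4, here is a minimal sketch (AutoHotkey v1 syntax, with Tab3 requiring v1.1.23.00 or later) that places one Link control on each of two tabs. The tab names, URLs, and window title are made up for the example and do not come from the script discussed below.

```autohotkey
; Minimal sketch: two tabs, each holding one clickable Link control.
; Tab names, URLs, and the window title are illustrative assumptions.
Gui, Add, Tab3,, 1|2
Gui, Tab, 1    ; controls added next land on tab 1
Gui, Add, Link,, <a href="https://www.autohotkey.com/">AutoHotkey</a>
Gui, Tab, 2    ; controls added next land on tab 2
Gui, Add, Link,, <a href="https://www.autohotkey.com/docs/">AutoHotkey Documentation</a>
Gui, Show,, Link Demo
Return

GuiClose:
ExitApp
```

Because the Link controls have no g-label, clicking a link opens its href address directly.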

In this blog, I offer the script with a short discussion of the Regular Expressions (RegEx). In a future blog, I’ll discuss how to build a GUI pop-up window with an unknown number of hot Weblinks (almost 200 in the example above) while not letting it get out of hand. But first, my thoughts on the GetActiveBrowserURL() function.

The GetActiveBrowserURL() Function

My initial reaction to the GetActiveBrowserURL() function may have seemed a little dismissive. After all, in most Web browsers, the Web page address appears in the address bar. Anyone would find it simple enough to copy the URL or select it for use in the Standard Clipboard Routine. Yes, the function does save a couple of clicks, but does it improve the reliability of the script?


While I use the Standard Clipboard Routine in many of my scripts, it occasionally returns the “No text selected!” error, even though I can plainly see the highlighted text. When I retry the Hotkey combination, it usually works. I assume that these errors relate to script and computer speed. Strategic placement of the Sleep command may resolve some of these issues, but the error occurs so infrequently that I have not pursued it.
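
For readers unfamiliar with the routine, the following is a minimal sketch of a clipboard-capture hotkey in AutoHotkey v1. The hotkey (CTRL+ALT+C), the variable names, and the one-second ClipWait timeout are assumptions for illustration; a longer timeout is one possible alternative to adding Sleep commands, not a tested cure for the intermittent error.

```autohotkey
; Minimal sketch of a clipboard-capture routine (AutoHotkey v1).
; The hotkey and the 1-second timeout are illustrative assumptions.
^!c::
  OldClipboard := ClipboardAll   ; save the current clipboard contents
  Clipboard := ""                ; empty the clipboard so ClipWait can detect new data
  Send, ^c                       ; copy the selected text
  ClipWait, 1                    ; wait up to 1 second for the clipboard to contain data
  If ErrorLevel                  ; ErrorLevel is 1 when ClipWait times out
  {
    MsgBox, No text selected!
    Clipboard := OldClipboard    ; restore the original clipboard
    OldClipboard := ""
    Return
  }
  Selected := Clipboard          ; keep the captured text
  Clipboard := OldClipboard      ; restore the original clipboard
  OldClipboard := ""             ; free the memory held by the saved copy
  MsgBox, %Selected%
Return
```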

After using the GetActiveBrowserURL() function extensively while testing this latest script, I found it much more reliable than using the Windows Clipboard. Based upon this anecdotal experience, I plan to continue using the function whenever I need to capture the URL of the active page from a Web browser.

Note: Since I have placed the GetActiveBrowserURL() function in my function library, you won’t find the function (or an #Include directive) in the script below. See “Capture Web Page Addresses (URLs)” to obtain a copy of the function. Either place it at the end of the script, #Include it, or add it to your library. See “Guidelines for AutoHotkey Function Libraries.”

The WebLinkFind.ahk Script

Run the following script, open any Web page in one of the compatible browsers (most work), and use the Hotkey combination CTRL+WIN+ALT+L. The routine captures the URL (GetActiveBrowserURL() function required), downloads the page, uses the RegExMatch() function to extract external text Web links and the RegExReplace() function to clean up those links, and then places each in a GUI pop-up window as a hotlink, using GUI Link controls for activation and Tab controls to wrap long lists:

ModernBrowsers := "ApplicationFrameWindow,Chrome_WidgetWin_0
            ,Chrome_WidgetWin_1,Maxthon3Cls_MainFrm
            ,MozillaWindowClass,Slimjet_WidgetWin_1"
; LegacyBrowsers := "IEFrame,OperaWindowClass"

^#!l::
  sURL := GetActiveBrowserURL()
;  WinGetClass, sClass, A
  WinGetActiveTitle, WinTitle
  UrlDownloadToFile, %sURL%, URLtemp
  FileRead, URLtemp, URLtemp
  Next := 1
  LinkCount := 0
  TabCount := 1
  TabList := "1"
  Gui, Add, Tab3,, 1
  Loop
  {
    FoundPos := RegExMatch(URLtemp
             , "<a.+?href=""(https?.+?)"".*?>(.+?)</a>" 
             , Link, Next)
    If FoundPos = 0
      Break
    Else
      {
        Link2 := RegExReplace(Link2, "<.+?>")
        If (Link2 != "") 
        {
          Gui, Add, Link,, <a href="%Link1%">%Link2%</a>
          LinkCount := LinkCount + 1
          If LinkCount = 20
          {
            TabCount := TabCount + 1
            TabList  := TabList . "|" . TabCount
            GuiControl, ,SysTabControl321, %TabCount%
            Gui,Tab, %TabCount%
            LinkCount := 0
          }
        }
        LinkList := LinkList . Link2 . "`n" . Link1 .  "`n`n"
        Next := FoundPos + StrLen(Link)
      }
  }
  If (LinkList = "")
    MsgBox No External Links Found!
  ; Delete old file and write new file
  FileDelete, LinkText
  FileAppend, %LinkList%, LinkText
  Gui, Show, , %WinTitle%
Return

GuiClose:
Gui, Destroy
Return

May 11, 2020 Update: I commented out the LegacyBrowsers and WinGetClass lines above (the ones now beginning with a semicolon) as most likely unnecessary for the GetActiveBrowserURL() function to operate properly.

Note: The first time you run the script, it may seem a little slow, but subsequent tests from the same load display results more quickly.

The Regular Expressions (RegEx)

The script uses the RegExMatch() function to extract the external Weblinks:

FoundPos := RegExMatch(URLtemp
               , "<a.+?href=""(https?.+?)"".*?>(.+?)</a>" 
               , Link, Next)

The expression identifies the location of the HTML link tag. By using href and https? as markers, the RegEx specifies external links, rather than jumps within the page or abbreviated links to other pages on the same site. After extracting a link, the Loop sets the Next variable to FoundPos plus StrLen(Link) to search for the ensuing link, if any. The subpattern Link1 stores the link URL while Link2 saves the link text.
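
To see the matching loop in isolation, here is a small sketch that runs the same RegEx over a made-up HTML string. The ExtractLinks() function name and the sample HTML are assumptions for illustration only. Note that the internal #top link is not reported: the lazy .+? skips past it until it reaches an href beginning with http.

```autohotkey
; Sketch: run the script's RegEx over a small, made-up HTML string (AutoHotkey v1).
; ExtractLinks() and the sample HTML are illustrative assumptions.
ExtractLinks(Html)
{
  Result := ""
  Next := 1
  Loop
  {
    FoundPos := RegExMatch(Html
             , "<a.+?href=""(https?.+?)"".*?>(.+?)</a>"
             , Link, Next)
    If (FoundPos = 0)
      Break
    Result .= Link2 . " -> " . Link1 . "`n"   ; link text, then link URL
    Next := FoundPos + StrLen(Link)           ; resume searching after this match
  }
  Return Result
}

Sample := "<p><a href=""#top"">Top</a> <a href=""https://www.autohotkey.com/"">AutoHotkey</a></p>"
MsgBox, % ExtractLinks(Sample)
```

With this sample, the message box should list only the AutoHotkey link, since the first anchor points to a location inside the page.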

Note: You can find a detailed discussion of the above RegEx in Chapter Ten of the book A Beginner’s Guide to Using Regular Expressions in AutoHotkey.

This version of the routine saves only the visible text for each link by removing any included HTML code. The script filters out any image tags or other non-text objects. The following RegExReplace() function removes the HTML tags from the subpattern Link2:

RegExReplace(Link2, "<.+?>")

This excludes images (and other non-essential HTML code) associated with links from the results.
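
A short sketch of the effect, using two made-up strings in place of Link2: one link whose text is wrapped in a formatting tag and one link that contains only an image. The variable names and sample HTML are assumptions for illustration.

```autohotkey
; Sketch: strip HTML tags from link text (AutoHotkey v1). Sample strings are made up.
TextLink  := RegExReplace("<strong>Free Scripts</strong>", "<.+?>")        ; leaves the visible text
ImageLink := RegExReplace("<img src=""logo.png"" alt=""Logo"">", "<.+?>")  ; leaves an empty string
MsgBox, % "Text link: [" . TextLink . "]`nImage link: [" . ImageLink . "]"
```

The image-only link reduces to an empty string, which is why the script's If (Link2 != "") test keeps it out of the GUI.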

If you need a different outcome in your script, check the downloaded file (URLtemp) for code specifics to guide your modification of the Regular Expressions. For example, with a few exceptions, the Weblinks (href) found on the AutoHotkey documentation pages do not include the HTTPS:// prefix.

Internal page jump:

<a href="#Link">Link</a>

Page on the same server site:

<a href="Gui.htm#label">g-label</a>

Relative page location:

<a href="../Variables.htm#GuiEvent">A_GuiEvent</a>

If you want to create working links in the GUI, you need to add the domain name (and possibly the page) to the addresses as appropriate.

For internal page jumps, add the site address and page:

<a href="https://www.autohotkey.com/docs/commands/Gui.htm#Link">Link</a>

For a page on the same server site, add the site address and folder:

<a href="https://www.autohotkey.com/docs/commands/Gui.htm#label">g-label</a>

For relative page locations, resolve the double dots by moving up one folder from the current page (here, from docs/commands/ to docs/):

<a href="https://www.autohotkey.com/docs/Variables.htm#GuiEvent">A_GuiEvent</a>

This requires both a modification to the RegEx and conditional manipulation of the results. The current script will never display any of those internal autohotkey.com links.
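
One way to approach the conditional manipulation is sketched below. MakeAbsolute() is a hypothetical helper, not part of the original script, and it covers only the three cases listed above; it does not handle root-relative ("/...") or protocol-relative ("//...") addresses. It assumes you pass the href value and the page address returned by GetActiveBrowserURL(). Following standard URL resolution, each "../" climbs one folder, so ../Variables.htm on a page in docs/commands/ resolves into docs/.

```autohotkey
; Sketch: convert a site-relative href into a full URL (AutoHotkey v1).
; MakeAbsolute() is a hypothetical helper covering only the three cases above.
MakeAbsolute(Href, PageURL)
{
  If (RegExMatch(Href, "i)^https?://"))        ; already a full URL
    Return Href
  PageURL := RegExReplace(PageURL, "#.*$")     ; drop any fragment from the page address
  If (SubStr(Href, 1, 1) = "#")                ; internal page jump
    Return PageURL . Href
  Folder := RegExReplace(PageURL, "[^/]*$")    ; keep everything up to the last slash
  While (SubStr(Href, 1, 3) = "../")           ; each "../" climbs one folder
  {
    Href := SubStr(Href, 4)
    Folder := RegExReplace(Folder, "[^/]+/$")
  }
  Return Folder . Href
}

Page := "https://www.autohotkey.com/docs/commands/Gui.htm"
MsgBox, % MakeAbsolute("#Link", Page) . "`n"
        . MakeAbsolute("Gui.htm#label", Page) . "`n"
        . MakeAbsolute("../Variables.htm#GuiEvent", Page)
```

To feed such links into the helper, the RegEx would also need to accept href values that do not start with http, for example by loosening the (https?.+?) subpattern.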

Next time, I take a closer look at the GUI-building features which insert an undetermined number of links while adding tabs to accommodate the list without expanding beyond the computer screen.


jack

