The Trouble with Scripts Which Use Web Page Information, Plus AutoHotkey Tools for Downloading Web Page Source Code
While it doesn’t usually happen this fast, AutoHotkey scripts which depend on downloaded Web page data can go bad at any time. Last week, I discovered a clever index on the AutoHotkey site which allowed quick access to commands and other documentation. Now, I find that the site has changed and only a couple of the previous searches work.
I don’t know if my highlighting the feature prompted the change by the Webmasters or whether it was already in the works. In any case, you can forget about my last blog—at least from the point of view of writing an AutoHotkey quick reference script.
I think that the indexing may have been added as a tool for the site administrators and was never intended for public consumption. Maybe I inadvertently stumbled upon a security issue. Who knows? In any case, what looked like a clever index only a week ago, one which could make life easier for every AutoHotkey script writer, appears to be blocked now.
Web Sites Change
When Web designers decide to change a Web page, they may defeat the purpose of some site-dependent AutoHotkey scripts—at least temporarily. Since these types of scripts rely upon parsing the page source code, any change to that code potentially renders the script inoperable.
That’s probably what happened to the now broken Dictionary.ahk script. No one (including me) has taken the time to fix it. To be fair, Dictionary.ahk uses Regular Expressions (RegEx), so it’s not necessarily a quick fix. On the upside, fixing broken scripts usually takes less time than writing the original. In most cases, minor modifications to the script code or a RegEx will do the job.
I have not given up on the idea of using AutoHotkey.com for building quick reference scripts. I have other thoughts on how to use the site for more tools. These won’t depend on unique indexing features of the site itself which can be turned on and off. I may need to build my own indexes, but I think I can write a script which pulls the URLs I need, then parses specific types of information.
Tools for Accessing Web Page Data
The two approaches available for downloading Web page source code are the URLDownloadToFile command and the download-text-to-a-variable example which uses the ComObjCreate() function (found on the same AutoHotkey page).
In the first case, URLDownloadToFile saves the source code for a Web page directly to a file. To parse the code, the file must be read into a variable with the FileRead command. In the second ComObjCreate() example, AutoHotkey bypasses the file and saves the source code directly to a variable. Which approach you use depends on how you plan to implement your reference tools. The URLDownloadToFile command requires less code, but the slightly more complex ComObjCreate() function probably offers greater execution speed.
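The two approaches described above can be sketched roughly as follows, in AutoHotkey v1 syntax. This is only an illustration, not a finished tool: the URL is a placeholder, the temporary file name is made up, and the variable names (url, tempFile, pageSource, whr) are my own assumptions.

```autohotkey
; Placeholder address (an assumption); substitute the page you want to parse.
url := "https://www.example.com/"

; --- Approach 1: save the page to a file, then read the file into a variable ---
tempFile := A_Temp . "\page_source.htm"   ; hypothetical temporary file name
URLDownloadToFile, %url%, %tempFile%
if ErrorLevel                             ; ErrorLevel is 1 if the download failed
{
    MsgBox, The download failed.
    ExitApp
}
FileRead, pageSource, %tempFile%          ; load the saved source code into a variable
FileDelete, %tempFile%                    ; clean up the temporary file

; --- Approach 2: skip the file and load the source directly into a variable ---
whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", url, true)                ; true = asynchronous request
whr.Send()
whr.WaitForResponse()                     ; wait here until the response arrives
pageSource := whr.ResponseText            ; the page source as text

MsgBox, % "Downloaded " . StrLen(pageSource) . " characters."
```

Either way, the page source ends up in the pageSource variable, ready for parsing. In a real script you would pick one approach rather than running both.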
This minor AutoHotkey.com reference setback has me thinking about other ways to build tools with the site data. While I will concentrate on the AutoHotkey site in the coming weeks, I plan to demonstrate techniques for accessing Web data with AutoHotkey scripts. They should work just as well for your favorite sites. The tools I build will use one of the two site download approaches mentioned above, plus Regular Expressions (RegEx) and/or other pertinent AutoHotkey commands (e.g. StringSplit). While there are a number of other ways to parse data, Regular Expressions are ideal for the uneven Web environment. For anyone who wants to add more power to their AutoHotkey scripts, I recommend learning how to use RegEx. (See this Introduction to Regular Expressions (RegEx) in AutoHotkey.)
The first topic addresses the issue of case-sensitive URLs. If the address entered does not conform exactly to the syntax of the Web page URL, it won’t work. The next blog covers how to ensure the proper case (e.g. IfWinActive.htm with embedded caps) for accessing AutoHotkey command Web pages.
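As a small taste of the RegEx parsing mentioned above, here is a minimal sketch in AutoHotkey v1 syntax. The HTML in pageSource is a made-up sample standing in for downloaded source code, and the patterns are illustrative assumptions. Real pages are messier, and any change to a page's markup can break patterns like these.

```autohotkey
; Made-up sample source (an assumption); in practice this comes from a download.
pageSource =
(
<html><head><title>Sample Page</title></head>
<body>
<a href="First.htm">First</a>
<a href="Second.htm">Second</a>
</body></html>
)

; Grab the text between the <title> tags.
; Options: i = case-insensitive, s = the dot also matches newlines.
if RegExMatch(pageSource, "is)<title>(.*?)</title>", title)
    MsgBox, % "Page title: " . title1     ; title1 holds the first subpattern

; Collect every href value by repeating the search from the end of the last match.
links := ""
pos := 1
while pos := RegExMatch(pageSource, "i)href=""([^""]+)""", link, pos)
{
    links .= link1 . "`n"                 ; link1 holds the captured URL
    pos += StrLen(link)                   ; move past the text just matched
}
MsgBox, % "Links found:`n" . links
```

The same loop works on a variable filled by either download approach. Only the patterns need to change to suit the target page.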
* * *
One thought on “The Problem with Accessing Web Data with AutoHotkey Scripts”
For grabbing static text, RegEx is often regarded as overkill as well as unnecessarily complex. I’m very much an AHK beginner, but I was still able to use WBGet to ‘scrape’ the contents of a table embedded in a forum’s web page. Then, instead of RegEx, I used ‘if contains ,,’ to check for two words and an IP address within the table.
Hope this helps…