When Automating Tasks, Browser Web Pages Present Special Problems
Due to the nature of the Internet and the function of Web browsers, AutoHotkey users encounter particular issues when automating Web pages. AutoHotkey GUIs (Graphical User Interfaces) and many older Windows programs allow direct access to controls for automation. Newer apps tend to use ribbon menus which usually include accessible Alt+key shortcuts. However, Web browsers contain built-in protections which insulate users and make controlling operations more opaque. The average Web surfer only has access to what appears on the screen. Getting to the inner workings of Web browsers requires special tools.

When it comes to automating actions in a Web page with AutoHotkey you’ll find two options:
- Screen-Level Automation: Navigating and executing the on-screen Web page controls with basic AutoHotkey commands (e.g. Click, SendInput, etc.).
- Source-Level Automation: Accessing the page source code (HTML and JavaScript) to actively manipulate Web browser action with specially designed tools.
In either case, you need to do it through your Web browser. The Web browser (e.g. Internet Explorer, Google Chrome, Safari, Firefox, etc.) acts as a window into the Internet while hiding its inner workings from view.
Screen-Level Web Page Automation
By far, the most common form of AutoHotkey Web page automation involves working directly with the Web browser on the computer screen. Although often awkward, this approach directly interacts with the page using fundamental AutoHotkey tools.
On the downside, since the Web page view does not offer control names for direct access, we must resort to crude techniques. In order to engage Web page input fields and buttons, AutoHotkey needs to know where they sit in the browser window. That means the scriptwriter must study the page—often with Window Spy (right-click on an AutoHotkey .ahk icon in System Tray and select Window Spy from the menu)—returning pixel coordinates of control locations on the page. Once identified in the window layout, the script can include those coordinates for input field and button access.
Source-Level Web Page Automation
A more precise method for interacting with Web pages with AutoHotkey involves accessing the code running underneath the Web browser (the source delivered by the Web server): HTML code which sets up the static text and controls (e.g. input fields, buttons, links, etc.) and interactive JavaScript which can enact Web page changes after delivery to the Web browser. While more robust, this type of source-level automation involves more complications than screen-level automation.
Source-level automation often depends upon the specific browser. For example, you can use the Component Object Model (COM) tools to automate Internet Explorer, but that won’t work with non-Microsoft browsers. Each browser requires its own set of tools to access a backdoor into the source code.
You’ll find third-party programs such as Selenium and iMacros for automating Web pages. However, I don’t plan to discuss these options (nor do I know enough about them to recommend either one). In future blogs, I’ll highlight a method for working at the source-level using only AutoHotkey tools (and a little JavaScript code).
Don’t get me wrong. Regardless of how you approach source-level Web page automation, you’ll encounter complications and a learning curve. You’ll need to learn a little HTML and JavaScript—although, nothing too complicated. For now, I highlight some of the commands you can use to do screen-level Web page automation.
AutoHotkey Screen-Level Web Page Automation Problem
The protections built into Web browsers present special problems for screen-level AutoHotkey automation. At a minimum, most other Windows programs offer ribbon menus or traditional menu bars; include right-click context menus; execute Alt+key shortcuts; or, in traditional Windows apps, use control names identifiable by Window Spy. Web browsers offer no such tools. On-screen Web pages respond to key combinations for navigation and page search, but these only allow AutoHotkey limited assistance in browser automation.
With restricted access to the Web browser on-screen controls, we resort to brute force techniques for AutoHotkey automation. These include:
- The Click command for locating the cursor on the page.
- The Send command (usually SendInput) for inserting data.
- Sending the Tab character to jump to the next field or control.
- The Sleep command (or possibly the MsgBox command) to pause the script to account for time lags in Web page loading and in-page actions.
- Possibly use the ImageSearch command to locate icons, fields, or buttons in a Web page.
“Chapter Two: Use AutoHotkey to Instantly Insert Your E-Mail Address into Web Forms” of the free book AutoHotkey Tricks You Ought To Do With Windows (Sixth Edition) offers a couple of simple examples of screen-level Web page automation. Plus, while a number of my books discuss the Click and SendInput commands, the book Jack’s Motley Assortment of AutoHotkey Tips includes a section on Windows automation—although it doesn’t specifically discuss the issues involved in automating Web pages.
No Menus, Alt Commands, or Control Names for Web Pages
Most Windows programs give us some form of menus or control names allowing us to automate processes. But, without screen-level Web page hooks in Web browsers, scriptwriters resort to cruder techniques for quick-and-dirty automation:
- Use Window Spy to locate on-screen controls.
- Use the Click command to set the cursor location on the page.
- Use SendInput {tab} to jump to the next control.
- Use Sleep command to allow time for page loading and cursor movement.
While you’ll find other commands (e.g. ImageSearch) which assist in screen-level automation, the same Web page limitations apply.
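As a rough sketch of how ImageSearch might stand in for hard-coded coordinates, the hotkey below looks for a saved screenshot of a button in the active window and clicks it if found. The hotkey, the file name submit_button.png, and the 5-pixel offset are all assumptions for illustration; you would capture your own image of the control.

```autohotkey
; Hypothetical sketch: submit_button.png is an assumed screenshot saved in the script's folder.
^!#i::
; Get the width and height of the active window to use as the search area.
WinGetPos, , , winW, winH, A
; Search coordinates are relative to the active window by default.
ImageSearch, foundX, foundY, 0, 0, %winW%, %winH%, %A_ScriptDir%\submit_button.png
if (ErrorLevel = 0)
{
    ; foundX/foundY mark the image's upper-left corner, so click slightly inside it.
    clickX := foundX + 5
    clickY := foundY + 5
    Click %clickX%, %clickY%
}
else
    MsgBox, The button image was not found or could not be searched.
Return
```

The image must match what appears on the screen, so a change in browser zoom or page styling can break the search just as a layout change breaks fixed coordinates.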
Window Spy to Learn the Lay of the Land
In many cases, the first step to screen-level Web page automation involves opening Window Spy and recording the key locations of the important controls.
The Focused Control field returns no values since you can’t access any control names at screen-level. Therefore, we use the “Window” (default) coordinates for Click locations. These coordinates return the relative location within the window needed by the Click command.
Click Command to Select Input Controls
After copying the coordinates for the control locations, you can use the Click command to directly place the input cursor into the proper input field. The Web page does not require exact coordinates as long as they fall within the input field. Then, use the SendInput command to input data.
For buttons, a Click at the proper location initiates the button’s function. With pop-up menus, the script may need the SendInput {down 3} command followed by another Click.
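A minimal sketch of this click-then-type pattern might look like the following. The hotkey, the coordinates, and the text are placeholders rather than values from any real page; read your own coordinates from Window Spy.

```autohotkey
; Hypothetical example: all coordinates and text are placeholders.
; Click coordinates are relative to the active window (AutoHotkey's default).
^!#f::
; Put the input cursor in the first field, then type the data.
Click 300, 220
SendInput jack@example.com
; Move to the second field and type more data.
Click 300, 260
SendInput Jack
; Click the button to submit the form.
Click 340, 320
Return
```

Because the coordinates only need to land somewhere inside each control, small layout shifts may not matter, but resizing the browser window or a page redesign likely will.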
Tab Default Location
Usually when a Web page first loads, the input cursor defaults to a particular input field. In that case, you can skip the Window Spy step and use the SendInput {tab} command to skip through the input fields. Also, the Return key often defaults to a Submit or Next button, in which case, you can use SendInput {enter} to save or execute the page.
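As an illustration, a form filled entirely by tabbing might be sketched like this. It assumes the page places the cursor in the first field on load and that two more tab stops sit between the last field and the point where Enter submits; the hotkey and data are made up.

```autohotkey
; Hypothetical example: assumes the cursor starts in the first input field.
^!#t::
; Type a value, then Tab to the next field.
SendInput Jack{Tab}
SendInput jack@example.com{Tab}
; Skip two controls that need no input.
SendInput {Tab 2}
; Enter often triggers the default Submit or Next button.
SendInput {Enter}
Return
```

Count the tab stops by hand on the actual page first, since links and hidden controls can also take a turn in the tab order.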
Browser Autofill Pop-up Menus
If you use autofill in your browser, then, after clicking inside an input field, the menu will pop open. You can use SendInput to move down the menu (SendInput {down 2}), then select the proper item (SendInput {enter}). This technique inserts data into multiple fields simultaneously.
; Ctrl+Alt+Win+B: click the input field, then pick the second autofill entry.
^!#b::
Click 140,160
SendInput {down 2}
SendInput {enter}
Return
On the downside, if the autofill items move or change location in the menu, then the script needs updating.
Web Page Load Times
Unfortunately, for a variety of reasons, Web page load times can vary greatly. The AutoHotkey script must wait. Otherwise, the following AutoHotkey commands fly off into the nether regions. Use the Sleep command as appropriate or the MsgBox command for a dead stop.
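; Typing a1@ opens the login page, waits, then sends the credentials.
; Replace username and password with your own values.
:*:a1@::
Run, https://mybank.com/obc/forms/login.fcc
; Wait for the page to load before typing.
Sleep, 7500
SendInput, username{Enter}
; Wait for the password page to appear.
Sleep, 5000
SendInput, password{Enter}
Return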
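One possible refinement, sketched here under assumptions, swaps part of the fixed delay for WinWaitActive, which waits until a window with a matching title becomes active. The title text "My Bank" and the a2@ trigger are hypothetical; check the real title with Window Spy. A browser usually updates its title before the page finishes rendering, so a short Sleep remains.

```autohotkey
; Hypothetical sketch: "My Bank" is an assumed page title, not taken from a real site.
:*:a2@::
; Match the title text anywhere in the window title.
SetTitleMatchMode, 2
Run, https://mybank.com/obc/forms/login.fcc
; Wait up to 15 seconds for the browser window showing the page to become active.
WinWaitActive, My Bank, , 15
if ErrorLevel
{
    MsgBox, The login page did not appear within 15 seconds.
    Return
}
; The title can change before the form is ready, so pause briefly.
Sleep, 1500
SendInput, username{Enter}
Return
```

This still counts as screen-level automation: the script only watches the window title and has no knowledge of whether the input field itself is ready.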
The MsgBox command requires user input to move on, although you can issue SendInput {enter} or the Click command to continue.
While sometimes hit-or-miss, these techniques offer the easiest method for quickly automating a Web page. If you only use AutoHotkey for the occasional login, then you won’t need anything else. If you need to do much more with your Web pages, then perhaps you need to look at source-level Web page automation for better control of your browser and its loaded Web pages.
Next time, I start looking at source-level Web page automation. While these techniques offer more robust automation, it may take a little time to understand and implement the techniques. I’ll attempt to offer my best insights into how it works.
