Alternative Web Page HTML Download Techniques (AutoHotkey Tip)

When One Method for Downloading HTML Code Breaks, Try the Alternative AutoHotkey Command

Although I could quickly get the latitude and longitude for any location with a Google search in a browser, Google stopped me when I attempted to download that same results page with the GetWebPage() function code taken from the AutoHotkey documentation (shown below in the first script). The Google server denied my attempt to download the coordinates for San Diego with the following message:

403. That’s an error.

Your client does not have permission to get URL /search?q=latitude+longitude+san+diego+decimal&rlz=1C1GEWG_enUS953US953 from this server.

Thwarted by Google again (see my “Switched IPFind.ahk to OpenStreetMap.org for Reliable AutoHotkey GUI Map Embedding” blog), I wanted to find an alternative source for the same information.

I searched for an unblocked Web page providing the latitude and longitude, and I didn’t have to look very far. (The site name appears in the AutoHotkey snippet below.) I wrote the following test code as a proof of concept:

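; Page to download (an unblocked source of latitude/longitude data)
WebPage := "https://latitudelongitude.org/us/san-diego/"

; Download the page's HTML directly into a variable
Page := GetWebPage(WebPage)

; Capture the text inside the coordinates <span>; the captured subpattern lands in LatLong1
RegExMatch(Page,"<span style=""white-space: nowrap; border:1px solid #e85151; padding:4px;"">(.*?)</span>",LatLong)
MsgBox San Diego`r%LatLong1%

Return

; GetWebPage(): sends an HTTP GET request and returns the response body as text
GetWebPage(WebPage)
{
    whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    whr.Open("GET", WebPage, true)  ; true = asynchronous request
    whr.Send()
    whr.WaitForResponse()           ; wait for the asynchronous request to finish
    RefSource := whr.ResponseText
    Return RefSource
}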
Regular Expressions in AutoHotkey
Regular Expressions (RegEx) can be mysterious in any language.

Note: For more information on how to extract data from text using AutoHotkey Regular Expressions (RegEx), see A Beginner’s Guide to Using Regular Expressions in AutoHotkey.
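To make the RegExMatch() call in the first script a little less mysterious, here is a minimal offline sketch. It runs the same pattern against a made-up HTML fragment (the fragment and its coordinate values are placeholders, not copied from latitudelongitude.org) and shows where AutoHotkey v1 stores the results: the whole match goes into LatLong, and the text captured by the first parenthesized subpattern goes into LatLong1.

```autohotkey
; Made-up HTML fragment with placeholder coordinates (not real page output)
Html := "<p>Location: <span style=""white-space: nowrap; border:1px solid #e85151; padding:4px;"">12.3456, -65.4321</span></p>"

; (.*?) is a lazy capture: it stops at the first </span> after the opening tag
FoundPos := RegExMatch(Html, "<span style=""white-space: nowrap; border:1px solid #e85151; padding:4px;"">(.*?)</span>", LatLong)

; FoundPos = starting position of the match (0 if there is no match)
; LatLong  = the entire matched text, tags included
; LatLong1 = only the text captured by the first (...) subpattern
MsgBox, % "Found at: " FoundPos "`nCaptured: " LatLong1
```

Doubled quotes ("") inside an AutoHotkey v1 expression string produce a literal quote character, which is why the style attribute in the pattern looks the way it does.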

I’m starting to think that I’m doomed to continually make changes to my favorite scripts in reaction to host Web page redesigns. Eventually, the entire Web may develop a siege mentality—turning into an array of paid APIs. But for now, I see plenty of options.

This latest obstacle thrown up by Google search made me wonder whether the old GooglePhraseFix.ahk script would still function. It did…sort of. Rather than the GetWebPage() code above, it uses the URLDownloadToFile command to save the Web page to a file. The downloaded file did contain the appropriate search page HTML. It was not blocked! However, although the script downloaded the correct HTML code, it had stopped doing its job.

The GooglePhraseFix.ahk script runs a Google search on highlighted phrases, then returns the corrected wording when terms such as “Showing results for” or “Did you mean:” appear in the page. The original GooglePhraseFix.ahk script ran into problems because Google now most commonly flags possible errors with “Including results for” rather than either of those two previous phrases. I added the new key phrase, plus I made a couple of other changes to simplify and fortify the script. But that’s a topic for another time.
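The core check the script depends on, looking for any one of several correction phrases in the downloaded HTML, might be sketched like this. This is not the revised GooglePhraseFix.ahk code (which isn’t shown here); FindCorrectionPhrase() is a hypothetical helper, and it assumes the phrase appears in the HTML as plain, unbroken text.

```autohotkey
; Hypothetical helper: returns the first correction phrase found in Html,
; or an empty string if none of them appears.
FindCorrectionPhrase(Html)
{
    Phrases := ["Showing results for", "Did you mean:", "Including results for"]
    for index, Phrase in Phrases
    {
        ; InStr() is case-insensitive by default in AutoHotkey v1
        if InStr(Html, Phrase)
            return Phrase
    }
    return ""
}
```

For example, FindCorrectionPhrase("<div>Including results for <b>san diego</b></div>") returns "Including results for". Keeping the phrases in an array means the next wording change from Google only requires adding one more string.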

I’ve been a fan of the alternative GetWebPage() code supplied in the AutoHotkey documentation for downloading Web page content because it saves the page directly to a variable. However, since sites such as Google have started intercepting this technique, I decided to give the original URLDownloadToFile command a test. I don’t know why, but the AutoHotkey command still works while the GetWebPage() function draws the boilerplate 403 error message. The command’s only downside is that it must write the HTML to a file before the script reads it into a variable for parsing, although this doesn’t seem to slow things down very much.
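One way to keep the convenience of getting the page straight into a variable is to hide the temporary file inside a small wrapper. The sketch below is hypothetical (GetWebPageViaFile() and the temp-file name are illustrative, not from the AutoHotkey documentation). It writes to A_Temp rather than the script’s working directory, a common precaution when the script’s folder isn’t writable.

```autohotkey
; Hypothetical helper: downloads Url to a temporary file, reads the file into a
; variable, deletes the file, and returns the contents ("" if the download fails).
GetWebPageViaFile(Url)
{
    TempFile := A_Temp "\ahk_page_" A_TickCount ".html"
    UrlDownloadToFile, %Url%, %TempFile%
    if ErrorLevel  ; UrlDownloadToFile sets ErrorLevel to 1 on failure
        return ""
    FileRead, Contents, %TempFile%
    FileDelete, %TempFile%
    return Contents
}
```

Usage mirrors the original function: Page := GetWebPageViaFile("https://latitudelongitude.org/us/san-diego/").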

After examining the HTML code, I quickly picked out the Regular Expressions (RegEx) keys required to extract the latitude and longitude.

I modified the script to use the URLDownloadToFile command and wrote the appropriate RegEx to extract the latitude and longitude of a city:

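City := "Seattle"
State := "WA"

; Build the search URL; the "+" characters separate the query terms
UrlDownloadToFile, % "https://www.google.com/search?q=latitude+longitude+" City "+" State, temp
FileRead, Contents, temp
; Capture the contents of the <div> that holds the coordinates
RegExMatch(Contents,"class=""BNeawe iBp4i AP7Wnd"">(.*?)</div>",LatLong)
; Strip any HTML tags left inside the captured text
LatLong1 := RegExReplace(LatLong1,"<.+?>")
MsgBox,,GPS Coordinates, %City% %State%`r%LatLong1%

; Clean up the temporary file
FileDelete temp
Return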

Google allows a lot more variance in the input terms—often returning correct results for searches with typos.
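The RegExReplace() line in the Google script above removes any tags left inside the captured text. Here is a small offline sketch of that step using made-up placeholder text (not real Google output):

```autohotkey
; Made-up captured text with placeholder values (not real Google output)
Captured := "<span>12.3456 N</span>, <span>65.4321 W</span>"

; <.+?> lazily matches one tag at a time: "<", as few characters as possible, then ">".
; Omitting the replacement parameter replaces every match with "", deleting the tags.
Clean := RegExReplace(Captured, "<.+?>")
MsgBox, % Clean
```

The ? matters: without it, the greedy <.+> would match from the first < to the last > on the line and delete the coordinates along with the tags.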

I’m not sure why the URLDownloadToFile command continues to work. Maybe it’s because it simulates an Internet Explorer browser. (Just guessing.) It might have to do with the fact that it actually calls a Microsoft Windows command by the same name. Blocking this Windows feature could present a much bigger obstacle than blocking the code found in the GetWebPage() function. In spite of all the other browsers in the world, millions of people (5.57%) continue to use Internet Explorer. Sadly, over time, even the URLDownloadToFile command may lose its effectiveness. Then we will need to sign up for APIs.
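If the difference really does come down to how each method identifies itself to the server, one way to test that guess is to have the WinHttpRequest object send a browser-style User-Agent header through its SetRequestHeader() method. This is an untested sketch: GetWebPageWithAgent() and its default User-Agent string are placeholders, and Google may still refuse the request.

```autohotkey
; Hypothetical variant of GetWebPage() for experimenting with the User-Agent guess.
; The default UserAgent value is a placeholder, not a string known to avoid blocking.
GetWebPageWithAgent(WebPage, UserAgent := "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
{
    whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    whr.Open("GET", WebPage, true)
    ; Request headers must be set after Open() and before Send()
    whr.SetRequestHeader("User-Agent", UserAgent)
    whr.Send()
    whr.WaitForResponse()
    ; Status holds the HTTP status code (200 = OK, 403 = Forbidden)
    if (whr.Status != 200)
        return ""
    return whr.ResponseText
}
```

Returning an empty string on any non-200 status keeps a 403 page from being parsed as if it were search results.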

Click the Follow button at the top of the sidebar on the right of this page for e-mail notification of new blogs. (If you’re reading this on a tablet or your phone, then you must scroll all the way to the end of the blog, past any comments, to find the Follow button.)

jack

This post was proofread by Grammarly
(Any other mistakes are all mine.)

(Full disclosure: If you sign up for a free Grammarly account, I get 20¢. I use the spelling/grammar checking service all the time, but, then again, I write a lot more than most people. I recommend Grammarly because it works and it’s free.)

Find my AutoHotkey books at ComputorEdge E-Books!

Find quick-start AutoHotkey classes at “Robotic Desktop Automation with AutoHotkey”!

3 thoughts on “Alternative Web Page HTML Download Techniques (AutoHotkey Tip)

  1. Hi Jack,
    I’ve tried both this and Google Phrase fix and neither seems to be returning results for me. Is it still working in Oct ’21? Might it be down to my access to a temp file (I don’t have admin access). The URLdownloadtofile function still seems to work for me.
    Thanks,
    Matt

