Google As A Hacking Tool
Just as a knife can be used to cut an apple or a throat, the Google search engine can be put to both constructive and destructive use. It is not naïve to believe that a search engine that has seen such massive growth, and that can search a mammoth number of pages, would eventually become a potent tool in the hands of the black hat community.
For a complete understanding of how to use Google for information gathering, there are books such as “Google Hacking for Penetration Testers” and others. The Google API is also a relevant subject to study. Here, we will explore some specific queries and show what they can reveal. You will notice that most of these queries make use of Google’s common and advanced search operators such as OR, AND, |, -, +, "", . and *, which serve the purposes of choice, exclusion, inclusion, exact phrase matching, and single-character and multi-character wildcards. Advanced operators take the form operator:search_term, with no space after the colon.
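To make the operator syntax concrete, here is a minimal sketch of a helper that assembles a query string from the pieces described above. The function name build_query and its parameters are hypothetical, chosen only for illustration, and example.com stands in for a domain you are authorized to assess. The helper only builds text; it does not contact Google.

```python
def build_query(terms=(), phrase=None, operators=None, site=None, exclude_sites=()):
    """Compose a search query string from plain terms and operator:value pairs.

    All names here are illustrative; nothing is sent to any search engine.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')        # quotes request exact phrase matching
    for name, value in (operators or {}).items():
        parts.append(f"{name}:{value}")    # advanced operator, no space after the colon
    if site:
        parts.append(f"site:{site}")       # restrict results to one domain
    for host in exclude_sites:
        parts.append(f"-site:{host}")      # a leading minus excludes matches
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(site="example.com", exclude_sites=["www.example.com"]))
    print(build_query(terms=["login", "|", "logon"], site="example.com"))
```

Keeping each operator as a separate part makes it easy to see which piece of a longer query is responsible for narrowing or widening the results.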
Let us start with some of the most popular queries used in the initial phase of hacking, the reconnaissance that begins once the target has been selected. For starters, there is the invaluable site operator. It can be used to search a domain for unknown or hidden sub-domains. For example, the query site:bbc.co.uk -site:www.bbc.co.uk returns results from the bbc.co.uk domain that come from hosts other than www.bbc.co.uk. In a similar way, one can search for uncommon, not-so-obvious and unadvertised contact points at the target, effectively increasing the attack surface. Further, combining the results with an IP lookup can reveal how many different physical machines are serving the content.
Once this information is at hand, the next query to try is intitle:index.of, which is considered the de facto directory-listing shortcut and works against Apache Web servers. Combined with the site operator, it reveals locations on a site where directory listing is open. Other queries that locate directory listings are intitle:index.of "parent directory", intitle:index.of server.at and intitle:index.of name size. All of these should be combined with the site operator. Since these queries identify only Apache on the target, other Web servers and their specific versions (information that helps later when checking for version-specific vulnerabilities) can be located by searching for their default pages, using terms such as IIS, Internet, Under construction and Welcome to Windows in the title with the intitle or allintitle operators.
The title-searching technique is far more informative than mere server location; it can turn up some nasty details if used intelligently. For example, most people who have worked as Web developers know that when a page crashes, the default error pages throw up really ‘useful’ information meant for debugging, such as variable names, query strings, line numbers and, at times, usernames and passwords.
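The sub-domain search described above produces a list of result URLs, and the useful part is the set of distinct host names in it. A minimal sketch of that bookkeeping follows, assuming the URLs have already been collected by hand for a domain you are authorized to assess. The function name hosts_under and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def hosts_under(domain, urls):
    """Return the sorted, distinct host names in `urls` that belong to `domain`."""
    domain = domain.lower().rstrip(".")
    found = set()
    for url in urls:
        # hostname is None when the URL has no scheme or network location
        host = (urlsplit(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            found.add(host)
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "https://www.example.com/",
        "http://news.example.com/story",
        "https://NEWS.example.com/other",
        "https://notexample.com/",
    ]
    print(hosts_under("example.com", sample))
```

Matching on a leading dot keeps look-alike domains such as notexample.com out of the result.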
So, a simple search over a target site for terms like error and access denied for user, or for exact SQL error messages starting with ORA, in page titles and bodies can display unexpected information, for example ORA-00921: unexpected end of SQL command. Other areas to look for interesting information are hyperlinks and caches, accessed through the inurl, link and cache operators. Take login pages: running the query login | logon together with the site operator shows pages containing the text login or logon, and searching for the same terms in URLs might lead to the ASP, ASPX or other page that handles the login. Needless to say, this can prove valuable to anyone who wants to make further inroads into the Web server, since the ‘gates’ of the target site have been revealed. Other keywords to look for in page titles, URLs, links and generally anywhere else include admin, username, ID and password, and common phrases like “your username is” and “please contact your administrator”. Another common search target is forgotten-password or helpdesk pages, which might contain information helpful in launching social engineering attacks. Similarly, backups and logs are considered good sources of generally off-limits information.
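A site owner can look for the same tell-tale strings in their own pages before a search engine indexes them. The following is a rough sketch under the assumption that the page text is already available as a string. The pattern list is illustrative and far from complete, and the names LEAK_PATTERNS and find_leaks are hypothetical.

```python
import re

# Illustrative patterns based on the messages mentioned above; extend as needed.
LEAK_PATTERNS = {
    "oracle_error": re.compile(r"\bORA-\d{5}\b"),
    "access_denied": re.compile(r"access denied for user", re.IGNORECASE),
    "credential_hint": re.compile(r"your username is", re.IGNORECASE),
}


def find_leaks(text):
    """Return (pattern_name, matched_text) pairs for every hit in `text`."""
    hits = []
    for name, pattern in LEAK_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    page = "ORA-00921: unexpected end of SQL command"
    for name, snippet in find_leaks(page):
        print(name, snippet)
```

Running a check like this over rendered error pages shows what a title or body search would be able to match.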
The next most dangerous combination of operators is based on the filetype operator, or extension-based searches. Used with the - operator to eliminate common Internet file types such as HTM, HTML, SHTML, ASP, ASPX and PHP, a site can be searched for terms residing in the rest of its files, which may be documents, worksheets, research papers or even multimedia or disk image files. A few of the popular queries for file type searching are "admin account info" filetype:log and "#mysql dump" filetype:sql; surely one can be more creative. For example, "https://*:*@www" domainname attempts to find inline passwords along with domain names starting with www, and intitle:"Index of" config.php targets a common place to find username and password information.
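The same extension-based filtering can be applied offline to a list of URLs known to be exposed on your own site, to see which ones are not ordinary Web pages. This is a minimal sketch. The set WEB_EXTENSIONS, the function name non_web_files and the sample URLs are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Extensions treated as ordinary Web pages; "" covers paths with no extension.
WEB_EXTENSIONS = {"", ".htm", ".html", ".shtml", ".asp", ".aspx", ".php"}


def non_web_files(urls):
    """Group URLs by lower-cased file extension, skipping common Web page types."""
    grouped = defaultdict(list)
    for url in urls:
        ext = PurePosixPath(urlsplit(url).path).suffix.lower()
        if ext not in WEB_EXTENSIONS:
            grouped[ext].append(url)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "https://example.com/index.html",
        "https://example.com/logs/app.log",
        "https://example.com/db/dump.SQL",
    ]
    for ext, found in non_web_files(sample).items():
        print(ext, found)
```

Anything that shows up under .log, .sql or similar extensions is a candidate for removal from the public Web root.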
Such is the power of Google. If someone is still not convinced, a Google search for other such queries turns up hundreds of advanced queries meant specifically for server operating systems, drivers, credit cards, login portal detection, usernames, passwords, error messages, shopping information, vulnerable servers and files, and more. Talking about vulnerabilities, when the target is the general net populace, a very good starting point is to sift through the security advisories and patch sites of popular software vendors, which give the exact build and version of the systems affected by a vulnerability. Simple searches such as powered by, driven by, running, Maintained with or Welcome to, followed by <affected product with version, e.g. Web server or mail server>, are a common way to get the exact locations of potentially vulnerable systems. Hackers often search for open video sharing tools to play around with.
One example of interest here is the disclosure of serial numbers. It has been observed that support departments sometimes create spreadsheets of the license keys owned by a company and place them on a network (intranet or extranet). If such files find their way onto the Internet, one can imagine how easy it becomes to obtain a license key. The software’s name (e.g. an OS) and a known part of a license key can be used to search for complete keys. There are two more sources of information made available by Google: the translation facility and newsgroups. This means that pages in at least Western-script languages are not completely out of reach, and that newsgroups can be monitored for interesting information as well. There have also been rumors of source code searching.
Then there are examples of CGI scanning and of automating Google-based security querying through freeware tools. One such tool for *nix systems is Gooscan, dubbed by a security professional a ‘front end for an external server assessment and aids in the information-gathering phase of a vulnerability assessment’. Another is SiteDigger, but it requires legitimate access to the Google API to perform its built-in security-related searches. These tools show site administrators what others might be able to glean from the site in its present state.
All this is ample proof of Google’s ability and utility in any serious penetration testing activity.
Protecting a company from the immense power of Google is a decision that large corporations need to take, particularly those in public sector services, education, manufacturing and the financial sector. First of all, consider de-listing sensitive sites or domains from Google or, better still, do not take the risk of placing information on Web servers even if it is meant for authorized access for only a few days, because no one knows what could happen in those few days. Next, it also helps to run the potentially hazardous queries manually and analyze the findings before others can do so.
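For the manual self-check suggested above, it can help to keep the queries in one place and stamp your own domain onto each. The sketch below assumes a small, illustrative template list drawn from the queries quoted earlier in this article. The names AUDIT_TEMPLATES and audit_queries are hypothetical, and the output is only text to paste into a search box.

```python
# Query templates taken from the examples quoted in this article (not exhaustive).
AUDIT_TEMPLATES = [
    'intitle:index.of "parent directory"',
    "intitle:index.of server.at",
    "intitle:index.of name size",
    '"admin account info" filetype:log',
    '"#mysql dump" filetype:sql',
]


def audit_queries(domain):
    """Return each template restricted to `domain` with the site operator."""
    return [f"site:{domain} {template}" for template in AUDIT_TEMPLATES]


if __name__ == "__main__":
    for query in audit_queries("example.com"):
        print(query)
```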
Cult of the Dead Cow (cDc), a famous hacking group, has released a freeware tool called Goolag Scanner. This tool is capable of finding vulnerabilities in websites using data collected from Google. Website owners can use it to inspect their own websites. It also lets you scan Web pages and find misconfigured Web servers and those with open backdoors or weak user IDs and passwords.
And lastly, it is highly advisable to assess on a regular schedule how much information is being given out by way of Google searches, and to limit it using robots.txt.
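To check what a given robots.txt actually asks crawlers to stay away from, Python's standard urllib.robotparser module can evaluate the rules offline. This is a minimal sketch. The sample rules, the paths and the function name blocked_paths are assumptions for illustration, and the default agent string "Googlebot" is simply passed to the parser as a user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /backup/
Disallow: /logs/
"""


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Map each path to True if the rules disallow it for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: not parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    report = blocked_paths(SAMPLE_ROBOTS, ["/backup/db.sql", "/index.html"])
    for path, blocked in report.items():
        print(path, "blocked" if blocked else "allowed")
```

Note that robots.txt is only a request that well-behaved crawlers honor, so it complements, rather than replaces, keeping sensitive files off the Web server.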