The next most dangerous combination of operators is built on the filetype operator, that is, on extension-based searches. Used with the minus (-) operator to exclude common web file types such as HTM, HTML, SHTML, ASP, ASPX and PHP, it lets a site be searched for terms residing in the rest of its files, which may be documents, worksheets, research papers or even multimedia or disk image files. A few of the popular queries for file type searching are "admin account info" filetype:log and "#mysql dump" filetype:sql, and one can surely be more creative. For example, "http://*:*@www" domainname attempts to find inline passwords along with domain names starting with www, and intitle:"Index of" config.php points to a common place to find username and password information.
Such is the power of Google. If someone is still not convinced, a Google search for other such queries turns up hundreds of advanced queries meant specifically for server operating systems, drivers, credit cards, login portal detection, usernames, passwords, error messages, shopping information, vulnerable servers and files, and more. On the subject of vulnerabilities, when the target is the general net populace, a very good starting point is to sift through the security advisories and patch sites of popular software vendors, which give the exact build and version of the affected systems. Simple searches such as powered by, driven by, running, maintained with or welcome to, followed by the affected product and its version (for example, a web server or mail server), are a common way to get the exact locations of potentially vulnerable systems. Hackers also often search for open video sharing tools to play around with.
One example of interest here is the disclosure of serial numbers. It has been observed that support departments sometimes create spreadsheets of the license keys owned by a company and place them on a network (intranet or extranet). If such files find their way to the Internet, one can imagine how easy it becomes to lift a license key from it. The name of the software (e.g. an OS) and a known part of a license key can be used to search for complete keys. Google also makes two more sources of information available: its translation facility and newsgroups. This means that at least languages written in Western scripts are not completely out of reach, and newsgroups can be monitored for interesting information as well. There have also been rumors of source code searching.
Then there are examples of CGI scanning and of automating Google-based security querying through freeware tools. One such tool for *nix systems is Gooscan, described by a security professional as a ‘front end for an external server assessment and aids in the information-gathering phase of a vulnerability assessment’. Another is SiteDigger, but it requires legitimate access to the Google API to perform its built-in security-related searches. These tools show site administrators what others might be able to glean from the site in its present state.
These examples are ample proof of Google’s ability and utility in any serious penetration testing activity.
Protecting a company from the immense power of Google is a decision that large corporations need to take, particularly those in public sector services, education, manufacturing and the financial sector. First of all, consider de-listing sensitive sites or domains from Google or, better still, do not take the risk of placing information on web servers even if it is meant for authorized access for only a few days, because no one knows what could happen in those few days. Next, manually running the potentially hazardous queries and analyzing the findings before others can do so is also helpful.
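As an aid to running such queries against one's own site, the following is a minimal Python sketch that composes a query restricted to a single domain while excluding common web file types. The function names, the list of excluded extensions and the domain example.com are illustrative assumptions, not part of any tool mentioned here. The sketch only builds the query string and a search URL to open in a browser for manual review; it does not send any requests.

```python
from urllib.parse import quote_plus

# Assumed list of ordinary web page types; excluding them leaves documents,
# spreadsheets, logs, dumps and similar files in the results.
COMMON_WEB_TYPES = ["html", "htm", "shtml", "asp", "aspx", "php"]


def build_audit_query(domain, terms="", exclude_types=COMMON_WEB_TYPES):
    """Compose a query limited to one domain that skips common web file types."""
    parts = [f"site:{domain}"]
    if terms:
        parts.append(terms)
    # The minus operator removes each unwanted file type from the results.
    parts.extend(f"-filetype:{ext}" for ext in exclude_types)
    return " ".join(parts)


def search_url(query):
    """Return a URL that opens the query in a browser for manual review."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # example.com is a placeholder; use a domain you are authorized to assess.
    query = build_audit_query("example.com", '"admin account info"')
    print(query)
    print(search_url(query))
```

Restricting every query with site: keeps the exercise scoped to the organization's own domain, which is the point of checking before others do.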
Cult of the Dead Cow (cDc), a famous hacking group, has released a freeware tool called Goolag Scanner. This tool is capable of finding vulnerabilities in websites using data collected from Google, and website owners can use it to inspect their own websites. It also lets you scan web pages and find misconfigured web servers and those with open backdoors or weak user IDs and passwords.
And lastly, it is highly advisable to assess, on a regular schedule, how much information is given out through Google searches and to limit it using robots.txt.
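One way to make the robots.txt part of that routine repeatable is sketched below in Python, using the standard library's urllib.robotparser. The sample robots.txt content, the list of paths and the function name crawlable_paths are assumptions made for illustration; in practice the rules would be read from the site's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def crawlable_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt still lets the crawler fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    # Hypothetical paths an administrator considers sensitive or wants to check.
    to_check = ["/admin/config.php", "/backup/db.sql", "/logs/access.log"]
    for path in crawlable_paths(SAMPLE_ROBOTS_TXT, to_check):
        print("still crawlable:", path)
```

Note that robots.txt is only a request to well-behaved crawlers and is itself publicly readable, so it limits indexing without protecting the files it lists.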