Phishing Statistics and Intuitive Enumeration of Hosts and Roles. February 7th, 2009. Table Of Contents. Short Overview of Phishing Ambiguity and Statistics Gathering Tagging and Intuitive Analysis Social Networks and Administrative Hosts Phishing Metrics Tool Demonstration
Phishing Statistics and Intuitive Enumeration of Hosts and Roles • February 7th, 2009
Table Of Contents • Short Overview of Phishing • Ambiguity and Statistics Gathering • Tagging and Intuitive Analysis • Social Networks and Administrative Hosts • Phishing Metrics Tool Demonstration • Open Floor – Let’s Talk
Motivations • Wanted to justify phishing as a useful exercise to clients. • Needed to find a way to gather more reliable statistics for reporting. • Along the way I realized I was getting very useful information, even when users didn’t enter credentials.
Ambiguity of responses (1 of 5) • Let’s take a simple example scenario: • An e-mail was sent to addresses A, B, and C, asking them to log into a web application owned by the attacker. • The site did not capture any login attempts, but web server logs indicate that 2 requests for the main page were received from 2 different IP addresses. • So, given the above information: • What was the response rate? • Were any of the requests from forwarded e-mails? • Who were the responses from?
Ambiguity of responses (2 of 5) • Any of the following might have occurred: [Diagram: scenarios involving A, B, and C and the capture site]
Ambiguity of responses (3 of 5) • Or these: [Diagram: scenarios involving A, B, C, D, E, and F and the capture site]
Ambiguity of responses (4 of 5) • Or even these: [Diagram: scenarios involving A, A’, B, C, D, E, E’, and F and the capture site]
Ambiguity of responses (5 of 5) • Even with a small example, the number of possible response scenarios is effectively unlimited. • Say we send 100 e-mails and get 100 responses. Did one guy respond 100 times, or did each person respond once? • Without some method of identifying where each response originated, larger response sets become even more difficult to interpret. • To be fair, these small response sets aren’t very helpful for our ultimate goal, and most phishing schemes will target large numbers of e-mail addresses.
Attribution • Attribution is not an issue in the wild. The attacker doesn’t care who responds, just that they get a response. Testers often take the same perspective. • For reporting, attribution is critical. We want to know which e-mails were responded to, when, from where, and how often, in order to generate useful statistics. • We would also like to determine statistics from half-responses, or cases where someone went to the capture site but didn’t enter information. • To accomplish this, a tagging technique can be used.
Tagging (1 of 3) • Tagging e-mails involves manipulating the message content so that the response received by the server is unique for each e-mail sent. • Because each e-mail has a unique tag, the tester can gather statistics based on the responses they receive. • Tagging can take many different forms: • Each e-mail has a URL that points to a unique IP • Each e-mail has a URL that points to a unique port • URLs contain parameter values that identify the recipient • Recipients are directed to uniquely named resources
Tagging (2 of 3) • Port based example: • Target address A has links that point to port 8081 (http://www.attacker.com:8081) • Target address B has links that point to port 8082 (http://www.attacker.com:8082) • Requests to port 8081 MAY indicate a response from target address A • File based example: • Symbolic links indexA.html and indexB.html are set up to point to index.html • Target address A has links that point to indexA.html (http://www.attacker.com/indexA.html) • Target address B has links that point to indexB.html (http://www.attacker.com/indexB.html) • Responses can be differentiated by the file they request, but a single file can be used to control content.
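A minimal sketch of the port based and file based schemes above. The host www.attacker.com and the 8081 starting port come from the slide; the function names and the example.com addresses are made up for illustration, and the file based version assumes no more than 26 targets.

```python
from string import ascii_uppercase

HOST = "www.attacker.com"  # placeholder host from the slide example


def port_tagged_urls(targets, base_port=8081):
    """Give each target address a URL on its own port (8081, 8082, ...)."""
    return {t: f"http://{HOST}:{base_port + i}" for i, t in enumerate(targets)}


def file_tagged_urls(targets):
    """Give each target address its own file name (indexA.html, indexB.html, ...).

    On the server, each of these names would be a symbolic link to index.html.
    """
    if len(targets) > len(ascii_uppercase):
        raise ValueError("this sketch only has 26 single-letter suffixes")
    return {
        t: f"http://{HOST}/index{ascii_uppercase[i]}.html"
        for i, t in enumerate(targets)
    }


# Hypothetical target list
targets = ["a@example.com", "b@example.com"]
port_urls = port_tagged_urls(targets)
file_urls = file_tagged_urls(targets)
```

Either mapping can be inverted later to turn a requested port or file name back into the address the e-mail was sent to.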
Tagging (3 of 3) • NOTE: Tagging does not mean that the person that owns an e-mail address is necessarily the one responding! • Tags should be unique enough that they adequately eliminate “guessed” or false responses. In that respect, some tagging schemes are better than others. • Combinations can be used to eliminate false responses: • Port and file based (http://www.attacker.com:8082/indexE.html) • Numeric identifiers (http://www.attacker.com/112233/index.html) • We can start to infer other details as well, based on the tag that is received, and where the requests originate from…
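One way to make tags hard to guess is to generate them randomly. The sketch below is an assumption, not the scheme used in the talk: it uses random hex strings from Python's `secrets` module in place of the numeric identifier shown above, and the function names and example.com addresses are hypothetical.

```python
import secrets


def assign_tags(targets, nbytes=8):
    """Return {tag: target address} with one random tag per address."""
    tags = {}
    for target in targets:
        tag = secrets.token_hex(nbytes)  # 2 * nbytes hex characters
        while tag in tags:  # regenerate on the (very unlikely) collision
            tag = secrets.token_hex(nbytes)
        tags[tag] = target
    return tags


def tagged_url(tag, host="www.attacker.com"):
    """Build a URL in the same shape as the numeric identifier example."""
    return f"http://{host}/{tag}/index.html"


# Hypothetical target list
tags = assign_tags(["a@example.com", "b@example.com", "c@example.com"])
urls = [tagged_url(tag) for tag in tags]
```

A request for a tag that was never issued can then be treated as a guess or a scan rather than a response.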
Tagging example (1 of 6) • The original example scenario can be restated with tags: • An e-mail is sent to address A with tag {A}, B with tag {B}, and C with tag {C}, asking them to log into a web application owned by the attacker. • The site did not capture any login attempts, but web server logs indicate that 2 requests for the main page were received from 2 different IP addresses. Both requests contain tag {A}. • So, given the above information, we answer the same questions as before: • What was the response rate? 1/3 or 33%, because of the 3 e-mails sent, only one of the tags had responses. • Were any of the requests from forwarded e-mails? There is a high probability, because the same unique tag was received from two distinct IP addresses. • Who were the responses from? Responses were received with the tag sent to e-mail address A. • The last question is still a bit difficult to answer definitively…
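The answers above can be worked out mechanically. This is a small sketch that assumes the web server log has already been parsed into (tag, source IP) pairs; the IP addresses and function names are made up.

```python
from collections import defaultdict

sent_tags = {"A", "B", "C"}

# (tag, source IP) pairs as they might be parsed from the web server log
requests = [("A", "10.0.0.5"), ("A", "172.16.3.9")]


def response_rate(sent, reqs):
    """Fraction of sent tags that showed up in at least one request."""
    responded = {tag for tag, _ in reqs if tag in sent}
    return len(responded) / len(sent)


def ips_per_tag(reqs):
    """Group the distinct source IPs seen for each tag."""
    seen = defaultdict(set)
    for tag, ip in reqs:
        seen[tag].add(ip)
    return dict(seen)


rate = response_rate(sent_tags, requests)  # 1 of 3 tags responded
ips = ips_per_tag(requests)
# A tag seen from more than one IP hints at a forwarded e-mail
possibly_forwarded = sorted(tag for tag, s in ips.items() if len(s) > 1)
```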
Tagging example (2 of 6) • None of these scenarios occurred, because B and C do not know {A}: [Diagram: scenarios involving A, B, and C, with requests carrying {A} arriving at the server]
Tagging example (3 of 6) • The second scenario is possible, because the e-mail forwarded to E and on to F would contain {A}: [Diagram: scenarios involving A, B, C, D, E, and F, with requests carrying {A} arriving at the server]
Tagging example (4 of 6) • The first two scenarios are possible because {A} would be known. However, D and F should not be able to guess {A}: [Diagram: scenarios involving A, A’, B, C, D, E, E’, and F, with requests carrying {A} arriving at the server]
Tagging example (5 of 6) • Tags eliminate a lot of the possible scenarios, but not all of them, like a single user sending a request from multiple locations. • With larger response sets, differentiation becomes a little easier. • “There is a high probability that the e-mail was forwarded” is still a lot better than “We have no clue whatsoever”.
Tagging example (6 of 6) • We can now be more conclusive with some of the more critical statistics including: • Response rate (we know how many tags were sent, and how many were requested) • Fastest and average response times • We can do more, though, because we have more information about each response including: • The tag (which identifies the recipient address of the original e-mail) • A time and an IP address associated with the response
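As a rough sketch of the timing statistics, assume all e-mails went out at one known time and each logged request gives us a tag and a timestamp. The timestamps below are invented, and "response time" is taken to mean the delay until the first request for each tag.

```python
from datetime import datetime
from statistics import mean

# Hypothetical send time and (tag, request time) pairs from the log
sent_at = datetime(2009, 2, 7, 9, 0, 0)
responses = [
    ("A", datetime(2009, 2, 7, 9, 4, 0)),
    ("A", datetime(2009, 2, 7, 9, 30, 0)),
    ("B", datetime(2009, 2, 7, 10, 0, 0)),
]


def first_response_delays(sent_at, responses):
    """Minutes from sending until the first request seen for each tag."""
    first = {}
    for tag, ts in responses:
        if tag not in first or ts < first[tag]:
            first[tag] = ts
    return {tag: (ts - sent_at).total_seconds() / 60 for tag, ts in first.items()}


delays = first_response_delays(sent_at, responses)
fastest = min(delays.values())
average = mean(delays.values())
```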
Intuitive Analysis (1 of 3) • Even without any information captured by the web application, we can infer: • The e-mail to A was accessed from two different IPs • The user who checks A may be forwarding phishing e-mails • If we send an exploit to A, it could hit multiple machines. • E-mails to B and C did not get any responses • It took X minutes for the first response to come in after we sent the e-mail to A, and Y minutes for the second response. • If web server logfiles are configured correctly, we may know what types of browsers sent the requests • By analyzing the source IPs of the two requests we might determine more information: • Are both of the IPs on the same network? • If not on the same network, who owns them (ISPs, France…)?
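The "same network" question can be checked with Python's standard `ipaddress` module. The /24 prefix length is an assumption; the real network boundaries would have to come from what is known about the target. Finding out who owns an address (a WHOIS lookup, for example) is not covered here.

```python
import ipaddress


def same_network(ip_a, ip_b, prefix=24):
    """True if ip_b falls inside the /prefix network that contains ip_a."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Example source IPs for two requests carrying the same tag
near = same_network("192.168.1.1", "192.168.1.4")
far = same_network("192.168.1.1", "10.20.30.40")
```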
Intuitive Analysis (2 of 3) • With intuitive analysis, it is often possible to determine the most likely situation based on the responses that were received, assuming there is enough raw data. • For example: • It is more likely that someone forwarded an e-mail to 20 people than that they clicked on a link from 20 different computers. • If every response we receive comes from a single IP address, it is more likely that it is a proxy rather than a single user on a host responding to every e-mail. • We can’t completely eliminate these possibilities, but they become less probable based on the data. • NOTE: Not all of our assumptions will be correct, but they do give us much better odds for follow-on attacks.
Intuitive Analysis (3 of 3) • Other types of information that might be gathered using intuitive analysis:
Response Examples: Forwarded E-mails [Diagram: recipients A, B, and C; hosts 192.168.1.1, 192.168.1.2, 192.168.1.3, and 192.168.1.4; tags {A}, {B}, and {C}; server]
Response Examples: Forwarded E-mails • Here is a likely scenario [Diagram: A and several unidentified users; hosts 192.168.1.1, 192.168.1.2, 192.168.1.3, and 192.168.1.4; requests carrying {A} arriving at the server]
Response Examples: Forwarded E-mails • Or this… [Diagram: A and several unidentified users; hosts 192.168.1.1, 192.168.1.2, 192.168.1.3, and 192.168.1.4; requests carrying {A} arriving at the server]
Response Examples: Forwarded E-mails • COULD this have happened? Yes. [Diagram: A shown at each of hosts 192.168.1.1, 192.168.1.2, 192.168.1.3, and 192.168.1.4; requests carrying {A} arriving at the server]
Intuitive Analysis of Forwarded E-mails • Basically, a unique tag made it into an e-mail client on multiple hosts, and the users clicked on something that sent the tag to the server. • The fact that the requests originated from several different hosts allows us to guess what likely occurred. • We don’t care how many times the e-mail got forwarded, or where the copies ended up, just that we can answer the question: did this e-mail get forwarded? • In addition, if we know or guess that B logs in at 192.168.1.2, and C logs in at 192.168.1.4, then we have evidence that A, B and C have some communication path in common. This is a potential social network.
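The social network guess above can be sketched as a lookup: if a request carrying one recipient's tag comes from the IP where another recipient is known (or guessed) to log in, record a possible link between the two. The `home_ip` mapping and the request list are illustrative data using the addresses from the slide; the function names are hypothetical.

```python
from collections import defaultdict

# Known or guessed login host for each recipient (assumed data)
home_ip = {"A": "192.168.1.1", "B": "192.168.1.2", "C": "192.168.1.4"}

# (tag, source IP) pairs: tag {A} seen from three different hosts
requests = [("A", "192.168.1.1"), ("A", "192.168.1.2"), ("A", "192.168.1.4")]


def forwarded_tags(reqs):
    """Tags seen from more than one source IP, with the IPs involved."""
    ips = defaultdict(set)
    for tag, ip in reqs:
        ips[tag].add(ip)
    return {tag: seen for tag, seen in ips.items() if len(seen) > 1}


def social_links(reqs, home_ip):
    """Pairs of recipients linked by a tag showing up at the other's host."""
    owner = {ip: who for who, ip in home_ip.items()}
    links = set()
    for tag, ip in reqs:
        other = owner.get(ip)
        if other is not None and other != tag:
            links.add(frozenset((tag, other)))
    return links


forwarded = forwarded_tags(requests)
links = social_links(requests, home_ip)
```

A link here is only evidence of a shared communication path, not proof of who clicked.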
Response Examples: Social Network [Diagram: recipients A, B, and C; hosts 192.168.1.1, 192.168.1.2, and 192.168.1.4; tags {A}, {B}, and {C}; server]
Response Examples: Administrative Hosts [Diagram: recipients A, B, and C; hosts 192.168.1.1 and 192.168.1.4; tags {A}, {B}, and {C}; server]
Response Examples: Administrative Hosts 1 • 1) A replies but then forwards to the network admin • 2) B sends to the network admin • 3) C forwards to someone else, who forwards to the admin • 4) The admin investigates, but in the process sends those tags as part of their requests. [Diagram: A, B, C, and two unidentified users; hosts 192.168.1.1 and 192.168.1.4; tags {A}, {B}, and {C}; server]
Response Examples: Administrative Hosts 2 • 1) A is the admin, and receives forwarded messages • 2) B sends to the network admin, i.e. A • 3) C forwards to someone else, who forwards to the admin, i.e. A • 4) The admin investigates, but in the process sends those tags as part of their requests, including A’s tag. [Diagram: A, B, C, and two unidentified users; hosts 192.168.1.1 and 192.168.1.4; tags {A}, {B}, and {C}; server]
Intuitive Analysis of Administrative Hosts (1 of 2) • Multiple tags responded to from the same IP could mean: • The host is a request proxy • The user is part of a social network, and is responding to multiple forwarded e-mails • The user is an administrator or has another important role, and is investigating a potential security issue • Depending on how many e-mails we sent, and how many responses we received, the analysis may be different…
Intuitive Analysis of Administrative Hosts (2 of 2) • Thresholds can help eliminate some of these multiple-tag, single-IP scenarios: • If a large share of all responses comes from the same IP (say 75%), it is unlikely an administrator would respond to so many phishing e-mails. It’s most likely a proxy. • If the IP responds to 2 tags, it could just be from a single forwarded e-mail. Responses to 3 tags would more likely come from someone receiving multiple e-mails, but could still just be a social network. 5 tags would probably indicate an administrator doing some probing. • Determining appropriate thresholds depends on the individual task, how many e-mails were sent, etc.
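The thresholds above can be turned into a rough classifier. The numbers (75%, 2, 3, and 5 tags) are the examples from the slide; the function name, the labels, and the choice to treat 4 tags the same as 3 are assumptions that would need tuning per engagement.

```python
from collections import Counter, defaultdict


def classify_ips(reqs, proxy_share=0.75, admin_tags=5):
    """Label each source IP that sent requests for more than one tag.

    reqs is a list of (tag, source IP) pairs parsed from the server log.
    """
    tags_by_ip = defaultdict(set)
    for tag, ip in reqs:
        tags_by_ip[ip].add(tag)
    request_counts = Counter(ip for _, ip in reqs)
    total = len(reqs)

    labels = {}
    for ip, tags in tags_by_ip.items():
        if len(tags) < 2:
            continue  # a single tag from an IP is an ordinary response
        if request_counts[ip] / total >= proxy_share:
            labels[ip] = "likely proxy"
        elif len(tags) >= admin_tags:
            labels[ip] = "possible administrator"
        elif len(tags) >= 3:
            labels[ip] = "multiple e-mails received or social network"
        else:
            labels[ip] = "possibly a single forwarded e-mail"
    return labels


# Hypothetical log: one host requests 5 tags, one requests 2, ten request 1 each
sample = (
    [(f"T{i}", "10.0.0.9") for i in range(5)]
    + [(f"U{i}", f"10.0.1.{i}") for i in range(10)]
    + [("V1", "10.0.2.1"), ("V2", "10.0.2.1")]
)
labels = classify_ips(sample)
```

The labels are guesses to prioritize follow-up, not conclusions.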
Size Matters • Too few e-mails result in skewed statistics and are unlikely to catch social network memberships. We need a good sample size. • Ultimately we want a decent number of targets, each with a unique tag, so we need to be able to craft potentially hundreds of unique e-mails. • We also need to handle the responses for all of the requests we receive, check their tags, etc. • So…