Cybersecurity Research Overview

Cybersecurity Research Overview Victor 1/6/2014

Outline • Introduction • Types of Research • Systems Research • Malware Analysis • Botnets • Digital Forensics • Hacker Forum Research • IRC Channel Research • Conclusion

Introduction • As computers become more ubiquitous throughout society, the security of networks and information systems is a growing concern. • An increasing amount of critical infrastructure relies on computers and information technologies. • Advancing technologies have made it much easier for hackers to commit cybercrime than in the past. • At the same time, technologies and methods for committing cybercrime have become more accessible (Radianti & Gonzalez, 2009) and more widely available (Moore & Clayton, 2009) • Legitimate services such as DNS servers and search engines can be misused to facilitate cybercriminal activity

Introduction • With the growing importance of cybersecurity, researchers have pursued two broad areas of cybersecurity research: • Studies to improve system security and malware analysis techniques • New research on observing and analyzing hackers within their communities • Here we discuss the various forms of cybersecurity research • Both technical and hacker community-focused studies • Including tools that can be used to conduct your own analyses and research

Types of Research • There are various forms of cybersecurity research ranging from technical research to sociological studies: • Systems & Network Security • Malware Analysis • Botnet Research • Digital Forensics • Hacker Forums • Hacker IRC Networks • Traditional cybersecurity research has focused on technological challenges and improvements to mitigate cyberattacks (Hopper et al, 2009; Holt & Kilger, 2012) • Systems and security research for purposes such as intrusion detection systems, autonomous networks, etc. (García-Teodoro et al, 2009; Dsouza et al, 2013) • Improved malware analysis techniques to detect more advanced malware that may be obfuscated or previously unknown (Cova et al, 2010; Ismail & Zainal, 2012) • Botnet tracking and identification (Lu & Ghorbani, 2009; Zhang et al, 2011)

Types of Research • Such focus on technological improvements to enhance security has largely dominated past cybersecurity research • However, in comparison to more technical works, relatively little research has investigated hackers themselves and the human element behind cybercrime • More research on black hat hackers, i.e. cybercriminals, would offer new knowledge on securing cyberspace against those with malicious intent (Siponen et al, 2010) • Specifically, developing “methods to model adversaries” is one of the critical but unfulfilled research needs recommended in the “Trustworthy Cyberspace” report by the National Science and Technology Council (National Science and Technology Council, 2011)

Types of Research • As a result, many recent studies in cybersecurity have taken different paths to study cyber adversaries • Content and topological analysis of hacker forums • Observation of hacker Internet Relay Chat (IRC) interactions • Hacker social media, such as forums and IRC channels, are important resources for many cybercriminals • Since hacking knowledge is not typically found in formal education, the use of web-based resources to advance skills and knowledge is common among both black and white hats • Hackers often use forums and IRC channels to disseminate hacking knowledge (Radianti et al, 2009; Motoyama et al, 2011) • Forums and IRC channels also serve as black markets, where cybercriminal assets are traded and sold (Radianti et al, 2009; Holt & Lampke, 2010) • Each type of research is valuable and necessary for improving the overall security of cyber infrastructure

Systems Security • Improving the security mechanisms incorporated into systems and networks has been a traditional focus of security research • Automated and integrated management of cyber infrastructure, including intrusion detection systems and autonomous networks (Chen et al, 2007; Aydin et al, 2009) • Protocol-level security to mitigate known security vulnerabilities (Pervaiz et al, 2010) • Research often consists of collecting data and performing experiments by simulating networks and systems • For example, collecting network traffic data under normal operations and comparing it to network traffic data during simulated cyberattacks can help in developing anomaly detection methods (García-Teodoro et al, 2009) • Wireshark (http://www.wireshark.org/) is a tool commonly used for packet capture and analysis
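To make the normal-versus-attack comparison concrete, the sketch below learns a baseline from per-interval packet counts recorded during normal operation and flags intervals that deviate from it by more than a chosen number of standard deviations. It is a minimal illustration, assuming the counts have already been extracted from packet captures (e.g., with Wireshark); the function names, the threshold of 3, and the sample numbers are hypothetical, and practical anomaly detection systems use far richer features.

```python
from statistics import mean, stdev

def fit_baseline(normal_counts):
    """Learn the mean and sample standard deviation of per-interval packet
    counts observed under normal operations."""
    return mean(normal_counts), stdev(normal_counts)

def flag_anomalies(counts, baseline, threshold=3.0):
    """Return indices of intervals whose count lies more than `threshold`
    standard deviations away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # Perfectly constant baseline: any different value is anomalous.
        return [i for i, c in enumerate(counts) if c != mu]
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical packets-per-second counts.
normal = [120, 118, 125, 130, 122, 119, 127, 121]   # normal operations
during_test = [123, 126, 980, 1010, 124]            # simulated attack window
print(flag_anomalies(during_test, fit_baseline(normal)))  # [2, 3]
```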

Systems Security [Figure: Automated and Integrated Management (AIM) methodology for cyber-infrastructure (Dong et al, 2003), with example responses: close ports, change policies, isolate router]

Systems Security [Figure: Distribution of normal vs. abnormal system calls for anomaly detection (Qu et al, 2005). Axes: time vs. system call; normal and abnormal transactions are plotted, with the fault injection point marked.]
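One classic way to use system call data for anomaly detection is sliding-window sequence matching: record every short sequence of calls seen during normal runs, then score a new trace by the fraction of its windows that were never seen before. The sketch below is illustrative only and is not necessarily the method used by Qu et al (2005); the window length k = 3, the function names, and the traces are hypothetical.

```python
def windows(trace, k):
    """All length-k sliding windows over a system call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def build_profile(normal_traces, k=3):
    """Set of every length-k system call sequence observed in normal runs."""
    profile = set()
    for trace in normal_traces:
        profile.update(windows(trace, k))
    return profile

def anomaly_score(trace, profile, k=3):
    """Fraction of the trace's windows that never appeared in normal runs
    (0.0 = fully familiar, 1.0 = entirely unseen behavior)."""
    ws = windows(trace, k)
    if not ws:
        return 0.0
    return sum(w not in profile for w in ws) / len(ws)

normal_runs = [
    ["open", "read", "mmap", "read", "close"],
    ["open", "read", "read", "close"],
]
profile = build_profile(normal_runs)
print(anomaly_score(["open", "read", "mmap", "read", "close"], profile))  # 0.0
# Two of the three windows below contain the unseen "execve" call.
print(anomaly_score(["open", "read", "mmap", "execve", "close"], profile))
```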

Systems Security • Systems security research is becoming increasingly important as computers become more prevalent throughout society • Security concerns over SCADA systems, which control the electric grid, water distribution, and other industrial processes, are growing as these systems increasingly rely on cyber infrastructure (Goel, 2011) • Cloud services and infrastructure have grown rapidly in recent years, necessitating stronger security practices (Ramgovind et al, 2010; Rong et al, 2013) • In particular, these areas present a new set of challenges for security researchers • SCADA systems often run custom firmware or other software requiring specialized knowledge or new skillsets from researchers • Cloud services and service-oriented architecture (SOA) are of great concern due to their exposure on the Internet and their need to remain online • Port scanners such as Nmap (http://nmap.org) are often used in security audits of such systems
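Port scanners such as Nmap implement many scan types; the simplest is a TCP connect scan (Nmap's `-sT`), which just attempts a full connection to each port. A minimal sketch using only Python's standard library follows. `scan_ports` is a hypothetical helper, not part of Nmap, and it should only be pointed at systems one is authorized to audit.

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Minimal TCP connect scan: return the ports on `host` that accept a
    full TCP connection within `timeout` seconds."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

For example, `scan_ports("127.0.0.1", [22, 80, 443])` lists which of those ports accept connections on the local machine. Nmap adds much more (SYN scans, service and OS detection, timing controls), which is why it remains the usual choice for real audits.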

Systems Security • There is growing interest in further developing: • Resilient systems that can automatically mitigate and circumvent cyberattacks (Dsouza et al, 2013) • Moving Target Defense, i.e. evolving defenses that can counter changing and improving cyberattacks (Carvalho et al, 2012) • While improving system and network security can help cyber infrastructure withstand and recover from cyberattacks, research in other areas of security is also valuable • Understanding more about the malware deployed against cyber infrastructure could aid in the development of effective cyber defenses

Malware Analysis • To improve systems security, some researchers focus on developing better defenses against malware (Shabtai et al, 2011; Sahs & Khan, 2012) • Increasingly advanced malware variants are appearing in the wild • Affecting servers, computers, mobile phones, etc. • Two forms of malware analysis (Willems et al, 2007; Ismail & Zainal, 2012) • Dynamic analysis: executing malware and observing run-time behaviors, system calls, registry edits, etc. • Static analysis: studying malware source code or opcodes (operation codes) without executing the malware

Malware Analysis • By its nature, dynamic analysis infects the computers used for analysis • Requires controls and security measures to prevent malware from spreading across the network • Can be time- and resource-intensive • May miss hidden behaviors if the malware does not execute all of its code during observation • Conversely, static analysis does not require malware execution • Source code or opcodes can be analyzed without running the malware • The full malware source can be analyzed, revealing code that might be hidden and only executed under special circumstances • However, obfuscated code can be difficult to analyze and understand • The two techniques are useful in different contexts and complement each other well
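Two of the most basic static analysis steps need no execution at all: fingerprinting a sample with cryptographic hashes and extracting printable strings, which often expose URLs, API names, or registry keys. A minimal sketch follows; the function name, the 4-character minimum, and the sample bytes are hypothetical. Obfuscated or packed samples typically yield few useful strings, which reflects the limitation of static analysis noted above.

```python
import hashlib
import re

def static_profile(data: bytes, min_len: int = 4):
    """Inspect a sample without executing it: hash it for identification and
    pull out runs of printable ASCII characters (similar in spirit to the
    Unix `strings` utility)."""
    printable_run = rb"[\x20-\x7e]{%d,}" % min_len
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "strings": [m.group().decode("ascii")
                    for m in re.finditer(printable_run, data)],
    }

# Hypothetical sample bytes standing in for a captured binary.
sample = b"MZ\x90\x00\x03http://example.invalid/gate.php\x00\x00GetProcAddress\x00ab"
report = static_profile(sample)
print(report["strings"])  # ['http://example.invalid/gate.php', 'GetProcAddress']
```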

Malware Analysis • Data is often collected through the use of honeypots • Honeypots are computers or clients set up to attract and log cyberattacks in real time • They often emulate, or are exposed to, live security vulnerabilities in order to capture malware and monitor cyberattacks • Can be used to better understand threats “in the wild” • Two types of honeypots exist (Zhuge et al, 2008; Cova et al, 2010): • Low-interaction honeypots: Emulate known vulnerabilities to capture malware payloads and hacker behavior. The honeypot machine is not actually compromised, so only a limited amount of data is captured. Multiple low-interaction honeypots can be hosted simultaneously on one machine. • High-interaction honeypots: Allow a full operating system to be compromised in order to gather more data on attacker patterns. Can reveal previously unknown exploits, since the honeypot does not rely on emulating already known vulnerabilities. However, real infection increases security risks, and high-interaction honeypots require more computing resources.
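The core of a low-interaction honeypot can be sketched in a few lines: listen on a port, present a fake service banner, record who connected and what they sent first, and hang up without executing anything. This is a deliberately tiny illustration assuming a Telnet-like service on port 2323; the banner, port, function names, and log format are hypothetical, and real low-interaction honeypots emulate specific vulnerabilities in order to capture malware payloads.

```python
import datetime
import json
import socket

BANNER = b"Ubuntu 12.04 LTS\r\nlogin: "  # hypothetical fake service prompt

def handle_connection(conn, peer, banner=BANNER, timeout=5.0):
    """Send a fake banner, capture the client's first message, then hang up.
    Nothing the client sends is ever executed."""
    with conn:
        conn.settimeout(timeout)
        conn.sendall(banner)
        try:
            data = conn.recv(4096)
        except socket.timeout:
            data = b""
    return {
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "src_ip": peer[0],
        "src_port": peer[1],
        "data": data.decode("utf-8", errors="replace"),
    }

def run_honeypot(host="0.0.0.0", port=2323, log_path="honeypot.jsonl"):
    """Accept connections forever, appending one JSON record per attempt."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, peer = srv.accept()
            record = handle_connection(conn, peer)
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps(record) + "\n")
```

Calling `run_honeypot()` blocks and appends one JSON record per connection attempt to `honeypot.jsonl`.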

Malware Analysis • Many honeypot tools are developed and made available by The Honeynet Project (http://www.honeynet.org/) • International team of volunteer security researchers and practitioners • Investigate cyberattacks and discover new exploits • Develop tools to improve Internet security • All projects are open source and available for free • Low-interaction and high-interaction honeypots • Tools for other security applications • Open source tools provided by the Honeynet Project, as well as other sources, can be used to implement honeypot systems

Malware Analysis • To build a low-interaction honeypot with malware capturing capabilities, deploy the following tools simultaneously on a Linux-based machine:

Malware Analysis • Unfortunately, high-interaction honeypot tools are scarce • Much more complicated than low-interaction honeypots • Require significantly more resources to implement and maintain • Strict safeguards must be built around the honeypot to ensure network security • Popular high-interaction honeypot package: Capture-HPC • Developed by the Honeynet Project • Problem: last updated in 2008 • Runs virtual machines as honeypot systems, but has trouble interfacing with the latest virtualization software (e.g. VMware, VirtualBox) due to the lack of recent updates • One can build one's own high-interaction honeypot by deploying vulnerable machines with system-level logging • System-level logging generally requires operating system kernel hooks • Difficult to implement for most individuals • As a result, many researchers and practitioners opt for low-interaction honeypots with malware capture capability

Malware Analysis • Preliminary study presented at IEEE Intelligence and Security Informatics, 2013 (Benjamin & Chen, 2013) • Both low-interaction and high-interaction honeypots can be configured to capture shellcode samples used by cyber attackers • When several honeypots are deployed, a large volume of shellcode samples can potentially be captured • Samples become difficult to analyze as volume increases • We collected nearly 4,000 malicious source code and shellcode samples from an exploit-sharing website • Four distinct attack vector categories: local memory attacks, remote code execution attacks, web application exploits, and denial of service • Several shellcode samples are similar to potential honeypot captures • This motivated us to develop an automated technique to classify samples by attack vector category

Malware Analysis [Figure callouts: program loads library for network communications; shellcode: low-level instructions to access the vulnerable application’s memory space] An example of a Perl exploit that attempts a remote buffer overflow attack on popular enterprise mail server software for Windows and Unix. Malicious code such as this can be difficult for researchers to interpret in their explorations. Automated static analysis tools can help in such scenarios.

Malware Analysis • Prior research notes that feature selection for malware analysis is difficult • We use a hybrid-GA approach, pairing a genetic algorithm with a classifier to select features based on how much they help classify samples accurately • Features are based on the semantic contents of sample files • Samples are run through a series of classification experiments • Compared SVM and C4.5 decision tree algorithms for classification across a series of experiment configurations; accuracy averaged 86% • The experiment could be extended to include real honeypot shellcode samples and a more robust GA or other feature selection technique
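The general idea of GA-based (wrapper) feature selection can be sketched as follows: each individual is a 0/1 mask over the features, and its fitness is the accuracy a classifier achieves using only those features. This is an illustrative sketch, not the study's implementation: to stay self-contained it uses a leave-one-out nearest-centroid classifier in place of SVM or C4.5, and the function names, population size, mutation rate, and toy data are hypothetical.

```python
import random

def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier that only looks
    at the features switched on in `mask` (a stand-in for SVM / C4.5)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Class centroids computed from every sample except the held-out one.
        totals, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            total = totals.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                total[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            centroid = [t / counts[label] for t in totals[label]]
            return sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroid))

        if min(totals, key=dist) == y[i]:
            correct += 1
    return correct / len(X)

def ga_select(X, y, fitness=nearest_centroid_accuracy, pop_size=20,
              generations=30, mutation_rate=0.05, seed=0):
    """Evolve feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(X[0])  # needs at least 2 features for one-point crossover
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]  # selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)                     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)

# Toy data: feature 0 separates classes A and B; features 1 and 2 are noise.
X = [[0.0, 0.3, 0.9], [0.5, 0.8, 0.1], [1.0, 0.2, 0.6],
     [10.0, 0.7, 0.4], [10.5, 0.1, 0.8], [11.0, 0.9, 0.2]]
y = ["A", "A", "A", "B", "B", "B"]
best_mask, accuracy = ga_select(X, y)
print(best_mask, accuracy)
```

In the toy data only the first feature separates the classes, so the search should settle on a mask that includes it. Adding a small penalty per selected feature to the fitness is a common way to also favor smaller feature sets.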

Botnets • Malware captured by honeypots can sometimes reveal botnets • Outbound network traffic generated by malware may connect to a botnet command and control (C&C) channel • These channels are used by cybercriminal “botmasters” to give commands to collections of malware-infected computers, which covertly join an IRC channel and wait for instructions.

Botnets • C&C identification techniques have generally relied on honeypots • Honeypots are systems configured to simulate computer systems with software vulnerabilities • They allow malware in the wild to exploit honeypot vulnerabilities, so that malware behaviors can be captured and studied in a sandboxed environment (Rajab et al, 2006; Lu et al, 2009) • All code execution, system changes, and network traffic are tracked and logged within a honeypot (Mielke & Chen, 2008; Zhu et al, 2008) • By observing outbound network traffic generated by malware, researchers may reveal botnet C&C channels and other hacker-related web addresses.
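Once a honeypot's outbound payloads are available (e.g., exported from a Wireshark capture), a first pass at spotting IRC-based C&C can look for lines that begin with IRC commands. The sketch below is a heuristic under stated assumptions: it ignores encrypted or non-IRC C&C, and it deliberately leaves out the IRC `USER` command because FTP uses a command with the same name; the function name, addresses, and sample traffic are hypothetical.

```python
import re

# IRC commands a bot typically sends when connecting to a C&C channel.
IRC_PATTERN = re.compile(rb"^(NICK|JOIN|PRIVMSG)\s+(\S+)", re.MULTILINE)

def find_irc_activity(payloads):
    """payloads: iterable of (destination, payload_bytes) from outbound traffic.
    Returns {destination: [(command, argument), ...]} for IRC-looking traffic."""
    hits = {}
    for dest, payload in payloads:
        for cmd, arg in IRC_PATTERN.findall(payload):
            hits.setdefault(dest, []).append(
                (cmd.decode(), arg.decode(errors="replace")))
    return hits

outbound = [  # hypothetical honeypot captures
    (("198.51.100.7", 6667), b"NICK bot123\r\nUSER bot123 0 * :x\r\nJOIN #cc\r\n"),
    (("192.0.2.10", 80), b"GET / HTTP/1.1\r\nHost: example\r\n\r\n"),
]
print(find_irc_activity(outbound))
# {('198.51.100.7', 6667): [('NICK', 'bot123'), ('JOIN', '#cc')]}
```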

Botnets • There are two common techniques for collecting IRC chat data, both of which involve logging chat in real time: • Logging IRC chat in real time, manually or using automated bots (Fallmann et al, 2010) • Scraping IRC packet contents from a honeypot’s local network traffic (Lu et al, 2009) • Several strategies can help bots collect data effectively and comprehensively (Fallmann et al, 2010): • Swap strategy: some IRC channels automatically disconnect users who appear idle, so it can be useful to periodically rotate bots into different IRC channels for logging, avoiding some problems with idling • Multiple bots in the same channel help ensure comprehensive collection in case some bots get disconnected • Packet scraping requires network traffic analyzer software • Wireshark can be used for this purpose
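The swap and redundancy strategies can be combined in a simple round-robin schedule: each round shifts every bot to another channel, while each channel always has a fixed number of bots assigned. The sketch below only computes the schedule; the function name and the bot and channel names are hypothetical, and this is one possible reading of the strategies rather than Fallmann et al's exact procedure.

```python
def rotation_schedule(bots, channels, rounds, bots_per_channel=2):
    """Assign logging bots to channels for each of `rounds` time slots.

    Every channel always gets `bots_per_channel` bots (redundancy against
    disconnects), and each round shifts every bot to the next channel
    (the swap strategy), so no bot stays in one channel indefinitely."""
    needed = len(channels) * bots_per_channel
    if len(bots) < needed:
        raise ValueError("not enough bots for the requested redundancy")
    schedule = []
    for r in range(rounds):
        assignment = {ch: [] for ch in channels}
        for i, bot in enumerate(bots[:needed]):
            assignment[channels[(i + r) % len(channels)]].append(bot)
        schedule.append(assignment)
    return schedule

for slot, assignment in enumerate(
        rotation_schedule(["b0", "b1", "b2", "b3"], ["#a", "#b"], rounds=2)):
    print(slot, assignment)
```

Because an IRC client can sit in several channels at once, a bot can join its next channel before leaving the old one, avoiding a gap in coverage during each swap.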

Botnets • Different forms of analysis should be used depending on research goals and data. For example, the goals and methods used for analysis differ between: • Botnet research with data from command & control channels • Research on IRC channels affiliated with hacker forums or acting as social hubs • As with hacker forums, the simplest method of analysis is to manually sift through the data (Franklin et al, 2007; Fallmann et al, 2010; Motoyama et al, 2011) • Automated content and network analyses could also be extended to IRC datasets when studying hacker IRC channels • Can reveal emerging threats and popular tools and methods • May help with attack attribution

Botnets • For botnet C&C channels, there are common themes for analysis: • Characterizing botmaster activity • Paxton et al (2011) investigate the different operational styles of botmasters by computing usage statistics for each botmaster • Mielke & Chen (2008) use clustering to identify potential collaboration between botmasters based on their participation across different known C&C channels • Identifying botnets based on network traffic • Much research is devoted to analyzing honeypot captures and network logs to develop new techniques to combat evolving botnets (Lu et al, 2009; Choi & Lee, 2012) • Botnets are becoming increasingly sophisticated in evading detection

Botnets • Published in IEEE Intelligence and Security Informatics, 2008 (Mielke & Chen, 2008) • A botnet monitoring group, the ShadowServer Foundation, provided the AI Lab with logs from multiple botnet IRC command & control channels. • Text mining techniques were used to differentiate bot masters from connected zombie computers • Bot master names were tracked across all channels • Several names appeared frequently across the data set • By clustering bot masters according to their channel participation, potential collaboration between bot masters can be identified • The roles of individuals within each group, and the overall operational style of each group can be identified by further analyzing C&C logs • Additionally, logs could be used to identify C&C activity patterns; this could help automatically identify future C&C channels
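Clustering bot masters by channel participation can be sketched by measuring how much two bot masters' channel sets overlap (Jaccard similarity) and grouping those above a threshold. This is a minimal illustration, not necessarily the clustering method Mielke & Chen used; the function names, the threshold of 0.5, and the names and channels are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two sets: |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_botmasters(participation, threshold=0.5):
    """participation: {botmaster_name: set_of_channels}.
    Links every pair with Jaccard similarity >= threshold and returns the
    connected components (single-linkage clusters), largest first."""
    parent = {name: name for name in participation}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(participation, 2):
        if jaccard(participation[a], participation[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in participation:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))

channels_seen = {  # hypothetical bot masters and the C&C channels they used
    "m1": {"#a", "#b", "#c"}, "m2": {"#a", "#b"},
    "m3": {"#x"}, "m4": {"#x", "#y"}, "m5": {"#q"},
}
print(cluster_botmasters(channels_seen))
```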

Digital Forensics • As increasingly complex malware and cyberattacks are deployed by individuals and groups, advancements in digital forensics become necessary to investigate computer crime • Digital forensics entails identifying security failures within a system, as well as preventing future incidents (Hay et al, 2011; Sridhar et al, 2012) • Conducting “postmortem” analysis of cybercrime • Can reveal information about cyber attackers • Usually paired with other malware and botnet analysis techniques

Digital Forensics • Often requires examining file systems, RAM/volatile memory, and network traffic for traces of data pertaining to a cyberattack • Recovered data is often used in the prosecution of cybercriminals or to identify advanced persistent threats • Research opportunity: only a few standards and benchmarks exist for digital forensics investigations (Yates & Chi, 2011) • The proliferation of computing platforms has led to a lack of standard practices; there is no established “science” for forensics on newer operating systems and cyber infrastructure • Growing importance in cloud, mobile, and SCADA systems • Emerging challenges due to growing use of complex encryption and data obfuscation techniques • Much research focuses on new practices and standards
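File carving is one common way to recover data from a raw disk or memory image: search the bytes for known file signatures ("magic numbers") regardless of what the file system says. The sketch below only locates headers; real carving tools also find footers or parse length fields to extract complete files. The signature table is a small, hand-picked sample and the function name is hypothetical.

```python
# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
    b"\x7fELF": "elf",
    b"\xff\xd8\xff": "jpeg",
}

def find_signatures(image: bytes):
    """Scan raw bytes (e.g. a disk or memory image) for known file headers and
    return (offset, type) pairs sorted by offset. Headers can be found even
    when the file system entries pointing at them have been deleted."""
    hits = []
    for magic, kind in SIGNATURES.items():
        start = image.find(magic)
        while start != -1:
            hits.append((start, kind))
            start = image.find(magic, start + 1)
    return sorted(hits)

# Hypothetical image fragment: a PDF header at offset 16, a PNG header at 32.
image = b"\x00" * 16 + b"%PDF-1.4" + b"\x00" * 8 + b"\x89PNG\r\n\x1a\n" + b"junk"
print(find_signatures(image))  # [(16, 'pdf'), (32, 'png')]
```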

Digital Forensics • For hands-on experience, SANS Institute offers a version of Linux pre-loaded with digital forensics tools (http://computer-forensics.sans.org/community/downloads) • Other tools:

Hacker Forum Research [Figure: screenshots from Hackhound.org (left) and Unpack.cn (right), with callouts for the hacking tool interface, description of code functionality, hacker’s reputation score, embedded sample of code, and attached hacking tool] Left: A cybercriminal on hackhound.org publishes the latest version of his hacking tool, meant to help others steal cached passwords from victims’ computers. Right: A hacker in the Chinese community Unpack.cn posts sample code demonstrating how to reverse engineer software written with the Microsoft .NET framework.

Hacker Forum Research • Hacker forums can be useful to researchers for various reasons: • Assess emerging threats and their prevalence in hacker social media • Observe black market activity • Track the cybercriminal supply chain and how assets move throughout the global hacker community • Study hackers across different geopolitical regions • Unfortunately, hacker forum data is hard to obtain, as many hacker communities employ anti-crawling features (Fallmann et al, 2010; Goel, 2011) • No hacker forum datasets are available to researchers • Anti-crawling measures, such as bandwidth monitoring or detection of bot-like behaviors, prevent many researchers from using automated techniques to build a dataset • Thus, most current studies rely on manual data collection (Holt, 2010; Yip, 2011).

Hacker Forum Research • To employ automated collection, anti-crawling measures must be circumvented • Reduce bot-like behaviors during collection • Practice identity obfuscation • We may also want to mask our true identity • Reducing the crawling rate is useful for circumventing anti-crawling measures that monitor bandwidth usage or page views • To mask our identity, we can route traffic through proxy servers or peer-to-peer networks • This can even let us regain access to forums that have banned our IP address • Stand-alone web proxies can be used for traffic routing and identity obfuscation • Peer-to-peer networks, such as the Tor network, offer services similar to stand-alone web proxies, with added capabilities
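The two evasion steps above (slowing down and routing through a proxy) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the proxy address, delay values, and function names are hypothetical, and `urllib` only supports HTTP(S) proxies, so reaching Tor, whose client exposes a local SOCKS proxy (port 9050 by default), would require a SOCKS-capable library or an HTTP-to-SOCKS bridge.

```python
import random
import time
import urllib.request

def make_opener(proxy_url=None):
    """Build a URL opener; if proxy_url is given (e.g. "http://127.0.0.1:8080",
    a hypothetical local HTTP proxy), HTTP and HTTPS requests go through it."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler(
            {"http": proxy_url, "https": proxy_url}))
    return urllib.request.build_opener(*handlers)

def jittered_delays(n, base=10.0, jitter=5.0, seed=None):
    """Seconds to wait after each of n requests: a slow base interval plus a
    random extra, so requests do not arrive at a fixed, bot-like rhythm."""
    rng = random.Random(seed)
    return [base + rng.uniform(0, jitter) for _ in range(n)]

def crawl(urls, opener, base=10.0, jitter=5.0):
    """Fetch each URL in turn through `opener`, pausing between requests."""
    pages = {}
    for url, delay in zip(urls, jittered_delays(len(urls), base, jitter)):
        with opener.open(url, timeout=30) as resp:
            pages[url] = resp.read()
        time.sleep(delay)
    return pages
```

For example, `crawl(thread_urls, make_opener("http://127.0.0.1:8080"))` would fetch a hypothetical list of thread URLs at roughly one request every 10 to 15 seconds.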

Hacker Forum Research [Figure: Traditional proxy server configuration]


Hacker Forum Research Various screenshots of the graphical Tor controller Vidalia. Left: A map allows users to view the locations of all published Tor relay nodes Middle: A real-time log of Tor network events allows users to monitor Tor activity. The Tor client automatically handles many Tor networking functions Right: A basic interface that allows Tor users to quickly assume a new identity by routing traffic through a new circuit. Applications such as web browsers and crawlers can utilize the Tor network by routing their network traffic to the local Tor client.


Hacker Forum Research • After hacker forum contents are collected, they can be analyzed using traditional social media techniques • Can make use of commonly used text mining tools • Content analysis is useful for understanding the discussions and information inside hacker social media • Topological analyses often aim to observe hacker forum structure and the relationships between forum participants (Motoyama et al, 2011; Holt et al, 2012)
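As a small illustration of both analysis styles, the sketch below counts frequent terms across posts (content analysis) and builds a reply network to see who receives the most replies (a simple topological measure, in-degree). The post schema (`id`, `author`, `text`, `reply_to`), the stopword list, the function names, and the posts themselves are hypothetical; real studies would use dedicated text mining and network analysis tools.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
             "is", "it", "this", "with", "i", "you"}

def top_terms(posts, k=10):
    """Content analysis: the k most frequent non-stopword terms across posts."""
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z0-9]+", post["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)

def reply_indegree(posts):
    """Topological analysis: in a reply network (edge from a replier to the
    author being answered), count how many replies each participant receives."""
    author_of = {p["id"]: p["author"] for p in posts}
    indegree = Counter()
    for p in posts:
        target = author_of.get(p["reply_to"])
        if target is not None and target != p["author"]:
            indegree[target] += 1
    return indegree.most_common()

posts = [  # hypothetical scraped posts
    {"id": 1, "author": "alice", "text": "New crypter released, undetected by AV", "reply_to": None},
    {"id": 2, "author": "bob", "text": "Does the crypter work on Windows 7?", "reply_to": 1},
    {"id": 3, "author": "carol", "text": "Crypter works, thanks", "reply_to": 1},
    {"id": 4, "author": "alice", "text": "Yes, works on Windows 7", "reply_to": 2},
]
print(top_terms(posts, 3))
print(reply_indegree(posts))  # [('alice', 2), ('bob', 1)]
```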

Hacker Forum Research [Figure callouts: description of attached hacking tutorial; password to open attached file; password-protected file containing tutorial documents; hacker forum reputation system] Iranian hacker forum participant ‘elvator’ is sharing a tutorial on shellcode, which refers to cyberattack payloads that grant hackers unauthorized access to compromised machines. This hacker has gained a total of 20,305 reputation points from his peers over 1,641 messages posted, which is above average for Ashyane.org.

Hacker Forum Research [Figure callouts: hacker forum reputation score; screenshot of vulnerability scanning tool; tool download link; participant explicitly asks others to give him reputation points] A participant in the Russian hacker forum Xekapok.net shares a vulnerability scanning tool with others. This participant’s message is relatively “media rich” compared to other forum posts due to its use of images, font styling, and attachments. Additionally, this participant has a high reputation score and thus appears to be well established in the Xekapok.net community.

Hacker Forum Research • Preliminary hacker reputation study presented at IEEE Intelligence and Security Informatics, 2012 (Benjamin & Chen, 2012) • Collected data from two hacker communities, one from the United States and one from China, to examine the mechanisms by which key actors emerge within forums • Both communities featured reputation systems • How did hackers earn high levels of reputation among their peers? • Found that hackers who participated frequently and contributed the most to the cognitive advancement of their community had the highest reputation

Hacker Forum Research • The main challenges in hacker forum research are: • Identifying data sources • Collecting complete datasets • For researchers who are not security experts, some subject matter may be difficult to interpret • After data collection, hacker forum research can use the same text mining techniques as traditional social media research • Topic modeling • Forum participant analysis • Social network analysis • Etc.

IRC Channel Research • Internet Relay Chat (IRC) is a protocol for real-time, multi-user text chat • Hackers use IRC channels to communicate in real time through text chat (Mielke & Chen, 2008; Motoyama et al, 2011) • Sometimes affiliated directly with hacker forums • Other times they are independent communities accessible only through IRC • Contents can be analyzed through traditional text mining techniques • IRC consists of three major components: • IRC networks (i.e. servers) • Chat channels existing within IRC networks • IRC clients, or users • Understanding these three components is important for developing data collection methods

IRC Channel Research • IRC Networks • Usually identified by an address such as irc.domain.com • An IRC network generally consists of one server, or a network of servers directly connected to one another • Servers share information with one another, such as user information, existing channels, chat information, etc. • New servers can be added to an existing network to scale up network capacity • Different IRC networks are completely independent of one another • Every IRC channel exists within an IRC network

IRC Channel Research • Public vs Private networks • Network accessibility has many implications for data collection • If hackers decide to host their channel on a public network, it is theoretically possible to collect data from that channel by volunteering a server to support the network; many public networks are entirely volunteer-run • One limitation to volunteering a server to a public IRC network is that public IRC networks often require very significant bandwidth capacity (hundreds of GBs of transfer per month) • Conversely, if a hacker-related IRC channel is hosted within a private network, it is unlikely that we will be able to volunteer a server to the network. Client-bots can be used to collect data from such channels
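A client-bot logger mostly needs to speak a small subset of the IRC protocol (RFC 1459 / RFC 2812): register with `NICK` and `USER`, wait for the server's welcome reply (numeric `001`) before joining channels, answer `PING` with `PONG` so the connection is not dropped, and record `PRIVMSG` lines addressed to a channel. The sketch below assumes a plaintext connection and omits error handling such as nickname collisions; the function names and log format are hypothetical.

```python
import datetime
import json
import socket

def parse_line(line):
    """Split one IRC protocol line into (prefix, command, params)."""
    prefix = ""
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    if " :" in line:  # everything after " :" is one trailing parameter
        head, _, trailing = line.partition(" :")
        params = head.split() + [trailing]
    else:
        params = line.split()
    command = params.pop(0).upper() if params else ""
    return prefix, command, params

def handle_line(line, channels):
    """Return (lines to send back, log record or None) for one server line."""
    prefix, command, params = parse_line(line)
    if command == "PING":                      # keep-alive: answer or be dropped
        return [f"PONG :{params[0] if params else ''}"], None
    if command == "001":                       # RPL_WELCOME: registration done
        return [f"JOIN {ch}" for ch in channels], None
    if command == "PRIVMSG" and len(params) == 2 and params[0].startswith("#"):
        return [], {
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "channel": params[0],
            "nick": prefix.split("!", 1)[0],
            "text": params[1],
        }
    return [], None                            # private messages etc. ignored

def run_logger(server, port, nick, channels, log_path):
    """Connect, register, and append channel messages to a JSON-lines file."""
    with socket.create_connection((server, port)) as sock:
        with open(log_path, "a", encoding="utf-8") as log:
            sock.sendall(f"NICK {nick}\r\nUSER {nick} 0 * :{nick}\r\n".encode())
            buffer = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:                  # server closed the connection
                    break
                buffer += chunk
                *lines, buffer = buffer.split(b"\r\n")  # keep partial line
                for raw in lines:
                    text = raw.decode("utf-8", errors="replace")
                    replies, record = handle_line(text, channels)
                    for reply in replies:
                        sock.sendall((reply + "\r\n").encode())
                    if record:
                        log.write(json.dumps(record) + "\n")
                        log.flush()
```

For example, `run_logger("irc.example.net", 6667, "logbot42", ["#somechannel"], "chat.jsonl")` (all values hypothetical) would append one JSON line per channel message. Private messages exchanged between other users never reach the bot, which is why this method cannot collect them.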

IRC Channel Research • IRC Channels • IRC channels are usually separated by topic • Channel naming convention is #ChannelName • Each channel exists within a single IRC network • Two channels with the same name on different networks are two different channels • Two channels within the same network cannot share the same name • A list of all users connected to a particular channel is provided to each channel participant • Chat messages are broadcast to everyone within a channel

IRC Channel Research An example of a hacker IRC channel. A list of users, their messages, and timestamps for each message can be seen. The participants are discussing sqlmap, a tool for automated SQL injection and database hijacking, as well as programming concepts. The top header also includes links to other IRC channels affiliated with this one.

IRC Technical Information • IRC Users • Connect to IRC servers, can join multiple channels simultaneously • Can broadcast messages to all other users within channels • Can initiate private messages with other users that are hidden from all other chat participants • Such private messages cannot be collected with the client-bot method of collection • They can be collected when hosting a server, though many public IRC networks have privacy rules that prohibit server operators from such behavior
