Data Management Technologies Ohm Sornil Department of Computer Science National Institute of Development Administration
Databases • A database is a structured collection of records or data stored in a computer so that a computer program can consult it to answer queries • The computer program used to manage and query a database is known as a database management system (DBMS)
SQL • It is the standard language for relational systems • Supports • Data definition • CREATE TABLE, ALTER TABLE • Data manipulation • SELECT, INSERT, DELETE, UPDATE
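The statements listed above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `student` table, its columns, and the sample rows are hypothetical and not part of the original slides.

```python
import sqlite3

# In-memory database; the "student" table and its data are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition: CREATE TABLE, ALTER TABLE
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("ALTER TABLE student ADD COLUMN gpa REAL")

# Data manipulation: INSERT, UPDATE, DELETE, SELECT
cur.executemany(
    "INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ann", 3.2), (2, "Bob", 2.8), (3, "Cho", 3.9)],
)
cur.execute("UPDATE student SET gpa = 3.0 WHERE id = 2")
cur.execute("DELETE FROM student WHERE id = 1")
rows = cur.execute("SELECT id, name, gpa FROM student ORDER BY id").fetchall()
print(rows)
conn.close()
```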
Business Intelligence (BI) • Make use of enterprise-wide data to enable strategic decision making
Data Warehousing • A database • is designed (and optimized) to record data • Running complex SQL queries takes a lot of time on such a system • A data warehouse • is designed (and optimized) to respond to analysis questions that are critical for your business (i.e., read-optimized)
E-R Diagram (DB Data Model) Dimension Model (DW Data Model)
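One common form of a dimension model is a star schema: a fact table of measurements surrounded by dimension tables that describe them. The sketch below assumes hypothetical tables (`fact_sales`, `dim_date`, `dim_product`) and made-up rows, and uses `sqlite3` only for convenience. It shows how an analysis question becomes a join from the fact table to its dimensions followed by an aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the context of each measurement.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
-- The fact table holds the measurements plus a key into each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_date VALUES (1, 2023, 4), (2, 2024, 1);
INSERT INTO dim_product VALUES (10, 'Books'), (20, 'Music');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 70.0), (2, 10, 30.0);
""")

# Analysis question: total sales per year and product category.
sales_by_year_category = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
print(sales_by_year_category)
conn.close()
```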
Data Warehousing • Keeping the warehouse separate from application databases ensures that the business intelligence (BI) solution is scalable • Answers questions far more efficiently and frequently • Reduces the 'cost-per-analysis'
Multi-Tiered Architecture • Data Sources: operational DBs and other sources, fed through Extract, Transform, Load • Data Storage: data warehouse, which serves the OLAP server • OLAP Engine: OLAP server • Front-End Tools: analysis, query, reports, data mining
A Data Warehouse • is a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes (W.H. Inmon, 1980)
Data Warehouse Implementation • Dimension modeling • Extraction • Transformation • Data Quality • Loading
Transformation Issues • Format Revisions • Decoding of Fields • Calculated and Derived Values • Splitting of Single Fields • Merging of Information • Character Set Conversion • Conversion of Units of Measurements • Date/Time Conversion • Summarization • Key Restructuring • Deduplication
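A few of the transformation types above can be illustrated on a single source record. This is a sketch under assumptions: the field names, the `GENDER_CODES` table, and the source date format (`MM/DD/YYYY`) are hypothetical.

```python
from datetime import datetime

GENDER_CODES = {"M": "Male", "F": "Female"}  # hypothetical code table


def transform(record):
    """Apply several of the listed transformations to one source record."""
    # Splitting of single fields: "name" becomes first and last name.
    first, _, last = record["name"].partition(" ")
    return {
        "first_name": first,
        "last_name": last,
        # Decoding of fields: cryptic codes become readable values.
        "gender": GENDER_CODES.get(record["gender"], "Unknown"),
        # Conversion of units of measurement: inches to centimetres.
        "height_cm": round(record["height_in"] * 2.54, 1),
        # Date/time conversion: MM/DD/YYYY to ISO 8601.
        "joined": datetime.strptime(record["joined"], "%m/%d/%Y").date().isoformat(),
    }


def deduplicate(records, key_fields):
    """Deduplication: keep the first record seen for each key."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


source = [
    {"name": "Ann Lee", "gender": "F", "height_in": 65, "joined": "03/15/2001"},
    {"name": "Ann Lee", "gender": "F", "height_in": 65, "joined": "03/15/2001"},
]
clean = deduplicate([transform(r) for r in source], ("first_name", "last_name"))
print(clean)
```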
Loading Issues • Initial Load: populating all the data warehouse tables for the very first time • Incremental Load: applying ongoing changes as necessary in a periodic manner • Full Refresh: completely erasing the contents of one or more tables and reloading with fresh data (initial load is a refresh of all the tables)
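The three load modes can be sketched against a single table. The `dim_customer` table and its rows are hypothetical, and `INSERT OR REPLACE` is SQLite's syntax for "insert new keys, overwrite existing ones"; other systems express the same idea differently (for example with `MERGE`).

```python
import sqlite3


def full_refresh(conn, rows):
    """Erase the table's contents and reload it with fresh data."""
    with conn:  # one transaction: readers never see a half-loaded table
        conn.execute("DELETE FROM dim_customer")
        conn.executemany("INSERT INTO dim_customer (id, city) VALUES (?, ?)", rows)


def incremental_load(conn, changes):
    """Apply ongoing changes: add new keys, overwrite changed ones."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dim_customer (id, city) VALUES (?, ?)", changes
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, city TEXT)")

# Initial load: the first population is a refresh of an empty table.
full_refresh(conn, [(1, "Bangkok"), (2, "Chiang Mai")])

# Incremental load: one changed row and one new row.
incremental_load(conn, [(2, "Phuket"), (3, "Khon Kaen")])
after_incremental = conn.execute("SELECT id, city FROM dim_customer ORDER BY id").fetchall()

# Full refresh: previous contents are discarded.
full_refresh(conn, [(9, "Hat Yai")])
after_refresh = conn.execute("SELECT id, city FROM dim_customer ORDER BY id").fetchall()
conn.close()
```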
Loading Issues (Paulraj Ponniah, 2001)
Data Quality • Accuracy • Domain Integrity • Consistency • Redundancy • Conformance to Business Rules • Structural Definiteness • Data Anomaly • Clarity • Timeliness • Usefulness
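Some of these dimensions can be checked mechanically before loading. The sketch below covers only three of them (domain integrity, redundancy, conformance to business rules). The record layout, the allowed gender codes, and the rule "an order cannot ship before it is placed" are assumptions made for illustration.

```python
ALLOWED_GENDERS = {"M", "F"}  # hypothetical domain for the "gender" field


def quality_report(rows):
    """Return (row_index, issue) pairs for rows that fail a check."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Domain integrity: the value must come from the allowed set.
        if row["gender"] not in ALLOWED_GENDERS:
            issues.append((i, "domain integrity: gender"))
        # Redundancy: the same key must not appear twice.
        if row["id"] in seen_ids:
            issues.append((i, "redundancy: duplicate id"))
        seen_ids.add(row["id"])
        # Business rule: ISO date strings compare correctly as text.
        if row["ship_date"] < row["order_date"]:
            issues.append((i, "business rule: shipped before ordered"))
    return issues


rows = [
    {"id": 1, "gender": "F", "order_date": "2024-01-05", "ship_date": "2024-01-07"},
    {"id": 2, "gender": "X", "order_date": "2024-01-06", "ship_date": "2024-01-08"},
    {"id": 1, "gender": "M", "order_date": "2024-01-09", "ship_date": "2024-01-08"},
]
report = quality_report(rows)
print(report)
```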
OLAP • A category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user (The OLAP Council)
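The "wide variety of possible views" can be pictured as different aggregations over the same dimensional facts. The sketch below is a toy illustration in plain Python, not how an OLAP server is implemented; the dimensions (`year`, `region`, `product`) and the figures are made up.

```python
from collections import defaultdict

DIMENSIONS = ("year", "region", "product")

# Each fact: one value per dimension, followed by the measure (sales).
facts = [
    ("2023", "North", "Books", 100),
    ("2023", "South", "Books", 80),
    ("2024", "North", "Books", 120),
    ("2024", "North", "Music", 40),
    ("2024", "South", "Music", 60),
]


def roll_up(facts, dims):
    """Aggregate the measure over every dimension not named in dims."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_facts(facts, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    i = DIMENSIONS.index(dim)
    return [row for row in facts if row[i] == value]


by_year = roll_up(facts, ("year",))
by_region_product = roll_up(facts, ("region", "product"))
regions_2024 = roll_up(slice_facts(facts, "year", "2024"), ("region",))
print(by_year, by_region_product, regions_2024)
```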
Computer Security • Processes and technologies that ensure confidentiality, integrity, and availability (CIA) of information-system assets • Assets • Hardware, software, firmware, and information being processed, stored, and communicated
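Of the three CIA properties, integrity is the easiest to illustrate in a few lines. The sketch below records a SHA-256 digest of some data and later checks whether the data still matches it. It is an assumption-laden illustration: it only helps if the reference digest itself is stored where an attacker cannot change it, and the sample data is made up.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Check the data against a previously recorded digest."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(digest(data), expected_digest)


original = b"balance=1000"
reference = digest(original)  # recorded while the data is known to be good

ok = is_unmodified(b"balance=1000", reference)
tampered = is_unmodified(b"balance=9000", reference)
print(ok, tampered)
```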
How Are Computers and Networks Attacked? • Attackers take advantage of vulnerabilities in operating systems, applications, protocols, communication channels, and humans
Motivations of Attackers • Money • Entertainment • Entrance to social groups/status • Cause/malice Source: Kilger, M., Arkin, O., and Stutzman, J. Profiling. In The Honeynet Project, Know Your Enemy: Learning about Security Threats (2nd ed.). Boston: Addison-Wesley, 2004.
Internal Security Attacks • Far greater cost per occurrence and total potential cost than attacks from outside • Employees, ex-employees, contractors and business partners • Trust and physical access • Motives • Challenge/curiosity • Revenge • Financial gain Source: Kristin Gallina Lovejoy (April 2006) http://www.csoonline.com/read/040106/caveat041206_pf.html
Common Internal Attacks • Sabotage of information or systems • Theft of information or computing assets • Introduction of bad code: time bombs or logic bombs • Viruses • Installation of unauthorized software or hardware • Manipulation of protocol design flaws • Manipulation of operating system design flaws • Social engineering Source: Kristin Gallina Lovejoy (April 2006) http://www.csoonline.com/read/040106/caveat041206_pf.html
Inherent Technology Weaknesses • Many security problems can be traced back to weaknesses in the technology itself • Hackers have exploited many vulnerabilities found in network protocols • For example, in TCP/IP: • Inability to verify the identity of communicating parties • Inability to protect the privacy of data on a network • Some products also have inherent security weaknesses (because not all product developers make security a design priority)
Configuration Weaknesses • Insecure user accounts (such as guest logins or expired user accounts) • System accounts with widely known default, unchanged passwords • Misconfigured Internet services • Insecure default settings within products
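Account-related weaknesses like these can be looked for with a simple audit pass. The sketch below assumes a hypothetical account record with the fields `name`, `enabled`, `expires`, `is_system`, and `password_changed`; real systems expose this information in different ways.

```python
from datetime import date


def audit_accounts(accounts, today):
    """Return (account_name, finding) pairs for risky account settings."""
    findings = []
    for acct in accounts:
        if not acct["enabled"]:
            continue  # disabled accounts cannot be used to log in
        if acct["name"] == "guest":
            findings.append((acct["name"], "guest login enabled"))
        if acct["expires"] is not None and acct["expires"] < today:
            findings.append((acct["name"], "expired account still enabled"))
        if acct["is_system"] and not acct["password_changed"]:
            findings.append((acct["name"], "default password unchanged"))
    return findings


accounts = [
    {"name": "guest", "enabled": True, "expires": None,
     "is_system": False, "password_changed": True},
    {"name": "old_intern", "enabled": True, "expires": date(2023, 12, 31),
     "is_system": False, "password_changed": True},
    {"name": "dbadmin", "enabled": True, "expires": None,
     "is_system": True, "password_changed": False},
    {"name": "alice", "enabled": True, "expires": None,
     "is_system": False, "password_changed": True},
]
findings = audit_accounts(accounts, today=date(2024, 6, 1))
print(findings)
```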
Policy Weaknesses • A policy is a set of rules by which we operate computer systems • Policies generally include • Physical access controls • Logical access controls • Security administration • Security monitoring and audit • Software and hardware change management • Disaster recovery and backup • Business continuity • No single solution should be viewed as providing all the protection you need