INTRODUCTION TO SQL

INTRODUCTION TO SQL

SQL stands for « StructuredQueryLanguage » • Progamminglanguage for databasecloser to natural Englishthan the other (based on « sentence » instead of « procedure ») • Aimis to ease the querying of data by the human and the programmation of interfaces • Powerfulfunctions for text recognition • Powerfull extensions for GIS (PostGIS, Oracle) • Standardized and recognized by most of the recentrelationaldatabaseBUT 1)...minordifferences of syntaxbetweenvendors and enhancedfunctionspreventeasyinteroperabilitybetweenproducts 2) SQL databasesoftenimplythat the development of the interfaces is a distinct from the development of the core of the database

Interoperabilityproblembetweenvendors • possible solutions • use an intermediate layer between the database and the interface • ODBC/JDBC (connectorsused by other software by Windows/Java) • use ORM (Object Relational Mapper) software thatallows the programmer to use the samesyntaxwhendeveloping interfaces e.g : Doctrine (Open-Source)

SQL and NoSQL • SQL isprettyuseful for normalizeddatabasewhere the control of data integrityis important (scientific value) • ...but itis not scalable : hugeamount (> 300 000) of data lowerpeformances) • since 4/5 years, with the explosion of the Internet thereis a trend in NoSQLdatabase; fastrdatabasesthatcanhandlehugeamount of data raplidly, e.g: Solr (to index Words and PDF),MongoDB, Cassandra etc... • NoSQLoffers speed, fastreplicationbetween locations, flexible structure but no control on integrity. It doesn’t replace SQL but complementsit (SQL=> control of the integrity and of the completness of data is more important than speed + good interaction with GIS NoSQL: high availability of data on the Internet but no schema to validateintegrity and not yet GIS plug in ) Problem: scientic information network requiresbothquality control and high availiability

SQL: 4 parts • Data QueryLanguage (DQL) • Search and display data matchingspecificcriteria • Data manipulation language (DML): • modify data (insert, update, delete) • lock (atomicity of data: two user cannotmodify the ame data in parallel) • use transation (rollback to the previous state of the database if a modification fails) • Data Definitionlanguage (DDL) • create the schema of the database (the normalised structure, the index): youcandefinedyourselve how to check the integrity of the database • Data Control language (DCL) • createauthorization and accessrule for users

Vocabulary Table Field Field name Field Type Record (or tuple)

Recommandations To ease the manipulation with SQL whencreating a database: • Avoiduppercaseletters in fieldnames • Avoidaccentedcharacters in fieldnames (but you must keepthem in the content of course!) • replace white spaceswithunderscore • avoidatanypriceothernon alphabetical or numericalcharacters • avoidgiving the samename to twofields in different tables (not always possible...) • table name in plural • fieldnamesingular • use descriptive fieldname (e.g: not ‘dc’ but ‘date_collected’)

Querying Pattern: SELECT<comma-separatedlist of fields>FROM<name of Table> ; e.g. SELECTLocalityFROMlocalities; SELECTLocality, Country FROMlocalities; SELECT*FROMLocalities; « * »=> all fields (wildcard)

Querying II Pattern: SELECT <comma-separatedlist of fields> FROM <name of Table> WHERE [condition] ; e.g. SELECT pk_locality, latitude_decimals, longitude_decimals FROM localitiesWHERE Locality =‘Tienen’;

Querying II Pattern: SELECT <comma-separatedlist of fields> FROM <name of Table> WHERE [condition] ; e.g. SELECT * FROM localities WHERE latitude_decimals>50.80 ANDlatitude_decimals<50.85

Querying III (boolean) Compare the result SELECT * FROM localities WHERE latitude_decimals>50.80 ANDlatitude_decimals<50.85 SELECT * FROM localities WHERE latitude_decimals>50.80 ORlatitude_decimals<50.85

Querying IV (boolean) Compare the result SELECT * FROM localities WHERE locality=‘Tienen’ AND locality=‘Bunsbeek’; SELECT * FROM localities WHERE locality=‘Tienen’ OR locality=‘Bunsbeek’;

Querying II Pattern: SELECT <comma-separatedlist of fields> FROM <name of Table> WHERE [condition] ; e.g. SELECT * FROM localities WHERE locality <>‘Hensberg’; SELECT * FROM localities WHERE locality IS NULL;

JOINING (I) SELECT * FROM specimens JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name [+ WHERE CONDITION] ;

Joining II • Exercice • Find the collectors of ‘Agostis’

Joining II • Exercice • Find the collectors of ‘Agostis’ SELECTcollector_name, genus FROM specimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name where genus='Agrostis';

Joining III • Exercice • Find the scientificnameshaving been collected in Tienen

Joining III • Exercice • Find the scientificnameshaving been collected in Tienen SELECTscientific_name FROMspecimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localitiesON specimens.fk_locality=localities.pk_locality wherelocality='Tienen';

Joining III (ordering) • Exercice Find the scientificnameshaving been collected in Tienen SELECTscientific_name FROMspecimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localitiesON specimens.fk_locality=localities.pk_locality wherelocality='Tienen‘ ORDER BY scientific_name;

Joining III • Exercice Find the collectors of ‘Balsaminaceae’ • Find the collectors of ‘Balsaminaceae’

Joining III • Exercice • Find the collectors of ‘Balsaminaceae’ SELECTcollector_name FROM specimens JOIN scientific_names ON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN families ON scientific_names.fk_family=families.pk_family where family='Balsaminaceae' ;

Views ‘Save’ and makecomplexqueries permanent in the database (useful for programming of filtering) CREATE VIEW v_specimen_names_localities AS SELECTscientific_name FROMspecimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localitiesON specimens.fk_locality=localities.pk_locality

Search on Text Patterns (I) • match one position: '_'; • ‘_’ meansanycharacterpresentone time b) match several positions: '%'; • ‘%’ means the absence or repetition of anycharacter Note: white spacecounts for one character

Search on Text Patterns (II) • SQL Syntax SELECT ...WHEREfieldLIKE 'pattern'; • PostgresSQLSyntax SELECT ...WHEREfieldSIMILAR TO 'pattern';

Search on Text Patterns (III) Example: find the scientificnameshaving «’e’ » as second letter of genus: SELECTscientific_name FROMscientific_names WHERE genus SIMILAR TO '_e%';

Search on Text Patterns (IV) Example: Pattern: '_e%'; Response: ‘Aegopodium’ ‘Aethusa’ ‘Bellis’ ‘Betula’ ...

Search on Text Patterns (V) Example: Pattern: '_e%'; Response: ‘Aegopodium’ ‘Aethusa’ ‘Bellis’ ‘Betula’ ...

Search on text pattern (VI) • Interval of characters • Use brackets [a-z]: anylower case letter [A-Z]: anyuppercaseletter [0-9]: anynumer [aA]: ‘a’ or ‘A’

Search on text pattern (VII) • Useful to control nomenclature!! • Exercice: Search the speciescontaininguppercasecharacters:

Search on text pattern (VII) • Useful to control nomenclature • Exercice: Search the speciescontaininguppercasecharacters: SELECT * FROMscientific_names WHERE species SIMILAR TO '%[A-Z]%';

Search on text pattern (VIII) • Useful to control nomenclature • Exercice: Search the genuscontaininguppercaselettersafter the first one:

Search on text pattern (VIII) Exercice: Search the genuscontaininguppercaselettersafter the first letter: SELECT * FROMscientific_names WHERE genus SIMILAR TO ‘_%[A-Z]%';

Search on text pattern (IX) • Useful to control nomenclature • Exercice: Search the genuscontaining more than one word:

Search on text pattern (IX) Exercice: Search the genuscontaining more than one word SELECT * FROMscientific_names WHERE genus SIMILAR TO '%[a-z]% %[a-z]%';

Search on text pattern (X) • PostgreSQLisalsocompliantwith an even more powerfullmechanismcalled « regular expression » • standard syntaxshared by severalprogramminglanguages • allowmatchingcomplex patterns • canperform replacements and extractions

<optional if somebodyask how to group information in one row> Group specimencollected in Tienen per Collector SELECT array_to_string(array_agg(scientific_name), ','), collector_name FROM specimens JOIN scientific_names ON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localities ON specimens.fk_locality=localities.pk_locality where locality='Tienen' GROUP BY collector_name ORDER BY collector_name;

<optional if somebodyask how to group information in one row> Group localities per collectors SELECTarray_to_string(array_agg(locality), ','), collector_name FROM specimens JOIN scientific_names ON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localities ON specimens.fk_locality=localities.pk_locality GROUP BY collector_name ORDER BY collector_name;

INTRODUCTION TO SQL

INTRODUCTION TO SQL

Presentation Transcript

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

INTRODUCTION TO SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL

Introduction to SQL