370 likes | 528 Views
INTRODUCTION TO SQL. SQL stands for « Structured Query Language » P rogamming language for database closer to natural English than the other ( based on « sentence » instead of « procedure »)
E N D
SQL stands for « StructuredQueryLanguage » • Progamminglanguage for databasecloser to natural Englishthan the other (based on « sentence » instead of « procedure ») • Aimis to ease the querying of data by the human and the programmation of interfaces • Powerfulfunctions for text recognition • Powerfull extensions for GIS (PostGIS, Oracle) • Standardized and recognized by most of the recentrelationaldatabaseBUT 1)...minordifferences of syntaxbetweenvendors and enhancedfunctionspreventeasyinteroperabilitybetweenproducts 2) SQL databasesoftenimplythat the development of the interfaces is a distinct from the development of the core of the database
Interoperabilityproblembetweenvendors • possible solutions • use an intermediate layer between the database and the interface • ODBC/JDBC (connectorsused by other software by Windows/Java) • use ORM (Object Relational Mapper) software thatallows the programmer to use the samesyntaxwhendeveloping interfaces e.g : Doctrine (Open-Source)
SQL and NoSQL • SQL isprettyuseful for normalizeddatabasewhere the control of data integrityis important (scientific value) • ...but itis not scalable : hugeamount (> 300 000) of data lowerpeformances) • since 4/5 years, with the explosion of the Internet thereis a trend in NoSQLdatabase; fastrdatabasesthatcanhandlehugeamount of data raplidly, e.g: Solr (to index Words and PDF),MongoDB, Cassandra etc... • NoSQLoffers speed, fastreplicationbetween locations, flexible structure but no control on integrity. It doesn’t replace SQL but complementsit (SQL=> control of the integrity and of the completness of data is more important than speed + good interaction with GIS NoSQL: high availability of data on the Internet but no schema to validateintegrity and not yet GIS plug in ) Problem: scientic information network requiresbothquality control and high availiability
SQL: 4 parts • Data QueryLanguage (DQL) • Search and display data matchingspecificcriteria • Data manipulation language (DML): • modify data (insert, update, delete) • lock (atomicity of data: two user cannotmodify the ame data in parallel) • use transation (rollback to the previous state of the database if a modification fails) • Data Definitionlanguage (DDL) • create the schema of the database (the normalised structure, the index): youcandefinedyourselve how to check the integrity of the database • Data Control language (DCL) • createauthorization and accessrule for users
Vocabulary Table Field Field name Field Type Record (or tuple)
Recommandations To ease the manipulation with SQL whencreating a database: • Avoiduppercaseletters in fieldnames • Avoidaccentedcharacters in fieldnames (but you must keepthem in the content of course!) • replace white spaceswithunderscore • avoidatanypriceothernon alphabetical or numericalcharacters • avoidgiving the samename to twofields in different tables (not always possible...) • table name in plural • fieldnamesingular • use descriptive fieldname (e.g: not ‘dc’ but ‘date_collected’)
Querying Pattern: SELECT<comma-separatedlist of fields>FROM<name of Table> ; e.g. SELECTLocalityFROMlocalities; SELECTLocality, Country FROMlocalities; SELECT*FROMLocalities; « * »=> all fields (wildcard)
Querying II Pattern: SELECT <comma-separatedlist of fields> FROM <name of Table> WHERE [condition] ; e.g. SELECT pk_locality, latitude_decimals, longitude_decimals FROM localitiesWHERE Locality =‘Tienen’;
Querying II Pattern: SELECT <comma-separatedlist of fields> FROM <name of Table> WHERE [condition] ; e.g. SELECT * FROM localities WHERE latitude_decimals>50.80 ANDlatitude_decimals<50.85
Querying III (boolean) Compare the result SELECT * FROM localities WHERE latitude_decimals>50.80 ANDlatitude_decimals<50.85 SELECT * FROM localities WHERE latitude_decimals>50.80 ORlatitude_decimals<50.85
Querying IV (boolean) Compare the result SELECT * FROM localities WHERE locality=‘Tienen’ AND locality=‘Bunsbeek’; SELECT * FROM localities WHERE locality=‘Tienen’ OR locality=‘Bunsbeek’;
Querying II Pattern: SELECT <comma-separatedlist of fields> FROM <name of Table> WHERE [condition] ; e.g. SELECT * FROM localities WHERE locality <>‘Hensberg’; SELECT * FROM localities WHERE locality IS NULL;
JOINING (I) SELECT * FROM specimens JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name [+ WHERE CONDITION] ;
Joining II • Exercice • Find the collectors of ‘Agostis’
Joining II • Exercice • Find the collectors of ‘Agostis’ SELECTcollector_name, genus FROM specimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name where genus='Agrostis';
Joining III • Exercice • Find the scientificnameshaving been collected in Tienen
Joining III • Exercice • Find the scientificnameshaving been collected in Tienen SELECTscientific_name FROMspecimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localitiesON specimens.fk_locality=localities.pk_locality wherelocality='Tienen';
Joining III (ordering) • Exercice Find the scientificnameshaving been collected in Tienen SELECTscientific_name FROMspecimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localitiesON specimens.fk_locality=localities.pk_locality wherelocality='Tienen‘ ORDER BY scientific_name;
Joining III • Exercice Find the collectors of ‘Balsaminaceae’ • Find the collectors of ‘Balsaminaceae’
Joining III • Exercice • Find the collectors of ‘Balsaminaceae’ SELECTcollector_name FROM specimens JOIN scientific_names ON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN families ON scientific_names.fk_family=families.pk_family where family='Balsaminaceae' ;
Views ‘Save’ and makecomplexqueries permanent in the database (useful for programming of filtering) CREATE VIEW v_specimen_names_localities AS SELECTscientific_name FROMspecimens JOIN scientific_namesON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localitiesON specimens.fk_locality=localities.pk_locality
Search on Text Patterns (I) • match one position: '_'; • ‘_’ meansanycharacterpresentone time b) match several positions: '%'; • ‘%’ means the absence or repetition of anycharacter Note: white spacecounts for one character
Search on Text Patterns (II) • SQL Syntax SELECT ...WHEREfieldLIKE 'pattern'; • PostgresSQLSyntax SELECT ...WHEREfieldSIMILAR TO 'pattern';
Search on Text Patterns (III) Example: find the scientificnameshaving «’e’ » as second letter of genus: SELECTscientific_name FROMscientific_names WHERE genus SIMILAR TO '_e%';
Search on Text Patterns (IV) Example: Pattern: '_e%'; Response: ‘Aegopodium’ ‘Aethusa’ ‘Bellis’ ‘Betula’ ...
Search on Text Patterns (V) Example: Pattern: '_e%'; Response: ‘Aegopodium’ ‘Aethusa’ ‘Bellis’ ‘Betula’ ...
Search on text pattern (VI) • Interval of characters • Use brackets [a-z]: anylower case letter [A-Z]: anyuppercaseletter [0-9]: anynumer [aA]: ‘a’ or ‘A’
Search on text pattern (VII) • Useful to control nomenclature!! • Exercice: Search the speciescontaininguppercasecharacters:
Search on text pattern (VII) • Useful to control nomenclature • Exercice: Search the speciescontaininguppercasecharacters: SELECT * FROMscientific_names WHERE species SIMILAR TO '%[A-Z]%';
Search on text pattern (VIII) • Useful to control nomenclature • Exercice: Search the genuscontaininguppercaselettersafter the first one:
Search on text pattern (VIII) Exercice: Search the genuscontaininguppercaselettersafter the first letter: SELECT * FROMscientific_names WHERE genus SIMILAR TO ‘_%[A-Z]%';
Search on text pattern (IX) • Useful to control nomenclature • Exercice: Search the genuscontaining more than one word:
Search on text pattern (IX) Exercice: Search the genuscontaining more than one word SELECT * FROMscientific_names WHERE genus SIMILAR TO '%[a-z]% %[a-z]%';
Search on text pattern (X) • PostgreSQLisalsocompliantwith an even more powerfullmechanismcalled « regular expression » • standard syntaxshared by severalprogramminglanguages • allowmatchingcomplex patterns • canperform replacements and extractions
<optional if somebodyask how to group information in one row> Group specimencollected in Tienen per Collector SELECT array_to_string(array_agg(scientific_name), ','), collector_name FROM specimens JOIN scientific_names ON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localities ON specimens.fk_locality=localities.pk_locality where locality='Tienen' GROUP BY collector_name ORDER BY collector_name;
<optional if somebodyask how to group information in one row> Group localities per collectors SELECTarray_to_string(array_agg(locality), ','), collector_name FROM specimens JOIN scientific_names ON specimens.fk_scientific_name= scientific_names.pk_scientific_name JOIN localities ON specimens.fk_locality=localities.pk_locality GROUP BY collector_name ORDER BY collector_name;