The SCANCSV.LUA library
Jaroslav Hajtmar
Apology to English-speaking participants
Sorry, but this talk is only in Czech. With my language skills I would probably not manage to say everything important in English. I will at least try to provide the slides in English as a guide. Thanks for your understanding.
Abstract
Data stored in CSV (Comma Separated Values) files are often used in data processing. This presentation describes the author's ScanCSV.lua library and its origin, and demonstrates practical examples of its use in ConTeXt MkIV. The author shows how to quickly and easily create printed reports, letters, forms, certificates, invitations, cards, business cards, double-sided cards, tables, animations, etc. using external CSV text databases. Through the TeX macros built on the library, users of ConTeXt MkIV (and of LuaLaTeX and LuaTeX as well) can use data from external CSV tables in their own documents in a simple and natural way.
Introduction
• The SCANCSV.LUA library is an easy way to use text database data stored in external CSV files in ConTeXt MkIV (it also works in LuaLaTeX and Lua plain TeX).
• Easily create LuaTeX documents that handle multiple data sets (a CSV file as a simple database).
• Many use cases: printing various forms, form letters, certificates, invitations, cards, business cards, double-sided cards, tables, animations, etc.
• Main objectives: easy to use without knowledge of Lua; usable in LuaLaTeX and Lua plain TeX as well; access to CSV data purely through TeX macros built on the library functions (no Lua code needed); motivating other users to use LuaTeX.
CSV data format and SCANCSV.LUA
• Data exchange and export to CSV (e.g. from a MySQL database); a simpler alternative to XML; easy handling (sorting and editing) in spreadsheets (Excel, Calc, Gnumeric, ...)
• General description of the CSV format
• CSV format suitable for SCANCSV.LUA:
  • The file must be encoded in UTF-8! (Exported XLS files have to be re-encoded, which is a disadvantage.)
  • Field separator: basically anything; the default is ; (semicolon, as used by MS Excel)
  • Field delimiters ("spacers"): can be anything, and the left and right ones may differ (most often " quotes); by default no delimiters are used!
• The parsing algorithm in SCANCSV.LUA is very simple (although it can be freely adjusted), which brings a limitation: if delimiters are set, they must be used in every field, which CSV in general does not require.
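The separator and delimiter settings described above can be illustrated with a minimal Lua sketch. It is not the library's ParseCSVdata() function; split_row and escape_pattern are hypothetical names, and the splitting strategy is only an assumption about what a "very simple" algorithm might look like.

```lua
-- Hypothetical helper: escape punctuation so the separator can be
-- used literally inside a Lua pattern.
local function escape_pattern(s)
  return (s:gsub("%p", "%%%0"))
end

-- Naive row splitter (illustration only, not the library's code).
-- sep: column separator (default ";"), ld/rd: left/right delimiters
-- (default: none). Delimiters are stripped from each field if present.
local function split_row(line, sep, ld, rd)
  sep = sep or ";"
  ld = ld or ""
  rd = rd or ""
  local fields = {}
  local pattern = "(.-)" .. escape_pattern(sep)
  for field in (line .. sep):gmatch(pattern) do
    if ld ~= "" and field:sub(1, #ld) == ld then
      field = field:sub(#ld + 1)
    end
    if rd ~= "" and #field >= #rd and field:sub(-#rd) == rd then
      field = field:sub(1, -#rd - 1)
    end
    fields[#fields + 1] = field
  end
  return fields
end

-- Usage: default separator, then user-defined separator and delimiters.
local plain = split_row("1;Petr;Novák;19.5.1989")
local delimited = split_row("*Jan!,*Novotný!", ",", "*", "!")
```

Because the row is split on every separator first, a separator inside a delimited field still splits that field, which is the kind of limitation the slide mentions.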
SCANCSV – history, inspiration
• 2005 – I discovered Petr Olšák's scanbase.tex macro, which processes text files in a particular format.
• Petr Olšák modified and generalized scanbase.tex into a new macro, scancsv.tex, which processes text files in CSV format. I used it in plain TeX until 2008.
• 2008 – the macro was adapted for LaTeX (Jaromír Kuben) and for ConTeXt (Petr Olšák). I still use it in ConTeXt MkII.
• 2010 – I started to use ConTeXt MkIV. The original macro does not work there: ConTeXt works with the UTF-8 character set, which the macro is unable to process.
• March 2010 – I became familiar with LuaTeX and the Lua language and started creating the scancsv.lua library. The first version was practically useless…
• July 2010 – first really usable version
• Today – daily use, improvements, tuning and new options
The operating principle of the library
1. Load the scancsv.lua library (the only Lua code in the ConTeXt source).
2. Optionally set the header flag, the column separator and the delimiters (otherwise the default values are used).
3. Open the CSV file (there are several ways).
4. Load a row of the CSV table (manually or in a loop).
5. Parse the row (split it into column data).
6. Store the column data in TeX macros.
7. Repeat steps 4 to 6 for all rows of the CSV table.
How the first table row is processed depends on whether it is a header or not. Once the column data are loaded into the macros, they are available in ConTeXt. Rows can be traversed "manually", with the standard ConTeXt loops, or with the library's loop macros.
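Steps 4 to 7 can be sketched in plain Lua as follows. This is an illustration under stated assumptions, not the library's implementation: scan_csv, split and column_name are hypothetical names, the data come from a string instead of a file, and a Lua table stands in for the TeX macros that the real library defines.

```lua
-- Split one row on a single separator (illustration only).
local function split(line, sep)
  local pattern = "(.-)" .. (sep:gsub("%p", "%%%0"))
  local fields = {}
  for field in (line .. sep):gmatch(pattern) do
    fields[#fields + 1] = field
  end
  return fields
end

-- 1 -> "cA", 2 -> "cB", ...; this sketch handles up to 26 columns.
local function column_name(i)
  return "c" .. string.char(64 + i)
end

-- Walk over all rows; on_row receives a table that maps names such as
-- "cA" (and header names, if a header is present) to column values.
local function scan_csv(text, has_header, on_row)
  local names, count = nil, 0
  for line in text:gmatch("[^\r\n]+") do
    local fields = split(line, ";")
    if has_header and not names then
      names = fields              -- the first row only supplies names
    else
      count = count + 1
      local row = {}
      for i, value in ipairs(fields) do
        row[column_name(i)] = value
        if names and names[i] then row[names[i]] = value end
      end
      on_row(row, count)
    end
  end
  return count
end

-- Usage: collect the rows of a small table with a header line.
local seen = {}
local total = scan_csv(
  "Surname;Firstname\nNovák;Jan\nPospíšilová;Hana\n",
  true,
  function(row, numline) seen[numline] = row end
)
```

In the sketch the header row is consumed without being counted as data, mirroring the slide's remark that the first row is treated differently when the header flag is set.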
Using the library in "manual" mode
• Load the library: \directlua{dofile("scancsv.lua")}
• Set the header flag if a header is present: \setheader (or clear it with \resetheader)
• Open the CSV file: \opencsvfile{file.csv}
• Then, in the source text, use the macros \cA, \cB, ... (or \Firstname, \Lastname, ... if the first line contains the header Firstname, Lastname, …). These macros contain the column values of the current CSV row.
• \nextrow moves to the next table row (the macros \cA, \cB, ... or \Firstname, \Lastname, … are filled with new values)
Main TeX macros for using the library
• \setfiletoscan{CSVFile} – set the name of the CSV file
• \setheader – set the header flag
• \resetheader – clear the header flag
• \setsep{,}, \setld{*}, \setrd{!} – set the column separator and the left and right column delimiters to user-defined (non-default) values
• \resetsep, \resetld, \resetrd – reset them to the default values
• \opencsvfile{CSVFile}, \openheadercsvfile{CSVFile} – open a CSV file
• \nextrow – go to the next row of the CSV file
• \printline, \printall – print the whole current line / the whole CSV table
• \filelineaction, \filelineaction{CSVfile}, \filelineaction{CSVfile}{to}, \filelineaction{CSVfile}{from}{to} – run the user-defined macro \lineaction in a loop
TeX macros for accessing column data

CSV file without a header (default option, \resetheader); every line is a data line:
1;Petr;Novák;19.5.1989;m;Nymburk;U Brány 7
2;Jan;Novotný;5.7.1991;m;Praha;Uhlířská 178
3;Zuzana;Vašíčková;13.9.1984;ž;Ostrava;Jánská 14
…
Columns are accessed as \cA, \cB, \cC, \cD, … The columns can alternatively be numbered with Roman numerals: \cI, \cII, \cIII, \cIV, … (default UserColumnNumbering='XLS').

CSV file with a header (switched on with \setheader); the first line is the header (no data), followed by the data lines:
Surname;Firstname;Birthdate;Sex;City;Zipcode;Street
Novák;Jan;14.10.1997;m;Zbečno;27024;Farní 21
Pospíšilová;Hana;4.1.1996;ž;Zábřeh;78901;Studénky 420
…
Columns are accessed by their header names: \cA = \Surname, \cB = \Firstname, \cC = \Birthdate, …
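The two column naming schemes (\cA, \cB, … and \cI, \cII, …) can be illustrated with a short Lua sketch. The functions to_xls_column, to_roman and column_macro are hypothetical names for this example; they are not the library's ar2xls() or ar2rom(), whose actual code is not shown in the slides.

```lua
-- Spreadsheet-style column name: 1 -> "A", 26 -> "Z", 27 -> "AA".
local function to_xls_column(n)
  local name = ""
  while n > 0 do
    local rem = (n - 1) % 26
    name = string.char(65 + rem) .. name
    n = math.floor((n - 1) / 26)
  end
  return name
end

local ROMAN = {
  {1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"},
  {100, "C"}, {90, "XC"}, {50, "L"}, {40, "XL"},
  {10, "X"}, {9, "IX"}, {5, "V"}, {4, "IV"}, {1, "I"},
}

-- Roman numeral for a positive integer: 4 -> "IV".
local function to_roman(n)
  local out = {}
  for _, pair in ipairs(ROMAN) do
    while n >= pair[1] do
      out[#out + 1] = pair[2]
      n = n - pair[1]
    end
  end
  return table.concat(out)
end

-- Macro name without the backslash: "cA" or, with roman = true, "cI".
local function column_macro(n, roman)
  return "c" .. (roman and to_roman(n) or to_xls_column(n))
end
```

Letters and Roman numerals are a natural fit here because TeX control words may contain only letters, so a name such as \c1 would not be a single control word.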
TeX macros to obtain "system" information
• \csvfilename – name of the currently open CSV file
• \numcols – number of columns of the CSV table
• \numrows – number of processed (offered) lines
• \numline – the serial number of the currently loaded row
• \csvreport – report on the open CSV file
Hooks for data processing (default \relax)
• \blinehook, \elinehook – begin-line hook and end-line hook; executed before and after the \lineaction macro processes a row of the CSV table
• \bfilehook, \efilehook – executed before and after processing the entire CSV table
• \bch, \ech – begin-column hook and end-column hook; they can be set manually in the Lua code, but because the macro could not be tested this option is disabled
TeX conditionals for testing the end of the CSV file
• \ifEOF – true when processing has reached the end of the CSV file
• \ifnotEOF – the opposite of \ifEOF
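The order in which the file and line hooks fire can be sketched in Lua. This is an assumption-based illustration of the hook idea only: process and the hook table keys (bfile, efile, bline, eline) are hypothetical names, and Lua functions stand in for the TeX macros \bfilehook, \efilehook, \blinehook and \elinehook.

```lua
-- Default hook: do nothing (the Lua analogue of \relax).
local function noop() end

-- Run lineaction for every row, wrapped in optional hooks.
local function process(rows, lineaction, hooks)
  hooks = hooks or {}
  local bfile, efile = hooks.bfile or noop, hooks.efile or noop
  local bline, eline = hooks.bline or noop, hooks.eline or noop
  bfile()                          -- before the whole table
  for numline, row in ipairs(rows) do
    bline(numline)                 -- before each row
    lineaction(row, numline)
    eline(numline)                 -- after each row
  end
  efile()                          -- after the whole table
end

-- Usage: record the call order in a log.
local log = {}
process({"a", "b"}, function(row) log[#log + 1] = row end, {
  bfile = function() log[#log + 1] = "<" end,
  efile = function() log[#log + 1] = ">" end,
  eline = function() log[#log + 1] = "|" end,
})
```

Hooks that are not supplied fall back to the no-op, so a caller only defines the ones needed.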
Using "manual" mode
• In the source code, use the macros \cA, \cB, ... or \Firstname, \Lastname, ... (if the first line contains a header); they contain the column values of the current CSV row.
• \nextrow moves to the next table row (the macros \cA, \cB, ... are filled with new values)
Modifying the behaviour of the library
• Default settings can be changed by editing the file scancsv.lua, in the introductory section of the code
• While a ConTeXt MkIV (LuaLaTeX) document is being processed, the separator, delimiters and header setting can be changed at any point with TeX macros
• Different CSV files (with different separators and column delimiters) can be processed in one document
• Hooks can be used; by default they are \relax
Main Lua library functions
• ParseCSVdata() – parses individual records (rows) of the CSV table
• lineaction() – runs the user macro \lineaction over the specified range of lines of the open CSV file
• CreatePageFiles() – creates two CSV files from one open CSV file. It is used to print double-sided cards laid out on the page in an R x C block (the file for the 2nd side is reordered so that the fronts and backs of the cards match)
• Filelineactioncards() – prints the 1st and 2nd sides of a list of cards from the files created by the previous function
• CSVReport() – reports information about the open CSV file
• csvfilename() – name of the currently open CSV file
• TMN(s) – (TeX Macro Name) a macro name must not contain prohibited characters
• ar2rom() – converts Arabic numbers to Roman numerals; used for "numbering" the columns in macro names
• ar2xls() – converts numbers to column names (Excel format)
• ar2colnum() – converts the column number to the TeX macro column name according to the global variable
• printline() – prints the current row of the CSV table
• printall() – prints the whole CSV table
• printallcontext() – prints the whole CSV table in ConTeXt syntax
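The reordering needed for double-sided cards can be illustrated with a small Lua sketch. It is not the code of CreatePageFiles(): back_side_order is a hypothetical name, and the sketch assumes that the sheet is flipped around its vertical axis, so that the backs must appear with the columns of each row mirrored.

```lua
-- Given the back-side records in the same order as the fronts, return
-- them in the order in which they must be printed so that every back
-- lands behind its front on a page with rows x cols cards.
-- Missing cards on the last page are filled with empty strings so
-- that the remaining backs keep their positions.
local function back_side_order(cards, rows, cols)
  local per_page = rows * cols
  local reordered = {}
  for first = 1, #cards, per_page do
    for r = 0, rows - 1 do
      for c = cols - 1, 0, -1 do   -- mirror the columns of each row
        reordered[#reordered + 1] = cards[first + r * cols + c] or ""
      end
    end
  end
  return reordered
end

-- Usage: one page with 2 rows and 3 columns.
local order = back_side_order({"b1", "b2", "b3", "b4", "b5", "b6"}, 2, 3)
-- order is now { "b3", "b2", "b1", "b6", "b5", "b4" }
```

If the sheet were flipped around its horizontal axis instead, the rows rather than the columns would have to be mirrored.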
Testing and cycles
Conditions with AND and OR (see Olšák, TBN)
% Condition A AND B
\doloop{
  \ifnum\Id>2 \ifnum\Id<10 \lineaction \fi \fi
  \ifEOF\exitloop\else\nextrow\ifEOF\exitloop\fi\fi
}
% Condition A OR B
\def\AorB{\lineaction}
\doloop{
  \ifnum\Id=1 \AorB
  \else \ifnum\Id>3 \AorB \fi
  \fi
  \ifEOF\exitloop\else\nextrow\ifEOF\exitloop\fi\fi
}
SCANCSV.LUA and cycles
Examples of ConTeXt loops:
\dorecurse{5}{\lineaction\nextrow} % run the \lineaction macro for the next 5 rows
\doloop{\lineaction\nextrow\ifnum\numline>7 \exitloop\fi}
\doloop{\ifEOF\exitloop\else\lineaction\nextrow\fi}
\doloop{\lineaction\nextrow \if\Id3 \exitloop \fi}
Examples of library loops (only in the test version of SCANCSV.LUA). These macros are built on the \doloop macro to make them easier to use in source code. (\Trida is a column whose Czech name means "class".)
\doloopwhile{\Trida}{3.A}{\tableaction} % list all rows meeting the criterion
\doloopuntil{\Trida}{3.A}{\tableaction} % list rows until the criterion is not satisfied
\doloopforall{\lineaction} – runs the \lineaction macro for all lines
\doloopfromto{3}{7}{\lineaction}
\doloopaction – without parameters, runs the \lineaction macro for all rows
\doloopaction{\useraction} – runs the user macro \useraction for all rows
\doloopaction{\useraction}{5} – runs \useraction for the first 5 rows
\doloopaction{\useraction}{5}{7} – runs \useraction for rows 5–7
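The optional range arguments of \doloopaction (no range, only "to", or "from" and "to") can be mirrored in a Lua sketch. The function for_rows is a hypothetical name, the rows are held in a Lua table rather than read from a file, and the argument handling is an assumption based on the macro descriptions above.

```lua
-- Run action for a range of rows:
--   for_rows(rows, action)        -> all rows
--   for_rows(rows, action, 5)     -> rows 1..5
--   for_rows(rows, action, 5, 7)  -> rows 5..7
-- Returns the number of rows processed.
local function for_rows(rows, action, a, b)
  local from, to = 1, #rows
  if a and b then
    from, to = a, b
  elseif a then
    to = a
  end
  if to > #rows then to = #rows end   -- never run past the last row
  local done = 0
  for i = math.max(from, 1), to do
    action(rows[i], i)
    done = done + 1
  end
  return done
end

-- Usage: collect rows 5 to 7 of an eight-row table.
local picked = {}
local rows = {"r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"}
for_rows(rows, function(row) picked[#picked + 1] = row end, 5, 7)
```

Clamping the upper bound plays the role of the \ifEOF test in the TeX loops: the loop stops at the end of the data even if a larger range was requested.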
Practical demonstrations of the use of the library
• Forms, multiple letters, etc.
• Cards, business cards, …
• Tables
• MetaPost animation
• Use of ConTeXt loops and IF tests
• SCANCSV.LUA "extras" (TeX macros in a CSV file, changing \lineaction while a CSV file is being processed)
• Samples of work for CTM & TE
Constraints, compatibility, flaws
• SCANCSV.LUA does not process general CSV files. Reason: the parsing algorithm is very simple. If an item contains the column separator ",", the CSV output looks like this: 1, Jan, Novotny, "The Gate 4, 111 50 Prague", ... Solution: a better (general) algorithm. Only the ParseCSVdata() function needs to be changed.
• Occasional problems with expansion. E.g. I failed to get SCANCSV.LUA running in the module database (\usemodule[database]) Mojca Miklavec
• Some things work only in ConTeXt
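A more general, quote-aware row parser of the kind suggested as a solution could look like the following Lua sketch. It is not a replacement taken from the library: parse_quoted_row is a hypothetical name, the sketch assumes a single-byte separator and quote character, and it follows the common CSV convention that a doubled quote inside a quoted field stands for one literal quote.

```lua
-- Character-by-character parser that keeps separators inside quoted
-- fields (illustration only).
local function parse_quoted_row(line, sep, quote)
  sep = sep or ","
  quote = quote or '"'
  local fields, buf, in_quotes = {}, {}, false
  local i, n = 1, #line
  while i <= n do
    local ch = line:sub(i, i)
    if in_quotes then
      if ch == quote then
        if line:sub(i + 1, i + 1) == quote then
          buf[#buf + 1] = quote    -- doubled quote: literal quote
          i = i + 1
        else
          in_quotes = false        -- closing quote
        end
      else
        buf[#buf + 1] = ch         -- separators are kept here
      end
    elseif ch == quote then
      in_quotes = true             -- opening quote
    elseif ch == sep then
      fields[#fields + 1] = table.concat(buf)
      buf = {}
    else
      buf[#buf + 1] = ch
    end
    i = i + 1
  end
  fields[#fields + 1] = table.concat(buf)
  return fields
end

-- Usage: the address keeps its comma.
local row = parse_quoted_row('1,Jan,Novotny,"The Gate 4, 111 50 Prague"')
```

Unlike a plain split on the separator, this approach does not require every field to be quoted, so quoted and unquoted fields can be mixed in one row.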
Possibilities of improvement ...
• Improving and generalizing the parsing algorithm
• Applying the library to XML processing?
• Creating a separate module only for MkIV (this would remove a number of limitations related to LuaLaTeX)
Thanks…
• To the members of the ntg-context@ntg.nl mailing list for advice about ConTeXt and Lua. The library would not have been created without their kind assistance. Special thanks to Taco Hoekwater, Hans Hagen and Wolfgang Schuster.
• To the members of the cstex@cs.felk.cvut.cz mailing list for advice about TeX and LaTeX, especially to Mr. Zdenek Wagner, Vit Zýka, Pavel Stříž, Petr Olšák, ...
• To Pavel Stříž for inspiration, testing, advice, and for convincing me to finish the library and present it at this conference.