DEiXTo
• Powerful web data extraction tool
• Freeware GUI tool (built with Turbo Delphi, Windows-only)
• Free, cross-platform Command Line Executor (in Perl)
• DEiXToBot agent (implemented in Perl)
• Uses the W3C Document Object Model (DOM)
• DOM-based extraction rules (wrappers)
• Extracted data can be exported to a wide variety of formats (tab-delimited, XML, RSS, etc.)
• Command Line Executor:
  • has database support via DBI, the Database independent interface for Perl
  • supports additional formats: Excel, CSV, OpenDocument Spreadsheet (.ods), HTML
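DEiXTo's wrappers are DOM-based tree patterns. Their exact file format and matching algorithm are not described here, so the following is only a minimal Python sketch of the general idea, assuming a hypothetical rule made of a record pattern (a tag plus required attribute values) and per-field sub-patterns. `Node`, `TreeBuilder`, `extract` and the rule layout are illustrative names, not part of DEiXTo.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed as "open".
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM-like element: tag, attributes, children and direct text."""
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""  # text directly inside this element (not descendants)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML; assumes reasonably well-nested markup."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def iter_nodes(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def matches(node, pattern):
    """True if node has the pattern's tag and every required attribute value."""
    return node.tag == pattern["tag"] and all(
        node.attrs.get(k) == v for k, v in pattern.get("attrs", {}).items()
    )


def extract(root, rule):
    """Return one dict per element matching the (hypothetical) record pattern.

    Each field pattern is matched against the record's descendants; the first
    match supplies the value (its direct text, or the attribute named in "attr").
    """
    records = []
    for node in iter_nodes(root):
        if not matches(node, rule):
            continue
        record = {}
        for field in rule["fields"]:
            for desc in iter_nodes(node):
                if desc is not node and matches(desc, field):
                    if "attr" in field:
                        record[field["name"]] = desc.attrs.get(field["attr"])
                    else:
                        record[field["name"]] = desc.text.strip()
                    break
        records.append(record)
    return records


page = """
<ul>
  <li class="item"><a href="/a">Alpha</a> <span class="price">10</span></li>
  <li class="item"><a href="/b">Beta</a> <span class="price">12</span></li>
</ul>
"""
builder = TreeBuilder()
builder.feed(page)
builder.close()

# Hypothetical rule: each <li class="item"> is a record with three fields.
rule = {
    "tag": "li",
    "attrs": {"class": "item"},
    "fields": [
        {"name": "title", "tag": "a"},
        {"name": "link", "tag": "a", "attr": "href"},
        {"name": "price", "tag": "span", "attrs": {"class": "price"}},
    ],
}
print(extract(builder.root, rule))
```

Here the rule yields one dictionary per `<li class="item">`: `{'title': 'Alpha', 'link': '/a', 'price': '10'}` and `{'title': 'Beta', 'link': '/b', 'price': '12'}`. Matching on tree structure rather than on raw text is what lets a wrapper survive cosmetic changes to the surrounding markup.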
GUI DEiXTo
• user-friendly graphical interface
• enhanced, tree-based extraction rules
• HTML tag filtering
• fast, flexible, high-performance tree pattern matching algorithm
• regular expression support
• can follow "Next Page" links and submit simple forms
• can export results to XML and tab-delimited formats and create RSS feeds
• XML-encoded wrapper project files (.wpf) that can be executed at will
• last but not least, it's freeware!
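Exporting extracted records to tab-delimited text and to an RSS feed can be sketched as follows. This is an assumption-based Python illustration, not DEiXTo's own exporter: the function names are hypothetical, and the feed is a minimal RSS 2.0 document in which each record is assumed to carry `title` and `link` fields.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_tab_delimited(records, fields):
    """Serialize records (dicts) as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_rss(records, channel_title, channel_link, channel_description):
    """Build a minimal RSS 2.0 document; each record needs 'title' and 'link'."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        ET.SubElement(item, "link").text = rec["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical extracted records.
records = [
    {"title": "Alpha", "link": "http://example.com/a", "price": "10"},
    {"title": "Beta", "link": "http://example.com/b", "price": "12"},
]
print(to_tab_delimited(records, ["title", "link", "price"]))
print(to_rss(records, "Example feed", "http://example.com", "Extracted items"))
```

Using `csv` with `delimiter="\t"` rather than joining strings by hand means fields that contain tabs or newlines are quoted instead of silently corrupting the layout.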
DEiXTo Command Line Executor (CLE)
• portable, efficient and fast command-line executor of wrappers generated with GUI DEiXTo
• provides options and flexibility not available in GUI DEiXTo
• supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet
• provides database support via DBI (the Database independent interface for Perl)
• supports HTML output using an HTML template processor and an editable template file
• overwrite, append and prepend output modes for all supported formats
• can be scheduled to execute wrappers automatically (e.g. using cron on GNU/Linux)
• it is free and open source, distributed under the GNU General Public License (GPL) Version 3!
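The three output modes can be illustrated with a small CSV writer. This is a hypothetical Python sketch of the semantics only (the CLE itself is written in Perl and its options are not reproduced here). It assumes the output file has no header row, so prepending simply places the new rows before the existing ones.

```python
import csv
import os
import tempfile


def write_csv(path, rows, mode="overwrite"):
    """Write rows to a CSV file using a hypothetical output mode.

    overwrite: replace the file's contents
    append:    add rows after any existing content
    prepend:   add rows before any existing content
    """
    if mode not in ("overwrite", "append", "prepend"):
        raise ValueError(f"unknown mode: {mode}")
    existing = []
    if mode == "prepend" and os.path.exists(path):
        # Read the old rows first, because the file is rewritten below.
        with open(path, newline="", encoding="utf-8") as f:
            existing = list(csv.reader(f))
    file_mode = "a" if mode == "append" else "w"
    with open(path, file_mode, newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
        writer.writerows(existing)  # empty unless prepending


path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_csv(path, [["Alpha", "10"]])
write_csv(path, [["Beta", "12"]], mode="append")
write_csv(path, [["Gamma", "9"]], mode="prepend")
with open(path, newline="", encoding="utf-8") as f:
    print(list(csv.reader(f)))
```

Prepend is the only mode that must read the old file before writing, which is why it costs more than append on large outputs.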
DEiXToBot
• A Mechanize agent (essentially a browser emulator) capable of extracting data of interest
• Flexible and efficient
• Allows extensive customization
• Supports multiple patterns on a single page and combining their results
• Allows post-processing of the extracted data, so you can transform it into any format you wish
• Programming skills are required to use it, though
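Combining the results of several patterns and post-processing them might look like the sketch below. It is written in Python for illustration and does not use DEiXToBot's Perl API. The two input lists stand in for the hypothetical output of two different patterns applied to the same page, joined on a shared `title` field; the price-parsing step is an assumed example of post-processing.

```python
def combine_by_key(primary, secondary, key):
    """Left-join two result lists (lists of dicts) on a shared field."""
    index = {rec[key]: rec for rec in secondary}
    combined = []
    for rec in primary:
        merged = dict(rec)
        merged.update(index.get(rec[key], {}))  # no match: keep primary fields only
        combined.append(merged)
    return combined


def parse_price(text):
    """Post-processing step: turn a string like '12.50 EUR' into a float."""
    return float(text.split()[0])


# Hypothetical outputs of two different extraction patterns on the same page.
listing = [{"title": "Alpha", "link": "/a"}, {"title": "Beta", "link": "/b"}]
prices = [{"title": "Beta", "price": "12.50 EUR"}, {"title": "Alpha", "price": "10.00 EUR"}]

rows = combine_by_key(listing, prices, "title")
for row in rows:
    if "price" in row:
        row["price"] = parse_price(row["price"])
print(rows)
```

Joining on a key rather than on list position keeps the combination correct even when the two patterns return their matches in different orders, as in the example data.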
Corgialenios Library use case: from unstructured HTML data to ESE (Europeana Semantic Elements) format!
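ESE builds on Dublin Core, so one step of such a conversion is mapping extracted fields onto `dc:` elements. The sketch below is a hypothetical Python illustration: the field names, the `FIELD_TO_DC` mapping and the plain `record` wrapper are assumptions, and the Europeana-specific elements and exact envelope required by ESE are deliberately omitted and would have to be taken from the ESE specification. Only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize children as dc:title, dc:creator, ...

# Hypothetical mapping from extracted field names to Dublin Core elements.
FIELD_TO_DC = {"title": "title", "author": "creator", "year": "date", "subject": "subject"}


def to_dc_record(fields):
    """Map one extracted record to an XML element holding dc:* children.

    List values become repeated elements; unmapped or empty fields are skipped.
    """
    record = ET.Element("record")  # placeholder wrapper, not the ESE envelope
    for name, value in fields.items():
        element = FIELD_TO_DC.get(name)
        if element is None or not value:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(record, f"{{{DC_NS}}}{element}").text = v
    return record


# Placeholder values, not real catalogue data.
extracted = {
    "title": "Example title",
    "author": "Example author",
    "year": "1900",
    "subject": ["History", "Maps"],
    "shelfmark": "X-1",  # no Dublin Core mapping, so it is dropped
}
print(ET.tostring(to_dc_record(extracted), encoding="unicode"))
```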
DEiXTo Services
• We can definitely help you to:
  • make the contents of your digital library available via OAI-PMH or in another suitable format
  • quickly populate product catalogues with full specifications
  • search various web resources in real time and extract the returned results
  • prepare large, focused datasets for scientific tasks (e.g. data mining)
  • monitor competitors' prices
  • <your extraction task goes here!>
Happy DEiXTo users!
For further information, please visit http://deixto.com