Extracting Recipes from Chemical Academic Papers. Lei Luo.
Extracting Recipes from Chemical Academic Papers • Chemicals Extraction • Tools • Results Comparison • Future Work • Recipes Extraction • Sample Results • Future Work
Chemicals Extraction • Tools • Brat • ChemTagger • ChemDataExtractor
Chemicals Extraction • Brat • Web-based tool for text annotation, that is, for adding notes to existing text documents. • Requires three things to be defined: • Top-level annotation definition. • Second-level annotation definition. • Original text file. • Requires manual annotation.
Brat • Top level annotation
Brat • Second level annotation
Brat • Original text file
Brat • Result
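Brat stores its annotations in a standoff `.ann` file next to the original text file. A minimal sketch of reading the text-bound entity lines of such a file follows. The `Chemical` label and the sample sentence are assumptions made for illustration, not the annotation definitions used in these slides.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    ann_id: str   # annotation ID, e.g. "T1"
    label: str    # entity type from the annotation definition
    start: int    # character offset in the text file (inclusive)
    end: int      # character offset in the text file (exclusive)
    text: str     # the annotated surface string


def parse_ann(ann: str) -> list[Entity]:
    """Parse the text-bound ("T") lines of a brat standoff file."""
    entities = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes and blank lines
        ann_id, type_and_span, text = line.split("\t", 2)
        if ";" in type_and_span:
            continue  # discontinuous spans are out of scope for this sketch
        label, start, end = type_and_span.split(" ")
        entities.append(Entity(ann_id, label, int(start), int(end), text))
    return entities


# Hypothetical example: the label name and the sentence are made up.
TEXT = "TiO2 was dissolved in ethanol."
ANN = "T1\tChemical 0 4\tTiO2\nT2\tChemical 22 29\tethanol\n"

entities = parse_ann(ANN)
for e in entities:
    # The offsets must reproduce the annotated string exactly.
    assert TEXT[e.start:e.end] == e.text
print([e.text for e in entities])
```

Checking each offset pair against the text file is a cheap way to catch annotations that went stale after the text was edited.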
Chemicals Extraction • ChemTagger • Phrase-based semantic NLP tool for parsing the language of chemical experiments. • Takes a string as input and produces an XML document as output. • Uses a combination of OSCAR4, domain-specific regular expressions, and English taggers to identify parts of speech.
ChemTagger • Web-based interface
ChemTagger • Local
ChemTagger • Result – XML & Chemicals
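Because the output is an XML document, the tagged chemicals can be collected with a standard XML parser. The sketch below assumes that chemical names appear in `OSCAR-CM` elements. That element name, the other tags, and the sample document are assumptions for illustration and should be checked against the tool's actual output.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagger output; the real tag set may differ.
SAMPLE_XML = """
<Document>
  <Sentence>
    <MOLECULE><OSCAR-CM>TiO2</OSCAR-CM></MOLECULE>
    <VB>was</VB>
    <VB-DISSOLVE>dissolved</VB-DISSOLVE>
    <IN-IN>in</IN-IN>
    <MOLECULE><OSCAR-CM>ethanol</OSCAR-CM></MOLECULE>
  </Sentence>
</Document>
"""


def chemicals_from_xml(xml_text: str, tag: str = "OSCAR-CM") -> list[str]:
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.iter(tag) if el.text]


print(chemicals_from_xml(SAMPLE_XML))
```

The function keeps duplicates and document order, so repeated chemicals stay visible and can be cleaned in a later step.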
Chemicals Extraction • ChemDataExtractor • Able to automatically extract chemical names, properties, and spectra from scientific papers. • Uses machine learning, custom dictionaries, and rule-based parsing grammars. • Able to resolve data interdependencies. • Extracts data from tables.
ChemDataExtractor • Web-based interface
ChemDataExtractor • Local
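To illustrate how a custom dictionary and rule-based patterns can work together, here is a toy tagger. It is not ChemDataExtractor's implementation or API. The dictionary entries, the formula pattern, and the function name `tag_chemicals` are all assumptions chosen for this example.

```python
import re

# Hypothetical custom dictionary of known chemical names.
DICTIONARY = {"ethanol", "titanium dioxide", "sodium hydroxide"}

# Hypothetical rule: a formula is two or more element-like tokens
# (capital letter, optional lower-case letter, optional count),
# e.g. "TiO2" or "NaOH". It also matches acronyms such as "PhD",
# so a real system needs further filtering.
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def tag_chemicals(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, surface string) spans of candidate chemicals."""
    spans = set()
    # Dictionary lookup, case-insensitive, whole words only.
    for name in DICTIONARY:
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((m.start(), m.end(), m.group()))
    # Rule-based pass for formula-like tokens.
    for m in FORMULA.finditer(text):
        spans.add((m.start(), m.end(), m.group()))
    return sorted(spans)


sentence = "NaOH was added to titanium dioxide in ethanol."
print([surface for _, _, surface in tag_chemicals(sentence)])
```

The dictionary catches names that no simple pattern describes, while the rule catches formulas that are missing from the dictionary.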
ChemTagger vs ChemDataExtractor • Example 1
ChemTagger vs ChemDataExtractor • Example 2
ChemTagger vs ChemDataExtractor • Example 3
ChemTagger vs ChemDataExtractor • Example 4
ChemTagger vs ChemDataExtractor • Results • ChemTagger identifies chemicals and their properties. ChemDataExtractor tags chemicals. • ChemTagger returns duplicate chemicals. • ChemTagger also tags non-chemicals. • ChemDataExtractor seems to handle unclean text better than ChemTagger.
Chemicals Extraction • Near Future Work • Clean and combine the results. • Verify the chemical entities. • Assess accuracy.
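One possible shape for these steps is sketched below: normalise and de-duplicate the names from each tool, take their union, and score the result against a manually annotated gold set. Combining by union, the normalisation rule, and every example name are assumptions, not the author's chosen method.

```python
def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so duplicates compare equal.

    Lower-casing is a simplification: it would conflate formulas
    that differ only in case, such as "Co" and "CO".
    """
    return " ".join(name.lower().split())


def combine(*tagger_outputs: list[str]) -> set[str]:
    """Union of the normalised, non-empty names from all taggers."""
    return {normalise(n) for out in tagger_outputs for n in out if n.strip()}


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision and recall of predicted names against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical outputs and gold annotations, for illustration only.
tagger_a = ["Ethanol", "ethanol", "TiO2", "solution"]  # duplicate and non-chemical
tagger_b = ["TiO2", "NaOH"]
gold = {"ethanol", "tio2", "naoh", "hcl"}

combined = combine(tagger_a, tagger_b)
print(sorted(combined))
print(precision_recall(combined, gold))
```

A union favours recall. Taking the intersection of the two outputs instead would favour precision, at the cost of dropping anything only one tool found.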
Recipes Extraction • Sample Recipe
Recipes Extraction • Future Work • More literature review. • From a large number of papers we can get many different recipes for making the same chemical. • For each paper we can extract chemicals and synthesis parameters.
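As a rough sketch of extracting synthesis parameters, the code below pulls temperatures and durations out of a sentence with regular expressions. The patterns, the units covered, and the sample sentence are assumptions for illustration. Real papers use many more notations.

```python
import re

# Hypothetical patterns: a number followed by "°C", or by a time unit.
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C")
DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*(h|hours?|min|minutes?)\b")


def extract_parameters(sentence: str) -> dict:
    """Return temperatures (in °C) and durations found in a sentence."""
    temperatures = [float(m.group(1)) for m in TEMPERATURE.finditer(sentence)]
    durations = [(float(m.group(1)), m.group(2)) for m in DURATION.finditer(sentence)]
    return {"temperatures_c": temperatures, "durations": durations}


# Made-up sentence in the style of a synthesis description.
sentence = "The mixture was heated at 180 °C for 12 h and then stirred for 30 min."
print(extract_parameters(sentence))
```

Keeping the unit next to each duration defers unit conversion to a later step, where all papers can be converted in one place.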
Recipes Extraction • Future Work • Build a database for chemicals. • Use data mining to determine under which conditions the chemical is more likely to be produced. • Train machine learning models on examples of synthesis parameters and synthesis outcomes, then make predictions.
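A very small stand-in for the last two ideas is sketched below: a grouped success-rate count in place of data mining, and a nearest-neighbour lookup in place of a trained model. The records are invented for illustration, and using temperature as the only condition is an assumption.

```python
from collections import defaultdict

# Invented records: (temperature in °C, did the synthesis succeed?)
RECORDS = [
    (120, False), (120, False),
    (150, True), (150, False),
    (180, True), (180, True),
]


def success_rate_by_condition(records):
    """Map each condition to the fraction of successful syntheses."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [successes, total]
    for condition, succeeded in records:
        counts[condition][0] += int(succeeded)
        counts[condition][1] += 1
    return {c: s / n for c, (s, n) in counts.items()}


def predict(rates, condition):
    """Predict success from the nearest observed condition (1-nearest neighbour)."""
    nearest = min(rates, key=lambda c: abs(c - condition))
    return rates[nearest] >= 0.5


rates = success_rate_by_condition(RECORDS)
best = max(rates, key=rates.get)
print(rates, best, predict(rates, 170))
```

With several synthesis parameters per recipe, the same idea extends to a distance over parameter vectors or to a conventional classifier.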