
Metadata – What is it, and why we need it

Metadata – What is it, and why we need it. By: Roman Olschanowsky roman2u@sdsc.edu. Create some metadata. Take 5 minutes, right now, to think about YOUR data, and do some brainstorming.


Presentation Transcript


  1. Metadata – What is it, and why we need it By: Roman Olschanowsky roman2u@sdsc.edu

  2. Create some metadata Take 5 minutes, right now, to think about YOUR data, and do some brainstorming. Write down a definition of metadata, and any ideas you have about metadata regarding your files and/or datasets.

  3. Metadata - data about data?
     • System metadata (most file systems)
       • Developed for the OS, not very helpful to you
       • Size, owner, permissions, timestamps, …
     • Standardized metadata
       • File headers: jpeg, mp3, DICOM(s), …
       • Dublin Core: Title, Creator, Subject, Date, …
     • User defined metadata
       • XML: (Whatever I want !!!)
       • Database: (Whatever I want !!!)
       • SRB: (Whatever I want !!!)
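As a rough illustration of the "system metadata" category, the sketch below asks the file system what it already knows about a file, using only the Python standard library. The function name `system_metadata` and the temporary `.img` file are hypothetical stand-ins, not part of the talk.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def system_metadata(path):
    """Return the metadata the file system already keeps for `path`."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "owner_uid": info.st_uid,                    # numeric owner id (0 on Windows)
        "permissions": stat.filemode(info.st_mode),  # e.g. '-rw-r--r--'
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real data file.
    with tempfile.NamedTemporaryFile(suffix=".img", delete=False) as handle:
        handle.write(b"\x00" * 1024)
    print(system_metadata(handle.name))
    os.remove(handle.name)
```

Nothing here says what the file means scientifically, which is the slide's point: this metadata serves the operating system, not the researcher.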

  4. System Metadata
     Q: If all I have is a plain file system, how do I do metadata?
     A: Organization. Build a meaningful hierarchy.
     [Diagram: directory tree rooted at Patient (Roman). Folder labels: label, mri, surf, flash, brain, wm, filled, aseg, norm, transforms, Parameter_maps. File labels: Log File, Surface File, Label File, Transform File, Flash File, and several Slice Files.]
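One way to see why a meaningful hierarchy counts as metadata: if every file sits at a predictable depth, the path itself can be read back as attributes. The sketch below assumes a four-level layout of patient/category/volume/file. The folder names `mri` and `brain` appear on the slide, but their exact nesting and the file name `slice_001.img` are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def metadata_from_path(path):
    """Read patient, category, and volume from a patient/category/volume/file path."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        raise ValueError(f"expected patient/category/volume/file, got {path!r}")
    patient, category, volume, filename = parts
    return {"patient": patient, "category": category, "volume": volume, "file": filename}


# Hypothetical path following the assumed layout.
print(metadata_from_path("roman/mri/brain/slice_001.img"))
```

The approach only works while everyone keeps to the layout, and it cannot hold a measured value such as a thickness.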

  5. A good hierarchy - Is this enough?
     • I now have thousands of patients.
     • Dr. Suchandsuch asks me: How many of your patients have a cranial thickness greater than 0.5 inches?
     • We can dig through all the images and measure the thicknesses, but where do we store the results?
     • 50% are greater than 0.5 inches.
     • Great! Now how many of those are male and were scanned with a GE system?
     • Sir, 75% are male and GE; the other 25% are male too, but were scanned with different systems. (fictional numbers)

  6. Standardized Metadata
     • Dublin Core: What is the bare minimum metadata that needs to be present?
       • Everybody's idea of ‘bare minimum’ is different
       • What’s left isn’t very useful: Format: Power Point File
     • File headers:
       • Very useful
       • (Think of them as system metadata for that file type)
       • Width: 10px | bit rate: 128 Kbps | Scanner: GE
       • But the more files you have, the slower it gets!
       • Who decides what that header is? Does everybody actually follow that standard?
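To make the Dublin Core bullet concrete, here is a minimal sketch that builds a record with the standard library's `xml.etree.ElementTree`. The element names `title`, `creator`, and `format` and the namespace URI belong to the Dublin Core element set. The `<metadata>` wrapper element and the helper name `dublin_core_record` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element holding one dc:* child per field."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = dublin_core_record({
    "title": "Metadata – What is it, and why we need it",
    "creator": "Roman Olschanowsky",
    "format": "Power Point File",
})
print(ET.tostring(record, encoding="unicode"))
```

The record is portable and widely understood, but it has no slot for a domain-specific attribute such as a cranial thickness.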

  7. User defined metadata
     • Finally, a place to store my “cranial thickness” attribute.
     • XML:
       • Great! It’s not platform or application specific.
       • But it’s usually slow, with lots of overhead.
     • Database:
       • Great! It’s fast, it gives me my answers, and it’s more flexible (primary / foreign keys).
       • But it’s expensive (labor, licenses). Worst: it’s separate from the data, so things can get out of sync.
     • SRB:
       • Great! It’s fast and it’s a part of the same system as the data.
       • But what if I take the data out of the system? How does the metadata leave too?
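The database option can be sketched with Python's built-in `sqlite3` module. The example answers the kind of question Dr. Suchandsuch asked once "cranial thickness" is stored as a user-defined attribute. The table name `scan`, its columns, and every row are invented for illustration, and the scanner value "Other" is a placeholder for any non-GE system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE scan (
           patient_id TEXT PRIMARY KEY,
           sex TEXT,
           scanner TEXT,
           cranial_thickness_in REAL
       )"""
)
# Fictional rows, in the spirit of the fictional numbers on the earlier slide.
conn.executemany(
    "INSERT INTO scan VALUES (?, ?, ?, ?)",
    [
        ("p001", "M", "GE", 0.62),
        ("p002", "F", "GE", 0.48),
        ("p003", "M", "Other", 0.55),
        ("p004", "M", "GE", 0.41),
    ],
)


def count_thick(conn, min_inches, sex=None, scanner=None):
    """Count scans above a thickness threshold, optionally filtered by sex and scanner."""
    query = "SELECT COUNT(*) FROM scan WHERE cranial_thickness_in > ?"
    params = [min_inches]
    if sex is not None:
        query += " AND sex = ?"
        params.append(sex)
    if scanner is not None:
        query += " AND scanner = ?"
        params.append(scanner)
    return conn.execute(query, params).fetchone()[0]


print(count_thick(conn, 0.5))                         # thicker than 0.5 inches
print(count_thick(conn, 0.5, sex="M", scanner="GE"))  # of those, male and GE
```

The queries are quick, but the table lives apart from the image files, so nothing stops the two from drifting out of sync.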

  8. BIRN Human Collection and Metadata hierarchy
     [Diagram: collection tree with XML metadata files attached at each level.]
     • Collection levels: Subject, Institution_VisitID, Study, Series
     • Data labels: Native, Original, DICOM, KSPACE, TaskData, SPM Format
     • Analysis labels: Analysis n, Analysis n+1, Snapshot 1 … Snapshot n, SPM, Freesurfer, LDMM; analysis naming pattern Analysis_user_timestamp_toolcode
     • Analysis scopes:
       • Analyses on many subjects across institutions
       • Analyses on a subject across institutions and studies
       • Analyses on many series of a subject within an institution
       • Analyses on multiple Series done at 1 institution
       • Analyses on images from this Series
     • Candidate metadata: BIRN_ID, Timestamp, VisitID?, StudyID?, Image/Scanner Parameters?
     • Note: Original is a pointer to the corresponding original scanner format

  9. [Table: BIRN Human directory hierarchy mapped to metadata. Columns: Directory Hierarchy, SRB Metadata, XML elements (non-structural), HID Database, Notes.]
     • BIRN
     • Human
     • Research Project (Name__ID): SRB metadata Project ID; XML <project>; HID nc_experiment
     • Subject (BIRN ID): SRB metadata BIRN ID, Timestamp; XML <subjectConst>; HID nc_humanSubject
     • Institution Visit (Visit__Site ID_Visit #): SRB metadata Visit ID, Institution ID; XML <subjectVar>; HID nc_expComponent
     • Study (Study__ID #): SRB metadata Study ID; XML <scanner>
     • Series (Series__localID): SRB metadata Series Number, Scanner Parameters?; HID nc_expSegment and [protocol section]
     • Native and Analysis: Snapshot 1 … Snapshot N (Ver__SER); formats DICOM, AFNI, Analyze; Analysis Sub Tree; SRB metadata Image Parameters?; XML <acqProtocol>, <expProtocol>, <datarec>; HID [research and derived data sections]
     Notes:
     • Should analyses that cross multiple data levels be split out to a separate hierarchy?
     • All Analysis collections are writeable so that users can create their own analysis collections (snapshots)
     • Separate the native data and analysis for easier access control and separation (Brian’s email)
     • Native Data: Represents an upload of the “original data”
     • Analysis: Represents a different analysis (either partial or full)
     • Derived versions of an individual series should remain with that series?

  10. All problems solved?
      Why are you calling it “skull thickness”? It’s supposed to be “cranial thickness”!
      You have to query on “brain”, not “purkinje cell”.
      But a “purkinje cell” is part of the “brain”. Shouldn’t the system know that?
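The first complaint is a vocabulary problem, and a small lookup table can catch it at the moment metadata is written. In the sketch below, the two terms come from the slide. The table `PREFERRED_TERMS` and the function `normalize_attribute` are hypothetical names.

```python
# Hypothetical synonym table; the preferred term is the one the slide insists on.
PREFERRED_TERMS = {
    "skull thickness": "cranial thickness",
    "cranial thickness": "cranial thickness",
}


def normalize_attribute(name):
    """Map a user-supplied attribute name to the agreed term, or fail loudly."""
    key = " ".join(name.lower().split())  # ignore case and extra whitespace
    try:
        return PREFERRED_TERMS[key]
    except KeyError:
        raise KeyError(f"{name!r} is not in the standard vocabulary") from None


print(normalize_attribute("Skull  Thickness"))
```

A flat table like this handles synonyms only. It cannot say that one term is part of another, which is what the second complaint needs.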

  11. Ontologies
      For AI systems, what "exists" is that which can be represented. When the knowledge about a domain is represented in a declarative language, the set of objects that can be represented is called the universe of discourse. We can describe the ontology of a program by defining a set of representational terms. Definitions associate the names of entities in the universe of discourse (e.g. classes, relations, functions or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.

  12. Distribution of Ryanodine receptor in cerebellum?
      • Navigates down domain map
      • Situates result in context of domain map
      [Diagram: Brain has a Cerebellum, which has a Purkinje Cell Layer, which has a Purkinje cell, which is a neuron]
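The "has a" and "is a" chain on this slide is enough to let a query on "brain" find data annotated with "purkinje cell". The sketch below is a toy version of that idea using plain dictionaries, not the domain map or F-logic system the talk refers to. The names `PART_OF`, `IS_A`, `containers`, and `matches` are hypothetical, and the loop assumes the relations contain no cycles.

```python
# Relations transcribed from the slide, stored as part -> whole.
PART_OF = {
    "cerebellum": "brain",
    "purkinje cell layer": "cerebellum",
    "purkinje cell": "purkinje cell layer",
}
IS_A = {"purkinje cell": "neuron"}


def containers(term):
    """List every structure that contains `term`, nearest first."""
    chain = []
    while term in PART_OF:
        term = PART_OF[term]
        chain.append(term)
    return chain


def matches(query, annotation):
    """True if data annotated with `annotation` should answer a query on `query`."""
    return (
        query == annotation
        or query in containers(annotation)
        or IS_A.get(annotation) == query
    )


print(containers("purkinje cell"))
print(matches("brain", "purkinje cell"))
```

The relation only runs upward: a query on "purkinje cell" does not match data annotated merely as "brain".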

  13. ANATOM Domain Map
      • Rule-based ontology map
      • Encodes conceptual and semantic relationships using F-logic

  14. Integrated Knowledge Map

  15. Scared?
      Do:
      • Design a file hierarchy
      • Agree on a “Standard Vocabulary”
      • Add metadata in the right places, and in several places
      • You can always add or change things later; it doesn’t have to be perfect the first time
      • If it’s there, you will use it!
      • What metadata do other people want?
      • Automate the process! (scripts and/or workflows)
      Do not:
      • Wait. It’s harder to add metadata after the fact.
      • Do things manually (see #7 above)
      • Attempt an ontology; professionals are working on them already! (Unless it’s already in your approved grant)
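"Automate the process" can start very small. The sketch below walks a data directory and writes a JSON sidecar next to each file. The `.meta.json` suffix, the recorded fields, and the function name `write_sidecars` are assumptions made for the example; a real workflow would record whatever attributes the group has agreed on.

```python
import json
import tempfile
from pathlib import Path


def write_sidecars(root):
    """Write <file>.meta.json next to every data file under `root`."""
    root = Path(root)
    written = []
    # sorted() snapshots the listing, so sidecars created below are not revisited.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name.endswith(".meta.json"):
            continue
        record = {
            "relative_path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
        }
        sidecar = path.with_name(path.name + ".meta.json")
        sidecar.write_text(json.dumps(record, indent=2))
        written.append(sidecar)
    return written


# Hypothetical demo tree; the file name is invented.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "roman" / "mri" / "brain"
    data_dir.mkdir(parents=True)
    (data_dir / "slice_001.img").write_bytes(b"\x00" * 16)
    for sidecar in write_sidecars(tmp):
        print(sidecar.name, json.loads(sidecar.read_text()))
```

Because existing sidecars are skipped as inputs and simply overwritten as outputs, the script can be rerun whenever files are added.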

  16. Review your notes
      • Take another 5 minutes to go over your notes about metadata
      • Any big changes you would make?
      • Write down any changes, additions, and revelations.
      • Share some of your discoveries with us.

  17. Thanks! Questions? www.sdsc.edu/srb srb@sdsc.edu
