caTIES User Meeting 2010 Rebecca Crowley, MD, MS Associate Professor Department of Biomedical Informatics University of Pittsburgh School of Medicine crowleyrs@upmc.edu
Welcome • Welcome to all our users and partners! • In person and Webinar Introductions • Many types of users here with us today • Planning yearly User Meeting through the next five years
Purpose of this meeting • Get your feedback on the current product • Gather your ideas and requirements for future releases • Find out how you are using caTIES • Learn more about barriers to adoption • Build user community • Provide most up-to-date information • Sow seeds for future opportunities (nationwide virtual tissue-bank, etc.)
Sessions for Today • 9:00 - 9:55 AM - Session 1: caTIES Introduction, History and Features (Rebecca Crowley) • 10:00 - 10:55 AM - Session 2: caTIES User Presentations (Tara McSherry, University of Pennsylvania, Philadelphia, PA; David Carrell, Group Health Cooperative, Seattle, WA; Umit Topaloglu, University of Arkansas, Little Rock, AR) • 11:00 - 11:55 AM - Session 3: caTIES Installation (Girish Chavan) • 1:00 - 1:55 PM - Session 4: caTIES Dataloading and Customization (Kevin Mitchell) • 2:00 - 2:55 PM - Session 5: caTIES Discussions of Future Directions (Rebecca Crowley)
caTIES • Open source system for coding, storing, and retrieving clinical reports within a single institution or across many institutions • Developed first for pathology reports • Uses natural language processing and controlled vocabulary to encode and semi-structure reports • Leverages Honest Broker model to provide de-identified data within constraints of HIPAA • Access to tissue and workflow for tissue-bank • Strong security and privacy model built on foundation of protocol-based research • Novel GUI and back end functionality • Deployed at University of Pittsburgh and other cancer centers and institutions across the country
Problems to solve • Reduce unnecessary barriers and unnecessary work for translational researchers • Make entire corpus de-identified and searchable • Improve retrieval • Many ways to say the same thing (synonymy) • Pertinent negatives (negation) • Temporality and aggregation • Create network of institutions sharing data; building collaborations
Benefit to translational research • Access to Clinical Report Information • Surgical pathology reports contain a wealth of information including • Histologic type • Stage and size • Prognostic factors (LN, ALI, PNI, margins) • Immunophenotype, molecular markers • Radiology and Pathology together • Access to Tissue • Some things we may not see much of in the future; retrospection may be the only way to get these kinds of tissues • Contains a lot more than you think! (Normal tissue, non-CA diseases) • Ability to correlate this information with other information • clinical trial eligibility, SNP profiling, LOH, outcomes
History - 1 • Chapter 1 - Shared Pathology Informatics Network • NCI Sponsored U01 (Collaborative Consortium) • Harvard, UCLA, Regenstrief, Pittsburgh • NLP tools, ideas, first demonstration of data-sharing • Chapter 2 - caBIG • Large multi-CC contract/community developing architecture + open source, interoperable systems for cancer research community • caTIES code base • Security Architecture • Grid Communications • User Interface • Version 1 and 2 • Four institutions sharing some data, IRB protocol Penn-Pitt
History - 2 • Chapter 3 – University of Pittsburgh CTSA Project • Deployed at University of Pittsburgh • Worked with Tissue Bank • Set up honest brokers • Policies and Processes • IRB issues • caTIES v3 • Chapter 4 - caTIES NCI funded project • More autonomy • Stable, consistent release cycle • User community • New features • And that doesn’t include all the stuff that you all are doing that we don’t know about!
Details of our new NCI grant • Continued Development of Software R01 • New platforms and DBMS, new features, new report types and formats, auditing • Evaluation • Development of User Community, Support • User meetings yearly • Forums and other communication TBD • Pilot projects in Y4 and Y5
Sample queries Patients with dysplastic nevi who were diagnosed with melanoma after an interval of at least one year
Sample queries Women diagnosed with Lobular Carcinoma or LCIS who had a subsequent mastectomy
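The first sample query above (dysplastic nevi followed by melanoma after at least one year) combines concept matching with a time constraint. The following is a minimal sketch of that logic only, not caTIES code: the `CodedReport` type, the concept strings, and the function name are hypothetical, and "interval" is read here as the gap between a patient's earliest report with the first concept and any later report with the second.

```python
from datetime import date, timedelta
from typing import FrozenSet, NamedTuple, Sequence, Set


class CodedReport(NamedTuple):
    """Hypothetical de-identified report with its coded concepts."""
    patient_id: str
    report_date: date
    concepts: FrozenSet[str]


def patients_with_interval(
    reports: Sequence[CodedReport],
    earlier: str,
    later: str,
    min_gap: timedelta = timedelta(days=365),
) -> Set[str]:
    """Patients whose `later` concept appears at least `min_gap` after
    their earliest report carrying the `earlier` concept."""
    # Pass 1: earliest date on which each patient has the earlier concept.
    first_seen = {}
    for r in reports:
        if earlier in r.concepts:
            prev = first_seen.get(r.patient_id)
            if prev is None or r.report_date < prev:
                first_seen[r.patient_id] = r.report_date

    # Pass 2: any report with the later concept far enough after that date.
    matches = set()
    for r in reports:
        start = first_seen.get(r.patient_id)
        if start is not None and later in r.concepts:
            if r.report_date - start >= min_gap:
                matches.add(r.patient_id)
    return matches


# Synthetic example data (not real patients).
REPORTS = [
    CodedReport("A", date(2005, 1, 10), frozenset({"dysplastic nevus"})),
    CodedReport("A", date(2007, 3, 1), frozenset({"melanoma"})),
    CodedReport("B", date(2006, 5, 1), frozenset({"dysplastic nevus"})),
    CodedReport("B", date(2006, 9, 1), frozenset({"melanoma"})),
    CodedReport("C", date(2006, 2, 2), frozenset({"melanoma"})),
]
```

Calling `patients_with_interval(REPORTS, "dysplastic nevus", "melanoma")` on this synthetic data selects only patient A; patient B's melanoma report falls within a year, and patient C has no earlier nevus report.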
Entities and Roles (diagram labels) • P.I. / Researcher • Local Administrator • HB / Tissue Bank Staff • Honest Broker • Tissue Consumer • Tissue Provider • Data Consumer • Data Provider • Distribution Protocol • Organization
Policy • HIPAA • Institutional Review Boards • Honest Brokers • Materials transfer agreements for tissue
Institutional Review Boards • Data Providers, Data Consumers • Tissue Providers, Tissue Consumers • Must have IRB protocol to provide tissue and data • Must have IRB protocol to consume tissue and data • Institutions (through HB) must “sign-up” for a relationship to a given study
Honest Brokers • Ensure HIPAA compliance for release of information • Act as intermediary to the researcher • Maintain the linkage file • Enable secure return to the data so that the researcher is never exposed to identifiers
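The linkage file idea can be illustrated with a small sketch. This is not how caTIES stores its linkage; the `LinkageFile` class, its method names, and the `P-` study ID format are hypothetical and only show the principle that the researcher sees a random study ID while the honest broker alone can map it back.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LinkageFile:
    """Hypothetical stand-in for the linkage an honest broker maintains."""
    _study_id_by_mrn: Dict[str, str] = field(default_factory=dict)
    _mrn_by_study_id: Dict[str, str] = field(default_factory=dict)

    def deidentify(self, medical_record_number: str) -> str:
        """Return a stable random study ID for this patient, creating it once."""
        study_id = self._study_id_by_mrn.get(medical_record_number)
        if study_id is None:
            # Random, so the study ID carries no information about the MRN.
            study_id = "P-" + secrets.token_hex(8)
            self._study_id_by_mrn[medical_record_number] = study_id
            self._mrn_by_study_id[study_id] = medical_record_number
        return study_id

    def reidentify(self, study_id: str) -> str:
        """Honest-broker-only lookup from study ID back to the identifier."""
        return self._mrn_by_study_id[study_id]
```

In this sketch the researcher would only ever receive the value returned by `deidentify`; a request for tissue or follow-up data goes back to the honest broker, who calls `reidentify` on their side.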
Importance of Honest Broker in caTIES • Can never be researcher and HB on the same protocol • Review IRB documents – Should Dr. X have access to caTIES? • Fulfill orders, find other information, send to researcher • QA of de-identified reports • Sign on to a protocol from outside institution (for a network)
Sharing data and materials • Trust agreements • Differing local IRB requirements • Complex intra- and inter-institutional environments • Data Use Agreements vs NHS • Materials Transfer Agreements
caTIES Release Cycle • Biannual software releases (June and December). • 2-month development cycle followed by 1-month test and release cycle. • Use the Feature Requests tracker to post new feature requests for the next release. • Software released to the SourceForge Downloads site. • Source code in CVS is tagged by version, e.g. v3-6B1. • New release announcements on the website and the news section of SourceForge. • In the future, will post messages to the SourceForge caties-user mailing list.
Getting more information on caTIES • General Info: http://caties.cabig.upmc.edu/Wiki.jsp?page=History • Code: https://sourceforge.net/projects/caties/develop/ • Manual and Training: http://caties.cabig.upmc.edu/Wiki.jsp?page=UserManual • Forums: https://sourceforge.net/projects/caties/forums/forum/626701
Questions for Discussion Today • For institutions with deployed systems • How are you using caTIES? • How do you want to use it in the future? • What are the technical and social barriers that you have overcome? • What barriers remain? • What needs to be done to make it a more valuable tool for researchers? • What can we do to help?
Questions for Discussion Today • For institutions still considering adoption • What problems are you hoping to solve using caTIES? • What barriers do you perceive in your environment? • What additional information do you need?
Questions for Discussion Today • For everyone…. • How are we doing? • What do you want to see in the next release? • How involved would you want to be in a user community? • Develop and contribute? • Answer questions? • Work on grants together? • What are the best ways to communicate?
caTIES Requirements Recommended Hardware Server: • Processor: Intel 3 GHz+ or AMD equivalent. • Memory (RAM): 3GB+ • Hard disk space (Fixed): 11GB • Hard disk space (Variable): 14GB (Assumes 350k reports. Each report takes 0.04MB) • Hard disk space (Total): 25GB Client: • Processor: Intel 1.5 GHz+ or AMD equivalent. • Memory (RAM): 500MB+ • Hard disk space: 500MB Software • Operating System: Microsoft Windows 2000/XP or Linux • Web Server: Apache Tomcat 5.5.x • DBMS: MySQL Enterprise/Community Server 5.1 or Oracle 9i+ • MetaMap Transfer (MMTx)
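The server disk figures follow from simple arithmetic: 350,000 reports at 0.04MB each is 14,000MB, or 14GB of variable space, added to the 11GB fixed footprint. A small sketch for scaling the estimate to a different report count, assuming the per-report size on the slide holds for your data and that 1GB is counted as 1000MB (the function name is illustrative):

```python
FIXED_GB = 11.0        # fixed server footprint, from the slide
MB_PER_REPORT = 0.04   # storage per report, from the slide


def estimated_server_disk_gb(report_count: int) -> float:
    """Fixed footprint plus per-report storage, using 1 GB = 1000 MB."""
    variable_gb = report_count * MB_PER_REPORT / 1000
    return FIXED_GB + variable_gb
```

With the slide's assumption of 350,000 reports this reproduces the stated total of about 25GB; a site with a million reports would need roughly 51GB under the same assumptions.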
Supported caTIES Installations Single Machine Installation • Should use if • Installing caTIES to try it out. • caTIES is for internal use only. • Install all caTIES components on one machine. • Easier and faster to install. • No physical separation of identified and de-identified data – less secure. Dual Machine Installation • Should use if • wanting an option to share your data in a secure manner with other caTIES nodes. • wanting an option to allow your users to access caTIES from outside your institution’s firewall. • Identified data stored on separate machine - most secure. • More configuration required. Both types of installations use same software and are production ready. Upgrading from single to dual machine installation involves moving of identified databases to another machine and installing caTIES again with a dual machine setup.
caTIES Installers Simple Installer • Supports only single machine installation. • Uses defaults for most configuration options. Asks fewer questions. • Recommended for use only for first time installations. • Will get even simpler in the future by bundling Tomcat and MySQL. Regular Installer • Supports single or dual machine installation. • Lets you configure most configuration options through installer. • Can update an older v3.5 installation and preserve data. There are separate installer executables for Oracle and MySQL for each type of installer.
Important Installation Tips • Install all pre-requisite software before running installer. • Use forward slash (/) and short file names (8.3 format) in the installer when installing on Windows. • Specify external IP address for server machine in the installer to enable use from the network. • Install as Administrator when installing on Windows Vista or Windows 7. • If caTIES does not work after installation, verify the installation parameters in these files: • <catiesInstallLocation>/public/variables.txt OR • <catiesInstallLocation>/private/variables.txt • If the installer fails during database creation or service installation at the final screen of the installer, you can rerun those tasks from the command prompt. The batch files are located in <catiesInstallLocation>/scripts/
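The forward-slash and 8.3 tip can be checked before typing a path into the installer. The sketch below is a hypothetical helper, not part of caTIES: it rewrites a Windows path with forward slashes and lists the components that are not plausibly in 8.3 form (more than 8 characters before the dot, more than 3 after, or containing spaces). It does not look up the real short name that Windows assigns; `dir /x` at the command prompt shows those.

```python
import re
from pathlib import PureWindowsPath
from typing import List

# One path component in 8.3 form: up to 8 characters, optionally followed
# by a dot and up to 3 more, with no spaces or additional dots.
_SHORT_NAME = re.compile(r"^[^\s.\\/]{1,8}(\.[^\s.\\/]{1,3})?$")


def to_installer_path(raw: str) -> str:
    """Rewrite a Windows path using forward slashes."""
    return PureWindowsPath(raw).as_posix()


def non_short_components(raw: str) -> List[str]:
    """Return the path components that are not in 8.3 form."""
    path = PureWindowsPath(raw)
    # Skip the drive/root anchor (for example "C:\\") when present.
    parts = path.parts[1:] if path.anchor else path.parts
    return [p for p in parts if not _SHORT_NAME.match(p)]
```

For example, `non_short_components(r"C:\Program Files\caTIES")` flags `Program Files`, which suggests entering the short form of that directory instead.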
Updating from caTIES v3.5 • Use the regular installer. • When asked for database names in the installer, specify your existing caTIES database names. • Select the “Update from v3.5” check box. • At the end of installation, the installer runs SQL scripts to update the v3.5 schema to the latest schema.
Data Loading and Customization Kevin Mitchell January 21, 2010
caTIES Customization Points • Acquisition Pipeline • DeIdentification Pipeline • Ties Pipeline • Index Pipeline • Grid Services API • caTIES Dispatcher Pattern • GATE Platform
HL7ImportPipeController • Reads HL7 files into the caTIES database • In caTIES.properties, caties.hl7importer.directory.home: directory monitored for new HL7 files. • RaceConfig.txt, EthnicityConfig.txt, GenderConfig.txt: map the local value domains for race, ethnicity and gender to a standard caBIG value domain.
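The value-domain mapping can be pictured as a lookup from local codes to standard terms. The sketch below only illustrates that idea: the slide does not show the actual layout of RaceConfig.txt, EthnicityConfig.txt, or GenderConfig.txt, so the `local=standard` line format, the sample values, and both function names are assumptions.

```python
from typing import Dict

# Hypothetical file content; the real config file format is not shown in the slides.
SAMPLE_GENDER_CONFIG = """
# local value = standard value
F = Female
M = Male
"""


def parse_value_map(text: str) -> Dict[str, str]:
    """Parse 'local=standard' lines, ignoring blank lines and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        local, sep, standard = line.partition("=")
        if sep:  # skip lines without an '=' separator
            mapping[local.strip().upper()] = standard.strip()
    return mapping


def to_standard(mapping: Dict[str, str], local_value: str,
                default: str = "Unknown") -> str:
    """Map a local value to its standard term, case-insensitively."""
    return mapping.get(local_value.strip().upper(), default)
```

Unmapped local values fall back to a default here so that a new local code does not stop the import; whether caTIES behaves that way is not stated in the slides.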
Loading data directly into database • For all tables, "id" is the primary key and must be populated. • ORGANIZATION - Only one organization (i.e. only one record) should be present in this table. • IDENTIFIED_PATIENT - This table will contain the identified/private information for the patient. Columns: ID; FIRST_NAME; LAST_NAME; SOCIAL_SECURITY_NUMBER and/or MEDICAL_RECORD_NUMBER; ORGANIZATION_ID (this should be the same as ID in the ORGANIZATION table). • IDENTIFIED_SECTION - This table contains the SPR text, or rather fragments of the SPR text organized as sections. Hence for each record in the IDENTIFIED_PATHOLOGY_REPORT table, there will be one or more records in the IDENTIFIED_SECTION table. Columns: NAME (the name of the section in the SPR); DOCUMENT_FRAGMENT (the text for the section in the SPR); PATHOLOGY_REPORT_ID (the id that maps into the IDENTIFIED_PATHOLOGY_REPORT table).
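The table relationships described above can be tried out with a throwaway database. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL or Oracle. Only the table and column names come from the slide; the column types and constraints are assumptions, IDENTIFIED_PATHOLOGY_REPORT is reduced to its ID because its other columns are not listed here, and all data is synthetic.

```python
import sqlite3

# Table and column names from the slide; types and constraints are assumed.
SCHEMA = """
CREATE TABLE ORGANIZATION (ID INTEGER PRIMARY KEY);
CREATE TABLE IDENTIFIED_PATIENT (
    ID INTEGER PRIMARY KEY,
    FIRST_NAME TEXT,
    LAST_NAME TEXT,
    SOCIAL_SECURITY_NUMBER TEXT,
    MEDICAL_RECORD_NUMBER TEXT,
    ORGANIZATION_ID INTEGER NOT NULL REFERENCES ORGANIZATION(ID)
);
CREATE TABLE IDENTIFIED_PATHOLOGY_REPORT (ID INTEGER PRIMARY KEY);
CREATE TABLE IDENTIFIED_SECTION (
    ID INTEGER PRIMARY KEY,
    NAME TEXT,
    DOCUMENT_FRAGMENT TEXT,
    PATHOLOGY_REPORT_ID INTEGER NOT NULL
        REFERENCES IDENTIFIED_PATHOLOGY_REPORT(ID)
);
"""


def load_example(conn: sqlite3.Connection) -> None:
    """Create the tables and insert one synthetic patient and report."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:  # commit all inserts together, or roll back on error
        # Exactly one ORGANIZATION record, as the slide requires.
        conn.execute("INSERT INTO ORGANIZATION (ID) VALUES (1)")
        conn.execute(
            "INSERT INTO IDENTIFIED_PATIENT (ID, FIRST_NAME, LAST_NAME, "
            "MEDICAL_RECORD_NUMBER, ORGANIZATION_ID) VALUES (?, ?, ?, ?, ?)",
            (1, "Test", "Patient", "MRN-0001", 1),
        )
        conn.execute("INSERT INTO IDENTIFIED_PATHOLOGY_REPORT (ID) VALUES (10)")
        # One report, split into one or more named sections.
        conn.executemany(
            "INSERT INTO IDENTIFIED_SECTION (ID, NAME, DOCUMENT_FRAGMENT, "
            "PATHOLOGY_REPORT_ID) VALUES (?, ?, ?, ?)",
            [
                (100, "FINAL DIAGNOSIS", "Skin, left arm: dysplastic nevus.", 10),
                (101, "GROSS DESCRIPTION", "Received in formalin.", 10),
            ],
        )


def sections_for_report(conn: sqlite3.Connection, report_id: int) -> list:
    """Return (NAME, DOCUMENT_FRAGMENT) pairs for one report, in ID order."""
    cur = conn.execute(
        "SELECT NAME, DOCUMENT_FRAGMENT FROM IDENTIFIED_SECTION "
        "WHERE PATHOLOGY_REPORT_ID = ? ORDER BY ID",
        (report_id,),
    )
    return cur.fetchall()
```

Every insert supplies an explicit ID, matching the rule that "id" must be populated, and the foreign keys make a section that points at a missing report fail early rather than load silently.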