Splitting Setgen into two use cases?
Ruaraidh Sackville Hamilton
International Rice Research Institute
Los Baños, Philippines
Use cases
• Parents managed by user: entry point = parents; Setgen good
  • Making crosses
  • Making selections
  • Bulking up seed
• Parents managed by others: entry point = offspring; data quality problems with Setgen
  • Incoming seeds received from others
  • (Entering historical data)
ICIS developers' workshop
Case 1. User manages parents
Step 1: Before trial, create list of existing parental GIDs to be included
Step 2: After trial, create list of new progeny GIDs to be created
Step 3: Create data for progeny
Use case 1 sub-cases: making selections vs bulking seed
Number of progeny GIDs per parent:
• Making selections
  • 0, 1 … N = number of offspring liked by the breeder
  • DER
• Bulking seed
  • 0 = failure of seed increase
  • 1 = normal successful seed increase
  • Usually MAN
  • (N > 1: for the special case of splitting mixed accessions into uniform components)
Selection and seed increase: Features of GERMPLSM
Selection and seed increase: Features of NAMES
• Typically one name per progeny GID
  • = Preferred name, NSTAT=1
  • Also functions as preferred ID (≡ NSTAT=8)
• NVAL assigned automatically as f(parental name)
• User-specific rules for assigning NVAL:
  • Selection by IRRI breeder: NVAL of preferred name of GPID2 & “-N”
  • Seed increase by IRRI GRC: NVAL of preferred ID of MGID & “:YYYYSS”
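The two NVAL rules above can be sketched as plain string functions. This is only an illustration: the function names are hypothetical, and it assumes that "N" is a sequential selection number and that "YYYYSS" is a four-digit year followed by a two-character season code, neither of which the slides define. The sample values are placeholders, not real germplasm names.

```python
# Hypothetical helpers; the names and the interpretation of "N" and "SS"
# are assumptions, not part of ICIS or Setgen.

def selection_nval(parent_preferred_name: str, selection_number: int) -> str:
    # Selection rule: NVAL of the preferred name of GPID2, followed by "-N".
    return f"{parent_preferred_name}-{selection_number}"


def seed_increase_nval(source_preferred_id: str, year: int, season_code: str) -> str:
    # Seed-increase rule: NVAL of the preferred ID of MGID, followed by ":YYYYSS".
    return f"{source_preferred_id}:{year:04d}{season_code}"


print(selection_nval("LINE 1", 3))
print(seed_increase_nval("ACC 100", 2009, "DS"))
```

Keeping each user's rule as a separate function is one way to support the user-defined customisation the slide asks for.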
Selection and seed increase: Features of NAMES
Use Case 1 summary
• Sub-cases for selection and seed multiplication very similar
  • One Setgen suitable
  • User-defined customisation to handle the differences
  • User-defined customisation for ease of use
• Setgen cf. GRIMS
  • Setgen = just workflow for parent-offspring GIDs
  • GRIMS = whole workflow for selecting, growing, processing the harvest, storing the harvest
  • Setgen is just one element of the workflow controlled by GRIMS
  • Should Setgen be extended to handle the whole workflow?
Case 2. User receives seed from others
1: Initial data on batch: LISTNMS, EVENTMEM
   Need fast routine entry of data, without need for expert judgements, for quick release by SHU
2: Initial data on new GIDs as orphans
3: Upload to central (for external receipts processed by SHU)
4: Search central for existing GIDs representing the parents
5: Update data for new GIDs with parents already in central
6: Create GIDs for parents not already in central
7: Update data for the new GIDs from those parents
8: Scan / file / deposit original documents: FILELINK
Case 2 step 1: batch data
• LISTNMS
• EVENTMEM links to PERSONS, INSTITUT
  • Batch ID, batch description, date received, donor person, donor institute
  • IP conditions, e.g. SMTA, SMTA with additional restrictions, other restrictions
• FILELINK
  • To point to original documentation: e-files; scanned paper documents
2: Initial data entry for new GIDs: GERMPLSM
2: Initial data entry for new GIDs: NAMES
• Germplasm provider may provide:
  • ± pedigree info
  • 0, 1 … N names
• Choose name values to enter as
  • ENTRYCD, SOURCE, DESIG, GRPNAME
• Enter in LISTDATA
• Create NAMES records
  • Preferred ID (if specified by user’s rules)
    • Automatically assigned NVAL by user’s rules
    • NSTAT=8, NLOCN=GLOCN, NDATE=today, NTYPE=user-specified
  • Names provided by provider
    • With missing NSTAT, NLOCN, NDATE, NTYPE
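As a rough sketch of the "Create NAMES records" step, the function below builds one fully populated preferred-ID record plus one incomplete record per provider-supplied name. The function name, the dictionary layout and the YYYYMMDD integer encoding of NDATE are assumptions made for illustration; only the field names and the NSTAT=8, NLOCN=GLOCN, NDATE=today rules come from the slide.

```python
from datetime import date
from typing import Dict, List, Optional


def build_name_records(gid: int, glocn: int,
                       preferred_id_nval: Optional[str],
                       preferred_id_ntype: Optional[int],
                       provider_names: List[str],
                       today: Optional[date] = None) -> List[Dict]:
    """Hypothetical helper: NAMES records for one newly received GID."""
    # Assumption: NDATE is stored as an integer in YYYYMMDD form.
    ndate = int((today or date.today()).strftime("%Y%m%d"))
    records = []
    if preferred_id_nval is not None:
        # Preferred ID assigned by the user's rules: all status fields filled in.
        records.append({"gid": gid, "nval": preferred_id_nval, "nstat": 8,
                        "nlocn": glocn, "ndate": ndate,
                        "ntype": preferred_id_ntype})
    for nval in provider_names:
        # Provider-supplied names: NSTAT, NLOCN, NDATE, NTYPE left missing
        # until an expert resolves them.
        records.append({"gid": gid, "nval": nval, "nstat": None,
                        "nlocn": None, "ndate": None, "ntype": None})
    return records
```

Leaving the provider's names incomplete keeps initial entry fast and defers expert judgement to the later search and update steps.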
4: Searching central for GPID2
• Does central already have a GID representing the provider’s sample?
• Issue:
  • Many GIDs may share the same name
  • Nothing to indicate what each GID represents
    • New field GREPRESENTS??
    • Guidance from GLOCN, NLOCN, NTYPE, NSTAT, and same fields of candidate’s GPID2 & GPID1
    • Not easily seen in GMS_Search
  • Many errors in GLOCN, NLOCN, NTYPE, NSTAT
    • IRTP 456: 15 GIDs, 48 errors, 9 missing GIDs
    • Azucena: 74 GIDs, 27 missing, 10 unidentifiable, > 60% of GPID1-GPID2 values wrong
    • Inconsistent / inadequate user understanding
    • Inadequate data validation
GRepresents values proposed in 2009
• Good candidates for GPID2
  • Accession conserved in genebank at GLocN
  • Breeder's selection or other line produced at GLocN
  • Sample maintained at GLocN for testing in nurseries
  • Copy of a genebank accession or breeder's line held informally at GLocN
• Possible candidates
  • Notional GID required for historical pedigree
  • Inconsistent data
  • Unvalidated
• Not possible as candidates
  • Cross made at GLocN
  • Sample collected from field or market at GLocN
    • Except for new direct accession from field
4: Searching central for GPID2
• Perfect match:
  • Provider specifies own preferred ID
    • Provider uses same ICIS central
      • Gives GID of own sample
    • Sample from provider’s curated collection
      • Gives preferred name & ID as separate identifiers
    • Sample bred by provider
      • Line name is only name, serving as ID and name
  • Candidate GID has
    • (GID represents sample managed by donor)
    • GNPGS < 0
    • GLOCN = donor’s locid
    • Name with matching NVAL and:
      • Name with NLOCN=GLOCN
      • Only one name, or name with NSTAT=8
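The candidate-side conditions for a perfect match can be read as a single predicate. The sketch below is one possible reading, assuming the two sub-conditions under "Name with matching NVAL" must both hold for the same name; the classes and the function are hypothetical stand-ins for GERMPLSM and NAMES rows, and the "GID represents" check is omitted because that field does not exist yet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Name:            # hypothetical stand-in for a NAMES row
    nval: str
    nlocn: int
    nstat: int


@dataclass
class Germplasm:       # hypothetical stand-in for a GERMPLSM row
    gid: int
    gnpgs: int
    glocn: int
    names: List[Name] = field(default_factory=list)


def is_perfect_match(candidate: Germplasm, provider_id: str, donor_locid: int) -> bool:
    # Candidate must have GNPGS < 0 and be located at the donor's location.
    if candidate.gnpgs >= 0 or candidate.glocn != donor_locid:
        return False
    for name in candidate.names:
        if name.nval != provider_id:
            continue                      # NVAL must match the provider's ID
        if name.nlocn != candidate.glocn:
            continue                      # name must carry NLOCN = GLOCN
        if len(candidate.names) == 1 or name.nstat == 8:
            return True                   # only name, or flagged as preferred ID
    return False
```

For example, a candidate with GNPGS = -1 at the donor's location passes only if the matching name is either its sole name or has NSTAT=8.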
4: Searching central for GPID2
• Super perfect match = perfect match plus
  • Provider specifies their donor’s preferred ID
    • Candidate GID has GPID2 with single name or with preferred ID matching provider’s donor’s preferred ID
  • Provider specifies the original collected sample ID
    • Candidate GID has GPID1 with preferred ID having NTYPE=9 and NVAL=collected sample ID
  • Provider specifies the pedigree
    • Candidate GID has the same pedigree
4: Searching central for GPID2
• Imperfect match:
  • Provider does not specify own preferred ID
    • Not professional germplasm manager
    • E.g. provides only cultivar name or pedigree
  • Partial match; “matching” name (allowing variants):
    • “GID represents” not specified
    • GLOCN ≠ donor’s locid
    • NLOCN ≠ donor’s locid
    • Multiple names, none with NSTAT=8
    • Provider = genebank, gives accession ID, but no NID with NTYPE=1, NSTAT=8
• Data reliability
  • Multiple NIDs with same NVAL but inconsistent NLOCN, NDATE, NSTAT, NTYPE, GPID1, GPID2: potentially unreliable
4: Searching central for GPID2
• Search:
  • Calculate % match to donor’s sample
  • Sort by % match
  • Calculate reliability
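The slides do not say how the % match is computed. As a minimal sketch, assume each candidate GID has been checked against a fixed set of yes/no criteria (such as those on the match slides) and that every criterion carries equal weight; both the equal weighting and the function names are assumptions. The reliability calculation is not covered here.

```python
from typing import Dict, List, Tuple


def percent_match(criteria: Dict[str, bool]) -> float:
    # Assumption: equal weight per criterion; score is the share satisfied.
    if not criteria:
        return 0.0
    return 100.0 * sum(criteria.values()) / len(criteria)


def rank_candidates(candidates: Dict[int, Dict[str, bool]]) -> List[Tuple[int, float]]:
    # Score every candidate GID, then sort best match first.
    scored = [(gid, percent_match(checks)) for gid, checks in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative GIDs and criterion names only.
ranking = rank_candidates({
    101: {"gnpgs_negative": True, "glocn_is_donor": False,
          "nval_matches": True, "nstat_is_8": False},
    102: {"gnpgs_negative": True, "glocn_is_donor": True,
          "nval_matches": True, "nstat_is_8": True},
})
print(ranking)
```

A weighted score would fit the same interface if some criteria, such as a matching preferred ID, should count for more than others.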
5: Successful search for GPID2
• Assign
  • GPID2 := selected candidate
  • GPID1 := GPID1 of GPID2
• Display reliability and all recorded distinct values of NLOCN, NDATE, NSTAT, NTYPE, GPID1, GPID2 for same NVAL
• Expert user corrects wrong data for GPID2 & GPID1
• After correcting GPID2, new GID:
  • Inherits NLOCN and NDATE of GPID2
  • Is assigned NTYPE and NSTAT by user-rules
    • May be directly inherited, e.g. NSTAT=1
    • May be changed, e.g. NSTAT=8 for GPID2, NSTAT=0 for new GID
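The assignment and the "distinct values" display can be sketched as follows. Records are modelled as plain dictionaries with lower-case field names and integer values, which is an assumption for illustration; the helper names are hypothetical. A field with more than one distinct value signals the kind of inconsistency that the expert user is asked to correct.

```python
from typing import Dict, Iterable, List

FIELDS = ("nlocn", "ndate", "nstat", "ntype", "gpid1", "gpid2")


def distinct_values(records: Iterable[Dict], nval: str) -> Dict[str, List[int]]:
    # Collect every distinct value recorded for each field among the
    # records that share the given NVAL.
    matching = [r for r in records if r["nval"] == nval]
    return {f: sorted({r[f] for r in matching}) for f in FIELDS}


def assign_parents(new_gid: Dict, candidate: Dict) -> Dict:
    # GPID2 := selected candidate; GPID1 := GPID1 of GPID2.
    new_gid["gpid2"] = candidate["gid"]
    new_gid["gpid1"] = candidate["gpid1"]
    return new_gid


# Illustrative records only.
records = [
    {"nval": "X 1", "nlocn": 5, "ndate": 20090101, "nstat": 8, "ntype": 1, "gpid1": 10, "gpid2": 20},
    {"nval": "X 1", "nlocn": 7, "ndate": 20090101, "nstat": 1, "ntype": 1, "gpid1": 10, "gpid2": 21},
    {"nval": "Z 9", "nlocn": 5, "ndate": 20080101, "nstat": 8, "ntype": 1, "gpid1": 11, "gpid2": 22},
]
summary = distinct_values(records, "X 1")
new_gid = assign_parents({"gid": 300, "gpid1": 0, "gpid2": 0}, {"gid": 20, "gpid1": 10})
```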
6: Unsuccessful search for GPID2
• Repeat
  • Step 3, create GIDs with partial data to represent GPID2
  • Steps 4-6, look for and use or create source of GPID2
• Iteration finishes with
  • Successful search for source of source, or
  • Source of source = GPID1
Intermediate cases
• Transfers of seed between users of the same ICIS central
  • For recipient, like handling incoming seed, but with parent GIDs already defined in provider’s list
• Seed increase of mixed accessions, splitting into uniform components as new accessions
  • Initially like seed increase, but then like receiving new accession