Unduplication of Listing Data 12th International Blaise Users Conference Michael K. Mangiapane Technologies Management Office
Outline • What is meant by Unduplication • Using a Blaise Procedure for Unduplication • Using a Maniplus Script for Unduplication • Using Blaise API and VB6 for Unduplication • Lessons Learned
What is Unduplication • Unduplication is the process of verifying that a data item entered into a table does not duplicate a previously entered data item in the same column. Specifically, we are verifying that the permit numbers entered in the table are unique for the Survey of Construction (SOC) listing instrument. • Unduplication should: • Take about the same amount of time to search for a duplicate whether the FR (field representative) is on line 2 or line 2400. • Prompt the FR if a duplicate is found and give them a chance to fix it before moving forward. • Be performed right after a new permit number is entered.
Blaise Procedure • Straightforward approach. • Call the procedure after a new permit number is entered. • Expectation that a duplicated number is the one that was just entered.
Procedure Challenge • How to compare all the permit numbers in a table when each line is a separate block. • Direct references to the previous numbers create internal parameters. • Internal parameters are inefficient compared to declared parameters.
Solution • If a permit number passes unduplication, store it in a comma-delimited string at a higher-level block in the instrument. • Two strings are required to hold all 2400 permits if the FR enters 24-digit permit numbers. • That is 60,000 characters in total, and the Blaise string limit is 32,767 characters.
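The 60,000 figure is consistent with 2,400 permits of 24 digits plus one comma each. The sketch below reproduces that arithmetic. The one-comma-per-permit layout and the function names are assumptions made for illustration, and Python stands in for Blaise here; this is not the instrument's code.

```python
import math

BLAISE_STRING_LIMIT = 32_767  # string length limit quoted on the slide


def total_characters(num_permits: int, digits: int) -> int:
    # Assumption: each permit number is stored with one trailing comma.
    return num_permits * (digits + 1)


def strings_needed(num_permits: int, digits: int,
                   limit: int = BLAISE_STRING_LIMIT) -> int:
    # Count whole permits per string so that a permit number is never
    # split across two strings.
    permits_per_string = limit // (digits + 1)
    return math.ceil(num_permits / permits_per_string)


print(total_characters(2400, 24))  # 60000
print(strings_needed(2400, 24))    # 2
```

Under these assumptions a single string holds at most 1,310 permits of 24 digits, which is why the worst case needs a second string.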
How The Procedure Works • Find the first permit number in the list. • Calculate the position of the first comma. • Read the first permit number and compare it with the latest permit number. • If no duplicate is found, repeat with the next permit number in the list.
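The steps above can be sketched as a comma-by-comma scan of the stored string. This is a minimal illustration in Python rather than Blaise, and the names `is_duplicate` and `add_permit` are hypothetical. It also assumes every stored permit is followed by a comma.

```python
def is_duplicate(stored: str, new_permit: str) -> bool:
    """Scan a comma-delimited string one permit at a time."""
    start = 0
    while start < len(stored):
        # Position of the comma that ends the current permit number.
        comma = stored.find(",", start)
        if comma == -1:
            comma = len(stored)
        # Read the current permit and compare it with the new one.
        if stored[start:comma] == new_permit:
            return True
        # No match: move on to the next permit in the list.
        start = comma + 1
    return False


def add_permit(stored: str, new_permit: str):
    """Append the permit only if it passed unduplication."""
    if is_duplicate(stored, new_permit):
        return stored, False
    return stored + new_permit + ",", True


stored, ok = add_permit("", "A100")
stored, ok = add_permit(stored, "B200")
print(is_duplicate(stored, "B200"))  # True
print(is_duplicate(stored, "B20"))   # False
```

Because the scan always starts at the first permit, its cost grows with the number of permits already stored.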
Procedure Unduplication Testing • Instant feedback to the FR if a duplicate is found. • A noticeable lag if the FR exited the instrument and re-opened it later to finish listing. • Lag in opening the instrument, loading the listing table, and switching between parallel blocks. • Instrument looked like it was “frozen”.
Procedure Testing Times
Number of Permits    Instrument Load Time     Table Load Time
100                  2 seconds                0 seconds
200                  3 seconds                3 seconds
500                  12 seconds               10 seconds
1000                 56 seconds               54 seconds
2399                 9 minutes 40 seconds     9 minutes 46 seconds
Procedure Summary • Advantages • Checks for duplicates as permits are entered. • “Instantaneous” check in listing table. • Easy implementation. • Disadvantages • Huge instrument load lag for large listings. • Lag introduced when navigating parallel blocks.
Maniplus Script • Use the INTERCHANGE setting to connect to the instrument and perform unduplication. • Nested loops act as pointers to the table. • Outer loop is the pointer to the current permit number. • Inner loop is the pointer to all other listed permit numbers. • Compare the permit numbers. • If no duplicate is found, increment the inner loop and repeat. At the end of the table, increment the outer loop and start again. • Repeat until a duplicate is found or all permit numbers are compared.
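The nested-loop comparison can be sketched as follows. This is Python rather than Maniplus, and the function name is hypothetical. As an assumption, the inner loop starts just after the outer pointer, since earlier pairs were already compared on a previous pass.

```python
from typing import List, Optional, Tuple


def first_duplicate_pair(permits: List[str]) -> Optional[Tuple[int, int]]:
    """Return the 1-based line numbers of the first duplicate pair, or None."""
    for outer in range(len(permits)):              # current permit number
        for inner in range(outer + 1, len(permits)):  # the other permits
            if permits[outer] == permits[inner]:
                return outer + 1, inner + 1
    return None  # every pair was compared and no duplicate was found


print(first_duplicate_pair(["P1", "P2", "P3", "P2"]))  # (2, 4)
print(first_duplicate_pair(["P1", "P2"]))              # None
```

Comparing every pair means the number of comparisons grows roughly with the square of the number of listed permits.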
Limitation of Maniplus • Unable to call unduplication right after a permit number is entered. • Maniplus scripts in Blaise 4.7 cannot be called from the rules. Scripts may be called via a menu command or an action at the end of a block. • Unduplication was therefore attached as an action that runs when listing is completed.
Maniplus Challenge • How to display duplicate permits. • Permits in question could be on different pages inside the instrument. • FR has to navigate between the two permits to compare information. • Inconvenient if they have to remember which line each permit was on.
Finishing Unduplication • If there were duplicates, FR is brought back to the last question. • Must run unduplication again and repeat until there are no more duplicate permits. • Inconvenient to wait until the listing is done to check for duplicates.
Maniplus Unduplication Testing • Faster than the procedure. • Longer search time as more permits were listed. • Advisory message added.
Maniplus Summary • Advantages • No lag time with instrument load, in listing table, or navigating parallel blocks. • Fairly easy implementation. • Disadvantages of using Maniplus • Does not provide the functionality requested – duplicates are not identified until after listing is “complete.” • Convoluted way in which FRs had to deal with duplicate permit numbers. • A one-time lag of up to 1 minute for large listings.
Blaise API and Visual Basic • Blaise API was already being used in the SOC LI for another program. • Builder Table. • Could unduplication run in Visual Basic to give the FR instant feedback but not degrade instrument performance?
Blaise API Design • Hybrid of procedure and Maniplus unduplication. • Direct connection to the instrument. • Search from the first line number via a loop. • Alien Router inside the instrument calls unduplication. • Embedded block inside the listing table to keep fields together.
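A rough sketch of the "search from the first line" idea is shown below. It is written in Python for illustration only; the actual implementation used Visual Basic 6 with the Blaise API, none of whose calls are reproduced here. The function name and the list-of-strings representation of the table are assumptions.

```python
from typing import Optional, Sequence


def duplicate_line(permits: Sequence[str], current_line: int) -> Optional[int]:
    """Return the 1-based line of an earlier matching permit, or None."""
    target = permits[current_line - 1]
    # Start at line 1 and stop just before the line being checked.
    for line in range(1, current_line):
        if permits[line - 1] == target:
            return line
    return None


table = ["P-001", "P-002", "P-001"]
print(duplicate_line(table, 3))  # 1
print(duplicate_line(table, 2))  # None
```

Returning the matching line number, rather than only true or false, would let the prompt tell the FR where the earlier permit is.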
Blaise API Challenge #1 • Even when a duplicate was found, the cursor would move on to the next field in the table. • Unduplication did not run again unless the FR backed up to the permit number field. • Tried to keep cursor in place by assigning an alien router status or clearing the keyboard buffer in VB.
Blaise API Solution #1 • Clear the keyboard buffer, then run the following IF statement: • IF DS.KeyBuffer = “” = “” THEN END IF • Unduplication would compile, but would fail at run-time when it encountered this statement. • A duplicate permit number had to be fixed before leaving the field, even if the FR tried to access a parallel block.
Blaise API Challenge #2 • When the cursor was on the permit number field and a parallel block tab was clicked, there was an issue with focus. • This only happened when the parallel block was selected by clicking its tab. It did not happen with a keyboard command.
Blaise API Solution #2 • Remove the tabs from the instrument? • Removing the tabs removes functionality used in other Blaise surveys. • Ultimately decided to leave this issue alone since FRs can navigate back to where they were.
Blaise API Unduplication Testing • Blaise API and Visual Basic 6 gave instant feedback if a duplicate was found after a new permit number was entered. • Some lag the first time it runs if re-entering a case with a large number of permits. • No lag after the initial run.
Blaise API Summary • Advantages of using the Blaise API • Very small lag time with first run of unduplication (after reloading instrument). • No lag when navigating parallel blocks. • Checks for duplicates as permits are entered (the functionality requested). • “Instantaneous” check in listing table (no lag). • Disadvantages of using the Blaise API • More challenging implementation: a DLL must be installed on the laptops. • Focus issue when using the mouse to change parallel blocks from the permit number field.
The Winner Is… • Blaise API and Visual Basic 6 for unduplication. • All requirements for unduplication were satisfied by this approach. • Blaise API can also be used with other listing surveys without major changes to those instruments.
Lessons Learned • A procedure would be beneficial for a smaller survey instrument that does not need the Blaise API. • Maniplus scripting would work well if unduplication did not need to be performed immediately after a permit was keyed. • Using the Blaise API is the best approach for larger instruments that require heavy lifting. • A possible enhancement is to also check the permit numbers ahead of the one being checked in unduplication.
QUESTIONS? Contact Information: Michael K. Mangiapane U.S. Census Bureau Technologies Management Office Phone: (301) 763-1955 E-Mail: michael.k.mangiapane@census.gov