Computer Vision REU Week 8 & 9

Computer Vision REU Week 8 & 9 • Adam Kavanaugh

Set to be matched • Went to Dr. Sugaya and confirmed an “ideal” match. • Same Gene, different brain

The problem • Two main areas of concentration • The matching problem • Match an input gene expression to a gene or set of genes from a defined database. • Automatic data processing • Given large unprocessed slides with brain slices, segment the brains and prep them for analysis

Data Processing • Basic Procedure: • Remove the background • Cut out each individual slice • Normalize the slices • Rotate the slices so they line up • Apply a threshold to remove any brain material which does not express the given gene

Segmentation • Uses connected components, with the pixel ranges for the components based on the background average • Flag the background to white, then rerun connected components allowing only 2 pixel ranges: 0 – 254 and 255 • Cut out each individual component that meets a threshold value
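The background-flagging step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a grayscale image stored as a list of rows of 0-255 integers, assumes the image border is background, and uses a hypothetical `tolerance` parameter to decide which pixels count as "close to the background average".

```python
def flag_background(image, tolerance=10):
    """Set pixels close to the background average to 255 (white).

    Assumption: the outer border of the image contains only background,
    so its mean is used as the background average.
    """
    height, width = len(image), len(image[0])
    border = [image[y][x]
              for y in range(height) for x in range(width)
              if y in (0, height - 1) or x in (0, width - 1)]
    background_avg = sum(border) / len(border)
    # Non-background pixels are capped at 254 so that the value 255
    # is reserved for flagged background.
    return [[255 if abs(p - background_avg) <= tolerance else min(p, 254)
             for p in row]
            for row in image]


def foreground_mask(flagged):
    """Collapse the image into the two ranges: 0-254 -> 1, 255 -> 0."""
    return [[0 if p == 255 else 1 for p in row] for row in flagged]
```

The resulting mask is what a connected components pass would then label; components below a size threshold can be discarded afterwards.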

Connected Components • Used a very basic connected components method that only checks each pixel's left and top neighbors • Implemented the merge function by rescanning the image and swapping component values • This later caused some problems
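For comparison, here is a sketch of the same left-and-top-neighbor scan with a different merge strategy. It is not the implementation described on this slide: instead of rescanning the image on every merge, it records label equivalences in a union-find table and resolves them in one second pass. The function name and the binary-mask input format (1 = foreground) are assumptions made for the example.

```python
def label_components(mask):
    """Two-pass, 4-connected labeling of a binary mask (1 = foreground)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[i] = union-find parent of label i; 0 is background

    def find(label):
        # Follow parent links to the root, shortening the path on the way.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    # First pass: assign provisional labels, looking only left and up.
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            left = labels[y][x - 1] if x > 0 else 0
            top = labels[y - 1][x] if y > 0 else 0
            if left == 0 and top == 0:
                parent.append(len(parent))  # new label is its own root
                labels[y][x] = len(parent) - 1
            elif left and top:
                root_left, root_top = find(left), find(top)
                smaller = min(root_left, root_top)
                parent[max(root_left, root_top)] = smaller  # record the merge
                labels[y][x] = smaller
            else:
                labels[y][x] = left or top

    # Second pass: replace every provisional label with its root.
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```

With this structure a merge costs almost nothing at the moment it is found, so the image is scanned twice in total, however many merges occur.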

Size Problem • Due to my naïve implementation of the merge function in the connected components, the program is VERY inefficient • A test run on the large image (10768 x 4072) never completed. • Ran overnight for 7 hours and was only ¼ of the way through the image, and slowing down

Solution • Run the image through a Gaussian pyramid and pass the smallest level to the segmentation program. • This method completed in about 1 minute, as opposed to the estimated 30+ hours for the full-size image (7 hours for the first ¼, and slowing). • However, this causes a loss of detail which is important down the line
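A minimal sketch of the pyramid reduction, assuming a grayscale image stored as a list of rows. The 5-tap binomial kernel, the clamped edge handling, and the choice of three levels (each level halves the image, so three levels give an overall factor of 8) are assumptions for illustration; at that factor the 10768 x 4072 slide would shrink to about 1346 x 509.

```python
KERNEL = (1, 4, 6, 4, 1)  # binomial approximation to a Gaussian; sums to 16


def blur_1d(values):
    """Blur one row or column, clamping indices at the edges."""
    n = len(values)
    blurred = []
    for i in range(n):
        total = 0
        for k, weight in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            total += weight * values[j]
        blurred.append(total / 16)
    return blurred


def pyramid_down(image):
    """Blur horizontally and vertically, then keep every second pixel."""
    rows = [blur_1d(row) for row in image]
    cols = [blur_1d(list(col)) for col in zip(*rows)]
    blurred = [list(row) for row in zip(*cols)]
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(image, levels=3):
    """Return [original, 1/2 size, 1/4 size, ...] with `levels` reductions."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(pyramid_down(pyramid[-1]))
    return pyramid
```

The last element of the returned list is the smallest level, which is the one that would be passed to segmentation.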

Saving the details • Use the knowledge that the Gaussian pyramid shrinks the image by a constant factor • Make bounding boxes around each brain component in the smaller image • Multiply the corner coordinates of each box by 8 to get the corresponding positions in the full image. • Rerun the first-stage segmentation on the new slices, which runs much faster
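The coordinate mapping can be sketched like this. The function names, the label-image input (0 = background, positive integers = components), and the clamping to the full image size are assumptions for the example. One detail goes slightly beyond a plain multiply: the far corner is extended to the end of its 8 x 8 block so that the box covers every full-resolution pixel the small pixel stood for.

```python
def bounding_boxes(labels):
    """Return {label: (x0, y0, x1, y1)} with inclusive corners."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == 0:
                continue
            if label not in boxes:
                boxes[label] = [x, y, x, y]
            else:
                box = boxes[label]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return {label: tuple(box) for label, box in boxes.items()}


def scale_box(box, factor=8, full_width=None, full_height=None):
    """Map a box from the reduced image back to full-resolution pixels."""
    x0, y0, x1, y1 = box
    sx0, sy0 = x0 * factor, y0 * factor
    # Extend the far corner to the last pixel of its factor-sized block.
    sx1, sy1 = (x1 + 1) * factor - 1, (y1 + 1) * factor - 1
    if full_width is not None:
        sx1 = min(sx1, full_width - 1)
    if full_height is not None:
        sy1 = min(sy1, full_height - 1)
    return (sx0, sy0, sx1, sy1)
```

Each scaled box is then cut from the full image and segmented again on its own, which keeps the detail while limiting the expensive pass to a small region.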

Results - Original

Results - Segmented

Results - Individual (panels: Original, Final)

After segmentation • Throw out any bad slices based on certain thresholds • Then apply the rest of the processing • Normalize • Rotate • Threshold

Ideal Result • In the end, the results should look something like this • This slice was segmented manually; the rest of the process was then applied to it.

Future Improvements • Works best on “well behaved” slides while other slides throw off the components • Try to improve this situation to make it more viable for all slide variations • Work out the bugs with cutting the segmented slices out • Integrate the other parts of the processing

Matching Problem • Current Methodology: • Pulled primitive data from Guo-Hall skeletons and color histograms • Used the Sum of Squared Differences (SSD) method to score each comparison and created a database • Tested various input slices for accuracy.
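A sketch of histogram matching with sum of squared differences. The bin count, the single-channel histogram (a color histogram would repeat this per channel), the database layout as a name-to-feature dictionary, and the function names are all assumptions for illustration. A lower score means a closer match.

```python
def intensity_histogram(pixels, bins=8):
    """Normalized histogram of 0-255 values for one channel.

    Assumes `pixels` is a non-empty flat list of integers.
    """
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_matches(query, database, top_k=5):
    """Return the top_k (name, score) pairs, best (lowest SSD) first."""
    scored = sorted((ssd(query, feature), name)
                    for name, feature in database.items())
    return [(name, score) for score, name in scored[:top_k]]
```

Normalizing each histogram by the pixel count keeps slices of different sizes comparable before the scores are computed.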

Results • Questionable for the skeleton data. • Matches are not consistent across genes • Promising results for the color histogram matching. • For the top 5 scorers, there are at least 3 brains of the same gene type

Additions to be made • Need larger scale matching • Too much variation in edges and interiors to use specific information like Canny edges • Apply weighted scoring and integrate all of the matching data sets. • Pull more data to match against.
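One possible shape for the weighted scoring, given only as a sketch since this part is not yet built. It assumes each feature type (for example histogram and skeleton) already yields a score per candidate where lower is better; the per-feature normalization by the largest score and the weight values are hypothetical choices, not results from this project.

```python
def combined_scores(scores_by_feature, weights):
    """Combine per-feature scores into one weighted score per candidate.

    scores_by_feature: {feature_name: {candidate_name: score}}
    weights:           {feature_name: weight}
    """
    combined = {}
    for feature, scores in scores_by_feature.items():
        # Scale each feature to 0-1 so no feature dominates by magnitude.
        largest = max(scores.values()) or 1.0
        for candidate, score in scores.items():
            combined[candidate] = (combined.get(candidate, 0.0)
                                   + weights[feature] * score / largest)
    return combined


def best_match(scores_by_feature, weights):
    """Return the candidate with the lowest combined score."""
    combined = combined_scores(scores_by_feature, weights)
    return min(combined, key=combined.get)
```

Giving the histogram scores a larger weight than the skeleton scores would reflect the current finding that histogram matching is the more consistent of the two.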
