An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC

Grace Dasovich, Robert Kim • Midterm Presentation • August 21, 2009

Outline • Related Work • Data • Modeling Approach and Results • Similarity Measures • Artificial Neural Network • Multivariate Linear Regression • Conclusions • Future Work

Related Work • Computer-aided diagnosis (CADx) based on low-level image features • Armato et al. developed a linear discriminant classifier using features of lung nodules • The relationship between image features and radiologists’ ratings still needs to be established

Related Work • Image features and semantic ratings • Lung interpretations • Barb et al. developed the Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE) • Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings • Samala et al. used several combinations of image features and radiologists’ ratings to classify nodules

Related Work • Similarity • Li et al. investigated four different methods to compute similarity measures for lung nodules • Feature-based • Pixel-value difference • Cross-correlation • Artificial neural network (ANN)
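The sketch below gives a rough illustration of two of the image-based measures listed above. It computes a zero-lag normalized cross-correlation and a mean absolute pixel-value difference between two nodule patches that are the same size and already aligned. These are generic textbook definitions chosen for illustration. They are not necessarily the formulations Li et al. used, and the function names are hypothetical.

```python
import numpy as np


def normalized_cross_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-lag normalized cross-correlation of two same-shape images, in [-1, 1]."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # A constant image has zero variance; return 0.0 rather than dividing by zero.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def pixel_value_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute pixel-value difference; 0.0 means identical images."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.abs(a.astype(float) - b.astype(float)).mean())


patch = np.arange(16).reshape(4, 4)
print(normalized_cross_correlation(patch, 2 * patch + 3))  # brightness/contrast change only
print(pixel_value_difference(patch, patch))
```

Normalized cross-correlation ignores linear brightness and contrast changes. The pixel-value difference does not, which is one reason the two measures can rank the same pairs differently.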

Materials: Data • LIDC dataset • 149 unique nodules • One slice per nodule (the slice with the largest nodule area) • 9 semantic characteristics • Calcification and internal structure showed little variation and were therefore not used • 64 content features • Shape, size, intensity, and texture
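Choosing one slice per nodule, the one with the largest nodule area, can be sketched as below. The sketch assumes each nodule is available as a stack of per-slice binary masks, for example derived from a radiologist's outline. The mask layout, the `pixel_spacing_mm` parameter, and the function names are illustrative assumptions, not part of the LIDC tooling.

```python
import numpy as np


def slice_areas_mm2(masks: np.ndarray, pixel_spacing_mm=(1.0, 1.0)) -> np.ndarray:
    """Nodule area of every slice in mm^2.

    masks: array of shape (num_slices, height, width); nonzero pixels belong to the nodule.
    pixel_spacing_mm: (row spacing, column spacing) of one pixel in millimetres.
    """
    pixel_counts = (masks != 0).sum(axis=(1, 2))
    return pixel_counts * pixel_spacing_mm[0] * pixel_spacing_mm[1]


def largest_area_slice(masks: np.ndarray) -> int:
    """Index of the slice with the largest nodule area (ties go to the first such slice)."""
    return int(np.argmax((masks != 0).sum(axis=(1, 2))))


masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, 2:4, 2:4] = True   # 4 pixels
masks[1, 1:6, 1:6] = True   # 25 pixels
masks[2, 2:5, 2:5] = True   # 9 pixels
print(largest_area_slice(masks))
print(slice_areas_mm2(masks, pixel_spacing_mm=(0.7, 0.7)))
```

With a scan's real pixel spacing, the same area computation could also serve for an area threshold in mm².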

Similarity Measures • Cosine Similarity • Jeffrey Divergence • Euclidean Distance
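The three measures can be sketched as follows. Cosine similarity and Euclidean distance are shown on plain vectors. Jeffrey divergence is shown in its common content-based-retrieval form for two histograms, here assumed to be histograms of the ratings a nodule received for one characteristic. The presentation does not state how ratings were turned into vectors or histograms, so that part, and the example data, are assumptions.

```python
import numpy as np


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u, v) -> float:
    """Straight-line distance; 0.0 means identical vectors."""
    return float(np.linalg.norm(np.asarray(u, float) - np.asarray(v, float)))


def jeffrey_divergence(h, k) -> float:
    """Symmetric divergence between two histograms over the same bins.

    d_J(H, K) = sum_i [h_i log(h_i / m_i) + k_i log(k_i / m_i)], with m_i = (h_i + k_i) / 2.
    Both histograms are normalized to sum to 1; bins where h_i (or k_i) is 0 contribute 0.
    """
    h = np.asarray(h, float)
    k = np.asarray(k, float)
    h, k = h / h.sum(), k / k.sum()
    m = (h + k) / 2
    total = 0.0
    for p in (h, k):
        nz = p > 0  # where p > 0, m >= p / 2 > 0, so the log is defined
        total += float(np.sum(p[nz] * np.log(p[nz] / m[nz])))
    return total


# Hypothetical malignancy ratings (scale 1-5) from four radiologists, as counts per rating value.
nodule_a = [0, 0, 1, 2, 1]
nodule_b = [0, 1, 2, 1, 0]
print(cosine_similarity(nodule_a, nodule_b))
print(euclidean_distance(nodule_a, nodule_b))
print(jeffrey_divergence(nodule_a, nodule_b))
```

The measures point in different directions. Cosine similarity grows as two nodules become more alike, whereas Euclidean distance and Jeffrey divergence grow with dissimilarity.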

Similarity Measures • Computed feature distance measures
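The slides do not say which per-feature distance was used, so the sketch below assumes the absolute difference of each content feature. Applied to two nodules with 64 features each, it yields a 64-element feature-distance vector of the kind the ANN takes as input. The names `feature_distances` and `pair_distance_matrix` are hypothetical.

```python
import numpy as np


def feature_distances(features_a, features_b) -> np.ndarray:
    """Per-feature absolute difference |a_j - b_j| between two content feature vectors."""
    a = np.asarray(features_a, float)
    b = np.asarray(features_b, float)
    if a.shape != b.shape:
        raise ValueError("feature vectors must have the same length")
    return np.abs(a - b)


def pair_distance_matrix(features: np.ndarray, pairs) -> np.ndarray:
    """Stack the distance vectors of many nodule pairs: shape (len(pairs), n_features).

    features: array of shape (n_nodules, n_features); pairs: iterable of (i, j) row indices.
    """
    return np.array([feature_distances(features[i], features[j]) for i, j in pairs])


rng = np.random.default_rng(0)
features = rng.random((5, 64))          # 5 hypothetical nodules x 64 content features
X = pair_distance_matrix(features, [(0, 1), (0, 2), (3, 4)])
print(X.shape)                          # (3, 64)
```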

Methods • Two three-layer ANNs • Input (64 neurons), hidden layer (5 neurons), output (1 neuron) • Input (64 neurons), hidden layer (5 neurons), output (7 neurons) • Input = 64 feature distances • Output = semantic similarity or difference in semantic ratings • Hyperbolic tangent activation function, backpropagation algorithm, 200 iterations
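A minimal NumPy sketch of such a three-layer network follows. It has 64 inputs, a hidden layer of 5 tanh units, and 1 or 7 outputs, and it is trained by backpropagation for 200 iterations. Several choices are assumptions made for illustration: the linear output layer, the mean-squared-error loss, full-batch gradient descent, the learning rate, and the weight initialization. The slides specify only the layer sizes, the hyperbolic tangent function, backpropagation, and the iteration count.

```python
import numpy as np


class ThreeLayerANN:
    """Input -> tanh hidden layer -> linear output, trained with full-batch backpropagation."""

    def __init__(self, n_inputs=64, n_hidden=5, n_outputs=1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_outputs))
        self.b2 = np.zeros(n_outputs)

    def predict(self, X):
        H = np.tanh(np.asarray(X, float) @ self.W1 + self.b1)
        return H @ self.W2 + self.b2

    def fit(self, X, Y, iterations=200, learning_rate=0.05):
        X = np.asarray(X, float)
        Y = np.asarray(Y, float).reshape(len(X), -1)  # (n,) targets become (n, 1)
        n = len(X)
        for _ in range(iterations):
            # Forward pass.
            H = np.tanh(X @ self.W1 + self.b1)
            out = H @ self.W2 + self.b2
            # Backward pass for the loss (1 / 2n) * sum ||out - Y||^2.
            d_out = (out - Y) / n
            grad_W2, grad_b2 = H.T @ d_out, d_out.sum(axis=0)
            d_hidden = (d_out @ self.W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
            grad_W1, grad_b1 = X.T @ d_hidden, d_hidden.sum(axis=0)
            # Gradient-descent update.
            self.W2 -= learning_rate * grad_W2
            self.b2 -= learning_rate * grad_b2
            self.W1 -= learning_rate * grad_W1
            self.b1 -= learning_rate * grad_b1
        return self


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 64))          # stand-in feature-distance vectors
y = 0.5 * X[:, 0]                       # stand-in semantic target
net = ThreeLayerANN(n_inputs=64, n_hidden=5, n_outputs=1)
before = np.mean((net.predict(X).ravel() - y) ** 2)
net.fit(X, y, iterations=200)
after = np.mean((net.predict(X).ravel() - y) ** 2)
print(before, after)
```

For the seven-output variant, the same class would be built with `n_outputs=7`, with one target column per semantic characteristic.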

Methods • ANN with a single output • 640 random pairs from all 109 nodules • 231 pairs from nodules with malignancy > 3 • 496 pairs from nodules with area > 122 mm²
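The pair sets can be sketched as below. The nodule records, their field names (`id`, `malignancy`, `area_mm2`), and the use of every pair within a filtered subset are assumptions. Note that 231 = 22 × 21 / 2 and 496 = 32 × 31 / 2. This is consistent with, though not proof of, taking all pairs among 22 and 32 nodules, respectively.

```python
import itertools
import random


def all_pairs(ids):
    """Every unordered pair of distinct nodule ids: n * (n - 1) / 2 pairs."""
    return list(itertools.combinations(ids, 2))


def random_pairs(ids, n_pairs, seed=0):
    """n_pairs distinct unordered pairs sampled without replacement."""
    return random.Random(seed).sample(all_pairs(ids), n_pairs)


def pairs_where(nodules, keep):
    """All pairs among the nodules for which keep(nodule) is True."""
    return all_pairs([n["id"] for n in nodules if keep(n)])


# Hypothetical nodule records; field names are illustrative only.
nodules = [
    {"id": 1, "malignancy": 5, "area_mm2": 150.0},
    {"id": 2, "malignancy": 2, "area_mm2": 130.0},
    {"id": 3, "malignancy": 4, "area_mm2": 90.0},
    {"id": 4, "malignancy": 4, "area_mm2": 200.0},
]
high_malignancy = pairs_where(nodules, lambda n: n["malignancy"] > 3)   # ids 1, 3, 4
large = pairs_where(nodules, lambda n: n["area_mm2"] > 122)             # ids 1, 2, 4
sampled = random_pairs(range(109), 640)
print(high_malignancy, large, len(sampled))
```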

Methods • ANN with seven outputs • 640 random pairs from all 109 nodules

Methods • Leave-one-out method • Cosine similarity, Jeffrey divergence, or differences in semantic ratings used as teaching (target) data • An ANN is trained on the entire dataset minus one image pair • The held-out pair is used for testing • The correlation between the radiologists’ semantic similarity and the ANN output is computed
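The leave-one-out procedure can be sketched generically. The model is trained on all pairs but one, predicts the held-out pair, and this repeats for every pair. The predictions are then correlated with the targets. An ordinary least-squares model stands in for the ANN here so that the example stays short. The `fit` and `predict` arguments are placeholders for whatever model is used, and the helper names are hypothetical.

```python
import numpy as np


def leave_one_out_predictions(X, y, fit, predict):
    """Prediction for each sample from a model trained on all other samples."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    preds = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i          # boolean mask: every sample except i
        model = fit(X[train], y[train])
        preds[i] = predict(model, X[i:i + 1])[0]
    return preds


def fit_least_squares(X, y):
    """Stand-in model: linear regression with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_least_squares(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def pearson_r(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))                       # stand-in feature distances
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=40)
preds = leave_one_out_predictions(X, y, fit_least_squares, predict_least_squares)
print(pearson_r(preds, y))
```

Each prediction comes from a model that never saw that pair, so the resulting correlation estimates generalization rather than fit.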

Results • ANN using 640 random pairs

Results • ANN using 231 pairs with malignancy rating > 3

Results • ANN using 496 pairs with area > 122 mm²

Results • ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)

Results • ANN using 640 random pairs and Jeffrey divergence with seven semantic ratings

Methods • Normalization of features • Min-max technique • Z-score technique • Pair selection • Looked for matches between the k most similar images by semantic ratings and the k most similar images by content features
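Both normalization techniques and one way of scoring the pair-selection step are sketched below. Min-max scaling maps each feature to [0, 1]. Z-score scaling gives each feature zero mean and unit (population) standard deviation. The slides do not define how matches between the top-k lists were counted. Counting the images shared by a query's k nearest neighbors under a semantic distance matrix and under a content distance matrix is therefore an assumption, as are the function names.

```python
import numpy as np


def min_max_normalize(F):
    """Scale each column to [0, 1]; constant columns map to 0."""
    F = np.asarray(F, float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (F - lo) / span


def z_score_normalize(F):
    """Give each column zero mean and unit population standard deviation."""
    F = np.asarray(F, float)
    std = F.std(axis=0)
    return (F - F.mean(axis=0)) / np.where(std > 0, std, 1.0)


def top_k_neighbors(D, query, k):
    """Indices of the k items closest to `query` under distance matrix D, excluding itself."""
    order = np.argsort(D[query], kind="stable")
    return [int(j) for j in order if j != query][:k]


def top_k_overlap(D_semantic, D_content, query, k):
    """How many images appear in both the semantic and the content top-k lists."""
    return len(set(top_k_neighbors(D_semantic, query, k)) & set(top_k_neighbors(D_content, query, k)))


F = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
print(min_max_normalize(F))
print(z_score_normalize(F).mean(axis=0))

# Hypothetical 1-D "positions" standing in for semantic and content distances among 4 images.
sem = np.array([0.0, 1.0, 2.0, 10.0])
con = np.array([0.0, 2.0, 1.0, 10.0])
D_sem = np.abs(sem[:, None] - sem[None, :])
D_con = np.abs(con[:, None] - con[None, :])
print(top_k_overlap(D_sem, D_con, query=0, k=2), top_k_overlap(D_sem, D_con, query=0, k=1))
```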

Methods • Multivariate regression analysis • Select features with the highest correlation coefficients • Feature distance measures
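The regression step can be sketched in four steps. First, rank the feature-distance columns by the absolute Pearson correlation with semantic similarity. Next, keep the top few columns and fit a multivariate linear model by least squares. Finally, report R². The number of features kept and the stand-in data are assumptions, and the sketch does not reproduce any result reported in the presentation.

```python
import numpy as np


def select_top_correlated(X, y, n_features):
    """Column indices of X with the largest |Pearson r| against y, strongest first."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have denom == 0; give them r = 0 instead of dividing by zero.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r), kind="stable")[:n_features]


def fit_linear_regression(X, y):
    """Least-squares coefficients [intercept, b_1, ..., b_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r_squared(X, y, coef) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    pred = np.column_stack([np.ones(len(X)), X]) @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(3)
X = rng.random((200, 64))                                  # stand-in feature distances
y = 3.0 * X[:, 10] - 2.0 * X[:, 42] + 0.05 * rng.normal(size=200)
chosen = select_top_correlated(X, y, n_features=2)
coef = fit_linear_regression(X[:, chosen], y)
print(chosen, r_squared(X[:, chosen], y, coef))
```

Selecting features by univariate correlation is simple but can miss features that only matter in combination. A high R² on a filtered subset of nodules also says little about nodules outside that subset.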

Methods • Nodule analysis • Determine differences between selected and non-selected nodules • Define requirements for our model

Results • R² = 0.871

Results • A. Equivalent diameter, B. Standard deviation of intensity, C. Malignancy, D. Subtlety

Conclusions • Preliminary issues • The ANN is not yet sufficient to predict semantic similarity from content • Best correlation: r = 0.438 • Malignancy correlation: r = 0.521 • Jeffrey divergence performed better, unlike in the linear model • A semantic gap still exists

Conclusions • Our linear model applies only to a specific type of nodule • Characteristics: high malignancy, high texture, low lobulation, and low spiculation • Features: larger diameter, greater intensity • Linear models are not sufficient for determining similarity in general • R² = 0.871 with the chosen nodules

Future Work • Reduce variability among radiologists • Use only nodules on which the radiologists agree • Find the best combination of content features • 64 may be too many • Currently using only 2D data

Future Work • Different semantic distance measures • Some ratings are ordinal, but Jeffrey divergence treats them as categorical • Different machine learning methods • Incorporate radiologists’ feedback into training • Ensemble of classifiers
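The point about ordinal ratings can be made concrete. Jeffrey divergence compares histogram bins independently. Moving all of a nodule's ratings from 1 to 2 therefore looks exactly as different as moving them from 1 to 5. A one-dimensional earth mover's distance respects bin order and is shown below as one possible ordinal-aware alternative. It is an illustration only, not a measure named in the presentation.

```python
import numpy as np


def jeffrey_divergence(h, k) -> float:
    """Symmetric Jeffrey divergence between two histograms (bin order is ignored)."""
    h = np.asarray(h, float) / np.sum(h)
    k = np.asarray(k, float) / np.sum(k)
    m = (h + k) / 2
    total = 0.0
    for p in (h, k):
        nz = p > 0  # empty bins contribute 0
        total += float(np.sum(p[nz] * np.log(p[nz] / m[nz])))
    return total


def emd_1d(h, k) -> float:
    """Earth mover's distance for histograms over the same ordered, unit-spaced bins:
    the sum of absolute differences between the two cumulative distributions."""
    h = np.asarray(h, float) / np.sum(h)
    k = np.asarray(k, float) / np.sum(k)
    return float(np.sum(np.abs(np.cumsum(h) - np.cumsum(k))))


# Rating histograms on a 1-5 scale: all ratings at 1, all at 2, all at 5.
at_1 = [1, 0, 0, 0, 0]
at_2 = [0, 1, 0, 0, 0]
at_5 = [0, 0, 0, 0, 1]
print(jeffrey_divergence(at_1, at_2), jeffrey_divergence(at_1, at_5))  # equal
print(emd_1d(at_1, at_2), emd_1d(at_1, at_5))                          # 1.0 and 4.0
```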

Thanks for Listening • Any Questions?
