
Spam Image Identification Using an Artificial Neural Network

2008 MIT Spam Conference. Spam Image Identification Using an Artificial Neural Network. Jason R. Bowling, Priscilla Hope and Kathy J. Liszka, The University of Akron. We know it’s bad…: image spam went from roughly 1% of all emails in 2005 to 21% in mid-2006.





Presentation Transcript


  1. 2008 MIT Spam Conference • Spam Image Identification Using an Artificial Neural Network • Jason R. Bowling, Priscilla Hope and Kathy J. Liszka • The University of Akron

  2. We know it’s bad… • 2005: image spam was roughly 1% of all emails • mid-2006: it rose to 21% • J. Swartz, “Picture this: A sneakier kind of spam,” USA Today, Jul. 23, 2006.

  3. The University of Akron, December 2007 • 28,000,000 messages • 24,000,000 (about 86%) identified as spam and dropped

  4. Inspiration

  5. FANN • Fast Artificial Neural Network Library • open source • adaptive: learns by example (given good input) [Figure: network with input, hidden and output layers]

  6. Image Preparation • open source • converts from virtually any image format to any other • tradeoffs

  7. [Figure: input images → image2fann.cpp → training data] • 150 × 150 pixel, 8-bit grayscale JPEG images

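  8. Training data file • header line: number of images (input sets), number of input nodes, number of output nodes: 500 22500 1 • then, for each image, its scaled pixel values followed by its desired output: .128 .123 .156 .128 .156 .254 … → 1 (spam); .156 .128 .128 .123 .156 .254 … → -1 (ham)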
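The layout above matches FANN's plain-text training-data format: a header giving the number of input/output pairs, the inputs per pair and the outputs per pair, then alternating input and output lines. A minimal C sketch of the writing step performed by image2fann.cpp might look like this; the function name write_training_file and the three-decimal formatting are assumptions, and the pixel values are taken to be already scaled.

```c
#include <stdio.h>

/* Hypothetical helper (not the authors' code): writes n_images training
 * pairs in FANN's plain-text layout.
 *   line 1:        <number of pairs> <number of inputs> <number of outputs>
 *   for each pair: one line of n_inputs scaled pixel values,
 *                  one line with the desired output (1 = spam, -1 = ham).
 * inputs holds n_images rows of n_inputs values each.
 * Returns 0 on success, -1 on a write error. */
int write_training_file(FILE *out, const float *inputs, const int *labels,
                        int n_images, int n_inputs)
{
    if (fprintf(out, "%d %d 1\n", n_images, n_inputs) < 0)
        return -1;
    for (int i = 0; i < n_images; i++) {
        const float *row = inputs + (size_t)i * n_inputs;
        for (int j = 0; j < n_inputs; j++) {
            /* space-separated inputs, e.g. "0.128 0.123 0.156 ..." */
            if (fprintf(out, j ? " %.3f" : "%.3f", row[j]) < 0)
                return -1;
        }
        /* single output node: +1 for spam, -1 for ham */
        if (fprintf(out, "\n%d\n", labels[i] > 0 ? 1 : -1) < 0)
            return -1;
    }
    return 0;
}
```

For the corpus on the slide the call would be write_training_file(f, inputs, labels, 500, 22500), which produces the 500 22500 1 header.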

  9. [Figure: network with 22,500 input nodes, two layers of hidden nodes and 1 output node]
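The node count reported in the training log (slide 16, "22604 nodes in network") can be checked by hand. FANN documents that its total neuron count includes one bias neuron for every layer except the output layer, so two hidden layers of 50 nodes (one split consistent with slide 19; the slides do not state it) give 22,500 + 1 + 50 + 1 + 50 + 1 + 1 = 22,604. The C sketch below, with hypothetical function names, does the same arithmetic and also counts the weights of the fully connected network.

```c
/* Neuron count in the FANN convention (assumption stated above):
 * every layer except the output layer carries one extra bias neuron. */
long total_neurons(const int *layers, int n_layers)
{
    long total = 0;
    for (int i = 0; i < n_layers; i++)
        total += layers[i] + (i < n_layers - 1 ? 1 : 0);
    return total;
}

/* Fully connected feed-forward network: each neuron in layer i+1 has one
 * weight from every neuron in layer i plus one from layer i's bias. */
long total_connections(const int *layers, int n_layers)
{
    long total = 0;
    for (int i = 0; i + 1 < n_layers; i++)
        total += (long)(layers[i] + 1) * layers[i + 1];
    return total;
}
```

With layers {22500, 50, 50, 1} this gives 22,604 neurons and 22,501 × 50 + 51 × 50 + 51 × 1 = 1,127,651 weights. Nearly all of them sit between the input layer and the first hidden layer, which is why the input image size dominates training cost.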

  10. Training the Network • A fully connected backpropagation neural network. • Supervised learning paradigm.

  11. Activation Function • Takes the inputs to a node, multiplies each input by its connection weight, sums the results and maps that sum to the node's output value.

  12. Steepness [Figure: activation function curves for steepness 1.0, 0.5 and 0.0]

  13. Widrow and Nguyen’s algorithm • Sets the initial weights so that the nodes' active regions are spread evenly across the input space. • Used at initialization.

  14. Epoch • One pass through the whole training file, during which the weights are adjusted toward each image's desired output. [Figure: “I’m spam!” / “I’m ham!”]

  15. Learning Rate • Train until a desired error is reached. • Step down the learning rate at preset intervals to avoid oscillation.

  16. Training
      22604 nodes in network
      Max epochs 200. Desired error: 0.4
      Epochs 1. Current error: 0.2800000012. Bit fail 56.
      Learning rate is: 0.500000
      Max epochs 5000. Desired error: 0.2000000030.
      Epochs 1. Current error: 0.2800000012. Bit fail 56.
      Epochs 20. Current error: 0.2800000012. Bit fail 56.
      Epochs 40. Current error: 0.2251190692. Bit fail 56.
      Epochs 60. Current error: 0.2074941099. Bit fail 65.
      Epochs 71. Current error: 0.1479636133. Bit fail 48.
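The log reports two numbers per epoch. "Current error" appears to be a mean squared error over the training outputs (FANN exposes it as fann_get_MSE), and "Bit fail" counts outputs whose error is at or above a limit (fann_get_bit_fail, with the limit set by fann_set_bit_fail_limit). The C sketch below computes simplified versions of both. FANN's own bookkeeping may differ in detail, so this is not expected to reproduce the logged values; the function names are hypothetical.

```c
#include <math.h>

/* Mean squared error between network outputs and desired outputs. */
double mean_squared_error(const double *out, const double *target, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double d = target[i] - out[i];
        sum += d * d;
    }
    return n > 0 ? sum / n : 0.0;
}

/* Number of outputs whose absolute error is at or above `limit`. */
int bit_fail(const double *out, const double *target, int n, double limit)
{
    int fails = 0;
    for (int i = 0; i < n; i++)
        if (fabs(target[i] - out[i]) >= limit)
            fails++;
    return fails;
}
```

The two measures can move independently, as in the log at epoch 60, where the error falls while the bit-fail count rises: many small improvements lower the average even if a few more images cross the limit.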

  17. [Figure: input images → image2fann.cpp → training data → train.c → FANN → test.c → ham / spam]
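In the test step, the trained network's single output has to be turned into a ham or spam label. In test.c that output would come from FANN (for example fann_run on a network loaded with fann_create_from_file). A minimal C sketch of the decision rule follows, assuming the +1 spam / -1 ham targets from slide 8 and a threshold of 0.0. The threshold, the mail_class type and the function names are hypothetical.

```c
/* Hypothetical decision rule: outputs above the threshold are spam. */
typedef enum { HAM = 0, SPAM = 1 } mail_class;

mail_class classify(double output, double threshold)
{
    return output > threshold ? SPAM : HAM;
}

/* Fraction of test images whose predicted class matches the true class. */
double accuracy(const double *outputs, const mail_class *truth, int n,
                double threshold)
{
    int correct = 0;
    for (int i = 0; i < n; i++)
        if (classify(outputs[i], threshold) == truth[i])
            correct++;
    return n > 0 ? (double)correct / n : 0.0;
}
```

Raising the threshold above 0.0 would trade missed spam for fewer legitimate images flagged as spam. Since dropped ham is usually costlier than delivered spam, this is a knob worth exposing.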

  18. 572 trained images, 75 hidden nodes

  19. 572 trained images, 50 hidden nodes

  20. Corpus

  21. Training data • scaling to a number < 1 (divide by 1000) • 8-bit grayscale intensity 0–255, so inputs are limited to roughly 0–0.255
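The scaling step amounts to one division per pixel. A minimal C sketch, assuming the image has already been decoded into a 150 × 150 buffer of 8-bit grayscale values; scale_pixels, IMG_SIDE and N_INPUTS are hypothetical names.

```c
#include <stddef.h>

/* 150 x 150 grayscale image -> 22,500 network inputs */
#define IMG_SIDE 150
#define N_INPUTS (IMG_SIDE * IMG_SIDE)

/* Divide each 8-bit intensity (0-255) by 1000, as on the slide,
 * so every input lands in 0.000-0.255. */
void scale_pixels(const unsigned char *pixels, float *inputs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inputs[i] = pixels[i] / 1000.0f;
}
```

A pixel value of 128 becomes 0.128, matching the sample values in slide 8. Dividing by 255 instead would spread the inputs over the whole 0 to 1 range; the divide-by-1000 choice compresses them into roughly its first quarter.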

  22. Current Work • complete corpus • multipart images • separate ANNs • hidden nodes • color • image size

  23. Priscilla Hope

  24. Thank you!
