Filtering of Spam E-Mails Using Back-Propagation Neural Networks
Class: 資四A
Professor: 楊維忠
Reporter: 林文仁
Team Members: 江念庭, 林俊宇, 黃國峰
Outline
• Neural Network
• Back-propagation algorithm
• Flow chart of research
• Input & output
• System environment
• Flow chart of filtering e-mail
• Example
• Conclusion
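Neural Network
• A neural network consists of neurons joined by connections (called weights).
• The network maps an input to an output.
• The output is compared with the target.
• The weights are adjusted based on that comparison, and the cycle repeats.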
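The compare-and-adjust cycle on this slide can be sketched with a single sigmoid neuron. This is a minimal illustration, not the network used in the project: the learning rate, epoch count, and the logical-OR toy data are all assumptions chosen only to make the loop visible.

```python
import math


def sigmoid(x):
    """Logistic transfer function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=2000, rate=0.5):
    """Train one neuron; epochs and rate are illustrative assumptions."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            error = target - output                    # compare output with target
            delta = error * output * (1.0 - output)    # scale by the sigmoid's slope
            weights = [w + rate * delta * x            # adjust weights
                       for w, x in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias


# Toy data (assumption): logical OR, not spam features.
OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(OR_SAMPLES)
```

After training, `predict(weights, bias, [0, 0])` falls below 0.5 and the other three inputs land above it.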
Back-propagation algorithm: the multilayer feedforward network
• Forward pass: input layer → hidden layer → output layer → result
• Each neuron j sums its weighted inputs (Σ), adds a bias supplied through a constant input of 1, and passes the sum through a transfer function
• wi: weight of input i
• b: bias
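The forward pass through the three layers might look like the sketch below. The slides do not name the transfer function or the layer sizes, so the sigmoid and the tiny 2-2-1 shape in the usage lines are assumptions; `layer_forward` and `forward` are hypothetical helper names.

```python
import math


def sigmoid(x):
    """Assumed transfer function; the slides do not specify which one is used."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Compute one layer: weights[j][i] connects input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, out_w, out_b)


# Usage with an illustrative 2-2-1 network whose weights are all zero:
# every neuron then sees a sum of 0 and outputs sigmoid(0) = 0.5.
result = forward(
    [1, 0],
    hidden_w=[[0.0, 0.0], [0.0, 0.0]], hidden_b=[0.0, 0.0],
    out_w=[[0.0, 0.0]], out_b=[0.0],
)
```

Back-propagation then runs in the opposite direction, pushing the output error back through these same weights to decide how each one is adjusted.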
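Flow chart of research
• Review the literature
• Analyze mail & maillog and define spam behavior
• Train the neural network on samples
• Test the network: if it is not suitable, retrain it; if it is suitable, end training
• Integrate with the mail server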
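Input & output
• Input
  • There are 28 rules in total; the most commonly triggered ones are listed below.
  • Rule 6: if header-To (the recipient) == header-Reply-To (the address that receives replies), input item 6 is set to 1
  • Rule 17: if header-From (the sender) != maillog-from (the sender recorded in the log file), input item 17 is set to 1
  • Rule 25: if header-Date (the sending time) differs too much from the system time, input item 25 is set to 1
• Output
  • Output value between 0.0 and 1.0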
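The three rules listed above could be turned into a 28-element input vector roughly as follows. This is a sketch under stated assumptions: the one-day threshold for "differs too much", the choice to leave rule 25 at 0 when the Date header cannot be parsed, and the names `build_input` and `MAX_DATE_SKEW` are all hypothetical. The other 25 rules are not described in the slides and stay at 0 here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parseaddr, parsedate_to_datetime

NUM_RULES = 28
MAX_DATE_SKEW = timedelta(days=1)  # assumption: the slides give no threshold


def address(value):
    """Extract the bare, lower-cased address from a header value."""
    return parseaddr(value or "")[1].lower()


def build_input(headers, maillog_from, now):
    """Return the 0/1 input vector; `now` must be a timezone-aware datetime."""
    x = [0] * NUM_RULES

    # Rule 6: To equals Reply-To.
    to_addr = address(headers.get("To"))
    if to_addr and to_addr == address(headers.get("Reply-To")):
        x[5] = 1

    # Rule 17: From header differs from the sender recorded in the maillog.
    if address(headers.get("From")) != maillog_from.lower():
        x[16] = 1

    # Rule 25: Date header is far from the system time.
    try:
        sent = parsedate_to_datetime(headers.get("Date", ""))
    except (TypeError, ValueError):
        sent = None  # assumption: an unparseable date does not trigger the rule
    if sent is not None:
        if sent.tzinfo is None:
            sent = sent.replace(tzinfo=timezone.utc)
        if abs(now - sent) > MAX_DATE_SKEW:
            x[24] = 1
    return x


# Usage with made-up addresses and dates.
example_headers = {
    "From": '"s" <user@example.com>',
    "To": "user@example.com",
    "Reply-To": "user@example.com",
    "Date": "Mon, 01 Jan 2007 10:00:00 +0800",
}
example_now = datetime(2007, 1, 10, tzinfo=timezone.utc)
example_input = build_input(example_headers, "user@example.com", example_now)
```

In this usage example rules 6 and 25 fire, and rule 17 does not because the From header matches the maillog sender.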
System environment
• OS
  • Red Hat Enterprise Linux AS 4
• Mail server
  • Sendmail 8.13.1
• Client using browser
  • OpenWebMail 2.52
  • Provides a web GUI for checking mail
• Software tools
  • Matlab 7
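Flow chart of filtering e-mail
• Sendmail server: receives the message and records it in the maillog
• Milter (Mail Filter): reads the header and the maillog; get_value extracts the values
• Matlab BPN (neural network): evaluates the values
• Milter: adds or changes headers
• User's mailbox: receives the message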
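The "add, change headers" step can be illustrated with Python's standard `email` package. This is only a sketch of the idea, not the project's milter: the header names `X-BPN-Score` and `X-BPN-Spam`, the 0.5 threshold, and the `score_fn` callback standing in for the Matlab BPN are all hypothetical.

```python
import email

SPAM_THRESHOLD = 0.5  # assumption: the slides do not state a cut-off


def tag_message(raw_message, score_fn, threshold=SPAM_THRESHOLD):
    """Score a raw message and record the result in two hypothetical headers."""
    msg = email.message_from_string(raw_message)
    score = score_fn(msg)  # stand-in for the Matlab BPN call

    # Delete first so an existing header is changed rather than duplicated.
    del msg["X-BPN-Score"]
    del msg["X-BPN-Spam"]
    msg["X-BPN-Score"] = f"{score:.2f}"
    msg["X-BPN-Spam"] = "YES" if score >= threshold else "NO"
    return msg.as_string()


# Usage with a made-up message and a fixed score.
raw = (
    "From: user@example.com\n"
    "To: user@example.com\n"
    "Reply-To: user@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
tagged = email.message_from_string(tag_message(raw, lambda msg: 0.9))
```

Writing the verdict into headers leaves the body untouched, so a mail client can display or sort on the result without any change to delivery.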
Example-1
Send a spam e-mail via telnet:
  ehlo localhost
  Mail from: s13943013@mail.nuu.idv.tw
  RCPT TO: s13943013@mail.nuu.idv.tw
  Data
  From: "s" s13943013@mail.nuu.idv.tw
  To: s13943013@mail.nuu.idv.tw
  Reply-To: s13943013@mail.nuu.idv.tw
  Subject: 中文信
  Date: +0800
  ….
  Quit
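Example
The message is received and has been detected as SPAM.
Content of headers
The recipient (To) and the reply address (Reply-To) are the same e-mail address; normally they should differ.
Example-2
Contents of the maillog on the server.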
Conclusion
• Identification rate ≈ 80%.
• The rules were defined subjectively.
• It would be better to combine this approach with content-based filtering, e.g. SpamAssassin.