
Spam


nami

Presentation Transcript


  1. Spam 軟體工廠 (Software Factory)

  2. SMTP protocol • Why do we need to know SMTP? • SMTP: Simple Mail Transfer Protocol (RFC 821) • An SMTP sample

  3. SMTP sample
     2004/11/5 PM 02:50:55 <Send> 220 Want Email No Spam !.
     2004/11/5 PM 02:50:55 HELO sohu.com
     2004/11/5 PM 02:50:55 <Send> 250 OK
     2004/11/5 PM 02:50:56 MAIL FROM:<hggme.hggme@sohu.com>
     2004/11/5 PM 02:50:56 <Send> 250 OK
     2004/11/5 PM 02:50:56 RCPT TO:<spam@microbean.com.tw>
     2004/11/5 PM 02:50:56 <Send> 250 OK
     2004/11/5 PM 02:50:57 DATA
     2004/11/5 PM 02:50:57 <Send> 354 OK DATA

  4. Mail Header
     From: =?Big5?B?TGludXixTbd+u3vD0qSkpN8=?= <hggme.hggme@sohu.com>
     Subject: =?Big5?B?s8ymaKX4t36/76XOpEikfqFBsU23fqfes06orbv5sKqhSQ==?=
     To: "spam@microbean.com.tw" <spam@microbean.com.tw>
     Content-Type: multipart/alternative; boundary="=_NextPart_2rfkindysadvnqw3nerasdf"
     MIME-Version: 1.0
     Date: Fri, 5 Nov 2004 14:48:38 +0800
     MIME-Version: 1.0
     X-Priority: 3
     X-MSMail-Priority: Normal
     .......................Data
     .
     2004/11/5 PM 02:50:58 Mail Save OK!
     2004/11/5 PM 02:50:58 <Send> 250 OK
     2004/11/5 PM 02:50:58 QUIT
     2004/11/5 PM 02:50:58 <Send> 221 Bye
     2004/11/5 PM 02:50:58 WM_Disconnect:584
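The command and reply sequence in the sample above can be replayed with a small toy responder. This is only a sketch for illustration: `REPLIES`, `reply_for`, and `run_session` are hypothetical names, the reply strings are copied from the log, and the fallback `500` text is an assumption. It is not a real SMTP server.

```python
# Toy replay of the SMTP dialogue shown in the log (not a real server).
REPLIES = {
    "HELO": "250 OK",
    "MAIL": "250 OK",
    "RCPT": "250 OK",
    "DATA": "354 OK DATA",
    "QUIT": "221 Bye",
}


def reply_for(command_line: str) -> str:
    """Return the reply the toy server sends for one client command line."""
    verb = command_line.strip().split(" ", 1)[0].split(":", 1)[0].upper()
    # Assumption: anything else gets a generic 500 reply.
    return REPLIES.get(verb, "500 Unrecognized command")


def run_session(client_lines):
    """Replay client lines; return (client_line, server_reply) pairs."""
    transcript = [("", "220 Want Email No Spam !.")]  # greeting from the log
    in_data = False
    for line in client_lines:
        if in_data:
            if line == ".":  # a lone dot ends the message body
                in_data = False
                transcript.append((line, "250 OK"))
            continue  # header and body lines get no reply
        reply = reply_for(line)
        transcript.append((line, reply))
        if reply.startswith("354"):
            in_data = True
    return transcript


session = run_session([
    "HELO sohu.com",
    "MAIL FROM:<hggme.hggme@sohu.com>",
    "RCPT TO:<spam@microbean.com.tw>",
    "DATA",
    "Subject: example",
    "",
    "body text",
    ".",
    "QUIT",
])
for sent, reply in session:
    print(f"{sent!r:45} -> {reply}")
```

Note that nothing in this exchange checks whether the HELO name or the MAIL FROM address is genuine; the server simply answers `250 OK`.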

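  5. What’s wrong with SMTP? • It is a free service (there is no notion of the sender paying) • The sender cannot be identified • HELO is not actually checked or verified • MAIL FROM can be forged and cannot be verified • Every mail header can be forged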
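To make the last two points concrete: the envelope address given in MAIL FROM and the `From:` header inside the message are independent strings, and SMTP itself never compares them. The sketch below, using Python's standard `email` package, only detects a mismatch. `header_matches_envelope` is a hypothetical helper and the example addresses are assumptions. A mismatch is a weak signal at best, since legitimate mailing lists also produce one.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def header_matches_envelope(envelope_from: str, msg: EmailMessage) -> bool:
    """True when the From: header address equals the envelope MAIL FROM address."""
    _, header_addr = parseaddr(str(msg.get("From", "")))
    return header_addr.lower() == envelope_from.lower()


msg = EmailMessage()
# The header is free text chosen by whoever composes the message.
msg["From"] = "Billing Department <billing@example.com>"
msg["To"] = "spam@microbean.com.tw"
msg["Subject"] = "Hello"
msg.set_content("body")

envelope_from = "hggme.hggme@sohu.com"  # what the client sent in MAIL FROM
mismatch = not header_matches_envelope(envelope_from, msg)
print("header/envelope mismatch:", mismatch)
```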

  6. What’s the solution? • Active solutions: SenderID (Microsoft); SMTP replaced by a new communication service • Passive solutions: SMTP server tricks (pchome); spam filters

  7. Filter methods • Sender IP (block / allow) • Sender (block / allow) • Subject (block / allow) • MailTo verify • Content keyword filter
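A minimal sketch of how these list-based checks might be chained. `ListFilter` and its field names are hypothetical. The order of the checks, and the rule that an allow-list entry wins over every block rule, are assumptions rather than something the slides specify. A subject allow list is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ListFilter:
    allow_ips: set = field(default_factory=set)
    block_ips: set = field(default_factory=set)
    allow_senders: set = field(default_factory=set)
    block_senders: set = field(default_factory=set)
    block_subject_words: set = field(default_factory=set)  # lowercase words
    valid_recipients: set = field(default_factory=set)     # lowercase addresses
    content_keywords: set = field(default_factory=set)     # lowercase keywords

    def check(self, ip, sender, recipient, subject, body) -> str:
        """Return 'accept' or 'reject: <reason>' for one incoming mail."""
        sender = sender.lower()
        # Assumption: allow lists take priority over all block rules.
        if ip in self.allow_ips or sender in self.allow_senders:
            return "accept"
        if ip in self.block_ips:
            return "reject: sender IP"
        if sender in self.block_senders:
            return "reject: sender"
        # MailTo verify: the recipient must be a known mailbox.
        if recipient.lower() not in self.valid_recipients:
            return "reject: unknown recipient"
        if any(word in subject.lower() for word in self.block_subject_words):
            return "reject: subject"
        if any(key in body.lower() for key in self.content_keywords):
            return "reject: content keyword"
        return "accept"


flt = ListFilter(
    block_ips={"192.0.2.10"},
    allow_senders={"friend@example.com"},
    valid_recipients={"spam@microbean.com.tw"},
    block_subject_words={"mortgage"},
    content_keywords={"便宜"},
)
print(flt.check("192.0.2.10", "x@example.org", "spam@microbean.com.tw", "hi", "hello"))
```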

  8. Content keyword filter • Simple • Bayesian • Text Classification (fuzzy-logic filters)

  9. Bayesian Filter [Diagram: OK Mail DB, Spam Mail DB, Token DB] If the word "mortgage" occurs in 400 of 3,000 spam mails and in 5 out of 300 legitimate emails, for example, then its spam probability would be 0.8889 (that is, [400/3000] divided by [5/300 + 400/3000]).
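Written out step by step, the arithmetic for the "mortgage" example is as follows (the symbol $p_{\text{spam}}$ is introduced here only for illustration):

```latex
p_{\text{spam}}(\text{mortgage})
  = \frac{400/3000}{5/300 + 400/3000}
  = \frac{0.1333}{0.0167 + 0.1333}
  = \frac{0.1333}{0.1500}
  \approx 0.8889
```

Each count is divided by the size of its own mail database first, so the fact that the spam database (3,000 mails) is ten times larger than the legitimate one (300 mails) does not by itself push the result toward spam.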

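  10. Bayesian Filter
      $$\text{Spam Rate} = \frac{\dfrac{\text{spam rate of this mail}}{\text{current total number of spam mails}}}{\dfrac{\text{OK rate of this mail}}{\text{current total number of OK mails}} + \dfrac{\text{spam rate of this mail}}{\text{current total number of spam mails}}}$$
      If Spam Rate > threshold, the mail is judged to be advertising mail (spam), where 1.0 > threshold > 0.5.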
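The rate and the threshold decision can be sketched in a few lines. The function names, the default threshold of 0.9, and the neutral value 0.5 returned when nothing has been seen in either database are assumptions made for this example. The counts reuse the "mortgage" figures from the previous slide.

```python
def spam_rate(spam_hits: int, total_spam: int, ok_hits: int, total_ok: int) -> float:
    """Spam frequency divided by (OK frequency + spam frequency)."""
    spam_freq = spam_hits / total_spam
    ok_freq = ok_hits / total_ok
    if spam_freq + ok_freq == 0:
        return 0.5  # assumption: no evidence either way
    return spam_freq / (ok_freq + spam_freq)


def is_spam(rate: float, threshold: float = 0.9) -> bool:
    """Judge a mail as spam when its rate exceeds the threshold."""
    if not 0.5 < threshold < 1.0:
        raise ValueError("threshold must satisfy 0.5 < threshold < 1.0")
    return rate > threshold


rate = spam_rate(spam_hits=400, total_spam=3000, ok_hits=5, total_ok=300)
print(round(rate, 4), is_spam(rate, threshold=0.8), is_spam(rate, threshold=0.9))
```

With a threshold of 0.8 this rate is judged as spam, while with 0.9 it is not, which is why the choice of threshold matters.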

  11. Bayesian Filter Steps • Token design • Rating • Judging (by the system and by a human) • Adjustment (by a human)
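The four steps could fit together roughly as follows. This is a sketch under stated assumptions, not the presenter's implementation. `TinyBayesFilter` is a hypothetical class, tokens are simply lowercase whitespace-separated words, and a mail's rating is the plain average of its per-token rates. The slides do not say how token rates are combined.

```python
from collections import Counter


class TinyBayesFilter:
    def __init__(self, threshold: float = 0.9):
        self.spam_tokens = Counter()  # token -> number of spam mails containing it
        self.ok_tokens = Counter()    # token -> number of OK mails containing it
        self.total_spam = 0
        self.total_ok = 0
        self.threshold = threshold

    @staticmethod
    def tokenize(text: str) -> set:
        """Token design (assumption): lowercase words, counted once per mail."""
        return set(text.lower().split())

    def rate(self, text: str) -> float:
        """Rating: average per-token spam rate (averaging is an assumption)."""
        if not self.total_spam or not self.total_ok:
            return 0.5  # nothing learned yet
        rates = []
        for token in self.tokenize(text):
            spam_freq = self.spam_tokens[token] / self.total_spam
            ok_freq = self.ok_tokens[token] / self.total_ok
            if spam_freq + ok_freq > 0:  # skip tokens never seen before
                rates.append(spam_freq / (ok_freq + spam_freq))
        return sum(rates) / len(rates) if rates else 0.5

    def judge(self, text: str) -> bool:
        """Judging (system side): spam when the rating exceeds the threshold."""
        return self.rate(text) > self.threshold

    def adjust(self, text: str, is_spam: bool) -> None:
        """Adjustment: a human verdict adds the mail to the spam or OK database."""
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.total_spam += 1
        else:
            self.ok_tokens.update(tokens)
            self.total_ok += 1


flt = TinyBayesFilter(threshold=0.8)
flt.adjust("cheap mortgage offer", is_spam=True)
flt.adjust("cheap pills offer", is_spam=True)
flt.adjust("meeting agenda for monday", is_spam=False)
flt.adjust("lunch offer on monday", is_spam=False)
print(flt.judge("cheap mortgage"), flt.judge("monday meeting"))
```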

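  12. Token design • How should tokens be chosen? • They must be discriminative • Discriminative tokens also bring misjudgments • Choose them according to the mail characteristics of the environment • e.g. $, 元 (yuan), 便宜 (cheap) … • The effect of strict versus loose settings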
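A small sketch of the trade-off, using the example tokens from the slide. `extract_tokens`, `STRICT_TOKENS`, and `LOOSE_TOKENS` are hypothetical names, and simple substring matching is an assumption made to keep the example short.

```python
def extract_tokens(text: str, token_list: list) -> list:
    """Return the configured tokens that occur in text, in token_list order."""
    lowered = text.lower()
    return [token for token in token_list if token.lower() in lowered]


STRICT_TOKENS = ["$", "元", "便宜"]  # strict: more tokens, catches more mail
LOOSE_TOKENS = ["便宜"]              # loose: fewer tokens, fewer false hits

advert = "全館特價 $99,超便宜"
invoice = "Invoice total: $120"

print(extract_tokens(advert, STRICT_TOKENS))
print(extract_tokens(invoice, STRICT_TOKENS))
print(extract_tokens(invoice, LOOSE_TOKENS))
```

The strict list also matches the ordinary invoice because of the `$` token, which is the kind of misjudgment that comes with a broadly matching token. The loose list avoids that hit but would also miss an advertisement that only mentions a price.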

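  13. Challenges for spam filters • Spam behavior keeps changing and growing • The more aggressive the filtering, the more legitimate mail is misjudged • Human misjudgment (e.g. forgetting that one subscribed to the newsletter) • Spam sent through trusted third-party content • Obfuscation tactics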

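  14. Conclusion • Spam filters need their logic updated continually • How strict the system's judgment is should be decided by the user • The feedback mechanism for misjudgments needs to be well developed • www.sotfworking.com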
