
E-Mail Q&A


  1. E-Mail Q&A
     Telecooperation Group, TU Darmstadt

  2. Interoperability
     • No need to implement everything from RFCs 2045-2047
        • Way too much work
        • A fully correct implementation would be more standards-compliant than most common e-mail clients
     • Your implementation should have this functionality:
        • 7bit encoding
        • Quoted-printable and Base64 encoding with all charsets Java can handle (i.e. every charsetName that does not throw an UnsupportedEncodingException)
        • Multipart messages are recognized and decoded correctly
        • Robustness: do not choke on unrecognized headers
     • Programs will be tested with public test cases plus secret ones
        • The secret test cases, too, only use the functionality mentioned above
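The charset criterion above can be checked directly in code. A minimal sketch, assuming a hypothetical helper named canHandle: a charset name counts as usable exactly when the String(byte[], String) constructor does not throw an UnsupportedEncodingException.

```java
import java.io.UnsupportedEncodingException;

class CharsetSupportSketch {
    // Hypothetical helper: true if Java can decode bytes with the given charset name.
    // Mirrors the criterion "does not throw an UnsupportedEncodingException".
    static boolean canHandle(String charsetName) {
        try {
            new String(new byte[] {'A'}, charsetName); // result is discarded; only the lookup matters
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canHandle("UTF-8"));             // true
        System.out.println(canHandle("iso-8859-1"));        // true: charset names are case-insensitive
        System.out.println(canHandle("x-no-such-charset")); // false
    }
}
```

Since Java looks up charset names case-insensitively, a header value like charset=iso-8859-1 works just as well as ISO-8859-1.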

  3. Headers
     • Multiline headers
        • Line continuations start with "folding whitespace", which may be a space or a tab (\t)
     • Ignore every header you do not know
        • If you want, you can also display additional headers like BCC, but only those mentioned in milestone 3.1 are required
     • Case sensitivity
        • Header names are always case-insensitive
        • Cf. RFC 2822, section 1.2.2: "Characters will be specified […] by a case-insensitive literal value enclosed in quotation marks"
        • Header values used in the assignment are usually case-insensitive, e.g. for Content-Transfer-Encoding, both Base64 and base64 are possible
        • Exceptions (case-sensitive): the multipart boundary, and all header values displayed to the user
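Unfolding multiline headers and looking up names case-insensitively could be combined as in the following sketch. The class and method names are hypothetical, and the input is assumed to be the message already split into lines. A TreeMap with String.CASE_INSENSITIVE_ORDER takes care of header-name case, and unknown headers are simply stored and never read.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HeaderParserSketch {
    // Parses header lines (up to the first empty line) into a map whose keys are
    // compared case-insensitively, so get("content-type") finds "Content-Type".
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String name = null;
        StringBuilder value = new StringBuilder();
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // an empty line separates the headers from the body
            }
            if (line.startsWith(" ") || line.startsWith("\t")) {
                // Folding whitespace: this line continues the previous header.
                // Unfolding keeps the whitespace and only drops the line break.
                if (name != null) {
                    value.append(line);
                }
                continue;
            }
            store(headers, name, value);
            int colon = line.indexOf(':');
            // Robustness: a line without a colon is skipped instead of aborting
            name = colon < 0 ? null : line.substring(0, colon).trim();
            value = new StringBuilder(colon < 0 ? "" : line.substring(colon + 1));
        }
        store(headers, name, value);
        return headers;
    }

    private static void store(Map<String, String> headers, String name, StringBuilder value) {
        if (name != null) {
            headers.put(name, value.toString().trim()); // a repeated header keeps its last value
        }
    }

    public static void main(String[] args) {
        Map<String, String> h = parse(Arrays.asList(
                "Subject: Meeting", "\ttomorrow", "content-TYPE: text/plain", "X-Unknown: ignored", "", "body"));
        System.out.println(h.get("Subject"));      // the folded value, tab included
        System.out.println(h.get("Content-Type")); // text/plain
    }
}
```

Headers that legitimately occur several times (such as Received) would need a list per name; this sketch keeps only the last occurrence.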

  4. Date
     • Look into the documentation of SimpleDateFormat
        • No need to parse each item yourself; it even recognizes "GMT" and "UTC" as time zones
        • Construct the parser with Locale.US so that it can parse English month names like "May"
     • Output via DateFormat.getDateTimeInstance()
     • Time zone
        • Setting it via SimpleDateFormat or Calendar#setTimeZone is preferred to manual time manipulation
        • Reason: the DateFormat may be configured to display the time zone
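A minimal sketch of parsing and displaying a Date header along these lines. The pattern string is an assumption that covers dates like "Tue, 1 Jul 2003 10:52:37 +0200" (the pattern letter Z also accepts "GMT" when parsing); dates without the leading day of week are not handled. The class and method names are hypothetical.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class MailDateSketch {
    // Assumed pattern for dates like "Tue, 1 Jul 2003 10:52:37 +0200"
    private static final String PATTERN = "EEE, d MMM yyyy HH:mm:ss Z";

    static Date parse(String headerValue) {
        // Locale.US so that English names like "Tue" and "Jul" are understood
        SimpleDateFormat parser = new SimpleDateFormat(PATTERN, Locale.US);
        try {
            return parser.parse(headerValue.trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + headerValue, e);
        }
    }

    static String display(Date date, TimeZone zone) {
        // Locale-dependent output; the formatter converts to the given zone,
        // so no manual adding or subtracting of hours is needed
        DateFormat out = DateFormat.getDateTimeInstance();
        out.setTimeZone(zone);
        return out.format(date);
    }

    public static void main(String[] args) {
        Date d = parse("Tue, 1 Jul 2003 10:52:37 +0200");
        System.out.println(display(d, TimeZone.getDefault()));
    }
}
```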

  5. Attachments
     • Base64-encoded lines are always 76 characters wide; the only exception is the last line
        • If the number of characters is not a multiple of 4 (numberOfChars % 4 != 0), you may just throw an exception and terminate
     • Do not use javax.mail.internet.MimeUtility or similar additional libraries for decoding
     • Use the Content-Disposition header to suggest a name for saving
     • Attachments that are not of type text/… do not have and do not need a charset
        • Just treat them as a stream of bytes or a byte array
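Suggesting a file name from the Content-Disposition header could look like the following sketch. The class and method names are hypothetical, and the regular expression assumes only the simple forms filename=name and filename="name"; RFC 2231 parameters such as filename*= are not handled. The decoded bytes of a non-text attachment would then be written to that file as they are, without any charset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttachmentNameSketch {
    // filename=report.pdf or filename="my report.pdf", case-insensitive
    private static final Pattern FILENAME =
            Pattern.compile("filename=(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    static String suggestName(String contentDisposition, String fallback) {
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) {
            return fallback; // no filename parameter: the caller chooses a default
        }
        // group(1) holds a quoted name, group(2) an unquoted one
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(suggestName("attachment; filename=\"my report.pdf\"", "attachment.bin"));
        System.out.println(suggestName("inline", "attachment.bin"));
    }
}
```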

  6. Base64 Example
     • Take a group of 4 characters: S W 4 g
     • Decode each character according to the RFC
        • S = 0x12; W = 0x16; 4 = 0x38; g = 0x20
        • Decoding may be done in ranges: A-Z → char - 'A'; a-z → char - 'a' + 26; 0-9 → char - '0' + 26*2; +, / and = must be treated separately
     • Combine into a 24-bit number, shifting each value according to its index (big endian)
        • 0x12 << 18 | 0x16 << 12 | 0x38 << 6 | 0x20 << 0 → 0x496e20
     • Split the number back into 8-bit blocks (also big endian)
        • Byte 0 = 0x496e20 >> 16 & 0xff = 0x49
        • Byte 1 = 0x496e20 >> 8 & 0xff = 0x6e
        • Byte 2 = 0x496e20 >> 0 & 0xff = 0x20
        • These three bytes are the ASCII characters "In " (ending in a space)
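The steps above, extended from one group to a whole string, can be sketched as follows. This is an illustration under simple assumptions (whitespace such as line breaks is dropped, and '=' appears only as trailing padding), not the reference solution; like the attachments slide suggests, it throws an exception when the length is not a multiple of 4.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class Base64Sketch {
    // Maps one Base64 character to its 6-bit value, using the ranges from the slide
    static int sextet(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 26 * 2;
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw new IllegalArgumentException("Not a Base64 character: " + c);
    }

    static byte[] decode(String encoded) {
        String s = encoded.replaceAll("\\s", ""); // drop the line breaks between encoded lines
        if (s.length() % 4 != 0) {
            throw new IllegalArgumentException("Length is not a multiple of 4");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i += 4) {
            int bits = 0;
            int padding = 0;
            for (int j = 0; j < 4; j++) {
                char c = s.charAt(i + j);
                if (c == '=') padding++;
                int value = (c == '=') ? 0 : sextet(c);
                bits |= value << (18 - 6 * j); // big endian: first character = highest 6 bits
            }
            // Emit the 24-bit number as up to 3 bytes, highest byte first;
            // each '=' means one byte fewer
            for (int k = 0; k < 3 - padding; k++) {
                out.write(bits >> (16 - 8 * k) & 0xff);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(new String(decode("SW4g"), StandardCharsets.US_ASCII)); // prints "In " (trailing space)
    }
}
```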

  7. Decoding
     • Your own input stream
        • An elegant way of decoding Base64 and quoted-printable data (you can do it differently; this is only a suggestion)
     • Extend java.io.InputStream
        • Take a character array of undecoded data as a parameter
        • Override read()
        • Decode the character data when …
        • Return -1 when the end of the data is reached
     • Let the InputStreamReader deal with the nasty problem of decoding charsets
     • The sample application needs only 50 LoC for decoding quoted-printable and 100 LoC for Base64
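Following the suggestion above, a quoted-printable stream might look like this sketch. The helper decodeToString is hypothetical, and the code makes no claim to match the sample application: it decodes lazily, one byte per read() call, handles =XX escapes and soft line breaks (a "=" at the end of a line), and leaves the charset to InputStreamReader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class QuotedPrintableInputStream extends InputStream {
    private final char[] data; // undecoded quoted-printable text
    private int pos = 0;

    QuotedPrintableInputStream(char[] data) {
        this.data = data;
    }

    @Override
    public int read() throws IOException {
        while (pos < data.length) {
            char c = data[pos];
            if (c != '=') {
                pos++;
                return c & 0xff; // literal character, passed through as one byte
            }
            // Soft line break: "=" followed by LF or CRLF is removed
            if (pos + 1 < data.length && data[pos + 1] == '\n') {
                pos += 2;
                continue;
            }
            if (pos + 2 < data.length && data[pos + 1] == '\r' && data[pos + 2] == '\n') {
                pos += 3;
                continue;
            }
            // Encoded byte: "=" followed by two hex digits, e.g. "=C3"
            if (pos + 2 < data.length) {
                int hi = Character.digit(data[pos + 1], 16);
                int lo = Character.digit(data[pos + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    pos += 3;
                    return (hi << 4) | lo;
                }
            }
            throw new IOException("Malformed quoted-printable data at index " + pos);
        }
        return -1; // end of data reached
    }

    // Hypothetical helper: the InputStreamReader turns the decoded bytes into characters
    static String decodeToString(String qp, String charsetName) {
        try (Reader reader = new InputStreamReader(
                new QuotedPrintableInputStream(qp.toCharArray()), Charset.forName(charsetName))) {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, decodeToString("Gr=C3=BC=C3=9Fe", "UTF-8") yields the German word "Grüße"; a Base64 stream could follow the same structure with a different read().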

  8. Regular Expressions
     • Regular expressions are a convenient way of filtering out substrings
        • A bit like file name patterns (*, ?), but more powerful
     • Letters and numbers match themselves
     • Punctuation characters usually have a special meaning; to match them literally, escape them with a \
        • To match the character [, use \[
        • Attention: you need to escape the backslash in Java strings, so \[ is written "\\[" in Java source
     • Alternatives: use []
        • [abc] matches a or b or c
        • [A-Z] matches A or B or … or Z
        • Negation: [^abc] matches any single character except a, b or c
     • Wildcard: . matches any single character (by default, except line terminators)
     • Repetition
        • * means "the previous element zero or more times"
        • + means "the previous element one or more times"
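The constructs above can be tried out with Pattern.matches, which, unlike the find() method used with a Matcher, only succeeds if the entire input matches the expression. The inputs below are made up for illustration.

```java
import java.util.regex.Pattern;

class RegexBasicsSketch {
    public static void main(String[] args) {
        // Pattern.matches(regex, input) is true only if the WHOLE input matches
        System.out.println(Pattern.matches("[abc]", "b"));  // true: character class
        System.out.println(Pattern.matches("[^abc]", "a")); // false: negated class
        System.out.println(Pattern.matches("[A-Z]", "Q"));  // true: range
        System.out.println(Pattern.matches("\\[", "["));    // true: escaped bracket, written "\\[" in Java
        System.out.println(Pattern.matches("a.c", "a-c"));  // true: wildcard
        System.out.println(Pattern.matches("ab*", "a"));    // true: b zero or more times
        System.out.println(Pattern.matches("ab+", "a"));    // false: b one or more times
    }
}
```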

  9. Regular Expressions with Java
     • Part of java.util.regex
     • First, compile the pattern to search for:
        • Pattern p = Pattern.compile("charset=[^ ]*");
        • The compile method has a variant that takes flags; pass Pattern.CASE_INSENSITIVE for case-insensitive matching
     • Next, create a Matcher for a String from it:
        • Matcher m = p.matcher("Content-Type: text/plain; charset=\"us-ascii\"");
     • Be sure to call the Matcher's find method:
        • m.find()
     • m.group(0) now contains the entire match:
        • charset="us-ascii"
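Put together, the steps above might look like this sketch; the class and method names are hypothetical. It compiles the pattern once with the CASE_INSENSITIVE flag and returns null when no charset parameter is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetFinderSketch {
    // Case-insensitive, so "Charset=" or "CHARSET=" is found as well
    private static final Pattern CHARSET = Pattern.compile("charset=[^ ]*", Pattern.CASE_INSENSITIVE);

    static String findWholeMatch(String headerValue) {
        Matcher m = CHARSET.matcher(headerValue);
        if (m.find()) {        // find() must be called before group()
            return m.group(0); // the entire matched text
        }
        return null;           // no charset parameter present
    }

    public static void main(String[] args) {
        System.out.println(findWholeMatch("Content-Type: text/plain; charset=\"us-ascii\""));
        // prints charset="us-ascii"
    }
}
```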

  10. Grouping
     • You need the value after "charset="
        • Solution 1: parse it yourself
        • Solution 2: add groups to the expression
     • Groups are marked by () and counted from 1
        • Pattern p = Pattern.compile("charset=([^ ]*)");
        • After matching, group(1) contains "\"us-ascii\"", i.e. the value including its quotation marks
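Extracting the value with group(1) could be sketched as below. The class and method names are hypothetical, and stripping the surrounding quotation marks afterwards is an extra step added here for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetGroupSketch {
    private static final Pattern CHARSET = Pattern.compile("charset=([^ ]*)", Pattern.CASE_INSENSITIVE);

    static String charsetValue(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return null;
        }
        String value = m.group(1); // e.g. "\"us-ascii\"", quotation marks included
        // Extra step for illustration: strip surrounding quotation marks if present
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(charsetValue("Content-Type: text/plain; charset=\"us-ascii\"")); // us-ascii
    }
}
```

Because [^ ]* stops only at a space, a header such as charset=utf-8; format=flowed would yield "utf-8;" here; excluding ';' from the character class is one way to avoid that.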

  11. Debugging
     • Mail clients should be able to connect to the server and fetch the mail
     • Always helpful: connect to the POP server via telnet and issue POP commands manually
     • For closer examination, you may unzip the JAR file and have a look at "mailbox.xml"
