E-Mail Q&A Telecooperation Group TU Darmstadt
Interoperability • No need to implement everything from RFCs 2045-2047 • Way too much work • A fully correct implementation would be more standards-compliant than most common e-mail clients • Your implementation should have this functionality • 7bit encoding • Quoted-printable & Base64 encoding with all charsets Java can handle (i.e. every charsetName that does not throw an UnsupportedEncodingException) • Multipart messages are recognized and decoded correctly • Robustness: Do not choke on unrecognized headers • Programs will be tested with public test cases + secret ones • Secret test cases, too, only use the functionality mentioned above
Headers • Multiline headers • Line continuations start with “folding whitespace”, which may be a space or a tab (\t) • Ignore every header you do not know • If you want, you can also display additional headers like BCC, but only those mentioned in milestone 3.1 are required • Case sensitivity • Header names are always case-insensitive • cf. RFC 2822, section 1.2.2: “Characters will be specified […] by a case-insensitive literal value enclosed in quotation marks” • Header values used in the assignment are usually case-insensitive, e.g. for Content-Transfer-Encoding, Base64 and base64 are both possible • Exceptions: the multipart boundary and all header values displayed to the user
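The unfolding and case-insensitivity rules above can be sketched as follows. This is a minimal illustration, not the reference solution: the class name `HeaderParser` and its methods are hypothetical, folded lines are joined with a single space (a simplification of RFC 2822 unfolding), and a repeated header simply overwrites the earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the assignment framework.
final class HeaderParser {

    /** Joins continuation lines (starting with space or tab) onto the preceding header line. */
    static List<String> unfold(List<String> rawLines) {
        List<String> unfolded = new ArrayList<>();
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && !unfolded.isEmpty()) {
                int last = unfolded.size() - 1;
                // Simplification: collapse the folding whitespace to one space.
                unfolded.set(last, unfolded.get(last) + " " + line.trim());
            } else {
                unfolded.add(line);
            }
        }
        return unfolded;
    }

    /** Maps lower-cased header names to values; the value's case is left untouched. */
    static Map<String, String> parse(List<String> rawLines) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : unfold(rawLines)) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a header line: skip it instead of failing
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.put(name, value); // assumption: a repeated header overwrites the earlier one
        }
        return headers;
    }
}
```

Unknown headers end up in the map like any other; the caller only looks up the names it needs, which is one simple way to "ignore every header you do not know".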
Date • Look into the documentation of SimpleDateFormat • No need to parse each item yourself; it even recognizes “GMT” and “UTC” as timezones • Construct the parser with Locale.US so that it can parse English month names like “May” • Output via DateFormat.getDateTimeInstance() • Timezone • Setting it via SimpleDateFormat or Calendar#setTimeZone is preferred to manual time manipulation • Reason: DateFormat may be configured to display the timezone
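A possible shape for the date handling described above, as a sketch only. The class `MailDate` and the pattern string are assumptions; the pattern expects the common form with a leading day of week (e.g. `Fri, 21 Nov 1997 09:55:06 -0600`), so headers that omit it would need a second pattern. The exact display text depends on the JDK's locale data, so no output is shown.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the assignment framework.
final class MailDate {

    /** Parses a Date header value; Locale.US makes names like "May" or "Fri" parseable. */
    static Date parse(String headerValue) {
        // Assumed pattern; "Z" accepts numeric offsets (+0200) and, when parsing, zone names like GMT.
        SimpleDateFormat parser =
                new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z", Locale.US);
        try {
            return parser.parse(headerValue);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable Date header: " + headerValue, e);
        }
    }

    /** Formats for display in the given zone instead of shifting the time by hand. */
    static String display(Date date, TimeZone zone) {
        DateFormat out = DateFormat.getDateTimeInstance(
                DateFormat.MEDIUM, DateFormat.LONG, Locale.US);
        out.setTimeZone(zone);
        return out.format(date);
    }
}
```

Because the zone is set on the formatter, a style that prints the timezone (such as `DateFormat.LONG` for the time part) shows the correct zone name along with the converted time.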
Attachments • Base64-encoded lines are always 76 characters wide; the only exception is the last line • If the number of characters % 4 != 0, you may just throw an exception and terminate • Do not use javax.mail.internet.MimeUtility or similar additional libraries for decoding • Use the Content-Disposition header to suggest a name for saving • Attachments that are not of type text/… don’t have and don’t need a charset • Just treat them as a stream of bytes/byte array
Base64-Example • Take a group of 4 characters: S W 4 g • Decode according to the RFC • S = 0x12; W = 0x16; 4 = 0x38; g = 0x20 • Decoding may be done by character range: A-Z = char - 'A'; a-z = char - 'a' + 26; 0-9 = char - '0' + 26*2; +, / and = must be treated separately • Combine to a 24-bit number, shifting according to index (big endian) • 0x12 << 18 | 0x16 << 12 | 0x38 << 6 | 0x20 << 0 = 0x496e20 • Shift the number back in 8-bit blocks (also big endian) • Byte 0 = 0x496e20 >> 16 & 0xff = 0x49 • Byte 1 = 0x496e20 >> 8 & 0xff = 0x6e • Byte 2 = 0x496e20 >> 0 & 0xff = 0x20
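The worked example above translates almost directly into code. The following is a sketch for a single 4-character group; the class `Base64Group` and its method names are made up for illustration, and treating trailing `=` as zero bits that produce fewer output bytes is the usual RFC 2045 padding rule rather than something spelled out on the slide.

```java
// Hypothetical helper, not part of the assignment framework.
final class Base64Group {

    /** Maps one Base64 alphabet character to its 6-bit value. */
    static int sextet(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 26 * 2;
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw new IllegalArgumentException("Not a Base64 character: " + c);
    }

    /** Decodes exactly 4 characters into 3 bytes (or 2 / 1 bytes with "=" / "==" padding). */
    static byte[] decodeGroup(String group) {
        if (group.length() != 4) {
            throw new IllegalArgumentException("Group must have 4 characters");
        }
        int padding = group.endsWith("==") ? 2 : group.endsWith("=") ? 1 : 0;

        // Combine the four 6-bit values into one 24-bit number, first character in the top bits.
        int bits = 0;
        for (int i = 0; i < 4; i++) {
            int value = (i < 4 - padding) ? sextet(group.charAt(i)) : 0;
            bits = (bits << 6) | value;
        }

        // Split the 24-bit number back into 8-bit blocks, most significant first.
        byte[] out = new byte[3 - padding];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) ((bits >> (16 - 8 * i)) & 0xff);
        }
        return out;
    }
}
```

For the slide's input, `Base64Group.decodeGroup("SW4g")` yields the bytes 0x49, 0x6e, 0x20, i.e. the ASCII text "In ".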
Decoding • Your own input stream • Elegant way of decoding Base64 and Quoted-Printable data (you can do it differently, this is only a suggestion) • Extend java.io.InputStream • Take a character array of undecoded data as parameter • Override read() • Decode the character data when • Return -1 if the end of the data is reached • Let the InputStreamReader deal with the nasty problem of decoding charsets • The sample application needs only 50 LoC for decoding quoted-printable, 100 LoC for Base64
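One way the suggested input stream could look for quoted-printable data is sketched below. This is not the sample application's code: the class name, the decode-on-demand strategy inside `read()`, and the convenience method `decodeToString` are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the sample application's implementation.
final class QuotedPrintableInputStream extends InputStream {
    private final char[] encoded;
    private int pos = 0;

    QuotedPrintableInputStream(char[] encoded) {
        this.encoded = encoded;
    }

    /** Returns the next decoded byte (0..255), or -1 at the end of the data. */
    @Override
    public int read() throws IOException {
        while (pos < encoded.length) {
            char c = encoded[pos++];
            if (c != '=') {
                return c & 0xff; // literal character
            }
            // Soft line break: "=" directly before LF or CRLF produces no output.
            if (pos < encoded.length && encoded[pos] == '\n') {
                pos++;
                continue;
            }
            if (pos + 1 < encoded.length && encoded[pos] == '\r' && encoded[pos + 1] == '\n') {
                pos += 2;
                continue;
            }
            // Otherwise "=" must be followed by two hex digits.
            if (pos + 1 >= encoded.length) {
                throw new IOException("Truncated escape sequence");
            }
            int hi = Character.digit(encoded[pos], 16);
            int lo = Character.digit(encoded[pos + 1], 16);
            if (hi < 0 || lo < 0) {
                throw new IOException("Invalid escape sequence");
            }
            pos += 2;
            return (hi << 4) | lo;
        }
        return -1;
    }

    /** Convenience wrapper: the InputStreamReader does the charset decoding. */
    static String decodeToString(String encoded, String charsetName) {
        try (Reader reader = new InputStreamReader(
                new QuotedPrintableInputStream(encoded.toCharArray()), charsetName)) {
            StringBuilder text = new StringBuilder();
            int ch;
            while ((ch = reader.read()) != -1) {
                text.append((char) ch);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream only ever produces raw bytes; turning those bytes into characters is left entirely to `InputStreamReader`, so multi-byte charsets such as UTF-8 work without extra code. A Base64 stream could follow the same pattern, decoding one 4-character group at a time and handing out its bytes one by one.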
Regular Expressions • Regular expressions are a nice way of filtering out substrings • A bit like file name patterns (*, ?), but more powerful • Letters and digits match themselves • Punctuation characters usually have a special meaning; to match them literally, escape them with a \ • To match the character [, use \[ • Attention: you need to escape the backslash in Java strings: \[ == "\\[" • Alternatives: use [] • [abc] matches a or b or c • [A-Z] matches A or B or … or Z • Negation: [^abc] matches any character except a, b or c • Wildcard: . matches any single character • Repetition • * means “the previous element zero or more times” • + means “the previous element one or more times”
Regular Expressions with Java • Part of java.util.regex • First, compile the pattern to search for: • Pattern p = Pattern.compile("charset=[^ ]*") • The compile method has a variant that takes flags; use it for case-insensitivity: Pattern.CASE_INSENSITIVE • Next, create a Matcher for a String from it • Matcher m = p.matcher("Content-Type: text/plain; charset=\"us-ascii\"") • Be sure to call the Matcher’s find method • m.find() • m.group(0) now contains everything that matches • charset="us-ascii"
Grouping • You need the part after “charset=” • Solution 1: parse it yourself • Solution 2: add groups to the expression • Groups are marked by () and counted from 1 • Pattern p = Pattern.compile("charset=([^ ]*)") • After matching, group(1) contains "\"us-ascii\""
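Putting the pattern, the case-insensitivity flag and the group together might look like the sketch below. The class `CharsetExtractor` is hypothetical, and the pattern is a variation on the slide's: it also strips the optional quotation marks and stops at a semicolon, which is an assumption about what the caller wants rather than part of the original example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the assignment framework.
final class CharsetExtractor {

    // Group 1 captures the charset name without surrounding quotes.
    // "\\s" in the Java string is the regex \s (any whitespace character).
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^\";\\s]*)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the charset parameter of a Content-Type value, or null if there is none. */
    static String charsetOf(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        // find() must be called before group(); group(1) is the first parenthesized group.
        return m.find() ? m.group(1) : null;
    }
}
```

With the slide's input, `CharsetExtractor.charsetOf("Content-Type: text/plain; charset=\"us-ascii\"")` returns `us-ascii` without the quotes, so the result can be passed straight on as a charsetName.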
Debugging • Mail clients should be able to connect to the server and fetch the mail • Always helpful: try to connect to the POP server via telnet and issue POP commands manually • For closer examination, you may unzip the JAR file and have a look at “mailbox.xml”