Socket I/O. 2005. 6. 8. 백 일 우, steigensonne@hufs.ac.kr
Contents
• Code Introduction
• Socket Buffers
• write, writev, sendto, sendmsg
• sendit Function
• sosend Function
• read, readv, recvfrom, recvmsg
• recvmsg System Call
• recvit Function
• soreceive Function
Socket Buffers
• Each socket has an associated send buffer and receive buffer
• sb_cc : total number of data bytes in the buffer
• sb_hiwat, sb_lowat : high-water and low-water marks used by the socket flow control algorithms
• sb_mbcnt : total amount of memory allocated to the mbufs in the buffer
• sb_mbmax : upper bound on the amount of memory that may be allocated as mbufs for each socket buffer
• sb_mb : points to the first mbuf in the chain
• sb_timeo : measured in clock ticks; limits the time a process blocks during a read or write
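The counters above are what the flow control checks work from. The following is a minimal user-space sketch of how the free space in a buffer can be derived from them, modeled on the 4.4BSD sbspace macro. The names sockbuf_model and sbspace_model are hypothetical, and the real kernel structure has more members than are shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the kernel's socket buffer. */
struct sockbuf_model {
    unsigned long sb_cc;    /* data bytes currently in the buffer */
    unsigned long sb_hiwat; /* high-water mark for data bytes */
    unsigned long sb_mbcnt; /* mbuf memory currently in use */
    unsigned long sb_mbmax; /* upper bound on mbuf memory */
    long          sb_lowat; /* low-water mark */
    short         sb_timeo; /* timeout in clock ticks */
};

/*
 * Free space is limited by both the data byte count and the mbuf
 * memory count, so the smaller of the two margins is returned.
 * The result can be negative if a limit has been exceeded.
 */
long sbspace_model(const struct sockbuf_model *sb)
{
    assert(sb != NULL);
    long by_data = (long)sb->sb_hiwat - (long)sb->sb_cc;
    long by_mbuf = (long)sb->sb_mbmax - (long)sb->sb_mbcnt;
    return by_data < by_mbuf ? by_data : by_mbuf;
}
```

With sb_hiwat = 8192 and sb_cc = 1000, the data margin is 7192 bytes; if the mbuf memory margin is smaller, that smaller value is what the caller sees.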
sb_flags • Default socket buffer limits for the Internet protocols
Socket Macros and functions • They handle buffer locking and synchronization
Socket Macros and functions (cont.) • They handle socket buffer allocation and manipulation
write, writev, sendto, and sendmsg
• All the write system calls call sosend, directly or indirectly
• sosend copies data from the process to the kernel and passes the data to the protocol associated with the socket
iovec
• Writing from multiple buffers is called gathering
• The analogous read operation is called scattering
• In a gather operation, the kernel accepts data from each buffer specified in an array of iovec structures
• Without this type of interface, a process would have to either copy its buffers into a single larger buffer or make multiple write system calls to send data from multiple buffers
• Both alternatives are inefficient, hence iovec
• iov_base : points to the start of a buffer of iov_len bytes
iovec
• iovec arguments to writev
• iovp : points to the first element of the array
• iovcnt : the number of elements in the array
• Datagram protocols require a destination address
• write, writev, and send do not accept an explicit address, so they can be called only after a destination has been associated with a connectionless socket by calling connect
• Otherwise the destination must be provided with sendto or sendmsg
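A gather write can be illustrated from user space with the standard writev call. This is only a sketch: the helper name gather_write is hypothetical, and it sends two strings from separate buffers in a single system call.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write a header and a body that live in separate buffers with one
 * writev call, instead of copying them into one larger buffer or
 * issuing two write calls. Returns the value returned by writev.
 */
ssize_t gather_write(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);
    iov[0].iov_base = (void *)hdr;   /* start of the first buffer */
    iov[0].iov_len  = strlen(hdr);   /* its length in bytes */
    iov[1].iov_base = (void *)body;  /* start of the second buffer */
    iov[1].iov_len  = strlen(body);

    /* iov is the array pointer (iovp), 2 is the element count (iovcnt) */
    return writev(fd, iov, 2);
}
```

On a connected socket or a pipe, the receiver sees the bytes of both buffers back to back, as if they had been written from one contiguous buffer.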
msghdr
• Only the sendmsg call supports control information
• The control information and several other arguments to sendmsg are specified within a msghdr structure
• msg_name should be declared as a pointer to a sockaddr structure, since it contains a network address
• Control information
1. A control message is formatted as a cmsghdr structure
2. Control information is not interpreted by the socket layer, but messages are typed (cmsg_type) and have an explicit length (cmsg_len)
msghdr structure • msghdr structure for the sendmsg system call
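As a rough illustration of how the msghdr members fit together, the sketch below fills in a msghdr from user space and passes it to sendmsg. The helper name send_with_msghdr is hypothetical. It assumes a socket that already has a peer, so msg_name is left NULL, and it sends no control information.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Send two buffers as one message using sendmsg.
 * No destination address and no control information are supplied.
 */
ssize_t send_with_msghdr(int fd, const void *a, size_t alen,
                         const void *b, size_t blen)
{
    struct iovec iov[2];
    struct msghdr msg;

    assert(a != NULL && b != NULL);
    memset(&msg, 0, sizeof msg);

    iov[0].iov_base = (void *)a;
    iov[0].iov_len  = alen;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = blen;

    msg.msg_name       = NULL; /* optional destination address */
    msg.msg_namelen    = 0;
    msg.msg_iov        = iov;  /* scatter/gather array */
    msg.msg_iovlen     = 2;
    msg.msg_control    = NULL; /* optional control messages (cmsghdr) */
    msg.msg_controllen = 0;

    return sendmsg(fd, &msg, 0);
}
```

For an unconnected datagram socket, msg_name and msg_namelen would instead carry the destination sockaddr and its size.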
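sendmsg System Call
• UIO_SMALLIOV is 8 and UIO_MAXIOV is 1024
• Copy the msghdr from user space into the kernel
• If the iovec array has too many entries, the "message too long" error (EMSGSIZE) is returned
• An iovec array with 8 entries is allocated automatically on the stack; if that is not large enough, sendmsg calls MALLOC to allocate a larger array
• copyin places a copy of the iovec array from user space into the kernel's array
• The message is delivered to the appropriate protocol, or an error is returned
• sendmsg releases the iovec array and returns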
sendit Function
• sendit is the common function called by sendto and sendmsg
• It initializes a uio structure
• It copies control and address information from the process into the kernel
• uiomove function: moves n bytes between a single buffer referenced by cp and the multiple buffers specified by an iovec array in uio
• uio_iov : points to an array of iovec structures
• uio_offset : counts the number of bytes transferred by uiomove
• uio_resid : counts the number of bytes remaining to be transferred
• uio_segflg : identifies where the buffers are located (for example, instruction space)
• Each time uiomove is called, uio_offset increases by n and uio_resid decreases by n
uio structure before and after uiomove
• BEFORE: cp points to a buffer within the kernel, typically the data area of an mbuf
• AFTER: the data from the buffer in the process has been moved into the kernel's buffer because uio_rw was UIO_WRITE
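The bookkeeping that uiomove performs can be sketched in user space as follows. This is a simplified model, not the kernel routine: the names uio_model and uiomove_model are hypothetical, memcpy stands in for copyin and copyout, and the segment flag is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

enum uio_rw_model { UIO_READ_M, UIO_WRITE_M };

/* Hypothetical, reduced version of the kernel's uio structure. */
struct uio_model {
    struct iovec     *uio_iov;    /* current element of the iovec array */
    int               uio_iovcnt; /* elements remaining */
    off_t             uio_offset; /* bytes transferred so far */
    size_t            uio_resid;  /* bytes still to be transferred */
    enum uio_rw_model uio_rw;     /* direction of the transfer */
};

/*
 * Move up to n bytes between the single buffer cp and the buffers
 * described by uio. For UIO_WRITE_M the data flows from the iovec
 * buffers (the process) into cp (the kernel buffer).
 */
int uiomove_model(char *cp, size_t n, struct uio_model *uio)
{
    assert(cp != NULL && uio != NULL);
    while (n > 0 && uio->uio_resid > 0 && uio->uio_iovcnt > 0) {
        struct iovec *iov = uio->uio_iov;
        size_t cnt = iov->iov_len;

        if (cnt == 0) {            /* this buffer is used up */
            uio->uio_iov++;
            uio->uio_iovcnt--;
            continue;
        }
        if (cnt > n)
            cnt = n;
        if (uio->uio_rw == UIO_WRITE_M)
            memcpy(cp, iov->iov_base, cnt);
        else
            memcpy(iov->iov_base, cp, cnt);

        iov->iov_base = (char *)iov->iov_base + cnt;
        iov->iov_len -= cnt;
        uio->uio_resid -= cnt;     /* fewer bytes remain */
        uio->uio_offset += cnt;    /* more bytes transferred */
        cp += cnt;
        n -= cnt;
    }
    return 0;
}
```

After a call that moves n bytes, uio_offset has grown by n and uio_resid has shrunk by n, which matches the before and after pictures.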
sendit Code
• This code initializes the uio structure
• Get the file structure associated with descriptor s
• Initialize the uio structure to gather the output buffers into mbufs in the kernel
• Calculate the length of the transfer and save it in uio_resid
• Ensure that each buffer length (iov_len) is nonnegative
• Ensure that uio_resid does not overflow (it is a signed integer)
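The length calculation and its overflow check can be sketched as below. This is a user-space approximation with a hypothetical helper name, total_iov_len. In user space iov_len is an unsigned size_t, so the kernel's "negative length" test is modeled here as a check that the running total still fits in a signed int.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Sum the lengths of all buffers into *resid.
 * Returns 0 on success, or EINVAL if the total would not fit in a
 * signed int (the type assumed here for the residual count).
 */
int total_iov_len(const struct iovec *iov, int iovcnt, int *resid)
{
    int total = 0;

    assert(iov != NULL && resid != NULL);
    for (int i = 0; i < iovcnt; i++) {
        /* Reject a length that would overflow the signed total. */
        if (iov[i].iov_len > (size_t)(INT_MAX - total))
            return EINVAL;
        total += (int)iov[i].iov_len;
    }
    *resid = total;
    return 0;
}
```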
sendit Code (cont.)
• This code copies address and control information from the process
• sockargs() copies the destination address and control information into mbufs if they are provided by the process
• If sosend() does not accept all the data, the number of bytes transferred can be calculated from the remaining length
1. If the transfer moved some data and was then interrupted by a signal or a blocking condition, the error is discarded and the partial transfer is reported
2. If sosend returns EPIPE, the SIGPIPE signal is sent to the process
3. If no error occurred, the number of bytes transferred is calculated and saved in *retsize
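The three cases above can be expressed as a small decision function. This is a sketch under assumptions: sendit_finish and its parameters are hypothetical names, the kernel-only ERESTART code is left out, and the signal is reported through a flag instead of being posted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * error : value returned by the send routine
 * len   : number of bytes requested
 * resid : number of bytes still not transferred
 * Returns the error to report to the process. On success *retsize
 * holds the number of bytes transferred. *raise_sigpipe is set to 1
 * when the caller should send SIGPIPE.
 */
int sendit_finish(int error, int len, int resid,
                  int *retsize, int *raise_sigpipe)
{
    assert(retsize != NULL && raise_sigpipe != NULL);
    *raise_sigpipe = 0;

    if (error != 0) {
        /* Case 1: some data moved before the interruption. */
        if (resid != len && (error == EINTR || error == EWOULDBLOCK))
            error = 0;
        /* Case 2: writing to a socket that cannot send any more. */
        if (error == EPIPE)
            *raise_sigpipe = 1;
    }
    /* Case 3: report how much was actually transferred. */
    if (error == 0)
        *retsize = len - resid;
    return error;
}
```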
sosend Function
• sosend is responsible for passing data and control information to the pr_usrreq function of the protocol associated with the socket
• Before passing the data, it checks that there is enough space in the send buffer
• sosend never places data in the send buffer
• Storing and removing the data is the protocol's responsibility
• How sosend interprets the send buffer's sb_hiwat and sb_lowat values depends on whether the protocol implements reliable or unreliable data transfer semantics
• For a reliable protocol, the send buffer holds both:
• Data that has not yet been transmitted
• Data that has been sent but not yet acknowledged
• sb_cc is the number of bytes of data that reside in the send buffer
• 0 <= sb_cc <= sb_hiwat
sosend Function : how to pass
• If PR_ATOMIC is set:
• sosend() must preserve message boundaries between the process and the protocol layer
• In this case, sosend() waits for enough space to become available to hold the entire message
• When the space is available, an mbuf chain containing the message is constructed and passed to the protocol in a single call
• If PR_ATOMIC is not set:
• sosend() passes the message to the protocol one mbuf at a time
• It may pass a partial mbuf to avoid exceeding the high-water mark
sosend()
• Unreliable Protocol Buffering
• No data is ever stored in the send buffer and no ACK is expected
• Each message is passed to the protocol immediately
• So sb_cc is always 0, and sb_hiwat specifies the maximum message size
• The default sb_hiwat for UDP is 9216 (9*1024)
• Unless the process changes sb_hiwat with the SO_SNDBUF socket option, trying to write more than 9216 bytes returns an error
sosend Code
• so : pointer to the relevant socket
• addr : pointer to a destination address
• uio : pointer to a uio structure
• top : mbuf chain that holds the data to be sent
• control : mbuf that holds the control information to be sent
• flags : options for this write call
• /* initialization (Figure 16.23) */
• Lock the send buffer; the lock ensures orderly access to the socket buffer by multiple processes
• /* wait for space in send buffer (Figure 16.24) */
• Obtain the lock and prepare to deliver data to the protocol
• If uio is NULL, the data is already in top and only the end of record needs to be marked
• If uio is not NULL, transfer data from the process
• /* fill a single mbuf or an mbuf chain (Figure 16.25) */
• /* pass mbuf chain to protocol (Figure 16.26) */
• After all the data has been passed to the protocol, the socket buffer is unlocked and any remaining mbufs are discarded
sosend() : initialization
• atomic is set if sosendallatonce is true
• This flag controls whether the data is passed to the protocol as a single mbuf chain or in separate mbufs
• resid : number of bytes in the iovec buffers or in the top mbuf chain
• clen : number of bytes in the optional control mbuf
sosend() : error and resource checking
• If the socket cannot send any more data, an error is returned
• If the protocol requires a connection and the connection has not been established or a connection attempt has not been started, ENOTCONN is returned
• If no destination address is available, an error is returned
• Compute the amount of free space in the send buffer
• If the message must be atomic and is larger than the high-water mark, EMSGSIZE is returned
• sosend() must wait when there is not enough space and one of the following holds:
• The message must be passed in a single request (atomic)
• The message may be split, but the free space has fallen below the low-water mark
• The control information does not fit in the available space
• mp holds the pointer used to construct the mbuf chain
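The space checks on this slide can be condensed into one decision function. This is a sketch modeled on the conditions listed above, with hypothetical names (sosend_space_check, SOSEND_PROCEED, SOSEND_WAIT). The nonblocking case is an added assumption: a nonblocking socket is taken to return EWOULDBLOCK instead of waiting.

```c
#include <assert.h>
#include <errno.h>

enum { SOSEND_PROCEED = 0, SOSEND_WAIT = -1 };

/*
 * space  : free space in the send buffer
 * resid  : bytes of data still to send
 * clen   : bytes of control information
 * hiwat  : high-water mark, lowat : low-water mark
 * atomic : nonzero if the message must be passed in one request
 * Returns SOSEND_PROCEED, SOSEND_WAIT, or an errno value.
 */
int sosend_space_check(long space, long resid, long clen,
                       long hiwat, long lowat,
                       int atomic, int nonblocking)
{
    assert(resid >= 0 && clen >= 0);

    /* An atomic message larger than the buffer can never fit. */
    if ((atomic && resid > hiwat) || clen > hiwat)
        return EMSGSIZE;

    /* Not enough room now, and no acceptable way to send part of it. */
    if (space < resid + clen &&
        (atomic || space < lowat || space < clen))
        return nonblocking ? EWOULDBLOCK : SOSEND_WAIT;

    return SOSEND_PROCEED;
}
```

For example, with the UDP default high-water mark of 9216, an atomic 10000-byte message yields EMSGSIZE regardless of how much space is free.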
sosend() Function : data transfer
• Allocate a packet header or a standard mbuf
• If atomic is set, a packet header is allocated during the first pass through the loop and standard mbufs afterward
• If atomic is not set, a packet header is always allocated, because top is cleared before entering the loop
• If possible, a cluster is attached to the mbuf
• If atomic is set, room is reserved for protocol headers; if it is not set, no room is reserved
• The length is limited by the message length, the buffer length, and the mbuf length
• The data is located at the end of the buffer in the chain, which may leave room for headers, depending on how much data is placed in the mbuf
• Copy len bytes of data from the process to the mbuf
• Update the mbuf length
• The new mbuf is linked to the previous mbuf
• The mbuf chain length is updated
• When the last byte has been transferred from the process, M_EOR is set if requested and sosend() breaks out of the loop
sosend() function : protocol dispatch
• SO_DONTROUTE can be enabled for a single message only
• It is reset after the message has been passed to the protocol
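recvmsg System Call
• Arguments include the socket descriptor and a msghdr, which also carries the control information
• An iovec array with 8 entries is allocated on the stack
• Copy the msghdr structure into the kernel
• Copy the iov array; it may hold at most 1024 entries, and a larger array is allocated when 8 entries are not enough
• After the data has been received, copy the msghdr back to the process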
recvit Function : initialization
• Get the file structure for descriptor s
• Compute the number of bytes to be transferred by adding up the lengths in the iovec array
• The total length is computed and saved
recvit Function : initialization (cont.)
• Compute the number of bytes of data transferred
• Copy the address to the process
• Copy the control information to the process
soreceive Function
• soreceive transfers data from the receive buffer of the socket to the buffers specified by the process
• recvmsg is the only read system call that returns flags to the process
• In the other calls, this information is discarded by the kernel before control returns to the process
• Out-of-band (OOB) data
• Two mechanisms facilitate the handling of OOB data
• Tagging and synchronization
OOB Handling
• Tagging
• The sending process tags data as OOB by setting the MSG_OOB flag
• sosend() passes this information to the socket's protocol
• When the protocol receives OOB data, the data is set aside instead of being placed in the socket's receive buffer
• The receiving process reads the OOB data by setting MSG_OOB
• Synchronization
• The receiving process can ask the protocol to place OOB data inline with the regular data
• In this case, the MSG_OOB flag is not used
• Read calls return either all regular data or all OOB data
OOB • Receiving out-of-band data
Receive Buffer Organization
• Message boundaries
• For protocols that support message boundaries, each message is stored in a single chain of mbufs
• Multiple messages in the receive buffer are linked together by m_nextpkt
• The protocol layer adds data to the receive queue and the socket layer removes data from the receive queue
• The high-water mark for the receive buffer restricts the amount of data that can be stored
• When PR_ATOMIC is not set:
• The protocol layer stores as much data in the buffer as possible and discards the portion of the incoming data that does not fit
• For TCP, this means that any data outside the window is discarded
• When PR_ATOMIC is set:
• The protocol uses sbappendaddr to construct an mbuf chain and add it to the receive queue
Receive Buffer Organization
• No message boundaries
• For protocols such as TCP, incoming data is appended to the end of the last mbuf chain in the buffer with sbappend
• Incoming data is trimmed to fit within the receive buffer, and sb_lowat puts a lower bound on the number of bytes returned by a read system call
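The two-dimensional layout of the receive queue, with m_next linking the mbufs of one message and m_nextpkt linking the messages, can be sketched with a toy structure. The names mbuf_model and count_records are hypothetical, and the real mbuf has many more members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced mbuf holding only the linkage and a length. */
struct mbuf_model {
    struct mbuf_model *m_next;    /* next mbuf in the same message */
    struct mbuf_model *m_nextpkt; /* first mbuf of the next message */
    int                m_len;     /* bytes of data in this mbuf */
};

/*
 * Walk the receive queue starting at sb_mb. Returns the number of
 * messages (records) and stores the total number of data bytes in
 * *total_bytes.
 */
int count_records(const struct mbuf_model *sb_mb, long *total_bytes)
{
    int records = 0;

    assert(total_bytes != NULL);
    *total_bytes = 0;
    for (const struct mbuf_model *rec = sb_mb; rec != NULL;
         rec = rec->m_nextpkt) {
        records++;
        for (const struct mbuf_model *m = rec; m != NULL; m = m->m_next)
            *total_bytes += m->m_len;
    }
    return records;
}
```

For a protocol without message boundaries, the queue holds a single chain, so the outer loop runs at most once.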
soreceive Code
• so : socket
• paddr : address information
• mp0 : mbuf pointer
• controlp : control information, as an mbuf pointer
• The size of the receive request is saved; it is set to 0 if address or control information is copied to the process, and uio_resid is updated if data is copied
• The output pointers are cleared
• Get the lock before accessing the buffer
• m is the first mbuf chain in the buffer
• Check several conditions to decide whether soreceive needs to wait for more data
• If soreceive sleeps in this code, it jumps back to restart when it wakes up to see whether enough data has arrived
• This continues until the request can be satisfied
soreceive Code
• soreceive jumps here when it has enough data to satisfy the request
• Address and control information are handled before any other data is transferred from the receive buffer
• Set up the data transfer: remember the type of data at the front of the queue, so that soreceive can stop the transfer when the type changes
• Clean up
soreceive function
• Out-of-band data
• Because OOB data is not stored in the receive buffer, soreceive() allocates a standard mbuf and issues the PRU_RCVOOB request to the protocol
• The while loop copies the data returned by the protocol to the buffers specified by uio
• After the copy, soreceive returns 0 or an error
• Connection confirmation
• If the data is to be returned in an mbuf chain, *mp is initialized to NULL
• If the socket is in the SS_ISCONFIRMING state, the PRU_RCVD request notifies the protocol that the process is attempting to receive data
soreceive function
• Enough data? soreceive must wait if any of the following is true:
• 1. There is no data in the receive buffer (m == 0)
• 2. There is not enough data to satisfy the entire read (sb_cc < uio_resid), the minimum amount of data is not available, and data can be appended to this chain when it arrives (m_nextpkt == 0 and PR_ATOMIC is not set)
• 3. There is not enough data to satisfy the entire read, the minimum amount of data is available, and data can be added to the chain, but MSG_WAITALL indicates that soreceive must wait until the entire read can be satisfied
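The three conditions can be folded into a single predicate. The sketch below is modeled on the test in the 4.4BSD soreceive, with hypothetical names (rcv_state, must_wait). It also includes two details that are assumptions relative to the slide: MSG_DONTWAIT suppresses the wait, and MSG_WAITALL only applies when the request fits under the high-water mark.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the state examined before waiting. */
struct rcv_state {
    int  have_mbuf;   /* nonzero if m != 0 */
    long sb_cc;       /* bytes in the receive buffer */
    long sb_lowat;    /* low-water mark */
    long sb_hiwat;    /* high-water mark */
    long resid;       /* bytes requested (uio_resid) */
    int  dontwait;    /* MSG_DONTWAIT requested */
    int  waitall;     /* MSG_WAITALL requested */
    int  has_nextpkt; /* nonzero if m_nextpkt != 0 */
    int  atomic;      /* PR_ATOMIC set for the protocol */
};

/* Returns nonzero if the receive routine should wait for more data. */
int must_wait(const struct rcv_state *s)
{
    assert(s != NULL);

    if (!s->have_mbuf)                       /* condition 1 */
        return 1;

    return (!s->dontwait && s->sb_cc < s->resid) &&
           (s->sb_cc < s->sb_lowat ||        /* condition 2 */
            (s->waitall && s->resid <= s->sb_hiwat)) && /* condition 3 */
           !s->has_nextpkt && !s->atomic;
}
```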
soreceive : wait for more data
• If the socket has an error and m is NULL, the error is returned
• If the socket has an error and m is not NULL, the data is returned
• If MSG_PEEK is set, the error is not cleared, since a read call with MSG_PEEK should not change the state of the socket
• When the socket cannot receive any more data: if data remains in the receive buffer, soreceive does not wait and returns the data to the process; if the receive buffer is empty, soreceive jumps to release and the read system call returns 0
• If the buffer contains OOB data or the end of a logical record, soreceive does not wait for additional data and jumps to dontblock
• If the protocol requires a connection but none exists, ENOTCONN is posted and soreceive jumps to release
soreceive Function : return address and info
• Return address
• For protocols such as UDP, the mbuf containing the address is removed from the mbuf chain and returned in *paddr
• If MSG_PEEK is set, the data is not removed from the buffer
• If paddr is NULL, the address is discarded
soreceive Function : control information
• Each control mbuf is removed from the buffer (or copied if MSG_PEEK is set) and attached to *controlp
• If controlp is NULL, the control information is discarded
• m is advanced to point to the next mbuf
• After the control information has been processed, the chain should contain regular data mbufs, OOB data mbufs, or no mbufs at all
soreceive Function : uio move
• The loop continues while there are more mbufs, the process's buffer is not full, and no error has occurred
• If the type of the mbuf changes, the transfer stops, so regular and OOB data are never both returned in the same message
• The distance to the OOB mark is computed and limits the size of the transfer, so the byte before the mark is the last byte transferred
soreceive Function : update buffer
• Finished with the mbuf? If all the bytes in the mbuf have been transferred, the mbuf must be discarded or the pointer advanced
• More data to process? There may be more data to process if the request did not consume all the data, if so_oobmark cut the request short, or if additional data arrived during uiomove
soreceive Function : OOB mark
• If the OOB mark is not zero, it is decremented by the number of bytes transferred
• If the mark has been reached, SS_RCVATMARK is set and soreceive breaks out of the loop
• If MSG_PEEK is set, offset is updated instead of so_oobmark
• End of record
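The way the OOB mark limits a transfer and is then updated can be sketched with two small helpers. The names transfer_len and advance_mark are hypothetical; the logic follows the description above, including the MSG_PEEK case in which only the offset moves.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of bytes to move in this step: limited by the request
 * (resid), by the distance to the OOB mark, and by the bytes left
 * in the current mbuf. A mark of 0 means no mark is pending.
 */
long transfer_len(long resid, long oobmark, long offset, long mbuf_left)
{
    long len = resid;

    if (oobmark != 0 && len > oobmark - offset)
        len = oobmark - offset;  /* stop at the byte before the mark */
    if (len > mbuf_left)
        len = mbuf_left;
    return len;
}

/*
 * Account for len transferred bytes. Returns nonzero when the mark
 * has been reached and the transfer loop should stop. When peeking,
 * the mark itself is left unchanged and only the offset advances.
 */
int advance_mark(long *oobmark, long *offset, long len, int peek)
{
    assert(oobmark != NULL && offset != NULL);
    if (*oobmark == 0)
        return 0;
    if (!peek) {
        *oobmark -= len;
        return *oobmark == 0;    /* caller would set SS_RCVATMARK */
    }
    *offset += len;
    return *offset == *oobmark;
}
```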
soreceive Function : MSG_WAITALL processing
• soreceive must wait for additional data if MSG_WAITALL is set, there is no more data in the receive buffer (m == 0), the process wants more data, and this is the last record in the receive buffer
• sbwait returns when the receive buffer is changed by the protocol layer
• If the wait was interrupted by a signal, soreceive returns immediately
• m and nextrecord are synchronized with the receive buffer
soreceive Function : cleanup
• Truncated message: if the process's buffer is too small to hold the message, the message is truncated
• End of record processing: the next mbuf chain is attached to the receive buffer
• Nothing transferred