Sector & Sphere Tutorial Yunhong Gu Univ. of Illinois at Chicago @Booz Allen Hamilton, Aug 6, 2009
Outline • Installation • Sector File System • Sphere Programming
Installation: System Requirements • Linux (Debian recommended; XFS file system recommended) • gcc 3.4 or above • OpenSSL development library • FUSE development library (optional)
System Architecture (diagram: Security Server, Masters, Clients, and Slaves, joined by links labeled SSL, SSL, and Data)
• Security Server: security_node.key, security_node.cert, ./users, slave_acl.conf, master_acl.conf
• Masters: master_node.key, master.conf, topology.conf, slaves.list
• Clients: master_node.cert, client.conf
• Slaves: master_node.cert, slave.conf
ls ./codeblue2 • Makefile • client • conf • gmp • master • slave • common • doc • lib • security • udt
Configure the Security Server • For a testing system, you can use the default configuration • Otherwise, update the slave ACL (slave_acl.conf), the master ACL (master_acl.conf), and the user accounts (./conf/users)
Access Control List (ACL) • Format: a list of entries, each either a single IP address (IP1, IP2) or a subnet in IP/Mask form (IP3/Mask) • Example: 10.0.0.1 192.168.0.0/24
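A check of a client address against such a list could be sketched as below. This is not Sector's code: it assumes the Mask part is a prefix length (as in /24), that a bare address must match exactly, and that an address is allowed if any entry matches; parseIPv4, aclEntryMatches, and aclAllows are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parses a dotted-quad IPv4 address ("10.0.0.1") into a 32-bit integer.
bool parseIPv4(const std::string& text, std::uint32_t& out)
{
    std::istringstream in(text);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int octet = -1;
        if (!(in >> octet) || octet < 0 || octet > 255) return false;
        value = (value << 8) | static_cast<std::uint32_t>(octet);
        char dot = 0;
        if (i < 3 && (!(in >> dot) || dot != '.')) return false;
    }
    char extra = 0;
    if (in >> extra) return false;  // reject trailing characters
    out = value;
    return true;
}

// Checks one ACL entry: a single address ("10.0.0.1") or, assumed here,
// an address with a prefix length ("192.168.0.0/24").
bool aclEntryMatches(const std::string& entry, std::uint32_t ip)
{
    const std::string::size_type slash = entry.find('/');
    int prefix = 32;  // a bare address must match exactly
    if (slash != std::string::npos) {
        std::istringstream bits(entry.substr(slash + 1));
        if (!(bits >> prefix) || prefix < 0 || prefix > 32) return false;
    }
    std::uint32_t net = 0;
    if (!parseIPv4(entry.substr(0, slash), net)) return false;
    // Shifting a 32-bit value by 32 is undefined, so /0 gets an explicit empty mask.
    const std::uint32_t mask = (prefix == 0) ? 0u : (0xFFFFFFFFu << (32 - prefix));
    return (ip & mask) == (net & mask);
}

// An address is allowed if any entry in the list matches it.
bool aclAllows(const std::vector<std::string>& acl, const std::string& addr)
{
    std::uint32_t ip = 0;
    if (!parseIPv4(addr, ip)) return false;
    for (const std::string& entry : acl)
        if (aclEntryMatches(entry, ip)) return true;
    return false;
}
```

With the example list, 10.0.0.1 and any 192.168.0.x address are accepted, and 0.0.0.0/0 (as used in the user account below) accepts every address.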
User Account • All accounts in ./conf/users • One account per file • Example: ./conf/users/test is the account configuration for account “test”
User Account
  PASSWORD xxx
  READ_PERMISSION /
  WRITE_PERMISSION /test /angle
  EXEC_PERMISSION TRUE
  ACL 0.0.0.0/0
  QUOTA 1000000
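The slides do not spell out how READ_PERMISSION and WRITE_PERMISSION values are matched. Assuming each value is a directory that grants access to itself and everything beneath it (so "/" grants the whole namespace), a permission check could be sketched as follows; insideDir and permitted are hypothetical helpers, not Sector functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rule: `path` is allowed if it equals, or lies inside, `dir`.
bool insideDir(const std::string& path, std::string dir)
{
    while (dir.size() > 1 && dir.back() == '/') dir.pop_back();  // "/test/" -> "/test"
    if (dir == "/") return !path.empty() && path[0] == '/';       // root grants everything
    if (path.compare(0, dir.size(), dir) != 0) return false;
    // "/test" must not grant "/testing": the prefix has to end at a '/' boundary.
    return path.size() == dir.size() || path[dir.size()] == '/';
}

// True if any listed directory (e.g. the WRITE_PERMISSION values) covers `path`.
bool permitted(const std::vector<std::string>& dirs, const std::string& path)
{
    for (const std::string& d : dirs)
        if (insideDir(path, d)) return true;
    return false;
}
```

The boundary check is the important design choice: a plain string-prefix test would wrongly let /test grant write access to /testing.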
Start the Security Server • ./sserver <port> • Default port is 5000
Configure the Master Server • ./conf/master.conf:
  SECTOR_PORT 6000
  SECURITY_SERVER ncdm161.lac.uic.edu:5000
  REPLICA_NUM 2
  DATA_DIRECTORY /home/u2/yunhong/work/data/
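Tools that need these settings could read them with a small parser. This sketch is not Sector's configuration reader; it assumes each key has exactly one whitespace-free value (on the same line or the next) and that '#' starts a comment. parseConf and splitHostPort are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads KEY/value pairs. Pairing tokens, rather than lines, means the value
// may sit on the same line as the key or on the following line.
std::map<std::string, std::string> parseConf(std::istream& in)
{
    std::map<std::string, std::string> conf;
    std::string line, word, pendingKey;
    while (std::getline(in, line)) {
        std::istringstream words(line.substr(0, line.find('#')));  // drop comments
        while (words >> word) {
            if (pendingKey.empty()) {
                pendingKey = word;        // a key: wait for its value
            } else {
                conf[pendingKey] = word;  // the value completes the pair
                pendingKey.clear();
            }
        }
    }
    return conf;
}

// Splits "host:port" (e.g. SECURITY_SERVER or MASTER_ADDRESS) at the last ':'.
bool splitHostPort(const std::string& text, std::string& host, int& port)
{
    const std::string::size_type colon = text.rfind(':');
    if (colon == std::string::npos || colon == 0) return false;
    std::istringstream digits(text.substr(colon + 1));
    if (!(digits >> port) || port <= 0 || port > 65535) return false;
    host = text.substr(0, colon);
    return true;
}
```

Fed the master.conf values above, parseConf yields four entries, and splitHostPort turns the SECURITY_SERVER value into the host ncdm161.lac.uic.edu and port 5000.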
Configure the Slaves • ./conf/slave.conf:
  MASTER_ADDRESS ncdm161.lac.uic.edu:6000
  DATA_DIRECTORY /raid/sector/data/
Start masters and slaves • ./start_master • ./start_slave • ./start_all • ./stop_all • start_all and stop_all require password-free SSH to every slave listed in ./conf/slaves.list
./conf/slaves.list
  gu@192.168.136.1 /home/gu/codeblue2/slave/
  gu@192.168.136.2 /home/gu/codeblue2/slave/
  gu@192.168.136.3 /home/gu/codeblue2/slave/
• Format: username@slave_ip, then a blank or tab, then slave_path
• slave_path is the slave directory of the Sector installation, NOT the slave data directory path!
• Sector will automatically restart an offline slave if its address is on this list
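A script or tool that walks this list could split each line as sketched here. The sketch assumes paths contain no spaces and simply skips blank or malformed lines; SlaveEntry and parseSlavesList are hypothetical names, not part of Sector.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct SlaveEntry {
    std::string user;  // part before '@'
    std::string host;  // slave IP address
    std::string path;  // slave installation directory (not the data directory)
};

// Parses lines of the form "username@slave_ip<blank or tab>slave_path".
std::vector<SlaveEntry> parseSlavesList(std::istream& in)
{
    std::vector<SlaveEntry> slaves;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);  // >> skips both blanks and tabs
        std::string login, path;
        if (!(fields >> login >> path)) continue;  // blank or incomplete line
        const std::string::size_type at = login.find('@');
        if (at == std::string::npos || at == 0 || at + 1 == login.size()) continue;
        slaves.push_back({login.substr(0, at), login.substr(at + 1), path});
    }
    return slaves;
}
```

Each entry then carries what is needed to reach the slave over password-free SSH: the login (user@host) and the directory to run the slave from.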
Configure the Client • ./conf/client.conf • Optional, but useful for the client tools and examples:
  MASTER_ADDRESS ncdm161.lac.uic.edu:6000
  USERNAME test
  PASSWORD xxx
  CERTIFICATE /home/gu/codeblue2/conf/master_node.cert
Check System Status
  $ cd client
  $ cd tools
  $ ./sysinfo
• Displays system information: the list of masters and slaves, available disk space, etc.
• Master log: ./master/sector.log
Accessing Sector FS • Tools: ./client/tools • ls, mkdir, stat, rm, download, upload, cp, mv • FUSE: ./client/fuse • make • mount: ./sector-fuse <local dir> • unmount: fusermount -u <local dir>
Programming with Sector • #include <fsclient.h> • Sector::init(master_ip, master_port); • Sector::login(username, password, cert); • Sector::logout(); • Sector::close();
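Since logout() and close() should run even if the program fails midway, one option is to tie the four calls to a C++ scope. The wrapper below is a hypothetical RAII sketch, not part of Sector: it assumes the functions are static (as the Sector:: notation suggests) and that init() and login() return a negative value on failure. FakeClient is a stand-in that only records the call order, so the sketch runs without a Sector installation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Runs init/login on construction and logout/close on destruction.
template <typename Client>
class SectorSession {
public:
    template <typename Port, typename Cert>
    SectorSession(const std::string& masterIp, const Port& masterPort,
                  const std::string& user, const std::string& password, const Cert& cert)
    {
        if (Client::init(masterIp, masterPort) < 0)
            throw std::runtime_error("init failed");
        if (Client::login(user, password, cert) < 0) {
            Client::close();  // undo init; the destructor will not run
            throw std::runtime_error("login failed");
        }
    }
    ~SectorSession()
    {
        Client::logout();
        Client::close();
    }
    SectorSession(const SectorSession&) = delete;
    SectorSession& operator=(const SectorSession&) = delete;
};

// Stand-in client used only to exercise the wrapper.
struct FakeClient {
    static std::string calls;
    static bool failLogin;
    static int init(const std::string&, int) { calls += "init,"; return 0; }
    static int login(const std::string&, const std::string&, const char*)
    {
        calls += "login,";
        return failLogin ? -1 : 0;
    }
    static void logout() { calls += "logout,"; }
    static void close() { calls += "close,"; }
};
std::string FakeClient::calls;
bool FakeClient::failLogin = false;
```

With the real library the intended use would be SectorSession<Sector> session(master_ip, master_port, username, password, cert); with the session ending when the variable goes out of scope.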
Programming with Sector • Sector::list(path, vector<SNode>& attr) • Sector::stat(path, SNode& attr) • Sector::mkdir(path) • Sector::move(src, dst) • Sector::remove(path) • Sector::copy(src, dst) • Sector::utime(path, ts)
SNode • std::string m_strName; • bool m_bIsDir; • std::set<Address, AddrComp> m_sLocation; • int64_t m_llTimeStamp; • int64_t m_llSize;
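A client could use these fields to summarize a listing returned by Sector::list, for example to spot files with fewer copies than REPLICA_NUM. The slides do not show Address or AddrComp, so the versions below are hypothetical stand-ins, and treating m_sLocation as the set of slaves holding a copy is an assumption; summarize is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// HYPOTHETICAL stand-ins for the types the slide does not define.
struct Address { std::string m_strIP; int m_iPort; };
struct AddrComp {
    bool operator()(const Address& a, const Address& b) const
    {
        return a.m_strIP != b.m_strIP ? a.m_strIP < b.m_strIP : a.m_iPort < b.m_iPort;
    }
};

// SNode fields as listed on the slide.
struct SNode {
    std::string m_strName;
    bool m_bIsDir;
    std::set<Address, AddrComp> m_sLocation;  // assumed: slaves holding a copy
    int64_t m_llTimeStamp;
    int64_t m_llSize;
};

struct ListingSummary { int files = 0; int dirs = 0; int64_t bytes = 0; int underReplicated = 0; };

// Counts files and directories, totals file sizes, and flags files with fewer
// locations than `replicas` (e.g. REPLICA_NUM from master.conf).
ListingSummary summarize(const std::vector<SNode>& entries, int replicas)
{
    ListingSummary s;
    for (const SNode& n : entries) {
        if (n.m_bIsDir) { ++s.dirs; continue; }
        ++s.files;
        s.bytes += n.m_llSize;
        if (static_cast<int>(n.m_sLocation.size()) < replicas) ++s.underReplicated;
    }
    return s;
}
```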
Sector Files • SectorFile handle; • handle.open(path, mode); • handle.read(buf, size); • handle.write(buf, size); • handle.close(); • seekp, seekg, tellp, tellg, upload, download
Sphere Programming
• Serial:
  for each file F in (SDSS datasets)
    for each image I in F
      findBrownDwarf(I, …);
• Sphere:
  SphereStream sdss;
  sdss.init("sdss files");
  SphereProcess myproc;
  myproc.run(sdss, "findBrownDwarf", …);
  myproc.read(result);
• UDF:
  findBrownDwarf(char* image, int isize, char* result, int rsize);
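Record Offset Index
• Data:
  Text1 text1 text1 text1
  Text2 text2
  Text3 text3 text3
• Index: 0 23 44 61
• The index is a binary file of 64-bit integers (the start offset of each record, plus a final offset marking the end of the data), named with the suffix ".idx"
• Example: user.dat / user.dat.idx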
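For newline-terminated text records, such an index could be generated as sketched below. This is not Sector's own index tool: the choice of newline as record separator and the host byte order of the written integers are assumptions, and buildLineIndex and writeIndexFile are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Start offset of each record plus a final entry equal to the data size,
// so N records produce N + 1 offsets.
std::vector<std::int64_t> buildLineIndex(const std::string& data)
{
    std::vector<std::int64_t> index{0};
    for (std::string::size_type i = 0; i < data.size(); ++i)
        if (data[i] == '\n') index.push_back(static_cast<std::int64_t>(i + 1));
    // A last record without a trailing newline still needs its end offset.
    if (index.back() != static_cast<std::int64_t>(data.size()))
        index.push_back(static_cast<std::int64_t>(data.size()));
    return index;
}

// Writes the offsets as raw 64-bit integers to dataPath + ".idx".
bool writeIndexFile(const std::string& dataPath, const std::vector<std::int64_t>& index)
{
    std::ofstream out(dataPath + ".idx", std::ios::binary);
    out.write(reinterpret_cast<const char*>(index.data()),
              static_cast<std::streamsize>(index.size() * sizeof(std::int64_t)));
    out.close();
    return !out.fail();
}
```

Record i then spans bytes [index[i], index[i + 1]), which is what lets Sphere hand whole records, rather than arbitrary byte ranges, to a UDF.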
Hashing and Bucket Files • Similar to the Reduce process in MapReduce • Each output record is assigned a bucket ID • Records with the same bucket ID will be sent to the same bucket file
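The essential property is that equal keys always map to the same bucket ID. A common way to get that is hash-modulo, sketched below; std::hash is an illustrative choice, not necessarily what Sphere or a given UDF uses, and bucketId and groupByBucket are hypothetical names. In a UDF the computed ID would presumably be stored in the corresponding m_piBucketID entry.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Maps a record key to one of numBuckets bucket IDs; equal keys always agree.
int bucketId(const std::string& key, int numBuckets)
{
    return static_cast<int>(std::hash<std::string>{}(key) % static_cast<std::size_t>(numBuckets));
}

// Groups keys by bucket ID, mimicking records being sent to bucket files.
std::vector<std::vector<std::string>> groupByBucket(const std::vector<std::string>& keys, int numBuckets)
{
    std::vector<std::vector<std::string>> buckets(static_cast<std::size_t>(numBuckets));
    for (const std::string& k : keys)
        buckets[static_cast<std::size_t>(bucketId(k, numBuckets))].push_back(k);
    return buckets;
}
```

Because the mapping depends only on the key, a later stage can process each bucket file independently, as a Reduce task processes one partition.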
User Defined Function (UDF) • int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
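UDF::SInput
  struct SInput
  {
     char* m_pcUnit;       // input data for this call
     int m_iRows;          // number of rows (records) in m_pcUnit
     int64_t* m_pllIndex;  // record offset index of those rows
     char* m_pcParam;      // parameter passed to SphereProcess::run()
     int m_iPSize;         // size of m_pcParam
  };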
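UDF::SOutput
  struct SOutput
  {
     char* m_pcResult;     // result buffer
     int m_iBufSize;       // allocated size of m_pcResult
     int m_iResSize;       // size of the result actually written
     int64_t* m_pllIndex;  // record offset index of the output rows
     int m_iIndSize;       // allocated size of m_pllIndex
     int m_iRows;          // number of output rows
     int* m_piBucketID;    // bucket ID of each output row
     int64_t m_llOffset;   // file position to resume from; -1 when done
     string m_strError;    // error message
  };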
UDF::SOutput • If m_pcResult or m_pllIndex is not large enough, resize it • When processing a whole file, if the result is too large, set m_llOffset to the current file position; the UDF will be called again and should resume processing from m_llOffset, until it sets m_llOffset to -1
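The resume protocol can be simulated locally. In this hypothetical sketch, copy_in_chunks emits at most m_iBufSize bytes per call, and runUntilDone stands in for the framework by calling it until m_llOffset becomes -1. It assumes the whole file arrives in m_pcUnit as a single row, that m_llOffset starts at 0, and that it keeps its value between calls; the real Sphere runtime may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <string>

// Mirrors of the slide structs.
struct SInput { char* m_pcUnit; int m_iRows; int64_t* m_pllIndex; char* m_pcParam; int m_iPSize; };
struct SOutput { char* m_pcResult; int m_iBufSize; int m_iResSize; int64_t* m_pllIndex;
                 int m_iIndSize; int m_iRows; int* m_piBucketID; int64_t m_llOffset; std::string m_strError; };
struct SFile { std::string m_strHomeDir; std::string m_strLibDir; std::string m_strTempDir;
               std::set<std::string> m_sstrFiles; };

// Hypothetical whole-file UDF: copies the file through, one buffer at a time.
int copy_in_chunks(const SInput* input, SOutput* output, SFile*)
{
    assert(output->m_iBufSize > 0 && output->m_iIndSize >= 2);  // else no progress
    const int64_t fileSize = input->m_pllIndex[1] - input->m_pllIndex[0];
    const int64_t start = output->m_llOffset;
    const int64_t n = std::min<int64_t>(output->m_iBufSize, fileSize - start);
    std::memcpy(output->m_pcResult, input->m_pcUnit + start, static_cast<std::size_t>(n));
    output->m_iResSize = static_cast<int>(n);
    output->m_iRows = 1;
    output->m_pllIndex[0] = 0;
    output->m_pllIndex[1] = n;
    // Data left: record where to resume. Otherwise signal completion with -1.
    output->m_llOffset = (start + n < fileSize) ? start + n : -1;
    return 0;
}

// Stand-in for the framework: repeats the call and concatenates the pieces.
std::string runUntilDone(const SInput& in, SOutput& out)
{
    std::string all;
    out.m_llOffset = 0;
    do {
        copy_in_chunks(&in, &out, nullptr);
        all.append(out.m_pcResult, static_cast<std::size_t>(out.m_iResSize));
    } while (out.m_llOffset != -1);
    return all;
}
```

With a 10-byte file and a 4-byte result buffer, the UDF runs three times (4, 4, and 2 bytes) before setting m_llOffset to -1.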
UDF::SFile
  struct SFile
  {
     std::string m_strHomeDir;
     std::string m_strLibDir;
     std::string m_strTempDir;
     std::set<std::string> m_sstrFiles;
  };
• Results can also be written into local files; put the paths of those files into m_sstrFiles
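A UDF that writes a side file could use a helper like the one below. The choice of m_strTempDir as the target directory and the storing of full paths in m_sstrFiles are assumptions, since the slide only says the paths go into m_sstrFiles; writeResultFile is a hypothetical name.

```cpp
#include <cassert>
#include <fstream>
#include <set>
#include <string>

// Mirror of the slide struct.
struct SFile { std::string m_strHomeDir; std::string m_strLibDir; std::string m_strTempDir;
               std::set<std::string> m_sstrFiles; };

// Writes `content` to m_strTempDir/name and, only on success, records the path.
bool writeResultFile(SFile* file, const std::string& name, const std::string& content)
{
    std::string dir = file->m_strTempDir;
    if (!dir.empty() && dir.back() != '/') dir += '/';
    const std::string path = dir + name;
    std::ofstream out(path, std::ios::binary);
    if (!(out << content)) return false;  // also catches a failed open
    out.close();
    if (out.fail()) return false;
    file->m_sstrFiles.insert(path);
    return true;
}
```

Recording the path only after the write succeeds avoids reporting a file that was never fully written.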
UDF
• _FUNCTION_.cpp:
  #include <sphere.h>

  extern "C"
  {
     int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
     {
        // read rows from input, write results into output
        return 0;
     }
  }
• Compile it into a shared library, e.g., FUNC.so
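As a filled-in version of this template, the hypothetical UDF below upper-cases every input row and emits one output row per input row. To compile on its own it mirrors the slide structs instead of including sphere.h. It assumes m_pcUnit begins at offset m_pllIndex[0], that the input index has m_iRows + 1 entries, and that the output buffers were allocated with new[] so the UDF may grow them; the real framework's buffer ownership may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <set>
#include <string>

// Mirrors of the slide structs (a real UDF gets them from sphere.h).
struct SInput { char* m_pcUnit; int m_iRows; int64_t* m_pllIndex; char* m_pcParam; int m_iPSize; };
struct SOutput { char* m_pcResult; int m_iBufSize; int m_iResSize; int64_t* m_pllIndex;
                 int m_iIndSize; int m_iRows; int* m_piBucketID; int64_t m_llOffset; std::string m_strError; };
struct SFile { std::string m_strHomeDir; std::string m_strLibDir; std::string m_strTempDir;
               std::set<std::string> m_sstrFiles; };

extern "C" int upper_case(const SInput* input, SOutput* output, SFile* /*file*/)
{
    const int64_t base = input->m_pllIndex[0];
    const int64_t total = input->m_pllIndex[input->m_iRows] - base;

    if (output->m_iBufSize < total) {  // result buffer too small: resize
        delete[] output->m_pcResult;
        output->m_pcResult = new char[total];
        output->m_iBufSize = static_cast<int>(total);
    }
    if (output->m_iIndSize < input->m_iRows + 1) {  // index buffer too small: resize
        delete[] output->m_pllIndex;
        output->m_pllIndex = new int64_t[input->m_iRows + 1];
        output->m_iIndSize = input->m_iRows + 1;
    }

    for (int64_t i = 0; i < total; ++i)
        output->m_pcResult[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(input->m_pcUnit[i])));

    // Upper-casing keeps each row's length, so the output index is the
    // input index rebased to start at 0.
    for (int r = 0; r <= input->m_iRows; ++r)
        output->m_pllIndex[r] = input->m_pllIndex[r] - base;

    output->m_iResSize = static_cast<int>(total);
    output->m_iRows = input->m_iRows;
    return 0;
}
```

In a real build the mirrored structs would be replaced by #include <sphere.h>, and the file compiled into a shared library, for example with g++ -fPIC -shared -o upper_case.so upper_case.cpp (the include path for sphere.h depends on the installation).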
A Sphere Program
  #include <dcclient.h>

  Sector::init();
  Sector::login(…);
  SphereStream input;
  SphereStream output;
  SphereProcess myProc;
  myProc.loadOperator("func.so");
  myProc.run(input, output, "func", 0);
  SphereResult* result = NULL;
  myProc.read(result);
  myProc.close();
  Sector::logout();
  Sector::close();
Sphere Stream
• Input:
  vector<string> files;
  files.push_back("/html");
  SphereStream s;
  s.init(files);
• Output:
  SphereStream temp;
  temp.setOutputPath("/result", "bucket");
  temp.init(256);   // number of bucket files
Upload UDF and related files • SphereProcess::loadOperator(path) • Sends the UDF to all slaves selected for the current process • Can also send any other files (applications, parameter data, etc.) • On the slaves, the UDF finds these files in SFile::m_strLibDir
Run a Sphere Process
  int run(const SphereStream& input, SphereStream& output, const string& op, const int& rows, const char* param = NULL, const int& size = 0);
• rows: number of rows passed to the UDF in each call
  • N > 0: N rows
  • 0: the whole segment
  • -1: the whole file
Read Result and Check Progress • SphereProcess::read(SphereResult*& res, const bool& inorder = false, const bool& wait = true); • If output.init(0), results will be sent back to the client • int checkProgress();