BEHOLD, WE HAVE SIGNAL. Software Architecture in Cyberspace. Mary Shaw Carnegie Mellon University http://www.cs.cmu.edu/~shaw/. The Internet infrastructure supports a creative and thriving social and economic system. The Internet is much richer than its communication infrastructure.
Software Architecture in Cyberspace. Mary Shaw, Carnegie Mellon University. http://www.cs.cmu.edu/~shaw/
The Internet infrastructure supports a creative and thriving social and economic system. • The Internet is much richer than its communication infrastructure. • Architectural styles help explain software systems. • What is an appropriate theory of architectural style for Internet applications?
The Internet infrastructure supports a creative and thriving social and economic system. • End users are integral to the Internet, not simply its audience • end users are active participants, not passive consumers • end user producers outnumber technical professionals • Architectural styles help explain software systems. • What is an appropriate theory of architectural style for Internet applications?
Demographics of US Internet users
• Overall: total adults 73%, women 73%, men 73%
• Age: 18-29 90%, 30-49 85%, 50-64 70%, 65+ 35%
• Geography: urban 74%, suburban 77%, rural 63%
• Education: < high school 44%, high school 63%, some college 84%, college+ 91%
Pew Internet & American Life Project, 2008
Activities of online users (“have you ever?”)
Activity                           % of ‘net users   Date
Send or read email                 92                Dec 07
Use search engine                  89                May 08
Search for map/directions          86                Dec 06
Look for info on hobby/interest    83                F-M 07
Check on product before buying     81                Sep 07
Check weather                      80                May 08
Look for health/medical info       75                Dec 07
Get travel info                    73                M-J 04
Get news                           73                May 08
Buy product                        71                Dec 07
Visit government website           66                May 08
Buy travel reservation             64                Sep 07
Surf for fun                       62                F-A 06
+ 65 more
Pew Internet & American Life Project, 2008
Content from online users (“have you ever?”)
Activity                                   % of ‘net users   Date
Upload photos to share                     37                Aug 06
Rate product with online rating system     32                Sep 07
Post comment or review of product          30                Sep 07
Use online social networking site          29                Dec 06
Categorize or tag online content           28                Dec 06
Share files from your computer             27                M-J 05
Post comments to online group or blog      22                Dec 07
Share something online you created         21                Dec 07
Create content for the internet            19                Nov 04 !!
Share files with peer-to-peer sharing      15                May 08
Create or work on your own webpage         14                Dec 07
Create web pages/blogs for others          13                Dec 07
Create or work on your own blog            12                May 08
Participate in online health discussion    12                Aug 06
Pew Internet & American Life Project, 2008
Generation differences
                   Online   Gen Y    Gen X    Trailing   Leading    Matures   After
                   Teens                      Boomer     Boomer               Work
Activity           12-17    18-28    29-40    41-50      51-59      60-69     70+
Go online          87       84       87       79         75         64        21
Teens and Gen Y more likely than older users:
Online games       81       54       37       29         25         25        32
Inst message       75       66       52       38         42         33        25
Download music     51       45       28       16         14         8         5
Download video     31       27       22       14         8          8         1
Gen X or older generations dominate:
Get health info             73       84       80         84         68        72
Travel rez                  50       72       64         64         59        60
Bank online                 38       50       44         37         35        22
Pew Internet & American Life Project, 2005
There are lots of end users. Using data from the Bureau of Labor Statistics, we estimate that over 90M Americans will use computers at work in 2012. Of these, only about 2.5M will be professional programmers; 40.5M will be managers and (non-software) professionals. This does not include home users or non-US users, so there will be many more than 90M total end users. Most of them will “program” in some way. C. Scaffidi, M. Shaw, and B. Myers. Estimating the Numbers of End Users and End User Programmers. VL/HCC'05: Proc 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 207-214, 2005.
End users are not software engineers • End users do not have rich and robust mental models of their computing systems • They fail to do backups • They misunderstand storage models (especially local vs network storage) • They execute malware • They innocently engage in other risky behavior • The response of SE to this mismatch between real computing systems and end users’ models has been to seek ways to help the users act “rationally” • But there is lots of evidence that people do not reason that way.
Human network is part of the Internet • Intrinsic property (say, dependability) of a unit in a specific (or assumed) environment, based on a specific set of attributes • + context, value proposition: Situational dependency, reflecting the environment and a specific user’s needs, tolerances, priorities, expectations • + perception, understanding humans: Pragmatic dependability – as realized in practice – which results from decisions made by people and social groups and by the ways users behave
The Internet infrastructure supports a creative and thriving social and economic system. • The Internet is much richer than its communication infrastructure. • This is an Ultra-Large Scale System • Usual software system assumptions don’t hold • Architectural styles help explain software systems. • What is an appropriate theory of architectural style for Internet applications?
Ultra-Large-Scale Systems • More than “systems of systems” or “networks of networks” • Large size on many dimensions … • Lines of code, amount of data, users, dependencies among and complexity of components, etc • … but more than that • Decentralized operation and control • Conflicting, unknowable, diverse requirements • Continuous evolution and deployment • Heterogeneous, inconsistent, changing elements • Indistinct people/system boundary • Normal failures • New forms of acquisition and policy SEI. Ultra-Large-Scale Systems. 2006
Decentralized operation and control • ULS system scale offers only limited possibilities for central or hierarchical control • Long life, multiple users and objectives, span of physical jurisdictions are the norm at ULS scale • Many versions of subsystems must work together • Modifications are developed and installed by independent groups • Spontaneous, unanticipated new uses arise Undermines common assumption: • All conflicts must be resolved and must be resolved uniformly
Conflicting, unknowable, diverse requirements • ULS systems serve wide range of competing needs • Competing users contend for requirements • Understanding of problem evolves • Dependability is “better/worse”, not “right/wrong” • Wicked problems Undermines common assumptions: • Requirements are known in advance, evolve slowly • Tradeoff decisions will be stable
Continuous evolution and deployment • ULS systems have long lives and multiple independent developers • Different groups may install capability for their own needs; this may conflict with other groups • Evolution can’t be controlled centrally; it must be shaped by rules and policies that protect critical services and allow diversity at the edges Undermines common assumptions: • System improvements are introduced in “releases” • Users/developers know about releases and can choose to accept them or not
Heterogeneous, inconsistent, changing elements • ULS systems will be composed from independently-created components • Heterogeneous: many sources, no single interface standard, often incorporating legacy systems • Inconsistent: evolution spontaneous, not planned; different objectives may cause inconsistent versions • Changing: hardware, software, operating environment change based on local decisions Undermines common assumptions: • Effect of change can be predicted adequately • Configuration information is accurate and controlled • Components and users are fairly homogeneous
Indistinct people/system boundary • ULS systems’ service to a user depends on actions of other users; the user/developer distinction is soft • User actions may affect overall system health • System must adapt to changing usage patterns • Aggregate analysis may be better than exact analysis Undermines common assumptions: • Users’ behavior doesn’t affect overall system • Collective behavior of people is not relevant • Social interactions are not relevant
Normal failures • ULS system scale implies inevitable failures, so systems must do protection/recovery/enforcement • Hardware failures inevitable because of scale • Legitimate use of software and services outside planned capability will cause degradation/failure • Malicious use will cause problems Undermines common assumptions: • Failures will be infrequent and exceptional • Defects can be removed
New forms of acquisition and policy • ULS systems will evolve, but there must be governance to prevent anarchy • Success of system depends on organic evolution • Individual developers won’t fully understand core infrastructure • Need effective guidance on allowed/unallowed change Undermines common assumption: • There is a single agent responsible for system development, operation, and evolution
Analogy: Cities and city planning • Cities are complex systems • Built of individual components chosen by individuals • Constantly evolve • Withstand failures and attacks • Cities are not centrally controlled • Standards for infrastructures • Building codes, highway standards • Policies that allow individual action within constraints • Zoning laws • Regulations that govern individual action • Enforcement after the fact, rather than prior constraint
The Internet infrastructure supports a creative and thriving social and economic system. • The Internet is much richer than its communication infrastructure. • Architectural styles help explain software systems. • Style establishes the abstract structure of a system • Character of problem should lead to style for solution • What is an appropriate theory of architectural style for Internet applications?
Architectural style in software • Styles are rooted in common knowledge, intuition, informal prose, box-and-line diagrams • A style is a set of design rules that identify • which kinds of components make up a system • which kinds of connectors compose the components • how control is shared among components • how data is transmitted through the system • what type of reasoning is compatible with the style • Focus on abstractions used by designers • In practice, implemented as processes and calls … • … but that loses the designer’s intent M. Shaw and P. Clements, A Field Guide to Boxology… COMPSAC 1997
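The five kinds of design rules listed on this slide can be written down as a plain record. The following is a minimal sketch, not a notation from the talk: the class name `ArchitecturalStyle`, the helper `styles_supporting`, and the wording of each field value are assumptions made for illustration, with the values borrowed loosely from vocabulary used elsewhere in the deck.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitecturalStyle:
    """A style recorded as the five kinds of design rules on the slide."""

    name: str
    components: tuple[str, ...]  # which kinds of components make up a system
    connectors: tuple[str, ...]  # which kinds of connectors compose them
    control: str                 # how control is shared among components
    data: str                    # how data is transmitted through the system
    reasoning: str               # type of reasoning compatible with the style


# Illustrative entries; the value wording is an assumption of this sketch.
CLIENT_SERVER = ArchitecturalStyle(
    name="client-server",
    components=("process",),
    connectors=("message protocol",),
    control="star topology, synchronous",
    data="star topology, explicitly passed",
    reasoning="nondeterminism",
)

DATAFLOW_NETWORK = ArchitecturalStyle(
    name="dataflow network",
    components=("transducer",),
    connectors=("data stream",),
    control="asynchronous",
    data="passed along the network",
    reasoning="functional composition",
)

CATALOG = (CLIENT_SERVER, DATAFLOW_NETWORK)


def styles_supporting(reasoning: str, catalog=CATALOG) -> list[str]:
    """Return the names of catalog styles compatible with a kind of reasoning."""
    return [style.name for style in catalog if style.reasoning == reasoning]
```

Keeping the designer's vocabulary in the record, rather than only the processes and calls used to implement it, is one way to avoid losing the intent the slide mentions.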
Examples of styles • Data flow styles • Batch sequential, dataflow network, closed-loop control • Call-and-return styles • Main program/subroutines; object-oriented systems • Interacting process styles • Communicating processes, event subsystems • Data-centered repository styles • Transactional database, blackboard • Data-sharing styles • Compound document, hypertext, Fortran common • Hierarchical styles • Layers
Members of Interacting Process Family
All members of the family are dominated by message-passing protocols among independent, usually concurrent processes with sporadic low-volume traffic.
                    ======== Control ========    ======= Data =======
Substyle            Topology    Synchronicity    Topology     Mode
Comm proc           arb         not seq          arb          any
1-way data flow     linear      asynch           linear       passed
Client-server       star        synch            star         passed
Heartbeat           hier        ls/par           hier/star    pass, share
Broadcast           arb         asynch           star         bdcast
Token passing       arb         asynch           arb          passed
G. Andrews, Paradigms for Process Interaction in Distributed Programs. ACM Comp Surv 1991
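The family table above can be held as data and queried by attribute, which is one way to "discriminate among different styles". This is a hedged sketch: the names `Substyle`, `INTERACTING_PROCESS_FAMILY`, and `substyles_matching` are hypothetical, and expanding the slide's abbreviations (for example "arb" to "arbitrary") is an assumption. The entry "ls/par" is kept as written.

```python
from typing import NamedTuple


class Substyle(NamedTuple):
    control_topology: str
    control_synchronicity: str
    data_topology: str
    data_mode: str


# Rows transcribed from the slide's table; abbreviations expanded by assumption.
INTERACTING_PROCESS_FAMILY = {
    "communicating processes": Substyle("arbitrary", "not sequential", "arbitrary", "any"),
    "one-way data flow": Substyle("linear", "asynchronous", "linear", "passed"),
    "client-server": Substyle("star", "synchronous", "star", "passed"),
    "heartbeat": Substyle("hierarchical", "ls/par", "hierarchical/star", "passed, shared"),
    "broadcast": Substyle("arbitrary", "asynchronous", "star", "broadcast"),
    "token passing": Substyle("arbitrary", "asynchronous", "arbitrary", "passed"),
}


def substyles_matching(**wanted: str) -> list[str]:
    """Return substyle names whose attributes equal every requested value."""
    return sorted(
        name
        for name, substyle in INTERACTING_PROCESS_FAMILY.items()
        if all(getattr(substyle, attr) == value for attr, value in wanted.items())
    )
```

For example, `substyles_matching(control_synchronicity="asynchronous", data_topology="star")` selects only the broadcast substyle, because it is the only asynchronous row with a star data topology.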
Language of Styles: Constituent Parts • Component: unit of software that performs some function at runtime (process, stand-alone program, procedure, transducer, manager, memory) • Connector: mechanism that mediates communication, coordination, cooperation among components (procedure call, data stream, shared representation, direct access, message protocol, implicit invocation)
Language of Styles: Control Issues • Topology: geometric form of control flow (linear, acyclic, hierarchical, arbitrary) • Synchronicity: dependency among components (synchronous, asynchronous, opportunistic) • Binding time: when control relations are set up (compile time, initialization time, while running)
Language of Styles: Data Issues • Topology: geometric form of data flow (linear, acyclic, hierarchical, arbitrary) • Continuity: how continuous the flow is (continuous, sporadic at discrete times) • Mode: how data is made available (explicit passing, sharing, broadcast, multicast) • Binding time: when data relations are set up (compile time, initialization time, while running)
Language of Styles: Reasoning • Data flow styles • Functional composition • Call-and-return styles • Hierarchy (local reasoning) • Interacting process styles • Nondeterminism • Data-centered repository styles • Data integrity (ACID, convergence, invariants) • Data-sharing styles • Representation • Hierarchical styles • Levels of service
Utility of Arch Description Languages • Establish uniform descriptive notation • Clarify informal concepts • Document design intent • Discriminate among different styles • Bring out significant differences that affect suitability for various tasks • Provide advice on selecting style for a problem • Following Jackson, characterize problem domain and use this to select appropriate abstract architecture • Separate concerns about fit of architecture to problem from implementation and performance issues • Allow comparison of alternatives, potentially simulation and analysis
The Internet infrastructure supports a creative and thriving social and economic system. • The Internet is much richer than its communication infrastructure. • Architectural styles help explain software systems. • What is an appropriate theory of architectural style for Internet applications? • Style establishes the abstract structure of a system • Character of problem should lead to style for solution
Picking a target in cyberspace • Cyberspace supports many sophisticated applications, interactions, and communities, including everything from IM/email to enterprise integration. • Focus here on the web as used by everyday people • This is the most different from conventional software systems • We need to start with a specific focus • End users are in dire need of support
Example: Twitter Vote Report. On US Election Day 2008, this application, set up by an interested user community, collected reports on the status of polling places and displayed them on a map in near real time so that interested people, especially the public, could spot problem areas.
VoteReport input stream. Data was collected by Twitter, SMS, and telephone call. A stylized format was provided (and often ignored). The message feed was available.
VoteReport composition. The VoteReport application interpreted the Twitter/SMS feed, geo-referenced the data, and mapped them, presumably through the efforts of professional programmers. The report could be embedded in a web site by anyone. The input stream was also available.
Mashups • Mashups combine functionality of multiple websites or augment existing websites; they show how users build on existing systems in new and innovative ways. • Wong did a qualitative survey of high-quality mashups to see how they used/improved existing web sites, how they combined data from multiple sites, and what kinds of user tasks they support • Overall impression: mashups are ad hoc and idiosyncratic J. Wong and J. Hong. What do we “mashup” when we make mashups? WEUSE IV 08
Capabilities of Mashups • Mashups provided a variety of capabilities, often more than one per mashup • Search: Does it have a search interface? • Visualization: Does it add visualization to the data? • Real-time: Is the purpose to allow the user to monitor the original website as a real-time data set? • Widget: Is it actually a widget or plugin? • Personalized: Does it use personal information or enable personalization? • Folksonomy: Does it use or add tagging? • In-situ use: Does it simply tailor a website for a specific situation or use?
Categories of mashups • Aggregation: aggregate multiple websites or summarize sets of data • Alternate UI & In-situ use: provide new ways to interact with a website or support specific use cases • Personalization: specialize a website based on personal data from that site or another source • Focused View of Data: index or categorize contents of another website • Real-time Monitoring: react to changes in a website and make user aware of them
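The Aggregation category, combined with the Search capability, can be illustrated with a short sketch. It works on in-memory items rather than live feeds, and everything named here (`Item`, `aggregate`, the sample sources and `example.org` links) is hypothetical. It does not describe any mashup in the survey.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Item:
    source: str          # which website or feed the item came from
    title: str
    link: str
    published: datetime


def aggregate(*feeds: Iterable[Item], search: Optional[str] = None) -> list[Item]:
    """Merge several feeds, drop duplicate links, optionally filter by a
    search term, and return the result newest first."""
    seen_links = set()
    merged = []
    for feed in feeds:
        for item in feed:
            if item.link in seen_links:
                continue  # same story reported by more than one source
            if search and search.lower() not in item.title.lower():
                continue
            seen_links.add(item.link)
            merged.append(item)
    return sorted(merged, key=lambda item: item.published, reverse=True)


# Hypothetical sample data standing in for two RSS feeds.
NEWS = [
    Item("news", "Polls open early", "http://example.org/a", datetime(2008, 11, 4, 7, 0)),
    Item("news", "Weather update", "http://example.org/b", datetime(2008, 11, 4, 8, 0)),
]
BLOG = [
    Item("blog", "Polls open early", "http://example.org/a", datetime(2008, 11, 4, 7, 5)),
    Item("blog", "Long lines at polls", "http://example.org/c", datetime(2008, 11, 4, 9, 0)),
]
```

Calling `aggregate(NEWS, BLOG, search="polls")` keeps the two poll-related items and drops the duplicate link, which is the kind of ad hoc combination the survey describes.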
Implementations of mashups • Inputs • RSS feeds, search engine feeds, personal names, user ids, settings, ranges, geographical points and directions, search terms, tracking numbers, ISBNs, headlines, tags, “clippings” (screen scraping?) • Outputs • RSS feeds, list of links, visualizations, text, maps, pictures, numeric tables, lists, reports • APIs • Delicious, Yelp, craigslist, Flickr, city information feeds, eBay, Google Maps, Google Earth, USGS DRM, Amazon, UPS, USPS, FedEx, iTunes, Napster, news sites, Wikipedia, Daylife, Rhapsody, search engines, Twitter, YouTube, MySpace
Architectural Task for Web Applications • How do web applications differ from classical software systems? • Richer set of component types • Much stronger role for data and its presentation • More integration by inclusion or embedding • More significance for initiative (push/pull) • Minimal reliance on compiler for integration • How much of classic architectural style carries over? How much extension is required? • Structure (component/connector/composition) remains valid • Details partially hold, but require extension
Distinguishing the Classes of Solutions • One role of an architecture for cyberspace is to explain the structural distinctions among the applications that support user activities • Sometimes differences matter • You wouldn’t use IM for offline backup of photo files • What attributes of IM, photo files, and the backup task show the mismatch? • Sometimes the same application can be realized in different ways • Both FTP and P2P support file sharing, but with different resource implications • What attributes show the abstract similarity, and what attributes show the resource implications?
Elementary Component Types (provisional) • Text (human-readable, file or message) • Encoded file (typed set of bits) • Database (structured, with query capability) • Web page (human-readable, possibly generated) • Computation or service • Stream (continuous, nonpersistent) • Human (yes, people are part of the system) • Attributes of interest • state persistence, qualitative size, type/internal structure, rate of change (static/dynamic)
Connector Primitives (provisional)
• The lowest level of primitive is the TCP/UDP port
  20, 21       FTP
  23           Telnet
  25           SMTP
  80, 443      HTTP, HTTPS
  109, 110     POP2, POP3
  143, 220     IMAP
  194          IRC
  666          Doom
  3724         World of Warcraft
  4664         Google Desktop
  5190         AOL IM
  6881-6988    BitTorrent
  … and a couple of thousand others
• Attributes of interest
• activation (push/pull/continuous/interactive query-respond); arity; signature; knowledge of targets; abstract protocol; synchronicity; performance (rates, bandwidth, latency); state of interaction (persistence, location); duration of interaction
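The component and connector attributes of interest suggest how the earlier question about IM and photo backup might be answered mechanically. The sketch below is one possible reading, not the talk's method: the record types, the three-step size scale, the `mismatches` check, and the attribute values given to IM and FTP are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

SIZES = ("small", "medium", "large")  # assumed qualitative size scale


@dataclass(frozen=True)
class Component:
    kind: str         # one of the elementary component types
    persistent: bool  # state persistence
    size: str         # qualitative size


@dataclass(frozen=True)
class Connector:
    name: str
    activation: str     # push / pull / continuous / interactive
    keeps_state: bool   # does the interaction leave persistent state?
    carries_up_to: str  # largest qualitative size it handles comfortably


def mismatches(item: Component, via: Connector, needs_persistence: bool) -> list[str]:
    """List reasons why a connector is a poor fit for moving an item in a task."""
    problems = []
    if needs_persistence and not via.keeps_state:
        problems.append(f"{via.name} does not persist what it delivers")
    if SIZES.index(item.size) > SIZES.index(via.carries_up_to):
        problems.append(
            f"{item.kind} is {item.size}; {via.name} suits {via.carries_up_to} units"
        )
    return problems


# Attribute values below are assumptions of this sketch.
PHOTO = Component(kind="encoded file", persistent=True, size="large")
IM = Connector("IM", activation="push", keeps_state=False, carries_up_to="small")
FTP = Connector("FTP", activation="pull", keeps_state=True, carries_up_to="large")
```

With these values, `mismatches(PHOTO, IM, needs_persistence=True)` reports both a persistence and a size problem, while the same call with `FTP` reports none. A fuller treatment would also compare latency, bandwidth, and duration of interaction.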
Composition of primitives • Abstractly, we can recognize • Simple connections • Scripting, linking • Remote call in various forms • RPC, service invocation • Extended sessions • Browser session with session ID • User-facing compositions • Installation of plug-in, mashups • Enterprise compositions • SOA, SaaS
Example: Internet Messaging • A human creates short unstructured text units and pushes them via IRC to another known person. They are delivered with low latency, but they are not persistent.
Example: Grapevine model of email • An email system has five components: a composer, a reader, a database for archiving messages, a directory system, and a transport mechanism. • A message is a structured text entity. It has formatted headers, including an address text string, and may have internal formatting and/or attachments. • The composer is an interactive editor (computation) that produces a mail message. • The reader can display the message, and it is integrated with the database for archiving messages, which it can also classify and retrieve. • The directory system translates a symbolic address to a destination. • The transport mechanism moves mail to its destination. A. Birrell, R. Levin, R. Needham, and M. Schroeder. Grapevine: An Exercise in Distributed Computing. CACM 25(4), 1982
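The five components in this description can be sketched as small in-memory classes to show how they connect. This illustrates the slide's decomposition only and is not Grapevine's actual design or protocol. Every class, method, address, and server name below is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    """Structured text: formatted headers, a body, optional attachments."""
    headers: dict[str, str]
    body: str
    attachments: list[bytes] = field(default_factory=list)


def compose(sender: str, to: str, subject: str, body: str) -> Message:
    """Composer: produces a mail message (interactive editing is omitted)."""
    return Message({"From": sender, "To": to, "Subject": subject}, body)


class Directory:
    """Translates a symbolic address to a destination."""
    def __init__(self, entries: dict[str, str]) -> None:
        self._entries = dict(entries)

    def resolve(self, address: str) -> str:
        return self._entries[address]


class Transport:
    """Moves mail to its destination, here an in-memory inbox per destination."""
    def __init__(self, directory: Directory) -> None:
        self._directory = directory
        self.inboxes: dict[str, list[Message]] = defaultdict(list)

    def send(self, message: Message) -> str:
        destination = self._directory.resolve(message.headers["To"])
        self.inboxes[destination].append(message)
        return destination


class Archive:
    """Database for archiving messages, classified by folder name."""
    def __init__(self) -> None:
        self._folders: dict[str, list[Message]] = defaultdict(list)

    def store(self, message: Message, folder: str) -> None:
        self._folders[folder].append(message)

    def retrieve(self, folder: str) -> list[Message]:
        return list(self._folders[folder])


class Reader:
    """Displays messages and files them in the archive it is integrated with."""
    def __init__(self, archive: Archive) -> None:
        self.archive = archive

    def display(self, message: Message) -> str:
        head = message.headers
        return f"From: {head['From']}\nSubject: {head['Subject']}\n\n{message.body}"

    def file(self, message: Message, folder: str) -> None:
        self.archive.store(message, folder)
```

A message built with `compose(...)` goes to `Transport.send`, which asks the `Directory` for the destination. The `Reader` then displays it and files it in the `Archive`.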
Example: VoteReport • Here is a plausible architectural description of the VoteReport application • The input is a stream of messages, merged from Twitter, SMS, and transcriptions of phone calls. • The input is supposed to be structured but often is not • A parser receives the input stream and transduces the messages (to the extent possible) to georeferenced reports • A report has a text, a timestamp, and an icon • Recent reports are displayed on a zoomable map embedded in a web page • Filters are provided, and summaries are displayed • The map and input stream are elements that can be incorporated in other web pages.
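The parser step in this description, which transduces messages "to the extent possible", can be sketched as a generator that skips whatever it cannot georeference. The message format (`#zip` followed by five digits, with `#bad` marking a problem), the tiny gazetteer with approximate coordinates, and the icon names are all invented for this sketch. The source does not specify the real VoteReport format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Hypothetical stylized format: "#zip 15213" somewhere in the message.
ZIP_TAG = re.compile(r"#zip\s*(\d{5})", re.IGNORECASE)

# Illustrative gazetteer with approximate coordinates; not real lookup data.
GAZETTEER = {"15213": (40.44, -79.95)}


@dataclass(frozen=True)
class Report:
    text: str
    timestamp: datetime
    icon: str
    lat: float
    lon: float


def parse(message: str, received: datetime) -> Optional[Report]:
    """Turn one message into a georeferenced report, or None if it cannot be."""
    match = ZIP_TAG.search(message)
    if match is None:
        return None  # the stylized format was ignored
    coords = GAZETTEER.get(match.group(1))
    if coords is None:
        return None  # tagged, but the location is unknown to the gazetteer
    icon = "problem" if "#bad" in message.lower() else "ok"
    lat, lon = coords
    return Report(message.strip(), received, icon, lat, lon)


def transduce(stream: Iterable[tuple[datetime, str]]) -> Iterator[Report]:
    """Parser component: consume the merged input stream, yield reports."""
    for received, message in stream:
        report = parse(message, received)
        if report is not None:
            yield report
```

Because `transduce` is a generator over any iterable, the same component could sit behind a Twitter feed, an SMS gateway, or transcribed phone calls, and its output could feed both the map and other pages that embed the report stream.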
The Internet infrastructure supports a creative and thriving social and economic system. • Some of the wires in the network have myelin sheaths • The Internet is much richer than its communication infrastructure. • The problems are wicked, not just technical • Architectural styles help explain software systems. • Software architecture provides a model for a descriptive theory of software in cyberspace • What is an appropriate theory of architectural style for Internet applications? • Can we help end users at the ends of the network?