| 1 | 1) Make a new bank account: |
|---|
| 2 | |
|---|
| 3 | tycoon -o UserName=bank_admin bank create_account l.zhang@hp.com 1000 a __Bank |
|---|
| 4 | |
|---|
| 5 | 2) sync for debugging: |
|---|
| 6 | debugsync machine_name |
|---|
| 7 | |
|---|
| 8 | *) Check all input from network. |
|---|
| 9 | |
|---|
| 10 | *) Auctioneer handles XMLRPC, argument checking, transferring of money. |
|---|
| 11 | |
|---|
| 12 | *) AuctioneerAllocator handles management of resources |
|---|
| 13 | |
|---|
| 14 | *) VMM handles specifics of a virtualization technology |
|---|
| 15 | |
|---|
| 16 | *) Use assertions to check things that should never happen |
|---|
| 17 | |
|---|
| 18 | *) Throw exceptions for things that could happen, but shouldn't, e.g., |
|---|
| 19 | erroneous input from command line or network |
|---|
| 20 | |
|---|
| 21 | *) Don't use multiple inheritance. |
|---|
| 22 | |
|---|
| 23 | *) Avoid inheritance where convenient. |
|---|
| 24 | |
|---|
| 25 | *) Test all code using automated test harnesses. |
|---|
| 26 | |
|---|
| 27 | |
|---|
| 28 | 2/21 Should minimum and contingency funds be kept separate? |
|---|
| 29 | a) Separate pools: Can have very strong assurances to meet minimums |
|---|
| 30 | b) Single pool: Can spend savings to get better expected performance |
|---|
| 31 | |
|---|
| 32 | Separate pools are better |
|---|
| 33 | |
|---|
| 34 | 2/21 Should have separate or shared contingency fund? |
|---|
| 35 | Shared is better to spread risk, but disk is special case because of |
|---|
| 36 | importance of persistent data and ability to shutdown use of active |
|---|
| 37 | resources |
|---|
| 38 | |
|---|
| 39 | Shared, with pool bid over 1 time interval according to weights |
|---|
| 40 | |
|---|
| 41 | 2/24 What should the maximum memory of Xen VMs be? |
|---|
| 42 | * Bid's maximum memory: too large. Other VMs may prevent this from |
|---|
| 43 | being possible |
|---|
| 44 | * Actual available capacity based on bid, etc. |
|---|
| 45 | * need to write config file before boot. |
|---|
| 46 | |
|---|
| 47 | 4/24 Magnitude of bids must be close to that of preferences and each |
|---|
| 48 | other. |
|---|
| 49 | |
|---|
| 50 | 5/2 Auctioneer API: return values are a dictionary. An "error" element |
|---|
| 51 | indicates an error with the value being a human readable |
|---|
| 52 | message indicating the type of the error. Additional machine readable |
|---|
| 53 | data can be in "error_value". |
|---|
| 54 | |
|---|
| 55 | 5/2 Should probably avoid the use of exceptions in favor of explicit |
|---|
| 56 | return values. exceptions should be for unexpected situations, not |
|---|
| 57 | expected ones. |
|---|
| 58 | |
|---|
| 59 | 6/6 Error handling is significantly complicated by using explicit |
|---|
| 60 | return values instead of Exceptions. Explicit return values have to be |
|---|
| 61 | processed all the way up the call stack even if only the top level |
|---|
| 62 | handles them. Exceptions only have to handled at the top level. The |
|---|
| 63 | new policy is to relax the old policy and use exceptions to signal |
|---|
| 64 | error conditions while still using return values to pass the expected |
|---|
| 65 | value of a function. |
|---|
| 66 | |
|---|
| 67 | The actual exception used should be separate from the standard system |
|---|
| 68 | exceptions to distinguish anticipated error conditions from |
|---|
| 69 | unanticipated error conditions. Also, the name of exception should be |
|---|
| 70 | different for each module. |
|---|
| 71 | |
|---|
| 72 | 5/7 Resources have a name and type. A name specifies a specific |
|---|
| 73 | resource while a type specifies a class of resources. |
|---|
| 74 | |
|---|
| 75 | 5/24 Sample contingency bid: |
|---|
| 76 | |
|---|
| 77 | python tycoon host bid 67.164.68.18 0 "{'CPU': (1, 0.0, 1.0, 0.0)}" |
|---|
| 78 | |
|---|
| 79 | "{('TCPv4Port', 30000): (1, 1.0, 1.0, 0.0)}" |
|---|
| 80 | |
|---|
| 81 | 6/9 There are several disk allocation quantities: |
|---|
| 82 | bid_min and bid_max: specified by the account's bid |
|---|
| 83 | usage: how much is actually being used by the account. Includes |
|---|
| 84 | the file system size and all other files |
|---|
| 85 | allocation: amount that the allocator gives to the account |
|---|
| 86 | |
|---|
| 87 | 6/9 The relative values of these quantities trigger different events: |
|---|
| 88 | |
|---|
| 89 | allocation < bid_min: delete the account. |
|---|
| 90 | This is the definition of bid_min |
|---|
| 91 | |
|---|
| 92 | allocation < usage: try to shrink file system, otherwise delete |
|---|
| 93 | Ideally, the auctioneer would call an application specific |
|---|
| 94 | handler to delete files. |
|---|
| 95 | |
|---|
| 96 | 6/13 Root must have a contingency ratio .9999999 instead of 1-epsilon |
|---|
| 97 | so that resource prices are not zero. Zero prices causes |
|---|
| 98 | discontinuities in allocation like allocating more than 1.0 |
|---|
| 99 | resources. This has been a problem for few users or when users all use |
|---|
| 100 | CR of 1.0 for a resource. |
|---|
| 101 | |
|---|
| 102 | 6/20 What happens when the expiration time has arrived, but there are |
|---|
| 103 | still funds remaining? Here are the choices: |
|---|
| 104 | 1) Delete account and refund remaining funds. Harsh. |
|---|
| 105 | 2) Shutdown and reset with long interval. Much kinder. Account may |
|---|
| 106 | still be garbage collected if outbid. |
|---|
| 107 | Decided to do 2), but not sure if this was the right decision. |
|---|
| 108 | |
|---|
| 109 | 6/27 Testing strategy: high level to low level. Group tests by |
|---|
| 110 | area. Each test should be completely independent. This takes more |
|---|
| 111 | time, but is easier to diagnose. Tests should verify results. |
|---|
| 112 | |
|---|
| 113 | 6/27 Problems with asynchronous error handling: |
|---|
| 114 | 1) have to handle both exceptions and Failures because an exception |
|---|
| 115 | may occur before the first "yield" |
|---|
| 116 | 2) have to handle failures at every level because twisted error loop |
|---|
| 117 | doesn't like failures propagated more than one level. May be able to |
|---|
| 118 | fix by fixing flow code. |
|---|
| 119 | |
|---|
| 120 | 6/27 Users are prevented from setting bids which cause their own |
|---|
| 121 | accounts to be deleted or shutdown. This includes changing of the |
|---|
| 122 | interval, refunding of currency, and asking for high minimums. |
|---|
| 123 | |
|---|
| 124 | 6/27 What happens if a setquota lowers the quota below the current |
|---|
| 125 | allocation? Users are marked in repquota and a grace period counts |
|---|
| 126 | down, but nothing happens when it expires. |
|---|
| 127 | |
|---|
| 128 | 9/20 command line return value = number of errors |
|---|
| 129 | |
|---|
| 130 | 10/24 Contingency Accounting: as per previous notes, the contingency |
|---|
| 131 | fund should be separate from the expected value fund, and a single |
|---|
| 132 | contingency fund should be shared by all resources. This provides |
|---|
| 133 | maximum predictability in resource allocation, but could allow a |
|---|
| 134 | malicious user to bid up one resource and drain the contingency fund, |
|---|
| 135 | causing unpredictability in another resource. This raises the related |
|---|
| 136 | question of how much the weighted resources bids with shared funding |
|---|
| 137 | pool model makes sense. |
|---|
| 138 | |
|---|
| 139 | The question here is how the contingency fund should be managed. The |
|---|
| 140 | issues are idempotency of bids and simplicity of bidding. Some |
|---|
| 141 | possible solutions for when to reset the contingency fund: |
|---|
| 142 | |
|---|
| 143 | 1) with every bid. Simple to understand, difficult to manage. |
|---|
| 144 | |
|---|
| 145 | 2) with bids that change the duration or contingency ratio. Somewhat |
|---|
| 146 | difficult to understand, but allows for simpler idempotency. |
|---|
| 147 | |
|---|
| 148 | 3) with funding operations. Too complex; introduces a third kind of |
|---|
| 149 | bid operation (others are bid and fund). |
|---|
| 150 | |
|---|
| 151 | 11/14 Error handling revisited 3: there are several kinds of rpc error |
|---|
| 152 | mechanisms without a clear policy on how they should be used: |
|---|
| 153 | |
|---|
| 154 | 1) a TyException passed in the "result" field of the result. |
|---|
| 155 | 2) a Fault that is raised |
|---|
| 156 | 3) an unexpected Exception that occurs |
|---|
| 157 | |
|---|
| 158 | #2 and #3 should be retired because they cannot be authenticated |
|---|
| 159 | |
|---|
| 160 | 12/18 Bid semantics are confusing because of the ability to use |
|---|
| 161 | default values and the precedence of different values. Here are the |
|---|
| 162 | rules: |
|---|
| 163 | |
|---|
| 164 | 1) Any value can be "None". This means that the existing value is used. |
|---|
| 165 | 2) However, the min_resource and min_resource_amount cannot both be |
|---|
| 166 | specified (the same goes for max_resource and max_resource_amount), so |
|---|
| 167 | when one is specified, it takes precedence over the earlier value. |
|---|
| 168 | |
|---|
| 169 | 1/17 Long standing auctioneer bugs: |
|---|
| 170 | 1) balance and contingency balance need to take effect on |
|---|
| 171 | immediate account_and_allocate. They were waiting for next |
|---|
| 172 | accounting cycle |
|---|
| 173 | 2) balance, contingency_balance, and allocation need to be |
|---|
| 174 | undone on failed bid, fund, and refund |
|---|
| 175 | 3) allocation needs to be undone on failed create_account |
|---|
| 176 | |
|---|
| 177 | Solutions: |
|---|
| 178 | *) For 2 and 3: rerun allocation cycle after undoing operation |
|---|
| 179 | *) For 1: Save balance, contingency balance information, restore |
|---|
| 180 | on failure. This opens a window when the balance is altered and |
|---|
| 181 | an RPC call could alter it. Need to preclude race conditions |
|---|
| 182 | anyway. |
|---|
| 183 | |
|---|
| 184 | 1/17 Revisiting the question of when the contingency fund should be |
|---|
| 185 | recalculated, the previous solution of only doing it when duration or |
|---|
| 186 | contingency ratios change seems to allow incorrect values when the |
|---|
| 187 | weights change. As a result, we should recompute the contingency fund |
|---|
| 188 | with every bid change. Also, the contingency accounting counters |
|---|
| 189 | should be reset. As a special case, if the bid does not change at all, |
|---|
| 190 | we will keep all contingency information unchanged. This is useful if |
|---|
| 191 | the client sends multiple identical bids to increase reliability. |
|---|
| 192 | |
|---|
| 193 | 1/25 Account and allocate problem: solutions to simplify handling of |
|---|
| 194 | failed bids: |
|---|
| 195 | 1) Make all state ephemeral and write it back after checking for |
|---|
| 196 | errors |
|---|
| 197 | 2) Run another allocation pass after undoing error |
|---|
| 198 | 3) Hybrid: commit accounting immediately, make allocation ephmeral, |
|---|
| 199 | only re-run allocation on error, only commit allocation late. |
|---|
| 200 | |
|---|
| 201 | 2/7 Should a failure to get a unique resource cause a |
|---|
| 202 | bid/creation/fund/refund to fail? No, putting in a failed bid for |
|---|
| 203 | unique resources is useful to raise the price of that resource. |
|---|
| 204 | |
|---|
| 205 | 2/9 Resource model: There should be a more unified model of resources |
|---|
| 206 | that encompasses both non-unique and unique resources. This should make |
|---|
| 207 | it easier to think about and write code. Here is a new model: |
|---|
| 208 | |
|---|
| 209 | Resource Type: memory, IPv4Address, etc. |
|---|
| 210 | { identifier: |
|---|
| 211 | { 'capacity': c, 'price': p, 'time': t, 'price_statistics': ps, |
|---|
| 212 | 'misc': m} |
|---|
| 213 | ... |
|---|
| 214 | } |
|---|
| 215 | |
|---|
| 216 | identifier can be any scalar or a tuple (l, r) indicating a range. |
|---|
| 217 | |
|---|
| 218 | The issues: |
|---|
| 219 | |
|---|
| 220 | 1) When to combine identifiers into a range? |
|---|
| 221 | |
|---|
| 222 | 2) When to split them? |
|---|
| 223 | |
|---|
| 224 | 3) How to express preferences? |
|---|
| 225 | |
|---|
| 226 | Possibilities: |
|---|
| 227 | capacity id example |
|---|
| 228 | min, max any one disk |
|---|
| 229 | 1.0, 1.0 specific one port 80 |
|---|
| 230 | 1.0, 1.0 any one any IP address |
|---|
| 231 | min, max all |
|---|
| 232 | |
|---|
| 233 | Use the existing interface and defer this issue. |
|---|
| 234 | |
|---|
| 235 | 4) How to allocate based on preferences? |
|---|
| 236 | |
|---|
| 237 | Use the existing allocation algorithm and defer this issue. |
|---|
| 238 | |
|---|
| 239 | 5/2/2007 Unique resource states: |
|---|
| 240 | |
|---|
| 241 | 1) Allocated. These have individual entries in the status data |
|---|
| 242 | structure for keeping price statistics |
|---|
| 243 | 2) Recently Unallocated. These are the same as the allocated. |
|---|
| 244 | 3) Unallocated and old. These are coalesced with adjacent instances |
|---|
| 245 | and do not have price statistics. |
|---|
| 246 | |
|---|
| 247 | 5/23/2007 Unique resource pricing |
|---|
| 248 | |
|---|
| 249 | The current second price auction for unique resources has problems: |
|---|
| 250 | |
|---|
| 251 | 1) It is difficult to gauge demand. The price is zero most of the |
|---|
| 252 | time and only surges when someone else tries to acquire the |
|---|
| 253 | resource. In many cases, the other person will not even try to |
|---|
| 254 | acquire it because he can see that he will not meet the highest |
|---|
| 255 | bid. The inability to gauge demand prevents users from making |
|---|
| 256 | rational bids for the resource. |
|---|
| 257 | |
|---|
| 258 | 2) A users who tries to outbid another user may not pay anything for |
|---|
| 259 | the attempt. This will happen if there are only two users. Bidding |
|---|
| 260 | should cost something to limit the damage that a malicious user can |
|---|
| 261 | cause. In this case, the damage is denying a user the use of a |
|---|
| 262 | unique resource. |
|---|
| 263 | |
|---|
| 264 | 3) Paying zero most of the time for unique resources is unfair to the |
|---|
| 265 | provider. |
|---|
| 266 | |
|---|
| 267 | Solutions: |
|---|
| 268 | |
|---|
| 269 | 1a) Use the top bid to record current and historical prices. This will |
|---|
| 270 | give a better sense of demand. Unfortunately, this also allows an |
|---|
| 271 | attacker to easily outbid the top bidder. Publishing the top price |
|---|
| 272 | removes the sealed bid property. |
|---|
| 273 | |
|---|
| 274 | 1b) The top bidder pays the maximum of his minimum bid, the second |
|---|
| 275 | highest bid, and the provider's minimum bid. This is the reported |
|---|
| 276 | price. Users who want to advertise a high bid must pay for that |
|---|
| 277 | privilege. The problem with this is that it is unclear what a |
|---|
| 278 | late-arriving bidder should do since he does not know what bid |
|---|
| 279 | will be required to outbid the existing bidder. He will probably |
|---|
| 280 | bid when his valuation is much greater than the advertised price. |
|---|
| 281 | |
|---|
| 282 | 2) Failed bids always pay their bid instead of zero. This will limit |
|---|
| 283 | the damage that malicious attackers can do without overly charging |
|---|
| 284 | legitimate users. |
|---|
| 285 | |
|---|
| 286 | 5/31/2007 |
|---|
| 287 | |
|---|
| 288 | Bank bid: |
|---|
| 289 | |
|---|
| 290 | tycoon -u bank host bid tycoon-ext-02.hpl.hp.com 3y \ |
|---|
| 291 | memory:1,256MiB,1GiB,10000.9 CPU:1,0.10,1,10000.9 \ |
|---|
| 292 | disk:1,100GB,100GB,10000 NetworkInterfaceIncoming:1,0.5,1.0,10000.9 \ |
|---|
| 293 | NetworkInterfaceOutgoing:1,0.5,1.0,10000.9 \ |
|---|
| 294 | IPv4Address,204.123.34.92:1,10000 |
|---|
| 295 | |
|---|
| 296 | sls bid: |
|---|
| 297 | |
|---|
| 298 | tycoon -u sls host bid tycoon-ext-03.hpl.hp.com 3y \ |
|---|
| 299 | memory:1,256MiB,1GiB,10000.9 CPU:1,0.10,1,10000.9 \ |
|---|
| 300 | NetworkInterfaceIncoming:1,0.5,1.0,10000.9 \ |
|---|
| 301 | NetworkInterfaceOutgoing:1,0.5,1.0,10000.9 disk:1,1GiB,1GiB,10000 \ |
|---|
| 302 | IPv4Address,204.123.34.101:1,10000 |
|---|
| 303 | |
|---|
| 304 | 6/15/2007 |
|---|
| 305 | |
|---|
| 306 | Replay defense system: replay prevention is important in any system |
|---|
| 307 | that does not have real time bi-directional communication. Previous |
|---|
| 308 | work has used three basic techniques for replay prevention: 1) |
|---|
| 309 | timestamps, and 2) random nonces with history buffers, and 3) |
|---|
| 310 | incrementing nonces. |
|---|
| 311 | |
|---|
| 312 | With timestamps, the sender includes a timestamp in a message and the |
|---|
| 313 | source verifies that the timestamp is after an offset from its own |
|---|
| 314 | time. The disadvantages are that the clocks must be relatively |
|---|
| 315 | synchronized and timestamped messages can still be replayed within the |
|---|
| 316 | offset time. |
|---|
| 317 | |
|---|
| 318 | With random nones and history buffers, the sender includes a |
|---|
| 319 | randomly-generated identifier in a message and the receiver compares |
|---|
| 320 | it to entries in a buffer of received nonces. If the identifier is not |
|---|
| 321 | in the buffer, then it is accepted. Otherwise, it is rejected. The |
|---|
| 322 | disadvantage is that it requires unbounded state. |
|---|
| 323 | |
|---|
| 324 | With incrementing nonces, the sender simply keeps a counter that it |
|---|
| 325 | increments after every communication. The receiver compares this |
|---|
| 326 | counter to its largest seen counter and verifies that the new counter |
|---|
| 327 | is larger than any previously received counter. The disadvantage is |
|---|
| 328 | that the sender and receiver's state must be synchronized, which is |
|---|
| 329 | difficult when sender use multiple machines or the sender's machines |
|---|
| 330 | fails. |
|---|
| 331 | |
|---|
| 332 | Our solution is to combine the best properties of timestamps and |
|---|
| 333 | history buffers. The sender sends a timestamp with each message. The |
|---|
| 334 | receiver keeps a buffer of recently seen timestamps. If the |
|---|
| 335 | transmitted timestamp is after the oldest timestamp in the buffer and |
|---|
| 336 | is not in the buffer, then it is not a replay. Otherwise, it is either |
|---|
| 337 | too old, or a duplicate of timestamp in the buffer. Both are |
|---|
| 338 | suspicious, so the timestamp is rejected. As with a timestamp-only |
|---|
| 339 | approach, this technique uses a constant amount of state. As with the |
|---|
| 340 | history buffer approach, this technique can detect duplicates that are |
|---|
| 341 | within the clock skew horizon. |
|---|
| 342 | |
|---|
| 343 | In Tycoon, this system is used for RPCs and bank receipts. |
|---|
| 344 | |
|---|
| 345 | 9/12/2007 |
|---|
| 346 | |
|---|
| 347 | Service location service authentication. The SLS needs to authenticate |
|---|
| 348 | clients to prevent clients from hijacking other clients' addresses. |
|---|
| 349 | |
|---|
| 350 | This is complicated by web proxies and NATing. For example, the |
|---|
| 351 | simplest solution is to require that operations on an IP address be |
|---|
| 352 | from that IP address. This works for normal hosts and NAT-ed hosts, |
|---|
| 353 | but not for hosts behind web proxies. Another solution is to use the |
|---|
| 354 | address contained within the registration message. This works for |
|---|
| 355 | normal and proxied hosts, but NAT-ed hosts usually do not know their |
|---|
| 356 | externally routable address. In addition, this solution allows clients |
|---|
| 357 | to "squat" on the addresses of others, preventing them from being |
|---|
| 358 | registered by the real owner. |
|---|
| 359 | |
|---|
| 360 | There are variety of cases that the solution must handle: |
|---|
| 361 | |
|---|
| 362 | A. Well-behaved, non-proxied, non-nat-ed clients. |
|---|
| 363 | B. Well-behaved, nat-ed clients. They can be contacted externally. |
|---|
| 364 | C. Well-behaved, proxied clients. They cannot be contacted externally. |
|---|
| 365 | D. Squatting clients. |
|---|
| 366 | |
|---|
| 367 | |
|---|
| 368 | Here is a more robust (and complex) registration solution that allows |
|---|
| 369 | NAT-ed and proxied clients to co-exist: |
|---|
| 370 | |
|---|
| 371 | 1. The client contacts the SLS with its self-known address and the |
|---|
| 372 | externally known address used at the network layer. |
|---|
| 373 | |
|---|
| 374 | 2. The SLS tries to connect back to both the self-known address and |
|---|
| 375 | externally known address to verify the hash. |
|---|
| 376 | |
|---|
| 377 | external self-known |
|---|
| 378 | conn, hash verified conn, hash verified if ex==self-known, accept ex |
|---|
| 379 | conn, hash verified conn, hash failed accept ex, NAT w/external |
|---|
| 380 | conn, hash verified no conn accept ex, NAT w/external |
|---|
| 381 | conn, hash failed conn, hash verified accept self, proxy? |
|---|
| 382 | conn, hash failed conn, hash failed reject |
|---|
| 383 | conn, hash failed no conn accept self, NAT |
|---|
| 384 | no conn conn, hash verified accept self, proxy? |
|---|
| 385 | no conn conn, hash failed reject |
|---|
| 386 | no conn no conn accept self, proxy |
|---|
| 387 | |
|---|
| 388 | In any case where a registration was accepted without callback hash |
|---|
| 389 | verification, a new registration that passes callback hash |
|---|
| 390 | verification supercedes it. |
|---|
| 391 | |
|---|
| 392 | 4/1/2008 Migration |
|---|
| 393 | |
|---|
| 394 | Automatic VM migration is important for maintaining high reliability |
|---|
| 395 | and reducing admin overhead. The basic design is for clients to |
|---|
| 396 | request a migration. |
|---|
| 397 | |
|---|
| 398 | 4/1/2008 Uploading a file system |
|---|
| 399 | |
|---|
| 400 | This is challenging because python's XML-RPC implementation is not |
|---|
| 401 | designed to for streaming. More specifically, it assumes that all the |
|---|
| 402 | call arguments can be held in memory. Since file system images are in |
|---|
| 403 | the range of 300MB-2GB, this could cause paging and slow the entire |
|---|
| 404 | system down. Some alternatives: |
|---|
| 405 | |
|---|
| 406 | 1) Split the upload into multiple XML-RPC calls |
|---|
| 407 | |
|---|
| 408 | 2) Do the upload as a separate scp operation. The process would be: |
|---|
| 409 | a) Create_account with a dummy file system name |
|---|
| 410 | b) Upload using scp, overwriting the dummy file system |
|---|
| 411 | c) Configure the file system |
|---|
| 412 | |
|---|
| 413 | 5/30/2008 XML Parsing Benchmarks |
|---|
| 414 | 1.87 GHz laptop, 500,000 elements |
|---|
| 415 | Serialize Unserialize |
|---|
| 416 | Unpatched 1.9878 10.8072 |
|---|
| 417 | cElemenTree 1.9327 7.0276 |
|---|
| 418 | lxml 2.0.5 1.9271 9.7098 |
|---|
| 419 | |
|---|
| 420 | Local Variables: |
|---|
| 421 | mode:text |
|---|
| 422 | End: |
|---|