/----------------------------------------------------------------------------\
|                                                                            |
|      |                      The Greater Scroll                      |      |
|    -=*=-                            of                            -=*=-    |
|      |                        Dialing Wisdom                        |      |
|                                                                            |
\----------------------------------------------------------------------------/

*** WebTV/Microsoft Confidential ***

This document contains a lot of detail on how our service works. While not strictly "trade secret" information, care should be taken to keep this within WebTV/Microsoft.



Contents:

I. System Overview
A. Whassup with this
B. Joe gets wired
C. Recap
II. Telco Issues
A. Fancy words and TLAs
B. Dial patterns
C. Area code splits
D. Semi-automatic number identification
E. The local, the toll, and the ugly
F. POP, phone line, and network quality issues
III. Service Fundamentals
A. Dial overrides, Satan, and you
B. Introduction to tellyscripts
C. Dial sequences and the fallback number
D. The "clientinfo" command
E. Vend-A-Telly
F. Visible Dialing
IV. Dialing Details
A. PhoneDB details
B. Intro to POP load balancing and provider rotation
C. Tellyscript return codes
D. Dial patterns revisited
E. Secret codes, NVRAM, and "have you moved?"
F. How phone settings work
G. Radius, access numbers, and PSI
H. OpenISP
I. Client upgrades and brain-dead boxes
J. ComingSoon and friends
K. Pick-yer-POP
L. MessageWatch and EPG
M. Idle timeouts
N. Adding new providers
O. VideoAds
P. Automatic Number Frustration
Q. OraclePhoneDB and POPtimization
R. MCI and CHAP
S. Oh, Canada
T. Perhaps you should surf scriptless
V. Extra Goodies
A. POPtimization
B. Fallback usage cap
C. There's ECMA in my TellyScript
D. TVPAK and DialScript
VI. For Further Reading
A. On the web
B. In the service source tree

 


Revision history

1999/06/25 fadden Added references to new dialup options documentation.  Adjusted source tree references.
1999/03/19 fadden Updated for Grunge service.  Added V.D and some notes about dial overrides.
1998/11/02 fadden Minor corrections
1998/10/09 fadden Updated for Funk service.  Added IV.T, V.C, and numerous minor updates and corrections.
1998/07/07 fadden Updated with information on Etude service.  Converted to HTML.  General release.
1998/01/07 fadden Added IV.P and made numerous small changes.
1997/11/13 fadden Added some new sections and many corrections.  General release.
1997/11/03 fadden Several corrections and new additions.  First public draft.
1997/10/30 fadden Complete rewrite; renamed to "Greater Scroll".
1997/08/11 fadden Touched up a bit for Microsoft folks.
1996/11/19 fadden Assorted notes on interactions between dialing options, "visible dialing", black holes, and Spooky dial Options.  (This was the first version widely distributed within the company.)
1996/11/17 fadden Load balancing makes its debut.
1996/11/11 fadden We now have flat-rate IAPs and associated nastiness.
1996/09/01 fadden Did something.
1996/08/30 fadden Added comment about handling "1-800 addicts".
1996/08/29 fadden Clarified access number handling.
1996/08/27 fadden First draft of "The Great Scroll of Dialing Wisdom".

 


-I- System Overview

-I.A- Whassup with this

The WebTV system is a combination of a set-top box and an online service. The set-top box ("WebTV Internet Terminal" or "WebTV Plus Receiver"; henceforth just "box") is connected to a television and a phone line. Once it has successfully dialed into an Internet Service Provider, it connects to the WebTV service, and great things happen.

The simple act of getting a user connected to a local ISP is surprisingly difficult. This document explains the fundamentals of getting connected. The intended audience is Customer Care, SOC, Network Operations, QA, and Engineering. Not all sections are relevant for everyone.


The focus of this document is on the U.S. phone system. International issues, including a description of the Japanese phone system, can be found in the "IntlPhoneNotes" document.

Sections I, II, and III should be generally useful. Sections IV and V are more technical and aren't important for everyone to understand. I've chosen not to include the internal workings of tellyscript and PhoneDB generation in this document, because they're complicated, volatile, and really only necessary for engineering and a few people in SOC and netops.

Recommendations on Customer Care practices (notably with regard to dial overrides) are simply that: recommendations. They may or may not be consistent with current Customer Care policies.

 

-I.B- Joe gets wired

A brief example should help illustrate the major components of the WebTV system.

When Joe User brings his box home from the store, the first thing he does is try to set it up, usually without reading the instructions. Sometimes this even works. When the box is powered on, it listens for a dial tone on the phone line. (You can turn this off in the dialing options; if you do, it waits for a few seconds and then dials blindly.) If the phone line hasn't been plugged in or isn't hooked up correctly, the box will complain that it can't hear a dial tone, and offer to try again or go to the dialing options screen.

Joe gets everything wired up and tries again. This time the box hears a dial tone, so it dials a toll-free 800 number. This number is usually referred to as the "scriptlessd number" or (for historical reasons that we're hoping to obliterate) the "prereg number". Most users can connect to this without trouble once their box is set up properly.

Once connected, Joe's box starts talking to the scriptlessd server. scriptlessd gets the caller's phone number via a feature called ANI (Automatic Number Identification) that is similar to CallerID, except that it works from almost everywhere and can't be blocked. If the service is unable to get the user's number from ANI, scriptlessd will put up a screen asking the user to enter their phone number.

From the user's ANI we know where they are and what the closest POPs are (POP is Point Of Presence, typically a bank of modems connected to an Internet Service Provider, or ISP). The two or three best POPs are assigned to that user, and put into a set of dialing instructions called a "tellyscript" (a bad pun on a product from General Magic). The tellyscript tells the box which numbers to dial and how to dial them.

After getting the tellyscript, the box hangs up and dials the first POP in the list, which is hopefully a local call. If the first number is busy it will hang up and try another. After trying each of the POPs twice it may try to call a toll-free 800 "fallback" number.  Barring network outages or excessive local congestion, most users shouldn't need to use the fallback number.
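The order of attempts described above can be sketched in a few lines of Python. This is an illustration, not the tellyscript itself: the function name, the phone numbers, and the assumption that the box walks the whole POP list once before starting the second pass are all mine.

```python
def dial_sequence(pops, fallback=None, tries_per_pop=2):
    """Return the order in which numbers would be attempted.

    Assumption: the box goes through the POP list in order and repeats
    the whole list until each POP has been tried tries_per_pop times.
    The fallback is optional because the box only "may" use it.
    """
    attempts = []
    for _ in range(tries_per_pop):
        attempts.extend(pops)
    if fallback is not None:
        attempts.append(fallback)
    return attempts


# Made-up numbers, for illustration only.
for number in dial_sequence(["614-0001", "614-0002"], fallback="1-800-555-0100"):
    print("would dial", number)
```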

With a little luck, most users will successfully connect to the WebTV Network without further intervention. The box connects to the "headwaiter" server, which tells it where to go. Shortly after connecting the box sends up a "phone log" (sometimes called a "connection log") that shows what numbers were dialed, what failed, and what ultimately succeeded. These logs are used to generate POP failure statistics and to debug problems.

When Joe turns his box off with the keyboard or remote, the tellyscript is saved in NVRAM (Non-Volatile Random Access Memory, which isn't what we're actually using but it works the same). The next time the box is powered on, it skips the scriptlessd step and dials directly into the local POP.

 

-I.C- Recap

The "box" is the thing what sits on your television set. The "service" is what it talks to when it gets dialed in. The service is composed of multiple "servers" that do specific things, like hand out "tellyscripts", show you the home page, or let you read mail. The box knows how to find "scriptlessd", scriptlessd knows how to find the "headwaiter", and the headwaiter knows how to find all the other servers.

ANI tells us the caller's phone number. From the ANI data we assign local POPs to the user. The specific dialing instructions are contained in a tellyscript.

Boxes with tellyscripts dial into their local POP, and connect to the headwaiter. Boxes without tellyscripts go to "scriptlessd" first to get a tellyscript, and then hang up and redial a local POP.

 


-II- Telco Issues

-II.A- Fancy words and TLAs

While the evolution of the US phone system shows a great deal of careful and occasionally ingenious thought, there are some things about it that just plain suck. Before we can go into detail, there are a few terms and proper nouns that should be defined.

Telco means "telephone company". It's a generic term, as in "this telco billing stuff drives me nuts." Telco guys like to say telco a lot. Telco.

CCMI is Center for Communications Management Information. CCMI sells us a database of call pricing information that is frequently accurate.

POP is Point Of Presence, generally a bank of modems with a terminal server that connects to a network. The modems are usually part of a "hunt group", so that you can dial just one number, and if the first line is busy it "hunts" for the next free one. When we try to dial a POP, but get a person instead, we refer to it as dialing a MOM.

You sign up for a calling plan when you get telephone service hooked up. In the Bay Area you usually just choose between flat-rate and measured-rate service, but in other places you have a wide range of choices. For example, by adding a higher monthly charge to your phone bill you could get flat-rate local calling to a larger area.

A dial pattern tells you how many digits to dial when you're calling a particular number. In the US, these may be 7, 10, or 11 digits long. In the Bay Area you can usually call yourself with a short number like 614-5539 or a long one like 1-650-614-5539, but in other areas the systems are less lenient. In some cases it can be more expensive to dial 11 digits than 7. The customer's choice of calling plan can affect the dial patterns that they have to use.

TellyScripts are a WebTV creation; they're programs sent to the box by the service. They contain instructions that tell the box how to configure the modem, which POPs to call, and how it should dial them.

V.xxx is a modem protocol.  V.32 handles connections up to 9600bps, V.32bis goes up to 14.4Kbps, V.34 goes up to 33.6Kbps, and V.90 handles 56K (which is really at most 53K, because the FCC won't let them go any faster).  V.FC is an alternative, "deprecated" 28.8Kbps standard.  X2 and K56Flex are competing 56K proposals that have been superseded by V.90.  V.42 defines modem error correction, and V.42bis provides data compression.

LEC is Local Exchange Carrier. These are the guys who handle local calls and "local toll". Pacific Bell is our LEC. CLECs are Competitive Local Exchange Carriers, a new kind of carrier made possible by the 1996 Telecommunications Act. You too can run your own phone company.  Non-CLECs are referred to as ILECs, or Incumbent Local Exchange Carriers.

RBOC is Regional Bell Operating Company. These are the "Baby Bells" that got spun out of AT&T several years ago. Pacific Bell is an RBOC. Sometimes these are just referred to as "BOC"s.

IOC and UOC are CCMI abbreviations for Independent Operating Company and Unknown Operating Company. Contrast with BOC. IOCs tend to be smaller phone companies or CLECs, UOCs are usually phone companies run by rural cooperatives or out of somebody's garage. An IOC that CCMI doesn't know anything about is a UOC.

IXC (sometimes IEC) is Inter-eXchange Carrier. This is a fancy term for "long distance company" that telco people like to throw around. AT&T is an IXC. When you make a long-distance call, the IXC has to pay money to the LEC where the call came from and to the LEC where the call went to, so calls that avoid IXCs tend to be cheaper.

LATA is Local Access and Transport Area, a geographical region defined by the phone companies. The way things traditionally worked is that your LEC handles local calls and intra-LATA (in the same LATA) toll calls, while your IXC handles inter-LATA (between LATAs) toll calls. So a toll call to a location 20 miles north might be handled by Pacific Bell, while a similar call in the other direction might be handled by AT&T, based on where the LATA boundaries fall. Calls that cross state boundaries follow an even more mysterious set of rules.

The Telecommunications Act of 1996 really screwed everything up. Your IXCs can be LECs, CLECs can provide local service with the LEC's equipment, and generally anybody can do anything. This is why MCI can offer local service now.

PIC is Primary Interexchange Carrier. This term can be used both as a verb and an adjective. Your phone line can be "PICed" to use a specific carrier for your IXC, and more recently you can have an intra-LATA PIC done for local toll calls. A "PIC code" is a sequence of digits that you can enter before dialing a number to choose a different carrier; examples are 10288 (1-0-ATT) or 1010321 (Telecom*USA's 10-10-3-2-1 program). "PIC charges" are the fees that your IXC pays to your LEC when you change your long distance company. The PIC code format has changed from 10XXX to 101XXXX, e.g. 10-321 to 10-10-321, because they were running out of codes.
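The format change is mechanical enough to show in code. The sketch below follows the example in the text (10-321 becomes 10-10-321): the three-digit carrier part of an old code gets a leading zero. The function name is made up for illustration.

```python
def widen_pic_code(old_code):
    """Convert a 5-digit 10XXX code to the 7-digit 101XXXX form.

    The three-digit carrier code XXX is padded to 0XXX, matching the
    10-321 -> 10-10-321 example in the text.
    """
    if len(old_code) != 5 or not old_code.isdigit() or not old_code.startswith("10"):
        raise ValueError("expected a 5-digit code of the form 10XXX")
    return "101" + "0" + old_code[2:]


print(widen_pic_code("10321"))
print(widen_pic_code("10288"))
```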

Tariffs tell you how much a call between two points costs. For long distance calls, the tariffs from the LECs on both ends and the relevant IXC all have to be factored in.

PUC is Public Utilities Commission. The PUC in each state has a great deal of control over the tariffs that the phone companies use. There are places where a long-distance call handled by AT&T is completely free, because the PUC decided that it should be.

Local calls, in the telco world, are not necessarily free calls. The difference between local and toll is defined by the tariffs, which are filed by the phone companies and monitored by the PUCs. Pacific Bell defines "zone 3" calls, which charge per-minute rates even to subscribers with flat-rate plans, as local. In the WebTV world we try to define "Local" as least-cost and "Expensive Local" (ExpLocal) as any local call that is more expensive than the minimum. We calculate the minimum by figuring out what it would cost for the customer to call himself. Any local call that costs more is labeled ExpLocal.
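As a rough illustration of the Local/ExpLocal rule, here is a Python sketch. The function name and the costs are invented; the only thing taken from the text is the rule itself: the baseline is what it costs the customer to call himself, and any local call that costs more is ExpLocal.

```python
def label_local_calls(self_call_cost, local_call_costs):
    """Label each local destination 'Local' or 'ExpLocal'.

    self_call_cost   -- cost for the customer to call their own number
    local_call_costs -- dict of destination -> cost, for calls the
                        tariff defines as local
    """
    return {
        dest: ("ExpLocal" if cost > self_call_cost else "Local")
        for dest, cost in local_call_costs.items()
    }


# Made-up per-call costs in cents; "zone 3" stands in for a local call
# that is billed per minute.
print(label_local_calls(0, {"same town": 0, "zone 3 town": 12}))
```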

The rate center is a geographic point used for billing purposes. "MTS" (Message Toll Service) coordinates are based on the rate center. The cost of a long distance call is based on "major MTS coordinates" for calls over 40 miles, and "minor MTS coordinates" for calls under 40 miles. For local calls the "wire center" coordinates are used. Yes, it could be more complicated: the coordinates are specified in "V&H" (Vertical and Horizontal) units, 1670 feet each.
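Given the 1670-foot unit quoted above, the distance between two V&H points is a plain Pythagorean calculation. The sketch below assumes exactly that and nothing more; the coordinates are made up, and the function name is mine.

```python
import math

FEET_PER_VH_UNIT = 1670   # figure quoted in the text
FEET_PER_MILE = 5280


def vh_distance_miles(v1, h1, v2, h2):
    """Straight-line distance in miles between two V&H coordinate pairs."""
    units = math.hypot(v1 - v2, h1 - h2)
    return units * FEET_PER_VH_UNIT / FEET_PER_MILE


# Two made-up points 3 units apart vertically and 4 horizontally.
print(round(vh_distance_miles(0, 0, 3, 4), 2), "miles")
```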

POTS is Plain Old Telephone Service. The term is used to differentiate standard phone service from things like cable modems or cellular phones.

C.O. is Central Office. In the typical house or apartment, a pair of copper wires runs from your telephone to the central office. The distance between your phone (or, more importantly, your WebTV box) and the central office, and how well the wires are shielded, can affect the quality of your phone connection and hence your modem connect rate.

NPA/NXX is the obfuscated term for area code and prefix. If your phone number is 650-614-5539, your NPA is 650 and your NXX is 614. The NPA and NXX are enough to identify where the call is coming from. The last four digits of the phone number are sometimes called the "subscriber number". In some contexts the term "exchange" is synonymous with NPA/NXX.
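For the code-minded, pulling a number apart looks like this. It's a small Python sketch with a made-up function name; it accepts the 10-digit form or the 11-digit form with a leading 1.

```python
def split_number(number):
    """Split a US phone number into (NPA, NXX, subscriber number)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]          # drop the leading 1
    if len(digits) != 10:
        raise ValueError("expected a 10- or 11-digit number")
    return digits[:3], digits[3:6], digits[6:]


npa, nxx, subscriber = split_number("650-614-5539")
print(npa, nxx, subscriber)
```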

An Exchange Area is defined by CCMI as a collection of NPA/NXXs for which the billing is identical. For example, two calls from anywhere in Palo Alto will have the same cost so long as both callers have the same calling plan and service providers. Exchange areas may include dozens of NPA/NXXs or might only have one. They might overlap geographically (because of paging/cellular exchanges), but each NPA/NXX is part of only one exchange area.

LCA is short for Local Calling Area. The LCA for Palo Alto is the set of exchange areas that are local calls from the Palo Alto exchange area. Put more simply, if you're a local call for me, then you're in my LCA. LCAs may overlap. LCAs aren't necessarily symmetric; just because you are a local call for me doesn't mean that I am a local call for you.
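One way to picture the asymmetry is to store LCAs as a one-way mapping. The Python sketch below uses two made-up exchange areas, "A" and "B"; it is not how the PhoneDB stores anything, just an illustration of why "local from A to B" says nothing about "local from B to A".

```python
# Made-up data: each key maps to the set of exchange areas that are
# local calls from it.
LCA = {
    "A": {"A", "B"},
    "B": {"B"},
}


def is_local(caller_area, callee_area):
    """True if callee_area is in the caller's local calling area."""
    return callee_area in LCA.get(caller_area, set())


print("A -> B local?", is_local("A", "B"))
print("B -> A local?", is_local("B", "A"))
```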

NANP is the North American Numbering Plan. The Plan defines all the area codes, how dialing patterns will work in the future, and other dry subjects. It's "NANP" rather than "USNP" because it applies to Canada, Guam, and places out in the Caribbean, all of which are part of North America if you lean back and squint. It does not cover Mexico.

ISP and IAP are Internet Service Provider and Internet Access Provider. They are essentially the same thing, with a subtle and unimportant difference. We usually refer to them as IAPs. Concentric Networks Corp. (cnc), PSINet, Inc. (psi), and UUNET Technologies, Inc. (uunet) are examples of IAPs.

The backhoe is a large piece of construction equipment used for digging trenches and cutting through network cables at inopportune moments.

The PhoneDB is a WebTV creation that combines the CCMI data with a list of POPs from several IAPs, and comes up with POP assignments for every NPA/NXX. (If you understand what I just said, you're ready to graduate.) The POP-O-Rama web page lets you do queries on current and past PhoneDBs.

 

-II.B- Dial patterns

People who grew up in California were spoiled by Pacific Bell's coherent dialing pattern system. For the most part, you can dial to any point within the same area code by entering a 7-digit number, and you get to numbers in other area codes by entering an 11-digit number. Dialing numbers in the same area code using an 11-digit number is allowed.

Other parts of the country aren't as straightforward. There are actually four kinds of calls you can make:

HL - Home area code, Local call. Calls within Mountain View are HL.
HT - Home area code, Toll call. Sunnyvale (408) calling Santa Cruz (408).
FL - Foreign area code, Local call. Mountain View (650) to Sunnyvale (408).
FT - Foreign area code, Toll call. Mountain View to New York.
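The four labels are just two yes/no questions glued together, which a short Python sketch makes explicit. The function name is made up, and whether a call is local is passed in as a flag because, as the rest of this section explains, working that out is the hard part.

```python
def call_type(caller_npa, callee_npa, is_local):
    """Return 'HL', 'HT', 'FL', or 'FT' for a call."""
    area = "H" if caller_npa == callee_npa else "F"   # Home or Foreign area code
    cost = "L" if is_local else "T"                   # Local or Toll
    return area + cost


print(call_type("650", "408", True))    # Mountain View to Sunnyvale
print(call_type("408", "408", False))   # Sunnyvale to Santa Cruz
```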

Each of the four types can have a different "expected" dialing pattern, as well as a "permitted" dialing pattern. Certain combinations have unpleasant consequences.

HL is almost always 7 digits, but some places (like Maryland) require 10-digit dialing for all local calls. Yes, you have to include the area code to call your neighbor down the street. Enlightened areas like California have 11-digit dialing as a "permitted" HL pattern.

HT is generally 7, 11, or both. Places that require 7-digit dialing for home/local calls and require 11-digit dialing for home/toll calls are troublesome, because the number of digits depends on whether the destination is a local call, and the definition of "local" depends on your calling plan. In many cases there is no way for WebTV to know ahead of time how many digits the box should dial. Guessing wrong results in a recording from the phone company.

FL is usually 10 or 11, but in some cases is 7. In nasty cases it's 7 and 10/11 aren't allowed at all. It's nasty because we are required to dial a 7-digit number into a different area code when the call is local, but would be dialing an 11-digit number if the call were toll. So if we think something is local when it really isn't, we could be dialing a 7-digit number in the caller's area code rather than the callee's area code, and the WebTV box will be waking up somebody's grandmother. The service takes great pains to avoid this situation.

FT is always 11, no exceptions.


Using the right pattern can be important. For example, there are places where you are either not allowed to dial 11-digit numbers for local calls, or are charged more than you would for dialing 7 (presumably because the call is routed through the IXC as soon as the leading '1' is seen, instead of being handled by the LEC).
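Once a pattern has been chosen, turning a POP number into the digits to dial is simple. Here is a Python sketch with a hypothetical function name; the mode values 7, 10, and 11 are the three pattern lengths described above.

```python
def digits_to_dial(pop_number, mode):
    """Return the digit string to dial for a 10-digit POP number.

    mode is 7 (NXX + subscriber), 10 (NPA + NXX + subscriber),
    or 11 (leading 1 + all ten digits).
    """
    digits = "".join(ch for ch in pop_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit POP number")
    if mode == 7:
        return digits[3:]
    if mode == 10:
        return digits
    if mode == 11:
        return "1" + digits
    raise ValueError("mode must be 7, 10, or 11")


for mode in (7, 10, 11):
    print(mode, digits_to_dial("415-233-0570", mode))
```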

The CCMI database has "hints" on dialing patterns, but they are sometimes inaccurate. Because the dialing pattern depends on whether a call is local or toll, it depends on what your calling plan defines as being local. This makes it a bit of a challenge to get the dial pattern right. To work around these issues, the WebTV service takes the best guess it can, and remembers the cases that succeed.

The service remembers a set of dialing patterns that looks like this (output is from "dpedit", the Dial Pattern EDITor):

The dial patterns for '01fad82501b002ba' (ANI=004154631671) are:
  S  # ANI          POP          Mode
  +  0 415-614-5539 415-233-0570 7-digit
  +  1 415-614-5539 415-322-0489 11-digit
  +  2 415-463-1671 415-233-0570 7-digit
  I  3 415-463-1671 415-666-9999 7-digit
  +  4 415-463-1660 415-322-0489 11-digit
  +  5 415-463-1660 415-233-0570 7-digit
  N  6 415-463-1660 510-742-0207 11-digit
  -  7 <empty>

Each line is one entry in the dial pattern table. It has the person's ANI at the time the call was placed, the POP number that the person was calling, and how many digits were used to dial it. We have to record the ANI, because if they move the box to a different place, or even to a different phone line with a different calling plan, the dial patterns can be different. Same story for area code splits (see next section).
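To make the "keyed by ANI and POP" idea concrete, here is a Python sketch that reads output shaped like the listing above into a lookup table. It is not part of dpedit. The status letters in the first column aren't explained in this section, so the sketch carries them along without interpreting them.

```python
SAMPLE = """\
  S  # ANI          POP          Mode
  +  0 415-614-5539 415-233-0570 7-digit
  +  1 415-614-5539 415-322-0489 11-digit
  N  6 415-463-1660 510-742-0207 11-digit
  -  7 <empty>
"""


def parse_dial_patterns(text):
    """Map (ANI, POP) -> (status, mode) for each filled table slot."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five fields: status, slot number, ANI, POP, mode.
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        status, _slot, ani, pop, mode = fields
        table[(ani, pop)] = (status, mode)
    return table


patterns = parse_dial_patterns(SAMPLE)
print(patterns[("415-614-5539", "415-233-0570")])
```

Note that the same POP can show up under more than one ANI with a different mode, which is exactly why the ANI is part of the key.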

When a user first signs up, or first appears at a new number, we have no information about a person's dial patterns. The tellyscript that gets sent down will first try one pattern, then if that fails, it will try the next. When one succeeds, we add an entry to the table.

Suppose the tellyscript for Palo Alto first tries 7-digit dialing and then tries 11-digit dialing. What happens if the POP happens to be busy on the first attempt, but succeeds on the second? We will end up recording a success with 11-digit dialing, and will use that from then on. This isn't perfect, but it's hard to tell the difference between different kinds of failures ("all circuits are busy" sounds just like "you don't need to dial a 1 in front of that" to the modem). Most of the time it works.
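The busy-signal wrinkle is easy to reproduce in a toy model. In the Python sketch below, try_dial is a stand-in for the modem (it is not a real API), and the loop simply records the first pattern that connects, which is how a busy POP on the first attempt ends up teaching us "11-digit".

```python
def dial_with_patterns(pop, patterns, try_dial):
    """Try each pattern in order; return the first that connects, else None.

    try_dial(pop, pattern) -> bool is a hypothetical stand-in for the
    modem; all it can report is "connected" or "didn't".
    """
    for pattern in patterns:
        if try_dial(pop, pattern):
            return pattern
    return None


attempts = []

def busy_on_first_attempt(pop, pattern):
    # The POP is busy the first time, whatever pattern is used.
    attempts.append(pattern)
    return len(attempts) > 1

recorded = dial_with_patterns("415-233-0570", [7, 11], busy_on_first_attempt)
print("pattern recorded:", recorded)
```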

A problem that occasionally surfaces is with customers who turn "audible dialing" on and get excited when the first attempt fails. If they were to wait for a minute or two until the box timed out and tried the next number, everything would work out fine; but instead they hear the first attempt fail and immediately call Customer Care. The solution is NOT a dial override, but rather to encourage the customer to have more patience. (In one case the user was told to use the 32768 secret code, which clears out all of the settings in NVRAM. This turned off audible dialing. The customer successfully dialed in shortly thereafter.)

It is also possible for a customer's dialing patterns to change over time, perhaps because they change local calling plans. This is not handled automatically, because the service can't easily distinguish a dead POP from a bad pattern. Once again, the solution is NOT a dial override. The "dpedit" utility can be used to adjust the dial patterns. Once changed, send the user through the "new number" routine so they go back through scriptlessd and get a script with the updated data immediately.

See the dpedit README file for details on using it.


Sometimes there are exceptions to dial pattern rules within a certain area. For example, there used to be an InternetMCI POP at 415-482-2900 in Redwood City that was a local call from Palo Alto. Every other call to Redwood City could be dialed with 7 or 11 digits, but not that one. If you didn't use 7-digit dialing, you got a recording chastising you for being so clueless. The moral of the story is that there's no way to know for sure what will work until it's tried.

Things can get pretty weird. In the 608-326 exchange in Wisconsin, if you call "873-xxxx", you get a local number in Iowa at 1-319-873-xxxx. If, on the other hand, you dial 1-608-873-xxxx, you make a toll call to another point in Wisconsin. Even though you're in the 608 area code, and there's a 608-873-xxxx, your call to "873-xxxx" goes to a different area code. In this particular case, we're allowed to dial 1-319-873-xxxx, so by using 11-digit dialing there's no ambiguity.


One other note: the list of dial patterns only determines whether the box dials 7, 10, or 11 digits when calling a POP. It does *not* decide which POP a customer will get, or in what order they will be tried.

 

-II.C- Area code splits

Area code splits come in two varieties, geographical splits and overlays. Geographical splits are done like the 415/510 and 415/650 splits, where a geographic region gets a different area code. With overlays, the same area gets two area codes. Usually one area code is used for voice, while the other is used for FAX machines, pagers, and cellular phones.

For both kinds of splits, the transition is done over a period of a few months. The following chart illustrates the process, assuming that somebody in San Francisco at 415-659-0610 and somebody in Palo Alto at 415-614-5539 (changing to 650-614-5539) are trying to call each other.

(1) Pre-split. The 650 area code does not exist yet.

From S.F., dialing 614-5539 works.
From S.F., dialing 1-415-614-5539 works.
From S.F., dialing 1-650-614-5539 results in a "what the hell area code is that?" message.

From P.A., dialing 659-0610 works.
From P.A., dialing 1-415-659-0610 works.

The ANI for the person in Palo Alto is 415-614-5539.

(2) "Permissive" dialing. You are allowed, but not required, to dial 650.

From S.F., dialing 614-5539 works.
From S.F., dialing 1-415-614-5539 works.
From S.F., dialing 1-650-614-5539 works.

From P.A., dialing 659-0610 works.
From P.A., dialing 1-415-659-0610 works.

The ANI for the person in Palo Alto is now 650-614-5539. (Sometimes the local phone companies blow this, and do it early or late. It's unwise to assume that the ANI will change at the very start of the permissive period.)

(3) "Mandatory" dialing (usually starts about 6 months after "permissive").

From S.F., dialing 614-5539 gets a "you need to dial 650" recording.
From S.F., dialing 1-415-614-5539 gets a "you need to dial 650" recording.
From S.F., dialing 1-650-614-5539 works.

From P.A., dialing 659-0610 gets a "you need to dial 415" recording.
From P.A., dialing 1-415-659-0610 works.

(4) Eventually the no-longer-used numbers get reassigned.

From S.F., dialing 614-5539 gets a wrong number.
From S.F., dialing 1-415-614-5539 gets a wrong number.
From S.F., dialing 1-650-614-5539 works.

From P.A., dialing 659-0610 gets a wrong number.
From P.A., dialing 1-415-659-0610 works.
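The Palo Alto half of the chart can be written down as data, which makes it easy to ask which dialed forms survive every phase. This Python sketch just restates the chart; the phase names and the function are labels I made up.

```python
# Outcome of each dialed form from Palo Alto, per phase of the split.
FROM_PALO_ALTO = {
    "pre-split":  {"659-0610": "works",        "1-415-659-0610": "works"},
    "permissive": {"659-0610": "works",        "1-415-659-0610": "works"},
    "mandatory":  {"659-0610": "recording",    "1-415-659-0610": "works"},
    "reassigned": {"659-0610": "wrong number", "1-415-659-0610": "works"},
}


def forms_that_always_work(chart):
    """Return the dialed forms that work in every phase."""
    forms = next(iter(chart.values())).keys()
    return [f for f in forms
            if all(phase[f] == "works" for phase in chart.values())]


print(forms_that_always_work(FROM_PALO_ALTO))
```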

What makes area code splits especially frustrating for us is that the dial pattern can change. Before the split, if you were in Palo Alto and calling a San Francisco POP at 415-659-0610, you could just dial 659-0610. After the split, you would be calling a number in a different area code, and would be required to dial 1-415-659-0610. Even though you haven't moved, your ANI has changed out from under you. The WebTV service can't fix you if you can't log in, and guess what, you can't log in except through the 800 number.

The good news is that if you make your box go back through scriptlessd, it will detect that your ANI has changed, and all of your old dial patterns will be ignored because they were tied to your old ANI. Ideally we wouldn't have to put the users through this manual step, and would either send them back through scriptlessd automatically or just make the change to their area code directly. But how do we do this?

One solution here is to have an 800 fallback number that also gets your ANI, and compare the current ANI with the ANI on record. If all of your local POPs are failing because we're using the wrong dial pattern, you end up on the fallback number, and once there we can automatically detect that it's because your area code changed. Also, given sufficiently detailed information about area code splits, we could program the box to dial a different set of numbers depending on whether "today" is pre-split or post-split. The latter solution isn't perfect, because if the box loses power it forgets what day it is, but it's a little cleaner.
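The ANI comparison might look something like the following Python sketch. The heuristic is my assumption, not a description of service code: if the last seven digits match but the area code differs, treat it as a probable area code change rather than a move.

```python
def looks_like_area_code_change(ani_on_record, current_ani):
    """Compare two 10-digit numbers (OLS code already stripped).

    Assumed heuristic: same NXX and subscriber number, different NPA.
    """
    same_local_part = ani_on_record[3:] == current_ani[3:]
    different_npa = ani_on_record[:3] != current_ani[:3]
    return same_local_part and different_npa


print(looks_like_area_code_change("4156145539", "6506145539"))
```

A box that really did move would normally show a different NXX or subscriber number as well, so it would fall through to the ordinary "new number" handling.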

You might be tempted to think that dialing the full 11-digit number every time would solve this problem. In the San Francisco/Palo Alto example above, the 11-digit pattern worked correctly in every case. Unfortunately, as mentioned in the section on dial patterns, 11-digit calls might either be disallowed or might be more expensive than a 7-digit call to the same number.

A particularly troublesome area code split happened in Maryland in the middle of 1997. Not only did the area code split, but all local calls suddenly had to be dialed with 10-digit numbers. This change required that the service "forget" all 7-digit patterns for callers whose ANI showed them to be in Maryland. The service config option IgnoreDialPattern was added to deal with changes like this in the future.

 

-II.D- Semi-automatic number identification

When we get the caller's phone number via ANI on the 800 scriptlessd number, we get a little more data with it. A typical ANI string looks like "006506145539". The last 10 digits are the phone number. The first two are the OLS (Originating Line Screening) code. This allows us to tell if somebody is calling in from a prison, hotel room, or pay phone rather than a standard phone line.

At least, it would, if we were able to get at the OLS code with our systems, which we can't. But I digress.
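Splitting the string is trivial; here is a Python sketch (the function name is made up) that follows the layout just described: two leading digits, then the 10-digit phone number.

```python
def parse_ani(ani):
    """Split a 12-digit ANI string into (two-digit code, phone number)."""
    if len(ani) != 12 or not ani.isdigit():
        raise ValueError("expected 12 digits")
    return ani[:2], ani[2:]


code, number = parse_ani("006506145539")
print(code, number)
```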

If you're calling in from a point in the United States, Canada, or affiliated areas like Puerto Rico, chances are the ANI number is valid. There are specific regions that don't support ANI, however, and there are times when the ANI just doesn't seem to want to show up.

In cases like these, the service will ask the user to enter their own phone number. It doesn't need to be exactly right; it just needs to be in the same "exchange area" as the box. If the person has two phone lines, and puts in the voice number, it will usually work just fine. If the service for the lines is provided by different local phone companies, though, the billing can be quite different, so the system works best when the number comes from ANI.

To make it easier to diagnose cases where the user entered the wrong value for their phone number, the service labels "manual ANI" entries by replacing the OLS code with a WebTV-defined value. Some interesting values:

99 (+ 10 digits) - number was entered on "enter your phone number" screen.
98 (+0000000000) - special code used; probably an international demo box.
97 (+0000000000) - special code used; probably an international demo box.
96 (+ 10 digits) - number changed with dpedit or clientpopedit.
95 (+0000000000) - service is ignoring ANI values (never on production!)

If somebody is dialing a totally inappropriate set of POPs, and their ANI number starts with "99", chances are they entered the wrong number on the "enter your phone number" screen. WebTV isn't responsible for toll charges incurred by sticky-fingered users, but diagnosing this quickly will leave the customer happier. Sometimes you need to check the "ANI history" to see if they blew it at some point in the past.
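A quick diagnostic helper along these lines is sketched below in Python. The table is copied from the list above; the function name is made up, and any prefix not in the table is simply reported as not WebTV-defined.

```python
MANUAL_CODES = {
    "99": "entered on the 'enter your phone number' screen",
    "98": "special code used; probably an international demo box",
    "97": "special code used; probably an international demo box",
    "96": "changed with dpedit or clientpopedit",
    "95": "service is ignoring ANI values",
}


def ani_source(ani):
    """Describe where a 12-digit ANI value came from, judging by its prefix."""
    return MANUAL_CODES.get(ani[:2], "not a WebTV-defined code")


print(ani_source("996506145539"))
print(ani_source("006506145539"))
```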


What happens if we successfully get the user's ANI but can't recognize the number? This happens when new exchanges are added faster than CCMI can keep up. In cases like this, we give the user the "global default" POP, which is usually an 800 number embedded in the PhoneDB.

When we finally put out a PhoneDB that does recognize their ANI, we will automatically send them a new tellyscript with the appropriate POPs when they next visit the headwaiter. If the PhoneDB "forgets" some numbers, possibly because an old area code split has caused some exchanges to cease to exist, we will simply stop updating their tellyscript until the next time they go through scriptlessd. (The service should actually force them back through scriptlessd once, in case their ANI changed as part of an area code split but we never caught it. This is currently an open bug.)

If we get the ANI, and we recognize it, but it's for an area that we don't yet support (e.g. Puerto Rico), we don't send the user a tellyscript at all. Instead they just get a message saying that WebTV isn't yet supported in their area.


What happens if we don't get their ANI, and it's a "Classic" box doing a flash download? Now we're in trouble: we don't have their ANI, and we can't put up a user interface and ask because the "Classic" flash downloader doesn't have a user interface. If they're talking to scriptlessd, they must be brain-dead, probably from an earlier failed download. We temporarily send them to an 800 number (the "NoANI" number), until they can finish the download. When the download finishes successfully, the box will automatically go back through scriptlessd and get a regular set of POPs.

This has the added bonus of giving most users a more stable environment for doing the download, because the POP they're calling is under our control.


One of the pitfalls of using ANI is that it only works when the user dials into an 800 number. It's very important that we know where the box is, because if we have the wrong value for their ANI -- perhaps because the user moved the box -- we will be handing out the wrong set of POPs. If one of those POPs is a 7-digit number, we could be dialing a 7-digit number in the wrong area code, and call a MOM instead. On the other hand, 800# calls are expensive, and we have limited capacity on the modem racks, so we can't have the box dial into the 800 number every time the box powers up.

The current approach for dealing with this is to assume that the box might have moved whenever it loses power. We display a message the first time the box turns on after losing power that shows their phone number (e.g. "650-614-XXXX"; the last four digits are blanked in case they return the box to the store). If the user has moved the box to a different phone number, they can just hit "Moved", and the box will go back through scriptlessd. Versions of the box before client 1.2 weren't able to display the ANI number in the dialog.

As of the Funk service, if the user dials in through one of our 800 numbers (probably a fallback number after all of their regular POPs have failed) we will check their ANI and decide whether or not to send them a new script. The goal is to automatically find people whose local dialing has been completely disrupted by an area code split, or who neglected to tell the box that they have moved.


A practical issue has arisen on a few occasions when a helpful store salesman runs the box through an initial scriptlessd connection before the customer takes it home. If the customer gets home and asserts that the box hasn't moved, they will end up with a tellyscript for the store's ANI rather than their own ANI. Because most of the units on shelves are client 1.0, they can't display the partial ANI in the "have you moved" dialog.

The workaround was to put a test at the start of registration that figures out how long it has been since the box went through scriptlessd. If it has been more than a certain amount of time, the box is thrown out and must come back in through the 800 number. In the usual (non-helpful-salesman) case, the box will proceed to registration within a few minutes of visiting scriptlessd, so with a suitably defined interval -- currently 15 minutes -- we can solve the problem without creating a new one.
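The check described above amounts to a simple age test. The sketch below shows the idea in Python; only the 15-minute interval comes from the text, and the function and argument names are invented for the example.

```python
from datetime import datetime, timedelta

# 15 minutes is the interval quoted above. Everything else here is hypothetical.
MAX_SCRIPTLESSD_AGE = timedelta(minutes=15)


def must_revisit_scriptlessd(last_visit: datetime, now: datetime) -> bool:
    """True if the box visited scriptlessd too long ago to start registration."""
    return now - last_visit > MAX_SCRIPTLESSD_AGE
```

A box that registers a few minutes after its scriptlessd visit passes; a box that sat in the customer's car for an hour gets sent back through the 800 number.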

 

-II.E- The local, the toll, and the ugly

Figuring out what's local and what's not is far more difficult than you might expect. The single biggest obstacle is the lack of completely accurate data. What we get from CCMI is fairly accurate, but they're collecting tariff data from dozens of companies on hundreds of calling plans for 25,000 different exchange areas. With that much data, in a system as convoluted as the U.S. phone system, there are bound to be problems, and there's an awful lot of "process" between finding a problem and getting it fixed.

We also have trouble with missing data. Some LCAs are entirely unsupported, others are partially supported. A "partially supported" LCA is one where the data is loaded once, when somebody asks for it. It isn't kept up to date, and there is no pricing information associated with the local calls. Based on this data the PhoneDB generator can tell that a call is local, or at least was local in the recent past, but can't tell how much it costs. This makes it impossible to distinguish between "Local" and "Expensive Local".

The myriad filters and fancy footwork we do when generating a PhoneDB are outside the scope of this document. What's important is to understand how far you can trust the data and why it might be wrong, so that you can understand POP-O-Rama output and try to differentiate customer error from CCMI error.


Here's an example of output from the "lookuppop" tool, which generates the output for the POP-O-Rama web page:

For 561-357-0000 from W PALM BCH, FL (base cost=0):
  cnc/561-227-0012 in or near "West Palm Beach, FL" (W PALM BCH, FL)
    LOCAL 0.0mi  [wc=7.6mi] cost=0 
      --> 227-0012 then 1-561-227-0012
  uunet/561-681-9557 in or near "West Palm Beach, FL" (W PALM BCH, FL)
    LOCAL 0.0mi  [wc=5.4mi] cost=0 
      --> 681-9557 then 1-561-681-9557
  cnc/561-226-0010 in or near "Boca Raton, FL" (BOCA RATON, FL)
    ExpLocal 23.7mi  [wc=19.0mi] cost=1840 
      --> 226-0010 then 1-561-226-0010
  uunet/561-368-8801 in or near "Boca Raton, FL" (BOCA RATON, FL)
    ExpLocal 23.7mi  [wc=19.0mi] cost=1840 
      --> 368-8801 then 1-561-368-8801
  psi/954-971-5720 in or near "Pompano Beach, FL" (POMPANOBCH, FL)
    toll* 31.9mi  [wc=26.6mi] cost=2927 
      --> 1-954-971-5720
  uunet/954-486-4806 in or near "Fort Lauderdale, FL" (FTLAUDERDL, FL)
    toll 39.9mi  [wc=31.9mi] cost=2927 
      --> 1-954-486-4806
  cnc/954-845-0336 in or near "Ft. Lauderdale, FL" (FTLAUDERDL, FL)
    toll 39.9mi  [wc=36.4mi] cost=2927 
      --> 1-954-845-0336
  cnc/305-651-1819 in or near "Miami, FL" (NORTH DADE, FL)
    toll 53.5mi  [wc=46.8mi] cost=2927 
      --> 1-305-651-1819

The first line identifies the exchange where the caller is. In this case, I asked for "561-357", and it filled in the last four digits with zeros (remember, you only need the NPA and NXX to identify the location). The location name is "W PALM BCH, FL". The names are cryptic because the CCMI database only has space for 10 characters, and they're all upper case. "FL" is the state, in this case Florida. "Base cost" is what we computed it would cost for somebody in the 561-357 NPA/NXX to call themselves, based on a call of a certain duration at a certain time of day. DO NOT tell this cost to a customer! It might be based on a calling plan other than what the customer has, and we don't want to be held responsible for giving out cost figures that are based on inappropriate or possibly even inaccurate data.

After the first line are eight sets of three lines, one set for each POP. The first line in each set identifies the POP. "cnc/561-227-0012" means it's a Concentric Networks POP at 561-227-0012. There are two city names, "West Palm Beach" and "W PALM BCH". The latter is supplied by CCMI. The former is sent to us by the IAP, can be edited fairly easily, and is displayed to the customer in the "have you moved" dialog. The names don't always match up; note that the last entry says "Miami" and "NORTH DADE". This is generally because the CCMI entry describes things from the telco perspective. For example, the Pacific Bell phone book describes Cupertino as being in "San Jose 2", and CCMI shows Cupertino numbers as being in "SAN JOSE W". Ditto for Menlo Park, which appears to be in PALO ALTO. In general, the "nice" name is more accurate. If you believe the two are totally out of whack, ask the SOC to look into it.

There is no "nice" name on the top line, because (1) we only have "nice" names for places where the POPs are, and (2) the NPA/NXX isn't enough to tell you what city the person lives in. Some NPA/NXXs cover more than one city.

The next line tells you about what it costs for a user at the NPA/NXX to call that POP. The first word is one of the following:

LOCAL
We believe the call is local, and that the cost of the call is the same as if the user called themselves.
ExpLocal
CCMI says it's a local call, but it's more expensive to call than other local calls. Zone 3 calls in California are ExpLocal.
PseudoLocal
Equivalent to ExpLocal in almost every respect. Explained below.
toll
This is a toll call. It might be a "local toll" handled by the LEC or a long-distance call handled by an IXC.

(In the ancient days of yore, there was a distinction between "LOCAL" and "local". The LocalMustEqualCostToSelf PhoneDB feature removed this distinction.)

Regardless of how the calls price out, local calls always come before ExpLocal, and ExpLocal calls always come before toll. Toll calls that are cheaper than local calls are extremely rare, so we always prefer the local calls just in case there's an error in the tariff data.
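The ordering rule above can be pictured as a two-part sort key: class first, cost second. This Python sketch is only an illustration of that rule; it ignores everything else the real PhoneDB generator does, and the tuple layout and names are assumptions.

```python
# Rank by class first, then by cost. PseudoLocal shares ExpLocal's rank because
# the text describes it as functionally equivalent. The tuple layout
# (name, kind, cost) is made up for this example.
CLASS_RANK = {"LOCAL": 0, "ExpLocal": 1, "PseudoLocal": 1, "toll": 2}


def order_pops(pops):
    """Return POPs sorted so local beats ExpLocal beats toll, whatever the cost."""
    return sorted(pops, key=lambda pop: (CLASS_RANK[pop[1]], pop[2]))
```

With this key, a toll call priced at 100 still sorts after an ExpLocal call priced at 1840, which is the "just in case the tariff data is wrong" behavior described above.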

Entries with an asterisk (e.g. "toll*") denote a certain kind of IAP. This is explained later. Usually you should just ignore the asterisk.

The number after the local/toll indication is the distance in miles between the rate center for the caller and the rate center for the POP, using the "minor" (a/k/a "under 40") MTS coordinates. Put more simply, it's how far apart the phone company thinks the two points are. Calls aren't usually local beyond 10 or 15 miles, but there's one case in Florida where you could make a 135-mile local call for $0.25 per call.

The next number in square brackets is the distance between the wire centers for the caller and the POP. In some situations the wire center distance is used when pricing local calls. As you can see in the first two entries of the example above, the MTS coordinate distances are both 0.0, but the wire center distances are slightly different. Usually the two numbers are pretty close, but because of the way some POPs are connected to the phone system, the wc numbers can be large (perhaps 20 miles). When tracking down problems, it's usually best to pay attention to the first number (the MTS distance) and ignore the wc distance.

The final item on the line is the cost of a call made for a given duration at a specific time of day on a particular day of week with a certain calling plan. Sometimes we average rates from multiple carriers together, which complicates things. At any rate (no pun intended), it's the most important value we use when deciding the order in which to hand out POPs.

The last line of the output shows the dialing patterns that we will try, in the order that we will try them. For the first entry we will try 7-digit dialing and then 11-digit dialing (it's a home/local call); for the last entry we just try 11-digit (it's foreign/toll).
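If you need to chew through a lot of this output, the three-line layout described above is regular enough to parse. The following Python sketch is written against the samples shown in this document only; the regular expressions and field names are my own guesses at the layout, not part of the lookuppop tool.

```python
import re

# Patterns inferred from the sample output in this document (an assumption).
POP_RE = re.compile(r'^\s*(\w+)/(\d{3}-\d{3}-\d{4}) in or near "([^"]*)" \(([^)]*)\)')
COST_RE = re.compile(
    r'^\s*(\S+)\s+([\d.]+)mi\s+\[wc=([\d.]+)mi\]\s+cost=(\S+)(?:\s+\[([^\]]*)\])?'
)
DIAL_RE = re.compile(r'^\s*-->\s*(.*)$')


def parse_lookuppop(text):
    """Turn lookuppop-style text into a list of dicts, one per POP."""
    entries = []
    for line in text.splitlines():
        m = POP_RE.match(line)
        if m:
            entries.append({"provider": m[1], "number": m[2],
                            "nice_name": m[3], "ccmi_name": m[4]})
            continue
        if not entries:
            continue  # the "For ..." header, or anything before the first POP
        m = COST_RE.match(line)
        if m:
            entries[-1].update(
                kind=m[1],                      # e.g. LOCAL, ExpLocal, toll*
                miles=float(m[2]),              # MTS distance
                wc_miles=float(m[3]),           # wire center distance
                cost=int(m[4]) if m[4].isdigit() else None,  # "??" becomes None
                flag=m[5],                      # e.g. "LCA not sup", or None
            )
            continue
        m = DIAL_RE.match(line)
        if m:
            entries[-1]["dial"] = m[1].strip().split(" then ")
    return entries
```

The "dial" list preserves the order in which the patterns will be tried, so a 7-digit-then-11-digit entry comes back as a two-element list.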


Occasionally you will see entries that look like this:

For 205-526-0000 from LEESBURG, AL (base cost=241):
  tdsnet/205-927-6200 in or near "Centre, AL" (CENTRE, AL)
    PseudoLocal 5.1mi  [wc=5.1mi] cost=2040  [LCA not sup]
      --> 927-6200 then 1-205-927-6200
  tdsnet/205-528-6200 in or near "Crossville, AL" (CROSSVILLE, AL)
    toll 14.5mi  [wc=14.5mi] cost=3137  [LCA not sup]
      --> 1-205-528-6200 then 528-6200

The end of the second line in each set may have a special code in square brackets. The most popular ones are "unsupported local" and "LCA not sup". When you see "unsupported local", it means that we have the LCA (Local Calling Area) definition, but no rate information (this is the "partially supported" LCA data mentioned earlier). Chances are the LCA is not getting updated regularly, but since these LCAs are usually small rural areas, it probably doesn't need to get updated very often.

When you see "LCA not sup" it means we have no information at all about the LCA for this area. We just plain can't tell what calls are local, and have to punt.

Well, that's not entirely true. If the caller and POP are in the same exchange area, we go ahead and assume that it's a local call. We also have a feature where we declare that everything within a specific radius (currently 10 miles) of the caller in an "LCA not sup" area is local. Since we can't determine the cost, we define them to be ExpLocal. To make the distinction clear, we display ExpLocal calls in "LCA not sup" areas as "PseudoLocal". As mentioned above, PseudoLocal is functionally equivalent to ExpLocal; we just show it differently because the definition of "local" is based purely on MTS distance rather than telco tariffs, and therefore is more prone to problems.
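Put as a decision rule, the handling of "LCA not sup" areas looks roughly like the Python sketch below. The 10-mile figure is from the text; treating the same-exchange case as plain LOCAL is my reading of "assume that it's a local call", and the function name is invented.

```python
# Radius for declaring calls local by distance alone ("currently 10 miles").
UNSUP_LCA_RADIUS_MILES = 10.0


def classify_in_unsupported_lca(same_exchange_area: bool, mts_miles: float) -> str:
    """Guess the call type when we have no LCA data for the caller's area."""
    if same_exchange_area:
        return "LOCAL"        # assumption: shown as an ordinary local call
    if mts_miles < UNSUP_LCA_RADIUS_MILES:
        return "PseudoLocal"  # treated like ExpLocal, but based on distance only
    return "toll"
```

Applied to the Leesburg example, the Centre POP (5.1 miles) comes out PseudoLocal and the Crossville POP (14.5 miles) comes out toll, matching the output shown.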

The motivation for doing PseudoLocal was that ExpLocal calls are always prioritized ahead of toll calls. Because of weirdnesses in the phone system, it may cost more to call yourself with AT&T than it would to call the other side of the country. Without PseudoLocal, people in some rural areas -- who most likely had local POPs nearby -- were being told to dial distant locations, because an AT&T call cost less, and the only rating information we had was for the IXCs. (You might be tempted to just do the POP assignments by distance rather than cost, but there are many areas where distance and cost don't correlate. Some 50-mile calls in Florida are more expensive than 300-mile calls into a different state.)

There's a problem with doing this, though. Suppose we're in an area where local calls that cross area code boundaries (FL) require 7-digit dialing. Suppose further that we're in an unsupported LCA. We're now in the uncomfortable position of telling the box to use 7-digit dialing across area codes, based solely on the fact that the POP is less than 10 miles from the caller. Fortunately it's easy to manually verify that we're not doing bad assignments; just dial the 7-digit POP number, using the caller's area code. If you get something other than a recording, we might be in trouble. (Turning off UnsupLCADistOnlyRadius fixes it, but then we lose PseudoLocal, which will make us rather unpopular with some customers.)

Ideally we would be able to add our own LCA definitions to the CCMI data, and avoid the problems entirely. Of 25,000 or so exchange areas, 5,000 are completely unsupported. Some of those 5,000 are devoted exclusively to paging or cellular activity, which don't have local calling areas that we're interested in, but most of them are just low-density areas.   Maintaining a complete set of data for areas with a tiny handful of people isn't cost-effective, for us or for CCMI, but it would be nice if we could fix the areas where we do have some customers.


A more insidious problem has occurred in a few places, notably parts of Texas (Grand Prairie, anyone?). In these cases, CCMI had only one local calling plan in the database, and it was an extended-area "metro" plan that not all of our customers had signed up for. The data that we got out of CCMI showed certain POPs as being free local calls, and sure enough, they were free for everybody who had signed up for the extended plan. The rest of the people were a trifle irked.

The PhoneDB generation process scans the entire set of local calling plans, and always uses the most restrictive definition. When a wide-area plan is the most restrictive definition of an LCA, we're in trouble.
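One way to picture "most restrictive" is as a set intersection: a destination counts as local only if every calling plan on file says so. That reading is an assumption on my part, as are the names and data layout in this Python sketch, but it shows why a lone wide-area plan causes trouble.

```python
def most_restrictive_lca(plans):
    """plans maps a plan name to the set of exchanges that plan treats as local.

    Returns the exchanges that are local under every plan on file.
    """
    local_sets = list(plans.values())
    if not local_sets:
        return set()
    return set.intersection(*local_sets)
```

With a standard plan and a metro plan both present, the metro-only exchanges drop out. With only the metro plan in the database, nothing gets filtered, and the metro-only exchanges look local for everybody.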

This sort of problem is difficult to deal with, because in these situations the CCMI data is accurate. It just happens to be incomplete. In this particular case I asked them to add the standard calling plan, and they said they would look into it. This is another scenario where being able to tweak the local calling plan definitions would be useful. We can do a limited amount of fixing with the "ChangeCallCost" PhoneDB feature, but that's clumsy at best.


There are some other odd things you might see in POP-O-Rama output, like:

For 604-523-0000 from NWESTMNSTR, BC (base cost=??):
  uunetdan/360-383-1000 in or near "Bellingham, WA" (FERNDALE, WA)
    toll?? 29.4mi  [wc=0.0mi] cost=??  [origin not in DB]
      --> 1-360-383-1000

"Origin not in DB" happens because the point of origin is in Canada, and at the time this example was created we weren't able to price calls made from Canada. Note that "base cost" is "??", which means that we weren't able to figure out what it would cost for someone in 604-523 to call themselves.

For 817-278-0000 from EULESS, TX (base cost=0):
  cnc/972-375-0501 in or near "Dallas, TX" (GRAND PRAR, TX)
    ExpLocal 8.9mi  [wc=8.2mi] cost=242  [hacked!]
      --> 1-972-375-0501 then 972-375-0501

You will see "hacked!" when the kind of call and cost of the call have been explicitly changed by the person generating the PhoneDB. (There's probably a better word to use than "hacked".)


All of our local cost calculations are actually based on business rate plans. There are residential rate plans available in the CCMI database, but very few of CCMI's customers actually use them, so they're not as carefully scrutinized. A comparison of residential vs. business rates done early in 1997 suggested that, while some areas were more accurately rated using the residential data, other areas seemed wildly inaccurate. The decision was made to ignore the residential rate data for now.


If you find yourself answering a phone call or an e-mail message from a customer who claims that a POP isn't local even though we think it is, don't jump to any conclusions without some corroborating evidence. I received a handful of bug reports saying that 510-742-xxxx (in Fremont) wasn't local from Palo Alto, even though the pages in the front of the Pacific Bell white pages showed that it was. People in areas with low population densities will often assume that exchanges they don't recognize aren't local. (This problem has returned, too: now people in 510 don't realize that they can dial into the northern part of San Jose. Sigh.)

Of course, it would be a bad idea to dismiss such claims out of hand. The best evidence is a phone bill that shows the POP as being non-local. There have been several cases where the phone company mis-billed a call, either because of 11-digit dial patterns or errors on their part; with the bill in hand we can easily get either the telco or CCMI to straighten out their data. If the customer hasn't yet received a bill, a call to the business office or even an operator at the telco that handles the call will resolve the matter, but there have been cases where conflicting answers have come from the same source on subsequent calls. Also, be sure that you're talking to the right LEC, because different carriers will have different calling plans.

Local vs toll issues should be reported to the SOC. If you're the one investigating a complaint, and we don't have a phone bill to look at, you should talk to the operator about the calls in question and ask whether they are (1) local, (2) local but expensive (e.g. zone 3 calling), (3) local toll, or (4) long distance. Most operators will just say "local" for #1 and "toll" for #2, #3, and #4 to avoid confusing the customer, but the distinction is important for us.

 

-II.F- POP, phone line, and network quality issues

Not all POPs are created equal. WebTV requires that all POPs we use are capable of 28.8Kbps communication, and we take steps to ensure that there is adequate network capacity between our IAPs and us. Even so, there are cases where an individual POP or individual user will see substandard performance. This section provides a quick overview of symptoms and their causes.

The most common problems are in the user's house or apartment. Line splitters, large numbers of phones on the same line, phone extenders that plug into an AC power outlet (commonly used with DSS systems), and old wiring are common sources of problems. They can interfere with the phone line, resulting in slow connections.

The initial connect rate shown on the tricks-info page and in the phone logs doesn't tell the whole story. One of the features of modern modems is that they will "negotiate down", or start talking more slowly, if a lot of errors are detected. This is done because the modems are less susceptible to disruption at lower speeds. If the line conditions improve, the modem will negotiate back up. Unfortunately, we have no way to monitor the current speed or know the lowest speed used, so it's difficult to identify problems just by looking at the initial connect rate.

Even so, if you see connections being established at 21600bps or lower, there's a good chance that the user's phone connection is poor. If many users are reporting similar troubles with that POP, and you connect at a slow rate when calling the same POP from here (you can do this with Vend-A-Telly, described in a later section), there's a chance that the POP itself is poorly connected.

Most phone companies won't guarantee connect rates of 28.8Kbps or higher. Pacific Bell only guarantees 4800bps, which is pretty pathetic.

The box will refuse to connect at less than 14.4Kbps, but once connected it could conceivably negotiate down to a lower rate. It may be possible to disable downward negotiation below 14.4, but it's not clear that this is always desirable.


In the very early days, before the service went public, we displayed the connect rate right below the WebTV logo that you see before you get to the home page. The information was removed to avoid being swamped with calls from customers wondering why they weren't getting the full 33.6Kbps connections that they paid for. (It turns out that you can see the connect rate by putting "&rate;" on any web page, so the Funk service was changed to trick the box into thinking that it's always connected at 57600.) The reality is that not all IAPs have POPs that go above 28.8Kbps, and even then, most 28.8, 33.6, and 56K modem users don't get the speed they would hope for (26.4, 31.2, and 42K are much more common) because of noisy phone lines or other external factors. The reviewers of some 56K modems were unable to get actual data rates above 44K with even the best of modems. The worst couldn't break 30K.

When LECs won't even guarantee 14.4Kbps, it's impossible for WebTV to guarantee anything higher. We should make every effort to determine the cause of poor performance, but some things are beyond our control. If the user has a PC with a modem that has no trouble connecting, try to get the WebTV box configured as close to what the PC does as possible, or ask the user to have the PC call the POP that the WebTV box is calling. They don't need to log in, just call the POP with the PC modem and watch the connect rate.


There's more to POP quality than just modem connect speed. Everything that the box receives has to be sent from our servers, across either the Internet or a private network connection to the IAP, from the IAP to the terminal server at the POP, then out through the modem and down to the user's box. The modem speed is a good place to start, but it's also important to consider the network performance.

It's difficult to get a simple performance number out of the network connections, because they may hit peaks where traffic grinds to a crawl for short periods, may exhibit spasmodic behavior with bursts of activity followed by long periods of silence, or may just move at a steady snail's pace. The easiest way to check the performance is to try to download a large image file (say a 150K GIF or JPEG) and see how long it takes to arrive. This feature is also provided by Vend-A-Telly.
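To turn a timed download into a number you can compare against the connect rate, the arithmetic is just bits over seconds. This Python sketch ignores protocol overhead and any compression, so treat the results as ballpark figures; the function names are made up.

```python
def effective_bps(size_bytes: int, seconds: float) -> float:
    """Observed throughput, in bits per second, for a timed download."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes * 8 / seconds


def ideal_seconds(size_bytes: int, connect_bps: float) -> float:
    """Best-case transfer time at a given connect rate, ignoring all overhead."""
    if connect_bps <= 0:
        raise ValueError("connect rate must be positive")
    return size_bytes * 8 / connect_bps
```

For a 150K (153,600-byte) image on a 28800bps connection, ideal_seconds gives a bit under 43 seconds. If the download actually takes several minutes, the bottleneck is probably somewhere other than the modem link.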


An issue related to POP performance is line drops. There are a number of reasons why the box might suddenly disconnect from the service, some of which are discussed in a later section on "idle timeouts". Disabling or reducing the sensitivity of call waiting in the Dialing Options screen resolves most problems with unexpected disconnects.

The cause of some of our troubles with call waiting is that the box doesn't detect the call waiting "bong" accurately. Any substantial disruption, such as a random burst of noise on the line or somebody picking up an extension phone, will be interpreted as an incoming call. Adjusting the sensitivity setting will reduce false positives and missed calls, but for many customers the system is not 100% reliable, and never will be with the modems built into WebTV "Classic" boxes. It appears that "Plus" boxes will be similarly unreliable. See also "LineShare is a Harsh Mistress".

Some line drops don't go away with the call waiting setting. There have been cases where the IAP's modems dropped the connection when a significant amount of line noise was detected, regardless of the setting on the WebTV box. This can usually be corrected by the IAP.


"Black hole" is the WebTV term for a POP that accepts modem connections but is unable to carry network traffic between the box and service. The tellyscript believes it has made a successful connection, but the box is unable to do anything after getting connected. Early boxes (pre-client 1.1) would connect to black hole POPs and stay there until disconnected by a timeout or an impatient user. As of client 1.1, the box will try to connect to the service for a minute and a half. If it is unable to get a response from the headwaiter in that time, it will disconnect, then restart the tellyscript at the point where it left off.

(There was a fun bug related to black holes, where a box would get connected successfully but not realize it. This usually happened during registration. After being connected for about a minute and a half, the box would spontaneously disconnect and redial the service.)


More information on diagnosing and correcting the above should be available from Customer Care. This document is long enough without having a complete troubleshooting guide in it as well.

 


-III- Service Mechanisms

-III.A- Dial overrides, Satan, and you

Dial overrides are a quick and easy way to send somebody to a particular number with a specific dial pattern. Unfortunately they're a little too easy. They can solve a problem (or at least placate a customer) quickly, but they don't go away when the underlying problem gets solved. In general dial overrides are a Bad Thing, and alternate solutions should be used whenever possible.

In the early days of the service, there was no such thing as a dial override. Because there was no quick solution, the problems were fixed in other ways, or were analyzed until it was determined that the problem was unrelated to the POP number being dialed. This was time-consuming but very effective at identifying the root cause of problems.

The issue that drove the existence of dial overrides was that some customers bought special calling plans through their phone company that allowed them to call a specific region or number for a flat rate per month. If the PhoneDB got updated, and their primary number changed, they would no longer be dialing the preferred number. We needed a way to send people to a specific area.

The initial solution wasn't pretty, but it was the best that could be done with the available facilities: the user's ANI of record was changed to an NPA/NXX that had the target POP as the primary. Since there were only two IAPs (cnc and uunet), and load balancing was a distant dream, this worked fairly well. Unless, of course, the box lost power, and the user said "yes, I've moved".

Clearly we needed something else. The first version of dial overrides was added a few hours after a service release had frozen, because by consensus it had been placed on the C-grade "would be nice" list, and wasn't really supposed to be done at all. Consequently it was done in a big hurry, in mid-November 1996. The database held a single override that had an ANI, a provider name, and the exact string of digits needed for dialing the POP. If the ANI matched, we sent a tellyscript for that POP and provider, complete with a warning dialog. This mechanism quickly became popular, and eventually support for it was added to the CMR tool.

With a little experience it became clear that the mechanism was insufficient. You couldn't put in an override for a box behind a store's PBX, because the ANI value might be different each time the box logged in. You couldn't override to an 800 number because the warning dialog would show the 800 number (this is a bad thing, as explained in a later section). The override didn't go away if the POP went away. And you couldn't have the override go dormant if local coverage was added.

The second generation of dial overrides provided for these, mostly. It was again done at the last minute and at a low priority, in January 1997. Nearly a year later the CMR tool still couldn't parse the new format, and some of the features -- like disabling the override when the POP goes away -- weren't implemented. The only way to do the new-style overrides is with "clientpopedit" (the first version of which, incidentally, was a truly frightening piece of work).

The third generation of dial overrides was designed with ample time and at a high priority, and added to the service in mid-1999.  All requested features were implemented, including expiration dates, "negative" overrides, and the flags that were defined but not supported in the previous generation.


As of the Etude service, you can specify more than one dial override for a customer.   These can be tied to different values for the ANI -- perhaps they have one override at home and one override for when they're at Grandma's house -- or several of them can be for the same place.  If the latter, each override will be attempted in turn, up to a limit specified in the config file (see the discussion of dial sequence templates in III.C for details).
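Here is a minimal Python sketch of picking the overrides that apply to a given call. The record fields follow the description of the first-generation override (ANI, provider, dial string); using None to mean "doesn't depend on ANI", and the names DialOverride, overrides_for, and max_overrides, are assumptions for illustration and not the service's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DialOverride:
    ani: Optional[str]   # ANI this override is tied to; None = any ANI (assumed)
    provider: str        # IAP name, e.g. "cnc" or "uunet"
    dial_string: str     # exact digits to dial


def overrides_for(ani: str, overrides: List[DialOverride],
                  max_overrides: int) -> List[DialOverride]:
    """Overrides to attempt, in stored order, capped at the configured limit."""
    matches = [o for o in overrides if o.ani is None or o.ani == ani]
    return matches[:max_overrides]
```

A customer with one override for home and one for Grandma's house gets only the entry matching the ANI they are calling from; several overrides for the same ANI are tried in turn until the cap is reached.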

One interesting "feature" of overrides is that they are bound to a box, not to a subscriber. If a user swaps a box because of defects and has their account moved over, the dial override doesn't move with them. This isn't necessarily a bad thing, because the dial override might have been entered as part of diagnosing a problematic box. When the old box is "unregistered" prior to adding a new account, the dial override is purged automatically.  Be very careful when handing dial overrides to boxes that haven't yet registered!  If the box goes back to the store, it won't need to be unregistered, and the dial override won't get purged automatically.   For this reason, entering a dial override that doesn't depend on ANI is a bad idea.


Whatever fancy features are available to manage overrides, the rule of thumb remains: don't use them unless you absolutely need to. And the only valid reason for needing to is for users with specific calling plans that we can't take into account otherwise.

Some common abuses of dial overrides are:

Like the saying says, "if you don't have time to do it right, when will you have time to do it over?"  Every dial override that gets added also has to get removed, because sooner or later that POP will go away or more local numbers will be added or whatever. If everybody gets overridden to POP #2 when POP #1 gets congested, the load balancing algorithms can't do their work, and pretty soon POP #2 is going to be congested and all those people are going to be calling you all over again.  Expiration dates and "don't use the override if the POP goes away" will reduce the amount of maintenance, but if customers believe they can get a "quick fix" by calling, they will call every time the network hiccups.

Avoid quick fixes that just postpone the inevitable.

 

-III.B- Introduction to tellyscripts

A tellyscript is a C-like program that is interpreted by the box. Its most important and most obvious function is to tell the box what numbers to dial, but it does a lot of other work besides.

Most communication software uses what are known as "send/expect" scripts. Send/expect scripts send a particular string, and then expect a certain response. The MacPPP configuration is a simple example: generally you send a dial string, expect the word "Login:", send your user name, expect "Password:", and then send your password. The fancier versions will allow you to expect one of several different responses, and perform different actions based on what you get back.
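For anyone who hasn't seen one, the plain send/expect idea can be sketched in a few lines of Python. This is a toy, not a tellyscript and not MacPPP: FakeLine stands in for a modem by replaying canned responses, and all names and strings below are invented for the example.

```python
class FakeLine:
    """Pretend modem line that replays a fixed list of responses."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.sent = []

    def send(self, text):
        self.sent.append(text)

    def read(self):
        # Return the next canned response, or an empty string when we run out.
        return self._responses.pop(0) if self._responses else ""


def run_send_expect(line, steps):
    """Run (send_text, expected_substring) steps; expected may be None.

    Returns True if every expectation was met, False at the first mismatch.
    """
    for send_text, expected in steps:
        line.send(send_text)
        if expected is None:
            continue  # nothing to wait for after this send
        if expected not in line.read():
            return False
    return True


LOGIN_STEPS = [
    ("ATDT5550100\r", "Login:"),   # dial, then wait for the login prompt
    ("joe\r", "Password:"),        # user name, then wait for the password prompt
    ("secret\r", None),            # password; nothing further expected
]
```

Running LOGIN_STEPS against a line that answers "NO CARRIER" stops at the first step, which is about all a simple send/expect script can do. Reacting differently to different failures is where the C interpreter earns its keep.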

Andy Rubin thought this was a little simple-minded, so he combined the send/expect concept with a minimal C interpreter, and named the result after a product from his former company (General Magic). The result was a program that could do all the usual sending and expecting, but with the flexibility of C code.

The current batch of tellyscripts will:

Each tellyscript is divided into four sections. The pieces are combined on the service, and the full script is then tokenized and compressed before being sent to the client. On disk, the files are named ".tsf", which stands for TellyScript Fragment. The four sections are:

base.tsf
Common functions.
locale.tsf
Country-specific features (e.g. Japanese connect messages).
<iap>.tsf
One or more tellyscript fragments, one per IAP. These are named after the IAP, so CNC's .tsf file would be called cnc.tsf. These are very short; usually they just have the IAP's Radius login info.
<generated>
Tellyscript code generated on the fly. This is where the actual phone numbers and "this may be a toll call" warnings go.

The combined size of the four sections is about 45K when in C code form. This boils down to about 12K when tokenized, and 5K when compressed.  WebTV "Classic" boxes are limited to 16K of space for the tellyscript, because that's all that the boot ROM (based on the 1.0 client) can hold.

In late August 1998, we discovered that some of the tellyscripts we were generating were too large for the box to handle.  The situation involved scripts that had two months' worth of POPtimized data and toll warning messages for three or more POPs.   The situation was rare, but not rare enough.  The Funk service was given the ability to use C-style "ifdefs" to exclude LC2-only sections of the script, reducing the FCS script size significantly, and the tellyscript generator was changed to retry without POPtimization when the script overflows.

It should be mentioned that service-side detection of script overflow is somewhat speculative.  The 16K of space available to the script is used for the code, heap, and stack.  It's not possible for the service to know exactly how much stack and heap the script will use at its peak, so it uses a conservative hard-coded notion of how much space a script really has.
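The "conservative guess, then retry without POPtimization" behavior can be sketched as follows. This is an illustration only: the 16K figure comes from the text, but the size of the heap/stack reserve, the assumption that the tokenized size is what gets compared, and the function names are all hypothetical.

```python
SCRIPT_SPACE = 16 * 1024          # from the text: 16K on "Classic" boxes
HEAP_STACK_RESERVE = 3 * 1024     # hypothetical; the real number isn't given


def fits(tokenized_size):
    """Conservative check: code plus a fixed reserve must fit in 16K."""
    return tokenized_size + HEAP_STACK_RESERVE <= SCRIPT_SPACE


def choose_script(build):
    """Try the POPtimized script first, then fall back to a plain one.

    `build` is a caller-supplied function taking a bool (include
    POPtimization data?) and returning the tokenized script as bytes.
    Returns (script, poptimized_flag).
    """
    for poptimized in (True, False):
        script = build(poptimized)
        if fits(len(script)):
            return script, poptimized
    raise ValueError("script overflows even without POPtimization")


def fake_build(poptimized):
    # Pretend the POPtimized script is 14000 bytes and the plain one 12000.
    return b"x" * (14000 if poptimized else 12000)


script, used_poptimization = choose_script(fake_build)
```

Because the service can't know the script's peak heap and stack use, the reserve has to err on the large side; a script that would have just squeaked by gets rejected rather than risking a crash on the box.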

Script overflow is a decidedly dangerous occurrence, because tellyscript crashes almost always crash the box.  When an FCS box is preparing to do a flash download, it sets a "start download on next reboot" flag in NVRAM, and then reboots. Upon booting, it immediately executes the tellyscript.  Because there is no opportunity to enter a secret code, there is no way to unset the "start download" flag or remove the tellyscript from NVRAM. A box with a bad tellyscript that is told to do a download cannot be recovered without attaching a Pekoe development unit to it.


When the service sends a script down, it saves a blob of information in the service that looks like this (broken in half for readability):

0x358856F8-0x1892aa20-base:54:-|locale:3:-|_F_wpb:0:3261095|_H_uunetdan:0:6872187|
  _F_wpb:0:3261095|_H_uunetdan:0:6872187|_H_ziplink:0:2230067|_H_ziplink:0:16502230067

Translated into human-readable form, it looks like this:

    Hash 0x1892aa20, sent Wed Jun 17 16:53:28 1998
    v54 - base/-
    v3  - locale/-
    v0  F wpb/3261095
    v0  H uunetdan/6872187
    v0  F wpb/3261095
    v0  H uunetdan/6872187
    v0  H ziplink/2230067
    v0  H ziplink/16502230067

The "vN" part tells you what version of the base scripts was sent down. We gave the user version 54 of base.tsf, and version 3 of locale.tsf. There are no versions recorded for individual providers, like wpb, uunetdan, and ziplink, so they are always displayed as zero (before Etude, there were "real" numbers there). The letter after the version is the POP billing type ('F'=flat rate, 'P'=per-port, 'H'=hourly rate, 'G'=generic). The "sent Wed Jun ..." part tells you when the script was sent down.

The numbers after the providers' names show you the exact string of digits that the box is going to dial. In the example, the user has the wpb/650-326-1095, uunetdan/650-687-2187, and ziplink/650-223-0067 POPs. He will use 7-digit dialing on the wpb and uunetdan attempts, but will try 7-digit dialing on the first ziplink attempt and 11 on the second. This user has apparently established a 7-digit dialing pattern for the wpb and uunetdan POPs, but hasn't yet determined the pattern to use for the ziplink POP.

If the user were given a toll warning message, the first line for the provider would look something like this:

    v0  F wpb/3261095 {toll warning sent}

and "_F_wpb:1:3261095" would instead be "_WF_wpb:1:3261095" (i.e. with a 'W' up front).
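If you ever need to pick one of these blobs apart by hand, the following Python sketch shows one way to do it. It is not the service's code. The field layout is inferred from the example above, and the reading of the first field as a hex Unix timestamp is an inference too (it decodes to 23:53:28 UTC on June 17, 1998, which is 16:53:28 Pacific Daylight Time).

```python
import re
from datetime import datetime, timezone

# The example blob from the text, with the two halves rejoined.
BLOB = (
    "0x358856F8-0x1892aa20-base:54:-|locale:3:-|_F_wpb:0:3261095|"
    "_H_uunetdan:0:6872187|_F_wpb:0:3261095|_H_uunetdan:0:6872187|"
    "_H_ziplink:0:2230067|_H_ziplink:0:16502230067"
)

# Provider entries look like _F_wpb or _WF_wpb: optional 'W' (toll warning
# sent), then the billing type letter, then the provider name.
PROVIDER_RE = re.compile(r"^_(W?)([FPHG])_(.+)$")


def parse_blob(blob):
    """Split a script-info blob into timestamp, hash, and entries."""
    sent_hex, hash_hex, body = blob.split("-", 2)
    entries = []
    for item in body.split("|"):
        name, version, digits = item.split(":")
        match = PROVIDER_RE.match(name)
        if match:
            warned = bool(match.group(1))
            billing = match.group(2)
            name = match.group(3)
        else:
            warned, billing = False, "-"   # base/locale fragments
        entries.append({
            "name": name,
            "version": int(version),
            "billing": billing,
            "digits": digits,
            "toll_warning": warned,
        })
    return {
        "sent": int(sent_hex, 16),
        "hash": int(hash_hex, 16),
        "entries": entries,
    }


info = parse_blob(BLOB)
sent_utc = datetime.fromtimestamp(info["sent"], tz=timezone.utc)
```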

The "Hash 0x1892aa20" part is the key to getting tellyscripts updated. This number is a (hopefully unique) representation of the big blob. It's sent down to the box with the tellyscript and handed back up on every connection. When the box reaches the headwaiter, we recompute the tellyscript that they should have, and compare the new hash value with the box's hash value. If any part of the blob changes, the new "hash" value will be different, and we know that they need a new script.   (Technically, the big blob you see isn't the one we hash.  The

This means that if a provider or dial pattern changes, a tellyscript fragment gets updated, or a toll warning dialog is added or removed, the service will automatically send the box a new tellyscript. Since the box tells the service what it has, there's no risk of the service thinking that the box has a different tellyscript than it actually has. (Which, incidentally, would otherwise be a real problem, because the box doesn't save the tellyscript into NVRAM until the box is powered off with the remote control or keyboard. If the box crashes or loses AC power before the tellyscript is written, or the user hits the reset button on a "Classic" box, the previous tellyscript will be used on the next connect. For this reason, the service tracks the two most recently sent tellyscripts.)
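The update check boils down to "hash what the box should have, compare with what the box says it has". Here is a rough Python sketch of that idea, including the two-deep history. The real hash function isn't described in this document, so zlib.crc32 is used purely as a stand-in, and the class and function names are invented.

```python
import zlib


def script_hash(description):
    """Stand-in hash over a script description (NOT the service's hash)."""
    return zlib.crc32(description.encode("ascii"))


def needs_new_script(box_hash, current_description):
    """True if the box's reported hash doesn't match what it should have."""
    return box_hash != script_hash(current_description)


class SentScripts:
    """Remembers the two most recently sent script descriptions."""

    def __init__(self):
        self.recent = []   # newest first, at most two entries

    def record(self, description):
        self.recent = ([description] + self.recent)[:2]

    def identify(self, box_hash):
        """Return the remembered description the box is running, if any."""
        for description in self.recent:
            if script_hash(description) == box_hash:
                return description
        return None


old = "base:54:-|locale:3:-|_F_wpb:0:3261095"
new = "base:55:-|locale:3:-|_F_wpb:0:3261095"   # base.tsf got updated

history = SentScripts()
history.record(old)
history.record(new)

# The box lost power before saving the new script, so it reports the old hash.
reported = script_hash(old)
still_running = history.identify(reported)
must_resend = needs_new_script(reported, new)
```

Keeping two entries is what lets the service recognize a box that fell back to its previous script instead of treating the reported hash as garbage.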

Most people don't need to understand the above in depth. Either trust that the system works, or read the above until you're convinced (one way or the other).

We currently don't send new tellyscripts on reconnects (i.e. when the box was connected to the service, disconnected, and then reconnected by hitting the "reconnect" button).  This feature was initially added for Funk, but it was determined that the 2.2 client completely ignored the script.  To prevent the server from updating the script-info fields when the box wasn't updated, the feature was removed.

 

-III.C- Dial sequences and the fallback number

Once we've chosen the POPs and checked the available dial patterns, we have to dial the phone. We know which POP to try first, but should we do the first POP twice in a row and then do the second, or alternate between the first and second? What if we have one POP or three POPs?

The dial sequence depends on how many POPs they have and what kind of call each is. In every script, we bail out when we connect successfully or if we are unable to detect dialtone before dialing. "Black holes", where we connect successfully but then are unable to talk to the WebTV service, are handled specially (see the end of section II.F).

Before the Etude service, we were locked into a specific call order.  With Etude, the call sequence is configurable.  The default configuration works just like the hard-coded values from previous service releases, except that it can now take advantage of three local POPs.  The settings, defined by DialSequenceTemplate lines in the arcadia.cf config file, look like this:

    DialSequenceTemplate 1l 1a,1b,1a,f
    DialSequenceTemplate 1t 1a,1b,1a,f
    DialSequenceTemplate 2ll 1a,2a,1b/1a,2b/2a,f
    DialSequenceTemplate 2lt 1a,1b/1a,2a,2b,f
    DialSequenceTemplate 2tt 1a,1b/1a,2a,2b,f
    DialSequenceTemplate 3lll 1a,2a,1b/1a,2b/2a,3a,3b
    DialSequenceTemplate 3llt 1a,2a,1b/1a,2b/2a,f
    DialSequenceTemplate 3ltt 1a,1b/1a,2a,2b/2a,f
    DialSequenceTemplate 3ttt 1a,1b/1a,2a,2b/2a,f

Each line has two parts, the call type identifier (the "3llt" part) and the call order definition (the rest of the line).  The call type identifier consists of the number of calls (1, 2, 3, or 4) followed by an 'l' or 't' for each call.  For example, "3llt" means three calls, the first two of which are local.  You aren't allowed to specify just one member of a set, so if you list "3lll" you must also specify "3llt", "3ltt", and "3ttt".  Local calls are always tried first, so there is no need for something like "3tlt".

A tellyscript will have at most as many different POPs as are possible with the highest-numbered call type identifier.  If the only lines in the config file are "1l", "1t", "2ll", "2lt", and "2tt", then users will get at most two POPs.  If we want to start delivering tellyscripts that allow up to four POPs, all we have to do is add appropriate DialSequenceTemplate lines (4llll, 4lllt, etc).

When the tellyscript generation functions are assigning POPs, they look at how many total POPs are available (usually eight) and how many of them are local calls.  If they have the usual eight POPs, and two of them are local, the tellyscript generator will use the dial sequence template "3llt".  If none were local, "3ttt" would be used.
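Picking the call type identifier is simple enough to show in a few lines of Python. This is a sketch of the rule as described above, not the generator's actual code; the function name and the max_pops parameter (standing in for the highest-numbered identifier in the config file) are made up.

```python
def call_type_id(local_flags, max_pops=3):
    """Build an identifier like "3llt" from a list of POPs.

    local_flags has one bool per available POP (True = local call).
    Local calls always come first, so only the counts matter.
    """
    used = min(len(local_flags), max_pops)
    local_count = min(sum(1 for is_local in local_flags if is_local), used)
    return str(used) + "l" * local_count + "t" * (used - local_count)


eight_pops_two_local = call_type_id([True, True] + [False] * 6)
eight_pops_none_local = call_type_id([False] * 8)
```

Note that, per the AllTollNoRoll discussion later in this section, ExpLocal calls count as 't' for this purpose.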

The second half of the line defines the call sequence.  The numbers you see in the example above refer to the POP number (1, 2, 3, 4, or 'f' for failover).  The 'a' or 'b' refers to the dial pattern.  There are one or two dial patterns that we will try for every POP.  If we have established a specific dial pattern for the user, we will always use that one.  The 'a' in the DialSequenceTemplate line means "use the dial pattern that we know works for this user and this POP, or use the first one that we would normally try".  The 'b' means "use the alternate pattern that we would normally try, if it exists".  Each dial attempt is separated by a comma.  Slashes indicate that one or the other should be tried, so "2b/2a" means "dial POP #2 with the alternate pattern, if one exists; if only one pattern exists for this POP, then dial POP #2 with the primary pattern".   If "2b" is sitting all by itself between commas, and there is no alternate dial pattern, the entry is skipped.

Some examples are in order.  If we only have one POP, we will use the "1l" or "1t" template, both of which are set to "1a,1b,1a,f":

  1. try number
  2. IF we have a secondary dial pattern, try it; otherwise skip this step
  3. retry number
  4. call 800 fallback

If we have two POPs, and both are local, we use "2ll", which is "1a,2a,1b/1a,2b/2a,f":

  1. try pop#1
  2. try pop#2
  3. retry pop#1, using secondary dial pattern if it exists
  4. retry pop#2, using secondary dial pattern if it exists
  5. call 800 fallback

If we have two POPs, and one or both are toll, we use "2lt" or "2tt", which are set to "1a,1b/1a,2a,2b,f":

  1. try pop#1
  2. retry pop#1, using secondary dial pattern if it exists
  3. try pop#2
  4. IF we have a secondary dial pattern for pop#2, try it; otherwise skip
  5. call 800 fallback

The examples above are unchanged from the pre-Etude behavior.  With Etude, we now have different behavior for users with three or more local POPs.  They will use "3lll", which is set to "1a,2a,1b/1a,2b/2a,3a,3b":

  1. try pop#1
  2. try pop#2
  3. retry pop#1, using secondary dial pattern if it exists
  4. retry pop#2, using secondary dial pattern if it exists
  5. try pop#3
  6. IF we have a secondary dial pattern for pop#3, try it; otherwise skip

The instances where we have lots of POPs but fewer than three are local (3llt, 3ltt, 3ttt) are configured to be similar to the two-POP cases (2ll, 2lt, 2tt).  The third POP isn't used. NOTE: there is no fallback number for users with the "3lll" dial sequence template.  If all three of their local POPs are congested, chances are the fallback number is congested as well.  Avoiding the fallback saves us money.

In no case do we try more than 6 numbers, and we don't try a more expensive number more than once unless we're trying to figure out what the correct dialing pattern is.
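To make the comma/slash rules concrete, here is a small Python sketch that expands a call order definition into the list of dial attempts. It follows the rules as described in this section and is not taken from the tellyscript generator; the function name and the shape of the `patterns` argument are assumptions.

```python
def expand_sequence(order, patterns):
    """Expand a call order such as "1a,2a,1b/1a,2b/2a,f".

    `patterns` maps POP number -> list of one or two dial strings, with
    the primary (or established) pattern first. Returns a list of
    (pop_number, dial_string) tuples, with "fallback" for the 'f' entry.
    """
    attempts = []
    for step in order.split(","):
        # Within a step, take the first alternative that actually exists.
        for choice in step.split("/"):
            if choice == "f":
                attempts.append("fallback")
                break
            pop = int(choice[:-1])
            index = 0 if choice[-1] == "a" else 1
            dial_strings = patterns.get(pop, [])
            if index < len(dial_strings):
                attempts.append((pop, dial_strings[index]))
                break
        # If nothing matched (e.g. a lone "2b" with no alternate
        # pattern), the step is skipped.
    return attempts


# POP #1 has an established 7-digit pattern; POP #2 still has two candidates.
patterns = {1: ["3261095"], 2: ["2230067", "16502230067"]}
two_local = expand_sequence("1a,2a,1b/1a,2b/2a,f", patterns)
one_pop = expand_sequence("1a,1b,1a,f", {1: ["3261095"]})
```

With these inputs, the "2ll" order retries POP #1 with its only pattern and POP #2 with its 11-digit alternate, and the one-POP order skips the lone "1b" step entirely.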

Dial overrides obey the dial sequence templates, just like normal POPs do.  If two entries in a box's dial override list apply, they are fed into the tellyscript generation process as two separate local POPs.  Whatever dial sequence applies to "2ll" will apply to the override.  One override would use "1l", three would use "3lll", and any overrides after the third won't get used.

 

We show the toll warning dialog before the first time we call an ExpLocal or toll POP. The warning contains the number to be dialed and the city name where the POP lives, using the "nice" form of the city name.


The toll-free fallback number, sometimes called "fallover" or "failover", has been around since the early days of dialing. The idea was to prevent certain kinds of failures, such as POP outages or number assignment glitches, from giving the service a bad name.

It is important to remember that nowhere in the Terms of Service does it guarantee connectivity, and we have never promised customers that they would have unlimited toll-free access at our expense. The fallback number is supported as a courtesy, and may go away or have its use restricted at any time and without notice.


The 800 fallback number will be omitted in certain circumstances. The most significant one is called the "AllTollNoRoll" feature. It was added because some users without local POPs had, strangely, neglected to order long distance service on the phone line connected to their WebTV box. Every POP number would fail, until the box called the fallback number. The easiest way to avoid this situation was to leave the fallback number out of tellyscripts for users with nothing but toll calls.

A similar situation existed for a customer with phone service that only allowed calls to 800 numbers and 911 (Universal Lifeline Service?). In this case, not even local calls could be made, so despite having two local POPs the user ended up on the fallback number every time. The cure for such users (besides asking them to get a real phone line) is the "disable fallback" flag in the customer's account. It should be possible to set this from the CMR tool.

Of course, it's always possible for users to disrupt the dialing sequence several times until the box dials the 800 number. For most people this is unnecessary and inconvenient: if they didn't have (in CCMI's and our opinion) a local call, they wouldn't have the fallback number in their script, so either they'll never get to the fallback number or they're trying really hard to avoid making local calls. We can identify such users through usage reports, and deal with them on an individual basis as necessary.

You might be tempted to get rid of the "all toll no roll" feature now that we have the DialSequenceTemplates.  After all, you could just drop the 'f' from the 1t, 2tt, and 3ttt lines.  However, AllTollNoRoll only drops the fallback from people who have nothing but toll calls.  If you have one ExpLocal call and two toll calls, you are still allowed to call the fallback number.  With the DialSequenceTemplate, one ExpLocal and two toll calls evaluates to "3ttt", just like three toll calls would.
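The distinction is easier to see in code. The Python sketch below combines the three reasons described in this section for leaving the fallback out; the function name, the "local"/"explocal"/"toll" labels, and the argument layout are invented for illustration and don't reflect the service's data structures.

```python
def fallback_included(pop_kinds, call_order, fallback_disallowed=False):
    """Decide whether the 800 fallback ends up in the tellyscript.

    pop_kinds: one of "local", "explocal", or "toll" per POP in the script.
    call_order: the call order definition, e.g. "1a,1b/1a,2a,2b/2a,f".
    fallback_disallowed: the per-account "disable fallback" flag.
    """
    if fallback_disallowed:
        return False
    if "f" not in call_order.split(","):
        return False            # e.g. the "3lll" template has no 'f'
    if pop_kinds and all(kind == "toll" for kind in pop_kinds):
        return False            # AllTollNoRoll: nothing but toll calls
    return True


ORDER_3TTT = "1a,1b/1a,2a,2b/2a,f"
all_toll = fallback_included(["toll", "toll", "toll"], ORDER_3TTT)
one_explocal = fallback_included(["explocal", "toll", "toll"], ORDER_3TTT)
```

Both users evaluate to "3ttt", but only the second keeps the fallback, which is exactly what dropping the 'f' from the template could not express.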

A related feature in the service is the 800 fallback usage cap. This is explained later.

Allowing calls on the fallback number to be billed at an hourly rate for customers without local POPs has been suggested. It may be implemented in a future release of the service.

 

-III.D- The "clientinfo" command

The "clientinfo" tool is a UNIX shell command. It got its name because the database DEVICE table entries are referred to as "Client" structures in the service. The tool was written as a quick way to dump certain fields from the Client structure, but it has grown beyond that.

(For those of you not up on your database lingo, the "device" entry is linked to a physical box, and has a "subscriber" associated with it. When you move a user's account from one box to another, you are changing the link to make the subscriber associated with a different device. The device entry is usually created as part of the manufacturing process so that we can get the back-of-unit serial numbers into the database, but if it doesn't exist it will be created by scriptlessd when the box first connects. The subscriber is always created by registerd when registration is complete.)

There are several sections in the clientinfo output. The first is the PhoneDB version info:

PhoneDB v32 (built Mon Jun 15 23:50:43 1998 by uid=1637)
 USA      hash=0xe0150e09 (built Mon Jun 15 23:31:23 1998 by uid=1637)
   standard (built Mon Jun 15 23:31:24 1998 by uid=1637)
     Features: [CCMI] [com] [ld-avg] [wlca] [zd] [lec]
     PhoneDB-etude-standard.1998.06.15.v32
   mci (built Mon Jun 15 20:37:23 1998 by uid=1637)
     Features: [CCMI] [com] [ld-avg] [wlca] [zd] [lec]
     PhoneDB-etude-mci.1998.06.15.v32
 Japan    hash=0xeb699e85 (built Wed May 27 15:01:15 1998 by uid=1057)
   standard (built Wed May 27 15:01:15 1998 by uid=1057)
    PhoneDB_ShowHeaderInfo()  -/-
    PhoneDB.c:421             (unknown)[6691,0]                 06/17 17:55:03

This tells you what version the PhoneDB is, when it was built, who built it, and what features were enabled. You don't usually need to worry about this, but keep an eye out for bad dates. If the "built" date on the first line is more than a couple of weeks old, you may be looking at stale data.  The PhoneDB used to generate the above has the standard POP assignments for the USA, the MCI co-branded service POP assignments for the USA, and the standard POP assignments for Japan.  (The use of "USA" is actually a misnomer; it should be "NANP", since it encompasses all of the North American Numbering Plan.  Canadian POP assignments are included as part of the "USA" data.  The terminology may be corrected in a future service release.)

Next comes the options header:

--- Client info for serial '01100f7401000004' ---

  ANI ....................... 00 650-614-5539 (PALO ALTO, CA)
  Device ID ................. 58306
  Tellyscript User ID ....... 'f0c68d5e1'
  Shared secret ............. 'mvDWLQ5i76M='
  CHAP secret ............... '9;jYRoKRbV;5I8hf1X'
  Script locked? ............ no
  Fallback disallowed? ...... no
  Revisit scriptlessd? ...... no
  Avoid non-local calls? .... no
  Okay to use ATT POPs? ..... no
  Okay to use 56K POPs? ..... yes
  Show VideoAd? ............. yes
  Call Waiting Threshold .... 1
  PSI account ............... 1
  AppROM/bootROM versions ... v3063/v2046
  ChipID/SubType/RomKind .... 0x03120000 / 'lc2' / 02012001
  Service tier .............. 1
  Last successful connect ... 326-1095
  Category .................. alpha

The fields are:

ANI
The caller's ANI, and our best guess about the city and state they're calling from.
Device ID
A unique identifier used when connecting to the service.  This replaces the "short silicon serial number", which was only guaranteed to be unique when it was based on a certain kind of serial number part.  The silicon serial numbers for the Sega Dreamcast product were going to be completely different, so we added these.
Tellyscript User ID
The "short silicon serial number".  This is the SSID without the identifying bits and checksum.  Manufacturer 0x0f (stored as 'f0' at the front of the ID) is reserved for "generic" SSIDs and simulators.
Shared secret
Set by the service on every login. Nothing to do with dialing... yet.
CHAP secret
Set by the service periodically (perhaps every 30 days). Think of this as a password used for CHAP-based POPs, like MCI's. A copy of the secret is embedded in the tellyscript, and used when connecting. If the CHAP secret held by the box and service get out of sync, the box will be unable to use a POP requiring CHAP authentication.
Script locked?
Set by scriptlessd. Indicates that scriptlessd handled the box specially for some reason, and doesn't want the headwaiter to send down a new tellyscript.
Fallback disallowed?
Set by SOC. Prevents the box from getting the fallback number in a tellyscript. Useful for abusive people who deliberately make their box fail so they can get access on an 800 number.
Revisit scriptlessd?
Set by SOC, cleared by headwaiterd. If set, the box will reboot and go back through scriptlessd the next time it connects to the headwaiter.
Avoid non-local calls?
Set by customer care. If a box has at least one local POP, it will refuse to dial non-local numbers and the fallback number.  If it has no local POPs, the setting of this flag is ignored.  First implemented in Funk.
Okay to use ATT POPs?
Set by somebody. A temporary feature, part of the MCI rollout. If this flag isn't set, MCI customers will use normal WebTV POPs instead of MCI POPs. The intent was to allow a gradual transition to MCI.  The field was renamed from "MCI" to "ATT" while we were working on the AT&T deal, but that fell through, so the flag currently serves no useful purpose.
Okay to use 56K POPs?
Set by SOC.  The description is slightly inaccurate; what this really does is determine whether or not the box is allowed to negotiate a 56K connection.  Even if set to "no", the box can still connect to a 56K POP, but it will do so at a maximum of 33.6Kbps (the maximum allowed by V.34).
Show VideoAd?
Set every time a tellyscript is generated. It indicates whether the user is eligible for a VideoAd, based on the set of POPs they have been given. (See also section IV.O.)
Call Waiting Threshold
Set when a phone log is posted. This was originally intended to be a way to play with different call waiting settings, but that never happened. Instead, it indicates the call waiting sensitivity value set on the box. The number may be from 1 to 4, with 1 meaning "most likely to hang up" and 4 meaning "most likely to ignore calls". This field does not indicate whether the user actually has the "lineshare" feature enabled. The box default is 1.
PSI account
Set once, when a PSI account is created. (See also section IV.G.)   [No longer used.]
AppROM/bootROM versions
The approm and bootrom versions that the box had when it last connected to the service.
ChipID/SubType/RomKind
Set whenever the box connects to the service (but not expected to change). The values from the ChipID register on the box, followed by our best guess as to whether it's an FCS or LC2 box, followed by a magic value that tells you exactly what kind of box it is. The CMR tool should have a lookup table that converts RomKind into something like "LC2 8MB with a hard drive and softmodem".
Service tier
Used for devices with multiple tiers of service, such as the Thomson eTV.
Last successful connect
Set when a phone log is posted. This holds the contents of the "dialString" field from the last DialInSuccess record found in the phone log. More simply, this is what the user dialed the last time he got into the service. The prefix fields (like "dial 9") aren't included.
Category
Set by CMR or other database tools. This is the user's category. Note there are two copies of the category, one in the "device", and one in the "subscriber". If the box in question has a subscriber, and the category there doesn't match what's in the device, an appropriate message will be displayed.

Also, the "client info for serial" line will indicate if the user is enrolled in optional programs like MCI and OpenISP.

After that we see the tellyscript description. We saw one of these earlier:

  Most recent script sent to client:
    Hash 0x1892aa20, sent Wed Jun 17 16:43:01 1998
    v54 - base/-
    v3  - locale/-
    v0  F wpb/3261095
    v0  H uunetdan/6872187
    v0  F wpb/3261095
    v0  H uunetdan/6872187
    v0  H ziplink/2230067
    v0  H ziplink/16502230067
  Previous script sent to client:
    Hash 0x1892aa20, sent Mon Jun 15 14:03:35 1998
    v54 - base/-
    v3  - locale/-
    v0  F wpb/3261095
    v0  H uunetdan/6872187
    v0  F wpb/3261095
    v0  H uunetdan/6872187
    v0  H ziplink/2230067
    v0  H ziplink/16502230067

After that we have the set of known dial patterns:

  Established dialing patterns:
    ANI 650-614-5539 + POP 650-326-1095 --> mode=7-digit

Naturally we don't have any dial overrides, but if we did, one might look like this:

  Dialing overrides:
  N M ANI            POP          Bil Dlg Lnk ONL Provider    Digits
  - F/650-614-5539   650-326-1095  G   N   Y   N  wpb         '3261095'
                                  Modified: 1999/03/18  Expires: <never>

"ANI" is the caller's ANI, preceeded by a character that says how to compare against what's in the account.  'F' tells it to compare the Full number.   "POP" is the full 10-digit POP number.  "Bil" is what we tell the ISP to bill the call as; "Dlg" is set if a warning dialog should be sent; "Lnk" means the override should be linked to the POP, and should go away if the POP goes away; and "ONL" is set if the override should be used Only when the user has No Local POPs. The "Provider" field says who owns the POP, and "Digits" is the actual string of digits to use. "Modified" is the date the override was last changed by clientpopedit, and "Expires" is the expiration date.  The output is identical to what "clientpopedit" shows. You can find out more about dial overrides in the Dial Override Handbook.

After this comes the load-balanced POP assignments for this user (you can see the non-load-balanced version on the POP-O-Rama page):

    wpb/650-326-1095 (PALO ALTO, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 326-1095 then 1-650-326-1095)  LOCAL
    uunetdan/650-687-2187 (PALO ALTO, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 687-2187 then 1-650-687-2187)  LOCAL
    ziplink/650-223-0067 (PALO ALTO, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 223-0067 then 1-650-223-0067)  LOCAL
    cnc/650-687-0610 (PALO ALTO, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 687-0610 then 1-650-687-0610)  LOCAL
    pbi/650-363-1099 (REDWOOD CY, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 363-1099 then 1-650-363-1099)  LOCAL*
    psi/650-390-0900 (MOUNTAINVW, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 390-0900 then 1-650-390-0900)  LOCAL*
    cnc/650-423-0025 (REDWOOD CY, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 423-0025 then 1-650-423-0025)  LOCAL
    cnc/650-481-0896 (REDWOOD CY, CA) 0.0mi  cost=0 (wc=0.0mi)
      (tries 481-0896 then 1-650-481-0896)  LOCAL

(In Etude, the cost and distance values ended up getting zeroed. They may return for the HipHop release. Until then, POP-O-Rama has the correct values.)

We currently compute eight entries for every NPA/NXX. The load balancing is explained later. Each pair of lines has most of the information included in the POP-O-Rama output, but in a slightly different format. See section II.E for information about POP-O-Rama output.

After this is the POPtimization data, when available:

  POPtimized assignments:
  MONTH Oct 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H
  MONTH Nov 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H

This is also explained later (sections IV.Q and V.A).

The nice thing about clientinfo is that it tells you what they are dialing, what they were dialing, and what they would be dialing, all in one place. POP-O-Rama can show you the set of POPs that the service has to choose from for a particular area, but can't tell you which ones will be given to a specific user, because the actual assignment depends on the box serial number and POPtimization data.


A potentially useful option for SOC folks is the "-t" flag, which causes clientinfo to write the tellyscript to stdout. If you want to see what tellyscript the user would get if they showed up right now, run "clientinfo -t {serial-number} > script.out". The output is tokenized but not compressed, so it's hard to read but you should still be able to find the phone numbers. "strings -a script.out" may be helpful. Note that there are always two copies of the phone number, a 10-digit version with dashes (e.g. 650-326-1095) and the actual number dialed with no dashes (e.g. 3261095). If you're trying to tell whether or not a dial pattern has taken hold, be sure you're looking at the right set of numbers.  The full "dash-enabled" version is only used for the phone log.
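If "strings" isn't handy, the same trick takes only a few lines of Python. This is a sketch of the general technique (pull printable ASCII runs out of a binary file, then look for dashed phone numbers); it knows nothing about the tokenized tellyscript format, and the sample bytes below are fabricated stand-ins for real clientinfo -t output.

```python
import re

# Runs of at least four printable ASCII characters, like "strings -a".
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# The 10-digit "dash-enabled" form, e.g. 650-326-1095.
DASHED_NUMBER = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def printable_runs(data):
    """Return the printable ASCII runs found in a blob of bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def dashed_numbers(data):
    """Return every dashed 10-digit number found in the printable runs."""
    found = []
    for run in printable_runs(data):
        found.extend(DASHED_NUMBER.findall(run))
    return found


# Made-up sample: binary junk around a dashed number and a dialed number.
sample = b"\x01\x02650-326-1095\x00\x033261095\x00\xff"
runs = printable_runs(sample)
numbers = dashed_numbers(sample)
```

To use it on real output, read the file in binary mode (open("script.out", "rb").read()) and pass the bytes in. Remember that the dashed form only tells you which POP is in the script; the undashed run is what actually gets dialed.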

 

-III.E- Vend-A-Telly

Vend-A-Telly is a web page attached to the "WebTV Tricks" page in the service. From there you can tell your box to dial any POP from any provider. You can even include modem AT commands as part of the dial string; these may override some of the features that are usually set by the box, so use only with caution.

The page should be used whenever a POP is suspected of being flaky or slow. You can enter the POP number, dial in, check the connect rate, and download a large test image to see if the network is slow.  As of the Grunge service, you can even have the phone log displayed when you reconnect.

If the POP is dead or deathly slow, DO NOT give the user a dial override unless you leave an open trouble ticket in Remedy that will allow somebody to remove the override when the POP gets better. Only when the override is removed should the matter be marked as "resolved". Network congestion is a fact of life; moving users between POPs will most likely just make the problem move with the users.

Troublesome POPs should be reported to the SOC.

 

-III.F- Visible Dialing

The current generation of WebTV boxes will display the phone number being dialed as part of the connection progress messages. In the early days, because of some weird sense of paranoia, the box didn't tell you what it was dialing. (This same paranoia accounted for the XXXXs over the last four digits of the phone numbers in the WebTV Phone Book on our web site.  Those vanished when Etude shipped.)

Version 1.1 and later clients support "visible dialing", where we show the phone number to the user as we dial. It got its name because there was concern that showing phone numbers was a user interface aberration, and people would become greatly disturbed if the deep inner workings of the box were revealed. For this reason we only displayed the phone number when "Audible Dialing" was turned on; hence the nickname "visible dialing".

As it happens, people really like knowing what the box is doing with their phone line, and are better able to identify local/toll problems before they get a huge phone bill. In some cases though we want to mask the phone number, such as when calling a toll-free number. Here are some examples:

visible dialing off (also v1.0 clients and "Classic" boxes doing upgrades):
"Dialing WebTV"
normal case:
"Dialing 16506145539"
normal case, with a prefix of "9":
"Dialing 9,16506145539"
dialing a toll-free POP (e.g. the fallback number)
"Dialing WebTV..."
access number "324-0657" used:
"Dialing A/N 324-0657"

Toll-free POPs have numbers starting with "1800" or "1888". (Yes, it's checked before the "remove leading 1" function is handled.) If someone puts in an override with clientpopedit that starts with "1-800" instead of "1800", the user is going to be able to see the number. Appropriately nasty warnings have been added to clientpopedit.

The call waiting disable prefix will also be shown. If you have too many numbers to display in the field, the end will be cut off, and "..." will be displayed.
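The display rules above can be summarized in a short Python sketch. This is a reconstruction from the examples in this section, not client code: the function name and parameters are invented, the order in which the toll-free and access-number checks happen is a guess, and the real field width isn't stated here, so it is left as an optional parameter.

```python
def progress_message(digits, prefix="", visible=True,
                     access_number=None, width=None):
    """Build the "Dialing ..." progress string.

    digits: the number being dialed, e.g. "16506145539".
    prefix: an outside-line prefix such as "9" (empty if none).
    visible: False for visible dialing off (and v1.0 clients).
    access_number: shown instead of the digits when one is used.
    width: hypothetical display limit; longer strings are cut off.
    """
    if not visible:
        return "Dialing WebTV"
    # Checked against the raw digits, before any "remove leading 1" step,
    # so "1-800..." with a dash would NOT be masked.
    if digits.startswith(("1800", "1888")):
        return "Dialing WebTV..."
    if access_number:
        return "Dialing A/N " + access_number
    shown = prefix + "," + digits if prefix else digits
    if width is not None and len(shown) > width:
        shown = shown[:width] + "..."
    return "Dialing " + shown


normal = progress_message("16506145539")
with_prefix = progress_message("16506145539", prefix="9")
```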

 


-IV- Dialing Details

-IV.A- PhoneDB details

You may have noticed when looking at POP-O-Rama that the POPs aren't always sorted in the order you'd expect. In a boring world we could sort by cost and distance and be done with it, but in the exciting world of WebTV we don't have that luxury.

The first complicating factor is the amount that the provider costs us to use. Some providers are less expensive than others, or simply have more capacity, and as a result are given a higher priority during PhoneDB generation. Some POPs from the same provider may be more expensive than others. This cost is sometimes referred to as a "static priority". If two calls have the same cost and MTS distance, we sort based on the provider cost.
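In other words, provider cost acts as a tie-breaker. A minimal Python sketch of that ordering follows; the Pop fields and the sample values are invented, and the real PhoneDB generator presumably looks at more than these three numbers.

```python
from typing import NamedTuple


class Pop(NamedTuple):
    name: str
    call_cost: int        # cost of the call to the user (0 = local)
    distance: float       # MTS distance in miles
    provider_cost: int    # "static priority"; lower = cheaper for us


def sort_pops(pops):
    """Sort by call cost, then distance, then provider cost."""
    return sorted(pops, key=lambda p: (p.call_cost, p.distance, p.provider_cost))


candidates = [
    Pop("pricey/650-000-0001", 0, 0.0, 5),
    Pop("cheap/650-000-0002", 0, 0.0, 1),
    Pop("toll/415-000-0003", 12, 30.0, 1),
]
ordered = sort_pops(candidates)
```

The two local POPs tie on cost and distance, so the cheaper provider comes first; the toll POP stays last no matter how cheap its provider is.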


A second factor is failure containment. If one of our major providers had a serious network outage affecting half the country, it wouldn't be very useful for a user to have a tellyscript with several POPs from the same provider. If a backbone gets backhoed, all the POPs are going to be useless. For this reason we try to hand out POPs from multiple providers whenever possible.

Priority is given to leaving the primary provider in place, but the later POPs are shuffled around freely so long as they are listed as LOCAL calls and the provider costs us the same amount. We try to get a mix of different providers in the first few POPs, so that users will have numbers from more than one IAP whenever possible. This is known as "provider interleaving".

Toll and ExpLocal calls aren't subject to provider interleaving.


One of the more troublesome aspects of all this POP shuffling is dealing with providers who charge us a flat rate per user. Every month, certain IAPs charge us a fixed amount for each user who touches their system, even if the user only logged in once. If we gave a flat-rate IAP as a secondary POP to a customer with a very good primary, and the primary failed once at any point during a month, we would have to pay the full charge for that user for that one call. Clearly, we only want to give flat-rate IAPs out as primaries.

This is where things start to get messy (it gets worse in the next section). Ensuring that the second POP isn't a flat-rate IAP can require making some tough choices. For example, suppose that the first three POPs listed for an NPA/NXX are an hourly-rate LOCAL, a flat-rate ExpLocal, and an hourly-rate toll. The initial POP layout looks like this:

  1. hourly-rate LOCAL
  2. flat-rate ExpLocal
  3. hourly-rate toll

However, we can't leave the flat-rate in the second position. We can't put it in the primary position, because that would take the local call away, and if we swap it with the toll call we replace a relatively inexpensive secondary with a nasty toll one. In cases like these, we do the latter.
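
A minimal sketch of that choice, assuming a POP is just a provider label, a call type, and a flat-rate flag. The class and function names are made up for illustration, and the real PhoneDB generator weighs many more factors than this.

```python
from dataclasses import dataclass


@dataclass
class Pop:
    label: str        # hypothetical description of the POP
    call_type: str    # "LOCAL", "ExpLocal", or "toll"
    flat_rate: bool


def fix_secondary(pops):
    """Return a copy of the list in which the 2nd POP is not flat-rate.

    The primary is never touched. A flat-rate POP in the second slot is
    swapped with the first later hourly-rate POP, even if that one is a
    more expensive call. If no hourly-rate POP follows, nothing changes.
    """
    result = list(pops)
    if len(result) < 2 or not result[1].flat_rate:
        return result
    for i in range(2, len(result)):
        if not result[i].flat_rate:
            result[1], result[i] = result[i], result[1]
            break
    return result
```

Fed the three-POP layout above, this yields hourly-rate LOCAL, hourly-rate toll, flat-rate ExpLocal.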

Because they can cause expensive calls to move ahead of less expensive ones, flat-rate IAPs are marked with an asterisk in POP-O-Rama output (i.e. "LOCAL*", "ExpLocal*", or "toll*").

Flat-rate IAP assignments have an unfortunate tendency to undo POP cost ordering, provider interleaving, and some of the load-balancing measures described in the next section. The problem was initially addressed with "hybrid" IAPs, which could be used as either flat-rate or hourly-rate. For hybrid-billed IAPs, we treated the call as flat rate if it was the primary POP, and hourly rate if it wasn't. This gave us the price savings of a flat-rate IAP with the flexibility of an hourly-rate IAP.

With Etude, we gained flexibility.  A provider can support an arbitrary combination of flat, hourly, and per-port.  The POPtimization data is able to specify the exact billing method to use for each call.  We have also managed to negotiate hourly-rate contracts with most of the flat-rate-only providers, so we are able to take advantage of the new-found freedom of choice.

 

-IV.B- Intro to POP load balancing and provider rotation

The previous section talked about how POPs may be shuffled while the PhoneDB is being created. There are some additional things that the service does before sending the POPs down to the client.


In some situations we have more than one local POP from the same provider. If we just used the assignments straight out of POP-O-Rama, we would end up sending everybody to the first POP listed, and nobody to the second. The second POP might end up in the tellyscript in the secondary slot, but there's a good chance that provider interleaving will put a local POP from another ISP into the secondary slot.  More than likely, the second POP from the primary IAP would simply go to waste. To correct this situation we use "provider rotation".

Provider rotation is a simple form of load balancing. If there are two local POPs from the same provider, it ensures that each will get no more than 50% of the traffic. If there are three, each gets 33%, and so on. This is done by using the last byte of the silicon serial number to choose between the available options.

The rotation code swaps the primary POP with one of the others. Nothing else is changed. The POPs must be from the same provider, have the same cost for us, and must be LOCAL.
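
Here is a rough sketch of the rotation step. The record layout and names are hypothetical, and the exact mapping from serial byte to POP is an assumption; the text only says that the last byte of the silicon serial number picks among the eligible POPs. The mapping used here was chosen so that, with two candidates, an odd byte keeps the original primary, which agrees with the worked example later in this section.

```python
from dataclasses import dataclass


@dataclass
class Pop:
    provider: str
    number: str
    call_type: str       # "LOCAL", "ExpLocal", or "toll"
    provider_cost: int   # hypothetical "static priority" value


def rotate_primary(pops, last_serial_byte):
    """Swap the primary with one of its same-provider LOCAL siblings."""
    result = list(pops)
    if not result or result[0].call_type != "LOCAL":
        return result
    primary = result[0]
    # Eligible slots: same provider, same cost to us, and LOCAL.
    # The primary itself (index 0) is always one of the candidates.
    candidates = [i for i, p in enumerate(result)
                  if p.call_type == "LOCAL"
                  and p.provider == primary.provider
                  and p.provider_cost == primary.provider_cost]
    pick = candidates[(last_serial_byte + 1) % len(candidates)]
    # Only the primary and the chosen POP trade places; nothing else moves.
    result[0], result[pick] = result[pick], result[0]
    return result
```

With N candidates, a well-distributed serial byte sends about 1/N of the boxes to each one.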


One of the limitations of the data in the PhoneDB is that it operates on entire exchange areas. If the PhoneDB assigns wpb/650-326-1095 as the primary POP for Palo Alto, everybody in Palo Alto will hit that POP. In an attempt to avoid swamping some POPs with users while ignoring others, a simple load balancing system was implemented. As usual, it was done at the last minute and in a big hurry.

The basic idea is that we carve up the POP assignment pie into pieces. Some of the providers get a piece, some don't. Each piece can be a different size. The last digit of your silicon serial number (which happens to be a checksum with a very nice distribution over the set of our users) determines which piece you're a part of. If the tellyscript generator can find a LOCAL POP from that provider, it makes that your primary POP; if not, nothing changes.

The initial implementation had one definition of the pieces for the entire country. Several months later, the system was enhanced to allow the pieces to be defined in individual exchange areas, which came in handy when trying to put Bay Area people on the "wpb" (WebTV PacBell) POP.

As you may have noticed, the system is less than perfect. For example, if the load balancing parameters say "50% cnc, 50% uunet", and the users in a particular area have nothing but psi and ziplink, they won't be affected at all. Chances are they'll all be piled on top of the same primary POP, and the next local POP will always be listed as the secondary. (Yes, they'll end up spilling over onto the secondary when the primary fills, but it's so much nicer to not have to wait for the "all circuits are busy" timeout.) This scheme has been augmented with the POPtimization system, described in section IV.Q.
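
The pie-slicing idea can be sketched like this. Everything here is illustrative: the function names are invented, the serial digit is assumed to be a hex digit (0 through 15), and flat-rate complications are ignored entirely.

```python
def pick_provider(pieces, last_digit, digit_range=16):
    """Map a serial digit onto a slice of the pie.

    pieces is a list of (provider, percent) pairs that may add up to
    less than 100; a digit landing outside every slice returns None.
    """
    point = (last_digit % digit_range) * 100 / digit_range  # in [0, 100)
    upper = 0
    for provider, percent in pieces:
        upper += percent
        if point < upper:
            return provider
    return None


def apply_load_balancing(pops, pieces, last_digit):
    """pops is a list of (provider, call_type) pairs, primary first."""
    result = list(pops)
    wanted = pick_provider(pieces, last_digit)
    if wanted is None:
        return result
    for i, (provider, call_type) in enumerate(result):
        if provider == wanted and call_type == "LOCAL":
            # Swap the chosen POP into the primary slot.
            result[0], result[i] = result[i], result[0]
            break
    return result  # unchanged if the provider has no LOCAL POP here
```

With pieces of "50% cnc, 50% uunet", a user whose area offers only psi and ziplink falls through the loop untouched, which is the weakness described above.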

As mentioned earlier, flat-rate IAPs will cause problems for us. For example, suppose we are limiting ourselves to two POPs per tellyscript, and we have these three local POPs available:

PSI (flat-rate local)
ZipLink (hourly-rate local)
UUNET (hourly-rate local)

Suppose the load balancing algorithm says we should use UUNET as our primary. The POPs above would get rearranged to be UUNET, then ZipLink. PSI would end up in the 3rd position, where it wouldn't get used because we're only using two POPs per tellyscript.   This is a good thing, since PSI is flat-rate-only, and we would only want to use them in the primary slot.  If, on the other hand, the initial arrangement were:

PSI (flat-rate local)
UUNET (hourly-rate local)
some hourly-rate toll number

then the list would be difficult to rearrange, because we can't make PSI the secondary, and we don't want to give the user a toll number when two local ones are available.

Refusing to rearrange POPs like the above could lead to situations where a flat-rate provider receives a much heavier load in a certain area than we'd like. To deal with this, the configuration file allows a "tenacity" setting to be adjusted. The primary can be left alone, moved into the secondary slot, or swapped with a more expensive toll call. This decision applies globally. The default is to leave it alone; in the above case, PSI would still be the primary and UUNET the secondary.

The setting also affects what happens when all of a user's local POPs are flat rate. The default behavior is to go ahead and give them the local POPs anyway.   This can result in users getting two flat-rate POPs from the same or different IAPs, which can be confusing because it appears to violate one of the cardinal rules of tellyscript generation.

In Disco we found the best eight POPs for every exchange area, and used the best two.   Flat-rate POPs had to appear in the 3rd or later position.  In Etude, we still find the best eight POPs, but now we can use up to four.  This means that POPs from flat-rate-only IAPs have to be pushed off to the 5th or later position, which can lead to difficulties.  See "The Perilous 4-POP TellyScript" for further insights.


Here's a real-life example from an old PhoneDB:

For 510-799-0000 from HERCULSROD, CA (base cost=240):
  psi/510-848-1398 in or near "Berkeley, CA" (OAKLAND, CA)
    LOCAL* 10.1mi  [wc=10.1mi] cost=240 
      --> 848-1398 then 1-510-848-1398
  uunet/510-982-1757 in or near "Berkeley, CA" (OAKLAND, CA)
    LOCAL 10.1mi  [wc=17.1mi] cost=240 
      --> 982-1757 then 1-510-982-1757
  psi/510-254-7549 in or near "Orinda, CA" (ORINDA, CA)
    LOCAL* 10.8mi  [wc=10.1mi] cost=240 
      --> 254-7549 then 1-510-254-7549
  psi/510-688-2363 in or near "Concord, CA" (CONCORD, CA)
    ExpLocal* 13.3mi  [wc=13.3mi] cost=420 
      --> 688-2363 then 1-510-688-2363

Four POPs were found. The 1st, 2nd, and 3rd say "LOCAL", which means that they can be swapped in with the primary. The 1st, 3rd, and 4th have an asterisk after the call type, meaning that they're flat-rate and therefore can't be put into the secondary position. (Actually, the asterisk means they can't be moved, and therefore they're flat-rate, but that's a detail worth forgetting.)

This satisfies the provider interleave rules (1st and 2nd POPs are from different providers), and the flat-rate rule (2nd POP isn't flat-rate).

If the load balancing algorithm wanted to use CNC or UUNET as the primary, it would fail, because there's no CNC POP and there's no POP eligible for use as a secondary if UUNET were moved into the first position. There is nothing the POP load balancing routines can do here.

Things are looking better for provider rotation though. If the last byte of the silicon serial for a user at that location was odd, the script handed out would have the first two POPs shown above. If the byte were even, the tellyscript generator would swap the 3rd POP into the primary position. The fourth POP is ExpLocal, and therefore isn't eligible for rotation.

 

-IV.C- Tellyscript return codes

After a failure that occurs while the box is connecting to the service, the box will display a dialog with an error message. If you hit the "Options" key on the keyboard or remote, it will display an "M" code and an "S" code, e.g. "M-26/S10". The "M" code is the box's message code, and the "S" code is the return value from the tellyscript.

The current set of tellyscript return values ("S" codes) are:

 0  ParseError           Tellyscript was bad
 1  Connecting           (not really an error)
 2  Success              Tellyscript finished successfully
 3  ConfigurationError   Modem and box not on speaking terms
 4  DialingError         Modem not saying what we wanted it to
 5  NoDialtone           Didn't hear a dial tone on the phone line
 6  NoAnswer             POP number just kept ringing
 7  Busy                 POP number was busy
 8  HandshakeFailure     Modem handshake failure; this is rare
 9  UnknownError         Got an unknown result code back from the modem (the error itself isn't unknown)
10  BadPassword          Authentication failure
11  PPPHandshakeFailure  Couldn't negotiate PPP successfully
12  NoCarrier            Something answered, but it wasn't a modem
13  BlackHole            Rare; last POP was a black hole, and we ran out of POPs
14  VerySlowConnect      Modems connected at less than 14.4Kbps
15  BadPasswordNR        Same as #10, but we don't reboot the box
16  UnhappyScript        The tellyscript generator blew it. This is bad

When dealing with customers who are having trouble calling in, it is important to get both the "M" codes and the "S" codes. The "M" codes are described elsewhere.
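
If you find yourself decoding these by hand a lot, something like the following sketch may help. The member names come straight from the table of "S" codes above; the function name and the assumed "M<code>/S<code>" layout of the string are my own guesses based on the single example "M-26/S10". The "M" part is returned as raw text because its codes are described elsewhere.

```python
import re
from enum import IntEnum

# Tellyscript return values ("S" codes), numbered from 0 as in the table.
SCode = IntEnum("SCode", [
    "ParseError", "Connecting", "Success", "ConfigurationError",
    "DialingError", "NoDialtone", "NoAnswer", "Busy", "HandshakeFailure",
    "UnknownError", "BadPassword", "PPPHandshakeFailure", "NoCarrier",
    "BlackHole", "VerySlowConnect", "BadPasswordNR", "UnhappyScript",
], start=0)


def parse_error_code(text):
    """Split a string like "M-26/S10" into (M text, S code)."""
    match = re.fullmatch(r"M(-?\d+)/S(\d+)", text.strip())
    if match is None:
        raise ValueError("not an M/S code: %r" % text)
    s_value = int(match.group(2))
    try:
        s_code = SCode(s_value)
    except ValueError:
        s_code = s_value  # a value this table doesn't know about
    return match.group(1), s_code
```

For the example in the text, the "S" half decodes to BadPassword.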

Incidentally, the codes defined in the current (client 2.2) box sources look like this:

0   kTellyParseError
1   kTellyConnecting
2   kTellyLinkConnected
3   kTellyConfigurationError
4   kTellyDialingError
5   kTellyNoDialtone
6   kTellyNoAnswer
7   kTellyBusy
8   kTellyHandshakeFailure
9   kTellyUnknownError
10  kTellyBadPassword
11  kTellyPPPFailed
12  kTellyNoCarrier
13  kTellyBlackHole
14  kTellyDownloadOK
15  kTellyNoLoader
16  kTellyNoFirmware
17  kTellyLoaderFailed
18  kTellyNoResponseFromLoader
19  kTellyFirmwareFailed
20  kTellyNoResponseFromFirmware
21  kTellyScriptExpired

The meanings of 14, 15, and 16 don't agree between the two lists, which is unfortunate but not fatal. Because the box codes have to do with modem firmware initialization and not dialing, it's possible to tell which is which from their context.

 

-IV.D- Dial patterns revisited

An earlier section explained that the service remembers successful dial patterns, and uses them when generating tellyscripts. This section explains the mechanism in more detail.

At about the time that the splash page -- the WebTV logo that comes up before you get to the home page -- is appearing on the screen, the box is talking to a service called logserverd.  The purpose of logserverd isn't to serve anything; rather, it collects different types of logs that are sent up by the box, including crash logs, TCP logs, error and warning logs, TV logs, and phone logs. What we're interested in here are phone logs, which are sometimes referred to as "connection logs" or occasionally "configuration logs".

A simple phone log looks like this:

PhoneLog from 013bc1e60100002c (version=44, length=236)
  numPhoneBusy=0                    tcpInputPackets=32
  numPhoneNoAnswer=0                tcpOutputPackets=61
  numPhoneNoConnection=0            tcpDuplicatePackets=0
  [ ... blah blah blah we don't care about this blah blah blah ... ]
  realAudio2Used=0
  realAudio3Used=0

  Records:
    0x0c SetClock
      when=0x3588511a (Wed Jun 17 16:28:26 1998)
      prevWhen=0x3588510e (Wed Jun 17 16:28:14 1998)
    0x08 NukeTellyScript
      when=0x3588511b (Wed Jun 17 16:28:27 1998)
      id=0x00000000 modWhen=0x00000000 [Wed Dec 31 16:00:00 1969]
    0x05 Disconnection
      when=0x3588511b (Wed Jun 17 16:28:27 1998)
      disconnectionType=6 "redial"  flags=0x44
      connectWhen=0x35885109 (Wed Jun 17 16:28:09 1998)
      dialString='prereg: 18004653537'  fullPOPNumber='' []
      LastConnectionSpeed=28800 LastConnectionCompression=2
      PowerOnReason=0 "normal"
      LastSuccessfulConnectionBegin=0x358850ef (Wed Jun 17 16:27:43 1998)
      NumPopFailuresSinceBeginDialing=0
      FirstPOPAttempted=''
    0x10 BeginDialing
      when=0x3588511c (Wed Jun 17 16:28:28 1998)
      isReconnect=false
    0x01 RunScriptReport
      when=0x3588511c (Wed Jun 17 16:28:28 1998)
      id=0x99772c40 modWhen=0x3588511a [Wed Jun 17 16:28:26 1998]
    0x03 GetDialInSuccess
      when=0x3588513a (Wed Jun 17 16:28:58 1998)
      dialString='16506944628'  fullPOPNumber='' []
      callWaitingPrefix=''  dialOutsidePrefix=''  longDistancePrefix=''
      accessNumber=''  tollFreeAccessNumber=''
      flags=0x44 (aud waittone )
      dialSpeed=2  cwSensitivity=1
      dceRate=28800  dteRate=234000  protocol=0  compression=2
      totalScriptTime=1824  boxIPAddress=172.17.134.151
    0x0c SetClock
      when=0x3588514b (Wed Jun 17 16:29:15 1998)
      prevWhen=0x3588514c (Wed Jun 17 16:29:16 1998)
    PhoneLog_Log()            3106743/013bc1e60100002c
    PhoneLog.c:459            dblogserverd[6360,8-0]            06/17 16:29:19

Every time the box does something "interesting", it adds an entry to its phone log. When the box gets connected to the service, it sends the log up to logserverd, and erases its local copy. The service collects the logs, which are used to generate usage reports and POP health statistics.

A complete discussion of phone logs is beyond the scope of this document. For now we're just interested in the second-to-last entry in the log, which tells us that the box connected successfully to the service.

The entry shows that the box connected to the POP at 650-694-4628 by dialing "16506944628". When dblogserverd sees this, it adds an entry to the list of dial patterns indicating that calls from the user's ANI to the POP at 650-694-4628 should be made with 11-digit dialing, and sets the "last successful connect" field to "16506944628".

The service screens out numbers that don't correspond to POPs that might be sent to the box. If you put a number in the access number field or give the box a dial override to a POP that it wouldn't normally use, the dial pattern table will be unaffected. There are two motivations for being so picky: limited space, and the need to avoid garbage. For example, if you had to dial "9" followed by a 10-digit number, you might be given an override or access number like "96503240657". If the service isn't careful, it would record that you needed an 11-digit dial pattern to dial that number, which wouldn't be accurate. Rather than establish a complex set of rules for screening out bad numbers, the service uses a restricted notion of the set of good numbers.
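
A sketch of the screening idea, under some loud assumptions: POPs are held as bare 10-digit strings, the only recognized patterns are 7-, 10-, and 11-digit dialing, and the function name is invented. The real logic lives in dblogserverd and may look nothing like this.

```python
def learn_dial_pattern(dial_string, eligible_pops):
    """Match a logged dial string against POPs the box might be given.

    eligible_pops holds 10-digit POP numbers as strings. Returns a
    (pop, digits) pair, or None if the dial string doesn't correspond
    to any eligible POP and should leave the pattern table alone.
    """
    for pop in eligible_pops:
        forms = {
            "1" + pop: 11,   # 1 + area code + number
            pop: 10,         # area code + number
            pop[3:]: 7,      # number only
        }
        if dial_string in forms:
            return pop, forms[dial_string]
    return None
```

Because only the known-good forms are enumerated, a string like "96503240657" matches nothing and is quietly dropped rather than being recorded as a bogus 11-digit pattern. (A 7-digit string could match POPs in more than one area code; this sketch just takes the first.)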

Dial pattern entries are stored in "most recently used" order. What this means is that the most recently used dial pattern is always at the top of the list. The service only holds onto eight entries, so if we already have eight and then make a new discovery, the entry at the bottom is thrown out. If the box logs in, and we see that the ANI, POP, and dial pattern are already known, we just pull the entry up to the top. If the ANI and POP match but the dial pattern is different, we replace the dial pattern field and then pull it up. To make matters more complicated, we try to reduce database accesses by not adjusting the order if the entry is already one of the top three.
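
The bookkeeping above amounts to a small most-recently-used list. Here is a sketch, assuming entries are plain (ANI, POP, pattern) tuples and that the "top three" shortcut applies only when nothing about the entry changed; that last point is my reading of the text, not something confirmed by the service code. The function name and return value are made up.

```python
MAX_ENTRIES = 8      # the service only holds onto eight entries
NO_REORDER_TOP = 3   # skip the reorder if already one of the top three


def record_success(entries, ani, pop, pattern):
    """Update the MRU list in place; return True if a write was needed.

    entries is a list of (ani, pop, pattern) tuples, most recent first.
    """
    for i, (e_ani, e_pop, e_pattern) in enumerate(entries):
        if e_ani == ani and e_pop == pop:
            if e_pattern == pattern:
                if i < NO_REORDER_TOP:
                    return False          # close enough to the top already
                entries.insert(0, entries.pop(i))
                return True
            # Same ANI and POP, new pattern: replace it and pull it up.
            del entries[i]
            entries.insert(0, (ani, pop, pattern))
            return True
    # New discovery: goes on top, and the bottom entry falls off.
    entries.insert(0, (ani, pop, pattern))
    del entries[MAX_ENTRIES:]
    return True
```

The False return stands in for "no database access needed", which is the whole point of the top-three rule.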


The feedback mechanism seems pretty clean on the surface, but there's actually a race condition during login. There's no way to be sure that the phone log will get uploaded before the headwaiter checks to see if the box needs a new tellyscript. If the phone log comes up first, then the headwaiter will compute a new tellyscript that takes into account the latest dial pattern information. If the phone log comes up second, the headwaiter will make its tellyscript decisions without the benefit of knowledge learned from the current phone log.  (And up until Funk, there was some chance of losing one set of database changes because two near-simultaneous writes would step on each other.)

Of course, it's even worse than this if you're on the phone with a customer. It's possible for them to have the right patterns but not have a tellyscript that includes that knowledge, because the knowledge was gained after they got through the headwaiter. They have to hang up, come in again, get a new tellyscript with the new patterns, then hang up and redial again to actually use the new patterns.

For these reasons, customers whose dial patterns have been edited manually are usually told to go back through scriptlessd. They will immediately get a script with the latest information.

 

-IV.E- Secret codes, NVRAM, and "have you moved?"

The "have you moved?" dialog was briefly described in the ANI section. In short, it appears after the box has been unplugged. The dialog has changed over time, with the wording being updated with almost every client release. Back in v1.0 the default action (i.e. what would happen if you just hit the "go" button without moving the selection rectangle) was "I haven't moved"; in v1.1 and later it changed to "I have moved"; and in v1.2 we started showing the user's ANI as well.

When you tell the box that you've moved, all it really does is throw out the tellyscript and the headwaiter's IP address. When the box sees that it doesn't have these, it heads off to scriptlessd, which gets the ANI data and sends down a new tellyscript.

You can get similar behavior by using the "7265" secret code. This is related to the "7264" secret code, which has a long history of not working right. I don't know offhand which client versions implement the code correctly (I'm told that *none* of the 1.x releases through 1.3 does it right!), so unplugging the box and entering "yes, I've moved" is still the most reliable way to wipe out the tellyscript. Unfortunately this also causes the clock to be reset, and in "Plus" boxes this means that the box can't show the current TV listings until it reconnects.

The "32768" secret code wipes out all of NVRAM. This is generally a bad thing, because it kills some other things like screen centering and TV configuration. The phone log lives in NVRAM when the box is powered off, so if a user uses 32768 he loses all phone log data collected up to that point. Since the information could potentially help us identify a problem with his POP or phone line, losing it is bad. For these reasons, 32768 should only be used as a last resort.

On internal boxes, the "93288" secret code allows you to choose which service you want to connect to. The box will wipe out the tellyscript to force itself to go back through scriptlessd. This is necessary because scriptlessd hands out a shared secret that is used for secure communication. If the different services have different notions of what the shared secret should be, the box won't be able to talk to the new service, so we send it through scriptlessd to make sure the secret is in sync. NOTE: the connection setup information is transient until you actually get connected. If you power the box off, or even go into the dialing setup screen through the convenient button at the bottom of the screen on some builds, you will lose the information and end up connecting to the default service for that box.

The "1-800-GoWebTV" code (actually 18004693288) clears NVRAM and then sets a "force registration" flag. When the service sees the flag, it sends you back through registration so you can set up a new subscriber. The interactions with tellyscripts are a little funny, because unregistering the box causes some fields in the device record to get reset. These fields are normally initialized by scriptlessd. Since we've already been through scriptlessd, though, they get cleared and not set again. In the current service this is generally harmless, but could cause unexpected behavior.


Historical note: in the very early days, the box really did have NVRAM (Non-Volatile RAM). As a cost-cutting measure, we decided to remove the NVRAM part and dedicate a small piece (about 16K for US "Classic" boxes) at the upper end of the flash ROM for storage. The name "NVRAM" stuck, even though it now refers to flash ROM for "Classic" boxes and a disk block for "Plus" boxes.

 

-IV.F- How phone settings work

There are three prefix fields that may be applied to a dial string. The "Basic" screen has the "Prefix" field, "Call Waiting" has the "Block calls" field, and "Obscure Dialing Options" (known as "Spooky Dialing Options" in v1.1 and v1.2 clients) has the "Long-distance prefix" field.

The "block call waiting" prefix always gets sent first. After that comes either the prefix or the long-distance prefix, depending on which were set and what kind of call you're making. The following chart shows all four combinations of prefixes, and what a local and a long distance call would look like for each:

prefix   LD prefix   local call   long-distance call
(none)   (none)      6145539      18005551212
9        (none)      9,6145539    9,18005551212
(none)   8           6145539      8,18005551212
9        8           9,6145539    8,18005551212

The determination of "local" or "long" is made by the service when the tellyscript is generated. POPs that are LOCAL or ExpLocal are treated as local, and toll calls are treated as long. The fallback number is always considered long distance, as are numbers entered with Vend-A-Telly or clientpopedit (if you're using the latter two methods, you shouldn't need a long-distance prefix anyway... just enter the full set of digits you need). Ditto for tellyscripts handed out when "IgnoreANI" is set in the config file (only development servers are configured this way).

For the LD prefix we regard LOCAL and ExpLocal as non-LD, but the system works differently for the "this may be a toll call" dialog. Local calls don't get the dialog, but both ExpLocal and toll calls do. The reason it's like this is that the dial prefix is assigned based on the telco definition of what a local call is, which often has little to do with the call being inexpensive.

While we're here, I should mention that the "Don't dial 1 for long distance" flag in Obscure Dialing Options doesn't really have anything to do with making long distance calls. If the flag is set, and you're not using an access number, it just checks for a leading '1' on the POP number, and removes it if found. It has no effect on leading '1's in prefix fields.
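
The prefix rules can be condensed into a few lines. This is a sketch rather than what base.tsf actually does: the function and parameter names are invented, joining the call-waiting prefix with a comma is an assumption (the chart only shows commas after the other two prefixes), and access numbers are left out entirely.

```python
def build_dial_string(pop_number, is_long, block_cw="", prefix="",
                      ld_prefix="", no_leading_one=False):
    """Assemble the digits to dial for one POP.

    is_long is decided by the service when the tellyscript is generated:
    LOCAL and ExpLocal count as local, toll calls count as long.
    """
    # "Don't dial 1 for long distance" only strips a leading '1' from
    # the POP number itself, never from the prefix fields.
    if no_leading_one and pop_number.startswith("1"):
        pop_number = pop_number[1:]
    parts = []
    if block_cw:
        parts.append(block_cw)        # call waiting block always goes first
    if is_long and ld_prefix:
        parts.append(ld_prefix)       # LD prefix replaces the basic prefix
    elif prefix:
        parts.append(prefix)
    parts.append(pop_number)
    return ",".join(parts)
```

Running the four prefix combinations from the chart through this function reproduces the local and long-distance strings shown there.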

One final note on prefixes: some of the "Classic" boxes on store shelves and in warehouses are still v1.0 clients. These boxes only have the "basic" prefix, so for these the script behaves as if the other prefix fields exist but were left blank.


Understanding how the other phone settings are handled isn't vital but may come in handy. If you need to understand precisely how something is handled, Initialize() in base.tsf has the ultimate answer.

Pulse Dialing - we send a DT to the modem for tone or DP for pulse.

Call Waiting - for the US, the S10 setting determines all. There are five values, one meaning "off" and four meaning "on" with different sensitivity levels. For Japan we also set S220.

Wait For Dialtone - determines whether the modem should wait until it hears a dialtone, or just sit there for three seconds and then go. Set by tweaking S6.

Audible Dialing - send M0 or M1. The tellyscript always turns audible dialing off when showing a VideoAd while connecting.

Dial Speed - three settings, set with S11. We now also set &P to control the speed of pulse dialing. This really only applies in Japan, but the US seems to work with the Japanese settings. The cool thing is you can now crank up the pulse speed if you select "fast dialing".

Access numbers are covered in the next section.

 

-IV.G- Radius, access numbers, and PSI

When a box logs in, its tellyscript knows how to send a login and password that the provider of the POP will accept. All of our providers use a system called Radius to verify login names and passwords, and all but one use a "proxy Radius" configuration that allows WebTV to make the actual accept/reject decision.

The usual sequence of events during login starts with the box sending up a login and password to the IAP's Radius server, using the PAP authentication protocol. Attached to the login name is a special prefix or suffix that tells the IAP that the request is coming from a WebTV box. The IAP's Radius server forwards the request to our Radius server, which verifies the login and password, and sends back an ACK ("yes") or NAK ("no") response. By doing things this way, we retain control over which boxes are allowed in, and avoid the hoops we had to jump through with a provider like PSI.

PSI initially refused to do proxy Radius, so we had to create an account with them for every box before the box ever logged in. This meant that scriptlessd had to connect to their system and create the account before the box could hang up and redial. Otherwise, if the account creation attempt failed (as it occasionally did), we would end up giving the user a tellyscript with POPs that they couldn't dial into. If scriptlessd isn't able to contact PSI, all PSI POPs are stripped out of the script, and the box gets whatever is left. A more thorough discussion of service changes and the potential dangers involved with doing things this way can be found in a separate document.  Fortunately, PSI has agreed to implement proxy Radius, and as of Etude we should no longer be dependent on the nasty psiauthd service.


Some IAPs provide the ability to select how a connection should be billed.   Possibilities include hourly-rate connections, flat-rate monthly billing, and peak-port-usage billing.  We tell the IAP which method we want to use by sending them a specific Radius prefix or suffix.

The set of available billing methods is enumerated in the POP lists used to generate the PhoneDB.  The mapping of billing method to Radius prefix or suffix is done by the ".tsf" file for that provider.  (More detail can be found in section IV.N.)  The service selects a default value as follows:

These can be overridden by POPtimization (section IV.Q).


In the 1.0 client, if the last POP in your list failed with a Radius authentication error, you would get a message that said "your box needs to be reconfigured". As of v1.1 the box would simply wipe its brain and restart. More recent service releases reduced the frequency of this behavior, but it may come back depending on what sort of security mechanisms we choose to implement.

Authentication failures on POPs other than the last in the list just cause the tellyscript to roll on to the next POP. It's only the last POP that has potentially dire consequences.  The situation where this becomes necessary is more likely to occur with CHAP authentication.  See section IV.R.


We know that each IAP has a different Radius suffix or prefix. Each may also require a different password. If you type a POP number into the "Access Number" field in the Dialing Options screen, which values should it use?

The trouble with access numbers is that the tellyscript has no way of determining which IAP it's supposed to use. The number is held directly on the box, not on the service, so we'd have to download the complete set of POP information to the box to make this work smoothly.

The approach we took was to use whatever IAP happens to come first in the tellyscript on the box. If your primary POP is from CNC, then you can enter any CNC number in the access number field and it will work. Entering a POP number for UUNET, ZipLink, PSI, or any of the other IAPs will fail, unless Radius is configured in a particularly forgiving manner. (Incidentally, this is why we're so paranoid about showing toll-free numbers: we were using CNC's 800 number for quite a while, and the Radius authentication information was exactly the same as CNC's regular POPs. If you were one of the 60% of our customers who had CNC as their primary POP at the time, you could get toll-free access at our expense just by putting the CNC 800 number in your Access Number field.)

Some people who do international demos have had cause to enter "cnc-palo", "uunet-palo", or "artemis-palo" in the "Enter Your Phone Number" screen that scriptlessd shows when it can't get ANI data. The reason this was added wasn't so much to allow them to use a specific POP as it was to get a specific IAP into the first position. If you know that a UUNET POP comes first in your tellyscript, you know that putting a UUNET POP number in the access number field (along with whatever weird things need to be done to dial out of a foreign country or to dial that IAP's POPs within the foreign country) will work.

Because of the difficulty in getting the POP number matched up with the first entry in the tellyscript, using this field is strongly discouraged except in certain rare cases.


One place where the access number field is useful is when it's not really used as an access number. A special feature was added to the service to support dialing suffixes via the access number field. If the '$' character appears in the access number field, the tellyscript will replace it with the POP number currently being dialed.

For example, if you set your access number to "10288,$,54321", and the POP numbers assigned by the service are 3261095 and 6145539, the box will dial the string "10288,3261095,54321" (the commas are brief pauses), and if that fails, it will next try "10288,6145539,54321". (Prefixes like 10288 really ought to go in the prefix fields rather than the access number field; I included it here to show that the '$' can be anywhere.) This isn't really the intended use for the Access Number field, but since the intended use is all but useless it was deemed acceptable.
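
The substitution itself is trivial; here is a sketch with an invented function name. How a plain access number (one with no '$') interacts with the list of POPs is simplified here to "dial the access number instead", which is an assumption on my part.

```python
def dial_strings(access_number, pop_numbers):
    """Return the strings the box would try, in order."""
    if "$" in access_number:
        # '$' may appear anywhere; it stands for the POP being dialed.
        return [access_number.replace("$", pop) for pop in pop_numbers]
    if access_number:
        return [access_number]   # assumed: dialed in place of the POP number
    return list(pop_numbers)


attempts = dial_strings("10288,$,54321", ["3261095", "6145539"])
```

With the settings from the example above, attempts holds "10288,3261095,54321" followed by "10288,6145539,54321".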


In v1.1 and later clients, 77437 brings up the Obscure Dialing Options page. Here you can enter an "800 access" number that will replace the toll-free scriptlessd number. The scripts sent down by the service don't even look at this field, because it only matters when you're dialing into scriptlessd.

In v1.0 boxes, the Access Number field does double duty, and will change the number used to dial scriptlessd. This makes it extremely cumbersome to use, because you have to set it to one thing while dialing scriptlessd, and then change it to another before dialing into a POP. The toll-free access number field was added to help with international demos and other situations where an access number is needed just for scriptlessd calls.

 

-IV.H- OpenISP

OpenISP, which has also been known as Pick-an-ISP, BYOISP, and OpenAccess, has been around conceptually since one of the early "connectfolk" meetings in late 1996. It wasn't until the first part of 1997 that it went from being considered more trouble than it was worth to a high priority. One of the driving factors was competition: every emerging competitor we had claimed to work with arbitrary ISPs, and in fact some of our wanna-be competitors used this feature as their sole distinguishing characteristic.

The idea behind OpenISP is that you can choose to use your own ISP instead of the ones that WebTV provides. Any ISP that supports PPP (a standard network protocol) and PAP (a standard method of sending up login and password information) will work. All you have to enter are your login name, password, and the phone number to dial, and everything else just works.

Surprisingly few changes were needed to implement OpenISP. Most had to do with presenting an appropriate user interface, and making sure that the feature was activated and disabled when appropriate. The login, password, phone number, alternate phone number, and an ISP name (which isn't really used) are all stored in NVRAM on the box, and the tellyscript pulls these values out and uses them. The service doesn't store these values (see also "Keeping OpenISP closed").

Because of an early design decision that later got changed (for a while the box was going to inject the login and password into the script; now the script goes looking for the data), and also to keep the size of a tellyscript small, tellyscripts are either OpenISP scripts or standard scripts. We don't send down a script that can either dial OpenISP or dial standard POPs. This may change, but we're limited by the space available for tellyscripts in the "Classic" boxes.

You can tell if somebody most recently received an OpenISP tellyscript by looking at the information shown by clientinfo or CMR. It will look something like this:

  Most recent script sent to client:
    Hash 0x7961e156, sent Thu Jun 11 16:14:11 1998
    v54 - base/-
    v3  - locale/-
    v0  G OpenISP/-

The only IAP listed is "OpenISP". The service doesn't know what provider they're using or what number they're dialing, so those can't be shown.


The call ordering for OpenISP is like this if they entered one number:

Call first number
Retry first number

If they entered two numbers, it goes like this:

Call first number
Call second number

After making two calls we give up. We never try a fallback number. If you want to use your own ISP, guess what, you're going to use your own ISP.
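As a sketch, the OpenISP call ordering boils down to the following. The function name and the use of Python are purely illustrative; the real logic lives in the OpenISP tellyscript.

```python
from typing import List, Optional


def openisp_call_sequence(first: str, second: Optional[str] = None) -> List[str]:
    """Numbers an OpenISP box dials, in order. Always exactly two calls."""
    if second:
        # Two numbers entered: try each once.
        return [first, second]
    # One number entered: call it, then retry it once.
    return [first, first]
```

Note that no fallback number is ever appended to the list.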

The Access Number field is ignored for OpenISP users, unless they use the fancy kind of access number that has a '$' in it. Dial patterns and dial overrides have no effect on OpenISP customers.

 

-IV.I- Client upgrades and brain-dead boxes

Client upgrades for WebTV "Plus" boxes are terribly uninteresting, because they can do the download without disconnecting. Also, WebTV "Plus" boxes with damaged approm images go into the "mini-browser", which has most of the features you'd find in a full v1.3 client. The discussion here concentrates on "Classic" boxes, which are far more interesting.


Client upgrades (a/k/a flash downloads) for WebTV "Classic" boxes are done by the boot ROM, because you can't be executing code from a ROM image that you're updating. The boot ROM has a minimal subset of the features available in the full ROM (it's 1/8th the size). All it really knows how to do is dial in, issue simple requests, and write chunks of data into flash ROM.

The usual behavior is that flashromd tells the client to go flash itself. The box hangs up, dials back in with the current tellyscript, reconnects to the same flashromd, and starts asking for parts. When it has all of the pieces, it hangs up and reboots. (More details than you could possibly be interested in are available from the flashromd commentary.)

Because the box is essentially a v1.0 client during downloads -- regardless of what client version was running on the box before -- some tellyscript gymnastics are required to get at dialing options added after v1.0, notably the "don't dial 1" flag and OpenISP settings. These were broken for a while, but should work as expected now. (See "The trouble with dial options" for details.)


The term "brain-dead box" refers to a WebTV "Classic" unit with a damaged ROM image. The easiest way to become brain-damaged is to initiate a download and have it interrupted before completion. When the box restarts, it computes a checksum on the ROM, and discovers that things don't look the way it had expected. It boots into the boot ROM and immediately starts a flash download.

The boot ROM ignores everything in NVRAM, because flash is corrupted and NVRAM is held in flash. It will accept an access number and a dial prefix, which have to be entered with the extremely limited user interface supplied by the boot ROM, but most of the other dial options can't be set. You can't use any secret codes with a brain-dead box, because the codes are handled by the full client ROM, not the boot ROM.

Every time the brain-dead box is powered on, it connects to scriptlessd, asks for a tellyscript and an IP address for flashromd, then disconnects and executes the tellyscript. After connecting to the local POP, it initiates a download.  If we weren't able to get ANI for the box, we have a problem, because the alternate way of getting their location -- putting up a screen that asks them to enter their phone number -- isn't available to us (no UI on a brain-dead Classic).  We store "000000000000" for the ANI, and the user does the upgrade on one of our 800 numbers.


An interesting problem arises when an OpenISP box becomes brain dead. We no longer have access to the person's OpenISP login and password, because those are kept in NVRAM, and we can no longer believe that NVRAM is valid. We have to send them somewhere else. But where?

The obvious choice is to send them to the POPs that they would have if they weren't an OpenISP user, but there are a couple of problems with that. First of all, the user might have signed up for OpenISP because they didn't have any local POPs. Their POPs might be toll calls, which isn't going to make them very happy. Second, it's possible that their primary POP is a flat-rate IAP, which means we will have to pay for a full month of service for this user if they only show up once to do a download.

There are two alternate solutions. The first is to send the user to an 800 number. This is a fairly good solution, because it doesn't cost the user anything, and it may well cost us less than the usual primary POP. The down side is that it requires a large short-term increase in port capacity on our 800 lines. If we have a hundred thousand OpenISP users, and even a small percentage go brain-dead, we're going to need to add a lot of modems for a couple of weeks to handle the load.

The second solution is better but more difficult. The box ignores the NVRAM settings because the ROM checksum failed, and it can't trust that the values in the NVRAM section of memory are good. However, the tellyscript that we send down is capable of running its own checksum on NVRAM, and using the values there if they're valid.

This gets complicated when you consider that, until now, tellyscripts are either OpenISP or non-OpenISP. The second solution requires that the box be able to dial either, and decide which it's going to do when the box starts up. The only good news is that the box will arrive at scriptlessd when brain-dead, and won't store the "double" script in NVRAM, so any wackiness in the script doesn't necessarily have to affect anybody else.

We will need to move to the second solution at some point, but the cramped tellyscript buffer makes this difficult, so for now we're just sending brain-dead OpenISP boxes to an 800 number.

 

-IV.J- ComingSoon and friends

The "coming soon" program was, arguably, a bad idea. What is indisputable is that it cost an arm and a leg. As a lesson to future generations it's worth explaining what it was, why we did it, and why it went away.

In the halcyon days of WebTV's youth, we discovered that our IAPs' claims of covering well over 90% of the country were subject to interpretation. They weren't far off -- the actual figure was around 87% -- but that last 13% was a large and noisy bunch.

In an attempt to kick-start an increase in local coverage as we were entering the 1996 holiday season, we were directed to institute what became known as the "coming soon" coverage plan. Rather than wait until we had a signed contract with an IAP, we would provide the same coverage that the IAP did using an 800 number.

To be eligible for "coming soon" access, you had to be in a situation where you didn't have a local call to a "real" POP, but did have a local call to a "coming soon" POP. That meant you didn't have a local call, but you were going to have one real soon. The POP lists for the IAPs that were coming real soon were added to the PhoneDB, and pretty soon we were letting hundreds of people surf the net at our expense.

Getting new IAPs to sign up turned out to be a bit of an ordeal. Some of the IAPs we threw into the mix weren't technically competent or didn't have (and would likely never have) the kind of capacity we needed. Others were unwilling or unable to configure Radius servers the way we wanted, and some took months of negotiation before either they signed or we gave up in frustration. The net result was that we were paying per-minute charges for several months.

The project, which cost several million dollars over its lifetime, was finally killed in October 1997. A couple hundred people still didn't have a "real" ISP, so they were "grandfathered" in with dial overrides to a different 800 number.


A similar but less painful situation exists in Phillips, WI and Webb, MS. These two small towns were to be part of an advertising campaign capitalizing on the names of the cities. Since neither had a local ISP, both were granted perpetual free access via an 800 number. Nothing ever came of the marketing plan, but we still shell out money for a box in the library in Phillips. In both cases, the override was done for the entire NPA/NXX by making a special entry in the PhoneDB.  (Actually... it looks like the override was removed at the start of July, 1998.)

 

-IV.K- Pick-yer-POP

The Pick-yer-POP program was a good idea that had some serious flaws. The basic idea was to allow the customer to choose their own POPs from a list. They would be able to specify how many digits to dial, and change to a different POP at will.

The most significant barrier to implementing this was flat-rate IAPs. If a user switched between three different flat-rate IAPs during the course of a month, we would have had to pay 3x the fees for that one user. A related issue is what happens when a user chooses an hourly-rate IAP as the primary, and then proceeds to use it for a large number of hours. With a flat-rate primary we would pay a fixed amount, but with an hourly primary the costs could be much higher.

We can't afford to lose control over POP assignments unless we have some way of making the user share the costs. If they use a POP that costs us more than the POP that we would have given them, we have to bill them for the difference. Unfortunately this is difficult to calculate, and even more difficult to explain to the customer.

Pick-yer-POP also removes any hope of load balancing. I would expect users struggling to get in during peak hours to change their POP frequently, resulting in large swings between local IAPs and lots of complaints.

The proposed implementation for Pick-yer-POP was essentially a user-driven dial override. The flag was finally removed in the great Grunge dial override overhaul.

 

-IV.L- MessageWatch and EPG

MessageWatch is the fancy name we use for a feature that allows the box to dial in at a specified time and check for new mail. The idea was to have it log in during the early morning hours, so that you can see if you have new mail without needing to log in when you wake up.

As luck would have it, a fairly large number of people configured it to log in around 5pm, so that the mail light is set when they get home from work. This is unfortunate because it means the boxes on the west coast are coming in at the height of peak usage on the east coast.

Whatever the case, MessageWatch connections are vastly simplified versions of normal connections. A few salient facts:

If a user is seeing multiple calls starting at a specific time and separated by 30 minutes each on his or her phone bill, chances are MessageWatch is involved.

WebTV "Plus" boxes do something similar with EPG (Electronic Program Guide) data downloads. However, in the 2.1 client, the EPG downloader won't stop with the first POP if the second one is toll. There are plans to fix this for future client releases.

In "Classical" and earlier service releases, MessageWatch is only enabled when the user turns it on. In "Disco" and later, it is enabled for all new users by default.

 

-IV.M- Idle timeouts

Idle timeouts make the box disconnect from the service and hang up the phone when nothing has happened for a set period of time. There are two kinds of idle timeouts, input timeouts and network timeouts.

Input timeouts happen when the user stops using the box. If the box doesn't see any activity from the user, such as typing on the keyboard or hitting buttons on the remote control, it will disconnect after 10 minutes. This timeout is set by the service. If the user is connected through an 800 number (determined by comparing the box's IP address against a list of known values), the input idle timer is reduced to 5 minutes.
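A minimal sketch of how the service might pick the input timeout follows. The set of 800-number addresses is a stand-in (the addresses are documentation-range placeholders, not real values), and matching on exact addresses rather than ranges is an assumption.

```python
DEFAULT_INPUT_TIMEOUT_MIN = 10    # normal connections
TOLL_FREE_INPUT_TIMEOUT_MIN = 5   # connections through an 800 number


def input_idle_timeout_minutes(box_ip: str, toll_free_ips: set) -> int:
    """Pick the input idle timeout based on where the box connected from."""
    if box_ip in toll_free_ips:
        return TOLL_FREE_INPUT_TIMEOUT_MIN
    return DEFAULT_INPUT_TIMEOUT_MIN


# Hypothetical addresses handed out by the 800-number terminal servers.
KNOWN_800_IPS = {"192.0.2.10", "192.0.2.11"}
print(input_idle_timeout_minutes("192.0.2.10", KNOWN_800_IPS))    # 5
print(input_idle_timeout_minutes("198.51.100.7", KNOWN_800_IPS))  # 10
```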

Network timeouts happen when no packets are being transmitted between the box and service. The box used to have a network idle timeout, but this is no longer in use. However, some IAPs, notably CNC, have idle timeouts on their equipment. After 30 minutes with no network activity, CNC's terminal servers will drop the line.

If a user is flipping through a large page, or is composing a long e-mail message, there is no network activity. The box won't choose to disconnect, but the terminal server will. If a user is experiencing line drops while composing long e-mail messages, this is probably the cause.

Some providers have time limits that don't care whether you're idle or not. After an hour or two the connection is dropped, so that computer users can't leave their machines running and wander off. (Some computers will just redial when disconnected anyway, but try telling that to the IAPs.) We have added something similar in the form of a usage cap on the fallback 800 number (see section V.B).

 

-IV.N- Adding new providers

Adding a new provider to the system isn't something that most people will have to do. If done incorrectly, however, it can adversely affect a large number of people. This section explains the right way to do it, when it should and shouldn't be done, and how things fail if it's done the wrong way.

Each IAP should be a separate provider. A provider is defined by a "Provider:" line in a POP list in the PhoneDB. Several attributes are defined for each:

All of the above are included in and available from the PhoneDB. The "dumppops" phonetool command will display them (see the phonetool README).

The choice of abbreviation is important, because it's used in the tellyscript, in reports, and often in casual conversation. It has to follow C syntax rules for function names, which means it has to start with a letter and may only contain letters, digits and the underscore ('_'). No spaces, dashes, periods, or other fancy characters are allowed. It can't be longer than 15 characters, and by convention is entirely lower case.
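The naming rules above are easy to check mechanically. The following is a hypothetical helper, not part of phonetool or the service; it simply encodes the rules as stated.

```python
import re

# Must start with a letter; letters, digits, and '_' only.
_ABBREV_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_ABBREV_LEN = 15


def check_provider_abbreviation(abbrev: str) -> list:
    """Return a list of problems with a provider abbreviation (empty if OK)."""
    problems = []
    if not _ABBREV_RE.fullmatch(abbrev):
        problems.append("must start with a letter and use only letters, digits, '_'")
    if len(abbrev) > MAX_ABBREV_LEN:
        problems.append("longer than 15 characters")
    if abbrev != abbrev.lower():
        problems.append("not entirely lower case (convention)")
    return problems
```

For example, "ziplink" and "cnc800" come back clean, while "uunet-dan" is rejected because of the dash.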

More information on POP lists can be found in the rawphonetool README.


Adding the "Provider:" line and a few POPs to a POP list is only half the story. The other half is adding a new .tsf file. When tellyscripts are generated, the service gathers up the .tsf files for every provider that might be dialed, and combines them with several other components to form the complete script. The service doesn't attempt to verify that the tellyscript fragment is correctly written, so it is imperative that the script be error-free.

Here's the current script fragment for ZipLink (ziplink.tsf):

/* TLLY ver=4 */
/*
 * This is included from "ziplink.tsf".
 */
Chat_ziplink(int billing, int svc)
{
    if (billing == 'F') {   /* want flat */
        return PAPChat("ZTV/%s", 0);    /* flat */
    } else {
        return PAPChat("XTV/%s", 0);    /* hourly */
    }
}

/* --- ziplink.tsf END --- */

The "TLLY ver=4" at the top specifies the script version number. This should be incremented every time the script is changed. The first line must look EXACTLY like the one shown above, or the service will reject the script. The last line in the script is also required in all .tsf files; it ensures that we don't use a tellyscript fragment that was accidentally truncated.
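For anyone who wants to sanity-check a .tsf file before handing it to the service, the framing rules can be approximated like this. It is a sketch under stated assumptions: the exact check the service performs isn't documented here, the function name is made up, and tolerating trailing blank lines is a guess. The sample is a cut-down copy of ziplink.tsf.

```python
import re

_HEADER_RE = re.compile(r"/\* TLLY ver=([0-9]+) \*/")


def check_tsf_framing(text: str, filename: str):
    """Return the script version if the first and last lines look right, else None."""
    lines = text.splitlines()
    if not lines:
        return None
    header = _HEADER_RE.fullmatch(lines[0])
    if header is None:
        return None
    # Assumption: trailing blank lines after the END marker are tolerated.
    last = next((ln for ln in reversed(lines) if ln.strip()), "")
    if last != "/* --- %s END --- */" % filename:
        return None  # probably truncated
    return int(header.group(1))


SAMPLE = """/* TLLY ver=4 */
Chat_ziplink(int billing, int svc)
{
    return PAPChat("XTV/%s", 0);
}

/* --- ziplink.tsf END --- */
"""
print(check_tsf_framing(SAMPLE, "ziplink.tsf"))  # 4
```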

The C-like function is always named "Chat_something", where "something" is the provider abbreviation. It takes a billing flag and a service flag as arguments. The end result is always to call the PAPChat or CHAPChat function with a prototype user name and a password. For most providers, the prototype user name is a Radius prefix or suffix (the "ZTV/" and "XTV/" parts are the prefixes for ZipLink) followed by the box's user name (replaces the "%s"), and the password is generated automatically (indicated by passing in "0" for the second argument).

The billing flag requests flat ('F'), hourly ('H'), per-port ('P'), or generic ('G') billing. Not all ISPs give us a choice of billing method, and those that do don't support all possibilities, so one billing choice is used as the default. For the example above, the call will be billed to WebTV as a flat-rate call if we request it, otherwise it's an hourly-rate call. Default values for the billing flag are specified in the PhoneDB, and can be overridden by POPtimization.
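To make the prefix selection concrete, here is the ZipLink case restated in Python. This mirrors the Chat_ziplink fragment shown earlier and nothing more; the function name and the box user name are invented for the example.

```python
def ziplink_user_name(billing: str, box_user: str) -> str:
    """Expand the prototype user name the way Chat_ziplink picks it."""
    # 'F' requests flat-rate billing; anything else falls through to the
    # default, which for ZipLink is hourly.
    prototype = "ZTV/%s" if billing == "F" else "XTV/%s"
    return prototype % box_user


print(ziplink_user_name("F", "box1234"))  # ZTV/box1234
print(ziplink_user_name("G", "box1234"))  # XTV/box1234
```

A generic ('G') or per-port ('P') request ends up on the hourly prefix here, because ZipLink only offers the two choices.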

Older versions of the service used two functions, called Chat_something() and Chat_something_2(). This was less flexible and more prone to errors.

The service flag isn't really used at present. Instead, there are several copies of ziplink.tsf, each in a separate directory (e.g. US, US_daily, US_daily_alpha). By specifying different radius prefixes, we can have the ISP route authentication requests to development servers instead of the production servers. Were this to be implemented within a single .tsf file instead of in multiple .tsf files, the "svc" argument would be checked to see which service created the script. Example values are 'p' for production, 'w' for weekly, and 'd' for personal development servers.

The format is simple enough, but if you have any doubts you can always run the .tsf file through a C compiler. (You will want to have some other things defined if you don't want to be drowning in warnings; see network/src/lib/tellyscript/scripts/ScriptIncl.h.) IMPORTANT: the curly braces ('{', '}') on each part of the "if" statement are not optional! There is a bug in the tellyscript interpreter that can cause odd behavior if you don't put braces around the statements.


What happens if we have a .tsf file, but no "Provider:" entry in a POP list, and therefore no information about the provider in the PhoneDB? Nothing. The service will not have heard about the provider, so it won't try to use it. Heck, without a POP list there's nothing to use anyway.

What happens if there's an entry in the PhoneDB, but no matching .tsf file? Bad things. headwaiterd will refuse to send a new tellyscript to people who would get a script with the partially-defined provider, and scriptlessd will actually send people off to wtv-*. The reason scriptlessd was written this way was to avoid sending such users to an 800 number, and to make it immediately obvious that a serious but easily correctable problem exists. It would be nicer to send the users to alternative POPs and inform a pager instead of the customer. This may be implemented in a future service release, especially since brain-dead Classic boxes will report the mysterious "couldn't get IP address" error in this situation.

What happens if there's a PhoneDB and a .tsf file, but the .tsf file contains an error? Very, very bad things. The box will probably crash when it tries to execute the tellyscript. It is always prudent to test new PhoneDB and .tsf combinations with the Vend-A-Telly page before releasing them to customers.


In general, there should be a 1-to-1 mapping between providers and IAPs. The load balancing and provider interleaving algorithms do their best to avoid saturating a user with POPs from the same provider, but the only way they can tell whose POPs are whose is by the provider abbreviation. If you split cnc into cnc1 and cnc2, there's nothing to prevent the user from getting cnc1 as their primary provider and cnc2 as their secondary, and a localized network outage within CNC will shut out the user.

If a provider has multiple categories of POPs, such as new POPs with higher capacity that are meant to replace older ones, you can give higher priority to the better POPs by assigning cost values to specific POP numbers. This will cause the PhoneDB to place them ahead of otherwise equivalent POPs from the same provider, and will prevent the provider rotation in the service from swapping the POPs around.

There are two cases where we've broken this rule. The first is cnc vs cnc800. We used a separate provider here to make it easy to spot the users who were on the 800 number. This was only used for "coming soon" and other special programs, so there was no risk of multiple CNC assignments causing trouble. The second case was "uunet" vs "uunetdan". Again, it was felt important to distinguish the two because we had radically different pricing on them, and more importantly we wanted the load balancing parameters to only affect the uunetdan set. Since uunet was given a very high cost (low priority), and uunetdan a very low cost (high priority), there was little chance of a user ending up with one of each unless they had no other POPs anyway.

 

-IV.O- VideoAds

VideoAds are short (15-second) VideoFlash clips that play when the box is powered on. They are downloaded during a MessageWatch connect, play once, and are then thrown away. This feature was first available in the 1.3 client.

There are a number of restrictions on the set of users that get VideoAds. The download takes about 5 minutes, which isn't terribly long, but if the box is making a toll call every night it can add up. We also want to control our own costs by not sending the VideoAds to users with hourly-rate POPs. Even if the revenue from an ad impression is more than the cost of a 5-minute call on an hourly IAP, we won't come out ahead unless the user logs in almost every day. We only get the revenue if the ad plays, and new ads are sent down every night whether the box plays the ad or not.

The rules are:

Most of these (pretty much everything but the user category) are checked when the tellyscript is created, rather than when they log in.  The VideoAd eligibility flag is stored in the database.  That way we base the decision on what they're dialing in with now, rather than what we would assign to them if we handed out tellyscripts right now.  It's a subtle distinction, but it can make a difference.

The VideoAd plays during the first part of the box's connection to the service. Instead of seeing the Road to Nowhere and the connection status bar, you watch the movie. Audible dialing is disabled for connections that start with a VideoAd playing.

 

-IV.P- Automatic Number Frustration

There are cases where ANI doesn't work that weren't worth covering in the introductory sections, but should be mentioned for completeness.

One of the unnerving things about ANI is that anybody with a T1 or PRI can convince you that they're calling from anywhere. Some PBX systems, especially those targeted for use by telemarketers, explicitly allow you to set the outbound ANI and CallerID information. This means that sometimes the service will receive ANI that is inadvertently or deliberately misleading.

A prime example is Microsoft, whose Redmond campus phone system was sending up ANI values that looked like "100-010-1180" or "100-111-5566". Clearly these aren't valid US phone numbers. In this situation, the service will put up the "enter your ANI" page.

A more insidious example is a store whose number was 804-850-xxxx. After an area code split, their number changed to 757-850-xxxx, but the PBX was never updated. When CCMI finally removed the 804-850 exchange from the database, we no longer recognized the ANI as valid, and (on the assumption that it was a new exchange that we weren't recognizing yet) the service started handing out tellyscripts with an 800 number. Not only does this cost us money, it might cost the store money in the future: if the exchange were used for a different location in the new area code, it might be a considerable distance away from the store, and the store would start making toll calls because their ANI is wrong.

Another fun case was the user showing up with 415-700-xxxx. This exchange doesn't exist, and apparently never has. As it happens, the caller with this ANI was in Paris, France, and was using an international 800 code to get to us. For whatever reason, the carrier decided to return 415-700-xxxx as the source.
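The Microsoft case can be caught by shape alone, since North American area codes and exchanges never start with 0 or 1. The sketch below shows that check; it is not the service's actual validation, which relies on exchange data. It also makes the limits obvious: a stale 804-850 number or a nonexistent 415-700 number is perfectly well-formed and sails through, so a lookup against the exchange database is still required.

```python
import re

_ANI_RE = re.compile(r"([0-9]{3})-([0-9]{3})-([0-9]{4})")


def plausible_nanp_ani(ani: str) -> bool:
    """True if the ANI has the shape of a North American number (NPA-NXX-XXXX)."""
    m = _ANI_RE.fullmatch(ani)
    if m is None:
        return False
    npa, nxx = m.group(1), m.group(2)
    # Area codes and exchanges both start with a digit from 2 through 9.
    return npa[0] in "23456789" and nxx[0] in "23456789"


print(plausible_nanp_ani("100-010-1180"))  # False
print(plausible_nanp_ani("650-326-1095"))  # True
```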

 

-IV.Q- OraclePhoneDB and POPtimization

Until the "Disco" service at the end of 1997, the PhoneDB had just been a file on disk. As of Disco, the PhoneDB is also kept in the database, and in some future release the disk file may vanish altogether. The purpose behind this is to gain greater flexibility and provide direct access to the PhoneDB for database queries.

One of the more important developments associated with the OraclePhoneDB (so named because we're using an Oracle database right now) is an optimized POP assignment system, usually referred to as POPtimization. The goal of POPtimization is to assign POPs on an individual basis, rather than on an exchange area basis.

The current load balancing system has a number of flaws, but the biggest of them is that it doesn't consider groups of people. When you log in, it looks at the usage percentages assigned to the different providers, looks at your serial number, makes an assignment, and then forgets all about you. It doesn't know how many ports each POP has, and even if it did, it wouldn't know how many users have had that POP assigned to them, or which of those users is likely to dial in during peak hours.

POPtimization takes into account customer usage patterns (like number of hours per month and typical time of day logging in), POP capacity, and several other factors, and assigns POPs to all users in an entire region. This allows us to use all the capacity that is available while minimizing costs.

The aspect of POPtimization that most directly affects Customer Care and Operations is that tellyscripts can now hold multiple sets of POPs, and can invoke a different set based on what time it is, what day of the week it is, and what month it is. Here is an example of a tellyscript assignment with two sets, one for October 1997 and one for November 1997:

  Most recent script sent to client:
    Hash 0x91c63c79, sent Wed Oct 22 20:54:41 1997
    v42 base/-
    v2  locale/-
    v0  ---Poptimized/199710
    v1  wpb/3261095
    v2  cnc/6870610
    v1  wpb/16503261095
    v2  cnc/16506870610
    v1  artemis/18004653537
    v0  ---Poptimized/199711
    v1  wpb/3261095
    v2  cnc/6870610
    v1  wpb/16503261095
    v2  cnc/16506870610
    v1  artemis/18004653537

This example shows only two sets, but a tellyscript might have as many as eight. The output of clientinfo will also show the sets in a hierarchical fashion:

  POPtimized assignments:
  MONTH Oct 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H
  MONTH Nov 1997
    DAYS SMTWRFS
      TIMES 00:00 - 00:00
        POP 1 0:650-326-1095 conn=F
        POP 2 1:650-687-0610 conn=P
        POP 3 2:650-687-2255 conn=H

The "conn=X" part tells you if the connection is supposed to be (F)lat rate, (P)er-port, or (H)ourly rate. Starting with the Etude service, these values determine how IAPs bill us for connections by changing the Radius prefix used in the tellyscript fragment.

Switching POP sets on calendar month boundaries is especially important when flat-rate IAPs are used. The IAP bills us if a call starts in a particular month, so if we can have the box switch between two flat-rate providers exactly on a month boundary, we won't end up paying two IAPs for the same box in one month or the other.


The POPtimization data is determined over the set of existing users, and is updated periodically. New users will just get the standard load-balanced PhoneDB selection. A (few) more details about the process are provided in section V.A.


There are a number of operational issues related to OraclePhoneDB and POPtimization, but it's not really appropriate to list them all here. Some early notes on operations issues are available in Living with OraclePhoneDB.

The service has a number of safeguards to prevent really bizarre behavior. For example, every POP in every set must be one of the ones shown by POP-O-Rama. There is no way for corruption in the POPtimization tables to cause a box to dial a POP that is completely wrong unless the PhoneDB itself is damaged somehow. The results from the OraclePhoneDB are currently compared bit-for-bit against the results from the file version, and the file is checksummed and verified in various ways, so PhoneDB corruption is unlikely unless the POP lists or PhoneDB tools are screwed up. And if the POP lists or PhoneDB tools are broken, then we'll have problems whether we're using the PhoneDB in Oracle or in a file.

The flip side of this safety is that, should the PhoneDB file become out of sync with what's in the database due to some minor installation mishap, POPtimization will be disabled for all users.  This could prove costly for us, though it should have no impact on the users.  I'm not expecting customers to be adversely affected by POPtimization.


If a box loses power, it loses track of the date and time. In such an event, it will behave as if it were Wednesday at 7pm (local time) in the most recent month in the script.
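The clock-loss fallback can be sketched as follows, using the "---Poptimized/YYYYMM" set labels from the script dump and the day letters (SMTWRFS) from clientinfo. The tuple layout and function names are assumptions made for the example; how the box actually represents time inside the tellyscript isn't covered here.

```python
from typing import List, Optional, Tuple

Clock = Tuple[int, int, str, int]  # (year, month, day letter, hour)


def parse_set_label(label: str) -> Tuple[int, int]:
    """'---Poptimized/199710' -> (1997, 10)."""
    yyyymm = label.rsplit("/", 1)[1]
    return int(yyyymm[:4]), int(yyyymm[4:6])


def effective_time(clock: Optional[Clock], set_labels: List[str]) -> Clock:
    """Time used to pick a POP set; clock is None if the box lost track."""
    if clock is not None:
        return clock
    # Pretend it is Wednesday at 7pm in the most recent month in the script.
    year, month = max(parse_set_label(label) for label in set_labels)
    return (year, month, "W", 19)


labels = ["---Poptimized/199710", "---Poptimized/199711"]
print(effective_time(None, labels))  # (1997, 11, 'W', 19)
```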

 

-IV.R- MCI and CHAP

Toward the end of 1997, WebTV and MCI reached an agreement that allows WebTV customers to switch to an MCI/WebTV co-branded service. Such customers will use MCI POPs exclusively, call MCI's customer support number, and will pay a lower monthly fee if they are also subscribed to MCI's long distance service.

In late 1998, because of some restructuring in the MCI/Worldcom deal, it was decided to cancel the MCI program without ever having actually used the MCI POPs. The discussion in this section is still relevant for other potential partners though (e.g. AT&T).

The trick to using MCI's POPs is that they only support CHAP authentication, while all of our other POPs support PAP. To use their POPs we needed to add CHAP support to the box.   This feature first became available with the Etude service, and the 1.4 and 2.1 (2.2?) clients.  Because we wanted to sign people up for the MCI co-branded service before CHAP support was available, we continued to assign the normal WebTV POPs to MCI customers for the end of 1997 and first half of 1998.


We will have some troubles with flash downloads, because the flash downloader on "Classic" boxes will always behave like a 1.0 client, and therefore can't negotiate CHAP.  Same story for the mini-browser on "Plus" boxes, which behaves like a 1.3 client. The tellyscript sent to MCI customers has to be able to dial either MCI POPs or normal WebTV POPs, and will switch based on whether or not the currently executing ROM supports CHAP authentication.
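In Python terms, the switch amounts to something like the sketch below. The function name and the list-of-strings representation are made up for illustration, and the real decision is made inside the tellyscript rather than in Python.

```python
from typing import List


def pops_for_rom(rom_supports_chap: bool,
                 mci_pops: List[str],
                 webtv_pops: List[str]) -> List[str]:
    """Pick the POP list the currently executing ROM is able to use.

    MCI POPs are CHAP-only, so a ROM that can only do PAP (for example
    the flash downloader or the mini-browser) has to dial the normal
    WebTV POPs instead.
    """
    return mci_pops if rom_supports_chap else webtv_pops
```

The point of the design is that one script serves both cases, so a box that drops into the flash downloader can still get connected.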

Because of the potentially large number of MCI customers who will briefly be using our POPs, we only regard a customer as eligible for MCI if they have a local MCI POP and they have two local WebTV POPs that aren't from flat-rate-only IAPs. It is possible for a customer to gain or lose eligibility without any changes in MCI's POP list.
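The eligibility rule is easy to state as code. The sketch below is hypothetical: the Pop record and its field names are invented for the example, and it assumes "two local WebTV POPs" means "at least two".

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Pop:
    """Hypothetical POP record; the field names are illustrative only."""
    provider: str
    is_local: bool
    flat_rate_only: bool = False   # True if the IAP is flat-rate-only


def mci_eligible(mci_pops: Iterable[Pop], webtv_pops: Iterable[Pop]) -> bool:
    """A local MCI POP plus at least two usable local WebTV POPs."""
    has_local_mci = any(p.is_local for p in mci_pops)
    usable_webtv = sum(1 for p in webtv_pops
                       if p.is_local and not p.flat_rate_only)
    return has_local_mci and usable_webtv >= 2
```

Because the second condition depends only on the WebTV POP list, adding or removing a WebTV POP can flip the result even when MCI's list hasn't changed.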

These issues aren't unique to MCI.  Any provider that wishes to use CHAP authentication will have the same problems.

CHAP authentication requires the CHAP secret, which is stored in the service and embedded in the tellyscript.  The secret is changed once every 30 days or so, depending on the value of a config option.  If the secrets get out of sync, the box will be unable to connect to a CHAP POP.  Eventually the box will hit the (PAP-based) fallback number and get straightened out or, if the customer isn't allowed to use the fallback, the box will reset itself when the last POP in the sequence fails.

The richest source for potential trouble with CHAP passwords is the delay between when we send the tellyscript to the box and when it commits the new script to NVRAM.  On an FCS box, the script is only saved when the box is powered off with the remote or keyboard.  If you hit the reset button, or the box loses power, the script is lost, and with it the new CHAP secret.  The service will still have its copy though, and the next time the box tries to connect it will fail because the secrets are now out of sync.
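To see why a stale secret is fatal, recall how standard CHAP (RFC 1994) works: the response is an MD5 hash over the packet identifier, the shared secret, and the challenge. The Python sketch below shows the general protocol, not the box's actual code, and the secret values are made up.

```python
import hashlib
import hmac


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """RFC 1994 CHAP response: MD5(identifier octet + secret + challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def pop_accepts(identifier: int, pop_secret: bytes,
                challenge: bytes, response: bytes) -> bool:
    """Authenticator side: recompute with its own secret and compare."""
    expected = chap_response(identifier, pop_secret, challenge)
    return hmac.compare_digest(expected, response)


# A box that lost its new script before saving it still answers with the
# old secret, so the comparison fails and the connection is refused.
challenge = bytes(range(16))
stale = chap_response(1, b"old-secret", challenge)
fresh = chap_response(1, b"new-secret", challenge)
```

Nothing in the exchange tells the box that its secret is merely old, so from the box's point of view the POP just doesn't work.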


The MCI POP list only has POPs from one provider.  This has some implications for the way we construct a tellyscript, because the most likely situation is for a user to have only one local POP.  It doesn't really make sense to try a second toll number, because it's from the same provider (so if the system goes down, you're dead anyway), and many people with one local and one toll POP have asked that we not try the toll number at all.

We use a different set of DialSequenceTemplate lines for MCI in the 2lt and 2tt cases.  These try the first number twice, and then move on to the failover.  The 3xxx lines aren't even used.

It isn't possible to POPtimize MCI.  There wouldn't be any point, really; there's only one provider, and the service already does rotation for equivalent local POPs.

In the Disco service, we had to use an entirely separate copy of the PhoneDB file to support MCI.  With Etude, it's all in one PhoneDB file, which makes it easier to maintain the service and to ensure that the MCI POP assignments were generated with the same data used to generate the standard POP assignments.

 

-IV.S- Oh, Canada

Some minor changes had to be made to support Canada.  Canada's phone system works very much like the one in the US, so Canadian POP assignments are included as part of the standard PhoneDB.

The most difficult aspect of Canadian support, from the perspective of dialing into the service, is that the CCMI data we get doesn't allow us to price calls that originate in Canada.  We can price calls from the US into Canada, but not the other way around.   We had to use data from a new source (International Data Bases, or IDB), and use a different software library to interpret it (ERTRS, which I think means Enterprise Real Time Rating System).  ERTRS, and IDB's Canadian data in ERTRS format, were both relatively new products, and there were a few hurdles to overcome.

Calling into the US from Canada on an 800 number is rather expensive.  We constructed a workaround for the scriptlessd number, but decided to omit the fallback number entirely.  This is configurable.

 

-IV.T- Perhaps you should surf scriptless

Connecting to the service requires two things: a tellyscript and the service list entry for the headwaiter.  The tellyscript tells it how to dial the local POP, and the service list entry (which looks something like "name=wtv-head-waiter host=204.254.74.107 port=6801") tells it the IP address and port number of the headwaiter.  Both of these items are stored in NVRAM.

If neither of these items is present, the box will use the default tellyscript that it keeps in ROM.  This dials the scriptlessd number.  Once connected, it will try to contact "10.0.0.1", which our routers redirect to the actual scriptlessd machine.  This works fine, but we run into trouble if the box has one of the items but not both.

If we have the tellyscript but not the headwaiterd service entry, the box will dial the local POPs as usual.  Once connected, because it doesn't have the headwaiterd IP address, it will try to connect to "10.0.0.1" on port 1615.  Because none of the local ISPs has anything meaningful at that address and port, the connection attempt fails, and the box will put up a message indicating that "perhaps the service is down for maintenance".  This became widely known as The Perhaps Problem.  Recent clients (1.3 and later??) were fixed to handle this problem gracefully.  (I don't think we completely nailed down why the service entry was getting deleted.)

If we have the service entry but not the tellyscript, we will call the scriptless 800 number and then connect to headwaiterd.  This usually doesn't have any consequences for the end-user, but it can be rather expensive for WebTV.  A fix for one cause of tellyscript loss was added to Etude-1 during Funk development, but the problem persists.
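The four combinations are easier to see side by side. The Python sketch below models the older client behavior described above. The names are invented, and it assumes the box uses port 1615 for "10.0.0.1" in the "neither item" case as well, which the text only states for the missing-service-entry case.

```python
from typing import NamedTuple, Optional, Tuple

Address = Tuple[str, int]

SCRIPTLESS_ADDR: Address = ("10.0.0.1", 1615)   # redirected by our routers


class Plan(NamedTuple):
    dial: str            # "local-pops" or "scriptless-800"
    connect_to: Address


def connection_plan(has_tellyscript: bool,
                    headwaiter: Optional[Address]) -> Plan:
    """What to dial and where to connect, given what is in NVRAM."""
    dial = "local-pops" if has_tellyscript else "scriptless-800"
    target = headwaiter if headwaiter is not None else SCRIPTLESS_ADDR
    return Plan(dial, target)


def is_perhaps_problem(plan: Plan) -> bool:
    """Local ISPs have nothing useful at 10.0.0.1, so this connect fails."""
    return plan.dial == "local-pops" and plan.connect_to == SCRIPTLESS_ADDR
```

The two decisions are made independently, which is exactly why the mixed states cause trouble: one is a dead end for the user, the other is an expensive 800 call for us.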

 


-V- Extra Goodies

-V.A- POPtimization

POPtimization, sometimes referred to as "POPz", is WebTV's mechanism for optimizing the cost of dialup access.  The goal is to minimize WebTV's costs for providing service without affecting the customer's phone bills.

Unlike the simplistic load-balancing and ISP rotation provided by the service, POPtimization operates on an individual user level.  Each subscriber has a set of POPs selected for them, based on their usage patterns.  This allows us to keep heavy users off of hourly-rate POPs, and frequent peak-time users off of per-port POPs.

POPtimization data is stored with a copy of the PhoneDB in an Oracle database.   New calculations are done on a monthly basis, using phone-log data from the data warehouse.  A user can be told to dial different POPs based on what month it is, what day of the week, or what time of day.

The service itself doesn't do any POPtimization calculations.  It just provides a mechanism for implementing the POP choices selected by the back-end process.  If no POPtimization data is available for a subscriber, perhaps because the customer just signed up, the service will assign POPs in the usual non-POPtimized manner.
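The service's side of this is little more than a lookup with a fallback. A hypothetical Python sketch follows; the function name, the dictionary of back-end results, and the default_pops callback are all invented for illustration.

```python
from typing import Callable, List, Mapping, Tuple


def choose_pops(subscriber_id: str,
                poptimized: Mapping[str, List[str]],
                default_pops: Callable[[str], List[str]]) -> Tuple[str, List[str]]:
    """Return (source, pops) for one subscriber.

    poptimized holds whatever the back-end process selected; the service
    never computes these choices itself.
    """
    pops = poptimized.get(subscriber_id)
    if pops:
        return "poptimized", pops
    # No data yet (e.g. a brand-new customer): assign POPs the usual way.
    return "default", default_pops(subscriber_id)
```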

For more information about the POPtimization process, contact Joy Mundy (joy@corp.webtv.net).  She has a status page at http://webhost-1/~joy/Poptimization_Status.htm.

 

-V.B- Fallback usage cap

This feature was added for Disco, and partially un-implemented in Etude. The work on the original feature was done by Wiltse Carpenter (wiltse@corp.webtv.net).

The basic idea is to cut our costs by reducing the amount of time that people spend on the fallback 800 number. This is accomplished with two different mechanisms, a per-session limit and a per-month limit.

The per-session limit is like the 10-minute idle timeout that the box has, except that it forces you to hang up and redial whether you're idle or not. This is somewhat intrusive, but only affects people connected to the fallback number, and does a marvelous job of reducing costs.

The per-month limit prevents you from using the fallback number if you have used more than a set number of hours in a calendar month. This caps the per-user cost at a tolerable level, while still allowing relief during temporary POP outages. We found that a handful of users accounted for a large percentage of the costs, so the cap would dramatically reduce costs while only affecting a handful of users. When the monthly usage cap is exceeded, users dialing in through the fallback number will get an HTML page from the headwaiter telling them that all local POPs are unavailable.

Changes in the way usage-based billing was being implemented for Etude caused the per-month usage cap feature to be removed.  The "churn" feature is still in place.
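For reference, the two limits as originally designed for Disco can be sketched in Python as follows. The class, the function names, and the numbers in the comments are hypothetical; the actual limits are set by config options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackPolicy:
    """Hypothetical container for the two limits."""
    session_limit_minutes: float   # forced hangup after this long
    monthly_cap_hours: float       # per-user cap per calendar month


def must_hang_up(session_minutes: float, policy: FallbackPolicy) -> bool:
    """Per-session limit: applies whether or not the user is idle."""
    return session_minutes >= policy.session_limit_minutes


def fallback_allowed(hours_used_this_month: float,
                     policy: FallbackPolicy) -> bool:
    """Per-month limit: refuse service once usage exceeds the cap."""
    return hours_used_this_month <= policy.monthly_cap_hours


# Example policy with made-up numbers: 15-minute sessions, 5 hours a month.
example_policy = FallbackPolicy(session_limit_minutes=15, monthly_cap_hours=5)
```

Both checks only ever apply to connections that came in through the fallback number, so users on local POPs never see either one.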

 

-V.C- There's ECMA in my TellyScript

The Sega Dreamcast machine has a very limited amount of NVRAM.  While our initial FCS boxes had 16K allocated for the purpose, Dreamcast only has 2K.  Because a tokenized, compressed tellyscript occupies about 5K of space, there was no way to fit a full tellyscript into that space.

The alternative plan, developed by Doug Steedman (doug@corp.webtv.net), was to generate the script in ECMAScript (pronounced "JavaScript") instead.  By embedding most of the tellyscript functionality in the ROM, and using the existing JavaScript interpreter, the compressed size of the scripts could be reduced to a few hundred bytes.

This required some changes in the tellyscript generation code.  The script computation is still carried out in the same way, but the actual script is generated from JavaScript fragments instead of TellyScript fragments.  They're stored in the same directory as the ".tsf" files, but have a ".js" extension instead.

Keeping two sets of provider files (e.g. cnc.tsf and cnc.js) up-to-date would be a waste of time, so Doug defined a third format called ".tsx".  The tellyscript generator knows how to convert a ".tsx" file into either a ".tsf" or a ".js" file.  This way the provider access rules can still be specified only once.

Some notes are available on http://webhost-1/~doug/dialscript.htm.

The ECMAScript is currently in limbo, because Microsoft chose to use the WebTV (Fiji) browser instead of their (Merlin) client.  The problem with the too-small NVRAM was solved by putting 90% of the tellyscript into ROM, and doing a merge on the box.

 

-V.D- TVPAK and DialScript

The future of WebTV clients is called "TVPAK".  Built on top of Windows CE, these come with a whole bunch of Microsoft goodies for managing the modem.  So, while they still accept scripts written in the tellyscript language, the content of the scripts is dramatically different.  To make the distinction clear, they are referred to as "dialscripts".

The person in charge of dialscripts is Venkatesh Krishnamurthy (vkrishna@corp.webtv.net).

 


-VI- For Further Reading

-VI.A- On the web

Resources available on the web, internally or externally.

http://webhost-1/~fadden/DocArchive/
    A collection of documents on various subjects, some related to the material here, some not. Take a look sometime.
http://webhost-1/~fadden/greater-scroll.html
    This file!  (The official copy is in the service source tree.)
http://webhost-1/~fadden/override-handbook.html
    The Dial Override Handbook.  (Official copy is in source tree.)
http://webhost-1/~fadden/dialup-options.html
    The Field Guide to Dialup Options.  (Official copy is in source tree.)
http://hyperarchive.lcs.mit.edu/telecom-archives/
    TELECOM Digest archives. Several years' worth of interesting articles.
http://frodo.bruderhof.com/areacode/
    Area code split details.
http://www.areacode-info.com/
    Assorted area code stuff.
http://www.cnet.com/Content/Reviews/Compare/56kmodems/index.html
    Reviews of 23 56K modems.

 

-VI.B- In the service source tree

Documents checked into the service source tree. The links go to the Perforce web browser, which will let you look at any version of the document that has been checked in.

network/src/doc/GreaterScroll.html
    This file!  Used to be called network/src/doc/DialingInfo.
network/src/doc/OverrideHandbook.html
    The Dial Override Handbook (complete description of dial overrides).
network/src/doc/DialupOptions.html
    The Field Guide to Dialup Options (explanation of all dialup-related config options).
network/src/doc/ANICodes
    List of OLS codes (found in the first two digits of the ANI number).
network/src/doc/IntlPhoneNotes
    A few notes on how the service deals with the phone systems in foreign countries, e.g. Japan.
network/src/doc/POPBalancing
    A detailed technical discussion of the ramifications of POP load balancing, written while I was trying to convince myself that the system was behaving correctly.
network/src/doc/PSI
    Description of the changes made to the service to support PSI.
network/src/phonedb/README
    Tips and tricks for advanced "phonetool" use.
network/src/tool/clientpopedit/README
    Documentation for the "clientpopedit" tool.
network/src/tool/dpedit/README
    Documentation for the "dpedit" tool.
network/src/tool/phonetool/README (and README_JP)
    Documentation for the "phonetool" tool, which is actually a collection of tools. Of particular interest for some people is the table of dialing pattern codes that are output by the "dumpnpas" sub-command.
network/src/tool/rawphonetool/README (and README_JP)
    Documentation for the "rawphonetool" tool, which is actually a collection of tools. This tells you what all the nasty messages printed by rgenphonedb mean.
network/src/tool/psiutil/README
    How to use psiutil, if you are ever unfortunate enough to need it.

That's all, folks...

*** WebTV/Microsoft Confidential ***