Lesser Scroll of Dialing Wisdom

/----------------------------------------------------------------------------\
|                                                                            |
|      |                      The Lesser Scroll                       |      |
|    -=*=-                            of                            -=*=-    |
|      |                        Dialing Wisdom                        |      |
|                                                                            |
\----------------------------------------------------------------------------/

*** WebTV/Microsoft Confidential ***

This document contains a lot of detail on how our service works. While not strictly "trade secret" information, care should be taken to keep this within WebTV/Microsoft and their outsourcers.

Contents:

System Overview

Telco Issues

Service Fundamentals

Dialing Details


The WebTV system is a combination of a set-top box and an online service. The set-top box ("WebTV Internet Terminal" or "WebTV Plus Receiver"; henceforth referred to as the "box") is connected to a television and a phone line. Once it has successfully dialed into an Internet Service Provider, it connects to the WebTV service, and great things happen.

Like the Greater Scroll of Dialing Wisdom, this document attempts to explain the intricacies of getting the box connected to the service. But unlike its predecessor, the Lesser Scroll concentrates specifically on the information that is relevant to the Customer Care department and their outsourcers. The goal here is not to gain a complete understanding of every aspect of the service, but to gain a solid understanding of the basics, so that it is easier to troubleshoot our customers' connection problems.

Recommendations on specific practices (notably with regard to dial overrides) are simply that: recommendations. They may or may not be consistent with current Customer Care policies. As always, consult your supervisor if there are any questions on policy.

WebTV at a Glance

A brief example should help illustrate the major components of the WebTV system.

When Joe User brings his box home from the store, the first thing he does is try to set it up. Assuming that he reads the instruction manual, this part should be easy. Once the box is powered on, it listens for a dial tone on the phone line. When it hears the dial tone, it dials a toll-free 800 number that we call the "scriptlessd number." Most users can connect to this without trouble once their box is set up properly.

Once connected, Joe's box starts talking to the scriptlessd server. Scriptlessd gets the caller's phone number via a feature called ANI (Automatic Number Identification) that is similar to CallerID, except that it works from almost everywhere and can't be blocked. If the service is unable to get the user's number from ANI, scriptlessd will put up a screen asking the user to enter their phone number.

From the user's ANI we know where they are and what the closest access numbers (called POPs) are. The two best POPs are assigned to that user, and put into a set of dialing instructions called a "tellyscript." The tellyscript tells the box which numbers to dial and how to dial them.

After getting the tellyscript, the box hangs up and dials the first POP in the list, which is hopefully a local call. If the first number is busy or does not connect it will hang up and try another. After trying each of the POPs twice it will give up and call a toll-free 800 "fallback" number. Excluding network outages or excessive local congestion, most users shouldn't need to use the fallback number.
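The retry order described above can be sketched in a few lines of Python. This is an illustration only: the function name and the phone numbers are made up, and it assumes the box makes a full pass over the POP list before retrying, which the text does not spell out.

```python
def dial_order(pops, fallback, tries_per_pop=2):
    """Return the numbers the box would dial, in order, until one connects."""
    order = []
    for _ in range(tries_per_pop):
        order.extend(pops)      # one pass over every POP in the tellyscript
    order.append(fallback)      # the toll-free fallback is the last resort
    return order


# Hypothetical tellyscript with two POPs and a placeholder fallback number.
print(dial_order(["555-0100", "555-0199"], "1-800-555-0123"))
```

With two POPs the box makes at most four local attempts before it ever touches the fallback number.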

With a little luck, most users will successfully connect to the WebTV Network without further intervention. The box connects to the "headwaiter" server, which tells it where to go. Shortly after connecting the box sends up a "phone log" (sometimes called a "connection log") that shows what numbers were dialed, what failed, and what ultimately succeeded. These logs are used to generate POP failure statistics, to debug problems, and to decide how that box will dial in the future.

When Joe turns his box off with the keyboard or remote, the tellyscript is saved in NVRAM (Non-Volatile Random Access Memory). The next time the box is powered on, it skips the scriptlessd step and dials directly into the local POP.

Review

The "box" is the thing that sits on your television set. The "service" is what it talks to when it gets dialed in. The service is composed of multiple "servers" that do specific things, like hand out "tellyscripts", show you the home page, or let you read mail. The box knows how to find "scriptlessd", scriptlessd knows how to find the "headwaiter", and the headwaiter knows how to find all the other servers.

ANI tells us the caller's phone number. From the ANI data we assign local POPs to the user. The specific dialing instructions are contained in a tellyscript.

Boxes with tellyscripts dial into their local POP, and connect to the headwaiter. Boxes without tellyscripts go to "scriptlessd" first to get a tellyscript, and then hang up and redial a local POP.

Telco Issues

Terms & Definitions

While the evolution of the US phone system shows a great deal of careful and occasionally ingenious thought, there are some things about it that are just plain confusing. Before we can go into detail, there are a few terms and proper nouns that should be defined.

Telco is an abbreviation for "telephone company."

CCMI is Center for Communications Management Information. CCMI sells us a database of call pricing information, which is usually accurate.

POP is Point Of Presence. Generally it is a bank of modems with a terminal server that connects to a network. The modems are usually part of a "hunt group", so that you can dial just one number, and if the first line is busy it "hunts" for the next free one. POPs are also referred to as access numbers.

A calling plan is an agreement with your telephone company that decides how much you will be charged for the calls you make. Telephone companies usually have several calling plans to choose from. The most common are flat-rate plans (where you can make as many calls as you like to a certain area for a flat monthly rate) or measured-rate service (where you are charged for every outgoing call). But the choices vary between different telephone companies. For example, by adding a higher monthly charge to your phone bill you could get flat-rate local calling to a larger area.

A dial pattern tells you how many digits to dial when you're calling a particular number. In the US, these may be 7, 10, or 11 digits long. The dialing pattern necessary to call from one area to another depends on where the caller is in the country, the rules of the caller's local telephone company, and sometimes the rules of the caller's calling plan.

Tellyscripts are a WebTV creation. They are programs containing instructions that tell the box how to configure the modem, which POPs to call, and how it should dial them. The tellyscript is sent to the box by the service.

LEC is Local Exchange Carrier, or local telephone company. These are the guys who handle local calls and "local toll". Pacific Bell, Bell Atlantic, and Ameritech are examples of LECs. CLECs are Competitive Local Exchange Carriers, a new kind of carrier made possible by the 1996 Telecommunications Act. (Now you too can run your own phone company!)

RBOC is Regional Bell Operating Company. These are the "baby Bells" that got spun out of AT&T several years ago. Pacific Bell is an RBOC. Sometimes these are just referred to as BOCs.

IOC and UOC are CCMI abbreviations for Independent Operating Company and Unknown Operating Company. Contrast with BOC. IOCs tend to be smaller phone companies or CLECs; UOCs are usually phone companies run by rural cooperatives or out of somebody's garage. An IOC that CCMI doesn't know anything about is a UOC.

IXC (sometimes IEC) is Inter-eXchange Carrier. This is a fancy term for "long distance company" that telco people like to use. AT&T and MCI are examples of IXCs. When you make a long-distance call, the IXC pays money to the LEC where the call came from and the LEC where the call went to, so calls that avoid IXCs tend to be cheaper.

LATA is Local Access Transport Area, a geographical region defined by the phone companies. The way things traditionally work is that your LEC handles local calls and intra-LATA (in the same LATA) toll calls, while your IXC handles inter-LATA (between LATA) toll calls. So a toll call to a location 20 miles north might be handled by Pacific Bell, while a similar call in the other direction might be handled by AT&T, based on where the LATA boundaries fall. Calls that cross state boundaries follow an even more mysterious set of rules.

The Telecommunications Act of 1996 really confused things. Your IXCs can be LECs, CLECs can provide local service with the LEC's equipment, and generally anybody can do anything. This is why MCI can offer local service now.

PIC is Primary Inter-exchange Carrier. This term can be used both as a verb and an adjective. Your phone line can be "PICed" to use a specific carrier for your IXC, and more recently you can have an intra-LATA PIC done for local toll calls. A "PIC code" is a sequence of digits that you can enter before dialing a number to choose a different carrier; examples are 10288 (1-0-ATT) or 10321 (Telecom*USA's 10-3-2-1 program). "PIC charges" are the fees that your IXC pays to your LEC when you change your long distance company. The PIC code format is in the process of changing from 10XXX to 101XXXX.

Tariffs tell you how much a call between two points costs. For long distance calls, the tariffs from the LECs on both ends and the relevant IXC all have to be factored in.

PUC is Public Utilities Commission. The PUC in each state has a great deal of control over the tariffs that the phone companies use. There are places where a long-distance call handled by AT&T is completely free, because the PUC decided that it should be.

Local calls, in the telco world, are not necessarily free calls. The difference between local and toll is defined by the tariffs, which are filed by the phone companies and monitored by the PUCs. Pacific Bell defines "zone 3" calls, which charge per-minute rates even to subscribers with flat-rate plans, as local. In the WebTV world we try to define "Local" as least-cost and "Expensive Local" (ExpLocal) as any local call that is more expensive than the minimum. We calculate the minimum by figuring out what it would cost for the customer to call himself. Any local call that costs more than that is labeled ExpLocal.
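The Local/ExpLocal rule boils down to a comparison against the "call yourself" cost. Here is a minimal sketch, assuming we already know whether the tariffs treat the call as local and that costs are plain integers in whatever units the pricing data uses. The function name is invented for illustration and is not an actual WebTV tool.

```python
def label_call(is_local, cost, base_cost):
    """Label a call the way the text describes.

    is_local  -- True if the tariffs treat the call as local
    cost      -- computed cost of calling the POP
    base_cost -- computed cost of the customer calling himself
    """
    if not is_local:
        return "toll"
    # Anything local that costs more than the minimum is "Expensive Local".
    return "Local" if cost <= base_cost else "ExpLocal"


print(label_call(True, 0, 0))       # Local
print(label_call(True, 1840, 0))    # ExpLocal
print(label_call(False, 2927, 0))   # toll
```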

The rate center is a geographic point used for billing purposes. "MTS" (Message Toll Service) coordinates are based on the rate center. The cost of a long distance call is based on "major MTS coordinates" for calls over 40 miles, and "minor MTS coordinates" for calls under 40 miles. For local calls the "wire center" coordinates are used. Yes, it could be more complicated: the coordinates are specified in "V&H" (Vertical and Horizontal) units, 1670 feet each.
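For the curious, the "1670 feet" figure falls out of the standard V&H distance formula, in which the distance in miles is the square root of one tenth of the sum of the squared coordinate differences. The sketch below uses made-up coordinates and an invented function name. It only illustrates the arithmetic and says nothing about how CCMI or the PhoneDB actually compute distances.

```python
import math

FEET_PER_MILE = 5280


def vh_distance_miles(v1, h1, v2, h2):
    """Distance in miles between two points given in V&H units."""
    return math.sqrt(((v1 - v2) ** 2 + (h1 - h2) ** 2) / 10)


# One V&H unit is sqrt(0.1) mile, which rounds to 1670 feet.
unit_feet = vh_distance_miles(0, 0, 1, 0) * FEET_PER_MILE
print(round(unit_feet))  # 1670
```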

POTS is Plain Old Telephone Service. The term is used to differentiate standard phone service from things like ISDN or cellular.

C.O. is Central Office. In the typical house or apartment, a pair of copper wires runs from your telephone to the central office. The distance between your phone (or, more importantly, your WebTV box) and the central office, and how well the wires are shielded, can affect the quality of your phone connection and hence your modem connect rate.

NPA/NXX is the obfuscated term for area code and prefix. If your phone number is 650-614-5500, your NPA is 650 and your NXX is 614. The NPA and NXX are enough to identify where the call is coming from. The last four digits of the phone number are sometimes called the "subscriber number". In some contexts the term "exchange" is synonymous with NPA/NXX.
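Splitting a number into its parts is simple enough to show directly. The helper below is a hypothetical illustration (the name is not from any WebTV tool) that accepts a number with or without punctuation or a leading 1.

```python
def split_number(number):
    """Split a US phone number into (NPA, NXX, subscriber number)."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]          # drop the leading long-distance 1
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number, got %r" % number)
    return digits[:3], digits[3:6], digits[6:]


print(split_number("650-614-5500"))  # ('650', '614', '5500')
```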

An Exchange Area is a collection of NPA/NXXs for which the billing is identical. For example, two calls from anywhere in Palo Alto will have the same cost so long as both callers have the same calling plan and service providers. Exchange areas may include dozens of NPA/NXXs or might only have one. They might overlap geographically (because of paging/cellular exchanges), but each NPA/NXX is part of only one exchange area.

LCA is short for Local Calling Area. Your LCA is the set of exchange areas that are local calls from your exchange area. Put more simply, if I'm a local call for you, then I'm in your LCA. LCAs may overlap. LCAs aren't necessarily symmetric; just because I am a local call for you doesn't mean that you are a local call for me.

NANP is the North American Numbering Plan. The Plan defines all the area codes, how dialing patterns will work in the future, and other dry subjects. It's NANP rather than USNP because it applies to Canada, Guam, and places out in the Caribbean, all of which are part of North America if you lean back and squint. It does not cover Mexico.

ISP and IAP are Internet Service Provider and Internet Access Provider. They are essentially the same thing, with a subtle and unimportant difference. We usually refer to them as ISPs. Concentric Networks Corp. (cnc), PSINet, Inc. (psi), and UUNET Technologies, Inc. (uunet) are examples of ISPs.

The backhoe is a large piece of construction equipment used for digging trenches and cutting through network cables at inopportune moments.

The PhoneDB is a WebTV creation that combines the CCMI data with a list of POPs from several ISPs, and comes up with POP assignments for every NPA/NXX. (If you understand that last sentence, then you're ready to graduate.) The POP-O-Rama web page lets you do queries on current and past PhoneDBs.

Dial Patterns

People who grew up in California were spoiled by Pacific Bell's coherent dialing pattern system. For the most part, you can dial to any point within the same area code by entering a 7-digit number, and you get to numbers in other area codes by entering an 11-digit number. Dialing numbers in the same area code using an 11-digit number is allowed. Other parts of the country aren't so straightforward.

There are actually four kinds of calls you can make:

HL - Home area code, Local call.
HT - Home area code, Toll call.
FL - Foreign area code, Local call.
FT - Foreign area code, Toll call.

Each of the four types can have a different "expected" dialing pattern, as well as a "permitted" dialing pattern. Certain combinations have unpleasant consequences.

HL is almost always 7 digits, but some places (like Maryland) require 10-digit dialing for *all* local calls. Yes, you have to include the area code to call your neighbor down the street. Some areas (like California) have 11-digit dialing as a "permitted" HL pattern.
HT is generally 7, 11, or both. Places that require 7-digit dialing for home/local calls and require 11-digit dialing for home/toll calls are troublesome, because the number of digits depends on whether the destination is a local call, and the definition of "local" depends on your calling plan. In many cases there is no way for WebTV to know ahead of time how many digits the box should dial. Guessing wrong results in a recording from the phone company.
FL is usually 10 or 11, but in some cases is 7. In nasty cases it's 7, and neither 10 nor 11 is allowed at all. It's nasty because we are *required* to dial a 7-digit number into a different area code when the call is local, but would be dialing an 11-digit number if the call were toll. So if we think something is local when it really isn't, we could be dialing a 7-digit number in the *caller's* area code rather than the *callee's* area code, and the WebTV box will be waking up somebody's grandmother. The service takes great pains to avoid this situation.
FT is always 11, no exceptions.
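The four call types and the three dial string lengths can be sketched as follows. Both function names are invented for illustration. Whether a call is local has to come from the tariff data, so here it is simply passed in.

```python
def call_type(caller_npa, pop_npa, is_local):
    """Return HL, HT, FL, or FT for a call from caller_npa to pop_npa."""
    area = "H" if caller_npa == pop_npa else "F"   # Home or Foreign area code
    return area + ("L" if is_local else "T")       # Local or Toll


def dial_string(pop_number, digits):
    """Format a 10-digit POP number as a 7-, 10-, or 11-digit dial string."""
    npa, nxx, line = pop_number[:3], pop_number[3:6], pop_number[6:]
    if digits == 7:
        return "%s-%s" % (nxx, line)
    if digits == 10:
        return "%s-%s-%s" % (npa, nxx, line)
    if digits == 11:
        return "1-%s-%s-%s" % (npa, nxx, line)
    raise ValueError("dial patterns are 7, 10, or 11 digits")


print(call_type("415", "510", False))   # FT
print(dial_string("5107420207", 11))    # 1-510-742-0207
```

Note that the 7-digit form throws away the area code, which is exactly why a wrong local/toll guess on an FL call can reach a stranger in the caller's own area code.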

Using the right pattern can be important. For example, there are places where you are either not allowed to dial 11-digit numbers for local calls, or are charged more than you would be for dialing 7 (presumably because the call is routed through the IXC as soon as the leading '1' is seen, instead of being handled by the LEC).

The CCMI database has "hints" on dialing patterns, but they are sometimes inaccurate. Because the dialing pattern depends on whether a call is local or toll, it depends on what your calling plan defines as being local. This makes it a bit of a challenge to get the dial pattern right. To work around these issues, the WebTV service takes the best guess it can, and remembers the cases that succeed.

The service remembers a set of dialing patterns that looks like this (output is from the Unix version of "dpedit", the Dial Pattern Editor):


 The dial patterns for '01fad82501b002ba' (ANI=004154631671) are:

  S  # ANI          POP          Mode
  +  0 415-614-5539 415-233-0570 7-digit
  +  1 415-614-5539 415-322-0489 11-digit
  +  2 415-463-1671 415-233-0570 7-digit
  I  3 415-463-1671 415-666-9999 7-digit
  +  4 415-463-1660 415-322-0489 11-digit
  +  5 415-463-1660 415-233-0570 7-digit
  N  6 415-463-1660 510-742-0207 11-digit
  -  7

Each line is one entry in the dial pattern table. It has the person's ANI at the time the call was placed, the POP number that the person was calling, and how many digits were used to dial it. We have to record the ANI, because if they move the box to a different place, or even to a different phone line with a different calling plan, the dial patterns can be different. The same applies for area code splits (see next section).

When a user first signs up, or first appears at a new number, we have no information about a person's dial patterns. The tellyscript that gets sent down will first try one pattern. Then, if that fails, it will try the next. When one succeeds, we add an entry to the table.

Suppose the tellyscript for a certain ANI first tries 7-digit dialing and then tries 11-digit dialing. What happens if the POP happens to be busy on the first attempt, but succeeds on the second? We will end up recording a success with 11-digit dialing, and will use that from then on. This isn't perfect, but it's hard to tell the difference between different kinds of failures ("all circuits are busy" sounds just like "you don't need to dial a 1 in front of that" to the modem). But most of the time it works.
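The "try, then remember" behavior, including the busy-signal quirk, can be modeled with a small table keyed on ANI and POP. This is a rough sketch rather than the service's real data structure. The class and function names are invented, and it assumes that a remembered pattern replaces the guesses entirely.

```python
class DialPatternTable:
    """Remembers which digit count worked for each (ANI, POP) pair."""

    def __init__(self):
        self._modes = {}  # (ani, pop) -> 7, 10, or 11

    def patterns_to_try(self, ani, pop, guesses=(7, 11)):
        known = self._modes.get((ani, pop))
        return [known] if known is not None else list(guesses)

    def record_success(self, ani, pop, digits):
        self._modes.setdefault((ani, pop), digits)


def connect(table, ani, pop, attempt):
    """Try each pattern in order; attempt(digits) returns True on connect."""
    for digits in table.patterns_to_try(ani, pop):
        if attempt(digits):
            table.record_success(ani, pop, digits)
            return digits
    return None


# The POP is busy on the 7-digit try and answers on the 11-digit try,
# so 11-digit is what gets remembered, whether or not 7 would have worked.
table = DialPatternTable()
results = iter([False, True])
print(connect(table, "4156145539", "4153220489", lambda d: next(results)))  # 11
print(table.patterns_to_try("4156145539", "4153220489"))                    # [11]
```

Because the key includes the ANI, the same box calling from a different line starts over with fresh guesses.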

It is also possible for a customer's dialing patterns to change over time, perhaps because they change local calling plans. This is not handled automatically, because the service can't easily distinguish a dead POP from a bad pattern. The solution for this is NOT a dial override. The "dpedit" tool can be used to adjust the dial patterns. Once changed, all the user has to do is unplug/replug and choose "moved" so that they go back through scriptlessd and get a script with the updated data.

Sometimes there are exceptions to dial pattern rules within a certain area. For example, there was an InternetMCI POP at 415-482-2900 in Redwood City that was a local call from Palo Alto. Every other call to Redwood City could be dialed with 7 or 11 digits, but not that one. If you used 7-digit dialing, you got a recording telling you that you needed to dial a 1 and the area code. So there's really no way to know for sure what will work until it's tried.

Sometimes, things can get pretty weird. In the 608-326 exchange in Wisconsin, if you call "873-xxxx", you get a local number in Iowa at 1-319-873-xxxx. If, on the other hand, you dial 1-608-873-xxxx, you make a toll call to another point in Wisconsin. Even though you're in the 608 area code, and there's a 608-873-xxxx, your call to "873-xxxx" goes to a different area code. In this particular case, we're allowed to dial 1-319-873-xxxx, so by using 11-digit dialing there's no ambiguity.

It should also be noted that the list of dial patterns only determines whether the box dials 7, 10, or 11 digits when calling a POP. It does not decide which POP a customer will get, or in what order they will be tried.

Area Code Splits

Area code splits come in two varieties: geographical splits and overlays. In a geographical split, part of the region covered by an existing area code gets a new area code. With overlays, the same geographic area gets two area codes. Usually one area code is used for voice, while the other is used for FAX machines, pagers, and cellular phones.

For both kinds of splits, the transition is done over a period of a few months. The following chart illustrates the process, assuming that somebody in San Francisco at 415-659-xxxx and somebody in Palo Alto at 415-614-xxxx (changing to 650-614-xxxx) are trying to call each other.

Pre-split. The 650 area code does not exist yet.

From S.F., dialing 614-xxxx works.
From S.F., dialing 1-415-614-xxxx works.
From S.F., dialing 1-650-614-xxxx results in an operator message informing you that the area code you dialed does not exist.
From P.A., dialing 659-xxxx works.
From P.A., dialing 1-415-659-xxxx works.
The ANI for the person in Palo Alto is 415-614-xxxx.

"Permissive" dialing. You are allowed, but not required, to dial 650.

From S.F., dialing 614-xxxx works.
From S.F., dialing 1-415-614-xxxx works.
From S.F., dialing 1-650-614-xxxx works.
From P.A., dialing 659-xxxx works.
From P.A., dialing 1-415-659-xxxx works.
The ANI for the person in Palo Alto is now 650-614-xxxx. (Sometimes the local phone companies mess up and do this early or late. It is unwise to assume that the ANI will change at the very start of the permissive period.)

"Mandatory" dialing (usually starts about 6 months after "permissive").

From S.F., dialing 614-xxxx gets a "you need to dial 650" recording.
From S.F., dialing 1-415-614-xxxx gets a "you need to dial 650" recording.
From S.F., dialing 1-650-614-xxxx works.
From P.A., dialing 659-xxxx gets a "you need to dial 415" recording.
From P.A., dialing 1-415-659-xxxx works.

Eventually the no-longer-used numbers get reassigned.

From S.F., dialing 614-xxxx gets a wrong number.
From S.F., dialing 1-415-614-xxxx gets a wrong number.
From S.F., dialing 1-650-614-xxxx works.
From P.A., dialing 659-xxxx gets a wrong number.
From P.A., dialing 1-415-659-xxxx works.
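The chart above can be written down as a lookup table, which makes it easy to ask which dial strings survive the whole transition. The outcomes are transcribed from the chart. The phase label "reassigned" and the helper name are just shorthand invented for this sketch.

```python
PHASES = ["pre-split", "permissive", "mandatory", "reassigned"]

# (caller, dialed string) -> outcome in each phase, in PHASES order.
OUTCOMES = {
    ("S.F.", "614-xxxx"):       ["works", "works", "recording", "wrong number"],
    ("S.F.", "1-415-614-xxxx"): ["works", "works", "recording", "wrong number"],
    ("S.F.", "1-650-614-xxxx"): ["no such area code", "works", "works", "works"],
    ("P.A.", "659-xxxx"):       ["works", "works", "recording", "wrong number"],
    ("P.A.", "1-415-659-xxxx"): ["works", "works", "works", "works"],
}


def safe_strings(caller, first_phase="pre-split"):
    """Dial strings that work for caller in every phase from first_phase on."""
    start = PHASES.index(first_phase)
    return [dialed for (who, dialed), results in OUTCOMES.items()
            if who == caller and all(r == "works" for r in results[start:])]


print(safe_strings("P.A."))                # ['1-415-659-xxxx']
print(safe_strings("S.F."))                # []
print(safe_strings("S.F.", "permissive"))  # ['1-650-614-xxxx']
```

From San Francisco, no single dial string works across all four phases, which is why the switch has to happen during the permissive period.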

What makes area code splits especially frustrating for us is that the dial pattern can change. Before the split, if you were in Palo Alto and calling a San Francisco POP at 415-659-xxxx, you could just dial 659-xxxx. After the split, you would be calling a number in a different area code, and would be required to dial 1-415-659-xxxx. Even though you haven't moved, your ANI has changed out from under you. The WebTV service can't fix you if you can't log in, and guess what, you can't log in except through the 800 number.

The good news is that if you make your box go back through scriptlessd, it will detect that your ANI has changed, and all of your old dial patterns will be ignored because they were tied to your old ANI. Ideally we wouldn't have to put the users through this manual step, and would either send them back through scriptlessd automatically or just make the change to their area code directly. But how do we do this?

One solution here is to have an 800 fallback number that also gets your ANI, and compare the current ANI with the ANI on record. If all of your local POPs are failing because we're using the wrong dial pattern, you end up on the fallback number, and once there we can automatically detect that it's because your area code changed. Also, given sufficiently detailed information about area code splits, we could program the box to dial a different set of numbers depending on whether "today" is pre-split or post-split. The latter solution isn't perfect, because if the box loses power it forgets what day it is, but it's a little cleaner.
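The ANI comparison on the fallback number could be as simple as the check below. It is a sketch of the idea only: the function name is invented, and it assumes both values are bare 10-digit numbers with the OLS code already stripped.

```python
def area_code_changed(ani_on_record, current_ani):
    """True when two 10-digit numbers differ only in their area code."""
    same_local_part = ani_on_record[3:] == current_ani[3:]
    different_npa = ani_on_record[:3] != current_ani[:3]
    return same_local_part and different_npa


print(area_code_changed("4156145539", "6506145539"))  # True
print(area_code_changed("4156145539", "4156145539"))  # False
```

A True result suggests a split rather than a move, since the last seven digits normally stay the same through an area code change.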

You might be tempted to think that dialing the full 11-digit number every time would solve this problem. In the San Francisco/Palo Alto example above, the 11-digit pattern worked correctly in every case. Unfortunately, as mentioned in the section on dial patterns, 11-digit calls might either be disallowed or might be more expensive than a 7-digit call to the same number.

Automatic Number Identification (ANI)

When we get the caller's phone number via ANI on the 800 scriptlessd number, we get a little more data with it. A typical ANI string looks like "006506145539". The last 10 digits are the phone number. The first two are the OLS (Originating Line Screening) code. This allows us to tell if somebody is calling in from a prison, hotel room, or pay phone rather than a standard phone line.
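Pulling an ANI string apart looks like this in Python. The function name and the dictionary keys are invented for illustration; the 2 + 10 digit layout is the one described above.

```python
def parse_ani(ani):
    """Split a 12-digit ANI string into its OLS code and phone number parts."""
    if len(ani) != 12 or not ani.isdigit():
        raise ValueError("expected 12 digits, got %r" % ani)
    ols, number = ani[:2], ani[2:]
    return {
        "ols": ols,            # Originating Line Screening code
        "npa": number[:3],     # area code
        "nxx": number[3:6],    # prefix
        "line": number[6:],    # subscriber number
    }


print(parse_ani("006506145539"))
```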

If you're calling in from a point in the United States, Canada, or affiliated areas like Puerto Rico, chances are the ANI number is valid. There are specific regions that don't support ANI, however, and there are times when the ANI just doesn't seem to want to show up.

In cases like these, the service will ask the user to enter their own phone number. It doesn't need to be exactly right; it just needs to be in the same "exchange area" as the box. If the person has two phone lines, and puts in the voice number, it will usually work just fine. If the service for the lines is provided by different local phone companies, though, the billing can be quite different, so the system works best when the number comes from ANI.

To make it easier to diagnose cases where the user entered the wrong value for their phone number, the service labels "manual ANI" entries by replacing the OLS code with a WebTV-defined value. Some interesting values:


  99 (+ 10 digits) - number was entered on "enter your phone number" screen.
  98 (+0000000000) - special code used; probably an international demo box.
  97 (+0000000000) - special code used; probably an international demo box.
  96 (+ 10 digits) - number changed with the dpedit or clientpopedit tools.
  95 (+0000000000) - service is ignoring ANI values (this never occurs on production!)

If somebody is dialing a totally inappropriate set of POPs, and their ANI number starts with "99", chances are they entered the wrong number on the "enter your phone number" screen. WebTV isn't responsible for toll charges incurred by incorrectly entered numbers, but diagnosing this quickly will leave the customer a bit happier. Sometimes you need to check the "ANI history" to see if they made an error at some point in the past.
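When scanning an ANI history, a quick lookup of the first two digits tells you where each number came from. The table below is copied from the list above. The function name and the wording for ordinary network-supplied codes are inventions of this sketch.

```python
MANUAL_ANI_CODES = {
    "99": "entered on the 'enter your phone number' screen",
    "98": "special code; probably an international demo box",
    "97": "special code; probably an international demo box",
    "96": "changed with the dpedit or clientpopedit tools",
    "95": "service is ignoring ANI values (never on production)",
}


def ani_source(ani):
    """Describe where a 12-digit ANI value came from, based on its prefix."""
    # Anything not in the table is treated here as a real OLS code.
    return MANUAL_ANI_CODES.get(ani[:2], "supplied by the phone network")


history = ["006506145539", "995616145539"]   # hypothetical ANI history
for entry in history:
    print(entry, "-", ani_source(entry))
```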

What happens if we successfully get the user's ANI but can't recognize the number? This happens when new exchanges are added faster than CCMI can keep up. In cases like this, we give the user the "global default" POP, which is usually an 800 number embedded in the PhoneDB.

When we finally put out a PhoneDB that does recognize their ANI, we will automatically send them a new tellyscript with the appropriate POPs when they next visit the headwaiter. If the PhoneDB "forgets" some numbers, possibly because an old area code split has caused some exchanges to cease to exist, we will simply stop updating their tellyscript until the next time they go through scriptlessd.

If we get the ANI, and we recognize it, but it's for an area that we don't yet support (e.g. Puerto Rico), we don't send the user a tellyscript at all. Instead they just get a message saying that WebTV isn't yet supported in their area.
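The three outcomes (normal assignment, global default, and unsupported area) can be summarized in one small function. Everything here is a stand-in: the real PhoneDB is not a Python dictionary, the function and parameter names are invented, the 800 number is a placeholder, and for simplicity the sketch checks the unsupported list by area code before looking anything up.

```python
def scriptlessd_reply(npa_nxx, phonedb, unsupported_npas, global_default_pop):
    """Decide what a caller at npa_nxx (e.g. "561-357") gets back."""
    npa = npa_nxx.split("-")[0]
    if npa in unsupported_npas:
        # Recognized, but we don't serve the area: no tellyscript at all.
        return {"pops": None, "message": "WebTV isn't yet supported in your area"}
    pops = phonedb.get(npa_nxx)
    if pops is None:
        # Exchange too new for the PhoneDB: hand out the global default POP.
        return {"pops": [global_default_pop], "message": None}
    return {"pops": pops[:2], "message": None}   # the two best POPs


phonedb = {"561-357": ["561-227-0012", "561-681-9557", "561-226-0010"]}
print(scriptlessd_reply("561-357", phonedb, {"787"}, "1-800-555-0100"))
print(scriptlessd_reply("561-999", phonedb, {"787"}, "1-800-555-0100"))
print(scriptlessd_reply("787-555", phonedb, {"787"}, "1-800-555-0100"))
```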

One of the pitfalls of using ANI is that it only works when the user dials into an 800 number. It's very important that we know where the box is, because if we have the wrong value for their ANI we will be handing out the wrong set of POPs. If one of those POPs is a 7-digit number, we could be dialing a 7-digit number in the wrong area code and calling a real person (wrong number) instead of a POP. On the other hand, 800# calls are expensive, and we have limited capacity on the modem racks, so we can't have the box dial into the 800 number every time the box powers up.

The current approach for dealing with this is to assume that the box might have moved whenever it loses power. We display a message the first time the box turns on after losing power that shows the most recent ANI (e.g. "650-614-XXXX"; the last four digits are blanked in case they return the box to the store) and asks them if they are still calling from that number. If the user has moved the box to a different phone number, they can just hit "Moved," and the box will go back through scriptlessd. (Versions of the box before client 1.2 weren't able to display the ANI number in the dialog).
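The power-loss check amounts to a masked prompt and a two-way branch. In this sketch the function names and the prompt wording are made up; only the masking format and the two destinations come from the description above.

```python
def masked_ani(number):
    """Show a 10-digit number with the last four digits blanked."""
    return "%s-%s-XXXX" % (number[:3], number[3:6])


def after_power_loss(stored_ani, user_says_moved):
    """Return the prompt to show and where the box should dial next."""
    prompt = "Are you still calling from %s?" % masked_ani(stored_ani)
    next_step = "scriptlessd" if user_says_moved else "local POP"
    return prompt, next_step


print(after_power_loss("6506145539", user_says_moved=False))
```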

Of course, this does not account for the possibility of user error. If a customer buys their WebTV terminal from a previous owner (or buys a box that was used as a retail demo), brings it home, and asserts that the box hasn't moved, they will end up with a tellyscript for the previous owner's (or store's) ANI rather than their own ANI. If the previous owner's ANI was in their same exchange (or very close) this might not be a problem, but it is usually in a different exchange and causes a huge phone bill. As with the incorrectly entered ANI, we are not responsible for their phone bill, but diagnosing the problem quickly (by comparing their ANI history to the date they brought the box home) can make the experience slightly less painful for the customer.

Local vs. Toll

Figuring out what's local and what's not is far more difficult than you might expect. The single biggest obstacle is the lack of completely accurate data. What we get from CCMI is fairly accurate, but they're collecting tariff data from dozens of companies on hundreds of calling plans for 25,000 different exchange areas. With that much data, in a system as convoluted as the U.S. phone system, there are bound to be problems, and there's an awful lot of "process" between finding a problem and getting it fixed.

We also have trouble with missing data. Some LCAs are entirely unsupported, others are partially supported. A "partially supported" LCA is one where the data is loaded only once, when somebody asks for it. It isn't kept up to date, and there is no pricing information associated with the local calls. Based on this data the PhoneDB generator can tell that a call is local, or at least was local in the recent past, but can't tell how much it costs. This makes it impossible to distinguish between "Local" and "Expensive Local".

The myriad filters and fancy footwork we do when generating a PhoneDB are outside the scope of this document. What's important is to understand how far you can trust the data and why it might be wrong, so that you can understand POP-O-Rama output and try to differentiate customer error from CCMI error.

Here's an example of output from the "lookuppop" tool, which generates the output for the POP-O-Rama web page:


For 561-357-0000 from W PALM BCH, FL (base cost=0):
  cnc/561-227-0012 in or near "West Palm Beach, FL" (W PALM BCH, FL)
    LOCAL 0.0mi  [wc=7.6mi] cost=0
      --> 227-0012 then 1-561-227-0012
  uunet/561-681-9557 in or near "West Palm Beach, FL" (W PALM BCH, FL)
    LOCAL 0.0mi  [wc=5.4mi] cost=0
      --> 681-9557 then 1-561-681-9557
  cnc/561-226-0010 in or near "Boca Raton, FL" (BOCA RATON, FL)
    ExpLocal 23.7mi  [wc=19.0mi] cost=1840
      --> 226-0010 then 1-561-226-0010
  uunet/561-368-8801 in or near "Boca Raton, FL" (BOCA RATON, FL)
    ExpLocal 23.7mi  [wc=19.0mi] cost=1840
      --> 368-8801 then 1-561-368-8801
  psi/954-971-5720 in or near "Pompano Beach, FL" (POMPANOBCH, FL)
    toll* 31.9mi  [wc=26.6mi] cost=2927
      --> 1-954-971-5720
  uunet/954-486-4806 in or near "Fort Lauderdale, FL" (FTLAUDERDL, FL)
    toll 39.9mi  [wc=31.9mi] cost=2927
      --> 1-954-486-4806
  cnc/954-845-0336 in or near "Ft. Lauderdale, FL" (FTLAUDERDL, FL)
    toll 39.9mi  [wc=36.4mi] cost=2927
      --> 1-954-845-0336
  cnc/305-651-1819 in or near "Miami, FL" (NORTH DADE, FL)
    toll 53.5mi  [wc=46.8mi] cost=2927
      --> 1-305-651-1819

The first line identifies the exchange where the caller is. In this case, we asked for "561-357", and it filled in the last four digits with zeros (remember, you only need the NPA and NXX to identify the location). The location name is "W PALM BCH, FL". The names are cryptic because the CCMI database only has space for 10 characters, and they're all upper case. "FL" is the state, in this case Florida. "Base cost" is what we computed it would cost for somebody in the 561-357 NPA/NXX to call themselves, based on a call of a certain duration at a certain time of day. DO NOT tell this cost to a customer! It might be based on a calling plan other than what the customer has, and we don't want to be responsible for giving out cost figures that are based on inappropriate or possibly even inaccurate data.

After the first line are eight sets of three lines, one set for each POP. The first line in each set identifies the POP. "cnc/561-227-0012" means it's a Concentric Networks POP at 561-227-0012. There are two city names, "West Palm Beach" and "W PALM BCH". The latter is supplied by CCMI. The former is sent to us by the IAP, can be edited fairly easily, and is displayed to the customer in the "have you moved" dialog. The names don't always match up; note that the last entry says "Miami" and "NORTH DADE". This is generally because the CCMI entry describes things from the telco perspective. For example, the Pacific Bell phone book describes Cupertino as being in "San Jose 2", and CCMI shows Cupertino numbers as being in "SAN JOSE W". Ditto for Menlo Park, which appears to be in PALO ALTO. In general, the "nice" name is more accurate. If you believe the two are totally out of whack, the SOC can look into it and find out for sure.

There is no "nice" name on the top line, because (1) we only have "nice" names for places where the POPs are, and (2) the NPA/NXX isn't enough to tell you what city the person lives in. Some NPA/NXXs cover more than one city.

The next line tells you about what it costs for a user at the NPA/NXX to call that POP. The first word is one of the following:

LOCAL - we believe the call is local, and that the cost of the call is the same as if the user called themselves.
ExpLocal - CCMI says it's a local call, but it's more expensive to call than other local calls.
PsuedoLocal - equivalent to ExpLocal in almost every respect. Explained below.
toll - this is a toll call. It might be a "local toll" handled by the LEC or a long-distance call handled by an IXC.

Regardless of how the calls price out, local calls always come before ExpLocal, and ExpLocal calls always come before toll. Toll calls that are cheaper than local calls are extremely rare, so we always prefer the local calls just in case there's an error in the tariff data.
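The ordering rule above can be sketched in a few lines of Python. This is only an illustration, not the PhoneDB generator's actual code: the function name, the tuple layout, and the last entry in the list are made up for the example. PseudoLocal is ranked with ExpLocal, as described below, and both spellings are listed because the tool output and the prose differ.

```python
# Rank for each call kind; lower ranks sort first. Assumed for illustration.
KIND_RANK = {
    "LOCAL": 0,
    "ExpLocal": 1,
    "PseudoLocal": 1,   # treated like ExpLocal
    "PsuedoLocal": 1,   # spelling used in the lookuppop output
    "toll": 2,
}

def order_pops(pops):
    """Sort (name, kind, cost) tuples by call kind first, then by cost."""
    return sorted(pops, key=lambda p: (KIND_RANK[p[1]], p[2]))

pops = [
    ("uunet/954-486-4806", "toll", 2927),
    ("cnc/561-226-0010", "ExpLocal", 1840),
    ("cnc/561-227-0012", "LOCAL", 0),
    # Hypothetical entry: a toll call priced below the ExpLocal call.
    ("example/000-000-0000", "toll", 100),
]
ordered = order_pops(pops)
```

Note that the hypothetical cheap toll entry still sorts after the ExpLocal POP, because the kind of call is compared before the cost.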

Entries with an asterisk (e.g. "toll*") denote ISPs who charge us a flat rate. Usually you should just ignore the asterisk.

The number after the local/toll indication is the distance in miles between the rate center for the caller and the rate center for the POP, using the "minor" (a/k/a "under 40") MTS coordinates. Put more simply, it's how far apart the phone company thinks the two points are. Calls aren't usually local beyond 10 or 15 miles, but there's one case in Florida where you could make a 135-mile local call for $0.25 per call.

The next number in square brackets is the distance between the wire centers for the caller and the POP. In some situations the wire center distance is used when pricing local calls. As you can see in the example above, the MTS distances for the two West Palm Beach POPs are both 0.0, but the wire center distances are slightly different. Usually the numbers are pretty close, but because of the way some POPs are connected to the phone system, the wire center numbers can be large (perhaps 20 miles). When tracking down problems, it's usually best to pay attention to the first number (the MTS distance) and ignore the wire center distance.

The final item on the line is the cost of a call made for a given duration at a specific time of day on a particular day of week with a certain calling plan. Sometimes we average rates from multiple carriers together, which complicates things. At any rate (no pun intended), it's the most important value we use when deciding the order in which to hand out POPs.

The last line of the output shows the dialing patterns that we will try, in the order that we will try them. For the first entry we will try 7-digit dialing and then 11-digit dialing (it's a home/local call); for the last entry we just try 11-digit (it's foreign/toll).
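If you ever need to pick apart lookuppop output by program rather than by eye, the three-line layout described above is regular enough to parse. The following is a rough sketch that assumes the layout is exactly as shown in the samples in this document; the function and field names are invented for the example and are not part of any WebTV tool.

```python
import re

# Line 1: provider/number in or near "nice name" (CCMI NAME)
POP_RE = re.compile(r'^(\w+)/([\d-]+) in or near "([^"]*)" \(([^)]*)\)$')
# Line 2: kind, MTS distance, wire center distance, cost, optional [flag]
RATE_RE = re.compile(
    r'^(\S+)\s+([\d.]+)mi\s+\[wc=([\d.]+)mi\]\s+cost=(\S+)(?:\s+\[([^\]]*)\])?$'
)

def parse_entry(lines):
    """Parse one three-line POP entry into a dict (blank lines are ignored)."""
    pop_line, rate_line, dial_line = [l.strip() for l in lines if l.strip()]
    provider, number, nice_name, ccmi_name = POP_RE.match(pop_line).groups()
    kind, mts, wc, cost, flag = RATE_RE.match(rate_line).groups()
    return {
        "provider": provider,
        "number": number,
        "nice_name": nice_name,
        "ccmi_name": ccmi_name,
        "kind": kind.rstrip("*"),
        "flat_rate": kind.endswith("*"),   # asterisk marks a flat-rate ISP
        "mts_miles": float(mts),
        "wc_miles": float(wc),
        "cost": int(cost) if cost.isdigit() else None,  # "??" becomes None
        "flag": flag,                      # e.g. "LCA not sup", or None
        # Dial patterns, in the order they will be tried.
        "dial": dial_line[len("-->"):].strip().split(" then "),
    }

entry = parse_entry([
    '  psi/954-971-5720 in or near "Pompano Beach, FL" (POMPANOBCH, FL)',
    '',
    '    toll* 31.9mi  [wc=26.6mi] cost=2927 ',
    '',
    '      --> 1-954-971-5720',
])
```

The "dial" list keeps the patterns in try order, so a home/local entry yields the 7-digit form first and the 11-digit form second.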

Occasionally you will see entries that look like this:


For 205-526-0000 from LEESBURG, AL (base cost=241):

  tdsnet/205-927-6200 in or near "Centre, AL" (CENTRE, AL)

    PsuedoLocal 5.1mi  [wc=5.1mi] cost=2040  [LCA not sup]

      --> 927-6200 then 1-205-927-6200

  tdsnet/205-528-6200 in or near "Crossville, AL" (CROSSVILLE, AL)

    toll 14.5mi  [wc=14.5mi] cost=3137  [LCA not sup]

      --> 1-205-528-6200 then 528-6200

The end of the second line in each set may have a special code in square brackets. The most popular ones are "unsupported local" and "LCA not sup". When you see "unsupported local", it means that we have the LCA (Local Calling Area) definition, but no rate information (this is the "partially supported" LCA data mentioned earlier). Chances are the LCA is not getting updated regularly, but since these LCAs are usually small rural areas, it probably doesn't *need* to get updated very often.

When you see "LCA not sup" it means we have no information at all about the LCA for this area. We just plain can't tell what calls are local, and have to take our best guess. If the caller and POP are in the same exchange area, we go ahead and assume that it's a local call. We also have a feature where we declare that everything within a specific radius (currently 10 miles) of the caller in an "LCA not sup" area is local. Since we can't determine the cost, we define them to be ExpLocal. To make the distinction clear, we display ExpLocal calls in "LCA not sup" areas as "PseudoLocal". As mentioned above, PseudoLocal is functionally equivalent to ExpLocal; we just show it differently because the definition of "local" is based purely on MTS distance rather than telco tariffs, and therefore is more prone to problems.
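The guesswork for "LCA not sup" areas boils down to a very small rule. Here is a sketch in Python, with some assumptions stated up front: the function and constant names are made up, the radius check is treated as inclusive, and the same-exchange case is reported as PseudoLocal along with the radius case (the text above doesn't say how that case is displayed).

```python
PSEUDO_LOCAL_RADIUS_MILES = 10.0  # "currently 10 miles" per the text

def classify_unsupported(caller_exchange, pop_exchange, mts_miles):
    """Guess the call kind when no LCA data exists for the caller's area."""
    # Same exchange area: assume the call is local.
    if caller_exchange == pop_exchange:
        return "PseudoLocal"
    # Within the radius: declared local, but the cost is unknown.
    if mts_miles <= PSEUDO_LOCAL_RADIUS_MILES:
        return "PseudoLocal"
    return "toll"

# The two Leesburg, AL entries from the sample output above.
centre = classify_unsupported("LEESBURG, AL", "CENTRE, AL", 5.1)
crossville = classify_unsupported("LEESBURG, AL", "CROSSVILLE, AL", 14.5)
```

The decision rests purely on exchange identity and MTS distance, with no tariff data involved, which is exactly why it is more prone to problems.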

The motivation for doing PseudoLocal was that ExpLocal calls are always prioritized ahead of toll calls. Because of weirdness in the phone system, it may cost more to call yourself with AT&T than it would to call the other side of the country. Without PseudoLocal, people in some rural areas -- who most likely had local POPs nearby -- were being told to dial distant locations, because an AT&T call cost less, and the only rating information we had was for the IXCs. (You might be tempted to just do the POP assignments by distance rather than cost, but there are many areas where distance and cost don't correlate. Some 50-mile calls in Florida are more expensive than 300-mile calls into a different state).

There's a problem with doing this though. Suppose we're in an area where local calls that cross area code boundaries require 7-digit dialing. Suppose further that we're in an unsupported LCA. We're now in the uncomfortable position of telling the box to use 7-digit dialing across area codes, based solely on the fact that the POP is less than 10 miles from the caller. Fortunately it's easy to manually verify that we're not doing bad assignments; just dial the 7-digit POP number, using the caller's area code. If you get something other than a recording, we're in a lot of trouble.

Ideally we would be able to add our own LCA definitions to the CCMI data, and avoid the problems entirely. Of 25,000 or so exchange areas, 5,000 are completely unsupported. Maintaining a complete set of data for areas with a tiny handful of people isn't cost-effective, for us or CCMI, but it would be nice if we could fix the areas where we do have some customers.

A more insidious problem has occurred in a few places, notably parts of Texas. In these cases, CCMI had only one local calling plan in the database, and it was an extended-area "metro" plan that not all of our customers had signed up for. The data that we got out of CCMI showed certain POPs as being free local calls, and sure enough, they were for everybody who had signed up for the extended plan. The rest of the people were less than pleased with the charges that showed up on their phone bills.

The PhoneDB generation process scans the entire set of local calling plans, and always uses the most restrictive definition. When a wide-area plan is the most restrictive definition of an LCA, we're in trouble.
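To see why a missing plan causes trouble, here is a small Python sketch. It assumes that "most restrictive" can be modeled as the set of exchanges that are local under every plan in the database (a set intersection); the real generator may well do something different. The plan names and exchange names are hypothetical.

```python
def most_restrictive_lca(plans):
    """Return the exchanges that are local under every known calling plan.

    plans maps a plan name to the set of exchanges local under that plan.
    """
    local_sets = list(plans.values())
    if not local_sets:
        return set()
    return set.intersection(*local_sets)

# Only the extended-area plan is in the database.
metro_only = {"metro": {"HOME", "NEAR TOWN", "FAR CITY"}}

# The standard plan has been added alongside it.
with_standard = {
    "metro": {"HOME", "NEAR TOWN", "FAR CITY"},
    "standard": {"HOME", "NEAR TOWN"},
}

lca_before = most_restrictive_lca(metro_only)
lca_after = most_restrictive_lca(with_standard)
```

With only the metro plan on file, FAR CITY looks local to everyone. Once the standard plan is present, it drops out of the local set.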

This sort of problem is difficult to deal with, because in these situations the CCMI data is accurate. It just happens to be incomplete. In this particular case, we asked them to add the standard calling plan and they said they would look into it. This is another scenario where being able to tweak the local calling plan definitions would be useful. But for now all we can do is have the SOC change the cost of the POP from Local to toll using the "ChangeCallCost" PhoneDB feature.

There are some other odd things you might see in POP-O-Rama output, like:


For 604-523-0000 from NWESTMNSTR, BC (base cost=??):

  uunetdan/360-383-1000 in or near "Bellingham, WA" (FERNDALE, WA)

    toll?? 29.4mi  [wc=0.0mi] cost=??  [origin not in DB]

      --> 1-360-383-1000

"Origin not in DB" happens because the point of origin is in Canada, and we don't currently have data from CCMI for calls made from Canada. Note that "base cost" is "??", which means that we weren't able to figure out what it would cost for someone in 604-523 to call themselves.


For 817-278-0000 from EULESS, TX (base cost=0):

  cnc/972-375-0501 in or near "Dallas, TX" (GRAND PRAR, TX)

    ExpLocal 8.9mi  [wc=8.2mi] cost=242  [hacked!]

      --> 1-972-375-0501 then 972-375-0501

You will see "hacked!" when the kind of call and cost of the call have been explicitly changed by the person generating the PhoneDB. (Such as when the SOC changes the cost of a POP with the "ChangeCallCost" feature).

If you find yourself answering a phone call or an e-mail message from a customer who claims that a POP isn't local even though we think it is, don't jump to any conclusions without some corroborating evidence. People in areas with low population densities will often assume that exchanges they don't recognize aren't local. And if there has been a nearby area code split that the customer is unaware of, they will not recognize the new area code.

Of course, it would be a bad idea to dismiss such claims out of hand. The best evidence is a phone bill that shows the POP as being non-local. There have been several cases where the phone company mis-billed a call, either because of 11-digit dial patterns or errors on their part. With the bill in hand we can easily get either the telco or CCMI to straighten out their data. If the customer hasn't yet received a bill, a call to the telco's business office will usually resolve the matter. But there have been cases where conflicting answers have come from the same source on subsequent calls. Also, be sure that you're talking to the right LEC, because different carriers will have different calling plans.

Local vs. toll issues should be reported to the SOC. If you're the one investigating a complaint, and we don't have a phone bill to look at, you should talk to the operator about the calls in question and ask whether they are (1) local, (2) local but expensive (e.g. zone 3 calling), (3) local toll, or (4) long distance. Most operators will just say "local" for #1 and "toll" for #2, #3, and #4 to avoid confusing the customer, but the distinction is important for us. It is also a good idea to find out how much per minute they are being charged for the call (to help determine if it's #2, #3, or #4).

POP, Phone Line, and Network Quality Issues

Not all POPs are created equal. WebTV requires that all POPs we use are capable of 28.8 kbps communication, and we take steps to ensure that there is adequate network capacity between our ISPs and us. Even so, there are cases where an individual POP or individual user will see substandard performance. This section provides a quick overview of symptoms and their causes.

The most common problems are in the user's house or apartment. Line splitters, large numbers of phones on the same line, phone extenders that plug into an A/C power outlet (commonly used with DSS systems), answering machines, caller ID boxes, and old wiring are common sources of problems. They can interfere with the phone line, resulting in slow connections.

The initial connect rate shown on the tricks-info page (or on the Vend-A-Telly page, which is explained later) doesn't tell the whole story. One of the features of modern modems is that they will "negotiate down", or start talking more slowly, if a lot of errors are detected. This is done because the modems are less susceptible to disruption at lower speeds. If the line conditions improve, the modem will negotiate back up. Unfortunately, we have no way to monitor the current speed or know the lowest speed used, so it's difficult to identify problems just by looking at the initial connect rate.

Even so, if you see connections being established at 21600 bps or lower, there's a good chance that the user's phone connection is poor. If many users are reporting similar troubles with that POP, and you connect at a slow rate when calling the same POP from here (you can do this with Vend-A-Telly), there's a chance that the POP itself is poorly connected.

Most phone companies won't guarantee connect rates of 28.8 kbps or higher. Pacific Bell only guarantees 4800 bps, which is pretty pathetic. The box will refuse to connect at less than 14.4 kbps, but could conceivably negotiate lower.

In the very early days, before the service went public, we displayed the connect rate right below the WebTV logo that you see before you get to the home page. The information was removed to avoid being swamped with calls from customers wondering why they weren't getting the full 33.6Kbps connections that they paid for. The reality is that not all ISPs have POPs that go above 28.8 kbps, and even then, most 28.8, 33.6, and 56K modem users don't get the speed they would hope for (26.4, 31.2, and 42K are much more common) because of noisy phone lines or other external factors. The reviewers of some 56K modems were unable to get actual data rates above 44K with even the best of modems. The worst couldn't break 30K.

When LECs won't even guarantee 14.4 kbps, it's impossible for WebTV to guarantee anything higher. We should make every effort to determine the cause of poor performance, but some things are beyond our control.

There's more to POP quality than just modem connect speed. Everything that the box receives has to be sent from our servers, across either the Internet or a private network connection to the ISP, from the ISP to the terminal server at the POP, then out through the modem and down to the user's box. The modem speed is a good place to start, but it's also important to consider the network performance.

It's difficult to get a simple performance number out of the network connections, because they may hit peaks where traffic grinds to a crawl for short periods, may exhibit spasmodic behavior with bursts of activity followed by long periods of silence, or may just move at a steady snail's pace. The easiest way to check the performance is to try to download a large image file (say a 150K GIF or JPEG) and see how long it takes to arrive. This feature is also provided by Vend-A-Telly.
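The arithmetic behind the download test is simple enough to sketch in Python. The function names are invented for the example, "150K" is assumed to mean 150 x 1024 bytes, and protocol overhead is ignored, so treat the results as rough figures only.

```python
def effective_bps(size_bytes, elapsed_seconds):
    """Effective throughput in bits per second for one timed download."""
    return size_bytes * 8 / elapsed_seconds

def best_case_seconds(size_bytes, line_bps):
    """Shortest possible transfer time at a given modem rate, no overhead."""
    return size_bytes * 8 / line_bps

image_bytes = 150 * 1024                          # the "150K" test image
observed = effective_bps(image_bytes, 60)         # download took a minute
ideal = best_case_seconds(image_bytes, 28800)     # best case at 28.8 kbps
```

If the image takes much longer than the best-case time for the reported connect rate, the bottleneck is more likely the network path or a modem that has negotiated down than the initial connect speed.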

An issue related to POP performance is line drops. There are a number of reasons why the box might suddenly disconnect from the service, some of which are discussed in a later section on "idle timeouts". Disabling or reducing the sensitivity of call waiting in the Dialing Options screen resolves most problems with unexpected disconnects.

The cause of some of our troubles with call waiting is that the box doesn't detect the call waiting "bong" accurately. Any substantial disruption, including somebody picking up an extension phone or a random burst of noise on the line, can be interpreted as an incoming call, in which case the WebTV will disconnect, telling the user that there is an incoming call, even though there is not. Lowering the sensitivity setting will reduce these false-positive interruptions (or "phantom calls").

On the other side of the coin, if the box is not receiving a strong enough signal when there really is an incoming call, it may not disconnect at all, giving a busy signal to the person who is trying to call. Raising the sensitivity can reduce these occurrences. Of course, if the amount of line noise in their phone line is fluctuating a great deal and the customer is experiencing both of these problems, they may have to choose the lesser of two evils. Since we have no control over the amount of line noise, it's really out of our hands.

Some line drops don't go away with the call waiting setting. There have been cases where the ISP's modems dropped the connection when a significant amount of line noise was detected, regardless of the setting on the WebTV box. This can usually be corrected by the ISP.

Service Fundamentals

Introduction to Tellyscripts

A tellyscript is a C-like program that is interpreted by the box. Its most important and most obvious function is to tell the box what numbers to dial and how to dial them, but it does a lot of other work besides.

Most communication software uses what are known as "send/expect" scripts. Send/expect scripts send a particular string, and then expect a certain response. The MacPPP configuration is a simple example: generally you send a dial string, expect the word "Login:", send your user name, expect "Password:", and then send your password. The fancier versions will allow you to expect one of several different responses, and perform different actions based on what you get back.

WebTV's engineers thought this was a little simple-minded, so they combined the send/expect concept with a minimal C interpreter. The result was a program that could do all the usual sending and expecting, but with the flexibility of C code.

The current batch of tellyscripts will:

Initialize the modem. All of the phone settings in the user interface, including things like dial speed and call waiting sensitivity, are put into practice by the tellyscript.
Update the message on the progress bar in an appropriate language while the box is connecting.
Send the appropriate login and password to one or more of several different ISPs (including OpenISP ISPs).
Parse all modem result codes, and convert them into connect rate and protocol values for display by the box (like on the tricks-info and vend-a-telly pages).
Do some really funky things involving NVRAM and phone settings.
Combine dial prefixes, including the special "only for long-distance calls" prefix on the Obscure Dialing Options page.
Work around bugs in certain versions of the modem firmware.
Deal with several different failure modes, and return appropriate error status codes.
Post "this may be a toll call" alert dialogs.
Dial POPs several times and in different orders, moving on to the next when one fails.
With POPtimization, use one of up to eight different *sets* of POPs based on day of week, time of day, and what month it is. (This is still in development).
Set the primary and secondary name servers that the box uses when in proxy-less mode.
Send and expect.

Each tellyscript is divided into four sections. The pieces are combined on the service, and the full script is then tokenized and compressed before being sent to the client. On disk, the files are named ".tsf", which stands for TellyScript Fragment. The four sections are:

base.tsf - common functions.
locale.tsf - country-specific features (e.g. Japanese connect messages).
<isp>.tsf - one or more tellyscript fragments, one per ISP. These are named after the ISP, so CNC's .tsf file would be called cnc.tsf. These are very short; usually they just have the ISP's Radius login info.
<generated> - tellyscript code generated on the fly. This is where the actual phone numbers and "this may be a toll call" warnings go.

When the service sends a script down, it saves a blob of information in the service that looks like this (line broken in half for readability):


    0x34567117-0x4abf9aa7-base:36:-|locale:2:-|__wpb:1:3261095|__cnc:2:6870610|

      __wpb:1:3261095|__cnc:2:16506870610|__artemis:1:18006108918

Translated into human-readable form, it looks like this:


    Hash 0x4abf9aa7, sent Tue Oct 28 15:02:17 1997

    v36 base/-

    v2  locale/-

    v1  wpb/3261095

    v2  cnc/6870610

    v1  wpb/3261095

    v2  cnc/16506870610

    v1  artemis/18006108918

The "vN" part tells you what version of the script was sent down. We gave the user version 36 of base.tsf, version 2 of locale.tsf and cnc.tsf, and version 1 of wpb.tsf and artemis.tsf. The "sent Tue Oct ..." part tells you when the script was sent down, and the numbers after the providers' names show you the exact string of digits that the box is going to dial. (If the customer has a dial override, the number may be divided by hyphens, e.g. "wpb/326-1095").

(In the example, the user has the wpb/650-326-1095 and cnc/650-687-0610 POPs. He will use 7-digit dialing on both wpb attempts, but will try 7-digit dialing on the first cnc attempt and 11-digit dialing on the second. This user has apparently established 7-digit dialing for the wpb POP, but hasn't yet determined the pattern to use for the cnc POP.)

If the user were given a toll warning message, the first line for the provider would look something like this:


    v2  wpb/3261095 {toll warning sent}

and "__wpb:1:3261095" would be "_W_wpb:1:3261095" (with a 'W' up front).
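The translation from blob to human-readable form can be sketched in Python as follows. This is an illustration based only on the sample shown above, not the service's own code: the function and field names are invented, the first hex field is kept as an opaque string (it appears to correspond to the "sent" time, but that is an assumption), and the layout is assumed to be exactly "first-hash-entries" with entries separated by "|".

```python
def parse_blob(blob):
    """Split a tellyscript blob into its hash and per-fragment entries."""
    sent_raw, hash_hex, body = blob.split("-", 2)
    entries = []
    for field in body.split("|"):
        if not field:
            continue
        name, version, number = field.split(":", 2)
        toll_warning = name.startswith("_W_")   # 'W' marks a toll warning
        if toll_warning:
            name = name[3:]
        elif name.startswith("__"):
            name = name[2:]
        entries.append({
            "name": name,
            "version": int(version),
            "number": number,
            "toll_warning": toll_warning,
        })
    return {"sent_raw": sent_raw, "hash": hash_hex, "entries": entries}

def describe(parsed):
    """Render a parsed blob in the readable layout shown above."""
    lines = ["Hash %s" % parsed["hash"]]
    for e in parsed["entries"]:
        line = "v%-3d%s/%s" % (e["version"], e["name"], e["number"])
        if e["toll_warning"]:
            line += " {toll warning sent}"
        lines.append(line)
    return lines

blob = ("0x34567117-0x4abf9aa7-base:36:-|locale:2:-|__wpb:1:3261095|"
        "__cnc:2:6870610|__wpb:1:3261095|__cnc:2:16506870610|"
        "__artemis:1:18006108918")
readable = describe(parse_blob(blob))
```

Splitting on the first two hyphens only matters because hyphens can also appear later in the blob, both as the "-" placeholder for base and locale and inside dial override numbers.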

The "Hash 0x4abf9aa7" part is the key to getting tellyscripts updated. This number is essentially a unique representation of the big blob. It's sent down to the box with the tellyscript and handed back up on every connection. When the box reaches the headwaiter, we recompute the tellyscript that they should have, and compare the new hash value with the box's hash value. If any part of the blob changes, the new "hash" value will be different, and we know that they need a new script.

This means that if a provider or dial pattern changes, a tellyscript fragment gets updated, or a toll warning dialog is added or removed, the service will automatically send the box a new tellyscript. Since the box tells the service what it has, there's no risk of the service thinking that the box has a different tellyscript than it actually has.
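The update check can be illustrated with a short Python sketch. The real hash function isn't described in this document, so CRC-32 from the standard zlib module is used purely as a stand-in, and the function names are made up for the example.

```python
import zlib

def blob_hash(blob_body):
    """Stand-in hash of the blob; the service's real function is unknown."""
    return "0x%08x" % (zlib.crc32(blob_body.encode("ascii")) & 0xFFFFFFFF)

def needs_new_script(box_hash, expected_body):
    """Compare the hash the box reports against the recomputed one."""
    return box_hash != blob_hash(expected_body)

old_body = "base:36:-|locale:2:-|__wpb:1:3261095"
new_body = "base:37:-|locale:2:-|__wpb:1:3261095"   # base.tsf was updated

box_hash = blob_hash(old_body)                        # what the box holds
unchanged = needs_new_script(box_hash, old_body)
changed = needs_new_script(box_hash, new_body)
```

Because the box reports its own hash on every connection, the service never has to remember what it last sent; it only has to recompute what the box should have and compare.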

Most people don't need to understand the above in detail. Either trust that the system works, or reread the above until you're convinced (one way or the other).

Tellyscript Return Codes

After a failure that occurs while the box is connecting to the service, the box will display a dialog with an error message. If you hit the "Options" key on the keyboard or remote, it will display an "M" code and an "S" code, e.g. "M-26/S10". The "M" code is the box's message code, and the "S" code is the return value from the tellyscript.

The current set of tellyscript return values ("S" codes) are:


   0  ParseError - tellyscript was bad.

   1  Connecting - (not really an error)

   2  Success - tellyscript finished successfully

   3  ConfigurationError - modem and box not on speaking terms.

   4  DialingError - modem not saying what we wanted it to.

   5  NoDialtone - didn't hear a dial tone on the phone line.

   6  NoAnswer - POP number just kept ringing.

   7  Busy - POP number was busy.

   8  HandshakeFailure - modem handshake failure; this is rare.

   9  UnknownError - got an unknown result code back from the modem.

  10  BadPassword - authentication failure.

  11  PPPHandshakeFailure - couldn't negotiate PPP successfully.

  12  NoCarrier - something answered, but it wasn't a modem.

  13  BlackHole - rare; last POP was a black hole, and we ran out of POPs.

  14  VerySlowConnect - modems connected at less than 14.4Kbps.

  15  BadPasswordNR - same as #10, but we don't reboot the box.

  16  UnhappyScript - the tellyscript generator blew it.  This is bad.

When dealing with customers who are having trouble calling in, it is important to get both the "M" codes and the "S" codes.
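For anyone logging these codes in a tool, here is a small Python sketch that splits an error string such as "M-26/S10" and looks up the "S" code in the table above. The function name is invented, and the "M" portion is deliberately kept as raw text, since the "M" codes and their exact format are described elsewhere.

```python
import re

# Tellyscript return values, copied from the table above.
S_CODES = {
    0: "ParseError", 1: "Connecting", 2: "Success",
    3: "ConfigurationError", 4: "DialingError", 5: "NoDialtone",
    6: "NoAnswer", 7: "Busy", 8: "HandshakeFailure", 9: "UnknownError",
    10: "BadPassword", 11: "PPPHandshakeFailure", 12: "NoCarrier",
    13: "BlackHole", 14: "VerySlowConnect", 15: "BadPasswordNR",
    16: "UnhappyScript",
}

def parse_error_code(text):
    """Return (m_part, s_code, s_name) for a string like 'M-26/S10'."""
    match = re.fullmatch(r"(M-?\d+)/S(\d+)", text.strip())
    if match is None:
        raise ValueError("not an M/S code: %r" % text)
    s_code = int(match.group(2))
    return match.group(1), s_code, S_CODES.get(s_code, "unrecognized")

result = parse_error_code("M-26/S10")
```

Codes outside the table come back as "unrecognized" rather than raising an error, since the list may grow in later tellyscript versions.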

The "M" codes are described elsewhere.

Dial Overrides

Dial overrides are a quick and easy way to send somebody to a particular number with a specific dial pattern. Unfortunately they're a little too easy. They can solve a problem (or at least placate a customer) quickly, but they don't go away when the underlying problem gets solved. In general, dial overrides are a bad thing, and alternate solutions should be used whenever possible.

In the early days of the service, there was no such thing as a dial override. Because there was no quick solution, the problems were fixed in other ways, or were analyzed until it was determined that the problem was unrelated to the POP number being dialed. This was time-consuming, but very effective at identifying the root cause of problems.

The issue that drove the existence of dial overrides was that some customers bought special calling plans through their phone company that allowed them to call a specific region or number for a flat rate per month. If the PhoneDB got updated, and their primary number changed, they would no longer be dialing the preferred number. We needed a way to send people to a specific area.

The first version of dial overrides was added a few hours after a service release had frozen, because by consensus it had been placed on the C-grade "would be nice" list, and wasn't really supposed to be done at all. Consequently it was done in a big hurry. The database stored one override that had an ANI, a provider name, and the exact string of digits needed for dialing the POP. If the ANI matched, we sent a tellyscript for that POP and provider, complete with a warning dialog. This mechanism quickly became popular, and eventually support for it was added to the CMR tool.

With a little experience it became clear that the mechanism was insufficient. You couldn't put in an override for a box behind a store's PBX, because the ANI value might be different each time the box logged in. You couldn't override to an 800 number because the warning dialog would show the 800 number (this is a bad thing, as explained in a later section). The override didn't go away if the POP went away. And you couldn't have the override go dormant if the user moved to an area with local coverage.

The second generation of dial overrides provided for these, mostly. It was again done at the last minute and at a low priority. Nearly a year later the CMR tool still couldn't (and even now can't?) parse the new format, and some of the features -- like disabling the override when the POP goes away -- weren't implemented. The only way to do the new-style overrides is to have the SOC do it with the "clientpopedit" feature.

There are things that can be done to make overrides less harmful. The trouble is that they will require CMR changes to make them accessible to Customer Care.

High on the SOC's request list are "negative overrides", where you get to specify a number (or perhaps a complete exchange area) that the user says they don't want to be calling to. You can remove the POPs that the user doesn't like, and leave all the rest in. Another desirable item is overrides with expiration dates, for cases where a POP is temporarily out of commission, and the user is screaming because they're too impatient to wait for it to give up and try the next number.

One interesting "feature" of overrides is that they are bound to a box, not to a subscriber. If a user swaps a box because of defects and has their account moved over to the new box, the dial override doesn't move with them. This isn't necessarily a bad thing, because the dial override might have been entered as part of diagnosing a problematic box. When the old box is "unregistered" prior to adding a new account, the dial override is purged automatically.

But no matter what fancy features may get added to overrides in the future, the rule of thumb remains: don't use them unless you absolutely need to. The only valid reason for a permanent dial override is if the customer has a special calling plan that requires them to call a specific POP.

Some common abuses of dial overrides are:

Dial pattern fixes. Use the "dpedit" tool for this. Once the pattern has been changed, tell them to unplug and say they've moved so they'll go back through scriptlessd and pick up the changes.
Dead POP workarounds. If the POP is dead, but they roll over to their secondary POP (or the 800 fallback POP), send the issue to the SOC and ask the customer to be patient, we're working on it. If the POP is dead and it will not roll over to the secondary number, you can, in some cases, give them a temporary dial override. If this is done, you must leave the ticket in On-Hold status and remove the override as soon as the POP is back in working order. Only then can the ticket be closed.
Slow POP workarounds. This is harder, because the POP is connecting but is performing poorly. A simple technique is to turn audible dialing on, then unplug the phone after the first dialing sequence completes. When it gives up it'll try the second number (unless they only have one local call, in which case it tries the first number twice). It's a pain, but it works. If they insist on getting a fix, give them a temporary dial override, but leave the trouble ticket in On-Hold status. Remove the override a few days later when things are better and close the ticket then.
PhoneDB local vs. toll problems. Using a dial override to fix these temporarily is okay, but the ticket should be left open as long as the override is in place. The problem is not solved until the PhoneDB is correct. When the PhoneDB is fixed, the override gets removed, and only then is the ticket closed.

Like the saying says, "if you don't have time to do it right, when will you have time to do it over?" Every dial override that gets added also has to get removed, because sooner or later that POP will go away or more local numbers will be added or whatever. If everybody gets overridden to POP #2 when POP #1 gets congested, the load balancing algorithms can't do their work, and pretty soon POP #2 is going to be congested and all those people are going to be calling you all over again.

Customers that can get a quick fix by calling Customer Care will do so every time their POP gets slow. Don't encourage people to call up every time they have the slightest problem.

Avoid quick fixes that just postpone the inevitable.

Call Ordering and the Fallback Number

Once we've chosen the POPs and checked the available dial patterns, we have to dial the phone. We know which POP to try first, but should we do the first POP twice in a row and then do the second, or alternate between the first and second?

The call ordering depends on how many POPs they have and what kind of call each is. In every script, we bail out when we connect successfully or if we are unable to detect dial tone before dialing. "Black holes", where we connect successfully but then are unable to talk to the WebTV service, are handled specially (explained later).

"Black hole" is the WebTV term for a POP that accepts modem connections but is unable to carry network traffic between the box and service. The tellyscript believes it has made a successful connection, but the box is unable to do anything after getting connected. Early boxes (pre-client 1.1) would connect to black hole POPs and stay there until disconnected by a timeout or an impatient user. As of client 1.1, the box will try to connect to the service for a minute and a half. If it is unable to get a response from the headwaiter in that time, it will disconnect, then restart the tellyscript at the point where it left off.

If we only have one POP:

Try the number.
If we have a secondary dial pattern, try it; otherwise skip this step.
Retry the number.
Call the 800 fallback number.

If both POPs have the same cost (i.e. both are LOCAL, or both are ExpLocal or toll but have the same estimated cost):

Try POP #1.
Try POP #2.
Retry POP #1, using secondary dial pattern if we have one.
Retry POP #2, using secondary dial pattern if we have one.
Call the 800 fallback number.

If one POP is more expensive than the other (perhaps one local and one toll):

Try POP #1.
Retry POP #1, using secondary dial pattern if we have one.
Try POP #2.
If we have a secondary dial pattern for POP #2, try it; otherwise skip this step.
Call the 800 fallback number.

In no case do we try more than 5 numbers, and we don't try a more expensive number more than once unless we're trying to figure out what the correct dialing pattern is.
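
The three orderings above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the actual tellyscript: the Pop class, the cost field, and the call_order function are made-up names, the POP list is assumed to already be in the order chosen by the service, and the "stop on success or no dial tone" and black hole handling are left out.

```python
from dataclasses import dataclass
from typing import List

FALLBACK = "800 fallback"  # placeholder label, not a real phone number


@dataclass
class Pop:
    number: str
    cost: float                  # hypothetical estimated cost; 0.0 stands for LOCAL
    has_secondary: bool = False  # True if a secondary dial pattern is available


def call_order(pops: List[Pop], include_fallback: bool = True) -> List[str]:
    """Return the planned dial attempts, in order, as readable labels."""
    if not 1 <= len(pops) <= 2:
        raise ValueError("this sketch handles one or two POPs")
    steps: List[str] = []

    def attempt(pop: Pop, secondary: bool = False) -> None:
        # Fall back to the primary pattern when no secondary pattern exists.
        pattern = "secondary" if secondary and pop.has_secondary else "primary"
        steps.append(f"{pop.number} ({pattern})")

    if len(pops) == 1:
        only = pops[0]
        attempt(only)
        if only.has_secondary:
            attempt(only, secondary=True)
        attempt(only)  # assumption: the final retry uses the primary pattern
    else:
        first, second = pops
        if first.cost == second.cost:
            # Same cost: alternate between the two POPs.
            attempt(first)
            attempt(second)
            attempt(first, secondary=True)
            attempt(second, secondary=True)
        else:
            # Different cost: exhaust POP #1 before touching POP #2.
            attempt(first)
            attempt(first, secondary=True)
            attempt(second)
            if second.has_secondary:
                attempt(second, secondary=True)
    if include_fallback:
        steps.append(FALLBACK)
    return steps
```

Calling call_order with one local POP and one toll POP gives two tries on the local POP, one on the toll POP, and then the fallback. In every case the list stays at 5 entries or fewer, matching the limit described above.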

We show the toll warning dialog before the first time we call an ExpLocal or toll POP. The warning contains the number to be dialed and the city name where the POP lives, using the "nice" form of the city name, along with a suggestion to contact their telephone company to confirm whether or not it's a toll call.

The toll-free fallback number, sometimes called "fallover" or "failover", has been around since the early days of dialing. The idea was to prevent certain kinds of failures, such as POP outages or number assignment glitches, from giving the service a bad name.

It is important to remember that nowhere in the Terms of Service does it guarantee connectivity, and we have never promised customers that they would have unlimited toll-free access at our expense. The fallback number is supported as a courtesy, and may go away or have its use restricted at any time and without notice.

The 800 fallback number will be omitted in certain circumstances. The most significant one is called the "AllTollNoRoll" feature. It was added because some users without local POPs had, strangely, neglected to order long distance service on their WebTV line. Every POP number would fail, until the box called the fallback number. The easiest way to avoid this situation was to leave the fallback number out of tellyscripts for users with nothing but toll calls.

A similar situation existed for a customer with phone service that only allowed calls to 800 numbers and 911 (Universal Lifeline Service?). In this case, not even local calls could be made, so despite having two local POPs the user ended up on the fallback number every time. The cure for such users (besides them getting a real phone line) is the "disable fallback" flag in the customer's account. Since it is not possible to do this from CMR, the SOC would need to do it.
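
As a rough sketch, the two cases where the fallback number is left out could be expressed as below. The function name, the argument names, and the kind strings are invented for illustration. Whether an ExpLocal POP counts as a toll call for AllTollNoRoll is not stated here, so this sketch assumes only POPs marked "toll" count.

```python
from typing import List


def include_fallback(pop_kinds: List[str], disable_fallback_flag: bool = False) -> bool:
    """Decide whether the 800 fallback number belongs in the tellyscript.

    pop_kinds holds one entry per assigned POP: "LOCAL", "ExpLocal", or "toll".
    """
    if disable_fallback_flag:
        # The "disable fallback" flag on the account always wins.
        return False
    if pop_kinds and all(kind == "toll" for kind in pop_kinds):
        # AllTollNoRoll: nothing but toll calls, so no fallback number.
        # Assumption: ExpLocal is not treated as toll for this check.
        return False
    return True
```

For example, include_fallback(["toll", "toll"]) is False, while include_fallback(["LOCAL", "toll"]) is True unless the account flag is set.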

Of course, it's always possible for users to disrupt the dialing sequence several times until the box dials the 800 number. For most people this is unnecessary and inconvenient: if they didn't have (in CCMI's and our opinion) a local call, they wouldn't have the fallback number in their script, so either they'll never get to the fallback number or they're trying really hard to avoid making local calls. We can identify such users through usage reports, and deal with them on an individual basis as necessary.

Idle Timeouts

Idle timeouts make the box disconnect from the service and hang up the phone when nothing has happened for a set period of time. There are two kinds of idle timeouts: input timeouts and network timeouts.

Input timeouts happen when the user stops using the box. If the box doesn't see any activity from the user, such as typing on the keyboard or hitting buttons on the remote control, it will disconnect after 10 minutes. This timeout is set by the service. If the user is connected through an 800 number, the input idle timer is reduced to 5 minutes.

Network timeouts happen when no packets are being transmitted between the box and service. The box used to have a network idle timeout, but this is no longer in use. However, some ISPs, notably CNC, have idle timeouts on their equipment. After 30 minutes with no network activity, CNC's terminal servers will drop the line.

If a user is flipping through a large page, or is composing a long e-mail message, there is no network activity. The box won't choose to disconnect, but the terminal server will. If a user is experiencing line drops while composing long e-mail messages, this is probably the cause. Hitting the "Options" button on the keyboard every so often will send a ping to the system and restart the clock.
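
To make the difference between the two timers concrete, here is a small sketch. The function and its arguments are hypothetical. The 10 minute and 5 minute input limits come from the text above; the ISP limit is passed in because it varies by provider (30 minutes is the CNC figure quoted above).

```python
from typing import Optional


def who_drops_line(input_idle_min: float,
                   network_idle_min: float,
                   on_800_number: bool = False,
                   isp_idle_limit_min: Optional[float] = None) -> Optional[str]:
    """Return "box", "isp", or None depending on which idle timer has expired."""
    # Input idle timer, set by the service: 10 minutes, or 5 on an 800 number.
    box_limit = 5 if on_800_number else 10
    if input_idle_min >= box_limit:
        return "box"
    # The box itself no longer has a network idle timer, but the ISP's
    # terminal server may have one.
    if isp_idle_limit_min is not None and network_idle_min >= isp_idle_limit_min:
        return "isp"
    return None
```

A user who has been typing a long e-mail message for half an hour has an input idle time near zero but a network idle time of 30 minutes, so who_drops_line(0, 30, isp_idle_limit_min=30) returns "isp".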

Some providers have time limits that don't care whether you're idle or not. After an hour or two the connection is dropped, so that computer users can't leave their machines running and wander off. (Some computers will just redial when disconnected anyway, but try telling that to the ISPs).

Dialing Details

OpenISP

The idea behind OpenISP is that the customer can choose to use their own ISP instead of the ones that WebTV provides, for a discount of $10 per month on their WebTV subscription. This way, people for whom we do not have local access (or who already have an ISP account for their computer) can connect at a rate that is less expensive for them, and we can save money by not having to provide POP support for those customers.

Any ISP that supports PPP (a standard network protocol) and PAP (a standard method of sending up login and password information) will work. But we also require that they be capable of at least a 26.4 kbps connection rate and that they provide their own tech support (since they're not our POPs, we can't fix them if they break).

When the customer enters their login name, password, and the phone number(s) to dial, the service will generate an OISP tellyscript for them, which will look something like this:


0x3521411d-0x214db8cd-base:49:-|locale:2:-|__OpenISP:3:-

If the user enters only one POP number, we will try that number twice. If they enter two POP numbers, we will try the first one once and the second one once. After making two calls we give up. We never try a fallback number. When they're using their own ISP, they're only using their own ISP.
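
The OpenISP call order is simple enough to show in a few lines. This is a sketch only; the function name is made up, and the phone numbers in the example are placeholders.

```python
from typing import List


def oisp_call_order(numbers: List[str]) -> List[str]:
    """Return the two calls an OpenISP box makes before giving up."""
    if not numbers:
        raise ValueError("at least one POP number is required")
    if len(numbers) == 1:
        # One number entered: try it twice.
        return [numbers[0], numbers[0]]
    # Two numbers entered: try each once. No fallback number is ever added.
    return [numbers[0], numbers[1]]
```

For example, oisp_call_order(["5550100"]) gives the same number twice, and oisp_call_order(["5550100", "5550101"]) gives each number once.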

The Access Number field in dialing options is ignored for OpenISP customers, unless they are using an access number that has a "$" in it. Dial patterns and dial overrides have no effect on OpenISP customers.

It is important to note that the discounted rate applies only to months in which the user connects solely through their OISP. If they connect to the service just once using our POPs, they will be charged the full subscription price. Luckily, CMR allows you to view the history of when they start/stop using their OISP (on the "Boxes" screen).

Access Field

If you type a POP number into the "Access" field in the Dialing Options screen, you can force the box to dial a POP that is not in the tellyscript. The trouble is that the tellyscript has no way to determine which ISP the phone number in the Access field is associated with. So it just assumes that it's the same as the ISP that's associated with the first POP in the tellyscript.

If your primary POP is from CNC, then you can enter any CNC number in the access number field and it will connect to that POP instead of the one in the tellyscript. Entering a POP number for UUNET, ZipLink, PSI, or any of the other ISPs will result in a failed connection, since it will try to use CNC's authentication info to connect. (Sometimes two providers use similar enough info that it will work, but this is rare.)

Because of the difficulty in getting the POP number matched up with the first entry in the tellyscript, using this field is discouraged except in certain rare cases.

One place where the access number field is useful is when it's not really used as an access number. A special feature was added to the service to support dialing *suffixes* via the access number field. If the '$' character appears in the access number field, the tellyscript will replace it with the POP number currently being dialed.

For example, if you set your access number to "10321,$,54321", and the POP numbers assigned by the service are 3261095 and 6145539, the box will dial the string "10321,3261095,54321" (the commas are two-second pauses), and if that fails, it will next try "10321,6145539,54321". (Prefixes like 10321 really ought to go in the prefix fields rather than the access number field; it is included here to show that the '$' can be anywhere.)
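
The substitution can be sketched as follows. The function names are made up for illustration, and the sketch assumes the field contains a single '$'; what the real tellyscript does with more than one is not covered here.

```python
from typing import List


def dial_strings(access_field: str, pop_numbers: List[str]) -> List[str]:
    """Expand a '$' in the access number field once per assigned POP number."""
    if "$" not in access_field:
        raise ValueError("this sketch only covers access fields containing '$'")
    # Assumption: only the first '$' is replaced.
    return [access_field.replace("$", number, 1) for number in pop_numbers]


def pause_seconds(dial_string: str) -> int:
    """Total pause time in a dial string, at two seconds per comma."""
    return 2 * dial_string.count(",")
```

With the example above, dial_strings("10321,$,54321", ["3261095", "6145539"]) returns the two strings quoted in the text, and each one contains four seconds of pauses.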

This is most often used for customers who are trying to connect from a military base or dorm room.

Vend-A-Telly

Vend-A-Telly is a web page attached to the "WebTV Tricks" page in the service. From there you can tell your box to dial any POP from any provider.

The page should be used whenever a POP is suspected of having a poor or slow connection. You can enter the POP number, dial in, check the connect rate, and download a large test image to see if the network is slow.

If the POP is dead or deathly slow, DO NOT give the user a dial override unless it is a temporary override, in which case the remedy ticket is to be kept open in your ticket queue. Once the POP gets better, you can remove the override and then you can close the ticket. Network congestion is a fact of life; constantly moving users between POPs will most likely just make the problem move with the users.

Troublesome POPs should be reported to the SOC.

Client Upgrades and Brain-Dead Boxes

Client upgrades for WebTV "Plus" boxes are terribly uninteresting, because the hard drive allows them to do the download without disconnecting. Also, WebTV "Plus" boxes with damaged approm images go into the "mini-browser", which has most of the features you'd find in a full v1.3 client, so even if something goes wrong you can still use secret codes and such. The discussion here concentrates on "Classic" boxes, which are far more interesting.

On a WebTV "Classic" box, the client software is stored in ROM. Since the ROM is completely wiped away during an upgrade to make room for the new client software, the upgrade is actually done by the boot ROM. The boot ROM is only about 1/8th the size of the ROM and contains very little information. All the boot ROM knows how to do is dial in on the scriptlessd number, issue simple requests, and write chunks of data into flash ROM.

The usual behavior is that flashromd (one of our servers) tells the client to go flash itself. The box hangs up, dials back in with the current tellyscript, reconnects to the same flashromd, and starts asking for the current client software, to replace what it just flashed. When it has all of the pieces, it hangs up and reboots.

The term "brain-dead box" refers to a WebTV "Classic" unit with a damaged ROM image. The easiest way to get a damaged ROM image is to initiate a download and have it interrupted before completion (by an incoming call, connection problems, etc). When the box restarts, it does a checksum on the ROM, and discovers that things don't look the way they should. It boots into the boot ROM and immediately starts a flash download to get the current client version back on the ROM.

The boot ROM ignores everything in NVRAM, because flash is corrupted and NVRAM is held in flash. It will accept an access number and a dial prefix, which have to be entered with the extremely limited user interface supplied by the boot ROM, but most of the other dial options can't be set. You can't use any secret codes with a brain-dead box, because the codes are handled by the full client ROM, not the boot ROM.

Every time the brain-dead box is powered on, it connects to scriptlessd, asks for a tellyscript and an IP address for flashromd, then disconnects and executes the tellyscript. After connecting to the local POP, it initiates a download.

An interesting problem arises when an OpenISP box becomes brain dead. We no longer have access to the person's OpenISP login and password, because those are kept in NVRAM, and we can no longer believe that NVRAM is valid. We have to send them somewhere else. But where?

Currently, the only option is to have them connect through the POPs that they would normally connect through if they weren't on OpenISP. It's likely that if they're on OISP it's because those POPs are toll for them, but if they are unable to successfully upgrade on their OISP POP, then connecting to our POPs is the only way they're going to get a successful upgrade. So they may be charged by their telco for the call, but we don't want to charge them the full subscription rate that month. The systems will automatically charge them the full subscription rate, since they did hit our POPs, so we'll need to send these tickets to Finance to get the $10.00 credited back to the account.

MessageWatch and EPG

MessageWatch is the fancy name we use for a feature that allows the box to dial in at a specified time and check for new mail. The idea was to have it log in during the early morning hours, so that you can see if you have new mail without needing to log in when you wake up.

Unfortunately, a fairly large number of people configured it to log in around 5pm, so that the mail light is set when they get home from work. This is unfortunate because it means the boxes on the west coast are coming in at the height of peak usage on the east coast.

Whatever the case, MessageWatch connections are vastly simplified versions of normal connections. A few salient facts:

The box only talks to the headwaiter. It continues to accumulate phonelog data, but doesn't send its phonelog to the service.
The box will retry every 30 minutes if it can't get in.
The box will shut itself off after 2.5 (?) minutes, no matter what.
If a user has one local and one toll call, only the local POP will be used.
If a user has nothing but toll calls, only the first toll POP will be used (and it will not give them the toll warning, since they are not likely to be there to hit continue).
If the box is unable to connect, it will display a message informing the user of this the next time the box is powered on.
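
The POP selection rules in the list above can be sketched like this. The function name and the (number, kind) pairs are invented for illustration. The list above only covers "one local, one toll" and "nothing but toll", so the sketch assumes that all local POPs stay in play and that anything not marked LOCAL is treated as toll.

```python
from typing import List, Tuple


def messagewatch_pops(pops: List[Tuple[str, str]]) -> List[str]:
    """Pick which POP numbers a MessageWatch connection may dial.

    Each entry in pops is (number, kind), in the order chosen by the service.
    """
    local = [number for number, kind in pops if kind == "LOCAL"]
    if local:
        # At least one local POP: never place a toll call for MessageWatch.
        return local
    # Nothing but toll calls: use only the first POP, with no toll warning,
    # since nobody is likely to be there to hit continue.
    return [pops[0][0]] if pops else []
```

For a user with one local and one toll POP, only the local number comes back; for a user with two toll POPs, only the first one does.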

If a user is seeing multiple calls starting at a specific time and separated by 30 minutes each on his or her phone bill, chances are MessageWatch is involved.

WebTV "Plus" boxes do something similar with EPG (Electronic Program Guide) data downloads. However, in the 2.1 client, the EPG downloader won't stop with the first POP if the second one is toll. There are plans to fix this for future client releases.

In "Classical" and earlier service releases, MessageWatch was only enabled when the user turned it on. In "Disco" and later, it may be enabled for all new users by default.

VideoAds

VideoAds are short (15-second) VideoFlash clips that play when the box is powered on. They are downloaded during a MessageWatch connect, play once, and are then thrown away. This feature was first added in the 1.3 client.

There are a number of restrictions on the set of users that get VideoAds. The download takes about 5 minutes, which isn't terribly long; but if the box is making a toll call every night it can add up. We also want to control our own costs by not sending the VideoAds to users with hourly-rate POPs.

The rules are:

Don't send it if they're using OpenISP.
Don't send it if they're making an ExpLocal or toll call.
Don't send it if they're on an 800# POP.
Don't send it if they're connected to an hourly provider.
Don't send it if they're not in the right user category.
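
Taken together, the rules amount to a single yes/no check. The sketch below is illustrative only: the Connection class and its field names are made up, and what counts as "the right user category" is simply passed in as a flag.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    open_isp: bool           # user connects through OpenISP
    call_kind: str           # "LOCAL", "ExpLocal", or "toll"
    on_800_pop: bool         # connected through an 800# POP
    hourly_provider: bool    # POP belongs to an hourly-rate provider
    in_right_category: bool  # hypothetical flag for the user category rule


def should_send_videoad(conn: Connection) -> bool:
    """Apply the five VideoAd rules; any single failure blocks the download."""
    if conn.open_isp:
        return False
    if conn.call_kind in ("ExpLocal", "toll"):
        return False
    if conn.on_800_pop:
        return False
    if conn.hourly_provider:
        return False
    if not conn.in_right_category:
        return False
    return True
```

Only a user on a local, flat-rate, non-800 POP of ours, in the right category, ends up with a VideoAd.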

The VideoAd plays during the first part of the box's connection to the service. Instead of seeing the endless highway and the connection status bar, you watch the movie. Audible dialing is disabled for connections that start with a VideoAd playing.

Last Modified 27 May 1998 by Ray Hill