Distributed Computing = Distributed Responsibility = Fingerpointing

So this morning I tried to process an order a customer had sent me last night.  I went to log into my photolab service provider, exposuremanager.com, just like I always do.  Bzzzt.  Can't get in.  “This link appears to be broken.”

Huh, maybe it's my machine.  So I try my daughter's machine, nope she can't see it either.  Internet down?  Nope I can get my email and see all the other sites I use regularly.

So I (reasonably) assume exposuremanager is down.  That's not too cool–because while it is down my customers can't place orders.  So I call customer support and leave them a message that their site is down.

A little while later I get an e-mail from them stating their site is up and running fine (and by the way, nice pictures on your site!) Thanks, but I still can't get in.

So I try the command-line utility “ping” to ping their server.  Sometimes it can't resolve exposuremanager.com to an IP address (implying a problem with a domain name server, presumably Comcast's) and other times it resolves the IP address but gets no response.  I check with WHOIS on Tucows (their registrar) to see if their domain has expired, but it hasn't.  WTF?
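
For anyone who wants to script that same check, here is a minimal Python sketch that separates the two failure modes (name won't resolve vs. address resolves but nothing answers). It is an illustration, not what I ran: it swaps ping's ICMP echo for a TCP connection attempt on port 80, because sending ICMP usually needs elevated privileges. The function name `diagnose` and its `resolver` and `connector` parameters are made up for this example.

```python
import socket


def diagnose(host, port=80, timeout=3.0,
             resolver=socket.gethostbyname,
             connector=socket.create_connection):
    """Return (status, ip); status is 'dns_failure', 'no_connection' or 'ok'.

    resolver and connector are hypothetical injectable parameters so the
    logic can be exercised without touching the network.
    """
    try:
        ip = resolver(host)      # step 1: turn the name into an IPv4 address
    except OSError:              # socket.gaierror is a subclass of OSError
        return ("dns_failure", None)
    try:
        # step 2: try to open a TCP connection to the resolved address
        conn = connector((ip, port), timeout=timeout)
    except OSError:              # timeouts, refusals and routing errors
        return ("no_connection", ip)
    conn.close()
    return ("ok", ip)
```

Calling `diagnose("exposuremanager.com")` a few times in a row would show whether the failure flips between the DNS step and the connection step, which is the pattern ping showed me.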

So then I run a traceroute in an attempt to see where the communication fails between me and exposuremanager:

Tracing route to exposuremanager.com [66.254.91.235]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  www.routerlogin.com [192.168.1.1]
  2     7 ms    10 ms     7 ms  [ my IP omitted ]
  3     8 ms     8 ms     6 ms  ge-1-2-ur01.gardner.ma.boston.comcast.net [68.85.187.109]
  4    11 ms    10 ms     9 ms  te-0-8-0-2-ar01.woburn.ma.boston.comcast.net [68.85.162.93]
  5    14 ms     9 ms     9 ms  pos-0-15-0-0-ar01.needham.ma.boston.comcast.net [68.85.162.145]
  6    15 ms    14 ms    22 ms  pos-0-0-0-0-ar01.chartford.ct.hartford.comcast.net [68.85.162.70]
  7    16 ms    18 ms    17 ms  pos-2-4-0-0-cr01.newyork.ny.ibone.comcast.net [68.86.90.61]
  8    22 ms    18 ms    17 ms  Vlan546.icore1.NJY-Newark.as6453.net [206.82.132.41]
  9    17 ms    45 ms    18 ms  if-6-0-0-25.mcore3.NJY-Newark.as6453.net [216.6.57.41]
 10    19 ms    17 ms    18 ms  if-2-0.core1.NTO-NewYork.as6453.net [216.6.57.66]
 11    17 ms    18 ms    19 ms  sl-gw31-nyc-14-0-0.sprintlink.net [160.81.249.29]
 12    18 ms    19 ms    17 ms  sl-crs2-nyc-0-2-0-0.sprintlink.net [144.232.13.35]
 13    23 ms    27 ms    19 ms  sl-bb20-msq-2-0-0.sprintlink.net [144.232.20.74]
 14    22 ms    22 ms    24 ms  sl-bb21-msq-15-0-0.sprintlink.net [144.232.9.110]
 15    26 ms    26 ms    27 ms  sl-crs1-rly-0-8-5-0.sprintlink.net [144.232.20.73]
 16    25 ms    25 ms    26 ms  sl-bb20-dc-5-0-0.sprintlink.net [144.232.8.162]
 17    24 ms    28 ms    25 ms  sl-crs1-dc-0-0-0-0.sprintlink.net [144.232.15.11]
 18    74 ms    73 ms    74 ms  sl-crs2-fw-0-11-3-0.sprintlink.net [144.232.19.200]
 19     *        *        *     Request timed out.
 20     *        *        *     Request timed out.
 21     *        *        *     Request timed out.
 22     *        *        *     Request timed out.
 23     *        *        *     Request timed out.
 24     *        *        *     Request timed out.
 25     *        *        *     Request timed out.
 26     *        *        *     Request timed out.
 27     *        *        *     Request timed out.
 28     *        *        *     Request timed out.
 29     *        *        *     Request timed out.
 30     *        *        *     Request timed out.

Trace complete.

As you can see, the communication makes it to “sl-crs2-fw-0-11-3-0.sprintlink.net [144.232.19.200]” and then fails.  Sprintlink.net is inside of the Sprint network.  To get as far as I did required the services of three companies, Comcast (comcast.net),  Tata Communications (as6453.net), and Sprint (sprintlink.net). None of these companies is my company (sagewoodstudios.com) or my photolab service provider (exposuremanager.com).  In fact after the last hop, I don't know what the next address would be–it might be a fourth company, or another server in the Sprint network.  Exposure Manager can't help me–it's not their computer.  Comcast can't help me–it's not their computer either.  Sprint *might* be able to help me, but I'm not their customer. 
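
Reading a trace like this by eye gets old fast, so here is a rough Python sketch of the same analysis: find the last hop that answered and list the networks crossed on the way. It assumes Windows `tracert`-style output like the trace above. The names `Hop`, `parse_tracert`, `last_responding_hop` and `networks_crossed` are made up for this example, and treating the last two DNS labels as "the company" is only a heuristic (it would be wrong for names ending in something like .co.uk).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.+)$")
NAMED_HOST = re.compile(r"(\S+)\s+\[(\d{1,3}(?:\.\d{1,3}){3})\]\s*$")
BARE_IP = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s*$")


@dataclass
class Hop:
    number: int
    responded: bool
    host: Optional[str] = None
    ip: Optional[str] = None


def parse_tracert(text: str) -> List[Hop]:
    """Turn tracert-style text into a list of Hop records."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, blank line, or "Trace complete."
        number, rest = int(match.group(1)), match.group(2)
        if "Request timed out" in rest:
            hops.append(Hop(number, responded=False))
            continue
        named = NAMED_HOST.search(rest)
        bare = BARE_IP.search(rest)
        if named:
            hops.append(Hop(number, True, named.group(1), named.group(2)))
        elif bare:
            hops.append(Hop(number, True, None, bare.group(1)))
        else:
            hops.append(Hop(number, True))  # answered, but address was redacted
    return hops


def last_responding_hop(hops: List[Hop]) -> Optional[Hop]:
    """Return the highest-numbered hop that answered, or None."""
    answered = [hop for hop in hops if hop.responded]
    return answered[-1] if answered else None


def networks_crossed(hops: List[Hop]) -> List[str]:
    """List each network once, in path order (last two DNS labels)."""
    seen = []
    for hop in hops:
        if hop.host is None:
            continue
        domain = ".".join(hop.host.lower().split(".")[-2:])
        if domain not in seen:
            seen.append(domain)
    return seen


# A few hops copied from the trace above.
SAMPLE = """\
  7    16 ms    18 ms    17 ms  pos-2-4-0-0-cr01.newyork.ny.ibone.comcast.net [68.86.90.61]
  8    22 ms    18 ms    17 ms  Vlan546.icore1.NJY-Newark.as6453.net [206.82.132.41]
 18    74 ms    73 ms    74 ms  sl-crs2-fw-0-11-3-0.sprintlink.net [144.232.19.200]
 19     *        *        *     Request timed out.
"""

hops = parse_tracert(SAMPLE)
print(last_responding_hop(hops).host)  # sl-crs2-fw-0-11-3-0.sprintlink.net
print(networks_crossed(hops))          # ['comcast.net', 'as6453.net', 'sprintlink.net']
```

Keep in mind that the last hop to answer only marks where replies stopped coming back. It does not by itself prove that router is the broken one.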

What I do know is, if I didn't have to use this route, I would indeed be able to get there.  I found an online provider of the standard 'net tools (ping, traceroute, etc.) called Network-Tools.com (that's a handy link btw, you might want to bookmark it).  I can get to their server to see their website, and when I ask THEM to do a traceroute to exposuremanager, they can get there just fine:

TraceRoute to 66.254.91.235 [exposuremanager.com]

Hop  (ms)  (ms)  (ms)  IP Address      Host name
  1     8     6     6  206.123.69.137  -
  2     9     6     6  8.9.232.73      xe-5-3-0.edge3.dallas1.level3.net
  3    12     7     7  4.69.145.140    ae-3-80.edge2.dallas3.level3.net
  4     8    15     6  4.71.220.14     xo-communic.edge2.dallas3.level3.net
  5    10    12     9  207.88.13.122   207.88.13.122.ptr.us.xo.net
  6    41    40    40  207.88.12.46    207.88.12.46.ptr.us.xo.net
  7    41    40    40  65.106.1.69     65.106.1.69.ptr.us.xo.net
  8    41    49    52  65.106.5.10     p0-0-0.mar2.la-ca.us.xo.net
  9    46    40    40  207.88.81.170   p15-0.chr1.la-ca.us.xo.net
 10    50    44    45  66.238.50.106   66.238.50.106.ptr.us.xo.net
 11    53    45    47  66.254.64.1     gw1.pixelgate.net
 12    53    50    47  66.254.91.235   host235.exposuremanager.com

Trace complete

As you can see, because they are starting from a different provider (level3.net), their communication path takes a different route that never involves any of the companies I'm forced to use.  If Network-Tools.com provided a “browser in a browser” (basically an embedded frame that I could point anywhere I wanted), I'd be able to get to my photolab and process my customers' orders.
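
The comparison can be made mechanical. The Python sketch below takes a few hostnames from each of the two traces above and checks which networks appear only on the failing path. The helper names `network_of` and `networks` are made up for this example, and grouping hosts by their last two DNS labels is a rough stand-in for "which company runs this router".

```python
def network_of(hostname):
    """Approximate the operator by the last two DNS labels (a heuristic)."""
    return ".".join(hostname.lower().split(".")[-2:])


def networks(hostnames):
    """Return each network once, in the order it first appears on the path."""
    ordered = []
    for name in hostnames:
        net = network_of(name)
        if net not in ordered:
            ordered.append(net)
    return ordered


# A few hostnames copied from the two traces above.
FAILING_PATH = [
    "ge-1-2-ur01.gardner.ma.boston.comcast.net",
    "Vlan546.icore1.NJY-Newark.as6453.net",
    "sl-crs2-fw-0-11-3-0.sprintlink.net",
]
WORKING_PATH = [
    "xe-5-3-0.edge3.dallas1.level3.net",
    "p0-0-0.mar2.la-ca.us.xo.net",
    "gw1.pixelgate.net",
    "host235.exposuremanager.com",
]

failing = networks(FAILING_PATH)
working = networks(WORKING_PATH)
shared = [net for net in failing if net in working]
only_failing = [net for net in failing if net not in working]

print("shared:", shared)                      # shared: []
print("only on failing path:", only_failing)  # only on failing path: ['comcast.net', 'as6453.net', 'sprintlink.net']
```

An empty shared list is what you would expect when the two paths have no carrier in common before the destination's own network.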

In the meantime I'm pretty stuck.  I can't help my customers, and my service provider can't help me.  Welcome to the Internet… where you really “can't get thar from hyar”.

10 thoughts on “Distributed Computing = Distributed Responsibility = Fingerpointing”

  1. Hi,
    Donovan Janus here from ExposureManager. Thank you very much for the data on this post. We are trying to work with our data center on narrowing down exactly which routes are affected and what calls we can place to get this problem fixed. It seems to be affecting primarily Comcast customers, but that is not to say Comcast is at fault (like you pointed out, it stops at Sprint).
    Could you do a traceroute to the following addresses as well for me?
    http://www.pixelgate.net
    http://www.spiritofamerica.net
    Thank you again for your help and patience. We are trying all we can to get this resolved.
    Donovan Janus
    Chief Executive Officer
    ExposureManager.com

  2. Sure Donovan, and thanks for getting in touch with me. I think my e-mails may not be getting through to exposuremanager for the same reasons. I tried to traceroute to “www.pixelgate.net” this morning and I could get there, but I could *not* get to “gw1.pixelgate.net”–I'll give it a shot now. Here's the trace:

    Tracing route to www.pixelgate.net [66.254.66.110]
    over a maximum of 30 hops:
      1     2 ms    <1 ms    <1 ms  www.routerlogin.com [192.168.1.1]
      2    15 ms     6 ms    10 ms  (omitting my IP)
      3     7 ms     7 ms    11 ms  ge-1-2-ur01.gardner.ma.boston.comcast.net [68.85.187.109]
      4     9 ms     9 ms    11 ms  te-0-8-0-2-ar01.woburn.ma.boston.comcast.net [68.85.162.93]
      5    18 ms    13 ms    29 ms  po-16-ar01.berlin.ct.hartford.comcast.net [68.87.146.50]
      6    15 ms    18 ms    14 ms  be-10-ar01.chartford.ct.hartford.comcast.net [68.87.146.29]
      7    18 ms    17 ms    16 ms  pos-2-4-0-0-cr01.newyork.ny.ibone.comcast.net [68.86.90.61]
      8    21 ms    18 ms    35 ms  Vlan546.icore1.NJY-Newark.as6453.net [206.82.132.41]
      9    17 ms    18 ms    24 ms  if-6-0-0-25.mcore3.NJY-Newark.as6453.net [216.6.57.41]
     10    18 ms    17 ms    26 ms  if-2-0.core1.NTO-NewYork.as6453.net [216.6.57.66]
     11    19 ms    18 ms    19 ms  sl-gw31-nyc-4-0-0.sprintlink.net [160.81.43.177]
     12    20 ms    19 ms    18 ms  sl-crs1-nyc-0-2-0-0.sprintlink.net [144.232.13.33]
     13    51 ms    35 ms    32 ms  sl-crs2-rly-0-8-5-0.sprintlink.net [144.232.20.165]
     14    32 ms    33 ms    32 ms  sl-crs2-dc-0-12-2-0.sprintlink.net [144.232.19.221]
     15    82 ms    85 ms    83 ms  sl-crs2-fw-0-12-0-1.sprintlink.net [144.232.19.102]
     16     *        *        *     Request timed out.
     17     *        *        *     Request timed out.
     18     *        *        *     Request timed out.
     19   116 ms   122 ms   114 ms  66.254.66.110
    Trace complete.

    See, that seems to work. But if I try to get to “gw1.pixelgate.net” (one hop away from exposuremanager.com):

    Tracing route to gw1.pixelgate.net [66.254.64.1]
    over a maximum of 30 hops:
      1    <1 ms    <1 ms    <1 ms  www.routerlogin.com [192.168.1.1]
      2     8 ms     8 ms     9 ms  (omitting my IP)
      3     9 ms     7 ms     8 ms  ge-1-2-ur01.gardner.ma.boston.comcast.net [68.85.187.109]
      4     9 ms     9 ms     9 ms  te-0-8-0-2-ar01.woburn.ma.boston.comcast.net [68.85.162.93]
      5    16 ms    13 ms    13 ms  po-16-ar01.berlin.ct.hartford.comcast.net [68.87.146.50]
      6    19 ms    15 ms    14 ms  be-10-ar01.chartford.ct.hartford.comcast.net [68.87.146.29]
      7    16 ms    17 ms    28 ms  pos-2-3-0-0-cr01.newyork.ny.ibone.comcast.net [68.86.90.57]
      8    20 ms    20 ms    16 ms  Vlan546.icore1.NJY-Newark.as6453.net [206.82.132.41]
      9    18 ms    25 ms    19 ms  if-6-0-0-25.mcore3.NJY-Newark.as6453.net [216.6.57.41]
     10    20 ms    38 ms    18 ms  if-2-0.core1.NTO-NewYork.as6453.net [216.6.57.66]
     11    19 ms    18 ms    18 ms  sl-gw31-nyc-4-0-0.sprintlink.net [160.81.43.177]
     12    29 ms    21 ms    18 ms  sl-crs2-nyc-0-2-0-0.sprintlink.net [144.232.13.35]
     13    58 ms    42 ms    46 ms  sl-crs2-chi-0-5-0-0.sprintlink.net [144.232.20.162]
     14    54 ms    54 ms    53 ms  sl-crs2-kc-0-8-0-0.sprintlink.net [144.232.18.7]
     15    74 ms    76 ms    75 ms  sl-crs2-fw-0-4-0-1.sprintlink.net [144.232.19.140]
     16   110 ms   106 ms   107 ms  sl-crs1-ana-0-9-3-0.sprintlink.net [144.232.20.131]
     17     *        *        *     Request timed out.
     18     *        *        *     Request timed out.
     19     *        *        *     Request timed out.
     20     *        *        *     Request timed out.
     21     *        *        *     Request timed out.
     22     *        *        *     Request timed out.
     23     *        *        *     Request timed out.
     24     *        *        *     Request timed out.
     25     *        *        *     Request timed out.
     26     *        *        *     Request timed out.
     27     *        *        *     Request timed out.
     28     *        *        *     Request timed out.
     29     *        *        *     Request timed out.
     30     *        *        *     Request timed out.
    Trace complete.

    Here's the trace to www.spiritofamerica.net (which also fails):

    Tracing route to www.spiritofamerica.net [66.254.91.162]
    over a maximum of 30 hops:
      1    <1 ms    <1 ms    <1 ms  www.routerlogin.com [192.168.1.1]
      2     6 ms     7 ms     7 ms  (my IP omitted)
      3     7 ms     9 ms     7 ms  ge-1-2-ur01.gardner.ma.boston.comcast.net [68.85.187.109]
      4    13 ms    13 ms     8 ms  te-0-8-0-2-ar01.woburn.ma.boston.comcast.net [68.85.162.93]
      5    10 ms     9 ms    17 ms  pos-0-15-0-0-ar01.needham.ma.boston.comcast.net [68.85.162.145]
      6    14 ms    15 ms    15 ms  pos-0-1-0-0-ar01.chartford.ct.hartford.comcast.net [68.85.162.74]
      7    17 ms    19 ms    15 ms  pos-2-4-0-0-cr01.newyork.ny.ibone.comcast.net [68.86.90.61]
      8    26 ms    35 ms    17 ms  Vlan546.icore1.NJY-Newark.as6453.net [206.82.132.41]
      9    24 ms    21 ms    17 ms  if-0-0-0-1100.mcore3.NJY-Newark.as6453.net [216.6.57.1]
     10    17 ms    22 ms    17 ms  if-2-0.core1.NTO-NewYork.as6453.net [216.6.57.66]
     11    18 ms    24 ms    17 ms  sl-gw31-nyc-14-0-0.sprintlink.net [160.81.249.29]
     12    20 ms    20 ms    18 ms  sl-crs1-nyc-0-2-0-0.sprintlink.net [144.232.13.33]
     13    26 ms    26 ms    27 ms  144.232.18.210
     14    26 ms    24 ms    40 ms  sl-crs2-rly-0-2-2-0.sprintlink.net [144.232.19.2]
     15    24 ms    28 ms    29 ms  sl-bb21-dc-5-0-0.sprintlink.net [144.232.8.164]
     16    24 ms    25 ms    24 ms  sl-crs2-dc-0-4-0-0.sprintlink.net [144.232.15.19]
     17    83 ms    72 ms    72 ms  sl-crs1-fw-0-11-3-0.sprintlink.net [144.232.19.202]
     18     *        *        *     Request timed out.
     19     *        *        *     Request timed out.
     20     *        *        *     Request timed out.
     21     *        *        *     Request timed out.
     22     *        *        *     Request timed out.
     23     *        *        *     Request timed out.
     24     *        *        *     Request timed out.
     25     *        *        *     Request timed out.
     26     *        *        *     Request timed out.
     27     *        *        *     Request timed out.
     28     *        *        *     Request timed out.
     29     *        *        *     Request timed out.
     30     *        *        *     Request timed out.
    Trace complete.

  3. Hi Chuck,
    Thank you for all that info! I've forwarded it to our data center. The latest from Comcast is that it is a known issue and that they are working on it, but I heard that through another user. When I spoke to a Comcast tech, he said it was working for him and could not confirm it was a known issue.
    I'll keep you posted. Thanks again,
    Donovan

  4. Hi Donovan, I was online with Comcast tech last night and they gave me a case number and promised me they would forward the problem on to their technical staff. They kept recommending stuff like power-cycling my modem or bypassing my local router (like that is going to change what's going on in the Midwest). I dutifully tried all of their suggestions, to no avail.

  5. Hi Chuck,
    Yeah, rebooting a modem seldom solves anything! We've now isolated it to a router inside Comcast that is used on the outbound traffic (from EM). We've notified Comcast of the location on the route and hopefully they will get to it soon.
    It's a router, BTW, that's in LA (but not at our data center), which is why it is affecting almost all Comcast users (some people located in LA seem to be using a different router at the same facility and do not have the problem).
    Anyway, I'll keep you posted the moment I know more.

  6. Very interesting… I knew it took multiple hops to do something on the Internet, but I never realized there were so many. Too bad there isn't some way to specify domains you'd like to avoid.
    Also interesting that this didn't show up in my RSS feed till today – nearly a week.

  7. This type of thing happens to me at my wife's office. I can only get to forumer (the home of Paint.NET's forum) for about 5 minutes before the requests just die. Then, if I reset my cable modem, it will work for about another 5 minutes. Repeat. No other web sites are affected (as far as I can tell).
    I agree, the web can be very frustrating when it doesn't work properly.
