Repair log: Centipede and the case of the borderline chip...

lilypad19

This one started pretty simple, then this repair turned into an interesting test of will... If you are a computer nerd.. read on..


Self test would not work for some reason, game ran, but it was a mess. Connected up to tester - all ROMs good and a bad RAM@F2. Replaced it and it seemed 100%! Should have been easy, right?

Once it was running, I performed all of my normal board maintenance - clean chip legs, edge connectors, Deoxit, etc.. Then I add a credit and start a burn in cycle. Day 1 is run 12+ hours with a credit and make sure it doesn't reset. Not too long later.. credit was gone.. game reset. When this happens I run it a day and see if maybe the bad chip will fail completely (usually a RAM).. not this time. While working on other stuff, I added a credit and it reset after about 10 minutes. After a few of these I started timing them and it would reset in attract mode after 9 min 10 seconds and the next screen change. (this will make sense in a minute).

Had to be some weird software issue? Right? I swapped all the ROMs - no difference. Replaced the ROM sockets just to be sure, no difference. Once the pattern became more apparent I narrowed my testing approach. The reset was coming from the watchdog circuit. I clipped the scope to it and determined a watchdog reset was actually happening. If I disabled the watchdog - it would run forever..

So far the only issue was the game reset after 9+ minutes..

Would it reset during gameplay? Fortunately Centipede is my best game. Every 10K points is about 1 minute of gameplay. I played 2 games well over 10 minutes (~170K and ~220K) and the board did not reset.. So only in attract mode? One thing different about this board is it is a -01 hardware release. It is a bit different than all the other revisions and the differences are not documented on the schematics as far as I have seen.

Could this be a weird bug related to this hardware release? I went through my pile of Centipedes and had two -01 boards. I repaired one .. tested it.. no resets.. Just to be 1000% sure, quickly repaired my second one.. tested it.. no resets.. Whatever is going on, it is specific to this board and is not revision dependent.

Around now is when I went to: https://github.com/historicalsource/centipede and started reading Centipede source code.. I wanted to know why it always died at ~9 min 10 sec +

LDA FRAME+1
BPL 52$ ; Every 10 minutes...
AND I,7F
STA FRAME+1 ; Reset counter
JSR SWAP ; Swap screen to clear memory errors
JSR SCORES ; Display high scores

Here is the code that does it.. The FRAME counter ticks 60 times/second. Once it counts past 0x7FFF (32,767) the high bit is set, the BPL is no longer taken, and it falls through to 'JSR SWAP'.. (JSR = jump to subroutine)

Remember earlier when I mentioned it was very consistent?

32,768 frames / 60 frames per second / 60 seconds per minute = 9.1 minutes. That is exactly when the board watchdog reset, at the next screen change in attract mode. At least I determined where the 9 min ~10 seconds came from.. However, it should not watchdog reset. This is just a catchall error check and good coding practice from the original developers. They clear the attract mode roughly every 9 minutes to refresh the screen.
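To double-check the arithmetic, here is a rough Python sketch of the counter test in that snippet. It assumes FRAME is a 16-bit counter that starts at 0 and is bumped once per frame at 60 Hz; the names below are made up for illustration and nothing here is taken from the actual source beyond the bit test.

```python
FRAME_RATE_HZ = 60  # assumed tick rate, per the post


def frames_until_swap():
    """Count ticks until bit 7 of the counter's high byte is set.

    Mirrors the LDA FRAME+1 / BPL test: BPL branches while the high
    byte is "positive" (bit 7 clear), so the fall-through to SWAP
    happens on the first tick where that bit is set.
    """
    frame = 0
    ticks = 0
    while True:
        frame = (frame + 1) & 0xFFFF   # 16-bit counter, one tick per frame
        ticks += 1
        high_byte = frame >> 8
        if high_byte & 0x80:           # BPL not taken
            return ticks


ticks = frames_until_swap()
seconds = ticks / FRAME_RATE_HZ
minutes, rem = divmod(seconds, 60)
print(ticks, f"{int(minutes)} min {rem:.1f} s")
```

Under those assumptions the fall-through lands a little after the 9 minute mark, which fits a reset showing up at the next screen change around 9 min 10 sec.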

I had a lot of theories at this point.. One was how it was counting..

image.png
Hidden up here in the LS257@K9 - the VBLANK signal is read and toggles bit 6 @ 0C00

image-1.png
Here is the main code loop for Centipede, I saw VBLANK, IRQ and WATCHDOG and spent time trying to determine if the interrupts were running correctly (seemed to be as compared to a known board). Checked what I could around VBLANK, etc. Swapped in a new socket and PROM@P4.

image-2.png
I checked frequencies, etc. with the scope - thought maybe there was a blip causing a reset.

I was running out of ideas.. Since it was just attract mode and only lost a credit after 9 minutes.. I started going through my testing checklist. One test is make sure I can start 2 players and check cocktail mode..

Started a 2 player game. After Player 1 died - game immediately reset going into Player 2.. REALLY?! Had to be related.. Something is crazy on this board.. I also tested with scoring on Player 1.. when it ended .. it spent all the time rebuilding all of the half shot mushrooms as it totaled up points, then reset as it went to Player 2.

Trying to determine what the attract mode and Player 2 situations had in common, both had 'JSR SWAP' right at the time of the reset among other things.

Looking into SWAP - Centipede does a very smart thing. To preserve the 'mushroom' layout for each player screen, the SWAP routine reads in 8 tiles and bitmaps them into a single byte. The entire screen is preserved in 160 bytes of memory. But this was all CPU and memory moves, it didn't depend on any of the chips that were not already working correctly (as far as I could tell).
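The packing idea can be sketched in a few lines of Python. This is only an illustration of "8 tiles per byte", not a translation of the real SWAP routine: the tile order, the bit order and the 1,280-tile count (160 bytes x 8 bits) are assumptions, and `pack_tiles` / `unpack_tiles` are hypothetical names.

```python
def pack_tiles(tiles):
    """Pack booleans (mushroom present?) into bytes, 8 tiles per byte."""
    if len(tiles) % 8:
        raise ValueError("tile count must be a multiple of 8")
    packed = bytearray()
    for i in range(0, len(tiles), 8):
        byte = 0
        for present in tiles[i:i + 8]:
            # Shift in one tile per bit; the first tile ends up in bit 7.
            byte = (byte << 1) | (1 if present else 0)
        packed.append(byte)
    return bytes(packed)


def unpack_tiles(packed):
    """Expand packed bytes back into one boolean per tile."""
    tiles = []
    for byte in packed:
        for bit in range(7, -1, -1):
            tiles.append(bool((byte >> bit) & 1))
    return tiles


# Illustrative screen: 1,280 tile slots with two mushrooms on it.
tiles = [False] * 1280
tiles[3] = tiles[10] = True
saved = pack_tiles(tiles)
print(len(saved), hex(saved[0]), hex(saved[1]))
```

One bit per tile only records whether a mushroom is there, not how damaged it is, which may be why the game rebuilds the half shot mushrooms before it swaps players.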

During all of this I used a comparator and checked a lot of chips in the clock, watchdog and IRQ circuits as well as verified items with my scope against a good board.. Board ran great if you only wanted a 1 player game ... forever..

I set it aside a couple of days.. which let me sleep on it a bit.

Here is what I knew:
  • Watchdog resets in attract mode at the 9 min 10 second mark
  • Watchdog resets when rolling to Player 2 immediately
  • All the ROMs and the 2 program RAMs had been swapped
  • IRQ is firing at the right speed
  • VBLANK is firing at the right speed
Then I woke up with an idea - I originally disabled the watchdog and attract mode did not reset. I never tried that with the Player 2 issue..

I disabled the watchdog and started a 2 Player game.. Got to Player 2.. No problem! The clue I needed. It had to be in the watchdog circuit itself since the code was not crashing.

image-3.png
Replaced the LS90@L2 - Board runs perfectly now.. but why?

My theory: The SWAP routine gets called after 9 minutes of attract mode and when players swap sides. The actual swap code is a pretty good sized routine for assembly language with a lot of bit rotation, memory moves and manipulation. I'm betting it is a long running routine coupled with a defect in the LS90 counter which is counting internally 'too fast' and triggers the reset too quickly.. The LS90 absolutely works because it does trigger watchdogs.. It just seems to be a bit too eager to bark. This may be the MOST borderline defective chip I've ever run into.
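A toy model of that theory, in Python. Every number here is invented for illustration (nothing was measured on the board), and `watchdog_trips` is a hypothetical helper: it just asks whether any stretch between watchdog kicks is long enough for the counter to bark.

```python
def watchdog_trips(kick_gaps, bark_after):
    """Return True if any gap between kicks reaches the bark threshold.

    kick_gaps  -- watchdog clock ticks between successive kicks
    bark_after -- ticks without a kick before the counter forces a reset
    """
    return any(gap >= bark_after for gap in kick_gaps)


# Made-up timings: the main loop kicks every tick, SWAP takes longer.
normal_loop = [1, 1, 1, 1]
with_swap = [1, 1, 3, 1]

HEALTHY_THRESHOLD = 4   # illustrative: a good counter barks after 4 ticks
FAULTY_THRESHOLD = 2    # illustrative: a bad counter barks after 2 ticks

for name, gaps in (("normal", normal_loop), ("with SWAP", with_swap)):
    print(name,
          watchdog_trips(gaps, HEALTHY_THRESHOLD),
          watchdog_trips(gaps, FAULTY_THRESHOLD))
```

With these made-up numbers only the combination of the long SWAP stretch and the eager counter trips a reset, which matches the symptom: fine in normal play, reset only when SWAP runs.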

Board works!
 
Not only did you fix the problem, you found the proximate cause of the problem and I would argue fully understand it.

Outstanding work!
 
I knew.. I was just able to actually show it was doing just enough bad math... but to be that borderline is crazy..
It happens. I had a stuck bit on a switch. I was able to find it (finally) with the RCT Pro.

That was satisfying - getting the "FAIL" "raspberry" sound on the RCT Pro. I re-ran the test several times to hear it.

Problem solved, board working.
 
Ok.. my cheap Chinese chip tester can't check the LS90.. But my T48/TL866 tester can check it..
Good 74LS90 and the bad one.. Partially working chip...
View attachment 813883
View attachment 813885

Heh.. I was trying to read that as if it was really doing a Div10... Didn't notice they were clocking the div2 and the Div5.

Looks like the reset to 0 and 9 were fine, and the div2 is fine, but the div5 is hosed.

000, 001, 010, 011, 100, 000 <-- Should be counting
000, 001, 110, 010, 011, 100 <-- Is counting

That added 110 state between 001 and 010 is very weird.

That's making the watchdog effectively wait half as long to bark.
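Here is a small Python sketch of why that reads as "half as long". It assumes the leftmost bit of each state above is the top output of the div5 section and that the watchdog reset depends on that bit going high; the thread does not confirm the second part, so treat it as a guess.

```python
# Div5 states as listed above, one per clock.
EXPECTED = [0b000, 0b001, 0b010, 0b011, 0b100, 0b000]
OBSERVED = [0b000, 0b001, 0b110, 0b010, 0b011, 0b100]


def first_mismatch(expected, observed):
    """Index of the first clock where the two sequences differ, or None."""
    for n, (want, got) in enumerate(zip(expected, observed)):
        if want != got:
            return n
    return None


def clocks_until_msb(states):
    """Number of clocks before the top bit of the 3-bit state goes high."""
    for n, state in enumerate(states):
        if state & 0b100:
            return n
    return None


print(first_mismatch(EXPECTED, OBSERVED))   # where the bad chip goes wrong
print(clocks_until_msb(EXPECTED), clocks_until_msb(OBSERVED))
```

In the expected sequence the top bit first goes high on the 4th clock; in the bad chip's sequence it goes high on the 2nd, because of the stray 110 state.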
 
Maybe I get to the point where I can read chip states like that.. (I'm kidding .. I doubt it) .. I just tried reading the tester output and I'm not seeing the binary counting. May need to look longer.
Somewhere in my head I knew it was doing bad math and kicking the dog too soon once I knew the actual code was not crashing..
Thanks for adding more detail.
 
It doesn't help that the bits are out of order.

The Clk on pin 14 is making pin 12 count.

The Clk on pin 1 is making {pin 11, pin 8, pin 9} count.
On the expected outputs, you can see 9 go LH, 8 goes LLHH, 11 goes LLLLH.
Red is mismatches.
 
The format they're using to test TTL appears to be similar to the test vectors embedded in some .jed files for programming/testing PALs/GALs.

IIRC, the WINCUPL documentation has more details on it.
 
I need to go write them down on paper in order to see it.. thanks for the insight.
 
Scott C.
 
