This is a VHDL version of Amstrad CPC 6128 running on FPGA starter-kit NEXYS2 500k-gates from Digilent. A starter-kit is a board made for learning FPGA, so it is a standard FPGA development board.
Please refer to [MiST-board CoreDocAmstrad] for the final user version, running on MiST-board platform.
Some games running on it : 1942 1943, 3D grand prix, A view to a kill, Action Fighter, animated strip poker, Antiriad, Arachnaphobia, Arkanoid, Arkanoid Revenge of Doh, Asphalt, Axiens, Barbarian MH, Barbarian PA, Barbarian II, Battleships, Bomb Jack I and II, Boulder Dash, Bruce Lee, Bubble Ghost, Buggy Boy, Bully's Sporting Darts, Cauldron I and II, Chase HQ, Chicken Chase, Classic Axiens, Classic Invaders, Crazy Cars, Crazy Snake, Dan Dare I II and III, Dizzy 7, Donkey Kong, Double Dragon I and II, Druid, E-Motion, Exolon, Express, Fruity Frank, Ghost'n'Goblins, Golden Axe, Gryzor, Heart Land, Hold-up, Hyper bowl, Ikari Warriors, Impossible Mission, Invasion of the Zombie Monsters, Iron Lord, Killapede, Klax64, Laser Squad, Light Force, Locomotion, Lode Runner, Macadam Bumper, Mange-Cailloux, Mario Bros, Maze Mania, P47, POP-UP, Prince of Persia, Prohibition, Realm, Rebel Star, Rick Dangerous I and II, R-Typeee, Rock Raid, Rogue, Rygar, Salamander, Sapiens, Shinobi, Sim City, Sorcery, Spectra, Spherica, Spin Dizzy, Star Ranger, Star Quake, Star Wars, Strider II, Super Ski, Superkid In Space, Tank Command, Tempest, Tetris 95, The Duct, The Empire Strikes Back, Trail Blazer, Trakers, Turbo Tortoise, Turrican I and II, Venoms, Victory Road, Vixen, WEC Le Mans, West Bank, Wizard's Lair, Word Cup Challenge, Word Series Baseball, Xevious, Xor, Zolyx.
How to assemble it
NEXYS2 Xilinx version is obsolete, it is still describe here for history reason (showing the prototyping part). Please refer to [MiST-board CoreDocAmstrad] for the final user version, running on MiST-board platform.
You need:
- a "NEXYS2 500kgates" starter kit from Diligent [[1]] (1200kgates should be better for future version => in fact I choosen NEXYS4, it is same RAM inside) also in France : [[2]] or in Germany : [[3]]
- a "PMODSD" module for reading sdcard [[4]]
- an alimentation (cause they don't give it with starter kit) [[5]]
- optionally a DIGILENT USB JTAG (normally starter kit can be programmed directly by usb, but I don't have tested this way) [[6]]
- a 4GB SDCARD (no more), I have exactly a "SDHC 4GB class4 Verbatim"
- the binary of this project (candidate 002) : source+500k-gates binary [[7]] ; 1200k-gates binary [[8]]
- several ROM files: OS6128.ROM BASIC1-1.ROM AMSDOS.ROM (from JavaCPC[[9]]); MAXAM.ROM[[10]]
- one or more DSK files : TEMPEST.DSK[[11]] ARKANOID.DSK[[12]] FRUITY.DSK[[13]] BRUCELEE.DSK[[14]] CHASEHQ.DSK[[15]] WIZLAIR.DSK[[16]] XEVIOUS.DSK[[17]] BOULDER.DSK[[18]] CLASSIC_AXIENS.DSK[[19]] CLASSIC_INVADERS.DSK[[20]] GRYZOR.DSK[[21]] PRINCE.DSK[[22]] BUGGY.DSK[[23]] TRAIL.DSK[[24]] P47.DSK[[25]] SUPER_SKI.DSK[[26]] ACTION_FIGHTER.DSK[[27]]
A package that contains minimum set of ROM and DSK for filling simply sdcard [[28]] (OS6128, BASIC1-1, AMSDOS, MAXAM)
 You have to: 
- program FPGA with the binary file "amstrad_switch_z80_vga_sd.bit" of this project, for it I use Digilent Adept software and my USB JTAG cable
- format your 4GB SDCARD in FAT32 4096 byte allocation size
- copy ROM and DSK on SDCARD
- plug PMODSD on slot JC1 of starter kit, and set all 8 switches to 0
- plug VGA, and turn on starter kit
You can:
- plug a PS/2 keyboard, and type "cat"
- increment switch to select another disk at boot, if screen became RED, it's that binary value done by switches is too big, leds are doing a small animation when a disk is correctly loaded
- plug principal joystick on slot JB1 (Vcc 3.3v is common)
- plug another joystick on slot JA1 (Vcc 3.3v is common)
- plug a jack on slot JD1, one at upper GND plug, and second wire at next plug, just at left of it (if it was 3.3v Vcc, choose the right one instead)
Wires are plugs at upper part of pmod D.
From bottom to upper: [nothing: Vcc 3.3v] [blue: GND] [red: sound]
Sure you can connect a jack...
And two joysticks. On pmod B: bottom is (both) 3.3v, so joystick's common, next is (both) GND (not used), and others are joystick connections, to test manually :)
Video
Last news about this project
In May 2018, I programmed my first CPC game http://www.pouet.net/prod.php?which=75855 following JDVA youtube tutorial since january, they are based on CPCMania 2005's website knowledge about programing in CPC using SDCC. I think that if I do progress this way enough, I'll implement my own CPC testbenches, needed for reaching next realise of FPGAmstrad.
In January 2018, Jepalza has ported FPGAmstrad from this wiki (Xilinx version, principe of concept 2011) on spanish ZX-Uno low-cost FPGA final platform (three times cheaper than MiST-board/same Xilinx chip poc 2011/chip used at 100%). So I bought a ZX-Uno to help around this fork, merging components. Normaly I can go a little further later (CRTC0, joystick), and then go back to MiST-board :)
https://www.youtube.com/watch?v=tpr9xxx1rsA
In May 2017, FPGAmstrad TV mode is validated using a TV from Tetalab group.
In February 2017, CRTC1 is also implemented following JavaCPC's source code, now you can choose between CRTC0 and CRTC1 in the OSD menu.
In January 2017, scanlines mode is implemented, you can select it from the OSD menu.
In December 2017, implementing green screen, using "Les Sucres en Morceaux" tutorial.
In September 2016, FPGAmstrad does use external RAM as RAM+VRAM, no more "LowerVRAM/UpperVRAM" switches to select in the OSD menu.
In August 2016, FPGAmstrad does pass some arnoldemu's testbench : PPI PSG CPCTEST.
In February 2016, FPGAmstrad has a TV mode (original signal)
In September 2016, FPGAmstrad has write access on both disk drives, and can change dsk.
In March 2015, FPGAmstrad is stable on MiST-board, please refer to MiST-board CoreDocAmstrad and mist-board amstrad bin core
In December 2014, FPGAmstrad is stable on NEXYS4, WIP, BOOTLOADER improved, VRAM improved (with BORDER colors !), RAM relaxed, no more timeout in keyboard key-press, Ghouls 'n' Ghosts, Macadam, a lot more to do. 14% of platform is used.
In November 2014, I bought MiST-board, with two USB pro joysticks.
In September 2014, I bought NEXYS4, more powerfull than NEXYS2, with same external RAM, internal mini-sd, no PS/2 (it is a pmod option)... I have some patchs to make (MSB FAT32 offset). I would like to make a USB sniffer also with it (usb to ethernet (wireshark))
Contact
I am Freemac, my IRL name is Renaud Hélias.
You can contact me by mail to renaudhelias @t gmail d0t com, if you have questions about assembling this project or running it in MiST-board final platform.
Follow me on my google plus account!
Tests done
Great games that are running properly are listed at top of page.
On MiST-board CoreAmstrad version
Games that doesn't run are :
- ACPC_logon_system.dsk: text scrolling lag. This demo will be used for horizontal ink calibration (when I’ll buy a luxurious FPGA platform... I need in fact 224KB of internal RAM to do it), and CRTC overcounts.
- commando.dsk: pixels that should be deleted are not deleted (only VRAM &C000-FFFF seems used), on level 1, the moto is not displayed correctly inside the bridge... but after the bridge :/
- split ink demo.dsk: (from cpcrulez) : may help about ink raster calibration.
- Sultan's Maze.dsk: does need the right part of keyboard (F0-F9 are used for directions in this game)
- Orion Primes.dsk: does display "secteurs entrelacés" - "vérifiez votre copie", a FDC problem, perhaps "sectorId++" is not the good way to reach next sector, or else two tracks in one track.
- Batman_Forever.dsk: some problem during flying chip demo part (one garbage line), and several rupture showing ghost lines around Vcc=0. Rupture solved in r005.8.16c4, but flying chip now show a half garbage of pixels. Batman seems CRTC1. Problems with FDC in r005.8.16 (does slow animations, like if I missed some "not ready" signal ?)
- Battro.dsk seems also CRTC1 and does fail completly. Does pass in r005.8.16.
- 30YMD.dsk: in Benediction demo, at bottom some time you see some ghosts of central animation (too many HSync per screen ?), solved in r005.8.16 (CRTC0 seems perfectly implemented). 30YMD seems CRTC0, running fine except that changing disk feature does still fail (inserted/not inserted/inserted signal ?)
- arkanoid2.dsk: don't run in r005.8.13, but fine in r005.8.13e (experimental fork), ok in r005.8.14 (using default OSD value : MEM_WR=quick)
- trailblazer.dsk: no more "raster" problem since r005.5, it's now perfect ! Palette heuristic offset (done for unlocking Batman Forever Demo) has a small effect in left (squares are not separated by a black line in first column) - same small defect in TV mode using r005.8.14.2... Thinking about a HSYNC offset of 2 (instead of 1 currently) then also delaying DATA+HDISP of 1 (char)
- imperial_mahjong.dsk: modern EXA/EXA2 resolution not passing my color pallet heuristic :p
- rtypeee.dsk: at begin of presentation, a draw of "jack plug" is done in a strange video mode, more than 200 pixels of height !, see flipping lace
- S&Koh.dsk: LOGON SYSTEM, black screen in r005.8.4... damn
- Pinball_Dreams__PREVIEW.DSK: Does run in experimental/ versions r005.8.9.2 and r005.9.11e (experimental forked version of r005.9.11 using flag FPGAmstrad_amstrad_motherboard.vhdl.HACK_Z80=false). Does freeze in !experimental versions when background music is special (not long classic background music) and you press two buttons (left/right flipper keys) at the same time, ok in r005.8.14 (using default OSD value : MEM_WR=quick). Scrolling text is OK in r005.8.16c4 but still one garbage line at top of main game (CRTC1 offset ?). Back and really nice in r005.8.16.1
- Fres Fighter II Turbo.dsk: FDC problem, cannot be launched.
- Seascape.dsk: Devilmarkus, using scandb50Hz and MEM_wr=slow, does display, but a flower petal at bottom is drawn in blue. A good raster test.
- Welcome To Amstrad CPC 6128.dsk: does display "Incompatible BASIC installed" message.
- phX.dsk: does begin to pass on r00.8.16.
Arkanoid.dsk stars use rupture address (changing address several time during display of one image), it is now supported on "candidate 001" version of FPGAmstrad. Run better in r005.9.11e than in r005.9.11.
Gryzor and Prince of Percia use rupture ink/mode (changing ink and mode during display of one image), it is now supported on "candidate 002" version of FPGAmstrad
Gryzor is really sensitive to Amstrad general stability : do press esc at start menu does activate music during game, if no music, FPGAmstrad is in an instable version (last known stable version r003.8)
Crazycars2.dsk first car image use 32KB of VRAM, it is now supported on "candidate 001" version of FPGAmstrad.
Ghouls'n'Ghost.dsk / Ecole.dsk does need RAM write when writing in ROM (RAM is beside ROM, hard to emulate with asynchronous SDRAM controler, MiST does use a hacked synchronous RAM done for that)
moktar.dsk / super_cauldron.dsk does run fine since r004.8.1.1. Morkar run fine in r005.8.16c4 using CRTC0 and MEM_wr=slow.
CPC Aventure does run fine since r005.2 (message about turning disk now displayed)
prehistorik.dsk does run fine since r005.2 (key can now be pressed in intro demo)
Sim City and Hero Quest run fine since r005.2 (it was about "key always pressed"), but does need a hard boot (not quick reset keyboard key), certainly a small problem of component init state. Sim City hack demo intro does run better using A-Z80 instead of T80 (2/3 of luck).
A lot of demos don't pass in NEXYS4 FPGAmstrad's version (I need to implement back the SDRAM hacked in MiST board FPGAmstrad version). CPCRULES demos is a cool ressource as it contains simple dsk formats.
crazycar.dsk does use lowerVRAM=00 (&0000-3FFF and &C000-FFFF area for VRAM)
crazycar2.dsk does use lowerVRAM=10 (&8000-BFFF and &C000-FFFF area for VRAM) while overscan presentation, and lowerVRAM=00 (&0000-3FFF and &C000-FFFF area for VRAM) during play. Run fine since r005.7
buggyboy.dsk: I found one dsk running correctly and another not : problem on this last one is about turning left/right effect : the car continues straight (it seems a prototype bad version)
superski.dsk : don't forget to disable autofire, in jump, do press fire+up or down.
protext.rom: All right since r004 (before, in edit mode, if I pressed continuously one key, it did write ten letter and freeze/crash. Do note that a number is incremented in live at top right, so this is a complex software, good for keyboard vector stability tests)
atomdriv.dsk: unlocked since r004.5 (the game and music did freeze (ROM unplugged shall return xFF))
rtypeee.dsk: unlocked since r004.7. Does use special read FDC cmd, with BOT different of EOT (begin of track/end of track), so reading several 512 bytes blocks in one FDC command only.
Arkanoid Revenge of Doh and Asphalt are two games that doesn't support ROM extensions : Doh starts to show small bad layers, and Asphalt does crash and reset before main menu... if you add another gadget ROM.
antiriad.dsk: no keyboard/joystick between r004.5 and r004.7. By adding pull-up at r004.5 I lost this game, by implementing a better PPI in r004.8 this game run fine : Antiriad is back !
-circles.dsk: this demo freeze does since r004.8 (PPI border effect ?) and is back since r005.5, it was nice to calibrate SOUND clock : I did generate 8 candidates of synchronizing this clock (1MHz from 4MHz : 1100 0110 0011 1001, and 0.5 deltas : 1100i, 0110i, 0011i, 1001i), only one does not freeze -circles... so I release r005.5 candidate. This demo is a great one around calibrating Yamaha clock.
Nigel Mansell's Grand Prix.dsk: Only one race track seems ok : Monaco (Brazil track does not start). Unclassified : this disk bug also with other emulators, certainly a bad dsk dump here, TOSEC version of Nigel Mansell does run fine (but some legendary traces of "SK bit purpose" needed by here (in FDC, setting SK does jump deleted disk tracks), perhaps to investigate) - update : some tracks unlocked in r005.8.15c61.
saboteur2.dsk: run fine since r005.5 (nice music and then freeze problem), it was about Yamaha clock generator (generated by Gatearray, versus WAIT_n added in short Z80 instruction to let them during all 4 clocks (Z80 in Amstrad does use 4T or 8T instructions (WAIT_n does insert missing T)). Does freeze at welcome since r005.8.7. Back since r005.8.10.
tetris95.dsk : bad in r005.8.9.2 (4 beep while breaking 3 lines (instead of 3 beep while breaking 3 lines), was correct in r005.8.4. Back since r005.8.10.
| Game | Welcome | Title | Game | 
|---|---|---|---|
| CrazyCar2 | 1011 | 0011 | |
| One | 0111 | 0011 | |
| Scarabus | 0111 | 0011 | |
| Ace | 0010 | ||
| Tintin | 0111 | 1011 | 0111 | 
| Devil's Crown | 0001 | 0011 | |
| Mach 3 | 0111 | 0111 | 0110 | 
| Paranoia (1994) | 1011 | 0001 | |
| World Class Rugby | 1011 | 0001 | 0001 | 
| Miam Cobra | 1001 | ||
| Galivan | 0001 | 
XOR.dsk does use lowerVRAM=01 (&4000-7FFF and &C000-FFFF area for VRAM)
TODO :
- To test also : Unlimited Bobs (Dr.Piotr).dsk demo.
On ZX-Uno FPGAmstrad version
Games that doesn't run are :
- adios_a_la_casta.dsk: bad sound. Does pass hacking memory banks this way :
else b"1100" when RAMbank="100" and (A(15)='0' and A(14)='1') else b"1101" when RAMbank="101" and (A(15)='0' and A(14)='1') else b"1110" when RAMbank="110" and (A(15)='0' and A(14)='1') else b"1111" when RAMbank="111" and (A(15)='0' and A(14)='1') else b"10" & A(15 downto 14); -- default value
- Ghouls'n'Ghost.dsk: black screen during game. RAM is not relaxed enought to permit changing address just after knowing if Z80 does a read or a write (ROM written=> does write on RAM hidden behind)
- jdva6.dsk: no keyboard... JDVA#6 test
Effort done
VGA
Pixels and VRAM. Palette and rasters. CRTC0
VGA: CRTC0
CRTC0 seems the best one, some demo does cry when detecting a poor CRTC1 (CRTC1 seem a low cost version of CRTC0). I have to implement a CRTC0 instead of my current CRTC1...
In fact CRTC1 is the best one. CRTC2 is the low cost version. CRTC0 did appears before CRTC1.
Some demos are running only on CRTC0 and others CRTC1.
Done in r005.8.14. Detected as CRTC0 by WakeUp! - "Enjoy the show" message displayed.
In r005.8.15. WakeUp! (CRTC0/MEM_wr quick) detected as Emu first time, and after a quick reset, does say detected as CRTC0.
In ZX-Uno FPGAmstrad, I implemented CRTC0.
VGA: VRAM
ram_palette.
VRAM contains 800x300 amstrad pixels (VZoom x2), displayed VGA 800x600@72Hz with fix regular border at 768×576 and fix inside border at 768×544.
In ZX-Uno, VRAM contains 800x300 amstrad pixels (VZoom x2), displayed 640x480@60Hz, with vertical only border.
- simple_GateArrayInterrupt.vhd (GA to VRAM) parameters : VRAM_Hoffset/VRAM_Voffset
- aZRaEL_vram2vgaAmstradMiaow.vhd (VRAM to VGA) parameters : H_BEGIN/H_END/V_BEGIN/V_END (theorical fixed values)
To calibrate : VRAM_Hoffset++ does offset one char left. VRAM_Voffset++ does offset one line up. On display H_BEGIN does begin to scan lines of VRAM. But V_BEGIN does not enter in consideration here : vertical=0 does begin to scan columns of VRAM.
In original CPC, top border has 1/2 char more than bottom border. I used Batman Forever default welcome/calibration screen to calibrate VRAM offsets. On ZX-Uno I used Arkanoid to calibrate VRAM offsets.
RAM_palette contains the ink list and the mode for each line of VRAM, sampled at horizontal middle of 800x600 screen, and used at begin of each line.
VGA: TODO : arnoldemu testbench
arnoldemu testbench: crtctest
Adding choice of CRTC 0 or 1 on OSD, and passing this test could be great.
VGA: TODO : winape testbench
winape testbench: plustest
a better border heuristic
Using winape testbench (plustest), test 2 does show somes problems while border does go out of screen, negative border does hide line itself.
bootloader
SDCARD and RAM.
(nothing to say here, really ???)
GA
GA: alignment of HSYNC Interrupt
Interrupt are respected since version "candidate 001" of FPGAmstrad, Markus does help me a lot about it.
JavaCPC running norecess's "using-interrupts" code [[29]]
It could be interesting to test this asm code on next version of FPGAmstrad.
GA: Sniffing of a real Amstrad
I listen to some wires of my Amstrad CPC 6128 plus, but I can't access VSYNC/HSYNC output of CRTC, so I have to buy another model in order to do this test. In fact you can listen at clock of Amstrad and transmit it to FPGA DCM component, resulting a accelerated clock sequence, that's it, with FPGA DCM you can overclock output Amstrad clock signal in order to insert more operations, I use this tip for listening signals and save them inside starter kit asynchronous RAM (write, stop write, write, stop write... I’m a perfectionist paranoid...)
You can power Amstrad CPC using extension port, applying 5v. By doing it, power down button of Amstrad doesn’t run. Using this way you reach a common 5v power between starter-kit and Amstrad. I connected wires from extension port directly to FPGA, as they are used just for listening.
GA: WAIT_n generator (WIP...)
Instruction timing.
I tested instruction timing of T80 compare to instruction timing of JavaCPC emulator. I deduce synchronization of Z80 with CRTC on M1 signal by WAIT_n insertion in order to have a multiple of 4 Tstates per instruction. I deduce also one WAIT_n inserted during MEM_WR operation (yes I log testbench T80, I’m crazy)
I just made a test bench log of T80 (log of instruction's M1, and first M1 coming after knowing that I send a lot of NOP after my instruction), and compare it to a JavaCPC timing array. Some instructions was not tested (interrupt wait, and special timing (instructions with change timing)), but all others passed correctly.
GA: WAIT_n generator - currently in r008.5.14
In GA, I do use begin of edge for IO_ACK instead of state.
M1 reached same time of IO_ACK are ignored (not M1) in WAIT_n generator.
MEM_WR has an OSD menu choice to switch between "quick" and "slow", "slow" mode does insert ONE WAIT_n during detection of MEM_WR. This switch exists because somes games are running in "slow" mode and others in "quick" mode.
GA: WAIT_n generator - talk about r008.5.14
In fact it exists several instruction making MEM_wr, and adding each one ONE WAIT_n does result in different case of synchronization.
If it's about managing GA reading pixels, perhaps not only M1 signal are synchronized but also the MEM_RD and MEM_WR accesses at another offset.
Timings instructions Z80 sur CPC
If interruption r52 is regular, even while making a continues MEM_WR, interruption (int<='1') shall be taken into account above WAIT_n insertions ?
In Z80 sequence diagram, an IO_ACK(+M1) is preceded by M1 (single)
cpctest.dsk -Timing Instruction- is different while using mode "MEM_WR=slow" and "MEM_WR=quick". Strangly better using "MEM_WR=quick".
Current version is using "Z80_HACK=true" (parameter set during compilation), that shunt Z80.WAIT_n entry, Z80.clock is slow down during theses WAIT_n. It's the only current way I succeed in slowing down enough Timing Instruction for unlocking Saboteur 2 game.
Key games here are : Saboteur 2 (run fine with "MEM_WR=slow", does freeze with "MEM_WR=quick") and Arkanoid II (run fine with "MEM_WR=quick", too slow using "MEM_WR=slow")
CPC Z80 Commands and how long they take...
GA: WAIT_n generator - talk about r005.8.16c4
In r005.8.16c3, I remark that WAIT_n are badly introduced following plustest.dsk testbench (WinAPE). I corrected a T80 parameter firing the WRITE action 1 clock before. That unlocks 32 instructions timing in plustest.dsk testbench .
In r005.8.16c4, I remark that two WAIT_n are also needed by instructions using 5T during M1 cycle. But in fact the M1 signal is not outputing 4T while 4T, but cutted at 2T. So that I do here (experimental), I produce a M1 signal output of 3T while 5T. Detecting it that way in my WAIT_n generator, producing then needed 2 WAIT_n at this moment.
But plustest.dsk testbench don't pass anymore in r005.8.16c4, testbench show border effect : a correct instruction is marked as bad timing, but in fact it the next one I did modify.
I don't know if plustest.dsk does pass on real Amstrad, but it is a really great testbench to progress (without it, r005.8.16c4 could not be realized).
During this work, I remark an instruction that seems badly classified on plustest.dsk : 2A LD HL, (nn). This instruction does not use MEM_WR and is not a 5T M1 instruction. So it shall be using 4 NOPs instead of 5 NOPs. LD (nn), HL does take 5 NOPs (because it does use MEM_WR instruction), as in plustest.dsk testbench here for this instruction.
GA: WAIT_n generator - plustest-5 - Tests on real CPC (by DanyPPC)
CPCWiki's forum : Need plustest.dsk testbench 5 output on original CPC 6128
So 2A is really during 5 NOPs... perhaps MEM_rd has to be slow down with one WAIT_n like for MEM_wr. Perhaps in this case 5T's instruction has not to be slow down. I have to fork r005.8.16c3 to test that. - update : in schematic GateArray does not has "MEM_wr wire" (but MEM_req and RD, so can deduce WR, but in an evil brain's way) - update 2 : all tests failed except one in testbench using this way, perhaps because RFSH_n does also use MREQ_n during M1 cycle. Perhaps WAIT_n generator can detect the current OP Code fetched (this is conform to Patent US5313621 : Programmable wait states generator for a microprocessor and computer system utilizing it ). - update : principe of concept validated for one instruction (2A), I can slow down instructions one per one, I don't know why I had to insert two WAIT_n instead of one here, but its results a plustest.dsk testbench with 2A instruction validated, that's done on r005.8.16c5f5 (candidate 5 fork 5), but I have to revert it to r005.8.16c3f5 I think before going further.
About testbench border effects, I think that IO_ACKed instructions has to be under same rules (MEM_wr, modulo 4 etc) - update : same result in testbench using this way.
GA: WAIT_n generator - plustest-9 - Tests on real CPC (by GUNHED and Kris)
CPCWiki's forum : Need plustest.dsk testbench 9 output on original CPC 6128
GUNHED results :
System 1: CPC6128, CRTC2: Test 9 works normal until the &EC codes, there are two errors marked with an "X". &ED, &46: 2 &ED, &4E: 2 After &ED, &5D it suddenly stops working! System 2: 6128 Plus: &ED, &46: 2 &ED, &48 5 &ED, &49: 5 &ED, &4E: 2 &ED, &50: 5 &ED, &51: 5 &ED, &58: 5 &ED, &59: 5 After &ED, &5D it suddenly stops working! Probably a crash, since a spot appears on screen.
Kris results :
Here are my results (teste performed on CPC 6128 CRTC 1) Pictures of each screen attached in the .rar file. (...)
In pictures of Kris, after ED5D, it does stop also. Only ED test part has some failings :
- ED46:C
- ED4E:C
In WinAPE (by default CPC 6128 CRTC1), ED test does finish its screeen result, with several fails :
- ED46:2
- ED4E:2
- ED66:2
- ED6E:2
Others screens results after does pass. Relaunching once again in WinAPE, same results.
GA: WAIT_n generator - talk about r005.8.16c20
Instruction timing seems all respected following plustest.dsk, but I think it isn't enough, so it's still a candidate version.
Even prefixed and double prefixed instructions are taken into account by my WAIT_n generator.
WAIT_n generator is finally juste inserting a certain number of WAIT_n following executed instruction + a mod 4. In forum if I remembered, you don't do the next mod 4 alignement after a recepted IO_ACK (have to check that)
I try on this version to get an intelligent IO_ACK: in JavaCPC instructions just has a fixed time (array of instruction timing) and IO_ACK does not influence on them. So IO_ACK perhaps has to remove one WAIT_n inserted by GateArray.
I have regressions on this version, Still Rising's scroll, and Trail Blazer palette offset at left.
"RET cc" instruction seems not respecting original timing in T80. I had 2 clocks in last TStates of it.
I don't understand why I have to add 2 WAIT_n when 1 WAIT_n seems suffisant, I think there is some problem around my "PLEASE_WAIT" component (hack of T80's WAIT_n entry), perhaps finally T80's WAIT_n entry is fine, as finally I just insert a certain number of WAIT_n during second clock of M1, MEM_wr slow is unvalidated : Gatearray of Amstrad doesn't have the needed "WR" wire, so.
Next step shall be destroying "PLEASE_WAIT" component I think, in order to add 1 WAIT_n and not 2 with my WAIT_n generator.
T80 WAIT_n has also to be checked, I know that inserting a WAIT_n seems generating a seconde IO_ACK edge during an IO_ACK.
Have to check also the moment IO_ACK is taken into account during M1 signal (I think it's at begin of it, but have to re-check that)
GA: WAIT_n generator - RET cc and WAIT_n timing analysis
Normaly, without WAIT_n generator (even modulo 4), NOP should take 1 M-cycles and 4 T-states, so this instruction should pass using plustest.dsk at 0x00. Does fail here (my approach is incorrect)
In r005.6, cpctest.dsk did pass. Removing MEM_wr:slow both tests does still run fine (NOP/HALT (x00 x76)). Removing full mod4 WAIT_n generator, cpctest.dsk and plustest.dsk does fail. So NOP has to be synchronized ? (plustest.dsk is full of NOPs)
My WAIT_n generator currently passing fully plustest.dsk's testbench is using the bad edge, something is wrong, it's a false positive. I know that NOP, HALT and IO_ACK/INT has to be good for this test to be valid. It's not the case here, so in fact my table of latencies is not the good one : running, but corrections are done on several bad instructions, some of them are even illogical, as this "RET cc" instruction where I had to hack the Z80 itself to reach a passing test, so something is wrong, and that thing is firstly the current WAIT_n generator clock edge (in comparison againts Z80's clock edge)
Have to change my approach, perhaps using invariant (table of full instruction chrono versus reality), validate instruction timing before trying validating IO_ACK interrupts. Write one table from plustest.dsk's testbench launched on WinAPE, and another table from original Z80 documentation, and deduce a first theorical candidate table of latencies. I have to trust first in my instruction timing tables (and have to write them both completely...)
Prefixed instruction seems having only one M1 : Z80 doc show that a prefixed instruction take 4T more time. But last time plustest.dsk did pass thinking prefixed instruction are totally separated (M1 for DD, M1 for CB, M1 for "RLC (IX+D)"), so each prefix has its own M1, and following instruction has also its own M1.
Is IO_ACK itself a separated instruction ? I think not, it's more about a hack of a current instruction, adding two autowait and making its business during this inserted times.
IO_ACK offset into INT (interrupt) should not implicated by WAIT_n generator, and it seems that a WAIT_n during T2 is ignored because of autowait already inserted at this moment... for synchronizing an IO_ACK, I have normaly to insert WAIT_n during T2+2. No way, instruction itself is synchronized, so IO_ACK is synchronized also, you don't have to insert WAIT_n during T2+2.
http://www.cpcwiki.eu/forum/emulators/cpc-z80-timing/ : ~ a WAIT_n does not use RAM access, so does not slow down a "CPC instruction" (hypothesis) - but what about an IO_ACK during NOP in this case ?
In doc, IO_ACK begin after T2, during the two autowait inserted. So no way to detect that an instruction is IO_WAITing before slowing it following "slow down" instruction timing table.
http://www.cpcwiki.eu/forum/programming/cpc-z80-commands-and-how-long-they-take/30/
So, it's more correct to not think of it stretching the M cycle, but instead not starting the next one that requires a memory access until the 4th cycle. If you think of it like this, it also explains the weird exception that happens for interrupt handling. Normally, responding to an interrupt adds 1us on the CPC. That's because it actually just adds 2 T states for the interrupt acknowledge before the next instruction fetch. However, in case where the last M cycle takes 6 T states, the interrupt acknowledge doesn't delay the instruction prefetch and so the usual 1us delay doesn't occur. "Simples!" - ralferoo
(4T equals 1us equals 1 NOP => modulo 4 synchronization of M1)
So you slow down instructions following a slowing down instruction table, slowing it the less you can, and then IO_ACK comes or not, and then you synchronize next M1 putting WAIT_n during T2 modulo 4. IO_ACK two autowaits are not prolongated.
ralferro explains also that stretching instruction timing depends of memory used or not by instruction. I know that Amstrad schematics does not use the MEM_WR wire. So it could be hard to deduce if they added 1 or more WAIT_n for certain instructions. But I'm more about 1 WAIT_n inserted at maximum each time (it's more easy to hard implements), and the modulo 4 synchro, let's see results of my current experiment (comparing time instruction of Z80 and plustest.dsk testbench, deducing diff table of "slowing down instructions") wip.
GA: WAIT_n generator - talk about r005.8.16c29
This version does implements correctly a theorical WAIT_n generator : I used a script comparing Z80 doc timings to plustest.dsk testbench result on real CPC, I deduce that inserting each time 2 WAIT_n does the stuff (that will my first approach), I saw also that first set of instruction timing is fully covered by Z80 doc, so plustest.dsk testbench failing on this part is not due to table slowing down instruction timing (WAIT_n generator's table of slowing down instruction timing), this isntructioon have to be slowed somewhere else : perhaps a bug inside Z80 itself or else the equation NOP/HALT/ACK to review, instructions concerned here seems all about "JUMP" except two instruction (a LD and an EX), in past I did already tested Z80 instruction timing themself and found no problem this way.
I also revisited the edges of WAIT_n generator, to insert WAIT_n at 2T.
I also removed the edge detection of IO_ACK on gatearray, replacing it by state detection of IO_ACK, resulting cpctest's testbench back : this test of HSYNC width is now successfull.
plustest.dsk has some "missing tests" but in fact there are the prefixes : CB, DD, ED, FD, used to launch other areas of instructions.
HALT is the only one instruction that will be always OK on plustest.dsk instruction timing testbench. As this instruction cannot be timed.
Megablasters seems running fine on r005.8.16c29. Using 4 disk version of Megablasters (old one), sometimes bombs doesn't explode, it's normal : in this case you have to press the second fire button (gameplay...)
plustest.dsk source code are available on winape website, I have to explore them.
| Hex | Inst | CPC timing | r005.8.16c29 | 
|---|---|---|---|
| 32 | LD (nn),A | 4 | 3 | 
| 3A | LD A,(nn) | 4 | 3 | 
| C0 | RET NZ | 2/4 | 2/3 | 
| C4 | CALL NZ,nn | 3/5 | 3/4 | 
| C5 | PUSH BC | 4 | 3 | 
| C7 | RST 0H | 4 | 3 | 
| C8 | RET Z | 4/2 | 3/2 | 
| CC | CALL Z,nn | 5/3 | 4/3 | 
| CD | CALL nn | 5 | 4 | 
| CF | RST 8H | 4 | 3 | 
| D0 | RET NC | 2/4 | 2/3 | 
| D4 | CALL NC,nn | 3/5 | 3/4 | 
| D5 | PUSH DE | 4 | 3 | 
| D7 | RST 10H | 4 | 3 | 
| D8 | RET C | 4/2 | 3/2 | 
| DC | CALL C,nn | 5/3 | 4/3 | 
| DF | RST 18H | 4 | 3 | 
| E0 | RET PO | 2/4 | 2/3 | 
| E3 | EX (SP),HL | 6 | 5 | 
| E4 | CALL PO,nn | 3/5 | 3/4 | 
| E5 | PUSH HL | 4 | 3 | 
| E7 | RST 20H | 4 | 3 | 
| E8 | RET PE | 4/2 | 3/2 | 
| EC | CALL PE,nn | 5/3 | 4/3 | 
| EF | RST 28H | 4 | 3 | 
| F0 | RET P | 2/4 | 2/3 | 
| F4 | CALL P,nn | 3/5 | 3/4 | 
| F5 | PUSH AF | 4 | 3 | 
| F7 | RST 30H | 4 | 3 | 
| F8 | RET M | 4/2 | 3/2 | 
| FC | CALL M,nn | 5/3 | 4/3 | 
| FF | RST 38H | 4 | 3 | 
GA: WAIT_n generator - plustest.asm
About 22 pages of source code using 3 columns per page.
.stdinst (launch tests on several list of instruction, some instructions are tested differently using test functions : normtest, testit (testdjnz), rsttest...)
ld a,#c7 call rsttest => C7 RST 0H 4 3
ld a,#cf call rsttest => CF RST 8H 4 3
rsttest seems a nice candidate to explore, as all its tests are failing here.
.times1 (CPC Timing array (first instruction set))
GA: TODO : arnoldemu testbench
arnoldemu testbench: cpctest
forum : amstrad cpc "acid" test => I have uploaded updated tests : http://cpctech.cpc-live.com/test.zip
Tests done here : ppi/psg/cpctest.
In r005.6, I reach successfully some arnoldemu tests to calibrate more efficiently HSYNC interrupt : ppi.bin, psg.bin, cpctest.bin.
Games unlocked by r005.6 : Sigma7, Pac-land, Golden Tail.
In r005.8, Prehistorik is running fine.
In r005.8.4, arnoldemu testbench "cpctest" does fail :/
In r005.8.7, arnoldemu testbench "cpctest" is OK
In r005.8.14 version, using default mode "MEM_wr:quick", is OK. And Prehistorik II is running fine.
In r005.8.16c29, arnoldemu testbench "cpctest" is OK (but it is a wip version :p)
GA: TODO : MODE row buffer
MODE does change at each begin of lines, not at begin of pixel drawn.
Il ne faudrait pas penser que l'on puisse changer de mode plusieurs fois par ligne. En effet. c'est "impossible"! (jusqu'à preuve du contraire, le mode s'enclenche à chaque synchro horizontale (HBL).
https://cpcrulez.fr/coding_logon35-le_gate_array.htm
GA: Moustache testbench
A homemade Testbench done firstly for helping Sorgelig to calibrate it's port of FPGAmstrad into MiSTer. jdvpa6_moustache.dsk
Z80
Architecture of Z80.
Z80: test of a real Zilog 80
 Code name : Z80fx2bb, real Z80@2MHz (instead of 4MHz) on fx2bb extension card.
Code name : Z80fx2bb, real Z80@2MHz (instead of 4MHz) on fx2bb extension card.
[http://www.youtube.com/watch?v=YYnvkR5v3D0]
For it I plug all wires simply from 1 to 40. Some wires are cut, some are Vcc, others GND. Z80 output are directly connected, Z80 input are pull-up with red-red-red resistors (I like red), Z80 is powered 5v (pmod can give 5v using jumper). In fact z80 is so old component that powering it 5v does output 3.3v.
In fact the only difference between T80 of opencore and real Z80 is that T80 runs on rising_edge, and Z80 runs during low state. Test past with little modification of sequencer forcing it to do nothing during low state of z80, resulting a downclock (memory is too overclocked with this sequencer modification), perhaps using buffer on address bus and data bus could solve this detail... but as it runs for me it is not a problem.
Z80: architecture
a) T80.vhdl
17 pages of source codes to read.
Not analyzed yet completly.
Contains the main workflow of Z80: current MCycle and its current TState.
Contains T80_ALU.vhdl and T80_MCode.vhdl components.
b) T80_ALU.vhdl
6 pages of source codes to read.
Not analyzed yet completly. This analyse can certainly be wrong : wip.
Contains flags : C N P X H Y Z S
C : carry - set if result did not fit in the register N : negative? - last instruction was substract P : parity or overflow - overflow example : signed, 7F+7F=FE with overflow setted X : undocumented H : half carry - set if 4bit first bits of result did not fit in the register Y : undocumented Z : zero - set if result is zero S : sign - it is an input ?
Contains ALU_Op : [ADD ADC SUB SBC AND XOR OR CP] ROT BIT [SET RES] DAA
ALU_Op is the basic instructions of Z80 coded here. T80_ALU.vhdl is a slave, a service exposed to T80_MCode.vhdl throw T80.vhdl
[§ Disassembly tables] shall make a cool ALU_Op quick reference card, doesn't it ?
c) T80_MCode.vhdl
First 5 pages, and last 2 pages of source codes to read. Others pages are "always the same" architectually speaking.
Not analyzed yet completly. This analyse can certainly be wrong : wip.
Gives instructions lengh : MCycles, TStates (please remark the 's' at end of theses words...), in [Z80 doc] each instruction is timing described using "M Cycles" and "T States" vocabulary.
It's a "controler" (proof : you have some Set_*_To outputs), does gives orders to T80_ALU.vhdl throw T80.vhdl
Actions of this controler are :
- ALU_Op : the action !
- I_DJNZ I_CPL I_CCF I_SCF I_RETN I_BT I_BC I_BTR I_RLD I_RRD I_INRC : actions not for ALU (wiring input/ouput, changing flags...)
- Save_ALU/PreserveC : an option about register "erased or not" at next instruction
Instructions not coded in T80_MCode.vhdl but in T80.vhdl (strange, barbarian part of code ?) :
- Jump/E/XY Call RstP LDZ LDW LDSPHL Special_LD ExchangeDH/Dp/AF/RS
Inc_WZ register : take a look at [§ The WZ temporary registers. It's a tmp internal register in fact.
Z80: Some bad instruction timing analysis
Based on [WinAPE>download>Plus test>plustest.dsk] testbench, mapped using [Z80 instruction set - ClrHome], instruction described then in [Z80 doc], against [WinAPE] passing testbench timing.
| Hex | Inst | CPC timing | MEM_wr:quick | MEM_wr:slow | remark | 
|---|---|---|---|---|---|
| 02 | LD (BC), A | 2 | 3 | Fixed on r005.8.16c3 | |
| 10 | DJNZ, e | 4/3 | 4/2 | 4/2 | T States begin by "(5, " : M1 is longer than 4. Seems adding also one Wait_n in this case (as about MEM_wr) | 
| 12 | LD (DE), A | 2 | 3 | Fixed on r005.8.16c3 | |
| 22 | LD (nn), HL | 5 | 4 | 6 | Fixed on r005.8.16c3 | 
| 2A | LD HL, (nn) | 5 | 4 | 4 | MEM_WR not used by here, it seems correct following doc : 4+3+3+3+3=16, 16/4=4. Damn. | 
| 32 | LD (nn), A | 4 | 5 | Fixed on r005.8.16c3 | |
| 34 | INC (HL) | 3 | 4 | Fixed on r005.8.16c3 | |
| 35 | 3 | 4 | Fixed on r005.8.16c3 | ||
| 36 | 3 | 4 | Fixed on r005.8.16c3 | ||
| 70 | 2 | 3 | Fixed on r005.8.16c3 | ||
| 71 | 2 | 3 | Fixed on r005.8.16c3 | ||
| 72 | 2 | 3 | Fixed on r005.8.16c3 | ||
| 73 | 2 | 3 | Fixed on r005.8.16c3 | ||
| 74 | 2 | 3 | Fixed on r005.8.16c3 | ||
| 75 | 2 | 3 | Fixed on r005.8.16c3 | ||
| 77 | 2 | 3 | Fixed on r005.8.16c3 | ||
| C0 | RET nz | 2/4 | 2/3 | 2/3 | RET cc, inverse of RET z. | 
| C4 | 3/5 | 3/6 | Fixed on r005.8.16c3 | ||
| C5 | PUSH bc | 4 | 3 | PUSH qq (same as F5), ok using MEM_wr:low | |
| C7 | RST 00h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3. T States begin by "(5, " : M1 is longer than 4. | 
| C8 | RET z | 4/2 | 3/2 | 3/2 | RET cc, it seems correct following doc: true@5+3+3=>3*4; false@5=>2*4. T States begin by "(5, " : M1 is longer than 4. | 
| CC | 5/3 | 5/3 | 6/3 | Fixed on r005.8.16c3 | |
| CD | 5 | 6 | Fixed on r005.8.16c3 | ||
| CF | RST 08h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3 | 
| D0 | RET nc | 2/4 | 2/3 | 2/3 | RET cc, inverse of RET c. | 
| D4 | 3/5 | 3/6 | Fixed on r005.8.16c3 | ||
| D5 | PUSH de | 4 | 3 | PUSH qq (same as F5), ok using MEM_wr:low | |
| D7 | RST 10h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3 | 
| D8 | RET c | 4/2 | 3/2 | 3/2 | RET cc | 
| DC | 5/3 | 6/3 | Fixed on r005.8.16c3 | ||
| DF | RST 18h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3 | 
| E0 | RET po | 2/4 | 2/3 | 2/3 | RET cc, inverse of RET pe. | 
| E3 | 6 | 5 | 6 | Fixed on r005.8.16c3 | |
| E4 | 3/5 | 3/6 | Fixed on r005.8.16c3 | ||
| E5 | PUSH hl | 4 | 3 | PUSH qq (same as F5), ok using MEM_wr:low | |
| E7 | RST 20h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3 | 
| E8 | RET pe | 4/2 | 3/2 | 3/2 | RET cc | 
| EC | 5/3 | 6/3 | Fixed on r005.8.16c3 | ||
| EF | RST 28h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3 | 
| F0 | RET p | 2/4 | 2/3 | 2/3 | RET cc, inverse of RET m. | 
| F4 | 3/5 | 3/6 | Fixed on r005.8.16c3 | ||
| F5 | PUSH af | 4 | 3 | PUSH qq, 5+3+3=11<3*4, is MEM_WR prologation effective two times here 1T+1T? yes: pushing a register pair here, ok using MEM_wr:low | |
| F7 | RST 30h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3 | 
| F8 | RET m | 4/2 | 3/2 | 3/2 | RET cc | 
| FC | 5/3 | 6/3 | Fixed on r005.8.16c3 | ||
| FF | RST 38h | 4 | 3 | 3 | RST p. Fixed on r005.8.16c3. | 
CC codes : all ok DD codes : somes ko ED codes : somes ko FD codes : somes ko DD CB codes : all ko FD CB codes : all ko
r005.8.16c4 results :
r005.8.16c6 results :
Z80: Some bad instruction analysis
Based on [Zexall: Z80 instruction set exerciser], running fine in JavaCPC.
Z80: ED A9 cpd(r) / ED A1 cpi(r)
Problem here : CPDR and CPIR has same implementation than CPD and CPI.
Z80: TODO : cpc-power testbench
CPC-Power Z80 FULL TEST (UK) (2012) - UTILITAIRE
Some errors detected in r005.8.4 (test done by Philippe D.)
Z80: TODO : winape testbench
WinAPE plustest.zip (including Instruction and Interrupt timing tests)
DSK
It's data, insertion of disk.
DSK: Another disk selector
In first version of FPGAmstrad (NEXYS2) I used switches for disk selection. As final FPGA platform doesn't have any switches set, I have to add an BASIC instruction for it, something like "OUT &CAFE,disk_number" could be fine.
Since FPGAmstrad in NEXYS4, disk selection is done from keyboard, using "OUT &CAFE,disk_number" instruction. A reset key was added also. "PRINT INP(&CAFE)" does print current disk selected number.
DSK: FAT32 fragmented files support
Since advanced FDC, dsk files have to be defragmented. Only ROMs are safe with a not defragemented sdcard...
ZX-Uno is using simple FDC, not impacted here.
DSK: TODO : arnoldemu testbench
arnoldemu testbench: fdctest
arnoldemu's testbench to pass : test/fdctest/fdctest/fdctest.dsk
Contains the DFC "SK bit" test.
Have also to fix theses "Bad Command" responses from fdc (it seems that when you don't reach a track, you have to send back the current track instead of this "Bad Command" signal). Test : 30YMD demo, "disk change" message not running correctly, "another disk inserted" is not detected in this demo.
arnoldemu's testbench results :
CoreAmstrad r005.8.15
- 27FAIL01/29FAIL01 : read_track6/read_track10 - very big sector size counter not implemented (more than 512B)
- 3DFAIL/45FREEZE : read_data_ov/test_write_ov - using flag simpleDSK.IS_ARNOLDEMU_TESTBENCH=false this test will fail/freeze in final version. It will not be implemented (does slow down some demos : 30YMD/Batman)
- 41FAIL06 : test_write2 - does corrupt the testbench itself (writing a deleted mark in testdisk.dsk file) using flag SDRAM_FAT32_LOADER.IS_ARNOLDEMU_TESTBENCH=false this test will pass in final version, one time :)
- 51FAIL02/52FAIL01 : bad5_cylinder/bad6_cylinder - writing data without data is not implemented
- 59FAIL01 format1 - format command not implemented
- 5EPASS : check_dtl3 - does pass but well to know that a dtl write less than sector_sector_size will not be taken into account (due to write per block of sdcard)
- 60FAIL01 : format2 - format command not implemented (this test is slow)
arnoldemu's second testbench
http://www.cpctech.org.uk/test.zip arnold test last update. Folder disc/, tests : "seek, recalibrate, sense interrupt status, sense drive status, write protect"
PPI
Keyboard detection versus VSYNC signal versus interrupt cycle.
PPI: A better PIO
I'm looking after a great implementation of PIO, in original schematic of Amstrad, keyboard (output, not input) is mapped behind Yahama chip behind PIO. In some emulators, keyboard is mapped directly behind PIO. In original schematic, PIO is the only one component having a low state reset (0), I think that imply a 0 value as state init of internal components variable. Data bus of Z80 seems having a pull-up state (read 1 when nothing is plugged), for example a unplugged ROM does respond xFF in data-bus.
PPI: Yamaha clock
In r005.5 I build the Yamaha clock from GA. Unlocking "Saboteur 2" game.
Yamaha clock is generated by GA.
Yamaha clock (YM2149_linmix_AmstradStereo.vhd) is used only for "sound algorithm", not for setting/getting registers (registers are set using "BDIR BC2 BC1" wires), so I have to overclock the setting/getting register clock to simulate the original behaviour...
PPI: PPI clock
PPI in original schematic does not have clock ! So I have to overclock this one to simulate the original behaviour...
Overclocked at 16MHz.
PPI: arnoldemu testbench
arnoldemu's testbench PPI passed.
SOUND
PWM.
SOUND: PWM
Using a simple PWM, data is entered at a certain speed, the PWM clock speed.
If you simulate a constant PWM output signal at middle range of voltage (state just between 0V and 5V : 2.5V), it results an alternance of 0V and 5V, that result in a noise sound. In Arkanoid, this defect make some continues sounds instead of silents...
My idea is generating a sound having a frequency upper than dog ultra sound, while I want to simulate a constant 2.5V.
For this I do use two clocks entries in my PWM : one about data entry, and another about algorithm execution.
This result a high quality sound output (in addition to this nice Yamaha sound chip from fpgaarcade)
SOUND: Stereo sound output
Sound chip was modified in order to get channel A+B at left, and channel B+C at right. It was tested OK using STarKos 1.21 sound tracker (track "Carpet")
In r005.8.14.1 STarKos does feel better using parameter "MEM_wr:slow" in OSD menu.
run"stk / esc / enter / enter / => / enter / space (wait) / esc / ctrl+F2 / \/ (bottom arrow) / space
(ctrl+F1 to go back into the disk menu)
STarKos seems running PERFECTLY using A-Z80 instead of T80, please do contact me if you want a personalized fork version of CoreAmstrad using A-Z80 (I have just to switch a parameter : USE_AZ80:boolean:=false; in FPGAmstrad_amstrad_motherboard.vhd)
Agile method
This project results of an experiment applying Agile method. Finally this project has taken 5 months. The result is a standalone platform that can run several games of Amstrad. Normally, I had to dedicate 2 months on this project, but as result was so great, I continue to a standalone and better version.
This project was done for my father birthday, so sorry that I can't deliver it yet :^)
One day perhaps I'll write one book, or write a lot of wiki page by there, presenting step by step this adventure :)
But I want really to validate project before doing it. So it will stand a few I think.
Minimal Amstrad Architecture: Build your own Z80 Amstrad Computer
I explain here my first great experiment, having Amstrad saying hello :)
First schematic: Z80+RAM+ROM
Z80 can address from 0x0000 to 0xFFFF.
RAM is from 0x0000 to 0xFFFF.
You have lower and upper ROM, so starting at address 0x0000 you put OS464.ROM, and at address xC000 you put BASIC1-0.ROM.
- When Z80 do READ MEMORY, you read ROM
- When Z80 do WRITE MEMORY, you write RAM
- When Z80 do WRITE IO, you do nothing
- When Z80 do READ IO, you response it DATA=0x00
When you run this schematic on FPGA, RAM changes!
Second schematic RAM+VGA
With JavaCPC, when you snapshoot, and hex edit result file, you see RAM content starting at a certain address.
Do "paper 2", "cls" on JavaCPC, the screen became RED, and then save a snapshoot, you can see that last part (from 0xC000 to 0xFFFF) had change from a lot of 0x00 into a lot of 0xFF
So last part of RAM is used for video (it's shown on Quasar [[30]] and other legend websites...)
For making my VGA module, I take a look at UNIX "modeline" command that give us all timing for VGA signals, and it run :)
After having a VGA module displaying a RED screen (yeah!), I made it scanning last part of RAM (from 0xC000 to 0xFFFF), and I solved the puzzle.
RAM contain lines of 0xFF, each finishing by 0x00, but lines are not in great order
Third schematic Z80+ROM+RAM+VGA
Goal is: RAM empty at startup, VGA displays hello after run.
So you put the two last schematics together and tadam... got a problem.
The problem is that two components are accessing RAM at the same time: the Z80 and the VGA, so you had to make a sequencer. A sequencer is simply a counter fed by a clock: 00, 01, 10, 11. And you manage work task like this:
- 00 RAM WRITE start from Z80
- 01 RAM WRITE end from Z80
- 10 RAM READ start from VGA
- 11 RAM READ end from VGA
You plug sequencer(1) on z80 clock and not(sequencer(1)) on VGA...but another problem appears: VGA uses 25MHz speed for scanning RAM. So Z80 has to use same speed xD
To solve this problem you can use a special RAM done for this problem, a RAM that you can WRITE at a certain speed, and READ at another speed, this magic component is called ramb16_s16_s16. Note that they have no problem to write simultaneously on two RAM components, so that you can dump video RAM content using starter kit's external RAM, and you can display VGA using FPGA'z internal ramb16_s16_s16 RAM.
First and Second schematics video
http://www.youtube.com/watch?v=9Y-RvYMxnbE
Here I do program FPGA with a serial port RAM filler (homemade), and then I do upload ROMs (sooo long, you can show several blue progress bars), and then I do program FPGA first schematic : Z80+RAM+ROM, executing it, and then I do program second schematic : VGA is plugged on FPGA platform and does display "Ready".
Just about the "RAM filler", filling a RAM with data is perhaps the more important thing* about using FPGA platforms, do just remember that a RAM does keep its content as long as voltage is entered on it. So you can program FPGA without erasing RAM content. So you can program three FPGA programs : one for filling RAM, one for running a program using RAM, one for dumping RAM. Here I did use a serial transmit, but in fact, in last versions of FPGAmstrad, I do use a SDCARD with a homemade bootloader (filling RAM from SDCARD content)
I did use also Mock components, in first version of FPGAmstrad, FPGA platforms was too small, so I could have FPGAmstrad only with cool sound and low graphics, or else FPGAmstrad with cool graphics and no sound. Both using 100% of this old FPGA platform. A Mock component is a fake component, an empty one, just telling back "I'm OK, please thrust me I do exist" to others components trying to communicate with me (it's a entity with same in out ports as original but using constant output values)
- * [#RAM_dump]
Third schematic Z80+ROM+RAM+VGA video
http://www.youtube.com/watch?v=w_wifI-bRJc
The three main FPGAmstrad schematics
FPGAmstrad_bootloader_sd schematic
After having a first running Amstrad, I had to turn it into as standalone version. In fact before this step Amstrad ROM was put into RAM using serial port (#RAM_dump), it was slow, and Amstrad ROM was lost when I unplug electricity.
Bootloader FAT32 SDCARD is the only component playing with sdcard. Its tasks, all launched at boot, are:
- deploying ROM file on physical RAM
- deploying Nth DSK file on physical RAM, N being the binary number selected by 8 switches
State machine
Both components of Bootloader, it is to say SPI_MASTER and SDRAM_FAT32_LOADER components, does use several state-machines, one state-machine per process, each process communicating with another one using "MASTER/SLAVE" : the master state-machine does ask a slave to do something, and slave does notify master when its task is finished.
Using VHDL, I implement state-machine using a simple "switch case" on an integer. and before break I just change (increment...) this integer variable value, changing line of "switch case" this way. This "switch case" is encapsulated on a "if do/done do/done do/done" instruction. "do" being a boolean from MASTER, and "done" being a boolean from SLAVE. Each MASTER against SLAVE component has a "do" (input if SLAVE component, output if MASTER component) and a "done" (input if MASTER component, output if SLAVE component) wire. That's all. Like this you can run several sequential instructions, like reading and interpreting severals FAT32 variables using a SPI slaved component solving "read one byte at this address" instruction writen under a really low-level SDCARD protocol language.
Theses state-machines does use led debug : an integer contains the state of state machine, and this integer is displayed on 8 leds so you know where you are, it's for that I add several crash states in order to understand why and where component does crash. On MiST-board, this is displayed using the five 7-segment I just added in OSD, I added also an personalized "OSD menu entry" in order to select one or another state machine during first phases of MiST-board's version of this project (that's why you can still see a mysterious 7-segment still displayed at bottom of OSD, it's used sometime for debug purpose)
FPGAmstrad_amstrad_video schematic
VGA
The main component of this schematic is called aZRaEL_vram2vgaAmstradMiaow, due to my first experimentation about drawing a picture on VGA screen.
VGA display component does use the same parameters than unix modeline command, that's all you need, with that parameters you can display something on VGA at the frequency/resolution you choose.
VRAM to VGA
RAM and VGA does not use the same frequency. I add between them a magical VRAM having two clock entries and solving this problem automatically.
The magic RAM in FPGA, getting two clock entries, is not as magical as I was thinking : in fact it does solve clock equations using the clock manager (DCM) and BUFG components (saying phase is freedom between input and output). If you want a set of clocks synchronized do not add a BUFG in one of its wires. If you don't care about synchronize of two clocks, just add it and then it will help to solve finer and greater the clock manager equations of DCM while compiling.
If you seem interested about strange clocks generated during last step of FPGA compile, do look after "time constraints file" and "timing closure".
FPGAmstrad_amstrad_motherboard schematic
This is the core part of FPGAmstrad, it does represent the motherboard schematic of Amstrad, it was aligned to JavaCPC source code.
I fill this schematic component per component, comparing behaviour to JavaCPC components.
JavaCPC is developed in Java, and Java is so cool (Java is better computer language ever, and VHDL is better FPGA language ever =P)
I tickle JavaCPC in order to compare its components to my ones.
Emulator Architecture
 while (¡stop_emulation)
 {
   executeCPU(cycles_to_execute);
   generateInterrupts();
   emulateGraphics();
   emulateSound();
   emulateOtherSoftware();
   timeSincronization();
 }
 Figura 2. Basic Emulator Algorithm.
Extracted from the book [[31]],
Using this way, emulators reach a better running time. They don't need to implement the system-bus architecture[[32]] (CONTROL DATA ADDRESS) crossing Von Neumann architecture[[33]] (CU ALU MEM IO).
Component Architecture
Java is an object language, so having new, set, get, for each of its objects.
A Component Architecture in object language has a special cycle life :
- Build all components (new new new new new)
- Plug all components together (set set set set set)
- Run a main component.
Main component is Z80 on JavaCPC. In fact, JavaCPC's Z80 is already configured in order to run each instruction with a certain timing : a timing already synchronized with CRTC (each instruction takes 4 Tstates or 8 Tstates, Z80@4MHz CRTC@16bit@1MHz so drawing 8 colored pixels on mode 1 takes 4 Tstates)
My main component is #Clock_sequence
In real Amstrad, main component is GateArray.
The fact of choosing Z80 as main component just respects the emulation architecture.
Java debug mode
You can run JavaCPC on debug mode in Eclipse, and insert breakpoints.
It's useful for listening wires, and cut them. You can in live pause debug, cut a function and continue run.
Cut a wire, cut a function
Wires are done for sending message, a message in programming is a function call.
When we cut an input wire, we generally plug it to GND or Vcc.
For cutting a function, you have to insert a cut on it. A cut it's a return. You can insert a (very bad) forcing cut as:
if (1==1) return 0;
everywhere. So function is ended at this moment and next lines became death code. It exists quality code program for checking death code, because it's generally a bug of development, normally we put code in comment.
It is the way I used in order to induce JavaCPC, comparing it with my project.
Clock sequence
When we have to make several components to communicate in a perfect timing, making a sequencer is a nice approach : "It is now your turn to do something".
Clock sequence : first try (prototype)
Original Gatearray of Amstrad is a sequencer (counter plugged with a clock), it manages synchronization between video card and z80 and memory access.
Historically there is a link between CU of CU/ALU, and... control bus and... how making your own sequencer. But I will say no more in order to not disturb these text part xD
Whatever, I made my own sequencer here in form of a bus of 4 wires called CLK4. CLK4 executes a simple repetitive sequence like 0001 0010 0011... CLK4(3), the last wire is directly connected to Z80 clock entry. Components not using explicit CLK4 as clock entry are generally using a not(CLK4(3)) entry, in order to do operations not as same time than z80.
Real Amstrad uses buffer memory in front of each address and data access, and real z80 is clock low state active. Normally if you follow datasheet of z80 you know how to map memory following CU comportment. Or you do as Amstrad, saying that z80 CU sucks, I create my own sequencer, managing all my memories access, alternating CRTC work and z80 work with little synchronization, inserting by the way more pixels that can support my small CRTC...
How to use a sequence in VHDL :
if rising_edge(master_clk) then if seq="00" then elsif seq="01" then elsif seq="10" then else end if end if
What not to do :
if rising_edge(seq[0]) then end if
Because that can auto-generate bad unwanted sub-clocks...
Clock sequence : under time constraints (quality)
In fact, it's better to create your clock sequencer wiring each CLK and not(CLK) directly from DCM, in this case you enter in time constraints norm, and then rules/checks are done on every _edge instruction. Choosing only one sort of _edge (rising or falling) seems better also. Using that way you just have more "bad compiling error" shown, helping you creating a better code (more stable/quality).
Clock sequence using a counter plugged with a clock was in fact a bad practice (but running fine in my first versions of FPGAmstrad as I'm a good blind developer), because output are not under clock constraint : just think about that a "not" component added just after a clock wire is a Time Constraints bad practice... destroying "time constraint" solver (the one telling you when your clock domains are bad (and why), "time constraint" is last step of FPGA compiling process, it is an important step about quality, it shall be respected (generaly in a very last development effort, I shall say in a deploy effort))
Clock sequence : mirror VRAM (performance)
In order to get a better external RAM performance, and getting more luck about porting my project into others FPGA platform, I do now use a "Mirror VRAM" : external is just used by Z80 read and write (no more clock sequence finally ^^'). And a write in video RAM zone (like "poke &C000,255") does just write also in another parallel RAM, a FPGA internal RAM, that I call VRAM, this VRAM can be written at a certain speed and read at another for VGA purpose (FPGA internal RAM can be used like that)
USB Joystick
Before learning final platform and its embedded controlers (USB joystick with a controler, is just 7 wires : left right up down buttonX buttonY buttonZ), and after having destroyed 12 collector original joysticks during tests... I did some research about simply connecting a modern USB joystick into FPGA. It was a part of my Agile Method run, I worked about two months on it.
http://www.youtube.com/watch?v=5BERbI2kyfM
Sniffing USB frames
USB uses two wires in order to transmit frames, green and white, each with two logical values: 0v and 5v.
Let's plug a joystick on PC, if you listen at its two wires, you can sniff a USB transmission. Finally you can save it for example on RAM.
These two wires can be traduced into one with four states: 00 01 10 11.
One of this states is sleep state, in fact it depends on USB mode you use.
USB mode: USB1 or USB2; low speed, full speed or high speed
For sampling, I speed up five times the saving speed on RAM. I succeed sampling an USB1 transmission: "Logitech dual action USB joystick", and an USB2: "Sony PS3 USB joystick". PS3 joystick is not stable enough with my FPGA, but Logitech joystick is correct.
http://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/USB_logitech.vhd
http://www.youtube.com/watch?v=2zEp1tHroBs
http://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/USB_ps3.vhd
http://www.youtube.com/watch?v=fh4v4OXridc
USB is just a state machine (welcome how are you today, show me your state, show me your state, show me your state....), encoding (have to read USB manual), you can use some usb sniffer softwares to decode them (wireshark unix version does it fine). Sniffer software does not show low level messages (ack ko ok) but does show the high level messages (ones that show that a button is pressed or not)
As it is just encoding, you can capture signals and show that they differ only when you do unpress or press a button.
pull up and pull down
If you respect USB protocol, you have to plug some pull-up and pull-down resistors and some capacitors. But as I am a bad electrician, I just simulate then in VHDL, they are important because they cause USB speed negotiations. You also have an electronic mechanism in order to detect presence of joystick plug, I don't care about it.
For reaching which wire you have to pull-up or pull-down, here the tips :
- For slave (ideal for sniffing) : just take your USB1 joystick without plug it, just supply it (+5v red, 0v black), and test while-black and green-black with voltmeter, if you have got 5v then put a VHDL pull-up, and if you have got 0v then put a VHDL pull-down.
- For master (ideal for creating a mini-host) : just take your PC USB1 port, and test white-black and green-black with voltmeter, if you have got 5v then put a VHDL pull-up, and if you have got 0v then put a VHDL pull-down. Normally you result two pull-down.
Synchronize, decode and check USB frames
One time sample is done, it is not readable. In fact USB frames are synchronized (they started with a certain synchronization pattern), encoded (NRZI), and checked (CRC). CRC type depends on frame length. Encoding is done for synchronization optimization.
Then using USB HID manual, you can understand type of frames, and author of them, and remark that the author alternates: USB master (PC) or USB slave (joystick)
You can use some "USB sniffer software" in order to understand more easily some frames contain, but they generally don't give all frame, and full frame.
great crc check example in perl - offered by www.usb.org
Build a minimum USB master frames state-machine
Let's just plug a USB joystick on FPGA, directly, permanently, thinking about minimum coding size : we can't implement full HID USB protocol on FPGA ^^'
Objective here is to build a minimum state-machine graph, having for transaction between state a "frame transmission". It is normal on USB protocol to have error of transmission, so you have also to put "error frame transmission" on the graph.
At stabilization, you finally switch between two states, one sending a certain frame that contains at different offset simply certain values of joystick button.
At start, some frames are employed for "next frame description", they can generally be ignored, as our USB architecture is fixed and minimal (one USB joystick, that's all)
go further with USB sniffer
A better way to snif USB could be generation of TCP/IP packets encapsulating USB packets, and to record them directly on PC from a RJ45 plug, using this way I could save more than 10 seconds of information transmission (RAM size is limited on FPGA platfoms)
http://www.ulule.com/usb-paf (unfunded) => but MiST-board final platform does offer USB pro competition Joystick compatibility <3 <3 <3
A fork of USB Joystick by The EMARD
A fork of this minimalistic USB Joystick controler by The EMARD, going further :
http://github.com/emard/fpga-usbhid-host
Platforms
Why NEXYS2 500kgates starter kit
Xilinx schematics
Xilinx webpack software permit drawing schematics as book schematics, My point of view is : "For programming a FPGA, you draw a schematic as old books and just press one button. Each component on this schematic can be edited, in a language called VHDL".
My source code is not Altera compatible because of schematics drawn, but webpack can export vhdl code from schematics if you want.
RAM dump
A starter kit that contains a RAM component, that you can dump separaly : you can change schematics without loosing RAM content ! - and so write a schematic for dump only ;)
While power is on you can:
- programming FPGA with a program/schematics done for filling RAM
- press reset button
- programming FPGA with a program/schematics done for using RAM
- press reset button
- programming FPGA with a program/schematics done for reading RAM
- press reset button
My own made program does it with poor serial port, so for dumping all RAM content it takes about 3 hours, and for dumping Amstrad RAM part it is about 15 minutes.
On [Diligent NEXYS2 official page], you can download a "Onboard Memory controller reference design" that contains explanation and VHDL source code about dumping on RAM/ROM of NEXYS2 directly from PC (usb port). I didn't tested this yet, but it is certainly a nicer approach :P
FPGA internal RAM size
It's to know that a FPGA chip contain 45KB internal RAM (360Kb for NEXYS2 500k-gates, and 504Kb for 1200k-gates) so you can't insert a dsk inside. This internal RAM is already used in part by T80 (z80 from opencores), by the soundchip, and for special RAM ramb16_s16_s16 (RAM with two different speed one for writing another for reading, in fact two RAM with a common part) that I use for VGA mode.
VHDL components size
T80 (z80 processor) take 100kgates
Yamaha sound chip (from fpgaarcade) take 50kgates
InterruptGenerator + VGA mode take 50kgates
Bootloader (for standalone) take about 120kgates (FAT32 protocol, SPI protocol, DSK protocol)
Actually the project take about 99.9% of 500kgates. But I think that TV mode will take a lower size. A bigger size shall be great for Amstrad CPC Plus version, if JavaCPC evolve, and then if I evolve ;)
Why MiST-board final platform
Final version of FPGAmstrad
Altera schematics
Altera does also permit drawing schematics. I love schematics, my top file -gluing components- is sure a drawn schematic.
USB competition-pro Joystick
my favorite one <3
SDCARD entry robust
SDCARD player is nice built. It is not destroyed after 30 insertions. It is also easy to program FPGA : I just have to put my files into a SDCARD, and it runs, that's all.
Metal case
It's a true final platform.
Why ZX-Uno platform
Jepalza port
Jepalza has ported FPGAmstrad on it, A lot of thanks Jepalza !
Same FPGA as NEXYS2 500kgates starter kit
It's the opportunity to update the original simple prototype schematic.
low-cost FPGA
simple and over-documented
As the original, it is using simple components :
- simple VGA: it is using a 640x480 centered VGA display at 60Hz
- simple DSK: a dsk here is simply flatten into RAM parts
- simple bootloader: the bootloader is read-only, loading data using SPI protocol, and slave of a FAT32 state machine deploying this data into RAM just before turning on Z80.
- simple disk selection: the first disk is inserted at boot, and the "page-up" bottom does reset+insert the next disk.
- simple GateArray : CRTC0 only
and is over-documented... here !
Xilinx schematics
Schematics, as on original, are quite small, except the motherboard on that is comparable to original CPC motherboard schematic.
fork and merge
This version of FPGAmstrad is a 2011's fork of NEXYS2's FPGAmstrad, merged with last validated components of MiST-board version. This way no useless options are added, and the source code stay clear !
MiST-board - Core Developer's Notes
Here, you'll find all the Amstrad MiST Core development strategy : deployment of FPGAmstrad project on this lovely MiST-board final FPGA platform.
You'll find also the Amstrad MiST Core source code.
Goto MiST-board : CoreDocAmstrad if you want to test final version of CoreAmstrad running on MiST-board platform (a final-user platform)
From Xilinx to Altera schematics
For me, global schematics are really important, for developing and for deploying. A schematic developed in order to be comparable to original documentation schematic is nice. FPGAmstrad is composed of 3 schematics :
* amstrad_motherboard : comparable to original Amstrad schematic. * amstrad_video : does manage a true VGA output, using an internal VRAM. * bootloader_sd : sdcard bootloader, in order to load ROM and dsk at boot, from sdcard.
As Xilinx schematics are not compatible with Altera, I do generate "vhf" files, and rename them :
* FPGAmstrad_amstrad_motherboard.vhd * FPGAmstrad_amstrad_video.vhd * FPGAmstrad_bootloader_sd.vhd
And then I make a global schematic in Altera, that contains the previous components, and several MiST-board controlers :
* FPGAmstrad_amstrad_motherboard.vhd * FPGAmstrad_amstrad_video.vhd * FPGAmstrad_bootloader_sd.vhd * sdcard.v * user_io.v * data_io.v * sdram.v * osd.v
I create then a main clock component, generating all clocks I want even the not clocks (a good practice, while using "time constraints"), I also add some adapters, about wire/bus range solving :
* MIST_SDRAM.vhd : each SDRAM has a different RAM bus size * MIST_DQM.vhd : just a small wiring helper * MIST_RGB.vhd : each VGA output has different count of colors * MIST_STATUS.vhd : mapping status wires * CONF_STR.vhd : generating OSD parameter
Xilinx to Altera
While generating vhf files from Xilinx Schematics, a lot of small components have to be adapted :
* INV component became 'not' instruction * AND2 component became 'and' instruction * OR2 component became 'or instruction * GND component became '0' value * VCC component became '1' value
The internal RAM, sync and async (with one or two clocks) are to adapt, I use mem_altera_gen.vhd for this purpose.
Special things done during deploy
sdram.v is personalized in order to solve address **after** the write or read event. It is due to Amstrad that permit writing in RAM hidden inside ROM : if I read I read ROM, if I write I write RAM.
sdram.v is also personalized in order having a clkref lower than 4MHz.
RAM optimization
Some efforts done about internal RAM.
As I could not do enter my 32KB VRAM, I used a 16KB+8KB+4KB VRAM, to display the 640x480 output, scanning my 800x600 VRAM : bottom of my VRAM is useless for a 640x480 display.
At this step, I have 0KB of internal RAM free. Now let's do appear 16KB more in order to deploy fully my FPGAmstrad project !
RAM inferred : [in Altera reg is inferred into RAM-block](http://quartushelp.altera.com/13.0/mergedProjects/hdl/vlog/vlog_file_dir_ram.htm) (more), so a reg written like this :
reg [7:0] dir_entry_reg [31:0]
became RAM-block.
In data_io.v : dir_entry_reg does use /**synthesis noprune**/ in order to be not unwired. If I remove output dir_entry_d, RAM-block is inferred. If I let output dir_entry_d, LOGIC-block is inferred.
So I removed output dir_entry_d and set :
(* ramstyle = "logic" *) reg [7:0] dir_entry_reg [31:0] /* synthesis noprune */;
So I continue winning my 1KB internal RAM-block (here 256 Bytes was needed and turn into inferred LOGIC-block)
In sdcard.v
reg [7:0] buffer [511:0];
is a big reg and really important one (speaking to ARM SPI !), so I let it inferring into RAM-block. But about cid and csd I does :
(* ramstyle = "logic" *) reg [7:0] cid [15:0]; (* ramstyle = "logic" *) reg [7:0] csd [15:0];
Winning 2KB of internal RAM for this small 128Bytes reg :)
In VRAM_Palette, I had 16KB. But in fact a raster line is a 2+16+1 RAM palette line, so each line I store 19KB, so in fact 19*600/2=5700 bytes (800x600 VRAM in fact thruly 800x300). So only a 8KB RAM palette only was needed in FPGAmstrad project finally.
At this step I won 10KB of internal RAM (I need 6KB more to succeed in my full FPGAmstrad deployment)
I can nibble 2KB more at end of RAM palette, that's what I does.
Now I have 12KB of internal RAM free :)
And 4KB in VRAM :
800x600=100*300=30KB full 800x480=100*240=24000 24000-16384=7616<8KB=8192
so VRAM with a start vertical offset can be composed of 16KB+8KB only. So I won my last 4KB here.
I did patch my simple_GateArrayInterrupt component by parametering vertical offset :
GA_interrupt : simple_GateArrayInterrupt
generic map (VRAM_Voffset=>38*8-30*8-4*8+4 +0 +15) -- MiST +15 ?
I also patched my aZRaEL_vram2vgaAmstradMiaow component by parametering vertical offset :
XLXI_476 : aZRaEL_vram2vgaAmstradMiaow
generic map (VOFFSET_NEGATIF =>0, -- MiST 0 VOFFSET_PALETTE=>0) -- MiST 0
Now I can use my 16KB free RAM in VRAM double buffer. Reaching a full FGPAmstrad project deploy on MiST-board, unlocking others games : it is what is done in realise 002 of Amstrad core. I tested ChaseHQ does now run fine.
ZX-Uno - Core Developer's Notes
Why I destroyed the PPL
NEXYS2's FPGAmstrad version is using a PPL (a DCM : Digital Clock Manager), just for half part of clocks generation.
Then comes the sequencer (the counter used to divise time) that does not respect "Timing Contraints good practice", forcing then adding a "I dislike good pratice" sentence on .ucf file like that :
IN "XLXI_512/XLXI_579/COUNT_1_BUFG.O" CLOCK_DEDICATED_ROUTE = FALSE;
Having half of clocks generated by a PPL results on a project running fine one time on both compilation : you add some normal lines of code, and then you toss a coin.
Then I tryed, as on MiST-board version to manage all clocks from an unique PPL (good practice !), centering all clocks on one component, removing counter and also all logical "NOT" on clocks wires (good practice !), resulting then... in a electronic circuit that does not enter inside my FPGA chip. Damn.
So I go back to dark side, removing PPL. Recalibrating all clocks (this time "rising_edge against falling_edge" instead of "same edges" per component's process), and thinking "no more Time Constraints, no more problems around". And you know what ? I got that :
WARNING:Route:464 - The router has detected a very dense, congested design. It is extremely unlikely the router will be able to finish the design and meet your timing requirements. To prevent excessive run time the router will change strategy. The router will now work to completely route this design but not to improve timing. This behavior will allow you to use the Static Timing Report and FPGA Editor to isolate the paths with timing problems. The cause of this behavior is either overly difficult constraints, or issues with the implementation or synthesis of logic in the critical timing path. If you are willing to accept a long run time, set the option "-xe c" to override the present behavior. Intermediate status: 929 unrouted; REAL time: 3 hrs 35 secs
Damn, 3 hrs 35 secs of compiling... Then I used my brain and think that it is trying to stupidly clocking my "reset_key" wired between my keyboard clock and my bootloader clock... so I had this set of instructions inside SDRAM_FAT32_LOADER.vhd :
attribute keep : string; attribute keep of key_reset : signal is "TRUE"; attribute clock_signal : string; attribute clock_signal of key_reset : signal is "NO";
And tadam, less than half of hour to compile now ! and on a determinist way.
This formula does run also on bus (dsk_info bus wire coming from SDRAM_FAT32_LOADER to simple_DSK)
Internal FPGA RAM (VRAM) config
The internal dual RAM (written at 4MHz by Z80 and readden at 25MHz by VGA) are configured as "WRITE FIRST", "READ DOESN'T CARE".
aZRaEL_vram2vgaAmstradMiaow.vhd (the VRAM to VGA output part) has several manual counter offset calibrations, called "bug_*", it seems this component does not know counting right when reaching 25MHz (in fact it is, "mod" instruction does suffer a lot by here)
palette_D and aZRaEL's counters derailment
Compiler does detect when somes wires of a bus are not used, and when this bus is scanned by several counter derailing it results some data missing (this pixels normaly come from this offset, but is finaly calibred at this offset, so I plug it here and compiler does not thrust me, saying it's plug to an unused wire so does compile all that to GND... black screen)
Solution : using all wires of palette_D, taking care the compiler does not remove an "useless" wire from bus, and do calibrate manualy the derailing counters (all that "bug_*" constants inside aZRaEL_vram2vgaAmstradMiaow.vhd)
Source code
FPGAmstrad source code (Xilinx)
The project binary downloadable on #How_to_assemble_it section contains in fact source code and the binary file (.bit)
This is a simple zip of project folder.
The project was done using Xilinx webpack
It contains some direct drawn schematics, and VHDL components
MiST-board CoreAmstrad source code (Altera)
MiST-board CoreAmstrad source code
Compiling OK in Quartus II 13.0 (Altera IDE), and a few in ISE Design Suite 14.7 (Xilinx IDE) - I have to report back some modifications from my deploy platform(Altera MiST-board) to my dev platform (Xilinx NEXYS4 from Digilent Inc.)
ZX-Uno FPGAmstrad source code (Xilinx)
Schematics
I explode the main schematic into a task by component, so the schematic is big.
Starter kit use only one RAM physical component for RAM ROM and DSK alignment, so I had to manage accesses (it is possible in fact because Z80 is a sequential processor)
My clock take 4 wires, in fact it exists a clock sequence #Clock_sequence (during 1 z80 tic, I do several things)
RAM is done for being dump, comparable to JavaCPC snapshoots.
Components
sound chip is ym2149 one, patched, and repatched in order to get stereo sound.
PPI chip is 8255 CPCWiki one, patched.
PWM chip is PWM_DAC fpga4fun one, patched in order to get high sound quality (my PWM has two clocks in entry)
And thanks
Certainly first thanks to Markus Hohmann, for having programmed a Java version of CPC, I love Java and VHDL, so this project comes from this Java Amstrad emulator.
Secondary Steve Ciarca, author of "Build your own Z80 computer" (1981), so nice book.
Then the author of the VHDL version of Yamaha sound chip : fpgaarcade. And opencores for the Z80 (T80)...
And websites that give access to so much old Amstrad resources like :
- CPCWiki :) the Amstrad community
- Quasar Net fr lost legends
- Grimware lost schematics knowledge
- CPCRULEZ fr assembler and legends
- Genesis8 fr games news
- Amstrad TODAY fr a nice link list
- Push'n'Pop lost FDC knowledge
And more :
- CPC GAMES REVIEWS large illustrated dsk image database
- JavaCPC emulator Markus Hohmann
- CPCMANIA plug Amstrad on TV
- Bellaminettes fr Artist drawer -nice girls- from ACBM magazine - Les puces informatiques - Sasfepu
Appendix
MiST-board special features
Bulk of effort done/TODO-list, especially for the MiST-board's CoreAmstrad implementation.
ROM/RAM
ROM/RAM : extension
In r004, you have more RAM +512KB, and you can add ROMs.
- LowerROM has .eZZ file extension
- UpperROM has .e00 ot eFF file extension (hexa)
In r005.4, I add another UpperROM set : .f00 to .fFF file extension (hexa). If you press "space" during a reset_key ("page up" key), upperROM files used range from .f00 to .fFF instead of ranging from .e00 to .eFF. LowerROM .eZZ file extension is still used in both case.
ROM/RAM: TODO : RAM 4MB extension
Why not ?
VIDEO
VIDEO: A SCART output
In order to plug FPGAmstrad on TV, and help debugging. And also to test a simple scan-doubler.
r005c17 : experimental version, original signal TV output is running fine, with OSD menu. Have to add a flag in mist.ini instead of using OSD menu. scan-doubler doesn't run ok in mode 2, and has strange offset with Arkanoid (vertical display games), so it unvalidated : only original TV output will be added to r004 in r005.
r005 : VGA 60H/TV 50Hz.
VIDEO: An OSD option to enable scan-doubler
scan-doubler (simple TV to VGA converter) doesn't run ok in mode 2, but there is some many recent demo effect that doesn't pass using current VGA 72Hz implementation. Have to try to insert both VGA implementations.
core_r005c18 seems having a scan-doubler output, have to merge it.
VIDEO: A SCART output with border
Original output signal has no border, I have to implement the original border in TV mode.
Priority: HIGH! (asked by Markus Hohmann)
Done in r005.8.14.2
VIDEO: move SCART parameter into mist.ini
Doing like in other cores : do use the global "scandoubler" option in mist.ini to switch between VGA and TV mode.
VIDEO: mix SCART H and V sync into HV sync (sort of C sync)
Amstrad CPC core · Issue #35 · mist-devel-mist-binaries · GitHub :
SCART TVs expect a composite sync. The VGAs vsync is connected the SCART pin used to detect a RGB signal and is constantly driven high. A TV will not cope with a video signal with separate H and V sync. Bu tit's usually sufficient to xor hsync and vsync to get a csync acceptable for many TVs.
So something like this
Vsync=1; Hsync=old_Vsync xor old_Hsync;
Done in r005.8.14.1
VIDEO: refactor of Parrot PAL signal
I found a running 15kHz TV, with mist-board tutorial lesson11 Parrot PAL running fine, but not with CoreAmstrad r005.8.14.1. It's the same TV I used some years ago at festival with original CPC. I have to refactor Parrot tutorial and adapt it on CoreAmstrad in order to generate a better TV signal quality.
Done in r005.8.14.2
In theory, simple_GateArrayInterrupt.vhd shall have :
vsync_azrael<=etat_monitor_vhsync(2); hsync_azrael<=etat_monitor_hsync(2); if hSyncCount=2+4 then
In practice - in r005.8.14.2 - here we have :
vsync_azrael<=etat_monitor_vhsync(1); hsync_azrael<=etat_monitor_hsync(1); if hSyncCount=1+4 then
This way screen is nicely centered but CPCWiki rule "The HSYNC is modified before being sent to the monitor. It happens 2us after the HSYNC from the CRTC and lasts 4us when HSYNC length is greater or equal to 6. If R2=46, and HSYNC width is 14 then monitor hsync starts at 48 and lasts until 51." is not respected.
Test about centering screen are done using "BORDER 0", this way border is ignored and does interact with HSYNC/VSYNC screen synchronisation.
VIDEO: CRTC1
r004.8 : a better CRTC/Gateway implementation, following better JEmu (JavaCPC) one... but it is a CRTC1 (but a better ONE) Some bugs came from PPI also (keyboard bugs in particular), solved in r004.8
VIDEO: CRTC1 detection
I don't remember exactly, but in r005.8.4, one of "Midline Process"/"From Scratch"/"Pheelone" demo does crash due to a "CRTC1 needed" message : my CRTC1 seems not detected as a true CRTC1... If's "From Scratch" that does display this message in fact.
Done in r005.8.14 : Still Rising (Vanity) demo can be launched, better using "MEM_WR:slow" mode.
VIDEO: TODO : Interlaced scanlines
Interlaced scanline is an effect existing in CRTC (register R8) used by Wolfenstrad demo Seen also at begin of R-Typeee.dsk ("stereo soundtrack" message's picture), and seem also used in a lot of recent demos as "flipping lace" effect.
Scanline is also used just at begin of Pinball_Dreams__PREVIEW.DSK (eagle draw) - in fact I've got a doubt here, it seems more about a problem of HSYNC edge choice of alignement here.
Les Sucres en Morceaux - Amstrad CPC - Identifier les CRTC
OUT &BC00,8 OUT &BD00,3
L'écran passe en 100Hz, les registres 4 et 7 doivent être doublés pour retomber sur 50Hz
VIDEO: Scanlines
Here effect is about simulating CRT (not CRTC.R8) original screen. There is several way to implement it. Here, truly one line out of two is 1/2 darker. By visual effect this result in "a thin full black horizontal line".
VIDEO: Monochrome option
Add an option to turn screen into green monochrome mode (in mode TV and in mode VGA)
done in r005.8.9.2 (Soleil Vert demo)
Les Sucres en Morceaux - Couleurs - 1 - Les couleurs du CPC
VIDEO: Monochrome OSD
Could be great having the OSD in monochrome when monochrome is selected and scanlined when scanline is selected
Done in r005.8.14.4
VIDEO: TODO : Scanline during monochrome + scandb50Hz modes
soleil vert demo display result is best using scandb50Hz mode (r005.8.16c29) because it does alternate two pictures at 25Hz, seeming then like a fixed image for humans.
But my scandb50Hz option does not enable yet the scanline effect that could improve her agains this demo. To do.
VIDEO: USELESS : welcome VGA signal
While bootloader is not fully started, do display a lighter screen output (not darker pixels as original screen color CPC depth using more resistors), as it VGA should be nicely centered at each boot. And then after come back to original CPC pixel depth.
Some VGA does detect FPGAmstrad resolution just if pixels are ligther, so I turn them lighter during start of engine. Normaly a press into reset button (the one front the sdcard entry) does solve directly this problem (you can also turn on screen before MiST-board with this sort of screens)
I tryed also menu core project with my stupid screen, as it normally I can power on MiST-board before screen for FPGAmstrad (switching core does the stuff here also)
Tryed in r005.8.14.4 : lighter pixels during bootload. Also with a full white screen.
This solution does not fix the problem of "stupid screen", but reveals something interesting about the defect (next chapter)
VIDEO: TODO : SAMSUNG 16/9 tests
Using lighter pixels full white screen during bootload show me that screen doubts between two positions : a perfect centered 4/3 with 6.5 centimeters horizontal border each; and a starting 16/9 at left, crop at 6.5 centimeters left.
Without lighter pixels full white screen, the crop of image does change, moving into first displayed characters : in fact in SAMSUNG menu, the position of screen is not 50 50, if you put 50 50 you come back to "lighter pixels full white screen" defect. So here screen begining at first char displayed on screen is a second defect, but a small one, as you just have to set 50 50 in SAMSUNG menu.
So back to previous bug : screen doubt between two positions "a perfect centered 4/3 with 6.5 centimeters horizontal border each; and a starting 16/9 at left, crop at 6.5 centimeters left". When displaying a game, in fact, in found two different case in "perfect centered 4/3" case , this case is not so perfect, it does also doubts between two positions :
- one time screen does crop at 6.5 left and right, changing the screen vertical position using menu does translate the image cropping left and right at fix position : 6.5 centimeters fix black border. About extra 1 centimeter pixels : image in middle of image does move, but not the borders at all.
- a second time image does move perfectly (completely/totally) left and right without crop, and if centered has 6.5 black border left and right. This time image seems complete but crushed.
During ZX-Uno merged, I found two bug on VGA implementations (true ones ?), first being horizontal and vertical counter not reaching VTot/HTot (one clock tic missing), and second the horizontal counter limited to 1024 not reaching HTot that seems more than 1024. Perhaps, if this bugs are valided as it, do go back on original 800x600@72Hz modeline formula.
DSK
DSK: A advanced dsk drive
Done on r004, I added also a second Drive in order to copy easily files from one disk to another.
Irregular sector size ok.
You just have to select Drive A or B from OSD before selecting another dsk file.
Write is done directly on sdcard dsk file, so you can save games, and write texts...
You can now change disk without reset. And then play games using several disks.
CPCWiki forum - Amstrad CPC hardware - FDC floppy t80ds detection : talk about FDC in MiST-board CoreAmstrad.
Since r004 "mecashark", the FDC implementation has write access !
DSK: TODO : SNAP DSK
Add an option in OSD MENU : "SNAP DSK". Does create a copy of current disk in current drive into "SNAP[number].DSK". Heuristic for number : file count (at boot, incremented at each snap dsk done)
DSK: TODO : fix message "This program will not run in this environment. Press any key"
HartOz
The core does not support the bundled CP/M+ software. With a valid working CP/M+ Disc1 image mounted, the systems returns with the following message after issuing the |cpm command. "This program will not run in this environment. Press any key"
Due to using wrong language version of CP/M+ disc (cpmpluf1.dsk is french version of CP/M+)
"Wrong disk for your configuration" message seen in one-disk version of "Batman Forever" demo (two separate disk version runs fine), in forum they say that dsk image is using "bad track numbers", in fact when looking at a Track-Info with side 1 (instead of 0), track and side are correct in Track-Info but side is not ok in Sector-Info, normaly track/side are ignored in Sector-Info (Track-Info is used for that)... but still having the message, something else seems also wrong.
Do fix also message "Bad Command" while running a not existing file on disk.
Certainly linked to Orion Primes.dsk loading problem.
DSK: TODO : tapes
Do read .CDT files also.
I think @ralferoo had already written FPGA code for tape reading for his FPGA CPC. Maybe you can borrow some code from him? Bryce.
DSK: TODO : snapshoot purpose
Like in emulators, do something to go back in time while running a game.
Transmit
Could be nice around cross-dev.
Transmit: TODO : Ethernet
Integration of "ethernec.v".
Several multiplayer games using several CPC does already exists : Virtual_Net_96.
X/Y
X/Y: TODO : A X/Y input
I want to work also on screen-pen entry, is there a manner to detect an analog X/Y as pen or gun ? YES : Markus Hohmann does it, he implements the lightgun on JavaCPC-GX4000 using mouse :)
http://cpcrulez.fr/hardware-pistolet-magnum_light_phaser_ACPC.htm
register 11,12 and 13 ?
X/Y: TODO : AMX mouse support
Asked by KLNHOMEALONE.
Others tricks
If you aren't ready yet, here somes experiments (youtube) on real Amstrad :

























