Going from Virtex-2 Pro to SmartFusion2: Learning by doing (mistakes)
- On behalf of the RCU2 collaboration: Johan Alme (johan.alme@hib.no)
- FPGA Forum 2015, Trondheim, 11-12 February 2015
Outline
- This is a talk about how to put a design in a very bad state, and the struggles you face when porting it to a state-of-the-art technology.
The Project
- ALICE Detector @ CERN
- 216 RCUs -> 216 RCU2s
- 216 xc2vp7 -> 216 MS SF2 (Microsemi SmartFusion2)
RCU to RCU2: A Little Bit of History
- 2001: First prototype RCU: Altera APEX device on PCI board
- FPGA design: Schematic/VHDL
- By Designer A (inexperienced)
- Main location: Bergen
RCU to RCU2: A Little Bit of History
- 2003: Next step: Altera APEX device on dedicated motherboard
- FPGA design: Schematic/VHDL/Verilog
- By Designers A, B (C & D). Still inexperienced.
- Main location: CERN
- Eventually ALL design in Verilog/Altera schematic
- But: FAILED in irradiation tests!
RCU to RCU2: A Little Bit of History
- 2005: Next step: Xilinx Virtex 2 Pro VP7!
- Everything ported!
- An attempt was made to understand all schematic designs and rewrite them in Verilog
- Designers B (C & D)
- Main location: CERN
- Not really working properly
RCU to RCU2: A Little Bit of History
- 2006: Next step: Xilinx Virtex 2 Pro VP7!, pt. 2
- Mainly Verilog design by Designers C and D
- New Designer E (that's me!) wrote a VHDL module!
- Main location: CERN (Bergen)
- No spec document written and agreed upon; module design is isolated
Design Philosophy: Xilinx Virtex 2 Pro code
- «I don't care if the code is nice, just make the goddamn thing work» (Project leader, 2007)
Design Philosophy: Xilinx Virtex 2 Pro code
- And then we add:
- Overall system: extremely complex, many devices that should talk to each other
- Deadline: always yesterday
- Designers: students (mostly)
- Coding guidelines: never heard of that
- Design review: only on structural level
RCU to RCU2: A Little Bit of History
- 2007: Next step: New designer
- New Designer F: experienced VHDL designer
- Everything ported module by module to VHDL with the help of a bachelor student
- Main location: CERN
- Design in a working state just in time for Run1 (2008)
RCU to RCU2: A Little Bit of History
- 2008-2013: Run1, design running remarkably stably
- The design was actually very stable during operation, even with no radiation protection in the design (the chip was too small, or the design was too big)
- Maintained by Designer F. AMAZING JOB!
The history looked streamlined, but...
- If drawn correctly it would really look a bit like this:
- [Figure: redrawn timeline with labels 2001, 2003, 2005, 2006, 2007, 2008, 2013, 2014 and NOW]
RCU to RCU2: A Little Bit of History
- 2013: RCU2 upgrade decision taken
- RCU2 was supposed to be a «simple» upgrade
- Update FPGA to new technology: increase speed, improve radiation tolerance
- Many of the same team of designers still on board
- Let's simply port the design!
RCU to RCU2: the simple upgrade
- [Figure: block diagrams of the RCU and the RCU2]
- RCU: FECs on branch A and branch B via GTL buses; SIU with DDL link (optics, 160 MB/s); DCS; TTC (optics); Monitor/Control (Ethernet)
- RCU2: FECs on branches AO, AI, BI and BO (A_outer, A_inner, B_inner, B_outer); DDL2 link (optics, 5 Gb/s); TTC (optics); Monitor/Control (Ethernet)
- There is no such thing as a simple upgrade
RCU2: The ALICE TPC readout electronics consolidation for Run2
- The Microsemi SmartFusion2 M2S050-FG896 provides:
- Radiation-tolerant flash cells
- SECDED-encoded DDR RAM interface / ETH MAC / internal SRAM
- Microcontroller Subsystem with ARM Cortex-M3 and useful peripherals
- Platform for Embedded Linux
- Max 5 Gb/s operation in custom working mode on one lane of the SERDESIF for DDL2
- Enough resources to have TMR on vital parts of the logic
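To illustrate the TMR (triple modular redundancy) mentioned above, here is a minimal Python sketch of a bitwise 2-of-3 majority voter. It is not taken from the RCU2 design; the function name `tmr_vote` and the example register values are assumptions chosen for illustration only.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies of a register."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical 8-bit register stored three times; one copy suffers a single bit flip.
golden = 0b1011_0010
upset = golden ^ 0b0000_1000  # bit 3 flipped, e.g. by a single event upset

# The voter masks the flipped bit because the two intact copies agree.
print(bin(tmr_vote(golden, upset, golden)))
```

A voter like this only masks an upset in one of the three copies per bit position; two simultaneous upsets in the same bit would win the vote.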
The Initial RCU2 FPGA Design Strategy
- «Don't fix what's not broken.» (Robert Atkins)
- The design had been working nicely for 4 years! So let's just:
- Port the memory blocks and technology-dependent cores
- Connect to the new ARM MSS
- How hard can it be? (Answer in two slides from now)
- Robert Atkins never ported the RCU design
Status of the original FPGA design (summarised by T. Alt, 30.10.2014)
Example: Monitoring and Safety Module
- A custom I2C master
- Consists of 21 separate VHDL files
- Uses the I2C clock as system clock: 5, 2.5 or 1.25 MHz
- Stretching of the 40 MHz domain we signal: works for 5 MHz and 2.5 MHz, just sometimes for 1.25 MHz
- The I2C slaves are designed by the same designer: code unreadable and not documented
- Original design strategy: «don't touch»
- [Figure: block diagram. The RCU bus (sysclk 40 MHz; din, dout, addr, we) connects to the Monitoring and Safety Module (custom I2C protocol; sysclk = sclk, configurable 1.25 MHz, 2.5 MHz or 5 MHz), which connects via sclk, dout, din1 and din2 to two groups of 10-13 I2C slaves]
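The stretching requirement above can be made concrete with a little arithmetic. This is a rough Python sketch, assuming that a we pulse from the 40 MHz domain must stay high for at least one full sclk period to be seen by an sclk clock edge. The function name and the one-cycle margin are illustrative assumptions, not the logic of the original module.

```python
import math

SYSCLK_HZ = 40_000_000  # RCU bus clock, from the slide


def min_stretch_cycles(sclk_hz: int, margin_cycles: int = 1) -> int:
    """Number of 40 MHz cycles a pulse must be held so that at least one
    sclk rising edge falls inside it, plus an assumed safety margin."""
    return math.ceil(SYSCLK_HZ / sclk_hz) + margin_cycles


# The three configurable sclk frequencies named on the slide.
for sclk in (5_000_000, 2_500_000, 1_250_000):
    cycles = min_stretch_cycles(sclk)
    print(f"sclk {sclk / 1e6:.2f} MHz: hold we for at least {cycles} sysclk cycles")
```

Under these assumptions the slowest setting needs roughly four times the stretch of the fastest one. A stretcher sized only for the faster settings would catch the pulse at 1.25 MHz only occasionally, which would be consistent with the «just sometimes» behaviour, although the slide does not state the actual cause.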
How hard can it be?
- Answer: Very hard
- The design still worked in simulations, but Synplify removed a lot of logic during synthesis
- Result: the design was not working at all
- No one was really capable of reading and understanding the code
- We had only one thing to do...
RCU2 FPGA design: square 1
Next chapter: Microsemi Beta Testers
- When we started the project, only the first Engineering Sample of the Microsemi SmartFusion2 was available
- This meant:
- Unfinished documentation (i.e. errors in the errata sheet)
- Errors with the device
- Missing features
- Radiation-related problems
- Libero (dev tool) was (is?) immature and has flaws
- Note: this is normal for state-of-the-art devices. Similar experiences seen earlier with brand new Xilinx and Altera devices
- [Figure labels: SELs in Engineering Sample; 1st production batch]
Libero Fun Stuff: Example in SERDES Communication
- Custom 4.2 Gbps link: using the SERDES + the Core EPCS was a major problem
- Setting it up on a Xilinx Virtex 6 device: ~2 days
- Setting it up on the SF2: ~1 year
- Problems:
- The timing changed from one Libero version to the next
- A custom delay chain had to be set up in fabric for two hard cores to talk to each other
- From Libero v11.3: special EPCS core delivered to us by Luca Cattaneo (European technical director, Milano): WORKING
- Single Event Latchups: reduced core voltage to 1.0 V, timing models not correct: NOT WORKING
- Libero v11.4: tried std EPCS core, not working; reverted to «Luca core» (Nov 2014)
- Libero v11.4: removed delay chain, still working with no timing warnings (Jan 2015)
- Libero v11.5: «Luca» EPCS core not compatible, cannot test
Microsemi Libero Snags (I): Repository system
- Difficult to use in combination with a repository system
- Generated cores can be imported but need manual regeneration
- The MSS system core can't be manually regenerated like normal cores; one has to go through the whole configuration GUI
- Dependencies of generated core files are not clear; it is hard to understand which files need to be put in a revision system
- Core files contain binary files (sdb), which are not optimal for a revision system
- Ideal:
- All source files and files required to generate a core are either text files or can be exported in a way that they can be easily imported
- The complete design flow can be scripted easily: check out the sources and the scripting files from SVN/Git, run Make
- Easily doable for e.g. Xilinx, not doable for MS (Microsemi)
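One practical consequence of the points above is having to find out which generated files are binary before putting them in a revision system. This is a minimal Python sketch, assuming a simple heuristic (a NUL byte in the first 8 KiB marks a file as binary). The function names are hypothetical and nothing here is a Libero feature.

```python
import os


def is_binary(path: str, probe_bytes: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(probe_bytes)


def find_binary_files(root: str) -> list[str]:
    """Walk a project checkout and return the sorted paths that look binary."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_binary(full):
                hits.append(full)
    return sorted(hits)
```

Calling `find_binary_files` on a project directory gives a list of candidates to exclude from the repository or to regenerate from text sources. Whether a given file can actually be regenerated still has to be checked by hand.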
Microsemi Libero Snags (II): MSS system (I)
- MSS needs to be configured via SystemBuilder or SmartDesign
- SystemBuilder is the default. BUT:
- Essential functionality can't be configured via SystemBuilder, e.g. the FIC address map or the number of slaves
- To modify the FIC address map, the MSS module needs to be converted from SystemBuilder to SmartDesign
- Re-converting from SmartDesign to SystemBuilder means losing all changes made in SmartDesign
- This means there are two configuration systems in place; each allows configuring parts that the other can't, but changes will be lost when switching between them
- A few more «features» exist, saved for the backup slides
BUT
- The support from Microsemi has been excellent
- We have been in direct contact with the developers themselves
- They have been on site at CERN, sitting with us
- They are screening the latest pre-production batch of SF2 FPGAs for the correct speed grade for us. This was not originally intended, but the next «proper» batch is due in March/April, too late for us
- And Microsemi might not be without faults, but neither are we!
- And neither is any other company starting with an A or an X
Summary: What have we learnt from this?
- Never trust an undocumented design
- When designing: use/make guidelines, document your code and review the code
- If the design is unreadable, don't waste your time trying!
- And:
- State-of-the-art technology comes at a cost
- Good customer support is extremely important
Thanks for Listening!
- Final remark: problems did not stop us!
- We have ordered 250 SF2s
- 6 RCU2s installed as a test sector in week 3
- 210 more to be installed in May/June!
- Any questions?
- [Photo: Happy RCU2 team in front of sector C08]
Backup slides
This took us all the way back to square one
- If drawn correctly it would really look a bit like this:
- [Figure: redrawn timeline with labels 2001, 2003, 2005, 2006, 2007, 2008, 2013 (decision for RCU2 is taken), 2014, NOW, and GOAL! in May/June 2015]
Microsemi Libero Snags (III): MSS system (II)
- Hard errors in configuration options (v11.4)
- When selecting Dedicated Pad as an input clock for the MSS system and using the FIC interface, the system will automatically insert a fabric CCC (FCCC) and feed the MSS from the output of the FCCC. This conflicts with the Dedicated Pad option and results in a cryptic error during P&R.
- MSS configures the SerDes automatically as PCIe in the system reset core, including a HotPlug fix and other logic, even if the SerDes are used in EPCS mode.
Microsemi Libero Snags (IV): General Issues
- Schematicviewer and Multichipnavigator: fixed-size dialog boxes, hiding menus etc.
- NetList Viewer: long signal names overlap and can't be read
- Limited number of options for Compile and Place & Route, e.g. no possibility to put IO registers automatically in IO cells; the only way is to set a constraint for each register
- In case of P&R errors, the amount of information about where the problem is located is nearly zero! No info about which signal or part of the design caused the error. Debugging requires trial and error to locate the issue!
- MS supports command-line operation via Tcl. However, there is no option to auto-regenerate cores => done manually