

## **OVERVIEW**

SCOORE is a high-performance Out-of-Order SPARC V8 processor currently under implementation by the MASC research lab at UCSC.

SCOORE has several novel aspects: it is an Out-of-Order implementation of the SPARC V8 ISA, and it is being developed with full FPGA and ASIC compatibility as a major design goal. In comparison with current Intel and AMD processors, SCOORE has larger Issue Logic, a larger Reorder Buffer (ROB), and a larger Register File. Design specifications call for an operating frequency of 1.4 GHz on a 90 nm ASIC and 175 MHz on an FPGA.

### **FPGA IMPLEMENTATION**

SCOORE is being used as a development platform to define a set of guidelines for FPGA-friendly Out-of-Order CPU designs. An Out-of-Order CPU that can be implemented entirely on an FPGA has not yet been achieved efficiently. Such an implementation is valuable because it allows researchers to perform realistic evaluations and simulations.



- Xilinx Virtex-5
  - XUP Board
  - DDR2 SODIMM
  - Shared Board with OpenSPARC
- Full System
  - Nallatech FSB
  - X86 Host Boots Linux

Rigo Dicochea, Tom Golubev, Abhishek Sharma, Anupam Garg, David Munday, Gregory Jackson, Carlos Cabrera, Elnaz Ebrahimi, Jose Renau

- 1.4 GHz ASIC Frequency
- Efficient FPGA Synthesis
- 2 Way SMP

Predictor table history lengths (figure labels): 0-bit; 3-bit or 31-bit; 5-bit; 8-bit or 50-bit; 12-bit; 19-bit or 80-bit.

- 6 Table Predictor
- Variable Branch History: 0 to 80 bits
- Speculative Update with Fixup
- 4 KByte direct-mapped BTB prediction
  - 2 Predictions per Cycle
- 32-entry RAS predictor
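The multi-table organization listed above can be illustrated with a simplified GEHL-style predictor: each table is indexed by a hash of the branch address and a different amount of global history, and the prediction is the sign of the sum of the selected counters. This is only a sketch under stated assumptions, not the SCOORE implementation. The table size, counter width, hash function, and fixed update threshold are hypothetical, only the shorter history length of each table is used, and the speculative update with fixup is not modeled.

```python
# Simplified GEHL-style predictor sketch (illustrative, not the SCOORE RTL).
HISTORY_LENGTHS = [0, 3, 5, 8, 12, 19]  # shorter per-table lengths from the poster
TABLE_BITS = 10                         # assumed: 1024 counters per table
COUNTER_MIN, COUNTER_MAX = -8, 7        # assumed: 4-bit signed counters
THRESHOLD = len(HISTORY_LENGTHS)        # assumed: fixed update threshold
MASK = (1 << TABLE_BITS) - 1


class GehlSketch:
    def __init__(self):
        self.tables = [[0] * (1 << TABLE_BITS) for _ in HISTORY_LENGTHS]
        self.history = 0  # global history, newest outcome in bit 0

    def _index(self, table, pc):
        # Keep only as many history bits as this table uses.
        hist = self.history & ((1 << HISTORY_LENGTHS[table]) - 1)
        # Fold the history down to TABLE_BITS bits (hypothetical hash).
        folded = 0
        while hist:
            folded ^= hist & MASK
            hist >>= TABLE_BITS
        return (pc ^ (pc >> TABLE_BITS) ^ folded ^ table) & MASK

    def _sum(self, pc):
        return sum(self.tables[t][self._index(t, pc)]
                   for t in range(len(HISTORY_LENGTHS)))

    def predict(self, pc):
        # Predict taken when the summed counters are non-negative.
        return self._sum(pc) >= 0

    def update(self, pc, taken):
        total = self._sum(pc)
        # Train on a misprediction or when the sum is not confident enough.
        if (total >= 0) != taken or abs(total) <= THRESHOLD:
            for t in range(len(HISTORY_LENGTHS)):
                i = self._index(t, pc)
                c = self.tables[t][i]
                self.tables[t][i] = (min(COUNTER_MAX, c + 1) if taken
                                     else max(COUNTER_MIN, c - 1))
        # Shift the outcome into the global history (80 bits kept).
        self.history = ((self.history << 1) | int(taken)) & ((1 << 80) - 1)
```

Calling `update(pc, taken)` after each resolved branch and `predict(pc)` at fetch is enough to exercise the sketch. Tables with longer histories let it separate branch contexts that the 0-bit table cannot distinguish.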


## SCOORE: Santa Cruz Out-of-Order RISC Engine

Micro-Architecture Santa Cruz (MASC Group) Dept. of Computer Engineering, UCSC



**INSTRUCTION FETCH** 



- 100 Kbit OGEHL predictor
- 4 Cluster Implementation
- AUNIT: Arithmetic Unit (1 port)
  - Simple Arithmetic
- BUNIT: Branch Unit (2 ports)
  - Branches/Arithmetic
- CUNIT: Complex Unit (1 port)
  - Floating Point, mult, div
- MUNIT: Memory Unit (2 ports)
  - Load/Store
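The port counts above limit how many instructions of each kind can start in one cycle. The sketch below illustrates that constraint only. The instruction class names, the oldest-first selection, and the choice to let simple arithmetic fall back to the BUNIT are assumptions for illustration, and the 4-issue width and operand readiness are not modeled.

```python
# Ports per cluster, as listed on the poster.
PORTS = {"AUNIT": 1, "BUNIT": 2, "CUNIT": 1, "MUNIT": 2}

# Hypothetical instruction classes and the units assumed able to run them,
# in order of preference.
UNITS_FOR_CLASS = {
    "alu": ("AUNIT", "BUNIT"),  # assumption: BUNIT also accepts arithmetic
    "branch": ("BUNIT",),
    "fp": ("CUNIT",),
    "mul": ("CUNIT",),
    "div": ("CUNIT",),
    "load": ("MUNIT",),
    "store": ("MUNIT",),
}


def issue_one_cycle(ready):
    """Assign ready instructions (oldest first) to free ports.

    Returns (issued, waiting): issued is a list of (class, unit) pairs,
    waiting holds the classes that found no free port this cycle.
    """
    free = dict(PORTS)
    issued, waiting = [], []
    for op_class in ready:
        for unit in UNITS_FOR_CLASS[op_class]:
            if free[unit] > 0:
                free[unit] -= 1
                issued.append((op_class, unit))
                break
        else:
            waiting.append(op_class)  # every candidate port is busy
    return issued, waiting


issued, waiting = issue_one_cycle(
    ["alu", "alu", "alu", "alu", "load", "fp", "fp"])
```

With this input, three of the four arithmetic operations find a port (one on the AUNIT, two on the BUNIT), while the fourth and the second floating-point operation wait for the next cycle.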

| UNIT | FPGA LUTS | FPGA RAMS | ASIC AREA (COMB) (mm<sup>2</sup>) | ASIC SRAM AREA (mm<sup>2</sup>) | MAXIMUM POWER (mW) | LEADERS | STATUS | ESTIMATED COMPLETION DATE |
|----------------|---------|-----|------|------|-------|----------------|-----|-------|
| BPRED          | 2,000   | 12  | 0.10 | 0.70 | 200   | Tom            | 60% | 1Q'10 |
| CRACK/DECODE   | 5,000   | 2   | 0.20 | 0.10 | 300   | Carlos         | 35% | 2Q'10 |
| L0 I CACHE     | 2,000   | 8   | 0.10 | 0.20 | 200   | Tom            | 95% | 2Q'10 |
| L0 D CACHE     | 7,000   | 12  | 0.30 | 0.70 | 500   | David/Anupam   | 80% | 1Q'10 |
| SCHEDULER      | 17,000  | 4   | 0.20 | 0.70 | 500   | Abhishek/Elnaz | 95% | 3Q'09 |
| RAT/ROB        | 25,000  | 8   | 0.60 | 0.70 | 500   | Gregory        | 5%  | 1Q'10 |
| SELECT         | 5,000   | 2   | 0.20 | 0.00 | 200   | Elnaz          | 40% | 4Q'09 |
| FPU            | 7,000   | 2   | 0.30 | 0.00 | 200   | Rigo           | 90% | 4Q'09 |
| COMPUTE ENGINE | 35,000  | 18  | 1.40 | 1.20 | 1,000 | Rigo           | 70% | 3Q'09 |
| L1 CACHE       | 12,000  | 40  | 0.50 | 3.00 | 3,000 | David/Anupam   | 5%  | 2Q'10 |
| TOTAL          | 117,000 | 108 | 3.90 | 7.30 | 6,900 |                |     | 2Q'10 |
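The totals row can be cross-checked by summing each column over the ten unit rows. The sketch below does this with the values from the table, storing areas in hundredths of a mm<sup>2</sup> so that the arithmetic stays exact. The variable names are illustrative.

```python
# One tuple per unit row, in table order:
# (LUTs, RAMs, comb. area in 0.01 mm^2, SRAM area in 0.01 mm^2, max power in mW)
rows = [
    (2000, 12, 10, 70, 200),
    (5000, 2, 20, 10, 300),
    (2000, 8, 10, 20, 200),
    (7000, 12, 30, 70, 500),
    (17000, 4, 20, 70, 500),
    (25000, 8, 60, 70, 500),
    (5000, 2, 20, 0, 200),
    (7000, 2, 30, 0, 200),
    (35000, 18, 140, 120, 1000),
    (12000, 40, 50, 300, 3000),
]
listed_total = (117000, 108, 390, 730, 6900)  # totals row as printed

# Sum each column across all unit rows.
column_sums = tuple(sum(col) for col in zip(*rows))

# Columns where the printed total differs from the sum of the rows.
mismatches = [i for i, (s, t) in enumerate(zip(column_sums, listed_total))
              if s != t]
print(column_sums, mismatches)
```

The LUT, RAM, and both area columns add up to the listed totals. The power column sums to 6,600 mW against the listed 6,900 mW. The source does not explain the difference, which may reflect an extraction error in one of the rows or a component that is not itemized.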

## TARGETS

The design and implementation of SCOORE has been a completely collaborative effort. It has been used extensively in undergraduate and graduate courses as a practical teaching tool. Previous and CMPE 125 students also participated in this project. Funding was provided by NSF, NASA, Sun Microsystems, and Xilinx.

- 4-Issue Superscalar
- SPARC V8 Processor
- Out-of-Order Execution




## **DESIGN FLOW**



## **GOALS / MILESTONES**

- Summer 2009
  - Complete AUNIT, CUNIT
  - Implement Extensive Testbenches
  - Implement BUNIT, MUNIT
  - Integrate the Scheduler
- Fall 2009
  - Integrate the RAT/ROB
  - FPGA Preparation

## FUTURE RESEARCH

This project will serve as an infrastructure to advance research and education in micro-architectural topics such as simulation, thermal modeling, thermal validation, architectural pruning, design complexity analysis, and hardware bug realization.

- Add Thread Level Speculation
- Implement 'Pruned' Versions of SCOORE
- Boot µClinux