By Andrew Love
Problem Description:
We wish to create a modified floating-point MAC (multiply-accumulate unit) without using an HDL. The hardware will run on Virtex-II FPGAs, and Forge will compile Java source files to create the MAC module.
Solution Description:
For this project, I first wrote a Java program that takes in integers and treats them as 32-bit floating-point representations. This floating-point representation differs from the IEEE standard in its sign and exponent encodings:
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
A 32 bit floating point number
Sign: 1 bit; 1 is positive
Exponent: 8 bits, excess-128 encoded
0 is represented by 128 (10000000)
-128 is represented by 0 (00000000)
127 is represented by 255 (11111111)
Mantissa: 23 bits, with an implied 1 for the 24th bit
zero = 0x80000000 or 0x00000000
saturates at 0x7fffffff (smallest, i.e. most negative) and 0xffffffff (largest)
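The field layout above can be sketched in Java as a set of masks and a decoder. This is a minimal illustration, not the code from the project: the class and method names are hypothetical, and it assumes the value is 1.m x 2^(exponent - 128), with the implied 1 sitting just above the 23 stored mantissa bits.

```java
class CustomFloatFields {
    // Field masks for the layout s eeeeeeee mmm...m (1 + 8 + 23 bits).
    static final int SIGN_MASK = 0x80000000;
    static final int EXP_MASK  = 0x7F800000;
    static final int MANT_MASK = 0x007FFFFF;
    static final int BIAS = 128; // excess-128 exponent encoding

    // In this format a sign bit of 1 means positive.
    static boolean isPositive(int bits) {
        return (bits & SIGN_MASK) != 0;
    }

    // Unbiased exponent in the range -128..127.
    static int exponent(int bits) {
        return ((bits & EXP_MASK) >>> 23) - BIAS;
    }

    // 24-bit mantissa with the implied 1 restored as bit 23.
    static int mantissa(int bits) {
        return (bits & MANT_MASK) | 0x00800000;
    }

    // Decode to a double (assumption: value = 1.m * 2^exponent).
    static double toDouble(int bits) {
        if ((bits & ~SIGN_MASK) == 0) {
            return 0.0; // both 0x80000000 and 0x00000000 encode zero
        }
        double magnitude = Math.scalb((double) mantissa(bits), exponent(bits) - 23);
        return isPositive(bits) ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        System.out.println(toDouble(0xC0000000)); // sign 1, exponent field 128, mantissa 0
        System.out.println(toDouble(0x42700000)); // sign 0, exponent field 132
    }
}
```

Under that assumption, 0xC0000000 decodes to 1.0 and 0x42700000 decodes to -30.0.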
Addresses:
0x801000 - writing to this resets the accumulator
0x802000 - constant register input to MAC
0x803000 - stream input
0x804000 - MAC output
The implementation in software required masking of the integer bits in order to pull out the sign, exponent, and mantissa portions of the number. For multiplication, a simple XNOR of the sign bits gave the correct result sign (1 is positive in this format). The exponents were added, and the mantissas were multiplied. The exponent was then adjusted to get the number back into the proper format, and the mantissa was truncated to fit within the allotted number of bits.
For addition, the number with the smaller exponent was adjusted to have the same exponent as the larger one, and its mantissa was shifted accordingly. The mantissas were then added together, taking the signs into account. Once this was done, the sum was renormalized to fit within the format and the exponent was adjusted accordingly.
For both operations, once the final sign, exponent, and mantissa were calculated, they were masked again and then ORed together to give the final output. The MAC was implemented by multiplying the constant input by the streaming input and then adding the product to the accumulator to get the output.
To implement the accumulator reset, an accumulator was instantiated in the class and a public reset function was created. Since this function and the MAC function are separate, they can be called independently.
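The multiply, add, and reset behavior described above can be sketched roughly as follows. This is not the code that was run through Forge: the class name SoftMac and all helper names are hypothetical, the value is assumed to be 1.m x 2^(exponent - 128), and flushing underflow to zero is an assumption (the format description only states how overflow saturates).

```java
class SoftMac {
    static final int SIGN   = 0x80000000; // sign bit, 1 = positive
    static final int MANT   = 0x007FFFFF; // 23 stored mantissa bits
    static final int HIDDEN = 0x00800000; // implied 24th mantissa bit
    static final int ZERO   = 0x80000000; // one of the two zero encodings

    private int accumulator = ZERO;

    static boolean isZero(int x) { return (x & ~SIGN) == 0; }
    static int sign(int x)       { return x >>> 31; }
    static int exp(int x)        { return (x >>> 23) & 0xFF; }   // still excess-128
    static long mant(int x)      { return (x & MANT) | HIDDEN; } // 24 bits

    // Mask the fields and OR them together, saturating on exponent overflow.
    static int pack(int s, int e, long m) {
        if (e > 255) return (s << 31) | 0x7FFFFFFF;
        if (e < 0) return ZERO; // assumption: underflow flushes to zero
        return (s << 31) | (e << 23) | ((int) m & MANT);
    }

    static int multiply(int a, int b) {
        if (isZero(a) || isZero(b)) return ZERO;
        int s = ~(sign(a) ^ sign(b)) & 1; // XNOR of the sign bits
        int e = exp(a) + exp(b) - 128;    // add exponents, remove one bias
        long p = mant(a) * mant(b);       // 48-bit product
        // Renormalize to 24 bits; the low bits are truncated.
        if ((p >>> 47) != 0) { p >>>= 24; e++; } else { p >>>= 23; }
        return pack(s, e, p);
    }

    static int add(int a, int b) {
        if (isZero(a)) return b;
        if (isZero(b)) return a;
        // Make a the operand with the larger magnitude.
        if ((a & ~SIGN) < (b & ~SIGN)) { int t = a; a = b; b = t; }
        int e = exp(a);
        int shift = e - exp(b);
        long mb = shift > 23 ? 0 : mant(b) >>> shift; // align the smaller operand
        long m = sign(a) == sign(b) ? mant(a) + mb : mant(a) - mb;
        if (m == 0) return ZERO;
        if (m >= (1L << 24)) { m >>>= 1; e++; }    // carry out of bit 23
        while (m < (1L << 23)) { m <<= 1; e--; }   // cancellation
        return pack(sign(a), e, m);
    }

    // Separate entry points, so reset and MAC can be called independently.
    public void reset() { accumulator = ZERO; }

    public int mac(int constant, int stream) {
        accumulator = add(multiply(constant, stream), accumulator);
        return accumulator;
    }
}
```

With this sketch, calling mac(0x40000000, 0x42700000) on a fresh instance (that is, -1 times -30 plus an empty accumulator) returns 0xC2700000, which is 30 in this format.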
The hardware required some work because the Verilog file from Forge would not run any faster than 25 MHz. Therefore, the clock frequency was simply divided by 4. Forge automatically implemented GO and DONE signals for both the MAC and accumulator-reset portions of the Java code, which helped the transfer to hardware immensely.
The Java and .vhd files used to implement this MAC are located here:
http://filebox.vt.edu/users/arl8c/public/ECE5530/HW9_files/
The Java code was compiled into Verilog using Forge, and the .vhd and .v files were compiled using the Xilinx toolkit. Other files at this location are the Forge options file, the modified .prj file and makefiles (which include the proper .vhd and .v files as well as some modified options), and a test script (HW9_dn_script.sh).
Device Utilization and Speed:
Logic Utilization:
Number of Slice Flip Flops: 630 out of 46,080 1%
Number of 4 input LUTs: 1,273 out of 46,080 2%
Logic Distribution:
Number of SLICEs 892 out of 23040 3%
Total Number 4 input LUTs: 1,377 out of 46,080 2%
Number used as logic: 1,273
Number used as a route-thru: 104
Number of MULT18X18s: 4 out of 120 3%
Number of BUFGMUXs 2 out of 16 12%
Number of GCLKs: 2 out of 16 12%
Number of DCMs: 1 out of 12 8%
Total equivalent gate count for design: 39,340
Additional JTAG gate count for IOBs: 37,392
Timing:
25 MHz clock speed
Actual longest cycle time is 38.218 ns with 45 levels of logic.
This design uses a tiny fraction of the available resources (about 3% of the slices), yet it cannot run at the full clock rate: the 38.218 ns longest path limits the clock to roughly 26 MHz. This may be because Forge did not optimize the design for speed. Although the option was set to allow pipelining of up to 100 stages, either this was not sufficient or Forge could not figure out how best to pipeline the design. A possible solution would be to change the way the Java code is written to see whether Forge can optimize it better. Some informal feedback from other Forge users suggests that a person's programming style can have a significant effect on the way Forge interprets the code.
Test Bench:
I implemented a simple script (HW9_dn_script.sh on the webpage) to run the following through the MAC:
-30 * -1 + 0 = 30
1 * -1 + 30 = 29
Reset Accumulator
1 * 5 + 0 = 5
7 * 5 + 5 = 40
-5 * 5 + 40 = 15
-5 * 7.45 + 15 = -22.25
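To drive these cases, each operand has to be turned into a 32-bit word in the custom format. The sketch below shows one way this could be done. It is not taken from HW9_dn_script.sh: the class TestVectorEncoder is hypothetical, it assumes finite inputs and the value interpretation 1.m x 2^(exponent - 128), and it truncates the mantissa. Note that 7.45 has no exact 24-bit binary representation, so the last result from the hardware may differ from -22.25 in the low mantissa bits.

```java
class TestVectorEncoder {
    // Encode a finite double into the custom 32-bit format (mantissa truncated).
    static int encode(double v) {
        if (v == 0.0) {
            return 0x80000000; // one of the two zero encodings
        }
        int sign = v > 0 ? 1 : 0; // 1 is positive in this format
        double mag = Math.abs(v);
        int e = Math.getExponent(mag);            // unbiased power of two
        long m = (long) Math.scalb(mag, 23 - e);  // 24-bit mantissa, truncated
        int biased = e + 128;                     // excess-128 encoding
        if (biased > 255) {
            return (sign << 31) | 0x7FFFFFFF;     // saturate
        }
        if (biased < 0) {
            return 0x80000000;                    // assumption: underflow becomes zero
        }
        return (sign << 31) | (biased << 23) | ((int) m & 0x007FFFFF);
    }

    public static void main(String[] args) {
        // Operands that appear in the test cases above.
        double[] operands = { -30, -1, 1, 5, 7, -5, 7.45 };
        for (double v : operands) {
            System.out.printf("%6.2f -> 0x%08X%n", v, encode(v));
        }
    }
}
```

The printed words could then be written to the constant register (0x802000) and stream input (0x803000) listed under Addresses.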