Floating Point Unit in Max+PLUS II for MC68HC11
This was for my ELEC 498 project at Queen's University. Dr. Ahmad Afsahi was our supervisor. I worked with Fadi Yared and Hock Lee Ooi.
A floating point unit (FPU) was designed using MAX+plus II 10.2 software and implemented on the Altera UP1 Education Board with a chip from the FLEX10K family. The FPU was designed to interface via an eight bit parallel communications handshaking protocol with a microcontroller from the MC68HC11 family, hosted on an evaluation board offered by Technological Arts. This arrangement was used to demonstrate and verify the design.
The FPU was designed to carry out addition, subtraction, multiplication and division according to the IEEE-754 floating point standard. The addition, subtraction and multiplication designs were implemented following standard circuit algorithms. Division was implemented according to the non-restoring division algorithm, using 32 bit registers for the mantissas.
The communications protocol was designed according to handshaking as defined by the documentation for MC68HC11 family microcontrollers. The design uses two dedicated ports for send and receive.
The user interface designed in software for the MC68HC11 was meant for debugging and demonstration purposes. The MC68HC11 connected to a terminal window via the serial port of a PC, and accepted commands of the form 8E45F985+85A38C5E, where each character was a hex value representing four bits of the 32 bit operand encoded according to the IEEE-754 floating point standard. It then returned the result to the terminal window in the same format.
Applications such as integration, convolution, and many signal processing operations require floating point arithmetic precision. The goal of this project was to provide microcontrollers from the MC68HC11 family with floating-point arithmetic capabilities modeled after the IEEE-754 standard. This goal implies the design, verification and implementation of a floating-point coprocessor that allows users to perform simple arithmetic functions (addition, subtraction, multiplication and division) with floating point precision.
The proposed solution was to design the FPU using MAX+plus II 10.2 software and implement it on the Altera UP1 Education Board with a chip from the FLEX10K family. The FPU was intended to interface via an eight bit parallel communications handshaking protocol with a microcontroller from the MC68HC11 family, hosted on an evaluation board offered by Technological Arts. Such an arrangement would be used to demonstrate and verify the design.
The FPU was designed using MAX+plus II 10.2 software and implemented on the Altera UP1 Education Board with a chip from the FLEX10K family. The FPU was used by a microcontroller from the MC68HC11 family, hosted on an evaluation board offered by Technological Arts. This arrangement was used to demonstrate and verify the design. A communications protocol was successfully designed to interface the two devices. The FPU successfully carried out addition, subtraction, multiplication and division modeled after the IEEE-754 floating point standard. The user interface designed in software for the MC68HC11 was meant for debugging and demonstration purposes. It connected to a PC terminal window via a serial port, and accepted keyboard commands. Furthermore, a VB application was developed in order to automate the verification of results generated by the prototype model.
Input handshaking for the MC68HC11 is given below. Output handshaking is essentially the same in reverse. Since the clock of the FLEX10K is considerably faster than the clock of the MC68HC11, the FLEX10K's responses can be considered instantaneous in most cases. In order to establish the interface between the devices, receiving-end designs were created in the FPU for both input and output handshaking.
Each floating point value is 32 bits; therefore, when the MC68HC11 requests a floating point operation from the FPU it must send nine packets and receive four packets. Since the 8-bit parallel ports were chosen as the communications medium, each 32-bit operand is sent as four 8-bit packets, and the 32-bit result is returned the same way. The remaining sent packet carries the op-code; in the driver listing in this report it is the first packet sent. As a debugging interface for the prototype device, the MC68HC11 can be connected to a PC terminal (via a COM port) and can receive keyboard commands.
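The packet framing can be sketched in a few lines of Python. This is only an illustration, not the project's code: the function names are hypothetical, while the op-code values (08 to 0B) and the op-code-first ordering are taken from the driver listing and the test vectors in this report.

```python
# Op-code bytes as they appear in the driver listing and test vectors.
OPCODES = {"+": 0x08, "-": 0x09, "*": 0x0A, "/": 0x0B}


def build_request(command: str) -> bytes:
    """Turn 'HHHHHHHH op HHHHHHHH' into the nine bytes sent to the FPU."""
    text = command.replace(" ", "")
    if len(text) != 17 or text[8] not in OPCODES:
        raise ValueError("expected the form HHHHHHHH op HHHHHHHH")
    first, op, second = text[:8], text[8], text[9:]
    # bytes.fromhex raises ValueError if an operand holds a non-hex character.
    return bytes([OPCODES[op]]) + bytes.fromhex(first) + bytes.fromhex(second)


def format_result(packets: bytes) -> str:
    """Render the four result bytes as eight hex digits for the terminal."""
    if len(packets) != 4:
        raise ValueError("the FPU returns exactly four packets")
    return packets.hex().upper()


request = build_request("18360000 + 1C120000")
print(request.hex(" ").upper())
print(format_result(bytes([0x1C, 0x12, 0xB6, 0x00])))
```

The first printed line corresponds to the ADD entry in the test data later in this report (PIPE IN 08 18 36 00 00 1C 12 00 00).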
The IEEE-754 Standard is outlined in the figure below. The first bit represents the sign; the next 8 bits represent the exponent in excess-127 format; and the trailing 23 bits represent the bits of the mantissa following an implicit leading one.
The above figure represents a simplified view of the format. The following table gives some example numbers. It is important to note special representation of certain values such as zero and infinity; furthermore, guard bits are generally used to prevent precision loss in rounding. Guard bits are extra hidden bits kept at the end of the mantissa. Representation of special numbers, guard bits and rounding are beyond the scope of this project; trapping overflow when doing floating point operations is also beyond the scope of this project.
Example Numbers from the IEEE-754 Format
Value               Sign  Exponent   Fraction
+1.101 x 2^5        0     1000 0100  101 0000 0000 0000 0000
-1.01011 x 2^-126   1     0000 0001  010 1100 0000 0000 0000
+1.0 x 2^127        0     1111 1110  000 0000 0000 0000 0000
+0                  0     0000 0000  000 0000 0000 0000 0000
-0                  1     0000 0000  000 0000 0000 0000 0000
+∞                  0     1111 1111  000 0000 0000 0000 0000
+1.0 x 2^-128       0     0000 0000  010 0000 0000 0000 0000
+NaN                0     1111 1111  011 0111 0000 0000 0000
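As a minimal sketch of the field layout, the following Python splits a 32-bit word into its sign, exponent and fraction fields and names the special encodings. The function names are hypothetical; the field widths and the excess-127 bias are those of the single-precision format described above.

```python
def decode_fields(word: int):
    """Split a 32-bit IEEE-754 single into (sign, biased exponent, fraction)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF   # stored in excess-127 form
    fraction = word & 0x7FFFFF       # 23 bits after the implicit leading one
    return sign, exponent, fraction


def classify(word: int) -> str:
    """Name the kind of value a 32-bit pattern encodes."""
    _, exponent, fraction = decode_fields(word)
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"


# +1.101 x 2^5 from the table: sign 0, exponent 1000 0100, fraction 101 000...
sign, exponent, fraction = decode_fields(0x42500000)
print(sign, exponent - 127, format(fraction, "023b"))
print(classify(0x7F800000), classify(0x00200000))
```

Only the "normal" case was within the scope of the FPU design; the other classes are shown to illustrate the encodings in the table.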
The debugging user interface was written in assembly code for the MC68HC11. It was designed to allow the user to input two 32-bit operands and an op-code. The operands must be represented by eight hexadecimal digits, each representing four bits of the 32-bit value. The interface was designed to only accept valid keyboard input (0-9,a-f,A-F) for the operands and (+, -, *, /) for the op-codes. A separate function had to be created to convert between ASCII and binary values. Values received from user input or as a FPU result are stored in memory on the chip at specified addresses. The following figure is a screen capture of the user interface after a user has keyed in their input and the result has been received.
The goal of the communications requirement was the successful transfer of data packets between the MC68HC11 and the FPU. The 8-bit parallel ports of the MC68HC11 were chosen as the communication medium and level sensitive, full handshaking was chosen as the desired protocol. Although the MC68HC11 offers less rigid handshaking protocols such as pulse mode handshaking, employing the rigid level sensitive protocol allowed for easier debugging of communications problems. This was due to the fact that pulses were difficult to detect when debugging using the instruments at the team’s disposal. Testing requirements were also considered in the two port implementation of the communications design. Separate dedicated input and output ports simplified the testing by removing any ambiguities about whether the data being observed was input or output data. This also eliminated any concerns about damage to the devices due to competition on the communication lines.
Since the handshaking protocol specifications were clearly outlined in the MC68HC11 reference manuals, the FPU component of the design consisted of implementing those clearly defined requirements.
The FPU communications design used a combination of register and control logic to accept the data that had been transferred from the MC68HC11 and send the appropriate acknowledgement signals. A finite state machine, coded in VHDL, was designed to keep track of the state of the FPU and respond to the MC68HC11 with appropriate handshaking signals. It was also made to generate appropriate control signals such that the values on the input lines were clocked into sign, exponent and mantissa registers for each operand. Similarly, when outputting to the MC68HC11, the finite state machine was made to generate appropriate control signals such that the correct bits were driven onto the output port.
The MC68HC11 side of the communications component involved writing a communications driver in assembly code that was custom fit to the FPU design. As previously discussed, the driver must configure the MC68HC11 appropriately and perform the correct handshaking algorithms. The final working algorithm used in the project was modeled after the specifications listed in the Motorola reference manuals and was further adjusted to the project's particular requirements.
The communication driver had two main component routines: send and receive. Both routines were programmed with a polling methodology rather than interrupts. Initially, the MC68HC11 is driven by the send routine. When all the data has been sent and confirmed by the FPU, the MC68HC11 switches to the receive routine. At this point the driver waits for all the required data. After all transactions are complete, the debugging user interface routine displays the result on the screen.
Assembly Code For MC68HC11
INCHAR   EQU   $ffcd
OUTSTRG  EQU   $ffc7
OUT1BYT  EQU   $ffbb
OUTA     EQU   $ffb8
ESC      EQU   27
CRLF     EQU   13
EOT      EQU   04
BKSPC    EQU   08
TAB      EQU   09
DATA1    EQU   $c500
OPADDR   EQU   $C504
DATA2    EQU   $C505
DATAEND  EQU   $c509
RESULT   EQU   $c509
REND     EQU   $c50d
PIOC     EQU   $1002
PORTB    EQU   $1004
PORTC    EQU   $1005
DDRC     EQU   $1007
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
         ORG   $c600
PROMPT   FCB   #CRLF
         FCC   "PLEASE USE THE FORM AAAAAAAA _ AAAAAAAA"
         FCB   #CRLF
         FCC   "WHERE _ CAN BE ANY VALID OPERATOR ( + - * / )"
         FCB   #CRLF
         FCC   "USE enter TO SUBMIT AND esc TO QUIT."
         FCB   #CRLF
         FCB   #CRLF
         FCC   " -> "
         FCB   #EOT
QPROMPT  FCC   "PROGRAM ABORTED."
         FCB   #EOT
DONE     FCC   " = "
         FCB   #EOT
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
         ORG   $C000
START    LDS   #$CFFF      ; init the stack pointer
;-----------------------------------------------------------
         LDX   #DATA1      ; zero the data space
         LDAA  #0
ZEROING  STA   0,X
         INX
         CPX   #REND
         BLO   ZEROING
;-----------------------------------------------------------
         LDX   #PROMPT     ; prompt the user for input
         JSR   OUTSTRG
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
         LDAA  #0          ; init the registers
         LDAB  #0
         LDX   #DATA1
         LDY   #0
;-----------------------------------------------------------
GETCHAR  JSR   INCHAR      ; try to get char
         CMPA  #0          ; did we get a char
         BEQ   GETCHAR     ; if not, try again
         CMPA  #TAB        ; look for illegal keys
         BEQ   QUIT
         CMPA  #BKSPC      ; look for illegal keys
         BEQ   QUIT
         CMPA  #ESC        ; quit if they hit escape
         BEQ   QUIT
         CMPA  #CRLF       ; finish if they hit enter
         BEQ   EEVAL
         CPX   #DATAEND    ; check if we're done
         BEQ   JUNK
         CPX   #OPADDR     ; check if we're getting the opcode
         BEQ   OPEVAL
         CPX   #DATA2      ; check for white space after the opcode
         BEQ   WEVAL
         BRA   GETHEX      ; else we expect a hex digit
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
QUIT     LDX   #QPROMPT    ; display the quitting prompt
         JSR   OUTSTRG
         JMP   START
;-----------------------------------------------------------
OPEVAL   CMPA  #'+'        ; check for valid opcodes
         BEQ   ADDOP
         CMPA  #'-'
         BEQ   SUBOP
         CMPA  #'*'
         BEQ   MULTOP
         CMPA  #'/'
         BEQ   DIVOP
         CMPA  #' '        ; allow leading spaces
         BEQ   GETCHAR
         BRA   JUNK        ; else it is junk
ADDOP    LDAA  #$08        ; set the opcode
         BRA   STORE
SUBOP    LDAA  #$09
         BRA   STORE
MULTOP   LDAA  #$0a
         BRA   STORE
DIVOP    LDAA  #$0b
         BRA   STORE
;-----------------------------------------------------------
WEVAL    CMPA  #' '        ; allow leading spaces
         BEQ   GETCHAR
         BRA   GETHEX      ; else we expect a hex digit
;-----------------------------------------------------------
EEVAL    CPX   #DATAEND    ; check if we're done
         BEQ   OPERATE
         JMP   QUIT        ; else quit
;-----------------------------------------------------------
JUNK     LDAA  #BKSPC
         JSR   OUTA        ; output a backspace
         LDAA  #' '
         JSR   OUTA        ; output a space to erase what's there
         LDAA  #BKSPC
         JSR   OUTA        ; output a backspace
         JMP   GETCHAR
;-----------------------------------------------------------
GETHEX   CMPA  #'0'        ; get the hex value of the char
         BLO   JUNK
         CMPA  #'9'
         BLS   NUMBER
         CMPA  #'A'
         BLO   JUNK
         CMPA  #'F'
         BLS   UCASE
         CMPA  #'a'
         BLO   JUNK
         CMPA  #'f'
         BLS   LCASE
         BRA   JUNK        ; else it is junk
NUMBER   SUBA  #$30        ; convert from ascii to hex
         BRA   MOD
UCASE    SUBA  #$37
         BRA   MOD
LCASE    SUBA  #$57
MOD      CPY   #0
         BEQ   HIGHEND
LOWEND   ABA               ; else we have the low-end so add high-end from B
         LDY   #0          ; reset the flag
         BRA   STORE
HIGHEND  LSLA              ; move the value into the high end
         LSLA
         LSLA
         LSLA
         TAB               ; store the result in B for use next cycle
         INY               ; set the flag
         JMP   GETCHAR
;-----------------------------------------------------------
STORE    STAA  0,X         ; store the data
         INX
         JMP   GETCHAR     ; get next chunk of data
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
OPERATE  JSR   ALTERA
         LDX   #DONE       ; display the result message
         JSR   OUTSTRG
         LDX   #RESULT     ; display the result
OUTR     JSR   OUT1BYT
         CPX   #REND
         BLO   OUTR
         JMP   START       ; back to the beginning
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
DEBUG    ;PSHX
         ;LDX  #REND
         ;STAA 0,X
         ;JSR  OUT1BYT
         ;PULX
         RTS               ; display the data being moved
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
WAIT     LDAA  PORTC       ; read port C to clear STAF bit
WLOOP    LDAA  PIOC        ; check to see if PORTC data has changed
         ANDA  #$80
         BEQ   WLOOP
         RTS               ; wait for response handshaking routine
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ALTERA   LDAA  #$00
         STAA  DDRC        ; set the direction of PORTC
         LDAA  #%00010011
         STAA  PIOC        ; configure PORTC for Handshaking
         LDAA  PORTC       ; read PORTC to clear STAF bit
;-------------------------------
OPCODE   LDAA  OPADDR      ; send op code
         STAA  PORTB
         JSR   DEBUG
         JSR   WAIT
;-------------------------------
         LDX   #DATA1      ; send first operand
OP1      LDAA  0,X
         STAA  PORTB
         JSR   DEBUG
         JSR   WAIT
         INX
         CPX   #OPADDR
         BNE   OP1
;-------------------------------
         LDX   #DATA2      ; send second operand
OP2      LDAA  0,X
         STAA  PORTB
         JSR   DEBUG
         JSR   WAIT
         INX
         CPX   #DATAEND
         BNE   OP2
;-------------------------------
         LDX   #RESULT     ; receive result
RES      JSR   WLOOP
         LDAA  PORTC       ; STRB is fired here
         STAA  0,X
         JSR   DEBUG
         INX
         CPX   #REND
         BNE   RES
         RTS
Floating point addition can be carried out on any combination of positive and/or negative operands; therefore, once an addition module exists, subtraction can be achieved simply by changing the sign of the second operand and passing it through the addition module.
The addition/subtraction module was designed as an asynchronous device with inputs and outputs for operands and result grouped as separate sign, exponent and mantissa lines. After the design of the other components, the addition/subtraction module was made synchronous solely for the purpose of outputting a ‘done’ signal two clock cycles after its inputs were made valid. This was required only to standardize the addition/subtraction component with the other components which were designed as synchronous machines.
The addition/subtraction design follows this standard sequential algorithm:
- The mantissa operation (addition or subtraction) is determined by the XNOR of the input signs.
- The mantissa of the operand with the lesser exponent is shifted right by the difference between the exponents.
- The operation is carried out on the mantissas.
- The resultant exponent is set equal to the larger input exponent minus the number of leading zeros in the resultant mantissa.
- The resultant mantissa is shifted left until there are no leading zeros, and this is used as the final result for the mantissa.
- The resultant sign is set to positive or negative based on the magnitudes of the shifted input mantissas and the determined operation.
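The same steps can be sketched in software. The Python below is a behavioral illustration only, not the VHDL used in the project: the function names are hypothetical, shifted-out bits are simply truncated (no guard bits or rounding, as in the project scope), special values and exponent overflow are ignored, and the carry-out handling for same-sign addition is an assumption, since the text above does not describe it.

```python
def unpack(word: int):
    """Return (sign, biased exponent, 24-bit mantissa with explicit leading one)."""
    return word >> 31, (word >> 23) & 0xFF, (word & 0x7FFFFF) | 0x800000


def pack(sign: int, exponent: int, mantissa: int) -> int:
    """Drop the explicit leading one and reassemble the 32-bit word."""
    return (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)


def fp_add(a: int, b: int, subtract: bool = False) -> int:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    if subtract:
        sb ^= 1                      # subtraction = addition with flipped sign
    # Align: shift the mantissa with the smaller exponent to the right.
    if ea >= eb:
        mb >>= ea - eb
        exponent = ea
    else:
        ma >>= eb - ea
        exponent = eb
    if sa == sb:                     # XNOR of signs is 1: add the mantissas
        sign, mantissa = sa, ma + mb
        if mantissa >> 24:           # assumed carry-out handling
            mantissa >>= 1
            exponent += 1
    else:                            # signs differ: subtract smaller from larger
        if ma >= mb:
            sign, mantissa = sa, ma - mb
        else:
            sign, mantissa = sb, mb - ma
        if mantissa == 0:
            return 0
        while not mantissa & 0x800000:   # normalize on leading zeros
            mantissa <<= 1
            exponent -= 1
    return pack(sign, exponent, mantissa)


print(format(fp_add(0x18360000, 0x1C120000), "08X"))
print(format(fp_add(0x18360000, 0x1C120000, subtract=True), "08X"))
```

With the X and Y operands from the test data in this report, the sketch gives 1C12B600 for the sum and 9C114A00 for the difference, which agree with the recorded PIPE OUT values.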
SUBTRACT PIPE IN 09 18 36 00 00 1C 12 00 00 PIPE OUT 9C 11 4A 00
The design of floating-point multiplication was more complicated because it must perform many additions. A design that summed all of the partial products of the mantissas in a single addition was made, but the hardware would not support such a device due to its complexity and the number of logic cells required; therefore, the following sequential synchronous algorithm was implemented instead. The resultant sign is set equal to the XOR of the input signs. The resultant exponent is set equal to the sum of the input exponents minus 127. The design must then deal with the mantissas. The first mantissa is used as the multiplicand and the second as the multiplier. The current bit of the multiplier is checked to see if it is a one or a zero. If the bit is a one, the multiplicand is added to the product and then shifted one space to the left. If the bit is a zero, the multiplicand is simply shifted and a row of zeros is added to the product. This design requires twenty-four cycles because there are twenty-four checks and additions in the mantissa and each requires one clock cycle. The design then gives the mantissa result, the exponent result and the sign bit. As in addition/subtraction, the resultant mantissa and exponent are normalized based on leading zeros in the mantissa.
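A shift-and-add multiplier of this kind can be sketched as follows. This Python is a behavioral illustration under stated assumptions, not a bit-exact model of the circuit: the function names are hypothetical, the full 48-bit product is kept and then truncated, and special values and exponent overflow are ignored.

```python
def unpack(word: int):
    """Return (sign, biased exponent, 24-bit mantissa with explicit leading one)."""
    return word >> 31, (word >> 23) & 0xFF, (word & 0x7FFFFF) | 0x800000


def pack(sign: int, exponent: int, mantissa: int) -> int:
    """Drop the explicit leading one and reassemble the 32-bit word."""
    return (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)


def fp_mul(a: int, b: int) -> int:
    sa, ea, multiplicand = unpack(a)
    sb, eb, multiplier = unpack(b)
    sign = sa ^ sb                   # XOR of the input signs
    exponent = ea + eb - 127         # remove one copy of the bias
    product = 0
    for step in range(24):           # one multiplier bit per clock cycle
        if (multiplier >> step) & 1:
            product += multiplicand << step
    # Two mantissas in [1, 2) give a product in [1, 4): normalize to 24 bits.
    if product >> 47:
        mantissa = product >> 24
        exponent += 1
    else:
        mantissa = product >> 23
    return pack(sign, exponent, mantissa)


print(format(fp_mul(0x18360000, 0x3F920000), "08X"))
```

For the X and Z operands from the test data this sketch prints 184F9800, while the recorded PIPE OUT value is 18 4F 98 01. The two differ only in the last mantissa bit, which is the kind of small discrepancy the report attributes to the absence of rounding.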
MULTIPLY PIPE IN 0A 18 36 00 00 3F 92 00 00 PIPE OUT 18 4F 98 01
The design of floating point division was more complicated still than floating point multiplication. First, the design shifts the first mantissa twelve spaces to the right and adds twelve to the corresponding exponent. The design then performs an XOR function on the sign bits. It then subtracts the exponents and adds one hundred and twenty-seven. The division of the mantissas follows the non-restoring division algorithm. Firstly, a row of 32 zero bits is named the dividend. The second mantissa is named the divisor, with zeros extended to the right. The dividend is shifted one space to the left and the twenty-fourth bit of the first mantissa is added to the zero bit of the dividend. The first mantissa is then shifted one space to the left. The twenty-fifth bit of the dividend is checked to see if it is a zero or a one. If it is a zero, the dividend is added to the divisor. If it is a one, the divisor is subtracted from the dividend. This forms a temporary answer. The first bit of the temporary answer is checked to see if it is a zero or a one. If it is a zero, a one is added to the last bit of the quotient. If it is a one, a zero is added to the last bit of the quotient. The quotient is then shifted one space to the left, ready for the next bit. The temporary answer now becomes the dividend. The process is then repeated 32 times. This process requires many more clock cycles because each cycle requires the result of the previous cycle as its input. After the process completes, the mantissa result, the exponent result and the sign bit are output.
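For comparison, a textbook form of non-restoring division can be sketched in Python. This is not the project's register-level design: it uses signed integers rather than 32-bit registers, subtracts the divisor when the partial remainder is non-negative and adds it back when negative (the usual textbook convention, which may be stated differently from the circuit's sign conventions above), omits the twelve-bit pre-shift, and truncates the quotient. All names are hypothetical.

```python
def nonrestoring_divide(dividend: int, divisor: int, bits: int):
    """Integer non-restoring division of a `bits`-wide unsigned dividend."""
    remainder, quotient = 0, 0
    for position in range(bits - 1, -1, -1):
        bit = (dividend >> position) & 1
        if remainder >= 0:
            remainder = (remainder << 1) + bit - divisor
        else:                        # do not restore: add on the next step
            remainder = (remainder << 1) + bit + divisor
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:                # single final correction
        remainder += divisor
    return quotient, remainder


def unpack(word: int):
    """Return (sign, biased exponent, 24-bit mantissa with explicit leading one)."""
    return word >> 31, (word >> 23) & 0xFF, (word & 0x7FFFFF) | 0x800000


def pack(sign: int, exponent: int, mantissa: int) -> int:
    """Drop the explicit leading one and reassemble the 32-bit word."""
    return (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)


def fp_div(a: int, b: int) -> int:
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    sign = sa ^ sb
    exponent = ea - eb + 127         # restore one copy of the bias
    quotient, _ = nonrestoring_divide(ma << 24, mb, 48)
    if quotient >> 24:               # ma >= mb: quotient is in [1, 2)
        mantissa = quotient >> 1
    else:                            # ma < mb: quotient is in (0.5, 1)
        mantissa = quotient
        exponent -= 1
    return pack(sign, exponent, mantissa)


print(nonrestoring_divide(100, 7, 7))
print(format(fp_div(0x19FF0000, 0x18010000), "08X"))
```

With the Q and R operands from the test data, the sketch gives 417D05F4, which agrees with the recorded PIPE OUT value for the divide test.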
DIVIDE PIPE IN 0B 19 FF 00 00 18 01 00 00 PIPE OUT 41 7D 05 F4
Very minimal testing was required for the debugging user interface. The debugging interface was designed to only accept valid keyboard inputs. This was manually verified, as were the conversion algorithms between ASCII and binary. This ensured that the screen output was an actual representation of the data sent and received on the port lines.
More detailed testing was required for the communications protocol. The communications protocol was independently verified during design, first by software simulation, then by hardware simulation. This was done with the two devices independently. In order to facilitate hardware simulation, the device clocks were slowed and DIP switches were used as inputs while LEDs were used as outputs. Only minor changes were required when the devices were actually linked in hardware. The nature of the communications protocol was such that it either functioned correctly or not at all; therefore, after initial success, only minor testing was carried out to verify that it operated correctly in all expected environments.
Several test numbers were generated and analyzed by hand. These were shown in several representations, including those that would be observed traveling over the communication lines. During the design of each arithmetic operation, these values were used as the test numbers, and success was assumed when the simulation results matched the expected results. Testing of each arithmetic operation was carried out only against one such dataset due to the difficulty of calculating expected results by hand. These simulations were carried out using only Max+plus II simulation software, but were expected to be an accurate representation of the hardware’s behavior.
OPCODES
ADD       08  0000 1000
SUBTRACT  09  0000 1001
MULTIPLY  0A  0000 1010
DIVIDE    0B  0000 1011

INPUT NUMBERS
X_sign                0         0
X_exponent_excess127  30        00110000
X_mantessa_explicit   B60000    10110110 0000 0000 0000 0000
X_pipe                18360000  0001100000110110 0000 0000 0000 0000
Y_sign                0         0
Y_exponent_excess127  38        00111000
Y_mantessa_explicit   920000    10010010 0000 0000 0000 0000
Y_pipe                1C120000  0001110000010010 0000 0000 0000 0000
Z_sign                0         0
Z_exponent_excess127  7F        01111111
Z_mantessa_explicit   920000    10010010 0000 0000 0000 0000
Z_pipe                3F920000  0011111110010010 0000 0000 0000 0000
Q_sign                0         0
Q_exponent_excess127  33        00110011
Q_mantessa_explicit   FF0000    11111111 0000 0000 0000 0000
Q_pipe                19FF0000  0001100111111111 0000 0000 0000 0000
R_sign                0         0
R_exponent_excess127  30        00110000
R_mantessa_explicit   810000    10000001 0000 0000 0000 0000
R_pipe                18010000  0001100000000001 0000 0000 0000 0000

DEFINE
A = X + Y
S = X - Y
M = X * Z
D = Q / R

A_sign                0         0
A_exponent_excess127  38        00111000
A_mantessa_explicit   92B600    10010010 1011 0110 0000 0000
A_pipe                1C12B600  0001110000010010 1011 0110 0000 0000
S_sign                1         1
S_exponent_excess127  38        00111000
S_mantessa_explicit   914A00    10010001 0100 1010 0000 0000
S_pipe                9C114A00  1001110000010001 0100 1010 0000 0000
M_sign                0         0
M_exponent_excess127  30        00110000
M_mantessa_explicit   CF9800    11001111 1001 1000 0000 0001
M_pipe                184F9801  0001100001001111 1001 1000 0000 0001
D_sign                0         0
D_exponent_excess127  82        10000010
D_mantessa_explicit   FD05F4    11111101 0000 0101 1111 0100
D_pipe                417D05F4  0100000101111101 0000 0101 1111 0100

TESTS
ADD       PIPE IN 08 18 36 00 00 1C 12 00 00  PIPE OUT 1C 12 B6 00
SUBTRACT  PIPE IN 09 18 36 00 00 1C 12 00 00  PIPE OUT 9C 11 4A 00
MULTIPLY  PIPE IN 0A 18 36 00 00 3F 92 00 00  PIPE OUT 18 4F 98 01
DIVIDE    PIPE IN 0B 19 FF 00 00 18 01 00 00  PIPE OUT 41 7D 05 F4
An independent Visual Basic 6.0 application was designed in order to facilitate rigorous physical testing of the final hardware model. Its graphical user interface is shown in the figure below.
This application has many features and is capable of automatically running the debugging interface of the MC68HC11; however, it requires specific initialization as follows:
- create a new HyperTerminal session of the name "fpu"
- connect to the MC68HC11
- load the MC68HC11 software interface
- begin execution of the MC68HC11 program
The Generate button creates a random input string (operand op-code operand) in the upper left text box, and shows its representation in the boxes at the bottom of the window. The Calculate button does the same but uses the manually entered string from the boxes directly below it. The Generate button only uses the operations specified in the box to its right and also outputs the expected result directly below the generated string. Both buttons are associated with a Send Keys button, which automatically sends the operation to the MC68HC11; the MC68HC11 in turn outputs the result calculated by the FPU. The Turn button converts a hex string (in floating point format) into its decimal equivalent. The results from the FPU can be put into the box below the Turn button and converted in order to see if they match the expected results. The Generate button also has a Rolling Send option which simply repeats the function the number of times specified in the box to its right. The Stop button stops the rolling send.
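The kind of check this tool automates can be sketched with Python's standard `struct` module. This is not the Visual Basic application itself: the function names are hypothetical, the expected value is computed on the host in double precision and then rounded to single precision, and the bit-pattern distance is only meaningful for two finite results of the same sign.

```python
import operator
import struct

OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}


def hex_to_float(word_hex: str) -> float:
    """Interpret eight hex digits as a big-endian IEEE-754 single."""
    return struct.unpack(">f", bytes.fromhex(word_hex))[0]


def float_to_hex(value: float) -> str:
    """Round a Python float to single precision and show its bit pattern."""
    return struct.pack(">f", value).hex().upper()


def expected_result(a_hex: str, op: str, b_hex: str) -> str:
    """Host-side reference result for 'a op b', as eight hex digits."""
    value = OPERATIONS[op](hex_to_float(a_hex), hex_to_float(b_hex))
    return float_to_hex(value)


def pattern_distance(x_hex: str, y_hex: str) -> int:
    """Difference between two same-sign results, in last-bit units."""
    return abs(int(x_hex, 16) - int(y_hex, 16))


reference = expected_result("18360000", "*", "3F920000")
print(reference, pattern_distance(reference, "184F9801"))
print(hex_to_float("3F920000"))
```

Comparing by distance rather than strict equality lets a verification run separate last-bit differences, which the report attributes to the lack of rounding, from genuinely wrong results.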
This application was used to rigorously test the hardware model for all four arithmetic operations. This revealed some timing problems with multiplication and division, and a precision error with the division algorithm. After these design errors were corrected the testing verified all expected behavior. It is important to note that some test cases gave imprecise or incorrect results. This was expected as a result of the limited scope of this project. These inconsistencies were the result of the lack of rounding and overflow control.
Given that the testing verified the project design, there remained several issues worth consideration. The IEEE-754 standard calls for the use of guard bits, which were not within the scope of this project. Guard bits are used when performing rounding, which is also beyond the scope of this project. As a result, there were small discrepancies with certain numbers in the testing; these were attributed to the limited scope of the project rather than to flaws in the design. It is also important to note that overflow handling was beyond the scope of this project, so test numbers resulting in overflow do not imply design flaws.
There are numerous promising possible expansions on this project such as: overflow trapping, guard bit compatibility, implementation of special numbers such as infinity and NaN, refinement of division and multiplication algorithms for faster response, convolution, integration or other complex functions.