LG 10/18/2007 Updated 12/26/2007

## Addendum to the PIXEL read-out section of the STAR HFT proposal

This document is intended as an update to sections 4.12 - 4.19 of the Heavy Flavor Tracker for the STAR proposal to the DOE. Recent changes to the development path of the MAPS sensors have resulted in a plan which delivers sensors with enhanced capabilities as compared to the ones described in the original proposal. With these new capabilities comes a new set of requirements for the readout system. In this document we describe the newly agreed sensor development path and the accompanying readout system, and provide a preliminary system analysis. This document supersedes the sensor readout and development discussion and analysis presented in sections 4.12 - 4.19 of the HFT proposal.

## **Development and Deployment Plan**

We intend to approach the completion of the final PIXEL detector for STAR as a two-stage development process with the readout system requirements tied to the stages of the sensor development effort at IPHC. In the new development path, the first available set of prototype sensors will have digital outputs and a 640  $\mu$ s integration time. We will use these sensor prototypes to construct a limited prototype detector system for deployment at the STAR detector during the summer of 2010. This prototype system will employ the mechanical design to be used for the final PIXEL detector as well as a readout system that is designed to be a prototype for the final readout system, which is expected to be deployed with the final PIXEL sensors in a complete detector in the 2012 time frame.

#### MAPS Sensor Development at IPHC

The initial sensor development path for the PIXEL detector sensors was tailored to follow the development path of the technology as it was set by the IPHC group. In this path, MAPS sensors with multiplexed serial analog outputs in a rolling shutter configuration were envisioned as the first generation of sensors for a prototype or demonstrator patch of the PIXEL detector, to be followed by a more advanced final, or ultimate, sensor with digital output(s). This path is well described in the previous RDO section of the proposal. The new sensor development path moves to digital binary readout from MAPS with fine-grained threshold discrimination, on-chip correlated double sampling (CDS) and a fast serial LVDS readout. A diagram showing the current development path, together with the attendant evolution of the processing and readout requirements, is shown in Figure 1.



Figure 1 Diagram showing the sensor development path of sensors for the STAR PIXEL detector at IPHC in Strasbourg, France. The readout data processing required is shown as a function of sensor generation. The first generation Mimostar sensors are read out via a rolling shutter type analog output. The next generation Phase-1 sensor integrates CDS and a column level discriminator to give a rolling shutter binary readout with a 640  $\mu$ s integration time. The final generation Ultimate sensor integrates data sparsification and lowers the readout time to < 200  $\mu$ s.

The Mimostar series sensors are the generation of sensors that have been fabricated and tested. These are 50 MHz multiplexed analog readout sensors with  $30\mu m \times 30\mu m$  pixels in variously sized arrays depending on generation. This generation has been tested and characterized and, with the exception of some yield issues, appears to be well understood. These sensors are well described in the existing RDO section of the proposal.

The next generation is named "Phase-1". This sensor will be based on the Mimosa-8 and Mimosa-16 sensors and will contain on-chip correlated double sampling and column level discriminators providing digital outputs in a rolling shutter configuration. The Phase-1 will be a full sized  $640 \times 640$  array resulting in a full 2 cm  $\times$  2 cm sensor size. In order to achieve a  $640 \mu s$  integration time, the Phase-1 sensor will be equipped with four LVDS outputs running at 160 MHz. The first delivery of wafers of this sensor design is expected in late 2008.
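The relation between array size, number of outputs, clock frequency and integration time quoted above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each LVDS output carries one pixel bit per clock cycle and that the array is divided evenly among the outputs.

```python
def frame_readout_time_us(n_rows, n_cols, n_outputs, clock_hz):
    """Time to shift one full frame out of the sensor, in microseconds.

    Assumption: one binary pixel value per output per clock cycle,
    with the pixels shared evenly among the outputs.
    """
    pixels_per_output = n_rows * n_cols / n_outputs
    return pixels_per_output / clock_hz * 1e6


# Phase-1 values from the text: 640 x 640 array, 4 outputs, 160 MHz clock.
phase1_us = frame_readout_time_us(640, 640, 4, 160e6)
print(f"Phase-1 frame readout time: {phase1_us:.0f} us")
```

Under these assumptions the four 160 MHz outputs are just sufficient to read the 409,600 pixels in the 640 µs integration time.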

The final sensor is named "Ultimate". The Ultimate sensor contains all of the attributes of the Phase-1 sensor with the pixel sub-arrays clocked faster to give a  $<200 \ \mu s$  integration time and the integration of a run length encoding based data sparsification and zero suppression circuit. There is one data output from the sensor and the data rates are low thanks to the newly included data sparsification circuitry. The first prototypes of this design are expected to be delivered in the 2010 time frame.

#### **Sensor Series Specifications**

The specifications of the sensors under development are shown in the table below.

|                        | Phase-1                                              | Ultimate                                                                                                                              |
|------------------------|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| Pixel Size             | 30 µm x 30 µm                                        | 30 µm x 30 µm                                                                                                                         |
| Array size             | 640 x 640                                            | 640 x 640                                                                                                                             |
| Active area            | ~ 2 x 2 cm                                           | ~ 2 x 2 cm                                                                                                                            |
| Frame integration time | 640 µs                                               | 100 – 200 µs                                                                                                                          |
| Noise after CDS        | 10 e-                                                | 10 e-                                                                                                                                 |
| Readout time / sensor  | 640 µs                                               | 100 – 200 µs                                                                                                                          |
| Outputs / sensor       | 4                                                    | 1                                                                                                                                     |
| Operating mode         | Rolling shutter with all pixels read out.            | Rolling shutter with data sparsification.                                                                                             |
| Output type            | Digital binary pixel based on threshold crossing.    | Digital addresses of hit pixels with run length encoding and zero suppression. Frame boundary marker is also included.               |

 Table 1 Specifications of the Phase-1 and Ultimate sensors.

The Phase-1 is a fully functional design prototype for the Ultimate sensor, which results in the Phase-1 and Ultimate sensors having very similar physical characteristics. After successful development and production of the Phase-1 sensors, a data sparsification system currently under development at IPHC will be integrated with the Phase-1 design. With the additional enhancement of design changes allowing for faster clocking of the sub-arrays, the resulting sensor is expected to be used in the final PIXEL detector. In addition to the specifications listed above, both sensors will have the following characteristics:

- Marker for first pixel.
- JTAG-selectable test output pattern for binary readout troubleshooting (at least 2 alternating patterns).
- Independent JTAG-settable thresholds.
- Radiation tolerant pixel design.
- Minimum of 3 fiducial marks / sensor for optical survey purposes.
- All bonding pads located along 1 side of the sensor.
- Two bonding pads per I/O of the sensor to facilitate probe testing before sensor mounting.

#### Architecture for the Phase-1 Sensor System

The requirements for the Phase-1 prototype and final readout systems are very similar. They include:

- A triggered detector system that fits into the existing STAR infrastructure and interfaces to the existing Trigger and DAQ systems.
- Delivery of full frame events to STAR DAQ for event building at approximately the same rate as the TPC (~ 1 kHz for the STAR DAQ1K upgrade).
- Reduction of the total data rate of the detector to a manageable level (< TPC rate).

We have designed the prototype data acquisition system to read out the large body of data from the Phase-1 sensors at high speed, to perform data compression, and to deliver the sparsified data to an event building and storage device.

The proposed architecture for the readout of the Phase-1 prototype system is shown in Figure 2 with the physical location and separation of the system blocks shown in Figure 3.




Figure 2 Functional block schematic for the readout for the Phase-1 prototype system. The detector ladders and accompanying readout system have a highly parallel architecture. One system unit of sensor array / readout chain is shown. There are ten parallel sensor array / readout chain units in the full system.



Figure 3 Physical layout of the readout system blocks. This layout will be the same for both the Phase-1 based patch and the final PIXEL detector system.

The architecture of the readout system is highly parallel. Each independent readout chain consists of a four-ladder mechanical carrier unit, with each ladder containing ten Phase-1 sensors. The current plan is to install a patch of Phase-1 sensors consisting of at least two carrier units mounted with the final mechanical positioning structure and positioned with a 120 degree separation. The readout system will be described as if all carriers will be installed since this architecture also extends to the final PIXEL system.

The basic flow of a ladder data path starts with the APS sensors. A PIXEL ladder contains 10 Phase-1 APS sensors, each with a  $640 \times 640$  pixel array. Each sensor contains four separate digital LVDS outputs. The sensors are clocked continuously at 160 MHz and the digital data containing the pixel threshold crossing information is read out, running serially through all the pixels in the sub-array. This operation is continuous during the operation of the Phase-1 detectors on the PIXEL ladder. The LVDS digital data is carried from the four 160 MHz outputs in each sensor in parallel on a low mass flex printed circuit board to discrete LVDS buffers located at the end of the ladder and out of the low mass detector region. This electronics portion of the ladder also contains the buffers and drivers for the clocks and other control signals needed for ladder operation.

Each Phase-1 sensor requires a JTAG connection for register based configuration, power, ground, a 160 MHz readout clock and a synchronization signal to begin the readout. These signals and latch-up protected power as well as the LVDS outputs and synchronization and marker signals from the detectors are carried via low mass twisted pair cables from the discrete electronics at the end of the ladder to a power / mass-termination board located approximately 1 meter from the PIXEL ladders. There is one readout board per PIXEL carrier (40 sensors). A diagram of a ladder is shown in Figure 4.



Figure 4 Assembly of sensors on a low radiation length kapton flex cable with aluminum conductors. The sensors are connected to the cable with bond wires along one edge of the ladder.

The flex cable parameters are shown below:

- 4 layer 150 micron thickness
- Aluminum Conductors
- Radiation Length ~ 0.1 %
- 40 LVDS pair signal traces
- Clock, JTAG, sync, marker traces.

The connection to the driver end of the ladders will be made with very fine 150  $\mu$ m diameter twisted pair wire soldered to the cable ends. These wires are also very low stiffness to avoid introducing stresses and distortions into the mechanical structure. The other ends of these fine twisted pair wires will be mass terminated to allow connection to the Power / Mass-termination (PM) board located approximately 1 meter away.

Latch-up protected power is provided to the sensors from the PM boards. Each ladder has independently regulated power with latch-up detection circuitry provided by a power daughter card that plugs into the PM board. There are four regulation and latch-up daughter cards per PM board and a total of ten PM boards are needed for the complete detector system readout. A block diagram for the PM board is shown in Figure 5.



Figure 5 Power and mass-termination board block diagram. The digital signals to and from the sensors are routed through the main board and carried to mass termination connectors for routing to the readout boards. Latch-up protected power regulation is provided to each ladder by a power daughter card mounted to the main board. The main power supplies are located in the STAR racks.

The digital sensor output signals are carried, along with a 160 MHz clock, from the PM board to the readout boards (RDO), which are mounted on the magnet iron of the STAR magnet structure approximately 6 meters away. A diagram describing the attributes of the two PCBs that make up the RDO system can be seen in Figure 6. A functional block diagram of the RDO can be seen in Figure 7.



Figure 6 Readout board(s). The readout system consists of two boards per carrier of 40 sensors. A commercial Xilinx Virtex-5 development board is mated to a custom motherboard that provides all of the I/O functions including receiving and buffering the sensor data outputs, receiving the trigger from STAR and sending the built events to a STAR DAQ receiver PC via fiber optic connection.



Figure 7 Functional block diagram of the data flow on the RDO boards.

The RDO boards are based on a fast Xilinx Virtex-5 FPGA development board which is mated to a custom motherboard that provides LVDS buffering into the FPGA, the STAR trigger input, PMC connectors for mounting the CERN developed fiber optic Detector Data Link (DDL), SRAM, and various ADCs and I/O to be used in testing. The data processing path is as follows. The sensor output signals are buffered and then fed into the FPGA. In the FPGA the data is resorted to give a raster scan, after which hits registered on pixels are converted to pixel addresses using an address counter. This mechanism of zero suppression, the conversion of hits to addresses in a relatively low multiplicity environment, is the main mechanism for data reduction used in this readout system. The efficiency and accidental rate of a simple threshold on pixel signal is shown in Figure 8.
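As an illustration of the hit-to-address conversion described above, the following minimal Python sketch walks a raster-ordered stream of discriminator bits with an address counter and keeps only the addresses of hit pixels. The function names, the list-based frame representation and the row-major address convention are assumptions made for illustration; the actual logic is implemented in FPGA firmware.

```python
N_ROWS = 640  # Phase-1 array size from Table 1
N_COLS = 640


def hits_to_addresses(frame_bits):
    """Zero suppression: keep only the addresses of pixels over threshold.

    frame_bits is an iterable of 0/1 values in raster-scan order.
    enumerate() plays the role of the address counter.
    """
    return [address for address, bit in enumerate(frame_bits) if bit]


def address_to_row_col(address, n_cols=N_COLS):
    """Recover (row, column), assuming a row-major raster scan."""
    return divmod(address, n_cols)


# A nearly empty frame with three hit pixels.
frame = [0] * (N_ROWS * N_COLS)
for hit in (0, 641, N_ROWS * N_COLS - 1):
    frame[hit] = 1

addresses = hits_to_addresses(frame)
print(addresses, [address_to_row_col(a) for a in addresses])
```

In a low multiplicity frame the list of addresses is far shorter than the 409,600-bit frame, and the largest address of a 640 × 640 array fits within a 20-bit word.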



Figure 8 Efficiency and fake hit rate for a simple threshold cut on pixel signal level. This figure is obtained from beam data taken with Mimostar-2 sensors.

When a trigger is received, one of a bank of event buffers is enabled for one frame (409,600 pixels). After the frame has been recorded in the event buffer, the results of that frame are sent to an event builder. The event builder gathers all of the addresses on the RDO from that trigger and builds them into an event which is then passed via fiber optic links to the STAR DAQ receiver PCs. We intend to use the Source Interface Unit (SIU) and Readout Receiver Cards (RORC) developed for ALICE as our optical link hardware to transfer data to and from the STAR DAQ system. These links have been chosen as the primary readout connections for the new STAR TPC FEE. Leveraging existing hardware and expertise in STAR allows for a faster and more reliable design than developing our own custom solution. The complete system consists of a parallel set of carrier (4 ladder / carrier) readouts consisting of 10 separate chains. A system level functionality block diagram is shown in Figure 9.



Figure 9 System level functionality diagram of the readout of the PIXEL sensors. One of ten parallel readout chains is shown.

#### Data Synchronization, Readout and Latency

The readout of the prototype PIXEL sensors is continuous and hit-to-address processing is always in operation during the normal running of the detector. The receipt of a trigger initiates the saving of the found hit addresses into an event buffer for 1 frame (409,600 pixels). The PIXEL detector as a whole will be triggered via the standard STAR TCD module. Since 640 µs are required to read out the complete frame of interest, the data will be passed to DAQ for event building ~  $640 \,\mu s$  after the trigger is received. We will provide for multiple buffers that will allow the capture of temporally overlapping complete frames. This will allow us to service multiple triggers within the 640 µs readout time of the sensor. In this system, the hit address data is fanned out to 10 event buffers. A separate event buffer is enabled for the duration of one frame upon the receipt of a trigger from the TCD. Subsequent triggers enable additional event buffers until all of the event buffers are full and the system goes busy. The resulting separate complete frames are then passed to the event builder as they are completed in the event buffers. This multiple stream buffering gives a system that can be triggered at a rate above the expected average rate of the STAR TPC (approximately 1 kHz) after the DAQ1K upgrade. Furthermore, since the addition of buffers is external to the sensors, the capability for the addition of large amounts of fast SRAM will be included in the RDO board design allowing for flexibility in our readout system configuration. This multiple event buffer architecture will result in the duplication of some data in frames that overlap in time, but our data rate is low and the duplication of some data allows for contiguous event building in the STAR DAQ, which greatly eases the offline analysis. In addition, synchronization between the ladders/boards must be maintained. The PIXEL will receive triggers and the STAR clock via the standard STAR Trigger and Clock Distribution module (TCD). We will provide functionality to allow the motherboards to be synchronized at startup and at any point thereafter.
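The multiple buffer scheme described above can be sketched as a small bookkeeping model. The Python below is an illustration under simplifying assumptions, not the firmware design: the function name is hypothetical, each accepted trigger is assumed to occupy one event buffer for exactly one 640 µs frame, and a buffer is assumed to become available again as soon as its frame is complete (the time needed to drain it into the event builder is ignored).

```python
def simulate_event_buffers(trigger_times_us, n_buffers=10, frame_us=640.0):
    """Return one accept/busy flag per trigger.

    Assumptions (illustrative only): a buffer is held for one full
    frame after the trigger and is free again immediately afterwards.
    """
    busy_until = []  # completion times of buffers that are recording
    accepted = []
    for t in sorted(trigger_times_us):
        # Release buffers whose frame has completed by time t.
        busy_until = [end for end in busy_until if end > t]
        if len(busy_until) < n_buffers:
            busy_until.append(t + frame_us)
            accepted.append(True)
        else:
            accepted.append(False)  # all buffers full: system is busy
    return accepted


# Twelve triggers spaced by 50 us all fall inside a single frame time,
# so only the first ten find a free buffer in this model.
burst = [50.0 * i for i in range(12)]
print(simulate_event_buffers(burst))
```

In this simplified model, ten buffers are enough to absorb a burst of eight triggers separated by 50 µs, while triggers arriving when all ten buffers are recording are flagged as busy.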

## **Triggering Considerations**

The primary tracking detector of the STAR experiment is the TPC, with the Heavy Flavor Tracker upgrade designed to add high resolution vertex information. The PIXEL detector is part of a larger group of detectors that make up the HFT upgrade at STAR. The other tracking detector components of the HFT include the SSD and the IST. Since the HFT is a system of detectors, in order to maximize efficiency, the trigger response and dead time characteristics of each detector in the HFT system should be matched, as much as possible, to the others. As the main detector, the post DAQ-1K TPC sets the effective standard for the other detectors in the system. In the current understanding of the system, the PIXEL detector information is only useful in conjunction with the external tracking detectors and thus the PIXEL detector will only be triggered when the TPC is triggered.

The triggers in STAR are produced essentially randomly with a 110 ns crossing clock spacing. The behavior of the TPC is to go dead for 50  $\mu$ s following the receipt of a trigger. This means that the TPC, and by extension the PIXEL detector, will receive random triggers spaced by a minimum of 50  $\mu$ s. An additional constraint is imposed by the fact that the DAQ 1K contains 8 buffers at the front end. This allows for the capability of the TPC to take a quick succession of 8 triggers (separated by 50  $\mu$ s) but then the TPC will go busy until the data has been transferred and buffers cleared. The time required for this depends on the event size. (Some of these numbers can be found at http://drupal.star.bnl.gov/STAR/daq1000-capabilities others are private communication with Tonko Ljubicic). This behavior provides the basis for the assessment of the trigger response characteristics of the detectors in the HFT system. In general, HFT detector readout systems should provide for the acquisition of up to 8 successive triggers separated by 50  $\mu$ s with some, as yet uncharacterized, clearing time. The goal is to have the HFT detectors "live" whenever the TPC is "live". In appendix 1 we show some analysis of the trigger response characteristics of the PIXEL detector.

#### System Performance for the Phase-1 Prototype Sensor System

The raw binary data rate from each Phase-1 sensor is 80 MB / s. For the 400 sensors that make up the PIXEL detector this corresponds to 32 GB / s. This raw data rate must clearly be reduced to allow integration into the overall STAR data flow. Zero suppression by saving only addresses of hit pixels is the main mechanism for data volume reduction. The parameters used to calculate the data rates are shown below in Table 2.

| Item                                     | Number             |
|------------------------------------------|--------------------|
| Bits/address                             | 20                 |
| Integration time                         | 640 µs             |
| Luminosity                               | $3 \times 10^{27}$ |
| Hits / frame on Inner sensors (r=2.5 cm) | 295                |
| Hits / frame on Outer sensors (r=8.0 cm) | 29                 |
| Phase-1 sensors (Inner ladders)          | 100                |
| Phase-1 sensors (Outer ladders)          | 300                |
| Event format overhead                    | TBD                |
| Average Pixels / Cluster                 | 2.5                |
| Average Trigger rate                     | 1 kHz              |

Table 2 Parameters used to calculate data rates from a Phase-1 based system.

Based on the parameters given above, the average event size (address only) from the sensors in the prototype Phase-1 detector is 237 kB / event, which at the average trigger rate of 1 kHz gives an average data rate of 237 MB / s. It is possible to reduce the data rate further using a run length encoding scheme on the addresses as they are passed from the event buffer to the event builder as indicated in Figure 7. We are currently investigating this option, though the data rate reduction from this approach is expected to be moderate. The raw data rate reduction from the hit pixel to address conversion is given graphically below as Figure 10.



Figure 10 Data rate reduction in the Phase-1 readout system.
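The raw data rate and the overall reduction factor can be cross-checked with a few lines of arithmetic. The sketch below is illustrative: the function name is hypothetical, it assumes one bit per LVDS output per clock cycle, and it takes the 237 MB / s address data rate directly from the text rather than recomputing it.

```python
def raw_rate_bytes_per_s(n_outputs, clock_hz):
    """Raw binary rate of one sensor, assuming 1 bit per output per clock."""
    return n_outputs * clock_hz / 8


per_sensor = raw_rate_bytes_per_s(n_outputs=4, clock_hz=160e6)
detector = 400 * per_sensor   # 400 sensors in the full PIXEL detector
address_rate = 237e6          # address-only rate quoted in the text
reduction = detector / address_rate

print(f"per sensor: {per_sensor / 1e6:.0f} MB/s")
print(f"detector:   {detector / 1e9:.0f} GB/s")
print(f"reduction:  ~{reduction:.0f}x")
```

With these inputs the hit-to-address conversion reduces the data rate by a factor of roughly 135.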

#### Architecture for the Ultimate Sensor System

The most significant difference between the Phase-1 and Ultimate sensors is the integration of zero suppression circuitry on the sensor. The Ultimate sensors provide zero suppressed sparsified data with one LVDS output line per sensor. In addition, the sub-frame arrays are clocked faster to give a <200  $\mu$ s integration time and a frame boundary marker is added to the data stream to allow for the demarcation of frame boundaries in the absence of hits in the sensor and to allow for synchronization with the RDO system. The upgrade from the Phase-1 to the Ultimate sensors in the system is expected to involve the fabrication of new sensor ladders using the same mechanical design used in Phase-1 but with new Ultimate series sensors and a redesigned kapton readout cable. The Ultimate sensor kapton readout cable will require significantly fewer traces for readout (10 LVDS pairs instead of 40) and the new cable design should have a lower radiation length. The task of reading out the Ultimate series sensors is actually less challenging than the readout of the Phase-1 sensors since the data reduction functionality is included in the sensor. **The readout hardware described above for the Phase-1 readout system remains the same for the Ultimate readout system.** Some reconfiguration of the functionality in the FPGA is required for readout of the Ultimate sensor PIXEL detector. A functional block diagram for the RDO boards is shown in Figure 11.



Figure 11 Functional block diagram of the RDO boards for the readout of the Ultimate sensor based PIXEL detector.

The Ultimate sensor operates in the same rolling shutter readout mode as the Phase-1 sensor. The address data clocked out of the Ultimate chip has understood latencies that we will use to keep track of triggered frame boundaries and will be able to verify using synchronization markers from the sensors. The first pixel marker from the sensor corresponds to the actual scan of pixels through the sensor. The frame boundary marker delineates frame boundaries in the sparsification system on the sensor. Using this information and knowing the internal latencies in the sensor, we can generate the internal logic in the FPGA to implement the same multiple buffering technique that was previously described.

#### System Performance for the Ultimate Sensor System

The parameters used to calculate the data rates for this system are shown below in Table 3.

| Item                                     | Number             |
|------------------------------------------|--------------------|
| Bits/address                             | 20                 |
| Integration time                         | 200 µs             |
| Luminosity                               | $8 \times 10^{27}$ |
| Hits / frame on Inner sensors (r=2.5 cm) | 246                |
| Hits / frame on Outer sensors (r=8.0 cm) | 24                 |
| Ultimate sensors (Inner ladders)         | 100                |
| Ultimate sensors (Outer ladders)         | 300                |
| Event format overhead                    | TBD                |
| Average Pixels / Cluster                 | 2.5                |
| Average Trigger rate                     | 1 kHz              |

Table 3 Parameters used to calculate data rates from an Ultimate sensor based system.

From these parameters, we calculate an average event size of 199 kB giving an address data rate of 199 MB / s from the Ultimate sensor based PIXEL detector.
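The event size quoted above can be reproduced from the Table 3 parameters. The following sketch is a cross-check under stated assumptions: the function name is hypothetical, the event format overhead (listed as TBD) is ignored, and 1 kB is taken as 1000 bytes.

```python
def address_event_size_bytes(sensor_groups, pixels_per_cluster=2.5,
                             bits_per_address=20):
    """Average address-only event size in bytes.

    sensor_groups is a list of (number of sensors, hits per frame per
    sensor) pairs. Event format overhead is not included.
    """
    hits = sum(n_sensors * hits_per_frame
               for n_sensors, hits_per_frame in sensor_groups)
    return hits * pixels_per_cluster * bits_per_address / 8


# Table 3: 100 inner sensors with 246 hits, 300 outer sensors with 24 hits.
ultimate_bytes = address_event_size_bytes([(100, 246), (300, 24)])
trigger_rate_hz = 1000  # average trigger rate from Table 3
ultimate_rate = ultimate_bytes * trigger_rate_hz

print(f"event size: {ultimate_bytes / 1e3:.0f} kB")
print(f"data rate:  {ultimate_rate / 1e6:.0f} MB/s")
```

The same function can be applied to other hit densities or cluster sizes to see how the address data rate scales.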

A more detailed analysis of the readout chain including parameters such as the size of buffers and the internal FPGA functions is included as appendix 1.

## Appendix 1

# **Detailed System Description of the HFT PIXEL RDO System**

This document is an extension of the PIXEL RDO addendum to the HFT proposal. It is intended to give detailed parameters of the function of the PIXEL readout system that will allow for the understanding of the logic and memory requirements and the functionality of the readout system. We will present the designs of the Phase-1 and Ultimate readout systems under periodic triggering conditions. The simulation of the system response to random triggering of the type expected to be seen at the STAR experiment is ongoing and will be available upon completion. The readout design is highly parallel and one of the ten parallel readout systems is analyzed for each system.

## Phase-1 Readout Chain

The Phase-1 detector will consist of two carrier assemblies, each containing four ladders with ten sensors per ladder. The readout is via parallel identical chains of readout electronics. The relevant parameters from the RDO addendum are reproduced below.

| Item                                     | Number        |
|------------------------------------------|---------------|
| Bits/address                             | 20            |
| Integration time                         | 640 µs        |
| Hits / frame on Inner sensors (r=2.5 cm) | 295           |
| Hits / frame on Outer sensors (r=8.0 cm) | 29            |
| Phase-1 sensors (Inner ladders)          | 100           |
| Phase-1 sensors (Outer ladders)          | 300           |
| Event format overhead                    | TBD           |
| Average Pixels / Cluster                 | 2.5           |

Table 4 Parameters for the Phase-1 based detector system used in the example calculations shown below.

The functional schematic of the system under discussion is presented below.



Figure 12 Functional schematic diagram for one Phase-1 sensor based RDO board. Each RDO board services one inner ladder and 3 outer ladders. Each ladder contains 10 sensors.

We will show the system function for two cases. The first is for a periodic trigger rate of 1 kHz. The second is for a periodic trigger rate of 2 kHz. These cases make the scaling clear. In both cases we will use the average (pile-up included) event size. We are currently simulating the dynamic response of the system to the triggering and event size fluctuations seen at STAR and will make this information available after the simulations are completed. It is important to note that the system is FPGA based and can be easily reconfigured to maximize the performance by the adjustment of buffer sizes, memory allocations, and most other parameters. The relevant parameters of the system pictured above are described below:

<u>Data transfer into event buffers</u> – The binary hit data is presented to the address counter at 160 MHz. The corresponding hit address data from the address counter is read synchronously into the event buffers for one full frame of a  $640 \times 640$  sensor at 160 MHz. This corresponds to an event buffer enable time of  $640 \,\mu s$ .

<u>Event Buffers</u> – Each sensor output is connected to a block of memory in the FPGA which serves as the storage for the event buffers. Each block of memory is configured as dual ported RAM. The overall FPGA block RAM used per sensor output is sized to allow for storage of up to ten average events with event size fluctuation. This leads to a total buffer size that is  $20 \times$  the size required for the average sized event (different for inner and outer sensors). The FPGA block RAM will be configured with pointer based memory management to allow for efficient utilization of the RAM resources. The average inner sensor has 295 hits / event. There are 4 outputs per sensor so the average inner sensor event address length is  $(0.25 \text{ sensor area}) \times (295 \text{ hits}) \times (20 \text{ bits}) \times (2 \text{ factor for event size fluctuations}) \times (2.5 \text{ pixels per cluster}) = 7,375 \text{ bits}$ . Multiplying this event buffer size by 10 gives the size of the RAM required for the full set of event buffers. The event buffer block RAM size for each inner sensor output is 73,750 bits or 3,688 20-bit addresses.

For outer sensors, the event buffer size is calculated similarly. The average outer sensor has 29 hits / event. There are 4 outputs per sensor so the average outer sensor event address length is  $(0.25 \text{ sensor area}) \times (29 \text{ hits}) \times (20 \text{ bits}) \times (2 \text{ factor for event size fluctuations}) \times (2.5 \text{ pixels per cluster}) = 725 \text{ bits}$ . Multiplying this event buffer size by 10 gives the size of the RAM required for the full set of event buffers. The event buffer block RAM size for each outer sensor output is 7,250 bits or 363 20-bit addresses.
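The event buffer sizing for inner and outer sensors follows the same formula with different hit counts. A minimal sketch of that calculation is shown below; the function and parameter names are hypothetical, and the factors are those stated in the text (4 outputs per sensor, 20-bit addresses, a factor of 2 for event size fluctuations, 2.5 pixels per cluster, and 10 events per buffer block).

```python
import math


def event_buffer_bits(hits_per_sensor, outputs_per_sensor=4,
                      bits_per_address=20, fluctuation_factor=2,
                      pixels_per_cluster=2.5, n_events=10):
    """Return (bits per event, bits per buffer block) for one sensor output."""
    per_event = (hits_per_sensor / outputs_per_sensor * bits_per_address
                 * fluctuation_factor * pixels_per_cluster)
    return per_event, per_event * n_events


for label, hits in (("inner", 295), ("outer", 29)):
    per_event, block = event_buffer_bits(hits)
    words = math.ceil(block / 20)  # capacity in 20-bit address words
    print(f"{label}: {per_event:.0f} bits/event, "
          f"{block:.0f} bits/block, {words} addresses")
```

Because the fluctuation factor of 2 is already included in the per-event figure, a block sized for 10 such events is 20 times the size of a single average event.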

<u>Data transfer into the RDO buffer via the event builder</u> – This process is internal to the FPGA, does not require computational resources, and can run at high speed. In the interests of simplicity, we will assume a 160 MHz clock to move data in 20-bit wide address words. The event builder first adds a 128 Byte header that contains the trigger ID and other identifying information into the RDO buffer, and then moves the address data from the event buffers into the RDO buffer in 20-bit words. The average carrier event size is [(29 hits / sensor (outer)) × (10 sensors) × (3 ladders) + (295 hits / sensor (inner)) × (10 sensors) × (1 ladder)] × (2.5 pixels / cluster) = **9550 address words (20-bit)**. The RDO buffer is 5 × the size required for an average event and is thus **955 kb** in size. The full time required to transfer the address data into the RDO buffer (at one 20-bit word per clock) is then **59.7 μs**.
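The carrier event size, RDO buffer size and fill time can be checked with the short calculation below. It is a sketch with hypothetical names: it assumes one RDO board serving three outer ladders and one inner ladder of ten sensors each, one 20-bit word moved per 160 MHz clock cycle, and it neglects the 128 Byte header.

```python
def carrier_event_words(outer_hits=29, inner_hits=295, sensors_per_ladder=10,
                        outer_ladders=3, inner_ladders=1,
                        pixels_per_cluster=2.5):
    """Average number of 20-bit address words per carrier per event."""
    hits = sensors_per_ladder * (outer_hits * outer_ladders
                                 + inner_hits * inner_ladders)
    return hits * pixels_per_cluster


words = carrier_event_words()
rdo_buffer_bits = 5 * words * 20   # buffer holds 5 average events
fill_time_us = words / 160e6 * 1e6  # one word per 160 MHz clock

print(f"{words:.0f} words, {rdo_buffer_bits / 1e3:.0f} kb buffer, "
      f"{fill_time_us:.1f} us to fill")
```

Including the 128 Byte header would add only a small number of clock cycles to the fill time.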

<u>Data transfer from the RDO buffer over the DDL link</u> – The RDO buffer is dual-ported and thus readout from the SIU to the RORC can proceed as soon as the RDO buffer begins filling. The data transfer rates for the SIU – RORC combination as a function of fragment size are shown below.



Figure 13 Bandwidth of a single channel of the SIU - RORC fiber optic link as a function of event fragment size with an internal and external (DDL) data source using two D-RORC channels. From the LECC 2004 Workshop in Boston.

In this case, we will assume that we are padding the 20-bit address data to 32-bit word lengths for DDL transfer. The event size is then  $(32 \text{ bits}) \times (9550 \text{ address words}) = 305.6 \text{ kb or } 38.2 \text{ kB}$ . In this example, our transfer rate is ~ 200 MB / s. This transfer then takes 191 µs.
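The DDL transfer estimate can be sketched the same way. The link occupancy figure at the end is not taken from the text; it is simply the transfer time multiplied by the trigger rate, added here as an illustrative sanity check for the 1 kHz and 2 kHz cases. The ~200 MB/s rate is the approximate value assumed above.

```python
DDL_WORD_BITS = 32            # 20-bit addresses padded to 32-bit words
LINK_RATE_MB_PER_S = 200.0    # approximate SIU - RORC rate assumed in the text


def ddl_event_bytes(address_words):
    """Event size on the link after padding each address to 32 bits."""
    return address_words * DDL_WORD_BITS / 8


def ddl_transfer_us(address_words, rate_mb_per_s=LINK_RATE_MB_PER_S):
    """Transfer time in µs; 1 MB/s equals 1 byte/µs, so bytes / (MB/s) = µs."""
    return ddl_event_bytes(address_words) / rate_mb_per_s


def link_occupancy(address_words, trigger_rate_hz):
    """Fraction of time the link is busy for a periodic trigger (illustrative)."""
    return ddl_transfer_us(address_words) * 1e-6 * trigger_rate_hz


if __name__ == "__main__":
    words = 9550  # average Phase-1 carrier event from the text
    print(ddl_event_bytes(words) / 1000, "kB")
    print(ddl_transfer_us(words), "µs")
    for rate in (1000, 2000):
        print(rate, "Hz ->", round(link_occupancy(words, rate), 3))
```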

<u>Data transfer to the STAR DAQ for event building</u> – The event data is buffered in the DAQ PC RAM (>4 GB); only accepted events are written to disk and then transferred via Ethernet to an event building node of the DAQ system. Level 2 trigger accepts are delivered to the RDO system and transferred via the SIU – RORC to the DAQ receiver PCs. Only the events that have been accepted by level 2 are then built into an event. In this way, the buffer provided by the DAQ PC RAM provides the elasticity needed for an average event acceptance rate of 1 kHz.

The results of these calculations and discussion are presented in the following chronograms.



Figure 14 Chronogram of the Phase-1 based readout system functions for a 1 kHz periodic trigger.



Figure 15 Chronogram of the Phase-1 based readout system functions for a 2 kHz periodic trigger.

The memory resources required in the FPGA / motherboard combination for this readout design are (120 outer sensor readout buffers) × (7.25 kb of event buffer per output) + (40 inner sensor readout buffers) × (73.75 kb of event buffer per output) + (955 kb for the RDO buffer) = **4775 kb**. The Xilinx Virtex-5 FPGA used in our design contains 4.6 - 10.4 Mb of block RAM so the entire design should fit easily into the FPGA.
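A minimal sketch of this memory budget, summing the per-output event buffers and the 955 kb RDO buffer and comparing the result with the two ends of the quoted block RAM range. It assumes 1 Mb = 1000 kb; the names are illustrative.

```python
OUTER_BUFFERS = 120       # 30 outer sensors x 4 outputs
INNER_BUFFERS = 40        # 10 inner sensors x 4 outputs
OUTER_BUFFER_KB = 7.25    # event buffer RAM per outer sensor output
INNER_BUFFER_KB = 73.75   # event buffer RAM per inner sensor output
RDO_BUFFER_KB = 955.0     # RDO buffer (5 average events)


def total_memory_kb():
    """Block RAM needed for all event buffers plus the RDO buffer, in kb."""
    return (OUTER_BUFFERS * OUTER_BUFFER_KB
            + INNER_BUFFERS * INNER_BUFFER_KB
            + RDO_BUFFER_KB)


def utilization(capacity_mb, total_kb=None):
    """Fraction of a device's block RAM used (assumes 1 Mb = 1000 kb)."""
    if total_kb is None:
        total_kb = total_memory_kb()
    return total_kb / (capacity_mb * 1000)


if __name__ == "__main__":
    print(total_memory_kb(), "kb")
    for capacity in (4.6, 10.4):
        print(f"{capacity} Mb device: {utilization(capacity):.0%} used")
```

Under that unit assumption the budget is roughly 46% of the 10.4 Mb figure and slightly above the 4.6 Mb figure, so the available margin depends on which Virtex-5 part is selected.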

#### **Ultimate Sensor Detector Readout Chain**

Again, the Ultimate sensor readout system consists of ten parallel readout chains. The main difference between the Phase-1 sensors and the Ultimate sensors is the inclusion of zero suppression circuitry in the Ultimate sensor, thus only addresses are read out into the RDO boards. In addition, the integration time of the Ultimate sensor is 200  $\mu$ s and there is one data output per sensor. These differences lead to the functional schematic of the readout system shown below.



40 independent sensor data chains

Figure 16 Functional schematic diagram for one Ultimate sensor based RDO board. Each RDO board services one inner ladder and 3 outer ladders. Each ladder contains 10 sensors.

We will show the system function for the same two cases as shown for the Phase-1 readout system. The first is for a periodic trigger rate of 1 kHz. The second is for a periodic trigger rate of 2 kHz. Again, in both cases we will use the average (pile-up included) event size. The relevant parameters of the Ultimate sensor based system pictured above are listed below:

| Item                                     | Number |
|------------------------------------------|--------|
| Bits/address                             | 20     |
| Integration time                         | 200 µs |
| Hits / frame on Inner sensors (r=2.5 cm) | 246    |
| Hits / frame on Outer sensors (r=8.0 cm) | 24     |
| Ultimate sensors (Inner ladders)         | 100    |
| Ultimate sensors (Outer ladders)         | 300    |
| Event format overhead                    | TBD    |
| Average Pixels / Cluster                 | 2.5    |
| Average Trigger rate                     | 1 kHz  |

 Table 5 Parameters for the Ultimate sensor based detector system used in the example calculations shown below.

<u>Data transfer into event buffers</u> – The 20-bit address data is presented to the event buffer at 160 MHz. The integration time is now 200 $\mu$s, giving an event buffer enable time of 200 $\mu$s.

<u>Event Buffers</u> – Again, we will calculate the amount of FPGA block RAM required for the event buffering. The average inner sensor has 246 hits / event. There are 4 outputs per sensor, so the average event address length per inner sensor output is $(0.25 \text{ sensor area}) \times (246 \text{ hits}) \times (20 \text{ bits}) \times (2 \text{ factor for event size fluctuations}) \times (2.5 \text{ hits per cluster}) = 6150 \text{ bits}$. Multiplying this event buffer size by 10 gives the RAM required for the full set of event buffers. **The event buffer block RAM size for each inner sensor output is 61,500 bits or 3,075 20-bit addresses**.

For outer sensors, the event buffer size is calculated similarly. The average outer sensor has 24 hits / event. There are 4 outputs per sensor, so the average event address length per outer sensor output is $(0.25 \text{ sensor area}) \times (24 \text{ hits}) \times (20 \text{ bits}) \times (2 \text{ factor for event size fluctuations}) \times (2.5 \text{ hits per cluster}) = 600 \text{ bits}$. Multiplying this event buffer size by 10 gives the RAM required for the full set of event buffers. The event buffer block RAM size for each outer sensor output is 6,000 bits or 300 20-bit addresses.

<u>Data transfer into the RDO buffer via the event builder</u> – We will again assume a 160 MHz clock to move data in 20-bit wide address words. The event builder first adds a 128 Byte header that contains the trigger ID and other identifying information into the RDO buffer, and then moves the address data from the event buffers into the RDO buffer in 20-bit words. The average carrier event size is $[(24 \text{ hits / sensor (outer)}) \times (10 \text{ sensors}) \times (3 \text{ ladders}) + (246 \text{ hits / sensor (inner)}) \times (10 \text{ sensors}) \times (1 \text{ ladder})] \times (2.5 \text{ hits / cluster}) =$ **7950 address words (20-bit)**. The RDO buffer is 5 × the size required for an average event and is thus **795 kb** in size. The full time required to transfer the address data into the RDO buffer (in 20-bit per clock transfers) is then **49.7 µs**.

<u>Data transfer from the RDO buffer over the DDL link</u> – The RDO buffer is dual-ported and thus readout from the SIU to the RORC can proceed as soon as the RDO buffer begins filling. Again, we will assume that we are padding the 20-bit address data to 32-bit word lengths for DDL transfer. The event size is then $(32 \text{ bits}) \times (7950 \text{ address words}) = 254.4 \text{ kb or } 31.8 \text{ kB}$. In this example, our transfer rate is ~ 200 MB / s. This transfer then takes 159 µs.
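The Ultimate sensor chain from event builder to DDL link can be summarized in one parameterized sketch. The function below is illustrative only: its name, the keyword defaults and the returned keys are assumptions made for this example, and the 128 Byte header is neglected as in the estimates above.

```python
def readout_chain(outer_hits, inner_hits, *, sensors_per_ladder=10,
                  outer_ladders=3, inner_ladders=1, pixels_per_cluster=2.5,
                  address_bits=20, clock_mhz=160.0, rdo_buffer_events=5,
                  ddl_word_bits=32, ddl_rate_mb_per_s=200.0):
    """Average per-carrier event size and transfer times for one RDO board."""
    hits = sensors_per_ladder * (outer_hits * outer_ladders
                                 + inner_hits * inner_ladders)
    words = hits * pixels_per_cluster
    ddl_bytes = words * ddl_word_bits / 8       # addresses padded to 32 bits
    return {
        "address_words": words,
        "rdo_buffer_kb": words * address_bits * rdo_buffer_events / 1000,
        "builder_time_us": words / clock_mhz,   # one word per clock
        "ddl_event_kB": ddl_bytes / 1000,
        "ddl_time_us": ddl_bytes / ddl_rate_mb_per_s,  # 1 MB/s = 1 byte/µs
    }


if __name__ == "__main__":
    # Ultimate sensor hit rates from Table 5.
    for key, value in readout_chain(outer_hits=24, inner_hits=246).items():
        print(f"{key}: {value:.1f}")
```

Calling the same function with 29 and 295 hits per sensor reproduces the Phase-1 values (9550 words, 59.7 µs, 191 µs).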

<u>Data transfer to the STAR DAQ for event building</u> – Again, only the events that have been accepted by level 2 are then built into an event. In this way, the buffer provided by the DAQ PC RAM provides the elasticity needed for an average event acceptance rate of 1 kHz.

The results of these calculations and discussion are presented in the following chronograms.



Figure 17 Chronogram of the Ultimate sensor based readout system functions for a 1 kHz periodic trigger.



Figure 18 Chronogram of the Ultimate sensor based readout system functions for a 2 kHz periodic trigger.

The system memory resource requirements are somewhat less than those required for the Phase-1 RDO system. This fits easily into the memory resources of the Virtex-5 FPGA.
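The text does not quote a total for the Ultimate system. The sketch below estimates one by applying the Phase-1 budget formula to the Ultimate hit rates. It assumes the same buffer counts as the Phase-1 budget (120 outer and 40 inner output buffers) and the same sizing factors; because the number of buffers and the area per output scale inversely, the total does not depend on how many outputs each sensor has. All names are illustrative.

```python
def per_output_buffer_kb(hits_per_sensor, outputs_per_sensor=4,
                         pixels_per_cluster=2.5, fluctuation_factor=2,
                         address_bits=20, events_buffered=10):
    """Event buffer RAM (kb) for one sensor output, all buffered events."""
    bits = (hits_per_sensor / outputs_per_sensor * pixels_per_cluster
            * fluctuation_factor * address_bits * events_buffered)
    return bits / 1000


def total_memory_kb(outer_hits, inner_hits, rdo_buffer_kb,
                    outer_buffers=120, inner_buffers=40):
    """Event buffers plus RDO buffer for one RDO board, in kb."""
    return (outer_buffers * per_output_buffer_kb(outer_hits)
            + inner_buffers * per_output_buffer_kb(inner_hits)
            + rdo_buffer_kb)


if __name__ == "__main__":
    phase1 = total_memory_kb(29, 295, rdo_buffer_kb=955.0)
    ultimate = total_memory_kb(24, 246, rdo_buffer_kb=795.0)
    print(f"Phase-1: {phase1:.0f} kb, Ultimate: {ultimate:.0f} kb, "
          f"ratio: {ultimate / phase1:.2f}")
```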