UvA-DARE (Digital Academic Repository) Architecture and performance of the KM3NeT front-end firmware

The KM3NeT infrastructure consists of two deep-sea neutrino telescopes being deployed in the Mediterranean Sea. The telescopes will detect extraterrestrial and atmospheric neutrinos by means of the incident photons induced by the passage of relativistic charged particles through the seawater as a consequence of a neutrino interaction. The telescopes are configured in a three-dimensional grid of digital optical modules, each hosting 31 photomultipliers. The photomultiplier signals produced by the incident Cherenkov photons are converted into digital information consisting of the integrated pulse duration and the time at which it surpasses a chosen threshold. The digitization is done by means of time-to-digital converters (TDCs) embedded in the field-programmable gate array of the central logic board. Subsequently, a state machine formats the acquired data for its transmission to shore. We present the architecture and performance of the front-end firmware consisting of the TDCs and the state machine.

© The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JATIS.7.1.016001]


Introduction
The KM3NeT neutrino telescopes constitute a deep-sea research infrastructure 1,2 being deployed in the Mediterranean Sea, composed of two detectors placed at two different sites but sharing the same technology. Astroparticle Research with Cosmics in the Abyss (ARCA), 3 located 100 km off Capo Passero, the southern tip of Sicily, Italy, at a depth of 3450 m, will be mainly dedicated to high-energy neutrino astrophysics. Oscillation Research with Cosmics in the Abyss (ORCA) 4 is situated 40 km off the coast near Toulon, France, at a depth of about 2450 m and has been optimized for the study of atmospheric neutrino oscillations.
The telescopes have been designed to detect the Cherenkov photons induced by relativistic charged particles produced in neutrino interactions with the detector surroundings. A three-dimensional (3-D) array of digital optical modules (DOMs) detects the Cherenkov photons, allowing for the reconstruction of the trajectory and the energy of the incoming neutrino. 5 The DOM [6][7][8] [Fig. 1(a)] consists of a 17-in.-diameter pressure-resistant glass sphere housing 31 3-in. photomultiplier tubes (PMTs) together with the front-end and readout electronics. 8,9 Eighteen DOMs, mounted in a vertical structure, form a detection unit (DU). Each DU is anchored to the seabed and is kept upright by the buoyancy of the DOMs and a buoy at its top [Fig. 1(b)].
When a photon impinges on the PMT cathode, a photoelectron can be produced with a probability given by the quantum efficiency of the PMT (usually ∼30%). Subsequently, a cascade of electrons is generated up to the PMT anode. If the electrical signal at the anode crosses the threshold of a discriminator, it is preprocessed by a dedicated electronic board attached to the PMT. A low voltage differential signal (LVDS) is then generated, with its starting time equal to the threshold crossing time and its duration equal to the time that the waveform is above the configurable threshold. This time duration is called time over threshold (ToT). The LVDS signals generated by the PMTs are collected by the signal collection board and routed to the central logic board (CLB), where the readout acquisition and digitization of the PMT data is performed.
The front-end firmware is embedded in a Kintex-7 160T field programmable gate array (FPGA) with over 160,000 logic cells 10 (see Fig. 2). All the firmware modules are configured and controlled by an embedded LatticeMico32 (LM32) microprocessor and its Wishbone bus. The FPGA has a specific hardware logic resource with serial-to-parallel and parallel-to-serial converters, 11 where the time to digital converters (TDCs) are implemented. The 31 TDCs, one for each PMT in the DOM, are coded in hardware description language (HDL) in the FPGA. They digitize the LVDS signals to obtain both the arrival time of the pulse and its ToT. The PMT signal digitized by the TDCs is called a "hit." Once a hit is obtained, a state machine (SM) organizes the hits generated by the TDCs and encodes these into user datagram protocol (UDP) Jumbo frames to be sent to the shore station via the CLB optical link. To reduce the complexity of the hardware and firmware of the DOMs, the concept "all-data-to-shore" is applied, where all the readout information is sent to the shore station without any data filtering. The readout is organized and sent in time intervals of 100 ms, called "time slices." The TDCs restart their counters at the start of each time slice, so the arrival time of each hit is relative to the start of its time slice. To combine and analyze the data provided by all the DOMs, the White Rabbit protocol 12 is used to synchronize the clocks of all the CLBs of the detector with 1-ns resolution. The power consumption of the described front-end firmware is 1.4 W, which represents 21% of the DOM power consumption. All the devices within the DOM as well as the front-end firmware are shown in Fig. 3.
The requirements of the data acquisition (DAQ) architecture are presented in Sec. 2, the TDC requirements in Sec. 2.1, and the SM requirements in Sec. 2.2. The TDCs are presented, together with several of the qualification tests performed, in Sec. 3, and the SM is described in Sec. 4. Section 5 describes the main test setups used for the validation of the acquisition firmware, and Sec. 6 presents some examples of data acquired in the first DUs deployed. Finally, a summary is presented in Sec. 7.

*Address all correspondence to David Calvo, dacaldia@ific.uv.es; Diego Real, real@ific.uv.es

KM3NeT Front-End Firmware Requirements
The main requirements established by the KM3NeT collaboration for the front-end firmware refer to both the digitization of the PMT signals by TDCs and the firmware of the SM that controls the acquisition and generation of the data output packets sent to the shore station. The most important element to reconstruct the particle trajectory from the Cherenkov light is the arrival time of the light on the PMTs. For the KM3NeT requirements, the relative arrival times should be known with an accuracy of 1 ns, which provides an angular resolution of 0.1° for astrophysical events.

TDC Requirements
Each TDC channel has to be able to deal with pulse rates up to 200 kHz, with 7 kHz being the expected average rate of the PMT signals. 13 Table 1 summarizes the data throughput expected for different stages of the KM3NeT infrastructure. The TDCs acquire the LVDS signals generated by the PMT bases, detecting their arrival time (rising edge) and duration (ToT) with 1 ns precision. The hit arrival time is coded with four bytes. The duration of the signal, which corresponds to the hit ToT, is also measured with 1 ns precision. The minimum detectable ToT is 1 ns, whereas the maximum ToT allowed by the sampling architecture is 255 ns (coded with one byte). It is worth mentioning that the ToT range of the TDC has been extended by means of the so-called "multihit" functionality, described in Sec. 3.6. The minimum time between two consecutive LVDS signals, or dead time, is 5 ns due to the internal configuration of the TDCs. Each hit (composed of a PMT identification, a ToT, and a time stamp) is encoded with 6 bytes, using the format summarized in Table 2. Each TDC channel has an associated first-in first-out (FIFO) memory with a capacity of 1024 hits.
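The 6-byte hit described above can be illustrated with a short Python sketch. The field sizes (one byte for the PMT identifier, four for the time stamp, one for the ToT) come from the text; the field order, the big-endian byte order, the 0 to 30 channel numbering, and the function names `encode_hit` and `decode_hit` are assumptions made only for this illustration and are not taken from the firmware.

```python
import struct

# Field sizes from the text: 1-byte PMT id, 4-byte time (ns since the start
# of the time slice), 1-byte ToT (ns). Field order and big-endian byte order
# are assumptions of this sketch.
HIT_FORMAT = ">BIB"
HIT_SIZE = struct.calcsize(HIT_FORMAT)  # 6 bytes
MAX_TOT_NS = 255


def encode_hit(pmt_id: int, time_ns: int, tot_ns: int) -> bytes:
    """Pack one hit into 6 bytes, saturating the ToT at 255 ns."""
    if not 0 <= pmt_id <= 30:  # assumed numbering of the 31 channels
        raise ValueError("PMT id must be in 0..30")
    if not 0 <= time_ns < 2**32:
        raise ValueError("time stamp must fit in four bytes")
    return struct.pack(HIT_FORMAT, pmt_id, time_ns, min(tot_ns, MAX_TOT_NS))


def decode_hit(raw: bytes) -> tuple:
    """Unpack 6 bytes into (pmt_id, time_ns, tot_ns)."""
    return struct.unpack(HIT_FORMAT, raw)
```

A 100 ms time slice spans 10^8 ns, which fits comfortably in the four-byte time field (maximum 2^32 − 1 ≈ 4.29 × 10^9).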
All the control of the TDC intellectual property (IP) core is done via the Wishbone bus, 14 the bus chosen to interconnect the different firmware IP cores.There is a dedicated 32-bit CPU register that enables the IP core and another one that enables each of the TDC channels independently.

State Machine Requirements
The SM has to organize the optical acquisition in time slices, whose lengths are configurable between 10 and 100 ms. In addition, as in the case of the TDCs, the SM is also integrated as a Wishbone slave with one register to define the payload of the UDP packets sent to the shore station, another register to define the duration of the time slice, and six more registers to control the full flags and to send interrupt requests (IRQs) to the LM32 microprocessor. The main requirements for the TDC and the SM are summarized in Table 3.

Time to Digital Converters
TDCs, which convert a pulse time duration into a numeric value, are used in various applications where an accurate measurement of time is needed. TDCs can be implemented in both application specific integrated circuits (ASICs) and FPGAs. ASICs can provide better accuracy when a high time resolution is required. However, FPGAs provide a faster development time and the flexibility to adapt the logic to operating requirements. Moreover, it is possible to use the FPGA logic resources to process the TDC data and interface with the rest of the DAQ system. In the case of KM3NeT, it is not necessary to include extra components, as in other more accurate architectures such as Vernier, to achieve the desired resolution, which results in a higher-reliability system. For these reasons, the KM3NeT collaboration has chosen to implement the TDCs in FPGAs.
Multitapped delay-line architectures were not considered because 31 delay lines would have to be implemented, which would be more problematic to maintain together with the rest of the systems implemented in the FPGA. Moreover, the level of resolution and range required by KM3NeT can be achieved by means of synchronous architectures, 15,16 which are more appropriate and simpler for an FPGA implementation.

Table 2 Format of the hit provided by the TDCs. One byte is reserved for the identification number of the PMT, 4 bytes are reserved for the arrival time of the hit, counted as nanoseconds passed since the start of the time slice, and one byte is reserved for the length of the pulse or ToT. Therefore, a total of 6 bytes are needed to codify a hit.

The main synchronous architecture for the implementation of TDCs in an FPGA is the 4×-oversampling technique (see Fig. 4), which is reliable and can be easily coded with HDL. The four clocks with the required different phases can be directly implemented in the phase locked loops (PLLs) of the FPGA digital clock managers. As a preventive action, the distribution of the clocks in the FPGA resources was carefully designed, since any asymmetry in the high-speed lines distributing the clocks would result in nonlinearities that would degrade the performance of the TDCs. The FPGA allows for specifying the time constraints of the clock distribution. Moreover, the specific hardware provided by most FPGAs to serialize and deserialize can implement the 4×-oversampling technique in an even more precise way, as the very fast shift registers present in the serializer-deserializer (SERDES) can be used by the TDCs. In Fig. 5, a schematic view of the 4×-oversampling technique is shown.
The technique finally chosen was the 4×-oversampling technique: a PLL inside the FPGA generates two clocks of equal frequency but different phases (CLK0 and CLK90). These two phases are routed to a deserializer primitive inside the input/output blocks of the FPGA. The deserializer primitive allows the oversampling of an incoming data stream on both rising and falling edges of the generated clocks, CLK0 and CLK90, resulting in four times the sampling frequency with respect to the original clock (CLK0, CLK90, CLK180, CLK270). If a better resolution is needed, it is possible to either duplicate and shift the sampling clocks by 45 deg in phase, or double the frequency of the clocks. The most important parameters for the characterization of the TDC are: the time range of measurement; the resolution in terms of the least significant bit (LSB); the precision or standard uncertainty of the measurement; the nonlinearities [both the differential nonlinearity (DNL) and the integral nonlinearity (INL)]; the dead time or the shortest time between two consecutive hits; and the maximum readout speed. These are discussed in the following sections.
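The sampling scheme can be made concrete with a small software model. The following Python sketch is not the HDL implementation; it only illustrates how sampling on both edges of CLK0 and CLK90 of a 4 ns clock yields one sample per nanosecond, and what an ideal sampler would report for a given pulse. The function names and the idealized, jitter-free pulse are assumptions of this sketch.

```python
def sample_offsets_ns(clock_period_ns=4.0):
    """Sampling instants within one clock period, using both edges of
    CLK0 and CLK90 (equivalent to phases 0, 90, 180, and 270 deg)."""
    quarter = clock_period_ns / 4
    edges = {
        "CLK0 rising": 0 * quarter,
        "CLK90 rising": 1 * quarter,
        "CLK0 falling": 2 * quarter,   # same instant as CLK180 rising
        "CLK90 falling": 3 * quarter,  # same instant as CLK270 rising
    }
    return sorted(edges.values())


def digitize(pulse_start_ns, pulse_width_ns, n_periods=100, clock_period_ns=4.0):
    """Return (arrival_time_ns, tot_ns) seen by an ideal 4x-oversampling TDC,
    or None if no sampling instant falls inside the pulse."""
    step = clock_period_ns / 4
    pulse_end_ns = pulse_start_ns + pulse_width_ns
    high_samples = []
    for k in range(n_periods):
        for offset in sample_offsets_ns(clock_period_ns):
            t = k * clock_period_ns + offset
            if pulse_start_ns <= t < pulse_end_ns:
                high_samples.append(t)
    if not high_samples:
        return None
    # Arrival time is the first sample that sees the pulse; the ToT is the
    # number of samples above threshold times the sampling step.
    return high_samples[0], len(high_samples) * step
```

For example, a pulse starting at 10.3 ns with a width of 26 ns is first seen by the sample at 11 ns and is counted in 26 consecutive samples.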

TDC Architecture
The implementation of the KM3NeT TDCs uses the dedicated input hardware available in the Kintex-7 family of Xilinx FPGAs: the input buffer for differential signals (IBUFDS) and the Xilinx input-output serializer-deserializer (IOSERDES), in particular the ISERDESE2 primitive. It includes a firmware data recovery unit (DRU), the logic that takes care of the acquired hits and stores them into the TDC FIFOs. The LVDS signals arriving from the PMT bases enter the FPGA through the IBUFDS input while the 4×-oversampling technique is implemented in the IOSERDES. The IOSERDES block is configured in oversampling mode, with two clock inputs shifted by 90 deg. Internally, the IOSERDES captures the input data on both the rising and falling edges of the two input clocks, acquiring the input signal exactly as the 4×-oversampling technique requires, i.e., in four equidistant (90 deg) phases derived from the original clock of 250 MHz. The TDCs have three clock domains: the first one, of 1 GHz, includes the IOSERDES oversampling, where the hit is acquired; the second one, of 250 MHz, is used by the DRU; and the third one, of 62.5 MHz, is the system clock domain of the CLB and the one used by the SM to read out the TDC FIFOs. The FIFOs are able to store 1024 hits (each hit has the size and the structure shown in Table 2), with an almost-full threshold of 1012, to deal with momentary bursts of data, like those due to bioluminescence. The FIFO allows for the readout while other hits are being processed by the SM.
In Fig. 7, a schematic view of the FIFOs is shown. The occupancy of the FIFO is also used to control the acquisition of the TDC channel. When the almost-full-threshold signal of the TDC FIFO is triggered (set to 1012 hits by default, but configurable), the acquisition is stopped until the next time slice, provided that the almost-full signal has been deasserted. Thus, the effective size of the TDC FIFO is 1012 hits, reserving the remaining 12 positions for time slice markers. The time slice markers are specific identifiers used to tag the transition between two consecutive time slices.

TDC Implementation and Resources
The TDCs have been coded entirely in the FPGA using HDL. The resources used per channel include one IBUFDS and one IOSERDES for the TDC readout firmware, 463 registers and 483 look-up tables (mainly used by the DRU), and 3 memory blocks of 36 Kb, used by the TDC FIFOs. Table 4 summarizes the resources used by the TDCs both for one channel only and in total.
The TDC IP core has 31 inputs for the LVDS PMT signals as well as two inputs for the two clocks needed to oversample the input signal (CLK0 and CLK90). Moreover, the TDC IP core has one bus output, from where the SM can read out the values of the 31 TDC FIFOs. The bus also includes the control and monitor signals to read out the hits stored in the FIFOs, such as the value of the FIFO full and almost-full flags. In addition, the IP core includes the control and monitor signals of a Wishbone slave, which allows for interaction with the embedded software.

Fig. 6 The architecture of the KM3NeT TDCs. Three different subsystems can be identified in the TDC. The first one, sampling every 1 ns, contains the differential input to the FPGA, the IBUFDS, together with the IOSERDES, where the acquisition is performed. The second one contains the DRU, with the logic to adapt the hits to the format required and to store them in the FIFOs. It works with a 4 ns clock period. The third one consists of the logic to read out the FIFOs and provides the obtained hits to the next acquisition level, the SM, running in this case with a 16 ns clock period. The interface between the second and the third subsystem is done by means of the FIFOs.

The control registers of the Wishbone slave include one register with one control bit to enable the complete TDC IP core, as well as another register to enable each of the 31 TDC channels. The high rate veto (HRV) and multihit capabilities, explained in Secs. 3.5 and 3.6, are also configured and managed by four different registers. Table 5 lists the Wishbone registers used by the TDC channels.

Resolution
The clock frequency and the number of clock phases determine the resolution of the TDC. In the KM3NeT case, the frequency of the IOSERDES clocks is 250 MHz, and the number of phases is four. Therefore, the resolution obtained by the TDCs is 1 ns (1 GHz). It would be possible to increase the resolution of the TDCs by increasing the number of phases, the frequency of the clock, or both. In a Kintex-7 160T, it is feasible to achieve clock frequencies up to 500 MHz, whereas the number of phases could also be increased up to eight with an 8×-oversampling technique. With these two modifications, the resolution could be 250 ps without any modification of the hardware.
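The arithmetic behind these figures is simply the clock period divided by the number of sampling phases. A minimal Python sketch (the function name is chosen here for illustration) reproduces both numbers quoted above:

```python
def tdc_resolution_ns(clock_mhz: float, phases: int) -> float:
    """Resolution (LSB) of an oversampling TDC: one clock period divided by
    the number of equally spaced sampling phases."""
    return 1e3 / (clock_mhz * phases)


current = tdc_resolution_ns(250, 4)    # 1.0 ns, i.e. 1 GHz effective sampling
upgraded = tdc_resolution_ns(500, 8)   # 0.25 ns, i.e. 250 ps
```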

Precision
Some factors contributing to the degradation of the TDC precision (σ) are clock jitter, electronic noise, and variations in temperature and power. However, since the resolution of the implemented TDCs is not excessively high, these are dominated by quantization errors, as shown in Fig. 8. The arrival time of the hits is asynchronous with the TDC clock since they are not correlated. For this reason, the time interval between the hit starting time and the sampling edge of the TDC is uniformly distributed. The maximum quantization error of a single measurement is ±1 ns on both the rising edge and the falling edge. The precision (σ), characterized by the standard deviation of the distribution of repeated measurements, is defined as

$$\sigma = t_0 \sqrt{\eta\,(1 - \eta)}, \tag{1}$$

where $t_0$ is the resolution of the TDC and $\eta$ is the decimal part of the ratio $\mathrm{ToT}/t_0$, denoted by Frac (see Fig. 9):

$$\eta = \mathrm{Frac}\!\left(\frac{\mathrm{ToT}}{t_0}\right). \tag{2}$$

The average value of the precision ($\sigma_{\mathrm{Av}}$) is calculated by integrating σ over η from Eq. (1) between zero and one, which results in

$$\sigma_{\mathrm{Av}} = t_0 \int_0^1 \sqrt{\eta\,(1 - \eta)}\;d\eta = \frac{\pi}{8}\,t_0 \approx 0.39\,t_0. \tag{3}$$
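The quantization-dominated precision can be checked numerically. The sketch below assumes the standard model for a synchronous counter, in which a pulse of a given ToT is reported as the number of sampling instants that fall inside it, so that the standard deviation over a uniformly distributed start phase is t0·sqrt(η(1 − η)) and its average over η is (π/8)·t0. The model and the function names are assumptions of this illustration, not code from the firmware.

```python
import math


def measured_tot(phase, tot, t0=1.0):
    """ToT reported by an ideal sampler: the number of sampling instants
    (multiples of t0) inside the pulse [phase, phase + tot), times t0."""
    return (math.ceil((phase + tot) / t0) - math.ceil(phase / t0)) * t0


def simulated_precision(tot, t0=1.0, n_phases=10_000):
    """Standard deviation of the measured ToT over a uniform start phase."""
    values = [measured_tot((j + 0.5) / n_phases * t0, tot, t0)
              for j in range(n_phases)]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def predicted_precision(tot, t0=1.0):
    """Closed form t0 * sqrt(eta * (1 - eta)), with eta = Frac(ToT / t0)."""
    eta = (tot / t0) % 1.0
    return t0 * math.sqrt(eta * (1.0 - eta))


def average_precision(t0=1.0, steps=10_000):
    """Midpoint-rule integral of the closed form over eta in [0, 1]."""
    etas = ((i + 0.5) / steps for i in range(steps))
    return sum(t0 * math.sqrt(e * (1.0 - e)) for e in etas) / steps
```

With t0 = 1 ns, the simulated and predicted values agree for any ToT, and the average evaluates to about 0.39 ns, i.e., π/8 of the LSB.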

High Rate Veto
Neutrino telescopes in seawater are affected by several sources of external noise. Studies performed at active sites have shown that bioluminescence, in particular bioluminescence bursts, is an important source of optical noise for the electronics, with rates that, in extreme cases, could saturate the DAQ. To avoid the saturation of the data taking, the TDCs in KM3NeT incorporate an HRV system to stop the acquisition when an unexpected rate increase is detected until the high rate has ended. The HRV works on a channel and time slice basis. The number of hits detected since the start of the time slice is counted in each of the 31 TDC channels. When this number exceeds some predefined threshold, the acquisition for that channel is blocked until the start of the next time slice. The unaffected TDC channels continue acquiring data. Figure 10 shows a schematic diagram of the HRV operation. The HRV operation prevents the readout of the hits generated while a burst of bioluminescence is happening, preventing the saturation of the communication bandwidth.
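The per-channel, per-time-slice logic of the HRV can be sketched in a few lines of Python. This is a behavioral model only; the function name `apply_hrv`, the tuple layout of a hit, and the choice that exactly `threshold` hits are kept before the channel is blocked are assumptions of this sketch.

```python
def apply_hrv(hits, threshold):
    """Apply a high rate veto to the hits of ONE time slice.

    hits: iterable of (channel, time_ns, tot_ns) tuples in arrival order.
    Returns (accepted_hits, vetoed_channels). A channel is blocked for the
    rest of the slice once its hit count exceeds `threshold`; the other
    channels keep acquiring.
    """
    counts = {}
    vetoed = set()
    accepted = []
    for hit in hits:
        channel = hit[0]
        if channel in vetoed:
            continue  # acquisition blocked until the next time slice
        counts[channel] = counts.get(channel, 0) + 1
        if counts[channel] > threshold:
            vetoed.add(channel)
            continue
        accepted.append(hit)
    return accepted, vetoed
```

Calling the function again with the hits of the next time slice starts from fresh counters, which mirrors the veto being released at the start of every time slice.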

Multihit
The multihit option allows for acquiring hits with a duration longer than the limit of 255 ns. Long pulses could be generated in different situations: photons separated by a few ps in the Cherenkov light yield of a muon track, ionized particles along the muon trajectory, or even exotic particles such as monopoles, which can actually provoke long pulses. If the multihit option is disabled, any hit with a duration longer than 255 ns is digitized as only one hit of 255 ns. On the other hand, if the multihit option is enabled, the same hit is digitized as several consecutive hits, all with a duration of 255 ns except the last one, whose duration is the remaining time to complete the real duration of the physical event. Once the data arrive at the shore station, the online trigger and data acquisition system 17 merges the consecutive hits into one hit whose duration equals the real duration of the physical event. The particles that generate these pulses are not included in the standard physics analyses of KM3NeT; however, they could provide useful information to understand unexpected physics phenomena or anomalous behavior of the detector. The use of the multihit option can increase the data rate during those moments. However, such situations are not expected to occur frequently; therefore, multihit will not have a significant impact on the DAQ system performance. Figure 11 shows the operation of the multihit option.
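The splitting and the shore-side merging can be illustrated with a short Python model. The 255 ns limit comes from the text; the assumption that consecutive segments are exactly contiguous in time, the merge criterion, and the function names are choices made for this sketch and are not taken from the firmware or the shore software.

```python
MAX_TOT_NS = 255  # maximum ToT that fits in the one-byte field


def digitize_long_pulse(start_ns, duration_ns, multihit):
    """Return the list of (time_ns, tot_ns) hits produced for one pulse."""
    if not multihit or duration_ns <= MAX_TOT_NS:
        # Multihit disabled: a long pulse is truncated to a single 255 ns hit.
        return [(start_ns, min(duration_ns, MAX_TOT_NS))]
    hits = []
    time_ns, remaining = start_ns, duration_ns
    while remaining > 0:
        tot = min(remaining, MAX_TOT_NS)
        hits.append((time_ns, tot))
        time_ns += tot        # assumed: segments are contiguous in time
        remaining -= tot
    return hits


def merge_multihits(hits):
    """Merge a saturated hit with the hit that starts exactly where it ends."""
    merged = []
    previous_saturated = False
    for start, tot in sorted(hits):
        if merged and previous_saturated and start == merged[-1][0] + merged[-1][1]:
            merged[-1][1] += tot
        else:
            merged.append([start, tot])
        previous_saturated = (tot == MAX_TOT_NS)
    return [tuple(m) for m in merged]
```

For an 835 ns pulse, the model produces three hits of 255 ns followed by one of 70 ns when multihit is enabled, and merging them restores a single 835 ns hit.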

Dead Time
As already mentioned in Sec. 2.1, the dead time refers to the minimum time between two consecutive hits. The TDC dead time is 5 ns. The value is intrinsic to the architecture of the TDCs. The multihit option has no influence upon the dead time. If a pattern of hits separated by <5 ns is applied to the TDCs, some hits could be either merged or discarded, depending on the instant within the IOSERDES position when they arrive. On the other hand, if the time difference is 5 ns, the hits are detected as separate entities.

Fig. 10 HRV. Channels 0 and 30 are represented with the HRV active and set to 4 hits. In channel 1, the number of hits in the time slice evaluated is 3, so the HRV is not triggered. The number of hits in channel 30 is 6, so the HRV is triggered, the events are no longer stored in the FIFO, and the acquisition is stopped until the beginning of the next time slice.

Fig. 11 Operation of the multihit option. In the first scenario, the multihit option is enabled, so the original hit is acquired as four consecutive hits, the first three of 255 ns length, and the fourth one of 70 ns. In the second scenario, the multihit option is disabled; therefore, the original hit is acquired as a single hit of 255 ns.

Nonlinearity
Two of the parameters defining the quality of the acquisition are the DNL and INL. A statistical code-density test [18][19][20] is needed to determine the TDC nonlinearity characteristics. In these tests, more than 6 million asynchronous pulses are measured and evaluated to assess the uncertainties of the TDC measurements. The hits generated in the seawater are not correlated in time with the TDC clock, and they can be considered as a uniformly random train of pulses at the input of the TDCs. The number of acquisitions per IOSERDES cycle of the TDC, n (pulses acquired), should be large enough to reduce the statistical uncertainty, which can be approximated as $1/\sqrt{N}$, where N is the total number of generated pulses.

Differential nonlinearity
The DNL can be defined as the deviation of a single quantization step from the ideal LSB. In the case of the oversampling technique, the TDC usually has a reduced DNL, as the feature is intrinsic to the architecture, where the quantization step is always an integer fraction of the clock period. The skew of the clocks, related to the accuracy of the FPGA PLL in generating the four clock signals, is the main contributor to the appearance of the DNL. The DNL is evaluated by comparing the number of pulses per IOSERDES cycle ($n_i$) with the mean value, which in this case is $\bar{n} = N/4$, since the N generated pulses will be detected randomly by the four cycles of the IOSERDES. For each IOSERDES cycle, the DNL is defined as

$$\mathrm{DNL}_i = \frac{n_i - \bar{n}}{\bar{n}}. \tag{4}$$

In Fig. 12, an example is shown of the DNL measurements performed in the laboratory for one of the 31 TDC channels. The DNLs are calculated for the four IOSERDES cycles. Five tests have been carried out, each one with 50,000 measurements. The error bars show the standard deviation of the DNL values $D_j$ obtained in the M tests performed (M = 5 in this case). These tests show that the maximum error produced by the DNL is lower than 40 ps, which is negligible for the TDC performance.
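A code-density evaluation of this kind reduces to simple counting. The Python sketch below assumes the usual form of the estimate, in which the DNL of each IOSERDES cycle is the relative deviation of its count from the mean count N/4; this form, the function names, and the example counts are assumptions for illustration and are not measured values.

```python
import math


def dnl_per_phase(counts):
    """DNL in LSB for each sampling phase, from the number of pulses whose
    leading edge was captured by that phase (code-density test)."""
    mean = sum(counts) / len(counts)  # N/4 for the four IOSERDES cycles
    return [(n - mean) / mean for n in counts]


def statistical_uncertainty(total_pulses):
    """Approximate relative statistical uncertainty, 1/sqrt(N)."""
    return 1.0 / math.sqrt(total_pulses)


# Hypothetical counts for one test of 50,000 pulses (not measured data).
example_counts = [12550, 12480, 12510, 12460]
example_dnl = dnl_per_phase(example_counts)
```

With an LSB of 1 ns, a DNL of 0.004 LSB corresponds to 4 ps, and by construction the DNL values of the four phases sum to zero.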

Integral nonlinearity
The INL refers to the maximum deviation of the TDC transfer function from the ideal straight line. It can be calculated as

$$\mathrm{INL} = \frac{\bar{T} - T_{\mathrm{in}}}{t_{\mathrm{bin}}}, \tag{6}$$

where $T_{\mathrm{in}}$ is the width of the input, $\bar{T}$ is the mean of the pulse width measurements, and $t_{\mathrm{bin}}$ is the size of the bin, which in this case is 1 ns. INL measurements are shown in Fig. 13 for different time widths starting at 5 ns up to the end of the TDC range, 255 ns, in steps of 5 ns. Five tests have been performed, each one with 10,000 measurements. The error bars take into account the dispersion of the INL measured after the five tests, where in the worst case the INL was 0.118 LSB.

Temperature Effects
The effects of temperature on the TDC performance have been evaluated using a climatic chamber and the test setup explained in Sec. 5, where the CLB has been operated at temperatures ranging from −35°C to 60°C. The room temperature and the temperature of the FPGA die have been recorded. For each of 20 temperature values, a repetitive pattern has been applied to the TDCs. A Virtex-6 ML605 evaluation board placed outside the climatic chamber has been used as pattern generator. The pattern generated has been measured with an oscilloscope before being applied to the TDC. For each temperature, the same pattern has been applied to one TDC channel, and the precision $\sigma_{\mathrm{LSB}}$ has been evaluated as the root-mean-square deviation of the measured ToT values from the real one, where $\mathrm{ToT}_i$ corresponds to each of the 10,000 ToT measurements performed by the TDCs, N is the total number of pulses, and $\mathrm{ToT}_{\mathrm{real}}$ is the real value of the pulse width applied, 5 ns in this case. As can be seen in Fig. 14, the precision is quite satisfactory over the entire range tested, varying from 0.5 ps at −35°C to 2.2 ps at 60°C. The value of $\sigma_{\mathrm{LSB}}$ is ∼2 ps at 25°C and does not degrade significantly with temperature.

Data Processing: The State Machine
The firmware block called SM implements a seven-state Mealy finite state machine (FSM) 21 responsible for downloading the TDC data stored in the FIFO memories, enclosing the data in a UDP packet, and sending it to the buffer stream selector called buffer stream IP multiplexer (IPMux). The IPMux sends these data to the shore station through the optical link using UDP frames. As already mentioned, the data flow is handled in time slices with a duration of 100 ms. Each time slice is identified by its start time in coordinated universal time (UTC), allowing for its unequivocal identification. The time stamps of the digitized data in the time slice are relative to the time slice start UTC time.

Digital Data Formatting
There are three communication channels: one for the TDC data, another for the acoustic data codec using the Audio Engineering Society (AES) protocol, and a channel for the monitoring data (explained in Sec. 4.2). The SM organizes the TDC data packets in two different segments: the header (see Table 6) and the TDC data. The first double word (32 bits) of the header is formed by the identifier of the communication channel sending the data. The second double word contains the run number, a global identifier for a determined time span, usually lasting for several hours, during which data are taken with a fixed set of input parameter values to control the detector and the readout. The frame index field contains the identification of the UDP packet within the current time slice. The next field is the start time (in UTC) of the time slice, which is coded with two double words (64 bits). The penultimate double word of the header is reserved for a status register, where the status of the "almost-full" FIFO flag is coded for each TDC channel. The first bit of this word indicates whether it is the last packet of the time slice or a trailer packet. The first bit of the last double word shows whether the CLB is synchronized and the time is valid, and the remaining 31 bits indicate the status of the HRV of each TDC channel. Table 6 summarizes the different fields used in the header of the UDP packets. The UDP packets are filled with TDC hits, whose format is described in Table 2.
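The seven-double-word header can be illustrated with Python's `struct` module. The order and sizes of the fields follow the description above; everything else is an assumption of this sketch: big-endian byte order, "first bit" interpreted as the most significant bit, channel k mapped to bit k of the 31-bit masks, the split of the 64-bit UTC field into two generic 32-bit words, and all function and parameter names.

```python
import struct

HEADER_FORMAT = ">7I"  # seven 32-bit words = 28 bytes (224 bits); byte order assumed
MASK_31_BITS = (1 << 31) - 1


def pack_header(stream_id, run_number, frame_index, utc_word_hi, utc_word_lo,
                almost_full_mask, is_trailer, time_valid, hrv_mask):
    """Build a 28-byte header. Bit 31 of the status word carries the trailer
    flag and bit 31 of the last word the time-valid flag (assumed positions);
    the low 31 bits hold one flag per TDC channel."""
    status = (int(is_trailer) << 31) | (almost_full_mask & MASK_31_BITS)
    veto = (int(time_valid) << 31) | (hrv_mask & MASK_31_BITS)
    return struct.pack(HEADER_FORMAT, stream_id, run_number, frame_index,
                       utc_word_hi, utc_word_lo, status, veto)


def unpack_header(raw):
    """Return the seven 32-bit header words as a tuple of integers."""
    return struct.unpack(HEADER_FORMAT, raw)
```

A header built this way is always 28 bytes long, matching the 224 bits of seven double words, regardless of how many hits follow it in the payload.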

Firmware Architecture of the State Machine
The SM waits for the first UTC time provided by the White Rabbit Precision Time Protocol Core, called SuperTime (ST), to start to operate. Once the first UTC time is received, the header is created. The header of the UDP packet consists of seven double words (224 bits). After creating the header, the data download process starts. A round-robin procedure has been implemented to download the information from all the channels. The procedure tries to download as many events as possible from one channel to reduce the dead time during the channel transitions. The maximum hit rate the firmware can handle is 600 kHz per channel, which means a throughput of 0.9 Gbps. All the data are sent to the shore station through an optical network based on switches with a maximum throughput of 1 Gbps. Nine DOMs will be connected per switch, so the maximum hit rate allowed by the switches is 74 kHz per channel. The TDC data are structured in blocks of 48 bits. The SM splits the information in units of 16 bits. The download process continues until a special marker is detected in the FIFO memory. When it is detected, the data download for that channel is blocked until the beginning of the next time slice. If the maximum number of bytes for the UDP payload is reached, the SM ends the current UDP packet and starts another one with the same header, increasing the index number by one. When the time slice marker is detected in all the TDC channels, indicating that there are no more hits for that time slice, the packet is ended, a special trailer packet is sent, and a new packet, belonging to the next time slice, is started.
Once the download of the TDC data corresponding to a certain time slice is complete, a special packet, the trailer packet, is sent to the shore station. This packet contains only a header with the trailer packet bit set. The trailer packet does not incorporate any TDC data. If all the FIFOs are emptied before the end of the time slice, the SM will stay waiting for new data. A 32-bit register contains the empty status of all the TDC memories; by checking this register, the SM knows when to start downloading the data again. The process is shown in Fig. 15.
Another important block managed by the SM is the monitoring channel, where monitoring data are forwarded to the IPMux. Each time slice, a packet containing the UTC time, DOM identifier, the UTC FIFO status, and the TDC hit counters (one per TDC channel) is transferred. The memory information, which resides in the LM32 dual-port memory address space, is also transferred. The memory base address is double-word aligned and can be set via the monitoring memory base address register. The number of double words to be read out by the monitoring SM can be set via the monitoring memory words register (see SM registers in Table 7). The number of words that can be read is limited to 256 to fit in a standard Ethernet packet (1500 bytes) and to reduce the data rate.
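The throughput figures quoted above follow from the hit size and the number of channels. The short Python sketch below reproduces the arithmetic; it counts hit payload only and neglects header and protocol overhead, which is an assumption of this illustration, as are the function names.

```python
HIT_BITS = 48           # one hit is 6 bytes
CHANNELS_PER_DOM = 31   # one TDC channel per PMT


def dom_throughput_bps(hit_rate_hz):
    """Payload throughput of one DOM for a given per-channel hit rate."""
    return hit_rate_hz * CHANNELS_PER_DOM * HIT_BITS


def max_rate_per_channel_hz(link_bps, doms_per_link):
    """Per-channel hit rate that fills a link shared by several DOMs."""
    return link_bps / (doms_per_link * CHANNELS_PER_DOM * HIT_BITS)


firmware_limit_bps = dom_throughput_bps(600e3)        # 892.8e6, about 0.9 Gbps
switch_limit_hz = max_rate_per_channel_hz(1e9, 9)     # about 74.7e3, i.e. ~74 kHz
nominal_bps = dom_throughput_bps(7e3)                 # about 10.4e6 at the 7 kHz average rate
```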
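The download and packetization of one time slice can be modeled in software. The Python sketch below is a simplified behavioral model, not the seven-state FSM: it assumes that every FIFO already holds all of its hits followed by the time slice marker, so a single pass over the channels suffices, and it represents packets as dictionaries. The names, the `MARKER` stand-in, and the numbering of the trailer packet are assumptions of this sketch.

```python
from collections import deque

HIT_SIZE = 6        # bytes per hit
MARKER = object()   # stand-in for the time slice marker stored in each FIFO


def read_time_slice(fifos, max_payload_bytes):
    """Drain each channel up to its marker, splitting the hits into packets
    that respect the payload limit, and finish with a trailer packet."""
    if max_payload_bytes < HIT_SIZE:
        raise ValueError("payload limit must hold at least one hit")
    packets = []
    payload = []
    frame_index = 0

    def close_packet(trailer=False):
        nonlocal payload, frame_index
        packets.append({"frame_index": frame_index, "trailer": trailer,
                        "hits": payload})
        payload = []
        frame_index += 1

    for fifo in fifos:
        # Read as much as possible from one channel before moving on.
        while True:
            if not fifo:
                raise ValueError("FIFO emptied before its time slice marker")
            item = fifo.popleft()
            if item is MARKER:
                break  # this channel is done until the next time slice
            if (len(payload) + 1) * HIT_SIZE > max_payload_bytes:
                close_packet()  # payload full: start a new packet
            payload.append(item)
    if payload:
        close_packet()
    close_packet(trailer=True)  # header-only packet closing the time slice
    return packets
```

In the real firmware the SM waits when a FIFO is empty instead of raising an error, and it resumes when the empty-status register shows new data.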
The percentage of resources used by the TDC SM IP is negligible with respect to the total resources of the Kintex-7 160T FPGA (see Table 8).

Setup
The first test setup used for evaluating the TDC firmware consisted of a KC705 Xilinx evaluation board, where a first version of the TDCs was implemented, and an ML605 Xilinx evaluation board used for generating the pattern supplied as inputs to the TDCs. In a further step, the KC705 board was replaced by a CLB. Both test setups are shown in Fig. 16. For temperature tests, a climatic chamber from DYCOMETAL was used. For all the tests, the data were sent from the ML605 through a CAT-5 cable using LVDS signals.

Data from Deployed DUs
Several DUs have already been deployed at the ARCA and ORCA sites, whose acquired data are available for analysis and allow for validating the real operation of the acquisition firmware.
The bioluminescence observed for a single TDC channel is shown in Fig. 17, where the drop generated by the activation of the HRV in the TDCs is observed. The maximum rate measured is 20 kHz, corresponding to an HRV set to 2000 hits for a time slice of 100 ms. An example of the data obtained from the TDCs when the multihit option is disabled is shown in Fig. 18, where the ToT distribution for a given PMT installed in a DOM of a deployed DU is presented. The maximum ToT value is 255 ns, which is the maximum range of the TDCs when the multihit option is disabled. As previously explained, the hits with a duration longer than 255 ns are truncated to this maximum value, producing an accumulation of hits at 255 ns. Figure 19 shows the first hits obtained when the multihit option is enabled. The consecutive hits with a duration of 255 ns, resulting from the segmentation of one physical event longer than 255 ns, are properly merged. The DNL has also been obtained from a collection of detector data. In Fig. 20, the DNL measurements performed for deployed DUs are represented. All the 31 TDC channels, with over 6 million hits, have been evaluated. The DNLs have been calculated for the four cycles of the IOSERDES, but only the highest value is represented.

Summary
The architecture and performance of the front-end firmware for KM3NeT have been presented. The front-end firmware has been developed to manage 31 TDC channels implemented in a Xilinx Kintex-7 160T FPGA, occupying very few resources and providing the required resolution (1 ns) with a high precision and low DNL and INL. The implementation in an FPGA provides the flexibility to modify the front-end logic and to integrate other systems in the same device; therefore, it is not necessary to include additional components to digitize events, which increases reliability. The TDCs have a dead time of 5 ns. Two features have been implemented to improve their performance: the multihit, which allows for the recording of long duration hits, and the HRV, which dynamically limits the maximum DAQ rate. The performance of the acquisition firmware is almost independent of the temperature. The test bench setup, where the front-end firmware has been evaluated, has also been described and the results presented. Finally, the data obtained from the first deployed DUs have been analyzed, showing agreement with the results obtained in the laboratory. The DAQ front-end and readout firmware have been validated and are now successfully running at the two sites of KM3NeT.

Fig. 1 (a) A KM3NeT DOM. (b) Artist's representation of the 3-D grid of the underwater telescope showing the vertical strings holding the DOMs.

Fig. 2 Detail of the FPGA mounted on the CLB. The FPGA is a commercial-grade Xilinx Kintex-7 160T. The FPGA package selected for KM3NeT is the FBG676.

Fig. 3 Block diagram of the DOM. Optic, acoustic, instrumentation, front-end firmware, and all the interfaces are represented.

Fig. 5 Scheme of the 4×-oversampling technique. The sampling quadruples the clock frequency using four phases of the original clock, shifted by 90 deg each, thus obtaining a sampling frequency of 1 GHz when using a clock with a period of 4 ns.
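The arithmetic behind the 4×-oversampling scheme can be sketched as follows. This is only an illustration of the timing described in the caption, not the firmware implementation; the function and constant names are hypothetical.

```python
CLOCK_PERIOD_NS = 4.0  # clock period quoted in the caption
N_PHASES = 4           # four phases shifted by 90 deg each


def sampling_instants(n_cycles: int) -> list[float]:
    """Return the sampling instants (ns) produced by all phases over n_cycles."""
    step = CLOCK_PERIOD_NS / N_PHASES  # 90 deg of a 4 ns period is 1 ns
    return [cycle * CLOCK_PERIOD_NS + phase * step
            for cycle in range(n_cycles)
            for phase in range(N_PHASES)]


instants = sampling_instants(2)
effective_rate_ghz = N_PHASES / CLOCK_PERIOD_NS  # samples per ns, i.e. GHz
print(instants)
print(effective_rate_ghz)
```

Each phase contributes one sample per clock period, so the four interleaved phases yield one sample every nanosecond.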

Fig. 4 Oversampling architecture. The input signal is connected with no delay to the flip-flops performing the acquisition. The clocks of the flip-flops are driven by equal-frequency clocks shifted with equidistant phases, therefore increasing the sampling rate by the number of phases being used.

Fig. 7 Schematic view of the FIFOs. The structure of a complete hit is shown, as well as the time slice marker, which has the same length as a hit and indicates the transition between time slices. The FIFO almost-full flag is also shown. The flag becomes active only if the occupancy of the FIFO is 1012 elements.

Fig. 8 TDC quantization error. Two examples of hits are given to explain the intrinsic quantization error of the oversampling TDCs. The first one represents a hit with a ToT of 6.8 ns, which is acquired in six clock samples due to the relative position of the hit with respect to the sampling clock. The quantization error is −0.8 ns. In the second example, the hit has a ToT of 7.2 ns. In this case, due to the relative position of the hit, the number of clock samples is eight, giving a quantization error of 0.8 ns. The maximum values for the quantization error would be ±1 ns.
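The two examples in this caption can be reproduced with a simple model in which the TDC counts the 1 ns sampling edges that fall inside the hit. This is a minimal sketch under that assumption; the start offsets of 0.1 ns and 0.9 ns are chosen here for illustration and are not taken from the paper.

```python
import math

SAMPLE_PERIOD_NS = 1.0  # effective sampling period of the oversampling TDC


def samples_in_hit(start_ns: float, tot_ns: float) -> int:
    """Count the sampling edges k * period lying in [start, start + tot)."""
    first = math.ceil(start_ns / SAMPLE_PERIOD_NS)
    last = math.ceil((start_ns + tot_ns) / SAMPLE_PERIOD_NS)
    return last - first


def quantization_error_ns(start_ns: float, tot_ns: float) -> float:
    """Measured ToT minus true ToT, in ns."""
    return samples_in_hit(start_ns, tot_ns) * SAMPLE_PERIOD_NS - tot_ns


# First example: 6.8 ns hit starting just after a sampling edge.
print(samples_in_hit(0.1, 6.8), round(quantization_error_ns(0.1, 6.8), 3))
# Second example: 7.2 ns hit starting just before a sampling edge.
print(samples_in_hit(0.9, 7.2), round(quantization_error_ns(0.9, 7.2), 3))

# Sweeping the start offset over one period keeps the error within +/-1 ns.
worst = max(abs(quantization_error_ns(i / 100, tot))
            for tot in (6.8, 7.2) for i in range(100))
print(worst < 1.0)
```

In this model the number of samples depends only on where the hit falls relative to the sampling edges, which is why the same ToT can be measured with two different values.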

Fig. 9 TDC precision. TDC quantization error as a function of the fractional part of the ratio $\mathrm{ToT}/t_0$. The model representing an ideal TDC is shown as a continuous line and the measurements with a real TDC as red points.

Fig. 11 Multihit option. The figure represents one hit of 835 ns analyzed in two different scenarios. In the first one, the multihit option is enabled, so the original hit is acquired as four consecutive hits, the first three of 255 ns length, and the fourth one of 70 ns. In the second scenario, the multihit option is disabled; therefore, the original hit is acquired as a single hit of 255 ns.
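The behavior described in this caption can be mimicked with the short sketch below. It is an illustrative model rather than the firmware logic: the function names are hypothetical, and the treatment of a hit whose length is an exact multiple of 255 ns is an assumption.

```python
MAX_TOT_NS = 255  # maximum ToT of a single recorded hit


def acquire(tot_ns: int, multihit: bool) -> list[int]:
    """Return the ToT values (ns) recorded for one physical hit."""
    if not multihit:
        # Without multihit, a long hit is truncated to the maximum range.
        return [min(tot_ns, MAX_TOT_NS)]
    segments = []
    remaining = tot_ns
    while remaining > MAX_TOT_NS:
        segments.append(MAX_TOT_NS)  # full-length segment
        remaining -= MAX_TOT_NS
    segments.append(remaining)       # final, shorter segment
    return segments


def merge(segments: list[int]) -> int:
    """Recover the original ToT by summing consecutive segments."""
    return sum(segments)


print(acquire(835, multihit=True))
print(acquire(835, multihit=False))
print(merge(acquire(835, multihit=True)))
```

With the multihit option enabled, summing the consecutive segments recovers the full 835 ns, whereas with the option disabled the information beyond 255 ns is lost.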

Fig. 12 Measured differential nonlinearities. Five tests have been performed with 50,000 events each. The error bars show the standard deviation of these five measurements. The DNLs have been measured for each IOSERDES cycle.

Fig. 13 INL for step sizes of 5 ns between 5 and 255 ns. The error bars show the dispersion of the five tests performed, each one with 10,000 measurements.

Fig. 14 Precision tests at different temperatures. At each temperature, the standard deviation of the measured ToT values with respect to the real ToT is represented.

Fig. Diagram of the Mealy FSM implemented in the firmware readout. The diagram describes the operation of the TDC SM.

Fig. 16 Test bench setups. (a) The evaluation KC705 board (circuit on the right) where the first TDCs were implemented, connected to an ML605 board (on the left) acting as pattern generator. (b) CLB being stimulated for testing the TDC with an ML605 board.

Fig. 18 Distribution of the ToT hits recorded by a given PMT of a deployed DOM when the multihit option is disabled. The ToT range ends at 255 ns, where an accumulation of hits is produced, as expected when this option is disabled.

Fig. 20 DNL values obtained from one DOM of the deployed DUs data. More than 6 million events were measured. The DNLs were computed for the four bins in the 31 channels, but only the maximum absolute value of each channel is shown in the figure.
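One common way to estimate a per-bin DNL from data is a code-density calculation: if hits arrive at random with respect to the clock, each of the four bins should collect the same number of hits, and the relative deviation from that expectation is taken as the DNL. The sketch below follows this approach as an assumption, since the exact procedure is not given in this caption. The counts are invented for illustration and are not detector measurements.

```python
def dnl_per_bin(counts: list[int]) -> list[float]:
    """Relative deviation of each bin population from the uniform expectation."""
    expected = sum(counts) / len(counts)
    return [count / expected - 1.0 for count in counts]


def max_abs_dnl(counts: list[int]) -> float:
    """Largest absolute DNL among the bins, the quantity plotted per channel."""
    return max(abs(value) for value in dnl_per_bin(counts))


# Hypothetical hit counts for the four bins of one channel (not real data).
example_counts = [50200, 49900, 50100, 49800]
print([round(value, 4) for value in dnl_per_bin(example_counts)])
print(round(max_abs_dnl(example_counts), 4))
```

The values are expressed as a fraction of the nominal bin width, so a result of 0.004 corresponds to a bin that is 0.4% wider or narrower than ideal.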

Table 1
TDC expected average throughput for the DOM and DU.

Table 3
Front-end firmware requirements. This table summarizes the requirements of the KM3NeT collaboration for both the TDCs and the SM.

Table 4
Detailed TDC resources for both 1 channel and 31 channels.

Table 5
Detailed TDC Wishbone control registers and LM32 memory addresses. Five registers have been included to enable or disable both the whole TDC core and the individual channels and to control the HRV and multihit functions.

Table 6
TDC format header.

Table 7
The 12 Wishbone registers used to control and monitor the SM and the IRQ for both the TDC and acoustic channels.