IPcore library for FPGA

This page presents a set of already developed FPGA blocks for event-based processing, from sensors to robotic actuators. The following sections describe how to instantiate, parameterize and use these blocks. They can be downloaded and used in your own research, provided that you cite the corresponding publication.

Event-based Processing

Event-based processing refers to the processing of information encoded in events produced by neuromorphic sensors, in order to clean, adapt, filter and/or prepare that information for further processing by high-level algorithms such as ConvNets, Deep Belief Networks, or motor controllers on robotic platforms. This processing should preserve the low latency of the sensor output, which results from the asynchronous and fast readout of sensors such as the DVS. There are two main classes of event-based sensor processing algorithms: filters and feature extractors. Filters do not transform the sensor information; they only apply small changes that improve the quality of the signal, such as removing noise or reducing the activity. Feature extractors, on the other hand, extract a particular stimulus property, such as edge orientation. This extraction can produce a new kind of event-based information, which can be sent together with the sensor information to enrich the signal, or sent alone so that the next stages work only with the extracted features. For example, for debugging purposes it is more convenient to have both the sensor and the feature-extracted information at the same time, but for motor control in robotic applications the feature alone is sufficient.

Background Activity Filter

The jAER software implementation of a background activity filter (BAF) (BackgroundActivityFilter [9]) works as follows for the events coming from a DVS (or from a previously recorded data file): the timestamp of an event is used to measure the inter-event interval for that pixel (called delta time). This delta time is compared to a threshold to determine whether the event should be filtered. In this way, isolated pixels with an event rate lower than 1/(delta-time threshold) are filtered out. When a moving object stimulates the DVS, a neighborhood of pixels is usually active at the same time, generating events. The BAF is therefore further improved by also filtering events that are not correlated in space, so that the filter removes events that are not spatio-temporally correlated. There are two possible ways to implement the spatial correlation: 1) by sharing a position of the 2D timestamp array within a neighborhood of pixels (i.e. subsampling the event address by right-shifting its x and y parts to access the 2D array of stored timestamps, px1-px4 in Fig. 1); and 2) by writing the timestamp of the incoming event to the neighborhood of positions around the corresponding pixel, but not to its own address. With the second option, only neighbor activity prevents a particular pixel from being filtered, so isolated pixels, with no neighbors to update their timestamps, are filtered. Options 1) and 2) can be combined, as in this work.

Fig. 1. Background activity filter. Top: address-event-representation mapping to the address of the 2D memory array of timestamps. Bottom-left: 2D timestamp array representation, where px1-4 correspond to the subsampling operation and nb1-nb8 represent the neighborhood of the incoming address (nb0). Right: representation of the temporal filtering condition.

Fig. 1 illustrates these two combined ideas. A 2D array of 128×128 timestamps is addressed using the 7 most significant bits of the X (column) and Y (row) addresses of the event. Thus, each position of the array is shared by 4 pixels (px1 to px4 in the figure) if the sensor address space is bigger than 128×128 (e.g. a DAVIS sensor). Wider spatial correlation is implemented by taking more consecutive positions of the timestamp array in the X and Y directions (nb1 to nb8 in the figure). This filter needs an internal timer to measure the inter-event intervals. When a new event arrives with a pixel address that corresponds to nb0, the filter reads the timestamp stored in nb0 (called ti in Fig. 1) and calculates the delta time by subtracting it from the current timer value (called tj or tk in Fig. 1 to illustrate two possible cases). This delta time is then compared to the configurable threshold (called dTH in the figure). If the delta time is lower than dTH (the tj case), the new event is sent out. If the delta time is bigger than dTH (the tk case), the event is filtered. Finally, the filter stores the current timer value (tj or tk, respectively) in the set of neighbors around nb0.
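The combined subsampling and neighborhood scheme can be sketched in software as follows. This is only an illustrative Python model, not the jAER or VHDL code: the class and method names are hypothetical, the neighborhood is assumed to be the full 3×3 ring around nb0, and a position that has never been written is assumed to cause the event to be filtered.

```python
class BackgroundActivityFilter:
    """Illustrative model of the combined subsampling + neighbourhood BAF."""

    def __init__(self, dt_threshold, size=128, subsample_shift=1):
        self.dt_threshold = dt_threshold   # dTH, in timer ticks
        self.size = size                   # 128x128 timestamp array
        self.shift = subsample_shift       # shift of 1: 4 pixels share a position
        # None marks a position never written (assumption: event is filtered)
        self.timestamps = [[None] * size for _ in range(size)]

    def process(self, x, y, now):
        """Return True if the event at pixel (x, y) passes at time `now`."""
        cx, cy = x >> self.shift, y >> self.shift   # nb0 in Fig. 1
        last = self.timestamps[cy][cx]              # ti in Fig. 1
        passed = last is not None and (now - last) < self.dt_threshold
        # Store the current time in nb1..nb8, but not in nb0 itself
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue
                nx, ny = cx + dx, cy + dy
                if 0 <= nx < self.size and 0 <= ny < self.size:
                    self.timestamps[ny][nx] = now
        return passed
```

In this model a pixel that fires alone never passes, because its own events never write its own position; only activity from a neighboring position can open the time window for it.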

FPGA implementation

This hardware implementation keeps an up-to-date 2D array of timestamps in FPGA block RAM. Each incoming event updates the timestamp of a configurable set of neighbors around its own address in the 2D array. The address of each event is taken from the row and column parts of the incoming address (see Fig. 1). Spatial correlation is implemented in two different ways: (1) Spatial correlation by subsampling: by taking only the 7 most significant bits of the row and column addresses, each position of the 2D array (128×128) is shared between several neighboring pixels of the DVS (4 pixels in our case). In Fig. 1, px1, px2, px3 and px4 have correlated addresses. If the 7-bit column address is x and the 7-bit row address is y, then px1=(2x,2y), px2=(2x+1,2y), px3=(2x,2y+1) and px4=(2x+1,2y+1) in the DVS visual field. All of them share the position (x,y) in the 2D array of timestamps that this filter keeps in the block RAM of the FPGA. (2) Spatial correlation by neighborhood: this VHDL implementation has a configurable number of neighbors (nb1-nb8) that can share the same timestamp. Each nbx of the 2D array corresponds to a set of 4 pixels correlated by subsampling (following the previous rule). The x and y addresses of the neighborhood in the 2D array (according to Table I) are updated with the timestamp of the incoming event, except for the position that corresponds to the incoming event itself (nb0). Regarding the temporal correlation, when an event arrives, its address is used to read an old timestamp (t0) from the 2D array. This timestamp, which could have been written by any event of the neighborhood, is compared to the current timestamp t1 (taken from a global timer). The event is let through only if the difference between these two timestamps is below a threshold (dtTH); otherwise it is filtered. Fig. 1 represents these two possibilities with tj-ti and tk-ti.
If an event arrives at time t1 and t1-t0 < dtTH, the event is sent out, and t1 is stored in nb0 if only subsampling spatial correlation is selected, or in nb1 to nb8, but not nb0, if wider spatial correlation is selected. If instead the event arrives at t1’ with t1’-t0 > dtTH, the event is filtered and t1’ is stored in the 2D array in the same way. In VHDL this filter has been implemented using three main circuit resources: (a) a finite state machine (FSM) that implements the control unit (see Fig. 2), (b) a block RAM (BLOCKRAM) that implements the 2D array, and (c) a counter for time control (called Timer in Fig. 2).

Fig. 2. Simplified BAF FSM diagram. Temporal correlation is implemented using a global timer from which t0 and t1 are taken. Spatial correlation is implemented by the UPDATE RAM state, which iterates over all the neighbors around the incoming event.

1) Finite State Machine

Fig. 2 represents the FSM diagram of the implemented BAF. Starting in the IDLE state, as soon as a new event arrives (REQi=’0’), the global timer is read and its current value is stored in a register called t1. At the same time (same state and clock cycle), the input event address is stored in a register called dir. In the next clock cycle (next state), the BLOCKRAM is read at address dir and the stored value is copied into the register t0. Then, in the following clock cycles (8 in this case, one per neighbor), the state machine writes t1 to the 8 neighbor addresses of the BLOCKRAM (according to Table I). Position dir itself is not written; it is only updated when a new event arrives for any of its neighbors. This sequence therefore also filters all isolated hot pixels. Once all the neighbors’ RAM positions have been updated with the new timestamp t1, the state machine checks whether t1-t0 is bigger or smaller than a configurable threshold, dtTH, to decide whether the current incoming event must be filtered or passed on to the next event-processing blocks.

2) Block RAM memory

A BLOCKRAM is a dedicated resource available in most FPGAs. It is a small block (usually one of many) of RAM that can be addressed from the logic blocks of the FPGA, and whose data can be read and written in parallel from the FPGA fabric. These RAM blocks have a fixed size and must be used as whole blocks, regardless of the amount of memory needed by the implemented architecture. This implementation of the background activity filter uses an array of 128×128 words of 32 bits. This amounts to 16k words, which requires 64 kbytes of memory. Xilinx Spartan 6 FPGAs have 36-kbit BLOCKRAMs, each of which can be used as 1k words of 36 bits, so 16 BLOCKRAMs are needed. For the Lattice ECP3, each EBR (embedded block RAM) has a size of 18 kbits and can be used as 512 words of 36 bits, so 32 EBRs are needed for this filter. Either a BLOCKRAM for Xilinx or an EBR for Lattice can be used by instantiating a component in the VHDL code, or it can be inferred by the synthesis tool if some basic coding rules are followed when the VHDL description is written (for example, the Xilinx synthesis rules).
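The block counts quoted above follow from simple arithmetic, reproduced here as a short Python check. The helper name `blocks_needed` is hypothetical, and the block geometries (1k×36 and 512×36) are taken directly from the text rather than from vendor documentation.

```python
import math

def blocks_needed(words, word_bits, block_depth, block_width):
    """Number of RAM blocks needed for `words` entries of `word_bits` bits."""
    columns = math.ceil(word_bits / block_width)  # blocks side by side for width
    rows = math.ceil(words / block_depth)         # blocks stacked for depth
    return columns * rows

WORDS = 128 * 128                       # one timestamp per array position
SIZE_KBYTES = WORDS * 32 // 8 // 1024   # 32-bit words, expressed in kbytes

print(WORDS, SIZE_KBYTES)
print(blocks_needed(WORDS, 32, 1024, 36))  # blocks used as 1k x 36
print(blocks_needed(WORDS, 32, 512, 36))   # blocks used as 512 x 36
```

Since a 32-bit timestamp fits in a single 36-bit word, only the depth of each block determines the count.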

3) Time counter

The timer is implemented using a 32-bit register that is incremented every clock cycle. The software interface works with the dtTH parameter in time units (ms), but in the FPGA the timer counts clock cycles and the state machine works with the timer output. The FPGA global clock frequency is therefore needed to convert between time (ms) in the software interface and timer counts.
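As an illustration of this conversion, the Python sketch below assumes the 50 MHz clock mentioned for the clk50 port. The function names are hypothetical, and the wrap-around handling is an assumption about how a free-running 32-bit counter would typically be used, not a description of the VHDL.

```python
CLOCK_HZ = 50_000_000    # assumed 50 MHz global clock (clk50)
TIMER_BITS = 32
TIMER_MASK = (1 << TIMER_BITS) - 1

def ms_to_cycles(dt_ms, clock_hz=CLOCK_HZ):
    """Convert a threshold in milliseconds to timer counts."""
    return round(dt_ms * clock_hz / 1000)

def timer_wrap_seconds(clock_hz=CLOCK_HZ):
    """Time after which the free-running 32-bit timer overflows."""
    return (1 << TIMER_BITS) / clock_hz

def elapsed(t1, t0):
    """Timer counts between t0 and t1, tolerant of a single wrap-around."""
    return (t1 - t0) & TIMER_MASK
```

At 50 MHz, a threshold of 1 ms corresponds to 50000 counts, and the timer wraps roughly every 86 seconds.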

Entity
Input Ports
  • aer_in_data(16:0) - Parallel input AER bus, where bit 0 is the polarity, bits 8:1 are x address and bits 16:9 are y address.
  • aer_in_req_l - Request signal of AER datain. It is active low.
  • aer_out_ack_l - Acknowledge signal of AER dataout. It is active low.
Configuration input
  • SPIaddress(7:0) - Configuration register address bus. It selects one register to be written from SPI bus.
  • SPIdata(7:0) - Configuration register data bus. It represents the new data to be written in a register.
  • SPIwr - Configuration write enable. When it is high, the SPIdata is copied into the SPIaddress register.
  • clk50 - clock signal. Usually it is 50MHz.
  • rst_l - reset signal. Active low.
Output Ports
  • aer_out_data(16:0) - Parallel output AER bus, where bit 0 is the polarity, bits 8:1 are x address and bits 16:9 are y address.
  • aer_in_ack_l - Acknowledge signal of AER datain. It is active low.
  • aer_our_req_l - Request signal of AER dataout. It is active low.
Configuration output
  • BGAF_en - configuration bit received through SPI. Used as enable of the BAF+MF+OBT. If disabled, the output is connected to the AER input bus.
  • DAVIS_en - configuration bit received through SPI. Used to specify if the input is a DAVIS retina or a DVS_PAER_128
  • WS2CAVIAR_en - configuration bit received through SPI. Used to enable the conversion from WordSerial to CAVIAR parallel AER format.
  • wholereset - configuration bit received through SPI. Used to reset the system through the software.
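The 17-bit AER word layout described for aer_in_data and aer_out_data (bit 0 polarity, bits 8:1 x address, bits 16:9 y address) can be packed and unpacked as in the Python sketch below. The function names are hypothetical and are meant for host-side tests or simulation stimuli only.

```python
def unpack_aer(word):
    """Split a 17-bit AER word into (x, y, polarity)."""
    polarity = word & 0x1        # bit 0
    x = (word >> 1) & 0xFF       # bits 8:1
    y = (word >> 9) & 0xFF       # bits 16:9
    return x, y, polarity

def pack_aer(x, y, polarity):
    """Build a 17-bit AER word from x, y and polarity."""
    return ((y & 0xFF) << 9) | ((x & 0xFF) << 1) | (polarity & 0x1)
```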
Files

Mask Filter

This filter addresses two different situations: 1. Filtering a set of hot pixels whose spatial properties make them impossible to filter with the BAF (e.g. a neighborhood of hot pixels). In other words, it filters all high-frequency pixels, so it works as a matrix of low-pass filters. 2. Reducing the output of a DVS to a set of pixels of interest. In this case the filter behavior is the opposite of the previous one, that is, a matrix of high-pass filters. This is useful, for example, when an event-based algorithm has a very well defined working region and everything else only produces noise, as in a slot-car game with automatic control of one of the cars, where it is useful to ignore all the pixels that are not part of the path of the car. Consider a matrix M with the same size as the visual field of the retina (A×B). This matrix contains one bit at each position:

$$M = [m_{a,b}], \quad m_{a,b} \in \{0,1\}, \quad 0 \le a < A, \quad 0 \le b < B$$

This bit is interpreted as a mask that decides whether each incoming event from the retina is filtered. The M matrix is calculated beforehand by monitoring events for a period of time. If the mask is activated for the addresses whose event frequency is higher than a threshold, the filter works as a high-pass filter. In contrast, if the mask is deactivated for those addresses, the filter behaves as a low-pass filter. The working principle of the filter is the same in both cases. A flag is needed for each pixel, indicating whether the activity of that pixel has to be filtered. By negating the meaning of this flag, the same filter can be used in the two situations explained above. The jAER software mask filter (HotPixelFilter) is implemented in two stages: an observation step followed by a filtering step. During the observation step, a list of addresses is generated with those pixels whose event rate is higher than a configurable threshold. In the filtering step, only the events from pixels that are not on the list are communicated to the next block.

FPGA implementation

This filter has been implemented by using a shareable BLOCKRAM memory for the observing stage (256x256x5bits) and a second and smaller BLOCKRAM memory for the normal operation (256x256x1bit). This second memory is the one that our implementation will be accessing per each incoming event in order to check if the event has to be filtered or passed through.

Fig. 3. Mask Filter FSM diagram. ‘Reqi/Acki’ are the AER input request and acknowledge. ‘Reqo/Acko’ are the AER output request and acknowledge. ‘Array(ae)’ is the filtering flag for the incoming event. ‘Observing’ is an enable signal that is activated through software and deactivated automatically. ‘Observing_timer due’ indicates that the ‘Observing_timer’ has reached the configurable ‘Observing time’. ‘Matrix’ is the memory used for the observing stage. ‘Row, Col and Rc’ are temporary variables. ‘Array_we’ and ‘Matrix_we’ are write-enable signals. ‘NEVHP’ is the configurable threshold.

An FSM (see Fig. 3) takes care of the observing and filtering stages of this filter. From the ‘idle’ state (blue) it is possible to advance to the observing stage or to process an incoming event in the normal stage. The only way to start the observing stage is to receive an order from outside (from the software interface through USB in our case), which activates the internal ‘Observing’ signal. The state machine remains in the pink states for a configurable period of time, called ‘Observing time’. During this time, each incoming event increments by one its corresponding position of the 256x256x5bit memory. When the algorithm starts, the ‘Matrix’ is empty (the content of each address is zero). After the ‘Observing time’, the ‘Matrix’ holds a histogram of the incoming traffic during this time. Then the state machine enters a loop (green states) where the ‘Array’ of flags is updated using the ‘Matrix’ and a configurable ‘Threshold’. The ‘Matrix’ is scanned and each 5-bit value is compared to the ‘Threshold’. If the 5-bit value is higher than or equal to the ‘Threshold’, a ‘1’ is stored in the same position of the ‘Array’ of flags; otherwise a ‘0’ is stored. Before the loop iterates, the current ‘Matrix’ position is cleared, so the ‘Matrix’ is left empty for a possible next observing stage.

While this filter is under normal operation, the ‘Matrix’ memory can be shared with another, subsequent filter if necessary, but it must be empty before the observing stage. The filter has been implemented in such a way that the polarity of the flag (from the ‘Array’) can be inverted by software during normal operation, so both functionalities described at the beginning are covered.
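The observing and filtering stages can be modeled in software roughly as follows. This Python sketch is illustrative only: the class and method names are hypothetical, the 5-bit counters are assumed to saturate at 31 (the text does not say whether they saturate or wrap), and the non-inverted flag polarity is assumed to block the flagged (high-rate) pixels.

```python
class MaskFilter:
    COUNT_MAX = 31  # 5-bit counter; saturation is an assumption

    def __init__(self, threshold, size=256):
        self.threshold = threshold
        self.size = size
        self.matrix = [[0] * size for _ in range(size)]  # observing histogram
        self.array = [[0] * size for _ in range(size)]   # one flag per pixel
        self.invert = False                              # software-controlled polarity

    def observe(self, x, y):
        """Observing stage: count one event for pixel (x, y)."""
        self.matrix[y][x] = min(self.matrix[y][x] + 1, self.COUNT_MAX)

    def end_observation(self):
        """Scan the histogram, set the flags and clear the histogram."""
        for y in range(self.size):
            for x in range(self.size):
                self.array[y][x] = 1 if self.matrix[y][x] >= self.threshold else 0
                self.matrix[y][x] = 0

    def passes(self, x, y):
        """Normal stage: True if an event from (x, y) is let through."""
        flagged = self.array[y][x] == 1
        return flagged if self.invert else not flagged
```

With `invert` left at False the model removes hot pixels; setting it to True keeps only the pixels that were active during the observing stage, which corresponds to the region-of-interest use case.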

Entity
Input Ports
  • aer_in_data(16:0) - Parallel input AER bus, where bit 0 is the polarity, bits 8:1 are x address and bits 16:9 are y address.
  • aer_in_req_l - Request signal of AER datain. It is active low.
  • aer_out_ack_l - Acknowledge signal of AER dataout. It is active low.
Configuration input
  • SPIaddress(7:0) - Configuration register address bus. It selects one register to be written from SPI bus.
  • SPIdata(7:0) - Configuration register data bus. It represents the new data to be written in a register.
  • SPIwr - Configuration write enable. When it is high, the SPIdata is copied into the SPIaddress register.
  • clk50 - clock signal. Usually it is 50MHz.
  • rst_l - reset signal. Active low.
Output Ports
  • aer_out_data(16:0) - Parallel output AER bus, where bit 0 is the polarity, bits 8:1 are x address and bits 16:9 are y address.
  • aer_in_ack_l - Acknowledge signal of AER datain. It is active low.
  • aer_our_req_l - Request signal of AER dataout. It is active low.
Files

Object Tracker Feature Extractor

A tracker is more than a filter that focuses attention on the activity of a particular cluster. It is also a feature extractor: the center of mass (CM) of the object inside a cluster is calculated and continuously sent out as a new stream of CM events. jAER implements the cluster object tracker algorithm RectangularClusterTracker (RCT). This algorithm processes event packets from a DVS sensor. The main RCT steps are:

  1. For each event (of a packet), it finds a cluster that contains the event, based on a distance criterion (like R_C in equation (6)). If such a cluster exists, the cluster parameters (location and velocity) are updated using a mixing factor (α≈0.01), as expressed in the equation:

$$x_{n+1} = (1-\alpha)\, x_n + \alpha e$$

where $x_{n+1}$ is the updated location, $x_n$ is the old location and $e$ is the location of the current event.

  2. If the incoming event does not fall in any cluster, a new cluster is created at this event location. This new cluster becomes “visible” in jAER after it has received a configurable number of events (typically 30 events).

After all the events in a packet are managed according to steps 1 and 2, the algorithm processes all the clusters sequentially in the following way:

  - a. If a cluster does not receive any new event for a configurable time, this cluster is removed.
  - b. If two clusters share part of the visual field by overlapping, a merging operation of them is performed into a new cluster and old ones are removed. The new cluster location is computed by averaging the older clusters locations. This average is weighted according to the number of events accumulated by each cluster.
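The location update with the mixing factor and the weighted merge of step b can be sketched in Python as below. The function names are hypothetical and this is not the jAER code; in particular, summing the event counts of the two merged clusters is an assumption.

```python
def update_location(location, event, alpha=0.01):
    """Mix the event position into the cluster location with factor alpha."""
    x, y = location
    ex, ey = event
    return ((1 - alpha) * x + alpha * ex,
            (1 - alpha) * y + alpha * ey)

def merge_clusters(loc_a, n_a, loc_b, n_b):
    """Average two cluster locations, weighted by their event counts."""
    total = n_a + n_b   # assumption: the merged cluster keeps both counts
    merged = ((loc_a[0] * n_a + loc_b[0] * n_b) / total,
              (loc_a[1] * n_a + loc_b[1] * n_b) / total)
    return merged, total
```

With α≈0.01 each event moves the cluster by only 1% of its distance to the event, so the location behaves as a low-pass filtered version of the event positions.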

Therefore, a cluster object tracker (COT) has to detect potential objects in the DVS output and then follow each object while it moves through the visual field. A cluster is detected when a configurable number of events are correlated both in time and space. To allow the detection and tracking of multiple objects, each tracker works with a reduced part of the visual field, which is called a “cluster”. Clusters cannot overlap in address space. Each of the trackers starts by waiting for an object at a particular initial location and cluster size. As soon as a number of events, Nev, fall into the cluster within a configurable period of time, the tracker has detected an object. A configurable extension around the cluster is always monitored by the tracker for dynamic decisions on cluster movements and cluster size updates. Nev can be adjusted dynamically for automatic adaptation to different object speeds and sizes. Depending on the subsequent events falling in the cluster, several tasks are performed in parallel by the COT:

  • (1) The cluster location (its CM) can move through the visual field according to the calculation of the center of mass (CM) of those Nev events.
  • (2) The cluster size can be enlarged or shrunk from its initial size depending on the presence of events both inside the cluster and in its extension beyond the radius. If there is activity in this extension area, the cluster is enlarged. If the activity is concentrated around the center of the cluster, it is shrunk.
  • (3) The current CM can be averaged over a power-of-2 number of the last CM events, to low-pass filter the trajectory and reduce big changes in it.
  • (4) Nev is set to its initial value (typically 20) when the tracker is reset. If the object moves faster, the time required to collect Nev events decreases. So, if the next Nev events are detected in a shorter period of time (typically 10ms for slow objects and 25us for faster ones), the tracker increases its Nev for the next iteration to increase the precision of the CM calculation. In contrast, if the time for receiving Nev events increases (the object speed is decreasing), the tracker decreases Nev to reduce the latency of the CM calculation, at the expense of precision.
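The adaptation of Nev described in item (4) can be illustrated with the following Python sketch. The function name and the bounds `nev_min` and `nev_max` are hypothetical; the text only states that Nev is doubled or halved depending on how the time needed to collect Nev events changes.

```python
def adapt_nev(nev, previous_time, current_time, nev_min=1, nev_max=1024):
    """Double Nev if the last Nev events arrived faster, halve it if slower."""
    if current_time < previous_time:     # object is speeding up
        return min(nev * 2, nev_max)     # more events per CM: more precision
    if current_time > previous_time:     # object is slowing down
        return max(nev // 2, nev_min)    # fewer events per CM: less latency
    return nev
```

Using factors of 2 and 1/2 keeps the update a single bit shift, which is convenient in hardware.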

These dynamic changes of Nev allow the COT to adapt to object speed changes automatically. In this work we use a factor of 2 for increasing Nev and a factor of 1/2 for decreasing it, as explained in the next section. This tracker can be replicated as many times as necessary in an ASIC or a reconfigurable device (e.g. an FPGA), using the cascade connection represented in Fig. 4. The input stream coming from the sensor, or a filtered version of it (e.g. using the BAF or MF filters), goes directly to the first center of mass cell (CMCell). This unit splits the input stream into two different streams: (1) all the events of a detected object, called “Cluster events” in the figure, and (2) the rest, called “Pass Through events” in the figure. In addition, this cell has a third output port (3), called “CM events”, which carries a feature extracted from the input stream: the center of mass of the detected object over time. Port number 2 (pass-through events) sends out all the events that do not fall into the cluster of the current tracker, so this output represents the output of the DVS with all the events of the first detected object removed. This output can be used by a next tracker to detect a different object in the visual field. In principle there is no limit to the number of trackers that can be implemented, other than the resources of the FPGA.

Fig. 4. Object Trackers connected in cascade and sharing an arbiter output stage. The pass through port sends events that are not used. CM events are the center of mass of detected objects. Cluster events are those events over time detected as an object. All the buses are parallel Address-Event-Representation.
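The cascade routing of Fig. 4 can be modeled with a few lines of Python. This is a hedged sketch: `route_event` is a hypothetical name, and each cluster is assumed to be a square region given by its center and radius, following the square cluster drawn in the tracker figure.

```python
def route_event(event, clusters):
    """Return the index of the first cell whose cluster contains the event.

    `clusters` is a list of (cx, cy, radius) tuples, one per cascaded cell.
    None means the event passed through every cell unused.
    """
    ex, ey = event
    for index, (cx, cy, radius) in enumerate(clusters):
        if abs(ex - cx) <= radius and abs(ey - cy) <= radius:
            return index   # "Cluster events" port of this cell
    return None            # "Pass Through" port of the last cell
```

An event is claimed by the first cell that contains it, which mirrors the way each cell only sees the events that earlier cells passed through.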

FPGA implementation

Fig. 5. Tracker finite state machine bubble diagram and its cluster parameters (right): ‘cm’ is the updated center of mass, which has a configurable initial value (InitCMx,InitCMy) and is updated dynamically with a low-pass filter over time, LPF(AvgXae,AvgYae); ‘g’ is the radius of the cluster, which has a configurable initial value (InitRadix) and is updated dynamically according to the event flow activity (g=g+/-RadixStep) with a configurable step; ‘TH’ is the threshold of accumulated events in the cell that makes it fire an output event indicating the center of mass. ‘TH’ has a configurable initial value (ClustNev) and also adapts itself dynamically, by a factor of 2, depending on the speed of the events (TH=TH*/2); ‘RadixTX’ is the extension of the cluster. Its size is fixed and it is used to decide whether the current cluster size (blue square) needs to be increased or decreased.

Fig. 5 represents the FSM diagram of our tracker implementation. ‘InitCell’ is the initial state after an asynchronous reset. In this state all the cluster parameters are initialized with the parameters shown in Table II, which come from the software interface. Then the state machine moves to the idle state to wait for incoming events. In parallel, two timers are running (TC and TC2). TC manages the reset of the cluster tracker in the absence of incoming events. TC2 measures the time needed to collect Nev events. The previous TC2 measurement is compared to the current one to decide whether Nev must be increased or decreased dynamically by a factor of 2, as expressed in Fig. 5 with the ‘TH’ parameter. When a new event arrives, the state machine acknowledges it and captures its address in the state ReqReceived. Then it goes to the state EvDiscri to determine whether the event falls inside the current cluster. If it does not, the state machine goes to the EvPassTh state, where the event is sent out through the ‘Pass Through’ port, and then returns to the idle state to wait for the next event. In contrast, if the received event falls in the cluster, the next step depends on how many events have already been received in the cluster: either the average of the events in the cluster is updated, or the next center of mass (CM) is calculated. If the current event does not complete Nev events, the FSM goes to the EvNoProc state. In this state, the averaged X and Y addresses ($AvX_i$, $AvY_i$) of the last received events are updated using equation (15) with α=1⁄2^n as history (in order to use shift operations instead of multiplications and divisions). At this point the averaged X and Y addresses do not yet take all Nev events into account, so the state machine waits for the remaining events before performing the next calculations.
When the number of events received inside the cluster is exactly Nev, the state machine moves to the EvProc state. In this state g (the dynamic cluster radius) is updated according to whether or not there were events between the cluster radius and the internal sub-cluster, as described above (orange region of Fig. 5). From this state it is possible to return to the InitCell state if the time since the last CM calculation was too long (TC overflows). Otherwise, in the PreSumComp state the time elapsed between the last CM calculation and the current one is updated and the state machine goes to the SumComp state. In this state the CM is updated and, at the same time, the number of events (Nev) is updated according to the dynamic properties described above. The two remaining states manage the transmission of the CM event both to the output arbiter, which sends the event out of the block, and to a next block, which can use this CM feature to process the DVS output properly ([24]). Only when these blocks confirm the reception of the CM event and the DVS sensor is ready does the state machine return to the idle state.
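The shift-based averaging used in the EvNoProc state can be illustrated as follows. This Python sketch assumes that equation (15), which is not reproduced on this page, has the same form as the mixing update x_(n+1)=(1-α)x_n+αe with α=1/2^n; the function name is hypothetical.

```python
def shift_average(avg, sample, n):
    """Integer update avg + (sample - avg) / 2**n using only a shift.

    Equivalent to (1 - alpha) * avg + alpha * sample with alpha = 1 / 2**n,
    up to integer rounding (the arithmetic shift rounds towards minus infinity).
    """
    return avg + ((sample - avg) >> n)
```

For example, with n=2 an average of 100 and a new address of 108 give 102. A larger n gives the history more weight and makes the average respond more slowly.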

Entity
Input Ports
  • aer_in_data(16:0) - Parallel input AER bus, where bit 0 is the polarity, bits 8:1 are x address and bits 16:9 are y address.
  • aer_in_req_l - Request signal of AER datain. It is active low.
  • aer_out_ack_l - Acknowledge signal of AER dataout. It is active low.
Configuration input
  • SPIaddress(7:0) - Configuration register address bus. It selects one register to be written from SPI bus.
  • SPIdata(7:0) - Configuration register data bus. It represents the new data to be written in a register.
  • SPIwr - Configuration write enable. When it is high, the SPIdata is copied into the SPIaddress register.
  • clk50 - clock signal. Usually it is 50MHz.
  • rst_l - reset signal. Active low.
Output Ports
  • aer_out_data(16:0) - Parallel output AER bus, where bit 0 is the polarity, bits 8:1 are x address and bits 16:9 are y address.
  • aer_in_ack_l - Acknowledge signal of AER datain. It is active low.
  • aer_our_req_l - Request signal of AER dataout. It is active low.
Configuration output
  • CFG_en(3:0) - Configuration bits received through SPI. They indicate whether each tracker (from 0 to 3) is in configuration mode.
  • OT_active(3:0) - Configuration bits received through SPI. They indicate whether each of the 4 trackers is active (enabled).
Files
ipcore_library_for_fpga.txt · Last modified: 2016/08/02 15:41 (external editor)