Updated README

Commit 30b31509c1 (parent 10644b0475) by canisio, 2026-03-30 09:30:16 -03:00

README.md
# RFSoC Channelizer RX Architecture (ZCU111)
> **Project Context:** This design is part of a prototype **R-ESM (Radar Electronic Support Measures) receiver**, implemented on the ZCU111 RFSoC platform.
> The project was initiated from the RFSoC reference template provided by MATLAB/Simulink SoC Blockset and is being incrementally analyzed and modified.
# 📡 RFSoC Channelizer + PS Processing (R-ESM Prototype)
## Overview
This project is based on the RFSoC SoC Blockset reference design, adapted as a prototype for a Radar Electronic Support Measures (R-ESM) receiver.
### Current Status
- Tx subsystem: simple tone generator (to be replaced by LFM pulse generator)
- Rx subsystem: fully functional channelizer pipeline (PFB-based)
- PL → PS interface: AXI4-Stream + DMA working
- PS processing: frame-based algorithm (RMS + peak detection)
---
## System Overview
* The **TX subsystem** is currently used for test purposes:
  * Generates a **single-tone signal via NCO**
  * Future work: implement an **LFM pulse generator**
* The **RX subsystem** (focus of this document):
  * Acquires RF data via ADC
  * Performs **channelization (PFB)**
  * Buffers and serializes data
  * Streams data to processor (PS) via AXI DMA
## System Architecture
ADC → Channelizer (PFB, 512 bins)
→ FFT_Capture (frame control)
→ FIFO Serializer (4 FIFOs → 1 stream)
→ AXI4-Stream (uint64)
→ DMA (S2MM)
→ PS Memory
→ Processor Algorithm (frame-based)
---
## System Configuration
* ADC Sampling Rate: **4096 MSPS**
* Decimation: **×8 → 512 MSPS effective bandwidth**
* FPGA Fabric Clock: **128 MHz**
* Samples per Clock: **4 complex samples**
## Key Parameters
- ADC Sampling Rate: 4096 MSPS
- Decimation: 8
- Effective BW: 512 MHz
- Channels (FFT size): 512
- Samples per clock: 4
- FPGA clock: 128 MHz
- Frame size (PS): 512 samples
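These parameters are tied together by simple rate arithmetic. The worked check below is not part of the design. It derives the fabric clock, the clocks per frame, and two frame durations from the listed values. The 4 µs readout figure assumes the serialized AXI output runs at 1 sample per clock, as described for the FIFO serializer.

```python
# Worked check of the Key Parameters (integer MSPS / MHz units).
ADC_RATE_MSPS = 4096
DECIMATION = 8
SAMPLES_PER_CLOCK = 4
NUM_BINS = 512                      # channels = FFT size = PS frame size

effective_rate_msps = ADC_RATE_MSPS // DECIMATION            # 512 MSPS
fabric_clock_mhz = effective_rate_msps // SAMPLES_PER_CLOCK  # 128 MHz
clocks_per_frame = NUM_BINS // SAMPLES_PER_CLOCK             # 128 cycles
frame_period_us = clocks_per_frame / fabric_clock_mhz        # 1.0 us per channelizer frame
readout_us = NUM_BINS / fabric_clock_mhz                     # 4.0 us to stream one frame at 1 sample/clk

print(effective_rate_msps, fabric_clock_mhz, clocks_per_frame, frame_period_us, readout_us)
# 512 128 128 1.0 4.0
```

The channelizer produces a new frame every 1 µs, but streaming one frame out at 1 sample/clk takes 4 µs. The AXI side therefore cannot carry every frame of a continuous input.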
---
## Channelizer (PFB)
* Type: Polyphase Filter Bank (PFB)
* Number of Channels: **512**
* Taps per Channel: **16**
* Output per Clock:
  * `4 complex samples` (vectorized)
  * `valid`, `SOF`, `EOF`
### Frame Structure
* Total bins per frame: **512**
* Samples per clock: **4**
* → **128 clock cycles per frame**
### Time-Multiplexed Output
Each clock produces consecutive frequency bins:
```text
clk 0   → bins 0-3
clk 1   → bins 4-7
...
clk 127 → bins 508-511
```
## DMA (PL → PS)
- Data type: uint64
- Frame size: 512
- Buffers: 16
- Memory: PS DDR
Each TLAST corresponds to one DMA frame.
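The PS-side view of these frames can be modelled on the host. Each TLAST closes one frame of 512 uint64 words, and 16 buffers of 4 KiB each make up the ring in DDR. This is a simplified Python model, not the Xilinx AXI DMA driver. It assumes exactly one frame per buffer, and the helper `split_frames` is hypothetical.

```python
from typing import Iterable, List, Tuple

FRAME_SIZE = 512      # uint64 words per DMA frame
WORD_BYTES = 8        # one uint64 per AXI4-Stream beat
NUM_BUFFERS = 16

def split_frames(beats: Iterable[Tuple[int, bool]]) -> List[List[int]]:
    """Group (tdata, tlast) beats into frames; each TLAST closes one frame."""
    frames: List[List[int]] = []
    current: List[int] = []
    for tdata, tlast in beats:
        current.append(tdata)
        if tlast:
            frames.append(current)
            current = []
    return frames  # beats after the last TLAST (an incomplete frame) are not returned

buffer_bytes = FRAME_SIZE * WORD_BYTES   # bytes per buffer (one frame)
ring_bytes = NUM_BUFFERS * buffer_bytes  # total PS DDR reserved for the buffers

# Two frames of dummy words, TLAST on the last word of each frame
beats = [(i, i % FRAME_SIZE == FRAME_SIZE - 1) for i in range(2 * FRAME_SIZE)]
frames = split_frames(beats)
print(len(frames), len(frames[0]), buffer_bytes, ring_bytes)  # 2 512 4096 65536
```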
---
## Data Representation
* Input to channelizer: **16-bit complex**
* Output: **25-bit complex**
* Per sample: **50 bits (Re + Im)**

Data is later packed into **uint64** for AXI compatibility.
## Processor (PS)
- Event-driven execution (triggered by DMA)
- No task queueing
- Frames may be dropped if processing is slower than input rate
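A toy timing model shows what event-driven execution without a queue implies. It assumes that a frame arriving while the previous one is still being processed is simply discarded. It ignores the 16 DMA buffers, which in the real system can absorb short bursts. The function `simulate_drops` and the timing values are illustrative only.

```python
from typing import List, Tuple

def simulate_drops(arrivals: List[float], proc_time: float) -> Tuple[int, int]:
    """Event-driven PS model with no task queue.

    A frame is processed only if it arrives after the previous processing run has
    finished; otherwise it is dropped. Returns (processed, dropped).
    """
    busy_until = float("-inf")
    processed = dropped = 0
    for t in sorted(arrivals):
        if t >= busy_until:
            processed += 1
            busy_until = t + proc_time   # processor busy until this frame is done
        else:
            dropped += 1                 # no queue: the frame is lost
    return processed, dropped

# 10 frames arriving once per time unit
print(simulate_drops([float(k) for k in range(10)], 2.5))  # (4, 6): slower than input
print(simulate_drops([float(k) for k in range(10)], 0.5))  # (10, 0): keeps up
```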
---
## FIFO Architecture (Banked Design)
### Structure
* **4 independent FIFOs**
* One FIFO per lane (sample index)
```text
Lane 0 → FIFO1
Lane 1 → FIFO2
Lane 2 → FIFO3
Lane 3 → FIFO4
```
### Depth
* Each FIFO depth: **128**
* Total frame: **512 samples**
* → 512 / 4 = 128 samples per FIFO
## Data Path in PS
- Stream Read → uint64[512]
- Bit extraction → real/imag
- Conversion → complex vector
- Processing → RMS + peak detection
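The four PS steps can be sketched in plain Python. The bit layout is an assumption. The README states only that 25-bit real and imaginary parts are packed into a uint64, so this sketch places the real part in bits [49:25] and the imaginary part in bits [24:0], both two's complement. Adjust `unpack_word` to the actual packing. RMS is computed here as the root-mean-square magnitude over the frame, and the peak as the bin with the largest magnitude. Either may differ from the current Processor Algorithm, and all helper names are hypothetical.

```python
import math
from typing import List, Tuple

WIDTH = 25                    # bits per real/imag component
MASK = (1 << WIDTH) - 1

def sign_extend(value: int, bits: int = WIDTH) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def unpack_word(word: int) -> complex:
    """Bit extraction + conversion. ASSUMED layout: real in [49:25], imag in [24:0]."""
    re = sign_extend((word >> WIDTH) & MASK)
    im = sign_extend(word & MASK)
    return complex(re, im)

def rms_and_peak(words: List[int]) -> Tuple[float, int, float]:
    """Return (RMS magnitude, peak bin, peak magnitude) for one frame of words."""
    x = [unpack_word(w) for w in words]
    power = [abs(v) ** 2 for v in x]
    rms = math.sqrt(sum(power) / len(power))
    peak_bin = max(range(len(power)), key=power.__getitem__)
    return rms, peak_bin, abs(x[peak_bin])

def pack_word(re: int, im: int) -> int:
    """Inverse of unpack_word under the same assumed layout (test data only)."""
    return ((re & MASK) << WIDTH) | (im & MASK)

frame = [pack_word(0, 0)] * 512
frame[37] = pack_word(-3000, 4000)    # one strong bin, |x| = 5000
rms, peak_bin, peak_mag = rms_and_peak(frame)
print(peak_bin, peak_mag)             # 37 5000.0
```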
---
## Why 4 FIFOs (Critical Design Choice)
### Hardware Constraint
FPGA BRAM:
* Max **2 ports**
* Cannot support **4 simultaneous writes**
### Input Requirement
* 4 samples per clock → **4 writes per clock**
### Solution
→ **Banked memory (4 FIFOs)**

Each FIFO:
* 1 write per clock
* Fully compatible with BRAM architecture
## Performance Notes
- Bottleneck: unpacking + type conversion
- PS cannot keep up with full-rate stream
- Frames are skipped under load
---
## Data Organization Across FIFOs
Each FIFO stores a decimated sequence of bins:
```text
FIFO1: bins 0, 4, 8, ...
FIFO2: bins 1, 5, 9, ...
FIFO3: bins 2, 6, 10, ...
FIFO4: bins 3, 7, 11, ...
```
This is a **lane-based de-interleaving** of the channelizer output.
## FrFT Integration Plan
- Replace Processor Algorithm with FrFT
- Keep all other components unchanged
- Input: complex single [512x1]
- Accept dropped frames initially
---
## Serialization (Parallel → Stream Conversion)
### Input
* 4 samples per clock (parallel)
### Output
* 1 sample per clock (AXI stream)
### Mechanism
A **FIFO Sequencer** performs round-robin reads:
```text
Cycle 0 → FIFO1
Cycle 1 → FIFO2
Cycle 2 → FIFO3
Cycle 3 → FIFO4
(repeat)
```
### Result
Reconstructed stream:
```text
0, 1, 2, 3, 4, 5, ..., 511
```
## Roadmap
1. Functional FrFT (PS)
2. Profiling
3. NEON optimization
4. Throughput tuning
5. PL acceleration
---
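## Throughput Behavior
* Write side: **4 samples/clk** (while a frame is being captured)
* Read side: **1 sample/clk**

The rates are not matched. Each write cycle puts 4 samples into the FIFOs, and draining them takes 4 read cycles, so continuous input would overflow the FIFOs. Data is preserved because capture is frame-based:
* One frame: 512 samples written in 128 cycles, read out in 512 cycles
* Each FIFO (depth 128) holds its full 128-sample share of the frame

→ **No data loss per captured frame**, provided the next capture starts only after the FIFOs have drained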
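A cycle-level occupancy count makes the rate mismatch concrete. The model below assumes zero-latency FIFOs, `tready` held high, and reads that start in the first write cycle. It reports the peak per-FIFO fill and the total cycles needed to drain. The helper `occupancy_trace` is hypothetical, and real FIFO latency would shift the numbers slightly.

```python
from typing import Tuple

LANES = 4               # samples per clock = number of FIFOs
CYCLES_PER_FRAME = 128  # 512 bins / 4 lanes
FIFO_DEPTH = 128

def occupancy_trace(frames: int) -> Tuple[int, int]:
    """Peak per-FIFO fill and total cycles when `frames` back-to-back frames are
    written at 4 samples/clk and read round-robin at 1 sample/clk (tready = 1)."""
    fill = [0] * LANES
    ptr = peak = cycle = 0
    write_cycles = frames * CYCLES_PER_FRAME
    while cycle < write_cycles or any(fill):
        if cycle < write_cycles:
            for k in range(LANES):
                fill[k] += 1                  # one write per FIFO per clock
        peak = max(peak, max(fill))
        if fill[ptr] > 0:
            fill[ptr] -= 1                    # one read per clock, round-robin
            ptr = (ptr + 1) % LANES
        cycle += 1
    return peak, cycle

for n in (1, 2):
    peak, cycles = occupancy_trace(n)
    print(n, peak, cycles, peak <= FIFO_DEPTH)
# 1 97 512 True
# 2 193 1024 False
```

In this model one frame peaks at 97 entries, which fits the depth of 128. Two back-to-back frames would need 193, so they cannot be captured without waiting for the FIFOs to drain.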
---
## TLAST Handling (Frame Boundary)
* TLAST is embedded in FIFO data:
  * LSB of FIFO word carries TLAST flag
```text
[Data (50 bits)] + [TLAST (1 bit)]
```
```
* Extracted after FIFO mux and sent to AXI
### Behavior
* TLAST asserted **once per frame**
* Typically associated with final bin (e.g., bin 511)
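The word layout above can be expressed as two small bit operations: shift the 50-bit payload up by one and put TLAST in bit 0, then undo that after the mux. This is a behavioral sketch in Python, not the HDL. The function names are hypothetical, and the bin index stands in as dummy payload.

```python
from typing import Tuple

DATA_BITS = 50   # 25-bit Re + 25-bit Im per sample

def pack_fifo_word(data: int, tlast: bool) -> int:
    """Build a FIFO word: 50 data bits above a 1-bit TLAST flag in the LSB."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 50 bits")
    return (data << 1) | int(tlast)

def split_fifo_word(word: int) -> Tuple[int, bool]:
    """After the FIFO mux: recover the payload (to tdata) and the flag (to tlast)."""
    return word >> 1, bool(word & 1)

# One frame: bin index as dummy payload, TLAST only on the final bin
words = [pack_fifo_word(b, b == 511) for b in range(512)]
flags = [split_fifo_word(w)[1] for w in words]
print(sum(flags), flags.index(True))   # 1 511
```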
---
## AXI4-Stream Interface
Output signals:
* `tdata` → uint64 packed data
* `tvalid` → data valid
* `tready` → backpressure from DMA
* `tlast` → frame boundary
### Data Path
```text
PL → AXI4-Stream → AXI DMA (S2MM) → DDR → PS
```
---
## Backpressure Handling
AXI backpressure (`tready`) propagates upstream:
* If `tready = 0`:
  * FIFO reads pause
  * Data accumulates in FIFOs
### Protection Mechanism
* FIFO_Sequencer only reads when:
  * AXI is ready
  * Data available in all FIFOs
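A cycle-level model of the sequencer shows that `tready = 0` simply freezes the reads while samples wait in the FIFOs. This sketch makes one interpretive choice. If "data available in all FIFOs" were checked on every cycle, the last three samples of a frame would stall once FIFO1 empties. The sketch therefore applies that check only at the start of each four-read round. That gating and the helper name `serialize` are assumptions, not the Simulink implementation.

```python
from collections import deque
from typing import Deque, List

def serialize(fifos: List[Deque[int]], tready: List[bool]) -> List[int]:
    """Round-robin sequencer under AXI backpressure.

    tready[c] is the DMA's tready in cycle c. Returns samples in emitted order.
    """
    out: List[int] = []
    ptr = 0                               # index of the FIFO to read next
    for ready in tready:
        if not ready:
            continue                      # backpressure: no read, data stays in FIFOs
        if ptr == 0 and not all(fifos):   # start a round only if every FIFO has data
            continue
        out.append(fifos[ptr].popleft())
        ptr = (ptr + 1) % len(fifos)
    return out

fifos = [deque(range(k, 512, 4)) for k in range(4)]   # FIFOk holds bins k, k+4, ...
tready = [c % 2 == 0 for c in range(1200)]            # DMA ready only every other cycle
stream = serialize(fifos, tready)
print(stream == list(range(512)), all(len(f) == 0 for f in fifos))  # True True
```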
---
## Triggered Capture Mechanism
### Trigger Source
* Software writes to register
* Generates 1-cycle pulse (`TriggerCapture`)
### FFT_Capture Behavior
State machine:
```text
IDLE → wait trigger
ARMED → wait SOF
CAPTURE → collect 128 cycles
DONE → assert TLAST
```
### Key Property
Capture is **frame-aligned** (starts at SOF)
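The four states can be modelled as a per-clock step function. This is a behavioral Python sketch, not the Simulink/HDL implementation, and the class name `FFTCaptureModel` is hypothetical. It makes three assumptions:
* SOF is high on the cycle carrying bins 0-3.
* Capture lasts exactly 128 cycles from SOF.
* TLAST is tagged on the final capture cycle (bins 508-511), with DONE used only as a one-cycle return to IDLE.

```python
from enum import Enum, auto
from typing import Tuple

CYCLES_PER_FRAME = 128   # 512 bins / 4 samples per clock

class State(Enum):
    IDLE = auto()
    ARMED = auto()
    CAPTURE = auto()
    DONE = auto()

class FFTCaptureModel:
    """Behavioral model of the FFT_Capture state machine."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.count = 0

    def step(self, trigger: bool, sof: bool) -> Tuple[bool, bool]:
        """Advance one clock; return (write, last).

        write: this cycle's 4 samples are written to the FIFOs.
        last:  this is the 128th capture cycle (bins 508-511), where TLAST is tagged.
        """
        write = last = False
        if self.state is State.IDLE and trigger:
            self.state = State.ARMED                     # 1-cycle TriggerCapture pulse
        elif self.state is State.ARMED and sof:
            self.state, self.count = State.CAPTURE, 0    # start on a frame boundary
        if self.state is State.CAPTURE:
            write = True
            self.count += 1
            if self.count == CYCLES_PER_FRAME:
                last = True
                self.state = State.DONE
        elif self.state is State.DONE:
            self.state = State.IDLE                      # one frame per trigger
        return write, last

cap = FFTCaptureModel()
trace = [cap.step(trigger=(c == 50), sof=(c % CYCLES_PER_FRAME == 0)) for c in range(400)]
writes = [w for w, _ in trace]
lasts = [last for _, last in trace]
print(writes.index(True), sum(writes), lasts.index(True))  # 128 128 255
```

A trigger at cycle 50 arrives mid-frame, so capture waits for the next SOF at cycle 128 and ends at cycle 255. The later SOF at cycle 256 does not start a second capture.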
---
## Architectural Pattern
The system implements:
```text
Parallel Stream (4 samples/clk)
Banked Memory (4 FIFOs)
Round-Robin Serialization
AXI Stream (1 sample/clk)
```
This is a standard FPGA pattern:
> **Lane-based parallelism + memory banking + time-multiplexed output**
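The whole pattern fits in a few lines of Python. The sketch builds the per-clock channelizer vectors, banks them lane by lane into four FIFOs, then reads them round-robin. It is behavioral only: lists stand in for FIFOs, and the helpers `channelizer_frame`, `bank` and `round_robin` are hypothetical. It reproduces the FIFO contents listed under Data Organization Across FIFOs and the reconstructed bin order 0 to 511.

```python
from typing import List

LANES = 4
BINS = 512

def channelizer_frame(bins: int = BINS, lanes: int = LANES) -> List[List[int]]:
    """Per-clock output vectors: clock c carries bins 4c .. 4c+3."""
    return [[lanes * c + k for k in range(lanes)] for c in range(bins // lanes)]

def bank(vectors: List[List[int]], lanes: int = LANES) -> List[List[int]]:
    """Banked write: lane k of every vector goes to FIFO k (one write per FIFO per clock)."""
    fifos: List[List[int]] = [[] for _ in range(lanes)]
    for vec in vectors:
        for k, sample in enumerate(vec):
            fifos[k].append(sample)
    return fifos

def round_robin(fifos: List[List[int]]) -> List[int]:
    """Time-multiplexed read: FIFO1, FIFO2, FIFO3, FIFO4, repeat (one sample per clock)."""
    n = len(fifos)
    return [fifos[i % n][i // n] for i in range(len(fifos[0]) * n)]

fifos = bank(channelizer_frame())
print([f[:3] for f in fifos])  # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
print(len(fifos[0]), round_robin(fifos) == list(range(BINS)))  # 128 True
```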
---
## Notes for Future Work (FrFT Integration)
### Recommended Insertion Points
**Option A (Preferred):**
```text
FIFO output → FrFT → MUX → AXI
```
**Option B:**
```text
MUX → FrFT → AXI
```
### Avoid
```text
Before FIFOs (requires 4-sample parallel processing)
```
---
## Key Takeaways
* Multiple FIFOs are required due to **memory port limitations**
* Serialization is done via **deterministic round-robin scheduling**
* AXI backpressure is safely absorbed using FIFO buffering
* Frame integrity is guaranteed via **SOF-aligned capture + TLAST**
* Architecture is scalable and suitable for further DSP insertion (e.g., FrFT)
---
First make it work end-to-end, then make it fast.