Updated README

canisio
2026-03-30 09:30:16 -03:00
parent 10644b0475
commit 30b31509c1

329
README.md

@@ -1,308 +1,97 @@
# RFSoC Channelizer RX Architecture (ZCU111) # 📡 RFSoC Channelizer + PS Processing (R-ESM Prototype)
> **Project Context:** This design is part of a prototype **R-ESM (Radar Electronic Support Measures) receiver**, implemented on the ZCU111 RFSoC platform. ## Overview
> The project was initiated from the RFSoC reference template provided by MATLAB/Simulink SoC Blockset and is being incrementally analyzed and modified.
This project is based on the RFSoC SoC Blockset reference design, adapted as a prototype for a Radar Electronic Support Measures (R-ESM) receiver.
### Current Status
- Tx subsystem: simple tone generator (to be replaced by LFM pulse generator)
- Rx subsystem: fully functional channelizer pipeline (PFB-based)
- PL → PS interface: AXI4-Stream + DMA working
- PS processing: frame-based algorithm (RMS + peak detection)
--- ---
## System Overview ## System Architecture
* The **TX subsystem** is currently used for test purposes: ADC → Channelizer (PFB, 512 bins)
→ FFT_Capture (frame control)
* Generates a **single-tone signal via NCO** → FIFO Serializer (4 FIFOs → 1 stream)
* Future work: implement an **LFM pulse generator** → AXI4-Stream (uint64)
→ DMA (S2MM)
* The **RX subsystem** (focus of this document): → PS Memory
→ Processor Algorithm (frame-based)
* Acquires RF data via ADC
* Performs **channelization (PFB)**
* Buffers and serializes data
* Streams data to processor (PS) via AXI DMA
--- ---
## System Configuration ## Key Parameters
* ADC Sampling Rate: **4096 MSPS** - ADC Sampling Rate: 4096 MSPS
* Decimation: **×8 → 512 MSPS effective bandwidth** - Decimation: 8
* FPGA Fabric Clock: **128 MHz** - Effective BW: 512 MHz
* Samples per Clock: **4 complex samples** - Channels (FFT size): 512
- Samples per clock: 4
- FPGA clock: 128 MHz
- Frame size (PS): 512 samples
--- ---
## Channelizer (PFB) ## DMA (PL → PS)
* Type: Polyphase Filter Bank (PFB) - Data type: uint64
* Number of Channels: **512** - Frame size: 512
* Taps per Channel: **16** - Buffers: 16
* Output per Clock: - Memory: PS DDR
* `4 complex samples` (vectorized) Each TLAST corresponds to one DMA frame.
* `valid`, `SOF`, `EOF`
### Frame Structure
* Total bins per frame: **512**
* Samples per clock: **4**
* 512 / 4 = **128 clock cycles per frame**
### Time-Multiplexed Output
Each clock produces consecutive frequency bins:
```id="y7l3sj"
clk 0 → bins 0–3
clk 1 → bins 4–7
...
clk 127 → bins 508–511
```
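The clock-to-bin mapping can be written down directly. This is a minimal sketch, not part of the design; `bins_for_clock` is a hypothetical helper name, and the constants are taken from the configuration listed earlier.

```python
SAMPLES_PER_CLK = 4    # complex samples delivered per fabric clock
BINS_PER_FRAME = 512   # channelizer size (one frame = 512 bins)
CYCLES_PER_FRAME = BINS_PER_FRAME // SAMPLES_PER_CLK  # 128


def bins_for_clock(clk: int) -> range:
    """Return the bin indices emitted on clock `clk` of a frame (hypothetical helper)."""
    if not 0 <= clk < CYCLES_PER_FRAME:
        raise ValueError(f"clk must be in [0, {CYCLES_PER_FRAME})")
    first = clk * SAMPLES_PER_CLK
    return range(first, first + SAMPLES_PER_CLK)
```

For example, `list(bins_for_clock(1))` gives the second group of four bins in the listing.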
--- ---
## Data Representation ## Processor (PS)
* Input to channelizer: **16-bit complex** - Event-driven execution (triggered by DMA)
* Output: **25-bit complex** - No task queueing
* Per sample: **50 bits (Re + Im)** - Frames may be dropped if processing is slower than input rate
Data is later packed into **uint64** for AXI compatibility.
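The uint64 packing and the PS-side unpacking can be sketched as below. The bit layout is an assumption (Re in bits 24..0, Im in bits 49..25, upper bits zero); the README does not state it, so check it against the actual model. `pack_sample`, `unpack_sample` and `rms_and_peak` are hypothetical names, and the RMS/peak step only illustrates the frame-based processing, not the deployed algorithm.

```python
import math

COMPONENT_BITS = 25                      # bits per Re or Im component
MASK = (1 << COMPONENT_BITS) - 1


def pack_sample(re: int, im: int) -> int:
    """Pack two signed 25-bit integers into one 64-bit word (assumed layout)."""
    lo, hi = -(1 << (COMPONENT_BITS - 1)), (1 << (COMPONENT_BITS - 1)) - 1
    if not (lo <= re <= hi and lo <= im <= hi):
        raise ValueError("component does not fit in 25 signed bits")
    return ((im & MASK) << COMPONENT_BITS) | (re & MASK)


def _sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def unpack_sample(word: int) -> complex:
    """Bit extraction followed by conversion to a complex value."""
    re = _sign_extend(word & MASK, COMPONENT_BITS)
    im = _sign_extend((word >> COMPONENT_BITS) & MASK, COMPONENT_BITS)
    return complex(re, im)


def rms_and_peak(frame: list[int]) -> tuple[float, int]:
    """Return (RMS magnitude, index of the strongest bin) for one frame of words."""
    power = [abs(unpack_sample(word)) ** 2 for word in frame]
    rms = math.sqrt(sum(power) / len(power))
    peak_bin = max(range(len(power)), key=power.__getitem__)
    return rms, peak_bin
```

The sign extension is the step most easily lost when the 25-bit fields are widened on the PS.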
--- ---
## FIFO Architecture (Banked Design) ## Data Path in PS
### Structure - Stream Read → uint64[512]
- Bit extraction → real/imag
* **4 independent FIFOs** - Conversion → complex vector
* One FIFO per lane (sample index) - Processing → RMS + peak detection
```id="k6d9c7"
Lane 0 → FIFO1
Lane 1 → FIFO2
Lane 2 → FIFO3
Lane 3 → FIFO4
```
### Depth
* Each FIFO depth: **128**
* Total frame: **512 samples**
* → 512 / 4 = 128 samples per FIFO
--- ---
## Why 4 FIFOs (Critical Design Choice) ## Performance Notes
### Hardware Constraint - Bottleneck: unpacking + type conversion
- PS cannot keep up with full-rate stream
FPGA BRAM: - Frames are skipped under load
* Max **2 ports**
* Cannot support **4 simultaneous writes**
### Input Requirement
* 4 samples per clock → **4 writes per clock**
### Solution
→ **Banked memory (4 FIFOs)**
Each FIFO:
* 1 write per clock
* Fully compatible with BRAM architecture
--- ---
## Data Organization Across FIFOs ## FrFT Integration Plan
Each FIFO stores a decimated sequence of bins: - Replace Processor Algorithm with FrFT
- Keep all other components unchanged
```id="ehy2k6" - Input: complex single [512x1]
FIFO1: bins 0, 4, 8, ... - Accept dropped frames initially
FIFO2: bins 1, 5, 9, ...
FIFO3: bins 2, 6, 10, ...
FIFO4: bins 3, 7, 11, ...
```
This is a **lane-based de-interleaving** of the channelizer output.
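The bank assignment reduces to a modulo/divide pair. A small sketch, assuming FIFO numbering starts at 1 as in the listing and that each FIFO is addressed from 0; both function names are hypothetical.

```python
NUM_FIFOS = 4


def bin_to_bank(bin_index: int) -> tuple[int, int]:
    """Map a bin index to (FIFO number, address inside that FIFO)."""
    return bin_index % NUM_FIFOS + 1, bin_index // NUM_FIFOS


def bank_to_bin(fifo: int, address: int) -> int:
    """Inverse mapping: recover the bin index from its bank position."""
    return address * NUM_FIFOS + (fifo - 1)
```

With 512 bins, the highest address used in any FIFO is 127, which matches the depth of 128.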
--- ---
## Serialization (Parallel → Stream Conversion) ## Roadmap
### Input 1. Functional FrFT (PS)
2. Profiling
* 4 samples per clock (parallel) 3. NEON optimization
4. Throughput tuning
### Output 5. PL acceleration
* 1 sample per clock (AXI stream)
### Mechanism
A **FIFO Sequencer** performs round-robin reads:
```id="b42g8p"
Cycle 0 → FIFO1
Cycle 1 → FIFO2
Cycle 2 → FIFO3
Cycle 3 → FIFO4
(repeat)
```
### Result
Reconstructed stream:
```id="9f8x7x"
0, 1, 2, 3, 4, 5, ..., 511
```
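A behavioural sketch of the round trip (bank, then round-robin read) shows that the original bin order is recovered. It models data ordering only, not timing; `bank_frame` and `serialize` are hypothetical names.

```python
from collections import deque


def bank_frame(frame: list, num_fifos: int = 4) -> list[deque]:
    """Write a frame into per-lane FIFOs, one sample per lane per clock."""
    if len(frame) % num_fifos:
        raise ValueError("frame length must be a multiple of the lane count")
    fifos = [deque() for _ in range(num_fifos)]
    for start in range(0, len(frame), num_fifos):
        for lane in range(num_fifos):
            fifos[lane].append(frame[start + lane])
    return fifos


def serialize(fifos: list[deque]) -> list:
    """Read the FIFOs round-robin (FIFO1, FIFO2, FIFO3, FIFO4, repeat)."""
    out = []
    while any(fifos):              # an empty deque is falsy
        for fifo in fifos:
            out.append(fifo.popleft())
    return out
```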
--- ---
## Throughput Behavior ## Key Takeaway
* Write side: **4 samples/clk** First make it work end-to-end, then make it fast.
* Read side: **1 sample/clk**
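Over one captured frame (assuming both sides run on the same clock):
* Write: 512 samples in 128 cycles
* Read: 512 samples in 512 cycles
→ **No data loss within a frame**: the 4 × 128 FIFOs hold a full frame while it drains. The rates balance only across triggered captures, not cycle by cycle.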
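FIFO occupancy for a single frame can be checked with a small cycle-level model. This is a sketch under assumptions not stated in the README: both sides share one clock, a sample written on a cycle becomes readable on the next cycle, and exactly one frame is written. `simulate_frame` is a hypothetical name.

```python
from collections import deque


def simulate_frame(bins: int = 512, lanes: int = 4):
    """Return (output samples, cycles used, peak per-FIFO occupancy)."""
    fifos = [deque() for _ in range(lanes)]
    out, peak, rd, cycle = [], 0, 0, 0
    write_cycles = bins // lanes          # 128 cycles of writes
    while len(out) < bins:
        # Read side: one sample per clock, round-robin, only if data is present.
        if fifos[rd]:
            out.append(fifos[rd].popleft())
            rd = (rd + 1) % lanes
        # Write side: one sample per lane per clock during the capture window.
        if cycle < write_cycles:
            for lane in range(lanes):
                fifos[lane].append(cycle * lanes + lane)
        peak = max(peak, max(len(f) for f in fifos))
        cycle += 1
    return out, cycle, peak
```

In this model the peak occupancy stays below the depth of 128 because reads overlap the write window; a second frame arriving before the first has drained would not fit.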
---
## TLAST Handling (Frame Boundary)
* TLAST is embedded in FIFO data:
* LSB of FIFO word carries TLAST flag
```id="l6g0vn"
[Data (50 bits)] + [TLAST (1 bit)]
```
* Extracted after FIFO mux and sent to AXI
### Behavior
* TLAST asserted **once per frame**
* Typically associated with final bin (e.g., bin 511)
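The word format can be sketched as a 51-bit value with TLAST in bit 0. The 50-bit payload is treated as an opaque integer here; the function names are hypothetical and the model does not claim to match the HDL signal names.

```python
DATA_BITS = 50


def pack_fifo_word(data: int, tlast: bool) -> int:
    """Place the 50-bit payload above a 1-bit TLAST flag in the LSB."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("payload must fit in 50 bits")
    return (data << 1) | int(tlast)


def unpack_fifo_word(word: int) -> tuple[int, bool]:
    """Split a FIFO word back into (payload, TLAST), as done after the mux."""
    return word >> 1, bool(word & 1)


def frame_words(payloads: list[int]) -> list[int]:
    """Tag only the final sample of a frame with TLAST."""
    last = len(payloads) - 1
    return [pack_fifo_word(p, i == last) for i, p in enumerate(payloads)]
```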
---
## AXI4-Stream Interface
Output signals:
* `tdata` → uint64 packed data
* `tvalid` → data valid
* `tready` → backpressure from DMA
* `tlast` → frame boundary
### Data Path
```id="9m7iqk"
PL → AXI4-Stream → AXI DMA (S2MM) → DDR → PS
```
---
## Backpressure Handling
AXI backpressure (`tready`) propagates upstream:
* If `tready = 0`:
* FIFO reads pause
* Data accumulates in FIFOs
### Protection Mechanism
* FIFO_Sequencer only reads when:
* AXI is ready
* Data available in all FIFOs
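The read gating can be modelled as below. One interpretation is assumed: the "data available in all FIFOs" check is applied at the start of each four-read round, and later reads in the round only require the addressed FIFO to hold data. A literal per-read check would strand the last three samples of a lone frame, so the real FIFO_Sequencer logic should be confirmed in the model. `sequencer_step` and `drain` are hypothetical names.

```python
from collections import deque


def sequencer_step(fifos: list[deque], rd_index: int, tready: bool):
    """Advance one clock; return (sample or None, next read index)."""
    if rd_index == 0:
        has_data = all(fifos)             # start of round: every FIFO non-empty
    else:
        has_data = bool(fifos[rd_index])  # mid-round: only the addressed FIFO
    if not (tready and has_data):
        return None, rd_index             # stall: pointer and data are held
    return fifos[rd_index].popleft(), (rd_index + 1) % len(fifos)


def drain(fifos: list[deque], tready_pattern: list[bool]) -> list:
    """Run the sequencer against a per-cycle tready pattern."""
    out, rd = [], 0
    for tready in tready_pattern:
        sample, rd = sequencer_step(fifos, rd, tready)
        if sample is not None:
            out.append(sample)
    return out
```

Stalled cycles delay the output but neither reorder nor drop samples, as long as the FIFOs do not overflow on the write side.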
---
## Triggered Capture Mechanism
### Trigger Source
* Software writes to register
* Generates 1-cycle pulse (`TriggerCapture`)
### FFT_Capture Behavior
State machine:
```id="27qf3x"
IDLE → wait trigger
ARMED → wait SOF
CAPTURE → collect 128 cycles
DONE → assert TLAST
```
### Key Property
Capture is **frame-aligned** (starts at SOF)
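The state machine can be sketched in software as follows. Two details are assumptions: the SOF cycle counts as the first captured cycle, and TLAST is raised for one cycle in DONE before returning to IDLE. `FftCapture` here is a hypothetical behavioural model, not the HDL block.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ARMED = auto()
    CAPTURE = auto()
    DONE = auto()


class FftCapture:
    CYCLES_PER_FRAME = 128

    def __init__(self) -> None:
        self.state = State.IDLE
        self.count = 0

    def step(self, trigger: bool, sof: bool) -> tuple[bool, bool]:
        """Advance one clock; return (capturing, tlast)."""
        if self.state is State.IDLE:
            if trigger:
                self.state = State.ARMED
            return False, False
        if self.state is State.ARMED:
            if not sof:
                return False, False
            self.state = State.CAPTURE   # SOF seen: capture starts this cycle
            self.count = 0
        if self.state is State.CAPTURE:
            self.count += 1
            if self.count == self.CYCLES_PER_FRAME:
                self.state = State.DONE
            return True, False
        self.state = State.IDLE          # DONE: flag the frame end, then rearm-able
        return False, True
```

A trigger that arrives mid-frame is held in ARMED until the next SOF, which is what keeps the capture frame-aligned.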
---
## Architectural Pattern
The system implements:
```id="2q6v6x"
Parallel Stream (4 samples/clk)
Banked Memory (4 FIFOs)
Round-Robin Serialization
AXI Stream (1 sample/clk)
```
This is a standard FPGA pattern:
> **Lane-based parallelism + memory banking + time-multiplexed output**
---
## Notes for Future Work (FrFT Integration)
### Recommended Insertion Points
**Option A (Preferred):**
```id="bb7jbp"
FIFO output → FrFT → MUX → AXI
```
**Option B:**
```id="o5o0qz"
MUX → FrFT → AXI
```
### Avoid
```id="q8k3yo"
Before FIFOs (requires 4-sample parallel processing)
```
---
## Key Takeaways
* Multiple FIFOs are required due to **memory port limitations**
* Serialization is done via **deterministic round-robin scheduling**
* AXI backpressure is safely absorbed using FIFO buffering
* Frame integrity is guaranteed via **SOF-aligned capture + TLAST**
* Architecture is scalable and suitable for further DSP insertion (e.g., FrFT)
---