Updated README
329
README.md
@@ -1,308 +1,97 @@
|
|||||||
# RFSoC Channelizer RX Architecture (ZCU111)
|
# 📡 RFSoC Channelizer + PS Processing (R-ESM Prototype)
|
||||||
|
|
||||||
> **Project Context:** This design is part of a prototype **R-ESM (Radar Electronic Support Measures) receiver**, implemented on the ZCU111 RFSoC platform.
|
## Overview
|
||||||
> The project was initiated from the RFSoC reference template provided by MATLAB/Simulink SoC Blockset and is being incrementally analyzed and modified.
|
|
||||||
|
This project is based on the RFSoC SoC Blockset reference design, adapted as a prototype for a Radar Electronic Support Measures (R-ESM) receiver.
|
||||||
|
|
||||||
|
### Current Status
|
||||||
|
|
||||||
|
- Tx subsystem: simple tone generator (to be replaced by LFM pulse generator)
|
||||||
|
- Rx subsystem: fully functional channelizer pipeline (PFB-based)
|
||||||
|
- PL → PS interface: AXI4-Stream + DMA working
|
||||||
|
- PS processing: frame-based algorithm (RMS + peak detection)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## System Overview
|
## System Architecture
|
||||||
|
|
||||||
* The **TX subsystem** is currently used for test purposes:
|
ADC → Channelizer (PFB, 512 bins)
|
||||||
|
→ FFT_Capture (frame control)
|
||||||
* Generates a **single-tone signal via NCO**
|
→ FIFO Serializer (4 FIFOs → 1 stream)
|
||||||
* Future work: implement an **LFM pulse generator**
|
→ AXI4-Stream (uint64)
|
||||||
|
→ DMA (S2MM)
|
||||||
* The **RX subsystem** (focus of this document):
|
→ PS Memory
|
||||||
|
→ Processor Algorithm (frame-based)
|
||||||
* Acquires RF data via ADC
|
|
||||||
* Performs **channelization (PFB)**
|
|
||||||
* Buffers and serializes data
|
|
||||||
* Streams data to processor (PS) via AXI DMA
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## System Configuration
|
## Key Parameters
|
||||||
|
|
||||||
* ADC Sampling Rate: **4096 MSPS**
|
- ADC Sampling Rate: 4096 MSPS
|
||||||
* Decimation: **×8 → 512 MSPS complex output rate (512 MHz effective bandwidth)**
|
- Decimation: 8
|
||||||
* FPGA Fabric Clock: **128 MHz**
|
- Effective BW: 512 MHz
|
||||||
* Samples per Clock: **4 complex samples**
|
- Channels (FFT size): 512
|
||||||
|
- Samples per clock: 4
|
||||||
|
- FPGA clock: 128 MHz
|
||||||
|
- Frame size (PS): 512 samples
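
The parameters listed here are mutually consistent, which a quick arithmetic check makes visible. The sketch below only uses the figures quoted in this README; the variable names are illustrative, and the 1 MHz bin spacing assumes a critically sampled channelizer (one 512-bin frame per 512 input samples), which the README implies but does not state outright.

```python
# Figures quoted in the README
adc_rate_msps = 4096
decimation = 8
samples_per_clock = 4
fft_size = 512

# Derived values
output_rate_msps = adc_rate_msps // decimation             # 512 MSPS after decimation
fabric_clock_mhz = output_rate_msps // samples_per_clock   # 128 MHz fabric clock
cycles_per_frame = fft_size // samples_per_clock           # 128 clocks per 512-bin frame
frame_period_us = cycles_per_frame / fabric_clock_mhz      # 1.0 microsecond per frame

# Assumption: critically sampled channelizer, so bins are evenly spaced
bin_spacing_mhz = output_rate_msps / fft_size              # 1.0 MHz per bin

print(output_rate_msps, fabric_clock_mhz, cycles_per_frame, frame_period_us, bin_spacing_mhz)
```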
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Channelizer (PFB)
|
## DMA (PL → PS)
|
||||||
|
|
||||||
* Type: Polyphase Filter Bank (PFB)
|
- Data type: uint64
|
||||||
* Number of Channels: **512**
|
- Frame size: 512
|
||||||
* Taps per Channel: **16**
|
- Buffers: 16
|
||||||
* Output per Clock:
|
- Memory: PS DDR
|
||||||
|
|
||||||
* `4 complex samples` (vectorized)
|
Each TLAST corresponds to one DMA frame.
|
||||||
* `valid`, `SOF`, `EOF`
|
|
||||||
|
|
||||||
### Frame Structure
|
|
||||||
|
|
||||||
* Total bins per frame: **512**
|
|
||||||
* Samples per clock: **4**
|
|
||||||
* → **128 clock cycles per frame**
|
|
||||||
|
|
||||||
### Time-Multiplexed Output
|
|
||||||
|
|
||||||
Each clock produces consecutive frequency bins:
|
|
||||||
|
|
||||||
```id="y7l3sj"
|
|
||||||
clk 0 → bins 0–3
|
|
||||||
clk 1 → bins 4–7
|
|
||||||
...
|
|
||||||
clk 127 → bins 508–511
|
|
||||||
```
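
The clock-to-bin mapping listed above can be written as a one-line formula: clock `k` carries bins `4k` to `4k + 3`. A minimal sketch, with `bins_for_clock` as a hypothetical helper name:

```python
SAMPLES_PER_CLOCK = 4
NUM_BINS = 512
CYCLES_PER_FRAME = NUM_BINS // SAMPLES_PER_CLOCK  # 128


def bins_for_clock(clk):
    """Return the frequency bins output on clock cycle `clk` of a frame."""
    if not 0 <= clk < CYCLES_PER_FRAME:
        raise ValueError("clk must be in 0..127")
    start = clk * SAMPLES_PER_CLOCK
    return list(range(start, start + SAMPLES_PER_CLOCK))


print(bins_for_clock(0))    # [0, 1, 2, 3]
print(bins_for_clock(127))  # [508, 509, 510, 511]
```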
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Data Representation
|
## Processor (PS)
|
||||||
|
|
||||||
* Input to channelizer: **16-bit complex**
|
- Event-driven execution (triggered by DMA)
|
||||||
* Output: **25-bit complex**
|
- No task queueing
|
||||||
* Per sample: **50 bits (Re + Im)**
|
- Frames may be dropped if processing is slower than input rate
|
||||||
|
|
||||||
Data is later packed into **uint64** for AXI compatibility.
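
The README does not specify the bit layout of the packed word, so the following is only a sketch under stated assumptions: both parts are signed two's-complement 25-bit integers, the real part sits in bits 24..0, the imaginary part in bits 49..25, and the upper 14 bits are zero. The helper names are hypothetical; the actual layout should be checked against the model.

```python
BITS = 25
MASK = (1 << BITS) - 1


def pack_sample(re, im):
    """Pack two signed 25-bit integers into one 64-bit word (assumed layout)."""
    lo, hi = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
    if not (lo <= re <= hi and lo <= im <= hi):
        raise ValueError("value does not fit in 25 signed bits")
    # Assumption: imag in bits 49..25, real in bits 24..0
    return ((im & MASK) << BITS) | (re & MASK)


def _sign_extend(value):
    """Interpret a 25-bit field as a signed two's-complement integer."""
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value


def unpack_sample(word):
    """Inverse of pack_sample: return (re, im)."""
    re = _sign_extend(word & MASK)
    im = _sign_extend((word >> BITS) & MASK)
    return re, im


word = pack_sample(-5, 1234)
print(hex(word), unpack_sample(word))
```

The same extraction step is what the PS performs when it converts the received `uint64` words back into real and imaginary parts.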
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## FIFO Architecture (Banked Design)
|
## Data Path in PS
|
||||||
|
|
||||||
### Structure
|
- Stream Read → uint64[512]
|
||||||
|
- Bit extraction → real/imag
|
||||||
* **4 independent FIFOs**
|
- Conversion → complex vector
|
||||||
* One FIFO per lane (sample index)
|
- Processing → RMS + peak detection
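
As an illustration of the last step, the sketch below computes an RMS value and the peak bin for one frame that has already been converted to complex values. The README does not define how RMS and peak are computed, so this is one plausible reading (RMS of the magnitudes, peak as the bin with the largest magnitude); `frame_metrics` is a hypothetical name.

```python
import math


def frame_metrics(frame):
    """Return (rms, peak_bin, peak_magnitude) for a sequence of complex samples."""
    mags = [abs(x) for x in frame]
    rms = math.sqrt(sum(m * m for m in mags) / len(mags))
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return rms, peak_bin, mags[peak_bin]


# Example: a single strong bin in an otherwise empty 512-sample frame
frame = [0j] * 512
frame[100] = 3 + 4j
rms, peak_bin, peak_mag = frame_metrics(frame)
print(rms, peak_bin, peak_mag)
```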
|
||||||
|
|
||||||
```id="k6d9c7"
|
|
||||||
Lane 0 → FIFO1
|
|
||||||
Lane 1 → FIFO2
|
|
||||||
Lane 2 → FIFO3
|
|
||||||
Lane 3 → FIFO4
|
|
||||||
```
|
|
||||||
|
|
||||||
### Depth
|
|
||||||
|
|
||||||
* Each FIFO depth: **128**
|
|
||||||
* Total frame: **512 samples**
|
|
||||||
* → 512 / 4 = 128 samples per FIFO
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Why 4 FIFOs (Critical Design Choice)
|
## Performance Notes
|
||||||
|
|
||||||
### Hardware Constraint
|
- Bottleneck: unpacking + type conversion
|
||||||
|
- PS cannot keep up with full-rate stream
|
||||||
FPGA BRAM:
|
- Frames are skipped under load
|
||||||
|
|
||||||
* Max **2 ports**
|
|
||||||
* Cannot support **4 simultaneous writes**
|
|
||||||
|
|
||||||
### Input Requirement
|
|
||||||
|
|
||||||
* 4 samples per clock → **4 writes per clock**
|
|
||||||
|
|
||||||
### Solution
|
|
||||||
|
|
||||||
→ **Banked memory (4 FIFOs)**
|
|
||||||
|
|
||||||
Each FIFO:
|
|
||||||
|
|
||||||
* 1 write per clock
|
|
||||||
* Fully compatible with BRAM architecture
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Data Organization Across FIFOs
|
## FrFT Integration Plan
|
||||||
|
|
||||||
Each FIFO stores a decimated sequence of bins:
|
- Replace Processor Algorithm with FrFT
|
||||||
|
- Keep all other components unchanged
|
||||||
```id="ehy2k6"
|
- Input: complex single [512x1]
|
||||||
FIFO1: bins 0, 4, 8, ...
|
- Accept dropped frames initially
|
||||||
FIFO2: bins 1, 5, 9, ...
|
|
||||||
FIFO3: bins 2, 6, 10, ...
|
|
||||||
FIFO4: bins 3, 7, 11, ...
|
|
||||||
```
|
|
||||||
|
|
||||||
This is a **lane-based de-interleaving** of the channelizer output.
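
The bin-to-FIFO assignment above amounts to a modulo and an integer division. A minimal sketch, using hypothetical helper names and 1-based FIFO numbers to match the listing:

```python
NUM_FIFOS = 4


def fifo_location(bin_index):
    """Return (fifo_number, position_in_fifo) for a bin; FIFO numbers start at 1."""
    return bin_index % NUM_FIFOS + 1, bin_index // NUM_FIFOS


def deinterleave(frame):
    """Split one frame into four per-lane sequences, as the four FIFOs store them."""
    return [list(frame[lane::NUM_FIFOS]) for lane in range(NUM_FIFOS)]


fifos = deinterleave(list(range(512)))
print(fifos[0][:3], fifos[3][:3])  # [0, 4, 8] [3, 7, 11]
print(fifo_location(511))          # (4, 127)
```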
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Serialization (Parallel → Stream Conversion)
|
## Roadmap
|
||||||
|
|
||||||
### Input
|
1. Functional FrFT (PS)
|
||||||
|
2. Profiling
|
||||||
* 4 samples per clock (parallel)
|
3. NEON optimization
|
||||||
|
4. Throughput tuning
|
||||||
### Output
|
5. PL acceleration
|
||||||
|
|
||||||
* 1 sample per clock (AXI stream)
|
|
||||||
|
|
||||||
### Mechanism
|
|
||||||
|
|
||||||
A **FIFO Sequencer** performs round-robin reads:
|
|
||||||
|
|
||||||
```id="b42g8p"
|
|
||||||
Cycle 0 → FIFO1
|
|
||||||
Cycle 1 → FIFO2
|
|
||||||
Cycle 2 → FIFO3
|
|
||||||
Cycle 3 → FIFO4
|
|
||||||
(repeat)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Result
|
|
||||||
|
|
||||||
Reconstructed stream:
|
|
||||||
|
|
||||||
```id="9f8x7x"
|
|
||||||
0, 1, 2, 3, 4, 5, ..., 511
|
|
||||||
```
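
A small software model shows why round-robin reads restore the original bin order. This is a behavioral sketch only, not the HDL: `serialize_round_robin` is a hypothetical name and each `popleft` stands for one read cycle.

```python
from collections import deque


def serialize_round_robin(fifos):
    """Read one entry from each FIFO in turn until any FIFO runs empty."""
    queues = [deque(f) for f in fifos]
    stream = []
    while all(queues):  # an empty deque is falsy
        for q in queues:
            stream.append(q.popleft())
    return stream


# FIFO contents as stored by the lane-based de-interleaving
fifos = [list(range(lane, 512, 4)) for lane in range(4)]
stream = serialize_round_robin(fifos)
print(stream[:6], stream[-1], len(stream))  # [0, 1, 2, 3, 4, 5] 511 512
```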
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Throughput Behavior
|
## Key Takeaway
|
||||||
|
|
||||||
* Write side: **4 samples/clk**
|
First make it work end-to-end, then make it fast.
|
||||||
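* Read side: **1 sample/clk**

Per write cycle:

* 4 samples written (one per FIFO) → 4 read cycles to drain them

A captured frame is therefore written in 128 cycles and read out over at least 512 cycles. Each FIFO is deep enough (128) to hold its full share of the frame, so:

→ **No data loss within a captured frame**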

---

## TLAST Handling (Frame Boundary)

* TLAST is embedded in FIFO data:

  * LSB of FIFO word carries TLAST flag

```id="l6g0vn"
[Data (50 bits)] + [TLAST (1 bit)]
```

* Extracted after FIFO mux and sent to AXI

### Behavior

* TLAST asserted **once per frame**
* Typically associated with final bin (e.g., bin 511)
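
The 51-bit FIFO word described above (50 data bits with TLAST in the LSB) can be modeled with a shift and a mask. The helper names are hypothetical, and the 50-bit payload is treated as an opaque unsigned field:

```python
DATA_BITS = 50


def make_fifo_word(data, tlast):
    """Build a 51-bit FIFO word: data in the upper 50 bits, TLAST in the LSB."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit in 50 bits")
    return (data << 1) | int(bool(tlast))


def split_fifo_word(word):
    """Recover (data, tlast) after the FIFO mux."""
    return word >> 1, bool(word & 1)


# One frame: TLAST set only on the final bin (511)
words = [make_fifo_word(b, b == 511) for b in range(512)]
tlast_flags = [split_fifo_word(w)[1] for w in words]
print(sum(tlast_flags), tlast_flags.index(True))  # 1 511
```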

---

## AXI4-Stream Interface

Output signals:

* `tdata` → uint64 packed data
* `tvalid` → data valid
* `tready` → backpressure from DMA
* `tlast` → frame boundary

### Data Path

```id="9m7iqk"
PL → AXI4-Stream → AXI DMA (S2MM) → DDR → PS
```

---

## Backpressure Handling

AXI backpressure (`tready`) propagates upstream:

* If `tready = 0`:

  * FIFO reads pause
  * Data accumulates in FIFOs

### Protection Mechanism

* FIFO_Sequencer only reads when:

  * AXI is ready
  * Data available in all FIFOs
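
The gating described above can be sketched as a cycle-by-cycle model. The README does not say when the "all FIFOs non-empty" check is evaluated; this sketch assumes it is checked at the start of each 4-read group (checking it on every beat would stall before the last three samples of a frame). The function name and that scheduling choice are assumptions, not the actual FIFO_Sequencer logic.

```python
def drain(fifo_counts, tready_pattern):
    """Simulate gated round-robin reads; return (beats_sent, remaining_counts)."""
    counts = list(fifo_counts)
    lane = 0
    beats = 0
    for tready in tready_pattern:
        # Assumption: data availability is checked once per 4-read group
        if lane == 0 and not all(c > 0 for c in counts):
            break
        if tready:                      # AXI ready: perform one read
            counts[lane] -= 1
            lane = (lane + 1) % len(counts)
            beats += 1
        # tready low: reads pause, FIFO contents are held
    return beats, counts


# Two entries per FIFO, with backpressure on cycles 1 and 4
beats, remaining = drain([2, 2, 2, 2], [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1])
print(beats, remaining)  # 8 [0, 0, 0, 0]
```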

---

## Triggered Capture Mechanism

### Trigger Source

* Software writes to register
* Generates 1-cycle pulse (`TriggerCapture`)

### FFT_Capture Behavior

State machine:

```id="27qf3x"
IDLE    → wait trigger
ARMED   → wait SOF
CAPTURE → collect 128 cycles
DONE    → assert TLAST
```

### Key Property

Capture is **frame-aligned** (starts at SOF)
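
The four states above can be exercised with a small behavioral model. This is a sketch, not the HDL: it assumes the SOF cycle counts as the first of the 128 captured cycles, that DONE lasts one cycle before returning to IDLE, and it simplifies the exact TLAST timing. The class and method names are hypothetical.

```python
from enum import Enum, auto

CYCLES_PER_FRAME = 128


class State(Enum):
    IDLE = auto()
    ARMED = auto()
    CAPTURE = auto()
    DONE = auto()


class FFTCaptureModel:
    def __init__(self):
        self.state = State.IDLE
        self.count = 0

    def step(self, trigger, sof):
        """Advance one clock; return (capture_enable, tlast)."""
        capture = False
        tlast = False
        if self.state is State.IDLE:
            if trigger:
                self.state = State.ARMED
        elif self.state is State.ARMED:
            if sof:                      # frame-aligned start
                capture = True
                self.count = 1
                self.state = State.CAPTURE
        elif self.state is State.CAPTURE:
            capture = True
            self.count += 1
            if self.count == CYCLES_PER_FRAME:
                self.state = State.DONE
        else:                            # DONE
            tlast = True
            self.state = State.IDLE
        return capture, tlast


model = FFTCaptureModel()
captured = tlasts = 0
for cyc in range(300):
    c, t = model.step(trigger=(cyc == 0), sof=(cyc % 128 == 5))
    captured += c
    tlasts += t
print(captured, tlasts, model.state)  # 128 1 State.IDLE
```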

---

## Architectural Pattern

The system implements:

```id="2q6v6x"
Parallel Stream (4 samples/clk)
        ↓
Banked Memory (4 FIFOs)
        ↓
Round-Robin Serialization
        ↓
AXI Stream (1 sample/clk)
```

This is a standard FPGA pattern:

> **Lane-based parallelism + memory banking + time-multiplexed output**

---

## Notes for Future Work (FrFT Integration)

### Recommended Insertion Points

**Option A (Preferred):**

```id="bb7jbp"
FIFO output → FrFT → MUX → AXI
```

**Option B:**

```id="o5o0qz"
MUX → FrFT → AXI
```

### Avoid

```id="q8k3yo"
Before FIFOs (requires 4-sample parallel processing)
```

---

## Key Takeaways

* Multiple FIFOs are required due to **memory port limitations**
* Serialization is done via **deterministic round-robin scheduling**
* AXI backpressure is safely absorbed using FIFO buffering
* Frame integrity is guaranteed via **SOF-aligned capture + TLAST**
* Architecture is scalable and suitable for further DSP insertion (e.g., FrFT)

---