
Research Blog


Estimating a person’s 3D pose and body shape from a single image is a fundamental challenge in computer vision, especially when lighting is poor or the subject is partially occluded. Most traditional approaches rely on RGB images, which often fail in real-world scenarios such as nighttime environments or disaster zones.

Our recent work addresses this with Single-Pixel Imaging (SPI) in the Near-Infrared (NIR) spectrum (850–1550 nm), combined with Time-of-Flight (TOF) technology. NIR light can penetrate clothing and tolerates changing illumination, making it well suited to human detection in low-visibility conditions. Instead of relying on high-resolution sensors, our SPI system reconstructs 3D point clouds from a series of single-pixel measurements. These point clouds are then processed by two deep learning models:


  • A Vision Transformer (ViT) aligns the reconstructed human poses with a predefined SMPL-X skeleton model.

  • A self-supervised PointNet++ network estimates fine-grained attributes such as global rotation, translation, body shape, and pose.
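The alignment step can be illustrated with a classical rigid-registration baseline. The sketch below is not the paper's ViT, which learns the alignment; it uses the Kabsch algorithm to register a point cloud to a reference set of joints, and all sizes and names are illustrative (e.g. 24 joints standing in for an SMPL-X skeleton):

```python
import numpy as np

def kabsch_align(source, target):
    """Rigidly align `source` points to `target` (both N x 3) via SVD (Kabsch)."""
    src_c = source - source.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    H = src_c.T @ tgt_c                       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target.mean(axis=0) - R @ source.mean(axis=0)
    return R, t

# Toy check: recover a known rotation + translation of a random "skeleton".
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(24, 3))           # 24 SMPL-X-like joints
theta = 0.5
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
cloud = skeleton @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = kabsch_align(skeleton, cloud)
aligned = skeleton @ R.T + t
print(np.abs(aligned - cloud).max())          # close to 0
```

In the full pipeline, the learned networks additionally recover non-rigid quantities (body shape and articulated pose), which a rigid transform alone cannot express.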

Figure: overview of the proposed pipeline for constructing a 3D human model from NIR-SPI data.

Our lab experiments simulating night-time environments demonstrate the potential of this system for real-world applications, especially in rescue missions where vision-based solutions often fail. With no dependence on ambient light and an architecture tailored for low-SWaP (size, weight, and power) devices, NIR-SPI could become a core technology for search-and-rescue UAVs, surveillance, or night-time human monitoring.

Osorio Quero, C.; Durini, D.; Martinez-Carranza, J. ViT-Based Classification and Self-Supervised 3D Human Mesh Generation from NIR Single-Pixel Imaging. Appl. Sci. 2025, 15, 6138. https://doi.org/10.3390/app15116138


In disaster-stricken areas, locating victims swiftly is of utmost importance. One of the most effective ways to achieve this is by detecting radio frequency (RF) signals emitted from communication devices. These signals, originating from cellular networks, radio broadcasts, and satellite communications, provide crucial indicators of human presence. However, scanning across multiple frequencies to identify relevant signals efficiently remains a challenge.



System Overview


Our system is built around three core components:

  1. RTL-SDR Hardware: Provides a flexible and affordable means to scan RF signals over a wide range of frequencies.

  2. FPGA-Based Processing: Accelerates real-time signal processing, ensuring fast and efficient classification of detected signals.

  3. Deep Neural Networks (DNNs): Three different architectures were implemented and tested to enhance the accuracy of signal classification.
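The scanning step in component 1 reduces to energy detection over a power spectrum. Here is a minimal, self-contained sketch of that idea on a synthetic IQ block (the tone frequency, noise level, and threshold are illustrative; the real system reads IQ samples from the RTL-SDR instead of synthesizing them):

```python
import numpy as np

fs = 1.024e6                     # sample rate (Hz), a typical RTL-SDR setting
n = 4096
t = np.arange(n) / fs

# Synthetic IQ block: a carrier at +100 kHz buried in complex noise.
rng = np.random.default_rng(1)
iq = (0.5 * np.exp(2j * np.pi * 100e3 * t)
      + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n)))

# Power spectrum plus a simple energy-detection threshold.
spectrum = np.fft.fftshift(np.abs(np.fft.fft(iq)) ** 2) / n
freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1 / fs))
noise_floor = np.median(spectrum)
hits = freqs[spectrum > 100 * noise_floor]   # bins well above the floor

print(f"detected activity near {hits.mean() / 1e3:.1f} kHz")
```

Bins flagged this way are then handed to the classifier stage, which decides what kind of signal occupies them.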


This integration allows real-time detection of crucial signals in a disaster area, aiding rescue teams in pinpointing survivors and optimizing their response strategies.


Deep Learning Models and FPGA Integration


To maximize the system’s accuracy, we implemented three deep neural network architectures, trained on a diverse dataset of radio modulations. The networks were optimized for FPGA-based acceleration, leveraging DPU cores for real-time inference. The system is capable of recognizing:


  • AM-SSB-WC (Amplitude Modulation, Single Sideband, With Carrier)

  • AM-DSB-SC (Amplitude Modulation, Double Sideband, Suppressed Carrier)

  • FM (Frequency Modulation)

  • QPSK (Quadrature Phase Shift Keying)

  • GMSK (Gaussian Minimum Shift Keying)

  • 16QAM (16-Quadrature Amplitude Modulation)

  • OQPSK (Offset Quadrature Phase Shift Keying)

  • 8PSK (8-Phase Shift Keying)

  • BPSK (Binary Phase Shift Keying)

  • OOK (On-Off Keying)
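The DNN classifiers themselves are too large to reproduce here, but the underlying idea, that different modulations leave distinct statistical fingerprints in the IQ samples, can be shown with a classical baseline. This sketch (illustrative only, not the paper's networks) separates BPSK from QPSK with a squared-signal moment test:

```python
import numpy as np

rng = np.random.default_rng(2)

def bpsk(n, noise=0.1):
    sym = rng.choice([-1.0, 1.0], size=n).astype(complex)
    return sym + noise * (rng.normal(size=n) + 1j * rng.normal(size=n))

def qpsk(n, noise=0.1):
    phase = np.pi / 4 + rng.integers(0, 4, size=n) * np.pi / 2
    return np.exp(1j * phase) + noise * (rng.normal(size=n) + 1j * rng.normal(size=n))

def classify(x):
    """Squaring BPSK collapses its two phases to one point, while squared
    QPSK still straddles two phases, so |mean(x^2)| separates the two."""
    return "BPSK" if np.abs(np.mean(x ** 2)) > 0.5 else "QPSK"

print(classify(bpsk(1000)))   # BPSK
print(classify(qpsk(1000)))   # QPSK
```

Hand-crafted statistics like this degrade quickly at low SNR and across many modulation families, which is precisely where the learned DNN classifiers earn their keep.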


Through extensive testing, the best-performing model achieved a classification accuracy of up to 98%. This level of precision significantly enhances the reliability of RF-based emergency detection systems, ensuring that important distress signals are not overlooked.


Future Applications and UAV Integration


Given its high accuracy and real-time processing capabilities, our system presents a strong candidate for UAV-based emergency response operations. Unmanned Aerial Vehicles (UAVs) equipped with this technology can autonomously scan large disaster zones, detecting and locating RF signals from survivors’ communication devices. This approach could drastically reduce the time required to identify individuals in need of assistance.


Conclusion


Our research demonstrates that cost-effective, FPGA-integrated RF signal detection is a viable solution for emergency response applications. By combining RTL-SDR hardware, FPGA-based processing, and deep learning, we have developed a system that achieves high accuracy in detecting crucial radio signals. The promising results open avenues for further development, particularly in UAV-based implementations, ensuring rapid and efficient victim detection in future disaster scenarios. With ongoing advancements in AI and hardware acceleration, this technology has the potential to revolutionize search and rescue operations, making emergency responses more efficient and saving more lives.

Single-pixel imaging (SPI) is a powerful technique for capturing images under challenging conditions, such as low-light environments or spectral bands where traditional multi-pixel sensors are not readily available. This is particularly crucial in near-infrared (NIR) imaging, covering wavelengths from 850 to 1550 nm, where conventional imaging systems often struggle. In this blog post, we introduce a hybrid approach that leverages Deep Image Prior (DIP) and Generative Adversarial Networks (GANs) to enhance the resolution of SPI-based images.



The Challenge of SPI Resolution


SPI reconstructs images from a series of intensity measurements using a single photodetector. While this method offers advantages in low-light and specialized spectral ranges, it suffers from resolution limitations due to the inherent under-sampling of spatial information. Traditional deep learning-based super-resolution techniques require extensive labeled datasets, which are difficult to acquire for SPI in NIR bands. Our proposed approach mitigates this limitation by utilizing an unsupervised learning framework.
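The measurement model behind SPI is easy to state: the scene is illuminated with a sequence of structured patterns, and a single photodetector records one number per pattern. A minimal sketch with a full Hadamard basis (a common SPI pattern choice; compressive variants use fewer rows, which is where the under-sampling comes from) looks like this:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

# Ground-truth 8x8 "scene", flattened to a vector of length 64.
rng = np.random.default_rng(3)
scene = rng.random((8, 8))
x = scene.ravel()

N = x.size
H = hadamard(N)                 # each row is one illumination pattern
y = H @ x                       # one photodetector reading per pattern

# Hadamard matrices are orthogonal (H @ H.T = N * I), so inversion is a transpose.
x_hat = (H.T @ y) / N
print(np.abs(x_hat - x).max())  # ~0: exact up to floating point
```

With all N patterns the inversion is exact; the resolution problem arises because practical systems acquire far fewer measurements than pixels, leaving the reconstruction under-determined.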


Hybrid Approach: DIP Meets GAN


Deep Image Prior (DIP) is a compelling technique that reconstructs high-quality images without requiring a large training dataset. By coupling DIP with a Generative Adversarial Network (GAN), we improve SPI resolution through an unsupervised learning paradigm. This approach offers several advantages:


  • Reduced Data Dependency: Unlike supervised methods, DIP leverages image priors, reducing the need for extensive SPI datasets.

  • Enhanced Super-Resolution: The GAN component learns to refine the image quality, making it more detailed and perceptually accurate.

  • Optimized Neural Architectures: We enhance the performance by leveraging variations of UNet and GAN architectures across four different neural network configurations.
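To make the DIP side concrete, here is a dependency-free toy sketch of the idea (a two-layer network on a 1-D signal, not the UNet/GAN configurations from the paper): an untrained network with a fixed random input is fit directly to the noisy observation, with no clean training data involved.

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy observation of a smooth 1-D "image" (a stand-in for an SPI frame).
n = 64
clean = np.sin(np.linspace(0, 3 * np.pi, n))
noisy = clean + 0.3 * rng.normal(size=n)

# Deep-image-prior setup: a small untrained network maps a *fixed* random
# code z to the image; only its weights are fit, and only to the noisy data.
z = rng.normal(size=32)
W1 = 0.1 * rng.normal(size=(64, 32))
W2 = 0.1 * rng.normal(size=(n, 64))

lr = 1e-2
losses = []
for step in range(500):
    h = np.maximum(W1 @ z, 0.0)            # hidden layer (ReLU)
    out = W2 @ h                           # network output
    err = out - noisy
    losses.append(np.mean(err ** 2))
    # Manual backprop of the MSE loss.
    gW2 = np.outer(err, h) * (2.0 / n)
    gh = (W2.T @ err) * (2.0 / n)
    gW1 = np.outer(gh * (W1 @ z > 0.0), z)
    W2 -= lr * gW2
    W1 -= lr * gW1

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In practice DIP relies on the network's structural bias plus early stopping (run too long and it fits the noise as well); the GAN component in our hybrid then sharpens the DIP output perceptually.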


Implementation and Results


We conducted both numerical simulations and experimental validations to assess the performance of our hybrid model. Key findings include:


  • Improved Image Quality: Our model consistently enhances SPI image resolution, particularly in the NIR range.

  • Robustness to Noise: The DIP-GAN approach exhibits strong resilience to noisy measurements, a common challenge in SPI applications.

  • Architectural Refinements: By optimizing UNet and GAN structures, we achieve significant improvements in feature extraction and detail preservation.


Future Perspectives


Our results demonstrate that combining DIP with GANs is a promising direction for SPI super-resolution, particularly for niche applications in biomedical imaging, remote sensing, and defense technology. Future research could explore:


  • Real-time implementations for SPI-based imaging systems.

  • Adaptations to other spectral bands beyond NIR.

  • Hybrid models incorporating physics-informed neural networks (PINNs) for further refinement.


Conclusion


By integrating DIP and GANs, we propose an innovative, unsupervised approach to improving SPI resolution. This hybrid model significantly reduces the need for large SPI datasets while maintaining high-quality reconstructions, making it a valuable advancement in computational imaging for the NIR spectrum. Our experimental and numerical results validate its effectiveness, paving the way for broader applications in optical imaging and beyond.


BibTeX

@article{OsorioQuero:25,
author = {Carlos Osorio Quero and Irving Rondon and Jose Martinez-Carranza},
journal = {J. Opt. Soc. Am. A},
keywords = {Ghost imaging; Neural networks; Single pixel imaging; Spatial light modulators; Three dimensional imaging; Underwater imaging},
number = {2},
pages = {201--210},
publisher = {Optica Publishing Group},
title = {Improving NIR single-pixel imaging: using deep image prior and GANs},
volume = {42},
month = {Feb},
year = {2025},
url = {https://opg.optica.org/josaa/abstract.cfm?URI=josaa-42-2-201},
doi = {10.1364/JOSAA.541763}
}

Contact Information

National Institute of Astrophysics, Optics and Electronics (INAOE)

Annie J. Canon 47, Santa María Tonantzintla, 72840

Puebla, Mexico


©2021 by Carlos Osorio Quero. 
