Publications
Browse our publications below.
Title | Details | Date | Abstract | Link | Research Areas |
---|---|---|---|---|---|
AlpaGasus: Training A Better Alpaca with Fewer Data | Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin. Published: International Conference on Learning Representations (ICLR) | May 7, 2024 | Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca’s 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches >90% performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. | https://arxiv.org/abs/2307.08701 | Artificial Intelligence |
Multimodal Breathing Rate Estimation Using Facial Motion and RPPG From RGB Camera | Authors: Migyeong Gwak, Korosh Vatanparvar, Li Zhu, Nafiul Rashid, Moshin Ahmed, Jungmok Bae, Jilong Kuang, Alex Gao. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Camera-based respiratory monitoring is contactless, non-invasive, unobtrusive, and easily accessible compared to conventional wearable devices. This paper presents a novel multimodal approach to estimating breathing rate based on tracking the movement and color changes of the face through an RGB camera. A machine learning model determines the final breathing rate between two separately calculated ones from breathing motion and remote photoplethysmography (rPPG) to improve the measurement performance in a broader range of breathing frequencies. Our proposed pipeline is evaluated with 140 facial video recordings from 22 healthy subjects, including 6 controlled and 2 spontaneous breathing tasks ranging from 5 to 30 BPM. The estimation accuracy achieves 1.33 BPM mean absolute error and 86.53% pass rate within a 2 BPM error criterion. To the best of our knowledge, our approach outperforms previous works that use a face region alone with a single RGB camera. | https://ieeexplore.ieee.org/document/10446086 | Artificial Intelligence |
Weakly Supervised Learning for Camera-Based Heart Rate Variability | Authors: Jeremy Speth, Korosh Vatanparvar, Li Zhu, Jilong Kuang, Alex Gao. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Camera-based pulse measurements from remote photoplethysmography (rPPG) have rapidly improved over recent years due to innovations in video processing and deep learning. However, modern data-driven solutions require large training datasets collected under diverse conditions. Collecting such training data is made more challenging by the need for time-synchronized video and physiological signals as ground truth. This paper presents a weakly supervised learning framework, Freq2Time, to train with heart rate (HR) labels. Our framework mitigates the need for simultaneous PPG or ECG as ground truth, since the HR changes relatively slowly and describes the target rPPG signal over a time interval. We show that 3D convolutional neural network (3DCNN) models trained with the Freq2Time framework give state-of-the-art HR performance with MAE of 2.86 bpm, when tested with challenging smartphone video data from 30 subjects. Additionally, our models still learn accurate rPPG time signals, allowing for other physiological metrics such as heart rate variability. | https://ieeexplore.ieee.org/abstract/document/10446054 | Artificial Intelligence |
Heart Rate Variability Estimation with Dynamic Fine Filtering and Global-Local Context Outlier Removal | Authors: Ramesh Kumar Sah, Md Mahbubur Rahman, Viswam Nathan, Li Zhu, Jungmok Bae, Christina Rosa, Wendy Berry Mendes, Jilong Kuang, Alex Jun Gao. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Consumer hearable technologies such as earbuds are increasingly embedding physiological sensors, including photoplethysmography (PPG) and inertial measurements. They create unique opportunities to passively monitor stress and deliver digital interventions such as music. However, PPG signals recorded from ear canals are often very noisy due to head movement and fit issues. This work proposes algorithms to estimate heart rate variability (HRV) features from noisy PPG signals recorded using earbuds. We have used template matching to determine the signal quality for dynamic fine filtering around the estimated heart rate. We have also improved the inter-beat interval (IBI) outlier detection and removal algorithm using the global-local context of the input PPG signal. The mean absolute error of estimating RMSSD decreased from 70.83 milliseconds (ms) to 24.88 ms, and SDNN decreased from 46.89 ms to 16.60 ms. | https://ieeexplore.ieee.org/document/10447778 | Artificial Intelligence |
Ballistocardiogram-Based Heart Rate Variability Estimation for Stress Monitoring using Consumer Earbuds | Authors: David J. Lin, Md Mahbubur Rahman, Li Zhu, Viswam Nathan, Jungmok Bae, Christina Rosa, Wendy B Mendes, Jilong Kuang, Alex J Gao. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Stress can potentially have detrimental effects on both physical and mental well-being, but monitoring it can be challenging, especially in free-living conditions. One approach to address this challenge is to use earbud accelerometers to capture the ballistocardiogram (BCG) response. These sensors allow for noninvasive stress monitoring by estimating physiological indicators linked to stress, such as heart rate variability (HRV). However, ear-worn devices are susceptible to motion artifacts and can exhibit significant BCG signal morphology variations. These challenges necessitate accurate algorithms to estimate HRV for everyday use. Therefore, we developed a method to measure interbeat intervals (IBI) from BCG signals collected from an earbud. To enhance IBI estimation accuracy, we employed a Bayesian method that incorporates robust a priori IBI prediction weighting and sensor fusion techniques. We have also conducted a study involving 97 participants to assess the earbuds’ ability to estimate HRV metrics and classify stressful activities. Our findings demonstrate low IBI estimation error (4.16% ± 1.90%), along with lower errors in subsequent higher-order HRV metrics compared to the state-of-the-art algorithms. | https://ieeexplore.ieee.org/document/10447280 | Artificial Intelligence |
Core Body Temperature and its Role in Detecting Acute Stress: A Feasibility Study | Authors: Mehrab Bin Morshed, Md Mahbubur Rahman, Viswam Nathan, Li Zhu, Jungmok Bae, Christina Rosa, Wendy Berry Mendes, Jilong Kuang, Alex Gao. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Core body temperature (CBT) is one of the critical yet under-explored phenomena in the context of stress detection. Several CBT measurement methods exist, but they are often limited in continuous CBT monitoring. Furthermore, how continuous CBT can be used to model acute stress is little explored. We address these challenges by conducting an in-lab controlled study with 97 participants who participated in baseline and stress-inducing tasks while wearing prototype earbuds capable of collecting CBT. We found that accounting for changes from individual baselines in CBT results in acute stress detection with 94.88% accuracy and 94.4% F1-score, which is 29.31% and 26.07% higher in terms of accuracy and F1-score, respectively, compared to generalized features. | https://ieeexplore.ieee.org/abstract/document/10447599 | Artificial Intelligence |
Joint End-to-End Spoken Language Understanding and Automatic Speech Recognition Training Based on Unified Speech-to-Text Pre-Training | Authors: Eesung Kim, Yun Tang, Taeyeon Ki, Divya Neelagiri, Vijendra Raj Apsingek. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Modern spoken language understanding (SLU) approaches optimize the system in an end-to-end (E2E) manner. This approach offers two key advantages. Firstly, it helps mitigate error propagation from upstream systems. Secondly, combining various information types and optimizing them towards the same objective is straightforward. In this study, we attempt to build an SLU system by integrating information from two modalities, i.e., speech and text, and concurrently optimizing the associated tasks. We leverage a pre-trained model built with speech and text data and fine-tune it for the E2E SLU tasks. The SLU model is jointly optimized with automatic speech recognition (ASR) and SLU tasks under single-mode and dual-mode schemes. In the single-mode model, ASR and SLU results are predicted sequentially, whereas the dual-mode model predicts either ASR or SLU outputs based on the task tag. Our proposed method demonstrates its superiority through benchmarking against FSC, SLURP, and in-house datasets, exhibiting improved intent accuracy, SLU-F1, and Word Error Rate (WER). | https://ieeexplore.ieee.org/document/10447509 | Artificial Intelligence |
End-To-End Personalized Cuff-Less Blood Pressure Monitoring Using ECG and PPG Signals | Authors: Suhas BN, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Jaejin Cho, Ching-Hua Lee, Chouchang Yang, Yilin Shen, Hongxia Jin. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Cuffless blood pressure (BP) monitoring offers the potential for continuous, non-invasive healthcare but has been limited in adoption by existing models relying on handcrafted features from ECG and PPG signals. To overcome this, researchers have looked to deep learning. Along these lines, in this paper, we introduce a novel end-to-end model based on transformers. Further, we also introduce a novel contrastive loss-based objective for robust training. To study the limits of performance for our proposed ideas, we first study personalized models trained on large subject-specific datasets, and achieve an average mean absolute error of 1.08/0.68 mmHg for systolic (SBP) and diastolic BP (DBP) across all subjects while achieving a best case of 0.29/0.19 mmHg. Further, in the case where subject-specific data is scarce, we leverage transfer learning using multi-subject data, and show that our model outperforms State-of-the-Art (SOTA) methods across varying amounts of subject-specific data. | https://ieeexplore.ieee.org/abstract/document/10445970 | Artificial Intelligence |
Zero-Shot Intent Classification Using a Semantic Similarity Aware Contrastive Loss and Large Language Model | Authors: Jaejin Cho, Rakshith Sharma Srinivasa, Ching-Hua Lee, Yashas Malur Saidutta, Chouchang Yang, Yilin Shen, Hongxia Jin. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Zero-shot systems can reduce the cost of collecting data and training in a new domain since they can work directly with the test data without further training. In this paper, we build zero-shot systems for intent classification, based on a Semantic Similarity-aware Contrastive Loss (SSCL) that addresses an issue in the original contrastive loss (CL), which treats non-corresponding pairs indiscriminately. We confirm that SSCL outperforms CL through experiments. Then, we explore how including text or speech in-domain data during the SSCL training affects the out-of-domain intent classification. During the zero-shot classification, embeddings for a set of classes in the new domain are generated to calculate the similarities between each class embedding and an input utterance embedding, after which the most similar class is predicted for the utterance’s intent. Although manually-collected text sentences per class can be used to generate the class embedding, the data collection can be costly. Thus, we explore how to generate better class embeddings without human-collected text data in the target domain. The best proposed method, employing an instruction-tuned Llama2, a public large language model, shows performance comparable to the case where human-collected text data was used, implying the importance of accurate class embedding generation. | https://ieeexplore.ieee.org/document/10446276 | Artificial Intelligence |
Leveraging Self-Supervised Speech Representations for Domain Adaptation in Speech Enhancement | Authors: Ching-Hua Lee, Chouchang Yang, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Jaejin Cho, Yilin Shen, Hongxia Jin. Published: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Apr 14, 2024 | Deep learning based speech enhancement (SE) approaches could suffer from performance degradation due to mismatch between training and testing environments. A realistic situation is that an SE model trained on parallel noisy-clean utterances from one environment, the source domain, may fail to perform adequately in another environment, the target (new) domain of unseen acoustic or noise conditions. Even though we can improve the target domain performance by leveraging paired data in that domain, in reality, noisy data is more straightforward to collect. Therefore, it is worth studying unsupervised domain adaptation techniques for SE that utilize only noisy data from the target domain, together with exploiting the knowledge available from the source domain paired data, for improved SE in the new domain. In this paper, we present a novel adaptation framework for SE by leveraging self-supervised learning (SSL) based speech models. SSL models are pre-trained with a large amount of raw speech data to extract representations rich in phonetic and acoustic information. We explore the potential of leveraging SSL representations for effective SE adaptation to new domains. To our knowledge, it is the first attempt to apply SSL models for domain adaptation in SE. | https://ieeexplore.ieee.org/document/10447573 | Artificial Intelligence |
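
The sketches below illustrate, in simplified form, a few of the methods summarized in the table. First, the AlpaGasus entry describes filtering instruction-tuning data by having a strong LLM rate each instruction/response pair and keeping only highly rated examples. The following is a minimal sketch of that selection loop, assuming a hypothetical `score_with_llm` helper (in practice, a prompt to a model such as ChatGPT whose numeric rating is parsed from the reply) and an assumed cutoff of 4.5 on a 0-5 scale; it is not the authors' released code.

```python
from typing import Callable

def filter_ift_data(
    examples: list[dict],                         # each: {"instruction": ..., "response": ...}
    score_with_llm: Callable[[str, str], float],  # hypothetical LLM-based quality scorer
    threshold: float = 4.5,                       # assumed cutoff on a 0-5 rating scale
) -> list[dict]:
    """Keep only instruction/response pairs whose LLM quality score clears the threshold."""
    return [
        ex for ex in examples
        if score_with_llm(ex["instruction"], ex["response"]) >= threshold
    ]
```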
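
Several of the earbud entries report heart rate variability through RMSSD and SDNN computed from inter-beat intervals (IBI). The sketch below shows only those standard summary formulas; the papers' actual contributions (dynamic fine filtering, global-local outlier removal, Bayesian IBI estimation) are assumed to have produced the cleaned IBI series upstream.

```python
import numpy as np

def hrv_metrics(ibi_ms: np.ndarray) -> dict[str, float]:
    """Compute RMSSD (root mean square of successive IBI differences) and
    SDNN (standard deviation of the IBIs), both in milliseconds."""
    diffs = np.diff(ibi_ms)
    rmssd = float(np.sqrt(np.mean(diffs ** 2)))
    sdnn = float(np.std(ibi_ms, ddof=1))
    return {"RMSSD_ms": rmssd, "SDNN_ms": sdnn}

# Example with a short, roughly 75-bpm IBI series (values in ms).
print(hrv_metrics(np.array([812.0, 798.0, 805.0, 821.0, 790.0, 808.0])))
```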
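
Finally, the zero-shot intent classification entry predicts an utterance's intent by comparing its embedding against one embedding per candidate class and taking the most similar class. The sketch below covers only that similarity-and-argmax step; `utterance_emb` and `class_embs` are assumed to come from encoders trained with the paper's SSCL objective (or, in a new domain, from LLM-generated class descriptions), which are not reproduced here.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_intent(utterance_emb: np.ndarray,
                     class_embs: dict[str, np.ndarray]) -> str:
    """Return the intent label whose class embedding is most similar to the utterance."""
    return max(class_embs, key=lambda label: cosine(utterance_emb, class_embs[label]))
```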