Numerical Optimizations for Weighted Value Decomposition on Language Models

Author: Ting Hua et al. Yen-Chang Hsu, Felicity Wang, Retiree, Yilin Shen, Hongxia Jin

Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)

Dec 9, 2022

Singular value decomposition (SVD) is one of the most popular compression methods; it approximates a target matrix with smaller matrices. However, standard SVD treats all parameters within the matrix as equally important, which is a simple but unrealistic assumption. In real cases, the parameters of a trained neural network affect task performance unevenly, suggesting unequal importance among them. Therefore, this paper proposes Fisher information weighted Value Decomposition (FVD) to compress a neural network model with awareness of parameter importance. Unlike standard SVD, FVD is a non-convex optimization problem that lacks a closed-form solution, so optimizing it is non-trivial.
We systematically investigated multiple optimization strategies to tackle the problem and examined our method by compressing Transformer-based language models.
Further, we designed a metric to predict when SVD may introduce a significant performance drop, in which case FVD can serve as a rescue strategy.
Extensive evaluations demonstrate that FVD performs comparably to, or even better than, current state-of-the-art (SOTA) methods in compressing Transformer-based language models.
In addition, an analysis of Transformer blocks shows that FVD achieves significant performance improvements over SVD in sub-structure factorization.

Research Area: Artificial Intelligence
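To make the weighted objective concrete, here is a minimal sketch, not the paper's solver. It assumes one positive importance weight per matrix entry (standing in for Fisher information) and fits a rank-1 factorization by alternating weighted least squares, which is only one of several possible optimization strategies. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def weighted_error(A: Matrix, W: Matrix, u: List[float], v: List[float]) -> float:
    """Weighted squared error: sum over (i, j) of W[i][j] * (A[i][j] - u[i] * v[j]) ** 2."""
    return sum(
        W[i][j] * (A[i][j] - u[i] * v[j]) ** 2
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def weighted_rank1_als(A: Matrix, W: Matrix, iters: int = 50) -> Tuple[List[float], List[float]]:
    """Fit A ~ u v^T under entry-wise importance weights W (all positive).

    With v fixed, each u[i] has a closed-form weighted least-squares solution,
    and vice versa. The joint problem is non-convex, so alternating updates only
    guarantee a non-increasing objective, not a global optimum.
    """
    m, n = len(A), len(A[0])
    u = [0.0] * m
    v = [1.0] * n  # naive initialization (an assumption of this sketch)
    for _ in range(iters):
        for i in range(m):
            den = sum(W[i][j] * v[j] ** 2 for j in range(n))
            u[i] = sum(W[i][j] * A[i][j] * v[j] for j in range(n)) / den if den else 0.0
        for j in range(n):
            den = sum(W[i][j] * u[i] ** 2 for i in range(m))
            v[j] = sum(W[i][j] * A[i][j] * u[i] for i in range(m)) / den if den else 0.0
    return u, v


# Toy usage: an exactly rank-1 matrix with hypothetical per-entry importance weights.
A = [[3.0, 4.0], [6.0, 8.0]]
W = [[1.0, 5.0], [0.5, 2.0]]
u, v = weighted_rank1_als(A, W)
print(weighted_error(A, W, u, v))  # should be close to zero for an exactly rank-1 matrix
```

With uniform weights this reduces to an ordinary rank-1 fit; non-uniform weights make the optimum depend on which entries matter most, which is why no SVD-style closed form applies.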

Explainable Slot Type Attentions to Improve Joint Intent Detection and Slot Filling

Author: Kalpa Gunaratna et al. Vijay Srinivasan, Retiree, Hongxia Jin

Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)

Dec 7, 2022

Joint intent detection and slot filling is a key research topic in natural language understanding (NLU). Existing joint intent detection and slot filling systems compute features collectively for all slot types and, importantly, have no way to explain slot filling decisions. In this work, we propose a novel approach that (i) learns to generate slot-type-specific features to improve accuracy and (ii) provides explanations of slot filling decisions, for the first time in a joint NLU model. Further, the model is inherently explainable and does not need any post-hoc processing. We perform additional constrained supervision using a set of binary classifiers to learn slot-type-specific features, ensuring that appropriate attention weights are learned to explain slot filling decisions for utterances. We evaluate our approach on two widely used datasets and show accuracy improvements. Moreover, we provide a detailed analysis of the exclusive slot explainability of our proposed model.

Research Area: Artificial Intelligence

Foreground-Specialized Model Imitation for Instance Segmentation

Author: Wenbo Li et al. Hongxia Jin

Published: Asian Conference on Computer Vision (ACCV)

Dec 4, 2022

We leverage knowledge distillation to address object instance segmentation for robots with limited computational power. Instance segmentation is formulated as a multi-task learning problem involving object classification, localization, and mask prediction. However, knowledge distillation is not well suited to these sub-tasks except for one of them, i.e., multi-class object classification. To deal with this challenge, we introduce a novel distillation method in which the teacher is a small foreground-specialized (FS) model. We train the FS instance segmentation teacher model using images that contain only foreground objects, i.e., background pixels are removed. As a result, the FS instance segmentation model is effective in object classification, which is exactly the sub-task the distillation method is designed for. To accommodate the difference between the inputs used by the teacher and the student, we introduce a novel Foreground-Specialized model Imitation (FSI) method with two complementary components. First, a reciprocal anchor box selection method is introduced to distill from the most informative outputs of the teacher model. Second, to embed foreground awareness in the student's feature learning, we propose two solutions: either adding a co-learned foreground segmentation branch or applying a soft feature mask. We conducted an extensive evaluation with YOLACT, a state-of-the-art one-stage object instance segmentation method that is suitable for on-device inference. Experimental results on the MS COCO and Pascal VOC datasets demonstrate that our method significantly outperforms knowledge distillation baselines in terms of both accuracy improvement and training efficiency.

Research Area: Artificial Intelligence
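The two loss ingredients mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the FSI method itself: the classification term is the classic temperature-softened distillation loss, and the soft feature mask is assumed to be a per-position weight in [0, 1] applied to a squared feature-imitation error. Reciprocal anchor box selection is not shown, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a distillation temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened class distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def masked_feature_loss(student_feat: Sequence[float], teacher_feat: Sequence[float],
                        fg_mask: Sequence[float]) -> float:
    """Feature imitation weighted by a soft foreground mask (assumed in [0, 1]),
    so background positions contribute little or nothing to the loss."""
    num = sum(m * (s - t) ** 2 for s, t, m in zip(student_feat, teacher_feat, fg_mask))
    den = sum(fg_mask) or 1.0
    return num / den
```

Normalizing by the mask sum keeps the imitation term on a similar scale regardless of how much of the image is foreground; that normalization is a design choice of this sketch.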

BreatheBuddy: Tracking Real-time Breathing Exercises for Automated Bio-feedback Using Commodity Earbuds

Author: Mahbubur Rahman et al. Tousif Ahmed, Mohsin Ahmed, Retiree, Ebrahim Nematihosseinabadi, Jilong Kuang, Alex Gao

Published: Mobile HCI

Oct 1, 2022

Breathing exercises reduce stress and improve overall mental well-being. There are various types of breathing exercises. Performing the exercises correctly may give the best outcome, while doing them incorrectly can sometimes have adverse effects. Providing real-time biofeedback can greatly help users do the right exercises in the right way. In this paper, we present methods to passively track breathing biomarkers in real time using wireless commodity earbuds and to generate feedback on users' breathing performance. We use the earbuds' low-power accelerometer to generate a comprehensive set of breathing biomarkers, including breathing phase, breathing rate, depth of breathing, and breathing symmetry. We conducted studies in which subjects performed different types of guided breathing exercises while wearing the earbuds. Our algorithms detect breathing phases with ~88.88% accuracy and estimate breathing rate with ~95% accuracy. We further show that our algorithms can be used to generate biofeedback for designing engaging smartphone user interactions that help users accurately perform various breathing exercises.

Research Area: Digital Health
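As a rough illustration of turning a respiratory waveform into a breathing rate, the sketch below counts upward zero crossings of a mean-removed signal. The abstract does not describe BreatheBuddy's actual algorithm, so this is a generic stand-in; it assumes the accelerometer signal has already been filtered down to a clean breathing component, and the function name is hypothetical.

```python
import math
from typing import Sequence


def breathing_rate_bpm(signal: Sequence[float], fs: float) -> float:
    """Estimate breaths per minute by counting upward zero crossings of a
    mean-removed respiratory waveform sampled at fs Hz (one crossing per breath)."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    crossings = sum(1 for prev, cur in zip(centered, centered[1:]) if prev < 0 <= cur)
    duration_min = len(signal) / fs / 60.0
    return crossings / duration_min


# Synthetic check: one minute of a 0.25 Hz wave, i.e. 15 breaths per minute.
fs = 20.0
toy = [math.sin(2 * math.pi * 0.25 * n / fs + 0.3) for n in range(int(60 * fs))]
print(breathing_rate_bpm(toy, fs))
```

Zero crossing is cheap enough for low-power devices but sensitive to residual noise; real recordings would typically need band-pass filtering first.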

Enhancement of Remote PPG and Heart Rate Estimation with Optimal Signal Quality Index

Author: Jiyang Li et al. Korosh Vatanparvar, Li Zhu, Jilong Kuang, Alex Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences (IEEE BHI & BSN)

Sep 27, 2022

With the popularity of non-invasive vital sign detection, remote photoplethysmography (rPPG) is drawing attention in the community. Remote PPG (rPPG) signals are extracted in a contactless manner, which makes them more prone to artifacts than PPG signals collected by wearable sensors. To develop a robust and accurate system for estimating heart rate (HR) from rPPG signals, we propose a novel real-time dynamic region of interest (ROI) tracking algorithm that is robust to slight motion and lighting changes. Furthermore, we develop and include a signal quality index (SQI) to improve HR estimation accuracy. Previous studies have developed optimal SQIs for PPG signals but not for rPPG signals, so we select and test six SQIs, namely Perfusion, Kurtosis, Skewness, Zero-crossing, Entropy, and signal-to-noise ratio (SNR), on 124 rPPG sessions from 30 participants wearing masks. Based on the mean absolute error (MAE) of HR estimation, the optimal SQI is selected and validated with the Mann–Whitney U (MWU) test. Lastly, we show that HR estimation accuracy improves by 29% after removing the outliers identified by the optimal SQI, and the best result achieves an MAE of 2.308 bpm.

Research Area: Digital Health
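A few of the listed SQIs, and the idea of dropping low-quality sessions before computing MAE, can be sketched in plain Python. This is illustrative only: it covers three of the six SQIs, uses toy numbers rather than the study's data, and assumes that a higher SQI means better quality (an SQI where lower is better would be negated first). The function names are hypothetical.

```python
from typing import Sequence, Tuple


def _moments(x: Sequence[float]) -> Tuple[int, float, float]:
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return n, mu, var


def skewness(x: Sequence[float]) -> float:
    """Third standardized moment; about 0 for a symmetric signal."""
    n, mu, var = _moments(x)
    return sum((v - mu) ** 3 for v in x) / n / var ** 1.5


def kurtosis(x: Sequence[float]) -> float:
    """Fourth standardized moment (Pearson, not excess); about 3 for Gaussian data."""
    n, mu, var = _moments(x)
    return sum((v - mu) ** 4 for v in x) / n / var ** 2


def zero_crossing_rate(x: Sequence[float]) -> float:
    """Fraction of consecutive mean-removed samples whose sign changes."""
    _, mu, _ = _moments(x)
    c = [v - mu for v in x]
    changes = sum(1 for a, b in zip(c, c[1:]) if (a < 0) != (b < 0))
    return changes / (len(x) - 1)


def mae(estimates: Sequence[float], references: Sequence[float]) -> float:
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(references)


def mae_after_sqi_filter(estimates, references, sqi, threshold: float) -> float:
    """MAE over sessions whose SQI is at or above the threshold."""
    kept = [(e, r) for e, r, q in zip(estimates, references, sqi) if q >= threshold]
    if not kept:
        raise ValueError("threshold removed every session")
    est, ref = zip(*kept)
    return mae(est, ref)
```

For example, with toy HR estimates `[70, 90, 72]`, references `[71, 70, 72]`, and SQIs `[0.9, 0.1, 0.8]`, a threshold of 0.5 discards the low-quality second session and the MAE drops from about 7 bpm to 0.5 bpm.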

Respiration Rate Estimation from Remote PPG via Camera in Presence of Non-Voluntary Artifacts

Author: Korosh Vatanparvar et al. Migyeong Gwak, Li Zhu, Jilong Kuang, Alex Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences (IEEE BHI & BSN)

Sep 27, 2022

Contactless measurement of vital signs has been seen as a promising alternative to contact sensors for monitoring health conditions. In this paper, we focus on respiration rate (RR) as one of the fundamental biomarkers of a person's cardiac and pulmonary activity. Remote RR estimation has gained traction due to its many potential applications; using RGB cameras to extract a remote photoplethysmography (PPG) signal from a subject's face has been discussed as one of the enabling technologies for remote RR estimation. The technology is challenged by the wide range of RR and by non-voluntary motion in uncontrolled settings. We propose a novel methodology to enhance the quality of the respiration signal and remove artifacts from the remote PPG signal, which reduces the MAE from 4.5 bpm to 2.8 bpm for RR in the range of 5-25 bpm. We validate the accuracy of our methodology using smartphone video recordings of 30 subjects with a uniform distribution of skin tones.

Research Area: Digital Health
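One common way to read an RR off a cleaned respiration signal is to pick the strongest spectral peak inside the plausible breathing band, here the 5-25 breaths-per-minute range quoted above. The sketch below does this with a direct DFT evaluated on a fine frequency grid; it is a generic technique and assumes the paper's signal enhancement and artifact removal have already been applied. The function name and grid step are hypothetical choices.

```python
import math
from typing import Sequence


def respiration_rate_bpm(signal: Sequence[float], fs: float, lo_bpm: float = 5.0,
                         hi_bpm: float = 25.0, step_bpm: float = 0.1) -> float:
    """Return the breaths-per-minute value with the largest DFT power within
    [lo_bpm, hi_bpm]; content outside the band (e.g. the cardiac pulse) is ignored."""
    mean = sum(signal) / len(signal)
    x = [v - mean for v in signal]
    best_bpm, best_power = lo_bpm, -1.0
    n_steps = int(round((hi_bpm - lo_bpm) / step_bpm))
    for k in range(n_steps + 1):
        bpm = lo_bpm + k * step_bpm
        f = bpm / 60.0  # breaths per minute -> Hz
        re = sum(v * math.cos(2 * math.pi * f * i / fs) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * f * i / fs) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

Restricting the search band is what keeps a strong pulse component (for example, around 72 beats per minute in a PPG-derived trace) from being mistaken for breathing.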

Deep Audio Spectral Processing for Respiration Rate Estimation from Commodity Earbuds

Author: Mohsin Ahmed et al. Tousif Ahmed, Mahbubur Rahman, Retiree, Jilong Kuang, Alex Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences (IEEE BHI & BSN)

Sep 27, 2022

Breathing rate is an important health biomarker and a vital indicator of health and fitness. With smart earbuds gaining popularity as commodity devices, recent works have demonstrated the potential for monitoring breathing rate using such earable devices. In this work, we use spectrograms of breathing-cycle audio signals captured by earbuds as spectral features to train a deep convolutional neural network that infers respiration rate with high accuracy. Using novel earbud audio data collected from 30 subjects, covering both controlled breathing over a wide range of rates (from 5 up to 45 breaths per minute) and uncontrolled natural breathing from a 7-day home deployment, experimental results demonstrate that our model can estimate respiration rate with an MAE of 0.77 for controlled breathing and 0.99 for at-home natural breathing.

Research Area: Digital Health

Real-Time Breathing Phase Detection Using Earbuds Microphone

Author: Retiree et al. Tousif Ahmed, Mahbubur Rahman, Mohsin Ahmed, Ebrahim Nematihosseinabadi, Jilong Kuang, Alex Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences (IEEE BHI & BSN)

Sep 27, 2022

Tracking breathing phases (inhale and exhale) outside hospitals can offer significant health and wellness benefits to users. For example, breathing phases can provide fine-grained breathing information for proper meditation or breathing exercises. While previous works use smartphones and smartwatches to track breathing phases, in this work we use earbuds for breathing phase detection; earbuds have the potential to be a better form factor for breathing exercises because they require less attention from the user. We propose a convolutional neural network-based algorithm for detecting breathing phases using audio captured through the earbuds during guided breathing sessions. We conducted a user study with 30 participants in both lab and home environments to develop and evaluate our algorithm. Our algorithm can detect breathing phases with 85% accuracy using only a 500 ms audio signal. Our work demonstrates the potential of using earbuds to track breathing phases in real time.

Research Area: Digital Health

IMU-based Cough Detection With Lightweight Template Matching Models

Author: Ebrahim Nematihosseinabadi

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences (IEEE BHI & BSN)

Sep 10, 2022

Cough is a major symptom of respiratory-related diseases. There is a tremendous amount of work on detecting coughs from audio, but there has been no effort to identify coughs solely from an inertial measurement unit (IMU). Coughing causes motion across the whole body, especially in the neck and head. Therefore, head motion data captured during coughing by a head-worn IMU sensor could be leveraged to detect coughs using a template matching algorithm. In time-series template matching problems, K-Nearest Neighbors (KNN) combined with an elastic distance measure (especially Dynamic Time Warping (DTW)) achieves outstanding performance. However, it is often regarded as prohibitively time-consuming. The Nearest Centroid Classifier was subsequently proposed, but its accuracy is compromised because only one centroid is obtained for each class. Centroid-based classifiers perform clustering and averaging for each cluster, but require manually setting the number of clusters. We propose a novel self-tuning multi-centroid template-matching algorithm that can automatically adjust the number of clusters to balance accuracy and inference time. Through experiments on synthetic datasets and a real-world earbud-based cough dataset, we demonstrate the superiority of our proposed algorithm and present cough detection results using a single accelerometer sensor on the earbuds platform.

Research Area: Digital Health
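The template-matching setup described above can be sketched with classic DTW plus a nearest-template rule over several centroids per class. This is a minimal sketch under assumptions: the centroids are taken as given, whereas the paper's self-tuning step for choosing the number of clusters is not reproduced. The function names and toy templates are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def classify(query: Sequence[float], centroids: Dict[str, List[Sequence[float]]]) -> Optional[str]:
    """Label of the class that owns the DTW-nearest centroid.

    Keeping a few centroids per class sits between 1-NN over every training
    template (accurate but slow) and a single centroid per class (fast but coarse)."""
    best_label, best_dist = None, float("inf")
    for label, templates in centroids.items():
        for template in templates:
            d = dtw_distance(query, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

Inference cost grows linearly with the total number of centroids, which is the accuracy-versus-time knob that an automatic cluster-count selection would tune.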

Instance Contour Adjustment via Structure-driven CNN

Author: Yi Wei

Published: European Conference on Computer Vision (ECCV)

Jul 31, 2022

Instance contour adjustment is desirable in image editing: it allows the contour of an instance in a photo to be either dilated or eroded via user sketching. This imposes several requirements on a favorable method, which must generate meaningful textures while preserving clear, user-desired contours. Because they ignore these requirements, off-the-shelf image editing methods are unsuited to this task. Therefore, we propose a specialized two-stage method. The first stage extracts structural cues from the input image and completes the missing structural cues for the adjusted area. The second stage is a structure-driven CNN that generates image textures following the guidance of the completed structural cues. In the structure-driven CNN, we redesign the context sampling strategy of the convolution operation and the attention mechanism so that they can estimate and rank the relevance of contexts based on the structural cues, and sample the top-ranked contexts regardless of their distribution on the image plane. Thus, the structure-driven CNN ensures meaningful image textures with clear, user-desired contours. In addition, our method does not require any semantic label as input, which ensures good generalization capability. We evaluate our method against several baselines adapted from related tasks, and the experimental results demonstrate its effectiveness.

Research Area: Artificial Intelligence

Table2Graph: Transforming Tabular Data to Unified Weighted Graph

Author: Rui Chen et al. Li Li, Soo-Hyun Choi, Xia Hu

Published: International Joint Conference on Artificial Intelligence (IJCAI)

Jul 23, 2022

Learning useful interactions between input features is crucial for tabular data modeling. Recent efforts have started to explicitly model feature interactions with a graph, in which each feature is treated as an individual node. However, existing graph construction methods either heuristically formulate a fixed feature-interaction graph based on specific domain knowledge, or simply apply an attention function to compute pairwise feature similarities for each sample. While the fixed graph may be sub-optimal for downstream tasks, sample-wise graph construction is time-consuming during model training and inference. To tackle these issues, we propose a framework named Table2Graph that transforms feature interaction modeling into learning a unified graph. Represented as a probability adjacency matrix, the unified graph learns to model the key feature interactions shared by the diverse samples in the tabular data. To optimize the unified graph well, we employ a reinforcement learning policy to capture the key feature interactions stably. A sparsity constraint is also proposed to keep the learned graph from becoming overly sparse or overly smooth. Experimental results in a variety of real-world applications demonstrate the effectiveness and efficiency of Table2Graph in terms of both prediction accuracy and feature interaction detection.

Research Area: Mobile Platform & Solutions

A New Concept of Knowledge based Question Answering (KBQA) System using Multiple Reasoning Paths

Author: Yu Wang

Published: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)

Jul 21, 2022

Knowledge-based question answering (KBQA) is a complex task for natural language understanding. Many KBQA approaches have been proposed in recent years, and most of them are trained on labeled reasoning paths. This hinders system performance, as many correct reasoning paths are not labeled as ground truth and thus cannot be learned. In this paper, we introduce a new concept of KBQA system that can leverage information from multiple reasoning paths and requires only labeled answers as supervision. We name it the Multiple Reasoning Paths KBQA System (MRPQA). We conduct experiments on several benchmark datasets containing both single-hop simple questions and multi-hop complex questions, including WebQuestionSP (WQSP), ComplexWebQuestion-1.1 (CWQ), and PathQuestion-Large (PQL), and demonstrate strong performance.

Research Area: Artificial Intelligence
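Why answer-only supervision can use many reasoning paths is easiest to see in a small sketch: if each candidate path is scored and the probabilities of all paths ending at the same entity are summed, the loss only needs the gold answer, and every path that reaches it gets credit. This marginalization is an assumption made for illustration, not a description of MRPQA's architecture; the names and toy scores are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def answer_distribution(path_scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """Softmax over candidate reasoning paths, then sum path probabilities per answer.

    path_scores holds (answer_entity, score) for each path; several paths may
    reach the same answer."""
    peak = max(score for _, score in path_scores)
    weights = [(answer, math.exp(score - peak)) for answer, score in path_scores]
    total = sum(w for _, w in weights)
    probs: Dict[str, float] = defaultdict(float)
    for answer, w in weights:
        probs[answer] += w / total
    return dict(probs)


def answer_only_loss(path_scores: List[Tuple[str, float]], gold_answer: str) -> float:
    """Negative log-likelihood of the gold answer; no labeled path is needed.
    Raises KeyError if no candidate path reaches the gold answer."""
    return -math.log(answer_distribution(path_scores)[gold_answer])
```

With four equally scored paths, two of which end at the same entity, that entity receives probability 0.5 even though neither path was labeled as the correct one.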

Joint phase-time arrays: a paradigm for frequency-dependent analog beamforming in 6G

Author: Vishnu Vardhan Ratnam et al. Jianhua Mo, Boon Loong Ng, Ahmad AlAmmouri, Charlie Zhang

Published: IEEE Access

Jul 12, 2022

Hybrid beamforming is an attractive solution for building cost-effective and energy-efficient transceivers for millimeter-wave and terahertz systems. However, conventional hybrid beamforming techniques rely on analog components that generate a frequency-flat response, such as phase shifters and switches, which limits the flexibility of the achievable beam patterns. As a novel alternative, this paper proposes a new class of hybrid beamforming called joint phase-time arrays (JPTA), which additionally use true-time delay elements in the analog beamforming to create frequency-dependent analog beams. Using two important frequency-dependent beam behaviors as examples, the numerous benefits of such flexibility are illustrated. Subsequently, the JPTA beamformer design problem for generating any desired beam behavior is formulated, and near-optimal algorithms for the problem are proposed. Simulations show that the proposed algorithms can outperform heuristic solutions for the JPTA beamformer update. Furthermore, it is shown that JPTA can achieve the two exemplified beam behaviors with a single radio-frequency (RF) chain, while conventional hybrid beamforming requires the number of RF chains to scale with the number of antennas to achieve similar performance. Finally, a wide range of problems for further tapping into the potential of JPTA are listed as future directions.

Research Area: Next Generation Communications
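The effect of adding true-time delays to phase shifters can be checked numerically. In the sketch below, each element of a uniform linear array gets a delay `n * tau_step`, whose phase grows with frequency, plus a frequency-flat phase `n * phi_step`. Under these assumptions the main lobe sits at u(f) = 2·fc·tau_step·(1 - fc/f), where u is the sine of the angle, so the beam direction sweeps with frequency. The 16-element, 100 GHz design values are hypothetical and chosen purely for illustration; they are not the paper's examples or design algorithm.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def array_gain(u: float, f: float, n_elems: int, d: float, tau_step: float, phi_step: float) -> float:
    """Magnitude of a uniform linear array's response toward u = sin(angle) at frequency f.

    Element n applies a true-time delay n * tau_step (phase grows with f)
    and a frequency-flat phase shift n * phi_step (same at every f)."""
    total = 0j
    for n in range(n_elems):
        weight = cmath.exp(-1j * (2 * math.pi * f * n * tau_step + n * phi_step))
        steering = cmath.exp(1j * 2 * math.pi * f * n * d * u / C)
        total += weight * steering
    return abs(total)


def peak_direction(f: float, n_elems: int, d: float, tau_step: float, phi_step: float,
                   grid: int = 2001) -> float:
    """Grid search for the u = sin(angle) that maximizes the array gain at frequency f."""
    us = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    return max(us, key=lambda u: array_gain(u, f, n_elems, d, tau_step, phi_step))


# Hypothetical design: half-wavelength spacing at fc; phi_step = -2*pi*fc*tau_step
# points the beam broadside at fc, and the delay then steers it with frequency.
fc = 100e9
d = C / (2 * fc)
tau_step = 52.5e-12
phi_step = -2 * math.pi * fc * tau_step
for f in (95e9, 100e9, 105e9):
    print(f / 1e9, round(peak_direction(f, 16, d, tau_step, phi_step), 3))
```

A single RF chain drives all 16 elements here, yet different subcarriers are served by beams pointing in different directions. Producing that kind of behavior with frequency-flat phase shifters alone would require more RF chains.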

Detecting Physiological Stress Using Earbuds

Author: Mahbubur Rahman et al. Viswam Nathan, Tousif Ahmed, Retiree, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Jul 11, 2022

Continuous stress exposure negatively impacts mental and physical well-being. Stress arousal affects heart rate, breathing patterns, and peripheral temperature, among several other bodily responses. Traditionally, stress detection is performed by collecting bio-signals such as electrocardiogram (ECG), breathing, and galvanic skin response using uncomfortable chest bands or chest patches. In this study, we use earbuds that passively and simultaneously measure photoplethysmography (PPG), core body temperature, and inertial signals. We conducted a lab study in which 18 test subjects were exposed to the Trier Social Stress Test (TSST) and went through several relaxing activities, including listening to functional music and progressive muscle relaxation, while their physiological signals were measured using earbuds. Moreover, we simultaneously collected PPG, ECG, impedance cardiogram (ICG), and blood pressure using gold-standard reference devices. We show that earbuds can reliably capture heart rate and heart rate variability. We further show that earbud signals can be used to classify physiological stress arousal with 91.30% recall and 80.52% precision using a random forest classifier with leave-one-subject-out cross-validation.

Research Area: Digital Health
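Leave-one-subject-out cross-validation and the reported precision and recall can be sketched without any ML library. The classifier itself (a random forest in the study) is deliberately left out: any model would be fit on each fold's training indices and scored on the held-out subject. The function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(subject_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    No subject's windows ever appear in both the training and test sets of a fold,
    so the score reflects generalization to unseen people."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Splitting by subject rather than by window matters because windows from the same person are highly correlated, and a random split would leak that person's physiology into training.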

Unsupervised Remote Photoplethysmograph and Heart Rate Estimation by Dynamic Region of Interest Tracking

Author: Retiree et al. Korosh Vatanparvar, Li Zhu, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Jul 11, 2022

Remote photoplethysmography (PPG) estimates vital signs by measuring changes in the light reflected from human skin. Compared with traditional PPG techniques, remote PPG enables contactless measurement at reduced cost. In this paper, we propose a novel unsupervised method to extract remote PPG signals and heart rate (HR) from videos. We propose an algorithm that dynamically tracks regions of interest (ROIs) and combines the signals from all ROIs based on their signal quality. To maintain a stable frame rate and accuracy, we propose a dynamic down-sampling approach, which makes our system robust to different video resolutions and user-camera distances. We also propose a waiting-time adaptation strategy for HR measurements, which achieves comparable HR estimation accuracy while reducing the average waiting time. To test the accuracy of the proposed system, we collected data from 30 subjects wearing face masks. Experimental results show that the proposed system achieves a mean absolute error of 3.0 bpm in HR estimation.

Research Area: Digital Health

Deep Multivariate Domain Translation for Device Invariant Pulmonary Patient Identification from Cough and Speech Sounds

Author: Mohsin Ahmed et al. Korosh Vatanparvar, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Jul 11, 2022

audio based machine learning models to infer pulmonary health, exacerbation and activity. A major challenge to the widespread use and deployment of such pulmonary health monitoring audio models is maintaining accuracy and robustness across a variety of commodity devices, due to device heterogeneity. Because of this phenomenon, pulmonary audio models developed with data from one type of device perform poorly when deployed on another type of device. In this work, we propose a framework that incorporates a multivariate deep neural network regressor as a feature translator from the source device domain to the target device domain. Our extensive empirical experiments with data from 131 real pulmonary patients and healthy controls show that our framework can recover up to 66.67% of the accuracy lost due to device heterogeneity for two different pulmonary-activity-based person identification tasks with two common mobile and wearable devices: smartphone and smartwatch.

Research Area: Digital Health

Motion-based Respiratory Rate Estimation with Motion Artifact Removal Technique in a Facial Video with an RGB Camera

Author: Migyeong Gwak et al. Korosh Vatanparvar, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Jul 11, 2022

Respiratory rate (RR) is a significant indicator of health conditions. Remote contactless measurement of RR is gaining popularity with recent awareness of respiratory tract infections. Among various methods of contactless RR measurement, a frontal face video from an RGB camera can be used to obtain an instantaneous RR. In this paper, we introduce an RR estimation method based on the subtle motion of the head or upper chest captured by an RGB camera. Motion-based respiratory monitoring allows us to acquire RR from individuals even with partial face coverings, such as glasses or a face mask. However, motion-based RR estimation is vulnerable to the subject's voluntary movement. In this work, adaptive selection between the face and chest regions, plus a motion artifact removal technique, enables us to obtain a clean respiratory signal from facial video recordings. The average mean absolute error (MAE) for both controlled and natural breathing is 1.95 BPM using head motion only and 1.28 BPM using chest motion only. Our results demonstrate the possibility of continuous, real-time monitoring of breathing rate with any personal device equipped with a camera, such as a laptop or smartphone.

Research Area: Digital Health

Utilizing Deep Learning on Limited Mobile Speech Recordings for Detection of Obstructive Pulmonary Disease

Author: Viswam Nathan et al. Korosh Vatanparvar, Jilong Kuang

Published: Engineering in Medicine and Biology Conference (EMBC)

Jul 11, 2022

Passive assessment of obstructive pulmonary disease has gained substantial interest over the past few years in the mobile and wearable computing communities. One promising approach is speech-based pulmonary assessment, in which spontaneous or scripted speech is used to evaluate an individual's pulmonary condition. Recent work on speech-based pulmonary assessment has shown promising results in pulmonary disease detection. However, this approach relies heavily on the accuracy of speech activity detection and a handful of specific features. Recently, deep learning has shown promising results in activity recognition involving time-series data. In this paper, we present a deep learning approach for detecting obstructive pulmonary disease.

Research Area: Digital Health

Lite-MDETR: A Lightweight Multi-Modal Detector

Author: Qian Lou et al. Yen-Chang Hsu, Burak Uzkent, Ting Hua, Yilin Shen, Hongxia Jin

Published: Computer Vision and Pattern Recognition (CVPR)

Jun 21, 2022

Recent multi-modal detectors based on transformers and modality encoders have achieved impressive results on end-to-end visual object detection conditioned on a raw text query. However, they require a large model size and an enormous amount of computation to achieve high performance, which makes them difficult to deploy in mobile applications constrained by tight hardware resources. In this paper, we present a lightweight modulated detector, Lite-MDETR, to facilitate efficient end-to-end multi-modal understanding on mobile devices. The key primitive is Dictionary-Lookup-Transformations (DLT), proposed to replace the Linear Transformation (LT) in multi-modal detectors, where each weight in the LT is approximately factorized into a smaller dictionary, an index, and a coefficient. In this way, the enormous linear projection with weights is converted into an efficient linear projection with dictionaries, plus a few lookups and scalings with indices and coefficients. DLT can be applied to any pretrained multi-modal detector, removing the need for expensive training from scratch. To tackle the challenging training of DLT caused by the non-differentiable index, we convert the index and coefficient into a sparse matrix, train this sparse matrix during the fine-tuning phase, and recover the index and coefficient from it during the inference phase. Our experiments on phrase grounding, referring expression comprehension and segmentation, and VQA show that Lite-MDETR achieves accuracy similar to prior multi-modal detectors with up to ~4.1× model size reduction.

Research Area: Artificial Intelligence
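The efficiency argument behind DLT can be illustrated with a small sketch. It assumes that each output column of the weight matrix is approximated as a short weighted sum of dictionary atoms, W[:, j] ≈ Σ_t coeffs[j][t] · D[:, indices[j][t]]. With this representation the layer needs only m dot products with the dictionary, followed by lookups and scalings. The exact factorization in Lite-MDETR may differ, and the names are hypothetical. Collecting the (index, coefficient) pairs into a sparse m × out matrix S gives W ≈ D·S, the sparse form the abstract describes for fine-tuning.

```python
from typing import List, Sequence

Vector = List[float]


def dlt_forward(x: Sequence[float], dictionary: Sequence[Sequence[float]],
                indices: Sequence[Sequence[int]], coeffs: Sequence[Sequence[float]]) -> Vector:
    """Linear projection through a small dictionary plus lookups and scalings.

    dictionary holds m atoms (each of length in_dim), with m much smaller than the
    number of outputs. Output j is sum_t coeffs[j][t] * (x . atom[indices[j][t]])."""
    z = [sum(xi * ai for xi, ai in zip(x, atom)) for atom in dictionary]  # m dot products, computed once
    return [sum(c * z[k] for k, c in zip(idx, cs)) for idx, cs in zip(indices, coeffs)]


def dense_weight_columns(dictionary: Sequence[Sequence[float]], indices: Sequence[Sequence[int]],
                         coeffs: Sequence[Sequence[float]]) -> List[Vector]:
    """Materialize the equivalent dense weight columns W[:, j] for comparison."""
    in_dim = len(dictionary[0])
    columns = []
    for idx, cs in zip(indices, coeffs):
        col = [0.0] * in_dim
        for k, c in zip(idx, cs):
            for r in range(in_dim):
                col[r] += c * dictionary[k][r]
        columns.append(col)
    return columns


def dense_forward(x: Sequence[float], columns: Sequence[Sequence[float]]) -> Vector:
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in columns]
```

Both paths produce the same outputs. The dictionary path stores m atoms plus a few integers and scalars per output instead of a full dense column per output, which is where the model-size reduction would come from.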

Reducing FDD MMU form factor with active cancellation

Author: Khurram Muhammad et al. Jin Yuan, Zhang Shaomin, Chance Tarver, Xinguang Xu, Yu Liu, Jie Li, Junghwan Moon, Matthew Tonnemacher, Gary Xu, Charlie Zhang

Published: IEEE/MTT-S International Microwave Symposium (IMS)

Jun 19, 2022

In this paper, a multi-channel self-interference cancellation (SIC) technique is proposed to reduce the size of an AWS/PCS dual-band 5G FDD massive MIMO base station. Combining two bands with a small frequency offset, such as the PCS and AWS bands, in one base station requires modifying the frequency duplex cavity filters to provide a wide passband with an extremely narrow gap between the passband and the stop band. Such a duplexer is hard to realize. We propose a novel multi-channel SIC that allows dual-band operation with the same form factor as a single-band base station. To verify this idea, a proof-of-concept (PoC) prototype is developed to demonstrate the feasibility of multi-channel SIC for this problem.

Research Area: Next Generation Communications
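The basic principle of active cancellation is that the known transmit signal leaking into the receiver can be estimated and subtracted. A single-channel, real-valued least-mean-squares (LMS) canceller illustrates this idea. This is a textbook technique swapped in for illustration. It is not the paper's multi-channel SIC design, which the abstract does not detail. The leakage channel, step size, and tap count are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def lms_cancel(tx: Sequence[float], rx: Sequence[float], taps: int = 4, mu: float = 0.01) -> List[float]:
    """Adaptive FIR canceller: subtract an LMS estimate of the TX->RX leakage from rx.

    tx is the known transmit reference; rx holds leakage plus the wanted signal.
    Returns the residual, which should converge toward the wanted signal."""
    w = [0.0] * taps
    residual = []
    for n in range(len(rx)):
        # Most recent `taps` transmit samples, newest first (zeros before the start).
        x = [tx[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = rx[n] - estimate
        # Standard LMS update: step along the instantaneous error gradient.
        w = [wk + mu * e * xk for wk, xk in zip(w, x)]
        residual.append(e)
    return residual


def simulate(n: int = 5000, h: Tuple[float, ...] = (0.8, -0.3, 0.1), wanted_std: float = 0.01,
             seed: int = 0) -> Tuple[List[float], List[float]]:
    """Toy scenario: white-noise TX, a short hypothetical leakage channel h,
    and a weak Gaussian wanted signal."""
    rng = random.Random(seed)
    tx = [rng.gauss(0.0, 1.0) for _ in range(n)]
    rx = []
    for i in range(n):
        leak = sum(hk * tx[i - k] for k, hk in enumerate(h) if i - k >= 0)
        rx.append(leak + rng.gauss(0.0, wanted_std))
    return tx, rx


tx, rx = simulate()
res = lms_cancel(tx, rx)
before = sum(v * v for v in rx[-1000:]) / 1000
after = sum(v * v for v in res[-1000:]) / 1000
print(f"residual power reduced by a factor of about {before / after:.0f}")
```

The more interference cancellation provides, the less isolation the duplexer filters must supply, which is the lever the paper uses to keep a single-band form factor.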

UbiLung: Multi-modal Passive Sensing for Lung Health Assessment

Author: Ebrahim Nematihosseinabadi et al. Viswam Nathan, Korosh Vatanparvar, Tousif Ahmed, Mahbubur Rahman, Jilong Kuang, Alex Gao

Published: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

May 23, 2022

Spirometry has been the gold standard for measuring a pulmonary patient's lung function for decades. Spirometry is generally performed in a hospital setting, where patients must forcefully blow air into the spirometer's tube under the guidance of clinicians. Such a procedure is time-consuming, cumbersome, and extremely effort-dependent. Recent advances in ubiquitous computing investigate the feasibility of leveraging commodity devices such as smartphones to replace the standard clinical spirometry test. However, existing solutions are still demanding, usually requiring users to complete a series of tasks such as blowing towards a microphone, and could introduce risks such as dizziness and shortness of breath due to the forced blowing. More importantly, the test still depends on the user's effort, which naturally degrades without supervision. We propose UbiLung, a new method that leverages passively sensed modalities for lung function estimation. The method relies on the physiological correlation of the introduced passive modalities with lung function, which obviates the need for active user engagement while still providing an accurate, effort-independent measurement. We focus on sensor modalities that are feasible for passive sensing: cough and speech sounds collected from microphones, and blood volume pulse (BVP) signals collected via photoplethysmography (PPG) sensors. Through feature extraction and selection, our best machine learning models achieve a mean absolute error of 11.1% for estimating FVC percent predicted, 11.8% for FEV1 percent predicted, and 7.4% for FEV1/FVC prediction. This significantly outperforms the baseline, with an average relative improvement of 13.9%. The generalizability of the model was further verified by an average improvement of 7.8% over baselines when applying the model directly to a completely separate, independent dataset.
Moreover, we investigated important confounding factors (e.g., age, gender, and smoking behavior), and accounting for them improves the results by 4.5% on average. In addition to parameter estimation, we also trained models for a series of pulmonary disease diagnosis tasks. Our method achieves an F1-score of 0.982 for healthy vs. diseased, 0.881 for obstructive vs. non-obstructive, 0.854 for COPD vs. asthma, and 0.892 for non-severe vs. severe classification. Our technique is the first multi-modal, effort-independent passive estimation of lung function, which could shed light on the passive monitoring of both pulmonary patients and the general population.

Research Area: Digital Health


Author: Ebrahim Nematihosseinabadi et al. Tousif Ahmed, Minh Dinh, Jilong Kuang, Alex Gao

Published: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

May 23, 2022

Persistent coughs are a major symptom of respiratory-related diseases. Increasing research attention has been paid to detecting coughs using wearables. Among all types of sensors used, the microphone is the most widely used for cough detection. However, the high power consumption needed to process audio signals prevents acoustic sensors from being continuously powered on in battery-limited commercial wearable products, such as earbuds. In this work, we present CoughTrigger, which uses a lower-power sensor, an inertial measurement unit (IMU), in earbuds as a cough detection activator that triggers a higher-power sensor for audio processing and classification. It can run all the time as a standby service with minimal battery consumption and triggers audio-based cough detection when a candidate cough is detected from the IMU. In addition, using the IMU improves the specificity of cough detection. Experiments conducted on 45 subjects achieved 90% sensitivity and 60% specificity for cough detection activation.

Research Area: Artificial Intelligence
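The trigger architecture described for CoughTrigger can be sketched as a two-stage cascade. A cheap IMU score decides whether the costly audio classifier runs at all, and the number of audio invocations serves as a rough proxy for energy use. The scoring function, threshold, and names are hypothetical. One property follows directly from the structure: a cough the IMU stage misses never reaches the audio stage, so the activator's sensitivity bounds the sensitivity of the whole system.

```python
from typing import Any, Callable, List, Sequence, Tuple


def cascade_detect(
    windows: Sequence[Tuple[Any, Any]],
    imu_score: Callable[[Any], float],
    audio_classifier: Callable[[Any], bool],
    imu_threshold: float,
) -> Tuple[List[bool], int]:
    """Two-stage detection: a cheap IMU score gates the expensive audio model.

    windows holds (imu_features, audio_features) pairs. Returns per-window
    decisions and how many times the audio model ran."""
    decisions: List[bool] = []
    audio_calls = 0
    for imu_feat, audio_feat in windows:
        if imu_score(imu_feat) >= imu_threshold:
            audio_calls += 1  # audio path powered up only for IMU candidates
            decisions.append(bool(audio_classifier(audio_feat)))
        else:
            decisions.append(False)  # never reaches the audio stage
    return decisions, audio_calls


def sensitivity_specificity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

This is why the activator is tuned to favor sensitivity: a modest specificity only costs some extra audio invocations, whereas a missed trigger cannot be recovered downstream.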

Beam Management with Orientation and RSRP using Deep Learning for Beyond 5G Systems

Author: Khuong Nhat Nguyen et al. Anum Ali, Jianhua Mo, Boon Loong Ng, Vutha Va, Charlie Zhang

Published: IEEE International Conference on Communications (ICC)

May 16, 2022
Beam management (BM), i.e., the process of finding and maintaining a suitable transmit and receive beam pair, can be challenging, particularly in highly dynamic scenarios. Side information from on-board sensors, e.g., orientation, can help with BM at the user equipment (UE). In this work, we use orientation information from an inertial measurement unit (IMU) for effective BM. We use a data-driven strategy that fuses reference signal received power (RSRP) information with orientation information using an artificial neural network (ANN). Simulation results show that the proposed strategy performs better than both conventional BM and an orientation-assisted BM strategy from another study that uses a particle filter. Specifically, when the UE rotates quickly, the proposed data-driven strategy improves beam-prediction accuracy by up to 34% and reduces the mean RSRP loss caused by sub-optimal beam selection by up to 4.2 dB. Next Generation Communications
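The RSRP loss and beam-prediction accuracy quoted above can be computed as follows, assuming per-beam-pair RSRP values in dBm are available for each time step. This sketch only illustrates the metrics and an exhaustive-search reference. It does not reproduce the ANN fusion model.

```python
import numpy as np

def best_beam(rsrp_dbm):
    """Exhaustive-search reference: index of the beam pair with the highest RSRP."""
    return int(np.argmax(rsrp_dbm))

def rsrp_loss_db(rsrp_dbm, chosen):
    """RSRP loss (dB) of the chosen beam pair relative to the best available one."""
    rsrp_dbm = np.asarray(rsrp_dbm, dtype=float)
    return float(rsrp_dbm.max() - rsrp_dbm[chosen])

def beam_prediction_accuracy(rsrp_matrix, predicted):
    """Fraction of time steps where the predicted beam equals the RSRP-optimal beam.

    rsrp_matrix: (time_steps, num_beams) in dBm; predicted: (time_steps,) beam indices.
    """
    optimal = np.argmax(np.asarray(rsrp_matrix), axis=1)
    return float(np.mean(optimal == np.asarray(predicted)))
```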

End-to-end 6G Terahertz Wireless Platform with Adaptive Transmit and Receive Beamforming

Author: Shadi Abu-Surra et al. Won Suk Choi, SungTae Choi, Eunyoung Seok, Dongjoo Kim, Navneet Sharma, Siddharth Advani, Vitali Loseu, Kitaek Bae, Ilju Na, Gary Xu, Charlie Zhang

Published: IEEE International Conference on Communications (ICC)

May 16, 2022

6G is envisioned to provide the ultimate experience for all through hyper-connectivity involving humans and everything, with unprecedented requirements and expectations [1]. In this vision, terahertz (THz) technology is a leading candidate for meeting the 6G requirements. This paper presents the latest developments and results of a terahertz wireless prototyping platform being developed at a Samsung research lab. The platform currently supports real-time transmission of 6 Gbps of data over a 2 GHz channel centered at 135 GHz, with adaptive beamforming at the transmitter and receiver. The modem is designed to handle data rates of up to 36 Gbps, supports two MIMO streams, and aggregates two 2 GHz channels. This paper also presents the specifications of the current RF units and discusses the challenges faced during their design and fabrication. Next Generation Communications

PolarDenseNet: A Deep Learning Model for CSI Feedback in MIMO Systems

Author: Pranav Madadi et al. Jeongho Jeon, Joonyoung Cho, Caleb Lo, Juho Lee, Charlie Zhang

Published: IEEE International Conference on Communications (ICC)

May 16, 2022

In multiple-input multiple-output (MIMO) systems, high-resolution channel state information (CSI) is required at the base station (BS) to ensure optimal performance, especially in multi-user MIMO (MU-MIMO) systems. In the absence of channel reciprocity in frequency division duplex (FDD) systems, the user equipment (UE) must send the CSI to the BS. The large overhead associated with this CSI feedback in FDD systems often becomes the bottleneck to improving system performance. In this paper, we propose AI-based CSI feedback built on an autoencoder architecture. The encoder maps the CSI at the UE into a low-dimensional latent space, and the decoder recovers it at the BS, reducing the feedback overhead while minimizing the loss during recovery. Our simulation results show that the proposed AI-based architecture outperforms the state-of-the-art high-resolution linear combination codebook using the DFT basis adopted in the 5G New Radio (NR) system. Next Generation Communications
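A minimal linear stand-in makes the encode, feed back, and decode loop concrete: a truncated-SVD (PCA-style) codec in place of the paper's deep autoencoder. The UE would report `k` latent coefficients rather than the full CSI vector. The function names and the flattened complex-CSI input format are assumptions for illustration.

```python
import numpy as np

def fit_linear_codec(H_train, k):
    """Learn a k-dimensional linear basis from flattened CSI samples.

    H_train: (num_samples, dim) complex CSI vectors. Returns basis (dim, k)
    whose orthonormal columns span the top-k row space of H_train.
    """
    _, _, vh = np.linalg.svd(H_train, full_matrices=False)
    return vh[:k].T

def encode(h, basis):
    """UE side: project CSI onto the basis; only these k coefficients are fed back."""
    return basis.conj().T @ h

def decode(z, basis):
    """BS side: reconstruct the CSI vector from the latent coefficients."""
    return basis @ z

def nmse_db(h, h_hat):
    """Normalized mean squared reconstruction error in dB."""
    return 10 * np.log10(np.sum(np.abs(h - h_hat) ** 2) / np.sum(np.abs(h) ** 2))
```

A learned nonlinear autoencoder replaces `encode`/`decode` with neural networks. The feedback overhead is still set by the latent size `k`.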

Atrial Fibrillation Detection and Atrial Fibrillation Burden Estimation via Wearables

Author: Li Zhu et al. Viswam Nathan, Jilong Kuang, Jacob Kim, Alex Gao

Published: IEEE Journal of Biomedical and Health Informatics

May 1, 2022

Atrial fibrillation (AF) is an important cardiac rhythm disorder that, if left untreated, can lead to serious complications such as stroke. AF can remain asymptomatic and can progressively worsen over time, so it would benefit from detection and continuous monitoring with a wearable sensor. Here, we develop an AF detection algorithm, deploy it on a smartwatch, and prospectively and comprehensively validate its performance on a real-world population that included patients diagnosed with AF. The algorithm showed a sensitivity of 87.8% and a specificity of 97.4% over every 5-minute segment of PPG evaluated. Furthermore, we introduce novel algorithm blocks and system designs that increase the time of coverage and monitor for AF even during periods of motion noise and other artifacts encountered in daily-living scenarios. On average, a valid decision was produced for 67.8% of the total time the patients wore the smartwatch. Finally, we demonstrate that our algorithm can function throughout the day and estimate the AF burden, the first such measure obtained with a wearable sensor. The estimate shows 98% correlation with the ground truth and an average error of 6.2%.

Authors from UCSF:
Robert Avram, 10%
Jeffrey Olgin, 10% Digital Health
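The coverage and burden quantities described in the abstract can be sketched minimally as follows. The sketch assumes AF burden is the fraction of validly monitored time classified as AF, and that each PPG segment yields AF, non-AF, or no valid decision. The segment representation is hypothetical.

```python
from typing import Optional, Sequence, Tuple

def af_burden(decisions: Sequence[Optional[bool]]) -> Tuple[float, float]:
    """Return (coverage, burden) from per-segment AF decisions.

    Each entry covers one fixed-length PPG segment (e.g. 5 minutes):
    True = AF, False = no AF, None = no valid decision (motion noise, artifacts).
    coverage = fraction of segments with a valid decision;
    burden   = fraction of valid segments classified as AF.
    """
    valid = [d for d in decisions if d is not None]
    coverage = len(valid) / len(decisions) if decisions else 0.0
    burden = sum(valid) / len(valid) if valid else float("nan")
    return coverage, burden
```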

An Information Fusion Approach to Learning With Instance-Dependent Label Noise

Author: Li Li et al. Rui Chen, Soo-Hyun Choi, Xia Hu

Published: International Conference on Learning Representations (ICLR)

Apr 25, 2022
Instance-dependent label noise (IDN) is widespread in real-world datasets and usually misleads the training of deep neural networks. The noise transition matrix (i.e., the probabilities that clean labels flip into noisy labels) is used to characterize label noise and yields statistically consistent classifiers for the underlying distribution of the data. However, most instances are long-tailed, i.e., each instance usually appears only a limited number of times. This creates a gap between the underlying and empirical distributions and degrades the model. To mitigate this distribution mismatch, we propose a posterior transition matrix that models label noise a posteriori given the limited observed noisy labels. It achieves statistically consistent classifiers for both the underlying and empirical distributions. Note that even if instances are corrupted by the same noise transition matrix, intrinsic randomness leads to different noisy labels, which therefore require different correction methods. Motivated by this observation, we propose an Information Fusion (IF) approach that fine-tunes the noise transition matrix based on the estimated posterior transition matrix. Specifically, we use the noisy labels and the model's predicted probabilities to estimate the posterior transition matrix, and then correct the noise transition matrix in forward propagation. Empirical evaluations on synthetic and real-world datasets demonstrate that our method is superior to state-of-the-art approaches and achieves more stable training when learning from instance-dependent label noise. Artificial Intelligence
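For reference, here is the standard forward-correction step that a noise transition matrix plugs into, written in NumPy. It applies a fixed `T` to the model's posteriors. The paper's posterior transition matrix estimation and Information Fusion fine-tuning are not shown.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_corrected_nll(logits, noisy_labels, T):
    """Cross-entropy against noisy labels after passing clean posteriors through T.

    T[i, j] = P(noisy label = j | clean label = i); each row sums to 1.
    """
    p_clean = softmax(logits)       # (n, c) model's estimate of the clean-label posterior
    p_noisy = p_clean @ T           # (n, c) implied distribution over noisy labels
    idx = np.arange(len(noisy_labels))
    return float(-np.mean(np.log(p_noisy[idx, noisy_labels] + 1e-12)))
```

With `T` equal to the identity, this reduces to ordinary cross-entropy.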

Language model compression with weighted low-rank factorization

Author: Yen-Chang Hsu et al. Ting Hua, Sung-En Chang, Qian Lou, Yilin Shen, Hongxia Jin

Published: International Conference on Learning Representations (ICLR)

Apr 25, 2022

Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression strategy, approximating a learned matrix with fewer parameters. However, SVD minimizes the squared error of reconstructing the original matrix without gauging the importance of the parameters. It can therefore give a larger reconstruction error for the parameters that affect task accuracy more. In other words, the optimization objective of SVD is not aligned with task accuracy. In this work, we propose using Fisher information to weigh the importance of parameters affecting the model prediction, and then performing a weighted SVD to factorize the learned matrices of a neural network model. Although our factorized matrices do not necessarily have a smaller reconstruction error, they retain better task accuracy. We perform analysis with transformer-based language models, showing that our weighted SVD significantly reduces the misalignment between the optimization objective of low-rank factorization and task accuracy.
Evaluations on compressing already compact models show that our method can further reduce parameters by 9% to 30% without affecting task accuracy. Artificial Intelligence
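One simple way to realize an importance-weighted SVD assumes a single importance weight per row (for instance, a hypothetical empirical Fisher estimate summed over that row). Under that assumption, scaling the rows before factorizing keeps a closed-form solution. This is a sketch under that assumption, not necessarily the exact procedure of the paper.

```python
import numpy as np

def weighted_low_rank(W, row_importance, rank):
    """Rank-r factorization minimizing sum_i w_i * ||W[i] - W_hat[i]||^2.

    With one positive weight per row, scaling rows by sqrt(w_i) turns the weighted
    problem into a plain truncated-SVD problem. Returns A (m, r), B (r, n), W ≈ A @ B.
    """
    s = np.sqrt(np.asarray(row_importance, dtype=float))
    U, S, Vt = np.linalg.svd(s[:, None] * W, full_matrices=False)
    A = (U[:, :rank] * S[:rank]) / s[:, None]  # undo the row scaling
    B = Vt[:rank]
    return A, B

def fisher_row_importance(grads):
    """Assumed empirical-Fisher proxy: squared per-sample gradients summed per row.

    grads: (num_samples, m, n) per-sample gradients of the loss w.r.t. W.
    """
    return np.sum(np.asarray(grads) ** 2, axis=(0, 2))
```

With uniform importance, this reduces to plain truncated SVD. With skewed importance, it trades extra error on unimportant rows for lower error on important ones.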

CSI Feedback for Distributed MIMO

Author: Gilwon Lee et al. Md Saifur Rahman, Eko Onggosanusi

Published: IEEE Wireless Communications and Networking Conference (WCNC)

Apr 10, 2022

In this paper, we consider a distributed multiple-input multiple-output (D-MIMO) system in which multiple remote radio heads (RRHs) distributed in a cell are connected to a single baseband unit. To enable coherent joint transmission from multiple RRHs in the D-MIMO system, we propose several channel state information (CSI) codebooks as candidate enhancements for 3rd Generation Partnership Project (3GPP) 5G New Radio (NR) standardization. The proposed codebooks are based on the 5G Release-16 Type-II CSI codebook framework. In addition, we propose dynamic RRH selection (DRS) methods that achieve a performance gain and reduce the amount of feedback by sending CSI only for the selected RRHs with dominant channel qualities. System-level simulation (SLS) results under realistic scenarios are provided to validate the potential of the proposed CSI codebooks and DRS methods. Next Generation Communications
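A toy version of dynamic RRH selection assumes the UE keeps only the RRHs whose measured channel power lies within a fixed margin of the strongest one. The `margin_db` rule and parameter names are hypothetical. The paper's actual DRS criteria and the Type-II codebook are not reproduced.

```python
import numpy as np

def select_rrhs(channel_power_db, margin_db=6.0, max_rrhs=None):
    """Pick RRHs whose channel power is within margin_db of the strongest one.

    channel_power_db: per-RRH wideband channel power measured at the UE (dB).
    Returns RRH indices from strongest to weakest; CSI would then be reported
    only for these RRHs, shrinking the feedback payload.
    """
    p = np.asarray(channel_power_db, dtype=float)
    order = np.argsort(p)[::-1]                       # strongest first
    chosen = [int(i) for i in order if p[i] >= p[order[0]] - margin_db]
    return chosen[:max_rrhs] if max_rrhs is not None else chosen
```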

DictFormer: Tiny Transformer with Shared Dictionary

Author: Qian Lou et al. Ting Hua, Yen-Chang Hsu, Yilin Shen, Hongxia Jin

Published: International Conference on Learning Representations (ICLR)

Mar 10, 2022

We introduce DictFormer, which uses an efficient shared dictionary to provide a compact, fast, and accurate transformer model. DictFormer significantly reduces the redundancy in the transformer’s parameters by replacing them with a compact shared dictionary, a few unshared coefficients, and indices. DictFormer also enables faster computation, since expensive weight multiplications are converted into cheap shared lookups on the dictionary and a few linear projections. Training the dictionary and coefficients is not trivial, since the indices used to look up the dictionary are not differentiable. We adopt sparse-constraint training with l1-norm relaxation to learn the coefficients and indices in DictFormer. DictFormer flexibly supports different model sizes by dynamically changing the dictionary size. Compared to existing lightweight Transformers, DictFormer consistently reduces model size relative to the Transformer on multiple tasks, e.g., machine translation, abstractive summarization, and language modeling. Extensive experiments show that DictFormer reduces model size by 3.6× to 8.9× with similar accuracy across multiple tasks, compared to the Transformer. Artificial Intelligence
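The shared-dictionary idea can be sketched in two steps. First, each weight column is rebuilt from a few dictionary atoms selected by indices and scaled by coefficients. Second, a layer output is computed by projecting onto the dictionary once and then gathering. Shapes and function names are assumptions for illustration, and training with the l1 relaxation is not shown.

```python
import numpy as np

def reconstruct_weight(D, C, I):
    """Rebuild W (d, n) from a shared dictionary.

    D: (d, m) dictionary with m atoms, shared across layers.
    I: (t, n) integer indices choosing t atoms per output column.
    C: (t, n) unshared coefficients. Column j = sum_k C[k, j] * D[:, I[k, j]].
    """
    return np.einsum("dkn,kn->dn", D[:, I], C)

def apply_with_lookup(x, D, C, I):
    """Compute x @ W without forming W: one projection onto D, then cheap gathers."""
    xd = x @ D                                   # (batch, m); reusable by every layer sharing D
    return np.einsum("bkn,kn->bn", xd[:, I], C)
```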

ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs

Author: Kalpa Gunaratna et al. Vijay Srinivasan, Hongxia Jin

Published: National Conference on Artificial Intelligence (AAAI)

Feb 22, 2022

Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end users in order to understand and satisfy their needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end user. To address this open problem, we propose the Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the query. First, ISEEQ uses a knowledge graph to enrich the user query. Second, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages so that it can ask coherent ISQs adhering to a conceptual flow. Third, ISEEQ introduces a new deep generative adversarial reinforcement learning-based approach for generating ISQs. We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets with user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows acceptable performance when trained and tested on different pairs of domains. Qualitative human evaluation confirms that ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform those of the best comparable baseline. Artificial Intelligence

Model-driven Machine Learning Approaches for Mobility Classification in Intelligent 5G Network

Author: Tiexing Wang et al. Yeqing Hu, Yang Li, Junmo Sung, Rui Wang, Charlie Zhang

Published: IEEE Wireless Communications and Networking Conference (WCNC)

Dec 31, 2021

Channel information is essential to unleash the benefits of 5G New Radio (NR) by enabling network intelligence that adapts transmissions to users’ channels. In this paper, we propose a model-driven feature design and use a support vector machine (SVM) to classify users’ speed ranges. Our model-driven features are designed based on stochastic channel modeling. Multiple features are derived from the time-domain cross-correlation and auto-correlation functions of the sounding reference signals. The classifier is trained and verified on extensive standard-compliant simulated channels at different SNR levels and speeds, and attains greater than 90% accuracy. Next Generation Communications

SAFENet: A Secure, Accurate and Fast Neural Network Inference

Author: Qian Lou et al. Yilin Shen, Hongxia Jin

Published: International Conference on Learning Representations (ICLR)

Dec 12, 2021

Advances in neural networks have driven many companies to provide prediction services to users in a wide range of applications. However, current prediction systems raise privacy concerns regarding users’ private data. A cryptographic neural network inference service is an efficient way to allow two parties to execute neural network inference without revealing either party’s data or model. Nevertheless, existing cryptographic neural network inference services suffer from huge running latency. In particular, the latency of the communication-expensive cryptographic activation function is 3 orders of magnitude higher than that of a plaintext-domain activation function. Since activations are necessary components of modern neural networks, slow cryptographic activation has become the primary obstacle to efficient cryptographic inference.

In this paper, we propose a new technique, called SAFENet, to enable a Secure, Accurate and Fast nEural Network inference service. To speed up secure inference and guarantee inference accuracy, SAFENet includes channel-wise activation approximation with multiple-degree options. This is implemented by keeping the most useful activation channels and replacing the remaining, less useful channels with polynomials of various degrees. SAFENet also supports mixed-precision activation approximation by automatically assigning different replacement ratios to different layers, further increasing the approximation ratio and reducing inference latency. Our experimental results show that SAFENet achieves state-of-the-art inference latency without a decrease in accuracy, reducing latency by 38% to 61% over prior techniques on various encrypted datasets. Artificial Intelligence
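Channel-wise activation approximation can be sketched minimally in NumPy: exact ReLU on kept channels and a least-squares polynomial elsewhere. The fitting range, polynomial degree, and channel mask are hypothetical choices. This runs in plaintext, so it illustrates only the approximation, not the cryptographic protocol.

```python
import numpy as np

def fit_relu_poly(degree, bound=4.0, num=401):
    """Least-squares polynomial fit of ReLU on [-bound, bound] (assumed input range)."""
    x = np.linspace(-bound, bound, num)
    return np.polyfit(x, np.maximum(x, 0.0), degree)

def mixed_activation(feature_map, keep_mask, coeffs):
    """Exact ReLU on kept channels, polynomial approximation on the rest.

    feature_map: (channels, ...) pre-activations; keep_mask: (channels,) booleans.
    """
    out = np.polyval(coeffs, feature_map)            # cheap polynomial everywhere
    out[keep_mask] = np.maximum(feature_map[keep_mask], 0.0)  # exact ReLU where kept
    return out
```

In a secure-inference setting, polynomials need only additions and multiplications. This is why replacing channels with them is attractive. Higher degrees approximate better but cost more.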

RRMonitor: A Resource-Aware End-to-End System for Continuous Monitoring of Respiration Rate Using Earable Devices

Author: Tousif Ahmed et al. Mahbubur Rahman, Mohsin Ahmed, Ebrahim Nematihosseinabadi, Minh Dinh, Nathan Robert Folkman, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Nov 5, 2021

Respiration rate is considered a critical vital sign, and daily monitoring of respiration rate could provide helpful information about acute conditions in the human body. Researchers have been exploring mobile devices for respiration rate monitoring. However, passive and continuous monitoring is still not feasible because existing approaches face many usability challenges (e.g., required active participation). This paper presents an end-to-end system called RRMonitor that leverages the motion sensors in commodity earbuds to continuously monitor respiration rate in near real time. While developing the system, we extensively explored key parameters, algorithms, and approaches from the existing literature to find those best suited for continuous and passive respiration rate monitoring. RRMonitor can passively track the respiration rate with a mean absolute error as low as 1.64 cycles per minute without requiring active participation from the user. Digital Health
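As an illustration of passive respiration-rate estimation from a motion signal, the sketch below picks the dominant spectral peak in an assumed breathing band of 0.1 to 0.7 Hz. This is a common baseline technique, not necessarily one of the approaches RRMonitor selected.

```python
import numpy as np

def respiration_rate_bpm(signal, fs, band=(0.1, 0.7)):
    """Estimate breaths per minute from the dominant spectral peak in a breathing band.

    signal: 1-D motion trace (e.g. one IMU axis) sampled at fs Hz.
    band: assumed plausible breathing frequencies in Hz (6 to 42 breaths/min).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                    # drop the DC offset
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_hz = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_hz
```

Frequency resolution is fs/N, so a 60 s window resolves about 1 breath per minute.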

A Novel Multi-Center Template-Matching Algorithm and Its Application for Cough Detection

Author: Ebrahim Nematihosseinabadi et al. Tousif Ahmed, Mahbubur Rahman, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Nov 2, 2021

In time series classification problems, K-Nearest Neighbors (KNN) combined with an elastic distance measure (especially Dynamic Time Warping (DTW)) achieves outstanding classification performance. However, it is often regarded as prohibitively time-consuming. The Nearest Centroid Classifier was subsequently proposed, but its accuracy is compromised because only one centroid is obtained for each class. The Centroid-based Classifier performs clustering and averages each cluster, but requires manually setting the number of clusters. In this work, we propose a novel self-tuning multi-center template-matching algorithm that automatically adjusts the number of clusters to balance accuracy and inference time. Through experiments on synthetic datasets and a real-world earbud-based cough dataset, we demonstrate the superiority of our proposed algorithm in terms of both accuracy and inference time. Digital Health
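The building blocks named above, DTW distance and nearest-template classification with several centers per class, can be sketched as follows. The self-tuning step that chooses the number of centers is not shown, and the template dictionary layout is an assumption.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def classify(query, templates):
    """Nearest-template rule over several centers per class.

    templates: dict mapping class label -> list of center sequences.
    """
    best_label, best_dist = None, np.inf
    for label, centers in templates.items():
        for c in centers:
            d = dtw_distance(query, c)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

Inference cost grows with the total number of centers. One center per class gives the nearest-centroid extreme, and one center per training sample gives 1-NN.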

Using Neighborhood Context to Improve Information Extraction from Visual Documents Captured on Mobile Phones

Author: Kalpa Gunaratna et al. Vijay Srinivasan, Sandeep Nama, Hongxia Jin

Published: International Conference on Information and Knowledge Management (CIKM)

Nov 1, 2021

Information extraction from visual documents is useful in practice for providing intelligent assistance to users. We present an approach that combines local context information and contextual language models to improve information extraction accuracy. We show that our method performs well across model sizes, including small models that suit applications needing efficient processing (e.g., mobile computing). Our method outperformed a state-of-the-art global-context-based technique, and our implementation on a mobile platform suggests its usefulness in practical real-world applications. Artificial Intelligence

SpeechSpiro: Lung Function Assessment from Speech Pattern as an Alternative to Spirometry for Mobile Health Tracking

Author: Korosh Vatanparvar et al. Viswam Nathan, Ebrahim Nematihosseinabadi, Mahbubur Rahman, Daniel McCaffrey, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Oct 31, 2021

Respiratory illnesses are common in the United States and globally, and they take various forms, such as asthma, chronic obstructive pulmonary disease, or infectious respiratory diseases (e.g., from coronavirus). The lung function of people affected by these illnesses is compromised due to infection and/or inflammation in their respiratory airways. There are clinically validated tests to assess lung function using in-clinic medical equipment and, quite recently, portable spirometry devices. Research has shown that obstruction and restriction in the respiratory airways affect individuals’ voice characteristics, so audio features can be analyzed to predict lung function and the severity of obstruction. In this paper, we go beyond well-known voice audio features and create a hybrid CNN-LSTM deep learning model to discover spatiotemporal patterns in speech. The model predicts lung function parameters with accuracy comparable to conventional devices. We validate the performance and generalizability of our method using data collected from 200 subjects enrolled in two studies, conducted internally and in collaboration with a pulmonary hospital. SpeechSpiro measures lung function parameters (e.g., FEV1, FVC, FEV1/FVC) with a mean RMSE of 12% and an R² of up to 76%, using a 60-second phone audio recording of individuals reading a passage.

Clinical relevance: Speech-based spirometry (SpeechSpiro) eliminates the need for an additional device and carries out lung function assessment outside clinical settings using a smartphone. This enables continuous mobile health tracking for individuals, whether healthy or with a respiratory illness. Digital Health

Device Invariant Deep Neural Networks for Pulmonary Audio Event Detection Across Mobile and Wearable Devices

Author: Mohsin Ahmed et al. Li Zhu, Mahbubur Rahman, Tousif Ahmed, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Oct 31, 2021

Mobile and wearable devices are increasingly used to develop audio-based machine learning models that infer pulmonary health, exacerbation, and activity. A major challenge to the widespread use and deployment of such pulmonary health monitoring audio models is maintaining accuracy and robustness across a variety of commodity devices, given device heterogeneity. As a result, pulmonary audio models developed with data from one type of device perform poorly when deployed on another. In this work, we propose a framework that incorporates feature normalization across individual frequency bins and combines task-specific deep neural networks to achieve model invariance across devices for pulmonary event detection. We ran extensive empirical experiments with data from 131 real pulmonary subjects and healthy controls. They show that our framework can recover up to 163.6% of the accuracy lost due to device heterogeneity for four different pulmonary classification tasks, across two broad classification scenarios, with two common mobile and wearable devices: a smartphone and a smartwatch. Digital Health
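Per-frequency-bin feature normalization can be sketched minimally, assuming a (bins, frames) log-spectrogram in which a device's frequency response acts roughly as a per-bin gain and offset. Standardizing each bin over time cancels such a per-bin affine distortion. The function name is hypothetical.

```python
import numpy as np

def normalize_per_bin(spec, eps=1e-8):
    """Standardize each frequency bin of a (freq_bins, frames) spectrogram over time.

    A per-bin affine distortion (gain g_f > 0, offset o_f) introduced by a device
    is removed, because (g*x + o - mean) / std is the same for every g_f and o_f.
    """
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, keepdims=True)
    return (spec - mean) / (std + eps)
```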

Real-Time Limb Motion Tracking with a Single IMU Sensor for Physical Therapy Exercises

Author: Wenchuan Wei et al. Keiko Kurita, Jilong Kuang, Alex Gao

Published: Engineering in Medicine and Biology Conference (EMBC)

Oct 31, 2021

Limb exercises are common in physical therapy to improve the range of motion (RoM), strength, and flexibility of the arm/leg. To improve therapy outcomes and reduce cost, motion tracking systems have been used to monitor users’ movements while they perform the exercises and to provide guidance. Traditional motion tracking systems are based on either cameras or inertial measurement unit (IMU) sensors. Camera-based systems suffer from occlusion and lighting problems. Traditional IMU-based systems require at least two IMU sensors to track the motion of the entire limb, which is inconvenient. In this paper, we propose a novel limb motion tracking system that uses a single 9-axis IMU sensor worn on the distal end joint of the limb (i.e., the wrist for the arm or the ankle for the leg). Limb motion tracking with a single IMU sensor is challenging for two reasons. First, noisy IMU data causes drift when estimating position from acceleration data. Second, a single IMU sensor measures the motion of only one joint, whereas limb motion consists of motion from multiple joints. To solve these problems, we propose a recurrent neural network (RNN) model that estimates, in real time, the 3D positions of the distal end joint and the other joints of the limb (e.g., elbow or knee) from the noisy IMU data. Our proposed approach achieves high accuracy, with a median error of 4.4/4.1 cm for the wrist/elbow joint when tracking arm motion, outperforming the state-of-the-art approach by 50%. In addition, the proposed model is lightweight, enabling real-time applications on mobile devices. Digital Health
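The drift problem mentioned above is easy to see numerically. Naively double-integrating an accelerometer with a small constant bias produces a position error that grows quadratically with time (about 0.5·b·t²). The bias value in the usage below is illustrative.

```python
import numpy as np

def double_integrate(accel, dt):
    """Naive dead reckoning: integrate acceleration twice with cumulative sums."""
    velocity = np.cumsum(accel) * dt      # m/s
    position = np.cumsum(velocity) * dt   # m
    return position

# Stationary sensor with an assumed 0.01 m/s^2 bias, sampled at 100 Hz for 10 s.
drift = double_integrate(np.full(1000, 0.01), dt=0.01)[-1]
```

Here `drift` comes out at roughly 0.5 m even though the sensor never moves. This motivates learning joint positions directly rather than integrating raw acceleration.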

Listen2Cough: Leveraging End-to-End Deep Learning Cough Detection Model to Enhance Lung Health Assessment Using Passively Sensed Audio

Author: Ebrahim Nematihosseinabadi et al. Korosh Vatanparvar, Viswam Nathan, Tousif Ahmed, Mahbubur Rahman, Daniel McCaffrey, Jilong Kuang, Jun Gao

Published: ACM International Conference on Ubiquitous Computing (UbiComp)

Sep 13, 2021

The prevalence of ubiquitous computing enables new opportunities for lung health monitoring and assessment. In the past few years, there have been extensive studies on cough detection using passively sensed audio signals. However, it is questionable how well a cough detection model generalizes to external datasets, especially in real-world implementations, and this has not been adequately explored. Beyond detecting coughs, researchers have looked into how cough sounds can be used to assess lung health. However, collecting both cough sounds and lung health ground truth is challenging, so previous studies have been hindered by limited datasets. In this paper, we propose Listen2Cough to address these gaps. We first build an end-to-end deep learning architecture using public cough sound datasets to detect coughs within raw audio recordings. We employ a pre-trained MobileNet and integrate a number of augmentation techniques to improve the generalizability of our model. Without additional fine-tuning, our model achieves an F1 score of 0.948 on a new clean dataset and 0.884 on another in-the-wild noisy dataset. These are advantages of 5.8% and 8.4% on average, respectively, over the best baseline model. Then, to mitigate the issue of limited lung health data, we propose transferring the feature representation from the cough detection task to lung health assessment tasks so that the rich cough data can be leveraged. Our hypothesis is that these tasks extract and use similar effective representations of cough sounds. We embed the cough detection model into a multi-instance learning framework with an attention mechanism and further tune the model for lung health assessment tasks. Our final model achieves an F1-score of 0.912 on healthy vs. unhealthy, 0.870 on obstructive vs. non-obstructive, and 0.813 on COPD vs. asthma classification, outperforming the baseline by 10.7%, 6.3%, and 3.7%, respectively.
Moreover, the weights in the attention layer can be used to identify important coughs that are highly correlated with lung health, which could provide interpretability for expert diagnosis in the future. Digital Health
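The attention weights mentioned above come from an attention-based multi-instance pooling layer. A widely used attention-MIL formulation is sketched below in NumPy, where each instance could be one cough's embedding. The parameter shapes are assumptions, and this is not necessarily the exact layer used in Listen2Cough.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Attention pooling over the instances in one bag.

    H: (num_instances, d) instance embeddings; V: (h, d) and w: (h,) learned parameters.
    Returns the bag embedding (d,) and attention weights (num_instances,) summing to 1.
    """
    scores = np.tanh(H @ V.T) @ w       # one scalar relevance score per instance
    scores = scores - scores.max()      # numerical stability for the softmax
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ H, a
```

Because each weight in `a` belongs to one instance, the highest-weighted coughs can be surfaced for inspection.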

ToA-based Localization of Far-Away Targets: Equi-DOP Surfaces, Asymptotic Bounds, and Dimension Adaptation

Author: Raghunandan M Rao et al. Boon Loong Ng, Yi Yang, Moon-Seok Kang

Published: IEEE Transactions on Vehicular Technology

Sep 3, 2021

This paper studies the Dilution of Precision (DOP) in Time-of-Arrival (ToA)-based localization of targets outside the anchors’ convex hull. In the far-away target regime, we derive a closed-form expression for the DOP that reveals a linear asymptotic scaling law. We characterize the asymptotic DOP bounds and the equi-DOP surfaces/contours in 3D/2D localization scenarios, which quantify the reliability of location estimates along a trajectory. Motivated by vehicular applications, we propose a range-aided dimension adaptation scheme. In this scheme, the localization dimension is adapted in real time using a single range measurement such that the maximum or root-mean-square DOP does not exceed a threshold. Since DOP scales linearly with distance, high-accuracy localization of far-away targets is infeasible. The scheme therefore prioritizes high-performance tracking of nearby targets while monitoring far-away targets with range-only measurements. Next Generation Communications
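The sketch below gives some intuition for the linear DOP scaling. It computes the standard geometric DOP for ToA ranging, sqrt(trace((HᵀH)⁻¹)), where the rows of H are unit vectors from the anchors to the target. It assumes synchronized ranging (no clock-bias term) and an illustrative anchor layout.

```python
import numpy as np

def toa_dop(anchors, target):
    """Geometric DOP for ToA ranging: sqrt(trace((H^T H)^-1)).

    Rows of H are unit vectors from each anchor to the target, i.e. the gradients
    of the range measurements with respect to the target position.
    """
    anchors = np.asarray(anchors, dtype=float)
    diff = np.asarray(target, dtype=float) - anchors
    H = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    return float(np.sqrt(np.trace(np.linalg.inv(H.T @ H))))
```

For a target far outside the convex hull of the anchors, the unit vectors become nearly parallel. HᵀH then approaches singularity in the cross-range direction, and the DOP grows roughly in proportion to distance.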

Automatic Mixed-Precision Quantization Search of BERT

Author: Changsheng Zhao et al. Ting Hua, Yilin Shen, Hongxia Jin

Published: International Joint Conference on Artificial Intelligence (IJCAI)

Aug 21, 2021

Pre-trained language models such as BERT have shown great effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents their practical deployment on resource-constrained devices. Knowledge distillation, weight pruning, and quantization are known to be the main directions in model compression. In the field of pre-trained language model compression, most existing work aims to obtain a compact model through knowledge distillation from the original larger model, which may suffer a significant accuracy drop even at a relatively small compression ratio. On the other hand, there are only a few quantization-based attempts designed for natural language processing tasks, and they usually require manually setting hyper-parameters. In this paper, we propose a BERT compression approach that achieves automatic mixed-precision quantization and can conduct quantization and pruning at the same time. Specifically, our proposed method leverages differentiable Neural Architecture Search to automatically assign scales and precision to the parameters in each sub-group, while pruning out redundant groups of parameters. Extensive evaluations on BERT downstream tasks reveal that our proposed method outperforms baselines, providing the same performance with a much smaller model size. We also show the possibility of obtaining an extremely lightweight model by combining our solution with orthogonal methods such as DistilBERT. Artificial Intelligence
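The per-group operations being searched over can be sketched minimally: uniform symmetric quantization of a weight group at a chosen bit-width, with 0 bits treated as pruning, plus the resulting model size. The search itself (differentiable NAS) is not shown, and the function names are hypothetical.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric fake-quantization of one weight group.

    bits == 0 is treated as pruning the whole group (all zeros).
    """
    w = np.asarray(w, dtype=float)
    if bits == 0:
        return np.zeros_like(w)
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1          # e.g. 127 positive levels for 8 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -levels, levels) * scale

def model_size_bits(groups, bits_per_group):
    """Total storage in bits when each group uses its own bit-width (0 = pruned)."""
    return sum(np.asarray(g).size * b for g, b in zip(groups, bits_per_group))
```

A mixed-precision search chooses one entry of `bits_per_group` per sub-group, trading quantization error against `model_size_bits`.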

Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU

Author: Yilin Shen et al. Yen-Chang Hsu, Avik Ray, Hongxia Jin

Published: Association for Computational Linguistics (ACL)

Aug 2, 2021

Intent classification is a major task in spoken language understanding (SLU). Since most models are built with pre-collected in-domain (IND) training utterances, their ability to detect unsupported out-of-domain (OOD) utterances is critical in practice. Recent works showed that using extra data and labels can improve OOD detection performance, yet such data can be costly to collect. In this paper, we propose training a joint model only on the IND training set to support both IND intent classification and OOD detection. Our method explicitly models a domain variable to learn a domain-disentangled utterance representation, which we call the DDM model. DDM can be used as a drop-in replacement for any deep neural intent classifier. To further improve OOD detection performance, we introduce confidence-based and feature-based OOD detection methods to combine with DDM and BERT-based models. We evaluate on all three benchmark SLU datasets and one in-house dataset. Our method built on BERT and RoBERTa models achieves state-of-the-art performance against existing approaches and multiple strong BERT-based baselines for both intent classification and OOD detection. Artificial Intelligence
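A confidence-based OOD rule can be as simple as thresholding the maximum softmax probability. The sketch below shows that common baseline. The threshold is a hypothetical value that would be tuned on held-out data, and DDM itself is not shown.

```python
import numpy as np

def msp_ood(logits, threshold):
    """Maximum-softmax-probability rule: flag inputs whose top intent is not confident.

    Returns (predicted intent indices, boolean OOD flags).
    """
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    conf = p.max(axis=1)
    return p.argmax(axis=1), conf < threshold
```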

Real-Time 3D Arm Motion Tracking using the 6-axis IMU sensor of a Smartwatch

Author: Wenchuan Wei et al. Keiko Kurita, Jilong Kuang, Jun Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences

Jul 27, 2021

Inertial measurement unit (IMU) sensors are widely used in motion tracking for various applications, e.g., virtual physical therapy and fitness training. Traditional IMU-based motion tracking systems use 9-axis IMU sensors that include an accelerometer, gyroscope, and magnetometer. The magnetometer is essential for correcting yaw drift in orientation estimation. However, its magnetic field measurements are often disturbed by ferromagnetic materials in the environment, and it requires frequent calibration. Moreover, most IMU-based systems require multiple IMU sensors to track body motion, which is inconvenient. In this paper, we propose a novel approach that uses the single 6-axis IMU sensor of a consumer smartwatch, without any magnetometer, to track the user’s 3D arm motion in real time. We use a recurrent neural network (RNN) model to estimate the 3D positions of both the wrist and the elbow from the noisy IMU data. State-of-the-art approaches use either a 9-axis IMU sensor or the combination of a 6-axis IMU and an extra device. Compared with them, our approach significantly improves usability and the potential for pervasiveness by not requiring a magnetometer or any extra device, while achieving comparable results. Digital Health

CoughBuddy: Multi-Modal Cough Event Detection Using Earbuds Platform

Author: Ebrahim Nematihosseinabadi et al. Tousif Ahmed, Mahbubur Rahman, Jilong Kuang, Jun Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences

Jul 27, 2021

The prevalence of novel wearable devices has opened new horizons of opportunity for lung health monitoring and assessment in the past decade. There has been an extensive amount of research on cough detection using acoustic features of coughs captured by smartphones and smartwatches. However, the specificity of these algorithms has always been a concern when they are exposed to unseen field data that contain cough-like sounds. In this paper, we propose a novel sensor fusion algorithm that employs a hybrid of classification and template matching algorithms to tackle the problem of unseen classes. The algorithm utilizes the in-ear audio signal as well as head motion captured by the inertial measurement unit (IMU). We conducted a large study with 45 subjects from healthy and chronic cough cohorts, covering various tasks, including coughs and cough-like body sounds, under conditions such as quiet/noisy and stationary/non-stationary. Our proposed hybrid algorithm, which comprises audio-event classification and dynamic time warping (DTW)-based IMU template matching, is evaluated for sensitivity and specificity in the aforementioned conditions using leave-one-subject-out validation (LOSOV). Our model achieves an average sensitivity of 83% for stationary tasks with an average specificity of 91.7% for cough-like sounds, reducing the false positive rate by 55%. These results indicate the feasibility and superiority of earbud platforms for detecting pulmonary sound events such as coughs. Digital Health
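The DTW-based IMU template matching step can be sketched with the textbook DTW recurrence on a 1-D signal, for example one IMU axis: a candidate event is accepted when it warps onto a stored template cheaply. The template values and the acceptance threshold below are hypothetical, and the paper's actual features, distance, and fusion with the audio classifier are not reproduced here.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using the absolute difference as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def matches_template(signal, template, threshold):
    """Accept the event if its DTW distance to the template is small."""
    return dtw_distance(signal, template) <= threshold


template = [0.0, 1.0, 3.0, 1.0, 0.0]               # hypothetical template
stretched = [0.0, 0.0, 1.0, 3.0, 3.0, 1.0, 0.0]    # same shape, slower
still = [0.0] * 5                                  # no head motion
print(dtw_distance(stretched, template), dtw_distance(still, template))
```

The time-stretched copy aligns with zero cost, whereas the flat signal costs 5.0, so with a threshold of, say, 1.0 only the first would be accepted. Tolerance to such timing differences is the usual reason to prefer DTW over a plain sample-by-sample distance.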

Better Battery Life: Towards Energy-Efficient Smartwatch-Based Atrial Fibrillation Detection in Ambulatory Free-living Environment

Author: Retiree et al. Li Zhu, Viswam Nathan, Jilong Kuang

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences

Jul 27, 2021

Atrial fibrillation (AF) is an important medical condition that can be passively detected and tracked using a smartwatch. Diagnosis and monitoring of AF can be more effective and reliable if the smartwatch senses continuously, but this can lead to significant battery consumption by the LED in the photoplethysmography (PPG) sensor. In this paper, we explore the feasibility of leveraging downsampling to achieve energy-efficient AF detection. We collect data from participants with paroxysmal AF in real ambulatory free-living environments using a commercial smartwatch and separately study the impact of uniform downsampling and compressed sensing on AF detection. Our results reveal that downsampling enables the AF detection system to consume about 77.4% less LED power than the original sampling strategy without a significant performance drop. Digital Health

Towards Motion-Aware Passive Resting Respiratory Rate Monitoring Using Earbuds

Author: Mahbubur Rahman et al. Tousif Ahmed, Mohsin Ahmed, Ebrahim Nematihosseinabadi, Minh Dinh, Nathan Robert Folkman, Jilong Kuang, Jun Gao

Published: IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) and the Body Sensor Networks (BSN) Conferences

Jul 27, 2021

Breathing rate is an important vital sign and an indicator of overall health and fitness. Traditionally, breathing is monitored using specialized devices such as chest bands or spirometers, which are uncomfortable for everyday use. Recent works show the feasibility of estimating breathing rate using earbuds. However, non-breathing head motion is one of the biggest challenges for accurate breathing rate estimation using earbuds or other head-mounted devices such as smart glasses. In this paper, we propose an algorithm to estimate breathing rate in the presence of non-breathing head motion using inertial sensors embedded in commodity earbuds. Using a chest band as the reference device, we show that our algorithms can estimate breathing rate in resting positions with an error of ±2.63 breaths per minute (BPM). However, when algorithms developed on data without head motion are applied to data with head motion, the error increases significantly. The head-motion handling algorithm proposed in this paper can improve accuracy by up to 30% in the presence of non-breathing head motion. This work can help make significant progress towards passive breathing monitoring in everyday life using commodity earbuds, which are becoming increasingly popular. Digital Health
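A common baseline for estimating breathing rate from an inertial signal is to find the dominant frequency in a plausible breathing band and multiply it by 60. The sketch below does this with a direct DFT scan in pure Python on a synthetic signal; the 0.1 to 0.7 Hz band, the 0.005 Hz step, and the 25 Hz sampling rate are assumptions, and it deliberately contains no head-motion handling, which is exactly what such a simple peak picker lacks.

```python
import math


def breathing_rate_bpm(signal, fs, f_lo=0.1, f_hi=0.7, step=0.005):
    """Estimate breathing rate as the strongest frequency in an assumed
    breathing band [f_lo, f_hi] Hz, found by scanning single DFT bins."""
    mean = sum(signal) / len(signal)
    x = [v - mean for v in signal]              # remove the DC offset
    n_steps = int(round((f_hi - f_lo) / step))
    best_f, best_power = f_lo, -1.0
    for k in range(n_steps + 1):
        f = f_lo + k * step
        w = 2.0 * math.pi * f / fs
        re = sum(v * math.cos(w * n) for n, v in enumerate(x))
        im = sum(v * math.sin(w * n) for n, v in enumerate(x))
        power = re * re + im * im               # |X(f)|^2
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f                        # Hz -> breaths per minute


# Synthetic 60 s breathing motion at 0.25 Hz (15 BPM), sampled at 25 Hz.
fs = 25
sig = [1.0 + 0.2 * math.sin(2 * math.pi * 0.25 * n / fs)
       for n in range(60 * fs)]
print(round(breathing_rate_bpm(sig, fs), 1))
```

On this clean synthetic signal the estimate is 15 BPM. A head turn adds low-frequency energy that can land in the same band and win the peak search, which illustrates why the abstract treats non-breathing head motion as a central challenge.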

Fractionally Spaced Equalizer for Next Generation Terahertz Wireless Communication Systems

Author: Jeongho Jeon et al. Joonyoung Cho, Shadi Abu-Surra, KITAEK BAE, Charlie Zhang

Published: IEEE International Conference on Communications (ICC)

Jun 14, 2021

Higher data rates are required to support exponential growth in wireless traffic, motivating an expansion of the transmission bandwidth for sixth generation (6G) communications. The available bandwidth in the terahertz (THz) band significantly exceeds that of the mmWave band adopted in fifth generation (5G) systems; thus, the THz band is envisioned as a pillar of 6G systems that can support data rates on the order of terabits per second (Tb/s). However, wireless communications in the THz band pose several new challenges. One of these challenges is the practical constraint of employing a limited oversampling factor to process wideband THz signals, even when leveraging state-of-the-art analog/digital converter techniques. This limited oversampling factor, which can lead to an increased sampling timing offset, degrades demodulation performance when it is used in conjunction with a conventional symbol-spaced equalizer. Thus, we employ a fractionally spaced equalizer (FSE) in a THz communication system to overcome the impact of the increased sampling timing offset in a practical system with a limited sampling rate. Analysis and simulations demonstrate that the FSE can perfectly compensate for the timing offset by optimally combining the available samples. Also, an approximation to the noise covariance matrix is proposed to reduce the computational complexity of the frequency-domain FSE. Next Generation Communications

End-to-end 140 GHz Wireless Link Demonstration with Fully-Digital Beamformed System

Author: Shadi Abu-Surra et al. Will Choi, SungTae Choi, Eunyoung Seok, Dongjoo Kim, Navneet Sharma, Siddharth Advani, Vitali Loseu, KITAEK BAE, ILJU NA, Gary Xu, Charlie Zhang

Published: IEEE International Conference on Communications (Workshop) (ICC W/S)

Jun 14, 2021

It is projected that mobile traffic will increase 80x by 2030. To meet this increase in demand, utilizing the terahertz bands (0.1 THz to 10 THz) for future 6G wireless systems is inevitable. However, operating at such high frequencies comes with several fundamental and technical challenges. In this work, we present a proof-of-concept system to demonstrate the feasibility of establishing a communication link at a 140 GHz carrier frequency. In addition, this work highlights techniques to tackle the challenges that come with operating in the terahertz regime. To the authors' knowledge, this is the world’s first end-to-end 140 GHz system with up to 16 digitally beamformed channels and dynamic beam steering capability. The paper presents lab results that demonstrate a link throughput of 6 Gbps at a 15-meter distance with adaptive beamforming. Next Generation Communications
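One of the fundamental challenges of operating at 140 GHz, higher propagation loss, can be quantified with the standard Friis free-space path loss formula, FSPL = 20*log10(4*pi*d*f/c). The comparison below is an illustrative sketch, not a result from the paper: it assumes isotropic antennas and ignores atmospheric absorption, blockage, and beamforming gain, and the 28 GHz reference point is chosen here only for contrast.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fspl_db(distance_m, freq_hz):
    """Friis free-space path loss between isotropic antennas, in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


for f_ghz in (28.0, 140.0):
    print(f"{f_ghz:>5.0f} GHz over 15 m: {fspl_db(15.0, f_ghz * 1e9):.1f} dB")
```

Over the 15 m link distance this gives about 85 dB at 28 GHz versus about 99 dB at 140 GHz, a gap of 20*log10(5), roughly 14 dB, at any distance. High-gain adaptive beamforming, such as the digitally beamformed array described above, is one way to recover that margin.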

An Actor-Critic based End-to-End Neural Coreference System

Author: Yu Wang et al. Yilin Shen, Hongxia Jin

Published: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Jun 11, 2021

In this paper, we introduce a novel actor-critic-based end-to-end neural coreference system that jointly performs mention detection, mention clustering, and coreference resolution. Our model achieves state-of-the-art performance on the CoNLL-2012 Shared Task English test set. Artificial Intelligence

An adversarial learning based multi-step spoken language understanding system through human-computer interaction

Author: Yu Wang et al. Yilin Shen, Hongxia Jin

Published: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Jun 11, 2021

Most existing spoken language understanding systems can perform only semantic frame parsing based on a single-round user query. They cannot take users’ feedback to update, add, or remove slot values through multi-round interaction with users. In this paper, we introduce a novel interactive adversarial reward learning-based spoken language understanding system that can leverage users’ multi-round feedback to update slot values. We perform two experiments on the benchmark ATIS dataset and demonstrate that the new system can improve parsing performance by at least 2.5% in terms of F1 with only one round of feedback. The improvement becomes even larger as the number of feedback rounds increases. Furthermore, we compare the new system with state-of-the-art dialogue state tracking systems and demonstrate that the new interactive system performs better on multi-round spoken language understanding tasks in terms of slot- and sentence-level accuracy. Artificial Intelligence

Hyperparameter-free Continuous Learning for NLU Domain Classification

Author: Ting Hua et al. Yilin Shen, Changsheng Zhao, Yen-Chang Hsu, Hongxia Jin

Published: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)

Jun 8, 2021

Domain classification is a fundamental task in natural language understanding (NLU), which often requires fast accommodation to newly emerging domains.
This constraint makes it impossible to retrain on all previous domains, even if their data are accessible to the new model.
Most existing continual learning approaches are designed for the scenario in which no old data are observable.
However, these methods may result in low accuracy and fluctuating performance when the old and new data distributions differ significantly, and they often require extensive parameter tuning.
The key problem in many practical cases, such as domain classification, is not the absence of old data but the inefficiency of retraining the model on the whole old dataset.
Is it possible to use a small amount of old data to achieve high accuracy and maintain stable performance, without introducing extra parameters?
In this paper, we propose a parameter-free continual learning model for text data that stably produces high performance under various environments.
Specifically, we use Fisher information to select exemplars that can “record key information of the original model.”
Also, a novel scheme called dynamical weight consolidation is proposed to enable parameter-free learning during the retraining process.
Extensive experiments demonstrate that the baselines' performance fluctuates, which makes them impractical.
In contrast, our proposed model significantly and consistently outperforms the best state-of-the-art method by up to 20% in average accuracy, and each of its components contributes effectively to overall performance. Artificial Intelligence
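One plausible reading of Fisher-information-based exemplar selection is sketched below for a binary logistic-regression model: score each stored example by the trace of its empirical Fisher information, i.e. the squared norm of its log-likelihood gradient, and keep the top k. This is an illustrative assumption, not the paper's exact criterion; the model, the weights, the "largest score first" rule, and all names are hypothetical, and the dynamical weight consolidation step is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fisher_score(w, b, x, y):
    """Trace of the per-example empirical Fisher information for binary
    logistic regression. The log-likelihood gradient w.r.t. (w, b) is
    (y - p) * [x, 1], so its squared norm is (y - p)^2 * (|x|^2 + 1)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return (y - p) ** 2 * (sum(xi * xi for xi in x) + 1.0)


def select_exemplars(w, b, data, k):
    """Keep the k (x, y) pairs with the highest Fisher score
    (hypothetical selection rule)."""
    return sorted(data, key=lambda d: fisher_score(w, b, d[0], d[1]),
                  reverse=True)[:k]


w, b = [1.0, -1.0], 0.0                      # hypothetical trained weights
old_data = [([2.0, 0.0], 1), ([0.0, 2.0], 1), ([0.1, 0.0], 0)]
print(select_exemplars(w, b, old_data, k=2))
```

The confidently classified example `([2.0, 0.0], 1)` has a small gradient and is dropped first, while the misclassified `([0.0, 2.0], 1)` scores highest. Whether exemplars should favour high or low Fisher scores is a design choice the abstract does not settle.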

Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

Author: Akhila Yerukola et al. Hongxia Jin

Published: Conference of the European Chapter of the Association for Computational Linguistics (EACL)

Apr 21, 2021

We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare and evaluate this method against a range of augmentation techniques, encompassing generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show that our method performs strongly on domain and intent classification tasks for a voice assistant and in a user study focused on utterance naturalness and semantic similarity. Artificial Intelligence

Early Detection and Burden Estimation of Atrial Fibrillation in Ambulatory Free-living Environment

Author: Li Zhu et al. Viswam Nathan, Jilong Kuang, Jacob Kim, Jun Gao

Published: ACM International Conference on Ubiquitous Computing (UbiComp)

Mar 1, 2021

Early detection and accurate burden estimation of atrial fibrillation (AFib) can provide the foundation for effective physician treatment and have attracted tremendous attention in recent years. In this paper, we develop a novel smartwatch-based system that detects AFib episodes and estimates AFib burden in ambulatory free-living environments without user engagement. Our system leverages the built-in PPG sensor to collect heart rhythm data without user engagement. Then, a data preprocessing module applies time-frequency (TF) analysis to augment features in both the time and frequency domains. Finally, a super lightweight multi-view convolutional neural network consisting of 19 layers performs AFib detection. To validate our system, we collaborated with medical professionals and carried out a clinical study that enrolled 53 participants over 3 months. For each participant, we collected and annotated more than 336 hours of data. Our system achieves an average accuracy of 91.6%, a specificity of 0.930, and a sensitivity of 0.908 without dropping any data. Moreover, our system has 0.51 million parameters and takes 5.18 ms per inference. These results reveal that our proposed system has the potential to provide clinical assessment of AFib in daily living. Digital Health

MIMO Evolution Towards 6G: Modular Massive MIMO in Low-Frequency Bands

Author: Jeongho Jeon et al. Gilwon Lee, Ahmed Ibrahim, Jin Yuan, Gary Xu, Joonyoung Cho, Eko Onggosanusi, Younsun Kim, Juho Lee, Charlie Zhang

Published: IEEE Communications Magazine

Feb 28, 2021

As the pace of global 5G network deployments accelerates, now is the moment for the cellular industry to realize sixth generation (6G) cellular communication. In this article, so-called modular massive MIMO (mmMIMO) is presented as one candidate technology for 6G. 5G has relentlessly pushed the boundary of cellular operating frequencies to millimeter wave bands, and this trend will continue in the 6G era to further embrace the greenfield terahertz (THz) spectrum. Admittedly, however, the technical advances of 5G in low bands fall short, even though low bands are crucial for serving a large number of users over a wide coverage area. Although it would be ideal to utilize massive MIMO in low bands, doing so is less practical due to the large antenna form factor. mmMIMO is a technology that distributes a large active antenna array across smaller standardized antenna modules, much like LEGO blocks. Through this, the benefits of massive MIMO can be achieved in low bands without being constrained by spatial limitations. In this article, the concept of mmMIMO, its applicability, and the research efforts needed to realize the technology are discussed. In addition, through the demonstration of a proof-of-concept system, it is shown that the technology will be within reach at the time of 6G mass commercialization around 2030. Lastly, the performance gain of mmMIMO is evidenced by system-level simulation. Next Generation Communications

BreathTrack: Detecting Regular Breathing Phases from Unannotated Acoustic Data Captured by a Smartphone

Author: Mahbubur Rahman et al. Tousif Ahmed, Mohsin Ahmed, Korosh Vatanparvar, Ebrahim Nematihosseinabadi, Viswam Nathan, Jilong Kuang, Alex Gao

Published: ACM International Conference on Ubiquitous Computing (UbiComp)

Feb 13, 2021

Passive and continuous monitoring of breathing biomarkers is vital for assessing well-being and detecting abnormalities in breathing patterns. In this paper, we present a novel method that detects breathing phases during regular breathing, working towards passive monitoring of natural breathing with the acoustic sensors embedded in smartphones. Our model eliminates the need for breathing sound annotation by transferring knowledge from inertial sensors to acoustic sensors and by fusing signal processing techniques with deep learning methods. Our study with 131 subjects, including healthy subjects and pulmonary patients, shows that our model can detect breathing phases with 77.33% accuracy using acoustic sensors, which enables novel, fine-grained breathing biomarkers such as the inhalation-exhalation ratio and fractional inspiratory time, in addition to the commonly known vital sign, breathing rate. We further show that our algorithm can estimate fractional inspiratory time with 92.08% accuracy, the inhalation-exhalation ratio with 86.76% accuracy, and breathing rate with 91.74% accuracy. We also present a respiratory patient detection model as an example application of breathing phase detection and novel biomarker extraction. We show that fractional inspiratory time is significantly correlated with patient severity and that our model can distinguish respiratory patients from healthy individuals with up to 76% accuracy. This paper is the first work to show the feasibility of detecting regular breathing phases towards passively monitoring respiratory well-being using a smartphone. Digital Health
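Once inhalation and exhalation phases are detected, the biomarkers named above follow from the phase durations through their standard definitions: inhalation-exhalation ratio Ti/Te, fractional inspiratory time Ti/(Ti + Te), and breathing rate 60/(Ti + Te). The function below is a minimal per-cycle sketch; it assumes the phase boundaries are already available and does not reproduce how the paper aggregates cycles or computes its reported accuracies.

```python
def breathing_biomarkers(inhale_s, exhale_s):
    """Per-cycle breathing biomarkers from phase durations in seconds."""
    if inhale_s <= 0 or exhale_s <= 0:
        raise ValueError("phase durations must be positive")
    cycle = inhale_s + exhale_s                  # one full breath
    return {
        "ie_ratio": inhale_s / exhale_s,         # Ti / Te
        "fractional_inspiratory_time": inhale_s / cycle,  # Ti / Ttot
        "breathing_rate_bpm": 60.0 / cycle,      # breaths per minute
    }


print(breathing_biomarkers(1.6, 2.4))
```

For example, a 1.6 s inhalation followed by a 2.4 s exhalation gives a 4 s cycle, i.e. 15 BPM, a fractional inspiratory time of 0.4, and an inhalation-exhalation ratio of about 0.67. Errors in detecting either phase boundary propagate directly into all three values.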

FadeNet: Deep Learning based mm-Wave large-scale channel fading prediction and its applications

Author: Vishnu Vardhan Ratnam et al. Hao Chen, Charlie Zhang, YOUNG-JIN KIM, Retiree, MINSUNG CHO, SUNG-ROK YOON

Published: IEEE Access

Sep 30, 2020

Accurate prediction of large-scale channel fading is fundamental to planning and optimization in 5G mm-Wave cellular networks. Current prediction methods are either too computationally expensive or too inaccurate, making them unsuitable for city-scale cell planning and optimization. This paper presents FadeNet, a convolutional neural network-enabled alternative for predicting large-scale fading with high computation speed and accuracy. By using carefully designed input features and neural network architecture, FadeNet accurately predicts the large-scale fading from a base station to each location in its coverage area. Evaluations on realistic data, derived from mm-Wave cells across multiple cities in the USA, suggest that FadeNet can achieve a root mean square prediction error of 5.6 dB. In addition, by leveraging the parallel processing capabilities of a graphics processing unit, FadeNet can reduce prediction time by a factor of 40 to 1000 compared with industry-prevalent methods such as ray tracing. Generalizations of FadeNet that can handle variable topographies and base station heights, and its use for optimal cell site selection, are also explored. Next Generation Communications
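As its unit suggests, the 5.6 dB figure is a root mean square error computed on fading values that are themselves in dB, so prediction errors are averaged on a logarithmic scale. A minimal sketch of that metric follows; the path-loss numbers are hypothetical and are not FadeNet data.

```python
import math


def rmse_db(predicted_db, measured_db):
    """Root mean square error between two equal-length lists of
    large-scale fading values, both expressed in dB."""
    if not predicted_db or len(predicted_db) != len(measured_db):
        raise ValueError("inputs must be non-empty and of equal length")
    sq_err = [(p - m) ** 2 for p, m in zip(predicted_db, measured_db)]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Hypothetical path-loss values (dB) at four locations in one cell.
predicted = [100.0, 112.0, 95.0, 120.0]
measured = [103.0, 108.0, 95.0, 120.0]
print(rmse_db(predicted, measured))
```

Here the per-location errors of 3, 4, 0, and 0 dB give an RMSE of 2.5 dB. Because the errors are squared before averaging, a few badly predicted locations raise the RMSE more than many small errors do.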