Join us for our Research Day!

The First Annual FIU School of Computing and Information Sciences Research Day will be held at the Tech Station (PG 6).

It will be a day-long event consisting of guest/keynote speeches, faculty presentations, student talks, and poster sessions. The event will be attended by all SCIS faculty and graduate students, and we invite everyone from the university community to join us.

October 25, 2019


Detailed Agenda

Opening Plenary Session (shared with ARO AI-IA Workshop)

8:45am – 10:00am, CASE 241

8:45am                  Welcoming Remarks and Introduction of the Workshop & SCIS Research Day

Dr. S. S. Iyengar, Director, School of Computing and Information Sciences

9:00am                   Welcoming Remarks

Dr. Mark B. Rosenberg, President, Florida International University

9:15am                   Welcoming Remarks

Dr. Andres Gil, Vice President of Research

9:25am                  Welcoming Remarks

Dr. John Volakis, Dean, College of Engineering and Computing

9:30am                   Workshop Objectives Overview

Dr. Cliff Wang, Computing Sciences Division Chief, US Army Research Office

9:45am                   Keynote Speaker

Dr. Angela Landress, Defense Information Systems Agency

10:30am – 11:00am       Coffee Break

Poster Lightning Talks

11:00am – 1:00pm (PG-6 116)

1:00pm – 2:00pm             Lunch Break

Guest Presentation

Speaker: Xian-He Sun, Professor, Illinois Institute of Technology

Pace-Matching Data Access: A Dynamic, Multi-layered Memory System Design

2:00pm – 3:00pm (PG-6 116)

3:00pm – 3:30pm              Coffee Break

Faculty Research Presentations

3:30pm – 4:30pm (PG-6 116)

Student Poster Session

4:30pm – 5:45pm (PG-6 116)

Closing Remarks and Announcement of Best Poster Award

5:45pm – 6:00pm (PG-6 116)

Special Guest Speaker

Xian-He Sun

Illinois Institute of Technology

Abstract

Computing has changed from compute-centric to data-centric. From deep learning to visualization, data access has become the main performance concern of computing. In this talk, based on a series of fundamental results and their supporting mechanisms, we introduce a new way of thinking about memory system design. We first present the Concurrent-AMAT (C-AMAT) data access model to quantify the unified impact of data locality, concurrency, and overlapping. Then, we introduce the pace-matching data-transfer design methodology to optimize memory system performance. Based on the pace-matching design, a memory-computing hierarchy is built to generate and transfer the final results and to mask the performance gap between computing and data transfer. C-AMAT is used to optimize performance at each memory layer, and a global management algorithm, named Layered Performance Matching (LPM), is developed to optimize the overall performance of the memory system. The holistic pace-matching optimization is very different from conventional locality-based system optimization and can reduce memory-wall effects to a minimum. Experimental testing confirms the theoretical findings, with a 150x reduction of memory stall time. We will present the concept of the pace-matching data-transfer design, the design of C-AMAT and LPM, and experimental case studies and results on DoE and NASA applications. We will also discuss optimization and research issues related to pace-matching data access and to memory systems in general.
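
For readers unfamiliar with the notation, a rough sketch of the idea in equations. The C-AMAT form below is paraphrased from Dr. Sun's published work, and the symbols (hit time H, hit concurrency C_H, pure miss rate pMR, pure average miss penalty pAMP, pure miss concurrency C_M) should be checked against the talk:

```latex
% Classical average memory access time (AMAT):
\mathrm{AMAT} = H + \mathrm{MR} \times \mathrm{AMP}

% C-AMAT folds concurrency into each term (paraphrased from the literature):
\text{C-AMAT} = \frac{H}{C_H} + \mathrm{pMR} \times \frac{\mathrm{pAMP}}{C_M}
```

Roughly, as we understand it, pace-matching then tunes each layer so that the data-supply rate implied by its C-AMAT matches the request rate of the layer above, rather than maximizing locality at each layer in isolation.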


Bio

Dr. Xian-He Sun is a University Distinguished Professor of Computer Science in the Department of Computer Science at the Illinois Institute of Technology (IIT). He is the director of the Scalable Computing Software laboratory at IIT and a guest faculty member in the Mathematics and Computer Science Division at Argonne National Laboratory. Before joining IIT, he worked at the DoE Ames National Laboratory; at ICASE, NASA Langley Research Center; and at Louisiana State University, Baton Rouge; and was an ASEE fellow at the Naval Research Laboratory. Dr. Sun is an IEEE fellow and is known for his memory-bounded speedup model, also called Sun-Ni's Law, for scalable computing. His research interests include data-intensive high-performance computing, memory and I/O systems, software systems for big data applications, and performance evaluation and optimization. He has over 250 publications and 6 patents in these areas. He is the Associate Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems, a Golden Core member of the IEEE Computer Society, a former vice chair of the IEEE Technical Committee on Scalable Computing, the past chair of the Computer Science Department at IIT, and serves or has served on the editorial boards of leading professional journals in the field of parallel processing. More information about Dr. Sun can be found at his web site: www.cs.iit.edu/~sun/.

Speakers

Hadi Amini

Assistant Professor

Trevor Cickovski

Instructor

Christine Lisetti

Associate Professor

Jason Liu

Professor & Graduate Program Director

Ananda Mondal

Assistant Professor

Fahad Saeed

Associate Professor

Presentations

Faculty Research Presentations

The human is a complex organism, possessing a genome more than 3 GB in length that produces a specific set of molecules which, directly or indirectly through chemical reactions, influence metabolic (or life-sustaining) activity. Complicating this picture is that only about half of the cells in our bodies contain our own DNA; the other half can be attributed to the microbiota that comprise our microbiome. Each microbe contains its own genetic material that can influence itself, other microbes, and its human host. Our research analyzes the microbiome from many perspectives, including composition, ecology (through microbial "social networks"), and multi-omics analysis that accounts for underlying chemical dynamics. Through a flexible and lightweight software package, PluMA, with an open-source pool of plugins in various programming languages, we facilitate the free and open exchange of ideas and analysis algorithms, with the ultimate goal of discovering larger-scale implications for human health.

Over the past decades, simulation-based virtual training environments (VTEs) have been successful at providing a hands-on training experience for a variety of technical skills in real-life situations that are impossible, dangerous, or too costly to reproduce (e.g., piloting a plane, responding to dangerous chemical accidents). VTEs provide a safe environment where errors committed inside the virtual environment have no impact in real life and can be repeated at will until learning is achieved. Recent advances in virtual intelligent agents and affective computing have made it possible to also support the training of social skills using VTEs. The study of affective computing, i.e., using computers to display, perceive, or adapt to human emotions and affect, is an emerging field in computer science that introduces many research questions for its use with VTEs. In this talk I will give a brief overview of how the VISAGE lab approaches a few of these research questions in the domain of teacher training, including the impact of using different Virtual Reality (VR) platforms on user experience (VR headsets, SCIS I-CAVE), and of generating realistic displays of emotional expressions when users train their social skills with VTEs. Virtual social agent technologies are also anticipated to replace the existing desktop interface metaphor (in which the user's computer monitor is treated as the user's desk where documents and folders can be placed) for a variety of application domains involving socio-emotional content (e.g., health counseling, educational games). In this talk, I will describe a computational multimodal framework that enables the creation of social virtual agents able to communicate with users verbally, with spoken utterances and speech recognition, as well as non-verbally, with facial expressions and gestures. I will give a brief overview of how the VISAGE lab is designing and evaluating virtual agents to derive generalizable design principles for virtual health assistants, which have the potential to transform access to and delivery of effective healthcare interventions for diverse populations.

The Sustainability, Optimization, and Learning for InterDependent networks laboratory (solid lab) is an interdisciplinary research group bridging the gap between theory and the real world. Interdependent networks require highly efficient computational algorithms to deal with large-scale optimization, learning, and ultimately intelligent decision-making problems. In this introductory talk, I will explain how we deploy our solid mathematical background to develop advanced machine learning and distributed decision-making algorithms tailored for real-world applications. I will further explain use cases, e.g., device-free sensing ("learning to sense rather than sensing to learn") and intelligent transportation networks, and share some updates about our recent findings.

The question we are trying to address is: "Can we find the trajectory of cancer development?" In other words, "Can we decode the cancer dynamics?" To decipher the cancer dynamics, longitudinal or time-series omics data for the same, reasonably large cohort of patients, from initiation through each stage to metastasis, are necessary. But no such temporal data are available for cancer patients who progress to metastasis. Recent studies show that single-cell gene expression data with no temporal information can be analyzed to discover the mechanism of cell development by inferring pseudotime. Our preliminary results show that it is possible to infer pseudotime using static omics data of cancer patients from The Cancer Genome Atlas repository. In this presentation, the challenges of deciphering cancer dynamics from static omics data will be addressed by outlining a pseudotime-based computational framework. Some results related to cancer heterogeneity discovered using machine learning techniques, and protein network modules for cancer discovered using graph-theoretic approaches, will be shared as well.

Big data acquired from high-throughput instruments requires computational methods to make biological sense of the data sets. Our lab's long-term goal is to develop open-source, integrative, and scalable machine-learning strategies leveraging various types of biological and clinical datasets. We have designed and implemented various machine-learning algorithms and parallel algorithms that can run on XSEDE supercomputers for processing big proteomics and fMRI data. Interesting results will be discussed in the talk.

ModLab at FIU conducts research in performance modeling and simulation of complex systems. Many of these systems, such as computer networks (including cyberinfrastructures, mobile/wireless networks, and datacenter networks) and parallel and distributed systems (including high-performance computing systems, cloud-based environments, and operating systems), involve a large number of interconnected components and sophisticated processes. Complex behaviors emerge as these components and processes inter-operate across multiple scales and granularities and involve different layers of the system hierarchy. Performance modeling and simulation must be able to provide sufficiently accurate results while coping with this scale and complexity. This talk will provide an overview of ModLab's related research, particularly in large-scale simulation and in the performance modeling and evaluation of these systems.

Student Research Presentations

Characters are a key element of narrative, and so character identification plays an important role in automatic narrative understanding. Unfortunately, most prior work that incorporates character identification is not built upon a clear, theoretically grounded concept of character. Prior approaches either take character identification for granted (e.g., using simple heuristics on referring expressions) or rely on simplified definitions that do not capture important distinctions between characters and other referents in the story. Prior approaches have also been rather complicated, relying, for example, on predefined case bases or ontologies. In this paper, we propose a narratologically grounded definition of character for discussion at the workshop, and also demonstrate a preliminary yet straightforward supervised machine learning model with a small set of features that performs well on two corpora. The most important of the two corpora is a set of 46 Russian folktales, on which the model achieves an F1 of 0.81. Error analysis suggests that features relevant to the plot will be necessary for further improvements in performance.

In order to protect user data while maintaining application functionality, encrypted databases can use specialized cryptography such as property-revealing encryption, which allows a property of the underlying plaintext values to be computed from the ciphertext. One example is deterministic encryption, which ensures that the same plaintext encrypted under the same key will produce the same ciphertext. This technology enables clients to make queries on sensitive data hosted in a cloud server and has considerable potential to protect data. However, the security implications of deterministic encryption are not well understood. We provide a leakage analysis of deterministic encryption through the application of the framework of quantitative information flow. A key insight from this framework is that there is no single "right" measure by which leakage can be quantified: information flow depends on the operational scenario, and different operational scenarios require different leakage measures. We evaluate leakage under three operational scenarios, modeled using three different gain functions, under a variety of prior distributions in order to bring clarity to this problem.
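
As a concrete illustration of the quantitative-information-flow framing (not the paper's actual models), here is a minimal sketch of prior and posterior g-vulnerability for a toy deterministic channel; the prior, channel, and gain matrix below are all invented examples:

```python
import numpy as np

# prior over 4 possible plaintext values
prior = np.array([0.4, 0.3, 0.2, 0.1])

# channel[x, y] = P(observation y | secret x); a toy deterministic channel
# standing in for what an adversary observes about deterministically
# encrypted values (e.g., a ciphertext-equality pattern)
channel = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 1.0]])

def g_vulnerability(prior, gain):
    """Prior g-vulnerability: V_g(pi) = max_w sum_x pi(x) g(w, x)."""
    return max(gain[w] @ prior for w in range(gain.shape[0]))

def posterior_g_vulnerability(prior, channel, gain):
    """Posterior: V_g[pi, C] = sum_y max_w sum_x pi(x) C(x, y) g(w, x)."""
    joint = prior[:, None] * channel          # p(x, y)
    return sum(max(gain[w] @ joint[:, y] for w in range(gain.shape[0]))
               for y in range(channel.shape[1]))

# the identity gain function recovers Bayes vulnerability (one-guess adversary);
# other operational scenarios plug in other gain matrices
gid = np.eye(4)
leakage = posterior_g_vulnerability(prior, channel, gid) / g_vulnerability(prior, gid)
print(leakage)   # multiplicative g-leakage under this toy scenario (1.5)
```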

Psychiatric evaluation reports represent a rich and still mostly untapped source of information for developing systems for automatic diagnosis and treatment of mental health problems. These reports contain free text structured within sections using a convention of headings. We present a model for automatically detecting the position and type of different psychiatric evaluation report sections. We developed this model using a corpus of 150 sample reports that we gathered from the Web, using sentences as the processing unit and section headings as labels of section type. From these labels we generated a unified hierarchy of section-type labels, and then learned n-gram models of the language found in each section. To model conventions for section order, we integrated these n-gram models with a Hierarchical Hidden Markov Model (HHMM) representing the probabilities of observed section orders found in the corpus, and then used this HHMM n-gram model in a decoding framework to infer the most likely section boundaries and section types for documents with their section labels removed. We evaluated our model over two tasks, namely, identifying section boundaries and identifying section types and orders. Our model significantly outperformed baselines on each task, with an F1 of 0.88 for identifying section types, and WindowDiff (Wd) and Pk scores of 0.26 and 0.20, respectively, for identifying section boundaries.
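
To make the decoding framework concrete, here is a toy sketch of Viterbi decoding over sentences, with per-section unigram emission scores standing in for the paper's n-gram models and a flat transition table standing in for the HHMM; every probability below is invented for illustration:

```python
import math

# toy unigram "language models" per section type and a flat transition table;
# the paper learns these from the corpus, and uses n-grams plus an HHMM
lm = {"HISTORY": {"patient": 0.05, "reports": 0.04, "denies": 0.03},
      "MENTAL_STATUS": {"affect": 0.05, "mood": 0.05, "oriented": 0.04}}
trans = {("HISTORY", "HISTORY"): 0.8, ("HISTORY", "MENTAL_STATUS"): 0.2,
         ("MENTAL_STATUS", "MENTAL_STATUS"): 0.9, ("MENTAL_STATUS", "HISTORY"): 0.1}

def emit(label, sentence, floor=1e-4):
    """Log-probability of a sentence under a section type's unigram model."""
    return sum(math.log(lm[label].get(w, floor)) for w in sentence.split())

def viterbi(sentences, labels=("HISTORY", "MENTAL_STATUS")):
    """Most likely section label per sentence under the toy HMM."""
    V, back = [{l: emit(l, sentences[0]) for l in labels}], []
    for s in sentences[1:]:
        row, ptr = {}, {}
        for l in labels:
            prev = max(labels, key=lambda p: V[-1][p] + math.log(trans[(p, l)]))
            row[l] = V[-1][prev] + math.log(trans[(prev, l)]) + emit(l, s)
            ptr[l] = prev
        V.append(row); back.append(ptr)
    path = [max(labels, key=lambda l: V[-1][l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["patient reports low mood", "affect flat oriented x3"]))
```

Section boundaries then fall wherever the decoded label changes between consecutive sentences.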

Qualitative temporal graphs derived from annotations on text (e.g., TimeML annotations) explicitly reveal only partial orderings of events and times. For many purposes, however, a total ordering (i.e., a timeline) is more useful. We adapt prior work on solving temporal constraint problems to the task of extracting multiple timelines from TimeML annotated texts and demonstrate, for the first time, an exact, end-to-end solution, which we call TLEX (TimeLine EXtraction). TLEX transforms TimeML annotations into a collection of main and subordinated timelines arranged into a trunk-and-branch style structure. As has been done in prior work, our method checks the consistency of the annotations; however, we go further by showing how to identify the specific relations involved in an inconsistency. After manually correcting inconsistencies, we identify sections of timelines that have indeterminate order, which is information that is critical for downstream tasks such as aligning events from different timelines. We provide formal justification for TLEX's correctness, and we also conduct an experimental evaluation by applying TLEX to 385 TimeML annotated texts from four corpora. We show that 72 of the texts are inconsistent, 181 of them have more than one main timeline, and there are 2,541 indeterminate sections across all four corpora. A sampling evaluation revealed that TLEX is 98-100% accurate with 95% confidence for five aspects of its output: the ordering of time points, the number of main timelines, the placement of time points on main versus subordinate timelines, the attachment point of subordinated timelines, and the location of the indeterminate sections. We provide our implementation, the extracted timelines for all texts, and the manual corrections to the temporal graphs of the inconsistent texts.
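
A minimal sketch of the consistency-check and ordering ideas (not TLEX itself), using networkx on an invented "before" graph:

```python
import networkx as nx

# toy "before" relations, as might be extracted from TimeML-style annotations
relations = [("e1", "e2"), ("e2", "e3"), ("e1", "e4")]
G = nx.DiGraph(relations)

# consistency check: a cycle in the before-graph means inconsistent annotations
if not nx.is_directed_acyclic_graph(G):
    raise ValueError("inconsistent annotations: " + str(list(nx.find_cycle(G))))

timeline = list(nx.topological_sort(G))   # one admissible total order
print(timeline)

# pairs with no path in either direction are indeterminate: the annotations
# do not fix their relative order (TLEX reports these as indeterminate sections)
closure = nx.transitive_closure(G)
indeterminate = [(a, b) for a in G for b in G if a < b
                 and not closure.has_edge(a, b) and not closure.has_edge(b, a)]
print(indeterminate)   # here: e3 vs e4, e2 vs e4
```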

Identifying feelings through the expression of text (aka text emotion detection) has recently received a lot of attention in the text mining community. Emotion is a primary aspect of communication, which consists of gestures, speech, facial expressions, and textual data. Applications of emotion analysis include gaining insight into public opinion on various socio-political subjects (e.g., via analyzing tweets), narrative, observing emotions in sports, politics, humanitarian assistance and disaster relief (HADR), the navy, terrorist attacks, automated analysis of historical corpora, and the study of product reviews for accurate customer sales prediction. A major contribution of this paper is an unsupervised learning framework for automating the process of extracting the emotions of sentences in a given corpus. The proposed generalized approach utilizes a vector space model, assigning each sentence a vector of weights quantifying the impact of the different words present in that sentence using tf-idf weighting. The key module of this framework, called the dimension reducer, uses a hybrid of off-the-shelf feature selection and feature extraction tools to reduce the dimension of the constructed vector space model and remove noise, outliers, and imperfections from the model. Finally, the proposed framework has a labeling module that utilizes similarity measures such as cosine similarity and kNN to label sentences by evaluating how similar they are to standard vectors generated from a given emotion lexicon.
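
A small sketch of the vector-space idea using scikit-learn: tf-idf vectors for sentences compared by cosine similarity against vectors built from an emotion lexicon. The two-entry lexicon here is a toy stand-in for a real lexicon, and the dimension-reducer module is omitted:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["the team celebrated the stunning victory",
          "the storm destroyed homes and left families in grief"]
# tiny stand-in emotion lexicon; a real lexicon (e.g., NRC) would be used
lexicon = {"joy": "happy celebrate victory delight",
           "sadness": "grief loss cry mourn destroyed"}

vec = TfidfVectorizer()
X = vec.fit_transform(corpus + list(lexicon.values()))
sents, emos = X[:len(corpus)], X[len(corpus):]

sims = cosine_similarity(sents, emos)     # sentence x emotion similarity
for sent, row in zip(corpus, sims):
    print(list(lexicon)[row.argmax()], "<-", sent)
```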

Motifs are distinctive recurring elements found in folklore that are used to categorize and track tales between cultural groups and through time. Motifs do not just occur in folklore: well-known examples in modern usage include a troll under a bridge, a glass slipper, or a nuclear button. Motifs are often connected to a constellation of associations within the group that recognizes them: much is known about the troll, the bridge, and what happens when a traveler encounters said troll under the bridge. Tracking motifs gives us the ability to model how information relevant to specific cultural groups spreads throughout networks. We present our ongoing work on identifying salient motifs for specific cultural groups, determining the cognitive effect of motifs, and detecting motifs in narratives.

Several studies have focused on the microbiota living in environmental niches, including human body sites. In many of these studies researchers collect longitudinal data with the goal of understanding not just the composition of the microbiome but also the interactions between the different taxa. However, analysis of such data is challenging, and very few methods have been developed to reconstruct dynamic models from time-series microbiome data. Here we present a computational pipeline that enables the integration of data across individuals for the reconstruction of such models. Our pipeline starts by aligning the data collected for all individuals. The aligned profiles are then used to learn a dynamic Bayesian network which represents causal relationships between taxa and clinical variables. Testing our methods on three longitudinal microbiome data sets, we show that our pipeline improves upon prior methods developed for this task. We also discuss the biological insights provided by the models, which include several known and novel interactions. The extended CGBayesNets package is freely available under the MIT Open Source license agreement. The source code and documentation can be downloaded from: https://github.com/jlugomar/longitudinal_microbiome_analysis_public. We propose a computational pipeline for analyzing longitudinal microbiome data. Our results provide evidence that microbiome alignments coupled with dynamic Bayesian networks improve predictive performance over previous methods and enhance our ability to infer biological relationships within the microbiome and between taxa and clinical factors.

The Internet of Things (IoT) has been expanding worldwide over the past decade. However, security and privacy leakage from IoT devices has surfaced as a major flaw for IoT device operators. An attacker may use network traffic data to identify IoT devices and launch attacks on target devices. To explore the severity and extent of this privacy threat, we design a hybrid ML-based IoT device identification framework. Our key insight is that an IoT device typically has a unique traffic signature that is already embedded in its network traffic. Unlike other existing work using complex modeling, we show that the majority of IoT devices can be easily identified using our empirical models, and that the remaining devices can be correctly classified using our ML-based models. We instrument a smart IoT experimental environment to verify and evaluate our approaches. Our framework paves the way for operators of smart homes to monitor functionality, security, and privacy threats without requiring any additional devices.
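
A schematic of the ML-based side of such a framework, with synthetic stand-in flow features and hypothetical device classes (the actual framework's features and models are in the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic stand-in for per-flow features (e.g., mean packet size, packet
# rate, mean inter-arrival time, distinct destination ports)
rng = np.random.default_rng(0)
y = rng.integers(0, 5, size=600)            # 5 hypothetical device classes
centers = rng.normal(scale=3, size=(5, 4))  # each device type has a signature
X = centers[y] + rng.normal(size=(600, 4))  # noisy observations of it

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # device identification accuracy
```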

SSDs are taking over in storage because of lower latency, higher bandwidth, and recent price drops. In the absence of moving mechanical parts, SSDs are also expected to be more durable than HDDs. But in terms of reliability in extreme and harsh environments, little is known about SSDs beyond the vendor-supplied information. In this work, we investigate the impact of temperature and humidity on the performance of SSDs in a systematic way.

As robots become more sophisticated, widespread in use, and critical for humans, security and privacy issues become significant concerns. This is especially important in scenarios where non-cooperating robots and humans need to solve tasks collectively without revealing their private information. In our work, we investigate a case study where a group of n robots needs to gather information collected by a set of m deployed sensors. Two players, Alice and Bob, representing entities with conflicting interests (e.g., two rival companies), need to deploy the robots and learn a model from the information collected from the sensors without revealing their private information. Our methodology consists of the following steps: 1) a privacy-preserving task allocation algorithm is proposed to assign sensors to tasks; 2) secure path collision checking between robots is executed to ensure that the robots do not collide without revealing their information; 3) a privacy-preserving learning algorithm is used to allow Alice and Bob to learn a model of the collected data without leaking excessive information. We implemented our ideas in software and analyzed the results of an initial set of experiments.

Sequencing peptides from tandem MS/MS spectra data using computational methods is a critical step in proteomics research. The most commonly employed computational method involves searching the experimentally obtained tandem MS/MS data against a set of simulated MS/MS spectra generated from a protein sequence database. Modern mass spectrometers produce data at astonishing velocity and volume, whereas peptide sequencing speed is bottlenecked by the computation and memory resources available for data processing and searching. Further, most existing state-of-the-art peptide search algorithms and software have not been designed to utilize, and efficiently scale up with, increasing available system resources. In this work, we present a novel software framework, HPC-PCDSFrame, for extreme-scale peptide sequencing in high-performance computing (HPC) environments. Our proposed framework features several new and improved algorithms to efficiently exploit both task and data parallelism in peptide sequencing and extract maximum throughput from the HPC system. The featured algorithms include optimum hybrid MPI/OpenMP job geometry construction, compute load balancing, memory-efficient indexing, double buffering, queuing, and dynamic task scheduling. Our experimental results demonstrate ultrafast peptide sequencing and superior speedup efficiency for HPC-PCDSFrame on the Comet HPC cluster (an XSEDE supercomputer). HPC-PCDSFrame has been implemented in C++ and Python using MPI and OpenMP libraries.

A generative adversarial network (GAN) is a kind of neural network that can generate new data resembling a given dataset, while large-scale data on the anatomical structure of human brains is very expensive to acquire. How to use GANs to generate human brain data is therefore a valuable problem. Here, we use conformal welding signature curves (CWSC) of the brain as an example to show our work on using an AC-GAN to generate structured experimental data. A novel normalization method is involved. To show the effectiveness of our method, we present the results of a brain-classification experiment based on IQ labels, which show that the generated data help the classifier network improve its performance.

The Named Data Networking architecture mandates cryptographic signatures of packets at the network layer. Traditional RSA and ECDSA public key signatures require obtaining the signer's NDN certificate (and, if needed, the next-level certificates of the trust chain) to validate the signatures. This potentially creates two problems. First, the communication channels must be active in order to retrieve the certificates, which is not always the case in disruptive and ad hoc environments. Second, the certificate identifies the individual producer, and thus producer anonymity cannot be guaranteed when necessary. In this paper, we present NDN-ABS, an alternative NDN signature design based on attribute-based signatures, to address both of these problems. With NDN-ABS, data packets can be verified without any network retrieval (provided the trust anchor is pre-configured), and attributes can be designed to identify only application-defined, high-level producer anonymity sets, thus ensuring individual producers' anonymity. The paper uses an illustrative smart-campus environment to define and evaluate the design and highlight how the NDN trust schema can manage the validity of NDN-ABS signatures. The paper also discusses the performance limitations of ABS and potential ways they can be overcome in a production environment.

Finding the biomarkers of cancers and analyzing the cancer-driving genes involved in these biomarkers are essential for understanding the dynamics of cancer. Clusters of genes in co-expression networks are commonly used as functional units. This work is based on the hypothesis that dense clusters or communities in the gene co-expression networks of cancer patients may represent functional units in cancer initiation and progression. The conserved communities in the gene co-expression networks of three cancers (breast, brain, and colon) are extracted using seven state-of-the-art community detection algorithms. Survival analysis based on the genes of the conserved communities shows that some of them can predict the survival risk of patients; these are considered biomarkers of cancer. Finally, the discovered biomarkers are validated using survival analysis on two completely different cancers (glioma and ovarian cancer) that were not used in determining the communities.

Long noncoding RNA (lncRNA) plays key roles in tumorigenesis. Misexpression of lncRNA can lead to changes in the expression profiles of various target genes, which are involved in cancer initiation and progression. Thus, identifying key lncRNAs for a cancer would help develop therapies for that cancer. Usually, to identify key lncRNAs for a cancer, expression profiles of lncRNAs for both normal and cancer samples are required, but this kind of data is not available for all cancers. In the present study, a computational framework is developed to identify cancer-specific key lncRNAs using the lncRNA expression of cancer patients only. The framework consists of two state-of-the-art feature selection techniques, Recursive Feature Elimination (RFE) and the Least Absolute Shrinkage and Selection Operator (LASSO), and five machine learning models: Naive Bayes, K-Nearest Neighbor, Random Forest, Support Vector Machine, and Deep Neural Network. For the experiments, expression values of lncRNAs for 8 cancers (BLCA, CESC, COAD, HNSC, KIRP, LGG, LIHC, and LUAD) from TCGA are used. The combined dataset consists of 3,656 patients with expression values of 12,309 lncRNAs. Important features, or key lncRNAs, are identified using the feature selection algorithms RFE and LASSO. The capability of these key lncRNAs to classify the 8 different cancers is assessed via the performance of the five classification models. This study identified 37 key lncRNAs that can classify 8 different cancer types with an accuracy ranging from 94% to 97%. Finally, survival analysis supports that the discovered key lncRNAs are capable of differentiating between high-risk and low-risk patients.
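
For illustration, a sketch of the LASSO-style selection step on synthetic data shaped like a (much smaller) version of the TCGA matrix; the real study also uses RFE and five classifiers, and everything below is a toy stand-in:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic stand-in: patients x lncRNA expression values, 8 cancer labels
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1000))      # 300 patients, 1,000 lncRNAs (toy scale)
y = rng.integers(0, 8, size=300)      # 8 cancer-type labels
X[:, :20] += y[:, None] * 0.5         # plant signal in 20 "key" lncRNAs

# an L1 (LASSO-style) penalty drives most coefficients to zero,
# leaving a small set of selected features
lasso = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
keys = np.flatnonzero(SelectFromModel(lasso).fit(X, y).get_support())
print(len(keys), "key lncRNAs selected")

# check how well the selected features alone classify the cancer types
clf = RandomForestClassifier(random_state=0)
print(cross_val_score(clf, X[:, keys], y, cv=3).mean())
```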

As solar module prices decline, more and more people are installing solar panel energy systems. To support monitoring and analysis, the energy generation and consumption data are transmitted over the Internet and stored in the cloud, sometimes even made available to the public. Such energy data can leak private information about occupancy and enable multiple attacks. These attacks are often assumed to be avoidable because the solar energy data are "anonymous," meaning the data are not connected to account information such as a name and address; thus, solar-powered home energy data are often not treated as sensitive. Our key insight is that solar energy data are not anonymous: every location on Earth has a unique solar and weather signature. We design a system to localize "anonymous" solar-powered homes. We first localize the source home to a small region of interest by inferring the latitude and longitude from information inherently embedded in the solar data. We then identify solar-powered homes within this region using satellite image processing, extracting and detecting rooftop solar deployments with convolutional neural networks (CNNs) to identify a specific home without extra cost.

Recognizing the internal structure of events is a challenging language processing task of great importance for text understanding. We present a supervised model for automatically identifying when one event is a subevent of another. Building on prior work, we introduce several novel features, in particular discourse and narrative features, that significantly improve upon prior state-of-the-art performance. Error analysis further demonstrates the utility of these features. We evaluate our model on the only two annotated corpora with event hierarchies: HiEve and the Intelligence Community corpus. No prior system has been evaluated on both corpora. Our model outperforms previous systems on both corpora, achieving 0.74 BLANC F1 on the Intelligence Community corpus and 0.70 F1 on the HiEve corpus, respectively a 15 and 5 percentage point improvement over previous models.

Key-Value Solid State Drive (KV SSD), a key-addressable SSD technology, promises to simplify storage management for unstructured data and improve system performance with minimal host-side intervention. This is achieved by streamlining the software storage stack and removing redundancy. However, the proposed KV stack is lacking in features. The absence of one-to-many and many-to-one key-value mappings in KV SSDs impedes optimal system performance. Key-value size limitations also hinder maximal capacity utilization of the KV storage device. In addition, the current index structure and key-value mapping of KV SSDs raise data reliability issues. We propose to address these challenges by optimizing the internal data organization and management of the KV storage device. We plan to implement the proposed schemes inside an emulator and evaluate the solutions with different applications through KVBench, a KV stack performance benchmarking suite, on both block-based SSDs and KV SSDs.

Graphics Processing Units (GPUs) have become increasingly popular owing to their ability to accelerate the training of many deep learning models. However, the total cost of ownership of GPU-accelerated deep learning infrastructure is much higher, which brings us to the question: can we train deep learning models using existing resources so that training becomes more affordable? This also raises the issue of fault tolerance. Spark is not a feasible framework for such a system, as several layers of indirection adversely affect performance. Therefore, we take advantage of Docker containers and MXNet on top of a Yarn cluster to improve training performance. In this work, we also aim to investigate and study several distinct issues that affect cluster utilization for deep learning training workloads, such as how to orchestrate deep learning parameter servers and workers over containers and virtual machines, and the correlation between the number of workers and the number of containers. Based on our experience running a large-scale operation, we propose a new framework for constructing a resource-efficient, fault-tolerant platform for deep learning applications. In the future, we plan to explore scheduling and migration of running deep learning training jobs. Apart from that, we will further explore the acceleration of deep learning using SSDs.

The pace of evolution and advancement in an industry is directly proportional to the amount of relevant research studying the practices and areas affecting the development and growth of that domain. This paper applies that idea to the field of travel and hospitality by gauging the difference between research topics and industry practices in the restaurant (beverage) industry. Using NLP, we evaluate the gap between the interests of the restaurant industry and the research performed in academia. This involves evaluating a dataset consisting of 5 years of records extracted from the National Restaurant Association's SmartBrief email archive and comparing them with the research published in the International Journal of Hospitality Management for the relevant years. To establish a baseline characterization of the gap, we use NLP techniques including key-phrase analysis, topic modelling, word embeddings, and similarity assessment. We also analyze how the discussion topics vary over time, in both industry and academia, and identify which topics are trending in a particular time frame.

Longitudinal or time-series omics data for the same, reasonably large cohort of patients are necessary to understand cancer dynamics and development, but no such data are available for cancer. The goal of this study is to order breast cancer samples so as to construct the trajectory of cancer development. This goal is achieved through two objectives: first, developing the trajectory of samples in each of the four stages separately; second, combining the per-stage trajectories to find the complete trajectory. To accomplish the first objective, a statistical approach, t-distributed stochastic neighbor embedding (t-SNE), is used to reduce the high dimension of the static mRNA expression dataset. Then principal curve fitting is employed on the dataset with 3 t-SNE components to infer the pseudotime of the cancer samples in each stage. Finally, the trajectory is developed by ordering the samples based on normalized pseudotime ranging between 0 and 1. To decipher the heterogeneity within each stage, samples are divided into two time periods: 0.0 to 0.5 and 0.5 to 1.0. Then k-means and the gap statistic are used to find the clusters, representative of cancer heterogeneity, in each time period. Finally, the samples in each cluster are analyzed with respect to key genes related to breast cancer. The developed computational framework has the potential to analyze cancer heterogeneity in detail, which in turn will help develop therapeutic prevention plans and drug design at a personalized level.
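
A rough sketch of the embedding-plus-ordering pipeline on synthetic data; note that the principal-curve step is approximated here by projection onto the embedding's first principal axis, which is a simplification of what the abstract describes:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA

# synthetic stand-in for stage-wise mRNA expression (samples x genes),
# generated so that a hidden progression variable drives the expression
rng = np.random.default_rng(0)
t_true = rng.uniform(size=200)
X = np.outer(t_true, rng.normal(size=50)) + rng.normal(scale=0.3, size=(200, 50))

emb = TSNE(n_components=3, random_state=0).fit_transform(X)

# crude stand-in for principal-curve fitting: order samples along the first
# principal axis of the embedding, then normalize to [0, 1] as pseudotime
axis = PCA(n_components=1).fit_transform(emb).ravel()
pseudotime = (axis - axis.min()) / (axis.max() - axis.min())
order = np.argsort(pseudotime)            # inferred trajectory of samples
print(pseudotime[order][:5], pseudotime[order][-5:])
```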

Black Hat App Search Optimization (ASO), in the form of fake reviews and sockpuppet accounts, is prevalent in peer-opinion sites, e.g., app stores, with negative implications for the digital and real lives of their users. To detect and filter fraud, a growing body of research has provided insights into various aspects of fraud-posting activities and made assumptions about the working procedures of fraudsters based on online data. However, such assumptions often lack empirical evidence from the actual fraud perpetrators. To address this problem, in this paper we present the results of both a qualitative study with 18 ASO workers recruited from 5 freelancing sites, concerning activities they performed on Google Play, and a quantitative investigation with fraud-related data collected from 39 other ASO workers. We reveal findings concerning various aspects of ASO worker capabilities and behaviors, including novel insights into their working patterns and supporting evidence for several existing assumptions. Further, we found and report participant-revealed techniques to bypass Google-imposed verifications, concrete strategies to avoid detection, and even strategies that leverage fraud detection to enhance fraud efficacy. We report a Google site vulnerability that enabled us to infer the mobile device models used to post more than 198 million reviews in Google Play, including 9,942 fake reviews. We discuss the deeper implications of our findings, including their potential use in developing the next generation of fraud detection and prevention systems.

Recent advances in machine learning open up new and attractive approaches for solving classic problems in computing systems. For storage systems, cache replacement is one such problem because of its enormous impact on performance. We model workloads as a composition of four workload primitive types: LFU-friendly, LRU-friendly, scan, and churn. We then design and evaluate Cacheus, a new class of fully adaptive, machine-learned caching algorithms that utilize lightweight experts carefully designed to address these workload primitive types. The lightweight experts used by Cacheus are SR-LRU, a scan-resistant version of LRU, and CR-LFU, a churn-resistant version of LFU. We evaluate Cacheus using 13,500 simulation experiments on a collection of 250 workloads run against 6 different cache configurations that are sized relative to the individual workload sizes, comparing its performance against state-of-the-art caching algorithms. We conduct pairwise t-tests comparing Cacheus against 90 different combinations of algorithm, workload collection, and cache size. With a p-value threshold of 0.05, Cacheus using the newly proposed lightweight experts, SR-LRU and CR-LFU, does significantly better than the best-performing competitor in 40% of the tests, is indistinguishable from the best in 47%, and is significantly worse than the best in the remaining 13%. Moreover, for the 13% of cases where an algorithm other than Cacheus is found to be distinctly better, no single algorithm is found to be consistently the best, indicating that Cacheus should be the algorithm of choice.

Simulation-based training systems have proven effective in a variety of domains, both for facilitating the learning of skills and for applying this knowledge to real life. Although difficulty managing students' disruptive behavior in classrooms has been identified as one of the main causes of teacher turnover, only a handful of virtual training environments have focused on providing training to teachers, and no clear methodologies yet exist for their design, implementation, or evaluation. In this article we discuss the methodologies employed by an interdisciplinary team of computer science and education researchers involved in the development of the first of four iterative, increasingly sophisticated prototypes of a web-based 3D Interactive Virtual Training Environment for Teachers (IVT-T). IVT-T simulates students with disruptive behaviors that teachers can interact with in a 3D virtual classroom, which gives teachers practice in managing classrooms, as well as feedback and reflection opportunities about their classroom behavior management skills. We describe the processes we conducted to derive the main system requirements for IVT-T 1.0 (the system is still evolving), which led to our suggestions for general requirements, in addition to the next lifecycle steps we identified for the successful implementation of the final system.

Conventionally, caching algorithms have been designed for the datapath: the levels of memory that data must pass through before it is made available to the CPU. Among the latest developments is attaching a fast device (such as an SSD) as a cache to the host that runs the application workload, creating host-side caches and opening up the possibility of non-datapath caches. Non-datapath caches are so called because they do not sit on the traditional datapath; they are instead optional memory locations for data. Being optional, a new choice becomes available to the caching algorithms that manage them: not caching at all, and instead bypassing the cache entirely for an access. With the option not to cache, items may never be inserted into the cache, an outcome beneficial to the lifetime of cache devices that can sustain only a limited number of writes before they degrade and become unusable. We propose ANX, a new approach for managing non-datapath caches. ANX takes into account that most storage workloads go through a series of phases during their execution, with access behavior that can vary significantly across these phases. ANX operates using a probabilistic metric called anxiety; it uses this probability to decide whether a request is admitted to the cache or not. This probability changes with the learned state of the workload. ANX uses an understanding of these workload states that is counter-intuitive compared to the methods caching algorithms typically employ. Most significantly, ANX is generic: it modifies any underlying caching policy to create a new cache replacement algorithm that is better suited for non-datapath caches.

Matrix multiplication is a fundamental operation in many machine learning algorithms. With dataset sizes increasing rapidly, it is now common practice to compute large-scale matrix multiplication on multiple servers, such that each server multiplies submatrices of the input matrices. As straggling servers are inevitable in a distributed infrastructure, various coding schemes have been proposed that deploy coded tasks encoded from the submatrices of the input matrices. The overall result can then be decoded from a subset of such coded tasks. However, as resources are shared with other jobs in a distributed infrastructure and their performance can change dynamically, the optimal way to encode the input matrices may also change over time. So far, existing coding schemes for matrix multiplication all require splitting the input matrices and encoding them in advance, and cannot change the coding scheme or adjust its parameters after encoding. In this paper, we propose a coding framework that can dynamically change the coding schemes and their parameters by re-encoding only local data in each task. We demonstrate that the original tasks can be quickly converted into new tasks while incurring only marginal overhead.
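
For background, a minimal numpy sketch of straggler-tolerant coded matrix multiplication in one dimension (a polynomial-code-style construction, not the dynamic re-encoding framework proposed here): A is split into m row blocks, each of several workers receives one coded block, and any m finished tasks suffice to decode.

```python
import numpy as np

m, workers = 4, 6            # split A into m row blocks; tolerate workers - m stragglers
rng = np.random.default_rng(0)
A, B = rng.normal(size=(8, 5)), rng.normal(size=(5, 3))
blocks = np.split(A, m)      # A_0 .. A_{m-1}
xs = np.arange(1, workers + 1)   # distinct evaluation points, one per worker

# encode: worker i computes A~(x_i) @ B, where A~(x) = sum_j A_j * x**j
tasks = {i: sum(blocks[j] * float(xs[i]) ** j for j in range(m)) @ B
         for i in range(workers)}

# decode from any m finished tasks (here: the first m), via a Vandermonde solve
done = sorted(tasks)[:m]
V = np.vander([xs[i] for i in done], m, increasing=True)
coeffs = np.linalg.solve(V, np.stack([tasks[i] for i in done]).reshape(m, -1))
C = np.vstack([c.reshape(-1, B.shape[1]) for c in coeffs])
assert np.allclose(C, A @ B)     # recovered while ignoring two stragglers
```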

Quantum computing leverages quantum mechanical phenomena to perform computations that are very difficult in classical computing. Quantum computers are predicated on qubit logic, in contrast to the bit logic of a classical computer, and thus are subject to decoherence and other quantum noise. The goal of this project is to study the dynamics of qubits by exercising an integrated device model with spatiotemporal-aware noise sources. The physical implementation of the model consists of a multiport Brune superconducting circuit, which can encode any device and include arbitrary descriptions of noise. The model is flexible enough to include the anharmonic effects from Josephson junctions needed to model qubits. We have tested and validated the model and obtained results in agreement with the expected theoretical results.

The rising growth of in-memory computing in data-intensive applications has increased the demand for DRAM, but DRAM is limited in terms of capacity, cost, and power consumption. Persistent memory devices offer a unique set of properties that make them well-suited as main-memory extension devices to address these limitations: they are byte-addressable, large in capacity, cheaper than DRAM, and consume less power. However, it is not clear how this new type of memory can best be used in existing systems. One promising direction is to introduce persistent memory devices as a second memory tier that is directly exposed to the CPU. This tiered design presents the key challenge of placing the right data in the right memory tier at the right time. Thus, we design MULTI-CLOCK, a tiered memory system wherein persistent memory and DRAM co-exist and provide improved application performance. MULTI-CLOCK is built upon the well-understood CLOCK page replacement algorithm to adaptively handle migration across memory tiers as workload access patterns change and to handle tier memory pressure. It is entirely transparent to, and backward compatible with, existing applications. Here, we discuss the motivation for MULTI-CLOCK, outline a system design, and present an initial evaluation of MULTI-CLOCK from the ongoing implementation in Linux. We are currently working on improving MULTI-CLOCK with a machine-learned approach that allows more responsive and dynamic tier placement of memory pages, optimizing overall system performance.
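
For readers unfamiliar with CLOCK, a single-tier sketch of the underlying replacement algorithm that MULTI-CLOCK builds on (the tiering and migration logic are the paper's contribution and are not shown):

```python
class Clock:
    """Minimal single-tier CLOCK replacement sketch."""
    def __init__(self, nframes):
        self.frames = [None] * nframes   # page in each frame (or None)
        self.ref = [0] * nframes         # reference bits
        self.hand = 0                    # the clock hand
        self.where = {}                  # page -> frame index

    def access(self, page):
        if page in self.where:           # hit: set the reference bit
            self.ref[self.where[page]] = 1
            return True
        while True:                      # miss: sweep, giving second chances
            if self.frames[self.hand] is None or self.ref[self.hand] == 0:
                victim = self.frames[self.hand]
                if victim is not None:
                    del self.where[victim]
                self.frames[self.hand] = page
                self.where[page] = self.hand
                self.ref[self.hand] = 1
                self.hand = (self.hand + 1) % len(self.frames)
                return False
            self.ref[self.hand] = 0      # clear bit and keep sweeping
            self.hand = (self.hand + 1) % len(self.frames)

cache = Clock(3)
for p in [1, 2, 3, 1, 4, 1]:
    print(p, "hit" if cache.access(p) else "miss")
```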

Feature selection is a significant pre-processing procedure for classification in supervised machine learning. It is mostly applied when the attribute set is very large, since a large set of attributes often tends to mislead the classifier. Extensive research has been performed to increase the efficacy of predictors by finding the optimal set of features. The feature subset should enhance classification accuracy through the removal of redundant features. We propose a new feature selection mechanism, an amalgamation of the filter and wrapper techniques that takes advantage of the benefits of both methods. Our hybrid model is based on a two-phase process: we first rank the features and then choose the best subset of features based on the ranking. We validated our model with various datasets, using multiple evaluation metrics, and compared and analyzed our results against previous work. The proposed model outperformed many existing algorithms and produced good results.
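
A generic sketch of the two-phase filter-then-wrapper pattern using scikit-learn (mutual information as the filter ranking, cross-validated forward growth along the ranking as the wrapper); the specific ranking criterion and search used in the proposed model may differ:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# phase 1 (filter): rank all features by mutual information with the label
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# phase 2 (wrapper): grow the subset along the ranking, keep the best CV score
clf = LogisticRegression(max_iter=5000)
best_k, best_score = 1, 0.0
for k in range(1, len(ranking) + 1):
    score = cross_val_score(clf, X[:, ranking[:k]], y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print(best_k, round(best_score, 3))   # chosen subset size and its CV accuracy
```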

Click fraud is a fast-growing cyber-criminal activity whose aim is deceptively clicking on advertisements to generate profit for the publisher or cause losses to the advertiser. Owing to the popularity of smartphones over the last decade, most modern advertising businesses have shifted their focus toward mobile platforms, and in-app advertising on mobile platforms is now the most targeted victim of click fraud. Malicious entities launch attacks by clicking ads to artificially increase the click rates of specific ads without any intention of using them for legitimate purposes. Fraudulent clicks are supposed to be caught by the ad providers as part of their service to advertisers; however, the current literature lacks research addressing and evaluating different techniques of click fraud detection and prevention. Another challenge in click fraud detection is that the attack model can itself be an active learning system (a smart attacker) that actively misleads the training process of the fraud detection model by polluting the training data.

With the growing complexity of IT service environments, unexpected problems happen daily, generating huge volumes of incident tickets. An interactive and analytical platform has become an urgent need for administrators to conduct quick investigations of what-if scenarios. In this paper, we develop VTMM, a visual tool with a user-friendly interface for ticket monitoring and management, aiming to help administrators get a clear overview of issues occurring in an IT service and detect possible root causes of problems with advanced algorithms. VTMM consists of three main components in which data visualization and clustering techniques are leveraged and integrated to facilitate system administration in IT service management: a statistics panel for an overview of the IT service, a module for data preprocessing, and a platform for running experiments using diverse clustering algorithms. With the visual tool's help, administrators receive guidance on prioritizing work and can easily provide immediate support.

Intelligent virtual agents (IVAs) are digital embodied characters that can emulate some verbal or non-verbal qualities of human-human dialogue. Although IVAs have emerged as a powerful new paradigm for human-computer interaction, it is unclear how their portrayed racial and gender features shape users' perceptions and preferences during interactions. Since research suggests that an agent's role can influence a user's opinions, we focus on agents acting in the role of health assistants, which the healthcare community has deemed promising for the delivery of health interventions. In this research, we discuss the design of a user study of 22 diverse 3-D animated virtual agents, which we developed to portray different gender and racial characteristics. We examine the concordance between an agent's perceived gender and race and a user's self-reported gender and race, as well as users' ratings of the agents' trustworthiness and likability, and their likelihood of selecting a particular agent. We also investigate users' perceived immediacy when observing an agent during a greeting scenario, in which the agent's proximity to the user, smile, and gaze (non-verbal behaviors that have been shown to play a significant role in regulating the flow of communication) are manipulated.

Matrix multiplication is a fundamental building block in various machine learning algorithms. When the matrix comes from a large dataset, the multiplication will be split into multiple tasks which calculate the multiplication of submatrices on different nodes. As some nodes may be stragglers, coding schemes have been proposed to tolerate stragglers in such distributed matrix multiplication. However, existing coding schemes typically split the matrices in only one or two dimensions, limiting their capabilities to handle large-scale matrix multiplication. Three-dimensional coding, however, does not have any code construction that achieves the optimal number of tasks required for decoding. The best result is twice the optimal number, achieved by entangled polynomial (EP) codes. In this paper, we propose dual entangled polynomial (DEP) codes that significantly improve this bound from 2x to 1.5x. With experiments in a real cloud environment, we show that DEP codes can also save the decoding overhead and memory consumption of tasks.

Historically, there have been two opposing approaches to peptide deduction from mass-spectrometry data: de novo sequencing and database search. The de novo approach tries to transform spectral space into peptide space by predicting individual amino acids from a given spectrum. Database search, on the other hand, tries to associate experimental spectra with existing peptides by transforming peptide space into spectral space and performing comparisons. Each approach uses a heuristic similarity function that correlates with the match quality between an experimental spectrum and its corresponding peptide. When using heuristics, there is no solid reasoning outlining why one function is chosen over another or why a certain feature within a function is given its associated weight. In this project, we design and implement a deep learning model, called DeepSNAP, that presents a middle ground between the above-mentioned opposing techniques. DeepSNAP, which stands for "Deep Similarity Network for Proteomics," transforms spectral and peptide spaces into a shared Euclidean subspace by learning embeddings for both spectra and peptides. The problem is tackled by training a similarity network to learn a similarity function for obtaining high-quality peptide-spectrum matches. The network is trained on triplets (Q, P, N) using the triplet-loss function, where each triplet consists of a query spectrum Q, a positive peptide P, and a negative peptide N. By training the network on only a moderately sized dataset of nearly five hundred thousand triplets, obtained from the NIST peptide library, we achieve an accuracy of 99.7%.
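
The triplet-loss objective at the core of such training, in a minimal numpy form (the embedding networks themselves are omitted, and the margin value below is an arbitrary choice):

```python
import numpy as np

def triplet_loss(q, p, n, margin=0.5):
    """Hinge triplet loss over embeddings: pull the positive peptide toward
    the query spectrum, push the negative peptide at least `margin` farther."""
    d_pos = np.sum((q - p) ** 2, axis=-1)     # squared distance to positive
    d_neg = np.sum((q - n) ** 2, axis=-1)     # squared distance to negative
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# toy batch of 4 triplets of 8-dimensional embeddings
rng = np.random.default_rng(0)
q, p, n = (rng.normal(size=(4, 8)) for _ in range(3))
print(triplet_loss(q, p, n))
```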

The use of computer systems to perform critical tasks with sensitive data and equipment has increased exponentially over the last few decades. As a result, critical systems are among the main targets of cyber attacks and malicious software that unauthorized individuals use to try to gather intelligence or gain control of equipment. At present, the most common methods of preventing, detecting, and removing malicious software include encryption of data, firewalls, anti-viruses, etc. This research focuses on the implementation of a virtual machine introspection-based framework for detecting, analyzing, and monitoring malware behavior on a virtual machine. Low-level introspection allows a deeper level of monitoring by accessing kernel-level structures that conventional protection software cannot access, thus enhancing the security of such systems.

In this work we study a variant of the well-known multi-armed bandit (MAB) problem that has the properties of delayed feedback and a loss that declines over time. We introduce an algorithm, EXP4-DFDC, to solve this MAB variant, and demonstrate that the regret vanishes as time increases. We also show that LeCaR, a previously published machine-learning-based cache replacement algorithm, is an instance of EXP4-DFDC. Our results can be used to provide insight on the choice of hyperparameters and to optimize future LeCaR instances.
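
For context, the core multiplicative-weights step shared by the EXP3/EXP4 family, in a minimal form; EXP4-DFDC's handling of delayed feedback and decaying losses is the paper's contribution and is not reproduced here:

```python
import numpy as np

def exp_weights_update(w, chosen, loss, eta=0.1):
    """One exponential-weights step on expert weights w after playing
    expert `chosen` and observing its loss (importance-weighted)."""
    probs = w / w.sum()
    est = np.zeros_like(w, dtype=float)
    est[chosen] = loss / probs[chosen]      # unbiased loss estimate
    return w * np.exp(-eta * est)

w = np.ones(2)                 # e.g., two cache-replacement experts (LRU, LFU)
w = exp_weights_update(w, chosen=0, loss=1.0)   # expert 0 caused a miss
print(w / w.sum())             # probability mass shifts toward expert 1
```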

The nuclear industry is experiencing a steady increase in maintenance costs even though plants are maintained at high levels of safety, capability, and reliability. Nuclear power plants are always expected to run every unit at maximum capacity, efficiently utilizing assets with minimal downtime. Surveillance and maintenance of nuclear-decommissioning infrastructure pose many challenges with respect to the maintenance or decommissioning of the buildings. As these facilities await decommissioning, there is a need to understand their structural health. Many of these facilities were built over 50 years ago, and in some cases they have gone beyond their operational life expectancy. In other cases, the facilities have been placed in a state of "cold and dark" and sit unused, awaiting decommissioning. In any of these scenarios, the structural integrity of these facilities may be compromised, so it is imperative that adequate inspections and data collection/analysis be performed on a continuous and ongoing basis. A pilot-scale infrastructure was developed to implement structural health monitoring using scanning technologies, machine learning/deep learning, and big data technologies. The focus for structural health monitoring was the walls of the mock-up infrastructure. A plan was developed to collect various formats of data, structured and unstructured, from the sensors deployed in the mock-up infrastructure. The main data source considered was video and images from various imaging sources. During data collection, a total of 28,000 RGB images were taken with a regular camera and stored in the big data platform using the Hadoop Distributed File System (HDFS). The images contain variations in light exposure, angles, and aspect ratios. The entire dataset was evenly separated into two categories, "Baseline" and "Degraded". A duplicate dataset was formed by scaling down all images with antialiasing to a resolution manageable for the neural network model. This data distribution formed the basis for the machine learning approach. A deep Convolutional Neural Network (CNN) was implemented in Python using the Keras library and TensorFlow architecture, with the goal of classifying the images into the two categories. The CNN was constructed as a Sequential model in Keras, a linear stack of neuron layers: a total of 10 layers combining convolutions, max pooling, and a dense layer. The model was verified with a 70/30 cross-validation technique and achieved 97.1% accuracy during the training phase. The high accuracy of the CNN model demonstrates that machine learning, as a component of structural health monitoring, can provide valuable information about the condition of a nuclear facility.
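
A sketch of the kind of 10-layer Keras Sequential CNN described above; the filter counts, input size, and dropout below are illustrative assumptions, not the exact model from the study:

```python
from tensorflow.keras import layers, models

# 10 layers total: 3 x (conv + max-pool), flatten, dense, dropout, output
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # "Baseline" vs. "Degraded"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```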

Inferring causality is the process of connecting a cause with an effect. Identifying even a single causal relationship from data is more valuable than observing dozens of correlations in a data set. Turing Award winner Judea Pearl and others paved the way for the study of causality. With advances in artificial intelligence, Bayesian networks, causal calculus, data science, and machine learning, the question has become “how do we draw a causal conclusion in a data-driven way?”. Given a sufficiently large and rich data set, the theoretical foundations of causality allow us to go well beyond merely discovering statistical associations in data: we can infer causal relationships in a quantitative manner and even explore “what-if” questions, which can have a profound impact on data-driven decision making in any domain. Learning causal inference has been compared to human-level intelligence [8]. Causal inference has been successfully applied in the fields of education [8], economics [13], online advertising [2], medicine and epidemiology [6, 3], social sciences [7], natural language processing [14], policy evaluation [12], recommendation systems [1], and much more.

Consider two random variables, X and Y, and a set of n observations {(x_1, y_1), ..., (x_n, y_n)} arising from an underlying joint distribution, denoted P_{XY}. A traditional statistical approach is to “learn” a function f such that y = f(x), and both supervised and unsupervised methods exist to learn f from the data [11]. However, no causal relationship between x and y can be inferred from this approach [9].

The goal of my research is to develop a causal framework for the highly dynamic, interdependent, and complex data sets generated from microbiome studies. A microbiome is a community of microbes, including bacteria, archaea, protists, fungi, and viruses, that share an environmental niche [10]. Microbiomes have been referred to as a social network because of the complex set of potential interactions between their various taxonomic members [5, 4]. The study of microbiomes can potentially impact the study of diseases linked to dysbiosis in microbiomes, and could lead to potential interventional treatments and therapies.
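
To make the distinction concrete, here is a minimal simulation sketch (assuming, for illustration, a toy linear structural model X → Y): an observational fit of y = f(x) only recovers the association, while an intervention do(X = x) reveals the causal direction.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed structural model for illustration: X -> Y with Y = 2X + noise.
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

# Observational regression recovers the association y ~ 2x ...
slope = np.cov(x, y)[0, 1] / np.var(x)
print(f"observed slope ~ {slope:.2f}")

# ... but only interventions reveal the causal direction.
# do(X = 1): Y responds, because Y is generated from X.
y_do_x = 2 * 1.0 + rng.normal(size=n)
print(f"E[Y | do(X=1)] ~ {y_do_x.mean():.2f}")   # ~2.0

# do(Y = 1): X is unchanged, because X has no parent Y in this model.
x_do_y = rng.normal(size=n)
print(f"E[X | do(Y=1)] ~ {x_do_y.mean():.2f}")   # ~0.0
```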

Sophisticated malware presents a significant threat to computer security. In this work, we propose anomaly detection techniques that learn three different behaviors of Windows system-call sequences. We apply Long Short-Term Memory (LSTM) networks for temporal behavior, cosine similarity for frequency-distribution behavior, and Jaccard similarity for commonality behavior. The proposed framework monitors processes in a hypervisor-based environment to detect compromised virtual machines. System-call sequences of normal processes and malware-infected processes were extracted with memory forensic techniques.
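
For the two similarity measures, a minimal sketch: cosine similarity compares system-call frequency vectors, while Jaccard similarity compares the sets of calls each trace used. The call names and vocabulary below are illustrative assumptions, not the framework's actual features.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def jaccard_similarity(a: set, b: set) -> float:
    """|intersection| / |union| of the system calls each trace used."""
    return len(a & b) / len(a | b)

# Illustrative traces (hypothetical call names).
baseline = ["NtOpenFile", "NtReadFile", "NtClose", "NtReadFile"]
observed = ["NtOpenFile", "NtWriteFile", "NtClose", "NtWriteFile"]

vocab = sorted(set(baseline) | set(observed))
freq = lambda trace: np.array([trace.count(c) for c in vocab], dtype=float)

print(cosine_similarity(freq(baseline), freq(observed)))   # frequency behavior
print(jaccard_similarity(set(baseline), set(observed)))    # commonality behavior
```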

Test case design and production is largely a manual activity that consumes as much as 70% of a software test life cycle. User requirements usually include functional and non-functional attributes, and analyzing them can yield the conditions and data required for testing. The approach under consideration is to use a model-to-model transformation to convert requirements into test cases. Key to this transformation process is the use of meta-modeling for both requirements and test cases. Model-Driven Software Engineering (MDSE), combined with Artificial Intelligence (AI) and Natural Language Processing (NLP), can enable the automatic generation of test cases from requirements. The feasibility of this approach is verified by building a prototype.
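
As a sketch of the model-to-model idea (the field names and transformation rule are hypothetical, not the prototype's actual meta-models), a requirement instance can be mapped mechanically onto a test-case instance:

```python
from dataclasses import dataclass, field

# Hypothetical meta-model instances; the prototype's real meta-models differ.
@dataclass
class Requirement:
    rid: str
    actor: str
    action: str
    expected: str

@dataclass
class TestCase:
    tid: str
    setup: str
    steps: list = field(default_factory=list)
    assertion: str = ""

def requirement_to_test(req: Requirement) -> TestCase:
    """One illustrative transformation rule: requirement -> test-case shape."""
    return TestCase(
        tid=f"TC-{req.rid}",
        setup=f"log in as {req.actor}",
        steps=[f"perform: {req.action}"],
        assertion=f"verify: {req.expected}",
    )

req = Requirement("R17", "registered user", "reset password", "email link is sent")
print(requirement_to_test(req))
```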

Huge amounts of social spam on large-scale social networks have become a common phenomenon in the contemporary world. The majority of prior research has focused on improving the efficiency of identifying social spam from limited-size data on the algorithm side; however, few studies target the data correlations among large-scale distributed social spam or exploit the benefits available on the system side. In this paper, we propose a new scalable system, named SpamHunter, which utilizes spam correlations from distributed data sources to enhance the performance of large-scale social spam detection. It identifies correlated social spam from various distributed sources through DHT-based hierarchical functional trees. These functional trees act as bridges among data servers/sources to aggregate, exchange, and communicate updated and newly emerging social spam with one another. Furthermore, by processing social logs instantly, SpamHunter allows online streaming data to be handled in a distributed manner, which reduces online detection latency and avoids the inefficiency of outdated spam posts.
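
A minimal sketch of the DHT routing underneath such a system (the hash scheme and node count are illustrative assumptions, not SpamHunter's protocol): each spam signature is hashed onto a ring, and the node owning that region aggregates reports of correlated spam.

```python
import hashlib
from bisect import bisect_right

# Illustrative DHT ring; node IDs and hashing are assumptions for the sketch.
def h(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % 2**32

nodes = sorted((h(f"server-{i}"), f"server-{i}") for i in range(8))

def responsible_node(spam_signature: str) -> str:
    """Route a spam signature to the first node clockwise from its hash."""
    ring = [pos for pos, _ in nodes]
    i = bisect_right(ring, h(spam_signature)) % len(nodes)
    return nodes[i][1]

print(responsible_node("buy-followers-cheap.example"))
```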

A storm surge occurs when winds produced by a high-category hurricane push water toward the land. Current systems usually rely on 2D visuals when demonstrating areas that may be affected, or were already affected, by a hurricane. We propose a 3D visualization that makes use of Geographic Information System (GIS) data to depict disasters such as storm surge. More importantly, we make use of Light Detection and Ranging (LiDAR) data provided by the National Oceanic and Atmospheric Administration (NOAA). LiDAR is a remote-sensing technology that emits laser pulses and measures the time it takes for the sensor to detect the reflected light. The result is a point cloud that exhibits fine-grained height information for the scanned area. We can take advantage of this technology to build accurate, interactive 3D maps that represent real-life coastal areas and enhance the learning experience of the effects of natural disasters.
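
A minimal sketch of how point-cloud heights feed a surge visualization (synthetic points and a hypothetical surge level stand in for real NOAA LiDAR tiles): every point whose elevation falls below the surge level is flagged as inundated.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for a LiDAR point cloud: (x, y, elevation_m) triples.
points = np.column_stack([
    rng.uniform(0, 100, 10_000),     # x (m)
    rng.uniform(0, 100, 10_000),     # y (m)
    rng.uniform(-1, 12, 10_000),     # elevation above sea level (m)
])

surge_height_m = 3.0                 # hypothetical storm-surge level
inundated = points[:, 2] < surge_height_m

print(f"{inundated.mean():.1%} of points would be under water")
# A 3D renderer could then color points[inundated] blue and the rest by height.
```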

In recent years, Cyber-Physical Systems (CPSs) have become ubiquitous, from utility features in household devices to safety-critical features in cars, trains, aircraft, robots, and smart healthcare devices. Designing CPSs and their controller algorithms is extremely challenging due to the interaction of varied, multi-domain physical systems, and it is crucial to ensure that they work as expected. Hybrid Automata (HA) are one of the most widely used techniques for developing behavioral models of CPSs for analysis. Several sophisticated methods extending HA capture complex behaviors and come with efficient analysis algorithms, but these approaches suffer from a dramatic increase in model dimension as system complexity grows. High-Level Petri Nets (HLPNs), on the other hand, are a powerful technique for modeling complex, distributed, concurrent, asynchronous systems, but they are difficult to analyze. In this study, we aim to integrate these two formal methods into one unified framework that supports Hybrid Predicate Transition Nets, a class of HLPNs, for modeling CPSs, and HA, with the support of the SpaceEx tool, for analysis.
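
For intuition, here is a hybrid automaton rendered as a minimal executable sketch (the classic two-mode thermostat; the dynamics and thresholds are textbook assumptions, not the paper's models): discrete modes with guard conditions, plus continuous dynamics integrated within each mode.

```python
# Classic thermostat hybrid automaton: two discrete modes (ON/OFF) with
# continuous temperature dynamics in each; guards trigger mode switches.
dt, T, temp, mode = 0.01, 20.0, 18.0, "ON"
trace = []

t = 0.0
while t < T:
    # Continuous flow: dtemp/dt depends on the current discrete mode.
    dtemp = (30.0 - temp) * 0.5 if mode == "ON" else -temp * 0.1
    temp += dtemp * dt
    # Discrete jumps: guard conditions on the continuous state.
    if mode == "ON" and temp >= 22.0:
        mode = "OFF"
    elif mode == "OFF" and temp <= 18.0:
        mode = "ON"
    trace.append((round(t, 2), mode, round(temp, 3)))
    t += dt

print(trace[-1])   # final (time, mode, temperature); oscillates in [18, 22]
```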

The price of an airline ticket is affected by a number of factors, such as flight distance, purchasing time, and fuel price. Each carrier has its own proprietary rules and algorithms for setting prices accordingly. Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) make it possible to infer such rules and model the price variation. This paper proposes a novel application based on two public data sources in the domain of air transportation: the Airline Origin and Destination Survey (DB1B) and the Air Carrier Statistics database (T-100). The proposed framework combines the two databases, together with macroeconomic data, and uses machine learning algorithms to model the quarterly average ticket price for different origin and destination pairs, also known as market segments. The framework achieves high prediction accuracy, with a 0.869 adjusted R-squared score on the testing dataset.
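
As a sketch of the evaluation metric (synthetic data stands in for the joined DB1B/T-100/macroeconomic features, whose columns are not specified here), adjusted R-squared penalizes plain R-squared by the number of predictors:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for market-segment features; y is average quarterly fare.
rng = np.random.default_rng(1)
X = rng.random((2000, 6))
y = 300 * X[:, 0] + 50 * X[:, 1] + rng.normal(0, 10, 2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

r2 = r2_score(y_te, model.predict(X_te))
n, p = X_te.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalize for predictor count
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```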

A classical result of Rothschild and van Lint asserts that if every non-zero Fourier coefficient of a Boolean function f over F_2^n has the same absolute value, namely |f̂(α)| = 1/2^k for every α in the Fourier support of f, then f must be the indicator function of some affine subspace of dimension n − k. In this paper, we slightly generalize their result and show that Boolean functions whose Fourier coefficients take values in the set {−2/2^k, −1/2^k, 0, 1/2^k, 2/2^k} are indicator functions of two disjoint affine subspaces of dimension n − k or four disjoint affine subspaces of dimension n − k − 1. Our main technical tools are results from additive combinatorics which offer tight bounds on the affine span size of a subset of F_2^n when the doubling constant of the subset is small.
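
To see the flat-spectrum phenomenon concretely, here is a small sketch (a brute-force Fourier transform over F_2^n for tiny n; the particular subspace is an arbitrary example) verifying that the indicator of an affine subspace of dimension n − k has all non-zero coefficients of absolute value 1/2^k:

```python
import itertools
import numpy as np

n, k = 4, 2
A = np.array([[1, 0, 1, 0],
              [0, 1, 1, 1]])          # k independent linear constraints over F_2
b = np.array([1, 0])                  # arbitrary affine shift

points = list(itertools.product([0, 1], repeat=n))
# Indicator of the affine subspace {x : Ax = b}, which has dimension n - k.
f = {x: int(all((A @ np.array(x)) % 2 == b)) for x in points}

def fourier(alpha):
    """f_hat(alpha) = 2^{-n} * sum_x f(x) * (-1)^{<alpha, x>}."""
    return sum(f[x] * (-1) ** (np.dot(alpha, x) % 2) for x in points) / 2 ** n

coeffs = [fourier(np.array(a)) for a in points]
nonzero = sorted({round(abs(c), 6) for c in coeffs if abs(c) > 1e-9})
print(nonzero)   # -> [0.25] == 1/2^k: the flat spectrum of an affine subspace
```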

Best Poster Winners

Congratulations to everyone who participated.

Best Poster

DeepSNAP: Protein Inference using Deep Learning, by Muhammad Usman Tariq and Fahad Saeed.

Runner-up

Effect of Race, Ethnicity, and Gender of 3-D Virtual Health Agents on User’s Perceived Immediacy, by Stephanie Lunn and Christine Lisetti

Runner-up

Analyzing Cultural Motif Effects in Networks, by W. Victor H. Yarlott, Anurag Acharya, and Mark A. Finlayson

RSVP

Please let us know if you are able to attend and, if so, whether you will bring a guest. We can’t wait to see you at our event!

Send us mail
