HPCC Systems collaborates with a number of academic institutions in the USA, Brazil, Europe and India. Our intern program welcomes students onto the HPCC Systems platform team during the summer recess to work on HPCC Systems related projects. Our Academic Program supports research involving data analytics, machine learning and other projects to which students contribute, and those students are also eligible to enter this contest. In fact, any student working on an HPCC Systems related project is welcome to enter, regardless of whether they are an intern or academic partner contributor. This year, the 6th annual technical poster competition will showcase the work of 18 students across four categories: data analytics, platform enhancement, research and use case. In addition, the Community Choice Award is returning, where we ask our Virtual Summit attendees to vote for their favorite poster.
Data Analytics

Improvements on HSQL: A SQL-like language for HPCC Systems

Improving conditional probability calculations using kernel methods in Reproducing Kernel Hilbert Space (RKHS)

Independence Testing with RCoT: Causal Validation and Discovery for HPCC Systems Causal Toolkit

Processing Student Image Data with Kubernetes on HPCC Systems GNN on Azure

Platform Enhancement

Apply Docker Image Build and Kubernetes Security Principles

HPCC Systems Ingress Configuration with AWS ALB

Ingress Configuration

Using Azure Spot Instance with HPCC Systems for Cost Optimization

Research

Big Data and Logistic Regression applied to Analysis of Loan Requests

Comparative study of HPCC Systems and Hadoop

HPCC Systems File Usage Monitor

Massive data analysis in public management: A proposal to identify outliers in the São Paulo city government's real estate registry

Preventing Fraud by Registration Inconsistencies

Use Case

COVID-19 Cases and Vaccination Data Tracker in India

Developing a Recommendation System for a Virtual Reality based Supermarket using Big Data Platforms

Ingestion and Analysis of Collegiate Women's Basketball GPS Data in HPCC Systems and RealBI

The Forecast of COVID-19 Spread

Toxicity Detection

Conditional Probability is a key enabling technology for Causal Inference. For real-valued variables, calculating conditional probabilities is particularly challenging because they can take on an infinite set of values. As the number of conditioning dimensions increases, the data becomes increasingly sparse, making it difficult to derive accurate results. After looking at various ways of modelling conditional probabilities, we found that RKHS kernel methods made it possible to estimate the density and cumulative density of conditional probabilities with a single conditioning variable.

Achinthya Sreedhar

Achinthya Sreedhar is studying for a Bachelor of Computer Science and Engineering at the RV College of Engineering, Bengaluru, India.

Email: achinthyas.cs18@rvce.edu.in

An Ingress is an object that allows access to Kubernetes services from outside the Kubernetes cluster. Ingress is made up of an Ingress object and the Ingress Controller. An Ingress Controller is the implementation of the Ingress. In this project, two Ingress implementations, HAProxy and Nginx, were examined in an Azure environment. Both Ingress controllers use in-cluster Ingress solutions, where load balancing is performed by pods within the cluster. My work explores the different setups used to configure Ingress features through annotations and Kubernetes Ingress specifications.

Amy Ma

Amy Ma is an 11th Grade student attending Marjory Stoneman Douglas High School in Florida, USA.

Email: amy.ma5656@gmail.com

Big Data and its applications are becoming more and more important across many different fields. In this context, techniques and tools that are able to process the immense flow of information to create value can be powerful instruments. This study focuses on the application of data analysis to financial investments on LendingClub’s platform. LendingClub is an American peer-to-peer lending company. As investing in loans that end up not being paid incurs financial losses, it would be useful to have a way to identify loan requests that have a higher probability of being paid on time.

André Fontanez Bravo

André Fontanez Bravo is studying Industrial Engineering at the University of Sao Paulo in Brazil.

Email: andrebravo@usp.br

Big Data has become an important field, and there is a steep learning curve to handling Big Data, especially in distributed systems. HSQL for HPCC Systems is a solution developed to help users get used to the platform's architecture and ECL (Enterprise Control Language), the language with which it primarily operates. HSQL aims to provide a seamless interface for data science developers working with data. It is designed to work in conjunction with ECL, the primary programming language for HPCC Systems, and should prove easy to work with and robust for general-purpose analysis.

Atreya Bain

Atreya Bain is studying for a Bachelor of Computer Science and Engineering at the RV College of Engineering, Bengaluru, India.

Email: atreyabain.cs18@rvce.edu.in

Tons of money is lost because of fraud committed by companies. There are already laws to punish company partners who commit these abusive acts for their own benefit. However, how can the authorities locate them and take the necessary action? This is where my work comes in. Identifying registration inconsistencies, suspicious behaviors or unusual situations may help prevent or detect fraud. Using three different public databases as the starting point, I was able to link companies and partners to suspicious behaviors, such as receipt of undue government benefits by company partners and reports of work analogous to slavery in companies.

Bruno Carniero Camara

Bruno Carniero Camara is studying Electrical Engineering at the University of Sao Paulo in Brazil.

Email: bruno.camara@usp.br

In order to foster a safe learning environment, measures to bolster campus security have emerged as a top priority around the world. The developments from my internship will be applied to a tangible security system at American Heritage High School (AHS). Processing student images on the HPCC Systems Cloud Native Platform and evaluating the HPCC Systems Generalized Neural Network (GNN) bundle in the cloud ultimately facilitated a model’s classification of an individual as “AHS student” or “Not an AHS student”. This will allow a person to receive confirmation from the robot that they are in the student database and retrieve information as part of a larger, interactive security feature.

Carina Wang

Carina Wang is a student at American Heritage School of Boca/Delray (AHS) in Florida, USA.

Email: bd557401@ahschool.com

For any system to keep evolving and generating better results, studies must regularly assess and compare the performance of new and upcoming systems against the current industry standards. Through our project, we intend to perform such a comprehensive comparative study of Hadoop, the current standard in Big Data analytics systems, and HPCC Systems. This will allow us to assess both the similarities and differences between the two setups, which in turn will help the end user or client make a better and more informed choice about the kind of system to set up for their specific requirements.

Chirag Bapat

Chirag Bapat is a student at the RV College of Engineering, Bengaluru, India.

Email: chiragbapat.cs18@rvce.edu.in

In the past, NC State Strength and Conditioning has worked with HPCC Systems to create solutions that bring different data streams together for a comprehensive analysis to improve athlete wellbeing and performance. Here you will see some solutions using HPCC Systems and RealBI to provide insight from data collected with the NC State Women's basketball team. You will see some differences between working in a Bare Metal environment and a Kubernetes environment. See how these solutions can improve our understanding of this data to provide better service to these student athletes.

Christopher Connelly

Christopher Connelly is a Data Scientist at North Carolina State University who works with various sports teams to help them discover insights from the data collected about their players that may help them improve their fitness and technique. In previous years, he worked on the Athlete 360 project at NCSU, and HPCC Systems supported this research via our Academic Program.

Email: cmconnelly21@gmail.com

This poster introduces a Virtual Reality (VR) based online shopping platform and its integration with a recommendation system, along with a demonstration of the virtual environment. With the advent of the pandemic, the ability of virtual reality platforms to provide a realistic shopping experience puts them in a unique position that assures safety and isolation while also offering the benefits of online shopping platforms to both customers and retailers. To foster user adoption and improve the experience of the user beyond the confines of traditional shopping experiences, a recommendation system is necessary in such a platform.

Deeksha Shravani

Deeksha Shravani is studying for a Bachelor of Computer Science and Engineering at the RV College of Engineering, Bengaluru, India.

Email: deekshashravani.cs17@rvce.edu.in

In the current information era, cloud computing has become a necessity due to the amount of computational power needed. Access to storage and processing power at low cost, allied with ease of access, are some of the advantages of using such services, which are available as platform as a service (PaaS), software as a service (SaaS), infrastructure as a service (IaaS), and hardware as a service (HaaS). In the IaaS model, payment normally follows a pay-as-you-go policy, where you pay for what you’re using. Though pricing may be cheap, the misuse of resources and unnecessary uptime can drive up the cost.

Francisco Ciol Rodrigues Aveiro

Francisco Ciol Rodrigues Aveiro is studying for a Bachelor of Computer Engineering at INSPER, Sao Paulo, Brazil.

Email: francisco.rodrigues@lexisnexisrisk.com

A cluster is a set of two or more connected computers with the purpose of improving the performance of systems in performing different tasks. In a cluster, each computer is called a “node” and there is no limit to how many nodes can be interconnected. The computers then act as a single system, working together in processing, analyzing and interpreting data and information and/or performing simultaneous tasks. It is useful to know information about a cluster, such as its capacity and availability.

Guilherme Santos da Silva

Guilherme Santos da Silva is a student at the Universidade Tecnológica Federal do Paraná in Brazil.

Email: guilherme.da@lexisnexisrisk.com

Not only was the creation of the internet the largest technological breakthrough of the 20th century, it also happened to become a hidden double-edged sword. The internet has allowed us to access information and communicate at unprecedented levels, across the globe. Yet, this comes at an enormous cost. The human cost. Hidden behind computer screens, we enjoy a security blanket of anonymity, which emboldens some to say and do things that would be labeled as disturbing in a public setting. By creating a Toxicity Detection Platform, I aim to curb this harassment and provide a healthier web environment for everyone.

Jefferson Mao

Jefferson started his internship as a student studying at Lambert High School in Georgia, USA. On completion of his internship, he was preparing to start his first year at university.

Email: jeffs.mao@gmail.com

The amount of open data made available by government agencies is growing over time. This results in a large number of datasets with different layouts, formats and update frequencies that can fall under the domain of Big Data. Despite being difficult to analyze, these datasets contain a large amount of rich information that could be useful for applications involving public policies. The objective of this project is to develop a machine learning pipeline using HPCC Systems that can ultimately be used to identify outliers in the São Paulo city government's real estate registry extract.

Luiz Fernando Cavalcante Silva

Luiz Fernando Cavalcante Silva is studying for a Bachelor of Civil Engineering at the University of Sao Paulo in Brazil.

Email: lfcavalcante@usp.br

The new science of Causality promises to open new frontiers in Data Science and Machine Learning, but requires an accurate model of the causal relationships between variables. This causal model takes the form of a Directed Acyclic Graph (DAG). Nature provides a few subtle cues to the structure of the causal model, the most important of which are the independencies or conditional independencies between variables. These independencies allow us to test a causal model to determine if it is consistent with the observed data, and in some cases to discover the causal model from data alone.

Mayank Agarwal

Mayank Agarwal is studying for a Bachelor of Computer Science and Engineering at the RV College of Engineering, Bengaluru, India.

Email: mayankagarwal.cs19@rvce.edu.in

The early detection of the coronavirus disease 2019 (COVID-19) outbreak is important to save people's lives and restart the economy quickly and safely. People's social behavior, reflected in their mobility data, plays a major role in spreading the disease. Therefore, we used the daily mobility data aggregated at the county level, alongside COVID-19 statistics and demographic information, for short-term forecasting of COVID-19 outbreaks in the United States. The daily data are fed to a deep learning model based on Long Short-Term Memory (LSTM) to predict the accumulated number of COVID-19 cases in the next two weeks.

Murtadha D Hssayeni

Murtadha D Hssayeni is a PhD Candidate studying Computer Science at Florida Atlantic University, Florida, USA.

Email: mhssayeni2017@fau.edu

With cybersecurity attacks becoming more prevalent in the United States every year, organizations are constantly looking for ways to improve the security outlook of their platforms. Recently, HPCC Systems has begun transitioning to a cloud-native platform in which they use Docker containers managed by Kubernetes to store and manage data. With this change, it is of utmost importance that HPCC Systems has a secure cloud environment, since they are using it to manage secure data from other companies.

Nikita Jha

Nikita Jha is a student attending Northview High School in Georgia, USA.

Email: nikitajha1912@gmail.com

Minimizing the cost of setting up cloud infrastructure is very important for all companies. Azure Spot Instances can provide great cost savings for cloud infrastructure setup. Azure Spot Instances are unused computing resources (virtual machines) that Azure has available. Azure offers them at a lower price than normal virtual machines, at a rate that can be as much as 90% below the normal instance price. The price can vary based on region and size. In this project, we try to analyze different aspects related to the use of Azure Spot Instances with HPCC Systems.

Roshan Bhandari

Roshan Bhandari is a Master's student studying Computer Science at Clemson University in the USA.

Email: rbhanda@g.clemson.edu

With the global outbreak of the COVID-19 pandemic, it has become crucial to track the active cases and vaccination data in order to analyse the current situation and trends. Hence, a systematic way of collecting, processing, enhancing, analysing and visualising the data and trends for better understanding has been very much needed. Through this project, we aim to provide users with the required information about COVID-19 cases since the outbreak and the vaccination data in the different states of India and the country as a whole.

Shivani C H

Shivani C H is studying for a Bachelor of Engineering and Computer Science at the RV College of Engineering, Bengaluru, India.

Email: shivanich.cs18@rvce.edu.in
