

Ivan Jibaja

Date of Award


Document Type

Restricted Thesis: Campus only access

Degree Name

Bachelor of Science



First Advisor

Dr. Kelly A. Shaw


Data mining is the process of extracting useful information or patterns from large sets of raw data. In recent years the amount of data being collected has increased tremendously, which has driven the development of new and more complex data mining algorithms to process it. However, the performance of new computer systems has not kept pace with the growth of these datasets and the increasing complexity of these algorithms. Because data mining applications are computationally intensive and can be parallelized across many processors, multi-core architectures have the potential to deliver good performance for them. However, the memory intensity of these applications may cause performance problems when their large datasets are stored on-chip. Moreover, as the working datasets of these applications continue to grow, it may become infeasible to store them on chip in their entirety, so new approaches to on-chip data storage are needed. To inform such approaches, we analyze the data access patterns of these working sets to gain a better understanding of these applications.

In this paper, we examine the data access patterns of two parallel applications from the MineBench [9] data mining suite whose data access patterns differ greatly. We analyze these two applications in an architecture-independent manner and examine different levels of threading. In particular, we track the number of accesses, the duration and frequency of idle and non-idle periods for these datasets, and whether data is shared, including the number of threads that access each piece of data. We find that the access patterns of data shared across threads differ from those of data exclusive to a single thread. We extend earlier single-threaded analysis [11] to include parallelism, and we examine how our analysis of shared and non-shared data access patterns differs from the original analysis.
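The per-datum metrics described above (access counts, idle and non-idle periods, and sharing across threads) can be illustrated with a small trace-analysis sketch. This is not the tool used in the thesis. It assumes a hypothetical memory trace of `(time, thread_id, address)` tuples ordered by a logical timestamp, which keeps the measurement architecture independent, and a hypothetical `idle_threshold` parameter: a gap between consecutive accesses to the same datum that exceeds it counts as an idle period, while runs of closer accesses form a non-idle period. The thesis's exact definitions may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DatumStats:
    """Per-address statistics (hypothetical structure for illustration)."""
    accesses: int = 0
    threads: set = field(default_factory=set)            # thread ids that touched this datum
    idle_periods: list = field(default_factory=list)     # lengths of gaps longer than the threshold
    nonidle_periods: list = field(default_factory=list)  # lengths of bursts of closely spaced accesses

    @property
    def shared(self) -> bool:
        # Assumption: a datum is "shared" if more than one thread accesses it.
        return len(self.threads) > 1


def analyze_trace(trace, idle_threshold):
    """trace: iterable of (time, thread_id, address) sorted by logical time."""
    stats = defaultdict(DatumStats)
    last_time = {}    # time of the most recent access to each address
    burst_start = {}  # time at which the current non-idle burst began
    for time, tid, addr in trace:
        s = stats[addr]
        s.accesses += 1
        s.threads.add(tid)
        if addr in last_time:
            gap = time - last_time[addr]
            if gap > idle_threshold:
                # The current burst ended at the previous access: record it, then the idle gap.
                s.nonidle_periods.append(last_time[addr] - burst_start[addr])
                s.idle_periods.append(gap)
                burst_start[addr] = time
        else:
            burst_start[addr] = time
        last_time[addr] = time
    # Close the final non-idle burst of every datum.
    for addr, s in stats.items():
        s.nonidle_periods.append(last_time[addr] - burst_start[addr])
    return dict(stats)


# Usage: address 0xA is touched by threads 0 and 1, address 0xB only by thread 0.
trace = [(0, 0, 0xA), (1, 1, 0xA), (2, 0, 0xB), (10, 0, 0xA), (11, 0, 0xB)]
stats = analyze_trace(trace, idle_threshold=4)
for addr, s in stats.items():
    print(hex(addr), s.accesses, "shared" if s.shared else "private", s.idle_periods, s.nonidle_periods)
```

Separating the statistics by the `shared` flag then allows the access patterns of shared and thread-private data to be compared, in the spirit of the comparison the abstract describes.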