In June 2020 I was halfway through my summer internship at the National Collegiate Athletics Association, the non-profit org that handles the rules & eligibility criteria for the majority of college sports in the US. It was just a few months into the beginning of the pandemic, and all things considered it was going pretty well for a first internship despite everyone having to adjust to a fully remote work environment. Going into the summer I was already pretty familiar with Java, so I had managed to get started on the backend Spring API fairly quickly. It’s a great framework overall and generally pretty straightforward if you know the MVC model, but like any software there are a few footguns you have to watch out for…
At the time, we were doing a complete-from-scratch rewrite of a tool called Compliance Assistant, used by universities to determine whether or not their athletes are eligible to compete in different sports. We had finished most of the features around academic eligibility (requirements around GPA, credits earned, and so on) and were just adding some finishing touches, such as better error handling and performance improvements.
We noticed that a few of our API endpoints were taking a really long time (in extreme cases, 500ms+) when tested with a realistic-sized test database, so I started investigating. The IntelliJ profiling tool narrowed it down to one specific helper method, called getFullTimeStudents()
, that was being used by these endpoints. This method was just a single loop that ran through all of the students at the current university (no more than a few thousand) and counted the number of them that were considered full time by credit hours. No nesting, no recursion, no sorting, no complex calculations… basically, no obvious reasons why this should be a performance sink.
public int getFullTimeStudents(Org university) {
List<Student> students = university.getStudents();
int fullTime = 0;
for (Student student : students) {
if (student.getCourseList().getCreditHours() > 12) {
fullTime++;
}
}
return fullTime;
}
Interestingly, while the profiler reported a long wall time for our helper method, the cpu time was relatively tiny, which indicates something must be blocking in the method for a long time. The student.getCourseList()
call seemed like a likely candidate, so I decided to take a closer look.
If you aren’t familiar with Spring (or MVC frameworks in general), a model is the basic data structure that’s used to store all of the data in the web application. With Spring, defining a model is as easy as creating a class and adding any of the fields we want. For our Compliance Assistant tool, we had models for a Student, a University, and so on.
public class Student {
public String name;
public double gpa;
}
We typically use an ORM which enables us to automatically save these models into our database without having to write a bunch of SQL queries by hand. With Spring, we used a library Hibernate which was super straightforward to use.
We can even include other models within our model, and Hibernate will handle everything for us: creating tables for both of the models, plus a foreign key to reference the other table and generating the appropriate JOIN
queries when we try to access the model. You can see how we used this within the Student
model to include a list of courses the student is enrolled in (the CourseList
model):
public class Student {
public String name;
public double gpa;
@LazyLoad
public CourseList courseList;
}
Notice the @LazyLoad
attribute attached to the CourseList
field. This tells Hibernate that when it fetches a Student
from the database it can defer getting the student’s CourseList
until we actually need to use it, instead of loading it right away with the other fields like name
and gpa
. Since joins are a relatively expensive SQL operation, and we only rarely need the CourseList
when we’re working with the Student
model, this can give us a significant performance boost in most (but not all) cases.
Looking back at our loop from before, you can see how this might cause problems:
for (Student student : students) {
student.getCourseList();
}
Since all of these CourseList
s are lazy-loaded, each iteration of this loop forces Hibernate to create a new connection to the database and fetch just that one student’s CourseList
. For thousands of students this became super inefficient and was ultimately the root of our performance problems.
Unfortunately, just removing the @LazyLoad
annotation would have negative performance impacts on a lot of other parts of the code base. Instead, I had to write an SQL query to handle all of the logic in one go.
The CourseEnrollments
table looked like:
> SELECT * FROM CourseEnrollments;
|CourseName|CreditHours|StudentId|
|Math |3 |101 |
|English |3 |101 |
And the Students
table:
> SELECT * FROM Students;
|Id |StudentName|SchoolId|
|101|Caleb |1000 |
So we can join the two and filter the results to only include students enrolled at the university we care about:
> SELECT * FROM CourseEnrollments
> INNER JOIN Students
> ON Students.Id = CourseEnrollments.StudentId
> WHERE Students.SchoolId = ?;
|CourseName|CreditHours|StudentId|StudentName|SchoolId|
|Math |3 |101 |Caleb |1000 |
|English |3 |101 |Caleb |1000 |
Since we want the total CreditHours
for each student, we can use a GROUP BY
clause on each StudentId:
> SELECT StudentName, SUM(CourseEnrollments.CreditHours) AS TotalCredits
> FROM CourseEnrollments
> INNER JOIN Students
> ON Students.Id = CourseEnrollments.StudentId
> WHERE Students.SchoolId = ?
> GROUP BY CourseEnrollments.StudentId;
|StudentName|TotalCredits|
|Caleb |6 |
Finally, we just want to know the number of students that are enrolled full-time, so we can add a HAVING
clause to our query:
> SELECT COUNT(CourseEnrollments.StudentId) AS FullTimeCount
> FROM CourseEnrollments
> INNER JOIN Students
> ON Students.Id = CourseEnrollments.StudentId
> WHERE Students.SchoolId = ?
> GROUP BY CourseEnrollments.StudentId
> HAVING SUM(CourseEnrollments.CreditHours) >= 12;
|FullTimeCount|
|1 |
This whole saga highlights just how important it is to understand what’s really going on behind the scenes, especially when it comes to complex frameworks like Spring. Even additions that seem simple can have plenty of unintended side effects! While a lot of the magic of frameworks like Spring and Hibernate can save us a lot of time writing boilerplate code, there are always cases where some special logic has to be implemented by hand.