The optimal roommate matching service
The ingenuity of the Berkeley Roommate Network’s matching algorithm is subtle: we use a subset of survey questions to compute a numeric importance for certain personal attributes. These attributes and their importances are then combined with other sets of questions to build a highly personalized matching priority queue of potential roommates for each client. The resulting scores are compared against thresholds determined by empirical averages from about 100 in-person surveys.
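As a rough illustration of how importances and thresholds could combine into a single priority, here is a minimal sketch. The rating scale, the attribute names, the normalization, and the rule that gaps at or below a threshold are ignored are all assumptions made for the example, not the service's actual formula.

```python
def importance_weights(importance_answers):
    """Normalize raw importance ratings (assumed numeric) so they sum to 1."""
    total = sum(importance_answers.values())
    return {attr: rating / total for attr, rating in importance_answers.items()}


def mismatch_priority(client, candidate, weights, thresholds):
    """Return a priority for a pair of people; lower means a better match.

    Assumption: each attribute is a number, and a gap only counts
    when it exceeds that attribute's threshold.
    """
    priority = 0.0
    for attr, weight in weights.items():
        gap = abs(client[attr] - candidate[attr])
        if gap > thresholds.get(attr, 0):
            priority += weight * gap
    return priority


# Hypothetical attributes and answers, for illustration only.
weights = importance_weights({"cleanliness": 4, "noise": 1})
client = {"cleanliness": 5, "noise": 2}
candidate = {"cleanliness": 2, "noise": 3}
thresholds = {"cleanliness": 1, "noise": 1}
priority = mismatch_priority(client, candidate, weights, thresholds)
```

In this sketch the noise gap of 1 does not exceed its threshold, so only the cleanliness mismatch contributes to the priority.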
We use NLTK to access WordNet’s similarity function for analyzing the free-response questions on the survey. To match two people, we compute this similarity metric across all words in their two responses. WordNet’s similarity function is better than simply looking for occurrences of the same words in both responses because it takes into account context, synonyms, related meanings, and different forms or tenses of the same word.
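The comparison of two free responses might look roughly like the sketch below. It assumes path similarity as the WordNet metric and averages each word's best match in the other response; the text does not say which WordNet measure or aggregation the service uses, so both choices are assumptions. The function names are hypothetical. The WordNet-backed function needs NLTK and its WordNet corpus installed, so the word-level similarity is passed in as a parameter and can be swapped out.

```python
import re


def tokenize(text):
    """Lowercase a response and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def wordnet_word_similarity(word_a, word_b):
    """Best path similarity over all synset pairs of the two words.

    Requires NLTK with the WordNet corpus downloaded; imported lazily
    so the rest of the sketch runs without it.
    """
    from nltk.corpus import wordnet as wn

    scores = [
        s1.path_similarity(s2) or 0.0  # path_similarity may return None
        for s1 in wn.synsets(word_a)
        for s2 in wn.synsets(word_b)
    ]
    return max(scores, default=0.0)


def response_similarity(resp_a, resp_b, word_sim=wordnet_word_similarity):
    """Average, over words in resp_a, of the best match found in resp_b."""
    words_a, words_b = tokenize(resp_a), tokenize(resp_b)
    if not words_a or not words_b:
        return 0.0
    best = [max(word_sim(a, b) for b in words_b) for a in words_a]
    return sum(best) / len(best)


def exact_match(word_a, word_b):
    """Baseline that only rewards identical words, for comparison."""
    return 1.0 if word_a == word_b else 0.0


baseline = response_similarity("I like hiking", "hiking and cooking", exact_match)
```

With the exact-match baseline only "hiking" scores, which is the limitation the WordNet-based metric is meant to address. Note that this particular score is not symmetric in its two arguments; averaging both directions would be one way to fix that.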
High level algorithm:
1.) Create a dictionary mapping (email of client -> dictionary of important client data)
2.) For a given client email address, find "important data" or i_data
3.) For each potential roommate:
    potential_i_data = mapping(potential_email)
    evaluate matching priority between client & potential roommate  # the lower the priority, the better the match
    queue.add(potential roommate, priority)
4.) Create a file with the top 5 matches (the 5 people with the lowest priority values) and store it locally
5.) Construct an email message with these top 5 matches and a CSV file with all information for these matches
6.) Send email message using SMTP authentication to client
7.) Repeat steps 2-6 until all clients have roommate matches.
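The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the function names, the `priority_fn` callback, the subject line, and the `matches.csv` filename are all hypothetical, and STARTTLS with a login is only one way to do authenticated SMTP.

```python
import csv
import heapq
import io
import smtplib
from email.message import EmailMessage


def top_matches(client_email, clients, priority_fn, k=5):
    """Return the k (priority, email) pairs with the lowest priority."""
    i_data = clients[client_email]
    scored = (
        (priority_fn(i_data, other_data), other_email)
        for other_email, other_data in clients.items()
        if other_email != client_email
    )
    return heapq.nsmallest(k, scored)


def matches_csv(matches, clients):
    """Render the matched people's data as CSV text."""
    fields = sorted({key for _, email in matches for key in clients[email]})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["email", "priority"] + fields)
    writer.writeheader()
    for priority, email in matches:
        writer.writerow({"email": email, "priority": priority, **clients[email]})
    return buffer.getvalue()


def build_message(client_email, matches, clients, sender):
    """Build an email listing the matches, with the CSV attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = client_email
    msg["Subject"] = "Your top roommate matches"
    lines = [f"{rank}. {email}" for rank, (_, email) in enumerate(matches, 1)]
    msg.set_content("\n".join(lines))
    msg.add_attachment(matches_csv(matches, clients), subtype="csv",
                       filename="matches.csv")
    return msg


def send_message(msg, host, port, user, password):
    """Send over SMTP with STARTTLS and a login (not called in this sketch)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


# Hypothetical clients with a single numeric attribute.
clients = {
    "a@example.com": {"clean": 5},
    "b@example.com": {"clean": 4},
    "c@example.com": {"clean": 1},
}
gap = lambda mine, theirs: abs(mine["clean"] - theirs["clean"])
matches = top_matches("a@example.com", clients, gap)
message = build_message("a@example.com", matches, clients, "noreply@example.com")
```

`heapq.nsmallest` avoids maintaining an explicit queue when only the five best candidates are needed; looping over every key in `clients` covers the final "repeat" step.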
Post-matching surveys suggest that our matches were not good enough. People ended up relying on their own intuition and taking their chances with whomever they believed they would get along with. Still, our matches did seem to influence their decisions, because we received many requests for more emails and more matches.