Notebook execution time?

Context--I'm working on a web app to basically use customer criteria and learn what developers they're eligible to travel with in our system. We currently use a very complex Google Sheets file, but as the company grows we need something that's easier to maintain with multiple simultaneous users. I'm expecting 500 to 1000 searches per day.

We're comparing guest data with 10 different developers on things like age, destination, income, state of residence, marital status, even ZIP code. The different developers all have slightly different qualifications, so I need to account for a wide range of possibilities.

I'm building the code in notebooks before trying to make it an app. I'm using lists of strings, and for-loops to run the user input by qualifications looking for a match. Maybe not the most efficient approach, but it's in line with my spreadsheet experience.

I'm about halfway done with the Python. My concern is that the notebook is currently taking 8 to 11 seconds to execute. Unless I'm just taking the absolute WORST approach to building this thing, that just doesn't make sense.

Will the web app execute more quickly than my python workbook, or is the execution time similar?

The execution time is likely to be similar, so you might want to have a look at optimising your code before deploying it.

Looping through lists of strings is pretty much the least efficient way to write something like this. If you're used to spreadsheets, have a look at pandas, which is designed for handling and filtering tabular data. You might also want to look at storing your information in a database and then using indexes to speed up your searches.
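To make the database suggestion a bit more concrete, here is a minimal sketch using the sqlite3 module from the standard library. The table layout, column names, developer names, and age ranges are all made up for illustration, and with only 10 developers the index will not matter much; it matters once the tables grow.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE developers "
    "(name TEXT, destination TEXT, min_age INTEGER, max_age INTEGER)"
)
# Hypothetical rows; real qualifications would come from the spreadsheet.
conn.executemany(
    "INSERT INTO developers VALUES (?, ?, ?, ?)",
    [
        ("dev1", "Branson", 28, 85),
        ("dev2", "Branson", 25, 70),
        ("dev3", "Orlando", 30, 75),
    ],
)
# The index lets SQLite jump to rows for one destination instead of scanning.
conn.execute("CREATE INDEX idx_destination ON developers (destination)")


def eligible_developers(destination, age):
    """Return names of developers whose destination and age range match."""
    rows = conn.execute(
        "SELECT name FROM developers "
        "WHERE destination = ? AND ? BETWEEN min_age AND max_age "
        "ORDER BY name",
        (destination, age),
    ).fetchall()
    return [name for (name,) in rows]


print(eligible_developers("Branson", 26))  # ['dev2']
```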

Thanks for the info! I'll look into pandas.

Question. My next thought was to use classes and instances and compare the individual elements, instead of looping through lists. Would that be faster and/or practical?

I don't think it would be. Though you might try some pre-processing that you could do upfront once: If you had dictionaries of the traits that you were searching on that were mapped to lists of the people that had those traits, you could use the dictionaries for fast lookups and then have much smaller lists to loop through to find the common elements.

That sounds helpful, but slightly beyond my comprehension of using dictionaries. Could you break that down for me a little?


Suppose you have 2 entries to search:

    ["trait1", "trait2", "trait3"],
    ["trait1", "trait3", "trait4"],

You can convert that into a dictionary (here I'm using the index in the list to point back to the item in the list):

    "trait1": [0, 1],
    "trait2": [0],
    "trait3": [0, 1],
    "trait4": [1]

Now I can get a list of all the entries that have a given trait with just one quick dictionary lookup. Then, if I want to find commonalities, I can loop through the (presumably much shorter) lists of entries to find the ones that occur in all of them. Alternatively, I could store sets in the dictionary instead of lists and use set operations to find the common entries, and those are really fast.

You're paying a little up-front to set up the dictionary, but each query would be much faster.
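Here is a rough sketch of that idea using sets, built from the same two example entries. The function names build_index and entries_with_all are my own, not anything standard.

```python
from collections import defaultdict

# The two example entries from above, each a list of traits.
entries = [
    ["trait1", "trait2", "trait3"],
    ["trait1", "trait3", "trait4"],
]


def build_index(entries):
    """Map each trait to the set of entry positions that have it."""
    index = defaultdict(set)
    for position, traits in enumerate(entries):
        for trait in traits:
            index[trait].add(position)
    return index


def entries_with_all(index, wanted_traits):
    """Return positions of entries that have every trait in wanted_traits."""
    if not wanted_traits:
        return set()
    # .get avoids creating empty keys in the defaultdict for unknown traits.
    sets = [index.get(trait, set()) for trait in wanted_traits]
    return set.intersection(*sets)


index = build_index(entries)  # the one-off, up-front cost
print(sorted(index["trait1"]))                                # [0, 1]
print(sorted(entries_with_all(index, ["trait2", "trait3"])))  # [0]
```

The index is built once; after that, each query is a handful of dictionary lookups plus a set intersection, rather than a pass over every entry.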

That does look helpful, thanks!

Dumb follow-up question. Is "in" more processor friendly than the for loop? It just occurred to me that it would be much cleaner than how I've been writing it. But would it be better?

Absolutely. Python dictionaries are incredibly well-tuned for fast and cheap lookups. Looping through lists is both slow and computationally expensive.
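One caveat: "in" against a list still scans the list element by element behind the scenes, so the real speed-up comes from using it with a dictionary or a set, where it is a hash lookup. A quick way to see the difference for yourself is a timing sketch like the one below; the 150 strings are made-up data, and the actual numbers will vary by machine.

```python
import timeit

# Hypothetical data: the same 150 strings stored as a list and as a set.
values_list = [str(10000 + i) for i in range(150)]
values_set = set(values_list)

# A value that is not present is the worst case for the list:
# every element has to be compared before "in" can answer False.
missing = "99999"

list_time = timeit.timeit(lambda: missing in values_list, number=20_000)
set_time = timeit.timeit(lambda: missing in values_set, number=20_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```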

Excellent. So before I get too deep into the next-least-efficient method possible, let me run this by you guys. (Thanks again for all your help!!)

Instead of running the whole list of matching developers through the individual qualifications, I'm trying to use dictionaries to run the developers through the whole gauntlet one at a time. Before I get hip-deep in it, I'd like to verify that I'm on a path to avoiding 10-second processing times.

Here's what I got--best I can think of is a series of IF statements like I'd do in a spreadsheet. Help me any way you can!

userInput = {
    "location" : location,
    "age" : age,
    "spouseAge" : spouseAge,
    "maritalStatus" : maritalStatus,
    "income" : income,
    "secondID" : secondID,
    "state" : state,
    "zipCode" : zipCode,
    "date" : date,
    "existingVisa" : existingVisa,
    "existingDining" : existingDining,
    "existingCondo" : existingCondo,
    "existingExtraNight" : existingExtraNight
}

developer1Qualifications = {
    "location" : "Branson",
    "age" : [28, 85],
    "spouseAge" : [28, 85],
    "maritalStatus" : ["married", "co-hab", "single female"],
    "income" : 50,
    "state" : None,
    "zipCode" : find_bad_zip_codes("Branson"),
    "maxVisa" : 100,
    "maxDining" : 100,
    "condoAllowed" : False
}

def check_qualifications(userInput, devQuals):
    # devQuals must also define "secondID" and "extraNightAllowed"
    nq = ""

    #run the gauntlet
    if userInput["location"] != devQuals["location"]:
        return False
    #flagged only when both ages fall outside the allowed range
    if not devQuals["age"][0] <= userInput["age"] <= devQuals["age"][1]:
        if not devQuals["spouseAge"][0] <= userInput["spouseAge"] <= devQuals["spouseAge"][1]:
            nq += "-age-"
    if userInput["maritalStatus"] not in devQuals["maritalStatus"]:
        nq += "-marital status-"
    if userInput["income"] < devQuals["income"]:
        nq += "-income-"
    if userInput["secondID"] not in devQuals["secondID"]:
        nq += "-second ID-"
    #some devs have "red" states; None means no restricted states
    if devQuals["state"] and userInput["state"] in devQuals["state"]:
        nq += "-invalid state-"
    if userInput["zipCode"] in devQuals["zipCode"]:
        nq += "-invalid ZIP code-"
    if userInput["existingVisa"] > devQuals["maxVisa"]:
        nq += "-Visa above limit-"
    if userInput["existingDining"] > devQuals["maxDining"]:
        nq += "-Dining above limit-"
    #only a problem when the guest has one and the dev does not allow it
    if userInput["existingCondo"] and not devQuals["condoAllowed"]:
        nq += "-Condo not allowed-"
    if userInput["existingExtraNight"] and not devQuals["extraNightAllowed"]:
        nq += "-Extra night not allowed-"

    #bring it home
    if nq == "":
        return "Qualified"
    return nq

One other clarifying question.

The zipCode variable will include, in some cases, upward of a hundred entries. Since the only distance API I've found isn't free, I don't think this is avoidable.

Is "in" still efficient for checking lists, or just dictionaries? What is the most efficient way to ask "is this ZIP code in this list of 150 ZIP codes"?

It definitely seems like you want to use a pandas DataFrame and do something like this:

found_users = users[
    (users['location'] == inputted_location)
    & (users['existingVisa'] > inputted_max_visa)
    & (users['existingCondo'] == inputted_condo_allowed)
    # & ... further conditions
]
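For anyone new to pandas, here is a small runnable sketch of that kind of filter. The rows, the zipCode column, and the bad_zip_codes list are made up for illustration; the other column and variable names follow the line above. Note that each comparison needs its own parentheses, because & binds more tightly than == and >. The isin call is one way to ask "is this ZIP code in this list" for a whole column at once.

```python
import pandas as pd

# Hypothetical table: one row per user.
users = pd.DataFrame(
    {
        "location": ["Branson", "Branson", "Orlando"],
        "existingVisa": [150, 50, 200],
        "existingCondo": [False, False, True],
        "zipCode": ["65616", "65615", "32801"],
    }
)

inputted_location = "Branson"
inputted_max_visa = 100
inputted_condo_allowed = False
bad_zip_codes = ["65615"]  # assumed list of excluded ZIP codes

# Each comparison yields a boolean Series; & combines them row by row.
mask = (
    (users["location"] == inputted_location)
    & (users["existingVisa"] > inputted_max_visa)
    & (users["existingCondo"] == inputted_condo_allowed)
    & ~users["zipCode"].isin(bad_zip_codes)  # ~ negates: keep rows NOT in the list
)
found_users = users[mask]
print(found_users)
```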

Thanks. Pandas is completely new to me. What's the best tutorial to bootstrap with?

Edit: Will I need to install that into a virtualenv in order to use it in a notebook here, or should I just be able to "import pandas"?

If you are using a virtualenv you would have to install pandas.

For a tutorial on pandas maybe just read through this? But there are probably many good tutorials available.