
Looking for a Model

This post originally appeared on the Software Carpentry website.

Updated: this CSV file has information on who taught when. The three columns are the person's unique identifier, the date on which they first qualified, and the dates on which they taught. (If someone has taught multiple times, there is one record for each teaching event.) People who haven't taught at all are at the bottom with empty values in the third column. Erin Becker's analysis of this data is posted on the Data Carpentry blog and discussed here.

We rebooted instructor training in October 2015, and things have been going pretty well since then. If we average over all 23 new-style classes, it looks like two thirds of people who take part actually qualify as instructors within four months of finishing the class:

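| Date       | Site(s)      | Days Since | Participants | Completed | Percentage | Cum. Participants | Cum. Completed | Cum. %age |
|------------|--------------|-----------:|-------------:|----------:|-----------:|------------------:|---------------:|----------:|
| 2015-10-15 | online       | 170 | 48 | 30 | 62.5%  | 48  | 30  | 62.5% |
| 2015-12-07 | Paris        | 162 | 7  | 7  | 100.0% | 55  | 37  | 67.2% |
| 2015-12-07 | Potsdam      | 162 | 5  | 5  | 100.0% | 60  | 42  | 70.0% |
| 2015-12-07 | Thessaloniki | 162 | 4  | 4  | 100.0% | 64  | 46  | 71.8% |
| 2015-12-07 | Arlington    | 162 | 10 | 4  | 40.0%  | 74  | 50  | 67.5% |
| 2015-12-07 | Vancouver    | 162 | 5  | 4  | 80.0%  | 79  | 54  | 68.3% |
| 2015-12-07 | Wisconsin    | 162 | 7  | 5  | 71.4%  | 86  | 59  | 68.6% |
| 2015-12-07 | Australia    | 162 | 3  | 2  | 66.6%  | 89  | 61  | 68.5% |
| 2015-12-07 | Curitiba     | 162 | 3  | 3  | 100.0% | 92  | 64  | 69.5% |
| 2015-12-07 | Toronto      | 162 | 14 | 12 | 85.7%  | 106 | 76  | 71.7% |
| 2016-01-05 | Oklahoma     | 133 | 19 | 5  | 26.3%  | 125 | 81  | 64.8% |
| 2016-01-13 | Lausanne     | 125 | 20 | 16 | 80.0%  | 145 | 97  | 66.9% |
| 2016-01-18 | Brisbane     | 120 | 20 | 14 | 70.0%  | 165 | 111 | 67.2% |
| 2016-01-21 | Melbourne    | 117 | 27 | 6  | 22.2%  | 192 | 117 | 60.9% |
| 2016-01-21 | Florida      | 117 | 25 | 8  | 32.0%  | 217 | 125 | 57.6% |
| 2016-01-28 | Auckland     | 111 | 20 | 7  | 35.0%  | 237 | 132 | 55.7% |
| 2016-02-16 | Online       | 91  | 26 | 8  | 30.7%  | 263 | 140 | 53.2% |
| 2016-02-22 | UC Davis     | 85  | 23 | 9  | 39.1%  | 286 | 149 | 52.1% |
| 2016-03-09 | U Washington | 69  | 14 | 2  | 14.2%  | 300 | 151 | 50.3% |
| 2016-04-13 | online       | 34  | 33 | 1  | 3.0%   | 333 | 152 | 45.6% |
| 2016-04-17 | North West U | 31  | 23 | 0  | 0.0%   | 356 | 152 | 42.7% |
| 2016-05-04 | Edinburgh    | 13  | 15 | 0  | 0.0%   | 371 | 152 | 40.9% |
| 2016-05-11 | Toronto      | 6   | 27 | 0  | 0.0%   | 398 | 152 | 38.1% |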
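
For anyone who wants to check the cumulative columns, here is a minimal Python sketch of how they appear to be computed. It uses the participant and completion counts from the first five classes in the table. The function name `cumulative_rates` is made up for this example. Note that the cumulative figure is a pooled rate (total completed divided by total participants so far), not an average of the per-class percentages, and that the table seems to truncate rather than round the last digit, so the values may differ by 0.1.

```python
from itertools import accumulate

# (participants, completed) for the first five classes in the table
classes = [(48, 30), (7, 7), (5, 5), (4, 4), (10, 4)]


def cumulative_rates(rows):
    """Return the pooled completion percentage after each class."""
    cum_participants = accumulate(p for p, _ in rows)
    cum_completed = accumulate(c for _, c in rows)
    return [100.0 * c / p for p, c in zip(cum_participants, cum_completed)]


rates = cumulative_rates(classes)
for (participants, completed), rate in zip(classes, rates):
    print(f"{participants:3d} {completed:3d} {rate:6.2f}%")
```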

One of our goals for this year is to shorten the time within which most people complete from four months to three; another is to raise the completion rate from two thirds to three quarters. What I'd really like, though, is some help figuring out what statistical model to use for the other important aspect of our training and mentoring: how many of the people we train go on to actually teach workshops, and how quickly.

The data we have includes the following for each person:

  • unique personal identifier (we can easily anonymize individuals)
  • date(s) of the instructor training courses they took (someone may enroll, drop out, enroll again, and so on)
  • date(s) on which they were certified (they may have qualified for Software Carpentry and Data Carpentry at different times)
  • the date on which they taught their first workshop (if any)

"Mean time to teach first workshop" isn't a good metric, since roughly 1/3 of the people we've trained haven't taught yet. Should we use an inverted half-life measure, i.e., how long until the odds of someone having taught hit 50%? Or would something else give us more insight? Whatever we choose needs to be robust in the face of a big spike in our data in January 2016, when we retroactively certified a big batch of Data Carpentry instructors. If you have suggestions, comments on this post would be very welcome.