IMPROVING IMPROVEMENT: RESULTS FROM A PANDEMIC YEAR

David Hersh | Proving Ground

Volume 3 Issue 2 (2021), pp. 19-22

This is the fourth installment of Improving Improvement, our quarterly series focused on leveraging the power of research-practice partnerships (RPPs) to build schools’, districts’, and states’ capacity to improve. In a little over a year of writing for NNERPP Extra, we’ve shared an overview of our improvement work, lessons learned from working with existing partners during the pandemic, and lessons learned from creating and launching an improvement-focused RPP in response to the pandemic. We also recently argued for using stimulus funding to invest in an improvement infrastructure with RPPs playing a central role.

Writing this installment as the 2020/21 school year winds down, we reflect on some of the outcomes of our improvement efforts with our partners. As we shared in our previous installments, we have been managing three continuous improvement networks: the Proving Ground (PG) Chronic Absenteeism Network, a network of larger, mostly urban school districts; the National Center for Rural Education Research Networks (NCRERN), a network of smaller, rural school districts that focused on attendance in its first improvement cycle; and the Covid Recovery Cadre (CRC), a purpose-built improvement network of Florida districts addressing students’ challenges in 8th-grade Pre-Algebra and 9th-grade Algebra. Across the three networks, nearly 40 district partners piloted 12 distinct interventions over the past year. While we have not formally written up the analyses – stay tuned for links to technical reports – we can share high-level results and some broader takeaways here. Despite challenges posed by the pandemic, several of our partners’ interventions improved the outcomes they sought to address. At the same time, navigating the pandemic taught us lessons that will inform our work in future pandemic-free years as well.

Pilot Results

While our results are generally mixed even in “normal” years (some share of piloted interventions will have substantively negligible effects), this year’s results were our most nuanced to date. Of the 12 interventions our partners across the three networks tried, four had meaningful positive impacts on the main outcome of interest[1], one pilot was too small to support an impact estimate, one had a large impact that we struggle to explain, and six had null effects. The four with positive impacts were also the four easiest to implement. One of the interventions with null effects was based on an intervention that had been highly successful in all prior pilots. And for one intervention, we found the largest impact of any of the more than two dozen interventions our partners have piloted, but we could not reconcile that impact with the implementation data. Thus, while our partnerships still generated evidence that districts could use to make decisions, we ultimately learned as many lessons about the challenges of piloting and testing as we did about the interventions themselves. Below is a summary of pilot results by network.

>> Proving Ground Chronic Absenteeism Network

In this network, our five partners tested five interventions (one per district) to reduce chronic absenteeism in their schools: restorative circles (an in-class practice that restores relationships through structured, mutual sharing), collaborative case management (a structured protocol for engaging families of high-absence students in the development of solutions), mentorship (routine engagement between a student and an adult to build a trusted relationship), daily attendance nudges for virtual attendance (automated messages sent in the evenings to remind virtual students to log in if they have not yet done so), and weekly messaging nudges for virtual attendance (digital messages letting families know the student’s attendance record and reminding them to log in). Only the nudges and restorative circles cost-effectively improved the outcomes; the others either did not measurably improve attendance or improved it only minimally relative to the effort they took to implement. For restorative circles, we estimated the largest impact we have ever recorded, but we also learned that the implementation diverged so much from the original design, and was so complicated, that we could not identify the treatment-control contrast generating the result. That partner is now engaged in a deep dive to identify exactly what it should scale up.

>> National Center for Rural Education Research Networks

Within NCRERN, our 28 rural partners piloted four interventions over the course of the fall and winter to increase attendance: elementary postcards sent weekly to students missing at least a day, highlighting cumulative absences and what the student missed (9 districts); periodic personalized messages sent via text or robocall every 6-8 weeks letting families know how many days the student had missed (8 districts); mentorship, or routine engagement between a student and an adult to build a trusted relationship (5 districts); and a family engagement practice involving routine bi-directional messaging and specific types of supportive messages, such as pro tips on how to improve attendance (6 districts). Of those, only periodic personalized messaging, the easiest intervention to implement, had substantively significant impacts on attendance; the other three showed no evidence of meaningful effects. The null effect for postcards was perhaps the most surprising result, as the intervention was based on one that had proven effective for all five partners that piloted it in the PG Chronic Absenteeism Network in prior years. Given the complexity of implementing in a pandemic year, and considering the prior evidence of impact for these interventions, several partners have opted to repilot in 2021/22.

>> Covid Recovery Cadre

In the CRC, our four district partners piloted three interventions to improve students’ algebra achievement. Two piloted PERTS growth mindset modules for 9th graders, one piloted twice-weekly small-group tutoring with adaptive software, and one piloted virtual MQI (Mathematical Quality of Instruction) coaching, in which teachers received up to six feedback sessions on their practice from a virtual coach based on recordings of class sessions. The coaching pilot involved only 12 teachers, so the evaluation focused largely on implementation and user feedback. The tutoring intervention ran into myriad implementation challenges and yielded a negligible impact estimate. For the growth mindset intervention, the results were mixed: meaningful GPA improvements for one partner and negligible changes for the other. All four partners are now incorporating these results into their decisions using a protocol that requires them, before seeing the results, to reflect on the effort implementation took and to define the size of impact they would need to see to justify scaling up. Where the impact falls below their defined thresholds (or where they did not get a rigorous impact estimate), they will choose whether to adapt and repilot or to stop altogether. Where the impact falls above their thresholds, they will decide whether to scale the intervention as implemented or to scale it with revisions that improve implementation.
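
To make the protocol concrete, below is a minimal, hypothetical sketch of that threshold logic in Python. The names, the numeric example, and the use of implementation fidelity to break the choice within each branch are our own illustrative assumptions, not a description of Proving Ground’s actual tooling; in practice, the choice within each branch rests with the partner.

# Hypothetical sketch only: names and the fidelity tie-breaker are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class PilotResult:
    min_impact_to_scale: float        # threshold the partner defines *before* seeing results
    impact_estimate: Optional[float]  # pooled estimate; None if no rigorous estimate was possible
    implemented_with_fidelity: bool   # judgment drawn from implementation data


def recommend_next_step(result: PilotResult) -> str:
    """Map a pilot result to one of four coarse recommendations."""
    below_threshold = (
        result.impact_estimate is None
        or result.impact_estimate < result.min_impact_to_scale
    )
    if below_threshold:
        # Below the pre-defined threshold (or no rigorous estimate): if implementation
        # diverged from the design, the theory was never really tested, so repilot;
        # otherwise stop.
        return "stop" if result.implemented_with_fidelity else "adapt and repilot"
    # Above the threshold: scale as implemented if fidelity was good,
    # otherwise scale with revisions that improve implementation.
    return "scale as implemented" if result.implemented_with_fidelity else "scale with revisions"


# Example: a partner required at least a 0.05 GPA-point gain and observed 0.08 with good fidelity.
print(recommend_next_step(PilotResult(0.05, 0.08, True)))  # -> scale as implemented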

Lessons Learned Beyond the Impact Estimates

While we learned a great deal about piloting in a pandemic, with any luck, many of those lessons will not be generalizable in a Covid-free future. Here, we focus on those lessons that are not unique to testing interventions during a public health crisis.

>> In continuous improvement, repiloting is a valid outcome

Decision-making in education agencies tends to be binary: scale up or stop. But what makes “continuous improvement” continuous is the idea that all interventions can be improved upon, so the options also include trying again in a different way. We have seen partners find cost-effective interventions in the first semester and iterate on them to find bigger impacts in the second. Likewise, when we find null effects, that can be because the theory was wrong or because implementation diverged so much from the design that the theory was never really tested. Piloting in a pandemic – with the implementation challenges it creates – was an important reminder that where implementation fidelity is lacking, repiloting might be the right decision. Implementation data is therefore critical to the decision; it cannot be made from impact estimates alone. Next year, ten of our NCRERN partners will repilot: seven will repilot the postcards more in line with the original design and three will repilot their family engagement interventions.

>> Implementation data is critical

As noted above, where an impact evaluation yields a null estimate, implementation data is critical for deciding whether to stop or to try again with the lessons learned the first time. This year, however, we were also reminded that implementation data is critical even with a strong impact estimate. One of our partners’ interventions generated the largest impact estimate we have ever recorded. Yet their implementation involved a great deal of spillover and the non-random introduction of an additional treatment, so it is difficult to say what caused the change in outcomes. As a result, the district is conducting an intensive post-mortem on its intervention to develop a hypothesis about what to scale up. It may need to try again. In all cases, implementation data is critical to making changes for next time.

>> Value the full set of outcomes

In the real world, even impact estimates and implementation data may not be enough to make decisions. The costs are usually far broader than what gets calculated – staff or other stakeholder pushback, for example, might make an intervention that is otherwise free too costly to continue. The benefits are likewise not all easily quantified – staff or stakeholders might value an intervention for reasons that are important but not well captured by measurement instruments. Because this year made impacts harder to measure than in the past, we worked with partners to incorporate a broader set of considerations into their decision-making, and instrumental or secondary outcomes became more critical. For example, one district saw no impact on academic outcomes in the short term, but participating teachers recognized for the first time the degree to which their students lacked number sense. That was not enough evidence to scale up, but it helped the district decide to repilot rather than stop.

>> Stakeholder engagement is not optional

One of the compromises we made to the process this year was limiting the amount of stakeholder engagement. Our standard continuous improvement process incorporates user-centered design principles that involve directly engaging stakeholders at no fewer than three stages. First, once a target population is defined, our partners meet with a few members of that population to develop personas (detailed summaries of who their target audience is); partners rely on these personas to ensure they are designing interventions around the needs and personalities of the members of their target population. Second, once partners have designed their interventions, they create prototypes and engage members of the target population (and, often, those who will implement the intervention) to gather feedback; this feedback is used to finalize the design and implementation plan before the pilot. Finally, stakeholders are engaged during or after the pilot so that their experience can help characterize the impact estimates. This year, time was more limited and stakeholders were less accessible, so we were forced to skip the first two engagement exercises. Our partners’ interventions suffered as a result: lack of stakeholder engagement affected the design of the interventions, the fidelity of implementation, and the fidelity to the research design. Going forward, we need to ensure partners can build in opportunities to engage users in the design, execution, and interpretation of their interventions.

Looking Ahead

Our next installment of Improving Improvement will look ahead to the upcoming school year. We are working with new partners on new outcomes and testing out different ways of delivering continuous improvement content.

We are also always open to suggestions for topics for future editions of Improving Improvement. Reach out to us with any questions about our networks or our continuous improvement process, or with ideas you would like to see us tackle.

David Hersh (david_hersh@gse.harvard.edu) is Director of Proving Ground.


[1] Proving Ground uses a Bayesian estimation model that allows for pooling across districts to evaluate the impact of pilots. All results referenced here are based on the pooled posterior estimates emerging from RCTs lasting from as little as 6 weeks up to around 20 weeks.
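
For readers curious about the mechanics, a minimal sketch of one such partial-pooling specification appears below. The normal outcome model, the notation, and the priors are illustrative assumptions rather than Proving Ground’s actual model.

\[
\begin{aligned}
y_{id} &\sim \mathcal{N}\big(\alpha_d + \tau_d T_{id},\ \sigma^2\big) && \text{outcome for student } i \text{ in district } d,\ T_{id}\in\{0,1\} \\
\tau_d &\sim \mathcal{N}\big(\mu,\ \omega^2\big) && \text{district effects partially pooled toward the network mean } \mu \\
\mu &\sim \mathcal{N}(0,1), \qquad \omega \sim \mathrm{Half\text{-}Normal}(0,1) && \text{weakly informative priors}
\end{aligned}
\]

Under a specification like this, the pooled posterior estimates referenced above would correspond to the posterior for the network-level mean \(\mu\), with each district’s \(\tau_d\) shrunk toward it in proportion to how much data that district contributes.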

Suggested citation: Hersh, D. (2021). Improving Improvement: Results from a Pandemic Year. NNERPP Extra, 3(2), 19-22.

NNERPP | EXTRA is a quarterly magazine produced by the National Network of Education Research-Practice Partnerships  |  nnerpp.rice.edu