Policy research paper

Complaint upheld: delay, distress and inconvenience caused by insurers

Distress and inconvenience were recorded in nearly two-thirds of all complaints upheld by the Financial Ombudsman Service in 2023, for home, motor, travel and pet insurance. Delays are also at a five-year high. Which? investigates the harm being caused to consumers
33 min read
A man is on his phone, looking distressed as a result of his insurance claim

 Executive Summary

For consumers, insurance products are a safeguard against calamity; a resource to call on when things go wrong. Too often, the actions of insurers cause those relying on them unnecessary distress, uncertainty over delays in the process, and inconvenience - often to people already knocked off course by the events which led to their claim. The FCA’s Consumer Duty and specific insurance rulebook are clear that providers must act proactively to avoid causing harm to consumers, and deliver good outcomes throughout the claims process. However, complaints to the Financial Ombudsman Service (FOS) have highlighted issues with how the sector is treating its customers. Between the last full year of data before the covid pandemic (2019/20) and the latest year of data (2022/23), complaints to the FOS about insurance rose by 22% [1].

Which? has conducted a deep dive into a subset of complaints upheld by the FOS over the last five years, using a large language model to examine the text of over 8,500 decisions related to motor, home, travel and pet insurance. In each case, we examined an ombudsman’s reasoning for why a complaint was upheld, looking in particular at cases where an ombudsman found that an insurer has caused unnecessary distress and inconvenience, or an unfair delay. We found that:

Levels of upheld decisions citing distress and inconvenience or delay hit a five year high in 2023 

In 2023, the FOS cited distress and inconvenience caused to consumers 1,321 times, with this harm appearing in 64% of upheld complaints. They also found that insurers had caused unfair delays in 800 complaints - 38% of those upheld. On both measures, this is both the highest number and the highest proportion of harm cited in upheld complaints since 2019.

Some providers caused distress and inconvenience significantly more than others

Distress and inconvenience caused to consumers is not equally spread over the insurance market. Four providers of buildings and motor insurance were found to have caused distress and inconvenience in at least 70% of their upheld decisions since 2019. Concerningly, some of the UK’s largest providers caused distress in over 60% of upheld FOS decisions. Our analysis shows that better outcomes are possible, with some providers causing distress in considerably fewer of their decisions, as well as having lower rates of complaints against them upheld.

Distress and inconvenience was cited as a reason for upholding the complaint in at least half of upheld complaints relating to home emergency, car or motorcycle, buildings and contents insurance

The FOS found that home emergency insurance providers caused distress in almost three quarters (73%) of upheld complaints since 2019, as well as above average levels of delay. Delays were most often experienced in buildings insurance, for which the FOS published the highest number of complaints in this period (6,270 - an average of more than three per day.)

Consumers experiencing certain types of issues - including delay - are particularly likely to experience negative outcomes

The FOS assigns a ‘complaint issue’ to each complaint when it is first raised, describing the problem highlighted by the consumer. We found that more distress and inconvenience was caused in complaints related to delays in a claim than any other type, highlighting the impact these delays cause. Concerningly,  levels of distress were also high in areas where we would expect insurers to exercise particular care, including medical issues in travel insurance and cases where an insurer needed to make repairs to buildings.

That the FOS is more often finding that insurance firms have caused people avoidable distress and delays, with some providers and products causing particularly poor outcomes to consumers, is a clear area of concern. We recommend that: 

  1. The FCA should ensure that its planned review of how swiftly insurance claims are handled covers wider issues with how firms handle claims, including how firms identify and respond to issues of potential vulnerability arising from the nature of the claim.
  2. The FCA should undertake enforcement action against insurance firms that are persistently failing to meet its requirements to avoid causing foreseeable harm to customers and to handle claims fairly and timely. 
  3. The FOS should publish metrics on the reasons for complaints being upheld to improve visibility of issues affecting consumers.

Chapter 1: Introduction

Insurance products and services are widely held by UK consumers to protect against things going wrong in their lives. It is therefore vitally important that consumers are treated fairly by their insurance provider, especially when they need them most. That might be following a car crash, a flood in their home or when needing medical treatment on holiday.

However, rising numbers of complaints made against insurers to the Financial Ombudsman Service (FOS) have highlighted issues with how the sector is treating its customers. Between the last full year of data before the covid pandemic (2019/20) and the latest year of data (2022/23), complaints to the FOS about insurance have risen by 22%. Increases were even more pronounced for widely-held general insurance products. Complaints about pet insurance were up by more than half (52%) [2]. Motor insurance and travel insurance complaints were both up by 49%, and complaints about buildings insurance have risen by 41% [3].

This is despite long-standing requirements on insurers to deliver good outcomes for customers and handle insurance claims fairly and promptly. The FCA’s Insurance Conduct of Business Sourcebook (ICOBS) has long required insurers to act honestly, fairly and professionally in accordance with the best interests of its customers. Firms must handle claims promptly and fairly, providing reasonable guidance to help a policyholder make a claim and appropriate information on its progress [4]. Firms must also consider how the level of support needed for customers who have characteristics of vulnerability may be different from that for others, and take particular care to ensure they act to deliver good outcomes for those customers [5].

These requirements have been further reinforced by the introduction of the Consumer Duty, which came into force in July 2023. It requires all financial services firms to avoid causing foreseeable harm to consumers, taking proactive and reactive steps. While the FCA has been clear that insurers should already have been meeting many parts of the Consumer Duty based on existing requirements [6] and that it was a less significant change than for other parts of financial services [7], it outlined guidance to support firms implementing the Consumer Duty that makes clear that unreasonable delays to claims processing should be avoided and customers should receive fair claims settlements [8].

Data pointing to rising complaints, together with stories we have heard directly from consumers [9], has led Which? to seek to better understand what is happening for consumers in the insurance industry. In this paper, we present a new analysis of FOS data.

 Understanding consumer issues in insurance using FOS data

Where firms are unable to resolve a complaint to a customer’s satisfaction within eight weeks of it first being raised  the customer can escalate the complaint to the FOS [10]. The FOS provides an impartial decision on a complaint on the basis of what is fair and reasonable, accounting for regulatory requirements, law and good practice [11]. Where the FOS upholds a customer’s complaint, this is clear, independent evidence that the firm’s actions have caused harm to the consumer, whether financial or non-financial, that justifies redress. 

FOS decisions are therefore hugely important for identifying consumer harm, understanding the drivers of this harm, and remedying it. This information is already used in several ways to try and improve consumer experiences:

  • When handling complaints, firms are required by FCA rules to consider FOS guidance and to appropriately analyse FOS decisions concerning similar complaints they have received [12]. 
  • The FCA monitors complaints against firms and those escalated to the FOS closely to identify issues with how firms are treating customers and handling complaints. 
  • Finally, the FOS seeks to use the insight from its decisions to inform and improve firms’ practices to prevent unfairness arising in the first place. This involves working directly with firms, HM Treasury, consumer groups and others [13].

As well as publishing guidance and case studies for firms, the FOS publishes data on all its final decisions in PDF format through its online database of decisions [14]. Each decision includes a detailed explanation of the complaint and the results of the ombudsman’s investigation, including the final decision and reasons for this. 

FOS complaints, then, are a powerful record of where things are going wrong for consumers. However, this data is not published in a structured format,  meaning the only way to identify patterns or trends within FOS decisions is to read the text of each PDF,  reducing the ability of both regulators and consumer organisations like Which? to understand what is going on for consumers. 

To overcome this barrier, and explore what we can learn from FOS decisions about the harm consumers are facing in insurance markets, Which? has used cutting-edge data science techniques to parse thousands of these PDFs and explore trends in upheld complaints.

This report does not paint a picture of the experience of the average insurance customer, and upheld FOS decisions do not give us a picture of consumers’ everyday experience in insurance. Each complaint examined in this report is the result of a consumer who has i) complained to their insurer; ii) been unsatisfied by the response and raised a complaint to the FOS, and iii) had that complaint upheld at the final, legally-binding decision stage. Consumers who drop their complaint during this process will not be accounted for - neither will the majority of consumers who never complain to their insurer. Rather than a representative sample of insurance customers, we have set out to uncover trends within cases where something has gone verifiably wrong, as this is likely to inform us about some of the most persistent causes of consumer harm. 

To ensure we were targeting types of insurance relevant to large numbers of consumers, we focus here on four widely owned categories: motor (including breakdown), home (buildings, contents and home emergency), travel and pet insurance. Motor and home insurance (contents and buildings) and motor breakdown cover are the most commonly held insurance products, according to the FCA’s Financial Lives Survey [15]. Travel or pet insurance are only useful to consumers in specific circumstances, so are not as widely held, but nevertheless represent important and relatively routine forms of insurance for consumers [16], justifying their inclusion.

Methodology

Which? collected the contents of, and metadata related to, just over 322,000 decisions published by the FOS on their online database [17]. These were then filtered as follows:

  • Decisions not related to motor, home, travel or pet insurance were identified using the ‘product type’ annotation provided on each document by the FOS, and removed from our dataset;
  • Decisions regarding business with fewer than 30 published decisions were removed [18];
  • Decisions dated earlier than 2019 or later than 2023 were removed (in part as we found product type information was missing before these dates). 

This left us with 20,749 decisions relating to motor, home, travel or pet insurance, of which 8,547 were upheld. Each of these represents an Ombudsman’s final, rather than provisional, decision on a complaint.

While we were unable in this dataset to differentiate between complaints made by consumers and those made by other organisations who can refer to FOS, such as small businesses and some charities, the filter for product type purposely excluded types of insurance designed for businesses, such as ‘commercial vehicle insurance’. Visual checks of the data suggest this was successful, and that the overwhelming majority of cases analysed here are brought by consumers.  

For each upheld decision, we extracted the full text from the PDF and sent it to OpenAI’s latest GPT4 model, using their API [19]. The large language model (LLM) was asked to apply a coding framework developed by human coders at Which?, including experts in insurance, designed to ascertain the reasons given by an Ombudsman for upholding each complaint.

After substantial efforts developing and testing various codes (described further in the technical annex to this report), we here present results for two key factors: distress and inconvenience and unfair delays. While other behaviour by insurers will have a negative impact on consumers, these two types of harm were chosen for three reasons: 

  • They closely reflect language used by the FOS in making decisions.
  • They are directly related to emotional and time harm caused to consumers.
  • We found, in testing, that we were able to algorithmically identify them at scale with sufficient accuracy.

Distress and inconvenience captures cases where the insurer’s behaviour has resulted in a negative emotional or practical impact on a consumer [20], for example if they have had to repeatedly contact their insurer to resolve an issue, or have been left without a vehicle or hot water for a period of time while a claim was resolved. The FOS are clear that this must be greater than the minor inconvenience we all expect as a routine occurrence in daily life, and must be a direct result of the insurer’s actions [21]. 

Unfair delay captures cases where an insurer’s actions have led to an unnecessary and avoidable delay in resolving a claim.

To assess distress and inconvenience and delay caused to consumers in insurance, we looked for cases where an ombudsman, in upholding a complaint, cited these outcomes caused by the actions of an insurer as part of their reasoning. In this, we focused on measuring the frequency of harm, not the amount of harm caused in each case. While it would be useful to be able to quantify the extent of distress caused, or the length of a delay, the complexity of language used to describe this in a decision put this task beyond the reach of this project.

Delay and distress and inconvenience are not mutually exclusive. Indeed, in many cases an unfair delay caused by an insurer might be the cause of distress highlighted by an ombudsman, and so both labels will be applied to a single document. Equally, a complaint might be upheld for an unrelated reason, in which case neither label will be applied. 

These labels offer valuable information about the frequency with which consumers are experiencing distress and inconvenience and delays when engaging with insurers. This is particularly valuable to help us understand how insurers are meeting their regulatory obligations towards consumers, particularly in light of the FCA’s planned review of delays in claims handling [22] and interest in understanding how the Consumer Duty is being applied.

Due to the nuance inherent in natural language, as with any automated system, our labelling here is not 100% accurate. In testing against a ‘gold standard’ of 100 decisions cross-coded by a team of trained Which? analysts, including experts in financial policy and insurance, we found that GPT4 was 94% accurate when labelling decisions as ‘distress and inconvenience, and 78% accurate applying the label for ‘unfair delay.’ This was broadly comparable to human performance, with a team of expert coders applying distress codes with 94% accuracy, and delay at 84% accuracy. That a large language model performs worse than humans at a task of this complexity is not surprising - the value of this approach remains that we are able to go beyond human constraints and study the contents of thousands of documents. 

For delay, our measured rate of 78% accuracy means that while aggregate analyses which encompass large numbers of documents are likely to be a good reflection of reality, it is difficult to draw conclusions from smaller quantities of data. As a result, we have applied a 95% confidence interval to all delay analysis below -  a statistical test which, for each measured value, gives us a range of values in which we can be 95% confident the ‘real’ value will lie. As this interval becomes larger for smaller sample sizes, we have refrained from publishing any analysis of delay data at a provider level where we cannot be sufficiently certain about accuracy.

For more precise accuracy metrics on these classifiers, as well as discussion on the process involved in coding documents with an LLM, please see the technical annex to this document.

This report

This report outlines five key findings from our analysis of FOS data, each of which is explored in detail:

  • Both the number and proportion of complaints upheld by the FOS in relation to our core consumer products have risen since 2019, suggesting growing issues with how insurers are treating consumers.
  • Distress and inconvenience and delay caused to consumers has been rising since 2019.
  • The frequency of distress and inconvenience and delay caused to consumers varies considerably by product type, with home emergency insurance causing distress in almost three quarters (73%) of upheld complaints.
  • The frequency of distress caused to consumers varies considerably by provider, with a group of buildings and motor insurance providers causing distress in over 70% of their decisions
  • Consumers with certain types of complaint, including around claim delays and medical issues, are particularly likely to experience negative impacts.

We then present a series of recommendations to the FCA and FOS on how this rising tide of consumer detriment in insurance markets can be tackled.

 Chapter 2: Findings

Levels of upheld decisions citing distress and inconvenience or delay hit a five year high in 2023

The number of complaints about our core consumer products upheld by the FOS for any reason has more than doubled in the last five years, as shown in Table 1. This suggests that insurers are failing more often to provide consumers with fair treatment and adequately resolve complaints without the need for mediation. While the initial increase in complaints made to FOS, and complaints upheld, aligned with onset the Covid-19 pandemic in 2020 [23], the persistence of this trend to 2023 after these complaints began to tail off [24] implies there are wider issues within the industry.

Table 1: Complaint uphold rates, and the proportion of upheld rates in which a consumer was found to have been caused distress, by year 

YearComplaintsComplaints upheld% upheldUpheld complaints citing distress% of upheld complaints citing distressUpheld complaints citing unfair delay% of upheld complaints citing unfair delay
20234,0872,080511,3216480038
20224,5691,866411,0535660332
20215,1771,986381,0995553927
20204,5871,692379565747928
20192,3299234048953​302
33

Source: Which? analysis of data from the FOS database of decisions. Note that distress and delay are not mutually exclusive and some upheld complaints contain both.

The proportion of upheld complaints where the insurer is judged to have caused distress or delay to the consumer has also increased. The FOS considered that insurers caused distress and inconvenience to consumers in nearly two-thirds of all upheld complaints relating to our products of interest in 2023, more than 1,300 individual cases. This is both the highest number and highest proportion of upheld complaints related to distress and inconvenience since 2019, and represents a substantial increase in the proportion of complaints involving distress and inconvenience relative to previous years. There is a possibility that this reflects firms being held to a higher standard following the implementation of the Consumer Duty in July 2023, however we consider this to be unlikely as we would expect cases relating to the Duty to take some time to work their way through to FOS, and that the Duty was not ultimately expected to lead to an increase in FOS caseload [25].

While unfair delays were cited less frequently, appearing in 38% of upheld complaints in 2023,  this is also the highest number and proportion of upheld complaints related to unfair delay since 2019, supporting the FCA’s decision to further investigate the issue of delays in claims handling [26].

Taken together, our findings that the FOS are increasingly likely both to uphold complaints about insurers and find that consumers faced one of the two poor outcomes we examine here suggests growing issues in the insurance industry meriting further investigation. 

Distress and inconvenience was cited as a reason for upholding the complaint in at least half of upheld complaints relating to home emergency, car or motorcycle, buildings and contents insurance

Figure 2 shows the volume of upheld complaints related to each product type in our dataset where FOS found insurers caused distress and inconvenience or unfair delay. Most strikingly, this shows that distress and inconvenience was cited as a reason for upholding the complaint in at least half of upheld complaints relating to home emergency, car or motorcycle, buildings and contents insurance, suggesting widespread issues in the ways consumers are being treated.

Figure 2: Product types by % complaints upheld in part due to distress and inconvenience caused, and unfair delay

Source: Which? analysis of FOS data (n = 8,547 upheld complaints). Sample size of upheld complaints: Buildings Insurance (n = 2,737); Car or Motorcycle Insurance (n = 2,294); Travel Insurance (n = 1,128); Home emergency Insurance (n = 1,018); Contents Insurance (n = 541); Pet Insurance (n = 390). Products receiving fewer than 300 upheld complaints have been omitted from the analysis.  Note that distress and delay are not mutually exclusive so some upheld complaints could contain both.

This graph shows that upheld complaints are more likely to relate to distress and inconvenience or unfair delay in some product types, specifically: 

  • Home emergency insurance is a concerning outlier here, with avoidable distress and inconvenience caused to consumers in almost three quarters of upheld complaints (73%) and above average levels of delay (38%).
  • Customers with buildings insurance products experienced unfair delays in 46% of upheld complaints - more often than any other type.
  • Customers for some types of product experienced levels of distress which don’t seem to be driven by delays. Whilst distress and inconvenience was invoked in two-thirds of upheld claims for motor insurance (66%), levels of delay were lower than average in complaints related to these products (28%).

The amount of distress and inconvenience caused to consumers varies considerably by provider

Avoidable distress and inconvenience and delays both represent poor consumer outcomes, and the FCA is clear that insurers should proactively work to avoid causing them [27]. Below, we take a detailed look at how levels of distress and inconvenience differ across various insurance providers.

Figure 3 shows FOS decisions at an insurer level. The dots here represent providers, sized by the number of complaints received (whether or not these complaints were upheld) and coloured by the product type most often complained about. 

A provider receiving more upheld complaints won’t necessarily be a sign of bad practice, but is likely to be a function of market share. Insurance is a market characterised by a small number of providers with a large customer base, with a ‘long tail’ of firms catering to smaller groups. As such, the biggest driver of variation in upheld complaint numbers shown below is more likely to be an insurer’s market share, rather than indicating higher levels of dissatisfaction.

Clearly, a firm’s market share is relevant to the number of consumers who could be impacted by poor practice. Our assessment of providers here, however, is not drawn from their size, but their position on this graph. Broadly, providers sitting towards the upper right of this graph are both more likely to have complaints upheld against them, and more likely to be found to have caused distress and inconvenience in upheld decisions. For clarity, and to ensure a large enough sample size, we have greyed out firms receiving fewer than 100 complaints (upheld or not) and do not include them in this analysis.

The spread of points in Figure 3 suggest that, even within a given type of insurance, there is a wide range of performance on our metrics between firms. Below, we look into the individual insurers making up four loosely defined groups visible in this graph: 

  • High distress and inconvenience: Insurers most often causing distress and inconvenience, in more than 70% of upheld complaints. 
  • Other issues: Insurers with high upheld rates, but below average levels of distress and inconvenience, suggesting other issues affecting consumers. 
  • Better practice: Insurers with low levels of distress and inconvenience and low upheld rates, indicating better practice relative to other providers.
  • Business as usual: Insurers close to the mean upheld and distress and inconvenience  rates, including large providers, showing that high levels of distress and inconvenience are relatively normal across the sector.

Figure 3: Providers by % decisions upheld vs % of decisions upheld in part due to distress and inconvenience caused

Source: Which? analysis of FOS data. (n=20,749 complaints) colour indicates product type.

Group 1: High distress and inconvenience

Figure 4: Providers with high levels of distress (over 70%) in upheld complaints

Source: Which? analysis of FOS data. (n=20,749 complaints), colour indicates product type.

Four companies stood out at the high end of our ‘distress’ scale, all in the motor or home emergency categories. In each case, the FOS found these providers caused distress or inconvenience to consumers in at least 70% of upheld complaints. In an extreme case, over four in five Hastings Insurance customers who had their complaint upheld were awarded redress for distress and inconvenience caused. These high figures suggest recurring issues in the ways these providers deal with consumers.

Group 2: Other issues

Figure 5: Providers with high upheld rates (over 50%) but below average levels of distress

Source: Which? analysis of FOS data. (n=20,749 complaints), colour indicates product type. 

Distress and inconvenience caused to a consumer is often one of a number of factors involved in an ombudsman’s decision to uphold a complaint. They might find that a provider has failed to communicate policy terms effectively or charged a price which is unfair - and while any of these practices might cause distress and inconvenience, this won’t always be the case.

The group shown above includes providers who are likely to have complaints against them upheld, with a rate higher than 50% in each case, but who have caused distress and inconvenience in a lower than average proportion of upheld complaints. Although the sample sizes here are not huge - one provider here is close to our 100 complaint cut off - this suggests that, at least for those customers with cause to complain to the FOS, there is often something in these insurer’s behaviour which makes these complaints justified. 

We can, however, see mixed levels of distress and inconvenience amongst this group. One provider of pet insurance, Casualty and General, has an upheld rate of 86% - more than twice the average - but distress and inconvenience is cited in 43% of upheld decisions concerning them, lower than most of the providers in this analysis. This is in line with relative lower average levels of distress caused to consumers in pet insurance. 

These providers indicate a space in which something else might be going on - more work is needed to understand the drivers of high upheld rates. These providers should be proactively undertaking work to understand the drivers of high uphold rates in line with their obligations under the Consumer Duty, to avoid causing foreseeable harm to customers in future.

Group 3: Better practice

Figure 6: Better practice - low levels of distress (under 50%) and below average upheld rates

Source: Which? analysis of FOS data. (n=20,749 complaints), colour indicates product type. 

At the other end of the scale, a handful of providers achieved below average rates for upheld complaints and, when complaints were upheld, were found to have caused distress and inconvenience in fewer than half of their upheld decisions. 

This may reflect good practice: the National Farmers’ Union Mutual - based on an assessment independent to the work published here - is currently the only Which? recommended provider for home insurance.

This group, while small, indicates that it is possible for insurance providers to treat consumers well, and, in particular, that there is nothing specific about the buildings insurance market which necessarily implies high levels of distress are unavoidable.

Group 4: Business as usual

Figure 7: Large providers are close to the mean upheld and distress rates – with distress still present in >50% of upheld complaints

Source: Which? analysis of FOS data. (n=20,749 complaints), colour indicates product type. 

Each provider in this figure is close to the average on both metrics, and caused avoidable distress and inconvenience in at least half of complaints about them upheld by the FOS. Of providers with close-to-average levels of distress and upheld rates we have named those with the highest number of complaints.

That this group contains some of the largest providers in the UK shows that the harm highlighted here is not limited to niche products, or smaller insurers catering to a select group of consumers; it is widespread. That distress and inconvenience is consistently present in the majority of complaints upheld suggests systemic failings among insurers to meet their Consumer Duty obligations to avoid foreseeable harm to consumers. 

If these providers improved how they learned from past FOS cases to improve how they design and deliver their claims-handling operations, and how they approach complaints when issues arise for customers, this could result in a marked shift in the levels of distress and inconvenience caused to consumers.

Consumers with certain types of complaint are particularly likely to experience negative impacts

The FOS assigns a ‘complaint issue’ to each complaint when it is first raised, which is included as metadata to each decision. Examining these, we found that consumers experiencing some types of issues are much more likely to suffer from distress and delay than others. Again, this is distress caused by the actions of an insurer, above and beyond that caused by the event which caused them to make a claim.

Table 8: Uphold, distress and delay rates for complaints by complaint issue

Complaint issueMost common product typeComplaintsUpheldUpheld with distress% Upheld with distressUpheld with delay% Upheld with delay
Claim DelayBuildings Insurance227713189747498575
Administration or Customer ServiceCar or Motorcycle Insurance24349576917227028
Claim RepairsBuildings Insurance

1868

9426326737340
Claim LiabilityCar or Motorcycle Insurance

304


12479643931
MedicalTravel Insurance

877


269171648331
RenewalBuildings Insurance5361891115953
Cancellation of PolicyCar or Motorcycle Insurance84632318858268

Source: Which? analysis of FOS data. Issues with fewer than 100 upheld complaints are not shown here.

Table 8 shows distress and delay levels for complaint types since 2019. For brevity, and to avoid misleadingly high proportions in cases where e.g. a single complaint was upheld, we have removed products with fewer than 100 complaints from this analysis. 

To highlight the impact delays can have on consumers, complaints initially about a delay were the most likely to be upheld for causing avoidable distress - this was true in approximately three quarters of cases.

We also found that consumers who had a travel insurance complaint upheld relating to medical issues had suffered additional distress and inconvenience in 64% of cases, and experienced avoidable delays in 31% of cases (close to the average amount of 32%). This is concerning - this group is already likely to have suffered physical or emotional harm by virtue of needing to make a claim, and may have characteristics of vulnerability related to poor health. In a category where you might expect insurers to be particularly careful not to cause emotional harm to consumers, recognising the potential vulnerability of claimants, it’s concerning to see the figure for distress higher than the average (58%). Delays here could have serious consequences for consumers, particularly if they mean people are unable to receive prompt treatment abroad, or face eye-watering medical bills.

The table also shows that 40% of upheld complaints related to claim repairs were cited to have unfair delay caused by insurers - higher than the average of 32%. This is another area where we’d expect insurers to exercise particular care, as delays can cause considerable impact on claimants, for example, leaving people in sub-par or temporary accommodation for a prolonged period of time.

Chapter 3: Recommendations

There were already a range of indicators highlighting issues with how consumers are being treated by insurers, including rising levels of complaints. The analysis we have presented here sheds new light on some of the persistent drivers of consumer harm, based on independent decisions on complaints by the FOS. The levels of distress and inconvenience caused to customers, including by unnecessary delays, should be hugely concerning for industry and policymakers. It suggests that longstanding regulatory requirements on how firms should treat their customers and prevent harm arising, which have been reinforced with the introduction of the Consumer Duty, are very often not being met. 

Given the nature of insurance, the scale of financial, emotional and time harm from poor industry practices can be hugely significant. Further understanding why firms are not supporting their customers sufficiently and how to address this should therefore be an urgent priority for the insurance industry and the FCA. We recommend action in three areas:

A broad regulatory review of insurers’ claims-handling practices

We welcome the FCA’s announcement in its current annual business plan that it will conduct a multi-firm review of how swiftly the insurance industry responds to claims, including where customers are more likely to show characteristics of vulnerability [28]. Our findings show that unfairly delayed claims are a widespread issue in the sector, particularly affecting buildings insurance where this can mean people are living in damaged homes or alternative accommodation for extended periods.

However, we have also found that levels of distress were persistently high in areas where we would expect insurers to exercise particular care, including medical issues in travel insurance and cases where an insurer needed to make repairs to buildings. High levels of distress and inconvenience span widely-held products and include many large providers. The FCA must therefore ensure that its planned review is sufficiently broad enough to properly address consumer harm arising from how claims are handled by insurers. 

Recommendation 1: The FCA should ensure that its planned review of how swiftly insurance claims are handled covers wider issues with how firms handle claims, including how firms identify and respond to issues of potential vulnerability arising from the nature of the claim.

Enforcement action

Our analysis shows persistently high rates of upheld complaints due to avoidable distress and inconvenience against many individual insurers. We have identified providers and their product areas where this issue is most prevalent. We want the FCA to urgently prioritise its supervision activities on these firms, seeking evidence of how these firms have considered past FOS decisions made against them to improve how they prevent harm from arising, including in the complaints process. Where firms have consistently failed to take action to address past failings, the FCA has a range of enforcement powers that it can use to force firms to take action. 

Recommendation 2: The FCA should undertake enforcement action against insurance firms that are persistently failing to meet its requirements to avoid causing foreseeable harm to customers and to handle claims fairly and timely.

Better insight on the drivers of harm

Our analysis here has been made possible by extensive public data made available by the FOS, including the full text of each decision, and metadata on the type of product involved in each case. However, we were only able to parse this information due to our in-house data science expertise; without this, the rich insights uncovered in this study would remain hidden. 

Steps to improve the availability and visibility of key metrics, such as the reasons for upholding a complaint, would both help industry to understand trends in performance, and allow the regulator and consumer advocates like Which? to more easily monitor potential issues as they arise and take timely action where necessary.  As a first step, FOS should add the reasons for upholding a complaint to the metadata provided in their existing document library. 

Recommendation 3: The FOS should publish metrics on the reasons for complaints being upheld to improve visibility of issues affecting consumers.  

Technical Annex

Detailed methodology

Data collection

To collect FOS decisions, Which?’s data science team developed a series python scripts to collect every PDF published on the FOS website - which is browsable at https://www.financial-ombudsman.org.uk/decisions-case-studies/ombudsman-decisions

This resulted in a set of 322,490 documents, spanning all of the FOS’ regulated financial products, and dated between February 2013 and February 2024. These documents were saved to a PostGRES database, along with relevant metadata included with each file. This metadata was authored by the FOS, and included the ‘Complaint issue’ field explored in our findings above.

Data processing

Before reading the text, this dataset was filtered to only include decisions relevant to this report: 

  • Decisions not related to motor (including breakdown), home (buildings, contents and home emergency), travel and pet insurance were identified using the ‘product type’ annotation provided on each document by the FOS, and removed from our dataset;
  • Decisions regarding business with fewer than 30 published decisions were removed, taking a cue from FOS policy of removing complaints about businesses with low complaint numbers from periodic reporting [29];
  • Decisions dated earlier than 2019 were removed (in part as we found product type information was missing before these dates); and
  • Decisions not marked ‘Upheld’ by the FOS were removed.

This process also corrected a number of misspelt or inconsistent entries in the metadata. We then used python to extract the full text of each relevant document and save this to our database.

Classification 

As the collection was underway, a group drawn from across relevant teams at Which? - including experts in financial policy, journalists covering insurance, analysts and data scientists - compiled a coding framework to identify reasons given by an ombudsman within FOS decisions which were relevant to consumer harm. This coding framework included ‘Unfair delay’ and ‘Distress and inconvenience’, alongside other labels relevant to harm, such as ‘Poor communication’, ‘investigation failings’ and ‘unsatisfactory remediation’. These codes were non-exclusive - multiple codes were often applied to a single document.

This coding framework was then used to generate a ‘gold standard’ test set, as follows: 

  • A small initial sample of documents was read by a researcher, and used to refine the coding framework, resolving ambiguities and filling in gaps.
  • A random sample of 219 FOS decisions was then split between 14 members of the working group, with a proportion of each coder’s sample cross-coded by another member of the group.
  • Labels which appeared regularly in our dataset, and which had high agreement between coders, were then discussed with the working group, and an agreement was made to prioritise two: distress and inconvenience, and delay. 
  • Remaining disagreements on these two labels were resolved by a pair of analysts, and this dataset was expanded to include 100 cross-coded and decisions labelled with delay and distress.

Once this gold standard was in place, the data science team built a python pipeline to pass the text of decisions to a Large Language Model via the OpenAI API, with responses written back to our database.  For each document, the full prompt contained:

  • A description of the task.
  • The full text of the document.
  • The coding framework, instructions on how to apply it, and examples for each label.
  • A definition of structured format to return the data in.

We then tested the performance of various models against our human coded gold standard, measuring the effect of changes to the prompt (different framings of the problem, the number of labels applied at once, etc) and various API parameters (e.g. the model’s ‘temperature’ - essentially a measure of how outlandish the language returned is). One useful feature of the OpenAI API here was the ‘completions’ parameter, which asks a model to return a defined number of independent responses for a single prompt. This allowed us to perform a kind of cross-coding for each sample, asking for multiple completions and using the number of these which agreed that a label should be applied as a crude confidence score.

This was not always a predictable process. While the highest accuracy was consistently achieved by OpenAI’s most powerful available model at the time, (gpt-4-0125-preview) we found that different prompts, settings and approaches worked for delay and distress respectively. While aggregate accuracy scores for distress quickly reached a high level, settling at 94%, scores for delay using the same approach were much lower, with a precision (broadly, the proportion of labels applied by the classifier which are in fact correct) of only 60%. The difficulty here, also faced by our human coders, was primarily in determining when delay was the fault of the insurer, and when it was down to the claimant, another third party, or some other aspect of the event which prompted the claim.

To improve precision on the delay label we tested an approach called ‘chain-of-thought prompting’, a method suggested by Google researchers in 2022 to help LLMs reason [30]. In our case, this involved asking GPT4 to assess a document, then asking it to explain why it had applied (or not applied) the ‘delay’ label. We then asked it to reassess the document, taking this new reasoning into account. For contentious decisions, this process occasionally caused the LLM to notice and correct flaws in its initial judgement.

For delay, a chain of reasoning approach improved overall accuracy from 71% to 78%, improving both the precision and the recall of the classifier. For distress, however, performance dropped from 94% to 93% with this approach, primarily due to overenthusiastic reassessment.

Reflections

  • While the chain of thought approach improved our accuracy for delay, to implement it we found we needed to send, at each step in the chain, every message previously in the conversation. In our case this involved sending the full text of each decision- sometimes amounting to multiple pages of text - repeatedly, which greatly increased the cost and time taken to code our dataset. A summarisation step, which extracts and resends only those parts relevant to a decision, may help here.
  • The decision to use the OpenAI API was taken in part because of the flexibility this approach offered, allowing us to test against open models using tools like llamafile, which support this syntax. Testing against non-OpenAI models wasn’t possible during this project due to time constraints, but it would be interesting to see how performance on this task varies using other LLMs.

Accuracy scores

The final scores obtained by GPT4 and human coders on this task are below. As a guide to interpreting these metrics:

  • Precision measures the proportion of documents labelled ‘true’ (or, in this case ‘distress and inconvenience’ or ‘delay’) by a classifier which were also labelled ‘true’ in the gold standard. 
  • Recall is a measure of how many of the documents labelled ‘true’ in the gold standard were also labelled ‘true’ by the classifier. 
  • The F1 score is a straight average of precision and recall.

Label 1: Distress and inconvenience


F1 scorePrecisionRecallLabelled documents in gold standard (of 100)
GPT-4 coding0.940.920.9762​​​​
Human coding before alignment0.940.930.9462

Label 2: Unfair delay


F1 scorePrecisionRecallLabelled documents in gold standard (of 100)
GPT-4 coding0.780.750.82
34
Human coding before alignment0.840.840.83​​​​34

As a final step, the best performing classifiers were then used to label each of the 8,547 upheld complaints in our dataset, forming the basis for the analysis presented in this report.

Product types in scope

The FOS assigns a ‘product type’ to each complaint when it is first made, recording the type of financial product it relates to. We used these product types to work out which complaints related to motor, home, travel or pet insurance. 

Product types considered in scope were:

  • Car or Motorcycle Insurance
  • Caravan insurance
  • Motor Warranties
  • Roadside Assistance Insurance
  • Buildings Insurance
  • Contents Insurance
  • Home Emergency Cover
  • Home Emergency Insurance
  • Travel insurance
  • Pet insurance
  • Horse insurance

Note this excludes types of insurance used by small and medium enterprises rather than consumers, such as commercial property insurance and commercial vehicle insurance.

Footnotes

[1] FOS (June 2023) Annual complaints data and insight 2022/23 
[2] Data for 2019/20 includes livestock insurance but is published separately in subsequent years, so the percentage increase in pet insurance complaints between 2022/23 and 2019/20 is likely to be slightly higher  
[3] Which? analysis of FOS annual complaints data, 2019/20 - 2022/23 
[4] Financial Conduct Authority (2008) Insurance Conduct of Business Sourcebook 8.1.1 
[5] Financial Conduct Authority (2008) Insurance Conduct of Business Sourcebook 2.7.6 
[6] Financial Conduct Authority (2022) FG22/5 Final non-Handbook Guidance for firms on the Consumer Duty, and Financial Conduct Authority (2023) Dear CEO letter: Implementing the Consumer Duty in the General Insurance and Pure Protection sectors 
[7] Comments by Dan Hurl, Head of Insurance, FCA: “For the most part, I believe that the Consumer Duty will not be as significant a change for the insurance market compared to other areas of financial services." FCA (2022) Transcript: Consumer Duty Webinar – Insurance 
[8] Financial Conduct Authority (2023) Dear CEO letter: Implementing the Consumer Duty in the General Insurance and Pure Protection sectors 
[9] See, for example, Which?’s three-part investigation into buildings insurance claims: Which? (2023) Home insurance customers short-changed by meagre payouts 
[10] Financial Ombudsman Service (undated) Before we get involved 
[11] Financial Ombudsman Service (2023) What does the Consumer Duty mean for resolving financial complaints? Speech by Abby Thomas to the Consumer Duty Implementation Summit, 20th June 2023, and Financial Ombudsman Service (undated) How we handle complaints 
[12] Financial Conduct Authority (2019) Dispute Resolution Complaints Sourcebook 1.4.Complaints Resolution Rules, Investigating, assessing and resolving complaints, 2G27/09/2019 
[13] Financial Ombudsman Service (undated) Our work with other organisations 
[14] Financial Ombudsman Service (undated) Ombudsman decisions 
[15] Financial Conduct Authority (2023) Financial Lives 2022 
[16] Association of British Insurers (2020) The Price Of Accuracy: Consumer Attitudes To Data And Insurance 
[17] Financial Ombudsman Service (undated) Ombudsman decisions 
[18] This broadly follows FOS policy of removing complaints about businesses with low complaint numbers from periodic reporting. See their Data and Insight page (undated) 
[19] Precisely, the model used here was gpt-4-0125-preview, OpenAI’s most advanced available text model in April 2024  
[20] Financial Ombudsman Service (undated) Compensation for distress or inconvenience 
[21] Financial Ombudsman Service (undated) Compensation for distress or inconvenience 
[22] Financial Conduct Authority (2024) Business Plan 2024/25 
[23] Financial Ombudsman Service (2021) Insight summary: Complaints resulting from Covid-19 and the impact on consumers and SMEs 
[24] Financial Ombudsman Service (2022) Annual complaints data and insight 2021/22 
[25] Financial Ombudsman Service (2023) What does the Consumer Duty mean for resolving financial complaints? Speech by Abby Thomas to the Consumer Duty Implementation Summit, 20th June 2023 
[26] Financial Conduct Authority (2024) Business Plan 2024/25 
[27] Financial Conduct Authority (2022) FG22/5 Final non-Handbook Guidance for firms on the Consumer Duty, and Financial Conduct Authority (2023) Dear CEO letter: Implementing the Consumer Duty in the General Insurance and Pure Protection sectors 
[28] Financial Conduct Authority (2024) Business Plan 2024/25 
[29] Financial Ombudsman Service (2024) Data and insight: Half-yearly complaints data 
[30] Wei et al. (2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models