6 September 2023

Large-Scale Surveys and Official Statistics: Best Practices



In the fields of official statistics and academic research, people often refer to “large-scale surveys”. The term covers large data collection programs designed to produce statistical evidence about a population, for example in health, social research, or consumption and living conditions.

At Gide, we have extensive experience deploying large-scale field surveys, either alongside research institutes or directly with public and research organizations such as France’s national statistics office (INSEE), INSERM, INED, IRDES, and statistical departments within government ministries.

How are large-scale surveys organized?
What makes them methodologically different?
Which tools can support their implementation?

This article provides a practical overview.

Large-scale surveys: key methodological characteristics

Running a large-scale survey is a long-term process that mobilizes significant human, technical, and financial resources. While these surveys rely on well-known data collection modes (face-to-face, telephone, online questionnaires), their overall methodology differs from that of a typical marketing or social research study.

Fieldwork organization: assigning interviewers and scheduling appointments

Surveys often mean interviewers. Even when large-scale surveys include an online component with self-completion questionnaires, they still frequently rely on face-to-face or telephone interviews.

As a result, implementation often requires working with a dedicated fieldwork provider that supplies trained interviewers to conduct interviews in person (at home or in designated locations) or remotely (by phone and sometimes by video).

In some cases, the organization running the survey has its own team of interviewers, but the operational structure remains similar.

You need to manage the relationship with interviewers, assign each interviewer to specific respondents, train interviewers on the tools provided to them (especially for face-to-face interviews), and support them throughout the fieldwork period.

Scheduling is critical to making fieldwork efficient. Interviewers must be assigned to respondents while respecting constraints and availability on both sides. This typically follows a strict protocol and specific operational rules.

At Gide, staying true to our bespoke approach, we build a scheduling module that is tailored to each project.

For example, in a recent health survey project, we deployed an appointment scheduling platform used by multiple field roles: interviewers, nurses, and coordinators.

To ensure smooth and efficient fieldwork, this platform must meet many requirements, for example:

  • respecting the logical sequence “recruitment > interview > medical measurements and samples > sample drop-off” and enforcing minimum time gaps between each step,
  • interviewer and nurse availability,
  • maximum travel distance between the interviewer’s location and the interview area,
  • and more.
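The requirements above can be sketched as a validation function that a booking screen would call before confirming a slot. This is a minimal illustration, not the platform described here: the step order comes from the sequence quoted above, while the gap durations, the 50 km travel limit, and every function and field name are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

# Step order taken from the text; gap values and travel limit are assumptions.
STEPS = ["recruitment", "interview", "measurements", "sample_drop_off"]
MIN_GAP = {
    ("recruitment", "interview"): timedelta(days=1),
    ("interview", "measurements"): timedelta(days=2),
    ("measurements", "sample_drop_off"): timedelta(hours=1),
}
MAX_TRAVEL_KM = 50.0


@dataclass
class Appointment:
    step: str
    start: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def check_booking(previous, new, staff_slots, staff_home, respondent_home):
    """Return the list of rule violations for a proposed appointment (empty if valid)."""
    errors = []
    is_last = previous.step == STEPS[-1]
    expected = None if is_last else STEPS[STEPS.index(previous.step) + 1]
    if new.step != expected:
        errors.append(f"step '{new.step}' cannot follow '{previous.step}'")
    elif new.start - previous.start < MIN_GAP[(previous.step, new.step)]:
        errors.append("minimum gap between steps not respected")
    # staff_slots is a list of (start, end) availability windows
    if not any(lo <= new.start < hi for lo, hi in staff_slots):
        errors.append("staff member not available at that time")
    if haversine_km(*staff_home, *respondent_home) > MAX_TRAVEL_KM:
        errors.append("travel distance exceeds the limit")
    return errors
```

Returning every violation at once, rather than stopping at the first, lets the interface tell the interviewer or coordinator exactly what to change.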

In probability samples of the general population, respondents may also be able to book an appointment themselves after confirming they are eligible, for example by checking that their phone number was selected in the sample.

[Image: appointment scheduling module for field interviewers]

Assignment mechanisms can vary widely: automatic allocation of address lists to each interviewer, allocation managed by a coordinator, or in some cases managed by the interviewer themselves.

In all cases, the rules should be embedded in the tool, so that protocol compliance does not rely solely on the interviewer or the coordinator. At the same time, these rules should not be so restrictive that they reduce the interviewer’s ability to complete interviews effectively.
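As an illustration of the automatic allocation case, here is a minimal sketch of a greedy rule: each address goes to the closest interviewer who still has capacity, and anything left over is handed to a coordinator. The function name, the planar coordinates in kilometres, and the `max_load` cap are assumptions for the example, not a description of an actual allocation algorithm.

```python
from math import hypot


def allocate(addresses, interviewers, max_load):
    """Greedy allocation of addresses to interviewers.

    addresses:    {address_id: (x_km, y_km)}
    interviewers: {interviewer_id: (x_km, y_km)}
    Returns {address_id: interviewer_id or None}.
    """
    load = {i: 0 for i in interviewers}
    assignment = {}
    for addr_id, (ax, ay) in sorted(addresses.items()):
        candidates = [i for i in interviewers if load[i] < max_load]
        if not candidates:
            # Nobody has capacity left: leave the case to a coordinator.
            assignment[addr_id] = None
            continue
        # Closest interviewer wins; the id breaks ties deterministically.
        best = min(
            candidates,
            key=lambda i: (hypot(ax - interviewers[i][0], ay - interviewers[i][1]), i),
        )
        assignment[addr_id] = best
        load[best] += 1
    return assignment
```

The `None` outcome reflects the point made above: the rule is enforced by the tool, but it leaves room for a human decision when the rule cannot be satisfied.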

Complex, nested questionnaires

In the past, paper questionnaires were the norm for face-to-face surveys. Today, data collection is most often performed using a digital questionnaire on a tablet.

[Image: electronic questionnaire device]

This requires programming the questionnaire in advance, and these questionnaires can be very long: we regularly work on questionnaires with more than 1,000 questions.

To maximize productivity and reliability, we automate as much as possible the generation of an initial questionnaire script, for example:

  • by automatically converting the original questionnaire format (often a Word document) into our questionnaire scripting language. To do so, we define formatting conventions with our clients that can be interpreted by our conversion process,
  • or by starting from a formal questionnaire specification in a standard format such as DDI (Data Documentation Initiative, see ddialliance.org), for which we have also developed a converter.
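To give an idea of what convention-based conversion can look like, here is a minimal sketch that parses a plain-text export of a questionnaire into structured questions. The conventions themselves (a question written as `Q1. text`, response options as `1 = label`, a filter as `[if condition]` on the line before the question) are invented for this example, and the output is a plain Python structure rather than any real scripting language.

```python
import re

# Hypothetical formatting conventions agreed with the questionnaire author.
QUESTION = re.compile(r"^(?P<id>[A-Z]\w*)\.\s+(?P<text>.+)$")
OPTION = re.compile(r"^(?P<code>\d+)\s*=\s*(?P<label>.+)$")
FILTER = re.compile(r"^\[if (?P<cond>.+)\]$")


def parse_questionnaire(text):
    """Turn convention-formatted text into a list of question dictionaries."""
    questions = []
    pending_filter = None  # filter waiting to be attached to the next question
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := FILTER.match(line):
            pending_filter = m["cond"]
        elif m := QUESTION.match(line):
            questions.append(
                {"id": m["id"], "text": m["text"], "options": {}, "filter": pending_filter}
            )
            pending_filter = None
        elif (m := OPTION.match(line)) and questions:
            questions[-1]["options"][int(m["code"])] = m["label"]
        else:
            # Reject anything off-convention instead of guessing.
            raise ValueError(f"line does not follow the conventions: {raw!r}")
    return questions


SAMPLE = """
Q1. Are you currently employed?
1 = Yes
2 = No

[if Q1 == 1]
Q2. How many hours do you work per week?
"""
```

Failing loudly on lines that do not match a convention is a deliberate choice here: with more than a thousand questions, a silent misparse is much harder to find later than an error at conversion time.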

Beyond length, the structure itself is often complex: these questionnaires involve routing rules, consistency checks, loops, and other logic that can be heavy to implement. They are not flat forms. They are tree-structured questionnaires with nested data.
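Routing rules and consistency checks can be pictured with a small sketch. The question ids, the `ask_if` key, and the two checks below are hypothetical; a loop (for example over household members) would repeat the same logic once per member.

```python
# Hypothetical questionnaire: a question may carry an "ask_if" routing rule
# that receives the answers collected so far.
QUESTIONS = [
    {"id": "AGE"},
    {"id": "EMPLOYED", "ask_if": lambda a: a.get("AGE", 0) >= 15},
    {"id": "HOURS", "ask_if": lambda a: a.get("EMPLOYED") == 1},
    {"id": "END"},
]

# Each check is (rule that must hold, message shown when it does not).
CHECKS = [
    (lambda a: a.get("HOURS", 0) <= 168, "HOURS exceeds the number of hours in a week"),
    (lambda a: 0 <= a.get("AGE", 0) <= 120, "AGE is out of range"),
]


def next_question(answers, current=None, questions=QUESTIONS):
    """Return the id of the next applicable question, or None at the end."""
    ids = [q["id"] for q in questions]
    start = ids.index(current) + 1 if current is not None else 0
    for q in questions[start:]:
        rule = q.get("ask_if")
        if rule is None or rule(answers):
            return q["id"]
    return None


def consistency_errors(answers, checks=CHECKS):
    """Return the messages of all checks that fail for these answers."""
    return [message for rule, message in checks if not rule(answers)]
```

For instance, `next_question({"AGE": 10}, "AGE")` skips both employment questions and goes straight to `"END"`.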

For example, in the public health survey mentioned above, interviewers must interview households made up of adults and children.

During recruitment, the interviewer first contacts the adults. If a child in the household is eligible, you must collect consent from an adult (with an electronic signature) and obtain the child’s consent as well.

However, the child may not be present at home during the interviewer’s first visit.

The interviewer therefore interviews the adult first, then returns another day to interview the child. This second step enriches the initial questionnaire dataset with the child’s data. Once both steps are completed, nurse appointments can take place, and the relevant questionnaires become available on nurses’ tablets.

In this common situation where multiple household members are interviewed, the data structure is specific: one questionnaire at the household level, with a sub-level for adults and another for children. The final data outputs must preserve this structure to be usable. You cannot rely on a single flat table with one row per household.

This requires data transformation and restructuring. The ability of the provider programming the questionnaire (in this case, Gide) to deliver a workable data model aligned with the survey objectives is critical to the survey’s success.
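One common way to restructure such nested data is to split it into one table per level, linked by a shared key. The sketch below assumes a hypothetical input shape (`household_id`, `answers`, `adults`, `children`) and a hypothetical identifier scheme; the actual data model depends on the survey.

```python
def to_tables(households):
    """Split nested household records into three flat tables linked by household_id."""
    tables = {"household": [], "adult": [], "child": []}
    source_key = {"adult": "adults", "child": "children"}
    for hh in households:
        hh_id = hh["household_id"]
        tables["household"].append(
            {
                "household_id": hh_id,
                "n_adults": len(hh["adults"]),
                "n_children": len(hh["children"]),
                **hh["answers"],
            }
        )
        for level, key in source_key.items():
            for rank, person in enumerate(hh[key], start=1):
                # person_id such as "H001-A1" or "H001-C1" (illustrative scheme)
                person_id = f"{hh_id}-{level[0].upper()}{rank}"
                tables[level].append(
                    {"household_id": hh_id, "person_id": person_id, **person}
                )
    return tables


example = {
    "household_id": "H001",
    "answers": {"region": "IDF"},
    "adults": [{"age": 41}, {"age": 39}],
    "children": [{"age": 9, "consent": True}],
}
tables = to_tables([example])
```

Each table then has one row per unit of analysis (household, adult, child), and analysts can join them on `household_id` when they need variables from several levels.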

This step directly impacts the quality and reliability of the delivered dataset. It is often underestimated, but it should be discussed from the very start of the project.

Managing paper questionnaires

As noted above, a large share of data collection is digital, either via self-completion questionnaires or interviewer-administered questionnaires (face-to-face or by phone). However, paper questionnaires typically remain part of the fieldwork, especially for postal mail-outs.

For fieldwork monitoring, you need to reintegrate paper-mode metadata into the digital tracking: number of questionnaires received and entered, completion rates, and so on.

You also need to integrate the data collected through paper questionnaires.

To do so, the organization running the survey typically works with a provider that receives the paper questionnaires and captures the data either through scanning (OCR and verification workflows) or through manual entry, using a dedicated entry interface or, when available, the web version of the questionnaire.

This provider delivers data files which, at Gide, we convert into the required format so they can be merged with data collected through other modes (web, telephone, tablets, etc.).
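A minimal sketch of that conversion and merge step is shown below. The column names in `PAPER_TO_COMMON`, the `mode` field, and the duplicate handling are assumptions for illustration; real projects need an explicit rule for respondents who answered in more than one mode.

```python
# Hypothetical mapping from the data-entry provider's columns to the common format.
PAPER_TO_COMMON = {"ID": "respondent_id", "Q1": "employed", "Q2": "hours"}


def harmonize_paper(rows):
    """Rename paper-entry columns to the common names and tag the collection mode."""
    out = []
    for row in rows:
        # Columns that are not in the mapping (e.g. scan batch ids) are dropped.
        rec = {PAPER_TO_COMMON[k]: v for k, v in row.items() if k in PAPER_TO_COMMON}
        rec["mode"] = "paper"
        out.append(rec)
    return out


def merge_modes(*sources):
    """Concatenate records from several modes, keeping the first one seen per respondent.

    Returns (merged_records, duplicate_ids) so duplicates can be reviewed.
    """
    merged, duplicates = {}, []
    for rows in sources:
        for rec in rows:
            rid = rec["respondent_id"]
            if rid in merged:
                duplicates.append(rid)
            else:
                merged[rid] = rec
    return list(merged.values()), duplicates
```

Reporting duplicates instead of silently overwriting keeps the decision (which mode takes precedence) with the survey team.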

A multi-stage, iterative process

Large-scale surveys are deployed through a series of iterative stages.

Once a first version of the questionnaire is programmed, projects typically begin with a test or pilot involving a limited number of respondents and interviewers. This helps validate the questionnaire, the tools, and the fieldwork organization.

Feedback is then used to improve the survey system: updates to the questionnaire, adjustments to the methodology, enhancements to the tools, and so on.

For the most complex surveys, multiple pilots may be run before the full field deployment begins.

At Gide, we support our clients throughout these deployment phases, from pilot setup to full fieldwork (which may run in several waves), including interviewer training.

Whenever possible, we also aim to be involved early, starting at the questionnaire design stage. This allows us to contribute ideas based on similar projects, and to adjust the survey design based on what is technically feasible.

Even when the contract is placed through a research institute, we remain available and interested in participating in discussions with the survey sponsor.

Tailored data outputs and reporting

In large-scale surveys, data analysis is usually performed by the end client. We therefore deliver data in the most practical format for them, whether that is standard CSV files or formats suited to specific tools and languages (R, SAS, etc.).

Raw data is rarely delivered exactly as collected. It typically goes through a cleaning and validation step before being organized into a specific data structure (often called a “data map” or “record layout”) requested by the client.
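As a simple illustration, a record layout can be expressed as an ordered list of variables and types, then applied at export time. The `LAYOUT` below and the convention of exporting missing values as empty fields are assumptions for the example.

```python
import csv
import io

# Hypothetical record layout: ordered (variable name, type) pairs.
LAYOUT = [("household_id", str), ("age", int), ("employed", int)]


def export_csv(records, layout=LAYOUT, missing=""):
    """Write records as CSV text, following the layout's column order and types.

    Variables absent from the layout are dropped; missing values are
    exported as `missing`.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([name for name, _ in layout])
    for rec in records:
        row = []
        for name, cast in layout:
            value = rec.get(name)
            row.append(missing if value in (None, "") else cast(value))
        writer.writerow(row)
    return buf.getvalue()
```

Because `cast` raises on values that do not fit the declared type, a layout like this also acts as a last validation step before delivery.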

Upstream, for fieldwork monitoring, we provide a secure web portal dedicated to the survey. It allows clients and fieldwork providers to track the number of completed interviews, completion rates, and other operational indicators by mode, target group, geography, interviewer, and more.
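The kind of indicator shown on such a portal can be sketched in a few lines. The case fields (`mode`, `interviewer`, `status`) and the `"completed"` status value are hypothetical; this is only meant to show how one function can serve several breakdowns.

```python
from collections import Counter


def completion_rates(cases, by):
    """Completion rate per value of the `by` field (e.g. "mode" or "interviewer")."""
    issued = Counter(c[by] for c in cases)
    done = Counter(c[by] for c in cases if c["status"] == "completed")
    return {key: done[key] / issued[key] for key in issued}


cases = [
    {"mode": "web", "interviewer": None, "status": "completed"},
    {"mode": "web", "interviewer": None, "status": "pending"},
    {"mode": "face_to_face", "interviewer": "I1", "status": "completed"},
    {"mode": "face_to_face", "interviewer": "I2", "status": "completed"},
]
rates_by_mode = completion_rates(cases, by="mode")
```

With the sample above, `rates_by_mode` is 0.5 for the web mode and 1.0 for face-to-face.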

Depending on needs, dashboards can be downloaded, and dedicated exports can be set up and automated.

Data security and confidentiality

When processing personal data, and even more so sensitive data (for example health data, or responses to victimization surveys), security and confidentiality are major concerns.

Emailing a file containing 100,000 addresses, or hosting data on a server located outside the required jurisdiction, is out of the question.

At Gide, throughout the project we provide a secure exchange platform to share all survey documents: questionnaires, respondent samples, collected data, and more.

For face-to-face surveys, we can enable encryption of data stored on tablets and encrypt data transmission to our servers, which are hosted in France.

For health surveys, we can also deploy the data collection system and host data on HDS-certified platforms (French certification for health data hosting).

Which technology tools support large-scale surveys?

As we have seen, large-scale surveys rely on complex survey systems, with strict protocols and specific methodological frameworks.

At Gide, we build bespoke tools for each survey, tailored to the project’s needs.

Here are a few examples of tools and features that can be used to meet those requirements.

Traceability features

Digital tools can be used to incorporate traceability into digital questionnaires, for example:

  • electronic signature within the questionnaire,
  • photo capture of documents and proof of submission,
  • barcode scanning for physical items, such as biological samples in health surveys,
  • and more.

Audio playback of questions and audio capture of responses

In the past, running large-scale surveys with non-French-speaking populations was difficult. Face-to-face interviews with paper questionnaires often required an interpreter, which both complicated fieldwork logistics and created high costs.

This was partly addressed with the widespread use of tablet-based questionnaires, which can be translated into many languages, as we did for IRDES in a survey targeting migrant populations.

However, this approach is not perfect, especially when respondents have reading difficulties. Adding audio capabilities to digital questionnaires helps address this.

It is possible to include audio playback of questions on a tablet (question text, instructions, response options, etc.). This is what we implemented for INSEE in an ongoing survey among people experiencing homelessness.

As a result, even if the interviewer does not speak the respondent’s language, the interview can still be conducted. And if the respondent has reading difficulties, they can listen to the question and answer by following the instructions.

[Image: multilingual questionnaire with audio support]

Audio questionnaires can support many other use cases, such as interviews with children who cannot read, or people with reading difficulties (disability, dyslexia, etc.).

Audio playback through headphones can also be very useful for sensitive questions. It helps respondents feel comfortable answering questions privately, without the interviewer or anyone else in the room hearing the questions or responses. For example, we implemented this in a survey on intimate partner violence.

Depending on the context, audio playback can be delivered using pre-produced audio files (recorded human voice or text-to-speech recordings) or via dynamic text-to-speech (TTS) that reads the on-screen content directly on the tablet.

While this second option is not always perfect (some texts may be rendered incorrectly), it is typically less costly. Most importantly, it allows the questionnaire to be updated up to the start of fieldwork, because audio files do not need to be generated in advance.

For open-ended questions, audio capture is also possible: the respondent speaks their answer, which is captured via the tablet microphone or headset and saved as an audio file.

At each synchronization, these audio files are uploaded to our servers together with the rest of the questionnaire response data, and delivered alongside the final data outputs.

Finally, these audio files can be processed using Speech-to-Text technologies to convert audio into text, just like standard open-ended responses.

Calendar-based questionnaires

In some large-scale surveys, respondents are asked to reconstruct a detailed personal history, such as an employment trajectory or residential history. Similar approaches apply to consumption diaries, expenditure logs, activity diaries, travel diaries, and more.

These questions are often difficult to present as a traditional form.

A solution is to deploy a questionnaire with a visual, user-friendly calendar interface that helps respondents recall events more accurately.

For each time period, you can embed a nested mini-questionnaire, with different questions depending on the type of period (for example, employed vs. unemployed).
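A minimal sketch of the underlying logic: each period has a type that selects its mini-questionnaire, and the calendar as a whole can be checked for gaps and overlaps before the respondent moves on. The period types, question names, and function names are hypothetical; periods are given as (type, first day, last day) with both days included.

```python
from datetime import date, timedelta

# Hypothetical mini-questionnaires, one per type of period.
SUB_QUESTIONNAIRES = {
    "employed": ["occupation", "contract_type", "weekly_hours"],
    "unemployed": ["job_search", "benefits"],
}


def questions_for(period_type):
    """Questions to ask for a period of the given type."""
    return SUB_QUESTIONNAIRES[period_type]


def check_calendar(periods, start, end):
    """Return the problems found in periods meant to cover [start, end] exactly."""
    problems = []
    cursor = start  # first day not yet covered by any period
    for kind, first, last in sorted(periods, key=lambda p: p[1]):
        if kind not in SUB_QUESTIONNAIRES:
            problems.append(f"unknown period type: {kind}")
        if last < first:
            problems.append(f"period ends before it starts: {first} > {last}")
            continue
        if first > cursor:
            problems.append(f"gap from {cursor} to {first - timedelta(days=1)}")
        elif first < cursor:
            problems.append(f"overlap starting {first}")
        cursor = max(cursor, last + timedelta(days=1))
    if cursor <= end:
        problems.append(f"gap from {cursor} to {end}")
    return problems
```

In a calendar interface, the same checks can drive visual cues (an uncovered stretch highlighted on the timeline, for example) instead of error messages.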

[Image: calendar questionnaire interface]

This UX work is essential to ensure data collection is as reliable and usable as possible. It becomes even more important when the questionnaire must be responsive so it can be completed on a smartphone. In that case, you often need two distinct layouts depending on screen size.

[Image: mobile calendar questionnaire interface]

Listening and supervision for telephone interviews

In telephone interviews, you need to ensure interviewers follow the protocol and that what they enter matches what the respondent actually says.

Monitoring features are usually built into the fieldwork provider’s CATI tool. However, when the questionnaire is programmed as a web questionnaire (CAWI) and used in a call center context with a separate calling system (often referred to as webCATI), it is not always possible to see what the interviewer is entering.

To address this, we provide CAWIspy, a tool we developed that allows a supervisor or the end client to see, in real time, what the interviewer is entering into the web questionnaire programmed by Gide.

Running face-to-face interviews by video

When the interviewer cannot meet the respondent in person, it is possible to set up remote interviews designed to be as close as possible to the face-to-face experience.

To achieve this, we developed CAVIsio specifically for large-scale surveys during the 2020 lockdown period.

[Images: CAVIsio (DARES); CAVIsio interface]

This hybrid mode goes beyond telephone interviewing by maintaining visual contact between interviewer and respondent. It allows:

  • the interviewer to see, in a single window, their own video, the respondent’s video, and the questionnaire they are administering,
  • the respondent to similarly see their own video, the interviewer’s video, and any response lists (showcards) the interviewer may share.

By integrating everything into a single interface (one browser tab), CAVIsio avoids juggling multiple tools and screens, unlike approaches based on Teams, Zoom, and similar platforms. This improves comfort and efficiency for interviewers.

