HeQihan

How many questionnaires were collected?

It is well known that collecting questionnaires is a thankless task. In the past, collecting questionnaires offline was quite troublesome, and online questionnaire collection is still not an easy job. For example, I personally managed to collect only 155 questionnaires at most, and this questionnaire mainly consisted of multiple-choice questions.

Forms data

Naturally, some people may try to manipulate the numbers. After all, the report ultimately shows percentages, so they can scale the raw counts proportionally: the percentages stay the same while the apparent workload grows, neatly solving their dilemma.

However, Gul'dan, what is the cost?

Coprime#

Let’s introduce a simple concept—coprime.

Two non-zero integers are said to be coprime if their greatest common divisor is 1 (i.e., their only common factor is 1). Coprime numbers have some very interesting properties:

  • Relationship with 1: 1 is coprime with any integer because its only factor is itself.
  • Prime characteristics: Different prime numbers are always coprime, such as 3 and 5.
  • Adjacent number rule: Two adjacent natural numbers (like 8 and 9) or adjacent odd numbers (like 15 and 17) are always coprime.
  • Judgment method: You can determine if two numbers are coprime by calculating their greatest common divisor (such as using the Euclidean algorithm).

If you scale the counts proportionally, both the numerator and denominator are multiplied by the same factor, so although each percentage is unchanged, the numerator and denominator can no longer be coprime. Conversely, if you find a pair of coprime numbers whose quotient is close to the reported percentage, you can roughly deduce the actual number of questionnaires. Any scaling can thus be called to account!
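This is easy to check mechanically. Here is a minimal sketch using Python's standard `math.gcd` (which implements the Euclidean algorithm mentioned above): honest counts in lowest terms are coprime, while a scaled-up pair shares the scaling factor:

```python
from math import gcd

# An honest result: 7 of 45 respondents chose option A.
print(gcd(7, 45))    # 1 -> coprime, 7/45 is already in lowest terms

# The same percentage "inflated" tenfold: 70 of 450.
print(gcd(70, 450))  # 10 -> not coprime; the common factor betrays the scaling

# Reducing the inflated pair recovers the original counts.
g = gcd(70, 450)
print(70 // g, 450 // g)  # 7 45
```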

How to Operate#

Let’s assume the number of questionnaires is $N$, the percentages are $a_1, a_2, a_3, \cdots, a_n$, and the corresponding numerators are $b_1, b_2, b_3, \cdots, b_n$, meaning $a_1 \approx \frac{b_1}{N}, a_2 \approx \frac{b_2}{N}, \cdots, a_n \approx \frac{b_n}{N}$. Generally, percentages are kept to two decimal places, so our precision must be better than $0.0001$, and each $b_i$ and $N$ are integers, so we can plug in some numbers and calculate directly.

Of course, you can also develop a sensitivity to numbers. Simply put, watch for values of the form $\frac{1}{k}$: the numerator and denominator of $\frac{1}{k}$ are necessarily coprime, which is a very obvious tell. For example, seeing $\frac{1}{3}$ almost lets you guess that there are only 3 questionnaires in total: probably one filled out by yourself, one by a friend, and one by someone of the opposite sex, so the gender ratio is covered too.

Thinking deeper, seeing 0.0222, 0.1556, 0.1778, 0.4667 can quickly lead you to realize these numbers are:

$\frac{1}{45} \approx 0.02222$, error $0.00002$
$\frac{7}{45} \approx 0.15556$, error $0.00004$
$\frac{8}{45} \approx 0.17778$, error $0.00002$
$\frac{21}{45} \approx 0.46667$, error $0.00003$
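That mental arithmetic can be sketched with the standard library's `fractions.Fraction.limit_denominator` (a continued-fraction best-approximation routine), assuming the true total is at most 100:

```python
from fractions import Fraction
from math import lcm

reported = ["0.0222", "0.1556", "0.1778", "0.4667"]

# Closest fraction with denominator <= 100 for each reported decimal.
fracs = [Fraction(s).limit_denominator(100) for s in reported]
print(fracs)  # note: 0.4667 comes back as 7/15, the reduced form of 21/45

# The true total must be a common multiple of all the denominators.
print(lcm(*(f.denominator for f in fracs)))  # 45
```

Because `Fraction` always reduces to lowest terms, the last value returns as 7/15 rather than 21/45; taking the least common multiple of the denominators recovers the total.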

At this point, one can't help but marvel at how simple this problem is; with just a little focus, the results become clear—so easy! After writing so much, the article should come to an end, right?

Of course not. Thanks to AI, my idea was polished and even extended, a genuine improvement that freed up my mind. The approach is very simple: the numbers are not large, so brute force will suffice.


def find_min_n_direct(decimal_list, tolerance=0.0001, max_n=10000):
    """
    Direct search method: try each integer n from small to large until finding the smallest value that meets the conditions.
    Parameters:
        decimal_list: list of decimals
        tolerance: maximum allowed absolute error between each decimal x and its reconstruction k/n
        max_n: maximum value of n to search
    Returns:
        The smallest integer n that meets the conditions, or None if not found.
    """
    for n in range(1, max_n + 1):
        valid = True
        for x in decimal_list:
            k = round(x * n)
            error = abs(x * n - k) / n
            if error > tolerance:
                valid = False
                break
        if valid:
            return n
    return None

# User input
m = int(input("Please enter the number of decimals m: "))
decimals = [float(input(f"Please enter decimal number {i + 1}: ")) for i in range(m)]

# Execute search
result = find_min_n_direct(decimals)

# Output result
if result:
    print(f"The smallest integer n that meets the conditions is \033[1;32m{result}\033[0m")
    # Print detailed information
    print("\nVerification results:")
    for x in decimals:
        k = round(x * result)
        error = abs(x * result - k)
        print(f"{x} × {result} = {x * result:.4f} ≈ {k} (error: {error:.4f})")
else:
    print("No suitable n found.")

Is it time to end here? Certainly not. Some may argue: what if the questionnaire data is so "precise" that no numerator is coprime with the total number of questionnaires? In that case the computed total would be smaller than the real one, and the result would not be usable as a reference.

If that’s the case, it would be quite a coincidence, and we need to introduce another concept.
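To see this failure mode concretely, here is a sketch that re-inlines the same brute-force search as above: if the true total were 50 and every option count happened to share a factor with 50, the search would report the reduced common denominator instead.

```python
def find_min_n(decimals, tolerance=0.0001, max_n=10000):
    """Smallest n for which every decimal is within tolerance of some k/n."""
    for n in range(1, max_n + 1):
        if all(abs(x - round(x * n) / n) <= tolerance for x in decimals):
            return n
    return None

# True total 50, counts 10 / 15 / 25: none of these is coprime with 50,
# so the search finds the reduced denominator lcm(5, 10, 2) = 10 instead.
print(find_min_n([0.2, 0.3, 0.5]))  # 10, not the true 50

# The earlier 45-questionnaire example, by contrast, is recovered exactly.
print(find_min_n([0.0222, 0.1556, 0.1778, 0.4667]))  # 45
```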

Euler's Totient Function#

The Euler totient function, denoted $\varphi(n)$, is a function in number theory that counts the positive integers less than or equal to a positive integer $n$ that are coprime to $n$.

Definition#

For a positive integer $n$, $\varphi(n)$ represents the count of numbers between $1$ and $n$ that are coprime to $n$. For example:

  • $\varphi(8) = 4$, because $1, 3, 5, 7$ are coprime to $8$.
  • $\varphi(12) = 4$, because $1, 5, 7, 11$ are coprime to $12$.
  • Special value: $\varphi(1) = 1$, because $1$ is coprime with itself.
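A naive, definition-level sketch (counting gcds directly with `math.gcd`) confirms these values:

```python
from math import gcd

def phi_naive(n):
    """phi(n) by the definition: count of 1 <= k <= n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

print(phi_naive(8), phi_naive(12), phi_naive(1))  # 4 4 1
```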

Core Properties#

  1. Multiplicative function
    If $a$ and $b$ are coprime, then $\varphi(ab) = \varphi(a)\varphi(b)$.

  2. Euler's function for primes
    If $p$ is a prime, then $\varphi(p) = p - 1$, because all positive numbers less than $p$ are coprime to $p$.

  3. Euler's function for prime powers
    If $p$ is a prime and $k$ is a positive integer, then $\varphi(p^k) = p^k - p^{k-1}$.
    For example, $\varphi(9) = \varphi(3^2) = 9 - 3 = 6$.

  4. General calculation formula
    If the prime factorization of $n$ is $n = p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m}$, then:

$$\varphi(n) = n \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_m}\right)$$

For example, if $n = 12 = 2^2 \cdot 3^1$, then:

$$\varphi(12) = 12 \left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right) = 12 \cdot \frac{1}{2} \cdot \frac{2}{3} = 4$$

Graph#

The graph of the Euler's totient function looks something like this.

Figure_1

If the prime factorization of $n$ is $n = p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m}$, you can plot this function with the following code, based on the formula $\varphi(n) = n \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_m}\right)$:

import matplotlib.pyplot as plt
import numpy as np

def euler_phi(n):
    """Calculate Euler's function φ(n): the number of positive integers less than or equal to n that are coprime to n."""
    result = n  # Initialize to n
    # Check all possible prime factors
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is a prime factor of n
            while n % p == 0:
                n //= p
            result -= result // p  # φ(n) = φ(n) * (1 - 1/p)
        p += 1
    # If n is greater than 1, then n is a prime
    if n > 1:
        result -= result // n
    return result

# Calculate Euler's function values from 1 to 1000
n_values = np.arange(1, 1001)
phi_values = [euler_phi(n) for n in n_values]

# Create the graph
plt.figure(figsize=(12, 8))
plt.scatter(n_values, phi_values, s=5, alpha=0.6)  # Reduce point size for readability
plt.plot(n_values, phi_values, 'r-', alpha=0.2, linewidth=0.5)  # Lighten connecting lines

# Add graph title and labels
plt.title('φ(n) for n = 1–1000')
plt.xlabel('n')
plt.ylabel('φ(n)')
plt.grid(True, alpha=0.3)

plt.legend(['φ(n)'], loc='upper right')

# Show the graph
plt.tight_layout()
plt.show()

From the graph, it is not difficult to see that:

$\varphi(n)$ takes integer values, so the graph consists of discrete points, though it is usually drawn as a continuous-looking line or scatter plot.

When $n$ is a prime $p$, $\varphi(p) = p - 1$, so those points lie along the line $y = x - 1$; for example, $n = 11$ gives $\varphi(11) = 10$.

When $n$ is a prime power, such as $n = 9 = 3^2$, $\varphi(9) = 6$: the value is relatively high but lower than that of primes in the same range.

When $n$ is a composite with several prime factors, such as $n = 30 = 2 \cdot 3 \cdot 5$, $\varphi(30) = 8$, which is much lower than the neighboring prime points.

Characteristics of $\frac{\varphi(n)}{n}$#

Of course, we can go further and see how likely it is for the survey data to hit a non-coprime numerator and denominator. As before, if the prime factorization of $n$ is $n = p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m}$, then $\varphi(n) = n \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_m}\right)$.

Thus, we have $\frac{\varphi(n)}{n} = \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_m}\right)$.

This makes it simple: suppose the questionnaire total $n$ is scaled by one of its own prime factors $p_m$. Since $p_m \times n$ has exactly the same set of prime factors as $n$, we get $\frac{\varphi(p_m n)}{p_m n} = \frac{\varphi(n)}{n}$, so the probability that a given numerator is coprime to the total is identical at both scales. If someone's data really were so "precise" that no numerator was coprime to the reported total $p_m \times n$, the same probability would govern no numerator being coprime to the true total $n$. Since the probabilities are the same, why single this situation out? The scenario is clearly very unlikely. We are not practicing criminal law here, so there is no need for excessive modesty: a situation with such low probability can simply be ignored.

Suppose instead that $p_l$ is not a prime factor of the questionnaire total $n$. Then the coprimality probability for $p_l \times n$ differs considerably, since $\frac{\varphi(p_l n)}{p_l n} = \frac{\varphi(n)}{n}\left(1 - \frac{1}{p_l}\right)$. You can take a few small primes that do not divide $n$, multiply each into $n$, compare the results, and pick a reasonable candidate.
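A quick sketch of both cases, reusing the naive gcd-counting totient; 45 = 3² · 5 serves as the running example, multiplied once by an existing prime factor (3) and once by a new one (2):

```python
from math import gcd
from fractions import Fraction

def phi(n):
    """phi(n) by definition: count of 1 <= k <= n coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def ratio(n):
    """phi(n)/n as an exact fraction."""
    return Fraction(phi(n), n)

print(ratio(45))       # 8/15
print(ratio(45 * 3))   # 8/15 -> unchanged: 3 already divides 45
print(ratio(45 * 2))   # 4/15 -> halved: 2 is a new prime factor
```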

However, there’s no need to be discouraged; let’s take a look at the graph.

Here’s the following graph:

Figure_2

Perhaps the range can be larger:

Figure_3

This way, it becomes much clearer.

It can be seen that the graph exhibits obvious fractal characteristics and self-similar patterns, with multiple periodically descending structures; each "branch" corresponds to multiples of specific primes. As $n$ increases, the values fluctuate downward overall, while the average of $\frac{\varphi(n)}{n}$ approaches $\frac{6}{\pi^2} \approx 0.6079$, the asymptotic density of coprime pairs. (The above is generated by AI)
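That limiting average is easy to check numerically. The sketch below sieves $\varphi(n)$ for all $n$ up to $10^5$ (the same `result -= result // p` update rule as the article's `euler_phi`, just applied to all $n$ at once) and compares the mean of $\frac{\varphi(n)}{n}$ with $6/\pi^2$:

```python
import math

N = 100_000

# Sieve of phi: start with phi[n] = n, then for each prime p dividing n
# apply phi[n] -= phi[n] // p, i.e. multiply by (1 - 1/p).
phi = list(range(N + 1))
for p in range(2, N + 1):
    if phi[p] == p:  # p untouched so far, hence prime
        for mult in range(p, N + 1, p):
            phi[mult] -= phi[mult] // p

mean_ratio = sum(phi[n] / n for n in range(1, N + 1)) / N
print(mean_ratio, 6 / math.pi ** 2)  # both close to 0.6079
```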

Thus, since the collision probability does not decrease significantly, overall, this method is still quite reliable and can easily uncover some insincere fraud.

Defects#

Of course, this method also has its flaws. Aside from the aforementioned lack of rigor: since typical survey results are given as percentages with two decimal places, the error tolerance in this code is set to 0.0001. If the number of questionnaires exceeds 10,000, totals become indistinguishable at that precision, so the search is only meaningful up to about 10,000. If the percentages keep only one decimal place, the tolerance becomes 0.001 and the ceiling drops to about 1,000, and so on.

Additionally, exhaustive search lacks elegance. I initially had AI write a continued-fraction algorithm, but it failed my tests, producing numbers that were far too large in some cases. Since there are only 10,000 candidates and we are using Python anyway, there is no need to chase efficiency; brute force will suffice.

Lastly, there is another flaw, but I won’t mention it, as that would only increase the difficulty of my findings!

Finally, I’ll practice with some literature from CNKI to see how accurate it really is.

image

image

image

image

image

image

OK, that's a wrap.

This article is updated by Mix Space to xLog. The original link is https://www.actorr.cn/posts/default/Eulerstotient
