Social scientists have long debated how strongly social desirability bias affects the information that can be gathered from online survey responses. Yet it remains unclear how, and to what extent, this bias can be measured. Reviewing the relevant literature on this problem, we argue that the most promising way to measure social desirability bias is to manipulate it globally through an experimental design placed at the very start of a survey. If successful, this approach allows researchers to achieve three crucial goals that other approaches fall short of achieving simultaneously: 1) ensuring that social desirability, rather than confounders, is measured; 2) making it possible to check whether social desirability was actually manipulated; and 3) allowing social desirability pressures to be measured across any number of outcomes throughout the survey. Employing both novel treatment designs and designs drawn from established research, we demonstrate with pre-registered survey experiments in the United States (N = 5,000) and Denmark (N = 3,000) that this approach is much too risky for researchers to pursue. Specifically, we show that some treatment designs repeatedly fail to achieve manipulation (i.e., respondents do not believe their answers are being observed), whereas others achieve manipulation but do not affect outcomes known to be marred by social desirability (i.e., respondents do not care even when they know they are being observed). We conclude by advising scholars on which approaches are most feasible to pursue, judged by the extent to which they achieve the three crucial goals outlined above.