Reflecting on Consent At Scale


In the era of internet research, everyone is a participant.

A PhD student stood at the front of a crowded conference hall. They’d just presented their paper on social capital in distributed online communities. As the applause settled, an audience member scuttled to the microphone, eager to ask the first question.

A professor from University College. Thank you for the great talk. It was refreshing to attend a talk with such rigorous methods. You scraped data from so many different subreddits and made such a compelling argument for how these results will generalize to other online spaces. My question is less about the research and more about your experiences with data contributors. How did the various subreddit community members react when you talked to them about this exciting work?

What kind of question is this? the PhD student thought to themself. It’s not feasible to get consent from every user. We got an IRB exemption, got approval from subreddit moderators, and followed all the API terms of use and regulations for researcher access. Do other researchers really ask for consent at scale? Did I get consent…?

You may be in a similar situation now! Using social media data for research is a common method that has massive potential for large-scale analyses in both quantitative and qualitative research. However, it can be frustrating to simultaneously hold individual, affirmative consent as the gold standard and recognize its limitations as a viable option for many researchers. To that end, we’ve made a reading list about getting individual consent at scale, particularly in research settings. We hope this reading list serves as a provocation for discussion rather than a list of solutions to this problem.

Normative Papers

1. The “Ought-Is” Problem: An Implementation Science Framework for Translating Ethical Norms into Practice. Our resident ethicist (Leah Ajmani) loves this paper so much! It basically uses informed consent as a case to describe the larger translational effort needed to move from normative prescriptions to actual implementation.

2. Yes: Affirmative Consent as a Theoretical Framework for Understanding and Imagining Social Platforms. A contemporary classic in CHI, this paper does a really good job of describing affirmative consent as the ideal situation and then using the “ideal” for explanatory and generative purposes. There is merit to having an ideal, even if it is not perfectly attainable!

HCML Papers

We’re obviously biased because she’s a GroupLenser, but Stevie Chancellor does a great job at describing consent at scale as an ethical tension rather than a “must-have.” It is something researchers need to navigate with justified reasoning.

1. A Taxonomy of Ethical Tensions in Inferring Mental Health States from Social Media

2. Toward Practices for Human-Centered Machine Learning

Design Papers

These papers are both critical of current consent design and do a great job of discussing alternatives, even if they sit outside of a research context.

1. (Un)informed Consent: Studying GDPR Consent Notices in the Field

2. Limits of Individual Consent and Models of Distributed Consent in Online Social Networks

From grappling with moral nuance to designing better consent procedures, these readings can take our discussions of individual consent at scale from a theoretical ideal to an operationalizable goal. So, let’s embrace difficult discourse about how to move forward and continue to traverse the space between the idyllic and the feasible. Comment or tweet which papers you would add to this list!

Wordy Writer Survival Guide: How to Make Academic Writing More Accessible


As GroupLensers received CHI reviews back, many of us were told our papers were “long,” “inaccessible,” and even “bloated.” These critiques are fair. Human-Computer Interaction (HCI) research should be written for a broad and interdisciplinary audience. However, inaccessible writing can be hard to fix, especially if it is your natural writing style. Here’s some advice from GroupLens’s very own Stevie Chancellor (computer science professor, PhD advisor, and blogger about everything writing-related):

Sentence Structure

  • Sentence Length: How long are your sentences, and how many comma-dependent clauses are going on per paragraph? Long sentences are more complicated to read and, therefore, harder to parse. Some people say to split any sentence with more than 25 words. Eh. 30-35 should be fine for academic writing, but longer is worser. 
  • Commas, Commas, Commas: Comma-separated clauses are painful to follow. A comma is a half-stop in writing and momentarily pauses trains of thought. While some commas are grammatically necessary (see the one that follows this parenthesis), too many commas chop your sentences into pieces. Therefore, too many commas interrupt your reader’s comprehension of your idea.
  • Sentence Cadence: How are you varying the cadence of your writing? Do you use short sentences, then longer sentences, and vary the structure and placement of comma clauses? Using ONLY long sentences gets repetitive and, therefore, more challenging to read.
  • Topic Sentence and Transition Clarity: Topic and “transition” sentences should be crystal clear in their simplicity. Interior sentences can be more elaborate/have more “meat.”
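The sentence-length and comma advice above can be turned into a rough self-check. The following is a minimal sketch, not a tool from the original post: the function names are made up for illustration, the sentence splitter is deliberately naive (it breaks on ".", "!", or "?" followed by whitespace, so abbreviations like "e.g." will confuse it), and the 35-word default simply mirrors the upper bound suggested above.

```python
import re


def split_sentences(text):
    """Naively split text on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text, max_words=35):
    """Return (word_count, comma_count, sentence) for each sentence over max_words.

    max_words=35 is an assumption taken from the rule of thumb above.
    """
    flagged = []
    for sentence in split_sentences(text):
        words = sentence.split()
        if len(words) > max_words:
            flagged.append((len(words), sentence.count(","), sentence))
    return flagged


if __name__ == "__main__":
    # Replace this made-up draft with a paragraph from your own paper.
    draft = "This is short. " + " ".join(["word"] * 40) + "."
    for n_words, n_commas, sentence in long_sentences(draft):
        print(f"{n_words} words, {n_commas} commas: {sentence[:60]}...")
```

A flagged sentence is only a candidate for splitting; cadence and clarity still need a human read.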

Word Choice

  • Simple Words are Better: Are we using the simplest words possible to describe what we mean? For example: do not write “utilize” as a synonym for “use”. Just say “use”.
  • Active vs. Passive Voice: Are you overusing the passive voice instead of the active? Passive voice is occasionally correct, especially when needed to soften a claim (e.g., “Research has suggested that….”). But too much passive voice is hard to read.
  • Filler Words: Look for words that contribute nothing to the idea but make your sentence longer. Adverbs and fluffy adjectives are common culprits of this. Adverbs like “very”, “fairly”, and “clearly” provide almost NO substance to writing but lengthen the sentence.
  • Weasel Words: Inspired by Matt Might, check your writing for “weasel words” that weaken the clarity of your sentence. Do you need to say an experiment was “mostly successful, but had limitations”? Or can you say, “The experiment was successful in X and Y with less success in Z”?
  • Citations vs. Names: Be judicious with \citet{} in your writing. Invoking someone’s name is equivalent to inviting that person to a dinner party and forces the reader to pay attention to the “who’s who” of your writing. Who do you want to invite to your home? Remember, you’re in charge of maintaining conversation during the party and providing food for everyone, so be careful who you invite.

Pragmatic Decisions/Actions

  • Read Aloud: Read “dense” or “inaccessible” sections out loud. Say them with your mouth. Long, poorly structured paragraphs become obvious when read out loud.
  • Use a Friend or Colleague To Kill Your Darlings: Friends and colleagues with no emotional connection to the paper are great for removing self-indulgent yet non-essential writing. Ask a friend to read a section and “kill your darlings.”
  • Use AI Tools Judiciously: Tools such as Grammarly Premium, Writefull, or ChatGPT/Bard/LLM du jour can do first passes for wordiness and phrasing. For example, Grammarly Premium provides swaps for too-long phrases (and is free if you have a SIGCHI membership). LLMs can trim your writing by 10%. Just check the accuracy of the edits and make sure they maintain the same tone and argumentation.
  • Ctrl + F Is Your Friend: Recognize your writing “quirks” and ctrl + f to search for and cut them. Stevie’s writing quirks include using adverbs in initial drafts, meaning that searching for “very” and “ly” returns many words to cut.
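The Ctrl + F habit above can also be scripted for a whole draft. This is a minimal sketch under stated assumptions: `find_quirks` is a hypothetical helper, the default filler list reuses the three adverbs named earlier in this post, and the "-ly" check is a crude suffix match that will also catch non-adverbs such as "family" or "reply".

```python
import re
from collections import Counter


def find_quirks(text, fillers=("very", "fairly", "clearly")):
    """Count filler words and words ending in 'ly' in a draft.

    The filler list is an assumption; swap in your own quirks.
    Words of four letters or fewer (e.g. "only", "fly") are skipped
    by the suffix check to cut down on false positives.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter()
    for word in words:
        if word in fillers or (word.endswith("ly") and len(word) > 4):
            counts[word] += 1
    return counts


if __name__ == "__main__":
    # Made-up sentence for illustration only.
    draft = "This is very, very clearly a really long family reply."
    for word, count in find_quirks(draft).most_common():
        print(f"{word}: {count}")
```

Treat the output as a list of places to look, not a list of words to delete.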

From managing sentence structure to choosing simple words, these tips can take your writing from “in the clouds” to a reader-friendly and enjoyable experience. Remember, the goal is not just brevity but clarity, ensuring that our work resonates with a broad and interdisciplinary audience. So, let’s embrace these tips, Ctrl + F our way through, and invite our readers to a well-organized and engaging intellectual dinner party. Cheers to more accessible and impactful HCI research!

Page Protection: The Blunt Instrument of Wikipedia


Wikipedia is a 22-year-old, wonky, online encyclopedia that we’ve all used at some point. Currently (2023), Wikipedia has a dizzying amount of information in numerous languages. English Wikipedia alone has over 6 million articles and 40,000 active editors. The allure of Wikipedia articles is that they are highly formatted and community-governed; while anyone can contribute to a Wikipedia article, there’s a vast infrastructure of admins, experienced editors, and bots who maintain the platform’s integrity. Wikipedia’s About page reads:

“Anyone can edit Wikipedia’s text, references, and images. What is written is more important than who writes it. The content must conform with Wikipedia’s policies, including being verifiable by published sources […] experienced editors watch and patrol bad edits.”

Our research aims to understand the tension between open participation and information quality that underlies Wikipedia’s moderation strategy. In other words, how does maintaining Wikipedia as a factual encyclopedia conflict with the value of free and open knowledge? Specifically, we look at page protection, an intervention where administrators can “lock” articles to prevent unregistered or inexperienced editors from contributing.

We used quasi-causal methods to explore the effects of page protection. Specifically, we created two datasets: (1) a “treatment set” of page-protected articles and (2) a “control set” of unprotected articles, each similar to a treated article in terms of article activity, visibility, and topic. We then asked: does page protection affect editor engagement consistently?

Our findings show that page protection dramatically but unpredictably affects Wikipedia editor engagement. We plotted the kernel density estimate (KDE) of the difference between the number of editors before page protection and the number after protection. We evaluated this metric across three time windows: seven, fourteen, and thirty days. Not only is the spread huge, but it also spans both negative and positive differences. In essence, we cannot predict whether page protection decreases or increases the number of people editing an article.
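For readers unfamiliar with kernel density estimates, here is a minimal sketch of the general idea using only the Python standard library. It is not the analysis code from the paper: the function names, the sign convention (after minus before), the bandwidth, and the editor counts are all assumptions made up for illustration.

```python
import math


def editor_differences(before, after):
    """Per-article change in editor count (after minus before; sign is an assumption)."""
    return [a - b for b, a in zip(before, after)]


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


if __name__ == "__main__":
    # Made-up editor counts for four hypothetical articles, NOT study data.
    before = [5, 8, 3, 10]
    after = [2, 12, 3, 4]
    diffs = editor_differences(before, after)  # [-3, 4, 0, -6]
    density = gaussian_kde(diffs, bandwidth=2.0)
    for x in range(-8, 9, 4):
        print(f"difference {x:+d}: density {density(x):.3f}")
```

A density with substantial mass on both sides of zero is what "spans both negative and positive differences" looks like in practice: some articles gain editors after protection and others lose them.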

Are heavy-handed moderation interventions necessary for a complex platform such as Wikipedia? How can we design these non-democratic means of control to maintain a participatory nature? Check out our paper for discussions on these questions or come to my talk on October 16, 2023 at 4:30pm!

How Can Collaborative Tools Improve Online Learning with VR Video?


Virtual Reality (VR) has long been touted as a revolutionary technology, offering a unique and immersive learning experience that can transport students to far-flung locations and bring abstract concepts to life. However, one of the biggest challenges for VR adoption has been the high cost of creating VR content. Instructors often need help from VR developers or 3D model designers, because it’s hard to find or create content that fits their classes.

With the proliferation of inexpensive panoramic consumer video cameras and various types of video editing software, using 360-degree videos in VR has attracted more attention as an alternative way for instructors to build a realistic and immersive environment. It is a “more user-friendly, realistic and affordable” way to create a digital experience compared to developing a simulated VR environment.

Pedagogically, collaborative learning is better than individual learning in many scenarios. This points to a research gap in the development and empirical investigation of collaborative VR video learning environments.

We designed two collaborative modes to investigate the roles of collaborative tools and shared video control, and compared them with a plain video player (see our demo in the following video). Each mode contains a video viewing system and an after-video platform for further discussion and collaboration. Basic mode uses a conventional VR video viewing system together with an existing, widely available online platform. Non-sync mode includes a collaborative video viewing system with individual control of the video timeline and an in-VR platform for after-video discussion. Sync mode contains the same in-VR after-video platform, but students have shared video control.

The study aimed to answer two research questions: 

RQ1: How does VR video delivery via existing technology (Basic mode) compare to collaborative VR video delivery (Sync and Non-sync modes) on measures of knowledge acquisition, collaboration, social presence, cognitive load, and satisfaction?
RQ2: How does individual VR video control (Non-sync mode) compare with shared video control (Sync mode) on measures of knowledge acquisition, collaboration, social presence, cognitive load, and satisfaction?

To examine the influence of different types of collaborative technology on the perceptions and experiences of online learning, we conducted a three-condition within-subject experiment with 54 participants (18 groups of three). We collected quantitative data from knowledge assessments, self-reported questionnaires, and log files, then triangulated the validated measures with qualitative data from semi-structured interviews.

Figure 1. Study protocol

For RQ1, we found that both collaborative VR-based systems achieved statistically significantly higher scores on the measures of visual knowledge acquisition, collaboration, social presence, and satisfaction than the baseline system. In the qualitative results, participants reported potential reasons for the lower scores of Basic mode on collaboration and satisfaction, such as a lack of shared context and current technical obstacles (e.g., echoes). They also appreciated the in-VR platform’s power to transmit and display visuals for after-video discussion, which may explain the lower scores of Basic mode on visual knowledge acquisition.

For RQ2, the shared control in Sync mode significantly increased the ease of collaboration and the sense of social presence. In particular, shared control significantly increased view similarity (where the team was watching the same view of the video) and discussion time during the video. Based on the qualitative results, collaboration experiences were better with shared control in Sync mode due to better communication comfort. There was a tension between communication comfort and learning pace flexibility, and the control method influenced the perceived usefulness of collaborative tools.

Our work provides implications for design and research on collaborative VR video viewing. One important implication is the need to balance the trade-off between learning pace flexibility and communication comfort based on teaching needs. The expectations for time flexibility and collaboration experience might differ for diverse educational activities and learning scenarios. Therefore, VR collaborative applications should decide whether or not to use shared control based on specific purposes.

Finally, takeaways from this paper:

  • Collaborative VR video viewing systems can improve visual knowledge acquisition, collaboration, social presence, and satisfaction compared to the conventional system.
  • Shared video control in VR video viewing can enhance collaboration experiences by increasing communication comfort, but may also reduce learning pace flexibility.
  • In-VR platforms for after-video discussion can enhance visual transmission and engagement, and improve the overall learning experience in collaborative VR video environments.

Find more information in our paper here, coming to CHI 2023!

Cite this paper:

Qiao Jin*, Yu Liu*, Ruixuan Sun, Chen Chen, Puqi Zhou, Bo Han, Feng Qian, and Svetlana Yarosh. 2023. Collaborative Online Learning with VR Video: Roles of Collaborative Tools and Shared Video Control. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). https://doi.org/10.1145/3544548.3581395

Toward Practices for Human-Centered Machine Learning



People are excited about human-centered AI and machine learning to make AI more ethical and socially appropriate. AI has captured the popular zeitgeist with promises of generalized artificial intelligence that can solve many complex human problems. These promises of ML, however, have had negative consequences, with both ridiculous and catastrophic failures – they rack up so fast that colleagues are keeping AI incident databases, reports of AI ethics failures, and more to boot.

How will ML researchers and engineers avoid these problems and move towards more compassionate and responsible ML? There aren’t many concrete guidelines on what it looks like to do human-centered machine learning in practice. And while there are some pragmatic guides, they often lack the connection between technical and social/cultural/ethical focus.

In my recently published CACM article, I argue that there is a gap in building human-centered systems: the gap between values we hold but have no actionable methods for, and technical methods that don’t align with our values. The paper argues for practices that bridge ever-significant values and ever-practical methods.

This paper synthesizes my CS and Critical Media Studies background in thinking about how we should DO HCML. It also builds on my decade of experience in human-centered research in a challenging area – predicting and acting on dangerous mental health behaviors discussed on social media. It builds on classical definitions of human-centeredness in defining HCML and lays out five practices for researchers and practitioners. These practices ask us to prioritize technical advancements EQUAL TO our commitments to social realities. In doing this, we can make genuinely impactful technical systems that meet people and communities where they’re at.

Here are the five big takeaways from the paper and the practices you can implement immediately.

  1. Ask if machine learning is the appropriate approach to take 
  2. Acknowledge that ML decisions are “political”
  3. Consider more than just a single “user” of an ML system
  4. Recognize other fields’ contributions to HCML 
  5. Think about ML failures as a point of interest, not something to be afraid of

Let’s dig into one of these: considering more than just a single “user” of an ML system. When considering who “uses” a system, we often only consider the person commissioning or building the system. Even in HCI, we talk about “users” of systems and (if lucky) the people whose data goes into the model. However, many systems have much larger constellations of people “involved” in the ML model. For example, the “user” may be a government or business in facial recognition technology. But the people whose faces are in that system are also “users” of the technology. Likewise, if that facial recognition system is used in an airport to screen passengers for flight identification, everyone who walks by ambiently may interact with it. The ML system also meaningfully impacts a user who chooses NOT to interact with it if opting out means they must spend more time in airport security or have their identity scrutinized more closely. Both examples make it clear that many stakeholders are involved in an ML model, and we should consider all of them, including those whose data goes into creating the model.

I aim for these principles to inspire action – to encourage more profound research, empirical evaluations, and new ML methods. I also hope the practices make human-centered activities more tractable for researchers AND practitioners. I hope this inspires you and your colleagues to ask hard questions that may mean making bold decisions, taking action, and balancing these competing priorities in our work. 

You can read more about this paper in the recently published Featured Article in the Communications of the ACM here.