Workshop
1st Workshop on Multimodal Content Moderation
Mei Chen · Cristian Canton · Davide Modolo · Maarten Sap · Maria Zontak · Chris Bregler
East 17
Keywords: CV for social good
Content moderation (CM) is a rapidly growing need in today’s industry, with high societal impact: automated CM systems can detect discrimination, violent acts, hate/toxicity, and much more across a variety of signals (visual, text/OCR, speech, audio, language, generated content, etc.). Leaving unsafe content on social platforms and devices can cause a variety of harmful consequences, including brand damage to institutions and public figures, erosion of trust in science and government, marginalization of minorities, geo-political conflicts, suicidal ideation, and more. Beyond user-generated content, content generated by powerful AI models such as DALL-E and GPT presents additional challenges to CM systems.
With the prevalence of multimedia social networking and online gaming, sensitive content detection and moderation is by nature a multimodal problem. The Hateful Memes dataset [1] highlights this: an image of a skunk and the sentence “you smell good” are benign/neutral separately, but can be hateful when interpreted together. Another aspect is the complementary nature of multimodal analysis, since individual modalities can be ambiguous when interpreted in isolation. Moreover, content moderation is contextual and culturally multifaceted; for example, different cultures have different conventions about gestures. This requires CM approaches to be not only multimodal, but also context-aware and culturally sensitive.
Despite the urgency and complexity of the content moderation problem, it has not been an area of focus in the research community. By having a workshop at CVPR, we hope to bring attention to this important research and application area, build and grow the community of interested researchers, and generate new discussion and momentum for positive social impact. Through invited talks, panels, and paper submissions, this workshop will build a forum to discuss ongoing efforts in industry and academia, share best practices, and engage the community in working towards socially responsible solutions for these problems.
With organizers from across industry and academia, and speakers who are experts in the relevant technical and policy disciplines, we are confident that the Workshop on Multimodal Content Moderation (MMCM) will complement the main conference: it will strengthen and nurture a community for interdisciplinary, cross-organization knowledge sharing, push the envelope of what is possible, and improve the quality and safety of multimodal sensitive content detection and moderation solutions that benefit society at large.
Schedule
Sun 8:30 a.m. - 8:45 a.m. | Opening Remarks and Logistics for the Day (Presentation)
Sun 8:45 a.m. - 9:15 a.m. | Red teaming Generative AI Systems (Invited Talk)
As generative AI systems continue to evolve, it is crucial to rigorously evaluate their robustness, safety, and potential for misuse. In this talk, we will explore the application of red teaming methodologies to assess the vulnerabilities and limitations of these cutting-edge technologies. By simulating adversarial attacks and examining system responses, we aim to uncover latent risks and propose effective countermeasures to ensure the responsible deployment of generative AI systems in new domains and modalities.
Sun 9:15 a.m. - 9:45 a.m. | Fact Checking 101 (Invited Talk)
In this talk Mevan will take a deep dive into the world of fact checks and fact checking. Together we'll explore the real-world context: How many fact checkers are out there? How are they organised? How do they fit into the information ecosystem? We'll then look at how fact checking actually works on the ground: Is it an effective intervention? Does it change minds? How are fact checks actually made? We'll end on the modern-day challenges posed by specific examples of mis/disinformation, GenAI, and global data infrastructure, exploring the opportunities and limitations, and how these will affect the future of information credibility around the world.
Sun 9:45 a.m. - 10:15 a.m. | Content Moderation: Two Histories and Three Emerging Problems (Invited Talk)
The technical challenges of identifying toxic content are so immense that they can often eclipse the fact that identification is just one element of ‘content moderation’ as a much broader sociotechnical practice. Considering the broader historical context of content moderation helps to explain why moderation is so difficult, identify why good technical solutions don’t always make good social or political ones, and reframe what problems we’re even trying to solve. I will close by highlighting three problems that I hope will open provocative and challenging questions for those working on moderation as a technical problem.
Sun 10:15 a.m. - 10:30 a.m. | Coffee Break
Sun 10:30 a.m. - 11:00 a.m. | Bias, Causality and Generative AI (Invited Talk)
Sun 11:00 a.m. - 11:45 a.m. | Policy, Social Impact, Trust & Safety (Panel Discussion)
Sun 11:45 a.m. - 1:00 p.m. | Lunch Break
Sun 1:00 p.m. - 1:30 p.m. | Generative Media Unleashed: Advancing Diffusion Models with Safety in Mind (Invited Talk)
This talk explores the exciting opportunities in diffusion models for image, video, and 3D generation. We'll dive into various generative media applications that will transform and expand the creative economy and help humans become more creative. At the same time, I will emphasize the vital importance of trust and safety to ensure responsible and ethical utilization of these powerful technologies. I'll put forward a number of proposals for addressing safety, specifically in the generative media space.
Sun 1:30 p.m. - 2:00 p.m. | Data Collection for Content Moderation (Invited Talk)
Data collection and curation is an integral, yet often overlooked, component of building content moderation systems. In this presentation we'll discuss optimizing data annotation, the effects of data quality and quantity on overall model performance, techniques for identifying and alleviating biases in models, and appropriate applications of synthetic data.
Sun 2:00 p.m. - 2:15 p.m. | CrisisHateMM: Multimodal Analysis of Directed and Undirected Hate Speech in Text-Embedded Images from Russia-Ukraine Conflict (Accepted Paper)
Text-embedded images are frequently used on social media to convey opinions and emotions, but they can also be a medium for disseminating hate speech, propaganda, and extremist ideologies. During the Russia-Ukraine war, both sides used text-embedded images extensively to spread propaganda and hate speech. To aid in moderating such content, this paper introduces CrisisHateMM, a novel multimodal dataset of over 4,700 text-embedded images from the Russia-Ukraine conflict, annotated for hate and non-hate speech. The hate speech is further annotated as directed or undirected, with directed hate speech additionally annotated for individual, community, and organizational targets. We benchmark the dataset using unimodal and multimodal algorithms, providing insights into the effectiveness of different approaches for detecting hate speech in text-embedded images. Our results show that multimodal approaches outperform unimodal approaches, highlighting the importance of combining visual and textual features. This work provides a valuable resource for researchers and practitioners in automated content moderation and social media analysis. The CrisisHateMM dataset and code are made publicly available at https://github.com/aabhandari/CrisisHateMM.
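The abstract does not specify the benchmark architectures; purely as an illustration of why combining modalities helps, below is a minimal late-fusion sketch in which precomputed image and text embeddings are concatenated and classified jointly. The encoder dimensions, head architecture, and two-class setup are assumptions for the example, not the paper's models.

```python
# Minimal late-fusion sketch (illustrative only, not the CrisisHateMM benchmark models).
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenates an image embedding and a text embedding, then classifies the pair."""
    def __init__(self, img_dim=512, txt_dim=768, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, img_emb, txt_emb):
        # img_emb: (B, img_dim) from any image encoder; txt_emb: (B, txt_dim) from any text encoder
        return self.head(torch.cat([img_emb, txt_emb], dim=-1))

model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 768))  # toy batch of 4 text-embedded images
print(logits.shape)  # torch.Size([4, 2])
```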
Sun 2:15 p.m. - 2:30 p.m. | Prioritised Moderation for Online Advertising (Accepted Paper)
The online advertising industry aims to build a preference for a product over its competitors by making consumers aware of the product at internet scale. However, ads that violate applicable laws and location-specific regulations can have serious business impact and legal implications. At the same time, customers risk being exposed to egregious ads, resulting in a bad user experience. Because human bandwidth is limited and costly, moderating ads at industry scale is a challenging task. At Amazon Advertising, we typically deal with ad moderation workflows where the ad distributions are skewed towards non-defective ads, so it is desirable to increase the review time that human moderators spend on genuinely defective ads. Prioritising likely-defective ads for human moderation is therefore crucial for effective utilisation of human bandwidth in the ad moderation workflow. To incorporate business knowledge and to better handle possible overlaps between policies, we formulate this as a policy gradient ranking algorithm with custom scalar rewards. Our extensive experiments demonstrate that these techniques yield a substantial gain in the number of defective ads caught compared with various tabular classification algorithms, resulting in effective utilisation of human moderation bandwidth.
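The abstract frames prioritisation as a policy-gradient ranking problem with custom scalar rewards but gives no architectural details. Below is a minimal REINFORCE-style sketch under assumed components (a small scoring MLP, Plackett-Luce sampling of a top-k review slate, and a reward equal to the number of defective ads surfaced); it illustrates the general technique, not the authors' system.

```python
# REINFORCE-style ranking sketch with a Plackett-Luce sampling policy (illustrative only).
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Scores each ad's feature vector; higher scores are reviewed earlier."""
    def __init__(self, n_features):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                  # x: (n_ads, n_features)
        return self.mlp(x).squeeze(-1)     # (n_ads,)

def sample_top_k(scores, k):
    """Sample k ads without replacement (Plackett-Luce); return indices and total log-prob."""
    mask = torch.zeros_like(scores)
    chosen, log_probs = [], []
    for _ in range(k):
        dist = torch.distributions.Categorical(logits=scores + mask)
        idx = dist.sample()
        log_probs.append(dist.log_prob(idx))
        chosen.append(idx.item())
        mask = mask.clone()
        mask[idx] = float("-inf")          # already-selected ads cannot be chosen again
    return chosen, torch.stack(log_probs).sum()

net = ScoreNet(16)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
features = torch.randn(200, 16)                        # toy batch of ad features
defective = torch.zeros(200)
defective[torch.randperm(200)[:10]] = 1.0              # ~5% of ads are defective

for step in range(300):
    chosen, log_prob = sample_top_k(net(features), k=20)
    reward = defective[chosen].sum()                   # scalar reward: defective ads caught in top 20
    baseline = defective.mean() * 20                   # expected catch of a random policy
    loss = -(reward - baseline) * log_prob             # REINFORCE with a simple baseline
    opt.zero_grad()
    loss.backward()
    opt.step()
```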
Sun 2:30 p.m. - 3:00 p.m. | Understanding Health Risks for Content Moderators and Opportunities to Help (Invited Talk)
Social media platforms must detect a wide variety of unacceptable user-generated images and videos. Such detection is difficult to automate due to high accuracy requirements, continually changing content, and nuanced rules for what is and is not acceptable. Consequently, platforms rely in practice on a vast and largely invisible workforce of human moderators to filter such content when automated detection falls short. However, mounting evidence suggests that exposure to disturbing content can cause lasting psychological and emotional damage to moderators. Given this, what can be done to help reduce such impacts? My talk will discuss two works in this vein. The first involves the design of blurring interfaces for reducing moderator exposure to disturbing content whilst preserving the ability to quickly and accurately flag it. We find that interactive blurring can reduce psychological impacts on workers without sacrificing moderation accuracy or speed (see demo at http://ir.ischool.utexas.edu/CM/demo/). Following this, I describe a broader analysis of the problem space, conducted in partnership with clinical psychologists responsible for wellness measurement and intervention in commercial moderation settings. This analysis spans both social and technological approaches, reviewing current best practices and identifying important directions for future work, as well as the need for greater academic-industry collaboration.
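The speaker's interactive blurring interface is available at the demo link above; purely as an illustration of the underlying idea, here is a tiny sketch in which a flagged image is shown heavily blurred by default and the moderator dials the blur down only as far as needed to make a decision. The exposure parameter and radius scale are assumptions for the example.

```python
# Tiny exposure-reduction sketch: default to a heavy blur, let the moderator reveal gradually.
from PIL import Image, ImageFilter

def blurred_preview(path: str, exposure: float) -> Image.Image:
    """exposure in [0, 1]: 0.0 = maximally blurred preview, 1.0 = original image."""
    image = Image.open(path)
    max_radius = 24                                    # assumed maximum blur radius in pixels
    radius = max_radius * (1.0 - exposure)
    return image.filter(ImageFilter.GaussianBlur(radius=radius))

# A review session might start at exposure=0.0 and step up only if a decision cannot be made:
# preview = blurred_preview("flagged_upload.jpg", exposure=0.25)
```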
Sun 3:00 p.m. - 3:30 p.m. | Coffee Break
Sun 3:30 p.m. - 4:00 p.m. | Building End-to-End Content Moderation Pipelines in the Real World (Invited Talk)
In this talk, we explore a holistic approach to building a natural language classification system tailored for content moderation in real-world scenarios. We discuss the importance of crafting well-defined content taxonomies and labeling guidelines to ensure data quality, and detail the active learning pipeline developed to handle rare events effectively. We also examine various techniques used to enhance the model's robustness and prevent overfitting. The approach generalizes to diverse content taxonomies, and the resulting classifiers can outperform standard off-the-shelf models in the context of content moderation.
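The specifics of the speaker's active learning pipeline are not given in the abstract; as a sketch of the general pattern (uncertainty sampling to surface rare policy-violating examples for annotation), the toy loop below uses a synthetic pool, a logistic-regression scorer, a small seed set assumed to contain a few known violations, and simulated human labels. Every component is an assumption for illustration.

```python
# Toy uncertainty-sampling active learning loop for rare-event content moderation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(10_000, 32))                                   # unlabeled content embeddings
true_y = (X_pool[:, 0] + 0.1 * rng.normal(size=10_000) > 2.5).astype(int)  # rare violations (<1%)

seed_pos = np.where(true_y == 1)[0][:5]                                  # assume a few known violations
seed_neg = rng.choice(np.where(true_y == 0)[0], 45, replace=False)
labeled = list(np.concatenate([seed_pos, seed_neg]))

for round_ in range(10):
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X_pool[labeled], true_y[labeled])

    # Score the remaining pool and send the most uncertain items to annotators.
    rest = np.setdiff1d(np.arange(len(X_pool)), labeled)
    p = clf.predict_proba(X_pool[rest])[:, 1]
    uncertainty = -np.abs(p - 0.5)                     # closest to the decision boundary
    picked = rest[np.argsort(uncertainty)[-100:]]      # 100 items per annotation round
    labeled.extend(picked.tolist())                    # human labels simulated by true_y

    print(f"round {round_}: labeled={len(labeled)}, violations found={true_y[labeled].sum()}")
```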
Sun 4:00 p.m. - 4:30 p.m. | Disrupting Disinformation (Invited Talk)
We are awash in disinformation consisting of lies and conspiracies, with real-world implications ranging from horrific human rights violations to threats to our democracy and global public health. Although the internet is vast, the peddlers of disinformation appear to be more localized. I will describe a domain-level analysis for predicting whether a domain is complicit in distributing or amplifying disinformation. This process analyzes the underlying domain content and the hyperlinking connectivity between domains to predict whether a domain is peddling disinformation. These basic insights extend to an analysis of disinformation on Telegram and Twitter.
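The talk's actual features and models are not described in the abstract; the toy sketch below only illustrates the general idea of blending a domain's own content-based score with the scores of the domains it hyperlinks to. The graph, the scores, and the mixing weight are made up for the example.

```python
# Toy domain-level sketch: combine content scores with hyperlink-neighborhood scores.
import networkx as nx

# Directed hyperlink graph between domains (edge = "links to").
G = nx.DiGraph()
G.add_edges_from([
    ("newsA.com", "conspiracyB.net"),
    ("conspiracyB.net", "conspiracyC.org"),
    ("conspiracyC.org", "conspiracyB.net"),
    ("newsA.com", "wireService.com"),
])

# Content-only disinformation scores in [0, 1] (e.g., from a text classifier over the domain's pages).
content_score = {
    "newsA.com": 0.10, "wireService.com": 0.05,
    "conspiracyB.net": 0.80, "conspiracyC.org": 0.75,
}

def domain_score(domain, alpha=0.6):
    """Blend a domain's own content score with the mean score of the domains it links to."""
    neighbors = list(G.successors(domain))
    if not neighbors:
        return content_score[domain]
    neighbor_mean = sum(content_score[n] for n in neighbors) / len(neighbors)
    return alpha * content_score[domain] + (1 - alpha) * neighbor_mean

for d in G.nodes:
    print(f"{d:20s} content={content_score[d]:.2f} combined={domain_score(d):.2f}")
```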
Sun 4:30 p.m. - 4:40 p.m. | Safety and Fairness for Content Moderation in Generative Models (Work-in-Progress Spotlight)
With significant advances in generative AI, new technologies are rapidly being deployed with generative components. Generative models are typically trained on large datasets, resulting in model behaviors that can mimic the worst of the content in the training data. Responsible deployment of generative technologies requires content moderation strategies, such as safety input and output filters. Here, we provide a theoretical framework for conceptualizing responsible content moderation of text-to-image generative technologies, including a demonstration of how to empirically measure the constructs we enumerate. We define and distinguish the concepts of safety, fairness, and metric equity, and enumerate example harms that can arise in each domain. We then demonstrate how the defined harms can be quantified, and conclude with a summary of how this style of harms quantification enables data-driven content moderation decisions.
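The spotlight's framework is conceptual, but the "safety input and output filters" it mentions follow a common wrapper pattern; the sketch below is a minimal illustration with hypothetical stand-in components (the generator and both classifiers are placeholders), not the authors' implementation.

```python
# Minimal input/output safety-filter wrapper around a text-to-image generator (stand-ins only).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ModeratedGenerator:
    generate: Callable[[str], bytes]           # text-to-image model: prompt -> image bytes
    prompt_is_unsafe: Callable[[str], bool]    # input filter on the prompt
    image_is_unsafe: Callable[[bytes], bool]   # output filter on the generated image

    def __call__(self, prompt: str) -> Optional[bytes]:
        if self.prompt_is_unsafe(prompt):
            return None                        # block before spending compute
        image = self.generate(prompt)
        if self.image_is_unsafe(image):
            return None                        # block unsafe generations
        return image

# Usage with trivial stand-ins:
gen = ModeratedGenerator(
    generate=lambda p: b"<image bytes>",
    prompt_is_unsafe=lambda p: "violence" in p.lower(),
    image_is_unsafe=lambda img: False,
)
print(gen("a watercolor of a lighthouse"))     # returns image bytes
print(gen("graphic violence"))                 # returns None (blocked at input)
```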
Sun 4:40 p.m. - 5:25 p.m. | Technology & Approach (Panel Discussion)
Sun 5:25 p.m. - 5:30 p.m. | Closing Remarks (Presentation)