The Collective Aspect of ‘Incoming Vector’ Personalization

Jane wakes up in the morning. She yawns, stretches and reaches over to the night table. Lying right next to her glasses and a glass of water is her cell phone. She has seventeen updates from one app and eight from another. As she checks her account on one social media platform, a notification pops up that a friend of hers has posted a new video on another one. She rushes to check it and posts a supportive response. She rolls over to make sure her husband does not miss the video and, sure enough, he has seen the notification as well. This description may seem eerily familiar to many of us. People check their e-mails, apps and social media accounts before they even get out of bed or greet their spouse in the morning. We check our various online accounts during mealtimes and even while driving. Some have called these technologies the ‘cigarette of this century’, indicating just how addictive they are.

These technologies are alluring not only because of their social nature; they are also consciously designed to encourage prolonged use and user dependence. Platforms use several tools to encourage addiction, some even drawing on insights from the gambling world in the hope of getting their users increasingly addicted. Encouraging extended use is a key aspect of the platforms’ business model. Many platforms offer their users free online services. They can do this and still be among the wealthiest companies in the world because the users are not the platforms’ clients; rather, they are part of the product being sold to advertisers.

Generally speaking, there are two directions in which information flows between platforms and users. The first direction includes data that flows from users to the platforms. Labeled the ‘Outgoing Vector’ by Prof. Katrina Ligett, this flow of data includes information about a user’s online actions: what pages a user has ‘liked’, what sites she browses, her email correspondence, what products she shops for online, what she searches for on search engines and much more. Platforms can collect and organize this data, analyze it and draw insights about their users. While they often use this data for their own purposes, they sometimes sell the data or the insights derived from it to third parties. This data flow has received much regulatory attention in recent years through legal instruments such as the GDPR, CCPA and CPRA, which have influenced privacy practices around the world. These regulations address issues such as consent to data collection and processing, the purposes for which the data can be used, and the various ways in which users can control their data.

Platforms collect and analyze outgoing vector content because it serves as the basis for their main source of income: personalized advertising. In the past, advertising agencies would cast a fishing net and hope to catch consumers who were likely to be interested in the advertised product and ultimately purchase it. This is why the most frequent advertisers at the Super Bowl include companies such as Bud Light and Budweiser, Doritos, T-Mobile and Hyundai; women’s fashion magazines are an attractive venue for companies producing women’s perfume; and children’s TV shows draw advertisements from toy companies (often restricted by various regulations). With the rise of online platforms, the collection of vast amounts of personal data, and the development of machine learning capabilities to analyze this data, the world of advertising has changed substantially: it is now much less like fishing with a net and much more like fishing with a rod and bait, expecting to catch a particular fish at the time and place it can be expected to show up hungry. Platforms offer advertisers the ability to define the exact profile of the consumer they want their advertisement presented to. This potentially allows advertisers to be much more efficient in their spending.

Let’s return to Jane. The platforms she is active on have recently started to register a change in her behavior. She searched YouTube for videos of recipes for healthy quiches. She joined a Facebook group called ‘Healthy Living 2021’ and has tagged some of her tweets with the hashtag #healthyfood. Finally, she started posting pictures of elaborate salads on Instagram. Based on this data, it is now easy for platforms to infer that Jane will probably be a good target for advertisements for healthy food and that she may also be interested in content related to a healthy lifestyle more generally.

But it is not only advertisers who personalize the content presented to users. Personalized advertising is just one part of the second direction of information flow, the ‘Incoming Vector’: the content that flows from platforms to users. In fact, almost every aspect of Jane’s interaction with each and every platform she is active on is personalized. Platforms present content to users in a bid to encourage them to spend as much time as possible engaging with the platform. This is done using several tools. Platforms like Facebook suggest various ways in which a user can expand her interaction with the platform, based of course on what the platform knows about her. Once Jane has joined a healthy eating group, she can expect to see recommendations for joining other groups advancing similar interests. On her best friend’s birthday, Facebook will remind Jane to send her a Happy Birthday message. Facebook has even created its own ‘meta-relationship’, encouraging users to post a video celebrating the anniversary of becoming Facebook friends with another user.

Each individual’s newsfeed is personalized based on the profile the platform has of them. Jane’s newsfeed will most likely not present a balanced picture of reality. Rather, it will include reports about events that Jane has expressed an interest in and opinions that are similar to ones that Jane has supported. She will be presented with content that reinforces positions she already holds, not content that will challenge her beliefs or encourage her to reconsider or question her convictions. Moreover, platforms tend to display increasingly extreme content. This may have positive results at times. For example, if Jane starts searching for YouTube tutorials on how to lead a more active lifestyle, she may initially be directed to videos that encourage her to jog twice a week. Ultimately, these videos may inspire her to train to run a marathon. However, this tendency toward increasingly extreme content can also have dire consequences: encouraging extremism and facilitating polarization. It may encourage users who showed a slight interest in marginal movements to ultimately participate in violent demonstrations and to doubt the validity and truth of content that does not align with their beliefs. This tendency, coupled with the ability of foreign governments to interfere with national elections in various ways through social media platforms, poses a grave concern for democracy and for people’s trust in the system and its institutions.

Some of the ads that Jane will be presented with are for sneakers or a cell phone. However, platforms also post advertisements for job and housing opportunities. Under American law it is illegal to discriminate in access to housing and employment based on protected attributes such as race and gender. It is also illegal to discriminate in advertising for these fields. Despite this clear legal standard, research has found that Facebook discriminated in the presentation of such ads. Jane may not have been shown an employment opportunity because of her personal attributes as recognized by the platform, and she will never be aware of this missed opportunity.

While personalized content may be helpful, it can also have serious negative repercussions, and it is time that we start addressing these issues directly.

Any solution that seeks to address incoming vector personalization must be based on an acknowledgement of the collective nature of data. Platforms learn about Jane not only through her own interaction with the platform. They analyze the patterns of behavior of large groups of users. Platforms then compare Jane’s behavior to that of these groups and learn much through the comparison. Is Jane’s behavior similar to that of other members of the group? Inasmuch as it is, platforms’ recommendations for Jane will be based on the behavior of other group members (people who have expressed an interest in healthy eating are known to also be interested in different forms of exercise; people who like one artist often like another one). Information about Jane can be inferred by a platform based on an analysis of the behavior of other users, even if she isn’t a user of that platform. One of the early cases of problematic inferences occurred offline. Target, a retailer, tracked its customers’ purchases. Based on an analysis of the purchasing patterns of a large group of its female customers, it was able to build an algorithm that could identify that a woman was pregnant based on her purchases. The algorithm found this pattern in the behavior of one of Target’s customers, and the company sent her pregnancy-related coupons. Unfortunately, the customer was still in high school and had not told her parents about the pregnancy. This ‘discovery’ created a very uncomfortable situation, constituting a substantial infringement on the girl’s privacy. Platforms can infer many more highly personal attributes about their users, including gender, sexual orientation, political positions and even income.
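
To make the collective nature of such inferences concrete, here is a minimal sketch in Python. It is an illustration only and not a description of how any actual platform works: the user names, the interests, the `infer_interests` function and the `min_share` threshold are all hypothetical. The point it shows is that the platform can guess interests Jane never expressed by looking only at what users who resemble her have done.

```python
from collections import Counter

# Hypothetical toy data: each user's observed interests (not real platform data).
observed = {
    "jane": {"healthy_eating"},
    "user_a": {"healthy_eating", "running"},
    "user_b": {"healthy_eating", "running", "yoga"},
    "user_c": {"healthy_eating", "yoga"},
    "user_d": {"gaming"},
}


def infer_interests(target, observed, min_share=0.5):
    """Guess interests the target never expressed, using other users' behavior."""
    own = observed[target]
    # Peers: other users who share at least one observed interest with the target.
    peers = [s for user, s in observed.items() if user != target and s & own]
    if not peers:
        return {}
    # Count how often each interest the target lacks appears among her peers.
    counts = Counter(interest for s in peers for interest in s - own)
    # Keep interests held by at least `min_share` of the peers (assumed cutoff).
    return {
        interest: n / len(peers)
        for interest, n in counts.items()
        if n / len(peers) >= min_share
    }


inferred = infer_interests("jane", observed)
```

In this toy setting Jane has only ever signaled an interest in healthy eating, yet running and yoga are attributed to her because two of the three users who resemble her show those interests. Nothing in the inference depends on data Jane herself chose to share beyond her membership in the group.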

In order to address such personalization, and counter the damaging effects it can have, it is not enough to analyze the content presented to Jane. First, as the outgoing and incoming vectors strongly influence each other, any solution addressing the challenges of personalization must be able to analyze patterns across both vectors. Second, a collective point of view, which will enable identifying the personalization and its causes, must be adopted.

We must treat personalization as a matter of utmost importance for the public interest and not leave this area up to the control and discretion of powerful platforms with strong financial interests.


* This work is being carried out as part of the Data Co-Ops Project which focuses on interdisciplinary projects reimagining the data ecosystem, in collaboration with Georgetown University.