Signa_Lab ITESO, Enjambre Digital,
Openlabs Tecnológico de Monterrey
Translated by Alexandra Argüelles
On June 21st, network scientist Albert-László Barabási was interviewed on Carmen Aristegui’s newscast. During the interview, a post from the blog of Maven7us (a company in which Barabási is a partner) was presented; the post allegedly resulted from an unpublished study, carried out from a network science perspective, on the influence of bots during the Mexican presidential election campaigns. Taken together, the information provided during the newscast and in the post leads to the following conclusions:
- Of over a million Twitter accounts that were analyzed, 53% have more than a 50% chance of being bots.
- Between 45% and 67% of the presidential candidates’ followers have more than a 50% chance of being bots.
- If the Twitter conversation is “cleansed” of the accounts identified as bots, most of the comments support Anaya and oppose López Obrador.
- The accounts that were identified as bots support the PRI (Institutional Revolutionary Party).
Barabási’s interview generated mixed reactions. On one hand, supporters of López Obrador (AMLO) responded on social media with the hashtag #NoSoyBot (“I’m not a bot”) to dispute the claim that over 60% of AMLO’s followers are automated accounts. On the other, the political alliance “Por México al Frente” (“For Mexico to the Front”) announced a lawsuit against AMLO’s campaign, claiming that it exceeded the authorized campaign budget in order to hire bots.
Meanwhile, several groups of academics and network analysts that share tools and methodologies (Openlabs, Enjambre Digital and ITESO’s Signa_Lab) have been working together to address the statements and results of the article on which the material presented during Aristegui’s newscast is allegedly based.
This document offers a social science perspective on the network analysis behind those results, in light of research and observations carried out in Mexico, in order to promote critical thought and dialogue about socio-digital networks in a process as important and controversial as the Mexican presidential elections.
The interview on Carmen Aristegui’s newscast was interrupted at a key point in the discussion with Barabási, just as the scientist was asked whether bots might be used not only to help candidates but also to attack their rivals. This matters because, in Mexico, different actors and analyses have shown that in recent years bots have mostly been used for attack strategies, such as:
- Creating random trending topics to deflect attention from a specific trend.
- Attacking users with a specific political profile through a combination of automated accounts and troll accounts (accounts handled by humans and marked by high levels of violence).
- Attacking a specific trending topic in order to weaken it within Twitter’s general conversation.
- Deliberately adding bots to a candidate’s account as a strategy to criticize that account or even diminish its standing and reach.
The Botometer algorithm, developed through a collaboration between the Indiana University Network Science Institute (IUNI) and the Center for Complex Networks and Systems Research (CNetS), is used by most of the app developers and websites that offer to “detect” bot presence on Twitter. It evaluates six classes of features: published content, sentiment, user (the account’s data and metadata), friends, network attributes, and timing (see note 1). These features are combined to characterize the account’s communication and interaction patterns, the kind of activity it shows (temporal cycles, tweets published, retweets, and hashtags), and the user’s network connectivity (mentions made and received), in order to estimate the probability that the account is a bot.
These parameters feed into the bot probability assigned to a Twitter account; nonetheless, they are not conclusive. Years of study have shown how automated processes have evolved and how complex the strategies for simulating human behaviour patterns on automated accounts have become. Therefore, claiming that an account with an assigned probability of over 50% is a bot leaves a margin of doubt, since some human behaviour can match that of an automated account. Examples include people who don’t generate their own content and only retweet, people compelled by their employers to post specific content on their social media, accounts of associations or collectives that only tweet about their events without any significant interaction with other accounts, and recently created accounts that post over 20,000 tweets about the elections in less than a year. Deciding whether an account is a bot on the basis of a 50% probability does not seem an adequate criterion. To reduce the margin of error, the criterion could be set at a probability above 80% before claiming the presence of automated accounts, and even then a margin of error must be assumed. In other words, some accounts below 80% may still be bots, while some accounts above 80% may not be.
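The trade-off described above can be illustrated with a minimal sketch. It assumes a small set of accounts that have already been reviewed by hand, so that each one carries both a bot score and a human verdict. The account names, scores and verdicts below are invented for illustration and are not data from the study; `ReviewedAccount` and `error_counts` are hypothetical names.

```python
from typing import Dict, Iterable, NamedTuple


class ReviewedAccount(NamedTuple):
    handle: str       # hypothetical account name
    bot_score: float  # score in [0, 1], as a Botometer-style tool might report
    is_bot: bool      # verdict from a manual, qualitative review (assumed)


def error_counts(accounts: Iterable[ReviewedAccount], threshold: float) -> Dict[str, int]:
    """Count mislabeled accounts when every score above `threshold` is called a bot."""
    false_positives = 0  # humans flagged as bots
    false_negatives = 0  # bots that stay under the threshold
    for account in accounts:
        flagged = account.bot_score > threshold
        if flagged and not account.is_bot:
            false_positives += 1
        elif not flagged and account.is_bot:
            false_negatives += 1
    return {"false_positives": false_positives, "false_negatives": false_negatives}


# Invented sample for illustration only; not data from the study.
SAMPLE = [
    ReviewedAccount("retweet_only_human", 0.62, False),
    ReviewedAccount("employer_mandated_poster", 0.55, False),
    ReviewedAccount("collective_events_feed", 0.71, False),
    ReviewedAccount("obvious_bot", 0.93, True),
    ReviewedAccount("sophisticated_bot", 0.74, True),
    ReviewedAccount("ordinary_user", 0.18, False),
]

for cutoff in (0.5, 0.8):
    print(cutoff, error_counts(SAMPLE, cutoff))
```

In this invented sample the .5 cut-off mislabels three humans as bots, while the .8 cut-off mislabels none but misses one bot. Neither cut-off removes the margin of error; each only moves it from one kind of mistake to the other.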
In short, there is so far no consensus on which “confidence range” should be used to establish whether an account is a bot without thoroughly examining its content through qualitative analysis. Such analysis can start from unusual or abnormal behaviour, but two dynamics that occur simultaneously must also be assessed. On one hand, machine learning, artificial intelligence and other advanced programming tools are constantly improving, making the “human response simulation” so advanced and sophisticated that it becomes hard to tell the difference. On the other hand, a vast number of Twitter users follow automated patterns: millions of users only retweet the posts that best represent their affinities and phobias, which makes them more likely to be identified by the algorithm as bots or automated accounts. Political polarization in a country like Mexico makes this kind of “humanly automated” behaviour an increasingly common immediate response.
The Botometer indicator is itself contested, since it can be used in different ways to filter databases and detect bot presence. It can serve as a didactic tool that opens up discussion of these subjects, but it is dangerously inaccurate to treat this indicator alone as conclusive data.
In addition, an exhaustive qualitative analysis of some of the accounts that (quantitatively) have a higher probability of being bots showed a peculiar behaviour that suggests a new category situated between “mechanical automation” and “human automation”. These are accounts that produce and share content at a very intense publication rhythm, which can reach thousands of tweets in a few months, and that, in times of electoral competition and dirty war, could be attributed to people who turn themselves into a sort of “techno-political artillery”: accounts hired to become the “keepers” of the conversation stream for or against any of the candidates. Day and night they guide the conversation, sustain one subject, raise another, recruit other accounts, and monitor the conversation surrounding a campaign.
The analysis presented on the site maven7us.com, used as a reference for Aristegui’s editorial piece as well as the newscast, reports numeric results under the premise that, from a network and data science perspective, an account can be considered “automated” if it scores more than .5 on the bot score. The method itself is not in doubt. However, given the implicit limits of the script used for this evaluation, and in a social context where the boundary between the human and the automated is blurred, moving the reference threshold to .8 on the bot score could be more adequate for identifying the accounts with a higher probability of being bots, and for giving an overview, not a categorical statement, of the volume of automated accounts. A qualitative perspective is also needed to assess those first results, in order to reach an even more reliable description of the case.
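One way to make the gap between the two cut-offs concrete is to tag each account under both and set aside the accounts on which they disagree, since those are the natural candidates for qualitative review. The sketch below assumes scores in the range 0 to 1 and uses a strict comparison, following the “more than .5” wording above. The handles and scores are invented, and `tag_account` and `needs_qualitative_review` are hypothetical names rather than part of any existing tool.

```python
from typing import Dict, List

LOW_CUTOFF = 0.5   # cut-off attributed in the text to the maven7us analysis
HIGH_CUTOFF = 0.8  # stricter cut-off suggested in this document


def tag_account(bot_score: float) -> Dict[str, bool]:
    """Label one score under both cut-offs."""
    return {
        "bot_at_0_5": bot_score > LOW_CUTOFF,
        "bot_at_0_8": bot_score > HIGH_CUTOFF,
    }


def needs_qualitative_review(bot_score: float) -> bool:
    """True when the two cut-offs disagree about the account."""
    tags = tag_account(bot_score)
    return tags["bot_at_0_5"] != tags["bot_at_0_8"]


# Hypothetical scores, invented for illustration.
scores = {"@user_1": 0.34, "@user_2": 0.58, "@user_3": 0.79, "@user_4": 0.88}

review_queue: List[str] = [
    handle for handle, score in scores.items() if needs_qualitative_review(score)
]
print(review_queue)  # accounts scored between the two cut-offs
```

Accounts above both cut-offs or below both are labeled the same way whichever threshold is chosen; only the band in between changes category, which is where a human reading of the account’s content adds the most.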
Given this panorama, it is necessary to take a clear stand: the Internet is not a substitute for public space or for political practice; it is a stage for the re-articulation of communicative and political repertoires. The emergence of social bots, and of political bots in particular, as a campaign strategy is proof that the struggle for “truth regimes” and the “production of the political” in digital environments has entered a phase ruled by automation processes.
Some local uses of the Botometer script can contribute to the discussion of automation processes in electoral campaigns. A first case is Atrapabot (https://atrapabot.org), a platform developed in collaboration by the Institute of Technology and Society of Rio de Janeiro, AppCivico, Enjambre Digital, and Openlabs. Atrapabot is a Spanish adaptation of Botometer, presented as a didactic tool that enables public debate on these subjects through the analysis of individual Twitter accounts. With this tool we seek to give citizens access to new tools that help them explore whether certain accounts may be simulating human behaviour or may be purely automated.
Another example is Proton Pack (see note 2), a bot detection script developed by the Signa_Lab team under the coordination of Luis Guillermo Natera Orozco which, like the one used by Dr. Barabási, is based on Botometer’s script. The script works by connecting to Botometer’s API, automatically sending all the accounts that the user wants to analyze, and then receiving a report on the bot probability of each account.
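The workflow just described (send a list of accounts, collect one score per account) can be sketched roughly as follows. This is not Proton Pack’s actual code. The `fetch_score` callable is a placeholder for whatever request the real script makes to Botometer’s API, and `fake_fetch_score` is a stand-in with made-up scores so that the example runs without network access or credentials.

```python
from typing import Callable, Dict, Iterable, Optional


def score_accounts(
    handles: Iterable[str],
    fetch_score: Callable[[str], float],
) -> Dict[str, Optional[float]]:
    """Send each handle to a scoring service and collect the results.

    `fetch_score` is a placeholder for the real API call; it is assumed to
    return a bot probability in [0, 1] or raise an exception on failure.
    """
    report: Dict[str, Optional[float]] = {}
    for handle in handles:
        try:
            report[handle] = fetch_score(handle)
        except Exception:
            # Some accounts cannot be scored (for example, if they no longer
            # exist); record the gap instead of aborting the whole batch.
            report[handle] = None
    return report


def fake_fetch_score(handle: str) -> float:
    """Stand-in for the remote service, returning made-up scores."""
    made_up = {"@account_a": 0.12, "@account_b": 0.57, "@account_c": 0.91}
    return made_up[handle]  # raises KeyError for unknown handles


if __name__ == "__main__":
    batch = ["@account_a", "@account_b", "@missing"]
    for handle, score in score_accounts(batch, fake_fetch_score).items():
        print(handle, score)
```

Passing the scoring call in as a parameter keeps the batch logic separate from the details of the remote service, and recording unscored accounts as `None` keeps them visible in the report rather than silently dropping them from the sample.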
Using Proton Pack, Signa_Lab ran a test to contrast how the same Twitter accounts can be shown as bots or as humans depending on how each account’s bot score is categorized. To do this, we took a sample of 1450 accounts linked to the four Mexican presidential candidates from May 15th to May 22nd. We added two bot detection categories to the database: one that marks as automated the accounts that scored from .5 upward on the bot score, and another that marks the accounts that scored from .8 upward on that scale. The intention of this exercise is to emphasize the need to add a qualitative layer to the numeric results produced by techniques and tools for automated account detection. The results are shown here:
- The methodologies and tools (scripts) used to establish bot presence on Twitter are not fully conclusive, due to the factors presented above.
- Bots don’t operate or act by themselves. To achieve a successful “contamination” strategy (of a subject or conversation), these accounts are handled by humans who spread hatred, threats, mockery, and humiliation, with a significant emotional effect on their targets and on the accounts that follow those targets.
- For this reason, in Mexico, a bot study can’t be complete without an analysis of the figures who make the threats; their work is much more effective when it is amplified by thousands of automated accounts that promote a trend.
- In methodological terms, the paramount task is to set a higher mark, or at least to discuss the implications of setting such a low threshold, acknowledging how difficult it is to place a detection parameter at .5, given the unpredictable behaviour of human users, as opposed to the .8 mark we have been working with. With the lower mark, the accounts seemed more like troll accounts, which compels us to strengthen our methodological efforts, triangulate the data, and reinforce qualitative observation.
- There is a lack of studies like this one, which open a broader space for discussion that includes diverse points of view in order to understand the context and to qualify and contrast the empirical evidence from different perspectives and sources. It would be very valuable to promote a collaborative culture at the national level, through open data, to enable the replication of studies and the production of reports in our own context, from academic rather than commercial perspectives.
1 For more information on the characteristics evaluated by the program, the following sources are recommended: Ferrara, E., Varol, O., Davis, C., Menczer, F. & Flammini, A. (2016). The Rise of Social Bots. Communications of the ACM. DOI: 10.1145/2818717.
Varol, O., Ferrara, E., Davis, C., Menczer, F. & Flammini, A. (2017). Online Human-Bot Interactions: Detection, Estimation, and Characterization. arXiv preprint arXiv:1703.03107.
2 Named after the device used in the Ghostbusters film to capture and get rid of “unwanted presences”.