

Twitter network of verified users with over 1 million followers. Circles (nodes) represent users, and the lines connecting the circles represent one user “following” another. Colors represent classes determined through modularity clustering.

While “fake news” existed long before the age of the internet, it seems like today it is harder than ever to determine the reliability of news sources. After doing some research on the topic, I found that there was some work being done with graph theory to see whether machine learning could assist in detecting sources of fake news. I am very interested in the power of networks and the information we can gain from them, so I decided to see if I could build a classification model that would find patterns in ego networks to detect fake news.

What is an Ego Network?

Ego networks (also known as personal networks in human social network analysis) consist of a focal node known as the ego and the nodes to which the ego is directly connected, called alters, with edges showing links between the ego and its alters or between alters. Each alter in an ego network can have its own ego network, and all ego networks combine to form the social network. In such a network, egos could be human beings or objects like products or services in a business context.

In the image below, I have visualized the ego networks of all Twitter verified users with over 1,000,000 followers. Each circle represents a verified Twitter user node (the size of the circle is related to total follower count), and the lines, or edges, linking them represent nodes “following” one another. Credit goes to Luca Hammer, who provided me with the Twitter edge list; be sure to check out his Medium for excellent posts on exploring and visualizing network data. This graph visualization, like all the others you’ll see in this article, was created using Gephi.

For the purposes of this project I decided to analyze strictly verified Twitter networks, as I felt there was a natural tendency for users to have more trust in sources that have officially been verified by Twitter.

Training Data Problem: How do I decide which nodes represent fake news sources?

Probably the biggest problem I faced at the outset of this project was how to determine which Twitter accounts to classify as sources of fake news for my training data. There is no universally agreed-upon way of determining whether news is fake, and if there were, fake news would not be a problem in the first place. Luckily, I was able to find a fantastic dataset in the CREDBANK data that accompanied the ICWSM 2015 paper “CREDBANK: A Large-scale Social Media Corpus With Associated Credibility Annotations”. If you’ve got the time, I highly suggest checking out the paper, but here is the TL;DR:

In total, CREDBANK comprises more than 60M tweets grouped into 1049 real-world events, each annotated by 30 Amazon Mechanical Turk workers for credibility (along with their rationales for choosing their annotations). The primary contribution of CREDBANK is a unique dataset compiled to link social media event streams with human credibility judgements in a systematic and comprehensive way.

By combining this dataset with the Twitter network data, I was able to create my own dataset for training a classification model. The data consisted of 69,025 verified users and all the connections between them. Of those users, 66,621 were determined to be sources of real news and 2,404 were determined to be sources of fake news. The sources I determined to be fake were those who had more than 5% of their tweets rated below partially accurate by the Amazon Mechanical Turk credibility raters.

After collecting and organizing the data (I used the graph database Neo4j to store the network data), the first step was to do an initial exploratory analysis of the network data.

Same graph as above, but with fake sources only
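To make the two core ideas in this article more concrete, here is a minimal Python sketch of (1) labeling a user as a fake news source when more than 5% of their tweets are rated below partially accurate, and (2) pulling a single user's ego network out of a "follows" edge list. This is not the code used in the project. The function names, the toy data, and the numeric value standing in for "partially accurate" are all assumptions made for illustration, as is the choice to count a node as an alter if it is linked to the ego in either direction.

```python
from collections import defaultdict

FAKE_SHARE_THRESHOLD = 0.05  # "more than 5% of their tweets" (from the article)
PARTIALLY_ACCURATE = 1       # ASSUMPTION: numeric rating standing in for "partially accurate"


def label_sources(tweet_ratings):
    """Label each user "fake" or "real" from (user, rating) pairs.

    A user is "fake" when the share of their tweets rated below
    PARTIALLY_ACCURATE is strictly greater than FAKE_SHARE_THRESHOLD.
    """
    total = defaultdict(int)
    low = defaultdict(int)
    for user, rating in tweet_ratings:
        total[user] += 1
        if rating < PARTIALLY_ACCURATE:
            low[user] += 1
    return {
        user: "fake" if low[user] / total[user] > FAKE_SHARE_THRESHOLD else "real"
        for user in total
    }


def ego_network(follows, ego):
    """Return (nodes, edges) for the ego network of `ego`.

    `follows` is an iterable of (follower, followed) pairs.
    ASSUMPTION: an alter is any node linked to the ego in either direction.
    Edges between the ego and its alters, and among alters, are kept.
    """
    follows = list(follows)
    nodes = {ego}
    for src, dst in follows:
        if src == ego:
            nodes.add(dst)
        elif dst == ego:
            nodes.add(src)
    edges = {(src, dst) for src, dst in follows if src in nodes and dst in nodes}
    return nodes, edges


# Toy data (hypothetical users and ratings, not taken from CREDBANK).
ratings = [("a", 2), ("a", 2), ("a", -1), ("b", 2), ("b", 1)]
follows = [("a", "b"), ("b", "a"), ("c", "a"), ("b", "c"), ("d", "c")]

labels = label_sources(ratings)
nodes, edges = ego_network(follows, "a")
print(labels)
print(sorted(nodes), sorted(edges))
```

In the toy data, user "d" only follows "c", so it is not part of the ego network of "a". On the real data, the same idea would run over the 69,025 verified users and their connections, with the labels serving as the target for the classification model.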
