Biased behavior in web activities: from understanding to unbiased visual exploration

From this observation, and through the cross-sectional analysis of the case studies, we conclude: 1) that Web Mining tools are effective for measuring and detecting biased behavior; 2) that Information Visualization techniques aimed at non-experts encourage unbiased exploration of content; and 3) that one size does not fit all, and that in addition to the social, behavioral, and cultural contexts, biases should be accounted for when designing systems.

Figure 1.1: This dissertation is situated in the intersection of three research areas: Web Mining, Information Visualization, and Social Sciences.

1.3.1 Understand Biased Behavior and its Consequences on the Web

For instance, in the presence of challenging information, users tend to discard the information that is against their beliefs, even if it is factual and the agreeable information is wrong or false. This selective exposure happens because users want, unconsciously, to avoid cognitive dissonance [Fes62], an uncomfortable state of mind.

We decided to change the platform of study in the next studies, because the user population of Wikipedia is composed mainly of expert users, it is already biased towards men's participation [Lam+11], and this bias in its population goes beyond exploration of content.

1.4.2 Political Centralization in Developing Countries

In the context of political elections in 2012 (#municipales2012), we crawled micro-posts published by the population, and analyzed whether the centralization from the physical world is reflected on the micro-blogging platform Twitter.
Moreover, we find that centralization affects algorithmic processing of information [GP13], and that users have different perceptions of timelines depending on their geographical origin with respect to political centralization.

1.4.3 Political Homophily in Micro-blogging Platforms

In a holistic view of the system, we incorporate the recommender system in a system that displays its recommendations in an engaging and attractive way. As in the second case study, we explore the Chilean community in Twitter discussing political issues in the context of presidential campaigns for the elections of 2013 (#presidenciales2013).

3 Although these results are not yet published, a preliminary report of the study is available, as

1.5

After performing the three case studies, in Chapter 6 we analyze the main contributions of this dissertation through the analysis of the common factors and design implications derived from a cross-sectional analysis of study results. Additionally, we find that biases introduce specific differences that must be accounted for, as biases influence the social and cultural contexts surrounding individuals.

6. User Engagement Allows Measuring Differences in Behavior “In the Wild”

The analysis we perform in this dissertation allows us to understand who should be targeted with proposals like ours; when these designs should be used; and how the analysis of biases should inform user interface design and implementation.

This list includes pre-prints that have not been published at the time of this

2 BACKGROUND

In this chapter we describe the core ideas, concepts and definitions that shape the context in which this dissertation lies.

2.1.1 Wikipedia

In traditional encyclopedias, a staff of experts in specific areas takes care of writing, editing and validating content, while in Wikipedia a community of volunteers is responsible.
There is an extensive body of research built upon Wikipedia (see a survey by Okoli et al. [Oko+14]), covering topics like growth [AMC07], dynamics [Rat+10], accuracy of content [Gil05; Ros06], participation [Mor+13; CT14], generation of structured data [Leh+14b], and analysis of historical figures [Ara+12], among others.

Figure 2.1: Screenshot of a Wikipedia article (Magellanic Penguin).

However, an analysis of how this gap affects Wikipedia content has not yet been done in terms of gender.

Dissertation Context

In Chapter 3 we analyze article content to compare how women are described in biographies, and see whether these descriptions differ from those of men.

2.1.2 Twitter

Twitter is a micro-blogging platform where users publish status updates called

A retweet is a re-publication of a tweet into the timeline of the retweeting user.

In the first one we study political centralization; in the second we study homophily.

2.2.1 Gender Bias

In particular, we address bias in the language used to describe women in comparison to men in Wikipedia. Lakoff [Lak73] pioneered this area by analyzing how language used to refer to women reflects women’s inferior role in society.

Figure 2.2: Screenshot of a Twitter profile (Yahoo Labs).

approaches to gendered speech: deficit (women’s language is inferior to the normative men’s language), dominance (women are seen as subordinate), difference (women and men have different subcultures), and dynamic (language is an evolving social construction which depends on many factors) [Coa04].

According to Gillespie and Robins [GR89], the technologies of communication that are supposed to shrink distances between communities are having the opposite effect, by constituting “new and enhanced forms of inequality and uneven development”.

2.3.1 Interaction Data: Server Logs and Clickstreams

For instance, micro-posts do not contain IP addresses or request URLs, but they include meta-data (e.g., user entities, links, hashtags, and possibly geographical coordinates) useful to identify and characterize users. In addition, we analyze logged data from end-user interaction with the implemented systems in Chapters 4 and 5.

2.3.2 Networks on the Web

SNA is the use of network theory to study social networks, in terms of the connections in the network, the distribution of those connections, and their possible segmentation into communities [Fre04]. Some measures include: betweenness centrality [Fre77], defined as the fraction of shortest paths that include each node; closeness centrality [BF05], defined as the sum of distances from a node to all other nodes; and eigenvector centrality, which assigns scores to all nodes in the network that depend on the scores of the connections of each one, meaning that connecting to higher-score nodes implies a higher score.

Dissertation Context

As an example, Figure 2.6 displays one of the most famous examples of visualization, done by Charles Minard in 1869 to depict the march of Napoleon’s troops to Russia in the winter of 1812. The diagram includes a time series that allows readers to see temperature and link its value with the progress of the trip made by the troops, including the geographical context.

2.5.1 Visualization Design: From Data to Information

When designing visualizations, regardless of the specific area (if any) being targeted, the mental model of the target users of the system must be considered. Moreover, Liu and Stasko [LS10] propose that visualization internalization follows a process of four stages: internalization, processing, augmentation, and creation.

Figure 2.7: Nested visualization design and validation model by Munzner [Mun09].

The research questions that drive our work are: Is there a gender bias in user-generated characterizations of men and women in Wikipedia? If so, how to identify and quantify it? How to explain it based on social theory?

The study of biases in Wikipedia is not new. The main contributions of this work are methods to quantify gender bias in user-generated content, a contextualization of the differences found in terms of feminist theory, and a discussion of the implications of our findings for informing policy design in Wikipedia.

Figure 3.1: Infobox from the biography article of Simone de Beauvoir.

When DBPedia detects an infobox with a template that matches those of a person, it assigns the article to the Person class, and to a specific subclass if applicable (e.g., Artist).

We analyze both in different contexts: in the overview we analyze the full vocabulary employed, while in the full text we analyze only the words pertaining to the LIWC dictionaries.

Inferred Gender

To obtain gender meta-data for biographies, we match article URIs with the dataset by Bamman and Smith [BS14], which contains inferred gender for biographies based on the number of grammatically gendered words (i.e., he, she, him, her, etc.) present in the article text. Bamman and Smith [BS14] tested their method on a random set of 500 biographies, obtaining 100% precision and 97.6% recall.

3.3.1 Meta-Data Properties

Presence and Proportion According to Class

DBPedia estimates the length (in characters) and provides the connectivity of articles. From the table, in comparison to the global proportion of women, the following categories over-represent women: Artist, Royalty, FictionalCharacter, Noble, BeautyQueen, and Model.

Distribution According to Date of Birth

To explore the evolution of growth of women presence, in Figure 3.3 we display the relationship between the cumulative fraction of biographies and the yearly fraction of biographies of women.

To explore which words are more strongly associated with each gender, we measure Pointwise Mutual Information [CH90] over the set of vocabulary in both genders:

\[ \mathrm{PMI}(c, w) = \log \frac{p(c, w)}{p(c)\, p(w)} \]

The probabilities can be estimated from the proportions of biographies about men and women, and the cor-

3.4.1 Associativity of Words with Gender

The top-15 words associated to each gender are (relative frequency in parentheses):

Clearly, the words most associated with men are related to sports, football in particular, which refers to both popular sports of soccer and American football (recall from Table 3.1 that Athlete is the largest subclass of Person).

feminist), and family roles (her husband, her mother, nee

This is consistent with the results from the meta-data analysis, where women are more likely to have a spouse attribute in their infoboxes (see Table 3.2), and with the results of Bamman and Smith [BS14].

3.4.2 Gender Differences in Semantic Categories of Words

Burstiness is a measure of word importance in a single document according to the number of times it appears within the document, under the assumption that important words appear more than once (they appear in bursts) when they are relevant in a given document. We use the definition of burstiness from Church and Gale [CG95]:

\[ B(w) = \frac{E_w(f)}{P_w(f \geq 1)} \]

where E_w(f) is the mean number of occurrences of a given word w per document, and P_w(f ⩾ 1) is the probability that w appears at least once in a document.
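
The two word-level measures defined in this section can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it assumes each biography is already tokenized into a list of words, that gender labels are plain strings, and that probabilities are estimated as simple document proportions. The function names `pmi` and `burstiness` and the toy data in the usage note are hypothetical.

```python
import math
from collections import Counter


def pmi(docs, labels, word, cls):
    """PMI(c, w) = log(p(c, w) / (p(c) * p(w))), using document proportions.

    docs   : list of token lists (assumed pre-tokenized biographies)
    labels : list of class labels (e.g. gender), aligned with docs
    """
    n = len(docs)
    p_c = sum(1 for label in labels if label == cls) / n
    p_w = sum(1 for doc in docs if word in doc) / n
    p_cw = sum(1 for doc, label in zip(docs, labels)
               if label == cls and word in doc) / n
    if p_cw == 0:
        # The word never co-occurs with the class in this sample.
        return float("-inf")
    return math.log(p_cw / (p_c * p_w))


def burstiness(docs, word):
    """B(w) = E_w(f) / P_w(f >= 1), following the definition quoted above."""
    n = len(docs)
    counts = [Counter(doc)[word] for doc in docs]
    containing = sum(1 for c in counts if c > 0)
    if containing == 0:
        return 0.0  # convention chosen here for words absent from the corpus
    mean_occurrences = sum(counts) / n   # E_w(f)
    p_at_least_once = containing / n     # P_w(f >= 1)
    return mean_occurrences / p_at_least_once
```

With this estimate, burstiness reduces to the average number of occurrences of the word among the documents that contain it, so a word appearing once and three times in the only two documents that mention it gets B(w) = 2.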

3.5.1 Empirical Network and Null Models

Gender, Link Proportions and Self-Focus Ratio

For each graph, we estimated the proportion of links from gender to gender, and we tested those proportions against the expected proportions of men and women present in the dataset using a chi-square test. The observed graph, on the other hand, shows a significant difference in the proportion of

Table 3.5: Comparison of edge proportions between genders in the empirical biography network and the null models.

3.5.2 Biography Importance

As an approximation for historical importance in our biography network we considered the ranking of biographies based on their PageRank values. Figure 3.6 displays the top-30 men and women according to their PageRank.

Figure 3.6: Top-30 biographies per gender according to PageRank.

Figure 3.7: Women fraction in top biographies sorted by PageRank.

than Queen Victoria (#2), and Elvis Presley (#30) has a higher score than Hillary Rodham Clinton

To compare the observed distribution of PageRank by gender to those of the null models, we analyzed the fraction of women biographies among the top-r articles by PageRank, for r ∈ [10, 700,706] (i.e., we considered only nodes with edges).

According to Nussbaum [Nus95], one possible indicator of objectification is the “denial of subjectivity: the objectifier treats the object as something whose experience and feelings (if any) need not be taken into account.” This idea is supported as, in the overviews, men are more frequently described with words related to their cognitive processes, while women are more frequently described with words related to sexuality.

Presence and Centrality of Women

As shown in Figure 3.7, there are women biographies with high centrality, but their presence is not a sign of an unbiased network: “the successes of some few privileged women neither compensate for nor excuse the systematic degrading of the collective level; and the very fact that these successes are so rare and limited is proof of their unfavorable circumstances” [De 12].

At this point, considering the gender gap that affects Wikipedia [HS13], it is pertinent to recall the concept of feminine mystique by Friedan [Fri10], developed from the analysis of women’s magazines from the 50s in the United States, which were edited by men only.

3.6.2 Conclusions

In this part of this dissertation, we explore the effect of the systemic bias of political centralization through the following research question: Does political centralization affect how people perceive information, and how people behave when browsing informational content in micro-blogging platforms? If so, how to encourage geographically diverse exploration? Arguably, centralization is an organizational schema instead of a systemic bias.

Through a focus on the geographical aspect of centralization, we defined a methodology that goes from the analysis of the presence of centralization in micro-blogging platforms to the definition of an information filtering algorithm that generates geographically diverse timelines.

Biases in Information Systems and Geography

When geolocating users in micro-blogging platforms, the most basic approach is to query a gazetteer with the user’s self-reported location [Mis+11; Hec+11], which usually comes in free-text form, and thus is neither normalized nor structured. Even though the lack of geographical diversity has not been perceived as a problem from a user-centered point of view (to the extent of our knowledge), it does lead to problems.

Counts, and Czerwinski [DCC11]. We focus on how user location influences

Treemaps have been used before to visualize content from micro-blogging platforms by Archambault et al. [Arc+11], although their approach is different from ours: they visualize clustered keywords, while we visualize entire tweets by using a design inspired by Newsmap.jp [Wes04], a treemap visualization of news headlines.

Our methodology can be seen as a pipeline that starts with an analysis of a virtual population, builds a classifier to geographically annotate content, evaluates the classifier accounting for geographical diversity, and finally filters the content of streams to ensure geographical diversity.

4.3.1 Problem Definition

We define our problem as follows:

1. Consider a set of tweets T_E related to an event E (defined as a set of hashtags and special keywords) relevant to a country C, with a set of locations L.
2. Given all users U who published tweets in T_E, predict (if possible) a location from L for all users u ∈ U

Using the output from P applied to all tweets in T_E, filter T_E to produce a summary tweet set T_θ, with |T_θ| ⩽ |T_E|, which is more geographically diverse; in other words, such that geodiversity(T_θ) ⩾ geodiversity(T_E).

Geographical Diversity

We define geographical diversity as the normalized Shannon entropy [Jos06] with respect to a set of locations L (where |L| > 1):

\[ \mathrm{geodiversity} = \frac{-\sum_{i=1}^{|L|} p_i \ln p_i}{\ln |L|} \]

where p_i is the probability that a micro-post is related to a location ℓ_i.
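
As a hedged illustration of the geographical diversity measure just defined, the following Python sketch computes the normalized Shannon entropy of a list of location labels. It assumes each micro-post has already been assigned exactly one location from L and that p_i is estimated as the fraction of posts assigned to ℓ_i. The function name `geodiversity` and its input format are assumptions made for this example, not part of the original work.

```python
import math
from collections import Counter


def geodiversity(post_locations, all_locations):
    """Normalized Shannon entropy of posts over the location set L.

    post_locations : one location label per micro-post (assumed input format)
    all_locations  : the full set L; the measure requires |L| > 1
    """
    k = len(all_locations)
    if k <= 1:
        raise ValueError("geodiversity is only defined for |L| > 1")
    # Only posts whose label belongs to L contribute to the estimate.
    counts = Counter(loc for loc in post_locations if loc in all_locations)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # convention chosen here for an empty timeline
    entropy = 0.0
    for loc in all_locations:
        p = counts[loc] / total
        if p > 0:  # 0 * ln(0) is treated as 0
            entropy -= p * math.log(p)
    return entropy / math.log(k)
```

A timeline whose posts all come from one location scores 0, and a timeline spread evenly over every location in L scores 1, which makes values comparable across location sets of different sizes.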

4.3.2 Interaction Graph and Centralization

We define the expected centrality as the random-walk weighted betweenness centrality [New05] in a population graph of locations, where each edge is weighted according to the normalized geometric mean of each location’s population:

\[ w_{\mathrm{exp}}(i, j) = \frac{\sqrt{\mathit{pop}_i \times \mathit{pop}_j}}{\max\{\sqrt{\mathit{pop}_{i'} \times \mathit{pop}_{j'}} \mid \forall\, \ell_{i'}, \ell_{j'} \in L : i' \neq j'\}} \]

where pop_i is the physical population of location i.

To estimate such deviation, we estimate the C_B measure by Freeman [Fre77], which considers the average of centrality differences between the most central node p_k^* and all the others:

\[ C_B = \frac{\sum_{i=1}^{n} \left[ C'_B(p_k^*) - C'_B(p_i) \right]}{n - 1} \]

The values of C_B vary between 0 (no centralization) and 1 (star-shaped network).

4.3.3 Geographical Diversity and Classifying Tweets

Hence, we consider that a tweet talks about a particular location if its content resembles or is similar enough to the aggregated content of that location. To build a location corpus of |L| location documents, we consider the set of geolocated users U.

To predict a location for a given document \vec{d}, we build a feature vector \vec{f} = (f_1, \ldots, f_{|L|}), where f_i is the cosine similarity between the document \vec{d} and the location document \vec{\ell}_i:

\[ f_i = \mathrm{cosine\_similarity}(\vec{d}, \vec{\ell}_i) = \frac{\vec{d} \cdot \vec{\ell}_i}{\lVert \vec{d} \rVert \, \lVert \vec{\ell}_i \rVert} \]

We use the feature vectors and their corresponding author locations to train classifiers using Support Vector Machines (SVM) [CV95] and Naive Bayes. A prediction is correct if the location predicted for a tweet t matches its author u’s location, i.e., t_ℓ = u_ℓ.

4.3.4 Filtering Information Streams

Since the complexity of those dimensions can be greater than that of geography (for instance, consider the number of hashtags in an event against the number of locations), the entropy contribution of these dimensions is higher than the entropy contribution of geography.

Location Sidelines

Since the complexity of feature dimensions can be greater than that of geography (for instance, consider the number of hashtags in an event against the number of locations), the entropy contribution of these dimensions can be higher than the entropy contribution of geography.

4.4.1 Dataset: Municipal Elections in Chile

The dataset is composed of tweets crawled on October 28th, 2012, in the context of municipal elections held in Chile that day. The event had a distinctive hashtag (#municipales2012), which, among other related hashtags (e.g.

1 Example terms used as queries are displayed on Ta-

the Twitter Streaming API

Even though in 2012 RM held 40.5% of the population [Nat14], in our dataset it holds 56.6% of user accounts, and this difference in proportions is significant according to a chi-square test (χ² = 11.08, p < 0.001), meaning that the Chilean population in Twitter is more imbalanced than in the physical world.

On the left, we consider the complementary cumulative distribution functions (CCDF) of the number of followers and friends according to accounts from RM and NOT-RM, showcasing similar distributions to those of [Kwa+10].

4.4.2 Virtual Population, Centralization and Content

In this section we consider the geolocated tweets at regional level in terms of the author location, i.e., the 33.67% of users who contributed 43.27% of the event tweets.

On the left, the virtual population per region is showcased through the absolute number of user accounts found at each location; on the right, the user rate (relative number of accounts per 1,000 inhabitants) is shown.

Figure 4.2: User connectivity and time of registration.

For comparison, note that the geographical diversity of the 2012 Chilean population is 0.77, which means that, although there is geographical diversity in the dataset, it is below the value one would expect given the population distribution.

To estimate P(L_i | T) we use Bayes’ Theorem and the law of total probability:

\[ P(L_i \mid T) = \frac{P(T \mid L_i)\, P(L_i)}{P(T)} = \frac{P(T \mid L_i)\, P(L_i)}{\sum_j P(T \mid L_j)\, P(L_j)} \]

where P(T | L_i) is estimated from the LDA model and P(L_i) is the probability that a tweet comes from location L_i.

4.4.3 Classifying Tweets Into Locations

Despite the spatial representativity of our sample, the usage of a gazetteer to geolocate users leaves out a considerable amount of tweets. To be able to capture the potential content present in those tweets, there is a need to classify them into locations, to be considered for selection in our filtering algorithm.

Evaluation of Location Classifiers

We divided the set of tweets from geolocated users in 10 groups, maintaining the proportions of locations’ tweets in each group, and then ran 10 iterations to evaluate the classifiers.

However, not all classifiers show diversity: some of them have null entropy, which means that they are behaving in the same way as the trivial classifier, as depicted in Figure 4.8 (Naive Bayes) and Figure 4.9 (Naive Bayes, SVM Linear1vs1 and SVM RBF).

4.4.4 Overview of Analysis

Our algorithm generates geographically diverse timelines, which, in theory, will allow users to be exposed to non-centralized timelines without losing interesting and informative content. In this section we evaluate experimentally if the theoretical aspects of the algorithm hold in a centralized context according to the perception of users.

4.5.1 Conditions and Datasets

Since the different conditions require pairwise comparisons, and taking advantage of the low variance of geographical diversity in the dataset (see Figure 4.6), we split the dataset in:

From each dataset we extracted the s = 100 most popular tweets for POP, and ran DIV and PM a hundred times (POP runs only once because the outcome is always the same for the same input).

PM (means: 0.88, 0.89 and 0.88) generate timelines with consistent geographi-

In fact, PM consistently shows greater geographical diversity than POP and DIV, and even greater than the geographical diversity of the population (0.77), indicating that our sidelining step produces the desired effect of geo-diversification.

Hence, in empirical terms, our algorithm has better properties than both baselines, as we increased geographical diversity with respect to both of them, and increased representation of popular content with respect to the baseline algorithm [DCC11].

4.5.2 User Evaluation

In this section we describe the user study we performed to evaluate how users perceive timelines generated with our Proposed Method (PM) in comparison with those of baseline conditions Diversity Filtering (DIV) and Popularity Sampling (POP).

Of them, 81 were male, 41

Figure 4.10: Geographical diversity for timeline sizes in [5,100] (left) and Jaccard Similarity between filtering approaches and popularity sampling for timeline sizes in [5,100] (right).

Figure 4.11: Timelines displayed in the user study interface. Timelines rendered in this way were displayed side by side at each task from the user study.

To avoid sequence effects, the order of pairwise comparisons (POP/PM, POP/DIV, DIV/PM) and the order of sub-datasets (morning-noon, afternoon, night) were randomized in both the position on the screen (left or right) and the experimental step.
Add examples if needed.estions 1 to 3 had a sefien-point Likert scale from -3 to 3, flhere -3 (or 3) means that the timeline on the le (or right) is more diverse, interesting or informative than the other, and a fialffe of 0 means that there flas no perceifieddi erence. 94 Where C(comparison) is a dffmmffl fiariable that encodes the speci c pairflise  ffs, efien thoffgh difiersitffl in PM flas perceified as more difierse than POP, the e ect is simple dffe to the interaction flith location:participants in NOT-RM scored difiersitffl as eqffal betfleen POP and PM (R1). A positifie fialffe indicates that the approach on the right flas perceified to be more difierse, interesting, and informatifie than the one onthe le, and fiicefiersa. 96 Informativeness  e timeline [DIV] tends to shoflmore personal opinions in the conteffit of elections” [P36, RM].“I feel [PM] has mffch more information related to manffl locations, ffnlike timeline [DIV] flhich is focffsed on informing onlffl abofft [RM]” [P50, RM].“[POP] is partiallffl centralized in [RM] and the dispffte betfleen Er- 3 , bfft it contains some analffltical tfleets. Since POP has fierffl lofl di-fiersitffl (recall the empirical obserfiations from Figffre 4.10), it is effipected that geographical content to be more salient content-flise, for instance, throffgh theappearance of local hashtags and candidate names, as noticed bffl Participant 57. 2 Politician names flith con ictifie on-going resfflts in the election98 One reason popfflaritffl flas considered in PM flas to afioid potential noise  Timeline [DIV] is abofft people from the commffnitffl gifiing their impressions on flhat is going on, flhich is interesting bfft it doesn’thafie anffl backffp information” [P38, RM].“It coffld be said that timeline [DIV] has a difiersitffl of ffsers that are not, in contrast flith those of timeline [PM], notable personal-ities of the ‘Tflier florld’, hoflefier this is not enoffgh for it to be considered more difierse (…). 
However, [POP] is more informative, because it includes different types of comments on the election day (not just news or vote counting).” [P76, NOT-RM]

“[PM] was more informative because it contained more factual data; on the contrary, [POP] had more personal opinions” [P80, NOT-RM].

This is intriguing given that, quantitatively, POP and PM are equally informative, but POP was perceived as more interesting than PM.

4.5.4 Implications of Results

We identify two specific implications from the quantitative results

It is known that the geographical span of ego-networks in Twitter is small [QCC12], and thus, exposing those users to interesting and informative views from other locations expands their vision.

Because our context is not task-based, as a measure of involvement with the application we evaluate diversity-awareness through interaction events with the application and user engagement metrics [LOY14].

4.6.1 Design Rationale

In political contexts, information is usually classified into a bipartite separation of groups [AG05; Con+11b], whereas in our case the number of locations is larger (for instance, 15 Chilean regions), creating the need to scroll on the screen and thus inducing a positional bias of clusters, by giving more importance to those clusters already visible without scrolling.

The area size of each leaf depends on the number of retweets of the corresponding tweet, and the number of followers and friends of its author, in inverse proportion to the location’s population.

4.6.2 Prototype

As query keywords we used location names, political terms, and other terms of interest that appear constantly on the news, as well as mentions to media accounts, both at national and local levels.

In addition, if the URL contained the code of a location (e.g., http://auroratwittera.cl/#RM) the interface immediately displayed the tweets related to that location in the same way as if a location filter button had been pressed.

4.6.3 Social Bot @todocl

Those three types of tweets can be seen in Figure 4.14.

This implementation, comprised of filtering algorithm, user interface and social bot, creates a platform where users can access geographically diverse information, in the form of an application external to Twitter, as well as injected into the platform itself.

We propose that perception can be analyzed by considering metrics of interaction with the site, as well as user engagement: the number of times users interact with location filters, the tendency to return to the site, and dwell time [LOY14].

4.7.1 Experimental Setup

We gathered interaction data from October 6th, 2014 until January 20th, 2015

The server logged each user request and was able to identify sessions based on cookies placed on user browsers. In total we obtained 187,604 events of the following types: session created/restored, timeline and UI loaded, clicks on every element on the user interface, and pings sent by the user interface to the server. Those events were sent through JavaScript, and thus were not always reliable (e.g., advanced users deactivate cookies and JavaScript, and bots/web crawlers do not usually support JavaScript).

The User Agent information was used to determine if the user was

Finally, the user interface sent a ping every ten seconds to the server to track the time spent on the website even in the absence of interactions, to capture the dwell time of passive users.
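
To make the ping-based measurement concrete, here is a minimal Python sketch of how dwell time per session could be approximated from logged events. The event format (pairs of session identifier and event type), the event-type string `"ping"`, and the function name are all assumptions for illustration; the source only states that the interface sent one ping every ten seconds.

```python
from collections import defaultdict

PING_INTERVAL_SECONDS = 10  # from the text: one ping every ten seconds


def dwell_time_by_session(events, ping_interval=PING_INTERVAL_SECONDS):
    """Approximate dwell time (in seconds) per session from ping counts.

    events : iterable of (session_id, event_type) pairs (hypothetical log format)
    Each received ping is taken as evidence of one more interval on the page.
    """
    pings = defaultdict(int)
    for session_id, event_type in events:
        # Register every session, so sessions without pings report 0 seconds.
        pings[session_id] += 1 if event_type == "ping" else 0
    return {sid: count * ping_interval for sid, count in pings.items()}
```

Counting pings rather than subtracting first and last timestamps keeps the estimate meaningful for passive users who load the page and never click. The resolution is limited to one ping interval.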

Statistical Model

For analysis, we focus on the following dependent variables calculated from the interaction data: tendency to return to the site (estimated from the session count by each user), time spent (number of minutes the user spent reading or interacting with the site, measured after the entire user interface and timeline was loaded), and selected locations (number of times each user filtered specific locations in the user interface).

4.7.3 Overview of Results

Biases from the Physical World Affect Content Processing

It is expected that virtual platforms are affected by physical constraints [TGW12]; however, as predicted by Gillespie and Robins [GR89], our case study shows that, even when lacking geographical barriers, the population behaved in a centralized way, by making the information flow biased towards the central location in comparison to a non-biased flow based on population distribution.

In contrast, we identified individual differences based on whether users belonged to central or peripheral locations in the context of political centralization, and found that this distinction explained the differences in behavior found in our study “in the wild”.

4.8.2 Summary, Limitations and Future Work

Motivated by this scenario, in this part of the dissertation we approach the following research question: How to encourage exposure to diverse people from an ideological point of view in micro-blogging platforms? Until now the literature has focused on how to motivate users to read challenging information or how to motivate a change in behavior through recommendation systems and display of potentially challenging information.

Then, we built a prototype data portrait to evaluate, in a pilot study, how users perceive recommendations injected in data portraits, and found that users who had tweeted about abortion before the study had a different perception of recommendations than users who had not, in addition to deep qualitative feedback about the system as a whole.

2 Note that, while tweets are too short to be reliable for topic modeling, the concatenation of tweets

In contrast to the discussed approaches, the context of our recommendations is not related to politics or sensitive issues; instead, we build a data portrait of users of micro-blogging platforms, and show recommendations in that context, emphasizing the similarity of recommendations with the target user.

Political Leaning in Social Media

To study political leaning in social media, in particular in micro-blogging platforms, the first challenge is to actually detect the political leaning of users, as this attribute is not usually part of a public profile.

Tags: Information Visualization, Web Mining, Social Sciences, User Participation and Engagement, Biased Behavior, Diversity, Gender Bias, Political Centralization, Homophily, Visual Exploration, Wikipedia, Twitter, Micro-blogging Platforms