WarcraftRealms.com
Who name selection criteria
bringoutyourdead
Forums Admin & general flunky


Joined: 07 Nov 2008
Posts: 535
Location: Silicon Valley
WR Updates: 6,600,430
bringoutyourdead WR Profile

Posted: Tue Nov 01, 2011 1:41 pm

1974ER wrote:
I admit that I know very little about databases, but based on how names work on Blizzard servers, I have to assume your assumption is faulty. Why? Because if it were true, it would be impossible to have a Legolas, Lgolas, Legols and Leglas on the same server, correct?

Yes and no ;)

I currently do not have internet access for my game machine so I don't have any way to experiment and see what results I get back from /who requests. I'm using locked down public (library) machines for my connection and studies.

Obviously, as you realized, when string data is entered into the database (for example character names, guild names, pet names), all ASCII and Latin-1 characters from the UTF-8 character set are allowed. Apparently the other UTF-8 characters are blocked on US and EU servers (I haven't seen any Latin Extended or Greek characters, for example).

Both MySQL and Oracle can use case-insensitive collations when selecting data from their tables, and I expect this is what Blizzard is using for the /who query. If someone wants to invest some time in testing these queries manually, in cases where you already know you will get positive matches on names, it would help.
If level 47 human warlock "Bad" and level 47 gnome warlock "Bd" exist, will /who n-a c-"Warlock" 47-47 return only "Bad" or both names? An interesting side note: http://www.wowwiki.com/Who_List states that n- must have at least 3 characters, and we know this is wrong or CensusPlus would show errors.

On the other hand, the /who query is a string query, and these are always resource expensive. I can guarantee that Blizzard has a unique numeric key for every character that has ever been used on WoW. ID keys like this are leaking out with the web API calls. For example, the query to show all classes returns

{"classes":[{"id":3,"mask":4,"powerType":"focus","name":"Hunter"},
{"id":4,"mask":8,"powerType":"energy","name":"Rogue"},
{"id":1,"mask":1,"powerType":"rage","name":"Warrior"},
{"id":2,"mask":2,"powerType":"mana","name":"Paladin"},
{"id":7,"mask":64,"powerType":"mana","name":"Shaman"},
{"id":8,"mask":128,"powerType":"mana","name":"Mage"},
{"id":5,"mask":16,"powerType":"mana","name":"Priest"},
{"id":6,"mask":32,"powerType":"runic-power","name":"Death Knight"},
{"id":11,"mask":1024,"powerType":"mana","name":"Druid"},
{"id":9,"mask":256,"powerType":"mana","name":"Warlock"}]}
as the JSON reply. It is interesting to note that ID 10 is missing. I expect this is a class they were going to have and then decided not to use. Possibly the Monk class was going to be used and then they decided to delay it until later. The other interesting thing is the binary bitmask, used when they want to compress data from 8-bit or larger chunks down to single-bit storage.

Doing a web api query for races returns
{"races":[{"id":3,"mask":4,"side":"alliance","name":"Dwarf"},
{"id":6,"mask":32,"side":"horde","name":"Tauren"},
{"id":5,"mask":16,"side":"horde","name":"Undead"},
{"id":2,"mask":2,"side":"horde","name":"Orc"},
{"id":7,"mask":64,"side":"alliance","name":"Gnome"},
{"id":8,"mask":128,"side":"horde","name":"Troll"},
{"id":9,"mask":256,"side":"horde","name":"Goblin"},
{"id":11,"mask":1024,"side":"alliance","name":"Draenei"},
{"id":22,"mask":2097152,"side":"alliance","name":"Worgen"},
{"id":10,"mask":512,"side":"horde","name":"Blood Elf"},
{"id":1,"mask":1,"side":"alliance","name":"Human"},
{"id":4,"mask":8,"side":"alliance","name":"Night Elf"}]}
as the JSON dataset. This shows two other oddities: the jump in the Worgen id and mask!
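
A quick way to sanity check the id/mask relationship is a rough Python sketch like the one below. The tuples are copied from the two replies quoted above; the helper names (`mask_matches_id`, `missing_ids`, `combine`) are made up for illustration and are not part of any Blizzard API. It assumes each mask is simply one bit set at position id - 1, which is what the quoted numbers suggest.

```python
# (id, mask, name) tuples copied from the two JSON replies quoted above.
CLASSES = [(3, 4, "Hunter"), (4, 8, "Rogue"), (1, 1, "Warrior"), (2, 2, "Paladin"),
           (7, 64, "Shaman"), (8, 128, "Mage"), (5, 16, "Priest"),
           (6, 32, "Death Knight"), (11, 1024, "Druid"), (9, 256, "Warlock")]
RACES = [(3, 4, "Dwarf"), (6, 32, "Tauren"), (5, 16, "Undead"), (2, 2, "Orc"),
         (7, 64, "Gnome"), (8, 128, "Troll"), (9, 256, "Goblin"),
         (11, 1024, "Draenei"), (22, 2097152, "Worgen"), (10, 512, "Blood Elf"),
         (1, 1, "Human"), (4, 8, "Night Elf")]


def mask_matches_id(entries):
    """True if every mask is a single bit at position id - 1."""
    return all(mask == 1 << (ident - 1) for ident, mask, _ in entries)


def missing_ids(entries):
    """IDs skipped between 1 and the highest id present."""
    present = {ident for ident, _, _ in entries}
    return [i for i in range(1, max(present) + 1) if i not in present]


def combine(entries, names):
    """OR together the masks of the named entries (one bit per entry)."""
    result = 0
    for _, mask, name in entries:
        if name in names:
            result |= mask
    return result


print(mask_matches_id(CLASSES), mask_matches_id(RACES))
print("unused class ids:", missing_ids(CLASSES))
print("unused race ids:", missing_ids(RACES))
print("Paladin + Warlock mask:", combine(CLASSES, {"Paladin", "Warlock"}))
```

If that assumption holds, the Worgen mask is not a separate oddity: it follows directly from the jump in the id.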

I haven't yet found (and probably never will find) a web API call that leaks actual character ID keys.

If Rollie could match and use the same ID keys as Blizzard it could make for some interesting future improvements to CensusPlus... but I expect that Rollie would have the same issues changing his keys as he has with changing the default character set he uses to allow Cyrillic.

Since I've apparently hit a roadblock I can't currently solve with names, I have started looking further at CensusPlus.
I have made a stab at updating CensusPlus to handle Pandaren and Monks, and also added the Latin American and Brazilian locales to the US servers so they can be included in census runs.

This is strictly just playing with code, since 1) this is Rollie's addon, 2) any changes are worthless without matching changes in this site's database, and 3) the assumptions I'm making about how Blizzard will handle the faction-changing Pandaren are probably all wrong.

This does make me speculate on WoW's future. Since Pandaren will have three factions associated with the race (neutral, Alliance or Horde), does this open up the possibility of renegade characters? Horde humans, reformed Alliance blood elves? :shock:
1974ER
Epic Censi


Joined: 07 Nov 2008
Posts: 723

WR Updates: 22,670,394
1974ER WR Profile

Posted: Wed Nov 02, 2011 3:52 am

Actually, you are overcomplicating the test and I did one for you.

I temporarily created a character called Cent. /who Cente returned 0 players, while /who Cent found her. Meaning that the letter must be an exact match to be found.

As for future speculations, I rather suspect that Blizzard will (if there are further additions to the race roster) add more neutral races, because it means they only have to make one, not two. Granted, MoP is another odd twist as such, as we are getting both a race AND a class...

There is also the question... Do we even really need more races/classes? I am starting to lean towards no... The monk is a full hybrid leather user and as such stands on the same footing with a druid.

As for races, both sides already have big, medium and small; both sides also already have furries (worgen and tauren)... Naturally, Blizzard could opt for something unusual (giant size is not an option because tauren and draenei already have trouble passing through some door(way)s, but tiny size could be an option) like going non-humanoid, levitating/flying (fairy/pixie/drake/vampire?), transparent/only partially corporeal (ghost?), mechanical (golem/construction?), elemental lord (protection in earth elemental form, DPS in fire elemental form and healer in water elemental form), etc...

It's also worth noting that player controlled Pandaren are apparently denied the option of remaining fully neutral. Which could have been interesting. I know some players have expressed their desire for a (fully) neutral faction. Considering how some factions behave ingame, renegade might be a bit far fetched... But I could fully see tauren and night elf druids opting for the same neutral, pro-nature faction or a human paladin allying himself with blood elf paladin to slay demons. They could retain some of their factional views, but would set their priorities in a different order than other members of their respective races. Steamwheedle goblins could get a more active role as true neutral traders in good, old highest bid wins all fashion. :D
bringoutyourdead
Forums Admin & general flunky


Joined: 07 Nov 2008
Posts: 535
Location: Silicon Valley
WR Updates: 6,600,430
bringoutyourdead WR Profile

Posted: Wed Nov 02, 2011 1:50 pm

1974ER wrote:
Actually, you are overcomplicating the test and I did one for you.
Yea.. I do that a lot. ;)

1974ER wrote:
I temporarily created a character called Cent. /who Cente returned 0 players, while /who Cent found her. Meaning that the letter must be an exact match to be found.
Thanks.. that is very important info.

Looking at the code I see that Rollie thought about subdividing by zones, but then commented it out.
Would you do a quick /who request to see if z-mount returns data from multiple zones with "mount" in their names, or errors out as an invalid request option?

1974ER wrote:
As for future speculations, I rather suspect that Blizzard will
do whatever insane irrational thing they can think of to keep the money flowing from their cash cow WoW

I agree with what you said. I've already had concerns about looting conflicts between multiple classes. I can't tell you how many times I've seen loot drama between two players who can both use the armor drop, where one would see a very minor upgrade and intends to replace it right away with more appropriate gear, and the other would get a decent upgrade with no better gear available until leveling. You would think the first would say "it is better for you, so take it". Instead they bitch over it and blow up the group in distress.

The other problem I have with more races/classes.. is that I don't have any more slots available on my play realm. Blizzard should increase the maximum characters you can have on a realm. Over the years I've invested too much time and effort with the toons I have.. and I've killed off toons as high as level 50 to make space for the new race/class combos.

The one thing I do know (ok expect) is that the final expansion will be the one where the players battle the titans for the right to live. Either the Titans do come back to Azeroth.. or the players leave Azeroth to find the Titans.... and Blizzard somehow uses that to get everyone to jump to their "NextGen MMO" :twisted:
1974ER
Epic Censi


Joined: 07 Nov 2008
Posts: 723

WR Updates: 22,670,394
1974ER WR Profile

Posted: Thu Nov 03, 2011 6:56 am

:D :D :D Happens to me as well...

You're welcome.

It didn't do either... It simply didn't return anything... doing a /who mount got info from multiple places.

Having monks and druids is going to be problematic... A 10 man class run only has room for 1 of those. Dropping any other class is not really a viable option (most of the time) from a loot distribution point of view, though in theory, either one could sideline a warrior, DK or paladin from a tanking spot, a rogue from a DPS spot or a shaman from a healer spot.

Under specific circumstances, a holy or retribution paladin might get sidelined as +int plate is fairly rare to begin with. Naturally, Blizzard might make changes that render all of this null and void, as they are heavily revamping the talent trees and there are persistent rumours that looting will also be changed (again). Less credibly, there have been some whispers about 15 and / or 20 man dungeons...

Some people I have spoken with have also speculated that we might get some sort of "substitution bench" sort of system, where a raid is assigned a replacement, who can't actually participate, but sees what's going on and has the option to step in, if someone DCs or quits in the middle of things.

Unfortunately, at times, it seems we are playing World of Gearwarcraft. :(

And I share your pain of too few slots... not only per realm, but overall as well... :/

Lastly, I need to re-emphasise that several things I said are pure rumours, guesses and wishful thinking, not facts.
bringoutyourdead
Forums Admin & general flunky


Joined: 07 Nov 2008
Posts: 535
Location: Silicon Valley
WR Updates: 6,600,430
bringoutyourdead WR Profile

Posted: Thu Nov 03, 2011 1:36 pm

1974ER wrote:
It didn't do either... It simply didn't return anything... doing a /who mount got info from multiple places.

Hmm... what I'm trying to find out is whether the 'z-' zone selector passed to a /who request requires a complete zone name, or will accept a partial one like the 'n-' name selector.

http://www.wowwiki.com/Who_list shows as following example:

n-lis z-"Ironforge" r-"Human" c-"Priest" 30-40

http://www.wowpedia.org/Who
is much better at explaining manual /who queries

http://wowprogramming.com/docs/api/SendWho
shows the api call..

So a naked '/who mount' will match any character, guild, zone, class (if there was one) or race (if there was one) that contains 'mount' within its name. The wowpedia entry is in error for its filter 2 example,
in that, as with "guild" and "character name", the match would not be limited to the Priest class but would include anything that contains "priest" as part of its name.

So I erred.. I should have asked for /who z-"mount"

Only http://www.wowpedia.org/API_SendWho insists that the z-, r- and c- selectors have the selection item in " ". And none of my resources say whether the " " indicate an exact match or not.
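
For what it's worth, a filter string in the style of the wowwiki example can be put together with a small helper. This is only a Python sketch for experimenting outside the game; `build_who_filter` is a made-up name, and wrapping every text selector (including n-) in quotes is an assumption, since the resources above disagree on when quotes are needed.

```python
def build_who_filter(name=None, zone=None, race=None, cls=None, levels=None):
    """Compose a /who filter string; every text selector is wrapped in quotes."""
    parts = []
    for prefix, value in (("n", name), ("z", zone), ("r", race), ("c", cls)):
        if value:
            parts.append(f'{prefix}-"{value}"')
    if levels:
        low, high = levels
        parts.append(f"{low}-{high}")  # level range, e.g. 30-40 or 85-85
    return " ".join(parts)


print("/who " + build_who_filter(name="lis", zone="Ironforge", race="Human",
                                 cls="Priest", levels=(30, 40)))
print("/who " + build_who_filter(zone="mount"))
```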

At this point I don't trust any of the resources I've been using.. sigh :(

If z-"mount" returns data from multiple zones with "mount" in the name, then what Rollie started to do (then commented out) will work.
If it doesn't, then a sub-query will need to be done with zone names from a table. (I've already found a localization list of all current zones.)

I'm thinking about making a trial version, which is Rollie's code with the addition of alternate name letter selection and zone sub-query as selectable options for those who want to be more complete. I need to see how much of a processing impact this will add to census runs.
1974ER
Epic Censi


Joined: 07 Nov 2008
Posts: 723

WR Updates: 22,670,394
1974ER WR Profile

Posted: Thu Nov 03, 2011 3:06 pm

Ah, sorry, I didn't come to think of that either...

Also, I tested:

/who z-"mount"
/who z-"high"
/who z-"stra"

And they all worked, locating people in multiple locations.

However, with a bit of further testing I discovered an additional problem...

On a high population realm, even a /who z-"mount" n-"a" 85-85 r-"human" c-"paladin" can produce more than 49 hits.

Which effectively means that to separate by zone, zones would have to be identified specifically... and with popular destinations, like Mount Hyjal, Orgrimmar or Stormwind, even that might fail.

In short, I fear adding too many search criteria will severely bog the census down... and that would be really bad... I just completed a run on EU-Silvermoon, Alliance... the census took 38 minutes 7 seconds with the current coding to locate, as a funny coincidence, 3456 characters. :D :D :D

Considering the fact that at least the following triggered the exceeds-50 problem...

Human Paladin 85 O
Human Paladin 85 A
Human Warrior 85 A
Night Elf Druid 85 A

We are already talking about adding multiple minutes to checks as the location list would be a long one.

I look forward to hearing how your trial version turns out, but I am afraid we are heading to a direction where censusing prime time populations of high population servers will become counterproductive as the census will approach and possibly even exceed one hour in length.

The tricky thing is that basically all realms that are older than about one to two weeks are extremely top heavy. One can easily assume that between roughly 58 to 82 % of all characters seen during a single run are of the current max level of 85.

Furthermore, when an expansion is activated, the problem area temporarily expands to cover all levels up to the new maximum level. This was already evident with Cataclysm. Even with the current coding, some realms triggered census runs exceeding 45 minutes... I may recall wrong, but I think I pulled at least one exceeding 54 minutes...

On a further consideration, only adding a very limited amount of locations might be an option: Stormwind, Orgrimmar, (Mount) Hyjal (or perhaps rather Firelands?), (Tol) Barad and Dalaran would already give fairly good coverage.

But this is turning into a wall of text and my tired brain is starting to sputter, so I better take a break...

EDIT: Already spotted some massive typos, must get sleep... ZZZZ.
FuxieDK



Joined: 22 May 2008
Posts: 448
Location: Copenhagen, DK
WR Updates: 2,571,239
FuxieDK WR Profile

Posted: Fri Nov 04, 2011 3:14 am

We can also cross our fingers and hope that Blizzard increases/removes the 49/50 limit on /who

I know it sounds like Utopia, but it's actually not that unlikely..

Until mid-WotLK, AddOns could only poll the AH one page (50 auctions) at a time, but suddenly they added a GetAll feature, which made e.g. Auctioneer AT LEAST 20x faster.

I'm not saying /who should get a GetAll feature (but it would be nice ;)), but if only the limit was increased to e.g. 100, it would help insanely much..

Who/where to lobby for this, I have no idea..
They could change GetPosXY to GetPosXYZ while at it, to enable AddOns to help in 3D fights (Al'akir, Malygos, "last boss in Oculus"), but dungeons are not a priority for THIS forum..
_________________
Doing census on various servers Wink
bringoutyourdead
Forums Admin & general flunky


Joined: 07 Nov 2008
Posts: 535
Location: Silicon Valley
WR Updates: 6,600,430
bringoutyourdead WR Profile

Posted: Fri Nov 04, 2011 1:15 pm

1974ER wrote:

/who z-"mount"
/who z-"high"
/who z-"stra"

And they all worked, locating people in multiple locations.
Great, that is the last piece of info I need. (I think/hope :? )

1974ER wrote:
On a high population realm, even a /who z-"mount" n-"a" 85-85 r-"human" c-"paladin" can produce more than 49 hits.

Which is why I picked CensusPlus for my Lua training. I saw that I could learn and maybe even make something a little better.

While changing the coding to allow full Auction House dumps was a high priority for Blizzard (they could make money from that move), changing the limit on who requests has been ignored over and over again on the Blizzard forums.
1974ER wrote:
Which effectively means that to separate by zone, zones would have to identified specifically... and with popular destinations, like Mount Hyjal, Orgrimmar or Stormwind, even that might fail.

In short, I fear adding too many search criteria will severely bog the census down... and that would be really bad...

Agreed... however, cycling down to adding a zone selector only happens after the character name selector fails to keep the result under the 49 limit. So the goal is to choose letters that each match up to 49 names, and no more, at any point of the census run, and to use the zone selector only if it can't be helped. My real hope is that zone selectors are never used but are available if needed.
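
The fallback order described above can be tried out against a fake server. The Python sketch below is only a simulation under stated assumptions: `who` is a made-up stand-in that does case-insensitive substring matching and truncates at 49 results, the roster is invented, and a reply that comes back with 49 entries is treated as "possibly truncated". It is not CensusPlus code.

```python
from itertools import product

WHO_CAP = 49  # visible results per /who request, per the tests in this thread


def who(roster, level, race, cls, letter=None, zone=None):
    """Simulated /who: case-insensitive substring filters, truncated at WHO_CAP."""
    hits = [c for c in roster
            if c["level"] == level and c["race"] == race and c["cls"] == cls
            and (letter is None or letter in c["name"].lower())
            and (zone is None or zone.lower() in c["zone"].lower())]
    return hits[:WHO_CAP]


def census_bucket(roster, level, race, cls, letters, zones):
    """Split one level/race/class bucket by letter, then by zone only if still capped."""
    seen, queries = {}, 0

    def ask(**extra):
        nonlocal queries
        queries += 1
        hits = who(roster, level, race, cls, **extra)
        for c in hits:
            seen[c["name"]] = c
        return len(hits) >= WHO_CAP  # True means the reply may have been truncated

    if ask():
        for letter in letters:
            if ask(letter=letter):
                for zone in zones:
                    ask(letter=letter, zone=zone)
    return seen, queries


# Made-up roster: 150 level 85 human paladins spread over four zones.
syllables = ["ra", "ne", "lo", "mi", "tu", "sa", "be", "ky"]
zones = ["Stormwind", "Mount Hyjal", "Tol Barad", "Dalaran"]
names = ["".join(p).capitalize() for p in product(syllables, repeat=3)][:150]
roster = [{"name": n, "level": 85, "race": "Human", "cls": "Paladin", "zone": zones[i % 4]}
          for i, n in enumerate(names)]

found, queries = census_bucket(roster, 85, "Human", "Paladin", "aeiouy", zones)
print(len(found), "characters found with", queries, "queries")
```

In this toy roster no single zone holds 49 matching characters, so the zone split always finishes the job; on a real high-pop realm even a letter plus zone query could still hit the cap.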

1974ER wrote:
Considering the fact that at least the following triggered exceeds 50 problem...

Human Paladin 85 O
Human Paladin 85 A
Human Warrior 85 A
Night Elf Druid 85 A
And this shows the problem: using vowels as a selector isn't good, for two reasons.
First, they are too common and get too many matches.
Second, and the greater harm, they get too many redundant matches.

A census hit (one character) where the character is new to your census run is good; it is the goal of your run.
A census hit where the character has already been seen on the run is bad, as it takes just as much time as the good one but is a total waste of time.

I am still gathering names from US and EU realms (currently over 48k names). At some point I'm going to analyze the data to find the minimum set of selection letters that will let me 'find' all the names in my list with the lowest possible number of duplicate finds. That list of letters is what I will propose as the alternate selector. The list may have more or fewer letters than the 15 Rollie currently uses. But even if it has more, it will be faster, since it will have limited the number of time-wasting duplicates.
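
One possible way to approach that analysis is a greedy set cover: keep picking the letter that finds the most not-yet-found names. The Python sketch below is an assumption about how it could be done, not the analysis actually run; the sample names are made up, the function names are hypothetical, and greedy selection keeps the letter list short but does not by itself guarantee the fewest duplicate finds.

```python
def greedy_letters(names, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Pick letters one at a time, each covering the most still-uncovered names."""
    uncovered = {n.lower() for n in names}
    chosen = []
    while uncovered:
        best = max(alphabet, key=lambda ch: sum(ch in n for n in uncovered))
        gained = {n for n in uncovered if best in n}
        if not gained:
            break  # the remaining names use no letter from the alphabet
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


def duplicate_hits(names, letters):
    """Count the extra finds beyond the first one for each name."""
    return sum(max(0, sum(ch in n.lower() for ch in letters) - 1) for n in names)


# Made-up sample; the real input would be the gathered name list.
sample = ["Cent", "Bad", "Legolas", "Raenilousty", "Fuxie", "Rollie", "Grym"]
letters, missed = greedy_letters(sample)
print("letters:", letters, "missed:", sorted(missed))
print("duplicate finds:", duplicate_hits(sample, letters))
```

Comparing `duplicate_hits` for the greedy list against the current selector list on the same names would show whether the shorter list really cuts the wasted finds.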

I am still looking at the code but my gut feeling is that the greatest time usage of CensusPlus is the processing of good or duplicate census hits into the CensusPlus tables. Not the actual request and return of data from the server. I expect that a /who request that returns zero hits is only an issue in that libwho does limit the rate of /who requests to keep below the server process usage that would cause Blizzard to be unhappy.
bringoutyourdead
Forums Admin & general flunky


Joined: 07 Nov 2008
Posts: 535
Location: Silicon Valley
WR Updates: 6,600,430
bringoutyourdead WR Profile

Posted: Fri Nov 04, 2011 6:42 pm    Post subject: All I need is tomato sauce

Just spent a solid +4 hours on the CensusPlus.lua code.. what fun.

Since this is an established program with a number of revisions, I expected to find it with some spaghetti features. I wasn't disappointed. :lol:

There are plenty of lines of unused code..
workarounds no longer needed
unfinished ideas
debug stubs
etc.

There are lines of code to work with addons that haven't existed in years
There are lines of code to handle data format changes (WoW v2.1)

While the above tends to make the code harder to follow and maintain.. keeping some of it makes sense considering we can always expect Blizzard to make api call changes on us.

There is code added to play nicely with other addons (Wholib), where the published interface is ignored and the call dives deep into the library internals.
And while this code works, it doesn't really work the way we thought it might. By adding the Wholib function, CensusPlus doesn't break other addons that also use the who API. But we probably haven't realized that CensusPlus isn't protected from the other addons!
Currently we send Wholib a query, wait for Wholib to signal an event back to CensusPlus, and then, on getting the event, proceed to ignore the result data sent back and just re-read the who results list from the server.

Background: who requests are queries to the server that the server replies to later. Upon getting the "reply is ready" signal, you send multiple requests to the server to get the data that has been gathered for you. Due to the variable delay in getting the results from the server, you can get the following sequence:
CensusPlus asks for data.
The other addon asks for data.
The server says data is ready for CP (actually the server just says data is ready; it is Wholib that keeps track of which addon asked for it).
The server says data is ready {for the other addon}.
CensusPlus gets data, but it isn't CP's data; it is the other addon's data, since it all comes from the same bucket.
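
The sequence above can be replayed as a toy model. This Python sketch is purely illustrative: the single `bucket` variable, the event list and the addon names are assumptions standing in for the shared who result list and Wholib's bookkeeping, and it assumes both replies land before either handler runs.

```python
def run_sequence(use_event_payload):
    """Replay the sequence above; return the data each addon ends up with."""
    bucket = None   # the single shared result list every addon reads from
    events = []     # (addon, payload) events queued by the library

    # CensusPlus asks, then the other addon asks; the replies arrive in that order.
    for addon, reply in (("CensusPlus", ["Cent"]), ("OtherAddon", ["Bad"])):
        bucket = reply                       # each reply overwrites the shared bucket
        events.append((addon, list(reply)))  # the library tags the event with its own copy

    # The handlers only run after both replies have landed.
    received = {}
    for addon, payload in events:
        received[addon] = payload if use_event_payload else bucket
    return received


print("re-reading the bucket:", run_sequence(use_event_payload=False))
print("using the event data: ", run_sequence(use_event_payload=True))
```

In this model, trusting the data delivered with the event avoids the mix-up, whereas re-reading the shared list hands CensusPlus whatever arrived last.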

The Wholib code was added by the maintainer of Wholib because CensusPlus, which spams who requests, was interfering with programs like WIM and {Pratt(?), also written by the Wholib maintainer}. I am not sure why the published interface wasn't used. Maybe it was to bypass a problem with a feature of Wholib. {For example, Wholib will locally cache who data and give you that cached data unless you tell it not to.} Something to look at more closely.

Besides the data collision issue mentioned above, which is a very low probability issue until someone else writes an addon that bangs hard on /who, the other problem is that Wholib spends processing time gathering all the data we want, which is then ignored, and we spend processing time getting it again in CensusPlus. This, I expect, is an observable time cost on high-pop servers. But I still expect the major time cost is spent stepping through the local data each time we get a character, as we need to decide if it is someone we have already seen this census run or someone to add to the data.
1974ER
Epic Censi


Joined: 07 Nov 2008
Posts: 723

WR Updates: 22,670,394
1974ER WR Profile

Posted: Sat Nov 05, 2011 5:39 pm

Urghs... examining a small part of the verbose mode of census on EU-Silvermoon...

Human paladins at level 85 blew 50 on R, S, O, I, E, A, etc...

EU-Argent Dawn:

Human paladins at level 85 blew 50 on R, E and A, at least.
Human warriors at level 85 blew 50 on A, at least.

You are going towards a brick wall, I am afraid. Due to the structure of languages involved, the lowest common denominators are going to be the letters you don't want to use, like A, E and R.

To be blunt, in light of your current research so far, the best accuracy Census would use all of the 26 "basic" letters excluding Q and adding .

And the fastest... would only contain 7 letters: A, E, I, O, U, Y and (for additional French compatibility), because most names contain at least one vowel. To increase accuracy, consonants should be added in this order of preference (based on your post at 35k names): R, N, L, S, T, H, D, M...

Human paladins are not the only problem point, but they are the most common one on the Alliance side; on the Horde side the problem is focused on blood elf paladins. People like paladins, humans and blood elves, and paladins have a VERY restrictive racial list, which compounds the problem.

Quote:

"A census hit (one character) where the character is new to your census run is good; it is the goal of your run.
A census hit where the character has already been seen on the run is bad, as it takes just as much time as the good one but is a total waste of time."

This is a problem, as it's not true most of the time. Why? Because almost nobody submits after every single census they run. A census hit is only "good" if it produces new data, in other words a character never seen before, one that has leveled up or joined/left a guild.

A hit on a character that has already been seen can be "good" if the character has changed since it was last counted, whether it was 10 s or 10 days ago.

I'll try another experiment even though my characters aren't 85s.

Triggers (now, no vowels, no vowels + no R, S or N):

(5, 3, 2), (5, 3, 1), (4, 2, 1), (7, 4, 1), (5, 2, 1), (4, 2, 1), (3, 2, 1), (5, 2, 1), (6, 3, 2) and (4, 2, 1) for first 10 I checked...

In those cases, cutdowns would ease the strain... Problem is, there are cases like these among mine:

(2, 0, 0), (3, 1, 0) and (4, 2, 0)... that is they could "cease to exist" for checks at max level.

If the program(s) you are using can handle it... you should check what percentage of names consists of either vowels only (fairly common?) or consonants only (very rare?)... and if technically possible, special characters only (even more rare?). Almost all of my characters have fairly long names... short names are more likely to get missed and the chance for that goes up quickly if the number of checks is cut down... another late nighter, so off to bed, more thoughts later on...
bringoutyourdead
Forums Admin & general flunky


Joined: 07 Nov 2008
Posts: 535
Location: Silicon Valley
WR Updates: 6,600,430
bringoutyourdead WR Profile

Posted: Mon Nov 07, 2011 9:16 pm

1974ER wrote:
To be blunt, in light of your current research so far, the best accuracy Census would use all of the 26 "basic" letters excluding Q and adding .

Yes, I have always agreed that is the case. The more letters, the more accurate at catching characters. But unfortunately at some point the improvement of accuracy fails to be worth the cost of processing the census.

1974ER wrote:
And the fastest... would only contain 7 letters: A, E, I, O, U, Y and (for additional French compatibility), because most names contain at least one vowel. To increase accuracy, consonants should be added in this order of preference (based on your post at 35k names): R, N, L, S, T, H, D, M...

And here is where I have my doubts/issues. When running CensusPlus, what process steps eat the most time (and can that time be reduced?) The three areas I see as potential process runtime costs are:
1) Actual query and wait for response from the Blizzard server. {Not much can be done with this.}
2) Programmed delays built into both CensusPlus and Wholib (potentially as much as 15 seconds for every query). {Delays can probably be reduced or eliminated.}
3) Process runtime used to walk the local CensusPlus database to add/update each character found. {Code can be changed to be more efficient.}

I think there is some confusion between us in one regard. When I say "a Census run", I mean from the time CensusPlus auto starts (or is manually started) running until the last character has been enumerated and CensusPlus goes to wait state for next auto start. This has nothing to do with submitting to the website. (just to make sure we are on the same page.)
1974ER wrote:
"A census hit (one character) where the character is new to your census run is good; it is the goal of your run.
A census hit where the character has already been seen on the run is bad, as it takes just as much time as the good one but is a total waste of time."

This is a problem, as it's not true most of the time. Why? Because almost nobody submits after every single census they run. A census hit is only "good" if it produces new data, in other words a character never seen before, one that has leveled up or joined/left a guild.

A hit on character that has already been seen, can be "good", if the character has changed since it was last counted, whether it was 10 s or 10 days ago.
Very much agree a record of change on a character is by definition good.

But here is an example of a bad hit: character "Raenilousty". This is a top-level character of a common class and race, so the census needs to go to name letter splitting.
Letter A: found.
Search for the current realm in the Local Database (LDb) and create a pointer to the realm table. Search for the current faction in the local realm subset of the LDb and create a pointer to the faction table. Search for the current race in the local faction subset of the LDb and create a pointer to the race table. Search for the current class in the local race subset of the LDb and create a pointer to the class table. Search for the character in the class subset of the LDb. Create/add the character entry with all data and the current server timestamp.
Letter E: found.
Do the above all over again, again creating the character entry with all data and a new current server timestamp.
...
And repeat for a complete eleven (different letter selector) times, since currently CensusPlus has no code to limit repeat processing of the same single found name in the /who request. And at this point your database has, for this one name, eleven entries with eleven timestamps and all the same data. All this for one actual found /who data match. sigh.
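
To put a number on that, here is a small Python sketch that counts the table walks for one name with and without a "seen this run" check. The selector list is hypothetical (the actual 15 letters are not given in this thread; this one just reuses the vowels and consonants suggested earlier), and `process_hits` is a made-up stand-in for the realm/faction/race/class walk, not CensusPlus code.

```python
# Hypothetical selector list; the real 15 letters are not given in this thread.
SELECTORS = "aeiouyrnlsthdm"


def process_hits(name, selectors=SELECTORS, dedupe=False):
    """Count how often one name is walked into the local tables during a run."""
    seen = set()
    writes = 0
    for letter in selectors:
        if letter not in name.lower():
            continue  # this selector's /who would not return the name
        if dedupe and name in seen:
            continue  # already recorded this run: skip the table walk
        seen.add(name)
        writes += 1   # stands in for the realm/faction/race/class walk and write
    return writes


print("without dedupe:", process_hits("Raenilousty"))
print("with dedupe:   ", process_hits("Raenilousty", dedupe=True))
```

The /who request for each letter still has to be sent either way; the check only saves the local processing for names already recorded in the current run.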

1974ER wrote:
If the program(s) you are using can handle it... you should check what sort of percentages of names consists of either vowels only (fairly common?) or consonants only (very rare?)... and if technically possible, special characters only (even more rare?). Almost all of my characters have fairly long names... short names are more likely to get missed and the chance for that goes up quickly, if the number of checks is cut down... another late nighter, so off to bed, more thoughts later on...
Yes, these are some of the things I will be looking at.

Here is a count from 48542 names so far
name size = count found at that size
1 = 0
2 = 71
3 = 600
4 = 2547
5 = 6454
6 = 9653
7 = 8863
8 = 7196
9 = 5219
10 = 3674
11 = 2484
12 = 1773
13 = 0
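
A few summary numbers fall straight out of that table. The Python sketch below only re-uses the counts listed above; note that they add up to 48534, slightly fewer than the 48542 quoted, so treat the results as approximate.

```python
# name length -> count, copied from the table above
LENGTH_COUNTS = {1: 0, 2: 71, 3: 600, 4: 2547, 5: 6454, 6: 9653, 7: 8863,
                 8: 7196, 9: 5219, 10: 3674, 11: 2484, 12: 1773, 13: 0}

total = sum(LENGTH_COUNTS.values())
mean_length = sum(size * count for size, count in LENGTH_COUNTS.items()) / total
share_5_to_9 = sum(LENGTH_COUNTS[size] for size in range(5, 10)) / total
share_under_4 = sum(LENGTH_COUNTS[size] for size in range(1, 4)) / total

print("names counted:", total)
print("mean length:", round(mean_length, 2))
print(f"share with 5 to 9 letters: {share_5_to_9:.1%}")
print(f"share with fewer than 4 letters: {share_under_4:.1%}")
```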
1974ER
Epic Censi


Joined: 07 Nov 2008
Posts: 723

WR Updates: 22,670,394
1974ER WR Profile

Posted: Tue Nov 08, 2011 9:01 am

Quote: "But unfortunately at some point the improvement of accuracy fails to be worth the cost of processing the census."

Agreed, but in light of your research, the cut off point in terms of accuracy lies between and Q. The tricky part, of course, is that the efficiency cut off point comes much earlier.

Quote: "I think there is some confusion between us in one regard. When I say "a Census run", I mean from the time CensusPlus auto starts (or is manually started) running until the last character has been enumerated and CensusPlus goes to wait state for next auto start. This has nothing to do with submitting to the website. (just to make sure we are on the same page.)"

I want to be on the same page too... So here goes: Yes, it does. Census 20 minutes, submit, census 20 minutes, submit is automatically superior to census 40 minutes, submit. This assuming all censusing is done on a single faction. The latter method guarantees that the character is added / updated maximum once. The former method means that a character might be added during submission one, updated during the second, or in case of an "old" character, updated twice.

Furthermore, there is a very high likelihood of this: census x 6 x 2,5 minutes = less efficient than census x 3 x 5 minutes = less efficient than census x 1 x 15 minutes. Again, assuming single faction.

Then, to your example name... Firstly, you are now assuming we are using at least 11 very common triggers. This is a bad assumption, as your own data suggests average name length is roughly between 5 and 9 letters total. Names exceeding 10 letters are VERY unlikely to contain no multiples of any letter, I think. In light of that, the most efficient census would contain roughly 9 triggers (which is a lot less than the current 15). Also according to your data again, names exceeding 12 or lacking 4 letters are very rare. The very short, rare names have the highest probability to get caught if there are lots of triggers.

The long end causes the following: From efficiency point of view: Maximum of 12 triggers should be used. From accuracy point of view, those 12 (or fewer) triggers should cover the most common letters to guarantee the highest likelihood of at least one trigger working on any given name.

Now, again from your data: We need an absolute minimum of 4 triggers to maximise the chance of getting at least one hit on 4 letter name. Also, if we went for a ridiculously low number of triggers, it would be absolutely vital that those triggers generate as much data as possible. At only 4 triggers, picking something along the lines of "A, E, R, N" would make the most sense.

I'll use my own 39 active characters (even though they are not 85s) for demonstration purposes:

"High" probability triggers "A, E, R, N": 38/39 found. Note that we already got one fail, despite my characters only having "normal" letters in their names.

"Medium" probability triggers "D, U, M, K": 28/39 found. Not horrible, but accuracy is suffering.

"Low" probability triggers "W, V, X, ": 5/39 found. Now the accuracy has gone utterly to hell.

"Very low" probability triggers (with extra Scandinavian, German and French flavour): ", Q, , ": 0/39 found.

Now, to the second meaty part: For me, personally, having just 3 triggers would suffice: "A, E, O" and every single one of my 39 play characters would be hit at least once. But... and this is important: Eliminate all vowel triggers and the situation changes a lot: I now need the absolute minimum of the following: "R, M, N" and of those three, R and N are the common "blow 50" consonant triggers. Now, if I eliminate them as well...

After that I need "M, D, S, T, G, L" to catch all 39. The problem with excluding common triggers is that the overall need of triggers goes up. And the more we eliminate the common triggers, the faster the need to add even more triggers grows.
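
The trigger comparisons above can be reproduced on any name list with a few lines. This is only a Python sketch under stated assumptions: the sample list is a placeholder (Undowar and Legolas are mentioned in this thread, the rest are invented, and the 39 characters are not listed here), and `covered` and `greedy_triggers` are hypothetical helper names. The greedy pick is one simple heuristic for choosing a small trigger set, not a claim about how the triggers were originally chosen.

```python
def covered(names, triggers):
    """Names that contain at least one trigger letter (case-insensitive)."""
    wanted = {t.upper() for t in triggers}
    return [n for n in names if wanted & set(n.upper())]


def greedy_triggers(names, candidates):
    """Pick letters one at a time, each covering the most still-missed names."""
    remaining = set(names)
    chosen = []
    while remaining:
        best = max(candidates, key=lambda c: len(covered(remaining, [c])))
        newly = covered(remaining, [best])
        if not newly:
            break  # no candidate letter matches what is left
        chosen.append(best)
        remaining -= set(newly)
    return chosen, remaining


# Placeholder names for illustration only.
SAMPLE = ["Undowar", "Legolas", "Grymm", "Zyx", "Bd"]

for label, triggers in [("high", "AERN"), ("vowels", "AEO"), ("low", "WVX")]:
    hits = covered(SAMPLE, triggers)
    print(f"{label}: {len(hits)}/{len(SAMPLE)} found")

print(greedy_triggers(SAMPLE, "ABCDEFGIOPRSTUY"))
```

Dropping common letters from `candidates` and re-running `greedy_triggers` shows the effect described above: the number of triggers needed to catch every name goes up.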

Achieving both efficiency and accuracy is a question of compromise, since at certain points increasing one will automatically decrease the other and vice versa. Additionally, the efficiency and accuracy cut off points are really far away from each other. Maximum efficiency cut off point is at 4 letters (max. 200 hits), maximum accuracy at 26 letters (any name with normal letters excluding Q and including has a theoretical chance of blowing 50).

This post is getting really long and I have spent a long time working on it every now and then... so I am going to post it and think more for later...
1974ER

PostPosted: Wed Nov 09, 2011 5:10 pm    Post subject: Reply with quote

Ummm... I did some further checking and testing on very high population factions and I am sorry to report that your plan to eliminate "blowing" triggers just got blown.

If you want to eliminate "exploding" triggers, you can't use the following at all:

A, B, C, D, E, I, O, R, S, T, U (H, L, M, N)

And should strongly avoid all of the following:

F, G, P (K)

Leaving us with:

Y (J, Q, V, W, X, Z, )

The letters in ()s are currently not used by CensusPlus. Reducing the trigger list to Y only would be utterly stupid. However, even adding J, Q, V, W, X, Z and doesn't significantly increase accuracy. Especially with Q being marginal to begin with and common only on French realms.

Also, after quite some testing I am also confirming the following: The processing time is not tied to amount of hits. I experienced several cases where 7-13 characters took considerably longer to process than 45+.

In light of this and my earlier thoughts I suggest only minimal refinements be made, if desired, with following main alternatives:

1) Drop triggers F, G, P and Y: Faster census with least loss of accuracy.

2) Implement 1) and add triggers H, L, M and N instead: No significant speed change, increased census accuracy.

3) Add H, L, M and/or N to current triggers: Sacrifice time to boost accuracy even further.

As a possible alternative, though a radical one:

4) Remove the letter of name check entirely and check for a limited number of location based triggers instead, along the lines I sketched in my post on the 3rd.

Personally, I am in favour of option 3, but that's just personal taste. Letter level triggering doesn't happen much, if at all on low population factions, is fairly rare on medium level factions and even on very high population factions tends to concentrate on a fairly short list of combinations on maximum level: night elf druids, human mages, paladins and warriors, blood elf paladins.

And when MoP is released, the racial spread changes unavoidably. Humans and blood elves will likely remain the most played races, but they will be less prevalent in terms of both percentages and actual characters online at any given time. Pandaren monks might eventually form a block resembling human and blood elf paladins (on level 90), but even if it happens it's not a significant problem.

On an additional note... adding both a new race and a new class does mean that census lengths will increase somewhat, because there will be a new racial check as soon as any level exceeds 50 characters and new class based checks for all races that can become monks. If I am not mistaken, there will be several of those.
bringoutyourdead

PostPosted: Wed Nov 09, 2011 7:22 pm    Post subject: Reply with quote

62808 character names (have to split total names to <64K due to machine/software restrictions.. sigh)
rank, letter, count
1, A, 37078
2, E, 30635
3, R, 30146
4, I, 27222
5, N, 25616
6, O, 23763
7, L, 22487
8, S, 21480
9, T, 19491
10, H, 14858
11, D, 13678
12, U, 13544
13, M, 13322
14, K, 12132
15, C, 10276
16, G, 9668
17, Y, 9343
18, B, 8193
19, P, 7313
20, Z, 5731
21, F, 5638
22, W, 4717
23, V, 4455
24, X, 3318
25, J, 2901
26, , 1149
27, Q, 781
28, , 717
29, , 670
30, , 664
31, , 647
32, , 541
33, , 504
34, , 454
35, , 409
36, , 372
37, , 306
38, , 263
39, , 250
40, , 242
41, , 232
42, , 220
43, , 209
44, , 193
45, , 164
46, , 162
47, , 136
48, , 124
49, , 108
50, , 107
51, , 100
52, , 72
53, , 66
54, , 54
55, , 42
56, , 35
57, , 0

Name length distribution - and some other interesting stats.
Minimum allowed 2 characters, Maximum allowed 12 characters
2 = 101 = 0.16% - - - - 68 with zero standard vowel (aeiouy), 30 with one standard vowel, and 3 with both letters as vowels.
3 = 802 = 1.27% - - - - 247 with zero, 420 with 1, 127 with 2, and 8 that are all vowels.
4 = 3271 = 5.21% - - - - 341 with zero, 1568 with 1, 1235 with 2, 128 with 3 and 1 with all vowels.
5 = 8210 = 13.07%
6 = 12313 = 19.60%
7 = 11380 = 18.12%
8 = 9251 = 14.73%
9 = 6845 = 10.90%
10 = 4915 = 7.82%
11 = 3372 = 5.37%
12 = 2344 = 3.73%
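
For anyone who wants to run the same kind of breakdown on their own name list, here is a small Python sketch. It is not the tool used for the numbers above; `length_histogram` and `vowel_histogram` are hypothetical helper names and the sample names are placeholders. The only thing taken from the post is the "standard vowel" set (aeiouy).

```python
from collections import Counter

VOWELS = set("aeiouy")  # the "standard vowel" set used in the stats above


def length_histogram(names):
    """Map name length -> number of names with that length."""
    return Counter(len(n) for n in names)


def vowel_histogram(names, length):
    """For names of one length, map standard-vowel count -> number of names."""
    return Counter(
        sum(ch in VOWELS for ch in n.lower())
        for n in names
        if len(n) == length
    )


# Placeholder names for illustration only.
names = ["Bd", "Ao", "Zyx", "Wii", "Quu", "Bad"]
total = len(names)
for size, count in sorted(length_histogram(names).items()):
    print(f"{size} = {count} = {100 * count / total:.2f}%", dict(vowel_histogram(names, size)))
```

Note that Y counts as a vowel here, so a name like "Zyx" lands in the "one vowel" bucket.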


Last edited by bringoutyourdead on Wed Nov 09, 2011 8:31 pm; edited 3 times in total
bringoutyourdead

PostPosted: Wed Nov 09, 2011 7:23 pm    Post subject: Reply with quote

Dataset source: CP most server submission lists (most censused server/factions) and WowHead Guild Profiler (lists max 200 guilds, sorted for top membership guilds; used >=200 members as cutoff).
US realms
Cenarius - Alliance - 200 guilds listed - 21 guilds used
Scarlet Crusade - Horde - 175 guilds listed - 15 guilds used
EU realms
EU - English - Scarshield Legion - Horde - 99 guilds - 6 used
EU - English - Trollbane - Alliance - 200 guilds - 16 used (wowwiki {W-W:} notes High Dutch and Swedish player counts)
EU - French - Les Sentinelles - Horde - 125 guilds - 6 used
EU - French - Eitrigg - Horde - 134 guilds - 13 used
EU - German - Die Todeskrallen - Horde - 136 guilds - 8 used
EU - German - Die Todeskrallen - Alliance - 72 guilds - 4 used
EU - Spanish - C'thun - Horde - 200 guilds - 16 used
EU - Anachronos - Alliance - 200 guilds - 18 used {W-W: Some UK, Dutch, Scandinavian and Polish}
EU - Khadgar - Horde - 200 guilds - 21 used {W-W: UK, Scandinavian, eastern Europe, Russian}
EU - Kilrogg - Horde - 200 guilds - 16 used {W-W: A very high Dutch population and some from UK}
EU - Bloodfeather - Horde - 200 guilds - 16 used {W-W: Many Scandinavian and Portuguese. Horde has many Chinese}
EU - Bloodscalp - Alliance - 189 guilds - 3 used (hit systems limit) {W-W: Very high Hungarian population}
1974ER

PostPosted: Thu Nov 10, 2011 3:42 am    Post subject: Reply with quote

Very interesting, and mostly as expected. 2 character names are ultra rare and low on standard vowels, but as soon as the name has 3 or more, likelihood of at least one vowel is high:

3 = 555 / 802 = 69,20 %
4 = 2930 / 3271= 89,58 %

In light of this, vowel triggers are absolutely necessary. However, it would be interesting to see a more detailed analysis of the 68 + 247 + 341 names that did NOT contain any standard vowels. What's the ratio between consonants, non-standard consonants and non-standard vowels and also are R, N, L, S and T the top five consonants used within them as your overall data slightly suggests?
bringoutyourdead

PostPosted: Thu Nov 10, 2011 4:32 pm    Post subject: Only looking n-letter selections Reply with quote

Frequency efficiency (Rollie's letter selection order)
Using 62804 Characters from the realms/guilds listed in previous message
Total names found: 62586
Number of duplicate entries found and stored in database: 204876
Names not found 218

Duplicate is defined as a name found with previous letter selectors and current letter selector.
Duplicate counts and percentage valid only for order of selectors as shown.

letter = count of new names found -- percent new found with letter -- running total percentage -- duplicate names found again -- percentage duplicates -- dup vs. find
A = 37077 -- 59.04%
B = 3428 -- 5.46% -- 64.49% -- 4765 -- 7.59% -- 1.39:1
C = 3637 -- 5.79% -- 70.29% -- 6639 -- 10.57% -- 1.83:1
D = 3825 -- 6.09% -- 76.38% -- 9853 -- 15.69% -- 2.58:1
E = 7898 -- 12.58% -- 88.95% -- 22735 -- 36.20% -- 2.88:1
F = 702 -- 1.12% -- 90.07% -- 4936 -- 7.86% -- 7.03:1
G = 1150 -- 1.83% -- 91.90% -- 8518 -- 13.56% -- 7.41:1
I = 2449 -- 3.90% -- 95.80% -- 24772 -- 39.44% -- 10.12:1
O = 1277 -- 2.03% -- 97.83% -- 22486 -- 35.80% -- 17.61:1
P = 185 -- 0.29% -- 98.13% -- 7128 -- 11.35% -- 38.53:1
R = 368 -- 0.59% -- 98.71% -- 29778 -- 47.41% -- 80.92:1
S = 270 -- 0.43% -- 99.14% -- 21209 -- 33.77% -- 78.55:1
T = 112 -- 0.18% -- 99.32% -- 19379 -- 30.86% -- 173.03:1
U = 116 -- 0.18% -- 99.51% -- 13427 -- 21.38% -- 115.75:1
Y = 92 -- 0.15% -- 99.65% -- 9251 -- 14.73% -- 100.56:1
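
A table like the one above can be produced from a list of unique names with a short loop. The following is a Python sketch, not the spreadsheet or script actually used for these numbers; `selector_report` is a hypothetical name and the sample names are placeholders. It applies the definition given above: a duplicate is a name matched by the current selector that was already found by an earlier one.

```python
def selector_report(names, order):
    """Per selector, in order: (letter, newly found names, duplicates found again)."""
    found = set()
    rows = []
    for letter in order:
        matches = {n for n in names if letter in n.upper()}
        new = matches - found   # first time this name is seen
        dup = matches & found   # already found by an earlier selector
        found |= new
        rows.append((letter, len(new), len(dup)))
    missed = set(names) - found
    return rows, missed


# Placeholder names for illustration only.
names = ["Bad", "Bd", "Abe", "Zyx"]
rows, missed = selector_report(names, "ABD")
total = len(names)
running = 0
for letter, new, dup in rows:
    running += new
    print(f"{letter} = {new} -- {100 * new / total:.2f}% -- {100 * running / total:.2f}% -- {dup}")
print("Names not found:", sorted(missed))
```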

Student: (Oh wait.. that is me!)
If the same selector letters are used in a different order will that impact total found vs. not found: answer no!
If the same selector letters are used in a different order will that impact duplicate counts and percentages (show numbers in your answer):
NO!, wait YES!, wait Maybe? That will require time and another posting.
1974ER

PostPosted: Thu Nov 10, 2011 7:04 pm    Post subject: Reply with quote

Stop! Halt! Seis! And so on!!!

You can't apply frequency math to the names like that!!! Why? Because the same names have a chance of repeating on multiple realms. In fact, you would be hard pressed to find a realm that does NOT have a Legolas, etc. Trying to find multiple characters called... say, Undowar (one of mine) would be much less likely. Both are seven characters long, have 3 vowels, etc, but they still share only 2 triggers.

Also, the individual check by letter checks start from the end and finish off with As.

Not to mention the fact that even the same letter has a variable efficiency dependent on the realm... Any check for would be far more effective on a French faction than an English one, same would apply for on German factions. Even the famous letter A isn't equally frequent on all realms!

And even further, provided that there are sufficiently many characters of specific type about... let's say we have 104 human paladins of level 85 online. 85 of the paladins have a name with A(s). 65 of them also have E(s), so there is overlap. In addition there are Z, Zoyx, Wii and Quu.

Now, Census makes the first check and blows 50 on race, class and level. This first check may detect 0-4 of the "special" characters. This is where the trigger selection becomes important.

If we only use trigger "A", we will find 50 results for a grand total of 50 to 89. If we use triggers "A" and "E", we'll get 50 of the "E"s too, with a theoretical maximum of 104 hits total. If we only use trigger "Z", we are guaranteed two hits and up to 50 randoms. But if we look for "A, E, I, O, U and Y" we will have a chance to find everyone except Z and even she might be found accidentally by the first check.
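
The paladin example can be played with in a small simulation. This is a Python sketch built on assumptions, not on how the server actually behaves: it assumes a capped query returns an arbitrary subset of 50 matches, and `who`, `census` and the generated placeholder names (with digits, which real names cannot have) are invented for illustration. Only the cap of 50 and the 104 / 85 / 65 split come from the example above.

```python
import random

WHO_CAP = 50  # result limit per /who query, as discussed in the thread


def who(online, letter=None, rng=random):
    """Return at most WHO_CAP matches; which ones get returned is an assumption."""
    matches = [n for n in online if letter is None or letter in n.upper()]
    if len(matches) > WHO_CAP:
        matches = rng.sample(matches, WHO_CAP)
    return set(matches)


def census(online, triggers, rng=random):
    """First check without a letter; if it blows 50, refine with letter triggers."""
    found = who(online, None, rng)
    if len(found) >= WHO_CAP:
        for trigger in triggers:
            found |= who(online, trigger, rng)
    return found


# 104 placeholder paladins: 85 with A (65 of them also with E), 15 with
# neither, plus the four special names from the example above.
ONLINE = (
    [f"Ae{i:03d}" for i in range(65)]
    + [f"Ax{i:03d}" for i in range(20)]
    + [f"Xx{i:03d}" for i in range(15)]
    + ["Z", "Zoyx", "Wii", "Quu"]
)

for triggers in ["A", "AE", "AEIOUY"]:
    found = census(ONLINE, triggers, random.Random(2011))
    print(triggers, len(found), "of", len(ONLINE))
```

Running it with different seeds gives a feel for how much the result depends on which 50 names each capped query happens to return.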

The thing is... the fewer triggers there are, the more important it is to include "A" and "E" among them to maximise accuracy. If we intended to use a huge number of triggers, we might get away with not using "A" and / or "E"... but we don't want lots of triggers.

Granted, I am a firm proponent of accuracy and some of your ideas suggest that you would be willing to sacrifice accuracy to speed things up...

Anyway, I have been awake too long again, so I am stopping for now. :/
bringoutyourdead

PostPosted: Mon Nov 14, 2011 9:07 pm    Post subject: Reply with quote

1974ER wrote:
the individual check by letter checks start from the end and finish off with As.
Ah, you are correct. I forgot the queues are created as LIFO (last in first out).. but since I planned to play with order changes just to see what happens.. it isn't a big loss.

As far as the impact of duplicate names... it is surprisingly small... of the 62804 names in my lists, only 669 duplicate names are found or 1.1%. This is such a small amount.. I decided to ignore the small error that would be added to my percentages. (I may rethink this, but not until after I have looked through the various letter combos and orders)

Concerning your comment in regards to letter efficiency vs realm/language. Well yes that is given.. but meaningless.. since we only have one version of CensusPlus to be used on any/all realms. (Nor would I even want to suggest trying to maintain different versions!)


1974ER wrote:
but we don't want lots of triggers.
Not necessarily. As CensusPlus is now, Yes - the more triggers the slower it runs. But if the logic is changed and built in delays eliminated or at least decreased we may find that we could have more triggers and faster processing.

1974ER wrote:
Granted, I am a firm proponent of accuracy and some of your ideas suggest that you would be willing to sacrifice accuracy to speed things up...

No that isn't the case.

My 1st goal is to learn Lua programming, XML and event driven object programming.
My 2nd goal is to investigate if there is any way to (1st improve the accuracy of the census, 2nd to make the process of gathering the data faster.)

As I stated at the beginning of this exercise, I'm going at it from the "feeling" that what CensusPlus uses as triggers might not be the best.
I may find that after all is said and done, that Rollie did pick the best triggers.
bringoutyourdead

PostPosted: Mon Nov 14, 2011 9:27 pm    Post subject: Re: Only looking n-letter selections Reply with quote

bringoutyourdead wrote:
Student: (Oh wait.. that is me!)
If the same selector letters are used in a different order will that impact total found vs. not found: answer no!
If the same selector letters are used in a different order will that impact duplicate counts and percentages (show numbers in your answer):
NO!, wait YES!, wait Maybe? That will require time and another posting.


Reverse order (correct order)
Y = 9343 -- 14.88%
U = 11894 -- 18.94% -- 33.81% -- 1649 -- 2.63% -- 0.13:1
T = 12886 -- 20.52% -- 54.33% -- 6605 -- 10.52% -- 0.51:1
S = 8484 -- 13.51% -- 67.84% -- 12995 -- 20.69% -- 1.53:1
R = 9851 -- 15.69% -- 83.53% -- 20295 -- 32.31% -- 2.06:1
P = 1191 -- 1.90% -- 85.42% -- 6122 -- 9.75% -- 5.14:1
O = 3278 -- 5.22% -- 90.64% -- 20485 -- 32.62% -- 6.25:1
I = 2993 -- 4.77% -- 95.41% -- 24228 -- 38.58% -- 8.09:1
G = 439 -- 0.70% -- 96.11% -- 9229 -- 14.69% -- 21.02:1
F = 208 -- 0.33% -- 96.44% -- 5430 -- 8.65% -- 26.11:1
E = 1146 -- 1.82% -- 98.26% -- 29487 -- 46.95% -- 25.73:1
D = 215 -- 0.34% -- 98.61% -- 13462 -- 21.43% -- 62.32:1
C = 152 -- 0.24% -- 98.85% -- 10124 -- 16.12% -- 66.32:1
B = 133 -- 0.21% -- 99.06% -- 8060 -- 12.83% -- 60.60:1
A = 372 -- 0.59% -- 99.65% -- 36705 -- 58.44% -- 98.67:1

And we find, as predicted, that the total number of names found doesn't change: since the selectors don't change, order is irrelevant.
And we find (hmmm) that the total number of duplicates doesn't change either, so again order is irrelevant for the totals.
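
One way to see why the totals cannot depend on order: a name matched by k of the selectors is counted as new once and as a duplicate k - 1 times, whichever selector gets to it first. Only the per-letter split moves around. A quick Python check of that reasoning, using placeholder names and hypothetical helper names (`totals`, `expected_dups`):

```python
from itertools import permutations


def totals(names, order):
    """Walk the selectors in order; return (names found, duplicate hits)."""
    found, dups = set(), 0
    for letter in order:
        for name in names:
            if letter in name.upper():
                if name in found:
                    dups += 1
                else:
                    found.add(name)
    return len(found), dups


def expected_dups(names, selectors):
    """Order-free count: each name matched k times gives k - 1 duplicates."""
    return sum(
        max(sum(letter in name.upper() for letter in selectors) - 1, 0)
        for name in names
    )


# Placeholder names for illustration only.
names = ["Bad", "Bd", "Abe", "Zyx"]
selectors = "ABDE"
results = {totals(names, order) for order in permutations(selectors)}
print(results, expected_dups(names, selectors))
```

All 24 orderings of the four selectors collapse to a single (found, duplicates) pair, and the duplicate count matches the order-free formula.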

Further data mining nuggets:
Length of names that were missed:
7 letters = 4 names
6 letters = 9 names
5 letters = 33 names
4 letters = 70 names
3 letters = 73 names
2 letters = 29 names

Most common letters in the missed names: (A..Z only)
L = 76 names
N = 66 names
K = 47 names
X = 45 names
M = 43 names
Z = 35 names
J = 29 names
H = 27 names
W = 18 names
V = 17 names
Q = 10 names

10 Names have no common letters (A..Z)

So what if we change selectors and use the top 15 most frequent letters as found in my names list?
This will have to be the next posting.
Along with the most common name (single unique spelling) found.
All times are GMT - 6 Hours