Skritter | Words with three or more characters



YouJing October 30th, 2011 7:11a.m.

For words with three or more characters, I have to go in manually and change the grade to "don't know" if I don't know them. If I just press the correct button (v, 1, or the down arrow), it gets marked as so-so. This means it will be added as a word that I have learned, and every time I review that word and still don't know it, it will be counted as a word that I forgot (thus messing up my retention rating).
Is there any way around this?

Cheers,
Julius

nick October 30th, 2011 11:06a.m.

I have an idea of some Bayesian math we can do to better determine whether it's the word that you'd likely have forgotten or the character itself, and then to do the grading appropriately. I started to do it when implementing the iOS app, but got sidetracked. If I eventually figure out the math, I can add it to the Flash version as well, and then it should be a lot smarter about stuff like that. (You might still have to do manual overrides sometimes.)

mike_thatguy October 30th, 2011 11:20a.m.

I'll also be really happy when you guys figure out the math to do this! I tend to add a lot of multi-character words...

InkCube October 30th, 2011 11:55a.m.

If this changes anything: I don't like that Skritter seems to assume that I need to have forgotten a character to forget a word.

e.g. 建议 (which I always mix up with 意见) or words in general where I know perfectly well what characters are in a word or at least how to write that character, but can't remember the order.

Now I know that I can set the characters both to 'correct' and toggle the word to 'wrong' but that always feels like a work-around. If you just mark the characters correct, it automatically jumps to the next word and you have to go back to tell it that you got the word wrong so you have to do it in the right order.

It just doesn't make sense to me -- just because I forget 得不得 doesn't mean I can't write either character practically in my sleep.

YouJing October 30th, 2011 12:16p.m.

For writing, tone, and pinyin prompts it makes sense that it marks them as so-so sometimes. But for definition prompts I don't think it makes a lot of sense at all. And I can't even use the grading buttons: if I press 1 or down, it still marks it as so-so. Why? I can't think of any situation where I would want a definition prompt that I don't know to be marked as so-so.

Antimacassar October 30th, 2011 7:41p.m.

I was also recently thinking about this. It's annoying that Skritter thinks that I don't know that 当 has 2 tones and keeps making me review it (among others). Of course it could be targeting characters I'm not sure of as well, I imagine it must be tough to figure out which are which.

One small point, though: when you learn a new multi-character word, say from an HSK list, that contains characters you already know, you will most likely get it wrong since it's a new word. Does this affect the rate at which you study the individual characters? I assume that it doesn't, but if it does, changing that would be a logical step to take. You could also make it so that only getting the word wrong a second time affects the rate of the individual character.

GrandPoohBlah October 30th, 2011 11:00p.m.

Oh, so that's what's going on? I always mark a character incorrect when I don't know a word since it's the fastest way to mark the word wrong; I never realized I was marking both the character and the word incorrect. That would explain why my progress stats claim that I've forgotten characters while learning new words with familiar characters.

FatDragon October 31st, 2011 4:52a.m.

@youjing - You have a point there. Personally, if I forget one character in a multi-character word, I typically leave it at so-so, since I got the others right. Marking it wrong would probably be the right thing to do, but I prefer to reward myself a little for getting it mostly right :D

@inkubus - I feel the same way to some extent, but I can't personally think of a better way to do it. I'm sure the Skritter team would be happy to consider a viable alternative to the current system if someone suggested one, but it seems to me like there are three options:
(1) Default to marking the character and word as forgotten
(2) Default to only marking the word forgotten
or
(3) Add a step in which the user is directly asked for input on whether it's a character issue or a word issue.

(3) is just a more obtrusive way of doing what anybody can already do by self-grading, so I would toss that out. That leaves (1) and (2), but I feel like (2) would require some tweaking of the interface to pull off. Currently you have one set of grading buttons (if they're turned on), and they mark both character and word wrong. The word score can be changed manually afterwards, which gives the user full control over grading both character and word, albeit not in the most graceful manner. If the grading buttons only marked the word wrong, you would need a way to indicate whether or not you knew the character as well.

YouJing October 31st, 2011 6:01a.m.

For the definition prompts there is no way to grade the individual characters. One can only use the grading buttons to grade the whole word, but giving it a 1 will still mark it as so-so.

FatDragon October 31st, 2011 9:10a.m.

Sounds like something that needs to be changed. When you click the buttons in the actual study box (if you've got them enabled), it seems to work just fine.

nick October 31st, 2011 11:20a.m.

The math I described above can fix all of the complaints y'all are having--you would just leave wrong characters as wrong, even if you knew the character by itself, and Skritter would figure it out. I think it'll work out well.

YouJing, I don't understand how you're triggering the so-so grade for definition prompts. My tests all show a "forgot" (1) grade as appropriate with the definition prompts. Can you describe in detail the steps you're taking to get a so-so, and how the so-so grade is displayed?

YouJing October 31st, 2011 1:14p.m.

Nick, I sent an example to your mail.

jww1066 November 4th, 2011 8:44p.m.

@nick I ran into a possibly-related problem just now. I was studying tones for 看不惯 and got one of the tones wrong (for 看). I saw that Skritter marked the whole word as so-so, so I went into the popup to ban the single characters in question. (In the past this has only happened when I have had some of the individual characters in "My Words".) The popup showed 看 with a green check mark and the other two characters with blue plus signs, so I went to the popup for 看 to ban it, but it was already banned.

So, to summarize, I see two problems:

-- When I go to the popup for 看不惯, why does it show 看 with a green check if it's been banned?

-- When I get a single tone wrong for 看不惯, why does it mark it as so-so when I'm not studying any of the individual characters? It should mark the whole word wrong.

James

nick November 5th, 2011 8:57p.m.

I'll send Scott a bug case for that first problem; that should probably be showing up as banned.

I've written code to mark the whole word wrong when you're not studying the individual sub-character, instead of doing the so-so, but it looks like it's not working in this case. I am not sure if it's my code, or something wrong with that item where it's not properly disabled by the banning.

Since you ban a lot of individual characters, can you tell whether that code normally works--marking words as wrong instead of so-so if you get one character wrong (which you're not studying) in a word of 3 or more characters?

YouJing, thanks for that bug report. I'll get it on my list.

jww1066 November 5th, 2011 11:54p.m.

Yeah, normally I ban the individual characters and that solves the problem with the words getting marked so-so.

James

Thorondor November 6th, 2011 8:23a.m.

I actually would be interested in the math, and (if I have time) might be able to contribute. My study interest at University is artificial intelligence and I had my share of math during my years. To be honest I am pretty interested in the whole Skritter-Math, since I guess it is using algorithms that are somewhat connected with AI.

nick November 6th, 2011 11:49a.m.

Thorondor, the math I'm talking about is really just a straightforward application of Bayes' Theorem to the competing hypotheses of which parts you know and which you've forgotten, given the evidence of what responses you gave. Our priors of how well you know each part are just what the spaced repetition tells us about how due they are (for your target retention index).
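For reference, this is the general form of Bayes' Theorem that such an approach would rest on. The symbols are notation chosen here for illustration: H_i is one hypothesis about which parts you know, and E is the evidence of the responses you gave.

```latex
P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_j P(E \mid H_j)\, P(H_j)}
```

The priors P(H_i) are what the spaced repetition supplies. The likelihoods P(E | H_i) are the part that still has to be modeled.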

So here's an example with the writing prompts in 蘑菇 with made-up numbers. Say that 蘑菇 is 200% due (0.83), 蘑 is 100% due (0.9), 菇 is 50% due (0.94).

H1: we know 蘑菇, 蘑, 菇 - prior 0.83 * 0.9 * 0.94 = 0.70218
H2: we know 蘑菇, 蘑 - prior 0.83 * 0.9 * 0.06 = 0.04482
H3: we know 蘑菇, 菇 - prior 0.83 * 0.1 * 0.94 = 0.07802
H4: we know 蘑菇 - prior 0.83 * 0.1 * 0.06 = 0.00498
H5: we know 蘑, 菇 - prior 0.17 * 0.9 * 0.94 = 0.14382
H6: we know 蘑 - prior 0.17 * 0.9 * 0.06 = 0.00918
H7: we know 菇 - prior 0.17 * 0.1 * 0.94 = 0.01598
H8: we know nothing - prior 0.17 * 0.1 * 0.06 = 0.00102
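The eight priors above can be reproduced with a short sketch. It assumes, as the multiplication above does, that knowing the word and knowing each character are independent. The names `p_know` and `hypothesis_priors` are made up for illustration and are not Skritter code.

```python
from itertools import product

# Made-up recall probabilities from the example above.
p_know = {"蘑菇": 0.83, "蘑": 0.9, "菇": 0.94}


def hypothesis_priors(p_know):
    """Map each set of known items to its prior probability.

    ASSUMPTION: the items are treated as independent, so the prior of a
    hypothesis is the product of p (known) or 1 - p (forgotten) per item.
    """
    items = list(p_know)
    priors = {}
    for flags in product([True, False], repeat=len(items)):
        prior = 1.0
        for item, known in zip(items, flags):
            prior *= p_know[item] if known else 1.0 - p_know[item]
        known_items = frozenset(i for i, k in zip(items, flags) if k)
        priors[known_items] = prior
    return priors


priors = hypothesis_priors(p_know)
for known, prior in priors.items():
    # Print the known items (empty list for "knows nothing") and the prior.
    print(sorted(known), round(prior, 5))
```

The eight priors sum to 1, which is a quick sanity check that the hypotheses cover every case exactly once.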

Event A is getting 蘑 right.
Event B is getting 蘑 wrong.
Event C is getting 菇 right.
Event D is getting 菇 wrong.
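To make the update step concrete, here is a simplified sketch of the posterior after event B (getting 蘑 wrong). The likelihood is a loud assumption invented for illustration: the character is written correctly in the word only if both the word and that character are known, with no slips and no lucky guesses. It uses the simple eight-hypothesis setup, so it is only a toy version of the idea, not the finished math.

```python
from itertools import product

# Made-up recall probabilities from the example above.
p_know = {"蘑菇": 0.83, "蘑": 0.9, "菇": 0.94}
items = list(p_know)


def prior(known):
    """Prior of one hypothesis, assuming the three items are independent."""
    result = 1.0
    for item in items:
        result *= p_know[item] if item in known else 1.0 - p_know[item]
    return result


def likelihood_wrong(char, known):
    """ASSUMPTION: the character comes out wrong unless both the word and
    the character are known (no slips, no lucky guesses)."""
    return 0.0 if ("蘑菇" in known and char in known) else 1.0


def posterior_given_wrong(char):
    """Bayes' Theorem: prior times likelihood, renormalized over hypotheses."""
    hypotheses = [
        frozenset(i for i, k in zip(items, flags) if k)
        for flags in product([True, False], repeat=len(items))
    ]
    joint = {h: prior(h) * likelihood_wrong(char, h) for h in hypotheses}
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


post = posterior_given_wrong("蘑")  # event B
p_char_known = sum(p for h, p in post.items() if "蘑" in h)
p_word_known = sum(p for h, p in post.items() if "蘑菇" in h)
print(round(p_char_known, 3), round(p_word_known, 3))
```

Under these assumptions the posterior that 蘑 itself is still known comes out around 0.6, while the posterior for the word 蘑菇 drops to roughly 0.33, so the word is the more likely culprit.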

So I started doing the math here to work out what would happen, and then much later I realized that my hypotheses weren't right, because I needed hypotheses for knowing/not knowing it was 蘑 in the word and also knowing/not knowing it was 菇 in the word, rather than just having one hypothesis for "knows which characters are in the word" and "doesn't know which characters are in the word".

The end result would hopefully generalize to some simple code I could use to say things like, "It's probable that the user forgot how to write 蘑, so let's mark both the word and that character wrong", or "The user knows 菇 pretty well but 蘑菇 not very well, so let's not submit that review for 菇--we'll just mark 蘑菇 wrong and leave 菇's scheduling where it is (except spaced out a little bit as usual)." This is more important on the iPhone/iPod version, where there's little opportunity to see the character-specific details in the prompt, so it's harder to tell whether the user has forgotten the character or just that it's this character in this word.

The math was taking too long, so I just put in a nice heuristic for now in the iOS version: only submit the character if it's at least 70% as due as the word. Works okay.
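A minimal sketch of that heuristic, assuming "as due" means the ratio of the two due percentages (1.0 for 100% due). The function name, the argument names, and that interpretation are guesses for illustration, not the actual iOS code.

```python
def should_submit_character(char_due: float, word_due: float,
                            threshold: float = 0.7) -> bool:
    """Return True if the character review should be submitted.

    ASSUMPTION: "due" values are fractions such as 1.0 for 100% due, and
    the character qualifies when it is at least `threshold` times as due
    as the word.
    """
    return char_due >= threshold * word_due


# Numbers from the 蘑菇 example: the word is 200% due, 蘑 100%, 菇 50%.
word_due = 2.0
for char, char_due in {"蘑": 1.0, "菇": 0.5}.items():
    print(char, should_submit_character(char_due, word_due))
```

With the made-up numbers from the example, neither character reaches 70% of the word's dueness, so only the word would be graded.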

There's plenty of more interesting math in Skritter that we've actually done, too, like for the scheduling and the handwriting recognition.

jww1066 November 6th, 2011 5:07p.m.

@nick it happened again just now with 看不慣. Again 看 seems to be the problem. I suspect the problem is that, even though it's banned, it still shows up in My Words. There's no option on the My Words page to delete it, only to ban it.

jww1066 November 6th, 2011 6:38p.m.

@nick the problem with doing that kind of computation is that "蘑 is 100% due (0.9)" doesn't actually mean that the prior probability of the user knowing how to write 蘑 is 0.9; instead this is the prior probability of the user knowing how to write 蘑 given the Skritter prompt for 蘑. For example, suppose the user is studying two different characters with very similar prompts, and let's say he actually knows how to write both characters. Then it could be that the probability of getting the individual characters right when they are tested in isolation is close to 0.5 just because of confusion.

When doing Bayesian inference, you would need to come up with an estimated probability that the user knows how to write the character that would somehow take into account *all* the ways that the user studies writing that character, in order to distinguish "knows how to write the character" from "knows how to answer this particular prompt".

There is a very interesting discipline called Item Response Theory which can be used to come up with this kind of estimate, but the math is quite involved. Take a look here:

http://en.wikipedia.org/wiki/Item_response_theory
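As a rough illustration of the kind of model Item Response Theory uses, here is a minimal sketch of the standard two-parameter logistic (2PL) item response function. The function name and the example parameter values are made up for illustration, and nothing here reflects how Skritter works.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model.

    theta: the learner's latent ability
    a:     the item's discrimination (how sharply it separates abilities)
    b:     the item's difficulty (ability at which p is exactly 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical values: the same learner facing an easy and a hard prompt.
ability = 0.5
print(p_correct_2pl(ability, a=1.2, b=-1.0))  # easier prompt
print(p_correct_2pl(ability, a=1.2, b=2.0))   # harder prompt
```

The point of a model like this is that ability belongs to the learner while difficulty belongs to the prompt, which is the distinction between "knows the character" and "can answer this particular prompt".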

James

nick November 7th, 2011 9:49a.m.

If he knows how to write both characters and it's just a matter of interference from the prompts being similar, then he'll probably mark himself right when he writes it. The model does require some human intervention. I think it is reasonable to expect users to say, "I knew that one," and to be thinking in terms of knowing 蘑 rather than knowing 蘑 for a particular prompt, as long as it's easy to indicate that to Skritter (the character-level grading buttons, in this case).

I definitely don't want to get into IRT with this.

jww1066 November 7th, 2011 10:34p.m.

@nick OK, now I know something's screwed up with 看 - even though it's banned, it showed up for tone study just now.

scott November 8th, 2011 11:17a.m.

I'll work on this bug.
