Caching for a multithreaded dictionary can be done w/o locks

#1

this is just for discussion.

given solution: https://github.com/epibook/epibook.github.io/blob/master/solutions/java/concurrency/S2.java

the given solution locks to make sure that String wLast and String[] closestToLastWord is always read/written at the same time.

and since that’s all we need to guarantee for the given question, we can write a very simple solution without any locks.

if you write a wrapper class for both variables, e.g.

 private static class Response {
  public final String query;
  public final String[] suggestions;
  // and a constructor...
}

and because reads and writes are atomic for reference variables in Java per doc: https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html

Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double).

also, if we want additional guarantee that the latest update to s_lastResponse is visible to other threads, we’ll use volatile keyword

any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads.

so then… this should suffice w/o locks.

class S3 extends SpellCheckService {

private static class Response {
  public final String query;
  public final String[] suggestions;

  public Response(String query, String[] suggestions) {
     this.query = query;
     this.suggestions = suggestions;
  }
}

// adhoc weird naming just to distinguish it from local vars
// note use of volatile to guarantee happens-before relationship.
private static volatile Response s_lastResponse = null;

public static void service(ServiceRequest req, ServiceResponse resp) {

  String query = req.extractWordToCheckFromRequest();
  
  // read of reference objects are atomic per java doc: https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html
  // (makes sense since it's just a pointer read)
  // even if s_lastResponse gets overwritten by another thread immediately afterwards, we still have the old variable.
  Response r = s_lastResponse;
  
  if (query.equals(r.query)) {
     resp.encodeIntoResponse(r.suggestions);
  } else {
     // otherwise, create a new Response instance
     r = new Response(query, Spell.closestInDictionary(query));
     resp.encodeIntoResponse(r.suggestions);
     
     // then we'll save it.
     // this operation is atomic per java doc: https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html
     // so even if multiple threads are doing this at once, whoever doing it last will overwrite it successfully.
     s_lastResponse = r;
  }
 }
}

of course, if the question was any more complicated concurrency-wise (i.e. caching last 100 response instead of just 1), we would’ve needed locks, but given the simple requirement of this particular question…

0 Likes

#2

I was about to say there’s a race possible between

if (query.equals(r.query)) {

and

 resp.encodeIntoResponse(r.suggestions);

however since as you pointed out r is volatile and assignments are atomic, this is not an issue.

under the constraint that you cannot allocate new objects you will need synchronization, and also for more complex caches (multiple entries), but i cannot fault your current solution. i will think of a way of phrasing the question that precludes your solution.

thanks for bringing this to our attention, and appreciate your clear exposition with its concrete references, it’s a model for the kind of interaction we like to see.

cheers,
adnan

0 Likes

#3

I thought more about this example, there’s a couple of things we can do.

FIrst, we can explicitly disallow the use of new, e.g., by saying “Cache the most recently computed result using two static fields—a string, and an array of closest strings.”

Alternately, we can provide the unsafe solution below:

    public class S1Alternative extends SpellCheckService {
      static final int MAX_ENTRIES = 3;
      static LinkedHashMap<String,String[]> cachedClosestStrings = new LinkedHashMap<String,String[]>() {
        protected boolean removeEldestEntry(Map.Entry eldest) {
          return size() > MAX_ENTRIES;
        }
      };
    
      public static void service(ServiceRequest req, ServiceResponse resp) {
        String w = req.extractWordToCheckFromRequest();
        if (cachedClosestStrings.containsKey(w)) {
          resp.encodeIntoResponse(cachedClosestStrings.get(w));
          return;
        }
        String[] closestToLastWord = Spell.closestInDictionary(w);
        cachedClosestStrings.put(w, closestToLastWord);
      }
  }

This has a race (containsKey() and get()) and unsynchronized puts into a LinkedHashMap will lead to corruption. I don’t believe there’s a concurrent LinkedHashMap, which in any case would not solve the problem of race between containsKey*( and get().

The solution we had before would work (synchronize the if and the put()). There’s no atomic write object that can be applied here.

Let me know your thoughts,

bye,
Adnan

0 Likes

#4

i think either proposal would work great.

i like the second proposal a little bit more because, when i first read the original question, i mistakenly thought i was also supposed to implement closestInDictionary() function too (or do i?). with the second proposal, no one would misunderstand the intent of the question. but i still think the first one would work just as great.

0 Likes

#5

just to let you know, we have updated this problem (using the second approach), thank you so much for pointing it out, especially since it’s the segway into the chapter. any other inputs, esp on concurrency, will be greatly appreciated,

cheers,
adnan

ps - we have a java beta of EPI ready, if you are interested in getting a copy, please write to me (adnan.aziz@gmail.com) and we’ll ship one out to you

0 Likes

#6

syntax error… private static class ?

0 Likes

#7

private static is fine since it is used internally for those other members I think.

0 Likes