Servers in C/C++: Handle POST Multipart and X-Form in C Server

Writing a POST reader/handler is, very often, an awkward task. Some browsers send the POST content along with the request (in the request's body); some send it as separate chunks of bytes; some combine the two strategies depending on the size of the POST message. Until all browsers implement new types of POST messages (like JSON), there are three standard types of Content-Type header:

❶ multipart/form-data: the values are sent as blocks (chunks of data) separated by the boundary string, which is defined in the Content-Type header;

❷ application/x-www-form-urlencoded: like the query string of a GET request, the values must be URL-decoded, since the reserved characters "&" and "=" separate the key-value pairs from each other and each key from its value;

❸ text/plain: this is a bulk unencoded transmission, without boundaries, just plain text. It is particularly useful when you transmit texts of any size, be it a JSON, a Base64-encoded image or a combination of them in one single JSON structure (object).
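As an illustration of how a server might choose among these three types, here is a minimal sketch. The names postKind and classifyContentType are hypothetical and are not part of the code in this article. The comparison is a case-insensitive prefix match, because media type names are case-insensitive and may be followed by parameters such as boundary= or charset=.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical labels for the three POST body types discussed above. */
enum postKind { POST_MULTIPART, POST_XFORM, POST_PLAIN, POST_UNKNOWN };

/* Return 1 if s starts with prefix, ignoring letter case. */
static int hasPrefixNoCase(const char *s, const char *prefix) {
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) return 0;
        s++; prefix++;
    }
    return 1;
}

/* Classify the value of a Content-Type header (the part after the colon). */
static enum postKind classifyContentType(const char *ctype) {
    if (!ctype) return POST_UNKNOWN;
    while (*ctype == ' ' || *ctype == '\t') ctype++;   /* skip leading blanks */
    if (hasPrefixNoCase(ctype, "multipart/form-data")) return POST_MULTIPART;
    if (hasPrefixNoCase(ctype, "application/x-www-form-urlencoded")) return POST_XFORM;
    if (hasPrefixNoCase(ctype, "text/plain")) return POST_PLAIN;
    return POST_UNKNOWN;
}
```

A request handler could switch on the returned value and call the matching reader for each case.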

*There is a note at the bottom regarding the memory issues and the appropriate strategies.

Considering the above, here is the implementation:

1. Multipart

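void readMultipart(struct _request *H, const char *ctype, size_t len, void (*fileHandler)(struct _request *, char *buf, int status)) {
 const char *boundary = "boundary=";
 char *buf = H->ioBuffer, *bry = (char*)strstr(ctype, boundary);
 if(!bry) return; // no boundary parameter: not a usable multipart request
 // Every delimiter line is "--" followed by the boundary. Build it in place by
 // overwriting the last two characters of "boundary=" (this assumes ctype
 // points into the writable request buffer and is not needed afterwards).
 bry += strlen(boundary) - 2; bry[0] = bry[1] = '-';
 int conn = H->conn, bytes, bryLen = strlen(bry);
 (void)len; (void)fileHandler; // not used in this simplified version

 printf("\nPOST Boundary: [\033[32m%s\033[0m] (%d bytes)\n\n", bry, bryLen);

 // part of the body may have arrived together with the headers
 if(H->body) if(!parseSection(H, H->body, bry, '\1')) return;

 // read() returns 0 at end of stream and -1 on error; keep one byte for the terminator
 while( (bytes = read(conn, buf, BUF_SIZE - 1)) > 0 ) {
  buf[bytes] = '\0';
  printf("\t\t\t[%d bytes]\n[%s]\n", bytes, buf);
  if(!parseSection(H, buf, bry, '\0')) return;
 }
}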

The readMultipart function gets the boundary string, checks whether there is already content in the message body, and then reads the remaining chunks of data and sends them to the parser (parseSection).
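The listing above builds the "--boundary" delimiter by writing into the header itself. If you prefer to leave the header untouched, the delimiter can be copied into a separate buffer. The following is only a sketch: makeDelimiter is a hypothetical helper, not part of the original code, and it also accepts the quoted form boundary="...", which the listing above does not handle.

```c
#include <stddef.h>
#include <string.h>

/* Build "--<boundary>" from a Content-Type value such as
   "multipart/form-data; boundary=XyZ".
   Returns the delimiter length, or 0 if no boundary is present
   or the result does not fit into out. */
static size_t makeDelimiter(const char *ctype, char *out, size_t outSize) {
    static const char key[] = "boundary=";
    const char *b = strstr(ctype, key);
    size_t n;
    if (!b) return 0;
    b += sizeof key - 1;                 /* first character of the boundary */
    if (*b == '"') {                     /* quoted form: boundary="..." */
        const char *end;
        b++;
        end = strchr(b, '"');
        if (!end) return 0;
        n = (size_t)(end - b);
    } else {
        n = strcspn(b, "; \t\r\n");      /* token ends at a separator or line end */
    }
    if (n == 0 || n + 3 > outSize) return 0;   /* "--" + boundary + '\0' */
    out[0] = out[1] = '-';
    memcpy(out + 2, b, n);
    out[n + 2] = '\0';
    return n + 2;
}
```

The returned length can be reused wherever the parser needs the delimiter size, which saves a strlen call per chunk.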

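int parseSection(struct _request *H, char *buf, char *bndry, char ifInMemory) {
 int bLen = strlen(bndry), totSec = 0;
 char *sec[20]; // at most 20 sections per chunk; any further ones are ignored
 while(totSec < 20 && (sec[totSec] = strstr(buf, bndry))) {
  // cut the previous section at the CRLF that precedes this delimiter
  if(totSec>0) sec[totSec][-2] = '\0';
  buf = sec[totSec] + bLen;
  totSec++;
 }

 for(int i=0; i<totSec; i++) printf("\t\t\tSEC #%d:\n[%s]\n", i, sec[i]);

 if(totSec == 0) {  // no delimiter in this chunk: might be file content
  printf("FILE:\n[\033[33m%s\033[0m]\n", buf);
  return 2;
 }

 for(int i=0; i<totSec; i++) {
  char *p = sec[i] + bLen;
  if(*p == '-' && *(p+1) == '-') { printf("FINAL\n"); return 0; } // closing delimiter
  if(*p == '\r' && *(p+1) == '\n') p += 2; // skip the CRLF after the delimiter
  char *val = strstr(p, "\r\n\r\n"); // blank line between the part headers and the value
  if(!val) continue; // headers cut off by the end of the chunk
  *val = '\0'; val += 4; // points to Value
  // filename must be extracted first: extract() terminates the string it
  // returns, and name= precedes filename= in Content-Disposition
  char *filename = extract(p, " filename="), *name = extract(p, " name=");
  if(!name) continue; // no field name: nothing to store
  printf("> %d: [%s] = [%s]\n", i, name, val);
  if(ifInMemory) addInReqList(H, name, val, (filename && *filename)?filename:NULL);
  else addInReqList(H, addToMem(H, name), addToMem(H, val), (filename && *filename)?addToMem(H, filename):NULL);
 }
 return 1;
}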

The parseSection function analyses every chunk of data (section), whether it comes from the message body or was read as a new block, and sets the field values. If a chunk contains no boundary, it is assumed to be file content and the function just prints it in the terminal. In your applications, you may save the files to disk or implement another fileHandler (a function passed as a parameter to the readMultipart function).
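To give an idea of what saving a file to disk could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the status codes FILE_BEGIN, FILE_DATA and FILE_END, the struct upload and the function saveChunk are hypothetical and do not appear in the original code, whose fileHandler signature carries no explicit length. A length parameter is used here because uploaded files may contain zero bytes, which string functions such as strlen would treat as the end of the data.

```c
#include <stdio.h>

/* Hypothetical status codes: first chunk, middle chunk, last chunk. */
enum { FILE_BEGIN = 0, FILE_DATA = 1, FILE_END = 2 };

/* Hypothetical per-upload state kept between chunks. */
struct upload { FILE *fp; size_t written; };

/* Write one chunk of an uploaded file to path.
   Returns 0 on success, -1 on any I/O error. */
static int saveChunk(struct upload *u, const char *path,
                     const char *data, size_t n, int status) {
    if (status == FILE_BEGIN) {
        u->fp = fopen(path, "wb");       /* binary mode: no newline translation */
        u->written = 0;
    }
    if (!u->fp) return -1;               /* open failed or upload not started */
    if (n && fwrite(data, 1, n, u->fp) != n) {
        fclose(u->fp); u->fp = NULL;
        return -1;
    }
    u->written += n;
    if (status == FILE_END) {
        int rc = fclose(u->fp);          /* fclose flushes buffered data */
        u->fp = NULL;
        return rc == 0 ? 0 : -1;
    }
    return 0;
}
```

A real handler would also have to sanitise the client-supplied filename before using it as a path.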

2. X-www-form-urlencoded

The readXForm function reads the POST body. Since it is URL-encoded, the function sends the content to the getQueries() function, which decodes it and adds the values to the query lists.

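void readXForm(struct _request *H, const char *ctype, size_t len) {
 char *buf = H->ioBuffer, *beg = H->buffer + H->MemLen + 1, *dst = beg;
 ssize_t bytes; size_t totBytes = 0; // read() returns ssize_t: -1 on error, 0 at end of stream
 while( totBytes < len && (bytes = read(H->conn, buf, BUF_SIZE - 1)) > 0 ) {
  //printf("\t\t\t[%d/%d bytes]\n", (int)bytes, (int)len);
  if((size_t)(dst - H->buffer) + (size_t)bytes + 1 >= BUF_SIZE) ERR("Not enough memory.");
  // append each chunk to the previous one, so the whole body forms one string
  memcpy(dst, buf, bytes); dst += bytes; *dst = '\0';
  totBytes += bytes;
 }
 if(dst == beg) return; // nothing was read
 H->MemLen = dst - H->buffer; // index of the terminator, as addToMem expects
 if(*ctype == 'a') getQueries(H, beg); // application/x-www-form-urlencoded
 else H->body = beg; // for plain text
}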
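The getQueries function is not listed in this article. As a rough idea of the decoding it has to perform, here is a minimal sketch; urlDecode, forEachPair and printPair are hypothetical names and are not the author's implementation. The body is split on "&" and "=" first and decoded afterwards, so that an encoded "%26" or "%3D" inside a value is not mistaken for a separator.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexVal(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode s in place: '+' becomes a space, "%XX" becomes the byte 0xXX.
   The result is never longer than the input. */
static char *urlDecode(char *s) {
    char *src = s, *dst = s;
    while (*src) {
        int hi, lo;
        if (*src == '+') { *dst++ = ' '; src++; }
        else if (*src == '%' && (hi = hexVal((unsigned char)src[1])) >= 0
                             && (lo = hexVal((unsigned char)src[2])) >= 0) {
            *dst++ = (char)(hi * 16 + lo); src += 3;
        }
        else *dst++ = *src++;            /* ordinary or malformed: copy as is */
    }
    *dst = '\0';
    return s;
}

/* Split "k1=v1&k2=v2" in place, decode each part and report it.
   Returns the number of pairs reported; empty pairs are skipped. */
static int forEachPair(char *body,
                       void (*onPair)(const char *key, const char *val, void *ctx),
                       void *ctx) {
    int count = 0;
    while (body && *body) {
        char *next = strchr(body, '&');
        if (next) *next++ = '\0';
        if (*body) {
            char *val = strchr(body, '=');
            if (val) *val++ = '\0'; else val = body + strlen(body);
            onPair(urlDecode(body), urlDecode(val), ctx);
            count++;
        }
        body = next;
    }
    return count;
}

/* Example callback: print the pair and count it through ctx. */
static void printPair(const char *key, const char *val, void *ctx) {
    printf("[%s] = [%s]\n", key, val);
    if (ctx) ++*(int *)ctx;
}
```

In a server like the one in this article, the callback would add each pair to the request list instead of printing it.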

The extract function "extracts" the name and the filename from the headers of a Multipart section (the value itself is located by parseSection).

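char *extract(char *pool, const char *fish) {
 char *x = strstr(pool, fish); if(!x) return NULL;
 x += strlen(fish); if(*x == '\"') x++; // skip the attribute name and the opening quote
 int i=0; while(x[i] && x[i]!='\"') i++; x[i]='\0'; // cut at the closing quote (modifies pool)
 return x;
}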

The addToMem function serves a specific purpose. In a few words: when you read the request into your buffer (with the readSocket function), you read, as a rule, less than the buffer's size. The remaining part can be used to store the key-value pairs extracted from the X-Form, Plain Text and Multipart fields. If these additions exceed the buffer size, the error "Not enough memory" is issued, which means that you must enlarge the buffer, or reallocate it if it was malloc-ed.

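char *addToMem(struct _request *H, char *p) {
 size_t l = strlen(p) + 1; // the string plus its terminator
 if((H->MemLen + l) >= BUF_SIZE) ERR("Not enough memory.");
 // the copy starts one byte past H->MemLen; after the update below,
 // H->MemLen indexes the terminator of the string just stored
 char *ret = strcpy(H->buffer + H->MemLen + 1, p);
 H->MemLen += l;
 return ret;
}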

3. Text/Plain

If the POST Content-Type is text/plain, then you just place the content in the body; readXForm does this in its else branch (H->body = beg). It requires no handling at this stage, since it contains the unmodified text, as mentioned above.

* * *

The memory issue

Ordinarily, you provide a buffer for GET requests. It could be 1 KB, 2 KB or larger. If readSocket reads only a part of the pending message, you may realloc the memory and keep reading the message to the end. That is relatively easy to manage. A POST message, however, may carry a huge amount of data, and here the programmer has to adopt a specific strategy.

⚑

In the first place, such a strategy must address the files. In the above examples, the files are bluntly printed on the terminal screen. They could be saved on disk as well. But even the fields of the Multipart and X-Form requests could be of considerable size, which would not allow you to keep them in the initial buffer meant for GET requests. To summarise, the programmer has to choose and implement a specific strategy for handling memory-consuming requests, according to the objectives and expectations of the specific application. This applies especially to POST Multipart, X-Form and Text/Plain submissions.
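One possible strategy, shown here only as a sketch, is a buffer that grows with realloc but refuses to grow past a limit chosen by the application. The struct growBuf and the function growAppend are hypothetical names; the starting capacity of 1024 bytes and the doubling policy are arbitrary choices, not something the original code prescribes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. limit is the largest payload (in bytes,
   terminator not counted) the application is willing to keep in memory. */
struct growBuf { char *data; size_t len, cap, limit; };

/* Append n bytes and keep the content NUL-terminated.
   Returns 0 on success, -1 if the limit would be exceeded or realloc fails.
   On failure the buffer keeps its previous, still valid content. */
static int growAppend(struct growBuf *b, const char *src, size_t n) {
    size_t need;
    if (b->len > b->limit || n > b->limit - b->len) return -1;   /* over the limit */
    need = b->len + n + 1;                                        /* +1 for '\0' */
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 1024;
        char *p;
        while (cap < need) cap *= 2;     /* doubling keeps the number of reallocs low */
        p = realloc(b->data, cap);
        if (!p) return -1;               /* b->data is untouched when realloc fails */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

A read loop would call growAppend after every read and stop as soon as it returns -1; a server could then answer with 413 (the HTTP status for an oversized request body) or switch to writing the rest of the data to disk. The buffer starts zeroed (data = NULL, len = cap = 0) and is released with free(b.data).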

*The full source code can be seen here.

11 February 2014
