How to Upload Files to Tomcat Server
Like first, then read. Make it a good habit.
Preface
I saw a question about Tomcat in the Q&A area over the past two days, which is very interesting. I just hadn't thought about this before. Today I'll talk about the "why" in combination with Tomcat's mechanism.
In this article, the analysis of the file upload standard in the HTTP protocol and of the Tomcat mechanism is fairly basic. If you don't need it, you can jump directly to the end of the text.
File upload in HTTP protocol
As we all know, HTTP is a text protocol. How does a text protocol transfer files?
Direct transmission… Yes, it's that simple. "Text protocol" only applies from the perspective of the application layer. At the transport layer, all data are bytes. There is no difference, and there is no need for additional encoding and decoding.
multipart/form-data mode
The HTTP protocol specifies a form-based file upload. Define an enctype attribute on the form with the value multipart/form-data, and then add an <input> tag with the type file.
<FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
  File to process: <INPUT NAME="userfile1" TYPE="file">
  <INPUT TYPE="submit" VALUE="Send File">
</FORM>
This multipart/form-data form is somewhat different from the default application/x-www-form-urlencoded form. Although both are used as forms and can upload multiple fields, the former can upload files, while the latter can only transmit text.
Now let's take a look at the protocol of the form file upload method. The following figure is a simple multipart/form-data request message:
As can be seen from the figure above, the HTTP header part is unchanged except that a boundary parameter is added to the Content-Type, but the payload part is completely different.
The boundary is used in multipart/form-data to separate the fields of the form. In the payload section, there is a boundary on the first and last lines, and there is a boundary between each field (part / item).
When the server side reads, it only needs to get the boundary from the Content-Type first, and then split the payload part by the boundary to get all the fields.
In the message of each field, there is a Content-Disposition field that serves as the header part of this field. The current field name (name) is recorded there. If it is a file, there will also be a filename attribute, and a Content-Type will be attached on the next line to identify the file type.
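The boundary handling described above can be sketched in a few lines of Java. This is a minimal illustration, not how Tomcat parses requests: the class name MultipartSketch, the boundary value, and the "title" field are made up for the example, and the payload is handled as a String for readability even though real payloads are bytes. It also assumes the boundary never appears inside the content and is not quoted in the header.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: builds and splits a tiny multipart/form-data payload. */
final class MultipartSketch {
    static final String CRLF = "\r\n";

    /** Builds a payload with one text field and one (textual) file field. */
    static String buildBody(String boundary) {
        return "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"title\"" + CRLF
            + CRLF
            + "hello" + CRLF
            + "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"userfile1\"; filename=\"a.txt\"" + CRLF
            + "Content-Type: text/plain" + CRLF
            + CRLF
            + "file content" + CRLF
            + "--" + boundary + "--" + CRLF; // the closing boundary ends with "--"
    }

    /** Extracts the boundary parameter from a Content-Type header value. */
    static String boundaryOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.startsWith("boundary=")) {
                return p.substring("boundary=".length());
            }
        }
        throw new IllegalArgumentException("no boundary in: " + contentType);
    }

    /** Splits the payload into raw parts (part headers, blank line, content). */
    static List<String> splitParts(String body, String boundary) {
        List<String> parts = new ArrayList<>();
        String delimiter = "--" + boundary;
        for (String chunk : body.split(Pattern.quote(delimiter))) {
            // Skip the empty preamble and the trailing "--" of the closing boundary.
            if (chunk.isEmpty() || chunk.startsWith("--")) {
                continue;
            }
            // Drop the CRLF after the delimiter line and the CRLF before the next one.
            parts.add(chunk.substring(2, chunk.length() - 2));
        }
        return parts;
    }

    public static void main(String[] args) {
        String contentType = "multipart/form-data; boundary=----sketchBoundary123";
        String boundary = boundaryOf(contentType);
        for (String part : splitParts(buildBody(boundary), boundary)) {
            System.out.println("[part]" + CRLF + part);
        }
    }
}
```

Each printed part starts with its own Content-Disposition header, which is where a server would look for the name and filename attributes.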
Although both x-www-form-urlencoded and multipart forms can transfer fields, multipart can transfer not only text fields but also files. Moreover, the multipart file transfer method is also "standard", so various servers support it and can read the files directly.
x-www-form-urlencoded can only transmit basic text data. However, if you force the file to be treated as text, no one can stop you from transmitting it with this type, but when it is transmitted as text, the back end must parse it as a string. The encoding overhead of byte -> String is completely unnecessary, and may lead to encoding errors.
In an x-www-form-urlencoded message there is no boundary; multiple fields are joined with the & symbol, and each key / value is URL-encoded.
Although x-www-form-urlencoded adds an encoding step, it does not add a header to each field, nor does it have a boundary. The message is much smaller than that of multipart.
In addition to multipart, there is also a way of uploading files directly, but it is not commonly used.
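As a rough illustration of the & joining and URL encoding described above, here is a small Java sketch. The class name UrlEncodedSketch and the sample fields are invented for the example; it uses the JDK's URLEncoder (the Charset overload needs Java 10 or later).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative only: encodes form fields as an x-www-form-urlencoded body. */
final class UrlEncodedSketch {

    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&"); // fields are joined with '&'
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Both key and value are URL-encoded, so '&' and '=' inside them are escaped.
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                + "="
                + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "hello world");
        fields.put("tag", "a&b");
        System.out.println(encode(fields)); // prints title=hello+world&tag=a%26b
    }
}
```

There are no per-field headers and no boundary lines here, which is why the resulting body is so much shorter than the multipart equivalent.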
Binary payload mode
In addition to multipart/form-data, there is also a binary payload upload method. "Binary payload" is my own name for it… because I have not found a description of this method in the HTTP protocol (if there is one, please post a link in the comment area), but many HTTP clients support it.
For example, Postman:
For example, OkHttp:
OkHttpClient client = new OkHttpClient().newBuilder()
    .build();
MediaType mediaType = MediaType.parse("image/png");
// The whole request body is the file content
RequestBody body = RequestBody.create(mediaType, "<file contents here>");
Request request = new Request.Builder()
    .url("http://localhost:8098/upload")
    .method("POST", body)
    .addHeader("Content-Type", "image/png")
    .build();
Response response = client.newCall(request).execute();
This method is very simple: the whole payload part is used to store the file data. As shown in the following figure, the entire payload part is the file content:
Although this method is simple and the client implementation is simple, the server does not support it well. For example, Tomcat does not treat this binary payload as a file, but as an ordinary message.
Analysis of Tomcat processing mechanism
When Tomcat processes a message in text form, it first reads the leading header part and parses the Content-Length to determine the message boundary. The remaining payload part is not read all at once; instead, Tomcat wraps an InputStream that internally calls socket read to read the RCV_BUF data (when the full message size is greater than the read buffer size).
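To make the "ordinary message" point concrete, the sketch below sends a binary payload and reads it back on the server side as a plain request body. It is only an illustration under stated assumptions: it uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in server (not Tomcat) and java.net.http.HttpClient (not OkHttp) so that it runs without extra dependencies, and the class name BinaryPayloadSketch is made up.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Illustrative only: the request body is the file content, nothing else. */
final class BinaryPayloadSketch {

    /** Uploads the bytes to a throwaway local server and returns how many bytes it received. */
    static int uploadAndCount(byte[] fileBytes) throws Exception {
        // Port 0 lets the OS pick a free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/upload", exchange -> {
            // The server sees no fields and no file name, just a stream of body bytes.
            byte[] received = exchange.getRequestBody().readAllBytes();
            byte[] reply = Integer.toString(received.length).getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/upload");
            HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "image/png")
                .POST(HttpRequest.BodyPublishers.ofByteArray(fileBytes))
                .build();
            HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            return Integer.parseInt(response.body());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server received " + uploadAndCount(new byte[] {1, 2, 3, 4, 5}) + " bytes");
    }
}
```

In a servlet container the equivalent server-side step would be reading the request's input stream yourself, since nothing identifies the body as a file.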
When getParameter / getInputStream or other read operations involving the payload are called on HttpServletRequest, the InputStream performs a socket read of RCV_BUF and reads the payload data.
Instead of reading all the data at once and temporarily storing it in memory, Tomcat wraps an InputStream that reads RCV_BUF internally. The characteristic of this approach is that it does not store data; it is only a wrapper. Read operations by the application layer on ServletRequest#getInputStream are forwarded to a socket RCV_BUF read.
However, if the application layer reads ServletRequest#getInputStream completely, then converts the result to a string and stores it in memory, that has nothing to do with Tomcat.
Tomcat has a special processing mechanism for multipart requests. Since multipart is designed to transfer files, Tomcat adds the concept of a temporary file when processing this type of request: when parsing the message, the data in the multipart is written to disk.
As shown in the figure below, Tomcat wraps each field as a DiskFileItem, org.apache.tomcat.util.http.fileupload.disk.DiskFileItem (this DiskFileItem does not distinguish between file and text data). A DiskFileItem is divided into a header part and a content part. The content is stored either in memory or on disk, depending on a sizeThreshold. However, this value defaults to 0; in other words, all content is stored on disk by default.
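The effect of a size threshold can be sketched with a small output stream that buffers in memory until the threshold is exceeded and then moves everything to a temporary file. This is a hypothetical illustration, not Tomcat's DiskFileItem code: the class ThresholdBufferSketch, its methods, and the temp file prefix are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a size threshold; not Tomcat's implementation. */
final class ThresholdBufferSketch extends OutputStream {
    private final int sizeThreshold;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path tempFile;     // created lazily, only once the threshold is exceeded
    private OutputStream disk; // non-null after the switch to disk
    private long written;

    ThresholdBufferSketch(int sizeThreshold) {
        this.sizeThreshold = sizeThreshold;
    }

    @Override
    public void write(int b) throws IOException {
        // With a threshold of 0, the very first byte already exceeds it.
        if (disk == null && written + 1 > sizeThreshold) {
            switchToDisk();
        }
        if (disk != null) {
            disk.write(b);
        } else {
            memory.write(b);
        }
        written++;
    }

    private void switchToDisk() throws IOException {
        tempFile = Files.createTempFile("upload_sketch_", ".tmp");
        disk = Files.newOutputStream(tempFile);
        memory.writeTo(disk); // move what was buffered so far
        memory.reset();
    }

    boolean isInMemory() {
        return disk == null;
    }

    Path tempFile() {
        return tempFile;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ThresholdBufferSketch item = new ThresholdBufferSketch(0)) {
            item.write("name=value".getBytes());
            System.out.println("in memory: " + item.isInMemory());
            Files.deleteIfExists(item.tempFile());
        }
    }
}
```

With a threshold of 0, as in the default described above, even a ten-byte text field ends up in a temporary file.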
Since it is stored on disk, it must also be read back from disk… The efficiency is naturally relatively low. Therefore, if only text messages are transmitted, do not use the multipart type, because this type is written through to disk.
Another little-known fact is that when Tomcat processes multipart messages, if a field is not a file, it adds the key / value of this field to the parameter map; that is, these non-file fields can be obtained through request.getParameter / getParameterMap.
// org.apache.catalina.connector.Request#parseParts
if (part.getSubmittedFileName() == null) {
    String name = part.getName();
    String value = null;
    try {
        value = part.getString(charset.name());
    } catch (UnsupportedEncodingException uee) {
        // Not possible
    }
    ......
    parameters.addParameter(name, value);
}
You should know that getParameter can only obtain form parameters and query parameters (the query string), but multipart is also a form, so there seems to be nothing wrong with obtaining parameters this way.
A simple summary
Tomcat handles different types of requests:
- If the parameters are passed in GET query string mode (parameters appended to the URL), all parameters are in the message header and are read into memory at once.
- If it is a POST message, Tomcat only reads the header part and does not actively read the payload part; instead it wraps the socket in an InputStream for the application layer to read.
- Although x-www-form-urlencoded messages are not read actively, many web frameworks (such as Spring MVC) call getParameter or start reading the InputStream, which triggers a read from RCV_BUF.
- The same is true for the binary payload mentioned above. Tomcat does not actively initiate the read operation. The application layer needs to call ServletRequest#getInputStream to read the RCV_BUF data.
- Multipart messages are not read actively either; parsing / reading is triggered only by calling HttpServletRequest#getParts. Similarly, many web frameworks call getParts, so parsing is triggered.
Why write a temporary file first instead of wrapping the InputStream directly for the application layer to read?
If the application layer does not read RCV_BUF (in time), then once the received data fills RCV_BUF, the receiver can no longer accept new data, so the client's data piles up in SND_BUF and cannot be sent. Once SND_BUF is also filled by the client's application layer, the connection is blocked.
The following reasons are personal opinions without the support of official documents. If you have different opinions, please leave a message in the comment area for discussion.
Multipart is generally used to transfer files, and the file size is usually much larger than the capacity of the socket buffer. Therefore, in order not to block the TCP connection, Tomcat reads the complete payload part at one time, and then stores all the parts in it on disk (the headers are in memory and the content is on disk).
The application layer only needs to read the part data from the DiskFileItem provided by Tomcat. In this way, although the data is relayed through an extra layer, the data in RCV_BUF can be consumed in time.
In terms of efficiency, relaying plus saving to disk must be much slower than not relaying, but RCV_BUF can be consumed in time to ensure that the TCP connection is not blocked.
If multiple requests use the same TCP connection under HTTP/2 multiplexing and RCV_BUF is not consumed in time, all "logical HTTP connections" will be blocked.
Then why don't other types of messages need to be temporarily stored on disk?
Because the message is small. Ordinary request messages are not too big; the common ones are only a few KB to dozens of KB. Moreover, for plain text messages, the read operation is timely and reads everything at once. A multipart message is different: it is a combination of text and files, and it may also contain multiple files.
For example, after receiving the file, the server also needs to transfer the file to the object storage service of some cloud vendor. At this point, there are two transfer methods:
- Receive the full file data, store it in memory, and then call the object storage SDK.
- In streaming mode, read ServletRequest#getInputStream and write to the OutputStream of the SDK.
In mode one, although RCV_BUF is read in time, the memory footprint is too large and it is easy to run out of memory, which is very unreasonable.
In mode two, although the memory consumption is very small (only one read buffer at most), both the read side and the write side are network I/O, so RCV_BUF may not be consumed in time.
Moreover, not only Tomcat but also Jetty handles multipart in this way. Although I haven't looked at other web servers, I think they handle it in this way too.
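The two modes can be contrasted with plain java.io streams. This is a minimal sketch under assumptions: the class TransferSketch and its method names are invented, and the InputStream and OutputStream parameters stand in for the request input stream and a cloud SDK's upload stream, neither of which is modelled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Illustrative only: two ways to relay an upload to another destination. */
final class TransferSketch {

    /** Mode one: read everything into memory first, then hand it over. */
    static long bufferThenSend(InputStream upload, OutputStream target) throws IOException {
        byte[] all = upload.readAllBytes(); // memory use grows with the file size
        target.write(all);
        return all.length;
    }

    /** Mode two: relay through one fixed-size buffer. */
    static long stream(InputStream upload, OutputStream target, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = upload.read(buffer)) != -1) {
            // If this write is slow (for example a network call), the next read is delayed,
            // which is how the receive buffer can end up not being drained in time.
            target.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[10_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = stream(new ByteArrayInputStream(file), sink, 1024);
        System.out.println("streamed " + copied + " bytes with a 1024-byte buffer");
    }
}
```

Both methods move the same bytes; the difference is only where the waiting happens, in heap memory for mode one or in the socket buffers for mode two.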
References
- Apache Tomcat
- Form-based File Upload in HTML – IETF
- Tomcat Architecture Analysis by Liu Guangrui
Original writing is not easy; unauthorized reprinting is prohibited. If my article is helpful to you, please like / bookmark / follow to encourage and support it ❤❤❤❤❤❤
Source: https://developpaper.com/how-does-tomcat-handle-file-upload/