Modify

Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#12603 closed enhancement (fixed)

Alternative to GET for crash reports.

Reported by: michael2402 Owned by: stoecker
Priority: normal Milestone: 16.04
Component: Trac Version:
Keywords: trac Cc:

Description

I am working on improving the crash report dialog and enhancing it with more data (Primitive-ID where rendering crashed, ...).
https://github.com/michaelzangl/josm/tree/better-bug-report

Even when using gzip, I quickly reach the 2k character limit for URLs (at least what most browsers support) with a simple stack trace.

Is it possible to add a short script to Trac so that I can send a Post request to an URL, get a token and simply pass that token on to the new ticket site (sort of an internal text pasting).

The API can be kept simple:

POST https://josm.openstreetmap.de/dump
<some long text as body, type: text/plain>

response:

abcdefghi

Then we redirect the user to this URL:
https://josm.openstreetmap.de/josmticket?dump=abcdefghi

Attachments (1)

prepareticket.py (6.1 KB ) - added by michael2402 9 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 by stoecker, 9 years ago

The "/josmticket" URL already supports POST. Thought it does not support to store and retrieve the ticket data like you suggest. Sounds like a working idea, but is probably not so easy on the server side. It needs to store the data somewhere in the database (outside the current session as this does not survive), add cleanup code for old entries and retrieval code (id is preferred over dump :-)

If you can create the python code for trac yourself that would help (I can give you the current code and help a bit), otherwise chances are low that this will be implemented soon.

comment:2 by bastiK, 9 years ago

See #8571 - Enhance JOSM bug report system

The task is very similar to URL shortening (tinyurl and the like).

comment:3 by michael2402, 9 years ago

I will work on it. The code should not be long, something like http://posttestserver.com but using a database instead (and adding some size/count constraints to get avoid users spaming the system)

in reply to:  3 comment:4 by stoecker, 9 years ago

Replying to michael2402:

I will work on it. The code should not be long, something like http://posttestserver.com but using a database instead (and adding some size/count constraints to get avoid users spaming the system)

That sounds like you want to do that outside Trac. That's wasted time as I will never accept that. Trac already provides the complete infrastructure, the database and everything needed and an implementation outside of a Trac function means additional maintenance effort for the admins (it's already complex enough as it is) and thus will be rejected.

You can either expand the existing python trac script or wait until someone else does it.

comment:5 by michael2402, 9 years ago

I am writing a trac plugin. I create a table in the normal trac database.

This is how it works:

It is already working but depends on Trac 1.1 API for creating the table and table scheme updates (if ever).

Creating the table and setting this up in a clean way is more difficult in 1.0, since the API is implemented in the 1.1 branch [1].

Are you planing to upgrade trac or should I use the old API?

[1] https://trac.edgewall.org/ticket/11893

comment:6 by stoecker, 9 years ago

Hi,

Sounds still much too complicated:

  • There is no need for a new database, you can store it in the session: See e.g. here for storing data in the session: https://trac.edgewall.org/browser/plugins/1.0/spam-filter/tracspamfilter/captcha/api.py
  • Don't use multiple entry points, one is enough:
    • pass storeid=1 to store the data in the already established format (See below) with the POST - return a simple ID or XML code with the ID. XML may be better expandable.
    • call the GET with id=... to retrieve the data and redirect to the final target
    • The ID should somehow represent the session, otherwise you don't find the data again
    • Either store only one dataset per session or allow more than one and the ID should consist of session and secondary ID
  • Add cleanup code (see also spamfilter example) - there the session information also contains a timestamp which is later used for cleanup

Implementation seems straightforward for me except that I do not know how I reconstruct an old session from the session ID or how to get the session ID. If session reset is not possible a direct SQL access may help. ID retrieval should hopefully be easy though until now I never needed that.

Current JOSM code:

class JosmTicket(Component):
    """ Handle call to '/josmticket' to create josm specific ticket
    """

    handlers = ExtensionPoint(IRequestHandler)

    implements(IRequestHandler)

    def match_request(self, req):
        return req.path_info == '/josmticket'
    def process_request(self, req):
        return # never called

    def process_base64_data(self, data):
        lens = len(data)
        lenx = lens - (lens % 4 if lens % 4 else 4)
        try:
            return urlsafe_b64decode(data[:lenx])
        except:
            return ""

    def process_tdata(self, data):
        data = "==== What steps will reproduce the problem?\n" \
               + "1. \n" \
               + "2. \n" \
               + "3. \n" \
               + "\n" \
               + "==== What is the expected result?\n\n" \
               + "==== What happens instead?\n\n" \
               + "==== Please provide any additional information below. Attach a screenshot if possible.\n\n" \
               + "{{{\n" + data + "\n}}}\n"
        return "keywords=template_report&description="+quote(data)

    implements(IRequestFilter)
    def pre_process_request(self, req, handler):
        if(req.path_info == '/josmticket'):
            if(req.method == 'POST'):
                req.environ['REQUEST_METHOD'] = 'GET'
                args = req.read()
            else:
                args = req.query_string
            data = re.compile("gdata=([A-Za-z0-9_=-]+)").search(args)
            if data != None:
                data = self.process_base64_data(data.group(1))
                data = self.process_tdata(zlib.decompressobj(15+32).decompress(data))
            else:
                data = re.compile("tdata=([A-Za-z0-9_=-]+)").search(args)
                if data != None:
                    data = self.process_base64_data(data.group(1))
                    data = self.process_tdata(data)
                else:
                    try:
                        data = re.compile("data=([A-Za-z0-9+/=]+)").search(args).group(1)
                        lens = len(data)
                        lenx = lens - (lens % 4 if lens % 4 else 4)
                        data = b64decode(data[:lenx])
                    except:
                        data = ""

            req.environ['QUERY_STRING'] = data
            req.environ['PATH_INFO'] = '/newticket'
            for newhandler in self.handlers:
                if isinstance(newhandler, TicketModule):
                    handler = newhandler
        return (handler)

    def post_process_request(self, req, template, content_type):
        return (template, content_type)

    def post_process_request(self, req, template, data, content_type):
        return (template, data, content_type)

comment:7 by michael2402, 9 years ago

I cannot use the session. The first request is done by JOSM. Then the web page is opened in the normal web browser. They don't share a session.

An alternative would be to use the HMTL module of Swing to display the ticket web page. But Swing is missing many features.

Why aren't you using req.args?

in reply to:  7 ; comment:8 by stoecker, 9 years ago

Replying to michael2402:

I cannot use the session. The first request is done by JOSM. Then the web page is opened in the normal web browser. They don't share a session.

The JOSM call creates an anonymous session. When you know the ID of this on the next call then you can find the data. Whether later that's the same Trac session or not is of no importance. That's why I said the returned ID must include information about the session.

Also data in anonymous sessions is dropped from time to time manually, so that is also good for stuff forgotten in cleanup code for this.

Why aren't you using req.args?

Either because

  • of a reason I don't remember or
  • this was designed before reg.args existed or
  • I didn't know of it when this code was made.

in reply to:  8 comment:9 by michael2402, 9 years ago

Replying to stoecker:

Replying to michael2402:

I cannot use the session. The first request is done by JOSM. Then the web page is opened in the normal web browser. They don't share a session.

The JOSM call creates an anonymous session. When you know the ID of this on the next call then you can find the data. Whether later that's the same Trac session or not is of no importance. That's why I said the returned ID must include information about the session.

I didn't know that. That makes it a lot easier, especially because I can simply not send any session cookie. So I only need the old session id to pull that data and can throw away most of the code. Thanks.

I used a normal post to make it easier on the server side. I also used base64 for now because I am unsure on how well Trac handles Unicode data in requests.

This is what I came up with:

Trac Plugin:

from trac.core import *
from trac.web import IRequestHandler, IRequestFilter
from trac.web.session import DetachedSession
from base64 import b64decode

class AnonymousDetachedSession(DetachedSession):
    """Access to non-authenticated session storage.""" 

    def __init__(self, env, sid):
        super(AnonymousDetachedSession, self).__init__(env, None)
        self.get_session(sid) 

class PreparedTicketComponent(Component):
    implements(IRequestHandler, IRequestFilter)

    def match_request(self, req):
        return req.path_info == '/prepareticket'

    def process_request(self, req):
        # can only be /prepareticket.
        text = req.args.get('text')
        try:
            text = b64decode(text);
        except TypeError:
            req.send("Could not decode base64", 'text/plain', 400)
            return

        if len(text) > 100000:
            req.send("Cannot store that much text", 'text/plain', 400)
            return

        req.session['prepareticket'] = text;
        req.send(req.session.sid, 'text/plain')

    def pre_process_request(self, req, handler):
        if req.path_info == '/newticket':
            # Simulate that the request contained the ticket text.
            sid = req.args.getfirst('prepared_id')
            if sid != None:
                req.args['description'] = self.getText(sid);
        elif req.path_info == '/prepareticket' and req.method == 'POST':
            # Convince trac that this is not a CSRF attack
            req.form_token = 'x'
            req.args['__FORM_TOKEN'] = 'x'
        return handler

    def getText(self, sessionId):
        session = AnonymousDetachedSession(self.env, sessionId)
        return session.get('prepareticket', 'Could not retrive text.')

    def post_process_request(self, req, *args):
        return args

JOSM test code:

                String text = getDebugText();
                String postQuery = "text=" + Base64.encode(text, true);
                HttpClient client = HttpClient.create(new URL("http://localhost:8000/trac/prepareticket"), "POST")
                        .setHeader("Content-Type", "application/x-www-form-urlencoded")
                        .setRequestBody(postQuery.getBytes(StandardCharsets.UTF_8));

                Response connection = client.connect();
                if (connection.getResponseCode() != 200) {
                    throw new BugReportSenderException("Could not connect to josm server.");
                }

                try (Scanner s = new Scanner(connection.getContentReader())) {
                    return s.useDelimiter("\\A").next();
                }

Feel free to add your thoughts on it - especially the parameter names.

I can integrate it into JosmTicket tomorrow. I would then also change JosmTicket to use the args syntax.

comment:10 by stoecker, 9 years ago

Seems good on first review. Some remarks:

  • I'd use simply XML as return like "<josmticket status="ok"><ticketid>...</ticketid><josmticket>", easier for future enhancements or errors or ... --> You can also add a XML in error case, which probably makes display of error to the user easier.
  • retrieve misses an "e"
  • Please use only one entry point "/josmticket", separate the functionality with the parameters
  • Cleanup code is missing (store the time together with the text and when calling the function check if there are old entries (more than 1/2 hour) in any session to be dropped). I think spamfilter example should have database code for that already
  • pdata instead of text? "pdata" for post-data :-)

comment:11 by michael2402, 9 years ago

I attached a file to this ticket with the source code.

I restructured the code to prevent us from having the same code multiple times.

I changed the response to contain an XML string. I use xml.sax for encoding in hope that this is installed on the server. I also added the cleanup code. It cleans up the database when a new code is submitted. We may have some old strings laying around if there are no bug reports with this method for several days, but this should not be a problem.

I also extracted the if(gdata) ... section to a new method and used process_base64_data everywhere

I tested gdata, tdata and normal data parameters but feel free to run your own tests.

I had an odd experience with unicode texts after adding the plugin author names to the debug text. It seems to be working now that I added a .decode('utf-8', errors='ignore') and unicode characters were added correctly.

If you agree to the changes, I would get started on integrating this in normal JOSM so that we (hopefully) do not run into the 2k character limit even if we add more data to stack taces.

by michael2402, 9 years ago

Attachment: prepareticket.py added

comment:12 by stoecker, 9 years ago

On short first view it looks good except two points:

  • The cleanup code isn't good. See the example of SpamFilter: A single SQL statement does the work, that's faster and less error prone. If you leave out the two AND with "captcha_verified" blocks it should be able to copy that 1:1. You can also drop the lastclean stuff - ticket function is called seldom unlike the captcha stuff, so the cleanup can be done each time.
  • Also SAX is not needed I think. trac.util.escape should be enough

Also I'd order the newest data transmission variant front instead last. :-)

I will have a deeper look maybe on the weekend.

comment:13 by michael2402, 9 years ago

from trac.core import *
from trac.web import IRequestHandler, IRequestFilter
from trac.web.session import DetachedSession
from trac.ticket.web_ui import TicketModule
from base64 import b64decode, urlsafe_b64decode
import time, zlib, sys
from trac.util import escape

class AnonymousDetachedSession(DetachedSession):
    """Allows access to non-authenticated session storage.""" 

    def __init__(self, env, sid):
        super(AnonymousDetachedSession, self).__init__(env, None)
        self.get_session(sid) 

class JosmTicket(Component):
    """ This class handles calls to '/josmticket' to create a josm specific ticket.
        This can also be used to create a ticket from inside JOSM.

        It allows you to send in debug data as text (base64 or gzipped) or to store them for later retrival with a retrival key.

        Author of pdata-storage: Michael Zangl with help of stoecker
    """

    handlers = ExtensionPoint(IRequestHandler)

    implements(IRequestHandler)

    def match_request(self, req):
        return req.path_info == '/josmticket'

    def process_request(self, req):
        self.cleanup_pdata()

        # can only be /josmticket to store pdata
        pdata = req.args.get('pdata')
        try:
            pdata = b64decode(pdata);
        except TypeError:
            self.send_josm_error(req, "Could not decode base64")
            return

        if len(pdata) > 100000:
            self.send_josm_error(req, "Cannot store that much text")
            return

        req.session['preparedticket'] = pdata;
        req.session['preparedticket_time'] = int(time.time());
        self.send_josm_ticket(req, {'preparedid': req.session.sid})

    def send_josm_error(self, req, message):
        self.send_josm_ticket(req, {'error': message}, 'error')

    def send_josm_ticket(self, req, data, status = 'ok'):
        items = ["  <%(k)s>%(v)s</%(k)s>\n" % {'k': str(escape(key)), 'v' : str(escape(value))} for key, value in data.items()]
        req.send('<?xml version="1.0" encoding="UTF-8"?>\n'
               + '<josmticket status="ok">\n'
               + ''.join(items)
               + '</josmticket>', 'text/xml', 200 if status == 'ok' else 400)
        

    def process_base64_data(self, data, urlsafe = True):
        lens = len(data)
        lenx = int(lens) / 4 * 4 # round down
        try:
            func = urlsafe_b64decode if urlsafe else b64decode
            return func(unicode(data[:lenx]).encode('ascii'))
        except:
            return "(Error decoding base64)"

    def process_tdata(self, data):
        data = "==== What steps will reproduce the problem?\n" \
               + "1. \n" \
               + "2. \n" \
               + "3. \n" \
               + "\n" \
               + "==== What is the expected result?\n\n" \
               + "==== What happens instead?\n\n" \
               + "==== Please provide any additional information below. Attach a screenshot if possible.\n\n" \
               + "{{{\n" + str(data) + "\n}}}\n"
        return data

    implements(IRequestFilter)
    def pre_process_request(self, req, handler):
        if(req.path_info == '/josmticket'):
            if req.method == 'POST' :
                if req.args.getfirst('pdata') != None:
                    #pdata store request.
                    # Let us handle this, convince trac that this is not a CSRF attack
                    req.form_token = 'x'
                    req.args['__FORM_TOKEN'] = 'x'
                    return handler;
                req.environ['REQUEST_METHOD'] = 'GET'

            description = self.get_description_from_args(req)

            # This also seems to fix the unicode issues.
            req.args['description'] = self.process_tdata(description).decode('utf-8', errors='ignore')
            req.args['keywords'] = 'template_report'
            req.environ['PATH_INFO'] = '/newticket'
            for newhandler in self.handlers:
                if isinstance(newhandler, TicketModule):
                    handler = newhandler
        return (handler)

    def get_description_from_args(self, req):
        sid = req.args.getfirst('pdata_stored')
        if sid != None:
             return self.get_pdata(sid);

        gdata = req.args.getfirst('gdata')
        if gdata != None:
            gdata = self.process_base64_data(gdata)
            try:
                return zlib.decompressobj(15+32).decompress(gdata)
            except:
                return "Error decompressing compressed data."

        tdata = req.args.getfirst('tdata')
        if tdata != None:
            tdata = self.process_base64_data(tdata)
            return tdata

        data = req.args.getfirst('data')
        if data != None:
            return self.process_base64_data(data, False)

        return ""

    def get_pdata(self, sessionId):
        session = AnonymousDetachedSession(self.env, sessionId)
        return session.get('preparedticket', 'Could not retrieve text.')

    def post_process_request(self, req, *args):
        return args

    def cleanup_pdata(self):
        DELETE_AFTER = 6 * 60 * 60 #6h
        etime = int(time.time() - DELETE_AFTER)
        with self.env.db_transaction as db:
            db("""
                    DELETE FROM session_attribute 
                    WHERE (name = 'preparedticket' OR name = 'preparedticket_time')
                    AND sid IN (SELECT * FROM (
                        SELECT sid FROM session_attribute WHERE name = 'preparedticket_time' AND """ + db.cast('value', 'int') + """ < %s
                    ))""",
                    (etime,))

comment:14 by Don-vip, 9 years ago

Milestone: 16.03

comment:15 by Don-vip, 9 years ago

Summary: Alternative to GET for crash reports.[Patch] Alternative to GET for crash reports.

comment:16 by stoecker, 9 years ago

Applied.

@Michael: I'll roast you slowly if it breaks the server...

Ok, old style submission still works on first test. :-)

comment:17 by michael2402, 9 years ago

New style does not, I should have tested the new cleanup code with MySQL and not just SQlite

     [java]   File "/usr/local/lib/python2.7/dist-packages/Trac-1.0.9-py2.7.egg/trac/db/util.py", line 72, in execute
     [java]     return self.cursor.execute(sql_escape_percent(sql), args)
     [java] ProgrammingError: subquery in FROM must have an alias
     [java] LINE 4:                     AND sid IN (SELECT * FROM (
     [java]                                                       ^
     [java] HINT:  For example, FROM (SELECT ...) [AS] foo.

A quick fix (also not tested with MySQL):

        with self.env.db_transaction as db:
            db("""
                    DELETE FROM session_attribute 
                    WHERE (name = 'preparedticket' OR name = 'preparedticket_time')
                    AND (sid IN (SELECT * FROM (
                        SELECT sid FROM session_attribute WHERE name = 'preparedticket_time' AND """ + db.cast('value', 'int') + """ < %s
                    ) AS todelete))""",
                    (etime,))

comment:18 by stoecker, 9 years ago

Summary: [Patch] Alternative to GET for crash reports.Alternative to GET for crash reports.

Quick fix applied. Seems to work at least in psql direct call. NOTE: Testing mysql also will not help, we use postgres :-)

comment:19 by michael2402, 9 years ago

Thanks. It seems to be working fine now. I added the client code in #12652.

comment:20 by stoecker, 9 years ago

Resolution: fixed
Status: newclosed

comment:21 by Don-vip, 9 years ago

Milestone: 16.0316.04

Milestone renamed

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain stoecker.
as The resolution will be set.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences .
 
Note: See TracTickets for help on using tickets.