regex: Active questions tagged regex


Saturday, April 25, 2015

strip and get the codes in between

Please rescue me from this regex nightmare.

Claim Code:
7241B-2HWRXR9-2P2BA
    $1.00

I'm trying to assign it to a variable in PHP, but none of the regexes or preg_replace calls I've tried pull out exactly what is in the middle, which is: 7241B-2HWRXR9-2P2BA

any kind of help I can get on this is greatly appreciated!
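One way to think about it, sketched in Python rather than PHP: anchor on the "Claim Code:" label and capture the next run of non-whitespace characters. The function name is hypothetical, and the sketch assumes the code never contains spaces. The same pattern syntax is PCRE-compatible, so it should carry over to PHP's preg_match.

```python
import re

# Hypothetical input: the text block from the question as a single string.
text = """Claim Code:
7241B-2HWRXR9-2P2BA
    $1.00"""

def extract_claim_code(s: str):
    """Return the first non-blank token after 'Claim Code:', or None."""
    # \s* also skips the newline after the colon; \S+ stops at the next whitespace.
    m = re.search(r"Claim Code:\s*(\S+)", s)
    return m.group(1) if m else None

print(extract_claim_code(text))  # 7241B-2HWRXR9-2P2BA
```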

Would Rewriting It Using Regex Shorten/Beautify The Code?

The problem is a little challenging because I want to code it using std::regex, believing it would be easier to read and faster to write.

But it seems that I can only code it one way (shown below).

Somehow I could not see a solution using std::regex.

How would you code it?

Would using std::regex_search do the job?

/*
input: data coming in:
/product/country/123456/city/7890/g.json

input: url parameter format:
/product/country/<id1:[0-9]+>/city/<id2:[0-9]+>/g.json

output:
std::vector<std::string> urlParams

sample output:
urlParams[0] = "123456"
urlParams[1] = "7890"
*/

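#include <cstring>
#include <string>
#include <vector>

// Walks urlRoute and path in parallel. Each "<name:regex>" placeholder in the
// route consumes one '/'-delimited segment of the path, which is stored in
// urlParams. Returns true only if the whole route and the whole path matched
// and at least one parameter was captured.
bool ParseIt(const char *path, const char *urlRoute, std::vector<std::string> *urlParams)
{
    urlParams->clear();               // urlParams is a pointer, so use ->

    const size_t routeLen = std::strlen(urlRoute);
    size_t i = 0;                     // index into urlRoute
    size_t j = 0;                     // index into path
    bool good = false;

    while (i < routeLen)
    {
        if (urlRoute[i] == '<')
        {
            good = true;
            // Skip the placeholder in the route, stopping at '/' or the end.
            while (urlRoute[i] != '/' && urlRoute[i] != '\0')
                i++;
            // Copy the matching path segment, stopping at '/' or the end
            // (std::string instead of a fixed buffer plus a leaked _strdup).
            std::string param;
            while (path[j] != '/' && path[j] != '\0')
                param += path[j++];
            urlParams->push_back(param);
            continue;                 // compare the following '/' on the next pass
        }
        if (path[j] != urlRoute[i])   // literal characters must match exactly
            return false;
        i++;
        j++;
    }

    // The route is consumed; the path must be consumed too.
    return good && path[j] == '\0';
}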
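For comparison, here is the regex-based approach sketched in Python's re (a std::regex version would follow the same steps, then call std::regex_match with a std::smatch to read the sub-matches). The idea: escape the literal parts of the route, turn each `<name:pattern>` placeholder into a capturing group, and require the whole path to match. The function names are hypothetical, and the sketch assumes placeholder patterns contain neither `>` nor capturing groups of their own.

```python
import re
from typing import List, Optional

# A placeholder such as <id1:[0-9]+>: group 1 = name, group 2 = its pattern.
PLACEHOLDER = re.compile(r"<(\w+):([^>]+)>")

def route_to_regex(route: str) -> "re.Pattern[str]":
    """Turn '/a/<id:[0-9]+>/b' into a compiled regex with one group per placeholder."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(route):
        parts.append(re.escape(route[pos:m.start()]))  # literal text before it
        parts.append("(" + m.group(2) + ")")           # the placeholder's pattern
        pos = m.end()
    parts.append(re.escape(route[pos:]))               # trailing literal text
    return re.compile("".join(parts))

def parse_it(path: str, route: str) -> Optional[List[str]]:
    """Return the captured parameters, or None if the path does not match."""
    m = route_to_regex(route).fullmatch(path)
    return list(m.groups()) if m else None

print(parse_it("/product/country/123456/city/7890/g.json",
               "/product/country/<id1:[0-9]+>/city/<id2:[0-9]+>/g.json"))
# ['123456', '7890']
```

In practice the compiled route would be built once and reused for every incoming path.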

python RE findall() return value is an entire string

I am writing a crawler to get certain parts of a html file. But I cannot figure out how to use re.findall().

Here is an example: when I want to find all ... parts in the file, I might write something like this:

re.findall(r"<div>.*</div>", result_page)

if result_page is a string "<div> </div> <div> </div>", the result will be

['<div> </div> <div> </div>']

Only the entire string. This is not what I want, I am expecting the two divs separately. What should I do?
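A sketch of the usual fix: `.*` is greedy and runs to the last `</div>`, while the lazy `.*?` stops at the first `</div>` that lets the rest of the pattern match. The helper name is hypothetical. It assumes divs are not nested (regexes cannot track nesting; an HTML parser is safer there) and that div contents fit on one line, otherwise pass `re.DOTALL`.

```python
import re

def find_divs(page: str):
    # Lazy ".*?" ends each match at the nearest "</div>".
    return re.findall(r"<div>.*?</div>", page)

def div_contents(page: str):
    # With one capturing group, findall returns just the group's text.
    return re.findall(r"<div>(.*?)</div>", page)

result_page = "<div> </div> <div> </div>"
print(find_divs(result_page))     # ['<div> </div>', '<div> </div>']
print(div_contents(result_page))  # [' ', ' ']
```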

Matching Barcodes to sequences python?

I have sequence files and barcode files. The barcode files may contain barcodes of any length, for example "ATTG, AGCT, ACGT". The sequence files look like "ATTGCCCCCCCGGGGG, ATTGTTTTTTTT, AGCTAAAAA", for example. I need to match the barcodes to the sequences that start with them. Then, for each set of sequences with the same barcode, I have to do calculations on them with the rest of the program (which is already written). I just don't know how to get them to match. I've gone through it with print statements, and the part where it goes wrong is the "potential_barcode = line[:len(barcode)]" line. Also, the place marked #simp to fasta is where I should be reading in the matched sequences. I'm pretty new at this, so I probably made a lot of mistakes. Thanks for your help!

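import os
import sys

bcodefname = sys.argv[1]
infname = sys.argv[2]
with open(bcodefname, "r") as barcodefile:
    for barcode in barcodefile:
        barcode = barcode.strip()
        if not barcode:
            continue  # skip blank lines in the barcode file
        print("barcode: %s" % barcode)
        outfname = "%s.%s" % (bcodefname, barcode)
        # Open the file named by the variable, not a file literally called "outfname".
        with open(infname, "r") as handle, open(outfname, "w") as outf:
            for line in handle:
                potential_barcode = line[:len(barcode)]
                if potential_barcode == barcode:
                    outseq = line[len(barcode):]
                    sys.stdout.write(outseq)
                    outf.write(outseq)
        # One set of output files per barcode (basing them on infname made
        # every barcode overwrite the same .fasta/.mafft files).
        fastafname = outfname + ".fasta"
        mafftfname = fastafname + ".mafft"
        stfname = mafftfname + ".stock"
        print(stfname)
#simp to fasta#
        # Read back only the matched sequences. The old inner "for line in handle"
        # consumed the rest of the input file after the first match.
        # ASSUMPTION: each matched line holds only a sequence, so the FASTA ids
        # written below ("<barcode>_<n>") are hypothetical placeholders.
        with open(outfname, "r") as matched, open(fastafname, "w") as outf2:
            for n, seq in enumerate(matched, start=1):
                seq = seq.strip()
                if seq:
                    outf2.write(">%s_%d\n%s\n" % (barcode, n, seq))
#mafft#
        cmd = "mafft %s > %s" % (fastafname, mafftfname)
        sys.stderr.write("command: %s\n" % cmd)
        os.system(cmd)
        sys.stderr.write("command done\n")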
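The matching step on its own can be sketched in memory: build one list per barcode and append each sequence (barcode removed) to the barcode it starts with. The function name is hypothetical. Because the barcodes may have any length, the sketch assumes that when one barcode is a prefix of another, the longer (more specific) barcode should win.

```python
from typing import Dict, List

def group_by_barcode(barcodes: List[str], sequences: List[str]) -> Dict[str, List[str]]:
    """Map each barcode to the sequences that start with it, barcode stripped."""
    # Try longer barcodes first so a sequence goes to the most specific match.
    ordered = sorted(barcodes, key=len, reverse=True)
    groups: Dict[str, List[str]] = {b: [] for b in barcodes}
    for seq in sequences:
        for b in ordered:
            if seq.startswith(b):
                groups[b].append(seq[len(b):])
                break  # each sequence belongs to at most one barcode
    return groups

groups = group_by_barcode(["ATTG", "AGCT", "ACGT"],
                          ["ATTGCCCCCCCGGGGG", "ATTGTTTTTTTT", "AGCTAAAAA"])
for barcode, seqs in groups.items():
    print(barcode, seqs)
```

Each list in the result is one "set of sequences with the same barcode", ready to be written out for the downstream steps.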

How to extract links from a web content?

I have downloaded a web page and want to extract all the links in that file. These links include both absolute and relative ones. For example, we have:

<script type="text/javascript" src="/assets/jquery-1.8.0.min.js"></script>

<a href="http://ift.tt/gbk8l4" />

So after reading the file, what should I do?
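A sketch using Python's standard-library HTML parser instead of a regex: collect every `href` and `src` attribute, then resolve relative values against the page's own URL with `urljoin` (absolute URLs pass through unchanged). The class and function names are hypothetical, and the base URL in the usage line is an assumed placeholder for wherever the page was downloaded from.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        # Self-closing tags such as <a ... /> are routed here as well.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))

def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = ('<script type="text/javascript" src="/assets/jquery-1.8.0.min.js"></script>\n'
        '<a href="http://ift.tt/gbk8l4" />')
print(extract_links(page, "http://example.com/page.html"))
```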

How to handle x*, x+, or x? regex-like operators in an LR parser?

I have implemented recursive descent and PEG-like parsers in the past, where you could do things like this:

Path -> Segment+
Segment -> Slash Name
Segment -> /
Name -> /\w+/
Slash -> /

where Segment+ means "match one or more Segment"
and there's a plain old regular expression for matching one or more word characters with \w+

How do you typically accomplish this same sort of thing with LR grammars/parsers? All of the examples of LR parsers I have seen are very basic, such as parsing 1 + 2 * 3, or (())(), where the patterns are very simple and don't seem to involve "one or more" functionality (or zero or more with *, or optional with ?). How do you do that in an LR parser generally?

Or does LR parsing require a lexing phase first (i.e. does an LR parser require terminal and nonterminal "tokens")? I'm hoping there is a way to do LR parsing without two phases like that.
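A common approach, sketched here in Python: LR generators generally accept only plain BNF, so `X+`, `X*` and `X?` are rewritten into helper nonterminals before the tables are built. Left recursion (`X_plus -> X_plus X | X`) is the usual choice for LR because the parser reduces after each element and the stack stays bounded. The helper-naming scheme (`_plus`, `_star`, `_opt`) is a hypothetical convention. Regex terminals such as `/\w+/` are typically left to a lexer, although an LR parser can in principle treat individual characters as its terminals.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]  # (left-hand side, right-hand-side symbols)

SUFFIX_NAMES = {"+": "_plus", "*": "_star", "?": "_opt"}

def desugar(lhs: str, rhs: List[str]) -> List[Rule]:
    """Rewrite X+, X*, X? in one EBNF rule into plain BNF rules with helpers."""
    main_rhs: List[str] = []
    helpers: List[Rule] = []
    for sym in rhs:
        if len(sym) > 1 and sym[-1] in SUFFIX_NAMES:
            base, op = sym[:-1], sym[-1]
            helper = base + SUFFIX_NAMES[op]
            if op == "+":    # helper -> helper base | base      (one or more)
                new = [(helper, [helper, base]), (helper, [base])]
            elif op == "*":  # helper -> helper base | <empty>   (zero or more)
                new = [(helper, [helper, base]), (helper, [])]
            else:            # helper -> base | <empty>          (optional)
                new = [(helper, [base]), (helper, [])]
            for rule in new:
                if rule not in helpers:  # the same X+ may appear twice
                    helpers.append(rule)
            main_rhs.append(helper)
        else:
            main_rhs.append(sym)
    return [(lhs, main_rhs)] + helpers

for lhs, rhs in desugar("Path", ["Segment+"]):
    print(lhs, "->", " ".join(rhs) or "<empty>")
# Path -> Segment_plus
# Segment_plus -> Segment_plus Segment
# Segment_plus -> Segment
```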

How to match everything except a particular pattern after/before a specific string constant

ATS(inline, const, unused)
OTS(inline, const, unused)

I'm trying to match the inline, const, and unused keywords only inside the ATS macro. I tried ATS([^,]*), but it only matches the inline keyword.
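A sketch in Python of a two-step approach. In `ATS([^,]*)` the parentheses form a capturing group rather than matching literal brackets, so the match is "ATS" followed by everything up to the first comma, i.e. "ATS(inline". Escaping the brackets to capture the whole argument list, then splitting it on commas, yields each keyword. The function name is hypothetical, and the sketch assumes the macro arguments contain no nested parentheses.

```python
import re
from typing import List

def ats_keywords(source: str) -> List[str]:
    """Return the comma-separated arguments of every ATS(...) macro."""
    keywords: List[str] = []
    # \( and \) are literal parentheses; \b keeps e.g. "XATS(" from matching.
    for m in re.finditer(r"\bATS\(([^)]*)\)", source):
        keywords += [k.strip() for k in m.group(1).split(",") if k.strip()]
    return keywords

print(ats_keywords("ATS(inline, const, unused)\nOTS(inline, const, unused)"))
# ['inline', 'const', 'unused']
```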

Match last occurring enclosing outer brackets

I have now spent three hours trying to construct the following regex match, without much success. I have the following two strings:

This is a test string to illustrate the problem (example) in complex matching logic (Work / not working (in this case) to match this last occurring bracket closure)

and

Simpler version of the string (Matchable in any easy way)

I would like to define a str.match() that matches the last part of each string above, resulting in:

Work / not working (in this case) to match this last occurring bracket closure

and

Matchable in any easy way

Is there a good way to achieve this? Unfortunately, the data is so volatile that a robust regex is much preferred over long functional logic. Thanks so much!
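A sketch of one regex that handles both examples, shown in Python: match an opening bracket whose contents are plain characters or inner `( ... )` groups, followed by the closing bracket at the end of the string. It supports one level of nesting only, which is an assumption about the data; deeper nesting needs a recursive regex or a small bracket counter. JavaScript accepts the same pattern syntax, so it should carry over to `str.match()`. The names are hypothetical.

```python
import re
from typing import Optional

# Outer "( ... )" at the end of the string; inside it, either a non-bracket
# character or an inner "( ... )" that itself contains no brackets.
LAST_GROUP = re.compile(r"\(((?:[^()]|\([^()]*\))*)\)\s*$")

def last_bracket_group(s: str) -> Optional[str]:
    m = LAST_GROUP.search(s)
    return m.group(1) if m else None

print(last_bracket_group("Simpler version of the string (Matchable in any easy way)"))
# Matchable in any easy way
```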

Simplest way to parse a title from an HTML file using PHP functions only, no extra classes

So far I've been trying to find a simple way to extract a title from an HTML page.

Something as simple as this:

$url = "http://localhost";

I'd like to extract the title tag using only PHP functions or regular expressions. I do not want to use external classes such as simple_html_dom or Zend_Dom; I want to do it the simple way, with plain PHP. Can anyone post sample code that simply extracts the title tag from localhost?

I've tried the DOMdocument class and simple_xml_parse(), with no success.

I tried this:

<?php
$dom = new DOMDocument();
// loadHTMLFile() reads from a file name or URL; loadHTML() expects the markup itself.
$dom->loadHTMLFile('pag.html');
$items = $dom->getElementsByTagName('title');
foreach ($items as $title) {
    // Print the element's text, not the literal string "title".
    echo $title->nodeValue;
}
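If a regular expression is acceptable, the title can be captured with a lazy match between the opening and closing tags. The sketch below is in Python, and the function name is hypothetical; the same pattern with the `s` and `i` modifiers should work in PHP's preg_match. It assumes the page has a single, well-formed `<title>` element. A DOM parser remains the safer choice for arbitrary HTML.

```python
import re
from typing import Optional

def extract_title(html: str) -> Optional[str]:
    """Return the text of the first <title> element, or None."""
    # re.S lets . cross newlines; re.I also accepts <TITLE>;
    # \b[^>]* allows attributes such as <title lang="en">.
    m = re.search(r"<title\b[^>]*>(.*?)</title>", html, re.S | re.I)
    return m.group(1).strip() if m else None

print(extract_title("<html><head><title>My page</title></head></html>"))  # My page
```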

How to get value of numbers with space

I used to have strings like this:

233.43 USD
634,233 EURO

and I used to extract numbers from those strings using this:

import re

def extractNumbersFromString(value):  # This function gets the numbers from a string
    return re.search(r'(\d+(?:[.,]\d*)*)', value).group(1)

Now I got strings like these as well:

2300 000 USD
430 000 EU

where there is a space between the numbers and the zeros on the right.

How can I adjust my code to extract the numbers from those strings?

Required output:

 2300000 
 430000

My code currently gives me just this 2300 and 430 (i.e. without the zeros on the right).
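A sketch of one adjustment, in Python: allow a single space as another separator between digit groups, require digits after every separator (so the space before the currency name is never swallowed), then delete the spaces. The function name is hypothetical. It assumes each string holds one amount, since two separate numbers divided by a single space ("10 20 USD") would be joined.

```python
import re
from typing import Optional

def extract_number(value: str) -> Optional[str]:
    """Return the first number in value, with space separators removed."""
    # Digits, then any number of ". digits", ", digits" or " digits" groups.
    m = re.search(r"\d+(?:[ .,]\d+)*", value)
    return m.group(0).replace(" ", "") if m else None

for s in ["233.43 USD", "634,233 EURO", "2300 000 USD", "430 000 EU"]:
    print(extract_number(s))
# 233.43
# 634,233
# 2300000
# 430000
```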

How to get the number out of an HTML string without tags?

I have the following string inside the source of some website:

user_count: <b>5.122.512</b>

Is it possible to get the number out of this string even if the tags around it were different? I mean, the "user_count:" part won't change, but the tags can change, to strong for example. Or the tags could be doubled, or whatever.

How can I do that?
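A sketch in Python: anchor on the fixed "user_count:" label, skip any number of tags of any kind, then capture the digits and dots. The names are hypothetical. The sketch assumes the dots are thousands separators (as in 5.122.512), so it removes them before converting to an integer.

```python
import re
from typing import Optional

# "user_count:", then zero or more tags like <b>, <strong>, <span class="x">,
# with optional whitespace, then the number itself.
USER_COUNT = re.compile(r"user_count:\s*(?:<[^>]*>\s*)*(\d[\d.]*)")

def user_count(html: str) -> Optional[int]:
    m = USER_COUNT.search(html)
    # Drop the thousands-separator dots before converting.
    return int(m.group(1).replace(".", "")) if m else None

print(user_count("user_count: <b>5.122.512</b>"))  # 5122512
```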

C# Regular Expression - Extracting the number of month or year from codes

I would like only one regular expression to extract the number of days, weeks, months and/or years from the following codes:

AB7YT1M=ABC       ==> 7Y1M

AB10YT1M=ABC      ==> 10Y1M

AB30YT1M=ABC      ==> 30Y1M

ABCDEF1Y1M=A      ==> 1Y1M

ABCDEF34Y6M=A     ==> 34Y6M

ABCDEF7M=A        ==> 7M

ABCDEF1D=A        ==> 1D


@"(\d+[DWMY])(?!\w+(1))(\d+[DWMY])(?!\w+(1))|(\d+[DWMY])(?!\w+(1))"

This code does not support e.g. 30YT1M

Could someone please help find an appropriate regexp for me?
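Instead of one regex that must match the whole period in a single go, it may be simpler to find every "number followed by D/W/M/Y" before the `=` and concatenate the matches; in C# the same pattern could feed `Regex.Matches`. The sketch below is in Python with hypothetical names, and it assumes the letter prefix never contains digits (true for all the examples).

```python
import re

# A number followed by a unit letter: D(ays), W(eeks), M(onths) or Y(ears).
PERIOD = re.compile(r"\d+[DWMY]")

def extract_period(code: str) -> str:
    """Concatenate every number+unit pair found before the '='."""
    prefix = code.split("=", 1)[0]
    return "".join(PERIOD.findall(prefix))

print(extract_period("AB30YT1M=ABC"))  # 30Y1M
```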

How to get a substring until a specific character with preg_match in php?

Assuming I have these variations:

1: Today is a beautiful day (Monday)

2: Today is a beautiful day

I want to get Today is a beautiful day.

I'm trying preg_match('/(?=(^\w+.+))$|(?=(^\w+.+)\s\())/ui', $string, $matches) without success.
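Lookaheads are not needed here: a negated character class takes everything up to the first `(` or to the end of the string, and trimming removes the trailing space. The sketch is in Python with a hypothetical function name; in PHP the same idea would be preg_match('/^[^(]*/', ...) followed by rtrim. It assumes the text you want never contains a `(` itself.

```python
import re

def before_parenthesis(s: str) -> str:
    """Return the text before the first '(' (or the whole string), trimmed."""
    # [^(]* always matches (possibly empty), so m is never None.
    m = re.match(r"[^(]*", s)
    return m.group(0).rstrip()

print(before_parenthesis("Today is a beautiful day (Monday)"))  # Today is a beautiful day
```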

Update MySQL table & Regexp

I have a MySQL table with 4 million records that has a field like "hello@xyz22-03-2015". The concatenated date is not the same for all 4 million records. I am wondering how I can remove the numbers or any string after @xyz using MySQL. One possible solution would involve regular expressions, but I know MySQL does not support replacing with a regex, so I am wondering how this particular task can be done. I want to remove everything after @xyz.com

Many thanks
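No regex is needed for this: keep everything up to and including the first "@xyz" and drop the rest. The string logic is sketched below in Python with a hypothetical function name. In MySQL, `SUBSTRING_INDEX(str, '@xyz', 1)` returns the part before the first occurrence of the delimiter, so concatenating '@xyz' back onto it should perform the same truncation. Rows without the marker would need to be excluded (for example with a LIKE condition), because SUBSTRING_INDEX returns the whole string when the delimiter is missing.

```python
def truncate_after_marker(value: str, marker: str = "@xyz") -> str:
    """Keep everything up to and including the first `marker`; drop the rest."""
    head, sep, _tail = value.partition(marker)
    # If the marker is absent, partition returns (value, "", ""): value is unchanged.
    return head + sep

print(truncate_after_marker("hello@xyz22-03-2015"))  # hello@xyz
```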

What is the JavaScript regex to match config.*.json?

I want to match config.json, config.staging.json, and config.anything.json.

So far I have configFile.match /config\.(.*)\.json/i
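The pattern above requires two dots, so plain config.json fails. Making the middle segment optional with `(?:\.(.+))?` covers all three names. Below, a sketch in Python with hypothetical names; in JavaScript the equivalent would presumably be `/^config(?:\.(.+))?\.json$/i`, with `^` and `$` added because `match()` is not anchored (Python's `fullmatch` anchors implicitly).

```python
import re

# "config", then optionally ".<anything>", then ".json"; case-insensitive like /i.
CONFIG_FILE = re.compile(r"config(?:\.(.+))?\.json", re.IGNORECASE)

def match_config(name: str):
    """Return (matched, middle part or None) for a file name."""
    m = CONFIG_FILE.fullmatch(name)
    return (True, m.group(1)) if m else (False, None)

print(match_config("config.staging.json"))  # (True, 'staging')
```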

How to use regular expression in sahi script?

I am using Sahi to automate a website. When I record actions with the Sahi recorder, it records the click on a button (actually a "span") as _click(_span("Done[4]"));
but when I play back the recorded script, it fails on that line because it cannot find "Done[4]".
To solve this, I tried using a regular expression to click on _span("Done[4]"), but with no luck.

HTML source structure (this is displayed in a popup [ui-dialog, ui-widget]):

<div class="dashboardDlgButtonPanel">
<div id="addWidgetDone_wrapper" class="input_button  ">
    <div id="addWidgetDone" class="form_input_button">
        <div class="buttonwrapper">
            <a style="width: 49px; height: 41px; display: block;" id="addWidgetDone_Link" class="PrimaryButton" href="#" s1ignore="true" data-role="button" title="">
                <span>Done</span>
            </a>
        </div>
    </div>
</div>
<div id="addWidgetCancel_wrapper" class="input_button  tertiaryButton">
    <div id="addWidgetCancel">
        <div class="buttonwrapper">
            <a id="addWidgetCancel_Link" class="link" href="#" s1ignore="true" title="">Cancel</a>
        </div>
    </div>  
</div>
</div>

I tried the following, one by one:

_click(_span(/Done.*/));
_click(_span(/Done\\[[0-9]\\]/));
_click(_span(/Done\[[0-9]\]/));
_click(_span(/Done/i));
_click(_span("/Done/"));
_click(_span(new Reg Exp("Done\\[[0-9]\\]")));
_click(_span(/Done.*/,_near(_div("addWidgetDone_wrapper[1]"))));
_click(_span(/Done.*/,_near(_div(/addWidgetDone_wrapper\\[[0-9]\\]/))));
_click(_span(/Done.*/,_near(_div(/addWidgetDone_wrapper.*/))));
_click(_span(/Done.*/,_in(_div("addWidgetDone_wrapper[1]"))));
_click(_span(/Done.*/,_in(_div(/addWidgetDone_wrapper/))));
_click(_span(/Done.*/,_in(_div(/addWidgetDone_wrapper.*/))));

and many other combinations, but none of them worked.

Reference links: sahi-link-1, sahi-link-2

Can anyone please tell me what I am doing wrong?

Note: in the recorded action "Done[4]", the numeric part changes every time.

Validating minutes and seconds in Rails

I'm currently trying to validate a time-based attribute called duration on one of my models. The attribute should accept something like 01:30 as a valid value. The goal is a four-digit time code (minutes and seconds) with a colon between the two parts. Minutes and seconds must each be in the range 00 to 59, and 00:00 is not a valid value. The regex I currently have doesn't seem to work:

validates :duration, presence: true, format: {with: /\A([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]\z/}
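The current pattern describes hours (0 to 23), not minutes. A sketch of the stated requirement, tested in Python: two digits 00 to 59, a colon, two digits 00 to 59, with a negative lookahead that rejects 00:00. The names are hypothetical. In Ruby the anchors are `\A` and `\z`, so the format regex would presumably become `/\A(?!00:00\z)[0-5][0-9]:[0-5][0-9]\z/`.

```python
import re

# mm:ss, minutes and seconds each 00-59; the lookahead rules out "00:00".
DURATION = re.compile(r"(?!00:00$)[0-5][0-9]:[0-5][0-9]")

def valid_duration(s: str) -> bool:
    # fullmatch anchors at both ends, like \A ... \z in Ruby.
    return DURATION.fullmatch(s) is not None

print(valid_duration("01:30"), valid_duration("00:00"))  # True False
```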

Get IP address in a string

I'm new to this web site and hope I'm doing this correctly.

I'm looking for PHP code to scan my /var/log/secure and filter break-in attempts. Below are some example strings that need to be searched; I want to get the IP address ONLY. I'm using 0.0.0.0 as an example IP address, not the actual IP.

Failed password for invalid user admin from 0.0.0.0 port 3108 
Invalid user ubnt from 0.0.0.0
pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=0.0.0.0
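A sketch in Python: find candidates shaped like four dot-separated groups of one to three digits, then validate each with the standard `ipaddress` module so values such as 999.1.1.1 are dropped. The names are hypothetical, and only IPv4 is handled. In PHP the same pattern could go to preg_match_all, with filter_var and FILTER_VALIDATE_IP for validation.

```python
import ipaddress
import re
from typing import List

# Four dot-separated groups of 1-3 digits, as a whole word.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def find_ips(line: str) -> List[str]:
    """Return the valid IPv4 addresses in a log line, in order."""
    result = []
    for candidate in IPV4_CANDIDATE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)  # rejects e.g. 999.1.1.1
        except ValueError:
            continue
        result.append(candidate)
    return result

print(find_ips("Failed password for invalid user admin from 203.0.113.7 port 3108"))
# ['203.0.113.7']
```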

Building the find regex parameters in shell scripts

I'm trying to build the parameters for the regex expression of a find command inside a shell script, but it does not seem to work.

The objective of this shell script is to find files matching the parameters passed to it.

The shell script looks something like this:

#!/bin/bash
IDS=$1
FOLDER="/tmp"
MODULENAMEPATTERN=`echo $IDS | sed "s/,/|/g"`
MODULENAMEPATTERN=".*\\.\\($MODULENAMEPATTERN\\)"
echo "find command: find $FOLDER -follow -type f -regex \"$MODULENAMEPATTERN.suffix\""

for FILEFOUND in `find $FOLDER -follow -type f -regex "$MODULENAMEPATTERN.suffix"`; do
    echo $FILEFOUND
done;

To launch it, I use the following command:

./test pattern1,pattern2

It generates the following output:

find command: /tmp -follow -type f -regex ".*\.$pattern1\|pattern2$.suffix"

But nothing more.

However, if I execute the generated find command from a terminal, it produces the following output:

/tmp/folder1/.pattern1.suffix
/tmp/folder2/.pattern1.suffix
/tmp/folder2/.pattern2.suffix

I don't know exactly where my problem is. Can you help me?

Regards
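One likely cause is the regex dialect. GNU find's `-regex` uses Emacs-style syntax by default, where alternation is written `\|` and a bare `|` is a literal character, so `sed "s/,/|/g"` builds a pattern that searches for a literal "|". Writing `\|` in the sed replacement, or switching dialects with `-regextype posix-extended` and using `(a|b)`, are the usual options. The intended matching logic is sketched below in Python (whose `|` is alternation). The names are hypothetical; the sketch escapes the IDs and uses `fullmatch`, because find's `-regex` must match the whole path.

```python
import re
from typing import Iterable, List

def build_module_regex(ids: str) -> "re.Pattern[str]":
    """'pattern1,pattern2' -> regex for paths ending in .pattern1.suffix etc."""
    alternatives = "|".join(re.escape(part) for part in ids.split(","))
    # \.suffix: escape the dot so it does not match any character.
    return re.compile(r".*\.(?:" + alternatives + r")\.suffix")

def find_matching(paths: Iterable[str], ids: str) -> List[str]:
    pattern = build_module_regex(ids)
    # fullmatch mirrors find -regex, which matches against the entire path.
    return [p for p in paths if pattern.fullmatch(p)]

paths = ["/tmp/folder1/.pattern1.suffix", "/tmp/folder2/.pattern1.suffix",
         "/tmp/folder2/.pattern2.suffix", "/tmp/folder3/.pattern3.suffix"]
print(find_matching(paths, "pattern1,pattern2"))
```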

Space before sentence and after sentence?

I've got this script:

<input type="text" name="lastnamename" pattern="[^\s]*" title="Delete space before or after sentence!" style="text-transform:uppercase" required>

This pattern "[^\s]*" flags every space in the sentence, as in _Adam_Sandler_. I need a pattern that flags only the spaces BEFORE and AFTER the sentence, as in _Adam Sandler_.
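The `pattern` attribute must match the entire value (browsers anchor it implicitly), so a pattern that forbids only leading and trailing whitespace is: one non-space character, optionally followed by anything that ends in a non-space character, i.e. `\S(.*\S)?`. The behaviour is checked below in Python, where `fullmatch` plays the role of that implicit anchoring; the names are hypothetical. Spaces inside the name remain allowed.

```python
import re

# Non-space first character; optionally anything that ends in a non-space.
NO_EDGE_SPACES = re.compile(r"\S(?:.*\S)?")

def is_trimmed(value: str) -> bool:
    # fullmatch mirrors how the HTML pattern attribute must match the whole value.
    return NO_EDGE_SPACES.fullmatch(value) is not None

print(is_trimmed("ADAM SANDLER"), is_trimmed(" ADAM SANDLER "))  # True False
```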