regex: April 2015

Saturday, 25 April 2015

strip and get the codes in between

please rescue me from this regex nightmare.

Claim Code:
7241B-2HWRXR9-2P2BA
    $1.00

I'm trying to assign it to a variable in PHP, but none of the regex and preg_replace calls I've tried pull out exactly what is in the middle, which is: 7241B-2HWRXR9-2P2BA

any kind of help I can get on this is greatly appreciated!
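
One way to approach this, sketched in Python rather than PHP: anchor on the "Claim Code:" label and capture the hyphenated token that follows. The function name and the assumed code format (upper-case letters and digits in hyphen-joined groups) are illustrative guesses, not something the question specifies. The same pattern should carry over to preg_match, since it uses only common regex syntax.

```python
import re

# Sample text copied from the question.
text = "Claim Code:\n7241B-2HWRXR9-2P2BA\n    $1.00\n"

def extract_claim_code(text):
    """Return the hyphenated code that follows 'Claim Code:', or None."""
    # Assumption: a code is two or more groups of A-Z/0-9 joined by hyphens.
    match = re.search(r"Claim Code:\s*([A-Z0-9]+(?:-[A-Z0-9]+)+)", text)
    return match.group(1) if match else None

print(extract_claim_code(text))
```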

Would Rewriting It Using Regex Shorten/Beautify The Code?

The problem is a little challenging because I want to code it using std::regex believing it would be easier to read and faster to write.

But it seems that I can only code it one way (shown below).

Somehow my mind could not see the solution using std::regex.

How would you code it?

Would using std::regex_search do the job?

/*
input: data coming in:
/product/country/123456/city/7890/g.json

input: url parameter format:
/product/country/<id1:[0-9]+>/city/<id2:[0-9]+>/g.json

output:
std::vector<std::string> urlParams

sample output:
urlParams[0] = "123456"
urlParams[1] = "7890"
*/

bool ParseIt(const char *path, const char* urlRoute, std::vector<std::string> *urlParams)
{
   const DWORD BUFSZ = 2000;
   char buf[BUFSZ];
   DWORD dwSize = strlen(urlRoute);
   urlParams->clear();

   int j = 0;
   int i = 0;
   bool good = false;
   for (i = 0; i < dwSize; i++)
   {
       char c1 = path[j++];
       char c2 = urlRoute[i];
       if (c2 == '<')
       {
           good = true;
           while (c2 != '/')
           { 
               i++;
               c2 = urlRoute[i];
           }
           int k = 0;
           memset(buf, 0, BUFSZ);
           while (c1 != '/')
           {
               buf[k++] = c1;
               c1 = path[j++];
           }
           urlParams->push_back(_strdup(buf));
           int b = 1;
       }
       if (c1 != c2)
       {
           return false;
       }
       if (c2 != '<')
       {
           if (c1 == c1)
           {

           }
           else
           {
               return false;
           }
        }

    }

    if (dwSize == i && good)
    {
        return true;
    }

    return false;
}
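
A possible regex-based take on the same idea, sketched in Python instead of C++: turn each <name:pattern> placeholder of the route into a capture group, escape the literal parts, and match the whole path once. The helper names are hypothetical, and the sketch assumes placeholder patterns contain neither ">" nor capture groups of their own. The generated pattern is plain enough that it could presumably be handed to std::regex_match as well.

```python
import re

def route_to_regex(url_route):
    """Compile a route such as '/a/<id:[0-9]+>/b' into a regex."""
    # With one capture group, re.split alternates literal text (even
    # indexes) and placeholder patterns (odd indexes).
    parts = re.split(r"<\w+:([^>]+)>", url_route)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pieces.append(re.escape(part))   # literal text, matched verbatim
        else:
            pieces.append("(" + part + ")")  # placeholder becomes a group
    return re.compile("".join(pieces))

def parse_it(path, url_route):
    """Return the list of captured parameters, or None if the path does not match."""
    match = route_to_regex(url_route).fullmatch(path)
    return list(match.groups()) if match else None

route = "/product/country/<id1:[0-9]+>/city/<id2:[0-9]+>/g.json"
print(parse_it("/product/country/123456/city/7890/g.json", route))
```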

python RE findall() return value is an entire string

I am writing a crawler to get certain parts of an HTML file, but I cannot figure out how to use re.findall().

Here is an example: when I want to find every <div>...</div> part in the file, I may write something like this:

re.findall("<div>.*\</div>", result_page)

if result_page is a string "<div> </div> <div> </div>", the result will be

['<div> </div> <div> </div>']

Only the entire string. This is not what I want, I am expecting the two divs separately. What should I do?
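
The behaviour described here comes from ".*" being greedy: it runs to the last "</div>" it can find. A small sketch of the usual remedy, the non-greedy ".*?", using the sample string from the question:

```python
import re

result_page = "<div> </div> <div> </div>"

# ".*?" stops at the first "</div>" instead of the last one.
divs = re.findall(r"<div>.*?</div>", result_page)
print(divs)  # ['<div> </div>', '<div> </div>']
```

If the div contents can span several lines, the re.DOTALL flag would also be needed, and for real-world HTML a proper parser is likely to be more robust than a regex.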

Matching Barcodes to sequences python?

I have sequence files and barcode files. The barcode files may have barcodes of any length, for example "ATTG, AGCT, ACGT". The sequence files look like "ATTGCCCCCCCGGGGG, ATTGTTTTTTTT, AGCTAAAAA". I need to match the barcodes to the sequences that begin with them. Then, for each set of sequences with the same barcode, I have to run calculations on them with the rest of the program (which is already written). I just don't know how to get them to match. I've gone through it using print statements, and the part that is messed up is the "potential_barcode = line[:len(barcode)]" line. Also, where it says "#simp to fasta#" is where I should be reading in the matched sequences. I'm pretty new at this, so I probably made a lot of mistakes. Thanks for your help!

bcodefname = sys.argv[1]
infname = sys.argv[2]
barcodefile = open(bcodefname, "r")
for barcode in barcodefile:
        barcode = barcode.strip()
        print "barcode: %s" % barcode
        outfname = "%s.%s" % (bcodefname,barcode)
#           print outfname
        outf = open("outfname", "w")
        handle = open(infname, "r")
        for line in handle:
                potential_barcode = line[:len(barcode)]
                print potential_barcode
                if potential_barcode == barcode:
                        outseq = line[len(barcode):]
                        sys.stdout.write(outseq)
                        outf.write(outseq)
                        fastafname = infname + ".fasta"
                        print fastafname
                        mafftfname = fastafname + ".mafft"
                        stfname = mafftfname + ".stock"
                        print stfname
#simp to fasta#
#                       handle = open(infname, "r")
                        outf2 = open(fastafname, "w")
                        for line in handle:
                                linearr = line.split()
                                seqid = linearr[0]
                                seq = linearr[1]
                                outf2.write(">%s\n%s\n" % (seqid,seq))
#                       handle.close()
#                       outf.close()
#mafft#
                        cmd = "mafft %s > %s" % (fastafname,mafftfname)
                        sys.stderr.write("command: %s\n" % cmd)
                        os.system(cmd)
                        sys.stderr.write("command done\n")

How to extract links from a web content?

I have downloaded a web page and I want to extract all the links in that file. These links include both absolute and relative ones. For example, we have:

<script type="text/javascript" src="/assets/jquery-1.8.0.min.js"></script>

<a href="http://ift.tt/gbk8l4" />

so after reading the file, what should I do?

How to handle x*, x+, or x? regex-like operators in an LR parser?

I have implemented recursive descent and PEG-like parsers in the past, where you could do things like this:

Path -> Segment+
Segment -> Slash Name
Segment -> /
Name -> /\w+/
Slash -> /

where Segment+ means "match one or more Segment"
and there's a plain old regular expression for matching one or more word characters with \w+

How do you typically accomplish this same sort of thing with LR grammars/parsers? All of the examples of LR parsers I have seen are very basic, such as parsing 1 + 2 * 3, or (())(), where the patterns are very simple and don't seem to involve "one or more" functionality (or zero or more with *, or optional with ?). How do you do that in an LR parser generally?

Or does LR parsing require a lexing phase first (i.e. an LR parser requires terminal and nonterminal "tokens"). Hoping that there is a way to do LR parsing without two phases like that.

How to match everything except a particular pattern after/before a specific string constant

ATS(inline, const, unused)
OTS(inline, const, unused)

I'm trying to match the inline, const, and unused keywords only in the ATS macro. I tried ATS([^,]*), but it only matches the inline keyword.
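
A two-step sketch in Python (the question does not name a language, so this is an assumption): first capture the whole argument list of each ATS(...) call, then split it on commas. The function name is made up for illustration, and nested parentheses inside the macro are not handled.

```python
import re

def ats_keywords(source):
    """Collect the comma-separated arguments of every ATS(...) macro."""
    keywords = []
    # \b keeps the pattern from matching inside a longer name such as CATS(...).
    for args in re.findall(r"\bATS\(([^)]*)\)", source):
        keywords.extend(arg.strip() for arg in args.split(",") if arg.strip())
    return keywords

sample = "ATS(inline, const, unused)\nOTS(inline, const, unused)"
print(ats_keywords(sample))  # ['inline', 'const', 'unused']
```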

Match last occurring enclosing outer brackets

I have tried for three hours to construct the following regex match without much success. I have the following two strings:

This is a test string to illustrate the problem (example) in complex matching logic (Work / not working (in this case) to match this last occurring bracket closure)

and

Simpler version of the string (Matchable in any easy way)

I would like to define a str.match() that matches this last part of the strings above. Resulting in:

Work / not working (in this case) to match this last occurring bracket closure

and

Matchable in any easy way

Any good way to achieve this? Sadly, the data is so volatile that a robust regex is much preferred over long functional logic. Thanks so much!
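
Balanced brackets are awkward for a plain regex without recursion support, so the sketch below swaps in a different technique: a short backwards scan that counts bracket depth. It is written in Python for illustration (the question mentions str.match(), so the original is presumably JavaScript), and it assumes the brackets in the input are balanced.

```python
def last_outer_group(text):
    """Return the content of the last top-level (...) group, or None."""
    end = text.rfind(")")
    if end == -1:
        return None
    depth = 0
    # Walk backwards from the final ')' until its matching '(' is found.
    for index in range(end, -1, -1):
        if text[index] == ")":
            depth += 1
        elif text[index] == "(":
            depth -= 1
            if depth == 0:
                return text[index + 1:end]
    return None

first = ("This is a test string to illustrate the problem (example) in complex "
         "matching logic (Work / not working (in this case) to match this last "
         "occurring bracket closure)")
second = "Simpler version of the string (Matchable in any easy way)"
print(last_outer_group(first))
print(last_outer_group(second))
```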

Simplest way to parse a title from an HTML file using PHP functions only, no extra classes

So far I've been trying to find a simple way to extract a title from an HTML page.

This simple:

$url = "http://localhost";

Use any function to extract the title tag using only PHP functions or regular expressions, I do not want to use any external classes such as simple_html_dom or Zend_Dom... I want to do it the simple way with PHP only... can anyone post a sample code to simply extract the title tag from localhost?

I've tried using DOMdocument() class, simple_xml_parse(), and none of them with success

I tried like this:

<?php $dom = new DOMdocument(); 
$dom->loadhtml('pag.html'); 
$items = $dom->getElementsByTagName('title');
foreach ($items as $title) { echo "title"; }

How to get value of numbers with space

I used to have strings like this:

233.43 USD
634,233 EURO

and I used to extract numbers from those strings using this:

def extractNumbersFromString(value): # This function gets the numbers from a string
        return re.search('(\d+(?:[.,]\d*)*)', value).group(1)

Now I got strings like these as well:

2300 000 USD
430 000 EU

where there is a space between the numbers and the zeros on the right.

How can I adjust my code to extract the numbers from those strings?

Required output:

 2300000 
 430000

My code currently gives me just this 2300 and 430 (i.e. without the zeros on the right).
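
A sketch of one possible adjustment: allow a single space, dot, or comma between digit groups, then strip the spaces from the match. The function name is illustrative. One caveat of this approach is that two separate numbers written next to each other with only a space between them would be merged.

```python
import re

def extract_number(value):
    """Return the first number in value, with grouping spaces removed."""
    # A separator only counts when digits follow it, so the space
    # before "USD" is never swallowed.
    match = re.search(r"\d+(?:[ .,]\d+)*", value)
    if match is None:
        return None
    return match.group(0).replace(" ", "")

for sample in ["233.43 USD", "634,233 EURO", "2300 000 USD", "430 000 EU"]:
    print(extract_number(sample))
```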

How to get the number out of a HTML string without tags?

I have the following string inside the source of some website:

user_count: <b>5.122.512</b>

Is this possible to get the number out of this string, even if the tags around this number were different? I mean, "user_count:" part won't change, but the tags can be changed, to strong for example. Or the tags could be doubled, or whatever.

How can I do that?
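
A minimal sketch, in Python since the question names no language: anchor on the fixed "user_count:" label, skip any number of opening tags, and capture the digits with their separators. The function name is hypothetical.

```python
import re

def extract_user_count(html):
    """Return the number after 'user_count:', whatever tags surround it."""
    # (?:<[^>]*>\s*)* skips zero or more tags such as <b> or <strong><i>.
    match = re.search(r"user_count:\s*(?:<[^>]*>\s*)*([\d.,]+)", html)
    return match.group(1) if match else None

print(extract_user_count("user_count: <b>5.122.512</b>"))
```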

C# Regular Expression - Extracting the number of month or year from codes

I would like only one regular expression to extract the number of days, weeks, months and/or years from the following codes:

AB7YT1M=ABC       ==> 7Y1M

AB10YT1M=ABC      ==> 10Y1M

AB30YT1M=ABC      ==> 30Y1M

ABCDEF1Y1M=A      ==> 1Y1M

ABCDEF34Y6M=A     ==> 34Y6M

ABCDEF7M=A        ==> 7M

ABCDEF1D=A      ==> 1D


@"(\d+[DWMY])(?!\w+(1))(\d+[DWMY])(?!\w+(1))|(\d+[DWMY])(?!\w+(1))"

This code does not support e.g. 30YT1M

Could someone please help find an appropriate regexp for me?

How to get a substring until a specific character with preg_match in php?

Assuming I have these variations:

1: Today is a beautiful day (Monday)

2: Today is a beautiful day

I want to get Today is a beautiful day.

I'm trying preg_match('/(?=(^\w+.+))$|(?=(^\w+.+)\s\())/ui', $string, $matches) without success.
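
A simpler pattern may be enough here: take everything from the start up to (but not including) the first opening parenthesis, then trim trailing whitespace. The sketch is in Python; presumably preg_match('/^[^(]*/', ...) followed by rtrim() would behave the same way in PHP.

```python
import re

def text_before_paren(s):
    """Return the part of s before the first '(' with trailing spaces removed."""
    # [^(]* always matches (possibly the empty string), so group(0) is safe.
    return re.match(r"[^(]*", s).group(0).rstrip()

print(text_before_paren("Today is a beautiful day (Monday)"))
print(text_before_paren("Today is a beautiful day"))
```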

Update MySQL table & Regexp

I have a MySQL table with 4 million records that have a field like "hello@xyz22-03-2015". The concatenated date is not the same for all 4 million records. I am wondering how I can remove the numbers or any string after @xyz using MySQL. One possible solution would presumably involve a regular expression, but I know that MySQL does not allow replacing with a regex, so I am wondering how this particular task can be completed. I want to remove everything after @xyz.com

Many thanks

What is the JavaScript regex to match config.*.json?

I want to match config.json and config.staging.json and config.anything.json

So far I have configFile.match /config\.(.*)\.json/i
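
The pattern above requires two dots, so plain config.json cannot match. Making the middle part optional fixes that. A sketch in Python (the regex itself uses only syntax that JavaScript also supports, where it would need ^ and $ anchors instead of fullmatch):

```python
import re

# The middle segment (a dot followed by anything) is optional.
CONFIG_NAME = re.compile(r"config(?:\..+)?\.json", re.IGNORECASE)

def is_config_file(name):
    """True for config.json, config.staging.json, config.anything.json, ..."""
    return CONFIG_NAME.fullmatch(name) is not None

for name in ["config.json", "config.staging.json", "configjson"]:
    print(name, is_config_file(name))
```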

How to use regular expression in sahi script?

I am using Sahi to automate a website. When I record the actions with the Sahi recorder, it records the click on a button (actually a "span") as _click(_span("Done[4]"));
but when I play back the recorded script, it fails on that line because it cannot find "Done[4]".
To solve this I tried using a regular expression to click on _span("Done[4]"), but no luck.

HTML source structure (this is displayed in a popup [ui-dialog, ui-widget]):

<div class="dashboardDlgButtonPanel">
<div id="addWidgetDone_wrapper" class="input_button  ">
    <div id="addWidgetDone" class="form_input_button">
        <div class="buttonwrapper">
            <a style="width: 49px; height: 41px; display: block;" id="addWidgetDone_Link" class="PrimaryButton" href="#" s1ignore="true" data-role="button" title="">
                <span>Done</span>
            </a>
        </div>
    </div>
</div>
<div id="addWidgetCancel_wrapper" class="input_button  tertiaryButton">
    <div id="addWidgetCancel">
        <div class="buttonwrapper">
            <a id="addWidgetCancel_Link" class="link" href="#" s1ignore="true" title="">Cancel</a>
        </div>
    </div>  
</div>
</div>

I tried the following, one by one:

_click(_span(/Done.*/));
_click(_span(/Done\\[[0-9]\\]/));
_click(_span(/Done\[[0-9]\]/));
_click(_span(/Done/i));
_click(_span("/Done/"));
_click(_span(new Reg Exp("Done\\[[0-9]\\]")));
_click(_span(/Done.*/,_near(_div("addWidgetDone_wrapper[1]"))));
_click(_span(/Done.*/,_near(_div(/addWidgetDone_wrapper\\[[0-9]\\]/))));
_click(_span(/Done.*/,_near(_div(/addWidgetDone_wrapper.*/))));
_click(_span(/Done.*/,_in(_div("addWidgetDone_wrapper[1]"))));
_click(_span(/Done.*/,_in(_div(/addWidgetDone_wrapper/))));
_click(_span(/Done.*/,_in(_div(/addWidgetDone_wrapper.*/))));

and many other combinations, but none of them work.

Ref Link :sahi-link-1 , sahi-link-2

Can anyone please tell me what I am doing wrong?

Note: In the recorded action "Done[4]", the numeric part changes every time.

Validating minutes and seconds in Rails

I'm currently trying to validate a time-based attribute called duration for one of my models. The attribute would accept something along the lines of 01:30 as a valid value. The goal is to have a 4-digit time code (minutes and seconds) with a colon between the two. Both minutes and seconds are limited to a maximum of 59, and the value cannot be 00:00. The regex I currently have doesn't seem to work:

validates :duration, presence: true, format: {with: /\A([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]\z/}
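
The regex above limits the first field to 0-23, which looks like an hours pattern rather than minutes. A sketch of a pattern for the stated rules (two digits each, both at most 59, 00:00 rejected), written in Python for illustration; in Rails the same expression would presumably be wrapped in \A and \z.

```python
import re

# (?!00:00) rejects the all-zero value before the digits are consumed.
DURATION = re.compile(r"(?!00:00)[0-5][0-9]:[0-5][0-9]")

def is_valid_duration(value):
    """True for MM:SS with both fields in 00-59, excluding 00:00."""
    return DURATION.fullmatch(value) is not None

for value in ["01:30", "59:59", "00:00", "60:00", "1:30"]:
    print(value, is_valid_duration(value))
```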

Get IP address in a string

I'm new to this web site and hope I'm doing this correctly.

I'm looking for some PHP code to scan my /var/log/secure and filter break-in attempts. Below are just some examples of strings that need to be searched to get the IP address ONLY. I'm using 0.0.0.0 as an example of an IP address and not the actual IP.

Failed password for invalid user admin from 0.0.0.0 port 3108 
Invalid user ubnt from 0.0.0.0
pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=0.0.0.0
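
A sketch of the extraction step in Python rather than PHP: look for four dot-separated groups of one to three digits. This deliberately does not check that each group is at most 255, and the pattern should work unchanged with PHP's preg_match_all.

```python
import re

# Four groups of 1-3 digits separated by dots, bounded by word boundaries.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ips(log_text):
    """Return every IPv4-looking token in the text, in order of appearance."""
    return IPV4.findall(log_text)

log_text = (
    "Failed password for invalid user admin from 0.0.0.0 port 3108\n"
    "Invalid user ubnt from 0.0.0.0\n"
    "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 "
    "tty=ssh ruser= rhost=0.0.0.0\n"
)
print(extract_ips(log_text))
```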

Building the find regex parameters in shell scripts

I'm trying to build the parameters used in the regex of a find command from a shell script, but it does not seem to work.

The objective of this shell script is to find files according to the parameters specified in the script.

The shell script looks something like this:

#!/bin/bash
IDS=$1
FOLDER="/tmp"
MODULENAMEPATTERN=`echo $IDS | sed "s/,/|/g"`
MODULENAMEPATTERN=".*\\.\\($MODULENAMEPATTERN\\)"
echo "find command: find $FOLDER -follow -type f -regex \"$MODULENAMEPATTERN.suffix\""

for FILEFOUND in `find $FOLDER -follow -type f -regex "$MODULENAMEPATTERN.suffix"`; do
    echo $FILEFOUND
done;

To launch it, I use the following command:

./test pattern1,pattern2

It generates the following output:

find command: /tmp -follow -type f -regex ".*\.$pattern1\|pattern2$.suffix"

But nothing more.

Unfortunately, if I execute the generated find command from a terminal, it generates the following output:

/tmp/folder1/.pattern1.suffix
/tmp/folder2/.pattern1.suffix
/tmp/folder2/.pattern2.suffix

I don't know exactly where my problem is. Can you help me?

Regards

Space before sentence and after sentence?

I've got this script:

<input type="text" name="lastnamename" pattern="[^\s]*" title="Delete space before or after sentence!" style="text-transform:uppercase" required>

This pattern "[^\s]*" flags every space in the sentence, as in _Adam_Sandler_. I need a pattern that flags only the spaces BEFORE AND AFTER THE SENTENCE, as in _Adam Sandler_.

Regular expression to compare the first n characters of a string array

I am working on an autocomplete example and I want only the words starting with the entered text.

E.g., if I type "al", then I want only the first two as my results, not the third one:

**Al**abama
**Al**aska
c**al**lifornia

Import custom text format without separators

I would like to import this .txt file format into a SQL Server table, or to convert each block of text to a pipe-separated line.

Which tools or C# solution would you suggest to resolve this issue?

Any suggestions would be appreciated.

Thank You.

=================
INPUT (.txt file)
=================
ID: 37
Name: Josephy Murphy
Email: jmurphy@email.com
Description: bla, bla, bla, bla...

ID: 38
Name: Paul Newman
Email: pnewman@email.com
Description: bla, bla, bla, bla...

:
:

=========================
OUTPUT (SQL Server Table)
=========================

ID | Name           | Email             | Description  
37 | Josephy Murphy | jmurphy@email.com | bla, bla, bla, bla...
38 | Paul Newman    | pnewman@email.com | bla, bla, bla, bla...

:
:
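
The conversion to pipe-separated lines can be sketched as follows, in Python rather than C#: split the text on blank lines, then split each line at its first colon. The field order and the function name are assumptions taken from the sample above.

```python
import re

FIELDS = ["ID", "Name", "Email", "Description"]  # column order from the sample

def blocks_to_pipe_lines(text):
    """Turn blank-line-separated 'Key: value' blocks into pipe-separated lines."""
    lines = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")  # split at the first colon only
            if sep and key.strip():
                record[key.strip()] = value.strip()
        if record:
            lines.append(" | ".join(record.get(field, "") for field in FIELDS))
    return lines

sample = """ID: 37
Name: Josephy Murphy
Email: jmurphy@email.com
Description: bla, bla, bla, bla...

ID: 38
Name: Paul Newman
Email: pnewman@email.com
Description: bla, bla, bla, bla...
"""
print("\n".join(blocks_to_pipe_lines(sample)))
```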

RegEx Notepad++ Python Script to change Date Format

Gang...I need a notepad++ python script teaching moment.

I want to find a date in one format and replace it with another (MM/DD/YY replaced with YYYY-MM-DD). In Notepad++ RegEx I can do this with

Find: (([0-9]+)/+([0-9]+)/+([0-9]+))

Replace: 20\3-\1-\2

Would someone show me a notepad++ python script that will accomplish the same thing? I think my knowledge gap is in group replacement
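
The group-replacement part can be sketched in plain Python with re.sub and a replacement function; wiring it into the Notepad++ Python Script plugin is left aside here. The sketch assumes two-digit years all belong to the 2000s, as the "20" prefix in the question implies. Note that in the Find expression above the outer parentheses make group 1 the whole date, so the month, day, and year are groups 2, 3, and 4.

```python
import re

# Month, day, and two-digit year each get their own group.
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2})\b")

def to_iso(text):
    """Rewrite MM/DD/YY dates as 20YY-MM-DD, zero-padding month and day."""
    def replace(match):
        month, day, year = match.groups()
        return "20{}-{:0>2}-{:0>2}".format(year, month, day)
    return DATE.sub(replace, text)

print(to_iso("Due 04/25/15, shipped 4/5/15"))
```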

Linux CLI change price (awk or sed?)

I have price strings formatted as $25.00 in various html files. I would like to use the Linux command line (BASH, presumably with awk or sed) to increase each price by a certain dollar amount ($3 in this case).

In short, I need to find $nn.00 and replace it with $(n+3)n.00

I started to put it together, but I don't know how to add 3: sed -r 's/([^$][0-9][0-9][.]00). ????' file.html

Thanks!
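
sed substitutions cannot do arithmetic on a captured group, so the sketch below swaps in Python, where the replacement can be a function. It assumes every price has the form $<digits>.00, as described above; the function name and the amount parameter are illustrative.

```python
import re

def raise_prices(html, amount=3):
    """Add amount dollars to every $NN.00 price in the text."""
    def bump(match):
        return "${}.00".format(int(match.group(1)) + amount)
    return re.sub(r"\$(\d+)\.00", bump, html)

print(raise_prices("<b>$25.00</b> and $7.00"))  # <b>$28.00</b> and $10.00
```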

Get numerical values txt file using delimiters

I have a txt file with the following text:

5;2;8;3;

I need to get the numeric values, using ; as the delimiter, and put them into an array. How could this be achieved?
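
The question does not name a language, so here is a minimal Python sketch. It skips the empty piece that the trailing ";" produces.

```python
def parse_values(text):
    """Split on ';' and convert every non-empty piece to an int."""
    return [int(piece) for piece in text.strip().split(";") if piece.strip()]

print(parse_values("5;2;8;3;"))  # [5, 2, 8, 3]
```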

How to add a single quote when I have single quote in PHP for SQL Management studio

I am having trouble with SQL Management Studio. I do not want to connect to the SQL Server from PHP; I only want to prepare my lines so they can be inserted into the database. I have a text file with the lines of strings that I want to insert into SQL Server. A line looks like this:

You're Doing It Wrong!!,Mike Walsh,Intermediate

So it should be like this to be ready for sql server.

You''re Doing It Wrong!!,Mike Walsh,Intermediate

I also have this in lines:

Never Have to Say "Mayday!!!" Again

Is this one going to become a problem? Should I have any plan for it also?

I tried to use addslashes and then replace the backslash with a single quote by doing:

  $str=",('".addslashes ($array[0])."')";
     $str=str_replace("\\","\'",$str);
     echo $str;

I added the comma and parentheses for when I insert into the query in SQL Server. The result of this is:

    ,('You\''re Doing It Wrong!!'),
,('Never Have to Say \'"Mayday!!!\'" Again'),

What did I do wrong here?
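
In a single-quoted SQL string literal only the single quote itself needs doubling; double quotes can stay as they are. A sketch of that rule in Python rather than PHP (where str_replace("'", "''", $value) should do the same job). The function name is made up, and parameterized queries remain the safer route whenever the data goes through a database driver.

```python
def sql_quote(value):
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

print(sql_quote("You're Doing It Wrong!!"))
print(sql_quote('Never Have to Say "Mayday!!!" Again'))
```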

Finding ^@ character in unix?

I'm writing a C++ program where I'm reading file info that contains ^@ between a lot of words. What is this character? I'm guessing it's some expression of a hex value but which one? And what would be the regular expression to match it? Sorry if this is a duplicate, I tried searching on this but no search engine accepts these characters.

I'm new to regular expressions so I have no idea what I'm doing. Would it be something like this?

^.*\^@*.*$
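
"^@" is caret notation for the NUL character (byte value 0x00), so the regex needs to match that byte rather than a literal caret and at-sign. A sketch in Python instead of C++, with made-up sample data:

```python
import re

# Hypothetical sample: words separated by one or more NUL characters.
data = "alpha\x00beta\x00\x00gamma"

# \x00 matches the NUL character that editors display as ^@.
words = [word for word in re.split(r"\x00+", data) if word]
print(words)  # ['alpha', 'beta', 'gamma']
```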

Capturing a Hexadecimal number or an Integer using Regex design

I am working on a project that is reading a file line by line. In these lines there may or not be a two digit hexadecimal number. For example EC, 1F, A3, 34. The line can include words, single digit numbers, a letter, or various other special characters, all are separated by spaces. This number will always be in the same spot in the line if it is present. If it is present I will need to do something special to the data on that line than say if a single digit number was present. I'm taking the line and placing it into an array as it is used for something else in the program. In regards to the design of capturing this number I am planning on using a regex expression and matches method to grab it from the original string. This will be stored in a data structure somewhere else as I will also have to add or subtract with this later on.

Would the most efficient way to capture this method be using a matcher.find() method? Or should I just grab it straight off the line since I know exactly where it will be placed? I wanted to include matches just in case the user made an error and was off by one in inputting the hexadecimal number and to check if the number is present. There is a little room to give in this input on both sides. I know I could just grab the array and use the index value of where the hexadecimal number is located but I am curious is it better program design to plan for small discrepancies like being off by a space and having that already built into your program or keep it is simple and rigid?

This is how I am currently grabbing the hexadecimal number

wordSansNumber = wordToInsert.substring(0,5);
hexaDecimalNumber = wordToInsert.substring(5,14);
justHexaDecimalFinder = hexaDecimalFinder.matcher(hexaDecimalNumber);

If the match returns true, it runs through a few operations.

Perl Program to parse through error log file, extract error message and output to new file

I need to write a perl program where I parse through an error log and output the error messages to a new file. I am having issues with setting up the regex to do this. In the error log, an error code starts with the word "ERROR" and the end of each error message ends with a ". " (period and then a space). I want to find all the errors, count them, and also output the entire error message of each error message to a new file.

I tried this but am having issues:

open(FH,"<$filetoparse");

    $outputfile='./errorlog.txt';
    open(OUTPUT,">$outputfile");
    $errorstart='ERROR';
    $errorend=". ";

    while(<FH>)
    {
    if (FH=~ /^\s*$errorstart/../$errorend/)   
    {
        print OUTPUT "success";
    }   
    else
    {
        print OUTPUT "failed";
    }
    }

}

The $errorstart and $errorend are something I saw online, and I am not sure if that is the correct way to code it.

Also, I know that printing "success" or "failed" is not what I said I am looking for; I added that to help confirm that the code works. I haven't tried coding the error count yet.

Before this snippet of code I have a print statement asking the user for the location of the .txt file they want to parse. I confirmed that particular section of code works properly. Thanks for any help! Let me know if more info is needed!
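
The matching step could look like the sketch below, written in Python rather than Perl: read the whole log, then lazily match from "ERROR" up to the first period-plus-space. The sample log text is invented, and the sketch assumes every message really ends with ". " as described.

```python
import re

# DOTALL lets a single message span several lines.
ERROR_MESSAGE = re.compile(r"ERROR.*?\. ", re.DOTALL)

def extract_errors(log_text):
    """Return every message that starts with ERROR and ends with '. '."""
    return ERROR_MESSAGE.findall(log_text)

log_text = "INFO started. ERROR disk full. INFO retry. ERROR network\ndown. done"
errors = extract_errors(log_text)
print(len(errors))
for message in errors:
    print(message.strip())
```

Writing the messages to a new file is then a matter of opening the output file and writing each element of the list.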

RegEx to divide string by +*-/ and keep delimiter?

How can I split a string into pieces every time a "/*-+" appears and keep the delimiter? So, have something like

10x+4-1

turn into

10x
+
4
-
1

I've tried

@left_split = split(/(?<=\+)(?<=\-)(?<=\/)(?<=\*)/, $left_side);

I want each delimiter to be placed in its own array element.

However, if something like

4(x-3)

appears, how do I stop the reg-ex from splitting the 4(x and - 3)?
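
Keeping the delimiters is usually done by putting the delimiter class in a capture group, e.g. split(/([-+*\/])/, ...), but that alone still splits inside parentheses. The sketch below swaps in a small depth-counting scanner instead of a regex, in Python rather than Perl, so that operators inside brackets are left alone. The function name is illustrative.

```python
def split_top_level(expression):
    """Split on + - * / outside parentheses, keeping each operator as a token."""
    tokens, current, depth = [], "", 0
    for char in expression:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char in "+-*/" and depth == 0:
            if current:
                tokens.append(current)
            tokens.append(char)   # the delimiter becomes its own element
            current = ""
        else:
            current += char
    if current:
        tokens.append(current)
    return tokens

print(split_top_level("10x+4-1"))  # ['10x', '+', '4', '-', '1']
print(split_top_level("4(x-3)"))   # ['4(x-3)']
```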

mongodb queries find total number of cities in the database

Hi everyone, I have a huge data set that contains information like this:

  { "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
  { "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
  { "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
  { "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }

I want to find the number of cities in this database using a MongoDB command. However, the database may have more than one record with the same city, as in the example above.

I tried:

  >db.zipcodes.distinct("city").count();
2015-04-25T15:57:45.446-0400 E QUERY    warning: log line attempted (159k) over max size (10k), printing beginning and end ... TypeError: Object AGAWAM,BELCHERTOWN ***data*** has no method 'count'

but it didn't work for me. I also tried something like this:

   >db.zipcodes.find({city:.*}).count();

  2015-04-25T16:00:01.043-0400 E QUERY    SyntaxError: Unexpected token .

But that didn't work either, and even if it did work it would count the redundant data (city). Any idea?

Split string before regex

I'm trying to insert a tab (\t) before a regex, in a string. Before "x days ago", where x is a number between 0-999.

The text I have looks like this:

Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon

Desired output:

Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon

I am still new to this, and I'm struggling. I've looked around for answers, and found some that are close, but none that are identical.

This is what I have so far:

text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print new_text

Output:

Great product, fast shipping!    \d+ anon

Again, what I need is (note the \t):

Great product, fast shipping!    22 days ago anon

Note: I am open for all answers, it doesn't have to involve splitting!
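
The attempt above fails because "\d+" in the replacement string is not a pattern; it cannot reproduce what was matched. Capturing the match in a group and referring to it with \1 does. A minimal sketch:

```python
import re

text = "Great product, fast shipping! 22 days ago anon"

# Group 1 holds "22 days ago"; the replacement puts a tab in front of it.
new_text = re.sub(r"(\d{1,3} days ago)", r"\t\1", text)
print(repr(new_text))
```

Printing with repr() makes the inserted tab visible as \t.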

Input special character in search string when handled at server side

I'm testing a project I'm working on. I've put a filter on the server side (Java) to redirect to the error page whenever the query string contains anything matching an HTML-tag-like regex (URL-encoded input is also checked). As far as my skill set goes, it's working fine, but I'm fairly sure that's not the end of it. There must still be a way to get a vector through and execute an XSS script.

Examples : <hello> redirects to error page
%3Chello%3E converts to <hello> and redirected to error page
%253Chello%253E converts to %3Chello%3E & page works fine as no HTML tag is found.

division string using regular expressions

I would like to split a string the way a WordPress tag does. The only difference is that my tag will contain the title of the current page.

I have this code:

$str = 'lorem<!--title1-->ipsum<!--title2-->dolor<!--title3-->sit<!--title4-->amet<!--title5-->consectetur';
$res = preg_split('/<\!--(.*?)-->/', $str, null, PREG_SPLIT_DELIM_CAPTURE);

which returns:

Array
(
    [0] => lorem
    [1] => title1
    [2] => ipsum
    [3] => title2
    [4] => dolor
    [5] => title3
    [6] => sit
    [7] => title4
    [8] => amet
    [9] => title5
    [10] => consectetur
)

My aim is:

Array
(
    [0] => Array
        (
            [0] => lorem
        )    
    [1] => Array
        (
            [0] => title1
            [1] => ipsum
        )    
    [2] => Array
        (
            [0] => title2
            [1] => dolor
        )    
    [3] => Array
        (
            [0] => title3
            [1] => sit
        )    
    [4] => Array
        (
            [0] => title4
            [1] => amet
        )    
    [5] => Array
        (
            [0] => title5
            [1] => consectetur
        )    
)
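
The flat list from the split can be regrouped by pairing each captured title with the text that follows it. A sketch in Python rather than PHP; re.split with a capture group behaves like PREG_SPLIT_DELIM_CAPTURE here, and in PHP array_chunk() on the tail of the array would presumably give the same pairs.

```python
import re

s = ("lorem<!--title1-->ipsum<!--title2-->dolor<!--title3-->sit"
     "<!--title4-->amet<!--title5-->consectetur")

# parts alternates text and captured titles: [text, title, text, title, ...]
parts = re.split(r"<!--(.*?)-->", s)

# The first element stands alone; after that, pair each title with its text.
result = [[parts[0]]] + [[parts[i], parts[i + 1]] for i in range(1, len(parts), 2)]
print(result)
```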

Extract Information From File Name in Bash

Suppose I have a file with a name ABC_DE_FGHI_10_JK_LMN.csv. I want to extract the ID from the file-name i.e. 10 with the help of ID position and file-name separator. I have following two inputs

File-name_ID_Position=4; [since 10 is at fourth position in file-name]
File-name_Delimiter="_";

Here the ID can be numeric or alphanumeric. So how can I extract the 10 from the file name above with the help of these two inputs? How can this be achieved in bash?
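
The logic is just "drop the extension, split on the delimiter, take the Nth field". A sketch of that in Python rather than bash (where cut -d"_" -f4 on the name without its extension would be the usual counterpart); the function name is illustrative and the position is 1-based as in the question.

```python
import os

def field_from_filename(filename, position, delimiter="_"):
    """Return the 1-based field of the file name, ignoring the extension."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.split(delimiter)[position - 1]

print(field_from_filename("ABC_DE_FGHI_10_JK_LMN.csv", 4))  # 10
```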

Perform Distribution on a input string containing an X (Basic Math)

I am getting an input string that could look like one of the following:

2(6x-1)
3x + 4(8-x)

I want to do this respectively

12x-2
3x+32-4x

How can I do this in Perl without using pre-made modules? I can't use eval() because of the X

Javascript split by spaces, but not within html-tags

My first goal is to split a string by spaces, but not the ones within html-tags.

I've tried to rewrite the following, unsuccessfully: Javascript split by spaces but not those in quotes

What would the regex look like in: arr = fullHtmlString.split(?); ?

My main goal is to shift an IMG tag by one space at a time. After the split I'll iterate over the array, search for the img tag, remove it, add it to the next item, and finally join the array.

The code I use at the moment is quite extensive and relies heavily on jQuery to achieve the goal.

Extracting text between two keywords or a keyword and \n

I have a set of lines where most of them follow this format

STARTKEYWORD some text I want to extract ENDKEYWORD

I want to find these lines and extract information from them.

Note, that the text between keywords can contain a wide range of characters (latin and non-latin letters, numbers, spaces, special characters) except \n.

ENDKEYWORD is optional and sometimes can be omitted.

My attempts are revolving around this regex

STARTKEYWORD  (.+)(?:\n| ENDKEYWORD)

However capturing group (.+) consumes as many characters as possible and takes ENDKEYWORD which I do not need.

Is there a way to get some text I want to extract solely with regular expressions?
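
Making the group lazy and letting the end of the line stand in for a missing ENDKEYWORD is one way to do this with a regex alone. A sketch in Python (the question names no regex flavour, so the multiline flag is an assumption about the engine):

```python
import re

# (.+?) is lazy, so it stops at the first " ENDKEYWORD" or at end of line.
PATTERN = re.compile(r"STARTKEYWORD +(.+?)(?: ENDKEYWORD|$)", re.MULTILINE)

text = (
    "STARTKEYWORD some text I want to extract ENDKEYWORD\n"
    "an unrelated line\n"
    "STARTKEYWORD text without an end marker\n"
)
print(PATTERN.findall(text))
```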

PHP preg_match exclude

OK this regex will match string like 2aa, a2, 2aaaaaa, aaaa2, aaa2aaaa, 2222a2222-2222-aaaa... in short, mix of alphanumeric characters in a sequence:

preg_match("/(?:\d+[a-z]|[a-z]+\d)[a-z\d]*/i")

now I want to exclude something but I'm stuck, something like this doesn't work

preg_match("/(?!1920x1200|1920x1080)(?:\d+[a-z]|[a-z]+\d)[a-z\d]*/i")

for example the string aaaaa222aaa1920x1200bbbbb1234556789 is still matched but it shouldn't because it contains 1920x1200

any help is appreciated :)

i'm using regex found here for matching alphanum sequences Regex: only match letters WITH numbers

regex test: http://ift.tt/1GuFyP9
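
The lookahead in the second pattern only rejects a match that would start exactly at the excluded text; the engine simply starts the match somewhere else. Anchoring the lookahead at the start of the string and letting it scan ahead with ".*" is one way around that. A sketch in Python; the pattern uses only syntax that PCRE also supports.

```python
import re

# ^(?!.*...) rejects the whole string if an excluded token appears anywhere.
PATTERN = re.compile(
    r"^(?!.*(?:1920x1200|1920x1080)).*?(?:\d+[a-z]|[a-z]+\d)[a-z\d]*",
    re.IGNORECASE,
)

def is_wanted(s):
    """True if s has a letter/digit mix and none of the excluded tokens."""
    return PATTERN.search(s) is not None

print(is_wanted("aaa2aaaa"))
print(is_wanted("aaaaa222aaa1920x1200bbbbb1234556789"))
```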

RegEx negative-lookahead and behind to find characters not embedded within a wrapper

I would like to match strings/characters that are not surrounded by a well-defined string-wrapper. In this case the wrapper is '@L@' on the left of the string and '@R@' on the right of the string.

With the following string for example:

This is a @L@string@R@ and it's @L@good or ok@R@ to change characters in the next string

I would like to be able to search for (any number of characters) to change them on a case by case basis. For example:

Searching for "in", would match twice - the word 'in', and the 'in' contained within the last word 'string'.
Searching for a "g", should be found within the word 'change' and in the final word string (but not the first occurrence of string contained within the wrapper).

I'm somewhat familiar with how lookahead works in the sense that it identifies a match, and doesn't return the matching criteria as part of the identified match. Unfortunately, I can't get my head around how to do it.

I've also been playing with this at http://regexpal.com/ but can't seem to find anything that works. Examples I've found for iOS are problematic, so perhaps the javascript tester is a tiny bit different.

I took some guidance from a previous question I asked, which seemed to be almost the same but sufficiently different to mean I couldn't work out how to reuse it:

Replacing 'non-tagged' content in a web page

Any ideas?
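
Instead of lookarounds, one common trick is to match the wrapped spans first and the search term second, then keep only the hits where the search term's group took part. A sketch in Python (the question concerns iOS and JavaScript testers, so this only illustrates the idea); the function name is made up.

```python
import re

def find_outside(text, needle):
    """Return start offsets of needle that are not inside @L@...@R@."""
    # Alternative 1 swallows a whole wrapped span; alternative 2 is the needle.
    pattern = re.compile(r"@L@.*?@R@|(" + re.escape(needle) + ")")
    return [m.start(1) for m in pattern.finditer(text) if m.group(1) is not None]

sample = ("This is a @L@string@R@ and it's @L@good or ok@R@ "
          "to change characters in the next string")
print(find_outside(sample, "in"))
print(find_outside(sample, "g"))
```

For replacement, the same pattern can be passed to re.sub with a function that returns the wrapped span unchanged and substitutes only when group 1 matched.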

Problems preserving the occurrence in a regex?

I have a very large string s. The string s is made up of lines with word_1 followed by word_2, an id, and a number:

word_1 word_2 id number

I would like to create a regex that catches in a list all the occurrences of words that have an id RN_ _ _ followed by the id VA_ _ _ _ and the id VM_ _ _ _. The constraint for extracting the RN_ _ _, VA_ _ _ _ and VM_ _ _ _ pattern is that the occurrences must appear one after another, where _ stands for a free character of the id string; there can be more than three of these free characters, e.g.:

casa casa NCFS000 0.979058
mejor mejor AQ0CS0 0.873665
que que PR0CN000 0.562517
mejor mejor AQ0CS0 0.873665
no no RN
esta estar VASI1S0
lavando lavar VMP00SM
. . Fp 1

This is the pattern I would like to extract since they are placed one after another. And this will be the desired output in a list:

 [('no RN', 'estar VASI1S0', 'lavar VMP00SM')]

For example this will be wrong, since they are not one after another:

error error RN
error error VASI1S0
pues pues CS 0.998047
error error VMP00SM

So for the s string:

s = '''
    No no RN 0.998045
    sabía saber VMII3S0 0.592869
    como como CS 0.999289
    se se P00CN000 0.465639
    ponía poner VMII3S0 0.65
    una uno DI0FS0 0.951575
    error error RN
    actuar accion VMP00SM
    lavadora lavadora NCFS000 0.414738
    hasta hasta SPS00 0.957698
    error error VMP00SM
    que que PR0CN000 0.562517
    conocí conocer VMIS1S0 1
    esta este DD0FS0 0.986779
    error error VA00SM
    y y CC 0.999962
    es ser VSIP3S0 1
    que que CS 0.437483
    es ser VSIP3S0 1
    muy muy RG 1
    sencilla sencillo AQ0FS0 1
    de de SPS00 0.999984
    utilizar utilizar VMN0000 1
    ! ! Fat 1

    Todo todo DI0MS0 0.560961
    un uno DI0MS0 0.987295
    gustazo gustazo NCMS000 1
    error error VA00SM
    cuando cuando CS 0.985595
    estamos estar VAIP1P0 1
    error error VMP00RM
    aprendiendo aprender VMG0000 1
    para para SPS00 0.999103
    emancipar emancipar VMN0000 1
    nos nos PP1CP000 1
    , , Fc 1
    que que CS 0.437483
    si si CS 0.99954
    error error RN
    nos nos PP1CP000 0.935743
    ponen poner VMIP3P0 1
    facilidad facilidad NCFS000 1
    con con SPS00 1
    las el DA0FP0 0.970954
    error error VMP00RM
    tareas tarea NCFP000 1
    de de SPS00 0.999984
    no no RN 0.998134
    estás estar VAIP2S0 1
    condicionado condicionar VMP00SM 0.491858
    alla alla VASI1S0
    la el DA0FS0 0.972269
    casa casa NCFS000 0.979058
    error error RN
    error error VASI1S0
    pues pues CS 0.998047
    error error VMP00SM
    mejor mejor AQ0CS0 0.873665
    que que PR0CN000 0.562517
    mejor mejor AQ0CS0 0.873665
    no no RN 1
    esta estar VASI1S0 0.908900
    lavando lavar VMP00SM 0.9080972
    . . Fp 1
    '''

This is what I tried:

import re
weird_triple = re.findall(r'(?s)(\w+\s+RN)(?:(?!\s(?:RN|VA|VM)).)*?(\w+\s+VA\w+)(?:(?!\s(?:RN|VA|VM)).)*?(\w+\s+VM\w+)', s)

print("\n This is the weird triple\n")
print(weird_triple)

The problem with this approach is that it returns a list of the pattern RN_ _ _ _, VA_ _ _ _, VM_ _ _, but without the one-after-another order (some ids and words between the parts of this pattern are being matched). Any idea how to fix this in order to obtain:

[('no RN', 'estar VASI1S0', 'lavar VMP00SM'),('estar VAIP2S0','condicionar VMP00SM', 'alla VASI1S0')]

Thanks in advance guys!

UPDATE: I tried the approaches that other users recommended, but the problem is that if I add another one-after-another pattern like:

no no RN 0.998134
estás estar VAIP2S0 1
condicionado condicionar VMP00SM 0.491858

to the s string, the regexes recommended for this question don't work. They only catch:

[('no RN', 'estar VASI1S0', 'lavar VMP00SM')]

Instead of:

[('no RN', 'estar VASI1S0', 'lavar VMP00SM'),('estar VAIP2S0','condicionar VMP00SM', 'alla VASI1S0')]

Which is right. Any idea how to get the one-after-another pattern output:

[('no RN', 'estar VASI1S0', 'lavar VMP00SM'),('estar VAIP2S0','condicionar VMP00SM', 'alla VASI1S0')]
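
A line-based check may be easier to reason about than a single large regex. The following is only a sketch under a few assumptions: every data line has at least three whitespace-separated fields (word, lemma, id), a blank or malformed line breaks adjacency, and the name consecutive_triples is invented for this example. It reports strict RN, VA, VM runs only.

```python
def consecutive_triples(text):
    """Collect (lemma id) triples whose ids start with RN, VA, VM on adjacent lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep (lemma, id); lines with fewer than three fields become None
        # so that they interrupt a run instead of being skipped silently.
        rows.append((parts[1], parts[2]) if len(parts) >= 3 else None)

    prefixes = ("RN", "VA", "VM")
    result = []
    for i in range(len(rows) - 2):
        window = rows[i:i + 3]
        if all(row is not None and row[1].startswith(prefix)
               for row, prefix in zip(window, prefixes)):
            result.append(tuple("%s %s" % (lemma, tag) for lemma, tag in window))
    return result

sample = """
error error RN
error error VASI1S0
pues pues CS 0.998047
error error VMP00SM
mejor mejor AQ0CS0 0.873665
no no RN 1
esta estar VASI1S0 0.908900
lavando lavar VMP00SM 0.9080972
. . Fp 1
"""
triples = consecutive_triples(sample)
```

Because each window of three adjacent lines is tested on its own, adding further RN, VA, VM runs to the string yields further tuples in the result.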

CodeIgniter is giving me an error called "Disallowed Key Characters."

I have an input with name="exam[A+]". I have found that when I call $this->input->post("exam"), it gives me the error "Disallowed Key Characters". I want to allow the + sign in my key characters. Here is the code in the system file.

function _clean_input_keys($str)
    {
        if ( ! preg_match("/^[a-z0-9:_\/-]+$/i", $str))
        {
            exit('Disallowed Key Characters.');
        }

        // Clean UTF-8 if supported
        if (UTF8_ENABLED === TRUE)
        {
            $str = $this->uni->clean_string($str);
        }

        return $str;
    }

How do I change the regular expression to allow the + sign in the input key? I hope I have made my question clear. If there is still some confusion, please let me know and I will edit the question again. Thanks in advance.
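
The effect of adding + to the character class can be checked outside CodeIgniter. The sketch below uses Python's re module, whose character-class syntax matches PCRE for this case; the names ORIGINAL, WITH_PLUS and is_allowed are made up for the example. Inside a character class + is a literal, and the hyphen stays last so that it is not read as a range. The corresponding PHP pattern would presumably be "/^[a-z0-9:_\/+-]+$/i", which is an assumption and not something taken from the framework.

```python
import re

# Same class as in _clean_input_keys, and a variant that also accepts "+".
ORIGINAL = re.compile(r"^[a-z0-9:_/-]+$", re.IGNORECASE)
WITH_PLUS = re.compile(r"^[a-z0-9:_/+-]+$", re.IGNORECASE)

def is_allowed(key, pattern):
    """Return True when every character of `key` is in the allowed class."""
    return pattern.match(key) is not None

# For name="exam[A+]" the nested key that has to pass the check is "A+".
before = is_allowed("A+", ORIGINAL)
after = is_allowed("A+", WITH_PLUS)
```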

What is the best way to copy a file without blank characters in Groovy

I have to do an exercise. I need to copy a file into another one while removing blank characters. I've done the following. Does anyone have a better solution?

class Exercise4 {
   static void main(String... args) {
      Exercise4 ex = new Exercise4()
      ex.copyWithoutBlank("C:\\Users\\drieu\\tmp\\test.txt")
   }

   def copyWithoutBlank(String filePath) {

      File file = new File(filePath)
      File fileWithoutBlank = new File("C:\\Users\\drieu\\tmp\\test2.txt")
      PrintWriter printerWriter = new PrintWriter(fileWithoutBlank)

      file.eachLine { line ->
        println "line:" + line
        String lineWithoutBlank =  line.replaceAll(" ", "")
        println "Copy line without blank :" + lineWithoutBlank + " into  file:" + fileWithoutBlank.name
        printerWriter.println(lineWithoutBlank)

     }
     printerWriter.close()      
   }
}

Thanks in advance,

How do I find non-XML characters in a MySQL data column

I have a database where a Java program is erroring out because there are non-XML characters in the MySQL database. I would like to know how to write a regex to find the records that have invalid XML characters. I could not find a query to run anywhere on the web.
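
As a starting point, the set of characters allowed by XML 1.0 is #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The sketch below is not a MySQL query; it only shows a negated character class built from those ranges, applied in Python to values that have already been fetched. The names INVALID_XML_CHAR and find_invalid_xml_chars are hypothetical.

```python
import re

# Anything outside the XML 1.0 "Char" ranges is reported as invalid.
INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_invalid_xml_chars(value):
    """Return (offset, code point) pairs for characters XML 1.0 does not allow."""
    return [(m.start(), hex(ord(m.group()))) for m in INVALID_XML_CHAR.finditer(value)]

bad = find_invalid_xml_chars("ok\x0bbad")        # contains a vertical tab
clean = find_invalid_xml_chars("plain\ttext\n")  # tab and newline are allowed
```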

How to determine the static part of a regular expression at the beginning?

Is there a safe way to determine the part of a regular expression that is static, i.e. that matches only one string?

I have a regular expression for file paths and I want to extract the part at the beginning that doesn't require a regular expression so that I can perform a direct search to improve performance.

For example, I have /some/path/.*\.jpg

Now I want /some/path/ and .*\.jpg separately.

EDIT

The pattern could have any valid form. Maybe it doesn't even have a static part at the beginning, but most of the time it does.
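
A deliberately conservative sketch of the idea in Python: walk the pattern from the left, collect plain characters and escaped punctuation, and stop at the first metacharacter or at a character that is followed by a quantifier. It gives up entirely when the remainder contains |, because alternation could make the prefix optional. The function name split_literal_prefix is invented here, and the sketch assumes that no case-insensitive or verbose flag is in effect; it may return a shorter prefix than strictly possible.

```python
SPECIAL = set(".^$*+?{}[]()|\\")
QUANTIFIERS = set("*+?{")

def split_literal_prefix(pattern):
    """Split `pattern` into (literal_prefix, remaining_pattern)."""
    prefix = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and not pattern[i + 1].isalnum():
            literal, width = pattern[i + 1], 2  # escaped punctuation such as \.
        elif ch not in SPECIAL:
            literal, width = ch, 1
        else:
            break  # metacharacter or a class escape such as \d
        following = pattern[i + width:i + width + 1]
        if following in QUANTIFIERS:
            break  # a quantified character is not fixed in every match
        prefix.append(literal)
        i += width
    rest = pattern[i:]
    if "|" in rest:
        return "", pattern  # alternation: no safe static prefix
    return "".join(prefix), rest

static_part, regex_part = split_literal_prefix(r"/some/path/.*\.jpg")
```

The static part can then be used for a direct string comparison before the remaining pattern is applied.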

Replacing the carriage return with white space in Java

I have the string below in a String variable in Java.

rule "6"
no-loop true
    when
    then
    String prefix = null;
    prefix = "900";
    String style = null;
    style = "490";
    String  grade = null;
    grade = "GL";
    double basePrice = 0.0;
    basePrice = 837.00;
    String ruleName = null;
    ruleName = "SIVM_BASE_PRICE_006
Rahul Kumar Singh";
    ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end
rule "5"
no-loop true
    when
    then
    String prefix = null;
    prefix = "800";
    String style = null;
    style = "481";
    String  grade = null;
    grade = "FL";
    double basePrice = 0.0;
    basePrice = 882.00;
    String ruleName = null;
    ruleName = "SIVM_BASE_PRICE_005";
    ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end

I need to replace the carriage return between the "then" and "end" keywords with white space so that it becomes like the code below:

rule "6"
no-loop true
    when
    then
    String prefix = null;
    prefix = "900";
    String style = null;
    style = "490";
    String  grade = null;
    grade = "GL";
    double basePrice = 0.0;
    basePrice = 837.00;
    String ruleName = null;
    ruleName = "SIVM_BASE_PRICE_006 Rahul Kumar Singh";
    ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end

rule "5"
no-loop true
    when
    then
    String prefix = null;
    prefix = "800";
    String style = null;
    style = "481";
    String  grade = null;
    grade = "FL";
    double basePrice = 0.0;
    basePrice = 882.00;
    String ruleName = null;
    ruleName = "SIVM_BASE_PRICE_005";
    ProductConfigurationCreator.createFact(drools, prefix, style,grade,baseprice,rulename);
end

Of the two string sets above, the second is in the correct format that I need. However, in the first set, I am getting this:

ruleName = "SIVM_BASE_PRICE_006
Rahul Kumar Singh";

In particular, this needs to be like this:

ruleName = "SIVM_BASE_PRICE_006 Rahul Kumar Singh";

and I also need to ensure that this doesn't affect anything else in the string. Thus I need to replace this "carriage return" with a white space and put the value on one line. This is my requirement. I tried the replace and replaceAll methods of String, but they do not work properly.

Problem:

I need to look between the strings "then" and "end", and whenever there is a carriage return between two double quotes (" "), I need to replace it with white space so that the value is on one line.

Thanks
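
The requirement stated above can be sketched in two steps: find each then ... end block, and inside it collapse any line break that falls within a double-quoted string. The example is written in Python to keep it short; java.util.regex should accept the same three patterns, though that has not been verified here. It assumes that quoted strings contain no escaped double quotes and that "end" starts its own line. The names BLOCK, QUOTED, LINE_BREAK and join_broken_strings are made up.

```python
import re

BLOCK = re.compile(r"\bthen\b.*?^end\b", re.DOTALL | re.MULTILINE)
QUOTED = re.compile(r'"[^"]*"')
LINE_BREAK = re.compile(r"\s*[\r\n]+\s*")

def join_broken_strings(text):
    """Replace line breaks inside quoted strings of then...end blocks with a space."""
    def fix_block(block):
        # Only the quoted strings are rewritten; the rest of the block is kept.
        return QUOTED.sub(lambda q: LINE_BREAK.sub(" ", q.group(0)), block.group(0))
    return BLOCK.sub(fix_block, text)

rules = (
    'rule "6"\n'
    'no-loop true\n'
    '    when\n'
    '    then\n'
    '    String ruleName = null;\n'
    '    ruleName = "SIVM_BASE_PRICE_006\n'
    'Rahul Kumar Singh";\n'
    'end\n'
)
fixed = join_broken_strings(rules)
```

Line breaks outside quoted strings are left alone, so the layout of the rule itself does not change.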

EDIT:

DRT:

template header
Prefix
Style
Product

package com.xx
import com.xx.drools.ProductConfigurationCreator;

template "ProductSetUp"
rule "Product_@{row.rowNumber}"
no-loop true
    when
    then
      String prefix = null;
      prefix = "@{Prefix}";
      String style = null;
      style = "@{Style}";
      String product = null;
      product = "@{Product}";
      ProductConfigurationCreator.createProductFact(drools,prefix,style,product);
end
end template

The Excel data and DRT are for demonstration purposes only. In the image, in the Product column, there is "SOFAS \rkumar shorav". This is what creates the problem. It will generate output like below:

product = "SOFAS
kumar shorav";

I need this like below:

product = "SOFAS kumar shorav";

The Excel data:

[attached image]

Why does * in a regular expression indicate whether its previous character can be present or not?

Take this for an example which I found in some blog, "How about searching for apple word which was spelled wrong in a given file where apple is misspelled as ale, aple, appple, apppple, apppppple etc. To find all patterns

grep 'ap*le' filename

Readers should observe that the above pattern will match even ale word as * indicates 0 or more of previous character occurrence."

Now it's saying that "ale" will be accepted when we have ap*le. Aren't the "ap" and "le" fixed?

Thanks in advance.
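
A quick check shows how the quantifier binds. In ap*le the * applies only to the single character before it, the p, so the pattern reads as "a, then zero or more p, then le". The sketch uses Python's re module instead of grep; for this pattern the two behave the same, except that grep looks for the pattern anywhere in a line while fullmatch tests the whole word.

```python
import re

pattern = re.compile(r"ap*le")
words = ["ale", "aple", "apple", "appple", "able"]

# Keep the words that the pattern matches from start to end.
matches = [word for word in words if pattern.fullmatch(word)]
```

Here "ale" is accepted because zero occurrences of p are allowed, while "able" is rejected because b is neither p nor l.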

Matching a^n b^n c^n for n > 0 with PCRE

How would you match a^n b^n c^n for n > 0 with PCRE?

The following cases should match:

abc
aabbcc
aaabbbccc

The following cases should not match:

abbc
aabbc
aabbbccc

Here's what I've "tried": /^(a(?1)?b)$/gmx, but this matches a^n b^n for n > 0:

ab
aabb
aaabbb

Online demo

Note: This question is the same as this one with the change in language.
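
For checking candidate PCRE patterns against the cases above, a small reference implementation can be handy. The sketch below is not a PCRE solution: Python's standard re module has no (?1) recursion, so it swaps in a different technique, capturing the three runs and comparing their lengths in code. The names SHAPE and is_anbncn are made up.

```python
import re

SHAPE = re.compile(r"(a+)(b+)(c+)")

def is_anbncn(s):
    """Return True when s is a^n b^n c^n for some n > 0."""
    m = SHAPE.fullmatch(s)
    # The regex only checks the order of the runs; the counts are compared here.
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

should_match = ["abc", "aabbcc", "aaabbbccc"]
should_not_match = ["abbc", "aabbc", "aabbbccc"]
results = {s: is_anbncn(s) for s in should_match + should_not_match}
```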

How to use regular expressions in Swift?

I am trying to write a JSON parser in Swift. I am writing functions for parsing the different parts of JSON. I wrote a string parser that detects a string in the JSON data by checking for a starting \" and, when it meets another \", separates the text in between and returns it as a String. But when I ran into this JSON text:

{"gd$etag": "W\/\"D0QCQX4zfCp7I2A9XRZQFkw.\""}

the function I wrote failed, because the whole value part has to be recognised as a String while mine collects only

W\/

since I gave the condition as starting and ending with \". When I searched online, I understood that this has something to do with regular expressions. Please help me solve this!
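
The failing case comes down to treating a backslash and the character after it as one unit, so that an escaped quote does not end the string. The sketch below shows this with a regular expression in Python; NSRegularExpression in Swift should accept the same pattern once the backslashes are escaped for a Swift string literal, though that is an assumption and not something tested here. The name JSON_STRING is made up.

```python
import re

# A quote, then any run of "non-quote, non-backslash" characters or
# backslash-plus-one-character escapes, then the closing quote.
JSON_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')

text = r'{"gd$etag": "W\/\"D0QCQX4zfCp7I2A9XRZQFkw.\""}'
strings = JSON_STRING.findall(text)
```

The second match covers the whole value, including the escaped quotes, instead of stopping at W\/.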

PHP & regex to extract two separate parts of a string as ONE recombined variable

I have a PHP string consisting of HTML code as follows:

$string =
'<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
        (Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
        (Omalizumab)
</li>
</ul>';

using

preg_match_all($regex,$string,$matches, PREG_PATTERN_ORDER);

for ($i = 0; $i < count($matches[0]); ++$i)
{ echo $i . "    " . $matches[0][$i]. "<br>"; }

if I use

$regex = "^(?<=>).*?(?=(\Q</a>\E))^";

I get

1 Nalcrom

2 Alimemazine

3 Xolair

whereas if I use

$regex = "^\(.*?\)^";

I get

1 (Sodium Cromoglicate)

2 (Omalizumab)

Trying

$regex = "^(?<=>).*?(?=(\Q</a>\E))(\(.*?\))^";

and variations upon it, I get nothing but blanks, whereas what I need is:

1 Nalcrom (Sodium Cromoglicate)

2 Alimemazine

3 Xolair (Omalizumab)

Any ideas on how I can do this? Thanks.
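
One way to get both parts in a single pass is to capture the link text and then an optional parenthesised group after the closing tag, allowing whitespace in between, and to join whichever groups matched. The sketch is in Python; the same pattern should carry over to preg_match_all, but that has not been checked here. The names ITEM and link_labels are invented for the example.

```python
import re

# Group 1: the link text. Group 2 (optional): a "(...)" note after </a>.
ITEM = re.compile(r"<a\b[^>]*>(.*?)</a>\s*(\([^)]*\))?", re.DOTALL)

def link_labels(html):
    """Return 'Name (Note)' or just 'Name' for every link in `html`."""
    return [" ".join(part for part in m.groups() if part) for m in ITEM.finditer(html)]

html = """<ul>
<li>
<a href="/nalcrom">Nalcrom</a>
        (Sodium Cromoglicate)
</li>
<li>
<a href="/alimemazine">Alimemazine</a>
</li>
<li>
<a href="/xolair">Xolair</a>
        (Omalizumab)
</li>
</ul>"""
labels = link_labels(html)
```

The combined attempt in the question returns nothing because its pattern requires the parenthesis to follow the link text directly, whereas in the HTML the closing tag and whitespace sit between the two parts.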